It has become commonplace in science fiction: Men and women talk directly to computers as if they were other people. The machines understand spoken language and, except for Hal, the smart computer turned bad in the movie "2001," respond immediately to people's needs.
University of Colorado researchers are trying to make reality out of that fiction.
For now, their goals are slightly less audacious than computers that understand all spoken language. John Hansen and his colleagues at the University of Colorado's Center for Spoken Language Research are developing computers that understand the words and grammar necessary for guiding a driver through a new city, for example.
A driver might ask "Where's the nearest Chinese restaurant?" and get a quick answer through her cell phone.
The technology could revolutionize phone systems, too, and might help disabled people.
"Let's say someone doesn't have use of their hands," Hansen said. "This could give them access to information on the Web, or voice control of a wheelchair."
And instead of pressing "1" then "4" then "27" on a bank's touch-tone dialing system, a caller might say "Um, I guess I need to ask someone about a bounced check," and a server would send him to the right department.
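As a rough illustration of that idea, the sketch below routes a spoken request to a department by matching keywords, rather than touch-tone menu codes. The department names and keyword lists are invented for the example; they are not from any actual bank's system.

```python
# Hypothetical keyword-based call routing: map a spoken request to a
# department instead of asking the caller to press touch-tone codes.
# Departments and keywords below are invented for illustration.
ROUTES = {
    "overdraft": ("bounced", "overdraft", "check"),
    "loans": ("loan", "mortgage", "borrow"),
    "accounts": ("balance", "account", "statement"),
}

def route_call(utterance: str) -> str:
    """Return the department whose keywords best match the utterance."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    scores = {dept: sum(w in keywords for w in words)
              for dept, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Fall back to a human operator when nothing matches.
    return best if scores[best] > 0 else "operator"

print(route_call("Um, I guess I need to ask someone about a bounced check"))
```

A real system would score full statistical language models rather than bare keywords, but the routing decision at the end works the same way.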
Hansen said he's not sure how soon consumers will see such natural language processors perfected and made widely available. It'll be a couple years, at least, depending on research funding and the interest of private companies, he said.
Putting people in control
Hansen is working on a smart program that can understand the needs of drivers, find addresses, directions and other information online, and pass that information on to drivers. He criticizes current navigation systems as passive, unintelligent and often disorienting.
"There's a lot of cognitive workload on the individual," he said. "We'd like to relieve that."
His "mixed initiative dialogue" system lets people take control at times, forcing the computer to respond. "Uh, where am I?" is a fair phrase. So is "Look, I'm lost. I just passed Speer Boulevard. How do I find the Pepsi Center?"
Hansen already has a navigation program that usually works for the city of Boulder. But there are problems. One is that people don't all pronounce "center" or "Broadway" exactly the same.
"The ability to recognize speech given regional variation is enormously challenging," he said.
It is also technically difficult to filter out typical road noises, such as those from automatic windows and passing trucks.
But thanks to advances in computer speed, it's possible to train computers to filter out useless words and phrases ("um," "uh," "you know") or to determine the meaning of sentences, said Jim Glass, a principal research scientist in the Massachusetts Institute of Technology's Spoken Language Systems Laboratory.
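The simplest version of that filtering step can be sketched in a few lines: strip common disfluencies from a transcript before handing it to the component that interprets the sentence. The filler list below is illustrative only, not drawn from the CU or MIT systems.

```python
import re

# Illustrative sketch: remove common spoken fillers from a transcript.
# The filler list is an assumption for this example, not a real system's.
FILLER_PATTERN = re.compile(
    r"\b(um+|uh+|er+|you know|i mean)\b[,]?\s*", re.IGNORECASE
)

def remove_fillers(utterance: str) -> str:
    """Strip filler words/phrases and tidy up leftover spacing."""
    cleaned = FILLER_PATTERN.sub("", utterance)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(remove_fillers("Um, where am I?"))
```

Real recognizers model disfluencies statistically in the acoustic and language models rather than deleting strings after the fact, but the end effect on the transcript is similar.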
Glass, whose research goals are similar to Hansen's, said it's important to improve communication between people and computers, for disabled people and others.
"It will benefit everybody in society," Glass said. "It's clear that there's a need for people to be able to interact more naturally with machines, and speech is clearly a modality that's very natural for humans."
Hansen and Glass both talk about the day when people can talk with their computers as easily as astronauts did with Hal in "2001."
"Even though it's 2001, we don't have anything like Hal on the horizon," Glass admitted. "These next generation systems that are in research labs will be more sophisticated, more flexible (than today's), but they're still constrained."
In Glass's laboratory, for example, research projects are tightly focused on particular sources of information: weather, for example.
With a program called "Jupiter," callers can request information about weather in cities throughout the world. If the computer doesn't recognize the town name, it prompts the user with suggestions such as "Try asking, 'For what cities in this state do you have information?'"
Ready for testing
At the MIT and University of Colorado laboratories, among others, researchers have set up functional smart voice programs on toll-free numbers for collecting human voice data and testing.
Hansen is also collecting data in the field because he's determined to make his systems broadly useful in noisy and distracting real-world environments.
In a recent field test, Hansen drove a sport-utility vehicle while research assistant Jay Plucienkowski, his former graduate student, sat in the back with recording devices. As a passenger and test subject, I read from a gray laptop computer strapped to the dashboard.
Words, phrases and numbers scrolled down the screen and high-tech microphones attached to the windshield captured my voice.
While I read phonetically balanced sentences, Hansen rolled the car windows up and down. He flipped door locks, turned on and off the windshield wipers, turned the radio on low and used his turn signals.
The intention: Obtain challenging data for training his computer systems to filter out nontarget noises.
"Voice prints" are usually sent back to his laboratory via cell phone, where computers try to filter out that extraneous noise. Programs called "dialogue managers" extract useful words and their meaning from the voice prints, and look up directions and addresses through online maps and phone directories.
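One small step in what such a dialogue manager does can be sketched as follows: pull a known place name out of a noisy utterance and look up its address. The place names and addresses here are invented for the sketch; a real system would query online map and phone directories.

```python
# Illustrative dialogue-manager step: find a known place name in an
# utterance and return its address. The directory entries are invented;
# a real system would consult online maps and phone directories.
DIRECTORY = {
    "pepsi center": "1000 Chopper Circle, Denver",
    "speer boulevard": "Speer Blvd., Denver",
}

def lookup_destination(utterance: str):
    """Return (place, address) for the first known place mentioned, or None."""
    text = utterance.lower()
    for place, address in DIRECTORY.items():
        if place in text:
            return place, address
    return None

print(lookup_destination("Look, I'm lost. How do I find the Pepsi Center?"))
```

A production system would also track dialogue state (what the driver already asked, where the car is) rather than matching one utterance at a time.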