"Command & Control" speech recognition allows the user to speak a word, phrase, or sentence from a list of phrases that the computer is expecting to hear. For example, a user might be able to speak the command, "Send mail to Fred Smith", "Send mail to Bob Jones", or "Turn on the television."
The number of different commands a user might speak at any time can easily number in the hundreds. Furthermore, the commands are not just limited to a "list" but can also contain other fields, like "Send mail to
In general, use Command and Control recognition when:
If an application uses speech recognition solely to impress people, it will work well for demos but will not be used by real users.
Command and Control recognition might be used in some of the following situations:
The specific use of command and control recognition will depend on the application. Here are some sample ideas and their uses:
Games and Edutainment
Game and edutainment software titles will be some of the heaviest users of Command & Control speech recognition in the near term. Christmas 1995 saw the appearance of several games and half a dozen language-learning titles that use speech recognition. Only high-end machines sold in Christmas 1995 could run speech recognition. Because memory and CPU in machines will increase, and because of the introduction of Microsoft's speech API, many more games and edutainment titles in future Christmas markets will be using speech recognition.
What do these titles use speech recognition for?
One of the most compelling uses of speech recognition technology is in interactive verbal exchanges and conversation with computer-based characters. With games, for example, traditional computer-based characters can now evolve into characters that the user can actually talk to.
While speech recognition enhances the realism and fun in many computer games, it also provides a useful alternative to keyboard-based control of games and applications. Voice commands provide new freedom for the user in all sorts of applications, from entertainment to productivity.
Many applications such as database front-ends and spreadsheets require users to keyboard paper-based data into the computer. It is much easier for users to read the data directly to the computer, and speech recognition can significantly speed up data entry.
A data-entry application can use speech recognition if the data is specific enough. While speech recognition cannot effectively be used to enter names, it is very good at entering numbers and selecting items from a small list (fewer than 100 items). Some recognizers can even handle spelling fairly well. If an application uses speech recognition, the user no longer has to look at the keyboard. If speech recognition is combined with text-to-speech playback of the recognized entry, then the user doesn't even need to look at the screen and can stay focused on the paper.
Furthermore, because speech recognition is not as "modal" as a keyboard, some applications don't even need to require a specific field to have focus. If the form that is being filled in has fields with mutually exclusive data types -- one field allows "male" or "female", the other is an age, and the third is a city -- then speech recognition can hear the command and automatically determine which field to fill in. After all, if only one field accepts "New York City" as a valid entry and the user speaks "New York City" then the application knows which field to fill in.
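The field-selection idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the list of cities, and the validator functions are hypothetical, and the recognizer is assumed to hand the application the recognized phrase as text (with ages already rendered as digits).

```python
# Hypothetical form: each field accepts a disjoint set of spoken values.
def _is_age(phrase: str) -> bool:
    # Assumes the engine delivers numbers as digit strings such as "42".
    return phrase.isdecimal() and 0 <= int(phrase) <= 120

FIELD_VALIDATORS = {
    "sex": lambda p: p in {"male", "female"},
    "age": _is_age,
    "city": lambda p: p in {"new york city", "seattle", "chicago"},
}

def route_phrase(phrase: str):
    """Return the name of the only field that accepts the phrase, or None."""
    normalized = phrase.strip().lower()
    matches = [name for name, accepts in FIELD_VALIDATORS.items()
               if accepts(normalized)]
    # Fill a field only when exactly one field accepts the phrase.
    return matches[0] if len(matches) == 1 else None

# Simulated recognition results, spoken in any order and with no field focused.
form = {}
for spoken in ["New York City", "female", "42"]:
    field = route_phrase(spoken)
    if field is not None:
        form[field] = spoken
```

Returning None when zero or several fields match keeps the application from guessing; in that case it can fall back to whichever field currently has focus.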
Command and control recognition is useful for document editing when the user wishes to keep his/her hands on the keyboard to type, or on the mouse to drag and select. He/she can simultaneously speak commands for manipulating the data that he/she is working on. A word processor might provide commands like "bold", "italic", "change to Times New Roman font", "use bullet list text style," and "use 18 point type". A paint package might have "select eraser" or "choose a wider brush."
Of course, there are users who won't find speaking a command to be preferable to using keyboard equivalents. People have been using keyboard equivalents for so long that the combinations have become for them a routine part of program control. But for many (if not most) people, keyboard equivalents are a lot of unused shortcuts. Voice commands will provide these users with the means to execute a command without first mousing through cascading menus.
For a full description of telephony, see the Text-To-Speech for Telephony article.
A speech application requires certain hardware and software on the user's computer to run. Not all computers have the memory, speed, microphone, or speakers required to support speech, so it is a good idea to design the application so that speech is optional.
These hardware and software requirements should be considered when designing a speech application:
For a list of engine vendors that support the Speech API, see the ENGINE.DOC file included with the Speech Software Development Kit.
Currently, even the most sophisticated speech recognition engine has limitations that affect what it can recognize and how accurate the recognition will be. The following list illustrates many of the limitations found today. The limitations do pose some problems, but they do not prevent the design and development of applications that use voice commands.
Microphones and sound cards
The microphone is the largest problem that speech recognition encounters. Microphones inherently have the following problems:
Most applications can do little about the microphone. Luckily, when the user installs the speech recognition engine, the engine should come with software that makes sure the user's microphone is correctly plugged in and working.
Problems with ambient noise
In general, the user should place the microphone as close to the mouth as possible to reduce noise from the environment. Users in quiet environments can afford to have the microphone positioned several feet away, but users in noisier environments, such as office cubicles, will need a headset that positions the microphone a few centimeters from the mouth. Unfortunately, speech recognition is of limited use to many people in noisy environments, because they find the headset uncomfortable or its cord restrictive.
Problems with computer generated sounds
Often worse than ambient noise are intentional sounds, such as those generated by the user's computer and played through a powerful stereo system.
There are several ways to make sure that the microphone isn't hearing the speakers:
Half-Duplex Sound Cards
Many sound cards are only "half duplex" (as opposed to "full duplex"). A half-duplex sound card cannot record and play audio at the same time. For speech recognition, this means the engine cannot listen while the card is playing sound. Fortunately, with plug-and-play the number of full-duplex sound cards is increasing.
Speech Recognition Likes to Hear
Speech recognition engines like to hear -- no surprise. They like to hear so much that if the user is having a phone conversation in the room while speech recognition is listening, the recognizer will think that the user is talking to it, and it will hear random words. Sometimes the speech recognizer even hears a sound, like a slamming door, as words.
There are several ways to overcome this obstacle:
Command and Control Engines need exact commands
Before an application starts a command and control recognizer listening it must first give the recognizer a "list" of commands to listen for. The list might include commands like "minimize window," "make the font bold," "call extension
If the user speaks the command as it is written, accuracy will be very good. However, if the user words the command differently (and the application hasn't provided the alternate wording), the recognizer will either recognize nothing or, even worse, recognize something completely different. So, if a user speaks "bold that" instead of "make the font bold", there's a pretty good chance that the computer will hear "minimize window".
Applications can work around this problem by:
Over time speech recognizers will start applying natural language processing and this problem will go away.
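Until then, one way to supply the alternate wordings mentioned above is to keep a table that maps every accepted phrase to a single action. The following is a minimal sketch, not the Speech API itself: the action names and the table layout are hypothetical, and the recognizer is assumed to return the matched phrase as text.

```python
# Hypothetical command table: each action lists every wording the
# application is willing to accept for it.
COMMAND_WORDINGS = {
    "make_bold": ["make the font bold", "bold that", "bold"],
    "minimize_window": ["minimize window", "minimize the window"],
}

def build_phrase_list(wordings):
    """Flatten the table into phrase -> action, rejecting duplicate phrases."""
    phrase_to_action = {}
    for action, phrases in wordings.items():
        for phrase in phrases:
            key = phrase.lower()
            if key in phrase_to_action:
                raise ValueError(f"phrase {phrase!r} is bound to two actions")
            phrase_to_action[key] = action
    return phrase_to_action

# The keys of PHRASES are what the application would hand to the recognizer.
PHRASES = build_phrase_list(COMMAND_WORDINGS)

def action_for(recognized: str):
    """Map a recognized phrase back to its action, or None if unknown."""
    return PHRASES.get(recognized.strip().lower())
```

With this table, "bold that" and "make the font bold" both resolve to the same action, so the user is not penalized for choosing the shorter wording.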
Speech Recognizers make mistakes
Speech recognizers make mistakes, and will always make mistakes. The only thing that is changing is that every two years recognizers make half as many mistakes as they did before. But no matter how good a recognizer is, it will always make mistakes.
An application can minimize some of the misrecognitions by:
Some other problems crop up:
Here are some design considerations for applications using command and control speech recognition.
Design Speech Recognition in From the Start
Don't make the mistake of using Speech Recognition as an add-on feature. It's poor design to just bolt speech recognition onto an application that is designed for a mouse and keyboard. Applications designed for just the keyboard and mouse get little benefit from speech recognition. After all, how many DOS applications that were designed for just the keyboard came up with effective uses for the mouse?
Do Not Replace the Keyboard and Mouse
Speech recognition is not a replacement for the keyboard and mouse. In some, but not all, circumstances it is a better input device than the keyboard/mouse. Speech recognition makes a terrible pointing device, just as the mouse makes a terrible text entry device and the keyboard is bad for drawing. When speech recognition systems were first bolted onto the PC, it was thought that speaking menu names would be really useful. As it turns out, very few users use speech recognition to access a window menu because the mouse is much faster and easier.
Generally speaking, every feature in an application should be accessible from all input devices: keyboard, mouse, and speech recognition. Users will naturally use whichever input mechanism provides them the quickest or easiest access to the feature. The ideal input device for a feature may vary from user to user.
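One way to keep every feature reachable from all three input devices is to register each feature once, together with its keyboard shortcut, menu entry, and voice phrases. The sketch below is illustrative only; the Feature class, its field names, and the sample shortcut and menu path are assumptions, not part of any real toolkit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Feature:
    action: Callable[[], str]   # what the feature does
    shortcut: str               # keyboard binding
    menu_path: str              # mouse binding (menu entry)
    voice_phrases: List[str]    # speech bindings

def build_dispatch(features: List[Feature]) -> Dict[str, Dict[str, Callable[[], str]]]:
    """Build one lookup table per input device, all sharing the same actions."""
    dispatch = {"keyboard": {}, "mouse": {}, "speech": {}}
    for feature in features:
        dispatch["keyboard"][feature.shortcut] = feature.action
        dispatch["mouse"][feature.menu_path] = feature.action
        for phrase in feature.voice_phrases:
            dispatch["speech"][phrase.lower()] = feature.action
    return dispatch

features = [
    Feature(lambda: "bold applied", "Ctrl+B", "Format/Bold",
            ["bold", "make the font bold"]),
]
dispatch = build_dispatch(features)
```

Because all three tables point at the same action, a feature cannot be added for one device and forgotten for another.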
Work Around Recognizer Limitations
Speech recognizers have a lot of limitations, as listed in the previous section. Make sure that the application isn't using or requiring speech recognition to be used for purposes where it performs poorly.
Communicate Speech Awareness
Since most applications today do not include speech recognition, users will find speech recognition a new technology. They probably won't assume that your application has it, and won't know how to use it.
When you design a speech recognition application, it is important to communicate to the user that your application is speech-aware and to provide him or her with the commands it understands. It is also important to provide command sets that are consistent and complete.
Managing User Expectations
When users hear that they can speak to their computers, they instantly think of Star Trek and 2001: A Space Odyssey, expecting that the computer will correctly transcribe every word that they speak, understand it, and then act upon it in an intelligent manner.
You should convey as clearly as possible exactly what an application can and cannot do and emphasize that the user should speak clearly, using words the application understands.
Communicating the Command Set
A graphical user interface provides users with tremendous feedback about what they can do by displaying menus, buttons, and other controls on the screen. Furthermore, the keyboard and mouse typically do not send erroneous signals to the application.
This is not so with speech recognition. The number of voice commands that can be recognized at any given time can easily number in the hundreds, perhaps even thousands. Although cues for the most likely commands could be displayed on the screen, it is impossible to display the full set at once. To compensate, an application can provide mechanisms to scan through the large list of active commands or can prompt the user for the most common voice responses through visuals or text-to-speech. For example, the application might say "Do you want to save the file? Say Yes or No." If the application does not recognize a command, it can also provide more extensive help. For example, "Please say either Yes or No, or say Help if you need more help."
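The Yes/No prompt described above might be structured as follows. This is a sketch under assumptions: `listen` and `say` are hypothetical callables standing in for the recognizer and the text-to-speech output, and the retry limit and `help_text` parameter are illustrative choices.

```python
def ask_yes_no(question, listen, say, help_text="", max_attempts=3):
    """Prompt for Yes or No, offering more help when recognition fails.

    `listen` returns the recognized phrase (or None); `say` speaks or
    displays a prompt. Both are placeholders for a real engine.
    """
    say(f"{question} Say Yes or No.")
    for _ in range(max_attempts):
        heard = (listen() or "").strip().lower()
        if heard in ("yes", "no"):
            return heard == "yes"
        if heard == "help" and help_text:
            say(help_text)
        else:
            say("Please say either Yes or No, or say Help if you need more help.")
    return None  # give up; the caller can fall back to keyboard or mouse

# Simulated session: the first reply is not a valid command.
replies = iter(["maybe", "yes"])
spoken = []
answer = ask_yes_no("Do you want to save the file?",
                    lambda: next(replies), spoken.append)
```

Returning None after a few failed attempts lets the application hand control back to the keyboard or mouse instead of trapping the user in a voice prompt.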
Providing Feedback to the User
Whenever a voice command is spoken, you should give some sort of feedback to the user indicating that the command was understood and acted upon. Visual indications are usually sufficient, but if it is impossible to have noticeable visuals, you should verify with a short text-to-speech or recorded phrase.
Breaking Up Long Series of Numbers
Most engines have a very high error rate for long series of digits that are spoken continuously. For phone numbers or other long series, either break the number into groups of four or fewer digits or have the user speak each digit as an isolated word.
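Splitting a number into groups of four or fewer digits is simple to do before prompting the user. The helper below is a minimal sketch; the function name and the sample phone number are made up for illustration.

```python
def group_digits(number: str, max_group: int = 4):
    """Split the digits of `number` into groups of at most `max_group`."""
    digits = [ch for ch in number if ch.isdigit()]  # drop dashes, spaces, etc.
    return ["".join(digits[i:i + max_group])
            for i in range(0, len(digits), max_group)]

groups = group_digits("425-555-0123")
```

The application can then prompt for, or listen to, one group at a time rather than expecting ten digits in a single continuous utterance.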
Avoiding Speech for Directional Commands
Speech input should not be used as a means of moving the mouse cursor because it is inefficient and annoying for the user. For example, the user would need to repeat directional commands, such as up, many times in succession to move the cursor to the desired screen position.
Where the Engine Comes From
Of course, for speech recognition to work on an end user's PC the system must have a speech recognition engine installed on it. The application has two choices:
The user may already have speech recognition because many PCs and sound cards will come bundled with an engine. Alternatively, the user may have purchased another application that included an engine. If the user has no speech recognition engine installed then the application can tell the user that they need to purchase a speech recognition engine and install it. Several engine vendors offer retail versions of their engines.