Voice Recognition Software Review: Dragon NaturallySpeaking 8.0 and IBM ViaVoice 10

January 1, 2005

in All Articles,Audio,Books and documents

The casually-conversing HAL 9000 computer in the movie 2001: A Space Odyssey makes it look easy. From “Danger, Will Robinson!” to Star Wars’ language-translating droid C-3PO, movies and TV continually assume that computers can and will understand the spoken word as well as any human.

It’s not that easy. When you consider the use of slang, accents, enunciation (or lack thereof, in my case) and the constantly-evolving rules of language, it’s a little easier to understand why we’re not yet talking to our TV’s remote control.

Heck, if my cellphone can’t even tell the difference between “Call Mom” and “Call Bob”, can a computer do that much better? Happily, the answer is yes.

Talk into a microphone and the computer types what you say. That’s an easy way to describe voice recognition software like Dragon NaturallySpeaking and IBM ViaVoice. Morphing through various forms since 1994, Dragon NaturallySpeaking is arguably the best voice recognition software available today. IBM’s “40 years of commitment to speech research and development” have in part led to the ViaVoice software. This article compares the two and offers general comments on voice recognition technology. (If we want to get picky, “speech recognition” is the process of converting speech to text, and “voice recognition” is the process of identifying a person by their voice alone. However, for this article, I’m using “voice recognition” to describe these software packages, since the technically correct term is rarely used.)

While functionality and options are compared elsewhere (IBM ViaVoice or Dragon NaturallySpeaking), I’m focusing on recognition quality and ease-of-use. The versions used for this comparison review were Dragon NaturallySpeaking 8 Professional Edition and IBM ViaVoice 10 Pro USB Edition.

Dragon NaturallySpeaking pricing starts at $100. You’ll get a version targeted towards users who focus primarily on e-mail and document creation. Advanced features (including editions tuned specifically for medical and legal workers) are available for an extra cost.

IBM ViaVoice pricing starts at $30. More expensive versions give you ease-of-use enhancements and functionality better suited to professional environments.

Note that for each product, there is no difference in the speech engines throughout the price ranges. More money means you are paying for increased functionality and better microphone technology. (The microphones that come with the software are adequate, but dedicated users should upgrade. Click for microphone recommendations.)

Benefits

So you grew up using keyboards. You take pride in your “Fast Fingers” trophy from high school typing class. You can type 90 words per minute. Why would you be interested in voice recognition technology? The short answer is that with practice, you can do even better.

The longer answer is there are many applications for which voice recognition software can improve your productivity. This could benefit people like these:

People who do dictation or lots of typing. Often speaking your thoughts is much faster than typing them.

People who are disabled or can’t type easily, like carpal tunnel sufferers. This is helpful even beyond typing. The software mentioned here allows you to take full control of your computer using just your voice, enabling things like web navigation and mouse clicking.

Fast thinkers whose ideas outrun their typing. I know a few creative types like this. Their brains work faster than their fingers can keep up with. Luckily, their mouths can keep pace.

Dictation. You can record your voice in many formats (digital recorder, tape recorder, handheld device, etc.) and then play it back later for conversion to text. See a hardware compatibility list of supported dictation devices and other hardware, or details about transcribing dictation.

Setup and Training

I hate it when I purchase something that turns out to be a huge box with a single CD rattling around inside. Happily, neither product is a waste of packaging. Both boxes are no bigger than they need to be, and are packed with the software media, documentation and microphone headsets.

Dragon NaturallySpeaking: Installation took about five minutes and 205 megs. Initial microphone tests on my laptop (an IBM ThinkPad T41) failed. This worried me at first, but after further testing I found that what Dragon NaturallySpeaking called a “failure” still worked fine. Reading the help text, I found documentation about my problem. Some laptops cause extra microphone interference when plugged in. This was what was happening to me. If I unplugged my laptop, the interference went away and the microphone test passed. (Although Dragon NaturallySpeaking was detecting more background noise with the AC plugged in, it made no difference in later performance tests.) Dragon NaturallySpeaking comes with a microphone that plugs directly into your computer’s microphone and speaker jacks. The microphone cable is about 6 feet long.

IBM ViaVoice: Installation took about three minutes and 300 megs. However, there were a few confusing issues. During initial setup and training, my firewall interrupted me about 10 times as IBM ViaVoice repeatedly tried to connect to the Internet. This is bad installation etiquette. Software should warn you before it tries for a Net connection. Hidden communication, for whatever reason, is impolite at best and dangerous at worst (see adware and spyware as examples). The USB-connected microphone setup worked perfectly on the first try. The microphone cable is about 8 feet long.

After installation and microphone setup, both software packages offered to scan my computer’s documents to learn a little bit more about my writing habits. Dragon NaturallySpeaking was also able to scan my e-mail. This was training step number one.

Step two for both programs was to have me read text aloud so the software could listen to my voice and learn my speaking patterns. This text was supplied by the software, and this process took about 15 minutes. After this, I was ready to really start using the software. (Further training helps recognition even more, but both packages could perform fairly well at this point.)

Short-term use

As my first test, I just grabbed one of my ever-present science-fiction classics and started reading. Reading, that is, in the way appropriate for voice-to-text conversion. Consider this quote by Mark Twain:

“Why shouldn’t truth be stranger than fiction? Fiction, after all, has to make sense.”

We would dictate it to voice recognition software like this:

“why shouldn’t truth be stranger than fiction question-mark [pause] fiction comma after all comma has to make sense period”
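The mapping from that spoken stream back to the written sentence can be sketched in a few lines of Python. This is only an illustration: the command names in the table and the render_dictation function are hypothetical, and neither product is claimed to work this way internally.

```python
# Hypothetical table of spoken punctuation commands (illustrative only;
# not the actual command vocabulary of either product).
SPOKEN_PUNCTUATION = {
    "question-mark": "?",
    "comma": ",",
    "period": ".",
}


def render_dictation(spoken):
    """Turn a dictated word stream into punctuated, capitalized text."""
    words = []
    capitalize_next = True  # the first word of the text starts a sentence
    for token in spoken.split():
        if token == "[pause]":
            continue  # a pause separates phrases but produces no text
        if token in SPOKEN_PUNCTUATION:
            symbol = SPOKEN_PUNCTUATION[token]
            if words:
                words[-1] += symbol  # attach punctuation to the previous word
            capitalize_next = symbol in ".?"  # sentence-ending marks only
            continue
        if capitalize_next:
            token = token[0].upper() + token[1:]
            capitalize_next = False
        words.append(token)
    return " ".join(words)


spoken = ("why shouldn't truth be stranger than fiction question-mark [pause] "
          "fiction comma after all comma has to make sense period")
print(render_dictation(spoken))
```

Real products do far more than this (they have to recognize the words in the first place), but the sketch shows why you say the punctuation out loud: the software treats it as a command rather than as a word.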

You speak the punctuation, and insert short pauses to separate phrases or sentences. Talk at normal human-talking speed, but enunciate more carefully than normal. Realize that the software isn’t just recognizing words, but patterns and context. Martin Markoe from eMicrophones, Inc. provides more information (original text at this link):

“Speech Recognition software works not only by decoding the sounds (phonemes) but through the use of the context of the words before or after each word. It uses hundreds of thousands of tables of probabilities, developed by linguists who scour a country for regional speech samples, called bigrams and trigrams. Bigrams are two word phrases and trigrams three word phrases. Accuracy is greatly improved not only by clear enunciation but by speaking in phrases.”
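To make the bigram idea concrete, here is a minimal Python sketch of how context could settle a choice between sound-alike words. Everything in it is an assumption for illustration: the tiny corpus is made up, the function names are hypothetical, and real products rely on far larger probability tables plus acoustic scoring that is not modeled here.

```python
from collections import Counter

# Tiny made-up corpus standing in for the large regional speech samples
# described in the quote above.
corpus = ("i want to go to the store . i have two dogs . "
          "i want two apples . we go to the park").split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))  # adjacent word pairs


def bigram_probability(prev, word):
    """Estimate P(word | prev) from raw counts, with no smoothing."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]


def pick_candidate(prev, candidates):
    """Choose among sound-alike candidates using the preceding word."""
    return max(candidates, key=lambda w: bigram_probability(prev, w))


sound_alikes = ["to", "two", "too"]
print(pick_candidate("go", sound_alikes))    # context "go ..." favors "to"
print(pick_candidate("have", sound_alikes))  # context "have ..." favors "two"
```

The three candidates sound identical, so the audio alone cannot pick between them. The preceding word does the work, which is why speaking in whole phrases gives the software more to go on than speaking one word at a time.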

Dragon NaturallySpeaking: It performed well. Recognition accuracy was in the mid-to-high 90 percent range.

IBM ViaVoice: Translation accuracy was similar to Dragon NaturallySpeaking, but in other respects it didn’t perform as well. Specifically, I had application problems. I was using Microsoft Word 2003 as my “vocal word processor”. Dragon NaturallySpeaking could dictate to the software just fine. IBM ViaVoice could not. What’s more, IBM ViaVoice processed information more slowly than Dragon NaturallySpeaking: When I spoke, IBM ViaVoice took noticeably longer to get my words onto the screen.

Long term use

Over time, the differences between the two packages grew.

Dragon NaturallySpeaking: Recognition ended up performing very well. After more training (which included reading more supplied text as well as manually tweaking certain words and phrases) I was able to get Dragon NaturallySpeaking performance consistently near 100%. As a techie, I also appreciated advanced features allowing me access to some of the software’s nuts and bolts.

IBM ViaVoice: In my opinion, this software is not yet ready for prime time. First, it’s buggy, which is a big surprise coming from IBM. I experienced program hangs and significant mistranslations. (This was not due to poor initial setup or training, as I re-trained multiple profiles and used different microphones and different Windows sound configurations to rule out this possibility.) Second, it’s slower than Dragon NaturallySpeaking. Translation takes noticeably longer, even on my fairly zippy computer. Third, cross-software compatibility is poor. I can dictate fairly well using IBM ViaVoice’s provided word processor, but any time I try to dictate to another application (like Microsoft Word, Microsoft Outlook, or even Windows Notepad), performance degrades. According to the IBM ViaVoice help documentation, this should not happen, but it does in my case.

The wordy conclusion

To achieve improved voice-to-text translation, I spent about four hours of additional training with both packages. This was primarily reading from provided training texts. But because of the problems mentioned above with IBM ViaVoice, I stopped using it and finished the testing and this article with Dragon NaturallySpeaking.

There is a reason for IBM ViaVoice’s bad performance: It was purchased by ScanSoft, the same company that owns Dragon NaturallySpeaking. Dragon NaturallySpeaking’s development is ongoing, while ViaVoice is no longer supported. Most likely, no further updates or versions of ViaVoice will be available.

There are a few speed bumps with voice recognition:

Learning curve. I’m a pretty fast typist, and have gotten to the point where words in my brain get to my fingers quickly. It was a significant readjustment for me to send those same words to my mouth instead. Yes, you can simply think of this as talking out loud to your computer, but when you’re trying to compose text it adds a level of difficulty. While I’m still not 100% comfortable with dictation, I can say that I can generate text with voice faster than I can type it. (In a comparison test, my typing speed was 70 words per minute and my voice-to-text speed was 133 words per minute.)

Ambient noise. The software can compensate for background interference to some extent by readjusting audio levels, but there’s simply not much it can do if you’re cranking a Led Zeppelin album in the background. Keep it quiet for best performance. If you like to listen to music while you work, you’ll have to use headphones.

Training. Out-of-the-box with 15 minutes of training, accuracy is in the mid to high 90% range, and that’s not good enough for extended writing. It does take a little effort on the user’s part to get the program working with high accuracy. A little patience is required in the beginning, but the software adapts to your corrections so quickly you can see it improving minute by minute. The effort you take in training and correcting is definitely worth it.

Editing. As previously mentioned, the software gives you full control over your computer, down to controlling mouse movement, menu navigation and access to advanced scripts and macro creation. But for me, there are still some things that are easier to do by just clicking a mouse. Examples are moving the cursor to a different part of text, capitalizing words, or changing the formatting and presentation of text after it’s been typed. Yes, the software allows you to do all these things, but going back after the text is already on the screen requires too much vocal effort when I could take one second to click the mouse a couple times.
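To put the speed and accuracy figures above in perspective, here is a rough back-of-the-envelope calculation in Python. The 70 and 133 words-per-minute numbers come from my comparison test. The 1,000-word document length is an assumption for illustration, and so is treating “accuracy” as a simple per-word error rate. Time spent making corrections is not modeled.

```python
TYPING_WPM = 70        # my typing speed from the comparison test
DICTATION_WPM = 133    # my voice-to-text speed from the same test
DOCUMENT_WORDS = 1000  # assumed document length, for illustration only


def minutes_to_produce(words, words_per_minute):
    """Raw time to get the words on screen, ignoring corrections."""
    return words / words_per_minute


def expected_errors(words, accuracy_percent):
    """Misrecognized words, assuming accuracy is a per-word rate."""
    return words * (100 - accuracy_percent) // 100


speedup = DICTATION_WPM / TYPING_WPM
print(f"Dictation is about {speedup:.1f}x faster than typing")
print(f"Typing:    {minutes_to_produce(DOCUMENT_WORDS, TYPING_WPM):.1f} minutes")
print(f"Dictation: {minutes_to_produce(DOCUMENT_WORDS, DICTATION_WPM):.1f} minutes")

for accuracy in (95, 97, 99):
    errors = expected_errors(DOCUMENT_WORDS, accuracy)
    print(f"{accuracy}% accuracy: about {errors} words to fix")
```

Under these assumptions, 95% accuracy still leaves roughly 50 words to fix in a 1,000-word document, which is why the extra training is worth the effort before relying on dictation for extended writing.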

The short conclusion

I’m very happy with Dragon NaturallySpeaking, and would recommend against IBM ViaVoice. The $70 difference between the base versions of these products is worth paying. If you’re going to spend money, spend it wisely on quality software from a leader in the voice-recognition industry. Dragon NaturallySpeaking performs very well.

This entire article text was dictated and edited using Dragon NaturallySpeaking. If you’d like to see it in action, see a demonstration video. You can also read the Digital Bits column about voice recognition, or see answers to some frequently asked questions about voice recognition.


