Sunday, May 2, 2010

Know Your Tools: Speech Recognition Software

Writers rely on technology to get their work done.  The technology could be a ballpoint pen or a computer running a word processor program.  This week I'm going to talk about speech recognition software.

Everyone thinks they can write a book.  I happen to agree:  they can write a book.  Maybe not a good one, but they can write a book.  But they won't.  Most of them will never get started.

I've met a couple of people who think that all that scribbling or typing is too much effort, and that having the computer write what they say will expose their inner author.  Nice thought.  Doesn't work that way, though.

Voice recognition software has come a long way, and is getting better every year.  What you need to write a novel is something called continuous speech recognition, something like Dragon NaturallySpeaking.  I used DNS three or four years ago.  It was a fascinating experience.  They state now (and I believe they stated then) that you can get up to 99% accuracy right from the start.  The key part of that, naturally, is "up to."  Up to 99% certainly covers 80% accuracy, which is about where I started.  After months of hard work training both myself and the recognizer, I got to about 90%.

90% accuracy is really bad.  It means ten words wrong out of every one hundred.  It means that the majority of sentences will have an error.  These are errors that your word processor (did I mention that you're speaking this into a word processor?) will not flag because they are properly spelled words; they're just not the words you wanted to spell.  And before you jump up and say that the grammar-checker would catch most of them, I have to tell you, "no, it won't."  Why not?  Because you've turned your grammar-checker off:  it is meant for business letters and not novels (and especially not dialog!).
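
For anyone who wants to check the "majority of sentences" claim, here is a rough back-of-the-envelope sketch.  It assumes each word is misrecognized independently of its neighbors, which is a simplification (real recognizers tend to make errors in clusters), and the sentence lengths are illustrative guesses, not measurements.  The function name is made up for this example.

```python
def sentence_error_probability(word_accuracy: float, words_per_sentence: int) -> float:
    """Chance that a sentence contains at least one misrecognized word.

    Assumption: every word is recognized independently with the same
    accuracy, so the chance of a fully correct sentence is accuracy ** length.
    """
    return 1.0 - word_accuracy ** words_per_sentence


if __name__ == "__main__":
    # Accuracy figures mentioned in the post; sentence lengths are hypothetical.
    for accuracy in (0.80, 0.90, 0.99):
        for length in (7, 10, 15):
            p = sentence_error_probability(accuracy, length)
            print(f"{accuracy:.0%} accuracy, {length}-word sentence: "
                  f"{p:.0%} chance of at least one error")
```

Under that assumption, at 90% word accuracy any sentence of seven words or more is more likely than not to contain an error, and a ten-word sentence has roughly a 65% chance of one.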

The recognition software has to be watched, too.  You can't just start speaking and turn your back on your screen, pacing around the room while you wax eloquent.  There are two reasons for this:  you want to be sure that what you speak is being written into the proper place, and you want to make sure you're not deleting your work.  Your cursor might jump into the middle of the previous chapter, or worse, a dialog box might pop up and grab the focus -- if you don't notice, you'll speak into a black hole and lose your inspired verbiage forever.  More insidious is when you slip into "command mode" so that instead of transcribing your speech, the program attempts to carry out your commands.  Think:  select all, delete, save, exit.  All sorts of painful things can happen if you're not watching.  Trust me.

Even when keeping an eye on the transcription process, I found I spent more time on the keyboard doing corrections than I did speaking my sentences in the first place.  Oh, but miracle of miracles, you can speak your corrections, too!  I tried that once.  My arthritic fingers were paining me greatly (which was why I had DNS in the first place), so I decided I was going to work that evening without using the keyboard for anything beyond startup and shutdown.  The first line I spoke was "The lawn was freshly mown."  Looking back on it, that was a terrible sentence to ask the recognizer to deal with.  Unfortunately, terrible sentences crop up all the time.

The immediate problem was "mown."  The software produced the homophone, "moan."  There was a problem with "freshly" as well:  "flesh Lee."  I decided to leave Mr. Lee and the flesh where they were for the time being, and worked on the moan.  "Mown" was not in the DNS dictionary, so I had to add it, and I did that by spelling it out.  Or I tried to.  M and N are largely indistinguishable through the (premium) Bluetooth microphone I was using, so I couldn't get "mown" into the dictionary.  In desperation I decided to rephrase to something like "He recently mowed the lawn," hoping that the moaning would stop, and Lee could take a much-needed break.  I corrected, spelled, re-read the manual, coerced, cajoled, backspaced, "undo-that"ed, and muttered to myself for another quarter of an hour.  Intensely.  By the way:  muttering to yourself is an uncommonly stupid thing to do with a speech-recognizer listening.  I finally gave up when the best I could do for my sentence was, "The Jews know the law."

And they may, but in the meantime the grass is growing up to Lee's fleshy knees, and there's no one to cut it.

DNS certainly didn't cut it.  I tried it for another six months and finally abandoned it.  In the meantime it made for some amusing conversations the few times I tried using it for instant messaging.

Tools don't do the job for you, but sometimes they can make it easier.  Someday (in the not-too-distant future) I think continuous speech recognizers will be useful, even considered indispensable by the majority of novelists.  That day is not today.  Of course, even if you had someone to take fast and accurate dictation, there's more to writing a book than speaking for a few hours.

Have any of you tried this "shortcut"?


  1. Hi John!
    No. I've never used DNS. I have used dictation but hated it. I'm a fast typer and I like to have the ability to see what I'm writing and edit as I go. I for sure would not have had the patience you did to try to make it work.

  2. At that point, my fingers didn't leave me much choice. Patience was all that was left to me. They're usually much better now, thank goodness.

    Did you use dictation as a nurse, or did you actually try it for writing?

  3. I used dictation for nursing. At the time I was working for a national company, from my home office, visiting patients in my surrounding area. They tried the dictation system to limit carpal tunnel syndrome and expedite reporting. The nurses, myself included, hated it. Although given time, maybe I would have gotten used to it.