
Designing Effective Speech Interfaces

 


         
     

                  

 

 


Introduction

 

In 1985 I worked on my first natural language interface. We were building a proof-of-concept system that would allow top managers at an insurance company to query a database using natural language commands. If the system didn’t know a word that was used, it would come back with a question asking for the meaning of that term. During an early test one of the users asked, “How much life insurance has been sold last year in the Northeast?” The computer came back with, “What is the meaning of life?” How thrilled we were that we had created the first thinking computer!

During the last few months I have talked to many people about this book while I was writing it. I can’t count how many of them mentioned the Star Trek computer or HAL from 2001 when I told them the title of the book. For a while I thought it was neat that people could relate to what I was writing about. After all, if I mention the title of my last book, GUI Design Essentials, to people who are not in the field of computers, I usually get blank, glazed stares. As the same comments kept coming, I started to groan inwardly. But as I write this now, in the thick of the project, I have decided that the reaction is significant. The idea of a computer that talks and understands our speech is different and striking; it is a novel idea that captures people’s attention and imagination. At first it seems easy to understand why so many people who had seen 2001 would remember HAL. HAL is the somewhat sinister computer in 2001 who closes the pod doors. People remember “him” because he’s evil, right? But then, how many evil characters are there in movies? Do we always remember their names? And what about the original computer in Star Trek? That androgynous computer was just called “computer”. And he/she wasn’t sinister at all. Why do we remember that one?

I think it’s because speech is something that is uniquely human. Dogs bark, and chimpanzees can communicate with “written” language, but only humans talk. To talk is to be human, and HAL and “Computer” stick in our memory because they are computers with uniquely human traits. We are fascinated by this idea, and maybe even repelled by it.

No wonder, then, that speech capabilities in computers seem to hold that love/hate, avoidance/attraction aspect for us. Speaking is so natural to us as humans that we want to be able to talk to our computers. It seems that it would be easier. And we want our computers to talk to us, because listening to information rather than reading it leaves our eyes and hands free to do other tasks. Yet we hesitate. We come up with so many reasons not to embrace the technology.

True, the reasons are plentiful, and many are all too valid. The state of speech recognition, and the work of dealing with the errors it produces, is still enough to drive many away. I used speech technology, specifically dictation systems, when I was writing this book. At one point I dictated the phrase “…and manual output, and performance was worse with auditory input and spoken output”. When I glanced up at the screen the recognizer had written “and Immanual Kant put an the of paint boatman and Spokane output”.

Last week a press release from the University of Southern California proclaimed a breakthrough in speech recognition using neural network technology. The improvements are dramatic. By the time this book is released, using neural networks for speech recognition may have made many of the error handling and recovery issues unimportant. None too soon for me.

No, it’s more than error rate we worry about. I think we worry about losing our uniqueness. We don’t want humans to be machines, and we are still uncomfortable with our machines being human-like.

I don’t believe that talking to a computer, or listening to one, makes us any less human, or the computer any more human. We are about to overcome our hesitancy and leap into speech in a big way. What will that transition be like? Will it continue to happen slowly? Or will it snowball all at once?

We are historically very poor at predicting the pace of acceptance of new technologies. In a conversation I had with one of the original creators of the cell phone, he told me that when they developed the first cell phone they thought the total worldwide market was about 100 phones (top CEOs and maybe heads of large, powerful governments). T. J. Watson of IBM predicted worldwide computer use at 5 computers (of course he meant large mainframes). The internet was confined to a group of scientists for many years. We cannot predict the speed, the direction, or the use of new technologies.

I suspect, however, that the uses of speech technologies will be much more social and human than we think. When people embrace a technology they take it into their social situations. People sign on to the internet to chat with each other, or to sell items as if at a big garage sale. People use their cell phones to call home and talk to children and spouses. Certainly business and commerce will be targeted for speech applications, as they have already been, but if you want to find the point at which the technology is fully embraced, look for places where speech brings us closer to other people.



ISBN 0-471-37545-4
416 pages
April, 2000

Wiley Computer Publishing
Timely. Practical. Reliable.

