Mixed Emotions: Humans React to Natural Language Computer
By Applications User Experience on Nov 18, 2011
Anna Wichansky, Senior Director, Applications User Experience
There was a big event in Silicon Valley on Tuesday, November 15. Watson, the natural language computer developed at IBM Watson Research Center in Yorktown Heights, New York, and its inventor and principal research investigator, David Ferrucci, were guests at the Computer History Museum in Mountain View, California for another round of the television game Jeopardy. You may have read about or watched on YouTube how Watson beat Ken Jennings and Brad Rutter, two top Jeopardy competitors, last February. This time, Watson swept the floor with two Silicon Valley high-achievers, one a venture capitalist with a background in math, computer engineering, and physics, and the other a technology and finance writer well-versed in all aspects of culture and humanities.
Watson is the product of the DeepQA research project, which attempts to create an artificially intelligent computing system through advances in natural language processing (NLP), among other technologies. NLP is a computing strategy that seeks to provide answers by processing large amounts of unstructured data contained in multiple large domains of human knowledge. There are several ways to perform NLP, but one way to start is by recognizing key words, then processing contextual cues associated with the keyword concepts so that you get many more “smart” (that is, human-like) deductions, rather than a series of “dumb” matches. Jeopardy questions often require more than key word matching to get the correct answer; typically several pieces of information put together, often from vastly different categories, to come up with a satisfactory word string solution that can be rephrased as a question. Smarter than your average search engine, but is it as smart as a human?
Watson was especially fast at descrambling mixed-up state capital names, and recalling and pairing movie titles where one started and the other ended in the same word (e.g., Billion Dollar Baby Boom, where both titles used the word Baby). David said they had basically removed the variable of how fast Watson hit the buzzer compared to human contestants, but frustration frequently appeared on the faces of the contestants beaten to the punch by Watson. David explained that top Jeopardy winners like Jennings achieved their success with a similar strategy, timing their buzz to the end of the reading of the clue, and “running the board”, being first to respond on about 60% of the clues. Similar results for Watson.
It made sense that Watson would be good at the technical and scientific stuff, so I figured the venture capitalist was toast. But I thought for sure Watson would lose to the writer in categories such as pop culture, wines and foods, and other humanities. Surprisingly, it held its own. I was amazed it could recognize a word definition of a syllogism in the category of philosophy.
So what was the audience reaction to all of this? We started out expecting our formidable human contestants to easily run some of their categories; however, they started off on the wrong foot with the state capitals which Watson could unscramble so efficiently. By the end of the first round, contestants and the audience were feeling a little bit, well, …. deflated. Watson was winning by about $13,000, and the humans had gone into negative dollars. The IBM host said he was going to “slow Watson down a bit,” and the humans came back with respectable scores in Double Jeopardy. This was partially thanks to a very sympathetic audience (and host, also a human) providing “group-think” on many questions, especially baseball ‘s most valuable players, which by the way, couldn’t have been hard because even I knew them. Yes, that’s right, the humans cheated. Since Watson could speak but not hear us (it didn’t have speech recognition capability), it was probably unaware of this. In Final Jeopardy, the single question had to do with law. I was sure Watson would blow this one, but all contestants were able to answer correctly about a copyright law.
In a career devoted to making computers more helpful to people, I think I may have seen how a computer can do too much. I’m not sure I’d want to work side-by-side with a Watson doing my job. Certainly listening and empathy are important traits we humans still have over Watson. While there was great enthusiasm in the packed room of computer scientists and their friends for this standing-room-only show, I think it made several of us uneasy (especially the poor human contestants whose egos were soundly bashed in the first round). This computer system, by the way , only took 4 years to program.
David Ferrucci mentioned several practical uses for Watson, including medical diagnoses and legal strategies.
Are you “the expert” in your job? Imagine NLP computing on an Oracle database. This may be the user interface of the future to enable users to better process big data. How do you think you’d like it?
Postscript: There were three little boys sitting in front of me in the very first row. They looked, how shall I say it, … unimpressed!