Kent Pitman

Location
New England, USA
Title
Philosopher, Technologist, Writer
Bio
I've been using the net in various roles—technical, social, and political—for the last 30 years. I'm disappointed that most forums don't pay for good writing and I'm ever in search of forums that do. (I've not seen any Tippem money, that's for sure.) And I worry some that our posting here for free could one day put paid writers in Closed Salon out of work. See my personal home page for more about me.

Editor’s Pick
FEBRUARY 15, 2011 11:10AM

Computers in Jeopardy!

Monday through Wednesday, Feb 14-16, 2011, the evening television show Jeopardy! is airing a man versus machine competition called “The IBM Challenge” in which a computer (or really a set of computers) named “Watson” is challenging two all-time Jeopardy! champions in an exhibition match.

I took basic classes in artificial intelligence, computational linguistics, and knowledge representation at MIT some years ago, so I watched with interest to see how the state of the art might have advanced. I have to admit I was mildly disappointed.

Now, mind you, it's always hard to know what is going on inside. Absent detailed technical knowledge of how something works inside, people are forced to form opinions on the basis of what they see and how they would do a thing. That's sometimes a good thing when dealing with a person or animal that is similar in nature to oneself, but it's a less good idea when dealing with a computer, which is really an alien entity.

The Turing Test

A famous mathematician named Alan Turing created a concept called the Turing Test, which asks whether it's possible to build a machine to which you could pass messages remotely through some sort of interface that hid its identity as a machine and have the replies be indistinguishable from those you might get from a person talking through the same interface. In at least some ways, this IBM Challenge on Jeopardy! is an example of a Turing Test played out.

But the late Joseph Weizenbaum, author of the book Computer Power and Human Reason, argued that you could probably build a machine that would confuse people into thinking they were talking to a person, but that there was still something important in being a person that could not be encoded. As a consequence, he argued, there is a big danger in trusting a machine just because its behavior seems human-like in a few (or even many) examples. And certainly this TV game raised exactly that question.

Depth of Understanding?

As I watched, I was amazed by the degree to which the answers seemed to be heavily tuned toward worrying only about nouns, or noun phrases, as if the game were a kind of internally-generated multiple choice. It seemed to generate a list of options and then to try to raise the probability of some and rule out others. The graphics they offered about its workings told essentially the same story. Maybe their graphics oversimplified for TV, though. That might have biased my expectations, so take all of this with a grain of salt. Even so, the errors it made were consistent with what I would have predicted from this superficial guess about what it was doing.
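To make that guess concrete, here is a minimal Python sketch of an "internally-generated multiple choice": a list of candidate responses whose scores are nudged up or down, with the top one offered only if it clears a confidence threshold. This is only an illustration of my guess, not Watson's actual design. The clue, the candidates, the numbers, and every name in the code are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One possible response, with a running score and the reasons behind it."""
    text: str
    score: float = 0.5
    notes: list = field(default_factory=list)


def adjust(candidate, delta, reason):
    """Raise or lower a candidate's score, clamped to the range [0, 1]."""
    candidate.score = min(1.0, max(0.0, candidate.score + delta))
    candidate.notes.append(reason)


def best_response(candidates, threshold=0.6):
    """Return the top-scoring candidate, or None if nothing clears the threshold."""
    top = max(candidates, key=lambda c: c.score)
    return top if top.score >= threshold else None


# Invented candidates for an invented clue: "It is known as the Red Planet."
candidates = [Candidate("Mars"), Candidate("Venus"), Candidate("Jupiter")]
adjust(candidates[0], +0.3, "phrase 'Red Planet' often appears near this name")
adjust(candidates[1], -0.2, "rarely described as red")
adjust(candidates[2], -0.3, "weak association with the clue text")

choice = best_response(candidates)
print(choice.text if choice else "(stay silent)")
```

Notice that nothing in this sketch needs to know what a planet is. The candidates are just strings with numbers attached, which is exactly the worry.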

A human being hearing a sentence like "John went to market." may conjure in their mind a rich set of information. John may be the name of someone the listener knows and so the listener may supply imagery related to that. The listener may imagine a specific vendor at the market or a specific product John was shopping for. But at the same time the listener will know these are just assumptions. Some questions may require using those assumptions, some not. There is lots of information at one's fingertips based on reference and experience.

Although there is research into the area of having computers that grow up like people with experiences like us, most computers do not acquire information experientially. They are given data. And that data is usually specific about some details and not about others. For example, the same sentence “John went to market.” may cause the machine to know that John is a person, or it may not. John is a common name for a person. Maybe the computer knows, maybe not. It might not think this sentence is a lot different than “Object-1 moved to Place-1.” It may think the only difference is spelling. So it may not know the difference between the sentence “Apple went to market.” and “John went to market.” One of these is a metaphor, the other is a physical act. To the machine, John and Apple may be just uninteresting labels, not really names of tangible things.
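A small Python sketch may help show what "just uninteresting labels" means. It assumes a deliberately shallow parser (hypothetical, invented for this illustration) that keeps only the first and last words of the sentence. Whether the machine can tell a person from a company depends entirely on what data it was handed, represented here by a made-up `known_people` set.

```python
from collections import namedtuple

# A bare "something moved somewhere" record with no knowledge attached.
Event = namedtuple("Event", ["subject", "action", "place"])


def parse(sentence):
    """Shallowly reduce '<subject> ... <place>.' to an Event of labels."""
    words = sentence.rstrip(".").split()
    return Event(subject=words[0], action="moved-to", place=words[-1])


def is_person(label, known_people):
    """The machine 'knows' a label is a person only if it was told so."""
    return label in known_people


john = parse("John went to market.")
apple = parse("Apple went to market.")
generic = parse("Object-1 moved to Place-1.")

for event in (john, apple, generic):
    # With no data, all three subjects look alike; with data, only John stands out.
    print(event,
          is_person(event.subject, known_people=set()),
          is_person(event.subject, known_people={"John"}))
```

All three sentences collapse to the same shape. The difference between a metaphor and a physical act appears only if someone supplies the knowledge that makes it visible.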

You could see hints of this on the show when the answer was:

SHE “DIED IN THE CHURCH AND WAS BURIED ALONG WITH HER NAME. NOBODY CAME.”

Watson responded with the question “What is Eleanor Rigby?” It may seem a small matter, but asking “What” rather than “Who” suggests that this phrase was just another name, nothing deeper. It hadn't processed the “She” in the question and merged it with the answer, which any human would do. Watson really just didn't “care” what an “Eleanor Rigby” was. Watson isn't thinking like a person and by this alone, in my opinion, does not come anywhere close to passing the Turing Test.

Consider this answer as well:

“BANG BANG” HIS “SILVER HAMMER CAME DOWN UPON HER HEAD.”

Watson responded, “What is Maxwell's silver hammer?” but that was not the correct question. The correct question was “Who was Maxwell?” With a human, we give them the benefit of the doubt because we know what they mean. The entire point of this competition is to make sure Watson shows us it knows what is going on, and the benefit of the doubt must not be given because that suggests we may trust that common sense applies. In fact, at the present state of the art, computers are well-known for not having common sense. That's part of the point. We cannot know if Watson even knew there was a Maxwell in the scene at all! It may have thought “Maxwell's silver hammer” an uninteresting label just like it seemed to think of “Eleanor Rigby.” There is no way to know. The second and third choices did not show Maxwell as options it was considering. They were, instead, “FRANK SINATRA” and “Brown.”

Lack of Voice Recognition

Another detail of this competition that bothered me a great deal was that Watson was working from written transcripts, while people were working from voice. This may seem a small matter, but there is plenty of voice recognition software in the world. It makes mistakes, but tough. The contestants make mistakes, too. They hear incorrectly sometimes, and are penalized for it. And most importantly, natural language processing from auditory input is expensive in time: it may take an additional second or fraction of a second even on a fast computer. That might have affected outcomes. And certainly the possibility of misunderstanding the question affects outcomes. It was simply not a fair fight in that regard, and IBM should be a bit ashamed for not working from voice input rather than text input.

IBM actually sells voice recognition software. This should have been a chance to showcase it. I was surprised it was not. I have Dragon NaturallySpeaking (a competitor's product) on my computer and it's quite amazing how accurate it can be. It would probably have done quite well in this competition. Maybe IBM should have swallowed its pride and tried using that instead if their own software wasn't up to this task, but it simply was not fair to insist that Watson receive written versions of the questions.

How This Affects Us

One final point, stepping back. The reason IBM invested this much money was probably not to win the Jeopardy! pot. It wouldn't break even. They are surely after technology that will be useful in search engines. We already rely very heavily on so-called “full text search” and this probably heralds a new generation of search based on actually answering questions rather than finding text. People will like that because it saves them time. We're a lazy bunch, we humans. But every time we save ourselves time, we yield some of ourselves to the vagaries of the technology.

As Weizenbaum would no doubt point out, we should not be too quick to trust. It would be an interesting variation on this game to require not only answers but rationales. Watson might even have good explanations to offer, sometimes. I'd love to have seen the rationale for “FRANK SINATRA” in the question above. My point is not to say it doesn't have an explanation, but seeing that explanation may tell us something very important about whether to trust the answer. Perhaps it would tell us how thorough the search was, or about how valid the inference techniques employed really were. We should not just trust because “a computer told us.” It might be reasoning by what in a trial would be called “circumstantial evidence” or it might be using really solid logic. There is no way to tell without asking to see the reasoning. Without that, for all we can see, it's just playing the probabilities and bluffing. I hope the future of humanity's decision-making is based on firmer stuff.


Comments

A terrific analysis of the game play from last night, Kent. My personal favorite answer was when WATSON didn't have a clue that Ken had answered incorrectly and offered the same answer, in the form of a question of course. We humans may be a lazy bunch but we also like to see the computer lose.
Coyote, yes, that's a thing where it would have benefited from voice recognition. It probably didn't know he had said anything.
Where is HAL when you need him??:)
rated with hugs
Very good article. You raise many points that I will keep in mind when I watch tonight's show. And I, too, wonder why they did not use voice recognition software. Doesn't seem like a very adequate test of the computer if it has to rely on text input.
Is Jeopardy this close to going off of the air? I want to watch people vs. people - that proves intelligence.
I watched at MIT where one of the principal architects of the system gave an hour-long lecture beforehand to a packed lecture hall on the details of how the system works. It's considerably more sophisticated than simply matching noun phrases, though ultimately, linguistic, textual analysis is at the root of everything it does. It looks at many different dimensions, learns metaphors and associations and multiple meanings (a politician RUNNING for office is associated with a different semantic net than a sprinter RUNNING for a gold medal).

It's done with enough modularity that they can plug in semantic capabilities (e.g. they wrote a module to take puns into account, which was separate from other modules). The modules then get tuned or tune themselves through experience as to how significant each module is. For example, it may be that in questions involving proper names recognized as presidents, doing a temporal analysis (looking for dates and events that correspond to the President's life or term in office) may be more heavily weighted than a rhyming analysis. But if the category is "rhyme time," the rhyming analysis gets weighted more heavily.

Since I'm writing from memory, I can't vouch for the details of what I just said, but I was reasonably impressed with the system and its ability to comb through several terabytes of data and give an answer in under two seconds.
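As a rough illustration of the weighting described above, here is a minimal Python sketch in which each module scores each candidate and the category decides how much each module counts. It is an assumption-laden toy, not the system presented in the lecture. The module names, weights, scores, and candidate names are all invented.

```python
# Invented weights: how much each module counts by default...
DEFAULT_WEIGHTS = {"temporal": 1.0, "rhyme": 0.2, "text_match": 1.0}

# ...and how the weighting shifts for a category known to need rhymes.
CATEGORY_WEIGHTS = {
    "rhyme time": {"temporal": 0.2, "rhyme": 2.0, "text_match": 1.0},
}


def weights_for(category):
    """Look up category-specific weights, falling back to the defaults."""
    return CATEGORY_WEIGHTS.get(category.lower(), DEFAULT_WEIGHTS)


def combine(module_scores, weights):
    """Weighted average of the per-module scores for one candidate."""
    total_weight = sum(weights[name] for name in module_scores)
    weighted_sum = sum(weights[name] * score for name, score in module_scores.items())
    return weighted_sum / total_weight


def rank(candidates, category):
    """Order candidate names from most to least promising for this category."""
    weights = weights_for(category)
    return sorted(candidates, key=lambda name: combine(candidates[name], weights), reverse=True)


# Invented per-module scores for two hypothetical candidates.
candidates = {
    "answer A": {"temporal": 0.9, "rhyme": 0.1, "text_match": 0.5},
    "answer B": {"temporal": 0.2, "rhyme": 0.9, "text_match": 0.5},
}

print(rank(candidates, "Presidents"))
print(rank(candidates, "Rhyme Time"))
```

The same two candidates swap places depending on the category, which is the effect described: the modules stay fixed and only their relative significance is tuned.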
Linda, interestingly, HAL was supposed to be around almost two decades ago. Yeah, he's overdue.

CZPhoenix, I'm glad you'll have some thoughts to ponder tonight.

Razzle, people vs. people is certainly a different experience entirely, yes. :)

Stever, this is really great commentary. Thanks for adding it. Of course, I tried to write the piece to anticipate someone might fill gaps in this way, even partly contradicting my guesses. The whole issue with a Turing test is that you guess what's inside based on behavior though, not based on secret inside knowledge about what is really programmed there. So I stand by the philosophical concerns even as I acknowledge I got some of the guesses wrong. I'm also mildly curious how they determine the skewing of the probabilities—whether “rhyme time” is a known category or whether general-purpose knowledge is used to infer the need for a bias in determining the meta-level weightings. Do you know?
The problem with HAL was that he was designed with such rigidity that he eventually was a detriment to every human around him/it. Now, if Asimov's Three Laws were programmed in, I wonder how that would play out. If computers are at risk for completely replacing the need for humans, they technically would be violating the First Law. Why? Because as Viktor Frankl stated, we are driven by finding a point to what we do, i.e., a " . . . Search for Meaning". If we don't have that, we could be emotionally lost to the point of destruction. Would the computers eat themselves to save the human race from obsoletion/extinction?

I am dubious as to the purpose of this exercise. If Watson wins, does that prove computers are smarter than humans? If it loses, does that mean we have to improve our technology so that it does win? If so, we get caught in a circular argument. Perhaps the computers are just tools to reach our actual potential since technology can only be built while standing on the proverbial shoulders of giants. There is also a possibility that we are mere facilitators, i.e., designers, of something that is bigger (faster, stronger, smarter) than all of us.

I explore my inner Luddite when I consider this.
The generalized notion of a Turing Test is over-rated anyway. The fundamental flaw in the idea is that our ability to judge is dependent on the quality of our ability to observe.

Over a low-bandwidth, noisy channel, it will be very hard to tell. It will be a much harder problem over a full sensory channel -- that is, the ability to see, hear, touch, smell -- I will omit "taste" in the interests of good taste.

But even limited to written transcripts, I think it may take longer and longer for us to tell, but eventually, we'll always be able to tell.

It's not that it couldn't be done. It's that there is little REASON to do it. The technical difficulties pale in comparison to the lack of economic incentive. There's plenty of reason to "perform as well as a human", or even "be indistinguishable under limited conditions", but never "to be indistinguishable". In the competition with humans to be human, since humans are already human, computers lose automatically.

In the truly long-term scheme of things, the whole idea will seem silly. Why should a computer perform as poorly at tasks as humans?
Very interesting post. I'm following the show with interest, but what still puzzles me is your argument that if Watson "hadn't processed the “She” in the question and merged it with the answer," then how did it come up with the question "What is Eleanor Rigby"? Eleanor Rigby is not just another name, it belongs to a very specific and fictitious context contained within the answer - unlike a clue "He won two purple hearts in 1907" (just an example). I was also noticing the frustration on Ken Jennings's face as Watson kept buzzing in faster than him. Looking forward to tonight's game.
What an informative piece! From what I have heard or read, Watson is acting "as if the game were a kind of internally-generated multiple choice." W. searches its data bases for possible matches, ranks them for probability of correctness (a mystery to me) and gives you the one with the top rating. Which can make for some really funny misses. It seems that W. doesn't really "know" anything, but simply engages in a series of successive approximations.

But here's the deal, if, as you say, W. is really a series of computers, then why can't the humans form a committee to go up against them?

Oh...wait, a committee...like at work...never mind.
As a former IBM-er (secret handshakes and "Ever Onward" choruses to all you others!), I am happy to see the publicity generated by Watson, but I'm waiting for them to develop Sherlock.
Purple, yes, the three laws would be interesting though I'm not 100% sure you have to go even as far as the future of humanity to find such relevance. If the Laws were in play, I might say to the computer during the opening “Hello, Watson. If I win this match, I will donate some of the money to medical charities, where it will save lives. I hope you won't try to beat me, lest those lives be lost.” Nothing wrong with indulging that inner Luddite. To succeed we must have both the curiosity and drive needed to move forward and the wisdom not to move too fast. The past does not always inform the future, but neither is it of no value.

Bob, when you refer to it taking longer and longer to tell, I can't help thinking of the scene in Blade Runner where the Voight-Kampff machine is used, with increasing difficulty for the more sophisticated replicants. As to your point about replacing humans, Jaron Lanier remarks in his book You Are Not a Gadget that any real attempt at achieving the singularity is nihilism.

Jane, thanks! Glad you had a good visit.
I'm looking to see if there's a button bias -- Watson consistently being accurate on ringing in first as an unfair electronic advantage.
As far as Watson hearing instead of reading the answer, my personal experience is it's easier to answer the ones that require an "educated guess" when I'm reading instead of only listening. I'm like that with spelling also, much better when writing it out than verbalizing.

They need to program in a set of anecdotes so Watson will have something to say during the single Jeopardy break.
Maybe he knew Hal, and has an interesting story to relate.
The talk did not use words like "intelligence" and "understanding". It's just a program to do as well as possible on Jeopardy. Many of the techniques it uses are simple. It has many modules that take different approaches, and each has metadata to decide how "sure" each module is that its answer is right. They tried lots of modules and had a big test suite, and just kept tweaking and getting better results. Yes, there pretty much ARE rationales. Of course, when televising a game, one does not have pauses for technical analysis. Once the big competition is over, then they might publish a lot more. At the MIT talk he asked people not to take notes, and I think it was more of a "press embargo" than anything else.

I also saw the MIT talk. The software is a lot more sophisticated than you are giving it credit for. On the other hand, it is not using sophisticated computational linguistics, as you say. It has quite a lot of specialization for Jeopardy; for example, there are modules that know quite a lot about puns. The "Eleanor Rigby" question was almost certainly just a text search with very little parsing or understanding; it knows how to turn the "answer" into a pattern to match. The fact that this isn't AI is not important to the IBM team. They are not making any claims about AI.

Yes, IBM did not do it solely to win the game. There were two reasons, he said. First, a "grand challenge" is a great way to motivate work. Winning Jeopardy is easy to explain, and quite "crisp" rather than being a vague goal. It can be measured in a simple way. Second, IBM wants to sell products, and this is a demonstration of the hardware and the kind of software that can be built on it.

In the original rules of the game with the previous host (Art Fleming), the players were allowed to press the button before Art was finished asking the question (answer). Watson would have no chance of winning under those rules. The extra time provided by waiting for the question to end helps Watson substantially.

I don't remember his discussing speech recognition. I'm sure that IBM would rather win than showcase its speech technology. He did talk about the question of what constitutes "pressing the button", and how they negotiated that with the Jeopardy people, who, by the way, were very pleased and excited about this project from day one.
I'm watching Jeopardy now, and I think there's an obvious button bias. I know the two champs know the questions to most answers so far, but they never seem to beat Watson when it intends to respond.
To me, this is far less about AI and more about a quick solenoid, as far as the competition goes.
Death to computers long live humans!
Fusun, I agree, they should have tuned the speed of how fast the thing was allowed to come in by time-testing how fast a human can press the button. I'm sure data must exist.

Steve, if they allowed multiple people it would be more like Family Feud than Jeopardy, I suspect. And look how slow and unreliable that consensus step is there! But a fun idea. I wonder how IBM would do at Family Feud. :)

Paul H., are you being humorous or is there such a project in the works? Can you tell us its scope? Anyway, I'm glad you're able to just sit back and take enjoyment and perhaps pride in the fun.

Paul J., good idea about giving him a personality. You know, I was told in class long ago that they had computer diagnosis tools a long time ago that used to only ask relevant questions and they found that if the thing didn't also ask social pleasantries starting up, people didn't trust it. Might be something to the personal touch.
I take it you and Joseph Weizenbaum "something important in being a person that could not be encoded" are not believers in The Singularity.
Dan, thanks for all the extra color. Is Watson starting thinking at the start of the question and is everyone locked out until after the question is done? If so, that would be hugely unfair giving Watson a virtual lock on the precision needed to be first to ring in. It'd be interesting to know the precise rules. If there is not at least a little delay between the end of the lockout time and the time Watson is allowed to respond, then its precision at measuring when it can reliably start ringing in will be the deciding factor.

Paul J., “far less about AI and more about a quick solenoid”—precisely.

Caracalla, thanks for the pointer back to your earlier post “Are You Smarter Than The Smartest Computer?”

Don, I'm not sure I feel that strongly about the computers, but having humans live a while would be good.
Tom, I believe in the theoretical possibility of the singularity, just not the ethical appropriateness of allowing it to arrive without considerable care. As to the timing of it, the numbers may be overhyped. I'm not sure if Joe thought that it could be achieved or not; I had the impression he did not, but I'm not sure that's the relevant detail. What I'm pretty sure is the relevant detail, and what I'm pretty sure he did believe, is that before the issue came up we're more likely to reach the point where we improperly thought we were at the singularity and we prematurely trusted something that was not really smart in the way we were expecting. Whether that means it was sociopathic in ways we didn't expect (consider The Forbin Project) or naive in ways we didn't expect (consider The Squire of Gothos in Star Trek's original series, even though it wasn't a computer) or some other form of failure, I'm not sure one could say. Remember that even a human intelligence goes first through adolescence, where it has knowledge ahead of wisdom. If we end up with a singular adolescence, we are in deep doo-doo. This is why Jaron Lanier thinks it's nihilism. I should add that I've made the case we're already there with emulations of what we should expect with corporate people, which operate as legal sociopaths (see my Fiduciary Duty vs the Three Laws of Robotics and Teetering on the Brink of Moral Bankruptcy, though I probably owe another blog that makes it clear why this relates so directly to the singularity).
utterly fascinating - the experiment, the s/w development, the contest, your post, the comments.
Femme, thanks for stopping by. Glad you got so much out of it.
I understand after reading your article why I get so frustrated in dealing with computer queues online because they don't think or respond as we do and cannot handle something outside the ordinary. Great article.
Bernadine, this is related to a point that I've often made about simplicity/elegance. Simplicity is often equated with mathematical conciseness—that is, something that can be compactly expressed by math is thought simpler by some. But I think since the human brain is not a “math engine” that a possibly fairer method of simplicity is “having a description that is least distant from how humans conceive things.” This leads to a very different set of choices.

The most obvious concrete example is that some say languages where each word has only one meaning and where there is no ambiguity of words that must be resolved by context are better than languages with ambiguity. Yet nearly all of the very many human languages tolerate and embrace ambiguity. A quick run to the dictionary will tell you that humans aggressively seek out multiple meanings of words. This tells you something important about the way people think about things. And so the simplest languages tend to be those that are easily used, not those that are easily learned. Or so I claim. I further claim this is because languages are learned only once and spoken many times, so the time to learn is not as important as the time to speak.

Computers often have ways of doing things that are simple in some ways but that do not satisfy what we would call “common sense” because they fight with the human way of doing things rather than embracing them.

In fairness, the reason we use computers is because they are good at things we are not. One does not go to IMDB to talk to a person who's as forgetful about movies as we are. We want a reliable answer. But that doesn't mean we want the answer presented in computerese—we still want the answer presented in human-readable form. It's a tricky but important balance to strike.
Great analysis. I've been feeling guilty that I'm rooting AGAINST Watson. The machine's victory should be a sign of human accomplishment, but I see it as a "cheat" for some reason, and I'm hoping human brains still manage to beat a computer, no matter how advanced.
After trying to outfox the Fedex computer today, I can only hope they'll extend a job offer to Watson following the show. Good thing they still have that hit zero for a human option.

One of the commenters mentioned how Watson is just some spiffy program. Maybe so but it probably lays the groundwork for a talking Wiki. Or some kind of dial up an imaginary friend who can chat about anything, convincingly feigns an interest in you, and has no ego. I bet that day is not so far off.
Nick, one of the things about society is that it presumes that because we can build something that we must, that because we can put people out of work by computers, that we must. I don't think that follows. I'm not sure where that comes from. Is it a fundamental precept of capitalism or perhaps an emergent behavior of it?

Abrawang, I assume you've seen Eliza, but if not you should read about it. Yeah, chatbots have been around a while but will get more refined over time. Some of them are simplistic but some are quite elaborate.
My Sherlock remark was meant as a joke because I doubt computers will ever be intuitive in our lifetimes. There are short-lived promises every once in a while, like neural nets etc., but in the end it is just a machine. That said, I wouldn't be surprised if the breakthrough came about at the Thomas J. Watson Center, possibly the last pure research facility left in the US. They've done a lot of magic.
Paul, you're making me tear up when you talk about research falling off on the same week that there is all this political push to cut budgets in education, etc. Do we not understand the difference between investment and mere spending any more in this country?
Kent, I'd forgotten about Eliza (named after Doolittle?) so thanks for the link. I read about it at university back in the 70s. I think they did one that mimicked Goldwater too. The newer version must be much more versatile but I reckon it will take another generation before they ace the Turing Test. Then when they combine the personality software with holograms or whatever, it'll be like that old Twilight Zone episode.
Abrawang, yes, named after Eliza Doolittle. You can see an approximation to the original's behavior by clicking here. A transcript of my personal contribution to the large space of Eliza variants is found by clicking here; I have paper text with the source code for my program in various forms (all very large), but alas I think none are online anywhere.
Jeopardy is the only daytime TV show I watch and, I record it to view it at my own pace and ff past the commercials.
In actuality, it's an exercise in triviality memory.
I believe that winners rather than being of superior intellect, have more active and accessible memories.

That said, I read your line here:
"A human being hearing a sentence like "John went to market." may conjure in their mind a rich set of information."

If a person happened to be dyslexic, they might see that as, "Mark went to the john".lol
It just does what it is programmed to do and when I saw the number of question marks after the Toronto response I couldn’t help but wonder if it was programmed to do that. It would be better if they presented in a different setting other than Jeopardy and I could be more certain about the sincerity. Nothing the Mass Media does can be trusted anymore.
I was writing artificial intelligence programs back in the mid 1980s, and I never did understand what all the hype was about.

No matter how you try to fancy it up, an AI is nothing more than a database with a b-trieve data sorting mechanism.

Human beings and computers share one characteristic: we can only choose between two options at any given moment in time.

When you go to dinner, it may seem that you are choosing between numerous alternatives as you choose your dinner, but it's always a choice between two options. Meat or Vegetarian? Beef or chicken? Cajun or blackened?

Computers do the same thing. Choices are always made between two options and then the surviving choice is measured against the next option.

Moved to the internet, this process simply regards the entire internet as a database, which is all it is, and sorts through it by sorting data in groups, and then as sets within a group and then as items within a set.

The problem that people who believe in the singularity face is that computers have no feelings, and it's the proprioceptive feelings that give us a sense of rightness or wrongness.

These feelings originate in the brain, out of consciousness, and are registered as bodily reactions to these unconscious thoughts.

(Speculate for a moment on how a computer can experience unconscious thought.)

The startling conclusion that I come to is that computers are naturally sociopathic because they are incapable of interpolating the feelings of others on the basis of their understanding of what those feelings feel like because they can't feel them.

This is true of the true psychopath....and the computer.