IBM Watson's team Q&A on reddit (reddit.com)
85 points by hootx on Feb 23, 2011 | 28 comments



Alas, my somewhat-skeptical question came in late and got little support:

http://www.reddit.com/r/IAmA/comments/fnfg3/by_request_we_ar...

What determined the use of exactly 10 racks of 9 maxed-out (32-core, 512GB RAM) 4U Power750 servers? For example, would Watson have done better with more hardware? Or could it have made do with far less, after all the bulk pre-processing of, and training on, source material was finished?

(My intuitions about the necessary amount of reference data and topical associations – written up at http://redd.it/fnixm – made me think way less hardware should have been required, at least at the very end during the match.)


Tony Pearson wrote a bunch of posts about Watson: https://www.ibm.com/developerworks/mydeveloperworks/blogs/In... They have a lot of details, including some performance figures:

    Hardware                     Cores   Time per question
    Single core                      1   2 hours
    Single IBM Power750 server      32   < 4 minutes
    Single rack (10 servers)       320   < 30 seconds
    IBM Watson (90 servers)      2,880   < 3 seconds
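
Those figures scale almost linearly with core count. A quick sanity check (a minimal sketch; the quoted times are upper bounds, so the efficiencies are only indicative):

    # Rough scaling check on the quoted figures. The times are the
    # upper bounds above, so these efficiencies are only indicative.
    configs = [
        ("Single core",                 1, 2 * 3600),  # 2 hours
        ("Single Power750 server",     32, 4 * 60),    # < 4 minutes
        ("Single rack (10 servers)",  320, 30),        # < 30 seconds
        ("IBM Watson (90 servers)",  2880, 3),         # < 3 seconds
    ]

    base_cores, base_time = configs[0][1], configs[0][2]
    for name, cores, seconds in configs:
        speedup = base_time / seconds
        efficiency = speedup / (cores / base_cores)
        print(f"{name:27} {cores:5} cores: ~{speedup:6.0f}x speedup, "
              f"~{efficiency:.0%} parallel efficiency")

Even at these upper bounds the efficiency stays in the 75-95% range, which is what you'd expect if the candidate-generation and evidence-scoring work parallelizes well across nodes.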

Another post, https://www.ibm.com/developerworks/mydeveloperworks/blogs/In... , had details on the data/storage/RAM:

When Watson is booted up, the 15TB of total RAM are loaded up, and thereafter the DeepQA processing is all done from memory. According to IBM Research, "The actual size of the data (analyzed and indexed text, knowledge bases, etc.) used for candidate answer generation and evidence evaluation is under 1TB." For performance reasons, various subsets of the data are replicated in RAM on different functional groups of cluster nodes. The entire system is self-contained; Watson is NOT going to the internet searching for answers.
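
A toy illustration of that replication scheme (schematic only; the subset and node-group names are invented, not IBM's actual configuration):

    # Schematic only: subsets of the <1TB working data are replicated
    # into the RAM of functional groups of nodes, so every lookup during
    # a question is a local in-memory operation (no disk, no network).
    subsets = {
        "text_index":      ["encyclopedias", "newswire", "literary works"],
        "knowledge_bases": ["WordNet", "Yago", "DBpedia"],
    }
    node_groups = {  # which group of cluster nodes holds which subset
        "candidate_generation": "text_index",
        "evidence_scoring":     "text_index",      # replicated copy
        "type_coercion":        "knowledge_bases",
    }

    # At "boot", each node in a group loads a full copy of its subset.
    ram = {group: list(subsets[name]) for group, name in node_groups.items()}
    print(ram["evidence_scoring"])  # served from local RAM thereafter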


I wonder if this is marketing.

IBM makes BIG HUGE MASSIVE (tm) server clusters that have lots of blinkenlights and require lots of power and are so crazy and huge and awesome that the people working at IBM must be hyper-geniuses!

vs

Watson runs on a laptop.


Watson can run on a laptop.

It just can't get the answers fast enough to win Jeopardy. https://www.ibm.com/developerworks/mydeveloperworks/blogs/In...


I can't see how that would be marketing. Most people would be more impressed if Watson could run on a laptop, today. I know I would.


More impressed? Yes. More willing to spend millions on hardware and software? No.


And how would they look if, as the original poster surmised, it were possible for a competitor to produce something similar to Watson on today's laptop? IBM would've spent millions of dollars making themselves look very silly.

It's a similar situation to moon-landing skepticism. If the landing hadn't happened, it would have been trivial for the Soviet Union to disgrace and discredit the achievement by faking their own.


No other team was given a crack at the Jeopardy spotlight. IBM didn't beat a bunch of other teams for the slot in the human/machine faceoff; it was all orchestrated for them. So give others some time to emerge.

I suspect in a year or two – perhaps sooner, if an organized open competition with real cash prizes is launched – we'll have a better idea of what a lean team could do in the trivia domain. It may not be reduced to a single 2011-equivalent laptop... but it might be a single 2011-equivalent maxed-out server (rather than a Watson-like server room).


I can see IBM charging customers millions a year to keep their Watson database up to date, even if it ends up running on an x86 Linux box. When they adapt it for general Q&A use, Watson will probably be as useful as Wolfram Alpha wishes it was.


Most programmers would, but any non-technical people would think that you could hire a programmer to do the same thing for $1000.


I also feel that they sort of danced around the buzzing-in question. Obviously Watson has to calculate and decide on his answer, but there is no denying that he was very fast on the buzzer in the game.


From the Q&A with Ken Jennings: http://live.washingtonpost.com/jeopardy-ken-jennings.html?hp...

    Q: Seemed to me, for many of the questions, that the computer was just
    better at buzzing in. Does Watson have an unfair advantage for timing the
    buzz-in?

    A: As Jeopardy devotees know, if you're trying to win on the show, the buzzer is
    all. On any given night, nearly all the contestants know nearly all the
    answers, so it's just a matter of who masters buzzer rhythm the best.

    Watson does have a big advantage in this regard, since it can knock out a
    microsecond-precise buzz every single time with little or no variation. Human
    reflexes can't compete with computer circuits in this regard. But I wouldn't
    call this unfair...precise timing just happens to be one thing computers are
    better at than we humans. It's not like I think Watson should try buzzing in
    more erratically just to give homo sapiens a chance.


Seems to me they should have chosen harder questions.

They should try to pick questions such that the contestants only know about 1/3 of them.

Then let's see how the computer does.

This is (should be) a contest of knowledge, not buzzing.


But that's true for normal Jeopardy too. The best buzzer person (Ken Jennings or Brad Rutter) wins because of their reflexes and timing. Watson just took that edge off the table and flipped it back at them.


I meant change it for regular Jeopardy as well.


That's good for an intellectual challenge, but for game shows to work, the questions need to be ones that players at home, the viewers who earn them their advertising dollars, can answer. They want us to be sitting on the couch shouting out the answers. If we're sitting there clueless, it's a much shorter distance to changing the channel or taking it off the DVR.


Homo sapiens didn't have a chance in the first place.


Agreed. I don't want them to offset Watson's raw potential, but why not just acknowledge that the processing power put him ahead on the buzzer?


Agreed. I wonder: of the cases where both the humans and Watson had an answer ahead of time and were waiting for the buzzer light, what percentage did Watson win?

If it's much higher than 50%, I would say that a significant portion of Watson's advantage derives not from the language processing but from the response-time / mechanical advantage.
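
If someone collected those head-to-head buzzes, the "much higher than 50%" question is a one-sided binomial test. A minimal sketch with made-up counts (n and k here are purely hypothetical):

    from math import comb

    # Hypothetical: of n clues where both Watson and a human were ready
    # and waiting for the light, Watson won the buzzer race k times.
    n, k = 40, 34

    # One-sided binomial test: probability of k or more wins out of n
    # if buzzer races were really a 50/50 coin flip.
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    print(f"Watson won {k}/{n}; P(>= {k} wins | fair races) = {p_value:.1e}")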


While, yes, human reaction times are nowhere near as good as those achievable with computers, I think the physical pressing of the buzzer was enough to level the playing field. I can't remember where I read it, but it seems that Watson had at most a 100ms advantage if it was confident of its answer before the buzzers opened. I also can't think of a consistent way of getting rid of this time gap.


Parts of the game were eliminated to give Watson a chance: audio/video clues, and categories that required extra explanation.

If the exact timing of buzzing in – and not 'first' but 'first after a light goes on' – gave an overwhelming advantage to Watson, that could be eliminated, too. For example, why not just 'first to buzz in'? (That is, no penalty for being early?) Let Watson's parsing of the question text race the humans' sight-reading.

It looked to me like the humans usually won the buzzer on short questions, which strongly suggests it was the time spent reading the long questions that let Watson pull even with the humans in knowledge, and then it crushed them on buzzing-after-the-light precision.


You can program a computer to wait until it has high confidence before buzzing in, but would you trust a human to wait to buzz in until he has already chosen his answer? In many situations, a contestant will buzz in knowing they're likely to be able to determine the answer in the five seconds they're given after being called upon. You'd be trading one player's perceived advantage for another.


Question 3 was the most interesting, but the data on parsing is remarkably incomplete. As far as I can tell, we have only lists of possible ways to break down the data, without any explanation of how or why one possible way is preferred to another.

Case in point (1): how it decides to treat "Treasure Island" as a proper noun. We see only "modifies(Treasure, Island)" -- indicating that it treats "treasure" as an adjective modifying "island" -- then suddenly in the semantic assumption phase they are treated as a compound.

Case in point (2): we are given:

        island(Treasure Island)

        location(Treasure Island)

        resort(Treasure Island)

        book(Treasure Island)

        movie(Treasure Island)

I assume what he is giving us is method names written in Java, with "Treasure Island" as the single argument, that return a value indicating the likelihood that "Treasure Island" is what the method name refers to. This is extraordinarily interesting. However, it is not at all clear which methods are chosen and why, whether they are run in some sort of sequence or simultaneously, etc.

Case in point (3): "Builds different semantic queries based on phrases, keywords and semantic assumptions." This is very vague, but it indicates that Watson generates a set of queries which it runs against its own internal search engine, ranking answers presumably based on the quality of the initial search and the confidence of the answer. It would be very cool to have an example.

All in all, it whets the appetite but leaves one wishing for more hearty fare (or a job at IBM!).


A while ago I submitted a link to a blog post (http://bit.ly/igJeRB) I wrote that goes into the system in a bit more depth, based on a paper IBM published - there's a link to the (open access) paper there too.

For your cases, from what I got from reading papers:

(1) You could spend two weeks just reading papers on noun-compound semantics. Try a Google Scholar search to get an idea of the volume of research. A simple technique to test how idiomatic a phrase is would be a Bayesian-type test: how many times "Treasure Island" occurs in a corpus, divided by how many times "Treasure X" and "X Island" occur. In this case the capitalization probably cues it to look up Treasure Island in Freebase. Interesting thought, actually - do the contestants also get the question as text? I think they do, so they get capitalization.
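
That ratio is essentially pointwise mutual information without the log. A minimal sketch of a PMI-style version, with invented corpus counts (none of these numbers are from a real corpus):

    from math import log2

    # Invented counts for illustration only.
    total_bigrams = 100_000_000
    n_pair        = 4_200     # "Treasure Island"
    n_treasure_x  = 9_000     # "Treasure <anything>"
    n_x_island    = 150_000   # "<anything> Island"

    # PMI: how much more often the words co-occur than independence
    # predicts. A high score suggests an idiomatic compound ("Treasure
    # Island" the title) rather than adjective + noun.
    p_pair = n_pair / total_bigrams
    p_x    = n_treasure_x / total_bigrams
    p_y    = n_x_island / total_bigrams
    pmi = log2(p_pair / (p_x * p_y))
    print(f"PMI(Treasure, Island) = {pmi:.1f} bits")  # ~8.3 with these counts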

(2) I'd be pretty sure these are not Java methods; I'd say they are logical predicates representing the fact that 'Treasure Island' is returned as being a member of the set of things indicated by the predicate, as returned either by syntactic processing (island) or from the knowledge bases (WordNet, Yago, Freebase, DBpedia).

(3) There isn't a worked example in the paper, but my reading is that this is basically Watson's way of figuring out what queries to type into its unstructured text corpus (they have a corpus of web snippets indexed with Lucene).
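
Purely speculative on my part, but "builds different semantic queries" might look something like generating one query string per semantic assumption and firing each at the index (every name and query shape here is invented):

    # Invented example: one query per semantic assumption about what
    # "Treasure Island" is, each run against the web-snippet index.
    phrase      = '"Treasure Island"'
    keywords    = ["novel", "author", "1883"]
    assumptions = ["book", "movie", "resort"]  # from the predicate list

    queries = [" ".join([phrase, *keywords, typ]) for typ in assumptions]
    queries.append(" ".join(keywords))  # fallback: keywords alone

    for q in queries:
        print(q)  # each string would go to the search engine; hit quality
                  # would presumably feed back into answer confidence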



> I assume what he is giving us is method names written in Java with "Treasure Island" as the single argument that return a value indicating the likelihood that "Treasure Island" is what the method name refers to.

The sample queries struck me as Prolog rather than Java, even before they mentioned it. Using Prolog would allow testing the combination of various propositions in addition to the individual values. This would imply that they derive a list of possible assertions, then try combinations of those assertions and see what values of X will match those assertions. How they avoid combinatorial explosion when testing all those assertion combinations, I have no idea, but it must involve ranking and pruning.
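
The ranking-and-pruning guess seems right; one standard trick is a beam search that only ever extends the best-scoring partial combinations. A minimal sketch (the predicates and confidence scores are invented, and the joint score is a naive product):

    # Invented assertions with made-up confidence scores.
    assertions = {
        "book(X)": 0.9, "movie(X)": 0.6, "island(X)": 0.5,
        "location(X)": 0.4, "resort(X)": 0.2,
    }

    def score(combo):
        # Naive joint score: product of the individual confidences.
        p = 1.0
        for a in combo:
            p *= assertions[a]
        return p

    BEAM = 3  # keep only this many combinations at each step
    beam = [frozenset()]
    for _ in range(3):  # grow combinations one assertion at a time
        grown = {c | {a} for c in beam for a in assertions if a not in c}
        beam = sorted(grown, key=score, reverse=True)[:BEAM]

    for combo in beam:  # only the survivors ever get tested against data
        print(sorted(combo), f"score={score(combo):.2f}")

This never enumerates all 2^n subsets: each round extends at most BEAM combinations, so the cost stays linear in the number of assertions.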


Some interesting nuggets in here. I had watched the Nova specials on Watson, etc. I would have liked a question about the team, their work stress, and so on, but otherwise a fun read. I especially enjoyed the step-by-step parsing and examination of a question, showing how Watson would work through it.


One could make the argument that since Watson is trained on English information and English Jeopardy! clues, English is Watson's native language. Sure, there's everything from Java down to assembly beneath Watson's understanding of English, but the same goes for native English-speaking humans. English speakers aren't biologically any different from, say, French speakers.



