One question I have about Watson that I don't recall being mentioned in any videos or articles so far - what sort of interface does Watson receive the questions over? Is Watson performing speech recognition or getting the text of the question via some sort of interface?
This was answered in the Nova special, in passing. The question is fed to Watson as text at the same moment it appears on the display that the contestants can see. Likewise for correct answers. Watson is not performing speech recognition.
Aw this seems rather half-assed to me. I hope at a later time they do another one of these with a new and improved Watson. I really want to see a humanoid robot standing on the podium (connected by wifi to the supercomputer) and using OCR and speech recognition. I also think Watson's voice should be more authoritative. It's like they went to extraordinary effort to make Watson sound unimposing.
It can always be "awesomer". However, calling it half-assed is, I think, a bit harsh. The truly novel aspect of Watson is its ability to answer questions in a truly domain-free way. It is not trained to be an expert in a specific domain, but instead to calculate what it believes to be the best answer from any domain. This is truly novel and amazing. That it can do this with Jeopardy questions, regardless of how it is given those questions, is truly remarkable. Jeopardy questions are oftentimes laden with puns or other literary devices. Watson can understand and answer better than the vast majority of humans on this planet. That it can't currently "see" or "hear" is, IMO, less interesting given the advances it makes in deep QA.
I didn't mean to sound harsh - I understand the significance. I just meant the technology involved in Watson being able to answer such questions is undoubtedly orders of magnitude more complex than the OCR and speech recognition required to process the question in the first place.
They're playing a "human" game after all, and they're not playing like a human if the question is fed into the computer electronically.
You realize that if it were only to recognize speech, other players would have a significant time advantage because they can read the text in probably 1/10th of the time it takes the host to speak the question out loud. "Feeding" Watson the text is just another way of saying Watson "reads" the text displayed on the screen, just like every human is allowed to.
Yeah, OCR (optical character recognition) would probably be used for reading the questions. The speech recognition would be for listening to other people's answers so as not to duplicate wrong answers.
Well, is the text "fed" in a word at a time, or the whole question as a block that takes nanoseconds to parse? That's an unfair advantage if so, because the computer is getting the data in a different format than the other contestants, who are handicapped since they have to preprocess the data in a way the computer doesn't because it has been preprocessed in advance for it.
How are the two situations any different? Just because a program has an entire block of text in memory doesn't mean that it can instantaneously build all the data structures needed to process and make sense of it. We both "read," Watson's "reading" just takes place in its code.
It's very different. The humans have to decode the visual and aural representations before parsing for meaning can commence. The computer should have to do this too, but that step has been done in advance for the computer. This gives the computer a time advantage not because it was faster at computing, but because necessary computing was removed in advance from its task.
I kind of agree, but OCR with a stable camera on the clue board would be extremely fast, reliable, and trivial, so it wouldn't make it much more impressive. Voice recognition (without OCR or a text interface to fall back on) would certainly be impressive, but I doubt it would be very reliable. I may be underestimating the state of the art on voice recognition; it may be very accurate considering Trebek's great diction and the clean studio audio that would be captured.
That's ironic, because "HAL" is a shift of IBM:
[I B M] - [1 1 1] = [H A L].
Apparently Arthur C. Clarke and Stanley Kubrick didn't notice this. "As it happened, IBM had given us a good deal of help, so we were quite embarrassed by this, and would have changed the name had we spotted the coincidence." - A.C.C.
The documentary mentioned that Watson does some speech recognition. It attempts to listen to the competitor's answers so that it doesn't give a wrong answer that was already given.
No, it is fed the correct answer by text once the question is completed. They said "listen" in the documentary, but later, in passing, specified what they meant by that. In particular, they demonstrated the active learning when Watson failed to understand that a category required dates as answers. After four clues it learned that dates were the type of answer required and got the last question correct. Again, the feeding of the correct answers was happening after the question completed and via text, not speech.
You're talking about two different things -- SMrF is saying that if Jennings or Rutter gives an incorrect answer, Watson is able to run speech recognition on what they said, ruling out the answer they gave. This would happen before the question is completed.
So what translates the speech to text? I guess I just assumed they used software but perhaps there is a human doing this. I probably just missed when they mentioned this.
The correct answers are already known. Watson presumably gets the same answer the host has in front of him.
Someone would have to transcribe what the other contestants are saying only if Watson would also get their incorrect answers which doesn’t seem to be the case.
The contestant said "20s", IBM Watson said "1920's" - both were wrong but one can see how Watson didn't recognise that the answer was the same, it wasn't.
What about video clues? I assume there is a transcript, but sometimes there's a visual reference. Perhaps some sort of integration to Tineye.com or advanced video recognition? :-)
I would think as soon as it computes an answer. Of course there is a penalty for getting an answer wrong, so it can't be pressing the button all the time. But the PBS documentary, I think, hinted that it will be sent a signal to know how long to wait before reading an answer aloud.
That said, I wish the engineers had implemented OCR technology in Watson so it could see the TV screen, decipher the text question, and understand it -- instead of directly being fed the question. It would have given Watson a nice little title of being 'complete'.
What he means is that you aren't permitted to buzz in on Jeopardy until the host has completed reading the question and the buzzers are activated. Perhaps Watson is spamming the 'answer' button from the moment he knows the answer, or perhaps they let him cheat by sending him a signal when the buzzers become activated. Given that the process of buzzing in to answer a question is a large part of the strategy of the show though, I hope they aren't letting him cheat in that way.
Edit: Failed to read the comments lower on the page. Looks like Watson plays by the same rules as everyone else, using a physical buzzer which can't be activated until the question is read and the 'answer now' signal light is activated. I know that another restriction on when to answer that Watson has is that he won't buzz in until he's confident in an answer, while human players will sometimes buzz in figuring they will be able to come up with the answer in the couple seconds between buzzing and answering.
I'd been told by a competitor on the show that timing the button press correctly was an underappreciated part of the game. If you press the button before you're allowed to (before Mr. Trebek finishes reading the question) your button is locked out for long enough (maybe a second?) that you're unlikely to get a second chance at that question. So I assume Watson wouldn't be allowed to "spam" the button.
In the case of Deep Blue the movement of the pieces isn't pertinent to the game - if human/robot assistants moved all pieces for both players it wouldn't alter the result [except possibly in some very edge cases].
For IBM Watson playing Jeopardy against humans the buzzer reaction time is exceedingly important. The buzzer is an integral, possibly even central, part of the game.
PBS is basically "owned" by the corporations that advertise now in front of every show. Congress keeps cutting back their funding and viewer donations are a fraction of their operating budget.
In a few tiny ways their low budget is a good thing, i.e. PBS Newshour doesn't have the insane stupid graphics/animations/toys that mainstream news does.
But in almost every other way their low budget is ruining the experience now.
Not in the same way. PBS never plays commercials and actually takes several days off each year to ask people to donate money so that they can continue broadcasting.
It genuinely is supported by individual donations. Showing their programs to a wider audience could only result in more people donating to them.
How much more does it cost them to make their content available worldwide? Couldn’t they upload everything to YouTube and let Google foot the bill? (Which Google would gladly do.)
I don’t know whether more people would donate if they made their content available worldwide but if such a move wouldn’t cost them anything or nearly nothing there certainly wouldn’t be any harm in doing so.
Mmm. They could try it; but the only shows for which it would be easy to try it would be those for which they own the copyright. I don't know whether that would be a representative sample.
Have you watched KQED (PBS affiliate in the Bay Area) recently? It's filled with commercials! HHMI, Subaru, etc. Not just static images; video commercials, just like you see in other channels. I wonder what sets PBS apart now (except for the pledge drive) ?
You still have regional concerns. Some are purchased from the BBC (like some episodes of Horizon or Imagine) or are co-funded (like the Earth series). Those contracts will only allow viewing in the US.
What's amazing to me is NPR, PBS, CBC and BBC haven't gotten together to make their own public domain codecs. It's kind of their job to do this stuff. I guess if you take contributions from "The Bill and Melinda Gates Foundation", you've got to make some compromises.
Furthermore, the insinuation at the end is uncalled for. Calling for cutting funding of public radio/tv is already a political talking point (at least in regards to cutting BBC), so can you imagine the heat they'd catch for trying to develop codecs as well?
Edit: removed PBS from list of cutting funding. Forgot they're donation supported.
The real question I have is according to that episode Watson does not take into account the category only the 'answer' and the already shown questions. It seems _really_ odd that they would ignore this bit of information.
They could at least factor it in selectively, in obvious situations. For example, all categories in jeopardy that contain a quoted string include that string in all of the answers. Wouldn't be too hard to use that information.
edit: furthermore, sometimes it's impossible to narrow down the answer if you don't understand the category. For example, if the category is '"C" you later' or something, there might be like 8 possible synonyms that fit, but only one starts with a "C".
This is a significant disadvantage for some categories. For example one category from Friday was "WARE"-ING with each question containing "ware" in the response.
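To make the quoted-category idea concrete, here is a minimal sketch of using a quoted string in a category title as a filter on candidate responses. It is not how Watson works (the documentary suggests it ignored the category); the function names and the candidate list are made up for illustration, and it only handles the "response contains the quoted string" case, not rules like "starts with C".

```python
import re

def quoted_fragment(category: str):
    """Return the first double-quoted fragment in a category title, or None."""
    match = re.search(r'"([^"]+)"', category)
    return match.group(1) if match else None

def filter_by_category(category: str, candidates: list[str]) -> list[str]:
    """Keep only candidates that contain the quoted fragment (case-insensitive)."""
    fragment = quoted_fragment(category)
    if fragment is None:
        return candidates  # no quoted hint in the title, nothing to filter on
    return [c for c in candidates if fragment.lower() in c.lower()]

# Hypothetical candidate responses for a clue in the "WARE"-ING category.
candidates = ["Delaware", "Maryland", "warehouse", "storeroom"]
print(filter_by_category('"WARE"-ING', candidates))  # ['Delaware', 'warehouse']
```

Even a crude filter like this would cut an ambiguous candidate list down a lot whenever the category carries a literal hint.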
After typing out this comment it occurs to me that if you carefully consider each answer there is only one logical question for all the answers in this category. Although I don't have any specific examples, there are times when a question is ruled incorrect because of the category name (e.g. the category called for a specific number of letters in the response).
I would have to disagree. Every bit of information helps. I was even surprised that they didn't implement a voice to text system so that he could learn what the other contestants guessed incorrectly and utilize that information if only to not guess the same thing on his turn.
They have added this. This was actually a problem in earlier versions where it would sometimes guess the exact same wrong answer as a previous contestant.
They said in the end of the episode that they started feeding him the correct questions once they are known (Thus in the example learning that all of the questions are names of months), but they didn't mention the categories that I saw.
I was curious about the same thing - did they ever mention what languages were used in the documentary?
The screenshots looked like Eclipse with Java. The brief snippets I could make out certainly looked like they had to do with evaluating rules and scoring judgment, but that could have easily been stock footage.
I'm currently finding out what information we're allowed to share about how the avatar works and what went into developing it. The problem is we're so far down the totem pole I probably won't know for a while yet. :-/
One thing I'm interested in is any skew in the questions from normal. In particular I hope they ask linguistically tricky questions where you can't even figure out what's being asked at first. I felt like they went a bit easy on that front in the preview round:
I doubt they are skewing the questions. That kind of defeats the point. I suspect Jeopardy is constantly adding to a pool of questions. My hope is that they simply grab from that pool just like any other episode.
Note, the hardest questions for Watson are short questions -- the reduced time hurts Watson more than most human competitors.
The writers had no knowledge of which clues would be used for the Watson episodes; they only mention that they don't use audio or video clues. I certainly haven't done any analysis of it, but I didn't feel like they were any less common in the preview round than normal. Between all the games I'm sure we'd see more "linguistically tricky" clues. I'm hoping for some good "before and after" questions.
It’s a shame that this is (necessarily!) such an insular challenge. Everybody knows what chess is all about, I fear that the impact of this game will be limited to the US or the Anglosphere. Just as an example, there has been no Jeopardy on German TV since 2000, it’s not really a part of German pop culture and because of all the puns it doesn’t translate well.
(Question for native speakers: When watching the practice round [0] are you generally able to keep up and answer the questions? The speed with which the game was moving made it nearly impossible for me to follow or enjoy the game. I would like to know what the experience is like for native speakers.)
Well, that's kind of the point. Chess is a hard game, but it doesn't require broad knowledge of a human culture. If you're going to build a machine that exhibits such knowledge, you've got to pick a culture.
I can definitely follow along with this or any game of Jeopardy!, but my problem is that I just don't know all the answers. To me this is part of the appeal.
This is definitely faster-paced than regular episodes of Jeopardy!, but I don't think it's prohibitively fast for native speakers.
It's too bad that Jeopardy! doesn't fit in with international pop culture better. I think it's a perfect fit for a lot of AI problems.
Typically real jeopardy will show the text on the screen. I can get about 30 (sometimes up to 35ish) "questions" for a normal show but it is all about reading it to get those extra few brain cycles.
Don't fret; you can still tune in after another decade or so for a machine to beat us at GO. Given IBM's trending business interest in corporate AI I would not be surprised if it were another IBM sponsored team that finally builds the successful bot.
If you're okay with waiting a few hours/days, The Pirate Bay seems to have a good selection of torrents of recent Jeopardy! episodes, though I've not yet tried it.
You'll also be able to see a full transcript of the board, as well as all questions provided by contestants, at www.j-archive.com.
I had thought from the counter that IBM had run on the Watson page that there might be but it seems that us non-US residents will have to wait. Still very much looking forward to this. History in the making.
Hulu refers to crackle.com. Crackle has "minisodes" which seem to date from 2002-2003.
It is owned by Sony, distributed by CBS and seems to be syndicated, which means there is no network that carries it specifically, and thus no online distribution, as each market has a different station (Eg: Fox might buy it in one market while ABC might buy it in another).
In case anyone is interested to know Watson's opponents: Watson will compete against Brad Rutter, the current biggest all-time money winner on Jeopardy!, and Ken Jennings, the record holder for the longest championship streak [source: wikipedia]
I'm glad they touched on the idea that this is actually a "Human vs Human" competition... Really good Jeopardy players vs. a team of humans that built their own "Jeopardy Player".
Watson winning the tournament is a triumph for humanity, not just for machines.
Did anyone else notice that the vignettes about Watson's creation featured IBM researchers using MacBook Pros? So much for "International Business Machines." ;)
As a former IBM employee, I can say that you can pretty much use whatever type of machine you want for your work as long as it doesn't interfere with what you do. There is a significant Mac subculture internal to IBM, and they love and support it. However, a huge percentage of their 600,000 employees still use IBM branded laptops and desktops.
Much of success on Jeopardy is not just deciphering clues in the answers, but your timing on ringing in to give the question. I'd imagine a machine could get really good at getting the timing down. Does Jeopardy have a way of varying who rings in first to keep things more fair?
Something I wrote from a previous thread. The one piece of info that wasn't in this comment is that Watson can NOT anticipate when Trebek is about to finish the question. Watson must wait until it gets a signal that the buzzer is now available, and only then can it begin the process to physically depress the button.
Remember that Watson also has to depress a physical button (the same buzzer everyone else uses).
The eye to finger path for humans is about 200ms. It probably takes about 100ms for Watson to physically press the button. So Watson is about 100ms faster. But that also gives humans about a 100ms window in which to beat Watson. This means that you need to start your press 100-200ms before Trebek finishes his last word.
That's a pretty good-sized window for most people given you are reading the question along with Trebek. If the person who turns the light on is very consistent, I think a human who is good at this could consistently beat Watson.
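The timing argument above can be written down as a toy model. This is only a sketch under the numbers quoted in the comment (about 100ms for Watson's press, about 200ms for a human reacting to the light); the function name is made up, and it assumes an early press locks the human out long enough for Watson to win and that ties go to Watson.

```python
WATSON_PRESS_MS = 100    # estimate quoted above: time for Watson to physically press
HUMAN_REACTION_MS = 200  # estimate quoted above: human eye-to-finger path

def first_to_buzz(human_press_ms: float) -> str:
    """Who rings in first, given when the human's press lands.

    human_press_ms is measured from the moment the buzzers are enabled (t = 0).
    Assumptions: an early press (t < 0) locks the human out long enough for
    Watson to win, and a tie goes to Watson.
    """
    if human_press_ms < 0:
        return "watson"
    return "human" if human_press_ms < WATSON_PRESS_MS else "watson"

# An early press, a well-anticipated press, and a pure reaction to the light.
for t in (-20, 50, HUMAN_REACTION_MS):
    print(t, first_to_buzz(t))
```

In this model a human who only reacts to the light always loses, and the only way to win is to land the press in the 0-100ms window by anticipating the end of the question.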
Interesting fact: during Ken Jennings' original 74-game winning streak, he felt it was an unfair advantage he had so much experience with the buzzers. He asked the producers to allow the other contestants (competing for the first time each episode) time to practice with the buzzers, which they did.
Another key to remember is that there is time after the button press to think of the answer - it's a common strategy to ring in when you have a certain confidence level that you can think of the answer quickly. Watson uses confidence levels, too, but he already has a particular answer in mind instead of a vague confidence level for the whole clue.
The timing of when players are allowed to click in is controlled by a human. Someone backstage decides at what moment Alex is finished speaking and then opens the clickers. If you click too early, you get ~ 300ms delay penalty which gives the other players a chance to click in.
Click timing is indeed very important but I do not think Watson has any special advantage there.
But there's a visual cue letting the players know that clickers are open (a light flashes, I think), so the computer probably gets some notification too, at which point it could instabuzz.
No, it does not have a special advantage, but it also does not interface with the questions in the way that humans do. It cannot hear Alex or read the question on the screen; the questions are messaged to Watson via something akin to a text message or an email.
That’s Jeopardy. Whoever buzzes first gets to answer, I don’t see what’s unfair about that. Computers can react faster than humans, why should they be denied that advantage?
Because it goes against the spirit of the competition. Nobody would be impressed if IBM built a machine that could simply buzz in faster than Jennings and Rutter.
In case anyone missed it the first time around, the nytimes mag had a pretty good writeup on Watson back in June -- might be worth instapapering and reading later if you're going to catch the broadcast this week:
As an AI researcher I'm excited to watch this week. Even if it's not the most elegant artificial Jeopardy player imaginable, it raises the public profile of a lot of AI & ML topics and might encourage and inspire other groups to tackle ambitious projects.
I think it's fun to think about how Watson type intelligence will be at the average consumer's fingertips (and affordable, to boot) in less than 20 years.
A quick look at the TOP500 supercomputers puts about a 10,000x increase in raw flops since 1993. Using that as a rough benchmark (a very rough one), we're looking at some impressive stuff in the next twenty years. It's not unreasonable to think that:
current consumer processor in FLOPS * 10,000 > Watson's grid's FLOPS
Naturally, that is a very rough estimate and has no scientific bearing at all.
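For what it's worth, the 10,000x figure can be turned into a doubling time. This is just back-of-the-envelope arithmetic on the number quoted above; the 1993-2011 span is an assumption about which years are being compared, and extrapolating the same rate forward is exactly the kind of very rough estimate the comment warns about.

```python
import math

GROWTH_FACTOR = 10_000        # TOP500 increase quoted above
YEARS_ELAPSED = 2011 - 1993   # assumption: the comparison runs from 1993 to 2011

# If performance doubles every d years, then 2 ** (YEARS_ELAPSED / d) == GROWTH_FACTOR.
doubling_time = YEARS_ELAPSED / math.log2(GROWTH_FACTOR)

# Growth over the next 20 years if the same rate held (a big "if").
projected_20y = 2 ** (20 / doubling_time)

print(f"doubling time: {doubling_time:.2f} years")
print(f"growth over the next 20 years at that rate: {projected_20y:,.0f}x")
```

That works out to a doubling roughly every 1.35 years, or a bit under 28,000x over twenty years at the same pace.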
I just walked into my local bar, asked the bartender to switch to channel 7, and voila! There weren't too many people around, but it was fun (and between breaks I chatted with a couple of drunk chicks who were trying to drown their V-Day sorrows in shots of vodka).
Assuming the space required to run Watson halves every two years, the volume of the human brain is v1 = 1500 cm³, and the total volume of Watson's servers is v2 = 90 x (17.5 cm x 44.0 cm x 73.0 cm) = 5.06e6 cm³.
So Watson will be the size of a human brain in t = 2 x (-ln(v1/v2)/ln(2)) years, i.e. about 11.7 halvings at two years each, or roughly 23 years.
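Plugging the numbers in, as a quick check. This only uses the figures from the comment (brain volume, server dimensions, one halving every two years); the variable names are mine, and the halving rate itself is the comment's assumption, not a measured trend.

```python
import math

V_BRAIN_CM3 = 1500.0                      # v1 from the comment
V_WATSON_CM3 = 90 * (17.5 * 44.0 * 73.0)  # v2: ninety servers, dimensions from the comment
HALVING_PERIOD_YEARS = 2.0                # the comment's assumption

# Number of halvings needed to shrink v2 down to v1: log2(v2 / v1).
halvings = math.log(V_WATSON_CM3 / V_BRAIN_CM3) / math.log(2)
years = halvings * HALVING_PERIOD_YEARS

print(f"v2 = {V_WATSON_CM3:.3g} cm^3")
print(f"halvings needed: {halvings:.1f}")
print(f"years at one halving per {HALVING_PERIOD_YEARS:.0f} years: {years:.1f}")
```

So the expression -ln(v1/v2)/ln(2) counts halvings (about 11.7), and at two years per halving that is a little over 23 years.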
I'm sure there are those that disagree strongly, but I feel like this isn't as much an advance for AI as it is another interesting combination of filtered human-structured knowledge and computation power. Just as Deep Blue brute-forced chess, this is mostly a brute-force of another, albeit more open-ended domain (i.e. Q/A).
I'm not arguing that this isn't an impressive accomplishment, but that the statistical-learning stream of research is likely a conceptual local optimum that yields the best results in the near term but is probably unrelated to the way we ultimately achieve a creative, general AI.
I hope someone writes down all the answers (questions?) and feeds them into google so we can have a Google-Watson showdown. (Oh, we need a third contestant, how about bing too).
What I found was that the answer was in the snippets on the first SERP but certainly for some of the questions it seemed pretty hard to parse it out.
Also I think Jeopardy got 1 or possibly 2 answers wrong
- White City was the City of the 1908 Olympics, it was in London Borough or Greater London but not in the City of London (http://en.wikipedia.org/wiki/City_of_London). White City is a borough and not a city, which makes this a really hard question for a computer to answer I think.
- Nagini killed whatever Harry Potter character was mentioned ; I'm not familiar with the series and didn't see this question this was just based on a comment on Reddit and a Google for "killed $CharName" (I have a really bad memory).
Is it bad that I thought "Oh, so you are going to have Google compete twice"? ;)
It would be interesting to feed the clues into Google to see what comes out. There are easy ones like http://www.google.com/search?hl=en&source=hp&biw=115... but I'm sure there are also some that Google wouldn't have a clue.
As would I. However, due to the computer resources it takes per question it would be extremely expensive to scale.
Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses 3.5 GHz POWER7 eight-core processors (2880 cores / 90 servers = 32 cores per server), with four threads per core. And it still takes ~15 seconds a question.
PS: Rough order of magnitude: 90 IBM Power 750 ~= 180 EC2 High-Memory Quadruple Extra Large Instance * 15 seconds ~= $1.50 a question. (Edit: that's probably still low, each of those 750's have 16TB / 90 = 180GB of ram).
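Spelling out that arithmetic as a sketch. The instance count and the 15 seconds come from the comment above; the hourly rate is not stated anywhere in the thread, so the $2.00/hour below is an assumption, chosen because it is the rate that reproduces the $1.50 figure.

```python
INSTANCES = 180             # from the comment: 90 Power 750s ~= 180 EC2 instances
SECONDS_PER_QUESTION = 15   # from the comment
HOURLY_RATE_USD = 2.00      # assumption: rate implied by the $1.50 estimate, not stated in the thread

# Cost of renting every instance for the duration of one question.
cost = INSTANCES * HOURLY_RATE_USD * SECONDS_PER_QUESTION / 3600

# 16 TB of RAM spread evenly over 90 servers, in decimal gigabytes.
ram_per_server_gb = 16 * 1000 / 90

print(f"${cost:.2f} per question")
print(f"{ram_per_server_gb:.0f} GB of RAM per server")
```

This ignores the fact that you can't rent instances by the second and says nothing about whether 180 such instances would actually match the cluster, so treat it as an order-of-magnitude number only.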
Because in order to answer a question Watson is really executing hundreds or thousands of different algorithms in parallel to attempt to find candidate answers, and then comparing and building confidence scores for each.
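A toy version of that idea, to show the shape of it. This is emphatically not Watson's implementation: the two scorers, their made-up scores, the plain averaging, and the 0.5 buzz threshold are all hypothetical, and the real system presumably combines its evidence in a far more sophisticated way.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scorers: each returns {candidate: score in [0, 1]} for a clue.
def keyword_scorer(clue):
    return {"Chicago": 0.6, "Toronto": 0.3}

def type_scorer(clue):
    return {"Chicago": 0.9, "Toronto": 0.1}

def best_answer(clue, scorers, already_wrong=(), buzz_threshold=0.5):
    """Run all scorers in parallel, average their scores, and decide whether to buzz."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda scorer: scorer(clue), scorers))

    # Sum each candidate's scores across all scorers.
    totals = {}
    for scores in results:
        for candidate, score in scores.items():
            totals[candidate] = totals.get(candidate, 0.0) + score

    # Average, and drop answers another contestant already got wrong.
    confidences = {
        c: total / len(scorers) for c, total in totals.items() if c not in already_wrong
    }
    if not confidences:
        return None, 0.0, False

    answer = max(confidences, key=confidences.get)
    confidence = confidences[answer]
    return answer, confidence, confidence >= buzz_threshold

print(best_answer("hypothetical clue text", [keyword_scorer, type_scorer]))
print(best_answer("hypothetical clue text", [keyword_scorer, type_scorer], already_wrong=("Chicago",)))
```

The second call shows the other behaviour discussed in this thread: once the top candidate is ruled out, the best remaining one falls below the threshold and the sketch declines to buzz.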
It was pre-taped a few weeks ago in line with Jeopardy's usual schedule. There were a number of untelevised practice sessions beforehand against other Jeopardy contestants and wannabes. IBM did their tweaking then and promised that the actual competition rounds would be televised regardless of results.
Also, if IBM were to cancel an episode that showed bad results, it would eventually leak out, especially for a project of this caliber. It would be instant negative publicity, since EVERY tech site would pick it up.
You'll have the scorn of all the geeks on the internet, which could be interesting lol
IBM wants to sell the stuff that Watson's based on to companies. Companies will get their own 'answer engine'. I guess they think that licenses will make them more money than ad-money for search engines. Which could well be true in this case. Watson's insane knowledge still won't give me the cat pictures I want.
QA technology is very different than search because the former makes strong assumptions about quality of the underlying data. Google's search is much better at filtering out and dealing with low quality content (e.g. blogs) and spam.
IBM should use this tech to one-up the content-scraper SEO junk sites. They should put out sites that are what the scraper sites purport to be and put them out of business.
http://www.pbs.org/wgbh/nova/tech/smartest-machine-on-earth....