A few years back, in a graduate NLP class at Berkeley, Dan Klein shared an AAAI article on Watson from 2010 (when it actually was a distinct technology stack and not just marketing nonsense). At that time IBM was focused on question answering for Jeopardy. The work was pretty clearly incremental rather than novel. Dan used it to show that 1) ensemble techniques can be effective if done properly, 2) hyperparameters matter, a lot, and 3) there's human intelligence and then there's Ken Jennings intelligence: looking at precision versus percent answered, he's in a league of his own. It made me think a lot about individual differences in declarative knowledge.
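The precision vs. percent-answered trade-off can be sketched with a toy ensemble. This is not IBM's actual pipeline; every name, weight, and number below is a hypothetical illustration of the general idea: combine several answer scorers' confidences, attempt a clue only when the combined confidence clears a threshold, and watch precision rise as the fraction attempted falls.

```python
# Hypothetical sketch of confidence-thresholded ensemble answering.
# None of these names or numbers come from the Watson system itself.

def ensemble_confidence(scores, weights):
    """Combine per-scorer confidences in [0, 1] into one weighted score."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def play(clues, weights, threshold):
    """Attempt a clue only when ensemble confidence clears the threshold.

    Each clue is (per_scorer_confidences, answer_is_correct).
    Returns (precision_on_attempted, fraction_attempted).
    """
    attempted = correct = 0
    for scores, is_correct in clues:
        if ensemble_confidence(scores, weights) >= threshold:
            attempted += 1
            correct += is_correct
    precision = correct / attempted if attempted else 0.0
    return precision, attempted / len(clues)

# Toy data: three scorers, four clues.
clues = [
    ([0.9, 0.8, 0.95], True),
    ([0.6, 0.7, 0.5], True),
    ([0.4, 0.3, 0.5], False),
    ([0.2, 0.1, 0.3], False),
]
weights = [1.0, 1.0, 2.0]  # hyperparameters you would tune on held-out data

# A low threshold attempts more clues at lower precision;
# a high threshold buzzes in selectively and answers more precisely.
print(play(clues, weights, threshold=0.3))
print(play(clues, weights, threshold=0.55))
```

The threshold is exactly the kind of hyperparameter the article stresses: tune it badly and you either pass on clues you would have gotten right or buzz in on guesses.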
It was also unclear to me, when they ran the contest, whether Watson only had access to the analog audio and/or image of the clues. In other words, did it have to parse the question the same way Ken did?
Also, it was clearly optimized for a specific use case. If the questions had been reworded with more clues that were puns or required inference, I think Ken would have done about the same, but Watson would have fared much worse.
They had the text of the clue (the sentence) and had to parse that; the resulting response was then sent through a text-to-speech engine, but there was no speech-to-text.
https://www.aaai.org/Magazine/Watson/watson.php