The mention of a speech synthesis module, combined with a comment on how demonic the voice is, sets up an expectation: the metaphor gets read as literal.
The phrasing is somewhat confusing. I had to reread it after being puzzled by the lack of sound in the video despite the mention of a voice synthesis module. The "voice" refers to the voice in the subsequent paragraph, the one driving the programmer, not the computer.
Interestingly, this shows that the previous commenter and I both read the article the same way, in a way I don't think a typical author would necessarily anticipate. I skipped the video entirely at first, but upon reading the first words of the following paragraph I went back and played it, eager to hear that voice module.