Haha, try again, the human is 1,2,2,1 according to the filenames (I was fooled too).
I do think the difference would become obvious with a paragraph or more of speech, though. It's difficult to judge what the correct intonation should be on these single sentences without context. Ultimately, correct intonation requires a complete understanding of meaning, which is still out of reach. An audiobook read by Tacotron 2 would still sound strange.
Depends on the audiobook. I think technical docs would be alright, which is mostly what I want this for. There are lots of technical docs I'd like to listen to while I work out.