
Is voice conversion via ML possible yet?

I know we can create vaguely convincing synthesised voices, but I've not run across a program which can take my recorded voice and convert it to a celebrity's, or even shift apparent pitch or gender convincingly.




I haven't seen it turnkey yet, but it's coming.

BTW, we can create extremely convincing synthesized voices: https://google.github.io/tacotron/publications/tacotron2/ind...


I was about to say that those clips still registered as a computer quite easily to me, until I got to the comparison with a human voice.

I think I've just gotten so used to that voice as the "google" voice that I automatically associate it with computers. It would be strange to meet the human that was providing the human voice in those samples.


"Deep Voice 2 can learn from hundreds of voices and imitate them perfectly. Unlike traditional systems, which need dozens of hours of audio from a single speaker, Deep Voice 2 can learn from hundreds of unique voices from less than half an hour of data per speaker, while achieving high audio quality." - http://research.baidu.com/deep-voice-2-multi-speaker-neural-...

Now imagine that with Tacotron quality, and you'll get that "strange" effect with anyone, meeting their vocal clone.

This is still text-to-speech, so it's not live-copying your intonation, but you could easily imagine a seq2seq network designed to do so.


I had a very strange experience like that recently when listening to Radio 3, the BBC's mostly-classical channel. They had an opera programme with guest presenters from the Met Opera in New York. The usual BBC presenters of course have British accents, and one of these American presenters had a particular accent that my brain latched onto as matching the sound of synthetic speech. I just could not suspend disbelief and convince myself that this speech - which rationally of course I knew was human - was that of a real person rather than some sort of AI assistant. It was a very strange feeling.

I did have a fever at the time, which might not have helped.


Maybe that person was the human source of a text-to-speech voice you use, perhaps in your GPS or a book-reading app?


It's intent and emotion that I'm really interested in, which, to the best of my knowledge, computer generation still isn't good at. (This is for VR games, so high-quality voice acting is a priority.)

Hence, if I could find a program which could reliably turn one actor's voice into another's, I could use their acting ability, but with more characters and less need for them to "put on" voices. That's powerful because really good actors are thin on the ground, and also because trying to hold a different voice or accent can limit the quality of the main performance.


Not quite what you're asking for, but a couple of years ago Adobe demoed some voice/speech editing tech that's pretty impressive: https://arstechnica.com/information-technology/2016/11/adobe...



