Is she on board with this? I can imagine a lot of people being severely put off by being asked to record "a corpus of approximately 1000 hours" in advance of what sounds like a stressful surgery.
Good concern. We won't be doing any "hundreds of hours" solution. We've been married over 30 years so we're a pretty good team - naturally I wouldn't do it without both of us thinking it was a great idea.
Seconding this. Also, reproducing her voice with an AI may not be something she is on board with; it could make her feel like you don't accept her as she is, with or without her voice. It may also be unhealthy for you, similar to how spending too long on social media can become a dangerous source of dopamine.
It might make more sense to consider making a recording that is meaningful in itself, and focusing on giving her emotional support rather than on building an AI that could be perceived as a replacement.
It's not like OP is replacing her entirety with Alexa. If I were the wife I'd think "sure, let's 'backup' my voice, having it available in case I lose mine would be useful, so that people can still hear my thoughts in my voice instead of a robot's."...
> if I were the wife I'd think "sure, let's 'backup' my voice"
That does seem to be the OP's position as well, and it's a far more generous reading of the situation. It makes sense that someone here would have the mindset of "let's keep a backup in case we want access to it later."
I'll push back on this. The quality of the read speech should be a bigger concern than having parallel data. Unless OP's wife is a teacher or an actor/voice actor, if the LibriSpeech transcripts bore her, that boredom will come through in the speech.
I think OP would ideally want the model to pick up on more natural intonation, rather than monotone dictation. Record everything from now on, as best you can under similar recording conditions, and hopefully that data will be enough to capture the more natural nuances.
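To make "similar recording conditions" concrete, here's a minimal sketch of a clip recorder that locks the sample rate, channel count, and bit depth so every session matches. None of this is from the thread: it assumes the third-party `sounddevice` and `soundfile` packages, and the constants and directory name are just placeholders.

```python
# Sketch: capture short conversational clips with a fixed format so the whole
# corpus shares one recording context. Assumes `pip install sounddevice soundfile`.
import datetime
import os

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 22050   # keep constant across every session
CHANNELS = 1          # mono is typical for TTS corpora
CLIP_SECONDS = 30     # length of each clip; adjust to taste

def record_clip(out_dir="corpus"):
    """Record one clip and save it as a timestamped 16-bit WAV file."""
    os.makedirs(out_dir, exist_ok=True)
    frames = int(CLIP_SECONDS * SAMPLE_RATE)
    audio = sd.rec(frames, samplerate=SAMPLE_RATE, channels=CHANNELS, dtype="int16")
    sd.wait()  # block until the clip is fully captured
    name = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") + ".wav"
    path = os.path.join(out_dir, name)
    sf.write(path, audio, SAMPLE_RATE)
    return path

if __name__ == "__main__":
    print("Saved", record_clip())
```

The point is consistency, not the specific numbers: whatever rate and microphone setup you pick on day one, keep it for every later recording.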
Mozilla's is licensed CC-BY, which is pretty liberal. In case the Attribution license is a blocker, here's CMU_ARCTIC's, which is built from copyright-free sources and has no licensing restrictions: http://festvox.org/cmu_arctic/
No, it is not. For one, it's a corpus of read speech, which means it does not capture the characteristics of conversational human speech well – hesitation, disfluencies, different tones and registers, etc. LibriSpeech has a paper explaining the design of the corpus; you only need to read the first sentence of the abstract to know what it is supposed to capture:
> This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems.
Woah, this is definitely the solution that the OP needs. I did read about WaveNet's text-to-speech a couple of years ago but didn't know it had progressed this far. It's crazy good, mindblowing.
That way, you can retrain an existing AI to do text-to-speech in her own voice (a rough sketch of one way to experiment with this is below).
Edit: here's a link to the corpus that I believe Mozilla uses: http://www.openslr.org/12/
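For anyone who wants to try the idea before committing to a full retraining pipeline, here's a hedged sketch using the open-source Coqui TTS package rather than the WaveNet service discussed above. Its pretrained YourTTS model clones a voice from a short reference clip instead of retraining, so treat this as an illustration of the workflow, not the thread's method; the file names are placeholders.

```python
# Sketch: zero-shot voice cloning with Coqui TTS (pip install TTS).
# This is a lighter-weight stand-in for actually retraining a model.
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model (weights download on first use).
tts = TTS("tts_models/multilingual/multi-dataset/your_tts")

# Synthesize new text in the voice heard in reference.wav (a clean clip of the speaker).
tts.tts_to_file(
    text="Hello, this is a test of the cloned voice.",
    speaker_wav="reference.wav",
    language="en",
    file_path="cloned_output.wav",
)
```

Quality from a single reference clip won't match a model fine-tuned on hours of well-recorded speech, but it's a quick way to judge whether the result feels like "her voice" at all before investing in a bigger corpus.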