Is she on board with this? I can imagine a lot of people being severely put off by being asked to record "a corpus of approximately 1000 hours" in advance of what sounds like a stressful surgery.
Good concern. We won't be doing any "hundreds of hours" solution. We've been married over 30 years so we're a pretty good team - naturally I wouldn't do it without both of us thinking it was a great idea.
Seconding this. Also, reproducing her voice with an AI may not be something she is on board with; it could make her feel like you don't accept her as she is, with or without her voice. It may also be unhealthy for you, similar to how spending too long on social media can become a dangerous source of dopamine.
It might make more sense to consider making a recording that is meaningful in itself, and focusing on giving her emotional support rather than on building an AI that could be perceived as a replacement.
It's not like OP is replacing her entirety with Alexa. If I were the wife I'd think "sure, let's 'backup' my voice, having it available in case I lose mine would be useful, so that people can still hear my thoughts in my voice instead of a robot's."...
> if I were the wife I'd think "sure, let's 'backup' my voice"
That does seem to be the OP's position as well, and it's a far more generous reading of the situation. It makes sense that someone here would have the mindset of "let's keep a backup in case we want access to it later."
I'll push back on this. The quality of the read speech should be a bigger concern than having parallel data. Unless OP's wife is a teacher or an actor/voice actor, if the LibriSpeech transcripts bore her, that boredom will come through in the speech.
I think OP would ideally want the model to pick up on more natural intonation, rather than monotone dictation. Record everything from now on, as best you can under similar recording conditions, and hopefully that data will be enough to capture the more natural nuances.
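To make "similar recording conditions" concrete, here's a minimal sketch of a clip recorder that locks the sample rate, channel count, and bit depth so every session matches. None of this is from the thread: it assumes the third-party `sounddevice` and `soundfile` packages, and the constants and directory name are just placeholders.

```python
# Sketch: capture short conversational clips with a fixed format so the whole
# corpus shares one recording context. Assumes `pip install sounddevice soundfile`.
import datetime
import os

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 22050   # keep constant across every session
CHANNELS = 1          # mono is typical for TTS corpora
CLIP_SECONDS = 30     # length of each clip; adjust to taste

def record_clip(out_dir="corpus"):
    """Record one clip and save it as a timestamped 16-bit WAV file."""
    os.makedirs(out_dir, exist_ok=True)
    frames = int(CLIP_SECONDS * SAMPLE_RATE)
    audio = sd.rec(frames, samplerate=SAMPLE_RATE, channels=CHANNELS, dtype="int16")
    sd.wait()  # block until the clip is fully captured
    name = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") + ".wav"
    path = os.path.join(out_dir, name)
    sf.write(path, audio, SAMPLE_RATE)
    return path

if __name__ == "__main__":
    print("Saved", record_clip())
```

The point is consistency, not the specific numbers: whatever rate and microphone setup you pick on day one, keep it for every later recording.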
Mozilla's is licensed CC-BY, which is pretty liberal. In case the Attribution license is a blocker, here's CMU_ARCTIC's, which is built from copyright-free sources and has no licensing restrictions: http://festvox.org/cmu_arctic/
No, it is not. For one, it's a corpus of read speech, which means it does not capture the characteristics of conversational human speech well – hesitation, disfluencies, different tones and registers, etc. LibriSpeech has a paper explaining the design of the corpus; you only need to read the first sentence of the abstract to know what it is supposed to capture:
> This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems.
Woah, this is definitely the solution that the OP needs. I did read about WaveNet's text-to-speech a couple of years ago but didn't know it had progressed this far. It's crazy good, mindblowing.
That way, you can retrain an existing AI to do text-to-speech in her own voice (a rough sketch of one way to experiment with this is below).
Edit: here's a link to the corpus that I believe Mozilla uses: http://www.openslr.org/12/
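For anyone who wants to try the idea before committing to a full retraining pipeline, here's a hedged sketch using the open-source Coqui TTS package rather than the WaveNet service discussed above. Its pretrained YourTTS model clones a voice from a short reference clip instead of retraining, so treat this as an illustration of the workflow, not the thread's method; the file names are placeholders.

```python
# Sketch: zero-shot voice cloning with Coqui TTS (pip install TTS).
# This is a lighter-weight stand-in for actually retraining a model.
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model (weights download on first use).
tts = TTS("tts_models/multilingual/multi-dataset/your_tts")

# Synthesize new text in the voice heard in reference.wav (a clean clip of the speaker).
tts.tts_to_file(
    text="Hello, this is a test of the cloned voice.",
    speaker_wav="reference.wav",
    language="en",
    file_path="cloned_output.wav",
)
```

Quality from a single reference clip won't match a model fine-tuned on hours of well-recorded speech, but it's a quick way to judge whether the result feels like "her voice" at all before investing in a bigger corpus.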