
Record her reading the texts of a standardized speech training corpus.

That way, you can retrain an existing text-to-speech model to speak in her own voice.

Edit: here's a link to the corpus (LibriSpeech) that I believe Mozilla uses: http://www.openslr.org/12/
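If she only has time for a small slice of the corpus, here's a rough Python sketch for picking prompts that fit a time budget. It assumes transcript lines in the LibriSpeech `.trans.txt` style (`<utterance-id> <TEXT>`) and a reading speed of about 150 words per minute. The rate and the helper names are my own guesses for illustration, not something Mozilla or OpenSLR provides.

```python
# Sketch: pick a subset of corpus prompts that fits a recording budget.
# Assumptions: one "<utterance-id> <TEXT>" pair per line (LibriSpeech
# .trans.txt style) and a hypothetical reading rate; measure her real pace.

WORDS_PER_MINUTE = 150  # assumed reading rate, not a measured value


def parse_transcripts(lines):
    """Yield (utterance_id, text) pairs from '<id> <text>' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, _, text = line.partition(" ")
        yield utt_id, text


def select_prompts(pairs, budget_minutes, wpm=WORDS_PER_MINUTE):
    """Keep prompts in corpus order until the next one would exceed the budget.

    Returns the selected (id, text) pairs and the estimated minutes used.
    """
    selected = []
    used = 0.0
    for utt_id, text in pairs:
        minutes = len(text.split()) / wpm  # estimated reading time
        if used + minutes > budget_minutes:
            break
        selected.append((utt_id, text))
        used += minutes
    return selected, used
```

For example, `select_prompts(parse_transcripts(open(path)), budget_minutes=90)` gives a session script of roughly an hour and a half. Stopping at the first prompt that doesn't fit keeps the script in corpus order, which is simpler to follow while recording than skipping around to pack the budget tightly.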




Is she on board with this? I can imagine a lot of people being severely put off by being asked to record "a corpus of approximately 1000 hours" in advance of what sounds like a stressful surgery.


Good concern. We won't be doing any "hundreds of hours" solution. We've been married over 30 years so we're a pretty good team - naturally I wouldn't do it without both of us thinking it was a great idea.


Seconding this. Also, reproducing her voice with an AI may not be something she is on board with; it could make her feel that you won't accept her without her voice. It may also be unhealthy for you, similar to how spending too long on social media can become a dangerous source of dopamine.

It might make sense to consider making a recording that is more meaningful, and focus on giving her emotional support rather than building an AI that could be perceived as a replacement.


It's not as if OP is replacing her entirely with Alexa. If I were the wife, I'd think "sure, let's 'backup' my voice, having it available in case I lose mine would be useful, so that people can still hear my thoughts in my voice instead of a robot's."...


> if I were the wife I'd think "sure, let's 'backup' my voice"

That seems to be the OP's position as well, and it's a far more generous reading of the situation. It makes sense that someone here would have the mindset of "let's keep a backup in case we want access to it later."


Of course, but I think it's very, very important that OP has this conversation themselves and doesn't take the word of folks on the Internet.


It's 1000 hours because multiple speakers record the same articles.

I believe some speakers only recorded 1-2 hours, which seems doable.


They have 500 hours left, so that would be impossible.


I'll push back on this. The quality of the read speech should be a higher concern than having parallel data. Unless OP's wife is a teacher, actor, or voice actor, any boredom with the LibriSpeech texts will come through in her reading.

I think OP would ideally want the model to pick up more natural intonation rather than monotone dictation. Record everything from now on, as best you can with a consistent recording setup, and hopefully that data will be enough to capture more natural nuances.
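On keeping the recording context similar: one cheap way to catch drift between sessions is to check that every WAV file shares the same channel count, sample width, and sample rate. Here's a minimal sketch using Python's standard `wave` module. It assumes uncompressed PCM WAV files and only compares header fields; it can't tell you anything about mic placement or room noise.

```python
import wave
from pathlib import Path


def wav_format(path):
    """Return (channels, sample_width_bytes, sample_rate) from a WAV header."""
    with wave.open(str(path), "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()


def wav_files(folder):
    """List the .wav files directly inside a folder, sorted by name."""
    return sorted(Path(folder).glob("*.wav"))


def find_inconsistent(paths):
    """Return the files whose header format differs from the first file's."""
    paths = list(paths)
    if not paths:
        return []
    reference = wav_format(paths[0])
    return [p for p in paths[1:] if wav_format(p) != reference]
```

Usage would be something like `find_inconsistent(wav_files("recordings"))`, where `recordings` is a hypothetical folder name. Comparing against the first file is a simple choice; you could instead pin an expected format up front.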


And get a high quality mic to do it with!


Or better, rent a recording studio for as long as it takes.


Mozilla's corpus is licensed CC-BY, which is pretty liberal. In case the attribution requirement is a blocker, here's CMU_ARCTIC, which is built from copyright-free texts and has no licensing restrictions: http://festvox.org/cmu_arctic/
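If you go the CMU_ARCTIC route, you'll want its prompt list as a reading script. A minimal Python sketch, assuming the prompts come in the Festival-style `( utterance_id "Prompt text." )` one-per-line layout; I haven't verified the exact file, so check a few lines before relying on the regex. The function names and output layout are hypothetical.

```python
import re

# Assumed Festival-style prompt line:  ( some_id "Prompt text here." )
# The layout is an assumption; verify it against the actual prompt file.
PROMPT_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_prompts(lines):
    """Return (prompt_id, text) pairs, silently skipping non-matching lines."""
    prompts = []
    for line in lines:
        match = PROMPT_RE.match(line.strip())
        if match:
            prompts.append((match.group(1), match.group(2)))
    return prompts


def write_script(prompts, path):
    """Write a numbered reading script, one prompt per line."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (prompt_id, text) in enumerate(prompts, start=1):
            f.write(f"{i:04d}  {prompt_id}  {text}\n")
```

Keeping the original prompt ID next to each line makes it easy to match recordings back to their transcripts later.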


I think this is backwards... This is a corpus for training speech-to-text, not text-to-speech, right?


It's a corpus designed to capture the full breadth of combinatorial nuances of human speech in a general sense.


No, it is not. For one, it's a corpus of read speech, which means it does not capture the characteristics of conversational human speech well (hesitation, disfluencies, different tones and registers, etc.). LibriSpeech has a paper explaining the design of the corpus; you only need to read the first sentence of the abstract to see what it is meant to capture:

> This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems.

http://www.danielpovey.com/files/2015_icassp_librispeech.pdf


That sentence alone doesn't establish that read speech differs from conversational speech, but thanks for pointing this out.


Thanks!


This right here


https://youtu.be/0sR1rU3gLzQ

'This AI Clones Your Voice After Listening for 5 Seconds'


Whoa, this is definitely the solution the OP needs. I read about WaveNet's text-to-speech a couple of years ago but didn't know it had progressed this far. It's crazy good, mind-blowing.



