
The voices aren’t the model; while the model takes conventional training, for which code is not provided, voices are, or at least can be, built by what could be described as “accumulated in-context learning”. Every time you run text with a voice (which can be null) through the inference process, the result is an audio waveform and an updated history prompt.
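To make that loop concrete, here is a toy sketch. Everything in it is an assumption for illustration: `run_inference`, the `Voice` type, and the fake tokens are hypothetical stand-ins, not the project's actual API. The stub only mimics the shape of the process, where text plus an optional voice goes in and a waveform plus an updated history prompt comes out.

```python
from typing import List, Optional, Tuple

# Hypothetical: treat a voice as nothing more than the accumulated history prompt.
Voice = List[int]

def run_inference(text: str, voice: Optional[Voice] = None) -> Tuple[List[float], Voice]:
    """Stand-in for the real model: returns (waveform, updated history prompt)."""
    history = list(voice) if voice is not None else []  # a null voice starts empty
    new_tokens = [ord(c) % 256 for c in text]           # fake tokens derived from the text
    waveform = [t / 255.0 for t in new_tokens]          # fake audio samples
    return waveform, history + new_tokens               # context grows; no weights change

# Feed each updated history prompt back in as the voice for the next call.
voice = None
for line in ["Hello.", "This is the same speaker."]:
    audio, voice = run_inference(line, voice)
```

The point of the sketch is only that the voice lives in the returned history prompt that you carry between calls, not in the model weights.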


