
A very cool project. You should build a website interface for it; you could easily charge for it, or take donations or run ads if you want to keep it free.

What would it take to add a specific language to Piper? And do you know a good speech-to-text model?



Thank you! Wouldn't a website interface put it in direct competition with solutions like ElevenLabs, where it would come out inferior? I'm not opposed to creating a SaaS offering, but I don't think I have the economies of scale or the proprietary models a large company has. Let me know if I'm wrong! Maybe one day I'll do something as a separate project in the browser with WebGPU.

With regard to adding languages, first check whether support already exists [0]. If it doesn't, there are a few tutorials that might be relevant [1] [2] [3]. Once you have the ONNX model, you can just put it in the QuickPiperAudiobook model directory and specify it via the CLI args.

[0] https://rhasspy.github.io/piper-samples/
[1] https://github.com/rhasspy/piper/issues/51
[2] https://github.com/rhasspy/piper/blob/master/TRAINING.md
[3] https://www.youtube.com/watch?v=b_we_jma220


> And do you know a good speech to text model?

OpenAI's Whisper. Both the code and the model are available, and multiple projects have built on it. You could try this wrapper: https://github.com/m-bain/whisperX or, for short utterances on a smartphone, https://github.com/futo-org/whisper-acft
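
If you want to try plain Whisper before reaching for a wrapper, here is a minimal sketch using the openai-whisper Python package. It assumes the package and ffmpeg are installed; the "base" model size and the helper names (format_timestamp, format_segments, transcribe) are just my picks for illustration, not anything from the projects linked above.

```python
def format_timestamp(seconds: float) -> str:
    """Turn a time offset in seconds into HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> list[str]:
    """Render Whisper-style segments (dicts with start/end/text) as lines."""
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in segments
    ]


def transcribe(path: str, model_name: str = "base"):
    """Transcribe an audio file; returns (full_text, timestamped_lines).

    Imported lazily so the helpers above work without Whisper installed.
    Assumes: pip install openai-whisper, plus ffmpeg on the PATH.
    """
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(path)         # dict with "text" and "segments"
    return result["text"], format_segments(result["segments"])


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        text, lines = transcribe(sys.argv[1])
        print(text)
        print("\n".join(lines))
```

Run it as a script with an audio file path as the only argument. Larger model names are slower but usually more accurate.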


Deepgram is another alternative. I use it at work; it's the fastest service I know of and also relatively cheap. Whisper is the better choice for self-hosting, though.



