A very cool project, you should build a website interface you could easily charg...

C-Loftus · 2024-10-07T12:17:58 1728303478

Thank you! Wouldn't a website interface put it in direct competition with, and thus make it inferior to, solutions like ElevenLabs? I am not opposed to creating a SaaS offering, but I don't have the economies of scale or the proprietary models that a large company has. Let me know if I am wrong! Maybe one day I will do something as a separate project in the browser with WebGPU.

With regard to adding languages, first check whether support already exists [0]. If it doesn't, there are a few tutorials that might be relevant [1] [2] [3]. Once you have the ONNX model, you can just put it in the QuickPiperAudiobook model directory and specify it via the CLI args.

[0] https://rhasspy.github.io/piper-samples/

[1] https://github.com/rhasspy/piper/issues/51

[2] https://github.com/rhasspy/piper/blob/master/TRAINING.md

[3] https://www.youtube.com/watch?v=b_we_jma220
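For the "put it in the model directory" step, here is a rough sketch of what that copy could look like. Piper voices are normally distributed as a `.onnx` file with a `.onnx.json` config next to it, so the sketch copies both if the config is present. The function name `install_voice` is made up for illustration, and the model directory is passed in by the caller because its real location is whatever your QuickPiperAudiobook setup uses; treat both as assumptions.

```python
import shutil
from pathlib import Path


def install_voice(onnx_path, model_dir):
    """Copy a Piper voice into model_dir (hypothetical helper, not part of any tool).

    Copies <name>.onnx and, if it exists alongside it, <name>.onnx.json.
    Returns the list of destination paths that were written.
    """
    onnx_path = Path(onnx_path)
    model_dir = Path(model_dir).expanduser()

    if onnx_path.suffix != ".onnx":
        raise ValueError(f"expected a .onnx file, got {onnx_path.name}")
    if not onnx_path.is_file():
        raise FileNotFoundError(onnx_path)

    # Assumption: the config sits next to the model as "<name>.onnx.json".
    config_path = onnx_path.with_name(onnx_path.name + ".json")

    model_dir.mkdir(parents=True, exist_ok=True)
    copied = [Path(shutil.copy2(onnx_path, model_dir / onnx_path.name))]
    if config_path.is_file():
        copied.append(Path(shutil.copy2(config_path, model_dir / config_path.name)))
    return copied
```

You would call it with the path to the downloaded or trained voice and the model directory you use, then pick the model through the CLI args as described above.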

imurray · 2024-10-07T10:09:29 1728295769

> And do you know a good speech to text model?

OpenAI's Whisper: code and model are both available, and multiple projects have built on it. You could try this wrapper: https://github.com/m-bain/whisperX -- or for short utterances on a smartphone https://github.com/futo-org/whisper-acft
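If you want to try plain Whisper before reaching for a wrapper, a minimal sketch looks something like this. It assumes the `openai-whisper` package and `ffmpeg` are installed; the `"base"` model size and the helper names `transcribe` and `segments_to_lines` are just illustrative choices, not anything the projects above prescribe.

```python
def transcribe(audio_path, model_name="base"):
    """Run Whisper on one audio file and return its result dict."""
    # Imported here so the formatting helper below works without the package.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)


def segments_to_lines(result):
    """Turn a Whisper result dict into '[start - end] text' lines."""
    return [
        f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]
```

Something like `print("\n".join(segments_to_lines(transcribe("talk.mp3"))))` would then print the timestamped transcript, where `talk.mp3` is a placeholder file name.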

Bilal_io · 2024-10-07T10:54:53 1728298493

Deepgram is another alternative. I use it at work; it's the fastest service and also relatively cheap. But Whisper is better for self-hosting.