Hey HN. I’m Ivan, a hacker from Ukraine.
For about a year, I've been working on Listenly, an app for listening to text content using OpenAI's natural-sounding text-to-speech model.
At some point, I realized that it would be cool to take all the public domain e-books and create audio versions of them.
So I did it... kind of.
It would cost an immense amount of money to generate all the audio right away (OpenAI TTS costs approximately $0.84/hour of audio; 11labs, for comparison, is 10 times more expensive).
So, I took a more gradual approach.
I took all the metadata from the Project Gutenberg catalog (it's about 70GB of dirty XML), cleaned it, put it into my database, and created a browsable catalog.
When the first user visits a book page on Listenly, I download the full text of the book, save it in my cloud storage, and calculate the price for audio generation based on the book's length. Then, if the user decides to purchase it, I generate the audio.
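For illustration, a length-based estimate could look roughly like the sketch below. This is not Listenly's actual code: the function names are hypothetical, the speaking rate of 150 words per minute is an assumption, and the only figure taken from this post is the approximate $0.84 per hour of audio.

```python
def estimate_audio_hours(text: str, words_per_minute: float = 150.0) -> float:
    """Estimate narration length in hours from the word count.

    words_per_minute is an assumed average speaking rate, not a measured one.
    """
    word_count = len(text.split())
    return word_count / words_per_minute / 60.0


def estimate_generation_cost(
    text: str,
    cost_per_hour: float = 0.84,  # approximate TTS cost per hour of audio, from the post
    words_per_minute: float = 150.0,  # assumed speaking rate
) -> float:
    """Estimate the raw TTS cost in dollars for a book's full text."""
    hours = estimate_audio_hours(text, words_per_minute)
    return round(hours * cost_per_hour, 2)


if __name__ == "__main__":
    # A stand-in for a downloaded book: 90,000 words of dummy text.
    book_text = "word " * 90_000
    print(f"~{estimate_audio_hours(book_text):.1f} hours, "
          f"~${estimate_generation_cost(book_text):.2f} to generate")
```

Because the estimate needs only the text, it can be computed once when the book is first downloaded and stored next to it, so the expensive audio generation is deferred until someone actually buys the book.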
I know it’s not perfect.
I've burned out a couple of times already while doing it.
But still, I need to show it to the world. And I’ll be glad to hear your feedback.
Peace.
Check out their voice samples: https://rhasspy.github.io/piper-samples/ (or make your own).
Happy to help you set it up locally...
https://github.com/rhasspy/piper