Show HN: Offline audiobook from any format with one CLI command (github.com/c-loftus)
105 points by C-Loftus 87 days ago | 44 comments
QuickPiperAudiobook locally generates an mp3 audiobook on Linux with one easy command. It can convert PDF, EPUB, MOBI, and many other formats by using ebook-convert. It uses any piper TTS model, and thus supports a wide variety of languages.

I've had great success using it to read more while reducing eye strain and computer usage. I think I've probably read 30 or so books this way now over the past year. Being able to listen to any content you want in audio form free and offline while going for a walk is extremely handy.

I hope it helps you as well!

Cheers




Cool app! I've had some issues with getting it to work, though:

- ebook-convert is not a small dependency; it seems that it only comes bundled with calibre. And calibre has a huge number of Python dependencies (>400 packages on openSUSE). I don't know about you, but I'm not polluting my install with that for a small tool. So I grabbed the AppImage version of calibre, extracted it, and added a symlink to the bundled ebook-convert. It is still ~500 MB of wasted space, but at least it's local to a single folder.

Could you replace it with another tool/library, or include only necessary stuff with binary?

- Then I've encountered another problem. I have no piper installed on my system, but readme says:

> You don't need to have piper installed. This program manages piper and the associated models.

It didn't download the piper release and proceeded without errors. Then it did download some models. After that it errored out trying to change directory to the non-existent "~/.config/QuickPiperAudiobook/piper". So naturally, I looked in the source code, found the link to the piper tarball, and extracted it myself.

A-ha! Now it works. Until..

- Done. Saved audiobook as /home/archargelod/Audiobooks/text.wav

You could try to guess what the problem was, but I'm going to tell you right away: it didn't create the "Audiobooks" folder, and again there were no errors.

Thankfully, that was the last issue and after I created ~/Audiobooks manually, my generated wav was there.


Thank you for the feedback and I'm sorry you had those issues. I cannot replicate at the moment on Ubuntu 24.04 but will check back on this. I presume it is something simple going wrong with how I am getting the home directory in golang and checking if the path exists.

Your feedback on ebook-convert is very valid. I can take a look at breaking it up. (Granted I am not sure how much of a lift that would be)


The intention seems to have been to skip running ebook-convert if the input file is already a text file, but it runs it anyway. So I recompiled it to not do that.

https://gist.github.com/avelican/8602b417e810f8dd4e31e8e3fbb...

...at which point I did some more digging and realized that (for my purposes anyway -- operating on txt files), QPA can simply be replaced with piper itself!

    cat book.txt |  piper --model [model] --output_file book.wav
(which I found kind of funny)

Re: the ebook-convert dependency, I wonder if there are any feasible alternatives? My first thought was pandoc, which is ~140MB, but I guess that's smaller than Calibre's ~1400MB (!!!).
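
The "skip conversion when the input is already text" check mentioned at the top of this comment could look something like the Go sketch below. It is an illustration only, not the change from the linked gist: `needsConversion` is a hypothetical name, and the assumption is that a case-insensitive `.txt` extension is enough to identify plain-text input.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// needsConversion reports whether the input has to pass through a
// converter (such as ebook-convert) before it is handed to the TTS
// engine. Plain-text files are passed through untouched.
// Hypothetical helper; the extension check is an assumption.
func needsConversion(inputPath string) bool {
	ext := strings.ToLower(filepath.Ext(inputPath))
	return ext != ".txt"
}

func main() {
	for _, p := range []string{"book.txt", "book.epub", "NOTES.TXT", "paper.pdf"} {
		fmt.Printf("%-10s needs conversion: %v\n", p, needsConversion(p))
	}
}
```

With a guard like this, a user who only feeds in `.txt` files would never need the converter installed at all.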


Issues should be fixed now in the latest release.


Can confirm that everything is fixed. Thanks for the update!


That's interesting, thanks for sharing. Does anyone know of a good solution for seamlessly switching between audiobooks and ebooks for books that are not bought from Amazon on Kindle?

In this case you already have the input file, and the audio output file but I guess there would be an app that takes these two files to provide a good reading experience. As they are based on the same source it should be possible to keep the reading progress matched between them.


Try out Storyteller, they're working on this exact problem: https://smoores.gitlab.io/storyteller/


Very cool, thanks for sharing! I'll follow the project and hope there's some way to get this running on Kobo or other eInk readers in the future.


It's using an oft-ignored part of the ePub standard, so I think all that should be needed for Kobo support is implementation of that part of the standard in KOReader.


> Does anyone know of a good solution for seamlessly switching between audiobooks and ebooks for books that are not bought from Amazon on Kindle?

Use Calibre's e-book viewer[^0] which uses Piper for text-to-speech.

[^0]: https://manual.calibre-ebook.com/viewer.html#read-aloud


Thanks, but clarification: I meant on iOS / mobile devices as I'm not reading on my computer. On second thought, it would be an amazing feature for https://prologue.audio, which is a beautiful app and works very well for audiobooks already.


ReadEra Pro is an ereader app with a decent text-to-speech, I often flip between reading and listening.


I’ve really enjoyed moving most of my reading to TTS-generated audiobooks. I haven’t tried the newer AI voices but that certainly sounds like a step up!


As a former audiobook narrator, may your cereal always be soggy and your socks too.

On a more serious note, this is a cool application of the technological advancement in AI voice models, and inevitable in today's society. It just really sucks to watch this race to the bottom actively put people out of work.

But hey, at least we can save a few bucks on an audiobook, right?


>It just really sucks to watch this race to the bottom actively put people out of work

the entire progress of civilization has depended on putting people out of work by increasing productivity and efficiency. Subsistence hunter-gatherers and subsistence farmers were put out of work by cheaper agriculture systems, and some of those unemployed realized they could support themselves by reading books to other people, a task they enjoyed much more.


> Subsistence hunter-gatherers and subsistence farmers were put out of work by cheaper agriculture systems, and some of those unemployed realized they could support themselves by reading books to other people,

The replacement of hunter-gatherers by farming is a change that took centuries to take hold. Nobody lost their ability to feed their family because their ability to hunt and gather was automated away. Ironically, the move away from hunter/gatherer subsistence took free time away (for things like storytelling) instead of adding to it, in exchange for greater reliability in their sustenance.

The loss of entire swaths of employment is a fairly new development. As is the lack of safety nets (US-centric for obvious reasons) for those who become injured or otherwise unable to sustain themselves.


this broad-brush take seems so persuasive.. for about one minute of thinking.. systems of humans are built for humans first.. which work of which humans are being replaced and why? Is anyone actually driving? If the modern answer is "money answers all questions" then, who makes money simply by moving money? Anyone who is not moving money right now is fair game because money is the only decider ?

this superficial thinking is full of holes from the first examination, and, actively harms others.. and is an excuse to ignore the statements of an audiobook narrator here.


The premise of this argument is false. Pre-agriculture people were food supply constrained. Nobody is audiobook or other entertainment supply constrained today. And worse, modern farm produce is effectively worth zero. In many cases farmers are paid NOT to produce specific goods. And those who do MUST produce at purely artificial levels as to require the use of unsustainable, patented, and specialized chemicals or GMOs to break even. This entire line of research leads to spam and waste.


I'll go further and say that audiobook production is not cost constrained unless the marketable value of the work is extremely low. What we get is cheap audiobooks for which there is no / low demand, and what it costs us is the decimation of the limited audiobook economy. That's happening at the same time as a billion new / fully generated works hit the market and overwhelm our ability to curate the supply and provide meaningful discovery. Again, more spam. Then AI spam to promote these valueless works. Awesome.


>The premise of this argument is false. Pre-agriculture people were food supply constrained. Nobody is audiobook or other entertainment supply constrained today.

your premise is not false, but your conclusions are. see "indifference curves" in econ 101, and Pareto optimality.

We take consumer preferences as a given, because I don't know why you choose not to spend all of your money on the best and most pure essentials for life, but instead take some amount of your money and buy alcohol or skateboards or any of a number of other downright dangerous inessential things that you enjoy. You even pay money to GP to listen to his audio books when you could read them yourself and make money selling your own recordings. We don't know why you behave the way you do, but that's your choice. Given that you pay money for GP's audiobooks, if computer generated audiobooks drove the price down to zero, that would give you more money for alcohol; and it would give the rest of the economy a worker, GP, who could now participate in creating other products you'd probably want to buy (maybe dangerous jet-suits?) with the extra money you've saved on audiobooks.

We don't need to figure out how everybody wants to spend their time or their money; people figure that out for themselves and markets emerge to accommodate them.

We do need to figure out where the negative externalities lie, which you are attempting to do, but knowing what qualities mark them as externalities will help you effectuate change by working with the market instead of failing against the market.


It really depends on what you use this for. For recreational use on novels, high-quality human narrated audiobooks are surely still worth the money. Good pod-casts and radio-shows are overwhelmingly research, curation and writing combined with an engaging narration.

This will only do narration, and the engagement is probably still not 100% there yet (sorry, can't try it right now).

This kind of thing is very useful to consume high-level information on the side, while driving, cooking, gardening or doing exercise. So it can be useful to make previously curated and written content more accessible. Including content people have curated themselves, or got a bot to curate for them.

For example, I listened to the entire FT weekend edition while cycling on the weekend, using their text-to-audio function. This allowed me to take in even parts of the paper I normally do not have time to read. Before the advent of the text-to-speech function, I would have had to choose between health and information. Now I can have both.


I consume many audiobooks, and I usually either love a narrator to the point that they are a value add in themselves, or I hate the way they speak and literally can't read the book. The former is very rare, though, and the latter much more common.

The ability to change voices to one that suits a person's taste is hardly a race to the bottom. It is a HUGE value add.

I am sure lamplighters were not happy about the light bulb either.

c'est la vie.

Breaking the audible monopoly sounds like a nice side effect too.


For what it is worth, I still listen to and love human-read audiobooks! However, it is particularly useful to have an AI option for books that are too niche for there to be an incentive for an individual to narrate them. Lots of academic and personal texts fall into those categories.


Yeah, I'm painstakingly reading out a novel I wrote, and recording is hard. We had to scrap last night's recording because there was a hum at the place where we are recording, and even after turning off all the breakers we couldn't figure out where it was coming from. Turns out it was a bathroom fan that was hooked to a different breaker in the house.

I'll be writing some music for the intros of chapters and some special sections for suspense.


Hums are the worst. And the quieter your space, the more hums you seem to hear. Good luck!

There's a few good folks on youtube that discuss some of the more nitty gritty details, if you're interested.


Tangentially related: I like to leave my phone at home when I go exercise, and just listen to books via my watch (21st century problems...)

But to this date I cannot use Apple's Books app on the watch to listen to audiobooks I have on mp3/mp4a/... It only works with audiobooks you have purchased in their walled garden.


Yeah unfortunately I am not sure about the Apple Watch. On iOS I personally use BookPlayer [0] and find it easy to transfer mp3 files via USB. I think there are cloud sync options as well. Been very happy with that if you are looking for other mobile options.

[0] https://apps.apple.com/us/app/bookplayer/id1138219998


Is Piper currently the best open source TTS model? I occasionally review open models to see if they match elevenlabs and have been disappointed. However, Piper sounds better than the last time I listened around.


Listening to the piper demos [1] and comparing to coqui [2], I'd say coqui sounds better to me, but I'd love to hear others' opinions. Looks like Piper's latest commits were 3 months ago [3] while Coqui's were 8 months ago [4], so they both seem similar in recency. In terms of ease of use though, especially with this project, personally Piper seems way less overwhelming.

[1] https://rhasspy.github.io/piper-samples/ [2] https://huggingface.co/spaces/coqui/xtts [3] https://github.com/rhasspy/piper [4] https://github.com/coqui-ai/TTS


For anyone who is interested, CoquiTTS (formerly MozillaTTS) was great, but the project isn't maintained anymore (although there's been some confusion about whether or not it's active; see https://github.com/coqui-ai/TTS/issues/4022).

Looks like there's an effort to keep an actively maintained fork here, though: https://github.com/idiap/coqui-ai-TTS


Very interesting. I have listened to an AI audiobook once, and although the inflection was somewhat jarring at first, you kind of get used to it. I suppose it's good enough for your own use. And, audiobook prices being what they are, a rather affordable one as well.


Yeah, that is totally fair. In my experience, after a while your brain starts to tune out some of the inflection differences. Piper models are honestly pretty solid as well. I think in general, AI audiobook solutions like mine are better for non-fiction compared to fiction. Or at least that is what I read the most of.


A very cool project. You should build a website interface; you could easily charge for it, or take donations or advertise on it if you want to keep it free.

What would it take to add a specific language to piper? And do you know a good speech to text model?


Thank you! Wouldn't a website interface then make it compete with, and thus be inferior to, solutions like those from ElevenLabs? I am not opposed to creating a SaaS offering, but I feel I have neither the economies of scale nor the proprietary models a large company has. Let me know if I am wrong! Maybe I will one day do something as a separate project in the browser with WebGPU.

With regards to adding languages, first check if support already exists [0]. Then there are a few tutorials that might be relevant [1] [2] [3]. Once you have the onnx model you can just put it in the QuickPiperAudiobook model directory and specify it via the cli args.

[0] https://rhasspy.github.io/piper-samples/ [1] https://github.com/rhasspy/piper/issues/51 [2] https://github.com/rhasspy/piper/blob/master/TRAINING.md [3] https://www.youtube.com/watch?v=b_we_jma220


> And do you know a good speech to text model?

OpenAI's whisper, code+model are available, and multiple projects have built on it. You could try this wrapper: https://github.com/m-bain/whisperX -- or for short utterances on a smart-phone https://github.com/futo-org/whisper-acft


Deepgram is another alternative. I use it at work; it is the fastest service I've tried and also relatively cheap. But Whisper is better for self-hosting.


Very nice, I will give it a try! I looked at piper a bit, does this support multi speaker models?


Thanks! At the moment, I don't think there are any public multispeaker models I am aware of. I could be wrong though!


This is awesome, it was pretty easy to set up and start using it.

I have just one question/note to make: I tried a book in Mexican Spanish and noticed that it fails to catch the accents on the words (emphasis on words with tildes and strong accents on that syllable). I am thinking it is because of the .pdf parsing, since the Piper voice samples on their webpage handle it properly (on both available voices).

Do you have an idea of what could exactly be happening and how I can try to solve it?

Thank you very much for the tool again!!!

Update: Ohh ok, I just checked the repo issues and found the one about Polish accents. I tried "--speak-diacritics" but got the same "Error: failed to read file passed as input to piper: read /tmp/ebook-convert-xxxxxxx.txt file already closed". If I skip the diacritics option it converts fine.


Update 2: I went to look at the code and although I have never done anything with Go I was pleased with how easy it is to read plus your code was pretty well structured.

I realized the removal of diacritics was happening at the function RemoveDiacritics inside lib/textProcessing.go on line 26 and modified the definition(?) to not modify special characters, compiled again and voila! It worked great.

After that I used Calibre to convert a couple .pdfs to .txt and with a pretty simple python script got rid of page footnotes/headers/page_numbers and I just ended up with pretty decent Audiobooks.

Thanks again for the great tool!
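
The cleanup script mentioned above is not shown in the thread. As a rough idea of what such a pass might do, here is a minimal Go sketch under two stated assumptions: page numbers sit alone on a line (optionally prefixed by "Page"), and running headers or footers repeat verbatim many times. `stripPageFurniture` and its `minRepeats` threshold are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pageNumber matches a line holding only digits, optionally
// prefixed by the word "Page" (case-insensitive).
var pageNumber = regexp.MustCompile(`(?i)^\s*(page\s+)?\d+\s*$`)

// stripPageFurniture drops bare page-number lines and any non-blank
// line that occurs at least minRepeats times, on the assumption that
// such lines are running headers or footers. Hypothetical helper.
func stripPageFurniture(text string, minRepeats int) string {
	lines := strings.Split(text, "\n")

	// First pass: count how often each trimmed, non-blank line occurs.
	counts := make(map[string]int)
	for _, l := range lines {
		if t := strings.TrimSpace(l); t != "" {
			counts[t]++
		}
	}

	// Second pass: keep everything that is neither a page number
	// nor a frequently repeated line.
	kept := make([]string, 0, len(lines))
	for _, l := range lines {
		t := strings.TrimSpace(l)
		if pageNumber.MatchString(l) {
			continue
		}
		if t != "" && counts[t] >= minRepeats {
			continue
		}
		kept = append(kept, l)
	}
	return strings.Join(kept, "\n")
}

func main() {
	sample := "My Book\nFirst paragraph.\n12\nMy Book\nSecond paragraph.\nPage 13\nMy Book\nThird."
	fmt.Println(stripPageFurniture(sample, 3))
}
```

The repeat threshold is a blunt heuristic: a short line of real text that happens to recur often would be dropped too, so it is worth checking the output on each book before generating audio.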


Would this dedrm my audible stuff?



Can do it with plain old ffmpeg:

https://news.ycombinator.com/item?id=23541424


If you are looking to dedrm ebooks, that can be done via a calibre plugin. For audiobooks, I am not sure.



