
If you're already doing local ffmpeg stuff (i.e. pretty involved with code and scripting already), you're only a couple of steps away from just downloading the openai-whisper models (or even the faster-whisper models, which run about twice as fast). Since this looks like personal usage and not production code, you can use AI (e.g. Cursor) to write a script that runs whisper model inference in seconds.

Then there is no cost at all to run any length of audio (since cost seems to be the primary concern of the article).

On my M1 Mac laptop it takes about 30 seconds to run on a 3-minute audio file. I'm guessing a 40-minute talk would take about 5-10 minutes.



Have you tried faster-whisper and whisper.cpp?


Yeah, the times I mentioned are with faster-whisper, but I have not tried whisper.cpp. I just use a Python script to run the model.



