
I would say OpenAI's Whisper just works. A nice GUI wrapper that leverages Metal/GPU/co-processors is "Hello Transcribe".

Whisper transcribes conversations from audio files. Hello Transcribe is a GUI wrapper, made by someone else, that uses Whisper under the hood to produce subtitle files with timestamps.
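As a rough illustration of the subtitle step: the openai-whisper Python package's `transcribe()` returns a dict whose `"segments"` entries carry `start` and `end` (in seconds) and `text`. The sketch below turns segments of that shape into SRT subtitle text. It is not Hello Transcribe's actual code; `format_timestamp` and `segments_to_srt` are hypothetical names.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments that have 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each SRT cue: sequence number, time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up segments shaped like whisper's result["segments"].
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 65.25, "text": " How are you?"},
    ]
    print(segments_to_srt(demo))
```

With the real package, something like `whisper.load_model("base").transcribe("audio.mp3")["segments"]` should yield segments of this shape ("audio.mp3" is a placeholder path).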

It does not distinguish between speakers, though.

I wish that application considered other use cases, as I pretty much never want subtitle files and never plan to use them.

There are techniques to distinguish between speakers (speaker diarization), but I haven't seen anyone combine them into a nice app that leverages the GPU.
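One common way to combine the pieces is to run a separate diarization step that outputs (start, end, speaker) turns, then label each Whisper segment with the speaker whose turns overlap it the most. The minimal sketch below assumes both inputs already exist as plain Python data; the diarizer itself is not shown, and `assign_speakers` and the `SPEAKER_A`/`SPEAKER_B` labels are hypothetical.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker who talks most during it.

    segments: dicts with 'start', 'end', 'text' (e.g. Whisper output).
    turns: (start, end, speaker) tuples from some diarization step (assumed).
    Segments that overlap no turn get speaker None.
    """
    labeled = []
    for seg in segments:
        # Total overlapping talk time per speaker within this segment.
        talk_time = defaultdict(float)
        for t_start, t_end, speaker in turns:
            talk_time[speaker] += overlap(seg["start"], seg["end"], t_start, t_end)
        best = max(talk_time, key=talk_time.get, default=None)
        if best is not None and talk_time[best] == 0.0:
            best = None
        labeled.append({**seg, "speaker": best})
    return labeled


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 3.0, "text": "Hi, how are you?"},
        {"start": 3.0, "end": 6.0, "text": "Fine, thanks."},
    ]
    turns = [(0.0, 3.2, "SPEAKER_A"), (3.2, 6.0, "SPEAKER_B")]
    for seg in assign_speakers(segments, turns):
        print(seg["speaker"], seg["text"])
```

Majority-overlap assignment is simple but coarse: a Whisper segment that spans a speaker change still gets a single label, so finer results would need word-level timestamps.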