I would say OpenAI's Whisper just works. A nice GUI wrapper that leverages Metal, the GPU, and co-processors is "Hello Transcribe".
Whisper transcribes conversations from audio files. Hello Transcribe is a third-party GUI wrapper that uses Whisper under the hood to create timestamped subtitle files.
It does not distinguish between speakers, though.
I wish that application considered other use cases, as I pretty much never want subtitle files and never plan to.
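For anyone who also just wants readable text, here is a minimal sketch of turning timestamped segments into plain paragraphs. It assumes segments shaped like the ones the open-source `whisper` Python package returns from `model.transcribe(...)` (dicts with `start`, `end`, and `text`). The function name `segments_to_text` and the 2-second pause threshold are my own illustrative choices, not anything from Hello Transcribe.

```python
def segments_to_text(segments, gap=2.0):
    """Join Whisper-style segments into paragraphs.

    A new paragraph starts whenever the silence between two
    consecutive segments is longer than `gap` seconds (an
    arbitrary threshold chosen for illustration).
    """
    paragraphs, current, prev_end = [], [], None
    for seg in segments:
        long_pause = prev_end is not None and seg["start"] - prev_end > gap
        if long_pause and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(seg["text"].strip())
        prev_end = seg["end"]
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


# Hand-written sample data in the same shape as result["segments"].
sample = [
    {"start": 0.0, "end": 1.5, "text": " Hello there."},
    {"start": 1.6, "end": 3.0, "text": " How are you?"},
    {"start": 6.5, "end": 8.0, "text": " Fine."},
]
print(segments_to_text(sample))

# With the whisper package installed, real input would come from
# something like (assumed file name):
#   import whisper
#   result = whisper.load_model("base").transcribe("meeting.mp3")
#   print(segments_to_text(result["segments"]))
```

If you don't care about paragraph breaks at all, the transcription result also carries the full transcript as a single string under `result["text"]`.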
There are some techniques to distinguish between speakers, but I haven't seen anyone put the combination together in a nice GPU-leveraging app.
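One common way to combine the pieces, as a rough sketch: run a separate speaker-diarization tool to get "who spoke when" turns, then give each transcribed segment the speaker whose turn overlaps it the most. The code below assumes the turns already exist as `(start, end, speaker)` tuples. That format, the `label_speakers` name, and the sample data are hypothetical, and this is not how any particular app does it.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_speakers(segments, turns):
    """Attach a speaker label to each Whisper-style segment.

    `segments`: dicts with "start", "end", "text".
    `turns`: (start, end, speaker) tuples from some diarizer
             (hypothetical input format).
    Segments that overlap no turn are labeled "unknown".
    """
    labeled = []
    for seg in segments:
        best, best_overlap = "unknown", 0.0
        for t_start, t_end, speaker in turns:
            o = overlap(seg["start"], seg["end"], t_start, t_end)
            if o > best_overlap:
                best, best_overlap = speaker, o
        labeled.append((best, seg["text"].strip()))
    return labeled


# Hand-written sample data for illustration only.
segments = [
    {"start": 0.0, "end": 2.0, "text": " Hi."},
    {"start": 2.0, "end": 5.0, "text": " Hello back."},
    {"start": 9.0, "end": 10.0, "text": " Bye."},
]
turns = [(0.0, 2.2, "A"), (2.2, 6.0, "B")]

for speaker, text in label_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

The weak spot is segments that span a speaker change: they get a single label here, so a real tool would probably need word-level timestamps to split them.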