I would say OpenAI's Whisper just works. A nice GUI wrapper that leverages Metal/GPU/co-processors is "Hello Transcribe".

Whisper transcribes speech from audio files. Hello Transcribe is a third-party GUI wrapper that uses Whisper under the hood to produce subtitle files with timestamps.
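
For anyone curious what "subtitle format with timestamps" boils down to, here is a rough sketch of turning timed segments into SRT text. It assumes each segment is a dict with `start`, `end` (in seconds) and `text` keys, which is the shape the open-source `whisper` Python package returns in `result["segments"]`. The function names are mine, and I have no idea how Hello Transcribe does this internally.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments shaped like {"start", "end", "text"}.

    The segment shape is an assumption modeled on Whisper's output.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_line}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical segments, made up for illustration only.
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " How are you?"},
    ]
    print(segments_to_srt(demo))
```

If you don't want subtitles at all, the same segments can just be joined into plain text and the timestamps ignored.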

It does not distinguish between speakers, though.

I wish that application considered other use cases, as I pretty much never want subtitle files and never plan to.

There are some techniques to distinguish between speakers, but I haven't seen anyone put the combination together in a nice GPU-leveraging app.
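
One common way to combine the pieces, as far as I understand it, is to run a separate speaker diarization step (a tool such as pyannote.audio, for example) and then give each transcript segment the speaker whose turn overlaps it the most. Below is a minimal sketch of just that merge step. The segment dicts and the `(start, end, speaker)` turn tuples are hypothetical shapes I picked for illustration, not the output format of any particular tool.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_speakers(segments, turns):
    """Attach a speaker label to each transcript segment.

    segments: list of {"start", "end", "text"} dicts (assumed shape).
    turns: list of (start, end, speaker) tuples from a diarization
           step (assumed shape).
    A segment that overlaps no turn is labeled "UNKNOWN".
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    # Made-up data, for illustration only.
    segments = [
        {"start": 0.0, "end": 2.0, "text": "Hi, thanks for joining."},
        {"start": 2.0, "end": 5.0, "text": "Happy to be here."},
    ]
    turns = [(0.0, 2.2, "SPEAKER_A"), (2.2, 6.0, "SPEAKER_B")]
    for seg in label_speakers(segments, turns):
        print(f"{seg['speaker']}: {seg['text']}")
```

This is the naive version. It will mislabel segments where two people talk over each other or where a segment spans a speaker change, which is probably part of why a polished app is harder than it sounds.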