This is called speaker diarization, basically one of the 3 components of speaker... | Hacker News


sva_ on Dec 7, 2022 | on: OpenAI quietly launched Whisper V2 in a GitHub com...

This is called speaker diarization, basically one of the 3 components of speaker recognition (verification, identification, diarization).
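To make the three tasks concrete, here is a toy sketch in plain Python. It assumes each utterance or segment has already been turned into a fixed-length speaker embedding. Real systems, pyannote-audio included, get these from a neural model and use more robust clustering. The vectors below are hand-written, and the cosine threshold of 0.7 is an arbitrary assumption. All function names (`verify`, `identify`, `diarize`) are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two (non-zero) embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def verify(test_emb, claimed_emb, threshold=0.7):
    """Verification (1:1): does this voice match the claimed speaker?"""
    return cosine(test_emb, claimed_emb) >= threshold

def identify(test_emb, enrolled):
    """Identification (1:N): which enrolled speaker is the closest match?"""
    return max(enrolled, key=lambda name: cosine(test_emb, enrolled[name]))

def diarize(segment_embs, threshold=0.7):
    """Diarization ("who spoke when"): greedily cluster time-ordered segment
    embeddings into anonymous speakers. No enrollment is needed."""
    sums = []    # running sum of embeddings per cluster (same direction as the mean)
    labels = []
    for emb in segment_embs:
        best, best_sim = None, -1.0
        for i, s in enumerate(sums):
            sim = cosine(emb, s)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            sums[best] = [a + b for a, b in zip(sums[best], emb)]
        else:
            sums.append(list(emb))  # start a new speaker cluster
            best = len(sums) - 1
        labels.append(f"SPEAKER_{best:02d}")
    return labels

if __name__ == "__main__":
    enrolled = {"alice": [1.0, 0.0, 0.1], "bob": [0.0, 1.0, 0.1]}
    probe = [0.9, 0.1, 0.1]
    print(verify(probe, enrolled["alice"]), identify(probe, enrolled))
    print(diarize([[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.1, 0.9, 0], [1, 0.05, 0]]))
```

The key difference is what you compare against. Verification checks one claimed identity, and identification picks from a known set. Diarization has no enrolled speakers at all; it only groups segments that sound alike.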

You can do this pretty conveniently using pyannote-audio [0].
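A diarization pipeline outputs speaker turns, i.e. (start, end, speaker) intervals. With pyannote-audio these can be read from the pipeline's output annotation, e.g. via `itertracks(yield_label=True)`; check the project README for the current pipeline API. A common next step when the same audio is also transcribed (for example with Whisper, the subject of the parent thread) is to give each transcript segment the speaker whose turns overlap it the most. The sketch below assumes both inputs are already plain tuples. The timestamps are made up for illustration, and `label_transcript` is a hypothetical helper, not part of either library.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_transcript(transcript, turns):
    """transcript: [(start, end, text)], turns: [(start, end, speaker)].
    Returns [(speaker, text)], using the speaker with the most total overlap,
    or "UNKNOWN" when no diarization turn overlaps the segment."""
    labeled = []
    for start, end, text in transcript:
        totals = {}
        for t_start, t_end, speaker in turns:
            ov = overlap(start, end, t_start, t_end)
            if ov > 0:
                totals[speaker] = totals.get(speaker, 0.0) + ov
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append((speaker, text))
    return labeled

if __name__ == "__main__":
    # Illustrative, hand-written timestamps (seconds).
    turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.0, "SPEAKER_01"), (9.0, 12.5, "SPEAKER_00")]
    transcript = [
        (0.0, 4.0, "Hi, thanks for joining."),
        (4.1, 8.7, "Happy to be here."),
        (8.9, 12.0, "Let's get started."),
    ]
    for speaker, text in label_transcript(transcript, turns):
        print(f"{speaker}: {text}")
```

Summing overlap per speaker, rather than taking the single longest turn, handles transcript segments that straddle several short turns by the same person. Segments where two people talk over each other still get only one label, which is a limitation of this simple scheme.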

Coincidentally I did a small presentation on this at a university seminar yesterday :). I could post a Jupyter notebook if you're interested.

PS: Bai & Zhang (2020) is a great review of the literature [1].

[0] https://github.com/pyannote/pyannote-audio

[1] https://arxiv.org/abs/2012.00931

wanderingmind on Dec 7, 2022

Yes please, posting a Jupyter notebook would be a great help.

swyx on Dec 7, 2022

yes please!
