
> I wonder if there's a way to automatically detect how "fast" a person talks in an audio file

Transcribe it locally using Whisper and output tokens/sec?



Just count syllables per second by doing an FFT plus some basic analysis.
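A minimal sketch of what "an FFT plus some basic analysis" could look like, assuming syllables show up as peaks in the speech-band energy envelope. The function name `syllable_rate`, the 300-3000 Hz band, the 25 ms / 10 ms framing, and the 0.3 threshold are all illustrative assumptions, not something the comment specifies, and real speech would need tuning:

```python
import numpy as np

def syllable_rate(samples, sample_rate, frame_ms=25, hop_ms=10, band=(300.0, 3000.0)):
    """Rough syllables-per-second estimate from a mono float signal (hypothetical helper)."""
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sample_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    # Short-time FFT: energy inside the assumed speech band for each frame.
    starts = range(0, len(samples) - frame + 1, hop)
    energy = np.array([
        np.sum(np.abs(np.fft.rfft(samples[s:s + frame] * window))[in_band] ** 2)
        for s in starts
    ])

    # Smooth the envelope with a 5-frame moving average to suppress jitter.
    envelope = np.convolve(energy, np.ones(5) / 5, mode="same")

    # Count local maxima that rise above a fraction of the loudest frame.
    threshold = 0.3 * envelope.max()
    mid = envelope[1:-1]
    peaks = (mid > envelope[:-2]) & (mid >= envelope[2:]) & (mid > threshold)
    return peaks.sum() / (len(samples) / sample_rate)
```

Each envelope peak is treated as one syllable nucleus, so background noise, music, or very fast speech will throw the count off.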


> FFT plus some basic analysis

Yeah, totally easier than `len(transcribe(a))/len(a)`
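Spelled out, the one-liner might look like the sketch below. It counts whitespace-separated words rather than model tokens, and it takes the transcriber as a parameter; `speaking_rate` and the `transcribe` callable are hypothetical names, not part of any library:

```python
def speaking_rate(transcribe, audio_path, duration_seconds):
    """Words per second for one audio file.

    `transcribe` is any callable mapping a file path to transcript text
    (an assumption of this sketch, not a real library function).
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    text = transcribe(audio_path)
    # Words are a stand-in for tokens here; swap in a tokenizer if needed.
    return len(text.split()) / duration_seconds
```

With the `openai-whisper` package installed, `transcribe` could be something like `lambda p: model.transcribe(p)["text"]`, where `model = whisper.load_model("base")`.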


Maybe not as quick to code up but way faster to calculate.

The tokens/second figure can then serve as ground-truth labels for training an FFT -> small neural net model.



