
There seems to be a small error in the reported results: In most rows the model that did better is highlighted, but in the row reporting results for the FLEURS test, it is the losing model (Gemini, which scored 7.6% while GPT4-v scored 17.6%) that is highlighted.



That row says lower is better. For "word error rate", lower is definitely better.
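For context, word error rate is the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words, so a lower score means fewer transcription mistakes. A minimal sketch (the function name and the plain dynamic-programming edit distance are my own illustration, not taken from the linked report):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

So a perfect transcript scores 0.0, and e.g. dropping one word out of a six-word reference gives 1/6, which is why a 7.6% score beats a 17.6% one here.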

But they also used Large-v3, which I have never seen outperform Large-v2 in even a single case. I have no idea why OpenAI even released Large-v3.


The text beside it says "Automatic speech recognition (based on word error rate, lower is better)"





