Hacker News

I work at a startup doing STT (speech-to-text). We have reached SOTA, the lowest CER (character error rate) in our language among industry systems. The main reason we are doing well is not that we have smart engineers tuning fancy models, but that we developed a novel method to collect a tremendous amount of usable data from the internet (crawling speech with text transcripts, using subtitles from movies, etc.). Implementing an interesting paper improves results by 1%; pouring in more data improves them by 10%. I guess this is why the big players aren't disclosing what data they've used. It costs a fortune to collect even 100 hours of clean speech-to-text labeled data, and a model trained on that little will never meet user expectations in the market.
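For context on the metric mentioned above: CER is typically computed as the character-level edit distance between the hypothesis and the reference transcript, divided by the reference length. A minimal sketch (an illustration, not the commenter's actual evaluation code):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edits needed per reference character."""
    return edit_distance(hypothesis, reference) / len(reference)
```

For example, `cer("helo world", "hello world")` is 1/11, since one insertion repairs the hypothesis against an 11-character reference.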



Also, we have developed an internal framework that eases the pretraining-to-fine-tuning-to-subtask pipeline. After months of use it required a lot of refactoring to match the needs of all forms of DNN models. I would like to hear from other ML engineers whether they have an internal framework that generalizes over at least one subfield of DNNs (NLP, vision, speech, etc.).
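The pipeline shape described above (one shared pretrained encoder, many task-specific fine-tunes) can be sketched as an interface. Everything here is hypothetical; the names `Checkpoint`, `Task`, `pretrain`, and `finetune` are invented for illustration and are not the commenter's framework:

```python
from dataclasses import dataclass, field

@dataclass
class Checkpoint:
    """Trained weights plus the optimizer step they were saved at."""
    weights: dict = field(default_factory=dict)
    step: int = 0

class Task:
    """One subtask (e.g. transcription, keyword spotting) reusing the encoder."""
    def __init__(self, name: str):
        self.name = name

    def head_params(self) -> dict:
        # Freshly initialized task-specific head (stand-in values).
        return {f"{self.name}/head": 0.0}

def pretrain(steps: int) -> Checkpoint:
    # Stand-in for (self-)supervised pretraining on the bulk speech corpus.
    return Checkpoint(weights={"encoder": 1.0}, step=steps)

def finetune(ckpt: Checkpoint, task: Task, steps: int) -> Checkpoint:
    # Start from the shared encoder, attach a task head, train further;
    # the base checkpoint is copied, not mutated, so other tasks can reuse it.
    weights = dict(ckpt.weights)
    weights.update(task.head_params())
    return Checkpoint(weights=weights, step=ckpt.step + steps)
```

Usage would be `base = pretrain(1000)` once, then `finetune(base, Task("stt"), 100)` per subtask; the design point is that the generic part stays fixed while each task only supplies its head and data.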


Hi, my name is Alexander. I am an author of both Gradient pieces, Open STT, and silero.ai.

Interesting. We did mostly the same. Did you open-source your data as well?


I don't think my superiors are going to open-source our code. Thanks for letting me know about your project.


What's an example of an NLP application that "meets user expectations" in the wild? Google's natural-language stuff just annoys me. Facebook's and Google's translation don't seem good even after multiple breakthrough announcements.

Edit: SOTA, state of the art, is used incredibly glibly here, as if it were enough to make a given application a success. That seems like a massive overstatement.


Lots of things are translated entirely by deep translation systems with some light post-editing in the wild.



