I work at a startup doing STT (speech-to-text). We have reached state of the art in our language: the lowest CER in the industry. The main reason we are doing well is not that we have smart engineers tuning fancy models, but that we developed a novel method to collect a tremendous amount of usable data from the internet (crawling speech paired with text transcripts, using subtitles from movies, etc.). Implementing an interesting paper improves results by 1%; pouring in more data improves them by 10%. I guess this is why the big players don't disclose what data they've used.
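To make the subtitle idea concrete, here is a minimal sketch (not the poster's actual pipeline; all names are illustrative) of the first step: parsing an SRT subtitle file into timed text segments, which could then be paired with the corresponding audio slices as weak STT labels.

```python
import re

# Hypothetical illustration: parse SRT subtitle blocks into
# (start_sec, end_sec, transcript) tuples. In a real pipeline the
# timestamps would be used to cut the matching audio segments.
SRT_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(srt_text):
    """Yield (start_sec, end_sec, transcript) for each subtitle block."""
    segments = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # need index line, timing line, and text
        match = SRT_TIME.match(lines[1])
        if not match:
            continue
        g = match.groups()
        start = to_seconds(*g[:4])
        end = to_seconds(*g[4:])
        text = " ".join(lines[2:])
        segments.append((start, end, text))
    return segments

sample = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,250
General Kenobi!"""

print(parse_srt(sample))
```

The catch, of course, is that subtitles are only loosely aligned to speech, so a real system would need forced alignment and filtering on top of this.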
It costs a fortune to collect even 100 hours of clean speech-to-text labeled data, and a model trained on that little will never meet user expectations in the market.
We have also developed an internal framework that streamlines the pretraining-to-finetuning pipeline for subtasks. After months of use it required a lot of refactoring to accommodate all the different forms of DNN models. I would like to hear from other ML engineers: do you have an internal framework that generalizes over at least one subfield of DNNs (NLP, vision, speech, etc.)?
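For discussion's sake, here is a minimal sketch of the kind of abstraction such a framework might expose (not their actual framework; `Backbone`, `TaskHead`, and `Pipeline` are hypothetical names, and the "training" is a stand-in for real optimization):

```python
# Hypothetical sketch: one way to structure a pretrain -> finetune
# pipeline so a single shared backbone can be reused across subtasks.

class Backbone:
    """Shared encoder; in practice this would be a DNN (e.g. a transformer)."""
    def __init__(self):
        self.weights = {"encoder": 0.0}
        self.pretrained = False

    def pretrain(self, corpus):
        # Stand-in for self-supervised pretraining on a large corpus.
        self.weights["encoder"] = float(len(corpus))
        self.pretrained = True
        return self

class TaskHead:
    """Small task-specific layer attached on top of the backbone."""
    def __init__(self, task_name):
        self.task_name = task_name
        self.weights = {"head": 0.0}

class Pipeline:
    def __init__(self, backbone):
        self.backbone = backbone
        self.heads = {}

    def finetune(self, task_name, labeled_data):
        # Stand-in for supervised finetuning of a per-task head.
        assert self.backbone.pretrained, "pretrain before finetuning"
        head = TaskHead(task_name)
        head.weights["head"] = float(len(labeled_data))
        self.heads[task_name] = head
        return head

pipe = Pipeline(Backbone().pretrain(corpus=["a"] * 1000))
pipe.finetune("stt", labeled_data=["b"] * 100)
pipe.finetune("speaker_id", labeled_data=["c"] * 50)
print(sorted(pipe.heads))  # both subtasks share one pretrained backbone
```

The refactoring pain mentioned above tends to show up exactly at the `Backbone`/`TaskHead` boundary: different model families disagree on what the shared interface should be.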
What's an example of an NLP application that "meets user expectations" in the wild? Google's natural-language features just annoy me, and neither Facebook's nor Google's translation seems good despite multiple breakthrough announcements.
Edit: SOTA (state of the art) is invoked incredibly glibly here, as if it were enough to make a given application a success. That seems like a massive overstatement.