Maybe in SOTA ML/NLP research, but in the world of building useful tools and products, BERT models are dead simple to tune, work great if you have decent training data, and, most importantly, are very, very fast and very, very cheap to run.

I have a small Swiss army collection of custom BERT fine-tunes that are equal to or better than the best LLMs and execute document classification tasks in 2.4ms. Find me an LLM that can do anything in 2.4ms.
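
For anyone curious, a fine-tune like that is only a few lines with the Hugging Face transformers library. A minimal sketch, assuming CSVs with "text" and "label" columns; the base model, label count, and file names are placeholders, not my actual setup:

    # Minimal BERT fine-tune sketch for document classification.
    # Base model, label count, and data files are placeholders; the CSVs are
    # assumed to have "text" and integer "label" columns.
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              DataCollatorWithPadding, Trainer, TrainingArguments)

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=4)

    ds = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})
    ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512), batched=True)

    args = TrainingArguments(output_dir="bert-doc-clf", num_train_epochs=3,
                             per_device_train_batch_size=32, learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=ds["train"],
            eval_dataset=ds["test"],
            data_collator=DataCollatorWithPadding(tok)).train()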




Latency, throughput and cost are still very important for many applications.

Also the output of a purpose-built encoder model is preferable to natural language. Not only is it unambiguous, but scores are often an important part of the result.
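
Concretely, a fine-tuned encoder hands you a label plus a score rather than free-form text, something like this (the model path and example output are placeholders):

    # A fine-tuned encoder returns a label and a score, not free-form text.
    # The model path and the printed output below are placeholders.
    from transformers import pipeline

    clf = pipeline("text-classification", model="path/to/finetuned-bert")
    print(clf("quarterly earnings beat expectations"))
    # -> [{'label': 'FINANCE', 'score': 0.97}]   (illustrative)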

Last, if you need to get into some advanced methods of training, like pseudolabeling and semi-supervised learning, there are different options and outlets for utilizing real-world datasets.
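
Pseudolabeling, in its simplest form, is just running the current model over unlabeled documents and keeping only the confident predictions as extra training data. A rough sketch; the 0.95 threshold and names are arbitrary:

    # Rough pseudolabeling sketch: keep confident predictions on unlabeled docs
    # as extra training examples. The threshold and names are arbitrary.
    import torch

    def pseudolabel(model, tok, unlabeled_texts, threshold=0.95):
        model.eval()
        kept_texts, kept_labels = [], []
        with torch.no_grad():
            for text in unlabeled_texts:
                enc = tok(text, truncation=True, max_length=512, return_tensors="pt")
                probs = torch.softmax(model(**enc).logits, dim=-1)[0]
                conf, label = probs.max(dim=-1)
                if conf.item() >= threshold:       # only trust confident predictions
                    kept_texts.append(text)
                    kept_labels.append(label.item())
        return kept_texts, kept_labels             # merge into the labeled set and retrain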

That said, I’m not sure there’s much value in scaling up current encoder models. It seems like there’s already a point of diminishing returns.


Scores are also interesting in that you can get a 1.0 match on a classification task, but if the model is dog shit, it's 1.0 of dog shit.

I’m still struggling with the degree to which I want to expose raw scores to users for that reason.

On the other hand, sometimes a document that's slightly above an arbitrary threshold isn't great, while a document slightly below it may be fine.

I’m excited about how easy it is to do this stuff; the tooling is sophisticated enough now that you don’t need to know too much about the underlying mechanisms to do things that are useful. Once you get into those very fine distinctions, though, it’s still very difficult work.
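
One middle ground is exposing coarse bands rather than raw scores or a single hard cutoff; roughly (band edges here are arbitrary):

    # Sketch: map a raw classifier score to a coarse band instead of exposing
    # the raw number or a single hard cutoff. Band edges are arbitrary.
    def confidence_band(score: float) -> str:
        if score >= 0.90:
            return "high"
        if score >= 0.60:
            return "medium"
        return "low"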


Want to share your collection with the class so we can all learn? Seems useful.


Product in stealth for a little bit longer, so can’t say much. :-)


What does your Swiss army collection do?


Document classification in a highly ambiguous contextual space. Solving some specific large-scale classification tasks, so multi-million-document sets.


What technique do you use to get BERT to work on longer documents?


The standard 512-token limit has been sufficient to solve my problems. I made some initial attempts with BigBird that weren’t going well, but then realized I didn’t really need it.

I may revisit at some point.
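
In practice, "just use 512" means letting the tokenizer truncate; a small sketch, with the model name and input file as placeholders:

    # "Just use 512" in practice: let the tokenizer truncate long documents.
    # Model name and the input file are placeholders.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    document_text = open("some_document.txt").read()    # placeholder input
    enc = tok(document_text, truncation=True, max_length=512, return_tensors="pt")
    # If 512 ever stops being enough, one common fallback is chunking the
    # document and pooling the per-chunk predictions.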


Yeah, pretty much. When you have 2B files you need to trawl through, good luck using anything but a vector database. Once you do a level or two of pruning on the results, you can feed them into an LLM for final classification.
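
The shape of that pipeline, in code, is roughly this; the embedding model, the flat FAISS index, and the LLM call are all placeholders (at billions of files you'd use an approximate index instead):

    # Rough shape of the two-stage pipeline: prune with vector search first,
    # then send only the survivors to an LLM. Model names, index choice, and
    # the LLM call are placeholders.
    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    # Offline: embed the corpus and build an index.
    corpus = ["first document ...", "second document ..."]     # placeholders
    vecs = embedder.encode(corpus, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])    # inner product = cosine on normalized vecs
    index.add(np.asarray(vecs, dtype="float32"))

    # Online: prune to a handful of candidates per query.
    query = embedder.encode(["query text"], normalize_embeddings=True)
    _, ids = index.search(np.asarray(query, dtype="float32"), 2)

    # Final stage: only pruned candidates reach the (hypothetical) LLM classifier.
    candidates = [corpus[i] for i in ids[0]]
    # labels = [classify_with_llm(doc) for doc in candidates]  # hypothetical call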



