
It doesn't help that, thanks to RLHF, whenever a good example of this gains popularity (e.g. "How many Rs are in 'strawberry'?"), it's often snuffed out quickly. If I worked at a company with an LLM product, I'd build tooling to look for these kinds of examples on social media or directly in usage data so they could be prioritized for fixes. I don't know how to feel about this.

On the one hand, it's sort of like red teaming. On the other hand, it clearly gives consumers a false sense of ability.




Indeed. Which is why I think the only way to really evaluate the progress of LLMs is to curate your own personal set of example failures that you don't share with anyone else, and to use it only via APIs that provide some sort of no-data-retention and no-training guarantee.
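
For what it's worth, here is a rough sketch of what such a private suite could look like. Everything in it is an assumption for illustration: the JSON layout, the substring check, and `ask_model`, which is a hypothetical callable you would write yourself around whichever API gives you the retention guarantees you want.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    prompt: str    # a prompt the model has failed on before
    expected: str  # substring a correct answer must contain (assumed scoring rule)


def load_cases(path: str) -> List[Case]:
    """Load private cases from a local JSON file: [{"prompt": ..., "expected": ...}, ...]."""
    with open(path, encoding="utf-8") as f:
        return [Case(**row) for row in json.load(f)]


def run_suite(cases: List[Case], ask_model: Callable[[str], str]) -> Tuple[int, List[Case]]:
    """Send each prompt through ask_model; return (number passed, list of failed cases)."""
    failures = []
    for case in cases:
        answer = ask_model(case.prompt)
        # Case-insensitive substring match; crude, but enough to track regressions.
        if case.expected.lower() not in answer.lower():
            failures.append(case)
    return len(cases) - len(failures), failures
```

Keep the JSON file local and out of any public repo, and rerun the same suite against each new model release to compare pass counts over time.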



