Hacker News

> we still need something that understands what it's doing enough to observe and catch that 0.01% where it's wrong.

Nobody has figured out how to get a confidence metric out of the innards of a neural net. This is why chatbots seldom say "I don't know", but, instead, hallucinate something plausible.
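The naive proxy people reach for is per-token softmax probability, and it illustrates the problem: a fluent hallucination can be emitted with high per-token probability, so the number reads as "confident" regardless of truth. A minimal sketch (the probabilities below are made up for illustration, not from any real model):

```python
import math

def sequence_confidence(token_probs):
    """Geometric-mean probability of a sampled token sequence --
    a crude, poorly calibrated 'confidence' proxy."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))

# Hypothetical per-token probabilities of a plausible-sounding
# but factually wrong completion:
plausible_but_wrong = [0.9, 0.8, 0.95, 0.85]
print(sequence_confidence(plausible_but_wrong))  # high, despite being wrong
```

The score measures fluency under the model's own distribution, not factual correctness, which is why it can't distinguish a confident right answer from a confident hallucination.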

Most of the attempts to fix this are hacks outside the LLM. Run several copies and compare. Ask for citations and check them. Throw in more training data. Punish for wrong answers. None of those hacks work very well. The black box part is still not understood.
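The first of those hacks, running several copies and comparing, amounts to majority voting over independent samples (often called self-consistency). A minimal sketch, with a made-up list of samples standing in for repeated model calls:

```python
from collections import Counter

def majority_vote(samples):
    """Return the most common answer and its vote share."""
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)

# Hypothetical answers from five independent runs of the same prompt:
samples = ["Paris", "Paris", "Lyon", "Paris", "Paris"]
answer, share = majority_vote(samples)
print(answer, share)  # Paris 0.8
```

Note the limitation: if the model hallucinates the same wrong answer consistently, the vote share is high anyway, which is why this works as a filter but not as a real confidence measure.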

This is the elephant in the room of LLMs. If someone doesn't crack this soon, AI Winter #3 will begin. There's a lot of startup valuation which assumes this problem gets solved.






> There's a lot of startup valuation which assumes this problem gets solved.

Not just solved, but solved soon. I think this is an extremely difficult problem - difficult enough that it may take new areas of computer science to even approach correctly - but we seem to assume that if we throw enough CPU and $$$ at it, the problem will work itself out. I'm skeptical.


Is there any progress? About two years ago, there were people training neural nets to play games, looking for a representation of the game state inside the net, and claiming to find it. That doesn't seem to be mentioned any more.

As for "solved soon", the market can remain irrational longer than you can stay solvent. Look at Uber and Tesla, both counting on some kind of miracle to justify their market cap.


I get the impression that most of the 'understand the innards' work isn't scalable - you build out a careful experiment with a specific network, but the work doesn't transfer to new models, fine-tuned models, etc.

I'm just an outside observer, though...


Tesla was mildly successful right up until its CEO started to fight its customers. It's unclear whether this will reverse.

Uber seems to have become sustainable this year.

There's little reason to expect a correction any time soon on either of those.



