This take really misses a key part of how these LLMs are implemented, and I’ve been struggling to put my finger on it.

In every LLM thread someone chimes in with “it’s just a statistical token predictor”.

I feel this misses the point and I think it dismisses attention heads and transformers, and that’s what sits weird with me every time I see this kind of take.

There _is_ an assumption being made within the model at runtime. Assumption, confusion, uncertainty - one camp might argue that none of these exist in the LLM.

But doesn’t the implementation constantly make assumptions? And what even IS your definition of “assumption” that’s not being met here?

Edit: I guess my point, overall, is: what’s even the purpose of making this distinction anymore? It derails the discussion in a way that’s not insightful or productive.

> I feel this misses the point and I think it dismisses attention heads and transformers

Those just make it better at completing the text, but for very common riddles they still get easily overruled by pretty simple text-completion behavior, since the weights for those patterns will be so extremely strong.

The point is that if you understand it’s a text completer, then it’s easy to understand why it fails at these. To fix them properly you’d have to make it stop trying to complete text, and that is hard to do without breaking it.
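
To make “text completer” concrete, here’s a minimal sketch of the loop people mean, greedy next-token decoding with a Hugging Face-style causal LM. The model choice, prompt, and length are just illustrative, not anyone’s actual setup:

    # Minimal sketch: greedy next-token completion (illustrative, not a real deployment).
    # At each step the model only scores which token is most likely to come next;
    # that token is appended and the loop repeats.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")   # assumed/illustrative model
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    prompt = "A man and his son are in a car accident."  # hypothetical riddle-like prompt
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    for _ in range(20):
        with torch.no_grad():
            logits = model(input_ids).logits              # scores over the vocabulary
        next_id = logits[0, -1].argmax()                  # greedy: take the single most likely token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

    print(tokenizer.decode(input_ids[0]))

If the prompt looks almost exactly like a riddle the model has seen thousands of times, the highest-probability continuation is the memorized answer, which is why the small twist in the wording gets steamrolled.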
