> they’re simply statistical systems predicting the likeliest next words in a sentence

They are far from "simply" anything. For that "miracle" to happen (and we still don't understand why this approach works so well, I think, since we don't really understand what the model has learned from its data), they encode a HUGE number of relationships from their training data, and AFAIK for each generated token ALL of the available relationships need to be processed, hence the importance of huge memory speed and bandwidth.
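To make that concrete, here's a toy NumPy sketch of autoregressive decoding (single attention head, random weights, made-up sizes; this is nothing like a real GPT, just my own illustration): each new token only projects itself, but it attends over the ENTIRE key/value cache, which is why memory speed and bandwidth matter so much at inference time.

    import numpy as np

    rng = np.random.default_rng(0)
    d, vocab = 16, 100                         # toy sizes, purely illustrative
    E = rng.normal(size=(vocab, d))            # token embeddings
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    U = rng.normal(size=(d, vocab))            # projection back onto the vocabulary

    def attend(q, K, V):
        # one attention head: the newest token's query is scored against
        # EVERY cached key, so each step re-reads the whole context
        s = K @ q / np.sqrt(d)
        w = np.exp(s - s.max())
        w /= w.sum()
        return w @ V

    def generate(prompt_ids, n_new):
        ids, Ks, Vs = list(prompt_ids), [], []
        for i in ids[:-1]:                     # prefill the cache with the prompt
            x = E[i]
            Ks.append(x @ Wk)
            Vs.append(x @ Wv)
        for _ in range(n_new):
            x = E[ids[-1]]                     # only the newest token is projected...
            Ks.append(x @ Wk)
            Vs.append(x @ Wv)
            ctx = attend(x @ Wq, np.stack(Ks), np.stack(Vs))  # ...but it attends to everything
            ids.append(int(np.argmax(ctx @ U)))               # take the likeliest next token
        return ids

    print(generate([1, 2, 3], 5))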

And I fail to see why our human brains couldn't be doing something very, very similar with our language capability.

So beware of what we are calling a "simple" phenomenon...




> And I fail to see why our human brains couldn't be doing something very, very similar with our language capability.

Then you might want to read Cormac McCarthy's The Kekulé Problem https://nautil.us/the-kekul-problem-236574/

I'm not saying he is right, but he does point to a plausible reason why our human brains may be doing something very, very different.


That's an onus of proof fallacy (basically "go find the idea I'm referring to yourself"). You might want to clarify or distill the point from that publication rather than requiring someone to read through it.


Indeed. Nobody would describe a 150-billion-dimensional system as “simple”.


A simple statistical system based on a lot of data can arguably still be called a simple statistical system (because the system as such is not complex).


Last time I checked, a GPT is not something simple at all... I'm not the weakest person at maths (I coded a fairly advanced 3D engine from scratch a long time ago), and it still looks really complex to me. And we keep adding features on top of it that I can hardly follow...


It's not even true in a facile way for non-base-models, since those systems are further trained with RLHF -- i.e., the models are trained not just to produce the most likely token, but also to produce "good" responses, as judged by a reward model that was itself trained on human preference data.

Of course, even just within the regime of "next token prediction", the choice of training data influences what is learned, and to do a good job of predicting the next token the model necessarily has to build a rich internal understanding of the world described by the training set.
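To put the two objectives side by side in (very rough) code -- a bare REINFORCE-style sketch with made-up tensor shapes, not the actual PPO setup any lab uses: the base model minimizes plain next-token cross-entropy, while the RLHF-ish stage weights the log-probs of a sampled response by a reward model's score.

    import torch
    import torch.nn.functional as F

    def pretrain_loss(logits, targets):
        # base-model objective: plain cross-entropy on the next token
        # logits: (batch, seq, vocab), targets: (batch, seq)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

    def rlhf_style_loss(logits, sampled_ids, rewards):
        # reward-weighted objective over a sampled response:
        # logits: (batch, seq, vocab), sampled_ids: (batch, seq), rewards: (batch,)
        # the reward scores come from a model trained on human preference data,
        # so "good" responses get reinforced, not merely likely ones
        logp = F.log_softmax(logits, dim=-1)
        chosen = logp.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
        return -(rewards.unsqueeze(-1) * chosen).mean()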

See e.g. the fascinating report on Golden Gate Claude (1).

Another way to think about this: let's say you're a human who doesn't speak any French, and you are kidnapped, held in a cell, and subjected to repeated "predict the next word" tests in French. You would not be able to get good at those tests, I submit, without also learning French.

(1) https://www.anthropic.com/news/golden-gate-claude



