Two-layer neural networks are universal approximators. Given enough units in the hidden layer, enough data, and enough computation, they can approximate essentially any relationship between inputs and outputs.
(Strictly, any relationship with at most a finite number of discontinuities, which covers everything we care about here.)
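To make that concrete, here is a minimal sketch (mine, not anything canonical): a one-hidden-layer network fitted with plain gradient descent to a nonlinear target. The hidden width, learning rate, step count, and the choice of sin(x) as the target are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a smooth nonlinear relationship for the network to learn.
x = np.linspace(-np.pi, np.pi, 256).reshape(-1, 1)   # (256, 1)
y = np.sin(x)                                        # (256, 1)

# One hidden layer ("two-layer network"): y_hat = tanh(x W1 + b1) W2 + b2
hidden = 32
W1 = rng.normal(0.0, 1.0, (1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, 1))
b2 = np.zeros(1)

lr = 0.1
for step in range(10_000):
    # Forward pass
    h = np.tanh(x @ W1 + b1)      # (256, hidden)
    y_hat = h @ W2 + b2           # (256, 1)
    err = y_hat - y

    # Backward pass for mean squared error
    grad_out = 2.0 * err / len(x)                  # dL/dy_hat
    grad_W2 = h.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)    # tanh derivative
    grad_W1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    # Plain gradient descent update
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

# With enough hidden units and training, this error can be driven
# arbitrarily low; that is the universal approximation claim.
print("mean squared error:", float(np.mean((y_hat - y) ** 2)))
```

Widen the hidden layer or train longer and the fit gets as close as you like; nothing about the setup cares that the target happened to be a sine wave.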
But more layers, and recurrent layers, let deep learning models learn complex relationships with far fewer parameters, far less data, and far less computation.
Fewer parameters (relative to the complexity of the data and the performance required of the model) mean more compressed, more meaningful representations.
The point is that you can’t claim a deep learning model has only learned associations, correlations, conditional probabilities, Markov chains, etc.
Because architecturally, it is capable of learning any kind of relationship.
That includes functional relationships.
Or anything you or I do.
So any critique of the limits of large language models needs to present clear evidence for the specific thing the model is claimed not to be doing.
Not just some assumed limitation that has not been demonstrated.
—
Second thought. People make all kinds of mistakes. Including very smart people.
So pointing out that an LLM has trouble with some concept doesn't, on its own, demonstrate a fundamental limit.
Especially given these models already contain more concepts, spanning more human domains, than any one of us has ever been exposed to.