
The LLM miracle comes from the massive amount of text we can train it on. Removing that advantage makes LLMs untenable. An idea I've had for a while is to do the opposite: generate nonsense text according to some complex formula, and have the AI learn to predict that. It can't possibly encode any facts, because there are no facts. Now show it English, and it will treat it just like any other sort of nonsense text that it's gotten good at learning to interpret.
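
To make "nonsense according to some complex formula" concrete, here's a rough Python sketch of the generation side. Everything in it (the helper names, the grammar shape, the parameters) is invented for illustration; the only point is that the rules are arbitrary, so the output encodes no facts about the world:

    import random

    def random_grammar(n_nonterminals=8, n_terminals=20, rules_per_nt=3, seed=None):
        # A randomly generated context-free grammar. Each nonterminal expands
        # to a short mix of terminals and strictly lower-numbered nonterminals,
        # so every derivation is guaranteed to terminate.
        rng = random.Random(seed)
        terminals = [f"t{i}" for i in range(n_terminals)]
        grammar = {}
        for nt in range(n_nonterminals):
            rules = []
            for _ in range(rules_per_nt):
                body = []
                for _ in range(rng.randint(1, 4)):
                    if nt > 0 and rng.random() < 0.4:
                        body.append(rng.randrange(nt))      # a smaller nonterminal
                    else:
                        body.append(rng.choice(terminals))  # a terminal symbol
                rules.append(body)
            grammar[nt] = rules
        return grammar

    def sample(grammar, rng, symbol=None):
        # Expand from the top nonterminal down to a flat list of terminals.
        if symbol is None:
            symbol = max(grammar)
        if isinstance(symbol, str):
            return [symbol]
        out = []
        for s in rng.choice(grammar[symbol]):
            out.extend(sample(grammar, rng, s))
        return out

    g = random_grammar(seed=0)
    rng = random.Random(0)
    print(" ".join(sample(g, rng)))  # prints a short string of structured nonsense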



But the idea you describe is exactly what would make the LLM stop working. The "LLM miracle" comes from the fact that all that text is not random[0]. There is a lot of information encoded in which phrases, sentences, and paragraphs have been written (and how often), vs. the vastly larger number of nearly identical texts that were not written. The "complex formula" used is... reality, as perceived and understood by people. LLMs pick up on that.

--

[0] - Well, most of it anyway; I bet the training set contains some amount of purely random text, for example technical articles discussing RNGs and showcasing their output. Some amount of noise is unavoidable.


The idea would be to generate a false "reality" for the LLM to learn about. You would randomly generate a system of rules, use those rules to generate text, and then train the LLM to predict that text. The goal would be to stop it from encoding reality proper in its weights, and instead get it to learn to pick up what a reality looks like very quickly from text.
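
As a sketch of what the training loop could look like (assuming PyTorch; the "rule system" here is a deliberately trivial stand-in, a random successor table rather than a full grammar): a brand-new rule system is drawn every step, so memorizing any one of them is useless, and the only way to reduce the loss is to infer the current rules from the context itself.

    import random
    import torch
    import torch.nn as nn

    VOCAB, SEQ_LEN = 32, 64

    def random_rules(rng):
        # A fresh "fake reality": each token is deterministically followed
        # by a randomly chosen successor token.
        return [rng.randrange(VOCAB) for _ in range(VOCAB)]

    def sample_sequence(rules, rng):
        seq = [rng.randrange(VOCAB)]
        for _ in range(SEQ_LEN - 1):
            seq.append(rules[seq[-1]])
        return seq

    class TinyLM(nn.Module):
        def __init__(self, vocab, dim=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.rnn = nn.GRU(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab)

        def forward(self, x):
            h, _ = self.rnn(self.emb(x))
            return self.out(h)

    rng = random.Random(0)
    model = TinyLM(VOCAB)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(1000):
        rules = random_rules(rng)    # new rule system every single step
        seq = torch.tensor([sample_sequence(rules, rng)])
        logits = model(seq[:, :-1])  # next-token prediction
        loss = loss_fn(logits.reshape(-1, VOCAB), seq[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

If this worked, the weights would end up encoding "how to pick up rules from a context" rather than any particular rule set.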


Bonus points for one of the most delightfully creative ideas I’ve heard in some time. I don’t think it will work (the space of "not reality" is superexponentially larger than the space of "this describes reality") but I’m just happy to be thinking about nonstandard ML again.

(I’ve dubbed this sort of thing “nonstandard ML" since, like you, I have a fondness for thinking of unorthodox solutions that seem plausible.)


It will just learn your formula and won't generalize to anything else. It would essentially have to unlearn the formula once you started training on English, so the whole exercise would just make training slower.



