
Here's what it means to understand how something works. If you can accurately predict what _change_ you need to make to its workings in order to achieve a desired _change_ in the resulting behaviour, then you know how it works. The more reliably and precisely you can predict the impact of various changes, the better your understanding.

I know how a bicycle works. I can prove that I have a very basic level of understanding, for example, by showing you that if I want the bicycle to go faster, I can pedal harder; if I want the bicycle to slow down, I can squeeze the brakes. I can prove a greater level of understanding by showing you that I can increase efficiency by raising the tire pressure, or by showing you that it's impossible to make a left turn (https://www.youtube.com/watch?v=llRkf1fnNDM) unless I first start by turning right.

That's what it means to explain how a bicycle works. (Notice that saying the bicycle is made of atoms does not help you do any of this.)

I don't think you can show me that kind of understanding of large language models. Which is to say, I don't think you can accurately predict what changes you need to make to the internal structure of an LLM to cause specific changes to the interesting high-level ("emergent") behaviours that people are seeing. That's what is meant by "nobody knows."




> Here's what it means to understand how something works. If you can accurately predict what _change_ you need to make to its workings in order to achieve a desired _change_ in the resulting behaviour, then you know how it works. The more reliably and precisely you can predict the impact of various changes, the better your understanding.

For example, one might decide that you don't want your chat engine to return pornography. So you train your model to reject inappropriate requests.

They did that.

I'm not sure why you think that what you're saying doesn't apply to LLMs.

> I don't think you can show me that kind of understanding of large language models.

See above.

> Which is to say, I don't think you can accurately predict what changes you need to make to the internal structure of an LLM to cause specific changes to the interesting high-level ("emergent") behaviours that people are seeing.

What emergent behaviors?

There's literally nothing I've seen any of these LLMs do that I would not expect from the inputs and design of the system. The behaviors aren't "emergent"; they're exactly what we'd expect.


> train your model to reject inappropriate requests.

Training a model is not the same as understanding how the model works internally. Given an LLM, can you identify which of the parameters are responsible for generating pornography? By looking at the parameters, can you tell how reliable or unreliable the filter is? You don't know. How do you change the parameters to increase or decrease the reliability of the filter? You don't know.
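
To make that concrete, here is a rough sketch of the kind of experiment the question implies. I'm assuming GPT-2 via the Hugging Face transformers library purely for illustration, and the choice to zero out block 6's MLP is arbitrary; this isn't a claim about how anyone actually audits a production model. The point is that the only way to learn what a chunk of parameters does is to perturb it and observe what changes:

    # Illustrative only: knock out one transformer block's MLP in GPT-2 and
    # compare greedy generations before and after. Which behaviours change is
    # an empirical question; it cannot be read directly off the weights.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    prompt = "To remove a peanut butter sandwich from a VCR, first"
    inputs = tok(prompt, return_tensors="pt")

    def generate():
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=30, do_sample=False,
                                 pad_token_id=tok.eos_token_id)
        return tok.decode(out[0], skip_special_tokens=True)

    print("before ablation:", generate())

    # Zero the MLP weights in block 6 (an arbitrary choice of "these parameters").
    with torch.no_grad():
        model.transformer.h[6].mlp.c_fc.weight.zero_()
        model.transformer.h[6].mlp.c_proj.weight.zero_()

    print("after ablation: ", generate())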

> There's literally nothing I've seen any of these LLMs do that I would not expect from the inputs and design of the system.

Really? How confident were you, _before_ ChatGPT came out, that it would be able to explain how to remove a peanut butter sandwich from a VCR in the style of the Bible, when simply asked to do so? Sure, it would be reasonable to guess that it would adopt some of the phrasing and vocabulary of the Bible. But did you know it would be _this_ successful? (https://twitter.com/_BRCooper/status/1598569424008667137)

And why _this_ successful, and not better or worse? Why in this particular way? I doubt you could have known. Suppose you had been asked to predict how it would answer that request by writing your own estimate of its answer: would you have written something of this quality? We could even run the experiment right now: challenge you to describe how the quality of the output would change if the model were half the size, or double the size, and then try it and see. How confident are you that you could draft an answer that is degraded by just the right amount?
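
To be clear, the one thing that can be forecast ahead of time is aggregate loss, via scaling laws. A back-of-the-envelope sketch using the Chinchilla-style fit L(N, D) = E + A/N^alpha + B/D^beta, with the constants reported by Hoffmann et al. (the half/double model sizes below are just illustrative scenarios), shows how little that tells you about any particular answer:

    # Scaling laws predict average loss from parameter count N and token count D;
    # they say nothing about what a specific completion will look like.
    # Constants are the Chinchilla (Hoffmann et al. 2022) fit; the sizes are
    # illustrative "half", "as-is", and "double" scenarios.
    def predicted_loss(n_params, n_tokens):
        E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
        return E + A / n_params**alpha + B / n_tokens**beta

    D = 300e9  # training tokens, held fixed for the comparison
    for N in (87.5e9, 175e9, 350e9):
        print(f"{N / 1e9:6.1f}B params -> predicted loss {predicted_loss(N, D):.3f}")

A slightly lower number on that curve doesn't tell you whether the Bible-style VCR answer gets a little clumsier or falls apart entirely.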

If all of this was entirely predictable, then no one would be surprised. But we have ample evidence that the vast majority of people have been shocked by what it can do.


> Given an LLM, can you identify which of the parameters are responsible for generating pornography?

The pornography in the training data.

> By looking at the parameters, can you tell how reliable or unreliable the filter is?

The parameters in the model? No, because they're not in a human-readable form: they're a highly compressed cache. But those parameters are no more mysterious than a JPEG is because you can't read its compressed bytes by eye. The parameters come from the training data, just as the lossy-compressed bytes in a JPEG come from whatever raw image was compressed.

You're viewing the model as if it's some sort of mystery, but it's not. It's just a lossy-compressed cache of the training data optimized for quick access. There is nothing there which is not from the training data.
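
For a rough sense of scale on that analogy, here is a back-of-the-envelope comparison using the publicly reported GPT-3 figures of roughly 175B parameters and 300B training tokens; treating a token as about four bytes of raw text is my own rough assumption:

    # Back-of-the-envelope: parameter storage vs. raw training text for GPT-3.
    # ~175B parameters and ~300B training tokens are the publicly reported figures;
    # 2 bytes per fp16 parameter and ~4 bytes of text per token are assumptions.
    params = 175e9
    tokens = 300e9

    param_gb = params * 2 / 1e9
    text_gb = tokens * 4 / 1e9

    print(f"parameters:    {param_gb:.0f} GB")
    print(f"training text: {text_gb:.0f} GB")
    print(f"text-to-parameter ratio: {text_gb / param_gb:.1f}x")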

The rest of your post is asking me, personally, about my knowledge of the system, and then extrapolating that answer to all of humanity, which happens to include the people who made ChatGPT.

Just because people with no access to the code or training data of ChatGPT can't answer a question about ChatGPT doesn't mean those questions can't be answered.

> If all of this was entirely predictable, then no one would be surprised. But we have ample evidence that the vast majority of people have been shocked by what it can do.

The surprise of people who don't understand how something works is not evidence that nobody understands how it works.



