
Here's what it means to understand how something works. If you can accurately predict what _change_ you need to make to its workings in order to achieve a desired _change_ in the resulting behaviour, then you know how it works. The more reliably and precisely you can predict the impact of various changes, the better your understanding.

I know how a bicycle works. I can prove that I have a very basic level of understanding, for example, by showing you that if I want the bicycle to go faster, I can pedal harder; if I want the bicycle to slow down, I can squeeze the brakes. I can prove a greater level of understanding by showing you that I can increase efficiency by raising the tire pressure, or by showing you that it's impossible to make a left turn (https://www.youtube.com/watch?v=llRkf1fnNDM) unless I first start by turning right.

That's what it means to explain how a bicycle works. (Notice that saying the bicycle is made of atoms does not help you do any of this.)

I don't think you can show me that kind of understanding of large language models. Which is to say, I don't think you can accurately predict what changes you need to make to the internal structure of an LLM to cause specific changes to the interesting high-level ("emergent") behaviours that people are seeing. That's what is meant by "nobody knows."




> Here's what it means to understand how something works. If you can accurately predict what _change_ you need to make to its workings in order to achieve a desired _change_ in the resulting behaviour, then you know how it works. The more reliably and precisely you can predict the impact of various changes, the better your understanding.

For example, one might decide that you don't want your chat engine to return pornography. So you train your model to reject inappropriate requests.

They did that.

I'm not sure why you think that what you're saying doesn't apply to LLMs.

> I don't think you can show me that kind of understanding of large language models.

See above.

> Which is to say, I don't think you can accurately predict what changes you need to make to the internal structure of an LLM to cause specific changes to the interesting high-level ("emergent") behaviours that people are seeing.

What emergent behaviors?

There's literally nothing I've seen any of these LLMs do that I would not expect from the inputs and design of the system. The behaviors aren't "emergent"; they're exactly what we'd expect.


> train your model to reject inappropriate requests.

Training a model is not the same as understanding how the model works internally. Given an LLM, can you identify which of the parameters are responsible for generating pornography? By looking at the parameters, can you tell how reliable or unreliable the filter is? You don't know. How do you change the parameters to increase or decrease the reliability of the filter? You don't know.
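
To make that concrete, here is a rough sketch of the kind of experiment the question implies. I'm assuming GPT-2 via the Hugging Face transformers library purely for illustration, and the choice to zero out block 6's MLP is arbitrary; this isn't a claim about how anyone actually audits a production model. The point is that the only way to learn what a chunk of parameters does is to perturb it and observe what changes:

    # Illustrative only: knock out one transformer block's MLP in GPT-2 and
    # compare greedy generations before and after. Which behaviours change is
    # an empirical question; it cannot be read directly off the weights.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    prompt = "To remove a peanut butter sandwich from a VCR, first"
    inputs = tok(prompt, return_tensors="pt")

    def generate():
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=30, do_sample=False,
                                 pad_token_id=tok.eos_token_id)
        return tok.decode(out[0], skip_special_tokens=True)

    print("before ablation:", generate())

    # Zero the MLP weights in block 6 (an arbitrary choice of "these parameters").
    with torch.no_grad():
        model.transformer.h[6].mlp.c_fc.weight.zero_()
        model.transformer.h[6].mlp.c_proj.weight.zero_()

    print("after ablation: ", generate())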

> There's literally nothing I've seen any of these LLMs do that I would not expect from the inputs and design of the system.

Really? How confident were you, _before_ ChatGPT came out, that it would be able to explain how to remove a peanut butter sandwich from a VCR in the style of the Bible, when simply asked to do so? Sure, it would be reasonable to guess that it would adopt some of the phrasing and vocabulary of the Bible. But did you know it would be _this_ successful? (https://twitter.com/_BRCooper/status/1598569424008667137)

And why _this_ successful, and not better or worse? Why in this particular way? I doubt you could have known. Suppose you had been asked to predict how it would answer that request by writing your own estimate of its answer: would you have written something of this quality? We could even run the experiment right now: challenge you to describe how the quality of the output would change if the model were half the size, or double the size, and then try it and see. How confident are you that you could draft an answer that is degraded by just the right amount?
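
To be clear, the one thing that can be forecast ahead of time is aggregate loss, via scaling laws. A back-of-the-envelope sketch using the Chinchilla-style fit L(N, D) = E + A/N^alpha + B/D^beta, with the constants reported by Hoffmann et al. (the half/double model sizes below are just illustrative scenarios), shows how little that tells you about any particular answer:

    # Scaling laws predict average loss from parameter count N and token count D;
    # they say nothing about what a specific completion will look like.
    # Constants are the Chinchilla (Hoffmann et al. 2022) fit; the sizes are
    # illustrative "half", "as-is", and "double" scenarios.
    def predicted_loss(n_params, n_tokens):
        E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
        return E + A / n_params**alpha + B / n_tokens**beta

    D = 300e9  # training tokens, held fixed for the comparison
    for N in (87.5e9, 175e9, 350e9):
        print(f"{N / 1e9:6.1f}B params -> predicted loss {predicted_loss(N, D):.3f}")

A slightly lower number on that curve doesn't tell you whether the Bible-style VCR answer gets a little clumsier or falls apart entirely.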

If all of this was entirely predictable, then no one would be surprised. But we have ample evidence that the vast majority of people have been shocked by what it can do.


> Given an LLM, can you identify which of the parameters are responsible for generating pornography?

The pornography in the training data.

> By looking at the parameters, can you tell how reliable or unreliable the filter is?

The parameters in the model? No, because they're not in a human-readable form: they're a highly compressed cache. But those parameters are no more mysterious than a JPEG is because you can't read its compressed bytes by eye. The parameters come from the training data, just as the lossy-compressed bytes in a JPEG come from whatever raw image was compressed.

You're viewing the model as if it's some sort of mystery, but it's not. It's just a lossy-compressed cache of the training data optimized for quick access. There is nothing there which is not from the training data.
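
For a rough sense of scale on that analogy, here is a back-of-the-envelope comparison using the publicly reported GPT-3 figures of roughly 175B parameters and 300B training tokens; treating a token as about four bytes of raw text is my own rough assumption:

    # Back-of-the-envelope: parameter storage vs. raw training text for GPT-3.
    # ~175B parameters and ~300B training tokens are the publicly reported figures;
    # 2 bytes per fp16 parameter and ~4 bytes of text per token are assumptions.
    params = 175e9
    tokens = 300e9

    param_gb = params * 2 / 1e9
    text_gb = tokens * 4 / 1e9

    print(f"parameters:    {param_gb:.0f} GB")
    print(f"training text: {text_gb:.0f} GB")
    print(f"text-to-parameter ratio: {text_gb / param_gb:.1f}x")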

The rest of your post is asking me, personally, about my knowledge of the system, and then extrapolating that answer to all of humanity, which happens to include the people who made ChatGPT.

Just because people with no access to the code or training data of ChatGPT can't answer a question about ChatGPT doesn't mean those questions can't be answered.

> If all of this was entirely predictable, then no one would be surprised. But we have ample evidence that the vast majority of people have been shocked by what it can do.

The surprise of people who don't understand how something works is not evidence that nobody understands how it works.



