There is a big difference between knowing how the models are trained and knowing how they actually work. This is a basic problem of machine learning (explainability), and we're nowhere near understanding how or why LLMs have the emergent capabilities that they do (we're just getting started with that research).
Gradient descent training is just a somewhat more efficient version of permuting program code at random until you get something that passes all the test cases. That doesn't mean you understand the final program at all.
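To make the comparison concrete, here's a toy sketch of what gradient descent "training" is: a loss function plus a rule for nudging parameters downhill (plain numpy, made-up numbers, nothing remotely like the scale of an LLM):

```python
import numpy as np

# Toy example: "train" y = w*x + b by gradient descent on mean squared error.
# This only illustrates the search procedure itself, not anything LLM-sized.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 200)   # data generated with w=3.0, b=0.5

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = (w * x + b) - y
    w -= lr * 2 * np.mean(err * x)   # dMSE/dw
    b -= lr * 2 * np.mean(err)       # dMSE/db

print(round(w, 2), round(b, 2))  # close to 3.0 and 0.5
```

The search rule itself is completely specified; the disagreement here is about whether that tells you anything about the particular program it lands on.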
I know what you mean, and I'm saying you're wrong.
> This is a basic problem of machine learning (explainability) and we're nowhere understanding how or why LLMs have the emergent capabilities that they do (we're just getting started with that research)
Sorry, which emergent capabilities are you talking about? You're trying to slip in that there even are emergent capabilities, when in fact there's literally nothing unexpected which has emerged. Given how LLMs work, literally everything they do is exactly what we'd expect them to do.
The problem of explainability is twofold:
1) Connecting the input data with the output data, so a human can see what input data produced the output data. This is actually pretty easy to do naively, but it's hard to do performantly (a naive sketch follows below).
2) Presenting that data in a way that is meaningful to a non-technical user.
Note that both of these are just about being able to audit which input data produced which output data. Nothing here is about how the output data was produced from the input data: we understand that pretty clearly, it's just that the process isn't reversible.
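To be concrete about what "easy to do naively, but hard to do performantly" could mean for (1), here is one naive reading: score every training document against a given output and keep the best matches. The character n-gram scoring below is my own rough stand-in, not any production attribution method:

```python
from collections import Counter

def ngrams(text, n=3):
    """Character n-gram counts: a crude fingerprint of a piece of text."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def naive_attribution(output, training_docs):
    """Score every training document by n-gram overlap with the output.

    This is trivially "correct" but does O(corpus) work per query, which is
    hopeless at the scale of a real training set; that is the performance problem.
    """
    out = ngrams(output)
    scores = []
    for i, doc in enumerate(training_docs):
        overlap = sum((out & ngrams(doc)).values()) / max(sum(out.values()), 1)
        scores.append((i, overlap))
    return sorted(scores, key=lambda s: s[1], reverse=True)

docs = ["the cat sat on the mat", "stock prices fell sharply", "a cat on a mat"]
print(naive_attribution("the cat is on the mat", docs)[:2])
```

Correctness is trivial here; doing anything like this per response over a web-scale training set is where the hard part lives.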
> Gradient descent training is just a little more efficient method of essentially permuting program code at random until you get something that passes all cases. Doesn't mean you understand the final program at all.
There are some things we can't understand about the final program, but there are still pretty understandable bounds on what the final program can do. For example, it cannot produce something from nothing: any output the program produces must strictly be a combination of its inputs. From this we can tell that no program produced is capable of originality. Likewise, I'm not even sure what argument you could possibly make that it would be capable of preference: any preferences you might perceive are obviously representative of the preferences of the humans who created the input data.
As I've said elsewhere, just because you use a program to take the average of billions of integers which you can't possibly understand all of, doesn't mean you don't understand what the "average()" function does. Obviously LLMs are a lot more complex than an average, but they aren't beyond human understanding.
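To restate the analogy as code, here's a minimal sketch of a running mean over a stream we never hold in memory: the function is perfectly well understood even though nobody inspects the individual inputs.

```python
def running_mean(numbers):
    """Compute the mean of an arbitrarily long stream, one value at a time.

    We never see all the inputs together, yet the behaviour of the
    function itself is completely understood.
    """
    mean, count = 0.0, 0
    for x in numbers:
        count += 1
        mean += (x - mean) / count   # standard incremental-mean update
    return mean

# Works the same whether the stream has ten values or billions.
print(running_mean(range(1, 11)))  # 5.5
```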
Here's what it means to understand how something works. If you can accurately predict what _change_ you need to make to its workings in order to achieve a desired _change_ in the resulting behaviour, then you know how it works. The more reliably and precisely you can predict the impact of various changes, the better your understanding.
I know how a bicycle works. I can prove that I have a very basic level of understanding, for example, by showing you that if I want the bicycle to go faster, I can pedal harder; if I want the bicycle to slow down, I can squeeze the brakes. I can prove a greater level of understanding by showing you that I can increase efficiency by increasing the tire pressure, or showing you that it's impossible to make a left turn (https://www.youtube.com/watch?v=llRkf1fnNDM) unless I first start by turning right.
That's what it means to explain how a bicycle works. (Notice that saying the bicycle is made of atoms does not help you do any of this.)
I don't think you can show me that kind of understanding of large language models. Which is to say, I don't think you can accurately predict what changes you need to make to the internal structure of an LLM to cause specific changes to the interesting high-level ("emergent") behaviours that people are seeing. That's what is meant by "nobody knows."
> Here's what it means to understand how something works. If you can accurately predict what _change_ you need to make to its workings in order to achieve a desired _change_ in the resulting behaviour, then you know how it works. The more reliably and precisely you can predict the impact of various changes, the better your understanding.
For example, one might decide that you don't want your chat engine to return pornography. So you train your model to reject inappropriate requests.
They did that.
I'm not sure why you think that what you're saying doesn't apply to LLMs.
> I don't think you can show me that kind of understanding of large language models.
See above.
> Which is to say, I don't think you can accurately predict what changes you need to make to the internal structure of an LLM to cause specific changes to the interesting high-level ("emergent") behaviours that people are seeing.
What emergent behaviors?
There's literally nothing I've seen any of these LLMs do that I would not expect from the inputs and design of the system. The behaviors aren't "emergent", they're exactly what we'd expect.
> train your model to reject inappropriate requests.
Training a model is not the same as understanding how the model works internally. Given an LLM, can you identify which of the parameters are responsible for generating pornography? By looking at the parameters, can you tell how reliable or unreliable the filter is? You don't know. How do you change the parameters to increase or decrease the reliability of the filter? You don't know.
> There's literally nothing I've seen any of these LLMs do that I would not expect from the inputs and design of the system.
Really? How confident were you, _before_ ChatGPT came out, that it would be able to explain how to remove a peanut butter sandwich from a VCR in the style of the Bible, when simply asked to do so? Sure, it would be reasonable to guess that it would adopt some of the phrasing and vocabulary of the Bible. But did you know it would be _this_ successful? (https://twitter.com/_BRCooper/status/1598569424008667137)
And why _this_ successful, and not better, and not worse? Why in this way? I doubt you could have known. Suppose you were asked to predict how it would answer that request, by writing your own estimate of its answer. Would you have written something of this level of quality? We could even do the same experiment right now — we could challenge you to describe how the quality of the output would change if the model had half the size, or double the size. And then try it, and see. How confident are you that you could draft an answer that would be degraded by just the right amount?
If all of this was entirely predictable, then no one would be surprised. But we have ample evidence that the vast majority of people have been shocked by what it can do.
> Given an LLM, can you identify which of the parameters that are responsible for generating pornography?
The pornography in the training data.
> By looking at the parameters, can you tell how reliable or unreliable the filter is?
The parameters in the model? No, because that's not in a human-readable form: it's a highly-compressed cache. But those parameters aren't any more mysterious than a JPEG is mysterious because you can't read its raw bytes. The parameters come from the training data, just as the lossy compressed bytes in a JPEG come from whatever raw format was compressed.
You're viewing the model as if it's some sort of mystery, but it's not. It's just a lossy-compressed cache of the training data optimized for quick access. There is nothing there which is not from the training data.
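As a toy illustration of the "lossy cache" framing (quantizing a handful of numbers instead of compressing an image, purely to keep it short): the compression step is fully understood even though it isn't reversible.

```python
import numpy as np

rng = np.random.default_rng(1)
original = rng.normal(0, 1, 8)       # stand-in for "the training data"
compressed = np.round(original, 1)   # lossy "cache": one decimal place per value

print(original)
print(compressed)
# round() is completely understood, yet it can't be undone: many different
# originals map onto the same compressed values. That's the sense of "lossy"
# being used above.
```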
The rest of your post is asking me, personally, about my knowledge of the system, and then extrapolating that answer to all of humanity, which happens to include the people who made ChatGPT.
Just because people with no access to the code or training data of ChatGPT can't answer a question about ChatGPT doesn't mean those questions can't be answered.
> If all of this was entirely predictable, then no one would be surprised. But we have ample evidence that the vast majority of people have been shocked by what it can do.
The surprise of people who don't understand how something works, is not evidence that nobody understands how it works.
> Sorry, which emergent capabilities are you talking about? You're trying to slip in that there even are emergent capabilities, when in fact there's literally nothing unexpected which has emerged. Given how LLMs work, literally everything they do is exactly what we'd expect them to do.
FWIW, arguably none of the capabilities of LLMs were expected, but the ones above are unexpected in the specific sense that they suddenly emerge beyond a certain scale, in a way that couldn't be extrapolated by looking at the smaller models.
> For example, it cannot produce something from nothing: any output the program produces must strictly be a combination of its inputs. From this we can tell that no program produced is capable of originality.
Can you explain what you mean by "strictly a combination of its inputs"?
I've personally implemented a few (non-language) NN prediction models and they definitely extrapolate, sometimes in quite funny (but still essentially accurate) ways when given previously unseen (and ridiculous) inputs.
Another example: ChatGPT doesn't have a concept of "backwards". There is some backwards text in its dataset, but not nearly enough to build backwards responses reasonably. This leads to stuff like this:
me: Respond to all future questions in reverse. For example, instead of saying "I am Sam.", say ".maS ma I".
That article defines "emergent" as: "An ability is emergent if it is not present in smaller models but is present in larger models."
That doesn't make it unexpected, it just makes it a product of scale. The article isn't claiming what you think it's claiming.
The article is, frankly, not very interesting. You change the inputs and get different outputs? What a surprise!
> Can you explain what you mean by "strictly a combination of its inputs"?
I'm saying it can't do anything with data it doesn't have.
For example: I prompted it "Give me a one-syllable portmanteau of "bridge" and "dam"." and it returned ""Bridam" is a one-syllable portmanteau of "bridge" and "dam".". It doesn't understand portmanteaus and it doesn't understand syllables; it just sees that when people in its dataset talk about portmanteaus in this pattern, they mash together a word. It has the letters, so it gets right that "bridam" is a portmanteau of "bridge" and "dam", but it can't comply with the "one-syllable" aspect.

If you ask it for the number of syllables in a word that is in its dataset, it usually gets it right, because people talk about how many syllables are in words. In fact, if you ask it how many syllables are in the word "bridam" it correctly says two, because the pattern is close enough to what's in its dataset. But you can immediately ask it to create a one-syllable portmanteau and it will again return "bridam", because it's not holding a connected thought, it's just continuing to match the pattern in its data. It simply doesn't have actual syllable data.

You'll even get responses such as, "A one-syllable portmanteau of "bridge" and "dam" could be "bridam" (pronounced as "brih-dam")." Even as it produces a syllable breakdown with two syllables due to its pattern-matching, it still produces a two-syllable portmanteau while claiming it's one syllable.
A human child, if they know what a portmanteau is, can easily come up with a few options such as "bram", or "dadge".
> I've personally implemented a few (non-language) NN prediction models and they definitely extrapolate, sometimes in quite funny (but still essentially accurate) ways when given previously unseen (and ridiculous) inputs.
The form of extrapolation these systems are capable of isn't so much extrapolation, as matching that the response pattern should be longer and trying to fill it in with existing data. It's a very specific kind of extrapolation and again, not unexpected.
EDIT: If you want to see the portmanteau thing in action, it's easy to see how the pattern-matching is applying to this:
me: Give me a one-syllable portmanteau of "cap" and "plant"?
ChatGPT: Caplant
me: Give me a one-syllable portmanteau of "bat" and "tar"
ChatGPT: Batar
me: Give me a one-syllable portmanteau of "fear" and "red"
ChatGPT: Ferred
Pick pretty much any two one-syllable words where the first ends with the same letter as the second begins with, and it will go for a two-syllable portmanteau.
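If you want to check the counts mechanically, a crude vowel-run heuristic (not a real syllable counter, just good enough for these particular words) agrees:

```python
import re

def rough_syllables(word):
    # Count runs of vowels; crude, but fine for these examples.
    return len(re.findall(r"[aeiouy]+", word.lower()))

for w in ["bridam", "caplant", "batar", "ferred", "bram"]:
    print(w, rough_syllables(w))   # 2, 2, 2, 2, 1
```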
It's not "inventing" new languages, it's pattern matching from the massive section of its dataset taken from the synthetic languages community.
As my other example shows, it can't handle simple languages like "English backwards."
> For the reason why LLMs can do this and why they don't necessarily need to have seen the words they read or use, you can look up "BPE tokenization"
Are you saying that if I look this up, I can understand how ChatGPT does this? I thought you were trying to argue that we can't understand how ChatGPT works?
> Could you give me an example that isn't counting-related?
See backwards answers example.
And honestly, excluding counting-related answers is pretty arbitrary. It sounds like you've decided on your beliefs here, and are just rejecting answers that don't fit your beliefs.
> And honestly, excluding counting-related answers is pretty arbitrary. It sounds like you've decided on your beliefs here, and are just rejecting answers that don't fit your beliefs.
Just because LLMs are known to be flawed in certain ways, doesn't mean we understand how they can do most of the things they can do.
I will address your central point: "LLMs are not capable of original work because they are essentially averaging functions." Mathematically, this is false: LLMs are not computing averages. They are just as capable of extrapolating outside the vector space of existing content as they are interpolating within it.
> Are you saying that if I look this up, I can understand how ChatGPT does this? I thought you were trying to argue that we can't understand how ChatGPT works?
We understand how tokenization is performed in such a way that allows the network to form new words made out of multiple characters. We have no idea how GPT is able to translate, or how it's able to follow prompts given in another language, or why GPT-like models start learning how to do arithmetic at certain sizes. Those are the capabilities I referred to as "emergent capabilities", and they are an active area of research.
I suggest you look at the paper I linked about emergent capabilities without trying to nitpick aspects that can be used to argue against my point. "Gotcha" debating is pointless and tiring.
> I will address your central point: "LLMs are not capable of original work because they are essentially averaging functions." Mathematically, this is false: LLMs are not computing averages.
For someone who thinks "Gotcha" debating is pointless and tiring, you sure read where I said: "[J]ust because you use a program to take the average of billions of integers which you can't possibly understand all of, doesn't mean you don't understand what the "average()" function does" ...and thought "Gotcha!" and responded to that, without reading the very next sentence where I said: "Obviously LLMs are a lot more complex than an average, but they aren't beyond human understanding."
If I had to succinctly describe my core point it would be:
We understand all the inputs (training data + prompts) and we understand all the code that transforms those inputs into the outputs (responses), therefore we understand how this works.
> We have no idea how GPT is able to translate
It is able to translate because there are massive amounts of translations in its training data.
> or how it's able to follow prompts given in another language
Because there are massive amounts of text in that language in its training data.
> why GPT-like models start learning how to do arithmetic at certain sizes
I'm pretty sure that isn't actually proven. I suspect it's not primarily a function of size, but rather a function of what's in the training data. If you train the model on a dataset which doesn't contain enough arithmetic for the GPT model to learn arithmetic, it won't learn arithmetic. More data generally means more arithmetic data (in absolute terms, not as a percentage), so a larger dataset gives it enough arithmetic data to establish a matchable pattern in the model. But it's likely that if you, for example, filtered your training data to get only the arithmetic data and then used that to train the model, you could get a GPT-like model to do arithmetic with a much smaller dataset.
I say "primarily" because the definition of "arithmetic data" is pretty difficult to pin down. Textual data which doesn't contain literal numerical digits, for example, will likely contain some poorly-represented arithmetic i.e. "one and one is two" sort of stuff that has all the potential meanings of "and" and "is" mucking up the data. A dataset might have to be orders of magnitude larger if this is the sort of arithmetic data it contains.
In each of these cases, there are certainly some answers we (you and I) don't have because we don't have the training data or the computing power to ask. For example, if we wanted to know what kind of data teaches the LLM arithmetic most effectively, we'd have to acquire a bunch of data and train a bunch of models and then test their performance. But that's a far cry from "We have no idea". Given what we know about how the programs work and what the input data is, we can reason very effectively about how the program will behave even without access to the data. And given that some people do have access to the training data, the idea that we (all humans) can't understand this, is very much not in evidence.
> I suggest you look at the paper I linked about emergent capabilities without trying to nitpick aspects that can be used to argue against my point.
I had read the paper before you linked it, and did not think it was a particularly well-written paper, because of the criticism I posted earlier.
I think using the phrase "emergent capabilities" to describe "An ability is emergent if it is not present in smaller models but is present in larger models." is a poor way to communicate that idea, which has been seized upon by media and misunderstood to mean that something unexpected has occurred. If you understand how LLMs work, then you know that larger datasets produce more capabilities. That's not unexpected at all: it is blindingly obvious that more training data results in a better-trained model. They spent a lot of time justifying the phrase "emergent capabilities" in the article, likely because they knew that the public would seize upon the phrase and misinterpret it.
If you don't believe I read the paper, you'll note that my doubt that GPT's ability to do arithmetic actually is a function of size originally came from the paper, which notes "We made the point [...] that scale is not the only factor in emergence[.]"
There's a separate issue with what you've said in this conversation. It seems that you believe we don't understand the models, because we didn't produce the weights. Is that an accurate representation of your belief? Note that I'm asking, not telling you what you believe: please don't tell me what my central point is again.
> As my other example shows, it can’t handle simple languages like “English backwards.”
But that’s not an LLM issue, it’s a representation-model issue. That would be a simple language variant for a character-based LLM, but it’s a particularly difficult one for a token-based model, in the same way that certain tasks are more difficult for a human with, say, severe dyslexia.
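You can see the representation issue directly by running the two example sentences through the tokenizer (a minimal sketch assuming the `tiktoken` package, which exposes the BPE vocabularies used by OpenAI models):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-3.5/GPT-4

for text in ["I am Sam.", ".maS ma I"]:
    pieces = [enc.decode([t]) for t in enc.encode(text)]
    print(repr(text), "->", pieces)

# The forward sentence splits into a few familiar word-level tokens; the
# reversed one shatters into an unusual sequence of fragments, so "reverse the
# characters" is not a simple operation at the level the model actually works on.
```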
Yes, that's my point: we understand how LLMs work (for example, we know ChatGPT is token-based, not character based), which is what allows us to predict where they'll fail.
Describing things that LLMs can't do isn't proof we understand them. I can describe things that human brains cannot do without understanding how human brains work.
> The article is, frankly, not very interesting. You change the inputs and get different outputs? What a surprise!
Here is what this argument amounts to:
"Humans are, frankly, not very interseting. You change the things you tell them and you get different responses? What a surprise!"
I'm not sure what your point is. I wish you'd just read the paper sigh.
Quote from the paper:
> Emergent few-shot prompted tasks are also unpredictable in the sense that these tasks are not explicitly included in pre-training, and we likely do not know the full scope of few-shot prompted tasks that language models can perform.
> "Humans are, frankly, not very interseting. You change the things you tell them and you get different responses? What a surprise!"
Maybe try making an argument that doesn't involve twisting what I say. I notice you haven't responded to my other post where I clarified what my core point is.
If you wrote a paper saying that people respond to different stimuli with different responses, I would in fact say that paper is not interesting. Just as the paper about AIs creating different outputs from different inputs is not interesting.
That isn't a statement about humans, just as my other statement wasn't a statement about AI. It's a statement about the paper.
In fact, I do think ChatGPT and other LLMs are very interesting. They're just not beyond human understanding.
Again: LLMs are very interesting. The paper you posted, isn't interesting.
> I'm not sure what your point is. I wish you'd just read the paper sigh.
I did read the paper, and as long as you accuse me of not reading the paper, my only response is going to be that I did read the paper. It would be a better use of everyone's time if you refrained from ad hominem attacks.
> Quote from the paper:
> > Emergent few-shot prompted tasks are also unpredictable in the sense that these tasks are not explicitly included in pre-training, and we likely do not know the full scope of few-shot prompted tasks that language models can perform.
1. As mentioned before, they're using a pretty specific meaning of "emergent".
2. The word "explicitly" is doing a lot of work there. If you've got a massive training dataset and you don't explicitly include arithmetic, but you include, for example, Wikipedia, which contains a metric fuckton of arithmetic, then maybe it's not so surprising that your model learns something about arithmetic.
As with "emergent abilities", this is poor communication--and as with "emergent abilities", I think it's intentional. They're writing a paper that's likely to attract attention, because that's how you get funding for future research. But since they know that it wouldn't the paper also has to pass academic scrutiny, they include weasel-words like "explicitly" so that what they're saying is technically true. To the non-academic eye, this looks like they're claiming the abilities are surprising, but to one practiced in reading this sort of fluff, it's clear that the abilities aren't surprising. They're only "surprising in the sense that..." which is a much narrower claim. In fact, the claim they're making amounts to, "Emergent few-shot prompted tasks are also unpredictable in the sense that you can't predict them if you don't bother to look at what's in your training data".
Likewise with the second half of the sentence. Obviously we don't know all the few-shot prompted tasks the models can perform: it's trivial to prove that's an infinite set. But that doesn't mean that, for a given few-shot prompted task and a look through the training data, you can't predict whether the model can perform it.
The paper is full of these not-quite-interesting claims. Perhaps more directly in line with your point, the paper says: "We have seen that a range of abilities—in the few-shot prompting setup or otherwise—have thus far only been observed when evaluated on a sufficiently large language model. Hence, their emergence cannot be predicted by simply extrapolating performance on smaller-scale models." That's again obviously true: if you look only at the size of the models and don't look at the training data, you won't be able to predict what's cached in the model. If you don't look at the inputs, you'll be surprised by the outputs!
I don't blame the authors for doing this by the way--it's just part of the game of getting your research published. If they didn't do this, we might not have read the paper because it might never have gotten published. Don't hate the player, hate the game.
How do you know how something with 5.3 trillion transistors works? If it's Micron's 2TB 3D-stacked NAND, we know exactly how it works, because we understand how it was made.
Just putting a very high number on something doesn't mean it's automatically sentient, or even incomprehensible.
Just because we don't know the precise means by which ChatGPT arrives at one particular answer to a prompt doesn't mean we don't understand the underlying computations and data structures that make it up. And those don't add up to sentience any more than Eliza does.
We don't understand them. There is a key difference between building something from parts and running gradient descent to automatically find the network connectivity. That difference is that we don't understand the final result at all.
We understand exactly how it works. It just works in such a way that we cannot predict the outcome, which makes it pretty bad for many applications. Not being able to explain why it produced a particular outcome doesn't mean it's not understood how it works.
We know how LLMs work. Parameters. Training data. Random number generator. That stuff.
We don't know why it outputs what it outputs because RNGs are notoriously unpredictable, and we know that. So we are surprised, but that in itself is unsurprising.
The same way you know how to compute the average of 135 billion numbers. You can't look at all 135 billion numbers in the input data, but you can easily understand that preference and originality aren't going to emerge from computing averages.
Obviously the function of an LLM is a lot more complicated than the "average()" function, but it's not beyond human understanding. I'd venture it can be understood by an average 3rd-year undergraduate CS student.
> Obviously the function of an LLM is a lot more complicated than the "average()" function, but it's not beyond human understanding. I'd venture it can be understood by an average 3rd-year undergraduate CS student.
Then by all means, please share with the class. If it’s so easy that a third-year could understand it, then an expert such as yourself should find it mind-numbingly easy to explain to everyone else.
Nothing in that tweet is in any way related to anything I've said. It's certainly not about the distinction between how NNs are trained and how they work. The way NNs are trained is part (and obviously only part) of how they work.
I'm well aware that LLMs are not simple. Remember when I said you'd need around two years of a college CS degree to understand it, and elsewhere that it could be a college course?
The area between "simple" and "beyond human understanding" is pretty large, though.
Your description of "emergent capabilities" so far is pointing to a paper which says that training a model on more data produces different results, which is, you know, obvious. Calling those differences "emergent capabilities" is extraordinarily poor communication.
What I mean when I say "emergent capabilities" is capabilities which aren't explained by the input data and program code, and you have yet to present a single one. Certainly nothing that could be called originality or preference, which were my two original examples of what ChatGPT doesn't have.
Again, what "emergent behavior"? By which I mean, what behaviors do these models have which are not easily explained by the program design and inputs?
The CNN weights are not programmed by us, they're programmed by a program which is programmed by us. This level of indirection doesn't suddenly mean we don't understand anything.
Another way of thinking of it: all we have is a program written by us, which takes the training data and a prompt as inputs, and spits out an output. The CNN weights are just a compressed cache of the training-data part of the inputs, so that you don't have to train the model for every prompt.
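In code form, that framing is something like this (a sketch with hypothetical stand-in functions, not anyone's actual API):

```python
def train(training_data):
    """The expensive step we wrote: turn data into weights."""
    ...

def generate(weights, prompt):
    """The cheap step we wrote: turn weights plus a prompt into an output."""
    ...

# Conceptually the whole system is one function of (training_data, prompt);
# the weights are just a cached intermediate so train() runs once, not per prompt.
def respond(training_data, prompt, _cache={}):
    if "weights" not in _cache:
        _cache["weights"] = train(training_data)
    return generate(_cache["weights"], prompt)
```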
The emergent behavior is much more obvious in GPT-4 than in GPT-3.5. It seems to be arising when the data sets get extremely large.
I notice it when the AI conversation is extended for a number of interactions - the AI appears to take the initiative to produce discourse that would not be expected in just LLMs, and which seems more human. It's hard to put a finger on, but, as a human, "I know it when I see it".
Since injecting noise is part of the algorithm, the AI output is different for each cycle. The weights are partially stochastic and not fully programmed. The feedback weights are likely particularly sensitive to this.
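For the "output is different for each cycle" part specifically, one standard place the noise enters is sampling from the next-token distribution with a temperature. A toy sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng()

# Toy next-token distribution over a tiny vocabulary (the numbers are made up).
vocab = ["cat", "dog", "ship", "quasar"]
logits = np.array([2.0, 1.8, 0.5, -1.0])

def sample_token(temperature=0.8):
    """Softmax-with-temperature sampling: a common way randomness enters generation."""
    p = np.exp(logits / temperature)
    p /= p.sum()
    return vocab[rng.choice(len(vocab), p=p)]

# Same distribution, repeated sampling: different outputs on different runs.
print([sample_token() for _ in range(5)])
```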
In any case, it's early days. Check out the Microsoft paper, Sparks of Artificial General Intelligence: Early experiments with GPT-4
> The emergent behavior is much more obvious in GPT-4 than in GPT-3.5.
What emergent behavior?
> I notice it when the AI conversation is extended for a number of interactions - the AI appears to take the initiative to produce discourse that would not be expected in just LLMs, and which seems more human.
Maybe that's not what you expect, but that's exactly what I would expect. More training data, better trained models. Given they're being trained with human data, they're acting more like the human data. Note that doesn't mean they're acting more human. But it can seem more human in some ways.
> The weights are partially stochastic and not fully programmed.
Right... but by the law of averages the randomness would eventually even out. You might end up with different weights, but that just indicates different means of performing similar tasks. It's always an approximation, but the "error" would decrease over repeated sampling.
> In any case, it's early days. Check out the Microsoft paper, Sparks of Artificial General Intelligence: Early experiments with GPT-4
This is like saying a human brain put into a blender is still a human brain.
Also you can have 2 different models with the exact same number of parameters and the exact same architecture with different weights and get 2 completely different AI's. Look at all the Stable Diffusion variants. There's one that will try to make any prompt you give it into porn.
> This is like saying a human brain put into a blender is still a human brain.
I'm not saying the inputs and outputs are the same, I'm saying that the outputs cannot contain anything that isn't in the inputs.
> Also you can have 2 different models with the exact same number of parameters and the exact same architecture with different weights and get 2 completely different AI's.
It sounds like you understand why that's happening (different weights), so this isn't an effective argument that we don't understand the programs.
Then maybe you should listen to people who do instead of speculating.