>I feel that LLMs raise some very interesting challenges for anyone trying to figure out what it means to understand something and how we do it, but I am not yet ready to agree with Hinton.
Agreed. What LLMs say about understanding deserves a lot more attention than it has received. I wrote down some of my thoughts on the matter:

https://www.reddit.com/r/naturalism/comments/1236vzf
>Do LLMs do these things, or is what they produce a result of having a lot of information about the purely formal properties of human language use, independently of semantics?
These two points aren't necessarily in opposition, and understanding why is, I think, key to solving a lot of important problems around intelligence, sentience, etc. To compute is to operate on formal properties. But this doesn't exclude semantic properties from having causal relevance to the behavior of the system. What we need is a way to conceptualize how a system can have multiple related descriptions at different levels. A description at the level of semantics doesn't exclude a description in terms of formal properties, or vice versa. I think of it in terms of constraints: the higher-level descriptions constrain the lower-level behavior. What the computational description does is ensure that the higher-level semantic constraint is maintained, by way of the particular space of computational dynamics it follows. Essentially, the information that picks out this program's space of branching dynamics embeds the semantic description in question, and this description realizes the computational dynamics necessary to maintain the higher-level semantic constraint. Rather than semantics being in opposition to formal properties, they are two sides of the same coin.
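To make the two-levels point a bit more concrete, here's a toy sketch (entirely my own illustration, not a claim about how LLMs or brains actually work): a routine that only ever shuffles digit characters around according to formal rules, and yet those very dynamics are what maintain the semantic constraint that the output denotes the sum of the inputs.

```python
# Toy illustration: two descriptions of one process.
# Formal level: rule-governed manipulation of characters.
# Semantic level: the output string denotes the sum of the two input numbers.

DIGITS = "0123456789"

def add_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as digit strings,
    using only character-level (formal) operations."""
    result = []
    carry = 0
    # Walk the strings right-to-left, padding the shorter one with '0'.
    for da, db in zip(reversed(a.zfill(len(b))), reversed(b.zfill(len(a)))):
        total = DIGITS.index(da) + DIGITS.index(db) + carry
        carry, digit = divmod(total, 10)
        result.append(DIGITS[digit])
    if carry:
        result.append(DIGITS[carry])
    return "".join(reversed(result))

# Formal description: a walk over symbol sequences governed by lookup rules.
# Semantic description: int(add_strings(a, b)) == int(a) + int(b) for all inputs.
assert int(add_strings("478", "9534")) == 478 + 9534
```

The same process answers to both descriptions at once; the semantic one isn't competing with the formal one, it's what the formal dynamics are organized to preserve.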
I agree with a lot of what you say in the linked article, and I particularly agree that it is not helpful to define understanding in a way that would, a priori, make it a category error to propose that a suitably-programmed computer might understand things. I do, however, have a few words to say about the relationship between modeling and understanding. I can easily accept that an ability to model is necessary in order to understand something, but I feel the idea that it is sufficient would leave something out.
For example, meteorologists understand a lot about the weather in terms of the underlying physics, representing it as a special application of more general laws, but they are not very good at predicting it. Machine learning produces models which are much better predictors, but it does not seem to follow that they have a superior understanding of the weather.
One problem in assessing whether a token predictor has some sort of understanding is that, if its training material was (broadly speaking) produced by people who did have a reasonable understanding of what they were writing about, then it seems likely that the productions of a good predictor would unavoidably have that feature as well - but maybe that just is how most human understanding works? I am on the fence on this one.
>Machine learning produces models which are much better predictors, but it does not seem to follow that they have a superior understanding of the weather.
Fair points, and I agree. I don't recall if I made this point in the linked piece, but I think the missing ingredient is a model embedded within some larger dynamic, such that the capacity for modelling is in service of some goal. The goal can be as simple as answering questions, or something more elaborate. But the point is to engage the model so as to influence that dynamic in a semantically rich way. The model itself doesn't constitute understanding; rather, a process that understands will have a model it can query and manipulate in ways that correspond to the process's goals.
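A crude sketch of the distinction I'm drawing (a toy of my own, not any actual architecture): the model here is just a data structure, and what does the work is the surrounding goal-directed process that updates and queries it.

```python
# Toy sketch: a model only matters insofar as a goal-directed process uses it.
# The "world model" is just a dict of facts; the surrounding process queries
# and updates it in service of a simple goal (answering questions).

class RoomModel:
    """A minimal model of which objects sit in which rooms."""
    def __init__(self):
        self.location = {}            # object -> room

    def observe(self, obj, room):     # update the model from "experience"
        self.location[obj] = room

    def query(self, obj):             # interrogate the model
        return self.location.get(obj, "unknown")

class Answerer:
    """The goal-directed process that puts the model to use."""
    def __init__(self, model):
        self.model = model

    def answer(self, obj):
        # Engaging the model so as to influence the ongoing dynamic
        # (here, the dynamic is just a question-answer exchange).
        return f"The {obj} is in the {self.model.query(obj)}."

model = RoomModel()
model.observe("lamp", "kitchen")
model.observe("book", "study")
print(Answerer(model).answer("lamp"))   # The lamp is in the kitchen.
```

On its own the dict doesn't understand anything; it's the querying in service of a goal that I'm pointing at as the extra ingredient.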
>then it seems likely that the productions of a good predictor would unavoidably have that feature as well
Yeah, assessment is hard because of the sheer size of the training data. We can't be sure that some seemingly intelligent response isn't just recalling a similar query from training. One of the requirements for understanding is counterfactual capacity: being able to report accurate information that is derivative of the training data but not explicitly in the training data. The Sparks of AGI paper, assuming it can be believed, demonstrates this capacity IMO, particularly where GPT-4 draws a graph of a room after having been given navigation instructions. But it's hard to make a determination in particular cases.
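To gesture at the kind of probe I have in mind (a toy harness of my own, not the paper's methodology): give a system navigation steps as its only "training data", then ask for something that is derivable from them but never stated, like the reverse connections of the layout.

```python
# Toy version of a counterfactual-capacity probe: the "training data" is a list
# of navigation steps; the question asks for a fact that is only derivable from
# them (a reverse edge in the layout) and was never stated outright.

from collections import defaultdict

OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}

def build_map(steps):
    """steps: list of (from_room, direction, to_room) taken from instructions."""
    layout = defaultdict(dict)
    for a, direction, b in steps:
        layout[a][direction] = b
        layout[b][OPPOSITE[direction]] = a   # the reverse edge is never stated
    return layout

instructions = [
    ("hall", "north", "kitchen"),
    ("kitchen", "east", "pantry"),
]

layout = build_map(instructions)
# A derived fact that is not explicitly in the "training data":
print(layout["pantry"]["west"])   # kitchen
```

The answer "kitchen" appears nowhere in the instructions; it falls out of a model built from them, which is the counterfactual flavour I mean.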