GPT-4 reasons about a lot of gotcha logic puzzles correctly; any pre-GPT-4 opinions should be reconsidered. Those are effectively two different epochs in the history of AI effectiveness and reasoning.
> Me: Mike leaves the elevator at the 2nd floor. Jenny leaves at the 9th floor. Who left the elevator first?
> GPT-4: Mike left the elevator first, as he got off at the 2nd floor, while Jenny left at the 9th floor.
> Me: If the elevator started at the 10th floor, who would have left first?
> GPT-4: If the elevator started at the 10th floor and went downward, then Jenny would have left first, as she got off at the 9th floor, while Mike left at the 2nd floor.
> Me: How many footballs fit in a coffee mug?
> GPT-4: A standard football (soccer ball) has a diameter of around 22 centimeters (8.65 inches), while a coffee mug is typically much smaller, with a diameter of around 8-10 centimeters (3-4 inches). Therefore, it is not possible to fit a standard football inside a coffee mug. If you were to use a mini football or a much larger mug, the number of footballs that could fit would depend on the specific sizes of the footballs and the mug.
It easily answered all of your questions and produced explanations I would expect most reasonable people to make.
That changes exactly nothing about the validity of my statement.
Yes, GPT-4 is better at this mimicry than GPT-3 or GPT-3.5. And GPT-3 was better at it than GPT-2. And all of them were better than my just-for-fun home-built language model projects that I trained on small (<10 GiB) datasets, which in turn were better at it than my PoC models trained on just a few thousand words.
But being better at mimicking reason is still not reasoning. The model doesn't know what a coffee mug is, and it doesn't know what a football is. It also has no idea how elevators work. It can form sequences that make it look to us as if it does know all these things, but in reality it only knows that "then Jenny would have left first" is a more likely sequence of tokens at that point, given that the sequence before included "started at the 10th floor".
Bear in mind, this doesn't mean that this mimicry isn't useful. It is, tremendously so. I don't care how I get correct answers, I only care that I do.
Simple: I know that humans have intentionality and agency. They want things, they have goals both immediate and long term. Their replies are based not just on the context of their experiences and the conversation but their emotional and physical state, and the applicability of their reply to their goals.
And they are capable of coming up with reasoning about topics for which they have no prior information, by applying reasonable similarities. Example: even if someone has never heard the phrase "walking a mile in someone else's shoes", most humans (provided they speak English) have no difficulty figuring out what it means. They also have no trouble figuring out that it is a figure of speech, and not a literal action.
>Simple: I know that humans have intentionality and agency. They want things, they have goals both immediate and long term. Their replies are based not just on the context of their experiences and the conversation but their emotional and physical state, and the applicability of their reply to their goals.
This all seems orthogonal to reasoning, but also who is to say that somewhere in those billions of parameters there isn't something like a model of goals and emotional state? I mean, I seriously doubt it, but I also don't think I could evidence that.
Correct, but the problem is that the way you prove that for humans is by looking at the output and inferring it. You can apply the same criteria to ML models. If you don't, you need some other criterion to rule out that assumption for ML models.
For humans I can simply refer to my own internal state and look at how I arrive at conclusions.
I am of course aware that this is essentially a form of ipse dixit, but I will do it anyway in this case, because I am saying it as a human, about humans, and to other humans, and so the audience can just try it for themselves.
> You assume that. You can only maybe know that about yourself.
I can also only say with certainty that planetary gravity is an attracting force on the very spot I am standing on. I haven't visited every spot on every planet in the universe after all.
That doesn't make it any more likely that my extrapolation of how gravity works here is wrong somewhere else. Russell's Teapot works both ways.
> How do you know that the ML model doesn't?
For the same reason I know that a hammer or an operating system doesn't: I know how they work. Not in the most minute detail, and of course the actual model is essentially a black box, but its architecture and MO are not.
It completes sequences. That is all it does. It has no semantic understanding of the things these sequences represent. It has no understanding of true or false. It doesn't know math, it doesn't know who person xyz is, it doesn't know that 1993 already happened and 2221 did not. It cannot have abstract concepts of the things represented by the sequences, because the sequences are the things in its world.
It knows that a sequence is more or less likely to follow another sequence. That's it.
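To make that concrete, here is a deliberately tiny sketch of the principle (my own toy illustration, not GPT-4's actual architecture; every token and probability in it is made up): a "model" that stores nothing but which token tends to follow which, and completes a prompt from those statistics alone.

```python
# Toy illustration only: a "model" that knows nothing but sequence statistics.
# The tokens and probabilities below are made up for the example.

# Hypothetical learned statistics: given the previous two tokens,
# how likely is each candidate next token?
next_token_probs = {
    ("elevator", "at"): {"the": 0.9, "a": 0.1},
    ("would", "have"):  {"left": 0.7, "gone": 0.3},
    ("left", "first"):  {",": 0.6, ".": 0.4},
}

def complete(prev_two):
    """Return the likeliest next token, using nothing but the statistics above."""
    probs = next_token_probs.get(prev_two)
    if probs is None:
        return "<unknown>"
    return max(probs, key=probs.get)  # greedy pick: most probable continuation

# It produces the "right" continuation without any concept of elevators or floors:
print(complete(("would", "have")))  # -> left
```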
From that limited knowledge, however, it can very successfully mimic things like math, logic, and even reasoning to an extent. And it can mimic them well enough to be useful in a lot of areas.
But that mimicry, however useful, is still mimicry. It's still the Chinese Room thought experiment.
However, I don't really accept the idea that this isn't reasoning, though I'm not entirely sold either way.
I'd say if it mimics something well enough then eventually it's just doing the thing, which is the same side of the argument I fall on with Searle's Chinese Room Argument. If you can't discern a difference, is there a difference?
So far GPT-4 can produce better work than maybe 50% of humans, and better responses to brain-teaser questions than most of them too. At the very least I'm in a bubble and don't run into people that stupid very often, which makes it easier for me to still see the gaps.
> I'd say if it mimics something well enough then eventually it's just doing the thing
Right up to the point where it actually needs to reason, and the mimicry doesn't suffice.
My above example about the football and the coffee mug is an easy one; those objects are well represented in its training data. What if I need a reason why the service ping spikes every 60 seconds? Here is the code, please look it up, LLM. I am sure I will get a great and well-written answer.
I am also sure it won't be the correct one, which is that some dumb script I wrote, which has nothing to do with the code shown, blocks the server for about 700ms every minute.
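To make the scenario concrete, the culprit is the kind of thing sketched below (a hypothetical stand-in; the names and exact timings are illustrative, not from any real system), and nothing in the service code an LLM gets shown would point to it:

```python
# Hypothetical sketch of the kind of unrelated culprit I mean: a standalone
# maintenance script that, once a minute, holds up the same resource the
# service needs for roughly 700 ms. The service's own code never mentions it.
import time

def maintenance_tick():
    # Stand-in for "grab a shared lock / hammer the disk / pin a CPU core".
    time.sleep(0.7)  # blocks for ~700 ms

if __name__ == "__main__":
    while True:
        maintenance_tick()
        time.sleep(60 - 0.7)  # fires roughly every 60 seconds -> the ping spike
```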
Figuring out that something cannot be explained with the data represented, and thus may come from a source unseen, is one example of actual reasoning. And this "giving up on the data shown" is something I have yet to see any AI do.
I could say the same about most second-rate software engineers. That's why I'm not moved by your arguments. There are plenty of people just as stupid who will give you confidently wrong answers.