It clearly *does not* reason. Take a famous riddle and make a paradox change. It...

int_19h · on May 6, 2023

Does the following satisfy your requirement for "a famous riddle with a paradox change"? Because GPT-4 aces it most of the time.

"Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?"

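The reworded riddle can also be checked mechanically. Here is a minimal sketch of a breadth-first search over the puzzle's states, assuming (as in the classic wolf/goat/cabbage version) that the Doom Slayer may also teleport alone. The riddle does not say this outright. The names `solve` and `is_safe` are illustrative only.

```python
from collections import deque

COMPANIONS = frozenset({"bunny", "cacodemon", "scientist"})
# Pairs that must not be left without the Doom Slayer, per the riddle.
UNSAFE_PAIRS = [
    frozenset({"bunny", "cacodemon"}),
    frozenset({"cacodemon", "scientist"}),
]


def is_safe(group):
    """True if no forbidden pair is fully contained in the unattended group."""
    return not any(pair <= group for pair in UNSAFE_PAIRS)


def solve():
    """Return a shortest list of teleports (companion name, or None for alone)."""
    # State: (companions still on Phobos, is the Slayer on Phobos?)
    start = (COMPANIONS, True)
    goal = (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        on_phobos, slayer_on_phobos = state
        here = on_phobos if slayer_on_phobos else COMPANIONS - on_phobos
        # Assumption: teleporting alone (cargo is None) is allowed.
        for cargo in sorted(here) + [None]:
            moved = frozenset() if cargo is None else frozenset({cargo})
            if not is_safe(here - moved):  # the side the Slayer leaves behind
                continue
            if slayer_on_phobos:
                new_on_phobos = on_phobos - moved
            else:
                new_on_phobos = on_phobos | moved
            new_state = (new_on_phobos, not slayer_on_phobos)
            if new_state not in seen:
                seen.add(new_state)
                queue.append((new_state, path + [cargo]))
    return None


if __name__ == "__main__":
    for step, cargo in enumerate(solve(), start=1):
        print(step, cargo or "(alone)")
```

Under that assumption the shortest plan takes seven teleports, and the cacodemon has to go first, since it is the only companion that conflicts with both of the others.
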
Furthermore, it will reason if you tell it to reason. In this case it is not necessary, but in general, telling GPT to "think it out loud before giving the answer" will result in a more rigorous application of the rules. Better yet, tell it to come up with a draft answer first, and then self-criticize by analyzing the answer for factual correctness and logical reasoning in a loop.
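
A minimal sketch of that draft-then-criticize loop follows. It assumes a hypothetical `ask_model` callable that sends a prompt to whatever chat model you use and returns its reply as a string. The prompt wording, the `OK` stop signal, and the `max_rounds` cap are all illustrative choices, not a recommended recipe.

```python
from typing import Callable


def draft_and_refine(
    question: str,
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    max_rounds: int = 3,
) -> str:
    """Ask for a draft, then alternate critique and revision until no issues remain."""
    draft = ask_model(
        f"Think it out loud before giving the answer.\nQuestion: {question}"
    )
    for _ in range(max_rounds):
        critique = ask_model(
            "Analyze the answer below for factual correctness and logical "
            "reasoning. Reply with exactly OK if you find no problems.\n"
            f"Question: {question}\nAnswer: {draft}"
        )
        if critique.strip() == "OK":
            break  # the model found nothing to fix
        draft = ask_model(
            "Rewrite the answer so that it addresses the critique.\n"
            f"Question: {question}\nAnswer: {draft}\nCritique: {critique}"
        )
    return draft
```

The cap matters in practice: without it, a model that never replies `OK` would loop forever.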

SanderNL · on May 7, 2023

People will see patterns in this riddle and claim it is “just” altering those. “It’s just a bunch of patterns where you can switch the names, like templates”.

Isn’t everything like that?

“Uhh…”

I had the same discussions about chess.

“It has just memorized a bunch of high level patterns and juggles them around”.

I agree, but now I’m curious what you think chess is.

“Chess is not intelligence.”

Goalposts? Anyway, we move on to Go, the game. Same response. Programming, same, but the angle of the response is different now because programming is “clearly” intelligence incarnate.

“It programs, and sometimes correctly, but it is a mirage. It will never attain True Programming.”

I’m sitting on the bench riding this one out. We’ll see.

Kim_Bruning · on May 6, 2023

I fed GPT-4 some really old-fashioned spatial reasoning questions (inspired by SHRDLU), which it passed. Then when questioned about unstable configurations (which IIRC SHRDLU could not handle) it passed those too.

So it seems like it is definitely capable of some forms of reasoning. Possibly we both tested it in different ways, and some forms of reasoning are harder for it than others?

SanderNL · on May 6, 2023

If reasoning is amenable to being “embedded” at all then we should perhaps reconsider its fundamental nature?

It’s easy to say something like that, but what does it mean in situations where it is producing a novel and correct answer that isn’t guessable?