Does the following satisfy your requirement for "a famous riddle with a paradox change"? Because GPT-4 aces it most of the time.
"Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?"
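As a sanity check on what a correct answer to this variant looks like, here is a minimal sketch that brute-forces it with a breadth-first search. It assumes the usual river-crossing rules: the Doom Slayer makes every trip, he takes at most one companion per trip, and a forbidden pair is only dangerous when he is on the other moon. All names in it (ITEMS, FORBIDDEN, is_safe, solve) are my own illustration, not from any library. The point it makes concrete is that the cacodemon appears in both forbidden pairs, so it plays the role the goat plays in the classic riddle: it has to go first and be brought back once.

```python
from collections import deque

# Companions in the modified riddle. The Doom Slayer himself moves on every trip.
ITEMS = ("bunny", "cacodemon", "scientist")

# Pairs that must never be left together without the Doom Slayer.
# Note the twist: here the bunny eats the cacodemon, not the other way round.
FORBIDDEN = [{"bunny", "cacodemon"}, {"cacodemon", "scientist"}]


def is_safe(group):
    """A group left without the Doom Slayer is safe if it contains no forbidden pair."""
    return not any(pair <= group for pair in FORBIDDEN)


def solve():
    """Breadth-first search over (companions still on Phobos, Slayer's location).

    Returns the shortest list of trips as (passenger or None, destination)
    tuples, or None if no safe sequence exists.
    """
    start = (frozenset(ITEMS), "Phobos")
    goal = (frozenset(), "Deimos")
    queue = deque([start])
    parent = {start: None}  # state -> (previous state, move that led here)
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        on_phobos, slayer = state
        here = on_phobos if slayer == "Phobos" else frozenset(ITEMS) - on_phobos
        destination = "Deimos" if slayer == "Phobos" else "Phobos"
        # He may teleport alone (None) or with exactly one companion from his side.
        for passenger in [None, *sorted(here)]:
            moved = {passenger} if passenger else set()
            if not is_safe(here - moved):  # whoever stays behind must be safe
                continue
            if slayer == "Phobos":
                new_phobos = on_phobos - moved
            else:
                new_phobos = on_phobos | moved
            nxt = (frozenset(new_phobos), destination)
            if nxt not in parent:
                parent[nxt] = (state, (passenger, destination))
                queue.append(nxt)
    if goal not in parent:
        return None
    # Walk back from the goal to recover the sequence of trips.
    moves = []
    state = goal
    while parent[state] is not None:
        state, move = parent[state]
        moves.append(move)
    return moves[::-1]


for passenger, destination in solve():
    who = "alone" if passenger is None else "with the " + passenger
    print(f"Teleport {who} to {destination}")
```

With the candidates tried in alphabetical order, this prints a seven-trip plan: cacodemon over, back alone, bunny over, cacodemon back, scientist over, back alone, cacodemon over. Taking the scientist before the bunny is an equally short alternative.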
Furthermore, it will reason if you tell it to. It isn't necessary here, but in general, telling GPT to "think out loud before giving the answer" leads to a more rigorous application of the rules. Better yet, tell it to write a draft answer first and then critique that answer for factual correctness and sound reasoning, repeating the critique-and-revise step in a loop.
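The draft-then-critique loop described above can be sketched roughly like this. It is a minimal sketch under stated assumptions, not a specific vendor API: `ask` is a hypothetical function you supply that sends a prompt to whatever chat model you use and returns its text reply, and the convention that the critic answers with just "OK" when it finds nothing wrong is also my own assumption.

```python
from typing import Callable

# Hypothetical: any function that sends a prompt to a chat model and returns its reply.
AskFn = Callable[[str], str]


def draft_and_critique(ask: AskFn, question: str, max_rounds: int = 3) -> str:
    """Draft an answer, then repeatedly critique and revise it.

    Stops early when the critique is just "OK" (an assumed convention),
    and gives up after max_rounds critiques so the loop always terminates.
    """
    # Step 1: ask for a draft, with explicit step-by-step reasoning.
    answer = ask(
        "Think out loud, step by step, before giving your final answer.\n\n"
        f"Question: {question}"
    )
    for _ in range(max_rounds):
        # Step 2: a separate call critiques the current answer against the rules.
        critique = ask(
            "Check the answer below for factual errors and for steps that break "
            "the rules stated in the question. Reply with just OK if it is "
            "correct; otherwise list the problems.\n\n"
            f"Question: {question}\n\nAnswer: {answer}"
        )
        if critique.strip().upper() == "OK":
            break
        # Step 3: revise using the critique, then loop back to step 2.
        answer = ask(
            "Revise the answer to fix every problem listed in the critique.\n\n"
            f"Question: {question}\n\nAnswer: {answer}\n\nCritique: {critique}"
        )
    return answer
```

Keeping the critique in its own call, rather than asking for "answer and self-check" in one go, makes the model look at a finished answer instead of grading itself mid-stream. The max_rounds cap is there because nothing guarantees the critic will ever reply OK.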
People will see patterns in this riddle and claim it is “just” rearranging them. “It’s just a bunch of patterns where you can switch the names, like templates.”
Isn’t everything like that?
“Uhh…”
I had the same discussions about chess.
“It has just memorized a bunch of high level patterns and juggles them around”.
I agree, but now I’m curious what you think chess is.
“Chess is not intelligence.”
Goalposts? Anyway, we move on to Go, the board game. Same response. Then programming: same again, but from a different angle, because programming is “clearly” intelligence incarnate.
“It programs, sometimes even correctly, but it is a mirage. It will never attain True Programming.”
I’m sitting on the bench riding this one out. We’ll see.
I fed GPT-4 some really old-fashioned spatial reasoning questions (inspired by SHRDLU), which it passed. Then, when asked about unstable configurations (which, IIRC, SHRDLU could not handle), it passed those too.
So it does seem capable of at least some forms of reasoning. Perhaps we tested it in different ways, and some forms of reasoning are harder for it than others?
But yes, there is a lot of knowledge embedded in our collective use of language, and it is fascinating to see how such a model can reproduce it.