I played chess against ChatGPT just yesterday, and it got into a winning position against me. After 24 moves it tried to play an illegal move; when I told it that was illegal, it played a bad move, and after that it didn't manage to find any more legal moves (I gave up after asking it to try again about 10 times).
It does the same when you ask it to be the DM in a D&D game. It allows the players to do many, many things outside the rules. I don't remember any exact examples, but the general idea was, "The character Frodo now has the ability to breathe fire. He breathes fire on the orcs." Although IIRC that was ChatGPT 3.5.
“If a player tries to do something not strictly within the rules of <insert game>, then you must inform me that it is an invalid move and not accept it”
GPT appears to be slightly tuned to default to “yes, and” in these ‘creative’ situations rather than “block/deny”.
IMO, lots of things people don’t think GPT can do end up being possible with basic prompt engineering. Usually people’s prompts are too short and too non-specific.
Did you repeat the board position back to it after each move? LLMs have a limited context, so they might forget the board position after a while, unless they're reminded.
But it's very close to being able to play chess.
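For what it's worth, here's a minimal sketch of that "repeat the position every move" idea, assuming the `openai` Python client and the `python-chess` library (neither is mentioned in the thread, and the model name and helper are just illustrative): track the board locally, send the FEN and the legal moves back each turn, and reject illegal replies before they derail the game.

```python
import chess
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
board = chess.Board()

def ask_for_move(board: chess.Board) -> str:
    """Send the current position (as FEN) plus legal moves, return the reply."""
    prompt = (
        "We're playing chess. You are Black. Reply with a single move in SAN.\n"
        f"Current position (FEN): {board.fen()}\n"
        f"Legal moves: {', '.join(board.san(m) for m in board.legal_moves)}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

board.push_san("e4")  # our opening move

for _ in range(5):  # re-prompt a few times, like the "10 times" above
    reply = ask_for_move(board)
    try:
        board.push_san(reply)  # raises ValueError on an illegal/ambiguous move
        print("Model played:", reply)
        break
    except ValueError:
        print("Illegal move from model:", reply)
```

The point isn't that this makes the model a strong player, just that restating the full position each turn sidesteps the context problem, and validating locally catches the illegal moves instead of arguing with the model about them.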
My prompt was:
> we're playing chess. only send chess moves.
>
> 1. e4