It fundamentally can’t recurse into a thought process. Say I give you a symbol table where each symbol means something and ask you to “evaluate” a list of those symbols. You can do that just fine, but not even GPT-10384 will be able to, even in theory, without changing the underlying model itself.
Could you try writing a longer program, even in this simple language? Just increase the input to around 20x. I’m interested in whether it breaks and, if it does, at what length.
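For what it’s worth, here’s a minimal sketch of that kind of experiment in Python. The symbol table and operations are made up (not the ones from the original example); the point is just to generate a long program and a ground-truth trace so you can see exactly which step the model first gets wrong.

```python
import random

SYMBOL_TABLE = {          # hypothetical symbol meanings
    "A": lambda x: x + 1,
    "B": lambda x: x - 1,
    "C": lambda x: x * 2,
    "D": lambda x: x % 7,
}

def make_program(n_steps, seed=0):
    """Random list of symbols, n_steps long."""
    rng = random.Random(seed)
    return [rng.choice(list(SYMBOL_TABLE)) for _ in range(n_steps)]

def reference_trace(program, start=0):
    """Ground-truth accumulator value after every step, to compare against the model's output."""
    value, trace = start, []
    for sym in program:
        value = SYMBOL_TABLE[sym](value)
        trace.append(value)
    return trace

program = make_program(200)   # roughly 20x a short 10-step example
trace = reference_trace(program)
print(" ".join(program))
print(trace[:10], "...", trace[-1])
```

Then you paste the symbol table and the program into the prompt and diff the model’s step-by-step output against `trace`.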
Interesting, it screwed up at step 160. I think it probably ran out of context; if I explicitly told it to output each step in a more compact way it might do better. Or if I had access to the 32k context length it would probably get about 4x further.
Actually, it might be worth trying to get it to output the original instructions again every 100 steps, so that the instructions stay available in the context. The ChatGPT UI still wouldn’t let you output that much at once, but the API would.
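A rough sketch of that idea via the API, assuming the current openai Python SDK (model name, chunk size, prompt format, and the symbol table are all placeholders). Rather than asking the model to repeat the instructions itself, this variant just re-sends them with every 100-step chunk and carries the last reported value forward, which achieves the same “instructions always in context” effect:

```python
from openai import OpenAI

client = OpenAI()

INSTRUCTIONS = (
    "You are evaluating a program in a toy symbol language.\n"
    "Symbol table: A=+1, B=-1, C=*2, D=mod 7.\n"   # hypothetical table
    "Output one line per step in the form '<step>: <value>'. Be as compact as possible."
)

def evaluate_in_chunks(program, chunk=100, start=0):
    value_hint = f"The accumulator currently holds {start}."
    outputs = []
    for i in range(0, len(program), chunk):
        steps = program[i:i + chunk]
        resp = client.chat.completions.create(
            model="gpt-4",  # placeholder model name
            messages=[
                # Instructions are re-sent on every chunk, so they never fall out of context.
                {"role": "system", "content": INSTRUCTIONS},
                {"role": "user",
                 "content": f"{value_hint}\nSteps {i + 1}-{i + len(steps)}: {' '.join(steps)}"},
            ],
        )
        text = resp.choices[0].message.content
        outputs.append(text)
        # Assumes the model followed the '<step>: <value>' format; parse the last value
        # and feed it into the next chunk's prompt.
        last_value = text.strip().splitlines()[-1].split(":")[-1].strip()
        value_hint = f"The accumulator currently holds {last_value}."
    return "\n".join(outputs)
```

You’d still want to diff the combined output against the reference trace, since the model can silently drift at a chunk boundary.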
Obviously there is a limit to how much can fit in the context, but that limit seems to be rising fast (it went from 4k to 32k tokens in not that long).