Yes, I think you could get quite far with a few tools: a memory/todo list, a code interpreter, and script save/load. You could probably get a lot further, though, if you trained the tool use with RLVR (reinforcement learning with verifiable rewards), similar to how o3 uses web search so effectively during its thinking process.
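
For concreteness, here is roughly what that kind of tool set might look like. This is only a minimal sketch: the `AgentTools` class and all of its method names are hypothetical, not taken from any real agent framework, and the "code interpreter" is a bare `exec` with no sandboxing.

```python
import contextlib
import io
import tempfile
from pathlib import Path


class AgentTools:
    """Hypothetical tool set: todo list, code interpreter, script save/load."""

    def __init__(self, workdir):
        self.workdir = Path(workdir)
        self.workdir.mkdir(parents=True, exist_ok=True)
        self.todos = []  # in-memory todo list; doubles as simple memory

    def add_todo(self, text):
        """Append a todo item and return its index."""
        self.todos.append({"text": text, "done": False})
        return len(self.todos) - 1

    def complete_todo(self, index):
        """Mark the todo item at `index` as done."""
        self.todos[index]["done"] = True

    def pending(self):
        """Return the text of every todo item not yet done."""
        return [t["text"] for t in self.todos if not t["done"]]

    def run_code(self, source):
        """Run Python source and return whatever it printed to stdout.

        Assumption: no sandboxing. A real agent would need isolation here.
        """
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(source, {"__name__": "__agent__"})
        return buf.getvalue()

    def save_script(self, name, source):
        """Persist a script under the working directory for later reuse."""
        (self.workdir / f"{name}.py").write_text(source)

    def load_script(self, name):
        """Read back a previously saved script."""
        return (self.workdir / f"{name}.py").read_text()


if __name__ == "__main__":
    tools = AgentTools(tempfile.mkdtemp())
    item = tools.add_todo("write a helper script")
    tools.save_script("helper", "print(2 + 3)")
    print(tools.run_code(tools.load_script("helper")), end="")
    tools.complete_todo(item)
    print(tools.pending())
```

The RLVR part is not shown here; the sketch only covers the plain tool interface that a model would be calling.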