None of what you're saying really addresses the comment, which is that a human needs to review all of this or it likely won't work. Maybe they will get that work done faster.

But you have shared your experience, so here is mine:

- They get tired when the context is too big. They also can't be reliably run by themselves, so it doesn't really matter that they can run at 3 AM while I'm asleep; I wouldn't do that.

- Searching the internet with LLMs is ass because it combines the worst of both worlds (remember people have been using LLMs to NOT search the internet).

- It's a toss-up whether "iterating on test cases" means follow the rules or get stuck in an infinite loop. I have had the latest and most expensive models ping-pong between the same two broken lines of code because they are just LLMs.

I'm enjoying Cursor for now, but I am also working on a string of really basic Laravel apps for a few clients and it still gets things wrong. These tools are useless for novel problems or niche tech.