Whenever I try to tell people about the myth of the objective they look at me like I'm insane. It's not very popular to tell people that their best laid plans are actually part of the problem.
I would suspect that any next step comes with a novel implementation though, not just trying to scale the same shit to infinity.
I guess the bitter lesson is gospel now, which doesn't sit right with me now that we're past the stage where Moore's Law is relevant, but I'm not the one with a trillion dollars, so I don't matter.
This is just the new version of "works on my machine". Oh, I was able to contrive a correct answer from my prompt because the random number generator smiled upon me today.
I give myself 6-18 months before I think top-performing LLMs can do 80% of the day-to-day issues I'm assigned.
> Why doesn’t anyone acknowledge loops like this?
This is something you run into early on using LLMs and learn to sidestep. This looping is a sort of "context-rot" -- the agent has the problem statement as part of its input, and then a series of incorrect solutions.
Now what you've got is a junk-soup where the original problem is buried somewhere in the pile.
Best approach I've found is to start a fresh conversation with the original problem statement and any improvements/negative reinforcements you've gotten out of the LLM tacked on.
I typically have ChatGPT 5 Thinking, Claude 4.1 Opus, Grok 4, and Gemini 2.5 Pro all churning on the same question at once, then copy-paste relevant improvements across them.
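Concretely, the reset looks something like this -- a rough sketch assuming the OpenAI Python client; the model name, prompts, and note format are just placeholders:

    from openai import OpenAI

    client = OpenAI()

    problem = "Original problem statement, restated cleanly."
    lessons = [
        "Tried approach A; it fails because the table has no unique key.",
        "A regex won't work here; the input isn't line-oriented.",
    ]

    # Fresh conversation: original problem plus distilled notes, none of the junk-soup.
    messages = [
        {"role": "system", "content": "You are a careful coding assistant."},
        {
            "role": "user",
            "content": problem + "\n\nNotes from earlier attempts:\n- " + "\n- ".join(lessons),
        },
    ]

    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(resp.choices[0].message.content)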
I concur. Something to keep in mind is that it is often more robust to pull an LLM towards the right place than to push it away from the wrong place (or more specifically, the active parts of its latent space). Sidenote: also kind of true for humans.
That means that positively worded instructions ("do x") work better than negative ones ("don't do y"). The more the concepts you don't want it to use or consider show up in the context, the more they tend to pull the response towards them, even with explicit negation/'avoid' instructions.
I think this is why clearing all the crap from the context save for perhaps a summarizing negative instruction does help a lot.
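A made-up before/after of the same constraint, phrased negatively vs. positively (both prompt lines are invented for illustration):

    # Negative phrasing keeps the forbidden concepts front and center in the context.
    negative = "Don't use recursion, and don't modify the input list."

    # Positive phrasing points at the behavior you actually want.
    positive = "Use an iterative loop and return a new list; leave the input untouched."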
> positively worded instructions ("do x") work better than negative ones ("don't do y")
I've noticed this.
I saw someone on Twitter put it eloquently: something about how, just like little kids, the moment you say "DON'T DO XYZ" all they can think about is "XYZ..."
> That means that positively worded instructions ("do x") work better than negative ones ("don't do y").
In teacher school, we're told to always give kids affirmative instructions, i.e. "walk" instead of "don't run". The idea is that a negative instruction makes the child spend extra energy figuring out what they should do instead.
> This looping is a sort of "context-rot" -- the agent has the problem statement as part of it's input, and then a series of incorrect solutions.
While I agree, and also use your workaround, I think it stands to reason this shouldn't be a problem. The context has the original problem statement along with several examples of what not to do, and yet it keeps repeating those very things instead of coming up with a different solution. No human would keep retrying a solution that the context explicitly marks as not valid.
I'm sure somewhere in the current labs there are teams that are trying to figure out context pruning and compression.
In theory you should be able to get a multiplicative effect on context window size by consolidating context into its most distilled form.
30,000 tokens of wheel spinning to get the model back on track, consolidated into 500 tokens of "We tried A, and it didn't work because XYZ, so avoid A" and kept in recent context.
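Something like this hand-rolled consolidation step is what I have in mind (a sketch only, assuming the OpenAI Python client; the model name, prompt, and token budget are illustrative):

    from openai import OpenAI

    client = OpenAI()

    def distill_failed_attempts(transcript: str) -> str:
        """Compress a long run of failed attempts into short 'avoid this' notes."""
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Summarize the failed approaches below as terse bullets of the form "
                        "'We tried A; it failed because XYZ; avoid A'. Stay under ~500 tokens."
                    ),
                },
                {"role": "user", "content": transcript},
            ],
        )
        return resp.choices[0].message.content

    # The distilled note, plus the original problem statement, replaces the
    # 30,000 tokens of wheel spinning in the next request's context.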
I agree it shouldn't be a problem, but if you don't regularly run into humans who insist on trying solutions clearly signposted as wrong or not valid, you're far luckier than I am.
> I give myself 6-18 months before I think top-performing LLMs can do 80% of the day-to-day issues I'm assigned.
This is going to age like "full self-driving cars in 5 years". Yeah, it'll gain capabilities, maybe it does do 80% of the work, but it still can't really drive itself, so it ultimately won't replace you like people are predicting. The money train assures that AGI/FSD will always be 6-18 months away, despite no clear path to solving glaring, perennial problems like the ones the article points out.
> The money train assures that AGI/FSD will always be 6-18 months away
I vividly remember when some folks from Microsoft came to my school to give a talk at some Computer Science event and proclaimed that yep, we have working AGI, the only limiting factor is hardware, but that should be resolved in about ten years.
Birth control uses ethinyl estradiol which, despite the name, is not actually estradiol and so does not undergo the same metabolic pathways or produce the same metabolites.
I know this because I recently had to source exogenous estradiol for my wife after making this same mistaken assumption and being surprised by the bloodwork and lack of improvement.