I have the same issue plus unnecessary refactorings (that break functionality). ...

mgw · 2025-05-06T15:55:01 1746546901

This has also been my biggest gripe with Gemini 2.5 Pro. While it is fantastic at one-shotting major new features, when I want to make smaller iterative changes, it always does big refactors at the same time. I haven't found a way to change that behavior through changes in my prompts.

Claude 3.7 Sonnet is much more restrained and does smaller changes.

cryptoz · 2025-05-06T16:07:24 1746547644

This exact problem is something I’m hoping to fix with a tool that parses the source to AST and then has the LLM write code to modify the AST (which you then run to get your changes) rather than output code directly.

I’ve started in a narrow niche of Python/Flask webapps and am constrained to that stack for now, but if you’re interested I’ve just opened it for signups: https://codeplusequalsai.com

Would love feedback! Especially if you see promising results in not getting huge refactors out of small change requests!

(Edit: I also blogged about how the AST idea works in case you're just that curious: https://codeplusequalsai.com/static/blog/prompting_llms_to_m...)
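
For anyone wondering what "the LLM writes code to modify the AST" might look like in practice, here is a minimal sketch using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is not the implementation behind the linked tool: `SOURCE`, `GreetingRewriter`, and the `greet`/`farewell` functions are made up for illustration, and the transformer stands in for the kind of script an LLM might be asked to write.

```python
import ast

# Hypothetical source file the user wants a small change in.
SOURCE = '''
def greet(name):
    return "Hello, " + name

def farewell(name):
    return "Bye, " + name
'''


class GreetingRewriter(ast.NodeTransformer):
    """Stand-in for an LLM-written edit: change the greeting in greet() only."""

    def visit_FunctionDef(self, node):
        if node.name != "greet":
            return node  # every other function is returned untouched
        for child in ast.walk(node):
            if isinstance(child, ast.Constant) and child.value == "Hello, ":
                child.value = "Hi, "
        return node


tree = ast.parse(SOURCE)
new_tree = ast.fix_missing_locations(GreetingRewriter().visit(tree))
new_source = ast.unparse(new_tree)  # regenerate source text from the edited tree
print(new_source)
```

Because the edit is keyed on a specific node, code outside `greet` cannot be refactored by accident. One caveat: `ast.unparse` discards comments and the original formatting, so a real tool would probably need a syntax tree that preserves them.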

HenriNext · 2025-05-06T17:32:51 1746552771

Interesting idea. But LLMs are trained on a vast amount of "code as text" and only a tiny fraction of "code as AST"; wouldn't that significantly hurt the result quality?

cryptoz · 2025-05-06T17:55:28 1746554128

Thanks, and yeah, that is a concern; however, I have been getting quite good results from this AST approach, at least for building medium-complexity webapps. On the other hand, this wasn't always true... the only OpenAI models that really work well are the o3 series. Older models do write AST code but fail to do a good job because of the exact issue you mention, I suspect!

jtwaleson · 2025-05-06T17:02:47 1746550967

Having the LLM modify the AST seems like a great idea. Constraining an LLM to only generate valid code would be super interesting too. Hope this works out!
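
A very rough sketch of the weakest form of that constraint, assuming Python output: reject anything that does not parse before it is accepted. This only checks syntax after the fact and does not constrain generation itself; `is_valid_python` is a hypothetical helper name.

```python
import ast


def is_valid_python(candidate: str) -> bool:
    """Return True if the candidate text parses as Python source."""
    try:
        ast.parse(candidate)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as null bytes on older Python versions.
        return False
    return True
```

A caller could re-prompt the model whenever this returns False; it says nothing about whether the code is correct, only that it is syntactically well-formed.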

tough · 2025-05-06T17:34:05 1746552845

Interesting, I started playing with ts-morph and neo4j to parse TypeScript codebases.

simonw has symbex, which could be useful for you for Python.

polyaniline · 2025-05-07T12:48:05 1746622085

Asking it explicitly once (not necessarily every new prompt in context) to keep output minimal and strive to do nothing more than it is told works for me.

nolist_policy · 2025-05-06T16:40:35 1746549635

Can't you just commit the relevant parts? The git index is made for this sort of thing.

tasuki · 2025-05-06T17:46:51 1746553611

It's not always trivial to find the relevant 5 line change in a diff of 200 lines...

fwip · 2025-05-06T18:33:29 1746556409

Really? I haven't tried Gemini 2.5 yet, but my main complaint with Claude 3.7 is this exact behavior - creating 200+ line diffs when I asked it to fix one function.

bugglebeetle · 2025-05-06T16:05:31 1746547531

This is generally controllable with prompting. I usually include something like, “be excessively cautious and conservative in refactoring, only implementing the desired changes” to avoid this.

fkyoureadthedoc · 2025-05-06T15:54:35 1746546875

Where/how do you use it? I've only tried this model through GitHub Copilot in VS Code and I haven't experienced much changing of random things.

diggan · 2025-05-06T16:21:48 1746548508

I've used it via Google's own AI Studio, via my own library/program using the API, and finally via Aider. All of them lead to the same outcome: large chunks of changes to a lot of unrelated things ("helpful" refactors that I didn't ask for) and tons of unnecessary comments everywhere (like those comments you ask junior devs to stop making). No amount of prompting seems to address either problem.

dherikb · 2025-05-06T15:54:53 1746546893

I have exactly the same issue using it with Aider.