Hacker News

I've found that as LLMs improve, some of their bugs become increasingly slippery - I think of it as the uncanny valley of code.

Put another way, when I cause bugs, they are often glaring (more typos, fewer logic mistakes). Plus, as the author it's often straightforward to debug since you already have a deep sense for how the code works - you lived through it.

So far, using LLMs has reduced my productivity. The bugs LLMs introduce are often subtle logical errors hiding in otherwise "working" code. These errors are especially hard to debug when you didn't write the code yourself: you end up having to learn the code as thoroughly as if you had written it anyway.
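To make that concrete, here's a hypothetical sketch (mine, not from any actual LLM session; the function and numbers are invented) of the kind of subtle-but-"working" bug I mean:

```python
def moving_average(values, window):
    """Trailing moving average over a list of numbers."""
    # A plausible-but-wrong version writes range(len(values) - window),
    # which runs cleanly and looks reasonable but silently drops the
    # final window -- exactly the off-by-one that survives a casual review.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]
```

The buggy variant still returns a correct-looking list for most inputs, so nothing crashes and no test that only checks "is it a list of averages" catches it.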

I also find it more stressful to deploy LLM code. I know in my bones how carefully I write code; a decade of roughly one non-critical bug per 10k lines is what lets me sleep at night. The quality of LLM code can be quite chaotic.

That said, I'm not holding my breath. I expect this to all flip someday, with an LLM becoming a better and more stable coder than I am, so I guess I will keep working with them to make sure I'm proficient when that day comes.

I have been using LLMs for coding a lot over the past year, and I've been writing down my observations by task. For many tasks, my first entry records how thoroughly impressed I was by how, e.g., Claude helped me, and the second entry, a few days later, records my irritation at chasing down the subtle and just _strange_ bugs it introduced along the way. As a rule, these are incredibly hard to find and tedious to debug, because they lurk in the weirdest places, and the root cause is usually some odd confabulation that a human brain would never concoct.

Saw a recent talk where someone described AI as making errors, but not errors a human would naturally make: they're usually "plausible but wrong" answers. In other words, the errors these AIs make are of a different nature than a human's. This is the danger; it makes reviews harder, and I can't trust them as much as a person coding, at present. The agent tools (Claude Code, Aider, etc.) are a little better in that they can at least consume build and test output, but even then I've noticed them doing things that are wrong yet "plausible and build fine".

I've noticed it in my day-to-day: reviewing an AI PR is different from reviewing a co-worker's PR, with different kinds of problems. Unfortunately the AI issues tend to be the subtle kind, the things that could sneak into production code if I'm not diligent. It means reviews matter more, and I can't rely on prior experience with a co-worker and the typical quality of their PRs: effectively, every new PR comes from a different worker.


I'm curious where that expectation of a flip comes from. Your experience (and mine, frankly) would seem to indicate the opposite, so whence this certainty that one day it'll change entirely and become reliable instead?

I ask (and I'll keep asking) because it really seems like the prevailing narrative is that these tools have improved substantially in a short period of time, and that is seemingly enough justification to claim that they will continue to improve until perfection because...? waves hands vaguely

Nobody ever seems to have a good justification for how we're going to overcome the fundamental issues with this tech, just a belief from SOMEWHERE that it'll happen anyway. I'm very curious to drill down into that belief and see whether it comes from somewhere concrete, or whether it's just something that gets said often enough that it "becomes true", regardless of reality.

