
LLMs have failed to live up to the hype, but they haven't failed outright.

Two claims here:

1) LLMs have failed to live up to the hype.

Maybe. Depends upon whose hype. But I think it's fair to say that we don't have AGI today (however that is defined) and that some people hyped that up.

2) LLMs haven't failed outright

I think that this is a vast understatement.

LLMs have been a wild success. At big tech, over 40% of checked-in code is LLM-generated. At smaller companies the proportion is larger. ChatGPT has over 800 million weekly active users.

Students throughout the world, and especially in the developed world, are using "AI" at rates of 85-90% (according to some surveys).

Between 40% and 90% of professionals (depending upon the survey and profession) are using "AI".

This is three years after the launch of ChatGPT (and the capabilities of the GPT-3.5-era ChatGPT were so limited compared to today that it is a shame they get bundled together in our discussions). Instead of "failed outright", I would say they are the most successful consumer product of all time (so far).


> At big tech over 40% of checked in code is LLM generated. At smaller companies the proportion is larger.

I have a really hard time believing that stat without any context. Is there a source for this?


From what I've seen in a several-thousand-engineer company: LLMs generally produce vastly more code than is necessary, so they quickly outpace human coders. They could easily be producing half or more of all the code even if only 10% of the teams use them, particularly because huge changes often get approved with just an "lgtm", and LLM-coding teams also often use/trust LLMs for reviews.

But they do that while making the codebase substantially worse for the next person or LLM: large code size, inconsistent behavior, and duplicates of duplicates of duplicates strewn everywhere with little to no pattern, so you might have to fix something a dozen times in a dozen ways for a dozen reasons before it actually works, and nothing handles any of it efficiently.

The only thing that matters in a business is the value produced, and I'm far from convinced that in most cases they would even break even if they were free. They're burning the future with tech debt, on the hope that future models will be able to handle it where humans cannot, which does not seem true at all to me.


Measuring the value is very difficult. However, there are proxies (of varying quality) which are measured, and they show that AI code is clearly better than copy-pasted code (which used to be the #1 source of lines of code) and at least as "good" (again, I can't get into the metrics) as human code.

Hopefully one of the major companies will release a comprehensive report to the public, but they seem to guard these metrics.


Many value/productivity metrics in use are just "lines of code" in a trenchcoat, a game which LLMs are fantastic at playing.
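
To make that concrete, here's a minimal sketch (purely hypothetical, not any company's actual tooling) of what such a metric often reduces to: summing added lines from git log --numstat. By construction, a verbose change scores higher than a tight one.

    # Hypothetical "productivity" metric: lines of code in a trenchcoat.
    import subprocess

    def lines_added(author: str, since: str = "30 days ago") -> int:
        """Sum lines added by `author` in recent commits (via git numstat)."""
        out = subprocess.run(
            ["git", "log", f"--author={author}", f"--since={since}",
             "--pretty=tformat:", "--numstat"],
            capture_output=True, text=True, check=True,
        ).stdout
        total = 0
        for line in out.splitlines():
            parts = line.split("\t")
            if len(parts) == 3 and parts[0].isdigit():  # "-" marks binary files
                total += int(parts[0])
        return total

An LLM that emits three duplicated helpers where one would do scores three times higher here, which is exactly the game being described.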

> At big tech over 40% of checked in code is LLM generated.

Assuming this is true, though, how much of that 40% is boilerplate or simple, low-effort code that could have been knocked out in a few minutes previously? It's always been the case that 10% of the code is particularly thorny and takes 80% of the time, or whatever.

Not to discount your overall point, LLMs are definitely a technical success.


Before LLMs I used whatever autocomplete tech came with VSCode and the plugins I used. Now with Cursor, a lot of what the autocomplete did is replaced with LLM output, at much greater cost. Counting this in the "LLM generated" statistic is misleading at best, and I'm sure it's being counted.
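
For illustration, here's a sketch of how such a number could plausibly be tallied inside an editor (the event hooks are made up, not Cursor's actual API): every accepted suggestion is attributed to the LLM, whether it's a novel function body or the same one-line completion the old autocomplete would have offered.

    # Hypothetical accounting: accepted suggestions vs. hand-typed characters.
    llm_chars = 0
    typed_chars = 0

    def on_accept_suggestion(suggestion: str) -> None:
        """Editor hook (made up): fires when a completion is accepted."""
        global llm_chars
        llm_chars += len(suggestion)  # trivial completions count the same as real ones

    def on_keystroke(char: str) -> None:
        """Editor hook (made up): fires on each hand-typed character."""
        global typed_chars
        typed_chars += 1

    def llm_share() -> float:
        """Fraction of code attributed to the LLM under this accounting."""
        total = llm_chars + typed_chars
        return llm_chars / total if total else 0.0

Under that accounting, replacing a conventional autocomplete with an LLM inflates the "LLM generated" share without anyone writing different code.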


