The question is what does programming with an LLM get you over batteries-included frameworks with scaffolding like Rails or Django? If the problem only requires a generic infra solution put together by an LLM instead of a bespoke setup, why not look into low-code/no-code PaaS solutions to start with? Unless the LLM is going to provide you with some uniquely better results than existing tools designed to solve the same problems, it feels like a waste of resources to employ GPUs to do what templates/convention-over-configuration/autocomplete/etc already did.
The point isn't that LLMs are useless, or that they aren't interesting technology in the abstract. The point is that aside from the very real entertainment value of being able to conjure artwork apparently out of thin air, when it comes to solving practical problems in the tech space, it's not clear that they are achieving significantly more - faster or cheaper - than existing tools and methods already did.
You're right that it's probably too early to have data to prove their utility either way, but given how much time, money and energy many companies have already sunk into this - precisely without any evidence to prove it's worthwhile - it does come across rather more like a hype cycle at the moment.
The question is what does programming with an LLM get you over batteries-included frameworks with scaffolding like Rails or Django?
Three years ago an LLM would conversationally describe what the code would look like.
Two years ago it might crib common examples with minor typos.
Last year it could do something that isn't on StackOverflow at the level of an intern.
Earlier this year it could do something that isn't on StackOverflow at the level of a junior engineer.
Last week I had a conversation with Claude 3.5 that went something like this:
Write an interactive command-line battleship game
Write a mouse interactive TUI for it
Add a cli flag to connect to `ollama` and ask it to make guesses
There's a bug: write the AI conversation to a file so I can show you
Try some other models: make options for calling OpenAI and Anthropic
GPT and Anthropic are throwing this error (it needed to switch APIs)
The models aren't doing as well as they can: engage them more conversationally
Elapsed time: a few hours. I didn't write any code. Keep in mind that unlike ChatGPT, Claude can't search the net for documentation - this was all "from memory".
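For flavor, the ollama hook boiled down to something like the sketch below. This is a reconstruction, not Claude's actual code; the flag, model name, and prompt wording are all stand-ins:

```python
# Minimal sketch of an "--ollama" flag: ask a local ollama server
# to pick the next battleship guess. Placeholder prompt and model.
import argparse
import json
import urllib.request

def ollama_guess(board_text: str, model: str = "llama3") -> str:
    """Send the current board state to a local ollama server and return its guess."""
    payload = {
        "model": model,
        "prompt": (
            "You are playing battleship on a 10x10 grid (rows A-J, columns 1-10).\n"
            f"Here is what you know so far:\n{board_text}\n"
            "Reply with a single coordinate such as B7."
        ),
        "stream": False,  # get one JSON object back instead of a stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ollama", action="store_true",
                        help="let a local model make the guesses")
    args = parser.parse_args()
    if args.ollama:
        print(ollama_guess("no shots fired yet"))
```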
I read these stories about using LLMs and I always wonder if it's survivorship bias. I believe your experience; I've had impressive results too. But a lot of the time the AI gets lost and doesn't know what to do. So I'm willing to see it as a developer tool, but it's hard to see it becoming more general-purpose on the six-month time frame people have been promising for the last two years.
I played with it a year ago and it really hasn't improved much since then. I even had it produce a few things similar to your battleship demo.
And I don't see it improving much next year either, if the best idea anybody has is just to give it more data, which seems to be the mantra in ML circles. There's not an infinite supply of data to give it.
Absolutely. I posted a similar experience developing a Chrome extension with GPT-4o in an hour or so, when it would have taken me at least a day on my own. I have no idea how people are hand-waving LLMs away as no big deal.
I think the only justification for such a position is if you are a graybeard with full mastery of a stack and that's all you work in. I've dealt with these guys over the years and they are indeed wizards at Rails or Django or what have you. In those cases, I could see the argument that they are actually more efficient than an LLM when working on their specialty.
Which I guess is the difference. I'm a generalist, and I'm often working in technologies I have little experience with. To me, LLMs are invaluable for this. They're like pair programming with somebody who has memorized all of Stack Overflow.
Where did you get the idea that it can figure out things that weren't fed into it (e.g. not on Stack Overflow)? In the past year, none of them could give a reasonable answer to any of my questions where I couldn't already find something on Google. They failed very badly when my question had no answer and the question itself needed to change.
We’re going to have AI building Drupal sites soon. The platform is well architected for this. Most of the work is generating configuration files that scaffold the site. There are already AI integrations for content. The surface area is relatively small, and the options are well defined in code and documentation. I would not be surprised if we pull this off first. It’s one of the current project initiatives.
The coding part is still a hard problem. AI for front end and module code is still pretty primitive. LLMs are getting more helpful with that over time.
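To give a sense of the surface area: scaffolding a content type is mostly emitting YAML config files, along these lines. This is a rough sketch based on Drupal's exported-config conventions; the exact keys in a real node.type export may differ:

```python
# Sketch: generating a Drupal content-type config file
# (node.type.article.yml). Key names are from memory, illustrative only.
import yaml  # pip install pyyaml

node_type = {
    "langcode": "en",
    "status": True,
    "dependencies": {},
    "name": "Article",
    "type": "article",
    "description": "A basic article content type.",
    "new_revision": True,
    "display_submitted": True,
}

with open("node.type.article.yml", "w") as f:
    yaml.safe_dump(node_type, f, sort_keys=False)
```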
I've noticed that LLMs speed me up when working with languages I'm bad at, but slow me down when working in languages I'm good at.
When I hear people saying they use them for 80-90% of their code it kind of blows my mind. Like how? Making crazy intricate specs in English seems way more of a pain in the ass to me than just writing code.
I'm a FAANG Sr. Software Engineer, use it both in my company and personal projects, and it has made me much faster, but now I'm just "some other person who made this claim".
Can you publish your workflow? I'm on the hunt for resources from people who make the claim. Publishing their workflow in a repeatable way would go a long way.
I find it suspicious that we aren't inundated with tutorials proving these extraordinary claims.
Quick example: I wanted to make a clickable on-screen piano keyboard. I told it I was using Vue and asked for the HTML/CSS to do this (after looking at several GitHub and other examples that seemed fairly complicated). It spat out code that worked out of the box in about a minute.
I gave it a package.json file that had gotten messed up, with many dependency versions out of sync with each other, and it immediately fixed them up.
I asked it for a specific way, using BigQuery SQL, to delete duplicate rows while avoiding a certain function. Again, one minute, done (roughly the pattern sketched below).
I have given it broken code and said "this is the error" and it immediately highlights the error and shows a fix.
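For the curious, the BigQuery answer had roughly this shape. It's a reconstruction rather than its exact output; I'm leaving out which function I was avoiding, and the table and column names here are placeholders:

```python
# Sketch of the common ROW_NUMBER/QUALIFY dedupe pattern in BigQuery,
# run through the official client library.
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

sql = """
CREATE OR REPLACE TABLE my_dataset.my_table AS
SELECT *
FROM my_dataset.my_table
WHERE TRUE  -- BigQuery requires WHERE/GROUP BY/HAVING alongside QUALIFY
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY id           -- columns that define a "duplicate"
  ORDER BY updated_at DESC  -- which copy to keep
) = 1
"""

client.query(sql).result()  # blocks until the job finishes
```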
But given a large project, does any of that really average out to a multiple? Or is it just a nice-to-have? This is what I keep encountering: it's all very focused on boilerplate, templating, small functions, and utilities. But at some point of code complexity, what % of time is really being saved?
Especially considering the amount of AI babysitting and verification required. AI code obviously cannot be trusted even when it "works."
I watched the video and there wasn't anything new compared to how I used Copilot and ChatGPT for over a year. I stopped because I realized it eventually got in the way, and I felt it was preventing me from building the mental model and muscle memory that the early drudge work of a project requires.
I still map AI code completion to ctrl-;, but I find myself hardly ever calling it up.
(For the record, I have 25+ years of professional experience.)
20+ years here. I've been exploring different ways of using them, and this is why I recommend that people start exploring now. IntelliJ with Copilot was an interesting start, but ended up being a bit of a toy. My early breakthrough was asking chat interfaces to make something small, then refine it over a conversation. It's very fast until it starts returning snippets that you need to fold into the previous outputs.
When Claude 3.5 came out with a long context length, you could start pasting a few files in, or have it break the project into a few files, and it would still produce consistent edits across them. Then I put some coins in the API sides of the chat models and started using Zed. Zed lets you select part of a file and specify a prompt, then it diffs the result over the selection and prompts to confirm the replace. This makes it much easier to validate the changes. There's also a chat panel where you can use /commands to specify which files should be included in the chat context. Some of my co-workers have been pushing Cursor as being even more productive. I like open source and so haven't used Cursor yet, but their descriptions of language-aware context are compelling.
The catch is that, whatever you use, it's going to get much better, for free. We haven't seen that since the '90s, so it's easy to brush it off, but models are getting better and there isn't a flattening trend yet.
So I stand behind my original statement: this time is different. Do yourself a favor and get your hands dirty.
I'm 20+ years here too, and I just want to call out that the "but given a large project" comment is a bit of moving the goalposts.
I do think it is helpful in large projects, just much less so. The other comment gives a good example of how it can be useful, and given how quickly context sizes are increasing, it seems fairly obvious that it will be able to deal with large projects soon.
When using it in larger projects, I'm typically manipulating specific functions or single pages at a time and using a diff tool, so it comes across more like a PR that I need to verify or tweak.
Sure, but advocates are talking about multiples of increased productivity, and how can that be defended if it doesn't scale to a project of some size and complexity? I don't care if I get an Nx increase on a small script or at the beginning of a project. That's never been the pain point in development for me.
If someone said that, over a significant amount of time and effort, these tools saved them 5% or maybe even 10% then I would say that seems reasonable. But those aren't the kinds of numbers advocates are claiming. And even then, I'd argue that 5-10% comes with a cost in other areas.
And again, not to belabor the point, but where are the in-depth workflows published for senior engineers to get these productivity increases? Not short YouTube videos, but long-form books, playlists, and tutorials that we can use to replicate and verify the results?
Don't you think that's a little suspect that we haven't been flooded with them like we are with every other new technology?
No, I don't think it's suspect, because it seems like you're looking at too narrow a notion of productivity increase.
"Advocates are talking about multiples of increased productivity": some are, some aren't, and I don't think most people are, but sure, there's a lot of media hype.
The argument reads like a lot of generalized internet arguments: "these [vague people] say [generality] about [other thing]".
There are places where I do think it can make a significant, multiples-level difference, but in short spurts: taking over a codebase, learning a new language, non-technical founders getting started without a technical co-founder. I think it's junior engineers who have the greater chance of being replaced, not senior engineers.
All I can surmise from comments like this is that you must have invented some completely unreasonable bar for "evidence" that nothing can possibly pass. Either that, or you simply haven't looked at all.
I didn't get anything from messing with LLMs, but I also don't get much use out of Stack Overflow, even as some people spend hours a week on that site. It's not a question of skill, just the nature of the work.
Then you don't understand how to use the tools. LLMs are an accelerator for people who learn how to work with the prompts correctly and already have a good grasp of the domain in which they are asking questions.