> does anyone else feel that this model isn’t significantly different
According to Anthropic¹, LLMs are mostly a thing in the software engineering space, and not much elsewhere. I am not a software engineer, and so I'm pretty agnostic about the whole thing, mildly annoyed by the constant anthropomorphisation of LLMs in the marketing surrounding it³, and besides having had a short run with Llama about 2 years ago, I have mostly stayed away from it.
Though, I do scripting as a mean to keep my digital life efficient and tidy, and so today I thought that I had a perfect justification for giving Claude 4 Sonnet a spin. I asked it to give me a jujutsu² equivalent for `git -ffdx`. What ensued was that: https://claude.ai/share/acde506c-4bb7-4ce9-add4-657ec9d5c391
I leave you the judge of this, but for me this is very bad. Objectively, for the time that it took me to describe, review, correct some obvious logical flaws, restart, second-guess myself, get annoyed for being right and having my time wasted, fighting unwarranted complexity, etc…, I could have written a better script myself.
So to answer your question, no, I don't think this is significant, and I don't think this generation of LLMs are close to their price tag.
³: "hallucination", "chain of thought", "mixture of experts", "deep thinking" would have you being laughed at in the more "scientifically apt" world I grew up with, but here we are </rant>
According to Anthropic¹, LLMs are mostly a thing in the software engineering space, and not much elsewhere. I am not a software engineer, and so I'm pretty agnostic about the whole thing, mildly annoyed by the constant anthropomorphisation of LLMs in the marketing surrounding it³, and besides having had a short run with Llama about 2 years ago, I have mostly stayed away from it.
Though, I do scripting as a mean to keep my digital life efficient and tidy, and so today I thought that I had a perfect justification for giving Claude 4 Sonnet a spin. I asked it to give me a jujutsu² equivalent for `git -ffdx`. What ensued was that: https://claude.ai/share/acde506c-4bb7-4ce9-add4-657ec9d5c391
I leave you the judge of this, but for me this is very bad. Objectively, for the time that it took me to describe, review, correct some obvious logical flaws, restart, second-guess myself, get annoyed for being right and having my time wasted, fighting unwarranted complexity, etc…, I could have written a better script myself.
So to answer your question, no, I don't think this is significant, and I don't think this generation of LLMs are close to their price tag.
¹: https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-...
²: https://jj-vcs.github.io/jj/latest/
³: "hallucination", "chain of thought", "mixture of experts", "deep thinking" would have you being laughed at in the more "scientifically apt" world I grew up with, but here we are </rant>