I did read it, but the whole premise is flawed due to an apparently incomplete understanding of how LLMs work. Including code samples in your prompt won't have the effect you think it will.
LLMs are trained to produce results that are statistically likely to be syntactically well-formed according to assumptions made about how "language" works. So when you provide code samples, the model incorporates those into the response. But it doesn't have any actual comprehension of what's going on in those code samples, or of what any code "means"; it's all just pushing syntax around. What you end up with is responses that are more likely to look like what you want, but there's no guarantee, or even necessarily a correlation, that the tuned responses will actually produce meaningfully good code. That increases the odds of a bug slipping by because, at a glance, the output looks correct.
Until LLMs can generate code with proofs of semantic meaning, I don't think it's a good idea to trust them. You're welcome to do as you please, of course, but I would never use them for anything I work on.
If it works, it works, and it definitely works for me. I've been using Copilot for about a year and I can't imagine coding without it again. I cannot recall any bugs slipping by because of it. If anything, it makes me write fewer bugs, since it has no problem taking tedious edge cases into account.
> I've been using Copilot for about a year and I can't imagine coding without it again
I, for example, used Copilot for two months at work and wouldn't pay for it. Most suggestions were either useless or buggy. But I work in a huge C++ codebase; maybe that's hard for it, as C++ is also hard for ChatGPT.
I think this is incorrect for most use-cases. LLMs do grok code semantically. Adding requests for coding style injects implementation specificity when flattening the semantic multidimensionality back into language.
No, they do not. That's not how LLMs work, and stating that it is betrays an absolute lack of understanding of the underlying mechanisms.
LLMs generate statistically likely sequences of tokens. Their statistical model is derived from huge corpora, such as the contents of the entire (easily searchable) internet, more or less. This makes it statistically likely that, given a common query, they will produce a common response. In the realm of code, this makes it likely the response will be semantically meaningful.
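To make that concrete, here is a minimal sketch of the generation loop, assuming the Hugging Face `transformers` library and the small, publicly available "gpt2" checkpoint purely for illustration (the models behind Copilot and ChatGPT are vastly larger, but the loop has the same shape): score every token in the vocabulary, turn the scores into a probability distribution, sample a likely token, append it, repeat. Nothing in it parses, type-checks, or executes the code it emits.

```python
# Minimal sketch of next-token sampling. Assumes the Hugging Face
# `transformers` library and the public "gpt2" checkpoint, purely for
# illustration of the mechanism.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "def fibonacci(n):"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(20):
    logits = model(input_ids).logits[:, -1, :]          # a score for every token in the vocabulary
    probs = torch.softmax(logits, dim=-1)               # scores -> probability distribution
    next_id = torch.multinomial(probs, num_samples=1)   # sample one statistically likely token
    input_ids = torch.cat([input_ids, next_id], dim=-1) # append and continue

print(tokenizer.decode(input_ids[0]))
```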
But the statistical model doesn't know what the code means. It can't. (And trying to use large buzzwords to convince people otherwise doesn't prove anything, for what it's worth.)
To see what I mean, just ask ChatGPT about a slightly niche area. I work in programming languages research at a university, and I can't tell you how many times I've had to address student confusion because an LLM generated authoritative-sounding semantic garbage about my domain areas. It's not just that the output was wrong; it's that the model makes things up in every facet of the exercise to a degree a human simply couldn't. LLMs don't understand things; they generate text from statistical models, and nothing more.