The problem is when you use your "copy" as inspiration and actually create and publish something. It is very hard to be certain you are safe: beyond literal expression, close paraphrasing is also infringing, and so is using world-building elements or any original abstraction (the AFC test). You can only find out after a lawsuit.
It is impossible to tell how much AI any creator used in secret, so now all works are under suspicion. If copyright maximalists succeed in copyrighting style (vibes), creativity will be threatened. If they don't succeed, copyright protection will become meaningless. A catch-22.
Well said, I have been saying the same. Besides helping agents code, it helps us trust the outcome more. You can't trust code that isn't tested, and you can't read every line of code; that would be like walking the motorcycle. So tests (back pressure, deterministic feedback) become essential. You only know something works as well as its tests show.
What we often like to do in a PR - look over the code and say "LGTM" - I call this "vibe testing", and I think it is the truly bad pattern to use with AI. You can't commit your eyeballs to the git repo, and you are probably not doing as good a job as you would with actual test coverage. LGTM is just vibes. Automating tests also removes manual work from you, not just makes the agent more reliable.
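A minimal sketch of the kind of deterministic feedback I mean (the slugify function and test names are made up, not from any real project): a plain pytest file that the agent, or a human, re-runs after every change, instead of anyone eyeballing the diff.

    # test_slugify.py - hypothetical sketch of deterministic feedback
    # instead of "LGTM". The assertions are the signal; re-run them
    # after every change with `pytest -q`.
    import re
    import pytest

    def slugify(title: str) -> str:
        # Placeholder implementation; in practice the agent maintains this.
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        if not slug:
            raise ValueError("empty slug")
        return slug

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_collapses_punctuation():
        assert slugify("Rock & Roll!") == "rock-roll"

    def test_rejects_empty_input():
        with pytest.raises(ValueError):
            slugify("")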
But my metaphor for tests is "they are the skin of the agent", allowing it to feel pain. And the docs/specs are the "bones", allowing it to have structure. The agent itself is the muscle and cerebellum, and the human in the loop is the PFC.
The "pattern matching" perspective is true if you zoom in close enough, just like "protein reactions in water" is true for brains. But if you zoom out you see both humans and LLMs interact with external environments which provide opportunity for novel exploration. The true source of originality is not inside but in the environment. Making it be all about the model inside is a mistake, what matters more than the model is the data loop and solution space being explored.
I don't dispute that the situation is rapidly evolving. It is certainly possible that we could achieve AGI in the near future. It is also entirely possible that we might not. Claims that AGI is close, or that we will soon be replacing developers entirely, are pure hype.
When someone says something to the effect of "LLMs are on the verge of replacing developers any day now" it is perfectly reasonable to respond "I tried it and it came up with crap". If we were actually near that point you wouldn't have gotten crap back when you tried it for yourself.
There's a big difference between "I tried it and it produced crap" and "it will replace developers entirely any day now"
People who use this stuff every day know that people who are still saying "I tried it and it produced crap" just don't know how to use it correctly. Those developers WILL get replaced - by ones who know how to use the tool.
> Those developers WILL get replaced - by ones who know how to use the tool.
Now _that_ I would believe. But note how different "those who fail to adapt to this new tool will be replaced" is from "the vast majority will be replaced by this tool itself".
If someone had said that six (give or take) months ago I would have dismissed it as hype. But there have been at least a few decently well-documented AI-assisted projects done by veteran developers that have made the front page recently. Importantly, they've shown clear and undeniable results as opposed to handwaving and empty aspirations. They've also been up front about the shortcomings of the new tool.
You probably mean antirez porting Flux to C. There were not too many shortcomings in his breakdown; the biggest one, as I saw it, was that his knowledge and experience building large C programs really were a requirement. But given one of these experts, do you not see how that person plus Claude Code just replaces a team? The less capable people on the team cannot do what he does, so before, they were just entering code and getting corrected in reviews or asking for help. Now the AI can do that, but on 10 projects in parallel. In a weekend you won't have time for that, but not everything has to be done in a weekend.
You only scale your way out in verifiable domains, like code, math, optimization, games, and simulations. In all the other domains, the AI developers still get billions (trillions) of tokens daily, which are validated by follow-up messages minutes or even days later. If you can study them longitudinally, you can extract feedback signals, such as when people apply the LLM's idea in practice and come back to iterate later.
Untested, undocumented LLM code is technical debt, but if you write specs and tests, it's actually the opposite: you can go beyond technical debt and regenerate your code as you like. You just need testing good enough to guarantee the behavior you care about, and that is easier in our age of AI coding agents.
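A minimal sketch of what "testing good enough to guarantee the behavior" can look like, using the hypothesis library and a made-up encode/decode pair (none of this comes from any real project): the round-trip property is the guarantee, so whatever sits underneath can be regenerated at will.

    # Hypothetical property test: the round-trip guarantee is the spec;
    # any regenerated encode/decode pair that satisfies it is acceptable.
    from hypothesis import given, strategies as st

    def encode(data: bytes) -> str:      # placeholder implementation
        return data.hex()

    def decode(text: str) -> bytes:      # placeholder implementation
        return bytes.fromhex(text)

    @given(st.binary())
    def test_round_trip(data):
        assert decode(encode(data)) == data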
> but if you do specs and tests it's actually the opposite, you can go beyond technical debt and regenerate your code as you like.
Having to write all the specs and tests just right so you can regenerate the code until you get the desired output just sounds like an expensive version of the infinite monkey theorem, but with LLMs instead of monkeys.
I use LLMs to generate tests as well, but sometimes the tests are also buggy. As any competent dev knows, writing high-quality tests generally takes more time than writing the original code.
If you implement a project, keep the specs and tests, and re-implement it, the exact way it was coded should not matter as long as it was well tested. So you don't need deterministic LLMs.
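A minimal sketch of what I mean, with made-up names: the same spec-level suite parametrized over two independently generated implementations. If both pass, the exact code behind them does not matter.

    # test_contract.py - hypothetical: one behavioral contract, two
    # regenerated implementations; either one that passes is acceptable.
    import pytest

    def reverse_v1(s: str) -> str:          # first generation
        return s[::-1]

    def reverse_v2(s: str) -> str:          # regenerated later, different style
        out = []
        for ch in s:
            out.insert(0, ch)
        return "".join(out)

    @pytest.mark.parametrize("impl", [reverse_v1, reverse_v2])
    def test_reverses(impl):
        assert impl("abc") == "cba"

    @pytest.mark.parametrize("impl", [reverse_v1, reverse_v2])
    def test_empty_string(impl):
        assert impl("") == ""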
I think work with LLMs should be centered on testing, since that is how the agent is fenced into a safe space where it can move without risk. Tests are the skin, specs are the bones, and the agent is the muscle.
I think reading the code as the sole defense against errors is a grave mistake; it is "vibe testing". An LGTM is something you cannot reproduce. Reading all the code is like walking the motorcycle.
The first time you generate the code, it calls the method doFoo(), and the test calls that method. The second time you generate the code, it calls the method fooify(), and the test breaks (see the sketch below).
How do you propose to get around this, without a human specifying every class layout in detail?
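To make that failure mode concrete (the names are made up, following the ones above): the generated test is coupled to an internal method name rather than to behavior, so a faithful regeneration still breaks it.

    # Hypothetical illustration of the breakage described above.
    # First generation: the implementation exposes doFoo(), and the
    # generated test is coupled to that name.
    class Widget:
        def doFoo(self):
            return 42

    def test_widget():
        assert Widget().doFoo() == 42

    # Second generation: the regenerated class names the same behavior
    # fooify(); test_widget() now fails with AttributeError even though
    # nothing about the behavior changed.
    class RegeneratedWidget:
        def fooify(self):
            return 42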