The problem is when you use your "copy" as inspiration and actually create and publish something. It is very hard to be certain you are safe: beyond literal expression, close paraphrasing is also infringing, and so is using world-building elements or any original abstraction (the AFC test). You can only find out after a lawsuit.
It is impossible to tell how much AI any creator used in secret, so now all works are under suspicion. If copyright maximalists succeed in copyrighting style (vibes), creativity will be threatened. If they don't succeed, copyright protection will become meaningless. A catch-22.
Well said, I have been saying the same. Besides helping agents code, it helps us trust the outcome more. You can't trust code that isn't tested, and you can't read every line of code; that would be like walking the motorcycle. So tests (back pressure, deterministic feedback) become essential. You only know something works as well as its tests show.
What we often like to do in a PR - look over the code and say "LGTM" - I call this "vibe testing", and I think it is the truly bad pattern to use with AI. You can't commit your eyeballs to the git repo, and you are probably not doing as good a job as you would with actual test coverage. LGTM is just vibes. Automating tests also removes manual work from you, not just makes the agent more reliable.
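A minimal sketch of the kind of deterministic feedback I mean (the slugify function and test names are made up, not from any real project): a plain pytest file that the agent, or a human, re-runs after every change, instead of anyone eyeballing the diff.

    # test_slugify.py - hypothetical sketch of deterministic feedback
    # instead of "LGTM". The assertions are the signal; re-run them
    # after every change with `pytest -q`.
    import re
    import pytest

    def slugify(title: str) -> str:
        # Placeholder implementation; in practice the agent maintains this.
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        if not slug:
            raise ValueError("empty slug")
        return slug

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_collapses_punctuation():
        assert slugify("Rock & Roll!") == "rock-roll"

    def test_rejects_empty_input():
        with pytest.raises(ValueError):
            slugify("")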
But my metaphor for tests is "they are the skin of the agent", allowing it to feel pain. And the docs/specs are the "bones", allowing it to have structure. The agent itself is the muscle and cerebellum, and the human in the loop is the PFC.
The "pattern matching" perspective is true if you zoom in close enough, just like "protein reactions in water" is true for brains. But if you zoom out you see both humans and LLMs interact with external environments which provide opportunity for novel exploration. The true source of originality is not inside but in the environment. Making it be all about the model inside is a mistake, what matters more than the model is the data loop and solution space being explored.
I don't dispute that the situation is rapidly evolving. It is certainly possible that we could achieve AGI in the near future. It is also entirely possible that we might not. Claims that AGI is close, or that we will soon be replacing developers entirely, are pure hype.
When someone says something to the effect of "LLMs are on the verge of replacing developers any day now" it is perfectly reasonable to respond "I tried it and it came up with crap". If we were actually near that point you wouldn't have gotten crap back when you tried it for yourself.
There's a big difference between "I tried it and it produced crap" and "it will replace developers entirely any day now"
People who use this stuff every day know that people who are still saying "I tried it and it produced crap" just don't know how to use it correctly. Those developers WILL get replaced - by ones who know how to use the tool.
> Those developers WILL get replaced - by ones who know how to use the tool.
Now _that_ I would believe. But note how different "those who fail to adapt to this new tool will be replaced" is from "the vast majority will be replaced by this tool itself".
If someone had said that six (give or take) months ago I would have dismissed it as hype. But there have been at least a few decently well-documented AI-assisted projects done by veteran developers that have made the front page recently. Importantly, they've shown clear and undeniable results as opposed to handwaving and empty aspirations. They've also been up front about the shortcomings of the new tool.
You probably mean antirez porting Flux to C. There were not too many shortcomings in his breakdown; the biggest one, as I saw it, was that his knowledge and experience building large C programs really were a requirement. But given one of these experts, do you not see how that person plus Claude Code just replaces a team? The less capable people on the team cannot do what he does, so before, they were just entering code and getting corrected in reviews or asking for help. Now the AI can do that, but on 10 projects in parallel. In a weekend you won't have time for that, but not everything has to be done in a weekend.
You only scale your way out in verifiable domains, like code, math, optimization, games, and simulations. In all the other domains, the AI developers still get billions (trillions) of tokens daily, which are validated by follow-up messages minutes or even days later. If you can study them longitudinally, you can extract feedback signals, such as when people apply the LLM's idea in practice and come back to iterate later.
Untested, undocumented LLM code is technical debt, but if you write specs and tests, it's actually the opposite: you can go beyond technical debt and regenerate your code as you like. You just need testing good enough to guarantee the behavior you care about, and that is easier in our age of AI coding agents.
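A minimal sketch of what "testing good enough to guarantee the behavior" can look like, using the hypothesis library and a made-up encode/decode pair (none of this comes from any real project): the round-trip property is the guarantee, so whatever sits underneath can be regenerated at will.

    # Hypothetical property test: the round-trip guarantee is the spec;
    # any regenerated encode/decode pair that satisfies it is acceptable.
    from hypothesis import given, strategies as st

    def encode(data: bytes) -> str:      # placeholder implementation
        return data.hex()

    def decode(text: str) -> bytes:      # placeholder implementation
        return bytes.fromhex(text)

    @given(st.binary())
    def test_round_trip(data):
        assert decode(encode(data)) == data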
> but if you do specs and tests it's actually the opposite, you can go beyond technical debt and regenerate your code as you like.
Having to write all the specs and tests just right so you can regenerate the code until you get the desired output just sounds like an expensive version of the infinite monkey theorem, but with LLMs instead of monkeys.
I use LLMs to generate tests as well, but sometimes the tests are also buggy. As any competent dev knows, writing high-quality tests generally takes more time than writing the original code.
If you implement a project, keep the specs and tests, and re-implement it, the exact way it was coded should not matter as long as it was well tested. So you don't need deterministic LLMs.
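A minimal sketch of what I mean, with made-up names: the same spec-level suite parametrized over two independently generated implementations. If both pass, the exact code behind them does not matter.

    # test_contract.py - hypothetical: one behavioral contract, two
    # regenerated implementations; either one that passes is acceptable.
    import pytest

    def reverse_v1(s: str) -> str:          # first generation
        return s[::-1]

    def reverse_v2(s: str) -> str:          # regenerated later, different style
        out = []
        for ch in s:
            out.insert(0, ch)
        return "".join(out)

    @pytest.mark.parametrize("impl", [reverse_v1, reverse_v2])
    def test_reverses(impl):
        assert impl("abc") == "cba"

    @pytest.mark.parametrize("impl", [reverse_v1, reverse_v2])
    def test_empty_string(impl):
        assert impl("") == ""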
I think work with LLMs should be centered on testing, since that is how the agent is fenced into a safe space where it can move without risk. Tests are the skin, specs are the bones, and the agent is the muscle.
I think reading the code as the sole defense against errors is a grave mistake; it is "vibe testing". An LGTM is something you cannot reproduce. Reading all the code is like walking the motorcycle.
The first time you generate the code, it calls the method doFoo(), and the test calls that method. The second time you generate the code, it calls the method fooify(), and the test breaks (see the sketch below).
How do you propose to get around this, without a human specifying every class layout in detail?
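To make that failure mode concrete (the names are made up, following the ones above): the generated test is coupled to an internal method name rather than to behavior, so a faithful regeneration still breaks it.

    # Hypothetical illustration of the breakage described above.
    # First generation: the implementation exposes doFoo(), and the
    # generated test is coupled to that name.
    class Widget:
        def doFoo(self):
            return 42

    def test_widget():
        assert Widget().doFoo() == 42

    # Second generation: the regenerated class names the same behavior
    # fooify(); test_widget() now fails with AttributeError even though
    # nothing about the behavior changed.
    class RegeneratedWidget:
        def fooify(self):
            return 42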