Hacker News

This is the entire point of existence for the GPL. Weaponize copyright. LLMs have conveniently been able to circumvent this somehow, and we have no answer for it.





Because some people keep asserting that LLMs “don’t count as stealing” and “how come search links are fine but GPT reciting paywalled NYT articles on demand is bad??”, without so much as a hint of irony.

LLM tech is pretty cool.

Would be a lot cooler if its existence weren’t predicated on the wholesale theft of everyone’s stuff, immediately followed by denial of that theft, poisoning of the well, and massive profit from it.


>Because some people keep asserting that LLMs “don’t count as stealing”

People who confidently assert either opinion in this regard are wrong. The lawsuits are still pending. But if I had to bet, I'd bet on the OpenAI side. Even if they don't win outright, they'll probably carve out enough exemptions and mandatory licensing deals to be comfortable.


You are singling out accidental replication and forgetting that it was triggered with fragments of the original material. Almost all LLM outputs are original, both because models use randomness to sample and because they are conditioned on the user's prompt.
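For illustration, here is a minimal sketch of what "randomness to sample" means in practice. Temperature sampling over a softmax distribution is one common scheme; the logits below are hypothetical stand-ins for a real model's output:

```python
import math
import random

random.seed(0)  # for reproducibility in this sketch

def sample_token(logits, temperature=1.0):
    """Draw one token index from a softmax over the given logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # inverse-CDF draw: pick the index where the cumulative mass crosses r
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Two runs over the same hypothetical logits can pick different tokens,
# so long verbatim replication requires the same draw at every step.
logits = [2.0, 1.5, 0.3]
draws = [sample_token(logits) for _ in range(10)]
```

Because each step is a fresh draw, the probability of reproducing a long passage token-for-token shrinks multiplicatively with its length unless the model's distribution is extremely peaked on that exact text.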

And LLMs are a really bad choice of tool for infringement. They are slow, costly, and unreliable at replicating any large piece of text compared to simply copying it. There is no space to perfectly memorize the majority of the training set: a 10B-parameter model is trained on roughly 10T tokens, leaving no room for more than about 0.1% to be properly memorized.
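The back-of-envelope arithmetic behind that 0.1% figure can be made explicit. This assumes, hypothetically, that verbatim memorization costs at least one parameter per training token; the true per-token cost is debated and the numbers here are illustrative, not measured:

```python
# Back-of-envelope capacity check, not a measurement.
# Assumption (hypothetical): storing one training token verbatim
# consumes at least one parameter of model capacity.
params = 10e9    # 10B-parameter model
tokens = 10e12   # 10T training tokens

memorizable = params * 1.0       # tokens storable under the assumption
fraction = memorizable / tokens
print(f"{fraction:.1%} of the training set")  # prints "0.1% of the training set"
```

Even if the per-token cost were several times lower, the memorizable share would still be a small fraction of a percent of the corpus.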

I see this overreaction as an attempt to strengthen copyright, a kind of NIMBYism in which established authors pull up the ladder behind them by walling off abstract ideas and making it more probable that the next generation gets sued for accidental similarities.



