
my personal perspectives

1) straight up copying. download a bunch of copyrighted stuff -> making a copy. no way out of this one.

2) a derivative work can be/is being generated here. very grey area — what counts as a “derivative” work? read about the robin thicke “blurred lines” court case for a rollercoaster of a time about derivative musical works.

3) making the model do so? do you mean getting an output and the user copying the result? that’s copying the derivative work, which depends on whatever copyright agreement happens once a derivative-work claim is sorted out.

that’s based on my 5 years of music copyright experience, although it was about ten years ago now, so there might be some stuff i’ve got wrong there.




You can ensure a model trains on transformative rather than derivative synthetic texts, for example by asking for summaries, turning the source into QA pairs, or doing contrastive synthesis across multiple copyrighted works. That way the resulting model can never regurgitate the originals, because it has never seen them. This approach takes only the abstract ideas from copyrighted sources, leaving their specific expression protected.
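
(As a rough illustration of what that could look like, here is a minimal Python sketch of such a pipeline. The generate() helper is a hypothetical stand-in for whatever LLM API you'd actually call, and the prompt wording and function names are illustrative assumptions, not anything specified above.)

    # Minimal sketch of a "transformative-only" synthetic-data pipeline.
    # generate() is a hypothetical placeholder for an LLM call; prompts and
    # function names are illustrative, not any real library's API.

    def generate(prompt: str) -> str:
        """Placeholder for whatever LLM backend you use."""
        raise NotImplementedError

    def summarize(source: str) -> str:
        # Abstract summary: ideas only, no quotes or close paraphrase.
        return generate(
            "Summarize the main ideas of the following text in your own "
            "words, without quoting it:\n\n" + source)

    def to_qa_pairs(source: str, n: int = 5) -> str:
        # Question/answer pairs about the concepts, not the expression.
        return generate(
            f"Write {n} question-and-answer pairs covering the key concepts "
            "in the following text, without quoting it verbatim:\n\n" + source)

    def contrastive_synthesis(a: str, b: str) -> str:
        # Compare two works so no single work's expression carries over.
        return generate(
            "Compare and contrast the themes and arguments of these two "
            "texts in your own words:\n\nTEXT A:\n" + a + "\n\nTEXT B:\n" + b)

    def build_synthetic_corpus(works: list[str]) -> list[str]:
        corpus = []
        for w in works:
            corpus.append(summarize(w))
            corpus.append(to_qa_pairs(w))
        for a, b in zip(works, works[1:]):
            corpus.append(contrastive_synthesis(a, b))
        return corpus  # train on this instead of the original texts

The point of that structure is that the original texts never enter the training set directly; only abstracted restatements of their ideas do.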

If abstract ideas were protectable, what would stop an LLM from learning not from the original source but from social commentary and follow-up works? We can't ask people not to reproduce ideas they read about. But on the other hand, protecting abstractions would kneecap creativity in both humans and AI.


That's an interesting argument, which makes the case for "it's what you make it do, not what it can do, which constitutes a violation" a little stronger IMO.


1) It's definitely copying, but that doesn't necessarily mean the end product is itself a copyright violation. (And that remains true even where some of the steps to make it were themselves violations).

2) Agreed! Where this becomes interesting with LLMs is that, as with people, they can have the capacity to produce a derivative work even without having seen the original.

For example, an LLM that had "read" enough reviews of Harry Potter might be able to produce a reasonable stab at the book (at least enough for the law to consider it a derivative) without ever having consumed the work itself or its direct derivatives.

3) It's more of a tool-use and intent argument. One might argue that an LLM is a machine, not a set of content/data, and that the liability for what it does sits firmly with the user/operator, not those who made it. If I use a typewriter to copy Harry Potter, or a weapon to hurt or kill someone, neither the machine nor its maker carries any liability.



