I agree, but that doesn't make it good - or perhaps even acceptable. To quote myself answering another commenter:
> Never before has the reuse (I'm trying to avoid using the word theft) of content produced by others been conducted on such an industrial scale. The entire business model of LLMs and generative models has been to take information created by masses of humans and reproduce it. They seem to have zero qualms taking all the work of professional and amateur artists and feeding it into a statistical model that trivializes replication and reproduction. You could argue that humans do this as well, but I feel scale matters here. A kitchen knife can be used to murder someone, but with a machine gun you can mow down masses of people. Please excuse the morbid example, but I'm trying to drive home a point: if we make a certain thing extremely easy, people will do it, and likely do it on a mass scale. You could argue that this is progress, but is all progress inherently beneficial?
I agree that scale changes the nature of what's going on, but I'm not sure it follows that the scaled-up variant is bad. I think models like GPT-3 and Sonnet, which are intended as "general purpose intelligence", are fine. Same with Copilot and Phind for coding. They contain copyrighted knowledge, but not by necessity, and their purpose is not to reproduce copyrighted material.
Training a diffusion model on a specific artist's work with the intent to reproduce their style, I think, clearly falls on the wrong side of that line. While it's true a human could do the same thing, there is a much stronger case that the model itself is a derivative work.
I think the courts will be able to identify cases where models are "laundering copyright" as distinct from cases where copyrighted material is being used to accomplish a secondary goal like image editing. Taking a step back, this is in some sense what copyright is for: you get protections on your work in exchange for making it part of the public body of knowledge, to be used for things you might not have intended.
You raise very good points, and I agree that scale is not necessarily bad; in fact, it can be a source of much good. Scale simply increases the frequency, and thus the likelihood, of things, whether good or bad.
I'm sure that big players will be able to assert their rights with their armies of lawyers, just like the music and movie industries did after the rise of file sharing.
My worry is perhaps more subtle: I'd argue that generative AI draws much more from the masses of small content creators, and they will not be able to assert their rights. In some sense, if people pirate the next blockbuster movie, the producers might only make 1 billion instead of 1.1 (and piracy has never been proven to actually impact sales). But if all content starts being consumed via massive, centralized anonymizers, the masses of people who made the internet what it is will eventually disappear. The scale on which these tools can hoover up information and reproduce it is unprecedented, as is the fact that nobody important seems to be thinking about how to keep actual humans motivated to generate the content that feeds the AI in the first place.
It's one of those cursed things: the long-term interest of the AI companies is for humans to keep feeding the beast with more information, but their short-term interest is to capture their audience and do everything possible to keep them inside the walled garden of a single AI provider. It is not in the companies' interest for people to step outside and go straight to the painter/writer/moviemaker, because at that point the AI is no longer needed.