>> Is it really new? Humans have always learnt by studying what's out there already.
"Humans" being the important word here. I don't understand why people keep trying to compare training a model to humans learning through reading etc. They are very different things. Learning done by machines at enormous scale and done to benefit private companies financially is not the same as humans learning.
How is it meaningfully different with respect to this question?
If I go to a museum and look at a bunch of modern paintings, then go home and paint something new but “in the style of”, this is well-established as within my rights, regardless of how any of the painters whose work I studied and was inspired by might feel.
If I take a notebook and write down some notes about the themes and stylistic attributes of what I see, then go home and paint something in the same style, that too is fine - right? Or would you argue the notes I took are a copyright violation? Or the works I made using those notes?
Now let’s say I automate the process of recording those notes. Does that change the fundamentals of what is happening, with respect to copyright?
The law most definitely distinguishes between the rights of a human and the rights of a software program running on a computer.
AI does not read, look at or listen to anything. It runs algorithms on binary data.
An AI developer who uses millions of files to program their AI system also does not read, look at or listen to all of that stuff. They copy it.
That is the part explicitly covered by international copyright law. It is not possible to use some file to "train" an ML model except by copying that file. That's just a fact. It wasn't the computer that went out and read or looked at the work. It was a human who took a binary copy of it, ran some algorithms on it without even looking at it, and published/sold/gave access to the software.
AI software is a work by an author; not an author.
Yes, but also very similar. We learn very well by spaced repetition, and by practicing things. Our whole nervous system stores information in a similar way. Are the brain and the individual neurons more complex? Yes, sure, but that doesn't negate the core similarities.
> Learning done by machines at enormous scale and done to benefit private companies financially is not the same as humans learning.
Yes, that's the important difference. In the end, if you train a robot you get a program that's easy to copy/scale, and the marginal cost of using it is orders of magnitude lower than what you'd get with humans in the loop.
We already have fair use in copyright, because there are important differences between the various forms and modalities of human imitation.
And of course maybe it's time to rename copyright to usageright. After all, current copyright doesn't even apply in most cases. (The results are not derivatives, there's sufficient substantial transformative difference, etc. That said, the phrasing in the US constitution still makes sense: "... the exclusive Right to their respective Writings and ..." ... if we interpret Right to mean all rights, including even the right of who can read/see it.)
The differences don't seem salient though. Doing a legal thing faster doesn't generally make it any less legal; doing it for profit changes the legal regime somewhat but not in ways that seem relevant to what's being claimed.