Hacker News

That's what I wanted to ask, where do we draw the line of copyright when it comes to inputs of generative ML?

It's perfectly fine for me to develop programming skills by reading any code, regardless of the license. When a corporation snatches an employee from a competitor, the employee gets to keep their skills even if they signed an NDA and can't talk about what they worked on. The exception is the non-compete agreement, under which you can't take those skills to a rival at all. Good luck making a non-compete agreement with a neural network.

Even if someone feeds stolen or illegal data as an input dataset to gain advantage in ML, how do we even prove it if we're only given the trained model and it generalizes well?




Copyright is going to get very muddy in the next few decades. ML systems may be able to generate entire novels in the styles of books they have digested, with only some assistance from human editors. The same goes for artwork and music, and perhaps eventually video too. Even determining "similarity" may soon have to be taken out of the hands of a judge and given to another ML system.
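To make the "similarity" point concrete: the crudest automated signal a system like that might start from is a bag-of-words cosine similarity. This is a minimal illustrative sketch, not a claim about how any real court or model would do it; real systems would need semantic features, not raw word counts.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts (0.0 to 1.0)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

original  = "the quick brown fox jumps over the lazy dog"
suspect   = "the quick brown fox leaps over a lazy dog"
unrelated = "import numpy as np"

# A lightly paraphrased text scores far higher than an unrelated one.
print(cosine_similarity(original, suspect))
print(cosine_similarity(original, unrelated))
```

The obvious weakness (and why a judge still matters) is that surface similarity says nothing about whether the overlap is protectable expression or just common phrasing.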


> It's perfectly fine for me to develop programming skills by reading any code regardless of the license.

I'd be inclined to agree with this, but whenever a high-profile leak of source code happens, reading that code can have dire consequences for reverse engineers. It turns clean-room reverse engineering into something derivative, as if the code that was read had the ability to infect whatever the programmer wrote later.

A situation along these lines developed in the ReactOS project: https://en.wikipedia.org/wiki/ReactOS#Internal_audit


>how do we even prove it if we're only given the trained model and it generalizes well?

Someone's going to have to audit the model, the training process, and the data that went into it. There's a documentary about black holes on Netflix that did something similar (no idea if it involved AI): each team wrote code to interpret the raw data independently, without collaboration, hints, or information leakage, and at the end their interpretations were all within a certain accuracy of one another.

So, as an example, if I can't train something in parallel and get results similar to the already trained model's, we know something is up and there is missing or altered data (at least, I think that's how it would work).
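The independent-reimplementation idea above can be sketched in a toy form: two "teams" separately implement the same estimate over the same raw data, then check that the answers agree within a tolerance. The team names and the trivial pipeline are made up for illustration; a real model audit would compare far richer artifacts than a single mean.

```python
import random
import statistics

def team_a_estimate(samples):
    # Team A: hand-rolled mean of the raw data.
    return sum(samples) / len(samples)

def team_b_estimate(samples):
    # Team B: independent implementation via the stdlib.
    return statistics.fmean(samples)

random.seed(42)
raw_data = [random.gauss(mu=5.0, sigma=1.0) for _ in range(10_000)]

a, b = team_a_estimate(raw_data), team_b_estimate(raw_data)
tolerance = 1e-6
# If the independent results diverge beyond tolerance, "something is up"
# with one of the pipelines (or with the data one of them was given).
print(abs(a - b) <= tolerance)
```

The catch for ML models is that training is far less deterministic than a mean, so "agree within a tolerance" has to be defined over model behavior (predictions on a test set), not over raw weights.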


Take it further. You could easily imagine using a service like this as invisible middleware behind a front end and asking users to pay for the service. Some would argue the code generation is attributable to those who created the model, but the reality is that the models were trained on code written by thousands of passionate users, unpaid, with the intent of free usage.


> but the reality is that the models were trained on code written by thousands of passionate users, unpaid, with the intent of free usage.

I hope you're actually reading those LICENSE files before using open source code in your projects.





