> *So it is not okay to infringe copyright at a small scale but okay to do it in...

jfk13 · on Jan 6, 2023

Yes, transformative works may be allowed. So I'd guess that creating a model is probably OK (speaking as a non-lawyer!). But using output generated by that model is another matter. The "model" is fundamentally a machine that produces output that is derived from the input it was given. And that output might not be sufficiently transformative to "escape" copyright/licensing restrictions.

In the extreme case, the model's output might be a verbatim copy of a large portion of the original input ("training materials"); but even if it has been extensively modified, e.g. to conform to the coding style of a target repository or to follow a different language standard, this might not be "transformative".

(Compare: A translation of Harry Potter to French looks superficially quite different from the English original, yet it is still a derivative work; and if you're planning to publish one, Ms Rowling (or her publisher) may want a word with you. And that would apply whether you translated it "manually" or pushed it through Google Translate.)

iamevn · on Jan 6, 2023

I'm not sure I buy that training a model is transformative in the fair use sense. How is training a model different from lossy compression?

distcs · on Jan 6, 2023

Great answer! Thanks for taking the time to write this answer. Learnt something new!