> So it is not okay to infringe copyright on a small scale but okay to do it on a large scale?
No, I think you're missing the "transformative" part.
The line of argument isn't "we're going to resell millions of codebases as-is for pure profit", which would be undisputed copyright infringement.
The argument is that something highly transformative (e.g. training models) isn't infringement at all, because transformative works are covered by fair use. And if we still wanted to explore interpreting or changing the law to force opt-in for highly transformative things, that would be logistically unreasonable, to such an extent that the transformative thing couldn't occur at all. So it's a waste of time to even be discussing asking for permission as some kind of potential compromise or requirement. If it's transformative and therefore fair use, asking for permission is an irrelevant distraction.
That's why this type of argument is valid. I'm not saying whether the argument will/should win in this particular case, but I'm definitely saying there's nothing absurd whatsoever about it.
Yes, transformative works may be allowed. So I'd guess that creating a model is probably OK (speaking as a non-lawyer!). But using output generated by that model is another matter. The "model" is fundamentally a machine that produces output that is derived from the input it was given. And that output might not be sufficiently transformative to "escape" copyright/licensing restrictions.
In the extreme case, the model's output might be a verbatim copy of a large portion of the original input ("training materials"); but even if it has been extensively modified, e.g. to conform to the coding style of a target repository or to follow a different language standard, this might not be "transformative".
(Compare: A translation of Harry Potter to French looks superficially quite different from the English original, yet it is still a derivative work; and if you're planning to publish one, Ms Rowling (or her publisher) may want a word with you. And that would apply whether you translated it "manually" or pushed it through Google Translate.)