But... we're all aware - as is GitHub - that plenty of the content there is not posted by the original copyright holders, who are the only parties that are able to enter into such a contract. That was the reason for GitHub coming into existence in the first place. You can't turn around a couple of years later and start arguing that the use of GitHub allows for a blanket exemption on copyright law, which is effectively what this amounts to.
GitHub's ToS is written by GitHub. It's not a contract, in the sense that no consideration has been given to the other party, and as such it isn't legally binding on that other party. But regular law, such as copyright law, still applies to GitHub.
It's the same as other user-generated content sites... The ToS is there to legally shift blame from GitHub to the users, and that's what made me think of "code id" actually. GitHub has a firm defence in the form of "Users doing illegal things isn't our fault; we asked them not to, and we tried to kick people off when we found out they were violating the terms." But they might still get slapped around a bit by the court and need to implement some form of safeguards, the way YouTube was forced to. Your point about how binding the terms of service are when the consideration is "use of this service in exchange for agreement" is true: there is not a super strong contract here. It's nominally more binding than the average clickwrap contract or pre-install EULA, since the consideration in exchange is use of the service itself, but as case law around things like scraping and other internet activity has shown, it's definitely not as binding as a physically signed sale contract would be...
It shouldn't matter if the copyright holder agreed to it directly, if they've published the original code under an open source license, since open source licenses all allow people to use the code for "whatever".
Even the GPL doesn't (yet) include a clause saying the code can't be used to train AI unless the AI itself is open source.
> Since open source licenses all allow people to use the code for "whatever"
That's not what they allow for. Copyright being a 'right', it allows you to pass some of those rights on to others and to retain some for yourself. If a right is not explicitly passed on, it still rests with the original author; there is plenty of precedent for that.
To take an example: someone uses MIT-licensed code but doesn't reproduce the license.
They therefore aren't following the terms of the copyright grant, ergo don't have a license for use, ergo are violating copyright.
Now what does that look like when I take code under 100 different open source licenses, including MIT, put it in a GPT blender, and then productize my output without following any of the licenses?
... makes you think there might be a legal component to why OpenAI switched to a SaaS model. Although I believe they'd still be in hot water over any AGPL et al. code.