You're right, but GitHub's TOS doesn't (or at least shouldn't) change the conditions of the original license. You're giving GitHub a copy of the source code, not the ability to dictate your license for you. There's certainly a lot of legal ambiguity in the copyright sense, but one thing seems clear: Microsoft trained Copilot on code they weren't certain they could use.
Technically the GitHub TOS is in itself a license; much like how you can dual-license code, uploading to GitHub is its own license grant separate from the license of the code you're granting to anyone who wants to use it for their own purposes. LICENSE.txt/md is not the only way to grant access to code you write.
GitHub's TOS is subject to change, so that doesn't hold water. Tomorrow they could claim in their TOS that you owe them your firstborn if you upload code to GitHub, but that doesn't mean you are bound by those terms, because they go beyond any reasonable expectation of what you are signing up for. Granting Microsoft a blanket license to use your code in any way they see fit was not part of the deal for GitHub, and as far as I know it still isn't for code that you claim copyright on. If you release your code into the public domain, or use a license so permissive that anybody can use it at will, even without attribution, that would make it fair game.
Well, that's why major TOS changes are accompanied by the option to discontinue using that service. Usually they say that continuing to use the service after a certain date constitutes your agreement to the new terms.
I think we're over here in our armchairs weirdly assuming that GitHub doesn't have any lawyers working for them. I think they know they're legally in the clear on Copilot.
I'm not at all a lawyer, but as I see it, the non-automated version of AI-generated works (making art and prose in the style of an existing copyrighted work, based on the artist's observation of that work) is not illegal. The only thing that AI introduces is automation.
I see Copilot as a trial balloon. If they get away with it, you can expect the next move to be appropriating the body of open source that is GitHub. Why the archenemy of open source should suddenly be trusted to play nice is something I really can't grasp.
It's not sudden - they've owned it for a while now.
What I can't understand is why people feel locked into GitHub because of the social features. To me they seem the least important part of GitHub, particularly with so many OSS projects running communities on Discord or Slack.
GitHub is still subject to the terms of your license though; they can impose whatever rules they want on you service-wise, but their use of your software should be dictated by the accompanying LICENSE file.
To illustrate: GitHub could delete any project they want, and there would be no real recourse for the project's author. That is a service decision that they reserve the right to impose via their TOS. However, if they were to steal code from a user's private repository and violate the license therein, the author could sue for theft of intellectual property.
> GitHub is still subject to the terms of your license though; they can impose whatever rules they want on you service-wise, but their use of your software should be dictated by the accompanying LICENSE file.
Again, the LICENSE file in the repo is not the only license for that code. A copyright holder can grant people licenses to their work with or without documentation, and with or without that license accompanying the work itself.
By uploading code to GitHub, you are asserting that you can legally grant GitHub a license to that code for hosting as described below.
> If you're posting anything you did not create yourself or do not own the rights to, you agree that you are responsible for any Content you post; that you will only submit Content that you have the right to post; and that you will fully comply with any third party licenses relating to Content you post.
Note that this license is limited strictly to the provisions set out below; uploading to GH doesn't allow them to import or use your code in Windows or the GitHub codebase or anything like that. Doing so would indeed be bound by the license terms you've granted the world via the repo's LICENSE file.
> 4. License Grant to Us
> We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.
The TOS do say users grant GitHub a license to host, copy, and distribute works as needed while providing the GitHub service. That covers saving copies on servers, making backups, and "distributing" it via their website.
IANAL, but I think Copilot is not a reasonable thing to include in these services.
I’m skeptical of this interpretation, since it seems to imply that you could upload copyrighted code and GitHub would then have a license to do whatever they want with it, which is obviously not true. For example, if someone illegally uploaded Microsoft Windows source code, GitHub couldn’t just use it because it was uploaded to their service. I would argue that this extends to Copilot: just because they have a license to host the code, they don’t have a license to do whatever they want with it.
The license is quite limited, but it does include "improving the service over time", which might be how Copilot was okayed by their legal team, at least originally:
> 4. License Grant to Us
> We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.
If that is in fact how Copilot got the green light from their legal team, and what the case will eventually hinge upon, I really wonder whether the argument that Copilot is part of the "service" will hold up. I can imagine a judge or jury not being convinced, because the majority of the paragraph is clearly about the general use of the service: parsing content into a search index so you can search in your repo, making backups so that service isn't disrupted by a server failure, and sharing it with others so that they can access the content you uploaded. In other words, if this is their justification for Copilot, I can imagine the plaintiff's lawyers successfully arguing against it on the basis that Copilot isn't really part of the normal service in the way that, e.g., the repos are. The argument that it falls under the "otherwise analyze" wording is also flimsy: it's not clear how "analyze" is defined, and it's arguable that feeding code into AI model training is neither the same as nor similar to indexing for search.
I suspect that the main argument will hinge not on the permission, though, but rather on whether the use of copyrighted code in an AI model is transformative enough to fall under fair use. Obviously that's yet to be decided, but I would imagine that because it wasn't a human transforming the code and/or hand-selecting the code to put into the AI model, it won't be considered transformative, and therefore the use of the code won't fall under fair use.