
By uploading your content to GitHub, you’ve granted them a license to use that content to “improve the Service over time”, as specified in the ToS[1].

That effectively “overrides” any license or term that you’ve specified for your repository, since you’ve already licensed the content to GitHub under different terms. Of course, people who are not GitHub are beholden to the terms you specify.

[1] https://docs.github.com/en/github/site-policy/github-terms-o...




I think more specifically, the relevant bit is here: https://docs.github.com/en/github/site-policy/github-terms-o...

> We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.

But, it goes on to say:

> This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.

I'm not a lawyer, but it seems ambiguous to me whether this ToS is sufficient to cover Copilot's butt in corner cases; I bet at least one lawyer is going to make some money trying to answer the question.


IANAL, but I wouldn't read that as granting GitHub the right to do anything like this. There's definitely a reasonable argument to be had here, but I think limiting the grant of rights to incidental copies should trump "[...] or otherwise analyze it on our servers" and what they're allowed to do with the results of that analysis.

On the extreme end, "analysis" is so broad that it could arguably cover breaking down a file of code into its constituent methods and just saving the ASTs of those methods verbatim for Copilot to regurgitate. That's obviously not an acceptable outcome of these terms per se, but arguably isn't any different in principle from what they're already doing.

Ultimately, as I understand, courts tend to prefer a common sense outcome based on a reasonable human understanding of the law, rather than an outcome that may be defensible through some arcane technical logic but is absurd on its face and counter to the intent of the law. If a party were harmed by an instance of Copilot-generated copyright infringement, I don't see a court siding with this tenuous interpretation of the ToS over the explicit terms of the source code license. On the other hand, it would probably also be impossible to prove damages without something like a case of verbatim reproduction, similarly to how having a developer move from working on proprietary code for one company to another isn't automatically copyright infringement.

I doubt that GitHub is doing anything as blatantly malicious as copying snippets of (GPL or proprietary) code to explicitly reuse verbatim, but if they're learning from license-restricted code at all then I don't see how they wouldn't be subjecting themselves and/or consumers of Copilot to the same risk.


Wait so does this mean a “private repo” is meaningless and GitHub can share any code in any repo with anyone?


That is not even the right question.

Why are developers so myopic around big tech? Of course they can. Facebook can use your private photos. It's in their terms of service. Cloud providers have more generous terms.

The response has always been they won't do that because they have a reputation to manage. The further they grow the further they control the narrative so the less this matters.

Wait until you find out they sell your data or use your data to sell products.

Why in 2021 are we giving Microsoft all of our code? It seems like the 90s and 2000s never happened and we all trust Microsoft. They have a free editor and a free operating system that sends packets of the user's activity back to Microsoft, but that's okay... we want to help improve their products? We trust them.


Of course. A "private" repo is still on their servers. It's only private from other GitHub users, not the actual site administrators. This is the same on any website: of course the admins can see everything. If you truly want privacy, use your own git servers.


Why do you think people care so much about end-to-end encrypted messaging?

Yes, the concept of a "private" repo is enforced only by GitHub's service. A bug in their auth code could lead to others having access. A warrant could lead to others having access. Etc.


yes, that's what that specific section means, but as always with these documents you can't just extract a single section, you need to take the document as a whole (and usually, more than one document - the ToS and the privacy policy are usually separate documents)

these documents are structured as granting the service provider extremely broad rights, and then the rest of the document takes away portions of those rights. so in this case they claim the right to share any code in any repo with anyone, and then somewhere else they specify which code they won't share, and with whom they won't share it.


Fun fact: Every major cloud provider has a similar blanket term. For example, Google doesn't need to license music to use for promotional content, because YouTube's terms grant them a worldwide license to use uploaded content for purposes including promoting their services, and music labels can't afford to not be on YouTube. (It's probable that even uploading content only to protect it, as with Content ID, would cause this term to apply.)

It all comes down to the nuance of whether the usage counts as part of protecting or improving (or promoting) their services and what other terms are specified.


No.

> GitHub may permit our partners to store and archive Your Content in public repositories in connection


Anyone can upload someone else's freely licensed code to GitHub, without the author ever giving GitHub such a license.

I do not upload my code to GitHub, or give them any special permissions, and I am confident my code was included in the model's corpus.


The way "Your Content" is defined may make GitHub's own ToS legally invalid in a large number of cases, as it implies that the uploader must be the sole author and "owner" of the code being uploaded.

From the definitions section in the same doc:

> "Your Content" is Content that you create or own.

That will definitely exclude any mirrored open-source projects, any open-source project that has ever migrated to GitHub from another platform, and also many forked projects.


How is this different from uploading a Hollywood movie to YouTube? Just because there is a passage in the terms saying that the uploader supposedly gave them those rights, this does not mean the uploader actually had the power to do that.


You can't give GitHub or YouTube or anybody else copyright rights if you don't have them in the first place. This is what ultimately torpedoed the "Happy Birthday" copyright claims: while it's pretty undisputed that the Hill sisters gave their copyright to (ultimately) Warner/Chappell, it was the case that they actually didn't invent the lyrics, and thus Warner/Chappell had no copyright over the lyrics.

So if someone uploads a Hollywood movie to YouTube, YouTube doesn't get the rights to play that movie from them because they didn't have the rights in the first place. Of course, if the actual copyright owner uploads it, it's now permissible for YouTube to play it, even if it's the copy that someone else provided. [This has torpedoed a few filesharing lawsuits.]


Not sure how much it would matter, but the main difference I see is that if I upload my own code to GitHub I have the ability to give away the IP, but if I upload Avengers: Endgame to YouTube I don't have the right to give that away.


I wonder how it would work if we consider you flagged your code as GPL before it hits Github.

We could end up in the same situation as the Hollywood movie even if you are also the one setting the original license on the work. Basically you have a right to change the license, but it doesn’t mean you do.


A very plausible scenario: Alice creates a GPL project. Bob forks it and uploads it to GitHub. Bob does not have the right to relicense Alice's parts.


> By uploading your content to GitHub, you’ve granted them a license to use that content to “improve the Service over time”, as specified in the ToS.

That's nonsense because they could claim that for almost any reason.

E.g. assume Google put the source code of Google Search on GitHub. Then GitHub copies that code and uses it in their own search, since that "improves the service". Would that be legal?

It's like selling a pen and claiming the rights to anything written with it.


If the pen was sold with a contract that said the seller has the rights to anything written with it, then yes. These types of contracts are actually quite common, for example an employment contract will almost certainly include an IP grant clause. Pretty much any website that hosts user-generated content as well. IANAL, but quite familiar with business law.


> These types of contracts are actually quite common, for example an employment contract will almost certainly include an IP grant clause.

In the US, maybe. In most of the rest of the world, these sorts of overreaching "we own everything you do anywhere" clauses are decidedly illegal.


I rather suspect judges would not see "improving the Service over time" as permission to create derivative works without compensation.

The person uploading files to GitHub is also not necessarily doing so with permission from the rights holder, which might be a violation of the terms of service, but would mean there's no agreement in place with the rights holder.


I sort of doubt that GitHub could include GPL code in a closed-source program that they distribute, say that it "improves the service", and claim that this gives them the right.


That does not mean that you give them a license to your code. In fact, some or all of the code may not be yours to give in the first place.


It's aggravating that there is no escape. If you host somewhere else it will be scraped. If you pay for the service it will be used.


Good point, to me that explains why this is a GitHub product instead of a Microsoft (or VSCode) product.


Seems like a good reason to never use GitHub, and encourage other people not to.



