
IANAL. My understanding is that the general legal precedent in the US is that a) data mining text has no copyright implications (in the same way that reading a book has no copyright implications) and b) it is not a copyright violation to use a small amount of copyrighted material provided the use is sufficiently transformative. This might seem silly or unfair to you, but that is the current legal reality.

But even ignoring that, everybody uploading code to GitHub has given GitHub the right to analyze that code as per the GitHub ToS. This is the same mechanism by which you can't upload code to GitHub with a license that says "nobody is allowed to display this code on the internet" and then sue GitHub.



I can't imagine a scenario in which any lawyer would consider granting GitHub the right to "analyze" code anywhere close to granting GitHub the right to spit out that same code verbatim without your copyright notice (even if laundered by AI).


Here's Kate Downing, an IP lawyer specializing in software licensing:

> According to Downing, the answer depends to a certain extent on where that code is hosted. If it’s on GitHub, there very clearly would not be copyright infringement.

> “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”

Downing cautions that using Copilot output consisting of large chunks of code, complete with comments, is more questionable, but that for the most part it looks above board.

https://fossa.com/blog/analyzing-legal-implications-github-c...

Here's an English lawyer on the same topic...

> The licence is broadly worded, and I'm confident that there is scope for argument, but if it turns out that Github does not require a licence for its activities then, in respect of the code hosted on Github, I suspect it could make a reasonable case that the mandatory licence grant in its terms covers this as against the uploader.

https://decoded.legal/blog/2021/06/github-copilot-initial-th...


To me, regardless of whether it is technically legal, it certainly doesn’t feel right. Furthermore, contracts rely on people understanding what they are agreeing to, and I don’t think many developers would agree to letting their code be used outside the terms of the license they uploaded it under.

I am very surprised there hasn’t been a legal challenge to it.


What, exactly, is there to challenge?

“I’m sorry, your honor, I didn’t understand what I was signing” has, I think, never been a valid argument in a courtroom, just as “I’m sorry, I didn’t know I was committing a crime” is not a valid defense.


Courts interpret the intended and understood meaning of contracts and terms all the time. Research the term "meeting of the minds" and case law around it.

When the terms were written, it's exceedingly unlikely that GitHub intended them, or anyone understood them, to be blanket permission for a trained AI to copy code for others, and no user would have interpreted them that way. Microsoft/GitHub can't necessarily expand the intended scope unilaterally without making it clear in the terms.

If it got to a court case, and both sides could afford it, it could be a lengthy one.

(This comment is not legal advice. I am not a lawyer.)


How does "[allowing] a trained AI to copy code" change the interpretation of the ToS?

By uploading your code, you give GitHub a license to use it to improve their services. Copilot is such a service. Just because it's an AI and it provides code to others does not somehow invalidate the license you gave.


Again, research "meeting of the minds". It's a standard legal term directly relevant to all contracts and terms. Also, "transparency" is another important one.

Many online services have very wide terms around what they can do with your data, which most people who bother to read them interpret as being what is required for them to handle the service for you without breaking copyright law. In that context, being able to use and analyse your data to improve their services could be another catch-all that lets them do specific performance optimisation on their backend.

One party instead deciding they've got blanket permission to do whatever they like with your work, including selling it to others, may well not hold up in court.

Contracts aren't programs and one party tricking the other rarely works out in court - courts world-wide tend to rule against trickery and deception.


> “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”

That's assuming that all code on GitHub is uploaded in good faith by the copyright owner, which is not always going to be the case.


Many repositories on GitHub were put there by people who do not own the copyright, and the actual copyright owners never agreed to GitHub's Terms of Service.

Linux, for example, does not require copyright assignment. The original contributor of a change owns the copyright for that code and may never have used GitHub.



