ToS doesn't supersede copyright though, does it?

The real rub will be the first court precedent on whether GPTs infringe on source data IP.

Could see it going either way: fundamentally transformative or not.




You agree to Microsoft's ToS before you can put any code on Github, regardless of the license.

You can't opt out of those terms, regardless of your license, just as you can't opt out of Facebook's terms, which give them the right to use your content for their business or marketing purposes, even if the various chain posts that have spread there might claim otherwise.


The vast majority of the code I've written that is on GitHub isn't on GitHub because I would ever have stooped to putting it there, but because it is open source under a license that lets other people redistribute and edit the code (most often GPL or AGPL; maybe some older code under BSD) and they have chosen to use GitHub (which makes me sad, but as far as I'd have been concerned is totally within their rights). Are you claiming that people should not be allowed to clone other people's open source projects and put copies up on GitHub?


> Are you claiming that people should not be allowed to clone other people's open source projects and put copies up on GitHub?

If the GitHub ToS supersedes the author's own licence, then I guess the uploader is effectively relicensing the code without the author's permission. That would mean the author has cause for action against the uploader, but not against GitHub.

I personally dislike git; I find it too complicated, because of features I don't need. Microsoft has always disliked FOSS and the GPL, and I suspect that Copilot is a deliberate effort to undermine it.


Well upthread the discussion changed to using a non-open license that prevents people from training AI on it. If you released software under such a license, someone re-uploading to Github would probably be violating their terms or yours. Regardless, Microsoft would probably remove the repo if you contacted them to let them know you're the copyright holder, and the software license is incompatible with their terms.

It remains to be seen if they have a way to then clean their training data of the influence.

It would be the same situation if someone uploaded any other proprietary code.


You shouldn't have to though, they have a responsibility all their own to check that they have the rights regarding someone else's copyright before they do what they want to do, rather than to do it anyway and then to wait for the rights holder to come to them.

Copyright isn't 'opt in'.


I mean, if someone uploads a repo that contains proprietary code that also contains CI actions from the proprietary codebase, formatted the same as github actions, they're going to run those actions under the assumption that they are allowed to (even though they aren't, because it means they're running proprietary code). It's all automated. The person uploading the proprietary code would be the one infringing in that case.


Yes, but that's a different discussion. In this case the person does have the rights under the GPL to do what they do, but GitHub does not have an automatic right to assume that that gives them the right to enforce their ToS on the original copyright holder, which they effectively do.


> Copyright isn't 'opt in'.

Before 1989, copyright protection was opt-in in the US.

https://en.wikipedia.org/wiki/Copyright_notice


It's 2023.


If code with a license, say GPL, goes out somewhere else, by someone else (and therefore I don't have the right to change the license) and then I fork it, as per the license I keep the license and put it on github, and Github violates that license, aren't they violating the law? Don't they then have to remove that code?

If that's the case then in that scenario the license supersedes the ToS right?

Now imagine this: I write some code, license it, but don't publish it yet. It's licensed. Then I upload it to github. Does the license supersede the ToS? Doesn't github have to remove the code as a ToS violation? What if I show my roommate the licensed code first, does that count as publishing?

The whole thing is absurd on its face. All code is licensed before it ever goes on github. The license always supersedes the ToS. All licenses violate the ToS. All code on github should be removed by Microsoft for ToS violations or because Microsoft cannot abide their licenses. Their ToS is fucking illegal.


> If code with a license, say GPL, goes out somewhere else, by someone else (and therefore I don't have the right to change the license) and then I fork it, as per the license I keep the license and put it on github, and Github violates that license, aren't they violating the law? Don't they then have to remove that code?

You are violating the law, probably. The ToS would say something like "I hereby declare that I have the right to submit this software under the ToS".


I'm not violating the law, I'm violating the ToS. They should then remove my account and the offending code, lest they then go on to violate the law, no?


Yes but I guess it won't happen until someone complains. Similar to other content, e.g. YouTube, but in reality nobody requests takedowns of forks/copies.


Alright, now suppose someone does. Doesn't that mean Microsoft has to rework all work they made with these codebases using this legal argument and not a fair use one? Doesn't even doing this set them up for a potentially very expensive compliance action?


I think what you say is true: either they train on any open sourced code with fair use, no matter if it was published on github or anywhere, and ignoring the license, OR they trained on data that is potentially not complying with their ToS (e.g. uploaded by someone that is not the author, regardless of license, they couldn't legally agree to a ToS that gives away additional rights of the work).

However, the reality is that this is all extremely muddy, far from proving that software A has copied some code from software B where you can just compare the source code. There are too many muddy steps, and you can bet that Microsoft will just get away with it.


> If code with a license, say GPL, goes out somewhere else, by someone else

If the code is GPL licensed, you can't relicense it under a non-FOSS license like you're talking about

> Don't they then have to remove that code?

Yes, if code gets uploaded whose license is incompatible with Microsoft's terms, they probably do have to remove it.

> Now imagine this: I write some code, license it, but don't publish it yet. It's licensed. Then I upload it to github. Does the license supersede the ToS? Doesn't github have to remove the code as a ToS violation

Again, yes, they probably do, and they also probably have an obligation to clean their training data of it. However, if you're the copyright holder of the code and you agree to their ToS before uploading, they might make the case that you agreeing to their ToS does grant them the license to use it in training data.


Not re-license. I upload with the same license I got it with, as per the license.

So then Github is breaking the law a significant portion of the time then at the very least?


Microsoft/Github makes a reasonable attempt to remove infringing code. If you obtain a copy of proprietary source code owned by Apple on the dark net, and upload to github, they'll definitely remove that. If companies were responsible for user-uploaded content that the company takes reasonable steps to remove, no one would be able to accept user-uploaded content in the first place.

Facebook wouldn't be able to allow users to upload photos.

Hacker news wouldn't allow me to post this comment (someone else could own the copyright, right?)


If Github's ToS says that they have carte blanche to do whatever they want with FLOSS licensed code, including to relicense it, then either every single codebase on github violates their ToS or their ToS violates every single license and therefore the law. A reasonable attempt under these circumstances would be to remove every single FLOSS licensed repository on github, so I'd argue no, they do not make a reasonable attempt to remove infringing code.


You are conflating many different things.


This is true, but a ToS should not be able to override an important law such as copyright, which provides you with several inalienable rights that you can only contract out of with your explicit consent. Doubly so if the ToS are changed after you post your code there without your explicit approval. I could put text in the ToS of my website that you owe me your firstborn, but that wouldn't make it legal or enforceable.


Microsoft cannot opt out of common law either


It shouldn't, but good luck fighting the 500 lb gorilla on its home turf. Most people will avoid the fight, even if they are in the right.


The ToS is necessary for GitHub to provide their services, the wording is pretty carefully constructed so that GitHub are safe to change their service and develop new things. Loosely speaking "by uploading code you grant GitHub permission to blah blah with that code"

I'm not sure how well it will go down in court fighting this... since we agreed to it. But the more interesting question will be whether the result is a complete "you lose" and GitHub walks away, or whether they are forced to take actions in order to defend the copyright of users producing content... a "Code Id" type system that warns you if the code you're uploading is too similar to someone else's, in order to allow you to use the fun new AI tools to make code and pay GitHub, but also simultaneously defend users' legal intellectual property rights.


I just want to make sure you appreciate that if you really believe this argument then GitHub can only be used by the people who actually directly own the copyright on projects; and if you, for example, want to clone and edit my software (the vast majority of which I explicitly never uploaded to GitHub) then you wouldn't be allowed to (which doesn't seem like either the intention or the way it is commonly used)... and like, it would essentially be impossible to use GitHub to work on an open source project that has some long storied history with many hundreds of contributors without going back and getting all of them to agree.


Ah yes, good point: plenty of the people that fork projects do not actually have the copyright to that code to begin with, they just use GitHub while they themselves are in compliance with the license. That definitely does not give GitHub rights that they would otherwise have to negotiate with the original copyright holders. 'Open source' does not equate to 'public domain' and GitHub effectively seems to try to make that claim.


I'm pretty sure that's a narrower interpretation than GitHub are aiming for. I'm just paraphrasing the parts of GitHub's ToS that I can remember, since current debate on the topic has led to me remembering a few important parts reasonably well, but I've certainly not memorised them. So this is a good opportunity for me to go re-read them and quote them directly... (also in case anyone is about to mention it ... I am aware that I'm linking to the current incarnation of the ToS and it may have changed... but there have been equivalent sections in the ToS for years, and this is pretty standard stuff for User Generated Content licenses, and digging up Internet Archive links to specific historical versions is a bit further than I feel necessary for the purposes of this specific reply)

The relevant section is this:

GitHub Terms of Service: Section D, Sub-Section 3 - https://docs.github.com/en/site-policy/github-terms/github-t...

The phrase relevant to your point is "If you're posting anything you did not create yourself or do not own the rights to, you agree that you are responsible for any Content you post; that you will only submit Content that you have the right to post; and that you will fully comply with any third party licenses relating to Content you post."

It's fair to interpret that as GitHub are not going to be copyright police. The bit at the end where I suggest a "code Id" is more of a thought experiment as to how they could continue to offer the service while complying with a potential adverse ruling that doesn't ascribe blame to them or the service, since there's another section of the ToS that I, with my "knows slightly more about law than average but absolutely not a lawyer" hat firmly on, feel will be how GitHub's legal team at least try to make short work of the lawsuit. Their success with this tactic is a matter for the Courts, and I'd love better legal scholars to weigh in.

GitHub Terms of Service: Section D, Sub-Section 4 - https://docs.github.com/en/site-policy/github-terms/github-t...

Which reads thus:

"We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.

This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program."

For me the key quote is "including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users;". Now there's some legal arguing to be done about whether charging for the AI constitutes an infringement of the second paragraph, which opens with "This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service", but that's a very different argument to what I see a lot of people making. People are arguing (generally speaking) from the standpoint of "it's not right, this violates my rights as the author having selected this license for my code/commits and published it under that license for others to share" ... not "I didn't agree to GitHub selling my content and this constitutes a violation of GitHub Terms of Service: Section D, Sub-Section 4 where they told me that they would not sell my content".

But broadly speaking, unless the argument shifts to GitHub Terms of Service: Section D, Sub-Section 4 and classifying this as an unapproved sale of the users' content, then I don't see how GitHub are not well within their rights to have trained the AI model and offered it as a service. We, by agreeing to the ToS, agreed to GitHub Terms of Service: Section D, Sub-Section 3, where we promise to only post code we have the rights to post, and that we the user will comply with the legal complexities of third party licenses and basically are responsible for not posting stuff to GitHub for which we can't grant GitHub the requested legal rights, which when combined together means that we gave them permission to use our code and commits, regardless of any license files we may have put in the repos, to train the AI model. We can definitely argue derivation and what justifies a sale, and I'd be inclined to say they may actually have breached that term, but no one I've read is talking about that. It's all about copyright infringement for AI generated code and moral rights with respect to using the code to train the model, not a clear cut contractual breach of the Terms of Service that GitHub may or may not have perpetrated on us as the other party agreeing to be bound by the contract.


The key distinction I'm interested in is providing the GitHub (or any similar product) "Service" vs selling a separate, derived product (Copilot / ChatGPT).

A: Common ToS to say that a product's owner obtains a license to user content for purposes of providing that user the product service.

B: Somewhat common ToS to extend that to providing the product service to third party users (i.e. use your content for other users of the service), but depends on business model (e.g. most social* businesses).

C: A lot less common ToS to obtain a right to distribute user content in derived products.

A number of sites have gotten into hot water with their userbase over trying to update their ToS from B to C. From memory... Adobe Cloud, DeviantArt, maybe some others?

Typically this gets flak in creative communities, given that it is many people's business, and they're more concerned about distribution rights than your average coder.

At its base, OpenAI/Microsoft/etc. will eventually run into the exact same issues that bedeviled the Linux kernel in the 1990s, except with a much thornier IP ownership question (given the greater number of parties).


But... we're all aware - as is GitHub - that plenty of the content there is not posted by the original copyright holders, who are the only parties that are able to enter into such a contract. That was the reason for GitHub coming into existence in the first place. You can't turn around a couple of years later and start arguing that the use of GitHub allows for a blanket exemption on copyright law, which is effectively what this amounts to.

GitHub ToS is written by GitHub, it's not a contract in the sense that no consideration has been given to the other party and as such it isn't legally binding on that other party, but regular law, such as copyright law, still applies to GitHub.


It's the same as other user generated content sites... The ToS is to legally shift blame from GitHub to the users... and that's what made me think of "code id" actually, since GitHub have a firm defence in the form of "Users doing illegal things isn't our fault, we asked them not to and tried to kick people off when we found out they were violating the terms", but they might still get slapped around a bit by the Court and need to implement some form of safeguards the way YouTube was forced to. Your point about how binding the terms of service are when the consideration is "use of this service in exchange for agreement" is true: there is not a super strong contract here. It's nominally more binding than the average clickwrap pre-install EULA, since the consideration in exchange is use of the service itself, but as case law around things like scraping and other internet activity has shown, it's definitely not as binding as a physically signed sale contract would be...


It shouldn't matter if the copyright holder agreed to it directly, if they've published the original code under an open source license. Since open source licenses all allow people to use the code for "whatever"

Even GPL doesn't (yet) include a clause saying the code can't be used to train AI unless the AI itself is open source


> Since open source licenses all allow people to use the code for "whatever"

That's not what they allow for, and copyright being a 'right' it allows you to pass those rights on to others and to retain some for yourself. If not explicitly passed on the right still rests with the original author, plenty of precedent for that.


To take an example: someone who used MIT licensed code but doesn't reproduce the license.

Therefore isn't following the terms of the copyright grant, ergo doesn't have a license for use, ergo is violating copyright.

Now what does that look like when I take 100 different open source licenses, including MIT, put them in a GPT blender, and then productize my output without following any of the licenses?

... makes you think there might be a legal component to why OpenAI switched to a SaaS model. Although I believe they'd still be in hot water over any AGPL et al. code.


I can't wait for this stuff to be legislated to establish once and for all what the legal status is.


This is why I only put code on GitHub if I want it to be seen by everyone (including Big Data).


It is a heavy gorilla.

1. Huge company.

2. Impossible to prove that a weight of -0.7 in a neural net means they used your code.

3. The code spat out by the bot isn't your code.


Apple, Google, Meta, Amazon, Nvidia use Github. They together are a bigger gorilla.


They are not your gorilla


And all of them would have jumped at that chance if they had seen it.


> ToS doesn't supersede copyright though, does it?

It does unless you can afford to sue Microsoft.


> ToS doesn't supersede copyright though, does it?

But in what way does reading a copyrighted work and then producing a mass of numbers as a result infringe copyright?


The copyright infringement comes about later, when that mass of numbers is used to produce a topically related work. The same rules apply for humans -- see the concept of "clean room implementation".


It really doesn't. Prose isn't source code. Learning something, then later writing something else isn't copying.


My limited understanding of case law is that transformative use is still judged very human-centrically.

E.g. the courts take a dim view of any attempt to create a machine (in the abstract sense) that takes in copyrighted works and churns out similar-but-uncopyrighted works.



