Copilot sells code other people wrote

nickjj · on June 23, 2022

This might be overreacting but is there a way to opt-out of Copilot using your code in open source repos?

It feels morally wrong to me that I can spend thousands of hours working on projects of my own free will but then a company can sell the code I wrote to others in the form of snippet completion as a service. In fact they end up selling your code back to yourself if you plan to use the service.

If the answer is no, that moves the needle pretty far in the direction where I'd at least consider the idea of moving all of my repos to Gitlab. I don't care much about stars or popularity. I open source things that are interesting and useful to me and if other folks want to use it they can but I don't gain motivation from others using the projects I release. I like Github and its UI and it's no doubt "the spot" for open source but selling code written by others rubs me the wrong way a lot. It stinks because it also means no longer contributing to other code bases too. It's moving us in the opposite direction of what open source is about.

kemiller · on June 23, 2022

This is a really good point that I hadn't considered before. It's facebook all over again — selling your own content back to you. Repo owners should be at least compensated when their code gets used. That would be an incredible market.

selcuka · on June 24, 2022

> That would be an incredible market

I for one welcome our new CEO (Copilot Engine Optimization) overlords.

Jokes aside that will likely cause GitHub to be filled with lots of low quality repos (even AI generated, oh the irony!), to trick Copilot into using their code.

account42 · on June 24, 2022

> Jokes aside that will likely cause GitHub to be filled with lots of low quality repos

Hasn't this already been the case since GitHub became a CV boost?

leereeves · on June 24, 2022

I don't think that would be possible. One of the big limitations of neural networks is that they don't cite their sources.

radus · on June 24, 2022

Calculate a rough semantic similarity score across all your snippets, and pay out a fractional reward to all originating codebases.

I think the bigger problem is that it will almost certainly lead to a proliferation of giant snippet spam repositories.
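As a rough illustration of the fractional-reward idea above, here is a minimal sketch. It assumes token-set Jaccard overlap as a crude stand-in for "semantic similarity"; the function names, the threshold, and the payout pool are all hypothetical and are not anything Copilot is known to do.

```python
import re


def tokens(code: str) -> set[str]:
    """Split code into a set of identifiers, numbers, and punctuation marks."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_payout(suggestion: str, repos: dict[str, str], pool: float,
                 threshold: float = 0.2) -> dict[str, float]:
    """Divide `pool` among repos in proportion to their similarity score.

    Repos scoring below `threshold` (a hypothetical cutoff) get nothing.
    """
    sug = tokens(suggestion)
    scores = {name: jaccard(sug, tokens(src)) for name, src in repos.items()}
    scores = {name: s for name, s in scores.items() if s >= threshold}
    total = sum(scores.values())
    if total == 0:
        return {}
    return {name: pool * s / total for name, s in scores.items()}
```

Note that a scheme like this rewards whoever hosts the most look-alike snippets, which is exactly the spam incentive mentioned above.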

nemonemo · on June 24, 2022

It is like the search engines vs. SEO arms race. The hope could be that such proliferation can be managed by disincentivizing such abuses. The reality might be vastly different with code, which is more regular than natural-language text and so gives AIs a better chance of emulating humans.

woleium · on June 25, 2022

Then they need to negotiate a non-attributed contract with you before using your code to train (not sure about testing though).

PaulKeeble · on June 23, 2022

It should be automatic based on license. GPL code definitely shouldn't be included but MIT could be. They already have this information in most repositories, and if it's missing they have no right to use it at all. We don't need extra options; the licenses already restrict use and derivative works.

davesque · on June 23, 2022

Not without the text of the license. I, as a developer, cannot just poach open source code under MIT without including the copyright and terms from the original project. From the license:

"The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."

meshaneian · on June 23, 2022

They might argue that a snippet isn't a "substantial portion" of "the Software", and they're only charging for the service not the content - regardless, I don't like it, this is exactly what certain licenses attempt to prevent.

leereeves · on June 23, 2022

I would argue that substantial shouldn't be measured in lines of code, it should be measured in importance. Something like the fast inverse square root is substantial even though it's short.

williamcotton · on June 23, 2022

The fast inverse square root makes for a poor example when it comes to notions of intellectual property and copyright because there is prior art. The Wikipedia page has a history.

And imagine if Microsoft had been able to copyright the fast inverse square root function before Carmack sat down to write Quake!

leereeves · on June 24, 2022

Prior art matters for patents, not copyrights. Carmack's code is still protected by copyright even if he didn't invent the algorithm.

And copyright doesn't prevent someone else from implementing the same algorithm, only from copying the code. If Microsoft had been able to copyright the fast inverse square root function, Carmack could still have written his own version and even copyrighted that version himself.

williamcotton · on June 24, 2022

Prior art has been used for copyright in plenty of court cases, and increasingly so!

Here’s an example: https://scholar.google.com/scholar_case?case=728470765881077...

It seems particularly apt to consider prior art for use in software IP if only for the similarities with the patentable invention of mechanical parts.

imtringued · on June 24, 2022

What? You are free to reverse engineer any non patented object and reproduce an almost identical object. Prior art for copyright is meaningless unless it is about authorship. You can still do a clean room implementation and ignore prior art.

williamcotton · on June 24, 2022

But the current interpretation is that you cannot claim authorship over things that were already in the public domain. Feel free to read the case notes and follow some of the links in there for more info. Am I not interpreting the court rulings correctly?

——

Considering de novo the evidence before the district court, we hold that the district court did not err in granting summary judgment. Johannsongs failed to offer admissible evidence to rebut Ferrara's analysis, so there is no genuine dispute of material fact as to his conclusions that Söknuður and You Raise Me Up are not substantially similar and most of their similarities are attributable to prior art. Based on these conclusions, Johannsongs has failed to satisfy the extrinsic test and Defendants are entitled to judgment as a matter of law.

woleium · on June 25, 2022

but they used all your code to train. that's pretty substantial..

typetheorist · on June 23, 2022

I too have reservations about Copilot, but does the MIT license define a "substantial portion"? I doubt a snippet would fall under either "copies" or "substantial portions"

davesque · on June 23, 2022

I doubt many licenses define that kind of terminology. That's left to precedents established by actual cases. My point was just that you're not free to use code from an MIT-licensed project without following the terms of the license. The other details get worked out when legal actions are taken.

ellyagg · on June 23, 2022

Well, I hope your viewpoint doesn't win the day, because making code as freely shareable and remixable as possible is a huge boon for humanity.

celeritascelery · on June 23, 2022

Code being freely shareable and remixable is great. Selling that open source code for profit is not.

WisNorCan · on June 23, 2022

Is your take that Microsoft should offer this for free? Or if they are not willing to do it for free, Microsoft should cancel this service and we should wait for Apache or someone else to offer the service?

Or something else?

gfrff · on June 23, 2022

Microsoft should make this service free for open source (not just thought leaders), and compensate people otherwise. I should have a 0.01% equity in Open AI if they're using my stuff like this.

Or they should do opt in.

throwaheyy · on June 24, 2022

Half serious/flippant, we need MS to create a cryptocurrency so that developers can be credited with micropayments each time their code gets “quoted” in the IDE.

<ducks>

earnesti · on June 23, 2022

What is wrong with someone making a little dough? It is just numbers in a database.

jdbernard · on June 23, 2022

Yeah, but those numbers translate to food on the table for my kids, a roof over their heads, better education, etc. Come on, this is a tired response. Nothing is wrong with people making money. There is a lot wrong with people making money off of the hard work of others without any consideration or remuneration.

gopiandcode · on June 24, 2022

I feel like you're missing the forest for the trees here - making code freely shareable and remixable is exactly the purpose of GPL and other free-software licenses, but you can bet that the proprietary codebases Copilot will be used in will go out of their way to prevent any such uses of _their_ particular code snippets.

IMO, the only way to use Copilot's output in an ethically sound way is to only use the output it produces in AGPL licensed projects (assuming that Copilot has not been trained on any non-free software codebases which in itself is a strong assumption).

account42 · on June 24, 2022

> IMO, the only way to use Copilot's output in an ethically sound way is to only use the output it produces in AGPL licensed projects

Even then, that is missing attribution which should really be the default for all code reuse and derivation even when you legally are allowed to omit it.

rpd9803 · on July 2, 2022

Based on this comment, you may not understand what the GPL's purpose actually is, because it is NOT simply for promoting sharing and remixing. The GPL is for ensuring that code, and its derivatives, are all able to be shared and remixed in perpetuity. The biggest (imo) difference between GPL and MIT/BSD licenses is that you CAN NOT use GPL'd code in a non-GPL* codebase. (*or GPL-compatible license)

jnsie · on June 23, 2022

It's just as shareable on Gitlab, no? And the issue isn't that code is not shareable - it's that a huge corporation is profiting from this code without consent from the developer.

leereeves · on June 23, 2022

> a huge corporation is profiting from this code without consent from the developer

Also without attribution. The more permissive licenses allow corporations to profit from shared code, but most of them still require attribution.

And it's really not much to ask: when someone gives you free code, give them credit for their work.

bayindirh · on June 24, 2022

Well, I hope your viewpoint doesn't win the day, because breaching GPL left and right to make some developers' lives easier opens huge cans of worms.

throwaheyy · on June 23, 2022

The Twitter thread’s title seems unnecessarily incendiary and clickbaity.

I don’t buy that producing/synthesizing code snippets based off public repos is a problem.

There’s nothing proprietary or original about eg. the syntax of a for-loop, or the boilerplate of setting up some JS framework MVC.

Besides, it’s basically just a (semantic and contextual) search engine inlined within the IDE. Copyright infringement hasn’t taken place until the user activated the autocompletion and actually placed the code within their own and released their code containing the infringing code.

DJHenk · on June 24, 2022

> There’s nothing proprietary or original about eg. the syntax of a for-loop, or the boilerplate of setting up some JS framework MVC.

Of course there is something proprietary or original about that. Why else would they need such an enormous AI to suggest it? Auto completing simple boilerplate was already solved in a much simpler way.

> Copyright infringement hasn’t taken place until the user activated the autocompletion and actually placed the code within their own and released their code containing the infringing code.

Copyright infringement takes place as soon as some company publishes/sells material without explicit license or permission. So not the moment the user hits accept, but the moment just before that: when the tool shows it to the user.

throwaheyy · on June 24, 2022

Applying your logic, is any search engine infringing on copyright because it contains a snippet of the source page?

After all, if showing a search result in the IDE is “publishing” (let alone “selling” (?)) why hasn’t Google been sued out of existence for showing search results (oops, “publishing” copies of original work, billions of times over), as well as selling related advertising?

https://en.wikipedia.org/wiki/Fair_use

> Examples of fair use in United States copyright law include commentary, search engines, criticism, parody, news reporting, research, and scholarship.

tremon · on June 25, 2022

> is any search engine infringing on copyright because it contains a snippet of the source page?

In some jurisdictions, it is. And in other jurisdictions, it is only allowed as long as it shows a link to the source page, which Copilot also doesn't do.

lbhdc · on June 23, 2022

I stopped publishing open source after all this started coming out because I was so uncomfortable with it.

jaywalk · on June 23, 2022

If your code is using a license that allows it, how could you possibly opt-out aside from using a different license?

sammax · on June 23, 2022

Don’t most licenses require at least attribution? I don’t believe GitHub is restricting themselves to only licenses that don’t. In fact the only software licenses I can think of that don’t require attribution are 0BSD, WTFPL, CC0, MIT-0 and Unlicense, and they all aren’t super popular. Also in some countries creators have inalienable moral rights which can be enforced regardless of the license. For example in Germany it is impossible to relinquish certain rights you have as the creator of a work, including the right to attribution.

TAForObvReasons · on June 23, 2022

This is an important and overlooked point. Even common permissive licenses (ISC / MIT / Apache-2.0) require attribution

jazzyjackson · on June 23, 2022

Just as a thought experiment: couldn't CoPilot just publish a list of every github user and attribute the work to all of them?

TAForObvReasons · on June 23, 2022

CoPilot is a black box at the moment. Microsoft claims they used the public corpus on GitHub. There are plenty of GPL, AGPL, and "source available" projects in the public corpus. So what exactly is the licensing?

The argument may make sense if they limited themselves to public-domain (CC0) works, but that is not what happened here. If CoPilot attributed something to an AGPL project, does it mean the "virality" applies to all projects that use code from CoPilot?

ntoskrnl · on June 23, 2022

There's also a good amount of commercial and leaked source code on GitHub, including MS's own leaked Windows XP source. I haven't played around with Copilot yet, but if I ever do I plan on copy/pasting some win32 API definitions to see if I can get it to spit out any of the leaked source.

yellowapple · on June 23, 2022

> if I ever do I plan on copy/pasting some win32 API definitions to see if I can get it to spit out any of the leaked source.

If that works, then I can't wait for that to be a boon for Wine and ReactOS: "Microsoft itself provided this code and allowed us to use it, so therefore it's totally legal. Neener neener."

rpd9803 · on July 2, 2022

Some trivia: CC0 is a public domain declaration. At least in the US, there is no process by which an author can make their works public domain; CC0 is just a (weak) promise that the copyright holder will treat the work as if it were public domain.

whoisthemachine · on June 23, 2022

This feels like a tool that can easily be destroyed by a lawsuit. I can't imagine a TOS can force you to give away your copyrights (especially if they allow and encourage you to post your own copyright).

kragen · on June 23, 2022

If it can't then Wikipedia is doomed; its entire licensing status rests on the notion that editors grant such a license as part of their clickwrap ToS.

bouke · on June 23, 2022

Does GitHub verify that the code that is in my repository is actually in accordance with the license that I’ve added? I could just upload any proprietary code with an incorrect license, and GitHub would just use that to feed their AI. Like any other dependency that you incorporate into your application, GitHub should verify/audit whether the license allows them to do so.

nickjj · on June 23, 2022

> If your code is using a license that allows it, how could you possibly opt-out aside from using a different license?

A repo setting that instructs Github not to use your code for Copilot, it could be a similar option as turning Discussions on / off.

If they really want to win developers over they would even have Copilot scanning disabled by default but that'll never happen.

quietbritishjim · on June 23, 2022

Even if Github did provide that setting, as a courtesy, someone could clone / fork the code to another repo (if you use any licence that allows it) and not enable that setting.

Inityx · on June 23, 2022

Sure that's possible, but there's a huuuge difference between Possible and Default Behavior.

TAForObvReasons · on June 23, 2022

In a case like this, GitHub itself could set up a bot account that forks all projects as soon as you make the switch. The company in fact would be incentivized to do so.

jonny_eh · on June 23, 2022

Sounds like you want a new license that just prohibits use by one company for one purpose.

thamer · on June 23, 2022

There are other AI-based code completion systems than Copilot, at least Tabnine[1] and Kite[2] come to mind, I'm sure there are more.

[1] https://www.tabnine.com/

[2] https://www.kite.com/

belter · on June 23, 2022

As of today there is a new one...

"Now in Preview – Amazon CodeWhisperer"

https://aws.amazon.com/blogs/aws/now-in-preview-amazon-codew...

widjit · on June 23, 2022

is there something wrong with that?

jonny_eh · on June 23, 2022

Not at all, you can put any license on your code that you want.

igneo676 · on June 23, 2022

I'm not sure using a different license actually opts you out. By merely hosting your code on GitHub you grant them the right to analyze your code on their servers[1]

They may be morally in the wrong, but I'm unsure they are legally in the wrong here. To boot, denying them the right to create this tool in your license is technically a violation of OSS principles and problematic

[1]: https://docs.github.com/en/site-policy/github-terms/github-t...

typetheorist · on June 23, 2022

> This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.

Wouldn't this be a violation?

okasaki · on June 23, 2022

Microsoft could provide an opt-out for projects or even contributors, regardless of licence.

ghostbrainalpha · on June 23, 2022

It would be kind of cool if Github could show some stat that code you wrote has been used 50,000 times by 12,000 people.

Being a top CoPilot contributor should at least have value to signal on your resume.

dragonwriter · on June 25, 2022

> This might be overreacting but is there a way to opt-out of Copilot using your code in open source repos?

I don't think there is a way to opt out if it is a public repo regardless of license, and Microsoft's copyright theory suggests that they wouldn't feel obligated to exclude any code they got their hands on except under a specific NDA preventing such use; the use of public GitHub repos isn't based on legal constraints but practical convenience.

invig · on June 24, 2022

They’re not selling you code. They’re selling you an engine that helps you find the right free code at the right time.

If you read free code yourself it’s fine, but if a machine does it for you it’s not? We overvalue humans.

bayindirh · on June 24, 2022

> If you read free code yourself it’s fine, but if a machine does it for you it’s not? We overvalue humans.

No, it's not fine. Apparently, you missed the SCO & Oracle vs. Google cases. Both of these cases argued that somebody looked at the code, and copied it. In the SCO case it was not true, but the argument stretched the timeline rather successfully. In Oracle vs. Google, copying function signatures opened a big can of worms.

So, just by copying the function signature, without filling it with the same code as the original, even for interoperability, you're getting into a huge gray area in a legal sense.

Similarly, no sane Wine developer will read leaked Microsoft source code, let alone copy it. Again, no sane emulator developer will read leaked Nintendo code.

Reading the code "colors" your creativity, and if you're tried in court and enough similarity is found between your code and the leaked code, it's game over.

So, reading code and copying is not guaranteed to be legal, depending on its license. When this is done by a robot, it's still illegal (you're breaching licenses during the code generation process), and immoral and unethical on top of it.

So, we don't overvalue humans, but overvalue AI, which is just informed search, BTW.

invig · on June 30, 2022

I suppose you learned to code without reading any code?

bayindirh · on June 30, 2022

No, I didn't and don't read other people's code to understand how something works. I use books and official language/library documentation for that.

On the other hand, this is irrelevant to the issue at hand.

GitHub copilot is not a tool for education. It's a tool for auto-completing code, which can be put to production, where licenses and other stuff come into play.

The issue is not code sharing per se. It's more of a legal problem, and an important one at that. In the software copyright sense, even reading code you can't import to a project (whether it is leaked, incompatibly licensed, or off-limits for any other reason) puts you at risk of legal troubles. This is why we have methodologies like "clean room development".

In Copilot's case, you're possibly deriving code from a source which contains many licenses, and some of them are not compatible with what you're doing. As a result, you're in direct breach of the license which is not compatible with your code.

On a higher level, you're also breaching the ethics code and morality by using code or its derivation with a license incompatible with your code, and disregarding other people's wishes codified as a case-tested and valid license.

As a result, if you think that using a derivative of GPL licensed code in your closed source application is OK on every front, then the reverse is also true. I can disassemble and reverse every part of your code and re-implement it as GPL bug for bug and open it.

Because if you can breach my license, and expect no consequences, I can breach your license without consequences, as well. It's a two way street.

Guid_NewGuid · on June 23, 2022

I find this whole topic very annoying, this is like the 3rd variation to reach the front page today. But it has made me realize why I instinctively dislike Free Software as a movement.

Copyright and licensing are bad, actually. Stop getting worked up about the idea of using courts to punish theft. Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn't.

Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.

Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.

Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps, in fact I want a truly public domain license but copyright law is so hostage to corporate interests no such thing exists in many countries.

Free the code.

sirsinsalot · on June 23, 2022

"A commons of knowledge is a public good."

Yes but this copilot model takes that, adds value and doesn't itself join the public common good. Instead it takes it, and makes you pay to have it back in another form.

If copilot were open source and the model released for the public good, being built of public data (in your scenario) we would have a very different conversation.

visarga · on June 23, 2022

It costs money to run a huge language model with low latency, in the loop with you - charging $10/month is reasonable. You need multiple GPUs to load even a single copy. Copilot is adding something extra to the original code - it selects the recommendation from the whole corpus, while taking the surrounding context into consideration and adapting to your variable names.

And in reality 99.9% of the generated code has no long ngrams in common with the training set, it's already original. All they need to do is to enforce never to generate data identical to the training set, something that can be implemented with a bloom filter, then the generated code is impossible to attribute and should have no legal problems.
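To make the Bloom filter suggestion above concrete, here is a minimal sketch. It assumes whitespace tokenization, 8-token n-grams, and a small hand-rolled filter built on SHA-256; every name and parameter is hypothetical and chosen only for illustration, not taken from Copilot.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def ngrams(toks: list[str], n: int) -> list[str]:
    """All runs of n consecutive tokens, joined into strings."""
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def index_training_code(bloom: BloomFilter, code: str, n: int = 8) -> None:
    """Record every n-gram of a training file in the filter."""
    for gram in ngrams(code.split(), n):
        bloom.add(gram)


def looks_memorized(bloom: BloomFilter, candidate: str, n: int = 8) -> bool:
    """True if any n-gram of the candidate was (probably) seen in training."""
    return any(gram in bloom for gram in ngrams(candidate.split(), n))
```

A generator could drop any suggestion for which `looks_memorized` returns True. A check like this only catches verbatim token runs; renamed variables or reordered statements would pass, so it does not by itself settle the attribution question.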

In the end what do models like Copilot do? They act like culture - absorbing and replicating memes. They free the knowledge and make it reusable. They can act like a general purpose NLP tool for information extraction, classification and text generation. You can implement your ideas faster with it, don't need to label much data.

It works even with just a prompt. Try OpenAI Codex to extract a receipt to see what I am talking about - it gives you the output in JSON. It's a new tool and a new interface to the computer. There are going to be plenty of open source implementations as well, some are already under training.

sirsinsalot · on June 24, 2022

You are incorrect. The code it generates is substantially the same (complete with comments) as the input, which is often sourced without permission and in violation of license.

And offers nothing back to those authors in return.

HeavyStorm · on June 24, 2022

Thank you for this. I would never have been able to articulate it better - people are just annoyed that someone is making money and they aren't, without considering why that is.

sirsinsalot · on June 24, 2022

That's not at all what people are annoyed at and to be reductive like that is childish.

The issue is consent of using people's code as input and paying nothing back.

Also the parent comment is substantially technically wrong on a number of points, but feel free to use it as validation for yourself.

andybak · on June 23, 2022

And I really don't mind.

I want every line of code I've ever written to be used as much as possible.

I find "intellectual property" to be dubious to the core. I'm not confident enough in my feelings to be a zealot, but if I had to pick sides then I know which side I would pick.

JoshTriplett · on June 23, 2022

You're welcome to use a "do whatever you want" license on your code, and people should respect that. (Though even those licenses tend to require attribution, and copilot doesn't do even that.)

Other people use licenses that try to create a commons where if you want to use it you need to share your own code, as a counterpoint to the non-commons in which you can't use code at all. And if people use those licenses, they should be respected as well.

By all means, eliminate copyright, and let all code be copied freely. And until that happens, as long as proprietary code exists and doesn't let anyone copy it, respect copyleft licenses as well.

andybak · on June 23, 2022

A fair point. "What to do in a world where copyright already exists" is a tougher question to answer and one in which I tend to go back and forth.

sirsinsalot · on June 23, 2022

If an AI "listened" to music and created new samples for musicians to use for a fee, do you not think the original musicians should be compensated?

The value transfer is basically theft.

It isn't about the usefulness of the service, or even that something similar is a good thing ... it is about the execution and what it says about fairness for those that worked to create the data it depends on to produce value.

andybak · on June 23, 2022

I'm not sure I was clear enough when I expressed my doubts about the concept of intellectual property.

Your musical example is playing out in the courts in multiple forms. The Marvin Gaye case, Led Zeppelin, Katy Perry etc.

And each case pushes me further towards wanting to rip down the whole rotten edifice.

We've lived through 4 or 5 decades of unprecedented expansion of the domain to which IP lays claim. Surely it's time for the pendulum to swing the other way?

HeavyStorm · on June 24, 2022

If I listen to music and create samples to be sold, is it theft?

sirsinsalot · on June 24, 2022

If your aim is to produce music that is directly derived from what you listened to ... yes. And this has been tested in court time and again.

AI isn't being inspired or creative, it is mass scale and mechanised bootlegging.

To compare it to human inspiration is naive or wishful thinking.

Guid_NewGuid · on June 23, 2022

Yes they haven't paid it forward, or back, but why fight on the occupier's territory. By calling for legal frameworks to enforce this we accept the language and terms of the dominant party. By using courts and the law and creating new law for copyright we actually move further from the goal of abolishing copyright and IP entirely.

Every time we use courts to enforce IP we're strengthening the Walt Disneys and Nintendos of the world.

(I accept I am in a group of like 3 people with this goal but it's my view)

Edit: to expand slightly more on this. People should be able to decompile/reverse engineer whatever the hell they want. They shouldn't have to worry about armed goons kicking down their doors. Every time cases are used to strengthen the enforcement of IP/licensing, whether for the light (FSF) or dark (Micro$oft, Google, etc) the outcome is the same, we move further from that goal.

matheusmoreira · on June 23, 2022

> the goal of abolishing copyright and IP entirely

Completely agree with you. It's the 21st century, once data has been published there is no controlling it anymore and all attempts to do so lead to the destruction of computer freedom. No doubt people all over the world copy code every single day with nobody even finding out about it. I'd rather get rid of all these monopolists than limit the potential of computers to whatever reality enables them.

>I accept I am in a group of like 3 people with this goal but it's my view

Now we're four.

handoflixue · on June 23, 2022

> Every time we use courts to enforce IP we're strengthening the Walt Disneys and Nintendos of the world.

Can you actually point to substantial examples where Disney or Nintendo benefited significantly from a precedent set by an open source court case? Open source has been around for decades, so it should be trivial to find numerous clear-cut examples at this point... if your theory is actually correct.

Guid_NewGuid · on June 23, 2022

No, I honestly have no idea. I know nothing about the law and understand even less. I may be wrong about all of this, but if we take the (laughable) idea of justice being blind it stands to reason any precedent that protects a single open source developer also protects Amazon's code.

ozim · on June 23, 2022

Funny thing is ALL these legal frameworks are there to protect these 3 people like you.

If there were no enforcement of IP/licensing or legal enforcement - M$, Google etc. would not be nice - they would just come over, kick down your doors and cut your head off because they could do so. With a legal framework they at least have to ask someone else.

You just have to understand you don't stand a chance with your 3 buddies against 10 motivated attackers.

Writing about "accepting terms of dominant party" you clearly never had a robbery at your house - now imagine corporations doing the same with no legal frameworks in place.

Read up on the Dutch East India Company - or just Nestle. Compared to those, Microsoft and Google, along with Walt Disney and Nintendo, are still quite nice companies.

Guid_NewGuid · on June 23, 2022

This is a slight misreading of my general political position. I am pro-government in general. I find the term "monopoly on violence" to generally indicate someone who lives a very cosseted and easy life who can spend time getting mad about like, seatbelt laws or speed limits, so I use it somewhat tongue-in-cheek.

There are quite a lot of possibilities between DMCAs of youtube-dl repositories and Big-co death-squads decapitating people in their homes. I'd prefer where we are now to the Brazil end of that spectrum but we can imagine better models of digital and intellectual 'property'.

zzo38computer · on June 23, 2022

I also agree to abolish copyright and IP entirely.

I agree that people should be able to decompile/reverse engineer whatever the hell they want.

And if armed goons (whether government, Microsoft, or some other company) kick down your doors, then they should be arrested for trespassing.

JoshTriplett · on June 23, 2022

Proprietary software is more than willing to use those legal frameworks. Unilaterally disarming while your opponent does not is a losing strategy.

As long as copyright exists, copyleft should be respected.

Varqu · on June 23, 2022

People (github in this case) do something to make your life easier so that you can save time for the price of 1 latte per month and you complain?

Software Developers seem to be the most whining profession in the world and I despise this attitude (while being a developer myself)

tuckerman · on June 23, 2022

People aren't whining because the price is too high, they are upset because some (myself included) believe Microsoft is exploiting developers by copying their work against their wishes and then turning around and selling other developers a product which may or may not be generating code which violates copyright/patent licenses. A developer who inadvertently uses a Copilot suggestion which gets them into hot water is going to be spending a lot more than the cost of a latte to defend themselves in court.

sirsinsalot · on June 23, 2022

This. It is a matter of (a) consent and (b) compensating the people without whose data the model would be useless.

Varqu · on June 24, 2022

If someone contributes to open source, then they shouldn't be surprised that someone else uses this code. The licensing hell is something that shouldn't belong in IT.

tuckerman · on June 24, 2022

When source code is made available under an open source license, there are strings attached; attaching those strings is the author’s right! Assuming you or any company has the right to do anything you want with that code without respecting the license is immoral.

That “licensing hell” (i.e. strong copyleft protections) is the reason we enjoy such a vibrant and large open source community today. I don’t take it for granted that open source as we have it today was inevitable: it required a lot of work and I’d hate to see that slip away.

Varqu · on June 25, 2022

The licensing hell is exactly the problem. If someone contributes to open source, which is a praiseworthy activity, then they do it with the intention that anyone can use this code but also re-adapt it, bundle in new products - it's all about bringing humanity forward.

And all those "you can do this, but you can't do that" licenses are things that only invite lawyers to the tech world. IMHO, licensing open source is a bullshit activity.

tuckerman · on June 25, 2022

You are making a lot of assumptions about what someone wants/intends when they contribute to open source codebases. If an author chooses, for example, the AGPL, I think they clearly had a different intention. Like it or not, not everyone wants to dedicate their work to the public domain.

Varqu · on June 26, 2022

Then why contribute to open source if you want to still be a gatekeeper? In that case better to fork it and work in a private repo.

rpd9803 · on July 2, 2022

GPL code is open source, but as a condition of its use, what you do with it also needs to be open source. Will Copilot inform developers if the code it suggests suddenly requires them to re-license their software?

tuckerman · on June 26, 2022

Every large successful open source project I know is explicitly not in the public domain/licensed CC0. I understand that there are some people that are very against copyright/intellectual property but you surely must interact with a large number of projects/people that disagree.

Philadelphia · on June 23, 2022

Yep, anything useful has to be legal and welcomed. Microsoft should start breaking into people’s houses and sorting their underwear drawers for them while they’re out. Million dollar idea!

jppope · on June 23, 2022

> "Yes but this copilot model takes that, adds value and doesn't itself join the public common good. Instead it takes it, and makes you pay to have it back in another form."

$10/month ... how much do you think this thing cost to build, and to maintain?

nightski · on June 23, 2022

That's the whole point. Without the data, it would be worthless. Microsoft is not paying the full cost because it is ripping the data without asking consent. I'm not saying what they are doing is illegal per se, but it's definitely immoral.

Guid_NewGuid · on June 23, 2022

But why is it immoral? All that code is still out there, if I had the time and the resources I could build a language model. Unlike commons in the real world (e.g. land, fresh water, etc) a code commons is purely additive. With the release of Copilot (which I don't intend to pay for or use) nothing has been destroyed, instead we'll get more code for less work where companies do pay for their developers to use it, some might even find its way back into the commons as new open-source code (whether more code of copilot generated quality in general is an unalloyed good is left as an exercise to the reader).

bayindirh · on June 23, 2022

Because copilot is violating the terms I put for my code. My code is GPL. It cannot be put into projects with incompatible licenses. That’s my code, and I share it with strings attached. You can’t just copy my code and sell to other parties no strings attached.

If that’s fine and dandy, Microsoft should also train Copilot on their source code repositories, so we can use that knowledge, too.

ShamelessC · on June 24, 2022

I guess I've just never had to work with GPL code before, but the complaints essentially only seem to be coming from coders who like this style of open source where you still get to make it kind of a pain in the ass to actually use your software.

I guess you have the right to do this, but it doesn't mesh at all with why I personally contribute (without any expectation of attribution), which is that (much like stack overflow), programmers mostly agreed awhile ago that it's just easier if we all share.

So much of what's wrong with the modern economy comes down to seeking rent on an idea that should just be public knowledge.

Sorry if my viewpoint towards your work is apathetic, but the whole field is already infested with academics who only understand citation as a useful metric. Further, the point remains that anyone with enough money could do this - not just Microsoft (Salesforce has released several models for python competitive with Copilot). Times are changing - maybe don't share code anymore? I imagine in ten-twenty years this whole conversation will seem pretty petty though when your entire program is trivially recreated from its GitHub description without ever needing to have seen it in the first place.

imtringued · on June 24, 2022

>from coders who like this style of open source where you still get to make it kind of a pain in the ass to actually use your software.

Most "coders" don't publish anything if they don't have to. Using proprietary code is an even worse pain in the ass because you don't have access to it.

The point of the GPL is to force people to share their code.

>which is that (much like stack overflow), programmers mostly agreed awhile ago that it's just easier if we all share.

>So much of what's wrong with the modern economy comes down to seeking rent on an idea that should just be public knowledge.

The entire point of the GPL is to force e.g. hardware vendors to share their driver code under the GPL or any other opensource license to be included in the Linux kernel.

>Times are changing - maybe don't share code anymore?

The entire point of the GPL is to force people to share their code.

> I imagine in ten-twenty years this whole conversation will seem pretty petty though when your entire program is trivially recreated from its GitHub description without ever needing to have seen it in the first place.

What the hell are you talking about? If that is the case then why did humans ever bother with extensively documenting and testing their software if three sentences are enough to encode it? Your perspective is particularly annoying because Copilot isn't learning to write its own code, it's entirely reliant on an army of unpaid software engineers publishing code on the internet. If it knows how to recreate a project from just the GitHub description, it basically just had the codebase inside its model to begin with and merely pretends that it did everything on its own. That is actually a form of rent seeking.

ShamelessC · on June 24, 2022

> extensively documenting and testing their software if three sentences are enough to encode it

Was just hyperbole for "from plain English specs/requirements".

I'll admit to being uninformed about GPL, but your understanding of large language models is also limited. They actually learn to interpolate between data points meaning they can compose sequences not found in the training data. Further, GitHub added a feature that checks existing code for a match and rejects predictions if any match occurs.
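
To make the "check for a match and reject" idea concrete, here is a rough sketch of what such a filter could look like. This is not GitHub's actual implementation, which I haven't seen; the function names, the token-level n-gram approach and the window size of 8 are all assumptions made up for illustration.

```python
import re

def tokens(code: str) -> list[str]:
    """Split code into identifiers, digit runs and punctuation, ignoring whitespace."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", code)

def ngrams(seq: list[str], n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive tokens (empty if the sequence is shorter than n)."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def build_index(known_snippets: list[str], n: int = 8) -> set[tuple[str, ...]]:
    """Hypothetical index of token n-grams taken from known public code."""
    index: set[tuple[str, ...]] = set()
    for snippet in known_snippets:
        index |= ngrams(tokens(snippet), n)
    return index

def is_verbatim_match(suggestion: str, index: set[tuple[str, ...]], n: int = 8) -> bool:
    """True if any run of n tokens in the suggestion also appears in the index."""
    return any(gram in index for gram in ngrams(tokens(suggestion), n))

def filter_suggestions(suggestions: list[str], index: set[tuple[str, ...]], n: int = 8) -> list[str]:
    """Keep only the suggestions that do not share an n-token run with known code."""
    return [s for s in suggestions if not is_verbatim_match(s, index, n)]

if __name__ == "__main__":
    known = ["i = 0x5f3759df - (i >> 1); y = y * (1.5 - x2 * y * y);"]
    index = build_index(known)
    candidates = [
        "i  =  0x5f3759df - ( i >> 1 );\ny = y * ( 1.5 - x2 * y * y );",
        "total = sum(values) / len(values)",
    ]
    for kept in filter_suggestions(candidates, index):
        print(kept)
```

Note the obvious limits of a check like this: it only catches near-verbatim runs, so renaming a variable inside every window defeats it, and suggestions shorter than the window are never flagged.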

bayindirh · on June 24, 2022

Nobody disputes their ability to interpolate, I think (at least not me), but the problem is the starting points for these interpolations contains GPL licensed code, hence it derives GPL licensed code.

This derivation brings the GPL in, and the model doesn't understand this. As a result, every time GPL-licensed training data is mixed into the interpolation, you're either converting your code to GPL or, if you don't, violating the GPL.

It's plain and simple.

On the other hand, I've been hearing the "we'll write the specs, and the computer will just auto-generate it" gospel since 2002. This time won't be different. The human brain, intuition and creativity are beyond algorithmic modeling.

So, no, the computer will not autogenerate the code from specs. It might link boilerplate together, which can already be done today.

namose · on June 24, 2022

But GPL owners aren’t seeking rent, so you’re just asking those who believe all code should be open source to unilaterally let large companies use all their code, while they reap no such benefits from the large companies.

ShamelessC · on June 24, 2022

Like I said, I understand the premise, just not the emotion behind why you want to release code to the public at all if it isn't simply a donation to all human knowledge.

There are better ways to gain notoriety as a coder than by essentially legally requiring your name is attached to a thing for all time.

I personally would be thrilled to know my work was valuable enough to be used by a company because I really just couldn't care less about the "credit" part of it. I know what I've done and don't have anything to prove.

bayindirh · on June 24, 2022

It's not an emotion. It's a stance.

> Why you want to release code to the public at all if it isn't simply a donation to all human knowledge.

On the contrary. I donate my code to all human knowledge. Just not to a corporation's private code corpus. I intend my code to be open to all humans to run, study, modify and share, forever. I don't give you the freedom to take it to a closed domain, and not share the further knowledge you derived from my code. If your primary intention is to return this knowledge to humankind, the GPL is an enabler, not a hindrance.

> I personally would be thrilled to know my work was valuable enough to be used by a company because I really just couldn't care less about the "credit" part of it. I know what I've done and don't have anything to prove.

I personally don't care whether my code is good enough to be used by a company. If I want to contribute code which can be used by a company, I can contribute to MIT projects (which I also do). I don't have anything to prove.

I release my code in the hope it'll be useful for somebody, and I don't want it to be included in any permissive or closed source base. It doesn't matter whether it saves your beef today or not. That's not my problem. Go write a better one, then. I don't care.

bayindirh · on June 24, 2022

When actually using the software means "taking it, adding it to a commercial software and never telling anyone, incl. the developer of the original code, and not giving any attribution whatsoever, and earning money over that piece of code", yes GPL makes it hard. It's by design, and this is why I license anything and everything I put in the open GPLv3+.

If anyone contributes to GPL software, they're clearly attributed. Moreover, Git makes this attribution irrevocably visible. Before that, patches were sent in by mail, and mailing lists were open, so attribution was also visible back then. So, no, GPL makes attribution visible, and irrevocable, by design.

GPL doesn't seek rent over any idea. It forces ideas to stay open, forces you to put your improvements back in the open. You'll be attributed, your code will be in the open all the time, and nobody can grab your code, run, and hide it in their software to make any kind of unjust profit, which makes "Open Source" coders visibly and literally wince and cringe, because they can't grab and paste a piece of code and make their days easier.

Again, this is by design.

Sorry if my viewpoint towards your view is apathetic, but the whole field is already infested with programmers who only understand being able to copy and paste code left and right to develop software as a useful metric.

It's not about Microsoft, it's just about honoring a license. A case-tested, lawyer-written, trusted license which many developers chose for licensing their work. It's a breach of contract, plain and simple.

As I said elsewhere, some of the code I'm writing is backed by papers. I don't obfuscate my papers to prevent anyone from implementing it, but if I open my reference implementation as GPL, this is because I don't want someone to grab it and run with the code, change it a little, put into a closed source program and call the idea theirs, possibly patenting it in the process.

I have a serious piece of research, my Ph.D. actually, and I'm still developing the code powering the whole idea. I was planning to open it under GPL license, to force its evolution in the open, but I understood that people don't appreciate that. So, probably I won't open the code. Binaries maybe. Highly obfuscated, protected binaries, probably.

Banana699 · on June 23, 2022

You can say the exact same about piracy, when I take a game or a pdf book from a pirate site, nothing is destroyed, nothing is subtracted. The server still owns the data and can copy and share it infinitely, all that changed is that I now have a copy too, and I use it to enrich my own intellectual life.

The argument has 2 main flaws

1- It's not symmetric. The massive corporations with paid armies of lawyers aren't hugging trees and talking about how "Knowledge is - like - just free, man" with dreamy eyes, I would love if they were like that but no. They are constantly on the lookout for anyone remotely using their work. They don't deserve the language of free knowledge and open data, that would be like extending peace to an invading army, or defending a tyrant with the lingo of free speech. He Who Lives By The Sword Dies By The Sword.

2- If the person(s) behind the data or the code lives off their intellectual labor, you are ripping them off by using it without compensation. Sometimes the compensation is as little as simply citing them, just mention their names so that they get visibility and prestige they deserve for toiling in the intellectual field to produce the ideas and brain patterns you use and benefit from.

The whole thing is a huge mine field, digital reproduction of information and abstract structures is an extremely novel phenomenon that breaks tons of human intuitions about how ideas and thinking work and spread. But the involvement of a corporation allows you to shortcut the entire thing by invoking (1), also known as the fundamental theorem of ethics: Do Unto Others As You Wish They Do Unto You. Do corporations allow you to freely take and mix their intellectual produce and sell it back to them? No? Then they DON'T get to do that either, except maybe among themselves.

What I find strange is how nobody talks about how inherently repulsive and ugly the "Copilot" philosophy is, how it is fundamentally a dead end and how much it betrays a lack of understanding of how programming works on the part of those who fund and market it. Code is different from natural language, the fact that we call the symbols we write algorithms in "Programming Languages" is purely a historical accident. Code doesn't have the redundant resilience and error-correcting properties of natural language, removing or modifying or adding even a tiny bit to correct code can give you atrociously-slow correct code, or full-of-security-holes correct code, or non-correct code, or any of the 3 mixed together with other disasters. If you're going to steal people's open source code, at least do something interesting and intelligent with it, don't be a lazy fuck and apply an NLP technique to a highly formal and rigid domain then smile smugly and charge people for it as if this is going to end anywhere useful.

jazzyjackson · on June 23, 2022

If it was just published as a public good it would probably be as illegal as sci-hub

I consider the $10/m a donation to the Microsoft legal defense fund to allow free access to accumulated knowledge.

sirsinsalot · on June 23, 2022

To allow access to a service that grants you the accumulated knowledge's output in small bits.

I'm all for a world where these tools help developers, but I'm not here for a system that isn't open. I want to own my tools.

Copilot is a bit like musicians paying a monthly fee for access to a loop library. Except all the loops are rip-offs of other people's hard work and there's no effort to compensate them.

If I made an AI that resampled music into derivative tracks ... you can be damn sure I'd be sued until my ears bled.

jazzyjackson · on June 23, 2022

> monthly fee for access to a loop library. Except all the loops are rip-offs of other peoples hard work and there's no effort to compensate them.

the analogy works if there were an open access library of music (restricted licenses tho they may be…) that was available to search and browse without the tool

then an auto-composer could suggest music to fill in gaps in my own composition, using snippets of audio from the otherwise freely available library

that's a plug-in I would pay for too, but yea if my "no commercial use allowed" melody made it into someone else's composition, I would want my license terms to be surfaced to them as well

except I personally wouldn't want to live in a future where every line of code has to have some claim of "who authored this function first" or "who wrote this melody and rhythm first", pursuant to licensing terms in perpetuity. that sounds terrible.

ece · on June 24, 2022

I'm all here for openness and tools you own, so there could be a FOSS implementation. Microsoft could just open it up and still charge the $10/mo for hosting the model, and I hope that happens.

Making the tool better without verbatim copying and making it more effective should be the priority, IMO. Trying to control it too much would be missing the point of the tool.

throwaway675309 · on June 24, 2022

"Except all the loops are rip-offs of other peoples hard work and there's no effort to compensate them."

Except all the loops are smaller pieces of larger loops which you as the developer then mix together in new ways to create your application. FTFY.

sirsinsalot · on June 24, 2022

Even if the sample was one snare hit. Someone worked hard to tune that snare, mic it, record it and process that sound.

They should be compensated.

spullara · on June 23, 2022

It absolutely adds to the common good in the form of people using it to write more open source code.

sirsinsalot · on June 23, 2022

Seeing as Copilot is known to output code that's a straight copy from non-permissive code where the author's permission wasn't obtained ... I'd say it is helping you steal from code authors without giving back (as there is no obligation to open source your code).

Given Microsoft's record of pursuing IP violations aggressively through the legal system, I'd say the whole thing is ironic.

monocasa · on June 23, 2022

The issue is that whether the free software people want it or not, the copyright system over code exists, and historically has been used as a cudgel against smaller players. If we got rid of copyright over code entirely I'd totally be down for this. And IIRC RMS has said the same thing; that he'd be in favor of the removal of copyright over code as a concept even if it meant neutering the protections of the GPL.

Until that happens, and while copyright protections are still used by larger entities, using the same system to protect yourself and (more importantly) your users isn't turning your back on your ideals, but instead simply adjusting your strategy to the current material conditions. Remember that Google v. Oracle (while ultimately a win versus what could have been) was a step back, with de minimis claims left on the table as not a valid defense. The playing field is heavily slanted towards the big players and software freedom requires every tool it can get its hands on at the moment.

zzo38computer · on June 23, 2022

> The issue is that whether the free software people want it or not, the copyright system over code exists, and historically has been used as a cudgel against smaller players. If we got rid of copyright over code entirely I'd totally be down for this. And IIRC RMS has said the same thing; that he'd be in favor of the removal of copyright over code as a concept even if it meant neutering the protections of the GPL.

As someone else asked, I would also want a citation, but I agree.

Actually, I want a license that you can do pretty much anything you want to do with it (including: lack of attribution, distribution without source codes, distribution with source codes (whether they are the original source codes or reconstructed), lack of copyright notices, reverse engineering, circumvention of your own copy and write reports about anything you want to do, to use or not use the software (and to modify or not modify) at your choice, etc), but that you are not allowed to add further legal restrictions to it (with a few exceptions dealing with trademarks (but not all) and allowing conversion to GNU (A)GPL 3 and CC-BY-SA 4.0 if you are able to satisfy the conditions of those licenses) or to derivative works, and that if someone will try to use legal processes against you relating to this, then anyone can countersue.

Guid_NewGuid · on June 23, 2022

Interesting that he's said that, I wasn't aware.

I think at its root the problem is copyleft is a mirror image of copyright. It relies on and replicates all the cultural and legal requirements and constraints of the copyright model and curtails an imagining of other possibilities. Every sentence or thought spent on copyleft is misdirected in my view.

Which is why I find Microsoft doing this (potential) en-masse license violation and then a bunch of GPL folks getting mad pretty funny overall. I just find the high and mighty tone annoying, like sure, they've (allegedly) screwed you, but they're going to (theoretically) get away with it because they're rich and powerful, sorry that didn't turn out how you wanted.

Kbelicius · on June 23, 2022

>I think at its root the problem is copyleft is a mirror image of copyright.

That is the (only)point of copyleft. If it weren't for copyright it wouldn't exist. Fight fire with fire, that sort of thing.

rcxdude · on June 23, 2022

I don't think that's true: copyleft is right to repair for software. Even if the software is not copyrightable, without the source code users are still relatively powerless. (Incidentally this is related to why patents were created: not to constrain or encourage innovation, but to get people to publish inventions instead of keeping them secret). If copyright were abolished and so too copyleft destroyed, Linux users' freedoms would probably materially go down, not up (though in general user freedom would marginally increase because most software is not copyleft).

imtringued · on June 24, 2022

Copyleft is formalized code sharing. Pretty much an excuse to tell people to share their code.

matheusmoreira · on June 23, 2022

> And IIRC RMS has said the same thing; that he'd be in favor of the removal of copyright over code as a concept even if it meant neutering the protections of the GPL.

Do you have a citation? I was under the impression he defended copyright because copyleft depends on it.

marpstar · on June 23, 2022

> Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps

This is my feeling as well. I don't build stuff in the open so that I can get bent out of shape at someone not properly licensing it. It's in a public repository, FFS... I assume that if anyone even notices my repo, that they may copy/paste a few lines out of my solution if it helps them.

cududa · on June 23, 2022

Exactly! Do they really think every single line of their code is so precious it requires attribution? If I publish code, I assume it might get pushed, pulled, refactored in a million ways and no one will ever know my name’s attached to it. And guess what? I DON’T CARE. It’s code. Not a self-constructed monument to my own intelligence that needs a little placard with my name on it to follow around some clever async function I wrote.

georgeecollins · on June 23, 2022

If it's a couple of lines of generic code, of course. That's also an indefensible copyright, btw. But if it's hundreds of very specific lines of code written to do one thing under a license you don't follow, that's something else.

This isn't just an issue of code. You can write a program that combines songs, or combines novels creating a different work that has sections that are essentially the original protected work. I don't think the authors of those novels are going to be OK with you selling or giving away a version of their work just because an AI edited it or combined it somehow.

sirsinsalot · on June 23, 2022

But this isn't everyone's feeling. And they have a right to choose how their work is used. That's the basis of commerce being possible here.

The mechanised license ignorance and the way original authors are not compensated is the issue.

If you had a repo you'd worked really hard on, and offered a commercial license or GPL depending on the use (so you can be funded to work on it) ... do you think it is fair that copilot ingests that code and allows others to benefit from your work and knowledge without the commercial license as you intended?

Note how Microsoft always throws out the capitalism "rules of engagement" when it benefits them and undermines everything else. The fact we are even trusting the situation Microsoft are creating is dire, and speaks to the short memory of our industry.

alar44 · on June 23, 2022

Saying an auto complete of a line of code is "using their work" is a massive stretch.

bayindirh · on June 24, 2022

When it's demonstrated that it can generate whole function bodies intact (fast inverse square root debacle), and autocomplete it with a wrong license, it's not a stretch anymore.

See: https://twitter.com/mitsuhiko/status/1410886329924194309

sirsinsalot · on June 23, 2022

It isn't autocompleting "a line of code", it completes whole function bodies.

georgeecollins · on June 23, 2022

You may not care about licensing or copyright, and I imagine many others who create code under an attribution license don't. That's still not the same as saying "copyright and licensing are bad." Too many businesses depend on them to exist for me to have that opinion.

If an AI takes a copyrighted work and makes its own version, say combining two novels by popular authors in a way that is unique but keeps large parts of the text intact, can I sell that? I think if I were the authors I would be unhappy.

Also, how hard would it be for copilot to include a comment saying "// I got this line from x repo" when you are copying from a new repo? I am guessing not hard at all. Then at least the user would be aware of where their code was coming from and could be expected to make a judgement. If the line is "let a = b" then probably no worries. But if it is hundreds of lines of a simulation, all from the same repo with no changes, then I think some attribution is good for both parties.
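
As a rough illustration of the "// I got this line from x repo" idea, here is a minimal sketch. It assumes a lookup table from known lines to their source repo already exists, which is really the hard part and not something a language model provides by itself; the `Source` type, the `annotate` function and the three-line threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    """Hypothetical provenance record for a known line of code."""
    repo: str
    license: str

def normalize(line: str) -> str:
    """Collapse whitespace so indentation changes don't hide a match."""
    return " ".join(line.split())

def annotate(suggestion: str, provenance: dict[str, Source], min_lines: int = 3) -> str:
    """Prepend an attribution comment for each repo that supplied at least min_lines lines."""
    hits: dict[Source, int] = {}
    for line in suggestion.splitlines():
        source = provenance.get(normalize(line))
        if source is not None:
            hits[source] = hits.get(source, 0) + 1
    notes = [
        f"// {count} line(s) match {source.repo} ({source.license})"
        for source, count in hits.items()
        if count >= min_lines
    ]
    # Below the threshold (e.g. a lone "let a = b") the suggestion is returned unchanged.
    return "\n".join(notes + [suggestion]) if notes else suggestion

if __name__ == "__main__":
    sim = Source(repo="example/sim", license="GPL-3.0")  # made-up repo name
    known_lines = ["let dt = 0.01;", "let x = x + v * dt;", "let v = v - k * x * dt;"]
    provenance = {normalize(line): sim for line in known_lines}
    print(annotate("let dt = 0.01;\n  let x = x + v * dt;\nlet v = v - k * x * dt;", provenance))
    print(annotate("let a = b", provenance))
```

The threshold is there so that a single generic line stays unannotated while a long run from one repo gets flagged, leaving the judgement call to the user.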

Guid_NewGuid · on June 23, 2022

Don't get me wrong, I know this (copyright abolition) is pie-in-the-sky stuff. I'm using an anon account to post because even advocating for it could be troublesome for employment. But I don't accept we have to be meek or have small goals in talking about this ideological stuff. And I think this has made me realise why I find the Free Software vision so disappointing and weak. And hence why I find all these (ideologically) Free Software aligned takes of sending Billy to jail for a thousand years so irritating.

bayindirh · on June 23, 2022

> I find this whole topic very annoying, this is like the 3rd variation to reach the front page today.

Me too. I also find three iterations of the same subject not enough discourse. We need to take this matter more seriously.

> But it has made me realize why I instinctively dislike Free Software as a movement.

On the other hand, this whole discourse reminds me why I absolutely love Free Software as a movement.

> Copyright and licensing are bad, actually.

This is why we have "Copyleft".

> Stop getting into a frenzy of arousal about the police kicking down doors to drag Billy Gates to jail because 80 characters of fast square root is theft but 79 isn't.

And stop getting into a frenzy of arousal about being able to use any and every code piece you see elsewhere in any project regardless of its license.

> Where on earth is the ambition and vision!? Knowledge is public domain. A commons of knowledge is a public good. The cost of code copying is zero.

This is why the GPL is important. It forces knowledge to evolve in the open, stay in the public domain and actually serve the public good. It also doesn't hinder ambition and vision, because it keeps that knowledge open to everyone instead of letting it be taken into the private domain.

> Sure in our day job we have to pretend to care about this stuff. But when did the ideological scope of what can be achieved become rules lawyering over license text.

You might be pretending to care about this in your daily job, but we really care. Some of the projects I take part in can't ever include GPL code (because the projects are MIT licensed). These texts are court-tested licenses, so they're as proper and serious agreements as the EULAs of "particular" software companies.

> Copy my MIT licensed code without attribution? I don't give a shit, go ahead, I hope it helps, in fact I want a truly public domain license but copyright law is so hostage to corporate interests no such thing exists in many countries.

If I want my code to be copied and possibly closed, I'll license it with MIT or BSD-0 and forget about it, but if I'm licensing my code with GPL3, it means I want that code to stay open. As a licensor, I expect anyone using that code to respect that license.

> Free the code.

Yes, and respect the license the author selected for his/her code.

gopiandcode · on June 24, 2022

> > Copyright and licensing are bad, actually.

> This is why we have "Copyleft".

This. Exactly. It's surprising how many developers have strong anti-copyleft/anti-GPL opinions while being completely uninformed on what they're talking about (but hey, I guess "uninformed but strongly opinionated" is HN in a nutshell). The purpose of GPL and other copyleft licenses is exactly to combat the insanity of intellectual slavery.

imtringued · on June 24, 2022

Pretty much, copyleft is turning copyright against itself.

mplanchard · on June 23, 2022

If that's what you want, you should license your code not under MIT, but under a license that allows replication/distribution without attribution. Meanwhile, others who do care about such things can license their code under licenses that require attribution/copyleft/etc.

Guid_NewGuid · on June 23, 2022

But I can't really because the legal systems for it don't exist. I can't relinquish anything https://softwareengineering.stackexchange.com/questions/1471... (CC0 looks closer but still doesn't do what I'm after).

And I can't because there are a bunch of, for want of a better word, dweebs who care about this stuff. I don't give a single solitary frick about the finer points of MIT vs GPL vs BSD 3 clause vs CC-BY-NC or whatever-the-hell. But y'all are forcing me to care by making the legal frameworks for software ever more strict and confusing.

I take a maximalist view, don't want the code copied, sliced up, re-used in any form whatsoever with no credit? Don't post it on a code sharing site. Like I say in the OP, in my job I obviously have to follow the rules, but on an ideological level I'll ignore them where I can get away with it outside of work.

If you don't want the code to be used, don't post it online.

tuckerman · on June 23, 2022

I'm curious if this view is software specific or relates to any work released online? For example, do you feel similarly about a novelist or graphic artist? I reckon at least a few software engineers look at what they produce not entirely differently from how an artist or writer looks at theirs.

Guid_NewGuid · on June 23, 2022

It's a good, and thought-provoking, question.

First to be flippant the idea of a software developer with that view sounds so unbearably insufferable and full of themselves I hope never to meet one. All code is terrible, be less attached.

Stream of consciousness: Should artists or writers be paid for what they produce? Yes. So why not software developers? I'm paid for what I produce. But then I don't release the stuff I'm paid for for free on the internet. But I'm against DRM, I also think Winnie the Pooh shouldn't have IP protection (now expired). What makes art or literature a different commons from software? I also think all scientific journals should be available for free. Do artists and writers have an alternative route to make money from what they publish, what is the artistic or writer equivalent of open source? I think this is the crux of it, if we're going to do open source let's actually do it and stop being precious about it but this only applies to freely-entered open source. So does that mean I support some form of copyright after all? Then again some old out-of-print books will sell on Amazon for like $4000 so we should be able to copy those for free.

Ultimately it's a question of what a vision for society without copyright would look like. I think software is uniquely placed to start exploring that idea. How would we make a living off software if anyone could reverse engineer (even our proprietary) code freely and safely?

tuckerman · on June 23, 2022

The reason I ask with writers in particular is because, like code, having access to it necessarily means that the viewer has the ability to copy it as much as they'd like. Unlike software, however, there is no ability to keep the source code private in a book while still having users.

I definitely agree that copyright protections have become far too strong but I don't think we can really ever know if we would have been able to build the strong open source community we have today without coopting the copyright system for copyleft protections. At the same time, perhaps we are past the point where it's necessary and now it's holding us back... it's entirely possible!

To the first thought, I personally see some coding as a creative act (some is doing _a lot_ of work there though). It's not because I fancy myself a Picasso but because I think some (again, doing a lot of work!) solutions/ideas have a bit of their creator in them and, for those works, the author should be able to exert some control over their works. I think this is more philosophical than legal/political, but I would disagree that it's flippant :)

mplanchard · on June 25, 2022

You don’t need an “official” license, although I agree creative commons is closer to what you want. I feel like you can pretty easily write a license file that explicitly waives all of your rights and responsibilities. Such simplicity is after all what made MIT such a popular license, even though it’s not substantially different from Apache.

notacoward · on June 23, 2022

I suggest you read up on the history of free software and open source. It exists as a reaction to intellectual enclosure, to prevent that ill and create greater freedom of ideas. Yes, it uses the tools of copyright to fight greater ills of copyright, because those are the tools available, and actions like these are necessary to keep the enclosure from happening all over again. Anyone who has actually studied the matter for even five minutes can see how silly the "free software is anti-freedom" FUD is.

vajow46267 · on June 23, 2022

So glad this sentiment is becoming more common in the OSS community! I MIT license everything, if someone wants to make money using stuff I wrote that's awesome, and I wish them the best.

I don't think users owe me anything at all. If people want to PR back that's cool but if not that's cool too.

eikenberry · on June 23, 2022

There is a license for that, the MIT-0 or the MIT No Attribution License.

https://opensource.org/licenses/MIT-0

wcoenen · on June 23, 2022

> I want a truly public domain license

I think this sentence contradicts itself.

A "license" implies that there is a copyright holder who allows usage of the work under the terms of said license.

While "Public domain" implies that there is no copyright holder (e.g. because the copyright expired, was explicitly waived, or is for some other reason not applicable).

If you want to put your work in the public domain, you can do so; simply include a note saying that you dedicate it to the public domain.

Guid_NewGuid · on June 23, 2022

You're right that it does contradict itself, but the unfortunate situation is that public domain declarations don't work and would make it harder for people to use your code safely in the current licensing model. The closest options are Unlicense and CC0 afaict and both don't work in many European jurisdictions.

I just want people to be able to take my code and do whatever the hell they want with it (including commercially) and optionally contribute to it. Having a license currently makes that easier but every time the Free Software lot go zooming off into the weeds of GPL v3 versus GPL v2 versus LGPL my eyes roll back into my head and I internally start screaming "get a life!".

imtringued · on June 24, 2022

You use GPL for desktop apps, AGPL for webapps and services and LGPL for libraries. Who cares about the specific version, just pick one of them.

nonbirithm · on June 23, 2022

I think because this kind of ML is so new, we have no choice but to frame arguments for/against in terms of the structures that have been in place for decades past (copyright, open source licenses). We don't yet have the legal language to express dissent against ML in clear yes or no terms.

I think if there were an option to add a machine learning clause and ask individual creators if they wanted it applied in that context, we would see a considerable amount of uptake. It's just that we couldn't foresee this progress happening so soon, and the issue is still not visible enough. I think it's only a matter of time before the culture catches up and new creative works in the coming years are excluded from training sets by their authors with clear and direct language.

By that point there would be no way to argue "but they shouldn't care, they licensed it like this, so I'm assuming it's fine for ML use."

If copyright is not enough to stop another entity from using a person's data for training, then some other protection should be invented that does.

popcube · on June 24, 2022

Because big companies want this, we will absolutely end up accepting that a company can get copyright from AI.

Schroedingersat · on June 23, 2022

The problem with this is 'freeing the code' in this instance leads to microsoft building a wall around it and asserting complete control in a few years.

Copyleft exists for a reason and without the ongoing fight for the commons we lose it all.

nmfisher · on June 24, 2022

I totally agree, this reaction seems very hypocritical. If some rinky dink startup did exactly the same thing - as they are entitled to do under the licences of huge swathes of code on GitHub - hardly anyone would bat an eyelid. But just because it’s a Microsoft-owned company, it’s somehow verboten?

That seems totally inconsistent with decades of people clamouring for more openness/liberty when it comes to IP rights.

bayindirh · on June 24, 2022

Regardless of the size of the offender, if you're not respecting the terms of a license, you'll get pushback. It's natural.

If you're a company which executes Embrace Extend Extinguish on any technology you like yet don't own, you'll get quadruple amounts of pushback. That's normal too.

Microsoft isn't a saint, and copilot is breaking a lot of legal, ethical and moral rules. It's doubly natural to react to this.

progman32 · on June 23, 2022

I see the free software movement as a variant on your ideals but rooted in practicality given the current environment.

Guid_NewGuid · on June 23, 2022

I think we share a lot of the same goals but they presuppose openness based on violence, if you don't do what their license says exactly then they're going to use lawyers and courts and the state's monopoly on violence to make you comply.

I think at a fundamental level this abandons any vision of a true commons since as copilot discussions reveal the well is now polluted (to mix metaphors) and though in some frames the code is more free you certainly won't be if you fail to pay the penalty levied in a civil case for misusing it.

imtringued · on June 24, 2022

That is true of any license.

kube-system · on June 23, 2022

> Free Software

> public domain

These are incompatible concepts. RMS's vision of 'free-as-in-freedom' software doesn't let people do whatever they want. It forces those who distribute binaries to also distribute source. This is not possible with a public domain work.

futureshock · on June 23, 2022

In this thread: many engineers nervously sweating. The moats are drying up and the wizards are about to be thrown out of the castle. This tech is the first product in a long line of products that will massively lower the barrier to entry. It has been a good run, but it was never going to last forever. We are not part of the capitalist class and were never going to be.

ThalesX · on June 23, 2022

The world might change, but software engineers have been working with and within change their entire careers presumably. I think we'll be OK, as people, no matter what happens.

I was sweating nervously before I started using Copilot awhile ago but I've stopped since because A - it really doesn't replace me, tried really hard; B - I don't sweat nervously for IntelliSense either.

There's also C, where being of an entrepreneurial mindset, I'd love the opportunity to hand over the software to an AI dev and just direct the implementation to my desire until I have a working product. I bet I could secure a higher room in the castle if instead of coding for 8 hours per day I could work on n products with capable AI Software Engineers. We're not there yet though.

imtringued · on June 24, 2022

Is this supposed to be a joke? You're arguing that software developers are being replaced by themselves because ML just takes in training data and is entirely dependent on real humans to provide that data. If anything, this will simply result in another productivity explosion where software developers will get paid even more.

LordDragonfang · on June 23, 2022

Copilot replaces code monkeys, not engineers. Ultimately it's just faster stack overflow, proper software engineers and system architects are going to be just as in demand as they are right now for the foreseeable future. At the point at which that stops being the case, we'll have much bigger societal and existential problems (because it implies the singularity is nigh)

(You're correct on not being part of the capitalist class, though)

futureshock · on June 23, 2022

There are a lot of code monkeys out there and I might be one of them. That island of job security seems like it will be shrinking.

account42 · on June 24, 2022

I don't agree that we are that close to that (or that Copilot is a significant contribution to bringing it closer) but ultimately eliminating mindless jobs is a good thing. The problem only comes from the expectation built into our current society that people need a job in order to be allowed to survive. Or to put it another way, the profit from automating away jobs en masse should be shared with the whole of society, not privatized.

notacoward · on June 24, 2022

> the wizards are about to be thrown out of the castle

You have completely misunderstood who the moat-building wizards are. That's proprietary software. Heard of it? I ask because a lot of young people nowadays don't seem to understand how dominant it used to be and the threat that it represented. (Plus a few older folks who never knew, forgot, or deny reality for other reasons.) We've been trying to throw the wizards out for decades, by making code available to everyone and making sure it stays that way via licensing. Code without a license is subject to re-enclosure as important enhancements - even necessary ones, such as security - are made behind locked doors. The open version becomes out of date, the proprietary one wins, we're back to wizards and moats. What Microsoft is doing is the same thing for code that was supposed to have legal protection so it could remain open and avoid that fate. It's taking magic back from the people and making it exclusive to the "capitalist class" (eye roll) again.

ssalka · on June 23, 2022

Information wants to be free

VoodooJuJu · on June 23, 2022

It is now proven that copilot returns code from codebases with non-permissive licenses [1].

I'm curious - what are the legal implications of this going forward? I've so many questions.

1. Will Microsoft ever face lawsuits for these license violations?

2. If so, who/how? Class-action?

3. Will copilot be forced to open-source in the future? Under which license? Some open source licenses are incompatible with others, but copilot uses code from probably every OSS license conceived.

4. If Microsoft faces no justice, will we start seeing more OSS license violations? Will Google start using AGPL-licensed code?

[1] https://news.ycombinator.com/item?id=27710287 | Copilot regurgitating Quake code

mhaymo · on June 23, 2022

That regurgitated code exists on Github under an MIT license: https://github.com/jethrodaniel/fast_inv_sqrt

"jethrodaniel" does not appear to have the copyright to offer that license, but it's hard for Github to determine that in general, so I doubt they would be liable for the error.

monocasa · on June 23, 2022

Even if it's somehow available under an MIT license (which is questionable on the part of jethrodaniel), there's still infringement. MIT isn't public domain, it still has

> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

Replicating it without complying with those terms is still infringement.

sirsinsalot · on June 23, 2022

this. People are being willfully blind here, like cult members looking dead-eyed at their leader and chanting "This is great" as they drink the kool-aid.

And from Microsoft no less, once outcast for mass poisoning.

vorpalhex · on June 23, 2022

> but it's hard for Github to determine that in general, so I doubt they would be liable for the error.

Please insert that meme, "That's not how that works. That's not how any of this works!"

The legal system is permission based, not forgiveness or "I didn't know" based.

minhazm · on June 23, 2022

Actually the legal system is evidence based. Microsoft has evidence that the code they are producing is licensed under MIT as far as they can reasonably know. There's no definitive way to know who actually owns the original copyright. I could grant permission to use my repo, but maybe I got that code from someone else, who then got it from someone else and so on and so forth. It's a similar situation with stolen goods, if you unknowingly purchase stolen goods you usually cannot be charged for theft as long as there aren't obvious signs that it's stolen such as the goods being priced far below market value.

sammax · on June 23, 2022

Microsoft has evidence that the code they are reproducing is MIT licensed, so are they intentionally violating that license or does this AI thing include the license and attribution in every snippet it generates?

monocasa · on June 23, 2022

Major aspects of copyright infringement are strict liability, like a lot of civil actions around damages. It doesn't matter if you thought it was OK, there's still a damaged party that needs compensation according to the law. At best you'll simply avoid the criminal and punitive penalties.

BaculumMeumEst · on June 23, 2022

Exactly, that's why Pornhub hasn't had any liability issues arising from where its content comes from either. It's just too darned hard to tell.

monocasa · on June 23, 2022

No, PornHub doesn't have liability in a lot of cases because of 17 § 512, but has still had to deal with liability in general, which is why they nuked some 80% of their library not backed by verified individuals a while back.

https://www.law.cornell.edu/uscode/text/17/512

A huge part of 17§512 is the DMCA takedown process mainly in 17§512(c)(3). Does Microsoft even have the ability to truly remove training data from the model? Or do they have to retrain on each DMCA takedown?

Flimm · on June 23, 2022

I personally don't want to have to upload proof of identity to GitHub and a signed document swearing that I own the copyright to all the code I upload to GitHub, or proof that I coded it. We need to be careful what we wish for.

vorpalhex · on June 23, 2022

Excerpt from the MIT license:

> THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

concordDance · on June 23, 2022

If they had a reasonable basis for believing they had a license they're in the clear. "I didn't know" might not be enough but "I had good reasons to think otherwise" is.

vorpalhex · on June 23, 2022

> If they had a reasonable basis for believing they had a license they're in the clear.

False.

If they committed copyright infringement, even if they genuinely believed they weren't, they are not in the clear. They still owe damages.

concordDance · on June 24, 2022

Can I have a citation?

vorpalhex · on June 24, 2022

https://www.traverselegal.com/blog/accidental-copyright-infr...

https://revisionlegal.com/copyright/what-is-accidental-or-in...

mrh0057 · on June 23, 2022

I’m not a lawyer but my understanding is that these are torts so all you have to prove is Microsoft has liability. I think this would be easy to prove due to the way neural networks work since it’s just a way of performing a search.

Since it’s a tort I don’t think you have to prove they should have known it would return copyrighted code, the fact that it does is enough to have liability.

jsiaajdsdaa · on June 23, 2022

That doesn't stop youtube from blasting people away over copyright issues?

On youtube, video uploads are a cost center, whereas on github, code is a profit center

542458 · on June 23, 2022

IANAL. My understanding is that the general legal precedent in the US is that a) datamining text has no copyright implications (in the same way that reading a book has no copyright implications) and b) it is not a copyright violation to use a small amount of copyrighted material provided the context is sufficiently transformative. This might seem silly or unfair to you, but that is the current legal reality.

But even ignoring that, everybody uploading code to GitHub has given GitHub the right to analyze that code as per the GitHub ToS. This is the same mechanism by which you can't upload code to GitHub with a license that says "nobody is allowed to display this code on the internet" and then sue GitHub.

aposm · on June 23, 2022

I can't imagine a scenario in which any lawyer would consider granting Github the right to "analyze" code anywhere close to granting Github the right to spit out that same code verbatim without your copyright notice (even if laundered by AI).

542458 · on June 23, 2022

Here's Kate Downing, an IP lawyer specializing in software licensing:

> According to Downing, the answer depends to a certain extent on where that code is hosted. If it’s on GitHub, there very clearly would not be copyright infringement.

> “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”

Downing cautions that copilot output consisting of large chunks of code complete with comments is more questionable to use, but that for the most part it looks above board.

https://fossa.com/blog/analyzing-legal-implications-github-c...

Here's an English lawyer on the same topic...

> The licence is broadly worded, and I'm confident that there is scope for argument, but if it turns out that Github does not require a licence for its activities then, in respect of the code hosted on Github, I suspect it could make a reasonable case that the mandatory licence grant in its terms covers this as against the uploader.

https://decoded.legal/blog/2021/06/github-copilot-initial-th...

Engineering-MD · on June 23, 2022

To me, regardless of whether it is technically legal, it certainly doesn’t feel right. Furthermore, contracts rely on people understanding what they are agreeing to, and I don’t think many developers would agree to letting the code be used outside the terms of the license they uploaded it under.

I am very surprised there hasn’t been a legal challenge to it.

mynameisvlad · on June 23, 2022

What, exactly, is there to challenge?

“I’m sorry your honor I didn’t understand what I was signing” I don’t think has ever been a valid reason in a courtroom, similar to “I’m sorry I didn’t know I was committing a crime” is not a valid defense.

ghusbands · on June 23, 2022

Courts interpret the intended and understood meaning of contracts and terms all the time. Research the term "meeting of the minds" and case law around it.

When the terms were written, it's exceedingly unlikely that they intended it or anyone understood it to be blanket permission to allow a trained AI to copy code for others and no user would have interpreted it that way. Microsoft/Github can't necessarily unilaterally increase the intended range without making it clear in the terms.

If it got to a court case, and both sides could afford it, it could be a lengthy one.

(This comment is not legal advice. I am not a lawyer.)

mynameisvlad · on June 23, 2022

How does "[allowing] a trained AI to copy code" change the interpretation of the ToS?

By uploading your code, you give Github a license to use it to improve their services. Copilot is such a service. Just because it's an AI and it provides others code does not somehow invalidate the license you gave.

ghusbands · on June 24, 2022

Again, research "meeting of the minds". It's a standard legal term directly relevant to all contracts and terms. Also, "transparency" is another important one.

Many online services have very wide terms around what they can do with your data, which most people who bother to read them interpret as being what is required for them to handle the service for you without breaking copyright law. In that context, being able to use and analyse your data to improve their services could be another catch-all that lets them do specific performance optimisation on their backend.

One party instead deciding they've got blanket permission to do whatever they like with your work, including selling it to others, may well not hold up in court.

Contracts aren't programs and one party tricking the other rarely works out in court - courts world-wide tend to rule against trickery and deception.

BaculumMeumEst · on June 23, 2022

> “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” Downing says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”

That's assuming that all code on GitHub is uploaded in good faith by the copyright owner, which is not always going to be the case.

zerohp · on June 23, 2022

Many repositories on Github were put there by people that do not own the copyright and never agreed to GitHub's Terms of Service.

Linux, for example, does not require copyright assignment. The original contributor of a change owns the copyright for that code and may have never used Github.

concordDance · on June 23, 2022

There's also one more question:

5. Even if it is illegal, is it actually bad? No one can possibly sell code snippets, the transaction costs are many orders of magnitude greater than any reasonable price. In my opinion, at least in this case the benefits massively outweigh the costs and the law should not apply here.

xtracto · on June 23, 2022

I really, REALLY like the idea of Copilot. I think it is a glimpse of what the future of AI can bring to improve programming. I understand where all the litigation and "uneasiness" is coming from, both from commercial and open-source projects.

I've not installed or used it for the same reason (don't want to use AGPL or GPLd code by accident, and don't want my closed source code to be used accidentally as well), but the thought of Copilot being "killed" due to litigation/copyright/licensing issues is sad.

For me, it's kind of like when MP3 first appeared: Sharing music in Napster or downloading MP3s from Geocities was just amazing. The idea of having such things at your fingertips. Even though I understood the issue the authors had with the unpaid distribution of their music... still, the idea of "what could be..." made it amazing.

I guess Microsoft could be a bit forward thinking, and implement the "Spotify" model in code: Pay OpenSource developers (whoever owns the repo, or whoever made a commit?) a small amount whenever their code gets used through Copilot.

I'm super excited by what "Copilot" related services will look like in 10 years. And I really really hope that the technology/idea doesn't get killed by litigation.

PaulKeeble · on June 23, 2022

Microsoft could have trained this on their own code and there would be no issue. The problem is instead of doing that they knew full well the approach would reproduce the code and they decided they would rather breach GPL than expose their own code. But I bet Microsoft has more than enough lines to train an AI, there was a clear choice to breach other people's licenses in preference.

frazbin · on June 23, 2022

Huh... These comments have given me an idea: MS needs to be forced to train a model to compensate (pay) code authors and codebases based on snippet suggestions given by their tool: the Spotify model replacing Napster!