So this makes it official... this post[0] and the comments on the announcement[1] concerned about licensing issues were absolutely correct... and this product has the possibility of getting you sued if you use it.
Unfortunately for GitHub, there's no turning back the clocks. Even if they fix this, everyone that uses it has been put on notice that it copies code verbatim and enables copyright infringement.
Worse, there's no way to know if the segment it's writing for you is copyrighted... and no way for you to comply with license requirements.
Nice proof of concept... but who's going to touch this product now? It's a legal ticking time bomb.
I run product security for a large enterprise, and I've already gotten the ball rolling on prohibiting copilot for all the reasons above.
It's too big a risk. I'd be shocked if GitHub could remedy the negative impressions minted in the last day or so. Even with other compensating controls around open source management, this flies right under the radar with a c130's worth of adverse consequences.
Do you also block stack overflow and give guidance to never copy code from that website or elsewhere on the Internet? I'm legitimately curious - my org internally officially denounces the copying of stack overflow snippets. Thankfully for my role it's moot as I mostly work with an internal non-public language, for better or worse, and I have no idea how well that's followed elsewhere in the wider company.
It appears that the code that copilot is using is created under a huge variety of licenses, making it risky.
On the other hand, a small snippet in a function that is derived from many existing pieces of other code may fall under fair use, even if it is not under an open source license of some sort.
Stack Overflow and Copilot are similar. Usage of both routinely violates licenses. Stack Overflow content is licensed under CC-BY-SA. Terms [1]:
* Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
* ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
In over a decade of software engineering, I've seen many reuses of Stack Overflow content, occasionally with links to underlying answers. All Stack Overflow content use I've seen would clearly fail the legal terms set out by the license.
I suspect Copilot usage will similarly fail a stringent interpretation of underlying licenses, and will similarly face essentially no enforcement.
The difference here is that it's hard to sue a company for sporadic, difficult to track down usages of SO content written by their own engineers.
One can now trivially coerce copilot to regurgitate copyrighted content without attribution. Copilot's basic premise violates the CC-BY-SA terms, and this will continue until no party can demonstrate a viable method of extracting copyrighted code.
There is now a single party backed by a company with a 2 Trillion dollar market cap that can be sued for flagrant copyright violations.
I would think it's more complicated when the tool is the thing spitting out the verbatim copies of code. Both the tool and the developer are independently distributing copyrighted code that neither of them has the rights to distribute.
Why? One could easily claim that if the tool is reproducing the contents of copyrighted works, its makers are a "distributor", which would subject the makers of the tool to much higher copyright infringement claims.
Let's differentiate legal risk by the party it affects:
* Companies with engineers using Copilot. Risk here is negligible, like that of copying Stack Overflow answers, or any code that isn't under a truly permissive license like CC0 [1]. Prohibiting use of Copilot in a company based on this risk has no merit.
* GitHub and Microsoft. Risk for them is higher yet worthwhile. Copilot is more like Stack Overflow than Napster. Affected copyright holders added their works to GitHub and agreed to their terms, so GitHub has a legal basis to show that content in Copilot. In terms of facilitating copyright infringement, far more violations occur by engineers manually searching and copying code on GitHub; lawsuits against GitHub due to that would be dismissed. Determining provenance is slightly harder in Copilot than in search, but GitHub could minimize risk to itself by noting in Copilot terms that users must review Copilot's suggestions for underlying license concerns. Engineers rarely will -- they routinely violate licenses of Stack Overflow and code copied from elsewhere -- but that shifts responsibility from GitHub, and legal risk to companies using Copilot remains negligible.
In addition to other licensing gotchas, a ton of SO snippets are copied wholesale from elsewhere—docs or blog posts. So it's pretty likely that the poster can't license them in the first place because they never checked the source's license requirements.
Except that CC-BY-SA is not a permissive license; the SA part is a form of copyleft. It's just that nobody enforces it. From the text [1]:
- "[I]f You Share Adapted Material You produce [..] The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License."
- "Adapted Material means material [..] that is derived from or based upon the Licensed Material" (emphasis added)
- "Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License."
- "You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply."
A program that includes a code snippet is unquestionably a derived work in most cases. That means that if you include a Stack Overflow code snippet in your program, and fair use does not apply, then you have to license the entire program under the CC-BY-SA. Alternately, you can license it under the GPLv3, because the license has a specific exemption allowing you to relicense under the GPLv3.
For open source software under permissive licenses, it may actually be okay to consider the entire program as licensed under the CC-BY-SA, since permissive licenses are typically interpreted as allowing derived works to be licensed under different licenses; that's how GPL compatibility works. But you'd have to be careful you don't distribute the software in a way that applies any Effective Technological Measures, aka DRM. Such as via app stores, which often include DRM with no way for the app author to turn it off. (It may actually be better to relicense to the GPL, which 'only' prohibits adding additional terms and conditions, not the mere use of DRM. But people have claimed that the GPL also forbids app store distribution because the app store's terms and conditions count as additional restrictions.)
For proprietary software where you do typically want to impose "different terms or conditions", this is a dead end.
Note that copying extremely short snippets, or snippets which are essentially the only way to accomplish a task, may be considered fair use. But be careful; in Oracle v. Google, Google's accidental copying of 9 lines of utterly trivial code [2] was found to be neither fair use nor "de minimis", and thus infringing.
Going back to Stack Overflow, these kinds of surprising results are why Creative Commons itself does not recommend using its licenses for code. But Stack Overflow does so anyway. Good thing nobody ever enforces the license!
Yes. In a past life, after researching the situation, we had to find and remove all the code copied from Stack Overflow into our codebase. I can’t fathom why SO won’t fix the license.
What makes it even worse is if you try to do the right thing by crediting SO (the BY part) you’re putting a red flag in the code that you should have known you have to share your code (the SA part).
Who really copies stack overflow snippets verbatim? It's usually just easier to refer to it for help figuring out the right structure and then adapt it for your own needs. Usually it needs customization for your own application anyway (variables, class instances, etc).
I don't think I've ever copied code directly from any of the Stack* sites. I generally read all the answers (and comments) and then use what I learn to write my own (hopefully better) code specific to my needs.
Ha! Well, I think a lot of people copy code from StackOverflow verbatim once at least - including me.
Of course it turned out the code I'd blindly inserted into my project contained a number of bugs. In one or two cases, quite serious ones. This, even though it was the accepted answer.
It was probably more effort to fix up the code I'd copy pasta'd than write it from scratch. Since then I've never copied and pasted from StackOverflow verbatim.
I've copied plenty of Microsoft sample code verbatim, because the Win32 API sucks and their samples usually get the error handling right.
But, I can't think of a single scenario where I've copied something from Stack Overflow. I'm searching for the idea of how to solve a problem, and typically the relevant code given is either too short to bother copying, or it's long and absolutely not consistent with how I want to write it.
"Too short to bother copying"? I copy single words of text to avoid typing and typos. I would never type out even a single line of code when I could paste and edit.
> "Too short to bother copying"? I copy single words of text to avoid typing and typos. I would never type out even a single line of code when I could paste and edit.
Very honest suggestion: learn how to touch type. You can still copy if needed, but your typed input will be much faster.
I'm somewhere between 45-75 wpm. But Ctrl+C Ctrl+V can type 300wpm!
Typing when you could paste is like having that Github Copilot put the right sentence right in front of you and you decide to type over it instead. Not only does it feel like wasted and robotic effort, typing everything leads to RSI.
I'm not sure why people disagree. Another symptom is that I insist on aliases for everything while others type out all the commands every time. Maybe I get distracted by the words when I type and lose my train of thought?
You need to highlight the correct text first and move the cursor to the correct location to paste text. I bet you can type 123123 several times faster than you can highlight that text in this comment and paste it into a reply.
Same here. I copy boilerplate code for new projects etc. regularly. But I don't remember copying anything verbatim from SO. Function, argument and variable names rarely fit the scheme used in the particular project I'm working on at that moment and usually I do a better job at adapting the code thinking what I'm doing rather than just copy and paste and then wonder what went wrong.
I think I did a few times, usually for languages that I wasn't going to spend too much time with (so no benefit in figuring out how to do it from the answers) and for specific tasks.
Anything posted to Stack Overflow has a specific (Creative Commons IIRC) license associated with it. The same is not true of GitHub Copilot, and in fact their FAQ doesn’t specify a license at all, probably because they are technically unable to since it is trained on a wide variety of code from differing licenses (and code not written by a human is currently a grey area for copyright). The FAQ simply says to use it at your own risk.
I tried Googling this and couldn't find it. I also don't want to believe it because it seems like the world suddenly turned into an apocalyptic hellscape with no place for developers like me. Do you have a source?
First, I work at Google and its onboard training explicitly mentions Stack Overflow as a forbidden example due to CC-BY-SA license (SA is the problematic part). The following link is the official reference.
I don't have a source to link, but I've also been told this by someone who works at Google. Is copy-pasting stuff verbatim from SO really that much of a thing? I use SO plenty, but have never considered taking anything verbatim.
That's actually an attack vector: mirror SO using their open-sourced DB and inject malware into the suggestions, or change the text before it enters the clipboard. People blindly copy/pasting aren't going to notice.
Same here. I’ve directed our teams and infra managers that we must be able to block the use of copilot for our firm’s code.
I'd be very surprised if the other large enterprises that I have worked at aren't doing exactly the same thing. Too much legal risk, for practically no benefit.
No-one cares about this. People have no clue about licenses and just copy-paste whatever. If someone gets access to their code and sees all the violations they're screwed anyway.
Obviously they aren't, but just as obviously, "the legal department didn't review this, therefore it's safe to assume it's legal" would not pass muster with said legal department. :) Kiro's comment ("if someone gets access to their code and sees all the violations they're screwed anyway") is probably technically accurate, even if in practice you're unlikely to get caught. As other people have noted elsewhere in the comments here, the Google v. Oracle case over Java definitely suggests that verbatim copying of just a few lines, even for trivial functions, is enough to get you in trouble if those lines aren't licensed in a way that lets you do that.
This is absolutely not true. While some individuals might not care and might not always conform to their companies' policies, most companies have policies, and most employees are aware of and mindful of these policies.
It's absolutely the case that before using certain libraries, most engineers in large corporations will make sure they are allowed to use that library. And if they don't, they are doing their job very badly IMO.
This kind of sucks honestly, copying and pasting without understanding has led to all sorts of issues in IT. Not to mention legal issues as mentioned by another reply.
Not only this, but a huge amount of publicly available code is truly terrible and should never really be used other than as a point of reference or guidance.
I think that proper coding assistant should help with not writing code (and I stress that it is "not writing code") - how to rearrange your code base for new requirements, for example.
Code not written does not have defects, does not need support and, as you point out, is not a liability.
The practical utility will outweigh the legal concerns. Engineers using this are going to be more productive and this is a competitive advantage that companies won't eschew.
If the legal concerns are well-known, then what you are describing might be viewed as criminal negligence (at worst) or insufficient duty of care (at best). Such engineers should be held fully responsible and accountable for their actions.
That's optimistic. The people who would rely heavily on this sort of thing are going to be the worst at detecting what a "bad autocomplete result" would look like. But even if you are capable of judging that you've got a good one, it still doesn't inform you of the obvious potential licensing issues with any bit of code.
Surely somebody working on this project foresaw this problem…
If they get rid of licensed stuff it should be ok no? I really want to use this and seems inevitable that we'll need it just as google translate needs all of the books + sites + comments it can get a hold of.
There aren't many licenses that let you reuse code without including the same headers / licensing blurb. You're in public domain, non-copyleft territory. WTFPL etc.
There is no such thing as properly licensed code because it is a function of what is legally acceptable for your company and what it intends to do with the work.
Unlicensed code just means “all rights reserved.” You’d need to limit it to permissively licensed code and make sure you comply with their requirements.
Which licenses would it be ok that the training material is licensed under, though? If it produces verbatim enough copies of eg. MIT licensed material, then attribution is required. Similar with many other open source-friendly licenses.
On the other hand, if only permissive licenses that also don't require attribution are used, well, then for a start, the available corpus is much smaller.
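To make the "much smaller corpus" point concrete, here is a rough sketch of what such a filter could look like. The repo names and the tiny corpus are made up for illustration; the SPDX identifiers are real, but the attribution flags are only my reading of each license, and nothing here reflects how Copilot's training data was actually selected.

```python
# SPDX identifiers are real; the attribution flags reflect my reading of each license.
REQUIRES_ATTRIBUTION = {
    "MIT": True,
    "BSD-3-Clause": True,
    "Apache-2.0": True,
    "CC0-1.0": False,
    "Unlicense": False,
    "0BSD": False,
}

# Hypothetical corpus: (repo name, declared SPDX license, or None if no license file).
CORPUS = [
    ("example/http-client", "MIT"),
    ("example/json-parser", "Apache-2.0"),
    ("example/leftpad", "CC0-1.0"),
    ("example/tiny-utils", "0BSD"),
    ("example/game-engine", "GPL-2.0-only"),
    ("example/no-license", None),
]


def attribution_free(corpus):
    """Keep only repos whose declared license is known not to require attribution."""
    kept = []
    for name, spdx in corpus:
        # Unknown or missing licenses are excluded rather than assumed safe.
        if spdx is not None and REQUIRES_ATTRIBUTION.get(spdx) is False:
            kept.append(name)
    return kept


usable = attribution_free(CORPUS)
print(f"{len(usable)} of {len(CORPUS)} repos usable without attribution: {usable}")
```

Note that the filter can only act on the license a repo declares, so it is exactly as trustworthy as that declaration.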
Can you even trust that the License in a random repo is accurate and expresses the actual copyright of all the contained code?
I guess my point is, you can't be positive that even if you're following the license in a repo you forked that the repo owner hasn't already violated someone else's license, and now transitively, so have you.
> Can you even trust that the License in a random repo is accurate and expresses the actual copyright of all the contained code?
In fact, that seems to be exactly the problem shown in the tweet - someone copy-pasted the quake source and slapped a different license on it, and copilot blindly trusted the new license.
Yes: not all code on GitHub is licensed in a way that lets you use it at all. People focus on GPL as if that were the tough case; but, in addition to code (like mine) under AGPL (which you need to not use in a product that exposes similar functionality to end users) there is code that is merely published under "shared source" licenses (so you can look, but not touch) and even literally code that is stolen and leaked from the internals of companies--including Microsoft!... this code often gets taken down later, but it isn't always noticed and either way: it is now part of Copilot :/--that, if you use this mechanism, could end up in your codebase.
If you publish the code anywhere, potentially. You could be (unknowingly) violating the original license if the code was copied verbatim from another source.
How much of a concern this is depends heavily on what the original source was.
And the problem with copilot is that you have no way of knowing. If it changes even a little bit of the code, it's basically ungoogleable but still potentially in violation.
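A small sketch of why "changed a little bit" defeats exact search but not a similarity check. This is purely my own illustration with hypothetical helper names; it is not a real provenance tool and not something GitHub is known to ship. Identifiers are replaced with a placeholder, then overlapping token windows are compared with Jaccard similarity.

```python
import re

# Small, deliberately incomplete keyword set; enough for the toy snippets below.
KEYWORDS = {"def", "return", "if", "else", "for", "while", "in"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def normalize(code):
    """Tokenize and replace every non-keyword identifier with the placeholder 'ID'."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(tok)
    return tokens


def shingles(tokens, k=4):
    """Set of all overlapping windows of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of the two snippets' normalized token shingles."""
    sa, sb = shingles(normalize(a), k), shingles(normalize(b), k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "def scale(values, factor):\n    return [v * factor for v in values]\n"
renamed = "def resize(items, ratio):\n    return [x * ratio for x in items]\n"
unrelated = "def greet(name):\n    print('hello ' + name)\n"

print(original == renamed)             # exact match fails after renaming
print(similarity(original, renamed))   # structure is still identical
print(similarity(original, unrelated))
```

The catch is that a check like this needs the original source to compare against, which is precisely what a Copilot user does not have.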
Distributing binaries to third parties is enough to trigger a license violation. For internal corporate tools, it would be less of an issue as "distribution" hasn't happened.
> The technical preview includes filters to block offensive words
And somehow their filters missed f*k? That doesn’t give a lot of confidence in their ability to filter more nuanced text. Or maybe it only filters truly terrible offensive words like “master”.
In my testing of Copilot, the content filters only work on input, not output.
Attempting to generate text from code containing "genocide" just has Copilot refuse to run. But you can still coerce Copilot to return offensive output given certain innocuous prompts.
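A toy sketch of the difference between filtering only the prompt and filtering the completion as well. The blocklist term and `fake_model` are placeholders I made up; this is an assumption about the general shape of such a filter, not GitHub's actual implementation.

```python
import re

BLOCKLIST = {"badword"}  # placeholder term, purely illustrative


def contains_blocked(text):
    """True if any lowercase word in the text is on the blocklist."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(w in BLOCKLIST for w in words)


def fake_model(prompt):
    # Stand-in for a code model: always returns a completion containing a blocked word.
    return "# TODO: remove badword before release"


def complete_input_filtered(prompt):
    """Refuse blocked prompts, but pass the model's output through unchecked."""
    if contains_blocked(prompt):
        return None
    return fake_model(prompt)


def complete_both_filtered(prompt):
    """Refuse blocked prompts and also suppress blocked completions."""
    if contains_blocked(prompt):
        return None
    output = fake_model(prompt)
    return None if contains_blocked(output) else output


print(complete_input_filtered("def add(a, b):"))  # blocked word leaks through
print(complete_both_filtered("def add(a, b):"))   # suppressed
```

With input-only filtering, an innocuous prompt still gets the offending completion back, which matches the behaviour described above.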
Ahh, so it's the most pointless interpretation of the phrase "filters to block offensive words", where it is stopping the user from causing offense to the AI rather than the other way around.
I believe the concept is to stop users from prompting the AI to generate offensive stuff specifically, and then publishing the so-generated stream of offensive stuff as negative PR for GitHub, in the same way the generated stream of offensive stuff coming from Microsoft’s AI was a big PR disaster.
I suppose you’re referring to the AI Twitter bot that initially was very lovely and that 4chan had turned into a Nazi within a day. That was both very naive and hilarious.
The big difference in this case, however, is that this AI was constantly learning based on user input, which I do not think is the case for Copilot.
Copilot is indeed constantly learning based on user input (as detailed here [1]) but it seems to be more high-level ("did the user accept or deny suggestion XYZ" and potentially what changes you make to suggestions after accepting) versus just dumping everything directly back into the model a la Tay.
They probably don't want to repeat Microsoft's incident with Tay, though they seem to have created their own incident, which dooms the product if it wasn't doomed already.
Interesting how this continues to be an issue for GPT3 based projects.
A similar thing is happening in AI Dungeon, where certain words and phrases are banned to the point of suspending a user's account if used a certain number of times, yet they will happily output them when the text is generated by GPT3 itself, and then punish the user if they fail to remove the offending pieces of text before continuing.
Lol, how does that make any sense? I mean, all these word blacklists are always pretty stupid, but at least you can usually see the motivation behind them. But in this case I'm not even sure what they tried to achieve, this is absolutely pointless.
Changing master to main was something Github did when they were taking heat for their contract with ICE. It was a nice bit of misdirection that cost them nothing, achieved nothing and garnered praise in some quarters.
ICE, of course, runs an actual concentration camp which has a slightly more troublesome history than the word master.
Language policing is to racism what recycling is to global warming - an attempt to shift the focus away from elite responsibility for systemic issues to "personal responsibility" and forestall meaningful reform by placing emphasis on largely non-threatening symbolic gestures.
I get what you mean, but in a discussion about semantics it might be unhelpful to dilute the term "concentration camp", especially if prefixed with "actual" in italics. That is unless you actually mean that ICE camps serve the same purpose and are equivalent to nazi concentration camps.
Nazi “concentration camps” were not actual concentration camps (a thing which long predates the Nazi camps), they were extermination camps for which “concentration camp” was a minimizing euphemism.
US WWII “internment” and “relocation” centers were actual concentration camps (“relocation center” was itself a euphemism, but “internment” referred to a formal legal distinction impacting treaty obligations.)
Sure, but I don't know if I've ever heard anyone use the term "concentration camp" without qualifiers to refer to anything else than the nazi concentration camps (or something equivalent).
If someone says that something is "_literally_ a concentration camp" I think that most people will think of ovens and genocide.
Perhaps it's a regional thing, but that is how I interpreted it.
It's not so much a regional as a political thing. Want it to sound worse? Use concentration camp. Want it to sound better? Use internment camp (or in some cases, re-education facility).
The Nazis ran what would more accurately be termed extermination camps.
Though what they did certainly bore a strong resemblance to the Boer war concentration camps/manzanar,etc. whose purpose was to "concentrate" people into one place rather than industrially slaughter them.
I don't know if I've ever heard anyone use the term "concentration camp" without qualifiers to refer to anything else than the nazi concentration camps (or something equivalent).
Maybe it's just me, but I think it would have been more clear if you said internment camp if your intent was to refer to the broader context and not invoke a comparison to nazis.
Where it also makes the point that the nazi camps were primarily extermination camps.
Maybe take it up with them and get back to me if you feel truly passionate about this issue.
>Maybe it's just me, but I think it would have been more clear
Gosh, it's awfully ironic that this sentence would happen in a thread about how language policing is used as a distraction from important issues.
Is it more important to you how people use the term concentration camp or the fact that ICE lock up children in internment/concentration/[ insert favorite word here ] camps?
> So, is it more important to you how people use the term concentration camp or the fact that ICE lock up children in internment/concentration/[ insert favorite word here ] camps?
Well, that escalated quickly.
I don't think I ever said anything for or against what ICE is doing, in fact I tried not to because the only thing I wanted to say was that when using the words "literally concentration camps" people might read that as "camps designed to kill people" since that is the way I've been taught it (in history classes) and heard it (in general use).
I don't even live in the US so I have no say in this in a democratic sense. If I did I'd be against the way migrants are treated and want more humane treatment, but I don't think that should be relevant to what I said.
You seem to think I have some political motive, I don't. I just saw a comment that from my perspective and historical education seemed to equate two things that I regard as different and said that it might be helpful to not conflate those. It seems like you did not intend to conflate them and it is a difference in what you and I read into the term "actual concentration camp".
From my perspective this conversation is as if someone said "working for XCompany is actual slavery" and I said "Perhaps don't use 'actual slavery' as a term for something that isn't that?"
Can’t speak for parent, but when I describe the immigration jails as “concentration camps” I do have a political motive. There was a political motive in calling them “immigration facilities” in the first place. I simply want to call them what I believe they are. When describing the nazi camps I say “death camps”, so as not to conflate the infinitely worse horror of the nazi camps.
There is usually a political motive behind what controversial things are called. There was an active push from the oil lobby to swap out the term for the climate disaster from “global warming” to the more innocent sounding “climate change”. Then recently some media companies made the political decision to start using the term “climate disaster” or “climate crisis”.
> working for XCompany is actual slavery
I don’t think this is equivalent (even though it sounds like it to your ears). The ICE facilities can accurately be described as concentration camps. The victims are kept against their will (i.e. imprisoned) in dedicated camps. This is an accurate term. Slavery only applies when you are forced to work for little or no salary. E.g. I often use the term actual slavery when referring to prison labor. This is a political decision on my part. And you are free to criticize this choice of word. And you would be right to say it diminishes the term when compared to the horrible chattel slavery in the Americas until the 19th century. But it is still an accurate term.
y'know it really seems like both purpose and outcome need to be closely examined here, if we're going to be emphasizing actual next to concentration camps.
what's the paradigm of a concentration camp? if we go straight for Auschwitz we'll get nowhere, how about the Boer concentration camps? Origin of the term after all.
What was the purpose? To concentrate the Boer population during a total war against them, so they couldn't supply and hide the belligerents.
What was the outcome? Tens of thousands of preventable deaths, mostly from disease. Success in the war, from the British perspective.
So, let me turn my spectacles to your example of, may I quote?
> an actual concentration camp
Which appears to be a migrant detention center. To put it succinctly, migrants who enter the country without filling out paperwork, and get caught, end up in one of these places for months-to-years while USG figures out what to do with them.
So a Boer concentration camp is filled by the British riding into a farmstead or town, kidnapping the women and children, and driving them out to a field and sticking them in a tent. A migrant detention center is filled when someone enters the United States without following the rules which govern that sort of behavior, and then gets caught.
Where is the war?
Where is the excess death?
Ah well. I'm out of time and patience to express my contempt for your abuse of language and disrespect for the real horrors which you cheapen with this kind of facile speech.
Your vacuous argument about what is an _actual concentration camp_ is out of place. This wasn't a discussion about concentration camps, it was about github's attempted misdirection, and their facetious show of supporting inclusion, by eliminating the term "master".
Central America, where the US has a history of funding politically aligned factions in conflict, contributing to today's instabilities which drive people to our border.
> Where is the excess death?
Mostly Central America, but keeping people in crowded stressful conditions during a pandemic can probably account for a few more.
> To put it succinctly, migrants who enter the country without filling out paperwork, and get caught, end up in one of these places for months-to-years while USG figures out what to do with them.
The US first funds wars in these people's home countries, then refuses to let them in when they want to live in a safer country, then rounds them up and puts them in camps because they came in anyway. That sounds like a concentration camp to me, even if the death rate is lower than other instances of the same thing.
Pedantic attacks on word usage are fun, though. Let me try it:
> ICE, of course, runs an actual concentration camp which has a slightly more troublesome history than the word master.
The history of the word "master" includes the trans-Atlantic slave trade and the institution of slavery in the US, a well known and extremely long running atrocity. That doesn't seem less troublesome than current ICE activity.
But if you consider what is being said, instead of seizing on exact wording, you can see the points that the action of running these camps is more important than the choice of word used to describe a code repo, and that as a form of concentration camp, the camps are bad - not that slavery wasn't a big deal (my facetious straw man), and not that the US are the British and the detainees are the Boers.
A few responses seem to focus on two of the words chosen in that post, and ignore this point about misplaced focus on choice of words.
Pretty sure they were being sarcastic. I also don't find your arguments persuasive in the slightest, and I find myself being skeptical of these recent moral outcries. I'm skeptical of its sincerity, and I don't buy it. "Master" has an etymological background far more diverse than the dichotomy to "slave". I can wholeheartedly say that I've not once thought to make that association. It's been a title for centuries. Master blacksmith, etc. (See https://en.wikipedia.org/wiki/Master for a list)
Another example of what seems like a fake moral outcry is "blackface". And, I mean what it is being referred to now, and not the actual meaning. The racist ridicule by stereotyping ethnicity. That was "Blackface". Yet, for some reason, context doesn't matter anymore, and we end up with removing episodes of Community because someone painted their face in a cosplay of a dark elf, in exact commentary of this.
There is significant systemic racism in the US that affects almost everything. In order to deal with those things, the very first thing would be to properly be able to identify racism. Context matters. Renaming "Master" branches is not progress. Ostracising a kid for dressing up as Michael Jackson isn't it.
Whenever I see outrage over such things I cynically think that the person is probably white, and probably doing it for attention. One thing is for sure, it only serves to detract from the real issues.
Check out the recent Marc Rebillet stream with Flying Lotus and Reggie Watts. They absolutely destroy the bs around the use of the word master. I think both FL and RW will be quite representative of how African Americans (and the rest of the world) feel about this.
> For co-workers not familiar with the history of slavery in the United States, there is always a pause, and then some confusion about the changes. After explaining the historical context, 99% of people reply: "Oh, I understand. Thank you to explain."
Most people answer like this when they realize you are an unreasonable person who refuses to listen. Happens all the time, like "Oh, I understand (you are one of those). Thank you for explaining!", and they remember that they need to stop using this word when working with you.
The word master has many usages. One specific context (master/slave) is inappropriate, but that doesn't mean every other context is unusable now.
Github changing master->main was the epitome of virtue signaling. This literally does not affect black people at all, nor does it do -anything- to help with racial inequality in the US. It's actually quite patronizing and tone-deaf to think that instead of all the things -Microsoft- could be doing to help racial inequality, they're putting in as little effort as possible.
Congrats on granting power over words to unreasonable people who ignore things like context in language and common sense.
while the word 'master' can indeed be used in the sense of "master and slave", its use in git is more akin to the use of 'master' in "master record", and doesn't refer to 'ownership' in any way
Everyone has a line of how much they are willing to change their language, though. There will always come a point where someone will think some change is "silly", even though the old term may have upset some people. And almost every term has some sort of baggage associated with it.
There was a post going around somewhere of a college's earnest attempt to change some language (like avoiding "give it a shot" because of the association of "shot" with guns). Would renaming all the various things we call "triggers" be ok, so we don't upset victims of gun violence?
So the master->main change was the line for some people, not others.
As a matter of principle I don't think we should be moving towards ignoring any and all contexts of words. Granting this power of word banning to random arbiters is quite crazy. In this case, master was moreso changed because it -could- be deemed offensive, not that it -actually- is offensive by itself. Not one person that I've spoken to about it has actually cared.
Words having multiple usages is not really a novel concept. If we ban words based on them potentially being offensive, we'll end up with no words at all as people move onto using different words, and so forth.
It is not silly to have pushback when someone wants to grant themselves power over language usage. Dropping usage of a word should have a strong, tenable argument and larger community support than 0.00000001% of people caring.
> Everyone has a line of how much they are willing to change their language, though.
But that line is constantly moving though. People are forced to adapt, or they are ostracised socially and economically.
If prestigious organisations, people and institutions decide "master/slave" is an immoral thing to say, I have no choice. Eventually I'll need to fall in line or my livelihood will be at risk.
I don't work in USA and I don't intend to. Your history of slavery is none of my concern, especially when I'm just trying to do my work.
The word 'master' is useful for me, and I don't believe for a nanosecond that anyone, American or not, is ACTUALLY offended by it. I believe that some people (mostly affluent white Americans) are searching for things that they think they SHOULD be offended by.
Actually the indentation of the first comment and the lack of preprocessor show it's not copied from this code directly but from Wikipedia (https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...)
So it could be that the Quake source code is not part of the training set but the Wikipedia version is.
While I strongly doubt they would use Wikipedia as a training set, has anyone done a search of GitHub code to see if other projects have copied-and-pasted that function from Wikipedia into their more-permissive codebases?
Almost 2000 results for one of the comment lines. I'm not going to read through those or check the licenses, but I think it's safe to say that block of code exists in many GitHub code bases, and it's likely many of those have permissive licenses. Given how famous it is (for a block of code) it's not unexpected.
A question that popped into my head is: if the machine sees the same exact block of code hundreds of times, does that suggest to it that it's more acceptable to regurgitate the entire thing verbatim? Not that this incident is totally 100% ok, but if it was doing this with code that existed in only a single repo that would be much more concerning.
> if the machine sees the same exact block of code hundreds of times, does that suggest to it that it's more acceptable to regurgitate the entire thing verbatim?
From a copyright standpoint, quite possibly. This is called the "Scènes à faire" doctrine. If there are some things that have to be there in a roughly standard form to do a standard job, that applies.
This would need to first be tested in court; apparently Microsoft is happy in generating thousands (or millions) of violations, knowing most programmers don't enforce their copyright.
I don't get it, that seems like standard fare for an R-rated movie? And then it seems like some complained because they decided to start editing it down to a PG-13 movie?
Essentially, from my understanding: there was a data leak they never commented on, and they instituted a poorly made content filter without saying anything. The filter frequently has false positives and negatives. Someone discovered they trained the game using content the filter was designed to block, meaning the AI itself would frequently output filter-triggering stuff. More people found out their private unpublished stories were being read by third parties after a job ad and the stories were posted on 4chan, and people recognized stories they wrote that had triggered the filter among those posted. Then they started instituting no-warning bans.
I might have missed something, but that's the gist of it.
Also, before and while all this was going on, the quality of the AI's output has been steadily dropping to the point where NovelAI.net now generates what's in many ways better writing.
That's GPT-J-6B, to be clear. A 6-billion-parameter model is producing better output than a 300 billion parameter model, because of what I can only assume to be sheer incompetence on AI Dungeon's part. I've also used the raw GPT-3 API, and it does better at writing than either. In other words: Doing nothing would have been better than whatever they've been doing.
It’s pre-trained, partially, on Wikipedia. GPT-2 did this sort of thing all the time: native to the architecture to surface examples from the fine-tuning training set by default.
that will make a great defense at a copyright court.
"your honor, i would like to plead not guilty, on the basis that i just robbed that bank because i saw that everyone was robbing banks on the next city"
...on the other hand, that was the exact defense tried for the capitol rioters. So i don't know anything anymore.
It's impossible to automate checking for code license violations.
If you and I write the exact same 10 lines of code, we both have independent and valid copyrights to it. Unlike patents, independent derivation of the same code _is_ a defense for copyright.
If I write 10 lines of code, publish it as GPL (but don't sign a CLA / am not assigning it to an employer), and then re-use it in an MIT codebase, I can do that because I retained copyright, and as the copyright holder I can offer the code under multiple incompatible licenses.
There's no way for a machine to detect independent derivation vs copying, no way for the machine to know who the original copyright holder was in all cases, and whether I have permission from them to use it under another license (i.e. if I email the copyright holder and they say 'yeah, sure, use it under non-gpl', it suddenly becomes legal again)...
It's not a problem computers can solve 100% correctly.
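To make the limitation concrete, here is a rough sketch of what a machine can check: whether a snippet is textually identical (after crude normalization) to something already indexed. The index, the license string and the function names are all made up for illustration. Note what a hit does not tell you: whether the text was copied or independently written, who holds the copyright, or whether permission was granted out of band.

```python
import hashlib
import re
from typing import Optional


def normalize(code: str) -> str:
    """Drop '#' comments and collapse whitespace (crude: also mangles '#' inside strings)."""
    no_comments = re.sub(r"#.*", "", code)
    return " ".join(no_comments.split())


def fingerprint(code: str) -> str:
    """Hash of the normalized text; equal fingerprints mean equal normalized text."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


# Hypothetical index: fingerprint -> license the snippet was seen under.
KNOWN = {
    fingerprint("def add(a, b):\n    return a + b  # sum\n"): "GPL-3.0 (seen in repo X)",
}


def lookup(snippet: str) -> Optional[str]:
    """Return the recorded license for a textual match, or None if nothing matches."""
    return KNOWN.get(fingerprint(snippet))
```

A renamed variable is enough to defeat this particular check, and a match still says nothing about provenance, which is the part the law cares about.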
It's the same problem as with self-driving cars: who gets sued? The company that provides the service/car, or the programmer/driver?
I think the latter.
This is true but doesn't change the problem that copilot itself is potentially distributing unlicensed copyrighted material. This isn't necessarily a problem for you as a developer though.
As someone who gets paid to write code (nominally) and has also written a few novels, I don't agree with this characterization. From what I've seen of Copilot, it's more like having a text editor generate your next sentence or paragraph^[1]. The idea (as I see it) is that you might use it to generate some prose "boilerplate", e.g. environmental descriptions, and hack up the results until you're satisfied.
It's content generation at a fragmentary level where each "copied" chunk does not form a substantive whole in the greater body of the new work. Even if you were training it on other authors' works rather than just your own, as long as it wasn't copying distinctive sentences wholesale, I think there's a strong argument for it falling under fair use--if it's even detectable.
On the other hand, if it regurgitated somebody else's paragraph wholesale, I don't think that would be fair use. Somewhere in-between is where it gets fuzzy, and really interesting; it's also where internet commenters seem to prefer flipping over the board and storming out convinced they're right to exploring the issues with a curious and impartial mind. I see way too much unreasoned outrage and hyperbolic misrepresentation of the Copilot tool in these threads, and it's honestly kind of embarrassing.
As far as this analogy goes, it's worth noting that the structure of a computer program doesn't map onto the structure of a piece of fiction (or any work of prose) in a straightforward way. Since so much of code is boilerplate, I would (speculatively, in the copyright law sense) actually give more leeway to Copilot in terms of absolute length of copied chunks than I would for a prose autocompleter. For instance, X program may be licensed under the GPL, but that doesn't mean X's copyright holder(s) can sue somebody else because their program happened to have an identical expression of some RPC boilerplate or whatever. It would be like me suing another author because their work included some of the same words that mine did.
^[1] At least one tool like this (using GPT-3) has been posted on HN. At this point in time I wouldn't use it, but I have to admit that it was sort of cool.
> ^[1] At least one tool like this (using GPT-3) has been posted on HN. At this point in time I wouldn't use it, but I have to admit that it was sort of cool.
Have a poke at novelai.net if you get a chance.
It's... not very smart. It's pretty decent at wordcrafting, though, and as an amateur writer I find it invaluable for busting writer's block. Probably if you spend all day writing fiction you'll find ways around that, but for me the solution has become "Ask the AI to try".
It'll either produce a reasonable continuation, or something I can look at and see why it's wrong. Either is better than a blank page.
That does not seem like a response to what I just said?
I said that it is impossible for the user to check that the code copilot gives is OK, license-wise, and therefore, they can not be sure that it is legally OK to include in any project.
That's bonkers. And the beauty of it is that now someone could realistically do a GDPR Erasure request on the Neural Net. I do hope that they're able to reverse data out.
Since the information is encoded in model weights, I doubt that erasure is even possible. Only post-retrieval filtering would be an option.
It only goes to show that intransparent black-box models have no place in the industry. The networks leak information left and right, because it's way too easy to just crawl the web and throw terabytes of unfiltered data at the training process.
If this system includes personal information that cannot be removed, corrected or controlled, it's probably a gross violation of all European and some American privacy laws.
Designing a system that you cannot control does not grant you legal immunity for whatever the system does. As GitHub operates inside the EU, personal information this system contains MUST be deletable, correctable and retrievable, or it's simply illegal.
The problem is that the information is in an opaque encoding that nobody can reverse engineer today. So it's impossible to prove that a certain subset of data has been removed from the model.
Say, you have a model that repeats certain PII when prompted in a way that I figure out. I show you the prompt, you retrain the model to give a different, non-offensive answer. But now I go and alter the prompt and the same PII reappears. What now?
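For what it's worth, the "post-retrieval filtering" option mentioned upthread would sit on the output side, so rewording the prompt does not bypass it. A minimal sketch, assuming the PII in question looks like an email address (the pattern and the `redact` name are illustrative, not anything GitHub is known to run):

```python
import re

# Deliberately simple pattern for email-like strings; real PII takes many more forms.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(model_output: str) -> str:
    """Replace anything that looks like an email address before the output is shown."""
    return EMAIL.sub("[redacted]", model_output)
```

This only hides what the pattern recognizes. The data is still encoded in the weights, so it does nothing for the "prove it has been erased" problem.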
Yes, but the compute costs required for training are probably in the range of hundreds of thousands of USD to potentially millions of USD. Not to mention potentially months of training time.
I think copilot is solving the wrong problem. A future of programming where we're higher up the abstraction tree is absolutely something I want to see. I am taking advantage of that right now -- I'm a decently good programmer, in the sense that I can write useful, robust, reliable software, but I'm pretty high up the stack, working in languages like Java or even higher up the stack that free me from worrying about the fine details of memory allocation or the particular architecture of the hardware my code is running on.
Copilot is NOT a shift up the abstraction tree. Over the last few years, though, I've realized that the concept of typing is. Typed programming is becoming more popular and prominent beyond just traditional "typed" languages -- see TypeScript in JS land, Sorbet in Ruby, type hinting in Python, etc. This is where I can see the future of programming being realized. An expressive type system lets you encode valid data and even valid logic so that the "building blocks" of your program are now bigger and more abstract and reliable. Declarative "parse don't validate"[1] is where we're eventually headed, IMO.
An AI that can help us to both _create_ new, useful types, and then help us _choose_ the best type, would be super helpful. I believe that's beyond the current abilities of AI, but can imagine that in the future. And that would be amazing, as it would then truly be moving us up the abstraction tree in the same way that, for instance, garbage collection has done.
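A small sketch of what "parse don't validate" can look like with Python type hints. The `Email` type and `parse_email` are invented for illustration and the check is intentionally naive. Python will not stop anyone from constructing `Email` directly, so the guarantee here is by convention plus a type checker, not enforced by the runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """By convention created only via parse_email, so holders can assume it was checked."""
    value: str


def parse_email(raw: str) -> Email:
    """Turn untrusted text into an Email, or raise ValueError."""
    text = raw.strip()
    local, sep, domain = text.partition("@")
    if not sep or not local or "." not in domain:
        raise ValueError(f"not an email address: {raw!r}")
    return Email(text)


def send_welcome(to: Email) -> str:
    # No re-checking here: the parameter type records that parsing already happened.
    return f"Welcome sent to {to.value}"
```

The point is that `send_welcome` takes an `Email`, not a `str`, so the check happens once at the boundary instead of being repeated (or forgotten) downstream.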
A taller abstraction tree makes tradeoffs of specialization: the deeper the abstractions, the more one has to understand when the abstractions break or when one chooses to use them in novel ways.
This is something I'm interested in regarding this approach... When it works as intended, it's basically shortening the loop in the dev's brain from idea to code-on-screen without adding an abstraction layer that someone has to understand in the future to interpret the code. The result is lower density, so it might take longer to read... Except what we know about linguistics suggests there's a balance between density and redundancy for interpreting information (i.e. the bottleneck may not be consuming characters, but fitting the consumed data into a usable mental model).
I think the jury's out on whether something like this or the approach of dozens of DSLs and problem-domain-shifting abstractions will ultimately result in either more robust or more quickly-written code.
But on the topic of types, I'm right there with you, and I think a copilot for a dense type forest (i.e. something that sees you writing a {name: string; address: string} struct and says "Do you want to use MailerInfo here?") would be pretty snazzy.
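A toy version of that "Do you want to use MailerInfo here?" suggestion, assuming the tool has a list of known record types and only compares field names and annotated types. `KNOWN_TYPES`, `shape` and `suggest` are hypothetical names; a real tool would need fuzzier matching than exact equality.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MailerInfo:
    name: str
    address: str


@dataclass
class Point:
    x: float
    y: float


# Hypothetical registry of types the tool already knows about.
KNOWN_TYPES = [MailerInfo, Point]


def shape(cls) -> dict:
    """Map each dataclass field name to its annotated type."""
    return {f.name: f.type for f in fields(cls)}


def suggest(ad_hoc: dict) -> Optional[type]:
    """Return the first known type whose fields exactly match the ad hoc record."""
    for cls in KNOWN_TYPES:
        if shape(cls) == ad_hoc:
            return cls
    return None
```

So `suggest({"name": str, "address": str})` comes back with `MailerInfo`, and anything that does not match exactly gets no suggestion.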
Yeah, but generating tons of stupid verbose code that nobody will be able to read and understand is more fun. Also, your superiors will be sure you are a valuable worker if you write more code.
I may be over-reading, but I think this kind of example not only demonstrates the pragmatic legal issues, but also the fundamental weaknesses of a solely text-oriented approach to suggesting code. It doesn't really seem to have a representation of the problem being solved, or the relationship between things it generates and such a goal. This is not surprising in a tool which claims to work at least a little for almost all languages (i.e. which isn't built around any firm concept of the language's semantics).
I'd be much more excited by (and less unnerved by) a tool which brought program synthesis into our IDEs, with at least a partial description of intended behavior, especially if searching within larger program spaces could be improved with ML. E.g. here's an academic tool from last year which I would love to see productionized.
https://www.youtube.com/watch?v=QF9KtSwtiQQ
I think it’s pretty clear that program synthesis good enough to replace programmers requires AGI.
This solely text based approach is simply “easy” to do, and that’s why we see it. I think it’s cool and results are intriguing but the approach is fundamentally weak and IMO breakthroughs are needed to truly solve the problem of program synthesis.
There's a few decades worth of work on program synthesis and it works very well. You don't need AGI.
You need either a) a complete specification of the target program in a formal language (other than the target language) or b) an incomplete specification in the form of positive and negative examples of the inputs and outputs of the target program, and maybe some form of extra inductive bias to direct the search for a correct program [edit: the latter setting is more often known as program induction].
In the last few years the biggest, splashiest result in program synthesis was the work behind FlashFill, from Gulwani et al: one-shot program learning, and that's one shot, from a single example, not with a model pretrained on millions of examples. It works with lots of hand-crafted DSLs that try to capture the most common use-cases, a kind of programming common sense that, e.g. tells the synthesiser that if the input is "Mr. John Smith" and the output is "Mr" then if the input is "Ms Jane Brown" the output should be "Ms". It works really, really well but you didn't hear about it because it's not deep learning and so it's not as overhyped.
Copilot tries to circumvent the need for "programming common sense" by combining the spectacular ability of neural nets to interpolate between their training data with billions of examples of code snippets, in order to overcome their also spectacular inability to extrapolate. Can language models learned with neural nets replace the work of hand-crafting DSLs with the work of collecting and labelling petabytes of data? We'll have to wait and see. There are also many approaches that don't rely on hand-crafted DSLs, and also work really, really well (true one-shot learning of recursive programs without an example of the base case and the synthesis terminates) but those generally only work for uncommon programming languages like Prolog or Haskell, so they're not finding their way to your IDE, or your spreadsheet app, any time soon.
But, no, AGI is not needed for program synthesis. What's really needed I think is more visibility of program synthesis research so programmers like yourself don't think it's such an insurmountable problem that it can only be solved by magickal AGI.
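To give a flavour of specification by example, here is a brute-force sketch over a tiny made-up DSL (pick a token, then apply one clean-up step). This is not FlashFill's algorithm, just plain enumeration, and the DSL entries are my own assumptions. It does show how the hand-crafted DSL carries the "common sense": with these building blocks, one example is enough to pin down the program.

```python
from itertools import product
import string

# Tiny hypothetical DSL: pick one whitespace-separated token, then clean it up.
PICKS = {
    "first": lambda toks: toks[0],
    "second": lambda toks: toks[1],
    "last": lambda toks: toks[-1],
}
CLEANUPS = {
    "as_is": lambda s: s,
    "strip_punct": lambda s: s.strip(string.punctuation),
    "upper": lambda s: s.upper(),
    "lower": lambda s: s.lower(),
}


def run(program, text):
    """Apply a (pick, cleanup) program to an input string."""
    pick, cleanup = program
    return CLEANUPS[cleanup](PICKS[pick](text.split()))


def synthesize(examples):
    """Return every DSL program consistent with all (input, output) examples."""
    found = []
    for program in product(PICKS, CLEANUPS):
        try:
            if all(run(program, i) == o for i, o in examples):
                found.append(program)
        except IndexError:
            pass  # the program does not apply to this input (too few tokens)
    return found
```

Given the single example `("Mr. John Smith", "Mr")`, the only surviving program is `("first", "strip_punct")`, and running it on "Ms Jane Brown" gives "Ms".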
I said program synthesis good enough to replace programmers requires AGI. Program synthesis based off of informal specifications in natural language. Not talking about highly constrained environments with formal specs.
I am not belittling the work going in this space, and I’m sure for highly constrained and narrow use cases a lot can be done even now. But I believe solving the general problem of program synthesis based on informal spec requires AGI. I am hardly the only one who thinks this.
>> I am not belittling the work going in this space, and I’m sure for highly constrained and narrow use cases a lot can be done even now.
No. Program synthesis approaches work very well for a broad array of problems, not for "highly constrained and narrow use cases"- that is a misconception of the kind that results from lack of familiarity with modern program synthesis.
Sumit Gulwani, that I mentioned in my previous comment, is an author. To clarify, I'm not in any way affiliated with him or his collaborators. I'm actually from a rival camp, if you will, but the paper I link to is a very good summary of the state of the art. It should help you if you wish to understand where program synthesis is at.
>> I said program synthesis good enough to replace programmers requires AGI. Program synthesis based off of informal specifications in natural language.
Program synthesis from natural language is hard to make work because it's difficult to translate natural language specifications to specifications that a program synthesiser can use. But that is a limitation of current natural language analysis, specifically natural language understanding, approaches - not a limitation of program synthesis approaches.
I think you equate formal specifications, or specification by example, with "narrow use cases". There's no connection between the two.
If program synthesis is as far advanced as you say it is, how come I make six figures doing something that you seem to be arguing can be totally automated?
The reality seems to disagree with your statements. Program synthesis is as of right now limited to academic research and highly narrow use cases. If the opposite was true, I’d be out of a job.
I think copilot is probably the first product of its type that might make its way into the hands of users en masse.
Edit:
Btw I was referring to program synthesis based off informal natural language spec. Spec inference is part of the synthesis pipeline, I think it’s not fair to just ignore that problem.
The purpose of program synthesis is not to get programmers out of a job. Rather,
it's a tool to help programmers better do their job. I think it's easy to see
why you're not using it. With few exceptions, advances in research take many
years to percolate down to the industry. And of course the industry is famous
for following trends without real understanding of anything.
Anyway the review I linked to has some examples of real-world applications of
program synthesis. Don't be afraid to read it- it's light on formal notation and
you don't need special skills to understand it. I appreciate that it's a long
document but there's a Table of Contents at the start and you should be able to
skim through in a short time just to get a general idea of the subject.
Anyway I can see you're trying to "wing it" and reason from first principles
about something you know nothing about, in true SWE style. Yet, you don't know
what you don't know, so you start from the wrong assumptions ("fully automated"
etc) and arrive at the wrong conclusions. That's no way to understand anything.
It's certainly not going to give you any good idea about what's going on in an
entire field of research you know nothing about.
Of course you're not obliged to know anything about program synthesis, but in
that case, maybe consider sitting back and listening rather than expressing
strong opinions with absolute conviction that is not supported by your
knowledge? I think that will make a better conversation, and a better internet,
for everyone.
I think you're holding text-based approaches and synthesis based approaches to radically different expectations. Copilot isn't approaching replacing programmers; presumably a programmer is invoking it, deciding what to keep or change, etc, i.e. generating parts of programs under the guidance of a human programmer. Synthesis can work at the level of providing an expression or a helper function, as a useful tool under the guidance of a programmer.
Copilot suggests some code snippets, and not necessarily good ones. To be dismissive of another approach to generate parts of programs because they cannot replace programmers is like saying that belt-drive bikes aren't worth considering over chains because a belt-drive bike isn't a replacement for a Learjet.
To be clear I wasn’t dismissing anything, that was not my intent. I think as a programmer assist text based approaches work and I really like copilot for what it is.
I was merely saying that for the holy grail, program synthesis from informal spec generalised to any domain, the approach will have to be different.
Your links say that program synthesis in general is the holy grail of computer science etc, you said that the holy grail is "program synthesis from informal spec" and by that you meant a natural language specification as per the context of our conversation so far. Are you now trying to subtly shift the goalposts?
If so, please leave them alone. You have no reason to assume that program synthesis from a natural language specification is "the holy grail" of anything. But I'm glad that our conversation at least made you look up a few links, even if only to try and win the internet conversation from what it looks like.
So what did you learn, from what you read about program synthesis? Can you see why your assumptions earlier on, about "narrow use cases" and the like were wrong?
Edit: btw, the Freuder paper you linked is about constraint programming, not program synthesis.
And did you notice that one of your links above is the abstract of the review paper I proposed you read, earlier?
Look, I think the main disagreement here is that you don’t seem to consider spec inference as part of the program synthesis process, whereas I do. Your position may very well be correct from an academic point of view.
From my perspective, spec inference from informal spec is the main thing to solve. Because for formal specs, I’d just be programming in a declarative language to create the formal spec.
Spec by example won’t scale because you can’t provide examples across the entire domain for apps of real world complexity.
Once spec inference is solved, then you are just left with a search problem. I understand that the search space is freakin huge but I’d still say the latter problem is easier to solve than the former.
And I’d guess that the problem of inferring a spec from an informal description is what requires AGI.
I hope this clarifies my POV. I don’t think we disagree, we just have different perspectives.
Thank you for a level-headed and well thought-out response! I'm glad to see that
our communication isn't "ratcheting" to more and more forceful forms.
Now. I think the root of our disagreement was the necessity or not of AGI for
various kinds of program synthesis, which I think we've probably pared down to
program synthesis from an informal specification, particularly a natural
language one.
I don't agree that AGI is necessary for that. I think that, as with many AI tasks,
such a complete and uncompromising solution can be avoided and a more
problem-specific solution found instead.
In fact, we already had almost a full solution to the problem in 1968, with
SHRDLU [1], a program that simulated a robotic hand directed by a human user in
natural language to grasp and rearrange objects. This was in the strict confines
of a "blocks world" but its capability, of interpreting its user's intent and
translating it accurately to actions in its well-delineated domain remains
unsurpassed [2]. Such a natural language interface could well be implemented
for programming in bounded domains, for example to control machinery or run
database queries etc. This capability remains largely unexploited, because the
trends in AI have shifted and everybody is doing something else now. That's a
wider-ranging conversation though. My main point is that a fully-intelligent
machine is not necessary to communicate with a human in order to carry out
useful tasks. A machine that can correctly interpret statements in a subset of
natural language suffices. This is technology we could be using right now, only
nobody has the, let's say, political will to develop it because it's seen as
parochial, despite the fact that its capabilities are beyond the capabilities of
modern systems.
As to the other kind of informal specifications, by examples, or by
demonstration, what you say, that it won't scale etc, is not true. I mentioned
earlier one-shot learning of recursive hypotheses without an example of the base
case [3]. To clarify, the reason why this is an important capability is that
recursive programs can be much more compact than non-recursive ones and still
represent large sets of instances, even infinite sets of instances. In fact, for
some program learning problems, only recursive solutions can be complete
solutions (this is the case for arithmetic and grammar learning for example).
The ability to learn such solutions from a single example I think should
conclusively address concerns about scaling.
To be fair, this is a capability that was only recently achieved by Inductive
Logic Programming (ILP) systems, a form of logic program synthesis (i.e.
synthesis where the target language is a logic programming language, usually
Prolog, but not necessarily). The Gulwani survey I linked mentions this recent
advance in passing only. But ILP systems in general have excellent sample
complexities and can routinely generalise robustly from a handful of examples
(in the single digits), and have been doing this since the 1990's.
The cost of searching a large hypothesis space is, indeed, an issue. There are
ways around it however. Here, I have to toot my own horn and mention that my
doctoral research is exactly about logic program learning without search. I'd go
as far as to say that what's really been keeping program synthesis back is the
ubiquity of search-based approaches, and that program synthesis will be solved
conclusively only when we can construct arbitrary programs without searching. But you
can file that under "original research" (I mean, that's literally a description
of my job). Anyway, no AGI is needed for all of this.
So, I actually understand your earlier skepticism, along the lines of "if all
this is true, why haven't I heard of it?" (well, you said why do you still have
a job but it's more or less the same thing). The answer remains: because it's
not the trendy stuff. Trends drive both industry and research directions. Right
now the trend is for deep learning, so you won't hear about different approaches
to program synthesis, or even program synthesis in general. There's nothing to
do about it. Personally, I just grin and bear it.
Parsing intent in a programming context is easier than in others. Also, most of the code is written to be parsed by a machine anyway. So with ASTs and all the other static and maybe even some dynamic checks it should be possible.
We already have some of it with type detection, IntelliSense, etc.
It is a hard set of problems with no magic solutions like this, with years of development time needed. That approach will not happen commercially, only incrementally in the community.
Also the goal doesn't need to be "to replace programmers". As with copilot, the point of a program synthesis tool can be to assist the programmer. The point of the system in the video linked above is partly that interactively using such a system can aid development. My main point is this can be a lot better in combination with approaches from outside the ML community, which may involve much tighter integration to specific languages, as well as some awareness of a goal for the synthesized portion.
To "replace programmers", an organization would need to have a way of specifying to the system a high level program behavior, and to confirm that an output from the system satisfies that high level behavior. I think for specifications of any complexity, producing and checking them would look like programming just of a different sort.
I mean, the cases where it tries to assign copyright to another person in a different year highlight that context other than the other text in the file is semantically extremely important, and not considered by this approach. Merely generating text which looks appropriate to the model given surrounding text is ... misguided?
If you think about it, program synthesis is one of the few problems in which the system can have a perfectly faithful model of the dynamics of the problem domain. It can run any candidate it generates. It can examine the program graph. It can look at what parts of the environment were changed. To leave all that on the table in favor of blurting out text that seems to go with other text is like the toddler who knows that "five" comes after "four", but who cannot yet point to the pile of four candies. You gotta know the referents, not just the symbols. No one wants a half-broken Chinese Room.
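A minimal sketch of what "it can run any candidate it generates" could look like, assuming the candidates arrive as plain text from some generator. The candidate strings and the `passes` helper are made up for illustration; this is not how Copilot works, and `eval` with empty builtins is not a real sandbox.

```python
def passes(source, examples):
    """Compile one candidate and check it against input/output examples."""
    try:
        # Hypothetical setup: each candidate is the source text of a lambda.
        candidate = eval(source, {"__builtins__": {}})
        return all(candidate(x) == y for x, y in examples)
    except Exception:
        # Syntax errors and run-time errors both disqualify the candidate.
        return False


candidates = [
    "lambda n: n + n",         # looks plausible, wrong on some inputs
    "lambda n: n ** 2 +",      # does not even parse
    "lambda n: undefined(n)",  # parses, fails when run
    "lambda n: n * n",         # correct
]
examples = [(0, 0), (2, 4), (3, 9)]

survivors = [src for src in candidates if passes(src, examples)]
print(survivors)
```

The point is only that execution gives a cheap filter that text likelihood alone does not: three of the four candidates look like code, but only one behaves like the examples.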
> generating text which looks appropriate to the model given surrounding text is ... misguided?
Agreed - it represents a failure to adequately model/understand the task, but I don't think it is a "fundamental weakness" of text-based 'Chinese room' approaches.
> You gotta know the referents, not just the symbols. No one wants a half-broken Chinese Room.
"Knowing the referents" is not at all clearly defined. It's totally possible that, under the constraint of optimizing for next-word prediction, the model could develop an understanding of what the referents are.
You can't underestimate the level of complex behavior emerging from a big enough system under optimization. After all, all the crazy stuff we do - coding, art, etc. is produced by a system under evolutionary optimization pressure to make more of itself.
> "Knowing the referents" is not at all clearly defined. It's totally possible that, under the constraint of optimizing for next-word prediction, the model could develop an understanding of what the referents are.
Well, in this case, it would have been good to understand that "V. Petkov" is a person unrelated to the project being written, and that "2015" is a year and not the one we're currently in. Sometimes the referent will be a method defined in an external library, which perhaps has a signature, and constraints about inputs, or properties which apply to return values.
> You can't underestimate the level of complex behavior emerging from a big enough system under optimization. After all, all the crazy stuff we do - coding, art, etc. is produced by a system under evolutionary optimization pressure to make more of itself.
I think this can verge into a kind of magical thinking. Yes, humans also look like neural nets, and we might even be optimizing for something. But we learn to program (and we do our best job programming) by having a goal for program behavior, and we use interactive access to try to run something, get an error, set a break point, try again, etc. I challenge anyone to try to learn to "code" by never being given any specific tasks, never interacting with docs about the language, an interpreter, a compiler, etc, but merely to try to fill in the blank in paper code snippets. You might learn to fill in some blanks. I highly doubt you would learn to code.
This is totally a case where the textual representation of programs is easier to get and train against, and that tail is being allowed to wag the dog to frame both the problem and the product.
None of this is to say that high-bandwidth DNN approaches don't have a place here -- but I think we should be looking at language-specific models where the DNN receives information about context (including some partial description of behavior) and outputs of the DNN are something like the weights in a PCFG that is used in the program search.
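A rough sketch of that last idea, assuming a toy grammar over `x`, `1`, `+` and `*`. The rule weights below are fixed, invented numbers standing in for what a DNN might output for a given context; the search tries candidates from most to least probable and keeps the first one that matches the examples.

```python
import itertools

# Hypothetical rule weights. In the proposal above a DNN would produce these
# per context; here they are constants chosen purely for illustration.
WEIGHTS = {"x": 0.4, "1": 0.2, "+": 0.25, "*": 0.15}


def expressions(depth):
    """Yield (expression, probability) for every expression up to `depth`."""
    for leaf in ("x", "1"):
        yield leaf, WEIGHTS[leaf]
    if depth == 0:
        return
    subs = list(expressions(depth - 1))
    for op in ("+", "*"):
        for (a, pa), (b, pb) in itertools.product(subs, repeat=2):
            yield (op, a, b), WEIGHTS[op] * pa * pb


def run(expr, x):
    """Evaluate an expression tree for a concrete input."""
    if expr == "x":
        return x
    if expr == "1":
        return 1
    op, a, b = expr
    left, right = run(a, x), run(b, x)
    return left + right if op == "+" else left * right


def synthesize(examples, depth=2):
    """Return the most probable expression consistent with all examples."""
    ranked = sorted(expressions(depth), key=lambda item: -item[1])
    for expr, _ in ranked:
        if all(run(expr, x) == y for x, y in examples):
            return expr
    return None


result = synthesize([(0, 1), (1, 2), (2, 5), (3, 10)])
print(result)
```

The weights only change the order in which candidates are tried; correctness still comes from running each candidate against the examples.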
>> I don't think it is clear that such "fundamental weaknesses" exist. A text-based
approach can get you incredibly far.
Mnyeah, not really that "incredibly". Remember that neural network models are
great at interpolation but crap at extrapolation. So as long as you accept that
the code generated by Copilot stays inside a well-defined area of the program
space, and that no true novelty can be generated that way, then yes, you can get
a lot out of it and I think, once the wrinkles are ironed out, Copilot might
be a useful, everyday tool in every IDE (or not; we'll have to wait and see).
But if you need to write a new program that doesn't look like anything anyone
else has written before, then Copilot will be your passenger.
How often do you need to do that? I don't know. But there's still many open
problems in programming that lots of people would really love to be able to
solve. Many of them have to do with efficiency and you can't expect Copilot to
know anything about efficiency.
For example, suppose we didn't know of a better sorting algorithm than
bubblesort. Copilot would not generate mergesort. Even given examples of
divide-and-conquer algorithms, it wouldn't be able to extrapolate to a
divide-and-conquer algorithm that gives the same outputs for the same inputs as
bubblesort. It wouldn't be able to, because it's trained to reproduce code from
examples of code, not to generate programs from examples of their inputs and
outputs. Copilot doesn't know anything about programs and their inputs and
outputs. It is a language model, not a magickal pink fairy of wish fulfillment
and so it doesn't know anything about things it wasn't trained on.
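To make the bubblesort/mergesort point concrete, here is a small sketch with textbook versions of both (my own illustrative implementations, not anything Copilot produced). The two functions share almost no text, yet they are the same program when judged by inputs and outputs.

```python
def bubble_sort(items):
    """Quadratic sort: repeatedly swap adjacent out-of-order pairs."""
    items = list(items)
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items


def merge_sort(items):
    """Divide and conquer: sort each half, then merge the halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


samples = [[], [1], [3, 1, 2], [5, 5, 1, 0, -2], list(range(10, 0, -1))]
agree = all(bubble_sort(s) == merge_sort(s) for s in samples)
print(agree)
```

A system that only models the text has no reason to connect the two; a system that compares behaviour on examples does.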
Again, how often do you need to write truly novel code? In the context of
professional software development, I think not that often. So if it turns out to
be a good boilerplate generator, Copilot can go a long way. As long as you don't
ask it to generate anything other than boilerplate.
There are approaches that work very well in the task of generating programs
that they've never seen before from examples of their inputs and outputs, and
that don't need to be trained on billions of examples. True one-shot learning of
programs (without a model pre-trained on billions of examples) is possible.
With current approaches. But those approaches only work for languages like Prolog
and Haskell, so don't expect to see those approaches helping you write code in
your IDE anytime soon.
This does make me wonder if this is susceptible to the same form of trolling as that MS AI got. Commit a load of grossly offensive material to multiple repos, and wait for Copilot to start parroting it. I think they're going to need some human moderation.
Way better. It's susceptible to copyright trolling.
Put up repos with snippets for things people might commonly write. Preferably use JavaScript so you can easily "prove" it. Write a crawler that crawls and parses JS files to search for matching stuff in the AST. Now go full patent troll, eh, I mean copyright troll.
1) Write a project heavily using Copilot (hell, automate it and write thousands of them, why not?)
2) AGPL all that code.
3) Search for large chunks of code very similar to yours, but written after yours, licensed more liberally than AGPL. Ideally in libraries used by major companies.
4) Point the offenders to your repos and offer a "convenient" paid dual-license to make the offenders' code legal for closed-source use, so they don't have to open source their entire product.
This was my first thought when reading about Copilot...it feels almost certain that someone will try poisoning the training data.
Hard to say how straightforward it'd be to get it to produce consistently vulnerable suggestions that make it into production code, but I imagine an attacker with some resources could fork a ton of popular projects and introduce subtle bugs. The sentiment analysis example on the Copilot landing page jumped out to me...it suggested a web API and wrote the code to send your text there. Step one towards exfiltrating secrets!
Never mind the potential for plain old spam: won't it be fun when growth hackers have figured out how to game the system and Copilot is constantly suggesting using their crappy, expensive APIs for simple things!? Given the state of Google results these days, this feels like an inevitability.
Targeted attacks to elicit output only in a given context are generally possible with AIs. And here, writing an implementation of a difficult and vulnerable process seems easy. Bad implementations of various hard things become common 'cause people cut and paste the code without looking closely since they don't understand it anyway.
Given that code is easier to write than it is to read this one is troubling.
I certainly wouldn't want to be using this with languages like PHP (or even C for that matter) with all the decades of problematic code examples out there for the AI to learn from.
This is a very famous function [0] and likely appears multiple times in the training set (Google gives 40 hits for GitHub), which makes it more likely to be memorized by the network.
It's worth keeping in mind that what a neural network like this (just like GPT3) is doing is generating the most probable continuation based on the training dataset. Not the best continuation (whatever that means), simply the most likely one. If the training dataset has mostly bad code, the most likely continuation is likely to be bad as well. I think this is still valuable, you just have to think before accepting a suggestion (just like you have to think before writing code from scratch or copying something from Stack Overflow).
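A toy illustration of "most probable, not best", using a bigram counter over a made-up corpus. This is a deliberate oversimplification (a real model like GPT-3 is far more than a bigram table), and the corpus lines are invented for the example.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count which token follows which across all training lines."""
    follows = defaultdict(Counter)
    for line in corpus:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def most_likely_next(follows, token):
    """Return the most frequent continuation seen after `token`."""
    return follows[token].most_common(1)[0][0]


# Invented training data: the questionable pattern outnumbers the better one.
corpus = [
    "price = float ( raw )",
    "price = float ( raw )",
    "price = float ( raw )",
    "price = Decimal ( raw )",
]

model = train(corpus)
suggestion = most_likely_next(model, "=")
print(suggestion)
```

With three `float` lines against one `Decimal` line, the counter suggests `float` after `=`, simply because that is what it saw most.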
I have no idea how this or GPT3 works or how to evaluate them, but couldn't you argue that it's working as it should? You tell copilot to write a fast inverse square root, it gives you the super famous fast inverse square root. It'd be weird and bad if this didn't happen.
As far as licenses go, idk. Presumably it could delete associated comments and change variable names or otherwise obscure where it's taking code from. Maybe this part is shady.
Maybe I could build a robot that goes out in the city and steal cars.
As far as licenses go, idk. Presumably it could delete the number plate and repaint the car or otherwise obscure where it's taking the car from. Maybe this part is shady.
In particular, fast approximate inverse square root is an x86 instruction, and not a super new one. I'd be surprised if it wasn't in every major instruction set.
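For anyone who hasn't seen the trick being discussed, here is a rough Python rendering of the idea: reinterpret the float's bits as an integer, subtract half of it from the widely published constant 0x5F3759DF, then do one Newton step. This is an illustrative port, not the original C, and the name `fast_rsqrt` is just borrowed from the thread.

```python
import struct


def fast_rsqrt(x):
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Integer arithmetic on the bit pattern gives a rough first guess.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration tightens the estimate.
    return y * (1.5 - 0.5 * x * y * y)


approx = fast_rsqrt(4.0)
exact = 4.0 ** -0.5
print(approx, exact)
```

The result is close to, but not equal to, the exact value, which is the same trade-off the hardware approximate-reciprocal-square-root instructions make.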
This is an interesting issue. I suspect training on datasets from places like Github would be likely to provide lots of "this is a neat idea I saw in a blog post about how they did things in the 90's" code.
> the most probable continuation based on the training dataset
This is not wrong, but it's easy to misread it as implying little more than a glorified Markov model. If it's like https://www.gwern.net/GPT-3 then it's already significantly cleverer, and so you should expect to sometimes get the kind of less-blatant derivation that companies aim to avoid using a cleanroom process or otherwise forbidding engineers from reading particular sources.
Arguably the most famous block of code, of all time. Maybe fizzbuzz but there are so many flavors of it. And InvSqrt is way more memeable.
So I don't know if on this alone it proves Copilot regurgitates too much. I think other signs are more troubling, however, such as its tendency to continue from a prompt vs generate novelty.
It seems like a very sensible answer from copilot since the prompt includes "Q_" which makes it obvious that the programmer is specifically looking for the Quake version of this function.
To me it doesn't show that copilot will regurgitate existing code when I don't want it to, just that if I ask it to copy some famous existing code for me it will oblige.
The claim for AI systems like this is that it has actually learned something and is generating code from scratch. Oftentimes the authors will claim regurgitation is simply not possible, and this example shows that's a lie.
Many arguments on the benefits, legality and power of AI systems rely on this claim.
To turn around now and say it's OK to regurgitate in the right setting is to move the goalposts.
> Oftentimes the authors will claim regurgitation is simply not possible
Do the Copilot authors claim this?
I get that you're suggesting that Copilot may benefit from absolute claims made by the authors of other, similar systems (or their proponents), but I also don't think it's reasonable to exclude nuance and the specifics of Copilot from ongoing discussions on that basis. The Copilot authors have publicly acknowledged the regurgitation problem, and by their account are working on solutions to it (e.g. attribution at suggestion-generation time) that don't involve sweeping it under the rug.
Nat Friedman explicitly stated that it shouldn't regurgitate [0]:
> It shouldn't do that, and we are taking steps to avoid reciting training data in the output
He's being woefully naive. To put it bluntly, we don't know how to build a neural network that isn't capable of spitting out training data. The techniques he pointed to in other threads are academic experiments, and nobody seems to have a credible explanation for why we should believe that they work.
I'm not anything close to an ML expert, and I have no opinion on whether what they're aiming for is possible, but this document^[1] (linked in your linked comment) states explicitly that they are aware of the recitation issue and are taking steps to mitigate it. So, in the context of the comment I replied to, I think Github is very far from claiming that recitation is "simply not possible".
That kind of bullshit phrasing can only get you so far.
It's like if some corporate PR department told you "we're aware of the halting problem, and are taking steps to mitigate it." You would rightly laugh them out of the room.
It's not going to work, and the people making these statements either don't understand how much they don't understand, or are deluding themselves, or are actively lying to us.
An honest answer would be something like "We are aware that this is a problem, and solving it is an active area of research for us, and for the machine learning community at large. While we believe that we will eventually be able to mitigate the problem to an acceptable degree, it is not yet known whether this category of problem can be fully solved."
You're using some pretty strong language here, but do you have any more substantive criticisms of the analysis they present at https://docs.github.com/en/github/copilot/research-recitatio... ? They seem to think the incidence of meaningful (i.e. substantively infringing) recitation is very low, and that their solution in those cases will be attribution rather than elimination.
Again, I'm not an ML expert, but that sounds a lot more reasonable to me than announcing one's intention to solve the halting problem.
The analysis you're citing is just that -- a statistical analysis. They had some people use the thing for a while, and concluded "Hey look, it doesn't seem to quote verbatim very often. Yay!" There is nothing in there that describes any sort of mitigation. The three sentences about an attribution search at the very end are aspirational at best, and are presented as "obvious" even though it's not at all clear that such a fuzzy search can be implemented reliably.
I use the halting problem as an analogy because their naive attempts to address this problem feel a lot like naive attempts to get around the halting problem ("just do a quick search for anything that looks like a loop," "just have a big list of valid programs," etc.). I can perform a similar analysis of programs that I run in my terminal and come to a similar "Hey look, most of them halt! Yay!" conclusion. I can spin a story about how most of the ones that don't halt are doing so intentionally because they're daemons.
But this approach is inherently flawed. I can use a fuzz tester to come up with an infinite number of inputs that cause something as simple as 'ls' to run forever.
Similarly, I can come up with an infinite number of adversarial inputs that attempt to make Copilot spit out training data. Some of them will work. Some of them will produce something that's close enough to training data to be a concern, but that their "attribution search" will fail to catch. That's the "open research question" that they need to solve.
We don't have a general solution to this problem yet, and we may never have one. They're trying to pass off a hand-wavey "we can implement some rules and it won't be a problem most of the time" solution as adequate. I don't see any reason to believe that it will be adequate. Every attempt I've seen at using logic to try and coax a machine learning model into not behaving pathologically around edge cases has fallen flat on its face.
> The analysis you're citing is just that -- a statistical analysis. They had some people use the thing for a while, and concluded "Hey look, it doesn't seem to quote verbatim very often. Yay!" There is nothing in there that describes any sort of mitigation.
> The three sentences about an attribution search at the very end are aspirational at best, and are presented as "obvious" even though it's not at all clear that such a fuzzy search can be implemented reliably.
I agree with all of this, though I do think that the attribution strategy they describe sounds a lot easier than solving the halting problem or entirely eliminating recitation in their model. Obviously, the proof will be in the pudding.
Maybe you and others are reacting to them framing this as "research", as if they're trying to prove some fundamental property of their model rather than simply harden it against legally questionable behavior in a more practical sense. I think a statistical analysis is fine for the latter, assuming the sample is large enough.
The biggest issue with that analysis is that their model is clearly very able to copy code and change the variable names, copying code and changing variable names is very clearly still "copying", and the analysis doesn't seem to include that in its definition of "recitation event".
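A sketch of why renaming doesn't hide a copy from a structural comparison, assuming the code is Python and using the standard `ast` module. The `fingerprint` helper and the sample functions are invented for illustration; GitHub's actual analysis is not described at this level in the thread.

```python
import ast


class _Rename(ast.NodeTransformer):
    """Replace every identifier with a name based on first-seen order."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node


def fingerprint(source):
    """Dump the AST after normalising identifiers away."""
    return ast.dump(_Rename().visit(ast.parse(source)))


original = "def total(prices):\n    result = 0\n    for p in prices:\n        result += p\n    return result\n"
renamed = "def add_up(xs):\n    acc = 0\n    for item in xs:\n        acc += item\n    return acc\n"
different = "def total(prices):\n    return sum(prices)\n"

same_structure = fingerprint(original) == fingerprint(renamed)
print(same_structure)
```

A verbatim text search sees `original` and `renamed` as unrelated, while the normalised trees are identical. Whether a match like this amounts to "copying" in the legal sense is a separate question.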
I'd fully expect it to copy code and change variable names in a lot of cases--if it wants to achieve the goal of filling in boilerplate, how could it do anything else? That's pretty much the definition of boilerplate: it's largely the same every time you write it.
What's less clear to me is that Copilot regularly does that sort of thing with code distinctive enough that it could reasonably be said to constitute copyright infringement. If somebody's actually shown that it does, I'd love to see that analysis.
They did! In the faq which I can't find anymore they said:
>GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set.
There's a risk of confirmation bias though, because the search was performed by the developers of the system who are strongly motivated to find no problem with their work.
I have done this to myself many, many times. I look, carefully, at length, for problems with my work until I satisfy myself that there's no obvious problem with it. Then someone else points out the obvious problem I was overlooking. Actually, this has happened often enough and it's painful enough that I learned to really look nowadays.
In the context of the search for "snippets that are verbatim from the training set" there's all sorts of things that can go wrong. The search (a regex search I think?) can be unintentionally made too weak to catch obvious cases. Or too strong, probably. The search for "snippets that are verbatim from the training set" may ignore snippets that are 80% verbatim. Or the code generated during the experiment can be generated in such a way as to only generate verbatim snippets very rarely, contrary to more typical use. And so on and so forth.
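To illustrate the "too weak" failure mode, here is a small sketch comparing an exact substring check with a similarity ratio from the standard `difflib` module. The corpus line and the suggestion are made up, and I am not claiming this is the search GitHub used.

```python
import difflib


def exact_hit(suggestion, corpus):
    """True only if the suggestion appears verbatim in some document."""
    return any(suggestion in doc for doc in corpus)


def best_similarity(suggestion, corpus):
    """Highest character-level similarity ratio against any document."""
    return max(
        difflib.SequenceMatcher(None, suggestion, doc).ratio()
        for doc in corpus
    )


# Invented example: same loop, one variable renamed.
corpus = ["for (i = 0; i < n; i++) total += values[i];"]
suggestion = "for (j = 0; j < n; j++) total += values[j];"

verbatim = exact_hit(suggestion, corpus)
similarity = best_similarity(suggestion, corpus)
print(verbatim, round(similarity, 2))
```

The exact check reports no match even though the two lines differ in only four characters, so the reported rate depends heavily on which of these two notions of "verbatim" the study used.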
There's so many ways to fool oneself when looking for errors in one's own work.
Edit: They explain their search methodology and with only a quick look I gave it, it seems legit, but it was a quick look. The devil is in the details, yes? Maybe people who are really interested in this issue should take a closer look.
This actually seems like an explicit acknowledgement that regurgitation is possible, and not remotely a claim that it is "simply not possible".
It stands to reason that cases where people are intentionally trying to produce regurgitation will strongly overlap with the minority of cases where it actually happens. So I think we are probably suffering from some selection bias in discussions on HN and similar forums--that might be unavoidable, and it certainly stimulates some interesting discussion, but we should try to avoid misrepresenting the product as a whole and/or what its creators have said about it.
I think only Github's lawyers would interpret what GP posted the way you did. Looks like weasel wording to make such an interpretation possible, while making customers believe that code is more or less synthesized in realtime. "Snippets" makes one think one or two lines of code, not entire functions and classes.
I think that until somebody shows that Copilot is willing to copy distinctive code fragments verbatim, unprompted, with a high occurrence rate, I'm not going to start accusing Github of building an engine to cynically exploit the IP rights of open source copyright holders for profit. I've seen no evidence of that, and in absence of evidence I prefer to remain neutral and open-minded.
How would that work, anyway? Rare, distinctive code forms seem much more difficult for an ML thing to suggest with a high-ish confidence level, since there won't be much training data. The Quake thing makes sense because it's one of the most famous sections of code in the world, and probably exists in thousands of places in the public Github corpus.
I'm emphasizing distinctive because a lot of boilerplate takes up a lot of room, but still doesn't make a reasonable argument for copyright infringement when yours looks like somebody else's.
It looks like you're responding to the wrong comment. I don't recall alleging that Github is "building an engine to cynically exploit the IP rights of open source copyright holders for profit".
> I think only Github's lawyers would interpret what GP posted the way you did. Looks like weasel wording to make such an interpretation possible,
So what are you suggesting here, except that Github is attempting a legal sleight-of-hand to hide real infringement?
> while making customers believe that code is more or less synthesized in realtime.
What are you suggesting here except that Github is (essentially) lying to customers, making them believe something that is substantially untrue?
When I say "building an engine to cynically exploit the IP rights of open source copyright holders for profit", I am talking about a scenario in which they are sweeping legitimate IP concerns under the rug with bad faith legal weaselry and misrepresentation of how the product functions, etc., to chase profit. I do not see how that is substantially different from the implications of your comment, especially in the context of this subthread.
Could you enlighten me as to how your intended meaning substantially differs from my interpretation? If you don't mean to accuse Github of malfeasance, we probably don't have much to discuss.
You're not wrong, but the very idea that it will regurgitate copyrighted code at all (and especially at this length, word for word), means that it will be totally unacceptable for many places. In fact, it is arguably not acceptable to use anywhere if you care deeply about copyright.
Apparently you haven't seen many of the demos that people are showing off? Because saying that this only occurs when the author is explicitly asking for copied code is blatantly false.
No I haven't. If you think the other demos are more interesting please link to them. I'm just saying that this demo is biased and that we can't draw any conclusion from it. Actually the author has just confessed optimizing it for entertainment in a sister comment. That doesn't mean that the claim is false but it doesn't show that it is true either.
I think you misunderstood my comment. The same code gets generated if you call the function `float fast_rsqrt` or `float fast_isqrt` for instance. I intentionally wanted it to be looking like `Q_rsqrt` so that people pick up on it quicker.
This reply from @AzureDevOps is bizarre: "We understand. However, the way to report this issues related to Windows 11 is through our Windows Insider even from another device. Thanks in advance."
Wow. To think of it, nothing in this HN thread, including your link, is truly new and unexpected, but in this context it felt somehow more dystopian than ever. Talking about machines pretending to be humans doing stupid stuff, getting automated responses from machines pretending to be humans, that also are the same kind of stupid stuff... Almost feels like drowning.
We understand. However, the way to report this issues related to im̄̽̚m͚͠i͙̬͈̟̹̳ͨ͆̀ͅn̲͚̻ͩ̐͒ͩ̊è̹̱͖̼̰n̘̯ͥ̿̌͛͌t̳̖̣̻̯̱ͥ̅̿̇͜ ̥̻̺͒ͣ͒͠A͔͔͓ͨÌ̖̲̆͒̐̍ͅ ̝̙̼̤͖͍̆̀ͪdͤͨ͑̈҉̭̖y̤͔̮͚̞̺ͬͦͦ̎ͮ͐́s̤͓̲͓̖̪̊t͎̰̤̩̞̞͇͐̎͂̉̆̚o̱̣̰͇̟̻͎̿͒̋̎p̫̰̮̌͐ͧ͗̔̀ͣi̫̱̩̠̫͔͒̉ͤa̶ͧͦͭͩ is through our Citizen Satisfaction Department, even from another Autonomous Azure® Sub-district. Thanks in advance.
The user that account replied to was having a conversation with Microsoft Support about Windows 11, and they replied to the wrong tweet thread with the wrong account.
Why would excluding GPL'd code be enough to not violate licenses? I don't understand why people think MIT or other licenses are free for alls to take code as they wish. The MIT license includes an attribution clause. And, as the linked video shows, Copilot is more than happy to take its code and put your pet license and copyright notice on instead. Isn't that equally as infringing as stealing GPL code? The idea of mining GitHub for training data was doomed from the start copyright-wise, as there's so much code that's misattributed, wrongly-licensed, or unlicensed.
At some level though, this suggests that the only way to be safe if you're writing a program (outside of a Copilot context) is probably simply not to look at GitHub (or maybe Stack Overflow and other code sources) except for, perhaps, using properly attributed entire functions. If you take a couple lines of code and tweak it a bit are you now required to attach copyright attribution? IANAL, but I'm guessing not.
Has anyone ever been sued IRL for using MIT/Apache/... code? Or are we stuck in imaginary land where this is something to be worried about?
Btw the GPLv2 death penalty is rather unique and I don't think anyone will deny that including GPL code in proprietary code is a hell of a lot worse in every way (liability, ethically, etc) than including permissively licenced code and forgetting to attribute it
At least that will reduce the chance of license violation as well as make a good legal argument for any uncovered violations as "unintentional" incidents.
Depends. If you find useful code on Github, Stack Overflow or anywhere else on the internet, you still need to check whether it is compatible with your licensing or not.
If you find useful code on Github or StackOverflow, you can check for the license directly there, or you can try to find where it was copied from, and look for a license there.
Copilot isn't copying, it's regurgitating patterns from its training dataset. The result may be subject to a license you don't know about, but modified enough that you won't find the original source. The result can be a blend of multiple snippets with varying licenses. And there's no way to extract attribution from Copilot - DNN models can give you an output for your input, they can't tell you which exact parts of the training dataset were used to generate that output.
But Copilot won't accurately tell you if it's directly copying code, and if so what the license is. If it provides MIT licensed code that I then need to include, how do I know that? Do I need to search for each set of lines of code it provides on GitHub?
When a person gets code from another source on the internet, they generally know where the code has come from.
In a real world scenario you wouldn't be mindlessly pressing Tab right after linebreak and accepting the first suggestion that comes your way. While entertaining, nobody gets paid to do that.
What you get paid for is writing your own code. When you write your own code, generally you think first and then type. Well, with Copilot you think first and then start typing a few symbols before seeing automatic suggestions. If they are right, you accept changes and if they happen to be similar to any other code out there, you deal with it exactly the same as if you typed those lines yourself.
If you use it as a programming partner it will simply autofill whatever you're writing line-by-line. You're not forced to use code completion at a whole-function level and it's not even the suggested use-case.
Code completion that can suggest the whole line instead of a single word (e.g. often it guesses function parameters and various math operations when you haven't even typed function name yet).
Isn't it entirely possible that they did exclude GPL licensed code, but somebody somewhere has violated copyright and copy-pasted that snippet into non-GPL-licensed code that they trained on?
They could try to trace every single code snippet they train on to its "true source" and use the license for that, but that's not very well-defined, and is a lot harder, and it's never going to be 100%.
Which raises another question: ideally Copilot wouldn't be trained on "somebody somewhere", but is that happening?
To use the old trope — if the majority of programmers can't implement Fizzbuzz, but they do have a Github profile, are they being included too?
Hopefully there's some quality bar for the training set, i.e. some subset of "good" code (e.g. release candidate tags from fairly established OSS tools/frameworks in different languages) rather than any old code on the internet.
> Once, GitHub Copilot suggested starting an empty file with something it had even seen more than a whopping 700,000 different times during training -- that was the GNU General Public License.
This is pretty clearly just a search engine with more parameters.
I thought there was something more going on with copilot, but the fact that it is regurgitating arbitrary code comments tells me that there is zero semantic analysis going on with the actual code being pulled in.
It's more that the model is so large it is capable of memorizing a lot. This can be seen in other language models like GPT-3 as well.
Comments, I suspect, will be more likely to be memorized since the model would be trained to make syntactically correct outputs, and a comment will always be syntactically correct. That would mean there is nothing to 'punish' bad comments.
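A quick way to see the "a comment will always be syntactically correct" point, using Python's own parser as the check. The sample strings are invented (the name and year echo the misattributed copyright example mentioned earlier in the thread), and this says nothing about how Copilot is actually trained.

```python
import ast


def parses(source):
    """Return True if the text is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Nonsense inside a comment is invisible to the parser...
nonsense_comment = "# Copyright 2015 V. Petkov ((( not even balanced\nx = 1\n"
# ...while the same kind of nonsense in code is rejected.
nonsense_code = "x = ((( 1\n"

print(parses(nonsense_comment), parses(nonsense_code))
```

Any training signal based on "does the output parse" would therefore never penalise a wrong or misattributed comment.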
It is decidedly not "just a search engine with more parameters." Language models are just prone to repeating training examples verbatim when they have a strong signal with the prompt. Arguably, in this case, it is the most correct continuation.
Is it possible that Copilot just put Quake's source code into the public domain?
From the Copilot FAQ:
Who owns the code GitHub Copilot helps me write?
GitHub Copilot is a tool, like a compiler or a pen.
The suggestions GitHub Copilot generates, and the code you write with its help, belong to you, and you are responsible for it.
We recommend that you carefully test, review, and vet the code, as you would with any code you write yourself.
Copilot can probably recite most of Quake's source code and according to the FAQ, the output of Copilot belongs to the user.
I think a point where this argumentation might fail is that Quake's source code does not belong to Github directly, but instead both Github and Quake belong to Microsoft. However, I am not a lawyer, so I might be wrong.
The python example is using floats for currency, in an expense tracking context.
The golang one uses a word ("value") for a field name that's been a reserved word since SQL-1999. It will work in popular open source SQL databases, but I believe it would bomb in some servers if not delimited...which it is not.
The ruby one isn't outright terrible, but shows a very Americanized way to do street addresses that would probably become a problem later.
And these are the hand picked examples. This product seems like it needs some more thought. Maybe a way to comment, flag, or otherwise call out bad output?
> The ruby one isn't outright terrible, but shows a very Americanized way to do street addresses that would probably become a problem later.
As someone who has been coding up address storage and validation for the past week in my current job, that one really made me laugh. Mostly because it tries to simplify all the stuff I have been analyzing and mulling over for a week into a single auto-complete.
Spoiler: GitHub Copilot's solution simply won't work. It would barely work for Americanized addresses, and even then it would not be ideal. As for internationalizing it, this thing isn't even close.
I get what Copilot is trying to do. But at the same time I don't get it. Because from my experience, typing code is the fastest part of my job. I don't really have a problem typing. I spend most of my time thinking about the problem, how to solve it, and considering ramifications of my decisions before ever putting code in the IDE. So Copilot comes around and it autocompletes code for me. But I still have to read what it suggested, make edits to it, and consider if this is solving the problem appropriately. I'm still doing everything I used to do, except it saved me from typing out a block of code initially. I still have to most likely rebuild, edit, or change the function somewhat. So it just saves me from typing that first pass. Well that's the easy part of the job.
I have never had a manager come to me and ask why a project is taking so long where I could answer "it just takes so long to type out the code, i wish I had a copilot that could type it for me". That's why we call it software engineering and not coding. Coding is easy. Software engineering is hard. Github Copilot helps with coding, but doesn't help with Software Engineering.
> I spend most of my time thinking about the problem, how to solve it...
A few years ago, I got a small but painful cut on my fingertip. I thought I would have a hard time on the job as a dev. To my surprise, I realized I spend 90-95% of my time thinking, and only 5-10% of the time typing. It turned out to be almost a non-issue.
As the owner of a fairly normal American address that is corrupted by the UPS address validation service, this is a good time to remind everyone: accept the address that your customer enters. If you offer a service to try to improve your customer’s address, keep in mind that it’s a value added service, it may be wrong, and you MUST test the flow in which your customer tells your service to accept the address as entered. And maybe even collect examples in which the address change is accepted to make sure it does something useful.
Vendors have lost sales to me because they were too incompetent to allow me to ship things to my actual address. Oops.
P.S. for the US, you need to offer at least two lines for the address part. And you need to accept really weird things that don’t seem to parse at all. I know people with addresses that have a PO Box number and a PMB number in the same address. Lose one and your mail gets lost.
P.P.S. If you offer discounted shipping using something like SurePost, make sure you let your customers pay a bit extra to use a real carrier. There are addresses that are USPS-only and there are addresses that work for all carriers except USPS (and SurePost, etc). Let your customer tell you how to ship to them. Do not second-guess your customer.
I'm absolutely with you and want to upvote that part of the comment x100. Unfortunately it's often considered a fairly spicy opinion.
Entire frameworks (Rails) are built around the idea of typing as little as possible. Others can't even be mentioned without the topic of boilerplate/keystroke count causing a flame war (Redux).
A lot of engineers equate their value with the amount of lines they can pump out, so there's definitely a demand for tools like these.
There's also some legitimate stuff. There are a lot of very silly things I have to google every time I do them because I have a bad memory. It saves the step of googling. In a way, it was the same debate around autocomplete at the very beginning, but pushed to the next level. Autocomplete turned out to be a very good thing (even though new languages and tools keep coming out without it).
> Because from my experience, typing code is the fastest part of my job. I don't really have a problem typing. I spend most of my time thinking about the problem, how to solve it, and considering ramifications of my decisions before ever putting code in the IDE
So very true.
[1] Understanding the problem > [2] thinking about all possible solutions > [3] working out which solution fits best > [4] working out which implementations are possible > [5] working out the most suitable implementation
A lot of my job is thinking hard about how to do [X], incidentally needing to remember how to do [trivial thing Y] and looking it up.
Like, I did it before, remember that it was trivial, I just forget the snippet and I have to break focus to look it up - often by scrolling through my own commit history to try and find the time I did [trivial thing Y] four months ago.
I do kind of wish I could automate that. Skipping the actual typing of the snippet is sort of gravy on top of that.
It would be nice if there were a way to automate the "remembering what that one function is called and what order the parameters are in" portion of my job.
IME the best thing for this is looking at the method listing in the docs for the classes I'm using. E.g. for Ruby, it's usually looking at the methods in Enumerable, Enumerator, Array, or Hash. Or I'll drop a binding.pry into the function, run it, and then type ls to see what's in scope.
Even in the 90s that was a solved problem in Visual Basic with autocomplete. That a lot of dev environments "lost" the ability to do it is mind boggling. With that said, doesn't Rubymine let you do that with autocomplete with the prompt giving you all the info you need? (I haven't done Ruby in a long time).
Still, having to look up the doc or run the code to figure out how to type it is orders of magnitude slower than proper auto complete (be it old school Visual Studio style, or something like Copilot).
> orders of magnitude slower than proper auto complete
Having worked extensively with verbose but autocomplete-able languages like Java, compact dynamic languages like Ruby, and a variety of others including C, Scala, and Kotlin, I've come to the conclusion that, for me, autocomplete is a crutch and I develop deeper understanding and greater capabilities when I go to the docs. IDE+Java encourages sprawl, which just further cements the need for an IDE. Vim+Ruby+FZF+ripgrep+REPL encourages me to design code that can be navigated without an IDE, which ultimately results in cleaner designs.
If there's any lag whatsoever in the autocomplete, it breaks my flow state as well. I can maintain flow better when typing out code than when it just pops into being after some hundreds of milliseconds delay. Plus, there's always the chance for serendipity when reading docs. The docs were written by the language creators for a reason. Every dev should be visiting them often.
That's totally cool but the grandparent was talking about remembering shit they already knew. Not everyone has a fantastic memory, and remembering whether the arguments are A then B or B then A doesn't deepen your understanding of a language. Most of the time the autocomplete and the official doc use the exact same source anyway, formatted the same way, with the same info.
> remember the arguments are A then B or B then A doesn't deepen your understanding
What I meant is that you will coincidentally learn new things by going to the docs for old/simple things. In addition to remembering that method ordering, you might learn about a new method that simplifies your task.
> I spend most of my time thinking about the problem, how to solve it, and considering ramifications of my decisions before ever putting code in the IDE. So Copilot comes around and it autocompletes code for me. But I still have to read what it suggested, make edits to it, and consider if this is solving the problem appropriately.
So, rather than helping people program better, all it's done is replace a bunch of the offshore cut-and-paste shops with "AI."
You are right that USPS maintains a database of canonical delivery points. However, it's inevitable this database might not be correct or up to date.
If you don't want to validate, then yes addresses are just a series of text fields. However, mapping them to that delivery point is where the problems arise.
This is actually a good idea that is missing from nearly every machine learning product. How do you back propagate lessons from user interaction into future training of the model? It can be done, I can't think of a place I've seen it done though.
Even free software snippets have clauses like GPL or attribution.
Putting GPL code in proprietary codebase would cause a company massive headaches...
So I agree Copilot is problematic by default: it exposes employers to lawsuits and forced open sourcing, and to IP lawsuits as well, which will end up on employees' shoulders.
It takes what should be your method of last resort - copypaste - and makes it the first thing you try.
All the steps in between - looking at the docstring for the function you're calling, googling for more general information, looking at and deciding not to use not-applicable or poorly-written SO answers - get pushed aside. So instead of you having to convince yourself "yes, it's safe to copy-paste these lines from SO, they actually fit my problem" you're presented with magic and I think the burden for rejecting it is going to be higher once it's in your editor than when you're just reading it on a SO post or Github snippet.
Even for a newcomer looking to learn, working on simple stuff that it has great completions for, it seems like it will sabotage your long-term growth, since it takes all the why and the reasoning out of it. Autocomplete for a function name isn't that relevant to gaining a deeper understanding. Knowing why a certain block of code is passed in in a certain style, or needs to be written at all? Probably that is.
Thinking about it more: there's a very small subset of problems that I think this is actually great for. And I do run into this somewhat often: relatively new libraries or frameworks that don't really care about thorough documentation so they only show you a few happy path snippets and nothing about how to do something more interesting, so you have to bridge the gap between "this one line in the doc obviously doesn't work with me, but I'd like to figure it out without reading all their source code from scratch..." - getting more example snippets barfed up onto my screen from other people who've figured it out before could be a sort of replacement for the library writers having provided documentation in the first place. But ... this is a somewhat insane way to work around a problem of shitty code documentation, and is still insufficient in a couple ways:
* some poor bastard is going to have to be the first person to figure out how to do something, so that copilot itself can know
* any non-code nuances around "oh, if you do that, your memory usage is going to explode" or "oh, by the way, if you do that, make sure you don't do your own threading" will still fail to be communicated.
I'm not really sure that type of tool could really be anything else.
How would a model become aware of all of the various edge cases that depend on which SQL database you use or differences in language versions over time?
> I'm not really sure that type of tool could really be anything else.
It can't be, because they've chosen to use a deep learning approach. That makes it a dead end right from the start.
> How would a model become aware of all of the various edge cases that depend on which SQL database you use or differences in language versions over time?
A lot of things that we call "edge cases" are only a problem for humans. They're not "edge cases" from the point of view of the grammar / semantics of programming languages and libraries. The way a hypothetical, better Copilot could work, is by having directly encoded grammars and semantics metadata corresponding to popular languages and tools. It could generate code in a principled and introspectable way, by having a model of the computation it wants to express and encoding it in a target language.
Of course, such a hypothetical Copilot is a harder task - someone would have to come up with a structure for explicitly representing understanding of the abstract computation the user wants to happen, and then translate user input into that structure. That's a lot of drudgery, and from my vague understanding of the "classical" AI space, there might be a bunch of unsolved problems on the way.
Real Copilot uses DNNs, because they let you ignore all that - you just keep shoving code at it, until the black-box model starts to give you mostly correct answers. The hard work is done automagically. It makes sense for some tasks, less for others - and I think code generation is one of those things where black-box DNNs are a bad idea.
> The way a hypothetical, better Copilot could work, is by having directly encoded grammars and semantics metadata corresponding to popular languages and tools. It could generate code in a principled and introspectable way, by having a model of the computation it wants to express and encoding it in a target language.
But that sounds like too much work, let's just throw a lot of data into an NN and see what comes out! /s
> and introspectable
Which most importantly means "debuggable", I assume. From what I get there doesn't seem to be any way to ad-hoc fix an NN's output.
This is my thought as well. I get the "make productive engineers even more productive" angle, but productive engineers' bottleneck isn't coding. Sure, coding up a boilerplate Go web server is tedious, but I have done it so many times that it takes me two seconds now.
On the flip side, coding can be the bottleneck for the worst kind of coder. When I first started coding, coding was hard simply because I had very few reps and was just learning to understand how to code common solutions, data structures, libraries, etc. Fast forward a few years and, if I were still struggling to understand these concepts, Copilot would be a lifeline.
I’m gonna have to disagree - coding can and does take significant amounts of time even when I know exactly what problem I am solving.
I admit that at many organizations there are so many other factors and bottlenecks, but it’s not uncommon that I find myself 8+ hours deep into a coding task that I had expected would be much shorter.
On the other hand, usually that’s due to refactoring or otherwise not being satisfied with the quality of my initial solution, so copilot probably wouldn’t help…
Hmm... I mean, these all seem like mistakes I could make and I don't think I'm the "worst kind of coder".
The currency one I learned a while back, but it's not like I intuited using integers by default.
Value being a reserved keyword, I'm not sure I'd know that and I do Postgres work as part of my myriad duties at the startup I work at. Maybe I'd make that mistake in a migration, maybe I have already.
In a way, is it much different than what we do now as engineers? I'm hard pressed to call it much of an engineering discipline considering most teams I work on barely do design reviews before they launch into writing code, documentation and meeting minutes are generally an afterthought, and the code review process while decent isn't perfect either and oftentimes relies on arcane knowledge derived over months and years of wrangling with particular <framework, project, technology>.
It's pretty neat, presumably it'll learn as people correct it, and it'll get better over time. I mean it's not even version one.
I get the concerns, but I think they're a bit overblown, and this'll be really useful for people who want to learn how to code. Sure they'll run into some bugs, but, I mean, they were going to do that anyways.
Is this any worse? Maybe not. Is it better? Absolutely not.
This kind of tool will only further entrench the production of mediocre, bug-ridden code that plagues the world. As implemented, this will not be a solution; it is an express lane in the race to the bottom.
it is a race to the bottom, and people are trying to win. any skilled trade is being turned into an unskilled job. it might suck, the results might suck, but it's more profitable, and that's what matters.
I find it is reducing my research time by providing a decent starting solution space. Especially for boring stuff where you just need to google the signature of some standard library function.
> The golang one uses a word ("value") for a field name that's been a reserved word since SQL-1999. It will work in popular open source SQL databases, but I believe it would bomb in some servers if not delimited...which it is not.
In their defense they created the table with this column before invoking the autocomplete, so they sort of reap what they sow here.
It could at least auto-quote the column names to remove the ambiguity, but it's not a compiler, is it.
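For what it's worth, delimiting identifiers is cheap to do by hand. Here is a minimal sketch in Python using the standard sqlite3 module; the quote_ident helper and the expenses table are hypothetical names made up for illustration, and SQLite itself does not treat "value" as reserved, so this only shows the quoting technique, not the failure on stricter servers.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Delimit an SQL identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
columns = ["id", "value"]  # "value" is the column name discussed above
col_list = ", ".join(quote_ident(c) for c in columns)

# Table name "expenses" is an assumption for this sketch.
conn.execute(
    f"CREATE TABLE expenses ({quote_ident('id')} INTEGER, {quote_ident('value')} TEXT)"
)
conn.execute(f"INSERT INTO expenses ({col_list}) VALUES (?, ?)", (1, "12.50"))
row = conn.execute(f"SELECT {col_list} FROM expenses").fetchone()
print(row)
```

Values still go through ? placeholders; only the identifiers are quoted by the helper.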
'user' isn't defined, should be user_name, right? Side note, 'copilot' is a decent name for this (though copilots are usually very competent, moreso than this right now). You must check the suggestions carefully. Maybe it'll make folks better at code review, lol.
That's what I thought when I first started working in text generation too. It's highly annoying people pitch their successful models with hand picked examples. It's literally the opposite of STATISTICAL learning imo.
That's for the best. We don't want products that pretend to write code for us while copying others' code without attribution, code that may not even work.
Now that they have an AI that can be trained to replicate code, it looks like the next step is training it to replicate good code. That will be non-trivial, since step one is identifying good code and they may not have much big data signal to draw from for that.
We know you can't use StackOverflow upvotes. However, they should have enough signal to identify what snippets of code have been most frequently copy-pasted from one project to another.
Question is whether that serves as a good proxy for good code identification.
This is a complex topic, mainly for two reasons: 1. it works on two layers (storage and code) 2. there is a context to take care of.
[Modern] programming languages have decimal/rational data types, which (within limits) are exact. Where this is not possible, and/or it's undesirable for any reason, just use an int and scale it manually (e.g. 1.05 dollars = int 105).
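To make the difference concrete, here is a small Python sketch of the three options: binary floats, a decimal type, and manually scaled integers. The format_cents helper is a hypothetical name for illustration and only handles non-negative amounts.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.10 + 0.20
print(float_total == 0.30)  # False

# Decimal values built from strings stay exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True

# Scaled integers: 1.05 dollars is stored as 105 cents.
price_cents = 105
total_cents = price_cents * 3


def format_cents(cents: int) -> str:
    """Render a non-negative integer number of cents as a dollar string."""
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder:02d}"


print(format_cents(total_cents))  # $3.15
```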
However, point 2 is very problematic and important to consider. How do you account for 3 items that cost 1/3$ each (e.g. if in a bundle)? What if they're sold separately? This really depends on the requirements.
My 20 cents: if you start a project, start storing currency in an exact form. Once a project grows, correcting the FP error problem is a big PITA (assuming it's realistically possible).
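One possible answer to the "3 items at 1/3$ each" question, if the requirements say the parts must add up to the bundle price, is to split integer cents and hand out the leftover cents one at a time. This is only a sketch of one convention; split_cents is a hypothetical helper, and which items get the extra cent is a business decision, not something the code can decide.

```python
from typing import List


def split_cents(total_cents: int, parts: int) -> List[int]:
    """Split a non-negative total into integer amounts that sum to the total.

    The first `remainder` parts each receive one extra cent.
    """
    if parts <= 0:
        raise ValueError("parts must be positive")
    if total_cents < 0:
        raise ValueError("total_cents must be non-negative")
    base, remainder = divmod(total_cents, parts)
    return [base + 1 if i < remainder else base for i in range(parts)]


shares = split_cents(100, 3)
print(shares)       # [34, 33, 33]
print(sum(shares))  # 100
```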
>[Modern] programming languages have decimal/rational data types
This caveat is kind of funny, in light of COBOL having support for decimal / fixed precision data types baked directly into the language.
It's not a problem with "non-modern" languages, it's a problem with C and many of its successors. That's precisely why many "non-modern" languages have stuck around so long.
Additionally, mainframes are so strongly optimized for hardware-accelerated fixed point decimal computing that for a lot of financial calculations it can be legitimately difficult to match their performance with standard commercial hardware.
> It's not a problem with "non-modern" languages, it's a problem with C and many of its successors.
Not really. Any semi-decent modern language allows the creation of custom types which support the desired behavior and often some syntactic sugar (like operator overloading) to make their usage more natural. Take C++, for example, the archetypal "C successor": It's almost trivial to define a class which stores a fixed-precision number and overload the +, -, *, etc. operators to make it as convenient as a built-in type, and put it in library. In my book, this is vastly superior to making such a type a built-in, because you can never satisfy everyone's requirements.
It is also trivial to keep doing C mistakes with a C++ compiler, hence no matter how many ISO revisions it will still have, lack of safety due to C copy-paste compatibility will never be fixed.
> [...] no matter how many ISO revisions it will still have, lack of safety due to C copy-paste compatibility will never be fixed.
Okay, no idea how that's relevant to "built-in decimal types" vs "library-defined decimal types", but if it makes you feel better, you can do the same in Rust or Python, two languages which are "modern" compared to COBOL, don't inherit C's flaws, and which enable defining custom number types/classes/whatever together with convenient operator overloading.
> Python not really as the language doesn't provide any way to keep invariants
Again, how is that relevant? If there's no way to enforce an invariant in custom data types, then there's also no way to enforce invariants in code using built-in data types.
What I meant [1] was: In Python, invariants are enforced by conventions, not by the compiler. If that's not suitable for a given use case, then Python is entirely unsuited for that use case, regardless whether it provides built-in decimal types or user-defined decimal types. That's why I said that your objection regarding invariant enforcement is irrelevant to this discussion.
> How do you account for 3 items that cost 1/3$ each (e.g. if in a bundle)?
You never account for fractional discrete items, it makes no sense. A bundle is one product, and a split bundle is another. For products sold by weight or volume, it's usually handled with a unit price, and a fractional quantity. That way the continuous values can be rounded but money that is accounted for needs not be.
My last job they wanted me to invoice them hours worked, which was some number like 7.6.
This number plays badly when you run it through GST and other things - you get repeaters.
So I looked up common practice here, even tried asking finance who just said "be exact", and eventually settled on that below 1 cent fractions I would round up to the nearest cent in my favour for each line item.
First invoice I hand them, they manually tally up all the line items and hours, and complain it's over by 55 cents.
So I change it to give rounded line items but straight multiplied to the total - and they complain it doesn't match.
Finally I just print decimal exact numbers (which are occasionally huge) and they stop complaining - because excel is now happy the sums match when they keep second guessing my invoices.
All of this of course was irrelevant - I still had to put hours into their payroll system as well (which they checked against) and my contract specifically stated what my day rate was to be in lieu of notice.
So how should you do currency? Probably in whatever form that matches how finance are using excel, which does it wrong.
I wish this was untrue, but I have spent years hearing the words "why don't my reports match?" - no amount of logic, diagrams, explaining, the next quarter or instance - "why don't my reports match?"
The “exact” version they wanted was full of approximations too. They just didn’t have enough numerical literacy to understand how to say how much approximation they are ok with.
I guarantee nothing in anyone’s time accounting system is measured to double-precision accuracy. Or at least, I’ve never quite figured out the knack myself for stopping work within a particular 6 picosecond window.
Sure, but at the end of the day someone had to pay me an integer amount of cents. They wanted a total which was a normal dollar figure. But when you sum up 7.6 times whatever a whole lot, you might get a nice round number or you might get a repeating decimal.
What's notable is clearly no one had actually thought this through at a policy level - the answer was "excel goes brrrr" depending on how they want to add up and subtotal things.
Generally what is done is that “int 1 != $0.01”; rather, it’s “int 100 = $0.01”, as in the base unit of the integer is 1/100th of a cent. That doesn’t perfectly solve your example case, though, admittedly.
There's no one answer, but decimal counts of the smallest unit that needs to be measured is common. Like pennies in the US, or maybe "number of 1/10 pennies" if there's things like gasoline tax.
For Python, I prefer decimal.Decimal[1]. When you serialize, you can either convert it to a string (and then have your deserializer know the field type and automatically encode it back into a decimal) OR just agree all numeric values can only be ints or decimals. You can pass parse_float=decimal.Decimal to json.loads[2] to make this easier.
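A short sketch of that round trip, using only the standard json and decimal modules. The payload and its field names are made up for illustration.

```python
import json
from decimal import Decimal

# Hypothetical payload; the field names are assumptions for this sketch.
payload = '{"item": "coffee", "price": 1.10, "quantity": 3}'

# parse_float is called with the original text of every JSON float,
# so the value never passes through a binary float.
record = json.loads(payload, parse_float=Decimal)
total = record["price"] * record["quantity"]
print(total)  # 3.30

# Serialize the Decimal as a string so nothing is lost on the way out.
encoded = json.dumps({"total": str(total)})
print(encoded)  # {"total": "3.30"}
```

Note that json.dumps does not know how to encode Decimal on its own, which is why the sketch converts to a string first.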
My most obnoxious and spicy programming take is that ints and decimals should be built-in and floats should require imports. I understand why though: Decimal encoding isn't anywhere near as standardized as other numeric types like integers or floating-point numbers.
> My most obnoxious and spicy programming take is that ints and decimals should be built-in and floats should require imports
I don't care about making inexact numbers require imports, but the most natural literal formats should produce exact integers, decimals, and/or rationals.
An integer of the smallest denomination. For example, cents for the American dollar. And you probably would want to wrap it in a custom type to simplify displaying it properly, and maybe handle different currencies. If your language has a fixed point type that might also be appropriate, but that's pretty rare, and wouldn't work for currencies that aren't decimal (like the old British pound system).
Yes, you can. There are algorithms for rounding up, rounding down, rounding to nearest, and banker's rounding, on the results of integer division. This is a solved problem.
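For illustration, here is how those four rounding modes might look on plain Python integers. The function names are hypothetical, and the two round-to-nearest variants assume a positive divisor.

```python
def div_floor(n: int, d: int) -> int:
    """Round the quotient down (toward negative infinity)."""
    return n // d


def div_ceil(n: int, d: int) -> int:
    """Round the quotient up (toward positive infinity)."""
    return -(-n // d)


def div_half_up(n: int, d: int) -> int:
    """Round to nearest, ties toward positive infinity (assumes d > 0)."""
    return (2 * n + d) // (2 * d)


def div_half_even(n: int, d: int) -> int:
    """Banker's rounding: ties go to the even quotient (assumes d > 0)."""
    q, r = divmod(n, d)
    if 2 * r > d or (2 * r == d and q % 2 == 1):
        q += 1
    return q


# 25 / 10 is exactly halfway between 2 and 3.
print(div_floor(25, 10), div_ceil(25, 10))         # 2 3
print(div_half_up(25, 10), div_half_even(25, 10))  # 3 2
print(div_half_even(35, 10))                       # 4
```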
GCP on the other hand has standardized on unit + nano. They use this for money and time. So a unit would be 1 second or 1 dollar, and the nano field allows more precision. You can see an example here with the unitPrice field: https://cloud.google.com/billing/v1/how-tos/catalog-api#gett...
Copy/paste the GCP doc portion that is relevant here:
> [UNITS] is the whole units of the amount. For example if currencyCode is "USD", then 1 unit is one US dollar.
> [NANOS] is the number of nano (10^-9) units of the amount. The value must be between -999,999,999 and +999,999,999 inclusive. If units is positive, nanos must be positive or zero. If units is zero, nanos can be positive, zero, or negative. If units is negative, nanos must be negative or zero. For example $-1.75 is represented as units=-1 and nanos=-750,000,000.
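A rough sketch of converting to and from that units + nanos shape in Python, following the sign rules quoted above. The function names are hypothetical and this is not GCP client code; it assumes amounts small enough that the default Decimal context precision is not exceeded.

```python
from decimal import Decimal
from typing import Tuple

NANOS_PER_UNIT = 1_000_000_000


def to_units_nanos(amount: Decimal) -> Tuple[int, int]:
    """Split an amount into whole units and nanos that share its sign."""
    total_nanos = amount * NANOS_PER_UNIT
    if total_nanos != total_nanos.to_integral_value():
        raise ValueError("amount has more than 9 fractional digits")
    total = int(total_nanos)
    sign = -1 if total < 0 else 1
    units, nanos = divmod(abs(total), NANOS_PER_UNIT)
    return sign * units, sign * nanos


def from_units_nanos(units: int, nanos: int) -> Decimal:
    """Rebuild the decimal amount from its two integer fields."""
    return Decimal(units) + Decimal(nanos) / NANOS_PER_UNIT


print(to_units_nanos(Decimal("-1.75")))   # (-1, -750000000)
print(from_units_nanos(-1, -750000000))   # -1.75
```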
It has the same issue that the other suggestion of your parent comment had: it can’t deal with fractions of cents, which is an issue you will most likely run into before you run into floating point rounding issues.
> In its base unit. So cents in USD. Which can be an int64.
Note that if you use cents in the US so that everything is an integer, then as long as you do not have to deal with amounts outside the range of roughly [-$90 trillion, $90 trillion] you can also use double. Double can exactly represent all integer numbers of cents in that range (every integer up to 2^53 in magnitude).
This may be faster than int64 on some systems, especially on systems that do not provide int64 either in hardware or in the language runtime so you'd have to do it yourself.
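The exact-integer limit of a double is easy to check: an IEEE 754 double has a 53-bit significand, so every integer up to 2**53 in magnitude is representable, and 2**53 cents is a bit over $90 trillion. A quick Python check (is_exact_as_double is a hypothetical helper name):

```python
LIMIT = 2 ** 53  # doubles carry a 53-bit significand


def is_exact_as_double(n: int) -> bool:
    """True if converting the integer to a float loses nothing."""
    return float(n) == n


print(is_exact_as_double(LIMIT - 1))  # True
print(is_exact_as_double(LIMIT))      # True
print(is_exact_as_double(LIMIT + 1))  # False: rounds to a neighbouring even value
print(LIMIT // 100)                   # 90071992547409 whole dollars
```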
Having worked on a POS system, the issue of using cents alone is if you've got something like "11% rebate" and you need to deal with fractional cents.
The arbitrary precision decimal type should be the default answer for currency until it is shown that the requirements do not, and at no time in the future will, require fractional units of the smallest denomination.
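The "11% rebate" case can be sketched with decimal.Decimal. The prices and quantities below are made up; the point is only that rounding each line to whole cents and rounding once at the end can give different totals, so the fractional cents have to live somewhere until the rounding rule is applied.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

price = Decimal("0.99")        # hypothetical item price
rebate_rate = Decimal("0.11")  # the 11% rebate from the comment above
quantity = 10

exact_rebate = price * rebate_rate  # 0.1089, a fraction of a cent survives
per_line = exact_rebate.quantize(CENT, rounding=ROUND_HALF_UP)

rounded_each_total = per_line * quantity
rounded_once_total = (exact_rebate * quantity).quantize(CENT, rounding=ROUND_HALF_UP)

print(exact_rebate)        # 0.1089
print(rounded_each_total)  # 1.10
print(rounded_once_total)  # 1.09
```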
A lot of good answers, but they mostly relate to accounting types of problems (which granted, is what you need to do with currency data 99% of the time)
I’d just add that if you are building a price prediction model, floats are probably what you need.
Depends what you’re doing. In fact it’s not always wrong to use floats for currency. For accounting you should probably use a fixed-precision decimal type.
If someone asks how to handle money the best answer is integers or fixed precision decimals. There may be a valid case for using floats, but if someone asks they shouldn't be using floats.
Also I'm hard pressed to come up with a case where floats would work. Can you give an example?
The answer is the same as _any_ time you should use floats: where you don't care about answers being exact, either (1) because calculation speed is more important than exactness, or (2) because your inputs or computations involve uncertainty anyway, so it doesn't matter.
This is more likely to be the case in, say, physics than it is in finance, but it's not impossible in the latter. For example, if you are a hedge fund and some model computes "the true price of this financial instrument is 214.55", you certainly want to buy if it's being sold for 200, and certainly don't if it's being sold for 250, but if it's being sold for 214.54, the correct interpretation is that _you aren't sure_.
When people say "you should never use floats for currency", their error is in thinking that the only applications for currency are in accounting, billing, and so on. In those applications, one should indeed use a decimal type, because we do care about the rounding behavior exactly matching human customs.
You can't use a generic decimal type in that case either! You need a special-purpose type that rounds exactly matching the conventions you're following. This is necessarily use-, culture-, and probably currency-specific.
Most things in front office use floats in my experience, e.g. derivative pricing, discounting, even compound interest. None of these things are going to be any better with integers or fixed-precision, but maybe harder to write and slower.
Yes, the risk management/instrument pricing part in the "Front Office" uses floats, because the calculations involve compound interest and discount rates.
And the downstream parts for trade confirmation ("Middle Office"), settlement and accounting ("Back Office") used fixed precision. Because they are fundamentally accounting, which involves adding things up and cross-checking totals.
These two parts have a very clear boundary, with strictly defined rounding rules when the floating point risk/trading values get turned into fixed point accounting values.
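As a sketch of what such a boundary might look like in Python: the pricing side computes in floats, and a single function converts to a fixed-point amount with an explicit rounding rule. The to_accounting name, the two-decimal quantum, the half-even rule, and going through repr() are all assumptions for illustration; a real desk would follow its own documented conventions.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def to_accounting(value: float, quantum: str = "0.01") -> Decimal:
    """Convert a float model output to a fixed-point amount at the boundary.

    Uses the float's shortest round-trip text (repr) as the decimal input,
    then applies one explicit rounding rule.
    """
    return Decimal(repr(value)).quantize(Decimal(quantum), rounding=ROUND_HALF_EVEN)


# Front office side: compound interest computed in floats.
model_price = 100.0 * (1.0 + 0.0375) ** 2.5

# Back office side: everything past this line is fixed precision.
booked = to_accounting(model_price)
print(booked)
```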
Create a Money class, or use one off the shelf. It should store the currency and the amount. There are a few popular ways of storing amounts (integer cents, fixed decimal) but it should not be exposed outside the Money class.
There's plenty of good advice in this subthread for how to represent currency inside your Money abstraction, but whatever you do, keep it hidden. If you pass around numbers as currency values you will be in for a world of pain as your application grows.
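A minimal sketch of such a Money class in Python. It stores integer minor units internally and assumes every currency has two decimal places, which is not true in general; the field and method names are illustrative, not from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    currency: str
    _minor_units: int  # internal representation, e.g. cents; not for callers

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.currency, self._minor_units + other._minor_units)

    def __str__(self) -> str:
        # Assumes two decimal places per currency (an illustration only).
        sign = "-" if self._minor_units < 0 else ""
        major, minor = divmod(abs(self._minor_units), 100)
        return f"{sign}{major}.{minor:02d} {self.currency}"


subtotal = Money("USD", 105) + Money("USD", 250)
print(subtotal)  # 3.55 USD
```

Callers only ever see Money values, so the internal representation can later change (say, to a decimal or rational type) without touching the rest of the application.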
Either a fixed-point decimal (i.e. an integer with the ones representing 1/100, 1/1000, etc. of a dollar), or a ratio type if you need arbitrary precision.
This is the better default, so I'd ditch the qualifier, personally. At the very least when it comes to the persistent storage of monetary amounts. People often start out thinking that they won't need arbitrary precision until that one little requirement trickles into the backlog...
Arbitrary precision rationals handle all the arithmetic you could reasonably want to do with monetary amounts, and they let you decide where to round at display time (or when generating a final invoice or whatever), so there's no information loss.
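As a rough illustration of rounding only at display time, here is a sketch using Python's `fractions.Fraction`. The `display` helper, the two-decimal output and the half-up rounding are assumptions picked for the example.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

# Split 100 three ways: the stored value stays exact.
share = Fraction(100) / 3
print(share * 3 == 100)  # True


def display(amount: Fraction, places: str = "0.01") -> str:
    """Hypothetical display helper: rounding happens only here."""
    # The Decimal division is limited to the context precision (28 digits
    # by default), which is plenty for a two-decimal display value.
    approx = Decimal(amount.numerator) / Decimal(amount.denominator)
    return str(approx.quantize(Decimal(places), rounding=ROUND_HALF_UP))


print(display(share))  # 33.33
```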
> Dumb question, but what is the proper way to handle currency?
In python, for exact applications (not many kinds of modeling, where floats are probably right), decimal.Decimal is usually the right answer, but fractions.Fraction is sometimes more appropriate, and if you are using NumPy or tools dependent on it, using integers (representing decimals multiplied by the right power of 10 to get the minimum unit in the ones position) is probably better.
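For anyone who hasn't seen the difference, a quick comparison of the representations mentioned above, using only the standard library (the NumPy case is represented here by plain integer cents):

```python
from decimal import Decimal
from fractions import Fraction

float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")
fraction_sum = Fraction(1, 10) + Fraction(2, 10)
cents_sum = 10 + 20  # integer cents, i.e. amounts scaled by 100

print(float_sum == 0.3)                 # False
print(decimal_sum == Decimal("0.3"))    # True
print(fraction_sum == Fraction(3, 10))  # True
print(cents_sum == 30)                  # True
```

Note that the `Decimal` values are built from strings; `Decimal(0.1)` would carry over the float's binary approximation.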
Yeah, you probably want to use some sort of decimal package for a configurable amount of precision, and then use strings when serializing/storing the values
Every front office finance project I have ever worked on has used floating point, so take the dogma with a grain of salt. It depends entirely on the context.
No, it's just that we're in the realm of predictions and modelling, not accounting. If you're constructing a curve to forecast 50 years of interest rates from a limited set of instruments, you're already accepting a margin of error orders of magnitude greater than the inaccuracies introduced by floating point.
The models also use transcendental functions which cannot be accurately calculated with fixed point, rationals, integers etc.
It's not like decimal or fixed point does not suffer from rounding errors either. In fact for many calculations, binary floating point gives more accurate answers.
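A small demonstration of that point, assuming Python's default decimal precision of 28 digits (set explicitly below so the result doesn't depend on the ambient context):

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 28  # Python's default precision, made explicit
    decimal_result = Decimal(1) / Decimal(3) * 3

float_result = 1 / 3 * 3

print(decimal_result == 1)  # False: 1/3 was truncated to 28 threes
print(float_result == 1.0)  # True: the binary rounding happens to cancel
```

This is one cherry-picked case, not a claim that binary floats are more accurate in general; it only shows that a decimal type does not make rounding error go away.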
In accounting there are specific rules that require decimal system, so one must be very careful with the floating point if it is used.
I mean, fixed point and a specific type for currency (which also should include the denomination, while we are at it) are not rocket science. Spreadsheets get that right, at least.
Excel uses IEEE-754 floating point, so I don't get what you mean with the spreadsheet comment. It has formatting around this which rounds and adds currency symbols, but it's floating point you're working with.
Rounding error doesn't matter on these types of financial applications. It's the less glamorous accounting work that has to bother with that.
They're not rocket science, but they're unnecessary, and would still be off anyway. Try and calculate compound interest with your fixed point numbers.
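To see what happens, here is a sketch comparing a closed-form float calculation with a fixed-precision one that rounds to cents every month. The principal, rate, term and rounding mode are arbitrary assumptions.

```python
from decimal import Decimal, ROUND_HALF_EVEN

principal, annual_rate, months = 1000.0, 0.05, 120

# Floating point: one closed-form expression, no intermediate rounding.
closed_form = principal * (1 + annual_rate / 12) ** months

# Fixed precision: round to cents after every monthly step.
cent = Decimal("0.01")
monthly_rate = Decimal("0.05") / 12  # itself not exactly representable
balance = Decimal("1000.00")
for _ in range(months):
    balance = (balance * (1 + monthly_rate)).quantize(cent, rounding=ROUND_HALF_EVEN)

print(closed_form)  # float result with sub-cent digits
print(balance)      # cent-rounded result; may differ from the line above
```

Neither number is "exact": the fixed-precision version just moves the rounding into every step, and the monthly rate already needs an approximation.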
Each country has a law or something similar that states how calculations on prices should be done.
The usual is to use decimal numbers with fixed precision (the actual precision varies from one country to another), and I don't know of any modern exception. But as late as the 90's there were non-decimal monetary systems around the world, so if you are saving any historic data, you may need something more complex.
These are great examples. I wrote about how this will propagate all sorts of bugs.
But my argument was that it's good enough developers may get complacent and not review the auto complete closely enough. But maybe I'm wrong! Maybe it's not that good yet.
> And these are the hand picked examples. This product seems like it needs some more thought.
Everyone's self-preservation instincts kicking in to attack Copilot is kinda amusing to watch.
Copilot is not supposed to produce excellent code. It's not even supposed to produce final code, period. It produces suggestions to speed you up, and it's on you to weed out stupid shit, which is INEVITABLE.
As a side note, Excel also uses floats for currency, so best practice and real world have a huge gap in-between as usual.
So how do you know if the code that Copilot regurgitates is almost a 1:1 verbatim copy of some GPL'ed code or not?
Because if you don't realize this, you might be introducing GPL'ed code into your proprietary code base, and that might end up forcing you to distribute all of the other code in that code base as GPL'ed code as well.
Like, I get that Copilot is really cool, and that software engineers like to use the latest and bestest, but even if the code produced by Copilot is "functionally" correct, it might still be a catastrophic error to use it in your code base due to licenses.
This issue looks solvable. Train 2 copilots, one using only BSD-like licensed software, and one using also GPL'ed code, and let users choose, and/or warn when the snippet has been "heavily inspired" by GPL'ed code.
Or maybe just train an adversarial neural network to detect GPL'ed code, and use it to warn on snippets, or...
Verbatim isn't the problem / solution. If you take a GPL'ed library and rename all symbols and variables, the output is still a GPL'ed library.
Just seeing the output of GPL'ed code spat out by Copilot and writing different code "inspired" by it can result in GPL'ed code. That's why "clean rooms" exist.
Copilot is going to make for a very interesting court case to follow, because probably until somebody sues and the courts decide, nobody will have a definitive answer as to whether it is safe to use or not.
Stack Overflow content is licensed under CC-BY-SA. Terms [1]:
* Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
* ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
In over a decade of software engineering, I've seen many reuses of Stack Overflow content, occasionally with links to underlying answers. All Stack Overflow content use I've seen would clearly fail the legal terms set out by the license.
I suspect Copilot usage will similarly fail a stringent interpretation of underlying licenses, and will similarly face essentially no enforcement.
Have you met programmers? Even those who care about quality are often under a lot of pressure to produce. Things slip through. Before, it was verbatim copies from Stack Overflow. Now it'll be using Copilot code as-is.
Not the parent, but people really like to get riled up on the same topics, over and over again, which quickly monopolizes and derails all conversation. Facebook bad, UIs suck, etc. We can now add to the list, "AI will never reduce demand for software engineering".
Copilot is definitely no replacement for anything except copying from Stack Overflow for juniors.
But in the long run, AI is basically us creating our own replacement. As a species. We don't realize it yet. It'll be really funny in retrospect. Too bad I probably won't be alive to see it.
It's true I probably wouldn't have laughed quite as loudly if there weren't a chorus of smug economists telling us that tools like this are gonna put me out of a job.
Business types hate dealing with programmers, that's a fact. And these claims of "we'll replace programmers" happen with certain precise regularity.
Ruby on Rails was advertised as so simple, startup founders who can't program were making their entire products in it in a few days, with zero experience. As if.
If I want random garbage in my codebase that I have to fix anyways I might as well hire an underpaid intern/junior.
It's easier to write correct code than to fix buggy code. For the former you have to understand the problem, for the latter you have to understand the problem, and a slightly off interpretation of it.
> Everyone's self-preservation instincts kicking in to attack Copilot is kinda amusing to watch
Nobody is threatened by this, assuredly. As with IDEs giving us autocomplete, duplication detection, etc this can only be helpful. There is an infinite amount of code to write for the foreseeable future, so it would be great if copilot had more utility.
Excel rounds doubles to 15 digits for display and comparison. The exact precision of doubles is something like 15.6 digits, those remaining 0.6 digits causing some of those examples floating (heh) around.
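The effect of a 15-digit cutoff can be imitated in Python. This only mimics the idea with string formatting; it is not a claim about how Excel implements it, and `as_15_digits` is a made-up helper.

```python
def as_15_digits(x: float) -> float:
    """Round a double to 15 significant decimal digits (hypothetical helper)."""
    return float(format(x, ".15g"))


raw = 0.1 + 0.2
print(repr(raw))                 # 0.30000000000000004
print(format(raw, ".15g"))       # 0.3
print(raw == 0.3)                # False
print(as_15_digits(raw) == 0.3)  # True
```

The stray `4` sits in the 17th significant digit, so it disappears once the value is cut to 15.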
My suggestion was a way to comment or flag, not to kill the product. These were particularly notable to me because someone hand-picked these 4 to be the front page examples of what a good product it was.
I agree with you. This is basically similar to autocomplete on cellphone keyboard (useful because typing is hard on cellphone), but for programming (useful because what we type tends to involve more memorization than prose).
I'm not surprised to be honest. I've played around with AI dungeon, which also uses GPT-3. It regularly reproduces content directly from its training material, including even comments attached to the stories they trained the ai on.
This reminds me of an issue that came up when I was working with an intelligence agency, training machine translation.
If you think about language in general, individual words aren't very sensitive. The word for bomb in any language is public knowledge. But when you start getting to jargony phrases, some might be unique to an organization. And if you're training your MT on translated documents surreptitiously intercepted from West Nordistan's nuclear program, and make your MT model public, the West Nordistanis might notice - "hey, this accurately translates our non-public documents that contain rather novel phrases ... I think someone's been listening to us!"
Even includes the commented out code. Clearly Copilot has gained a deep understanding of code and is not simply the slowest way to make a terrible, opaque search engine ever!
From the tweet it looks like an awesome search feature. Just type what you wanted to search for right inline and then it can drop the result in without you ever changing a window or moving a hand to the mouse.
Problem is you don't know whose code you're stealing, which leads to all sorts of legal, security, and correctness issues.
No. GitHub Copilot tries to understand your intent and to generate the best code it can, but the code it suggests may not always work, or even make sense. While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code. As the developer, you are always in charge.
Naively, as someone who just heard of this - that sounds worse than useless. If you can't trust its output and have to verify every line it produces and that the combination of those lines does what you wanted, surely it's quicker just to write the code yourself?
Just today I needed to quickly load a file into a string in golang. I haven't done that in a while, so I had to go look up what package and function to use for that. I'd love a tool that would immediately suggest a line saying `ioutil.ReadFile()` after defining the function. I would never accept a full-function suggestion from Copilot, similarly to how I never copy and paste code verbatim from StackOverflow. Using it as hints for what you might want to use next seems like a nice productivity boost.
It’s quite literally stealing code from repos under a GPL license and suggesting them to people regardless of license (if any) they’re using. I do not see how this is legal.
I disagree with this attitude. Many demos such as this one with Quake code are intentionally looking for (funny) outliers by bending the rules. But this is not how anyone would use the system in a real scenario (no one should select license by typing "// Copyright\t" and selecting whatever gets auto-completed), so it doesn't really demonstrate any new limits besides what you could reasonably expect anyway (and what's mentioned on the Copilot's landing page).
Basically, in order to fall victim for this "code theft" (or any other "footguns" from Twitter threads) you'd need to be actively working against all the best practices and common sense. If you actually use it as a productivity tool (the way it is marketed) you'll remain in full control of your code.
Sure, people should double check the code I don’t disagree, but a proprietary code suggestion tool shouldn’t be suggesting licensed code at all; let alone GPLd code unless they can somehow verify the code base they are suggesting it become part of is GPL. That is the problem here and I don’t see how you can disagree with that.
This claim that "AI" only means artificial general / human-equivalent intelligence completely ignores the long history of how that term has been used, by computer science researchers, for the last 70-odd years, to include everything from Shannon's maze-solving algorithms, to Prolog-y systems, to simple reinforcement learning, and so on.
It's true that there has been linguistic drift in the direction of the definition getting narrower (to the point where it's a joke that some people use 'AI' to mean whatever computers can't do _yet_). And you can have reasons to prefer your own very-narrow definition. But claiming that your own definition is the only valid one, to the point that anyone using a wider definition (one that has a long etymological history, and which remains in widespread use) is "dumb" is... not how language works.
The clever OpenAI marketing hype squad on HN and Twitter know that they are re-selling a snake oil contraption. This 'thing' completely needs assistance from a human since it is producing insecure code, code that is also copyrighted and most of the times garbage from other sources, which is again totally dangerous.
Just look at this [0]: make a simple 'typo' in the signature and the whole implementation is wrong.
I have to say that OpenAI, GitHub and Microsoft are very clever in selling this scam to engineers who use the code produced by this contraption as 'safe to use' in their projects; especially since GPT-3 still cannot explain why it is generating the code it's generating, or if the code is under a license that is non-commercial or under a restrictive license.
it's the marketing magic bullet. each person shot is entranced by its promises, and given unlimited ammo to spread its lies. few possess armor capable of stopping them
This is utterly damning. I have already instructed my team that Copilot can never be used for our projects. Compromising the product because of unknowable license demands isn't acceptable in the professional world of software engineering.
But if we put the licensing to one side for a moment...
1/ Everything I've seen it generate so far is 'imperative hell'. It is practically a 'boilerplate generator'. That might be useful for pet projects, smaller code bases, or even unit-test writing. But large swathes of application code looking like the examples I've seen so far is hard to manage.
2/ The boilerplate is what bothers me the most (as someone who believes in the declarative approach to software engineering). The future for programming and programming languages should be an attempt to step up to a higher level of abstraction, that has been historically the way we step up to higher levels of productivity. As applications get larger and code-bases grow significantly we need abstraction, not more boilerplate.
3/ As someone who develops a functional framework for C# [1], I could see Copilot essentially side-lining my ideas and my approach to writing code in C#. Not just style, but choice of types, etc. I wonder if the fall out of what is Copilot's 'one true way' of generating code was ever considered? It appears to force a style that is at odds with many who are looking for more robust code. At worst it will homogenise code "people who wrote that, also wrote this" - stifling innovation and iterative improvements in the industry.
4/ Writing code is easy. Reading and understanding code written by another developer is hard. Will we spend most of our time as code-reviewers going forwards? Usually, you can ask the author what their intentions were, or why they think their approach is the correct one. Copilot (as far as I can tell) can't justify its decisions. So, beyond the simple boilerplate generation, will this destroy the art of programming? I can imagine many juniors using this as a crutch, and potentially never understanding the 'why'.
I'm not against productivity tools per se; it's certainly a neat trick, and a very impressive feat of engineering in its own right. I am however dubious that this really adds value to professional code-bases, and actively may decrease code quality over time. Then there's the grey area of licensing, which I feel has been totally brushed to one side.
>2/ The boilerplate is what bothers me the most (as someone who believes in the declarative approach to software engineering). The future for programming and programming languages should be an attempt to step up to a higher level of abstraction, that has been historically the way we step up to higher levels of productivity. As applications get larger and code-bases grow significantly we need abstraction, not more boilerplate.
Just the other day someone on copilot threads was arguing that this kind of boilerplate optimizes for readability... It's like Java Stockholm syndrome and the old myth of easy to approach = easy to read (how long it took them to introduce var).
I've always viewed code generators as a symptom of language limitations (which is why they were so popular in Java land) that lead to unmaintainable code, this seems like a fancier version of that - with all the same drawbacks.
I'm all for abstracting. I like Rails, for example. That said, it gets truly difficult to add or change stuff at the more abstract layers. For example, adding recursive querying to an existing ORM is tough. And on the rare occasion that there is a bug in the abstract layer, debugging that from the normal application code is also tough.
I understand why some corporations prefer dumb boilerplate everywhere for some applications. If there is an outage it's usually easy to fix quickly. Sometimes it's not, if it's an issue in the boilerplate (say, Feb 29 rolls around and all of the boilerplate assumed a 28 day month) that means a huge update all across the system, but that rarely happens in practice.
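For what it's worth, the Feb 29 case is exactly the kind of thing a single shared helper handles once, instead of in every copy of the boilerplate. A sketch in Python, where `days_in_month` is a made-up wrapper around the standard library:

```python
import calendar


def days_in_month(year: int, month: int) -> int:
    """Hypothetical shared helper instead of a hard-coded 28."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    return calendar.monthrange(year, month)[1]


print(days_in_month(2023, 2))  # 28
print(days_in_month(2024, 2))  # 29
```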
I would say ORM is tough with code gen or with metaprogramming because it maps two mismatched paradigms (OOP and relational) and tries to paper over the differences.
I do agree on the debugging aspect - especially in dynamic languages - metaprogramming stack traces can be really hard to follow.
Tools like xsd or T4 (in the .NET ecosystem) are great time-savers, but you would never consider directly modifying the code they generate. You would leave the generated code untouched (in case it ever needed to be generated again) and subclass it to make whatever changes you intend.
I think Copilot is so unfortunate because it's not building abstractions and expecting you to override parts of them. It's acting as an army of monkeys banging out Shakespeare on a typewriter. And the code it generates is going to require an army to maintain.
Even there I feel like code generators are just a band-aid around the fact that metaprogramming facilities suck. If you would never modify the generated code, why generate it in the first place? You could argue that stack traces are easier to follow, but TBH generated code is rarely pretty in that regard either.
For example I think F# idea of type providers > code generators.
Linq2Db is a great example of T4 code generation that works. It creates partial classes from database schema. Together with C# I have strongly typed database access.
Awesome summary and thanks for trying it for the rest of us!
Copilot sounded terrible in the press release. The idea that a computer is going to pick the right code for you (from comments, no less) is really just completely nuts. The belief that it could be better than human-picked code is really way off.
You bring up a really important point. When you use a tool like Copilot (or copypasta of any kind), you are introducing the additional burden of understanding that other person's code -- which is worse than trying to understand your own code or write something correct from scratch.
I think you've hit the nail on the head. Stuff like Copilot makes programming worse and more difficult, not better and easier.
While I accept most of the concerns, it's better than your comment suggests. I see some promise for it as a tool for reminding you of a technique or inspiring you to a different approach than you've seen before.
For example, I wrote a comment along the lines of "Find the middle point of two 2D positions stored in x, y vectors" and it came up with two totally different approaches in Ruby - one of which I wouldn't have considered. I did some similar things with SQL, and some people might find huge value in it suggesting regexes, too, because so many devs forget the syntax and a reminder may be all it takes to get out of a jam.
I'm getting old enough now to see where these sorts of prompts will be a game changer, especially when dabbling in languages I'm not very proficient in. For example, I barely know any Python, so I just created a simple list of numbers, wrote a "Sort the numbers into reverse order" comment, and it immediately gave me the right syntax that I'd otherwise have had to Google, taking much longer.
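The comment doesn't quote what Copilot actually produced, but a suggestion for that prompt would presumably look something like this (the list contents are made up):

```python
numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# Sort the numbers into reverse order
descending = sorted(numbers, reverse=True)  # returns a new list
print(descending)  # [9, 6, 5, 4, 3, 2, 1, 1]

numbers.sort(reverse=True)  # same ordering, but sorts the list in place
```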
Maybe to alleviate the concerns it could be sandboxed into a search engine or a separate app of its own rather than sitting constantly in my main editor - I would find that a fair compromise which would still provide value but require users to engage in more reflection as to what they're using (at least to a level that they would with using SO answers, say).
Yeah, but... I mean, I guess we all agree that copying code from, let's say StackOverflow without checking if it really does what you want it to do is a bad thing? Now here we have a tool that basically automates that (except it's copying from GitHub, not StackOverflow), and that's supposed to be a good thing? Even if its AI is smarter, you would still have to check the code it suggests, and that can actually be harder than writing it yourself...
The big boost, that I think parent is alluding to, is for rusty (not Rust!) languages in the toolbox, where you may not have the standard library and syntax loaded into your working memory.
As a nudge, it's a great idea. As a substitute for vigilance, it's a terrible idea.
I suspect that's why they named it Copilot instead of Autopilot, but it's unfortunately more likely to be used as the latter, humans being humans.
Right, so it might occasionally be useful as a search tool for divergent ideas of different approaches to a problem, and your suggestion to sandbox it in a separate area works for that.
But that does not seem to be its advertised or configured purpose, sitting in your main editor.
This is good stuff. As a search engine, it could very well be useful. As another poster pointed out, if some context or explanation were provided along with the source suggestions, its utility as a reference would really grow.
I totally agree with you that prompted help is a big deal and just going to get bigger. We have developed a language for fact checking called MSL that works exactly this way in practice -- suggesting multiple options rather than just inserting things.
One of the things that interests me about this thread is the whole topic of UI vs. AI and how much help really comes from giving the user options (and a good UI to discover them) vs how much is "AI" or really intelligence. I think the intelligence has to belong to the user, but a computer can certainly sift through a bunch of code to find a search engine result, and those results could be better than you get now from Google & Co.
If they're using something like GPT-3 on the backend, which they probably are, it probably can't provide any explanations or context (unless the output is memorized training data, like this); the output can be somewhat novel code not from any particular source, and while it might be possible to find relevant information on similar code, this would be a hard problem too.
It's really weird for software engineers to judge something by its current state and not by its potential state.
To me, it's clearly solvable by Copilot filtering the input code by that repository's license. It should only be certain open source licenses, maybe even user-choosable, or code-creators can optionally sublicense their code to Copilot in a very permissible way.
Secondly, a way for the crowd to code review suggestions would be a start.
I've been in the business a long time and I just don't believe in generalized AI at all. Writing code requires general (not artificial) intelligence. All of these "code helping" tools break down quickly because they may be searching for and finding relevant code blocks (the "imperative hell" referred to by another commenter), but they don't understand the context or the overall behavior and goals of the program.
Writing to overall goals and debugging actual behavior are the real work of programmers. Coming up with syntax or algorithms are 3rd and 4th on the priority list because, let's face it, it's not that hard to find a reference for correct syntax or the overall recipe implied by an algorithm. Once you understand those, you can write the correct code for your project.
I do think Copilot has potential as a search engine and reference tool -- if it can be presented that way. But the idea of a computer actually coming up with the right code in the full context of the program seems like fantasy.
If we're coming up with potential uses, I think they got the direction wrong.
Don't tell me what to do, tell me what not to do. "this line doesn't look like something that belongs in a code base", "this looks like a line of code that will be changed before the PR is merged". Etc.
That would be fantastic! Imagine if it could catch common errors before you make them. So many things in loops and tests that we mess up all the time. My favorite is to confuse iterating through an array vs an object in JS. I'd love to have Gazoo step in and say, "Don't you mean, this, David?"
> It's really weird for software engineers to judge something by its current state and not by its potential state.
No, we're not afraid of Copilot replacing us. The thought is ridiculous, anyway. If it actually worked, we would be enabled to work in higher abstractions. We'd end up in even higher demand because the output of a single engineer would be so great that even small businesses would be able to afford us.
Yes, we are afraid of Copilot making the entire industry worse, the same way that "low-code" and "no-code" solutions have enabled generations of novices to produce volumes of garbage that we eventually have to clean up.
Practically every open source license requires attribution. If Copilot has a licensing issue, training a model on only repositories with the same license won't fix it, except for the extremely rare licenses which do not require attribution.
Could they handle this by generating a collective attribution file that covers every (permissively licensed) repository that Copilot learned from?
Of course this would be massive, so from a practical consideration the attribution file that Copilot generates in the local repository would have to just link to the full file, but I don't think that would be an issue in and of itself.
Maybe? Might depend on the license, I doubt the courts would be amused.
Almost certainly a link would not suffice, basically every license requires that the attribution be directly included with the modified material. Links can rot, can be inaccessible if you don't have internet access, can change out from underneath you, etc.
Makes sense. Maybe something like git-lfs/git-annex would be sufficient to address the linking issue, but it seems like the bigger concern is whether a court would accept this as valid attribution. In a sense it reminds me of the LavaBit stunt with the printed key.
I think a judge could be persuaded that a list of every known human does not constitute a valid attribution of the actual author, even though their name is on the list. The purpose of an attribution is to acknowledge the creator of the work, and such a list fails at that.
Makes sense. That's probably the best interpretation here. Any other decision would make attribution lists optional in general for all practical purposes.
> Stuff like Copilot makes programming worse and more difficult, not better and easier.
Copilot makes programming worse and more difficult if you're aiming for a specific set of coding values and style that Copilot doesn't generate (yet?). If Copilot generates the sort of code that you would write, and it does for a lot of people, then it's definitely no worse (or better) than copying something from SO.
The author of a declarative, functional C# framework likely has very different ideas to what code should be than some PHP developer just trying to do their day-to-day job. We shouldn't abandon tools like Copilot just because they don't work out at the more rigorous ends of the development spectrum.
>If Copilot generates the sort of code that you would write, and it does for a lot of people, then it's definitely no worse (or better) than copying something from SO.
Disagree.
Most SO copy-paste must be integrated into your project -- maybe it expects different inputs, maybe it expects or works with different variables -- whatever, it must be partially modified to work with the existing code-base that you're working with.
Copilot does the integration tasks for you. When one might have had to read through the code from SO to understand it enough to integrate it, the person using Copilot need not even invest that much understanding.
Because of these workflow differences, it seems to me as if Copilot enables an even more low-quality workflow than offered by copy-pasting from SO and patching together multiple code-styles and paradigms while hoping for the best; Copilot does that without even the wisdom that an SO user might have that 'this is a bad idea.'
I'm not firmly for or against the concept of Copilot, but it's fascinating to me that it will introduce an entirely new class of bugs. Rather than specific mistakes in certain blocks of code and edge case errors in handling certain inputs, now we're going to have lazy/overworked/junior developers getting complacent and committing code they haven't reviewed that isn't even close to their intent. Like you could have a backend method that was supposed to run a database query, but instead it sends the content of an arbitrary variable in a POST request to a third-party API or invokes a shell to run `rm -rf /`.
To me, the most interesting aspect is the new class of supply chain security vulnerabilities it will create. How people will act to exploit or protect¹ against those will be very interesting.
1 - I don't expect "not using a tool that generates bad code" to be the top option.
The arguments that the GP makes are not based on a specific style or value of coding. Instead, they're based on the simple truth that it is harder to understand code that somebody else wrote.
In some cases the benefits of doing so outweigh the costs (such as using a stack overflow answer that's stood the test of time for something you don't know how to do), but with Copilot you don't even get the benefit of upvotes, human intent, or crowdsourced peer review.
I don't think they work out past trivial applications. Any non trivial app requires an understanding of a much larger part of the codebase than a tool like Copilot is looking at at any one time.
Copilot does not understand the code in toto and is therefore really useless for debugging (70% of all coding) and probably useless for anything other than very simple parts of an app.
> Any non trivial app requires an understanding of a much larger part of the codebase than a tool like Copilot is looking at at any one time.
I don't think that's important. Copilot, at least as it's been demo'd so far judging by the examples, is to help you write small, standalone functions. It shouldn't need to know about the rest of the application. Just as the functions that you write yourself shouldn't need to know about the rest of the application either.
If your functions need a broad understanding of the codebase as a whole how the heck do you write tests that don't fail the instant anything changes?
The reality of code is that stuff breaks when connected to other stuff, as it eventually must be for real work to happen. There's no getting around that.
Since that's where the work of programming is, debugging connected applications (not writing fresh, unencumbered code, a rare luxury), a tool that offers no help for that is, well, not much help.
5/ Boilerplate is easy to write but expensive to maintain in large quantities. Proper abstraction/templating requires careful thinking. Copilot encourages the first and discourages the second.
6/ Copilot learns from the past. It can only favor popularity and familiarity in code patterns over correctness and innovation.
I'm not sure we should throw the baby out with the bath water here due to the large blurbs it stubs in when it doesn't have a lot to go on in mostly empty files. It is a preview release. They are working on proper attribution of suggested code and explainability [1]. Having a stochastic parrot that types faster than I do would be useful in a lot of cases.
Yes, better layers of abstraction could make us more productive in the future, but we're not there yet. By all means, don't accept the larger blurbs it proposes, but there is productivity to be gained in the smaller suggestions. If it correctly intuits the exact rest of the line that you were thinking of, it will save time and not make you lose understanding of the program.
In some areas complete understanding and complete code ownership is required but in a lot of places, it's not. If it produces the work of a moderately skilled developer it would be sufficient. I don't remember all code I write as time passes. If it produces work that I would have produced, then I don't see how that's any different than work that was produced by my past self.
It may feel offensive but a lot of the comments against it sound like rage against the machine/industrialization opponents and the arguments sound pretty similar to those made in the past by those that had their jobs automated away. I'm not sure we're all as unique snowflakes as we like to think we are. Sure, there will be some code that requires an absolute master that is outside the capabilities of this tool. But I'd guess there is a massive amount of code that doesn't need that mastery.
For small snippets that have likely been already written by someone else, this probably works great. For those though, the time savings is probably at most 5-10 min down to 1 or less. The challenge is that that’s not where my time goes unless I’m working in an unfamiliar language.
As someone who writes a lot of code quickly, I’m usually bottlenecked by reviews. For more complex changes I’m bottlenecked by understanding the problem and experimenting with solutions (and then reviews, domain-specific tests usually, fixing bugs etc). Writing code isn’t like waiting for code to compile since I’m not actually ending up task switching that frequently.
This does sound like a fantastic tool when I’m not familiar with the language although I wonder if it actually generates useful stuff that integrates well as the code gets larger (eg can I say “write an async function that allocates a record handle in the file with ownership that doesn’t outlive the open file’s lifetime”). I’m sure though that this is what a lot of people are overindexing on. For things like that I expect normal evolution of the product will work well. For things like “cool, understand your snippets but also weight my own codebase higher and understand the specifics of my codebase”, I think there’s a lot of groundbreaking research that would be required. That is what I see as a true productivity boost - I’d make this 100% required for anyone joining the codebase. The more mentorship can be offloaded, the lower the cost is to growing teams. OSS projects can more easily scale similarly.
This is a little off topic, but your framework looks really interesting! How come you opted for building a functional framework in C#, vs using F#? I couldn’t see anything in the README about what was specifically frustrating about F#? I ask because we’re looking at introducing it at my company.
I cofounded a company in 2005, the primary product is a never-ending C# web-application project. As the code-base grew to many millions of lines of code I started to see the very real problems of software engineering in the OO paradigm, and had the functional programming enlightenment moment.
We started building some services in F#, but still had a massive amount of C# - and so I wanted the inertia of my team to be in the direction of writing declarative code. There wasn't really anything (outside of LINQ) that did that, so I set about creating something.
We don't write F# any more and find functional C# (along with the brilliant C# tooling) to be very effective for us (although we also now use PureScript and Haskell).
I do have a stock wiki post on the repo for this though [1]. You might not be surprised to hear it isn't the first time I've been asked this :)
Generating code has never been a problem for developers :)
I'd be more interested in a tool that notices patterns and boilerplate. It could offer a chance for generalization, abstraction or use of a common pattern from the codebase. This is of course much harder.
Great points. Really makes me question why so many developers were excited / worried about programming jobs being automated away by this technology. I really doubt that many jobs are going to be displaced by what is at best an improvement to autocomplete/intellisense and at worst an unreliable, copyright infringing boilerplate generator. Also agree with point #3 - I could see Copilot steering devs away from new code patterns toward whatever was most commonly seen in the existing codebases it was trained on. Doesn't seem good for innovation in that sense.
I'm inclined to agree with you, and actually I'm rather mistrustful of even basic autocomplete ever since a colleague caught me using it without even looking at the screen!
But I wonder...
Is this a difference of programmer culture?
I think there are people who write successful computer programs for successful businesses without delving into the details. Without considering all the things that might go wrong. Without mapping the code they're writing to concepts.
More seriously, when I think back to when I was first learning programming - in the heady days of 1985 - I would often copy listings out of computing magazines, make a mistake whilst doing it, and then have no idea what was wrong. The only way was to check character by character. I didn't have the deeper understanding yet, and so I couldn't contribute to solving the problem in any real way.
If they're at that level as a programmer, to the point where their code is being written for them and they don't really understand it, then they're going to make some serious mistakes eventually.
If you want to step up as a dev, understanding is key. Programming is hard and gets harder as you step up and bite off bigger and more complex problems. If you're relying on the tools to write your code, then your job is one step away from being automated. That should be enough to light a fire under your ambition!
I also typed stuff in from magazines in the 80’s, and my fast but imperfect typing really helped me learn programming: I often had to stop, go back to the first page, and actually read the damned thing in order to make it work.
It would certainly alleviate the license concerns. If it was possible to train it to a level (that produces effective output), then sure.
As a thought experiment, I thought "what would happen if we trained it on our 15 million lines of product code + my language-ext project". It would almost certainly produce something that looks like 'us'.
But:
* It would also trip over a million or so lines of generated code
* And the legacy OO code
* It will 'see' some of the extreme optimisations I've had to build into language-ext to make it performant. Something like the internals of the CHAMP hash-map data-structure [1]. That code is hideously ugly, but it's done for a good reason. I wouldn't want to see optimised code parroted out upfront. Maybe it wouldn't pick up on it, because it hasn't got a consistent shape like the majority of the code? Who knows.
Still, I'd be more willing to allow my team to use it if I could train it myself.
> This would enforce bad behaviors and make it even harder for fresh developers to argue against it.
I think this is a significant point. It maintains the status quo. We change our guidance to devs every other year or so. New language features become available, old ones die, etc. But we're not rewriting the entire code-base every time, we know if we hit old code, we refactor with the new guidance; but we don't do it for the sake of it, so there's plenty of code that I wouldn't want in a training set (even if I wrote it myself!)
That would actually be potentially useful, it could do a kind of combination of autocompletion of internal libraries, automatic templates for common patterns and internal style/linting type tasks all in one. Certainly augmenting those other things.
It would be interesting to know how much code you would need before it was useful (and how good does it have to be to be useful? Does even a small error rate cost so much that it erases other gains, because so many of the potential errors in usage of this type of tool are very subtle?)
That sounds interesting, though it still feels like it would need work. Like a way to annotate suggestions with comments, or flag them. Definitive licensing shown for each snippet. A way to mark deprecated code as deprecated to the training algorithm, etc.
If you find yourself copying code someone else in your organization wrote rather than abstracting it to a function in a shared library or building a more declarative framework to manage the problem, something horrible has happened.
Sometimes boilerplate is unavoidable. As an example, how do you send a GET request with libcurl in C with an authorization header? I can't tell you offhand, but I can tell you the file in my codebase that does have it, because I've duplicated the logic for two separate systems.
So you are saying you would rather every project in the world have at least one--if not, thanks to making it easier via Copilot, many--copies of this code rather than one shared library that provides a high-level abstraction for libcurl?... At least for your own code, how did you end up with two copies of duplicated logic rather than a shared library of functionality?
> So you are saying you would rather every project in the world have at least one--if not, thanks to making it easier via Copilot, many--copies of this code.
Absolutely not, not at all. I'm suggesting that copying and pasting happens, particularly in the context of a single project.
> At least for your own code, how did you end up with two copies of duplicated logic rather than a shared library of functionality?
At what point is it worth introducing an abstraction rather than copying? Using my libcurl example, you can create an abstraction over the ~10 lines of initialization, but if you need to change it to a POST, then you're just implementing an abstraction over libcurl, which is just silly.
If you have 10 lines of repeated code with one line changed to make it GET vs POST, introducing an abstraction isn't "silly": it is simultaneously both ergonomic and advantageous. Not only is libcurl's API extremely verbose (as it is a low-level primitive), but if you ever need to add another line of code to that initialization--which totally happens over the years, due to various security extensions you might need to either enable or disable with respect to acceptable TLS settings, or to tune performance parameters related to connection caching, or to add a header to every request (for any number of reasons from debugging to authentication)--you can do it in one place instead of umpteen places.
The libcurl API is itself a leaky abstraction of the underlying TLS libraries in places, so if you ever realize you need to switch SSL libraries (a space in which there has been absolute upheaval in recent years) you are going to reach for shared abstractions; and like... to take this to its ultimate conclusion: I use libcurl as a fallback for Linux, but if you want to correctly support the user's settings for proxy servers--which are sometimes needed for your requests to work at all--my code is abstracted so I can plug in entirely different HTTP backends instead of libcurl, such as Apple's CFNetwork (which you absolutely should be using if at all possible on iOS).
You act like abstraction is somehow a bad thing or some inherent cost you want to avoid, when it should absolutely take you less time to wrap duplicated code into a function than to duplicate it in the first place, and if IDE features (including Copilot) are somehow making you think it is easier to throw a ton of duplicated code everywhere, that is part of the argument for why those features are dangerous... they are apparently undermining all the work people put into refactoring code browsers that are designed to help users locally manage abstraction instead of mitigating poor architecture :/.
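To make the "wrap it in a function" argument above concrete, here is a minimal sketch. It uses Python's standard `urllib.request` instead of libcurl, so it is an analogy rather than the libcurl code being discussed. The names `build_request`, `fetch`, `fake_transport`, the example URL, and the `User-Agent` value are all made up for illustration.

```python
import urllib.request
from typing import Callable, Optional

# A transport is anything that can turn a prepared request into response bytes.
# Swapping the transport is the analogue of plugging in a different HTTP backend.
Transport = Callable[[urllib.request.Request], bytes]


def build_request(method: str, url: str, token: str,
                  body: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) an authorized request; shared setup lives here."""
    request = urllib.request.Request(url, data=body, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    # A header added here reaches every caller (hypothetical value).
    request.add_header("User-Agent", "example-client/1.0")
    return request


def fetch(method: str, url: str, token: str, transport: Transport,
          body: Optional[bytes] = None) -> bytes:
    """Send a request through whichever transport the caller supplies."""
    return transport(build_request(method, url, token, body))


def fake_transport(request: urllib.request.Request) -> bytes:
    """Stand-in backend that performs no network I/O (for illustration only)."""
    return f"{request.get_method()} {request.full_url}".encode()


# GET and POST differ only in the arguments, not in duplicated setup code.
get_req = build_request("GET", "https://api.example.com/items", token="secret")
post_req = build_request("POST", "https://api.example.com/items", token="secret",
                         body=b"{}")
```

Whether this is worth it for two call sites is exactly what the two commenters disagree on; the sketch only shows that the GET/POST difference can be a parameter.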
> programming languages should be an attempt to step up to a higher level of abstraction
Adding abstraction buries complexity. If all you do is keep adding more abstractions, you end up with an overcomplicated, inefficient mess. Which is part of why application sizes are so bloated today. People just keep adding layers, as long as they have room for more of them. Everything gets less efficient and definitely not better.
The right way to design better is to iterate on a core design until it cannot be any simpler. All of the essential complexity of software systems today comes from 40 year old conventions. We need a redesign, not more layers.
One example is version management. Most applications today can implement versioned functions and keep multiple versions in an application, and track dependencies between external applications. Make a simple DAG of the versions and let apps call the versions they were designed against, or express what versions are compatible with what, internally. This would make applications infinitely backwards-compatible.
The functionality exists right now in GNU Libc. You can literally do it today. But rather than do that, we stumble around replacing entire environments of specific versions of applications and dependencies, because we can't seem to move the entire industry forward to new ideas. Redesign is hard, adding layers is easy.
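As a rough illustration of the "keep multiple versions and let callers pick the one they were designed against" idea, here is a toy registry in Python. It is only a loose analogy and not how GNU Libc's symbol versioning works; it also leaves out the compatibility DAG. The names `versioned`, `call`, and the `area` functions are hypothetical.

```python
from typing import Callable, Dict, Tuple

# (function name, version) -> implementation. All names here are hypothetical.
_registry: Dict[Tuple[str, int], Callable[..., object]] = {}


def versioned(name: str, version: int):
    """Decorator that registers an implementation under a name and version."""
    def register(func):
        _registry[(name, version)] = func
        return func
    return register


def call(name: str, version: int, *args):
    """Call the exact version the caller was built against."""
    return _registry[(name, version)](*args)


@versioned("area", 1)
def area_v1(width, height):
    return width * height


@versioned("area", 2)
def area_v2(width, height, scale=1):
    # Newer behaviour lives alongside the old one instead of replacing it.
    return width * height * scale * scale


old_result = call("area", 1, 2, 3)      # an old caller keeps the v1 behaviour
new_result = call("area", 2, 2, 3, 2)   # a new caller opts into v2
```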
> Adding abstraction buries complexity. If all you do is keep adding more abstractions, you end up with an overcomplicated, inefficient mess. Which is part of why application sizes are so bloated today. People just keep adding layers, as long as they have room for more of them. Everything gets less efficient and definitely not better.
Presumably you're writing code in binary then? This is a non-argument, because there's evidence that it's worked. Computers were first programmed with switches and punch cards, then tape, then assembly, then low level languages like C, then memory managed languages etc.
Abstraction works when side-effects are controlled. Composition is what we're after, but we must compose the bigger bits from smaller bits that don't have surprises in. This works well in functional programming, a good example would be monadic composition: monads remove the boilerplate of dealing with asynchrony, value availability, list iteration, state management, environment management, etc. Languages that have first-class support for these tend to have significantly less boilerplate.
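For readers who haven't seen monadic composition, here is a small sketch of the "value availability" case in Python. It treats `None` as "no value", which is a simplification (a real Maybe/Option type distinguishes a missing value from a present `None`), and the helper names `bind`, `parse_int`, `reciprocal`, and `parse_reciprocal` are invented for this example.

```python
from typing import Callable, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def bind(value: Optional[A], step: Callable[[A], Optional[B]]) -> Optional[B]:
    """Run `step` only if a value is available; otherwise propagate None."""
    return None if value is None else step(value)


def parse_int(text: str) -> Optional[int]:
    """Return the parsed integer, or None if the text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


def reciprocal(n: int) -> Optional[float]:
    """Return 1/n, or None when n is zero."""
    return None if n == 0 else 1.0 / n


def parse_reciprocal(text: str) -> Optional[float]:
    # No explicit "is None" checks between the steps: bind handles them.
    return bind(bind(text, parse_int), reciprocal)
```

Languages with first-class support (Haskell's `do` notation, C# LINQ query syntax, F# computation expressions) hide even the nested `bind` calls, which is where the boilerplate saving comes from.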
The efficiency argument is also off. Most software engineering teams would trade some efficiency for more reliable and bug free code. At some point (and I would argue we're way past it) programs become too complex for the human brain to comprehend, and that's where bugs come from. That's why we're overdue an abstraction lift.
Tools like Copilot almost tacitly agree, because they're trying to provide a way of turning the abstract into the real, but then all you see is the real, not the abstract. Continuing the assault on our weak and feeble grey matter.
I spent the early part of my career obsessing over performance on crippled architectures (Playstation 3D engine programmer). If I continued to write applications now like I did then, nothing would go out the door and my company wouldn't exist.
Of course there are times when performance matters. But the vast majority of code needs to be correct first, not the most optimal it can be for the architecture.
That's an extreme form of simplification. Simplification is not performance optimization at all; it's removing non-essential complexity. You can still have abstraction and layers, monads and features. The thing is not to keep adding them when a refactor makes them redundant.
Like I say, our designs are ancient and lack features; we need to add more stuff to the code. But that will enable us to remove abstractions that were added only because our previous designs were crap.
Probably an excellent reminder that both Google and Microsoft decided to use your private emails for a training set to create Smart Reply behavior that can "write emails for you", and they swore up and down there's no way that could ever leak private information.
We need legislation banning companies from ingesting data into AI training sets without explicit permission.
GitHub clearly stated they only used publicly available repos in this project. However, as many people are rightfully pointing out, those projects might still be either closed source or copylefted, and if Copilot regurgitates chunks of those projects, people who use it may be subject to infringement lawsuits in the future.
Is Copilot aimed at programmers or at non-technical hiring managers?
I mean, it goes right along with the narrative devaluing programming that has been going around for the last couple of years. To the "anyone can code" narrative we are adding "more so, if they have an AI-assisted Copilot".
Copilot is one of the worst ideas that have made it to production in recent years. I predict it will be quite successful considering Microsoft's track record.
Honestly, I see this exact issue as the main accomplishment of Copilot. It shows that the black-box machines are to be considered harmful and are incompatible with the current intellectual property and privacy frameworks.
This issue goes way beyond just code - imagine GPT-like systems being used in medical diagnosis and results can suddenly depend on the date of the CT-scan or the name of patient, because the black-box simply regurgitates training data...
Have you read the MIT license? It explicitly says: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."
Plenty of people probably copy-paste GPL code with the comments and stick MIT on it. This kind of thing violates the GPL, but I’m pretty sure (IANAL) that such code is “fruit of the poisonous tree”, and if you then copy it, you too can be held responsible. Sure, you might not get caught, but it’s a rough situation if you do.
Very interesting that this was posted as I literally JUST watched an even MORE interesting youtube upload about this very bit of code just last weekend.
Here's the very fun video if anyone wants to take a look:
The irony is that we're whinging about a tool that generates code that will be difficult to understand in the future...
... and the example is mathematically- and floating-point-spec obtuse enough that it was incomprehensible at the time it was written. (As evidenced by id comments)
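For anyone who hasn't seen why that function is considered obtuse, here is an independently written Python sketch of the same technique (not the Quake source). It assumes the input is a positive number that fits in a 32-bit float; `fast_inv_sqrt` and `relative_error` are names chosen for this example, and the constant is the widely documented one.

```python
import math
import struct

MAGIC = 0x5F3759DF  # widely documented constant for 32-bit floats


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for x > 0 using the integer-reinterpretation trick."""
    # Reinterpret the 32-bit float's bytes as an unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # The integer view behaves like a scaled, shifted log2(x). Shifting right
    # halves it, and subtracting from the constant negates and re-biases it,
    # which gives a first guess at x ** -0.5.
    bits = (MAGIC - (bits >> 1)) & 0xFFFFFFFF
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    # One Newton-Raphson step on f(y) = 1/y**2 - x refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


def relative_error(x: float) -> float:
    """Relative error of the approximation against math.sqrt."""
    exact = 1.0 / math.sqrt(x)
    return abs(fast_inv_sqrt(x) - exact) / exact


approx = fast_inv_sqrt(4.0)  # close to 0.5
```

In Python this is slower than just calling `math.sqrt`; the point is only to show what the bit manipulation is doing.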
I.e., I'm in the middle of a refactoring operation and have to do lots of repetitive work; the tool should help me by understanding what I'm trying to do after I give it 1 example.
First the automation came for the farmers, and I did not speak out —
Because I was not a farmer.
Then the automation came for the factory workers, and I did not speak out —
Because I was not a factory worker.
Then the automation came for the accountants, and I did not speak out —
Because I was not an accountant.
Then the automation came for me (a programmer) —
and there was no one left to speak for me.
Then wait for them to realize how brittle the code is when nobody is considering the context into which this code is being foisted. They'll TRIPLE our salaries! :D
I've always assumed that we would eventually have a low-code, or no-code junior dev replacement, and was wondering if this was it. GH and MS actually have [Ed. had?] some cred for this kind of thing.
Copying GPLed code as your own and passing it under an MIT license is not too far fetched of a thing for a junior dev to do.
Jokes aside, to have a proper junior dev replacement you need something that is able to learn and grow to eventually become a senior dev, an architect, or a CTO. That is the most important value of a junior dev. Not the ability to produce subpar code.
I think a lot of modern software development shops, these days, exist only to make their founder[s] as rich as possible, as quickly as possible.
If they are willing to commit their entire future to a lowest-bid outsourcing shop, then I don’t think they are too concerned about playing the long game.
Also, the software development industry, as an aggregate, has established a pervasive culture, based around developers staying at companies for 18-month stints. I don’t think many companies feel it’s to their advantage to incubate people who will bail out, as soon as they feel they have greener pastures, elsewhere.
Most low-code and no-code platforms go for junior dev empowerment, and senior dev replacement. This one also seems to be aimed at empowering juniors, but looks like it missed the senior replacement by miles.
There are a lot of good points made against copilot. But I’m optimistic in that it will improve with time. At worst it’s an efficient code copy-pasting tool, but at best it could be the next level of abstraction.
I think the problem might be in the training data. Famous code examples are probably copied a lot and therefore appear multiple times in the training data, prompting the neural network to memorise them completely.
Famous code examples are also much more likely to be noticed. For all I know, the thing might be spewing random GPL'd code from the long tail of GitHub all the time and nobody notices because it was written by some random guy and not John Carmack.
Well, it's pure speculation on my part what the root cause is, but I think OpenAI is already trying to ensure the network generalises. It's just common behaviour for neural networks to memorise frequent samples, so I think my guess is quite realistic. I don't think OpenAI would not notice large-scale memorisation in their model. But as long as they don't publish more details it's just guesswork.
Just keep in mind that it's a statistical tool. You can't really formally prove that it won't memorise, but I think with enough work you can get it unlikely enough that it won't matter. It's their first iteration.
Also the Pareto principle. 80% of code is shit that you don't want to copy. The vast majority of github is awful hacks and insecure code that should not be touched with a ten foot pole.
Is this function used verbatim in multiple projects? I know it's famous, but how often in the past 20 years has anyone used an approximation of inverse sqrt instead of the readily available CPU call?
Copilot may do more to move open source projects out of GitHub than the message that Microsoft is the buyer. Now you can host the code on GitHub to get your license violated, or DMCA-ed in the long run, when your code will become a part of some big proprietary project. At least it makes me think about my choice for code hosting more than whatever happened before.
It looks like the author of the linked tweet intended for it to reproduce the Quake code, by using the exact same function name and comment. Whatever the merits of CoPilot, in this case the human intended to write the quake function into their file, and put the wrong license on it.
I'm honestly kinda amazed this is as upvoted here as it is. Typically anything ML-related is upvoted to the top positions and any dissent harshly ridiculed. Anyways... it appears those who thought about this as if it was a glorified code search engine were close to being right.
I still don't think it's just a glorified code search engine.
Context-sensitive data retrieval is undoubtedly a part of it, though, and the question is how big and relevant that part is and what the consequences are.
To me the biggest issue is that it's impossible to tell whether the suggestions are verbatim reproductions of training material and thus problematic.
It goes to show that this tool and basically every tool relying on the same or similar technology must now be assumed to do this and thus any code suggestion must be regarded as plagiarism until proven otherwise. As a consequence such tools are now off-limits for commercial or open source development...
I expect it will lead to greater compliance than today, in the near future even if not today. Making tens of thousands of engineers understand the nuances of various licenses is infinitely more difficult than making copilot start taking better care of licensing requirements.
Remember that unlicensed content is still licensed. Generally the implicit license of a work is less permissive. So however you feel about this using GPL code, the situation is even worse for most ML models that have no license at all to use their training data.
Copilot transitions programmers from writing code to reading auto-generated code. And the feeling is that reading code is 10x harder than writing it? Seems like a rich source of problems.
(However, I'm still definitely going to try this out once I get off the waitlist.)
Copilot will most likely be useful as a Stackoverflow indexer or a better GitHub search. Developers can become faster because this eliminates some time spent on GitHub and SO. I don’t think it’s any more useful than that.
I don't understand why this is an issue. And now I have a tool to do my coding, as well as my social media contributions:
"'Sweary comments' is not grammatically correct English." -> GPT-2 ->
"Sweary comments" is not grammatically correct English. The "sweary" meaning is used as slang by those in charge, especially the ones who get a free ride through the media, for the sake of a quick buck, or the sake of getting a better job with a promotion. The word is commonly used in a social context and comes from an actual "sweary" comment. The statement or phrase, "You're too shy!" is derived from this phrase: "I wish he were as outgoing as you."
I'm going to go against the flow here and say that worrying about this is similar to worrying about the license we give to snippets of code we copy-paste from other licensed code.
The reality is that we never attribute the original source because we copy-paste it, change it up a bit, and make it our own. Literally everybody does this.
I still care about licensing and proper attribution but the reality is that a snippet of code is not something so easy to attribute. Should we attribute all kinds of ideas, even the very small ones? How quickly is an idea copied, altered & reused? Can we attribute all the thoughts humans have?
I assumed it was trained on source code that was explicitly licensed with a permissive license. Are they training it using private unlicensed repos also?
Copilot seems to be an AI tool to generate code for you[0]. In the gif, it's copying code from Quake, which is GPLv2 or later. If copying GPLed code wasn't bad enough, it then adds an MIT-like license header.
Based on all the negative comments so far, and based on this website's aptitude at predicting the viability of a product, it really seems like Copilot is bound to be a success.
Yeah. I get why people's initial reaction is to dislike it tbh. Honestly I doubt the utility will be huge for experts; most likely it will just alleviate having to remember how a certain language implements a specific concept.
I hate to be the one that says this but I think it's true:
"So you are an SWE and you take a break from work to go to Hackernews to complain that Github's Copilot, which is an AI-based solution meant to help SWEs, is utter shit and completely unusable.
And then you go back to writing AI-based solutions for some other profession. Which is totally not shit or anything."
That's a bit different. Advertising is like a race to the bottom, where everybody takes part in order to survive. You can do that while wishing that it could somehow not be that way. Same with environmental issues.
The GP comment by contrast is about hypocrisy. I personally found it funny that I didn't ever read about (or consider) copyright violations of deep learning until they tried to do it with code :-)
Of course programmers would find the problem with AI as soon as it exploited them.
You mean like the insanely annoying AIs that replaced Google search? The idiotic one that files Javascript books under "Law" in Amazon or the insulting one who runs Ad Sense and thinks my wife isn't good enough and I am stupid enough to leave her for some mail order bride?
Dunno. I go to HN because it's the one place where I can whine about AI being total bullshit, for the exact reasons as we're now complaining about wrt. Copilot.
SWEs create AI based solutions to X 'cause people pay them. Entrepreneurs and investors are the one who actually think they're the answer to everything.
Also, Copilot might (or might not) be useless or even interfere with real work. But it's probably low on the scale of awful things SWEs have helped create. The AI parole app is a thing that should haunt the nightmares of whoever created it, for example. But lots of AI apps may be useless but are probably also harmless, so doing that might not be the worst thing.
"Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.
In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know."
I would do this with Reddit posts. I’d see the top comment under something I was familiar with and see it was full of holes or just incorrect but then I’d go to a post about something I didn’t know all that well and take the top comment at face value.
I would say overfitting - the net doesn't "understand" the code in any meaningful sense. It just finds fitting examples and jumbles them a bit.
Understanding would mean having an internal representation related to the intention of the user, the expected behavior, and, say, the AST of the code. My pessimistic interpretation of this and many other recent AI applications is that it is a "better Markov chain".
A Markov chain can have an internal representation related to the intention of the user. I guess this example just got copied a lot and is therefore included multiple times in the training data, forcing the network to memorise it. Neural networks always memorise things that appear too frequently. Memorised artifacts in an otherwise working neural network are usually seen as a "bug" (since the training allowed the network to cheat), not as proof that the network didn't generalise.
I mean, if you wrote an autocomplete system for written English and asked it to complete the sentence "O Romeo, Romeo" what would you expect to happen?
You'd expect it to complete to "O Romeo, Romeo, wherefore art thou Romeo?" - a very famous quote.
How else could you produce the single right output for that unique input, other than memorising and regurgitating?
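To make the memorisation point concrete, here's a minimal sketch of a word-level Markov-chain autocomplete. To be clear, this is a toy: the `train` and `complete` helpers and the four-line corpus are made up for illustration and have nothing to do with how Copilot actually works. It only shows that when one line dominates the training data, greedy completion reproduces it verbatim.

```python
from collections import Counter, defaultdict


def train(corpus, order=2):
    """Count which word follows each `order`-word context."""
    model = defaultdict(Counter)
    for line in corpus:
        words = line.split()
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            model[context][words[i + order]] += 1
    return model


def complete(model, prompt, order=2, max_words=10):
    """Greedily append the most frequent next word until the context is unseen."""
    words = prompt.split()
    for _ in range(max_words):
        context = tuple(words[-order:])
        if context not in model:
            break
        words.append(model[context].most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training data: the famous line is duplicated, the variant is not.
corpus = [
    "O Romeo, Romeo, wherefore art thou Romeo?",
    "O Romeo, Romeo, wherefore art thou Romeo?",
    "O Romeo, Romeo, wherefore art thou Romeo?",
    "O Romeo, Romeo, where is the exit",
]

model = train(corpus)
completion = complete(model, "O Romeo, Romeo,")
print(completion)
```

The duplicated line outvotes the one-off variant at the `("Romeo,", "Romeo,")` context, so the completion is the famous quote word for word. Whether a large neural net behaves the same way on heavily duplicated code is the open question in this thread, not something this toy settles.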
Right, but the demonstration gave zero context and came up with the original function. It would have been interesting if it were instructed to produce the function in Haskell or some other programming model.
Neural nets aren't magic. You actually need quite a bit of complexity and modeling of interrelated problem spaces to get anything more than a childlike naivete or trauma savant-like mastery of one particular area with crippling deficiencies elsewhere.
Almost feels like a developer cultural thing to hate on something like this. If you don't like it, don't use it. If you don't want your team using it, become senior and then set the rules.
Kinda seems like maybe there's some level of insecurity at play here in the criticism. Like an "I coulda came up with that but it's a bad idea" type of hater philosophy.
0. https://news.ycombinator.com/item?id=27687450
1. https://news.ycombinator.com/item?id=27676266