I don't necessarily understand the dissatisfaction with GitHub Copilot; it is an interesting technology and one of the few successful real-world implementations of AI.
I agree that Copilot is a real breakthrough, considering that the results seem to be mostly functional. I think it has a ton of potential for helping people reach a crude solution instantly, without needing to learn the boilerplate setup of some library they wish to use. The developer still needs to pay careful attention and not accept the suggestions blindly.
The dissatisfaction comes from several concerns that still need to be addressed:
- Licensing: Sometimes Copilot copies chunks of code (including the comments) directly from the source dataset.
- Quality: At least one example from the Copilot promotion page had a security vulnerability, and another used floats for currency.
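To illustrate why the floats-for-currency example is a genuine quality concern (this is a generic Python sketch, not Copilot's actual output):

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly,
# so currency arithmetic with floats accumulates rounding errors.
print(0.1 + 0.2)         # prints 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # prints False

# decimal.Decimal stores exact base-10 values, which is why it is
# the conventional choice for money in Python.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # prints True
```

A model that suggests the float version produces code that "works" in a demo but silently loses cents in production, which is exactly the kind of suggestion a careless reviewer would accept.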
Then there's a problem I haven't seen anyone mention. When you use Copilot, your editor sends chunks of your current codebase to the service. My understanding is that Copilot uses this data for further improvement (a no-brainer, right?). But what if your project is proprietary, critical company IP? What if your code contains secrets that could be leaked through Copilot?
Imagine you hired a consulting company. You can send them half-baked code and they send back fixed code. You can accept the changes or refuse to use them.
After a while, you realize that one of the members of the consulting team has an extraordinarily good memory. He has memorized 1,000,000 digits of pi and e. He remembers whole books of chess endgames. He has been fixing your code with whole blocks of 10-20 lines of code that are identical to GPL code he has read and involuntarily memorized.
He was not aware that he had memorized it; he was not looking at the code, and he was not looking at personal notes. His memory is just too good.
You have already distributed your code. Is your code GPL? Can the owners of the original GPL project sue you?
MIT/BSD have clauses about attribution, so you are probably breaking those, but I suspect most people use MIT/BSD as a polite version of the WTFPL and will not be angry.
And there is also source-available proprietary code, whose owners will also be unhappy.
And there are some weird licenses: most LaTeX packages carry some kind of restriction.
Copilot and IP law as it exists are incompatible; they cannot coexist and still make sense. If all it takes to get around licensing is piping code through an ML model, that sets a legal precedent that basically legitimizes the practice of "math washing": dodging personal accountability because "the machine did it, not me."
Further, Copilot enables exactly the opposite of what good software engineering is about. We should understand the underlying consequences of the code we write, including libraries, dependency graphs, and the obligations our licenses incur.
Also, Microsoft quite literally bought the most popular version-control-as-a-service company, then leveraged it to create a machine-learned code generation framework.
They didn't ask for opt-in. They didn't do due diligence, and they didn't let anyone know ahead of time. They didn't ask anyone; they just did it.
You may look at my last paragraph and think, "Yeah, so what? Welcome to innovation, move fast and break things!"
You may not have noticed if you only pay attention to the tech world, but you live alongside a lot of non-tech people who regularly have to follow far more rules than tech companies have ever been held accountable to. Many of them are realizing that tech's relative competitiveness probably comes from its ability to skirt regulations that were put in place for good reasons.
While yes, society sometimes turns a blind eye and selectively enforces laws and regulations, it generally does so when socio-political agents are confident that harsh, resource-intensive enforcement wouldn't produce enough real value to justify the effort. Over the last decade, a lot of non-tech folks have started to become more aware that tech isn't out to act responsibly, and isn't intending to. Tech companies are out to make money and to position themselves in centralized positions of power and exaggerated influence.
Copilot is one more example of tech people being so concerned with whether they could, that once again, no one sat down and wondered if they should.