It's impossible to automate checking for code license violations.
If you and I write the exact same 10 lines of code, we both hold independent and valid copyrights to it. Unlike with patents, independent derivation of the same code _is_ a defense against copyright infringement.
If I write 10 lines of code, publish them as GPL (without signing a CLA or assigning the copyright to an employer), and then re-use them in an MIT codebase, I can do that: I retained copyright, and as the copyright holder I can offer the code under multiple incompatible licenses.
There's no way for a machine to detect independent derivation vs. copying, no way for it to know who the original copyright holder was in every case, or whether I have permission from them to use the code under another license (i.e. if I email the copyright holder and they say 'yeah, sure, use it under non-GPL', it suddenly becomes legal again)...
It's not a problem computers can solve 100% correctly.
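To illustrate the first point with a minimal sketch (the snippet and author names are hypothetical): any purely textual check ultimately reduces to comparing the text itself, and identical text produces an identical fingerprint regardless of provenance, so no such tool can tell copying apart from independent derivation.

```python
import hashlib

# The same short snippet, "written" by two different authors.
# One may have copied the other, or both may have written it independently.
author_a = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
author_b = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"

fp_a = hashlib.sha256(author_a.encode()).hexdigest()
fp_b = hashlib.sha256(author_b.encode()).hexdigest()

# The fingerprints match, but that match carries no information about
# who wrote it first, who copied whom, or what license either copy is under.
print(fp_a == fp_b)  # True
```

Fuzzier similarity measures (token hashing, AST matching, winnowing) have the same blind spot: they score textual resemblance, not legal provenance.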
It's the same problem as with self-driving cars: who gets sued? The company that provides the service/car, or the programmer/driver?
I think the latter.
This is true, but it doesn't change the problem that Copilot itself is potentially distributing unlicensed copyrighted material. This isn't necessarily a problem for you as a developer, though.
As someone who gets paid to write code (nominally) and has also written a few novels, I don't agree with this characterization. From what I've seen of Copilot, it's more like having a text editor generate your next sentence or paragraph^[1]. The idea (as I see it) is that you might use it to generate some prose "boilerplate", e.g. environmental descriptions, and hack up the results until you're satisfied.
It's content generation at a fragmentary level where each "copied" chunk does not form a substantive whole in the greater body of the new work. Even if you were training it on other authors' works rather than just your own, as long as it wasn't copying distinctive sentences wholesale, I think there's a strong argument for it falling under fair use--if it's even detectable.
On the other hand, if it regurgitated somebody else's paragraph wholesale, I don't think that would be fair use. Somewhere in-between is where it gets fuzzy, and really interesting; it's also where internet commenters seem to prefer flipping over the board and storming out convinced they're right to exploring the issues with a curious and impartial mind. I see way too much unreasoned outrage and hyperbolic misrepresentation of the Copilot tool in these threads, and it's honestly kind of embarrassing.
As far as this analogy goes, it's worth noting that the structure of a computer program doesn't map onto the structure of a piece of fiction (or any work of prose) in a straightforward way. Since so much of code is boilerplate, I would (speculatively, in the copyright law sense) actually give more leeway to Copilot in terms of absolute length of copied chunks than I would for a prose autocompleter. For instance, X program may be licensed under the GPL, but that doesn't mean X's copyright holder(s) can sue somebody else because their program happened to have an identical expression of some RPC boilerplate or whatever. It would be like me suing another author because their work included some of the same words that mine did.
^[1] At least one tool like this (using GPT-3) has been posted on HN. At this point in time I wouldn't use it, but I have to admit that it was sort of cool.
> ^[1] At least one tool like this (using GPT-3) has been posted on HN. At this point in time I wouldn't use it, but I have to admit that it was sort of cool.
Have a poke at novelai.net if you get a chance.
It's... not very smart. It's pretty decent at wordcrafting, though, and as an amateur writer I find it invaluable for busting writer's block. Probably if you spend all day writing fiction you'll find ways around that, but for me the solution has become "Ask the AI to try".
It'll either produce a reasonable continuation, or something I can look at and see why it's wrong. Either is better than a blank page.
That does not seem like a response to what I just said?
I said that it is impossible for the user to check that the code Copilot gives them is OK, license-wise, and therefore they cannot be sure that it is legally OK to include in any project.