It's no different from how current copyright works for us humans. Something is o...

core-utility · on Jan 6, 2023

So (just thinking out loud), if Copilot suggests something only seen in one codebase, the code owners have a decent copyright case. But if Copilot suggests something that's frequent across multiple codebases, there's really no case to be made.

judge2020 · on Jan 6, 2023

That's at least one of the rules that GH is trying to enforce on Copilot, but legally I imagine that even repeating code that appears multiple times on the internet could be considered copyright infringement (i.e. if multiple people copied that code from one person).

The problem here ends up being that code, especially in popular languages, will always look similar when you're doing something like finding the best implementation for an algorithm. So if you invoke Copilot for a common problem, chances are it can pull the exact code it needs from its dataset, but it also could've generated that same code snippet had the solution not existed in its training dataset. And when you start out solving a problem and then ask it to continue writing more code, it just assumes you're solving the exact same problem that the original source code was solving.

This could probably be remedied if Copilot spit out a "this is N% similar to <x> source code from the internet" note so that you can know just how unique Copilot is being. Legally, copyright is just a mess and was not ready for the scale of the internet or for advances in ML, where a machine might have a 50% chance of infringing on someone's copyright and a 50% chance of creating something new.
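
To make the "N% similar" idea concrete, here's a rough sketch of what such a check could look like. This is not how Copilot works, as far as I know; the function names and the tiny `corpus` dict are made up for illustration, and it uses character-level matching from Python's standard `difflib`, which is a much cruder measure than anything you'd want for a legal question.

```python
from difflib import SequenceMatcher


def similarity_percent(suggestion: str, known_source: str) -> float:
    """Return a rough 0-100 similarity score between two code snippets."""
    # Collapse whitespace so formatting differences don't dominate the score.
    a = " ".join(suggestion.split())
    b = " ".join(known_source.split())
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def most_similar(suggestion: str, corpus: dict) -> tuple:
    """Return (name, percent) for the closest snippet in a non-empty corpus.

    `corpus` is a hypothetical mapping of source name -> source text.
    """
    scored = ((name, similarity_percent(suggestion, src)) for name, src in corpus.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    corpus = {
        "repo_a": "def f(x):\n    return x + 1",
        "repo_b": "print('hello')",
    }
    name, pct = most_similar("def f(x): return x + 1", corpus)
    print(f"this is {pct:.0f}% similar to {name}")
```

Even a toy like this shows the catch: a short, common snippet will score high against lots of unrelated sources, so the number alone doesn't tell you whether anything was actually copied.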