You stated as fact that "Copilot is trained on publicly available code".
The question (and its implication) is: if the claim that the output isn't license-incompatible is true, why not train it on MS internal code as well?
If the output doesn't conflict with any open-source license (i.e. it springs into existence from general principles, not from "copying" licensed code), then MS-internal code (in fact, any closed-source code) should be open season.
I can imagine a few of the non-obvious segments of code I've written being "recognizable" as methods for solving certain problems. And they are certainly licensed (GPL + Commercial, in my case).
I think, at the very least, that a set of AIs should be trained on different license-compatible sets of code, e.g. GPL, AGPL, BSD, etc. Then you could choose how much license overlap is compatible with your project.