It's worth keeping in mind that what a neural network like this (just like GPT-3) does is generate the most probable continuation based on its training dataset. Not the best continuation (whatever that means), simply the most likely one. If the training dataset contains mostly bad code, the most likely continuation will probably be bad as well. I think this is still valuable; you just have to think before accepting a suggestion (just as you have to think before writing code from scratch or copying something from Stack Overflow).
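A toy sketch of "most probable continuation", assuming a word-level bigram counter with greedy decoding. This is far simpler than GPT-3, which is a large transformer, and the corpus and function names are hypothetical. Because the un-idiomatic `range(len(xs))` loop dominates the tiny training text, greedy decoding reproduces it, even when it starts from the idiomatic `x`:

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count, for each token, how often each next token follows it."""
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts

def greedy_continuation(counts, start, length):
    """Repeatedly append the single most frequent successor (no sampling)."""
    out = [start]
    for _ in range(length):
        successors = counts.get(out[-1])
        if not successors:
            break  # token never seen with a successor: nothing to predict
        out.append(successors.most_common(1)[0][0])
    return out

# Hypothetical training text: the "bad" loop appears twice, the idiomatic one once.
corpus = """
for i in range(len(xs)): print(xs[i])
for i in range(len(xs)): print(xs[i])
for x in xs: print(x)
""".split()

print(" ".join(greedy_continuation(train_bigrams(corpus), "for", 4)))
# prints: for i in range(len(xs)): print(xs[i])
```

The point is only that "most likely" is defined by frequency in the training data, not by quality.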
I have no idea how this or GPT-3 works or how to evaluate them, but couldn't you argue that it's working as it should? You tell Copilot to write a fast inverse square root, and it gives you the super famous fast inverse square root. It'd be weird and bad if that didn't happen.
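For reference, the "super famous" version is the bit trick from the Quake III Arena source: reinterpret the float's bits as an integer, subtract half of it from the magic constant 0x5F3759DF, then refine with one Newton-Raphson step. A minimal Python transcription, where the function name is hypothetical and `struct` stands in for the C original's pointer cast:

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive x that fits in a float32."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic-constant initial guess: roughly halves and negates the exponent.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits back as a float32.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration for f(y) = 1/y**2 - x refines the guess.
    return y * (1.5 - 0.5 * x * y * y)

print(fast_inv_sqrt(4.0))  # close to 0.5
```

After the single Newton step the relative error stays below about 0.2%, which is why the trick was good enough for lighting calculations.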
As far as licenses go, idk. Presumably it could delete the associated comments and change variable names, or otherwise obscure where it's taking code from. Maybe this part is shady.
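Deleting comments and renaming variables is a thin disguise. Here is a minimal sketch, where the `normalize` helper and both snippets are hypothetical. It uses Python's standard `tokenize` module to drop comments and layout tokens and to replace every non-keyword identifier with a placeholder. The renamed copy then yields exactly the same token stream as the original:

```python
import io
import keyword
import tokenize

# Tokens that carry no meaning once comments and layout are ignored.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.INDENT,
        tokenize.DEDENT, tokenize.ENDMARKER}

def normalize(source: str) -> list[str]:
    """Token stream with comments dropped and identifiers replaced by 'ID'."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NEWLINE:
            out.append("<NEWLINE>")
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")  # variable, function, and builtin names alike
        else:
            out.append(tok.string)
    return out

original = '''
def clamp(value, low, high):
    # keep value inside [low, high]
    return max(low, min(value, high))
'''

disguised = '''
def bound(n, lo, hi):
    return max(lo, min(n, hi))  # renamed, comment moved
'''

print(normalize(original) == normalize(disguised))  # True
```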
Maybe I could build a robot that goes out in the city and steals cars.
As far as licenses go, idk. Presumably it could delete the number plate and repaint the car or otherwise obscure where it's taking the car from. Maybe this part is shady.
In particular, fast approximate inverse square root is an x86 instruction (RSQRTSS, introduced with SSE), and not a super new one. I'd be surprised if it weren't in every major instruction set.
This is an interesting issue. I suspect training on datasets from places like GitHub is likely to yield lots of "this is a neat idea I saw in a blog post about how they did things in the '90s" code.
> the most probable continuation based on the training dataset
This is not wrong, but it's easy to misread it as implying little more than a glorified Markov model. If it's like https://www.gwern.net/GPT-3 then it's already significantly cleverer, so you should expect to sometimes get the kind of less-blatant derivation that companies try to avoid by using a cleanroom process or by forbidding engineers from reading particular sources.