Co-pilot spits back protected expressions, not novel expressions based on ideas ...

KoolKat23 · on April 3, 2023

That's not the case. There's a probability it may "spit back" the protected expression. There's also a probability that I, as a human, "spit back" the protected expression. This could happen by pure chance or from past learning: I read the protected code, internalized it as a solution, and subconsciously forgot I had seen it elsewhere.

At university, students run their theses through plagiarism checkers even when the research is novel, because accidental overlap occurs naturally.

As the thought experiment goes, given infinite time, a monkey with a typewriter will inevitably write Shakespeare's works.

EarlKing · on April 3, 2023

...except you don't need an infinite number of monkeys. It has been trained to produce protected expressions by virtue of being trained on protected expressions. The probability of it producing a protected expression at some point is 1.

KoolKat23 · on April 3, 2023

The same truth holds for you or me writing up that code.

EarlKing · on April 3, 2023

No it doesn't. My mind contains information derived from expressions I've read which I can rearrange into novel expressions. I don't regurgitate protected expressions verbatim. Co-Pilot does.

KoolKat23 · on April 3, 2023

That's exactly what Copilot does. If what it comes up with is the same as the original, ask it to rearrange it. That's what code plagiarism checkers are for.

sureglymop · on April 3, 2023

You are correct. The problem is that the GitHub Terms of Service probably (guessing) have a clause which invalidates your license if you upload your code there. And that's exactly why you shouldn't use GitHub.

xigoi · on April 3, 2023

The terms of service explicitly say that GitHub is not allowed to use your code for commercial purposes.

natch · on April 3, 2023

This seems to be what people imagine about it, not what it actually does, although I don’t doubt you could cherry-pick some snippet after a lot of trial and error to try to claim that it had regurgitated something verbatim. But certainly let’s see the examples.

bombolo · on April 3, 2023

You never know if a snippet it created came from another project verbatim or not… unless you claim you know all of the code that exists?

bqmjjx0kac · on April 3, 2023

That's a bit extreme. In theory, an LLM's proclivity for plagiarism could be studied by testing it with various prompts and searching its training data for its responses (maybe with some edit distance tolerance).
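A rough sketch of what such a check could look like, assuming you have some candidate corpus to search (the real training data is not something I have). The function names and the 0.9 threshold are made up for illustration, and `difflib`'s similarity ratio stands in for a true edit distance:

```python
from difflib import SequenceMatcher


def normalize(code: str) -> str:
    """Collapse all whitespace runs so formatting differences do not hide a match."""
    return " ".join(code.split())


def closest_match(suggestion: str, corpus: dict[str, str], threshold: float = 0.9):
    """Return (name, ratio) for the most similar corpus entry at or above
    the threshold, or None if nothing is close enough.

    `corpus` maps a hypothetical file name to its source text.
    The threshold is an arbitrary assumption, not a calibrated value.
    """
    target = normalize(suggestion)
    best = None
    for name, source in corpus.items():
        # ratio() is in [0, 1]; 1.0 means the normalized texts are identical.
        ratio = SequenceMatcher(None, target, normalize(source), autojunk=False).ratio()
        if ratio >= threshold and (best is None or ratio > best[1]):
            best = (name, ratio)
    return best
```

You would feed it many model outputs for many prompts and count how often it returns a hit. Comparing whole files like this is slow and misses copied fragments inside larger files, so a serious study would need something smarter, but it shows the idea.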

bombolo · on April 3, 2023

Except the training data is secret…

natch · on April 3, 2023

You can search github and other open sources to find at least a likely subset of the training data though.

bombolo · on April 6, 2023

You suggest doing this, by hand, for every suggestion?

natch · on April 7, 2023

Just try it out for some code you have on GitHub where you know yours is the only solution out there. You'll be pleasantly surprised to see that it does not suggest a verbatim copy/paste of your code or anything close to it, unless you try this with a one-liner like how to do an fopen(), which would not be a good test and would not be the only solution out there. Then, seeing the result, you can adjust your theory. So, in short, I suggest simply testing your theory, not anything absurd like what you are coming up with.

bombolo · on April 10, 2023

What would that prove? I still have no access to all the proprietary code generated by Copilot, and no idea whether it copy-pasted in all those cases.

You suggest I try it twice and, since it will probably not copy-paste in those two tries, assume it never copy-pastes (despite existing evidence that it does copy-paste in some other cases).

What problem would this exercise solve? I can't see it.

natch · on April 3, 2023

Search tools are a thing. Grep, Google, Github, etc.

But yes, exactly, you never know if a snippet came from another project or not, so let's not assume it did without some convincing evidence.

bombolo · on April 4, 2023

Why assume it didn't?

natch · on April 5, 2023

I’ve done tests and it passed with flying colors so it’s not an assumption. So the premise of your question is flawed.

bombolo · on April 10, 2023

It has been shown that GitHub Copilot does copy-paste.

The fact that you tried it a couple of times (or 10 or 20) means absolutely nothing.

One copyright infringement is enough for a lawsuit.