
That's a bit extreme. In theory, an LLM's proclivity for plagiarism could be studied by testing it with various prompts and searching its training data for its responses (maybe with some edit distance tolerance).
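To make the "edit distance tolerance" idea concrete, here is a minimal sketch of what such a search might look like, assuming you somehow had a searchable corpus. The function names, the `corpus` dict, and the 10% tolerance are all hypothetical choices for illustration, and the approximate-substring DP is just one way to do it; a real study would need an index rather than a linear scan.

```python
def min_substring_edit_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # Approximate substring matching: a match may start anywhere in text
    # at no cost, so the row for the empty pattern prefix is all zeros.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j] + 1,              # drop p from the pattern
                curr[j - 1] + 1,          # skip t in the text
                prev[j - 1] + (p != t),   # match or substitute
            )
        prev = curr
    # The match may also end anywhere in text.
    return min(prev)


def find_near_copies(response: str, corpus: dict, tolerance: float = 0.1) -> list:
    """Return (doc_id, distance) for documents containing a near-copy of response.

    `corpus` maps a document id to its text. `tolerance` is the fraction of
    the response's length allowed as edits (an arbitrary choice here).
    """
    budget = int(len(response) * tolerance)
    hits = []
    for doc_id, text in corpus.items():
        distance = min_substring_edit_distance(response, text)
        if distance <= budget:
            hits.append((doc_id, distance))
    return sorted(hits, key=lambda hit: hit[1])
```

You would call `find_near_copies(model_output, corpus)` once per prompt and look at what fraction of prompts produce any hit. The cost is O(len(response) * len(text)) per document, so this only works as-is on a small sample.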



Except the training data is secret…


You can search GitHub and other open sources to find at least a likely subset of the training data, though.


You suggest doing this, by hand, for every suggestion?


Just try it out on some code you have on GitHub where you know yours is the only solution out there. You'll be pleasantly surprised to see that it does not suggest a verbatim copy-paste of your code or anything close to it, unless you try this with a one-liner like how to do an fopen(), which would not be a good test and would not be the only solution out there. Then, seeing the result, you can adjust your theory. So, in short, I suggest simply testing your theory, not anything absurd like what you are coming up with.


What would that prove? I still have no access to all the proprietary code generated by Copilot, and no idea whether it copy-pasted or not in all those cases.

You suggest I try it twice and, since it will probably not copy-paste in those two tries, assume it never copy-pastes (despite existing evidence that it does copy-paste in some other cases).

What problem would this exercise solve? I can't see it.



