
You should be able to choose flavors of the model trained only on public-domain code which does not require attribution, for example.

But that would mean Microsoft acknowledging license violations.




Sorry, to be clear: I meant that even if a GitHub user asserts their code is public-domain/no-attribution/unlicensed, they could have lifted it from a codebase that doesn't allow that. It would be tricky for GitHub to establish that the code was indeed original, and hence that their agreement with the user allows them to train their models on it.


> they could have lifted it off a codebase that doesn't allow it

Ah. But then someone else is guilty of redistributing code without permission.

So you're suggesting GitHub should implement something like ContentID, but for code. That should be cheaper, since code is cheap to analyze while video is far more bandwidth-intensive. And it would kill two birds with one stone.
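For what it's worth, here's a rough sketch of what "ContentID for code" could look like at its simplest. To be clear, this is not how ContentID or anything at GitHub works; it's just token shingling plus Jaccard similarity, and the names `fingerprints` and `similarity` and the window size `k=5` are all made up for illustration.

```python
import hashlib
import re


def fingerprints(source, k=5):
    """Hash every run of k consecutive tokens (a "shingle") in the source."""
    # Crude tokenizer (an assumption): identifiers, numbers, and single
    # punctuation characters. Whitespace is dropped, so reformatting alone
    # does not change the fingerprints.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    shingles = (" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles}


def similarity(a, b, k=5):
    """Jaccard similarity of the two fingerprint sets, from 0.0 to 1.0."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "def add(a, b):\n    return a + b\n"
reformatted = "def add(a,b):\n  return a+b\n"
print(similarity(original, reformatted))
```

This only catches copies that differ in whitespace. Renamed variables or reordered statements would slip through unless you normalized identifiers first, and a real system would need an index over fingerprints rather than pairwise comparison.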



