
Unfortunately, moving to GitLab or Sourcehut doesn't really help, because the underlying model (GPT-x) is trained on the entire internet, which includes all scrapeable websites. The only way to keep your data out of GPT (and therefore Copilot) is not to put it on any website, or to make it very difficult to access, for example by encrypting it.



> Unfortunately, moving to GitLab or Sourcehut doesn't really help, because the underlying model (GPT-x) is trained on the entire internet, which includes all scrapeable websites. The only way to keep your data out of GPT (and therefore Copilot) is not to put it on any website, or to make it very difficult to access, for example by encrypting it.

Having the entire git history attaches context to specific chunks of code (at least to entire commits) through the commit messages. So you can process not only the repo at one specific point in time, but its entire history up to that point. There is valuable knowledge to be gained from making sense of that history, but it is not accessible to us. It resides in the knowledge base of one company (or two).


Not sure about this, but training a model on the website that displays the code is not quite the same as training it specifically on just the code. Moreover, (raw) repo content files might not even be included in crawled datasets (e.g., look at https://gitlab.com/robots.txt). I think GitHub being part of Microsoft is what makes processing that data much easier.
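
To make the robots.txt point concrete, here is a rough sketch of how a well-behaved crawler would decide whether it may fetch a raw file. The rules, host name, and paths below are made up for illustration and are not copied from gitlab.com, whose actual file may look different; `is_crawl_allowed` is just a hypothetical helper around Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real gitlab.com robots.txt.
# Note: urllib.robotparser matches plain path prefixes (no wildcard support).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /example-group/example-project/-/raw/
Allow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed example URLs on a made-up host.
raw_file = "https://gitlab.example/example-group/example-project/-/raw/main/README.md"
project_page = "https://gitlab.example/example-group/example-project"

raw_allowed = is_crawl_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", raw_file)
page_allowed = is_crawl_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", project_page)
```

Under these assumed rules the project page is crawlable while the raw file is not. Of course, this only says something about crawlers that honor robots.txt; it is a convention, not an access control.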


Is that such a bad thing? I write code to get it out there, not for it to be exclusive or something. Any code I write that helps somebody else in some way is a huge win for me, and it's what keeps me going. If I could, _all_ of the code that I write would be open source or public, but as things stand the best way for me to make a living is to write closed-source code.

I see my code getting scraped by these AI tools as me having contributed to something greater than the sum of its parts. And I use it! My code helps you, your code helps me.



