Hacker News new | past | comments | ask | show | jobs | submit login

Not sure about this, but training a model on the website that displays the code is not quite the same as training it specifically on just the code. Moreover, raw repo content files might not even be included in crawled datasets (e.g., look at https://gitlab.com/robots.txt). I think there is something specific to GitHub, it being part of Microsoft, that makes processing that data much easier.
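To illustrate the robots.txt point: a well-behaved crawler checks a site's robots.txt before fetching, so disallowed paths (such as raw-file endpoints) never make it into the crawl. A minimal sketch with Python's stdlib `urllib.robotparser`, using a hypothetical robots.txt excerpt (not GitLab's actual rules):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excerpt for illustration only --
# NOT the real contents of https://gitlab.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /raw/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler honoring these rules skips the raw-file path
# but may still fetch the HTML page that displays the code.
print(rp.can_fetch("*", "https://example.com/raw/main.py"))    # blocked
print(rp.can_fetch("*", "https://example.com/blob/main.py"))   # allowed
```

Note that the HTML "blob" view remains crawlable here while the raw file is not, which is exactly the gap between training on the page that displays the code and training on the code itself.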


