Someone will come along and say "Why don't you just mirror Anna's Archive?" in 3...2...1...



I think between Anna's Archive, FineWeb, and as many GitHub repos as you can scrape, you can get a pretty decent dataset.

I doubt Anna's Archive would produce a good model on its own though.



