
Care to share the scraped data? I would love to play around with it.




Not sure if I can. At the very least, the book descriptions most likely could not be distributed. There is an academic dataset with around 200M reviews, though: https://cseweb.ucsd.edu/~jmcauley/datasets/goodreads.html

I am not sure about the legal side of things here, but a Kaggle dataset would be really cool.

I'm surprised he got that much data. Goodreads uses several tricks to stop scrapers; for example, pagination only works up to a few pages.

They might send him a bill for use of resources.

I’m wondering how ethical it is to load down a resource in this way, and I'm open to opinions. There is a mention of “I didn’t hammer down the servers,” but what does that really mean in practice? The site isn’t being used as intended, and I'm just curious how other people feel about that.


