Care to share the scraped data? I would love to play around with it.

costco · 2025-11-06T21:26:27 1762464387

Not sure if I can. At the very least, the book descriptions most likely could not be redistributed. There is an academic dataset with around 200M reviews, though: https://cseweb.ucsd.edu/~jmcauley/datasets/goodreads.html

demaga · 2025-11-06T21:22:31 1762464151

I am not sure about the legal side of things here, but a Kaggle dataset would be really cool.

guelo · 2025-11-06T21:37:36 1762465056

I'm surprised he got that much data. Goodreads uses several tricks to try to stop scrapers; for example, pagination only works for the first few pages.

jacquesm · 2025-11-06T21:45:36 1762465536

They might send him a bill for use of resources.

cjaackie · 2025-11-07T02:22:30 1762482150

I’m wondering how ethical it is to load down a resource in this way, and I’m open to opinions. There is a mention of “I didn’t hammer down the servers,” but what does that really even mean? The site isn’t being used as intended, and I’m just curious how other people feel about that.