According to the US courts, the robots.txt file is legally meaningless. If a site responds with a 200 status code, granting you access, then you can legally scrape it all you want. If it requires you to log in, then you have to follow the terms you agreed to when creating an account. Public means public, though, and if Reddit doesn't want to make the content private (put it behind a login), then we can scrape away.
Note that scraping, regardless of the level of permission, doesn't mean you can do anything you want with the content. Copyright still applies. But you can scrape it, and if your use falls under Fair Use or another exception to copyright law, then you can go ahead and do it without needing any permission from the authors.
I liked the chapter on the DMCA from the 5-volume E-Commerce & Internet Law. It was super detailed.
I haven’t read volume 1, but apparently half of it is about data scraping, and I expect it to be similarly detailed. So if I were you, that’s where I’d start.
Another option is searching Google Scholar for “robots.txt” combined with keywords like “legality”, “scraping”, and “case law”.
Independent scrapers can launder the data between Reddit and AI consumers. The only folks this hurts are users seeking info via search engines and folks willing to kowtow to rules that cost little to evade. The next step (from an adversarial perspective) would be browser extensions that stream page data back for ingestion, similar to RECAP for PACER [1].
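Concretely, a RECAP-style extension is little more than a content script that mirrors pages the user has already loaded back to a shared archive. Here is a minimal TypeScript sketch, assuming a hypothetical ingestion endpoint (the URL and endpoint are made up for illustration):

    // content-script.ts -- runs in the page after it loads
    async function mirrorPage(): Promise<void> {
      const payload = {
        url: location.href,
        fetchedAt: new Date().toISOString(),
        html: document.documentElement.outerHTML,
      };
      // Best-effort POST to the (hypothetical) archive's ingestion API.
      await fetch("https://archive.example.org/ingest", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      });
    }

    mirrorPage().catch(() => {
      // Ingestion is opportunistic; failures are silently dropped.
    });

As with RECAP, deduplication and normalization would presumably happen server-side, so each user only contributes pages they already had access to.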
(Full disclosure: I'm assisting someone pursuing regulatory action against Reddit in the EU over an issue separate from scraping. It's a valuable resource, but the folks who own and control it are meh.)
Even more basic: it's free speech. The data itself is in the public domain, so your free speech isn't restricted and you don't need a fair-use exemption from those restrictions. Only access through the official system is restricted.