robots.txt is good for telling robots they have no business reading a section, or will be severely disappointed if they do. There is no way one can encode the TOS of a site into the syntax of robots.txt, so there is no reason to believe it embodies the TOS.
I've never used Facebook, but Section 9.2.6 of their terms says: "You will delete all data you received from Facebook if we disable your application or ask you to do so."
I don't see how that could be encoded in robots.txt.
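To make that concrete, here is a rough sketch of roughly everything a crawler can learn from a robots.txt file, using Python's standard `urllib.robotparser`. The file contents, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration; this is not Facebook's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" and example.com are assumed names, not real crawlers or sites.
private_ok = rules.can_fetch("ExampleBot", "https://example.com/private/profile")
public_ok = rules.can_fetch("ExampleBot", "https://example.com/public/page")
delay = rules.crawl_delay("ExampleBot")

print(private_ok, public_ok, delay)
```

The vocabulary is per-agent path rules plus a non-standard crawl delay that this parser happens to understand. Nothing in it can express an obligation that applies after the fetch, such as "delete this data when we ask".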
Done. Google crawls Facebook. Every social media monitoring company crawls Facebook (and every other social network). Tons of other companies do the same every day. I know this because many of these companies are our customers.
What really irks me is that Pete collected valuable data and whoever he shared it with was probably able to derive added value from that data - in ways that Facebook is not doing. Facebook is prohibiting value creation.
That's not exactly true. The TOS usually applies to people crawling the servers and mining data. Still, there is no clear way to know how a court would rule on something like this; each case is different.
You don't waive your copyright by having a robots.txt, and while I believe most people think Google-style indexing and searching is fair use, that doesn't mean anything you do with the data is fair use.
User content on Facebook may not be copyrightable. If I make a list of my personal interests, I haven't necessarily produced a creative work by the standards of US law.
Correct: lists of facts without styling aren't something you can copyright. The specific form they are printed in is copyrightable, but there is no IP created by a list of facts. Phone numbers, game scores, colors of rocks, etc. are not copyrightable.
The Facebook Terms are a terrible document. They are written to be intelligible to humans but are full of ambiguity and undefined terms. Any lawsuit built on the theory "robots.txt did not forbid my actions; therefore they are legal" would probably disintegrate into an argument over how the language of the Terms is interpreted.
Their lawyers probably leapt from high windows when it was released.
Surely, if true, that means any web crawler must first locate and understand every single website's terms of service before it can be sure what it is allowed to do.
For a start, though, you could easily state that Facebook freely allows access to data without requiring you to read the terms and conditions.
You could argue that, because they freely allow access to all of their data without requiring you to read and agree to the terms of usage, their claims have no basis.
If they did want to restrict access, or make sure every crawler had first agreed to T&C, it wouldn't take long for them to add that.
I think this is missing the point. Facebook's attorneys don't care about the technical details. They already know that whatever is technically possible is entirely subordinate to how they can threaten the owners of the technology.
It hasn't been tested in court, and there's a truly excellent chance Facebook would lose - either in terms of the court or in terms of what's left of their privacy reputation - but that doesn't matter one little bit. They have attorneys and money, and that's all that matters in this instance.
Really. The facile assumption that "it's possible to aggregate thus it's OK to aggregate" is exactly the way normal people think (by which I mean, tongue-in-cheek, us), but corporate attorneys see all this in terms of power relationships and contracts. As far as they're concerned, the poster took pictures through their front windows, and they're damn well going to threaten him with kneecapping until he gives them his negatives.