robots.txt is good for telling robots they have no business reading a section, o...

jdrock · on April 5, 2010

User-agent: *

Disallow: /directory/people/*

Disallow: /directory/pages/*

Done. Google crawls Facebook. Every social media monitoring company crawls Facebook (and every other social network). Tons of other companies do the same every day. I know this because many of these companies are our customers.

What really irks me is that Pete collected valuable data and whoever he shared it with was probably able to derive added value from that data - in ways that Facebook is not doing. Facebook is prohibiting value creation.
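For anyone who wants to see what the rules quoted above would actually block, here is a minimal sketch in Python. It assumes the three directives sit together in one group and that a `*` in a path matches any run of characters, which is how the major crawlers treat it. The function names and sample paths are made up for illustration, and the parser is deliberately simplified: it ignores `Allow` lines and longest-match precedence.

```python
import re

# The rules quoted above, written as one contiguous group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/people/*
Disallow: /directory/pages/*
"""

def disallowed_patterns(robots_txt, agent="*"):
    """Collect Disallow patterns from the groups that apply to `agent`."""
    patterns = []
    applies = False
    prev_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            match = value == "*" or value.lower() == agent.lower()
            # Consecutive User-agent lines share a single group.
            applies = (applies or match) if prev_was_agent else match
            prev_was_agent = True
        else:
            prev_was_agent = False
            if field.lower() == "disallow" and applies and value:
                patterns.append(value)
    return patterns

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path, robots_txt, agent="*"):
    """True if no applicable Disallow pattern matches the start of `path`."""
    return not any(
        pattern_to_regex(p).match(path)
        for p in disallowed_patterns(robots_txt, agent)
    )

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ("/directory/people/A", "/directory/pages/B", "/some-profile"):
        print(path, "allowed" if is_allowed(path, ROBOTS_TXT) else "disallowed")
```

Under these rules only the two directory listings are off limits; everything else on the site stays crawlable as far as robots.txt is concerned.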

Vivtek · on April 6, 2010

Google can do it because Facebook doesn't really want to go up against money in making this a test case, and Google has baskets of cash.

Facebook doesn't give a tinker's damn about prohibiting your value creation.

rapind · on April 6, 2010

Or because Google would simply comply with their demands and remove all mention of facebook from their indexes... That would be pretty interesting.

Vivtek · on April 6, 2010

Ha! Yeah, I hadn't thought of that aspect. Half of their userbase wouldn't even be able to sign on any more!

bluesmoon · on April 6, 2010

rather more than half I think

robryan · on April 6, 2010

I guess when they have a direct gain, i.e. people reaching Facebook profiles through Google searches, they are more than happy for this to occur.

axod · on April 5, 2010

Surely the ToS is only applicable if you use Facebook, e.g. by creating an account.

Either facebook wants to allow people to crawl it, or they don't. robots.txt should be binary - yes or no.
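To make the "binary" idea concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The two policies, the `ExampleBot` agent name, and the example.com URL are all hypothetical, not anything Facebook actually serves.

```python
from urllib.robotparser import RobotFileParser

# Two hypothetical all-or-nothing policies.
DENY_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"  # an empty Disallow permits everything

def can_crawl(robots_txt, url, agent="ExampleBot"):
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    url = "https://example.com/profile/1"
    print("deny-all policy: ", can_crawl(DENY_ALL, url))
    print("allow-all policy:", can_crawl(ALLOW_ALL, url))
```

A well-behaved crawler would run a check like this before each fetch; whether passing it settles anything legally is a separate question.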

alex1 · on April 5, 2010

That's not exactly true. The TOS usually applies to people crawling the servers and mining data. Still, there is no clear way to know how a court would rule on something like this; each case is different.

See http://en.wikipedia.org/wiki/Browse_wrap

greendestiny · on April 6, 2010

You don't waive your copyright by having a robots.txt, and while I believe most people think Google-style indexing and searching is fair use, that doesn't mean anything you do with the data is fair use.

wtn · on April 6, 2010

User content on Facebook may not be copyrightable. If I make a list of my personal interests, I haven't necessarily produced a creative work by the standards of US law.

Check out this site (just found it via search): http://www.canyoucopyrightatweet.com/

tibbon · on April 6, 2010

Correct: lists of facts without styling aren't something you can copyright. The specific form in which they are printed is copyrightable, but there is no IP created by a list of facts. Phone numbers, game scores, colors of rocks, etc. are not copyrightable.

gluejar · on April 7, 2010

The courts have generally only required a minimum of creativity to make something copyrightable. Also, sentences are surprisingly easy to make unique; see http://go-to-hellman.blogspot.com/2009/11/uniqueness-of-sent...

jws · on April 5, 2010

I don't see any support for statement #1.

Statement #2 is a false dichotomy.

The Terms for Facebook is a terrible document. It is written to be intelligible to humans but is full of ambiguity and undefined terms. Any lawsuit about the theory "robots.txt did not forbid my actions; therefore they are legal" would probably devolve into a dispute over how the Terms language is interpreted.

Their lawyers probably leapt from high windows when it was released.

axod · on April 5, 2010

Surely, if true, that means any web crawler must first locate and understand every single website's terms of service before it can be sure what it is allowed to do.

For a start, though, you could easily state that Facebook freely allows access to data without requiring you to read the terms and conditions.

You could argue that by freely allowing access to all of their data, without requiring you to read and agree to the terms of usage, then their claims have no basis.

If they did want to restrict access, or make sure every crawler had first agreed to T&C, it wouldn't take long for them to add that.

Vivtek · on April 6, 2010

I think this is missing the point. Facebook's attorneys don't care about the technical details. They already know that whatever is technically possible is entirely subordinate to how they can threaten the owners of the technology.

It hasn't been tested in court, and there's a truly excellent chance Facebook would lose - either in terms of the court or in terms of what's left of their privacy reputation - but that doesn't matter one little bit. They have attorneys and money, and that's all that matters in this instance.

Really. The facile assumption that "it's possible to aggregate thus it's OK to aggregate" is exactly the way normal people think (by which I mean, tongue-in-cheek, us), but corporate attorneys see all this in terms of power relationships and contracts. As far as they're concerned, the poster took pictures through their front windows, and they're damn well going to threaten him with kneecapping until he gives them his negatives.