
robots.txt is good for telling robots they have no business reading a section, or will be severely disappointed if they do. There is no way one can encode the TOS of a site into the syntax of robots.txt, so there is no reason to believe it embodies the TOS.

I've never used Facebook, but Section 9.2.6 of their terms says: "You will delete all data you received from Facebook if we disable your application or ask you to do so."

I don't see how that could be encoded in robots.txt.




User-agent: *
Disallow: /directory/people/*
Disallow: /directory/pages/*

Done. Google crawls Facebook. Every social media monitoring company crawls Facebook (and every other social network). Tons of other companies do the same every day. I know this because many of these companies are our customers.
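For anyone curious what a crawler actually does with rules like the ones above, here is a minimal sketch in Python. It assumes the matching semantics of RFC 9309 (the IETF Robots Exclusion Protocol): `*` matches any run of characters, a trailing `$` anchors the end, the longest matching rule wins, and Allow wins a tie. The function names, the `ExampleBot` user agent and the example.com URLs are made up for illustration. As far as I know, Python's own `urllib.robotparser` does plain prefix matching and does not expand a `*` inside a path, which is why the matcher is hand-rolled here.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Minimal sketch assuming RFC 9309 matching rules; the function names,
# "ExampleBot" and the example.com URLs are hypothetical.


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    every other character is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def parse_groups(robots_txt: str) -> dict[str, list[tuple[bool, str]]]:
    """Map each lower-cased user-agent token to its (is_allow, pattern) rules."""
    groups: dict[str, list[tuple[bool, str]]] = {}
    agents: list[str] = []
    in_rules = False  # becomes True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if value:  # an empty "Disallow:" disallows nothing
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    groups = parse_groups(robots_txt)
    # Use the agent's own group if there is one, otherwise the '*' group.
    rules = groups.get(agent.lower(), groups.get("*", []))
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    best_len, allowed = -1, True  # no matching rule means allowed
    for is_allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            # The longest matching pattern wins; on a tie, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/people/*
Disallow: /directory/pages/*
"""

if __name__ == "__main__":
    base = "https://www.example.com"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", base + "/directory/people/A"))  # False
    print(is_allowed(ROBOTS_TXT, "ExampleBot", base + "/some.profile"))  # True
```

Note that the only thing the file can yield is a per-URL yes or no for a given user agent; there is no field for conditions on what happens to the data after it is fetched.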

What really irks me is that Pete collected valuable data, and whoever he shared it with was probably able to derive added value from it in ways Facebook itself is not. Facebook is prohibiting value creation.


Google can do it because Facebook doesn't really want to take on that much money by making this a test case, and Google has baskets of cash.

Facebook doesn't give a tinker's damn about prohibiting your value creation.


Or because Google would simply comply with their demands and remove all mention of Facebook from its index... That would be pretty interesting.


Ha! Yeah, I hadn't thought of that aspect. Half of their userbase wouldn't even be able to sign on any more!


Rather more than half, I think.


I guess when they gain directly, i.e. people reaching Facebook profiles through Google searches, they are more than happy for this to occur.


Surely the ToS only applies if you use Facebook, e.g. create an account.

Either Facebook wants to allow people to crawl it, or it doesn't. robots.txt should be binary: yes or no.


That's not exactly true. The TOS usually applies to people crawling the servers and mining data. Still, there is no clear way to know how a court would rule on something like this; each case is different.

See http://en.wikipedia.org/wiki/Browse_wrap


You don't waive your copyright by having a robots.txt, and while I believe most people think Google-style indexing and searching is fair use, that doesn't mean anything you do with the data is fair use.


User content on Facebook may not be copyrightable. If I make a list of my personal interests, I haven't necessarily produced a creative work by the standards of US law.

Check out this site (just found it via search): http://www.canyoucopyrightatweet.com/


Correct: lists of facts without styling aren't something you can copyright. The specific form in which they are printed is copyrightable, but a list of facts creates no IP. Phone numbers, game scores, colors of rocks, etc. are not copyrightable.


The courts have generally required only a minimum of creativity to make something copyrightable. Also, sentences are surprisingly easy to make unique; see http://go-to-hellman.blogspot.com/2009/11/uniqueness-of-sent...


I don't see any support for statement #1.

Statement #2 is a false dichotomy.

The Terms for Facebook is a terrible document. It is written to be intelligible to humans but is full of ambiguity and undefined terms. Any lawsuit built on the theory "robots.txt did not forbid my actions; therefore they are legal" would probably disintegrate into an argument over how the Terms language is interpreted.

Their lawyers probably leapt from high windows when it was released.


Surely, if true, that means any web crawler must first locate and understand every single website's terms of service before it can be sure what it is allowed to do.

For a start, though, you could easily point out that Facebook freely allows access to its data without requiring you to read its terms and conditions.

You could argue that by freely allowing access to all of their data, without requiring you to read and agree to the terms of use, they leave their claims with no basis.

If they did want to restrict access, or make sure every crawler had first agreed to the T&C, it wouldn't take long for them to add that.


I think this is missing the point. Facebook's attorneys don't care about the technical details. They already know that whatever is technically possible is entirely subordinate to how they can threaten the owners of the technology.

It hasn't been tested in court, and there's a truly excellent chance Facebook would lose - either in terms of the court or in terms of what's left of their privacy reputation - but that doesn't matter one little bit. They have attorneys and money, and that's all that matters in this instance.

Really. The facile assumption that "it's possible to aggregate thus it's OK to aggregate" is exactly the way normal people think (by which I mean, tongue-in-cheek, us), but corporate attorneys see all this in terms of power relationships and contracts. As far as they're concerned, the poster took pictures through their front windows, and they're damn well going to threaten him with kneecapping until he gives them his negatives.



