I thought it was going to be something like this: http://news.ycombinator.com/item?id=1243159, but the "scraping" here seems far more malicious and underhanded once you look at the details: multiple logins were used to avoid detection, and so much load was put on the server[s] that the website went down.
This is a hell of a lot more than just scraping a website like a search engine.
Mitsubishi = Client
Snap-on = Original Contractor
O'Neil = New Contractor
Snap-on built a database for Mitsubishi. Mitsubishi moved the work to O'Neil. Snap-on did not want to give up its work [data]. A login was required to access the data. Mitsubishi and O'Neil discussed ways to get the data. According to 'testimony', O'Neil tried to get the data covertly and to hide what they were doing. O'Neil caused damage. Snap-on sued O'Neil.
I think this is a better summary than the linked site, and it doesn't take 500 two-sentence paragraphs. No information was given on the contract terms. It is not mentioned how accounts are created to access the data, or who should or should not have access. The method of gathering the data was not discussed at all: no mention of an automated process, no mention of a script, no mention of using interns with a Firefox plugin.
What is bizarre/stupid is that Mitsubishi did not own the database in the first place. When you hire/contract a developer, it would be reasonable for the fruits of their labor to transfer to the one paying the bills.
Whoever wrote up the original contract at Mitsubishi screwed up, or alternatively Snap-on was smart/devious.
Reminds me a little of the Skype fiasco, with eBay not buying quite everything...
This is really a war between robots and people. Would there be a case if someone had flipped every page of the paper catalog and entered the information into a database? I doubt it.
Actually... the article mentioned that the scrapers also violated the site's T&C. Without knowing the details here, it could very well be that use of the service was limited to people only, not robots, or subject to other limits (e.g., only the person associated with a user account may access it, and sharing is not allowed). This gets even stickier since it seems the scrapers got the accounts from a company that probably signed some form of agreement with the database company.
I don't know the laws in other countries, but where I live there is a database law that explicitly gives the owner of a database rights. For example: you cannot simply copy the phonebook and put it on the web; it doesn't matter whether you use OCR technology or hire typists to type it all in manually.
Where I live (the United States), there is case law that found that where neither the individual pieces of data nor the selection criterion contain or are products of originality (read, "require editorial control" in the selection criterion aspect), the owner of a database has no such rights. For example: you can simply copy the phonebook and put it on the Web. It doesn't matter if you hire typists who type it all in manually after you've told them "Copy this." or if you use OCR technology.
The phonebook example was exactly the subject of Supreme Court case Feist v. Rural.
It seems like the only way Snap-on could win this one is if they demonstrate that their selection criterion met the standards for originality, which, though incredibly low, don't permit "Mitsubishi parts" to pass for originality.
Edit: This deals only with the copyright aspect of the suit. It doesn't cover the trespass and trade-secrets aspects, which O'Neil might be found liable for if there were terms and conditions they violated. But I don't know much about business/contract law.
As robots.txt is entirely optional (there's not even an RFC governing it), nothing it says or doesn't say is technically violated in any way here, let alone legally.
Yep, most services and search engines choose to respect the convention, but if the article is accurate, it doesn't sound like the robot in question was trying to be particularly kind or gentle in its work.
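To make the "convention" point concrete, here's a minimal sketch of what respecting robots.txt looks like in practice: a crawler has to *choose* to parse the file and check each URL before fetching. Nothing enforces it. (The robots.txt rules and bot name below are hypothetical, just for illustration.)

```python
# A well-behaved crawler checks robots.txt before fetching a URL.
# This check is purely voluntary -- it's a convention the crawler opts into.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Hypothetical robots.txt content, parsed directly for the example.
rp.parse([
    "User-agent: *",
    "Disallow: /catalog/",
])

# A polite bot asks before each fetch:
print(rp.can_fetch("MyBot/1.0", "https://example.com/catalog/part123"))  # False
print(rp.can_fetch("MyBot/1.0", "https://example.com/about"))            # True
```

A scraper that skips the `can_fetch` call entirely (as the one in the article apparently did) never touches this machinery at all, which is why the convention carries no technical force.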
But it's not the only thing to consider. The article does mention violations of agreements, not just the scraping; the argument isn't about scraping alone. I could see there being a contract that limited what the user accounts could be used for, and possibly that the accounts were restricted to particular individuals.
Of course, all of this is speculation. The point is, it's not JUST a scraping case.
The more interesting question is whether a site's individual policies apply. Does scraping a site constitute acceptance of its terms if no consent is sought before its use?
Surely Google would be violating millions of such policies.
In this case it sounds like a login was required to access the data, in which case the account holder probably had to agree to such terms.
As someone starting a business which plans to use web-crawling to a large extent I find the publicly-accessible side of this very interesting.
Unfortunately it's still a very gray area - nobody seems to have an answer for me. The advice seems to be "try it, but you may have to defend it", which means a lot more money than most startups have.
It seems it shouldn't matter whether a robots.txt was present, because logins were required to "scrape" the information, which implies that the accounts used probably had to accept some sort of agreement that forbade this in some way.
Interesting question. Anecdotally, I believe Facebook recently took the position that robots.txt has no legal standing:
"Their contention was robots.txt had no legal force and they could sue anyone for accessing their site even if they scrupulously obeyed the instructions it contained. The only legal way to access any web site with a crawler was to obtain prior written permission."