I thought it was going to be something like this: http://news.ycombinator.com/item?id=1243159, but the "scraping" here seems far more malicious and underhanded once you look at the details: multiple logins were used to avoid detection, and so much load was put on the server[s] that the website went down.
This is a hell of a lot more than just scraping a website like a search engine.
Mitsubishi = Client
Snap-on = Original Contractor
O'Neil = New Contractor
Snap-on built a database for Mitsubishi. Mitsubishi moved the work to O'Neil. Snap-on did not want to give up its work [data]. A login was required to access the data. Mitsubishi and O'Neil discussed ways to get the data. According to 'testimony', O'Neil tried to get the data covertly and to hide what they were doing. O'Neil caused damage. Snap-on sued O'Neil.
I think this is a better summary than the linked site, and it doesn't take 500 two-sentence paragraphs. No information was given on the contract terms. It is not mentioned how accounts are created to access the data, or who should or should not have access. The method of gathering the data was not discussed at all: no mention of an automated process, no mention of a script, no mention of using interns with a Firefox plugin.
What is bizarre/stupid is that Mitsubishi did not own the database in the first place. When you hire/contract a developer, it would be reasonable for the fruits of their labor to transfer to the one paying the bills.
Whoever wrote up the original contract at Mitsubishi screwed up, or alternatively Snap-on was smart/devious.
Reminds me a little of the Skype fiasco, with eBay not buying quite everything...
This is really a war between robots and people. Would there be a case if someone had flipped every page of the paper catalog and entered the information into a database? I doubt it.
Actually... the article mentioned that the scrapers also violated the site's T&C. Without knowing the details here, it could very well be that use of the service was limited to people only, not robots, or subject to other limits (e.g., only the person associated with a user account may access it, and sharing is not allowed). This gets even stickier since it seems the scrapers got the accounts from a company that probably signed some form of agreement with the database company.
I don't know the laws in other countries, but where I live there is a database law that explicitly gives the owner of a database rights. For example: you cannot simply copy the phonebook and put it on the web; it doesn't matter whether you use OCR technology or hire typists to type it all in manually.
Where I live (the United States), there is case law that found that where neither the individual pieces of data nor the selection criterion contain or are products of originality (read, "require editorial control" in the selection criterion aspect), the owner of a database has no such rights. For example: you can simply copy the phonebook and put it on the Web. It doesn't matter if you hire typists who type it all in manually after you've told them "Copy this." or if you use OCR technology.
The phonebook example was exactly the subject of Supreme Court case Feist v. Rural.
It seems like the only way Snap-on could win this one is if they demonstrate that their selection criterion met the standards for originality, which, though incredibly low, don't permit "Mitsubishi parts" to pass for originality.
Edit: This deals only with the copyright aspect of the suit. It doesn't cover the trespass and trade-secrets aspects, which O'Neil might be found liable for if there were terms and conditions they violated. But I don't know much about business/contract law.
As robots.txt is entirely optional (there's not even an RFC governing it), nothing it says or doesn't say is technically violated in any way here, let alone legally.
Yep, most services and search engines choose to respect the convention, but if the article is accurate, it doesn't sound like the robot in question was trying to be particularly kind or gentle in its work.
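To make the "convention" point concrete, here's a minimal sketch of what respecting robots.txt looks like in practice: a crawler has to *choose* to parse the file and check each URL before fetching. Nothing enforces it. (The robots.txt rules and bot name below are hypothetical, just for illustration.)

```python
# A well-behaved crawler checks robots.txt before fetching a URL.
# This check is purely voluntary -- it's a convention the crawler opts into.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Hypothetical robots.txt content, parsed directly for the example.
rp.parse([
    "User-agent: *",
    "Disallow: /catalog/",
])

# A polite bot asks before each fetch:
print(rp.can_fetch("MyBot/1.0", "https://example.com/catalog/part123"))  # False
print(rp.can_fetch("MyBot/1.0", "https://example.com/about"))            # True
```

A scraper that skips the `can_fetch` call entirely (as the one in the article apparently did) never touches this machinery at all, which is why the convention carries no technical force.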
But it's not the only thing to consider. The article does mention violations of agreements, not just the scraping; the argument isn't about scraping alone. I could see there being a contract that limited what the user accounts could be used for, and possibly that the accounts were restricted to particular individuals.
Of course, all of this is speculation. The point is, it's not JUST a scraping case.
The more interesting question is whether a site's individual policies apply. Does scraping a site constitute acceptance of its terms if no consent is sought before its use?
Surely Google would be violating millions of such policies.
In this case it sounds like a login was required to access the data, in which case the account holder probably had to agree to such terms.
As someone starting a business which plans to use web-crawling to a large extent I find the publicly-accessible side of this very interesting.
Unfortunately it's still a very gray area - nobody seems to have an answer for me. The advice seems to be "try it, but you may have to defend it", which means a lot more money than most startups have.
It seems it shouldn't matter whether a robots.txt was present, because logins were required to "scrape" the information, which implies that the accounts used probably had to accept some sort of agreement that forbade this in some way.
Interesting question. Anecdotally, I believe Facebook recently took the position that robots.txt has no legal standing:
"Their contention was robots.txt had no legal force and they could sue anyone for accessing their site even if they scrupulously obeyed the instructions it contained. The only legal way to access any web site with a crawler was to obtain prior written permission."