Considering the kind of scraping and selling of private data that LinkedIn has been chronically guilty of (and not just the ordinary "growth hack" stuff: "LinkedIn violated data protection by using 18M email addresses of non-members to buy targeted ads on Facebook" [1]), it's satisfying to see LinkedIn lose this.
This is a theme I've seen several times: something like "Music lyric site X sues Google for embedding their lyrics directly in the results," which is funny because site X got the lyrics by scraping them from other sites.
Plus, Google only exists because it scrapes content, yet I believe its TOS includes "don't scrape our content".
I find it really funny that scrapers are battling scrapers: guys, you only exist because you do THE EXACT SAME THING.
Regardless, there is legitimate value in the collection, cleaning, interlinking, and presentation of existing data. How the law interprets that is one thing, but the mere fact that the data came from a variety of other public/private sources doesn't mean it derived all of its value externally.
For sure, but they shouldn't be hypocritical about it. If they don't consider themselves content parasites, they shouldn't consider people scraping their site to be content parasites, either. (Some sites really are just parasites, though.)
There's nothing hypocritical about it. Googlebot respects the robots.txt of the sites it crawls, and Google in turn expects its own robots.txt to be respected. What's the issue?
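To make that concrete, here's a minimal sketch of how a compliant crawler consults robots.txt before fetching anything, using only Python's standard library. The directives in the example are illustrative, not copied from Google's actual robots.txt:

```python
# Minimal sketch of a compliant crawler honoring robots.txt, using only
# the Python standard library. The directives below are illustrative,
# not copied from any real site's robots.txt.
from urllib.robotparser import RobotFileParser

ILLUSTRATIVE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ILLUSTRATIVE_ROBOTS_TXT.splitlines())

# A well-behaved bot checks before every request.
print(parser.can_fetch("Googlebot", "/search?q=anything"))  # True
print(parser.can_fetch("RandomBot", "/search?q=anything"))  # False
print(parser.can_fetch("RandomBot", "/about"))              # True
```

A site that wants to welcome one crawler and refuse all others only has to publish directives like these.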
Can I politely point out that the conversation is not about respecting robots.txt?
If you want to talk about this in terms of robots.txt: Google thrives on the fact that other companies don't block Googlebot in their robots.txt, but at the same time Google blocks scrapers from its own content in its robots.txt.
> If you want to talk about this in terms of robots.txt: Google thrives on the fact that other companies don't block Googlebot in their robots.txt, but at the same time Google blocks scrapers from its own content in its robots.txt.
It seems like you're stating this as though to cast some sort of moral aspersion. I don't get it. If other companies don't want Googlebot to scrape them, they just have to say so. Most companies want Googlebot to scrape their content. Google doesn't want other people's scrapers to scrape Google's content. Nobody involved in any of this has done anything unreasonable or morally objectionable.
> Plus, Google only exists because it scrapes content, yet I believe its TOS includes "don't scrape our content".
Yes. This is EXTREMELY frustrating.
Of all companies to prevent scraping, Google doing so is the most ironic.
Especially since their stated goal is to organize the world's information, it shocks me that there's no machine-to-machine way to access this organized information.
I think it’s important to distinguish between types of barriers to entry: some are “real” while others are “artificial”. For example, a real barrier to entry would be institutional knowledge about an industry, while an artificial one would be an arbitrary TOS clause.
And disallowing scraping, or making it difficult while refraining from providing an API for the same data, is the arbitrary kind. The default state of the web is that it's trivially scrapable; you have to go out of your way to make it harder.
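To illustrate just how low that default barrier is, here's a minimal sketch that pulls every link out of a page using nothing but Python's standard library (https://example.com is a stand-in URL):

```python
# Minimal sketch: extracting every link from a page with nothing but
# the Python standard library. https://example.com is a stand-in target.
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

page = urlopen("https://example.com").read().decode("utf-8", errors="replace")
extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)
```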
[1] https://techcrunch.com/2018/11/24/linkedin-ireland-data-prot...