Google Removed 50 Million “Pirate” Search Results This Year (torrentfreak.com)
78 points by Libertatea on Dec 28, 2012 | 16 comments



Well, there is always Yandex, over on the other side of the North Pole, far away from the DMCA. I wonder if anybody has done a comparative analysis.

Or you could try Bing instead. Last May, Microsoft sent Google a takedown notice for a Bing search result, which was still working[1].

[1] http://www.techdirt.com/articles/20121008/03500520637/micros...


Yandex will still have to comply with the DMCA in its English/US-oriented .com version.

But I guess what will actually happen is people skipping Google and going straight to torrentz.eu.


When I've searched for content that falls under a DMCA takedown, there is usually a link to the original DMCA notice at Chilling Effects at the bottom of the Google results page.

In theory, shouldn't all the takedown notices include the URL that was de-listed? So how hard would it be to take a search term, run it through Google, look at all the linked DMCA notices at Chilling Effects, parse out the original offending URL and result title, and then reconstruct something close to the pre-DMCA search results?
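
A minimal sketch of that pipeline, assuming (since I haven't checked the real markup) that Google links each notice via a chillingeffects.org URL and that the removed URLs appear in the notice body as plain text; requires the requests and beautifulsoup4 packages:

    import re
    import requests
    from bs4 import BeautifulSoup

    def delisted_candidates(search_term):
        """Collect every URL mentioned in the Chilling Effects notices
        linked from a Google results page for search_term."""
        resp = requests.get(
            "https://www.google.com/search",
            params={"q": search_term},
            headers={"User-Agent": "Mozilla/5.0"},  # bare scripts tend to get blocked
        )
        page = BeautifulSoup(resp.text, "html.parser")
        # Links to the takedown notices Google appends at the bottom.
        notice_links = [a["href"] for a in page.find_all("a", href=True)
                        if "chillingeffects.org" in a["href"]]
        candidates = set()
        for link in notice_links:
            notice_text = requests.get(link).text
            # Naively treat every URL in the notice body as a candidate.
            candidates.update(re.findall(r"https?://[^\s\"'<>]+", notice_text))
        return candidates

That only produces candidates, of course; deciding which of them would actually have ranked for the query is the hard part.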

Is googlewithoutdmca.com available?

I'm not going to pursue this. If anyone wants to pick it up, knock yourself out.


Cool idea, but I don't think it's possible.

I just Googled "hobbit torrent" -- there were 5 DMCA notices at the bottom of the page -- so far, so good.

The problem is that a single DMCA complaint contains thousands of URLs for many movies (e.g. http://www.chillingeffects.org/notice.cgi?sID=709810). To reconstruct the page with these results would require Google to reveal which of the thousands of URLs was actually withheld.


I think it's possible, just a bit too hard to make it a quick "piss off the RIAA" project. Instead of merely parsing out the URLs, it would require linking the infringing titles at the top of each notice to the URLs below.

I would do a first pass that tries to match the URL string itself against the infringing titles. A second pass would cURL all of the yet-to-be-identified URLs and parse each page for content that links it back to an infringing title. I'd actually expect the page titles to contain the infringing title most of the time (and there's probably a small number of sites that account for 90% of the takedown notices).
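
Something like this rough sketch, say (the slug heuristics and the requests/BeautifulSoup plumbing are just my assumptions):

    import requests
    from bs4 import BeautifulSoup

    def match_urls_to_titles(urls, titles):
        matches, unresolved = {}, []
        # Pass 1: look for a slugified title inside the URL string itself.
        for url in urls:
            for title in titles:
                slug = title.lower().replace(" ", "-")
                dotted = title.lower().replace(" ", ".")
                if slug in url.lower() or dotted in url.lower():
                    matches[url] = title
                    break
            else:
                unresolved.append(url)
        # Pass 2: fetch the still-unidentified pages and check their <title> tags.
        for url in unresolved:
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            tag = BeautifulSoup(html, "html.parser").title
            page_title = tag.get_text().lower() if tag else ""
            for title in titles:
                if title.lower() in page_title:
                    matches[url] = title
                    break
        return matches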

This all leads me to another thought: what's to stop someone from parsing all the DMCA takedown URLs for the content they contain and then generating a search engine from them? The legality for US citizens would be dubious at best, but not all of us are from the US.
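
Mechanically it's not much: a toy inverted index over (title, URL) pairs pulled from the notices could look like this (the class and the sample data are made up for illustration):

    from collections import defaultdict

    class TakedownIndex:
        def __init__(self):
            self.by_word = defaultdict(set)  # word -> URLs whose title contains it
            self.titles = {}                 # URL -> title

        def add(self, title, url):
            self.titles[url] = title
            for word in title.lower().split():
                self.by_word[word].add(url)

        def search(self, query):
            # Return URLs whose titles contain every word of the query.
            sets = [self.by_word[w] for w in query.lower().split()]
            hits = set.intersection(*sets) if sets else set()
            return [(url, self.titles[url]) for url in sorted(hits)]

    # idx = TakedownIndex()
    # idx.add("The Hobbit 2012 DVDRip", "http://example.com/hobbit")  # hypothetical
    # idx.search("hobbit 2012")

Feed it the pairs parsed out of the notices and you have a crude takedown-powered search engine.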


And yet Google is still the easiest way to search for pirated content. I think that the most interesting part of this article is that the biggest alleged infringer, FilesTube, doesn't actually host any infringing files! It's simply a search engine which uses hyperlinks and iframes to embed results.


Apparently the rules are pretty simple; FilesTube gets indexed by Google[1], so it's a candidate for deindexing.

[1] I won't speculate about whether FilesTube is actually a search engine or a Mahalo-style SEO arbitrage play, since that's off-topic.


Yeah, baidu.com ;)


DMCA takedowns have certainly been growing.

However, isn't 50 million still a drop in the bucket, considering Google's index is around 45 billion pages? That works out to roughly 0.1% of the index.


At the rate the requests are growing, there is no way to check whether each request is legitimate. This allows the DMCA to be used as a weapon against sites that are not infringing anything.

http://techflap.com/2012/10/microsoft-sends-dmca-notices-to-...


On the site you can see which individual requests were rejected (or partially rejected).

e.g. http://www.google.com/transparencyreport/removals/copyright/...

If you look through the page of all requests

http://www.google.com/transparencyreport/removals/copyright/...

there is a column showing which requests had URLs rejected. You have to go back pretty far to find any (there are a lot of requests, and URLs are only rarely rejected).

e.g. http://www.google.com/transparencyreport/removals/copyright/...


I applaud Google for at least trying to do the right thing, but how do we know what percentage of false positives/negatives is in there?


A third party is free to go through Google's transparency reports and their takedown requests, and use the engine itself to validate what got taken down. As far as I am aware, Google is the only party devoting the resources necessary to go through their DMCA requests.


They will grow, and not in a good way. These are automated requests, and the RIAA and the others don't really care if a percentage of them are not correct. The problem is that if your site is on these lists and you actually have a fair-use claim, your complaint will be processed manually, so it will take a lot of time.

BTW, the site list[1] looks like a great index of file-sharing sites, which somewhat defeats the purpose.

[1] http://www.google.com/transparencyreport/removals/copyright/...


For anyone interested in pirating content on the internet, it is already pretty easy (even without Google search, or this page in particular), so this doesn't really change anything.


Right. Twitter-search for putlocker.com or sockshare.com. #whac-a-mole. I never understood how Megavideo got taken down when YouTube had a much larger library of pirated content. It still does; I just watched Dredd 3D on it last week.



