Can't you stop crawling sites that dish up your search terms for you (without any content)? A good example is eudict.com - search for an obscure word and this is the first result that pops up.
It then returns a page with your search queries and no information.
What use is a site that simply returns search queries?
>Why aren't sites that scrape content blacklisted?
The problem is more difficult than you'd think. For instance, virtually every news organization "scrapes" the Associated Press, but we wouldn't want to throw out every news organization.
Content-free search result pages are things we do try to remove, even manually if it becomes a big enough problem.
>virtually every news organization "scrapes" the Associated Press
If they're not adding real value, like analysis or graphics or commentary or whatnot, why would you want to keep them if they're all just duplicates?
I had a friend who worked at a startup trying to solve this exact problem: we read virtually identical articles about the same bit of news on all the news sites. The startup was working on highlighting only the unique bits of each article and recommending the one article that seems to have the most pieces of information. You would read that one and skim to the unique bits of the others, and you would have gotten all the angles and facts much more quickly.
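To make the idea concrete, here is a rough sketch of how "unique bits" could be found. I don't know how the startup actually did it. This version assumes each sentence counts as one piece of information and that two sentences match only if they are identical after lowercasing and stripping punctuation. The names `sentences`, `normalize` and `compare_articles` are made up for the illustration.

```python
import re

def sentences(text):
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def normalize(sentence):
    # Lowercase and drop punctuation so trivially different copies match.
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))

def compare_articles(articles):
    """articles maps a source name to its text.

    Returns (recommended, unique), where recommended is the source with
    the most distinct sentences and unique maps each source to the
    sentences that appear in no other source.
    """
    facts = {
        name: {normalize(s): s for s in sentences(text)}
        for name, text in articles.items()
    }
    # Assumption: more distinct sentences means more pieces of information.
    recommended = max(facts, key=lambda name: len(facts[name]))
    unique = {}
    for name, own in facts.items():
        elsewhere = set()
        for other, other_facts in facts.items():
            if other != name:
                elsewhere.update(other_facts)
        unique[name] = [s for key, s in own.items() if key not in elsewhere]
    return recommended, unique
```

A real system would need fuzzy matching, since rewritten wire copy rarely repeats sentences word for word, but the structure would be similar: pick one article to read in full, then show only what the others add.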
We do filter near-duplicates within the same set of results. You'll likely see only one copy of an AP story, with a link at the bottom saying something like "Repeat this search with the omitted results included".
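For readers wondering what near-duplicate filtering can look like in general, here is a minimal sketch using word shingles and Jaccard similarity. It is a textbook technique and not a description of how any particular search engine does it. The shingle size `k=3`, the `threshold=0.8` cutoff and the function names are assumptions chosen for the example.

```python
def shingles(text, k=3):
    # Set of overlapping k-word windows, lowercased.
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    # Size of the intersection over size of the union; 1.0 means identical sets.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def filter_near_duplicates(results, threshold=0.8):
    # Keep a result only if it is not too similar to one already kept,
    # so the first copy of a story wins and later copies are omitted.
    kept = []
    kept_shingles = []
    for text in results:
        current = shingles(text)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(text)
            kept_shingles.append(current)
    return kept
```

Comparing every result against every kept result is quadratic, which is fine for one page of results; at larger scale something like MinHash or SimHash fingerprints would be the usual substitute.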
Google News doesn't show up in search results the way, e.g., Mahalo might. The only search results I've seen that incorporate Google News are built right into the main results page; I don't click through expecting content and get a Google News page instead.
In fact, I haven't seen any Google-owned scraping or aggregating page in a result that I've clicked through. They are big believers in the theory that you should look at exactly one search results page, not a page that takes you to a page that takes you to (...) the result you actually wanted.
>I haven't seen any Google-owned scraping or aggregating page in a result that I've clicked through.
What about Google Health results? Try [Whooping Cough] or similar. The top 'result' is a Google Health page whose main column is all content republished from Medline. The right column is essentially 'more results' from News and Scholar.
It's not quite as bad as other paste-together pages of text and more results, but they're creeping in that direction.
>It then returns a page with your search queries and no information.
>What use is a site that simply returns search queries?
>Why aren't sites that scrape content blacklisted?