It's a crazy problem to have, but we haven't found a solution. We can't just disappear from all the various indexes and collections that these robots represent.
I face a similar issue going forward. I think http://prerender.io has been posted to HN before; it seems to be a pretty reasonable plug-and-play solution.
Thanks for the link! We started working on this project in early 2013, and prerender.io didn't exist then (I think).
prerender.io does do most of what we do. A few things we do differently: we keep a pool of PhantomJS workers always available, kill HTTP requests that never complete, put the whole thing behind HAProxy, and log everything.
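For anyone building something similar, here is a rough Python sketch of the "warm pool plus kill what hangs" idea. It is not their code. `RendererPool`, `Worker` and `FAKE_RENDERER` are hypothetical names, and the child process is a stand-in for PhantomJS. It assumes a one-URL-per-line stdin/stdout protocol, which is my invention, not something from the thread. It also reads "killing requests that never complete" as "kill and replace a renderer that misses its deadline", which is only one interpretation. HAProxy would sit in front of several of these services and is not shown.

```python
import logging
import queue
import subprocess
import sys
import threading

# Hypothetical stand-in for a PhantomJS render script: reads one URL per line
# on stdin and writes one line of "HTML" per URL on stdout.
FAKE_RENDERER = r'''
import sys, time
for line in sys.stdin:
    url = line.strip()
    if "hang" in url:
        time.sleep(3600)  # simulate a page whose requests never finish
    print("<html>" + url + "</html>", flush=True)
'''


class Worker:
    """One long-lived renderer process plus a thread forwarding its stdout lines."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True)
        self.lines = queue.Queue()
        threading.Thread(target=self._pump, daemon=True).start()

    def _pump(self):
        # Ends on EOF, i.e. when the child process exits or is killed.
        for line in self.proc.stdout:
            self.lines.put(line.rstrip("\n"))

    def render(self, url, timeout):
        self.proc.stdin.write(url + "\n")
        self.proc.stdin.flush()
        return self.lines.get(timeout=timeout)  # raises queue.Empty on timeout

    def kill(self):
        self.proc.kill()
        self.proc.wait()
        self.proc.stdin.close()


class RendererPool:
    """Fixed-size pool of pre-started workers; a worker that fails is replaced."""

    def __init__(self, cmd, size=2, timeout=5.0):
        self.cmd, self.timeout = cmd, timeout
        self.idle = queue.Queue()
        for _ in range(size):
            self.idle.put(Worker(cmd))

    def render(self, url):
        worker = self.idle.get()  # blocks until a warm worker is free
        try:
            html = worker.render(url, self.timeout)
        except (queue.Empty, OSError):
            worker.kill()                     # hung or dead render: kill it...
            self.idle.put(Worker(self.cmd))   # ...and keep the pool at full size
            logging.warning("render failed or timed out: %s", url)
            return None
        self.idle.put(worker)
        return html

    def close(self):
        while not self.idle.empty():
            self.idle.get().kill()
```

Usage: `RendererPool([sys.executable, "-c", FAKE_RENDERER], size=2)`. In a real setup `cmd` would launch PhantomJS with a render script that speaks the same (assumed) protocol. Keeping workers warm avoids paying process startup on every crawler hit. Replacing a timed-out worker, rather than reusing it, means one stuck page can't permanently eat a slot.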
I wrote about this back in July of 2010, and the problem has only gotten worse: http://don.blogs.smugmug.com/2010/07/15/great-idea-google-sh...