Best of luck to the author. It does remind me of a broader question -- why is there no...

duggan · on Dec 21, 2011

Edit: there appears to be such a project, at least on the crawling side: http://www.commoncrawl.org/

I'd say there's a combination of factors, the first (and most important) being that Google is good enough for most people.

You'd need to coordinate crawling so as not to turn it into a giant DDoS machine, and speed would suffer from geographic distribution, variable hardware, and variable result sets.
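One way to read the coordination point: make sure each host is crawled by exactly one peer, and have that peer rate-limit itself. Below is a minimal sketch, assuming rendezvous (highest-random-weight) hashing for host assignment and a fixed per-host delay. That is my choice of technique, not something proposed above, and `owner_peer` and `PolitenessGate` are hypothetical names. A real crawler would also need robots.txt handling, retries, and peer churn.

```python
import hashlib
from urllib.parse import urlsplit


def owner_peer(url: str, peers: list[str]) -> str:
    """Hypothetical helper: pick the single peer responsible for this URL's host.

    Rendezvous hashing: every peer computes the same answer independently,
    so no two peers hit the same host, and removing a peer only reassigns
    the hosts that peer owned.
    """
    host = urlsplit(url).hostname or ""

    def weight(peer: str) -> int:
        digest = hashlib.sha256(f"{peer}|{host}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(peers, key=weight)


class PolitenessGate:
    """Hypothetical per-peer limiter: a minimum delay between fetches to one host."""

    def __init__(self, min_delay_s: float = 1.0):
        self.min_delay_s = min_delay_s
        self._next_ok: dict[str, float] = {}  # host -> earliest allowed fetch time

    def wait_time(self, url: str, now: float) -> float:
        """Seconds to wait before fetching url; 0.0 means fetch now."""
        host = urlsplit(url).hostname or ""
        return max(0.0, self._next_ok.get(host, 0.0) - now)

    def record_fetch(self, url: str, now: float) -> None:
        """Note that a fetch to url's host happened at time `now`."""
        host = urlsplit(url).hostname or ""
        self._next_ok[host] = now + self.min_delay_s
```

All URLs on one host map to the same peer, so the per-host delay only has to be enforced locally and no cross-peer locking is needed.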

Validity and reliability of the data would also be issues, and would probably require several peers to "agree" on consistency, but in a way that does not allow easy gaming of results.
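A rough illustration of peers "agreeing" while resisting gaming: pick a pseudo-random committee of peers per URL, have each report a hash of what it fetched, and accept a value only if a strict majority of the committee reports it. This is a sketch under my own assumptions (SHA-256 content digests, a committee seeded from the URL, strict-majority voting); the function names are hypothetical. Seeding from the URL alone is predictable, so a real design would need an unpredictable seed and some defence against one operator running many peers.

```python
import hashlib
import random
from collections import Counter
from typing import Optional


def digest(content: bytes) -> str:
    """Content fingerprint each peer reports instead of the full page."""
    return hashlib.sha256(content).hexdigest()


def pick_committee(peers: list[str], k: int, seed: str) -> list[str]:
    """Hypothetical helper: choose k verifying peers pseudo-randomly.

    A shared seed lets every peer recompute the same committee.
    """
    rng = random.Random(seed)
    return rng.sample(peers, k)


def quorum_result(reports: dict[str, str], committee: list[str]) -> Optional[str]:
    """Return the digest a strict majority of the committee agrees on, else None.

    Reports from peers outside the committee are ignored; missing reports
    count against reaching a majority.
    """
    votes = Counter(reports[p] for p in committee if p in reports)
    if not votes:
        return None
    value, count = votes.most_common(1)[0]
    return value if count * 2 > len(committee) else None
```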

I suppose they're all solvable, though I think there would have to be a powerful incentive to do so. I imagine it'd also be quite pricey for an individual, though perhaps Gabriel Weinberg* could weigh in there.

[*] http://www.gabrielweinberg.com/

kenjackson · on Dec 21, 2011

I'd say there's a combination of factors, the first (and most important) being that Google is good enough for most people.

This is actually why I think it's important to have a good open source alternative. Google is good today. And frankly, if Bing weren't around, Google could probably stop doing any work on search for the next five years with impunity.

fizx · on Dec 21, 2011

Gabe outsources his full-web index to Yahoo BOSS/Bing.

fizx · on Dec 21, 2011

Search engines are expensive to run. Roughly speaking, to do it well, you have to keep an up-to-date copy of the internet in RAM.
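To make the RAM point concrete: what a search engine has to keep hot is mainly an inverted index (term -> documents containing it), and every query intersects those posting lists. Keeping it "up-to-date" means re-indexing pages as they change. Below is a toy in-memory version, assuming simple lowercase alphanumeric tokenization and AND-only queries; `InMemoryIndex` is a hypothetical name. Production engines add ranking, compression, and sharding across many machines.

```python
import re
from collections import defaultdict


class InMemoryIndex:
    """Hypothetical toy inverted index held entirely in RAM."""

    def __init__(self):
        self.postings: dict[str, set[int]] = defaultdict(set)  # term -> doc ids
        self.docs: dict[int, str] = {}  # doc id -> current text

    @staticmethod
    def tokenize(text: str) -> list[str]:
        return re.findall(r"[a-z0-9]+", text.lower())

    def remove(self, doc_id: int) -> None:
        """Drop a document's old postings (needed to stay up-to-date)."""
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for term in set(self.tokenize(old)):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def add(self, doc_id: int, text: str) -> None:
        """Index (or re-index) a document, replacing any previous version."""
        self.remove(doc_id)
        self.docs[doc_id] = text
        for term in set(self.tokenize(text)):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[int]:
        """AND query: ids of documents containing every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        # Start from the smallest posting list so intersections stay cheap.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(lists[0])
        for s in lists[1:]:
            result &= s
        return sorted(result)
```

Each query term costs one posting-list lookup plus an intersection. That is why the index has to sit in memory rather than on disk when it must answer quickly at web scale.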