
It makes sense if you are handling millions of requests from all over the world per second and need failovers if machines go down.

But...if you just want to run your own personal search engine say...

Then for Wikipedia/Stackoverflow/Quora-size datasets (roughly 50GB, plus about 10GB of every kind of index: regex, geo, full text, etc.) you can run real-time indexing on live updates, with all the advanced search options you see under their "advanced search" pages, on any random Dell or HP desktop with about 6-8GB of RAM.
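For anyone curious what the local full-text part of that can look like, here is a minimal sketch using SQLite's FTS5 extension from Python (assuming your SQLite build includes FTS5, which most do; the file name and sample documents are placeholders):

    import sqlite3

    conn = sqlite3.connect("local_search.db")
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(title, body)")

    # In practice you would stream a Wikipedia/Stackoverflow dump in here.
    conn.executemany(
        "INSERT INTO docs (title, body) VALUES (?, ?)",
        [
            ("Hadoop", "A framework for distributed storage and batch processing."),
            ("Elasticsearch", "A search engine built on Apache Lucene."),
        ],
    )
    conn.commit()

    # Ranked full-text query; bm25() is FTS5's built-in relevance function.
    for (title,) in conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
        ("distributed storage",),
    ):
        print(title)

That only covers the full-text case; regex and geo indexes need other tools, but a 50GB corpus plus its indexes fits comfortably on one desktop.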

Lots of people do this on Wall Street. People don't get what is possible on a desktop because so much of it has moved to the cloud. It will come back to the desktop imho.




It won't come back to desktops, because as the machines get cheaper, the cost of the people to maintain them keeps increasing.

There will always be people who need them and use them but that proportion is going to keep decreasing (I'm somewhat sad about this, but the math is hard to argue with).


Just temporary. Nature did not need to invent cloud computing to perform massive computation. The speed and volume of data involved in the computation going on in a cell or an ant's brain show us how far we can still go on the desktop. The breakthroughs will come.

In the meantime (unless you are dealing with video), most text and image datasets the average Joe needs can easily be stored and processed entirely locally, thanks to cheap terabyte drives and multicore chips these days. People just haven't realized there isn't that much usable textual data, OR that local computing doesn't require all the overhead of handling millions of requests a second. That is a Google problem, not an average-Joe problem, that is being solved with cloud compute.


You seem to have missed everything the comment you are replying to says.

There is no dispute with the size or amount of compute available on desktops.


No idea what cost that comment refers to. If you are Facebook or YouTube or Twitter or Amazon, sure, you need your five-thousand-person PAID army of content reviewers to keep the data clean, because the data is mostly useless. But if you are running Wikipedia's search or Stackoverflow's search, or the team at the Library of Congress, please go and take a look at the size of those teams, their budgets, and their growth rates.


Are you conflating "single server" and "desktop computer" here? It sounds a lot like you might be?

Because almost everyone already uses the model of co-locating the search index and query code on a single computer (both Wikipedia and Stackoverflow use Elasticsearch, which does this).
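For reference, that single-node, co-located model is just a matter of pointing the client at localhost. A hedged sketch with the official Python client, assuming an Elasticsearch 8.x node running locally with security disabled (index name and query are made up):

    from elasticsearch import Elasticsearch

    # Index and query code talk to the same single node on this machine.
    es = Elasticsearch("http://localhost:9200")

    es.index(index="docs", document={"title": "Hadoop", "body": "distributed storage"})
    es.indices.refresh(index="docs")

    hits = es.search(index="docs", query={"match": {"body": "storage"}})
    for hit in hits["hits"]["hits"]:
        print(hit["_source"]["title"])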

They use multiple physical servers because of the number of simultaneous requests they serve.

This has never been the use-case for Hadoop.

I've built Hadoop based infrastructure for redundantly storing multiple PB of unstructured data with Spark on top for analysis. This is completely different to search.

That's very different from the Wall Street analyst running desktop analysis in MATLAB, or the oil/gas exploration team doing the same thing.



