1.75 GB isn't enough data to justify a Hadoop solution. That much data fits easily in memory, and certainly on a single machine.
It depends on what you do with the data. If you are processing the data in 512KB chunks and each chunk takes a day to process (because of expensive computation), you probably do want to spread the work over a cluster.
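A minimal sketch of that setup, assuming the file can be split at arbitrary byte offsets and that each 512 KB chunk can be processed independently. `expensive_work` is a hypothetical placeholder for the day-long computation, and a local `concurrent.futures` process pool stands in for a cluster scheduler; on a real cluster the same split-and-map pattern would run across machines instead of local processes.

```python
import os
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 512 * 1024  # 512 KB per chunk, as in the comment


def chunk_ranges(total_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) pairs that tile a file of total_size bytes."""
    return [(off, min(chunk_size, total_size - off))
            for off in range(0, total_size, chunk_size)]


def read_chunk(path, offset, length):
    """Read one byte range of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def expensive_work(path, offset, length):
    """Hypothetical stand-in for the slow per-chunk computation."""
    data = read_chunk(path, offset, length)
    return sum(data)  # placeholder result: sum of the chunk's bytes


def process_file(path, workers=None):
    """Run expensive_work on every chunk in parallel and return results in order."""
    ranges = chunk_ranges(os.path.getsize(path))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each task only needs the path and its byte range, so tasks are independent.
        futures = [pool.submit(expensive_work, path, off, n) for off, n in ranges]
        return [f.result() for f in futures]
```

Call `process_file` from under an `if __name__ == "__main__":` guard, which process pools require on platforms that start workers with the "spawn" method. When the per-chunk work takes seconds rather than days, the same loop run serially on one machine is usually the simpler choice.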
I don't think of Hadoop as being built for high-complexity computation, but for high IO throughput.
When you describe this kind of setup, I imagine problems that involve proof by exhaustion. Prime number search, for example, has a small input and a large calculation time. However, such problems don't really benefit from Hadoop, since you don't need its data management facilities, and a simpler MPI solution could handle them better.
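To illustrate a small-input, compute-heavy job, here is a sketch of a parallel prime count. It uses Python's `multiprocessing.Pool` as a stand-in for MPI (it is not MPI), but the shape is similar to a scatter/reduce: each worker receives only a pair of integers, so there is no data to distribute or manage. The function names are hypothetical, and trial division is chosen for brevity, not speed.

```python
from multiprocessing import Pool


def is_prime(n):
    """Trial division: simple, CPU-bound, needs no input data."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def count_primes_in_range(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous half-open sub-ranges."""
    if limit <= 0 or parts < 1:
        return []
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def count_primes_parallel(limit, workers=4):
    """Scatter sub-ranges to worker processes, then reduce by summing the counts."""
    with Pool(workers) as pool:
        return sum(pool.map(count_primes_in_range, split_range(limit, workers)))
```

As with any process pool, call `count_primes_parallel` from under an `if __name__ == "__main__":` guard. With trial division, higher ranges cost more than lower ones, so equal-width ranges are not perfectly balanced; splitting into many more ranges than workers evens out the load.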
Search indexing could fit this description (URL -> results), but generally you want the additional network cards for throughput, and the disks to store the results. Then again, the aggregate disk space starts looking closer to TB than GB. Plus, in the end you need to do something with all those pages.
I think the article said that you don't need to use Hadoop for everything, and that it can be much faster to just use command-line tools on a single computer. Of course, you might find a use case where the total computing time is massive, and in that case a cluster is better. I still don't think many use cases have that problem.
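For line-oriented text data, a single streaming pass on one machine often covers what a small MapReduce job would. Below is a rough Python analogue of a `grep -o PATTERN | sort | uniq -c` pipeline; memory grows with the number of distinct matches rather than with the input size. The function name and the example pattern in the usage note are illustrative assumptions, not taken from the article.

```python
import re
from collections import Counter


def count_matches(lines, pattern):
    """Roughly `grep -o PATTERN | sort | uniq -c`, done in one streaming pass."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        # group(0) is the whole match, like `grep -o`, even if the pattern has groups.
        counts.update(m.group(0) for m in regex.finditer(line))
    return counts
```

Any iterable of lines works, so a file object or `sys.stdin` can be passed directly, e.g. `count_matches(sys.stdin, r'Result "[^"]*"').most_common()` with a hypothetical pattern.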
We do some simple statistics at work on much smaller data sizes, and the computing time is usually around 10-100 ms, so it could probably process small batches at almost network speed.
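Assuming "simple statistics" means something like count, mean and variance (the comment doesn't say), a single-pass accumulator such as Welford's algorithm handles each batch in linear time with constant memory, and per-batch results can be combined with the pairwise update of Chan et al. `RunningStats` is a hypothetical name for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: count, mean and variance in one pass, O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def push(self, x):
        """Add one observation."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Fold another batch's stats into this one (Chan et al. pairwise update)."""
        n = self.n + other.n
        if n == 0:
            return
        delta = other.mean - self.mean
        self.mean += delta * other.n / n
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.n = n

    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 for fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def stddev(self):
        return math.sqrt(self.variance())
```

Because `merge` only needs three numbers per batch, each incoming batch can be summarized as it arrives and folded into a running total without keeping the raw data.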
Definitely. I was replying to my parent poster, because size doesn't tell the whole story. 1TB can be small and 1GB can be big; it depends on how much computation time your processing of the data requires.
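A back-of-envelope sketch of that point, under idealized assumptions: chunks are independent, work is perfectly balanced, and IO and scheduling overhead are ignored. It treats "GB" and "TB" as binary units (GiB, TiB), and the 10 ms per chunk in the usage note is a hypothetical figure.

```python
import math

KIB, GIB, TIB = 1024, 1024**3, 1024**4
DAY = 24 * 3600  # seconds in a day


def wall_clock_seconds(data_bytes, chunk_bytes, seconds_per_chunk, workers=1):
    """Idealized runtime: each worker processes one chunk per round."""
    chunks = math.ceil(data_bytes / chunk_bytes)
    rounds = math.ceil(chunks / workers)
    return rounds * seconds_per_chunk
```

Under these assumptions, 1.75 GiB split into 512 KiB chunks is 3584 chunks; at one day per chunk that is 3584 days (about 9.8 years) on one machine, or 36 days across 100 machines. By contrast, 1 TiB at 10 ms per 512 KiB chunk is 2,097,152 chunks, roughly 5.8 hours on a single core.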