
I studied Google's file system. What Google figured out is that, with the rise of very fast networking such as 10 Gig Ethernet and faster, the network had become much faster than local disk. Files were spread across multiple servers, which could all stream different parts of a file off their local disks simultaneously to the client, delivering it faster than any single machine's local disk could. That's how a system like Gmail could run faster than even a local-disk-based email client, while serving thousands of users.
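
Roughly the mechanics, as a toy single-process sketch (not the actual GFS client API; the chunkserver names and the fetch_chunk stand-in are made up here). The point is just that each chunk lives on a different machine, so the reads overlap:

    import concurrent.futures

    CHUNK_SIZE = 64 * 1024 * 1024  # GFS used 64 MB chunks

    def fetch_chunk(server, chunk_handle):
        # Stand-in for the RPC that asks one chunkserver for one chunk's
        # bytes; here we just fabricate a payload instead of hitting the
        # network.
        return b"x" * 1024

    def read_file(chunk_locations):
        # chunk_locations: ordered (server, chunk_handle) pairs, which a
        # real client gets from the GFS master. Each chunk sits on a
        # different machine's local disk, so fetching them concurrently
        # aggregates many disks' bandwidth over the network instead of
        # waiting on one spindle.
        with concurrent.futures.ThreadPoolExecutor() as pool:
            parts = pool.map(lambda loc: fetch_chunk(*loc), chunk_locations)
        return b"".join(parts)

    print(len(read_file([("cs1", 0), ("cs2", 1), ("cs3", 2)])))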

Other providers were probably using expensive NAS appliances with huge profit margins built in. Google was using thousands of the cheapest, crappiest commodity parts, because everything was triple-replicated... and it worked faster, because the network was really fast and multiple computers could stream different parts of the same file to clients.




https://static.googleusercontent.com/media/research.google.c...

Very, very influential reading back in the day, and still interesting.


It also came out at an interesting time, because everyone was trying to push data-to-redundancy ratios to their limits. Since storage was so expensive back then, storing multiple copies of data made little sense from a pure storage-cost perspective, even if the speeds were much better.

Then Google dropped their MapReduce paper: https://static.googleusercontent.com/media/research.google.c...

That paper paved the way for modern data processing, and it works extremely well with the Google File System architecture.
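
For anyone who hasn't read it: the programming model is just two user-supplied functions, and everything in between (partitioning, shuffling, retries) is the framework's job. A toy single-process sketch of the word-count example from the paper, with the "shuffle" step faked by an in-memory dict (the real thing runs the map and reduce phases across thousands of machines, reading input chunks out of GFS):

    from collections import defaultdict

    # map(key, value) -> list of (intermediate_key, intermediate_value)
    def map_fn(_doc_name, text):
        return [(word, 1) for word in text.split()]

    # reduce(intermediate_key, values) -> output value
    def reduce_fn(_word, counts):
        return sum(counts)

    def map_reduce(inputs):
        # The framework's job: run maps, group by key (the "shuffle"),
        # then run one reduce per intermediate key.
        groups = defaultdict(list)
        for key, value in inputs:
            for k, v in map_fn(key, value):
                groups[k].append(v)
        return {k: reduce_fn(k, vs) for k, vs in groups.items()}

    print(map_reduce([("doc1", "the quick brown fox"),
                      ("doc2", "the lazy dog")]))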


Yeah, and then everyone took Hadoop and threw it on a NAS, or provisioned it in the cloud and... threw it on a NAS. Always scratched my head at that one.



