Scaling the Facebook data warehouse to 300 PB (facebook.com)
128 points by ot on April 10, 2014 | 39 comments



Based on my current connections and experience, I can't really see myself ever getting the chance, but I'd love the opportunity to work on similar problems - at Facebook or anywhere. I've always loved working on lower-level problems like this. I learned to program by writing computer game hacks, reverse engineering games, and coding in C / ASM. Currently, I find myself writing C# every day for a small company where nobody understands a single word I say about programming.


Yeah, it's interesting how Facebook does something which on its face seems trivial and unimportant, but due to scale has some really amazing engineering and infrastructure challenges -- and they solve them by building tech to make things scale from commodity systems (as Google etc. have done), vs. buying third-party big iron solutions (which is what eBay did when they had similar challenges, and what most big companies do).


Most big iron solutions are effectively commodity systems these days anyway, occasionally with fancy interconnects and coprocessors.


Yes, I laughed when I got a console prompt on an $80,000 EMC Isilon disk array and it was FreeBSD.


Do you think it's a cost/control decision not to buy big iron (outsourcing to a degree) vs. building/scaling it themselves? I'd like to hear some HN thoughts if anyone doesn't mind sharing.


"Big iron" isn't all it's cracked up to be. Everything is a trade-off. Very few people are doing pure computation and that is where those machines excel (in addition to lots of aggregate I/O). The government research labs and the like get a lot of use from these machines.

If you're trying to scale an Internet-style app on one of these machines, you might need to expand past one machine after a while. By staying on one machine, you avoid all the complexity needed in your software to coordinate between multiple machines, but once you no longer fit on a single box, you have to add that complexity anyway. So what do 10 beefy boxes buy you as opposed to 1000 smaller ones? There is of course an operational/DC/power cost to more boxes, but I think most shops consider that an easily solvable problem. For example, a maxed-out POWER7 box from IBM will give you 256 processors and all the memory and I/O trimmings you need. If you need more than 256 processors or more than the local RAM, you'll pay the software complexity cost anyway.


Well, the 10 beefy boxes will be much, much faster if your problem doesn't distribute well. Facebook as an application shards very easily, because most users don't interact much with each other. Other applications might have many more interactions.

What you're really paying for when buying a 256 processor POWER7 box is the fact that the interconnect (and therefore the time to acquire a lock/update data from another node) is much faster and more reliable than commodity networks/kernels/stack.


Depends on what you are programming in. If it's a language far removed from the machine, your mileage may vary.

I had the opportunity to try out Google's MapReduce implementation, written in C++, a while back (6 years ago). It ran on fairly impoverished processors, essentially laptop grade. I have done stuff on Yahoo's Hadoop setup as well, which used high-end multicore machines provisioned with oodles of RAM (I don't think I should share more than that). Being generous, Hadoop ran 4 times slower as measured by wall-clock time. Not only that, Hadoop required about 4 times more memory for similar-sized jobs. So you ended up needing more RAM, running for longer, and potentially burning more electricity. This is by no means a benchmark or anything like that, just an anecdote.
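To give a flavour of the kind of job being compared, here's a toy word count in the Hadoop Streaming style (hypothetical mapper.py / reducer.py; Python just for brevity, the real jobs were C++ and Java, and this is nothing like the actual workloads):

    #!/usr/bin/env python
    # mapper.py -- emit "word<TAB>1" for every word on stdin
    import sys

    for line in sys.stdin:
        for word in line.split():
            print("%s\t%d" % (word, 1))

    #!/usr/bin/env python
    # reducer.py -- the framework sorts map output by key, so counts
    # can be accumulated per word and flushed when the word changes
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, 0
        count += int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

    # local sanity check of the same pipeline:
    #   cat input.txt | python mapper.py | sort | python reducer.py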

That Hadoop would require much more memory did not surprise me; that was expected. What was really surprising was that it was so much slower. The JVM is one of the most well-optimized virtual machines out there, but its view of the processor is very antiquated, and it does not surface those hardware-level advances to the programmer. You pay for a hot-rod machine but run it like an old faithful Crown Victoria.

Four times might not seem like much (for one thing, I am being generous), but it makes a big difference when you can make multiple runs through the data in a single day and make changes to the code/model. Debugging and ironing out issues is a lot more efficient.

I think MapReduce gave Google a significant competitive advantage over the rest, and probably still does.


Interconnect may be faster, but as a whole system it is hard to compete with the raw speed of an x64 box with all the latest/greatest chipsets. You usually wind up having to write non-portable code to eke full performance out of the massive box, and in the end your apps will probably still be faster on x64. They're best suited for massively parallel computation that isn't afraid of getting down to the metal and taking advantage of lots of the special chip instructions in asm. (Or, alternatively, you want POWER specifically because it has hardware DFP support.) The total gain from running on x64 will most likely exceed any loss from a network hop in a case where both have to go off to a SAN for their data.


Yes. Facebook's "cost of revenue" (which they state is mostly infrastructure) was $1.875 billion in 2013, a year when they made $1.5 billion in net income. For comparison, research and development was $1.4 billion.

Facebook's business model involves getting 1 billion people to post a ton of stuff inside Facebook, costing them about $2/user/year in infrastructure and $3.50/user/year in other costs, while making about $7/user/year in advertising revenue, yielding about $1.50/user/year in profit. So cutting costs on that $2 makes them significantly more profitable.
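Rough arithmetic (my own rounding, just tying the per-user figures back to the totals above):

    # back-of-the-envelope, using the rounded figures quoted above
    users = 1e9                     # "1 billion people"
    infra_per_user = 2.00           # infrastructure, $/user/year
    other_per_user = 3.50           # other costs, $/user/year
    revenue_per_user = 7.00         # advertising revenue, $/user/year

    profit_per_user = revenue_per_user - infra_per_user - other_per_user
    print(profit_per_user)                   # 1.5
    print(profit_per_user * users / 1e9)     # ~1.5, i.e. roughly the
                                             # $1.5B net income figure above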


If you're a pure technology company, like Google and Facebook, you're not going to outsource your core competency. It's not an issue of costs; buying "big iron", even if they could, would be akin to style drift for an investment manager.


You could definitely still do a lot less in-house than FB does, and be successful. FB seems to delight in building tools and infrastructure.

Bloomberg is probably a better example of a company which builds "optional" technology in house, just to be awesome, though -- they're not at the scale of FB (where "traditional" solutions break down), but from what I've seen, they do a lot of interesting work in-house because their staff want to do it, and because it lets them have really top-quality staff in a highly competitive market.


> they're not at the scale of FB (where "traditional" solutions break down)

I'm not so sure about that. Bloomberg processes an incredible amount of data, and they have strict latency requirements. In many cases, traditional solutions would in fact break down under those requirements.


I work on the infrastructure team at Bloomberg. There are lots of problems solved by OSS, but there are also lots of pieces of infrastructure we have to build ourselves to scale the things we need to. Latency is killer, indeed. (Low-latency data aggregation/generation/distribution is only one part of the business, though.)


> Facebook does something which on its face seems trivial and unimportant, but due to scale, has some really amazing engineering and infrastructure challenges

Is this really true? Seems to me that these challenges are completely due to collecting a whole bunch of information that people would rather they didn't, and routinely rebuke them for. It's like working on drone targeting problems that are made more difficult because children move more unpredictably (or more quickly, etc.) than adults. "Yeah, but the math is insane!"

You may dislike my analogy, but it only appears "trivial and unimportant" because it's the most mundane aspect of a larger unsavory project.


Can I suggest another company dealing with large amounts of data at a similar scale:

http://automattic.com/work-with-us/data-wrangler/

Yes this is a shameless plug for the company I love working for, but I think it addresses Skywing's point. We are one of the few companies at this scale that are completely location agnostic, and we hire by trial (can you do the job), not by credentials.


If you were in the UK I'd invite you to have a think about working with us (DataSift)...

Anyone else feeling this way, drop me a line chris.hoult at datasift dot com.


When they introduced ORCFile, I was kind of hoping that the next iteration would be the URUKFile.


Or HUMANFile.


I'm actually wondering: are there any ways to play with data like this (e.g. downloading data from StackExchange: http://blog.stackexchange.com/category/cc-wiki-dump/)?

Any other ways? I don't believe there is a VM for this sort of "experimentation"
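One way to get started, sketched below: the Stack Exchange dumps are big XML files (Posts.xml, Users.xml, ...) made of flat <row .../> elements, so they stream nicely with an incremental parser. Attribute names like "Score" are from the Posts table as I remember it, so treat this as illustrative rather than a complete schema.

    # stream Posts.xml from a Stack Exchange dump without loading it all
    import xml.etree.ElementTree as ET

    def iter_rows(path):
        # iterparse lets us discard each <row> after reading it,
        # keeping memory flat even for multi-GB dump files
        for event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "row":
                yield dict(elem.attrib)
                elem.clear()

    high_scoring = 0
    for row in iter_rows("Posts.xml"):
        if int(row.get("Score", 0)) > 100:
            high_scoring += 1
    print(high_scoring)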



A lot of fancy tech for only 805MB per user (based on an estimate of 500 million users). What if we had a pure p2p web app using webrtc & local storage to replace Facebook? Think everyone could spare 805MB?

Calculation: https://www.google.com/search?q=(300PB)%2F(400000000)


You're a bit behind: it's 1.23 billion monthly active users, or, if you feel that's a bit disingenuous, 757 million daily active users.

Source: http://newsroom.fb.com/company-info/

PS. you calculated using 400 million.


300 PB / 10^9 users ≈ 300 × 10^15 B / 10^9 ≈ 300 × 10^6 B ≈ 300 MB/user.

Considering my latest fb backup was ~18MB (unzipped), of which 14MB was pictures, this doesn't sound too unreasonable to me. If anything, it sounds very conservative. If I were actively using fb for photos, I'd easily have at least 100 times as many, maybe 200 times as many. Not to mention that the single (short) video I've uploaded is 1.4 MB.
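Redoing the arithmetic with the user counts floating around this thread (400 million in the original calculation, 757 million / 1.23 billion from the correction), in both decimal and binary units, since that's where the gap between 805 MB and ~300 MB comes from (Google's calculator appears to use binary prefixes; that's an assumption on my part):

    # per-user storage for 300 PB under the user counts quoted in this thread
    PB_DECIMAL = 10**15        # petabyte
    PB_BINARY  = 2**50         # pebibyte
    MB_DECIMAL = 10**6
    MB_BINARY  = 2**20

    total = 300
    for users in (400e6, 757e6, 1.23e9):
        per_user_dec = total * PB_DECIMAL / users / MB_DECIMAL
        per_user_bin = total * PB_BINARY  / users / MB_BINARY
        print("%.2e users: %4.0f MB (decimal)  %4.0f MiB (binary)"
              % (users, per_user_dec, per_user_bin))
    # 4.00e+08 users:  750 MB (decimal)   805 MiB (binary)
    # 7.57e+08 users:  396 MB (decimal)   426 MiB (binary)
    # 1.23e+09 users:  244 MB (decimal)   262 MiB (binary)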


Brilliant idea, let's have 500 million people leave their computers/laptops on 24/7 just so you can check their status and look at lolcats.


Facebook had 500MM users four years ago...


What about persistence?


There was a trend a few years ago where users would deactivate their FB accounts when logging out, so that people couldn't see/interact with them while they were away.

http://edition.cnn.com/2010/TECH/social.media/11/12/facebook...

So maybe that's not as stupid as it sounds. You could also have some level of caching, e.g. I visited your pictures yesterday, so people can get them from me for some time if you're away.

I definitely see potential in this form of "distributed but controlled" storage mechanisms.
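A toy sketch of what that cache could look like on a peer (hypothetical API, in-memory only, ignoring encryption and peer discovery entirely):

    # toy TTL cache a peer could use to re-serve content it fetched recently
    import time

    class PeerCache:
        def __init__(self, ttl_seconds=24 * 3600):
            self.ttl = ttl_seconds
            self.store = {}            # key -> (payload, fetched_at)

        def put(self, key, payload):
            self.store[key] = (payload, time.time())

        def get(self, key):
            entry = self.store.get(key)
            if entry is None:
                return None
            payload, fetched_at = entry
            if time.time() - fetched_at > self.ttl:
                del self.store[key]    # too stale to re-serve
                return None
            return payload

    # e.g. after viewing a friend's photo yesterday, this peer can hand it
    # to other peers for up to a day while the owner is offline
    cache = PeerCache()
    cache.put("alice/photo123", b"...jpeg bytes...")
    print(cache.get("alice/photo123") is not None)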

Similarly, I like the idea of having a local backup drive, and another copy at a friend's place (possibly encrypted).


I wonder how Vertica relates to this. (http://www.vertica.com/2013/12/12/welcoming-facebook-to-the-...)


Maybe they should just delete something once in a while. How about actually deleting data for users who close their accounts?


Where's the power coming from to scale like that? If Facebook is relying on power from the Columbia River for their data centers, they better talk to the tribes and the salmon. Those dams are failing and many parties in the Northwest want to see them removed.

A better bet might be for them to scale up in Utah and harvest methane from waste to generate power.


Because Utah has a lot of methane? I'm not quite following. I'd think methane production would be greater in a cow state like Iowa or Ohio than in a desert state like Utah? (joke)

Utah's power grid is primarily coal-fired, but they use a fair bit of natural (methane) gas as well. [1] Utah has a couple of major hydroelectric dams (Glen Canyon, Flaming Gorge), and the use of solar and wind is on the rise.

The NSA is building a major data warehouse in Utah; one of the considerations would have to have been cheap power. I'm guessing the Columbia River produces cheaper electricity than pretty much anywhere else, but Utah has very diverse (and affordable) power production overall.

[1] http://www.deseretnews.com/article/700051087/Coal-mostly-pow...

[edit] failed on humor the first time, trying again.


The things facebook does to keep people's shower thoughts readily available.


...not to mention our uber-lords at the NSA... (yawn)

Actually all tech can be criticised for banality. Think of the telephone - 'they put in all those cables just so she can talk to mother...'. Or television - 'they dug up the street just so they could lay those fibre optic cables so your gran could watch the wrestling...'. Or even the trains - the train stopping at my home town does seem a waste of time; I can't believe they bother when you look at who gets on.


It's a fallacy that all technology can be criticized for banality. Technologies are different. Some are more criticizable than others.

The telephone was the first infrastructure to provide real-time voice communication. It enables families to stay in contact, but it also enables economic growth and a more effective society writ large.

Television is now a mindless wasteland of race-to-the-bottom drivel, but there are newer networks that haven't yet succumbed to it, mostly on digital cable. I only have the respect for science and technology that I do because I grew up watching The Magic School Bus and Bill Nye the Science Guy. My parents watched the moon landing on television.

Facebook does not do anything novel, nor has it ever been used for anything terrifically insightful. It provides some social value and exists for that reason, but it is clearly not equivalent to all other technologies.


I don't quite get it. Being able to transmit images is "novel", but being able to query more information than has ever been available before in the history of the human race is not?

You should really separate the application of the technology from the technology itself. Your last sentence could be said the exact same way about the telephone ("It provides some social value and exists for that reason"), but that is clearly a ridiculous statement to make about the telephone.


My comment was more of a dig at things like banks, which barely let you view your financial transactions more than a year or two back. Yet I can pull up someone's stupid rant about a basketball game from 7+ years ago on Facebook.


> The things facebook does to keep people's shower thoughts readily available to advertisers.

FTFY.


And the moat keeps getting larger. Good for them. Yes, they are kind enough to open source this, but how advanced is the tech they don't open source? And how much of a head start have they enjoyed as the first to benefit from this impressive tech? Again, good for them.



