
> Facebook using it doesn't mean anything since they are probably using it for distributed applications. Meaning the entire box (including BTRFS) can just die and the cluster won't be impacted.

I don't think that follows, for several reasons:

- If enough of your boxes die that you lose quorum (whether from filesystem instability or from unrelated causes like hardware glitches), your cluster is impacted. So, at the least, if you expect your boxes to die at an abnormally high rate, you have to have an abnormally high number of them to maintain service.

- Filesystem instability is (I think) much less random than hardware glitches. If a workload causes your filesystem to crash on one machine, recovering and retrying it on the next machine will probably also make it crash. So you may not even be able to save your service by throwing more nodes at the problem. A bad filesystem will probably actually break your service.

- Crashes cause a performance impact, because you have to replay the request and you have fewer machines in the cluster until your crashed node reboots. It would take an extraordinarily fast filesystem to be a net performance win if it's even somewhat crashy.

- Most importantly, distributed systems generally only help you if you get clean crashes, as in power failure, network disconnects, etc. If you have silent data corruption, or some amount of data corruption leading up to a crash later, or a filesystem that can't fsck properly, your average distributed system is going to cope very poorly. See Ganesan et al., "Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions", https://www.usenix.org/system/files/conference/fast17/fast17...
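To put rough numbers on the quorum point, here is a minimal sketch, assuming a simple majority quorum and nodes that fail independently (which, per the second bullet, is optimistic for filesystem bugs). The function names and the failure probabilities are made up for illustration and don't describe any real cluster.

```python
from math import comb

def p_quorum_lost(n: int, p_down: float) -> float:
    """Probability that a majority of n nodes is NOT available,
    assuming each node is down independently with probability p_down."""
    majority = n // 2 + 1
    max_down_ok = n - majority  # most nodes that can be down with quorum intact
    p_ok = sum(
        comb(n, k) * p_down**k * (1 - p_down) ** (n - k)
        for k in range(max_down_ok + 1)
    )
    return 1.0 - p_ok

def nodes_needed(p_down: float, target: float, max_n: int = 201) -> int:
    """Smallest odd cluster size whose quorum-loss probability is <= target."""
    for n in range(1, max_n + 1, 2):
        if p_quorum_lost(n, p_down) <= target:
            return n
    raise ValueError("no cluster size up to max_n meets the target")

if __name__ == "__main__":
    # Hypothetical per-node downtime fractions: a stable box vs. a crashy one.
    for p in (0.01, 0.10):
        print(p, nodes_needed(p, target=1e-3))
```

Under these assumptions, going from 1% to 10% per-node downtime raises the cluster size needed for the same quorum-loss target from 3 to 9 nodes. Correlated crashes (the same workload killing every replica) would make the real picture worse than this model suggests.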

So it's very doubtful that Facebook has decided that it's okay that btrfs is crashy because they're running it in distributed systems only.




This article https://www.linux.com/news/learn/intro-to-linux/how-facebook... explains some of what Facebook does with BTRFS.

"Mason: The easiest way to describe the infrastructure at Facebook is that it's pretty much all Linux. The places we're targeting for Btrfs are really management tasks around distributing the operating system, distributing updates quickly using the snapshotting features of Btrfs, using the checksumming features of Btrfs and so on.

We also have a number of machines running Gluster, using both XFS and Btrfs. The target there is primary data storage. One of the reasons why they like Btrfs for the Gluster use case is because the data CRCs (cyclic redundancy checks) and the metadata CRCs give us the ability to detect problems in the hardware such as silent data corruption in the hardware. We have actually found a few major hardware bugs with Btrfs so it’s been very beneficial to Btrfs."

The sentence: "We also have a number of machines running Gluster, using both XFS and Btrfs." seems to imply Facebook is not using it heavily for actual data storage. What I distill from this (which is obviously my personal interpretation) is that Facebook mostly uses it for the OS and not for actual precious data.


I'm reading that as quite the opposite: they're saying that Gluster, a networked file storage system, is being backed with btrfs as the local filesystem, so all data stored in Gluster is ultimately stored on btrfs volumes. (They're also using it for OS snapshotting, yes, but insofar as the data stored in Gluster is important, they're storing important data on btrfs.)

See also https://code.facebook.com/posts/938078729581886/improving-th...

"We have been working toward deploying Btrfs slowly throughout the fleet, and we have been using large gluster storage clusters to help stabilize Btrfs. The gluster workloads are extremely demanding, and this half we gained a lot more confidence running Btrfs in production. More than 50 changes went into the stabilization effort, and Btrfs was able to protect production data from hardware bugs other filesystems would have missed."


Given how many times btrfs has failed to read data or to mount (with an error), I would imagine this is why btrfs is used by Facebook: because it isn't afraid to 'just let it crash' (cleanly), to use Erlang rhetoric.


Yeah, it's definitely true that you want a filesystem with data and metadata checksums if you want high reliability. (I think btrfs and ZFS are the only Linux-or-other-UNIX filesystems with data checksums?)
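As a rough illustration of why data checksums matter here, this is a minimal sketch of checksum-on-write, verify-on-read. It is not how btrfs is implemented: btrfs defaults to crc32c, while Python's `zlib.crc32` uses a different CRC-32 polynomial, and the helper names below are hypothetical.

```python
import zlib
from typing import Tuple

def write_block(data: bytes) -> Tuple[bytes, int]:
    """Return the block together with a checksum computed at write time."""
    return data, zlib.crc32(data)

def read_block(data: bytes, stored_crc: int) -> bytes:
    """Return the block only if it still matches the stored checksum."""
    if zlib.crc32(data) != stored_crc:
        # Fail loudly instead of handing corrupt bytes to the application.
        raise IOError("checksum mismatch: block is corrupt")
    return data

def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate silent corruption by flipping a single bit."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

if __name__ == "__main__":
    block, crc = write_block(b"precious data" * 100)
    assert read_block(block, crc) == block
    try:
        read_block(flip_bit(block, 4242), crc)
    except IOError as err:
        print("detected:", err)
```

Without the stored checksum, the flipped bit would be returned to the caller as if it were good data, which is the silent-corruption case that replication alone does not catch.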

But I think the inference to make is that Facebook trusts btrfs to increase reliability, not that Facebook trusts their distributed systems to cover for btrfs decreasing reliability to gain performance (or features).



