
XFS is great now. It had some problems with metadata performance at scale a few years ago (which affected early versions of RHEL/CentOS 6, and may be one of the reasons it wasn't the default there), but it's now the best of the "traditional" filesystems.

> From Btrfs, GlusterFS, Ceph, and others, we know that it takes 5-10 years for a new filesystem to mature.

Those are bad examples - they are all significantly more complex than XFS/ext. Two of the three are distributed filesystems that aren't solving any of the same problems.

However, their inclusion in the article is worth noting, even if the author put them in the wrong paragraph. Increasingly, large volumes are being distributed across many individual servers with technologies like GlusterFS and Ceph. Both of these, and some of their competitors (XtreemFS is also really good, despite the silly name), use a traditional filesystem on each server's underlying volumes, which are then presented together as one large distributed FS. XFS is generally the filesystem used for this - and unless an individual node grows larger than ~8EB, there's currently no reason for that to change.

The real question, then, becomes: will a single server need a local FS larger than 8EB by 2025-2030? Possibly not. It's very dangerous to say "X is all anyone will ever need", but I think bulk storage is increasingly going to go the same way CPUs did - instead of a single huge local FS (analogous to ever-increasing single-core clock speed), we'll see a growing number of storage nodes combined into one via a distributed filesystem (analogous to higher core counts).

Part of the reason is that very large volumes in one place come with quite a few disadvantages. If you use RAID, rebuilds become unreasonably long, and rebuild speed is not keeping pace with capacity growth. If you skip RAID and handle redundancy with multiple nodes instead, then the bigger each individual node is, the larger the impact when one fails. At some point you're also going to need to shift data off that box, and while we're likely to have 100GbE server ports by then[1], even 100GbE is going to take an unreasonably long time to move 8EB anywhere.

1: https://www.nanog.org/meetings/nanog56/presentations/Tuesday...
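As a rough sketch of why rebuild times are a problem: a rebuild has to read or write the entire drive, so the floor is capacity divided by sustained throughput. The drive size and throughput below are illustrative assumptions, not benchmarks.

  # Minimum rebuild time for one drive: capacity / sustained throughput.
  # 6TB and 150 MB/s are assumed, illustrative figures.
  def rebuild_hours(capacity_tb, throughput_mb_s):
      return (capacity_tb * 10**12) / (throughput_mb_s * 10**6) / 3600

  print(round(rebuild_hours(6, 150), 1))  # ~11.1 hours, best case,
                                          # with the array degraded throughout

And that's the best case for a single drive today - bigger drives at similar sustained throughput only stretch it further.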

EDIT: Just did some quick calculations. A 100 Gigabit server port is 12.5 gigabytes/s, and at 12.5 GB/s it would take over 20 years to transfer 8EB anywhere. With that in mind, the idea that we're going to have 8EB of data on an individual server in that timeframe starts to look a bit silly - how is it going to get there? And what on earth would you do with it once it is there?

Even if by some magic we invent 1TbE and make it cheap enough to use on servers (and invent 10+TbE for the network core) by 2025, that would still take over 2 years to fill the disk. Yes, sorry, but this is just silly. 8EB across lots of individual servers? Sure. But all on one server in a regular filesystem? Not going to happen any time soon.
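For anyone who wants to check the arithmetic, here's a minimal sketch. It assumes decimal exabytes (8 × 10^18 bytes) and full line rate with zero protocol overhead, both of which are generous to the single-server case.

  # Time to move 8EB over a single network port at full line rate.
  EB = 10**18  # decimal exabytes; binary EiB would only make this longer

  def transfer_years(data_bytes, link_gbit_s):
      bytes_per_s = link_gbit_s * 10**9 / 8
      return data_bytes / bytes_per_s / (365 * 24 * 3600)

  print(transfer_years(8 * EB, 100))   # 100GbE: ~20.3 years
  print(transfer_years(8 * EB, 1000))  # 1TbE:   ~2.0 years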
