
I have messed with ZFS on Linux on Ubuntu and I have to say that I would not yet trust it in production. It's not as bulletproof as it needs to be and is still under heavy development. It's not even at version 1.0 yet.



We've actually been running it in production at Netflix for a few microservices for over a year (as Scott said, for a few workloads, but a long way from everywhere). I don't think we've made any kind of announcement, but "ZFS" has shown up in a number of Netflix presentations and slide decks: eg, for Titus (container management). ZFS has worked well on Linux for us. I keep meaning to blog about it, but there's been so many things to share (BPF has kept me more busy). Glad Scott found the time to share the root ZFS stuff.


If I had to choose between a filesystem with silent and/or visible data corruption, up to pretty much eating itself and forcing you to restore an entire server, versus a filesystem you can trust but that could hit a kernel deadlock/panic... I would choose the latter, and in fact did.

Over the last five years I have seen a few ext4/mdraid servers suffer serious corruption, but I have only had to reset a ZoL server maybe twice.


Story time.

I transitioned an md RAID1 from spinning disks to SSDs last week. After I removed the last spinning disk, one of the SSDs started returning garbage.

One in three reads is returning garbage and ext4 freaks out, of course. It's too late and the array is shot. I restore from backup.

This would have been a non-event with ZFS. I've got a few production ZoL arrays running and the only problems I've had have been around memory consumption and responsiveness under load. Data integrity has been perfect.
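
For what it's worth, the thing that makes this a non-event on ZFS is the end-to-end checksumming: a scrub re-reads every block, verifies it against its checksum, and repairs it from the good mirror. A minimal health-check sketch of the kind we might run from cron (the pool name "tank" is an assumption, not anything from this thread):

    #!/usr/bin/env python3
    # Minimal ZFS health check: kick off a scrub, then report pool status.
    # Assumes a pool named "tank" and the zpool CLI on PATH.
    import subprocess
    import sys

    POOL = "tank"  # hypothetical pool name

    # Start a scrub; ZFS re-reads every block and verifies it against its
    # checksum, repairing from a healthy copy where possible. Ignore the
    # error if a scrub is already in progress.
    subprocess.run(["zpool", "scrub", POOL])

    # "zpool status -x" prints a short "is healthy" line when nothing is
    # wrong; anything else (checksum errors, degraded vdevs) is worth alerting on.
    status = subprocess.run(["zpool", "status", "-x", POOL],
                            capture_output=True, text=True)
    print(status.stdout.strip())
    sys.exit(0 if "healthy" in status.stdout else 1)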


I've seen the same type of thing with respect to memory and load.


Do you have any specific reasons not to trust ZoL?

ZFS-on-Linux devs say it's ready for production[1].

Lawrence Livermore National Laboratory stores petabytes of data using ZoL[2].

If we're sharing anecdotes, ZoL has served me fantastically for several years.

[1] https://clusterhq.com/2014/09/11/state-zfs-on-linux/

[2] http://computation.llnl.gov/newsroom/livermores-zfs-linux-po...


We have encountered a reproducible panic and deadlocks when a containerized process gets terminated by the kernel for exceeding its memory limit:

https://github.com/zfsonlinux/zfs/issues/5535
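
For context, the failure mode is roughly: a process inside a memory-limited cgroup dirties a lot of data on a ZFS dataset and then gets killed while ZFS still has work outstanding. The following is only a sketch of that shape, not the actual reproducer from the issue; the path, sizes, and the systemd-run limit are made up:

    #!/usr/bin/env python3
    # Illustration only: keep dirtying ZFS-backed file data while growing the
    # heap, so the kernel eventually OOM-kills the process mid-write.
    # Run under something like:
    #   systemd-run --scope -p MemoryMax=256M python3 oom_sketch.py

    TARGET = "/tank/scratch/oom-test"   # hypothetical ZFS-backed path
    CHUNK = b"x" * (1 << 20)            # 1 MiB per write

    hog = []
    with open(TARGET, "wb") as f:
        while True:
            f.write(CHUNK)                 # outstanding dirty data for ZFS to flush
            hog.append(bytearray(CHUNK))   # grow RSS until the cgroup limit is hit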

We're strongly considering using something else until this gets addressed. The problem is, we don't know what, because every other CoW implementation also has issues.

* dm-thinp: Slow, wastes disk space

* OverlayFS: No SELinux support

* aufs: Not in mainline or CentOS kernel; rename(2) not implemented correctly; slow writes


The issue you link to was opened a day ago.

If that were me, I'd see how quickly it was fixed before strongly considering something else.


Have you had any issues to report? If so, how quickly were they fixed? Knowing what the typical time is to address these issues would help us make a more educated decision.


Yes, we've run into 2 or 3 ZFS bugs that I can think of that were resolved in a timely fashion (released within a few weeks if I recall) by Canonical working with Debian and zfsonlinux maintainers (and subsequently fixed in both Ubuntu and Debian - and upstream zfsonlinux for ones that were not debian-packaging related). Of course your mileage may vary, and it depends on the severity of the issue. Being prepared to provide detailed reproduction and debug information, and testing proposed fixes, will greatly help - but that can be a serious time commitment on your side (for us, it's worth it). Hope that helps!


ZFS is not in the mainline or CentOS kernel either, so you are presumably willing to try stuff. I believe all the overlay/SELinux work is now upstream; it is supposed to ship in the next RHEL release.


I look forward to that.


My reasons are as follows:

1) I've seen users complaining about data loss in issues on GitHub.

2) The init script failed when upgrading Ubuntu and I had to fix it by hand. Probably a one-time issue.

I need a bit more reliability from a filesystem.


I thought "ZoL" was a pun with ZFS and LOL to tell how not ready it is for production ^^


ZoL is just an abbreviation of ZFS on Linux.


We have been running ZFS on Linux in production since April 2015 on over 1500 instances in AWS EC2 with Ubuntu 14.04 and 16.04. Only one kernel panic observed so far, on a Jenkins/CI instance, and that was due to Jenkins doing magic on ZFS mounts in the belief that they were Solaris ZFS mounts.

In our opinion, when we made the switch, being able to trust the integrity of the data was much more important than any possible kernel panic.


Well, we (and by this I mean myself and my fantastic team) have been running it since 2015 as the main filesystem for a double-digit number of KVM hosts running a triple-digit number of virtual machines executing an interesting mix of workloads, ranging from lightweight (file servers for light sharing, web application hosts) to heavy I/O bound ones (databases, build farms) with fantastic results so far. All this on Debian Stable.

The setup process was a bit painful: some HW storage controllers introduced delays that caused udev not to make some HDD devices available under /dev before the ZFS scripts kicked in, and we have been bitten a couple of times by changes (or bugs) in the boot scripts. However, the gains ZFS provides in terms of data integrity, backups, and virtual machine provisioning workflow have definitely been worth it.
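
The workaround we would reach for in that kind of situation (not necessarily what was done here) is to stop racing the boot scripts: wait for udev to settle, then import by the stable /dev/disk/by-id paths. A rough sketch, with a made-up pool name:

    #!/usr/bin/env python3
    # Sketch: wait for udev to finish creating device nodes, then import the
    # pool using stable by-id paths so it doesn't depend on enumeration order.
    import subprocess

    POOL = "vmpool"  # hypothetical pool name

    # Block until udev has processed all pending device events.
    subprocess.run(["udevadm", "settle"], check=True)

    # Import via /dev/disk/by-id so slow controllers or renamed devices
    # don't leave the pool looking for stale /dev/sdX names.
    subprocess.run(["zpool", "import", "-d", "/dev/disk/by-id", POOL], check=True)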


It's maturing rapidly and has proven to be very stable so far. We're not using it by default everywhere, at least not yet, and building out an AMI that uses ZFS for the rootfs is still a bit of a research project - but we have been using it to do RAID0 striping of ephemeral drives for a year or two on a number of workloads.
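
For the ephemeral-drive case the setup is pleasantly boring: list the instance-store devices as plain top-level vdevs and the pool stripes across them with no redundancy, which is fine for scratch data that disappears with the instance anyway. A rough sketch (device names, pool name, and mountpoint are assumptions, not our actual config):

    #!/usr/bin/env python3
    # Sketch: build a striped (RAID0-style) pool from ephemeral instance-store
    # drives. Device names and pool/mountpoint are hypothetical.
    import subprocess

    DEVICES = ["/dev/xvdb", "/dev/xvdc"]  # assumed instance-store devices
    POOL = "ephemeral"

    # Plain top-level vdevs => striping, no redundancy. Acceptable here since
    # the drives (and the data) are gone whenever the instance stops.
    subprocess.run(
        ["zpool", "create", "-f", "-O", "mountpoint=/mnt/ephemeral",
         POOL, *DEVICES],
        check=True,
    )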


It's bulletproof on Solaris and FreeBSD.


Which doesn't say anything about its state on Linux.


The implementation might be lacking but the underlying FS should be more reliable. I'd still argue that ZFS should be deployed on FreeBSD or Solaris. There are plenty of ways to fire up a Linux environment from there.


You didn't get the hint. He's saying you should be using Solaris or FreeBSD instead of Linux.


Depends on what you're worried about. Operationally speaking I agree, it's not plug and play.

But it's at a point where it safely stores your data correctly. Perhaps some init scripts fail to import your pool on boot, etc., but the data is there.

We do run it in production, but we also have in-house tooling built around it.


I've been using ZFS on Ubuntu since ~2010 for a small set of machines, reading/writing 24/7 with different loads. It's worked great through quite a few drive replacements and various other hardware failures.

I'm perfectly willing to believe there may be some rare situations where ZFS on Linux will cause you a problem, but I bet they're rare enough that it'll have saved you a few times before it bites you.


Do you trust btrfs? SUSE has had it as the default since 2014...


> The parity RAID code has multiple serious data-loss bugs in it. It should not be used for anything other than testing purposes. [0]

[0]: https://btrfs.wiki.kernel.org/index.php/RAID56


Important to note that this only refers to RAID 5 and 6.


My newly built (Ubuntu 16.04 LTS) workstation is using ZFS exclusively. I'm keeping my fingers crossed.



