OpenZFS 2.0 (github.com/openzfs)
364 points by ascom on Nov 30, 2020 | 148 comments



Has there been any progress on the ZFS on Linux / Linus disagreement front since this article?

https://arstechnica.com/gadgets/2020/01/linus-torvalds-zfs-s...


ZFS on Linux has been available for the root partition since 20.04. Working quite well, I might add!


That’s Ubuntu-specific where they provide their own kernel bundled with ZFS. It was working fine before 20.04 as well in the same way it does for other distros. Has nothing to do with the comment you’re replying to.


There's not much of a difference between building with a kernel module or building an independent kernel module. I can't figure out what question is being asked, but I don't see how Ubuntu would matter.


It always worked, the question is how much work they have to do to work around a kernel that dislikes them.


It's still a big pain if you like to keep your kernel relatively up to date. I switched to btrfs; it just working is worth the few extra warts over ZFS.


> It's still a big pain if you like to keep your kernel relatively up to date.

Replacing a core system component with an out-of-repo version is always going to hurt, yes.

> I switched to btrfs; it just working is worth the few extra warts over ZFS.

I'm not sure I'd call "catastrophic failure and data loss" a "wart". In all my years of distro hopping, I've had 3 root filesystems become unbootable: 1 F2FS system early on, which I actually did manage to fsck out of, and 2 on an openSUSE Tumbleweed system using BTRFS as root.


> and 2 on an openSUSE Tumbleweed system using BTRFS as root.

How long ago was that? And have you been using other fully checksummed filesystems (like ZFS) on that hardware since then? I'm asking because when you're using btrfs without any RAID features (or with simple RAID modes like 1/0) for several years and it breaks, if you dig deep enough into the problem, often the hardware is found to be at fault.

And ext4 or xfs either don't find corruption at all (if it's data corruption), or have better error recovery if the FS's own metadata got trashed (which is a strong argument in favor of them, I agree, but I wouldn't trust such a filesystem anyway and would restore from backups right away).

Edit: it's a strong argument for storing data on them which is checksummed by some higher component in your software stack, like the database. Otherwise, you're just asking for silent data bitrot.


> if you dig deep enough into the problem, often the hardware is found to be at fault

That's not really good enough though. Next gen file systems are supposed to be resilient even if hardware fails. That's the whole point of raiding and checksumming. ZFS was very much intended to be resilient when faced with bad hardware. Heck, even in the 90s this was a known problem hence chkdsk on DOS marking bad sectors to somewhat mitigate data corruption on FAT file systems. If Btrfs only works when hardware is behaving then that is absolutely a problem with Btrfs.

As for my experience with ZFS, it's kept consistency when disks have died. It's worked flawlessly when SATA controllers have died (one motherboard would randomly drop HDDs when the controllers experienced high IOPS -- which would be enough to trash any normal file system but ZFS survived it with literally no data loss). Not to mention frequent unscheduled power cuts, kernel panics (unrelated to ZFS), and so on and so forth. I'm sure it's possible to trash a ZFS volume but it's stood strong on some pretty dubious hardware configurations for me and where most other file systems would have failed.


Let's see, this root filesystem says it was installed on 2019-05-11, so what, a year and a half ago? ish? I just wiped it and reinstalled since only the root filesystem was hosed (the separate home filesystem thankfully wasn't affected) and this box was already fully managed by ansible, so I just rebuilt an exact replica of the same system in place. (In hindsight, no, I don't know why I didn't use that opportunity to switch to XFS.)

Also, I'm going to somewhat mirror sibling comments: Even if the hardware is faulty, that should produce a filesystem with explicit checksum errors, not an unreadable filesystem. There is certainly an upper limit to what it could catch, but you'll have to forgive my skepticism that only one of the 2 filesystems on the system was affected and only after months of use, and then the corruption was so complete that it couldn't even tell me what was wrong and try to fix it.


> if you dig deep enough into the problem, often the hardware is found to be at fault

Well, with ZFS I've had hardware break and still not experienced any data loss. I've had cables getting loose multiple times, I've had several disks dying[1], I've had unstable SATA controllers (hello JMicron) and plenty of unexpected power losses and hard resets.

Yet ZFS has sailed through it all with my data intact. Sure ZFS ain't bulletproof. It can get messed up. But for the most part it takes a lot of beating without a dent.

[1]: As a matter of fact, I just finished resilvering a RAID-Z1 pool in my NAS after a WD Red 3TB died after almost 7 years of 24/7 operation (barring a few accidental power outages).


I second this. Even though it's an out-of-tree project, ZFS on Linux has always been much, much more stable and reliable than Btrfs, as far as my experience goes. The only time I really managed to screw up a ZFS pool was because of the buggy controller on a dirt-cheap SSD.


I've had zero problems with kernel updates on Ubuntu 20.04 with ZFS on a natively encrypted root. I followed the instructions in the wiki, lightly modified for my hardware and workload:

https://gist.github.com/xenophonf/76fd44ae24772e457cb63d00c0...

`apt-get update && apt-get dist-upgrade -y` works as expected. I plan to switch to a similar config on my Lenovo laptop when I upgrade it to the next Ubuntu LTS release.


Ubuntu's kernel isn't exactly keeping up to date though. I assume the person you were replying to may be following mainline.

As someone using new kernel versions as they are released, I'm not willing to use a filesystem that may break with a kernel update. It also seems OpenZFS only supports up to kernel 5.6, according to the GitHub release. I'm on 5.9, so it's not even an option.


Yeah, too many scary notes and warnings for me.

https://wiki.archlinux.org/index.php/ZFS

I would need a package that depends on zfs and provides linux-kernel at an appropriate version. Can't have something so critical break because of an upgrade, and I don't want to pin it and forget to upgrade it (also fairly anti-arch).


I've run the latest kernel with the latest OpenZFS git since around kernel 4.x, currently on 5.9.11. I build it in-kernel as opposed to as a module.

There have been a couple of cases where I had to wait a week or two for compatibility fixes to get merged into zfs git, but otherwise staying up to date has not been a problem.


Adding on, I've been using ZFS as my root partition on Arch with the latest kernel and zfs-dkms and have never had a problem.


Zstd compression with configurable levels is really interesting: you could write every block first at a level with performance comparable to lz4, and if a block has not been rewritten for some time, recompress it at a level allowing more compression with comparable decompression performance.

So cold data (cold write, cold/hot read) will take less and less space over time while still having the same read performance.
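For reference, the zstd level in OpenZFS 2.0 is selected through the compression property, so the per-dataset version of this idea looks roughly like this (dataset names are just placeholders):

    # fast default level for frequently rewritten data (zstd defaults to level 3)
    zfs set compression=zstd tank/scratch
    # heavier level for rarely rewritten data
    zfs set compression=zstd-19 tank/archive
    # note: the setting only affects newly written blocks; existing blocks
    # keep the compression they were originally written with

The automatic per-block recompression of cold data described above isn't something ZFS does today, for the reasons discussed in the replies.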


That would be an even more interesting feature for NILFS2, as I understand it, its ring buffer structure requires moving the oldest unmodified blocks as the ring buffer write frontier approaches. Any blocks that are forced to be copied are by definition old and unmodified, and need to be moved anyway, so why not recompress? AFAIK, there are no plans for compression in NILFS, but I think it's an interesting idea.


My understanding is that for ZFS, things like this would require a mythical feature called "block pointer rewrite", the same feature required to implement out-of-band deduplication.


You are correct - ZFS very deeply hardcodes the assumption that data's location on disk will never change once written, and offline dedup/data migration of any sort would require that.

(It would also be a performance nightmare - you'd have a permanent indirection table you'd need to use for _everything_, and if you've ever seen how ZFS dedup performs with its indirection table not on dedicated SSDs, you can understand why this is terrible.)


The block could still be rewritten from the point of view of ZFS, as long as it does not update the last-written timestamp (does ZFS have this?). I was just describing how it would look from a bird's-eye view.


Directly, no, but if you moved the data to a new dataset with a command that preserves timestamps (rsync -a or zfs send/recv), that would work, and it could be run from a cronjob.

Compression settings are set at a per dataset level, so applying this to only some files in a dataset isn't practical.
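A minimal sketch of that approach, assuming default mountpoints (the dataset names and compression level are placeholders):

    # destination dataset with a heavier zstd level
    zfs create -o compression=zstd-15 tank/archive
    # copy with timestamps, permissions, etc. preserved
    rsync -a /tank/active/ /tank/archive/
    # everything written into tank/archive is compressed at the new level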


Sadly dRAID (parity Declustered RAIDz) just missed the cut-off for 2.0, but it looks like it will be in 2.1:

* https://openzfs.github.io/openzfs-docs/Basic%20Concepts/dRAI...

* https://www.youtube.com/watch?v=jdXOtEF6Fh0


dRAID looks really fascinating, but the presentation is pretty abstract. Would it allow adding/removing drives from a pool, and allow ZFS to rebalance itself?

Would be great for home use, where I have a lot of drives that I collected over the years that are not the same size.

EDIT: The more I read into this, it still seems to assume that all drives must be the same size.


I don't think so. The essence of dRAID is that, instead of keeping a spare drive unused in case one of the working drives fails, it incorporates the spare drive into the array and uses it, but one drive's worth of free space is reserved, spread randomly across the entire array.

That way, if one disk fails, the reserved space is used to write the data necessary to keep the array consistent. Because the free space is distributed randomly across the array, the write performance of a single drive doesn't become a bottleneck.

This is unrelated to the ability to remove drives from a pool (which is difficult to support in ZFS due to design constraints)


Maybe this presentation by Mark will help?

dRAID, Finally![0]

[0]: https://www.youtube.com/watch?v=jdXOtEF6Fh0


This sounds like Synology Hybrid RAID, which uses LVM and mdadm together for something similar, if I recall correctly.


You can already do that with btrfs.


I currently use btrfs with RAID1 at home, and it works great. But btrfs does not have the same track record for stability as ZFS.

[1] https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@h... [2] https://lore.kernel.org/linux-btrfs/20200627030614.GW10769@h... [3] https://lore.kernel.org/linux-btrfs/20200520013255.GD10769@h...


You can do that with ZFS too, at least for mirrored sets (i.e. RAID10). It's possible to remove a vdev, and the pool will migrate the data to the remaining vdevs.


This is huge! And very exciting :D

One thing I am wondering about is this:

> Redacted zfs send/receive - Redacted streams allow users to send subsets of their data to a target system. This allows users to save space by not replicating unimportant data within a given dataset or to selectively exclude sensitive information. #7958

Let’s say I have a dataset tank/music-video-project-2020-12 or something and it is like 40 GB and I want to send a snapshot of it to a remote machine on an unreliable connection. Can I use the redacted send/recv functionality to send the dataset in chunks at a time and then at the end have perfect copy of it that I can then send incremental snapshots to?


zfs send supports a resume token (-t) to resume interrupted streams received with (-s). Just use normal send/receive until you have the full stream sent.
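Roughly, the resumable flow looks like this (the host and destination dataset names are placeholders):

    # receiver: -s keeps partial state if the stream is interrupted
    zfs send tank/music-video-project-2020-12@snap1 | ssh nas zfs receive -s backup/music
    # after an interruption, read the resume token on the receiver...
    ssh nas zfs get -H -o value receive_resume_token backup/music
    # ...and restart the send from that token
    zfs send -t <token> | ssh nas zfs receive -s backup/music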


I think it's more that if you don't want to send scratch or cached files, you can have them automatically removed from the snapshot being sent.

> Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots).

Think of it like a selective sync in Dropbox or SyncThing at the FS level.


That's a protocol problem; use a protocol such as rsync. You don't need to use redacted sends/recvs.


rsync doesn't scale like zfs send/recv. It requires scanning of every file at both the source and destination to compute the delta to send. zfs snapshots and send/recv don't need to do that. The delta is already fully described by the snapshots themselves. zfs is also working with immutable snapshots. It guarantees the source and destination copies are identical; rsync can't do much about the source and destination being modified while it is running since it's reliant upon other users of the system not touching the data being synced.

That's not to say rsync doesn't work. It does. But it doesn't scale well, and the data integrity guarantees aren't there.
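For context, incremental replication between two snapshots is a one-liner and only ships the changed blocks (names are placeholders):

    zfs snapshot tank/data@monday
    zfs snapshot tank/data@tuesday
    # send only the delta between the two snapshots
    zfs send -i tank/data@monday tank/data@tuesday | ssh nas zfs receive backup/data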


rsync has its own issues if the connection has high latency, though - zfs send was originally developed by a Sun engineer who wanted to speed up large transfers to servers in China, if I recall correctly.


+1 for rsync, but with checksumming turned on; I think that's acceptable for 40 GB.


It's not really enough for ZFS (unfortunately). It won't move snapshots, bookmarks etc.


I'd love to get rid of my FreeNAS VM and run ZFS directly on my Linux desktop, but having to mess with the kernel has kept me from attempting it so far. Maybe I'm worrying about nothing.

btrfs seems like the main alternative if you want native kernel support, but when I checked a couple years ago there seemed to be a lot of concerns about the stability. Is that still the case?


To mirror other comments, btrfs is pretty stable now and I run it on my server. The main problems now are with RAID5/6 profiles, the implementation has lots of issues still that can cause data loss[1]. Seems most of the core developers don't use those profiles so it hasn't been getting better. If you want to use RAID it would be safer to stick to RAID1/10[2]

[1] https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@h...

[2] https://www.man7.org/linux/man-pages/man8/mkfs.btrfs.8.html#...


On the btrfs mailing list [1], there are still sporadic reports of unrecoverable FS corruption for whatever reason. See [2], [3] for some recent examples.

[1] https://lore.kernel.org/linux-btrfs/

[2] https://lore.kernel.org/linux-btrfs/CAD7Y51i=mTDnEWEJtSnUsq=...

[3] https://lore.kernel.org/linux-btrfs/CAMXR++KUj2L7qpR7QZeiM2T...


[3] is interesting, thanks for linking that. [2] has so many moving parts, I wouldn't expect it to be related to btrfs without more information. I mean, there's both the fs and the cache layer being resized down, with unknown method.


Both openSUSE and [as of very recently] Fedora use btrfs by default, so btrfs support seems pretty stable these days.

(But as others have pointed out, there are options for using zfs on linux, too)


Attempting to use zfs for the root partition is a huge headache because the software lives in the supplementary `filesystems` repo. https://build.opensuse.org/package/show/filesystems/zfs

1. It often happens that the main repo offers a new kernel, but the corresponding module is not ready on obs yet. This means upgrading to the latest rolling release cannot just happen at any time, but requires careful planning. This is a big inconvenience.

2. In the past dracut sometimes just failed to pick up the module for the initrd, causing a boot failure at the next system start. I could not figure out why, however this never happened with the first class supported ext/xfs.

3. The distro's boot/rescue media do not contain the driver. This means a third-party boot medium is required to go into a broken system, and repairing it when chroot is involved is now much more complicated because of the different distro.


btrfs is a really underutilized filesystem. It still has some features that are superior to ZFS (such as offline deduplication), but the momentum now is clearly with ZFS.


ZFS is no extra work with NixOS! You just declare the filesystem type like any other in the config and it takes care of kernel modules and what-not.


Sure - after you figure out NixOS lol


Actually no, NixOS is probably easier to use than other Linuxes. It gets more difficult when you need to package something new that it doesn't have, then you have to know the Nix language and how nixpkgs work.


Interesting; when I tried it a while back it seemed like you needed to know the language to manage your configurations. Was that impression incorrect?

I may have tried it far enough back that I pretty much immediately encountered packages I wished it had and tried (and failed) to package them myself, though, and got the experiences mixed up…


No, you're 100% correct. Despite the proselytization you'll get from many dyed-in-the-wool users who forgot what it was like in the beginning, in order to use NixOS effectively you either need to learn Nix, or be willing to spend a lot of time on IRC asking questions, which will end up with... you learning Nix. That's the reality. I think I spent something like 3 weeks porting my server configuration from Ubuntu to NixOS, by hand, piece by piece, many years ago. Admittedly I think we're better off than we were 7 years ago, but it's still not a grand slam. Even things like basic GUI installers that can set up your filesystem don't exist! Manually screwing with partition layouts to get volume encryption isn't easy at all, honestly.

There are an array of flaws with the tools but, despite that, they are unbelievably powerful and you can do things in NixOS you can't dream of doing elsewhere, and it makes things like using OpenZFS or whatever pretty easy and simple. And it makes some things far more difficult than that nearly trivial. But only once you know what you're doing. But that's just the reality: it's an extremely powerful tool that has many rough edges. Saying it's "the easiest distro to use" is a complete joke, and I wish fellow NixOS users didn't have some weird propensity to practically lie about how good it is on that front. I say this as someone who has been a NixOS developer and user for like, ~7 years and who apparently(?!) has over 1,000 commits to the tree now, too. Trying to actually sit down at a terminal and sell unconvinced people on it opened my eyes quite a bit. It's good, but lying about what it is and isn't is a good way to burn people's faith.


I don't think what you said necessarily contradicts what I said.

NixOS is easy to use as long as what you're trying to do is contained within the configuration; then setup is pretty much just editing that configuration file (which essentially is just a series of dictionaries and lists). I wouldn't, for example, have a problem with installing NixOS for my grandparents.

Here's example config when that's true: https://github.com/areina/nixos-config/blob/master/thinkpad-...

If you want to do something that's not covered then you'll have to learn Nix, and that part is indeed hard, because it's like configuring Linux through saltstack/chef/puppet/ansible, but in a functional language (which many people don't have experience with). As you said, though, it pays off.

I think the hardest part is the paradigm shift where everything you do is no longer imperative but declarative. It also doesn't help that documentation is always behind what nixpkgs can do and Nix functionality would cover multiple books.


Is there a guide for this? Sounds interesting!


For instance, I have:

     fileSystems."/zfs/media" =
        { device = "tank/media";
          fsType = "zfs";
        };
in my hardware-configuration.nix. tank/media is defined as using a legacy mount-point or whatever the ZFS terminology is. Done.

ETA: I mean, I had to do all the gruntwork to get the pool built, yeah. But once it was defined, getting it mounted and all the kernel bits and bobs set was trivial like that.


In addition, if you just want to play around with it:

    boot.supportedFilesystems = [ "zfs" ];
This both installs the necessary kernel modules and adds zpool(1) / zfs(1) to $PATH.


I've been running encrypted ZFS on 20.04 on my main workstation since it came out and it's worked great. Wrote up details here, it's a slight hack for encryption, no hack if you don't want crypto. https://linsomniac.gitlab.io/post/2020-04-09-ubuntu-2004-enc...

A friend did a video based on my blog: https://www.youtube.com/watch?v=PILrUcXYwmc


Why ZFS encryption vs unencrypted ZFS atop LUKS?


Native ZFS encryption makes pool management easier. Less bookkeeping to import/export a pool. And the file system can make better decisions about performance and recovery.
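As a rough illustration (the dataset name is a placeholder), creating and later unlocking a natively encrypted dataset stays entirely within the zfs command:

    # create an encrypted dataset keyed from a passphrase
    zfs create -o encryption=on -o keyformat=passphrase tank/secure
    # after a reboot or pool import, load the key and mount
    zfs load-key tank/secure
    zfs mount tank/secure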


I wanted to say "as someone who tends to follow the unix philosophy" and realized the irony of saying that regarding ZFS...

That said, I generally agree with you in that do one thing and do it well is a laudable design goal. However, I also am very excited about encrypted ZFS for one main reason: backups.

Okay two. Snapshots and backups!

ZFS is absolutely amazing to use as a home NAS that does daily (or more) snapshots and then nightly differential syncs to a second location. In the past I had to run all my own infrastructure to do this, as the data was in the clear.

Now my ZFS nerd friend and I can simply swap backup space and have "zero knowledge" of each other's files, while retaining the amazing features of ZFS snapshots + zfs send/receive (sketched below).

This also tickles the "create an encrypted ZFS backups as a service" service itch for me, but then I realize I'd be creating it for all 13 potential users of the service. That said, I'm sure rsync.net will offer this functionality shortly - which would make them a viable backup target for me.
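The "zero knowledge" swap above presumably relies on raw sends of encrypted datasets, which ship the blocks still encrypted so the other side never needs the keys; a sketch (hosts and dataset names are placeholders):

    # -w/--raw sends the encrypted blocks as-is, without loading the key
    zfs send -w tank/photos@nightly | ssh friend zfs receive backup/me/photos
    # the receiver can store and scrub the dataset, but cannot mount it without the key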


Regarding the "unix philosophy" and ZFS... it's actually very compliant with it, despite uninformed claims, dating from its early public release, about "blatant layering violations".

It's just that the majority of users never have reason to see more than tiny signs of the layers hidden behind (mostly) 2 command line tools, and for various reasons those layers are compiled into one module.

But the clean layered design is how LustreZFS happened :)


https://zfs.rent/ is very interesting.

I also recall someone working years ago on a way to push snapshots to S3 or similar, but I never heard if that idea got off the ground (downside is of course the snapshots need to be recovered before they can be mounted, but the dollar cost would be rock bottom).

What would be more interesting is a backup application for Desktop Linux that assumes a ZFS root; all the problems that plague Desktop applications (that seem to keep them in eternal beta or wither away) disappear. It needs to switch on and push snapshots. It needs to allow the remote file system to be mounted (to browse the snapshots for selective recovery). It needs a disaster recovery process to recover an entire system from a remote snapshot.


> I also recall someone working years ago on a way to push snapshots to S3

You can pipe zfs send to gof3r.
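If memory serves, gof3r (from the s3gof3r project) streams stdin to an S3 object, so the idea is roughly as follows (bucket and key names are placeholders; double-check the flags against gof3r's own docs):

    # push a full snapshot stream to S3
    zfs send tank/data@weekly | gof3r put -b my-backup-bucket -k tank-data-weekly.zfs
    # pull it back down and receive it into a pool
    gof3r get -b my-backup-bucket -k tank-data-weekly.zfs | zfs receive tank/data-restored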


rsync.net already provides zfs receive capability: https://www.rsync.net/products/zfsintro.html


Once you put a filesystem on top of something else, don't you lose any guarantees of finding broken sectors when you scrub?

This is why I really wish btrfs would get native encryption, but maybe my info is out of date.


Why add another separate layer?


Sorry for the late reply, traveling.

In my case, because it's what Ubuntu "supports" for bootable root crypto ZFS, and I wanted to try it.

I've run ZFS on top of LUKS for my backup storage servers for probably over a decade now, and it works fine. But it wasn't really an option for my workstation.

That said, I'm not really sure what benefit I'm gaining from ZFS on my desktop. I've got the snapshots, which are definitely nice. I've used them a couple of times to go back in time. In theory I can go back if I wedge the system through package installs or an OS upgrade, but I've not done that (yet). It does slow down package installs because of taking snapshots, but that's OK generally.


Because traditionally, programs do one thing well.

And in my experience LUKS works great.


If you're using ZFS, you're already okay with some level of ignoring that; ZFS is an inherently huge layering violation. It's a filesystem and a volume manager with encryption and compression, its own user access system (zfs allow), and its own NFS implementation.


Umm, that's untrue.

It's a very cleanly layered system, it just doesn't bother end user with details (as implementor, you can play with them, thus LustreZFS): there's separate SPA (block), DMU (OSD) and ZPL (FS) & ZVOL (emulated block device) layers.

Compression and encryption are integrated at DMU level because that's a logical place for them.

The NFS sharing actually calls the OS's NFS server.


I suppose it depends which layer you look at; as a humble sysadmin/user, it certainly behaves like a monolithic system, as opposed to having separate things to configure and invoke for LVM/LUKS/filesystem (which is, after all, the starting topic for this thread). Having only semi-recently seen an internal architecture diagram, I was indeed pleased to see that it's internally a composition of parts that do one thing well, but that's not really user-visible.

Interesting; it'd been claimed to me before that ZFS had its own NFS server (or I guess the OpenSolaris NFS server) included but that nobody used it because it was old/buggy. A quick glance at https://github.com/openzfs/openzfs/ (the old archived version based on illumos, if I read correctly) implies that this might have been true at one point, but indeed https://github.com/openzfs/zfs doesn't seem to do its own NFS so it's not true now if it ever was. Thanks for correcting my understanding.


That sounds about as modular as systemd: it's in theory modular, but it very much throws away the existing boundaries and has yet to see its new modules be adopted by external folks (if that were even possible, depending on interface stability).

I believe that the modularity will only prove itself when external (as in, from unrelated people) projects become established and we see how well the original project maintains compatibility.

(I must say I wonder how big the intersection of people-who-like-ZFS and people-who-like-systemd is; they seemed to originate from very different cliques but there's no reason people who like one would dislike the other…)


There's sufficiently strong dislike for systemd in the ZoL circles I moved around in, because ZFS was all about making sure it's rock solid and stable, while systemd has a well-deserved reputation for breaking things often.

As for external clients for the layers, LustreZFS is a separate project, though it started with a certain intersection of ZFS devs. However, the general division of labour between layers is pretty strict (except for the - now extinct - FreeBSD TRIM support); it's just that there isn't any work being done to use it outside of OpenZFS.

The boundaries are pretty clear, it's just that ZPL and ZVOL build up on all of them. Linux has /some/ related features, but nothing that was feature-parity: SPA roughly corresponds with MD/DM subsystem assuming certain plugins in use, DMU is very roughly the equivalent of OSD subsystem, but that one supports only SCSI OSD which has incompatible assumptions etc. - in fact, an OSD implementation on top of DMU should be pretty simple (main differences are due to DMU being a bit explicit on redundancy features, iirc).


Does it offer plausible deniability like dm-crypt?


I personally use ZFS on Arch Linux. The DKMS package works almost out of the box and I haven't had any troubles. It takes a long time (but not too much) to compile though.

Or you can use the latest Ubuntu that is shipped with ZFS.


Be aware of possible incompatibilities with the regular Arch kernel. I switched my NAS to linux-lts when a kernel point-release proved incompatible with openzfs.


Right, I forgot to mention I always use the LTS kernel, since I had getrandom() booting problem last year. https://lwn.net/Articles/800509/


Reporting in from Ubuntu 20.04 with ZFS on root. It works great. No issues so far, other than that Docker requires "zfs" as the storage driver and doesn't support overlay2. See: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/171...
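For anyone hitting the same thing, the storage driver is set in Docker's daemon.json (the path below is the usual default); a minimal sketch:

    # tell Docker to use its zfs storage driver
    cat <<'EOF' | sudo tee /etc/docker/daemon.json
    { "storage-driver": "zfs" }
    EOF
    sudo systemctl restart docker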


Interesting. I thought overlay2 could be agnostic to the base filesystem.


I use btrfs on both servers and laptops without a problem these days. This wasn't the case almost a decade ago, though, when I got bit by its then-instability.


You know Ubuntu has supported ZFS since 20.04. Experimental, but quite stable for me. Just select the file system during the installation process.


Or just apt install zfs-utils; that's it, not even a reboot.


Oh yeah, right. Installation method if one needs ZFS on root.


> but having to mess with the kernel has kept me from attempting it so far. Maybe I'm worrying about nothing.

For the most part, yes. Occasionally a kernel developer who seems to be bitter about a company that doesn't exist any more tries to break compat with ZFS, but it's generally smooth sailing on Fedora, Debian, and CentOS, with dkms handling the building of modules seamlessly.


Just use something else for your root file system, and ZFS for the rest. I've been running ZoL for 10 years (on Arch) and had to recover a few times, but it was never difficult because the setup is totally standard except for the data disks.


The easiest way is using the Proxmox installer, which offers ZFS as a filesystem. Underneath it is a Debian 10 installation. Last time I tried, you could not enable ZFS encryption. I don't know what the case is with OpenZFS 2.0.

Do we have encryption yet?


Proxmox uses their own kernels, so they build the zfs modules as well. They are bundled in the pve-kernel package.


Ubuntu 20.10 has an option in installer to use ZFS encryption for root partition.


Oh sweet, I don't think encryption was an option in 20.04. Is it still under an "experimental" option in the installer?


Yeah, still experimental but improved over 20.04. Added encryption and "autotrim=on" by default.

I think Canonical's plan is to have ZFS as an experimental option only for the Desktop version of Ubuntu until the next LTS release.


I'm using btrfs and my system still works. :)


Me too. I have been running full btrfs for at least 5 years (single and RAID1 disks), with no data loss whatsoever. Now that there is support for swapfiles, I do not even need to create a dedicated partition for them.


>ZFS directly on my Linux desktop

Use Btrfs, trust me, it's stable now... well the commands are terrible compared to ZFS. All my servers are FreeBSD, but on the laptop and on one workstation I've had openSUSE Tumbleweed for about 2 years and it works great.


I've used btrfs for years with no problems ever. But, I see weekly reports on the btrfs reddit forum of the type "I was doing btrfs RAIDxyz, and I can't mount read/write" etc., so there do exist people who have issues with it today. If you do RAID on steroids, you might do some research before trying btrfs.


> well the commands are terrible compared to ZFS

Really? I don’t think so, I find btrfs usage extremely straightforward and easy to grok. ZFS on the other hand has all that confusing lingo about vdevs, etc...

I get that this is subjective but I disagree.


The Btrfs commands are very poor compared with what ZFS offers. The ZFS commands are organised around the end user: the system administrator. The Btrfs commands are not.

As an example, you're running low on space and need to find out which datasets (subvolumes) are using the most space. How do you do that? With ZFS it's a single command which runs in a few milliseconds. With Btrfs...
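For comparison, the ZFS one-liner being alluded to is along these lines (the columns and sort key are chosen for illustration):

    # datasets sorted by space used, largest first
    zfs list -o name,used,refer,avail -S used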


Hey, everyone has different taste, but vdevs, datasets, and pools are for me much more logical than lv's and lg's (pun was NOT intended).


But that's not really btrfs, that's LVM. I use btrfs directly on physical disks and don't use PVs, VGs or LVs.


Well, then call it volumes and vdevs?... I love ZFS's layering.


> the commands are terrible

what does that mean?


The "btrfs" tool has a lot of leaky abstractions, confusing intended usage, and gotchas all over the place. If you aren't a btrfs developer, it is difficult to know what exactly you want to do and how to accomplish it.

ZFS on the other hand has just two commands for common administration tasks: zpool and zfs. zpool controls pool-level operations, mainly ones that deal with the storage layer; zfs controls the logical file systems and volumes that are contained within a pool. The zpool and zfs commands have been meticulously crafted to not expose much of the underlying software architecture and focus only on what administrators want, and all of it is clearly documented.

There are actually a few other commands that come with ZFS if you really want or need to deal with low-level and difficult details, commands like zdb, zinject, zstreamdump. You almost never need any of them.
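A few representative commands, just to illustrate the split (pool, disk, and dataset names are placeholders):

    # pool level: create, inspect, scrub
    zpool create tank mirror sda sdb
    zpool status tank
    zpool scrub tank
    # dataset level: filesystems, properties, snapshots
    zfs create -o compression=zstd tank/home
    zfs snapshot tank/home@before-upgrade
    zfs list -t snapshot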


For zfs specific features, there are 'zfs' and 'zpool' commands (well, binaries, and the first parameter is a command). For btrfs, there is 'btrfs'.

So I guess that the GP considers /usr/sbin/{zfs,zpool} more intuitive than /usr/sbin/btrfs.


Well, zpool is for the pool, which consists of filesystems/datasets, or "zfs"es. So I have a clear distinction between the whole pool (pools, or logical groups) and the filesystems/datasets (zfs, or logical volumes).

It has nothing to do with /usr/sbin/x.


https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs

>what does that mean?

Not functional but logical (for me)


Debian has had zfs in the contrib repo since stretch; no manual hacking required, you just have to enable contrib.

I switched my freebsd box over to debian about two years ago. No complaints so far :)


Finally, this means we have a way to share "real" filesystems on both FreeBSD and Linux. The only other filesystems you could open without issues on both are FAT and NTFS (through NTFS-3G), both of which are less than ideal for data you care about.


Slightly off topic, but it seems like GitHub can't/won't display the user profile page for one of the OpenZFS developers:

https://github.com/behlendorf

For me, that gives a unicorn 100% of the time (tried across several minutes), instead of showing the developer profile.

Anyone else seeing that?


It does, indeed, report that "This page is taking too long to load."!


Yeah, it's still unicorning for me, about a day later. :(


Loaded in under 5 seconds flat for me, perhaps it's something strange with whatever edge server you're hitting?


Could be, but if so it's persistent. It's about a day later now, and the page still won't load.


Testing now, the page is finally loading. Page load time of ~2 days... that's different. ;)


It loaded for me earlier today, I think github is just having issue.


Congratulations - it's great to see the code unification on the two key ZFS platforms, and continuing to add useful features, especially around at-rest encryption.

Many thanks to the various OpenZFS contributors.


How's the memory consumption of ZFS without deduplication these days? I've got a couple of 4 TB drives connected to a single board ARM computer with 2 GB of RAM. I used to use btrfs, but switched to XFS after I accidentally filled up a drive and was unable to recover.


ZFS without dedup will just run slower with less RAM available for caching, up to a point (I think the lowest I've seen someone run it with ARC configured to use in recent memory is 128 MB? I believe 32 MB or so is the minimum below which OpenZFS will just ignore you if you try to tell it to use less...)

I've seen people use it as a rootfs on RPis, and have personally run it on Pis for brief occasions without encountering any RAM problems.
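If you do want to cap the ARC on a small box, the usual knob is the zfs_arc_max module parameter; a sketch (the 512 MiB figure is just an example):

    # persist across reboots (value is in bytes; 536870912 = 512 MiB)
    echo "options zfs zfs_arc_max=536870912" | sudo tee /etc/modprobe.d/zfs.conf
    # or change it at runtime
    echo 536870912 | sudo tee /sys/module/zfs/parameters/zfs_arc_max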


I'm looking at setting up my first ZFS pool ('zpool'?) in a few weeks, on Linux. Will I be using OpenZFS or something else? Ubuntu 20.04.

(Sorry if noise; I'm just trying to get an idea of how relevant this 2.0 release is to me.)


> The ZFS on Linux project has been renamed OpenZFS! Both Linux and FreeBSD are now supported from the same repository making all of the OpenZFS features available on both platforms.

Previously it was called ZFS on Linux, but now ZFS development is unified on the "OpenZFS" codebase shared both between Linux and FreeBSD as much of the development effort for ZFS in general ended up there.


Ah, I was wondering what happened since I stopped hearing about "ZFS on Linux" so now I know what to search for. Thanks!


Just built a FreeNAS system over the past couple weeks and finished doing burn-in tests of my hard drives, wonder if I should wait and see how to install OpenZFS 2.0.0 before I create my storage config.


FreeNAS 12 (now named TrueNAS) is already using OpenZFS 2.0, or very nearly.


Does it support NFSv4.2? (fallocate, sparse files, and server-side copy)


Aren't ZFS upgrades to existing vdevs really simple? I don't see any reason why you need to wait.


That’s the idea I’ve gotten when looking around online. I figured I was in the uncommon situation of having a completely blank and ready system, so I could afford to just wait a few days.


Yes, ZFS upgrades are really simple, but they are one-way; you can't downgrade afterwards.
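For reference, enabling new pool features after updating the software is an explicit step, and it's the part you can't undo (the pool name is a placeholder):

    # list the features the installed version can enable
    zpool upgrade -v
    # enable all supported features on a pool -- one-way
    zpool upgrade tank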


They certainly seem to be within OpenZFS over the past few years.


Anyone know what version of Ubuntu Server this will land in?


Likely 21.04. I doubt they'll pull it into 20.10 or 20.04.


Probably 21.04. 22.04 if you want an LTS release.


hooray for zstd compression!


Side note, they really should have in big-bold letters "DO NOT ENABLE DEDUPLICATION UNLESS YOU HAVE A TON OF RAM!" on their readme. That was a huge mistake on my part. The ram requirements are VERY high for good performance.

I realized how bad the performance was when it took about 2 hours to delete 1000 files.


It does already say that. This is what it says:

Deduplication is the process for removing redundant data at the block level, reducing the total amount of data stored. If a file system has the dedup property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored and common components are shared among files.

Deduplicating data is a very resource-intensive operation. It is generally recommended that you have at least 1.25 GiB of RAM per 1 TiB of storage when you enable deduplication. Calculating the exact requirement depends heavily on the type of data stored in the pool.

Enabling deduplication on an improperly-designed system can result in performance issues (slow IO and administrative operations). It can potentially lead to problems importing a pool due to memory exhaustion. Deduplication can consume significant processing power (CPU) and memory as well as generate additional disk IO.
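For anyone weighing it up, you can estimate how much dedup would actually save, and how big the dedup table would get, before enabling it (the pool name is a placeholder):

    # simulate dedup on the existing data and print a DDT histogram
    zdb -S tank
    # on a pool that already has dedup enabled, show dedup table statistics
    zpool status -D tank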


That's not new with 2.0 though. It's forever been the case with ZFS. Everything that discusses dedupe basically says: 'don't use it'.


Most guides I read tell you that you should not enable dedup unless you know what you are doing, and that it will use a lot of RAM.


To me this sounds more like you didn't RTFM ;-)


Will OpenZFS on Linux ever be integrated with the Linux page cache?


Probably never. ZFS isn't just a filesystem, it was developed to be an entire storage system that's vertically integrated, so ARC is a fundamental part of the filesystem design.

ZFS also has a huge legacy. Right now the license (probably) prevents you from legally shipping a compiled zfs module with the linux kernel, just solving that seems insurmountable. It's also supported on Illumos and FreeBSD, trying to refactor it to use the linux page cache would have a chance of introducing bugs to these platforms.


ZFS isn't really designed for local "temporary" file systems (IMHO). You don't really need to nest checksums, create snapshots, or do volume management when you're slinging pages between RAM and NVMe.


No, they have the ARC and L2ARC; if you want the traditional thing, go to NILFS2 or BTRFS or in the future XFS (when they have full check-summing).


>in the future XFS (when they have full check-summing).

Is this actually planned?


YES! Step by step and keep XFS as stable as it is (the most trustworthy linux FS of them all)


XFS is one of the only filesystems I've suffered serious catastrophic dataloss with. The other one is of course Btrfs which was and is the worst of the lot.

What was worrying was that the XFS dataloss was due to an event totally out of our hands: a power outage at a substation which took out a whole area of the city. The whole datacentre lost power, and the XFS filesystems on some massive storage arrays were completely hosed. Just from power loss. It took days to put it all back from tape backups. XFS has long been known to have problems with unclean shutdowns, but total loss from a power outage is about as bad as it gets.


During my last >15 years of Linux usage, I had exactly the two filesystems you are advocating for here crash on me:

- XFS, a long time ago, had a bug that made it lose files silently

- btrfs twice, most recently about a year ago in a super simple setup (no RAID or any other fanciness). I wasn't able to recover it; after a while I at least got it to mount read-only and copied the contents away.

These were all on Gentoo, so with relatively recent tools and vanilla kernels.

The only filesystems that I never had problems with were ext4 and reiserfs.


>reiserfs

That was exactly the FS that ate my data back in ~2005. Never had problems with XFS or ZFS. As for Btrfs, well, I've only been using it regularly for 2 years so I can't say much, but I think Red Hat chose XFS for a reason.


The reason is they hired the XFS developers from SGI. And they bought Sistina for LVM. As a result, they have been wedded to both XFS and LVM for many years now, because there is likely a combination of wanting to maximise their investment into these technologies and developing the in-house expertise to support them very well, and also in having a number of staff who are deeply committed to them and don't want to change.

At some level, they must understand that both XFS and LVM are over 25 years old, and when compared with e.g. ZFS, are completely outclassed. Their current efforts developing Stratis, which is an attempt to provide more ZFS-like functionality by extending XFS, adding LVM thin pools, and managing it all with an unholy complex combination of daemons, D-BUS and Python looks like a logical progression based upon what they have to hand in house, but a strategic mistake when it can never approach ZFS in functionality or reliability simply because these technologies can only be extended so far because of fundamental design limitations. I'll be morbidly interested to see what they can stretch XFS to do. But I won't be using it myself.

What I find really surprising here is that Linux in general, and RedHat in particular, don't have a competitive filesystem to offer. There is absolutely nothing which matches ZFS.


>because there is likely a combination of wanting to maximise their investment into these technologies and developing the in-house expertise to support them very well, and also in having a number of staff who are deeply committed to them and don't want to change.

Not sure if you would risk your customers' data just because of that. I never had any problems with XFS.

>At some level, they must understand that both XFS and LVM are over 25 years old

Being a user of ZFS (on FreeBSD) myself: ZFS is not much younger, dating to 2006.

>and RedHat in particular, don't have a competitive filesystem to offer.

That I really don't understand either. Maybe they think that for "small" stuff HW RAID or LVM is good enough, and everything bigger is Ceph or Gluster anyway.


Absolutely agreed, the customer's data is paramount, and I think from the perspective of supporting that with their well established in-house expertise, it makes sense.

However, XFS isn't perfect. As I wrote in a separate reply in this thread, my team in a previous position suffered catastrophic dataloss when a power cut took out some massive storage arrays. XFS does not handle power loss gracefully, and in two cases, the whole storage array was unrecoverable and required restoring from tape.

I use ZFS on FreeBSD (and Linux) too, and while it dates back to 2006 and was designed around ~2000, LVM and XFS date back at least a decade prior to that. They are a generation apart, and ZFS builds upon the knowledge of that previous generation, and its successes and its flaws.

Regarding competitive stuff, that's a mystery to me as well. My organisation went with some proprietary IBM storage array kit, but it was a real pain. Required hand compiling kernel modules against the RHEL kernel. And it still resulted in the above dataloss issues.


OpenZFS is in fact a more prestigious name, and it already sounds better than ZFS on Linux.


If you get on the calls, you'll find zero hostility among the devs of the different operating systems. The focus is on OpenZFS, with the Linux branch gradually becoming the baseline for the FreeBSD work as well. Illumos (where OpenZFS originated after Illumos was formed, post the OpenSolaris shutdown) hasn't moved to this baseline yet due to the significant OS-level differences; instead, code is pulled between the "branches" as needed. The collaboration happens via email and regular calls.



