Stratis Storage Software Design [pdf]

anoother · on Aug 2, 2017

> 10.2.2 Layer 1: Integrity (optional)
>
> This layer uses a to-be-developed DM target that allows the detection of incorrect data as it is read, by using extra space to record the results of checksum/hash functions on the data blocks, and then compare the results with what the blockdev actually returned. This will enable Stratis to detect data corruption when the pool is non-redundant, and to repair the corruption when the pool is redundant. Such a DM target could use DIF information if present.

This (and other aspects of Stratis) seems like a sensible solution, especially in Linux-land[1]. Implementing this (and, I assume, dedupe, which is slated for v3 but on which no information is yet provided) at the DM layer means that other filesystems can also benefit from this work.

[1] By this I mean that Linux, both as a dev community and as an OS, seems much more comfortable with individual, composable components that can be tied together in different configurations, than monolithic one-size-fits-many solutions like ZFS (and... systemd).
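A minimal Python sketch of the detect-then-repair idea in the quoted Layer 1 description. Everything here is hypothetical: the `ChecksummedMirror` class, the dict-based "devices", the 4096-byte block size and SHA-256 as the checksum are assumptions for illustration, not how dm-integrity or Stratis actually lay out their metadata. With one copy a mismatch can only be reported; with two or more, a copy that still matches the recorded checksum is used to rewrite the bad one.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


class ChecksummedMirror:
    """Hypothetical integrity layer: every block is written to `copies`
    toy devices (dicts), and a SHA-256 digest of it is kept separately,
    standing in for the "extra space" the design document mentions."""

    def __init__(self, copies: int = 1) -> None:
        self.devices = [{} for _ in range(copies)]  # block number -> bytes
        self.checksums = {}  # block number -> digest recorded at write time

    def write(self, blockno: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.checksums[blockno] = hashlib.sha256(data).digest()
        for dev in self.devices:
            dev[blockno] = data

    def read(self, blockno: int) -> bytes:
        expected = self.checksums[blockno]
        good, bad = None, []
        for dev in self.devices:
            # Compare what the device returned with what was recorded at write time.
            if hashlib.sha256(dev[blockno]).digest() == expected:
                good = dev[blockno]
            else:
                bad.append(dev)
        if good is None:
            # Non-redundant (or every copy damaged): corruption is detected, not fixed.
            raise OSError(f"checksum mismatch on block {blockno}")
        for dev in bad:
            dev[blockno] = good  # redundant: overwrite bad copies with a verified one
        return good


mirror = ChecksummedMirror(copies=2)
mirror.write(0, b"a" * BLOCK_SIZE)
mirror.devices[0][0] = b"b" * BLOCK_SIZE  # simulate silent corruption on one leg
data = mirror.read(0)  # returns the good copy and repairs device 0
```

In the dm-integrity + mdadm setup mentioned in the reply below, the work is split roughly this way: dm-integrity turns a mismatch into a read error, and md's redundancy then supplies and rewrites a good copy. The sketch folds both steps into one class for brevity.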

_gkxg · on Aug 3, 2017

That module is actually already in Linux in the form of dm-integrity: https://github.com/torvalds/linux/blob/master/Documentation/...

I'm using it with mdadm right now and it works pretty okay.

pmlnr · on Aug 3, 2017

I don't get this.

First and foremost, RAID, in many, many cases, is done at the hardware level, and it's sorted by mdraid if not. So I never understood why btrfs tried to implement it before getting everything else stable; it should have been an extra at the end.

LVM... well. LVM works, that is for sure. It's also painful, and the last time I tried keeping as many LVM2 snapshots as I have in ZFS, it made my machine unusably slow.

But features like built-in encryption, transparent compression and CoW are _useful_. XFS has its own problems (especially if you manage to fill it up 100%; I had to reboot numerous production servers due to this), and it lacks all three.

ZFS should be the answer. I'm aware of the license issues; still, ZFS is decades ahead of the other solutions. I'm also aware that MD+LVM+XFS+Stratis fits the UNIX philosophy better, including the overhead of learning all the non-standardised, completely unintuitive commands each of them has.

jorangreef · on Aug 3, 2017

"First and foremost, RAID, in many, many cases, is done at the hardware level, and it's sorted by mdraid if not. So I never understood why btrfs tried to implement it before getting everything else stable; it should have been an extra at the end."

RAID is broken in those layers. The ZFS developers go into the reasons for this in detail.
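One commonly cited example of what goes wrong when RAID sits below a filesystem that cannot verify its own data is the RAID-5 "write hole". The toy Python sketch below (two data chunks and one XOR parity chunk, with made-up two-byte values) assumes a crash lands between the data write and the matching parity write; a later rebuild then produces wrong data without any error. Battery-backed hardware caches and md's optional write journal are mitigations; ZFS's RAID-Z sidesteps the problem by never overwriting a live stripe in place.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks, i.e. RAID-5 parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A three-disk stripe: two data chunks plus their parity.
d0, d1 = b"\x01\x02", b"\x10\x20"
parity = xor_blocks(d0, d1)

# Update d0 in place. Assume power fails after the new data lands
# but before the new parity is written: parity is now stale.
d0 = b"\xff\xff"

# Later the disk holding d1 dies, and the array rebuilds it from d0 and parity.
rebuilt_d1 = xor_blocks(d0, parity)
print(rebuilt_d1 == b"\x10\x20")  # False: the rebuilt chunk is silently wrong
```

Nothing in the RAID layer can notice this, because it has no independent record of what d1 should contain; that is the gap block-level checksums are meant to close.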

_gkxg · on Aug 3, 2017

A pity they didn't consider Bcachefs [0]. I understand it's a new project but it's made some good progress and it seems like a relatively cheap investment.

Anyone from Red Hat, have you considered talking to Kent, the Bcachefs developer, about this?

[0]: http://bcachefs.org/

zx2c4 · on Aug 3, 2017

bcachefs seems especially exciting because the built-in crypto uses authenticated encryption in the form of ChaCha20Poly1305. Most disk encryption operates in an unauthenticated mode (such as XTS), which means the ciphertext is malleable on disk. This makes it useful only for the case of your laptop getting stolen out of your car in the parking lot while turned off, but not for many more complicated scenarios (like trusting the data once the police recover your stolen laptop). With authenticated encryption, you're able to guarantee the authenticity and integrity of your files, in addition to encrypting them. Really great. The authentication tag then naturally doubles as what ZFS people refer to as their "checksums". I'm watching bcachefs closely to see how it matures.

http://bcachefs.org/Encryption/
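The difference between unauthenticated and authenticated encryption can be shown with a deliberately toy construction in Python. This is an assumption-laden stand-in, not ChaCha20Poly1305 or XTS: the "cipher" is a SHAKE-256 keystream XORed with the data, and authentication is HMAC-SHA256 over nonce and ciphertext (encrypt-then-MAC); don't use it for anything real. A stream cipher is more malleable than XTS (flipping ciphertext bits flips exactly the same plaintext bits, whereas a modified XTS block decrypts to unpredictable garbage), but in both unauthenticated cases the tampered data decrypts without any error, while the authenticated version refuses it.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher built on SHAKE-256; a stand-in for ChaCha20, NOT secure.
    return hashlib.shake_256(key + nonce).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(enc_key: bytes, mac_key: bytes, nonce: bytes, plaintext: bytes):
    ct = xor(plaintext, keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()  # encrypt-then-MAC
    return ct, tag


def decrypt_unauthenticated(enc_key: bytes, nonce: bytes, ct: bytes) -> bytes:
    # Like plain disk encryption: any ciphertext "decrypts" to something.
    return xor(ct, keystream(enc_key, nonce, len(ct)))


def decrypt_authenticated(enc_key, mac_key, nonce, ct, tag) -> bytes:
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed: ciphertext was modified")
    return decrypt_unauthenticated(enc_key, nonce, ct)


enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
nonce = secrets.token_bytes(12)
ct, tag = encrypt(enc_key, mac_key, nonce, b"pay alice 100")

# Someone with access to the disk, but no keys, XORs in (old ^ new)
# over bytes 4..8 to turn "alice" into "carol".
tampered = ct[:4] + xor(ct[4:9], xor(b"alice", b"carol")) + ct[9:]

print(decrypt_unauthenticated(enc_key, nonce, tampered))  # b'pay carol 100'
try:
    decrypt_authenticated(enc_key, mac_key, nonce, tampered, tag)
except ValueError as err:
    print(err)  # the tag check rejects the modified ciphertext
```

The same tag check is what lets an authenticated-encryption filesystem treat the tag as a per-block integrity checksum.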

Filligree · on Aug 3, 2017

> Like trusting the data once the police recover your stolen laptop.

If that's your scenario, then you should also expect EFI-level implants or extra chips. Having authentication isn't a bad thing... just don't expect it to be enough.

The only thing you can safely use that machine for post-capture is to read a still encrypted disk image over the network.

bogomipz · on Aug 3, 2017

The article states:

>"ZFS isn’t an option RHEL can embrace due to licensing (Ubuntu notwithstanding.)"

Can someone explain that comment to me? I understand that ZFS is CDDL, but why is this acceptable for Canonical and not Red Hat? They both have enterprise customers. Is this just a difference of opinion between each company's legal counsel, or does this comment imply something else?

rodgerd · on Aug 3, 2017

Canonical are rolling the dice and hoping Oracle don't go after them, just as they rolled the dice on the multimedia patents covering codecs they bundle but Red Hat and Fedora don't. The wisdom of relying on Oracle's generosity is left as an exercise for the reader.

greenhouse_gas · on Aug 3, 2017

I don't know if Oracle (as a rights holder of ZFS) can sue them (if they distribute the modules under the CDDL, they're fine as far as Oracle is concerned). It's all the Linux rights holders (including Oracle) who can sue over the GPL violation.

gtirloni · on Aug 3, 2017

Exactly. And it's wise of RH to stay out of that mess.

greenhouse_gas · on Aug 3, 2017

I'm not a lawyer, but there's quite an active debate about whether closed-source (or, as in this case, GPL-incompatible) modules may be distributed.

More than one lawyer has said that all kernel modules are derivative works of the kernel.

_gkxg · on Aug 3, 2017

Hasn't the NVIDIA kernel module been closed source essentially forever?

sanxiyn · on Aug 3, 2017

Yes, and many people consider the NVIDIA kernel module to be in violation of the GPL.

gigatexal · on Aug 3, 2017

Damn Oracle and their purchase of Sun Micro, or else we could all have had native ZFS, but nooo, we are stuck with an incompatible license and Red Hat reinventing another ZFS-like filesystem.

mbreese · on Aug 3, 2017

Remember: it was Sun that set up the initial CDDL licensing. Oracle had already started working on a separate replacement at the time (btrfs).

As came out with the Java GPL issues, don't assume that Sun had any choice in the matter: they may have had separate licenses to deal with. They may not have written everything from scratch.

gigatexal · on Aug 3, 2017

Ahh, I didn't know that. Still a shame. I love ZFS (probably irrationally), but given its license, some of that goodness has to be reinvented in other projects, which just seems unnecessary.

cmurf · on Aug 3, 2017

Almost everything in Btrfs is in the kernel. And except for some glue that I'm not familiar with, the same is true for ZFS. That means Btrfs volumes are very interoperable across distros and kernel versions; you merely need the kernel module built, which almost every distro includes. You don't even need user space tools to just mount a Btrfs volume and get passive scrubbing and data integrity guarantees.

This is basically an active (daemon) user space file system volume manager, leveraging a bunch of kernel layers. Assembly will require having stratis installed and running; it's not at all a general-purpose file system solution. This is trying to fix a specific problem: how to leverage multiple storage technologies invented by different teams over different eras, with mutually exclusive terminology and goals.

Depending on the likely long list of dependencies, some of which will be version specific, who knows what the portability of such volumes will end up being across distros.

k__ · on Aug 2, 2017

Written in Rust and Python, cool.

tomovo · on Aug 2, 2017

"stores information in a text-based JSON format"

OK, this one is a winner.

oneplane · on Aug 2, 2017

Let's reinvent the wheel...

They could instead have:

- made btrfs 'better' so it works for them

- made ZFS's legal issues go away and use that

- asked Apple if APFS was OK to use

coldtea · on Aug 2, 2017

>- made ZFS's legal issues go away and use that

By wishing it intensely?

greenhouse_gas · on Aug 3, 2017

I assume money can help.

Although I don't understand the following, whether the issue is copyright or patents:

1. If it's copyright, then presumably a team could black-box reverse engineer ZFS and rewrite it under the GPL.

2. If it's patents, then how would any other CoW filesystem (say, btrfs) get around them? (Curious sub-point: when would those patents expire?)

sanxiyn · on Aug 3, 2017

It's in part patents. My understanding is that btrfs does not get around them.

This is not a vague patent threat, because there was an actual lawsuit with patent numbers spelled out, namely NetApp v. Sun. You can check the expiration dates of those patents. More here: http://en.swpat.org/wiki/NetApp%27s_filesystem_patents

tracker1 · on Aug 2, 2017

Assuming they're in a position to work with Oracle to make it happen, it's possible if unlikely.

askvictor · on Aug 2, 2017

They're not re-inventing the wheel; they're building on top of xfs and lvm2, making enhancements to both, and tying them into a more unified administrative interface.

tomovo · on Aug 2, 2017

APFS appears to be very modest in scope compared to ZFS. Designed for single volumes, only a few "interesting" features that play into what Apple needs - quicker backups, flash optimized writes, easy and flexible partitioning. That's about it.

orf · on Aug 2, 2017

Perhaps they tried all of those? Nothing wrong with innovation, especially for file systems. Plenty wrong with btrfs, by the sound of it, though.

jrs95 · on Aug 2, 2017

Maybe none of those will get them to their goal of making the systemd of filesystems ;)

rodgerd · on Aug 3, 2017

So:

1/ Fix something that in ten years hasn't been able to deliver stable RAID code, and only found out this year that its scrub option corrupts data instead of repairing it? Well, that's going to be trivial.

2/ Convince Larry Ellison to give something away to competitors.

3/ Convince Tim Cook to give something away to competitors.

Did you have any other ideas? Pull unicorns out of their butts?

msl09 · on Aug 2, 2017

The development is on fire https://github.com/stratis-storage

kronos29296 · on Aug 3, 2017

(Ir)Relevant xkcd https://xkcd.com/927/

JoshTriplett · on Aug 2, 2017

https://news.ycombinator.com/item?id=14915419