LZ4 is so fast that it makes sense to use it almost everywhere instead of uncompressed data. Even storing items compressed in memory is sometimes profitable, since you can fit more items in memory.
Still, zstd offers much better compression and has a tunable compression level: https://github.com/facebook/zstd Decompression is always fast, but you can trade compression speed against ratio.
In general, if you send data over a network, zstd is quite profitable. Even with network-attached storage like AWS EBS or AWS S3 it can be hugely profitable.
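The level trade-off can be sketched in Python. zstd itself is not in the standard library (its usual binding is the third-party `zstandard` package), so this uses stdlib `zlib` as a stand-in; the principle is the same: one decompressor handles the output of any compression level.

```python
import zlib

# Highly repetitive sample data compresses well at any level.
data = b"the quick brown fox jumps over the lazy dog\n" * 1000

# zstd exposes levels 1-22; zlib (a stdlib stand-in here) exposes 1-9.
# Higher level: smaller output, slower compression.
sizes = {level: len(zlib.compress(data, level)) for level in (1, 6, 9)}

# Decompression is level-agnostic: the same call round-trips
# data compressed at any level, and stays fast.
for level in (1, 6, 9):
    assert zlib.decompress(zlib.compress(data, level)) == data

print(sizes)
```

With real zstd the spread between the lowest and highest levels is much larger, which is what makes the "pick your ratio, keep fast decompression" property useful for network transfers.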
Using LZ4 can easily improve performance even if all data reside in memory.
This is the case in ClickHouse: if data is compressed, we decompress it in blocks that fit in the CPU cache and then process the data inside the cache; if data is uncompressed, a larger amount of data has to be read from memory.
Strictly speaking, LZ4 decompression (typically 3 GB/sec) is slower than memcpy (typically 12 GB/sec). But with e.g. 128 CPU cores, LZ4 decompression scales up to memory bandwidth (typically 150 GB/sec) just like memcpy does. And memcpy wastes more memory bandwidth, because it reads uncompressed data while LZ4 decompression reads compressed data.
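The block-wise pattern described above can be sketched as follows, using stdlib `zlib` as a stand-in for LZ4 and an illustrative 64 KiB block size (ClickHouse's real block sizes and codec differ):

```python
import zlib

BLOCK = 64 * 1024  # illustrative block size chosen to fit in CPU cache


def compress_blocks(data: bytes) -> list[bytes]:
    """Compress independent fixed-size blocks."""
    return [zlib.compress(data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)]


def process_compressed(blocks: list[bytes]) -> int:
    """Decompress one block at a time and fold it immediately,
    so the hot working set stays cache-sized."""
    total = 0
    for block in blocks:
        chunk = zlib.decompress(block)  # small, cache-resident buffer
        total += sum(chunk)             # "processing" happens in cache
    return total


data = bytes(range(256)) * 4096  # ~1 MiB of sample data
blocks = compress_blocks(data)
assert process_compressed(blocks) == sum(data)
```

The point of the sketch is the access pattern, not the codec: each decompressed chunk is consumed while it is still in cache, so main memory only ever serves the (smaller) compressed bytes.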
"Even storing items in-memory compressed sometimes is profitable"
LZ4 is one of the algorithms supported by zram in Linux. It's fairly popular with people using things like a Raspberry Pi that have a smaller amount of RAM.
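For reference, a minimal zram setup with LZ4 looks roughly like this (requires root and a kernel with zram support; the device size and swap priority are illustrative):

```shell
# Load the zram module with a single device.
modprobe zram num_devices=1

# Select lz4 as the compression algorithm before sizing the device.
echo lz4 > /sys/block/zram0/comp_algorithm

# Give the device 2 GiB of (uncompressed) capacity.
echo 2G > /sys/block/zram0/disksize

# Use it as high-priority swap so it is preferred over disk swap.
mkswap /dev/zram0
swapon -p 100 /dev/zram0
```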
Microsoft found that compressed pages were always faster, because the latency added by de-/compression was less than the latency to disk (given a sufficiently fast compression algorithm). As a bonus, it's also faster to read and write compressed pages to disk (if that absolutely has to happen). Memory compression is therefore enabled by default on Windows.
I configure my own kernel on Arch and Zswap is enabled by default there, too.
Microsoft's compressed pages implementation seems to work far better than zswap on Linux.
I can't quite see why - perhaps the logic to decide which pages to compress is different, or there is too much code in the swap subsystem that slows down the compression/decompression process...
Pages as in the 8K (for example) data structure used to store portions of files on disk, or pages as in text files? I'm assuming the former but I am not very good with file system internals
Probably not, but ZFS is actually smart enough to store the uncompressed version if compression doesn't save space. In other words: zfs will try lz4 compression on those video files, notice that it doesn't gain anything and store it uncompressed.
Assuming you have a larger record size than block size (with media files you probably want a 1M record size), and that you probably have ashift set to 12 (4k) or 13 (8k), then I believe you need to enable compression in order to prevent the final record from using the full record size. IOW, ZFS will pad out the final record with zeros to the full record size, then use compression on it (if enabled) so that the zeros don't need to be written to disk.
This article refers to the padding on the final record as "slack space" and states that you need to enable compression to eliminate the slack space.
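Under those assumptions, enabling it looks like this (the pool/dataset name `tank/media` is illustrative):

```shell
# Enable lz4 and a large record size on a media dataset. lz4 is
# cheap enough that incompressible records cost almost nothing,
# while the zero padding in each file's final record compresses away.
zfs set compression=lz4 tank/media
zfs set recordsize=1M tank/media

# Verify the settings and the achieved ratio for the dataset.
zfs get compression,recordsize,compressratio tank/media
```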
Is compression not on by default nowadays?
Anyhow, I would not run ZFS with compression disabled completely. There are edge cases where you want it. The metadata? I can't remember the details.
At least activate the compression mode that just compresses zeros (ZLE).