
LZ4 is so fast that it makes sense to use it almost everywhere over uncompressed data. Even storing items compressed in memory is sometimes profitable, since you can fit more items in memory.

Still, zstd offers way better compression and has a tunable compression level: https://github.com/facebook/zstd Decompression is always fast, but you can trade off compression speed vs. ratio.

In general, if you send data over the network, zstd is quite profitable. Even with a network-attached disk like AWS EBS, or with AWS S3, it can be hugely profitable.
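To make the trade-off concrete, here's a minimal sketch using the two C libraries (build with: cc demo.c -llz4 -lzstd); the payload and the levels 3/19 are arbitrary examples:

    #include <stdio.h>
    #include <lz4.h>
    #include <zstd.h>

    int main(void) {
        const char src[] = "payload payload payload payload payload payload";
        const int src_len = (int)sizeof(src);

        /* LZ4: one fixed speed/ratio trade-off, optimized for speed */
        char lz4_buf[LZ4_COMPRESSBOUND(sizeof(src))];
        int lz4_len = LZ4_compress_default(src, lz4_buf, src_len,
                                           (int)sizeof(lz4_buf));

        /* zstd: the level parameter trades compression speed for ratio;
           512 B is plenty here (ZSTD_compressBound() gives the exact bound) */
        char zstd_buf[512];
        size_t z3  = ZSTD_compress(zstd_buf, sizeof(zstd_buf), src, src_len, 3);
        size_t z19 = ZSTD_compress(zstd_buf, sizeof(zstd_buf), src, src_len, 19);

        if (lz4_len <= 0 || ZSTD_isError(z3) || ZSTD_isError(z19)) return 1;
        printf("lz4: %d B, zstd -3: %zu B, zstd -19: %zu B (from %d B)\n",
               lz4_len, z3, z19, src_len);
        return 0;
    }

Decompression speed stays high for zstd at every level; it's only the compression side that slows down as the level goes up.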




Using LZ4 can easily improve performance even if all data reside in memory.

This is the case in ClickHouse: if data is compressed, we decompress it in blocks that fit in CPU cache and then perform data processing inside the cache; if data is uncompressed, a larger amount of data has to be read from memory.

Strictly speaking, LZ4 decompression (typically 3 GB/sec) is slower than memcpy (typically 12 GB/sec). But when using e.g. 128 CPU cores, LZ4 decompression will scale up to memory bandwidth (typically 150 GB/sec) just like memcpy does. And memcpy wastes more memory bandwidth by reading uncompressed data, while LZ4 decompression reads compressed data.
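Not ClickHouse's actual code, but the shape of the idea is roughly this (LZ4 block API; process_block() is a hypothetical stand-in for whatever runs over the data):

    #include <stdlib.h>
    #include <lz4.h>

    #define BLOCK_SIZE (64 * 1024)  /* roughly cache-sized */

    /* Hypothetical consumer; runs while the block is still cache-hot. */
    void process_block(const char *data, int len);

    /* blocks[i] / comp_sizes[i] hold independently compressed blocks of
       at most BLOCK_SIZE uncompressed bytes each. */
    void scan_compressed(const char **blocks, const int *comp_sizes, int n) {
        char *buf = malloc(BLOCK_SIZE);
        if (!buf) return;
        for (int i = 0; i < n; i++) {
            /* decompress one cache-sized block... */
            int len = LZ4_decompress_safe(blocks[i], buf,
                                          comp_sizes[i], BLOCK_SIZE);
            if (len < 0) break;  /* corrupt input */
            /* ...and consume it before it's evicted from cache */
            process_block(buf, len);
        }
        free(buf);
    }

The point is that only the small compressed blocks cross the memory bus; the decompressed bytes live and die in cache.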


That's extremely interesting. Thanks for sharing.

I haven't touched C/C++ in years though, so sadly I wouldn't grok the code anymore.


"Even storing items in-memory compressed sometimes is profitable"

LZ4 is one of the algorithms supported by zram in Linux. It's fairly popular with people using things like a Raspberry Pi that have a smaller amount of RAM.
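For reference, setting it up by hand looks roughly like this (the lz4 choice and the 4G size are just examples; most distros ship a zram service or zram-generator that does this for you):

    modprobe zram
    echo lz4 > /sys/block/zram0/comp_algorithm   # set before disksize
    echo 4G > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0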


TIL about zRAM, emailed myself this config to try at work

https://www.techrepublic.com/article/how-to-enable-the-zram-...


Thanks, I just enabled it. We will see. :)


Microsoft found that compressed pages were always faster, because the latency added by de-/compression was less than the latency to disk (given a sufficiently fast compression algorithm). As a bonus, it's also faster to read and write compressed pages to disk (if that absolutely has to happen). Memory compression (Windows' equivalent of zswap) is therefore enabled by default there.

I configure my own kernel on Arch and Zswap is enabled by default there, too.


Microsoft's compressed pages implementation seems to work far better than zswap on Linux.

I can't quite see why - perhaps the logic to decide which pages to compress is different, or there is too much code in the swap subsystem that slows down the compression/decompression process...


Pages as in the 8K (for example) data structures used to store portions of files on disk, or pages as in text files? I'm assuming the former, but I'm not very good with file system internals.


Memory pages, typically 4 KiB.


Thanks


I have a ZFS pool with exclusively video files. I probably won't see any benefit from enabling LZ4 there, right?


Probably not, but ZFS is actually smart enough to store the uncompressed version if compression doesn't save space. In other words: ZFS will try LZ4 compression on those video files, notice that it doesn't gain anything, and store them uncompressed. [0]

[0] https://klarasystems.com/articles/openzfs1-understanding-tra...


Assuming you have a larger record size than block size (with media files you probably want a 1M record size), and that you probably have ashift set to 12 (4k) or 13 (8k), then I believe you need to enable compression in order to prevent the final record from using the full record size. IOW, ZFS will pad out the final record with zeros to the full record size, then use compression on it (if enabled) so that the zeros don't need to be written to disk.

This article refers to the padding on the final record as "slack space" and states that you need to enable compression to eliminate the slack space.

https://arstechnica.com/information-technology/2020/05/zfs-1...

See also:

https://old.reddit.com/r/zfs/comments/gzcysy/h264_h265_media...


LZ4 is not going to help if you already have compressed data.

For image, video, and audio data there are more efficient compression schemes that take advantage of those formats.


Is compression not on by default nowadays? Anyhow, I would not run ZFS with compression disabled completely. There are edge cases where you want it. The metadata? I can't remember the details. At least activate the compression that just compresses zeros (zle).


I enabled it for my media storage, knowing that it wouldn't matter much, and indeed it doesn't. The compressratio values are 1.00, 1.00 and 1.01.


Since you can control compression settings per dataset, I'd just use a separate dataset for the videos directory with compression disabled.
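Something like this, assuming a pool named tank (the names are just examples):

    zfs create -o compression=off tank/videos
    zfs set compression=lz4 tank    # keep it on for everything else

Child datasets inherit the pool's setting unless you override it, so only the videos dataset opts out.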


If it's fast enough, you wouldn't only be able to fit more data; you'd also be able to access it faster, since you need less bandwidth.

That said, it can be quite tricky to rewrite something to work efficiently on compressed data.


GPU texture compression formats would be the obvious example, with the additional constraint that they need to support efficient random access.
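For a feel of why random access works, here's a sketch of a single-texel fetch from BC1/DXT1 data in C: every 4x4 block is a fixed 8 bytes, so the block holding any (x, y) can be addressed directly (the 3-color/alpha mode is glossed over here):

    #include <stdint.h>

    /* Fetch one texel from BC1 data without decompressing the texture.
       Returns RGB565. Simplified: assumes the 4-color mode (c0 > c1). */
    uint16_t bc1_fetch_rgb565(const uint8_t *tex, int width, int x, int y) {
        int blocks_per_row = (width + 3) / 4;
        const uint8_t *b = tex + ((y / 4) * blocks_per_row + (x / 4)) * 8;

        uint16_t c0 = b[0] | (b[1] << 8);   /* two endpoint colors, */
        uint16_t c1 = b[2] | (b[3] << 8);   /* little-endian RGB565 */
        uint32_t bits = b[4] | (b[5] << 8) |
                        ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24);

        /* 2-bit palette index of this texel within its 4x4 block */
        int idx = (bits >> (2 * ((y % 4) * 4 + (x % 4)))) & 3;
        if (idx == 0) return c0;
        if (idx == 1) return c1;

        /* interpolated entries: 2/3 and 1/3 mixes of the endpoints */
        int r0 = c0 >> 11, g0 = (c0 >> 5) & 63, b0 = c0 & 31;
        int r1 = c1 >> 11, g1 = (c1 >> 5) & 63, b1 = c1 & 31;
        int w = (idx == 2) ? 2 : 1;  /* weight on c0 */
        return (uint16_t)((((w * r0 + (3 - w) * r1) / 3) << 11) |
                          (((w * g0 + (3 - w) * g1) / 3) << 5)  |
                           ((w * b0 + (3 - w) * b1) / 3));
    }

Fixed-rate blocks are what make this possible; a general-purpose stream format like LZ4 can't be indexed into like that.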


> Even with a network-attached disk like AWS EBS, or with AWS S3, it can be hugely profitable.

I always assumed S3 storage was compressed on the fly by AWS, regardless of how the client chooses to store their data.



