LZ4 – Extremely fast compression (lz4.github.io)
307 points by marcobambini on Jan 27, 2021 | 112 comments



A while ago I did some simplistic SquashFS pack/unpack benchmarks[1][2]. I was primarily interested in looking at the behavior of my thread-pool based packer, but as a side effect I got a comparison of compressor speed & ratios over the various available compressors for my Debian test image.

I must say that LZ4 definitely stands out for both compression and decompression speed, while still being able to cut the data size in half, making it probably quite suitable for live filesystems and network protocols. Particularly interesting was also comparing Zstd and LZ4[3]: the former is substantially slower, but achieves a compression ratio somewhere between zlib and xz while beating both in time (in my benchmark at least).

[1] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...

[2] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...

[3] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...


> lz4 (...) probably quite suitable for live filesystems and network protocols

Actually, no. lz4 is less suitable than zstd for filesystems.

BTW, lz4 is present in many Mozilla tools like Thunderbird, where it's represented by its bastard child lz4json, which diverges only in the headers and therefore doesn't work with regular lz4 tools.

> achieving a compression ratio somewhere between zlib and xz, while beating both in time (in my benchmark at least)

Your observation is correct: zstd is now standard and the default on openzfs 2.0, replacing lz4.

The 19 compression levels offer more flexibility than lz4 alone. Another strength is that decode time is not a function of the compression level, which is good for cold-ish storage that's rarely updated.


> zstd is now standard and the default on openzfs 2.0, replacing lz4.

Are you sure? The default compression setting has always been "off", but when switched on, the default algorithm has been lz4 for about 5 years. Zstd support was added less than a year ago and there are still a lot of things that need to be fixed before one could even suggest that it might be a sane default. I like zstd, but I like my uncorrupted data more. I know that compatibility between compressor versions and pools is a concern, and there are also compression performance problems with the way zstd handles zfs block sizes. Thankfully lz4 works great for zfs and has for many years now.

https://github.com/openzfs/zfs/blob/master/include/sys/zio.h...


> Actually, no. lz4 is less suitable than zstd for filesystems.

Why's that? What benefit would I get from switching? Is it workload-dependent?

EDIT: To be clear, I'm not disagreeing; if zstd will work better, I want to know about it so that I can switch my pools to use it.


>> Actually, no. lz4 is less suitable than zstd for filesystems.

>

>Why's that? What benefit would I get from switching? Is it workload-dependent?

Presumably because Zstd has much better compression, while still being quite fast.

I don't see, however, how that invalidates any of my observations. Some filesystems, e.g. UBIFS, support LZ4 but now also support Zstd, because both are suitable for the task (and LZ4 was around earlier).

In the end it is a classic space vs. time trade-off and there is AFAIK no generic right or wrong answer (except that some algorithms are too slow to even be considered).


F2FS supports compression with LZ4 since Linux 5.6.


LZ4 is so fast that it makes sense to use it everywhere over uncompressed data. Even storing items compressed in memory is sometimes profitable, as you can fit more items in memory.

Still, zstd offers way better compression and has a variable compression level: https://github.com/facebook/zstd Decompression is always fast, but you can trade off compression speed vs. ratio.

In general, if you send data over the network, zstd is quite profitable. Even for network-attached storage like AWS EBS or AWS S3 it can be hugely profitable.
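
As a rough illustration of that trade-off, zstd's single-shot API takes the level as a plain parameter. A minimal sketch in C, assuming libzstd is installed and with error handling kept to a minimum (the payload and level are just examples):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zstd.h>

    /* Compress a buffer at a chosen zstd level. Higher levels trade
     * compression speed for ratio; decompression speed stays roughly flat. */
    int main(void) {
        const char *src = "example payload that would normally be much larger";
        size_t src_size = strlen(src);
        int level = 3;                            /* pick anywhere in 1..19 */

        size_t bound = ZSTD_compressBound(src_size);
        void *dst = malloc(bound);
        size_t csize = ZSTD_compress(dst, bound, src, src_size, level);
        if (ZSTD_isError(csize)) {
            fprintf(stderr, "%s\n", ZSTD_getErrorName(csize));
            return 1;
        }
        printf("level %d: %zu -> %zu bytes\n", level, src_size, csize);
        free(dst);
        return 0;
    }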


Using LZ4 can easily improve performance even if all data reside in memory.

This is the case in ClickHouse: if data is compressed, we decompress it in blocks that fit in CPU cache and then perform data processing inside the cache; if data is uncompressed, a larger amount of data has to be read from memory.

Strictly speaking, LZ4 data decompression (typically 3 GB/sec) is slower than memcpy (typically 12 GB/sec). But when using e.g. 128 CPU cores, LZ4 decompression will scale up to memory bandwidth (typically 150 GB/sec) as well as memcpy. And memcpy is wasting more memory bandwidth by reading uncompressed data while LZ4 decompression reads compressed data.
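
To make the block idea concrete, here is a minimal sketch (not ClickHouse's actual code) of compressing data as independent, cache-sized LZ4 blocks and then decompressing one block at a time into a small scratch buffer that stays hot while the data is processed. liblz4 is assumed, and the block size is just a guess to tune per CPU:

    #include <stdlib.h>
    #include <string.h>
    #include <lz4.h>

    #define BLOCK_SIZE (64 * 1024)    /* roughly cache-sized; an assumption */

    /* Compress `len` bytes as independent LZ4 blocks, each prefixed with its
     * compressed size so it can be decompressed alone later. */
    size_t compress_blocks(const char *src, size_t len, char *dst, size_t dst_cap) {
        size_t in = 0, out = 0;
        while (in < len) {
            int chunk = (int)(len - in < BLOCK_SIZE ? len - in : BLOCK_SIZE);
            int written = LZ4_compress_default(src + in, dst + out + sizeof(int),
                                               chunk, (int)(dst_cap - out - sizeof(int)));
            if (written <= 0) return 0;
            memcpy(dst + out, &written, sizeof(int));
            in += (size_t)chunk;
            out += sizeof(int) + (size_t)written;
        }
        return out;
    }

    /* Decompress each block into a scratch buffer that fits in cache and
     * process it while it is still warm, instead of streaming uncompressed
     * data from main memory. */
    void process_blocks(const char *comp, size_t comp_len,
                        void (*process)(const char *, int)) {
        char scratch[BLOCK_SIZE];
        size_t off = 0;
        while (off < comp_len) {
            int csize;
            memcpy(&csize, comp + off, sizeof(int));
            off += sizeof(int);
            int dsize = LZ4_decompress_safe(comp + off, scratch, csize, BLOCK_SIZE);
            if (dsize < 0) return;
            process(scratch, dsize);
            off += (size_t)csize;
        }
    }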


That's extremely interesting. Thanks for sharing.

I haven't been into C/C++ for years though, so I wouldn't grok the code now, sadly.


"Even storing items in-memory compressed sometimes is profitable"

LZ4 is one of the algorithms supported by zram in Linux. It's fairly popular for people using things like a Raspberry Pi that have a smaller amount of RAM.


TIL about zRAM, emailed myself this config to try at work

https://www.techrepublic.com/article/how-to-enable-the-zram-...


Thanks, I just enabled it. We will see. :)


Microsoft found that compressed pages were always faster, because the latency added by de-/compression was less than the latency to disk (given a sufficiently fast compression algorithm). As a bonus, it's also faster to read and write compressed pages to disk (if that absolutely has to happen). Memory compression is therefore enabled by default on Windows.

I configure my own kernel on Arch and Zswap is enabled by default there, too.


Microsoft's compressed pages implementation seems to work far better than zswap on Linux.

I can't quite see why - perhaps the logic to decide which pages to compress is different, or there is too much code in the swap subsystem that slows down the compression/decompression process...


Pages as in the 8K (for example) data structure used to store portions of files on disk, or pages as in text files? I'm assuming the former but I am not very good with file system internals


Memory pages, typically 4 KB.


Thanks


I have a ZFS pool with exclusively video files. Probably won't see any benefit in enabling LZ4 there right?


Probably not, but ZFS is actually smart enough to store the uncompressed version if compression doesn't save space. In other words: ZFS will try lz4 compression on those video files, notice that it doesn't gain anything, and store them uncompressed.

[0] https://klarasystems.com/articles/openzfs1-understanding-tra...


Assuming you have a larger record size than block size (with media files you probably want a 1M record size), and that you probably have ashift set to 12 (4k) or 13 (8k), then I believe you need to enable compression in order to prevent the final record from using the full record size. IOW, ZFS will pad out the final record with zeros to the full record size, then use compression on it (if enabled) so that the zeros don't need to be written to disk.

This article refers to the padding on the final record as "slack space" and states that you need to enable compression to eliminate the slack space.

https://arstechnica.com/information-technology/2020/05/zfs-1...

See also:

https://old.reddit.com/r/zfs/comments/gzcysy/h264_h265_media...


LZ4 is not going to help if you already have compressed data.

For image, video and audio data there are more efficient compression schemes that take advantage of those formats.


Is compression not on by default nowadays? Anyhow, I would not run ZFS with compression disabled completely. There are edge cases where you want it. The metadata? I can't remember the details. At least activate the compression that just compresses zeros.


I enabled it for my media storage, knowing that it wouldn't matter much, and indeed it doesn't. Compression ratios are 1.00, 1.00 and 1.01.


Since you can control compression settings per dataset, I'd just use a separate dataset for the videos directory with compression disabled.


If it's fast enough, you wouldn't only be able to fit more data, you'd be able to access it faster as well, since you need less bandwidth.

That said it can be quite tricky to rewrite something to efficiently work on compressed data.


GPU texture compression formats would be the obvious example. With the additional constraint that they need to support efficient random access.


> Even for network-attached storage like AWS EBS or AWS S3 it can be hugely profitable.

I always assumed S3 storage was compressed on the fly by AWS, regardless of how the client chooses to store their data.


Another contender is zstd: https://github.com/facebook/zstd. It typically offers better compression ratios than LZ4 at a slight (depending on your data) cost in speed. Additionally, it offers a training mode to tune the algorithm for specific types of data, which is particularly useful for compressing small pieces of data.


> It typically offers better compression ratios than LZ4 at a slight (depending on your data) cost in speed

Per the table at [0], zstd provides only a slight improvement in compression ratio, and in exchange is about half the speed of lz4.

They both have their place.

0. https://facebook.github.io/zstd/


That table shows zstd comparing poorly at its faster settings, but at slower settings, it offers a significantly better compression ratio, albeit 3x slower decompression.


LZ4 has branchless decompression and a lower cache footprint, so it works equally well on low-end and non-desktop CPUs.

zstd, brotli, and snappy were seemingly all made with high-end x86 capabilities in mind.


I also appreciate LZ4's simplicity and tiny code footprint.

zstd is brilliant as well, but in terms of code base it's a whole other beast.


Yes, the decompressor on a bare-metal Cortex-M4 is a mere few hundred bytes of code; you can decompress from flash directly into the output buffer.


I've used it in bootloaders that have slow transfer mechanisms (UART, I2C) to get whatever speedup I can for a few hundred bytes of binary.


Google's Snappy is in the same class as LZO and LZ4, not the same class as brotli and zstd.


Also see Daniel Reiter Horn's DivANS built at Dropbox: https://dropbox.tech/infrastructure/building-better-compress...


Zstd is very different - it includes an entropy coder. LZ4 only finds repeated matches, but then doesn't encode them very efficiently.

To put it simplistically, if you have a file which is a (good) random mix of an equal number of A and B characters, LZ4 won't be able to compress it significantly, while Zstd will compress it 8:1, converging to an encoding where a '1' bit is A and a '0' bit is B.


> To put it simplistically, if you have a file which is a (good) random mix of an equal number of A and B characters, LZ4 won't be able to compress it significantly

I checked it. LZ4 still reduces the size to half, no idea why half. So a 10 MB file compresses to 5 MB.

Edit: checked with the highest compression level and it compresses a 1 MB file to 185 KB. So what the parent wrote is false.
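
For anyone who wants to reproduce the A/B experiment, a quick sketch (assumes liblz4 and libzstd; the buffer size and zstd level are arbitrary, and error checks are skipped):

    #include <stdio.h>
    #include <stdlib.h>
    #include <lz4.h>
    #include <zstd.h>

    /* Fill a buffer with a random mix of 'A' and 'B', then compare how LZ4
     * (matches only) and zstd (matches + entropy coding) handle it. */
    int main(void) {
        const size_t n = 1 << 20;                      /* 1 MiB of input */
        char *src = malloc(n);
        for (size_t i = 0; i < n; i++)
            src[i] = (rand() & 1) ? 'A' : 'B';

        size_t cap_lz4 = (size_t)LZ4_compressBound((int)n);
        size_t cap_zst = ZSTD_compressBound(n);
        char *dst = malloc(cap_lz4 > cap_zst ? cap_lz4 : cap_zst);

        int lz4_size = LZ4_compress_default(src, dst, (int)n, (int)cap_lz4);
        size_t zst_size = ZSTD_compress(dst, cap_zst, src, n, 3);

        printf("lz4 : %zu -> %d bytes\n", n, lz4_size);
        printf("zstd: %zu -> %zu bytes\n", n, zst_size);
        free(src); free(dst);
        return 0;
    }

Roughly speaking, a ~2:1 result for plain LZ4 isn't surprising here: with only two byte values, 4+ byte repeats occur constantly, but on random data the matches stay short and each one costs at least three bytes (token plus two-byte offset), which caps how far match-only coding can go, while an entropy coder can approach the 8:1 limit of one bit per byte.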


Yes, if I take the 8 combinations aaa, aab, aba etc and assign each of them a 9 bit codeword I replace each 24 bit sequence with a 9 bit sequence. So arithmetic coders have no problem with cases like this.


But LZ4 doesn't have an arithmetic coder, or any other statistical encoding - it's just matches and literals. Puzzling...


Yep, Zstd is the spiritual successor to LZ4 and written by the same person (Yann Collet) after they got hired by Facebook.


Actually, I seem to recall that he was working on it before getting hired by Facebook (unless there was a massive delay before the hiring became known). I was following his excellent blog posts on the matter at the time.


Yes, it was a fully working thing before Facebook. There has been a lot of improvement in both the core and the CLI, but the core innovations of zstd were well established before Facebook. I was probably following his blog (even though I wasn't a compression expert) for months before I saw the post about him joining Facebook.


Yann wrote LZ4 and Zstd well before joining FB. I have to applaud FB for supporting Yann's work, though.


I've spent an afternoon testing zstd's custom dictionaries. They really only provide benefits on small data blocks. According to my tests, the largest block size at which custom dictionaries still provide a benefit is 8K; above that, the compression ratio advantage compared to the default is definitely gone.
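
For reference, the train-then-use flow at the library level looks roughly like this (a sketch using zdict.h and the *_usingDict API; sample collection, dictionary sizing and error handling are all simplified):

    #include <zstd.h>
    #include <zdict.h>

    /* Train a dictionary from many small samples, then compress one small
     * record with it; dictionaries mostly pay off for records well under ~8K. */
    size_t compress_with_dict(const void *samples, const size_t *sample_sizes,
                              unsigned nb_samples,
                              const void *record, size_t record_size,
                              void *dst, size_t dst_cap)
    {
        char dict[16 * 1024];                      /* dictionary size is a tunable */
        size_t dict_size = ZDICT_trainFromBuffer(dict, sizeof(dict),
                                                 samples, sample_sizes, nb_samples);
        if (ZDICT_isError(dict_size)) return 0;

        ZSTD_CCtx *cctx = ZSTD_createCCtx();
        size_t csize = ZSTD_compress_usingDict(cctx, dst, dst_cap,
                                               record, record_size,
                                               dict, dict_size, 3 /* level */);
        ZSTD_freeCCtx(cctx);
        return ZSTD_isError(csize) ? 0 : csize;
    }

Decompression needs the exact same dictionary bytes (ZSTD_decompress_usingDict), which is also why the dictionary-versioning concern raised further down this thread is a real one.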


> Additionally it offers a training mode to tune the algorithm to increase compression ratio on specific types of data

Yes, however most tools that use zstd provide no facility to train your own dictionary.


There should be a way to pool standard dictionaries somewhere, such as a "standard English text corpus" dictionary, that you can then download on demand for encoding, say, BLOB text fields in a database with little to no overhead.

The way this would probably work without such a facility, say in a database, is that the dictionary is maintained internally, constructed on the fly from the field data, and not exposed to users. Although I don't know if you'd have to keep every version of the dictionary in order to successfully decompress old data? If so, then perhaps this is a niche feature.


W.r.t. standard dictionaries, it's something we're interested in, but the fundamental reality of dictionaries is that their effectiveness is strongly tied to their specificity. Put another way, a universal dictionary is a self-contradiction.

And yes, totally, I know at least RocksDB supports exactly that behavior [0].

[0] https://github.com/facebook/rocksdb/blob/12f11373554af219c51...


I ported the block format to Rust matching the C implementation in performance and ratio.

https://github.com/pseitz/lz4_flex


Has anyone written the appropriate Reader wrappers to use this with file io? (Asking b/c a quick search didn't turn anything up.)


File IO should come with the frame format, which is not yet implemented. The block format is not really suited for it.


NICE! Well done!


It is very interesting that compression libraries from Yann Collet outperform their Google counterparts across the board:

lz4 >> snappy

zstd >> brotli


Maybe adjacent: the CoRecursive episode with Daniel Lemire was just terrific.

https://corecursive.com/frontiers-of-performance-with-daniel...

Some people are just really good at performance sensitive stuff.


I'm not a fan of the stacked bar charts; I prefer the table of data under "Benchmarks" on the GitHub source page: https://github.com/lz4/lz4

It makes it very clear where LZ4 fits in comparisons of compression speed, decompression speed and compression ratio.


Here’s a fork of the Windows compression tool 7-Zip which has LZ4 support baked in along with some other useful algorithms – the repo has a good comparison of them: https://github.com/mcmilk/7-Zip-zstd/

(Linking to this more for the overview than the Windows tool in itself.)


The frustrating thing about 7-Zip is that while it's open source, a single author just does code drops.

So there's a bunch of forks with useful features that'll never be adopted because there's no collaboration.

At least that's what I could tell when I looked into it.


I liked http://www.e7z.org/ for its UI improvements, but it was last updated in 2016 and misses out on several security bug patches.


It sounds like it needs a fork to combine all the forks, similar to what Neovim did as a fork of Vim.


Would that be like... a reverse fork? A defork? A korf?


Oh heck, let's just call it a spoon.


Nice. Really, anything that had solid support would be nice. My company delivers data to customers and our customers use all sorts of operating systems. Plain Zip seems to be our lowest common denominator because we can’t count on our customers being tech-savvy or having an IT department.

I really, really, really wish there were more compression standards that were supported on a wide variety of platforms. “Obscure” software package A just doesn’t cut it.


LZ4 is so fast there’s almost no reason to NOT have it on for zfs volumes.


For (low-)bandwidth metrics, yes; for any kind of latency-sensitive workload, not really.

The extra decompression on top of the data fetch latency can be quite noticeable. Sometimes that can be offset if the compression ratio improves a hit rate and thereby decreases latency. The problem, of course, is that even with 10M-IOPS storage devices it is frequently latency, and an inability to keep 100k requests outstanding, that limits performance to one's IO turnaround latency.

Put another way, compressed RAM and disk are really popular in systems that are RAM-constrained or bandwidth-limited, because fetching 1x the data and decompressing it beats fetching 2x (think phones with eMMC). The problem is that this doesn't really make sense on high-end NVMe (or, for that matter, desktops/servers with a lot of RAM), where the cost of fetching 8k vs 4k is very nearly identical because the entire cost is front-loaded on the initial few bytes; after that the transfer overhead is minimal. It's even hard to justify on reasonable HD/RAID systems for bandwidth tests, since any IO that requires a seek by all the disks will then tend to flood the interface. AKA it takes tens of ms for the first byte, but then the rest comes in at a few GB/sec, and decompressing at faster rates takes more than a single core.

Edit: And to add another dimension, if the workload is already CPU-bound, then the additional CPU overhead of compress/decompress in the IO path will likely cause a noticeable hit too. I guess what a lot of people don't understand is that a lot of modern storage systems are already compressed at the "hardware" layer by FTLs, etc.


Zfs is so fast it should be the default for everything where a slight compression may benefit the system: disk io, network transfers, ...


Do you mean LZ4? I fear your comment is a bit misleading as-is.


Oops, yeah, it should be lz4, but I can't change it anymore.


Ah, that's fine, hopefully this comment chain will be upvoted.


I'm curious. I use btrfs daily. Although I've been interested in using ZFS, I haven't yet found the time. In your experience, is ZFS faster than btrfs?


Yes. Much faster. Especially for HDDs. But at the cost of a lot of RAM. Also, lz4 compression can speed up your HDDs by up to 10x (!) for reads and 3x for writes. [1, see the "To compress, or not to compress, that is the question" section.] But it's going to have considerably higher CPU usage as well.

[1]: https://calomel.org/zfs_raid_speed_capacity.html


>But it's going to have considerably higher CPU usage as well.

I am going to assume that in three to four years' time this won't be a problem? I mean, a 16nm quad-core ARM Cortex SoC is only $15.

Unfortunately, no consumer NAS implements ZFS. (The TrueNAS offering isn't really a consumer NAS.)


I am not sure about cheap ARM devices, but I am using an old Haswell i5-4670 and it is more than enough. So it won't be an issue later.

Also, when you are talking about consumer NAS, the real problem is that any low-end system can saturate a gigabit network (100 MB/s) very easily, so investing in extra resources for ZFS doesn't make a difference. At least a 10GbE network (which is beyond the average consumer) is required to actually make it useful.


I repurposed a micro Dell Optiplex 3060 with 8GB RAM and two external HDDs totalling 9TB of space. The CPU is an i3. The whole thing takes less space than a book.

I have lz4 enabled and the gigabit link is almost completely saturated when transferring: 119 MB/s out of the total theoretical 125.

No ZIL, no L2ARC devices are attached. That thing is _flying_ as a home NAS.


> can saturate the gigabit network

Yes, I completely forgot about that. But 2.5/5 Gbps Ethernet is finally coming along.

Hopefully someday ZFS will come.


I've used both Btrfs and ZFS as Linux root filesystems, and at the time I tested (about 4-5 years ago) Btrfs had much worse performance. I've heard that Btrfs greatly improved performance on recent kernels though.

What bothers me about ZFS is that it uses a different caching mechanism (ARC) than the Linux page cache. With ARC you actually see the memory used in tools like htop and GNOME System Monitor (it is not cool seeing half your memory in use when no programs are running). ARC is supposed to release memory when needed (never tested it though), so it might not be an issue.

After about a year of playing with both filesystems on my Linux laptop, I decided the checksumming is not worth the performance loss and switched back to ext4, which is significantly faster than both. I still use ZFS on backup drives for checksumming data at rest and easy incremental replication with `zfs send`.


My main problem with ZFS is the very limited number of ways you can change your setup: no removing drives, no shrinking, etc. Probably fine for (bare-metal) production systems, but not so friendly for desktops/laptops, where I would still love to have snapshots and send/recv support.


Predictability, no? Sometimes you want to know how large your data really would be if it were to go onto an uncompressed filesystem.


You can use "du -A" to show the uncompressed size.


It is possible to make LZ4 decompression even faster, here's how: https://habr.com/en/company/yandex/blog/457612/


Some interesting and related projects:

https://github.com/strigeus/ipzip - TCP/IP Packet Compressor with LZ4 support

https://github.com/centaurean/density - Extremely fast de/compression


If I remember correctly, it is very popular in video games because it is faster to load compressed assets from disk and decompress them in memory than loading the uncompressed assets from disk, even on an SSD.


It is used, and almost every other compression technique was/is used in video games too.

I was thinking of using LZ4, but it doesn't really work that great on floating point data, and images are already compressed (PNG, JPG, and even BCn can't be compressed much further). So I don't know. The good thing about LZ4 is that it's very simple[0] and probably faster than memcpy().

[0] http://ticki.github.io/blog/how-lz4-works/


>and even BCn, can't be compressed much further

S3TC is block compression, so if there is repeating data in images it will compress quite well.


I tried compressing BC7 (from AMD's Compressonator) using lz4c and it wasn't much.

Just re-ran it (with -hc, version 1.9.3(latest now)) and: "Compressed 5592544 bytes into 5413783 bytes ==> 96.80%".

7z with "7z a -mx=9 -t7z asd.7z albedo_PNG_BC7_1.KTX" does 5592544 bytes to 4844707 bytes (down to ~87%).

Original file is 10227047 bytes (PNG, RGBA); I can't remember if the KTX has mipmaps.

EDIT: Note that the image is fairly noisy (gravel). Could/should be better with more "artificial" textures.

I don't know if KTX does some extra compression, but, looking at it, I doubt it.

PS I think that BC could be massaged at compression time to be more compressible, and I think I read something about that. Don't remember.


Yes, it depends on content and won't do much for grainy surfaces. I don't have AAA-game-quality textures to compare, but I think for a typical textured model it is still worthwhile, e.g. this https://milek7.pl/.stuff/somelocotex.png weighs 16 MiB uncompressed, 5.1 MiB as PNG, 5.3 MiB as DXT5, and 2.1 MiB as DXT5-in-LZ4 (mipmaps included in the DXT5).

>PS I think that BC could be massaged at compression time to be more compressible, and I think I read something about that. Don't remember.

There's Oodle BC7Prep from RAD Game Tools: http://www.radgametools.com/oodletexture.htm

EDIT: RAD page states that "BC7 blocks that are often very difficult to compress", so that might be also a factor why my DXT5 test compressed much better than your BC7.

EDIT2: Yeah, with BC7 LZ4 only compresses it down to 4.6MiB.


If you really want to impress your customers, use SQLite to aggregate LZ4-compressed entities. In many AAA games, there can be hundreds of thousands of assets to load & keep track of. If you have to load an entire scene from disk, you could write a simple SQL query to select all assets assigned to the scene (i.e. a SceneAssets mapping table) and then stream them all into memory from the unified database file handle.


Do you have any tricks for designing the CREATE TABLE and INSERT statements so the SQLite file has the best layout (proximity of data)?


The best approach I can think of is to have one very small table that is just scene_id + asset_id, and then one very humongous table that is asset_id + blob.

You could further optimize by profiling access patterns during QA testing. There wouldn't be one globally ideal ordering of assets if you have multiple scenes using varying subsets, but you could certainly group the most commonly used assets together via some implicit insert ordering during creation. This would help minimize the total number of filesystem block accesses you require.

I think one other important IO trick is to make sure you VACUUM the SQLite database before you publish it for use. Presumably these databases are read-only once authored in this context. VACUUM will clear out empty pages and defragment the overall file.
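
A minimal sketch of that layout in C (the table and column names are made up for illustration, and the LZ4 decompression step is elided):

    #include <sqlite3.h>

    /* One tiny mapping table plus one big blob table, as described above. */
    static const char *schema =
        "CREATE TABLE IF NOT EXISTS scene_assets (scene_id INTEGER, asset_id INTEGER);"
        "CREATE TABLE IF NOT EXISTS assets (asset_id INTEGER PRIMARY KEY, data BLOB);";

    int create_schema(sqlite3 *db) {
        /* run once when authoring the asset database, then VACUUM before shipping */
        return sqlite3_exec(db, schema, NULL, NULL, NULL);
    }

    /* Stream every asset of a scene out of the single database file handle;
     * each blob would be LZ4-decompressed as it arrives. */
    int load_scene(sqlite3 *db, int scene_id) {
        sqlite3_stmt *stmt;
        const char *q =
            "SELECT a.asset_id, a.data FROM scene_assets s "
            "JOIN assets a ON a.asset_id = s.asset_id "
            "WHERE s.scene_id = ? ORDER BY a.asset_id;";
        if (sqlite3_prepare_v2(db, q, -1, &stmt, NULL) != SQLITE_OK) return -1;
        sqlite3_bind_int(stmt, 1, scene_id);
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            const void *blob = sqlite3_column_blob(stmt, 1);
            int csize = sqlite3_column_bytes(stmt, 1);
            /* ...LZ4_decompress_safe(blob, out, csize, out_cap) into the asset cache... */
            (void)blob; (void)csize;
        }
        sqlite3_finalize(stmt);
        return 0;
    }

Ordering by asset_id here is just a stand-in for whatever implicit insert ordering you chose when authoring the database.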


I did a search and found this patch note for a game, which has a graph comparing cold load time, hot load time and size for several algorithms:

https://store.steampowered.com/news/app/991270/view/18064466...


How does it compare to Kraken? It was fast enough that Sony recently built Kraken into the PS5 hardware.


I've been hunting for a good decompressor to use on a low-RAM microcontroller, for instance an ARM Cortex-M0. I've read an article [1] on LZ4 decompression on the Cortex, but I couldn't understand what kind of memory requirements are needed.

I've yet to really understand what kind of footprint LZ4 has, and whether it depends on the dictionary size used to compress. What if I have, say, 4KB that I could use for in-place decompression? Is that related to the compression ratio?

[1] https://community.arm.com/developer/ip-products/processors/b...


https://blog.logrocket.com/rust-compression-libraries/

Although the implementations are in Rust, I assume the provided benchmarks are representative of any optimised implementation...

Many compressor algorithms are compared on several data sets.

The results tables show compressed size, compression and decompression times for a number of normal and pathological cases.

Get a good feel about strengths and weaknesses.

Some algs really go downhill in pathological cases, such as with random data.

Do consider encryption too, though you probably want to do that on the compressed data where possible.

Sometimes external encryption means you will be stuck with something close to pathological...


LZ4 rocks. Used it in the past with great results (much less CPU-intensive than the gzip we were using, while still getting good compression).


Weird that the page doesn't list macOS or other Apple OSes in the list of operating systems with LZ4 support.


It’s even in the recent-ish high-ish level Compression framework: https://developer.apple.com/documentation/compression/compre...


If they're including transfer time it'd be fun to see how raw data performs.


I've personally used LZ4 in production at a scale that really proves how using a compression algorithm like LZ4 is more efficient than uncompressed data: https://doordash.engineering/2019/01/02/speeding-up-redis-wi...
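
The usual pattern for caches like Redis (a sketch, not the code from that article): prefix each value with its original length, since a raw LZ4 block doesn't record it, and store the resulting blob as the cache value. Assumes liblz4 and same-endian producers and consumers:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lz4.h>

    /* Build the blob to store: [4-byte original length][LZ4 block]. */
    char *pack_value(const char *val, uint32_t len, int *out_len) {
        int bound = LZ4_compressBound((int)len);
        char *buf = malloc(4 + (size_t)bound);
        memcpy(buf, &len, 4);
        int csize = LZ4_compress_default(val, buf + 4, (int)len, bound);
        if (csize <= 0) { free(buf); return NULL; }
        *out_len = 4 + csize;
        return buf;
    }

    /* Reverse it after fetching the blob back from the cache. */
    char *unpack_value(const char *blob, int blob_len, uint32_t *out_len) {
        uint32_t orig;
        memcpy(&orig, blob, 4);
        char *out = malloc(orig);
        int n = LZ4_decompress_safe(blob + 4, out, blob_len - 4, (int)orig);
        if (n < 0) { free(out); return NULL; }
        *out_len = orig;
        return out;
    }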


I've been using it reliably for the past several years for government documents needed for academic research: ~20 million documents that range from very small to tens of MB. Each one is compressed using LZ4 and, overall, the result is ~10% of the uncompressed size. Compressed, it's about 2TB of data. It's unbelievable how fast they decompress.


LZ4 is awesome, thank you for it. Another interesting and fast compressor I saw a while back was Density.


How does it work?


I found this blog post describing some of the mechanics, but I am still interested in reading how this may be implemented branchlessly.

http://ticki.github.io/blog/how-lz4-works/
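
The block format itself is tiny. Here's a minimal, unchecked sketch of the decoder (no bounds or offset checks, and nothing like the branch-reduced, wide-copy loops the reference implementation uses):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Decode one LZ4 block: a stream of [token][literals][offset][match] sequences.
     * The token's high nibble is the literal length, its low nibble is the match
     * length minus 4; a nibble of 15 means "keep adding the following bytes until
     * one of them is < 255". */
    size_t lz4_block_decode(const uint8_t *src, size_t src_len, uint8_t *dst)
    {
        const uint8_t *sp = src, *send = src + src_len;
        uint8_t *dp = dst;

        while (sp < send) {
            uint8_t token = *sp++;

            /* copy literals straight to the output */
            size_t lit = token >> 4;
            if (lit == 15) { uint8_t b; do { b = *sp++; lit += b; } while (b == 255); }
            memcpy(dp, sp, lit);
            dp += lit; sp += lit;
            if (sp >= send) break;           /* last sequence carries only literals */

            /* copy a match from earlier in the output */
            size_t offset = (size_t)sp[0] | ((size_t)sp[1] << 8);
            sp += 2;
            size_t mlen = (token & 0x0F) + 4;
            if ((token & 0x0F) == 15) { uint8_t b; do { b = *sp++; mlen += b; } while (b == 255); }
            const uint8_t *match = dp - offset;
            while (mlen--) *dp++ = *match++; /* byte-wise so overlapping copies work */
        }
        return (size_t)(dp - dst);
    }

Most of the real library's speed comes from copying literals and matches in wide, slightly over-length chunks instead of byte by byte, which removes most of the branching from this inner loop.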


Most interesting question on this thread and gets downvoted. :)


Ring buffer and huffman codes?


I think it’s middle-out compression


It compresses stuff. Amazing.


Interana uses LZ4 for compressing certain data types, namely list-type data. We found it offers the best decompression speed for a reasonably good compression ratio. We actually use the high-compression variant.


Not sure how LZ4 compares, but what convinced me to use brotli over some other compression libraries was how trivially easy it was to integrate in C++ as well as in JavaScript. No effort at all. I wish more lib providers would make their things so approachable.


OK but how does it compare to Kraken?


The Wikipedia article about LZ4 says that compression is slightly worse, compression speed is comparable, and decompression can be significantly slower compared to LZO. Can anyone enlighten me as to why I should not use LZO? I do not know anything about either algorithm, but if I switch from deflate I want to make the right decision.


You misread that. It says that decompression is significantly faster than LZO. In fact, decompression speed of up to 5 GB/s is one of the reasons LZ4 is so popular.


As far as I know LZ4 is much faster than most compression algorithms, with decompression speeds of over 4 GB/s.


LZO is also GPL which can be a conflict, compared with the BSD license of LZ4.



