tl;dr: the algorithm splits the audio into blocks (default is 4096 samples per block). Then a polynomial is used to approximate the wave for the block, and residuals are calculated for each sample by subtracting the approximation. The residuals, because their magnitude is smaller than the original signal, require fewer bits to encode. Hence the smaller file without loss of information.
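For concreteness, here's roughly what an order-2 fixed (polynomial) predictor and its residual look like; this is a sketch, not FLAC's actual implementation, and the function name is mine:

```python
# Sketch of an order-2 "fixed" polynomial predictor: extrapolate each
# sample from the two before it, and keep only the prediction error.
def fixed_order2_residual(samples):
    residuals = list(samples[:2])  # warm-up samples stored verbatim
    for i in range(2, len(samples)):
        prediction = 2 * samples[i - 1] - samples[i - 2]
        residuals.append(samples[i] - prediction)
    return residuals

# A smooth waveform yields small residuals:
# fixed_order2_residual([0, 3, 7, 12, 18]) -> [0, 3, 1, 1, 1]
```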
Curious to me why only such a simple representation (polynomial) is used for the approximation. Couldn't you get a much better approximation (⇒ more compressible residual) in about the same amount of storage space + CPU decoding effort, using a formula based on e.g. wavelet LUTs as your primitive (plus maybe a larger block size)?
Are there other, more advanced lossless encodings that do this? And if so, why didn't they catch on compared to FLAC?
For that matter, is there a lossless format that just embeds a regular lossy encoding of the audio as the approximation, and then computes+stores the residual relative to that? (I'm guessing that this wouldn't work well for some reason, but I'm not sure what that reason would be.)
(ETA: the other, later lossless audio formats I'm personally aware of (ALAC, Monkey's Audio, and WavPack) all seem to use linear prediction as well. They appear to have been designed under the constraint that encoding must be doable in hardware, with fixed-size memory buffers, rather than under the assumption that you can load the whole PCM audio file into memory and do things like an FFT over it. That may have made sense in the late 90s, when a PC's RAM wasn't much larger than five minutes of uncompressed audio, but it doesn't really hold today. Maybe we're due for a new lossless audio encoding?)
Diminishing returns: the polynomial is already a good fit relative to the unpredictability of the waveform. Past some point, every extra degree of freedom in the modeled waveform costs as many bits to store as it saves in the residual; e.g., a predictor needing an extra 1024 bits of coefficients per 4096-sample block has to shave at least 0.25 bits off every residual sample just to break even.
FLAC can use fixed polynomial predictors or LPC, as described in the article.
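For the LPC case, the encoder fits per-block coefficients and predicts each sample as a weighted sum of the previous ones. A minimal sketch, assuming numpy (a real encoder runs Levinson-Durbin on the block's autocorrelation and quantizes the coefficients; plain least squares is enough to show the idea, and the function name is mine):

```python
import numpy as np

# Rough sketch of LPC prediction as used by FLAC-style codecs.
def lpc_residual(samples, order=4):
    x = np.asarray(samples, dtype=float)
    # Row i holds the `order` samples preceding x[order + i],
    # most recent first.
    rows = np.array([x[i - order:i][::-1] for i in range(order, len(x))])
    coeffs, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
    return x[order:] - rows @ coeffs  # warm-up samples stored separately
```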
I'm curious whether you'd gain anything by doing an MDCT, modeling in the frequency domain, and storing the residuals there... Many of the frequency channels usually have much lower energy, so the residuals wind up being easier to store.
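As a rough illustration of that intuition, with a plain DCT standing in for the MDCT (the signal here is made up):

```python
import numpy as np
from scipy.fft import dct

# Energy compaction in the frequency domain: a tonal 4096-sample block
# concentrates most of its energy in a handful of bins, so most
# coefficients are near zero.
t = np.arange(4096)
block = (np.sin(2 * np.pi * 440 * t / 44100)
         + 0.3 * np.sin(2 * np.pi * 880 * t / 44100))
coeffs = dct(block, norm='ortho')
sorted_energy = np.cumsum(np.sort(coeffs ** 2)[::-1])
print(sorted_energy[9] / sorted_energy[-1])  # fraction of energy in the top 10 bins
```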
I just ran across MPEG-4 SLS [1] (née "AAZ" for "Advanced Audio Zip", aka HD-AAC), which is a lossless format on top of AAC's MDCT, and which is patented, unfortunately.
That led me to other coders: DTS-HD Master Audio [2] (née DTS++) and OptimFROG DualStream [3].
OptimFROG's DualStream mode is similar to WavPack's hybrid mode, and DTS-HD MA uses DTS Coherent Acoustics (based on ADPCM), so none of these are based on a perceptual lossy codec besides MPEG-4 SLS.
(Sorry to keep replying here, I keep stumbling on interesting things after the edit window closes)
> is there a lossless format that just embeds a regular lossy encoding of the audio as the approximation, and then computes+stores the residual relative to that?
It's not "regular lossy", but WavPack does allow separating lossy from the residual in hybrid mode. I think this is rarely done with DCT-based stuff because there's so much potential imprecision in the decoders.
Interesting; it never occurred to me that your average lossy audio codec isn't just lossy, but non-strict in defining the decoding output, such that you get different PCM output depending on the decoder implementation.
Is this just a thing with old codecs? I'd think that any codec from the last 10 years could assume a minimum level of support for certain ALU ops in even the wimpiest hardware expected to decode it; and then constrain more-powerful implementations to emulate that minimum standard, so as to get exactly the same decoded output. (I.e., define a strict "decoder abstract machine" semantics, and then explain how various real-world software/hardware impls should implement it.)
I don't know much about the practical engineering of modern codecs, I was just assuming that the exact order and combination of operations (particularly rounding) would be underspecified to allow for different architectures, and that would impact at least the low bits of the output. magicalhippo's comment suggests this might not actually be the case for H.264, which does also have a lossless mode...
There's another, more important point, though: modern lossy codecs are designed to be perceptually transparent rather than to minimize absolute signal error. The difference is likely a large and unpredictable signal, so the typical Rice coding will be ineffective at compressing the residual.
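To see why Rice coding falls over there, here it is in miniature (a sketch; real codecs also zigzag-map signed residuals to non-negative integers first):

```python
# Rice coding: quotient in unary, remainder in k binary bits. Cheap
# for small values, but the unary part grows linearly with the value,
# so large residuals are disastrous.
def rice_encode(value, k):
    quotient, remainder = value >> k, value & ((1 << k) - 1)
    return '1' * quotient + '0' + format(remainder, f'0{k}b')

# rice_encode(5, 2)    -> '1001'                 (4 bits)
# rice_encode(1000, 2) -> 250 ones + '0' + '00'  (253 bits)
```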
Wavelets still might be interesting as a basis, there's at least one project [1] that reports comparable ratios to FLAC, if a bit lower.
As always with compression, there is no magic. There are WAV files that are smaller than the equivalent FLAC file; perhaps even the majority of possible WAV files are, since by the pigeonhole principle no lossless codec can shrink every input. But it so happens that the sounds we actually want to store are very well approximated by polynomials.
Nice. Not dissimilar from what JPEG 2000 does (from memory; correct me if I'm wrong!). They use the previous pixel values to "guess" the next value and then store the delta, which tends to be a smaller number on average, giving better compression.
Proximity in a 2D image means neighbors to the north, south, east, and west (NSEW) on the grid, not just neighbors within a scanline (a row of bytes). Proximity in a sound file works differently: the signal carries harmonics and transitions across multiple components, but a sample's neighbors are simply adjacent points in time. Corrections welcome.
True. Locality in real-world 2D images means you can predict a pixel's value from any neighboring pixel, or indeed any nearby pixel, with performance degrading as the L2 distance grows. There's no reason I can think of to prefer one direction over another, except for convenience in the data structure.
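A toy version of that kind of neighbor prediction, assuming numpy (the choice of averaging north and west neighbors is arbitrary, per the point above; the function name is mine):

```python
import numpy as np

# Toy 2D delta prediction: predict each pixel from its north and west
# neighbors and keep the residual, which is small for smooth images.
def predict_residual(image):
    img = np.asarray(image, dtype=int)
    pred = np.zeros_like(img)
    pred[1:, :] += img[:-1, :]   # add the north neighbor
    pred[:, 1:] += img[:, :-1]   # add the west neighbor
    pred[1:, 1:] //= 2           # interior pixels: average of the two
    return img - pred
```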