FLAC makes lots of smart decisions. One requirement for lossless audio should, of course, be that it's actually lossless (which the test suite confirms), but FLAC also makes it easy for the user to confirm the audio stream is the same as the decoded WAV file (`flac -t`) by storing a checksum of the stream.
I have often wondered why other media formats don't do a similar thing, especially since changing a media file's tags (which can change the checksum of the file) or name (which makes external verification from a txt file difficult) is quite common. I even wrote a utility[0] that uses ffmpeg to hash all ffmpeg-compatible bitstreams and store their hashes in an xattr (yes, with lots of other options to test and compare, etc.), but all media formats should be clever enough (and care enough) to do this natively, like FLAC.
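For anyone who wants the same property without waiting for formats to grow it natively, here's a minimal sketch of the idea (not how the utility above works; it just assumes ffmpeg's `hash` muxer and a Linux filesystem with user xattrs):

```python
# Sketch: hash the *decoded* audio stream (so tags and filename don't
# matter) and stash the result in an extended attribute.
# Assumes ffmpeg on PATH and a Linux filesystem with user xattrs.
import os
import subprocess
import sys

def stream_hash(path: str) -> str:
    # ffmpeg's "hash" muxer hashes decoded frames, not file bytes.
    out = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", path, "-map", "0:a",
         "-f", "hash", "-hash", "md5", "-"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()                  # e.g. "MD5=d41d8cd98f00b204..."
    return out.split("=", 1)[1]

def store(path: str) -> None:
    os.setxattr(path, "user.streamhash.md5", stream_hash(path).encode())

def verify(path: str) -> bool:
    return os.getxattr(path, "user.streamhash.md5").decode() == stream_hash(path)

if __name__ == "__main__":
    store(sys.argv[1])
    print("ok" if verify(sys.argv[1]) else "MISMATCH")
```

Because the hash covers the decoded frames rather than the file bytes, retagging or renaming the file leaves it unchanged.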
Not often do I find a comment that brings up such a similar experience. I had intermittent data corruption because of a memory bit error in the very upper range, which was almost always unused and went unnoticed for a long time. The ZFS backend PC was actually OK; it was the main PC that used the share, but the point remains. After that, no more non-ECC memory ever on any computer for me (OK, except some laptops).
I'm only going to add a related anecdote that wasn't a failing of ECC vs non-ECC but rather of BIOS behavior.
Background: Lenovo Thinkpad T520 laptop, random crashes and data corruption.
Diagnosis: Eventually I let Memtest86+ run a bunch of times for like a week and it wasn't showing any errors. Finally, about to give up, I pressed some key on the keyboard and it blew errors immediately all over the screen. This suggested the EC (embedded controller) or maybe some BIOS-controlled keyboard driver was writing to memory it shouldn't have been.
Fix: I am a Linux user; the kernel has an option to reserve low memory for poorly behaved BIOSes that like to write where they shouldn't. CONFIG_X86_RESERVE_LOW should be set to at least 64 KB and increased up to 640 KB if the issues continue to happen. There are some other options to scan for this misbehavior, but I honestly don't know how Linux currently handles it: https://lkml.org/lkml/2013/11/11/683
Yep, that's a smart feature. It has actually uncovered bugs in FFmpeg's FLAC encoder multiple times. [1][2] (Why they didn't just use the reference implementation is beyond me...)
But I have one nitpick with FLAC in this regard: they chose MD5 as the checksum instead of something sensible like CRC32/CRC64... It makes no sense, because we're not doing cryptography here - we're doing an integrity check. Moreover it makes a parallelized FLAC encoder somewhat problematic to implement, as there's always going to be a serial bottleneck at the end for computing the MD5 hash. CRC on the other hand could easily be parallelized. But I'm afraid we'll have to live with this shortcoming forever now ¯\_(ツ)_/¯
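To make the parallelization point concrete: CRC32 is linear over GF(2), so per-block CRCs computed on separate cores can be merged into the CRC of the whole stream, which is impossible with MD5's chained state. A sketch, porting zlib's crc32_combine() to Python:

```python
# Why CRC parallelizes and MD5 doesn't: CRC32 is linear over GF(2), so
# the CRC of a concatenation is computable from per-block CRCs alone.
# Straight port of zlib's crc32_combine().
import zlib

def _gf2_times(mat, vec):
    total, i = 0, 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total

def _gf2_square(mat):
    return [_gf2_times(mat, mat[n]) for n in range(32)]

def crc32_combine(crc1, crc2, len2):
    # Operator for one zero bit (reflected CRC-32 polynomial), then square
    # it up to whole zero bytes inside the loop.
    odd = [0xEDB88320] + [1 << n for n in range(31)]
    even = _gf2_square(odd)           # two zero bits
    odd = _gf2_square(even)           # four zero bits
    while len2:
        even = _gf2_square(odd)       # 8, 32, 128, ... zero bits
        if len2 & 1:
            crc1 = _gf2_times(even, crc1)
        len2 >>= 1
        if not len2:
            break
        odd = _gf2_square(even)       # 16, 64, 256, ... zero bits
        if len2 & 1:
            crc1 = _gf2_times(odd, crc1)
        len2 >>= 1
    return crc1 ^ crc2

# Each block could be hashed on its own core, then merged at the end:
a, b = b"first block of samples", b"second block of samples"
assert crc32_combine(zlib.crc32(a), zlib.crc32(b), len(b)) == zlib.crc32(a + b)
```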
Agreed. FYI, ffmpeg's hash testing format [0], and therefore my utility dano, allows the use of CRC32, etc., though.
> But I'm afraid we'll have to live with this shortcoming forever now ¯\_(ツ)_/¯
I don't believe there is a fundamental reason why a later version of FLAC couldn't/wouldn't allow different algos, as I think the raw decoded bitstream is checksummed and the result is just stored as a FLAC tag[1]. So they could just version that tag.
https://interviewfor.red has lots of info about that exact thing, because in some uh… very peculiar communities, sharing a lossy file passed off as a lossless file is a serious offense.
You're basically looking for two things: first, an abrupt stop to frequencies above a certain point -- compression applies a low-pass filter to eliminate them.
Second, a bunch of rectangular-ish boxes where signal is missing. The spectrogram will look like swiss cheese in a way -- this is where the compression algorithm decides audio information can be deleted because it's "masked" by e.g. louder surrounding sounds.
(Generally speaking, the more aggressive the compression, the lower the frequency ceiling, and the more/bigger swiss cheese holes.)
On the other hand, if you run e.g. AAC compression at a very high bitrate and explicitly force no lowpass filter at all (which virtually nobody does in practice, but you can do from a command-line tool)... you won't be able to see any qualitative difference. Because you won't be able to hear any qualitative difference either. :)
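If you want to automate the cutoff check rather than eyeballing a spectrogram, here's a rough sketch (assumes a PCM WAV input and numpy/scipy; the 60 dB threshold is a guess to calibrate against known-lossless files):

```python
# Rough cutoff estimate for a PCM WAV file: find the highest frequency
# whose average energy is within 60 dB of the spectral peak.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

rate, data = wavfile.read("suspect.wav")
if data.ndim > 1:
    data = data.mean(axis=1)          # mix down to mono

freqs, psd = welch(data, fs=rate, nperseg=4096)
psd_db = 10 * np.log10(psd + 1e-12)

above = psd_db > psd_db.max() - 60
cutoff = freqs[np.nonzero(above)[0][-1]]
# A ceiling well below Nyquist (e.g. ~16 kHz on a 44.1 kHz file) suggests
# a lossy ancestor; energy out to ~22 kHz suggests a true lossless source.
print(f"energy extends to ~{cutoff/1000:.1f} kHz (Nyquist {rate/2000:.1f} kHz)")
```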
You can view it with a spectrogram: audio which has been lossily encoded will have a telltale cutoff in the high frequencies. The lossier the encoding, the lower the frequency ceiling will be.
Another thing to look for is deleted or very flat energy between harmonics. Reducing bit rate in inter-harmonic regions is most of what's meant by 'psychoacoustics.'
I imagine low pass filtering depends a bit on which compression you're using.
None of these tricks will work for detecting GAN compressed audio, though.
Audiochecker (https://archive.org/details/Audiochecker.v2.0.Beta.Build.457) is amazingly good at detecting WAV/FLAC that is really just a re-encoded from MPEG. It's specific to MPEG though, so it wouldn't be able to detect e.g. Vorbis converted to FLAC.
I couldn't find FLAD, but fakeflac doesn't seem to work; even given actual MP3s, it claims they are fine :(.
I'm guessing sometimes frequency cutoff is due to using less than ideal microphones, although not the extreme cutoffs (I'm fairly sure it would be gradual in that case). It also seems there is often a fair amount of noise around 20kHz that I'm guessing is from recording equipment, sometimes on tracks that otherwise look suspiciously like they are from lossy audio. Additionally, in some cases there could potentially be high frequency cutoffs in processing while otherwise being lossless, particularly stuff like electronic music where there can be a lot of processing in general. So I'm not sure if it is necessarily all that easy to tell in general, although there are certainly some from Bandcamp that seem highly likely to be conversions from lossy formats :(.
Checksumming in media container formats is tricky because different media formats have different features. So each codec and combination of track/codec needs its own special case on how and when a checksum is performed.
In the case of FLAC the situation is straightforward: you've got uncompressed audio and a codec that's performing lossless compression. You want to know that what was encoded losslessly matches the input. It's no different from a checksum in a compressed archive.
It's less straightforward with a lossy compressed codec. Are we just checksumming the compressed bitstream? Where in the handling of the bitstream are we doing the checksum? In MOV/MP4 a media track can have arbitrary start and end times. It makes doing in/out cuts super fast. But when it comes to checksumming the track, do we checksum all the samples in the track or only the samples that will play? If a file is flattened and a track has all the samples outside the playable region discarded, do we now have to recalculate the checksum?
Media tracks can also have header tags that indicate stuff like display color space, displayed size, or even track-level metadata. Are we checksumming the track bitstream and important track headers? A video needing to render in a particular color space will be materially affected if its bitstream is preserved but the color space tags are dropped.
If you want to hack in checksums for track bitstreams both MOV/MP4 and Matroska can accept arbitrary tags for tracks so you can just write a checksum to a track header tag if you really want to. You really need a table of rules for doing the checksum otherwise it's a wasted effort.
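For example, a naive version of that hack - subject to all the caveats above about what exactly is being checksummed - might look like this, assuming ffmpeg on PATH and using a made-up tag name:

```python
# Naive bitstream-checksum tag for Matroska: hash the decoded first audio
# stream, then remux with -c copy and write the hash as a track tag.
# The tag name DECODED_CRC32 is made up; assumes ffmpeg on PATH.
import subprocess

def hash_audio(path: str) -> str:
    out = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", path, "-map", "0:a:0",
         "-f", "hash", "-hash", "crc32", "-"],
        capture_output=True, text=True, check=True).stdout.strip()
    return out.split("=", 1)[1]       # e.g. "CRC32=1c291ca3" -> "1c291ca3"

def tag_with_checksum(src: str, dst: str) -> None:
    subprocess.run(
        ["ffmpeg", "-v", "error", "-i", src, "-c", "copy",
         "-metadata:s:a:0", f"DECODED_CRC32={hash_audio(src)}", dst],
        check=True)

tag_with_checksum("input.mkv", "tagged.mkv")
```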
PNG - another very well engineered format - also stores a checksum of the image data as well as checksums of every metadata block.
But I guess most people and thus file formats don't care. If your file's corrupted and you don't notice it doesn't matter. And if you do notice you don't need a checksum to tell you. I don't necessarily agree (knowing where the corruption occurred is useful), but I can see the reasoning.
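For the curious, PNG's chunk CRCs mentioned above are easy to check by hand: each chunk is a 4-byte big-endian length, a 4-byte type, the data, then a CRC-32 over type+data. A sketch using only the Python standard library:

```python
# Walk a PNG and verify the CRC-32 that trails every chunk.
import struct
import zlib

def verify_png(path: str) -> bool:
    with open(path, "rb") as f:
        assert f.read(8) == b"\x89PNG\r\n\x1a\n", "not a PNG"
        ok = True
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            length, ctype = struct.unpack(">I4s", header)
            data = f.read(length)
            (crc,) = struct.unpack(">I", f.read(4))
            if zlib.crc32(ctype + data) != crc:
                print(f"bad CRC in {ctype.decode()} chunk")
                ok = False
            if ctype == b"IEND":
                break
        return ok

print("intact" if verify_png("image.png") else "corrupted")
```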
Actually I have a related question about CDs: do any CD players indicate when the (Reed-Solomon, I think) forward error correction actually corrects any errors? This could give an indication of the quality of the media. It could be a total corrected error count, or a corrected error count per minute... something like that.
Second, do any give an indication of when the error correction fails? I think CD players just fill the missing data with the last sample value (certainly this was the case for first-gen CD players), but this FLAC encoding suggests a better procedure: replace the missing samples with those of a predictive model. Either way, it would be nice to know when the playback is not perfect.
> Some ROM drives are capable of reporting C2 error information along with the audio data and some ripping software can use this information to determine whether the retrieved audio data is valid or not. A standardised mechanism for ROM drives to report C2 error information is documented in the Multi-Media Command (MMC) standard
My experience backing up audio CDs with Exact Audio Copy tells me that CD players have limited error correction facilities and using EAC is the only way to know! (Read a retail CD brand new and you get the same checksum as the reference; read the same CD ten years later and mismatch errors pop up on a few tracks.) The EAC FAQ gives some details on CD player capabilities - standalone audio CD players use oversampling and more, but most don't tell you when it kicks in; there's typically only one small LCD/LED line of display.
https://www.exactaudiocopy.de/en/index.php/support/faq/
Well, this gives me an idea for an interesting FPGA project. The early CD players, like the Sony D-5 (I just looked at its service manual), have accessible recovered clock and data signals (they are buried in a chip on newer devices). It would not be too hard to build the EFM decoder, de-interleaver, and error corrector in an FPGA so that I can have these features.
How much would audiophiles pay to have a quality indicator on their CD player? Hmm...
Edit: Sony's CXD2500BQ shows error correction information on the pins- no FPGA needed.
CXD2500BQ is used in Onkyo's DX-7310
This company claims to have a product for this (but no pictures..), follow the links for their "CD Errormonitor":
I've never personally seen that on a dedicated player (appliance or software); I assume because they are more interested in keeping the music playing.
Exact Audio Copy shows it but will spend minutes re-reading the same frames: https://i.imgur.com/ZGozdhz.png (screenshot mine, unfortunately)
Hell, 99% of all computers are still running on non-checksummed filesystems, if not 99.99%.
For the most part, only a tiny minority of people are even aware of bit rot, let alone have anything they both care enough about to protect and have enough ownership of to attempt it.
Lossy codecs try to find an approximate signal that sounds as close as possible to the original but is easier to compress. They drop parts of the signal entirely, or reduce their resolution, based on models of human hearing that identify the parts least likely to be noticed if they are missing. E.g. not all frequencies can be heard equally well, so quality is dropped on the ones that are heard less clearly anyway. And a loud signal at one frequency can make signals at other frequencies, or those following quickly after it, harder to perceive.
Most such audio codecs are based to some degree on variants of Fourier transforms, so this modification is done by dropping, or reducing the resolution of, parts of the transform output.
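A toy illustration of that transform-domain idea (deliberately not a real codec: real ones use an overlapped MDCT and a psychoacoustic model to choose per-band quantizer steps):

```python
# Toy transform coding: DCT a block, quantize coarsely, drop what rounds
# to zero. Real codecs use an overlapped MDCT plus a psychoacoustic model
# to pick per-band step sizes, but the shape of the idea is the same.
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
t = np.arange(1024) / 44100
block = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(1024)

coeffs = dct(block, norm="ortho")
step = 0.05                           # one global step; codecs vary it per band
quantized = np.round(coeffs / step)   # small coefficients collapse to zero

decoded = idct(quantized * step, norm="ortho")
print(f"kept {np.count_nonzero(quantized)}/1024 coefficients,",
      f"max sample error {np.max(np.abs(decoded - block)):.4f}")
```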
When you listen to music there's a lot of "fine detail" that you can't really hear that gets buried by louder sounds.
CD audio is perfectly lossless - it encodes the signal that you put in by measuring a voltage 44100 times per second and recording that exactly. When you play it back you get exactly the same signal back out. The only problem is, this takes up a lot of space, roughly 10MB per minute for stereo audio.
MP3 audio is lossy in that rather than storing the exact values of a waveform, it stores a description of how short segments of the waveform change. The higher the bitrate, the better the description, and the more detailed the reproduction. A low bitrate MP3 is like trying to redraw the original waveform from a vague description with a paint roller, a high bitrate MP3 is like trying to draw it with a mapping pen from a really detailed description.
FLAC audio is lossless because it takes the precise values of the audio, and uses a technique similar to zip files to find similar-looking blocks of data. Think in terms of having a one-second silence recorded as "Zero, then 44099 more of 'em" rather than "zero zero zero zero zero..." and so on 44100 times.
> CD audio is perfectly lossless - it encodes the signal that you put in by measuring a voltage 44100 times per second and recording that exactly. When you play it back you get exactly the same signal back out.
Not at all! CD audio amplitude is quantized to 16 bits, and temporally sampled at 44100 Hz as you say. Certainly very high quality, but there's absolutely a loss (that nobody can really hear).
I know that when I was about 10yo, I could hear a tiny suggestion of something audible when I tested my hearing at around 23-24kHz. I guess that if you had some really loud content around these frequencies and equipment that could reproduce it perfectly, it would not be impossible for it to influence my listening experience all those years ago :)
Although there's also a difference between being able to hear a single tone, and being able to reliably perceive a difference in some more complex bit of noise.
Testing with https://www.audiocheck.net/blindtests_frequency.php, my hearing limit in the white-noise case, for example, is at least 1 to 2 kHz lower than my hearing limit for a single-frequency sine tone.
Oh absolutely, hence the stressed words in italics. It isn't going to make any meaningful difference even for a 10yo with perfect hearing - for listening, a 44100Hz sample rate is more than enough, and 48000Hz provides large enough headroom to make any semi-reasonable "audiophile" sleep calm.
I used to be able to locate the computer section of WH Smiths or John Menzies in the 1980s and go and play on the ZX Spectrums and Commodore 64s (and other oddities - the one in Arbroath had a couple of Memotech MTX512s!) by hearing the 15kHz scan line whistle from the CRT tellies they used for displays.
Lossless to all practical purposes, then. With 16 bits of amplitude quantisation the smallest bit is below the noise floor of all but the quietest possible amplification, and while 44.1kHz isn't a lot it still places the corner frequency of your antialiasing filter comfortably above human hearing.
It's got way more audio bandwidth than most of the analogue masters that everyone raves about.
No, there's nothing magically audible happening with phase shifts near the steep cutoff of the antialiasing filter that isn't happening with the gentle rolloff of tape, either. Not that you could hear anyway, and even though my hearing is better than most 48-year-old industrial music enthusiasts, not that I could hear either.
For cats, even your very best equipment with perfect reproduction sounds like AM radio because their hearing tops out at 80kHz.
Just like any general-purpose compression: everything. FLAC could be used just like zip/zlib/gzip as a general-purpose compressor; it just wouldn't compress as well on most data that isn't audio.
> dropped when "LOSS" occurs?
Lossy compression generally employs some type of perceptual coding where data is reorganized such that the signal data is sorted or localized according to its perceptual importance. This partly involves removing or reducing the density of signal in the higher frequencies, but it also exploits masking, both in time - the inability to perceive signals of similar frequency occurring close together in time - and in frequency - our inability to hear quieter signals that are close in frequency to a louder one.
The key point is masking: if a sound at a given frequency is loud enough, you are less likely to perceive weaker sounds at other frequencies. So there is little point in wasting bits on those frequencies during the time intervals where they are being swamped.
The exact frequency/amplitude relationships where masking effects come into play were studied by the telecoms early on (meaning in the 1950s-1960s era), and are still a key part of most lossy encoding models these days.
From my understanding (I took two relevant undergrad courses, definitely not an expert) lossy compression can involve either losing certain frequencies (like those above/below human hearing), or losing the accuracy of the reproduction (like a certain frequency component of the sound could be the smallest bit louder/quieter or higher/lower than it was originally).
But IMO these are the properties of a good container / stream format. FLAC might define its own, but most codecs could work well with a generic container.
tl;dr: the algorithm splits the audio into blocks (default is 4096 samples per block). Then a polynomial is used to approximate the wave for the block, and residuals are calculated for each sample by subtracting the approximation. The residuals, because their magnitude is smaller than the original signal, require fewer bits to encode. Hence the smaller file without loss of information.
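For the fixed-predictor case, the "polynomial" fit reduces to repeated differencing, which makes for a compact sketch (the real encoder also has an LPC mode and Rice-codes the residual; the cost metric here is just a stand-in):

```python
# FLAC's fixed predictors (orders 0-4) are repeated differencing: the
# order-k residual is the k-th difference of the signal. The encoder
# keeps the cheapest residual; sum(|residual|) is a stand-in for the
# Rice-coded size here.
import numpy as np

def fixed_residual(samples: np.ndarray, order: int) -> np.ndarray:
    res = samples.astype(np.int64)
    for _ in range(order):
        res = np.diff(res)
    return res

samples = (32000 * np.sin(np.arange(4096) / 20)).astype(np.int32)
for order in range(5):
    cost = np.abs(fixed_residual(samples, order)).sum()
    print(f"order {order}: sum|residual| = {cost}")
# Smooth waveforms give residuals orders of magnitude smaller than the
# raw samples, hence far fewer bits after Rice coding.
```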
Curious to me why only such a simple representation (polynomial) is used for the approximation. Couldn't you get a much better approximation (⇒ more compressible residual) in about the same amount of storage space + CPU decoding effort, using a formula based on e.g. wavelet LUTs as your primitive (plus maybe a larger block size)?
Are there other, more advanced lossless encodings that do this? And if so, why didn't they catch on compared to FLAC?
For that matter, is there a lossless format that just embeds a regular lossy encoding of the audio as the approximation, and then computes+stores the residual relative to that? (I'm guessing that this wouldn't work well for some reason, but I'm not sure what that reason would be.)
(ETA: the other later lossless audio formats that I'm personally aware of — ALAC, Monkey's Audio, and WavPack — all seem to use linear prediction. Seemingly they were all designed under the presumption of the constraint that the encode step must be able to be done in hardware / with fixed-sized memory buffers; rather than allowing that you can load the whole PCM audio file into memory and do things like FFT to it. Possibly made sense in the late 90s, when a PC's RAM wasn't much larger than five minutes of uncompressed audio. Doesn't really make sense today. Maybe we're due for a new lossless audio encoding?)
Diminishing returns - the polynomial is already good relative to the unpredictability of the waveform. Adding degrees of freedom to the analyzed/predicted waveform shape at some point needs as many bits to store as a larger residual signal would have.
FLAC can use fixed polynomial predictors or LPC, as described in the article.
I'm curious if you'd gain anything by doing an mdct and then modeling in the frequency domain and then storing the residuals... Lots of the frequency channels will usually have much lower energy, so the residuals wind up being easier to store.
I just ran across MPEG-4 SLS [1] (née "AAZ" for "Advanced Audio Zip", aka HD-AAC), which is a lossless format on top of AAC's MDCT, and which is patented, unfortunately.
That led me to other coders: DTS-HD Master Audio [2] (née DTS++) and OptimFROG DualStream [3].
OptimFROG's DualStream mode is similar to WavPack's hybrid mode, and DTS-HD MA uses DTS Coherent Acoustics (based on ADPCM), so none of these are based on a perceptual lossy codec besides MPEG-4 SLS.
(Sorry to keep replying here, I keep stumbling on interesting things after the edit window closes)
> is there a lossless format that just embeds a regular lossy encoding of the audio as the approximation, and then computes+stores the residual relative to that?
It's not "regular lossy", but WavPack does allow separating lossy from the residual in hybrid mode. I think this is rarely done with DCT-based stuff because there's so much potential imprecision in the decoders.
Interesting; it never occurred to me that your average lossy audio codec isn't just lossy, but non-strict in defining the decoding output, such that you get different PCM output depending on the decoder implementation.
Is this just a thing with old codecs? I'd think that any codec from the last 10 years could assume a minimum level of support for certain ALU ops in even the wimpiest hardware expected to decode it; and then constrain more-powerful implementations to emulate that minimum standard, so as to get exactly the same decoded output. (I.e., define a strict "decoder abstract machine" semantics, and then explain how various real-world software/hardware impls should implement it.)
I don't know much about the practical engineering of modern codecs, I was just assuming that the exact order and combination of operations (particularly rounding) would be underspecified to allow for different architectures, and that would impact at least the low bits of the output. magicalhippo's comment suggests this might not actually be the case for H.264, which does also have a lossless mode...
There's another more important point, though: modern lossy codecs are designed to be perceptually transparent rather than to minimize an absolute signal error. The difference is likely a large and unpredictable signal, so the typical Rice coding will be ineffective for compressing the residual.
Wavelets still might be interesting as a basis, there's at least one project [1] that reports comparable ratios to FLAC, if a bit lower.
As always with compression there is no magic. There are wav files that are smaller than the equivalent flac file (perhaps even the majority of possible WAV files are shorter than their equivalent FLAC). But it so happens that the sounds we actually want to store are very well approximated by polynomials.
Nice. Not dissimilar from what jpeg2000 does (from memory - correct me if I’m wrong!). They use the previous pixel values in order to “guess” the next value and then store the delta, which tends to be a smaller number on average, giving better compression.
A pixel in a 2D image has neighbors to the north, south, east, and west (NSEW) on the grid, not just pixel neighbors per scanline (row of bytes); a sample in a sound file relates to harmonics and transitions across a plural set of signals, but its neighbors are roughly at a point in time. Corrections welcome.
True. The fact of locality in real-world 2D images means you can predict a pixel's value by picking any neighboring pixel, or indeed any nearby pixel, with worsening performance with L2 distance. There's no reason I can think of to prefer a direction, except for convenience in the data structure.
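As a concrete example of neighbor prediction in images - this is essentially what lossless JPEG, JPEG-LS, and PNG filters do (JPEG 2000 itself is wavelet-based) - here's a sketch of JPEG-LS's median edge detector (MED) predictor, which picks between the W, N, and NW neighbors:

```python
# JPEG-LS's median edge detector (MED) predictor: guess each pixel from
# its W, N, and NW neighbours, then store only the residual. First row
# and column are left unpredicted in this sketch.
import numpy as np

def med_predict(img: np.ndarray) -> np.ndarray:
    x = img.astype(np.int32)
    pred = np.zeros_like(x)
    for r in range(1, x.shape[0]):
        for c in range(1, x.shape[1]):
            a, b, d = x[r, c - 1], x[r - 1, c], x[r - 1, c - 1]  # W, N, NW
            if d >= max(a, b):
                pred[r, c] = min(a, b)    # likely a horizontal/vertical edge
            elif d <= min(a, b):
                pred[r, c] = max(a, b)
            else:
                pred[r, c] = a + b - d    # smooth region: planar estimate
    return pred

img = np.tile(np.arange(16, dtype=np.uint8), (16, 1))  # smooth gradient
residual = img.astype(np.int32) - med_predict(img)
print("interior residual max:", np.abs(residual[1:, 1:]).max(),
      "vs pixel range", int(img.max()))   # 0 vs 15 on this gradient
```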
I do not have very sensitive ears, but maybe an audio enthusiast can explain this to me:
In the 80s and 90s some people were going crazy over HiFi; only the absolute high-end products were just good enough. I remember seeing stereo systems for $50,000 and more. CDs were already seen as inferior to records quality-wise, and speakers had to be huge if possible.
Today Wifi speakers are all the rage. The music is downloaded (precompressed) and then sent over Wifi or Bluetooth with (sometimes very) limited bandwidth to a single speaker which has the size of a laptop.
How does the audio quality compare? Is it like day and night? Or do the new multi room systems play in the same league as the old system that were used by enthusiasts? I often have the feeling that overall sound quality does not matter anymore as long as the bass is strong enough, but as I said at the beginning, my ears are not very sensitive.
So to start, you should pretty much disregard anyone who thinks CD quality is worse than vinyl. CD quality is 16 bit, 44k samples per second, and despite some audiophile gear now that is 24 bit, 96k samples per second, ABX testing routinely fails to find a difference between them. As such, in terms of music quality and software quality, anything capable of delivering 16 bit, 44k samples should be considered "perfect" (i.e. FLAC/CDs). There is some evidence that in studio conditions, people can hear a difference between high bitrate lossy compression and CD quality, but realistically, even Vorbis at 128kbps or other formats at 256kbps or higher will provide a very good listening experience.
I feel it's important to explain why >44 kSamp/s sample rates are used. No matter what, the digital audio signal should have no information above 20 kHz, but running ADCs far above Nyquist lessens the importance of the analog antialiasing filter. You don't need to worry about expensive caps and how they age if you sample at 192 kHz but filter to 20 kHz. This drives down the noise floor and increases linearity essentially for free.
Please repeat this when people say "there's no reason to use sample rates above 44 kHz". While it's true for source material, it should be properly caveated.
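A sketch of that "sample fast, filter digitally, then decimate" pipeline (assumes scipy; the 60 kHz tone stands in for junk a cheap analog filter would have let through):

```python
# "Sample fast, filter digitally": capture at 192 kHz behind a lax analog
# anti-aliasing filter, then let a sharp digital low-pass do the real work
# before decimating 4:1 to 48 kHz.
import numpy as np
from scipy.signal import resample_poly

fs_adc = 192_000
t = np.arange(fs_adc) / fs_adc                 # one second of "capture"
signal = np.sin(2 * np.pi * 1_000 * t)         # in-band 1 kHz tone
junk = 0.1 * np.sin(2 * np.pi * 60_000 * t)    # ultrasonic garbage

# resample_poly applies a sharp FIR low-pass at the new Nyquist (24 kHz)
# before keeping every 4th sample; unfiltered, the 60 kHz tone would
# alias down to an audible 12 kHz.
decimated = resample_poly(signal + junk, up=1, down=4)
print(len(decimated), "samples at 48 kHz")
```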
I don’t think that’s giving people enough credit. Sure, if you master for CD to exploit as much dynamic range as possible (as is often done with high end classical music) then CD quality is truly amazing and vinyl can’t even come close.
But, a lot of popular music out there isn’t mastered for that use case (high end ABX testing). On the contrary, there are tons of CDs that are extremely compressed (in the dynamic range sense) so as to sound as loud as possible on the radio [1]. If you compare one of these CDs with an earlier (or even contemporary) vinyl release which has been mastered correctly then of course the vinyl will sound better!
Unfortunately, because we’re dealing with a Wild West of media, new and old, floating around in the marketplace we don’t have the luxury of a perfect ABX comparison, and so people will continue to buy and prefer old formats. It is for that reason that we can’t dismiss them.
"radio" in this case refers more to 'cheap equipment played in a noisy environment'. Back in the 70s ~ 80s that meant car speakers playing over road noise. Now it's cheap ear bud or tinny bluetooth xyz playing over road noise / transit / coffee house ambient noise. It's not how the music gets to you, it's the environment you (most people, that is) hear it in. And on top of that, for most of the audience, it's background noise. So to catch your attention, it gets mastered as 'loud' as possible. Because the next / previous tracks in the playlist are. And so it goes.
Let's add that there is much snake oil in the audiophile obsession. Audiophile formats, equipment, and marketing come and go, but the single most important differentiating factor in audio quality is the equipment and engineering talent at the time of recording.
For example, much of the great early Blue Note jazz was recorded by RVG in the living room of his parents' home in New Jersey. [1]
The first CDs were not that great, being effectively around 14-bit. Better D/A converters and noise shaping during mastering greatly improved the sound quality.
After that, the loudness wars destroyed it again.
> even Vorbis at 128kbps or other formats at 256kbps or higher will provide a very good listening experience.
Until it is resampled in the noisy OS mixer and/or lossy compressed again to be sent over Bluetooth. Very few ABX studies consider the effect of such modern "digital signal chains", especially transcoding. It's much better to start with FLAC.
My point is that there are no studies demonstrating the absence of such issues. Anecdotally, my Bluetooth headphones certainly seem to sound better with a FLAC source.
But saying this without mentioning the loudness wars misses the main force behind vinyl's staying power.
This is like making an argument for transistor guitar amps. They're better on paper, but tubes produce a type of distortion that is pleasing to many listeners.
Most of the arguments in vinyl sounding better center on original masters being pressed directly to vinyl “biscuits” without an intermediate step while CDs go through a compression step before they are digitized. These days, that’s rarely true. If original masters are used for too many high volume pressings, they wear out (physically), so the masters are often digitized before they go to vinyl. This sparked some controversy recently regarding a Michael Jackson repress with an ironic twist that nobody could tell the difference or noticed until the mastering engineer slipped up and let the cat out of the bag.
> If original masters are used for too many high volume pressings, they wear out (physically)
E.g. when Dylan's famous 60's electric trilogy first came out on CD, at least two of those albums had to be remixed not for whatever possible artistical and/or money-making reasons [1] that commonly cause remixes to be done these days, but simply because the original master tapes (including the safety copies) had worn out through continuous re-pressings of the original vinyl albums.
[1] Though of course the switch to CD in itself was, while also undeniably a definitive technological upgrade, in some ways also a nice way of making people buy the albums again even if they already owned them in some other storage format.
> CDs were already seen as inferior to records quality-wise, and speakers had to be huge if possible
Doesn't sound very scientific to me.
> How does the audio quality compare?
Why compare with an old system from the 80s? I got to hear a friend's friend's modern audiophile setup. A dedicated room with Wilson Audio speakers (totally overpriced but amazing), high-end DAC, high-end amp... It was bliss. One of these $50K+ setup but bliss. This is nothing like a laptop, a soundbar or a Sonos.
> Or do the new multi room systems play in the same league as the old system that were used by enthusiasts?
I wouldn't be surprised if they were actually better than those old systems. But they don't play in the same league as actual modern audiophile setups.
From my experience with them, you kind of just summed up the HiFi/audiophile core. I've spent waaay too much time with audio engineers, recording engineers, etc from edit bays to sound stages. Yes, there are now things that I can hear because they pointed it out to me, but until then, I was perfectly content with my head in the sand of not-knowing. However, while I agree there are certain things that can make a difference, the tendency for the absurd always seems to take hold.
> Why compare with an old system from the 80s? I got to hear a friend's friend's modern audiophile setup. A dedicated room with Wilson Audio speakers (totally overpriced but amazing), high-end DAC, high-end amp... It was bliss. One of these $50K+ setup but bliss. This is nothing like a laptop, a soundbar or a Sonos.
To be fair you can have a similarly great experience on significantly more affordable setups around 10k.
DAC and amplification above a certain price range is basically good enough to become indistinguishable from one another. The largest part of your budget should go into speakers. And don't underestimate the effects of proper room treatment and correction. Solutions like Dirac Live have made this much more approachable.
Anything above that will provide very marginal gains. There's a lot of costly prestige above that threshold which doesn't always translate into real audible benefits.
What do you mean? What's the difference between an audiophile setup from the 80s GP was mentioning and one from today? Today you'd have a DAC, for a start. Maybe some DRC (Digital Room Correction software). Then amps have progressed tremendously. And so did speakers (newer materials).
As for the setup I listened to: I'm no pro, I don't remember the details. I know the speakers were Wilson Audio but I don't remember the DAC nor the amp(s) brands: all I know is it was high-end stuff costing money I'm personally not willing to put in an audio setup. But it did sound very good.
The cost for high quality plummeted, especially in audio storage, playback, and amplification. That was 1980-2010. The technical advances were not particularly big in microphones, but very big in analog-digital conversion, digital storage, digital editing, and digital-to-analog conversion. Amplification was rejuvenated by switching power supplies and class D amplifiers starting in 1996. The availability of high quality measurement techniques vastly improved the state of the art in speakers since 2000 or so.
Meanwhile, mass-market audio has gotten better but not consistently. The speakers are usually the limiting factor for quality: you can have at most any two of small speakers, deep bass, or volume for a given power budget. Lots of consumer systems go for small speakers and don't have either deep bass or high volume.
Does sound quality matter? That's an individual choice. It's available to you at a much lower cost than ever before in history.
The main appeal of lossless, as I see it, is freedom - to reencode, transmit, store, transform audio (music) as you please, without any worry of loss. I'm not stuck with FLAC - I can use any fancy, new or old format and get the same music. I can create lossy MP3s, OGGs or anything else at a variety of bitrates appropriate for the case.
Once you do lossless -> lossy, you're "stuck", unless you accept reencoding artifacts and those do add up to eventually be bad.
I'm an audio enthusiast, have quality (but not excessive, triple digit prices) DACs, amps and everything else. I can't ABX 256kbps+ MP3s from lossless, but the above stands.
I have to say, just as a casual observation, that for the same money I can get a lot better sound now in 2022 than I could in 2002.
Most low to midrange headphones (€50-€100) sound very acceptable and even the lower range Bluetooth speakers (€50) sound decidedly not terrible. In 2002 this money would buy headphones with no bass, shrill highs and muddy mids.
Just compare the original earbuds included with the iPod to the wired EarPods that Apple sells now.
Or compare the AirPods Pro ($249) to what the same amount (inflation adjusted) could get you 20 years ago.
Of course there’s still plenty of space on the high end, and still the higher you go the more you need to spend to achieve 1% better sound.
> I remember seeing stereo systems for $50,000 and more.
These absolutely still exist. What's changed, to an extent, is that they're a bit less fashionable; whereas in the 80s any self-respecting rich person needed a hifi that cost as much as a small house, today, this is largely the preserve of (rich) enthusiasts.
The vast majority of people would have had relatively low-cost stereo systems which were, by and large, far worse than their modern equivalents, though.
I admit I’m basically in this category. But I recently saw Jazz at Lincoln Center Orchestra and was totally blown away, mostly by the writing, musicianship, creativity and courage, but also because of the best audio quality I’ve ever experienced, and it has me questioning my assumptions.
Those HiFi people are still there. And the products evolved with time.
My guess: >90% of music consumers don't care about quality as long as it's good enough. What they care about is ease of use. Why should they have a complicated setup and try to get their records in lossless FLAC when they can just go onto Spotify / Apple Music / YouTube and press play?
I've seen lots of people who don't even care that their audio has the intro parts when playing from YouTube. So why should they care about HiFi?
In some ways, audiophiles are the mirror image of this on the other end of the spectrum. Detectable improvements in sound do not matter and ABX testing is never mentioned in polite company. Why should they have a straightforward setup when they can purchase ever more arcane products and services to distance themselves from plebeians?
Personally I use FLAC for archiving purposes, to preserve a copy of my physical CDs. They're available in my home Plex server, but they'll get converted to a lossy format if synced to save some storage space on that device.
> Personally I use FLAC for archiving purposes, to preserve a copy of my physical CDs.
Same, but at home I also play the FLAC files directly (laptop / DAC / amp / floorstanding loudspeakers). They're not that much bigger than 320 kbps MP3 files.
I convert to lossy for my car, which takes mp3 files but not FLAC.
I used to collect mp3s back in the Napster days. My HDD was maybe 40 GB back then (?), maybe not even that. Nowadays FLAC files size aren't a concern. I think a screenshot of my screen takes more room than a song FLAC encoded.
Now I'm curious about how you manage to get such large screenshots. I just tried, and with a total resolution of 5120x1200 (over 3 monitors) I get a ~35x difference between a random song in FLAC and the screenshot (28 MB vs 729 KB).
This test is, of course, limited to the equipment with which you perform it. The most important equipment by far is your own ears. I was 16 when an audience was blindly played the same song (by Michael Jackson) from vinyl, CD, and DVD on a 1.5 million dollar system. At that time I could immediately tell which medium was being played. Half the people in the room were wrong, and it was clearly due to the age of the audience. These days, I would probably no longer be able to tell the difference either.
But if your audio system sucks, you will also have a hard time telling the difference in this test.
I’m no spring chicken either and I don’t have a million dollar setup, but with a pair of $100 Shure SE215, I can just barely make out which are lossy (usually). But it’s not a big enough difference to concern me. I’m still plenty happy with a 320kps mp3.
I can't answer all of your questions, but today we have AirPlay, which streams uncompressed audio to the receiver. Before that, the only options were Bluetooth or proprietary network codecs.
Audio quality has gotten a lot better over time as speaker and player technology improved, so now cheap mainstream stuff could be equal to high end back then.
That is likely true for the transmission of the audio signal.
Regarding the music itself I have heard complaints to the contrary. Loudness war/race and reduced dynamics are keywords I recall here. To blame was the way the audio was mixed before release. I'm not an audiophile though
Not quite: the weakest link is the speakers (and room acoustics), and cheap speakers — which the large majority of “mainstream” speakers are — still suck today as they did in the 90s.
A lot of mainstream speakers are pretty bad, but I tend to think that once you get above a couple of watts, the quality-for-money ratio goes up quite a bit.
Interestingly, a study suggests that room acoustics don't negatively affect the sound of good speakers, and what that means is that for listening, acoustics don't matter as much as they do for recording.
The signal is digital until it gets to the speaker so there is no loss of quality after the initial encoding, which might itself be lossless. Some systems (eg Sonos) can stream lossless audio or even 24 bit 192 khz, which is certainly overkill.
Lossless compression to around 60% of WAV size. Also more widespread support for tags than with WAV. So even if you don’t particularly care about the space savings, converting to FLAC could still be beneficial.
WAV is uncompressed. MP3 is highly compressed, but lossy. FLAC is compressed and lossless. If you wanted to store lots of master copies of audio, you could use WAV or you could reduce storage space by using FLAC instead.
WAV is the original lossless audio file. FLAC is just a compressed version of that file (think of it as .zip).
MP3 is more like an edited version of the original where "extra fluff" is removed from the audio in a way that you can still hear the important bits. And then compressed for further space savings.
Obviously, there is no point in converting MP3 to FLAC, since when the original lossless audio track was MP3'd, it lost some of the audio information; you'd only be changing the compression algorithm, I imagine.
What? Almost all FLAC is created (converted) from a WAV source. If your source is MP3, then yes, FLAC is irrelevant... But FLAC is basically a lossless compression format for WAV.
I'm really confused by what you're talking about lol... especially "up"convert...??? WAV is the ultimate lossless audio on PC. It really doesn't get any better than WAV. There is no "up" from WAV. FLAC is a compression format for WAV, that does not lose any data. The output of FLAC will be identical to the WAV file, even though its compressed. MP3 is a compression format for WAV that loses data, and will not be identical to the original WAV file.
In 1988, Apple developed the Audio Interchange File Format (AIFF), which is uncompressed pulse code modulation (PCM). PCM is what is stored on CDs, so any Mac with a CD-ROM drive attached will recognize the PCM information on Red Book audio CDs as AIFF files.
Inexplicably, 3 years later, Microsoft and IBM developed the Resource Interchange File Format (RIFF) in 1991, of which the WAV format is one implementation. RIFF doesn't store PCM directly. Instead it stores various formats of data in "chunks" tagged with 4-byte identifiers.
Depending on the audio file format specified, one can always distinguish a Windows user from an audio professional (or a Mac user), because since about 1990, the vast majority of professional audio recording (tracking, mixing and mastering) studios have been exclusively Mac shops, including such greats as Skywalker Sound and Abbey Road Studios.
All these formats, IFF, AIFF, and RIFF, use named chunks for organization, and store PCM basically the same way, though there are other payloads possible.
My post was BS. The reason Macs became ubiquitous in audio had nothing to do with file formats and likely had everything to do with pro audio software and hardware developers that initially ignored Windows and PCs, and by the time these became platform independent, Mac was too ingrained to be dislodged. But AIFF does have smooth and sexy contours compared to WAV's clunky aesthetic. There I go again.
PSA: If you want to recreate the original file (WAV, AIFF, etc), including metadata, you should use the --keep-foreign-metadata switch to flac, otherwise it only preserves what’s needed for the audio.
In addition to what others have said (FLAC is lossless but compressed, and about 50% smaller than WAV), FLAC (plus a few other tiny files) is also the de facto format for archiving audio CDs (it can do other quality than CD quality, but it's mostly used to backup CDs).
When you rip one of your CDs, a good ripper will verify that your rip is 100% bit-perfect (by checking that the hash of your rip matches an online database of hashes of CDs ripped by other people). These rippers typically rip to FLAC.
FWIW on Linux I've had good luck with "whipper" in the past (haven't ripped any CD that recently) [1]
I wish apple supported FLAC. There’s an amazing free tool to convert to/from ALAC (https://tmkk.undo.jp/xld/index_e.html), but still: some players only support FLAC and some ALAC and sometimes you can’t transfer songs in either format because of the no overlap between source/target.
ALAC (Apple Lossless …) is not much newer than FLAC — it started in mid-2004 as the audio transport for AirPlay (previously called AirTunes) to the AirPort Express.
I think they prefer it because it's in an MPEG-4 container: it can be DRM-encumbered with their "FairPlay" technology, which they hadn't used for a while but now use again for subscription Apple Music.
Even with Apple's Lossless format, it doesn't matter much because most people are listening through Bluetooth, which does not support lossless transmission.
Zeroing out data in the residual coding step would make FLAC a lossy compression format! (and most likely not a very good one). I'm tempted to implement it for giggles, but OTOH I'm worried it could spark some kind of uprising of audiophiles.
My pandemic project was to flac-ify all my music, including the vinyl records. It still fits on a microSD card, so the phone now has everything.
It's funny how often people still assume "on the computer" means "MP3." I don't know why you'd put up with any loss of quality anymore, even if you personally can't hear the difference.
I don't have a crazy large music collection, and mine is 85 GB of 192 kbps MP3s. With FLAC that'd be 250-ish GB of music. That's a significant chunk of most modern SSDs.
Why would you use 3x the storage space if you can't hear the difference for a non-trivial percentage of your available storage? Literally, by definition, according to your own terms, it serves no purpose.
I'm a musician and audio developer, and it's only really in my own music, which I've listened to over and over again while creating it, that I notice the degradation in a 192 kbps MP3 – and occasionally in the hi-hats of CDs that I listened to hundreds of times in high school.
FLAC's great, and definitely serves a purpose, but I use it mainly for archives, not for casual listening.
Why would I use it? Because, as I said, it all fits on a microSD card (and that's a 256GB card, which is not the largest you can get). Yours would fit on the larger-sized card with plenty of space left over.
No, I did not say "it serves no purpose." Those are your words. And I didn't say I can't hear the difference, either, I said I don't care if I can't.
Sorry, I meant that not being able to discern any difference is literally the definition of useless.
And MP3s get stored on a lot of things that aren't SD cards, so that's a pretty weird metric.
But you're getting dangerously into gold-audio-cable-and-tinfoil-hat territory. Most people (including me, literally an audio expert) can't hear the difference in most cases, so you're arguing for the increased space and significant cost based solely on some notion of purity. That makes sense for archives, where they're being preserved for posterity and potential future processing, but not for casual listening.
Would you prefer that all websites served you only PNGs?
Gold audio cables? Now you're talking tinfoil hat territory :)
It IS "archival." Maybe when I'm dead and gone, so distant relative will be listening to this. Maybe they'll be able to hear the difference. Who knows?
I don't think the SD card is a weird metric at all. It fits in the phone, so even if, worst case, I'm traveling and rent a car (assuming it has Bluetooth), I still have all the music.
I'd gain absolutely nothing by having them as MP3s, and the price of a TB is only going to keep dropping.
Of course, I'd also gain nothing if you switched to FLACs. You perceive different tradeoffs than I do, so that's fine.
I still listen to my dad's classic rock records from the 60s and enjoy the hell out of it. But they sound like shit. ;-) If anyone someday down the line inherits your FLAC collection, they'll be listening to it out of nostalgia, not for the pristine audio quality.
You started with "I don't know why you'd..." and the answer is "because it's a waste of money" and "because you (mostly) can't hear a difference" and "because they work in my car".
As a side note, I have a phone that does take an SD card (in part because I like having my complete music collection there), but most people don't.
> I still listen to my dad's classic rock records from the 60s and enjoy the hell out of it. But they sound like shit. ;-)
You know why they sound like that? Because sound recording and copying technology was also shit. Garbage in, garbage out.
FLAC was made 50+ years after these records, and is basically indistinguishable from the real thing when made from something close to the original WAV masters, and played back on decent but affordable equipment. Until recently, this wasn't possible.
Not only that - with sufficient care, it will literally never degrade even by a single binary bit when it's being copied or stored, no matter how many times it's played or duplicated.
I have no problems listening to a FLAC now, or when I'm 90. I'm sure my descendants, if they care about my taste in music, wouldn't mind listening to the same files (possibly transcoded to another lossy format, or somehow improved by [REDACTED]) well into the 22nd century.
> If anyone someday down the line inherits your FLAC collection, they'll be listening to it out of nostalgia, not for the pristine audio quality
I beg to differ on that. The 30s jazz records that are so wonderful musically still sound like shit nowadays. That's a major deterrent to playing them.
> I have a phone that does take an SD card ... but most people don't.
I can't say about the numbers, but I didn't have much trouble finding a phone that took them in April 2021.
In general terms: over the last 50 years, it's never been a terrible move to waste CPU cycles or disk storage. Especially if it's a permanent choice.
One particular reason you might want to preserve the lossless original: what if you want to re-encode it later? That includes wireless transmission via Bluetooth.
I don't have great hearing, and I don't try to pretend to hear the difference between one well-encoded lossy file and the next. I can't.
However, audio that's been through multiple lossy encoding steps is generally not good.
> Sorry, I meant that not being able to discern any difference is literally the definition of useless.
That you think you might later re-encode is absolutely an argument for lossless formats. I'm not arguing that there's no use for lossless formats. The comment I was originally replying to was questioning why they're not universal. The reason they're not universal is that you need a special reason for them to make any sense.
> This seems rude.
I meant it tautologically: literally the definition of useless is a difference which has no measurable impact.
As the recipient: I've decided to cut @wheels some slack. You should, too. It's difficult to tell whether your online interlocutor is really a jerk, or just guilty of occasional jerk-seeming behavior. Who of us can say they've never done the latter?
The arguments for lossy codecs vs. dithering / bit reduction aren't identical, but it's a pretty good indictment of the SAVE ALL THE BITS argument. On the same domain as the parent article.
Amusingly enough, I did buy 3 classic albums that I didn't already own, in 24/192 format, and I even have an outboard D/A converter so I can at least tell myself it's not wasted.
The article is about Red Book CD audio (44.1/16) vs. "HiRes" lossless audio. It's about the limits of the human ear, and the engineering tradeoffs available when supporting resolutions and depths beyond 44.1/16.
That's not what the FLAC vs. lossy discussion is about, at all.
It actually is. The major "hack" in lossy formats is psychoacoustics. It's literally about removing the things you don't hear. At lower bitrates, it's about removing the things you hear less.
The argument Monty is making for 44.1/16 vs. 192/24 is partly about the hardware and distortion of higher formats, but he spends a lot more time in there talking about what we don't hear.
The real point is: what we hear is the important quantifier for audio formats. Arguments from data purity are almost entirely the same quasi-religious stuff that the hi-fi world has been producing since at least the 70s.
(And again, it's weird: there's nothing similar in images; lossless photos just don't make sense for general consumption - they're only useful for archival and post-processing purposes. But there are immeasurably fewer imagephiles than audiophiles. My guess is that it's because there was a time when being an audiophile was a class signifier.)
Even the original thread author was mostly making an argument from purity rather than from audio quality. Audio quality is the only thing that matters.
FLAC is great for some stuff (again, where you may need to re-encode, where you're creating archives, etc.), but for casual listening, it's entirely pointless - for the same reason Monty spends most of that article on: because you can't hear a difference.
Monty is the creator of Vorbis. He's spent a lot of time thinking about what people do and don't hear. The reasons people don't hear differences between 192 kHz and 44.1 kHz are different from the reasons people don't hear the difference between a 256 kbps MP3 and a FLAC (at 256 kbps, virtually no one can hear a difference), but it's still an appeal to ears being the important instruments for measuring audio quality - not bits, or gold-plated cables, or other esoteric things that seem to beleaguer impassioned audio hobbyists.
I know that Neil Young's "solution" went nowhere. I have all my music now, and I'm certainly not going to convert it all just to save some GBs. But you've certainly eliminated any temptation on my part to evangelize. Happy listening.
Do you mean Ogg Vorbis? Opus is a codec for speech.
I hate pulling rank so fiercely, but I literally wrote the second (i.e. first non-reference) implementation of the Ogg container format (which Opus, Vorbis and sometimes FLAC use). I know these codecs.
Ogg Vorbis and AAC hit similar levels of quality as a 192 kbps MP3 around 160 kbps. (That actually depends a fair amount on the MP3 encoder. The LAME VBR is particularly good.)
But I have an 11 year old receiver and a 12 year old car that can't play them. Hell, even iTunes can't without third-party codec plugins. MP3s are about as universal as it gets. Being able to play my files everywhere is pretty high up on my list of concerns.
As I understood things Xiph intended for it to replace both speex (for low-bandwidth low-latency voice) and vorbis (for medium bitrate lossy audio). Is this understanding wrong?
>Ogg Vorbis and AAC hit similar levels of quality as a 192 kbps MP3 around 160 kbps. (That actually depends a fair amount on the MP3 encoder. The LAME VBR is particularly good.)
While this is true there's a bit of nuance to add. Some people really don't like the sound of vorbis's artifacts on difficult to compress audio. Maybe it's growing up with fried mp3 recordings being common, but mp3's artifacts are less jarring.
Opus is meant to obsolete every other lossy audio compression [0]. You make a good point about hardware support, but technically it's the winner for generic use, not just speech.
The same holds true for cassette tape decks in old cars.
> even iTunes [...]
That's Apple's policy. Personally I wouldn't use the word "even" here.
It's like saying "Not even the butcher sells vegetables!".
Apart from the Apple ecosystem, Opus is as established as it gets. Android has supported it natively since 2013(?) (Android 5.0).
So, yes, if you want TeX-like backward-compatibility, it might be a good choice.
For all others: knock knock - the new millennium has arrived!
P.S. I think I have decent hearing and kind of decent listening equipment, and I can nearly halve the file size using Opus at comparable quality.
MP3 is indeed universal, and iTunes doesn't support FLAC. However, on my Mac there are any number of third party players that do (e.g. I have Elmedia Player at the moment), and on the Android almost every music player app supports FLAC.
I don't know about an iPhone. I assume you can find apps that support it.
I actually went digging for a bit on information about audio codecs being decoded in Qualcomm's hexagon DSP. I'm no expert but it seems like recent generations of the platform (anything post-2013 or so) don't bother with hardware decoding for audio codecs (except for in voice calling with AMR and EVS). It appears to be all software codecs for many generations now. Makes sense as even the efficiency cores on modern mobile devices are way overkill for audio.
Lossy codecs suck at gapless playback. Some have support via vendor-specific extensions, but it’s hit or miss whether a particular player will support it. With FLAC you can always be 100% sure that your album is gapless.
Nope. Opus and Vorbis are better, but they still don't support 100% gapless playback.
E.g. create a full-scale sine wave, split it into two files, then convert them to "gapless" Opus. Now open the files in Audacity and you'll see that there's a small amount of ringing at the boundary, so it is not truly gapless.
If you try the same with AAC using Apple's gapless metadata (i.e. iTunSMPB) you'll find that the boundary is perfectly continuous.
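For anyone who wants to reproduce the experiment, a sketch of that sine-split test (assumes numpy/scipy plus opusenc/opusdec from opus-tools on PATH, and that decoded lengths match the input thanks to pre-skip/end trimming):

```python
# The sine-split gapless test: one second of near-full-scale 1 kHz sine,
# cut in half, encoded separately, decoded, and compared around the joint.
import subprocess
import numpy as np
from scipy.io import wavfile

fs = 48000
t = np.arange(fs) / fs
sine = (0.9 * 32767 * np.sin(2 * np.pi * 1000 * t)).astype(np.int16)

half = len(sine) // 2
wavfile.write("part1.wav", fs, sine[:half])
wavfile.write("part2.wav", fs, sine[half:])

for part in ("part1", "part2"):
    subprocess.run(["opusenc", f"{part}.wav", f"{part}.opus"], check=True)
    subprocess.run(["opusdec", f"{part}.opus", f"{part}_dec.wav"], check=True)

dec = np.concatenate([wavfile.read(f"{p}_dec.wav")[1] for p in ("part1", "part2")])
window = slice(half - 16, half + 16)
# Nonzero error spiking right at the boundary indicates ringing / a gap.
print(np.abs(dec[window].astype(int) - sine[window].astype(int)))
```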
AFAIK iTunSMPB just tells you where playback should start/end, which is also all the Ogg metadata does for you. MP3s can have iTunSMPB too, it's just a matter of if a player supports it. Is there any specific feature of AAC that makes gapless playback possible?
Nope, there’s nothing special to gapless playback. But as it stands, libopus cannot produce truly gapless files right now. Is it a bug in libopus? Most likely.
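For what it's worth, the trimming really is just arithmetic over the tag's fields. Here's a sketch of the iTunSMPB convention as I understand it (the field positions follow the commonly documented layout; the function name is mine):

```python
def gapless_bounds(itunsmpb: str) -> tuple[int, int]:
    # iTunSMPB is a space-separated list of hex fields; by convention
    # field 1 is the encoder delay, field 2 the trailing padding, and
    # field 3 the original length in samples (per channel).
    fields = itunsmpb.split()
    delay = int(fields[1], 16)
    length = int(fields[3], 16)
    # Keep decoded samples [delay, delay + length); everything outside
    # that window is codec priming or padding, not audio.
    return delay, delay + length
```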
It's about $50 for a MacBook Pro. And I have my music on 4 devices. Sure, I could throw $100-200 at music storage that I can't hear any difference from, but why the fuck would I? Again, this is golden-speaker-cables stuff.
Why carry it around on a portable device? I've been streaming FLAC from a NAS for close to 10 years, originally over sshfs and later via Plex. In the 3G era I just used Google Music, which transcoded my FLAC collection to 320 kbps MP3, and I streamed it that way on the go. This was feasible as early as around 2009.
I only have 3 GB of data in my data plan, which is always more than enough for me, unless I'm working over my cell connection. I could get more data, but doing that just to stream music (when I already have a working solution for such) seems silly. I think in the US people often have much more expensive and large data plans than elsewhere. (My entire cell phone bill is €12/month, including 3 GB data, and unlimited calls and texts.)
I also very regularly go places with mediocre cell service: the countryside, airplanes, boats, and festivals.
Well, because you're carrying the portable device around anyway. And it doesn't require a network connection, so it works even when you have no cell coverage.
I don't quite understand how your "streaming it this way on the go" worked?
It's pretty rare that I go anywhere lacking cell service good enough to stream FLAC. I don't even live in a city, but I get it if you're really remote or can't afford cell service.
Depending on context, you can tell the difference, especially on larger sound systems. 192 kbps MP3s played through a 2-kilowatt sound system in a club don't just sound bad; they feel bad as well.
I've got a couple of DJ friends that have taken to ripping their collections (CDs and vinyl) to WAV files as most DJ software didn't support FLAC playback at the time.
That's questionable. Again, I don't believe anyone who claims things about sound unless they're doing a blind test with well-encoded files. You need to make a stronger argument than "they feel bad". Why? What frequencies are not being reproduced accurately and why do you think that makes them sound bad?
A "2 kW" sound system doesn't have any special properties by nature of its wattage, but a club does have a sound system which can reproduce frequencies that are below that what a typical home (or studio) system can. MP3s are usually encoded with a high-pass filter (usually around 20 Hz), and some club systems can get down close to that range (more often 30 Hz), but it's very questionable that they're hitting border.
It is true, though, that on those types of systems you're more likely to expose the effects of things like double lossy encoding, bad encoders, etc.
The further away from the "ideal" reference listening case of high fidelity headphones in a quiet room, the less likely that the assumptions behind the psycho-acoustic model used in the lossy codec are valid. Frequencies can be missing because of shitty speakers, room cancellation, or deliberate post-processing, and expose sounds that the codec thought would be masked. On my surround sound, the difference between a 320kbps mp3 and a FLAC is not only audible but obvious when played in Pro Logic 2 mode, which exploits phase information to place different sounds in different speakers.
So you manage a lossless and a lossy library? That sounds like a lot of work.
My collection is lossless and nearing 2 TB in size. Too large for mobile usage on iOS, where I don't really need that kind of fidelity anyway. However I still don't want to manage a second lossy library and keep them both in sync. Luckily Apple's Music app can be configured to transcode your lossless audio to a lossy format on the fly whenever you sync your collection to your iPhone. That way I can have my whole collection in my pocket without having to manage it.
Because my music collection would otherwise take up more space than the 128 GB of internal storage my phone has? FLACs are nice, but they're also a few MB per track.
For most music, it's already very difficult to notice any degradation at 128 kbps with modern codecs/encoders.
You have to know what to listen for, and even if you do, why should you care? It's a small price to pay for being able to store several times as much music on your phone, or for saving on bandwidth while streaming.
I just checked a few of my latest CD rips. The FLAC rip is 9-10x as large as the 128 kbps Opus files. One EP was 338 MB as FLAC and 33.1 MB as Opus.
On the go, the listening environment is far from perfect, so Opus is fine. I still want the FLAC files on my NAS as an archive or for listening in a better environment in my home.
I got one just 17 months ago (it also has a headphone jack).
I can't predict what manufacturers will do, of course, but with headphones, you have Bluetooth. What's the corresponding substitute for the SD slot?
Which, btw, is another unconventional observation of mine: you no longer need the newest, fanciest phone from Samsung, Apple, or Google. At one time you did. The cheaper ones are more than adequate nowadays and most likely you'll be replacing it every couple years anyway. YMMV.
I can’t understand the very first step in the compression process: “The left and right channels are converted to center and side channels through the following transformation: mid = (left + right) / 2, side = left - right. This is a lossless process, unlike joint stereo.”
Because you can recover left and right from mid and side. You could also multiply both signals by two so that the division always yields an integer value.
Surely multiplying both signals by two is changing them? What if overflow happens? How do you store the information that it needs to be divided back when decoding?
Then why divide by two at all? Also, adding one bit for every value (or pair of values) means a significant increase in data size right from the start.
The point is to decorrelate the channels. The left and right channels are usually mostly the same, so coding them both would basically send the same signal twice. With perfect decorrelation, the side channel would become zero, instantly saving half the bits.
(It's the same idea for images, where the RGB channels mostly all look like grayscale copies of the image, so the YCbCr/YCoCg transform is done to decorrelate them.)
There are actually three methods in FLAC: mid/side, left/side, and right/side. Each frame can use a different method and stores what method it used (if any) in the frame header.
The difference has a range one bit larger than the original, but this doesn't matter much since everything gets entropy-coded anyway. The extra bit only comes into play if the side channel is very large, i.e. the correlation was poor, in which case it would have been better not to use decorrelation for that frame at all.
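To make the "lossless despite the division" point concrete, here's a minimal Python sketch of the round trip (function names are mine; the decode step mirrors how libFLAC reconstructs mid/side). The low bit the division drops is recovered from the side channel, since left + right and left - right always have the same parity:

```python
def ms_encode(left: int, right: int) -> tuple[int, int]:
    # Floor division drops the low bit of (left + right), but
    # left + right and left - right always have the same parity,
    # so the side channel still carries that bit.
    return (left + right) >> 1, left - right

def ms_decode(mid: int, side: int) -> tuple[int, int]:
    total = (mid << 1) | (side & 1)   # recover left + right exactly
    return (total + side) >> 1, (total - side) >> 1

# Exhaustive round-trip check over a small signed range:
for l in range(-64, 64):
    for r in range(-64, 64):
        assert ms_decode(*ms_encode(l, r)) == (l, r)
```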
Still flaky with double ID3 tags (front/back) or with issues parsing metadata (so decoding fails), but the issues with variable block sizes were solved. Most FLAC files I handle on a large system play fine.
There are other issues related to streaming FLAC via Range requests, depending on whether it's WebAudio, <audio>, or directly in a tab, but this applies to all audio/media in general.
A lot of tagging programs add them, since tags just get appended or prepended to the file regardless of its content. Sometimes one program reads one format but writes the other. I have personally found a FLAC in the wild with ID3v2 at the front, ID3v1 at the back (corrupt), and FLAC metadata as well. The FLAC metadata inside was wrong; the valid metadata was at the front (sadly in a broken JIS encoding).
There is no sane approach to media. Between legacy formats, legacy tagging, and all kinds of implementation specific bugs, you are in for a ride if you want to cater to more than one decoder out there :)
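As a small illustration of the gymnastics decoders end up doing, here's a hedged sketch (the function name is mine) that skips a prepended ID3v2 tag to find the actual FLAC stream; the ID3v2 header is "ID3", two version bytes, one flag byte, and a four-byte synchsafe size:

```python
def flac_stream_offset(path: str) -> int:
    """Byte offset of the "fLaC" marker, skipping a prepended ID3v2 tag."""
    with open(path, "rb") as f:
        head = f.read(10)
        offset = 0
        if head[:3] == b"ID3":
            # Synchsafe size: 7 bits per byte, high bit always zero; it
            # counts the tag body after this 10-byte header (ignoring the
            # optional footer flag for brevity).
            size = 0
            for b in head[6:10]:
                size = (size << 7) | (b & 0x7F)
            offset = 10 + size
            f.seek(offset)
            head = f.read(4)
        if head[:4] != b"fLaC":
            raise ValueError("no fLaC marker where expected")
    return offset
```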
I mean -- why not?
[0]: https://github.com/kimono-koans/dano