FLAC 1.4.0 released – added support for 32-bit audio (xiph.org)
171 points by thrdbndndn on Sept 14, 2022 | 114 comments



Nice! I (ab)use FLAC to store waveform data that isn't audio, but compresses better using audio codecs than with general purpose compression algorithms. This will let me remove some hacks to compress 32-bit data as multiple channels.
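
For illustration, a minimal sketch of that kind of (ab)use, assuming the flac 1.4 command-line encoder is on the PATH (the "sensor" trace here is synthetic, and the file names are made up):

    import subprocess
    import numpy as np

    # Synthetic stand-in for a slowly drifting 32-bit sensor trace (not audio).
    samples = np.cumsum(np.random.randint(-1000, 1000, size=500_000)).astype('<i4')
    samples.tofile('sensor.raw')  # raw little-endian signed 32-bit samples

    # FLAC 1.4+ accepts --bps=32; the "sample rate" is just a label here.
    subprocess.run([
        'flac', '--force-raw-format', '--endian=little', '--sign=signed',
        '--channels=1', '--bps=32', '--sample-rate=48000',
        '-o', 'sensor.flac', 'sensor.raw',
    ], check=True)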


What data are you compressing?


A variety of sensor data. Nothing coming straight off an ADC needs 32 bits, but I do have a need for it in various cases.

* The sensor system wants to do unit conversion for me. This data doesn't really have 32 bits of dynamic range, but for archival purposes I'd like to store the exact bits without adding another quantization step - that can happen later.

* The sensor system combines multiple ADC channels HDR style. I'd rather get the raw readings and do the combining myself, but don't always have access to it.

* Some more advanced preprocessed results, akin to a GPS solution. Linear-predictor-based algorithms are perfect here, since the full possible set of values does have high dynamic range but the upper bits change very slowly.


I'm kinda curious. What kind of sensors? (If you're able to divulge it, that is; given how you've phrased it, I wouldn't be surprised if you're under an NDA or similar.)


I've used it to record oscilloscope captures. Especially useful since audio cards are cheap high-res ADCs.


Now that 24b/192kHz cards are widely available, I wonder if you could hook a loopstick antenna up to one and get WWVB (60kHz carrier)? In theory as long as the signal is above -144dB you should be able to tease it out...


SNR even on the best converters is unlikely to be better than 120dB, but still - seems likely.
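
For reference, the back-of-envelope arithmetic behind both of those figures:

    import math

    # Nyquist: a 192 kHz card captures content up to 96 kHz,
    # so a 60 kHz carrier is fair game in principle.
    print(192_000 / 2)                  # 96000.0

    # Theoretical range of 24 bits vs. a realistic ~120 dB converter:
    print(20 * math.log10(2 ** 24))     # ~144.5 dB
    print(120 / (20 * math.log10(2)))   # ~19.9 "effective" bits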


I'm amazed that consumer-grade audio cards are still available and cheap, to be honest. I can't recall having seen one in years, then again, I haven't had a reason to look for them in years either.


To be clear: actual "PCI-slot"-ish audio cards are now rare. But USB / Thunderbolt (and previously Firewire) external interfaces are a dime a dozen. Serious audio mostly uses them now.


Creative is still marketing them to gamers, amazingly.

Can't say I haven't been tempted.


Pretty common for any hobbyist who wants to record things too.


it's a weird market segment. the inside of a computer is not a great place to put a DAC, due to the EMI from all the other components. an audio enthusiast is going to strongly prefer running optical or USB to an external DAC/amp. a non-enthusiast is likely to be happy with whatever comes with their motherboard. it's hard to imagine a scenario that actually makes sense for a high-end audio card, but I'm sure some people buy them anyway.


Anything with a mic input will work. Audio interfaces over USB are plentiful these days though the discrete PC sound card is pretty rare :(


Microchips are cheap once the initial investment is made. I guess it only costs ~10¢ to make a new SoundBlaster at 90nm-130nm.

The price of 10 sound cards will be enough to pay for a whole wafer.


Finally! I can ditch the 32-bit special casing in my project cleanup and backup automation! As much as I love WavPack[1] (it's great and has had 32-bit integer/fp support since forever), FLAC is much faster and probably more future-proof. Thanks a lot everyone at Xiph, your work really makes a difference.

Edit: oops, I can't. No fp support yet.

1. https://www.wavpack.com/


It is sad that a technically superior and patent-free technology often loses out in the popularity contest. Sigh.


What do you mean? Both FLAC and WavPack claim to be patent-free.


I am not aware of FLAC claiming to be patent-free (when it started); it may be royalty-free and patent-free now, given it has been 20+ years since 1.0. They do, however, claim they don't hold any patents on it.

WavPack was specifically designed to be patent-free, using only old and expired patented techniques.


I personally couldn't care less about 32-bit, but unfortunately a few digital music stores I buy from use this format (delivered in WAV). And I don't want to downscale them, simply for archival reasons.

Now I can at least re-compress them losslessly in FLAC. (Previously I had to use WavPack, which isn't bad, but I'd prefer to have a single format in my media library.)


the noise floor at 16 bit is already low. 24 bit, much more common (and imo needless for consumers), is 256x lower than that.

there’s no part of the analogue capture process that can utilise such a low noise floor (as far as i’m aware): is there anything of value worth preserving there?


>is there anything of value worth preserving there?

Not really in terms of the audio itself; it's just easier to do a bitwise comparison with the original file, in terms of file integrity.


Still, higher bit depths can give you valuable information about the noise spectrum of your recording chain. This can potentially be useful in post-processing. It also gets rid of the need for a dithering step before certain effect chains.


ffmpeg can downsample audio to a more manageable rate. I use it to reduce hi-res FLACs to 16/48, which cuts them down to about a third of their original size and is supported by my car stereo and everywhere else. Can't tell any difference on playback.
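
For reference, a sketch of one way to do that conversion (the file names are made up; resampler=soxr assumes an ffmpeg build with libsoxr, so drop it to fall back to the built-in swresample engine):

    import subprocess

    # Hi-res FLAC -> 16-bit / 48 kHz FLAC, dithering on the way down.
    subprocess.run([
        'ffmpeg', '-i', 'input-24-96.flac',
        '-af', 'aresample=out_sample_rate=48000:out_sample_fmt=s16:resampler=soxr:dither_method=triangular_hf',
        'output-16-48.flac',
    ], check=True)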


Audiophiles usually recommend using better algorithms, such as the SSRC resampler (foobar2000 handily has it as a plugin), rather than whatever FFmpeg provides.

I can't say I personally can tell the difference, but based on my (limited) knowledge of DSP, artifact-free resampling definitely isn't a straightforward thing.


> artifact-free resampling definitely isn't a straightforward thing.

It's more that in practice artifact free resampling isn't a thing at all, so you have to choose what your artifacts do.


> artifact-free resampling definitely isn't a straightforward thing.

It definitely isn't in the general sense. Thankfully, the limitations of human hearing (20 Hz-20 kHz, <100 dB dynamic range) make them a non-issue, except for people who claim they can hear ultrasonic frequencies or can hear cymbals played with over 124 dB of dynamic range.


When you scale a 24-bit number to 16 and halve the rate from 96 to 48, the calculations are easy as pie. It's not like we're running them through JPEG twice. It's like taking an average, scaling down from 20x more quality than can be sensed to 2x more quality than can be sensed.


Boomkat?


I've heard of that too, but in my case it's a batch of Japanese distributors like e-onkyo [1] or Ototoy [2].

[1] https://www.e-onkyo.com/music/

[2] https://ototoy.jp/top/


Yes, unfortunately all of Japan has jumped headfirst into something called "hi-res audio" which is a complete scam, as CD audio is already the highest quality something mixed for humans could possibly need to be. I'd like to think it's an excuse to stop doing loudness war mastering, but it's probably just as bad.

The real problem with CD audio is that it's premixed in stereo, and the real next step is Atmos/spatial audio. Just because you only have two ears doesn't mean you can't hear 360 degrees.


> which is a complete scam

I can hear the difference between 16-bit audio and 24-bit audio. 16-bit isn't bad at all, but 24-bit is definitely superior. I've worked with 32-bit on digital boards, and I think it is probably over-overkill, but whatever makes someone happy.

But one thing I know for certain and beyond all possibility of doubt is that I only have two ears. All that is necessary for 3D sound is stereo speakers. To be clear, with only 2 speakers sound can be mixed to seem like it is coming from any direction the engineer desires, left, right, top, bottom, front, back, and everywhere in between. And stereo speakers are dead simple to set up correctly for 3D sound. Sure, you can use 6 or 8 or 10 speakers to get the same effect, if you can figure out how to properly set them up, which is not a given, but I think it's a dick-pull and a money grab because no one can hear with more than two ears.

I realize there are Surround fanatics out there and in here that feel very strongly about it, but the physics of acoustics tells me their set ups are no better than and less efficient than stereo, and nearly always incorrectly configured.


> I can hear the difference between 16-bit audio and 24-bit audio. 16-bit isn't bad at all, but 24-bit is definitely superior.

Generally means they got the dithering wrong. (You might be hearing the dithering actually.)

How well can you hear quiet music while standing next to a running power drill? 16-bit can represent that accurately.
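
(For what it's worth, getting it "right" at the final 16-bit step usually means adding TPDF dither before rounding, so the truncation error becomes benign broadband noise instead of distortion that tracks the signal. A toy numpy sketch:)

    import numpy as np

    rng = np.random.default_rng(0)

    def to_int16(x):
        # x: float samples in [-1.0, 1.0). Scale to 16-bit steps, add
        # triangular (TPDF) dither of +/- 1 LSB, then round and clip.
        y = x * 32767.0
        y += rng.random(x.shape) - rng.random(x.shape)  # sum of two uniforms = triangular pdf
        return np.clip(np.round(y), -32768, 32767).astype(np.int16)

    t = np.arange(48_000) / 48_000
    quiet = 5e-6 * np.sin(2 * np.pi * 1000 * t)   # peak ~0.16 LSB, below the 16-bit step size
    print(np.unique(to_int16(quiet)))             # the tone survives as toggling between adjacent
                                                  # codes; without dither it would round to silence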

> But one thing I know for certain and beyond all possibility of doubt is that I only have two ears. All that is necessary for 3D sound is stereo speakers.

But it doesn't matter how many ears you have. The problem is that it always sounds like it's coming from the same direction, because it's premixed. Can't turn yourself around and have the front and back switch, or move your head around to get a better localization.

Even if you don't care about that, it won't sound perfect because the mixer doesn't know your HRTF or if you're on headphones vs IEMs vs speakers.


> But it doesn't matter how many ears you have. The problem is that it always sounds like it's coming from the same direction, because it's premixed. Can't turn yourself around and have the front and back switch, or move your head around to get a better localization.

Is this like VR, but for sound?


Sure, or it's just sound in VR.


> Generally means they got the dithering wrong. (You might be hearing the dithering actually.)

Who are "they"? And when was I converting higher bit depths to lower? I have no issue distinguishing between pro-quality 16-bit and 24-bit ADCs, no dither involved. This is not exceptional, and many can consistently distinguish between them even without knowing beforehand which converter was used. It is theorized that ultrasonics picked up by the higher-resolution converter can affect sounds at lower frequencies, which can be heard. A 32-bit ADC sounds no different to me than 24-bit, and though I understand the application, I've never needed absurd amounts of dynamic range.

> The problem is that it always sounds like it's coming from the same direction

Once a proper stereo field is created, a listener can not accurately tell from where sound is coming, regardless of head movement. That they falsely believe they can is only due to psychoacoustics: they see a speaker and assume it is the source of the sound. Repeated studies involving arrays of speakers with only two active have confounded listeners who have insisted sound was coming from inactive speakers above and behind them.[1] These studies were performed in a large room with wall treatment, and the listeners were allowed to move around the room (and obviously were not paralyzed to prevent them from turning their heads). Because you know where the speakers are, you will assume sound is coming from them. A stereo field will not collapse if you turn your head, unless the stereo field is only as large as your head or smaller. Your assertion is false.

> But it doesn't matter how many ears you have.

This is ridiculous. If you only had one ear, or were deaf in one ear, you could only hear in mono. This is a counter-example proving your assertion incorrect on its face.

> The problem is that it always sounds like it's coming from the same direction, because it's premixed. Can't turn yourself around and have the front and back switch, or move your head around to get a better localization.

Again, the reason why you confidently believe you know where reproduced sound is coming from is because you can see your speakers, you know where they are, and obviously, that is where you know sound is being produced. Sound localization is confounded by sounds bouncing off surfaces and reaching one ear or the other, in one example, before those coming directly from speakers. It is impossible to determine if the sound was directly perceived or indirectly perceived, namely because sound is invisible.

If you have a tiny stereo field, what you describe is possible. This is common in DAW setups arranged to maximize space, and this is why professional control rooms are larger and have more than one set of speakers placed at different distances apart from each other and further away from the engineer. IOW, it is the reason medium and far field monitors are necessary to properly mix professionally. Once one or both ears are outside the stereo field, you'll have a different aural effect. But if your stereo field is large enough (which necessarily involves increased sound pressure) and your ears stay within it, turning your head, standing upside down on your head, or turning around will make no difference.[2]

Your ability to determine the sound direction is only reinforced by what you see and what you know. If you were blindfolded and did not already know where your speakers were, in a properly treated room and listening within a stereo field, you would not consistently be able to determine where your speakers were, nor whether you were listening to stereo or surround sound.

These beliefs that one can determine sound direction in only a stereo field are ingrained due to assumptions and visual cues, and due to the home stereo industry that marketed surround sound for the home. The only way to dislodge these false beliefs is in a full demonstration set up like the studies I have described above. Only when you are certain sound is coming from one speaker and not another, and shown that the speaker you insist is the source of the sound you hear is not connected to anything and not active, will it finally click, and you'll recognize the power of psychoacoustics and how you've been fooling yourself all along.

[-1] https://en.wikipedia.org/wiki/Auditory_illusion

[0] https://en.wikipedia.org/wiki/Franssen_effect

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4744263/

[2] https://en.wikipedia.org/wiki/Sound_localization#The_cone_of...


> All that is necessary for 3D sound is stereo speakers. To be clear, with only 2 speakers sound can be mixed to seem like it is coming from any direction the engineer desires, left, right, top, bottom, front, back, and everywhere in between.

Sure, if you want to sit perfectly still in a tiny spot this will work. And don't even think about trying to move closer to that one instrument you hear coming from the rear left.


You are describing a synthetically small stereo field, which is common in DAW setups, and the reason professional control rooms are larger and proper mixing requires medium and far field monitors. If the stereo field is large enough such that in turning your head or walking around neither ear leaves the stereo field, in an enclosed room with reflective surfaces, when blindfolded or if unknown dummy inactive speakers are displayed, accurate sound localization is confounded. You only "know" where sound is coming from because you can see your speakers, and psychoacoustics does the rest. See my comment above.


> Compression of preset -3 through -8 was slightly improved at the cost of a small decrease in encoding speed by increasing the precision with which autocorrelation was calculated (Martijn van Beurden)

Would love to see some benchmarks on this. Since FLAC hit 1.0 over 20 years ago, CPU speed has increased by ~100x, so a small hit on encode speed for better compression mostly doesn't matter. Then again, storage space has increased by the same amount, too.


Btw, there are alternative FLAC encoders, like FLACCL, which uses the GPU: http://cue.tools/wiki/FLACCL It compresses much faster than software libflac and gives smaller output files.


32-bit integer, not float.

I didn't even know 32-bit integer was a thing. Float is far more common. (For audio.)


This does not matter…

32bits is 32bits is 32bits.


It absolutely does matter when the compression algorithm is expecting audio in a certain format. If you put something else in those bits you will get worse, or no, compression, and might as well just use zip.


I must be missing something here, but last I checked 32-bit float uses about 8 bits for metadata — positive/negative, exponent, etc. — leaving 24 bits for signal storage. A 32-bit int can just store the bits of raw data and would allow for denser packing of bits.

You can then do processing with a 64bit float after converting the int’s bits without loss.
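
For reference, that layout is IEEE 754 single precision: 1 sign bit, 8 exponent bits, and 23 stored mantissa bits plus an implicit leading bit, i.e. roughly 24 bits of significand. A quick way to see it:

    import struct

    x = -0.15625
    bits = struct.unpack('>I', struct.pack('>f', x))[0]

    sign     = bits >> 31             # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits (shared scale factor)
    mantissa = bits & 0x7FFFFF        # 23 stored bits (+1 implicit) of precision

    print(f'{bits:032b}', sign, exponent, mantissa)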

Pretty sure every ADC IC I looked at uses integers. During analog processing, using the term “float” is highly disingenuous for an analog waveform, and if an ADC spouts out an int, what float are we talking about here?

I may have gone too deep into the weeds reading TI chip manuals and writing drivers for some ADC chips for my 8-channel, 32-bit, 192kHz recording system, and maybe I missed some nuances in what "32-bit" means, but last I checked my audio is pretty solid even after converting the 32-bit int to 64-bit float during my signal processing.

Audio engineering has been a massive pain because these people use different terms for basic things (a great example is "phantom power" — it's just a 48V, low-current power supply, but good luck finding a simple answer to that useless term), and there doesn't seem to be any authoritative reference to find answers, so I am sure I am missing something here.


You are not going to get good compression reinterpreting a 32-bit float as if the bits were an integer.


What?

Most systems are 64bit now, I wouldn’t be doing signal processing on a system that wasn’t. If my signal has 32bits to work with during capture and my scratch space is 64bits during processing, I can assure you I am not losing any signal during conversion between int / float.

What am I missing?


32bit float would be more interesting for music production.


After reading "How Music Works" by David Byrne, it seems like FLAC is solving the wrong problem. No one needs over 4 billion discrete levels.


You're correct; lossless formats aren't needed by the average consumer. But the problem they solve is a very real thing. Think of the difference between a repeatedly edited JPG vs. a repeatedly edited PNG. Every save of the JPG destroys information irretrievably, whereas every save of the PNG keeps it all. It's the same with MP3 vs. FLAC: MP3 throws away information each export, but FLAC doesn't.

For example, music artists/producers can't afford to lose information when they "bounce" a track. FLAC solves that. "People" don't need lossless, but for those who do, it's great.


(I think) They're not saying FLAC is pointless, just that 32-bit audio is. Even music producers record and export at 24-bit; 32/64-bit is only used for effects and mixing (because compounding quantization noise and rounding errors is very real).

You can't find 32-bit PCM music anywhere; even hi-fi DVD audio is 24-bit/96kHz.


We don't always record and export at 24-bit, though that's always the end result from mastering (if not 16-bit). In some contexts, 32-bit files are common for samples and for transferring in-progress artifacts like mixed tracks that are ready for mastering.


> You can't find 32 bit PCM music anywhere

Yes you can (not that I think it’s useful): https://ototoy.jp/find/?q=32bit

(via https://news.ycombinator.com/item?id=32843619)


With the recent debacle regarding that audiophile vinyl company secretly using digital in the process and nobody noticing, I'm just gonna assume that company could just as well interpolate up to 32-bit...


> You can't find 32 bit PCM music anywhere

The real reason is that there are no 32-bit floating-point digital audio converters, so distributing the audio in a format which needs to be quantized to be played anyway has no purpose (outside of deceptive marketing, of course).

Even integer 32 bit makes no sense for a final format - the available noise floor is just beyond reasonable, 24 bit is more than enough. In practice, 16 bit is too.


I have a Lynx Hilo and it does 32-bit input. It's incredibly nice for recording, because if the input hits the clipping range you can just lower the volume on the 32-bit recording after the fact and the recording is preserved. Lots of high-end DACs have a 32-bit input/output option because of this.


The Lynx Hilo has a 121 dB input dynamic range and is built upon the CS4398, which is a 24-bit DAC chip.

If your input clips, you usually don't need to lower it too much to protect your ADC, and if you're recording sounds so quiet that they end up below the 24-bit noise floor after this, well, they probably didn't clip in the first place, and you're most probably already facing bigger issues with mic/preamp/ambient noise and dynamic range.

Also, you mention the Hilo can take 32-bit input. That means the signal is already digitized, and we're talking about recording?


"People" don't need just lossless, they need The Beatles to release all of their pre-master tracks to be released in lossless format, so they of them can create their own remastered versions of a the tracks.


16 bit is not enough?


No. If you want your final output to be 16-bit (e.g. CD quality), then you need more bits to work with when mixing. Each operation you perform will be requantized (rounded) to 16-bit resolution, adding a bit of noise each time, until eventually the lower 4 bits (for example) are nothing but noise and your signal is effectively only 12 bits. On the other hand, if you did that mixing at 24-bit resolution, then you still have 20 bits of good data remaining; you throw away the bottom 4 and have nice clean 16-bit data to distribute. 32-bit integer is overkill in most situations, which is why it wasn't implemented for so long. 32-bit float is used a lot because it has 24 bits of resolution but also a huge scaling range, so you don't have to worry about clipping during intermediate operations, just at the end.
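
A toy numpy sketch of that accumulation effect (dither deliberately left out to show the worst case; the "mix" here is just a chain of small gain changes):

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.arange(48_000) / 48_000
    x = 0.5 * np.sin(2 * np.pi * 440 * t)       # the "recorded" track
    gains = rng.uniform(0.99, 1.01, size=100)   # 100 small mix moves

    def q16(v):                                 # requantize to 16-bit steps (no dither: worst case)
        return np.round(v * 32767) / 32767

    ideal = x * np.prod(gains)                  # what the chain should produce

    y_step = x.copy()
    for g in gains:                             # requantizing after every operation...
        y_step = q16(y_step * g)

    y_once = q16(x * np.prod(gains))            # ...vs. staying at high precision, quantizing once

    def snr_db(y):
        return 10 * np.log10(np.sum(ideal ** 2) / np.sum((y - ideal) ** 2))

    print(f'{snr_db(y_step):.1f} dB vs {snr_db(y_once):.1f} dB')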


I am a recording engineer and I will choose 32 bits over 24 any day when recording.

Of course you need the right (exceptionally good!) preamps to even make a difference here, but when you are manipulating audio, 32 bits and upwards (for summing) are the default (for a reason).

Having something to store those files with a little bit of lossless compression is a welcome addition


> I will choose 32 bits over 24 any day when recording.

There's probably no point since your audio interface doesn't produce 32 bit audio anyway.

For transient formats it's another story: if a bit of audio gets re-printed multiple times with some processing between renders, then it's beneficial to store it in a DAW-native format (32/64 fp) to avoid quantization or multiple additions of dither noise. For recording you just waste space.

> Having something to store those files with a little bit of lossless compression is a welcome addition

You'd probably want to store 32/64-bit floating point, and WavPack works really well. On comparable formats it achieves ratios on par with FLAC.


My audio interface is a Sound Devices MixPre-10, which records 32 bits. It also has preamps with -133 dBV noise figures.

In the practical sense what this means is that I have to be less careful with setting the gain. Typically when setting recording gain you want to go as high as possible so preamp/mic noise doesn't become an issue, but not so high that you are going into the limiter when something unexpectedly loud happens.

I often record in environments where unexpectedly loud things happen, so having a recorder that allows me to record at a lower level while still not gaining noise issues or losing detail in the silent parts is a very welcome thing.


I understand where you're coming from and the supposed benefits of a higher dynamic range.

In my experience as a mix engineer working with music, recordings, when made properly, rarely use the full available dynamic range of 24 bits. Often I get tracks recorded too low due to incorrect gain staging. But maybe even more common is just a noisy recording. Poor mains wiring, the length of the recording chain, and the self-noise of various gear all contribute to what usually amounts to noise levels much higher than -144dB. And, to be honest, in my line of work it rarely matters until it crosses the 16-bit noise floor of -96dB.

What I suppose I mean by all of this and my previous posts is that if you're not careful enough to fit into the higher portion of the available 144dB dynamic range, you're just adding more noise regardless of the format used, as in most cases the level of cumulative noise is higher than -144dB. YMMV, of course.


It's very unlikely your audio interface can actually record at 32 bit resolution.

https://www.mojo-audio.com/blog/the-24bit-delusion/

32-bit or more makes sense for DSP and summing, and non-musical data storage. But your I/O is most likely not giving you 32 bits.


32-bit is common in field recordings, like with a Zoom (and there's reason to believe they can legitimately take advantage of it).


That's a bit different, as it's 32 bit floating point, not integer. 32 bit float is often used for summing and that's what's going on here. Those are done by having 2 (or more) parallel ADCs (usually 24 or 16 bit) with the input voltage scaled lower on each successive one, and then they are digitally re-scaled and mixed together using a 32 bit float format into a single stream (double that paragraph for stereo).

The big benefit of 32-bit float is you can go "above" 0dB, so your digital operations can't "clip" in the analog sense. I definitely see the benefits of 32-bit float hardware, especially for field recordings (Zoom aren't lying, it does work). But it's a different format, and the internals rely on the integer-based format and don't exceed 24-bit integers.
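
Roughly, that combining step looks like the following toy sketch (a real recorder calibrates the gain offset between the paths and crossfades rather than hard-switching; PAD_DB is a made-up figure):

    import numpy as np

    PAD_DB = 24                                  # assumed gain offset between the two paths
    pad = 10 ** (-PAD_DB / 20)

    def adc_24bit(v):                            # idealized 24-bit converter that clips at full scale
        return np.round(np.clip(v, -1, 1) * (2 ** 23 - 1)) / (2 ** 23 - 1)

    t = np.arange(48_000) / 48_000
    x = 3.0 * np.sin(2 * np.pi * 100 * t)        # a source loud enough to clip a single converter

    hot    = adc_24bit(x)                        # high-gain path: flat-topped
    padded = adc_24bit(x * pad) / pad            # low-gain path, rescaled in the digital domain

    combined = np.where(np.abs(hot) >= 1.0, padded, hot).astype(np.float32)
    print(np.abs(combined).max())                # ~3.0: peaks above "0 dBFS" survive in float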


I was thinking FLAC's 32 bits was float as well?


My Sound Devices MixPre-10 records in 32 bits. Granted, not all of those bits will be significant, but the preamps are good enough to take 24 bits to the limit.

With 32 bits, setting the gain is less critical (more headroom), and audio files are tiny anyway (compared to, say, video). So in a practical sense I need to be less precise with setting the gain, or can record somewhat more dynamic events. I am very happy with the results.

Sound Devices has a good explanation if you are curious: https://www.sounddevices.com/32-bit-float-files-explained/


> I will choose 32 bits over 24 any day

why?


Because if the recording goes past the clipping range for digital audio, it still records the data and preserves it. 32-bit recording has serious advantages: it's much easier to record a hot input because digital clipping is not possible. You just need to pull the volume down on the recorded clip after the fact and there is no clipping.


You mean 32-bit float? This is 32-bit integer.


Nyquist-Shannon theorem, sure. There are situations where you'd use 32-bit audio, but not really for mastered music you'd listen to at home. Although people still won't believe you, hence the market for thousand-dollar cables and other snake oil.


From my understanding, doesn't Nyquist-Shannon assume you take infinite steps to refine the discrete steps into the final waveform?

Since real-world DACs don't have infinite taps, either increasing the number of samples per second in the original audio or the number of steps done by the filter will improve how close it gets to the original waveform.

Would that apply to the number of bits used to represent the level as well? I thought that was mainly useful to give some additional headroom when editing?

(Obviously there's diminishing returns either way)


Nyquist-Shannon simply says that you need a sample rate of 2x the highest frequency to perfectly recreate something. So, to recreate a 20 kHz wave, you'd need a 40 kHz sample rate. Bump that limit up to 22.05 kHz (to leave room for the anti-aliasing filter) and you get CD audio's 44.1 kHz.

The bit depth, OTOH, is your signal-to-noise ratio (noise floor) when sampling/quantizing said audio. It's unrelated to Nyquist-Shannon. More bits give you less noise when reproducing it. For CD audio, 16 bits was chosen to give an acceptable SNR of 96 dB.

Monty Montgomery (of xiph.org) has a nice YouTube video related to this: https://www.youtube.com/watch?v=cIQ9IXSUzuM
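
The two knobs, worked out:

    import math

    # Sample rate: Nyquist needs more than 2x the highest frequency of interest.
    print(2 * 20_000)                 # 40000; CD's 44.1 kHz leaves headroom for the filter

    # Bit depth: roughly 6 dB of SNR per bit (20*log10(2) per bit).
    for bits in (16, 24):
        print(bits, round(20 * math.log10(2 ** bits), 1))   # 96.3 dB and 144.5 dB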


Yes, you only need that many samples given you know a maximum frequency. But to actually perform the conversion from discrete samples to continuous wave takes infinitely many steps. The number a given DAC actually does varies. Some do hundreds, some thousands, a couple rare (very expensive) ones do 100k or more.

That's what I'm saying. Increase the number of samples in the audio itself, and the DAC can get closer to the original.


Except it's not. The video shows that. In the real world, the DAC doesn't immediately jump from code N to N+1; it slides (the "slew rate" of the opamp). More bits just makes the difference between codes N and N+1 smaller; i.e. where it slides to is more accurate to the original. The "stair step" or "straight line" view you get from programs like Audacity is a lie.


A final low-pass filter (tuned to remove anything above half the sample rate) allows the accurate reconstruction of the original analogue waveform from the sampled digital data. (The vertical 'stair step' of sampled data is very high-frequency information which gets removed, 'bending' the output back into its original shape.) The video linked above is well worth a watch.


There are definitely scientific uses that might want to (ab)use FLAC to store sound-like data that would benefit from 32 bit.


I've got this one "remaster" of an album that I like that I've been working on; the 32-bit version 100% definitely goes over 0 dB, and it just happens to sound a whole lot better when I don't correct for that, so on occasion I like to load up that album in Ableton Live and listen to it exactly like that.


There's a real use case for storing original tracks and masters in 32-bit and down-sampling for consumer use.


How do you know no one needs it for anything? How do you know no one will ever need it in the future? It's not possible to know that. All we can do is capture the real world data with as much quality and precision as possible.


I don't think it's for storing music.


While I doubt that human ears need 4 billion discrete levels, I think it's important for archival purposes.


Is it? What is your reasoning? And why would you pick 32-bit, instead of 24-bit or 64-bit or something else?

Actual audio equipment has a noise floor. The encoding depth you use also has a noise floor. If the noise floor of the encoding is far below the noise floor of the signal, then it won't be perceptible.

So you choose to put your noise floor at some amount below the existing noise floor. Not infinitely below, because that would require infinite bits. This is the reasoning you'd use for 24-bit audio, which has a very comfortable noise floor of -144 dB, which leaves a large margin even for extreme low-noise professional equipment (you might see -120 dB ish for extremely good equipment).

At 24-bit, even passive components like transformers and resistors are contributing measurable amounts of noise.

The reason you might pick 32-bit in practice is so that you can have lots of headroom for some DSP algorithm, or do lots of sums of different signals without accumulating quantization error. The final "archive" file will still have worse than 24-bit precision.


Fair enough! That's what I get for commenting on something I'm not as familiar enough with as I should be.


I had no idea FLAC didn't store 32-bit. I'll need to look over my library and make sure nothing strange is going on.

I'm curious if sites like Qobuz will suddenly replace some of their existing files with higher-bit copies if they already had them and just had no way to encode them as FLAC.


It is extremely hard to justify 32-bit audio from a technical standpoint, since it requires that every component in your audio chain has better than 144 dB dynamic range. In practice, just about none of the analog components in your audio chain will have that kind of dynamic range, and any analog components in the recording chain are unlikely to have that range either.

The kind of equipment I'd expect to see in order to record that kind of range is where you'd have something like multiple microphones, and switch back and forth between them based on the level of the material you're recording (which would have to be something like actual gunfire or explosions).

I don't know what kind of equipment you'd need to reproduce this dynamic range, but I don't think it's audio equipment.

My guess is that the main real application of this is going to be compression of non-audio data that happens to compress well with FLAC.


In audio production, the relevance of 32 bit audio is that it's using the same 32 bit float representation of the audio that most of the signal processing stages are also using. In contrast, when people talk about 16 or 24 bit audio, they are customarily referring to an integer representation.

In principle, every time you convert from the 32 bit float back to 24 bit integer, there's an opportunity for a careless human to screw up the scaling, and throw away some of the available integer range. Rinse and repeat until you have an audible problem.

So this forms the basis for an argument in favour of keeping the audio in a float representation for as much of the production process as can reasonably be achieved.

I don't see any benefit in 32 bit representation for delivery of the finished content, but I suppose that if, as part of the production process, you're transferring the audio online between different sites/people/whatever, then having lossless compression that works without having to convert back to integer might be useful.

EDIT: Just read the thing more carefully, and realised that it's specifically talking about 32 bit int, NOT float. So right now, I can't see much practical use for this.


You're absolutely right insofar as you're speaking about static audio that's already been produced and finished - there is almost no point in storing anything above 24-bit integer as far as dynamic range even for archival purposes.

However, there is a legitimate purpose behind having higher dynamic range for production purposes and sample sources. There are some recording sources that can actually produce 32-bit audio. Plus, you might want to do some processing on the sound that would end up affecting the dynamic range, or otherwise benefit from the increased resolution. One example is nonlinear processing that generates new musical information from the original signal - you can of course just reduce the gain after processing, but you are then sacrificing some of the resolution of the new combined signal, which itself could otherwise be used by further downstream processes. This all happens post-recording, but can still be musically important before getting to the finished product.

This is why DAWs work in 32-bit or 64-bit processing internally, and why many high-quality sample libraries will come in 32-bit, especially smaller one-shots. I often convert samples to .flac for space reasons, and have to either skip 32-bit .wavs or downsample them to 24.


> However, there is a legitimate purpose behind having higher dynamic range for production purposes and sample sources.

This is the justification for 24-bit audio... is there a reason why 24 bits is not enough here?

If you're capturing audio sources directly, you'd use something like a 24-bit ADC, which you can find easily enough. The "raw" output of the ADC is 24 bits.

If you're doing intermediate processing in your DAW, then the DAW is using single-precision floats (or possibly double), which cannot be losslessly converted either to 32 bit or 24 bit integers, so how would you choose the right format to store? It seems to me that you'd either store the original floating-point data, or you'd perform some kind of lossy conversion to a high-quality archival format... but if you do that, isn't 24 bits good enough? You're quantizing either way, and at 24 bits, you can have plenty of headroom and noise floor at the same time. Loads, even.


There are two justifications. One, for recording: 24-bit is the standard in the studio, yes (and more than you need for that context, indeed), but 32-bit is more and more the standard for field recording where the hardware is capable of it, and it provides genuine utility there: you often capture extremely soft and subtle sounds that you want to increase in gain to a more useful level.

Also, once inside the digital world, there are many processes you can perform that add new musical information to the original sound that might be higher in gain but that you want to preserve for downstream processing until you're ready to actually "print" and quantize the final product, at which point, yes, 24-bit will be more than enough.


A microphone, preamp, and ADC that have > 144dB of dynamic range are extraordinarily rare outside of lab equipment.

Physically, you are talking about signals measured in the tens of nano-volts (or nano-amps, depending on the mechanism of the microphone).


The latest Zoom field recorders support 32-bit float recording and achieve a wider dynamic range than that (upwards of 210db) by having a circuit with two different ADCs.

Also, it's less about the absolute resolution, and more about the ability to boost the gain, often by a lot, while still having a wide and useful dynamic range after the fact.


32bit FLOAT is 24 bits of audio data, and has the unique advantage of being the typical internal processing format of plugins and audio workstations, due to the desire for headroom while processing to avoid artifacts from overflows of various kinds, among other things. (Upsampling is also common before performing calculations that may alias or mirror.)

The stacked-ADC approach does allow for a wider overall dynamic range, though I would be skeptical of the physical transducer's capabilities in using it all. But the 32-bit float format here appears to be largely about lossless transfer to a DAW such that further processing is lossless, as we are not converting from int to float and back.

This format could then be said to be useful as a master media format.

The article in the OP is about a FLAC containing a 32-bit int audio stream, however, which appears less useful and not at all related to 32-bit float.


Color me skeptical, that's a wider dynamic range than air


DAWs use floating point numbers, which are not applicable to this discussion. They also waste about half the bits most of the time.


Note that 144dB range is what you get from 24 bits per sample. 32 bits gets you another 48dB more.

Some of your ADC/DAC chain can reasonably claim 125dB range. Some amplifiers can claim 17 or even 18 bits above their noise floor - 108dB.

No full-spectrum microphones, headphones or speakers can reasonably claim 125dB without distortion, but if they did you would still want to limit your exposure to "never". Long-term damage begins with long-term exposure under 96dB.


This unfortunately discontinued mic could comfortably exceed your claimed limits:

https://en-de.neumann.com/d-01

156 dB max SPL

86 dB SNR

130 dB dynamic range

Neumann patented a dual ADC preamp to make this possible.


86dB signal-to-noise.


As noted in my third para :~)

Additionally from the product page:

"Unprecedented fidelity and detail, 130 dB dynamic range"

I've just realized that the product page I linked is extremely unusual for stating a dynamic range value.

However, this makes sense if you read the papers covering Neumann's dual ADC and pre design, of which they were justifiably proud. System D was a mid-nineties introduction.


One would only have the SNR's worth of content inside those 130 dB, was the point I think that person was trying to make. Real-world physical transducers will have physical limitations of some kind, not all of which must be simultaneously surpassed to encounter said limitation.


People will use microphones to record audio like gunshots and explosions for use as sound effects. The humble snare drum will produce loads of dBs too, and is usually close-miked. You then take the same microphone and use it to record something much quieter. You end up with microphones that definitely do reasonably claim >125 dB range. The TLM102 claims something like 130 dB. How much you care about distortion will depend on the situation.

My thought is that it would be very hard to get that 130 dB range all the way from an audio source to your ADC, and it would be very hard to get it back out all the way to speakers again.


>The TLM102 claims something like 130 dB. How much you care about distortion will depend on the situation.

Neumann makes no such claims. They do claim a max SPL of 144 dB. But the TLM is 10 dB less, and this is prior to an ADC of variable ability. Distortion isn't a ceteris paribus value for signal-to-noise calculations.


https://en-de.neumann.com/tlm-102

> Due to its enormous dynamic range of 132 dB [...]

Not really interested in dissecting what this means, or how to reconcile it with the SNR numbers. Just saying that Neumann does, in fact, make this claim.

Not sure what point you're making about distortion.


I could see it as part of a pipeline/workflow where having that dynamic range lets you not worry about losing meaningful information through everything, even if you're going to compress it and take it back to 16-bit CD quality at the end. Being able to losslessly store the results of each step in a FLAC file should be better than raw data, since it'll be compressed and easier to manage. That said, at the edges of that I totally agree even 24-bit can be questionable, since a lot of the analog side doesn't have a noise floor that would let it be meaningfully used.


> That said, at the edges of that I totally agree even 24-bit can be questionable there since a lot of the analog side doesn't have a noise floor that would let it be meaningfully used.

It's not that hard to beat 16-bit, which is 96 dB, using easy-to-find, off-the-shelf equipment. One example scenario is that you are recording something but you don't have a precise idea of how loud it will be ahead of time, so you record at low levels and rely on 24-bit capture to give you headroom above and noise floor below. Trying to capture at 16-bit can, in practice, be annoying and difficult because it is more likely that you will ruin takes by setting the gain wrong.


The 55% compression ratio (or whatever the case may be) seems much more useful at the end (where it nearly doubles how long a consumer can listen before swapping media) than along the way (where it nearly doubles how much raw material a studio can capture before swapping media).


Even if you did have a recording setup like that I'd expect it to work better with float samples.


I don't know that 32-bit audio is exactly common.

As far as human hearing goes, 24-bit is already overkill. Even 16-bit should be an imperceptible downgrade for anything with a naturally low dynamic range (i.e. most music).


And iTunes still does not support FLAC.


Sadly still no support for variable blocksize.


I made my own encoder and experimented with variable block size (with dynamic programming) plus a massive amount of brute-force search for LPC parameters. But the amount of compression gain I could achieve over libflac was small and not worth the hours of encoding time. https://www.nayuki.io/page/benchmark-of-nayukis-flac-encoder


Sometimes encoding time doesn't matter. Archiving is done once in a lifetime. I encode everything with "--lax -8Vepl32", which takes ages, but since it runs as a low-priority background task I don't care. Also, you can use heuristics to choose the right blocksize instead of brute-forcing it. If the algorithm is clever enough you might achieve slightly better results at almost zero cost.
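
Those flags spelled out, as a sketch of running the same kind of archival encode at low priority (assumes the flac CLI and coreutils nice; file names are made up):

    import subprocess

    # --lax permits non-subset settings such as --max-lpc-order=32
    # (the "l32" part of -8Vepl32 above); -V verifies the encode.
    subprocess.run([
        'nice', '-n', '19',
        'flac', '--lax', '-8', '-V', '-e', '-p', '--max-lpc-order=32',
        '--force', '-o', 'track.reencoded.flac', 'track.flac',
    ], check=True)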


Interesting. Do you have numbers to show that variable block size can be done with "almost zero cost" as you said?


I think "If the algorithm is clever enough" is a big weasel phrase just like https://wiki.c2.com/?SufficientlySmartCompiler . Having studied and written many algorithms, I have to conservatively assume that a clever algorithm doesn't exist unless it is proven by construction.



