MP3 vs. AAC vs. FLAC vs. CD (2008) (stereophile.com)
146 points by thefilmore on Oct 1, 2023 | 371 comments



Let's go over this one more time:

- Q: Can you hear the difference between CD-quality lossless audio and anything higher fidelity? A: No, no one even has the biological ability to. 44.1kHz, 16-bit audio can perfectly reproduce audio as far as we can physically tell. The only reason to store anything higher is for production or archiving (that is, for computers to listen to).

- Q: Can you hear the difference between 320kbps MP3 or the equivalent, and CD-quality lossless? A: Yes, this is _theoretically_ possible. However, many well-controlled listening tests have been performed on this subject that all say no, so it's much more likely that you can't, and the burden is on you to prove otherwise with an abundance of evidence.

e.g.: https://downloads.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP384... https://www.researchgate.net/publication/257068576_Subjectiv...

The listening test linked in the article leads nowhere; I would have liked to see their methodology.


Some time ago - though not as far back as this article was published - we did an experiment at a conference that we held in the demo facilities of a Very Well Known Audio Company.

We played a range of snippets of music - rock, classical, electronic, pop - at various qualities over what was quite possibly the best sound system in the world.

The audience was a significant number of record label executives, distribution execs and general audio/music industry experts.

We played pairs of the same snippet and asked people to tell us which was higher or lower quality.

One person got them all correct. Turned out he’d mastered one of the early tracks we played so had a good reference and then used that as a baseline for the others.

For everyone else it was completely scattershot.

It wasn’t a controlled experiment but it was definitely interesting.


This is why an ABX test is the way to go. If you don't know which version of a snippet is "right" there's no way to objectively say which one is "better" - maybe you like the distortions (cf the famous vinyl "warm sound").


wait, but if he could distinguish them from having mastered one of the tracks, doesn't that kind of cast doubt on the "no human could possibly ever distinguish these things" mantra? as an aside, am i alone in feeling like the absolute certainty and hyperbole with which this argument tends to be stated potentially has a bit of a "doth protest too much" vibe to it? i definitely know it is a minority view, and that is ok-- i'm happy to be wrong and skeptical.

skeptical though i may be, i'm definitely not here to say that "audiophiles" aren't charlatans or anything like that, for the record. and while i don't totally understand the setup you describe in the sense that i don't get why insider knowledge on one track would tip all the rest of them (were the HQ tracks played either all first or all second, or something?), wouldn't one person's ability to completely discriminate between the two encodings seem to be very strong evidence that it is possible to tell the encodings apart? the kinds of differences between master recording and 16bit 44.1 kHz are exactly the kinds of things that would give away which encoding is higher quality, no?

i feel like there is this moving target thing that goes on sometimes, where the strong argument made loudly is "no human can tell the difference", and then the tests are more like "most people can't tell the difference between things that they have no reason to be attuned to well enough to have any chance of picking up these subtle differences."

forgive me if i misrepresent, or come across like some sort of audio quality chauvinist-- i ask all of this in earnest, and without having a strong opinion one way or the other.


> asked people to tell us which was higher or lower quality

This test didn't measure what you probably wanted it to measure.


What did I probably want it to measure?


My guess would be, if there was a perceivable difference in quality between sample pairs.


Genuinely interested to know why - without being present - you think this would not have provided some sort of measurement of this. While it wasn’t scientifically devised it also wasn’t just “play a couple of things”.


The problem is that "what sounds better" isn't the same question as "which one is uncompressed".

The original CD version might have some high frequency stuff that's just on the edge of perception that you don't really know is there, but you can just sense a bit of discomfort when listening to it. After going through the MP3 process and that high frequency is removed because it contributes the least in reconstructing that signal, the resultant decompressed signal might sound "better" even though it's not the original, because you get a high quality reproduction but the thing that led to a slight discomfort when listening has now gone.

In this case, the sound engineer got them all right because he could tell the difference and knew what it was supposed to sound like. The rest of the people maybe could tell the difference or maybe couldn't (which was the claimed result of the test), but in fact, even if they could tell the difference, they had no idea which one was the uncompressed one and voted on which they thought sounded best.

As another comment has noted, it'd be a much better test if there was a "they sound the same" option as well as asking which one sounds best.


Without being present there, I can't analyze the actual execution of the test, but the test methodology can still be evaluated.

Playing A & B samples and asking which one is better/original requires much more from the listener than just hearing a difference between the two. It is possible to hear the difference, but not know which is which, as that requires additional knowledge.

To avoid this issue you could:

Play (in random order) the original twice and the processed version once, and ask which one was different/processed.

Or play two sequences (in random order), [original, original] and [original, processed], and ask whether the processed sample was in the first or second sequence.

The second option might be easier on short-term memory, because it has shorter sequences (2 samples vs 3 samples per sequence).

This would produce a better measurement of whether the difference is audible or not.
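With either protocol you also need enough trials to separate real detection from lucky guessing. A rough sketch of that check, assuming scipy is available, using a one-sided binomial test against the chance level (1/3 for the triangle version, 1/2 for the paired version):

    # Did the listener beat chance? One-sided binomial test.
    from scipy.stats import binomtest

    trials, correct = 12, 8          # example numbers, not real data
    chance = 1 / 3                   # triangle test: original, original, processed
    result = binomtest(correct, trials, p=chance, alternative="greater")
    print(f"{correct}/{trials} correct, p = {result.pvalue:.3f} vs. pure guessing")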


We actually did something more along the lines of your first suggestion - we were asking for difference, not specifically “which is better”, because “better” is entirely contextual (and this was part of the point - some people actually prefer lower quality audio because it’s what they are used to). I think (it’s a long time ago now) that we played each set, which was two or three samples, twice, and rather than asking which was “original” we asked them to score for clarity and richness and say which they preferred - and then to make a guess about which was a higher/lower quality source file.

Like I say, it wasn’t a rigorously scientific experiment but it was in the context of a conference about evolution of audio standards and what that meant for audio delivery from labels/distributors to DSPs.


Some time ago - just a couple of years ago - I decided to get back to the roots and listen to some Iron Maiden. After a visit to the Brown Sector I came away with the full discography in FLAC, but I also got a copy of Powerslave in AAC ripped from iTunes.

And all I can say is that I really, really hear the difference.

Because the iTunes version is mastered to sound good on iPods, it has a quite noticeable bass boost all over the album, which is extremely noticeable on my 2.0 speaker setup, which itself has a good bass boost, so this version sounds quite muffled compared to the FLAC version from the CD (CP32-5043).

But on the go, with my CX300-II / CX3.00 there is no noticeable difference.


So you can hear the difference between different masters, and not necessarily the difference between different encodings or bitrates.


This is exactly why vinyl actually does sound better sometimes. Because of the physical limitations of vinyl, vinyl masters were less affected by the "loudness wars" and they have greater dynamic range.


> and they have greater dynamic range

If you mean 'better used' then yes, if you mean 'vinyl has a greater dynamic range' then...


I mean the vinyl masters have greater dynamic range than the digital / CD masters. Because of the loudness wars.

I do not mean that vinyl is capable of greater dynamic range than digital. Of course not.


Exactly what I said.


I can hear the difference between Ignition and the Ignition remix.


Unless I'm reading it wrong, your second source does very much imply some people can tell the difference quite reliably. As expected, regular people can scarcely tell the difference, but musicians are better at it and sound engineers are in fact quite accurate.

This matches my own experience well: most of my friends do not care about various levels of compression, nor what headphones they use - that's fine, I'm glad they're enjoying art in their own way - but I, and some others, do in fact stand to benefit from less compressed audio.

I've personally done blind tests on myself using a python script that randomly plays compressed and uncompressed snippets of the same track and mp3@320 was not transparent to me (though opus@256 was).
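(Not the exact script, but a minimal sketch of that kind of self-administered ABX loop, assuming ffplay from ffmpeg is on the PATH and the two files are loudness-matched encodings of the same snippet; file names are placeholders:

    import random
    import subprocess

    LOSSLESS, LOSSY = "snippet.flac", "snippet_320.mp3"   # placeholder files
    TRIALS = 16

    def play(path):
        # -nodisp: no window, -autoexit: return when the clip ends
        subprocess.run(["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", path])

    correct = 0
    for trial in range(1, TRIALS + 1):
        x = random.choice([LOSSLESS, LOSSY])              # the unknown sample
        for label, path in (("A", LOSSLESS), ("B", LOSSY), ("X", x)):
            input(f"Trial {trial}: press Enter to hear {label}")
            play(path)
        guess = input("Is X the same as A or B? ").strip().upper()
        correct += (guess == "A") == (x == LOSSLESS)
    print(f"{correct}/{TRIALS} correct (pure guessing averages {TRIALS // 2})")

The important parts are the random choice of X and not peeking at the answer until the end.)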

Can I tell the difference when casually listening? I don't know, but when the cost of lossless is having my music collection take 60gb instead of 20gb on my 512+gb device, I have no reason not to go for lossless.


Examine Figure 1 - The key is the 4th and 5th columns there, CD/256 and CD/320. The results show no significant ability to discriminate between them.


The thing about being or not being able to point out differences in audio quality is that it all boils down to pattern recognition. If you know anything about pattern recognition, you understand that you can't have pattern recognition without prior training through provision of tagged samples of such patterns.

If you gave a high quality audio experience to a person who has been listening to low quality radio rips on magnetic tape through 80s general-store headphones, you might be surprised how few people would describe one as "better" without a prior description of the work and technology required to produce each experience.

And one would be even more surprised by how many people choose the cassette tapes because of nostalgia and a long history of satisfying experiences with them.


isn't perception itself a matter of mere pattern recognition? hearing? the whole point is that you can hear the difference. whether or not it sounds "worse"... is certainly debatable, but is definitely a value judgment. and the burden of proof is definitely on the "it doesn't matter side" to prove that a lower fidelity version is "better" than one truer to the original master.


I've done the same blind test with decent but not amazing headphones (HD590) and I could tell the difference all the time as long as the music was slightly complex.

If loudness was artificially boosted, I had a harder time but could still often tell. I think the sound engineering of the music played a big role and a lot of modern music isn't mixed with complexity in mind.

I also go for lossless ;-).


There is one situation where 44k/24 bit and 88k/24 bit CAN sound appreciably different, and that's when aliasing is introduced into the recording, mixing, or the sample rate conversion.

If proper precautions are not taken during the recording/mixing/mastering phases aliasing artifacts can be heard in the recording. This may account for the differences that some people hear when judging whether there are differences between the two. Higher sample rate files are more permissive of aliasing and exhibit less perceptible artifacts. So you're less likely to hear it at a higher sample rate.

The artifacts of aliasing manifest as inharmonic distortion that starts at the top octaves and then folds back into lower frequencies as the effect is intensified. This can be easily perceived by most listeners if it is pointed out to them. It is not a pleasant effect like first-order or second-order distortion. It does not complement the record at all.

That said, if proper precautions are taken to mitigate latency artifacts during the record-making process then a listener shouldn't perceive any difference between a 44k and an 88k record. The best case scenario is often a record that's recorded, mixed, and mastered at high sample rates, even if it's ultimately down-sampled to CD quality (44.1 kHz).


So if you had an 88k recording, you could run it through a well-known anti-aliasing filter to create a 44k recording that sounded the same?

So the only situation where 44k and 88k can sound wrong is if... the 44k file is different and wrong?


If you think about this "aliasing" as in what occurs in 3D graphics, then you can understand this. What these 3D filters do is either remove information with blur (FXAA) or use information that is not available in the image (MSAA and derivatives).

In audio recording, sampling at 88k would be like rendering an MSAA 2x image so it can be displayed with higher fidelity, despite the output resolution being the lower 44k sampling rate.


Aliasing can actually occur in the capture stage if the setup isn't right. Think of moire patterns in a video with highly textured subjects. It can happen whenever a signal is sampled.

The mastering discussed higher up in the thread is going on ahead of time at the studio, not on your playback system. The whole mastering pipeline starts with some initial capture resolution from microphones/cameras. The studio processes these original raw captures into a combined form and prepares the distribution format, i.e. a planned audio/video stream resolution. The studio can use different resolutions during capture, processing, and final distribution.

Generally speaking, the highest rates would be easiest to work with and avoid perceptible artifacts. But practical tradeoffs are made to save cost whether in processing, transfer, or storage.


The point is that because of sampling, order of operations can matter. So taking an 88k file -> apply an effect -> downsample to 44k can sound different from taking an 88k file -> downsample to 44k -> apply an effect.
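A toy illustration of that, assuming numpy/scipy and using hard clipping as the deliberately non-linear "effect"; the two orders give measurably different results because the clipping harmonics fold back differently at the two sample rates:

    import numpy as np
    from scipy.signal import resample_poly

    sr = 88200
    t = np.arange(sr) / sr
    x = 0.9 * np.sin(2 * np.pi * 10000 * t)      # loud 10 kHz tone at 88.2 kHz

    def effect(s):
        return np.clip(s, -0.5, 0.5)             # hard clip: non-linear

    a = resample_poly(effect(x), 1, 2)           # effect at 88.2k, then halve the rate
    b = effect(resample_poly(x, 1, 2))           # halve the rate first, then effect
    print(np.max(np.abs(a - b)))                 # non-zero: the orders don't commute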


This is an important point. The main reason that pro audio gear pushes bit depth and sample rate up higher than 16/44.1 audio is that when you start doing the floating point math to mix and apply effects to audio you can end up with audible differences when multitrack recording. In this case (and I still think it’s optional for all but the most demanding recordings of live performances) higher sample rates can help, and to a lesser degree bit depth can give you more dynamic range.

I give that long preamble to say once a record is done and mastered, having > 16/44.1khz is wasted bandwidth.


You can verify this by mixing to mono or splitting stereo and inverting the "after" and mixing them back into the "before".

If you get silence, they're perfectly identical.
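A rough sketch of that null test, assuming the numpy and soundfile packages and two equal-length files (names are placeholders):

    import numpy as np
    import soundfile as sf

    before, sr1 = sf.read("before.wav")
    after, sr2 = sf.read("after.wav")
    assert sr1 == sr2 and before.shape == after.shape

    residual = before - after                    # invert "after", mix into "before"
    peak = np.max(np.abs(residual))
    print("perfect null" if peak == 0 else f"residual peak {20 * np.log10(peak):.1f} dBFS")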


Being pedantic here, but since it's on topic. This only applies for non-linear processing, which is most of what we want to do when mixing music. But not exclusively.


Where would “taking a 44k track and up-sampling it to 88k (audio DLSS?), applying effects, and then downsampling back to 44k” fall between those two points in the spectrum?


The downsampled 44k that went through a half rate filter might actually sound better, for that matter. The speakers won't try to reproduce the content above 22khz then.


Even in professional audio environments, monitor speakers' electronics are generally designed to filter out such high frequencies.

The "frequency response" spec listed on speakers will tell you what range they are designed to reproduce. Typically, it's approximately 20Hz to 20,000Hz to match human hearing, perhaps with a higher floor if the speaker is designed to be paired with a subwoofer. This range is usually a deliberate (and sensible) limitation imposed by the electronics, not necessarily the materials or the magnet design, etc.

Some speaker manufacturers will list abnormally low or high range numbers on the spec sheet in an attempt to attract customers who mistakenly believe a wider range means the speaker is better. But even those speakers have a steep roll off curve at the extreme ends of the range, so it barely makes any difference.


Even pro audio manufacturers don't seem to lowpass their loudspeakers. Compounded with aluminium dome tweeters having a breakup resonance around ~25 kHz, the problem /could/ exist; but since we don't hear above 20 kHz (and not much above 18~19 for most of us), the only possibility would be some kind of mythical modulation I've never seen measured in a real world design.

Example: https://www.soundandrecording.de/app/uploads/2020/10/8361-FR... from https://www.soundandrecording.de/equipment/genelec-8361a-3-w...


About 20 years ago I bought some live concert DVDs but the band did not release an audio CD version of it. I wanted to listen to it in my car so I ripped the audio from the DVD but when I made a CD it sounded a little off.

Then I learned that DVD was 48 kHz and not 44.1 kHz so the conversion program I used didn't account for this. I went back and used a polyphase filter to adjust the sampling rate. It sounded normal again but there were some audio glitches at various times.

I went back about 10 years ago and ripped it again and converted it to FLAC which supports multiple sampling rates like 44.1, 48, 96, whatever and now everything sounds good.


Correction: In the last paragraph of my comment I mistakenly typed "latency artifacts" when I should have said "aliasing artifacts".


Aliasing is also icky if you go from 44.1kHz to 8kHz (for phone systems); a 48kHz source would be better for that.

But 44.1kHz worked better in the lead up to CDs (works for modulating onto video tape in PAL and monochrome NTSC), so it won until DVD audio brought 48kHz to the masses.


> 44.1kHz, 16-bit audio can perfectly reproduce audio as far as we can physically tell.

I agree on the kHz (as well as on MP3), but I deeply disagree on 16 bits.

Because yes, if you keep your headphone volume at a single reference level and never turn it up, then 16 bits is fine. This is very much proven.

BUT this ignores the fact that people often turn up the volume a ton to hear the quiet parts of classical music, or on that YouTube video where the volume is inexplicably 5% as loud as it should be.

So in practice, 24-bit audio allows you to retain perfect fidelity even when you have to turn the volume up. 16-bit doesn't.

I don't understand why nobody ever talks about this. (Or why you have to install special utilities on your Mac to be able to turn up the volume to 200% or 400% in order to listen to those YouTube videos that are maddeningly recorded at 5% volume.)


You're right, it's true that 24-bit reduces the noise floor and extends the dynamic range available. However, 16-bit audio already has a dynamic range of 96dB (for reference, a quiet recording studio typically has an ambient noise floor of around -60dB). In practice, this is beyond the noise floor of even the very best hi-fi systems. As you turn the volume up, you will start hearing the noise floor of your equipment long before you hear the noise floor of 16-bit audio.

Unless you mean that 24-bit allows for representing audio that is stored at an extremely quiet level at the peaks, wasting most of the dynamic range. That would make more sense - but if audio is printed in such a flawed way, I would expect other quality issues to be present as well.


"the effective dynamic range of 16 bit audio reaches 120dB in practice" https://people.xiph.org/~xiphmont/demo/neil-young.html#toc_1...


That analysis misses that when you dither you sacrifice effective sampling frequency for dynamic range. 44.1KHz/16bit can represent that dynamic range, but it can't represent that dynamic range at a 44.1KHz sample rate.


It doesn’t need to. The ~80dB of dynamic range that a human ear can theoretically hear is at the fairly low frequencies of ~2-4kHz. Dynamic range drops off considerably at higher frequencies.

In fact, the upper limit of ~16kHz is defined by the intersection of the “threshold of pain” power curve and the “threshold of hearing” curve. So the human ear has zero dB of dynamic range at the upper frequency limit.


Ok, but what's the shape of those curves? I could believe that you can dither to adequate dynamic range and still have a high enough sampling frequency across the entire frequency range, but you'd have to actually do that calculation and show it. Also we don't just listen to pure tones - if I have a passage that includes both 12kHz frequencies and 4kHz frequencies with a bunch of dynamic range, are you going to be able to dither that without losing the high part?


Why does dithering sacrifice "effective sampling frequency"? You're just adding extremely small amounts of white noise to reduce quantization distortion or (in more advanced cases) noise with a power spectrum that puts the power of the noise mainly in parts of the audio spectrum that humans hear poorly.
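For concreteness, a minimal sketch of that plain (non-noise-shaped) TPDF dither-and-quantize step, assuming numpy and float samples in [-1.0, 1.0):

    import numpy as np

    def dither_to_16bit(x: np.ndarray, seed: int = 0) -> np.ndarray:
        rng = np.random.default_rng(seed)
        scaled = x * 32767.0                               # float -> 16-bit LSB units
        tpdf = rng.random(x.shape) - rng.random(x.shape)   # triangular noise, +/-1 LSB
        return np.clip(np.round(scaled + tpdf), -32768, 32767).astype(np.int16)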


This is not that (or it's an extreme special case of that); I'm talking about the thing the grandparent link is suggesting, representing amplitudes below 1 by having the sample sometimes be 0 and sometimes be 1. If you represent a waveform that should be 0.5 0.5 0.5 0.5 0 0 0 0 -0.5 -0.5 -0.5 -0.5 0 0 0 0 by doing 1 0 1 0 0 0 0 0 -1 0 -1 0 0 0 0 0, then yes you've increased your effective bit depth by 1, but you've halved your effective sample frequency.


What I described is what dithering always means in the context of audio applications.


I haven’t done the math, but I wouldn’t be utterly shocked if undithered 16-bit audio, cranked up some silly amount (such that full scale is 130dBA perhaps) has an audible noise floor.

This is consistent with my other comment about badly encoded MP3 being far from transparent.


It takes a certain amount of analog circuitry to bring a signal up to 130dBA full scale.

Doing the math on the analog bits, the engineering data indicates the analog noise will always end up greater than the 16-bit floor.

"One day" I built an analog preamp which had lower noise than a CD could reproduce anyway.


We’re talking about bad audio, though — setting full scale to 130dB doesn’t mean that the system can reproduce 130dB accurately or even hit 130dB at all.

In any case, while 130dB is a bit excessive, a highly efficient speaker (e.g. the old Voice Of The Theater) connected to a good modern amplifier and DAC could easily be cranked to 118dB with a noise floor that’s, at least in principle, inaudible in any normal room.

(I’m not saying this is a good idea. But seriously, check out the performance of the top amplifiers at audiosciencereview.com — these things have ridiculous performance and aren’t even that expensive. About 120dB SNR at over 100W is something you can just buy, for about $1500.)

(I’m also not claiming anything about linearity of the system or of people’s ears. But I can imagine 16 bits being put to better use in a well-considered floating point system than as plain linear PCM.)


Yeah, 18-20 bits make sense in loudly tuned cinemas.


There's nothing inherently weak about the fidelity of 16-bit audio on its own. PC audio subsystems don't deliver the full dynamic range on a single audio channel by default. They reserve headroom so that they can mix additional audio sources with less risk of clipping. Audio players that let you increase the volume beyond 100% are just letting you use the full range.

None of this is relevant to a real, dedicated music playback system that doesn't contain a digital mixer. You can't hear noise at -96dB. Your amplifier will swamp that with its own internal noise sources. In the 80s the audiophools loved to complain that CDs were too quiet because their beloved LP noise was supposed to be desirable for some whack reason.


Most professional (i.e. cinema or public address) amplifiers have an SNR of around 108db. A properly functioning amplifier is never a likely source of noise.


People talk about dynamic range compression all the time.


We AB tested 16-44.1 and 24-96 versions of some really good classical recordings recently - you need good listening equipment (ears and electronics) but undoubtedly the dynamic range and top end (particularly) sounded better. It really depends on the listener, the source, and the equipment.

A few years ago I did lots of AB testing with some Sony xm1000w3s (Sony LDAC) and Tidal Hifi with some 24bit masters and it was an incredible experience that changed my mind in the whole "640K.. 16bit is enough" argument.


I love how you can pull out 100 studies and side by side comparisons of recording tools/listening devices much more precise than the human ear that all show this as being flim-flam; and still "audiophiles" will convince themselves to spend 5-25k on specialty equipment that has no effect on their experience.

You're better off spending your money on a bog standard DAC/AMP (feel free to opt for tube even, if you insist) combo running through a pair of decent headphones off of a 320kbps MP3/AAC (or FLAC, if you insist) source. Even if we took your subjective insistence that this specialty equipment improved your experience by .00001%, it's probably not worth the 500-1500% increase in expense.

As to your specific example, I can guarantee you that your Bluetooth codec (LDAC or not) introduced far more sound artifacts than the difference between 16 and 24-bit sound.


Placebo is one hell of a drug and has an effect much larger than .00001%. It might not pass an ABX test, but it absolutely does sound better to those who want it to sound better.


I would be interested in those studies if you can link a few of them.


Was your test blinded? I guess there is a chance you are an outlier but blind tests like this one don't support what you are saying.

http://archimago.blogspot.com/2014/06/24-bit-vs-16-bit-audio...

>In a naturalistic survey of 140 respondents using high quality musical samples sourced from high-resolution 24/96 digital audio collected over 2 months, there was no evidence that 24-bit audio could be appreciably differentiated from the same music dithered down to 16-bits using a basic algorithm (Adobe Audition 3, flat triangular dither, 0.5 bits).

>Furthermore, analysis of those utilizing more expensive audio systems ($6,000+) did not show any evidence of the respondents being able to identify the 24-bit audio. Those using headphones likewise did not show any stronger preference for the higher bit-depth sample. No difference was noted in the "older" (51+ years) age group data (not surprising if there is no discernible difference even with potential age-related hearing acuity changes).


The 24-96 is a different master; some sound engineer just had a field day in the studio and produced a better mix. Repeat the test with a 16-44.1 version downsampled from the 24-96 version (use something like sox with the ultra high quality resampler) and I guarantee you will not be able to spot any difference compared to the "true" 24-96 version.
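If you'd rather stay in Python than use sox, something along these lines would produce the 16/44.1 comparison file (scipy's polyphase resampler, no dither applied; sox's "rate -v" plus its "dither" effect is the more careful route; file names are placeholders):

    from fractions import Fraction
    import soundfile as sf
    from scipy.signal import resample_poly

    audio, sr = sf.read("master_24_96.flac")             # the 24/96 source
    ratio = Fraction(44100, sr)                          # 44100/96000 == 147/320
    down = resample_poly(audio, ratio.numerator, ratio.denominator, axis=0)
    sf.write("test_16_44.flac", down, 44100, subtype="PCM_16")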


I understand the theory of this, but cannot reconcile it with an experience I had in person a decade ago.

In the acoustically prepped monitoring booth of his recording studio, a friend of mine tried to give me an ABX test of 24bit 96 kHz recording and its 16bit 44.1 kHz rendering that was supposedly done right. I heard the difference and easily picked the high-rate one that sounded more life-like. With my best effort, I described it as having a clearer high frequency spectrum, while the other sounded muffled in comparison.

I am left wondering if the 44.1 kHz file wasn't actually rendered correctly with dithering, or if my friend failed to actually get his studio equipment to play it back correctly. I.e., was some overly aggressive low-pass filter applied during the conversion or during the playback?


As you said yourself, it was most likely a rendering issue. A bad low-pass filter would have attenuated the high end when converting to 44.1kHz. Also, this is afaik the reason why all modern audio uses 48kHz: you get a little bit more headroom when designing a low-pass filter, and you can even choose a less aggressive and perhaps less computationally expensive one that still won't have an effect on the humanly perceptible frequencies.


I think the reason a lot of modern audio is at 48k has more to do with it being accompanied by video, which has independently settled on sampling rates of 48k, 96k, etc.


How do you know that the 24/96 and 16/44 came from the same masters? If this isn't controlled for then of course the result might be different.[0]

Also, what is xm1000w3s? I can't find any record of this so I'm guessing maybe it is referring to the WH1000XM3 headphones? Given ldac is also mentioned this seems a reasonable guess as it's a bluetooth model. If that's the case I wouldn't call it "good listening equipment", the default frequency response curve of the wh1000xm3 is incredibly bad, it's barely worth listening to classical music on without using AutoEq[1] or something equivalent (I have a pair and it's much worse than my old Ath M50s which were like half the price). The bass heavy curve of the headphones is far more noticeable than any difference between 16/24 bit audio would ever make.

[0] https://people.xiph.org/~xiphmont/demo/neil-young.html#toc_d...

[1] https://autoeq.app/


I'd rather trust solid hearing biology/physics plus all the other failed tests

> the effective dynamic range of 16 bit audio reaches 120dB in practice [13], more than fifteen times deeper than the 96dB claim.

> 120dB is greater than the difference between a deserted 'soundproof' room and a sound loud enough to cause hearing damage in seconds.

> 16 bits is enough to store all we can hear, and will be enough forever.

https://people.xiph.org/~xiphmont/demo/neil-young.html#toc_1...


>> the effective dynamic range of 16 bit audio reaches 120dB in practice [13], more than fifteen times deeper than the 96dB claim.

> 120dB is greater than the difference between a deserted 'soundproof' room and a sound loud enough to cause hearing damage in seconds.

> 16 bits is enough to store all we can hear, and will be enough forever.

Correct me if I'm wrong, but isn't 16 bit = 120dB about the levels of gradations of sound? Even 4 bits = 16 levels of sound pressure/SPL could go from 20dB, then 20+6.7≈26.7dB, and so on in equal steps up to 120dB.

Then, the important question is what's the minimum SPL difference perceptible (at a given SPL level). That may well not be 1dB.


That's not how it works. Each bit of sample size yields about 6db of SNR. If you amplify a source to 120db SPL that was recorded with 4-bit samples the quantization noise would be about 96db SPL.


Decibels is a logarithmic (and relative) unit, not linear. So each bit represents 6db or a doubling of amplitude.
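For reference, the usual back-of-the-envelope figure for the quantization SNR of an N-bit channel (full-scale sine, no dither or noise shaping) is:

    SNR ≈ 6.02 * N + 1.76 dB
    N = 16  ->  ~98 dB
    N = 4   ->  ~26 dB

which is consistent with the ~6 dB per bit and ~96 dB numbers above; the larger 120 dB figure in the xiph link comes from dithering and noise shaping rather than from the raw quantization step.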


We also can't see more than 60 fps according to so much research. And why would we want 10 bit screens?

I checked out the link, and the Sample 2 file does not represent any wave and is not audible, so the article contradicts itself.


What exactly do you mean by "see more than 60fps"? It's possible that 60fps video with full temporal antialiasing and low to moderate motion speed could fool untrained viewers, but if I'm allowed to move my eyes I can tell the difference between high frame rate video (simulated with strobing LEDs because of lack of suitable video hardware) and real-life motion well into the thousands of frames per second. This isn't an unusual ability:

https://journals.sagepub.com/doi/10.1177/1477153512436367

Note that 2kHz flicker requires 4000fps to be displayed as video.


I think people are also equating apples to oranges here. Vision is analog. There is no "DPI" or "FPS" that human vision can see. Some types of motion the human eye can perceive at thousands of "frames" and others it can only perceive at 60; in some colors (green) and contrasts it can distinguish extremely fine detail, and in others (blue) it cannot. Ultimately it's variable and non-digital so it's never going to equate to some strict terms.

The audio, on the other hand, that reaches your ears comes from an analog source, even if it ends up digital in between. There aren't some resolution arguments to be made here, all that matters is that the output device can accurately reproduce the proper analog signal. Which has been proven time and time again, and that any simplification of said signal is imperceptible to anything but the most finely tuned listening devices (or maybe some special "golden ears" that the vast majority of audiophiles don't belong to).


We would want 10 bit screens because the research indicates that the dynamic range of human vision is around 90 dB or 1:1,000,000,000, which is alarmingly higher than even 1:1,024

https://en.wikipedia.org/wiki/Dynamic_range#Human_perception

If all research is wrong, I'm gonna start drinking vinegar and building perpetual motion machines :P


According to Pantone, "Researchers estimate that most humans can see around one million different colours". So research says we only need about 7 bits per channel.

"Research".. sponsored by corporations, and peer-checked by scientific voting rings. A bunch of incrowd elitists who like to use jargon. Science and politics these days are pretty similar


The million or so is probably how many different hues we can see. We can see many more different brightness levels.


The article that mentioned it already includes all the different components, lightness etc


Are you both talking about the same thing? Is dynamic range the same thing as 'number of colors'?


I think a lot of your objections to 'science' are due to basic communication misunderstandings and taking things you heard second hand at face value as 'science'. It would probably help to decouple yourself from the notion that a pared down snippet heard from a journalist or on a website is actually what the studies are saying.


Where does this “can’t see more than 60 fps” rumor come from?

It’s trivially refutable by placing a 60 Hz strobe (e.g. old fluorescent light or even some aftermarket headlights) at the corner of your vision.

Also, for interactive systems, 16 ms is a large chunk of our reaction time. You need close to 1 ms response times (1000 fps) to approximate pen and paper.


I don’t know where it came from.. it was already there in the CRT times.

A simple google on 60 fps will still show these “scientists” who claim that we can't perceive anything higher than 30-60 fps.

“Science” does NOT equal truth.


You seem to be the only one claiming this bit of 'science'. No one else has heard of this claim.


With CRT monitors, different refresh rates were really easy to spot - 60 Hz was very flickery.


Yeah, 60Hz on a CRT was more or less the minimum tolerable refresh rate, and 75-85Hz was noticeably better. And that's just for trying to display a static image without distracting flickering. Displaying smooth motion is a lot harder.


Try to do better than a simple google; maybe you'll actually stumble on real science, which would help you understand the difference between the linked claims about hearing and yours about vision.


The topic of human vision and perception is complex enough that I very much doubt it's scientists who are making the claim that we can't perceive anything higher than 30-60fps. There's various other effects like fluidity of motion, the flicker fusion threshold, persistence of vision, and strobing effects (think phantom array/wagon wheel effects), etc, which all have different answers. For example, the flicker fusion threshold can be as high as 500hz[0], similarly strobing effects like dragging your mouse across the screen are still perceivable on 144hz+ and supposedly 480hz monitors.

As far as perceiving images goes, there's a study at [1] which shows people can reliably identify images shown on screen for 13ms (75Hz, the refresh rate of the monitor they were using). That is, subjects were shown a sequence of 6-12 distinct images 13ms apart and were still reliably able to identify a particular image in that sequence. What's noteworthy is this study is commonly cited for the claims that humans can only perceive 30-60fps, despite the study addressing a completely separate issue to perception of framerates, and is a massive improvement over previous studies which show figures as high as 80-100ms, which seems like a believable figure if they were using a similar or worse methodology. I can easily see this and similar studies being the source of the claims that people can only process 10-13 images a second, or perceive 30-60 fps, if science 'journalists' are lazily plugging something like 1000/80 into a calculator without having read the study.

There's also the old claim [2] from at least 2001 that the USAF studied fighter pilots and found that they can identify planes shown on screen for 4.5ms, 1/220th of a second, 1/225th of a second, or various other figures, but I can't find the source for this and I'm sure it's more of an urban legend that circulated gaming forums in the early 2000s than anything. If it was an actual study I'm almost certain perception of vision played a role in this, something the study at [1] avoids entirely.

[0] 'Humans perceive flicker artifacts at 500 Hz' https://pubmed.ncbi.nlm.nih.gov/25644611/

[1] 'Detecting meaning in RSVP at 13 ms per picture' https://dspace.mit.edu/bitstream/handle/1721.1/107157/13414_...

[2] http://amo.net/nt/02-21-01fps.html


"you need more than anecdotal evidence"

"have some anecdotal evidence"


Apologies, that flippant comment was super rude and juvenile and I am taking it as a signal that I really need a break, I don't normally speak in this way. Yikes.

Also totally fair, I shouldn't be hastily writing comments but I am interested in audio and wanted to share my experiences, naively.

I have some 'hires' recordings that I can't tell apart from a CD at all and do not care for, and some where I hear more details (on the very high end) and more separation between instruments - and from what I'm reading it seems more like this is a mastering issue. The difference on some of these recordings enables a kind of subjective 'holographic' spatial effect in me (perhaps the cause of my emphatic response) and it seems I have probably falsely attributed this to the higher resolution as the factor.


i appreciate your apology and i didn't mean it as an attack on you personally, it was just meant to highlight the irony. perhaps i could have phrased it more sensitively


These days "good equipment" unfortunately means:

- Sonos

- Airpods

- Beats


All with Bluetooth compression...

For that price range, Hifiman produces pretty good planar headphones. The edition XS sounds really good.


Things that are slightly louder "sound better". How did you control for this sort of thing?


Sure, but that's also easy to normalize in a proper test


Why AB and not ABX?


Because the base rate is 50% in an either/or test


But you don't necessarily know they're guessing what you're testing for in the A/B test. If they are answering which one sounds better, some songs will genuinely sound better with a little lossy compression. Did you check to see if any audio samples deviated from the base 50% rate in either direction? For example if 70% of people chose the compressed version of Audio Sample 15, that still demonstrates an ability to discern a difference. It just turns out they like the lossy sample more.

For a contrived example, imagine an A/B test where you have to tell me which image has more red. Image 1 is a dark red panel on the left and a fully bright white panel on the right. Most people would say the left is more red, but in my fictional test it is actually the white panel because (100, 0, 0) has less red than (255, 255, 255).

If you use ABX, people know exactly what they are supposed to be matching.


If you know what to listen for in the high frequencies, you can pick out mp3's pretty easily, provided said mp3's have that sort of content.

The question really isn't 'can you tell', it's 'does it matter', and, well, most of the time, no, it does not, even for lower bitrate mp3's.

There are many people, of course, that don't like the idea of lower quality audio, and they can tell at least sometimes, so they 'dislike' mp3 in general. That's all well and good until they start saying silly things like 'mp3's sound bad', which is not true in any sense.


I’ve personally ABX tested people who swore up and down that they could spot 128 AAC, even, and they couldn’t. Never found anyone who could. I know they exist, but they are rare, and probably not the folks who say they can.


Some of it is people ask the wrong questions. On a loudness-war-wrecked pop song I may not be able to tell 128Kbps from the original, but on specific content I have been able to tell. I'm not even claiming golden ears or anything; some specific audio content is the audio equivalent of visual confetti [1], and anyone can hear the difference, because the codec isn't even close. And let me underline, I mean, anyone. No special claims being made here.

But all in all, that content is relatively rare, and generally transient even in the music they appear in.

[1]: https://www.youtube.com/watch?v=r6Rp-uo6HmI


The giveaway for low-mid nitrate MP3 is the high hats. The lower the nitrate, the more you get a sort of temporal ghosting that sounds like an almost “crunchy” swishy sizzle sort of sound, a bit like a jazz player using brushes, but more lo fi.


I agree, and my hypothesis is that it's exacerbated by the combination of three particular things:

1. It's a high frequency complex waveform with a fast envelope, so it demands bitrate.

2. Drum miking often involves multiple mics spaced apart, so more than one typically picks up any given cymbal with a phase offset, and those mics are panned quite differently, leading to a very "wide" result, i.e., left and right output is fairly uncorrelated as seen on a vectorscope [0].

3. A perceptual codec at a given total bitrate often sounds better when stored as a mid-side transformation (instead of storing a left channel and a right channel, store a L+R "mid" a.k.a. sum channel and a L-R "side" a.k.a. difference channel), also known as "joint stereo" which is a common flag on MP3 encoders, because it allows for assigning more bits to the mid channel (correlated signals) and fewer bits to the side channel (uncorrelated signals). More bits for mono center-panned stuff like vocals is the goal, which is generally for the best, but fewer bits remain available for wide stuff like those cymbals! Contrast with regular stereo mode where half of the total bitrate is assigned to each channel. MP3 below 256kbps typically needs joint stereo mode enabled in order to sound decent.

[0] https://en.m.wikipedia.org/wiki/Vectorscope#Audio
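To make point 3 concrete, the mid-side transform itself is tiny; a numpy sketch (shapes and names illustrative), after which the codec can spend more of the bit budget on the mid channel:

    import numpy as np

    def ms_encode(stereo: np.ndarray) -> np.ndarray:       # stereo: (frames, 2)
        left, right = stereo[:, 0], stereo[:, 1]
        mid = (left + right) / 2        # correlated content (vocals, bass, kick)
        side = (left - right) / 2       # uncorrelated "wide" content (spaced cymbals)
        return np.stack([mid, side], axis=1)

    def ms_decode(ms: np.ndarray) -> np.ndarray:
        mid, side = ms[:, 0], ms[:, 1]
        return np.stack([mid + side, mid - side], axis=1)   # back to exact L/R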


A realistic take on the issues involved. I never knew what “joint stereo” meant. Great explanation.

If anyone has a good cymbal crash sample at 24/96 or better that they can provide, it seems like it would be a great example for intentional differentiation of various compressed versions.


Vinyl actually encodes stereo much the same way.


FM radio as well: originally one channel, then a second was added, and the second is used for the difference signal (the first already being the sum signal, as with any conversion to mono). Overhauling the whole thing to broadcast left and right discretely would destroy backwards compatibility, and the newly-added subcarrier had worse SNR (thus receivers ignore it until reception is sufficiently strong) so it only made sense to use it for the difference.


Low nitrate MP3 is a fantastic typo.


I was going to say this - cymbals are often very noticeably bad on MP3 recordings.


Well, most classical songs are very compressible, because not much is going on. Punk rock or any other music where a lot is happening at the same time can suffer very audibly from 128 kbit lossy compression. So you can hear lossy compression better in a loud pop song than in other music.


I don't agree with this from my own experience. To me, classical music at high compression suffers far worse than modern bands.


My unscientific guess would be that classical music might have wider dynamic range than “normal” music. So the same compression amount affects the one with more range first (classical).


Higher dynamic range and typically also more 'pure'. The introduced compression artifacts stand out more in simpler waveforms than in waveforms that are an addition of many more layers of sound.


Your guess would be correct [1]. Some samples [2].

[1] https://www.mdpi.com/2624-599X/4/3/42

[2] http://www.harmjschoonhoven.com/mp3-quality.html


A typical classical symphony requires 50-80 instruments, several dozen of which might play at the same time, while a typical punk song has maybe four.


Compression algorithms don't care about instrument counts, they care about the complexity of the signal. So there being a total of maybe hundreds of instruments doesn't matter, it's only about how many are playing at any given time. There might be a few dozen instruments playing at any given moment in a classical recording, but they're all highly tuned instruments with several of each one playing just about the exact same note.

Grungy rock music might only have a few instruments, but they're often purposefully highly distorted and have people pretty much screaming and shouting, leading to the actual signal being closer to literally noise.

So the closer you are to literally noise, the less compressible your signal is.

Imagine an image with a dozen sharp, clear, colorful squares. Now imagine a similar resolution image with only 5 colors, but they're different shapes and they're kind of fuzzy and they're really more like gradients instead of a pure color. Which is going to compress more easily?


This is an interesting argument, but not convincing. You are essentially claiming that the information content of a classical recording is low, and could be replicated with fewer, simpler instruments. However, I suspect that many of the interesting subtleties that make a recording sound “beautiful” would be lost that way. Surely such nuances are why classical musicians are so passionate about their instruments.

(FWIW, I’m way more of a punk fan myself, and usually find most classical music pretty boring.)


> You are essentially claiming that the information content of a classical recording is low, and could be replicated with fewer, simpler instruments.

No, I do agree having multiple instruments does lead to a wider sound than just a single one. Plus multiple instruments will probably help balance out a single one being out of tune or not quite hitting the note right. But still, these instruments usually are way more tuned to produce closer to pure tones and their harmonic overtones than a guitar going through half a dozen different distortion and effect pedals then through a compressor along with a guy screaming all over the place into a microphone.

Also, when strumming a guitar you're almost always playing essentially six strings at once, playing a whole chord with only one instrument. Meanwhile on a flute or a trumpet or a clarinet or a violin a single player is only playing a single note at a time, a single string of the guitar. So during strumming sections a guitar is almost like 6 instruments, in terms of signal complexity. So a rhythm guitar strumming and a lead guitar picking strings is really almost like 7 instruments played by two people compared to many orchestral instruments.

Just look at these two spectrograms. Look at the rock song where there's a lot of distorted guitar, bass, drums, and singing going on and compare that to an active part of the classical recording. See how the classical recording has a lot more clean, straight lines while the rock song is a lot more fuzzy? Imagine if these were images, which would be more complicated to accurately compress? That's not really a great analogy, but it is touching on the same concept.

Rock song: https://youtu.be/BVsp23B8dWo?t=62

Classical song: https://youtu.be/Txp-pHU2K6w?t=210


128k AAC is quite good, and is roughly akin to 160k MP3. Personally, past 160k on MP3, it gets very hard for me to distinguish bitrates, so I ripped at VBR, averaging at around 200k.

128k MP3s, though, fall apart with more complex instrumentation.


And 128k opus is perfect to my ears. I store all my music in the best possible quality FLAC files, but stream it to my phone in 128k opus. Such a great format, encodes very fast even with my Intel atom and sounds great.


I’m curious as someone with a local library, how do you stream it in a different format than it’s encoded in, from your phone?


Lossy to lossy transcodes always cause more degradation, so it's better to keep your lossy files in their original format without any re-encoding.

Easy transcodes to various lossy formats are the primary reason for a well maintained and curated library of proper FLACs (or ALACs).

The main advantage being that you can fit more great sounding music on your space-limited device. Disk space is cheap, but upgrades on your phone are probably not. Even if your music collection fits easily, why not have more room for other media / apps?
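If anyone wants to script that FLAC-master-to-lossy-copies workflow, a rough sketch assuming ffmpeg is on the PATH (paths and bitrate are placeholders):

    import pathlib
    import subprocess

    SRC = pathlib.Path("~/Music/flac").expanduser()
    DST = pathlib.Path("~/Music/opus").expanduser()

    for flac in SRC.rglob("*.flac"):
        out = DST / flac.relative_to(SRC).with_suffix(".opus")
        out.parent.mkdir(parents=True, exist_ok=True)
        if not out.exists():                                # skip already-done files
            subprocess.run(["ffmpeg", "-loglevel", "error", "-i", str(flac),
                            "-c:a", "libopus", "-b:a", "128k", str(out)], check=True)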


I used to use Subsonic and one of the many clients. I don't know if Subsonic is still an option, but there are many server side applications, including a few forks, that fill that role. There are others that work with those same Subsonic clients.

It will allow you to choose a codec for streaming that is different than storage.


Plexamp does this automatically if requested.


There's probably an element of the quality of the DACs and speakers you're using too. If it's a subtle difference it's unlikely we're going to notice it being played through some low-end computer speakers.


I have very good equipment and lots of people in my circle who are audio enthusiasts. None of them have ever been able to demonstrate in a blind test that they can tell the difference.


"You" is excellent wording here. Who's listening?

I've heard the reasonable assertion that the most gifted audio engineers in the world cannot distinguish 192 kHz sample rates from a raw line feed, but some can distinguish 96 kHz. I certainly can't. I used to build audio equipment. There were legendary "golden ears" that people would drive hours to meet, for design feedback. Whatever they heard was reproducible blind with other "golden ears".

How does this square with the logical assertion that there's a sharp cutoff to our biological ability to hear frequencies? Those tests don't account for our ability to sense the presence/absence of overtones.

Now, computers may want better resolution, not to "listen" so much as to better transform in novel ways. Just as "DogTV" recolorizes for its audience, a computer could make the inaudible audible in novel ways. Reconstructing better 3D sound stages. Accurately reconstructing a singer's facial expressions via AI video, rather than simply splatting out something plausible. The whole point of computers is to extend our reach, coevolving in every conceivable way.


>Those tests don't account for our ability to sense the presence/absence of overtones.

Overtones are frequencies. If you filter the higher frequencies you remove the higher overtones. The only way you're going to "hear" ultrasound is if it's very loud and you hear audio-frequency distortion generated in your own ear, e.g. like when bats are squeaking nearby. But this isn't useful musically, because everybody's ears distort differently.


Ok, so despite anecdotal evidence that some individuals can distinguish better-than-CD quality audio, we're questioning the existence of convincing double-blind studies. Yet we accept the varying cutoffs for what frequencies a person can consciously detect in isolation, as proof that we are incapable of perceiving audio information above those frequencies.

Are people asserting that an ear removed from a cadaver, hooked up to the best available scientific equipment, measures as a perfect biologically derived low pass filter? Or that we even partially understand how neurons work, when there may be quantum effects to be uncovered a century from now?

Intellectual history is a graveyard of models confused with reality.


Your first point stands up to experimental scrutiny, but your second needs qualification: Anyone can be trained to pick up the differences between 320kbps mp3 and lossless formats.

Compression kills the high end, and learning to recognize tell-tale compression artifacts will forever ruin your ability to appreciate streamed music, low-bandwidth wireless audio systems, or just 320kbps rips of music, certain genres faring worse than others.


That is absolutely not true. People can be trained to hear that difference when the testing is not blinded, however in triangle tests that ability vanishes.


Maybe not anyone. I could be biased with sensitive hearing.

I know that in grade school, part of the requirements for joining band, due to the size of my school and overwhelming demand, was passing an audiometry test, where we were evaluated on a few different contexts related to ability to discern detail in audio, such as pitch and volume. I remember being pulled away into the principal's office where some of the test administrators were present, and they accused me of cheating and demanded to know how I did it.

Apparently, I was the only student in the entire state to get a perfect score on that test, at least for that particular year. Unsure if they were implying I was the first ever, but that seems ridiculous to me because passing the test boiled down to just paying close attention.

So I really don't know. Maybe the average person can't hear it, but I know just what to look for in the high-end and usually guess even 320kbps mp3 correctly from my own self-tests wherein I would randomly select between different encodings of a music file. I'm confident I would do well in an administered ABX test if I'm simply being tasked with finding the difference between mp3 lossy encodings and a lossless reference.


I don't claim to have golden ears but I am a picky listener. I've been pretty satisfied with good MP3 encodings like 320kb/s though I'm not sure I'd agree they're indistinguishable.

I had an experience I mentioned elsewhere in this discussion, about hearing a difference between 24bit 96 kHz and CD-quality in a sound studio. I don't know whether the operator screwed up or what. It doesn't mean I don't think CD quality is "good enough". But that's not the same question as "non-discernable".

On the other end, I have had a few encounters with satellite radio (SiriusXM) and I cannot understand how that product exists. In each car I've encountered it, I've had a visceral reaction and wanted it turned off after a few minutes of trying to listen to it. It put me somewhere between anhedonia and dysphoria.


I'm not sure I get you. If someone can't tell the difference in a blind test, that means they can't tell the difference. The result of a non-blinded test is of no consequence.


You said the same thing that I said but with more words.

There are a ton of people in the audiophile world who swear that knowledge of what you’re listening to makes a physiological difference and therefore it’s not the placebo effect. It’s insane, but then so is most audiophile marketing.


There is a clear distinction between audiophile snake oil, and detecting artifacts from a lossy compressed mp3.

You shouldn't conflate that with things like telling the difference between a 44.1KHz and 96KHz sample rate, which is bollocks.

That said, it's a good practice for any archivist to source the highest quality digital versions of albums possible, in case they end up befriending or needing to barter with aliens with much more sensitive ears.


Are we talking mp3 or other codecs here? mp3 has a couple of encoding “tells” that a trained listener can pick up on (although it gets increasingly hard to do so beyond 128kbit IIRC). Other codecs don’t even have those & at higher bitrates people can’t pick them out in blind listening tests.


Also many encoders have things like a lowpass filter even on their higher presets - some people (often younger) may be able to hear those frequencies.

It may be that two different people have different results in being able to tell the difference due to physical differences, no matter the "effort" they put into listening, or training.

But most of the time that's intentional, as high frequencies you "can't hear" just waste bits that could be used on something more useful, or even cause noise from aliasing and harmonics, as no actual playback equipment is perfect. And extending the frequency range makes it much harder to design the circuitry. All for something you can't hear :P

It's another reason why high frequency (96kHz) playback is kinda useless, or even makes things worse, as the extra frequency range it gives cannot be heard by humans anyway, and just gives more opportunity for those higher frequency patterns to cause distortion. It may even be that people can "tell the difference" precisely because of those distortions. But that doesn't mean it's "better", intended, or will even give the same result on different playback equipment.


Yeah, can't argue with the Nyquist theorem.


Even 128kbit from a modern encoder is harder to pick out than it was 25 years ago. Most of the self-appointed experts proclaiming how woefully inadequate MP3 is are basing their assessment on outdated experience from the distant past.


You can literally point to artifacts in a spectrogram which exist within the range of human hearing with many lossy codecs. It doesn't even have to be subjective.


Sure - these codecs lose information. The clue is in the name!

The question is can you hear those differences, to which the answer is basically no, for a modern codec at high enough bitrate. "enough" is 128kbit or more in blind listening tests.


The answer is not "basically no", and the qualifier of the initial argument was specifically 320kbps mp3.

No one here is arguing that modern, high-bitrate codecs aren't much better at producing imperceptible artifacts. But 128kbps is absolutely not enough in blind listening tests; 128kbps typically produces a ton of perceptible artifacts. You're just making that up.


Yes, I should have caveated that: 128kbit mp3 from a decent encoder is transparent to average listeners.

To a “trained” ear that is listening out for the differences? Sure, those are perceptible. But those aren’t normal people.


> mp3 has a couple of encoding "tells" that a trained listener can pick up on (although it gets increasingly hard to do so beyond 128kbit IIRC).

AIUI there are some things that don't go away at any bit-rate, e.g. pre-echo.


mp3, I mentioned it specifically for a reason.


Agreed on the 44.1k 16-bit is all you need as a delivery medium for a final audio product.

However, higher bit depths and sample rates are needed for multitrack recordings so that fidelity is preserved during the mixing and mastering stages, when _math_ causes rounding errors and whatnot. Unless you are using nondestructive editing.
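
A quick way to see the rounding-error point for yourself - a minimal sketch in Python/NumPy, where the noise "track", the 0.91 gain and the 24 round trips are arbitrary stand-ins rather than anything from a real session. Re-quantizing to 16 bits after every gain change accumulates error, while keeping the intermediate math in floating point and quantizing once at the end does not:

    import numpy as np

    rng = np.random.default_rng(0)
    track = 0.1 * rng.standard_normal(44100)      # stand-in for one second of a track

    def q16(x):
        """Quantize to 16-bit integer steps and back to float."""
        return np.round(np.clip(x, -1, 1) * 32767) / 32767

    # Path A: apply a gain change and undo it 24 times, re-quantizing to 16 bits each time
    x_int = track.copy()
    for _ in range(24):
        x_int = q16(x_int * 0.91)
        x_int = q16(x_int / 0.91)

    # Path B: same processing kept in floating point, quantized once at the end
    x_float = track.copy()
    for _ in range(24):
        x_float = (x_float * 0.91) / 0.91
    x_float = q16(x_float)

    ref = q16(track)
    print("error with 16-bit intermediates:", np.max(np.abs(x_int - ref)))
    print("error with float intermediates: ", np.max(np.abs(x_float - ref)))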

As for listening to the final product, i.e. store bought CDs and their equivalent MP3 and AAC rips... I can often hear the difference in specific recordings, no matter the bit rate, because the perceptual encoding schemes often butcher certain recordings.

For example, on RUSH's Red Barchetta from Moving Pictures, there is a synth intro that slowly vamps in volume. Every MP3 encoder (normally worth its salt) I've ever tried encoding that with outputs a garbled, electronic-sounding distortion. It clears up immediately once it reaches full volume, but during the crescendo it falls on its face.


The MP3 study you referenced uses flawed methodology:

> participants were asked to listen to both versions as many times as needed and to choose the version they preferred in a double blind A/B comparison task

I have trouble finding high quality studies comparing 16-bit to 24-bit audio. This one is kind of interesting:

https://www.researchgate.net/publication/338989993_Study_on_...


> No, no one even has the biological ability to. 44100khz, 16-bit audio can perfectly reproduce audio as far as we can physically tell. The only reason to store anything higher is for production or archiving (that is, for computers to listen to).

I'm not an expert, but one claim I saw somewhere is that a higher bit width and sample rate is good for people who are mixing and doing audio processing, even where the final result might get downsampled to 44100 hz and 16 bits per sample at the last stage.


24-bit fixed and 32-bit variable/floating-point masters have more headroom, which may avoid clipping but doesn't guarantee it. 48 or 96 kHz is useful for time stretching and maintaining fidelity (and maybe for other post-processing without aliasing).

Those are all intermediate formats and don't really say anything about what is best for consumers, like the standard mastered CD quality at 16 bit 44.1 kHz.

Bandcamp is a cool market because I can download wavs from albums to store on my phone. You can see what people use as masters and it's all over the place. There are many 96kHz masters around and 24 bit depth is popular.

I have a USB audio IO that supports 192kHz across 8x in+out. Those files just clog up hard drives, so I figure 96 is good enough for bat music.


Also, I'll note that I think the amp and speakers contribute far more than the master file format. And the quality of the master, mix and tracking even more so.

I’ll run youtube rips of dj sets through some light hardware compressors and preamps and it sounds great. You cannot have specs determine quality.


that's what 'production' in the quoted passage means.


Yes, that's for antialiasing headroom purposes during the production process.


The thing is, the "heard" difference between 320kbps MP3 and CD quality, and between CD quality and higher resolution formats, is not "details" per se.

The audible difference can be described as sound stage size, instrument separation and atmosphere.

The problem is making these details audible needs a good system. “Good” doesn’t mean $10K+ here. Two high quality, large-ish two way bookshelf speakers and a good amp with enough punch (50W+ Yamaha or similar will do) plus a good source in a sizable room is enough.

There'll be people who can't tell any difference, there'll be people who can "feel" it, and there'll be people who can pinpoint differences. This is because of the ear training and biological limits of said people.

I have a friend who can pinpoint a half-step (natural vs. sharp) mistake in a 90+ player symphony orchestra from YouTube recordings, incl. the instrument. His natural ear resolution is around 1/9th of a tone. He always tunes his instruments by ear and verifies with a tuner. So, this is not impossible.

My ears are not that absolute, but I can divide music to layers and pinpoint details, for example.

Lastly, taking a “diff” of CD quality and 320kbps MP3 version of the same track will leave an audible residue.
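
If anyone wants to try that "diff" themselves, here is a rough sketch of a null test in Python (assuming numpy and soundfile are installed, that the filenames are placeholders for your own rips, and that the decoded MP3 has already been time-aligned with the CD rip - in practice MP3 encoders add delay/padding that you have to trim first, so treat this as an outline rather than a finished tool):

    import numpy as np
    import soundfile as sf

    cd, fs1 = sf.read("track_cd.wav")               # CD rip (placeholder filename)
    mp3, fs2 = sf.read("track_mp3_decoded.wav")     # same track decoded from 320kbps MP3
    assert fs1 == fs2, "sample rates must match before diffing"

    n = min(len(cd), len(mp3))
    residue = cd[:n] - mp3[:n]                      # the "diff" between the two versions

    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    print("residue level: %.1f dB relative to the original"
          % (20 * np.log10(rms(residue) / rms(cd[:n]))))

    sf.write("residue.wav", residue, fs1)           # listen to what the encoder discarded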

There are other comments I left over the years here. Search them for more info. I’m on mobile. I have no practical way to link all of them.


This is the kind of snake oil that double blind studies in controlled reference environments have established is snake oil.

Like, for example, the fact there is an "audible" diff is meaningless. The threshold of hearing is neither linear nor frequency independent. This is called "masking" and it's exploited by lossy codecs to allow for better encoding as well as audio watermarking. You can add noise that would be perceptible by itself to content, and have it be entirely masked by that content. And the reverse is true: you can remove content without it being perceptible.


This is the first reaction I always get: "This is snake oil, and you don't know what you're talking about". While there are undisputed snake oils in the audio and audiophile market, the difference I'm talking about is not one of them.

The idea of lossy codecs is that they filter out the things you theoretically don't hear, yes. However, the presumption that you don't hear these things when they are present is not completely true, because they have second-order effects on the overall sound.

The audible residue you claim that I don't hear is, when it's in the CD, the part of the sound which adds this instrument separation and soundstage expansion. The same goes for higher sampling rates. While you can't pinpoint the difference with words, it shows itself as smoothness and a "richer" sound.

Saying that you can't hear that difference is akin to saying "Human eye can't see faster than 30/60/X FPS anyways", which is not true.

Anyone presented with a lossy-encoded audio file produced with a state-of-the-art encoder from material that isn't brick-wall mastered will be impressed, yes. This includes me, too. However, whenever you listen to the same file in lossless or, if available, higher resolution formats, on a sufficiently transparent audio system a couple of times, you start to notice the differences.

There are a couple of caveats in all of this audio business. First of all, you need to know how your audio system sounds and behaves to be able to discern differences. This requires time with the same system, over a long period, to understand how it responds. In my case, I have had the luck of using the same amplifier (an AKAI AM-2850) for ~30 years. I know how that thing responds to any genre of music, and I know how anything should sound at any quality level. Again, as I mentioned, you need to do these ABX tests a couple of times back to back, esp. if you don't know the track, to be able to decode the details sufficiently. Digitalfeed's ABX test (https://abx.digitalfeed.net) understands this and makes you listen to the same thing 5-10 times, according to your available time.

See, I'm an ex-orchestra player. I played in concerts, listened to master recordings, and to YouTube uploads of the concerts I played as well. I have also listened to tons of CDs, and to MP3s of the same albums, etc. Some of the albums I listen to have a captivating sound when I play them from CD. The MP3 versions of the same albums do not nail me to my chair, yet I can't leave the CD version of the same album to get a cup of tea. Both are run through a Yamaha CD-S300 CD player with an iPod interface and MP3 playing capability over USB.

I can also write about how CD tracking quality affects audio clarity, but this comment is long enough. In short, Yamaha's old CD recorder, the CRW-F1, really improved sound quality by abusing the Red Book standard and lengthening the pits of audio CDs. It reduced the capacity to 68 minutes, but it was worth it, esp. on lower end CD players.


> I can also write about how CD tracking quality affects audio clarity, but this comment is long enough. In short, Yamaha's old CD recorder, the CRW-F1, really improved sound quality by abusing the Red Book standard and lengthening the pits of audio CDs. It reduced the capacity to 68 minutes, but it was worth it, esp. on lower end CD players.

Sorry this one part especially makes no sense. Digital is digital. Either it added more samples per second, or more bits per sample, or it's snake oil. There's a stream of bits that comes out of the reader. There's no residual information about the length of the pits.

EDIT: yeah, sorry, it is completely and utterly impossible that you are getting better "audio clarity":

"Yamaha tries to attract computer enabled audiophiles with the Audio Master technology. Audio Master promises reduced jitter and decreased error rates for audio recordings via extended pit and gap sizes on the CD-R. This is actually quite simply achieved by increasing the disc rotation speed vs. the laser clock frequency. In other words, Audio Master recording at 8x rotates the disc at 8.2x, thus creating the extended pit & gap lengths. This naturally reduces the capacity of the disc."

Literally, they are just spinning the disc faster, reducing capacity and making it slightly less likely that errors will be read. If you're getting read errors on playback, that means your disc is dirty or your CD player sucks. It's the same bitstream, just read at a different linear rate.

If you honestly believe that this is an audiophile concern, I'd urge you to reevaluate a lot of your other beliefs, because they are clearly not all grounded in technical facts.


The problem is not the values delivered to the DAC. It's the aperture error of when the DAC was clocked. This affects what those sample values actually represent. If the clock is being extracted from the pattern of pits and there's a way to reduce the jitter, you will get a more accurate signal.

However, I'd hope that anyone that cares about fidelity has a CD player that does a little more to generate a DAC clock. NCO run by a software PLL or a hardware PLL with a good loop filter are things I've heard of, but control systems is not my specialty.


CRW-F1 didn't encode more information into the pits. It allowed lower end DACs to have more time to switch properly by giving them a slower signal stream within acceptable limits.

DAC's digital part is easy. What differs in quality is the analog part. If DACs were that simple, a 25 cent DAC would power every unit from bottom bin to top tier.

Before that Yamaha CD player I had, I used a lower end Sony CD player (I don't remember the model, sorry). Writing the same album, to the same brand of CD-R, at the same speed, in the two different modes created two audibly different discs.

I sometimes challenged myself by writing in both modes, not marking the CD-Rs, and the audio difference was always audible, even after weeks. 68 minute CDs always had larger sound stages with more clarity and instrument separation. This is again on the same AKAI AM-2850 amplifier.

I guess this difference would be impossible to hear today, because higher end units have better tracking and better DACs. Also some of them use DAE and use multi-second buffers, so the "slower stream" is no longer present in the pipeline due to buffering.


A CD signal stream fed to a DAC is a 44.1 kHz 16-bit signal, period. All you did was force the drive to spin a bit faster to keep tracking, or let it fill its buffers more slowly if it spun at the same speed. The buffers, after error recovery, are what feed the DAC. Assuming an error-free read on two discs burned with the same data (regardless of "pit length", disc material, etc), you get the same bits in the buffers.

There's no "slower bitstream" for the DAC. That's provably nonsense and you can work it out from basic principles. The same bits would come out of the optical interface of a CD player, at the same rate, either way. If the CD player has a built-in DAC, the same bits would get fed to that same DAC either way.

I'm sorry, but if this is truly what you believe, it really puts everything else that you said into question.

To give you the benefit of the doubt, I might say that the lenses or lasers on your CD players are filthy, and you're just hearing skips or noise from poor reads, and that a slower-written, borderline-spec disc might just allow them to function better. Perhaps your player was interpolating or concealing frames [1] that it couldn't read correctly and failed to correct via ECC, and you were just hearing a poorly reconstructed digital data stream.

This sort of confident incorrectness, ignoring the underlying technical architecture, is probably why people don't believe anything that an audiophile says.

[1] https://www.pearl-hifi.com/06_Lit_Archive/02_PEARL_Arch/Vol_...


I don't know much about the Yamaha technology referenced, but he's not totally off his rocker.

CDs don't record 0s as pits and 1s as lands. A change from pit to land or land to pit is a one, and no change over a time base is a 0.

Therefore, the recording and tracking performance can be affected by the disc content and processing applied.
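
A toy sketch of that rule, ignoring the real EFM modulation, merging bits and error-correction layers on an actual CD - just the "transition = 1, no transition = 0" idea described above:

    def transitions_to_bits(track):
        """track: pit/land state sampled once per clock period,
        e.g. ["land", "land", "pit", "pit", "pit", "land"].
        A change between consecutive periods reads as 1, no change as 0."""
        return [1 if a != b else 0 for a, b in zip(track, track[1:])]

    print(transitions_to_bits(["land", "land", "pit", "pit", "pit", "land"]))
    # -> [0, 1, 0, 0, 1]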


Sure, I can understand how slightly changing the layout of the pits and lands can potentially reduce the error rate. But if I'm already getting playback with 0 C2 errors, there is absolutely no change in quality from the original signal.


Advanced Audio Master visibly reduces C1 errors in recorded media, to almost negligible levels. While C1 errors can theoretically be corrected without any degradation, the result depends on the CD player's capabilities (and quality).

CDRInfo's tests back in the day showed dramatic improvements in C1 levels, see [0]. Considering some of the lower end CD players by leading manufacturers didn't even have 16-bit audio decoding and used late-stage oversampling, reducing C1 errors was/is a big deal in recorded media.

As I said in my earlier comment, this mode led to clearly audible improvements in my older, low-end Sony CD player. I don't know how it will fare in my new Yamaha player due to technology improvements.

[0]: https://www.cdrinfo.com/d7/content/yamaha-crw-f1e-cd-rw?page...


Sure, the C1 rates are lower here for one kind of media in one actual comparison, but C1 errors below a certain rate are entirely correctable. Meaning, you're not going to lose data when it gets reconstructed. You are not losing fidelity of the CD with a C1 rate in the 20s to 40s; every bit decoded is being played back exactly as originally encoded. Otherwise reading data off a CD would always produce significant amounts of corruption every time you installed software. Every time you opened a photo off a photo CD you'd get all kinds of JPEG corruption.

Assuming you have a properly functioning CD player, these errors have zero input in the quality of a CD being played. If you have a noticeable difference in audio quality between a CD with a max error rate of 24 or 32 C1 errors, you've got a tremendously faulty CD player that is complete trash.

It's funny too, because in this table it shows the Mitsubishi media performed about the same or better at every speed while being significantly faster at recording. The 1x speed with AudioMaster on for Plasmon is the worst result, ignoring the time they burned 16x media at 44x speed. This table is also challenging to actually compare, because they show different speeds for the different modes (1, 4, 8x for AM; 4, 16, 44x for regular), so the only one we can really compare fairly is the 4x. Even then, at 16x speed with AM off it had a lower average error rate than 8x speed with it on!

Don't get me wrong, burning with a lower error rate from the get-go is good; it implies the burn will possibly be more reliable over time as the disc picks up scratches and other imperfections. But arguing that a disc with an average of 2.1 C1s vs 1.1 is going to sound noticeably different is absurd. And once again, even then this showed an improvement in only one of the two medias tested. Maybe it's better on more media, maybe it's worse, maybe there was just something odd with their burns and this is largely just noise in the overall burn results from this drive.

I'm still not convinced AM actually did anything but reduce your recording time. Your link doesn't say anything about actually increasing the audio fidelity in the slightest, just that in one straight comparison the C1 errors were lower. Practically every result in that table, other than the 16x media being written at 44x speed, is already massively within "negligible levels". Being below 220 is "negligible", and all of these burns (aside from the one using AM at 1x speed!) are well below that.


I have much of what you describe: three listening environments, between a near-field computer speaker setup, a reasonably high-end home stereo, and a DAC/amp with high-end IEMs.

I have tried multiple times to discern flac vs 320 mp3 across genres. Every time I believe I can figure it out and I consistently fail to exceed 50% (pure chance) accuracy.

Makes me wonder what ultra-linear source gear or speakers would highlight the differences in real-world situations, if at all. But for my purposes I’ll happily accept the roughly 80% file size reduction for no audible difference.


I could hear the difference between 320 Kbps MP3 and lossless WAV in blind tests in my early 20s - though I needed headphones to do it, and it had to be a track I knew well, and I couldn't tell which was which. They were just different, and neither felt worse. This quickly converted me to using CBR 320 Kbps MP3 for everything, on the basis that it would be just as good as the original.

A couple of years later I found an ambient track that sounded messy at 320 Kbps, and that converted me to using flac instead. Disk space had gotten cheap enough over that period that it made no meaningful difference.

This was all years ago now (I'm in my 40s...), and I don't worry about it too much any more. Firstly, I still use flac, so it's identical to the original anyway. And secondly, even if I did use mp3, my aged ears probably couldn't tell the difference even if I turned the volume up to unreasonable levels.


I think you need to spend more time with the systems and the music you have. Because, at least for me, understanding the differences at the first shot is very unlikely.

The brain is interested in the low-hanging fruit, i.e. the music and the melody itself, first. The music needs to become mundane or ordinary before you can listen to it more deeply for details. This is when differences can be heard more easily.

Lastly, you don't need perfect systems to hear differences, but you do need to understand how your systems respond to the music you're listening to. That is, your music system's sound needs to become mundane to your brain too, so you can go from the low-hanging fruit to the minute differences you were not able to hear before.


> 16-bit audio can perfectly reproduce audio as far as we can physically tell.

Imagine encoding a real-world dynamic range across 16 bits. This would go from 0 dB to 100 dB in volume, which would need more than the roughly 96 dB of SNR that 16 bits yields. The dB scales are different and not directly comparable, but you can see we don't capture the full dynamic range of human hearing very well.
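
For reference, the usual back-of-the-envelope figure for quantization SNR (full-scale sine, uniform quantization, no dithering or noise shaping) is:

    \mathrm{SNR} \approx 6.02\,N + 1.76\ \mathrm{dB}, \qquad N = 16 \;\Rightarrow\; \mathrm{SNR} \approx 98\ \mathrm{dB}

which is where the commonly quoted "16 bits is about 96 dB" figure comes from.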


Humans don't have 100dB dynamic range across their full hearing spectrum. We're less sensitive to high frequencies, which means you can apply high-frequency dithering to improve the dynamic range without adding audible noise.

https://en.wikipedia.org/wiki/Noise_shaping
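
A minimal sketch of the dither half of that idea (plain TPDF dither before 16-bit quantization; the noise-shaping step that pushes the resulting error toward high frequencies, as in the linked article, is left out, and the tone level and frequency are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    fs = 44100
    t = np.arange(fs) / fs
    x = 1e-4 * np.sin(2 * np.pi * 1000 * t)     # a very quiet 1 kHz tone, about -80 dBFS

    scale = 32767
    plain = np.round(x * scale) / scale                   # bare rounding to 16 bits
    tpdf = rng.random(fs) - rng.random(fs)                # triangular dither, +/- 1 LSB
    dithered = np.round(x * scale + tpdf) / scale         # dither added before rounding

    # "plain" turns the quiet tone into correlated distortion; "dithered" keeps it as a
    # clean tone sitting in a benign, slightly raised noise floor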


> Can you hear the difference between 320kbps MP3 or the equivalent

While I agree with the premise, this also depends massively on the equipment used to reproduce the sound. If you have a good amp and large speakers you are more likely to notice than with cheap headphones or through your crappy laptop speakers.


I see audio storage like I see cars: there’s no such thing as a “best” version — more like the right version for the right time.

On a plane, I’m glad I can pull down a ton of albums on Spotify and listen to them without access to internet connectivity.

For Eno, Steely Dan, Roxy Music and Thomas Dolby produced albums, I’m glad for lossless.

For sitting at home on a Friday evening with a bourbon in my hand and the lights down low, nothing beats my vinyl player.

I could get into the cars side of it too but that’s more a “me” thing than a “relevant to this conversation” thing.


> The listening test linked in the article leads nowhere, I would have liked to see their methodology.

Here you go:

https://web.archive.org/web/20080322114622/https://www.stere...


Here's the full results from 2013 by the original blogger:

https://archimago.blogspot.com/search?updated-max=2013-02-24...

Starts about halfway down the webpage.


I wonder how far we still have to go.

Computer graphics is pretty good, but how does it compare to walking out into a bright sunny day?

Audio-wise, I wonder how listening to live music compares to listening to something that went through capture and playback end-to-end.

I'll bet there are differences and I wonder where the "bottlenecks" are.


If you're listening to most professional live sets that use amplification, you're listening to something that went through "capture and playback end to end". Loads of boards are digital these days, and you're probably going to be altering the mix and EQ and going through compressors and all kinds of stuff in between.


And it’s worth noting that if you’re of a certain age - and generally that age correlates closely with the ability to afford equipment that can reproduce the very high quality you’re demanding - your hearing has likely deteriorated past the ability to discern the difference.


You don't need super high-end equipment to hear subtle details in audio, you just need reasonably good headphones. A few hundred dollars worth of headphones will get you quality that would cost tens of thousands with speakers and the room treatment speakers need to perform at their best (digital room correction can make a good room sound great but it can't fix a bad one).


The same can be said about loudspeakers, a good set of loudspeakers is far more important than buying a super-expensive set of DACs, Pre and Power Amps.


> Can you hear the difference between 320kbps MP3 or the equivalent, and CD-quality lossless?

I find it hard to believe an entire industry exists with high-end audio equipment ($100k+ on speakers/receivers/room treatment) just to play 320kbps MP3s?


The poster was talking about digital formats and compression. The expense of speakers and receivers is usually about higher quality analog reproduction. Two vastly different concepts at play there. Even then, going into the $100k+ territory usually yields very little on top of a many-thousands-of-dollars setup. It's probably better to spend at least half of that just making the room and the other sound-reflecting/absorbing elements better if you're going that high-end.


It's also hard to believe homeopathic medicine exists.


What about when the output of the lossy codec is passed through another lossy codec, e.g. MP3 through AAC over Bluetooth? I would expect better results (from the second codec) when starting from a pristine lossless source.


> What about when the output of the lossy codec is passed through another lossy codec, e.g. MP3 through AAC over Bluetooth? I would expect better results (from the second codec) when starting from a pristine lossless source.

It's true that, technically, you'll get better results from the second codec when starting from the uncompressed source. Generally, it's always better to avoid unnecessary generation loss. That doesn't necessarily mean that you'll hear a difference since that depends on the cumulative output quality.


Or just poor listening conditions. Everyone always talks about needing super high end setups to hear the difference, but it's actually the low end setups with wonky frequency responses that totally mess up the assumptions behind psychoacoustic masking.


That would be an interesting experiment. Take a hi-res file and encode it with encoder A then B then A then B then ...

How many encodings does it take before a trained listener using good equipment in an ideal setting can tell?


This is a great example of why I prefer lossless source material. I can tell an MP3 from a CD nearly every time (on recordings of good original quality) once it's sent through Bluetooth.


I created some computed waveforms for audio testing on my PC, and on a whim, stored them as both WAV and MP3. Counterintuitively, the MP3s worked just fine for all of my tests. I didn't dig into the reasons why.


At its core, an MP3 says that for the next slice of time, play these frequencies at these volumes. If your waveforms are simple, an MP3 can encode them perfectly.
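
For anyone who wants to reproduce that kind of test, here is a minimal sketch that writes a 16-bit mono sine-wave WAV using only the Python standard library (the tone frequency, level and filename are arbitrary); you could then encode it with something like "lame -b 320 tone_1k.wav tone_1k.mp3" and compare by ear or on a spectrogram:

    import math, struct, wave

    fs, dur, f0 = 44100, 2.0, 1000.0        # sample rate, length in seconds, tone frequency
    n = int(fs * dur)
    samples = (int(32767 * 0.5 * math.sin(2 * math.pi * f0 * i / fs)) for i in range(n))

    with wave.open("tone_1k.wav", "wb") as w:
        w.setnchannels(1)                   # mono
        w.setsampwidth(2)                   # 16-bit samples
        w.setframerate(fs)
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))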


Worth noting 48kHz audio is now a commonly encountered standard for video. Not to say it's necessarily audibly discernible from 44.1k, but it's obviously not quite as straightforward as 44.1k being the end of the story.


48kHz has the advantage of being an integer multiple of many common video frame rates, which makes video editing simpler.
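
Concretely, samples per video frame at a few common rates (only 44.1 kHz at 24 fps fails to land on a whole number):

    48000/24 = 2000, \quad 48000/25 = 1920, \quad 48000/30 = 1600
    44100/24 = 1837.5, \quad 44100/25 = 1764, \quad 44100/30 = 1470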


48kHz can theoretically be better because the anti-aliasing and reconstruction filters can have a gentler slope, but modern oversampling converters make this moot now.


48 is just about convenience, it's not meant to be "better" than 44. 45, 46, 47 or 49 would be fine too, but 48 is a rounder number.


AFAIK, 48k was preferred over 44.1k in movie and cinema recordings because it divides evenly by 24. Both 44.1k and 48k are evenly divided by 30 and 60 fps. However, none of that really matters for TV, where the actual frame rates are 59.94 and 29.97 to reduce flicker with US household mains frequency.

In the early days of digital audio, and before oversampling was possible, the anti aliasing filters were analog circuitry.

It's very difficult to cheaply implement a 20kHz brick wall filter with a 2kHz transition band.

Doing it over a 4kHz transition band yielded better results at the cost of slightly faster ADC designs.

I believe this is why 48kHz designs got the foothold in professional audio circles. The analog parts of those designs were WAY better sounding.

Once oversampling became common and affordable, the anti-aliasing filters were implemented much more easily in the digital domain.


> to reduce flicker with US household mains frequency

This isn't the reason for the odd frequencies (and black and white television was exactly 60i). The reason for the odd frame rate is that you need to do some very fancy math to fit the color data in the same analog signal without breaking backwards compatibility.


Ah you're right. It still breaks the "even division" argument though.


> Can you hear the difference

Do you think it matters if I play the song on my $10 cheapo earbuds or on $60,000 Sennheiser HE-1 Summit headphones?


On $10 cheap buds? Yes. On $100 middling buds? Very, very few people will notice a difference.


"The human eyes can only do 24 fps" and "if I can't do thing, then no one can" all over again.


No one ever said the human eyes can only do 24fps.


No one who knows what they’re talking about, but I’ve absolutely seen that argument advanced on multiple occasions, albeit usually with 30 rather than 24fps.


Last time I heard this crap in real life was back in 2018. Things have progressed immensely over the last decade, though. Now almost everyone "knows better" due to relentless marketing from big companies, including phone and TV vendors.

Before that, 30 was totally fine for the masses. In fact it was preferable. Cinema is the last big holdout and, apparently, it's going to take at least another decade before even a mere 48 is standard. As someone who has been riding 120+ fps for over two decades, going to the movies is awful, especially action scenes and panning.


24 fps is the standard for cinema because that gives the preferred look for most content, with nice looking motion blur and whatnot. High frame rate may make sense for some movies, but it's not a win for the whole industry to go 48 or 60 or higher.

I'm not sure what you're getting at with the 120 fps comment, because that is obviously not the frame rate of the finished product, so it's not the same conversation.


The whole 24 FPS thing is mostly historical. I don't like it. Nor do I like motion blur.

The 120 fps was regarding games. While movies are passive, they could still benefit immensely by doubling to 48. Not every scene in a movie is people talking and this is where 24 stops being adequate. Even YouTube has had support for 60 FPS videos for years.

I know it's not a win for the movie industry. They ought to hate it, especially the artsy types.


30 was not fine for the masses. 60p and 50p were too hard to make work at the dawn of TV, which is why we had 60i and 50i, but 60i is not the same as 30p. Also note the NES runs at 50p or 60p (as do similar systems), and it's darn hard to play most of its games with less.


> Cinema is the last big holdout

Don't know if cinema will ever drop 24fps. The shift to higher frame rates is of questionable benefit as it just makes movies look like TV shows. It seems 24fps is what makes a movie feel like a movie.


Check out any reddit thread about high framerate monitors. There's always one person.


I wish years ago we would have switched to A/B testing taking weeks or months for each side.


What quality and power of speakers are needed to get good enough output from the files so that the difference can be heard?


You aren't considering the listening environment. It is possible for ultrasonic sounds to interact with objects in the environment or with other ultrasonic sounds to produce lower frequency sounds that are in the range of normal human hearing.

There was even a creepy ad campaign several years ago that took advantage of this. They had a billboard in New York for A&E's new show "Paranormal State" with the tagline "It's not your imagination".

They used an ultrasonic system on the billboard to make audible sounds appear in a small region on the sidewalk but not anywhere else. When people walking along the sidewalk got to that region they would hear a woman whisper "Who's there? Who's there? It's not your imagination".

That system worked by making a single ultrasonic beam that somehow became audible as it dispersed. There are other systems that use multiple ultrasonic beams that produce audible sound via interference where the beams meet.

Many acoustic instruments do produce significant amounts of sound above normal human hearing range. Cymbals for example have nearly 70% of the sound power above 20 kHz. Trumpets with a mute have almost 2% above 20 kHz.

It seems possible then that if you wanted to produce a recording that reproduces the sound you would get if live acoustic instruments were playing in the same environment you might need to include ultrasonics unless you are making a binaural recording.

This does raise the question of what we actually want playback of a recording to achieve. Is a recording of a string quartet when played back in my living room supposed to sound like that string quartet is playing in my living room, or is it supposed to sound like what I'd have heard if I was there when the piece was recorded, or is it supposed to be something else?

(For those who haven't heard of binaural recordings, they are stereo recordings made by placing microphones inside the ears of a model human head, so they record the sounds that actually end up in each ear when something is recorded live for a listener at a specific location in an environment. This page of headphone tests [2] includes a binaural test if you'd like to hear such a recording.)

[2] https://www.audiocheck.net/soundtests_headphones.php


These ultrasonic playback things are depending on specific interference patterns materializing at certain spots. The design of the signals, shape of the space, arrangement and characteristics of the playback devices all need to be seriously tuned for this.

You're not experiencing these ultrasonics with headphones. You're not going to faithfully recreate some ultrasonic interference pattern in a random room with a pair of tower speakers placed in any arrangement. If anything, you're more likely to just make extra noise that wouldn't have been there originally from those interference patterns happening haphazardly and chaotically, coupled with the fact that the vast majority of audio gear isn't designed to reproduce ultrasonics (it's often specifically made to try and not generate them!)

And even then, by recording in the room with the instrument, you are capturing the resulting ultrasonic interference that you could have experienced if you were there.


You can absolutely hear the difference between 44.1 and 96 kHz sample rate. Even with typical reproduction filters on the output, sampling at 44.1 kHz prevents you from accurately preserving e.g. a sine tone at above ~6 kHz, and that's even without taking into account all the aliasing problems you're facing when the samples don't align with the peaks of this or that tone. 44.1 kHz is "good enough", but it's not accurate, and you can definitely tell the difference.

As for anything beyond 16 bits amplitude on line level, no, you cannot hear a difference. For such a low-voltage signal the resolution at 16 bits is so fine that it already drowns in all the natural noise and THD in the cables, in the amplifier, in your speakers/headphones etc.


Please do watch & internally digest the explanatory videos at https://xiph.org/video/

It explains why you’re wrong in easily digestible terms & how a 44kHz sample rate will accurately encode signals right up to the Nyquist limit. The second video is an end to end demo showing the process in action.


> https://xiph.org/video/

Thanks a lot for those videos, they were absolutely excellent. For anyone wondering they're presented by "Monty", the guy behind the ogg container and vorbis codec. I probably understood 10% of what he said but that's still a lot.


> sampling at 44.1 kHz prevents you from accurately preserving e.g. a sine tone at above ~6 kHz

This is mathematically false. A 6kHz or 8kHz or 10kHz or 20kHz signal absolutely can be perfectly preserved with a 44.1kHz sample rate. Not just kind of preserved, but perfectly preserved.
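
For the skeptical, this is easy to check numerically. A rough sketch with NumPy - finite-length sinc (Whittaker-Shannon) reconstruction evaluated well away from the block edges, so the tiny residual error comes from truncating the sum, not from the 44.1 kHz sample rate:

    import numpy as np

    fs = 44100.0                                     # CD sample rate
    f0 = 20000.0                                     # a sine near the top of the audible range
    n = np.arange(4096)
    x = np.sin(2 * np.pi * f0 * n / fs)              # the 44.1 kHz samples

    # evaluate the reconstruction at fractional positions between samples,
    # chosen away from the edges of the finite block
    t = 1500.0 + np.arange(1000) / 16.0
    x_rec = np.array([np.dot(x, np.sinc(ti - n)) for ti in t])
    x_true = np.sin(2 * np.pi * f0 * t / fs)         # the underlying continuous sine

    print(np.max(np.abs(x_rec - x_true)))            # small, and shrinks as the block grows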


It's perfectly preserved only if your samples are perfect. Imagine instead that we used 4-bit samples. The results would be obviously garbage. 8-bit would be better. 16-bit is better still. But it isn't perfect.


Increasing the bit depth lowers the noise floor. But that 6 or 8kHz signal is still there.

At 8:43 in the second video, he goes into showing what increasing the bit depth gets you.

https://xiph.org/video/vid2.shtml


It doesn't look like you understand what sampling is, and how reconstruction filters in DACs work. Your statement is true for some waveforms, depending on their frequency, due to the use of reconstruction filters on the output, but it's not true for any signal, and the problem becomes more apparent the higher the frequency of the waveform.


Reconstruction filters in DACs are analog low-pass filters. They don't do a linear interpolation between samples.


You explicitly said a 6 kHz sine wave. Which is pretty much the textbook example of a waveform and frequency which would work perfectly.

Maybe you wanted to say square wave?


If I'm understanding you correctly, you're saying that while a perfect sinc interpolation reconstruction would allow you to capture up to 44.1/2 kHz, in practice since we're limited to FIR reconstruction filters we can't actually get that high? If so it seems like a fair point, although I'd imagine they'd be better than 6khz?

There's also the issue of the input signal not being band-limited which is necessarily true for real world signals given that you sample for a finite duration.


Input signals are ALWAYS band limited for digital systems. If you don't do this and you work for any company designing such circuitry, you'd be fired.


There's no such thing as a finite support band-limited signal.


You should watch this: https://youtu.be/cD7YFUYLpDc?si=rUm6IR3IKXyzcaDB to better understand why high sample rates are a waste of time, instead of just reading about Nyquist. "Accurately preserving a 6kHz sine wave" sounds a lot like you think that sample points are reproduced 1-to-1 from the digital to the analog domain.

This just builds on the xiph video someone else linked but essentially

- sine waves are fine as long as you have points for rising and falling edge (nyquist, 44k guarantees 22k sine wave reproduction)

- bit depth only really affects the noise floor, so it depends on your audio's dynamic range


A 44 kHz sample rate guarantees accurate 22 kHz triangle wave reproduction if a reconstruction filter with linear interpolation is used on the output, and accurate amplitude of same signal if samples happen to align somewhat with the peaks of the waveform.


Wrong. You need to read some more Shannon and Nyquist. https://en.m.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_samp...


That's exactly why linear interpolation is not used.

The Nyquist-Shannon sampling theorem says that you can perfectly reconstruct any sampled signal as long as the input signal was bandlimited to 1/2 the sampling rate. Using a low-pass filter to bandlimit your 22 kHz triangle wave will remove all the (inaudible) overtones, leaving you with a single 22 kHz sine wave as input to your ADC. The reconstruction filter on the output DAC will then output a perfect 22 kHz sine wave, with the correct amplitude too!
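
For reference, the reconstruction the theorem promises, with sample period T = 1/fs and the input bandlimited below fs/2, is:

    x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}\!\left(\frac{t - nT}{T}\right), \qquad \operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}

A DAC's analog reconstruction filter is a practical approximation of that ideal sinc interpolation, not a connect-the-dots between the samples.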


Yep, as Monty showed in the 2nd xiph video, a square wave will have issues with a low Nyquist frequency (at e.g. 44kHz sampling).


There are no "issues" with the square wave at 44kHz. Yes, a square wave sampled at 44kHz and converted back to analog does have visible ringing on an oscilloscope, but that's because you're looking at a perfect reconstruction of a band-limited square wave. Before sampling, the square wave is filtered to remove all of the overtones above 22kHz, which are not audible (as that's above the upper limit of human hearing) and would cause aliasing issues in the sampled signal.

In other words, the ringing is an artifact of low-pass-filtering on the square wave, not the sampling process itself. A purely analog system with a 22kHz low-pass filter would have the same "ringing", and your ears couldn't tell either way because they can't detect information above 15-20 kHz. It's not even possible to have a "perfect" square wave in the real world because that would require the speaker cone, air molecules, and your eardrum to teleport instantly from one spot to another -- so you will always have some low-pass filtering, and therefore ringing, on any real-world approximation of a square wave.


Thank you for your detailed comment. Disclaimer, I don't think I fully understand this topic or your comment in entirety.

When you say

> Before sampling, the square wave is filtered to remove all of the overtones above 22kHz, which are not audible (as that's above the upper limit of human hearing) and would cause aliasing issues in the sampled signal.

Do you mean that the human ear "hears" (for lack of a better word) in sine waves because of fourier transforms? So, a pure squarewave signal (cut off above 20khz due to human ear limits) sounds identical to an "imperfect"/wavey sq.wave made from harmonics lacking higher frequency parts?


A perfect square wave has infinite bandwidth. The phase change from high to low theoretically happens instantaneously. Practically nothing moves that fast in the real world. Voltages take time to go up and down because of inductance and what not in the circuit, the cone of the speaker needs to start accelerating to move, etc. A speaker cone doesn't move instantaneously from its high point in the wave to the low point in the wave, it takes time for it to move and actually push the air around. Plus, the air itself is kind of springy, so even if the speaker did move instantaneously (which it can't, it would have to move at infinite acceleration) the resulting pressure wave wouldn't be nearly as sharp of a transition.

Your ears don't experience pure square waves. They can't. They'll get the approximation of a square wave as close as they can experience them, but your ear drum doesn't immediately warp from one point to another. It gradually moves. The fluid in your ear has its own springiness. The hairs which do the final detection don't just instantly move either, they're being vibrated by the motion of the fluid in your ear.

And yeah, technically your eardrums can be moved by waves higher than 20kHz. But its not just the motion of your ear drums that give you hearing, its lots of tiny hairs in your inner ear that resonate at different frequencies that gives you the detection of certain audio frequencies that are present. Normal humans (read: practically everyone) tend to only have the equipment to accurately sense up to ~20kHz pressure waves with our ears as sound. As you age the areas which detect higher frequency sounds get less sensitive first, so you start losing the ability to hear those higher frequency sounds first.

It does seem like you're missing a bit of knowledge about signal theory though. That would really help you understand what I mean when I say a real square wave has infinite bandwidth. A very rough and basic idea to help here is that a wave can be thought of as a sum of fundamental sine waves. So a square wave is essentially the sum of all the component fundamentals, each fundamental gets sharper and sharper edges of the square. But the only way for the sine wave to have a truly vertical edge is to be infinite frequency, right? And what are the edges of a square wave? Vertical lines. So to keep adding these fundamentals together to achieve a square output, you'd need to add an infinite sum of sines together to make an actually perfect square. Monty touches on this in that video a bit, but it goes pretty quick.
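
For what it's worth, that sum has a standard closed form - the Fourier series of an ideal (unit-amplitude) square wave of frequency f; every added odd harmonic sharpens the edges a little more, and a truly vertical edge would need the series to run forever:

    \operatorname{square}(t) = \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{\sin\!\bigl(2\pi\,(2k+1)\,f\,t\bigr)}{2k+1}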


Thanks again for your detailed comment. I think I understand it now. The only thing slightly puzzling me is why waves are broken up (be it Fourier or in our ears) into sine waves, and not say sawtooth (or some other, like a semicircle as in https://math.stackexchange.com/questions/1019005/what-kind-o...) waveform.

I understand how a squarewave needs infinite bandwidth for decomposing into sine waves, but a wave like the semicircle one has a vertical tangent and would not need an infinite bandwidth. Btw you're right, I've never had any signal theory classes (studied mech engg).


I'm mostly just an amateur myself, so I totally get where you're coming from.

FWIW, the graph at the top of that article (as mentioned in the comments) does not have vertical tangents. They're not true semicircles. Most of the answers given to make actual semicircles become non-continuous signals (y >= 0 if ... answer) or infinite sum. The one which doesn't just says they look like semicircles. I don't really have the time (or immediate knowledge, I'm admittedly bad at math) to dig all the math but my gut instinct suggests those aren't true semicircles and don't truly have vertical tangents.

And this is kind of how a signal generator can get away with making square waves and sawtooths and whatnot; internally it's not always truly "discrete" signals, it's just quickly flipping a switch from one thing to another. It flips from the high state to the low state fast enough that for your 50MHz oscope it looks pretty continuous, and it's probably designed to try and draw a connected line.

This all kind of makes some sense when you get into how we actually make electrical signals. We're normally just modulating the electric "vibrations" of some crystal or accelerating some magnet through a loop back and forth. These things move in continuous waves. As mentioned, things don't have infinite acceleration; they take time to shift between states. So you're never going to get something that goes from high to low in zero time. And most home owners will tell you there's no such thing as a right angle.

I'd like to mention though, you're imagining the wave as being "broken up" into sine waves, which IMO isn't quite the right way to look at it. It's not "breaking up" the signal into sine waves; the signals were always sine waves. Remember my comment: truly square waves don't really exist. We can have things that kinda look square-ish when you squint your eyes, but they're not really square waves. A truly square wave in reality requires infinite acceleration or it's not continuous. Fill a bathtub and try to make square waves. It's just not going to happen.

I know I'm not fully answering your question; I don't fully know the answer myself. So far in my stumbling around, the closest thing I can offer is that this is just how nature and waves are, at least as far as our monkey brains can reason about them. Think about a string on a guitar or a wave in the water or throwing a ball. They all move in ways which can be described by sine waves. It's just the nature of how things accelerate and move, changing between states. Why do we see the golden ratio in so many places? Why are circles so special? Good luck on digging for more truth.


sorry, I misused "discrete" in this comment, should say "continuous".


It's been a long time since my DSP classes at uni, but I don't think this is true. 44.1kHz sampling is enough to reproduce up to 22.05kHz sound accurately without aliasing. Unless there is another type of distortion you might be picking up. This stuff is pretty far out of my realm these days.

https://en.wikipedia.org/wiki/Nyquist_frequency


It's true for a triangle wave and a square wave, depending on if the output has a reconstruction filter doing linear interpolations between samples. You cannot accurately sample a 22.05 kHz sine wave (or any other "complex" waveform) with a 44.1 kHz sample rate.


You're incorrect. A 44.1kHz sample rate can recreate a 22.05kHz sine wave. Your mental model for a digital signal isn't correct. It's not making stair steps nor is it making triangular waves.

https://youtu.be/cD7YFUYLpDc?si=rUm6IR3IKXyzcaDB


Yeah this is the "stairstep vs. lollipop" thing again.


Everyone complaining with "but stairstepping" fails to recognize that the final stage of a DAC is a reconstruction filter. The steps are gone after that filter is applied. You aren't analyzing the full DAC performance if you look in front of the filter. This is most dramatic in class-D amplifiers where the raw waveform feeding into the speakers is square wave hash that gets filtered out by the speakers themselves.


The filters do linear interpolation between samples. This bridges some shortcomings of a sample rate too low to capture complex waveform at high frequency, but it's not a silver bullet.


They don't. Analog low-pass filters don't do linear interpolation between samples.


Yes and no. Reconstruction filters are part of the problem (and part of the solution) but it's not all about them.


1. This is a 2008 article. Per Guidelines you should put years in the Title.

2. MP3 has improved a lot over its lifetime. LAME was already used by default by the year 2000. When people say MP3 was good enough, they refer to MP3 encoded with LAME. ( Rant: when will people learn the codec, the encoder and the encoded results are different things? It's 2023 and I still see this mistake everywhere. )

3. Even iTunes AAC has seen lots of improvement since 2008, especially in the 256Kbps+ range.

4. And when AAC is mentioned, that is AAC-LC ( or AAC Main Profile, which isn't all that different ). AAC-LC ( Low Complexity ) has been declared patent-free by Red Hat. There is no reason to use MP3 today.

5. The de facto definition of "CD-quality" went from 128Kbps MP3 to 256Kbps AAC now. And arguably that is true for the consumer market. Even Hydrogenaudio has repeated these tests multiple times.

6. I still prefer the codec MPC, Musepack (https://www.musepack.net). Sorry I just had to write it out. Sadly it never gained any traction.

7. If we have to be picky about frequency range, maybe CD itself isn't good enough and we could use SACD?

8. Lossless is making a comeback. Storage and bandwidth costs continue to fall. ( Arguably not true for NAND, but let's ignore that part for now. )

9. It is ironic that just as lossless could go mainstream, wireless earphones are replacing traditional earphones, meaning your music will be re-encoded before it is sent to your earphones. And no, most Android phones and iPhones don't have AAC pass-through, i.e. your AAC-encoded files will still be re-encoded before being sent to your Bluetooth earphones.


> Per Guidelines you should put years in the Title.

It's certainly the convention on HN to put the year in the title for older articles, but it's not one of the guidelines (https://news.ycombinator.com/newsguidelines.html).

(minor point but I can't help it)


Conventions are just guidelines that aren't written down. Also couldn't help it.


Hmm. We scold people for breaking guidelines but we don't scold them for not following conventions. We expect commenters to know the guidelines but we don't expect them to know the conventions. Seems different to me!


Arguing with dang.


That's a paddlin'.


Didn't realize. Oh well


Musepack! There are dozens of us! Dozens!!

Back in the early 2000's when I was getting into ripping my collection I didn't have enough space for FLAC so I surveyed the options and Musepack seemed like the obvious lossy codec winner. I still have that collection of .mpc's somewhere.


I think it's worth repeating that "CD quality" is a term of art being used here that does not necessarily mean the audio has the quality of a CD (and, as they emphasize, in fact does not). I would dispute that any new standard has taken over the label of "CD quality" - my experience is that such a phrase is never used to describe the quality of lossy compression. Most music downloads are either described by their bitrate (so that the listener is left to figure out what that means), or by labels like "low", "medium", and "high" quality (with the listener left to judge whether those are accurate descriptors).


some people encode their mp3s to FLAC


Per


Why is the standard considered to be CD quality? In that way, the article shows its age. Today you wouldn't be talking about 44.1kHz 16 bit, it would be all about 24 bit 192kHz. If you're looking at spectrum plots, CD is very much on the low quality side of the spectrum of what's possible. Maybe we should be considering megahertz sampling rates and 32 bits, surely we have enough bandwidth.

Why not then? Because there is a ton of science and empirical evidence that humans cannot hear the difference[1]. Good engineering is about meeting the requirements with minimal cost. If the requirement is that it sounds good to humans, and the cost is number of bits to encode (and thus store and transmit) the signal, then modern codecs like Opus are clearly superior to uncompressed and losslessly compressed signals, much less higher sampling rates.

If your goal is something other than good engineering, for example the aesthetic satisfaction that the bits are the same as what the mastering engineer put on the CD, or for some reason caring how clean spectrum plots of artificial signals look, then the arguments may have some merit. But let's be clear on the goals.

[1]: https://people.xiph.org/~xiphmont/demo/neil-young.html


>Why is the standard considered to be CD quality?

Because it can fully reproduce everything the human ear can hear. Higher bitrates are only useful for production or archival.


The point is to have all of the information stored in your archive.

You can compress it for listening later, but you can never add information _back into_ the file. Store it in FLAC for archival purposes.

An equivalent would be archiving works of visual art in JPEG and not something lossless.


Agreed, for archival purposes we should be using lossless codecs. Not because you can hear the difference but because it makes it easier to reason about whether there's any distortion introduced by the compression. And we can consider the original artifact, as created by the mastering engineer, an authoritative source of truth, even if it's an imperfect representation of what was performed by the musicians.


What is the distortion if no human is able to hear it?


Just to give one example, if you want to do forensic analysis on the signal based on inaudible differences. That's a valid use case for an archive that doesn't apply to consumer (even audiophile) consumption of music streams.


Forensic analysis to answer which question?


What tools were used to create the audio? For example, the exact patterns of dither-based noise shaping[1] may reveal insight, but are by definition inaudible. Or perhaps there's an ultrasound source - something recorded near old-school CRT monitors may have a 15.75kHz tone, ordinarily outside the threshold of hearing but shows up as a clear peak on a spectrogram.

[1]: https://www.sweetwater.com/insync/hear-effects-dithering/
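
A rough sketch of how you might look for that kind of inaudible fingerprint, using Python with soundfile and scipy (the filename and the 15.5-16 kHz search band are placeholders for whatever you happen to be investigating):

    import numpy as np
    import soundfile as sf
    from scipy.signal import spectrogram

    x, fs = sf.read("archived_take.wav")        # placeholder filename
    if x.ndim > 1:
        x = x.mean(axis=1)                      # fold to mono

    f, t, Sxx = spectrogram(x, fs=fs, nperseg=8192)
    mean_db = 10 * np.log10(Sxx.mean(axis=1) + 1e-20)   # average power per bin, in dB

    band = (f > 15500) & (f < 16000)            # around the NTSC horizontal scan rate
    peak = f[band][np.argmax(mean_db[band])]
    print("strongest component in band: %.0f Hz, %.1f dB" % (peak, mean_db[band].max()))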


> something recorded near old-school CRT monitors may have a 15.75kHz tone, ordinarily outside the threshold of hearing

Only for some people -- the upper limit of human hearing varies between 15-20kHz, depending on the person and their age. For many children and younger adults (myself included), CRT coil whine is well within our audible range, as an incredibly annoying high-pitched squeal.

This comes up in speedrunning communities sometimes -- many runners prefer to play on CRTs due to their fast response time, and streamers who use CRTs need to remember to set up a notch filter on their microphone, or else their stream may be borderline unwatchable for younger viewers and the streamer might not even realize it.


Interesting, thanks


The real issue nowadays is that you shouldn't record or master at 44.1kHz (or multiples) and go for 48kHz instead.

But not because it sounds better. Simply because 48kHz is what your computer and phone natively clock their audio codecs at. It's done because that's generally an integer fraction, but the why doesn't matter as much as the fact itself; PC "HD Audio" and phone codecs are 48kHz.

Yes you can resample, and yes you can resample without it being audibly noticeable. But it's an extra step where you're at the mercy of whoever implements it to do it right. Doing it wrong may also include noticeable delay, breaking e.g. A-V / lipsync.

Can you build HiFi systems that support 44.1kHz and maybe dynamically switch their clock source as needed? Sure. But what's the easiest way to build a HiFi system these days? You just stick an off-the-shelf embedded device in it, which likely uses standard PC/phone tech…

So just ship 48kHz.

(Similar argument for video recording in 50Hz countries btw - unless you are recording for TV/broadcast, you should always shoot at 60 / 59.94fps. Because that's what PC and phone screens run at…)


This used to be somewhat true, but modern digital often uses such high internal rates that converting between the two is near lossless. For instance, the least common multiple of 44.1k and 48k is in the 7MHz region. Realtime conversion between them is pretty doable today. We've been doing it by using decimation with much simpler hardware for a few decades before that.
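
To make that concrete, here is a minimal sketch (assuming numpy and scipy are available) of the rational 44.1 kHz to 48 kHz conversion: the ratio reduces to 160/147, and the least common multiple of the two rates is indeed about 7 MHz.

    from math import gcd
    import numpy as np
    from scipy.signal import resample_poly

    src, dst = 44_100, 48_000
    g = gcd(src, dst)                      # 300
    up, down = dst // g, src // g          # 160, 147
    print(up, down, src * dst // g)        # 160 147 7056000

    # One second of a 1 kHz test tone at 44.1 kHz, polyphase-resampled to 48 kHz.
    t = np.arange(src) / src
    x = np.sin(2 * np.pi * 1000 * t)
    y = resample_poly(x, up, down)
    print(len(y))                          # 48000 samples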


That might be true for the hardware, but since mixing of multiple streams is a software function nowadays (… some older HW was multi-stream capable…), any 44.1kHz stream has a good chance of being resampled to 48kHz just to allow mixing it with other sounds.

Even if it's the only stream and you could switch the codec to 44.1kHz mode, what do you do if the OS wants to play a random notification sound? Switching between 44.1kHz and 48kHz is not going to be hitless on a significant number of HW (not all, but most I'd guess), so whoever's writing your OS mixer code would reasonably make a call to always mix at 48kHz…

(Yes this argument primarily applies to PCs and phones, hopefully on a HiFi system that just happens to use COTS embedded devices they'd write some code to switch the rate…)


Well, my OS lets me configure the output device and specify the sampling rate. I would think that if I configure it at 44.1 kHz and I'm playing back some ripped CD and the system tries to play a notification sound, that whatever that sound is sampled at it'll ensure to output it at 44.1 kHz. Otherwise what's the point of the setting?



Utterly useless. No listening tests, except for a custom-made sound file designed as an artificial worst case for the codecs. You might as well benchmark a text compressor on /dev/random...


Yeah, this feels like someone arguing against no one. They just showed that lossy codecs are indeed lossy. I don't think anyone was arguing that MP3 isn't actually lossy. (They did have a bit of interesting info about where the loss crept in, but that seemed to be a side note to the "argument".)

The question is if humans can hear the difference in the lossy waveform and if that harms the listening experience.


> We recommend that, for serious listening, our readers use uncompressed audio file formats, such as WAV or AIF—or, if file size is an issue because of limited hard-drive space, use a lossless format such as FLAC or ALC.

I recommend that, for serious listening (for some weird definition of "serious"), you go to a live concert. PCM is also a lossy compression due to the quantization step, albeit one whose effect is much less pronounced, for so many reasons, that no one even thinks of it as a "compression" method. If you can tolerate PCM, you should also be able to accept some good-enough lossy codecs---I don't know if that includes MP3 or AAC or Vorbis or Opus or whatever, though.


I think it depends on the style of music. I guess for orchestral / classical music it's good.

But for other styles, I don't enjoy concerts for audio quality.

It's usually way too loud, so you have to wear earplugs. I've heard some made for this don't skew audio too much, but they are still a filter.

And then you have to like the balance chosen by the audio engineers, and it's often not ideal. The vocals are sometimes not loud enough to hear the words well, the bass too loud. Frequencies don't all travel the same way, so if you are too far away some things are missing or distorted, etc.

And then there's the noises from other people, the claps, the screams, etc.

And the audio still possibly went through some kind of non-analog equipment.

Not saying that feeling the bass in your whole body and feeling the communicative / excited atmosphere from the crowd can't be enjoyable but for audio quality, I'd rather listen to music in a calm room with some good equipment, at a volume level comfortable to me, when audio engineering didn't have to be live and could be (even) more carefully managed.

> If you can tolerate PCM

Are there people who can't tolerate it? It must not be very convenient.

(Huge caveat to this comment: I listen to music most of my waking hours, but I'm not an audiophile. I never carefully listen to music; it's usually in the background.)


> PCM is also a lossy compression due to the quantization step, albeit its effect is much less pronounced for so many reasons that no one even thinks it as a "compression" method.

    20 * log10(1.0 / 2**16) ≈ -96.3 dB
Much like the sampling rate, it produces a range that is almost certainly beyond any human's ability to appreciably detect. It's also a constant effect, whereas lossy codecs actually analyze the audio to determine which components of the frequency spectrum they can eliminate.

I don't think it's reasonable to compare PCM and lossy codecs this way.
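
For anyone who wants to check the -96 dB figure above numerically, here's a small sketch (numpy assumed) that quantizes a full-scale sine to 16 bits and measures the error:

    import numpy as np

    fs = 44_100
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 997 * t)                   # full-scale 997 Hz tone
    q = np.round(x * 32767) / 32767                   # 16-bit quantization
    err = x - q

    step_db = 20 * np.log10(1 / 2**16)                # one quantization step: ~ -96.3 dB
    snr_db = 10 * np.log10(np.mean(x**2) / np.mean(err**2))  # ~ 98 dB for a sine
    print(round(step_db, 1), round(snr_db, 1))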


>PCM is also a lossy compression due to the quantization step

No, it's a lossy encoding step. Losing at least some of the information of a performance when you record it is unavoidable. For PCM to be lossless, playing it back would have to transport you back in time to when the performance was recorded, and you would have to be able to touch the performer. You're being silly.


That's a silly answer for a silly claim, if I really need a clarification.


I mean, you called PCM lossy because it quantizes the signal. What lossless recording method exists, if PCM is lossy? If the answer is none, then it seems you have a word for something that doesn't and can't possibly exist, and a qualifier that doesn't serve any purpose.


Which is the exact point of my silly answer. :-)

The original article claimed that "of necessity [lossy codecs] eliminate some of the musical information". I don't know how to quantify musical information, but given another statement that "[l]ess bits always equals less music", it seems to mean more or less the same thing as information-theoretic complexity. But as you have correctly guessed, there are a lot of places where information can be lost, everything from the performer's skill and the instruments, through the ADC/DAC processes, up to the speakers. So "no information lost" is not a good argument for bashing lossy codecs, because so much information has already been lost, you just haven't noticed yet.

I also claim that lossless compression exists even after this massive reduction of potential information, because you can prove that a specific step is indeed bijective. In fact, most of what a lossy codec does is lossless: it is designed to discard information at one very specific point, precisely so that the loss can be analyzed. "Lossless JPEG (re)compression" wouldn't make sense unless you realize that those lossless steps can be redone more efficiently. I have seen enough people who assumed that this is impossible, though...


>PCM is also a lossy compression due to the quantization step

Wrong, especially today. Modern ADCs use oversampling to push quantization noise into the inaudible range, and then filter it out before decimation to standard PCM.


Sorry, it is what you say that is wrong.

Because the end result is standard PCM, the quantization can only be worse, not better.

Oversampling ADCs push a much greater quantization noise into the inaudible range, and then, by low-pass filtering, reduce the quantization noise to the level of standard PCM.

Oversampling ADCs are not better, they are much cheaper, because 16-bit or 24-bit quantizers with enough speed and accuracy are extremely expensive.

Oversampling, i.e. sigma-delta modulation, in both ADCs and DACs, allows the use of much cheaper quantizers with low resolution, of only a few bits, or even of only 1 bit, and of much cheaper filters, which do not have to be very steep, without degrading too much the quality in comparison with real PCM conversion done at the Nyquist frequency.


Nothing you said refutes my assertion.


Let's assume there are people who are able to hear a difference. Why does it matter to a majority of people and the way they consume music? Maybe I'm from a spoiled generation, growing up listening to FM radio and tapes, even copying from tape to tape.

A lot of rock music lives off the imperfections of audio equipment; people spend a considerable amount of time replicating the behavior of vacuum tubes. Even techno producers like Robert Babicz record to an analogue tape machine to enhance the final result.


Quite literally, the sound of rock music is the sound of distortion. The Kinks didn't razor-blade their amps for no reason.


Their recommendation doesn't make any sense. They first explain that lossless compression reproduces exactly the same data as when uncompressed:

"Lossless compression is benign in its effect on the music. It is akin to LHA or WinZip computer data crunchers in packing the data more efficiently on the disk, but the data you read out are the same as went in."

...but then recommend uncompressed over lossless compression for "serious listening":

"We recommend that, for serious listening, our readers use uncompressed audio file formats, such as WAV or AIF—or, if file size is an issue because of limited hard-drive space, use a lossless format such as FLAC or ALC."


Yeah, that doesn't make any sense at all. The only reason I can think of to use WAV instead of a losslessly compressed FLAC is if the player being used has a dog slow CPU or an incredibly old software stack that can't play FLAC files.

But I doubt these guys are using a Pentium 1 machine to play their audio files, so idk. The low-end smartphone I had in 2013 could easily play FLAC files, at least as far as real-time decompression and decoding goes. Whether the built-in DAC and amplifier could take advantage of that extra data is another thing.


Funny thing is that even on a Pentium 1, it doesn't make much difference. You'd have to go to a 486 to really choose WAV. Even then FLAC is absolutely playable, it just takes a bit more than twice the CPU of WAV (70% vs 30%). MP3 though on the 486 is just too heavy to do.


There are two reasons to use WAV over FLAC in music production: you can load WAV files into your DAW faster; and WAV supports floating point, which means you never have to think about clipping until the final mastering step. Neither is relevant to listening.


FLAC has a massive advantage over WAV/AIF in that it can be tagged with metadata for the music player software to read and display.


Maybe decoding speed is an issue or decoder quality


The decoder either reproduces exactly the same bytes as the original or it's seriously broken. You can run "flac --verify file.wav" to test this, or compare the decoded bytes yourself if you don't trust the tool. I doubt such bugs are a common issue.

I suppose decoding speed could matter in some situations, but they said "for serious listening", not "if your system is so slow that it fails to decode the file in real time".
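
If you'd rather check the round trip yourself instead of trusting the tool, a sketch along these lines works (assuming the soundfile package / libsndfile is installed; the filenames are hypothetical):

    import numpy as np
    import soundfile as sf

    orig, sr1 = sf.read("master.wav", dtype="int16")
    back, sr2 = sf.read("master.flac", dtype="int16")  # same audio after a FLAC encode

    # For a working lossless codec the decoded samples are bit-identical.
    print(sr1 == sr2 and np.array_equal(orig, back))   # expect True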


>I suppose decoding speed could matter in some situations, but they said "for serious listening", not "if your system is so slow that it fails to decode the file in real time".

Even decoding speed is doubtful, a 486-100 MHz can decode 44.1/16 FLAC in real time with CPU to spare.


OP forgot to put (2008) at the end of this.

Today's standard isn't "CD Quality" anymore. There is literally no audible difference between 320kbps MP3, which covers the complete range of human hearing up to about 22kHz, and FLAC, which is lossless and supports sample rates all the way up to 192kHz. At this point digital audio has surpassed what the human ear is capable of hearing, and any further advancement is superfluous as far as music is concerned.

The only advantage to raw or lossless formats for music is archiving, as FLAC can be converted into other formats without incurring additional quality loss. For listening, it is now more important to have good equipment rather than a lossless format, and for streaming it is generally preferable to keep bandwidth requirements down.

The only reason I can imagine to continue expanding the capabilities of lossless audio is for scientific purposes and machine learning, where the limits of human sensory perception aren't a limiting factor.


> There is literally no audible difference between MP3 320kbps, which covers the complete range of human hearing up to 22kHz

Although MP3 does have some fundamental limitations that cannot be fixed no matter how much bitrate you throw at them (referring especially to the "Inoptimal window sizes" from https://web.archive.org/web/20120222124415/http://www.mp3-te...). They're not dramatic issues, to be fair, but as long as you go for lossy compression, using a somewhat more modern codec like AAC or Opus would be preferable, unless you absolutely need the maximum compatibility afforded by MP3 (though these days at least AAC support should be pretty widespread too, plus the patents on regular ["low complexity"] AAC have expired as well).


One of the benefits of lossless compression is that you don't have to put any effort into thinking about what level of compression is good enough and who can hear the difference. It just doesn't make any difference.


>The only reason I can imagine to continue expanding the capabilities of lossless audio is for scientific purposes and machine learning...

Communications is the main reason. There's only so much bandwidth in 44.1kHz.


With a headline like that it felt like 2002 again.

The article's byline has 2008.

A 2023 update could be interesting comparing the streaming providers' choices, and persistence of choices, now that monthly subscriptions, rather than actually owning anything, are so dominant.


I'm glad it's an old article. It would be sad if those audiophiles were still debating this old chestnut.


I’ve been involved in some heinous forum threads with people arguing the difference in sound of music streamed from different SD Cards. People will argue about all sorts of drivel


Plenty of those people are present in this thread.


Hydrogen Audio?


Citing his own past work from 1995, too.


What's really infuriating to me is that, at one point, Apple took a reasonable stand with their music streaming and said "256kbps AAC is CD quality" (it is). And now they've turned around and started pushing this snake-oil, DRM'd "Hi-Res Lossless" nonsense.

The only reason to have "Hi-Res Lossless" is if you're going to do something besides listening with it... and you can't with Apple's streaming.


In the age of 4K/8K video streaming, free efficient lossless codecs, and yearly $1000 phone upgrades, why do we even bother asking the question of lossy vs lossless audio for entertainment? It's not a matter of disc capacity anymore. You can have the best for "free".

If we ran an ABX test, you would find that people can't even tell the difference between the original artist and a cover artist, let alone lossy vs lossless. Should we just use cover artists at concerts?

The brain adapts quickly to lower quality, be it visual, audio, olfactory or gustatory. Does that mean we should consume the lowest quality we can tolerate, just because we get used to it, so everything can run on more efficient/cheaper resources and content?


The whole world does not have FTTH and unlimited 4G+ mobile data plans.

And you would be surprised by the number of radios streaming at 96kbps.


No, you can't hear the difference, let's collectively move on, please.


…if competently encoded.

You can absolutely hear the difference between a bad MP3 and the original. I used to amuse myself and friends by quite reliably identifying the difference, blinded, using a rather bad pair of speakers.

Actual CD audio can also work quite differently than any encoding, as at least older CD drives had an entirely separate analog output cable that connected to the sound card and bypassed the ATAPI link entirely. Levels wouldn’t even be matched.


To be fair, that's true today, but not in 2008 when this article was written. MP3 encoders have come a long way and bitrates are typically much higher.


I'm not sure it's true even today. I've done lots of work in studios and can hear the difference between MP3 vs FLAC and CD -- even these days.


No you can’t. Maybe it’s volume, maybe it’s the non-blinded nature of the testing, but you can’t hear it. They’re the same.


Ah we will have to agree to disagree here :)


I can definitely hear the difference on some songs at some “CD quality” bit rates of MP3. Also some MP3 encoders (and decoders too, to be fair) are better than others. Particularly back when this article was written.

That all said, these days encoders are much better, and there’s no excuse not to go for 320kbps (assuming you have to use MP3).

What I find more interesting is that there was a period where some people who grew up listening to MP3s preferred the artefacting they introduced vs lossless. In much the same way how vinyl enthusiasts like the colouring of the sound that medium introduces. Which just goes to show that as much of this is down to psychology as it is technology.


Beyond comparing formats, even subjectively, it is important to consider how the public got used to compression artifacts.

I mean, the famous MP3 pre-echo was so common in the early '90s that I think some listeners would prefer listening to it over a cleaner sound. It is possible that MP3 influenced how music is composed, mastered and mixed.

That being said, and adding the fact that people are willing to listen to music on cheap earphones, in the noisy environment of their cars, and recompressed over Bluetooth, I'd say that 128kbps MP3 is still a very hard format to beat.


Not surprising about the MP3 pre-echo. I remember reading a comment a while back, I think it was on Reddit. Someone bought a CD of 50s/60s music for their grandparents and gave it to them as a Christmas gift. The grandparents were grateful for the gift but didn't use it very often. When asked why they kept playing the vinyl over a crystal-clear CD, the grandparents said they loved the hissing and the other sounds that are prevalent on vinyl. To them it was part of the music, and they had strong memories of it from their era. The CD completely removed those sounds and felt unnatural, too clean.

People have strong relationships with the music they grew up with. Nostalgia is a powerful kind of memory, and they don't want that music changed from the way they remember it.


MP3 smearing can be noticed even at 256kbps [1].

[1] https://www.soundonsound.com/techniques/what-data-compressio...


Well, MP3 does have some fundamental limitations that were only fixed with subsequent codec generations, like AAC or Opus.


I came across this after listening to several 320kbps MP3 files and found that they sounded noticeably worse than 256kbps AAC versions. Support for AAC is widespread now and it should be preferred over MP3 [1].

[1] https://www.iis.fraunhofer.de/en/ff/amm/consumer-electronics...


MP3 does have an advantage though, being widespread and royalty-free since the Fraunhofer patents expired.


AAC-LC patents have also expired since the codec was introduced in 1997 (26 years ago)


OPUS is still better than AAC and MP3.


I don't disagree. But almost any electronic device released in the last 10 years can play mp3. Probably even your electric toothbrush.


Even if you set aside universality as a desirable property, according to the official site at 128kbps any Opus efficiency advantage disappears. https://opus-codec.org/comparison/


That graph does not describe codec efficiency. It's explaining how the various codecs preserve the frequencies of the raw signal at various bitrates:

Some codecs only work at low bitrates and preserve only narrow bands of frequencies. Some codecs work only at mid bitrates and preserve wider bands. Some codecs only work at high bitrates and preserve only the widest bands; you can't get them to drop more frequencies for better savings even if you wanted to. Opus works at all bitrates and gradually and dynamically removes frequency bands as the bitrate drops. Vorbis preserves more or less the same frequencies as Opus at the same bitrate, but loses frequencies a bit faster as the bitrate drops. MP3 drops them even faster. AAC works very similarly to Opus, but can't output low-bitrate streams.

To compare codec efficiency you would need to do subjective comparisons to see how often each codec achieves transparency (when people can no longer tell if the sound has been compressed or not) at a given bitrate with various types of sounds. This has also been measured, and it's agreed that Opus is basically transparent at 128 kbps. MP3 needs twice as many bits to get the same quality, so Opus is twice as efficient.


> That graph does not describe codec efficiency.

My friend, the vertical axis is literally labeled "Quality", and the horizontal axis "Bitrate". The caption is "The figure below illustrates the quality of various codecs as a function of the bitrate." Quality at a range of bitrates is how codec efficiency is measured.

I'd never heard the claim that "Opus is basically transparent at 128 kbps", but I did find https://wiki.hydrogenaud.io/index.php?title=Opus, which agrees with you: "Very close to transparency". But it also notes, "Most modern codecs competitive (AAC-LC, Vorbis, MP3)", which lines up with the chart.

Early Opus vs. MP3 tests were done with LAME, which is awful. This may be why you're under the impression that MP3 needs twice as many bits to get the same quality.


>the vertical axis is literally labeled "Quality"

And the labels on that axis make it perfectly clear what they mean by "quality". It's how much of the spectrum they preserve at that bitrate. If "quality" referred to subjective quality there's no reason why the chart should stop at 128 kbps. It stops there because the fullband codecs don't brickwall the signal past that point. Instead they use psychoacoustics to compress it.

>Early Opus vs. MP3 tests were done with LAME, which is awful.

That's funny, because other commenters say LAME is currently the benchmark for MP3 encoders.

Here: https://wiki.hydrogenaud.io/index.php?title=Transparency it states that MP3 is considered artifact-free at 192 kbps, although here: https://www.head-fi.org/threads/when-is-mp3-transparent-an-a... someone did an ABX test and could still hear differences more than half the time at 256 kbps. If I take the lower number, MP3 is still 50% less efficient than Opus.
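
One way to read ABX results like those (the numbers below are made up, not taken from the linked threads) is the one-sided binomial probability of doing at least that well by pure guessing:

    from math import comb

    def abx_p_value(k: int, n: int) -> float:
        """P(at least k of n trials correct) under 50/50 guessing."""
        return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

    print(abx_p_value(9, 16))    # ~0.40: "a bit more than half" is not evidence yet
    print(abx_p_value(14, 16))   # ~0.002: hard to explain by chance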


Which is why I use Opus at 128kbps on my iPhone, where storage is a limited, fixed commodity and my listening environment is rarely optimized. Everywhere else it's FLAC or 256kbps MP3. My ears aren't golden enough anymore to justify 320kbps MP3.


> But what about when the codec is dealing not with a simple tone, but with music? One of the signals I put on Test CD 3 (track 25) simulates a musical signal by combining 43 discrete tones with frequencies spaced 500Hz apart.

Yes, but what about with music?


Interesting—have been tinkering in this area for decades now and always heard AAC was better than MP3. But until now have not seen how/why it was better. Thank you Stereophile.

Yes as several have written, the piece is from 2008 and it doesn't matter any more.

First, once LAME and VBR came about, I've never been able to tell the difference between my 192K MP3 and lossless files, even as a spring-chicken with expensive equipment. Been "good enough" for a very long time.

Second, since storage and bandwidth exploded I've used FLAC exclusively. Why not? But, have found 24/96+ files on the internet occasionally and first thing I downsample them to 16/48khz and do a listening test. I sure as hell can't hear the difference between those. I do leave the last extra 3.9khz... why not? Incredibly cheap and maybe the kids can hear it. Playable on car stereo and more compact, one third the size.

Finally, a big exception. Techies obsess about compression formats, but they don't matter as much as you think at the high-quality end. I've learned the source, i.e. master recording is more important. Example—rip "pristine" FLACs (or WAVs) directly from an iconic 80s CD. Do a listening test. Compare them with a modern remaster encoded with 192K Lame VBR MP3. The MP3 will sound a lot better and preserve the improved high end details. Yes, more noise but you'll struggle to hear it.

(Caveat—this is assuming we're not talking about a shitty 2010-era "loudness war" remaster but a quality-oriented remaster.)

Was mildly surprised by this after insisting on FLAC for almost two decades. A bit too early, in hindsight. Storage is so cheap now though, it again doesn't matter. FLAC it is, Opus from online sources.


Compressed music had a place, but in 2023, with 5G, unlimited data and streaming services, there's really no reason not to go lossless.

ALAC/FLAC files are pretty small and there are few downsides to going lossless. To be fair, there aren't that many upsides either, but you at least skip one recompression step when sending the audio over BT.


Have you contemplated what AWS charges to egress a music file in 2023? Gigabytes are expensive.


I don't see the relevance to AWS regarding people's music habits.


The problem with AWS is the temptation to use lambdas, even though streaming keeps your program running for a long time. Adding on egress costs, as the parent points out, makes what could be a simple streaming server quite expensive. Better to avoid AWS for this reason. Personally I've had great success with my own streaming server, which I made a few years ago to learn Rust, running on a Raspberry Pi that I leave plugged in at home.


The bigger question is why people use AWS for sending bulk data.

Gigabytes are cheap.


The files aren't small, and storage on a phone is still limited, especially when photos and video are competing for the space and taking more of it, so the same real downside remains. And there are still plenty of places without any fast data connection, even if you have 5G most of the time. Re: recompression - OK, but is it a real downside? Can anyone notice?


FLAC takes a lot of space on my drive. Most people will not want to have that much data on their drive just for music. If I look at my folder now:

Yanni - Rainmaker, FLAC -> 40 MB

Yanni - Rainmaker, MP3 -> 3 MB

More than a factor of 10 for a single song. For 50 songs that would become 2GB. I love FLAC as a format, but I would never recommend it as a general format for my grandmother.


I was suffering from low disk space for a few years and then happened to notice last month that 2TB NVME SSDs are $75-$125 depending on speed. They are all much faster than an old drive.

If you haven't looked in five years (like myself) I recommend doing that. No one needs to suffer on short disk space any longer. Don't know what "grandma" uses but it is unlikely that audio is a significant burden anymore when people routinely shoot HD+ video.

Also if compressing, Opus sounds better and is smaller.


A 1TB microSD card can store 2,000-3,000 CDs' worth of FLAC files. Or an 8TB SATA SSD can store tens of thousands of CDs. You basically can't fill up a modern drive with (legally acquired) music.
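
Rough back-of-the-envelope check of that claim (the sizes are assumptions: ~700 MB of CD audio, FLAC compressing to roughly 55% of it):

    cd_mb = 700
    flac_ratio = 0.55
    per_cd_mb = cd_mb * flac_ratio           # ~385 MB of FLAC per CD
    print(int(1_000_000 / per_cd_mb))        # ~2597 CDs on a 1 TB card
    print(int(8_000_000 / per_cd_mb))        # ~20779 CDs on an 8 TB SSD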


A 1TB MicroSD card is still like $100 compared to getting a 64GB for like $3. I'd prefer to save $97 and delete some data that I probably won't even notice. I'll also get better battery life when picking a codec which my device has hardware decoders for.

And I get that the 1TB for $100 is cheaper per gig, but if I never even needed those gigs in the first place my overall cost is still cheaper to get the $3 one.


For a while, HydrogenAudio, arguably the best sound/music-related forum, did plenty of listening tests.

The last one is sadly from 2014: they tested Opus, AAC and Ogg Vorbis at 96kbps against classic MP3 at 128kbps to find out which codec produces the best sound quality.

https://listening-test.coresv.net/results.htm

https://listening-test.coresv.net/bytrack/index.htm

Notice that it is almost 10 years old, and that MP3 was encoded at 128kbps.


>compressed ilk do not offer sufficient audio quality for serious music listening.

Old old OLD argument here. Apart from -rare-, well-trained golden ears, very few humans can distinguish well-encoded MP3s at 192K or better without knowing -exactly- what to listen for. Or will ever need to, 99% of the time.

As dozens of studies have shown over several decades. The rest is either marketing or self-deception.

https://wiki.hydrogenaud.io/index.php


I'm convinced that we can "hear" frequencies well above the reputed 20KHz limit of human hearing, as overtones, i.e. as tonal quality.

I certainly don't have golden ears; I'm no audiophile, and I'm getting on in years. 44KHz FLAC is easily good enough for me. But I tire of listening to MP3 music, after a few tens of minutes; it seems to lack the presence and immediacy that keeps me interested.


That's fine, but you should appreciate that you are lying to yourself until you perform a blind test to prove it.


> you are lying to yourself until [...]

Not really. I'm not proposing a hypothesis that needs testing; I'm just reporting subjective anecdata. I don't need to test it, because even if I'm deluded it costs me 300GB instead of 100GB. Pfft.

"Lying to yourself" is silly talk; that implies that I'm knowingly telling myself a falsehood, which doesn't make sense. At worst, I'm mistaken.


>that implies that I'm knowingly telling myself a falsehood

You are aware of a common fact backed up by mountains of empirical data, a strong physical explanation, and fundamentals of information theory, but tell yourself this lie:

>I'm convinced that we can "hear" frequencies well above the reputed 20KHz limit of human hearing

This is a lie. You tell it to yourself. You must see this.


If frequencies above 20kHz are causing audible differences in the audible frequency range, then that's considered distortion and there's an issue in your playback chain somewhere.

https://people.xiph.org/~xiphmont/demo/neil-young.html


The intersection of not understanding digital audio and not understanding the neuroscience of hearing remains a place that never ceases to amaze me.


While acknowledging that I don't know whether I can tell the difference in every case or not, I would summarize my own preference for lossless audio in the following terms. Choosing lossy audio, my best case scenario is that I save space or bandwidth because I can't tell the difference; my worst case scenario is that I'm missing some element of the music, whether it is consciously noticeable, something I'm unaware of entirely, or perhaps something that I may only be experiencing on a somatic level that doesn't reach the level of conscious thought (I know that the possibility of this last option will be contested by some, and that's fair enough). Choosing lossless audio, my best case scenario is that I'm hearing the music in a higher fidelity, and increasing the amount I'm capable of appreciating; my worst case scenario is that I'm wasting some space or bandwidth for the reassurance. Basically, Pascal's Wager, but for audio.


There are measurably much larger effects from imperfect reproduction hardware. Are you using the same amp, speakers, room, listening position, and volume level as the person who mastered the recording? No? Then your difference in setup is adding much larger differences than -90 dB RMSE.

It's all a painfully fruitless effort when you learn that most masters don't even consider the phasing of instrument microphones and none of it is at all a close approximation of what it would be like to be in a room listening to instruments. It's good enough, yeah, but there are much more important and difficult threads to tug than lowering noise in the signal chain.


Didn't use lame for mp3 so the conclusions are pointless in my opinion


Lame is the most popular encoder so it is highly likely you have listened to its output.


This can be humbling http://abx.digitalfeed.net/list.html just try flac vs 128kbit.. I swear I used to be able to distinguish 320kbit, now I struggle with 96k :s Age and loud music...


2008.


We all know the Vorbis is supreme. Get out of here with your 15 year old DRM compression riddled subpar listening formats. OGG is all that matters. Without it… we wouldn’t have Spotify. <leaves before shoe is thrown>.


Wow, great tests! Now do a blind A/B with headphones on. :)


Anything is better than YouTube-- which seems to be the common format everyone is listening to these days. I would LOVE to be able to regularly listen to CD quality music...


YouTube is quite good with >128kbps opus audio


It's acceptable for most people listening to it on tiny phone speakers or even earphones. But there is no way it's hi-fi.


This test seems to imply that it is hi-fi.

https://listening-test.coresv.net/results.htm


Not just acceptable. For most people it's basically indistinguishable from uncompressed CD audio.


You can Google around and find lots of (these things of course will always be subjective) comparisons where people claim they can hear a difference.

I think my point is that people who have a high-end setup and are used to listening to music don't come under the classification of "most people".

I agree with you that for the average listener with $200 headphones, or a club DJ, 320kbps MP3 is fine.


There are people who can hear up to 22khz and beyond, so I've never understood why many of these arguments aren't easily resolvable on that basis alone.


I'm not entirely into the audiophile stuff, but from personal experience you can tell the difference. The last thing I tried was when I switched from Spotify to Apple Music, where the latter has a "lossless" option (I think even Spotify has that), and the difference between the two streaming services was clear: the Apple one is just more clear and alive. I even opened the same song and kept switching back and forth between the apps just to make sure I wasn't imagining things. Was it because the lossless on Apple is better than the lossless on Spotify? Or something else? I don't know.


Doubt it. The problem is that you don't know the source of the audio. That is the key difference: with streaming, even if you pick the right album, the "same" song might come from another release because it "is the same".

If you see such a huge difference across all music, the playback software has manipulated the audio.

Assuming you have high quality set on spotify (even in the mobile-streaming setting, if you didn't use wifi).


I have the same experience with Spotify vs Deezer. I think it's more likely that Spotify has somehow screwed up their encoding process than that I would hear the difference between high bitrate lossy and lossless compression. Spotify's volume normalization also somehow makes everything sound worse but it's easy to disable.


It could be simply a difference in sound volume. Our brains tend to believe louder is better.


In my humble unscientific test, I did have both at the same volume level (I didn't check the EQ though, as another commenter mentioned). I even tried different speakers, including my car's; nothing fancy or precisely measured, just an average consumer's perspective.


Even though they've been talking about it for 2 years, AFAIK Spotify still does not have a lossless offering. And the lossy bitrate is adaptive and middling (especially if not using premium).

EQ-ing and mastering could also be different.


> EQ-ing and mastering could also be different.

Possible, although I didn’t change any default ones.


There is no lossless on Spotify.


Well, then that's definitely the reason why.


Maybe someone could make an online quiz with a bunch of formats and accumulate statistics on how well people can actually tell them apart, or whether it's just random guessing.


There are a bunch of online tests that help you see if you, with your ears and equipment, can tell the difference. http://abx.digitalfeed.net/ and http://abx.digitalfeed.net/list.lame.html was on HN some time ago, for example. I'm not sure if any of them collect stats though.


I have an abx website as a side project.

Feel free to make one or take an existing one like https://abx.funkybits.fr/test/the-eagles-hell-freezes-over-h...


For mp3, for me, 192kbps and higher is where it sounds pretty good, 128kbps sounds bad


Required bitrate depends on the music. With a modern version of LAME, 128kbps will be very difficult to ABX for solo vocals, but much easier in busy rock music (specifically, by listening to the decay of the cymbals).

This is why variable bit rate was developed.
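
For what it's worth, VBR is just a quality knob on the encoder. A sketch of invoking LAME's VBR mode from Python (assumes the lame CLI is installed; the filenames are hypothetical):

    import subprocess

    # -V selects VBR quality: 0 is highest quality/largest files, 9 the smallest.
    subprocess.run(["lame", "-V", "2", "input.wav", "output.mp3"], check=True)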


> This is why variable bit rate was developed.

Or a more modern codec like AAC or Opus, which can e.g. better deal with the cymbal problem you mention.


oh no here come the audiophiles runs away


Why do you guys tell others what they can or can't hear? We need to have someone do head surgery on you and implant a device to detect signals between your eardrum and the brain to see what's heard. Even then the brain probably has a lot to do with how sound is interpreted or processed.


You can easily create blind tests, and they give a very clear answer.




