Some of the best recordings I've heard (NPR) are only on YouTube. This leads me to believe recording quality is orders of magnitude more important than encoding, as long as a decent bitrate and encoding scheme were used.
The quality ceiling for any recording you have is the quality ceiling of the weakest link in your audio pipeline.
This means that to get good sound out of any system, you have to feed it a good signal, and that chain starts with the recording.
Current audio codecs are great from a psychoacoustic point of view. A good encoder can create an enjoyable file at modest bitrates (192 kbps for MP3 and 128 kbps for AAC, IIRC) and retain most of the detail.
The audible residue when you subtract an MP3 from a FLAC is not detail per se, but instrument separation and the perceived size of the soundstage. People generally call this snake oil, but I've had the same amplifier for the last 30 years, and I can tell how different qualities of audio render through the same pipeline. A good recording stored losslessly can bring the concert to your home, up to a point. MP3 re-encodings of the same record sound flatter and smaller.
Lastly, it's not possible to completely contain the sound of a symphony orchestra in a stereo recording. That's not happening. So there's always a limit.
A reduction in soundstage/width is likely due to the "joint stereo" or "intensity stereo" encoder modes, which do things such as mid-side (M-S) conversion (not itself the culprit) in order to give more bits to M (better quality for sounds with high L-R correlation, like vocals) and fewer bits to S (worse quality for sounds with low L-R correlation, like a stereo-miked drum kit).
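In case it helps to see it concretely, here's a minimal sketch of the M-S transform described above, in Python with numpy (the function names and setup are mine, and real encoders decide this per frame rather than for the whole track):

    import numpy as np

    def lr_to_ms(left, right):
        # Highly correlated material (center-panned vocals) lands mostly in 'mid';
        # decorrelated material (stereo-miked cymbals, reverb) lands mostly in 'side'.
        mid = (left + right) / 2.0
        side = (left - right) / 2.0
        return mid, side

    def ms_to_lr(mid, side):
        # The inverse transform; M-S conversion on its own is lossless.
        return mid + side, mid - side

The conversion is perfectly reversible by itself; the soundstage loss comes from the encoder then spending fewer bits on 'side', not from the M-S math.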
If using plain old "stereo" mode instead, this problem doesn't occur, but you need a higher overall bitrate for correlated sounds to come through at the same quality, so it's rarely used at modest bitrates and instead tends to be reserved for only the highest bitrates.
Thus, comparing mp3@192 with mp3@320 often actually means comparing mp3@192 joint with mp3@320 stereo, so the listener will find very little if any improvement in the quality of mono-miked, center-panned sounds (vocals, etc.) but a decent improvement in wide sounds (cymbals, reverb, string sections, etc.), since the 320 has only a few more bits for "mid" but way more bits for "side", so to speak, relative to the 192.
Thanks for the technical details; I didn't know how intensity/joint stereo modes work. However, I never use them.
The tests I've done were all encoded by myself. I purchased the 24-bit WAV of Radiohead's OK Computer remaster, then encoded it to FLAC and to 320 kbps CBR stereo MP3 with LAME. I can still hear the difference in soundstage, and I can create an audible residue file by subtracting the MP3 from the FLAC version.
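For anyone who wants to try the same residue test, this is roughly the idea, sketched in Python with numpy and soundfile (the filenames are placeholders, and it assumes the MP3 has already been decoded back to WAV at the same sample rate; in practice you also have to compensate for the encoder delay/padding LAME adds, which this sketch only handles crudely by truncating to the shorter file):

    import numpy as np
    import soundfile as sf

    ref, sr_ref = sf.read("okcomputer_flac.wav")          # lossless reference
    mp3, sr_mp3 = sf.read("okcomputer_mp3_decoded.wav")   # decoded 320 kbps CBR MP3
    assert sr_ref == sr_mp3, "resample one of the files first"

    # Truncate to a common length; a careful test would align the files
    # sample-accurately (e.g. via cross-correlation) before subtracting.
    n = min(len(ref), len(mp3))
    residue = ref[:n] - mp3[:n]

    print("residue RMS:", np.sqrt(np.mean(residue ** 2)))  # relative to full scale 1.0
    sf.write("residue.wav", residue, sr_ref)                # listen to what got thrown away

Keep in mind the residue file exaggerates things a bit: hearing the difference in isolation is much easier than hearing it masked under the full mix.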
I agree that the current iteration of encoders creates very good audio; however, given an audio system that can render high-resolution audio, the difference is still audible.
Not every human is created equal when it comes to ears and sound processing. I have met people with sub-20 Hz hearing, and people with ears so sharp that they could pick out a single wrong note from a single instrument while watching a recording of a symphony orchestra (I played with them).
MP3@320kbps CBR and AAC@256kbps are pretty good for normal listening, but if you have the hardware to render it, lossless formats create a richer soundstage. I have an amplifier that can render it, and I've been listening to music with it for 30 years now, so I can hear the difference.
At the end of the day, if your audio pipeline can render the differential residue between MP3@320kbps and FLAC, you can hear it.
Now, you could ask, "are you attentive enough to perceive such a difference?" I'm not listening that intently 75% of the time, but it pays off when I set some time aside to listen to my favorite album for the sake of listening to it.
By far the main source of degradation in any typical analog audio path is going to be transducers (microphones, speakers, phono cartridges, tape heads) and inferior media (tape, etc.). The vast majority of modern amplifiers, and high end older amplifiers, are extremely transparent with good margins beyond typical human hearing; any issues like noise, harmonic distortion, uneven response, inadequate damping factor, etc. introduced at the amplifier would typically be masked by quirks of the speakers, and revealed only by measuring upstream of the speakers.
The problem is not the number of ears we have, but the amount of air moved by the instruments themselves and how they interact with each other.
A symphony orchestra is normally miked per group (2 mics for violins, 2 for trumpets, etc.), but if you're around 60 people, you can mic every instrument individually.
To reproduce the sound 1:1, you need to mic every instrument individually and play them back through speakers matching each instrument's frequency response and air pressure. So you need speakers equal in number and characteristics to the instruments themselves. On top of that, you need to record them with ideal microphones and store everything losslessly along the way.
Otherwise, you can't recreate the sound by recording 100 people with 20 microphones and downmixing them to two channels. It's not possible. I played double bass in an orchestra, listened to countless orchestras, and listened to the recordings of our own concerts. The gap is enormous.
I genuinely have no opinion. I like to use a vintage amplifier with a couple of beefy bookshelf speakers. I run a couple of Heco Celan GT302s with an AKAI AM-2850.
It's a very well balanced system for my needs and room size. That's a pretty nifty setup for me.
In one of my previous lives I built an encoding platform for all the major record labels. Part of this involved listening to hundreds of tracks to try and optimize the encoder settings. It's not necessarily the quality of the original recording, but simply the type of audio. For instance, the absolute hardest, IMO, were "unplugged" albums, e.g. solo singer, acoustic guitar. Lossy compression would shit itself on those.