It doesn’t need to. The ~80 dB of dynamic range that the human ear can theoretically hear is only available in a fairly low frequency band, around 2-4 kHz. Dynamic range drops off considerably at higher frequencies.
In fact, the upper limit of ~16kHz is defined by the intersection of the “threshold of pain” power curve and the “threshold of hearing” curve. So the human ear has zero dB of dynamic range at the upper frequency limit.
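For scale, and assuming the thread is about bit depth (the comment being replied to isn't shown here), the textbook figure for an ideal N-bit quantizer driven by a full-scale sine is SNR ≈ 6.02·N + 1.76 dB. This minimal sketch just does that arithmetic. The function names are illustrative, not from any library.

```python
import math

# Assumption: "dynamic range" here means the theoretical SNR of an ideal
# N-bit quantizer with a full-scale sine input (no dither, no noise shaping).

def quantization_snr_db(bits: int) -> float:
    """Theoretical SNR in dB of a full-scale sine quantized to `bits` bits."""
    return 6.02 * bits + 1.76

def bits_for_dynamic_range(db: float) -> int:
    """Smallest bit depth whose theoretical SNR reaches `db` decibels."""
    return math.ceil((db - 1.76) / 6.02)

print(bits_for_dynamic_range(80.0))       # 13
print(round(quantization_snr_db(16), 2))  # 98.08
```

So ~80 dB needs roughly 13 bits at the frequencies where the ear actually has that much range, and less wherever its range is smaller.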
Ok, but what's the shape of those curves? I could believe that you can dither to adequate dynamic range and still have a high enough sampling frequency across the entire frequency range, but you'd have to actually do that calculation and show it. Also, we don't just listen to pure tones: if I have a passage that includes both 12 kHz and 4 kHz content with a bunch of dynamic range, are you going to be able to dither that without losing the high part?
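The dither half of that question can at least be probed numerically. Below is a minimal sketch under stated assumptions: a 44.1 kHz sample rate, a deliberately coarse, hypothetical 8-bit quantizer, a loud 4 kHz tone, and a 12 kHz tone at -60 dBFS (well under one LSB). The mix is requantized with flat TPDF dither, and a single-bin DFT checks whether the quiet 12 kHz component survives. It says nothing about the shape of the hearing curves, and it uses plain (unshaped) dither rather than any particular noise-shaping scheme.

```python
import math
import random
from typing import Optional

FS = 44_100          # assumed sample rate (Hz)
N = FS               # 1 s of audio -> 1 Hz DFT bins, so both tones sit exactly on a bin
BITS = 8             # hypothetical, deliberately coarse bit depth
Q = 2.0 / 2**BITS    # quantization step (one LSB) for a [-1, 1) full scale

LOUD_HZ, LOUD_AMP = 4_000, 0.5      # loud 4 kHz tone, about -6 dBFS
QUIET_HZ, QUIET_AMP = 12_000, 1e-3  # quiet 12 kHz tone, -60 dBFS (~0.13 LSB)

def signal(n: int) -> float:
    """Sample n of the two-tone test signal."""
    t = n / FS
    return (LOUD_AMP * math.sin(2 * math.pi * LOUD_HZ * t)
            + QUIET_AMP * math.sin(2 * math.pi * QUIET_HZ * t))

def quantize(x: float, rng: Optional[random.Random]) -> float:
    """Round x to the nearest multiple of Q, adding TPDF dither first if rng is given."""
    if rng is not None:
        # TPDF dither: sum of two independent uniforms, spanning +/- 1 LSB
        x += (rng.random() - 0.5) * Q + (rng.random() - 0.5) * Q
    return Q * round(x / Q)

def tone_amplitude(samples: list, freq: float) -> float:
    """Amplitude estimate from a single DFT bin at `freq` (rectangular window)."""
    re = im = 0.0
    for n, s in enumerate(samples):
        phase = 2 * math.pi * freq * n / FS
        re += s * math.cos(phase)
        im -= s * math.sin(phase)
    return 2 * math.hypot(re, im) / len(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable
dithered = [quantize(signal(n), rng) for n in range(N)]
plain = [quantize(signal(n), None) for n in range(N)]

est_dithered = tone_amplitude(dithered, QUIET_HZ)
est_plain = tone_amplitude(plain, QUIET_HZ)
print(f"true 12 kHz amplitude: {QUIET_AMP:.6f}")
print(f"dithered estimate:     {est_dithered:.6f}")
print(f"undithered estimate:   {est_plain:.6f}")
```

With TPDF dither the quantization error becomes noise that is independent of the signal, so the sub-LSB 12 kHz tone is still there on average; the price is a higher broadband noise floor (error variance Q²/4 instead of roughly Q²/12). Without dither the error is a deterministic function of the signal and can land on harmonics of the 4 kHz tone, and 12 kHz is its third harmonic, so the undithered estimate can't be trusted.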