What you're describing is more or less why noise suppression algorithms in general cannot really improve speech intelligibility. Unless they're given extra cues (like with a microphone array), there's nothing they can do in real time that will beat what the brain is capable of with "delayed decision" (sometimes you'll only understand a word 1-2 seconds after it's spoken). So the goal of noise suppression is really just to make the speech less annoying to listen to when the SNR is already high enough that intelligibility isn't affected.
That being said, I still have control over the tradeoffs the algorithm makes by changing the loss function, i.e. how different kinds of mistakes are penalized.
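To make "how different kinds of mistakes are penalized" a bit more concrete, here's a toy sketch. It is not the actual loss used in any particular denoiser; the function name `asymmetric_gain_loss`, the idea of comparing per-band gains, and the weight of 4.0 are all assumptions made up for illustration. The point is only that over-suppression (attenuating speech) and under-suppression (leaving noise in) can be given different costs.

```python
def asymmetric_gain_loss(ideal_gains, predicted_gains, over_suppression_weight=4.0):
    """Weighted mean squared error between ideal and predicted per-band gains.

    Hypothetical illustration: a predicted gain below the ideal gain means the
    speech is being attenuated too much, so that error is multiplied by
    over_suppression_weight. A predicted gain above the ideal gain only leaves
    some noise in, so it keeps a weight of 1.
    """
    if len(ideal_gains) != len(predicted_gains):
        raise ValueError("gain vectors must have the same length")
    if not ideal_gains:
        raise ValueError("gain vectors must not be empty")

    total = 0.0
    for ideal, predicted in zip(ideal_gains, predicted_gains):
        error = predicted - ideal
        # Pick the penalty depending on the direction of the mistake.
        weight = over_suppression_weight if error < 0.0 else 1.0
        total += weight * error * error
    return total / len(ideal_gains)


# Made-up gains for four frequency bands, each in [0, 1].
ideal = [0.9, 0.7, 0.3, 0.1]
too_aggressive = [g - 0.1 for g in ideal]  # removes some speech
too_lenient = [g + 0.1 for g in ideal]     # leaves some noise

loss_aggressive = asymmetric_gain_loss(ideal, too_aggressive)
loss_lenient = asymmetric_gain_loss(ideal, too_lenient)
```

With these settings the same 0.1 error costs four times more when it eats into the speech than when it leaves noise behind, so a model trained on such a loss would lean toward leaving a bit of noise. Setting `over_suppression_weight` below 1 would push the tradeoff the other way.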