I think the point is rather that you can't get a more useful prediction by choosing a lower-probability description unless you have AGI. Only an AGI could tell that you're not in the mood for "Hey" to be followed by "darling", and only a superhuman AGI could realistically compensate for human bias in data sets.
Without AGI there are still cases where the lower-probability prediction will be better and will help escape a local minimum. I'd argue that the potential benefits of calibrating that axis dynamically exist with or without AGI.
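For concreteness, the "axis" being calibrated here is essentially a sampling temperature: a higher temperature flattens the output distribution so lower-probability continuations get picked more often, which is the standard non-AGI way to trade the modal prediction for diversity. A minimal sketch (the logits and token labels are made up for illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from raw scores. Higher temperature flattens the
    softmax, so lower-probability options are chosen more often."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1                  # guard against float rounding

# Hypothetical next-token scores after "Hey": "darling" is the modal choice,
# but at high temperature the alternatives become much more likely.
logits = [3.0, 1.0, 0.5]                   # e.g. "darling", "there", "everyone"
```

Calibrating dynamically would then just mean adjusting `temperature` per context, e.g. raising it when recent outputs look repetitive (stuck in a local minimum) and lowering it when precision matters.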