I feel like there needs to be a model that fixes faces to clean this up. Humans are so attuned to faces that I can imagine it would take a specialized model to render convincing faces. Maybe there could be a layer to identify and occlude existing pseudo-faces generated by Stable Diffusion and another model to populate the occlusion.
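To make the occlude-then-fill idea a bit more concrete, here is a rough sketch of just the masking half. Everything in it is an assumption for illustration: the face boxes are made up (a real pipeline would get them from some face detector), and `build_face_mask` and `pad` are names I invented. The resulting mask is the kind of thing you would hand to an inpainting model along with the original image.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def build_face_mask(width: int, height: int, boxes: List[Box], pad: int = 8) -> List[List[int]]:
    """Return a height x width grid: 1 where a face should be regenerated, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        # Grow each box slightly so the fill model has room to blend the edges,
        # then clamp to the image bounds.
        x0, y0 = max(0, left - pad), max(0, top - pad)
        x1, y1 = min(width, right + pad), min(height, bottom + pad)
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = 1
    return mask


# Hypothetical detector output for a 64x48 image (not real data).
boxes = [(10, 10, 20, 22), (50, 30, 70, 60)]
mask = build_face_mask(64, 48, boxes, pad=2)
covered = sum(map(sum, mask))  # number of pixels that would be repainted
```

The padding is the only real design choice here: masking a little beyond the detected box gives the second model some surrounding context to blend into, at the cost of repainting more of the original image.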
These shots of an exit-less, claustrophobic NYC subway full of mangled, faceless things are the stuff of nightmares:
https://lexica.art/?q=new+york+subway