You know all the stories about the capricious djinn that grants cursed wishes based on the literal wording? That's what we have. Those of us who've been prompting models in image space for years now have gotten a handle on this but for people who got in because of LLMs, it can be a bit of a surprise.
One fun anecdote, a while back I was making an image of three women drinking wine in a fancy garden for a tarot card, and at the end of the prompt I had "lush vegetation" but that was enough to tip the women from classy to red nosed frat girls, because of the double meaning of lush.
Programming is already the capricious djinn, only it's completely upfront as to how literally it interprets your commands. The guise of AI being able to infer your actual intent, which is impossible to do accurately, even for humans, is distracting tech folks from one of the main blessings of programming: forcing people to think before they speak and hone their intention.
One fun anecdote, a while back I was making an image of three women drinking wine in a fancy garden for a tarot card, and at the end of the prompt I had "lush vegetation" but that was enough to tip the women from classy to red nosed frat girls, because of the double meaning of lush.