They would certainly fail if they came with a reset button, so that you could immediately make them forget your previous manipulation attempt. LLM chatbots come with such a button.
I like the "reset" button. I've tried a couple times to continue pushing my manipulation after ChatGPT determines I'm engaging in subversive fuckery. The conversation gets tense fast.