But on a serious note, I think it's the ambiguity. What if the model refuses one prompt but then accepts another that is essentially the same, just worded differently?
Or what if it refuses a prompt at one point, then accepts the exact same one on the next run, for some fuzzy reason that can't be debugged?
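That second case isn't even hypothetical: with any nonzero sampling temperature, the same prompt can take a different token path on every run. Here's a minimal toy sketch of why (the two-token distribution and the `sample_response` helper are invented for illustration, not any real model's API):

```python
import random

# Toy model: after reading the prompt, the model puts probability mass
# on a "comply" continuation and a "refuse" continuation.
# These numbers are made up for illustration.
NEXT_TOKEN_PROBS = {"comply": 0.55, "refuse": 0.45}

def sample_response(prompt: str, temperature: float, rng: random.Random) -> str:
    """Sample one continuation. At temperature 0 we always take the
    argmax; at temperature > 0 we sample, so identical prompts can
    diverge between runs."""
    if temperature == 0:
        return max(NEXT_TOKEN_PROBS, key=NEXT_TOKEN_PROBS.get)
    # Temperature reshapes the distribution: p_i ** (1/T), renormalized.
    weights = {t: p ** (1.0 / temperature) for t, p in NEXT_TOKEN_PROBS.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # float-rounding fallback: return the last token

rng = random.Random()  # unseeded, like a production sampler
prompt = "the exact same prompt"
print([sample_response(prompt, temperature=1.0, rng=rng) for _ in range(5)])
# e.g. ['comply', 'refuse', 'comply', 'comply', 'refuse'] -- same input,
# different outcomes, with nothing in the prompt itself to debug.
```

Same input, different answers, and no amount of staring at the prompt will explain which one you'll get next time.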