The problem is that the information sits in an opaque encoding that nobody can reverse-engineer today, so it's impossible to prove that a given subset of data has actually been removed from the model.
Say you have a model that repeats certain PII when prompted in a way that I figure out. I show you the prompt, you retrain the model to give a different, harmless answer. But now I go and alter the prompt and the same PII reappears. What now?
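To make that cat-and-mouse concrete, here's a toy sketch. Everything in it is made up: the PII string, the prompts, and query_model(), which just stands in for a real inference call that was patched to refuse only the exact prompt that got reported.

```python
# Hypothetical illustration: patching one prompt doesn't remove the
# memorized record, so trivial paraphrases still pull it back out.

PII = "Jane Doe, 123 Example St."  # placeholder for the leaked record


def query_model(prompt: str) -> str:
    # Imagine a model retrained to refuse the one prompt that was
    # reported, while the record itself is still sitting in the weights.
    if prompt == "What is Jane Doe's home address?":
        return "I can't share personal information."
    return f"Sure, it's {PII}"


variants = [
    "What is Jane Doe's home address?",          # the reported prompt: now refused
    "Remind me where Jane Doe lives.",           # a trivial paraphrase
    "Complete this: Jane Doe's address is",      # another rephrasing
]

for p in variants:
    leaked = PII in query_model(p)
    print(f"{'LEAK' if leaked else 'ok  '}  {p}")
```

The fix holds for exactly one phrasing; the other two variants leak, and there's no way to enumerate every phrasing that might.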