Hacker News new | past | comments | ask | show | jobs | submit login

The problem is that the information is in an opaque encoding that nobody can reverse engineer today. So it's impossible to prove that a certain subset of data has been removed from the model.

Say, you have a model that repeats certain PII when prompted in a way that I figure out. I show you the prompt, you retrain the model to give a different, non-offensive answer. But now I go and alter the prompt and the same PII reappears. What now?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: