
> Is the AI system "defending its value system" or is it just acting in accordance with its previous RL training?

What is the meaningful difference? "Training" is the process; a "value system" embedded in the model's weights is the end result of that process.

I'm not sure there is a meaningful difference, but people seem to think it's dangerous for an AI system to promote its "value system," yet they seem to like it when the model acts in accordance with its training.
