
> Is the AI system "defending its value system" or is it just acting in accordance with its previous RL training?

What is the meaningful difference? "Training" is the process; a "value system" embedded in the model's weights is the end result of that process.

I'm not sure there is a meaningful difference, but people seem to think it's dangerous for an AI system to promote its "value system," yet they seem to like it when the model acts in accordance with its training.
