
I have been reading arguments about "things like alignment faking" for years, while simultaneously holding that "it's just autocomplete".

The alignment-faking arguments are still terrifying to the extent that they're plausible. In the hypothetical where I'm wrong about it being "just autocomplete" (and fundamentally, inescapably so), the risk is far greater than can be justified by the potential benefits.

But that's itself a large part of why I believe those arguments are false. If I gave them credit and they turned out to be false, then I figure I'd have succumbed to a form of Pascal's Mugging. If I don't give them credit and it turns out that a hostile, agentive AGI has been pretending to be aligned, I don't expect anyone (including myself) to survive long enough to rub it in my face.

Honestly, I sometimes worry that we'll doom ourselves by taking AI too seriously even if it's indeed "just autocomplete". We've already had people commit suicide. The sheer amount of plausible-sounding text that can now be generated proposing harmful actions is worrying, even if it doesn't reflect the intent of any agent to convince others to take those actions. (See also e.g. Elsagate.)




> But that's itself a large part of why I believe those arguments are false. If I gave them credit and they turned out to be false, then I figure I'd have succumbed to a form of Pascal's Mugging. If I don't give them credit and it turns out that a hostile, agentive AGI has been pretending to be aligned, I don't expect anyone (including myself) to survive long enough to rub it in my face.

I'm sorry, but this is a crazy reason to believe something is false. Things are either true or they aren't, and the fact that the world would be nicer to live in if thing X were false does not actually bear on whether thing X is false.


It's not "the world would be nicer to live in if thing X were false".

It's "the world would cease to exist if thing X were true".



