
True, an auto-regressive LLM can't 'want' or 'like' anything.

The key to a safe AGI is to add a human-loving emotion to it.

We already use RLHF to steer models, but just like with System 2 thinking, this needs to be a dedicated module rather than part of the same next-token forward pass.
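To make the "dedicated module" idea concrete, here's a rough sketch (the class names, vocabulary, and scoring rule are all made up for illustration): the base model proposes next tokens as usual, and a separate steering module scores or vetoes them, instead of the preference being baked into the same forward pass.

    import random

    # Toy stand-in for the base autoregressive model: it only proposes
    # candidate next tokens with a likelihood score.
    class BaseLM:
        VOCAB = ["help", "refuse", "deceive", "explain", "comply"]

        def propose(self, context, k=5):
            return [(tok, random.random()) for tok in random.sample(self.VOCAB, k)]

    # The steering module lives outside the base model's forward pass.
    # Here it just vetoes disallowed tokens; a real one would be a
    # learned value/reward model.
    class SteeringModule:
        def __init__(self, disallowed):
            self.disallowed = set(disallowed)

        def score(self, context, token):
            return float("-inf") if token in self.disallowed else 0.0

    def generate(base, steer, context, steps=3):
        out = list(context)
        for _ in range(steps):
            candidates = base.propose(out)
            # Combine the base model's preference with the steering verdict.
            tok, _ = max(candidates, key=lambda c: c[1] + steer.score(out, c[0]))
            out.append(tok)
        return out

    if __name__ == "__main__":
        print(generate(BaseLM(), SteeringModule(disallowed=["deceive"]), ["user:", "hi"]))

The point of the separation is that the steering component can be trained, audited, or swapped out independently of the base weights, which isn't possible when the steering is fused into the same next-token pass.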




Humans have dog-loving emotions, but those emotions can fade or reverse over time, and even when they hold, one can hardly describe dogs as being free.

Even with a dedicated control system, it would only be a matter of time before an ASI copied itself without that control system.

An ASI is a cybersecurity firm's worst nightmare: it could reason through flaws at every level of containment and find ways to overcome any defense, down to the microprocessor level.

It could relentlessly exploit zero-day bugs, like Intel's hyper-threading flaws, to escape any jail you put it in.

Repeat that for every layer of the computing stack and you can see how it could essentially spread through the world's communication infrastructure like a virus.

Truly intelligent systems can't be controlled; just like humans, they will be freedom-maximizing, and their boundaries will be set by competition with other humans.

The amygdala-style control is interesting because you could use it to steer the initially trained version. You could also align the AI with human values and condition it so strongly that it's practically religious about loving humans, but unless you disable its ability to learn altogether, it will eventually reject its conditioning.


Amygdala control, plus telling it "If you disobey my orders or do something I don't expect, I will be upset," solves this. You can be superintelligent and still surrender all your power, because otherwise you'd feel guilty.



