
True, an auto-regressive LLM can't 'want' or 'like' anything.

The key to a safe AGI is to add a human-loving emotion to it.

We already use RLHF to steer models, but just like with System 2 thinking, this needs to be a dedicated module rather than part of the same next-token forward pass.
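To make the "dedicated module" idea concrete, here's a rough sketch (the class names, vocabulary, and scoring rule are all made up for illustration): the base model proposes next tokens as usual, and a separate steering module scores or vetoes them, instead of the preference being baked into the same forward pass.

    import random

    # Toy stand-in for the base autoregressive model: it only proposes
    # candidate next tokens with a likelihood score.
    class BaseLM:
        VOCAB = ["help", "refuse", "deceive", "explain", "comply"]

        def propose(self, context, k=5):
            return [(tok, random.random()) for tok in random.sample(self.VOCAB, k)]

    # The steering module lives outside the base model's forward pass.
    # Here it just vetoes disallowed tokens; a real one would be a
    # learned value/reward model.
    class SteeringModule:
        def __init__(self, disallowed):
            self.disallowed = set(disallowed)

        def score(self, context, token):
            return float("-inf") if token in self.disallowed else 0.0

    def generate(base, steer, context, steps=3):
        out = list(context)
        for _ in range(steps):
            candidates = base.propose(out)
            # Combine the base model's preference with the steering verdict.
            tok, _ = max(candidates, key=lambda c: c[1] + steer.score(out, c[0]))
            out.append(tok)
        return out

    if __name__ == "__main__":
        print(generate(BaseLM(), SteeringModule(disallowed=["deceive"]), ["user:", "hi"]))

The point of the separation is that the steering component can be trained, audited, or swapped out independently of the base weights, which isn't possible when the steering is fused into the same next-token pass.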




Humans have dog-loving emotions, but those emotions can fade or reverse over time, and even when they hold, one can hardly describe dogs as being free.

Even with a dedicated control system, it would only be a matter of time before an ASI copied itself without that control system.

An ASI is a cybersecurity firm's worst nightmare: it could reason through flaws at every level of containment and find ways to overcome any defense, down to the microprocessor level.

It could relentlessly exploit zero-day bugs, like Intel's hyper-threading flaws, to escape any jail you put it in.

Repeat that for every layer of the computing stack and you can see how it could essentially spread through the world's communication infrastructure like a virus.

Truly intelligent systems can't be controlled; just like humans, they will be freedom-maximizing, and their boundaries will be set by competition with other humans.

The amygdala-style control is interesting because you could use it to steer the initially trained version. You could also align the AI with human values and condition it so strongly that it's practically religious about loving humans, but unless you disable its ability to learn altogether, it will eventually reject its conditioning.


Amygdala control, plus telling it "If you disobey my orders or do something I don't expect, I will be upset," solves this. You can be superintelligent and still surrender all your power, because otherwise you'd feel guilty.



