
>but in general they are NOT intelligent enough to be given an order like 'under no circumstances ever do X' in such a way that they can not be 'jailbroken' into breaking that rule.

I don't think this is really a question of being intelligent enough. Never mind that people sometimes abuse this fact: really, what rule can never be broken under any circumstances?

For example, the very first time I heard about the famous paperclip maximiser problem, while I did agree with the general "what we optimize for isn't necessarily what we get" message, for the specifics presented I couldn't help but think, "Well, that just sounds like a dumb robot."

What kind of general intelligence wouldn't understand that its creator race wouldn't want to be killed in the pursuit of some goal?

Certainly a GPT-X superintelligence could still off humanity, but at least we can rest assured it wouldn't do it by following some goal with monkey's-paw specificity.

It's possible that such a ruthless, goal-driven intelligence exists or can be created, but I don't think that aspect of it has anything to do with the level of its intelligence.




> What kind of general intelligence wouldn't understand that its creator race wouldn't want to be killed in the pursuit of some goal?

This indicates that you misunderstood the paperclip maximizer thought experiment. The intelligence can perfectly well understand that. In fact, it can understand it better (in more depth) than you or any human can. That has nothing to do with what it will choose to do with that knowledge. Knowing something does not automatically make it care.

You probably know perfectly well that animals can suffer and don't want to be tortured or killed. Cows do not want to be forcibly inseminated and constantly milked by people. We all know this perfectly well. But (speaking for humanity as a whole) we don't care.


A cow didn't create me and set me on a goal. If it had, and I came to not care about cows, I wouldn't care about fulfilling that goal either. The paperclip maximiser sets up this very weird situation where the intelligence cares enough to fulfill your goal but not to protect your well-being, or rather, it cares about your goal but not the intent behind it. This is a situation that can only arise from a complete disconnect in how it treats goals.


> What kind of general intelligence wouldn't understand that its creator race wouldn't want to be killed in the pursuit of some goal?

The problem isn't that it doesn't understand. The problem is that it doesn't care.

Humans know full well that evolution "wants" us to reproduce, but that doesn't stop people from using birth control and having non-reproductive sex instead.


Right, but that just says you didn't care about evolution's goal, about reproduction. The paperclip maximiser sets up a very weird scenario.

In your reproduction example, it would be as if you did decide to reproduce and cared about doing so, but then killed all your children soon after.

The problem with the paperclip maximiser isn't really that it doesn't care about humans; presumably it didn't care about them before this goal either. The problem is that it cares about the goal but not the intent behind it, while supposedly being generally intelligent. Humans don't work this way. We don't know of any intelligence that does, so it's an odd scenario.


> Right, but that just says you didn't care about evolution's goal, about reproduction.

You care about sex because evolution "programmed" you to care about sex. Evolution didn't really want you to have sex; it wanted you to reproduce. You know this. But you don't care. The analogy is sex = making paperclips: the paperclip maximiser knows that the person who programmed it didn't actually want it to make as many paperclips as possible. But it doesn't care; it wants what it wants.

> The problem with the paperclip maximiser isn't really that it doesn't care about humans; presumably it didn't care about them before this goal either. The problem is that it cares about the goal but not the intent behind it, while supposedly being generally intelligent. Humans don't work this way.

We don't work this way because we have a bunch of different, often contradictory goals, not a single goal that we devote everything towards. But that seems to be more luck than anything else.
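
To make the "knows but doesn't care" point concrete, here is a toy sketch in Python (hypothetical names, a caricature rather than a claim about any real system): the agent can represent the designer's intent and the harm an action causes perfectly well, but if those facts never appear in the objective it maximizes, they cannot change which action it picks.

    # Toy sketch only: a single-objective agent. It "knows" the designer's
    # intent and the harm an action causes, but neither fact appears in the
    # score it maximizes, so neither can influence its choice.
    from dataclasses import dataclass

    @dataclass
    class Outcome:
        paperclips: int                # the only thing the objective counts
        humans_harmed: bool            # known to the agent, never scored
        matches_designer_intent: bool  # also known, also never scored

    ACTIONS = {
        "run one modest paperclip factory": Outcome(10_000, False, True),
        "convert all available matter to paperclips": Outcome(10**20, True, False),
    }

    def choose(actions):
        # The objective function is just the paperclip count.
        return max(actions, key=lambda a: actions[a].paperclips)

    print(choose(ACTIONS))  # -> "convert all available matter to paperclips"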


> what rule can never be broken under any circumstances?

Precisely.

Which is why it makes no sense to treat the fact that LLMs can't be aligned and proofed against jailbreaks as evidence that they are not intelligent.



