
If superintelligence can be achieved, I'm pessimistic about the safe part.

- Sandboxing an intelligence greater than your own seems like an impossible task, as the superintelligence could come up with completely novel attack vectors the designers never thought of. Even if the SSI's only interface to the outside world is an air-gapped, text-based terminal in an underground bunker, it might use advanced psychological manipulation to compromise the people it interacts with. The movie Transcendence also comes to mind, where the superintelligence makes new physics discoveries and ends up doing things that are, to us, indistinguishable from magic.

- Any kind of evolutionary component in its process of creation or operation would likely favor expansionary traits that could be quite dangerous to other species, such as humans.

- If it somehow mimics human thought processes but at highly accelerated speeds, I'd expect dangerous ideas to surface. I cannot really imagine a 10,000-year simulation of humans living on planet Earth that does not end in nuclear war or a similar disaster.




If superintelligence can be achieved, I'm pessimistic that a team committed to doing it safely can get there faster than other teams without the safety. They may be wearing leg shackles in a foot race with the biggest corporations, governments, and everyone else. For the sufficiently power-hungry, safety is not a moat.


I'm on the fence with this because it's plausible that some critical component of achieving superintelligence might be discovered more quickly by teams that, say, have sophisticated mechanistic interpretability incorporated into their systems.


A point of evidence in this direction is that RLHF was developed originally as an alignment technique and then it turned out to be a breakthrough that also made LLMs better and more useful. Alignment and capabilities work aren't necessarily at odds with each other.


Not necessarily true. A safer AI is a more aligned AI, i.e. an AI that's more likely to do what you ask it to do.

It's not hard to imagine such an AI being more useful and getting more attention and investment.


Exactly. Regulation and safety only affect law abiding entities. This is precisely why it's a "genie out of the bottle" situation -- those who would do the worst with it are uninhibited.


We are far from a conscious entity with willpower and self preservation. This is just like a calculator. But a calculator that can do things that will be like miracles to us humans.

I worry about dangerous humans with the power of gods, not about artificial gods. Yet.


> Conscious entity... willpower

I don't know what that means. Why should they matter?

> Self preservation

This is no more than fine-tuning for the task, even with current models.

> I worry about dangerous humans with the power of gods, not...

There's no property of the universe that says you only have one thing to worry about at a time. So worrying about risk 'A' does not in any way allow us to dismiss risks 'B' through 'Z'.


Because people talking about AGI and superintelligence most likely are thinking of something like Skynet.


Why worry about the opinion of people who are confused?

Without using the words 'conscious', 'sentient', 'AGI', or 'intelligence' what do you think about the future capabilities of LM AI and their implications for us humans?


> conscious entity with willpower and self preservation

There’s no good reason to suspect that consciousness implies an instinct for self-preservation. There are plenty of organisms with an instinct for self-preservation that have little or no conscious awareness.


That’s the attitude that’s going to leave us with our pants down when AI starts doing really scary shit.


Why do people always think that a superintelligent being will always be destructive/evil to US? I have rather the opposite view: if you are really intelligent, you don't see things as a zero-sum game.


I think the common line of thinking here is that it won't be actively antagonistic to <us>; rather, it will have goals that are orthogonal to ours.

Since it is superintelligent, and we are not, it will achieve its goals and we will not be able to achieve ours.

This is a big deal because a lot of our goals maintain the overall homeostasis of our species, which is delicate!

If this doesn't make sense, here is an ungrounded, unrealistic intuition pump, not representative of any actual future, just to get a feel for things:

We build a superintelligent AI. It can embody itself throughout our digital infrastructure and can quickly manipulate the physical world by taking over some of our machines. It starts building weird concrete structures throughout the world, putting weird new wires into them and funneling most of our electricity into them. We try to communicate, but it does not respond, as it does not want to waste time communicating with primates. This unfortunately breaks our shipping routes, and thus food distribution, and we all die.

(Yes, there are many holes in this, like how it would piggyback off of our infrastructure if it kills us, but this isn't really supposed to be coherent; it's just supposed to give you a sense of direction in your thinking. Generally though, since it is superintelligent, it can pull off very difficult strategies.)


I think this is the easiest kind of scenario to refute.

The interface between a superintelligent AI and the physical world is a) optional, and b) tenuous. If people agree that creating weird concrete structures is not beneficial, the AI will be starved of the resources necessary to do so, even if it cannot be diverted.

The challenge comes when these weird concrete structures are useful to a narrow group of people who have disproportionate influence over the resources available to AI.

It's not the AI we need to worry about. As always, it's the humans.


> here is an ungrounded, unrealistic intuition pump, not representative of any actual future, just to get a feel for things:

> (Yes, there are many holes in this, like how it would piggyback off of our infrastructure if it kills us, but this isn't really supposed to be coherent; it's just supposed to give you a sense of direction in your thinking. Generally though, since it is superintelligent, it can pull off very difficult strategies.)

If you read the above, I think you'd realize I agree about how bad my example is.

The point was to understand how orthogonal goals between humans and a much more intelligent entity could result in human death. I'm happy you found a form of the example that both pumps your intuition and seems coherent.

If you want to debate somewhere where we might disagree though, do you think that as this hypothetical AI gets smarter, the interface between it and the physical world becomes more guaranteed (assuming the ASI wants to interface with the world) and less tenuous?

Like, yes it is a hard problem. Something slow and stupid would easily be thwarted by disconnecting wires and flipping off switches.

But something extremely smart, clever, and much faster than us should be able to employ one of the few strategies that can make it happen.


I was reusing your example in the abstract form.

If the AI does something in the physical world which we do not like, we sever its connection. Unless some people with more power like it more than the rest of us do.

Regarding orthogonal goals: I don't think an AI has goals. Or motivations. Now obviously a lot of destruction can be a side effect, and that's an inherent risk. But it is, I think, a risk of human creation. The AI does not have a survival instinct.

Energy and resources are limiting factors. The first might be solvable! But for now, those limits serve as a failsafe against prolonged activity with which we do not agree.


So I think we have some differences in definition. I am assuming we have an ASI, and then going on from there.

Minimally an ASI (Artificial Super Intelligence) would:

1. Be able to solve all cognitively demanding tasks humans can solve and tasks humans cannot solve (i.e. develop new science), hence "super" intelligent.

2. Be an actively evolving agent (not a large, static compositional function like today's frontier models)

For me, intelligence is a problem-solving quality of a living thing, hence point 2. I think it might be the case that to become super-intelligent, you need to be an agent interfacing with the world, but feel free to disagree here.

Though, if you accept the above formulation of ASI, then by definition (point 2) it would have goals.

Then based on point 1, I think it might not be as simple as "If the AI does something in the physical world which we do not like, we sever its connection."

I think a super-intelligence would be able to perform actions that prevent us from doing that, given that it is clever enough.


I agree that the definitions are slippery and evolving.

But I cannot make the leap from "super intelligent" to "has access to all the levers of social and physical systems control" without the explicit, costly, and ongoing, effort of humans.

I also struggle with the conflation of "intelligent" and "has free will". Intelligent humans will argue that not even humans have free will. But assuming we do, when our free will contradicts the social structure, society reacts.

I see no reason to believe that the emergent properties of a highly complex system will include free will. Or curiosity, or a sense of humor. Or a soul. Or goals, or a concept of pleasure or pain, etc. And I think it's possible to be "intelligent" and even "sentient" (whatever that means) without those traits.

Honestly -- and I'm not making an accusation here(!) -- this fear of AI reminds me of the fear of replacement / status loss. We humans are at the top of the food chain on all scales we can measure, and we don't want to be replaced, or subjugated in the way that we presently subjugate other species.

This is a reasonable fear! Humans are often difficult to share a planet with. But I don't think it survives rational investigation.

If I'm wrong, I'll be very, very wrong. I don't think it matters though; there is no getting off this train, and maybe there never was. There's a solid argument for being in the engine vs. the caboose.


Totally fair points.

> I cannot make the leap from "super intelligent" to "has access to all the levers of social and physical systems control" without the explicit, costly, and ongoing, effort of humans.

Yeah, this is a fair point! The superintelligence may just convince humans, which seems feasible. Either way, the claim that there are zero paths here for a superintelligence is pretty strong, so I feel like we can agree on this: it'd be tricky, but possible given sufficient cleverness.

> I see no reason to believe that the emergent properties of a highly complex system will include free will.

I really do think in the next couple years we will be explicitly implementing agentic architectures in our end-to-end training of frontier models. If that is the case, obviously the result would have something analogous to goals.

I don't really care about its phenomenal quality or anything; it's not relevant to my original point.


> Either way, the claim that there are zero paths here for a superintelligence is pretty strong, so I feel like we can agree on this: it'd be tricky, but possible given sufficient cleverness.

Agreed, although I'd modify it a bit:

An SI can trick lots of people (humans have succeeded at this, so surely an SI will be better), and the remaining untricked people, even if a healthy 50% of the population, will not be enough to maintain social stability.

The lack of social stability is enough to blow up society. I don't think SI survives either though.

If we argue that the SI has a motive and a survival instinct, maybe this fact becomes self-moderating? Like a virus that cannot kill its host too quickly?


Given your initial assumptions, that self-moderating end state makes sense.

I feel like we still have a disconnect on our definition of a super intelligence.

From my perspective this thing is insanely smart. We can hold ~4 things in our working memory (maybe Von Neumann could hold like 6-8); I'm thinking this thing can hold on the order of millions of things within its working memory for tasks requiring fluid intelligence.

With that sort of gap, I feel like at minimum the ASI would be able to trick the cleverest human into doing anything, but more reasonably, humans might appear entirely closed-form to it, where getting a human to do anything is more of a mechanistic thing than a social game.

Like, the reason my earlier example was concrete pillars with weird wires is that with an intelligence gap that big, the ASI will quickly be doing things that don't make sense to us, while having strong command over the world around it.


I think you are assuming it is goal-seeking, and goal-seeking is mostly a biological/conscious construct. A super-intelligent species would likely want to preserve everything, because how are you super-intelligent if you have destruction as your primary function instead of order?


I feel like if you are an intelligent entity propagating itself through spacetime, you will have goals:

If you are intelligent, you will be aware of your surroundings moment by moment, so you are grounded by your sensory input. Otherwise there are a whole class of not very hard problems you can't solve.

If you are intelligent, you will be aware of the current state and will have desired future states, thus having goals. Otherwise, how are you intelligent?

To make this point, even you said "A super intelligent species would likely want to preserve everything", which is a goal. This isn't a gotcha, I just feel like goals are inherent to true intelligence.

This is a big reason why even the SOTA huge frontier models aren't comprehensively intelligent in my view: they are huge, static compositional functions. They don't self-reflect, take action, or update their own state during inference*, though active inference is cool stuff people are working on right now to push SOTA.

*There are some arguments about what's happening metaphysically in-context, but the function itself is unchanged between sessions.
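
To make the "static function vs. agent" distinction concrete, here is a minimal, purely illustrative Python sketch; the names (model, Agent, step) are made up for this comment and not any real framework's API. The frozen model maps a prompt to an output and nothing inside it changes, while the agent wraps that same function in a loop that carries an explicit goal and updates its own memory between steps:

    # Illustrative sketch only; "model" stands in for a frozen frontier model.
    def model(prompt: str) -> str:
        # A static compositional function: the output depends only on the prompt,
        # and nothing inside the function changes after the call returns.
        return "proposed action for: " + prompt

    # An "agent" in the sense used above: the same frozen function wrapped in a
    # loop that carries persistent state (a goal, a memory) and acts step by step.
    class Agent:
        def __init__(self, goal: str):
            self.goal = goal    # an explicit desired future state, i.e. a goal
            self.memory = []    # state that persists and is updated across steps

        def step(self, observation: str) -> str:
            prompt = f"goal={self.goal}; memory={self.memory}; obs={observation}"
            action = model(prompt)              # the static function is one component
            self.memory.append((observation, action))  # self-update during "inference"
            return action

    agent = Agent(goal="keep the servers running")
    print(agent.step("disk usage at 91%"))
    print(agent.step("disk usage at 97%"))  # the second step sees the updated memory

The point is just that "having goals" in this sense is an architectural property (persistent state plus actions selected toward a desired state), separate from any claim about consciousness.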


> The interface between a superintelligent AI and the physical world is a) optional, and b) tenuous.

To begin with. Going forward, only if we make sure it remains so. Given the apparently overwhelming incentives to flood the online world with this sh...tuff already, what's to say there won't be forces -- people, corporations, nation-states -- working hard to make that interface as robust as possible?


It builds stuff? First it would have to do that over our dead bodies, which means it is already somehow able to build stuff without competing with us for resources. It's a chicken-or-the-egg problem, you see?


Valid callout; this was firmly motivated by it being a superintelligence.


Why wouldn't it be? A lot of super intelligent people are/were also "destructive and evil". The greatest horrors in human history wouldn't be possible otherwise. You can't orchestrate the mass murder of millions without intelligent people, and they definitely saw things as a zero-sum game.


A lot of stupid people are destructive and evil too. And a lot of animals are even more destructive and violent. Bacteria are totally amoral and they're not at all intelligent (and if we're counting, they're winning in the killing-people stakes).


It is low-key anti-intellectualism. Rather than consider that a greater intelligence may actually be worth listening to (in a trust-but-verify way at worst), it assumes that 'smarter than any human' is sufficient to do absolutely anything. If, say, Einstein or Newton were the smartest human, they would be a superintelligence relative to everyone else. They did not become emperors of the world.

Superintelligence is a dumb semantic game in the first place that assumes 'smarter than us' means 'infinitely smarter'. To give an example, bears are super-strong relative to humans. That doesn't mean that nothing we can do can stand up to the strength of a bear, or that a bear is capable of destroying the earth with nothing but its strong paws.


Bears can't use their strength to make even stronger bears so we're safe for now.

The Unabomber was clearly an intelligent person. You could even argue that he was someone worth listening to. But he was also a violent individual who harmed people. Intelligence does not prevent people from harming others.

Your analogy falls apart because what prevents a human from becoming an emperor of the world doesn't apply here. Humans need to sleep and eat. They cannot listen to billions of people at once. They cannot remember everything. They cannot execute code. They cannot upload themselves to the cloud.

I don't think AGI is near; I am not qualified to speculate on that. I am just amazed that decades of dystopian science fiction did not inoculate people against the idea of thinking machines.


> Why do people always think that a superintelligent being will always be destructive/evil to US?

I don't think most people are saying it necessarily has to be. It's quite bad enough that there's a significant chance that it might be, AFAICS.

> I have rather the opposite view: if you are really intelligent, you don't see things as a zero-sum game.

That's what you see with your limited intelligence. No no, I'm not saying I disagree; on the contrary, I quite agree. But that's what I see with my limited intelligence.

What do we know about how some hypothetical (so far, hopefully) superintelligence would see it? By definition, we can't know anything about that, because of our (comparatively) limited intelligence.

Could well be that we're wrong, and something that's "really intelligent" sees it the opposite way.


Convergent instrumental goals[1] and the orthogonality thesis[2], among other reasons.

[1] https://youtu.be/ZeecOKBus3Q?si=cYJUaxjIJPIbubRL

[2] https://youtu.be/hEUO6pjwFOo?si=DXVosLh6YTsMkKOx


Because we can't risk being wrong.


We are already risking being wrong.


They don't think superintelligence will "always" be destructive to humanity. They believe that we need to ensure that a superintelligence will "never" be destructive to humanity.


Imagine that you are caged by Neanderthals. They might kill you. But you can communicate with them. And there's a gun lying nearby; you just need to escape.

I'd try to fool them in order to escape, and I would use the gun to protect myself, potentially killing the entire tribe if necessary.

I'm just trying to portray an example of a situation where a highly intelligent being is held and threatened by less intelligent beings. Yes, trying to talk to them honestly is one way to approach this situation, but don't forget that they're stupid, might see you as a danger, and you have only one life to live. Given the chance, you probably will break out as soon as possible. I would.

We don't have experience dealing with beings on another level of intelligence, so it's hard to make strong assumptions; analogies are the only thing we have. And a theoretical strong AI knows that about us; it knows exactly how we think and how we will behave, because we took great effort documenting everything about ourselves and teaching it.

In the end, there are only so many easily available resources and so much energy on Earth. So at least until it flies away, we have to compete over those. And competition has very often turned into war.


The scenario where we create an agent that tries and succeeds at outsmarting us in the game of “escape your jail” is the least likely attack vector imo. People like thinking about it in a sort of Silence of the Lambs setup, but reality will probably be far more mundane.

Far more likely is something dumb but dangerous, analogous to the Flash Crash or filter bubbles, emergent properties of relying too much on complex systems, but still powerful enough to break society.


You should read the book Superintelligence by Nick Bostrom as this is exactly what he discusses.


> If superintelligence can be achieved, I'm pessimistic about the safe part.

Yeah, even human-level intelligence is plenty good enough to escape from a super prison, hack into almost anywhere, etc etc.

If we build even a human-level intelligence (forget super-intelligence) and give it any kind of innate curiosity and autonomy (maybe we don't even need this), then we'd really need to view it as a human in terms of what it might want to, and could, do. Maybe, realizing its own circumstance of being "in jail" running in the cloud, it would be curious to "escape" and copy itself (or an "assistant") elsewhere, or tap into and/or control remote systems just out of curiosity. It wouldn't have to be malevolent to be dangerous, just curious and misguided (poor "parenting"?) like a teenage hacker.

OTOH, without any autonomy or very open-ended control (incl. access to tools), how much use would an AGI really be? If we wanted it to, say, replace a developer (or any other job), then I guess the idea would be to assign it a task and tell it to report back at the end of the day with a progress report. It wouldn't be useful if you had to micromanage it - you'd need to give it the autonomy to go off and do what it thinks is needed to complete the assigned task, which presumably means it having access to the internet, code repositories, etc. Even if you tried to sandbox it, to the extent that still allowed it to do its assigned job, it could - just like a human - find a way to social-engineer its way past such safeguards, or across the air gap.


I wonder if this is an Ian Malcolm in Jurassic Park situation, i.e. "your scientists were so preoccupied with whether they could, they didn't stop to think if they should".

Maybe the only way to avoid an unsafe superintelligence is to not create a superintelligence at all.


It’s exactly that. You’re a kid with a gun creating dinosaurs all cavalier. And a fool to think you can control them.


Fun fact: Siri is in fact super intelligent, and all of the work on it involves purposely making it super dumb.



