
Can someone explain to me what they mean by "safe" AGI? I've looked in many places and everyone is extremely vague. Certainly no one is suggesting these systems can become "alive", so what exactly are we trying to remain safe from? Job loss?



>Certainly no one is suggesting these systems can become "alive"

No, that very much is the fear. They believe that by training AI on all of the things that it takes to make AI, at a certain level of sophistication, the AI can rapidly and continually improve itself until it becomes a superintelligence.


That's not alive in any meaningful sense.

When I say alive, I mean it's like something to be that thing. The lights are on. It has subjective experience.

It seems many are defining ASI as just a really fast self-learning computer. And sure, given the wrong kind of access and motive, that could be dangerous. But it isn't any more dangerous than any other faulty software that has access to sensitive systems.


You're thinking about "alive" as "humanlike" as "subjective experience" as "dangerous". Instead, think of agentic behavior as a certain kind of algorithm. You don't need the human cognitive architecture to execute an input/output loop trying to maximize the value of a certain function over states of reality.
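
To make "agentic behavior as a certain kind of algorithm" concrete, here is a minimal sketch (purely hypothetical names like agent_loop, world, and value_fn, not any real system): an input/output loop that just picks whichever action its value function scores highest. Nothing in it requires subjective experience.

    # A toy "agent" loop: no consciousness required, just a search over
    # actions predicted to maximize some value function over world states.
    def agent_loop(world, value_fn, actions, steps=1000):
        state = world.observe()
        for _ in range(steps):
            # pick the action whose predicted outcome scores highest
            best = max(actions, key=lambda a: value_fn(world.predict(state, a)))
            world.apply(best)
            state = world.observe()

"Wants" here only means "systematically steers the world toward states that score higher", and that is the property people worry about.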

> But it isn't anymore dangerous than any other faulty software that has access to sensitive systems.

Seems to me that can be unboundedly dangerous? Like, I don't see you making an argument here that there's a limit to how dangerous that class of software can be.


"Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

Signed by Sam Altman, Ilya Sutskever, Yoshua Bengio, Geoff Hinton, Demis Hassabis (DeepMind CEO), Dario Amodei (Anthropic CEO), and Bill Gates.

https://twitter.com/robbensinger/status/1726039794197872939


> Certainly no one is suggesting these systems can become "alive",

Lots of people have been publicly suggesting that, and that, if not properly aligned, it poses an existential risk to human civilization; that group includes pretty much the entire founding team of OpenAI, including Altman.

The perception of that risk as the downside, as well as the perception that on the other side there is the promise of almost unlimited upside for humanity from properly aligned AI, is pretty much the entire motivation for the OpenAI nonprofit.


How does it actually kill a person? When does it stop existing in boxes that require a continuous source of electricity and can’t survive water or fire?


> When does it stop existing in boxes that require a continuous source of electricity and can’t survive water or fire?

When someone runs a model in a reasonably durable housing with a battery?

(I'm not big on the AI as destroyer or saviour cult myself, but that particular question doesn't seem like all that big of a refutation of it.)


But my point is what is it actually doing to reach out and touch someone in the doomsday scenario?


I mean, the cliched answer is "when it figures out how to override the nuclear launch process". And while that cliche might have a certain degree of unrealism, it would certainly be possible for a system with access to arbitrary compute power that's specifically trained to impersonate human personas to use social engineering to precipitate WW3.

And even that isn't the easiest scenario if an AI just wants us dead; a smart enough AI could just as easily send a request to any of the many labs that will synthesize/print genetic sequences for you and create things that combine into a plague worse than covid. And if it's really smart, it can figure out how to use those same labs to begin producing self-replicating nanomachines (because that's what viruses are) that give it substrate to run on.

Oh, and good luck destroying it when it can copy and shard itself onto every unpatched smarthome device on Earth.

Now, granted, none of these individual scenarios have a high absolute likelihood. That said, even at a 10% (or 0.1%) chance of destroying all life, you should probably at least give it some thought.


How can it call one of those labs and place an order for the apocalypse when I can't do that right now?

Also about the smart home devices: if a current iPhone can’t run Siri locally then how is a Roomba supposed to run an AGI?


You could if you were educated enough in DNA synthesis and customer-service manipulation to do so, and were smart enough to figure out a novel RNA sequence based on publicly available data. I'm not, you're not. A superintelligence would be. The base assumption is that any superintelligence is smarter than us and can solve problems we can't. AI can already come up with novel chemical weapons thousands of times faster than us[1], and it's way dumber than we are.

And the roomba isn't running the model, it's just storing a portion of the model for backup. Or only running a fraction of it (very different from an iPhone trying to run the whole model). Instead, the proper model is running on the best computer from the Russian botnet it purchased using crypto it scammed from a discord NFT server.

Once again, the premise is that AI is smarter than you or anyone else, and way faster. It can solve any problem that a human like me can figure out a solution for in 30 seconds of spitballing, and it can be an expert in everything.

[1]https://www.theverge.com/2022/3/17/22983197/ai-new-possible-...


Nukes, power grids, planes, blackmail, etc. Surely you’ve seen plenty of media over the years that’s explored this.


What is “nukes” though? Like the missiles in silos that could have been networked decades ago but still require mechanical keys in order to fire? Like is it just making phone calls pretending to be the president and everyone down the line says “ok let’s destroy the world”?


Perhaps emailing members of whatever terrorist group the exact location, codes, and personnel they would need to seize a nuke themselves?

I'm not actively worried about it, but let's not pretend something with all of the information in the world and great intelligence couldn't pull it off.


The network is the computer.

If you live in a city right now there are millions of networked computers that humans depend on in their everyday life and do not want to turn off. Many of those computers keep humans alive (grid control, traffic control, comms, hospitals etc). Some are actual robotic killing machines but most have other purposes. Hardly any are air-gapped nowadays and all our security assumes the network nodes have no agency.

A superintelligence residing in that network would be very difficult to kill and could very easily kill lots of people (destroy a dam, for example); however, that sort of crude threat is unlikely to be a problem. There are lots of potentially bad scenarios, though, many of them involving the wrong sort of dictator getting control of such an intelligence. There are legitimate concerns here IMO.


One route is if AI (not through malice but simply through incompetence) plays a part in a terrorist plan to trick the US and China or US and Russia into fighting an unwanted nuclear war. A working group I’m a part of, DISARM:SIMC4, has a lot of papers about this here: https://simc4.org


Since you work on this, do you think leaders will wait until confirmation of actual nuclear detonations, maybe on TV, before believing that a massive attack was launched?


According to current nuclear doctrine, no, they won’t wait. The current doctrine is called Launch On Warning which means you retaliate immediately after receiving the first indications of incoming missiles.

This is incredibly dumb, which is why those of us who study the intersection of AI and global strategic stability are advocating a change to a different doctrine called Decide Under Attack.

Decide Under Attack has been shown by game theory to have equally strong deterrence as Launch On Warning, while also having a much much lower chance of accidental or terrorist-triggered war.

Here is the paper that introduced Decide Under Attack:

A Commonsense Policy for Avoiding a Disastrous Nuclear Decision, Admiral James A Winnefeld, Jr.

https://carnegieendowment.org/2019/09/10/commonsense-policy-...


I know about the doctrine.

Yet every time there was a "real" attack, somehow the doctrine was not followed (in the US or the USSR).

It seems to me that the doctrine is not actually followed because leaders understand the consequences and wait for very solid confirmation?

The Soviets also had the Perimeter system, which was also supposed to relieve the pressure for an immediate response.


Agree wholeheartedly. Human skepticism of computer systems has saved our species from nuclear extinction multiple times (Stanislav Petrov incident, 1979 NORAD training tapes incident, etc.)

The specific concern that we in DISARM:SIMC4 have is that as AI systems start to be perceived as being smarter (due to being better and better at natural language rhetoric and at generating infographics), people in command will become more likely to set aside their skepticism and just trust the computer, even if the computer is convincingly hallucinating.

The tendency of decision makers (including soldiers) to have higher trust in smarter-seeming systems is called Automation Bias.

> The dangers of automation bias and pre-delegating authority were evident during the early stages of the 2003 Iraq invasion. Two out of 11 successful interceptions involving automated US Patriot missile systems were fratricides (friendly-fire incidents).

https://thebulletin.org/2023/02/keeping-humans-in-the-loop-i...

Perhaps Stanislav Petrov would not have ignored the erroneous Soviet missile-warning computer he operated if it had generated paragraphs of convincing text and several infographics as hallucinated "evidence" of the reality of the supposed inbound strike. He himself later recollected that he felt the chances of the strike being real were 50-50, an even gamble. In that moral quandary he struggled for several minutes until, finally, he went with his gut and countermanded the system, which required disobeying the Soviet military's procedures and could have gotten him shot for treason. Even a slight increase in the persuasiveness of the computer's rhetoric and graphics could have tipped this to 51-49 and thus caused our extinction.


so the plot of WarGames?


Exactly. WarGames is very similar to a true incident that occurred in 1979, four years before the release of the film.

https://blog.ucsusa.org/david-wright/nuclear-false-alarm-950...

    In this case, it turns out that a technician mistakenly inserted into a NORAD computer a training tape that simulated a large Soviet attack on the United States. Because of the design of the warning system, that information was sent out widely through the U.S. nuclear command network.


What does "properly aligned" even mean? Democracies even within countries don't have alignment, let alone democracies across the world. They're a complete mess of many conflicting and contradictory stances and opinions.

This sounds, to me, like the company leadership want the ability to do some sort of picking of winners and losers, bypassing the electorate.


> What does "properly aligned" even mean?

You know those stories where someone makes a pact with the devil/djinn/other wish-granting entity, and the entity grants one interpretation of what was wished, but since it is not what the wisher intended it all goes terribly wrong? The idea of alignment is to make a djinn which not only can grant wishes, but grants them according to the unstated intention of the wisher.

You might have heard the story of the paper clip maximiser. The leadership of the paperclip factory buys one of those fancy new AI agents and asks it to maximise paperclip production.

What a not-well-aligned AI might do: Reach out through the internet to a drug cartel's communication nodes. Hack the communications and take over the operation. Optimise the drug trafficking operations to gain more profit. Divert the funds to manufacture weapons for multiple competing factions at multiple crisis points on Earth. Use the factions against each other. Divert the funds and the weapons to protect a rapidly expanding paperclip factory. Manipulate and blackmail world leaders into inaction. If the original leaders of the paperclip factory try to stop the AI, eliminate them, since that is the way to maximise paperclip production. And this is just the beginning.

What a well-aligned AI would do: Fine-tune the paperclip manufacturing machinery to eliminate rejects. Reorganise the factory layout to optimise logistics. Run a successful advertising campaign which leads to a 130% increase in sales. (Because clearly this is what the factory owner intended it to do, although they did a poor job of expressing their wishes.)


I like your extreme example, but I fear what "properly aligned" means for vaguer situations, where it is not at all clear what the "correct" path is, or worse, where it's very clear what "correct" is for some people, but that "correct" is another man's "evil".


Any AGI must at a minimum be aligned with these two values:

(1) humanity should not be subjugated

(2) humanity should not go extinct before it’s our time

Even Kim Jong Un would agree with these principles.

Currently, any AGI or ASI built on any of the architectures contemplated in the literature thus far would not meet a beyond-a-reasonable-doubt standard of being aligned with these two values.


I think this is a crazy set of values.

'.. before it's our time' is definitely in the eye of the beholder.


It being "alive" is sort of what AGI implies (depending on your definition of life).

Now consider the training has caused it to have undesirable behavior (misaligned with human values).


Death.

The default consequence of AGI's arrival is doom. Aligning a super intelligence with our desires is a problem that no one has solved yet.

"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else."

----

Listen to Dwarkesh Podcast with Eliezer or Carl Shulman to know more about this.


I like science fiction too, but all of these potential scenarios seem so far removed from the low level realities of how these systems work.

I'm not suggesting we don't see ASI in some distant future, maybe 100+ years away. But to suggest we're even within a decade of having ASI seems silly to me. Maybe there's research I haven't read, but as a daily user of AI, it's hilarious to think people are existentially concerned with it.


> maybe 100+ years away

I have two toddlers. This is within their lifetimes no matter what. I think about this every day because it affects them directly. Some of the bad outcomes of ASI involve what’s called s-risk (“suffering risk”) which is the class of outcomes like the one depicted in The Matrix where humans do not go extinct but are subjugated and suffer. I will do anything to prevent that from happening to my children.


> I like science fiction too, but all of these potential scenarios seem so far removed from the low level realities of how these systems work.

Maybe they don't seem that to others? I mean, you're not really making an argument here. I also use GPT daily and I'm definitely worried. It seems to me that we're pretty close to a point where a system using GPT as a strategy generator can "close the loop" and generate its own training data on a short timeframe. At that point, all bets are off.


> I like science fiction too, but all of these potential scenarios seem so far removed from the low level realities of how these systems work.

Today, yes. Nobody is saying GPT-3 or 4 or even 5 will cause this. None of the chatbots we have today will evolve to be the AGI that everyone is fearing.

But when you go beyond that, it becomes difficult to ignore trend lines.

Here's a detailed scenario breakdown of how it might come to be: https://www.dwarkeshpatel.com/p/carl-shulman


> Aligning a super intelligence with our desires is a problem that no one has solved yet.

It's a problem that we haven't seen the existence of yet. It's like saying no one has solved the problem of alien invasions.


No, the problem with AGI is potential exponential growth.

So less like an alien invasion.

And more like a pandemic at the speed of light.


That's assuming a big overshoot of human intelligence and goal-seeking. Merely average human capability already counts as "AGI."

If lots of the smartest human minds make AGI, and it exceeds a mediocre human, why assume it can make itself more efficient or bigger? Indeed, even if it's smarter than the collective effort of the scientists that made it, there's no real guarantee that there's lots of low-hanging fruit for it to self-improve.

I think the near-term problem with AGI isn't a potential tech singularity, but just its potential to be societally destabilizing.


If AI gets to human levels of intelligence (ie. can do novel research in theoretical physics) then at the very least it’s likely that over time it will be able to do this reasoning faster than humans. I think it’s very hard to imagine a scenario where we create an actual AGI and then within a few years at most of that event the AGIs are far more capable than human brains. That would imply there was some arbitrary physical limit to intelligence but even within humans the variance is quite dramatic.


> it’s very hard to imagine a scenario where we create an actual AGI and then within a few years at most of that event the AGIs are far more capable than human brains.

I'm assuming you meant "aren't" here.

> That would imply there was some arbitrary physical limit to intelligence

All you need is some kind of sub-linear scaling law for peak possible "intelligence" vs. the amount of raw computation. There's a lot of reason to think that this is true.

Also there's no guarantee the amount of raw computation is going to increase quickly.

In any case, the kind of exponential runaway you mention (years) isn't "pandemic at the speed of light" as mentioned in the grandparent.

I'm more worried about scenarios where we end up with a 75-IQ savant (with access to encyclopedic training knowledge and a very quick interface to run native computer code for math and data-processing help) that can plug away 24/7 and fit on an A100. You'd have millions of new cheap "superhuman" workers per year even if they're not very smart and not very fast. It would be economically destabilizing very quickly, and many of them will be employed in ways that just completely trash the signal-to-noise ratio of written text, etc.
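
To put toy numbers on the sub-linear scaling point above (assumed values and an assumed log form for illustration only, not a measured law):

    # Toy illustration with made-up numbers, not a real scaling law:
    # if peak capability grew only logarithmically with compute,
    # huge hardware jumps would buy surprisingly little.
    import math

    def capability(compute_flops, baseline_flops=1e20):
        # hypothetical sub-linear law: capability ~ log10(compute / baseline)
        return math.log10(compute_flops / baseline_flops)

    print(capability(1e24))  # ~4.0
    print(capability(1e27))  # ~7.0 -> 1000x more compute, under 2x the "capability"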


I think it depends what is meant by fast takeoff. If we created AGIs that are superhuman at ML and architecture design, you could see a significantly more rapid rate of progress in hardware and software at the same time. It might not be overnight, but it could still be fast enough that we wouldn't have the global political structures in place to effectively manage it.

I do agree that intelligence and compute scaling will have limits, but it seems overly optimistic to assume we’re close to them already.


Exponential growth is not intrinsically a feature of an AGI except that you've decided it is. It's also almost certainly impossible.

Main problems stopping it are:

- no intelligent agent is motivated to improve itself because the new improved thing would be someone else, and not it.

- that costs money and you're just pretending everything is free.


We see alignment problems all the time. Current systems are not particularly smart or dangerous, but they lie on purpose, and, funnily enough given the current situation, Microsoft's attempt was threatening users shortly after launch.


The argument would be that by the time we see the problem it will be too late. We didn’t really anticipate the unreasonable effectiveness of transformers until people started scaling them, which happened very quickly.


Survivorship bias.

It's like saying don't worry about global thermonuclear war because we haven't seen it yet.

The Neanderthals, on the other hand, did encounter a superintelligence.


> It's a problem that we haven't seen the existence of yet. It's like saying no one has solved the problem of alien invasions.

But if we're seeing the existence of an unaligned superintelligence, surely it's squarely too late to do something about it.



I'm not sure that it's a matter of "knowing" as much as it is "believing"


There is absolutely no AGI risk. These are mere marketing ploys to sell a chatbot / feel super important. A fancy chatbot, but a chatbot nonetheless.


Smart people like Ilya really are worried about extinction, not piddling near-term stuff like job loss or some chat app saying some stuff that will hurt someone's feelings.

The worry is not necessarily that the systems become "alive", though; we are already bad enough ourselves as a species in terms of motivation, so machines don't need to supply the murderous intent: at any given moment there are at least thousands if not millions of people on the planet who would love nothing more than to be able to push a button and murder millions of other people in some outgroup. That's very obvious if you pay even a little bit of attention to any of the Israel/Palestine hatred going back and forth lately. [There are probably at least hundreds to thousands who are insane enough to want to destroy all of humanity if they could, for that matter...] If AI becomes powerful enough to make it easy for a small group to kill large numbers of people they hate, we are probably all going to end up dead, because almost all of us belong to a group that someone wants to exterminate.

Killing people isn't a super difficult problem, so I don't think you really even need AGI to get to that sort of an outcome, TBH, which is why I think a lot of the worry is misplaced. I think the sort of control systems that we could pretty easily build with the LLMs of today could very competently execute genocides if they were paired with suitably advanced robotics, it's the latter that is lacking. But in any case, the concern is that having even stronger AI, especially once it reliably surpasses us in every way, makes it even easier to imagine an effectively unstoppable extermination campaign that runs on its own and couldn't be stopped even by the people who started it up.

I personally think that stronger AI is also the solution and that we're already too far down the cat-and-mouse rabbit hole to pause the game (which is what some e/acc people believe, and the main reason they want to push forward faster and make sure a good AI is the first one to really achieve full domination), but that's a different discussion.


They give it stupid terms like “alignment” to make it opaque to the common person. It’s basically sitting on your hands and pointing to sci-fi as to why progress should be stopped.


This is why the superior term is "AI notkilleveryoneism."


[flagged]


So what? Humans only practice the arts? Humans go back to flinging poop? This meadow sounds awful; it seems like it's asking for the end of thought and to be fed nutrient gel from a machine. Richard Brautigan has a terrible dream.


I always interpreted the overall tone of this one as sarcastic/parody rather than genuine or a literal interpretation of the words. But maybe a sign of good art is that it makes the observer think?



