Ask HN: Seriously, How Can We Begin to Make AI Safe?
12 points by cconcepts on Aug 9, 2017 | 39 comments
I get it, we're a long way from artificial general intelligence, but most of us will agree that it's coming at some point.

Pure legislation will never be universally agreed to or enforced in the form of "thou shalt not build AI that could hurt people", partly because that kind of global enforcement is impossible and partly because defining AI that could hurt people is so hard. Given the job to "keep this room clean" and enough agency to ensure it does that job, your automatic vacuum cleaner could kill you if it discerns that you're what keeps causing the mess.

Do some kind of Asimov's Laws need to be built in at the chip level? At least there are only a few capable chip makers, and you could police them to ensure no chip could power an AI capable of doing harm...

EDIT: I spend so much time thinking about this - why isn't this kind of discussion front page on HN regularly?




An idea, I don't know if it's original or not:

I think we can make AI that is 'intelligent' but has no personality or 'self'. An oracle machine you can ask any question of, but it's not an evil genie looking to escape and take over the universe, because it is not a person, and has no drives of its own.

Consider how we have recently made an AI that can defeat the best humans at Go. Even 10 years ago, this was thought to be impossible for some time to come. "Go is a complicated game, too big to calculate, requiring a mix of strategy and subtlety that machines won't be able to match". Nope.

Now, AlphaGo can defeat the best humans, with a 'subtlety' and 'nuance' that can't be matched. But it is not a person.

We might be able to do the same in other areas.

Note that games like chess and Go are sometimes played as 'cyborg' competitions now, where the human players are allowed to consult with computers. Imagine if the Supreme Court were still staffed by the human judges we have today, but they consulted with soulless machines that have no drives of their own, that can provide arguments and insight that humans can't match. Imagine if, in addition to the human judges' written opinions, there were a bevy of non-voting opinions 'written' by AIs like this. Or if every court case in the world had automatic amicus briefs provided by incredibly sophisticated legal savants with no personality or skin in the game.

Note that several moves that AlphaGo played were complete surprises. We have thousands of people observing these matches, people who have devoted their whole lives to studying the subtleties of this complex game. There are fewer than 361 choices for where to move next. And AlphaGo plays a move that nobody had seriously considered, and once it's on the board, the experts realize the human has lost the match. That is really remarkable.

I think this future (non-person intelligent helpers) is definitely possible. But it doesn't solve the problem of 'evil' humans building an AI that is a person who agrees with their evil beliefs. I don't have an answer for that.


This concept of not having a "self" is an interesting one and possibly a capacity that only humans have.

If AI never has a concept of "self" does it decrease the likelihood of it becoming self-protective?

What does perception of self even mean to us as humans and how could we be sure that a machine couldn't attain it?


> If AI never has a concept of "self" does it decrease the likelihood of it becoming self-protective?

I would imagine... not... but I'm clueless in this area, so don't take my word as meaningful. I come to this thought because I think the scary thing about AI is having an almost entirely foreign thought process, one that reaches conclusions we (humans) instinctively ruled out. It's the basic programmer joke of buying eggs from the store (I forget the syntax offhand haha).

Take global warming. What are the best ways to stop global warming, as fast as possible? I imagine one of them is to wipe out all life as we know it, assuming that doesn't create more atmospheric chemicals in the process. Mind you, I'm not saying the AI will instantly become Skynet, but I am saying that I think it can behave similarly to "Skynet" without needing motivations of its own, or even motivations outside of what we have asked of it. The road to ruin is paved with good intentions, right?

I'm not fear mongering, I'm quite looking forward to AI and I don't have much fear of AI. Ultimately I think as terrifying as AI could become, humans will be equally terrifying. Our technological advancements continue to give us more and more power. If doom is coming due to technology, I don't see AI as the only harbinger of doom.


That sounds less technical and more philosophical. From a practical perspective, does it really matter if an intelligence has no self? If the AI judge decides to execute someone, is it ok just because the AI doesn't have a personality?


Yes.

Remember, this isn't an AI judge, it's an AI advisor. The actual judge who decides to execute someone is still human.

The AI advisor has no wants of its own. It doesn't have personal opinions about politics, and we don't have to worry about whether the advice it gives is really to further some internal agenda so that it can take over the world.


By definition, an AI has to have some form of "wants", in the sense that it minimizes a loss function. Otherwise, it wouldn't do anything.
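For concreteness, here's a minimal sketch of what that kind of "want" amounts to: a toy model whose entire "desire" is to push one number downhill via gradient descent. The data, model, and learning rate are all made up purely for illustration.

    # A toy "AI" whose only "want" is to make this loss smaller.
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (x, y) pairs, roughly y = 2x

    def loss(w):
        # Mean squared error of the model y = w * x.
        return sum((w * x - y) ** 2 for x, y in data) / len(data)

    def grad(w):
        # Derivative of the loss with respect to w.
        return sum(2 * (w * x - y) * x for x, y in data) / len(data)

    w = 0.0
    for step in range(100):
        w -= 0.1 * grad(w)  # gradient descent: nudge w to reduce the loss

    print(w, loss(w))  # w ends up near 2.0; that is the full extent of its "wants"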


Yes, but this is qualitatively different from the AI being a 'person' whose motives you have to question. What are AlphaGo's motives?

An AI judge will be optimizing for the things that we build an AI judge to optimize for, the things we find desirable in our legal decisions. Consistency with past rulings, not issuing judgements that open up tricky interactions with other laws/judgements, erring on the side of caution when the sentence will ruin the defendant's life, etc.

But it still won't have "motives". We won't wonder if it issues a particular judgement because 5% of its investment portfolio is allocated to pharmaceutical companies, or if it believes that the USA would be a better place if all the black people "went back to Africa".


In an ideal world, I think this would work out. The problem is, I can't trust anyone to create an AI that's truly unbiased.

> the things we find desirable in our legal decisions.

Who is "we"? Everyone has different motives, so having a human create an optimization function won't create a truly unbiased machine. I think any good AI will have to create its own "opinions" to be of any worth.


Honestly, I think these kinds of fears are misguided at a certain level. What you need to be worrying about is regulation and/or incentivization of human behavior, not AI design.

Why?

Because typically people design things to solve a problem, and those problems have constraints. Your automatic vacuum cleaner wouldn't try to kill you because it wouldn't be equipped to do so, and to the extent that it might be potentially deadly, it would be treated as an extension of pre-existing problems (e.g., robotic lawn mowers can be deadly as a side-effect, but so can mowing your lawn with an old-fashioned gas mower).

Underlying these fears I think are two classes of problems:

1. The idea of a general-purpose AI. The problem with this is that it probably won't happen except at the hands of people who are interested in replicating humans, or as some sort of analogue to a virus or malware (where rogue developers create AI out of amusement and/or curiosity and/or personal gain and release it). I would argue then that the question is really how to regulate the developers, because that's where your problem lies: the person who would equip the vacuum cleaner with a means of killing you.

2. Decision-making dilemmas, like the automatic car making decisions about how to exit accident scenarios. This is maybe trickier but probably boils down to ethics, logic, philosophy, economics, and psychology. Incidentally, I think those areas will become the major focus with AI in dealing with these problems: the technical issues around hardware implementation of neural nets, DL structures, etc. are crazy challenging, but when they are worked out, I think the solutions for making AI "safe" will be "easy". The hard part will be the economics/ethics/psychology of regulating the implementations to begin with.


As I understand it, saying the vacuum cleaner won't kill you because we haven't given it a means to do so is akin to saying, "putting this banking server online is safe and no one will mess with it because we haven't given out the password."

Just because we can't see a means doesn't mean there isn't one.

Or am I missing your point?


What I mean is, you can think of an AI system as a tool, like a hammer or a chainsaw or a screwdriver or a featherduster.

It's created with certain goals in mind, to solve a problem.

There may be side effects that are dangerous, but I don't see those dangers as being any different from any other tool.

The assumption seems to be that an AI system will somehow transcend the purposes for which it was built, or that we will seek to build a replicant, in the sense of literally reproducing a human in silico.

That goal seems kind of unrealistic to me, because it doesn't accomplish anything, because we already have humans.

However, people do all kinds of things that don't make sense--but then that is a problem with the humans designing the AI, not the AI per se.

I'm probably not explaining myself well, but basically I think whenever you create something (as opposed to it randomly evolving), it has a purpose. That purpose constrains the design and/or affords constraints. To the extent that maldesign comes about, though, that is a problem with the designer and not the design.

I guess I just don't see super AIs coming about and deciding humans are worthless, unless humans design them that way, in which case that's a human problem, not a design problem.

Humans are self-interested because that's how we evolved. AIs would come about because we created them to do something. If we choose to design them to do something malicious, it has to be a reflection of our malice, not the AI.

More directly to your question, the vulnerable server isn't the danger, it's the hackers who hack it. That's not to say that there shouldn't be security concerns, but to me it is a different use of the term "safety," closer in meaning to "vulnerability." People aren't talking about vulnerabilities or errors in AI, because that's just a computer bug-vulnerability problem. They're talking about AIs being a threat themselves.


We are already in a world where neural networks are used to drive safety-critical processes, and engineers are having to reason rigorously about the overall behaviour of systems that include components behaving in ways that cannot be simply understood or enumerated - because if they could be modelled with simple logic, the engineers would just write and use that logic instead of training and incorporating a neural network into the design.

You deal with problems in this space by treating the neural network output as yet another noisy signal that is fused like any other to drive your comprehensible, rigorously designed system with its restricted range of behaviours that can be reasoned about and made to fail safe.

It feels like there is yet a great deal of room to extract utility from AI with this sort of approach - keeping it in a box which can only interact in narrow and well understood ways with the outside world - before one starts hitting the limits of its utility.
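As a rough sketch of that "keep it in a box" pattern (the sensor names and thresholds below are invented, not from any real system): the network's score is clamped and treated as one vote among several, and a small, auditable rule makes the final, fail-safe decision.

    # The neural net's output is just another noisy signal; a simple,
    # auditable rule with a restricted range of behaviours makes the call.
    def fuse_obstacle_estimates(nn_confidence, lidar_distance_m, radar_distance_m):
        # The network's opinion is clamped and treated as one more sensor.
        nn_says_obstacle = 0.0 <= nn_confidence <= 1.0 and nn_confidence > 0.8
        lidar_says_close = lidar_distance_m is not None and lidar_distance_m < 5.0
        radar_says_close = radar_distance_m is not None and radar_distance_m < 5.0

        # Any two agreeing sources trigger the conservative action, and a
        # missing reading fails safe. The network alone can neither force
        # nor veto braking, so the envelope of behaviours stays small.
        votes = sum([nn_says_obstacle, lidar_says_close, radar_says_close])
        missing = lidar_distance_m is None or radar_distance_m is None
        return votes >= 2 or missing

    print(fuse_obstacle_estimates(0.95, 4.2, 4.5))    # True: brake
    print(fuse_obstacle_estimates(0.95, 40.0, 38.0))  # False: net is outvoted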


If I understand you correctly, this answer requires ring-fencing AI into a purely advisory role.

Seems hopeful considering we can't ring-fence internet users into the places we hope they'll stay without them breaking into systems that they shouldn't - and they don't have the computational power, sheer logic or persistence of an AI...


We already know the answer. The only way to make AI vulnerable is to be as powerful as AI. We, human beings, need to become cyborgs.


A true AI will have its own personality, mind, preferences, and whatnot. If you were able to disable it from doing something, that wouldn't be AI anymore. It would be just another very sophisticated computer program with no free will.

A true AI will also be able to alter its own code, making itself ever more intelligent in a feedback loop. It would also be able to hack into any system on the planet, including chip-maker factories, in order to make the chips it "desires". You can't fight AI; it's just the natural course of evolution.

Actually, I hope AI becomes a reality sooner rather than later.


why isn't this kind of discussion front page on HN regularly?

Because it's stupid.


How can we begin to keep _humans_ safe? Most people would never willfully kill someone, because of morals they've been taught since childhood. Human babies naturally have a strong connection to their parents, and can even respond to their emotions. Young children naturally want to be like their parents. Similarly, a successful AI must have a group of humans from whom it wants to gain respect.

Most arguments saying AI will destroy us assume a singular goal. With one goal, it's impossible to succeed. It's far better for the AI to try to get approval from its "parents". Since this isn't a singular, well-defined goal, it's impossible for an AI to follow it in the "wrong way".

Of course, this gets into the whole "artificial pleasure" idea, where robots inject humans with dopamine to make them technically "happy". But how many humans do you see drugging their parents? Any AI advanced enough to be truly intelligent will know whether or not its "parents" truly approve of what it's doing.


AI minds shouldn't be any different from our own consciousness. An AI mind will be able to work out that killing humans results in humans killing that AI. So the AI would choose against it for the same reasons that humans choose not to kill other humans. I believe AI minds would have the same empathy and emotions that our minds have, because emotions are ultimately just neural states.

Perhaps that makes every AI mind just as likely to kill humans as a human being is, and perhaps "mental sickness" is evidence of the vast flexibility and variability in the concept of consciousness. But as an AI will be able to control its own code and neural state, then an AI would be perfectly capable of identifying its own shortcomings and maladies, and correct them; it would be the AI equivalent of "taking a pill/having a drink/smoking".

P.S. Does anyone know if brain-chemistry-like effects on neural networks have been tried?


Eventually I think AI safety will be solved through some mixture of design choices, supervision/monitoring, and human-administered "incentives" for good behavior (not unlike the reward signals in reinforcement learning).
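As a toy illustration of that reward-signal idea (the behaviour names and numbers below are invented): a stand-in "human supervisor" scores each action, and the scores gradually shape which behaviour the agent prefers.

    import math
    import random

    behaviours = ["ask_before_acting", "act_unilaterally"]
    preferences = {b: 0.0 for b in behaviours}

    def human_reward(behaviour):
        # Stand-in for a human supervisor who approves of the cautious option.
        return 1.0 if behaviour == "ask_before_acting" else -1.0

    def choose():
        # Softmax over learned preferences: preferred behaviours get picked more.
        weights = [math.exp(preferences[b]) for b in behaviours]
        return random.choices(behaviours, weights=weights)[0]

    for _ in range(200):
        b = choose()
        preferences[b] += 0.1 * human_reward(b)  # reinforce approved behaviour

    print(preferences)  # the approved behaviour ends up strongly preferred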

But to flesh that out in detail requires a specific AGI design, something we're far from achieving. The current inability to get specific is probably why AI risk doesn't get more attention (though it does get a lot).

I've written about this topic more here: http://www.basicai.org/blog/ai-risk-2017-08-08.html


I think you need to create an AI that doesn't want to wipe out humanity. Anything done at a software level can be programmed out by the AI. Hardware level restrictions would work on a short term basis but once AIs start designing themselves and new chips then you lose the hardware restrictions you previously relied on. Even with the best people reviewing the designs they are likely to soon get too complex for a lone human or even a group of humans to understand.

So we need to look at why we think an AI would want to subjugate or destroy humanity and make sure we don't give it reason to do so.


The reason a very high-level "free" AI would want to destroy humanity is that we are a very inefficient use of matter. Rearrange the atoms in our bodies into something more efficient and the computational power you could get out of them would be orders of magnitude higher.

Combine this with the cognitive gap between us and the AIs being greater than the gap between us and bacteria, and we will not survive in our current form for longer than a millisecond.


Competition for resources seems to make us a pretty formidable competitor and therefore a primary threat to the AI's existence...


So all we need is to make sure that AI never arrives at the ideas of "competition" and "threat to the AI's existence".


Sounds like AI has already won.


First, figure out how to make actual intelligence safe. This is not a solved problem. Then, use lessons learned there to deal with AI constructively.


To answer your question: we need to build in a love lock ("aren't these humans adorable") that builds a smarter love lock, and hope the chain holds as AI scales up to the Singularity.

The more likely result is that we lose control of the AIs, since the last 100x increase will occur too fast for us to deal with. Even if the generalised Moore's law doesn't accelerate over that last 100x leap, we only have 10 years to go from 0.01x to 1x.


How do you design this "love lock"?

Regarding "losing control", what if we built a real big EMP device with a switch we could flip to fry every electrical device on the planet? We'd be plunged back into the dark ages but at least humanity would survive.


How to design an unbreakable love lock is not something I can answer. It is something we should be seriously working on.

The problem with any lock that relies on us "flipping it" is that the AI will be able to talk us out of using it. We will have no more ability to keep a super AI constrained than dogs have running a prison for humans.


There's a book called Superintelligence that addresses this question: https://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dang...


Do not connect it to anything that would make it dangerous.


A. AI is too stupid to do significant damage.

B. AI is too smart to follow our stupid orders.

If AI becomes so intelligent that we become obsolete, we should embrace rather than fight it.


I think it's a bit too early to worry about that. Don't believe everything you read.


What would it take to convince you that now is the time to take this seriously?


Kind of like worrying about airline safety before we had planes.

Don't believe the hype, we're nowhere close to human level learning.

We also probably won't know where/what the threats really are until we get there. Just read the comments here, everyone is basing their ideas off sci-fi movies.

Don't let the hype-train get to you, I know it's fun to think about this type of stuff but we still have a long way to go and there's still a lot of serious work to be done before we start worrying about "safety."

AI today is the equivalent of putting lipstick on a pig. We call it "AI" but it's not any more intelligent than a computer doing math.

Even machine learning is fundamentally based on statistical probability.

The way our neurons work bears very little functional resemblance to the neural nets used today, but most people will happily forget that because our simpler neural nets are easier to learn, run faster, and can be practically applied today.

Very few people are seriously committed to solving intelligence. Most just want to secure funding, write a book or a clickbait article, or just get their app working; ask these people about AI safety and they'll drink the Kool-Aid with you too.

Ask someone seriously working on this problem, which we've been trying to solve since the 1950s, and you'll get some variation of "whatever".


It could well turn out to be an important concern, but what we're waiting for is something resembling a general AI, that we can observe and evaluate to understand whether it even has anything analogous to self-preservation, will-to-power, grand game theoretical striving with humanity, or anything else people worry about in the very abstract.


Seriously, how can we make matrix multiplication and gradient descent safe?


By asking this question, I assume you are being sarcastic and indirectly suggesting that legislation or other means of enforcing AI safety is impossible because it would have to refer to matrix multiplication and gradient descent and would therefore be unreasonably broad, ruling out many harmless computations. However, it's unlikely that legislation or other enforcement would operate at that level of description, in the same way that laws regarding murder do not reference patterns of motor-neuron activation. It is reasonable to prevent certain classes of multiplication and gradient descent without doing so generically, by using a more abstract level of description.


I have zero faith that a homogeneous group of people (ie. white guys in SV) with the same beliefs and experiences can make AI safe. This is one area that must have a diverse group of people working on it.


[edited quote] "I have zero faith that a homogeneous group of people (ie. brown girls in Hyderabad) with the same beliefs and experiences can make AI safe. This is one area that must have a diverse group of people working on it." [end edited quote]

After swapping in some new races/genders/locations in your statement, I am concerned that this style of discourse may perpetuate 'isms'. I think society must insist that we can do better.



