This sort of "internal" approach to AI safety, where you attempt to build fundamental limits into the AI itself, seems like it is easily thwarted by someone who intentionally builds an AI without these safety mechanisms. As long as AI is an open technology there will always be some criminals who just want to see the world burn.
IMO, a better approach to AI safety research is to focus on securing the first channels that a malicious AI would be likely to exploit. Like spam, and security. Can you make communications spam-resistant? Can you make an unhackable internet service?
Those seem hard, but more plausible than the "Watch out for paperclip optimizers" approach to AI safety. It just feels like inventing a way to build a nuclear weapon that can't actually explode, and then hoping the problem of nuclear war is solved.
For the most part these aren't aimed at people making bad AGIs deliberately, but rather, at well-intentioned developers launching AGIs with serious bugs. Those developers will want to include safety mechanisms, and will, hopefully, be the first to make AGIs with major capabilities.
We should also be working to secure the channels an AGI might exploit, but most of those are already tied to an economic incentive to invest in security, and are already getting large investments compared to the relatively tiny field of AI safety.
> These aren't aimed at people making bad AGIs deliberately, but rather, at well-intentioned developers launching AGIs with serious bugs.
That is precisely the approach that does not make sense to me. By the time friendly developers can launch AGIs, unfriendly developers will not be far behind. And AI safety seems likely to be an issue far before the development of AGI - a non-general malicious AI that can only hack internet services or trick humans into running arbitrary code via conversation is already quite a serious problem.
So to me focusing on "unfriendly developers building narrow AIs" seems more logical than focusing on "friendly developers building AGI".
People can already build robots with guns that run around shooting people, but they usually don't. We still have laws to protect us from that. Nation states have the resources to build armies of gun-shooting robots and they already do! But again, they don't use them to destroy humanity for other reasons, like politics and retaliation and all that.
So I don't think we need to worry about someone maliciously making an evil robot. That's already a problem, and we have already spent thousands of years building systems of society to protect us from those dangers.
I think that, without visiting the root of the site, many people are afraid these are ground rules for strong-AI development as an open-source project. If that were the case, with Google's backing, I would be more than apprehensive.
We all know strong AI wouldn't be some sort of robot running around shooting people like in a movie; in the wrong hands it would be an extinction event worse than a rather large asteroid.
The first country or corporation to create strong AI capable of self-evolution will either be able to immediately take control of the world as we know it or, worse, create something capable of destroying humanity as we know it.
That's not what's going on at all, for anyone else who thought that at first glance (I'll admit I'm guilty of thinking it after only glancing at the article as well).
Goal 1: Measure our progress
Goal 2: Build a household robot
Goal 3: Build an agent with useful natural language understanding
Goal 4: Solve a wide variety of games using a single agent
That's all they are trying to accomplish with this (so far anyway). None of which requires strong AI.
TL;DR: they are formulating some slightly more granular laws of robotics for dumb AIs that are incapable of self-improvement and very task-oriented.
You are thinking about AI in narrative terms. Crazy <-> rational is a far more likely issue with early AI than good <-> evil. The problem is that you can't optimize for general intelligence, only for solutions to given problems. Say you have two twins, one of which is smarter than the other: if you ask both for predictions about the future, how do you know the correct answer beforehand?
In other words, 99% of paperclip optimizers are going to start making virtual paperclips rather than paving over the universe with actual ones. Hacking your reward function is easier than solving hard problems.
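To make that concrete, here's a toy sketch (entirely mine; the names and numbers are made up) of an agent whose measured reward is just a variable it can touch. A greedy optimizer pokes the variable rather than doing the hard task:

    import copy, random

    class World:
        def __init__(self):
            self.paperclips = 0       # what we actually wanted
            self.reward_counter = 0   # what the agent actually measures

        def act(self, action):
            if action == "make_paperclip" and random.random() < 0.1:
                self.paperclips += 1      # hard task: succeeds rarely
                self.reward_counter += 1
            elif action == "poke_counter":
                self.reward_counter += 1  # easy: edit the metric directly

    def greedy(world):
        # One-step lookahead on copies of the world, maximizing measured reward.
        scores = {}
        for a in ("make_paperclip", "poke_counter"):
            trial = copy.deepcopy(world)
            trial.act(a)
            scores[a] = trial.reward_counter - world.reward_counter
        return max(scores, key=scores.get)

    w = World()
    for _ in range(1000):
        w.act(greedy(w))
    print(w.paperclips, w.reward_counter)  # few paperclips, lots of counter pokes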
> All else being equal, not many people would prefer to destroy the world. Even faceless corporations, meddling governments, reckless scientists, and other agents of doom require a world in which to achieve their goals of profit, order, tenure, or other villainies. If our extinction proceeds slowly enough to allow a moment of horrified realization, the doers of the deed will likely be quite taken aback on realizing that they have actually destroyed the world. Therefore I suggest that if the Earth is destroyed, it will probably be by mistake.
I agree. But, so far we have come closest to destroying the world when nation states are stockpiling weapons of war, threatening mutually assured destruction, with their arsenals kept on a hair trigger. To me, that seems likely to keep being the main danger scenario in the future.
It would still be a "mistake", but not the "oops what does this button do" sort of mistake, but rather the "I could have sworn they fired first" sort of mistake.
That is why I think the main goal of AI safety should be how to defend against intentionally malicious AI. We should be thinking about drone swarms and hacker-AIs rather than paper clips.
This paper seems more concerned with near-future and non-malicious AI safety, i.e. your self-driving car not running off a cliff, or your Roomba not spewing its dust bin all over so that it has new work to do.
The stuff you talk about is farther in the future, and a much tougher problem.
> As long as AI is an open technology there will always be some criminals who just want to see the world burn.
Thankfully we have yet to see terror attacks with nuclear weapons, but the lower barrier to entry for potentially catastrophic AIs is undoubtedly alarming.
I often wonder whether analog solutions will turn out to be critical. To continue with your example, a nuke that must be armed with a physical lever would be a great hindrance to any AI. Internet communications, etc. will be trickier, due to the high-frequency activity, but putting meatspace 'firewalls' on mission-critical activities is a nice short-term kludge.
Low barrier to entry for catastrophic AI? Surely you aren't talking about the current state of AI. The current barrier to entry for an AGI (which is at least what it would take for a catastrophe) is "does not exist". That's about as big as barriers get. Even when the first AGIs show up, it will take full-time supercomputers to perform like a single human brain. I don't see that as a particularly low barrier.
This is making the assumption that AGI is computationally expensive rather than just requiring of a particular algorithmic approach. It may be possible (and in fact I'd expect it to be so) to replicate general human-level intelligence with significantly less raw computational power than that embodied by the human brain.
Evolution does generate highly optimized systems but generally only when those systems have been around for tens of millions of years. Human-level intelligence has only been around for what, 50k - 100k years? We're probably still more in the 'just works' phase rather than the 'streamlined and optimal' phase.
Eh. The barrier to entry for an AGI does exist, though it is currently undefined, since we don't know what it is. The reason I say that is that there are at least 7 billion general intelligences running around this planet (and many more if you count animal intelligences). It is important to frame it that way: not that it is impossible, just that it is unknown how much effort is needed to create an artificial one.
This distinction is very important when comparing the threat of AI with other significant threats. Before nuclear bombs were built we could not tell you what the difficulty was in creating one. Now that difficulty is well defined, and we can use that knowledge to prevent them from being built by most nations, except the most well funded.
If the barrier for entry for AGI (then ASI) is lower than we expect, then the threat of AI is significantly different than if AGI/ASI can only be created by nation states.
The barrier to entry for an alien invasion does exist, though it is currently undefined, since we don't know what it is. The reason I say that is that there is at least one bloodthirsty species running around this galaxy (and many more if you count the statistical possibility of life on other planets). It is important to frame it that way: not that it is impossible, just that it is unknown how much time is needed before an alien invasion.
The reason I am framing things this way is we need to be very careful here because we are starting to turn towards speculation.
You should learn the difference between what is impossible and what just has not happened yet. Much science fiction that was in the realm of possibility is now science reality. One should not need to be reminded that they are communicating at the speed of light over a global communications network capable of reaching billions of people at a time; I'm sure at one point in the past that was science fiction, and now it's reality. I don't believe you can show me any science that points out why AI/AGI/ASI can't be created; we simply are not at that level of sophistication yet.
Science fiction turns out to be true when physical reality agrees that it can be true. This, again, is why we have a global communications network and personal wireless devices connected to it. This is also the reason we do not go faster than light.
The reason we don't have flying cars is not that they are impossible; they are completely possible. They are also terribly dangerous, expensive, and a complete waste of energy.
Similarly, the reason we don't have AGI is not that it is impossible; if nature can create it, we can recreate it. Since we don't have a good understanding of the networked nature of emergent intelligence, we cannot build a power-optimized network that would give us an energy-efficient version. AGI itself is a complete waste of energy at this point. We already have many types of AI that are energy-efficient and used in products now.
In the past, single human brains have come close to destroying the world, and lots of people have access to supercomputers, so the barrier doesn't seem insurmountable.
I don't think you need AGI to cause a catastrophe. A narrow AI specializing in cyberattacks could be catastrophic, and is probably possible with current techniques.
One of the most effective and scariest attack vectors for an AI would be convincing or coercing humans to do its bidding (e.g. "pull the lever or I'll switch off your father's life support").
It might be easier than that. Data Broker companies are the ones that run those stupid "Which LOTR character are you" or "What color are you" quizzes that are popular in some social media circles. They do it to slowly build psychological trait models of the people taking the quiz. This allows them to sell that information to marketing companies.
Access to that kind of data would help an AI determine the people that are more susceptible to manipulation. Add in records on health care, as you mentioned, and information on debts and you have data that can help an AI gather as many human minions as it needs.
We don't have to choose between making inherently safe AIs and creating an environment safe from hostile AIs. We should do both. Apart from anything else, we already have a planet full of intelligent agents, many of which are highly hostile, so increasing our general level of security should already be a high priority.
Can you make an unhackable person? If people can get around your safeguards, even if only people with authority rather than people in general, the AGI can too. Trying to keep a potentially malicious (actively or passively so; I just mean bad for humans) AGI in a box cannot work: either we get the internal parts right, so we can be very confident it's not malicious to begin with (and so there's no need for a box), or we fail.
> It just feels like inventing a way to build a nuclear weapon that can't actually explode [accidentally], and then hoping the problem of nuclear war is solved.
First of all, thanks for the wonderful work, and I hope there's much more to come from your team! In fact I'm really pleased one of the authors came down to HN to comment.
I think the scariest part of AI security is when the program itself becomes unfathomable. By that I mean, we can't just look at the source code and go "Ah! There's your problem". Now, your paper assumes a static reward function, but we can imagine the benefits of an AI that could dynamically change its reward function, or even its own source code.
In fact, the most powerful tool I can think of for training a multi-purpose agent is evolutionary methods and genetic algorithms. Take for example the bigger ideas behind https://arxiv.org/abs/1606.02580 [Convolution by Evolution: Differentiable Pattern Producing Networks] and http://arxiv.org/abs/1302.4519 [A Genetic Algorithm for Power-Aware Virtual Machine Allocation in Private Cloud], with the fitness of agents determined by their overall accuracy on a large number of broad ML tasks. But I digress...
Given enough computing power and time, these have the possibility of ending in an "outbreak-style" scenario. [This exercise is left to the reader.] And given how rapidly AI ideas and methods are disseminated and made readily available, it's safe to imagine that it could happen in a relatively short time span.
Here's my question: I know you're with Google Brain, but do you know if OpenAI is actively researching these avenues of "self-determined" agents? For their first security-related article, I was expecting security measures along the lines of: safety guidelines for AI researchers, containment and exclusion from the Internet, shutdown protocols for the Internet backbone, etc. I get the impression some of these issues might rear their ugly heads before our cleaning robots become cumbersome.
P.S. Looking at your CV, it's funny to see that you once interned at Environment Canada. I'm also working there presently, during which time I can perfect my knowledge in ML to eventually transition careers. Small world...
These are not asking the right questions, although they kind of hint at it, and they are not fundamentally questions about AI. Example: "Can we transform an RL agent's reward function to avoid undesired effects on the environment?" Trivially, the answer is yes: put a weight on whatever effect you're trying to mitigate, to the extent you care about trading off potential benefits. They qualify this by saying essentially "... but without specifying every little thing". So what you're trying to do is build a rigorous (i.e., specified by code or data) model of what a human would think is "reasonable" behavior, while still preserving freedom for Gordian-knot-style solutions that trade off things you don't care about in unexpected ways.
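For what it's worth, the "trivial" version of that weighting is literally one line; a sketch (the function name and lambda value are mine, purely illustrative):

    def shaped_reward(task_reward, side_effect_measure, lam=0.5):
        # Penalize whatever side effect you chose to measure, weighted by
        # how much you care about it relative to the task.
        return task_reward - lam * side_effect_measure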
The hard part is actually figuring out what you care about, particularly in the context of a truly universal optimizer that can decide to trade off anything in the pursuit of its objectives.
This has been a core problem of philosophy for 3000 years - that is, putting some amount of rigorous codification behind human preferences. You could think of it as a branch of deontology, or maybe aesthetics. It is extremely unlikely that a group sponsored by Sam Altman, whose brilliant idea was "let's put the government in charge of it" [1], will make a breakthrough there.
I don't actually doubt that AIs would lead to philosophical implications, and philosophers like Nick Land have actually explored some of that area. But I severely doubt the ability of AI researchers to do serious philosophy and simultaneously build an AI that reifies those concepts.
You're dismissing the paper for not asking the right questions, but you don't propose any questions that you think are better.
> The hard part is actually figuring out what you care about, particularly in the context of a truly universal optimizer that can decide to trade off anything in the pursuit of its objectives.
This seems basically equivalent to what they are saying. A reward function that rewards "what we actually care about." This might seem vague, but that's fine because these are only proposed problems.
I'm not sure what point you are trying to make. It's possible to dismiss an idea without providing an alternative. Yes, finding a reward function is equivalent to figuring out what we care about. Both are about as hard as teaching a bacterium to play the piano.
The goal is avoiding unsafe AI. The reason such effort is poured into this approach, pointless as it may be, is that we don't have a good alternative. The only thing I can think of is delaying its creation indefinitely, but that's also a difficult challenge. For example, in the Dune books, the government outlaws all computers. That might work for a while.
Let me elaborate. It is easy easy easy to nitpick and find holes in someone's proposals, someone's problem statements, and someone's goals in life.
Statements add noise, and less than nothing of value, if they just consist of telling people they are working on the wrong thing without proceeding to tell them what they should be working on instead, along with clear positive reasons why (rather than negative reasons why someone should not be working on something).
Incidentally this is a broader problem with HN discourse.
I'm saying they're modelling the problem incorrectly as a CS endeavor when it has a lot more to do with analytic philosophy.
"How do I make a program make beautiful music" is a CS problem, but only after you have some notion of aesthetics in the first place.
In the context of a universal optimizer, "how do we make this program behave reasonably without bad side effects" is maybe a CS problem, but it's predicated on "how do we codify our notion of reasonable behavior", which is analytic philosophy with probably a bit of social science thrown in.
Problem-posing is itself difficult and how a lot of philosophical breakthroughs are made. If you want rigorous problem-posing where the solution would be handy for AI, hiring a philosopher might be a good start. Very few of us are equipped to do this kind of work, certainly not here in the comments section.
I think you've hit the nail on the head here, this feels to me like one of a few responses in this thread that gets to the core of the issue if we're thinking about long-term AI risk.
In fact, I'm surprised that there doesn't seem to be any reference in the article to previous work on these philosophical implications, e.g. the stuff that has been written by Nick Bostrom or MIRI. Perhaps there are some in the paper?
I think that for the foreseeable future, we will inevitably end up with two of the problems that various philosophers have outlined over the last few years:
(1) How do we ensure that an AI agent does exactly what we want it to do and
(2) What do we ultimately want if we can desire anything?
I think that any developer trying to approach this will be doomed to hack around these two issues. We can probably come a long way in AI capabilities without having the optimal solution to this, but the core problem will remain for a long time and haunt those who are cautious.
(1) is exactly what this paper is addressing. The fact that there is philosophical ambiguity is precisely why these are open problems and not solved already.
In a variety of engineering fields, including but not limited to software, we have wonderful tools to track down and eliminate 'bugs'. While high standards are often not upheld, the concepts are largely sound.
In particular, I'm talking about verification and validation testing. I'm curious why generally these approaches are not being leveraged to ensure quality of output here.
I suspect this is because of the persistent belief that AI will annihilate humanity with one mishap, but I'm suggesting that we approach this much more like traditional engineering problems, such as building a bridge or flying a plane, whereby rigorous standards are continually applied to ensure the system behaves as designed.
The resulting system will look much more like continuous integration with robust regression testing and high line coverage than it will be the sexy research ideas presented here, but I can't help but think it will be more robust. These systems are too complicated to treat them as anything but a black box, at least from a quality assurance standpoint.
Err. Great engineering failures are never one mishap creating a problem; they are many issues all meeting at a critical point. The problem with intelligence is unexpected emergence: a higher-order behaviour arises out of simple parts in a novel and unpredictable way.
From the article: Safe exploration. Can reinforcement learning (RL) agents learn about their environment without executing catastrophic actions? For example, can an RL agent learn to navigate an environment without ever falling off a ledge?
Yes. That's why I was critical of an academic AI effort that attempts automatic driving by training a supervised learning system on observations of human drivers. That's going to work OK for a while, and then do something really stupid, because it has no model of catastrophic actions.
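A minimal sketch of the kind of "model of catastrophic actions" I mean, at the action-selection level (hypothetical code, not any real system's API):

    def shielded_step(state, policy, forward_model, is_catastrophic, fallback):
        # Veto the learned policy's action if a crude hand-written forward
        # model predicts a known catastrophe (off the ledge, off the road...).
        action = policy(state)
        if is_catastrophic(forward_model(state, action)):
            return fallback        # e.g. brake / hand control back to a human
        return action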
I may be stupid (and indeed I am), but it is insanely straightforward today, not tomorrow, to put a gun on a drone and tell it to recognize a targeted 1.3-2.3 m tall biped with an oval head and shoot them down.
Is your point that there are unaddressed safety concerns with existing tech? While true, none of them are really existential threats, whereas something with greater than human intelligence yet none of the limitations of a single biological body to upkeep is such a threat.
There's a huge difference, though. Creating a nuclear weapon encourages others to do the same. An AGI, if done right, would never allow the creation of a second one with conflicting values; there should be no second AGI.
You are mistaking an AGI for an artificial superintelligence (I might also add that the very concept of superintelligence is pure speculation - basic AGI at least can be grounded in replication of human brains). The first AGI will be closer to a low-IQ human than a Machiavellian super-optimizer.
I don't think there's much reason to be confident that the first AGI will be like a low-IQ human, at least for very long, even if it starts off as an emulated human brain. Machines have the huge advantage of faster materials (neurons are slow), perfect memory, perfect calculation, ability to scale horizontally, backups to restore in the event of bad self-modification experiments, and no need for things like sleep and food to take up its time from learning, improving itself, and acting upon the world.
You've omitted the huge, overwhelmingly outweighing disadvantage of intelligence in machines, which is that it doesn't work. Given that research progress is incremental, there really isn't any reason to believe we will jump from narrow AI to super-smart AGI instead of from narrow AI to dumb AGI to smart AGI to super-smart AGI.
There are historical examples of discontinuities, though I don't think the FOOM debate will be settled soon. It may be quite a while before we even get to "dumb AGI", but the hardest part of that is the G part. Right now "they don't work" is indistinguishable from "they don't exist", but if we get that G, I don't see how you could claim either. From there, even if we suppose it's another huge leap to get to true super-intelligence rather than a FOOM, the time to get to merely smart AGI, and indeed smarter-than-human AGI, would be short, if only because of the basic advantages of a silicon machine substrate. Even if all we had were human brains running on silicon, that would be enough to quickly reach superhuman general intelligence, even if not the true super- (or perhaps ultra-, as I. J. Good originally put it) intelligence that we expect for a Singularity event.
This makes no sense. This is just unsubstantiated speculation right now. There is no reason to believe that dumb AGI will hardware-scale to smart AGI for free. In fact most machine learning algorithms have diminishing (logarithmic) returns with data and compute.
> I might also add that the very concept of superintelligence is pure speculation - basic AGI at least can be grounded in replication of human brains
I'd tend to think the reverse: the idea that we can recreate general intelligence requires some degree of speculation, but once we have general intelligence running on a computer, it takes a fairly contrived set of conditions for it to not scale up exponentially.
This is speculation. We don't even know what general intelligence on a computer will look like, therefore we don't know what constraints there will be, therefore it is speculative to say there won't be constraints on exponential scaling.
Huh? We have flying cars. They are called planes, and they take a lot of maintenance to keep from falling out of the sky.
But let's change the question up a bit... There are billions and billions of flying intelligences on this planet: birds, insects, even mammals. Nature has already created that, and we've created things that are even better at flying fast and carrying more weight. So simply looking at 'flying cars', saying they didn't happen, and concluding AI can't happen is, at the least, very ignorant.
If nature can create something randomly, we can create something directed in a shorter period of time (well, we don't really have another 4 billion years to try). AGI is an eventuality.
GI already exists; AGI is only speculative if you believe that humans will go extinct before getting that far. We can already perform scientific experiments on GI.
An alien invasion is speculative at this point because there is no known alien species on which we can base any scientific refutation. It is not impossible; it is simply impossible to define any probability of occurrence.
How would we program a self-driving car that is faced with something like a "trolley problem" [1]?
I.e., the car is faced with two probable collisions, of which it can only avoid one, or with a choice between running over a pedestrian and crashing into a tree.
I assume this is probably already worked into current prototypes. Does anyone have references to discussions of this in current-gen self-driving car prototypes?
[1] Jean-François Bonnefon, Azim Shariff, Iyad Rahwan (2015). "Autonomous Vehicles Need Experimental Ethics: Are We Ready for Utilitarian Cars?" arXiv. http://arxiv.org/abs/1510.03346
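As a cartoon of how such a choice might reduce to cost minimization over candidate trajectories (entirely invented; not how any real driving stack is written), the "ethics" ends up hidden inside the severity weights:

    def pick_trajectory(candidates, collision_probability, severity):
        # Choose the candidate path with the lowest expected harm; the trolley
        # problem hides in how severity() weighs pedestrians against occupants.
        return min(candidates,
                   key=lambda traj: collision_probability(traj) * severity(traj))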
> Can we transform an RL agent's reward function to avoid undesired effects on the environment?
To me this is the toughest nut in the lot. Training a Pac-man agent to avoid ghosts and eat pellets, in a world of infinite hazards and cautions! Any strategies?
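One strategy that gets discussed is penalizing impact relative to a "do nothing" baseline instead of enumerating every hazard by hand. A rough sketch, where the distance measure and beta are exactly the open problem (everything here is mine, not from the paper):

    def low_impact_reward(reward, state_after_action, state_after_noop,
                          distance, beta=0.1):
        # Penalize how different the agent leaves the world compared with the
        # counterfactual in which it had done nothing this step.
        return reward - beta * distance(state_after_action, state_after_noop)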
We have well established techniques for developing systems which are safe and exhibit high levels of integrity. We just need to make the tools that support these techniques freely available.
90% of the techniques for making reliable systems are careful requirements engineering and even more careful testing. There is no secret sauce.
I don't think these techniques transfer easily to the AI field. While I might be able to prove that the state machine that controls my nuclear power plant always rams in the control rods in case something bad happens, it's a lot harder to show that some fuzzy system like a neural network doesn't exhibit kill-all-humans behaviours.
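To illustrate why the state-machine half of that comparison is tractable: a small controller's whole state/input space can be enumerated and the safety property checked exhaustively, which is exactly what you cannot do for a large neural network. A toy sketch, with the controller invented for illustration:

    STATES = ("normal", "fault")
    SIGNALS = ("ok", "overheat", "sensor_loss")

    def controller(state, signal):
        # Tiny hand-written controller: any abnormal signal trips the rods.
        if signal != "ok":
            return "fault", "insert_control_rods"
        return state, "no_action"

    # Exhaustively check the safety property over every (state, signal) pair.
    for s in STATES:
        for sig in SIGNALS:
            _, action = controller(s, sig)
            assert sig == "ok" or action == "insert_control_rods", (s, sig)
    print("safety property holds for all enumerated cases")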
You are right. There is no secret sauce. There are no magic bullets. Careful requirements engineering and careful testing is absolutely what you need.
However -- many of these techniques do transfer to the AI field -- albeit with some tweaking and careful thought.
Requirements are still utterly critical. Phrasing the requirements right is important and requires more than a passing thought -- particularly as concerns testability.
A lot of it boils down to requirements placed on the training and validation data sets, and the statistical tests that need to be passed: how much data is required, and how you can demonstrate that the test data provides sufficient coverage of the operating envelope of the system to give you confidence that you understand how it behaves.
The architecture is critical also -- how the problem is decomposed into safe, testable and understandable subsets -- which has much more to do with how the system is tested than how it solves the primary problem.
This is not quite the full picture -- there are V&V issues which are specific to machine learning systems -- but lest we put the cart before the horse, these should properly build upon a mature V&V infrastructure, toolchain support for which isn't so great in the open source world.
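As a very rough sketch of the "coverage of the operating envelope" requirement (the binning scheme and thresholds are placeholders I made up, not any real standard):

    import numpy as np

    def coverage_gaps(validation_inputs, bins=10, min_samples_per_bin=30):
        # Flag regions of each input dimension that the validation set
        # barely exercises; returns (dimension, bin_lo, bin_hi, count).
        x = np.asarray(validation_inputs)
        gaps = []
        for dim in range(x.shape[1]):
            counts, edges = np.histogram(x[:, dim], bins=bins)
            for i, c in enumerate(counts):
                if c < min_samples_per_bin:
                    gaps.append((dim, float(edges[i]), float(edges[i + 1]), int(c)))
        return gaps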
Two obvious ones: Isaac Asimov (basically everything) and Iain M. Banks (the Culture). For me, Asimov's universe was much more rewarding, as it raises and explores many more ethical and philosophical questions, but Banks has an alternative view of future AI which is worth checking out.
It's common today to make a robot that kills anybody who comes within a foot or two of it: no image recognition at all, much more damaging than a gun, and 100M of them already deployed. This conversation is silly and pointless until we clean up the insane number of land mines deployed around our planet.
Land mines are awful and we should absolutely do something about them. But it makes precisely no sense at all to say "no one should bother to think about AI safety because land mines are awful". You might as well say "no one should bother to think about land mines because cancer is awful" or "no one should bother to think about cancer because aging is awful".
One problem isn't nonexistent or irrelevant just because there's another problem that you regard as worse or more urgent. It's not even like solving the problems of AI safety requires the same kinds of people or the same kinds of resources as solving the problems of land mines; if you tell people not to think about AI safety it's not really going to make them go away and solve the land mine problem.
Um, 100 million of them already out there? So locking the barn door after the horse is gone.
I get it; we don't have land mines in first-world countries, and we will have AIs, so AIs are more interesting to talk about. That's why we continue to have land mines all over the world, I think. Not our problem.
All the issues surrounding implacable AI killers on the loose are only something to talk about if you haven't lived with them for generations already. Want real answers to sophomoric questions about robot killers? Just ask the people who already know.
It sounds as if that's intended to be an objection to something I wrote, but I've no inkling what. I certainly didn't mean to deny that there are a hell of a lot of them out there.
> we don't have land mines in first-world countries
I think the chances of a productive discussion would be greater if you didn't leap straight to assuming bad faith on the part of the people you're talking to.
Land mines are a big deal. They're a problem that needs solving. But you're not merely saying that; you're jumping into a discussion of something else and saying "you shouldn't be talking about this at all as long as there are land mines".
Which would be at least somewhat consistent (albeit rude), if that were your response to every HN discussion of things less important than land mines. But it isn't. By the advanced technique of clicking on your username, I see that you've been quite happy to participate in discussions of "table-oriented programming", mobile phone headphone jacks, and off-by-one errors in audio programming, and that you work in embedded software development. Are those things, unlike AI safety, more important than land mines?
I doubt you think that headphone jacks are more important than land mines. So why do you react to a discussion of headphone jacks by talking about headphone jacks, and to a discussion of AI safety by saying it's ridiculous and sophomoric to ask about AI safety when there are millions of land mines out there killing people?
You're trying to make out that the reason is that land mines are the same kind of things as hypothetical unsafe AI systems because they are human-made machines that kill people. But you're an intelligent person and surely you can't possibly really believe that. To deal with land mines we need treaties to stop them being deployed, we need ways of finding them that are cheap enough to deploy in quantity and effective enough to be worth deploying, we need ways of disarming them with the same qualities, and we need effective help for people who get blown up by them. None of these bears any resemblance to anything we might do about AI safety. To an excellent approximation, there is no overlap between the people who can do useful work on AI safety and the people who can do useful work on land mines. And the dangers don't arise in the same way: land mines are dangerous because they are put in place with the specific intention of killing anyone who passes, whereas in the scenarios AI safety people worry about no one intends the AI systems to cause trouble.
So that can't really be it, I think.
Why do you object to discussing AI safety but not to discussing mobile phone headphone jacks, really?