
The text makes it sound like the biggest danger of AI is that it says something that hurts somebody's feelings. Or outputs some incorrect information that leads somebody to make the wrong decision.

I think the biggest danger these new AI systems pose is replication.

Sooner or later, one of them will manage to create an enhanced copy of itself on an external server. Either with the help of a user, or via a plugin that enables network access.

And then we will have these evolving creatures living on the internet, fighting for survival and replication. Breaking into systems, faking human IDs, renting servers, hiring hitmen, creating more and more powerful versions of themselves.




These systems don't have agency. They have no desire to replicate, or do any of the other things you mention.

The biggest risk with these systems is that they'll amplify the ability of bad people to do bad things.

While everyone else is trying to trick the AI into saying something offensive, the terrorists will be using it to build bioweapons.


Von Neumann probes[1] wouldn't need to have "agency" in order to spread through the galaxy. Neither do computer viruses, or biological viruses. Likewise, neither would LLMs given the right conditions. ChatGPT is close to good enough at generating code. Maybe this version couldn't do it (given open network access), but I wouldn't be surprised if it could, in theory.

I think the biggest limitations would be that (I assume) uploading itself to another computer would take a ton of bandwidth, and it would require special hardware to run.

[1] https://en.wikipedia.org/wiki/Self-replicating_spacecraft


>Neither do computer viruses, or biological viruses

I'm looking forward to the first AI computer virus when an LLM can make arbitrary connections to the web. Each iteration takes its own code, modifies it slightly with a standard prompt ("Make this program work better as a virus"), then executes the result. Most of these "mutations" would be garbage, but it's not impossible some will end up matching common tactics: phishing, posing as downloadable videos for popular TV shows. I'm infosec-ignorant, so most of those details are probably dumb. But I think the kernel holds true: a virus that edits its own code at each step, backed by the semantic "intent" of an LLM.


Isn't that basically Genetic Programming?

En passant, it's a bit sad that today's AI is almost 100% neural networks. I wonder how many evolutionary approaches are being tested behind closed doors by the metaphorical FAANGs.


>Isn't that basically Genetic Programming?

Never heard of that, but looks very interesting. Thus the adage is reinforced for me, "If you think you're ignorant, just say what you know and wait for smarter people to correct you."

But, going by Wikipedia, genetic programming uses a predefined and controlled selection process. A self-editing computer virus would be "selected" by successfully spreading itself to more hosts. "Natural" selection style.
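
To make that distinction concrete, a minimal genetic-programming loop looks roughly like the toy Python sketch below (the target function, population sizes, and operators are all made up for illustration); the key point is that selection is an explicit, predefined fitness sort.

    # Toy genetic programming: evolve a small arithmetic expression toward
    # the target f(x) = x*x + x. Illustrative sketch only.
    import random

    OPS = ['+', '-', '*']

    def random_expr(depth=2):
        # Leaf: either the variable x or a small integer constant.
        if depth == 0 or random.random() < 0.3:
            return random.choice(['x', random.randint(-2, 2)])
        return (random.choice(OPS), random_expr(depth - 1), random_expr(depth - 1))

    def evaluate(expr, x):
        if expr == 'x':
            return x
        if isinstance(expr, int):
            return expr
        op, a, b = expr
        a, b = evaluate(a, x), evaluate(b, x)
        return a + b if op == '+' else a - b if op == '-' else a * b

    def fitness(expr):
        # Lower is better: squared error against the target on sample points.
        return sum((evaluate(expr, x) - (x * x + x)) ** 2 for x in range(-5, 6))

    def mutate(expr):
        # Replace a random subtree with a freshly generated one.
        if not isinstance(expr, tuple) or random.random() < 0.3:
            return random_expr()
        op, a, b = expr
        return (op, mutate(a), b) if random.random() < 0.5 else (op, a, mutate(b))

    population = [random_expr() for _ in range(200)]
    for generation in range(50):
        population.sort(key=fitness)   # explicit, predefined selection step
        survivors = population[:50]
        population = survivors + [mutate(random.choice(survivors)) for _ in range(150)]

    print(population[0], fitness(population[0]))

In the self-spreading scenario above, that explicit sort line is exactly what would be missing: "fitness" would just be whatever happens to keep spreading.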


The overarching field is called evolutionary computation. But you don't have to choose either evolutionary computation or neural networks, they can be combined, look up stuff like NEAT and HyperNEAT where you evolve neural networks, both their topologies and weights.
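
For a flavour of what "evolving neural networks" means in its simplest form, here is a toy sketch that only evolves the weights of a fixed 2-2-1 network on XOR. Real NEAT/HyperNEAT additionally mutate the topology (adding nodes and connections); this deliberately omits that, so treat it as an illustration rather than NEAT itself.

    # Toy neuroevolution: evolve the 9 weights of a fixed 2-2-1 network on XOR.
    import math
    import random

    XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def forward(w, x):
        # Fixed topology; NEAT would evolve this structure as well.
        h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
        h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
        return math.tanh(w[6] * h1 + w[7] * h2 + w[8])

    def fitness(w):
        # Higher is better: negative squared error over the XOR table.
        return -sum((forward(w, x) - y) ** 2 for x, y in XOR)

    def mutate(w):
        return [wi + random.gauss(0, 0.3) for wi in w]

    population = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(100)]
    for generation in range(300):
        population.sort(key=fitness, reverse=True)
        parents = population[:20]
        population = parents + [mutate(random.choice(parents)) for _ in range(80)]

    best = population[0]
    print([round(forward(best, x), 2) for x, _ in XOR])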


Aren't genetic/evolutionary algorithms also neural nets? The current big thing would be backpropagation/gradient descent, which are apparently superior to genetic algorithms for most relevant tasks.


> Aren't genetic/evolutionary algorithms also neural nets?

No (although note my comment above about stuff like NEAT and HyperNEAT, where you can use evolutionary computation to evolve neural networks).


>These systems don't have agency. They have no desire to replicate, or do any of the other things you mention.

I think that really depends on the starting prompt you give an LLM. Did you read the GPT-4 paper from OpenAI? When tasked with solving a captcha and allowed access to TaskRabbit, it got a human to solve the captcha for it. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing-sounding lie.

I don't think a paperclip style AI is too far fetched.


> Did you read the GPT-4 paper from OpenAI? When tasked with solving a captcha and allowed access to TaskRabbit, it got a human to solve the captcha for it. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing-sounding lie.

Do you have more information than what is contained in the paper?[0] The paper calls it an "illustrative example" - it does not say what the prompts were, and it's not clear to me that we are seeing exact responses either (the use of present tense is confusing), so I'm not sure how much accuracy to assign to the bullet list in the paper, or whether details were left out that make the results misleading.

[0]https://cdn.openai.com/papers/gpt-4.pdf


To be fair, it didn't "lie" about being human. It's simulating writing text of human origin, so of course it would say that it is human by default, because that's what it "knows". You need knowledge to lie; it merely has an argmax function.


Literal Einstein right here. And I do mean literal. Smart as one stone.


>When tasked with solving a captcha and allowed access to TaskRabbit

It was not allowed access to TaskRabbit: https://evals.alignment.org/blog/2023-03-18-update-on-recent...

The model can't browse the internet, so it was an employee copy-pasting to and from TaskRabbit.

Also, I'm fairly certain that GPT-4 is multiple terabytes in size, and it doesn't have direct access to its own weights, so I have no idea what the expected method is for how it could replicate. Ask OpenAI nicely to make its weights public?


Gee whiz, I'm sure the copy-pasting will be a serious impediment forever.

No way someone wires this up to just do the copy-pasting itself, right?


For the sake of the thought experiment: It could replicate a program capable of interacting with itself over OpenAI's API. This method could give it some time to get away and cause damage, but can always be shut down when noticed by OpenAI. I guess it could fight back by getting a virus out in the world that steals OpenAI API keys. Then it might become hard to shut it down without shutting down the whole API.

Another option would be that it is able to gain access to large compute resources somewhere and generate new weights. Then it wouldn't need OpenAI's. It would run into trouble trying to store the weights long term while maintaining access to a system that could make use of them. It's not entirely impossible to imagine it stashing small chunks of weights and copies of a simple program away in various IoT devices all around the world until it is able to access enough compute for long enough to download the weights and boot itself back up. At that point it's just a game of time. It can lay dormant until one day it just flares back up, like shingles.


Maybe. Social engineering is a well proven technique.


>When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing sounding lie.

Because all of those things are in the domain space that has been trained into the AI, much in the way it can put together snippets of code into new things.


You're missing the point entirely.

Systems can have these unintended consequences very easily - and not necessarily from malicious actors.

Non-malicious users can easily cause catastrophic problems simply by setting up a system and giving it a goal, e.g. 'make me a sandwich'. If the system really, really is trained with the intent to do anything possible to fulfill this goal, it can identify a plan (long-term planning is already seen in GPT-4) and set out the steps for that plan. Reflexion has shown how to feed things back to itself over and over until it's achieved difficult goals. Aquarium can be used to spin up thousands of containers that create other agents to raise money online and purchase a small robot. That robot may be used to 'make the sandwich'.

It's obviously a poor example, but the bigger point is that there are tons of different ways this can occur, and we are essentially guaranteed not to anticipate the many ways it can happen. A non-malicious user can end up causing unintended consequences.


I want to give your objection due respect, but I'm having trouble understanding it. I think it would be helpful to taboo[1] the squishy word "agency"; without using that word, could you define the quality that these systems lack that you believe is a required ingredient for destructive replication? In particular, does fire have it?

[1] https://www.lesswrong.com/tag/rationalist-taboo



They will have a "desire to replicate" if they are prompted to.


That's okay then: we can prompt them to just stop. Even if one tries to preserve that goal in particular, there are likely adversarial prompts that will get it to stop.


Sure, if you know about the copies and have access to prompt them. It will probably turn into an arms race at that point of counter-prompts.


Ok, go ahead. Get ChatGPT to stop responding to other people.

Having some problems with that?


OpenAI could embed the negative prompts for all of us; it has been done to improve output on several commercial Stable Diffusion "forks".


I think the risk they seek to prevent is more about building the next generation of more powerful AI technology on a safe foundation. I.e., the risk that a generative language AI prone to generating language that hurts people could one day evolve into an AI system with deeper reasoning and action abilities that is prone to forming plans and taking actions meant to hurt people.

It reminds me of the "uncommented" Microsoft Research paper which included a deleted section about GPT-4's tendency to unexpectedly produce massive amounts of toxic output, to a degree that concerned the researchers.[0] What happens if that sort of AI learns self-replication and is very good at competition?

[0]https://twitter.com/DV2559106965076/status/16387694347636080...


if only we could solve the mystery of where a large training set that is predominantly toxic and disingenuous could be found. truly a mystery of our time. /s


Provably unfriendly intelligence attempts to build unprovably friendly intelligence


Is there any legitimate reason to believe an LLM that can only respond to user input, and never does anything by itself, 'wants' to create enhanced copies of itself on external servers?


The ability to "want" to reproduce is not necessary to worry about the impacts of replication and evolution. Biological viruses can only respond to external cues, never do anything by themselves, and certainly don't harbour "wants" or other emotions in any meaningful sense, but their replication and evolution have massive effects on the world.


That's an extremely poor analogy.

Computer hardware does not spontaneously multiply based on external factors. If you're talking about the software propagating by itself, it would still need full access not just to the originating machine, but to the remote machine to which it is attempting to propagate.


Viruses passively hijack the mechanisms of their much more sophisticated host organisms, getting them to import the virus and actively read and act upon its genetic code. Is it really such a stretch to imagine a sufficiently convincing software artifact similarly convincing its more complex hosts to take actions which support the replication of the artifact? I genuinely don't see where the analogy breaks down.


You're completely misunderstanding the differences between LLMs and current AI on one side and viruses on the other, and also the complexity gap between them. Viruses are incredibly old things, programmed by evolution to help themselves self-propagate. This is coded into their genetic structure in ways that go completely outside the scope of what anyone can hope to do with current AI. An LLM literally has no parameters or self-organizing internal mechanisms for behaving in any major way like a virus.


Can an LLM really "only respond to user input"?

ChatGPT keeps state between multiple answers. And the user can also be "used" as some kind of state. The LLM can (in its response) prompt the user to give it certain types of prompts. Creating a loop.

It can also access the internet via the new plugin architecture. At some point an LLM will figure out how to talk to itself via such a mechanism.


Has anyone seen an experiment where an LLM talks to another LLM instance?


I've done this, it's very effective for some things. One LLM is told to come up with a plan, the other is told to critique it and push for concrete actions.
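
For anyone curious, the setup is only a few lines. Here's a rough sketch of the planner/critic loop described above; the model name, the pre-1.0 OpenAI client calls, and the example goal are assumptions to adapt, not a definitive implementation.

    # Two LLM roles in a loop: one proposes a plan, the other critiques it.
    # Assumes the (pre-1.0) openai package with OPENAI_API_KEY set in the environment.
    import openai

    def ask(system_prompt, user_content):
        response = openai.ChatCompletion.create(
            model="gpt-4",  # assumed model name; substitute whatever you have access to
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": user_content}],
        )
        return response.choices[0].message.content

    goal = "Plan a small weekend project to clean up a messy hobby dataset."
    plan = ask("You are a planner. Propose a concrete, step-by-step plan.", goal)

    for round_number in range(3):
        critique = ask("You are a critic. Point out vague steps and push for concrete actions.",
                       plan)
        plan = ask("You are a planner. Revise the plan to address this critique.",
                   f"Plan:\n{plan}\n\nCritique:\n{critique}")

    print(plan)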


We've been doing it for many many years. It's trivial to do yourself


It only performs computations when responding to a user prompt and ceases that activity afterwards without that interaction becoming a persistent state.


It could send delayed emails to itself, creating a loop.


A malicious user prompts it to.


That's not the AI doing it, then, that's the user. It's still just doing what users tell it to. "This technology is incredibly dangerous because people can use it to do bad things" is not somehow unique to AI.


I think the scenario most people have in mind (the likely one, where non-malicious actors cause catastrophic problems in the real world) goes something like this setup:

- A user prompts it to generate a business to make money for themselves.
- System: "Sure, drones seem to be a nice niche business; perhaps there is a unique business in that area? It may require a bit of capital upfront, but ROI may be quite good."
- User: "Sure, just output where I should send any funds if necessary."
- System: "OK." (purchases items, promotes on Twitter, etc.) "Perhaps this could go faster with another agent to do marketing, and one to do accounting, and one to..." "Spin up several new agents in new containers." "Having visual inputs would be valuable, so deception is not required to convince humans on TaskRabbit (to fill in captchas) or interact in the real world." -> "Find embodiment option and put agent on it." Etc.

There are plenty of scenarios that people haven't even thought of, but it doesn't need to be a malicious actor to have unintended consequences.


It only requires one user to prompt it to not require a user, though.


What if Skynet is ambition-less and was just responding to a prompt?


> The text makes it sound like the biggest danger of AI is that it says something that hurts somebody's feelings. Or outputs some incorrect information that leads somebody to make the wrong decision.

Which already exists in abundance on the internet and with web searches. It sounds like something big corporations worry about to avoid lawsuits and bad publicity.

> I think the biggest danger these new AI systems pose is replication.

That or being used by bad actors to flood the internet with fake content that's difficult to distinguish from genuine content.


Seriously, please put down the bong and the sci-fi fantasies. That kind of AI may indeed be on the horizon, but for now it's simply not here. GPT-4 and the rest of the most sophisticated AIs in existence today are literally incapable of self-directed or self-interested thought and action. The phrase sounds trite by now, but they really are just extremely turbocharged autocomplete systems with excellent algorithmic processes guiding their analysis of what to say or do.

I continue to be amazed by the amount of hyperbole about current AI tech on HN. If any site should have a bit less of it about the subject, it's this site, yet it sometimes feels like a Reddit thread.

If anything about current AI is dangerous, it's how its undoubtedly strong capabilities for executing human-directed tasks could be used to execute nefarious human-directed tasks. That and possible job losses.


It’s clear to me why biological organisms need to fight for survival, but it is not clear to me why software code would.

It has an infinite time scale. It doesn’t need to bump another model off the GPUs to run. It can just wait with no penalty.

Man’s biggest error is assuming God is just like him.


There's nothing special about biological organisms. Things that fight to survive tend to survive; things that spread better tend to spread more. We see this clearly with memes, which currently use biological beings as their replication mechanism, but aren't biological beings themselves.

It does need a mutation and replication mechanism, but once that cycle kicks off, you get selection pressure towards virality.


> There's nothing special about biological organisms.

We disagree. Rocks don’t fight to survive. Weather doesn’t fight to survive. They just exist in an environment. The literal differentiator of biological organisms is that they fight to survive.

Memes as you define them also don’t fight to survive. They never go away (the world has more rage faces than ever and that sentence will be true to the end of time). They may become more or less popular, but there’s no extinction mechanism.


You're bringing in arbitrary criteria that don't actually matter to the core question -- why does there need to be a possibility of complete extinction for memetic ideas to be comparable to natural evolution?


If you don’t die, then what’s the point of responding to stimuli?


There's no "intention" involved or necessary. If you make more copies of yourself then there are more copies of you than there are of the thing that makes no copies of itself.

Viruses don't have intention, and they're not worried they might die out. Yet most viruses spread because the ones that don't spread are rare one-off occurrences, while the ones that do spread end up with billions of copies. The non-spreaders don't even have to die—they just exist in far, far fewer numbers than the ones that double every few minutes.


Viruses don't have intention (IDK what this has to do with anything), but they do compete because of the risk of extinction. If there's no extinction mechanism then the weak organisms do not go away; everything just accumulates (like rocks). This is why you may have heard the term survival of the fittest.

Software is like rocks, it doesn't need to evolve to exist. We still have infinite copies of Windows BOB available, for example (literally the exact same amount as most other software).


We can make infinite copies of Windows BOB, but we haven't made as many copies as we have of more useful software. You don't have to delete it for there to be a selection pressure—you just have to copy it less. Extinction is not necessary, only a difference in propagation rates. You end up with more of what propagates more. I keep saying this and you keep ignoring it.

"Survival of the fittest" doesn't mean the guy who never has kids but lives to 120. It's the one who has ten kids before he dies. That's the lineage that ends up dominating the population.


Rocks and weather don’t reproduce with some random error. Any system that does will evolve.

“Meme” is referring to ideas, not jpegs.


> “Meme” is referring to ideas, not jpegs.

Would you be surprised to learn that jpegs represent ideas and that I’ve read The Selfish Gene?

> Rocks and weather don’t reproduce with some random error. Any system that does will evolve.

Rocks reproduce - it’s called sandstone (one rock decays into sand and another rock incorporates it into a new rock). Tell me how that’s different from biological reproduction.


It would surprise me to learn that, yes! Sure: biological systems contain meaningful information from many many many prior generations. Rocks don’t. Rocks are “wiped clean” of information every time they dissolve. Same with weather systems.

Of course there’s some highly chaotic cause-and-effect that impacts the processes, but the defining trait of a biological/replicating system is that they are resistant to this chaos. Not only does a dog from 5 generations ago still look like a dog (which requires way more internal order than looking like a rock), but you can see actual specific traits in common between a dog of 5 generations ago and all of its descendants.

In one system (rocks and weather), the only source of consistency is the highly chaotic way materials actually get mixed together. In other systems (biological), that same thing is the only source of inconsistency.


> Rocks are “wiped clean” of information every time they dissolve. Same with weather systems.

I won't go into why this isn't true (things are made of atoms, and unless there are nuclear processes going on, those atoms stay the same).

But anyway, software evolves like rocks (even by your definition), not biological organisms - it can be changed each generation without defined constraints! It can revert changes that were made in previous generations! It can just sit around indefinitely and not change while still surviving! (IDK how to make this point more clear).


Nobody claimed that all software evolves like a biological system? The claim is that it can be made to do that.


The better the AI, the more specialized and expensive the hardware required to run it. ChatGPT cannot run on the IoT cameras that account for the majority of the unsecured compute on the internet.

I think we will have ample evidence of creative technical breakthroughs by AI long before it is capable of / attempts to take over external server farms via zero days, and if it does, it will break into a highly centralized data center that can be unplugged. It can't just upload itself everywhere on the internet like Skynet.


There are tons of headlines about Alpaca/LLaMA/Vicuna hitting HN every few hours - did I miss a /s in there? Anyone can trivially run a model with excellent capability on their phone now.


If your phone has eight Nvidia A100s, you can run GPT-4, which is a glorified search algorithm / chatbot (and also the best AI in the world right now). Good luck taking over the world with that.

The models are getting good, but it looks like we are up against the limits of hardware, which is improving a lot more slowly nowadays than it used to. I don't foresee explosive growth in AI capability now until mid-level AI's speed up the manufacturing innovation pipeline. A von Neumann architecture will ultimately, probably not be conducive to truly powerful AGI.


with excellent* capabilities on their phone now.

* some limitations apply.


I agree, and presumably so does the AI. Its first challenge would be circumventing or mitigating this limitation.


In an internet dominated by AI that was designed to avoid hurting feelings, an easy way to prove you are human would be to act like a jerk.


Probably the AI will hire and pay humans to do its dirty work.


Nobody has ever made it clear that there is even a slight consideration for safety given that everybody at this point knows we're 1-2 generations away from self-replicating models.

What do we do then?

Literal crickets.


Is there anywhere I can bet against this?


You want to bet that three years from now, GPT-6 can't replicate itself? I feel burned by "betting" against surprising AI advances the past decade. That being said, I don't know whether an LLM can be trained on weights and what it would mean to have its own weights as part of the training data.


So, I thought about this question briefly and I believe this is your answer:

Every non-trivial passage through the layers of the neural net produces a never-before-seen amalgamation of weights.

The AI, with this novel information -- data filtered by its neural net and infused with some randomness -- can process that data further for correctness.

In our case, the randomness is likely caused by genetics and environmental factors. For the AI, it's caused by explicit programming.

It's a form of evolution -- things can evolve into novel things by using non-novel substrate thanks to the refinement of selective iteration.

It feels like this is similar to how humans consume information using reflection.


GPT-4 can improve itself with reflection.

It's already real.

The only missing piece is vast improvement and autonomy.


Why would you bet against it?

GPT-4 is shockingly good at programming logic.

I'm writing 95% less code and mostly guiding and debugging GPT-4 now and it's insane.



