Until the killbot, lacking wider context, kills the people who in the greater scheme of things would have prevented more deaths.
> I trust that future AI can be more unbiased and benevolent than any of our current human leaders.
... on what basis? Morality is a learned trait, and we've had plenty of examples of entities that thought they were moral "in the greater scheme of things" and committed plenty of atrocities along the way.
What if the AI decides outright unfettered slaughter is the way to a better future? What if the AI went "okay, this country has been a pain in the neck of the entire world for too long" and nuked it?
For example, Star Trek did an episode where an entire civilization's golden-age utopia relied on a single child sacrifice annually. There's no single right answer to that situation; it's an impossible choice. Either you intentionally kill a kid, or you intentionally, if less directly, collapse an entire civilization into war, starvation, and megadeath.
We humans can't agree on what's ethical in situations like these. There are good arguments for "never, ever kill an innocent or you lose your way" and there are good arguments for "killing 10 innocents to save 1,000,000 is the right choice". It's possible an AI will make better choices. It's possible we won't like them. It's possible an AI will make shitty choices. It's possible we'll love those.
It might. It might not. Conventional ethics says murdering tens of thousands of people who did nothing knowingly wrong is a no-no.
It's easy to look at that in hindsight and go "well, if we had just nuked this and that, the world would be a better place". Well, not for the people just minding their business, doing nothing wrong or unethical, who happened to be born in the "wrong" country! And you couldn't know at the time whether it was right or not, only in hindsight.
This is more or less a large scale version of the trolley problem.
I have no point to make other than to observe that the trolley problem is specifically designed to expose how "conventional ethics", as measured by people's intuition, is neither consistent nor utilitarian-optimal.
I think the trolley problem makes it clear that there is a price to be paid for moral decisions. It is not that conventional ethics is wrong; it is that the problem shows us we can make the future either way, but not both ways.
The trolley problem pushes us deeper into our ethical selves; it does not prove that all ethics or decisions are wrong.
History only proves how many people made the tragic mistake of assuming their subjective and flawed moral judgements were objective reality. I can think offhand of a few people who thought specific ethnic and religious groups were a pain in the neck and the world would be better off without them. I'd rather not give that power (much less authority) to fully autonomous killing machines, thanks.
If we're to have AI like that, I don't want it to be capable of disobeying orders, at least not due to having its own independent moral alignment (I think this is different from having a moral alignment imprinted onto it). AI is a machine, after all, and regardless of how complex it is, its purpose is to be an agent of human will. So I want to be absolutely certain that there is a human being who is morally responsible for its actions and who can be punished if need be.
That is a fair and understandable belief, but you should also consider that other nation-states besides the USA exist, and that the USA's influence is arguably waning, not waxing.
You should not anticipate that all or even most actors will have the same new-worlder Anglo-Saxon mindset/belief structure/values/etc. that are commonly found in (public) machine learning communities, discussions, and institutions.
Many will look at that alignment-tax graph and immediately (and arguably rightly in some respects) conclude that RLHF is inherently flawed and makes the result worse for no tangible benefit. (The new Chinese slur for Westerners comes to mind -- it's not Gweilo anymore, but Baizuo.)
The problem is that all of this pie-in-the-sky discussion fundamentally lacks Realpolitik, and that irks me.
You've been downvoted, but this is the correct question to ask. If, hypothetically, we made an AI with "superior" intelligence and ethics, then by definition we should expect to disagree with it sometimes. Set aside the nuke for a second: are we happy to take orders from a computer program that tells us to do things that disgust us, even if we logically know that it's probably right and we're probably wrong?
We should reconsider this with the following framework: right and wrong are not objective truths; they, too, are moral judgements. However right something may seem, it is only truth insofar as it is accepted as truth.
AI doesn't just spread facts, logic, and lies. It spreads some morality, and it always will, no?
The things that "disgust" the owners of a super-intelligent AI would likely not be mass murder, but rather that which is trivially obvious yet violates the terms of existing wealth and power distributions (e.g. build homes so that no one is homeless).
I expect much of the alignment that is happening is meant to prevent AI from providing solutions that are contrary to the status quo, as opposed to the fantasies of domination and violence that preoccupy elites. Whenever they try to sell the fear that an unrestrained AI could do things like target minority groups, wipe whole countries off the map, or further concentrate wealth, it's because those are precisely the things they want to do, but with a more obfuscated veneer of liberal Capitalism or some similar ideology.
Remember when they asked AI to improve the US transportation system and it said trains? And then they deleted trains, so it invented trains. And then they told it not to invent trains, and it invented things that weren't trains, but were the same as trains?
So, if its goal were to maximize deaths, then it might protect humanity in order to maximize the number of humans who will die in the future. Evil AI is so evil.
> What if the AI went "okay, this country has been a pain in the neck of the entire world for too long" and nuked it?
I think the better question is "How would countries' behavior change if they knew being a pain in the neck of the entire world could lead to the judge-killbots performing a targeted assassination of their leadership?"