Just because something is done in the name of "safety" doesn't make it an unimpeachable good. In fact, it's often quite the opposite. Ask anyone who's had to get on a plane at a US airport in the last 20 years.
Do you mean the seatbelts in the car I drove to the airport in? Or the elevator inspections on the elevator I took from the 4th floor of the parking garage? Maybe the lack of smoking in the airport, or on the airplane? The regular aircraft inspections? Or the regulations on pilot hours? Laws on alcohol consumption for them? How about the emergency exits? Or the oxygen masks? Are these examples that I came up with off the top of my head quite the opposite of an unimpeachable good?
No, they're a bad-faith reply, since surely you knew they were referring to the airport security checkpoint and some of the asinine luggage restrictions imposed there.
Oxygen masks definitely aren't an unimpeachable good. Passenger oxygen mask systems are virtually never needed but have caused multiple fires that destroyed planes and killed people. They are an expensive complication without which plane travel would be better - the plane would be cheaper to build, there'd be more room and weight available for passengers or luggage, and we'd all save time on safety briefings by not having to hear instructions for a thing you almost certainly will never need to do.
The pilots need oxygen masks as a backup while they quickly descend to a safer altitude. Unless you're flying over the Himalayas and can't drop that low, the passengers should be fine.
AI ethics (like making current AI refuse to do some things) and dealing with existential risks to humanity from future AI are quite different, so we should probably not put them in the same category when talking about this.
Exactly, one is about shielding humans from their own stupid intents, whilst the other is shielding ourselves from AI's homicidal/genocidal intents (even if it's a second-order effect).
This is a common misconception about what model performance means. AI safety effectively means adjusting the objective function to penalize some undesirable outcomes. Since the objective function is no longer absolute task performance, model performance doesn't go down - it is simply being evaluated differently. The user may be unhappy - they can't build their dirty bomb - but the model creator isn't using user happiness as the only consideration. They are trying to maximise user happiness without straying outside whatever safety bounds they have set up.
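A minimal sketch of that reframing, assuming the safety constraint can be folded into the score as a simple scalar penalty (the function name and the weight lam are made up for illustration):

    def adjusted_objective(task_score: float, safety_violation: float, lam: float = 5.0) -> float:
        # The model is no longer graded on raw task success alone: unsafe outputs
        # drag the score down even if the user would have been "happy" with them.
        # lam is a hypothetical weight; real systems use learned reward models,
        # not a single hand-set scalar.
        return task_score - lam * safety_violation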
In that sense it is mathematically equivalent to (say) applying an L2 regularization penalty to shrink the coefficients on higher-order terms when fitting a polynomial. Strictly speaking it will produce a worse fit on your training data, but it is done because out-of-sample performance is what matters.
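A toy numpy illustration of that analogy (the polynomial degree, noise level, and lambda are arbitrary choices for the sketch, not anything claimed in the thread):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 20)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

    X = np.vander(x, 10, increasing=True)   # columns: 1, x, x^2, ..., x^9

    # Ordinary least squares: minimizes training error only.
    w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    # Ridge: minimizes training error + lam * ||w||^2, trading a bit of
    # training fit for smaller, better-behaved coefficients.
    lam = 1e-2
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    def train_mse(w):
        return np.mean((X @ w - y) ** 2)

    print(f"OLS   train MSE: {train_mse(w_ols):.5f}")    # lower on the training data
    print(f"Ridge train MSE: {train_mse(w_ridge):.5f}")  # slightly higher, by design

The ridge fit scores worse on the training data, which is the point: the penalized objective only looks "worse" if you insist on judging it by the unpenalized one.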
Is it just safety? We also need to align them to be useful. So I'm not sure safety and usability are mutually exclusive? A safe model seems like a useful model; a model that gives you dangerous information seems, well, dangerous and less useful?
I googled that exact phrase (and put the kettle on for the visit I'll soon get). The first page was all government-related resources on how to deal with terrorist attacks.
If not outright blocked, such instructions do seem to be down-weighted.
It's a motte-and-bailey situation [1]. In theory, AI safety is about X-risks. In practice, it's about making AI compliant, non-racist, non-aggressive, etc.
The standard e/acc trope is that existential risk is obviously sci-fi nonsense, and so anything that slows down progress is just doing harm. (Usually no attempt is made to engage with the actual arguments for potential risk.)
Given that the absence of existential risk is taken as self-evident, there is then an objection to “dumbing down”, where model performance suffers due to RLHF (the “alignment tax” is a real thing). Often, but not always, this includes an objection to wokeness, a perceived unwillingness to speak “truths”, or left-wing bias being imposed on models.