
In the context of AI, the claim is that the more safety measures are put into a model, the worse it performs.

"Safety" here meaning, say, a user asking for instructions on how to make a dirty bomb and the model responding with "Sorry, I can't do that ethically".




AI ethics (like making current AI refuse to do some things) and dealing with existential risks to humanity from future AI are quite different, so we should probably not put them into the same category when talking about this.


Exactly: one is about shielding humans from their own stupid intents, whilst the other is about shielding ourselves from AI's homicidal/genocidal intents (even if only as a second-order effect).


This is a common misconception about the meaning of model performance. AI safety effectively means adjusting the objective function to penalize some undesirable outcomes. Since the objective function is no longer absolute task performance, model performance doesn't go down - it is simply being evaluated differently. The user may be unhappy - they can't build their dirty bomb - but the model creator isn't using user happiness as the only consideration. They are trying to maximise user happiness without straying outside whatever safety bounds they have set up.

In that sense it is mathematically equivalent to (say) applying an L2 regularization penalty to shrink the coefficients of higher-order terms when fitting a polynomial. Strictly speaking it will produce a worse fit on your training data, but it is done because out-of-sample performance is what matters.
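
To make that analogy concrete, here's a minimal numpy sketch (my own toy example, not anything from a real training pipeline): an ordinary least-squares polynomial fit versus a ridge fit whose objective adds the L2 penalty. The ridge fit is strictly worse on the training data, by construction, the same way a safety-penalized model looks worse on raw task metrics.

    # Fit a degree-9 polynomial to noisy samples of sin(x), once by
    # plain least squares and once with an L2 (ridge) penalty.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 15)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

    degree = 9
    X = np.vander(x, degree + 1)  # design matrix of polynomial terms

    # Plain least squares: minimizes ||Xw - y||^2 only.
    w_plain = np.linalg.lstsq(X, y, rcond=None)[0]

    # Ridge: minimizes ||Xw - y||^2 + lam * ||w||^2; closed form is
    # w = (X^T X + lam * I)^-1 X^T y.
    lam = 1e-3
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

    def train_mse(w):
        return np.mean((X @ w - y) ** 2)

    print(f"train MSE, plain: {train_mse(w_plain):.5f}")  # lower
    print(f"train MSE, ridge: {train_mse(w_ridge):.5f}")  # higher, by design

The ridge fit "underperforms" only if you score it on the un-penalized training objective; on its own objective, which is the one the model creator actually cares about, it is optimal.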


Is it just safety? We also need to align them to be useful, so I'm not sure safety and usability are mutually exclusive. A safe model seems like a useful model; a model that gives you dangerous information seems, well, dangerous and less useful?


Does Google or other search engines block sites that have instructions on how to make a dirty bomb?


I googled that exact phrase (and put the kettle on for the visit I'll soon get). The first page was all government-related resources on how to deal with terrorist attacks.

If not outright blocked, such instructions do seem to be ranked down.


That's not what people mean by AI safety - they're referring to the dangers of uncontrollable or runaway AI. Particularly AGI.


It's a motte-and-bailey situation [1]. In theory, AI safety is about X-risks. In practice, it's about making AI compliant, non-racist, non-aggressive, etc.

[1] https://en.wikipedia.org/wiki/Motte-and-bailey_fallacy


Scott Alexander has a nice post about the intersection of these two types of AI safety: https://www.astralcodexten.com/p/perhaps-it-is-a-bad-thing-t...





