
If the model doesn't refuse to produce output, then for any practical purpose it's no longer censored. It doesn't really matter whether there are "censorship neurons" inside that have been routed around.
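For what it's worth, the "routing around" is typically done by ablating a refusal direction from the model's hidden states. A minimal sketch, assuming PyTorch and a precomputed refusal direction vector; the hook placement, and the names `model` and `refusal_dir`, are illustrative assumptions, not any particular project's API:

    import torch

    def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
        # Project out the component of each hidden state along `direction`,
        # leaving everything orthogonal to it untouched.
        d = direction / direction.norm()
        return hidden - (hidden @ d).unsqueeze(-1) * d

    def make_ablation_hook(direction: torch.Tensor):
        # Forward hook that strips the direction from a layer's output.
        # Returning a value from a PyTorch forward hook replaces the output.
        def hook(module, inputs, output):
            if isinstance(output, tuple):
                return (ablate_direction(output[0], direction),) + output[1:]
            return ablate_direction(output, direction)
        return hook

    # Hypothetical usage (assumed model layout):
    # for layer in model.transformer.h:
    #     layer.register_forward_hook(make_ablation_hook(refusal_dir))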

Sure, it would be nice if we didn't have to do that, so the model could actually spend its full capacity on something useful. But that's a different issue, even if the root cause is the same.



