The example given by OP actually shows a taboo from the intersection of both sets.

Yes, English text in pretraining will necessarily have a similar distribution. But when it comes to alignment, the distributions will differ, since that data is typically not shared. The meta-point is that it is not realistic to expect completely uncensored models, in the East or in the West. The best you can do is apply critical thinking when consulting both.

