Yes, English text in pretraining will necessarily have a similar distribution. But when it comes to alignment, the distributions will differ, since that data is typically not shared. The meta-point is that it is not realistic to expect completely uncensored models, neither in the East nor in the West. The best you can do is use critical thinking when consulting both.