I think a few other things like memory models also matter and affect cpu architecture. E.g. the x86 total store order vs arm64's model. You potentially get to do a few nice optimizations on arm64 vs x86. I'm not sure how much of a difference that makes though.
Memory models are indeed important. That's why Apple extended ARM with total store order capability. A CPU can efficiently implement more than one memory model.