I think it's a little bit of that, and I also suspect they have far more pressing concerns than CRC32X being relatively slow (it still has a throughput of one per clock, which isn't at all bad). Branch prediction and prefetching seem to be the really important problems, at least for Apple, because of their very deep ROB [1]. A mispredicted branch that is resolved late (e.g. a branch that depends on an outstanding DRAM fetch) can lead to hundreds of speculatively executed instructions being discarded, wasting a lot of power and cycles. I don't quite remember the exact figure, but a rule of thumb I've heard in CPU architecture is that about one in every six instructions in general-purpose programs (i.e. not scientific or calculation-heavy code) is a control-flow instruction. Making CRC32X just a little bit faster may not have been worth it when they could spend that precious power budget elsewhere. It's really just design choices all the way down. They may very well be doing a lot of CRC32s, but they're almost certainly doing more of everything else.
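
To put rough numbers on the branch argument, here is a back-of-envelope sketch in Python. The 600-entry ROB is a round number I picked purely for illustration, not a measured figure, and the one-in-six ratio is the half-remembered rule of thumb from above, so treat the output as an order-of-magnitude estimate at best. The function names are made up for this example.

```python
def branches_in_flight(rob_entries: int, one_branch_per: int = 6) -> float:
    """Estimate how many control-flow instructions sit in a full ROB.

    Assumes branches are spread evenly, one per `one_branch_per`
    instructions (the rule of thumb quoted above, not a measurement).
    """
    return rob_entries / one_branch_per


def worst_case_discarded(rob_entries: int) -> int:
    """Upper bound on instructions thrown away by one misprediction.

    If the oldest instruction in a full ROB is the mispredicted branch,
    every younger entry behind it has to be squashed.
    """
    return rob_entries - 1


if __name__ == "__main__":
    ROB_ENTRIES = 600  # hypothetical round number, for illustration only
    print(f"branches in flight: about {branches_in_flight(ROB_ENTRIES):.0f}")
    print(f"worst-case squash:  {worst_case_discarded(ROB_ENTRIES)} instructions")
```

Under those assumptions a full window holds on the order of a hundred unresolved branches, and a single late-resolving misprediction near the head can throw away almost the whole window. That is the scale of problem a slightly faster CRC32X would be competing against for power budget.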

[1] https://www.anandtech.com/show/16226/apple-silicon-m1-a14-de...