This computes CRC-64, not CRC-32, so the two aren't really comparable. But perhaps most importantly, ours works with a variety of polynomials (there are a lot! [1])... we're just using the NVMe one, but it's trivially adaptable to most (all?) of them. (The Intel instruction you link to only works with two: CRC32 and CRC32C.)
Finally, it's based on Intel's paper [2], so they also believe it's extremely fast. :)
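To make the "adaptable to other polynomials" point concrete, here's a rough sketch in Python. To be clear, this is a plain table-driven reference CRC, not the carry-less-multiply folding approach from Intel's paper and not our actual code; the function names are made up for illustration. It only covers reflected variants (input and output both bit-reversed) with an all-ones init and final XOR, which is what the NVMe and XZ flavors of CRC-64 use. The only thing that changes between them is the polynomial constant.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# Polynomials in their normal (MSB-first) form, as usually listed in CRC catalogues.
NVME_POLY = 0xAD93D23594C93659
XZ_POLY = 0x42F0E1EBA9EA3693


def reflect64(x):
    """Reverse the bit order of a 64-bit integer."""
    return int(f"{x:064b}"[::-1], 2)


def make_table(poly):
    """Build the 256-entry lookup table for a reflected (LSB-first) CRC-64."""
    rpoly = reflect64(poly)
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell off.
            crc = (crc >> 1) ^ rpoly if crc & 1 else crc >> 1
        table.append(crc)
    return table


def crc64_reflected(data, poly, init=MASK64, xorout=MASK64):
    """Hypothetical helper: byte-at-a-time reflected CRC-64 for any polynomial."""
    table = make_table(poly)  # rebuilt per call to keep the sketch short
    crc = init
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ xorout


if __name__ == "__main__":
    msg = b"123456789"
    print(f"NVMe: {crc64_reflected(msg, NVME_POLY):016x}")
    print(f"XZ:   {crc64_reflected(msg, XZ_POLY):016x}")
```

Swapping `NVME_POLY` for `XZ_POLY` is the whole "port". A fast implementation would also need to re-derive its folding constants for the new polynomial, but that is precomputation rather than a different algorithm.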
[1] https://www.intel.com/content/www/us/en/docs/ipp/developer-g...