Thanks, I came here to ask a similar question about native optimizations on th...

stephencanon · on July 9, 2018

For a generic 64-bit x86 platform, use RSQRTSS/RSQRTPS, since it's the only one guaranteed to exist. The others are specific to rather new hardware.

My recollection is that it's accurate to 11.5 bits, so after one refinement step you have nearly full precision (an error bound of a couple ULP). Check Intel's docs for more details.
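
To make the refinement step concrete, here is a rough sketch in C using the SSE intrinsics from `<xmmintrin.h>`. The helper names `rsqrt_estimate` and `rsqrt_refined` are made up for illustration, and the non-SSE fallback is only there so the snippet compiles elsewhere; treat it as a sketch under those assumptions, not a tuned implementation.

```c
#include <assert.h>
#include <math.h>
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_RSQRTSS 1
#endif

/* Hypothetical helper: approximate 1/sqrt(x).
   Uses RSQRTSS when SSE is available; otherwise falls back to the
   exact computation (assumption: fallback exists only for portability). */
static float rsqrt_estimate(float x) {
#ifdef HAVE_RSQRTSS
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / sqrtf(x);
#endif
}

/* Hypothetical helper: estimate plus one Newton-Raphson step,
   y' = y * (1.5 - 0.5 * x * y * y).
   Intended for positive, finite x only. */
static float rsqrt_refined(float x) {
    assert(x > 0.0f);
    float y = rsqrt_estimate(x);
    return y * (1.5f - 0.5f * x * y * y);
}
```

Calling `rsqrt_refined(4.0f)` should give a value very close to 0.5. Each Newton-Raphson step roughly doubles the number of correct bits, which is why a single step on top of the hardware estimate gets close to full single precision.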

piyush_soni · on July 10, 2018

Thanks!