On my M1 Mac, "dd ... | cksum" takes 3 seconds while "dd ... | shasum" (SHA-1) takes 2 seconds. So cksum might not be the best tool for performance checking.
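For anyone wanting to reproduce that comparison, something like the following works (a sketch: the block size and count are placeholders, /dev/zero is not representative data, and shasum is the macOS spelling - GNU coreutils ships sha1sum):

```shell
# Time checksumming ~1 GiB of zeroes through each tool.
# bs/count are illustrative; absolute times vary by machine.
time dd if=/dev/zero bs=1048576 count=1024 2>/dev/null | cksum
time dd if=/dev/zero bs=1048576 count=1024 2>/dev/null | sha1sum
```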
The relevant checksum code in the PG source is in src/include/storage/checksum_impl.h. It is written as a plain nested loop in C, so performance depends entirely on whether the compiler can parallelize or vectorize it. I would not be surprised if manually written SIMD code were faster.
The bottleneck isn't the checksum computation itself at all. It's that, to keep checksums valid, we need to protect against the potential of torn pages even in cases where it wouldn't matter without checksums (i.e. where just individual bits are flipped). That in turn means we need to WAL-log changes we wouldn't need to without checksums - which can be painful.
That's measuring 'cksum', which must have an awfully slow implementation. The document notes that this is distinct from measuring PG's checksum performance. (I think it's a pretty useless measurement.)
Earlier (page 4):
> How much CPU time does it take to checksum...
> ...a specific amount of data? This is easy to estimate because PostgreSQL uses the crc32 algorithm which is very simple, and (GNU) Linux has a command line program that does the same thing: cksum.
Yeah, using cksum as an estimate here appears to be very flawed. For one thing, PG's data page checksum isn't CRC32 at all - checksum_impl.h implements an FNV-1a-derived algorithm chosen specifically so it can be vectorized.