On my M1 Mac, "dd ... | cksum" takes 3 seconds while "dd ... | shasum" (SHA-1) takes 2 seconds. So cksum might not be the best tool for performance checking.
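For anyone wanting to reproduce that comparison, something like the following works (a sketch: the block size and count are placeholders, /dev/zero is not representative data, and shasum is the macOS spelling - GNU coreutils ships sha1sum):

```shell
# Time checksumming ~1 GiB of zeroes through each tool.
# bs/count are illustrative; absolute times vary by machine.
time dd if=/dev/zero bs=1048576 count=1024 2>/dev/null | cksum
time dd if=/dev/zero bs=1048576 count=1024 2>/dev/null | sha1sum
```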
The relevant checksum code in the PG source is in src/include/storage/checksum_impl.h. It is written as a plain nested loop in C, so performance depends entirely on whether the compiler can parallelize or vectorize it. I would not be surprised if manually written SIMD code were faster.
The bottleneck isn't the checksum computation itself at all. It's that, to keep checksums valid, we need to protect against the potential of torn pages even in cases where it wouldn't matter without checksums (i.e. where just individual bits are flipped). That in turn means we need to WAL-log changes we wouldn't need to without checksums - which can be painful.
That's measuring 'cksum', which must have an awfully slow implementation. The document notes that this is distinct from measuring PG's checksum performance. (I think it's a pretty useless measurement.)
Earlier (page 4):
> How much CPU time does it take to checksum...
> ...a specific amount of data? This is easy to estimate because PostgreSQL uses the crc32 algorithm which is very simple, and (GNU) Linux has a command line program that does the same thing: cksum.
Yeah, using cksum as an estimate here appears to be very flawed. For one thing, PG's data page checksum isn't CRC32 at all - checksum_impl.h implements an FNV-1a-derived algorithm chosen specifically so it can be vectorized.