It's wild to me that we gave up hardware error correction on memory at the same time we increased memory sizes about 1000x, shrinking each memory cell (and thus its reliability margin) by a roughly similar factor.
This is true, but even today the rate of bit flips per GB per hour is really low.
However, failures anywhere along the memory chip -> chip pin -> DIMM -> DIMM connector -> motherboard -> CPU socket -> CPU pin -> CPU path are pretty common. Sure, ECC helps with random bit flips, but it's also very useful for diagnosing that something is broken in the CPU <-> memory chip pipeline. It's very frustrating to debug a problem when the main symptom is a reboot.
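
To make the diagnostic point concrete: on a Linux box with ECC memory and an EDAC driver loaded, the kernel exposes per-memory-controller error counters under /sys/devices/system/edac/mc/. Here is a minimal sketch of reading them. The function name read_edac_counts is my own, and it assumes the standard ce_count (corrected) and ue_count (uncorrected) files are present; on a machine without ECC or without the driver it just returns nothing.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int, "ue": int}} for each EDAC memory controller.

    `root` is a parameter only so the function can be pointed at a test
    directory; the default is the usual Linux sysfs location.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        # No EDAC driver loaded (or no ECC support): nothing to report.
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

A steadily climbing corrected count on one controller is the kind of early warning you never get without ECC: it points at a bad DIMM, slot, or trace well before the only symptom is a reboot.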