ECC errors seem like a trivial thing to log and keep track of. Surely any Fortune 500 company could do it and would have enough scale to get meaningful data out of it?
It's not just tracking ECC errors, which, as you point out, is not hard. The hard part is correlating them with the other metrics needed to determine the cause, and having enough scale to reliably root-cause bitflips to software (workloads that inadvertently rowhammer), to hardware, or even to malicious users (GCP customers who may intentionally run a rowhammer attack).
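To make the correlation point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the event format, the host and job names, the `classify` helper, and the `min_spread` threshold are all hypothetical, and a real pipeline would need far more signals than this. The idea is only that errors which stay on one host across many workloads point at hardware, while errors that follow one workload across many hosts point at the software.

```python
from collections import defaultdict

# Hypothetical ECC error events: (host, workload running when the error was logged).
events = [
    ("host-a", "job-1"), ("host-a", "job-2"), ("host-a", "job-3"),
    ("host-b", "job-9"), ("host-c", "job-9"), ("host-d", "job-9"),
]

def classify(events, min_spread=3):
    """Flag hosts and workloads whose errors spread across many counterparts.

    min_spread is an arbitrary illustrative threshold, not a recommended value.
    """
    workloads_by_host = defaultdict(set)
    hosts_by_workload = defaultdict(set)
    for host, workload in events:
        workloads_by_host[host].add(workload)
        hosts_by_workload[workload].add(host)

    suspects = {}
    # Errors on one host no matter what runs there: the host looks bad.
    for host, workloads in workloads_by_host.items():
        if len(workloads) >= min_spread:
            suspects[host] = "suspect hardware"
    # Errors that follow one workload from host to host: the workload looks bad.
    for workload, hosts in hosts_by_workload.items():
        if len(hosts) >= min_spread:
            suspects[workload] = "suspect workload"
    return suspects

result = classify(events)
print(result)
```

With only a handful of machines, neither set ever gets wide enough to cross a threshold like this, which is where the scale argument comes in.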