I expect Intel to still cripple some aspect of DDR5 ECC on consumer chips; maybe it will correct errors but the memory controller won't report them. Or maybe it's possible to disable ECC even though it's already implemented.
I also expect servers to use two levels of ECC to provide chipkill and also to keep server RAM more expensive than consumer.
Ryan Smith in the link above seems to suggest error correction will be done transparently anyway, so it seems like it won't be reported to the OS. So it doesn't look like Intel could cripple it even if they wanted to.
> Ryan Smith in the link above seems to suggest error correction will be done transparently anyway, and it won't be reported to the OS.
I once heard a rant from someone about how not reporting this to the OS is really bad for diagnosing issues, even for soft errors that are auto-healed. (It could have been from Bryan Cantrill, but I couldn't say for sure.)
I do think that there will still be some people interested in ECC detection and reporting, but that's not really why I'm interested in ECC memory. The dividing line "enterprise = error correction with monitoring / non-enterprise = silent error correction" is much more sensible than "non-enterprise = no error correction at all, good luck" IMO.
Isn't the primary reason people are looking into diagnostics that it's very hard to determine whether ECC is working in the first place, because it depends on the particular hardware setup? If the spec states that all DDR5 is supposed to have internal error correction anyway, then I'm happy to take for granted that error correction is working until I read about the scandals of non-spec cheap DDR5 :)
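For what it's worth, on Linux the usual way to check whether ECC is actually reporting anything is the kernel's EDAC sysfs tree. Here is a rough sketch that reads the corrected/uncorrected counters per memory controller. Assumptions: an EDAC driver for your platform is loaded and exposes `mc*/ce_count` and `mc*/ue_count` under `/sys/devices/system/edac/mc`; the function name `read_edac_counts` is just something I made up. As far as I understand, this only covers errors the memory controller sees (traditional side-band ECC), so corrections done inside the DDR5 chips themselves would not show up here.

```python
from pathlib import Path

# Standard location of the Linux EDAC memory-controller tree.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller_name: (corrected, uncorrected)} for each mc* entry.

    Returns an empty dict when the tree is missing, which usually means
    no EDAC driver is loaded or ECC reporting is unavailable.
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found; ECC reporting may be unavailable.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

An empty result does not prove ECC is off, only that nothing is being reported to the OS, which is exactly the ambiguity being complained about.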
Yes, I do not run things at a scale that would need that, but I would appreciate at least a toggle to have it available if needed: default=quiet(er) would be fine for most cases.