And the highest data rates to date, with the signal integrity requirements that accompany them. Got a piece of dust, pushed down by your CPU cooler, straddling two DIMM pins? Get ready for your machine to shred your data. And that's just a simple, common scenario. I'd be surprised if real-world error rates in nominal scenarios weren't higher than with DDR4.
And again, still likely to be right. 10 years ago consumer electronics marketing never included signal integrity stuff like eye diagrams, but now pretty much every NVIDIA announcement with a new memory standard does. We're pushing ever closer to channel bandwidth limits, and corners that could be cut in the past can no longer be cut. ECC is more important than ever.
That's not the type of ECC the parent was talking about. DDR5 has on-die ECC because its densities and clock rates are so high that it needs ECC to function properly, but like most standards, the minimal implementation is quite watered down. It doesn't correct the full range of bit flips that a server with ECC RAM does.
Disagree. Parent was discussing the need to reboot after a system has been on for many hours. The failure mode, assuming it's related to the DRAM, would be an accumulation of bit flips in the DRAM. Every memory has some FIT/Megabit rate. The on-die ECC added in the DDR5 spec will be highly effective in addressing this failure mode.
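To put rough numbers on the accumulation argument: FIT counts failures per billion device-hours, so the expected number of flips scales linearly with capacity and uptime. Here is a minimal Python sketch of that arithmetic. The helper names are hypothetical, any FIT/Mbit value you feed it is a placeholder rather than a measured figure for a real DRAM part, and it treats a megabit as 2^20 bits for simplicity.

```python
import math

HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def expected_bit_flips(fit_per_mbit, capacity_gib, uptime_hours):
    """Expected number of bit flips for a given capacity and uptime.

    fit_per_mbit is an assumed input, not a datasheet value.
    """
    megabits = capacity_gib * 1024 * 8  # 1 GiB = 1024 MiB = 8192 Mibit
    return fit_per_mbit * megabits * uptime_hours / HOURS_PER_FIT_UNIT


def prob_at_least_one_flip(fit_per_mbit, capacity_gib, uptime_hours):
    """Poisson approximation: P(at least one flip) = 1 - exp(-expected)."""
    expected = expected_bit_flips(fit_per_mbit, capacity_gib, uptime_hours)
    return 1.0 - math.exp(-expected)
```

Calling `prob_at_least_one_flip` with your own guess for the FIT rate, installed capacity, and hours of uptime shows how quickly "some flip somewhere" becomes likely on a long-running machine, which is the failure mode on-die ECC is aimed at.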
Channel ECC is the ECC type most directly relevant to high clock rates and signal integrity. I agree with you that channel ECC becomes a practical requirement to meet the interface transaction rates of DDR5. It is also true that channel ECC is not mandatory in DDR5 and is not implemented by mainstream CPU platforms (as was the case with previous DDR generations).
If the on-die ECC reduces the error rate but the lack of standard channel ECC increases the error rate, because of the much more demanding signals, then it's not clear at all that the overall rate of error will be lower.
In fact it could very well be higher depending on how the physical module is designed.
I imagine some portion of bit-flip-induced reboots are due to the actual DRAM chips, but some portion will also be due to everything else that can flip bits, both on the memory module itself and in the interconnect.
I haven't seen anything yet to say that DRAM chip bit flips will be in the majority.
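The trade-off described above can be sketched with a simple additive model: on-die ECC only removes errors that originate in the DRAM cells, while errors from the module and interconnect pass straight through. This is a toy Python illustration, not a model of any real platform. The function name is hypothetical and every number is an arbitrary placeholder, chosen only to show that the total can go up even when on-die ECC corrects most cell errors.

```python
def visible_error_rate(cell_rate, on_die_coverage, channel_rate):
    """Errors the host sees per unit time under a simple additive model.

    cell_rate:       raw bit-flip rate inside the DRAM arrays
    on_die_coverage: fraction of cell errors corrected by on-die ECC (0..1)
    channel_rate:    errors introduced on the module and interconnect,
                     which on-die ECC cannot see
    """
    if not 0.0 <= on_die_coverage <= 1.0:
        raise ValueError("on_die_coverage must be between 0 and 1")
    return cell_rate * (1.0 - on_die_coverage) + channel_rate


# Placeholder numbers in arbitrary units, not measurements.
# "older": no on-die ECC, relaxed signaling.
older = visible_error_rate(cell_rate=10.0, on_die_coverage=0.0, channel_rate=1.0)
# "newer": on-die ECC fixes most cell errors, but the channel is noisier.
newer = visible_error_rate(cell_rate=20.0, on_die_coverage=0.9, channel_rate=12.0)

print(f"older: {older:.2f}  newer: {newer:.2f}")
```

With these made-up inputs the "newer" total comes out higher despite 90% on-die coverage. Whether that happens in practice depends entirely on the real ratio of cell errors to channel errors, which is exactly the number nobody in this thread has.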
It's never been clear to me if the ECC is necessary for DDR5 to operate, or just a nice feature. Do you or any other readers happen to know the answer?