Well explained! All of the later contributions to the sum are effectively ignored, or at least severely truncated, in 32-bit because the "buckets" are big.
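As a concrete sketch of the bucket effect (NumPy here, with made-up numbers rather than anything from the original case):

```python
import numpy as np

# Near 2**24 the spacing between adjacent float32 values ("bucket size") is 2.0,
# so a contribution of 1.0 is rounded away completely.
acc = np.float32(2**24)              # 16777216.0
print(acc + np.float32(1.0) == acc)  # True: the 1.0 vanishes

# The same loss happens gradually in a naive float32 running sum: as the
# accumulator grows, each new term loses more and more of its low bits.
x = np.full(1_000_000, 0.1, dtype=np.float32)  # made-up data: a million 0.1's
total = np.float32(0.0)
for v in x:
    total += v
print(total)                    # drifts visibly away from the expected ~100000
print(x.sum(dtype=np.float64))  # 64-bit accumulator lands on ~100000 as expected
```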
It was precisely this problem. The individual had done all data preparation/normalization in 32-bit because the model training used 32-bit on the GPU. It's a very reasonable mistake if one hasn't been exposed to floating point woes. I was pleased to see that they ultimately caught it after noticing that two libraries disagreed about the mean.
Computing a 64-bit mean was enough. Compensated (i.e. Kahan) summation would have worked too.
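For anyone curious, here is a minimal sketch of both fixes on made-up float32 data (the names and numbers are illustrative, not from the case above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000).astype(np.float32)   # made-up float32 data

# Fix 1: keep the data in 32-bit but accumulate the sum in 64-bit.
mean64 = x.sum(dtype=np.float64) / x.size

# Fix 2: compensated (Kahan) summation with a float32 accumulator. A small
# correction term carries the low-order bits each addition would otherwise drop.
def kahan_mean(values):
    total = np.float32(0.0)
    comp = np.float32(0.0)        # running compensation for lost low bits
    for v in values:
        y = v - comp              # re-inject the bits lost on the previous step
        t = total + y
        comp = (t - total) - y    # what this addition just rounded away
        total = t
    return total / np.float32(len(values))

print(mean64)         # ~0.5, essentially the exact mean of the float32 data
print(kahan_mean(x))  # agrees to float32 precision despite the 32-bit accumulator
```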