w/ likwid-bench S0:5GB:8:1:2, 129136.28 MB/s. At S0:5GB:16:1:2, 184734.43 MB/s (...

menaerus · 2025-02-02T10:55:14 1738493714

Yes, you're correct that your CPU has 8 CCDs, but the bw with 8 threads is already too low. Those 8 cores should be able to get you to roughly half of the theoretical bw. For comparison, 8 Zen 5 cores can reach the ~230 GB/s mark.

Can you repeat the same likwid experiment but with 1, 2 and 4 threads? I'm wondering at what point it begins to deteriorate quickly.

It may also be worth repeating the 8-thread run but forcing likwid to pick every third physical core, so that you get a 1-thread-per-CCD setup.
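A minimal sketch of which core IDs "every third physical core" would select. It assumes 3 cores per CCD and that physical cores are numbered contiguously CCD by CCD; both are assumptions, and the helper name `one_core_per_ccd` is made up for illustration. Check the real layout with `likwid-topology` before pinning anything.

```python
def one_core_per_ccd(n_ccds, cores_per_ccd):
    """Return the first core ID of each CCD.

    Assumes cores are numbered contiguously per CCD (hypothetical layout;
    verify against the actual topology of the machine).
    """
    return [ccd * cores_per_ccd for ccd in range(n_ccds)]


# 8 CCDs, 3 cores each (assumed): pick one core from every CCD.
cores = one_core_per_ccd(n_ccds=8, cores_per_ccd=3)
core_list = ",".join(str(c) for c in cores)
print(core_list)
```

The resulting list is only the set of cores to pin to; how it is passed to the benchmark depends on the likwid version and how it enumerates SMT siblings.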

lhl · 2025-02-02T11:37:18 1738496238

1 thread: 33586.74 MB/s, 2 threads: 47371.93 MB/s, 4 threads: 65870.07 MB/s

With `likwid-bench -i 100 -t load -w M0:5GB:1 -w M1:5GB:1 -w M2:5GB:1 -w M3:5GB:1 -w M4:5GB:1 -w M5:5GB:1 -w M6:5GB:1 -w M7:5GB:1` we get 187976.60 MB/s

Obviously there's a bottleneck somewhere. At 33.5 GB/s per channel, the total would get close to 400 GB/s, which is what you'd expect, but in reality it doesn't reach half of that. Bad MC? Bottleneck w/ the MB? Hard to tell, and I'm not sure there's much more that can be done to diagnose things without swapping hardware.
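A rough sketch of the scaling arithmetic, using only the figures posted in this thread. It assumes the 1, 2 and 4 thread runs are comparable to the 8 and 16 thread `S0` runs (the exact workgroups for the smaller runs aren't stated), and it takes the ~400 GB/s mentioned above as the expected total.

```python
# Figures quoted in this thread, in MB/s (likwid-bench, load kernel).
# Keys are thread counts; comparability of the runs is an assumption.
measured = {1: 33586.74, 2: 47371.93, 4: 65870.07, 8: 129136.28, 16: 184734.43}

base = measured[1]
# Speedup relative to a single thread, and efficiency vs. ideal linear scaling.
speedup = {n: bw / base for n, bw in measured.items()}
efficiency = {n: speedup[n] / n for n in measured}

for n in sorted(measured):
    print(f"{n:2d} threads: {measured[n] / 1000:6.1f} GB/s, "
          f"speedup {speedup[n]:.2f}x, efficiency {efficiency[n]:.0%}")

# One thread per memory domain (M0..M7) vs. the ~400 GB/s expectation.
expected_mb_s = 400_000.0   # approximate figure from the post, not a spec value
one_per_domain = 187976.60
fraction_of_expected = one_per_domain / expected_mb_s
print(f"M0..M7 run reaches {fraction_of_expected:.0%} of the expected total")
```

On these numbers, scaling is already well below linear at 2 threads, and the one-thread-per-domain run stays under half of the expected total.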

menaerus · 2025-02-02T12:25:26 1738499126

Mixed results. I suspect you might have an ES (engineering sample) of your CPU.

lhl · 2025-02-02T13:03:14 1738501394

Besides not having ES markings, it has a retail serial and stepping in dmidecode, so that's unlikely.

menaerus · 2025-02-02T13:38:21 1738503501

I see. I'm out of ideas other than playing with the BIOS settings for memory and CPU. I can see that there are plenty of them, for better or worse.

At a quick glance, some of them look interesting, such as "Workload tuning", where you can pick different profiles. There is a "memory throughput intensive" profile. You can also try to explicitly disable the DIMM slots that are not in use, given that you use only half of them. I wouldn't hold my breath that any of these will make a big difference, but you can give it a try.

menaerus · 2025-02-03T10:05:15 1738577115

Another idea: AFAICS there have been a few Zen-related memory-bw bugs reported to likwid. In particular, https://github.com/RRZE-HPC/likwid/issues/535 suggests that you could be hitting a similar bug, but with another CPU series.

The bug report used AMDuProf to confirm that the actual bandwidth is ~2x what likwid reported. You could try the same.