w/ likwid-bench S0:5GB:8:1:2, 129136.28 MB/s. At S0:5GB:16:1:2, 184734.43 MB/s (this is about the max: S0:5GB:12:1:2 is 186228.62 MB/s and S0:5GB:48:1:2 is 183598.29 MB/s). According to lstopo my 9274F has 8 dies with 3 cores on each (currently each die is set to its own NUMA domain (L3 strat)). In any case, I also gave `numactl --interleave=all likwid-bench -t load -w S0:5GB:48:1:2 -i 100` a spin and topped out at about the same place: 184986.45 MB/s.
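To make the plateau easier to see, here is a minimal sketch that divides each figure quoted above by its thread count. The bandwidth numbers are copied from the post; the dictionary layout and the `per_thread_bandwidth` helper are hypothetical names used only for illustration.

```python
# Aggregate likwid-bench load bandwidth (MB/s) keyed by thread count,
# copied from the S0:5GB:<threads>:1:2 runs quoted above.
RESULTS_MBS = {
    8: 129136.28,
    12: 186228.62,
    16: 184734.43,
    48: 183598.29,
}


def per_thread_bandwidth(results):
    """Return {threads: MB/s per thread} (hypothetical helper, for illustration)."""
    return {threads: total / threads for threads, total in results.items()}


if __name__ == "__main__":
    per_thread = per_thread_bandwidth(RESULTS_MBS)
    for threads in sorted(RESULTS_MBS):
        total = RESULTS_MBS[threads]
        print(f"{threads:>2} threads: {total:>10.2f} MB/s total, "
              f"{per_thread[threads]:>9.2f} MB/s per thread")
```

Read this way, going from 8 to 12 threads still scales almost linearly per thread, while beyond 12 the total stays flat and the per-thread share just shrinks.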
Yes, you're correct that your CPU has 8 CCDs, but the bw with 8 threads is already too low. Those 8 cores should be able to get you to roughly half of the theoretical bw. For comparison, 8 Zen 5 cores can reach the ~230 GB/s mark.
Can you repeat the same likwid experiment but with 1, 2 and 4 threads? I'm wondering at what point it begins to deteriorate quickly.
Maybe also worth doing: repeat the 8-thread run but force likwid to pick every third physical core, so that you get one thread per CCD.
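The thread-count sweep asked for above could be scripted roughly like this. It is only a sketch: it reuses the `-t load`, `-w` and `-i 100` flags and the `S0:5GB:<threads>:1:2` workgroup form already used earlier in the thread, and the helper name `likwid_load_command` is made up for illustration. It prints the commands rather than running them, so nothing here depends on likwid being installed.

```python
import shlex


def likwid_load_command(threads, domain="S0", size="5GB", iterations=100):
    """Build the likwid-bench argv for one run (hypothetical helper).

    The workgroup string follows <domain>:<size>:<threads>:<chunk>:<stride>,
    with chunk 1 and stride 2 kept the same as in the earlier runs.
    """
    workgroup = f"{domain}:{size}:{threads}:1:2"
    return ["likwid-bench", "-t", "load", "-w", workgroup, "-i", str(iterations)]


if __name__ == "__main__":
    # 1, 2 and 4 threads as requested; 8 is included as the known reference point.
    for threads in (1, 2, 4, 8):
        print(shlex.join(likwid_load_command(threads)))
```

Saving it under any name (say `sweep.py`, a placeholder) and piping the output to `sh` runs the four benchmarks back to back.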
With `likwid-bench -i 100 -t load -w M0:5GB:1 -w M1:5GB:1 -w M2:5GB:1 -w M3:5GB:1 -w M4:5GB:1 -w M5:5GB:1 -w M6:5GB:1 -w M7:5GB:1` we get 187976.60 MB/s.
Obviously there's a bottleneck somewhere. At 33.5 GB/s per channel you'd expect close to 400 GB/s, but in reality it doesn't even reach half of that. Bad MC? Bottleneck w/ the MB? Hard to tell, and I'm not sure there's much more that can be done to diagnose things without swapping hardware.
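For reference, the back-of-the-envelope estimate above works out as follows. This is a sketch under two assumptions that are not stated in the thread: that 12 memory channels are populated (12 x 33.5 GB/s is what lands close to 400 GB/s), and that likwid-bench's MB/s figure uses 1 MB = 10^6 bytes. The function names are hypothetical.

```python
CHANNELS = 12              # ASSUMPTION: populated channel count, not stated in the thread
PER_CHANNEL_GBS = 33.5     # per-channel figure quoted in the post
MEASURED_MBS = 187976.60   # one-thread-per-NUMA-domain likwid-bench result from the post


def theoretical_peak_gbs(channels, per_channel_gbs):
    """Aggregate peak bandwidth in GB/s across all channels."""
    return channels * per_channel_gbs


def fraction_of_peak(measured_mbs, peak_gbs):
    """Measured bandwidth as a fraction of peak (ASSUMPTION: 1 GB = 1000 MB)."""
    return (measured_mbs / 1000.0) / peak_gbs


if __name__ == "__main__":
    peak = theoretical_peak_gbs(CHANNELS, PER_CHANNEL_GBS)
    frac = fraction_of_peak(MEASURED_MBS, peak)
    print(f"peak ~{peak:.0f} GB/s, measured {MEASURED_MBS / 1000.0:.1f} GB/s, {frac:.1%} of peak")
```

Under those assumptions the measured figure comes out at roughly 47% of a 402 GB/s peak, which matches the "doesn't reach half" observation.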
I see. I am out of other ideas besides trying to play with BIOS tweaks wrt memory and CPU. I can see that there are plenty of them, for better or worse.
At a quick glance, some of them look interesting, such as "Workload tuning", where you can pick different profiles; there is a "memory throughput intensive" profile. You can also try explicitly disabling the DIMM slots that are not in use, given that you use only half of them. I wouldn't hold my breath that any of these will make a big difference, but you can give it a try.
Another idea: AFAICS there have been a few Zen-related memory-bw bugs reported to likwid. In particular, https://github.com/RRZE-HPC/likwid/issues/535 suggests that you could be hitting a similar bug, just with another CPU series.
The bug report used AMDuProf to confirm that the actual bandwidth is ~2x what likwid reported. You could try the same.