perf c2c is extremely useful, especially when used with call graphs. Seeing how each call stack interacts with a cache line makes it a lot easier to find fixes for contended cache lines.
It tracks a bunch of cache-related counters. One of them, HITM ("hit modified"), counts loads of data that was modified either by another core of the same node ("local hitm") or by another node entirely ("remote hitm").
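To make the local/remote distinction concrete, here is a toy model of it, not how perf or the hardware actually implements it. The `Core` type and the `classify_load` function are made-up names for illustration; the only thing taken from the description above is the rule "modified by another core of the same node = local hitm, modified by another node = remote hitm".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Core:
    """Hypothetical identifier for a core: which node (socket) it sits on, and its id."""
    node: int
    cpu: int


def classify_load(loader: Core, last_writer: Optional[Core]) -> str:
    """Toy classification of a single load from one cache line.

    last_writer is the core that currently holds the line in modified state,
    or None if nobody has modified it. This is a simplification for
    illustration only.
    """
    if last_writer is None or last_writer == loader:
        # Nobody else modified the line, so the load is not a HITM.
        return "no hitm"
    if last_writer.node == loader.node:
        # Modified by another core on the same node.
        return "local hitm"
    # Modified by a core on a different node.
    return "remote hitm"
```

For example, `classify_load(Core(0, 0), Core(1, 5))` returns `"remote hitm"`, because the last writer sits on a different node than the loading core.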
Here, the contention would have shown up as extreme amounts of both, as each core would almost certainly be trying to atomically update a value that had just been updated by another core, with better-than-even odds that the previous update came from a core on the other socket (= node).
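The "better than even" part is just counting, under a simplifying assumption that is mine and not measured: every socket has the same number of cores, and the previous writer is equally likely to be any of the other cores. Real hardware need not behave this uniformly. The function name `remote_fraction` and the 2 x 16 core layout in the usage line are illustrative only.

```python
from fractions import Fraction


def remote_fraction(nodes: int, cores_per_node: int) -> Fraction:
    """Fraction of 'other' cores that sit on a different node than the loader.

    Assumes symmetric nodes and a uniformly random previous writer
    (an assumption for this sketch, not a measurement).
    """
    other_cores = nodes * cores_per_node - 1      # everyone except the loader
    if nodes < 1 or cores_per_node < 1 or other_cores < 1:
        raise ValueError("need at least two cores in total")
    remote_cores = (nodes - 1) * cores_per_node   # cores on the other node(s)
    return Fraction(remote_cores, other_cores)


print(remote_fraction(2, 16))  # 16/31
```

With two sockets the loader has N - 1 neighbours on its own socket but N cores on the other one, so the remote share is N / (2N - 1), which is always a bit above one half.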