There is a separate counter for every memory object, and any one counter is only ever going to be touched by a single core at a time.
And even if there were a single counter, multithreading could not make incrementing it faster. Two cores cannot write to the same cache line at the same time. Instead, the cache line has to bounce between cores on every write, and that is slow enough to stretch the time it takes to roll over from centuries to millennia.
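To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the counter is 64 bits wide, that an uncontended increment costs about 1 ns, and that a contended increment (cache line bouncing between cores) costs about 20 ns. All three figures are illustrative assumptions, not measurements, and `years_to_rollover` is just a helper name made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def years_to_rollover(bits: int, ns_per_increment: float) -> float:
    """Years needed to increment a counter of the given width from 0 until it wraps."""
    total_increments = 2 ** bits
    total_seconds = total_increments * ns_per_increment / 1e9
    return total_seconds / SECONDS_PER_YEAR

# Assumed costs per increment (illustrative only).
uncontended = years_to_rollover(64, 1.0)   # one core, cache line stays local
contended = years_to_rollover(64, 20.0)    # cache line bounces between cores

print(f"uncontended: about {uncontended:,.0f} years")
print(f"contended:   about {contended:,.0f} years")
```

Under those assumptions the uncontended case comes out in the hundreds of years and the contended case in the thousands, which is the centuries-to-millennia gap. Plug in your own per-increment cost if you have measured one.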