Computers are constrained in bus bandwidth as well, and it’s harder to add more of that. RC tends to require strongly consistent (atomic) counter updates that travel over the bus, even for objects whose exact lifetimes you might not care much about. The costs of GC are amortized over more work, though the pauses can hurt.
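To make the counter-bump point concrete, here is a toy Python model. `RcBox` and `borrow_many` are hypothetical names made up for illustration, not any real runtime's API. The lock stands in for the atomic read-modify-write that something like Swift's retain/release or Rust's `Arc::clone` performs. Every short-lived borrow costs two writes to a shared counter, even though the object's overall lifetime never changes.

```python
import threading


class RcBox:
    """Hypothetical reference-counted box (illustration only).

    The lock models the atomic read-modify-write a real RC runtime issues
    on every retain/release; `writes` counts how often the shared counter
    is written.
    """

    def __init__(self, value):
        self.value = value
        self.count = 1      # the creator holds the first reference
        self.writes = 1     # initializing the counter is one write
        self._lock = threading.Lock()

    def retain(self):
        with self._lock:    # stands in for an atomic increment
            self.count += 1
            self.writes += 1
        return self

    def release(self):
        with self._lock:    # stands in for an atomic decrement
            self.count -= 1
            self.writes += 1
            return self.count == 0  # True would mean "free it now"


def borrow_many(box, n_threads, per_thread):
    """Each thread briefly borrows the box: one retain, one release."""
    def worker():
        for _ in range(per_thread):
            box.retain()
            box.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


box = RcBox("payload")
borrow_many(box, n_threads=4, per_thread=1000)
print(box.count)   # still 1: the lifetime never changed
print(box.writes)  # 1 + 2 * 4 * 1000 shared-counter writes
```

A tracing collector would do no per-borrow work in this scenario; it pays instead when it periodically scans the heap, which is the amortization mentioned above.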
I will add that optimizing for memory usage is hugely important to system performance on Apple’s devices.
I do wonder about the idea that it isn’t universally important; all computers are RAM/cache constrained except maybe HPC equipment.