Tools like Cachegrind are supposed to be used as complements to time-based profilers, not to replace them.
If you're seeing ±20% variation on every time-based benchmark, it might just be noisy neighbors, but it could also be some other problem that actually manifests for users too.
> used as complements to time-based profilers, not to replace them
Sure. I also use hyperfine to run a bigger test, the way a user would see the system, and I cross-reference that with the instruction counts. I collect these hardware metrics on a free CI runner and run hyperfine locally.
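Roughly, the cross-referencing looks like the sketch below. It's only an illustration: the function names and the 5% noise floor are made up, and it assumes you've already pulled the wall-clock samples (e.g. from hyperfine's JSON export) and the instruction totals (e.g. from Cachegrind's summary) into plain numbers.

```python
import statistics


def coefficient_of_variation(samples):
    """Run-to-run spread of timing samples: sample stddev divided by mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def cross_reference(base_times, new_times, base_instr, new_instr, noise_floor=0.05):
    """Compare a timing change against an instruction-count change.

    base_times / new_times: wall-clock samples in seconds (hypothetical inputs).
    base_instr / new_instr: total instruction counts for the same workload.
    noise_floor: arbitrary threshold above which timings are treated as noisy.
    """
    time_change = statistics.mean(new_times) / statistics.mean(base_times) - 1.0
    instr_change = new_instr / base_instr - 1.0
    worst_cv = max(
        coefficient_of_variation(base_times),
        coefficient_of_variation(new_times),
    )
    return {
        "time_change": time_change,
        "instr_change": instr_change,
        "timing_noisy": worst_cv > noise_floor,
    }
```

If the timings move but the instruction counts don't, and the timing spread is small, that points at something the instruction count can't see (cache, I/O, scheduling) rather than at noise, which is exactly why I keep both.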