The ability to show something more than aggregates is, sure, a UI/front-end issue. (And an issue with intermediate storage, obviously).
But it's misleading to say that tracing-vs-sampling isn't a factor, unless you believe that the most-performant sampling and the most-performant tracing are roughly comparable. TFA claims otherwise; the claim is that the maximum resolution of tracing is much better than the maximum resolution of sampling.
That's nonsense. If anything the maximum resolution of sampling is higher, since it has lower overhead, so it perturbs the measured thing less.
However, most sampling profilers only report CPU time spent in a task, not wall time, so the things he discusses, like time spent waiting for a task to complete, would be obscured. It would be perfectly possible to report the number of samples in which a thread was blocked in a method call (though it could not be said with certainty that it was the same invocation, it would be enough to see that the majority of blocked time elapsed there); it's just that they typically do not.
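The "report samples in which a thread was blocked" idea can be sketched in a few lines, since `sys._current_frames()` captures every thread's stack regardless of whether it is running or parked in a blocking call. This is a hypothetical sketch, not anything from the article; the worker and helper names are invented.

```python
import sys
import threading
import time
from collections import Counter

def sample_all_threads(duration=0.5, interval=0.01):
    """Periodically capture every thread's innermost frame, including
    threads blocked in C calls, approximating wall-time attribution
    rather than CPU-time attribution."""
    counts = Counter()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for tid, frame in sys._current_frames().items():
            if tid == threading.get_ident():
                continue  # skip the sampler thread itself
            # Key each sample by the innermost Python call site.
            counts["%s:%d:%s" % (frame.f_code.co_filename,
                                 frame.f_lineno,
                                 frame.f_code.co_name)] += 1
        time.sleep(interval)
    return counts

def blocked_worker():
    # Spends its time blocked; a CPU-time profiler would barely see it.
    time.sleep(1.0)

t = threading.Thread(target=blocked_worker)
t.start()
samples = sample_all_threads()
t.join()
```

The blocked call dominates `samples` even though it consumed essentially no CPU, which is exactly the blocked-time reporting the comment describes.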
> maximum resolution of sampling is higher, since it has lower overhead
The article claims otherwise.
This is apparently mostly related to the costs of preemption and resulting icache replacement.
> though it could not be said with certainty that it was the same invocation, it would be enough to see that the majority of blocked time elapsed there
But that's not quite what's being discussed. The main interesting thing here is tracking what oddness causes outlier worst-case times, which very much does require tracking individual invocations.
And I'm calling that aspect of the article nonsense. The cost of preemption is incurred only infrequently, so even if a sample slightly perturbs what runs after it, it does so only infrequently, and the point at which it measures (assuming it has not been affected by the prior sample) more accurately represents a system without any instrumentation.
Assuming each invocation has a unique stack trace, each call site can still be tracked effectively through sampling. Looking at his examples, this seems reasonably likely, as they all have quite different behaviours.
What tracing does do is permit a clear sequential analysis of an arbitrary granularity of macro behaviours.
If the macro behaviours are chosen at a sufficiently coarse granularity that the instrumentation is an immeasurable overhead, then you obviously get a very clear and accurate picture of system behaviour at that reduced resolution.
Tracing RPC calls and other similar behaviours, as done in the blog, is a good example, but the benefit isn't down to increased resolution; quite the opposite.
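A minimal sketch of that kind of coarse-grained tracing, where the per-span cost (two clock reads and a list append) is negligible against operations lasting milliseconds. The span names and `TRACE` list are invented for illustration:

```python
import time
from contextlib import contextmanager

TRACE = []  # (name, start, end) spans, appended in completion order

@contextmanager
def span(name):
    """Record one coarse-grained span of wall-clock time."""
    start = time.monotonic()
    try:
        yield
    finally:
        TRACE.append((name, start, time.monotonic()))

def fake_rpc():
    time.sleep(0.02)  # stand-in for a network round trip

with span("handle_request"):
    with span("rpc:get_user"):
        fake_rpc()
    with span("rpc:get_orders"):
        fake_rpc()
```

Because inner spans close first, the trace reads back as a clear sequential record of which macro behaviours ran, in what order, and for how long, without sampling resolution entering into it at all.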