Could someone please explain the differences and advantages of tiptop over a program like htop for analysing performance in everyday operations (if any)?
I think they're not really in the same class. The article states,
> software developers identify critical regions of their applications and evaluate design choices to select the best performing implementation.
Which is what I'd use metrics like cache misses and branch prediction misses for: really fine-tuning some section of the code that needs to execute lightning fast.
htop, on the other hand, gives a more high-level overview. Like, "who's eating all the RAM?". (Or, perhaps more often, "who do I need to kill?")
Once you've identified that a process has a lot of branch mispredictions (or other counter events), how do you identify where those mispredictions are coming from?
Use a sampling profiler with support for hardware counters (vtune, instruments, zoom, etc). Configure it to take a stack trace every Nth* event on the counter of interest, then run your program under the profiler. Look at the trace with the tree "inverted" or "bottom up" to see exactly which functions are incurring the counter events. That alone isn't terribly useful, so also look at the functions themselves for a line-by-line or instruction-by-instruction breakdown of where the events occurred.
(*) What should N be? Depends on how frequently the counter is getting hit. N between 1000 and 1000000 is pretty typical. Choosing prime N is a good idea.
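In case it helps, here's roughly what that workflow looks like with plain Linux perf instead of the tools above (program name and the 100003 sample period are just illustrative):

    # sample a call stack every 100003rd branch-miss event (a prime N); ./myprogram is a placeholder
    perf record -e branch-misses -c 100003 -g ./myprogram
    # see which functions the samples land in
    perf report
    # per-instruction breakdown of where the events hit
    perf annotate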
You can use {call,cache}grind for that, but sometimes it's a bit impractical if the software is big.
An approach I sometimes use is to throw a generic profiler at the program, make the program do something that is not fast enough (and that would need to be optimized), look at the profile to identify the function(s) that are too slow, extract them from the big code base, get a good set of input data, and run that with {call,cache}grind.
Then you can use the awesome kcachegrind to look at the data (where you can look at different cache misses, branch mispredictions, etc.).
Of course, most of the time, simply running under the profiler shows a non-optimal algorithm or terrible allocation patterns, so you don't have to do all that, but I've found this approach useful when writing inner loops for numeric computations (and of course, extracting the code is rather easy for this kind of stuff).
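For reference, the last step looks something like this (./harness stands in for the extracted code, and the actual output file name will differ):

    # simulate caches and branch prediction while running the extracted harness
    valgrind --tool=callgrind --cache-sim=yes --branch-sim=yes ./harness
    # browse misses/mispredictions per function and per line
    kcachegrind callgrind.out.<pid>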
That's cool. Right now I'm at the stage of "I know how to profile, I can identify algorithmic problems and change them for better ones, I can find allocation issues and use buffers/pools".
But lower-level things like "I know which branch mispredictions kill me" or "I know which access patterns result in page faults" are out of my reach. I mean, they were before asking the question and getting answers like yours.
It monitors hardware counters, so it is more similar to 'perf top' than 'htop'/'top', but it can also monitor multiple counters at once.
As a disadvantage, it can't show the function-level counters or assembly annotation that 'perf top' can.
So perhaps you could use 'tiptop' to get a general view of what might be slow, and then drill down using 'perf top'.
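Concretely, that two-step flow might look like this (the event name is just an example):

    # system-wide, per-process view of the hardware counters
    tiptop
    # then drill down to functions/assembly for one suspicious event
    perf top -e cache-misses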
Slightly off-topic (maybe), but can this or another readily available tool be used to track counters during startup/shutdown and provide a log of it which can then be imported into some viewer? (Think something like Windows Performance Monitor/Event Tracing and the like.)
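One possible sketch (not tiptop itself): perf stat has an interval mode that can dump counters periodically to a separator-delimited file you could import into a spreadsheet or plotting tool; the event list, interval, and program name below are just examples.

    # log the counters every 1000 ms in comma-separated form to counters.csv
    perf stat -I 1000 -x, -o counters.csv \
        -e cycles,instructions,cache-misses,branch-misses -- ./myprogram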