Pin - A Dynamic Binary Instrumentation Tool (pintool.org)
59 points by nkurz on April 14, 2013 | 32 comments



Here is an actual hacker tool that allows all sorts of insane close-to-the-CPU performance work. No comments all day. This is heartbreaking.


First, it's Sunday night.

Second, PIN gives you very low-level access, sure, but it also requires quite a bit of work to get that data. You can get a lot of the same data (as seen in the examples) with libpfm4 (http://perfmon2.sourceforge.net/). CPU performance counters can give you a lot of this data with much lower run-time overhead, and a lot less work.

Also, PIN's a little hairy. If you just want to generate code for run-time execution, LLVM's your best bet. If you want to diddle with a running executable, you can always use libelf(3) and ptrace(2) to read and diddle with the running process. It may be useful for specific sorts of analyses you want to run on an executable, but it's messy. If you're doing performance instrumentation, dynamically modifying the code is going to alter your results in ways that can be hard to compensate for.


You can't effectively do things like instrumenting every write instruction in a program using ptrace. Also, the techniques Pin uses sound hairy, but they're the same things software virtualization does.


True, but what do you do with that data? Transfer it out of process? Analyze it? And how many systems can withstand that sort of slowdown without timeouts?

Performance counters can tell you quite a bit, and they cost very little to set up. Snapshots and a little differential analysis can get you more comprehensible data without transfer/storage problems.


I think it's all about using the right tool for the right job. Sure enough, there are some things that dynamic binary modification (DBM) does that can also be implemented another way (e.g. generic application level profiling), but even some of those things are sometimes done faster/better by DBM tools: Want to log the system calls to RANDOM_SYSCALL? Sure, use ptrace and do these context switches: app>kernel>ptrace_app>kernel(actual syscall)>ptrace_app>kernel>app every time any system call executes... Or use PIN and take the usually small performance hit. ptrace also messes up signal delivery.

Want to profile dynamically generated code? libelf won't be much help there. Or do you want to run memory accesses through a cache simulator? ptrace won't be much help. There are plenty of uses when DBM is the right (or only) tool.


You can get a lot of the same data (as seen in the examples) with libpfm4 (http://perfmon2.sourceforge.net/). CPU Performance counters can give you a lot of this data with much lower run-time overhead, and a lot less work.

What's the preferred way of using libpfm4? I've ended up using its sample program as a way to convert from readable counter names to hex to put into a perf command line. I've found several defunct patches to give perf this functionality directly, and am confused why this seemingly essential functionality is left out.

What I'd like is the ability to measure just sections of code, and access to all available counters without needing to copy-and-paste hex. I'm getting the sense that perf is not the tool for this:

http://lwn.net/Articles/441209/

http://www.mail-archive.com/linux-perf-users@vger.kernel.org...

Is Likwid a viable option? https://code.google.com/p/likwid/


I wrapped the perf_event syscall with something easier to use, and just call it directly around the code I care about.


I came across it as a platform for running and debugging code for a not-yet-released CPU: http://software.intel.com/en-us/articles/intel-software-deve...

A couple of other uses I thought it would be good for were marking instructions with unaligned loads and stores, and flagging switches to and from 256-bit VEX code, though you can probably do these with other tools as well.

I posted this here because it seemed potentially useful, and I was surprised I'd never heard of this project. I'm mostly familiar with the ones Lally mentioned, but Thomas brought up DynamoRIO which is also new to me. Are there other niche optimization tools I should know about?

A specific question would be whether there is anything more useful than IACA and gut instinct for determining optimal instruction order. Even just something that would generate an easy-to-parse data-dependency graph for a short section of code?


So... what do you think about it? Or, do you have questions? I'm not very familiar with Pin, but am somewhat familiar with DynamoRIO, which is a competing project.


An interesting application of Pin for malware analysis / visualization is Danny Quist's Vera[1] and de-obfuscation framework[2]. It's also used in MIT's Architecture course to benchmark different architecture designs.

[1] http://www.offensivecomputing.net/?q=node/1687 [2] http://www.offensivecomputing.net/?q=node/492


Here's a paper about Valgrind that includes some details on how it differs from Pin: http://www.valgrind.org/docs/valgrind2007.pdf.


Discussed at greater depth in the paper at http://goo.gl/YDTwu , if anyone's interested.


Thanks! Direct link to the paper here: http://ursuletz.com/people/faculty/pdfs/p190-luk.pdf


Thank you for this.


This is definitely something I won't forget about the next time I'm trying to figure out what a binary is exactly doing. Also, I really need something like this for OSX right now.


I believe people are working on a port of DynamoRIO to OSX.


I am working on a DBT framework that has some user space support. I do my main development in OS X and Linux, and so I have done some testing of it on OS X.

The main focus of the DBT tool is Linux kernel modules, but let me know the kinds of stuff you need it for and I can a) figure out if my tool is applicable, and b) perhaps share the code.


/Applications/Xcode.app/Contents/Developer/usr/bin/instruments

I need to know everything about that binary. How it works, what ports it opens (unix domain & network sockets), what files it opens on the hard drive, what libraries it's linked to, how it decides what to do. Anything & Everything there is to know about it. ^_^


There are plenty of built-in performance/introspection tools in OS X that you can try first before resorting to a third-party solution:

1) What ports it opens:

- netstat shows you what ports a program has running

- DTrace shows you all syscalls a process makes (among other things); dtruss is a convenient wrapper script included in OS X that makes this easy (including opening sockets).

2) What files it opens

- Again, DTrace's syscall provider lets you introspect all syscalls, including open(). There's even a handy wrapper script included with OS X called opensnoop.

- Alternately, you can use the fs_usage command line tool to tap into the xnu kernel's trace mechanism. This shows all sorts of filesystem events, including what files are opened.

3) What libraries a binary is linked to

OS X binaries use the Mach-O format, not ELF like most other Unixes. So you have to use OS X's binary introspection tools to understand that format rather than the standard GNU binutils. What you're looking for here is otool, which lets you introspect Mach-O binaries. Specifically, "otool -L /Applications/Mail.app/Mail" for instance shows you which libraries Mail links to. Run this recursively to get the transitive closure of all dependencies a binary links against. Another way to do this is to run "vmmap -v <pid>" to show you the vm layout of a process, which includes the __TEXT/__DATA segments of all libraries the process links against.

And of course, gdb/lldb is included with the developer tools; you can just attach to whatever process you care about and set breakpoints, type "info sharedlib" to see what libraries are in the address space, etc. Also, for better or worse, Objective-C is an extremely dynamic language, so you can even do things like write a shared library with code you want to inject into a process (potentially monkey-patching existing methods using ObjC categories) and dlopen it from gdb to insert it into the target process's address space.


Nice, never heard of otool or vmmap. I'll definitely try 'em out. thanks.


So, this might be naive, but for static information like the shared libs it's using, you might want to check out otool, and then pick up a good disassembler (I'm a fan of Hopper, which is relatively cheap). For dynamic analysis you might want to check out the standard OS X tools like netstat, and instrumentation tools like valgrind or gdb. gdb + breakpoints on choice system calls works pretty well on non-obfuscated binaries!


I'll have to try that out. So far I've tried iosnoop, lsof and dtrace to get an idea of what the program is up to. I did get a bit of info from those tools.


I will look into this a bit tomorrow. Can you supply an example command-line invocation?


http://blog.manbolo.com/2012/04/08/ios-automated-tests-with-...

This blog post talks about using the tool for automated testing. It's kinda "complicated" to set up. In the post, he eventually gets to a command line:

instruments -t /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/Library/Instruments/PlugIns/AutomationInstrument.bundle/Contents/Resources/Automation.tracetemplate "/Users/jc/Library/Application Support/iPhone Simulator/5.1/Applications/C28DDC1B-810E-43BD-A0E7-C16A680D8E15/TestAutomation.app" -e UIASCRIPT /Users/jc/Documents/Dev/TestAutomation/TestAutomation/TestUI/Test-2.js

But there's a lot of setup, so =/


The user guide gives a good flavour of the kind of things you can do with this ( http://software.intel.com/sites/landingpage/pintool/docs/584... ).
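For flavour, the user guide's first example is a Pintool that counts every instruction executed. The sketch below is a from-memory rendition of that tool, not a copy of it; it needs the Pin kit to build (against pin.H) and runs under the pin launcher rather than standalone:

```cpp
#include "pin.H"
#include <iostream>

static UINT64 icount = 0;

// Analysis routine: executed before every instruction in the program.
static VOID docount() { icount++; }

// Instrumentation routine: Pin calls this once per instruction it
// translates, and we ask it to insert a call to docount() before each.
static VOID Instruction(INS ins, VOID* v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

static VOID Fini(INT32 code, VOID* v) {
    std::cerr << "Count " << icount << std::endl;
}

int main(int argc, char* argv[]) {
    if (PIN_Init(argc, argv)) return 1;
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // never returns
    return 0;
}
```

Invocation looks roughly like `pin -t inscount.so -- /bin/ls`: Pin injects itself into the target, translates its code on the fly, and your tool's callbacks decide what extra code to weave in.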


> Pin is proprietary software developed and supported by Intel and is supplied free of charge for non-commercial use.

Huh. What are the licensing conditions and price for commercial use then?


If this is a concern, feel free to use the open source, BSD-licensed main competitor: DynamoRIO: http://www.dynamorio.org/


Isn't valgrind a more popular competitor?


PIN/DynamoRIO and Valgrind have slightly different design aims. In short:

* Valgrind was designed to support rich analysis plugins (like Memcheck, which keeps a shadow copy of every bit of data) and performance was a secondary concern (on Valgrind, applications run on average about 4x slower, threads are serialized, etc).

* DynamoRIO and PIN are designed not to make much of an impact on performance (usually a few percent) and are more suitable for running in production, but it's somewhat more complicated to write plugins for them.

Both DynamoRIO[0] and Valgrind[1] maintain lists of publications which go into much more detail.

[0] http://www.dynamorio.org/pubs.html [1] http://valgrind.org/docs/pubs.html


How does this compare to dtrace?


They differ quite significantly. dtrace uses probes - points where instrumentation can be installed to inspect the process as it runs. Since it requires these probes to be defined, probes only come "for free" in kernel-space: e.g. tracing syscalls, pageins, that kind of thing. SystemTap offers similar functionality - userspace probes can be defined for it too.

Pin, on the other hand, dynamically rewrites the binary to inject instrumentation. This allows it to inject instrumentation code at a finer granularity (individual instructions). It's useful where the application might not define dtrace probes, or, for example on OSX, where the application has "opted out" of dtrace to keep itself from being inspected (a la iTunes).

Pin is more like a scripted debugger than an instrumentation tool.


Ah, very clear. Thank you.



