In Smalltalk, you can start writing a debugger that lets you browse a stack trace in under 5 minutes. The debugger is just an ordinary application working on (mostly) ordinary objects that happen to be the meta-level of Smalltalk (in particular, the contexts). To complete the debugger, you just need to implement a Smalltalk VM without GC, which is not all that hard, as it's little more than a 256-case switch statement. Basically, your debugger is a VM that you control through an app and that runs the debugged process.
I fail to see how this is relevant. Smalltalk is a VM language, which makes things very different. For instance, I could say that for Python you can easily write a debugger using the C API of the VM, and in fact you don't need to write one, since a debugger is part of the standard library. What point does this make?
> Smalltalk is a VM language, which makes things very different
Yet somehow, you have stumbled right on one of the major points.
VM languages are often about eliminating impedance mismatch between ordinary coding and the metamagical stuff. Python is another good example, but I am less familiar with Python than Smalltalk. The same goes for Lisp and Ruby. I have written the "5-minute debugger" in Smalltalk as part of a presentation. (Really, it's just a stack browser.) I haven't done the same for the other languages.
In nearly every application domain, you can compare the Smalltalk way and the Other, and in nearly all of them the Smalltalk way is better, but that doesn't mean people will listen.
It's best to explicate your point using a 50-page tutorial on the Java Debugging API, Platform Introspection, and .NET Tracing Hooks, or whatever. Make it so hard that it's worth our time, then we will listen.
When a question like "How X works" is posed, it's best to pause for a minute and solve for the most general form of X, not specific instances.
Whenever you see C, C++, Unix or assembly in what should be a very "foundations" type article, you need to pause and ask yourself "why?". These platform-dependent details don't help us learn anything at all, and their presence is essay-smell.
For debuggers, no two are ever exactly alike. It's a generic term for a class of software that's really more of a continuum. At a minimum, they let us set "breakpoints" at specific locations, allow for manual intervention when such a location is executed, and permit us to inspect or edit the application state at that moment.
That's the most generic description of it, and every italicized word above is a semantic minefield, if we take into account the breadth of programming paradigms and their vast differences in machine implementation, program shape (is "code" a vector? a tree? a graph?), the subtleties of their execution models, distribution (where is a program "location"?), and temporal properties (when is the program running? time, what a precious concept that we take for granted!)
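For concreteness, here is that generic description written down as a bare interface in C. The names are my own invention, not anything from the article; a real debugger fills each slot in very differently depending on the machine, the paradigm and the execution model:

typedef struct Debugger Debugger;   /* whatever state a concrete debugger carries */

typedef struct {
    int  (*set_breakpoint)(Debugger *d, const void *location);
    void (*on_break)(Debugger *d, void *program_state);   /* manual intervention */
    int  (*inspect)(Debugger *d, const char *what, void *out);
    int  (*edit)(Debugger *d, const char *what, const void *in);
    void (*resume)(Debugger *d);
} DebuggerOps;

Every one of those five names hides exactly the minefield described above: what a "location" is, what counts as "state", and when "resume" even makes sense all depend on the paradigm.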
For the interruption problem, there are a few common approaches:
+ By instruction editing: for linearly executed programs where the "code" is a writable vector, it's common to insert specific debugging instructions (HLL code instrumentation falls under this as well).
+ By an interrupt table: for programs where the executing machine is virtual or itself programmable, it's sometimes easier to assign breakpoints to locations within the program and trigger an interrupt when such a location is reached. The machine maintains an "interrupt table", either a global one or one per "application" (i.e. process, thread, your granularity du jour, etc.); a toy sketch of this flavor follows below the list.
+ By rules: the breakpoint is triggered based on semantic meanings produced by the program as it executes, or on the shapes of its execution and dataflow graphs. This requires the machine to be programmable at almost the same abstraction level as the application, and perhaps tightly integrated with it. Most debuggers for logic programming languages operate at this level; they're more equation solvers than blunt shells for machine memory.
And many, many more. There are as many debugger designs as there are programming paradigms, execution models and language implementations. Free yourself from the short-sighted tyranny of antique designs (blech, Unix and x86!) and discover the wonderful world of formal systems and abstraction: languages, machines, type systems and semantics, all on a whim, as weird and wonderful as you want them to be.
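To make the interrupt-table flavor concrete, here is a toy sketch in C; the opcodes, the table and the trap hook are all invented for illustration, not taken from any real system. The little bytecode VM consults a breakpoint table before dispatching each instruction and hands control to a debugger callback when it hits one:

#include <stdio.h>
#include <stdbool.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

typedef struct {
    const int *code;
    int pc, sp;
    int stack[64];
    bool breakpoints[256];        /* the per-program "interrupt table" */
    void (*on_break)(int pc);     /* hands control to the debugger     */
} VM;

static void run(VM *vm)
{
    for (;;) {
        if (vm->breakpoints[vm->pc] && vm->on_break)
            vm->on_break(vm->pc);                    /* breakpoint hit */
        switch (vm->code[vm->pc++]) {
        case OP_PUSH:  vm->stack[vm->sp++] = vm->code[vm->pc++]; break;
        case OP_ADD:   vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; break;
        case OP_PRINT: printf("%d\n", vm->stack[vm->sp - 1]); break;
        case OP_HALT:  return;
        }
    }
}

static void pause_here(int pc) { printf("break at pc=%d\n", pc); }

int main(void)
{
    const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    VM vm = { .code = prog, .on_break = pause_here };
    vm.breakpoints[4] = true;     /* break just before the OP_ADD */
    run(&vm);
    return 0;
}

Nothing in the program text was rewritten; the breakpoint lives entirely in the machine's own table, which is what makes this approach so pleasant when you control the machine.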
It's all very well to understand something in the abstract, but frequently that doesn't easily translate to the specifics without a lot of running around mapping concepts. When your machine is abstract, or even better, a concrete implementation of something virtual, the kinds of manipulations you need for debugging are pretty trivial. But if that's all you know about debugging (only understanding it at that high a level), it will limit your practical usefulness when it comes to actually getting work done in many scenarios.
Device drivers generally don't run on virtual machines; neither do most C programs, most VM implementations, or most interop through foreign function interfaces in heterogeneous environments. Understanding how these things can be debugged on any particular architecture is applied knowledge, but it is knowledge nonetheless, and no less worthy of a blog post written for those who would learn more about the topic.
Consider a concrete case that people using Delphi for Windows take for granted: integer divide-by-zero diagnostics, communicated via exceptions on Win32. If your implementation is a virtual machine, creating a debugger to handle this kind of situation is ridiculously trivial; it's just a special case in however you implement division. On the x86 it's a little trickier, but Windows makes it easier for you. I don't remember all the details now (it may not even be integer divide, but I think it is): Windows will handle the CPU interrupt, look at the faulting code, disassemble it, figure out what kind of problem it was, and then dispatch a structured exception corresponding to integer divide by zero, rather than a more general arithmetic exception. It does this work for Win32, but not Win64, as I recall. Now, how structured exceptions are dispatched is an article in itself, and very different for Win32 vs Win64; then there are the details of how debugger events are propagated to the debugger on Windows, and so on. All of this is part of "how debuggers work" in the concrete.
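For the curious, here is roughly what the application-side view of that dispatch looks like, as a minimal sketch using MSVC's structured exception handling extensions; __try/__except and EXCEPTION_INT_DIVIDE_BY_ZERO are real Win32/MSVC pieces, the rest is just demo scaffolding:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    volatile int numerator = 1, denominator = 0;   /* volatile: keep the idiv */

    __try {
        printf("%d\n", numerator / denominator);   /* faults in the CPU...    */
    }
    __except (GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO
                  ? EXCEPTION_EXECUTE_HANDLER
                  : EXCEPTION_CONTINUE_SEARCH) {
        /* ...and arrives here as a structured exception, after the OS-level
           translation described above. */
        puts("caught EXCEPTION_INT_DIVIDE_BY_ZERO");
    }
    return 0;
}

An attached debugger sees a first-chance notification of the same event before this filter ever runs, which is where the debugger-event plumbing mentioned above comes in.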
But if all you have is an abstract understanding of debuggers, it won't help you much when you're thrown into the deep end of writing real-world debuggers. The devil is in the details, and you get paid for dealing with that devil, not for floating around with the angels of "clarity of thought". The fact is, you won't get much done in the large without also having that clarity of thought at the higher level, and in particular, you won't innovate much.
Given that the title of the article is "How debuggers work", I thought it might be more profitable to actually fulfill that premise and not get side-tracked by the quirks of ptrace. (For example, no one could implement a debugger for, say, Python scripts just by reading this article.)
A better title might have been "how to trace child processes under POSIX".
Solving the general case is not just an academic exercise; it's also a richly rewarding learning experience. When people learn to think abstractly, none of these implementation details really matter. You can always specialize and learn the details of a specific instance; it's best to get a good grasp of the big picture first.
[Edit: Instead of discussing "debuggers" in general, I hoped to narrow the focus to just interruption mechanisms. Third-party execution tracing and stepping are usually outside language semantics, so there is some "art" to gaining control from an executing program. It would be nice to catalog the lore, and I hope others add to the three approaches I have suggested above.]
The fundamental mechanism: setting a breakpoint at a particular instruction requires overwriting the instruction at that address with "int 0x3" (which is exactly 1 byte long).
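In ptrace terms on Linux/x86, that ends up looking roughly like the sketch below. Error handling is omitted, and I'm assuming the child is already being traced (fork, PTRACE_TRACEME, exec) and currently stopped, and that addr was resolved elsewhere:

#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Patch one byte of the stopped tracee's text with 0xCC, the one-byte int 3.
   Returns the original word so the caller can restore it when the trap fires. */
long set_breakpoint(pid_t child, uintptr_t addr)
{
    /* ptrace moves a machine word at a time, so read the whole word,
       replace its low byte (x86 is little-endian), and write it back. */
    long orig = ptrace(PTRACE_PEEKTEXT, child, (void *)addr, NULL);
    long patched = (orig & ~0xFFL) | 0xCC;
    ptrace(PTRACE_POKETEXT, child, (void *)addr, (void *)patched);
    return orig;
}

When the child executes the 0xCC it gets a SIGTRAP and stops; the tracer notices via waitpid, writes the original word back, rewinds the instruction pointer by one, and lets the child continue.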
Something nice to think about: why should the breakpoint interrupt instruction be exactly 1 byte long?
Actually, it doesn't have to be. There are three forms of the interrupt instruction: two one-byte versions, for int 3 and int 4 respectively, and a two-byte version that can specify any interrupt number. So it's possible to use either the one-byte or the two-byte encoding of int 3. Also, by adding arbitrary instruction prefixes (which are meaningless for the "int" instruction), it's possible to specify a breakpoint using anywhere from 1 to 15 bytes.
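For reference, the encodings being discussed (per the Intel manuals) are:

unsigned char int3_short[]  = { 0xCC };        /* int 3: the dedicated one-byte breakpoint */
unsigned char into_short[]  = { 0xCE };        /* into: one byte, raises interrupt 4       */
unsigned char int_generic[] = { 0xCD, 0x03 };  /* int imm8: two bytes, here encoding int 3 */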
However, there really is no reason to use more space than necessary. If you can encode an instruction in 1 byte, it doesn't make any sense to encode it using more bytes than you need. Every byte that gets updated when a breakpoint is set needs to be backed up and restored once the breakpoint is hit. Using extra bytes will hurt runtime performance.
This is particularly true given the "ptrace" interface on Linux. It only supports reading or writing a single word (2 bytes) of memory in the target process at a time. Backing up and restoring any more than 2 bytes would require extra calls into the kernel, plus extra memory barriers to ensure caches get updated. Using a scheme that did that more than once would just waste CPU cycles.
So now, say we want to set a breakpoint on the "push ax". If we do it using more than a single byte, it will also overwrite part of the following "int 21h" instruction, and the code will get corrupted and likely crash.
In x86 assembly (and a lot of other contexts), 16 bits are called a word (a leftover from the 16-bit days, when 16 bits were in fact a machine word), and larger data types are named dword (double word, 32 bits), qword (quad word, 64 bits), and so on. This terminology even shows up in certain instruction mnemonics (movsb|movsw|movsd, ...).
You're thinking too high-level. We're talking about a "word" in x86 instruction set parlance. It's an artifact of the old 16-bit computing days. Nothing to do with the OS.
I agree that we shouldn't deviate from the subject too much. My reply was intended to fix the generally incorrect comment regarding the size of the word read by ptrace.
In ptrace:
> The size of a "word" is determined by the OS variant (e.g., for 32-bit Linux it's 32 bits, etc.).
Not quite. Some processors, including X86, have hardware support for breakpoints (http://en.wikipedia.org/wiki/X86_debug_register). It may also be possible to (mis)use virtual memory hardware for setting breakpoints. Also [pedantic], one overwrites a byte, not an instruction.
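On Linux, a tracer can program those debug registers in a stopped tracee through PTRACE_POKEUSER. Very roughly, and treating the details (the sys/user.h offsets, the DR7 bit layout) as assumptions to verify rather than gospel:

#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Arm DR0 as an execution breakpoint at addr in the stopped tracee. */
int set_hw_breakpoint(pid_t pid, unsigned long addr)
{
    /* DR0 holds the linear address to watch. */
    if (ptrace(PTRACE_POKEUSER, pid,
               offsetof(struct user, u_debugreg[0]), (void *)addr) == -1)
        return -1;

    /* DR7: bit 0 locally enables DR0; leaving DR0's condition/length bits
       at zero means "break on instruction execution, length 1". */
    if (ptrace(PTRACE_POKEUSER, pid,
               offsetof(struct user, u_debugreg[7]), (void *)1UL) == -1)
        return -1;

    return 0;
}

The tracee then takes a debug trap (SIGTRAP) when it executes that address, with no code patched at all, which neatly sidesteps the whole question of how many bytes the breakpoint instruction occupies.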
Using a single byte is necessary because the program could jump to (address+1). If that address contained part of the breakpoint code, program semantics could change. Oh, and jumping to (address+1) could even be useful if address contains a multi-byte instruction.
mov eax,[x]     ; load x
or eax,eax      ; set the flags according to eax
jz foo          ; if x was zero, skip the dec
dec eax         ; one-byte instruction (the breakpoint target)
foo: call bar
The instruction "dec eax" is one byte in size. If you want to place a breakpoint there, and you used the two byte form of "int 3", then when the code did the "jz foo" (jump if the previous result was zero to location foo) the "call bar" instruction would be partially overwritten and form a new instruction. If the condition leading the breakpoint isn't taken, you now have some other instruction (it ends up being an "add" instruction) which is bad.
That's why there's a one-byte version of "int 3": because there are one-byte instructions.
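To make the byte-level picture explicit (32-bit encodings; the call displacement is made up for the sketch):

unsigned char before[] = {
    0x48,                           /* dec eax                               */
    0xE8, 0x10, 0x00, 0x00, 0x00    /* foo: call bar   (E8 = call rel32)     */
};
unsigned char after_two_byte_patch[] = {
    0xCD, 0x03,                     /* two-byte "int 3" clobbers dec eax and */
    0x10, 0x00, 0x00, 0x00          /* the E8 opcode byte of the call        */
};
/* A jump to foo now lands on 03 10 ..., which decodes as "add edx, [eax]"
   instead of the call; that's the accidental "add" described above. */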
I just thought you were referring to something else. Thanks for clearing this up; I hope you don't mind if I use your example in the next part of the series. :-)