I've always wanted to hear the answer from someone who actually implemented a VM: Why do people prefer stack-based VMs over register-based ones? Stack-based VMs are obviously the simpler design, but register-based designs are closer to the actual machine, and I've read somewhere that register-based VMs generally perform better.
In addition to the other answers, "closer to the machine" isn't necessarily true either.
The obvious implementation of a register file for an interpreter is to use an array of registers, something like:
uintptr_t registers[N];
You can then access the i-th register as registers[i], like the machine does. Except that these registers live in memory, not in CPU registers, so you aren't really that close after all! And the compiler cannot, in general, map these array entries to CPU registers because they are accessed indirectly, by index. Also, the number of virtual registers typically differs from the number of registers the actual CPU has.
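To make that concrete, here is a minimal sketch of the kind of dispatch loop I mean (the opcode names and the one-byte operand encoding are invented for illustration, not taken from any particular VM):

    #include <stdint.h>

    /* Toy register-based dispatch loop.  Every operand access goes through
       the registers array in memory; the compiler can't keep registers[a]
       in a CPU register across instructions because the index is only
       known at run time. */
    enum { OP_ADD, OP_HALT };

    uintptr_t run(const uint8_t *code, uintptr_t *registers) {
        for (;;) {
            switch (*code++) {
            case OP_ADD: {
                uint8_t dst = *code++;
                uint8_t a   = *code++;
                uint8_t b   = *code++;
                registers[dst] = registers[a] + registers[b];
                break;
            }
            case OP_HALT:
                return registers[0];
            }
        }
    }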
So the advantages of register machines do not come from this theoretical closeness to the machine. They can still be better, though, because you eliminate a lot of bytecode for shuffling data around. I think the canonical source for the possible speedups is "Virtual Machine Showdown: Stack vs. Registers": http://www.usenix.org/events/vee05/full_papers/p153-yunhe.pd...
This paper also gives a very nice copy propagation algorithm that eliminates much of the need for a real register allocator, if you start from stack-based input code.
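To give a rough flavor (my own notation here, not the paper's): a stack sequence like load a; load b; add; store c naively translates into one virtual register per stack slot, i.e. s0 = a; s1 = b; s0 = s0 + s1; c = s0, and copy propagation folds the moves away until only something like c = a + b remains, without ever running a full register allocator.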
I wrote a longer justification for this in the introduction of the second book, but the short version is exactly what you wrote: they're easier to build. Since the goal of the book(s) is not to show how to build production-ready, industrial-grade interpreters, but rather to learn and to understand, that was a good enough reason for me.
And thinking about it some more, a register-based bytecode is less flexible and less future-proof. A JIT for a stack-based bytecode can do register allocation according to the register set of the actual (runtime!) target, while a register-based bytecode fixes its register set up front, requiring the JIT to remap or reallocate registers to better match the actual hardware.
If a JIT compiler can do register allocation, then it can also do this "reallocation", which is the same thing: mapping symbolic (virtual register) names to CPU registers.
> And thinking about it some more, a register-based bytecode is less flexible and less future-proof.
The flow analysis that modern JITs use to extract an SSA form from stack-based bytecode is just as easy to perform on register-based bytecode. This SSA form is then optimized, instructions selected, and CPU registers allocated.
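As a toy illustration of that flow analysis (invented bytecode, nothing like a real JIT's IR): for straight-line code it amounts to walking the bytecode with an abstract operand stack that holds SSA value numbers instead of runtime values; real JITs do the same per basic block and add phi functions at join points.

    #include <stdio.h>

    /* Toy SSA extraction from straight-line stack bytecode: keep an abstract
       stack of SSA value numbers and emit one three-address statement per
       instruction.  The bytecode format is invented for this sketch. */
    enum { PUSH_LOCAL, ADD, STORE_LOCAL, END };

    static void to_ssa(const int *code) {
        int stack[32], sp = 0, next = 0;
        while (*code != END) {
            switch (*code++) {
            case PUSH_LOCAL:
                printf("v%d = local%d\n", next, *code++);
                stack[sp++] = next++;
                break;
            case ADD: {
                int b = stack[--sp];   /* top of stack */
                int a = stack[--sp];
                printf("v%d = v%d + v%d\n", next, a, b);
                stack[sp++] = next++;
                break;
            }
            case STORE_LOCAL:
                printf("local%d = v%d\n", *code++, stack[--sp]);
                break;
            }
        }
    }

    int main(void) {
        /* load a; load b; add; store c  (locals 0, 1, 2) */
        const int code[] = { PUSH_LOCAL, 0, PUSH_LOCAL, 1, ADD, STORE_LOCAL, 2, END };
        to_ssa(code);
        return 0;
    }

The register-based version of the same analysis just keys the SSA values by virtual register number instead of by stack depth, which is the point: neither form is harder to analyze.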
From one perspective, you have a continuum from (essentially) infinite-register machines with SSA (LLVM bitcode, SafeTSA) or memory machines (such as Inferno's Dis VM) to 256-register VMs (LuaJIT, Lua, others) to 16-register VMs (Dalvik, maybe others) to stack machines (JVM, p-code, FCode, etc.). From another perspective, stack machine bytecode is a compact representation of code for a register machine with 2 registers. Register machines aren't any less flexible than stack machines.