If you are working with a single-core virtual machine, the hypervisor is properly implemented, and you can change the hypervisor, then it is fairly straightforward. You just funnel everything through a single guest memory write function and then instrument that.
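As a rough illustration of the "single write function" idea, here is a minimal sketch. It assumes a toy hypervisor where guest RAM is a flat byte array; `GuestMemory`, `write`, `replay`, and the `source` labels are hypothetical names for illustration, not any real hypervisor's API.

```python
from dataclasses import dataclass, field


@dataclass
class GuestMemory:
    """Toy guest RAM where every mutation goes through write()."""
    size: int
    log: list = field(default_factory=list)
    data: bytearray = field(init=False)

    def __post_init__(self):
        self.data = bytearray(self.size)

    def write(self, addr, payload, source):
        """The single choke point: record the write, then apply it."""
        if addr < 0 or addr + len(payload) > self.size:
            raise ValueError("write outside guest memory")
        # Instrumentation: who wrote what, and where, in program order.
        self.log.append((source, addr, bytes(payload)))
        self.data[addr:addr + len(payload)] = payload


def replay(size, log):
    """Rebuild guest memory by re-applying the recorded writes in order."""
    mem = GuestMemory(size)
    for source, addr, payload in log:
        mem.write(addr, payload, source)
    return mem


# Record a few writes from hypothetical sources, then replay them.
recorded = GuestMemory(64)
recorded.write(0, b"\x01\x02", source="vcpu0")
recorded.write(8, b"disk", source="virtio-blk")
replayed = replay(64, recorded.log)
```

With a single core the log order is the execution order, which is why this is enough; it stops being enough as soon as a second writer can race with the first.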
Unfortunately, virtual machines are almost always run in multicore configurations, which amount to unconstrained multithreading. That makes record-replay (as opposed to full-trace) implementations impossible, or at least very challenging, unless you serialize.
The device driver interface, if it is not pass-through or paravirtualized, is most likely going to trap accesses and communicate back via asynchronous shared memory writes. I suspect most hypervisor implementations will not even allow you to intercept or trace those operations externally, and if they do, it is likely to be hard to sequence them relative to the VM's instruction stream, since they are rooted in a shared memory interface.
If the devices are paravirtualized, or just in general, the hypercall interface is also usually very ad hoc and ill-defined, and likely not possible to intercept or trace either. The scope is probably going to be more like trying to represent the effects of every random vendor-specific ioctl.
The assumption here is that you're modifying the hypervisor itself. No-one expects to be able to bolt a record-replay tool onto a third-party hypervisor.
Well, that is what you did with rr (bolted it onto a third-party unmodified OS) so I thought I would cover all the bases.
Once you are in modification land it really depends on the kind of hypervisor and its implementation. As with doing it for an OS, a well-designed, consistent, and tight interface makes it pretty easy. But the unconstrained multithreading of multicore VMs makes it problematic for record-replay solutions if you do not want to serialize.
Yes, this is true. Genuine parallelism will absolutely be a problem there - similarly to how it is for user-level record/replay tech. And serialising is going to have more impact when you're recording a whole OS vs just one application.
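To make the serialization tradeoff concrete, here is a minimal sketch that assumes vCPUs can be modelled as Python generators that yield at every preemption point. All names (`vcpu`, `record`, `replay`, `step`) are hypothetical. Only one vCPU runs at a time, so the recorded schedule alone is enough to reproduce a racy outcome.

```python
import random


def vcpu(cpu_id, shared, iterations):
    """Hypothetical vCPU doing a racy, non-atomic read-modify-write."""
    for _ in range(iterations):
        value = shared["counter"]          # read
        yield                              # preemption point
        shared["counter"] = value + cpu_id  # write back, possibly stale
        yield


def make_cpus(shared, n_cpus, iterations):
    return {i: vcpu(i, shared, iterations) for i in range(1, n_cpus + 1)}


def step(cpus, cpu_id):
    """Run one vCPU up to its next preemption point; drop it when done."""
    try:
        next(cpus[cpu_id])
    except StopIteration:
        del cpus[cpu_id]


def record(seed, n_cpus=3, iterations=4):
    """Serialize execution and log which vCPU ran at each step."""
    rng = random.Random(seed)
    shared = {"counter": 0}
    cpus = make_cpus(shared, n_cpus, iterations)
    schedule = []
    while cpus:
        cpu_id = rng.choice(sorted(cpus))
        schedule.append(cpu_id)
        step(cpus, cpu_id)
    return schedule, shared["counter"]


def replay(schedule, n_cpus=3, iterations=4):
    """Re-run the same vCPUs under the recorded schedule."""
    shared = {"counter": 0}
    cpus = make_cpus(shared, n_cpus, iterations)
    for cpu_id in schedule:
        step(cpus, cpu_id)
    return shared["counter"]


schedule, final = record(seed=1)
replayed_final = replay(schedule)
```

The cost is visible in the structure: the vCPUs never overlap, which is the throughput hit that matters more for a whole OS than for a single application.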
Microsoft's TTD (a user-level tech) does allow genuine parallelism, I believe. But my understanding is that it has slower (though still very impressive) single-threaded performance as a result, so there's a tradeoff. But perhaps you could do something more like that.
We believe it's possible to handle parallelism in record/replay tech while still having good single-threaded performance - but it implies a much more complex implementation to do so safely.
The literature on Microsoft TTD [1][2] indicates it is an instruction-emulator, full-trace approach. It is a little unclear whether it is a basic emulator or an instrumented JIT, but the reported ~10x overhead is more consistent with a basic emulator. Though maybe their recording implementation is just inefficient, causing the excess overhead.
The Linux kernel supports pretty powerful APIs for inspecting and managing processes, like ptrace and seccomp and /proc, from another userspace process. There is no comparable set of APIs for hypervisors, unless Xen has something maybe?
And we don't handle unrestricted multithreading in rr. If we wanted to it would complicate things a lot and I don't know if existing kernel APIs would be adequate. We're skating along the edge of feasibility as it is.