Intel Virtualisation: How VT-X, KVM and QEMU Work Together (binarydebt.wordpress.com)
198 points by Foe on May 3, 2020 | 23 comments



There is a real lack of good textbooks on modern virtualization techniques and hardware. Would be interested to hear any recommendations in case I've missed something.


Have you read Edouard Bugnion's book, "Hardware and Software Support for Virtualization"? It explains the modern techniques as well as VMware's trap-and-binary-translate hack (using segment truncation), which remains IMO one of the greatest underappreciated achievements in the industry, considering it basically turned unexpected processor behavior into a multibillion-dollar company and ushered in x86 virtualization, server consolidation, and ultimately cloud computing.

Ed was one of the founders of VMware: https://en.wikipedia.org/wiki/Edouard_Bugnion

Looks like the whole book is freely available in PDF form: https://www.semanticscholar.org/paper/Hardware-and-Software-...


Awesome, from the TOC it looks exactly like what I want. I'll check it out.


I'd be curious what topics people would find most interesting. The whole COVID lockdown has left me with an abundance of "free" time, and I work professionally in this space (on virtualization for Google Compute Engine). I've been feeling the need to write lately[0]. I know better than to try to tackle a whole textbook, but I'm hoping I could manage a series of blog posts.

[0]: My work has put me in a lot of meetings for quite some time. I've felt a decline in my written communication skills: it takes me a lot longer to write things than it used to, and I'm less happy with the results. Nothing to be done for it but to write!


How virtualized page tables work. SR-IOV. The mechanics of binary translation. The challenges of security in virtualization.


Alright, that's looking like the start of a syllabus, although I may need a guest writer for binary translation (there's less of that these days, but a friend and colleague was the first employee at VMware, so I'll bug him and see if he wants to write :)


I hope you share that writing here in the future then. I look forward to reading it. Cheers.


I just posted a comment asking about multiprocessing/preemption - how do the root OS and virtualized OS share the CPU core when the virtual OS assumes it is the only thing responsible for scheduling thread execution?


The virtual CPU is (typically) represented as a single OS thread. Depending on how you've configured the virtualization hardware, it may look like a thread that's always runnable, or it may block whenever the guest kernel halts that CPU (while waiting for an interrupt). While the VCPU is running, the guest OS is free to context switch what's running there. If it halts, and the CPU has been set up to generate a VM exit on halt, it may take longer to wake it up: an IPI from another VCPU has to hit the host OS, which then has to put the target VCPU thread back into a runnable state, and that thread has to begin running and issue a VMRESUME before the IPI completes. If it sounds like this could have a disproportionate performance impact, it can. It works at all because operating systems tend to have fairly relaxed timing expectations (in addition to being largely event driven). Also, virtualization isn't the only source of potential stalls in a modern processor. Consider that a transition from the lowest power state up to a running state can take tens to (IIRC) hundreds of microseconds. Generally a pass through the scheduler would be much faster, unless things are contended, _or_ unless the host scheduler needs to wake up a physical CPU -- now the costs compound!
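The halt/wake-up path described above can be modeled in miniature with ordinary host threads. This is a toy sketch, not real hypervisor code: the VCPU class, the Event standing in for pending-interrupt state, and the send_ipi helper are all invented for illustration.

```python
import threading
import time

class VCPU:
    """Toy model of a halted guest VCPU: the host backs it with one
    OS thread, which blocks until an 'IPI' makes it runnable again."""
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.ipi = threading.Event()   # stands in for pending-interrupt state
        self.wake_latency = None
        self.thread = threading.Thread(target=self.run)

    def run(self):
        # Guest executed HLT -> VM exit on halt -> host blocks this thread.
        t0 = time.perf_counter()
        self.ipi.wait()                # not runnable while halted
        # Host scheduler picked us again; a real hypervisor would VMRESUME here.
        self.wake_latency = time.perf_counter() - t0

def send_ipi(target):
    # An IPI from another VCPU has to go through the host: mark the
    # interrupt pending and make the target thread runnable.
    target.ipi.set()

vcpu1 = VCPU(1)
vcpu1.thread.start()
time.sleep(0.01)                       # vcpu1 sits halted for ~10 ms
send_ipi(vcpu1)
vcpu1.thread.join()
print(f"wake latency after IPI: {vcpu1.wake_latency * 1e6:.0f} us")
```

The measured latency here is dominated by the host scheduler pass that makes the blocked thread runnable again -- the same step that, in the real path, sits between the IPI and the VMRESUME.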

A related problem to consider in the non-virtualized space: the Go runtime scheduling Goroutines onto OS threads. It has to contend with many of the same "is the target thread running" problems, and exhibits some of the same pathologies. Also, once the number of runnable host threads approaches the number of physical threads, the potential for contention, stalls, etc. increases dramatically. Once it exceeds that threshold, it becomes unavoidable. Collectively, all of these issues are commonly filed under the heading of "jitter" -- things that break the illusion of a continuously executing stream of instructions on a computer dedicated just to one thread.


There is a user-space thread on the host for each virtual CPU core being offered to the guest. The host OS runs the virtual CPU by scheduling this thread. The guest kernel's scheduler then subdivides the time it has been given on that virtual CPU among its own threads/processes.


Most hypervisors make this information available to the guest; you can see it as the st field in top, for example.


Just to expand the abbreviation: steal time percentage

It's important for the OS to know, because otherwise your clock will be way off. There are probably other issues that would pop up too that I'm just forgetting right now.
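On Linux guests, the steal accounting is exposed as the eighth field of the "cpu" line in /proc/stat, and tools like top derive the st percentage from the delta between two snapshots. A minimal sketch of that calculation, using two made-up snapshot lines rather than reading the live file:

```python
def steal_percent(sample_t0: str, sample_t1: str) -> float:
    """Compute the steal share (top's st) from two snapshots of the
    aggregate 'cpu' line in /proc/stat."""
    def fields(line):
        return [int(x) for x in line.split()[1:]]
    f0, f1 = fields(sample_t0), fields(sample_t1)
    deltas = [b - a for a, b in zip(f0, f1)]
    total = sum(deltas)
    # Field order: user nice system idle iowait irq softirq steal ...
    steal = deltas[7]
    return 100.0 * steal / total

# Two hypothetical snapshots, taken one interval apart on a contended host:
t0 = "cpu 1000 0 500 8000 100 0 10 200 0 0"
t1 = "cpu 1050 0 520 8070 102 0 11 247 0 0"
print(f"{steal_percent(t0, t1):.1f}% steal")   # 47 of 190 jiffies stolen
```

A steal share that high would mean the hypervisor ran something else for a quarter of the time this guest's VCPUs were runnable -- exactly the time a guest clock would lose if it weren't told about it.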


Wouldn't this only be detectable to a guest that's aware it's being virtualized (that is, a paravirtualized one)?


I have the impression that the terms "paravirtualization" and "full virtualization" are fuzzy nowadays. KVM, for example, reports steal time, yet KVM is considered to be full virtualization.

Also, a guest can still guess that it is a mere guest by looking at its loaded drivers. Many drivers, like virtio-blk, inevitably lead to the conclusion that the environment is virtualized, no matter what the CPU says.
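That driver-based inference is easy to sketch. On Linux, loaded modules are listed in /proc/modules, one per line with the module name first; the prefix list and the sample excerpt below are illustrative, not exhaustive:

```python
# Module-name prefixes that give the game away: virtio (KVM), vmw
# (VMware), hv_ (Hyper-V), xen (Xen). A heuristic, not a complete list.
VIRT_DRIVER_PREFIXES = ("virtio", "vmw", "hv_", "xen")

def looks_virtualized(proc_modules_text: str) -> bool:
    """Guess whether we're a guest from a /proc/modules-style listing."""
    names = (line.split()[0]
             for line in proc_modules_text.splitlines() if line.strip())
    return any(name.startswith(VIRT_DRIVER_PREFIXES) for name in names)

# Hypothetical /proc/modules excerpt from a KVM guest:
sample = """\
virtio_blk 20480 2 - Live 0x0000000000000000
virtio_net 49152 0 - Live 0x0000000000000000
ext4 737280 1 - Live 0x0000000000000000
"""
print(looks_virtualized(sample))   # True
```

The point of the comment stands: no matter what CPUID reports, the presence of virtio_blk alone is conclusive.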


I might be wrong, but I think "paravirtualized" means that you don't have to modify your kernel to work with virtual CPU / virtual memory. It doesn't mean "guest isn't aware it's being virtualized"


Paravirtualization means that the virtualization does not entirely pretend to be real hardware, but has special interfaces that require OS modifications or special drivers to use (although these are sometimes optional; e.g., KVM offers full virtualization, but also has the option of providing VirtIO devices, which you might call paravirtualization).



Excellent write-up. First time I had clarity on the role of each part, and I have been using all of these tools for a very long time.


I wonder if a CPU ISA change couldn't make virtualization more efficient? Dedicated instructions for I/O, for example, instead of using load/store. But I don't think new ISAs such as RISC-V do this; I don't know why.


I guess the question is, what would the new instructions do differently?

Plus, everything out there supports DMA already. Would you have to buy different HW to support this new scheme?

An IOMMU + DMA is already pretty good; a lot of the bottleneck at this point is really in the I/O devices themselves (network cards and SSDs are only so fast).


The problem is that not all device drivers take advantage of the IOMMU to run as user-level programs.


Maybe I'm just ignorant, but my impression was that the device drivers don't have a choice if they're running in a VM. The hypervisor sets it up so that all memory access is forced through the IOMMU.


> I wonder if a CPU ISA change couldn't make virtualization more efficient

Without knowing the details, I am convinced the architecture has an impact. Until some 15 years ago, the Intel architecture had the reputation of being impossible to virtualize. I don't remember whether KVM, which requires HW support, or Xen, which doesn't but requires support by the guest (paravirtualization), came first. I guess it was the latter, but either way they became feasible in rather quick succession.

On the other side, the IBM System/370 architecture was designed for virtualization from the beginning. Already in the early 1980s virtualization was standard there (possibly even earlier).

I'm quite sure difficult virtualization does not come for free efficiency-wise either.



