Block storage (virtio-block) is way simpler than file systems, both in the virtio implementation and in the host filesystem implementation. Simple is good, both for security reasons and because it makes it easier to reason about performance isolation between guests. The cost is obviously in features, and in making it hard to share data (and very hard to multi-master data) between multiple VMs. To see how simple virtio-block is, take a look at Firecracker's implementation (https://github.com/firecracker-microvm/firecracker/blob/mast...).
The existing alternative, as the post talks about, is 9p, which has a lot of shortcomings, so this is interesting. I suspect its use is mostly going to be in development environments and client-side uses of containers, while server-side users of virtualization will likely stick with block (or a filesystem implemented in guest userland).
Is the use-case here really related to that of block storage?
I imagine a given pool of blocks is reserved for one VM at a time. I wouldn't want to share the storage live between two VMs unless it was being accessed by some application specifically designed to hit a shared block store.
The point of something like virtio-fs would be to allow multiple VMs to share a file tree. It is a compromise between bind mounts (which are simpler, but only work for containers) and network filesystems (which work remotely, but are less efficient and less well behaved).
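For the curious, here's roughly what the plumbing looks like based on the virtio-fs docs; the daemon and QEMU flags were still settling when this was written, and the paths, socket name, and tag below are made up:

    # host: run the virtio-fs daemon against the directory you want to share
    virtiofsd --socket-path=/tmp/vfsd.sock -o source=/srv/shared &

    # host: attach it to the guest as a vhost-user-fs device (shared memory is needed for DAX)
    qemu-system-x86_64 ... \
      -chardev socket,id=char0,path=/tmp/vfsd.sock \
      -device vhost-user-fs-pci,chardev=char0,tag=hostshare \
      -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
      -numa node,memdev=mem

    # guest: mount it by tag
    mount -t virtiofs hostshare /mnt

The guest only sees a local-looking filesystem; the daemon on the host does the actual file I/O.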
The envisioned use case is Kata Containers https://katacontainers.io (formerly "Clear Containers"), which behaves like a container runtime but is actually a virtual machine with its own kernel to enhance isolation from the host. Since it runs its own kernel, it needs something like this to access files on the host.
The advantage compared to running normal VMs is that it's interoperable with things that expect a container runtime like Docker or rkt - e.g., Kubernetes can run Kata Containers - and the overhead / density / startup speed is much closer to that of actual containers (in the sense of namespaces+cgroups) than traditional VMs.
That text is too short for the topic. The container world certainly needs a lot of storage problems resolved. So why don't containers need this? What part of it do containers not need?
That's one way to look at it, but with sufficient effort on the quality of the host filesystem, you can offer better reliability, and offer features that are expensive/difficult to do properly from inside a VM. Streaming transaction logs and snapshotting (and if you're adventurous, deduplication), for example.
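For instance, if the host-side tree that gets exported into guests lives on something like ZFS, snapshots and incremental replication come almost for free on the host, invisibly to the guests (the pool/dataset names here are made up):

    # host: snapshot the shared tree, no guest cooperation needed
    zfs snapshot tank/guestdata@nightly

    # host: stream only the changes since the previous snapshot to another machine
    zfs send -i tank/guestdata@prev tank/guestdata@nightly | ssh backuphost zfs receive backup/guestdata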
The email mostly talks about performance, but could this technique finally give us VM shared folders with proper inotify support (at least on Linux hosts)? That has always been my number one gripe with using Vagrant for local development.
I sure hope so! I'm encouraged to see a project potentially tackling this need. The current "best practice" of using a Unison-synced directory between the host and your VM (and even then maybe onto a Docker volume bind mount) is very error-prone and comes with many pitfalls.
Mine as well, I have this horrible NFS setup that kinda/sorta works but this (if what you query is the case) would be awesome.
I even looked at writing my own 'vagrant' (to fit exactly my use case of Linux on Linux via KVM and deployed via ansible) because the NFS thing and VirtualBox issues mar what is otherwise a great idea.
This isn't to say these will make the guest filesystem more performant than NFS; you'll probably hit the same bottlenecks when mounting host filesystems into the guest. The solution many users arrived at with Vagrant is to use a user-space synchronization program like Unison that watches for filesystem events (e.g., inotify) on both the guest and the host. This is its own can of worms: O(size of files monitored) resource use on both hosts, file ignore lists, async races, etc. There's even a Unison plugin for Vagrant, but I found it quite finicky and thus unsuitable for large-scale deployment.
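For reference, the basic shape of that approach with stock Unison is roughly the following; the paths are illustrative, and -repeat watch needs the unison-fsmonitor helper installed on both ends:

    # watch for filesystem events on both sides and propagate changes continuously
    unison /home/me/project ssh://vagrant@192.168.33.10//home/vagrant/project \
      -repeat watch -batch -prefer newer \
      -ignore 'Path node_modules' -ignore 'Name *.log'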
At Airbnb, we eventually wrote our own wrapper around Unison to make it easier to use, and saw 90x better perf for filesystem-heavy loads like Webpack once we switched off of NFS in our local VMs.
Thanks for the info, I was aware of vagrant-libvirt and ansible (already use that to deploy vagrant guests) but unison I hadn't seen.
Funnily enough webpack is where I had the most problems (it certainly surfaces "what happens when we smack the FS in the face repeatedly" issues better than most) which is what got me thinking down the KVM route in the first place.
In terms of hacking around on vagrant, it's ruby which isn't on the list of languages I'm fluent in and its flexibility makes it more complex than what I need for my use case.
There is something to be said sometimes for rolling your own for your own itch but I'll certainly have a look at unison as part of vagrant first.
1. I no longer work at Airbnb, so I can't help you there.
2. The wrapper is somewhat deeply embedded in the dev tools swiss army knife, and is written in Ruby, so (a) it's difficult to extract since its tied with other Airbnb specific concerns and support libraries, and (b) distributing perf-sensitive Ruby code to end-users is challenging so you might not want it anyways.
You might want to take a look at Mutagen, a unison replacement written in Go which seems to avoid most of what makes Unison annoying to work with. I haven't tested it, though, so I can't vouch for it.
That makes sense. I hadn't heard of Mutagen before, but it seems like it would solve a lot of the issues that drove me away from Unison. Thanks for mentioning it!
Did it work well for you out of the box, or did you have do some configuration early on? I haven't gotten this to work for me since I upgraded LXC to 2.0 (something about the `lxc-create` command failing IIRC), but I never spent much time digging into it, so it's entirely possible it's something to do with my setup and not vagrant-lxc.
Same here, near-native performance is quite critical for certain use cases, including kernel compiling.
If you've ever tried to compile from source over NFS, it's almost 10~30 times slower than compiling from a native/local filesystem.
Does Vagrant have its own file sharing system, or does it use whatever the backends (mount for lxd, 9p for kvm, proprietary kernel module for virtualbox etc) happen to provide?
I understand that the Kata folks want this and why. Given what Google is doing with Crostini and the overlap between Kata/Firecracker/crosvm, I'm curious what ChromeOS engineers have to say about virtio-fs.
(And while I'm being curious of those folks, what their expectations are around virtio-wl going forward, upstream, etc?)
You can add shared folders at the host side at runtime and then mount them in the guest. Or you can expose network filesystems (CIFS, NFS) on either side and mount on the other.
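With VirtualBox, for example, that looks roughly like this (the VM and share names are made up, and the guest needs Guest Additions installed):

    # host: attach a shared folder to a running VM
    VBoxManage sharedfolder add "devbox" --name projects --hostpath /home/me/projects --transient

    # guest: mount it
    sudo mount -t vboxsf projects /mnt/projects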
Change the networking config from NAT to bridged. The VM will now appear as a truly independent device on your LAN, as if it were plugged into a switch.
Then you need to add a port forward to get through the NAT. However, NAT isn't security and you might be better off using a bridge and a firewall to shield your VM.
VirtualBox adds a network interface to connect the host and VM to each other; you can reach them via private IPs. The exact IPs used depend on the type of network mapping used (NAT, host-only, bridged).
Not clearly, IMHO. If the VM being able to talk to the host is not acceptable to you, even if you only enable it at runtime when you need it, then scp obviously isn't a solution.
If the file is saved to some kind of disk, and not only held in memory in your VM, it is already on your host inside the VM's disk image; you just have to extract it. With QEMU I usually loopback-mount the disk read-only and access whatever I want from the guest that way.
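For a raw image that's a plain loop mount; for qcow2 you need qemu-nbd (or libguestfs). Something like this, with the image names and partition offset made up:

    # raw image: mount the first partition read-only (offset = start sector * 512, check with fdisk -l)
    sudo mount -o loop,ro,offset=$((2048*512)) guest.img /mnt/guest

    # qcow2 image: export it as an nbd block device first
    sudo modprobe nbd max_part=8
    sudo qemu-nbd --connect=/dev/nbd0 --read-only guest.qcow2
    sudo mount -o ro /dev/nbd0p1 /mnt/guest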
A default VirtualBox configuration does not allow direct host->guest access; the VM sits behind a hidden VirtualBox-only NAT. You need to change the network interface type to bridged, host-only, or something else to have access to the VM.
I haven't done it for some time, but in my memory it was neither a problem on OSX nor on Linux with VBox. Maybe try Vagrant? I don't remember, sorry. But ask around and you should find a super easy way to make this happen.
It sounds like you are talking about NAT mode on virtualbox. In this case you can set up port forwarding in the network settings. Also, you can actually reach the host from the guest. In my case my guest has an IP of 10.0.2.15 by default, and I can reach the host on 10.0.2.2
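Concretely, for something like SSH/scp that's (VM name and ports are just examples):

    # host: forward host port 2222 to guest port 22 on the NAT interface
    VBoxManage modifyvm "devbox" --natpf1 "guestssh,tcp,,2222,,22"
    # (use "VBoxManage controlvm devbox natpf1 ..." instead if the VM is already running)

    # host: copy files out of the guest through the forwarded port
    scp -P 2222 user@127.0.0.1:/path/to/file .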
Any chance of getting a kext for this filesystem on MacOS, if it's the host? I'm still searching for the holy grail for using containers in development, and the 9p implementation used by most docker-on-mac setups is extremely slow.
In theory file system drivers for this can be implemented for any operating system that allows third-party file systems. virtio-fs is based on FUSE, so a starting point would be existing macOS FUSE implementations (I haven't checked the status) and then adding the virtio device plumbing.
GlusterFS is a distributed storage system. virtio-fs is a shared file system for virtual machines.
You can expose GlusterFS directory trees to virtual machines using virtio-fs, but virtio-fs itself doesn't do the network communication or make policy decisions about where and how data is stored on the network - that's the job of GlusterFS.
The reason you might want to use both together is to hide the details of the GlusterFS configuration from the virtual machine. That way the virtual machine only sees a virtio-fs device and doesn't have network access to your GlusterFS cluster.
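In other words, the host mounts the Gluster volume and virtio-fs just passes that subtree through; roughly like this (volume and paths are made up, the virtio-fs flags may still change, and the QEMU vhost-user-fs device setup is omitted for brevity):

    # host: mount the Gluster volume as usual
    mount -t glusterfs gluster1:/vmshare /mnt/vmshare

    # host: export that subtree to the guest over virtio-fs
    virtiofsd --socket-path=/tmp/vfsd.sock -o source=/mnt/vmshare &

    # guest: sees only a local-looking filesystem, with no network access to the Gluster cluster
    mount -t virtiofs hostshare /mnt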
> Can't we modify syncthing so that it becomes like a low latency shared filesystem and uses bittorrent as an underlying protocol.
Well... at that point you're writing new software, so you might as well start from the ground up with something actually designed for this use case, which provides support for things like live coherency, writeback and caching.
It's not easy to just make a program "become" low-latency. Syncthing is extremely high-latency to begin with, and is specifically not meant to be used as a real-time sync application that multiple applications can depend on for live coherency. That is the design of Syncthing. Same for BitTorrent: tons of overhead due to the design of the system. For something like this, which requires true low latency, there are just too many design decisions that would get in the way.
One major pain point about Syncthing in particular is that there isn't a very good way to maintain complete consistency between different OS environments. Syncing files from Linux which are not valid in Windows (name, metadata, etc) either causes filesystem corruption, reduces coherency between guests and host, or straight just drops incompatible files.
Try copying a file from Linux to Windows which has a trailing space. Windows will go bonkers and cannot delete or modify this file. You have to move it to a partition which you then delete.
I was so frustrated with using Syncthing that I gave up trying to make it useful.
You are implying that you want to store the data twice on the host and the guest. The proposed virtio-fs (and also the existing virtio-9p) is about passing through a subtree hierarchy stored on the host filesystem into the guest.
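For comparison, the existing 9p flavour of that pass-through looks like this with QEMU (the tag and path are examples):

    # host: export a directory with QEMU's built-in 9p server
    qemu-system-x86_64 ... \
      -virtfs local,path=/srv/shared,mount_tag=hostshare,security_model=mapped-xattr

    # guest: mount it over virtio
    mount -t 9p -o trans=virtio,version=9p2000.L hostshare /mnt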