
You're absolutely right and thank you for the clarification. I didn't intend to conflate sparse files with file-based disk images, but I was trying to convey that there can be a difference between the logical data of a disk image and the physical data, and that deferred zeroing is the default and the expectation of developers and sysadmins. Images can be sparse and/or file-based, as the features are orthogonal, if cross-cutting.
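To make the logical-versus-physical distinction concrete, here is a minimal Python sketch that creates a sparse file and compares its apparent size with the space actually allocated. The function name `sparse_file_sizes` is my own, and it assumes a POSIX-style filesystem that supports holes and reports `st_blocks`; on other platforms the physical figure may be missing or equal to the logical size.

```python
import os
import tempfile


def sparse_file_sizes(logical_size=1024 * 1024):
    """Create a sparse temp file; return (logical_bytes, physical_bytes, head)."""
    fd, path = tempfile.mkstemp()
    try:
        # Extending the file without writing any data leaves a "hole" on
        # filesystems that support sparse files.
        os.ftruncate(fd, logical_size)
        st = os.fstat(fd)
        # st_blocks counts allocated 512-byte units (not available everywhere,
        # so fall back to 0 when the platform does not report it).
        physical = getattr(st, "st_blocks", 0) * 512
        # Reading from the hole returns zeros even though nothing was written.
        head = os.read(fd, 4096)
        return st.st_size, physical, head
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    logical, physical, head = sparse_file_sizes()
    print("logical bytes: ", logical)
    print("physical bytes:", physical)
    print("reads as zeros:", head == b"\x00" * len(head))
```

The zeros come from the filesystem, not from anything stored on the device, which is the "deferred zeroing" behavior people tend to expect from a fresh disk image.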

More importantly, you clarify that RZAT (Read Zero After TRIM) is a necessary feature for what I'm describing to work properly. You're right. Providers should do both: ensure the blocks served to customer VMs are zeroed on use, and run TRIM commands appropriately to get maximum performance from their hardware. Not all SSDs perform RZAT, and it wouldn't be a bad idea for the host to ensure the device is logically zeroed for the VM anyway.
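As a rough illustration of "logically zeroed", the sketch below scans a file or device node and reports whether every byte reads as zero. The helper name `is_zeroed` and the example path are hypothetical, reading a real block device normally requires root, and this only checks what a guest would see through the block layer, not what might remain on the raw flash.

```python
import os
import tempfile


def is_zeroed(path, chunk_size=1 << 20):
    """Return True if every byte readable from `path` is zero."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True  # reached the end without finding non-zero data
            if chunk.count(0) != len(chunk):
                return False  # at least one non-zero byte in this chunk


if __name__ == "__main__":
    # Demo on a scratch file; a host would point this at the volume it is
    # about to hand to a new VM (for example a hypothetical /dev/vg0/guest1).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 8192)
        scratch = tmp.name
    print(is_zeroed(scratch))
    os.unlink(scratch)
```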

DigitalOcean could easily switch to doing both, or at least guarantee the former by creating new logical disks for customers as every other vendor does. If, as they have blogged about in the past, they are directly mapping virtualized disks to the host's LVM volumes, they are unnecessarily complicating their hosting setup and making their host configuration more brittle. With thin-provisioned/sparsely-allocated or file-based virtual disk images, they can more flexibly deploy VMs with different disk sizes with minimal changes in host configuration.

Alternatively, they could trivially ensure that even forensic tools would have a very difficult time recovering data from deleted volumes by enabling dm-crypt on top of LVM and resetting the key every time a virtual machine is deleted. This could reduce performance on some SSDs (particularly SandForce-based models) but would require only minimal changes to their configuration to ensure deleted data is unrecoverable.
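The idea behind resetting the key can be sketched with a toy model in Python. To be clear about the assumptions: this is not dm-crypt and not production cryptography. The `ToyEncryptedVolume` class and the SHA-256 counter keystream are stand-ins I made up so the example runs with only the standard library. It shows only the principle that once the key is replaced, whatever ciphertext is left on the disk no longer decrypts to the old tenant's data.

```python
import hashlib
import os


def keystream_xor(key, data):
    """XOR `data` with a SHA-256 counter keystream (toy cipher, illustration only)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyEncryptedVolume:
    """Hypothetical stand-in for an encrypted mapping over a logical volume."""

    def __init__(self, size):
        self.size = size
        self._key = os.urandom(32)
        # `_stored` models the bytes that actually land on the physical disk.
        self._stored = keystream_xor(self._key, bytes(size))

    def write(self, data):
        if len(data) > self.size:
            raise ValueError("data larger than volume")
        self._stored = keystream_xor(self._key, data.ljust(self.size, b"\x00"))

    def read(self):
        return keystream_xor(self._key, self._stored)

    def raw(self):
        return self._stored  # what a forensic tool would see on the device

    def reset_key(self):
        # Called when the VM is deleted: the old key is gone for good.
        self._key = os.urandom(32)


if __name__ == "__main__":
    vol = ToyEncryptedVolume(64)
    vol.write(b"customer secret")
    print(vol.read()[:15])
    vol.reset_key()
    print(vol.read()[:15] == b"customer secret")
```

The next tenant reads noise rather than zeros in this model, so a host would still want to present a logically zeroed device on top if guests expect one.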




Using 1:1 mappings of LVM logical volumes to guest VM block devices is the most straightforward and performant method of doing it on Linux, short of doing 1:1 mappings of entire disks or disk partitions to guest VM block devices. While using file-based disk images would prevent data leaks between customers without any further effort required on the VM provider's part (assuming they don't reuse disk images between customers!), there are tons of downsides to file-based disk images, mostly related to performance and write amplification.

I don't agree that file-based disk images are more flexible than LVM's logical volumes — it's ridiculously easy to create, destroy, resize, and snapshot LVs.


Until very recently there were serious problems with putting LVM under any sort of concurrent load. Making more than a few snapshots at the same time, for instance, was asking for trouble. I say "was" - I've got no idea if these problems were fixed. You just don't have those problems with file-based images.


Yeah, but you can't snapshot a file-based image, so LVM without snapshots is just as good (and much faster).


ZFS and btrfs both let you do this, as do qemu COW images.



