Note, sparse files have... performance issues. In the very best case you are going to end up with a lot of fragmentation where you aren't expecting it. I was using sparse files in 2005; I'm having a hard time finding my blog posts on the subject, but I didn't switch to pre-allocation 'cause I like paying for more disk, you know?
You are right about the zeroes, though; sparse files solve that problem, and this is what I personally find interesting about this article. I would be very interested to find out what Digital Ocean uses for storage. This indicates to me that they are using something pre-allocated; I can't think of any storage technology that allows over-subscription that would not also give you zeroes in your 'free' (unallocated) space.
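(If you want to see that behavior for yourself, here's a quick sketch; the filename is made up:)

    truncate -s 1G disk.img           # sparse: sets the size, allocates no blocks
    du -h --apparent-size disk.img    # ~1.0G logical size
    du -h disk.img                    # ~0, nothing actually allocated on disk
    dd if=disk.img bs=1M count=1 2>/dev/null | od -An -tx1 | sort -u
    # one line of zero bytes: unallocated regions always read back as zeroes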
>For virtual hard disk technology, Hyper-V VHDs do it, VMWare VMDKs do it, sparse KVM disk image files do it. Zeroed data is the default, the expectation for most platforms. Protected, virtual memory based operating systems will never serve your process data from other processes even if they wait until the last possible moment. AWS will never serve you other customer's data, Azure won't, and none of the major hypervisors will default to it. The exception to this is when a whole disk or logical device is assigned to a VM, in which case it's usually used verbatim.
Yeah, the thing you are missing there? VMware is a very different market. Same with Hyper-V. And sparse files, as I explained, suck. (I suspect that to the extent Hyper-V and VMware use sparse files, they also suck in terms of fragmentation once you've got a bunch of VMs per box. But most of the time, if you are running VMware, you've got money and are running a few guests on expensive, fast hardware, so it doesn't matter as much.)
Most dedicated server companies have this problem: unless you are the first customer on the server, you will usually find something other than a test pattern on your disks.
No matter who your provider is, it's always good practice to zero your data behind you when you leave. Your provider should give you some sort of 'rescue image' - something you can boot off of that isn't your disk, but that can mount your disk. Boot into that and scramble your disk before you leave.
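Something like this from the rescue environment does the trick (the device name here is a stand-in; check what your disk actually shows up as first):

    # overwrite the whole virtual disk with zeroes; /dev/vda is a guess,
    # substitute whatever lsblk shows for your disk
    dd if=/dev/zero of=/dev/vda bs=1M status=progress
    # or overwrite with random data instead:
    shred -n 1 -v /dev/vda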
I know I had this problem too, many years ago, when I switched from sparse files to LVM-backed storage. Fortunately for me, if I remember right, Nick caught it before the rest of the world did. I solved it by zeroing any new disk I give a customer. It takes longer, especially when I ionice the dd to the point where it doesn't kill performance for the other customers on the box, but I am deathly afraid (as a provider should be) of someone writing an article like this about me. Ideally, I'd have a background process doing this at low priority on all free space all the time, but zeroing at provision time is, I feel, the most certain way to know the new customer is getting nothing but zeroes.
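For the curious, the zeroing pass is nothing fancy; roughly this, with a made-up volume name:

    # idle-class ionice so the other guests' I/O always wins
    ionice -c 3 dd if=/dev/zero of=/dev/vg0/new_customer bs=1M oflag=direct
    # oflag=direct keeps the host page cache from filling up with zeroes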
>Zeroing an SSD is a costly behavior that, if not detected by the firmware, will harm the longevity of the SSD by dirtying its internal pages and its page cache. Not to mention the performance impact for any other VMs that could be resident on the same hardware as the host has to send 10s of gigabytes of zeroes to the physical device.
Clean failures of disks are not a problem. Unless you are using really shitty components (or buying from Dell), your warranty is gonna last way longer than you'll actually use the thing in production. Enterprise hard drives and SSDs both carry 5-year warranties.
The dd kills disk performance for other guests on spinning disk if you don't limit it with ionice or the like, and that's the real cost. I would assume that cost would be much lower on a pure-SSD setup.