
As others have said: block cloning (the mechanism behind copy-on-write file copies, often called reflinks) lets you 'copy' a file without reading all of its data and writing it out again.

For example, if you have a 1 GB file and you want to make a copy of it, you need to read the whole file (all at once or in parts) and then write the whole new file (all at once or in parts). This results in 1 GB of reads and 1 GB of writes. Obviously the slower (or more overloaded) your storage media is, the longer this takes.
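For comparison, here is a minimal Python sketch of that conventional copy, counting the bytes that pass through it. The function name `chunked_copy` and the 1 MiB chunk size are illustrative choices, not taken from any particular tool.

```python
def chunked_copy(src_path, dst_path, chunk_size=1 << 20):
    """Copy a file the conventional way: read it in chunks and write every chunk back out.

    Returns (bytes_read, bytes_written) so the I/O cost is visible.
    """
    bytes_read = bytes_written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # read the next part of the source
            if not chunk:
                break  # end of file
            bytes_read += len(chunk)
            bytes_written += dst.write(chunk)  # write the same bytes to the new file
    return bytes_read, bytes_written
```

Copying a 1 GiB file this way returns `(1073741824, 1073741824)`: every byte is read once and written once, no matter how the chunks are sized.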

With block cloning, you simply tell the OS "I want file A to be a copy of file B", and it creates a new "file" that references all the blocks of the old "file". A "file" on a filesystem is essentially a list of pointers to the blocks that hold its data, so the new "file" can point at exactly the same blocks as the old one. This takes one system call (or a few), and because only metadata is written, it isn't much more expensive than renaming a file instead of copying it. Data is only duplicated later, block by block, when one of the two files is modified; that is the copy-on-write part.
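To make the pointer idea concrete, here is a toy in-memory model in Python. It is hypothetical and deliberately simplified (real filesystems track extents and reference counts in on-disk structures, not Python dicts): a file is a list of block IDs, cloning copies only that list and bumps reference counts, and writing to a shared block allocates a new one instead of modifying it in place.

```python
class BlockStore:
    """Toy model: files are lists of block IDs into a shared, refcounted block pool."""

    def __init__(self, block_size=4):
        self.block_size = block_size  # tiny so examples stay readable
        self.blocks = {}    # block ID -> bytes
        self.refcount = {}  # block ID -> number of file pointers to it
        self.files = {}     # file name -> list of block IDs
        self._next_id = 0

    def _alloc(self, data):
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        self.refcount[bid] = 1
        return bid

    def write_file(self, name, data):
        """Create a new file (name assumed unused) by writing every block: the expensive path."""
        size = self.block_size
        self.files[name] = [self._alloc(data[i:i + size]) for i in range(0, len(data), size)]

    def read_file(self, name):
        return b"".join(self.blocks[bid] for bid in self.files[name])

    def clone(self, src, dst):
        """Block clone: copy the pointer list and bump refcounts; no data is read or written."""
        pointers = list(self.files[src])
        for bid in pointers:
            self.refcount[bid] += 1
        self.files[dst] = pointers

    def overwrite_block(self, name, index, data):
        """Copy-on-write: a block shared with another file is never modified in place."""
        pointers = self.files[name]
        old = pointers[index]
        if self.refcount[old] > 1:
            self.refcount[old] -= 1
            pointers[index] = self._alloc(data)  # only this one block gets new storage
        else:
            self.blocks[old] = data  # sole owner, safe to update in place

    def delete(self, name):
        """Drop a file; a block's space is freed only when no file references it any more."""
        for bid in self.files.pop(name):
            self.refcount[bid] -= 1
            if self.refcount[bid] == 0:
                del self.blocks[bid]
                del self.refcount[bid]
```

After `write_file("A", b"hello world!")` and `clone("A", "B")`, the store still holds three blocks; overwriting block 0 of `B` adds a fourth, and `A` still reads back `b"hello world!"`.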

At my previous job we ran the builds for our software. Each build covered the BIOS, kernel, userspace, UI generation, and so on, and required pulling down 10+ GB of git repositories (the git data itself, the checkout, the LFS binary files, external vendor SDKs), plus a large amount of build artifacts on top of that. We also had to build 80-100 different product models, each in both release and debug variants. That meant roughly 200 copies of the source code alone (not counting build artifacts and intermediate products), and disk space limits forced us to sharply reduce the number of builds we could run concurrently. The solution we came up with was something like:

1. Check out the source code

2. Create an overlayfs filesystem to mount into each build space

3. Do the build

4. Tear down the overlayfs filesystem

This broke down whenever we couldn't mount the filesystem, or couldn't unmount it (because of lingering open file descriptors or processes). Lots of moving parts, and lots of `sudo` commands in the scripts.
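For reference, steps 2 and 4 boil down to a mount and an unmount like the ones built below. This is a hypothetical Python sketch, not our actual scripts: the helper names and paths are made up, and it only constructs the commands. The `lowerdir`/`upperdir`/`workdir` options are standard overlayfs mount options; the work directory must be an empty directory on the same filesystem as the upper directory, and mounting normally requires root.

```python
def overlay_mount_argv(lower, upper, work, merged):
    """Hypothetical helper: argv for mounting an overlayfs view of a shared checkout.

    lower  : read-only shared checkout
    upper  : per-build writable layer; receives every file the build creates or changes
    work   : empty scratch directory overlayfs needs, on the same filesystem as upper
    merged : mount point the build runs in
    """
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["sudo", "mount", "-t", "overlay", "overlay", "-o", options, merged]


def overlay_umount_argv(merged):
    """Hypothetical helper: argv for tearing the view down again.

    umount refuses ("target is busy") while any process still has files open under merged.
    """
    return ["sudo", "umount", merged]
```

Every build has to run both commands successfully, with privileges, which is where the failure modes above come from.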

Copy-on-write copies would have solved this with far less machinery; we could simply have done the following:

1. Check out the source code

2. Have each build process run `cp -R --reflink=always source/ build_root/`; this would be near-instant and use almost no new disk space until the build started modifying files.

3. Do the build

4. `rm -rf build_root`

Fewer moving parts, no root access required, generally simpler all around.
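The reflink workflow can be sketched in Python for Linux. The names `reflink_copy` and `run_isolated_build` are hypothetical; the clone uses the Linux `FICLONE` ioctl, which performs the same kind of whole-file clone as `cp --reflink` and works only on filesystems that support reflinks (e.g. Btrfs, or XFS with reflink enabled). By default the sketch falls back to an ordinary copy where cloning is unsupported, like `--reflink=auto`; `require_clone=True` makes it fail instead, like `--reflink=always`.

```python
import errno
import fcntl
import shutil
import subprocess
from functools import partial

# _IOW(0x94, 9, int) from <linux/fs.h>; hard-coded because older Pythons lack fcntl.FICLONE.
FICLONE = 0x40049409

# Errors meaning "cloning isn't possible here" (unsupported filesystem, cross-device, etc.).
_CLONE_UNSUPPORTED = {errno.EOPNOTSUPP, errno.EXDEV, errno.EINVAL, errno.ENOTTY, errno.ENOSYS}


def reflink_copy(src, dst, require_clone=False):
    """Clone src into dst via FICLONE; fall back to a byte copy unless require_clone is set.

    Returns True if the file was cloned, False if it was copied byte by byte.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # dst now shares src's data blocks; no file data is read or written.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            cloned = True
        except OSError as exc:
            if require_clone or exc.errno not in _CLONE_UNSUPPORTED:
                raise
            cloned = False
    if not cloned:
        shutil.copyfile(src, dst)  # conventional read/write copy
    shutil.copystat(src, dst)  # keep permissions and timestamps, as cp -p would
    return cloned


def run_isolated_build(source, build_root, build_cmd, require_clone=False):
    """Hypothetical per-build flow: clone the checkout, build in the copy, then delete it."""
    copy = partial(reflink_copy, require_clone=require_clone)
    shutil.copytree(source, build_root, symlinks=True, copy_function=copy)
    try:
        subprocess.run(build_cmd, cwd=build_root, check=True)
    finally:
        shutil.rmtree(build_root)  # the equivalent of rm -rf build_root
```

Nothing here needs root or a mount, and a crashed build leaves behind only an ordinary directory tree that `rmtree` can delete.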
