Note that NILFS makes old, crappy SSDs scream. Also, the article makes NILFS look good by comparing it to old filesystems, not COW-tree filesystems that are the current state of the art. NILFS may have merit, but this article doesn't demonstrate that in an honest way.
One of the most noticeable features of NILFS is that it can “continuously and automatically save instantaneous states of the file system without interrupting service”. NILFS refers to these as checkpoints. In contrast, other file systems, such as ZFS, can provide snapshots, but they have to suspend operation to take them. NILFS doesn’t have to do this; the snapshots (checkpoints) are part of the file system design itself.
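For a rough idea of what that looks like in practice (assuming the nilfs-utils tools are installed; the device and checkpoint number below are just placeholders):

    # list the checkpoints NILFS has been creating automatically
    lscp

    # promote checkpoint 42 to a snapshot so the cleaner won't reclaim it
    chcp ss 42

    # mount that snapshot read-only alongside the live filesystem
    mount -t nilfs2 -o ro,cp=42 /dev/sdb1 /mnt/snapshot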
Pardon my ignorance of SSDs and how they work, but aren't seek times supposed to be low or negligible? Isn't a log-structured FS optimised for a disk with high seek time?
I guess my questions boil down to:
1. Why is this approach fast on an SSD? Wouldn't the performance boost be more noticeable on a spinning disk?
2. Shouldn't someone try to make an FS optimised for SSD characteristics? Has someone already done this?
Isn't a log-structured FS optimised for a disk with high seek time?
Sort of. It makes writes sequential (and thus fast) and reads nearly random (and thus slow).
Why is this approach fast on an SSD? Wouldn't the performance boost be more noticeable on a spinning disk?
Old, crappy SSDs have very slow random writes (the SSD equivalent of seeking), so log structuring increases write performance by orders of magnitude. Since random and sequential reads are equally fast on SSDs, the randomized layout doesn't hurt read performance the way it does on spinning disks.
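If you want to see the gap for yourself, something like the following fio runs will show it (the test file and sizes are placeholders, and flags may vary a bit between fio versions):

    # same block size throughout, so only the access pattern differs
    fio --name=seqwrite  --filename=testfile --size=1G --bs=4k --rw=write     --direct=1
    fio --name=randwrite --filename=testfile --size=1G --bs=4k --rw=randwrite --direct=1
    fio --name=seqread   --filename=testfile --size=1G --bs=4k --rw=read      --direct=1
    fio --name=randread  --filename=testfile --size=1G --bs=4k --rw=randread  --direct=1

On an old SSD the randwrite numbers tend to be the outlier; on a decent recent drive the four results come out much closer together.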
Shouldn't someone try to make an FS optimised for SSD characteristics?
If the characteristics are different for every SSD model, no. Recent SSDs are becoming largely insensitive to access pattern, so the correct "optimization" is probably to do nothing beyond issuing trim.
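Concretely, that mostly means either periodic fstrim or mounting with discard (the device and mount point below are placeholders):

    # trim free space periodically, e.g. from a cron job or systemd timer
    fstrim -v /mnt/ssd

    # or let the filesystem issue discards as it deletes (ext4 shown here)
    mount -o discard /dev/sdb1 /mnt/ssd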
I suspect, without being certain, that recent SSDs are doing this by implementing something like a log-structured filesystem in the translation layer. I also suspect that this is something that's better done in your OS kernel than in your SSD controller, because the kernel has better information to work with.
The correct optimization would be one that minimizes wearing out the drive: essentially, something that moves hot data away from blocks that are in danger of being worn out.
While a speed boost is great and all, don't you have the added concern of making sure your blocks don't wear out on an SSD? Because log-structured file systems commit metadata and data in large sequential writes, they need a background cleaner that constantly moves live data around to reclaim the segments left fragmented by files that are later deleted. It seems like there would just be too much writing going on if what you want is a long-lasting SSD.
They do mention in the article that it would be a bad idea to use NILFS on your root partition because of heavy traffic, so let's say I decide to use it on a volume where I store all my media (where I pretty much write once, read a lot, and never delete). Now, a) why would I use an SSD for this, given that they're so much more expensive, and b) wouldn't I get comparable performance from an HDD here, since my files will be laid out pretty much sequentially given my workload? Doesn't the same apply to companies storing mounds of data? Where is the middle ground where I would want to use NILFS on an SSD? One where I a) won't be sending enough write traffic to wear out my SSD too quickly and b) will be reading a lot of random blocks?
What am I missing? I see the merits in using NILFS, but I just don't see the purpose in comparing its performance against other filesystems on SSDs.
Log-structured filesystems (including GC, checkpointing, and wear-leveling) have been quite prevalent in the embedded Linux world for several years already, e.g. jffs2 / yaffs / logfs / ubifs.
(Of course, these typically target raw NAND flash via the MTD / UBI layer, so they aren't directly suited to the SSD block-device abstraction.)
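For the curious, a typical ubifs setup looks roughly like this (assuming mtd-utils and a raw NAND device exposed as /dev/mtd0; the volume name and size are placeholders):

    # attach the raw NAND MTD device to the UBI layer
    ubiattach /dev/ubi_ctrl -m 0

    # create a UBI volume and mount ubifs on top of it
    ubimkvol /dev/ubi0 -N rootfs -s 64MiB
    mount -t ubifs ubi0:rootfs /mnt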
"How would you implement a swap on such a filesystem? You couldn't, could you?"
The days of "swap" are numbered. The future is a smart memory hierarchy that uses knowledge about size/speed tradeoffs to decide where to store things, but that won't look much like "swap".
Even with SSDs, it has become very unusual for "the system is currently swapping" to mean anything other than "the system's performance characteristics just became completely unacceptable". I stopped using a swap partition somewhere around 512MB to 1GB of RAM, and now my laptop, which I didn't even try to max out the memory on, shipped with 4GB. I can hardly even fill up the cached buffers in 4GB in any reasonable period of time.
(Please note the difference between "very unusual" and "never". In fact, I can cite a server in my care that benefits from using a bit of swap. But that server is the exception, not the rule.)
We'll still have swap as long as CPUs support virtual memory. Nothing beats using a dumb programming language and process model and, without explicitly deciding anything, being able to scale past physical memory until your working set just barely fits.
Further, there may be some performance benefit to swapping idle process memory to make room for more buffer cache.
It's true that for interactive use (especially a laptop with spun-down drives), you may be happy without swap. But I don't see the facility being mothballed.
How would you implement a swap on such a filesystem? You couldn't, could you?
I guess that's mostly a non-issue, as on Linux you usually don't use a swap file but a whole partition. There is support for swap files, but they're usually a last resort. Let's face it, if you know how to set up NILFS2, then you're probably able to create a swap partition.
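For reference, either way it's only a couple of commands (the device and size below are placeholders):

    # dedicated swap partition
    mkswap /dev/sdb2
    swapon /dev/sdb2

    # or, as a last resort, a swap file on a filesystem that supports one
    fallocate -l 2G /swapfile   # some filesystems want dd instead of fallocate
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile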