I will not make any comments on how this affects performance (I’ll save that for a future post). I also deliberately ignore ZFS caching and other optimizing features – the only thing I want to show right now is how much fragmentation is caused on the physical disk by using ZFS for Oracle data files.
Any engineer should be able to appreciate why this is frustrating from the ZFS perspective: if you're going to turn off everything that was added for performance, reasoning about performance isn't terribly interesting.
He did (kind of) post the promised follow-up about eighteen months later[1], but it was running ancient ZFS bits (at least four years old when it was published) on a gallingly mistuned configuration (he had only 100 MB of ARC).
Again, what we know is our own experience running Postgres on ZFS, which has been refreshingly devoid of ZFS-related performance problems. And that experience has taught me to trust my own production numbers much more than naive microbenchmarking exercises...
It sounds more like an inherent result of ZFS being copy-on-write: if you run a pre-allocated, fixed-size database on top of it and apply random writes to existing records, the file is going to heavily fragment over time where a write-in-place filesystem would not.
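To make the mechanism concrete, here's a toy sketch in Python (my own illustration with a deliberately naive allocator, not ZFS's actual one, which works much harder to keep blocks contiguous): write-in-place overwrites a block where it sits, while copy-on-write relocates every overwritten block to fresh space, shredding the originally contiguous layout.

    import random

    N_BLOCKS = 10_000   # logical blocks in the "database file"
    N_WRITES = 50_000   # random overwrites of existing records

    def fragments(layout):
        """Count contiguous runs in the logical->physical mapping."""
        runs = 1
        for a, b in zip(layout, layout[1:]):
            if b != a + 1:
                runs += 1
        return runs

    random.seed(42)
    wip = list(range(N_BLOCKS))  # write-in-place: mapping never changes
    cow = list(range(N_BLOCKS))  # copy-on-write: overwrites relocate
    next_free = N_BLOCKS         # naive COW allocator: always append to fresh space

    for _ in range(N_WRITES):
        blk = random.randrange(N_BLOCKS)
        cow[blk] = next_free     # the rewritten block lands somewhere new...
        next_free += 1           # ...while the write-in-place file is untouched

    print("write-in-place fragments:", fragments(wip))  # 1 contiguous run
    print("copy-on-write fragments: ", fragments(cow))  # thousands of runs

A sequential scan of the write-in-place file is still one contiguous read; the same scan of the copy-on-write file has degenerated into nearly one seek per block.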
I think it's fair to examine this behavior without heavily mediating features such as the ARC, since ideally we want the ARC boosting performance well above the raw block storage, not just earning back performance lost to copy-on-write.
I think the key takeaway is this: if you have spinning rust and expect to sequentially read a large database table by primary key, then to the extent old records have been updated, they will be fragmented, causing unexpected seeks that kill your read throughput, and there's no way to defragment without taking the database offline.
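Some rough arithmetic (assumed, typical 7200 RPM figures, not measurements from the article) shows how badly this can bite. With ZFS's default 128K recordsize, roughly 8 ms per seek, and roughly 150 MB/s sequential throughput, a fully fragmented table scan runs at about a tenth of sequential speed:

    SEQ_MBPS  = 150.0   # sequential throughput, typical 7200 RPM disk (assumed)
    SEEK_MS   = 8.0     # average seek + rotational latency (assumed)
    RECORD_KB = 128.0   # ZFS default recordsize

    transfer_ms = RECORD_KB / 1024 / SEQ_MBPS * 1000  # ~0.83 ms to read one record
    per_record_ms = SEEK_MS + transfer_ms             # ~8.8 ms if every record needs a seek
    effective_mbps = (RECORD_KB / 1024) / (per_record_ms / 1000)
    print(f"{effective_mbps:.0f} MB/s effective")     # ~14 MB/s, versus 150 MB/s sequential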
It sounds like a serious factor to consider depending on the particular workload your database expects.