I've had to deal with one scenario where ZFS fsck would have been very useful, though it comes up rarely: corruption leading to valid checksums, but bad ZFS data.
In my case[1], our setup died due to power problems, and somehow NULL pointers got written to disk with valid checksums. Normally this wouldn't happen, but when it does, it's a PITA to debug because trying to traverse or read the disk causes a kernel fault, rather than the segfault you'd see in a user-level fsck program. This was a real pain, as we had encrypted disks, and every reboot meant going through the disk attach steps (entering the password, etc.) all over again.
As a result, a userspace implementation of scrubbing would be useful since in this sort of rare instance, I'd be able to probe the fsck process with a good debugger and see why it's crashing. Since it's in userspace, the fsck program can also quit more sanely, with a full report on where it found the corruption. I was able to get my data back via some ad-hoc patches, but it was an... interesting experience having to debug in kernel vs in userspace.
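To make the "quit sanely with a full report" part concrete, here is a rough sketch in Python. It is a toy model only: the nested dicts with a `children` list are a hypothetical stand-in for block pointers and have nothing to do with the real ZFS on-disk format. The point is just that a userspace walker can record where it hit a NULL pointer and carry on, instead of faulting.

```python
def find_null_pointers(block, path="root"):
    """Walk a toy block tree and return the paths of NULL (None) child pointers.

    `block` is a dict with an optional "children" list. This layout is made up
    for illustration and is not how ZFS stores block pointers.
    """
    problems = []
    for i, child in enumerate(block.get("children", [])):
        child_path = f"{path}/{i}"
        if child is None:
            # A kernel traversal might fault here; in userspace we just note it.
            problems.append(child_path)
        else:
            problems.extend(find_null_pointers(child, child_path))
    return problems


if __name__ == "__main__":
    tree = {"children": [{"children": []}, None, {"children": [None]}]}
    for bad in find_null_pointers(tree):
        print("NULL pointer at", bad)
```

A real tool would of course have to read and verify blocks from disk; the sketch only shows the reporting shape.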
zdb isn't a substitute for most of these things, as the on-disk compression and RAIDZ sharding make it difficult to actually see the raw data structures. Max Bruning wrote a post a while back with an on-disk data walkthrough[2], where he wrote some patches to fix this, but they haven't made their way upstream yet. Additionally, FreeBSD and Linux don't have mdb. :(
I wrote that code several years ago. It is not quite the right way to go.
zdb now does decompression, though slightly differently from what I implemented.
Syntax is: zdb -R poolname vdev:offset:size:d
The "d" at the end says to decompress. zdb tries different decompression algorithms until it finds one that is correct.
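The "try algorithms until one is correct" idea can be sketched roughly like this in Python. This is not zdb's code: the codecs are stand-ins from the Python standard library (ZFS uses lzjb, gzip, zle and friends), and `guess_decompress` and `expected_size` are names made up for the example.

```python
import bz2
import lzma
import zlib

# Stand-in codecs from the Python standard library. They only illustrate the
# control flow; they are not the algorithms ZFS uses.
CANDIDATES = [
    ("zlib", zlib.decompress),
    ("bz2", bz2.decompress),
    ("lzma", lzma.decompress),
]


def guess_decompress(raw, expected_size=None):
    """Try each candidate in turn and return (name, data) for the first success.

    If expected_size is given, a result of any other length is rejected.
    Returns (None, raw) when no candidate accepts the input.
    """
    for name, decompress in CANDIDATES:
        try:
            data = decompress(raw)
        except Exception:
            # This codec rejected the input; move on to the next one.
            continue
        if expected_size is not None and len(data) != expected_size:
            continue
        return name, data
    return None, raw
```

For example, `guess_decompress(bz2.compress(b"uberblock" * 10))` should come back tagged as `"bz2"` with the original bytes. Checking the result against a known size is one way to guard against a codec that happens to accept the wrong input.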
As for my mdb changes, I really think mdb should be able to pick up kernel CTF info so that it can print data structures on disk. That I could probably get working on illumos fairly easily.
My method used zdb to get the data uncompressed, then used mdb to print it out à la ::print. I actually think something like "offset::zprint [decompression] type" in mdb is the way to go. It would mean no need for zdb, which usually gives too much or not enough, and is not interactive (hence, not really a good debugger as far as I'm concerned). Better would be:
# mdb -z poolname
20000::walk uberblock | ::print -t uberblock_t
And from there, something like:
offset::zprint lzjb objset_phys_t
where offset comes from a DVA displayed in the uberblock_t.
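Until something like that exists, the DVA-to-zdb step can at least be scripted. A small Python sketch, assuming the DVA is printed as `<vdev:offset:size>` with the vdev in decimal and the offset and size in hex; `zdb_read_spec` is a made-up helper name, and it only builds the argument string for the `zdb -R` syntax mentioned above, it does not run zdb.

```python
import re

# Assumed input shape: "<vdev:offset:size>", vdev in decimal, offset and size
# in hex. Adjust the pattern if your output looks different.
DVA_RE = re.compile(r"<(\d+):([0-9a-fA-F]+):([0-9a-fA-F]+)>")


def zdb_read_spec(dva_text, decompress=True):
    """Turn a printed DVA into a 'vdev:offset:size[:d]' argument for zdb -R."""
    m = DVA_RE.search(dva_text)
    if m is None:
        raise ValueError(f"no DVA found in {dva_text!r}")
    vdev = int(m.group(1))
    offset = int(m.group(2), 16)
    size = int(m.group(3), 16)
    spec = f"{vdev}:{offset:x}:{size:x}"
    # The trailing "d" asks zdb to decompress, as described above.
    return spec + ":d" if decompress else spec
```

So `zdb_read_spec("DVA[0]=<0:1C000:200>")` gives `0:1c000:200:d`, to be pasted after `zdb -R poolname`. One caveat: the size in a DVA is the allocated size, which may not be the right size to pass in every case.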
Some people seem to get my idea and think it's good. Others either don't get it, or don't care.
Someone like Delphix might really like it.
[1] http://lists.freebsd.org/pipermail/freebsd-fs/2012-November/...
[2] http://mbruning.blogspot.com/2009/12/zfs-raidz-data-walk.htm...