Why would you need to do any of that? RAID6 can tolerate 2 drive failures, and Linux will tell you which drive is bad. Just slide the pod out and replace the drive: no data lost, very little downtime.
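
For the software-RAID case, a failed member shows up marked "(F)" in /proc/mdstat, so pulling the failed device names is a few lines. A minimal sketch (assuming a Linux mdadm array, which the article doesn't spell out):

    import re

    # Scan /proc/mdstat for member devices the kernel has marked failed;
    # they appear with an "(F)" suffix, e.g. "sdc1[2](F)".
    def failed_md_members(path="/proc/mdstat"):
        failed = []
        with open(path) as f:
            for line in f:
                for dev, flag in re.findall(r"(\w+)\[\d+\](\(F\))?", line):
                    if flag:
                        failed.append("/dev/" + dev)
        return failed

    print(failed_md_members())   # e.g. ['/dev/sdc1'] once a drive is kicked out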



Three drive failures? My question is how you practically determine which drive to swap. I don't see any labels or anything. Also, I read that the current version supports rails; the one in the article looks bolted to the rack. The article has no date on it, and it sounds like a lot of the issues have since been addressed.


Brian from Backblaze here.

> how do you determine which drive to swap

Every two minutes we query every drive (all 45 drives inside the pod) with the built-in Linux smartctl command. We send this information to a kind of monitoring server, so even if the pod dies entirely we know everything about the health of the disks inside it up until two minutes earlier. We keep the history for a few weeks (the amount of data is tiny by our standards).
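
That kind of poller can be tiny. A rough sketch of the shape of it (not our actual code; the monitoring URL and the loop are just illustrative):

    import glob, json, subprocess, time, urllib.request

    MONITOR_URL = "http://monitor.example.com/report"   # hypothetical endpoint

    def poll_once():
        report = {}
        for dev in sorted(glob.glob("/dev/sd*")):
            if dev[-1].isdigit():            # skip partitions, keep whole disks
                continue
            out = subprocess.run(["smartctl", "-a", dev],
                                 capture_output=True, text=True)
            report[dev] = {"rc": out.returncode, "smart": out.stdout}
        req = urllib.request.Request(
            MONITOR_URL, data=json.dumps(report).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)          # ship the snapshot to the monitor

    while True:
        poll_once()
        time.sleep(120)                      # every two minutes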

Then, when one of the drives stops responding, several things occur: 1) we put the entire pod into a "read only" mode where no more customer data is written (this lowers the risk of further failures), 2) a friendly web interface tells the datacenter techs which drive to replace, and 3) an email alert goes out to whoever is on call.
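
The handler on the monitoring side is conceptually this simple (a sketch only; the addresses are made up, and the two helper hooks stand in for the real storage-layer and dashboard plumbing):

    import smtplib
    from email.message import EmailMessage

    ONCALL_ADDR = "oncall@example.com"          # hypothetical address

    def set_pod_read_only(pod):
        # placeholder: flip a flag the storage layer checks before
        # accepting new customer writes on this pod
        print(f"{pod}: new writes disabled")

    def flag_for_replacement(pod, dev, serial):
        # placeholder: surface the bad slot in the repair dashboard
        print(f"{pod}: replace {dev}, serial {serial}")

    def handle_dead_drive(pod, dev, serial):
        set_pod_read_only(pod)                  # 1) lower the risk of further failures
        flag_for_replacement(pod, dev, serial)  # 2) tell the techs which drive to pull
        msg = EmailMessage()                    # 3) page whoever is on call
        msg["Subject"] = f"[{pod}] {dev} (SN {serial}) not responding"
        msg["From"], msg["To"] = "monitor@example.com", ONCALL_ADDR
        msg.set_content("Pod is read-only; swap the drive with the serial above.")
        with smtplib.SMTP("localhost") as s:
            s.send_message(msg)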

Each drive has a name like /dev/sda1, and these device names map to the same physical location in the pod reproducibly, every time. In addition, (before it disappeared) the drive also reported a serial number like "WD-WCAU45178029" through smartctl, which is ALSO PRINTED ON THE OUTSIDE OF THE DRIVE.
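
Concretely, that lookup is just reading the "Serial Number" field from smartctl's identify output (small sketch, assuming smartmontools is installed):

    import subprocess

    def drive_serial(dev):
        # smartctl -i prints the drive's identity block, including the same
        # serial number that is printed on the label outside the drive
        out = subprocess.run(["smartctl", "-i", dev],
                             capture_output=True, text=True)
        for line in out.stdout.splitlines():
            if line.startswith("Serial Number:"):
                return line.split(":", 1)[1].strip()
        return None

    print(drive_serial("/dev/sda"))   # e.g. "WD-WCAU45178029"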

TL;DR - it's easy to swap the correct drive based on external serial numbers. :-)


Ok, thanks for the info. That doesn't sound too bad.


Wouldn't the SATA controller tell you which port the bad drive was on, and then you'd presumably have some standard mapping from ports to physical locations in the case?
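
On Linux the kernel already exposes that mapping as /dev/disk/by-path symlinks, whose names encode the controller and port, so with a fixed wiring layout it's a table lookup. A sketch (the exact path strings depend on the hardware):

    import glob, os

    # Each /dev/disk/by-path entry is a symlink whose name encodes the PCI
    # address and ATA/SATA port, e.g. "pci-0000:00:1f.2-ata-3" -> /dev/sdc.
    def port_map():
        mapping = {}
        for link in glob.glob("/dev/disk/by-path/*"):
            if "-part" in link:
                continue                            # skip partition entries
            mapping[os.path.realpath(link)] = os.path.basename(link)
        return mapping

    for dev, port in sorted(port_map().items()):
        print(dev, "->", port)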


Yes, I'm not saying it's not doable. It just seems error-prone and time-intensive to replace a drive.


I don't understand why they're using RAID6 instead of file-level replication/integrity. They're already running an application on top of it - do the replication there and skip replacing disks...


I don't think they're doing replication at all; I don't see it mentioned. RAID6 is cheaper than replicating the files, which would mean twice as many servers.


But you'll need some level of multi-data-center durability (or at least across racks), so you'll want to replicate user content anyway. Otherwise a dead server could prevent a restore.


I see no mention that they do that; clearly they're trying to make this as cheap as possible. If the server is down, people can wait for their recovery. Many tape-based backup systems require you to wait 30 minutes or more to get the backup, which covers most outages. Even waiting a day for a recovery isn't the end of the world, especially given that very few recoveries are being done and big recoveries require shipping a drive, which takes days anyway.

The worst case, of course, is that you actually lose your data, probably as the result of a data center fire/explosion, though people should, in most cases, still have the primary copy of the data on their machines. However, re-backing all of that up would take a long time.


You don't run a serious data storage service on just RAID6.

Data centres do fail, have fires, etc.


It's backup, not primary storage. If they were going to spend twice the cost, do you think there would be at least a single mention of it somewhere? Anywhere? (I couldn't find one.)

This outage post makes no mention of a backup datacenter and they say backups were halted as a result of this outage: https://www.backblaze.com/blog/data-center-outage/



