It's an old post with old pricing; I simply don't see vendors pricing close to their system (assuming you're looking at the fourth iteration), especially not including support.
You're suggesting something that costs $30K minimum. Given the number of vaults they have, they can afford a lot of additional pairs of hands for that saving.
Plus, the software has inherent value; it isn't purely a cost centre as you imply. The value of Backblaze is partly built on that very same software you scoff at.
I'm not scoffing; it's a cost. The only reason to do a Backblaze-style build is to save cash.
They are using really wide RAID6 stripes with horrific rebuild times, and no hot-swap ability.
Put another way: with 100 pods you're going to be losing 10 drives a week (on average, sometimes more), and it takes 48 hours to rebuild an array (although I've heard it can be upwards of 72 hours, even without load).
So every week you have to power down up to 10% of your storage to make sure you don't lose future data. Or you can wait till you've exhausted all your hot spares, but then you're in the poo if you lose a disk on a power cycle. (Yes, RAID6 allows two disk failures, but what's to say you won't lose two disks? As I said, they are using large stripes (24?), so there is a high chance of a RAID rebuild failing.)
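Rough numbers, for what it's worth (a back-of-the-envelope sketch; the AFR, pod size and rebuild window are my assumptions, not Backblaze's figures):

```python
# Back-of-the-envelope drive failure math. All inputs are assumed
# illustrative values, not anyone's published numbers.
drives_per_pod = 45
pods = 100
afr = 0.05                      # assumed 5% annual failure rate per drive

fleet = drives_per_pod * pods
weekly_failures = fleet * afr / 52
print(f"expected failures per week: {weekly_failures:.1f}")
# ~4.3/week at 5% AFR; "10 a week" implies roughly a 12% AFR

# Chance another drive in the same wide stripe dies during one rebuild.
stripe_width = 24               # the "(24?)" stripe size mentioned above
rebuild_hours = 72
hourly_rate = afr / (365 * 24)
p_second = 1 - (1 - hourly_rate) ** (rebuild_hours * (stripe_width - 1))
print(f"P(second failure during rebuild): {p_second:.2%}")
# Under 1% per rebuild at nominal rates; the real danger is that failure
# rates spike under rebuild load, as this very thread describes below.
```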
Hot swap is awesome. Running low on hot spares? Sure, replace them online and walk away happy.
To make a Backblaze-style setup work you need to keep more data in tier 1 storage, which means higher cost. (Even if you use a software layer over the top to do FEC, you still need at least a third more disks; see the sketch below.)
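To see where the "1/3rd more disks" comes from, here's a sketch with assumed Reed-Solomon-style k data + m parity layouts; none of these are any specific product's defaults:

```python
# Raw-capacity overhead of k-data + m-parity erasure coding. The layouts
# below are assumed examples, not a particular product's configuration.
def overhead(k: int, m: int) -> float:
    """Extra raw disks needed per disk of usable data."""
    return m / k

for k, m in [(9, 3), (10, 4), (6, 3)]:
    print(f"{k}+{m}: {overhead(k, m):.0%} extra raw capacity, "
          f"tolerates {m} drive failures")
# 9+3 -> 33% extra; any layout tolerating 3+ failures lands near 1/3 or more
```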
Then there is the woeful performance.
If you want JBOD with large scalability, just look at GPFS. The new declustered RAID is awesome, plus you get a single namespace over zettabytes of storage, with HSM so you can spin cold blocks out to slower, cheaper storage (or tape).
It blows everything else out of the water: Ceph, Gluster, Lustre, GFS2 & Hadoop. Fast, scalable, cross-platform and real POSIX, so a real filesystem which you can back up easily.
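Here's a first-order sketch of why declustered rebuilds finish so much faster; all numbers are illustrative assumptions, and real rebuilds under load run far slower, as the 48-72 hour figures above show:

```python
# Why declustered RAID rebuilds faster, to first order: a conventional
# rebuild funnels the whole reconstruction into one spare drive, while a
# declustered layout spreads it across every drive in the group.
# All figures below are illustrative assumptions.
drive_tb = 4
write_mbps = 120                 # assumed sustained MB/s of one spindle
group_size = 58                  # assumed drives in a declustered array

conventional_hours = drive_tb * 1e6 / write_mbps / 3600
declustered_hours = conventional_hours / (group_size - 1)
print(f"conventional: {conventional_hours:.0f} h at full sequential speed, "
      f"declustered: {declustered_hours * 60:.0f} min")
```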
> They are using really wide RAID6 stripes, ... no hot-swap ability.
SATA drives are inherently hot-swappable; you can hot swap drives in a Backblaze Storage Pod if you want to. In our datacenter, for our particular application, we just choose not to. We shut down the storage pod, calmly swap any drives that failed, then boot it back up. The pod is "offline" for about 12 minutes or thereabouts. The RAID sync then happens while the pod is live, and the pod can be actively servicing requests (like restoring customer files) while the sync occurs.
Interesting. Do you find that powering up causes more disks to pop (with the change in temperature)?
How often do you see multi-drive failures (in the same LUN)?
We've just passed a peak of drive failures (we had a power outage) with multi-drive failures in the same LUN. The awesome part is that each rebuild caused more drives to fail. I think one array swallowed 5 disks in two weeks. It didn't help that the array was being hammered as well as rebuilding (70-hour rebuilds at one point).
Anytime we reboot a pod, it runs a "significant" risk of not coming back up for a variety of reasons, and if a pod is powered off for several hours it is even worse. My completely unfounded theory is that cooling off and heating back up is bad; with short power-downs, things don't really cool off as much.
> How often do you see multi-drive failures (in the same LUN)?
It happens, usually for a reason. Most recently, a bunch of drives of one model all seemed to fail very close together after running for 2 years; it's like some part wore out like clockwork. We are pretty relaxed about 1 drive popping: nobody gets out of bed and runs to the datacenter for that, it can wait until the next morning (the pods instantly and automatically put themselves in a state where customers can prepare restores, but we stop writing NEW data to them to lighten the load on the RAID arrays). So if 2 drives fail within 24 hours on a Sunday, then Monday morning we go fix it.
> each rebuild caused more drives to fail.
Exact same experience here; we feel your pain. We even have a procedure where we line up all brand-new drives, clone each individual original drive to a new drive, then take all the new drives, insert them back into the original pod, AND THEN run the RAID rebuild.
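Conceptually the clone step is just a block copy that refuses to die on bad sectors. A hypothetical sketch below, not our actual tooling; in practice ordinary imaging tools like dd or ddrescue do this job, and the device names are made up:

```python
# Sketch of cloning a flaky source drive onto a fresh one before a RAID
# rebuild: copy block by block, zero-filling unreadable regions instead of
# aborting. Hypothetical illustration only; use dd/ddrescue for real work.
import os

BLOCK = 1024 * 1024  # copy in 1 MiB chunks

def clone(src_path: str, dst_path: str) -> None:
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    size = os.lseek(src, 0, os.SEEK_END)   # total bytes on the source
    offset, bad = 0, 0
    while offset < size:
        length = min(BLOCK, size - offset)
        try:
            os.lseek(src, offset, os.SEEK_SET)
            chunk = os.read(src, length)
            if not chunk:                  # unexpected EOF, stop
                break
        except OSError:                    # unreadable region: zero-fill it
            chunk = b"\x00" * length
            bad += 1
        os.lseek(dst, offset, os.SEEK_SET)
        os.write(dst, chunk)
        offset += len(chunk)
    os.close(src)
    os.close(dst)
    print(f"cloned {offset} bytes, zero-filled {bad} bad chunks")

# clone("/dev/sdX", "/dev/sdY")   # hypothetical device names
```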