
Was shocked to see that your durability (27 9s) was so much higher than what S3 claims (11 9s), while you also charge less for storage and bandwidth than S3 does. Amazing!

I would be curious to see how much of your verification architecture is shared with someone like Backblaze, but I assume the only way to learn that would be to work for both companies to compare :)

Would Dropbox ever share hard drive reliability stats similar to what Backblaze does?

Sorry for all the questions in one comment! Not affiliated with either Dropbox or Backblaze, just an infrastructure wonk.




27 9s is literally higher than my confidence that human civilization will be here in the next second. 10^27 seconds is about 32 quintillion years. Extinction events occur with a much higher frequency than that.

No criticism of Dropbox here; they know that number is just math games, too. I'm just putting some numbers on how true that is, because I find tossing around big numbers like this, as if they mean something, to be a bizarre, nerdy sort of fun.


Exactly. They say they are using some variant of Reed-Solomon erasure coding. If you did something like K=100 and N=150 and stored all of the shards on different disks, the probability you lose data is the probability that more than 50 hard drives fail before you can repair the lost shards.

If I am reading the article correctly, they claim that they should usually be able to repair in less than an hour in the case of disk failure.

Thus the probability of losing more than 50 (or whatever their N-K value is) disks within an hour is how you get 27 nines of durability.
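
Back-of-the-envelope, if you assume independent failures (a big assumption) and make up numbers for the annualized failure rate and repair window -- none of these are Dropbox's actual parameters:

    from math import comb

    def group_loss_probability(n=150, k=100, afr=0.05, repair_hours=1.0):
        """Probability that one erasure-coded group loses data: more than
        n-k of its n disks fail within a single repair window. Assumes
        independent failures and a constant annualized failure rate (AFR)."""
        # Chance a given disk dies during the repair window.
        p = afr * repair_hours / (365 * 24)
        # Data is lost only once at least n-k+1 shards are gone.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(n - k + 1, n + 1))

    print(group_loss_probability())  # astronomically small under these assumptions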

Of course, the probability that one of your software engineers introduces a durability bug is WAY more likely than those disks experiencing a coordinated failure.

Or say, the probability that a terrorist organization targets your datacenters. Even if those odds are one in a billion, that's still not even close to 27 nines.


For sure. I hope we're all agreeing here :)

We're very strong believers that an effective replication strategy is just table stakes, and that from there the real risks to durability are the "black swan" events that are much harder to model.

I gave a talk at Data@Scale recently whose main premise is "Durability Theater" and how to combat it in a production storage system. In case you're interested: https://code.facebook.com/posts/253562281667886/data-scale-j...


Yeah, to be clear, my point from the article is that even though you can model an incredibly high durability number in theory, there's a lot more work required to see that in practice. I'm sure the S3 team also has extremely high internal durability numbers.

A lot of other factors go into an end-to-end durability number, including the possibility of software bugs, correlated failures, etc. Magic Pocket has an external durability claim of 12 9s, which gives us a lot of headroom between the external figure and the internal model.

Fortunately, both 11 and 12 nines are close enough to infinite as far as clients are concerned.


Thanks for taking the time to reply, James!


S3's durability numbers are for a single region (since they were quoting that number before they supported cross-region replication) -- so you'd get much better durability if you mirror your objects across regions. Which, apparently, is what Dropbox does.

Though if you require such high durability, you're better off with multiple providers in multiple countries: at 11 9's, it probably becomes more likely that a provider will go out of business, or that a civil disturbance will make your data unavailable, than that they will lose it.


To reply to your first paragraph: Dropbox is providing multi-region durability for free compared to S3. Point to Dropbox.

I agree with your second paragraph in its entirety.


From what I hear from other major storage providers, Magic Pocket is definitely very high up on the paranoia/verification scale. There is of course a cost to all this verification traffic. Any sensible storage provider will be doing stuff like disk scrubbing and verification scans on their storage index, however.
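
For the curious, scrubbing more or less boils down to a loop like this (toy sketch with a made-up block-store interface, not our actual code):

    import hashlib

    def scrub(block_store):
        """Re-read every block at rest and verify it against its stored
        checksum; anything that no longer matches gets queued for repair
        from other replicas/shards before further failures make it
        unrecoverable. `block_store` is a hypothetical interface."""
        for block_id in block_store.list_blocks():
            data = block_store.read(block_id)
            expected = block_store.stored_checksum(block_id)
            if hashlib.sha256(data).hexdigest() != expected:
                block_store.schedule_repair(block_id)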

We haven't shared drive reliability stats before but I don't think we have any objection to doing so. Don't expect these in the next month or two but we'll likely get to this in the not-too-distant future.


> We haven't shared drive reliability stats before but I don't think we have any objection to doing so. Don't expect these in the next month or two but we'll likely get to this in the not-too-distant future.

Thanks so much, James!


>> Was shocked to see that your durability (27 9s) was so much higher than what S3 claims (11 9s)

> 27 9s is literally higher than my confidence that human civilization will be here in the next second. 10^27 seconds is about 32 quintillion years. Extinction events occur with a much higher frequency than that.

Without reading the article, I'm assuming this is the uptime probability they promise - so (1.0/10^9) * 365.25 * 24 * 3600 * 300 ~ 1 second of unavailability every 30 years (AFAIK Amazon is pretty far in the "red" on this one - they've had a few outages?) [ed: initially I was off by a factor of 100 (and 10 in error...) because of the two 9s before the comma - the "complement" (1 - p) of 11 9s is 1/10^9, not 1/10^11].

11 9s is already effectively the same as "will never go down in a way customers will notice" (actually a second every 30 years per region isn't entirely insignificant, only almost insignificant). A higher guarantee does indeed seem silly. It's probably much more likely that we'll see annihilation by global thermonuclear war (for example), in which case I'm guessing the data would go offline for a while -- so such a number is meaningless.


That's not what they are promising; that would be insane. Please just read the article.


Point well taken - I was confused by the comments about how long 10^27 seconds is, and by the "9s" nomenclature, which is often used to measure uptime.

On the other hand, a Dropbox engineer upthread just claimed that their service has an external [ed2: durability] of 11 to 12 9s. So it does seem that they effectively claim that a block will practically never be unavailable due to being impossible to read from (any) disk [ed2: ie, unavailable due to failed durability]?

I do wonder a bit at the cost of padding redundancy up to such a high number. They don't mention block size, other than to say that 1 GB is filled with actual blocks. Let's say it's 1 MB, and they target 1 billion users, averaging 100 GB of data stored. That's 10⁹ users storing 10² GB each with 10³ blocks per GB, or 10⁹⁺²⁺³ = 10¹⁴ blocks of data. That still leaves a lot of margin - and effectively the storage part of the system should never be the weakest link.

[ed: and that some users are likely to lose some data, if the 11 to 12 9s figure is to be taken "per block". But maybe it's per user? It seems unlikely that they really mean 11 9s of availability full stop]

[ed2: Snuck in an extra "availability" where I meant "durability", also rendering this second comment nonsensical...]
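
To make the back-of-the-envelope concrete (all inputs here are my guesses above, not real Dropbox figures, and reading the durability number as an annual per-block loss probability is just one possible interpretation):

    # Guesses from above: 1 MB blocks, 1e9 users, 100 GB each.
    users = 10**9
    gb_per_user = 100
    blocks_per_gb = 1000                                  # 1 GB / 1 MB
    total_blocks = users * gb_per_user * blocks_per_gb    # 1e14 blocks

    # If 12 9s were an annual per-block loss probability (one reading,
    # not necessarily the intended one), the expected number of blocks
    # lost per year across the whole fleet would be:
    p_loss_per_block = 1e-12
    print(total_blocks * p_loss_per_block)                # ~100 blocks/year in expectation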


You are confusing availability ("uptime") with durability ("losing data").

I could make a system that was only available for 1 minute out of every 10 minutes (10% availability) but never lost a single file (100% durability).

I could also make a system that was never down (100% availability) but would randomly lose 1 in 10 files (90% durability).

Common causes of availability issues are power outages, network outages (fiber cuts, etc.), and DNS issues.

Common causes for durability issues are software bugs, entire datacenter losses (e.g. earthquake destroys everything), or perhaps coordinated power outages if the system buffers things in volatile memory.
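
To put rough numbers on the difference (purely illustrative figures):

    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    # Availability: what fraction of the time you can reach the data.
    availability = 0.9999                      # "four nines" of uptime, as an example
    print((1 - availability) * SECONDS_PER_YEAR)   # ~3156 s (~53 min) of downtime/year

    # Durability: what fraction of stored objects survive the year.
    durability = 0.99999999999                 # "eleven nines", the figure S3 quotes
    objects = 10**7
    print((1 - durability) * objects)          # ~0.0001 expected lost objects/year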



