
Not adding that resiliency isn't the answer though - it just means known failures will get you. Is that better than the unknown failures introduced by your mitigation? I cannot answer that.

I can tell you 100% that eventually a disk will fail. I can tell you 100% that eventually the power will go out. I can tell you 100% that even if you have a computer with redundant power supplies, each connected to a separate grid, eventually both power supplies will fail at the same time - it will just happen a lot less often than with a regular computer not on any redundant/backup power. I can tell you that network cables do break from time to time. I can tell you that buildings are vulnerable to earthquakes, fires, floods, tornadoes, and other such disasters. I can tell you that software is not perfect and eventually crashes. I can tell you that upgrades are hard if any protocol changed. I can tell you there is a long list of other known disasters that I didn't list, but a little research will uncover them.

I could look up the odds of each of the above. In turn this allows weighing the cost of each mitigation against the likely cost of not mitigating - but this is only statistical: you may decide something statistically cannot happen, and it happens anyway.
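That weighing can be sketched as a back-of-the-envelope expected-value comparison. This is a minimal illustration, assuming made-up failure rates and costs - the function names and every number below are hypothetical, not real statistics:

```python
def expected_annual_loss(annual_failure_probability: float,
                         cost_per_incident: float) -> float:
    """Expected yearly loss from a failure mode left unmitigated."""
    return annual_failure_probability * cost_per_incident


def mitigation_worth_it(annual_failure_probability: float,
                        cost_per_incident: float,
                        annual_mitigation_cost: float,
                        residual_probability: float) -> bool:
    """True if paying for the mitigation beats eating the expected loss.

    residual_probability is the failure rate that remains even with the
    mitigation in place (e.g. both mirrored disks dying before a rebuild).
    """
    unmitigated = expected_annual_loss(annual_failure_probability,
                                       cost_per_incident)
    mitigated = (expected_annual_loss(residual_probability, cost_per_incident)
                 + annual_mitigation_cost)
    return mitigated < unmitigated


# Hypothetical single-disk failure: 5%/year, $20k outage cost,
# $500/year for mirroring, 0.2%/year residual risk.
print(mitigation_worth_it(0.05, 20_000, 500, 0.002))    # True
# Hypothetical very rare failure: mitigation costs more than it saves.
print(mitigation_worth_it(0.0001, 20_000, 500, 0.00001))  # False
```

The point of the last paragraph still holds: these are expectations over many trials, and a single unlucky outcome can ignore the statistics entirely.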

What I cannot tell you is how much you should mitigate. There is a cost to each mitigation that needs to be compared to its value.



Absolutely yeah, these things are hard enough to test in a controlled environment with a single app (e.g. FoundationDB) but practically impossible to test fully in a microservices architecture. It's so nice to have this complexity managed for you in the storage layer.



