Wow, you expect Fog Creek to be down for that long when it does go down? Why can't you just scp everything on there to your other DC? Or at least just move the hard drives? Seems like it would cost less time.
Ha, I wouldn't worry too much about Kiln. I can push when the servers come back. Heck, considering how often some folks I know of push their changesets I wouldn't be surprised if a lot of them never even notice that anything happened.
FogBugz is more of a problem. Some days the "Resolve" button is my only source of job satisfaction.
>4 TB of data? I didn't realize Kiln's gotten that big already. When this whole thing is sorted out, @gecko @kevingessner - would love to see a "State of the Kiln" post and some stats!
Yes, we have multiple off-site backups (cloud as well as an off-site storage DC) for all customer data. All data is still safe in NYC -- we've just brought down service to prevent problems in case of an abrupt power failure.
Sure, but that invalidates the complaint of it being infeasible to scp out 4 TB of data before your NYC DC runs out of power. Those 4 TB of data are already out of NYC, safely in some other DC where you have hardware. You just need new servers/VMs in/near that DC to restore the backups to.
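For a rough sense of why pushing 4 TB over the wire against a deadline is a stretch, here's a back-of-the-envelope sketch. The link speeds and the 80% efficiency factor are my own assumptions for illustration; nobody in this thread has said what uplink the NYC DC actually has.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to copy data_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable after protocol overhead and contention.
    """
    total_bits = data_tb * 1e12 * 8              # decimal terabytes to bits
    usable_bps = link_mbps * 1e6 * efficiency    # usable bits per second
    return total_bits / usable_bps / 3600        # seconds to hours


if __name__ == "__main__":
    # Hypothetical uplink speeds, not Fog Creek's real numbers.
    for mbps in (100, 1000, 10000):
        print(f"{mbps:>6} Mbit/s: {transfer_hours(4, mbps):.1f} h")
```

Under those assumptions, 4 TB works out to roughly 111 hours at 100 Mbit/s, about 11 hours at 1 Gbit/s, and about 1 hour at 10 Gbit/s, so whether a straight copy is feasible depends almost entirely on the uplink.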
I'm not trying to second-guess your ops team, but the whole point of having off-site backups is to facilitate your RTO plan in case you lose your primary DC with no warning. I guess I'd be surprised if you don't have a < 24 hour RTO plan in place. With how quickly you can get VMs and even dedicated servers provisioned by many hosting providers (minutes to a couple of hours), the idea of physically moving servers off-site into new racks, with new networking, etc... seems kinda nutty...
I can't imagine that relocating an entire environment for a multitude of applications and services is as simple as scping things over. I'm sure there is a process wherein scp could be a step, but I don't see it being any easier/faster.
Moving the hard drives could be an option, and I believe it is sometimes done, but it assumes there are empty boxes on the other end waiting to receive the hard drives in a configuration similar to the one they came from. There are also separate issues depending on how many drives they are dealing with and what redundancy is involved. If it's very few, then you might as well move the server outright. If it's very many, then there's extra human overhead (and room for error) in keeping the drives together.