Instant Elasticity in Cloud is a myth. If you think you are going to get 1k host...

easton · on Nov 18, 2020

How early is "way early"? Because as long as it's shorter than the two to three weeks it'd take to order boxes, rack them, provision them (which would be automated but might still take an afternoon), deal with any QA hiccups... I'd much rather call my AWS rep and say "can we add 30% by Thursday" and have them figure it out (and at such a large scale you might be able to spread it out across a couple of regions anyway, unless you only serve a specific part of the world).

thor24 · on Nov 19, 2020

From what I have seen, it is actually of the same order, or sometimes more. In one region/zone we add a few hundred hosts every week, but that is only after telling them we plan to scale up in that region to some big number X.

perfectspiral · on Nov 18, 2020

"Instant elasticity in cloud is a myth"

This times a million. I think SQS standard queues are probably the only thing that, IME, actually fulfills that promise.

blackaspen · on Nov 18, 2020

This is the same with disaster recovery too. The idea that "oh, our main DC went down, we'll just spin it up in another region" is great until you realize that means you need reserved instances in another zone that, just like another physical DC, you won't be using.

echelon · on Nov 18, 2020

Why not go fully on-prem then? You can run Kubernetes locally.

Are managed data stores that attractive? You can pay for on-prem management.

What workloads are in the cloud versus on-prem?

thor24 · on Nov 19, 2020

Right now there is no specific distinction between what we want to run in the cloud vs. on-prem. An important thing to note here is that we use the cloud as IaaS only. We have our own stack, which sort of prepares the hosts before they are ingested into clusters as usable capacity.

We actually recommend against using cloud providers' custom databases or any other value-added services.

Why we didn't go completely one way or the other (on-prem vs. cloud) was decided way before I joined the group, but I think the main reason is to have a tactical edge in the long run by avoiding lock-in. I guess in some ways it also helps us negotiate pricing better.

Imagine moving a certain workload from a GCP region to an AWS region as part of a failover drill.

spullara · on Nov 19, 2020

Since these are generally scheduled events (end of year, end of quarter, etc.), they can be planned. Beats owning the machines.