Instant elasticity in the cloud is a myth. If you think you are going to get 1k hosts from AWS just like that, you will have an unpleasant experience.
I work at a decent-size tech company, and we are split between cloud and on-prem. In our experience, you have to tell AWS/GCP in advance (sometimes well in advance) if you want to meaningfully increase capacity in a zone/region.
Sure, autoscaling a few hundred hosts may be possible, but people who run a service that needs a few hundred hosts usually don't run it directly on AWS instances. They run it on some kind of scheduler + resource manager, which keeps an operational buffer anyway (i.e., you already have those hosts, so cloud elasticity is not a factor here).
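To make the operational-buffer point concrete, here's a minimal sketch. The `Cluster` class, its fields, and the "keep N% of hosts idle" policy are hypothetical and don't describe any particular scheduler. It only shows that a scale-up that fits in already-provisioned idle hosts never touches cloud elasticity, and how many hosts you'd have to go ask the provider for to refill the buffer afterwards.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical model of a scheduler-managed pool of pre-provisioned hosts."""
    total_hosts: int      # hosts already ingested into the cluster
    allocated_hosts: int  # hosts currently handed out to services
    buffer_pct: int       # assumed policy: keep this % of total hosts idle

    def __post_init__(self) -> None:
        if not 0 <= self.buffer_pct < 100:
            raise ValueError("buffer_pct must be in [0, 100)")

    def free_hosts(self) -> int:
        return self.total_hosts - self.allocated_hosts

    def can_absorb(self, extra_hosts: int) -> bool:
        # A scale-up that fits in idle hosts needs nothing from the cloud provider.
        return extra_hosts <= self.free_hosts()

    def hosts_to_request(self, extra_hosts: int) -> int:
        # Smallest total T with T - allocated >= buffer_pct% of T after the
        # scale-up, i.e. T >= 100 * allocated / (100 - buffer_pct).
        # Integer ceiling division avoids floating-point rounding surprises.
        allocated = self.allocated_hosts + extra_hosts
        needed_total = -(-100 * allocated // (100 - self.buffer_pct))
        return max(0, needed_total - self.total_hosts)


c = Cluster(total_hosts=1000, allocated_hosts=800, buffer_pct=10)
print(c.can_absorb(150), c.hosts_to_request(150))  # True 56
```

The 150-host bump is served immediately from the buffer; the 56 hosts are what you'd need to order (and, per the above, possibly pre-announce) to get back to a 10% cushion.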
How early is "way early"? As long as it's shorter than the two to three weeks it would take to order boxes, rack them, provision them (which would be automated but might still take an afternoon), and deal with any QA hiccups, I'd much rather call my AWS rep, say "can we add 30% by Thursday?", and have them figure it out. At that scale you might be able to spread it across a couple of regions anyway, unless you only serve a specific part of the world.
From what I have seen, it's actually of the same order, sometimes longer. In one region/zone we add a few hundred hosts every week, but only after telling them we plan to scale up in that region to some big number X.
The same goes for disaster recovery. The idea of "oh, our main DC went down, we'll just spin it up in another region" is great until you realize it means you need reserved instances in another zone that, just like a second physical DC, you won't be using.
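A back-of-the-envelope sketch of why that standby capacity hurts. The function names, the "survive_pct" knob, and the 730-hour month are illustrative assumptions, not anyone's real DR policy: if you want a given share of the primary fleet to come up elsewhere after a full outage, that many hosts have to be reserved and paid for every hour, even though they only do work during a failover or drill.

```python
def standby_hosts(primary_hosts: int, survive_pct: int) -> int:
    """Hosts to keep reserved in the DR zone so that survive_pct% of the
    primary fleet can be brought up there (rounded up)."""
    return -(-primary_hosts * survive_pct // 100)


def idle_host_hours(standby: int, hours: int = 730, failover_hours: int = 0) -> int:
    """Reserved host-hours paid for but not used in a period.
    730 is an assumed average month length in hours."""
    return standby * (hours - failover_hours)


# 1000-host primary, full failover target, one 4-hour drill this month
s = standby_hosts(1000, 100)
print(s, idle_host_hours(s, failover_hours=4))  # 1000 726000
```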
Right now there is no specific distinction between what we run in the cloud vs on-prem. The important thing to note is that we use the cloud only as IaaS. We have our own stack that prepares the hosts before they are ingested into clusters as usable capacity.
We actually recommend against using the cloud providers' custom databases or any other value-added services.
Why we didn't go completely one way or the other (on-prem vs cloud) was decided long before I joined the group, but I think the main reason is to keep a tactical edge in the long run by avoiding lock-in. I guess in some ways it also helps us negotiate better pricing.
Imagine moving a workload from a GCP region to an AWS region as part of a failover drill.