I'm not in the web or cloud business, but I've filled a rack with my stuff before. My impression is that hardware has become a lot more capable even relative to its tasks. With high-IOPS storage, many cores, and obscene amounts of RAM, I would expect that companies of a much larger scale (in $, FTEs, or most other metrics) could be served by one 4U machine, or by one rack, or by one room. Thus I would expect the knowledge of how to handle 5000 hard drives to become more obscure, naturally, but the skill to run a decently sized web application to remain almost constant.
Does this math work out, or have the tasks become more demanding at the same speed that hardware has improved?
A bank running 50 different services, on different platforms, with serious audit requirements, physical and logical access control, strict change and configuration management, etc., has two orders of magnitude more complexity. And that shit is very expensive in manpower.
There are now businesses that explicitly depend on the elasticity of the cloud and can never really be moved on premise without a massive up-front investment in hardware that may only be used a few times a year for their biggest customers. Trying to hybridize these workloads hasn't been very successful as of yet. It is possible that K8s could relieve this problem, but I haven't seen it in practice at scale.
Instant elasticity in the cloud is a myth. If you think you are going to get 1k hosts from AWS just like that, you will have an unpleasant experience.
I work at a decent-size tech company and we are split between cloud and on-prem. In our experience you have to inform AWS/GCP in advance (sometimes way in advance) if you are looking to meaningfully increase capacity in a zone/region.
Sure, auto-scaling a few hundred hosts may be possible, but people who run a service that needs a few hundred hosts don't run it directly on AWS; they run it under some kind of scheduler + resource manager that keeps an operational buffer anyway (as in, you would already have those hosts, so cloud elasticity is not a factor here).
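A minimal sketch of the kind of buffer logic such a scheduler keeps. The function name, the 20% ratio, and all the numbers are made up for illustration; real schedulers track far richer signals:

```python
# Hypothetical operational-buffer check for a scheduler/resource manager.
# All names and numbers here are illustrative assumptions, not any real API.

def hosts_to_acquire(in_use: int, total: int, buffer_ratio: float = 0.2) -> int:
    """Return how many extra hosts to request so the idle pool
    stays at least `buffer_ratio` of current usage."""
    desired_idle = int(in_use * buffer_ratio)
    idle = total - in_use
    return max(0, desired_idle - idle)

# With 300 of 330 hosts in use and a 20% buffer target (60 idle hosts),
# the manager would request 30 more ahead of demand:
print(hosts_to_acquire(in_use=300, total=330))  # -> 30
```

The point is that the buffer absorbs normal demand spikes, which is why "instant" cloud elasticity rarely matters once you're at this scale.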
How early is "way early"? Because as long as it's shorter than the two to three weeks it'd take to order boxes, rack them, provision them (which would be automated but might still take an afternoon), and deal with any QA hiccups... I'd much rather call my AWS rep and say "can we add 30% by Thursday" and have them figure it out (and at such a large scale you might be able to spread it out across a couple of regions anyway, unless you only serve a specific part of the world).
From what I have seen, the lead time is actually of the same order, or sometimes longer. In one region/zone we add a few hundred hosts every week, but that is after telling them we plan to scale up in that region to some big X number.
This is the same with disaster recovery too. The idea that "oh, our main DC went down, we'll just spin it up in another region" is great until you realize it means you need reserved instances in that other region: capacity that, just like another physical DC, you won't be using most of the time.
Right now there is no specific distinction between what we run in the cloud vs on-prem. The important thing to note is that we use the cloud as IaaS only. We have our own stack that prepares the hosts before they are ingested into clusters as usable capacity.
We actually recommend against using the cloud providers' managed databases or any other value-added services.
Why we didn't go completely one way or the other (on-prem vs cloud) is a decision made way before I joined the group, but I think the main reason is to keep a tactical edge in the long run by avoiding lock-in. I guess in some ways it also helps us negotiate pricing.
Imagine moving a certain workload from a GCP region to an AWS region as part of a failover drill.
I was talking about at scale, not a rack. If you can get by with a rack, you will pay more for the people to support it than the incremental cost of the cloud.
> If you can get by with a rack, you will pay more for the people to support it than the incremental cost of the cloud.
Probably a whole lot less.
At larger scale, I would guess it's the same thing. If an organization needs more than a rack during peak use, it can probably benefit from setting up its own infrastructure. Only in the uncommon case of short extreme peaks and almost no use the rest of the time does such elasticity make a cloud solution attractive. IMHO.
> My impression is that hardware has become a lot more capable even relative to its tasks.
Indeed. The margins are bonkers high. As an example, the amount of RAM that you can stuff into a physical machine has at least doubled in the last five years, but the price of the average virtual machine has not.
You still want HA, failover, and disaster recovery. Then you need to set up stuff like BGP, DNS, security rules, etc. Complexity mounts pretty quickly.
Indeed. It seems that most of the people saying that cloud hosting is expensive have never run into the issues of making their own SAN, managing the provisioning of 20 different teams, etc.
The organizational complexity and specialist knowledge is mind-boggling and there is zero chance that your in-house knowledge is better than what Amazon can provide.
Installing rack servers and setting up services to run a site used to be a sort of rite of passage 15-20 years ago, but that period of the web was different. Still, I would consider basic familiarity with the infrastructure necessary today as well.
Increasing hardware performance relative to task load created the rationale for virtualization. Virtualization also turned out to be rational with respect to consistency, convenience, maintenance, and so on. At that point, outsourcing to a cloud can be rational.
But fewer people get hands-on experience with the infrastructure, and it sounds like many consider it almost mythical. For example, few realize the amount of work that can be done in 4U today. What does Amazon charge for 96 cores and 256GB?
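As a back-of-envelope, with entirely made-up prices (the hardware cost, colo fee, and per-hour rate below are assumptions, not quotes; check current AWS and vendor pricing yourself), the rent-vs-buy break-even for such a box might look like:

```python
# Back-of-envelope rent-vs-buy for a 96-core / 256 GB 4U server.
# Every number below is an assumed figure for illustration only.

server_cost = 15_000.0   # assumed one-time hardware cost (USD)
colo_per_month = 300.0   # assumed power + rack space per month
cloud_per_hour = 4.0     # assumed on-demand rate for a comparable VM

cloud_per_month = cloud_per_hour * 24 * 30          # 2880.0
monthly_saving = cloud_per_month - colo_per_month   # 2580.0
breakeven_months = server_cost / monthly_saving     # ~5.8

print(f"cloud: ${cloud_per_month:.0f}/mo, "
      f"break-even after {breakeven_months:.1f} months")
```

Under those assumptions the box pays for itself in about half a year, which is exactly why the argument then shifts to the staff cost of running it.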