AWS gives you all the things you'd need to scale, without heavy up-front costs. There's a natural path from small instances -> bigger instances -> load balancers/ELB -> reserved instances (or spot if it fits your workload). For a smaller company, any savings you'd get from owned servers would be offset by much higher devops costs.
Plus, as mentioned, you get a vast menu of services to choose from, all of which are managed. Just in terms of databases alone, you could go with RDS (raw MySQL/PG), Aurora, Dynamo, Elasticache/Redis, DocumentDB (Mongo) and more. Plus managed backups, failover, multi-AZ, security, and connectivity built-in.
If your team is filled with devops engineers, then, sure, go with the non-AWS route. But it's a lifesaver for agile companies who can't afford a lot of infrastructure work and need easy levers to scale, and can afford a modest markup.
> If your team is filled with devops engineers, then, sure, go with the non-AWS route.
This seems backwards to me - running a simple VPS on something like OVH/DigitalOcean/Linode is a matter of creating the instance, setting the size, and setting up your server software. Super simple, and likely about the same complexity as setting up a dev environment.
Setting the same thing up on AWS requires slogging through the documentation of their many services, deciding what makes the most sense for you (EC2? ECS? EKS? Lambda? Fargate?), and then configuring more options through a less friendly UI than any other provider I've seen. If you can fully wrap your head around all those services, create a stack that makes sense, and maintain it long-term, you're probably well on your way to being qualified for a DevOps engineer job. I can spin up a quick DigitalOcean droplet for a new project in less than half the time it'd take me to set up an EC2 instance, configure the exact amount of storage I need, and pull up 2 additional browser windows to determine the specs and pricing of different instance sizes in different regions.
> running a simple VPS on something like OVH/DigitalOcean/Linode is a matter of creating the instance, setting the size, and setting up your server software. Super simple, and likely about the same complexity as setting up a dev environment.
Why would you leave a server running that long? Is that best practice? I'm by no means a sysadmin, but I do monthly scheduled reboots: each reboot tests that the server actually comes back up correctly, and you get a more thorough fsck.
Also not a sysadmin, but regular reboots are a must-have. Not only do they test that the machine comes back up, they also ensure that your patched kernel and services are actually the ones running. Of course, services can be restarted separately after an update, but swapping the running kernel for a new one without a reboot (live patching) is still very uncommon.
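One way to tell whether a reboot is actually pending: on Ubuntu and many Debian-based systems, package upgrades that need one (a new kernel, for example) conventionally drop a flag file at `/var/run/reboot-required`. A minimal sketch, assuming that Debian/Ubuntu convention (other distributions use different mechanisms), of a check a scheduled job could run before deciding whether to reboot:

```python
import os

# Assumed Debian/Ubuntu convention: this file exists when an upgrade wants a reboot.
# Adjust the path (or the whole mechanism) for other distributions.
DEFAULT_FLAG = "/var/run/reboot-required"


def reboot_required(flag_path: str = DEFAULT_FLAG) -> bool:
    """Return True if the system has flagged that a reboot is needed."""
    return os.path.exists(flag_path)


if __name__ == "__main__":
    if reboot_required():
        print("Reboot pending: the patched kernel/services are not running yet.")
    else:
        print("No reboot flagged.")
```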
Having run a simple web server + database on both EC2 and DigitalOcean, I disagree. The two services are nearly identical at that scale.
In fact I don’t see any significant differences between any of the proper cloud providers at this level. The cost & functionality are very nearly the same.
Then you remember that for years, Stack Overflow ran out of a couple of well administered servers. YAGNI. KISS. People forget the basics because "infrastructure astronautics" is fun, and it probably helps make a beautiful resume, too.
While infrastructure astronauts typically waste money and pad their resumes, I don't think SO is the best counterexample.
SO's read-to-write ratio is enormous. While they talk about a handful of different bare metal servers, the vast majority of their hits are handled by their caching layer, which is not one of those bare metal machines.
Their write loads are not strictly time-sensitive either; if there's a delay between a successful post and cache invalidation, it's not a big deal for the majority of their traffic.
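The trade-off described here is classic cache-aside with a TTL: reads are served from the cache, and a write only becomes visible once the entry expires or is explicitly invalidated. A minimal sketch (the `TTLCache` class and its `loader` callback are hypothetical illustrations, not Stack Overflow's actual caching code), with an injectable clock so the staleness window is easy to see:

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache-aside with a fixed TTL; reads may be stale for up to `ttl` seconds."""

    def __init__(self, loader: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.loader = loader  # fetches from the database on a miss
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> str:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # fresh enough: no database round trip
        value = self.loader(key)  # miss or expired: go to the database
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write if readers must see it immediately."""
        self._store.pop(key, None)
```

Without `invalidate`, a post written at t=0 may keep serving the old value until the TTL runs out; for a read-heavy Q&A site that delay is usually acceptable, which is why the caching layer can absorb most of the traffic.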
Not every model is quite so forgiving. But even then there's a lot of stupid wasteful "infrastructure" made by people who think they're going to be Google, or at least tell their investors they'll be Google.
Well, it's an interactive database backed web site. That describes like maybe 90% of the things running on AWS.
StackOverflow is useful because it reminds people that machines are fast, and you probably don't need that many of them to scale to large sizes. StackOverflow is used by the entire global population of developers more or less and it runs off of one large MS SQL Server + some web servers. No auto scaling (not enough hw expense to be worth it), no need for fancy cloud LBs etc.
Sure maybe your service is gonna scale to more than the world's population of developers. Great. But ... a lot of services won't even go that far. For them it's hard to conclude it's really needed.
To pull off a StackOverflow you do need skilled sysadmins though. As the article points out, a surprisingly large number of people who call themselves devops don't really know UNIX sysadmin anymore.
Autoscaling is overused anyway. It's cheaper and much less complex to have 200-300% of average capacity running 24/7 on dedicated servers (OVH/Hetzner, whatever) than trying to scale up and down according to demand.
Changing capacity automatically always has the potential to backfire. And since AWS & Co need to keep those servers running during times of lower demand, there's no way it's cheaper unless you have really unusual traffic patterns (and even then it probably isn't).
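The cost argument can be made concrete. A rough sketch with entirely hypothetical prices, and an hourly demand profile measured in servers, comparing a fixed fleet sized at a multiple of average demand against idealized autoscaling (exactly the right number of servers every hour, with no scaling lag):

```python
import math
from typing import Sequence


def fixed_fleet_cost(hourly_demand: Sequence[float], headroom: float,
                     price_per_server_hour: float) -> float:
    """Cost of running headroom x average demand 24/7 (2.5 means 250%)."""
    avg = sum(hourly_demand) / len(hourly_demand)
    servers = math.ceil(avg * headroom)
    if servers < max(hourly_demand):
        raise ValueError("headroom too small to cover peak demand")
    return servers * price_per_server_hour * len(hourly_demand)


def autoscaled_cost(hourly_demand: Sequence[float], price_per_server_hour: float,
                    min_servers: int = 1) -> float:
    """Idealized autoscaling: exactly enough servers each hour, zero lag."""
    return sum(max(min_servers, math.ceil(d)) * price_per_server_hour
               for d in hourly_demand)


# Hypothetical day: 12 quiet hours needing 2 servers, 12 busy hours needing 6.
demand = [2.0] * 12 + [6.0] * 12
print(fixed_fleet_cost(demand, headroom=2.5, price_per_server_hour=0.10))
print(autoscaled_cost(demand, price_per_server_hour=0.40))
```

With this made-up profile (average 4 servers), 250% headroom means 10 dedicated servers around the clock: about $24/day at a hypothetical $0.10/server-hour, versus about $38.40/day autoscaling at a hypothetical $0.40/server-hour. The result depends entirely on the price ratio and how spiky the profile is, and real autoscaling lag only adds risk on top.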
You're not wrong, but it's worth noting that SO as a case study for site design has some important caveats. Beefy hardware is great, but aggressive caching and setting sane expectations around write latency can be a massive scaling win.
I would guess SO had a relatively small number of employees and their servers were running a pretty straight-forward CRUD app and a database. Comparing that to the heterogeneous workloads that large organizations deal with is a bit silly. No doubt there's still a lot of fat to trim in those large organizations, but trimming that fat is rarely their largest opportunity (much to the chagrin of those of us who like simple, elegant systems).
my guy, you literally made me laugh. How is this https://stackexchange.com/performance any different from the so-called workloads large organizations are running? You've got your app servers, DB servers, load balancers and failover servers. Pretty standard setup, yet SO runs on bare metal. Resume-driven development and everyone thinking they're Google or FB has killed and made money in our industry.
StackExchange is largely a CRUD app. High volume of tiny requests that hit the app layer, then the database, and back. Other organizations have lower volumes of compute-intensive requests, async tasks, etc.
With respect to the size of an organization, the cost of coordinating deployments and maintenance over a handful of servers grows with the size of the organization. It frequently behooves larger organizations to allow dev teams to operate their own services.
None of this is to say that there isn't waste or poor decisions throughout our industry; only that it's not the sole factor and SO's isn't the ideal architecture for all applications.
It is a large site by traffic, but I would guess the traffic is heavily read-only. Workloads with more data mutation introduce different complexities, which mean you can't just cache everything and accept that writes only become visible once the cache TTL expires or the entry is invalidated.
edit: To be clear, I'm not saying SO isn't an achievement, but it's one type of use case that yields a really simple tech stack.
Their DB handles a peak of 11,000 qps at only 15% CPU usage. That's after caching. There are also some ElasticSearch servers. Sure, their traffic is heavily read-only, but it's also a site that exists purely for user-generated content. They could probably handle far higher write loads than they do, and they handle a lot of traffic as-is.
What specific complexities would be introduced by an even higher write load that AWS specifically would help them address?
> Comparatively speaking even compare them to the same CRUD app and DB, most are using 5 to 10x more servers with 1/2 to 1/5 of the traffic.
No doubt, but how large are those other CRUD app organizations? Do they have a staff that is 20x the size all trying to coordinate deployments on the same handful of servers? What are their opportunity costs? Is trimming down server expenses really their most valuable opportunity? No doubt that SO has a great Ops capability, but it's not the only variable at play.
I guess that without a lot of optimizations, .NET will be much more performant. The lower the performance, the more servers you need and that will make it harder to manage your fleet.
Another issue is that people don't understand how powerful modern hardware is. Some modern systems process fewer transactions per second than ones from the 1970s.
Just look at the modern SPA. They are slower than sites from 10 years ago, even though JavaScript VMs are much faster, on top of all the hardware gains. Why does it take Twitter and Gmail a few seconds to load?
I don't doubt certain mainframe apps of the (late) 1970s could beat the TPS of a generically frameworked app in certain situations, but do you have any real numbers/situations/case studies to back that up?
That's the nice thing about Hetzner, OVH and Co. You don't need colocation but can instead rent servers monthly. That way you never have to bother with physical hardware; the knowledge needed is purely on the software side.
I also think colocation or own datacenters are a poor fit for most. But dedicated servers are underrated. They can be offered much cheaper as it's simply renting standard hardware and you don't need any of the expertise or size you'd need for colocation.
> AWS gives you all the things you'd need to scale, without heavy up-front costs
A startup doesn't need AWS right off the bat. Planning for scale right from the beginning is a way to quickly bleed $. Of course, if you have VC money, why not spend that cash, right?
Where I work we've started non-AWS and have continued non-AWS. We don't have a team of devops engineers, but rather a team where there are engineers who _can_ do devops. I dread the day we need to move to AWS, but it's much easier moving to AWS than off it.
Yeah but that's what capex based stuff (like buying and colocating) is: planning for scale. With AWS you think you need a robust DB and shit and you've made a decision that can be undone in minutes.
You think you want Elasticache? It's a fifteen-minute operation. In that time your devops guy won't even have downloaded the binary and figured out the docs.
As I wrote further up, there's not just colocation and AWS. Dedicated servers are a great fit for many start ups and need no capex. You just rent hardware monthly like you lease most of your office equipment. Much cheaper than AWS, no hardware expertise needed and even (manual) scaling over time works quite well, it's easy to add servers within a day or so.
Sure, you'll always have idle capacity, but this way you could use it. With AWS, Amazon runs that idle capacity and charges you for it.
> With AWS you think you need a robust DB and shit and you've made a decision that can be undone in minutes.
If it can be undone in minutes, it isn't much of a decision. A service that can be enabled or canceled on a whim is unnecessary.
Realistically, analyzing the guarantees offered by a cloud platform, your corresponding requirements, and how everything is supposed to work is going to take days, and actually developing and testing disaster recovery procedures is going to take even longer.
You don't necessarily need to plan for scale from the start. Often you just need a few servers, a load balancer, a database, some file storage, internal networking, user permissions, and some level of security/firewall. That is very easy to set up on AWS in a day or two, and you don't need all your engineers to simultaneously be devops experts.
The scale can happen once you've validated your startup, and when that happens it's a lot easier to just turn on 30 more EC2 instances than to get on the phone with Dell.
You probably don't get on the phone with Dell. There's a large area in between EC2 and racking your own boxes in your own datacenter. You can buy dedicated machines from OVH or similar firms with a lead time of a few days, and it's a very rare firm that can't predict its load a week in advance, even a fast-growing one.
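Predicting a week ahead is simple arithmetic. A rough sketch, assuming steady compound weekly growth and a hypothetical per-server throughput figure (the function and all parameter names and numbers are illustrative), of how many dedicated servers to order today so they're in service when the lead time is up:

```python
import math


def servers_to_order(current_peak_rps: float, weekly_growth: float,
                     lead_time_days: float, rps_per_server: float,
                     servers_in_service: int, headroom: float = 1.5) -> int:
    """Servers to order now so capacity covers the projected peak on arrival."""
    weeks = lead_time_days / 7
    # Compound growth over the lead time (assumes growth stays steady).
    projected_peak = current_peak_rps * (1 + weekly_growth) ** weeks
    # Keep some headroom above the projected peak, rounded up to whole servers.
    needed = math.ceil(projected_peak * headroom / rps_per_server)
    return max(0, needed - servers_in_service)


print(servers_to_order(4000, 0.10, lead_time_days=7,
                       rps_per_server=500, servers_in_service=10))
```

For example, 4,000 req/s at peak growing 10% a week, a 7-day lead time, 500 req/s per server, 50% headroom and 10 servers already racked gives a projected 4,400 req/s and 14 servers needed, so you'd order 4.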
Look at it like this; GitHub mostly ran in their own datacenter using AWS only for spillover capacity. They scaled just fine. Virtually no apps have the problem of runaway growth that they can't handle without instantly provisioned resources.
There are some merits to AWS, I can agree with that, and there comes a point where a startup outgrows bare metal / cookie-cutter VPSes, but I disagree with "AWS is trivial" and that it takes a day or two to get things set up. For a basic setup like getting a few servers behind a load balancer and a database, sure, and I'd argue that services like DigitalOcean and Linode are actually easier to set up than AWS for basic services.
To actually do more advanced stuff (the thing that AWS is good for) and utilize tools such as Terraform, you'd essentially need to hire engineers that are experts in AWS in addition to engineers who can do devops, as there's only so much "magic" AWS can provide.
> A startup doesn't need AWS right off the bat. Planning for scale right from the beginning is a way to quickly bleed $. Of course, if you have VC money, why not spend that cash right?
Especially when the alternative is to hear a lot of, "What? You don't expect to scale? I thought you took this seriously."
OVH and Hetzner could probably provide you with at least 100 top-of-the-line servers within 24h (that's a guess; it's probably more). With a reasonable setup, that's much more than any startup will need to scale.
I'm with you until that statement. AWS is nothing approaching "modest" in their markup. 20% minimum and typically much higher if you know how to negotiate when purchasing your on-prem gear. And if you happen to be a shop that sweats assets for 7+ years that number starts being measured in the hundreds of percentage points more expensive.
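To see why sweating assets changes the picture, here's a back-of-the-envelope sketch. It assumes (hypothetically) that the cloud price is a fixed markup over the on-prem cost amortized across a standard depreciation period, and that hardware kept beyond that period costs only its annual opex:

```python
def lifetime_markup(hardware_cost: float, annual_opex: float,
                    amortization_years: int, markup: float,
                    years_kept: int) -> float:
    """Effective cloud markup vs owning the hardware for `years_kept` years."""
    # Assumed pricing model: cloud charges a fixed markup over amortized on-prem cost.
    annual_cloud = (1 + markup) * (hardware_cost / amortization_years + annual_opex)
    cloud_total = annual_cloud * years_kept
    # On-prem: buy once, then pay only opex (no replacement hardware assumed).
    onprem_total = hardware_cost + annual_opex * years_kept
    return cloud_total / onprem_total - 1


for years in (3, 5, 7):
    print(years, round(lifetime_markup(30_000, 2_000, 3, 0.20, years), 2))
```

With hypothetical figures of $30k hardware, $2k/year opex, 3-year amortization and a 20% markup, the effective markup is 20% at year 3 but about 129% by year 7 ($100,800 of cloud spend vs $44,000 on-prem).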
And on bandwidth their markup tends towards infinity - same on Azure and GCP.
As an example, 20TB of egress bandwidth will cost you around $1,750 on Azure, almost $90/TB! Your typical server/VPS provider gives you that kind of bandwidth allowance for free, because it costs them close to nothing.
I have a feeling almost all complaints about cloud costs would disappear forever if they'd only stop gouging customers on bandwidth.
The likes of Hetzner and OVH are starting to offer more cloudy features, like load balancers and block storage. My hope is that eventually they become viable alternatives to the big 3, and they are finally forced to reduce their ludicrous bandwidth prices.
The term 'roach motel' comes to mind when I see their network traffic rates.
"Come on in; there's no commitment. Pay as you go, and did I mention our ingress rates are free! Bring us your data and give the cloud a try. It's the future, you know..."
I have yet to move a single customer to Amazon that fired the FTE who was managing on-prem infrastructure and chalked it up to a cost-savings move. The idea of cutting headcount is a fallacy. In small shops the guy that was managing on-prem infrastructure is now doing that (because it turns out you still need switches and phones and routers and firewalls even if you host things in AWS) as well as managing the AWS infrastructure. In large shops you're typically replacing "cheap" headcount (the guy who racked servers) with someone significantly more expensive (a team of cloud architects).
We're a small company who avoided the hire. And we don't bother with a firewall at our office -- the only things on the office network are laptops, phones, printers, and Sonos.
Basically, if you model senior eng time as $5k/week -- not even counting the opportunity cost -- you'll understand why AWS.
While I definitely think AWS is very expensive for the servers, it's not expensive overall. Again, setting up CloudWatch in an afternoon, or dumping tons of heterogeneous logs into S3 and querying them with Athena, or spinning up full environment stacks (front-end pool, back-end pool, load balancers, databases, VPCs, VPNs, etc.) is an enormous cost savings vs. finding, installing, configuring, and maintaining batches of software to do the same.
edit: pay an eng $180k. Fully loaded, you're looking at $220k plus. Add in either recruiting time or a recruiting fee that probably runs us $2k/mo amortized over the first year, on the low end. Add in the fact that the first 6 weeks weren't super productive and also absorbed tons of time from other SEs to get this person up to speed. Add in PM time bringing this person up to speed on our product.
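Putting the comment's own figures together ($220k fully loaded, plus recruiting amortized at $2k/month over the first year) gives a per-week number. A rough sketch; the function name is illustrative, and it ignores the time other engineers and PMs spent onboarding, which the comment also mentions:

```python
def weekly_cost(fully_loaded: float, recruiting_per_month: float,
                ramp_up_weeks: int, weeks_per_year: int = 52):
    """First-year cost per calendar week and per productive week."""
    first_year = fully_loaded + recruiting_per_month * 12
    per_week = first_year / weeks_per_year
    # Exclude the ramp-up weeks that weren't really productive.
    per_productive_week = first_year / (weeks_per_year - ramp_up_weeks)
    return per_week, per_productive_week


per_week, per_productive = weekly_cost(220_000, 2_000, ramp_up_weeks=6)
print(f"${per_week:,.0f}/week, ${per_productive:,.0f}/productive week")
```

That works out to roughly $4,700 per calendar week, or about $5,300 per productive week once the 6-week ramp-up is excluded, which is where a $5k/week model for senior engineering time comes from.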
Or, you know, you could have brought in a consultant part time. If you're small enough that you can handle an AWS setup without someone full time, you could run on a managed hosting setup for less including an external consultant charging ridiculous hourly rates.
Clients on AWS were always my best ones, because they were used to paying so much over the odds I could set my rates much higher.
Yeah, I wanted an extra person on our desperately understaffed team, so I proposed I cut about a million bucks of spending year-on-year from AWS.
They didn't want to increase the head count. The AWS spend was a different silo/team/contract.
Corporate accounting is insane.
It almost smelled like "well, we negotiated this great AWS discount, so it doesn't bother us that we are spending an extra million. We're getting it at a DISCOUNT!"
Is there some MBA trend where every headcount is some ticking time bomb of liability that is 10x worse than their yearly salary?
You spend 150k on servers, they don't improve over a year.
You spend 150k on a decent employee, not even rockstar ninja 100xer, and they will deliver useful, although sometimes hidden, productivity and optimization.
On top of that, add the opportunity cost of spending all that time managing/setting up on-prem infrastructure vs. solving business problems... there’s a good reason AWS and the like have millions of customers today.
It's such a meme on this site to compare costs of AWS to hardware costs as though the hardware operates itself. I'm sure there are many cases in which onprem actually is a better value proposition than AWS, and I would find it much more interesting if we talked about when to go with AWS vs onprem, but instead we try to make these silly comparisons about the upfront cost of a server blade vs an EC2 instance.
I've done operations stuff for ~25 years now. I've used AWS since it launched. I've done devops consulting for years.
I've yet to see an AWS deployment be competitive on cost with onprem or managed hosting. In fact, I used to make good money helping people cut costs by moving off AWS.
That includes paying more for devops, because AWS is incredibly complicated, and most people struggle to get it set up well.
There are valid reasons to pick AWS. Cost is not one of them.
Why? If the primary benefit you're getting from the cloud is VMs, it's valid to compare to hardware. The overhead of bare metal on top of cloud VMs is basically knowing how to handle RAID and set up load balancing.
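The load-balancing part of that overhead is conceptually small. A minimal sketch of round-robin selection that skips backends failing a health check; the backend addresses and the `is_healthy` callback are hypothetical, and in practice you'd run an existing balancer such as HAProxy or nginx rather than write your own:

```python
import itertools
from typing import Callable, List


class RoundRobinBalancer:
    """Round-robin over backends, skipping any that fail a health check."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        if not backends:
            raise ValueError("need at least one backend")
        self.backends = backends
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy backends")


down = {"10.0.0.2"}  # hypothetical: this backend is failing its health check
lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                        is_healthy=lambda b: b not in down)
print([lb.pick() for _ in range(4)])
```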
VMs are certainly not the primary benefit of the cloud; cloud providers offer many services in addition to VM orchestration, but that hardly matters because your own example is illustrative of your error:
> The overhead of bare metal on top of cloud VMs is basically knowing how to handle RAID and set up load balancing.
This implies human beings who handle RAID and set up load balancing, which means you need to compare the cost of cloud providers against the cost of hardware plus the cost of those engineering resources.
In addition to RAID and load balancing, most organizations/applications also need networking (good luck balancing load without a network), databases (including backup management), TLS, DNS, access management, etc, etc. All of this takes humans to build and operate. AWS services do a lot of this for you. In the on-prem world, you have to build (or buy/integrate) this yourself, but that's not free so you have to account for that cost in your comparison.
You can still make the argument that on-prem is a better value proposition when accounting for the total cost of ownership, but that's a different argument than those which ignore engineering costs altogether.
Yes, you need to know how to use Linux. And as the article astutely points out, there is an undiscussed problem with UNIX skills loss across the industry. AWS provides GUIs and Linux doesn't - accepted.
However for the many, many people who do already have those skills, and for the many pieces of software that are basically web servers + databases, the overhead of bare metal vs cloud VMs or services is basically a bit of Linux sysadmin work. People are acting like you have to hire a full time wizard to even consider using anything other than high cost AWS services but that is wrong: AWS requires learning too, people who understand UNIX are plentiful even if apparently not as plentiful as they once were, and a well run setup will not require constant sysadmin work. A well rounded developer will often be able to do the ops work as well.
Also "bare metal" doesn't mean running your own datacenter. You can buy hardware, send it to a colo and they'll rack it and run the network for you. That's why I say it's mostly a matter of understanding RAID: when the hardware fails (almost always disk or less commonly, RAM units), you file a ticket, blink the HDD LED and some remote hands go and swap out the part for you. Those "hands as a service" come cheap compared to AWS.
A large part of the problem is that most developers have no experience with this, and think server hardware is as much a pain as home PC hardware, which a large portion of younger developers have little experience with too.
They've not seen servers with IPMI tied into PXE boot and a TFTP setup. They don't realise once that server is racked up, we can log in to an IPMI console, power it on remotely, have it boot straight into a bootstrap script, remotely install an OS like CoreOS/Flatcar without manual intervention, have orchestration scripts install the necessary basic services on it without manual intervention, and boom, it's part of your own "private cloud" ready to deploy containers to that may well not require maintenance "below" the container layer for years.
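The remote power-on and PXE step is a couple of `ipmitool` calls. A minimal sketch that only builds (and optionally runs) the command lines; the BMC address, username and password-file path are hypothetical placeholders. `-I lanplus` selects IPMI v2.0 over the LAN, `-f` reads the password from a file, `chassis bootdev pxe` sets the next boot device to the network, and `chassis power on` powers the box up:

```python
import shlex
import subprocess
from typing import List


def ipmi_cmd(bmc_host: str, user: str, password_file: str, *args: str) -> List[str]:
    """Build an ipmitool command targeting a server's BMC over the network."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user,
            "-f", password_file, *args]


def pxe_boot_commands(bmc_host: str, user: str, password_file: str) -> List[List[str]]:
    """Commands to network-boot a freshly racked, powered-off server."""
    return [
        # Boot from the network next time so the PXE/TFTP bootstrap runs.
        ipmi_cmd(bmc_host, user, password_file, "chassis", "bootdev", "pxe"),
        # Power the box on remotely; nobody has to touch it.
        ipmi_cmd(bmc_host, user, password_file, "chassis", "power", "on"),
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical BMC address (TEST-NET range) and credentials file.
run(pxe_boot_commands("192.0.2.10", "admin", "/etc/ipmi/bmc.pass"))
```

From there the bootstrap script and orchestration tooling described above take over.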
And they don't realise you can contract someone to do this for you for low enough day rates that it'll take you a truly massive setup before you're paying the equivalent of a full time engineer.
100% agreed. I venture that it's a side effect of our industry relying almost entirely on universities and self-learning to propagate knowledge, instead of trade schools. Academics don't want to teach "messy" real world skills like how to actually build a distributed computer with your own hands: there's a very narrow range of skills considered acceptable to teach in the academic world and sysadmin isn't one of them.
Meanwhile we've had 20 years of media and culture telling people that those who get rich are the software guys, never the infrastructure guys, and that the easiest way to get into the industry is by learning web design.
Also, I suspect the industry is exhausting the pool of talent that learned computing on UNIX workstations at university in the 80s and early-mid 90s. The people who lived and breathed servers and UNIX are retiring out of the workforce, and not really being replaced. AWS is stepping in to fill the breach by selling sysadmins-as-a-service, basically, with a thin veneer of GUI programming slapped on top, and some proper documentation writers. Selling pickaxes and shovels to the gold miners - classic.
Finally, I think a lot of firms have developed a "no contractors" policy. I've never worked at a company that allowed contractors and I'm about mid-way through my own career, a bit less. At some point it has become received CEO wisdom that they're building a committed team of true believers, so "mercenaries" aren't welcome. It leads to the kind of thinking you see in this thread where "I don't have the skills, therefore I must hire someone full time, and they must earn as much as me, therefore it's too expensive" ends up leading to "and therefore I must outsource to a firm that 'contracts' for an hourly rate billed via credit card".
The contractor has a problem similar to employee turnover. Contractors make sense when there is a throw-away project (won't need long-term maintenance) that doesn't embody any proprietary knowledge or skill. As soon as a project could expose a trade secret or require on-going support, the contractor becomes a big risk.
>What’s the connectivity into your 7+ year old asset cost? What about the space, power and cooling to provide it a home?
10GbE Ethernet was standard in 2013/2014. Space, power and cooling are company-dependent. If they're tiny, a closet in the building or a colocation facility; if they're larger, a room; if they're huge, a datacenter.
>What happens when your environment gets attacked by a volumetric DDOS, is that 100Mbps circuit going to get overwhelmed?
I can't recall the last time a customer got hit by a "volumetric DDoS". End-user-facing systems sit behind Cloudflare. Internal apps are... internal. Who cares if the WAN connection is getting hit by DDoS when your users are on campus? In a post-COVID world, they've got multiple tunnels with SD-WAN.
That's ignoring the fact that CenturyLink or Comcast or Telia are pretty good at mitigating DDoS in 2020. Just because they don't post about the attacks they endure on a blog doesn't mean they're inept.
I've never seen a shop running on AWS without a person managing it. That person always has enough skill to replicate the same setup on dedicated hardware or a VPS. I'm not saying such shops don't exist, just that I've never seen one.
To me, AWS's killer features are RDS and the ability to provision 100 servers in a few minutes. Most of the time scaling is not needed, and RDS alone cannot justify the cost of AWS.
For a lot of companies, 20% is an acceptable markup to pay, in return for not having to deal with hardware failures, delays due to ordering new hardware, not always having instant failovers, over-purchasing for capacity/scale, the extra human dev ops costs, etc. If you're well-positioned to handle that, then great, AWS may not be cost-effective for you.
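The trade-off can be framed as a break-even point. A rough sketch, assuming the markup is measured against the equivalent on-prem cost and that moving to AWS genuinely avoids one fully loaded hire (the $220k figure quoted elsewhere in this thread; whether the hire is really avoided is disputed elsewhere in the thread too). The function name is illustrative:

```python
def break_even(markup: float, avoided_headcount_cost: float):
    """On-prem-equivalent spend at which the cloud premium equals one hire.

    The premium on on-prem-equivalent spend X is markup * X, so break-even is
    X = headcount / markup; the matching cloud bill is X * (1 + markup).
    """
    onprem_equivalent = avoided_headcount_cost / markup
    cloud_bill = onprem_equivalent * (1 + markup)
    return onprem_equivalent, cloud_bill


x, bill = break_even(0.20, 220_000)
print(f"on-prem equivalent ${x:,.0f}/yr, cloud bill ${bill:,.0f}/yr")
```

At a 20% markup that's roughly $1.1M/year of on-prem-equivalent infrastructure (a $1.32M cloud bill) before the markup alone exceeds one engineer's cost. Below that, paying the markup is cheaper than the hire, but only if the hire is actually avoided.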
As someone who has built out over 30 different accounts in AWS for various clients, I can say this is definitely not entirely accurate. Elasticache is a caching mechanism, not a database; anyone who thinks they are the same has not used them extensively. Also, RDS and Aurora have both had their SUPER privileges removed, so you might have to overhaul some of your codebase just to make it work. I’m sensing some uninformed fanbase propaganda here.