My goodness is it Monday already and time for yet another of these articles?
At a given scale and software maturity, it is cheaper to buy your own hardware and pay for your own team to operate it. Generally, this is when you start thinking of your hardware needs in terms of fractions of an entire data center, though in simpler cases the threshold can be lower.
This is almost never true for software startups, who get more value from speed, flexibility, and not needing to pay a team to manage hardware. Just let [cloud services company] handle it for you until you reach that point.
And then when you do reach that point, you get to write the article about how everyone using the cloud is a fool because it's cheaper to self-host.
The cost differential in my experience is generally higher on the low end. Once you get into the hundreds of thousands per month in cloud fees, you can get steep discounts, and some of the gap starts to close.
You also need devops resources to manage a cloud setup - most of my jobs in recent years have involved exactly that, and it's not appreciably cheaper than managing "raw" hardware resources these days (ever since IPMI etc. became the norm). For some time I did consulting and provided devops services on retainer; the amount of time spent managing actual hardware was a rounding error of the overall devops cost for those of my clients relying on colos or managed servers.
Since many colo providers also offer managed servers and cloud services, the flexibility is not really an issue either - you can start with cloud services, rent managed servers once you know what your base load looks like, tie them into the same setup, and do the same with colo if/when it's cheaper. (The tradeoff between managed and colo is trickier than with cloud - the cost differential there is often dominated by real-estate costs where the relevant data centres are; if you're somewhere expensive, managed servers elsewhere will often be cheaper than colo near you.)
I've heard many times that while retail AWS prices are high, all big AWS customers are getting discounts. Just curious at what scale this starts in terms of $$.
This usually requires a term spending commitment and gets you an effective discount of around 5-10%. Higher spends can yield significantly deeper discounts, of course. I think you'll find it difficult to get specifics at higher spends due to non-disclosures.
Google 'AWS EDP negotiated' and you'll find some more info on the process.
Numbers I heard involved a much steeper discount than that, but that may well have been an account they saw as particularly important. If you can only get a 10% discount, you're still paying massively over the odds.
Yes. But people use "you have to hire people to manage it" as an argument against on-prem, when in practice most companies hire "DevOps engineers" or "cloud engineers" to manage their cloud infrastructure too. And those salaries seem higher than the infrastructure/ops guys of yesteryear. So resource cost isn't really an advantage.
Don't forget the "cloud cost engineer", hired just to figure out how much it will cost to run the 20 services that should have been part of your app, each with its own arbitrary way of billing usage.
What nerdix said, basically. People assume most devops work when you own hardware is hardware-related, but for a typical setup, if your hardware-related devops work makes up more than a few percent of your total devops cost, something is wrong.
As soon as the machine boots over PXE and is on the network, it becomes software. Hardware actually becomes easier to manage as you get more identical machines as you can do differential analysis to figure out which components could possibly be out of spec.
I have seen multi-thousand-host clusters managed by 1-2 people, and day-to-day ops looked like absolutely nothing was happening. This included messing with hardware racking and networking. The most resource-intensive part was just uncrating machines, moving them, and routing cables.
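To make the "differential analysis" point concrete, here is a minimal sketch of the idea (not the actual tooling described above; the hostnames, credentials, and the 20% threshold are all placeholders): pull the same IPMI sensor readings from a set of identical machines and flag the boxes whose values sit far from the fleet median.

    import statistics
    import subprocess

    # Hypothetical BMC addresses and credentials for a rack of identical machines.
    BMC_HOSTS = ["bmc-01.example", "bmc-02.example", "bmc-03.example"]
    IPMI_USER, IPMI_PASS = "admin", "changeme"

    def read_sensors(host):
        """Return {sensor_name: numeric_value} for one machine via `ipmitool sensor`."""
        out = subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", host,
             "-U", IPMI_USER, "-P", IPMI_PASS, "sensor"],
            capture_output=True, text=True, check=True,
        ).stdout
        readings = {}
        for line in out.splitlines():
            fields = [f.strip() for f in line.split("|")]
            if len(fields) >= 2:
                try:
                    readings[fields[0]] = float(fields[1])
                except ValueError:
                    pass  # skip discrete/non-numeric sensors
        return readings

    fleet = {host: read_sensors(host) for host in BMC_HOSTS}
    # For every sensor present on all machines, flag hosts far from the fleet median.
    common = set.intersection(*(set(r) for r in fleet.values()))
    for sensor in sorted(common):
        values = {host: fleet[host][sensor] for host in fleet}
        median = statistics.median(values.values())
        for host, value in values.items():
            if median and abs(value - median) / abs(median) > 0.20:
                print(f"{host}: {sensor} = {value} (fleet median {median}) - check this box")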
> This is almost never true for software startups, who get more value from speed, flexibility, and not needing to pay a team to manage hardware.
Exactly. I cringe every time I look at the AWS/Azure running total, but at the same time I realize that paying a separate team would be even more ludicrous at our current scale.
Right now, I can single-handedly administer 100% of our IT infrastructure as barely a side-concern because nearly everything is inside the Azure/AWS bubble. Delegating IT management tasks to non-wizards is actually feasible too.
The moment you start racking servers on-prem, you need a very specialized workforce to keep them alive. I suppose you could vendor parts of this out too, but then you are really splitting hairs regarding why you brought crap on-site in the first place.
A software startup can use consumer-grade workstations and self-host at their office (or at home). Cloud at scale is VERY expensive. And if you do not need the scale, but do need reliability, it is easier to rent two servers at a colo.
> A software startup can use consumer-grade workstations and self-host at their office (or at home) (...) And if you do not need the scale, but do need reliability, it is easier to rent two servers at a colo.
Yeah, just don't forget the backups and monitoring, the networking setup (load balancing between your two servers or workstations at the very least), patching, moving to new servers/workstations when you outgrow the current ones (while also taking the risk of overprovisioning) etc.
Or just throw it all in a PaaS, ideally a container-based one (like Google Cloud Run or AWS Lambda or Scaleway Serverless Containers) where you won't have any lock-in and migrating away to something that makes more sense at your new scale would be easy. The only concern that would remain in that case is data backups to a different provider/place - scaling up/down is automatic, monitoring (at least the basics) are already covered, and there's no patching outside of your application's libraries.
It is a startup, not a space program. Backups can be done with a simple ssh+rsync script. Patching is a weekly weekend cron script plus a reboot. Add some heartbeat monitoring... If it goes down for a few hours, it's not a biggie.
For "throw it all in a PaaS" you need dedicated person, monitoring, billing etc... It is more expensive, not simpler. Cloud provider may also terminate your accounts, change settings etc...
If I had a nickel for every time a restore from "simple SSH+rsync script" backups failed, I would not need my job as an infra engineer. You can do that, but being sure the backups actually work is hard.
My clients spend a lot more on AWS bills than they would on self-hosted infra; $1000ish compared to $200/mo for self-hosted, or a pair of $1000 servers colo'd for about $100/mo. They get a lot more for it - reliability and support channels. That's what they're paying for. Absolutely, I recommend that when they scale, they move their most expensive parts in terms of AWS costs (bandwidth-intensive things like game servers and media bouncers) from AWS to self-hosted, because those are the things that are too expensive on AWS and have reliability properties where "throw a bunch at the wall, fail them fast" is a good guarantee. For your production backups, don't rely on simple scripts. Don't rely on complex scripts. Rely on commonly tested solutions built on other people's amortized experience.
SSH+Rsync is the wrong way to go about it, you will fail to restore backups. You need something that allows you to make snapshots. Fortunately that's easy and cheap if you use something like Proxmox.
It would be interesting to map out what kinds of industries and startups require load balancing and multiple production servers in the first few years of operation. A single 32/64/128/256-core server is pretty decent for most workloads, and anycast services are decent for static content delivery.
Was there a specific kind of startup that you had in mind for this?
VPNs and hybrid architectures exist for a reason. If 99% of your IT infra is boring crap but you have this one special unicorn machine, maybe throw it on an employee's fiber connection and set up a point-to-site VPN for that machine.
Does it make sense to abandon the entire cloud because of 1 use case?
I was never on cloud. I have a big home with a fiber connection and some backup connectivity. Setting up a new server costs me about $50/year in electricity. If the project works, I will sell it and let the new owner deal with the cloud and scaling it up!
Just reading the price list from cloud providers gives me a headache. We will charge you between X and Z, and hopefully you will not go bankrupt on the next bill. Also, we may terminate services at any time for whatever reason.
> unicorn machine
If you have a specialized startup, that machine is 90% of your cost - it is not some sort of single unicorn edge case. If you can use consumer-grade hardware without extra cost, that is a major competitive advantage!
> VPNs and hybrid architectures
What is that? Do I have to study that? Seems like a major obstacle!
That unicorn machine had better not use unicorn data transfer. Transferring an average of 100Mbps from S3 to your unicorn? That will cost you $1-3k / mo depending on how you configure it and what tier you’re in. Never mind that buying the hardware to sustain these data rates at home or in the colo is so cheap as to not even be worth mentioning and can easily be done on 20-year-old gear.
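Rough math on where that figure comes from (ballpark, using AWS's published egress tiers of roughly $0.05-0.09/GB): 100 Mbps sustained is about 12.5 MB/s, or roughly 32 TB over a 30-day month, which works out to somewhere around $1,600-$2,900/month in transfer charges alone.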
Of course, those prices go up rapidly if you use serious amounts of data. Want to train your fancy ML model to draw cats based on your giant data set of customer cat photos stored on S3? Want to do it on your nice Nvidia box at home? That, by itself, might cost as much as an expensive Silicon Valley FTE who could manage an entire installation in a colo facility nearby.
> This is almost never true for software startups, who get more value from speed, flexibility, and not needing to pay a team to manage hardware.
It's not that simple: some costs on AWS can be high, and some tasks don't need the availability/reliability guarantees AWS provides. So even for small startups, having your own hardware can sometimes be a good cost saver, though with some risk attached to it.
For some mid-sized companies with relatively fixed resource utilization, renting rack space or renting bare metal (in a data center, with service for parts replacement) can be a good cost-saving measure.
In the end, while AWS is flexible and convenient, with a lot of services easily accessible, it is also expensive (at least when it comes to compute power). So the question is usually how much money the flexibility, convenience, and services save. For many companies this is a lot. For some companies (of any size) it is not enough.
I mean, this is why there are such funny things as being able to buy a "GPU compute mini data center in a container" (though that's outside the price range of a small company, smaller versions of the same idea exist, too).
> It's not that simple: some costs on AWS can be high, and some tasks don't need the availability/reliability guarantees AWS provides. So even for small startups, having your own hardware can sometimes be a good cost saver, though with some risk attached to it
I agree. I'd like to add that AWS's value-add can be the lack of need to understand the product's configuration, just the SDK. Your startup already has developers who can read SDKs; you might not have someone on your staff who can set up an Elasticsearch, Redis, or other service.
When you grow enough that the fine tuning of such services might matter, you likely can hire the right people to run those in house.
I've been on a team with ~6 DevOps members for 5 years, and it's been very helpful to be able to pick up and drop services as we evolve.
I think it depends. My best friend works at a company that took four years to architect and build V1 of its product on-prem. They had to build a lot of things by hand that are standard components in the cloud. They architected and built V2 on AWS in a single year. The higher cost is considered small by them compared to three additional years of waiting for a system to offer key features that upsell customers.
If you are a software startup, then you don't need all those cloud services. You need a server; install a webserver, database, Redis or whatever you need, and you're good to go. Going cloud as a startup is almost always over-engineering. Just go simple. Your total costs will be a couple hundred dollars and you'll serve a million people with this simple infra. Probably much more.
>This is almost never true for software startups, who get more value from speed, flexibility, and not needing to pay a team to manage hardware. Just let [cloud services company] handle it for you until you reach that point.
>And then when you do reach that point, you get to write the article about how everyone using the cloud is a fool because it's cheaper to self-host.
At least read the article: this is about a software company that started in a colo and literally wouldn't be a viable company if it tried to use the cloud. It's more about the fact that, if you have the skills, you can avoid burning a huge amount of VC money by running your own hardware.
I am often surprised by how many people in our field don't run the numbers - and so end up talking nonsense - when these hard numbers are so easy to generate.
Also, people seem to forget that not all startups are low-CPU consumer applications. If your product requires non-trivial compute, going straight to colo can be essential. In my own previous startup, my calculations showed an AWS monthly expense of $96K, while I could spend $60K to acquire higher-end hardware than AWS offered, build a server cluster, and colocate it for a measly $600 a month. And forget about the "it's hard to run a server, let alone a server cluster" propaganda - it is not hard at all.
> "it's hard to run a server, let along a server cluster" propaganda - it is not hard at all.
It's not hard at all if you have the skills to manage hardware switches, deal with rack power budgets, configure the power-on sequence, establish procedures for hardware failures, establish procedures for disk management, redundancy, early warnings and replacements, deal with rolling out firmware updates for all components, provision resource limits between apps, provision and secure iLO or similar access, configure hardware monitoring and alerting, maybe run a SAN, ... (and lots of other things)
Sure, it's not hard when you have those skills. And you can offload some of that to another company if you're doing single servers rather than full rack(s). But it's not exactly a common set of skills either.
Edit: Also you can easily forget things at small scale. Just remembered LTT's "Sure we can host our video archive ourselves. Oops, forgot to turn on scrubbing. Aaand the old archive is gone."
> I am often surprised by how many people in our field don't run the numbers - and so end up talking nonsense - when these hard numbers are so easy to generate.
Agreed. Unfortunately the tech industry is driven largely by fashion instead of objective engineering as we'd like to think. Trying to diverge from the fashion du jour (here, cloud.all.the.things) is difficult no matter how solid the numbers are.
At my last couple startups I've run a recurring review of all the cloud costs so at least there is good awareness of how fast the expenses are rising (faster than revenue in every case).
>and not needing to pay a team to manage hardware
But you pay for a team to manage the cloud instances and services. Usually more than the hardware guys.
Most of the hardware stuff is done by the data center anyway. You just create a ticket for remote hands and the data center staff is going to rack the server you sent them or replace a hard disk etc.
Not so true. Not all remote hands are created equal. Some are there just for simple repairs and maintenance tasks. Don't be surprised if it needs a lot more hand-holding.
Yup, I'm a bit fascinated that an entire industry is built on the drug dealer / puppy / loss-leader model -- "here; try this for a while for free; you'll like it!"
No question, it is easier to spin up an instance and let them handle the scaling. And unless you are really careful, it's then very hard to extricate yourself and provision your own servers in one or more colo sites...
I always like the control of vertical integration, but evidently people like convenience enough to support a huge industry (and there are other advantages, but I don't think cost is on the list).
> This is almost never true for software startups, who get more value from speed, flexibility, and not needing to pay a team to manage hardware. Just let [cloud services company] handle it for you until you reach that point.
This is an old myth. Today you can get dedicated servers provisioned in minutes with an API call. A competent DevOps person can manage tens if not hundreds of them with proper automation. When you have a certain baseline usage of compute, moving to dedicated can give you massive savings. And if you're a heavy user of, say, RDS, the cost difference can be even more dramatic while also gaining on performance. For ephemeral workloads, testing, etc., the cloud makes sense of course.
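As an illustration of that workflow (the endpoint, plan names, and fields below are hypothetical - every provider has its own API - but the shape is the same: ordering dedicated hardware is one authenticated HTTP request, not a procurement cycle):

    import os
    import requests

    API = "https://api.provider.example/v1/dedicated-servers"  # hypothetical endpoint
    TOKEN = os.environ["PROVIDER_API_TOKEN"]                   # hypothetical API token

    order = {
        "plan": "epyc-16c-128gb",     # hypothetical plan name
        "datacenter": "fsn1",
        "os_image": "ubuntu-22.04",
        "ssh_keys": ["ops@deploy-host"],
        "hostname": "worker-07",
    }

    resp = requests.post(API, json=order,
                         headers={"Authorization": f"Bearer {TOKEN}"}, timeout=30)
    resp.raise_for_status()
    server = resp.json()
    print(f"ordered {server.get('hostname')}, status: {server.get('status')}")
    # From here your config management / orchestration (Ansible, Nomad, k8s, ...)
    # treats the box exactly like it would treat a cloud instance.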
I'd say testing/CI is one of the best cases for renting dedicated metal because of the huge speed increase available - everyone hates slow CI - and CI workers are easy to set up, so if hardware fails (it probably won't) it's not a big deal; you don't need to scale up again immediately.
If you are only running in one cloud availability zone, with the backup plan being "we'll spin up new servers during downtime", maybe... But if you are distributed over 2-3 geographic hosting areas for redundancy, I highly doubt that any cloud application is cost-effective at any scale.
Right. I pay under 2k for server costs. And combined with all the different platforms I use and the convenience features and not having to hire my own devops person, I assure you I am saving A LOT.
There are few companies that operate at the scale at where it is cheaper to run your own servers. Once you hit that, it is a great problem to have.
Cloud doesn’t eliminate the need for dev ops. Sometimes it can help, but overall I’ve seen an increase in cost, workload, and stress.
The cloud is very much a your-mileage-may-vary situation. Just remember that cloud providers are very interested in selling you a product due to their extremely high profit margins.
Eh. Sort of. I disagree with that statement. So far the devops of managing my cloud stuff on heroku has been a few days a year. And we have a better pull-request environment setup than most places I worked with that did or didn't use cloud.
Point is, devops as a service is great for small scale. I've done the math of what it would cost to switch away from heroku, and at this point it is quadruple our current costs, because of the time investment, and that's over a few years. And then moving to our own servers? even worse.
No doubt people will be commenting here on how cloud makes sense because you don’t have to hire experts to run your own computers.
Folks…… it’s pure fiction that cloud does not require experts to run it.
Cloud is more expensive and slower often more complex, and risks major vendor lock-in if you use the cloud vendors APIs instead of open source solutions.
And if you need GPU computing you’d be crazy to use cloud GPUs which are much, much more expensive and much much slower than consumer grade GPUs.
And if you can summarise which parts of your system are incurring traffic costs then you’re beating the deliberately complex systems intended to nickel and dime your architecture every time a byte turns a corner.
AWS "experts" are one of the most expensive. And they'll be pushing for EKS and the whole AWS stack. The funny thing is, if you are using K8s, you might as well setup your own servers. These experts cost upward of $1000/day. Hiring a full time is $150k+, and that's not for the expensive locations.
> A fast modern 16 core machine with 128 gig of ram can meet a lot of needs.
True, as long as you don't care about redundancy or scalability or wasting money because you actually only use 4GB of RAM and barely any CPU.
> So much kubernetes is just overkill.
Absolutely true. Kubernetes is great in some scenarios, but pretty complex so overkill when starting. Serverless (ideally containers as a service), docker-compose and Nomad* are much better options unless everybody is already a Kubernetes pro that has all the tooling ready.
* - disclaimer, I work at HashiCorp, but I actually believed Nomad is amazing, sufficient in most scenarios, and actually better than Kubernetes in some aspects before joining and it's one of the reasons I joined.
> wasting money because you actually only use 4GB of RAM and barely any CPU.
Hetzner will rent you a massive 128GB of RAM server for ~150 bucks a month. In company terms, that is peanuts and whatever time you'd spend installing and managing K8S is pretty much never going to pay for itself, not to mention the added liability of all those moving parts.
The whole "autoscaling" thing is mostly bullshit invented by the cloud providers to make their insane margins more palatable. A dedicated, non-cloud stack sized to peak capacity will be cheaper than the majority of "autoscaled" stacks at minimal capacity.
>> A fast modern 16 core machine with 128 gig of ram can meet a lot of needs.
> True, as long as you don't care about redundancy or scalability or wasting money because you actually only use 4GB of RAM and barely any CPU.
Redundancy: you want two fast machines. More if you want them in multiple locations, but we still see a lot of people with presence only in us-east.
Scalability: You can go to 96-core per socket on commodity x86 servers now; 16 -> 96 is a lot of room to scale. You can also go up to I think 3TB per socket on Epyc Genoa.
> A fast modern 16 core machine with 128 gig of ram can meet a lot of needs.
Computation is the easy part, you can pick that kind of performance up at your local IT box store.
The hard part, the part that Kubernetes makes easy, is the rest: dealing with failing compute nodes, isolating workloads, writing load balancer configurations, keeping them up to date, setting up and tearing down volatile (=feature branch) environments...
I feel like k8s is the new Linux. It gives you so much out of the box, and you can tap into such a rich ecosystem, that the perceived complexity is well justified.
When I was doing devops consulting, my "cloud clients" were always by far my most profitable, because they were so used to that and so blind to the cost savings they could get.
Maybe the people running in the cloud are, therefore, efficient businesses? Despite using a financially costly option? Or thanks to AWS allowing an immediate setup?
Getting set up in AWS in a remotely manageable way is in no way "immediate" - a lot of my income came from that fact (and another large chunk came from fixing setups that had descended into total chaos because people "just" started spinning up resources without properly managing it)
If you want ad hoc rapid setup, you can get a managed server up and running in about the same time as an instance with AWS, from a provider that also offers colo and cloud, so you can pick and choose and move between the three depending on what is most cost-effective for specific needs. In practice, ironically, having that option makes cloud instances even less cost-effective, because being able to spin up cloud instances and tie them into your environment means you can allow yourself to have less spare capacity on your managed/colo servers.
Or they're not aware of the options. AWS has massive marketing and market share. A lot of businesses aren't even aware of AWS savings plans and spot instances. That's even why there are companies that purely offer to optimise "cloud costs" as their core business.
I'm one of those expensive EKS Experts. I encourage many of my customers to go to other clouds that are cheaper, but there's a) a "Nobody got fired for buying IBM" thing that founders want, and b) Actual value in the AWS Ecosystem of managed products for all sorts of things.
If you're using Kubernetes, you want a managed cloud storage solution and you want a managed database solution, and AWS is among the best for those. (Though, personally - I prefer GKE). The smaller VPS providers have solutions for those. DynamoDB, though? Not so much.
>And they'll be pushing for EKS and the whole AWS stack.
Nah, I'd recommend serverless for the vast majority of use cases. But otherwise, yeah it gets heavily into the AWS stack with SQS, SNS, Kinesis, S3, DynamoDB, Aurora, etc.
> The funny thing is, if you are using K8s, you might as well set up your own servers
Running k8s on bare metal is incredibly annoying and brittle. Upgrading your clusters is a constant firefighting experience, and because k8s doesn't have something like an LTS version [1], you have to keep upgrading ALL THE DAMN TIME, which often enough means you have to upgrade not just the clusters but also your resource definitions (e.g. ingressClass), or mess around neck-deep in the bowels of MetalLB because it managed to shart all over its configuration yet again.
I'm assuming by AWS 'experts' you mean third-parties? Because every SA I know at AWS will proactively tell you when to not use AWS. Half of my job at AWS is reviewing architecture diagrams and determining whether workloads are a good fit for AWS.
I've never met any AWS expert at AWS who isn't quick to tell you when an AWS service isn't a fit for your needs.
As far as I can tell, AWS cloud requires more expertise to run properly than hardware, but comes with the benefit of more flexibility and the availability of various not-too-hard-to-integrate services.
In a certain way that's the big failure of AWS and similar; in another way it's a big win for them, as it means people pay more than necessary all the time, and there is a huge consultant industry pushing people to adopt AWS (and similar) and then charging them for consulting later on.
Though this kind of pattern tends to fall apart at some point and only holds up for a while due to monopolistic market dynamics around data centers, given their high cost and customers' locality concerns.
> it’s pure fiction that cloud does not require experts to run it.
You're not wrong, however, in cloud I can just use a single "cloud engineer", as opposed to having to find a:
- Hardware specialist (someone who knows how to physically install servers, NAS, SAN, etc.)
- Network specialist (someone who knows how to physically install network cards, switches, etc. + config, setup, etc.)
- Data center environment specialist (usually can come with the data center, but they still need to be vetted)
- Security specialist
Oh and btw there are 100+ vendors who supply all of this. So not only do I need to find a, for example, a network specialist, I need to find someone who uses Nagios because my infra lead made that decision.
You need them in both environments if you're serious about security. Unless you somehow believe that being in the "cloud" grants you some kind of security by default?
> - Hardware specialist (someone who knows how to physically install servers, NAS, SAN, etc.)
You can contract this out to the datacenter itself. This is typically a one-time expense when installing infrastructure that will then last the useful lifetime of the hardware (e.g. 5-7 years)
> - Data center environment specialist (usually can come with the data center, but they still need to be vetted)
What does this even mean?
> - Network specialist (someone who knows how to physically install network cards, switches, etc. + config, setup, etc.)
Yes, true, a truly robust network setup that is colocated in multiple data-centers is probably the most difficult thing out of the entire setup.
> Oh and btw there are 100+ vendors who supply all of this.
You mean Dell? Or HP?
> So not only do I need to find a, for example, a network specialist, I need to find someone who uses Nagios because my infra lead made that decision.
You just need a linux admin on a contract. If they are worth their salt, they don't even have to be full-time.
> You need them in both environments if you're serious about security. Unless you somehow believe that being in the "cloud" grants you some kind of security by default?
I don't disagree that you need a security focused person in cloud. However, the scope of security is drastically different in cloud vs data center (and varies depending on what your data center setup is).
> What does this even mean?
What happens if your server is overheating? As I said, this is usually provided by the data center itself, however you still have to vet the vendor. AWS/GCP/Azure all have reputations about environmental factors that are rarely questioned.
> You mean Dell? Or HP?
Yes, AND Nagios, Solarwinds, Cisco, IBM, VMWare, etc. etc. and every one of their subsidiaries that have esoteric needs.
> You just need a linux admin on a contract.
Ummm...no. Just no. Have you ever had to create multiple subnets, DMZs, SD-WAN setups, plus VPN access, etc. etc. with multiple locations, all with security in mind, and full redundancy across three data centers? "just a linux admin" isn't the job description for that.
> What happens if your server is overheating? As I said, this is usually provided by the data center itself, however you still have to vet the vendor.
It's 2023. Hardware operates within spec 99.99% of the time, unless the data center is literally on fire. AWS/GCP has outages too, btw.
> Yes, AND Nagios, Solarwinds, Cisco, IBM, VMWare, etc. etc. and every one of their subsidiaries that have esoteric needs.
If you have all these esoteric needs ON TOP of needing a multi-DC redundant setup with 1000s of servers, then being able to afford 1-2 additional engineers is a complete non-factor. In the cloud, this would mean you're hiring additional cloud engineers anyway.
> Ummm...no. Just no. Have you ever had to create multiple subnets, DMZs, SD-WAN setups, plus VPN access, etc. etc. with multiple locations, all with security in mind, and full redundancy across three data centers? "just a linux admin" isn't the job description for that.
Again, if that's your use case, then we're not talking about 1 cloud engineer vs 3 non-cloud engineers. We're talking an IT department of several hundred people just for infra. It's absolutely not evident that cloud would result in cost-savings here, because at this scale, everything is on an individual scenario basis.
> Ummm...no. Just no. Have you ever had to create multiple subnets, DMZs, SD-WAN setups, plus VPN access, etc. etc. with multiple locations, all with security in mind, and full redundancy across three data centers?
>> Again, if that's your use case, then we're not talking about 1 cloud engineer vs 3 non-cloud engineers.
At my company we operate a setup like this on AWS with about 8 people total (5 infra engineers and 3 security engineers, I suppose we could also count their 2 managers), and that's not even their full job.
At my company we operate something similar to that with 2 system engineers and 2 network engineers. 2 main offices and 2 separate physical colo datacenters + lots of satellite offices and remote workers. Everything is HA, and we have a DR site that is mostly unused. Everything is backed up securely (push only) and can be easily restored (database-aware) extremely quickly if needed. In an absolute worst-case, shit-hits-the-fan scenario, there are automated tape backups stored offsite.
Hardware is easier, cheaper, and more reliable than ever, and I rarely (as the system engineer) have to go out somewhere. Hard drives still occasionally fail, but with a strong RAID and multiple hot spares your system will not even think about failure until you have a lot of failed drives (and even then you have a whole separate redundant storage array to fail over to). I would estimate someone has to drive out there every 6 months on average - often less. Rarely, a piece of RAM will fail. By the time I drive to the colo, the replacement drive or part is already there (the system automatically calls home and orders a replacement part from the vendor).
All of these systems are daisy chained together with multiple 40 or 100gig ethernet links so that even if a core switch goes down + multiple servers everything will keep running.
Does this take a while initially to set up? Yes, it does. Does it take a lot of work to maintain and upgrade? Not at all.
> Does this take a while initially to set up? Yes, it does.
This is fundamentally the difference to me. I've worked with hardware before, and if your business and load is predictable, and you understand how to architect it before going in, it's a good option. You can put in a bit of extra up-front work and get more control and lower costs.
The other issue I have run into is that developers have all worked at a startup where they had really shitty on-prem setups - just a bunch of devs hacking it together without much thought and no actual sysadmin or system engineer / dedicated ops people. That leaves a bad taste in people's mouths.
> Hardware specialist, Network specialist, Data center environment specialist
If we're talking startups here, still working on product market fit with small active user counts.. this is massive overkill.
Every startup I've been part of (except one) could've very easily been hosted from a single server under someone's desk (sure, get two for redundancy). The cost would be approximately nothing instead of a 5 figure AWS monthly bill. This isn't resume-building though, so it doesn't happen.
(Amusingly, the one startup I was part of that grew way too big for my "server under the desk" hosting model, they went to a colocated datacenter.)
> Security specialist
Yes you always need this if you're going to have anything connected on the internet.
Generally you just need two people (and then you should probably have 2 more for redundancy): the system engineer and the network engineer. Between those two, they should be able to do the datacenter work themselves (with physical assistance for racking / moving heavy equipment, which is rare). Some of the superstar infrastructure architects I have met could easily do both jobs themselves (although that is rare).
Security you should probably have no matter where your shit is hosted......
Simpler solutions like dedicated servers and VPSes work well enough, and the service provider will take care of all of the above. The problem is with things like AWS and GCP: once you fall into their wide matrix of products, you'll have a hard time escaping.
This applies only to these major cloud providers where traffic costs are like $100 per TB. With a decent VPS or dedicated server you pay $10/TB, €2/TB or 1Gbps unmetered.
I remember a time when SQL was advertised as "write programs in plain English and fire all your programmers". It ended with the situation where you can't get a job for Oracle SQL if you only know Microsoft SQL. Are we already at a point where you can't get a job as AWS DevOps if you only worked with Azure before?
Sure, cloud is usually more expensive, so is developer time. If your team is wrangling more with the quirks of your own setup instead of the standardized workflow of a paas provider, your calculation won't work in your favour anymore.
And then turn back to trying to work out how to grant IAM access to the resource that developer asked for before turning their attention to why traffic won’t come out of the VPC before trying to work out how to do compression on API gateway.
The whole time pining for the old days of running everything on Linux.
Yes, that's exactly what "colo" in the title of this article is talking about. It is short for "colocation center". You just take your server to their data center, plug it in, and let them handle all the physical operations of running a data center.
Yes, though it depends what you mean by "everything else".
There are "rent a rack" options (you provide hardware, they provide a rack in a datacenter + network switch(es) + management tooling through the network etc. and when asked will replaces RAM/Disks etc as asked). You still have to maintain the OS through depending on the service you get and what you want to do this can be easier then expected (e.g. because networking is handled through their switches/tooling this can makes things easier and if you e.g. just want to run a single application per server there are net-boot options which allow you to do so in a manner similar to running docker containers. But if you want to run your own OCI container orchistrations then it comes with the cost of you having to well do that.).
But if you're asking for a setup where you provide the server hardware and the containers and they manage all the rest, then I'm not sure it exists. It seems like a bad deal for them.
Yes, you can. I used to provide those kinds of services on retainer. Actually managing the "low level" parts of "everything else" was always a near rounding error of the cost, though. Managing modern hardware is not time-intensive. And then you put a VM/container setup on top, tie it into an orchestrator, and it still looks like a PaaS to the devs.
From the caveat at the end of the original article:
> This article doesn’t take into account other aspects that would make the comparison even more complicated. These include people skills, financial controls, cash flow, capacity planning depending on the load type, etc.
You can't handwave all this away in real life though. If you're rolling your own data center, the engineering time you need to put into managing servers, upgrades, hardware replacements, 24/7 oncall rotations etc etc is considerable and will often dwarf the costs of the hardware itself, particularly at startup scale. With a cloud provider, it's (almost) all abstracted away.
You know, you can't hand wave away all the additional costs of having anything in the cloud. You need lawyers to tell you your liability for using third parties. You need all that time and money engineering cloud solutions that work, that can scale, and aren't going cost tons of money. You need to factor all those multi-hour calls needed to get to a real person at the cloud company who isn't reading from a multiple choice book and actually knows what you're talking about. You need a rotation of people available 24/7 to know what to look for in the cloud instance and can handle things if / when they go sideways. All these costs are considerable and will often dwarf the costs of the oversimplified cloud estimate you're first given, particularly at startup scale. With an in-house team of systems administrators and colocated hardware, it's (almost) all well understood and abstracted away.
I wonder if people have actually thought about the costs you mention, or if the marketing forces that have led everyone to believe that everything should be in the cloud are why people who've never done the slightest work in a datacenter think the cloud is the way to go. I hear it all the time, but people can't articulate real reasons aside from general handwaving about how much salary a company can save by getting rid of systems administrators. But I've also worked for companies that don't even have a single systems administrator of their own, and trust me - they pay much more money for everything compared with paying for a good admin.
The curse of our industry: the majority of us are weak communicators, unable to explain why we sense the cloud is a poor choice. Meanwhile, the cloud salespeople - picked because they are adept communicators - create a comprehensive-sounding web of reasoning, playing on the in-house developers' inability to articulate the dangers they sense, and use it to snare the client.
One of my longest contracts was being on call for a company to assist with their cloud setup. Most of this is not abstracted away all that well. Having managed on-premises, colocated, managed servers and cloud services, my customers when I was consulting tended to need more of my time to manage cloud setups than similarly sized setups elsewhere. Yes, with a colocated setup we needed to rack and wire up servers, but the amount of time spent over the lifetime of a server was typically measured in a single-digit number of hours amortised over several years.
EDIT: As to why the time to manage colocated servers is so low, consider that pretty much everything has IPMI or similar now, so apart from a few hours setting up an initial provisioning server for the entire environment, on a per server basis you're connecting power, screwing on rails, connecting the network, configuring IPMI (remote reboot, remote KVM, remote sensor access), running your provisioning scripts and starting (automated) burn-in tests, and then you have a bunch of new capacity in your orchestrator at the end of it. Everything else is pretty much the same as using a cloud provider, except your instance sizes are tailored to your actual workloads. The odd replacement of broken equipment can often be handled by your colo provider. When I was racking equipment myself, I'd often install more than one server an hour, and then not touch that server again physically for several years. Monitoring is largely the same - you need to be able to migrate away from failing instances either way, only now some of those failures might mean hardware has failed and you have a few more indicators in your monitoring system matching your IPMI sensors.
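To give a flavour of how little per-server work that amounts to, here is a rough sketch (BMC addresses and credentials are placeholders; the ipmitool subcommands are the standard chassis ones) of the "set it to PXE boot, power-cycle it, let provisioning take over" step:

    import subprocess

    def ipmi(bmc_host, *args, user="admin", password="changeme"):
        """Run one ipmitool command against a server's BMC over the network."""
        subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", bmc_host,
             "-U", user, "-P", password, *args],
            check=True,
        )

    def reprovision(bmc_host):
        # Boot from the network on next start, then power-cycle. The PXE server
        # hands the machine its image; post-install hooks register it with the
        # orchestrator and kick off burn-in tests (that part is site-specific).
        ipmi(bmc_host, "chassis", "bootdev", "pxe")
        ipmi(bmc_host, "chassis", "power", "cycle")

    for bmc in ["10.0.10.21", "10.0.10.22"]:  # hypothetical BMC addresses
        reprovision(bmc)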
I think there's a lot of FUD spread by Cloud providers and people who make money mainly out of it ("Cloud Engineers", consultants, etc.) about how hard it is to setup and manage hardware + software. In many cases it is actually easier to do it for a bespoke solution than some cloud Frankenstein.
I worked in Datacenter Ops for a decade+, running 3-6 private datacenters as an HPC/Nix admin wearing many hats. The amount of work involved with physical plant, circuit management, staff coverage, weather events, regulation, travel, hardware upgrades, replacement...the list was seemingly endless.
That's also not to mention hopefully you never have to staff augment for junior-level tasks. Try hiring someone to correctly rack and stack an entire rack. You just end up re-doing it yourself or paying CDW to ship you an entire pre-assembled rack the next time.
Not on the same scale but in my previous company we observed the same thing (though the other way around since we moved from self-hosted to cloud).
We had a team of 4 managing the infra for an app (82 million users, 30 million DAU) and spent around €500,000 per year on the data centers, plus this team's salaries, plus some other expenses like travel to the datacenters, some new hardware, etc. - let's say around €1,000,000 per year.
At some point upper management decided to move to the cloud, and we ended up in a worse situation: the promise of easier resource management was unfulfilled, since the same 4 people were now managing the cloud, so no cost reduction on that side. Most important, cloud costs were 4-5 times higher than self-hosting.
I still wonder why management decided to go that way
I was PM for a cloud hosting solution that we were originally running colocated. We moved it all to AWS for about 5x the costs after you calculated capex vs opex etc.
"The Cloud" allowed our customers to set up their applications with High Availability and PITR (via RDS, building that ourselves at scale would have been problematic). It also allowed us to launch services anywhere in the world. Setting up data centers globally would have been hell.
The product was still easily profitable despite the costs, so I think we made the right move.
We are nearly two decades from the launch of EC2 and I still need to explain to people (technical and not) that the cloud isn’t cheaper once you get to a certain scale. Everyone is always amazed.
Cost is only one aspect and infra should know that moving to the cloud will usually be more expensive than running on colo. I assume there were other aspects that played a role in this decision like elasticity, scalability and high-availability which are easier to go for with a cloud provider.
This keeps being brought up since the early days of AWS. Yes, the cloud is more expensive. It simply adds another layer of management, scalability, and flexibility you would not be able to build yourself. And there is a layer of profit for the cloud provider as well. Whoever was told the cloud would be cheaper was being lied to.
I've built cloud, colo'ed and hybrid systems. Every few years this would come up. Now, so many cloud services we use are connected to the cloud providers we don't even need to spin up SSO/authentication servers because it's simply built in at this point.
Something which sometimes gets lost from a US centric perspective is that cloud lets you get servers much closer to the end user without having to establish relationships with hosting companies and colo facilities on another continent. Shaving 100ms off of each request can be useful in a lot of circumstances.
And if latency is important, you need that origin to be close to the user.
The front end is the least latency critical component in a lot of stacks, because the user downloads it once and can cache it on their machine, whereas every single action they make using that UI is subject to the latency of going to your back end
When you consider that 2U size servers with 128 cores and 2TB of RAM are now a commodity thing you can buy for not a totally absurd price, you can serve a real metric shitload of end-user-facing https content and applications from literally one item of hardware now.
I'm not saying put all your stuff on literally one box, but colocation space for even a really high traffic thing might not exceed something like 1 44U cabinet on the west coast (Hillsboro), 1 cabinet in Chicago, and 1 cabinet in northern VA or in a NJ datacenter.
Why don't companies just use hosting services such as Hetzner? Way cheaper, especially for bandwidth; you don't have to pay the hardware costs, and you can scale.
I love Hetzner and keep a couple of machines running for prototypes and running scripts. But I can't use it for production or as my primary remote development environment because, for someone in India, the latency is way too high. I really wish they had an Indian datacenter or any similar alternative. The ones in the country are at least 4x the price with still worse specs and much worse reliability and service. If anyone knows a good option in India, I would start using them for me and my clients in a heartbeat.
Until then, I am stuck with AWS.
Plenty of companies do that - rent bare metal, own bare metal, etc. It's just not a very visible practice and not backed by armies of marketers.
Another aspect, not mentioned in the article, is that you can arrange to have machines with hardware resources specifically tailored to your workload. e.g. AWS can't get you a box with 20T of NVMe storage. So if you have a workload that needs 20T of fast storage, hosted on AWS the system architecture is different : you're using NAS which has much higher latency so now you're probably using more machines, which is less efficient, and of course much more costly.
As an old-school co-founder, honestly, I just enjoy managing the infra. Sometimes there are issues and outages, but I'm offline maybe 3-4 hours per year - no big deal. It's about 1/4th the price (for the equivalent performance).
Over the weekend I created a globally distributed CDN pulling data from a globally distributed, auto healing database as a cache. It took a few hours then I tore it all down in 15 minutes, and it probably only cost a few dollars.
We're a small company that used to have a full rack in a colo, and we completely migrated to AWS 6 years ago. Our cloud costs are ~20x what our colo cost was.
There are things we use in the cloud (DynamoDB, Aurora, CDN, etc.) that I just don't want to have to build and manage on my own. For us, the managed services are very much worth the extra cost.
I couldn't believe how much better life is without having to diagnose failed DB backups.
People love to generalize this stuff and I still work with hardware all the time (gasp!)
What it really comes down to is this - you cannot just make a blanket assumption on Cloud vs Self Hosted
Generally, however, I believe that the hybrid approach is going to be best for most larger companies: any highly variable workload (that benefits massively from autoscaling in the cloud) is going to be better off renting servers from AWS or similar, and any very predictable workload (especially one that requires lots of compute or GPU power) will be cheaper to run on your own (think something like a GPU farm for a 3D animation studio).
And of course there is nothing stopping you from doing both: having your very predictable, high-power workloads in your own DC on your own managed hardware, and then autoscaling as needed into whatever cloud you want.
This shit always comes full circle - we had mainframes and terminals (basically a mini cloud), the pendulum swung for a while to personal PCs, and now we are swinging back again. There's no all-encompassing solution for everyone.
There's a major element missing from these discussions, and I'm stunned I don't see it in the comments either. And that is,
...what happens when things go wrong?
One of the major advantages of cloud, be it AWS, Azure, GCP, etc, is the ability to bridge across failure domains. Properly architected, a workload can be resilient in the face of data center outages and circuit breaks.
To me, these savings from operating in a single colo are living on borrowed time. Because they're one data center fire or cable break away from failure. Can the workload survive a sudden outage for hours? A day? Will it survive a loss of the data center and implementation of the business continuity plan? (There is a plan, right?)
Every organization is different of course, and not everyone needs active-active failover, or 4-nines of availability, etc. But when figures of $400m start getting thrown around, I can't imagine it's in a state where "oh, we'll just rebuild it elsewhere" is the plan.
Show me where they achieve these savings and still have a viable DR/COOP, and I'll be impressed.
Architecting a multi-region app on AWS is not trivial. All the basic building blocks AWS provides make running Multi-AZ very easy; however, very few AWS services support multi-region deployments and failover out of the box.
I appreciate this argument, and let me give the counter-example. 99.5% uptime is good enough for nearly all businesses.
I run a small IT service company - we provide a hosted service in the insurance industry.
We host on-premise, in our physical office (well, across 2 offices for DR). I personally have a history in infra management, so I generally enjoy it. You're right that sometimes things go wrong. Last year, we were down for about 3 hours. The year before that, there was a 6 hour outage. We've had failures in power, ISP, networking, server hardware, software, etc... but we've matured to a point that we can handle them quickly.
Most importantly, these outages are understood by the business and the users. They know going in that they get a credit for outages, and that it won't exceed 1 business day per year (usually much less).
Because we accept this 0.5% downtime (it's actually much lower in reality), we pay 1/10th what I've been quoted by cloud services - probably even less.
>99.5% uptime is good enough for nearly all businesses.
I don't know what your business is, but as a presales engineer in the ecommerce field, if our SLA was 99.5% we wouldn't even make it past the first round of vendor selection.
> I don't know what your business is, but as a presales engineer in the ecommerce field, if our SLA was 99.5% we wouldn't even make it past the first round of vendor selection.
As you know, the SLA isn't the same as the downtime. It just sets the threshold where you'd start paying back some credits for missing the SLA (and only to customers who notice and ask). Which is often cheaper than building the infrastructure to actually support the promised SLA.
You can also have different levels of availability for your customer-facing and critical infrastructure versus the infrastructure running your internal services and less important stuff. No need to have a bunch of 9's for your internal test environment or for the systems managing app deployments to the fleet of internal laptops.
Our customer contracts obligate us to inform all customers about any outages. We can’t just hope they don’t notice and only pay the ones that happen to.
Out of curiosity, how do you handle cases where the cloud provider lies to you on their status page? Do you pass on the lie to your customers or do you have your own monitoring in place?
The same that happens when things go wrong with AWS (just cheaper).
One can of course engineer for any level of redundancy the business needs, whether on colo or cloud. Or not, as in most cases. We had four multi-hour outages due to AWS being down last year.
It seems to me that having colo machines for your base load would be the cheapest, having cloud machines available for flexibility would alleviate scalability / business continuity concerns, and having them all on cloud networking would allow integration of the managed data services, etc.
Isn't there a "throat to choke" aspect of this as well? While not actual indemnification, leveraging a cloud vendor at least allows for a "but AWS/Google/Azure" was down response to your customers. And cost offsets can come into play as well in the event of a failure, or even missing an SLA, in some cases. You don't get that when you roll your own.
Not to mention certain, tested disaster workflows are available such as global tables/databases/etc., alternate region backups.
If your solution is not mission/legally critical, and outages are simply "uncomfortable", then rolling your own seems a viable alternative. But as your customer base grows in size/maturity, the costs can start to look a little less important. You just bill it through to the end customer.
It's true even for hobbyist-scale. I can rent 1/3rd of a rack at Hetzner facility in Helsinki for €120/month. They'll even throw in a BGP session, 1gbit uplink and 2TB of traffic for free.
Sure, I need to put a physical server in there, but it's not hard to get something decent for €300-ish on ebay.
And €120/month is nothing for rack space when comparing to how easy it is to rack up an AWS bill. Also, it's 14U space, so scaling is also quite trivial.
And this is Hetzner, getting a 2U space in some local datacenter might be even cheaper.
I'm actually strongly considering doing this, cause currently I'm renting 2 dedis from them at about €90/month. If I need more capacity, it's definitely going to be cheaper in the long run to just colocate.
Out of curiosity, why go with a rack rather than just renting a server from them? At hobbyist scale this would be more cost-effective no? FYI they can give you 10Gbps uplinks on their servers too for an extra fee, no need for colo.
I think the perceived laggard side is less enthused to engage in the debate.
On lots of public forums, there will be plenty of characters telling you to use Rust/Crypto/Serverless/AI for everything. Fair enough, they are cool technologies but don't blow up your business for them.
Everyone is going to have a different answer to this (and that answer may evolve as the business does).
All business decisions like this are about what level of abstraction is appropriate for us (today) and what level of specialization is conducive to our strategy[1] (today).
In some cases the infrastructure outsourcing question won't even be particularly important - a wash essentially regardless of the path taken (in the bigger picture of the enterprise's mission anyway). Other times, it'll be - effectively - paramount.
[1] Or current priorities, concerns, strengths, weaknesses, biases, pressures, risks, opportunities, competitors, etc.
Cloud was always more expensive (in the short term) than self-managed/collocated servers.
How is this news?
In cloud, you pay for availability, for ad-hoc instances without waiting for a month to install new servers and more.
You only need to hire “software” DevOps, rather than people who neatly lay your cables, journey to DC to replace disks, etc, etc, etc.
Pick a good, cheap cloud. It is a commodity product. Just because AWS is expensive, you don't have to pick AWS.
If going solo is an option, you likely do not use any of them “fancy” AWS managed services. In that case, you can just as well go with any of tens of other Cloud providers.
DHH sang the song and some people take it as a gospel. (eyeroll)
"You only need to hire “software” DevOps, rather than people who neatly lay your cables, journey to DC to replace disks, etc, etc, etc."
Lol, what year do you think this is? 1998? There are colo facilities around the world, and you have 'remote hands' to take care of everything related to hardware.
You just need ansible/xyz to deploy your software; that's all it takes.
The last time I have seen such an operation was back in 2009.
Also, even today you can buy a barebones place in a rack, with climate control and power supply. I know companies who do their own disk replacements and cable management, even in 2023.
I am fully aware you can have mostly managed data centre, like for example, Leaseweb, OVH, Servers.com. But if you have specialized hardware requirements, sometimes you have to wait a few days/weeks for installation.
I am old, but I don't think we are back in 1998.
A question - doesn't using any US cloud provider mean that your data is accessible to all sorts of 3 letter agencies without any warrant and informing you about the fact it was accessed?
Yes, but it's much harder to have a legally enforceable SLA with your in-house team and much less fun to threaten to sue yourself.
[On the other hand, in-house gives you greater control. If you're good (enough) at alignment and execution, you can optimize your infrastructure for your use case and either save the organization money or enable more money to be made through enhanced capabilities.]
The article notes that it's unclear whether the analysis considers discounts from reserved instances, but then brushes it off. Considering that reserved instances save about 40%, that is a huge deal.
Next, the comparison doesn't seem to account for the cost of personnel to maintain the self-hosted infrastructure, not to mention replicating the security and other layers of infrastructure management that are built into AWS.
Next, Singapore sits atop at least 5 different pan-Pacific cables. They are connected to HK, Taiwan, Japan, Vietnam, etc. So this company uniquely benefits from redundancy that companies not in Singapore may not enjoy.
That said, I wouldn't put all my eggs in one region's basket, let alone one DC! It doesn't say, but the analysis doesn't appear to include the cost of maintaining a subsidiary company in another region for HA.
Basically, this analysis, or at least the reporting of it, leaves a LOT to be desired. As it is, it is just a fluff piece that lets you see what you want to see. If you don’t like cloud, it confirms your bias. If you like cloud, the article looks like junk you can ignore.
> * cloud's automated management requires way less man-power to operate
My experience here is the opposite. I did contracting in the devops space for years, and those of my clients who spent the most on ops were consistently those using cloud providers.
My experience is the opposite: Those who see it as important and core to their business are more likely to own. The rest tend to see cloud services as a way to reduce capex. Never mind that you can rent or lease managed servers and still save.
It was almost uniquely down to complexity and e.g working around lack of flexibility (fitting workloads to cloud services and instance sizes rather than fitting the environment to the services).
We'd always run everything under an orchestrator, whether on prem, in colos or in clouds anyway, and with the same monitoring and same redundancy.
My problem has been hiring a person that has experience with colocation and knowing all the ins/outs of hardware maintenance and keeping DC operations alive.
It's a skill set that disappeared when AWS "engineering" became a thing, unfortunately.
And this is exactly why I am making sure I keep this skill set (I am relatively young). Not that I am one of those people who is all-in for on-prem and anti-cloud - both have their place. But I think knowing hardware will eventually become a more valued skill once all of the people you're talking about finally do retire.
It’s definitely working with hardware, but more of working with operations that I’m concerned about! What needs to be done at what regular intervals. In addition, what are things to inspect at potential hosting sites, etc.
As with every utility, at a certain point of scale it makes sense to cut out the middleman and do it on your own. McDonald's, for example, owns potato fields.
In my current workplace, the cloud really saves us money. I would estimate $500k a year.
The problem is not having the backups, it's restoring them. How do you restore backups when you have no servers to restore them on because they all burnt down?
You trivially and quickly lease them from Hetzner, OVH, or literally any of the other shops that lease out dedicated servers? Or large cloud instances.
I don’t understand why you’re artificially constraining this? When we had colo’d servers, we still used cloud services where it made sense (especially S3, scaling storage sucks). I don’t think people are arguing for never using anything but your own equipment.
Sure, but it’s not just the hardware. The (integrated)services and reduced development time play a role as well. And help offset some of the HW cost difference.
“Lift and shift” is always more expensive. That is, you will always pay more for cloud servers than you will for on-prem at a certain level of scale.
Lift and shift is not really getting the cost benefit of cloud though.
As applications get re-architected with serverless technologies like lambda, costs are optimized. The exact amount of resources will pop in and out of existence for the exact amount of time they’re needed.