Is a billion dollars worth of server lying on the ground? (cerebralab.com)
330 points by george3d6 on Nov 2, 2020 | 331 comments



He kind of touches on this in point IV, but never mentions it specifically: the cost comparison is not AWS vs. OVH. It's fragment-of-EC2 vs. OVH.

If you need a predefined small number of VMs and no other functionality, it would be silly to go with AWS. But on the other hand, if you want a set of servers of a given class spawning on demand, with traffic coming in via load balancers, with integrated certificate and DNS management, with programmable lifecycle hooks, with integrated authn/authz, with full audit logs and account management, with configurable private networking to other services, etc. etc. ... You'll pay more than the price difference for someone to implement all of that from scratch.

Compared to this, many of the other points sound like conspiracy theories. Meanwhile, people either don't know they can do better, or use AWS because they want more features.


AWS gives you all the things you'd need to scale, without heavy up-front costs. There's a natural path from small instances -> bigger instances -> load balancers/ELB -> reserved instances (or spot if it fits your workload). For a smaller company, any savings you'd get from owned servers would be offset by much higher devops costs.

Plus, as mentioned, you get a vast menu of services to choose from, all of which are managed. Just in terms of databases alone, you could go with RDS (raw MySQL/PG), Aurora, Dynamo, Elasticache/Redis, DocumentDB (Mongo) and more. Plus managed backups, failover, multi-AZ, security, and connectivity built in.

If your team is filled with devops engineers, then, sure, go with the non-AWS route. But it's a lifesaver for agile companies who can't afford a lot of infrastructure work and need easy levers to scale, and can afford a modest markup.


> If your team is filled with devops engineers, then, sure, go with the non-AWS route.

This seems backwards to me - running a simple VPS on something like OVH/DigitalOcean/Linode is a matter of creating the instance, setting the size, and setting up your server software. Super simple, and likely about the same complexity as setting up a dev environment.

Setting the same thing up on AWS requires slogging through the documentation of their many services, deciding what makes the most sense for you (EC2? ECS? EKS? Lambda? Fargate?), and then configuring more options with less-friendly UI than any other provider I've seen. If you can fully wrap your head around all those services, create a stack that makes sense, and maintain it long-term, you're probably well on your way to being qualified for a DevOps engineer job. I can spin up a quick DigitalOcean droplet for a new project in less than half the time it'd take me to set up an EC2 instance, configure the exact amount of storage I need, and pull up 2 additional browser windows to determine the specs and pricing of different instance sizes in different regions.
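To make the contrast concrete, here's a rough sketch of both from the CLI; the image slugs, instance types, IDs, and key names are placeholders, not a recommendation:

    # DigitalOcean: one command, size and image named inline
    doctl compute droplet create my-project \
        --region nyc1 --size s-2vcpu-4gb --image ubuntu-20-04-x64 \
        --ssh-keys <your-key-fingerprint>

    # EC2: you first need to look up an AMI ID, key pair, security group and subnet
    aws ec2 run-instances \
        --image-id ami-0123456789abcdef0 --instance-type t3.medium \
        --key-name my-key --security-group-ids sg-0123456789abcdef0 \
        --subnet-id subnet-0123456789abcdef0 \
        --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=50}'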


> running a simple VPS on something like OVH/DigitalOcean/Linode is a matter of creating the instance, setting the size, and setting up your server software. Super simple, and likely about the same complexity as setting up a dev environment.

Until that server goes down.


Sure but sometimes that’s ok. Everything goes down sometimes. You need to figure out how much effort you’re willing to expend for each extra “9”.

I wonder what percentage of all sites hosted at AWS are actually ready for a zone to fail.


# uptime 01:43:36 up 666 days, 11:42, 1 user, load average: 0.00, 0.03, 0.01

From an OVH VPS. Funny thing, it's 666 today. YMMV

# uptime 01:47:03 up 640 days, 6:53, 2 users, load average: 2.39, 2.62, 2.73

From a Hetzner dedicated server. YMMV

# uptime 01:48:11 up 482 days, 20 min, 2 users, load average: 1.53, 2.10, 2.78

From a Leaseweb dedicated server. YMMV


Please patch your systems.


Why would you leave a server running that long? Is that best practice? I'm by no means a sysadmin, but I do monthly scheduled reboots, because a monthly reboot verifies that the server actually comes back up correctly, and you get a more thorough fsck.


Also not a sysadmin, but regular reboots are a must-have. Not only do they test that the machine comes back up, they also ensure that your kernel and other services are patched. Of course, the services can also be restarted separately after an update, but live-swapping the running kernel for a new one is still very uncommon.
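For what it's worth, a minimal sketch of that monthly cadence on a Debian/Ubuntu box might look like this (the schedule and the cron.d filename are assumptions; many people use unattended-upgrades plus a reboot window instead):

    # /etc/cron.d/monthly-maintenance (hypothetical file)
    # 04:00 on the 1st of each month: pull updates, then reboot with a 5-minute warning
    0 4 1 * * root apt-get update -qq && apt-get -y -qq upgrade && /sbin/shutdown -r +5 "monthly maintenance reboot"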


Digital ocean droplet:

03:48:56 up 238 days, 15:04, 1 user, load average: 0.00, 0.01, 0.05


I'd say the risks of a server going down are about the same as an AWS setup going haywire at that level.


> Setting the same thing up on AWS

Having run a simple web server + database on both EC2 and Digital Ocean I disagree. The two services are nearly identical at that scale.

In fact I don’t see any significant differences between any of the proper cloud providers at this level. The cost & functionality are very nearly the same.


Then you remember that for years, Stack Overflow ran out of a couple of well administered servers. YAGNI. KISS. People forget the basics because "infrastructure astronautics" is fun, and it probably helps make a beautiful resume, too.


While infrastructure astronauts are typically money wasters padding their resumes, I don't think SO is the best counterexample.

SO's read-to-write ratio is enormous. While they talk about a handful of different bare-metal servers, the vast majority of their hits are handled by their caching layer, which is not one of those bare-metal machines.

Their write loads are not explicitly time-sensitive either: if there's a delay between a successful post and cache invalidation, it's not a big deal for the majority of their traffic.

Not every model is quite so forgiving. But even then there's a lot of stupid wasteful "infrastructure" made by people who think they're going to be Google, or at least tell their investors they'll be Google.


Well, it's an interactive database backed web site. That describes like maybe 90% of the things running on AWS.

StackOverflow is useful because it reminds people that machines are fast, and you probably don't need that many of them to scale to large sizes. StackOverflow is used by the entire global population of developers more or less and it runs off of one large MS SQL Server + some web servers. No auto scaling (not enough hw expense to be worth it), no need for fancy cloud LBs etc.

Sure maybe your service is gonna scale to more than the world's population of developers. Great. But ... a lot of services won't even go that far. For them it's hard to conclude it's really needed.

To pull off a StackOverflow you do need skilled sysadmins though. As the article points out, a surprisingly large number of people who call themselves devops don't really know UNIX sysadmin anymore.

https://stackexchange.com/performance


Autoscaling is overused anyway. It's cheaper and much less complex to have 200-300% of average capacity running 24/7 on dedicated servers (OVH/Hetzner, whatever) than trying to scale up and down according to demand.

Changing capacity automatically always has the potential to backfire. And as AWS & Co. need to keep those servers running during times of lower demand, there's no way it's cheaper unless you have really unusual traffic patterns (and even then it's probably not).


You're not wrong, but it's worth noting that SO as a case study for site design has some important caveats. Beefy hardware is great, but aggressive caching and setting sane expectations around write latency can be a massive scaling win.


I would guess SO had a relatively small number of employees and their servers were running a pretty straight-forward CRUD app and a database. Comparing that to the heterogeneous workloads that large organizations deal with is a bit silly. No doubt there's still a lot of fat to trim in those large organizations, but trimming that fat is rarely their largest opportunity (much to the chagrin of those of us who like simple, elegant systems).


My guy, you literally made me laugh. How is this https://stackexchange.com/performance different from any of the so-called workloads large organizations are running? You've got your app servers, DB servers, load balancers and failover servers. Pretty standard setup, yet SO is running on bare metal. Resume-driven development and everyone thinking they're Google or FB has both killed and made money in our industry.


> how's this https://stackexchange.com/performance different from any so called workloads

StackExchange is largely a CRUD app. High volume of tiny requests that hit the app layer, then the database, and back. Other organizations have lower volumes of compute-intensive requests, async tasks, etc.

With respect to the size of an organization, the cost of coordinating deployments and maintenance over a handful of servers grows with the size of the organization. It frequently behooves larger organizations to allow dev teams to operate their own services.

None of this is to say that there isn't waste or that there aren't poor decisions throughout our industry; only that it's not the sole factor and SO's architecture isn't the ideal one for all applications.

> my guy

I'm not your guy.


It is a world top-50 site by Alexa.

Comparatively speaking, even comparing them to the same kind of CRUD app and DB, most are using 5 to 10x more servers with 1/2 to 1/5 of the traffic.


It is a large site by traffic, but I would guess the traffic is heavily read-only. Managing workloads with more data mutation introduces different complexities, which means you can't just cache everything and accept the TTL for writes based on cache invalidation.

edit: To be clear, I'm not saying SO isn't an achievement, but it's one type of use case that yields a really simple tech stack.


Their stats are here:

https://stackexchange.com/performance

Their DB handles a peak of 11,000 qps at only 15% CPU usage. That's after caching. There are also some ElasticSearch servers. Sure, their traffic is heavily read-only, but it's also a site that exists purely for user-generated content. They could probably handle far higher write loads than they do, and they handle a lot of traffic as-is.

What specific complexities would be introduced by an even higher write load that AWS specifically would help them address?


> Comparatively speaking even compare them to the same CRUD app and DB, most are using 5 to 10x more servers with 1/2 to 1/5 of the traffic.

No doubt, but how large are those other CRUD app organizations? Do they have a staff that is 20x the size all trying to coordinate deployments on the same handful of servers? What are their opportunity costs? Is trimming down server expenses really their most valuable opportunity? No doubt that SO has a great Ops capability, but it's not the only variable at play.


And you can probably cache all the data you need for the next 24h in a few terabytes of RAM, _if_ you knew what that data was.


I was thinking more of the ordinary startup than of large organizations.


SO also ran on .NET, which is a far cry from the typical startup server load running on Python, Ruby, or PHP.


What do you mean?


I guess that without a lot of optimizations, .NET will be much more performant. The lower the performance, the more servers you need and that will make it harder to manage your fleet.


Yes.


Another issue is that people don't understand how powerful modern hardware is. There are modern systems that process fewer transactions per second than ones from the 1970s.

Just look at the modern SPA. They are slower than the ones from 10 years ago, even though the JavaScript VMs are much faster, plus all the hardware gains. Why does it take Twitter and Gmail a few seconds to load?


I don't doubt certain mainframe apps of the (late) 1970s could beat the TPS of a generically frameworked app in certain situations, but do you have any real numbers/situations/case studies to back that up?


Can't remember the name of the system that did 4k, but I found a paper about Bank of America processing 1k a second.


When you look at something like how Stack Exchange moved physical servers to a new datacenter, you can see where AWS benefits you.

Not all startups have the server and networking know-how to pull that kind of stuff off, or even set it up in the first place.

https://blog.serverfault.com/2015/03/05/how-we-upgrade-a-liv...


It's not like AWS services themselves require zero knowledge to use (especially if you don't wanna be billed massive amounts).


That's the nice thing about Hetzner, OVH and co. You don't need colocation but can instead rent servers monthly. That way you never have to bother with physical hardware; the knowledge needed is purely on the software side.

I also think colocation or own datacenters are a poor fit for most. But dedicated servers are underrated. They can be offered much cheaper as it's simply renting standard hardware and you don't need any of the expertise or size you'd need for colocation.


Fedora just did it for all the infrastructure servers this summer:

https://hackmd.io/@fedorainfra2020/Sybp76XvL


> AWS gives you all the things you'd need to scale, without heavy up-front costs

A startup doesn't need AWS right off the bat. Planning for scale right from the beginning is a way to quickly bleed $. Of course, if you have VC money, why not spend that cash, right?

Where I work we've started non-AWS and have continued non-AWS. We don't have a team of devops engineers, but rather a team of engineers who _can_ do devops. I dread the day we need to move to AWS, but it's much easier moving to AWS than off it.


Yeah but that's what capex based stuff (like buying and colocating) is: planning for scale. With AWS you think you need a robust DB and shit and you've made a decision that can be undone in minutes.

You think you want Elasticache? It's a fifteen-minute operation. In that time your devops guy won't even have downloaded the binary and figured out the docs.
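Give or take the exact flags, that path looks roughly like this (the cluster ID, node type, and security group are placeholders):

    # Spin up a single-node Redis cluster; all identifiers are placeholders
    aws elasticache create-cache-cluster \
        --cache-cluster-id demo-redis \
        --engine redis \
        --cache-node-type cache.t3.micro \
        --num-cache-nodes 1 \
        --security-group-ids sg-0123456789abcdef0

    # Then read back the endpoint once it's available
    aws elasticache describe-cache-clusters \
        --cache-cluster-id demo-redis --show-cache-node-info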


As I wrote further up, there's not just colocation and AWS. Dedicated servers are a great fit for many start ups and need no capex. You just rent hardware monthly like you lease most of your office equipment. Much cheaper than AWS, no hardware expertise needed and even (manual) scaling over time works quite well, it's easy to add servers within a day or so.

Sure, you'll always have idle capacity, but this way you could use it. With AWS, Amazon runs that idle capacity and charges you for it.


> With AWS you think you need a robust DB and shit and you've made a decision that can be undone in minutes.

If it can be undone in minutes, it isn't much of a decision. A service that can be enabled or canceled on a whim is unnecessary.

Realistically, analyzing the guarantees offered by a cloud platform, your corresponding requirements, and how everything is supposed to work is going to take days, and actually developing and testing disaster recovery procedures is going to take even longer.


You don't necessarily need to plan for scale from the start. Often you just need a few servers, a load balancer, a database, some file storage, internal networking, user permissions, and some level of security/firewall. That is very easy to set up on AWS in a day or two, and you don't need all your engineers to simultaneously be devops experts.

The scale can happen once you've validated your startup, and when that happens it's a lot easier to just turn on 30 more EC2 instances than to get on the phone with Dell.


You probably don't get on the phone with Dell. There's a large area in between EC2 and racking your own boxes in your own datacenter. You can buy dedicated machines from OVH or similar firms with a lead time of a few days, and it's a very rare firm that can't predict its load a week in advance... even a fast-growing one.

Look at it like this: GitHub mostly ran in their own datacenter, using AWS only for spillover capacity. They scaled just fine. Virtually no apps have the problem of runaway growth that they can't handle without instantly provisioned resources.


There are some merits to AWS, I can agree with that, and there comes a point where a startup outgrows bare metal / cookie-cutter VPSes, but I disagree with "AWS is trivial" and that it takes a day or two to get things set up. For a basic setup like getting a few servers behind a load balancer and a database - sure - and I'd argue that services like DigitalOcean and Linode are actually easier to set up than AWS for basic services.

To actually do more advanced stuff (the thing that AWS is good for) and utilize tools such as Terraform, you'd essentially need to hire engineers that are experts in AWS in addition to engineers who can do devops, as there's only so much "magic" AWS can provide.


> A startup doesn't need AWS right off the bat. Planning for scale right from the beginning is a way to quickly bleed $. Of course, if you have VC money, why not spend that cash right?

Especially when the alternative is to hear a lot of, "What? You don't expect to scale? I thought you took this seriously."


OVH and Hetzner could probably provide you with at least 100 top of the line servers within 24h (just guessed that, it's probably more). With a reasonable setup, that's much more than any start up will need to scale.


>can afford a modest markup.

I'm with you until that statement. AWS is nothing approaching "modest" in its markup: 20% minimum, and typically much higher if you know how to negotiate when purchasing your on-prem gear. And if you happen to be a shop that sweats assets for 7+ years, that number starts being measured in hundreds of percentage points.


And on bandwidth their markup tends towards infinity - same on Azure and GCP.

As an example, 20TB of egress bandwidth will cost you around $1,750 on Azure, almost $90/TB! Your typical server/VPS provider gives you that kind of bandwidth allowance for free, because it costs them close to nothing.

I have a feeling almost all complaints about cloud costs would disappear forever if they'd only stop gouging customers on bandwidth.

The likes of Hetzner and OVH are starting to offer more cloudy features, like load balancers and block storage. My hope is that eventually they become viable alternatives to the big 3, and they are finally forced to reduce their ludicrous bandwidth prices.


The term 'roach motel' comes to mind when I see their network traffic rates.

"Come on in; there's no commitment. Pay as you go, and did I mention our ingress rates are free! Bring us your data and give the cloud a try. It's the future, you know..."

"Oh you're leaving and want your data back now?"


>20% minimum and typically much higher if you know how to negotiate when purchasing your on-prem gear.

A 20% premium on a $10k/mo server bill is an incredibly small premium to pay versus a single FTE.


I have yet to move a single customer to Amazon that fired the FTE who was managing on-prem infrastructure and chalked it up to a cost-savings move. The idea of cutting headcount is a fallacy. In small shops the guy that was managing on-prem infrastructure is now doing that (because it turns out you still need switches and phones and routers and firewalls even if you host things in AWS) as well as managing the AWS infrastructure. In large shops you're typically replacing "cheap" headcount (the guy who racked servers) with someone significantly more expensive (a team of cloud architects).


We're a small company that avoided the hire. And we don't bother with a firewall at our office -- the only things on the office network are laptops, phones, printers, and Sonos.

Basically, if you model senior eng time as $5k/week -- not even counting the opportunity cost -- you'll understand why AWS.

While I definitely think AWS is very expensive for the servers, it is not expensive overall. Again, setting up CloudWatch in an afternoon, or dumping tons of heterogeneous logs into S3 and querying them with Athena, or spinning up full environment stacks (front end pool, back end pool, load balancers, databases, VPCs, VPNs, etc.) is an enormous cost savings vs. finding, installing, configuring, and maintaining batches of software to do the same.
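As a rough sketch of the S3 + Athena flow (bucket, database, and table names are placeholders, and you'd normally define the table via Glue or a CREATE EXTERNAL TABLE first):

    aws athena start-query-execution \
        --query-string "SELECT status, count(*) FROM logs.access_logs WHERE day = '2020-11-02' GROUP BY status" \
        --query-execution-context Database=logs \
        --result-configuration OutputLocation=s3://my-athena-results/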

edit: pay an eng $180k. Fully loaded, you're looking at $220k plus. Add in either recruiting time or a recruiting fee that probably runs us $2k/mo amortized over the first year on the low end. Add in the fact that the first 6 weeks weren't super productive and also absorbed tons of time from other SEs to get this person up to speed. Add in PM time bringing this person up to speed on our product.

Yup, $5k/week.


Or, you know, you could have brought in a consultant part-time. If you're small enough that you can handle an AWS setup without someone full-time, you could run on a managed hosting setup for less, even including an external consultant charging ridiculous hourly rates.

Clients on AWS were always my best ones, because they were used to paying so much over the odds I could set my rates much higher.


Yeah, I wanted an extra person on our desperately understaffed team, so I proposed I cut about a million bucks of spending year-on-year from AWS.

They didn't want to increase the head count. The AWS spend was a different silo/team/contract.

Corporate accounting is insane.

It almost smelled like "well, we negotiated this great AWS discount, so it doesn't bother us that we are spending an extra million. We're getting it at a DISCOUNT!"

Is there some MBA trend where every headcount is some ticking time bomb of liability that is 10x worse than their yearly salary?

You spend 150k on servers, they don't improve over a year.

You spend 150k on a decent employee, not even rockstar ninja 100xer, and they will deliver useful, although sometimes hidden, productivity and optimization.


On top of that, add the opportunity cost of spending all that time managing and setting up on-prem infrastructure vs. solving business problems... there's a good reason AWS and the like have millions of customers today.


It's such a meme on this site to compare costs of AWS to hardware costs as though the hardware operates itself. I'm sure there are many cases in which onprem actually is a better value proposition than AWS, and I would find it much more interesting if we talked about when to go with AWS vs onprem, but instead we try to make these silly comparisons about the upfront cost of a server blade vs an EC2 instance.


I've done operations stuff for ~25 years now. I've used AWS since it launched. I've done devops consulting for years.

I've yet to see an AWS deployment be competitive on cost with onprem or managed hosting. In fact, I used to make good money helping people cut costs by moving off AWS.

That includes paying more for devops, because AWS is incredibly complicated, and most people struggle to get it set up well.

There are valid reasons to pick AWS. Cost is not one of them.


Cool, this is the stuff that interests me. Comparing the nominal cost of hardware with the nominal AWS costs is boring.


Why? If the primary benefit you're getting from the cloud is VMs, it's valid to compare to hardware. The overhead of bare metal on top of cloud VMs is basically knowing how to handle RAID and set up load balancing.


VMs are certainly not the primary benefit of the cloud; cloud providers offer many services in addition to VM orchestration, but that hardly matters because your own example is illustrative of your error:

> The overhead of bare metal on top of cloud VMs is basically knowing how to handle RAID and set up load balancing.

This implies human beings to handle RAID and set up load balancing, which suggests that you need to compare the cost of cloud providers with the cost of hardware plus the cost of those engineering resources.

In addition to RAID and load balancing, most organizations/applications also need networking (good luck balancing load without a network), databases (including backup management), TLS, DNS, access management, etc, etc. All of this takes humans to build and operate. AWS services do a lot of this for you. In the on-prem world, you have to build (or buy/integrate) this yourself, but that's not free so you have to account for that cost in your comparison.

You can still make the argument that on-prem is a better value proposition when accounting for the total cost of ownership, but that's a different argument than those which ignore engineering costs altogether.


Yes, you need to know how to use Linux. And as the article astutely points out, there is an undiscussed problem with UNIX skills loss across the industry. AWS provides GUIs and Linux doesn't - accepted.

However for the many, many people who do already have those skills, and for the many pieces of software that are basically web servers + databases, the overhead of bare metal vs cloud VMs or services is basically a bit of Linux sysadmin work. People are acting like you have to hire a full time wizard to even consider using anything other than high cost AWS services but that is wrong: AWS requires learning too, people who understand UNIX are plentiful even if apparently not as plentiful as they once were, and a well run setup will not require constant sysadmin work. A well rounded developer will often be able to do the ops work as well.

Also "bare metal" doesn't mean running your own datacenter. You can buy hardware, send it to a colo and they'll rack it and run the network for you. That's why I say it's mostly a matter of understanding RAID: when the hardware fails (almost always disk or less commonly, RAM units), you file a ticket, blink the HDD LED and some remote hands go and swap out the part for you. Those "hands as a service" come cheap compared to AWS.


A large part of the problem is that most developers have no experience with this, and think server hardware is as much a pain as home PC hardware, which a large portion of younger developers have little experience with too.

They've not seen servers with IPMI tied into PXE boot and a TFTP setup. They don't realise that once that server is racked up, we can log in to an IPMI console, power it on remotely, have it boot straight into a bootstrap script, remotely install an OS like CoreOS/Flatcar, and have orchestration scripts install the necessary basic services without manual intervention; boom, it's part of your own "private cloud", ready to deploy containers to, and it may well not require maintenance "below" the container layer for years.
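The remote side of that is surprisingly mundane; something along these lines, with a placeholder BMC address and credentials:

    # Tell the BMC to PXE-boot on next start, then power the box on
    ipmitool -I lanplus -H 10.0.0.42 -U admin -P 'changeme' chassis bootdev pxe
    ipmitool -I lanplus -H 10.0.0.42 -U admin -P 'changeme' power on
    # Watch the unattended install over serial-over-LAN if you're curious
    ipmitool -I lanplus -H 10.0.0.42 -U admin -P 'changeme' sol activate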

And they don't realise you can contract someone to do this for you for low enough day rates that it'll take you a truly massive setup before you're paying the equivalent of a full time engineer.


100% agreed. I venture that it's a side effect of our industry relying almost entirely on universities and self-learning to propagate knowledge, instead of trade schools. Academics don't want to teach "messy" real world skills like how to actually build a distributed computer with your own hands: there's a very narrow range of skills considered acceptable to teach in the academic world and sysadmin isn't one of them.

Meanwhile we've had 20 years of media and culture telling people that those who get rich are the software guys, never the infrastructure guys, and that the easiest way to get into the industry is by learning web design.

Also, I suspect the industry is exhausting the pool of talent that learned computing on UNIX workstations at university in the 80s and early-mid 90s. The people who lived and breathed servers and UNIX are retiring out of the workforce, and not really being replaced. AWS is stepping in to fill the breach by selling sysadmins-as-a-service, basically, with a thin veneer of GUI programming slapped on top, and some proper documentation writers. Selling pickaxes and shovels to the gold miners - classic.

Finally, I think a lot of firms have developed a "no contractors" policy. I've never worked at a company that allowed contractors and I'm about mid-way through my own career, a bit less. At some point it has become received CEO wisdom that they're building a committed team of true believers, so "mercenaries" aren't welcome. It leads to the kind of thinking you see in this thread where "I don't have the skills, therefore I must hire someone full time, and they must earn as much as me, therefore it's too expensive" ends up leading to "and therefore I must outsource to a firm that 'contracts' for an hourly rate billed via credit card".


The contractor has a problem similar to employee turnover. Contractors make sense when there is a throw-away project (won't need long-term maintenance) that doesn't embody any proprietary knowledge or skill. As soon as a project could expose a trade secret or require on-going support, the contractor becomes a big risk.


If you don't have an FTE in charge of servers, who's managing AWS? If you leave it up to the individual developers you end up with an overcosted mess.


But again, is that comparable? Seems like apples versus oranges.

What’s the connectivity into your 7+ year old asset cost? What about the space, power and cooling to provide it a home?

What happens when your environment gets attacked by a volumetric DDOS, is that 100Mbps circuit going to get overwhelmed?

I’m not arguing AWS is cheaper, simply that there are lots of things around the physical server which also have costs and complexities.


>What’s the connectivity into your 7+ year old asset cost? What about the space, power and cooling to provide it a home?

10GbE Ethernet was standard in 2013/2014. Space, power and cooling are company-dependent. If they're tiny, a closet in the building or a colocation facility; if they're larger, a room. If they're huge, a datacenter.

>What happens when your environment gets attacked by a volumetric DDOS, is that 100Mbps circuit going to get overwhelmed?

I can't recall the last time a customer got hit by a "volumetric DDoS". End-user facing systems sit behind cloudflare. Internal apps are... internal. Who cares if the WAN connection is getting hit by DDoS when your users are on campus? In post-covid world, they've got multiple tunnels with SD-WAN.

That's ignoring the fact that CenturyLink or Comcast or Telia are pretty good at mitigating DDoS in 2020. Just because they don't post about the attacks they endure on a blog doesn't mean they're inept.

https://business.comcast.com/help-and-support/ethernet/comca...

https://assets.centurylink.com/is/content/centurylink/centur...

>I’m not arguing AWS is cheaper, simply that there are lots of things around the physical server which also have costs and complexities.

Sure, and they're all complexities which we solved 30+ years ago.


> Sure, and they're all complexities which we solved 30+ years ago.

The argument isn't that they are solved or not, but the cost of implementing and maintaining them vs AWS.


I never saw a shop running on AWS without a person managing it. Said person always has enough skill to replicate the same setup on dedicated hardware or a VPS. I don't say such shops don't exist, it's just that I've never seen them.

To me, AWS's killer features are RDS and the ability to provision 100 servers in a few minutes. Most of the time that scaling is not needed, and RDS alone cannot justify the cost of AWS.


For a lot of companies, 20% is an acceptable markup to pay in return for not having to deal with hardware failures, delays due to ordering new hardware, the lack of instant failover, over-purchasing for capacity/scale, the extra human devops costs, etc. If you're well-positioned to handle that, then great, AWS may not be cost-effective for you.


>And if you happen to be a shop that sweats assets for 7+ years that number starts being measured in the hundreds of percentage points more expensive.

If you include the improvement in power efficiency over time, it can be bad to run old servers after a certain point.


As someone who has built out over 30 different accounts in AWS for various clients, this is definitely not entirely accurate. Elasticache is a caching mechanism, not a database. Anyone who thinks they are the same has not used them extensively. Also, RDS and Aurora have both had superuser privileges removed, so you might have to overhaul some of your code base just to make it work. I'm sensing some uninformed fan-base propaganda here.


Many people use AWS because everyone uses AWS. Many of my clients have no need for AWS but still use it, at least until the VC money runs out. Usually I have to go first; then the servers move somewhere cheaper when a new CFO comes in.


I work in an ecommerce agency environment and it can be quite frustrating. Every client, no matter how small, has been sold the spiel and had some guy come through and build a boutique cloud infrastructure, CI pipeline and development VM. As if traffic will increase 1000% by next week with no prior warning. It just doesn't happen like that.

I spend half my time debugging Hyperscale Bullshit™ that the client never needed and that failed the moment the devops guy left.

If you want this stuff, you need a long term tech team, and a devops member on hand. Every time you hand your project around this stuff costs you hours. You also need to tell your devs to use the devops guy like tech support. They built this rickety tower of obtuse config files, make the devops guy fix it and don't let your devs spin their wheels for 4 hours before their assets compile on the Super Cool Distributed Docker VM microservice stack of teetering jenga blocks.


As an ops and infrastructure engineer, I cringe every time people talk about trying to automate all these processes early in the life of an organization, or when there's zero chance of the company seeing explosive growth. Not every company is going to need to rapidly scale up and down, and I rarely see the cloud saving money unless one's infrastructure is pretty garbage already. Cloud lift-and-shifts are like bad rewrites: they inherit all the architectural problems and increase opex for the sake of a "cloud" stamp, aka cloud washing.

But I do recommend companies have a means of deploying stuff to AWS and doing basic security there for clients that really require AWS. Having a VPC ready to go costs nothing operationally and is good for at least having some semblance of collective skill with a major public cloud provider.

Invariably, though, when I see undisciplined developers take on colo-style hosting, systems become brittle and unable to accept changes, and with any success it becomes a Herculean task to deploy new features without putting something important at risk. This results in new engineers eventually spending more time on systems archaeology than on creating new features or improving maintainability.

In 2020 I’d recommend developing as much as possible on PaaS type platforms like Heroku, a managed K8S, or even App Engine to avoid more bike shedding that really doesn’t buy a company anything materially advantageous early on like deciding which kind of EC2 instance to standardize on. Until engineers know what to optimize and concentrate upon in terms of process and there’s perhaps tens of thousands in monthly infrastructure costs most skilled (read: expensive) ops engineers won’t really be of much value except offloading work that developers shouldn’t have been thinking about in the first place.


I did work for a YC startup that replaced my ~$100/mo Heroku app with a $5,000/mo AWS stack that nobody in the place knew how to manage.

Their site was so ridiculously low traffic that I did some napkin math and figured out they were paying around $0.10 per web request.


I've seen this exact same thing happen at a company I used to work for. They had an enterprise B2B app that had extremely low traffic. The Heroku app worked fine, but the CTO spent a year building a super complicated AWS stack as an exercise in resume building. After the AWS stack launch failed miserably and we had to switch back to Heroku, the CTO quit and still has "migrated company to infrastructure as code AWS stack" on their LinkedIn profile.


Huh, making a manual AWS stack 50 times more expensive than a heroku setup seems like a real trick to me! I'm used to heroku being more expensive than running it yourself -- as I would expect, since you're paying them for a lot of management.


Heroku is both incredibly cheap and incredibly expensive.

It's only $7/month to deploy a web application. More if you want some of the paid features and a database instance. It all works out of the box with instant deployment; it's fantastic.

Then it's suddenly $50 more per gigabyte of RAM, which makes it massively expensive for any serious workload. It's crazy how much they try to charge; it makes AWS look like a bunch of clowns in comparison.


If it saves you a FTE or two from managing your own infrastructure, there is a lot of headroom before it's a losing proposition.

Which is what I think the OP misses discussing in as much detail as they could with AWS -- are there ways AWS is saving some customers developer/ops staff time over the cheaper alternatives? Cause that is often more expensive than the resources themselves. Could just be "because that's what we're familiar with" (and that's not illegitimate as cost savings), but could be actually better dashboard UIs, APIs, integrations, whatever.

[I am currently investigating heroku for our low-traffic tiny-team app, after we had our sysadmin/devops position eliminated. The performance characteristics are surprising me negatively (I can't figure out how anyone gets by with a `standard` instead of `performance` dyno, even for a small low-traffic app; currently writing up my findings for public consumption), but the value of the management they are doing so we don't have to is meeting and exceeding my expectations. (We currently manage our own AWS resources directly, so I know what it is we can't do sustainably anymore with the eliminated position, and the value we're getting from not having to do it).]


My experience - from doing devops consulting and moving clients off AWS every chance I get - is that moving clients off AWS is bad business for devops consultants in terms of short-term billable hours, because clients spend more money on me when they're on AWS. If I was after maximising billable hours in the short term, I'd recommend AWS all the time...

As such a lot of devops consultants certainly have all the wrong incentives to recommend it, and these days most of them also lack experience of how to price out alternatives.

E.g. a typical beginner's mistake is to think you'd price out a one-to-one match of servers between AWS and an alternative, but one of the benefits of picking other options is that you're able to look at your app and design a setup that fits your needs better. Network latency matters. The ability to add enough RAM to fit your database working set in RAM if at all possible matters. And so on. With AWS this is often possible, but often at the cost of scaling out other things too that you don't need.

And most developers won't do it if you don't give them budget responsibility and make them justify the cost and then cut their budget.

Development teams used to AWS tend to spin up instance after instance instead of actually measuring and figuring out why they're running into limits to keep costs down. Giving dev teams control over infra without someone with extensive operations experience is an absolute disaster if you want to manage costs.

I work for a VC now. When I evaluate the tech teams of people who apply to us, it's perfectly fine if they use AWS, but it's a massive red flag to me if they don't understand the costs and the costs of alternatives. Usually the ones who do know they're paying for speed and convenience, and have thoughts on how to cut costs by moving parts or all of their services off AWS as they scale, or they have a rationale for why their hosting is never going to be a big part of their overall cost base.

The only case where AWS is cost effective is if you get big enough to negotiate really hefty discounts. It's possible - I've heard examples.

But if you're paying AWS list prices, chances are sooner or later you'll come across a competitor that isn't.


When you're talking about clients spending more on you as a consultant when they are on AWS... compared to what? Alternatives like GCS? Or actual on-premises hardware? Or what? When you "move clients off AWS every chance you get", you are moving them to what instead?

I'm having trouble following your theory of why having clients stay on AWS ends up leading to more consultant billable hours, I think because I don't understand what alternatives you are comparing it to. I am pretty sure it is not heroku, as in the earlier part of the thread?

Or are you talking about compared to simpler "vps" hosts like, say, linode? Doesn't that require a lot more ops skillset and time to set up and run compared to aws, or you don't think it does?


Compared to on-premises, colo or managed hosting.

When moving them off AWS it'd usually be to managed hosting on monthly contracts.

Heroku turns expensive real fast. You're paying for AWS + their margins on top of AWS.

Managed hosting ranges from API-based provisioning not much different than AWS to ordering server by server.

In practice the amount of devops time spent dealing with the server itself for me at least is generally at most matter of downloading a bootstrap script that will provision CoreOS/Flatcar and tie it into a VPN and record the details. The rest of the job can be done by simple orchestration elsewhere. I have servers I haven't needed to touch in 5 years other than recently to switch from CoreOS to Flatcar (other than that the OS auto-updates, and everything runs in containers). Once you've done that, it's irrelevant what the server is or where it is.

For modern server hardware, if you run your own colo setup, that's a matter of having PXE and tftp set up once in a colo, and you can then use an IPMI connection to do the OS installation and config remotely, so even with colocated servers, I'd typically visit the data center once or twice a year to manage several racks of servers. The occasional dead disk would be swapped by data centre staff. Everything else would typically be handled via IPMI.

E.g. one of my setups involved 1k containers across New Zealand, Germany and several colo facilities in the UK. Hetzner (Germany) was the first managed hosting provider we found that could compete on total cost of ownership with leasing servers and putting them in racks in the UK. Had we been located in Germany (cheaper colo facilities than near London), they'd not have been able to compete, but putting stuff in a colo facility somewhere we didn't have people nearby would be too much of a hassle and the cost difference was relatively minor.

Small parts of the bootstrap scripts we had were the only thing different between deploying into KVM VMs (New Zealand), managed servers not in the same racks (Hetzner), and colocated bare metal booting via PXE on their own physical networks (UK). Once they were tied into the VPN and the container runtime and firewall was in place, our orchestration scripts (couple of weeks of work, long before Kubernetes etc. was a thing - we were originally deploying openvz containers and so the same tool could deploy to openvz, KVM and docker over the years) would deploy VMs/containers to them, run backups and failover setups, and dynamically tie them into our frontend load balancers.

We did toy with the idea of tying AWS instances into that setup too, but over many years of regularly reviewing the cost we could never get AWS cheap enough to justify it. We kept trying because there was a constant stream of people in the business who believed - with no data - that it'd be cheaper, but the closest we got with experiments on AWS was ca. twice the cost.

For the record, in my current job we do use AWS entirely. I could cut the cost of what we're using it for by ~80%-90% by moving it to Hetzner. But the cost is low enough that it's not worth investing the time in doing the move at this point, and it's not likely to grow much (it's used mostly for internal services for a small team). That's the kind of scenario where AWS is great - offloading developer time on setups that are cheap to run even at AWS markups.

I tend to recommend to people that it's fine to start with AWS to deploy fast and let their dev team cobble something together. But they need to keep an eye on the bill, and have some sort of plan for how to manage the costs as their system gets more complex. That means also thinking long and hard before adding complicated dependencies on AWS. E.g. try to hide AWS dependencies behind APIs they can replace.


Well Heroku is running on top of AWS. So you are basically paying an extra premium on top of the premium from AWS.

Although whether it is worth it depends on your view.

(I keep thinking Salesforce is just not a good fit for Heroku; they should have sold it to MS or Google)


Many (startup) SaaS companies (except, e.g., metrics collectors) have very low traffic from logged-in users.

When I moved from an ecommerce company with 1000 logins/sec to a SaaS company I could not believe the low traffic :-)


That's just incompetence.


You make it sound as if devops is the initiator of the complexity problems. This might well be in your case. My experience is the other way around.

I'm forced to think about stuff like Kubernetes because the devs are popping "micro-services" for almost anything and the answer to trying to keep things manageable is, hopefully, K8s.

Of course they then send funny memes and argue that K8s is overkill. Yet they have no idea how much a devops guy has to do to actually ramp up one of their multi-gigabyte containers.

The whole micro-services mentality is proving to be backwards in my environment. It's seen as the answer to everything. "Oh we have a behemoth, let's just not use this part of the application and reimplement it somewhere else." In essence that's great. However, the grunt work is making tests to capture the old behaviour/semantics and figuring out who the (indirect) clients of this piece of code are. Ignoring a piece of code to death and reimplementing it somewhere else behind a socket is only part of the solution.

The hyperscalers are benefiting greatly from this mindset, imo.


Oh yeah, my point of view is entirely based on my experience with small to medium sized companies, and there especially developers can be the instigators in sudden complexity multipliers.

I tried my best not to rag on devops, as I am always incredibly impressed by these systems and what can be achieved and automated. It absolutely has its place.

It's just that all the technology that big companies use become trendy and then end up used at small companies too. Small companies don't realise that the tools aren't solving the problems they have, and they don't understand the vendor lock ins and dependencies that they've introduced into their platform for no real gain.

This happens in all kinds of ways, not just devops. Your example of developers using the microservice pattern is exactly that. When I see a developer recommend microservices my mind flashes forward to all of the extra complexity that entails and how we've just turned a small monolith on one server into 6 pieces of software across distributed servers. Great for big companies, too much complexity for small companies.


At my company we colo at a few local datacenters and have to deal with a huge amount of pressure from our investors as to why we're not using AWS.

The points that always seem to come up:

* AWS is a known quantity and it's easier to evaluate our business with it.

* AWS provides "outage damage control" because AWS outages make the news and customers are more understanding. When our ISP has issues it just looks bad on us.

* Our company doesn't look as innovative because we're not cloud. Bleh.

Our app is compute, storage, and data transfer heavy, but switching to AWS being, literally, a 10x cost for us is apparently not a good enough answer.


Also: Investors want you to burn money. They don't want you to save money. If you run out of money and it works they give you more for more equity. If it doesn't work, they can move on earlier.


A startup I was with in 1999 was owned by a guy who had built a super scrappy local ISP and sold out to a big co, which then sold out to cable.

However, the startup was all built on Oracle and Sun boxes because "this is what investors want to see" and we'd get .80 on the dollar if we had to liquidate. We had some nimrod spend 8 months trying to get Oracle to run on bare drives for our 100 tps (max) website.

They refused to let us use MySQL or Linux even though the owner was very familiar with them from the ISP.

I think we spent 3x headcount on the hardware and software, i.e. we could have run for another 2 years had we been more scrappy. Not that the business idea was all that good.

We also got .30 on the dollar, iirc.


How do you do failover if a server fails or if connectivity to one of those datacenters is lost? With AWS I could just set up a multi-availability-zone RDS deployment for the database and an auto-scaling group for the web tier and be confident that AWS will recover the system from most failures. To me, that is the major selling point of any of the hyperscale cloud providers.
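Concretely, that's roughly two CLI calls (all identifiers, sizes, subnets, and the target group ARN below are placeholders):

    # Multi-AZ Postgres instance; AWS manages the standby and the failover
    aws rds create-db-instance \
        --db-instance-identifier app-db \
        --db-instance-class db.r5.large \
        --engine postgres \
        --allocated-storage 100 \
        --multi-az \
        --master-username appadmin \
        --master-user-password 'change-me'

    # Web tier: an auto-scaling group spread across two subnets/AZs behind a load balancer
    aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name web-tier \
        --launch-template LaunchTemplateName=web-template \
        --min-size 2 --max-size 6 \
        --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222" \
        --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/0123456789abcdef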


"connectivity to one of those datacenters is lost"

Anecdotal, but AWS has had more global problems than the triple-connected data centers I've used over twenty years.

I suffered through many rough times; data center connection or power problems were very seldom among them.

Most of the problems came from apps that we didn't build well to scale or that had bugs (most frequent cause of problems and sitedowns).


> With AWS I could just set up a multi-availability-zone RDS deployment for the database and an auto-scaling group for the web tier and be confident that AWS will recover the system from most failures

"Confident"? "Most" failures? Are you merely hopeful that the probability of a bad failure is low, or are you able to test the AWS resiliency techniques you mention and to ensure that they stay working? At what cost?


My experience with RDS instances that had multi-region failover was that the failovers worked every time we needed them for deployments (I don't think we ever needed them for RDS failures). The cost, though, was enormous. Our write DB represented the lion's share of our AWS cost, and doubling it for disaster mitigation increased our costs by something like 50%. It was mostly worth it from a business perspective when we were less sure about AWS uptimes, but I'm not sure I could keep justifying the cost given how few problems we had with RDS over time.


Physical hardware failures are handled by having everything in VMs and storage handled by Ceph. We can lose plenty of physical boxes simultaneously before we run into capacity issues.

Multi-DC failover is handled by announcing our public IP block at both locations with different weights. It's technically active/active because traffic can come in at the secondary DC, but we have an internal site-to-site VPN that is used to direct traffic to the primary. If the primary DC goes down, the secondary starts handling the traffic instead of passing it along. All the database masters flip to the secondary and things keep humming along.

If we lose the site-to-site then the secondary stops advertising altogether and all traffic is forced to the primary.

So we can lose the site-to-site (which is dedicated) or one of the DCs at any time.
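For anyone curious, the "different weights" part is plain BGP; at the secondary DC the announcement might look roughly like this in FRR-style config (ASNs and prefixes are placeholders, and our real setup differs in the details):

    router bgp 64512
     neighbor 198.51.100.1 remote-as 64500
     address-family ipv4 unicast
      network 203.0.113.0/24
      neighbor 198.51.100.1 route-map DEPREF-OUT out
     exit-address-family
    !
    ! Prepend our own ASN so this path loses to the primary DC's shorter announcement
    route-map DEPREF-OUT permit 10
     set as-path prepend 64512 64512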


> How do you do failover if a server fails

Not sure how familiar you are with typical ESXi and vSphere setups, but if a server fails, all virtual machines are automatically migrated to a healthy server. And of course, your HPE G10s are compute-only; all storage is on the Fibre Channel-connected SAN.


You don’t get 17 server licenses for vSphere or a SAN for $55k as outlined above. More like $500k plus another $100k/year in maintenance.

You’re still paying a cloud provider, but in this case it’s VMware


Yep! We have a KVM and Ceph based setup but it's basically the same as you describe.


From AWS' own marketing propaganda: How much would said clients have to invest in people and hardware otherwise? What if their application becomes an overnight success and needs to scale up fast?

Sure, if it's an established company then using their own hardware (and people to set up and manage it) might make sense; iirc Dropbox is a fairly recent big player that made that move. But otherwise it's a big upfront investment to make, and you can't know if it'll pay itself back.

So sure, AWS can be 2x or more as expensive as renting servers at a company like OVH or building your own datacenter, but it's paid by the minute if need be, not all in one go. If your startup or its VC money runs out in six months at least you can quickly scale down and pull out.


I think 2x is a bit optimistic, if you just compare what kind of hardware you get by renting a dedicated server compared to EC2 it can easily be 5x or more. Of course that compares very different things, and doesn't mean that renting dedicated servers is always cheaper. The comparison gets much, much worse when significant amounts of traffic is involved. And scaling down isn't so much more difficult than with AWS within certain limits.

Comparing managed services like databases is maybe more meaningful than just EC2, but also so much more difficult.


StackOverflow is famous for running on-prem on bare metal Dell hardware: https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...

Last time I checked, GitHub was also mostly on bare metal and cloud-free.


They launched in 2008, on a stack built around IIS because that was what their early devs were familiar with.


Is this supposed to be a bad thing?


StackOverflow does predate the AWS world takeover.


How is this relevant? Amazon.com also predates AWS.


The implication is that they had already solved these problems and redoing the stack has its own cost to consider.


For my own startup, I built a small cluster of 17 servers for just under $55K, and that had a month-to-month expense of $600 placed in a co-lo. In comparison, the same setup at AWS would be $96K per month. And it is not hard; it's easy in many ways. Do not be fooled: what the cloud companies are peddling is an expensive scam.


Cloud companies are useful as long as you want what they're selling. The best case is needing, say, 2 TB of RAM for some workload or test that's only going to take a few hours.

Or something like the Olympics where 95% of your demand comes in a predictable spike across a few days.


Very true, the original selling point for the cloud was instant upgrade/downgrade as needed. That was the original amazing thing: dials labelled RAM and CPU that you could turn up or turn down.


If I was doing my own thing I would go the same route as you, but I'm knowledgeable about this stuff, and can manage the entire system (network, replacing bad hardware, etc). It would take a very good reason for me to be on call for that, or else I'd save money by going with something like OVH.


> For my own startup, I built a small cluster of 17 servers for just beneath $55K, and that had a month-to-month expense of $600 placed in a co-lo. In comparison, the same setup at AWS would be $96K per month.

Why would you build exactly the same setup in AWS as for on-prem, unless your objective is to (dishonestly) show that on-prem is cheaper?

Lift-and-shift-to-the-cloud is known to be more expensive, because you aren't taking advantage of the features available to you which would allow you to reduce your costs.


> Why would you build exactly the same setup in AWS as for on-prem...

It was far better to invest a little up front and maintain my operations at $600 a month than to pay $96K a month for the same thing, that's why.

I never "lifted and shifted", I built and deployed, with physical servers, a 3-way duplicated environment that flew like a hot rod. At a fraction of cloud's expense.


I think the point GP was making is that you could have likely started off much cheaper, e.g. with 2k/month of AWS costs before needing to "simply" scale at, say, 12 months, especially if using managed services and not just bare EC2 instances.

I personally think there's room for both, and I think hybrids between on-prem and cloud are the ideal for long running apps: you size your on-prem infrastructure to handle 99% of the load, and scale to the cloud for that one-off peak.

That's still pretty complicated due to different types of vendor lock-in (or lock-out in some cases). Google has invested in k8s partly to give people a path to move away from AWS.


My application had (and still would have) very high CPU requirements, and $2k/month would have had me spending more money than necessary. When I started I bought one server with the capacity I needed and put it in a co-lo for $75 a month. That little puppy was equal to $10K a month at AWS, so why would I want to use AWS again? Just do the math: even one server outperforms and is vastly less expensive. The cloud has the majority of engineers looking like morons from a financial literacy perspective.


Are you claiming that you knew exactly how powerful you needed your machines to be, before you launched? Or are your machines running at 25% utilization which AWS would charge substantially less for?


I'm not making any such claim. I'm saying I built a 24/7-available physical 17-server cluster to operate my startup's needs. I had more capacity than I needed, but at the same expense through AWS I'd not have had enough to operate. At less than the expense of one AWS month, I had my entire environment owned outright. How is that difficult to understand?


AWS also gives you a lot of cost savings for using Spot and signing contracts with minimum spends. Only a small shop pays full price for anything.


Okay, but then you have to engineer your application around interruptible spot instances. You're also making a 36-month commitment when you sign that contract (essentially buying the hardware for Amazon).


> you have to engineer your application around interruptible spot instances

This is where your ALBs and ASGs come in. If your app doesn't rely on local writes and you can shift your caching to a shared cache, the cost savings are good.
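A minimal sketch of what that looks like in practice, assuming a Linux spot instance with IMDSv1 enabled; the drain() hook is hypothetical and stands in for whatever your app does before the ASG replaces the node:

    import time
    import urllib.error
    import urllib.request

    # AWS publishes a spot interruption notice at this metadata path roughly
    # two minutes before reclaiming the instance; it returns 404 until then.
    SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

    def interruption_pending() -> bool:
        try:
            with urllib.request.urlopen(SPOT_ACTION_URL, timeout=2) as resp:
                return resp.status == 200
        except urllib.error.HTTPError:
            return False  # 404: no interruption scheduled yet
        except urllib.error.URLError:
            return False  # metadata service unreachable; treat as no notice

    def drain() -> None:
        # Hypothetical hook: stop accepting work, flush to the shared cache,
        # and let ALB health checks take this node out of rotation.
        pass

    if __name__ == "__main__":
        while not interruption_pending():
            time.sleep(5)
        drain()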


It's possible that "many" firms do this, but given AWS' growth numbers that would imply they don't really spend meaningful amounts to begin with.

Counterpoint: I've seen a great many false economies with people trying to go on-prem and do alternative hosting because they don't think the AWS premium for e.g. GPU instances is worth it. I don't think that has generally worked out well.


The AWS premium for GPU instances is absolutely not worth it. You don't hear about people running local GPU compute clusters because it's not newsworthy -- it's obvious. Put a few workstations behind a switch, fire up torch.distributed, you're done. And after two months, you've beaten the AWS spend for the same amount of GPU compute, even if only 50% of the time is spent training. Timesharing is done with shell accounts and asking nicely. You do not need the huge technical complexity of the cloud: it gets in the way, as well as costing more!
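For anyone who hasn't tried it, the "fire up torch.distributed" step really is small. A rough sketch with a toy model and random data, assuming CUDA GPUs with NCCL and that each box launches it via torchrun (e.g. torchrun --nnodes=2 --nproc_per_node=4 --node_rank=<n> --master_addr=<head node IP> train.py):

    # train.py: minimal multi-node DDP loop; the Linear model and random
    # batches are placeholders for a real model and dataset.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = DDP(torch.nn.Linear(128, 10).cuda(local_rank),
                    device_ids=[local_rank])
        opt = torch.optim.SGD(model.parameters(), lr=0.01)

        for step in range(100):
            x = torch.randn(64, 128, device=f"cuda:{local_rank}")
            y = torch.randint(0, 10, (64,), device=f"cuda:{local_rank}")
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()  # gradients are all-reduced across all nodes here
            opt.step()
            if dist.get_rank() == 0 and step % 20 == 0:
                print(f"step {step} loss {loss.item():.3f}")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()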


What if you want 10x the GPU for one month to build a model?


That's the only scenario I can think of where it comes out clearly in favor of AWS: you've tested your model in the small on Colab, you're confident you'll need only a few training runs, you can schedule them in us-east, you can run inference on CPU, and you won't need to rebuild for another eight months (when purchased cards become outdated).

It's not an impossible scenario... But imagine the sort of company that trains their own model instead of using a huggingface refinement or an off-the-shelf redistillation. (Those can be done reasonably on an average gaming PC, no need for a cluster.) Such a company has expensive human resources. They bothered to get a data scientist and at least a research engineer, if not a full researcher. Were they hired on six-month contracts as well? This is a huge expense, so it must be an important differentiator to have built a custom model, and it's a one-and-done? I don't see it. I think it's going to be an ongoing project, or it shouldn't have been approved in the first place.


"It's possible that "many" firms do this, but given AWS' growth numbers that would imply they don't really spend meaningful amounts to begin with."

I could spend a million EUR a year on AWS without needing most of AWS's services?

But YMMV, and maybe 1M EUR/year is not "meaningful"; perhaps we differ there.


There is no good data to prove either statement. You could definitely show that for some types of problems, large-scale on-prem is cheaper than the cloud.


This article may be poorly written, but if you actually read through the whole thing, it makes some brilliant points:

1. IaaS providers are incentivized to create services that lure you with managed solutions that seem like a great deal on paper, while they are actually more expensive to operate than their self-rolled alternative.

2. "DevOps" and "Infra" people charged with making these decisions often follow the industry zeitgeist, which for the moment is whatever AWS, GCP, and Azure are offering. This is often against the best interests of the company they work for. In so many cases the "right" decision is whichever choice one can defend when something goes wrong, or parlay into the next gig. If costs, performance, and features aren't closely managed, scrutinized, and compared, going with a solution like Aurora/Redshift/Dynamo won't be challenged.

3. Nothing is easily comparable because these services aren't just a CPU, as you mention; instead they're hardware and software rolled into one. This is intentional, and has little to do with making things easier for you. At best, IaaS providers defend this as "differentiation", but it's closer to obfuscation of true cost. Go ahead and ask your AWS rep if use cases x, y, and z will perform well on Lambda and you'll likely get a near-canned response that they all fit perfectly, even if y is a job with high load spread uniformly through time. The only way you can make this comparison is by creating the two services, standing them up in production for a week, and checking the bill. In other cases, such as FPGAs and databases, you'll have a much harder time, as not only is the software/hardware offering unique, but the user code will be completely different and require entire rewrites to get what _might_ be a fair comparison.


Anecdotally, my company spends several million a year on AWS, and they do it mostly so they don't have to think too much about the hardware every team uses to implement their solutions. I'd say the strategy is an unmitigated success, even if it costs them 3 times more than a similar on-premise (or dedicated) solution would.

My last company went the dedicated route, and they were perpetually in need of more servers because nobody was willing to pre-plan capacity. To the point that the database team just couldn’t provision any new databases for a year.


My last company went the dedicated route. Management understood how much of a ripoff AWS/Azure/GCP was and hired people who knew how a lot of apps are built and run, so we didn't need to redevelop them to fit a cloud architecture, which costs a metric ton more for little benefit. Press releases about this are almost always complete BS when it comes to tech, again not all but most: "we saved X" using some dubious calculation.

AWS makes some things easy, sure, but at my house I run what would cost $15k a year on AWS and I spent ~$1,500 all-in, including a used 10G switch. Power use is a joke; even with expensive electricity where I am, AWS is still a ripoff. Essentially the same processors as AWS was quoting, 10G to the home nodes that need it, and Bob's your uncle. And I do TensorFlow things at home too; I bet that would cost $30k/year, and I do it on $1,500 of bought hardware. For another $500 for a 3080 or a few in my servers, I'd be saving tons. Yes, it can go down, but just buy two, or a rack; it's still going to be cheaper than AWS or hosted colo. I could scale this myself to a few thousand servers. I've seen it done wrong so many times...

The truth is more like what he mentioned in his first paragraph: if you have a fanboy at the helm, it doesn't matter that he's the CTO who "knows it all". I've worked with a lot of CTOs lately who don't have a clue, and I feel sorry for them, wasting so many company assets because underlings couldn't possibly know more. I love the ex-Google/Facebook guys saying to basically scrap everything and re-architect; it's funny and so wasteful, but at least they're making more money than me!

One guy said he hates Jenkins. Some teams have hundreds of Jenkins jobs that work just fine, some of them for 10+ years, and he said it should all be redeveloped. Not sure he has the company's interest in mind; he wants everything to be "serverless", which is effectively the same damn thing, genius.

Hire some sysadmins to run your show. The good ones cost a ton, but computers at their core haven't really changed much in 20+ years, and neither has the fundamental way the internet works. Good, experienced sysadmins can save you a lot, and the result is just as reliable. You just have to trust experience over advertising and real math over funny math.


I wonder, what do you do if you want multi-region failover in your home setup?

I'm fully in agreement that owning your own hardware is cheaper. But if you want extreme redundancy as a company right now, and want to be mostly certain it works, you move to AWS instead of trying to build a team to set all of that up for you.

Also, when doing TensorFlow things on AWS I can just temporarily rent that monster server with 12? GPUs to train my model in a few hours, instead of waiting days for my home server to do the same thing (even though that's cheaper).


Stepping back though, you do make that hardware choice when you select a service from an IaaS. The actual selection is hidden from you, though. It is a tradeoff where the industry has overwhelmingly come out on one side. I think it's time to start questioning this. Instead of hiring engineers with "AWS experience" why not engineers with experience automating systems using OSS tooling?


> You'll pay more than the price difference for someone to implement all of that from scratch.

This is a common trap for startups. AWS has been around long enough that everyone has a 1st-hand or 2nd-hand horror story about someone with a $5000/month AWS bill for a basic website. This creates a false narrative that AWS is bad and/or dangerously expensive. Or worse, that clever devops engineers can simply roll their own solutions with open source tools for half the cost.

The reality is that if a startup truly needs $5K/month of AWS services, they’re getting a bargain by buying it from Amazon instead of building it out themselves. $5K/month won’t even begin to buy you another qualified devops engineer to build and maintain custom infrastructure. The first rule of startup engineering is to use every tool available at your disposal to get to market as fast as possible. Cost reduction can come later, but you can never get back wasted time spent rolling your own solutions when an off the shelf vendor could have solved your problem in days rather than months.

However, the other trap is when inexperienced or overeager engineers see the long list of AWS product offerings and think the goal is to use as many of them as possible. There are some misaligned incentives for engineers who want to gain as much AWS experience as possible on their employer’s dime, regardless of whether or not it’s the right business decision. Worst case, mid-level engineers use as many AWS services as possible so they can pad their resumes, then use the experience to pivot into a higher paying job elsewhere and leave the startup with a mess of half-implemented, unnecessary AWS infrastructure and no one to maintain it. Unfortunately, it happens frequently in the startup world where companies are more likely to hire junior engineers who are eager to tinker with AWS and build their resumes.

The brute force way to avoid this problem is to simply constrain the engineering team to a less powerful platform like OVH or DO. If you don’t need any features of the bigger clouds, this works. However, as soon as you do need the big-cloud features you’re going to waste huge amounts of money building them out yourself. It won’t show up on the monthly provider bills, but rather be hidden in the much higher engineering costs and longer project timelines.

The real solution is to hire experienced engineering leadership who will keep complexity in check, then use AWS or other big cloud providers responsibly.


Can you share a story about how AWS helped to remove or reduce spend on devops? In my experience with AWS, you'd end up with that $5000/month bill for a basic website and also still be paying devops to manage AWS.


Absolutely. But they also charge you independently for (almost) all those services. So you are somehow still paying for a sort of "comfy lock-in" (not a native speaker here, so bear with me if the words aren't exactly right), because you cannot access those services from a plain CPU resource on another (cheaper) provider.

Basically, AWS is squeezing as many dollars as possible from your pockets (good for them).


> You'll pay more than the price difference for someone to implement all of that from scratch.

I also think the "pay someone to set this up" factor is underestimated. Person time cost eclipsed computer time cost a long time ago. If you've got the market demand to need that kind of infrastructure, you're almost certainly operating at a scale where you can afford to dish out the extra dough because it will reduce downtime and overhead enough to make the ROI worth it.

I'm not an expert in this (I'm just a code monkey who works with scientists, and side hustles as a small time sysadmin), but reading books like "Release It!" and reading blogs (especially DevOps, as I'm looking at getting into that) gives me the impression that if someone could reduce their costs while still getting the value AWS gives, they'd be doing it. I mean, maybe it's a well-kept secret (especially if it gives a market advantage), but if it's so easy, I'd think everyone would be doing it, and AWS would be out of business.

Edit: I concede that startups probably don't need AWS; what I'm thinking of above is big, old, established companies that believe "no one was ever fired for buying AWS." But I do believe that you have to have smart, motivated people in order to grow while staying lean, and just because we hear the survivor stories of startups that do, doesn't mean there aren't dozens that die because inexperienced devs thought they could do better than AWS and were proven wrong.


I have a small app I run just fine in Digital Ocean, but at work we care about not just geographical redundancy but geographical load balancing. Having user traffic cross the country puts you at a deficit for response times, and if that matters to your company, having servers close to clusters of users is a good thing.

I can do that a little bit with DO. What are my options with OVH? I see them bragging about having the biggest data center, as if I am supposed to think that's a good idea instead of a terrible one.


>I can do that a little bit with DO. What are my options with OVH?

OVH have datacentres in multiple countries just like DO.


There are parts of the US that are still pretty far from the nearest DO data center. If I go AWS I don't have that problem.

I should clarify that I am playing devil's advocate here. I've never been pro-Amazon (although I slipped for a bit there, I'm feeling much better again), they just have answers for some questions that I don't.

Or at least, not until multi-cloud is on everyone's radar. The combined coverage map of any two AWS competitors generally looks more competitive, but we collectively have to 'work' on the people who think that would be too hard.


Sounds like you're listing the enterprise features people pay for. If Amazon sells it so cheaply why don't enterprises go direct to Amazon instead of paying a SaaS provider like you?

It's about economics not accounting.

Once you get into cost comparisons you've already lost. AWS is purpose built for an accounting narrative.

To illustrate what a dead end accounting is, accounting can't explain things like, "I spend money today to possibly make money more than 1 year in the future." That's basically every startup and accounting doesn't have a story for it.

Of course a free tier looks good accounting-wise. How does one "account" for lock-in though? If you can't figure that out you will not convince bean counters to consider alternatives.

And good luck teaching economics (as opposed to accounting) to bean counters.

Here are a few compelling explanations. One: the cheapest bid (i.e. the free tier) is always the worst one. Some people are so addicted to accounting storytelling - as a way to organize their world, a whole philosophy - that they are actually convinced the cheapest bid is always the best one.

Another: AWS is so overpriced that the $50k you spend on developing against AWS services to use $500k of "free credits" will deliver less value than $50k spent developing for, and running, a single vertically scaled beige-box computer with simple ergonomics.


Don't know why you are being downvoted, but I agree with your reasoning.

I've worked most of my career in places where technology was king and AWS is certainly technologically ahead of its competitors.

Fast forward a couple of decades and I ended up in financial institutions, where economics is king and they don't use AWS, or at least not as much as they "should" if you listen to the average internet guru.

Not because AWS doesn't do what they say it does, but because economically it's not advantageous, at least until it is.

Which is not at the beginning.

To put it in simple words: they don't ask "are we ready to scale" but instead "are we making money out of this".


Exactly.

That and the 9's. I don't know how good OVH is with outage and availability. Sure the raw compute is cheaper.

There's also storage. He's renting physical drives on OVH meaning he has to take care of replication and back-up himself. The disks in his AWS VMs should be persistent. If his OVH server goes down due to hardware failure, what happens to his disks? I genuinely don't know.

Lastly, OVH doesn't have regions everywhere. Then it's yet another different provider with slightly different product for every geographical region he wants to support.


Allow me to educate you :-)

> I don't know how good OVH is with outage and availability.

I've personally observed failure rates similar to AWS EC2 and EBS. You need to plan for failure, like any cloud provider.

> the raw compute is cheaper

Dramatically so. 20% of the price.

> There's also storage

You can get local NVMe, cheaply, with literally 100x the perf you can get with the most expensive EBS volumes. AWS does not have a storage offering that comes close, not even their instance store NVMe.

> what happens to his disks

VM images can be backed up automatically without downtime (on a schedule, not live.) They offer an EBS equivalent.

> OVH doesn't have regions everywhere

They have enough. us-west, us-east, eu-west, APAC, others. I'll grant that the US regions are operated by a different corporate entity and a different web interface, and that is annoying.


> VM images can be backed up automatically without downtime (on a schedule, not live.) They offer an EBS equivalent.

But if you need to persist transactions you still need to replicate and persist state yourself. Or can I mount an OVH virtual drive that I know will survive across VM migrations?

> I'll grant that the US regions are operated by a different corporate entity and a different web interface, and that is annoying.

Why? Do they at least support the same API? Can OVH actually scale on demand?


> Or can I mount an OVH virtual drive that I know will survive across VM migrations?

Yes, Block Storage: https://us.ovhcloud.com/public-cloud/block-storage/

> Do they at least support the same API?

Yes

> Can OVH actually scale on demand?

Not as big as AWS, but yes.

Remember that you're getting 5x the compute resources for the same money. You don't need to autoscale as much. You're not fighting crappy EBS performance. You're not architecting around remote-network RDS databases with milliseconds of RTT. I firmly believe that for most applications you're going to be better off just massively overprovisioning your hardware and keeping the software stack simple.


That's really interesting. I never used OVH, you could probably tell!


I think the main reason they can afford to price their services that high is peer pressure - probably itself the result of clever marketing, or else that's a really happy coincidence.

I've worked in many startups now, several of them where I was the first (and for a while, only) developer and had to decide on the infrastructure. Each time I went with OVH, and each time the CEO tried to push for moving to AWS instead, despite having no clue what the difference might be.

Their problem was that "startups are supposed to use AWS". They had impostor syndrome. One would come tell me every month or so how "all his friends use AWS, and they say it's very good". Another was afraid of what potential investors might say when he told them we're not on AWS.

If people will pay overpriced services to be with the cool kids, why bother competing on price?


Being on AWS, Azure or GCP isn't about costs being pressured by another department at all.

The cost reflects the premium for integration into the other services.

It's after the first employees bounce, or the company outgrows that hustle-and-hack-everything-together mentality, that consultants like myself get hired, and we want to sob when we find that some employee decided to build on OVH as a small startup, especially if it is growing.

Every time I've dealt with this it has been either a disaster or things hanging together by a thread. The hanging-by-a-thread cases have gotten somewhat better with Kubernetes mostly keeping everything running just by turning anything back on when it dies or crashes. The thing is that only as of last year did OVH offer managed k8s; those poor startups will suffer from these choices.

OVH has its place to be considered:

Companies with 40+ engineers

High outgoing bandwidth (CDN, video streaming, etc.)

I/O latency requirements

Anything at a large enough scale where AWS egress bandwidth becomes too costly


Your lack of punctuation and odd syntax makes me wonder, but if I understood your post correctly, you claim that building with AWS is somehow safer / more robust / more future proof than building with OVH? A technical judgement?

If so, I vehemently disagree. I've been a consultant for 10+ years too and seen 50+ companies from the inside, from startups to behemoths – including AWS itself.

Companies running a tight ship around resources were generally technically superior to those using AWS. "Hanging together by a thread" indeed, playing the AWS bingo of "use a flaky soup of 3-letter-acronym services to cover technical ineptitude".

AFAIR the AWS versions of Spark and Elasticsearch were abysmal to the point of being unworkable. At least two years ago, maybe it's better now.


I've worked as a contractor for a CEO for two companies, in both he pushed for a full migration to AWS. Would not be surprised if he got a kickback from AWS.

Amazon is pushing AWS pretty hard in the C-level, I don't know if you've ever followed one of their certifications or landing pages, but they do their marketing really well.

Anyway, I do think a platform like App Engine / Beanstalk and other quick / easy / no setup deployment tools have a benefit, if you're not good at setting up servers.


AWS allows you to shift your costs from CapEx to OpEx. Companies with low CapEx are valued higher since "theoretically" you could remove that bill by moving to another provider. Financial Engineering is just another part of software engineering and the cloud enables it.


This is not a real saving in my experience. The DevOps time for an app is trivial. I actually just set up a .NET Core app on Linux/MySQL.

My Linux experience is old and very limited. I have used AWS for years for other things (S3, cloudfront, transcribe, etc.).

Initially I set up an Elastic Beanstalk app and a separate MySQL instance on my own AWS account, just so I could deploy quickly (all new to me).

Then I set up the app on my client's VM: had to configure Apache, the .NET Core app, the service, MySQL, and mailing.

I would say the Elastic Beanstalk stuff took about 3 hours (some problems with IAM and Amazon's Visual Studio plugin; I basically ended up having to use my master key). Setting up the VM server, plus a new way of deploying .NET Core apps and learning/relearning much of Linux, took 4-5?

So no significant savings there.

Deploys are a few clicks from VS on EB, and take a little longer to the VM, but only because I haven't bothered writing a script that I estimate would take me 1/2 hour at most, in reality probably 5 minutes.
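(For what it's worth, that script really is small. A rough sketch of the idea, assuming a hypothetical "myapp" systemd unit on the VM, SSH key auth, and rsync installed locally; host and paths are placeholders:)

    #!/usr/bin/env python3
    # deploy.py: one-command deploy of a .NET Core app to a plain VM.
    import subprocess

    HOST = "deploy@example-vm"      # placeholder
    APP_DIR = "/var/www/myapp"      # placeholder

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        # Build a Release publish folder locally.
        run(["dotnet", "publish", "-c", "Release", "-o", "publish"])
        # Sync the publish output to the server.
        run(["rsync", "-az", "--delete", "publish/", f"{HOST}:{APP_DIR}/"])
        # Restart the service so the new build is picked up.
        run(["ssh", HOST, "sudo", "systemctl", "restart", "myapp"])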

I have clients on (Windows) servers that have been running for 10 years with little intervention from me (had to clear some space a couple of times, as that client's app saves large files just in case, but they are all backed up on S3 as well).

TL;DR; in my experience DevOps part of running a startup/small enterprise app is basically trivial, a rounding error, compared to time spent on development.


To be fully honest, for personal work, I use Caprover for DevOps.

Edit: The move from CapEx to OpEx is not about savings, it's often about shifting the costs in your books.


I guess I phrased that wrong. Explicitly, DevOps costs are tiny in a startup, even if you do it all yourself with a bare metal server, and moving 0.5% from pot A to pot B makes no difference.


It all depends on the services you provide.

Some businesses would require huge up-front investments without the likes of AWS. DevOps costs might overwhelm you pretty quickly once stuff like compliance becomes a factor, for example.

Sometimes it's not about the technical issues, but documentation, process and qualifications. In B2B there's plenty of that and just the bus factor [1] alone might force a start-up into considering a cloud provider.

In the end it's not just shifting cost, it's also shifting risk and standards and that may or may not be a critical factor.

[1] https://en.wikipedia.org/wiki/Bus_factor


Except that nearly all companies greater than a certain size have an engagement with AWS so they shifted CapEx to CapEx.


> shifted CapEx to CapEx

or in other words, shifted nothing?


Yes. I think the CapEx -> OpEx argument often advanced by AWS marketing, and by the C-levels and engineers of large companies moving to the cloud, is just something said to justify the decision and help everybody get on board with it; the argument itself is fallacious.

There are other reasons, for example flexibility, managed services, etc., but I don't think this one makes sense.


In my mind, one of the big (but seldom discussed) pros of using AWS and especially their high-level services (especially WRT containers) is that they allow rank-and-file developers to do a lot more of the work that was traditionally considered 'ops'. This is advantageous because developer teams don't need to coordinate with a single ops team when they need something, which allows the whole organization to be more agile. Another advantage is that you don't have to hire and develop a high-functioning ops competency in your business--you can outsource much of that to AWS and focus your time/resources on more valuable opportunities (in general, I wish the on-prem side of the debate would acknowledge opportunity costs in their talking points).


Startups don't win because they save 50% on hosting costs. They win because they move fast on product development. AWS make the latter much easier due to all of the additional tooling they provide.


If the constituent employees are trying to win in the classical sense, and not the pump-my-resume-and-bail sense.


The CEO changes his opinion when the money runs out and a new CFO comes in to fix the costs - at least in my experience.


If you need a CFO to look at your IT bill and cut costs, your problem is likely BS title inflation crowding out real work.


I wonder how many billions of dollars worth of unsold cars are just 'lying around'. How many billions of dollars worth of fire trucks are parked right now doing nothing?

Given redundancy and inventory logistics, once your industry gets to some number of billions of dollars of equipment, you're goddamn right there's going to be a billion dollars worth of equipment lying around. It's not the magnitude, it's the ratio that you should get upset about, if there's even anything to get upset about.

There's also rates of consumption versus rates of production. If consumption rates are variable (eg, ramp-up to Black Friday), there is only so much variability I can manage in production. I can produce at a mostly-stable rate all year and let inventory accumulate during slow months, or I can improve my variability, but the cost is more complicated management of people, and increased likelihood of late surprises.

Also, don't data centers collide with public sector timelines? If you have to start building a data center by November 1 and you try to get your approvals all to happen on October 31, you're going to fail, and possibly get fired.

You try to line those up ahead of time, maybe break ground before the next election so they can't stop it, or someone else sucks up the surplus grid power in that substation. If it's built early, it's sitting around waiting for demand to use it. But at least you have it.


Your comment is correct, but not really related to the article. The author isn't talking about money wasted on underutilized resources, but about money wasted on expensive VPS providers like AWS versus budget options like OVH. Honestly, it's a pretty misleading title.


Everything relates to everything else. No corporation is giving you things out of the kindness of its heart. They have a plan to get paid.

The price of the Ferrari is not just the price of the Ferrari. It's the price of all of the logistics and inventory management as well.

The extent to which OVH lags behind Amazon in terms of selection and ability to absorb a large order or a new large customer is all factored into the price differential. Putting 'billions' into the title suggests an audience of people who make million-dollar purchasing decisions (lots of servers, and get them today), not tens of thousands of mom-and-pop shops trying to save $1000 a quarter. But maybe that's just me.

It's one thing if Amazon is price gouging. It's quite another if OVH is undercutting to build market share. We keep arguing in bad faith about the price difference between a company that can sustain their price point for decades versus one that will have to hike their rates the moment we get comfortable, or risk going bankrupt. If it were all the former, yes let's discuss. If it's all the latter, I don't understand how in 2020 we can still be having that sort of conversation. The road behind us is lined with gravestones, and we should know better. Having another conversation of that sort ranks, for me, somewhere between root canal and spinal tap.

Probably it's a bit of both. I doubt very much that it's that OVH knows how to get a $1000 server cheaper than Amazon can do it.


I got the same feeling after reading the comment.

It is more like "why is everyone driving Ferraris when they could buy a Toyota Yaris?" Maintenance on a Ferrari is super expensive, so if you just have to drive to work and back it is not a smart expense.

Same with AWS vs an OVH VPS: with AWS you have a lot of power under the hood, but if you use it mostly like you would use a single OVH VPS, that is also not a smart use of money.


I'm in awe that the comment was written entirely based off reading the title


The article does discuss ratios. "Billion" is just the attention-grabber.

> If the first server, the one that is better in literally every way, costs ~16k/year... how much should the other one cost? Well, maybe 10, maybe 12, maybe 14?

> I don't know, but the answer certainly shouldn't be "Almost twice as much at 26k/year", that's the kind of answer that indicates something is broken.

> In a worst-case scenario, AWS is ~1.6x times as expensive, but again, that's paid yearly. If we compare paid monthly to paid hourly (not exactly fair) we get 37k vs 16k, if we do some napkin math calculations for equivalent storage cost (with equivalent speed via guaranteed iops) we easily get to ~3k/year extra. We have a 40k vs 16k difference, the AWS machine with the worst specs is 250% more expensive.

> But whether the worst AWS machine is 160% or 250% as expensive as the OVH one is not the relevant question here, the question is why are we seeing those astronomical differences in terms of cost to begin with.


Definitely need to consider spikes in demand too. A perfect example of this was the COVID-19 supply shocks. Supply chains got a lot better at just-in-time delivery and didn't keep much back stock, but with COVID this broke down when everyone went out shopping to stock up for lockdown and work from home.

Edit: s/panic shopping/shopping/


It's not "panic shopping" if overnight I transition from eating half my meals at work to eating all my meals at home.


While this also happened, there was definitely some level of panic buying that happened in the initial stages, at least in the US. The toilet paper/paper towel shortage was pretty much driven by pure panic buying, and you had reports of crazy lines outside of big bulk supercenters like Costco.


I also disagree it was really panic buying.

We were all being warned we might need to self-isolate for two weeks if we caught covid.

So suddenly everyone needed 2 weeks of spare stuff, on top of their normal shop.

Hence the sudden shortages.

I found it quite insulting that the media decided to crow about panic buying, when we'd essentially been told to do it.


No, people really did go overboard. Unless you are running a small orphanage no one needs a Costco cart stocked double high with only toilet paper.

There were also a fair amount of people thinking they could be smart and hoarding to later price gouge, but this mostly fell apart and there were articles about how people were mad at Costco's policy of not allowing returns on toilet paper and paper towels.


I still disagree with you, there was a legitimate demand spike for more supermarket toilet paper.

Working from home = more poos and wees at home. Less consumption of industrial deliveries of toilet paper to companies, much higher home consumption from super markets.

So again, not a panic, but a legitimate, overnight, shift in toilet paper consumption habits that resulted in a long lasting shortage of toilet paper in the supermarket JIT supply system. They were not prepared for a jump of 50-100% more home toilet paper consumption.

No panic, but a simple explanation of the shift in consumption from industrial, bulk deliveries of toilet paper to your company office, to buying more of your own at the supermarket.

Yes, there were some opportunists and crazies with a shopping cart full, but they don't explain why it took months for shelves to finally get fully stocked.


It was panic shopping when people were hoarding flour and yeast though, for instance


I understood why the rice sold out. I understood why the flour sold out. I understood why the beans sold out.

What I don't understand is why the instant ramen sold out at the same time as the rice.


No, people were bored, stuck at home and started trying home baking. It was doing the rounds on social media. So there was suddenly a massive demand for flour and yeast.

No panic, just a simple fad that they couldn't supply enough to meet demand.

I read that in the UK at least the actual problem was they had enough flour, but it was all in industrial bags and they did not have enough packaging to rebag it.

Actually another victim of just-in-time supply lines, rather than any 'panic'.


The fad was also for sourdough bread, which requires a lot of flour before you make any loaves, and a constant supply to keep the starter alive.


And half of your bathroom breaks...


Ok, edited.


One big thing not mentioned here: the massive collection of managed services AWS gives you at no cost.

The biggest example is probably IAM, which makes it easy to control which servers can take what actions on which other services. And can also be integrated directly into external tools like Vault.
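As a rough illustration of the "which servers can take what actions" part, here is a minimal boto3 sketch with made-up names (app-server-role, my-app-assets); a real setup would also add an instance profile and tighter conditions:

    import json
    import boto3

    iam = boto3.client("iam")

    # Trust policy: only EC2 instances may assume this role.
    trust = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    # Permissions: read-only access to a single bucket, nothing else.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-app-assets/*",  # hypothetical bucket
        }],
    }

    iam.create_role(RoleName="app-server-role",
                    AssumeRolePolicyDocument=json.dumps(trust))
    created = iam.create_policy(PolicyName="app-server-read-assets",
                                PolicyDocument=json.dumps(policy))
    iam.attach_role_policy(RoleName="app-server-role",
                           PolicyArn=created["Policy"]["Arn"])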

Want to use service discovery? No need to set up Consul/Istio/etc., just use the managed service. Same with load balancers, and VPCs, and SSM, and route53, etc.

Sure, in 2020 none of those services are that hard to replicate pieces of, and open source tools abound. But setting up those tools all takes time.

The only other nit is that the article makes it sound like Terraform and IaC are meant to abstract away AWS vs GCP, such that one Terraform module _could_ be deployed to either just by changing some string value from "AWS" to "GCP". I don't believe any serious (and good) efforts are being made in that space, and you won't find any popular modules like that on https://registry.terraform.io/.


> At no cost

That's so ridiculous I'm not sure how to reply or even interpret your comment.


I can't tell if your comment is a particularly rude way of picking the nit that "nothing is free, the cost (e.g., of IAM) is built into other services" or if you really find it absurd that the nominal price of many AWS services is $0 or something else entirely.


I apologize for my rudeness. I wasn't adding anything with that comment. I did find the parent comment jarring, as if it was willfully ignoring the downsides in an effort to promote AWS. That's in a comment thread that otherwise seemed like a candid discussion of the pros, the cons, and how it's a good deal for some but a worse deal for others.

Of course, the cost is built into other services. As pointed out, AWS can get very expensive for many use cases, and that's exactly what you're paying for: access to managed services.

Beyond that by developing on AWS you are taking steps to lock yourself into using their system - your configuration isn't portable to other services. So the time/manpower you spend configuring AWS-specific things is another cost associated exclusively with its use.


The nominal price of $0 absolutely is absurd, since every individual service has a separate pricing chart down to the ELB.


I can't find a pricing chart for IAM, which is the cited example.


IAM isn't a service. It's how you govern access control. It's nonsensical to charge for this, as an "IAM service instance" makes no sense.


You don't seem to understand what a 'service' is. In particular, you seem to think a service is something that operates on resources called 'instances'. Here's the Wikipedia definition of a web service:

> a server running on a computer device, listening for requests at a particular port over a network, serving web documents (HTML, JSON, XML, images), which serve in solving specific domain problems over the Web (WWW, Internet, HTTP)

Clearly IAM is a service.

Further, the service needn't operate on resources called "instances" in order to provide value--few AWS services offer "instance" resources and yet they deliver value to customers and are consequently priced. IAM isn't priced directly, but rather the operating cost for the service is built into other AWS services, presumably to encourage people towards security best-practices (charging for IAM might dissuade users from building securely).


There are many free AWS services. Off the top of my head:

- IAM

- VPC

- ECS

- Several SSM services

- CodeDeploy


Maybe they mean that if you're already using some services, you don't need to implement another in the same way you would when self-hosting.


With the price of pro fiber (redundant, with an SLA), I recently moved some apps back to our own servers in house. This cut the price down dramatically. I would not recommend it for super-critical apps (unless you have your own state-of-the-art data center, but I am not speaking about that), but having 5-10 servers in a secure cabinet will give you infinite flexibility for little cost. We pay around $5,000/year to keep running about $50k worth of hardware. It's 10-20 times less than what we would pay on AWS. Of course this approach has limits, but it can work in some scenarios and should not be dismissed by default.


I like the idea, but remember that the SLA with your fiber provider doesn't mean your fiber magically will come back within a few hours when someone in the street cuts through the cable by accident. Your clients will still experience downtime unless you have two fiber lines going in different directions - completely redundant. If you don't have that, I would recommend some kind of a licensed microwave link on your roof as a failover.


We have a fallback coax connection going another way; it's only 500/100 Mbps, but it does the job if the fiber is cut. We are also working on a 5G fallback just in case, as 5G bandwidth can be quite high at short range.


Sounds like you have a pretty decent set up then. With all the redundancies in place (and hopefully power redundancy too, batteries, generator) you can run mission critical software from there.


You always have DR offsite; even if you are hosting on Big Cloud, that would be sensible to set up.

Your RTO and RPO needs will dictate your DR setup in both scenarios. Most apps can take a few hours' hit if the alternative is spending 2-3x.


Yes, and we also have off-site replication using ZFS send/recv, which would let us restore everything in a few hours in case of a major disaster.
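For the curious, the replication loop is short. A rough sketch, assuming SSH key auth, an initial full send already done, and placeholder pool/host names (tank/data, backup-host, backup/data):

    #!/usr/bin/env python3
    # replicate.py: incremental off-site ZFS replication, run from cron.
    import datetime
    import subprocess

    DATASET = "tank/data"           # placeholder
    REMOTE = "backup-host"          # placeholder
    REMOTE_DATASET = "backup/data"  # placeholder

    def sh(cmd: str) -> str:
        return subprocess.run(cmd, shell=True, check=True,
                              capture_output=True, text=True).stdout

    def latest_snapshot(dataset: str) -> str:
        # Newest snapshot of the dataset, e.g. tank/data@auto-2020-11-02
        out = sh(f"zfs list -t snapshot -o name -s creation -H {dataset}")
        return out.strip().splitlines()[-1]

    if __name__ == "__main__":
        previous = latest_snapshot(DATASET)
        new = f"{DATASET}@auto-{datetime.date.today().isoformat()}"
        sh(f"zfs snapshot {new}")
        # Send only the delta since the previous snapshot, piped over SSH.
        sh(f"zfs send -i {previous} {new} "
           f"| ssh {REMOTE} zfs receive -F {REMOTE_DATASET}")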


My current employer has the same setup (my colleagues are mostly Linux nerds, so they're comfortable managing their own hardware), mostly because of security, but I can imagine cost is a big factor as well. We'd need a development VM for each developer, additional VMs for nightly installs, a build server farm, and systems for hosting git, project management, etc.


How do you deal with data egress costs w/r/t AWS though?


We don't use AWS.


I take it that you're indie and running web services from home? If you're willing to share, I'd love to see what kind of apps (that aren't "super critical") one can run from a cabinet at home.


"in house" doesn't necessarily mean in a residential home; it refers to "on premise" more generally.

A previous company of mine did the same thing - they converted a maintenance closet into a server closet. Even with renovation costs to improve ventilation and electrical load to support the use case, it worked out substantially cheaper than cloud hosting. A few things we ran on it:

- A large data infrastructure. We had an EDI[1] side of the business, and egress bandwidth costs would have eaten the product margins and then some. A lot of traditional EDI happens over (S)FTP, and customers only periodically accessed the system (daily/weekly/monthly/quarterly, depending on the customer). Most enterprise EDI systems have retry logic built in, so minor amounts of downtime weren't relevant. If the downtime were for more than several hours, we could cut over to a cloud-based backup (which was fairly cheap to maintain, since ingress bandwidth is generally free).

- Our analytics environment. In addition to standard reporting, we also used our analytics toolset to create "data utilities", allowing powerusers on the business teams to be able to access bulk datasets for their own downstream processes. The bandwidth usage would have again been cost prohibitive to cloud-host, plus the data was co-located on-premise as well.

- Our B2B website. Traffic volumes were minimal, and it was primarily a static website. So hosting it behind Cloudflare added enough uptime guarantees for our needs.

- Dev environments. Both dev environments for all of the above, as well as something similar to LocalStack[2] (it's been a while, not sure if that was the tool used or something else) to mimic our AWS environment

For all of those, less than a day of downtime had negligible financial impact. And downtime more than a day was a non-issue, as we had off-site fail-over plans to handle contingencies longer than that.

We also operated several services and applications where every single minute of downtime created a visible impact on our financials. Those were all hosted on AWS, and architected with redundancy and fault tolerance built in.

[1] https://en.wikipedia.org/wiki/Electronic_data_interchange

[2] https://localstack.cloud/


Thank you for the info. I got carried away mistaking “in house” for running a business (and hardware) from home!


To add, although not quite the same as running a full server closet, I've also repurposed an old laptop as an always-on server and run a handful of web services for both professional and personal purposes from my home connection:

- Some utility applications I maintain for consulting clients. These tend to be incredibly low volume, accessed a handful of times a month at most.

- I host some analytics infrastructure for clients. It's for reporting, rather than data capture, so latency and uptime aren't super critical.

- I run a personal zerotier[1] network, which I use as both a virtual LAN across all my devices hosted everywhere as well as to tunnel my laptop and mobile traffic when I'm not at home. My internet gateway is hosted at home, so all my mobile and public wifi traffic routes through my home connection.

- I do a minor bit of web scraping. This is fairly low volume, and a mix of both personal and professional work. If it was higher volume I wouldn't host it at home, due purely to the risk/potential hassle of IP blocks and complaints.

I have a fairly stable symmetrical 1Gbps fiber connection (plus a block of static IPs). It's a residential connection so has no firm SLA, but it still achieves ~500Mbps when experiencing "congestion" (in whatever form that means for fiber to the premises) and has only been down twice (once during a total power outage, and once when an attempt to use the parental control features for one device resulted in every device being MitM'd by an invalid SSL cert).

I also have a mobile hotspot I use when traveling for work, which I have connected as a failover ISP when I'm at home. This covered both instances of downtime I've experienced. And in the case when it doesn't and a client needs to access a service, I maintain external backups that I can spin up on VPS somewhere in a pinch. Probably not enough guarantees for a primarily SaaS based business, but has worked without a hitch when user-facing SaaS websites/apps are only a minor secondary component of the work.

[1] https://www.zerotier.com/


As a developer at a big company, if I try to buy a server, then I have to deal with my IT department. I don’t get to buy what I want, have to deal with particular overpriced vendors, and the process slows to a crawl.

If I want to use AWS, however, I get instant gratification. And of course I can experiment with different models and then refine into cheaper service mixes as scale increases. As base load emerges, I can use reserved instance pricing and shift that onto our capital budget. And of course I can control everything with code using AWS CDK.

At a smaller company where I was able to control the IT process more, it might be worthwhile. But the speed and optionality I get with AWS is vastly more cost effective than dealing with legacy enterprise IT processes.


This is probably the real reason for bundled cloud provider popularity. It’s way easier for competent engineers to cut out the brigade of uninformed and cheap managers of the IT budget when you distill your specs into an AWS credit card charge.

For a small startup of competent people this is unneeded, but those companies typically succeed or fail into mediocrity eventually.


It's only 80% of the reason. The other 80% is all the time wasted dealing in the physical world, moving and plugging servers.

You could have the most competent engineer fully backed by the company, and it's still going to take weeks to procure a goddamn server, because it takes time to fight Dell, ship to the colo, have remote hands put it in place and set it up, and a thousand more things.


That's why I prefer renting servers. The company manages networking gear, guarantees having spare parts and someone on call to replace things.

If your requirements aren't too exotic, you can get the server in place in a day.

Compared to big cloud, you get much better CPU and RAM for your money and usually unmetered gigabit connection.


For sure, the friction of managing physical servers is much higher. It would be worthwhile if a company committed to managing those physical assets well, which very few would commit to doing.


> As a developer at a big company, if I try to buy a server, then I have to deal with my IT department.

This. AWS is a way to put control back into the hands of developers and work around the internal IT mafia. People are quick to forget how that was like.

For startups, AWS is either a really dumb idea or a valuable tool to go faster than the competition, depending on funding and growth rate.


>>> People are quick to forget how that was like.

I would postulate that the author of the article has no idea what it's like. The stories are happening in companies that are too small to have an IT mafia or any sort of procurement process.


I think your core argument is in the right place, but: What Big Company enables typical developers personal provisioning on their cloud accounts?

I've never seen this before. It's always handled through an operations team. The cloud definitely simplifies the operations team's job, and you may be able to get something more specific faster than whatever gray box the IT Of Years Past had available, but I don't think you're getting around the Cloud Boilerplate of worrying about IAM, CloudFormation/IaC, repeatability, security, granting access, cleaning up...

I think this position is missing the bigger picture; it shouldn't matter where you've got something deployed. That's the Operations Team's problem. Instead, we need to talk about, lets call them, Functional Primitives. For example: If you want a docker image deployed w/ x vCPU y memory z replicas etc, behind a URL. That's a functional primitive. It shouldn't matter whether its in ECS or in Kubernetes in the closet or wherever. That's on the operations team, and its also on them to get that provisioned quickly for you.

In other words, you're still thinking that its the operations team's job to provide servers, so of course the quickest way to get a server is the Cloud. But that should not be their role; their role should be to provide capacity at a higher level of abstraction.
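To make the "functional primitive" idea concrete, here's a sketch of the kind of request a developer would hand to that operations layer; every field name is made up for illustration, and the backend (ECS, k8s in a closet, whatever) is deliberately out of the picture:

    from dataclasses import dataclass, field

    @dataclass
    class ServiceSpec:
        """What a developer actually asks for: an image running behind a URL.
        Where it runs is the operations team's concern, not the developer's."""
        name: str
        image: str                 # container image reference
        vcpu: float                # requested vCPU per replica
        memory_mb: int             # requested memory per replica
        replicas: int
        public_url: str = ""       # expose behind this hostname, if non-empty
        env: dict = field(default_factory=dict)

    # A developer's entire "deployment request":
    spec = ServiceSpec(
        name="recommendations",
        image="registry.example.com/recs:1.4.2",
        vcpu=0.5,
        memory_mb=1024,
        replicas=3,
        public_url="recs.internal.example.com",
    )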


Mine?


The correct answer to "What company would provide developers direct access to raw compute resources in the cloud" is "one that is deeply misguided."

At the most simple level, it seems like progress. Now every developer can get their own server and push their code, test it, do whatever, think of the agility. But, realistically, all that's happened is the creation of hundreds of silos, each configured differently, possibly being abused, possibly containing restricted data, mis-configurations leading to breach, denial of service, running up the company card, all sorts of bad things.

In other words, it doesn't matter whether those raw compute resources are available in the cloud or in a company data center. The cloud enables developers broader access to provisioning, but there's substantial evidence that may not be a good thing. A company data center is a 1780s-era musket; the cloud is a M249 machine gun.

That's why we need to talk in abstract functional primitives. Developers shouldn't worry about which TLS algorithms are accepted, or that the storage bucket they want has proper read/write access; not just because a ton of this stuff is very arcane and domain specific, but also because humans will ALWAYS get it wrong unless its managed at a higher, centralized, and totally automated level.

And at that point, raw access to AWS doesn't make sense. Developers don't actually want an EC2 instance; they want their app running on a server. Let Operations (or Heroku, or whoever) handle that for you.


Perhaps this is the right model for a particular scenario with which you’re familiar, but it doesn’t match my scenario and is kind of patronizing.

Also, developers should absolutely worry about which TLS algorithms are accepted.


Really? Alright then, right now, off the top of your head: What are the set of TLS ciphers which are generally regarded by the security community to represent the highest security standards?

I have no clue. The AWS ALB 2016-08 security policy allows for ECDHE-ECDSA-AES128-GCM-SHA256, ECDHE-RSA-AES128-GCM-SHA256, ECDHE-ECDSA-AES128-SHA256, ECDHE-RSA-AES128-SHA256, ECDHE-ECDSA-AES128-SHA, ECDHE-RSA-AES128-SHA, ECDHE-ECDSA-AES256-GCM-SHA384, and another 16 or so.

Of course, that's the default ALB policy. I didn't know that. There's a more strict TLS policy available: TLS-1-2-2017-01. Do you know what the difference between the 2016-08 and TLS-1-2-2017-01 policies are? I don't. I could look it up. Well, beyond disallowing TLS 1.0 and 1.1, TLS-1-2-2017-01 also disallows the ECDHE-ECDSA-AES128-SHA ECDHE-RSA-AES128-SHA ECDHE-RSA-AES256-SHA and ECDHE-ECDSA-AES256-SHA ciphers. Cool.

This shit is VERY domain specific. It's arcane. The way I worded those last three paragraphs was patronizing, to establish a point: no one knows this off the top of their heads. This naturally leads to two things I believe are true at any company (beyond a certain "garage" scale): as few people as possible should worry about this, and even those people should encode it in some kind of automation that guarantees they don't have to be hands-on when configuring future resources which need this information.

Otherwise: YOU WILL GET IT WRONG. Guaranteed, at some point, maybe today, maybe in eight months, if you let every developer at a company worry about TLS ciphersets, one of them will screw it up. That's not patronizing; that's human nature. We're fallible. We don't all have massive domain expertise, and even the ones who do make mistakes. Manually configuring things is a guaranteed recipe for mistakes.

I'm using ciphersets as an example, but these things are everywhere. I don't trust any developer, including myself, to remember that by default most JWT libraries allow an alg of "none" to pass verification. I don't trust anyone to remember that S3 buckets by default allow per-object public read settings without a separate non-default bucket-level setting to block it. I don't trust anyone to remember that SQS FIFO queues only allow one concurrent reader per read group (nor understand what that means in practice, because this issue is Literally a weekly thing on /r/aws), or to know that hosting a public website on S3 is startlingly easy to DoS via egress network charges and AWS wont refund it.

Capital One, one of the biggest banks on the planet, was hacked because of a misconfigured S3 bucket. I have relieved myself of the hubris of believing I can do this on my own, a hubris every developer needs to relieve themselves of. Encode this stuff into automation and have every eye you can find inspect your changes. And while there's room to give developers a lot of power in this setup, part of that is not "here's an AWS account, have fun".
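The JWT point, at least, is cheap to get right. A minimal sketch with PyJWT (secret and claims are placeholders); the important part is that the verifier pins the allowed algorithms instead of trusting whatever the attacker-controlled token header claims:

    import jwt  # PyJWT

    SECRET = "not-a-real-secret"  # placeholder

    token = jwt.encode({"sub": "user-123"}, SECRET, algorithm="HS256")

    # Safe: the verifier decides which algorithms are acceptable.
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    print(claims["sub"])

    # Unsafe pattern: choosing algorithms from the token's own header.
    header = jwt.get_unverified_header(token)
    # jwt.decode(token, SECRET, algorithms=[header["alg"]])  # don't do this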


I agree that developers make mistakes and humans are fallible. Why then do you want me to list permissible cipher systems off the top of my head? Of course I would look that up. (I’ve got a JIRA issue from a couple of years ago about this topic.)

By saying that I want developers to worry about TLS algorithms, I mean that I want developers to worry about everything and understand that they have shared ownership of the production system.

I lead a small dev team working in a reasonably sized division of a large company. We’re accountable for both operations and development, and we’re security-sensitive (I have domain expertise in DFIR). It’s my ass if we screw up any of the ways you cite. We mitigate that risk in a lot of different ways, and I try to seek out new things we can do to ratchet up the quality of our build products.

Again, I think your insistence that things be a certain way doesn’t fit my situation. Have a good evening.


My last gig spent way too much on AWS, as percentage of own revenue.

Years earlier, old brick & mortar company brought in tech consultants to rapidly pivot to e-commerce, who then farmed out most of the work to InfoSys.

My team's own spend was ridiculous. Our hottest data set, which could easily fit in RAM, was on DynamoDB. So much "event sourcing" and CloudWatch. Because you needed all those logs for troubleshooting such a brittle system, right?

And since our core function was the recommendation engine, of course we hoarded and munged data like gnomes, with almost no value add, result negative ROI. (+70% of our "lift" was from the customer's own "recently viewed" list. The highest cost recommendations accounted for less than 3% of "lift".)

There was some director level push to migrate from AWS to Google. Which meant first Kubernetes and then Google. Such fun.

There was next to zero consideration of questioning assumptions. Like unwinding the unholy tar pit of mainframe era batch processing mentality fork lifted onto cloud era web services hotness.

And I don't think they could question their assumptions. The skills and experience of the traditional brick & mortar types simply couldn't.

This "consultants ruin everything" story has played out across all industries.


I work at a company that markets itself as a "premium IT service provider", and I think our cheapest VM is maybe 3x to 5x what the equivalent OVH or Digital Ocean VM costs.

What you get, is:

* actual humans to talk to, when things go wrong

* hosting in central Europe, without dubious ownership structures

* you actually get the VM specs you paid for, no overbooking of hypervisors

* colocation in the same datacenters as the VMs.

* custom deals. If you pay enough money, you can install cages with your own access control in our datacenters.

* architecture support and consulting

* you can visit our datacenters if you really want

* lots of certifications that will make your compliance department happy

* if the auditors come to you, and request access protocols to your servers, we can provide those

... and so on.

Why do I list all that? Because cost is just one aspect to consider, and some business have good reasons to optimize for other aspects.

Some value flexibility in their service provider, some value physical proximity, some value constant contact persons over multiple years.


We tried running things on AWS and digital ocean. Yes, AWS offers a lot of extra value with all the other services. But at the same time, we had a relatively simple application. We ended up managing our own sets of dedicated servers for a fraction of the cost. Peformance is way higher. In the end, the cost difference was just too big to justify it when we were trying to be scrappy. We also realized that all the extra features offered weren’t being utilized as our infrastructure requirements were simple.


So for you DO was cheaper? Funny, because I'm actually looking to move from DO to OVH, as DO pricing is around 50% more expensive than OVH. My main concerns right now are the worse UI and the lack of more pre-built images that let you spin up VPSs quicker.


Nope, it's OVH for us as well as this point.

We have about 8 or 9 dedicated servers at OVH right now. Things we've noticed:

- Yes, UI sucks

- No autobilling

- Reboot times are slow (as the actual machines are rebooted, not a VPS)

- You need to upgrade to higher bandwidths for great speeds

- Some routing problems, very rarely

- You really need to know what your server needs are 6-12 months in advance

- Some product offerings way too expensive (eg. Loadbalancer)

Great things:

- Price. Net-net, still cheaper to buy 2-3 dedicated servers than a single beefy VPS

- A lot of sales, like every month

- You get the full resources of the whole server

- Reasonable support for network or hardware related issues

- Unlimited traffic (at least you don't need to worry about normal usage overages)

- DDoS protection

Other thoughts:

In terms of performance, dedis work great. I've found their OVH Public Cloud instances (not the cheap ones, but the more expensive ones) don't perform as well. Might as well buy a dedicated. The only pro is that you only pay for the time you use.


Thanks a lot for the information!

I was mostly looking to get some VPSs from them, the $10.58 2vCPU, 4GB RAM ones to replace the current DO $15/mo 2vCPU 2GB RAM I'm using, as I would get better specs for a cheaper price.

It's a bit hard to tell without actual benchmarks if the performance will be better or worse than DO.


Yeah, I would avoid their VPS. I benchmarked their higher end VPS and they were still not great. Ran a few as well on Vultr, Linode, and DO and ended up going with none of them. Linode performed the best for VPS for me, but price-point wise, it was a better deal to head straight up to dedi's on OVH.

I still run small VPS instances for various workers and what-not, but straight up app and db servers go to OVH.


If I go for dedi, do you know any tool to simulate running multiple, independent VPSs on a dedicated server? I was thinking maybe something like creating Docker containers on the dedicated server, but I'm not sure how the management/provisioning/resource sharing would work, plus I assume it would still need extra IPs and stuff like that for each VPS.

It would be nice to have like a Control Panel where you can link dedicated servers, and then provision VPSs on them using an UI like the one on DigitalOcean. But I guess this would mean having a complete DO competitor (without the hardware part) :D


To clarify: you want a virtualisation environment, a hypervisor? If yes, then you can go with VMware ESXi or Proxmox. Both have web-based consoles to add and manage virtual machines. Like you mentioned, you need additional IPs: Hetzner has 6 (a /29) for €6.72 per month [0], OVH has a one-time fee of €2.00 per additional IP [1]. (Rough API sketch below the links.)

---------

[0] - https://docs.hetzner.com/robot/dedicated-server/ip/ip-addres...

[1] - https://www.ovhcloud.com/en-ie/bare-metal/rise/options/
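
If you go the Proxmox route, provisioning VMs can itself be automated against its HTTP API. A rough sketch, assuming the proxmoxer Python client, a node named "pve", and purely illustrative parameters:

    from proxmoxer import ProxmoxAPI

    # Connect to the Proxmox host running on the dedicated server.
    # Host, credentials and storage names below are placeholders.
    proxmox = ProxmoxAPI("your-dedi-host", user="root@pam",
                         password="secret", verify_ssl=False)

    # Create a small "VPS-like" VM: 2 cores, 4 GB RAM, 32 GB disk,
    # bridged to vmbr0 so it can get its own additional IP.
    proxmox.nodes("pve").qemu.create(
        vmid=101,
        name="vps-101",
        cores=2,
        memory=4096,
        net0="virtio,bridge=vmbr0",
        scsi0="local-lvm:32",
        ide2="local:iso/debian-12.iso,media=cdrom",
    )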


Proxmox is great, as the other reply mentioned.

We don't use any virtualization like that. All bare metal as we are running a single app.


Bare metal destroys AWS and other big managed clouds and even smaller cloud companies like Digital Ocean if all you need is compute, storage, and bandwidth and want to manage it yourself.

Bandwidth is even more extreme than CPU/RAM. There are high-bandwidth bare metal vendors that basically sell by capacity rather than quantity, like rack vendors, and offer bandwidth at stupidly cheap rates compared to any cloud. ZeroTier uses these for root server clusters and gets tens of gigabits for <$1000/server... and you can actually use that and it's fine. Vendors include datapacket.com, reliablesite.net, fdcservers.net, and others. Totally no frills though. They provision it, turn it on, and give you access. Also their TOS may have specific prohibitions, so read it.

The problem is that you have to manage it all yourself, which can be challenging and distracting, and these are no frills services with no add-ons like managed or specialized databases or S3.

That being said the cost savings are so extreme that you should consider it. It's ultimately a spreadsheet decision. Compare cost of hosting vs cost of DIY hosting plus labor for your specific application. Some apps benefit more from managed cloud than others.

There's also the multi-cloud option. You could host things that benefit from AWS or GCP there, and host the bandwidth or CPU-intensive stuff on bare metal.


Aside: thank you for ZeroTier. Fantastic software that I've been recommending to everyone.


I would put the estimate of "server lying on the ground" at more like tens of billions of dollars.

Here's how I back into my math:

1. AWS is a $44 billion run-rate business.

2. This estimate from the Duckbill Group has EC2 at 60% of the AWS bills of "large accounts": https://www.duckbillgroup.com/products/public-cloud-analysis...

3. I used to work in EC2 Networking and looked at the graphs with the server counts for all of EC2 on a weekly basis, so I can't say the actual numbers here, but I have a very good idea how big the EC2 fleet is.

4. 60% of 44 billion gives us an estimate of 26 billion run rate for just EC2.

5. In my time at Amazon, where I was on multiple teams that ran large fleets (thousands of machines) and heard about utilization numbers for services like Lambda and Fargate, 50% utilization would be a very, very good number for VMs in the cloud.

6. 50% of 26 billion is over 10 billion per year of wasted server capacity (that people are paying AWS for).

So yeah, tens of billions of dollars of AWS and the other cloud providers revenue are likely "waste" from the POV of customers.


Um, the real reason CFOs go for cloud-server solutions is tax reasons. The tax breaks for "leasing" instead of owning are a cheap way for an exec to look good to a board (plus it's easier to do creative accounting). I promise you, this accounts for easily 70% of equipment leasing/renting agreements, no matter the industry: construction, printing, food industry, etc. They get a full tax break on the lease and on the service agreement tied into it. This looks good in the short term, but is obviously shit in the long term. I've been a part of too many arguments with clients about this. The person who signs the cheques always mentions the tax incentives and the lower initial yearly costs. But then they lose their ass 5 years down the line. You can argue tech reasons all you want, but it's ego-driven money that drives the cloud industry.

While I agree mostly with the author... wtf is all the hate with C# I see from people? lol "whatever the heck people use C# for". This pops up so often. C# is the F-350, Caterpillar, Kubota of the tech world. Yea, it ain't sexy, but it's meant to just get work done. When it works, no one notices. But when people screw with it, yea, it's easy to laugh at, like the videos of crane construction fails (mostly operator error is to blame). While other tech might be the Bentley and Ferrari, they're either all looks and no muscle or catch on fire when something minor goes wrong (looking at you MongoDB :P).


I've assumed this for a while, because so many companies appear to be doing this - do you have any links to the tax savings as I'd like to understand this in more detail?


If you don't need what cloud providers offer you, from hosted databases, access control, blue-green deployment, load balancing, auto-scaling, multiple detacenters, etc, there is little point in going with something like AWS.

OTOH, if you decide to do it yourself, you'll need to engineer a lot of what you need. That costs time and money. In the very low-end, it's cheaper to go bare VM. As size and complexity grows, it'll be cheaper to go with a cloud provider for a while, until you reach a point where you have so much infrastructure and so many services running that moving parts of it to on-prem will be the cheapest option. At this point, you will be your own little AWS.


A good base search would be "lease vs buy equipment tax benefits" for your country/state/city, just to get your feet wet. There are different incentives for different areas. From there, you should be able to google fu around for more information/tips/tricks. It's slightly not-straightforward stuff, but it's not terribly complicated after a little bit of reading. However, there's a reason why tax code books are so dense. When in doubt, ask someone qualified that's smarter than me, CPA or whatever else accountants get for certs.


Thank you!


Interesting, but somewhat incomplete analysis IMO.

If you are running a billion + dollar budget for compute spending, you’re surely going to negotiate pricing with your vendor, and while that won’t bring things to parity, it will bring them much closer together.

If you are spending this kind of money, you’re likely doing a lot to get your workloads into some reasonably geographically aligned areas, and if you’re peering with other services, they tend to be running on AWS, which means choosing another provider can significantly increase those latency costs.

While we’re on the subject of bandwidth, what do you think data transfer pricing will be between your other cloud provider and this high compute instance you picked up on the cheap? Odds are they will more than negate your cost savings, and again at a latency cost.

Let’s say you really want to save huge amounts of money though. The easy answer is probably in moving to a spot instance targeted architecture where you can typically buy the same server from AWS for less than 20 percent of the on demand price. You can always fall back to on demand when they are not available.


Personal observation:

I purchased a company that was locked into cloud infrastructure. They had received over $10M in venture capital several years ago, and had been running their own servers at some point because I found some of them in their warehouse.

At some point they had switched to AWS, and then at a later point switched to GC. When I encountered them they had shut down due to lack of funding.

Their servers were running in zombie mode racking up over $6k per month in charges. Data export would have cost over $10k. They were also renting warehouse and office space.

They had very few assets, and not enough income to cover expenses.

If their server costs had been reasonable, they might have survived, but they couldn't bring those costs down.


> had been running their own servers at some point

Wonder why they stopped?


With respect to varying EBS performance, I remember somebody once mentioning that they always provisioned 10 EBS volumes at once, wrote garbage to them all for 24h or so to warm them up, then benchmarked them and deprovisioned all but the fastest one.

How useful this is in practice isn't something I've tried to measure myself, but given the performance variance people report it might not be a bad idea to try for yourself.
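
If you want to try it, even a crude probe will show the spread. A throwaway Python sketch (mount points and sizes are made up; a real test would use fio and run much longer):

    import os
    import time

    # Crude sketch of "benchmark, keep the fastest": time a bulk sequential
    # write on each candidate volume's mount point.
    CANDIDATES = ["/mnt/ebs1", "/mnt/ebs2", "/mnt/ebs3"]  # placeholders
    CHUNK = b"\0" * (4 * 1024 * 1024)   # 4 MiB per write
    TOTAL = 1024 ** 3                   # 1 GiB per volume

    def write_throughput_mib_s(mount_point):
        path = os.path.join(mount_point, "bench.tmp")
        start = time.monotonic()
        with open(path, "wb") as f:
            written = 0
            while written < TOTAL:
                f.write(CHUNK)
                written += len(CHUNK)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually hit the volume
        elapsed = time.monotonic() - start
        os.remove(path)
        return TOTAL / elapsed / 1024 ** 2

    results = {m: write_throughput_mib_s(m) for m in CANDIDATES}
    print(results, "-> keep", max(results, key=results.get))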


Seems like a strange approach. It's easier to just continuously benchmark performance and kill machines that don't perform as expected. It's very easy to automate.

The article also doesn't mention spot pricing and per-second billing. Being able to burst up to hundreds of machines in minutes is super nice. I did some back-of-the-envelope calculations for some build servers and it came out favorably for $cloud providers. Having idle machines also costs money.

For a highly optimized and predictable work load I'm sure providers like OVH can provide better bang for bucks. But that also assumes you don't use any of the cost reduction methods that are available with $cloud, like using spot pricing for variable load and reserved capacity for base load.


> It's easier to just continuously benchmark performance and kill machines that don't perform as expected. It's very easy to automate.

It's also work you simply don't have to do if you rent hardware.


You don't think there is variance in performance when you rent hardware? You've never had an HDD or SSD that underperformed vs. others of the same type or model? You've never had a stick of RAM that underperformed, or threw lots of CEs?

I'll be frank: If you haven't seen performance variance with physical hardware, you either have been incredibly lucky, not paying attention, or not working at a very large scale.


Ah, that delightful cloud propaganda move, where you assert that because some undesirable thing happens to both VMs and physical machines, that they're the same, elegantly glossing over the orders of magnitude difference in severity.


I'm not sure how it's "cloud propaganda" to say that performance variance is a very real thing in response to someone making the claim that it doesn't exist when you rent servers.


Historically EBS volumes were not initialised. The space was allocated, but until you touched the right blocks the first access time was terrible. This is not the case anymore with modern classes.

Also, neighbours will change over time, so testing for 24h and committing to a volume long-term doesn't sound optimal.


I love OVH, and have several servers there.

But it's not AWS. They don't have the same ecosystem of services. Consider IAM, for example. Premium pricing for a server that has access to that seems normal to me.

On the other hand, he didn't include egress charges. That makes the gap even wider for many use cases.


Yes.

> So, SourceHut is not hosted in anyone's cloud. I own all of the hardware outright and colocate most of it in a local datacenter.

> I just built a new server for git.sr.ht, and boy is she a beaut. It cost me about $5.5K as a one-time upfront cost, and now I just pay for power, bandwidth, and space, which runs about $650/mo for all of my servers (10+).

> Ran back of the napkin numbers with AWS's price estimator for a server of equivalent specs, and without even considering bandwidth usage it'd cost me almost TEN GRAND PER MONTH to host JUST that server alone on AWS.

> AWS is how techbro startups pile up and BURN their investor money.

https://cmpwn.com/@sir/103496073614106505


These aren't the same product category. OVH sells a specific server; AWS is selling managed compute and storage. I won't buy OVH because the failure scenarios and recovery are incompatible with my needs without a lot of extra work.

A better comparison would be Rackspace and AWS.


> On the flip side of the coin, there are server providers such as digital oceans, GC, and Azure that can be more expensive than AWS.

What gave the OP the impression that Digital Ocean is more expensive than AWS? AFAIK there is no configuration of Droplets vs EC2 where Digital Ocean is more expensive. In most cases, especially with any outbound bandwidth, it is cheaper by a large amount.


Off-topic: Bandwidth is still pretty costly at Digital Ocean (DO). You can rent an entire droplet along with the additional bandwidth that comes with said droplet for 50% less than just buying the extra bandwidth (and even with that 50% discount it's still prohibitive for bandwidth-intensive apps).

More off-topic: I still generally like and use DO, but I dislike the policy of billing for one thing (e.g. 1vCPU) and adding hidden, fuzzy terms and conditions like "don't consume excessive CPU cycles." IMO the marketing ought to make it crystal clear that you're allowed to _burst_ to 100% use and also put in a well-defined threshold on any other resource limits like total CPU cycles so that people can plan accordingly and not be hit with surprise outages. It's similar to the complaint people have with Comcast selling XX00 Mbps packages and then tucking a minuscule bandwidth cap in the terms and conditions. That's potentially a fine policy, but it's extremely misleading to sell one thing and use fine print to shape it into a completely different offering.


Comparisons of providers is about like comparisons of programming languages - the best choice will depend very much on the circumstance, and likely there will still be at least two possible choices which cannot be accurately differentiated based on overall cost. There are just so many variables...


I know someone who needed to run a large-ish distributed workload. The project had a pretty respectable budget (I think more than $10k, less than $1M).

They were going to rent DigitalOcean servers, because benchmarks showed they were the cheapest hosting provider around.

Then their account got banned due to "patterns associated with cryptocurrency mining." And they had no way to get in contact with a human, appeal the ban and explain their situation.

(Interesting side-note: Even if they were doing cryptocurrency mining, AFAICT it isn't against DigitalOcean ToS.)

They ended up switching to another provider (Linode I think?)

Anyway, what I'm trying to say is, it's an example of a cheap provider who has good paper specs and even good benchmarks. But when you actually try to use the platform at scale, you'll discover there's some catch and you can't actually use it.

(I speculate that maybe DigitalOcean madly oversubscribes physical CPU, so anyone who actually uses the resources they pay for trips the automated system and gets their account banned, because DigitalOcean can't both keep their cheap prices and pay for the engineering effort to figure out how to throttle customers who peg their CPU. So they make enough money to stay in business from websites and non-CPU-heavy workloads, but have to use the backdoor "cryptocurrency mining" banning excuse to turn away customers who basically want to buy CPU-hours.)


I've worked at a shop that used both bare-metal servers from OVH and VMs from AWS. Using the EC2 instances was infinitely simpler: we could easily automate so much of it and manage it using load balancing and scaling services. AWS instances were the default choice, as the scaling up and down was a huge benefit, as were the simpler management and the additional services available.

The only place we used OVH bare metal servers was for bulk. Our Elasticsearch cluster had so much RAM that the EC2 cost was very prohibitive. For our DB servers we couldn't, at the time, get the same performance RAID SSDs/controllers and vertical scaling (256GB, 512GB). For our disposable webapp servers we could have used either, and since the ops was already worked out from having used OVH before AWS, we kept it that way, with a bit more ops and a smaller infra bill. When you have hundreds of hardware hosts, they're failing monthly, and it's up to you to image a new one and add it to the cluster.

Having only one way to do things also meant that Chef was used instead of immutable instance creation, which made it a pain to keep everything in sync.

My thought process is basically: (1) small number of instances -> cloud VMs, (2) medium instances w/ founder(s) time but no capital -> bare metal, (3) mature app or huge number -> bare metal


So...

The news is that AWS is overpriced, and overpriced by quite a lot.

This is not news.


I run my personal website and other small projects on VPSs I bought from OVH, and they're really cheap and work very well. Can recommend.


I agree. They're really great if you need well-priced, no-hassle infrastructure. As others have already mentioned, they may not have some of the higher-level features that AWS comes with, but if you don't need these, then there's no reason to be paying more for it.

I'm really fascinated by their approach to building datacenters, which seems to include taking over disused industrial sites and converting them for datacenter use. For example, their datacenter just outside of Montreal is on the site of an old aluminum smelter [0]. In this case, I'm sure the proximity to plentiful and cheap hydroelectric power nearby (as would have been beneficial for aluminum smelting in the past) was a major factor in the choice of location as well.

[0] https://baxtel.com/data-center/ovh-beauharnois-quebec-bhs


Like I said on another site:

... people know AWS and know how to be productive with the services and frameworks for AWS. That alone is a figure hard to quantify. Sure, I could save money bringing all the servers back internally or using cheaper datacenters, but I worked at a company that worked that way. You end up doing a lot of busy work chucking bad drives, making tickets to the infrastructure group, and waiting for the UNIX Admin group to add more storage to your server. With AWS I can reasonably assume I can spin up as many c5.12xlarge machines as I want, whenever I want, with whatever extras I want. It costs an 1/8 of a million a year, roughly. I see that 1/8 of a million as cutting out a lot of busy work I don't care about doing, and as simplifying finding people to do the remaining work I don't care about doing. The author says money wasted; I see it as money spent so I don't have to care, and not caring is something I like; hell, it isn't even my money.


I had this mindset for a long time that we would save money running our own infrastructure. I remember spending days talking with our CTO about getting dedicated servers and spinning up one of the Citrix or VMware offerings. I was sure that we could run our backend on one dedicated server for a fraction of the cost of the equivalent on AWS. I based my assumptions on running my own infrastructure for years without issues. However, once we started growing I understood that adding resources, changing network policies, and spinning up test environments would be extremely difficult and would stall our growth. I am so glad they didn't listen to me.


You just didn't have anyone skilled enough to automate those things, or you couldn't manage that. If you are colo, you aren't swapping disks/network; your colo provider is. I'm not advocating for doing it ALL yourself (making your own network cables does save money, but it's impractical). Just because you couldn't manage the whole thing while having the ear of the CTO means you failed; it doesn't mean it doesn't work for lots of other companies that make lots of money.


If I get a three-year reserved instance of a t3.small, it works out to $6/month. A t3.micro works out to $3/mo. Storage and bandwidth, for my use-cases, amount to less than a dollar a month.

Are there any other providers that compete on the low end like this? The common suggestions (DO, Linode, etc.) all cost at least twice as much.

I buy the author's argument for "real" use-cases where you need one or more expensive servers. But I also see this argument made to people using AWS for personal use, and I've never understood it. Am I missing something?

I don't need a massive server, I need something that can run code, and that's always on. AWS seems really cheap on the low end, and those machines are more than powerful enough to host your own email, files, projects, etc.


There's plenty of small VPS providers out there who will give you slivers of a server for very cheap:

https://lowendbox.com/


OVH, Hetzner, and a small army of very reputable but smaller providers (Ramnode, BuyVM, etc).


> and a somewhat worst machine from AWS

I’m not a native speaker, but I see this usage of "worst" pretty often. Shouldn’t it be "worse"? Is this a mistake by someone or is this an actual construct I simply don’t know about?


Yep, that's not grammatical.


You actually get encryption out of the box on GCP. Your data is also encrypted at rest.

Do not compare apples with oranges, or at least compare them more fairly.

Go to whatever provider you want. Just make sure you know why you have chosen AWS or someone else.

For a lot of companies, the quality they get from GCP might be overkill, but don't get me wrong: infrastructure costs are often, compared to how critical they are, very cheap.

It might just be that it is easier for you to find another company to manage your AWS account, while they don't know anything about some other cloud provider, and the additional cost is then just worth it.


I run a large infrastructure SaaS on AWS. Our bill is frightening and so we work diligently to find ways to cut it down. Yet, despite other cloud providers offering much lower prices on CPU, storage, and bandwidth, we stick with AWS for the following reason: If we were to switch to a cheaper provider, we would have to do a lot of building ourselves, which would require hiring talent and keeping that talent employed continuously.

Any time we look at the switch, it doesn’t make sense because of the up front costs of staffing that we would need to absorb.


Also if you build it yourself it carries risk -- each project could fail, or take extra time.

Whereas buying from AWS has low risk.


Servers go down in AWS too. At bottom they are a colo shop with a services arm that turns open source software into multi-tenant services, of which you are one tenant, paying for that multi-tenancy.

You may not notice it with 1 server, but you will with a few thousand. Everything goes down in AWS just like it does on normal servers; no one likes to mention that, though, so it often gets blamed on something else, like the software running on the cloud that just went down, hah.


Okay fair comment. But in terms of the entire data center being unreachable, that has never happened - not in ten years.


Yes. And risk here means my business fails potentially. AWS has never once had an outage in the region we operate in.


A pet peeve of mine is also the "bullshit cloud". For example, OnShape is basically running CAD software on a virtual machine. Fusion 360 is saving files on the network or launching batch jobs on a farm, 1990s banking style.

The cloudiness provides very little value to the user. Whereas computing the fillets concurrently on a bunch of servers and the client and accepting the fastest answer would be useful, instead of locking the UI for 2 minutes; same for toolpath generation.


The article and the other posts in this thread miss the killer feature: data security via backups. If I put my data in a managed DB and use S3 or equivalent, I am more or less guaranteed to never lose data. Running bare metal at OVH can't provide that guarantee. The security of that data depends on the thoroughness and correctness of my backup policy, and those are almost guaranteed to be much worse than what AWS et al provide.
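
The managed side of that really is very little work. For example, a hedged boto3 sketch that turns on versioning so overwrites and deletes stay recoverable (bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    # Keep every version of every object, so an accidental delete or
    # overwrite is recoverable rather than gone.
    s3.put_bucket_versioning(
        Bucket="my-backup-bucket",  # placeholder
        VersioningConfiguration={"Status": "Enabled"},
    )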


There appear to be a few other important factors missing from the article.

1) It isn't just that AWS (and others) provide loads of services, it's that I can run everything I need in the same data centre. SQL Azure might be a good pricing model for your system but if it makes sense for everything else to be in AWS (features, cost whatever) then I am more likely to spend more on AWS RDS or whatever. Might cost £1000s more but I cannot simply split my load across two providers, that is two points of failure.

2) The level of support you get from OVH is nowhere near as good as AWS, which is to be expected when comparing a "budget" provider with the "gold standard". OVH can rent stuff to you cheaper because their support team is presumably much smaller.

3) Even the UI for OVH gives me the heebie jeebies. I use them for personal stuff because they are cheap but the UI is far too slow to use for fast work. I see notifications that I have previously cleared and I get random emails occasionally about things I don't quite understand. I can live with that for cheaper personal sites but if I was using that every day, I would quit!

Of course, for many people, there are issues with marketing (I haven't heard of some of the other providers) and the basic, and probably fair, assumption that with AWS you know the sort of quality to expect, a bit like buying a Toyota. The time and risk involved in evaluating other providers is not worth it in many cases, especially when you only learn 6 months later that OVH doesn't have a configurable load balancer, or doesn't have proper zone redundancy, or you cannot upgrade certain hardware without literally copying your files into storage and trying to copy them back to a larger machine.


Are there tens of regions around the world? Are there multiple AZs within a region that are single-digit milliseconds apart? Can I autoscale? Is there a managed load balancer? Do I have to manage my own database instances and availability? Do I have access to hundreds of other services for things that I don't want to spend time managing myself?

Aren't we years beyond the 1:1 cost comparison for a single server argument?


No, almost no-one needs any of that. The depressing amount of money that goes to AWS for little reason but 'because it's the safe choice and everyone does it' is pretty amazing. In tech/programming we often say 'use the right tool for the job' and yet everyone reaches for the hammer that is AWS when hosting is mentioned.


Every company I've worked for in the last ten years has needed all of the above, and I like to work for small companies (fewer than 100 people). Only in the last couple of years have I had any interest or need to branch out to the other clouds, largely because they're doing specific things really well.


I wanted to write the same. One server in my basement never compares to a highly-available, compliant and secure infrastructure, surrounded by managed services. It's like complaining that taxis are more expensive than driving your own car.


Every infrastructure is only as secure as you configure it to be, bugs aside. Not that AWS doesn't allow for secure infrastructure, or that your basement is secure by default, but as a consultant, when I ask clients about security, I see them saying "We're secure because we use AWS."


One should definitely not default into "we're secure because we use AWS". But let's be honest, how many servers in the basement feature video surveillance? How many destroy their hard disks securely?

An honest statement would be "we reduced our burden of security by building on top of AWS".


Yeah! The amount of toil and burden AWS removes or reduces for me across the board (security, managed services, running multiple datacentres and everything necessary to do so…) means I can work on vastly more valuable things. Which more than makes up for the price.

The cost of 1 server is not a sufficient metric of value.


Not to counter your point, but I've learned, and was surprised by, how cheap it is to let a company destroy your hard disks and USB sticks - at least in Germany.


Tell that to Capital One.


YAGNI applies extremely heavily to most of those concerns. Pre-traction startups are wasting money they don't have, just to future-proof their application.


The question is whether that is worth the lock-in.


I worked at a job that had a multi year contract with a hosting company. We paid eight figures annually to lease MIPS on a mainframe.

That’s vendor lock in. AWS “lock in” isn’t, it’s “I could terminate for convenience any day I wanted to, but the ROI isn’t there”.


With cloud providers, if you use their proprietary services you end up with code that runs only in their cloud. Just like MIPS code that runs only on MIPS. Except that you can probably port code to another CPU architecture more easily.


Not MIPS the RISC CPU arch / vendor and embedded survivor, but IBM MIPS (millions of instructions per second, the unit mainframe capacity is priced in), the one feature added to brilliant hardware that seems designed to turn customers away, e.g.:

"Turning our attention back to IBM’s announcement, this new server offers five hardware models and well over 250+ unique software capacity settings, providing a highly granular and scalable system. The base single-engine speed of 98 MIPS is found on the A01; the same full speed unit (Z01) climbs to 1761 MIPs, up from 1570 MIPs on the prior generation"

From : https://www.evolvingsol.com/2020/04/14/ibmz15-mainframe/


Now go stick some Dell servers in a colocation joint and run the numbers again :-)


Just migrated an ecommerce consulting client from a major VPS provider in the US to AWS.

There are two main reasons. One is that the traffic to this client is heavily driven by ad spend, and the site fell on its face hard this past year any time a big spike got sent our way by a single ad vendor or a combination of them (which we can't really control in fine enough detail) -- Facebook, by the way, is by far the worst about this. They can and will cause a thundering herd.

The second reason is that there's a bunch of scaling up/down of services in AWS if you're doing it right. You're not buying a .16xlarge server in AWS to host your ecommerce site; that would be stupid, since you don't need that 24x7. You're paying for a pair of .xlarge servers at a Reserved Instance rate, which is half the published rate. When you need to, your instance count (and your AWS bill) goes up.

We couldn't do that at the VPS host we were on, so we kicked them to the curb. With AWS, we can handle the load spikes -AND- the total lower bill for a year came out much lower.
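
For the curious, the "instance count follows the spikes" part mostly boils down to a scaling policy on the Auto Scaling group. A rough boto3 sketch (group name and target value are made up):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Target-tracking policy: add/remove instances to keep average CPU
    # near 50%, so the fleet (and the bill) follows the ad-driven load.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="ecom-web-asg",   # placeholder
        PolicyName="keep-cpu-near-50",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
        },
    )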


>facebook, by the way, is by far the worst about this. They can and will cause a thundering herd.

From a marketer's perspective, (as long as the site doesn't go down), this is actually facebook doing their job.


GCP also has sustained-use discounts, which are very convenient: even without a reservation, as on AWS, you get up to a 30% discount if your usage is more constant, and you still have the full flexibility of on-demand.

When it comes to pricing comparisons of cloud providers, see Cloudorado: https://www.cloudorado.com/


I can think of one reason why OVH is that much cheaper. OVH doesn't bother policing their network. At all. You don't even need to take my word for it, Cisco Umbrella has written numerous papers on the subject.

They're one of three hosting companies I instantly blacklist across the board when I take custodianship of a network. We don't want their customers' business.


This is giving too much credit to organizations. In some, cloud providers are chosen plainly because some executive asked "So are we cloud?", someone said "err, I don't think so", and then some other middle manager was charged with "make us cloud please". It was never a question of what actually makes sense, just "do we side with Amazon, Google or Microsoft?".


What you're paying for is the capability of scalable architecture, on-demand launch, and heavy redundancy. This very easily acts as a multiplying cost factor when building out your own infrastructure as opposed to leveraging one that can dynamically scale as your needs require. Yes, you can get away with paying for your own infrastructure at a third the cost, but when you add up the cost of all the individual requirements (colocation, ISP prices, space, people to manage all of these machines, the software to then scale yourself in equivalence), you eventually end up at a similar cost, apples to apples and oranges to oranges, if you replicate for yourself exactly what you get through AWS.


Honestly, this is a terrible article; the only real point on display is "if you use AWS for EC2 reserved instances only, then you are overpaying". If your application can be interrupted, spun down, and spun up, then congratulations: you can use spot instances[1] at a fraction of the price (depending on your region and instance size).

r4.16xlarge isn't available in France, so I'll use us-east-1

Reserved: $36,365

Spot: $9,665

OVH (still France): $25,771

So if your system can tolerate nodes going up and down every once in a while (and some other caveats), it seems pretty dumb to pay for the dedicated server.

[1]https://aws.amazon.com/ec2/spot/pricing/
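
For reference, requesting spot capacity is just a flag on the normal launch call these days. A hedged boto3 sketch (the AMI ID is a placeholder):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch an r4.16xlarge as a one-time spot instance; it runs at the
    # spot price and can be reclaimed by AWS with a two-minute warning.
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="r4.16xlarge",
        MinCount=1,
        MaxCount=1,
        InstanceMarketOptions={
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
    )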


When determining what to use for development of my SaaS, I did a comparison of what you actually get from providers. The full article is at https://jan.rychter.com/enblog/cloud-server-cpu-performance-...

My takeaways were that many cloud provider offerings make no sense whatsoever, and that Xeon processors are mostly great if you are a cloud provider and want to offer overbooked "vCPUs".

I haven't tested those specific setups, but I strongly suspect a dedicated server from OVH is much faster than an r4.16xlarge from AWS.


> When determining what to use for development of my SaaS, I did a comparison of what you actually get from providers. The full article is at https://jan.rychter.com/enblog/cloud-server-cpu-performance-...

Your results (e.g. that z1d.xlarge with 4 vCPUs is only 10% slower than z1d.2xlarge with 8 vCPUs) shows that the "performance" you were testing was disk IO throughput (probably dominated by disk latency), not vCPUs.

> My takeaways were that many cloud provider offerings make no sense whatsoever, and that Xeon processors are mostly great if you are a cloud provider and want to offer overbooked "vCPUs".

> I haven't tested those specific setups, but I strongly suspect a dedicated server from OVH is much faster than a 4.16xlarge from AWS.

You seem to be implying that AWS/EC2 does CPU over-provisioning on all instance types; this is incorrect, only T-family instance types use CPU over-provisioning.


> the "performance" you were testing was disk IO throughput

In part, yes, but not entirely. I was very clear that my load isn't embarrassingly parallel, so it is not expected to scale linearly with the number of processors.

> You seem to be implying that AWS/EC2 does CPU over-provisioning on all instance types; this is incorrect, only T-family instance types use CPU over-provisioning.

If you think you are getting a Xeon core when paying for a "vCPU" at AWS, I have a bridge to sell you.


I went down the OVH vs AWS rabbit hole. I think a person needs to understand the concept of opportunity cost when approaching such a problem. Sure, you will save money on the server itself, but you don't get all the other things that AWS is adding, and you would have to do them yourself one way or another. Even if your time is extremely cheap, I don't think one is able to beat their price by enough to justify doing all this work. They've been at it for years and their solution is battle-tested, and you may be adding issues you won't be aware of that will bite you once you get growing.


Or maybe cost just doesn't really matter to many companies. If using AWS helps you scale more easily and lets developers spend less time managing servers, that can be extremely valuable, especially at a company that is growing quickly. Who really cares if you are spending too much? The goal isn't profitability, it's getting as many users as possible. For many tech companies, I think money is pretty much a non-issue, as evidenced by the insane salaries they offer; the real challenge is scaling up an organization without completely paralyzing all progress, and AWS can help do that.


Bad comparison:

- on-demand vs upfront (AWS has upfront with similar cost reduction)

- scalable vs. fixed (when you stop using it on AWS you also stop paying)

Then there is all of the functionality and integration everyone else already has touched on.


Has anyone tried the Scaleway Elements? Looks pretty feature-complete (hosted DB, Kubernetes, object storage). Just wondering if there's a reason it's not talked about, i.e. is it too good to be true?


AWS is not optimized or built for people who want to reserve their own personal server for a year or more.

Another cost option that the author didn't mention is spot instances. An r4.16xlarge spot instance in Paris for one year would cost $8,900, about half the price of OVH, and that is assuming the customer needs the server 24x7. Having a server 24x7 obviously wouldn't work on spot, but the point is that AWS is built more for customers who are frequently creating, destroying, and scaling their clusters based on need.


No one has been fired for choosing AWS.


Hypothesis: the big three US cloud providers are not really trying to be competitive in the European market. Their EU datacenters price in the convenience of a consistent dev/ops story when a US company extends its primarily US footprint into Europe. They are a bad deal for entirely European ops, but this story doesn’t generalize to the US. US dedicated server pricing is much closer to cloud pricing.


Great write-up, and I've got to say that the biggest takeaway from it, for me, is that the cloud pie has more sections than I thought it did. Well, I guess in the back of my mind I knew there were quite a few providers, but I never paid serious attention.

From now on, I'll make a conscious effort of at least taking a look beyond the most popular providers (and Heroku), and not just for the sake of cutting costs.


Amazon hides a lot of extra paid features that are free on dedicated servers. I got burned by their practice of hiding prices while trying to experiment with the "free tier" someone recommended to me. Amazon made it really easy to subscribe to paid features. After the first $50 bill I moved to Kimsufi, where I have more memory, disk, transfer, and public addresses for €5/month.


I feel there are several important things missing from the article, and things not mentioned in the 160 comments so far.

AWS listed rates are literally for small fish, which is the stage where you get your $100K / 1% of your investment. In the 10x example, let's say $1M, you should start asking for a discount, and it could range anywhere from 20% to much higher. Compare that to OVH, which is already offering a very low price: their price stays the same at scale.

Amazon is also going full steam ahead with its own ARM instances, and those offerings are already listed 20% cheaper. And in many cases they even perform better per vCore than Intel: on x86 instances a vCore is a thread, while on ARM instances a vCore is an actual core. (Assuming your apps work on ARM.)

Network quality - OVH, DO, Linode: none of them are anywhere near as good as AWS in terms of network connection speed, routing, and capacity. And this is not something you can recreate with other IaaS such as OVH.

All this brings AWS to a much lower multiple of OVH. And then you add up all the extra benefits, such as easier hiring, resume-driven development, asset-vs-lease tax breaks / financial engineering, etc.

I really do wish Google and Microsoft brought in more competition, though. As the author mentioned, he was surprised there is no single monopoly; that's because the market is growing faster than all of these hyperscalers can handle. Intel has been selling as many server CPUs as it can fab.


I’m surprised no one has mentioned the number of data centers that AWS offers. Good luck using OVH anywhere but France/US east coast.


If all you need is a few VMs, then there's no real reason you have to be stuck on a single provider. If OVH only has servers on the east coast, then you find another provider for the west coast. I've never used OVH, but I doubt creating an account with them (and other providers) is difficult. Shopping around is the only way to get the best prices.

Maybe it's mildly annoying to have to deal with multiple hosting accounts, but that's a small price to pay for potentially huge savings, IMO.


Why? They have 8 PoPs stretching from US west coast to Singapore and Australia. That seems pretty okay.


The lawyer/legal-department bottlenecks on cloud usage are a pretty maddening aspect of corporate cross-shopping of clouds.

At this point, every semi-established cloud provider must have sufficient paperwork to cover the various legal permutations. And it's not like the lawyers are doing any technical research to verify any security aspects.


Few large organizations pay the on-demand price that this article uses for comparison.

By reserving the instance, or using spot, the costs of these instances come way, way down, usually 50-80%.

I’m not trying to defend AWS here, but for an accurate comparison, it’s best to use the numbers that people are actually paying in practice.


The article did reference the (1-year, upfront) reserved instance pricing. It quoted a price 17.5% higher than what I see in US-East2 (Ohio), but they didn't ignore it entirely.

My guess is that most people who would be buying such a server would be using a 3-year RI, which is (ballpark) "buy two years of RI, get the third year free".


A lot of people are pointing out the complementary services in this thread. This is another place where I hope Kubernetes creates a consistent operational environment, with wide service offerings, so that we aren't so strongly reliant on Big Cloud forever.


The fact is that cost of compute is only a serious fraction of revenue for a small percent of companies. Those should worry about compute — the rest should focus their attention elsewhere.


What if there were an Airbnb for servers, where you could rent out your spare computing resources in a marketplace?

Many non-confidential processes could run on other people's computers.


AWS spot pricing meets SETI@home?


> If I thought the whole "beat out the big guys by using cheap compute" thing was so easy, I'd be doing it instead of writing about it.

Very wise.


How fast can you reset an instance on OVH?

I personally wouldn't want a server to live longer than 48 hours :) (Unless it's a managed service)


Can you please tell us why? What are you afraid of? I have seen servers up much longer than 2 days, still humming along: no memory leaks, none of whatever else you can think of that "rots" on a server that doesn't change. Maybe your app has a lot of memory leaks, or is not well coded so it has to be restarted every few days, but why, as you say, bring it up completely on a new server?


2 days isn't a strict limit :)

But I wouldn't want to maintain instances that aren't regularly cycled.

Then you have to install security updates and things like that.

I think many people today only want to own the inside of their container, and only have the container exposed through a load balancer that does some sanity checks on incoming traffic.

Most LBs won't forward non-HTTP requests, etc..


Does it matter? I'm sure the machines backing EC2 live quite a long time. And that's OK depending on what they're used for.


pretty sure trillions of dollars in unused houses are lying on the ground


god this is written so poorly it's hard to follow


I'd originally posted this here: https://lobste.rs/s/surdxc/is_billion_dollar_worth_server_ly...

But cross posting in case it's interesting to this audience.

Over the past few years of my career, I was responsible for over $20M/year in physical infra spend. Colocation, network backbone, etc. And then 2 companies that were 100% cloud with over $20M/year in spend.

When I was doing the physical infra, my team was managing roughly 75 racks of servers in 4 US datacenters, 2 on each coast, and an N+2 network backbone connecting them together. That roughly $20M/year counts both OpEx and CapEx, but not engineering costs. I haven’t done this in about 3 years, but for 6+ years in a row, I’d model out the physical infra costs vs AWS prices, at 3-year reserved pricing. Our infra always came out about 40% cheaper than buying from AWS, for as apples-to-apples as I could get. Now I would model this with savings plans, and probably bake in some of what I know about the discounts you can get when you’re willing to sign a multi-year commit.

That said, cost is not the only factor. Now bear in mind, my perspective is not 1 server, or 1 instance. It’s single-digit thousands. But here are a few tradeoffs to consider:

Do you have the staff / skillset to manage physical datacenters and a network? In my experience you don’t need a huge team to be successful at this. I think I could do the above $20M/year, 75 rack scale, with 4-8 of the right people. Maybe even less. But you do have to be able to hire and retain those people. We also ended up having 1-2 people who did nothing but vendor management and logistics.

Is your workload predictable? This is a key consideration. If you have a steady or highly predictable workload, owning your own equipment is almost always more cost-effective, even when considering that 4-8 person team you need to operate it at the scale I’ve done it at. But if you need new servers in a hurry, well, you basically can’t get them. It takes 6-8 weeks to get a rack built and then you have to have it shipped, installed, bolted down etc. All this takes scheduling and logistics. So you have to do substantial planning. That said, these days I also regularly run into issues where the big 3 cloud providers don’t have the gear either, and we have to work directly with them for capacity planning. So this problem doesn’t go away completely, once your scale is substantial enough it gets worse again, even with Cloud.

If your workload is NOT predictable, or you have crazy fast growth. Deploying mostly or all cloud can make huge sense. Your tradeoff is you pay more, but you get a lot of agility for the privilege.

Network costs are absolutely egregious on the cloud, especially AWS. I’m not talking about a 2x, or 10x, markup. By my last estimate, AWS marks up their egress costs by roughly 200-300x their underlying costs! This is based on my estimates of what it would take to buy the network transit and routers/switches you’d need to egress a handful of Gbps. I’m sure this is an intentional lock-in strategy on their part. That said, I have heard rumors of quite deep discounts on the network if you spend enough $$$. We’re talking multi-year commits in the hundreds of millions of dollars to get the really good discounts.

My final point, and a major downside of cloud deployments, combined with a Service Ownership / DevOps model, is you can see your cloud costs grow to insane levels due to simple waste. Many engineering teams just don’t think about the costs. The Cloud makes lots of things seem “free” from a friction standpoint. So it’s very very easy to have a ton of resources running, racking up the bill. And then a lot of work to claw that back. You either need a set of gatekeepers, which I don’t love, because that ends up looking like an Ops team. Or you have to build a team to build cost visibility and attribution.

On the physical infra side, people are forced to plan, forced to come ask for servers. And when the next set of racks aren’t arriving for 6 weeks, they have to get creative and find ways to squeeze more performance out of their existing applications. This can lead to more efficient use of infra. In the cloud world, just turn up more instances, and move on. The bill doesn’t come until next month.

Lots of other thoughts in this area, but this got long already.

As an aside, for my personal projects, I mostly do OVH dedicated servers. Cheap and they work well. Though their management console leaves much to be desired.


Seems to me that this sort of thinking is predicated on the idea that infrastructure should still work like it did 10-20 years ago.

Cost is not the driving factor, and in any case, cost is often calculated wrong. Putting one set of prices 'for stuff' in one column and another set in another column and looking at the difference tells you almost nothing.

Forget about start-ups for a minute. There are certainly arguments to be made both ways there. Focus on big enterprises for a second. Infrastructure complexity isn't getting smaller over time and neither is demand. Delivering on demand and managing complexity with a bounded number of people requires a change in thinking when it comes to infrastructure writ large. We cannot sustain the old on-premises, dinosaur-pen style data centres and still deliver and grow our core businesses. It just won't work... So you either go cloud or create something very cloud-like and do it on-premises. Heck, that is how AWS came about in the first place.

Anyone that thinks going cloud is a good way to reduce headcount is going to get a shock. Going to cloud (or let's say changing the way you do infrastructure) is a way to continue to do business with the headcount you have. It's not a question of carrying on with the 'old school' infrastructure you have; you just can't keep doing that. Do nothing and your headcount requirements are unbounded.

Anyone that thinks cloud infrastructure requires you to hire only a bunch of cloud experts is wrong. Chances are you have the bulk of the infra people you need right now. All those people that have been doing 'old school' infrastructure for years are still your most valuable resource. The mechanics of the infrastructure are fairly irrelevant (ok, they are not, but in the grand scheme of things we can kind of cancel the mechanics out in the equations); the oft-missed value is the operational knowledge. The operational bit is what gets swept under the rug in the DevOps discussion. I believe the knowledge of how to translate existing infra into a 'new' model doesn't come from hiring DevOps people or cloud people; it comes from the infrastructure people that have been doing it for years. Leverage the intellectual capital you already have.

The flip side of that bargain is that 'old school' infrastructure people need to recognise we have to adapt. Those that don't are doomed to be cancelled out in the same equations that cancel out the mechanics of the infrastructure itself.

Enterprises that fail to recognise the shift are also doomed. Those start-ups we aren't talking about? They can scale far more quickly and can quickly get to a point where they deliver at the level a much bigger organisation can. They can eclipse the slow movers. I assume this is why a lot of consolidation happens... The slow dudes only really have one move, and that is to buy the little guys before they can get there (the Facebook defence). But that's like doubling your bet on red every time you lose at roulette. Eventually you go bust or the game moves quicker than your bankroll...

Thanks, James


Something the article misses is complementary services. A lot of companies don’t care at all about X vs 2X for server cost, but if you lack certain features around say managed Spark job execution or serverless container deployments, it’s a total deal-breaker.

AWS, Azure and GCP tend to have the widest coverage of complementary services, along with all the other stuff like volume discounts and credits. I believe in many cases it could be perfectly rational to pay 2X more on just the server portion.

I also think support is another big issue, but GCP makes me second-guess myself, since they are the third largest provider but their support system is (literally) to just say “fuck you, read our upsell-laden docs.”

A good example is the way the GKE SLA page mentions a bunch of beta features of kubernetes will void your SLA credits, when in reality Anthos is not remotely close to being ready for production use by most teams and they have no choice but to rely on beta features in GKE. Multi-cluster ingress is a good example of this - “just switch to Anthos” as a proposed solution is literally equivalent to saying, “fuck you.”


Capex + Opex = $32,000 in AWS https://archive.is/rav98


Why not just buy the hardware and save even more then?


It depends on your needs. Buying hardware still makes sense in some scenarios where quality of service is not a big concern, you have the space, and you have some time.


yeah if you have a little time, you can save a lot



