
At my company they've gone the route of multiple AWS accounts to avoid the issues they introduced with horrible planning.

First they wanted us out of on-premise, and told us costs wouldn't matter.

Then they wanted us to be 'cloud agnostic', but once deadlines came down that changed to 'get it working in AWS ASAP, the tech debt doesn't matter'.

Now they're freaking out about AWS costs, and we're back to juggling 'cloud agnostic' and 'reduce cost to serve in all clouds' priorities on top of features and maintenance, both of which are 10x slower due to tech debt and the plethora of bugs.

I really need to find a new job soon. It's insane how badly the execs and upper management are running this company. Every day is a knee-jerk reaction from someone so detached from the reality of things, or with so little understanding of how it works, that they do nothing but add process problems that barely address the issues they think they're solving.




The biggest issue I see here is the misguided assumption that Cloud is just automatically and universally better than on-premise or professionally managed, hosted hardware. This isn't true in most cases.

There are so many providers, and therefore examples, of physical tin being accessible within minutes with cost:hardware ratios that blow Cloud out of the sky (pun! ha!). OVH have a server for USD $95/month (with no commitments) that can be brought up and made available in 120 _seconds_, with six 3.8GHz cores, 32GB of RAM, 2x960GB NVMe SSDs, and 1Gbit/s of UNMETERED, guaranteed bandwidth... that's absolutely insane, and it's fully managed from the hardware down, so arguments like "bUT yoU haVe to MAintAin hardWARE!" are just not true _at all_.


It was during the "moving costs from capex to opex gives C-levels more flexibility" wave that followed the initial 'cloud is better' wave. In retrospect it seems like another of their badly thought-out reactions to a situation they caused with short-term thinking, in this case the problems created by trying to reduce headcount on the teams supporting legacy and new physical locations while increasing the pace of new locations.

Those costs were moved and ended up higher than the capex costs were to begin with, which everyone expected except the decision makers (they brushed it off every time they were asked in company Q&As). Opex margins became a major issue and the company did performative layoffs and restructuring to appease the shareholders (then re-hired ~1/3 of the laid-off staff within the next 8 months because they actually needed them).

The level of 'bad decision leading to bad decision' happening is somewhere between absurd and depressing at this point.


Good summary.

I think this all boils down to a knee-jerk reaction culture that doesn't think about second- or third-order consequences and/or anything beyond the next 2-3 years.


People on HN refuse to see this as an option in these discussions. It's either "cloud" or "build and manage your own physical rack inside a colo facility".


It's wild to me how hardcoded some of these people are. I think a lot of the younger generation on here might not have experienced the "bare metal days", so they don't know how far you can push the hardware and how much you can squeeze out of it.


And frankly, how easy it is.


Precisely. Operating systems aren’t hard. They’re so easy and well established it’s crazy not to use them directly, and even though I’m not the world’s biggest Docker fan, Compose is kind of awesome to be honest. Deploying software and maintaining an OS is simple in this day and age.


Having gone from managing several thousand physical instances to virtual/cloud ones, there are certainly major differences, and the company has to structure its approach accordingly (IMO).

On-premise, in my opinion, needs a dedicated team managing hardware and leveraging solutions to provide that as VMs/containers/etc. to teams. Another team focuses on OS-level security and the base image; then your dev teams can effectively focus on their app and leverage the automated tools provided by the hardware and OS teams.

Cloud gives you at least half of that, or all of it depending on your approach, for a cost. There are points where the cost makes sense and times when it doesn't, and typically that changes through the life of a company. Unfortunately there is a not-insignificant overhead, even with current tools, to maintaining a truly substrate-agnostic infrastructure that can be deployed on top of multiple clouds, on-premise, etc., so companies are locked in even when the economics change.


> On-premise, in my opinion, needs a dedicated team managing hardware and leveraging solutions to provide that as VMs/containers/etc. to teams.

You're assuming that "On-premise" equates to "inside our building, in racks we've installed, using power and networking we have to manage." You're correct if that's the case for your business, but my argument is based around the idea that you can use _managed_ hosting providers of physical hardware that'll be either next door to you, in the same city, or close to your users (e.g., you're a business in Germany but your customer base is in London, so you host the servers with a London-based provider).

The idea that you have to manage hardware is greatly diminished when you consider the availability of managed providers that are dirt cheap.


That's a good point, and at small and medium scales those are very cost-effective alternatives to cloud or fully managed. Not many managed providers can offer a full equivalent to an on-premise team, and it quickly becomes cheaper to run it yourself once you scale into large dedicated instances and high network traffic. Before that point, though, it's often better than the cloud in many situations.


> On-premise, in my opinion, needs a dedicated team managing hardware and leveraging solutions to provide that as VMs/containers/etc. to teams. Another team focuses on OS-level security and the base image; then your dev teams can effectively focus on their app and leverage the automated tools provided by the hardware and OS teams.

Exactly. At which point, you’re essentially reinventing a cloud, usually not very well. If you have access to really good people you can pull this off, and that’s why you see so many people on HN doing the “who needs cloud” flex.

But the reality is that for most companies, managing non-trivial amounts of hardware is not a core competency, and they regularly shoot themselves in the foot by trying it.


If you are in the cloud, you are going to need a team that understands cloud networking, storage, deployment, security etc. You will need enough people to maintain support rotations and survive normal churn.

It seems like many people/organizations believed that they would be rid of the whole "operations problem" once they shifted all their workloads from on-prem to cloud. They believed they were paying a full team just to run cables and replace broken fans/hard drives/PSUs, when that aspect of on-prem is a tiny (but non-zero) amount of work.


I don't believe a lot of this is required.

OS level security? So, "apt update && apt upgrade", then? I mean, what else are you doing, writing patches for the kernel? Checking every line of code that runs? Are you aware of how effective SELinux and systemd containers are? Just a simple firewall at the OS level? Maybe even just using Tailscale (or the open source Headscale) to introduce zero trust access capabilities.
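To make that concrete, here's roughly what that baseline looks like on a Debian/Ubuntu box. This is only a sketch, and the nginx unit is just a stand-in for whatever service you actually run:

    # Automatic security patching
    apt-get install -y unattended-upgrades
    dpkg-reconfigure -plow unattended-upgrades

    # Simple host firewall: default deny inbound, allow SSH + HTTPS
    ufw default deny incoming
    ufw default allow outgoing
    ufw allow 22/tcp
    ufw allow 443/tcp
    ufw enable

    # Sandbox the service with systemd (add via 'systemctl edit nginx.service'):
    #   [Service]
    #   ProtectSystem=strict
    #   ProtectHome=yes
    #   PrivateTmp=yes
    #   NoNewPrivileges=yes
    systemd-analyze security nginx.service   # scores how locked down the unit is

A regular patch cadence, a default-deny firewall, and unit-level sandboxing gets a single box surprisingly far.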

There's a Terraform provider for Proxmox, which is an excellent hypervisor. Making a template, configuration included, takes less than an hour.
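As a rough illustration of the template step, this is the usual cloud-init template recipe run on the Proxmox host itself (the VM ID, storage name and image here are placeholders):

    # Turn a stock Ubuntu cloud image into a reusable Proxmox template
    wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
    qm create 9000 --name ubuntu-2204-tmpl --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
    qm importdisk 9000 jammy-server-cloudimg-amd64.img local-lvm
    qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
    qm set 9000 --ide2 local-lvm:cloudinit --boot order=scsi0 --serial0 socket
    qm template 9000

From there the Terraform provider just clones that template on demand.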

You do need an Ops person for sure, but an entire _team_?


>"apt update && apt upgrade",

Now do that across 10k-100k+ servers, all running services, where restarts have to be orchestrated across the whole fleet while providing zero downtime or impact to thousands of clients with terabytes of data being processed and analyzed at any given time.

Sure, what's so hard about changing a tire? Well, try doing it on an 18-wheeler while it's driving down the highway, without any impact on its speed.
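Even a naive rolling pass over a fleet already needs more than "apt upgrade". Something like the sketch below is the bare minimum (the 'lbctl' drain tool, the service name and the health endpoint are all placeholders), and it still ignores canarying, dependency ordering and data in flight:

    # Naive rolling upgrade; lbctl, myapp.service and /healthz are placeholders
    for host in $(cat fleet-hosts.txt); do
        lbctl drain "$host"                    # stop routing traffic to the host
        ssh "$host" 'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y upgrade'
        ssh "$host" 'systemctl restart myapp.service'
        until ssh "$host" 'curl -fsS localhost:8080/healthz' >/dev/null; do
            sleep 5                            # wait for the service to come back up
        done
        lbctl undrain "$host"                  # put the host back in rotation
    done

Multiply that by every service, every dependency between services, and every client that notices a blip, and the "just apt upgrade" framing falls apart.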

> Are you aware of how effective SELinux and systemd containers are? Just a simple firewall at the OS level?

They're part of a layered, defense-in-depth system, but one that introduces complexity.

> Maybe even just using Tailscale (or the open source Headscale) to introduce zero trust access capabilities.

Tailscale in an enterprise production environment? It's not going to pass any sort of security audit, and it probably violates a number of certifications customers require at the enterprise level for network access controls, visibility and auditing.

Just managing the git/jenkins/spinnaker/terraform infrastructure in dozens of locations deploying to and maintaining tens of thousands of servers/pods requires a 24x7 team on top of the hundreds of teams and tens of thousands of devs using it.

If you're small enough that none of that makes sense, then you might be small enough that one Ops person can handle the load (one is never enough if you're smart, but...), though at that point you're dealing with a very small amount of infrastructure and services.


> Across 10k-100k+ servers

If you "need" that many servers (and aren't Google), you've built your systems massively wrong.


Absolutely.

My issue is really on the other end of that scale: getting C-suites to recognize when owning that core competency is actually beneficial to the company, even if it's not the focus of the company.

I grew up around companies leveraging vertical integration at the right scale to improve costs; seeing companies go the opposite direction, trading all those advantages for benefits that often never materialize, is... frustrating.


I’d ask, “have we worked together?” since this is a spot-on description of my former employer, except it’s probably a spot-on description of thousands of mid-sized companies.


Same! Some execs get excited about reducing capital expenses for a data center and the teams that manage it. Some CTO gets excited about the flexibility and some legitimate benefits of cloud.

But it ends up costing a shitton of money to switch paradigms completely, and they don't switch paradigms completely for a number of years: If you're just migrating servers to ec2/vpc, you're doing cloud wrong.

Of course, there is the idea of being cloud agnostic, or even multi-region, which seems to be a challenge for most places.

At least with Terraform, it's theoretically easier to swing configurations over to a different host.
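It helps if the provider-specific wiring is kept thin; a hypothetical layout (directory and module names made up):

    # One provider-agnostic module, thin per-cloud roots
    #   infra/modules/app/    <- instance count, size, image exposed as variables
    #   infra/aws/main.tf     <- maps those variables onto AWS resources
    #   infra/hetzner/main.tf <- same variables, different provider
    # Swinging to the other host is then mostly:
    cd infra/hetzner && terraform init && terraform apply

In practice the module internals still leak provider details, which is where most of the "theoretically" lives.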


At many places I've worked, there are essentially zero checks-and-balances between "Exec gets randomly excited about X" and "X becomes a mandate, with staffing, budget, and deadline." No technical vetting, feedback loop, sometimes no apparent coordination with other execs (and their random ideas). It's just: "Mike is excited about Cloud. -> We are now doing Cloud." Later, Mike gets excited about something else, and the entire team moves over to something else. "Mike is excited about AI. -> We are now doing AI."


...but the salesperson promised it would be easy, fast and low-cost! </sarcasm>


I would wager it’s not uncommon.

But also, the execs are the ones making the business risk decisions. Just make sure they have the correct info to make those decisions; then your responsibility is done.


I doubt responsibility is a concern, GP just doesn’t enjoy being a part of the shit show


And my core point is that most companies are shit shows. Employees know what bullet points they should have to minimize downside risk, but struggle with how to get those done while also minimizing upside risk.

In a world of scarcity, just keep communicating the tech debt. Maybe occasionally propose a project to address it.


Some people actually want to spend their time contributing to something meaningful. It also sounds like OP is worried executive incompetence might affect his job security.


Yeah, I inferred that from their post.

My point is that even “something meaningful” comes with tech debt. It’s like that at my current place.

Too many people get “grass is greener” syndrome and think that there is some magical company somewhere which gives everyone plenty of time to refactor everything and fix all of the tech debt and execs make fantastic business risk decisions which always benefit the employee. In a world of scarcity, that practically never happens.

Just weigh your options in the market. If it’s worth staying where you are, just realize that the employee is not responsible for making business risk decisions, only responsible for sufficiently informing those who do of the facts.


You're still assuming that OP, or anyone else, has the same values as you. As I said, some people want to work on something meaningful, or see the writing on the wall and want to increase their job security. It's not about the grass being greener. And sometimes, it is greener, and the only way you find out is by trying something new.


The secret is to tie the tech debt to something that the business wants. If that can’t be done then you have to wonder how important it really is to address the debt.


I'm not sure if it's a secret, but it's certainly one of the most practical ways to address technical debt.

Unfortunately we're at the stage where they outright ignore what they're told, and then blame engineers for not being able to do what they said they couldn't do from the start. They refuse to acknowledge their role in creating the tech debt in the first place through poor planning and wishful but impractical timelines, so convincing them we need to tackle any part of it is a struggle, short of letting things degrade to the point where a real customer with significant money on the line is upset enough by the state of things that it finally gets addressed.

Which ultimately means we're at the horribly dysfunctional stage of management/company growth; the question is whether it continues to get worse, or whether the CEO eventually learns, takes a serious look at the effectiveness of the VP level, and makes changes...



