I suppose that's true in one sense - in that I'm using EKS heavily, and don't maintain cluster health myself (other than all the creative ways I find to fuck up a node). And perhaps in another sense: it'll try its hardest to run some containers no matter how many times I make it OOMkill itself.
Buttttttttt Kubernetes is almost pure maintenance in reality. Don't get me wrong, it's amazing to just submit some yaml and get my software out into the world. But the trade-off is pure maintenance.
The workflows to set up a cluster, decide which chicken-and-egg trade-off you want to get ArgoCD running, and register other clusters if you're doing a hub-and-spoke model ... are just, like, one single act in the circus.
Then there's installing all the operators of choice from https://landscape.cncf.io/. I mean that page is a meme, but how many of us run k8s clusters without at least 30 pods running "ancillary" tooling? (Is "ancillary" the right word? It's stuff we need, but it's not our primary workloads).
A repeat circus is spending hours figuring out just the right values.yaml (or, more likely, hours templating it, since we're ArgoCD'ing it all, right?)
> As an aside, I once spent HOURS figuring out how to (incorrectly) pass boolean values around from a Secrets Manager secret, to a k8s Secret - via External Secrets, another operator! - to an ArgoCD ApplicationSet definition, to another values.yaml file.
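For anyone curious what that chain looks like, here's a minimal sketch of the External Secrets half (store, secret, and key names are all hypothetical). The gotcha is that the value lands in the k8s Secret as the string "true", and every later hop - the ApplicationSet template, the values.yaml render - has to agree on whether that stays a string or becomes a boolean.

```yaml
# Hypothetical ExternalSecret pulling a feature flag out of AWS Secrets Manager.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: feature-flags
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager    # hypothetical ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: feature-flags          # the resulting k8s Secret
  data:
    - secretKey: enableFeature
      remoteRef:
        key: prod/feature-flags  # hypothetical Secrets Manager secret
        property: enableFeature  # arrives as the string "true", not a boolean
```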
And then you have to operationalize updating your clusters - and all the operators you installed and painstakingly configured. Given the pace of releases, this is, literally, pure maintenance that is always present.
Finally, if you're autoscaling (Karpenter in our case), there's a whole other act in the circus (wait, am I still using that analogy?) of replacing your nodes "often" without downtime, which gets fun in a myriad of interesting ways (running apps with state is fun in kubernetes!)
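One guardrail worth mentioning for that node-replacement act: Karpenter (like any drain-based tooling) is supposed to honor PodDisruptionBudgets, so a small PDB per workload goes a long way toward "often without downtime". A minimal sketch with hypothetical names:

```yaml
# Keep at least 2 replicas of the app running while nodes are drained and replaced.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app   # hypothetical label on the workload's pods
```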
So anyway, there's my rant. Low fucking maintenance!
"Low Maintenance" is relative to alternatives.
In my experience, any time I was dealing with K8s, I needed much less maintenance to get the same quality of service (everything from [auto]scaling to failover, deployment, rollback, disaster recovery, DevOps, and ease of spinning up a completely independent cluster) compared to not using it. YMMV.
It's yet another argument, like many arguments against Kubernetes, that essentially boils down to "Kubernetes is complicated!"
No, deployment of a distributed system itself is complicated, regardless of the platform you deploy it on. Kubernetes is only "complicated" because it can do all of the things you need to do to deploy software, in a standard way. You can simplify by not using Kube, but then you have to hand roll all of the capabilities that Kube just gives you for free. If you don't need a good hunk of those capabilities, you probably don't need Kube.
I think the reason why people (correctly) point out that Kubernetes is complicated is because most people do not need a distributed system. People reach for k8s because it's trendy, but in truth a lot of users would be better off with a VM that gets configured with Chef/etc and just runs your software the old fashioned way.
K8s starts to make sense when you want to provide a common platform for a multitude of application developers to work on. Once you understand that it was born from Google's Borg and what problems they were trying to solve with both, the complexity behind it makes a lot more sense.
Most people actually do need a distributed system.
They want their system to be resilient to hardware failures. So when the server inevitably goes down some day, they want their website to continue to work. Very few people want their website to go down.
They want their system to scale. So when a sudden surge of popularity hits the load balancer, they want their website to continue to work.
In the past, the price to run a distributed system was too high, so most people accepted the downsides of running a simple system.
Nowadays the price to run a distributed system is so low that it makes little sense to avoid one anymore, for almost any website, if you can afford more than $50/month.
Very well put. I'd add to that: with an end-to-end solution, you need to learn things before you deploy your system (which still doesn't mean you do everything properly), but without one, it's possible to deploy the system and only learn what can go wrong when it actually happens. Now if you ask someone who has a half-baked distributed system, they still don't know all the failure modes of their system. I've seen this in mission-critical systems.
But in a company that had properly reliable infrastructure, any system that moved to the new infra based on K8s needed much less maintenance, had much more standardized DevOps (which allowed people from other teams to chime in when needed), and had far fewer mistakes. There was no disagreement that K8s streamlined everything.
I really want NixOS to succeed in becoming easy to configure, operate, and onboard. I would like to avoid any other configuration management tool in my life again. That way, you can have simple VMs and k8s for distributed use cases, both declared in code.
Exactly, there are a lot of comparisons that aren't apples to apples. If you're comparing Kubernetes to a fixed-size pool of resources running a fixed set of applications, each with their own fixed resources, who cares? That's not how most deployments work today.
One could make the argument that deployments which necessitate K8s are themselves too complex - I think there's a more convincing argument there - but my previous company was extremely risk-averse in architecture (no resumé-driven development) and eventually moved to K8s anyway. Systems at my current company often end up being way simpler than anyone would expect, but at scale, the coordination without a K8s equivalent would be too much work.
When choosing distributed systems platforms to work with, k8s vs. rolling your own orchestration isn’t the decision anyone is making. It’s k8s vs cloud vendors that want your money in exchange for the headaches.
Honestly, running your own control plane is not that much harder than using something like EKS or GKE. The real complexity the grandparent was talking about is all the tweaking and configuration you have to do outside of the control plane, e.g. the infrastructure and deployments you're building on top of Kubernetes and all of the associated configuration around that. In other words, whether you use EKS or hand-roll your own kube, you still have to solve node autoscaling. Load balancing. Metrics/observability. DNS and networking. Ingress. Etc. etc. etc.
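Ingress is a decent illustration of that layer: the object itself is standard, but the controller and TLS machinery behind it are things you pick, install, and keep upgrading. A sketch, assuming an ingress-nginx controller and cert-manager are installed, with a hypothetical hostname:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # assumes a cert-manager ClusterIssuer with this name
spec:
  ingressClassName: nginx                         # assumes an ingress-nginx controller
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
  tls:
    - hosts:
        - app.example.com
      secretName: web-tls
```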
I’ve been running k3s on hetzner for over 2 years now with 100% uptime.
In fact, it was so low maintenance that I lost my SSH key for the master node and I had to reprovision the entire cluster. Took about 90 mins including the time spent updating my docs. If it was critical I could have got that down to 15 mins tops.
€20/mo for a k8s cluster using k3s, exclusively on ARM: 3 worker nodes, 1 master, some storage, and a load balancer with automatic DNS on Cloudflare.
People talking about the maintenance pain of running Kubernetes have actual customers and are not running their whole infrastructure on 20€/mo.
Anecdotes like these are not helpful.
I have thousands of services I'm running on ~10^5 hosts and all kinds of compliance and contractual requirements to how I maintain my systems. Maintenance pain is a very real table-stakes conversation for people like us.
How often do you perform version upgrades? Patching of the operating system of the nodes or control plane, etc.? Things quickly get complex if application uptime/availability is critical.
But you are not talking about maintaining Kubernetes, you are talking about maintaining a CI/CD system, a secret management system, some automation to operate databases, and so on.
Instead of editing some YAML files, in the "old" days these software vendors would've asked you to maintain a cronjob, Ansible playbooks, a systemd unit, bash scripts...
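The mapping is often close to one-to-one; the crontab entry, for instance, becomes a CronJob manifest. A sketch with a hypothetical image and schedule:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup                # hypothetical
spec:
  schedule: "0 3 * * *"                # 03:00 every day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: registry.example.com/cleanup:latest   # hypothetical image
              args: ["--prune"]
```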
Yeah, they are basically DIY-ing their own "cloud" in a way, which is what Kubernetes was designed for.
It's indeed a lot of maintenance to run things this way. You're no longer just operationalizing your own code; you're also operating (as you mentioned) CI/CD, secret management, logging, analytics, storage, databases, cron tasks, message brokers, etc. You're doing everything.
On one hand (if you're not doing anything super esoteric or super cloud-specific), migrating Kubernetes-based deployments between clouds has always been super easy for me. I'm currently managing a k3s cluster that's running a nodepool on AWS and a nodepool on Azure.
I’m a little confused by the first paragraph of this comment. Kubernetes wasn’t designed to be an end-to-end solution for everything needed to support a full production distributed stack. It manages a lot of tasks, to be sure, but it doesn’t orchestrate everything that you mentioned in the second paragraph.
> Kubernetes wasn’t designed to be an end-to-end solution for everything needed to support a full production distributed stack.
I'll admit I know very little about the history of Kubernetes before ~2017, BUT 2017-to-present Kubernetes is absolutely designed for, and capable of, being your end-to-end solution for everything.
The idea is to wrap cloud provider resources in CRDs. So instead of creating an AWS ELB or an Azure SLB, you create a Kubernetes Service of type LoadBalancer. Then Kubernetes is extensible enough for each cloud provider to swap in what "Service of type LoadBalancer" means for them.
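Concretely, the manifest stays the same across providers; only the annotations (and what the cloud controller does with them) change. A sketch, with one AWS-specific annotation as an example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                                                 # hypothetical
  annotations:
    # Provider-specific behaviour hides behind annotations, e.g. on AWS:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```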
For higher-abstraction services (SaaS-like ones mentioned above) the idea is similar. Instead of creating an S3 bucket, or an Azure Storage Account, you provision CubeFS on your cluster (so now you have your own S3 service), then you create a CubeFS bucket.
You can replace all the services listed above, with free and open source (under a foundation) alternatives. As long as you can satisfy the requirements of CubeFS, you can have your own S3 service.
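In practice the swap often comes down to an endpoint: the application speaks S3 either way, and configuration decides whether that's AWS or something running in your cluster. A sketch with hypothetical names:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: object-store-config
data:
  S3_BUCKET: "app-assets"
  # Hypothetical in-cluster S3-compatible gateway (e.g. a CubeFS object service)...
  S3_ENDPOINT: "http://object-gateway.storage.svc"
  # ...or the same application pointed straight at AWS:
  # S3_ENDPOINT: "https://s3.eu-central-1.amazonaws.com"
```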
Of course you're now maintaining the equivalent of github, circleci, S3, ....
Kubernetes gives you a unified way of deploying all these things regardless of the cloud provider. Your target is Kubernetes, not AWS, Microsoft or Google.
The main benefit (to me) is that with Kubernetes you get to choose where YOU want to draw the line of lock-in vs. value. We all have different judgements, after all.
Do you see no value in running and managing Kafka? Maybe SQS is simple enough and cheap enough that you just use it. Replacing it with a compatible endpoint is cheap.
Are you terrified of building your entire event based application on top of SQS and Lambda? How about Kafka and ArkFlow?
Now you obviously trade one risk for another. You're trading away the risk of vendor lock-in with AWS, but just because ArkFlow is free and open source doesn't mean that it'll be as well maintained in 8 years as AWS Lambda is going to be. Maybe, maybe not. You might have to migrate to another service.
> Of course you're now maintaining the equivalent of github, circleci, S3, ....
On this we agree. That's a nontrivial amount of undifferentiated heavy lifting--and none is a core feature of K8S. You are absolutely right that you can use K8S CRDs to use K8S as the control plane and reduce the number of idioms you have to think about, but the dirty details are in the data plane.
Yeah, but you significantly increase your chances of getting the data plane working if you are always using the same control plane. The control plane is setting up an S3 bucket for you. That bucket could come from AWS, CubeFS, Backblaze - you don't care. S3 is a simple protocol, but the same goes for more complex ones.
> and none is a core feature of K8S
The core feature of k8s is "container orchestration", which is extremely broad: you can run whatever you can orchestrate as containers, which is everything. The other core feature is extensibility and abstraction. So to me, CRDs are as core to Kubernetes as anything else. They are such a simple concept that custom vs. built-in is sometimes only a matter of availability and quality.
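And a CRD really is a small thing to define - the extension point itself is a few dozen lines of schema; the operator watching it is where the actual work (and maintenance) lives. A minimal sketch with a hypothetical group and kind:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: buckets.example.com        # hypothetical group and kind
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: buckets
    singular: bucket
    kind: Bucket
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                sizeGiB:
                  type: integer
```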
> That's a nontrivial amount of undifferentiated heavy lifting
Yes it is. Like I said, the benefit of Kubernetes is that it gives you the choice of where you want to draw that line. Running and maintaining GitHub, CircleCI, and S3 equivalents is a "nontrivial amount of undifferentiated heavy lifting" to you. The equation might be different for another business or organization. There is a popular "anti-corporation, pro big-government" sentiment on the internet today, right? Would it make sense for, say, an organization like the EU to take a hard dependency on GitHub or CircleCI? Or should they contract OVH and run their own GitHub and CircleCI instances?
People always complain about vendor lock-in, closed-source services, bait-and-switch with services, etc. With Kubernetes, you get to choose what your anxieties are and manage them yourself.
> You significantly increase your chances of getting the data plane working if you are always using the same control plane.
That is 100% not true and why different foundational services have (often vastly) different control planes. The Kubernetes control plane is very good for a lot of things, but not everything.
> People always complain about vendor lock-in, closed-source services, bait-and-switch with services, etc. With Kubernetes, you get to choose what your anxieties are and manage them yourself.
There is no such thing as zero switching costs (even if you are 100% on premise). Using Kubernetes can help reduce some of it, but you can't take a mature complicated stack running on AWS in EKS and port it to AKS or GKE or vice versa without a significant amount of effort.
Well, you know, we went from not knowing that Kubernetes can orchestrate everything to arguing about "k8s best practices" for portability, so there is room for progress.
The reality is, yes, nothing is zero switching cost. There are plenty of best practices for how to utilize k8s for least-headache migrations. It's very doable, and I see it done all the time.
It all depends on what you are doing. If you just want to run simple webservers, then it's certainly lower maintenance than having a fleet of individually named pet servers (Simpsons characters and all) to run them.
The trouble is you then start doing more. You start going way beyond what you were doing before. Like you ditch RDS and just run your DBs in-cluster. You stop checking your pipelines manually because you've implemented auto-scaling, etc.
It's not free, nobody ever said it was, but could you do all the stuff you mentioned on another system with a lower maintenance burden? I doubt it.
What it boils down to is that running services still involves maintenance, but it's hopefully lower than before, and much of the burden is amortized across many services.
But you definitely need to keep an eye on things. Don't implement auto-scaling unless you're spending a lot of your time manually scaling. Otherwise you've now got something new to maintain without any payoff.
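For reference, the pod-level version is only a few lines (a sketch with hypothetical names, assuming metrics-server is installed); the maintenance isn't in writing it, it's in tuning and watching it:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                        # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```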
The fundamental tension is that there is real complexity to running services in production that you can't avoid or hide with a better abstraction. So your options are to deal with the complexity (the k8s case) or pay someone else to do it (one of the many serverless "just upload your code" platforms).
You can hide the ops person, but you can't remove them from the equation, which is what people seem to want.