All these "maybe you don't need this or that X" posts die in an instant when the user already knows how to do X (when the learning curve argument is gone).
Let's get it right:
Kubernetes is really really cheap. I can run 20 low volume apps in a kubes cluster with a single VM. This is cheaper than any other hosting solution in the cloud if you want the same level of stability and isolation. It's even cheaper when you need something like a Redis cache. If my cache goes down and the container needs to be spun up again then it's not a big issue, so for cheap projects I can save even more cost by running some infra like a Redis instance as a container too. Nothing beats that. It gets even better: I can run my services in different namespaces, and have different environments (dev/staging/etc.) isolated from each other and still running on the same amount of VMs. When you calculate the total cost saving here compared to traditional deployments it's just ridiculously cheap.
Kubernetes makes deployments really easy. docker build + kubectl apply. That's literally it. Deployments are two commands and it's live, running in the cloud. It's elastic, it can scale, etc.
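In practice that looks something like this (a sketch; the image name, registry, and manifest directory are placeholders):

    docker build -t registry.example.com/myapp:v2 .   # build the image
    docker push registry.example.com/myapp:v2         # push it somewhere the cluster can pull from
    kubectl apply -f k8s/                              # apply the manifests describing the app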
Kubernetes requires very little maintenance. Kubernetes takes care of itself. A container crashes? Kubes will bring it up. Do I want to roll out a new version? Kubes will do a rolling update on its own. I am running apps in kubes and for almost 2 years I haven't looked at my cluster or vms. They just run. Once every 6 months I log into my console and see that I can upgrade a few nodes. I just click ok and everything happens automatically with zero downtime.
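The rolling update really is just the default Deployment behaviour; rolling a new version out (and backing it out) is roughly this, with hypothetical names:

    kubectl set image deployment/myapp myapp=registry.example.com/myapp:v2
    kubectl rollout status deployment/myapp    # watch the rollout complete
    kubectl rollout undo deployment/myapp      # roll back if it goes sideways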
I mean yes, theoretically nothing needs Kubernetes, because the internet was the same before we had Kubernetes, so it's certainly not needed, but it makes life a lot easier. Especially as a cheap lazy developer who doesn't want to spend time on any ops Kubernetes is really the best option out there next to serverless.
If learning Kubernetes is the reason why it's "not needed" then nothing is needed. Why use a new programming language? Why use a new db technology? Why use anything except HTML 4 + PHP, right?
BTW, learning Kubernetes can be done in a few days.
All of this glosses over the biggest issue with Kubernetes: it's still ridiculously complex, and troubleshooting issues that arise (and they will arise) can leave you struggling for days poring over docs, code, GitHub issues, stackoverflow... All of the positives you listed rely on super complex abstractions that can easily blow up without a clear answer as to "why".
Compared to something like scp and restarting services, I would personally not pay the Kubernetes tax unless I absolutely had to.
Exactly. A year or so ago I thought, hey, maybe I should redo my personal infrastructure using Kubernetes. Long story short, it was way too much of a pain in the ass.
As background, I've done time as a professional sysadmin. My current infrastructure is all Chef-based, with maybe a dozen custom cookbooks. But Chef felt kinda heavy and clunky, and the many VMs I had definitely seemed heavy compared with containerization. I thought switching to Kubernetes would be pretty straightforward.
Surprise! It was not. I moved the least complex thing I run, my home lighting daemon, to it; it's stateless and nothing connects to it, but it was still a struggle to get it up and running. Then I tried adding more stateful services and got bogged down in bugs, mysteries, and Kubernetes complexity. I set it aside, thinking I'd come back to it later when I had more time. That time never quite arrived, and a month or so ago my home lights stopped working. Why? I couldn't tell: a bunch of internal Kubernetes certificates had expired, so none of the commands worked. In the end I just copy-pasted stuff out of Stack Overflow and randomly rebooted things until it started working again.
I'll happily look at it again when I have to do serious volume and can afford somebody to focus full-time on Kubernetes. But for anything small or casual, I'll be looking elsewhere.
At work we're building an entire service platform on top of managed kubernetes services, agnostic to cloud provider. We had already had bad experiences running K8s ourselves.
Going into it we knew how much of a PITA it would be but we vastly underestimated how much, IMO.
Written 18 years ago, so obviously not about Kubernagus, but it does explain the same phenomenon. Replace Microsoft with cloud providers and that's more or less the same argument.
> Long story short, it was way too much of a pain in the ass.
Kubernetes has a model for how your infrastructure and services should behave. If you stray outside that model, then you'll be fighting k8s the entire way and it will be painful.
If however you design your services and infrastructure to be within that model, then k8s simplifies many things (related to deployment).
The biggest issue I have with k8s as a developer is that while it simplifies the devops side of things, it complicates the development/testing cycle by adding an extra layer of complication when things go wrong.
I run my home automation and infrastructure on kubernetes, and for me that is one of the smoothest ways of doing it. I find it quite easy to deal with, and much prefer it to the “classic” way of doing it.
what dark magic are you using? Not joking. I've tried learning kubernetes several times and gave up. Maybe I'm not the smartest. Can you point to guides that helped you get up and running smoothly? This is probably something I should put some more effort into in the coming months.
I think this is really hard, it's a bit like how we talk about learning Rails in the Ruby community. "Don't do it"
Not because it's bad or especially hard, but because there's so much to unpack, and it's so tempting to unpack it all at once, and there's so much foundational stuff (Ruby language) which you really ought to learn before you try to analyze in detail exactly how the system is built up.
I learned Kubernetes around v1.5 just before RBAC was enabled by default, and I resisted upgrading past 1.6 for a good long while (until about v1.12) because it was a feature I didn't need, and all the features after it appeared to be something else which I didn't need.
I used Deis Workflow as my on-ramp to Kubernetes, and now I am a maintainer of the follow-on fork, which is a platform that made great sense to me, as I was a Deis v1 PaaS user before it was rewritten on top of Kubernetes.
Since Deis left Workflow behind after they were acquired by Microsoft, I've been on Team Hephy, which is a group of volunteers that maintains the fork of Deis Workflow.
This was my on-ramp, and it looks very much like it did in 2017, but now we are adding support for Kubernetes v1.16+ which has stabilized many of the main APIs.
If you have a way to start a Kubernetes 1.15 or earlier cluster, I can recommend this as something to try[1]. The biggest hurdle of "how do I get my app online" is basically taken care of for you. Then once you have an app running in a cluster, you can start to learn about the cluster, and practice understanding the different failure modes as well as how to proceed with development in your new life as a cluster admin.
If you'd rather not take on the heavyweight burden of maintaining a Workflow cluster and all of its components right out of the gate (and who could blame you) I would recommend you try Draft[2], the lightweight successor created by Deis/Azure to try to fill the void left behind.
Both solutions are based on a concept of buildpacks, though Hephy uses a combination of Dockerfile or Heroku Buildpacks and by comparison, Draft has its own notion of a "Draftpack" which is basically a minimalistic Dockerfile tailored for whatever language or framework you are developing with.
I'm interested to hear if there are other responses, these are not really guides so much as "on-ramps" or training wheels, but I consider myself at least marginally competent, and this is how I got started myself.
Moreover, if you are keeping pace with kubeadm upgrades at all (minor releases are quarterly, and patches are more frequent) then since the most recent minor release, Kubernetes 1.17, certificate renewal as an automated part of the upgrade process is enabled by default. You would have to do at least one cluster upgrade per year to avoid expired certs. tl;dr: this cert expiration thing isn't a problem anymore, but you do have to maintain your clusters.
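For reference, the manual route is roughly this (on recent kubeadm releases; on older ones these subcommands live under "kubeadm alpha certs"):

    kubeadm certs check-expiration   # see which certs are close to expiring
    kubeadm certs renew all          # renew them without doing a full upgrade
    # the control plane components then need a restart to pick up the new certs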
(Unless you are using a managed k8s service, that is...)
The fact remains also that this is the very first entry under "Administration with Kubeadm", so if you did use kubeadm and didn't find it, I'm going to have to guess that either docs have improved since your experience, or you really weren't looking to administrate anything at all.
I appreciate the links, but for my home stuff I'll be ripping Kubernetes out.
The notion that one has to keep pace with Kubernetes upgrades is exactly the kind of thing that works fine if you have a full-time professional on the job, and very poorly if it's a sideline for people trying to get actual productive work done.
Which is fine; not everything has to scale down. But it very strongly suggests that there's a minimum scale at which Kubernetes makes sense.
Or, that there is a minimum scale/experience gradient behind which you are better served by a decent managed Kubernetes, when you're not prepared to manage it yourself. Most cloud providers have done a fairly good job to make it affordable.
I think it's fair to say that the landscape of Kubernetes proper itself (the open source package) has already reached a more evolved state than the landscape of managed Kubernetes service providers, and that's potentially problematic, especially for newcomers. It's hard enough to pick between the myriad choices available; harder still when you must justify your choice to a hostile collaborator who doesn't agree with part or all.
IMO, the people who complain the loudest about the learning curve of Kubernetes are those who have spent a decade or more learning how to administer one or more distributions of Linux servers, who have made the transition from SysV init to systemd, and who in many cases are now neck deep in highly specialized AWS services. In many cases they have used those services successfully to extricate themselves from the nightmare-scape where one team called "System Admins" is responsible for broadly everything that runs or can run on any Linux server (or otherwise), from databases, to vendor applications, to monitoring systems, new service dev, platforming apps that were developed in-house, you name it...
I basically don't agree that there is a minimum scale for Kubernetes, and I'll assert confidently that declarative system state management is a good technology that is here to stay. But I respect your choice, and I understand that not everyone shares the unique experiences that led me to be comfortable using Kubernetes for everything from personal hobby projects to my own underground skunkworks at work.
In fact it's a broadly interesting area of study for me, "how do devs/admins/people at large get into k8s", since the learning curve is so steep and this has all happened so fast. There is so much to unpack before you can start to feel comfortable that there isn't really that much more complexity buried behind what you have already deeply explored and understood.
It sounds like we both agree there's a minimum scale for running your own Kubernetes setup, or you wouldn't be recommending managed Kubernetes.
But a managed Kubernetes approach only makes sense if you want all your stuff to run in that vendor's context. As I said, I started with home and personal projects. I'd be a fool to put my home lighting infrastructure or my other in-home services in somebody's cloud. And a number of my personal projects make better economic sense running on hardware I own. If there's a managed Kubernetes setup that will manage my various NUCs and my colocated physical server, I'm not aware of it.
> there's a minimum scale for running your own Kubernetes setup
I would say there is a minimum scale that makes sense, for control plane ownership, yes. Barring other strong reasons that you might opt to own and manage your own control plane like "it's for my home automation which should absolutely continue to function if the internet is down"...
I will concede you don't need K8s for this use case. Even if you like containers and want to use containers but don't have much prior experience with K8s, then from a starting position of "no knowledge" you will probably have a better time with compose and swarm. There is a lot for a newcomer to learn about K8s, but the more of it you have already learned, the less likely I would be to recommend swarm, or any other control plane (or anything else.)
This is where the point I made earlier, that the managed k8s ecosystem is not as evolved as it will likely soon become, feels relevant. You may be right that no managed Kubernetes setups will handle your physical servers today, but I think the truth is somewhere between: they're coming / they're already here but most are not quite ready for production / they are here, but I don't know what to recommend strongly.
I'm leaning toward the latter (I think that if you wanted a good managed bare metal K8s, you could definitely find it.) I know some solutions that will manage bare metal nodes, but this is not a space I'm intimately familiar with.
The solutions that I do know of are at an early enough state of development that I hesitate to mention them. It won't be long before this gets much better. The bare metal Cluster API provider is really something, and there are some really amazing solutions being built on top of it. If you want to know where I think this is going, check this out:
WKS and the "firekube" demo, a GitOps approach to managing your cluster (yes, even for bare metal nodes)
I personally don't use this yet, I run kubeadm on a single bare metal node and don't worry about scaling, or the state of the host system, or if it should become corrupted by sysadmin error, or much else really. The abstraction of the Kubernetes API is extremely convenient when you don't have to learn it from scratch anymore, and doubly so if you don't have to worry about managing your cluster. One way to make sure you don't have to worry, is to practice disaster recovery until you get really good at it.
If my workloads are containerized, then I will have them in a git repo, and they are disposable (and I can be sure, as they are regularly disposed of, as part of the lifecycle). Make tearing your cluster down and standing it back up a regular part of your maintenance cycles until you're ready to do it in an emergency situation with people watching. It's much easier than it sounds, and it's definitely easier than debugging configuration issues to start over again.
The alternative that I would recommend for production right now, if you don't like any managed kubernetes, is to become familiar with the kubeadm manual. It's probably quicker to read it and study for CKA than it would be to canvas the entire landscape of managed providers for the right one.
I'm sure it was painful debugging that certificate issue; I have run up against that particular issue before myself. It was after a full year or more of never upgrading my cluster (shame on me). I had refused to learn RBAC, kept my version pinned at 1.5.2, and at some point after running "kubeadm init" and "kubeadm reset" over and over again it became stable enough (I stopped breaking it) that I didn't need to tear it down anymore, for a whole year.
And then a year later certs expired, and I could no longer issue any commands or queries to the control plane, just like yours.
Once I realized what was happening, I tried to renew the certs for a few minutes, I honestly didn't know enough to look up the certificate renewal docs, I couldn't figure out how to do it on my own... I still haven't read all the kubeadm docs. But I knew I had practiced disaster recovery well over a dozen times, and I could repeat the workloads on a new cluster with barely any effort (and I'd wind up with new certs.) So I blew the configuration away and started the cluster over (kubeadm reset), reinstalled the workloads, and was back in business less than 30 minutes later.
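The whole loop is less dramatic than it sounds; roughly this (a sketch, with placeholder file names, and a pod network CIDR that depends on which CNI you use):

    kubeadm reset -f                                  # tear the old control plane down
    kubeadm init --pod-network-cidr=10.244.0.0/16     # stand a fresh one up
    export KUBECONFIG=/etc/kubernetes/admin.conf
    kubectl apply -f cni.yaml                         # reinstall the network plugin
    kubectl apply -f manifests/                       # reinstall the workloads from git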
I don't know how I could convince you that it's worth your time to do this, and that's OK (it's not important to me, and if I'm right, in 6 months to a year it won't even really matter anymore, you won't need it.) WKS looks really promising, though admittedly still bleeding edge right now. But as it improves and stabilizes, I will likely use this instead, and soon after that forget everything I ever knew about building kubeadm clusters by hand.
Kubernetes, once you know it, is significantly easier than cobbling together an environment from "classical" solutions that combine Puppet/Chef/Ansible, homegrown shell scripts, static VMs, and SSH.
Sure, you can bring up a single VM with those technologies and be up and running quickly. But a real production environment will need automatic scaling (both of processes and nodes), CPU/memory limits, rolling app/infra upgrades, distributed log collection and monitoring, resilience to node failure, load balancing, stateful services (e.g. a database; anything that stores its state on disk and can't use a distributed file system), etc., and you end up building a very, very poor man's Kubernetes dealing with all of the above.
With Kubernetes, all of the work has been done, and you only need to deal with high-level primitives. "Nodes" become an abstraction. You just specify what should run, and the cluster takes care of it.
I've been there, many times. I ran stuff the "classical" Unix way -- successfully, but painfully -- for about 15 years and I'm not going back there.
There are alternatives, of course. Terraform and CloudFormation and things like that. There's Nomad. You can even cobble together something with Docker. But those solutions all require a lot more custom glue from the ops team than Kubernetes.
The majority of what you posted reiterates the post I responded to, and it doesn't address the complexity of those features or their implementation. Additionally, I challenge your assertion that "real production environments" need automatic scaling.
You missed my point. I was contrasting Kubernetes with the alternative: Critics often highlight Kubernetes' complexity, forgetting/ignoring that replicating its functionality is also complex and often not composable or transferable to new projects/clusters. It's hard to design a good, flexible Puppet (or whatever) configuration that grows with a company, can be maintained across teams, handles redundancy, and all of those other things.
Not all environments need automatic scaling, but they need redundancy, and from a Kubernetes perspective those are two sides of the same coin. A classical setup that automatically allows a new node to start up to take over from a dysfunctional/dead one isn't trivial.
Much of Kubernetes' operational complexity also melts away if you choose a managed cloud such as Digital Ocean, Azure, or Google Cloud Platform. I can speak from experience, as I've both set up Kubernetes from scratch on AWS (fun challenge, wouldn't want to do it often) and I am also administering several clusters on Google Cloud.
The latter requires almost no classical "system administration". Most of the concerns are "hoisted" up to the Kubernetes layer. If something is wrong, it's almost never related to a node or hardware; it's all pod orchestration and application configuration, with some occasional bits relating to DNS, load balancing, and persistent disks.
And if I start a new project I can just boot up a cluster (literally a single command) and have my operational platform ready to serve apps, much like the "one click deploy" promise of, say, Heroku or Zeit, except I have almost complete control of the platform.
In my opinion, Kubernetes beats everything else even on a single node.
Maybe, but the point with containers and Kubernetes is to treat them like cattle, not pets.
If something blows up or dies, then with Kubernetes it's often faster to just tear down the entire namespace and bring it up again. If the entire cluster is dead, then just spin up a new cluster and run your yaml files on it and kill your old cluster.
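Assuming everything for an app lives in one namespace and one directory of YAML in git, that's about three commands:

    kubectl delete namespace myapp      # everything in the namespace goes with it
    kubectl create namespace myapp
    kubectl apply -n myapp -f manifests/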
Treat it like cattle, when it doesn't serve your purpose anymore then shoot it.
This is one of the biggest advantages of Kubes, but often overlooked because traditional Ops people keep treating infrastructure like a pet.
The only thing you should treat like a pet is your persistence layer, which is presumably outside Kubes, something like DynamoDb, Firestore, CosmosDb, SQL Server, whatever.
This is not good engineering. If somebody told me this at a business, I’d not trust them anymore with my infrastructure.
So, you say that problems happen, and you consciously don't want to know about or solve them. A recurring problem, in your view, is solved by constantly rebuilding new K8s clusters and your whole infrastructure in them, every time!?!
Simple example - a microservice that leaks memory... let it keep restarting as it crashes?!
I remember at one of my first jobs, at a healthcare system for a hospital in India, their Java app was so poorly written that it kept leaking memory, bloated beyond what GC could help with, and would crash every morning at around 11 AM and then again at around 3 PM. The end users - doctors, nurses, pharmacists - knew about this behavior and took breaks during that time. Absolute bullshit engineering! Shame on those that wrote that shitty code, and shame on whoever is reckless enough to suggest ever-rebuilding K8s clusters.
Yes, "let it keep restarting while it crashes and while I investigate the issue" is MUCH preferred to "everything's down and my boss is on my ass to fix the memory issue."
The bug exists either way, but in one world my site is still up while I fix the bug and prioritize it against other work and in another world my site is hard-down.
That only works if the bug actually gets fixed. When you have normalized the idea that restarting the cluster fixes a problem, all of a sudden you don't have a problem anymore. So now your motivation to get the bug properly fixed has gone away.
Sometimes feeling a little pain helps get things done.
You and I wish that's what happened in real life. Instead, people now normalize the behavior thinking it'll sort itself out automatically over time without ever trying to fix it.
Self-healing systems are good but only if you have someone who is keeping track of the repeated cuts to the system.
This is something that has been bothering me for the last couple of years. I consistently work with developers who no longer care about performance issues, assuming that k8s and the ops team will take care of it by adding more CPU or RAM or just restarting. What happened to writing reliable code that performed well?
Business incentives. It's the classic incentive tension between spending more time on nicer code that does the same thing and building more features. Code expands to fill its performance budget and all that.
At least on the backend you can quantify the cost fairly easily. If you bring it up to your business people they will notice an easy win and then push the devs to write more efficient code.
If it's a small $$ difference, though, the devs are probably prioritizing correctly.
I've witnessed the same thing, however there is nothing mutually exclusive about having performant code running in Kubernetes. There's a trade-off between performance and productivity, and maintaining a sense of pragmatism is a good skill to have (that's directed towards those that use scaling up/out as a reason for being lax about performance).
Nothing is this black and white. I tried to emphasise just a simple philosophy that life gets a lot easier if you make things easily replaceable. That was the message I tried to convey, but of course if there is a deep problem with something it needs proper investigation + fixing, but that is an actual code/application problem.
That's not what cattle vs pets is. Treating your app as cattle means that it deploys, terminates, and re-deploys with minimal thought at the time of where and how. Your app shouldn't care which Kubernetes node it gets deployed to. There shouldn't be some stateful infrastructure that requires hand-holding (e.g. logging into a named instance to restart a specific service). Sometimes network partitions happen, a disk starts going bad, or some other funky state happens and you kill the Kubernetes pod and move on.
You should try to fix mem leaks and other issues like the one you described, and sometimes you truly do need pets. Many apps can benefit from being treated like cattle, however.
When cattle are sick, you need to heal them, not shoot them in the head and bring in new cattle. If your software behaves badly you need to FIX THE SOFTWARE.
The old 'just restart everything' routine is typical Windows admin behavior and a recipe for building bad, unstable systems.
Kubernetes absolutely does do strange things, crashes on strange things, and doesn't tell you about it.
I like the system, but to pretend it's this unbelievably great thing is an exaggeration.
> All of this glosses over the biggest issue with Kubernetes: it's still ridiculously complex, and troubleshooting issues that arise (and they will arise) can leave you struggling for days poring over docs, code, GitHub issues, stackoverflow.
It's probably good at this point to distinguish between on-prem and managed installations of k8s. In almost four years of running production workloads on Google's GKE we've had... I don't know, perhaps 3-4 real head-scratchers where we had to spend a couple of days digging into things. Notably none of these issues have ever left any of our clusters or workloads inoperable. It isn't hyperbole to say that in general the system just works, 24x7x365.
Agreed. We moved from ECS to GKE specifically because we didn't have the resources to handle what was supposed to be a "managed" container service with ECS. Had agent issues constantly where we couldn't deploy. It did take a little bit to learn k8s, no doubt. But now it requires changes so rarely that I usually have to think for a minute to remember how something works, because it's been so long since I needed to touch it.
Agree that the k8s tax, as described, is a huge issue. But I think the biggest issue is immaturity of the ecosystem, with complexity coming in second. You can at least throw an expensive developer at the complexity issue.
But when it comes to reliable installations (even helm charts for major software are a coin flip in terms of whether they'll work), fast-moving versioning that reminds me of the JavaScript Wild West (the recent RBAC-on-by-default implementation comes to mind, even if it's a good thing), and unresolved problems around provider-agnostic volumes and load balancing... those are headaches that persist long after you've learned the difference between a ReplicaSet and a Deployment.
To further this point about the ecosystem (and this is AWS specific): you need, or have needed, to install a handful of extra services/controllers onto your EKS cluster to get it to integrate the way most would expect with AWS. Autoscaling? Install and configure the autoscaler. IAM roles? Install Kube2IAM. DNS/ALB/etc etc? etc etc etc.
After a slog you get everything going. Suddenly a service is throwing errors because it doesn't have IAM permissions. You look into it and it's not getting the role from the kube2iam proxy. Kube2iam is throwing some strange error about a nil or interface cast. Let's pretend you know Go like I do. The error message still tells you nothing specific about what the issue may be. Google leads you to github and you locate an issue with the same symptoms. It's been open for over a year and nobody seems to have any clue what's going on.
Good times :) Stay safe everyone, and run a staging cluster!
Kubernetes can be very complex.. or it can be very simple. Just like a production server can be very simple, or extremely complex, or a linux distro, or an app..
Kubernetes by itself is a very minimal layer. If you install every extension you can into it, then yes, you'll hit all kinds of weird problems, but that's not a Kubernetes problem.
You could use this argument for literally anything, though. I spent days poring over docs, googling, SOF, github issues, making comments, the whole works when I learned any new software/technology. The argument doesn't hold water, IMO.
You can make an argument that Linux is ridiculously complex and troubleshooting issues that arise can leave you struggling for days poring over docs, code, etc, and that MS-DOS is a much simpler system, and be sort of right.
True. I'm currently in the middle of writing a paper on extending Kubernetes Scheduler through Scheduler Extender[1]. The process has been really painful.
You're saying a feature that's in alpha, released 2 months ago is painful? You should at least wait until a feature is beta until expecting it to be easier to use.
Scheduler Extender was initially released over 4 years ago[1]. What you are referring to is Scheduling Framework[2], which indeed is a new feature (and will replace/contain Scheduler Extender).
> can leave you struggling for days poring over docs, code, GitHub issues, stackoverflow
I've had that when running code straight on a VM, when running on Docker, and when running on k8s. I can't think of a way to deploy code right now that lets you completely avoid issues with systems that you require but are possibly unfamiliar with, except maybe "serverless" functions.
And of those three, I much preferred the k8s failure states simply because k8s made running _my code_ much easier.
> I can't think of a way to deploy code right now that lets you completely avoid issues with systems that you require but are possibly unfamiliar with, except maybe "serverless" functions.
This is basically the same comment I was going to write, so I'll just jump onto it. But whenever I hear people complain about how complex XXX solution is for deployment, I always think, "ok, I agree that it sucks, but what's the alternative?"
Deploying something right now with all of its ancillary services is a chore, no matter how you do it. K8s is a pain in the ass to set up, I agree. But it seems to maintain itself the best once it is running. And longterm maintainability cannot be overlooked when considering deployment solutions.
When I look out at the sea of deployment services and options that exist right now, each option has its own tradeoffs. Another service might eliminate or minimize another's tradeoffs, but it then introduces its own. You are trading one evil for another. And this makes it nearly impossible to say "X solution is the best deployment solution in 2020". Do you value scalability? Speed? Cost? Ease of learning? There are different solutions to optimize for each of these schools of thought, but it ultimately comes down to what you value most, and another developer isn't going to value things in the same way, so for them, another solution is better.
The only drop-dead simple, fast, scalable deployment solution I have seen right now is static site hosting on tools like Netlify or AWS Amplify (among others). But these only work for statically generated sites, which were already pretty easy to deploy, and they are not an option for most sites outside of marketing sites, landing pages, and blogs. They aren't going to work for service-based sites, nor will they likely replace something being deployed with K8s right now. So they are almost moot in this argument, but I bring it up because they are arguably the only "best deployment solution" right now, if you are building a site that can meet their narrow criteria.
If you're running on a single host anyways, why not just use init scripts or unit files? All Kubernetes is giving you is another 5-6 layers of indirection and abstraction.
EDIT: Quick clarification: still use containers. However, running containers doesn't require running Kubernetes.
> learning Kubernetes can be done in a few days
The basic commands, perhaps. But with Kubernetes' development velocity, the learning will never stop - you really do need someone dedicated to it at least part-time to ensure that a version upgrade doesn't break automation/compliance (something that's happened to my company a few times now).
> If you're running on a single host anyways, why not just use init scripts or unit files?
You're absolutely right. Init scripts and systemd unit files could do every single thing here. With that said, might there be other reasons?
The ability to have multiple applications running simultaneously on a host without having to know about or step around each other is nice. This gets rid of a major headache, especially when you didn't write the applications and they might not all be well-behaved in a shared space. Having services automatically restart and having their dependent services handled is also a nice bonus, including isolating one instance of a service from another in a way that changing a port number won't go around.
Personally, I've also found that init scripts aren't always easy to learn and manage either. But YMMV.
> The ability to have multiple applications running simultaneously on a host without having to know about or step around each other is nice.
If you're running containers, you get that for free. You can run containers without running Kubernetes.
And unit/init files are no harder to learn than the Kubernetes YAML DSL (for simple cases like this, they're probably significantly easier). The unit files in particular will definitely be simpler, since systemd is container aware.
I'm extremely cynical about init scripts. I've encountered too many crusty old systems where the init scripts used some bizarre old trick from the 70s.
Anyway. Yes, you're once more absolutely correct. Everything here can be done with unit files and init scripts.
Personally, I've not found that the YAML DSL is more complex or challenging than the systemd units. At one point I didn't know either, but I definitely had bad memories of managing N inter-dependent init scripts. I found it easier to learn something I could use at home for an rpi and at work for a fleet of servers, instead of learning unit scripting for my rpi and k8s for the fleet.
It's been my experience that "simple" is generally a matter of opinion and perspective.
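Here's a minimal unit file to run a container under systemd (a sketch; names, ports, and paths are placeholders):

    [Unit]
    Description=myapp container
    After=docker.service
    Requires=docker.service

    [Service]
    ExecStartPre=-/usr/bin/docker rm -f myapp
    ExecStart=/usr/bin/docker run --rm --name myapp -p 8080:8080 -v /srv/myapp:/data registry.example.com/myapp:v2
    ExecStop=/usr/bin/docker stop myapp
    Restart=always

    [Install]
    WantedBy=multi-user.target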
75% of this is boilerplate, but there's not a lot of repetition and most of it is relevant to the service itself. The remaining lines describe how you interact with Docker normally.
In comparison, here's a definition to set up the same container as a deployment in Kubernetes.
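Roughly the following (a minimal sketch of the equivalent object, with the same placeholder names):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
      labels:
        app: myapp
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: registry.example.com/myapp:v2
            ports:
            - containerPort: 8080
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            hostPath:
              path: /srv/myapp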
Almost 90% of this has meaning only to Kubernetes, not even to the people who will have to view this object later. There's a lot of repetition of content (namely labels and nested specs), and the "template" portion is not self-explanatory (what is it a template of? Why is it considered a "template"?)
This is not to say that these abstractions are useless, particularly when you have hundreds of nodes and thousands of pods. But for a single host, it's a lot of extra conceptual work (not to mention googling) to avoid learning how to write unit files.
That's a great example! Thank you very much for sharing.
That said, it's been my experience that a modern docker application is only occasionally a single container. More often it's a heterogeneous mix of three or more containers, collectively comprising an application. Now we've got multiple unit files, each of which handles a different aspect of the application, and now this notion of a "service" conflates system-level services like docker and application-level things like redis. There's a resulting explosion of cognitive complexity as I have to keep track of what's part of the application and what's a system-level service.
Meanwhile, the Kubernetes YAML requires an extra handful of lines under the "containers" key.
Again, thank you for bringing forward this concrete example. It's a very kind gesture. It's just possible that use-cases and personal evaluations of complexity might differ and lead people to different conclusions.
> There's a resulting explosion of cognitive complexity as I have to keep track of what's part of the application and what's a system-level service.
If you can start them up with additional lines in a docker file (containers in a pod), it's just another ExecStart line in the unit file that calls Docker with a different container name.
EDIT: You do have to think a bit differently about networking, since the containers will have separate networks by default with Docker, in comparison to a k8s pod. You can make it match, however, by creating a network for the shared containers.
If, however, there's a "this service must be started before the next", systemd's dependency system will be more comprehensible than Kubernetes (since Kubernetes does not create dependency trees; the recommended method is to use init containers for such).
As a side note, unit files can also do things like init containers using the ExecStartPre hook.
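For instance, approximating a two-container pod (say, an app plus a Redis sidecar on a shared network, all hypothetical names) could look roughly like this inside the [Service] section; the leading "-" tells systemd to ignore "already exists" failures:

    ExecStartPre=-/usr/bin/docker network create myapp-net
    ExecStartPre=-/usr/bin/docker run -d --name myapp-redis --network myapp-net redis:5
    ExecStart=/usr/bin/docker run --rm --name myapp --network myapp-net registry.example.com/myapp:v2
    ExecStop=/usr/bin/docker stop myapp myapp-redis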
Even here on HN, one need not look far to find someone who resents everything about systemd and insists on using a non-systemd distro. Such people run systems in real life, too.
Should you run into such a system, you're still just writing code, and interacting with a daemon that takes care of the hardest parts of init scripts for you.
There are no pid files. There are no file locks. There is no "daemonization" to worry about. There is no tracking the process to ensure it's still alive.
Just think about how you would interact with the docker daemon to start, stop, restart, and probe the status of a container, and write code to do exactly that.
Frankly, Docker containers are the simplest thing you could ever have to write an init script for.
It gives you good abstractions for your apps. I know exactly what directories each of my apps can write to, and that they can't step on each others' toes. Backing up all of their data is easy because I know the persistent data for all of them is stored in the same parent directory.
Even if the whole node caught on fire, I can restore it by just creating a new Kubernetes box from scratch, re-applying the YAML, and restoring the persistent volume contents from backup. To me there's a lot of value over init scripts or unit files.
> I know exactly what directories each of my apps can write to, and that they can't step on each others' toes
You can do this with docker commands too. Ultimately, that's all that Kubernetes is doing, just with a YAML based DSL instead of command line flags.
> Even if the whole node caught on fire, I can restore it
So, what's different from init/unit files? Just rebuild the box and put in the unit files, and you get the same thing you had running before. Again, for a single node there's nothing that Kubernetes does that init/unit files can't do.
> You can do this with docker commands too. Ultimately, that's all that Kubernetes is doing, just with a YAML based DSL instead of command line flags.
Well, I mean, mostly. You're gonna be creating your own directories and mapping them into your docker-compose YAMLs or Docker CLI commands. And if you have five running and you're ready to add your sixth, you're gonna be SSHing in to do it again. Not quite as clean as "kubectl apply" remotely and the persistent volume gets created for you, since you specified that you needed it in your YAML.
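i.e. the manifest just declares the claim and, given a default storage class, the volume is provisioned for you. A minimal sketch (name and size are placeholders):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: myapp-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi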
> So, what's different from init/unit files? Just rebuild the box and put in the unit files, and you get the same thing you had running before. Again, for a single node there's nothing that Kubernetes does that init/unit files can't do.
Well you kinda just partially quoted my statement and then attacked it. You can do it with init/unit files, but you've got a higher likelihood of apps conflicting with each other, storing things in places you're not aware of, and missing important files in your backups.
It's not about what you "can't" do. It's about what you can do more easily, and treat bare metal servers like dumb container farms (cattle).
> You're gonna be creating your own directories and mapping them into your docker-compose YAMLs or Docker CLI commands.
You don't have to create them, docker does that when you specify a volume path that doesn't exist. You do have to specify them as a -v. In comparison to a full 'volume' object in a pod spec.
> And if you have five running and you're ready to add your sixth, you're gonna be SSHing in to do it again
In comparison to sshing in to install kubernetes, and connect it to your existing cluster, ultimately creating unit files to execute docker container commands on the host (to run kubelet, specifically).
> apps conflicting with each other
The only real conflict would be with external ports, which you have to manage with Kubernetes as well. Remember, these are still running in containers.
> storing things in places you're not aware of, and missing important files in your backups.
Again, they are still containers, and you simply provide a -v instead of a 'volume' key in the pod spec.
> treat bare metal servers like dumb container farms
We're not talking about clusters though. The original post I was responding to was talking about 1 vm.
I will agree that, when you move to a cluster of machines and your VM count exceeds your replica count, Kubernetes really starts to shine.
"BTW, learning Kubernetes can be done in a few days. " - Learn something in a few days and the ability to run it in production are completely 2 different things. Security, upgrades and troubleshooting cannot be learned in couple of days.
Anyone who starts doing anything "in production" based on a "maybe you don't need k8s" article should step back and think about whether they are the right person to put things into production.
Are you bootstrapping your stealth-mode side-project? Pick whatever you think is best, but think about the time value of operations. (So maybe just pick a managed k8s.)
Are you responsible for a system that handles transactions worth millions of dollars every day? Then maybe, again you should seek the counsel of professionals.
Otherwise these articles are just half-empty fuel cans for an (educated?) dumpster fire.
That said HashiCorp stuff is almost guaranteed to be amazing. I haven't even looked at Nomad, but I think anybody starting out with orchestration stuff should give it a go, and when they think they have outgrown it they will know what's next. Maybe k8s, maybe something else.
Sure, but "how to toss existing docker containers into GKE" is IMO less than three days. Unless you have a reason to manage your own k8s cluster, k8s is extremely easy.
4. Learn YAML (maybe Helm, and a plethora of acronyms specially designed for k8s).
5. Combine the spaghetti of 1, 2, 3 to build container image scripts.
6. Tie it into a CI/CD process, for the adventurous.
7. Learn to install and manage a k8s cluster, or learn the proprietary, non-open-source API of Google or Amazon or Azure.
8. Constantly patch and manage a plethora of infrastructure software besides the application code.
Now, with all this, use a tool designed for million-user applications on an application which will be used by hundreds to thousands of users. I think k8s is designed for a Google kind of problem and is overkill for over 80-90% of deployments and applications.
Maybe just use a simple deployment with Ansible, Puppet, Chef, Nix, GNU Guix, etc. to deploy and manage software based on necessity on a single VM, and extend it to a large cluster of bare metal, VMs, or containers, in a cloud-agnostic manner, if necessary.
Not sure when technology fashion overtook the infrastructure area the way complex web app tooling did in the JavaScript world. K8s has its place at the scale required by a handful of organizations with Google-level traffic and load; for most traditional companies a simple cluster management and configuration management tool will work wonders, with fewer moving parts and less cognitive load.
Then you still have to learn Ansible, so you're still learning a tool. And you probably want to use Docker anyways, unless you're shipping statically linked binaries.
Also, you said VM which implies that instead of Kubernetes you're going to use a VM platform, which comes with every bit of the complexity Kubernetes has.
I agree with you that it's a simpler deployment method when you don't need HA. As soon as you need HA, then all of a sudden you need to be able to fail over to new nodes and manage a load balancer and the backends for that load balancer. Kubernetes makes that easy. Kubernetes makes easy things harder than they should be, and hard things easier than they should be.
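Easy as in: run a few replicas and put a Service in front, and the load balancer backends are tracked for you as pods come and go. A sketch (placeholder names; on a cloud provider the LoadBalancer type provisions the external load balancer):

    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      type: LoadBalancer
      selector:
        app: myapp
      ports:
      - port: 80
        targetPort: 8080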
The number of things I've managed that don't need HA is vanishingly low.
I think you're starting to get at what has annoyed me about a lot of the anti-k8s folks.
I really just don't think they understand the tool, because any production environment should have many of the features Kubernetes helps provide. So the argument becomes "I know how to do it this other way, so learning a new tool is too complex."
Kubernetes helps standardize a lot of these things - I can very easily hop between different clusters running completely different apps/topologies and have a good sense of what's going on. A mish-mash of custom solutions re-inventing these wheels is, in my opinion, far more confusing.
> Kubernetes makes easy things harder than they should be, and hard things easier than they should be.
This is really the crux, I think. I think a lot of people look at Kubernetes, try to learn it by running their blog on a K8s platform, and decide it's overly complex to run something that Docker probably solves (alone) for them. When you need HA for many services, don't want to have to handle the hassle of your networking management, design your applications with clear abstractions, etc., and really need to start worrying about scale, Kubernetes starts to really shine.
> I can very easily hop between different clusters running completely different apps/topologies and have a good sense of what's going on. A mish-mash of custom solutions re-inventing these wheels is, in my opinion, far more confusing
Kinda like jumping between Rails projects (assuming those rails projects don't diverge heavily from the "convention") vs jumping around between custom PHP scripts ;)
Or for Java Dev... kinda like jumping between Maven (again, assuming mostly follow the Maven convention) vs random Ant scripts to build your Java-based systems.
There will always be naysayers no matter what because they're used to the previous tech.
I'm the kind of guy who just got lucky enough to land in a workplace that moved away from previously duct-taped/custom build scripts to something not necessarily new, but which has since become accepted as a better, standardized set of tools.
I don't have to go down the Docker or Kubernetes path for anything complex. I can still use more secure LXD containers, managed with the same system configuration tools and scripts used for VMs and bare metal.
Indeed, due to a problem with Docker and the CRI, all Kubernetes installations were recently vulnerable and needed a security patch, since Docker containers do not run in a user namespace the way LXD containers do.
So for HA too, traditional methods are better, and a functional approach like Guix or Nix to generate LXD container images, VM images, or bare metal deployments to run the application is far superior and more secure than the spaghetti of incomprehensible black-box images popular in the Docker world.
You have either deliberately misrepresented "Docker" in your post, or don't know enough about the vulnerability (and the affected software, runc) to make the claims you are making. The vulnerability was in runc, and Docker et al have had the capability to utilize user namespaces for a number of years.
No, I did not misrepresent Docker; I have used it since version 0.96 and after 1.0. I also commented at the time when, under its new investment, it tried to reinvent the wheel by moving away from its LXC base to rewrite its own libcontainer, and missed out on the development of unprivileged containers, which landed in LXC 1.0. Since then it's always been a security nightmare.
Docker, although inferior and less secure than LXC/LXD, became popular due to marketing driven by VC money, not on technical merits.
Check the old thread discussing the security issue with docker not supporting unprivileged container. [1]
Weird; we are 3 people working on infra/CI/tooling supporting some 80+ engineers working on the product, and they haven't written a single line of config for Docker, CI/CD, infra automation, or K8s.
In fact, it was done by one person until the product teams reached some 40 engineers.
We aren't 10x guys either. What's different though is that we don't believe in the microservice hypetrain so I don't have 200 codebases to watch after.
For whatever accounting/asset/tax reason, corporations seem happy to spend vast sums on monthly costs such as AWS, yet loathe purchasing physical computers to do the same job.
Agree / disagree. I’ve been using k8s in large deployments (1000s of nodes) for about 3 years.
It's easy to get started with using GKE, EKS, etc. It's difficult to maintain if you're bootstrapping your own cluster. 3 years in, and despite working with k8s at a pretty low level, I still learn more about its functionality every single day.
I do agree it’s great tooling wise. I personally deploy on docker for desktop k8s day one when starting a new project. I understand all the tooling, it’s easier than writing a script and figuring out where to store my secrets every damn time.
The big caveat is - kubernetes should be _my_ burden as someone in the Ops/SRE team, but I feel like you frequently see it bleed out into application developer land.
I think that the CloudRuns and Fartgates* of the world are better suited to the average developer and I think it’s Ops responsibility to make k8s as transparent as possible within the organization.
> Kubernetes requires very little maintenance. Kubernetes takes care of itself. A container crashes? Kubes will bring it up. Do I want to roll out a new version? Kubes will do a rolling update on its own. I am running apps in kubes and for almost 2 years I haven't looked at my cluster or vms. They just run. Once every 6 months I log into my console and see that I can upgrade a few nodes. I just click ok and everything happens automatically with zero downtime.
> BTW, learning Kubernetes can be done in a few days.
This is true if you are using managed k8s from a provider or have an in-house team taking care of this. Far, far from the truth if you also need to set up and maintain your own clusters.
I'm very comfortable with k8s as a user and developer, would not be comfortable setting up a scalable and highly available cluster for production use for anything more serious than my homelab.
The parts between "docker build" and "kubectl apply" are literally CI versus CD; they're more complicated than two steps. And when there's a problem with either, K8s is not going to fix it up for you. You'll have to be notified via monitoring and begin picking through the 50 layers of complexity to find the error and fix it. Which is why we have deployment systems to do things like validation of all the steps and dependencies in the pipeline, so you don't end up with a broken prod deploy.
> Kubernetes requires very little maintenance
Whatever you're smoking, pass it down here... Have you ever had to refresh all the certs on a K8s cluster? Have you ever had to move between breaking changes in the K8s backend during version upgrade? Crafted security policies, RBACs, etc to make sure when you run the thing you're not leaving gaping backdoors into your system? There's like 50 million different custom solutions out there just for K8s maintenance. Entire businesses are built around it.
I'm not very familiar with Kubernetes, but from what I've seen, to me it looks quite complicated. I'm wondering, when people say they use Kubernetes and consider it easy/simple, does that typically include operating the underlying Kubernetes container infrastructure as well? Or does "using Kubernetes" usually mean deploying containers to someone else's Kubernetes hosting (like Google)?
When I say it (and I would guess outside of people who either contribute to Kubernetes or are using it to build their own PaaS) I mean a managed service like GKE. When I attempted to set my own cluster up all the confusion was around all of the choices to make, what container networking stack, etcd, and so on. GKE gives you an easy button for simpler architectures where you just have some containers and you want them to go in a cluster.
You will probably use someone's recommended Kubernetes stack, like k3s. It doesn't make a ton of difference in practice whether you let someone else host your control plane or do it on your own hardware, but you probably want instance-level control over your actual nodes.
Over time you'll probably grow your own customizations and go-tos on kubernetes that you layer on top of k3s or what have you.
I couldn't agree more. I've built a lot of sideprojects on Kubernetes and couldn't be happier with it. It's incredibly cheap (basically free in terms of resource usage), low-touch, and abstracts away so many pesky problems that I had to deal with before. My cluster has been running for 12+ months without a single application-level outage - containers simply restart and get rescheduled when something goes wrong, and external logging and analytics solutions are a breeze to integrate.
I disagree here - there's definitely some minimum amount of resources you want to use to make k8s worth it memory- and CPU-wise. You may be able to get by with 1 CPU and 1GB of RAM for your master (which is just coordinating, not running any workloads), and there's some overhead on the workers as well.
I've been looking at running k8s at some raspberry pis at home, but anything smaller than the recently released 4 is just not worth it IMO (though I've seen several people run clusters of 3B+s).
You can't really compare deploying Kubernetes to a bunch of Raspberry Pis with people normally deploying to cloud hosting services or using managed services...
I am not. GP doesn't mention running managed. When talking about "basically free" and side-projects, I don't really get your point. The minimum cost for a one-node n1-standard-1 cluster is a subsidized ~25 USD per month on GKE (assuming you're not using preemptibles).
After 1 year you're above the price of comparable hardware (there are pretty decent $300 laptops these days).
I’m a happy customer of GKE but it’s not for everyone and everything. Like you say, different use-cases.
> BTW, learning Kubernetes can be done in a few days.
As someone who has deployed K8S at scale several times, this is nonsense. Learning K8S deeply enough to deploy AND MAINTAIN IT is a huge undertaking that requires an entire team to do right.
Sure, you can deploy onto AWS via KOPS in a day, and you can deploy a hello world app in another day. Easy.
But that only gets you to deployment, it doesn't mean you can maintain it. There are TONS of potential failure modes, and at this point you don't understand ANY of them. When one of them crops up, who do you page at 3AM, how do you know it's even down (monitoring/alerting isn't "batteries included"), how do you diagnose and fix it once you _do_ know what's broken?
Not to mention the fact that you _have_ to continuously upgrade K8S as old releases go out of maintenance in under a year. If you're not continuously testing upgrades against production equivalent deploys, you're going to footgun spectacularly at some point, or be stuck with an old release that new/updated helm charts won't work against.
TL;DR: If you can afford a team to deploy and maintain K8S, and have a complex enough application stack to need it, it's awesome; but it's not free in either time or staff.
As far as I have seen there is still updating overhead where you have to initiate upgrades to a known stable/supported version at a regular cadence. EKS suggests having an E2E CI pipeline to test it before updating production, and I feel like that's the only way to do it. There is nonzero churn even though it's managed.
> Kubernetes is really really cheap. I can run 20 low volume apps in a kubes cluster with a single VM. This is cheaper than any other hosting solution in the cloud if you want the same level of stability and isolation. It's even cheaper when you need something like a Redis cache. If my cache goes down and the container needs to be spun up again then it's not a big issue, so for cheap projects I can save even more cost by running some infra like a Redis instance as a container too. Nothing beats that. It gets even better: I can run my services in different namespaces, and have different environments (dev/staging/etc.) isolated from each other and still running on the same amount of VMs. When you calculate the total cost saving here compared to traditional deployments it's just ridiculously cheap.
If you're running 20 apps on a kubes cluster with a single VM you are running twenty apps on a single VM. There's no backup, scalability or anything else. There's no orchestration.
Your deployment is a hipster version of rsync -avH myapp mybox:/my/location/myapp followed by a restart done via http to tell monit/systemd to restart your apps (sketched below). It is a perfectly fine way of handling apps.
k8s shines when you have a fleet of VMs and a fleet of applications that depend on each other and have dynamic constraints, but that's not how most k8s installations are actually used.
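For the record, the non-hipster version of that deploy is roughly (host and paths are placeholders):

    rsync -avH ./myapp mybox:/my/location/myapp
    ssh mybox 'sudo systemctl restart myapp'    # or poke monit over its HTTP interface

Boring, transparent, and when it breaks you know exactly which of the two lines did.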
> I mean yes, theoretically nothing needs Kubernetes, because the internet was the same before we had Kubernetes, so it's certainly not needed, but it makes life a lot easier. Especially as a cheap lazy developer who doesn't want to spend time on any ops Kubernetes is really the best option out there next to serverless.
Only in a "throw production code over the fence" sense.
You can get productive with Kubernetes very quickly. If you have a managed k8s cluster available to you, one that somebody else runs for you, then sure, it is all upside. All the stuff you describe is great!
The thing is, if you have to manage the complexity and lifecycle of the cluster yourself, the balance tips dramatically. How do you provision it? How do you maintain it? How do you secure it? How do you upgrade it?
So I agree, k8s is great for running all manner of projects, big and small. If you already have a k8s you will find yourself wanting to use it for everything! However if you don't have one, and you aren't interested in paying somebody to run one for you, then you should think long and hard about whether you're better off just launching a docker-compose from systemd or something.
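For anyone who hasn't gone that route, the docker-compose-from-systemd option is a one-time unit file. A minimal sketch, assuming the compose project lives in /opt/myapp and docker-compose is at /usr/local/bin (both placeholders):

    sudo tee /etc/systemd/system/myapp.service <<'EOF'
    [Unit]
    Description=myapp via docker-compose
    Requires=docker.service
    After=docker.service
    [Service]
    WorkingDirectory=/opt/myapp
    ExecStart=/usr/local/bin/docker-compose up
    ExecStop=/usr/local/bin/docker-compose down
    Restart=always
    [Install]
    WantedBy=multi-user.target
    EOF
    sudo systemctl daemon-reload
    sudo systemctl enable --now myapp

You give up rolling updates and rescheduling, but you also give up the entire control plane as a failure domain.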
Have you looked into Erlang/the BEAM VM? It provides many of these same benefits.
> This is cheaper than any other hosting solution in the cloud if you want the same level of stability and isolation.
Erlang provides total process isolation and can theoretically also run on only a single machine.
> It's even cheaper when you need something like a Redis cache. If my cache goes down and the container needs to be spun up again then it's not a big issue, so for cheap projects I can even save more cost by running some infra like a Redis instance as a container too. Nothing beats that.
In Erlang, each process keeps its own state without a single point of failure (ie a single Redis instance) and can be restarted on failure by the VM.
> Kubernetes takes care of itself. A container crashes? Kubes will bring it up.
Erlang VM takes care of itself. A process crashes? The VM will bring it up.
> Do I want to roll out a new version? Kubes will do a rolling update on its own.
Ditto for the Erlang VM, with zero-downtime deployments via hot code reloading.
> I just click ok and everything happens automatically with zero downtime.
Erlang is famous for its fault tolerance and nine nines of uptime! And it's been doing that for quite a bit longer than K8s.
I looked at k8s a while ago and it seemed insanely complex to me. Maybe it's one of those things that you need to look at several times to understand?
Kubernetes addresses a bunch of infrastructure concerns, so IMO it's more equivalent to learning a new cloud platform (AWS, Azure) than anything else. People who already know one or more cloud platforms, and are familiar with the kind of issues that come up with distributed and HA systems on cloud (e.g. networking, secrets management) may find Kubernetes much easier.
"BTW, learning Kubernetes can be done in a few days."
How do you troubleshoot an api-group that is failing intermittently?
How do you troubleshoot a CSI issue? Because CSI isn't a simple protocol like CNI.
What do you look at if kubectl gets randomly stuck?
What do you do if a node becomes NotReady?
What do you do if the container runtime on several nodes starts failing?
What do you do if one of the etcd nodes doesn't start?
What if you are missing logs or metrics in the aggregator?
What if when you create a project it's missing the service accounts?
What if kube-proxy randomly doesn't work?
What if the upgrade fails and ends up in an inconsistent state?
What if your pod is actually running but shows as pending?
Sure, you can learn how to deploy Kubernetes and an application on top of it in a couple of days, but learning how to run it in production will take way longer than that.
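None of those failures are exotic on their own; the part that takes longer than a few days is knowing which tool to reach for at 3AM. The usual first pass alone looks something like:

    kubectl get nodes -o wide                                      # any nodes NotReady?
    kubectl describe node <node-name>                              # conditions, taints, resource pressure
    kubectl get events --all-namespaces --sort-by=.lastTimestamp   # what changed recently
    kubectl get pods -n kube-system                                # control plane and addons healthy?
    journalctl -u kubelet --since '1 hour ago'                     # kubelet logs on the node itself

and that's before you go anywhere near etcd, the CNI plugin, CSI drivers or the cloud provider integration.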
One side of this - our base k8s config is 44k lines of yaml, leading to a bunch of controllers and pods running, in a rather complex fashion.
Not to mention k8s complexity and the codebase itself.
It blackboxes so many things. I still chuckle at the fact that Ansible is a first-class citizen of the operator framework.
It can certainly implode on you!
In my experience running Nomad & Consul is a more lightweight and simpler way of doing many of the same things.
It's a bit like the discussion raging around systemd and the fact that it's not "unixy". I get the same feeling with k8s, whereas the Hashicorp stuff, albeit with fewer features, adheres more closely to the Unix philosophy.
Thus easier to maintain and integrate.
Edit, sorry - I missed the dot - I meant to write 4.4K lines, but grepping through the templates dir it's actually close to 12k lines.
Ah, no, it's not about replacing functionality. It's about opening up for general integrations and ease of use.
If you've set up a fully fledged infrastructure on k8s with all the bells and whistles, there's a whole lot of configuration going on here. Like a whole lot!
I most certainly can't replace all of the above with those two tools, but they make it easier to integrate in any way I see fit. What I'm saying is that Nomad is a general-purpose workload scheduler, whereas k8s schedules k8s pods only.
Consul is just providing "service discovery", do with it what you want. And so on...
Having worked a couple of years using both these setups I'm a bit torn. K8s brings a lot, no doubt, but I get the feeling that the whole point of it is for google to make sure you _do not ever invest in your own datacenters_.
k8s on your own bare metal at least used to be not exactly straightforward.
> k8s on your own bare metal at least used to be not exactly straightforward.
I actually just deployed k8s on a raspberry pi cluster on my desk (obviously as a toy, not for production) and it took about an hour to get things fully functional minus RBAC.
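With a lightweight distribution like k3s, for example, the whole dance is roughly this (server IP and token are placeholders):

    # on the first Pi (the server)
    curl -sfL https://get.k3s.io | sh -
    sudo cat /var/lib/rancher/k3s/server/node-token     # join token for the workers
    # on each remaining Pi
    curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<token> sh -

It ships as a single binary with batteries included, which is a big part of why it goes so quickly on small hardware.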
> What I'm saying is that Nomad is a general purpose workload scheduler
Yeah, Nomad and k8s are not direct replacements at all. Nomad is a great tool for mixed workloads, but if you're purely k8s then there are more specific tools you can use.
> I meant to write 4.4K lines
Just a small difference! Glad no one wrote 44k lines of yaml, that's just a lot of yaml to write...
> close to 12k lines
Our production cluster (not the raspis running on my desk!) runs in about 4k lines, but we have a fairly simple networking and RBAC layer. We also started from just kubernetes and grew organically over time, so I'm sure someone starting today has a lot of advantage to get running more easily.
If you want "cloud style" ingress, you'll probably use MetalLB and BGP, etc. (minimal example below).
Here’s where it gets fun.
I mean, don’t get me wrong, it works - now at least. Never liked it until 1.12 tbh, which is when a bunch of things settled.
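For reference, the MetalLB side can be as small as this Layer 2 config (the address range is a placeholder; BGP mode needs a peers section on top, and newer MetalLB releases moved this from a ConfigMap to CRDs):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        address-pools:
        - name: default
          protocol: layer2
          addresses:
          - 192.168.1.240-192.168.1.250
    EOF

The fun part is everything around it: the BGP peering, the upstream routers, and figuring out why a LoadBalancer IP isn't being announced.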
The article is about “maybe you don’t need...” and as an anecdote I helped build a really successful $$$ e-commerce business with a couple of hundred micro-services on an on-prem LXC/LXD “cloud” using nomad, vault & consul.
You can use these tools independently of each other or have them work together - unixy.
I have anecdotes from my last couple of years on k8s as well, and... it just ends up with a much more narrow scope.
Sort of similar to the fact that I basically always have both tmux and dtach installed, the latter for "just make this detachable", the former for "actually I'd like some features today".
Something like that. I want service discovery, where Consul really shines - cause containers ain't all we're doing, mkay. K8s forces etcd on you, for service discovery, and only within the cluster.
So, sync it then... more code, more complexity, no "pipe" ("pipe" not to be taken literally in this context), and no simple integrations.
Not to mention, Consul is already a full DNS server (quick example below), but in k8s we need yet another pod for CoreDNS. Is YAP a thing? =)
For example, I love how easily Envoy integrates with any service discovery of your liking, even your own - simply present it with a specific JSON response from an API. Much like how the Ansible inventory works. It makes integrating, picking and choosing, as well as maintaining and extending your configuration management just so much more pleasant.
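To make the Consul-as-DNS point concrete: anything that can resolve a hostname or parse JSON can consume it, no cluster membership required (the service name is a placeholder):

    dig @127.0.0.1 -p 8600 redis.service.consul SRV          # Consul's built-in DNS interface
    curl http://127.0.0.1:8500/v1/catalog/service/redis      # the same data over the HTTP API

That plain HTTP/JSON shape is exactly what makes the Envoy-style integrations so painless.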
It definitely can't be done in a few days. It took me 2-3 months to go through everything and run my own cluster at home. And yes, you need to go through about every major concept (architecture, object types, the API, authentication, services, deployments, updates, etc, etc, etc) in order to make sense of Kubernetes overall. It's a very complex piece of software. You can learn a simple programming language like Go in a couple of days, but definitely not Kubernetes.
Which network fabric do you use, and how did you set up DNS/cert management? For me certificates have been one of the pain points - I have been using cert-manager with Let's Encrypt for some time but it has been notoriously unstable and they have introduced plenty of breaking changes between releases. (That being said, I haven't tried the more recent releases; maybe things have gotten more stable in the past couple of months.)
Google recently released managed certs for those running on GKE, but those are limited to a single domain per cert.
I use the external-dns and cert-manager tools. cert-manager uses lets-encrypt but fully automates everything, you just add an annotation to your ingress resource. Been using it in prod for around 6 months now with no problems.
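For anyone wondering what "just add an annotation" looks like in practice, it's roughly this (hostname, issuer name and ingress class are placeholders; older cert-manager releases used the certmanager.k8s.io/ annotation prefix instead):

    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: myapp
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt-prod
    spec:
      tls:
      - hosts:
        - myapp.example.com
        secretName: myapp-tls
      rules:
      - host: myapp.example.com
        http:
          paths:
          - backend:
              serviceName: myapp
              servicePort: 80
    EOF

cert-manager watches the ingress, solves the ACME challenge and keeps the myapp-tls secret renewed, while external-dns does the same trick for the DNS record.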
Ah, sounds like they’re stabilizing then - I’ve had a lot of stability and upgrading issues with older versions.
Just the fact that you couldn't configure automatic renewal to kick in any earlier than 24h before expiry, and those renewals would fail half the time...
Can second this. For most cases it's very low touch. When things break it has always been because I deployed bad code or configs. My weekends have freed up.
Your single VM is a single point of failure. You probably want to run this in 3 VMs, one in each data center. ECS gives this to you out of the box, rolling deployments and health checks included.
An awful lot of server systems can tolerate a hardware failure on their one server every couple years given 1) good backups, 2) "shit's broken" alerts, and 3) reliable push-button re-deploy-from-scratch capability, all of which you should have anyway. Lots of smaller shops trying to run to k8s and The Cloud probably have at least that much downtime (maybe an hour or two a year, on average) due to configuration fuck-ups on their absurd Rube Goldberg deployment processes anyway.
[EDIT] oh and of course The Cloud itself dies from time to time, too. Usually due to configuration fuck-ups on their absurd Rube Goldberg deployment processes :-) I don't think one safely-managed (see above points) server is a ton worse than the kind of cloud use any mid-sized-or-smaller business can afford, outside certain special requirements. Your average CRUD app? Just rent a server from some place with a good reputation, once you have paying customers (just host on a VPS or two until then). All the stuff you need to do to run it safely you should be doing with your cloud shit anyway (testing your backups, testing your re-deploy-from-scratch capability, "shit's broken" alerts) so it's not like it takes more time or expertise. Less, really.
Business services generally need high availability goals, so often that doesn't cut it. And your single server doesn't autoscale to load.
AWS gives you availability zones, which are usually physically distinct datacenters in a region, and multiple regions. Well designed cloud apps failover between them. Very very rarely have we seen an outage across regions in AWS, if ever.
In practice I see a lot of breakage (=downtime), velocity loss, and terrible "bus factor" from complex Cloud setups where they're really not needed; one beefy server plus some basic safety steps (which you need with the Cloud anyway, so they're no extra work) would do. "Well designed" is not the norm, and lots of companies are heading to the cloud without an expert at the wheel, let alone more than one (see: terrible bus factor).
Businesses always ask for High Availability, but they never agree on what that actually means. IE, does HA mean "Disaster Recovery", in which case rebuilding the system after an incident could qualify? Does it require active-active runtimes? Multiple data centers? Geographic distribution?
And by the way, how much are they willing to spend on their desired level of availability?
I still need a better way to run these conversations, but I'm trying to find a way to bring it back to cost. How much does an hour of downtime really cost you?
Agree - different business functions have different availability goals. A system that computes live risk for a trading desk might have different availability goals from an HR services portal.
I once ran a Linux server on an old IBM PC out of a run-down hotel's closet with a tiny APC battery for 10 years without a reboot. Just because I got away with it doesn't make it a great idea. (It failed because the hard drive died, but for a year and a half nobody noticed)
> An awful lot of server systems can tolerate a hardware failure on their one server every couple years given 1) good backups, 2) "shit's broken" alerts, and 3) reliable push-button re-deploy-from-scratch capability, all of which you should have anyway
Just.... just... no. First of all, nobody's got good backups. Nobody uses tape robots, and whatever alternative they have is poor in comparison, but even if they did have tape, they aren't testing their restores. Second, nobody has good alerts. Most people alert on either nothing or everything, so they end up ignoring all alerts, so they never realize things are failing until everything's dead, and then there goes your data, and also your backups don't work. Third, nobody needs push-button re-deploy-from-scratch unless they're doing that all the time. It's fine to have a runbook which documents individual pieces of automation with a few manual steps in between, and this is way easier, cheaper and faster to set up than complete automation.
> Just.... just... no. First of all, nobody's got good backups. Nobody uses tape robots, and whatever alternative they have is poor in comparison, but even if they did have tape, they aren't testing their restores. Second, nobody has good alerts. Most people alert on either nothing or everything, so they end up ignoring all alerts, so they never realize things are failing until everything's dead, and then there goes your data, and also your backups don't work.
But you should test your backups and set up useful alerts with the cloud, too.
> Third, nobody needs push-button re-deploy-from-scratch unless they're doing that all the time. It's fine to have a runbook which documents individual pieces of automation with a few manual steps in between, and this is way easier, cheaper and faster to set up than complete automation.
Huh. I consider getting at least as close as possible to that, and ideally all the way there, vital to developer onboarding and productivity anyway. So to me it is something you're doing all the time.
[EDIT] more to the point, if you don't have rock-solid redeployment capability, I'm not sure how you have any kind of useful disaster recovery plan at all. Backups aren't very useful if there's nothing to restore to.
[EDIT EDIT] that goes just as much for the cloud—if you aren't confident you can re-deploy from nothing then you're just doing a much more complicated version of pets rather than cattle.
> more to the point, if you don't have rock-solid redeployment capability, I'm not sure how you have any kind of useful disaster recovery plan at all. Backups aren't very useful if there's nothing to restore to.
As Helmuth von Moltke Sr said, "No battle plan survives contact with the enemy." So, let's step through creating the first DR plan and see how it works out.
1) Login to your DR AWS account (because you already created a DR account, right?) using your DR credentials.
2) Apply all IAM roles and policies needed. Ideally this is in Terraform. But somebody has been modifying the prod account's policies by hand and not merging it into Terraform (because reasons), and even though you had governance installed and running on your old accounts flagging it, you didn't make time to commit and test the discrepancy because "not critical, it's only DR". But luckily you had a recurring job dumping all active roles and policies to a versioned write-only S3 bucket in the DR account, so you whip up a script to edit and apply all those to the DR account.
3) You begin building the infrastructure. You take your old Terraform and try to apply it, but you first need to bootstrap the state s3 and dynamodb resources. Once that's done you try to apply again, but you realize you have multiple root modules which all refer to each other's state (because "super-duper-DRY IaC" etc) so you have to apply them in the right sequence. You also have to modify certain values in between, like VPC IDs, subnets, regions and availability zones, etc.
You find odd errors that you didn't expect, and re-learn the manual processes required for new AWS accounts, such as requesting AWS support to allow you to generate certs for your domains with ACM, manually approving the use of marketplace AMIs, and requesting service limit increases that prod depended on (to say nothing of weird things like DirectConnect to your enterprise routers).
Because you made literally everything into Terraform (CloudWatch alerts, Lambda recurring jobs, CloudTrail trails logging to S3 buckets, governance integrations, PrivateLink endpoints, even app deployments into ECS!) all the infrastructure now exists. But nothing is running. It turns out there were tons of whitelisted address ranges needed to connect with various services both internal and external, so now you need to track down all those services whose public and private subnets have changed and modify them, and probably tell the enterprise network team to update some firewalls. You also find your credentials didn't make it over, so you have to track down each of the credentials you used to use and re-generate them. Hope you kept a backed up encrypted key store, and backed up your kms customer key.
All in all, your DR plan turns out to require lots of manual intervention. By re-doing DR over and over again with a fresh account, you finally learn how to automate 90% of it. It takes you several months of coordinating with various teams to do this all, which you pay for with the extra headcount of an experienced cloud admin and a sizeable budget accounting gave you to spend solely on engineering best practices and DR for an event which may never happen.
....Or you write down how it all works and keep backups, and DR will just be three days of everyone running around with their heads cut off. Which is what 99% of people do, because real disaster is pretty rare.
This is kind of what I'm talking about WRT the cloud being more trouble than it's worth if your app sits somewhere in between "trivial enough you can copy-paste some cloud configs then never touch them" on the one end and "so incredibly well-resourced you can hire three or more actual honest-to-god cloud experts to run everything, full time" on the other. Unless you have requirements extreme/weird enough that you're both not-well-resourced but also need the cloud to practically get off the ground, in which case, god help you. I think the companies in that middle ground who are "doing cloud" are mostly misguided, burning cash and harming uptime while thinking they're saving and improving them, respectively.
You nailed it in the first sentence. The blog post pretty much boils down to "k8s looks good but we were too lazy to learn how to use the thing, so we opted into something else".
Fair enough, legit argument, but trying to make a "counter-k8s" case based on that is not very convincing.
Point taken.
However, the part about "why not kubernetes" reads:
[...] we started adding ever-more complex layers of logic to operate our services.
As an example, Kubernetes allows [...] this can get quite confusing [...].
[...] this can lead to tight, implicit coupling between your project and Kubernetes.
[...] it’s tempting to go down that path and build unnecessary abstractions that can later bite you.
[...] It takes a fair amount of time and energy to stay up-to-date with the best practices and latest tooling. [...] the learning curve is quite steep.
So in short "it is complex so this and that may happen if you don't learn it properly".
Not every tool is equally complex or requires the same amount of learning. K8s has a reputation for being really high on the scale, so a reasonable team could consider it and then decide to use something less complex.