> We would need to build/staff a full-time Compute team
This actually was a very real problem at my current job. The data pipeline was migrated to k8s and I was one of the engineers who worked on that migration. Unfortunately, neither I nor the other data engineer was a Kubernetes guy, so we kept running into dev-ops walls while also trying to build features and maintain the current codebase.
It was a nightmare. If you want k8s, you really do need people that know how to maintain it on a more or less full time schedule. Kubernetes is really not the magic bullet it's billed to be.
> Managed Kubernetes (EKS on AWS, GKE on Google) is very much in its infancy and doesn’t solve most of the challenges with owning/operating Kubernetes (if anything it makes them more difficult at this time)
Oh man this hits home. EKS is an absolute shitshow and their upgrade schedule is (a) not reliable, and (b) incredibly opaque. Every time we did a k8s version bump, we'd stay up the entire night to make sure nothing broke. We've since migrated to cloud functions (on GCP; but AWS lambdas could also work) and it's just been a breeze.
I also want to add that "auto-scaling" is one of the main reasons people are attracted to Kubernetes.. but in a real-life scenario, running like 2000 pods with an internal DNS, a few redis clusters, Elasticsearch, and yadda yadda... it's a complete pain in the butt to actually set up auto-scaling. Oh, and the implementation of Kubernetes cron jobs is complete garbage (spawning a new pod for every job is insanely wasteful).
I work on a 2-person project and decided to go with kubernetes (through digitalocean) for the cluster. I am managing everything with terraform and I don't have any big problems. I like that I can write everything as terraform manifests, have it diffed on git push and applied to prod if I want to.
Sure it had a learning curve, but now I just describe my deployments and k8s does the rest, which then reflects back on digitalocean. If I need more power for my cluster, I increase the nodes through digitalocean and k8s automatically moves my containers around however it sees fit.
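To make that concrete, a minimal sketch with the DigitalOcean and Kubernetes Terraform providers might look like this (names, sizes and version slugs are placeholders, not my actual config; provider credentials are omitted):

```
resource "digitalocean_kubernetes_cluster" "main" {
  name    = "example"
  region  = "fra1"
  version = "1.18.8-do.0"   # pick a version slug offered by DO

  node_pool {
    name       = "default"
    size       = "s-2vcpu-4gb"
    node_count = 1           # the only knob I touch to scale the cluster
  }
}

# kubernetes provider configuration (host, token, CA) omitted for brevity
resource "kubernetes_deployment" "web" {
  metadata {
    name = "web"
  }
  spec {
    replicas = 2
    selector {
      match_labels = { app = "web" }
    }
    template {
      metadata {
        labels = { app = "web" }
      }
      spec {
        container {
          name  = "web"
          image = "registry.example.com/web:latest"
        }
      }
    }
  }
}
```

`terraform plan` on push shows the diff, and applying is what actually changes the cluster.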
I used normal blue/green deployments on self-managed VMs in the past, then worked with beanstalk, heroku, appengine and I much prefer k8s. Yes it's easier on heroku, but try to run 2-3 different containers on the same dyno for dev to keep cost down. On k8s I can run my entire stack on one single small digitalocean $10 VM if I wanted to.
I wouldn't even know what else I could pick that gives me equal flexibility and power?
> I used normal blue/green deployments on self-managed VMs in the past, then worked with beanstalk, heroku, appengine and I much prefer k8s. Yes it's easier on heroku, but try to run 2-3 different containers on the same dyno for dev to keep cost down. On k8s I can run my entire stack on one single small digitalocean $10 VM if I wanted to.
So you already spent about a decade learning all the skills. What the other guy is talking about is coming from dev, not from ops. If you come from dev you don't necessarily know what an ingress or egress is, and might never have done a blue/green deployment, etc. This is all stuff that needs to be learned first. I worked with many, many teams who had zero skills in data center tech before they were moved to k8s full time.
I personally like learning all that stuff. And I love that my job requires it now. But it's more like vim than like node-red, and that was a shock for many people, from engineer to EVP.
I'm also on a 2-person project on DigitalOcean k8s, also very happy.
K8s is kind of messy compared to Heroku, which I don't love, but is also way more powerful and can be more secure. I don't know what I'd use instead of it, exactly as you said.
Also, we run a VPC-only K3s node for some simple internal tools that works great as well.
For those missing Heroku: there exists Dokku [1], a small Heroku-like implementation for container management. It uses the same underlying buildpacks and you get the same comfort as with Heroku. And it's free to use. You can't deploy to multiple host machines though. But for small projects that fit on a single host, it's very nice to use.
Looks great! Considering migrating from dokku. I also came across exoframe recently, which looks lower level but works with your existing docker projects: https://github.com/exoframejs/exoframe
It seems Dokku wins at "easier" since in many cases, you can just push the application code you used for development and the required stack is automatically detected. Adding a database is two commands. No need to know Dockerfiles.
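For a rough sense of what that looks like in practice (app and database names are made up, and it assumes the dokku-postgres plugin is installed):

```
# deploy: push your app and let buildpack detection do the rest
dokku apps:create myapp
git remote add dokku dokku@my-server:myapp
git push dokku master

# "adding a database is two commands"
dokku postgres:create myapp-db
dokku postgres:link myapp-db myapp
```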
It's easier because for many cases you don't even need to search the documentation to know what command to run to fire up a database. You just select it from a GUI list.
Looks interesting - but the installation instructions put me off a bit. Open a port on your server, and don't change the default password `captain42` - then run a cli tool from your dev machine.
I'll look more into it, but it didn't really inspire confidence.
> Also, we run a VPC-only K3s node for some simple internal tools that works great as well.
We do exactly the same thing! We have a one-node k8s for all these dev things that just works. Everything is containerized for local dev anyway so moving it to k8s was just writing the deployment manifest.
On heroku, all of these would be separate dynos (or one glued-together dyno that does everything). On a self-hosted VM we'd have to deal with managing that.
I liked this approach so much that I now have a small 1-node personal cluster that hosts all of my private hobby projects that aren't ready for prime time yet, that were on heroku previously. Costs me only $10 + (persistent storage + IP address if needed)
I feel like I’m witnessing two co-founding colleagues - who sit by each other day in and day out - discover the other’s persona on HackerNews.
Sure, maybe you two (@dvcrn and @arcticfox) don’t work together and don’t know each other, but it’s definitely more entertaining imagining the scenario above.
The point is to keep cost down while a project is in development, then scale depending on needs without having to worry about container distribution and resource utilization.
On dev I don't need a 10 node cluster if I had 10 containers running. One $10 vm is fine.
On prod I can start with a 3-node cluster for these 10 containers, then scale it up depending on traffic and needs while controlling my spending.
Not everyone has thousands of dollars of VC money to throw at hosting
The "adhoc" part is the problem. K8S is standardized and offers high-availability, failover, logging, monitoring, load balancing, networking, service discovery, deployments, stateful services, storage volumes, batch jobs, etc. And it can run self-contained on a single machine or scale out to 1000 nodes.
Why piece all of that functionality together yourself into some fragile framework instead of using an industry standard?
"Why piece all of that functionality together yourself into some fragile framework instead of using a industry standard?"
Quite recently developed "industry standard". Many tools mentioned have been used for tens of years, they work robustly, are well documented and there are lots of people who can use them. I personally would use the term "industry standard" a little bit differently.
You still have to put them all together into some custom solution just for your setup which adds overhead and fragility. New employees will have to learn that instead of using K8S APIs. Deploying new components can’t take advantage of the wide and fast growing ecosystem.
There’s really nothing like the full suite that kubernetes provides.
It's also very empowering for people whose job isn't to run things, but to build the things that are to run. I've worked on a dozen or so bespoke "industry standard" setups, and each and every one had a number of weird quirks, involved learning some new "industry standard" components, and either made it very hard and dangerous for non-ops/devops/infra people to run their things themselves, or had homegrown tooling that pretty much replicated what the k8s API can do, just only a small subset and badly. Some YAML and kubectl are well within what a typical data scientist can be expected to understand, more so if it means they can run their things on a dev cluster themselves, and in a pinch, that data scientist can debug prod issues of their things, because it all works the same way. We have a very useful bot that was built and deployed by someone decidedly not ops while waiting for jobs to finish – a simple k8s deployment YAML is like 50 LOC, with 40 being pretty much standard, et voilà, running bot, without having to build lots and lots of automation in-house, or having to take up ops time to deploy this for them, or having to grok advanced sysadmin-ing first. Used with appropriate caution and safeguards, it's super powerful.
> Quite recently developed "industry standard". Many tools mentioned have been used for tens of years, they work robustly, are well documented and there are lots of people who can use them.
k8s is based on Borg, which has existed for far longer.
That's not what I'm saying. Your point about the maturity of k8s would make sense if it had sprung from nowhere, but k8s encapsulates a lot of the "lessons learned" from a very stable and mature product (even if a proprietary one).
It can be worth it to piece things together yourself. A complex tool can also be fragile if you don't take the time to learn and understand every facet of it.
If you only need certain parts of what k8s offers, building those parts yourself can offer you more stability, control, and insight into what your application is doing.
I think using something like k8s prematurely is more an example of "resume driven development" than the other way around.
Building it yourself doesn't mean building all of it. For example, it's quite easy to get zero downtime deployments with a tiny bit of systemd configuration and the SO_REUSEPORT socket option. That seems easier for a team to understand than "here is kubernetes and everything that comes with it"
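A minimal sketch of that approach (unit name, paths and port are made up): the app binds its port with SO_REUSEPORT, so two instances can listen at once, and a template unit runs one instance per release:

```
# /etc/systemd/system/myapp@.service
[Unit]
Description=myapp (%i)

[Service]
# %i is the release name, e.g. myapp@blue / myapp@green
ExecStart=/opt/myapp/releases/%i/myapp --listen :8080
Restart=always

[Install]
WantedBy=multi-user.target
```

Deploying is then `systemctl start myapp@green`, wait for it to pass a health check, `systemctl stop myapp@blue`; because both instances bind the same port via SO_REUSEPORT, the kernel keeps spreading connections across whichever ones are up.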
That is, however, a very tiny slice of what might be needed.
You need to deliver the application to the machines somehow. You might need to configure the network to reach the application, etc. All quite common tasks.
(And honestly, systemd can bite you just as much if not worse than k8s definitions, because the API is much less cohesive and the defaults like to cut off your hands)
Building one yourself is a good exercise for understanding something, I agree.
The problem with that logic is everyone on your team will need to do it, so you're going to be stuck picking a standard. Should it be yours, should it be mine, or should we both just learn something with a large community behind it?
Nothing is perfect of course, but k8s makes a really good target for CI/CD, which is something you want when you're developing as part of a team. If you're not quite a team yet and you don't know how to bootstrap k8s and CI/CD, then you need to figure out when those types of things are important.
Probably lots of people could stick with a monolith and a VM for longer than they did, but automated testing will save you a fair bit of time if you're not figuring out how to do it at the same time.
k8s is nice, yes. But given its age, it's no more standardized now than MySQL was 15 years ago.
k8s also addresses a very small portion of the market. If you have to scale, yes, you might need k8s. Chances are, you don't. Really.
I stopped counting the (supposedly big) customers that burned themselves on k8s when they logistically didn't need to, and only brought organisational issues on themselves.
"some haproxy" - what is the haproxy configuration? I tried doing this myself and then realized setting up a cheap k8s cluster on DO was way easier. Now, yes, there have been occasional problems, but not since I just let DO handle the whole thing. I can do zero-downtime deploys, cronjobs (why is spawning a pod so wasteful? it spawns it, runs the job, and then kills it?), all with a single "kubectl apply" command that takes a split second to run.
If you're in AWS, you can use their ELB (Elastic Load Balancer) service instead of setting up your own haproxy. I worked on a team that used it for years without any issues orchestrating zero-downtime deployments. It was extremely easy and didn't require any real configuration.
Yeah, we started out with ansible as well.
But some stuff is just way harder, especially zero downtime.
haproxy is fragile, and so is sending files over ssh.
Docker is also a standardized package + repository format.
Before that we used some kind of hacky CDN solution, etc.
It was stuff glued together, written by myself, and I was the only guy who understood it and ever will be.
I use k3s to manage my Plex/NAS/home server machine. Just simply as a deployment tool, it's much easier for me than managing a collection of services deployed by Ansible/Puppet held together by custom scripts and systemd units.
Sadly I don't have many resources I can refer you to (maybe someone else can add some?) but DigitalOcean's Kubernetes guides are excellent, so I'd start with those: https://www.digitalocean.com/docs/kubernetes/how-to/
k8s has a loooot of stuff it can do and reading too many blog posts that go into too much detail can be intimidating, so my advice would be:
1. Create a dummy cluster on digitalocean (or docker desktop with k8s support / minikube), then setup kubectl to connect to it.
2. Start with the difference between resource types: What are deployments, what are pods?
3. Create a deployment manifest that just tries to pull a container from some registry, and apply it with kubectl apply -f foo.yml (see the example manifest below)
4. Play with kubectl to inspect things: kubectl get pods, kubectl get deployments, kubectl describe pod xxxxxx, kubectl logs xxxx
Deployments/services/(pods) are all you need in the beginning for running containers on k8s and exposing them. Of course there are also things like persistent storage, but if your app is built to run ephemerally, you likely have storage/db set up externally already.
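To give an idea of what step 3's manifest looks like (names and image are placeholders), a deployment plus a service to expose it is roughly:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: registry.example.com/hello:latest
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
```

`kubectl apply -f foo.yml`, then `kubectl get pods` to watch the pods come up.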
For running through CI, once you have your manifests you could run kubectl apply directly from the CI if you wanted to. We are using terraform in front of k8s with HashiCorp's hosted state, then run `terraform plan`. If it passes, on dev we automatically apply to the cluster; on prod there is a manual step through the HashiCorp admin UI that needs an apply trigger. Then there are more advanced tools like Spinnaker that can be used to set up more complex pipelines for what to do on push.
Gitlab CI is my weapon of choice here since it’s integrated nicely with Kubernetes. There’s a wealth of tools out there - but the work I do on managing PCGamingWiki is publicly available [1] to give you a starting point. I use Kustomize + kubectl, and when I need to rollback a deployment I can just do it from Gitlab’s environment page.
My experience is the same. I really like automating tedious and error prone parts of deployments, and Kubernetes is the best tool I've found for that. It is a lot to learn about, and there are a lot of missing features that people go to great lengths to build for themselves (see "service mesh" for example), but the core is very solid.
I like loosely coupling things, and Kubernetes is the first ecosystem where that has worked well for me. (OK, it worked great when I worked at Google, but a lot of effort was put into that by thousands of people.) For example, for the first time in my life, I automatically renewed a TLS certificate for my various personal projects. When I started using Let's Encrypt, I just manually ran certbot every 3 months when I got a warning email that my cert was expiring. That is fine, but it's kind of a waste of time. There are tightly coupled solutions to this problem, but they basically require you to totally commit to their approach (Caddy is a good example of this). I use Envoy, but Kubernetes let me not care. I run cert-manager, which just runs in the background and updates my certs when they need to be updated. It's stored as a Kubernetes secret, which can be mounted into my Pod as files. When the secret changes, the filesystem is atomically updated. Envoy can notice this and start using the new certificate. cert-manager doesn't know anything about Envoy and Envoy doesn't know anything about Let's Encrypt. So I'm not locked into any particular decision -- I can change my CA, and nothing about my frontend proxy has to change. I can change my frontend proxy, and nothing about my certificate management has to change. This, to me, is a big deal. I have one less thing to worry about, and I am not locked into any other decisions.
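For a sense of how small the glue is, a cert-manager Certificate is roughly the following (names are made up, the apiVersion depends on your cert-manager release, and it assumes a Let's Encrypt ClusterIssuer already exists):

```
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
spec:
  secretName: example-com-tls   # cert-manager creates and renews this Secret
  dnsNames:
  - example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```

The resulting Secret is what gets mounted into the frontend proxy's Pod as files.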
I also like the flexibility with which I can write programs to manage my infrastructure. All the primitives available to me as a programmer are high-level and well-tested, and are the same things that the CLI tools do. For example, in preparation for HTTP/3, I needed some way to get UDP traffic into my cluster. My cloud provider doesn't provide a load balancer for UDP, so instead I wrote a program that watches changes to Nodes from the Kubernetes API server, and updates a DNS record with the external IP addresses of all the healthy nodes. Then I can instruct browsers capable of HTTP/3 to use that DNS address to attempt an upgrade to HTTP/3, and it doesn't matter that my cloud provider can't do that at a lower layer in the stack. The alternative to this approach is to basically commit to having a certain IP address available, and keep that updated manually. It's fine, but again, one more thing to worry about. I can take this exact code, and it will work perfectly on any other Kubernetes provider -- so I'm not tied to DigitalOcean, and I'm not tied to any manual processes. One less thing to worry about.
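Not my actual code, but a sketch of the shape such a program can take with client-go (recent versions, where List/Watch take a context); the DNS update itself is stubbed out, since that part depends on your DNS provider:

```
// Watch Nodes and collect the external IPs of Ready nodes.
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // running inside the cluster
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	w, err := client.CoreV1().Nodes().Watch(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for range w.ResultChan() { // any node change triggers a re-list
		nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
		if err != nil {
			log.Print(err)
			continue
		}
		var ips []string
		for _, n := range nodes.Items {
			if !nodeReady(&n) {
				continue
			}
			for _, addr := range n.Status.Addresses {
				if addr.Type == corev1.NodeExternalIP {
					ips = append(ips, addr.Address)
				}
			}
		}
		log.Printf("would set DNS A records to %v", ips) // replace with your DNS provider's API call
	}
}

func nodeReady(n *corev1.Node) bool {
	for _, c := range n.Status.Conditions {
		if c.Type == corev1.NodeReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```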
I agree that a lot of people get into a situation where they have to move hundreds of apps and tens of nodes all at once, and under those circumstances, it sure is a lot of work to figure out Kubernetes compared to putting a band-aid on the problem and getting back to work. The biggest problem is that you are probably facing some sort of crisis, and have to decide, with very little experience, whether you want to use a managed offering or build it yourself. Building it yourself is quite complicated. What CNI plugin are you going to use (they all seem both wonderful and horrible on paper)? Why do you have to buy five nodes only dedicated to master tasks, like etcd? How are we going to upgrade to the next version with no downtime? You can go managed, but then you give up a lot of control. Who controls DNS at the node level (fun fact: container pulls don't go through the same DNS stack that the Pod will eventually use)? How can you use gVisor to isolate pods from the host kernel? (You can't! You will have to run it yourself.) Compromise fatigue is going to kill you here -- you have a crisis, and all the options are bad. (I've been there myself. I started using Kubernetes because our Convox Rack was so outdated that we couldn't deploy new software anymore. We tried upgrading things, but it broke things even more. So until we got k8s working, and converted every workload from a proprietary format, we couldn't deploy software. It was frustrating. But the reality is that I wanted to switch a long time ago, so the transition was quite smooth, with no prior real-world experience. And now this problem won't happen again, because tens of thousands of people know how to deal with Kubernetes.)
I also agree with this article that Amazon's managed Kubernetes offering is terrible. EKS was my first Kubernetes experience, and it was clear to me that Jeff Bezos walked into someone's office and said "we need Kuberthingie in two weeks or you're all fired." The team saved their jobs, but that's about it. It's very much the managed Kubernetes solution for people that are locked into AWS already. What people really want is not Managed Kubernetes but "namespace as a service". They just want to kubectl apply something and let a background task provision their machines. They don't want to screw around with RBAC, service meshes, managing the Linux distribution on their worker nodes, managing the master nodes, etc. That service unfortunately doesn't exist. Maybe send me an email if you want to work on something like this, though, because I certainly do ;)
In summary, I get the pain points, but I think they are worth embracing. Things aren't perfect, but you are going to have pain points at all the big breakpoints in infrastructure. Going from 0 applications to 1 application is going to be a major change for your team/company. Going from 1 application to 2 applications is also going to be a major change, but most people overcome this with sheer willpower and tedium until they hit something like 10 or 15 applications, and then are up a creek without a paddle. I recommend embracing future growth early, so that your second application is as easy to run as your first. It's not hard, it's not time consuming, it's just very different from "I'll pop in a Debian CD and rsync our app over."
> What people really want is not Managed Kubernetes but "namespace as a service". They just want to kubectl apply something and let a background task provision their machines. They don't want to screw around with RBAC, service meshes, managing the Linux distribution on their worker nodes, managing the master nodes, etc. That service unfortunately doesn't exist.
I think you just described AWS Fargate and Google Cloud Run.
AWS's version is pretty half-baked. You can't provision or use persistent volumes (so no stateful apps), and you have to use their load balancer which terminates TLS (preventing your software from being able to do ALPN, using Let's Encrypt, supporting HTTP/3, etc.).
Cloud Run just seems like standard "serverless" stuff, nothing to do with Kubernetes. (The downsides involve not being able to run applications that are designed to run on a generic Linux box; everything has to be specially developed. That is fine, and they have open-sourced all the tools necessary to move off of them so you aren't locked in, but it's a bigger paradigm shift.)
I use Cloud Run and nothing is specially developed for it, other than just being stateless. I can take my container and run it on a generic Linux box with zero changes (in fact, I run it in WSL2 on my Windows machine all the time). And I just install the normal Node.js, imagemagick, etc in my Dockerfile, no special builds or flags.
> > We would need to build/staff a full-time Compute team
I'm not sure I get this objection. Presumably in a company like Coinbase there is already an infrastructure team that runs the AWS instances, helps build the AMIs, etc. This team could re-tool and hire some k8s experts to help them make the shift. The promise of k8s (at least one of them) is that you can do more with less ops resource, since the system is so programmable. The idea that you'd need a completely new full-time team doesn't grok for me; that new team should replace another team that's no longer needed, or more likely, involve a combination of hiring some experts and retraining your existing engineers.
I do take seriously the other issues raised RE: security (though one GKE cluster per security boundary is a perfectly reasonable approach and gets you further than you might think).
> Unfortunately, neither I nor the other data engineer was a Kubernetes guy
I think this is a different issue than the OP was raising; in any case, in order for a technology to succeed, you need to have subject matter experts embedded in your dev teams, or a separate function that provides the service to the teams that use it.
In the context of the OP, I think your case would be more like saying "hey data team, you need to build your data jobs into AMIs, go figure it out". Regardless of the technology chosen, it's not going to succeed if the teams doing the work don't know how to use the tools.
> that new team should replace another team that's no longer needed, or more likely, involve a combination of hiring some experts and retraining your existing engineers.
The gotcha there is that it rarely goes that way unless you have a very clear direction from senior management, at least at big corps. In most cases, it's just another thing that gets added to the pile, and it's incredibly difficult to migrate entirely out of whatever the old solution was, so now you end up supporting both.
Note that they already have a team who has built their current compute platform, who built the pipeline to run containers/processes on VMs with auto scaling groups.
It’s great if their own solution works well for them at less cost, but that system didn’t build itself and has non-zero maintenance costs.
> Note that they already have a team who has built their current compute platform, who built the pipeline to run containers/processes on VMs with auto scaling groups.
In that case I would expect Coinbase to write blog posts on how their setup is the absolute best solution to their problem, and not how they refrain from adopting the best solution to their problem because they claim they don't have anyone on the team that is able to pull that off.
> Presumably in a company like Coinbase there is already an infrastructure team that runs the AWS instances, helps build the AMIs, etc. This team could re-tool and hire some k8s experts to help them make the shift.
The key is that there is a lot of additional services and interface points to handle. As the Coinbase article noted, you need extra pieces on top of k8s (storage, service mesh, config/secrets, etc) that need care and feeding. Even if the company moved 100% of their services into k8s there's now more work to be done for the same level of service.
The control points that k8s exposes are not simple "drop in your provider here" bits of integration. You would likely still have the same core providers (ex: EBS for storage) but there is now more code running to orchestrate them, and more access control to implement and verify.
My personal experience (4 years on GKE in production) has been the opposite; running on k8s has abstracted away a number of things that I’d otherwise have to engineer.
Volumes just get attached (using PersistentVolumeClaims), and automatically migrate to a new node if the original pod dies. Vs. having to do some sort of rsync between nodes to keep disks in sync.
Secrets get encrypted by k8s and mounted where needed. I would agree that RBAC is a bit tricky but I don’t think it’s harder than IAM provisioned with Terraform.
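For example, consuming a secret from the app’s point of view is just a volume in the pod spec (hypothetical names):

```
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
  - name: api
    image: registry.example.com/api:latest
    volumeMounts:
    - name: db-credentials
      mountPath: /etc/secrets   # secret keys show up as files here
      readOnly: true
  volumes:
  - name: db-credentials
    secret:
      secretName: db-credentials
```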
If you are not using a service mesh for your VMs then you don’t need one in k8s. (I don’t use one, and rolled TLS to the pod with less effort than it would take to maintain TLS to the VM.) The reason you want a service mesh is to abstract TLS and retry mechanics away from the application layer - i.e. to make your service authors more productive. If you don’t use a service mesh then you are back to managing TLS per-service, which is where you are with VMs already.
There are definitely more services you _could_ run, but in my experience these are additive, I.e. they are extra work, but give you a productivity boost.
Anyway, YMMV and I haven’t operated a system as large as Coinbase, so I could be missing something. Interested in hearing others’ experiences though.
> As the Coinbase article noted, you need extra pieces on top of k8s (storage, service mesh, config/secrets, etc) that need care and feeding.
The problem with that assertion is that it does not make any sense at all. For instance, storage and config/secrets are already supported out-of-the-box with Kubernetes. Even so, complaining about storage with Kubernetes is like complaining about EBS or EFS or arguably S3 in AWS. And if you have strong feelings about service meshes, you really aren't forced to use them.
> Even if the company moved 100% of their services into k8s there's now more work to be done for the same level of service.
There really isn't. For example, if they go with a managed Kubernetes solution then the only thing they need to worry about is actually designing their deployments, and it would be very strange if they couldn't pull that off. That's a ramp-up project for an intern if the solution architecture is already in place.
> You would likely still have the same core providers (ex: EBS for storage) but there is now more code running to orchestrate them
There really isn't. Kubernetes' equivalent to EBS is either setting up a volume or a persistent volume claim on a persistent volume. Just state how much storage you want and you're set.
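Something like the following is the entire ask (name and size are placeholders); on AWS the claim typically ends up backed by an EBS volume via the default storage class:

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```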
> If you want k8s, you really do need people that know how to maintain it on a more or less full time schedule.
What is the alternative to k8s that does not need people to have any technical knowledge?
To me Kubernetes is extremely attractive because it helps me avoid learning cloud vendors' proprietary technologies. K8s is learn once, use everywhere, which is fantastic.
I am a 1-person venture doing everything from JavaScript/React to maintaining backend infra, and I couldn't have done it without k8s.
Plain old linux is the alternative, which is also "learn once, use everywhere", whether it's AWS EC2 or GCP Instances or nearly any machine under the sun.
I don't see how k8s avoids the need to learn about cloud-vendor-specific tech. E.g. searching "aws RDS k8s" gives me a beta GitHub package and a bunch of blog posts on how to configure it right. It doesn't sound like much less work than learning how to use RDS without k8s - read their docs, figure out the API.
Maybe I'm an "old man yelling at new tech" but meh, I just see very little value in k8s because you inevitably need to understand the layer beneath - linux (k8s is far from a non-leaky abstraction imo), PLUS all the complexity of k8s itself. I do see the value when managing a big and complex infra with 100s of servers or something, but very few people have that problem.
How do you run an application on a cluster of plain old linux machines? How do you do load balancing? How do you scale up and down? How do you update your app without downtime? How do you roll back easily if something goes wrong? How do you ensure all your servers are running the same version of dependencies? How do you update those dependencies? How do you replicate your environment if you want to add a new server to your cluster? If your app has microservices how do services discover each other? How do you mount volumes from cloud storage? How do you update configuration? How do you automatically restart failed applications? How do you monitor if your applications are working? How do you make sure the right number of MongoDB replicas are running at all times? How do you view your log files remotely? How do you port-forward from localhost to your Linux server to test your app locally?
These are commonly raised concerns, all of which have answers much simpler than "install this giant distributed system". I'll go ahead and answer them since I take the questions to be in good faith...
> How do you run an application on a cluster of plain old linux machines?
Build a package, install it in an image, run that image in an autoscaling group (or whatever equivalent your cloud of choice offers).
> How do you do load balancing?
An Elastic Load Balancer (v1 or v2), HAProxy, an F5 - this is deployment environment specific (just like in Kubernetes).
> How do you update your app without downtime?
Blue-green deployment, or phased rollout.
> How do you ensure all your servers are running the same version of dependencies?
Build them from a common image.
> How do you update those dependencies?
Update the Packer template that builds that image.
> How do you replicate your environment if you want to add a new server to your cluster?
Start the server from the same image.
> If your app has microservices how do services discover each other?
Consul, or DNS, depending on your appetite.
> How do you mount volumes from cloud storage?
It's a bit unclear exactly what you mean here, but I'll assume you mean either block devices (just attach them at machine boot, or on startup if they need a claim), or NFS.
> How do you update configuration?
Either update Consul and have it propagate configuration, or update a configuration package and push it out.
> How do you automatically restart failed applications?
Systemd restart policy.
> How do you monitor if your applications are working?
From outside - something like pingdom, and some kind of continuous testing. It's critical that this is measured from the perspective of a user.
> How do you make sure the right number of MongoDB replicas are running at all times?
Somewhat flippant answer here: the right number of MongoDB servers is zero. More generally, by limiting the size of an autoscaling group.
> How do you view your log files remotely?
Cloudwatch, Syslog, SSH (depending on requirements).
> How do you port-forward from localhost to your Linux server to test your app locally?
SSH.
So if you'll indulge me -- this list is exactly why a system like Kubernetes is valuable and why I think personally that it contains a lot of essential complexity.
Kubernetes attempts to do all of the above, which is why it's so massive, and I'd argue it's actually less complex than knowing all the tools above -- but it's an equal degree less universally applicable. In this way, it's perfect for the dev who never wants to "do ops", and less so for the dev that already knows ops (or any regular sysadmin/ops person), because they already know all these answers.
Yeah, this is where it kicked in for me. Never mind the fact that all of that is AWS specific and absolutely doesn't help you if you ever move clouds. Great to know all the stuff below, but Kubernetes is a wonderful abstraction layer above that stuff, and it gets better every day.
CF could have become Kubernetes -- it was supposed to be, but it just never got the mixture right (and of course is AWS exclusive).
I find the ridiculous false dichotomy between Terraform for Kubernetes and Cloudformation for more basic infrastructure even more ironic given that I am still the eighth most prolific contributor to Terraform _over three years after leaving HashiCorp_.
> So if you'll indulge me -- this list is exactly why a system like Kubernetes is valuable and why I think personally that it contains a lot of essential complexity.
Yes. I would agree with your statement, precisely as an answer to @jen20.
Some things such as getting stateful systems, HPAs and persistent storage working were a little tricky initially, but a breeze afterwards.
But I do want to mention that you really, really need a team to look after it. Without one, it's bound to become another snowflake.
[edit]: <sigh/> i meant to say stateful when i wrote stateless.
Thank you. I use most of this, I've been using it for years and I just don't talk about it because it's hard to argue when people just want to force an idea that k8s is "really the best way of doing things".
Also, haproxy is one of the most reliable pieces of software I've ever used.
> Thank you. I use most of this, I've been using it for years and I just don't talk about it because it's hard to argue when people just want to force an idea that k8s is "really the best way of doing things".
I wouldn't call it the best way; rather a good way, because Kubernetes does encapsulate the really good bits from a scalability, development, security and reliability standpoint. It's not a panacea, but if you have the team bandwidth to run a k8s cluster, it's definitely worth a look.
I feel like your post describes exactly what Kubernetes and container images would bring to your infra.
If you were to deploy a solution like you described, you would get something more complex than simply running Kubernetes, except worse. I suspect that you believe your solution would be simpler only because you are more comfortable with those technologies than with k8s. The more I read criticism of k8s, the more I'm persuaded that what people call "old boring technologies" truly is "technologies I'm comfortable with".
On top of that, you'd need to separately document everything you do on your infra. The advantage of Docker images over AMI is that you have a file that describes how the image was built. With an AMI, you would need to hope that the guy who created the AMI documented it somewhere (or hope that he has not quit). Same goes for k8s, where configurations are committed into your repository.
At the end of the day, k8s remains a tool that you should use only when needed (and only if you have the capability to run it), but I think you shouldn't discard it simply because you are capable of producing the same result by other means. You get a lot more by using k8s, in my opinion.
> the more I'm persuaded that what people call "old boring technologies" truly is "technologies I'm comfortable with".
My infrastructure runs in Nomad on bare metal. I am by no means opposed to “progress”, I just don’t think Kubernetes is the be-all-and-end-all of infrastructure and would like to have a less hysterical debate about it than the parent to my original post presented.
To be fair though, that's not "Plain old linux" like zaptheimpaler suggested was somehow possible. That's linux plus AWS managed services plus software from Hashicorp. Which is a great stack to be on, but has its own complexities and tradeoffs.
Thanks for this great list of answers. I was startled to see someone vomiting a list of unresearched questions as if they constituted a rebuttal. "Taking the bait" was the right call. Thanks again.
Would you mind pointing out some examples of how you can achieve these on a bare-bones installation in an automated manner? Like, I mean sharing some real examples that I can install and forget. You can deploy the simplest "hello world" webserver in any language of your choice.
>> How do you run an application on a cluster of plain old linux machines?
> Build a package, install it in an image, run that image in an autoscaling group (or whatever equivalent your cloud of choice offers).
How is that any different from running the Docker image on Kubernetes? The image build process is the same, delegating running the image to the platform is the same; the only difference at this point is the name of the platform you are running on. Even if you were deploying ZIP archives to Elastic Beanstalk, if it doesn't work as expected you'd have to debug it as an outsider, and you'd still have to know about the technology. I don't see how it is any different from Kubernetes.
>> How do you update your app without downtime?
> Blue-green deployment, or phased rollout.
How exactly? There are a gazillion ways of doing them; they are rough concepts. What we need is a reliably working setup that requires as little effort from us as possible, and there are absolutely no standards on how to do them. Are you going to use Ansible? Maybe just SSH into the node and change a symlink? Maybe some other way?
>> How do you replicate your environment if you want to add a new server to your cluster?
> Start the server from the same image.
How do you do that? You'd either do that manually on AWS console, or build some tooling to achieve that. If you were to do that via the autoscaling options the vendor is providing, then it is no different than Kubernetes: if that doesn't work then you'd have to debug regardless of the platform that is managing the autoscaling.
>> If your app has microservices how do services discover each other?
> Consul, or DNS, depending on your appetite
What is the difference between trying to learn how Consul handles service discovery vs how Kubernetes handles it?
>> How do you mount volumes from cloud storage?
> It's a bit unclear exactly what you mean here, but I'll assume you mean either block devices (just attach them at machine boot, or on startup if they need a claim), or NFS.
Would you mind sharing examples that are not vendor-specific and that'd be easily configurable in a per-service fashion, hopefully without writing any code?
>> How do you update configuration?
> Either update Consul and have it propagate configuration, or update a configuration package and push it out.
How is this any better than pushing your changes to Kubernetes? I personally don't know how Consul works or how to update a configuration package and push it out somewhere; I don't even know where to push it. In this context, learning those is not any better than learning how to do them on Kubernetes.
>> How do you automatically restart failed applications?
> Systemd restart policy.
So, this means that you'd need to learn how to utilize Systemd properly in order to be able to start running your application and write the configuration for that somewhere, and also deal with propagating that configuration to all the machines.
>> How do you monitor if your applications are working?
> From outside - something like pingdom, and some kind of continuous testing. It's critical that this is measured from the perspective of a user.
The question was not really that. Tools like Pingdom won't help you if an internal dependency of your application starts failing suddenly. You need a standardized solution for gathering various standard metrics from your services - things like request rate, error rate and request durations - as well as for defining custom metrics on the application level such as open database connections, latencies of dependencies, and so on. You will definitely need a proper metrics solution for running any serious workload, and in addition you'll also want to be able to alert on some of these metrics. There is no standardized solution for these problems, which means you'll need to roll your own.
>> How do you view your log files remotely?
> Cloudwatch, Syslog, SSH (depending on requirements).
The proper alternative to the Kubernetes solution is CloudWatch, and even then the simplicity of `kubectl logs <pod_name>` is still better than trying to understand how CloudWatch works.
>> How do you port-forward from localhost to your Linux server to test your app locally?
> SSH.
This is not a trivial setup. Let's say you have a service A running remotely but not exposed, meaning you cannot reach it from your local machine, and you'd like to be able to use it while developing your service B locally. How would you set this up in an easy way?
The points regarding the images are the same points as any Docker image, so it really boils down to the choice and one doesn't have an advantage over the other in this context.
What I am trying to say is: there are quite a lot of problems when running any kind of serious workload, and there are thousands of alternative combinations for solving them, and they were solved even before Kubernetes existed; however, there was no standardized way of doing things, and that's what Kubernetes gives people. There are definitely downsides to Kubernetes, but pointing at specific examples like these doesn't help, as they are just names of individual pieces of software that also have a learning curve and all operate differently. I do wish there was a simpler solution, I wish Docker Swarm had succeeded as a strong alternative to Kubernetes for simpler cases as it is brilliant to work with locally, and I wish we didn't have to deal with all these problems, but it is what it is.
As of today, I can write a Golang web application, prepare a 10-line Dockerfile, write ~50 lines of YAML and I am good to go: I can deploy this application on any Kubernetes cluster on any cloud provider and get all the stuff defined above automatically. Do I need to add a Python application alongside it? I just write another 20-line Dockerfile for that application, again ~50 lines of YAML for the Kubernetes deployment and bam, that's it. For both of these services I have automated recovery, load balancing, auto-scaling, rolling deployments, stateless deployments and aggregated logging, without writing any code for any tooling.
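To put a number on the "10-line Dockerfile" claim, a typical multi-stage build for a Go web app is roughly this (image tags and the ./cmd/server path are placeholders):

```
FROM golang:1.14 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

FROM alpine:3.12
COPY --from=build /app /app
EXPOSE 8080
ENTRYPOINT ["/app"]
```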
The difficulty of all that stuff on plain old Linux is overstated and the difficulty of doing it on K8s is understated. And I agree with the grandfather’s point that if you don’t understand how to do that on plain old Linux you will struggle on K8s. K8s is OK for huge enterprises with tons of engineers, but somehow K8s advocates make it seem like learning and operating nginx on ubuntu is this huge challenge when it usually isn’t.
I think the point is more that the complexity of a cluster of VMs that manage the lifecycle of containers is often overkill for a service that would work with nginx installed on Ubuntu, and that oftentimes the former is sold as reducing complexity and the latter as increasing it.
No, the assertion that nginx on ubuntu is equivalent to kubernetes is mind-numbingly wrong, for starters for being entirely and completely oblivious to containers. The comparison isn't even wrong: it simply makes no sense at all.
And no, being able to run software is not equivalent to Kubernetes. It's not even in the same ballpark of the discussion. You are entirely free to go full Amish on your pet projects, but let's not pretend that managing a pet server brings any operational advantage over, say, being able to seamlessly deploy N versions of your app with out-of-the-box support for blue/green deployments, backed by a fully versioned system that lets you undo and resume operating with a past configuration. You don't get that by flaunting iptables and yum, do you?
If anyone needs to operate a cluster then they need tooling to manage and operate a cluster. Your iptables script isn't it.
DNS round-robin sends inbound traffic to 2 haproxy servers, which proxy traffic on to the rest of the cluster. Scaling means "add another box, install your app, add an haproxy entry". Service discovery is just an internal-facing haproxy front. If you must have "volumes from cloud storage", you use NFS (but if you can help it, you don't do that). Updates, restarts (including zero-downtime courtesy of draining the node out of haproxy), etc. are all quite doable with SSH and shell scripts. You run an actual monitoring system, because it's not like k8s could monitor server hardware anyway. Likewise, syslog is not exactly novel. I... don't understand why you're port forwarding? Either run on the dev cluster or on your laptop.
So yes, you'd need a handful of different systems, but "k8s" is hardly a single system once you have to actually manage it, and most of the parts are simpler.
Having done exactly that in the past, both by hand and with configuration management (including custom scripting that synchronized HAproxy configs etc.), I would say that I can do all of that much, much simpler in k8s.
Installing an application and routing it with load balancing, TLS, etc., with support for draining, is really simplified - something that used to require an annoying amount of time (it got better as Lua scripting was extended in HAproxy, but by that time I also had k8s).
Resource allocation, including persistent storage (including NFS), is so much easier it's not funny; it makes those old years painful to think about. Syslog is not novel, but getting all applications to log the same way was always a PITA, and at least with containers it's a bit easier (I still sometimes have to ship log files out of containers...).
As for monitoring, that's one of the newer and more advanced topics, but it's possible to integrate more complex server monitoring with k8s - it already provides a lot of OS/application-side monitoring that really makes it easier to set up observability - and now there's a quite simple way to integrate custom node states into kubernetes, giving you a quick at-a-glance way to check why a node is not accepting jobs, integrated with the system so a health check can actually flag that the node is in trouble and should not take jobs.
The easy answer is that you don't need to run your service distributed across 1000 VPS instances.
A handful of dedicated machines is enough. For example: stackoverflow.
None of these are difficult to answer. You can set up automated deployment/rollback however you want. You don't care about the dependencies because your code is compiled and Linux has binary compat. You don't need to split your app into 1000 unmanageable interdependent microservices. You have enough disk space on your dedicated machine or use NFS. You set up your systemd service file to auto-restart your service. Etc etc.
When you have k8s as a hammer, everything looks like a nail.
“Plain old Linux” really isn’t an alternative to K8S though. You would need a load balancer, service discovery, a distributed scheduler, a configuration management system (K8S is a very strong alternative to building things around Ansible IMO). You can do all of those things without K8S, of course, but not with “plain old Linux” (what would that be anyway? GNU coreutils and the kernel? Vanilla Debian stable?)
Maybe I’m way off, but aren’t all of those things required for k8s? Ingress controllers, etcd cluster, terraform modules, storage configuration, etc...
I guess if you pay for a hosted service a lot of the control plane is taken care of.
I’ve used k8s in orgs where it’s a great fit and really fills a need, but it is considerably more complex than running a web service balanced across a couple of machines, and it definitely requires a lot more upfront complexity (as in, you have to solve problems before they are actually problems).
> Kubernetes is designed to be cloud and vendor agnostic.
But it’s not. Connecting to EKS is completely different from connecting to GCP. Setting up worker nodes is completely different too, oh and load balancers.
We've migrated from Heroku to EKS using Terraform to set it up.
We did a small PoC for Azure, and within a day we had a small cluster and one of our apps running. The app terraform code required only 1 variable change.
Sure, it's far from drop & replace, but "completely different" is a huge misrepresentation in my experience. Running multi-cloud seemed quite doable.
I feel like what 98% of companies need is really just barebones linux with some good documentation on how to spawn new nodes.
To use k8s you need to know linux anyway, but to use linux you don’t need k8s knowledge. Most things are as easy to set up with linux, and the things where k8s really shines are not needed most of the time.
What I really wonder is where you learned k8s, and which parts you learned the most? It seems huge and I would love to be in your position.
K8s is a nice api on top of gnu/linux. Want an iptables rule? Write a yaml (network policy). Want storage for your app? Write a yaml (persistent volume), etc etc.
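For example, the "iptables rule" case is a NetworkPolicy along these lines (labels are made up): only pods labelled app=web may reach the database pods:

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-web
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web
```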
For people who already know Linux, kubernetes comes naturally because it is pretty obvious.
But indeed, by experience, many companies can go to "unicorn scale" with two or three boxes.
Couldn't you get nearly the same behavior using some basic Ansible playbooks? My impression was that the killer feature of k8s was scaling, automatic failover, etc., although to be fair it's been several years since I last looked into it.
Think of kubernetes as a cluster operating system. Instead of dealing with vms, you deal with your applications directly, without worrying about where they're running. It gives you a unified view and ability to manage a distributed system at the level of the application components.
Ansible can't give you anything like that. Even if you use Ansible to automate something like network setup, the commands and modules you use will be different between e.g. cloud providers. Kubernetes gives you a consistent abstraction layer that's almost unchanged across providers, with the exception of annotations that are needed in some cases for integration with certain provider services.
For me k8s kinda extends what you can do with ansible.
For example, if you use just kubelet (the main daemon) without the api, and drop some static yaml files (à la unit files) under /etc/kubernetes, it will be something like systemd. No big deal so far, but using the kubernetes api (just another daemon) allows you to run all the things anywhere.
I mean, you terminate one computer, k8s will figure out that the program should be running and it will be started on another computer (including detaching/attaching the block device from/to computers, that kind of thing).
There is not much secret to it.
For me it really feels like a (well designed/integrated) API on top of standard Linux technology (ipvs, netfilter, mount, process mgmt, virtual ip, etc) via yaml, running your stuff on 1..N computers that you manage as one big computer :)
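A sketch of that kubelet-only mode (paths and image are illustrative): point kubelet at a manifest directory with `--pod-manifest-path` and drop pod YAML there, much like dropping unit files for systemd:

```
# kubelet started with --pod-manifest-path=/etc/kubernetes/manifests
# /etc/kubernetes/manifests/web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx:1.19
    ports:
    - containerPort: 80
```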
I go in and out of development and ops and devops. 20 years.
To me as a developer heroku is the gold standard.
Docker-compose makes sense.
Kubernetes and helm are a giant soul-crushing WTF when it comes to getting anything done.
Now I’ll go and figure it out. Angular + rxjs + typescript was a similar experience after I took a break from front end for a few years. A few months of pain, then it starts clicking.
but this just seems insane for a basic web app.
So many different tools needed to get it going.
Tutorials all have many steps that don’t apply, or use other odd pieces swapped in.
Many tutorials are unfortunately bad, they actually got worse over time.
Oooold presentations (think 2016 and older) tended to talk more about basic building blocks and how they interacted, especially the design involving resources and their controllers working in a loop of "check requested state, check real state, do changes to implement requested state", and how those loops went up from single pods, through ReplicationControllers (now ReplicaSets), then Deployments, etc.
This is the exact problem I ran into on rxjs. Every tutorial was badly out of date. Even ones written six months earlier.
You had to become a master of figuring out the direction the community was going and staying on the bleeding edge.
Some people found this fun. I have many times enjoyed it myself. But when I want to get a boring feature built and handed off to the jr developers so I can spend some time with my kids, it’s frustrating.
The most annoying thing, to me, is that Kubernetes doesn't even move fast enough!
The old presentations? The ones that remember k8s 1.0? There's one major change (the move from ReplicationController to Deployment+ReplicaSet), and even that doesn't invalidate ~90% of the material, because the core stuff is about how controllers work!
Yet it seems more and more common to me that people don't learn the core mechanism of kubernetes unless perchance they got there writing CRDs :|
> Couldn't you get nearly the same behavior using some basic Ansible playbooks?
No. For starters, Ansible playbooks don't allow you to dynamically deploy and redeploy a set of containers onto whichever node happens to have the most available computational resources (such as bandwidth) at a given moment, nor do they respawn a container that just happened to crash.
Arguably Ansible only implements a single feature of Kubernetes: deploying an application. Even then, while with Ansible you need to know and check all the details of the underlying platform, with Kubernetes you essentially only need to say "please run x instances of these containers".
Ansible gets you like 80% of the value of Kubernetes, yes. But you have to allocate applications to hosts manually, your configuration is less declarative, and what do you gain?
What about service discovery? What if you need to power your node down for maintenance? What if your app goes crazy and consumes so much memory it OOMs some other services? Reliable multi-service deployments ARE HARD and always will be, regardless of what tech you use to achieve them. Of course you can use Ansible and do it the old-school way, and waste lots of underutilised VMs and custom idempotent scripts. But K8S solves many of these challenges in a standardised, data-driven way. It has a steep learning curve, but once it's learned, all resources/limits are properly set, logging is in place, etc., it works nicely.
Firstly, tell me more about your 1-person full-stack venture; but secondly, how come you, with barely any time for sitting down, can use k8s happily while it falls over for others? I am struggling to see the truth among the comments here :-(
TBF with a completely greenfield project and managed K8s (GKE or EKS) you can absolutely get a pretty well set up infra very quickly if you are willing to learn how to do so.
I often get the feeling a lot of the negativity comes (rightfully so) from trying to replicate a currently existing project onto kubernetes. This is true of almost any paradigm - try replicating a Java EE monolith in Erlang and you are going to have a lot of problems. The big thing to note is that starting a project in Erlang very well might solve the problems that a Java EE project ran into, but that is because they were able to solve them at the ground floor; just popping a Java EE project with all of its architecture into an Erlang project will probably end up in a worse spot.
I think that this is what often happens with k8s as well - if you or your company have a currently working implementation that isn't on k8s, of course you won't be able to just easily plop it into a k8s cluster and everything be all well and good, but I think the problem is that people are equating that issue with k8s itself, which is a completely different paradigm.
> managed K8s (GKE or EKS) you can absolutely get a pretty well set up infra very quickly
And then tear your hair out when something doesn't work for some reason and root causing it requires learning a stupid number of layers. k8s is easy until it goes wrong.
Not the GP, but I honestly couldn't tell you. A lot probably comes down to tooling, the applications you are deploying, security requirements, etc., as well as how familiar you are with k8s itself.
I migrated PCGamingWiki from running on some Hetzner boxes to DigitalOcean Kubernetes in a few days of work creating Dockerfiles and k8s manifests. I run a Kubernetes cluster at work fairly hands-free that hosts applications critical to our billing operations, and developers on the team deploy new applications with little or no support. Any of the issues I've hit are an artifact of migrating legacy applications not designed to run in more-or-less stateless environments, which is why the PCGW Community site still runs on its own server (Invision Community sucks).
I really don't see all the issues people have that aren't due to a mismatch of application design vs target environment (and no, it's not monolith vs microservice - monoliths run just fine on k8s; but you should be designing your application with a 12-factor environment in mind) or a misguided notion that you will be drowning in YAML hell (it's real, but you can manage it - and it's directly related to the complexity of the services you are deploying).
I can totally agree with this. Most of my customers that I see struggling with K8s are those that haven't internalized 12-factor principles: not just heard, read or understood, but really internalized. It is unfortunate that K8s talks / blog posts / articles do not focus enough on this prerequisite.
Seems like you are a thoughtful engineer who can also make sure you don't make any fundamental design flaws while building these systems well. Whenever I have seen Kubernetes fail, it's often because the engineer(s) who built it were not thoughtful at all and often didn't fully understand what they were doing. Perhaps k8s's failing is that it makes people who don't know enough think that they do.
I mean, I’m far from perfect - the trick has always been to KISS. I don’t use istio, it’s absolute overkill for my needs. I use nginx-ingress because it fits the bill; I know nginx, as do enough other people that they could exec into a pod to debug it. I don’t run stateful applications that aren’t prepared to have servers randomly vanishing, because it takes a LOT of work to get these running in-cluster. I don’t use public helm charts because they often suck, and making your own container is something you can do quickly if you were able to deploy the software on a traditional server. Every choice I make is done with the day 2 operations in mind, not what is hot, what gets initially deployed fastest - but what makes it so I can touch the thing as frequently as possible.
PCGW is a great example - installing a new Mediawiki extension, changing a config file, upgrading to a newer MW release is just updating a file or a git submodule and committing. I don’t get paid thousands a month to manage the site for the owner, so I make my time spent as efficient as can be done.
I think this is a big failing of the DevOps movement as a whole (at least what DevOps became in practice — devs doing ops) which results in things like passwordless mongodb exposed to the internet...
> I think this is a big failing of the DevOps movement as a whole (at least what DevOps became in practice — devs doing ops) which results in things like passwordless mongodb exposed to the internet...
Hardly. If anything at all, it tells you about the _team_ and/or the culture of the organisation. In any DevOps/SRE/Opsec culture worth its salt, an immediate blameless postmortem analysis would be performed to help with premortem analysis in the future.
DevOps is not about exposing unsecured endpoints. You've got it all wrong son.
I'm not your son, nor am I talking about what DevOps "is about", but about what it became in practice, which you would have understood had you not rushed to reply in the most condescending tone you managed to invoke.
> I'm not your son, nor am I talking about what DevOps "is about", but about what it became in practice, which you would have understood had you not rushed to reply in the most condescending tone you managed to invoke.
Fair enough. I am letting you know, though, where you got mixed up - that is not because of _what DevOps has become in practice_. That is precisely because of failings and shortcomings in the team culture and/or the organisation that practices DevOps.
This is pretty typical when a product is still climbing the adoption curve. I've helped small companies set up (managed) k8s clusters and migrate their apps to them, and when you know what you're doing, it's a super smooth experience that's basically all upside.
But, if you're approaching it for the first time with no assistance, there are lots of things that can trip you up, and lots to learn. That's not a reflection on k8s, it's just the nature of the large set of problems it's solving.
K8s is succeeding because it's very well designed, has a large and diverse ecosystem, and solves a set of important problems that very few other tools even try to tackle. Apache Mesos perhaps comes closest, but it's not quite as pragmatic, and its adoption level reflects that.
Also, because of k8s' scope, many people may not fully appreciate the range of problems it's solving, seeing it through the lens of their own background and focus.
People who don't like - or don't "get" - declarative systems tend to spend an inordinate amount of time and effort fighting them. I've seen the same thing with a declarative build system (maven), or with adopting an ORM - if you're willing to work with the tool then it will save you a lot of effort, but if you're determined to do things your own way then you can make it almost arbitrarily difficult.
Might also be the amount of experience a person has.
I have done software development for ~15 years, switched now to infrastructure, and built a GKE cluster. It was awesome: very logical, nice and easy to use.
Now I read stories from Coinbase and don't get it.
> What is the alternative to k8s that does not need people to have any technical knowledge?
Paying for a managed provider. Heroku, Elastic Beanstalk, GAE, GCP, Fargate, etc. You push some buttons, they manage your cluster/services.
People still think they can get a free lunch by downloading some free software. If that were so, Windows and Mac would be dead and Linux would be the only desktop OS. But good news: I hear 2030 will be the year of the Linux Desktop!
I'm in the same boat opting for docker-compose instead. docker-compose is much simpler to manage. Obviously it doesn't have feature parity with k8s but docker-compose does the basics well.
An inexpensive VPS runs compose well with more resources at a lower cost than the managed costs of k8s.
You just run scp with the compose file and run docker compose down; docker compose up
Bonus points if you just mount the compose file on an NFS share.
I've wasted enough man months on kubernetes that unless you tell me to manage 1000 nodes this approach will never cost me more time than the time I spent using and learning kubernetes.
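For what it's worth, the whole setup being described can be a single compose file along these lines (service names, ports and image are made up; the commands in the comments are just the scp + down/up flow mentioned above):

```yaml
# Hypothetical docker-compose.yml for a small VPS deployment.
# Deploy is roughly:
#   scp docker-compose.yml user@host:/srv/app/
#   ssh user@host 'cd /srv/app && docker compose pull && docker compose down && docker compose up -d'
services:
  web:
    image: registry.example.com/web:1.4.2   # placeholder image
    ports:
      - "80:8000"                           # host port 80 -> container port 8000
    restart: unless-stopped
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    volumes:
      - redis-data:/data                    # persist cache data across restarts
volumes:
  redis-data:
```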
Agreed. Good support for compose currently, and swarm is usually overkill. With a bit more elegant HA functionality, docker-compose could be the go-to for many more. The comment below claiming he doesn't understand your comment reminds me of all of those who will say "well it's not FOR production" - seems more like superstition than science.
It's a pity that docker swarm did not make it. It wasn't perfect but it was a lot simpler to setup and manage than kubernetes.
If you can get away with it, vanilla docker hosts running docker-compose provide most of the same benefits at a fraction of the cost. For most startups, that's a great way to avoid getting sucked into a black hole of non-value-adding devops activity. You lose some flexibility, but vanilla ubuntu hosts with vanilla docker installs are easy to set up and manage. We used packer and ansible to build amis for this a few years ago with some very minimal scripts for container deploys.
No matter what they claim, it's really not supported in the sense most commercial oss projects are. We finally switched off after a minor version introduced a segfault when adding nodes in certain conditions, and the issue was unfixed after 5 months.
It exists, but in terms of people using it or it being actively developed, it's dead as a doornail ever since Docker was more or less forced to also support kubernetes and basically gave in to the reality that world + dog was opting for kubernetes instead of swarm.
They never really retired it but at this point it's a footnote in Docker releases.
I've not actually encountered it in the wild in four years or so and never in a production setup.
Docker Swarm doesn't support multiple users and there is no remote-accessible API. Nomad doesn't implement network policy (Nomad Connect sidecars may be an option but sidecars bring new problems). Just learn K8S, Helm, Terraform & Terragrunt properly. Use proper tools (k9s, Loki, wrapper scripts around kubectl). Stop finding excuses for not using K8S. Stop putting proxies everywhere (that Istio/servicemesh bull*hit) and use Cilium CNI instead.
I'm also a fan of swarm, and still use it in production.
It's just so damn simple in comparison to k8s - basically, if you know Docker Compose, you know Docker Swarm.
I appreciate it doesn't have the full power of k8s, but it has what most apps need: simple deployments, zero-downtime updates, distributed configs and secrets.
You can't make the complexity disappear. Kubernetes just offers a very standardized, stable and relatively polished way to handle it.
The other alternatives, like running your own orchestration system, are just as complex if not more so, although you might be more familiar with it since you built it.
That being said, many companies don't need any of it in the first place for their scale, and that's probably the biggest issue with K8S today.
> That being said, many companies don't need any of it in the first place for their scale, and that's probably the biggest issue with K8S today.
Yes. I helped push out a k8s platform where I work, and it's been running well in production for ~3 years.
We most definitely did not start with "we want to deploy k8s". We started with "we're being asked to meet certain business requirements", which lead to "we will need to change how we do some of our development and deployment", which lead to "our platform will require these characteristics."
The easiest way to get those turned out to be buying a packaged k8s from a vendor (OpenShift, not that it matters; there's plenty of options).
Most of the pain with k8s seems to come down to people wanting to polish their CVs (by "having done k8s"), or people who sneer at packaged/hosted solutions because they want to build a cottage industry of building k8s services from scratch because it's their idea of a good time.
Unsurprisingly both take orgs down the route of pain, money, and regrets. Oftentimes the people driving this decision then prefer to say the problems is k8s, or containers. Or microservices or whatever fad it was they were chasing without understanding whether it would be a good idea.
It's impossible to make essential complexity disappear but it is certainly possible to reduce incidental complexity. Most software is much more complex than it needs to be and Kubernetes is no exception.
Before k8s every serious shop automated the crap out of their infra. Jump/kickstart recipes, rolling cluster patching that split RAID mirrors before applying, blue/green deployment scripts to tickle the loadbalancer, cron jobs to purge old releases ...
That stack is super complex and utterly bespoke to the company.
With k8s it’s standardised and usually better quality.
It's on a path to being standardised, but not there yet: etcd vs. others, different ingress controllers, providers replacing most of the network parts, storage being bumpy/not so standard, and deploys done via kubectl apply/helm/operator.
I would really appreciate a more mature ecosystem.
I'd say it the other way; k8s has to do all the things that another management system would have to do, but sufficiently tied together that you can't ease into it as needed.
I guess that depends primarily on whether you're installing and operating K8S yourself. Use something like GKE and it's a very seamless experience, with AKS getting pretty good and the rest being rather crummy.
Once you have a managed cluster, deploying apps is fairly easy. A single container/pod is a 1-liner and you can work your way up from there.
That's fair; all of this is heavily influenced by your operating environment. If you can run GKE or if you have an ops team to deal with that stuff, then yeah k8s is great. Unfortunately, I'm part of the ops team, and our company is too small to have a dedicated k8s team and too low-budget to (likely) do well with a managed service (we get absurd value per money out of bare metal servers, which is very much a tradeoff that is sometimes painful). So to me, k8s looks like a very iffy tradeoff. Bigger company, bigger budget, different constraints? Yeah, k8s would be great.
> we get absurd value per money out of bare metal servers, which is very much a tradeoff that is sometimes painful
What do you find most painful about your use of bare-metal servers? The thing that I like most about a hyperscale cloud provider is the level of redundancy, including even multiple data centers per region, and their built-in health checks and recovery (e.g. through auto-scaling groups) based on that redundancy. With bare-metal servers, I'd have to cobble together my own failover system for the occasional time when one of those servers goes down or becomes unreachable due to a network issue. And of course, I'd probably find out that my home-made failover system doesn't actually work at the worst possible time.
I don't know that it's a single big thing; it is indeed many small things that we have to manage ourselves. We backup the databases with our own scripts, we failover manually (I miss RDS), we use an overlay network because no VPC, and deployments involve ansible running docker-compose. There's basically no elasticity; we provision massive bare metal servers with fixed memory and disk installed. But, it is dirt cheap, so we manage, and all the pieces are small and easy enough to use.
k8s and even docker are trying to solve a problem not many people will ever face. However, being the sexy new things, loads of people get sucked into integrating them into their stack from the word go.
Being reluctant to adopt new technology unless you really, really need it might be a more sensible thing to do.
> This actually was a very real problem at my current job. The data pipeline was migrated to k8s and I was one of the engineers that worked to do that. Unfortunately, neither myself (nor the other data engineer) was a Kubernetes guy, so we kept running into dev-ops walls while also trying to build features and maintain the current codebase.
I've experienced this as well. At the last large company I worked at we had a Heroku-like system to run our apps on. That was deprecated for a Docker-based solution. And then _that_ has now been deprecated for a Kubernetes offering. We just ran some Python web apps–we didn't want to have to learn and support an entire system. And here's the thing, most big tech companies I've worked with are all made up of these small, "internal" services, that just want a simple place to run their services.
I've used k8s on an early-phase project I later moved off from, so I don't have much experience with it, but I got the impression that simple scenarios worked as documented. You already had a docker-based stack, so it sounds like hosted k8s shouldn't be that far off from what you were running.
What issues did you encounter? I had to spend a week working through it to get familiar with everything, but I wasn't experienced with docker previously (I've played with it a few times but never had to set up stuff like a custom registry, versioning, etc.)
I did end up in situations where the cluster was FUBAR but I eventually figured out that everyone just recommends rebuilding the cluster over diagnosing random stuff that went wrong during development.
Been there. Simple python django app. Max 20 APIs. Expecting 100 requests per month. But POs and directors wanted shiny DevOps tools. Problems with typical MNCs. Every quarter manager/director comes with some new hype.
If it's a hosted k8s, all you'd do is containerize your application, then create a deployment that pulls that container and exposes it through a loadbalancer. The container is the same if you'd push it to herokus container registry or beanstalk
On k8s, you write those env variables into the deployment manifest that describes the container. If they are sensitive, pack them into a k8s secret and add that to the container instead.
You can use the external services the same way if they are already setup with things like DATABASE_URL that holds a postgres connection string to amazon RDS, etc. Running k8s doesn't mean you have to move your entire stack into k8s, you can still use hosted services just fine.
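Roughly, the pattern described here looks like the sketch below (names and values are illustrative): plain config goes straight into the Deployment's env, and the sensitive DATABASE_URL is referenced from a Secret.

```yaml
# Hypothetical example: non-sensitive config inlined, sensitive values in a Secret.
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  DATABASE_URL: postgres://user:password@example-rds-host:5432/app   # e.g. an RDS connection string
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:2.0.0   # placeholder image
          env:
            - name: LOG_LEVEL                 # non-sensitive config in the manifest
              value: "info"
            - name: DATABASE_URL              # sensitive value pulled from the Secret
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: DATABASE_URL
```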
In all my time with k8s, I never used Helm. If you don't need it, don't use it and keep it simple. k8s can do a ton of stuff, but in reality you barely need more than the basics.
> If you want k8s, you really do need people that know how to maintain it on a more or less full time schedule.
I think this is roughly similar to saying "if you want linux, you need people that know how to maintain it." Which is to say, you can create an architecture where this is absolutely true, but it doesn't need to be true.
The big issue with K8S right now isn't K8S; its that there aren't (big, well established) solutions like Heroku or Zeit or whatever, for K8S, where you don't need to worry about "the cluster", just like those solutions don't make you worry about Linux. K8S really is two parts; the API and the Cluster. The API is the more valuable of the two.
And, you know, maybe it won't ever get there. Heroku and Zeit solve strikingly similar problems to K8S. Maybe K8S just is a platform like that, but for enterprises who want to home-grow, and maybe most companies shouldn't worry about it. But I think the platform, and thus the community, simply needs more time to figure out where it makes sense.
Most companies shouldn't touch K8S. You'll probably regret it. But, to your second point: AWS literally has nothing beyond EKS/ECS + Fargate which approaches a Heroku-like service. Beanstalk is supposed to be that, but its really just a layer on-top of EC2 which doesn't touch the "ultra low maintenance" of a Heroku, or Zeit, or App Engine. So if you're on AWS, and you want to use their other excellent managed services, you either go outside AWS, or you'll go EKS, or you'll end up trying to in-house something even worse.
> I was one of the engineers that worked to do that. Unfortunately, neither myself (nor the other data engineer) was a Kubernetes guy, so we kept running into dev-ops walls
This seems like an organizational problem to me. "DevOps walls" sounds like there is a "DevOps team" (a famous DevOps anti-pattern) and there are knowledge silos between development and ops, which ironically is the exact opposite of what DevOps is about. What this also means is that developers need to be aware of the environment in which their services run and should be very familiar with how k8s works, why not take that as an opportunity to learn?
But this makes no sense. Why would a cloud operator even bother with k8s if the customer is only interacting with functions? It’s much more efficient to bypass Kubernetes and run directly on the cloud’s native system (like borg).
I think you’re right though - Kubernetes is a massive red herring, we should ideally be running containers/functions on as close to bare metal as possible. Fundamentally, VMs are the wrong abstraction if all your code is containerized.
Joyent’s triton is the closest thing we have to this... I really don’t know why AWS/Azure/GCP haven’t cottoned on to this; it would massively reduce their COGS and improve our developer experience.
> Why would a cloud operator even bother with k8s if the customer is only interacting with functions?
Because there are various "tiers" of users, some companies (like coinbase) could actually leverage K8s in their Codeflow/Odin project and prevent a lock-in. But a regular developer looking to just "get things shipped" isn't meant to waste his/her time with pure K8s.
> Kubernetes is a massive red herring
We agree but on a different note. The biggest selling point of K8s is its API design. The entire industry needs to converge on one "de facto" standard of packaging and deployment. Google's Cloud Functions is a perfect example of this. The API is based on K8s and Knative, but under the hood it actually runs directly on GCE rather than GKE. What happens underneath is hidden from the business developer; you only care about the data in your yaml and your docker image.
>> I really don’t know why AWS/Azure/GCP haven’t cottoned on to this
Conflict of interest. If k8s yields the most revenue, why would they try to decrease that? If some customers are so delusional that they go for an inefficient abstraction, so be it. Btw, this is my experience with k8s too: people use it because it is a trend. Not a single company / developer could justify using it to me over leaner resources like EC2, ASG, cloud native resources, etc.
How about companies who don't want to rely on cloud and want on-prem as much as possible? Or those who don't want to be tied to a single vendor? Apple is already getting rid of their Mesos based PaaS and moving to Kubernetes.
The 'needing a team' aspect of Kubernetes sounds remarkably similar to conversations I had like 8 years ago when Openstack was the new hotness.
We went with ECS and have been happy with it. It plays well with all of AWS's other products and features. For the few things we have to run On-Prem we use Docker Swarm in single node mode and it works well (albeit missing a few features like crons from Kubernetes).
Yep. I remember the Openstack consultants swarming into the office trying to sell it as a solution and not a giant overhead / problem. Luckily they failed the POC, so we did not need to waste our lives on a non-issue trying to solve it with a non-solution. Now it is k8s' time to do the same. We will see how far this buzzword train is gonna go.
TBF I like k8s as a service or platform. Once metallb came out to solve the "bare metal load balancer" problem it could deliver the entire product I was looking for.
My big issue with it is the underlying complexity and house of cards nature of running your control plane as sidecar containers on your runtime infrastructure.
You CAN set it up and run the control plane out of band, but last I looked there wasn't step by step documentation for doing so. I also couldn't find anyone doing it in prod which to me is a nonstarter. If I can't figure it out myself and can't hire for it I'm not doing it.
I'm sure it's fine on google cloud, but ECS solves 90% of our problems AND integrates with everything else we're using already.
The first iteration (I actually wasn't around for that) was trying to run a cron for every "data ingestion job" -- at some points, we were doing about 50k+ API requests daily (FB/Instagram/Twitter/etc.) and that was absolutely not tenable using k8s cronjobs.
I wasn't there for this decision, but I assume cronjobs were being treated as "cloud functions" -- and to be fair, the k8s documentation kind of makes it seem that you could technically do that, but fails miserably if you try to do so in practice.
Run 11 person startup. Use hosted GKE. We spend less than 1 man-hour per week dealing with K8s or anything like that. K8S is a big reason we are able to out-execute our competition.
I agree with you completely, we use Kops and see a similar workload. The real boon for us is not in production, where the HA/error-tolerant/easy horizontal scaling certainly helps, but in development, where we can easily bring up ephemeral feature branch deploys as part of the CI/CD pipeline (which itself runs on k8s using Gitlab CI)
I mean, k8s has 1000s of features. I am not using every one of them. I carved out the subset I need, and it is FANTASTIC. I get
* Great rolling deploys
* Self healing clusters
* Resource sharing / bin packing (rather than have 20 half used servers, I can have 6 much more heavily utilized servers)
I could maybe frankenstein these features on top of something else.. but frankly it seems absurd. If these are the only 3 features I use to run docker pods, k8s is a HUGE win to me.
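For context, the "great rolling deploys" piece is mostly just a few declarative lines on a Deployment; a hedged sketch with made-up names and numbers:

```yaml
# Hypothetical rolling-update settings: replace pods gradually,
# never dropping below the desired replica count during a deploy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # keep full capacity while rolling
      maxSurge: 1            # bring up one extra pod at a time
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/worker:3.1.0   # placeholder image
          readinessProbe:                            # traffic only shifts once the new pod is ready
            httpGet:
              path: /healthz
              port: 8080
```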
I don’t understand the value of container tech (unless dealing with legacy code).
Serverless seems better and cheaper (lambda, GC Functions, Azure Functions). Pay as you go with no overhead costs. In many cases usage may never exceed the free credits.
For the database and persistent storage, you need a cloud native service like S3 and Postgres RDS. (Running a database in a container is asking for complete data loss from what I understand.) This is the primary cost for my tech stack and for lite loads $15 is about as minimal as possible.
So I don’t see how using containers could be cheaper unless you are keeping the data in the container, which is a bad idea since containers are supposed to be stateless.
So serverless cloud native tech seems better by far from everything I’ve experienced.
Also when using a tool like serverless framework, deploying an entire stack of serverless resources is dead simple.
I’ve tried a few times to see the value of containers, but I just don’t get it.
Your comment and the post and threads in general are making me chuckle, having just spent the last half hour reading this post, for no particular reason, from a few months ago about the transformation of racknut-and-perl roles like neteng and sysadmin into text editor-based devops:
I have to say, certain comments -- which I'm sure are just as real-world as yours -- lie in...let's call it tension...with the threads and comments here in this post. That everything in the old post appears to be subsumed into a "sysadmin II: internet boogaloo" of orchestrator expertise is humorously ironic.
Nomad + Consul has been pretty great to us over the last year. We're a small nonprofit and chose it specifically because we can't afford to pay someone to keep watch over k8s
We actually had a cold start tolerance of ±30 seconds and so far, very few jobs are out of that range. This was one of our main sticking points, and the Google guys gave us some great tips on how to reduce cold starts.
I had a similar problem: set up a small cluster and it worked fine until the next upgrade, and the nightmare repeated on each upgrade. I realized I was spending most of my time fixing k8s issues rather than doing any valuable work.
This reads like an oxymoron. 'DevOps' was supposed to break down walls between teams. You probably have a traditional "Ops" organization, regardless of team naming conventions.
Anecdata: Series B startup. I've found GKE to be almost completely painless, and I've been using it in production for more than 4 years now.
I don't think the article gave a fair representation on this count; sharing a link to a single GKE incident that (according to the writeup) spanned multiple days and only affected a small segment of users doesn't (for me) substantiate the claim that "it isn’t uncommon for them to have multi-hour outages".
In my experience, multi-hour control-plane outages are very rare, and I've only had a single data-plane outage (in this case IIRC it was a GKE networking issue). Worst case I see is once or twice a year I'll get a node-level issue, of the level of severity where a single node will become unhealthy and the healthchecks don't catch it; most common side-effect is this can block pods from starting. These issues never translate into actual downtime, because your workloads are (if you're doing it right) spread across multiple nodes.
I wouldn't be surprised if EKS is bad, they are three years behind GKE (GA released in 2018 vs 2015). EKS is a young service in the grand scheme of things, GKE is approaching "boring technology" at 5 yrs old.
AWS wants k8s to fail because it works against the significant lock-in AWS has tricked teams into building themselves into. They do not want people already in the AWS ecosystem to move to EKS, it is instead there to not lose potential customers.
And that is why pricing and features sit right at “good enough” and not great.
I'm sure it works fine if you can get it running (documentation didn't work when I played with it). I'm referring more to the $200/month it was per control plane.
That to me is a product they offer because someone else is offering it, but they don't want you to actually use it.
If you're actually looking to build isolation in AWS then you're going to need EC2 dedicated for your EKS member hosts. So you're not getting isolation for $200/month (I'd have to spec it out but dedicated hosts are pricey to the point that it'd be competing with physical hardware in a colo).
That $200/month for not-really-isolated is also per plane. So if you want a separate staging environment it's another $200/month. Client API sandbox? Same thing. It's wack.
(Note the GKE SLA is for regional clusters, which is what you should be doing if you care about uptime. The zonal cluster SLA is 2.5 nines. I couldn't find a difference in EKS, maybe there's an equivalent better SLA for regional clusters I couldn't find.)
So, per my original comment, I am surprised. (Having never used EKS directly I have no idea what their actual uptime is; in my experience GKE has been way higher than 3.5 nines, but obviously I don't have enough data to make statistically significant observations on this.)
I work for a mid-size company with 30-40 engineers managing 20-30 very diverse apps in terms of scale requirements and architectural complexity. It took our devops team (4-5 people) probably 18 months to learn and fully migrate all our apps to Kubernetes. The upfront cost was massive, but nowadays the app teams own their deployments, configurations, SLAs, monitors, and cost metrics.
Introducing Kubernetes into our org allowed us to do this; we would have never gotten here with our legacy deployment and orchestration Frankenstein. The change has been so positive, that I adopted Kubernetes for my solo projects and I am having a blast.
I understand Coinbase's position, and they need to stick to what works for them. I just wanted to bring up a positive POV for a technology I am becoming a fan of.
Do you have any intuition as to what percentage of the benefits your company has seen come from kubernetes specifically, and how much just from the exercise of spending 18 months working on a modern, coherent, efficient infrastructure?
Kubernetes is almost a standard, all our engineers are eager to learn it. It is extensible, some app teams have started building operators to tackle problems in a more efficient way. It allowed us to standardize metrics and monitoring, we didn’t have a clear story for this before. It is cloud agnostic and with great compatibility; I was part of a migration from GKE to EKS, and it was painless.
One other advantage of Kubernetes that is overlooked in the article is the benefit of instant auto-scaling on a heterogeneous cluster. For example, if you have 10 apps on a k8s cluster that each use the same resources, you can give the cluster a 20% buffer, which would let any single app use 300% of its allocation instantly. With VMs, you’ll be stuck waiting for VMs to spin up, or you'll have to give each app its own large buffer to handle bursts of traffic.
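A rough sketch of the mechanism being described (numbers are illustrative): each container is scheduled against a modest request but allowed to burst up to a much larger limit, borrowing the cluster's shared headroom.

```yaml
# Hypothetical resources: scheduled at 500m CPU / 512Mi of memory,
# but allowed to burst to 1500m / 1536Mi (300% of its request)
# using whatever slack the shared nodes currently have.
apiVersion: v1
kind: Pod
metadata:
  name: bursty-app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: 1500m
          memory: 1536Mi
```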
_any_ non-proprietary tool is "cloud agnostic". Kubernetes is bundled software, and achieves the things those pieces achieve. There is nothing holy about k8s specifically, the tools you train with are easy, and it's very easy to get skewed opinions on that.
For example a lot of people would find writing scripts cumbersome, but not a person who's written a lot of them. They're not any more fragile than other logic error capable software is
Any load balancer will let you do that. Scaling is a few lines of scripts to do on any platform, and well worth the couple minutes, you don't even need a tool for just that.
No tool "works the rest out". It's _always_ a compromise, because inherent complexity can never be removed, only moved. What you gain in one area, encumbers another.
You may freely use k8s, but it's not magical nor easier to use than any existing system. In fact adopting it often takes non-trivial time, and the web is full of failure stories with very benign warnings and catastrophic results.
Maybe I'm spoiled by GCP and it's not the same in other providers; but I can have a brand new debian VM configured and operational in less than 10s (less the time that I run my own startup script).
Debian machines spawn with incredible speed; not much slower (if at all slower) than a new container.
But: I've also been put in the terrible position of supporting a platform sold to run on kubernetes in an air gapped, on prem, bare metal environment.
But: I've also been put into the terrible position of fighting vendors armed with agile, k8s and microservices, selling that combined mayhem as a replacement for an OpenMP/OpenACC-based massively parallel, on prem, bare metal HPC system with a strongly conservative development and operational tradition.
I understand loving efficiency and time savings but not everything simply works better with k8s.
Exactly. A lot of the comments here are along the lines of "we set up a k8s cluster and now managing it is a huge burden", which is not surprising. The power of k8s is to allow separation of concerns in your technology organization. You can have a dedicated team to build and maintain the underlying cluster, and then app development teams are consumers who deploy their applications on the common infrastructure. Kubernetes provides a nice abstraction layer so those two teams/orgs can interact through a well-defined API. As a dev team, we can manage our own infrastructure and pipeline through declarative configurations and let someone else manage the underlying compute and network infrastructure. As long as you don't fall into the anti-pattern of "every team builds its own k8s cluster" then you should be able to derive some nice economies of scale.
As someone who's gone the opposite way (moving from ECS to Kubernetes), I think the author is understating how good managed Kubernetes solutions are.
At my current job, I use Azure's managed Kubernetes service, which does a great job at providing a consistent environment that's very easily managed, no unexpected updates, great dataviz, and if you choose, simple integrations to their data storage solutions (run stateless K8 clusters if you can) and key vault. We don't do much outside of our kubectl YAML files, which as commented below has a de-facto understanding by a large number of people.
CVEs will always exist, which is why network security is important. I think we can agree that the only ingress into your cloud environments should be through API servers your team builds, and everything else should be locked down to be as strict as possible (e.g. VPNs and SSO). With a system like K8, so many eyes on the code mean so many more CVEs will exist, so I don't find this argument compelling.
My team, and so many other teams worldwide, are betting that the K8 community will accelerate much faster than roll-your-own solutions, and K8 gives us the best opportunity to create cloud-agnostic architecture. Additionally, helm charts are easy to install, and afaict more software vendors are providing "official" versions - which means for a team like mine, which is happy to pay for services to manage state, in the same vein a company chooses AWS RDS over managing their own Postgres server, we can get the same benefits as the author with a cloud-agnostic solution.
One thing that is regrettable about K8s winning the orchestration wars so remarkably, is that it pretty much killed all other solutions. Swarm is dead, Nomad doesn't seem like it has much community support and Mesos feels like it's on life support. Mesos still has a lot of people working on it however, but the perception feels different.
Personally I've found Mesos much easier to manage, secure, and operate than k8s. However, when it first came out all the cool kids were using it, then most of them jumped ship to k8s. AirBnB's Chronos is now pretty much a dead project, Mesosphere's Marathon is now gimped (no UI) and major features moved into DCOS. At the same time, Mesosphere (now D2iQ?), now seems like it's more focused on k8s.
k8s is everything plus the kitchen sink, and managed k8s isn't the killer feature I thought it would be. I don't blame people for not jumping on the k8s bandwagon at all.
We're using Nomad to manage a large fleet of firecracker vms at fly.io. It's not as robust as k8s, but I think that's a feature. It's well documented, extensible, and predictable. Not a big community, but hashicorp folks are responsive on GitHub.
Nomad, Consul + Swarm is still a lovely solution and I prefer it a lot to K8s. K8s is a big monolith and often way too complex for my personal use cases.
I hope Hashicorp sooner or later builds a proper replacement for Swarm so that we can have overlay networks without hassle. I know there is Weave, but never tried it.
K8s is a lot of things, but a monolith it is not. Quite the opposite - the complexity comes from the large number of relatively simple components interacting in various ways.
Nomad already does the container part, and Consul Connect is the networking overlay / service mesh. Work is being done to get Nomad better integrated with Connect.
FWIW, where I work we're going with Nomad over K8S. It gives us everything we want and nothing more (plus it's from a source we trust and love: HashiCorp).
The nice thing with Nomad is that, thanks to its straightforward design/approach, it should be easy(ish) in the future to swap it out and go with something else if we outgrow it (or HashiCorp abandons it for some reason).
I tried using Nomad once after being a little worn out by Kubernetes' complexity. For some reason, the Nomad abstractions didn't click on the first couple attempts. In comparison, Kubernetes' abstractions mapped 1:1 to my understanding of the service oriented architecture.
I'd have probably gotten used to it had I spent more time using it, but it'd have taken some rewiring of thinking process.
> I tried using Nomad once after being a little worn out by Kubernetes' complexity. For some reason, the Nomad abstractions didn't click on the first couple attempts. In comparison, Kubernetes' abstractions mapped 1:1 to my understanding of the service oriented architecture.
<3 Nomad. However, Nomad only satisfies a really tiny part of Kubernetes ecosystem which is the ability to pack containers and schedule them efficiently in a cluster; Plus, it scales really well. Kubernetes provides a bit more than that.
I think it's more that Mesos committed suicide by DCOS - which sapped the strength from the community, something that first Google, then the CNCF, worked hard at avoiding.
Another thing that might have been just me, but I really couldn't see any structure in Mesos deployments. Yes, you could run X, Y, Z on top of it to get various features, but they all had separate APIs, separate input files, etc.
Having used k8s before that, it was a huge blow, since at the time (late 2018) I used to think that Mesos et al. were more mature and advanced, and instead I encountered something like k8s circa 1.2 with less cohesion :/
I'd agree that k8s has a lot of functionality built-in, another important thing to realise is what k8s doesn't do.
In addition to the well-known integration points (Container Runtime/Network/Storage Interfaces), there's things like the lack of a good built-in user authentication mechanism with Kubernetes, which means you pretty much always need some external authentication service for your clusters.
That's not too bad if you're on one of the big managed providers (GKE/AKS/EKS) but can get complex for people who want to deploy on-prem.
> That's not too bad if you're on one of the big managed providers (GKE/AKS/EKS) but can get complex for people who want to deploy on-prem.
Go spin up Keycloak, join it to your user-directory of choice (or not and just use the internal directory), configure it as your authentication provider, done.
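For the curious, the wiring on a self-managed cluster is essentially a handful of OIDC flags on the API server pointing at Keycloak (or any OIDC provider). A sketch with placeholder URLs, claim names and version:

```yaml
# Sketch (values are placeholders): OIDC auth is enabled by flags on kube-apiserver,
# typically in its static pod manifest on self-managed clusters; managed offerings hide this.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      image: registry.k8s.io/kube-apiserver:v1.18.0   # version is illustrative
      command:
        - kube-apiserver
        # ...the usual etcd/serving/authorization flags elided...
        - --oidc-issuer-url=https://keycloak.example.com/auth/realms/kubernetes
        - --oidc-client-id=kubernetes
        - --oidc-username-claim=preferred_username
        - --oidc-groups-claim=groups
```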
Right so in addition to the complexity of running k8s (which is the general point of the post) you now have to learn about OAuth servers and LDAP integration.
In many corporates you also now have the challenges of cross-team/department work, for the k8s team to work with the AD team to get it setup.
And still that won't get you away from the problem that without a first class user / group object in k8s people often end up running into problems with JML processes over time and mismatch between AuthN and AuthZ...
LOL. You clearly haven't worked with SSO or anything a bit more complex. It's a pretty hard problem; there are even companies whose whole portfolio is around authentication only!
We’ve been mostly happy users of Mesos, Apache Aurora, and consul. It works pretty well for us (200+ engineers). We have maybe 5 people dedicated to keeping it all alive, and they’d be able to maintain the pieces of it that we use. Aurora configuration kinda sucks, but I think job configurations might just suck in general.
That being said, it has been concerning to us to see the big institutional players move off of mesos and aurora. We wouldn’t like to be active maintainers, but we could.
I think that’s the main difference between Mesos and K8s: with K8s, we wouldn’t want to maintain it, and we wouldn’t be able to (since it’s so large). Somehow mesos and aurora feel more manageable.
But, to be fair, they’re not dedicated to only keeping the lights on, they probably spend 20% (1 eng year per year) of their collective hours fighting fires, so I don’t think burnout is too much of an issue. The remainder of their time is spent maintaining libraries and web interfaces. They’re a pretty standard “platform engineering” team.
In my perspective, it doesn’t matter what technology you go with (K8s or otherwise): a responsibly-managed platform requires at least one team with a full on-call rotation (i.e. at least 4 engineers), depending on how wide your golden path is.
I'm a former Apache Aurora maintainer. Aurora has been (and continues to be) awesome for us and I'm so happy to hear other folks are still using it and it's working out for them.
Funny that you mention the configuration part. At the most recent KubeCon in San Diego, CA, the folks at Reddit gave a talk in which they said they got sick and tired of dealing with yaml. They accidentally went on to recreate Pystachio as the remedy so I think you're right on the money with your statement.
When the Project Management Committee (PMC) voted to put Aurora in the attic we were all super bummed but we just ran out of interested developers :(.
Oh it’s super cool to see you in the wild! To clarify, I had a lot of qualifying thoughts running through my head when I said “kinda sucks” (hence the “kinda”!). :)
I actually think managing aurora configs is way easier than managing yml files, and I agree that I think aurora configs were ahead of the game: having access to python in your config feels like a super power. I feel like we’ll converge on something that compiles aurora configs into yml files, prior to runtime.
That being said, we’ve never been able to get good editor support for things like “go to definition”, with the whole “include” syntax. We have maybe 2-3k aurora config files, of which maybe 100 are shared boilerplate. Do you have any advice on this? I tell vim to treat them like python files, but pylint hates them :)
We were bummed by the PMC decision too. I think some people at my company have considered becoming maintainers over the years, but, for the most part, everything “just works”, so we haven’t felt a selfish need to, so to speak. I actually think it’s a kind of unintuitive credit to your project, that it doesn’t require a horde of maintainers. That being said, I’ll set aside some time this weekend to take a look at some issues. :)
Oh no worries, no offence taken at all with the comment, configurations files tend to suck in general :).
Pystachio was indeed very forward looking and the folks who worked on this at Twitter at the time deserve all the credit there.
I think what you mention is a general problem I've encountered with IDEs when it comes to dealing with Python (esp. the "go to" issue you mention). Even when I've had to touch the Aurora client code, which is full on Python code, I've come out pulling my hair thanks to PyCharm acting wonky.
> We have maybe 2-3k aurora config files
Those are some big files! The boiler plate stuff is definitely something I've heard before from users but, unfortunately, there doesn't seem to be a better answer.
When it comes to managing job configs, I'm pretty low on the pecking order in terms of knowledge since we ended up creating our own thrift client using Go to use with Aurora. (As a consequence, all our job definitions exist as Go code.)
Stephan Erb (https://twitter.com/erbstephan/) may have better advice in this case. Some of the Twitter folks may have good info too, but they've been radio silent for months.
> I actually think it’s a kind of unintuitive credit to your project, that it doesn’t require a horde of maintainers.
That's definitely a great point and a great compliment to the project. There's a lot of love that went into this project and I'd be ecstatic to get some new contributions, even if it something simple like fixing documentation or bumping up dependencies :D.
I know quite a few startups going the Nomad route. It might not be as mainstream, but I don't think it's going to be phased out any time soon. Rancher was still maintaining their own scheduler for a while (not sure if they still do) because there were a lot of legacy customers still on it. Rancher and RancherOS have pretty much moved to being a full k8s management shop though.
This article was written to address what seems like an internal debate or discussion about why they don’t use Kubernetes at Coinbase. As such, it boils down to: we already use something else, and moving to K8s comes with risks and challenges that outweigh the benefits, particularly security concerns. As a crypto firm, Coinbase rightfully seems zealous about security.
The article doesn’t claim that K8s is a bad solution. It only claims that migrating to K8s doesn’t work for them at this point in time.
If you are starting something new, or rebuilding a system entirely, and if you are on GCP, and you anticipate the need for scale, then K8s/GKE is a sane choice. What would be insane is trying to roll your own solution. There’s no avoiding the complexity of managing infrastructure, but at least with K8s you won’t be alone, and you won’t make the mistakes others have already made and fixed. Some people never seem to get K8s, in the same way some people never get functional programming. You might be one of them, so that’s something to take into account.
A lot of people have had bad experiences with K8s, but just as many have had bad experiences with Docker, or with Linux, or with computers in general. This doesn’t mean the technology is bad. It just means that not every good technology will work for you or your use-case. Kubernetes is a solid choice, especially GKE. But think carefully before deciding to use it. It is one of those tech decisions that will dominate the rest of your choices for years. In my case, it was a decision I‘ve never regretted.
- Odin kicks off a step function and begins to deploy your application. New VMs are stood up in AWS and loaded into a new ASG, your software is fetched from various internal locations, a load balancer starts health-checking these new instances, and eventually traffic is cut over in a Blue/Green manner to the new hosts in the new ASG behind the load balancer.
- To handle secrets and configuration management we have built a dynamic configuration service that provides libraries to all internal customers with a p95 of 6m
- re-scheduling/moving of your containers if your VM dies/becomes unhealthy in your ASG
So if your company has all of that, I agree no reason to use Kubernetes. But what if your company doesn't have any of above system? Kubernetes.
I have used bare EC2 VMs and deployed with Ansible, Capistrano, Chef, Docker... all of them. I even rolled my own autoscaling with Consul and SQS (for termination notices), and I had more downtime than when using Kubernetes.
With Kubernetes, the learning curve is very high, but once you get it, it's painless to bring in a new service: with AWS ACM to terminate TLS and an ingress controller, pretty much as long as you have a Dockerfile, you can run it.
Odin is just Step Function flow that deploys Auto Scaling groups. It's not that complicated, and not so different from your setup.. main difference is they deploy docker images to ec2 rather than run ansible and configure a running machine.
What about resource allocation? With Kubernetes I can specify both requests and limits of cpu and memory and it will fill my nodes to match. A node autoscaler gets me more nodes if needed. I can define vertical pod autoscaler that can dynamically modify those requests and get me an emptier or bigger node if needed. I can define a horizontal pod autoscaler to keep aggregate cpu at a set target by auto-spawning more containers for me, and k8s handles load balancing via dns. Do typical production setups not require some/most of these features? Mine do.
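For reference, the horizontal-scaling piece mentioned here is also just declarative config; a sketch with made-up names and targets (older clusters expose this as autoscaling/v2beta2 rather than autoscaling/v2):

```yaml
# Hypothetical HPA: keep average CPU across the "app" Deployment's pods near 70%,
# scaling between 2 and 20 replicas as load changes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```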
I dropped off midway through the post. It reads as a classic engineering justification for X vs A, B or C (the same justification can be made for anything vs anything).
I found it interesting that for someone that doesn't use Kubernetes they spend a lot of time describing it.
There are benefits in using mainstream solutions like Kubernetes. I've spent over a decade building distributed systems from Hadoop to Mesos and Kubernetes and have seen the pains of datacenter and AWS, Azure and GCP and all I can say is GKE works great so I don't buy the simpler in-house argument. Once you start properly integrating concerns that need to be end-2-end integrated, the complexity explodes and you end up with a partially working system.
I do believe that serverless will likely make Kubernetes and similar-level tech irrelevant for most users, however only part of a rich ecosystem that provides all other concerns.
Having used Kubernetes extensively at my last position, adoption of k8s strikes me as similar to the adoption of Linux as your desktop OS at the turn of the century. Some people are doing amazing things with it! Others never figure out how to get their WiFi to consistently work and are bitter that people keep talking about it.
Eventually something akin to Ubuntu will grow up in the k8s ecosystem and people will stop complaining that WiFi doesn’t work.
I was very much looking forward to OpenShift/OKD to be "the Ubuntu" of K8s, but their targeted scope just keeps getting bigger (or it always has been big, not so sure anymore).
On the other end, K3s from Rancher is a minimalist distribution, and I think it'll limit its adoption in sophisticated environments with larger teams.
Does any other K8s distro look like a good candidate to standardize on in the future?
https://jenkins-x.io/ has brought together a number of open source projects in theoretically a good way, but it's still pretty rough around the edges (at least last time I worked with it in March) and still falls short of an "Ubuntu" experience.
One thing people forget is containers don’t necessarily need to be created with Docker.
Lately I’ve been creating containers using NixOS and couldn’t be more happy with the ability to have everything in the container configured by a configuration.nix file. https://nixos.org/nixos/manual/#ch-containers
The idea of a Docker Ubuntu, Arch, Alpine, etc. base image is kind of silly when you think about it. The idea that we have a separate repo for sharing massive container images, sometimes multiple GB, rather than a simple file that can be tracked in git and will produce the same image every time, has been very eye-opening.
Dockerfiles are imperative and the order of steps changes the container image that ultimately gets built.
Nix packages are declarative and idempotent. Everything is based on a functional package manager and when you configure your container via nix you get the same thing every time and can easily mix and match other packages and dependencies.
A contrived example: you have a program that depends on multiple versions of Python. Because Nix does not set dependencies globally but rather links them locally, you can have both versions of Python without any conflicts.
Containers built from Dockerfiles are difficult to exactly reproduce too. A very common practice is to update the package manager, and depending on what the server returns, the result will be different from one day to the next, whereas Nix hash-addresses everything and builds from source (you can use caches to speed this up), so you get the same thing every time.
Note: NixOS isn’t perfect in terms of being purely functional, as many package builds use scripts in perl, bash, etc., but from my understanding these are usually source-specific reproducibility issues with building particular packages, and generally speaking things work as expected.
The declarative and idempotent container specification seems like a nice developer QOL improvement over docker but...
> Containers built from Dockerfiles are difficult to exactly reproduce
This isn't true in my experience, at least for a level of exactness that could have an impact on the behavior of the software running in the container. I have been working with docker in various contexts for 5 years and have never experienced any issues related to reproducibility (which is pretty much the primary selling point of docker). To your point though, I can see how reproducibility issues could happen in theory, but since its not a problem in practice it doesn't make for a very compelling reason to try nix instead. Just to be clear, I'm not trying to disparage nix, it honestly sounds pretty interesting, especially as a way to automatically provision a local workstation (docker is not good at this), but I am still curious as to the practical tradeoffs in terms of nix vs docker as a tool for building containers.
Then have your Dockerfile build from source. Many of the public layers that people build their containers upon do that for that exact reason.
If NixOS is only "generally speaking" correct because of the order it runs its scripts, then put it in a Docker container, where the whole point is to perfectly reproduce the order you run your scripts in order to minimize the errors that arise from doing otherwise. Still worried? That's why layers exist, so that every step you run you now have a static binary that you know will never shift under you, and that you can keep on building on top of.
In practice, building Dockerfiles is unreliable and can produce errors because they inherit all of the deficiencies of aptitude, python packaging, javascript packaging, et al.
In practice, building nix dependencies is reliable and reproducible because of the militant isolation of build environments, hash integrity checks on _all_ inputs, explicit dependencies, and the long tail of bug fixes related to non-determinism in unix tooling.
That's a great vote of confidence. I would've thought that nix would only manage packages on the OS level. How does it re-implement pip to be reliable? I currently have to stage a couple of `apt-gets` and `pip installs` before my `pip install requirements`, and it took a couple of tries to lock in the right sequence that worked. How does nix solve those problems?
Ha! I did this but the containers end up being somewhat large since you have all the NixOS package stuff in the container too.
However, building the container from a configuration.nix file gives you exactly what you need in the container and nothing more. It's like basing your container on `scratch` in a Dockerfile and copying in only the needed compiled dependencies and nothing more.
Another advantage is a unified development environment, as I can run the server in the container or directly on the host machine in an isolated nix-shell.
Yet another advantage: unlike the layers of a Docker image, dependencies are isolated and linked by the hash address/path of each dependency in the Nix store, so there is no need to worry about the order of steps, only about what depends on what. That makes cache misses in the Nix store cheaper than Docker layer cache misses. For example, if the first step in a Dockerfile changes, all subsequent steps need to be rebuilt, whereas if a dependency in a Nix file changes, say from python-2 to python-3, only the packages that depend on Python get re-linked and rebuilt. And if you already use python-3 in a nix-shell or in another container, odds are it's already cached and ready to go.
I haven't actually deployed to k8s; currently I'm using these containers to isolate local development. I know that (at least at one point) Kubernetes supported images other than Docker images, but even if it didn't, Nix supports generating Docker images...
I imagine a CI/CD flow that builds the image and installs your app via a local Nix package built from source in your git repo. All that needs to be shared is the git repo, and you have everything you need to reproduce the container running in production (rough sketch below).
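A rough sketch of what that image-building step could look like with nixpkgs' dockerTools helper (the package and name choices here are placeholders, not anything from this thread):

    # image.nix -- illustrative sketch only
    { pkgs ? import <nixpkgs> {} }:
    pkgs.dockerTools.buildImage {
      name = "my-service";
      tag = "latest";
      # Only the closure of what's listed here ends up in the image.
      contents = [ pkgs.python3 ];
      config.Cmd = [ "${pkgs.python3}/bin/python3" ];
    }

`nix-build image.nix` then produces a tarball you can `docker load` and push, and the image contains only the transitive closure of the listed packages.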
What I don't like about these kinds of posts is that the author seems to claim to know much more than he possibly could.
"For example, most folks that run large-scale container orchestration platforms cannot utilize their built-in secret or configuration management."
So, with thousands (maybe more?) of "folks" out there, each with different environments, requirements, backgrounds and so on, how can someone know what most of them can or cannot do?
Sure, there are a lot of them posting about it in blogs and presenting at conferences, but that's still a very small subset of all the companies running containers in the world today, and the majority will not expose any info about it, for a myriad of reasons.
Probably by "most" he just means he's talked to 20 friends who head up devops for such large-scale platforms, and 19 out of 20 explained how it didn't work for them, and he then assumed that's generally applicable for the other couple thousand out there.
While this may not be stated in the most diplomatic way, it's important and relevant for this discussion and I don't think it should be downvoted. It's very well established that Coinbase's infrastructure is unable to handle traffic spikes, despite the fact that traffic spikes are a fact of life in their industry and downtime can easily cost their customers millions of dollars.
They've worked to fix their trading system to address these 100X spikes in trading volume - it's less about _how_ they run their stuff and more about _what_ they were running...
Wanted to mention the same. Though I kind of disagree with the conclusion.
Though they have a lot of problems with the stability of their service, it also means they know what they want to solve with their current architecture. So we'll see.
It puzzles me a bit that they don't want to invest in k8s expertise (at their scale, especially security-wise, that can be tough, I guess), but at the same time they develop their own deploy system, which sounds a lot like Spinnaker, and they have their own secrets/config management system (HashiCorp's stack is pretty neat and battle-tested).
Coinbase hasn't scaled very well IMO, probably because of people like this. Once you try out Kraken, it's a night-and-day difference in terms of features and supported currencies.
Trading during high-volume hours was next to impossible: orders were not getting filled (sometimes they didn't even reach the order book!), the UI wasn't loading, etc.
Then they shut down for a while and rebuilt everything, starting from the matching engine. I guess they also reworked their architecture, because it's been smooth sailing ever since.
> The only way to sanely run Kubernetes is by giving teams/orgs their own clusters
Funny, we're running it sanely without doing that. We've separated our clusters based on use-case - delivery vs. back-end, aiming towards the "cell-based" architecture.
> Managed Kubernetes (EKS on AWS, GKE on Google) is very much in its infancy and doesn’t solve most of the challenges with owning/operating Kubernetes (if anything it makes them more difficult at this time)
...some details on the challenges they don't solve, or indeed make more difficult, would be good.
But yep, K8s is complex. So, to paraphrase `import this`, you only want to use it when you have sufficiently complicated systems that the complexity is worth it.
It will catch up with you. I was at one shop with an 11-person team dedicated to platforms. The shop moved really fast, and even with 500 employees they were able to move from OpenStack to DC/OS in 2~3 months. (We had CoreOS running on OpenStack but fully migrated over to DC/OS. Jenkins -> Gitlab also happened very rapidly; really good engineers.)
At my current shop, we struggle to maintain k8s clusters with an 8-person team. We inherited the debt of a previous team that had deployed k8s, and their old legacy stuff was full of dependency rot. We have new clusters, and we update them regularly, but it's taken nearly half a year so far and we don't have everything moved over.
You do need good teams to move fast; and good leaders to prioritize minimizing tech debt.
We've used a GitOps model (using Flux[1]) with a reviewer team made up of people from across our dev teams (and the sysops, natch) to ensure that people aren't just kubectl-ing or helm-installing random crap. We also put about 2 weeks of effort into getting RBAC right, so that everyone has read access to cluster resources, but only a subset (generally 1 or 2 per team) have what we call "k8ops" roles - those are the same people reviewing pull requests in the Flux repo - and the norm is to use the read-only role as the default. The only time I've recently had to use my k8ops role was to manually scale an experimental app that was spamming the logs down to 0 replicas so the devs responsible could sort it in the morning.
I think the way we've approached it achieves the same goal as just giving each team their own cluster to avoid them messing up other teams.
1) Those guys can't imagine their life outside of Amazon. This is bad. And yes, Amazon's k8s was bad; it's getting better, but it's still bad.
2) They said they would need a separate team, and instead they wrote and maintain their own sort-of-lightweight solution.
There's nothing more to that article.
One opinionated team decided to roll out something homemade so they could add it to their CVs later on.
I partially blame keyword-driven recruitment in tech for these kinds of responses to a platform or tooling choice. Kubernetes isn't a magic bullet - it is a platform which solves a very specific set of problems with scaling. And of course it doesn't come for free. You can't just throw k8s into your existing infrastructure and expect your devs to manage it in addition to their regular work.
And yet, we keep reading about teams falling into the trap because their lead engineer wanted to put "production kubernetes" on his resume. I hope the k8s team adds a huge "Who is this NOT for" disclaimer to their docs (if it doesn't already exist).
Oh God. This is what is happening at my work. We have an API that has 200 write users and a public front-end that can do reads. None of it is heavy though, with most writes occurring for a month in the winter and a month in the fall. In the unlikely event of heavy write loads we could just scale up CPU/RAM for those two months. Any read load could be solved with caching or by spending time on the worst offenders in SQL. The lead dev is gung-ho that it MUST be a microservice with k8s, Kafka, and I'm sure a bunch of other shit we don't need, for what is the same application that has been written for half a century: data in/out with business logic applied. The entire API has about 8 paths, with your basic HTTP methods for each. It's probably the smallest API I've ever worked on.
The positive for me is I am learning a bunch of stuff during the process. The negative for the project is he expects developers with no previous skillset in this space to design all this new (for us) tech without introducing technical debt. The downside for the client is...well guess.
It's sad. I am rewriting a problematic legacy application and creating a whole new one. I guess I can put k8s and the kitchen sink on my resume when I look for a new job just before this thing implodes on release, though. I just wanted to write clean code man, that's all. There is a sadistic part of me that wants to grab some popcorn and see how this explodes in his face... there is also a part of me that wants to be proved wrong... I hate this job.
So are you assuming that your service will never grow? Because vertical scaling is nice and easy as long as you know the limits, but once you cross a threshold, no matter how many CPUs you throw at a problem it just won't scale. Your senior lead seems to be anticipating that and preparing in advance.
Who assumes a system will never grow? No one. I'm looking at the 10-year-old legacy system we are replacing. That tells me a lot: where it's been, why it's the way it is, and a lot about where it's likely to go in the next 2-3 years. Its function is actually being reduced in this rewrite.
I'm not against the microservice idea. I'd just rather focus on solving the problems the legacy system had. None of those had to do with scaling. They were related to a certain federal agency changing their mind every 2-4 years and really poor coding practices. A microservice with Kafka and k8s doesn't really solve those issues.
Nomad has been brought up in a lot of comments, and it has a feature that nobody has mentioned yet, I think: it's multi-platform. It currently has official task drivers for Docker, Isolated/Raw Fork/Exec, Java, and QEMU. It has several community task drivers, including Windows IIS and FreeBSD Jails.
Kubernetes mostly supports Linux, although it has recently gained Windows node support.
Indeed, if you are coming from a world with a lot of legacy systems but want to make Docker and container orchestration your golden path while still streamlining all of your deployments, Nomad seems like a great choice. At some point you might need more than Nomad can give you, but by then you are at least familiar with concepts like Raft, declarative CI/CD pipelines for containers, Vault and secret handling, canary deployments, etc.
I am extremely happy with Nomad, it is one of the best decisions we have ever made in our organisation. We had a thin abstraction layer over Consul + Docker Swarm that we could port to Nomad in a matter of a few hours and it's been rock solid so far.
But sometimes I wonder whether we are hurting our own careers not going down the k8s route. It really seems it has a bright future.
> Nomad has been brought up in a lot of comments, and it has a feature that nobody has mentioned yet, I think: it's multi-platform. It currently has official task drivers for Docker, Isolated/Raw Fork/Exec, Java, and QEMU. It has several community task drivers, including Windows IIS and FreeBSD Jails.
I think `rkt` and `lxc` are missing from your list.
My personal mantra of being a late adopter when it comes to cloud deployment tools has served me well. I am even reluctant to integrate Docker into my workflow. Git pull, build, and deploy bash scripts are serving me well enough for now. Thank you very much.
One note on the security side of things -- if you're interested in seeing what a truly hardened k8s/GKE configuration looks like, check out the Vault examples:
In summary, for your security-critical workloads you're going to want to put them in their own cluster; treat k8s in this case as an API for updating the code that's running on your VMs. (Except your VMs can run a stripped-down read-only OS like Container-OS or CoreOS).
The pinnacle of devops effort to deliver apps before Kubernetes was AWS Elastic Beanstalk, which their setup replicates in great detail.
I'd take k8s over Beanstalk any time.
Kubernetes is not about scale; it is about defining primitives everybody can use to describe their setup. It is a DSL everyone converges on, and as a result of this unification, products from different vendors can be packaged and deployed in a uniform way.
I honestly think treating every company except R&D shops or startups like this makes sense. I've been playing firefighter at F&I orgs that have stood up k8s recently, and it's insane the amount of debt they've taken on in such a small amount of time. The infra spend is orders of magnitude greater than it would be if they had physical boxes, because getting approval for physical boxes is hard. But for some reason budget on AWS, GCP or Azure is easy.
The discussion and pressure always start on the engineers' side, but they normally don't consider that building a system from scratch on a technology is different from migrating an existing one to it.
The article is clear: Coinbase has a very strong, tested and validated infrastructure, and moving to Kubernetes 'just to be part of the hype' does not bring any benefits at this point. And a statement like that is a nightmare to some DevOps folks.
It's slightly faster than 6 seconds for me, but not much. The initial response for the html page already seems incredibly slow at around 500-1000ms. And then it does seem like there's at least a whole second worth of JS being executed before the site is actually loaded.
Many of the author's issues with kubernetes are specific comparisons with their current workflow. One I can definitely agree with is the burden of upgrades and keeping the platform current. There are techniques which can make that easier to handle. One statement that surprised me:
>> at Google it isn’t uncommon for them to have multi-hour outages with GKE
For what it's worth we've been running multiple GKE clusters in production for over three years. We're medium size, with some dozen or so in-house services handling perhaps a total of 20k rps. We are rarely affected by any GCP issue, and as far as I can recall we have never been down due to a GKE-specific problem. In addition to the basic orchestration features we make significant use of ingress and storage primitives. It all quite literally just works.
Their system seems very sane. I'm jealous, and I hope it stays that way (sane, not static). It's also entirely dependent on AWS features. I used to work for a company that had a very insane hybrid datacenter/AWS deployment environment, and containers provided some sanity.
Service discovery, cluster management, secret storage, these are all problems that we _already_ had. Containers (for us on mesos) just solved some parts of that picture.
I lived and breathed containers, distributed systems, config management, deployment pipelines etc. for years, and I forget that to many K8s is just seen as one magic bullet solution. You will have to pick it apart and interact with pieces of it if you really want to use it at a medium sized company. That takes a lot of research and understanding.
Coinbase built and maintains their own platform that's working for them.
Coinbase provided an analysis worth studying. The major takeaway for me: asking people to manage their own Kubernetes cluster is like asking people to manage their own hypervisors when they just want VMs.
Question: couldn't some of the security/management concerns be addressed via AWS Fargate?
The bottom line is: K8s is awesome, but it is complex.
It is software oriented toward Google's needs.
Sadly, the average project will not have the complexity you find at Google.
I can run Oracle Database in 100MB of RAM... you need at least 4GB to run a K8s node... a costly option.
Below ten servers I find no point in K8s: I highly suggest Docker Swarm.
Quick shout-out to Aptible (Heroku for HIPAA), which has been amazing for our healthcare startup.
Still, I honestly feel that we've taken a step back as an industry in going from EC2 --> Heroku --> k8s. But I know there are many people working hard to create the next infrastructure so we don't have to deal with containers and all that reinventing-devops nonsense for every project.
I'm a big fan of this - to me containers are another hype train, like cloud and serverless. There absolutely are good use-cases (we run a number of K8s clusters with great success at work), but it's absolutely not the solution to all (app deployment) problems. As said in the article, I think complex systems like K8s can often have a detrimental effect on productivity. KISS.
The history is a bit fuzzy. The interesting features introduced in 2.6.24 were PID and network namespaces. Containers were "complete" by Linux 3.8 with user namespaces. Cgroups are not that important for building containers (isolation first). There were other out-of-tree technologies before that, notably VServer and OpenVZ.
I find it interesting that many people seem to conflate the complexity of managing infrastructure and services with K8s.
K8s is complex because managing distributed services is. Not using it doesn't mean it goes away. The complexity migrates and ends up being bundled up in a separate tool or a runbook process or some script.
It's hard to maintain because the tools and apis are different from what some engineering teams are accustomed to using. Building an in-house tool gives them a warm fuzzy feeling and comfort that they can handle problems when they appear due to familiarity with their own code and design choices.
It's a fair trade-off. I do wonder how much of the time spent on this exercise could have been spent on K8s training.
I do feel that the K8s community does downplay how much of a PITA k8s configuration can be, and that the perceived robustness of cloud-managed K8s isn't up to scratch for something this complex.
IMO the best way to run K8s is via some managed provider. Sure, you can do it the hard way a la Kelsey Hightower, but in production, and if you're not staffed with k8s experts, I'd rather give that to a provider and focus on what I know: the code and the business logic.
I'm currently going through this, but the alternative is just too painful. I've tried Nomad, Deb packages, etc., but the tooling around all of that is basically build-your-own, versus Kubernetes, which has tooling for a lot of things.
It's all about scale. Large scale requires an orchestrated application management system with progressively integrated bells and whistles and minimal fiddling with userspace. Small scale doesn't factor in. Do small scale the old way if you want: binary artifacts, AMIs, home-grown integrations and playbooks/scripts. Having done it both ways, I continue to do it both ways. Run a monolith the old way, and run your web-tier, stateless, customer-facing, highly available services in k8s. If it's not broken don't fix it (but for some use cases (large scale) the old way _was_ broken).
Thanks for the article.
I've been in tech for the past 10 years, working in or around devops teams for the most part, but I don't get all the fuss about k8s; yes, it's an amazing tool that does a lot more than any other.
But it has a big learning curve, and setup & maintenance are very costly. I don't understand why most orgs are moving to k8s considering this. When talking to my peers I often hear numbers like 6 months to 2 years for a full migration, with very little added value - at least for 90%+ of the companies using it.
It usually boils down to attracting talent and keeping them excited by trying the new shit.
There is a learning curve either way. It’s either an in house container orchestration platform or one of the open source ones. Coinbase seems to have chosen the former.
K8s is simply a good default environment which provides rock-solid stability for your applications by outsourcing the distributed-systems complexity to your infrastructure team (whether it's internal to your company or a managed one like GKE). Teams are not using it just because it's “cool” (maybe some are); there is no need to develop in-house strategies to deploy an app, keep it running, and scale it (among other things; this is the lowest common denominator).
It’s the same reason why big data tech has somewhat standardized on a set of tech (spark, airflow etc): once people learn the system, they can focus on building products that provide value rather than building the products and the relevant infrastructure.
Can fully relate to the author. We have been struggling to do k8s effectively at our company for 2 years now. There's too much to learn just to get your first service into production. You end up writing wrapper after wrapper that makes you think “we can’t be the first guys to solve this”, and after googling, you find that every issue you hit is described as “just” one of the downsides of using k8s, along with how to overcome it.
I wish someone would make a deployment orchestrator for your private DC that is as simple to use as Heroku is.
Edit: fixed typo.
> I wish someone would make a deployment orchestrator for your private DC that is as simple to use as Heroku is.
This is what I'm working on, although it is built on top of Kubernetes! We feel Kube suffers from a disconnect between the insane complexity requirements of enterprise deployments and what most coders and businesses actually need - which is exactly what you said: a consistent Heroku-ish experience on their own hardware or whatever cloud is convenient. Kube needs what Git needed - a GitHub. Feel free to reach out to me (email in profile), I'd be happy to chat your ear off about it!
I don't think there is a disconnect; the K8s team knows about it, but this is not a problem meant to be solved by K8s alone. Projects like Knative, OpenDeis, Fargate, and Cloud Functions are supposed to provide a Heroku-like PaaS on top of Kubernetes.
We haven't even gotten to the ops stage yet. A small team has spent 6 months trying to repackage our app for OpenShift and it still barely works. I think everyone regrets even looking at it.
My feeling is that I just want AWS Lambda functions to allow for larger packages. And then clusters and scaling and OS updates etc. are Amazon's problem.
I would rather just be able to do that than get into ECS/Fargate, much less K8s. It seems like all of that stuff is just adding more complexity for me.
Of course none of my projects are gigantic or need to be highly secure.
I wonder if Mesos, with its somewhat more tidied-up infrastructure and codebase and fewer moving parts, could make it as a k8s replacement, if you absolutely must run Linux container workloads (though I understand Mesos can run anything, not just Docker-/runc-like images).
This is why I ditched systemd et al. and went with bare-bones embedded Linux on QEMU, then built it up using Makefiles. I probably could have used Yast, but didn't want the huge pull-in of libraries.
BUT, my prototyping turnaround is much faster and more stable than K8s. Way less staffing too.
IMO, running self-managed k8s in the cloud makes no sense. If you've already set up on prem servers, load balancers, config management, patching, access control, etc. to allow developers to run applications on VMs, k8s can provide an integrated experience with significantly less work. If you're running k8s in the cloud, then just use a hosted service and leverage enterprise support.
In the on prem case, you already have a dedicated infra team that probably has the tools to effectively deploy and manage a cluster.
> In the on prem case, you already have a dedicated infra team that probably has the tools to effectively deploy and manage a cluster.
Tools? Yes. Care and attention? Hell no.
The ops team at my work still has important services running on CentOS 6, and they are spending all their time trying to get Kubernetes configured and working.
You would be surprised at how much infrastructure out there is just glued together adhoc. Several senior engineers at my work are successfully blocking adoption of gitops and CD.
The most popular cloud provider (AWS) has a really shitty managed k8s offering. This is perhaps the main reason so many infra teams have put so much collective time into running k8s themselves. If EKS were as good as GKE, what you said would be absolutely spot on.
That's really oversimplifying. EVERY tech has a learning cost, and therefore a return-on-investment question. K8s has been so hyped that many developers think it's a mandatory skill. It's not. I've seen talented app developers fail to successfully deploy an app on K8s. For many, Heroku or Cloud Foundry is good enough.
Even in their case, it looks like they made a conscious decision and documented it. I wish more teams would do that. You are free to use k8s, and I'm sure right now it's still not a controversial choice, nor one that will run into walls.
There are also application engines at the major cloud providers, depending on your technology. Some tech stacks can run serverless. For self-hosted scenarios, things like Dokku could also work.
To me it looks like they saved quite a lot of engineering effort, and the price was lock-in. Seems like it was probably a fair trade for them.
At least if it's sane and well thought out, it can be taken back apart and repurposed for something different. There will be a cost; consider it paying back the loan. That's why it's called technical debt. The terms of the debt look good to me.
The CTO at a client of mine has spent over 1 year trying to deploy ~7 web services to k8s, hired 5 contractors to help and they still don't have a stable deployment.
I informed them my team could do it all in under two days using Terraform + AWS + EBS. Unfortunately, they didn't take us up on the offer.
Sunk cost, etc.
I've never seen a company use k8s and not end up with a giant expensive team maintaining it all.
"For example, most folks that run large-scale container orchestration platforms cannot utilize their built-in secret or configuration management. These primitives are generally not meant, designed, or built for hundreds of engineers on tens of teams and generally do not include the necessary controls to be able to sanely manage, own, and operate their applications. It is extremely common for folks to separate their secret and config. management to a system that has stronger guarantees and controls (not to mention scaling)."
Fyi to anyone who is reaching (or close to reaching) this point: this is exactly the problem we're trying to solve with EnvKey[1] in a secure, robust, and scalable way.
Our current product runs on the cloud (using zero-trust end-to-end encryption) and solves the problem quite well imho for companies with up to 50 or so engineers and moderately complex infrastructure. It fits easily into either containerized or non-containerized stacks.
But our v2 is almost done after years of work, and I think it will now be able to handle almost any scale and workload. The launch target is August 1st. Some cool features it will offer:
- Source available self-hosting with auto-scaling, HA, and strong consistency that just works.
- "Config blocks" that can be used in multiple projects, allowing you to de-duplicate configuration and secrets.
- Version control with simple and advanced rollback capability.
- Comprehensive access logs with simple and advanced filtering and auditing capabilities.
- Ability to manage local environments.
- Ability to react to updates and e.g. restart servers when config values change.
- A CLI that will have full parity with the UI, either for automation or those who prefer a CLI-based workflow.
- An option to use our UI via an incognito web browser pointed at localhost instead of Electron (for those who hate Electron)
- Faster, lighter, modernized end-to-end encryption built on NaCl (v1 uses OpenPGP, which is great, but it's time to move on).
- Device based auth with optional passphrases (think SSH) and an easy, secure workflow for granting access to new devices.
- Ability to authenticate and invite users via Github, Gitlab, Google, Okta, or SAML (including inviting from multiple sources within a single org).
- Teams/groups and advanced group management: grant teams of users access to groups of apps, connect groups of blocks to groups of apps, etc.
- Customizable everything: environments, sub-environments, and access roles can all be molded to fit your workflow.
Our goal is to fully solve this crucial piece of the stack so that it "just works" with minimal time spent on integration (for new projects it can be installed and integrated in minutes). If you're interested (and/or want early access), submit your email to the form at the bottom of our site: https://www.envkey.com -- and of course if the v1 can solve your configuration and secrets management needs as it already does for hundreds of our customers, give that a shot! Upgrading from v1 to v2 will be very quick so you won't be duplicating any work.
Also, we're hiring--remote anywhere in the US. We'll put up a jobs page with more detail soon, but our stack is TypeScript (node + react), Go, and polyglot (since we need to write lots of integrations). Shoot me an email if this sounds like a system you'd be excited to work on: dane@envkey.com
Unfortunately I live in not-US (Aus), but I have to say, envkey looks very good! I'm actually floating this and HashiCorp Vault by management at the moment, since even with 4 devs, keeping environment variables in sync is a PAIN!
If you don't mind me asking, how does the Go integration work? I initially thought it was actually some sort of alternate `os.Getenv()` that you imported, but that doesn't seem to be the case. And what would the latency be, for changes in environment variables being synced to running deployments?
When a process starts, envkeygo makes a request to our config service (via envkey-fetch) to fetch the encrypted config, then decrypts it and sets the values on the environment so they can be retrieved with `os.Getenv` (rough sketch below). Both the lookup id (for fetching the encrypted config) and the encryption passphrase for decryption are initially passed in via an ENVKEY=... environment variable--you can think of it as a single environment variable that "expands" into all the others that you need.
Latency on a request is generally in the 150-300ms range from the US (our primary servers are in us-east-1).
EnvKey is strongly consistent and transactional, so once you make a change to your config, it will be available immediately for any subsequent requests. For now, it's still up to you to restart servers/services yourself after a change. With the v2, this will be scriptable.
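To make that concrete, here is a minimal sketch of the Go side (assuming the import path of the public envkeygo repo; check its README for the exact, current usage -- the DATABASE_URL key is just an example):

    // main.go -- illustrative sketch only
    package main

    import (
        "fmt"
        "os"

        // On init, envkeygo uses the ENVKEY variable to fetch and decrypt
        // the config, then sets the results as environment variables.
        _ "github.com/envkey/envkeygo"
    )

    func main() {
        // After init, config values are ordinary environment variables.
        dbURL := os.Getenv("DATABASE_URL") // example key, not a required name
        fmt.Println("database url set:", dbURL != "")
    }

Run it with ENVKEY=... set in the environment, and the rest of the code just reads plain environment variables with `os.Getenv`.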
We'll eventually hire outside the US too, but for now I'm trying to keep the timezone spread and administrative burden low :)
At this point I’m of the belief that adopting k8s at most companies should result in shareholder/investor lawsuits for the near-criminal waste of corporate resources.