> I am publishing this now in the hope that it can serve as a warning to everyone out there who is investing in Kubernetes. You’ll never be able to run it as effectively, at the same scale, as GKE does – unless you also invest in a holistic change to your organization.
This is a meaningless argument. I don't have to run Kubernetes at the same scale as GKE to develop--I just run minikube, which runs very well on Linux hosts. When I get ready to deploy, I have my pick of environments to host on, because Kubernetes apps are largely portable.
OpenStack has never achieved this level of accessibility.
1. I had a local OpenStack environment. Most of what I needed for app dev I could do there.
2. A lot of app devs aren't happy with Kubernetes and talk about it. Sometimes in the comments right here on Hacker News.
Kubernetes has a lot of parallels with OpenStack when you drill down and look at it. Aeva isn't the only one talking about it.
Kubernetes isn't an application platform though, in the same vein as Heroku. That's what Application developers want, and they're simply not going to get it. It was never supposed to be that.
Kubernetes competes with AWS, in a sense. It's a standard API for interacting with any cloud resource. I can give an AWS ASG an AMI, tell it to create 10 instances, and it will do it. I can give Kubernetes a Docker image, tell it to create 10 instances, and it will do it. You wouldn't expect application developers to enjoy creating AMIs, or maintaining them, or worrying about the global high availability of their 10 instances; they wouldn't enjoy that with AWS, and they won't enjoy it with Kube. And they shouldn't.
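On the Kube side, that whole "give it an image, ask for 10 instances" step is one Deployment manifest. A minimal sketch, where the name, image, and port are placeholder assumptions:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                    # placeholder name
    spec:
      replicas: 10                    # "tell it to create 10 instances"
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: registry.example.com/my-app:1.0   # placeholder image
            ports:
            - containerPort: 8080                    # placeholder port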
The confusion comes from the fact that Kubernetes needs to be deployed somewhere; well, let's deploy it on AWS. And now there's this expectation that because we've created a layer on top of AWS, this layer should be closer to the application development process. It is! But not as close as it could be, or should be, in a productive shop. Kubernetes isn't the endgame; it's just a better place to start.
There are two angles to this problem that I hope Kubernetes continues to see improvement on in the coming years:
First, cloud providers need deeper integration. Kubernetes should replace the AWS/GCloud/AZ API. If you want to access object storage, Kubernetes should have a resource for that which Ops can bind to an S3 bucket, then applications go through the Kube resource. If you want a queue, there should be a resource. This is HARD. But over time I hope, and do think, we'll get there. You can already see some inklings of this in how LoadBalancer Services are capable of auto-provisioning cloud-specific LBs.
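That LoadBalancer behavior is the clearest existing example of the pattern: the manifest below is plain Kubernetes (the name, label, and ports are placeholders), and on EKS/GKE/AKS the cloud controller quietly provisions the vendor's own load balancer behind it:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app            # placeholder name
    spec:
      type: LoadBalancer      # cloud controller provisions an ELB / GCLB / Azure LB
      selector:
        app: my-app           # placeholder label
      ports:
      - port: 80
        targetPort: 8080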
Second, we need stronger abstractions on top of Kubernetes for application development. Projects like Rancher are doing some work in this regard, as are Knative and many others. Even, say, the Google Cloud console is an example of this, as it does a great job of hiding the internals behind friendly forms and dialog boxes.
I've got a very keen eye on that project as it develops.
One thing I do think is: it feels like we should be looking at this a bit more generally, saying "I need a Queue" rather than "I need an SQS Queue", allowing the operators to bind the generic Queue to an SQS queue on the backend, then using the application-facing spec to assert things like "it has to be FIFO, it has to guarantee exactly-once delivery", etc. And if the backend cloud resource provider that is configured can't meet the requested spec, we get an error.
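Something like the sketch below is what I'm imagining. None of this API exists today; the group, kind, and every field name are made up purely for illustration:

    # Hypothetical resource, not a real Kubernetes API; purely an illustration.
    apiVersion: resources.example.com/v1alpha1
    kind: QueueClaim
    metadata:
      name: orders-queue
    spec:
      ordering: fifo            # asserted quality: FIFO ordering
      delivery: exactly-once    # asserted quality: exactly-once delivery
      # Ops binds this claim to SQS, Pub/Sub, RabbitMQ, etc. behind the scenes;
      # if the configured backend can't satisfy the spec, the claim errors out.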
I don't know for sure if this would be better or worse. But for some generic cloud resources, like Object Buckets, Queues, PubSub, or SQL Databases, we could arrive at a commonly accepted set of generic, abstract qualities that an exemplary implementation of a system claiming to be a "Thing" should assert (e.g. with Object Buckets: regional redundancy, consistency, lifetime policies, immutable object locking, etc.).
The interesting thing there is that now you've got a common base API for, well, common things that any application would need. Open source projects could flourish around saying "Check out NewProjectX, it's a Kubernetes-compliant Object Storage provider." The backend doesn't have to be a cloud provider; it could run right on the cluster or on a different machine you own, just like how load balancers can work (see: MetalLB).
Obviously I don't expect AWS to publish an API that divorces the implementation from the spec, but I think we should think about it as a community. Also, not every cloud resource the Big 3 provide would make sense to generalize; for example, a generic "NoSQL Database" provider would be far too implementation-specific to account for all the differences between, say, Dynamo and Firestore. Still, the work AWS is doing on that project is ultimately valuable.
> Kubernetes isn't the endgame; it's just a better place to start.
> Kubernetes isn't an application platform though, in the same vein as Heroku.
This, exactly
> That's what Application developers want, and they're simply not going to get it.
... that's exactly what we want, and I think the more perceptive among us doing the deciding want to choose one that is built on Kubernetes, or at least have Kubernetes ready for when one comes along, basically for the reasons you highlighted. It seems to be the winning standard when put up against AWS. It's a major improvement over the old model of how Ops has handled provisioning resources.
> Kubernetes is for people building platforms. If you are a developer building your own platform (AppEngine, Cloud Foundry, or Heroku clone), then Kubernetes is for you.
I think there are enough different choices for that "top layer of the dev stack" now, choices which DO provide developers with the kind of experience they/we want while usually shielding us from underlying infrastructure like deployment YAML and service/ingress, that it really is a realistic concern that we'll choose the wrong one.
We want to make a choice and be stuck with it. We don't want to choose wrong and have to choose again. (Especially if we're planning to buy a support contract, which we almost definitely are. But how can we even predict which stack layer vendor we'll ultimately need support from?)
Our IT moves at an institutional pace, and guidance councils seemingly prefer we have a comprehensive plan in place before we take the first single solitary step.
My perception is that they want to wait for the market to narrow before committing to any new shiny thing, but it's clearly still expanding. And my sense is that I really don't want to see the market narrow, as that might be a signal that the grand experiment just isn't going so well anymore.
By the way, it does look like this is coming, too:
> Kubernetes should replace the AWS/GCloud/AZ API. If you want to access object storage, Kubernetes should have a resource for that which Ops can bind to an S3 bucket, then applications go through the Kube resource. If you want a queue, there should be a resource.
> We want to make a choice and be stuck with it. We don't want to choose wrong and have to choose again.
I don't know anything about your organization.
But my take is that this isn't a quality that you see in healthy organizations. We're human. We can't see the future. In the best of cases, we ingest as much information as we can find and use it to make the best decision we can. And, truly, the best of cases never happens; but even if it did, new information gets discovered, the environment develops and changes, and that external change has to be met with internal change as well.
One of the phrases I've heard people in my organization say is something along the lines of "are we sure this is the right decision?" or "how can we be sure this is the best path forward?" That's a mindset I'm trying to move people away from. The better question is "how are we accounting for a need to change in the future?" If you view a system as "X", then changing "X" becomes very hard. If you view a system as "X+Y+Z", then you can ask "how can we change Y to V without throwing away all the work we did on X and Z?" And then, in 12 months when you have X+V+Z, maybe you want to change X to W. And so on. That's continual improvement and true agility.
It's damn hard, both in implementation and in changing mindsets from "we want this perfect thing from day 1" to "it's alright if it isn't perfect; it's more important that we can easily change it." And actually, the hardest part is convincing people that this Is Not Optional; the most productive, highest-performing organizations on the planet are the ones who know how to do this, and they'll eat your lunch if you're not ready. Maybe in 6 months, maybe in 50 years, but it'll happen.
> But my take is that this isn't a quality that you see in healthy organizations.
It might help if I told you something about my organization. I'm in university IT. We're not building a product; the product is the education, and we merely support that (the students, the research, and the administrative efforts) with technology.
That's part of the problem, to be honest: the organization will not rise or fall on the tireless efforts or minor failings of IT. We like to shoot for perfect, and we want to do the best thing. But if it's a choice between a decision that leadership sees as a little shaky or uncertain and a more conservative choice that doesn't have as many bells and whistles but that we're sure we can live with for a long time, they'll have us go the conservative route every time, so we can get this one important thing off our plate and get back to the central business of the university.
I appreciate the way you're decomposing the issue, because I think you're right about all of this. The problem all this time has been (and I've started to recognize it more and more):
1. we propose Kubernetes, knowing that it solves a lot of problems for us, right out of the box. X is Kubernetes.
2. Leadership asks "what problem does X solve" looking for the big show-stopper answer that says "well obviously, we have to solve that. We'll make it a priority!"
3. For each "well obviously", the honest answer is, "X doesn't really solve that without Y and Z on top."
We don't even really get to the point in the conversation where we're worried about picking the wrong X. But it's in the back of the head of everyone who has done any research on Kubernetes: there are so many flavors to choose from, so how do we even know that Y and Z will work when we get there, if we start with X first?
Fortunately I think the glacial tides are turning, but they don't call it an "institutional pace" for no good reason.
There weren't ever 101 different ways[1] for you to get your OpenStack environment provisioned.
101 products from 81 certified vendors, 33 completely separate, independent, certified hosted environments, and every other entry on the CNCF listing, with no fewer than 12 different ways to install it yourself on resources that you own one way or another. I think we're past Landmark status already; OpenStack never did all this.
As a developer, I feel I am ready to go with this approach.
It's my ops teams that can't cope with that degree of choice. They're apprehensive about choosing, knowing that with 90+ options, almost all of them acceptable to me and my team, there's a non-zero risk that we're going to choose the wrong one! We'll have to switch. And who knows why? We'll find out, if we settle on one.
The operational expense of setting up Kubernetes is already great enough for us. The prospect of ultimately learning that we picked the wrong one and then needing to switch to another seems, to them, too large, I think.
Why not wait for the market to die down a little bit, or for that list to get just a little bit shorter first? Seems like I'll be waiting forever. If I narrow it down to only options that have been certified since K8S v1.9, maybe the choices will look a little bit more constrained.
The migration between hosted Kube clusters is trivial compared to any other product migration in the history of PaaS. Of course you'll still need to be thoughtful and delicate, but between EKS, AKS, GKE, Kubeadm-on-baremetal... The Kube API doesn't change a whole lot. Some annotation changes, but...
Just choose the cloud vendor you already trust the most or boot up a cluster on your own. It's just a set of systemd services. The level of fear regarding K8S offerings among developers is staggering and I _cannot_ figure out where it comes from. What would you "get wrong" that can't be easily changed? There aren't that many deep engineering pits to get yourself into that would take ages to get out of...
The Kubernetes API is the same everywhere but setting up the underlying infrastructure can be difficult.
In particular, there are a number of options for the networking layer and the one you choose, and the way you configure it, can have significant performance implications.
It’s not entirely fair to call it a black box but I do totally get what you’re saying. Debugging involves stitching together logs from the api-server, kube-proxy, scheduler, and kubelet itself.
You also have to be extremely careful with your affinity / anti-affinity rules. The interactions get really complicated really fast.
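For anyone who hasn't hit this yet, the kind of rule in question looks like the fragment below in a pod template spec (the "app: my-app" label is just a placeholder). This one is a hard anti-affinity that refuses to co-schedule two replicas on the same node, and it's exactly the sort of constraint that interacts badly once you stack a few of them:

    # Fragment of a pod template spec (spec.template.spec); the label is a placeholder.
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: my-app
          topologyKey: kubernetes.io/hostname   # never place two such pods on the same node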
I prefer to talk about this in terms of taints and planes.
Almost everyone already knows the Control Plane. That's where your Kube API is served from, and it potentially includes the etcd service maintaining the cluster state. The language has changed here, but this is still the most familiar example for anyone who has run a Kube cluster at any scale.
node-role.kubernetes.io/master:NoSchedule
This taint on a node means that only pods which tolerate the taint may occupy that node. This is how you get so-called "dedicated masters", also known as your Control Plane. You can remove the taint in a single-node cluster to get a "minikube-like" experience without necessarily fanning out, while at least keeping the option there. I think it's better to start with only a single node; that's how I've gained much of my experience, at least. All of the reliability calculations are much easier when you don't need to divide by anything.
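As a concrete sketch, a pod that's allowed onto those dedicated masters carries a matching toleration, and the comment shows the single-node untainting I mean (the node name is a placeholder):

    # Toleration in a pod spec that lets the pod schedule onto tainted masters.
    tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"
    # On a single-node cluster you can instead drop the taint entirely
    # (placeholder node name; the trailing "-" removes the taint):
    #   kubectl taint nodes my-node node-role.kubernetes.io/master:NoSchedule-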
Practically nobody but cloud vendors really needs to care about masters or the Control Plane anymore, since so many cloud vendors have a cost-saving solution called "Managed Kubernetes", where you just consume the Kubernetes API and pay for your own application workloads, getting highly available masters at low (or no) cost.
But that's the most basic way I can think of to explain or set up anti-affinities. You can set up taints and tolerations for anything: say you have your own dedicated "Routing Mesh" or nodes that are used as load balancers; there will most certainly be a taint you can use for that, or feel free to invent and supply your own. (Another thing we don't need to do, since cloud vendors provide LB services. At some layer you'll still find a place for this concept if you think about the architecture of your system or product, I suspect. But all of my boilerplate examples are stale.)
I think affinities are usually handled in other ways, like StatefulSets, but I'm not really sure how to explain pod affinities. I'm still avoiding most stateful workloads, so my biggest piece of advice is to be sure that you're setting up resource quotas (limits / requests) and that you have a system in place for refining those definitions. If you make sure you do that, then out-of-the-box Kubernetes will take care of a lot of the rest for you. Pods will have an affinity for nodes that have more resources available for them, so long as you remember to give the scheduler an estimate, and maybe also a hard cap, of the resource usage for each pod deployed.
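In manifest terms, that estimate and hard cap are just the requests/limits block on each container; the numbers below are arbitrary placeholders:

    # Container-level fragment; the CPU/memory values are arbitrary placeholders.
    resources:
      requests:          # the estimate the scheduler uses to place the pod
        cpu: "500m"
        memory: "256Mi"
      limits:            # the hard cap enforced at runtime
        cpu: "1"
        memory: "512Mi"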
This was the major advantage of early Kubernetes when it first started putting CoreOS's fleetd out of business: resource-aware scheduling. You can be explicit about node affinity with nodeSelectors, like "the database server should run on the only node in the node pool which provisions its nodes with 24 cores." But if your next-largest machine has only 8 cores, it might have been enough to just say in a resource request, "the database pod itself requires at least 12 cores." The effect is not exactly, but almost, the same. You might also prefer to use a taint/toleration/node-selector combo to be sure that no other workloads wind up on that node and cause performance cross-talk with the database.
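A rough sketch of that combo is below; the node label, taint, and image are all names I'm making up for the example:

    # Pod-spec fragment pinning a database pod to a dedicated node pool.
    # Assumes the node carries the label "node-pool: db-24core" and the taint
    # "dedicated=database:NoSchedule" (both invented for this example).
    nodeSelector:
      node-pool: db-24core
    tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "database"
      effect: "NoSchedule"
    containers:
    - name: db
      image: postgres:12            # placeholder image
      resources:
        requests:
          cpu: "12"                 # "requires at least 12 cores"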
k8s is a nightmare to run. You need a good clusterfuck of Ansible scripts and carefully tuned... everything... to get something that fits your needs.
I've written about container systems before, and I still think the industry needs a way better solution. k3s and some of the stuff coming out of Rancher might be the better way to go.
"clusterfuck" and "ansible scripts" tend to go hand-in-hand no matter what you're deploying, though - at least in my experience. That said, Ansible scripts used by OpenShift take the cake so far for me, some of the worst I ever encountered.
But guess what? You don't need all that, which is exactly what things like k3s are exploiting.
I don't think that statement was directed at end-users of Kubernetes; it was directed at the people who operate Kubernetes as a service like AWS, Oracle, Cisco, or RedHat.
A lot of the OpenStack engineering processes, and the tools supporting them, were actually really strong for a ground-up open source project. The problem was that a lot of it was engineering for engineering's sake, without a care in the world for usability or what someone would actually use the end result for.
Devstack is a great example of this: it became a critical building block for continuous integration, but it also meant testing was more focused on whether you could run it on a developer laptop than on ever getting an actual cluster working.