“Even then, if you’re using Kubernetes, you probably won’t succeed, because it isn’t in Google’s best interest to let anyone else actually compete with GKE.”
Google succeeds if the winner is anybody but AWS's proprietary solutions. If they can groom a healthy ecosystem of open source and commercial solutions that target Kubernetes, then the tremendous advantage of AWS being a one-stop shop for any service you can imagine starts to dwindle. As of now, Amazon offers a solid compute environment and services galore, which is hard to compete with.
The author didn’t do much at all to tie what happened to OpenStack with Kubernetes. K8s is deployed at scale by all the cloud providers. Both Google and Microsoft solely run containers on it in their public cloud (while AWS still has their own orchestrator). That never happened at scale with OpenStack.
And regardless of how you feel about Google, Azure has a very strong vested interest in K8s success.
Idk, isn't the value of cloud computing supposed to be reduced spend on ops? Then k8s, in my opinion, doesn't meet that criterion because of its sheer complexity - you very much need ops staff capable of diagnosing and fixing problems, not to mention the lock-in. OTOH if you just buy RDBMS, MQ, identity, and compute service tiers directly, then you might have success with cost as an SME.
I think it's more meant to keep spending / complexity controllable in the long term (i.e. linear instead of exponential).
Howeverrrrrr you will need to make sure you are actually going to need that larger scale at some point in the future. Otherwise you are probably better off with a simpler solution like Terraform.
I think the k8s madness is a bit like the NoSQL craze in that sense.
I'm not sure about that. I thought cloud was primarily about consolidation of infrastructure, development, and operational headcount, which does price in some sort of OpEx reduction, but the primary goal being the reduction (elimination, in some cases) of CapEx from a pure infrastructure perspective.
Every place I've been (or "visited") has seen OpEx increase but it's more quantifiable than work-hours which are typically tracked with project buckets in some workforce tracking application.
> Google succeeds if the winner is anybody but AWS's proprietary solutions.
Maybe? But don't forget Azure is bigger than Google's cloud services, so they would also benefit from people not using AWS proprietary solutions, and there are many smaller companies in this space as well.
I don't think Kubernetes necessarily helps Google at all in the marketplace itself. Maybe with talent and admiration, but almost everyone offers Kubernetes deployments now.
Is Azure really bigger? Because if you ask Azure, they claim to be bigger than AWS as well. They are obviously lumping in Office 365 and all of their other cloud software.
Yes, Azure is universally recognized as being far larger than Google's cloud offering (at least twice the size). Google is trailing badly behind AWS and Azure, it's why Diane Greene was forced to step down. There is an endless parade of research and prominent articles pointing this out:
And Amazon lumps WorkDocs and WorkMail and Chime into their AWS line item as well. It's the same thing. Except Amazon's just sucks, so they don't generate nearly as much revenue.
hmm. “stock” k8s still has limitations that make it look like a toy (how many nodes can you have in a cluster again? is that hyperscale?).
the pattern I've seen is that teams/companies that go into k8s hoping it's going to solve the problems they have end up replacing some of their problems (which k8s does address) with the operational burden of keeping k8s up-to-date and stable. the struggle to understand new things being rolled out and keeping up with the tech is real (another way of saying this is that k8s is nowhere close to having a story around deployment and maintenance that would make it easy to operate - big players can figure it out, but chances are you are not a big player, and the big players want you to pay for their shit - pretty much like Rackspace wanted you to pay for OpenStack)
another aspect that people don't seem to get is that k8s is what Google is doing (or used to do 5 years ago) internally, but without the supporting infrastructure/experience that Google has internally.
> I am publishing this now in the hope that it can serve as a warning to everyone out there who is investing in Kubernetes. You’ll never be able to run it as effectively, at the same scale, as GKE does – unless you also invest in a holistic change to your organization.
This is a meaningless argument. I don't have to run Kubernetes at the same scale as GKE to develop--I just run minikube, which runs very well on Linux hosts. When I get ready to deploy there is a pick of environments to host on because Kubernetes apps are largely portable.
OpenStack has never achieved this level of accessibility.
1. I had a local OpenStack environment. Most of what I needed for app dev I could do there.
2. A lot of app devs aren't happy with Kubernetes and talk about it. Sometimes in the comments right here on Hacker News.
Kubernetes has a lot of parallels when you drill down and look at it. Aeva isn't the only one talking about it.
Kubernetes isn't an application platform though, in the same vein as Heroku. That's what application developers want, and they're simply not going to get it. It was never supposed to be that.
Kubernetes competes with AWS, in a sense. It's a standard API for interacting with any cloud resource. I can give an AWS ASG an AMI, tell it to create 10 instances, and it will do it. I can give Kubernetes a Docker image, tell it to create 10 instances, and it will do it. You wouldn't expect application developers to enjoy creating AMIs, or maintaining them, or worrying about the global high availability of their 10 instances; they wouldn't enjoy that with AWS, and they won't enjoy it with Kube. And they shouldn't.
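To make that concrete, here's a minimal sketch of the Kubernetes side of that comparison (the name, labels, and image are placeholders) - a Deployment is roughly how you say "run 10 instances of this image":

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                        # placeholder name
    spec:
      replicas: 10                     # "create 10 instances"
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: example.com/web:1.0 # the Docker image you hand to Kubernetes

The controller then worries about placement and about keeping 10 copies alive, much like an ASG does for instances.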
The confusion comes from the fact that Kubernetes needs to be deployed somewhere; well, let's deploy it on AWS. And now there's this expectation that because we've created a layer on top of AWS, this layer should be closer to the application development process. It is! But not as close as it could be, or should be in a productive shop. Kubernetes isn't the endgame; it's just a better place to start.
There are two angles to this problem that I hope Kubernetes continues to see improvement on in the coming years:
First, cloud providers need deeper integration. Kubernetes should replace the AWS/GCloud/AZ API. If you want to access object storage, Kubernetes should have a resource for that which Ops can bind to an S3 bucket, then applications go through the Kube resource. If you want a queue, there should be a resource. This is HARD. But, over time I hope and do think we'll get there. You can already see some inklings of this with how LoadBalancer Services are capable of auto-provisioning cloud-specific LBs.
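You can already see the shape of it with the LoadBalancer case - a hedged sketch, where the AWS annotation is just one example of the provider-specific tuning that still leaks through today:

    apiVersion: v1
    kind: Service
    metadata:
      name: frontend-lb
      annotations:
        # provider-specific knobs still leak in via annotations;
        # this AWS one is just an example of the pattern
        service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    spec:
      type: LoadBalancer               # Ops binds this to a cloud LB; apps only see the Service
      selector:
        app: frontend
      ports:
      - port: 80
        targetPort: 8080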
Second, we need stronger abstractions on top of Kubernetes for application development. Projects like Rancher are doing some work in this regard, as well as Knative and many others. Even, say, the Google Cloud console is an example of this, as it does a great job of hiding the internals behind friendly forms and dialog boxes.
I've got a very keen eye on that project as it develops.
One thing I do think is: it feels like we should be looking at this a bit more generally, and saying things like "I need a Queue", not "I need an SQS Queue", allowing the operators to bind the Queue generic to an SQS queue on the backend, then using the application-facing spec to assert things like "it has to be FIFO, it has to guarantee exactly-once delivery" etc. And if the backend cloud resource provider that is configured can't meet the requested spec, we get an error.
I don't know for sure if this would be better or worse. But for some generic cloud resources, like Object Buckets, Queues, PubSub, or SQL Databases, we can arrive at a commonly accepted set of generic abstract qualities that an exemplary implementation of a system which says it's a "Thing" should assert (ex: with Object Buckets, characteristics like regional redundancy, consistency, lifetime policies, immutable object locking, etc).
The interesting thing there is that now you've got a common base API for, well, common things that any application would need. Open source projects could flourish around saying "Check out NewProjectX, it's a Kubernetes-compliant Object Storage provider". The backend doesn't have to be a cloud provider; it could run right on the cluster or on a different machine you own, just like how load balancers can work (see: MetalLB).
Obviously I don't expect AWS to publish an API that divorces the implementation from the spec, but I think we should think about it as a community. And also, not every cloud resource the Big 3 provide would make sense to be "more generic"; for example, having a generic "NoSQL Database" provider is far too implementation specific to account for all the differences between, say, Dynamo and Firestore. So the work AWS is doing on that project is ultimately valuable.
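Purely as a sketch of the kind of resource I'm imagining - nothing like this exists as a standard today, and the API group, kind, and every field below are made up:

    apiVersion: resources.example.io/v1alpha1    # hypothetical API group
    kind: Queue                                  # hypothetical generic resource
    metadata:
      name: orders
    spec:
      delivery: exactly-once                     # made-up fields asserting required qualities
      ordering: fifo
      retentionDays: 4
    # an operator / binding layer (not shown) would map this onto SQS, Pub/Sub,
    # or an in-cluster broker - or reject it if the configured backend can't
    # meet the requested spec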
> Kubernetes isn't the endgame; it's just a better place to start.
> Kubernetes isn't an application platform though, in the same vein as Heroku.
This, exactly
> That's what application developers want, and they're simply not going to get it.
... that's exactly what we want, and I think the more perceptive among us doing the deciding want to choose one that is built on Kubernetes, or at least have Kubernetes ready for when one comes along, basically for the reasons you highlighted. It seems to be the winning standard, put up against AWS. It's a major improvement over the old model of how Ops has handled provisioning resources.
> Kubernetes is for people building platforms. If you are a developer building your own platform (AppEngine, Cloud Foundry, or Heroku clone), then Kubernetes is for you.
I think there are enough different choices for that "top layer of the dev stack" now, which DOES provide developers with the kind of experience they/we want, while usually protecting us from the underlying infrastructure like deployment YAML and service/ingress, that it really is a realistic concern that we'll choose the wrong one.
We want to make a choice and be stuck with it. We don't want to choose wrong and have to choose again. (Especially if we're planning to buy a support contract, which we almost definitely are. But how can we even predict which stack layer vendor we'll ultimately need support from?)
Our IT moves at an institutional pace, and guidance councils seemingly prefer we have a comprehensive plan in place before we take the first single solitary step.
My perception is that they want to wait for the market to narrow before committing to any new shiny, but it's clearly still expanding, and my sense is that I really don't want to see the market narrowing (as that might be a signal that the grand experiment just isn't going so well anymore.)
By the way, it does look like this is coming, too:
> Kubernetes should replace the AWS/GCloud/AZ API. If you want to access object storage, Kubernetes should have a resource for that which Ops can bind to an S3 bucket, then applications go through the Kube resource. If you want a queue, there should be a resource.
> We want to make a choice and be stuck with it. We don't want to choose wrong and have to choose again.
I don't know anything about your organization.
But my take is that this isn't a quality that you see in healthy organizations. We're human. We can't see the future. In the best of cases, we ingest as much information as we can find and we use that to make the best decision we can. And, truly, the best of cases never happens, but even if they did: New information is discovered. The Environment develops and changes. And that change has to respond with internal change as well.
One of the phrases I've heard people in my organization say is something along the lines of "are we sure this is the right decision?" or "how can we be sure this is the best path forward?" That's a mindset I'm trying to move people away from. The better question is "how are we accounting for a need to change in the future?" If you view a system as "X", then changing "X" becomes very hard. If you view a system as "X+Y+Z", then you can ask "how can we change Y to V without throwing away all the work we did on X and Z?" And then, in 12 months when you have X+V+Z, maybe you want to change X to W. And so on. That's continual improvement and true agility.
It's damn hard, in both implementation and in changing mindsets from "we want this perfect thing from day 1" to "it's alright if it isn't perfect; it's more important that we can easily change it." And, actually, the hardest part is convincing people that this Is Not Optional; the most productive, highest-performing organizations on the planet are the ones who know how to do this, and they'll eat your lunch if you're not ready. Maybe in 6 months, maybe in 50 years, but it'll happen.
> But my take is that this isn't a quality that you see in healthy organizations.
It might help if I told you something about my organization. I'm in University IT. We're not building a product, the product is the education, and we merely support that with technology (the students, the research, and the administrative efforts.)
That's part of the problem, to be honest: the organization will not rise or fall due to the tireless efforts or minor failings of IT. We like to shoot for perfect, we want to do the best thing. But if it's a choice between making a decision that leadership sees as a little shaky or uncertain, versus making a more conservative choice that doesn't have as many bells and whistles but that we're sure we can live with for a long time, they'll have us go the conservative route every time, so we can get this one important thing off of our plate and get back to the central business of the University.
I appreciate the way you're decomposing the issue, because I think you're right about all of this. The problem all this time has been (and I've started to recognize it more and more):
1. we propose Kubernetes, knowing that it solves a lot of problems for us, right out of the box. X is Kubernetes.
2. Leadership asks "what problem does X solve" looking for the big show-stopper answer that says "well obviously, we have to solve that. We'll make it a priority!"
3. For each "well obviously", the honest answer is, "X doesn't really solve that without Y and Z in addition."
We don't even really truly get to the point in the conversation where we're worried about picking the wrong X. It's in the back of everyone's head, who has done any research on Kubernetes. There are so many flavors to choose from, how do we even know that Y and Z will work when we get there, if we start with X first?
Fortunately I think the glacial tides are turning, but they don't call it an "institutional pace" for no good reason.
There weren't ever 101 different ways[1] for you to get your OpenStack environment provisioned.
101 products from 81 certified vendors. 33 completely separate, independent, certified, hosted environments, and every other different entry on the CNCF listing. With no less than 12 different ways to install it for yourself, on resources that you own one way or another. I think we're past Landmark status already, OpenStack never did all this.
As a developer, I feel I am ready to go with this approach.
It's my ops teams that can't cope with that degree of choice – they're apprehensive to choose, knowing that with 90+ options and almost all of them acceptable to me and my team, there's non-zero risk that we're going to choose the wrong one! We'll have to switch. And who knows why? We'll find out, if we settle on one.
The operational expense for us to set up Kubernetes is already great enough. The prospect of ultimately learning that maybe we picked the wrong one, then needing to switch to another one, for them, seems too large, I think.
Why not wait for the market to die down a little bit, or for that list to get just a little bit shorter first? Seems like I'll be waiting forever. If I narrow it down to only options that have been certified since K8S v1.9, maybe the choices will look a little bit more constrained.
The migration between hosted Kube clusters is trivial compared to any other product migration in the history of PaaS. Of course you'll still need to be thoughtful and delicate, but between EKS, AKS, GKE, Kubeadm-on-baremetal... The Kube API doesn't change a whole lot. Some annotation changes, but...
Just choose the cloud vendor you already trust the most or boot up a cluster on your own. It's just a set of systemd services. The level of fear regarding K8S offerings among developers is staggering and I _cannot_ figure out where it comes from. What would you "get wrong" that can't be easily changed? There aren't that many deep engineering pits to get yourself into that would take ages to get out of...
The Kubernetes API is the same everywhere but setting up the underlying infrastructure can be difficult.
In particular, there are a number of options for the networking layer and the one you choose, and the way you configure it, can have significant performance implications.
It’s not entirely fair to call it a black box but I do totally get what you’re saying. Debugging involves stitching together logs from the api-server, kube-proxy, scheduler, and kubelet itself.
You also have to be extremely careful with your affinity / anti-affinity rules. The interactions get really complicated really fast.
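Even the "simple" case - keeping replicas of the same app off the same node - already reads like this (a sketch; the label is a placeholder):

    affinity:                          # in the pod spec
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: web                 # placeholder label
          topologyKey: kubernetes.io/hostname   # "not on the same node as other 'web' pods"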
I prefer to talk about this in terms of taints and planes.
Almost everyone already knows the Control Plane. That's where your Kube API is served from, and it potentially includes the etcd service maintaining the cluster state. The language has changed here, but this is still the most familiar example for anyone who has run a Kube cluster at any scale.
    node-role.kubernetes.io/master:NoSchedule
This taint on a node means that only pods which tolerate the taint may occupy the node. This is how you get so-called "dedicated masters", also known as your Control Plane. You can remove the dedicated taint in a single-node cluster to get a "minikube-like" experience without necessarily fanning out, but at least keeping the option there. I think it's better to start with only a single node; that's how I've learned much of my experience, at least. All of the reliability calculations are much easier when you don't need to divide by anything.
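For example, a pod that genuinely belongs on those masters carries a matching toleration in its spec (and on a single-node cluster you'd just remove the taint from the node instead):

    tolerations:                       # in the pod spec
    - key: node-role.kubernetes.io/master
      operator: Exists                 # tolerate the taint regardless of its value
      effect: NoSchedule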
Practically nobody but cloud vendors really need to care about masters or Control Plane anymore, since so many cloud vendors have a cost-saving solution called "Managed Kubernetes" where you just consume the Kubernetes API and pay for your own application workloads, receiving the masters with High-Availability at low (or no) cost.
But that's the most basic way to explain or set up anti-affinities that I can think of. You can set up taints and tolerations for anything, say you have your own dedicated "Routing Mesh" or nodes that are used as load balancers, there'll most certainly be a taint you may use for that, or feel free to invent and supply your own. (Another thing we don't need to do, since cloud vendors provide LB services. At some layer you'll still find a place for this concept if you think about the architecture of your system or product, I suspect. But all of my boilerplate examples are stale.)
I think affinities are usually handled in other ways, like StatefulSet, but I am not really sure how to explain pod affinities. I'm still avoiding most stateful workloads, so from me the biggest advice is to be sure that you are setting up resource quotas (limits / requests) and that you have a system in place for refining those definitions. If you make sure you do that, then out-of-the-box Kubernetes will be taking care of a lot of the rest for you. Pods will have an affinity for nodes that have more resources available for them, so long as you remember to give the controller an estimate and maybe also hard cap of the resource usage for each pod deployed.
This was the major advantage of early Kubernetes when it first started putting CoreOS's Fleetd out of business. Resource-aware scheduling. You can be explicit about node affinity with NodeSelectors, like "the database server should run on the only node in the node pool which provisions its nodes with 24 cores." But if your next-largest machine has only 8 cores, it might have been enough to just say in a resource request, "the database pod itself requires at least 12 cores." The effect is not quite exactly but almost/basically the same. You might also prefer to use a taint/toleration/node selector combo to be sure that no other workloads wind up on that node which might cause performance cross-talk with the database.
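A hedged sketch of what both options look like in a pod spec - the label, image, and numbers are just illustrative:

    spec:                              # pod spec fragment
      nodeSelector:
        node-pool: db-xlarge           # explicit placement: "the 24-core pool"
      containers:
      - name: db
        image: postgres:11
        resources:
          requests:
            cpu: "12"                  # "needs at least 12 cores" - the scheduler
            memory: 48Gi               # only considers nodes with this much free
          limits:
            cpu: "16"
            memory: 64Gi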
k8s is a nightmare to run. You need a good clusterfuck of ansible scripts and carefully tuned ... everything ... to get something for your needs.
I've written about container systems before, and I still think the industry needs a way better solution. k3s and some of the stuff coming out of Rancher might be the better way to go.
"clusterfuck" and "ansible scripts" tend to go hand-in-hand no matter what you're deploying, though - at least in my experience. That said, Ansible scripts used by OpenShift take the cake so far for me, some of the worst I ever encountered.
But guess what? You don't need all that, which is exactly what things like k3s are exploiting.
I don't think that statement was directed at end-users of Kubernetes; it was directed at the people who operate Kubernetes as a service like AWS, Oracle, Cisco, or RedHat.
A lot of the OpenStack engineering processes, and tools supporting them, were actually really strong for an open source project from the ground up. The problem was that a lot of it was engineering for engineering's sake without a care in the world for usability or what someone would actually use the end result for.
Devstack is a great example of this, because it became a critical building block for continuous integration, but it also meant testing was more focused on whether you could run it on a developer laptop than on ever getting an actual cluster working.
I worked a little bit with OpenStack about four years ago, and my impression was that it was very design by committee. Design by committee doesn't work too well in software: https://sourcemaking.com/antipatterns/design-by-committee
I think a lot of the enterprise companies supporting OpenStack, like Mirantis (https://en.wikipedia.org/wiki/Mirantis), realized this one way or another, got themselves acquired, and then used the new funding to pivot to Kubernetes or another open-source IaaS offering: https://www.mirantis.com/
Without any promise of enterprise support, there's really no way for the large companies targeted by OpenStack to adopt it and make that adoption sticky. So that's how it died.
The practical end result was that you didn't buy "OpenStack", you bought "Mirantis OpenStack" or "Juniper Openstack" (... that one was so broken...) and so on, and there wasn't much portability between them.
The basics worked across all distributions, as far as I know (the openstack CLI, which was built on the shared HTTP API).
Mostly I had problems with the classic deployment, debug, develop cycle. Reporting bugs felt like throwing time out of the window, and debugging through overlay networks, über-verbose Python daemon logs, and RabbitMQ madness was more of a surreally dark exercise in futility than a rewarding experience.
Most of the problems I experienced were due to the fundamental trade-offs taken during the development of OpenStack. And these are slowly being addressed, but ... it was too little, too late - at least in our case.
My opinion: it grew into a gross, large set of Python and APIs that, when combined with multiple implementations, extensions, and company-specific customizations, made it an unmaintainable mess that was difficult to deploy and code around.
So, although the author compares it with the growing k8s project out there now, at least k8s is more clearly stewarded, more developer-oriented instead of ops-only (with code quality to match), and doesn't feel as hamstrung by environments and dependencies (just try to run a little OpenStack setup on your laptop for development... very annoying for a project of such age with so many companies' hands in the pot).
I wanted to like OpenStack but at the time I felt like it didn't... Let me?
It did everything complicated in a relatively straightforward manner, but I didn't want dev to be complicated. I wanted my dev simple, and I felt like they just weren't really interested in that (strangely enough).
Could be wrong though. Never found a way to justify adoption to the team.
Not clearly enough, IMO. Not very "clean" to begin with, it's accumulating cruft at an alarming pace, and doesn't drop much legacy over time (a painful, but necessary step in fighting code entropy). It seems to have inherited Google's internal modus operandi: launch new shit and then let it rot.
If you develop on AWS you get a supported experience for a long LONG time (see simpleDB which I used and still works even though they don't seem to market it). Same thing with old instance types. S3 etc etc.
With OpenStack, at least a year or two ago - who can seriously stay on top of what is going on there? You could develop something 3-4 years ago and getting it going on the latest OpenStack = total pain. What exactly OpenStack was, was also muddy - lots of ifs/buts/this 5-year-old code that ran on vendor X's OpenStack doesn't seem to run today on vendor Y's.
Didn't spend much time on OpenStack though - and I know the hype train was / is huge (AWS killer etc). My own sense: a lot of folks freaked out about AWS and all WANTED OpenStack to work so they had some big gun to blow up AWS with - but they didn't seem to spend much time talking to actual customers / developers, while AWS certainly did.
> see simpleDB which I used and still works even though they don't seem to market it
They don't market it, and if you create an AWS account after it was deprecated in favor of Dynamo you'd basically never know it existed except for some footnotes in the Dynamo documentation referencing its predecessor.
Which is fine; hats off to AWS for maintaining it for customers for so long.
I've never seen an openstack implementation that wasn't horrible. The reason more people use k8s is it's actually less useful than openstack, which makes it more opinionated, which makes its implementations more uniform. Plus, most people deploy it either on top of an openstack or other cloud platform.
You should not build your own cloud platform; that much is obvious. It's less obvious that you should not build your own k8s, because it seems simpler and more useful.
I think the battle for running virtual machines in the enterprise was already won by VMware, and so OpenStack became a niche for service providers wanting to do NFVi workloads that had the resources to afford dedicated teams to run OpenStack.
Kubernetes, on the other hand, capitalized on the need for running and orchestrating containers. Kubernetes also got a few things right, such as a well-documented and prescriptive set of tools one could use to get dev and production clusters up.
On a separate note, having worked on OpenStack I can also attest that the code was gross - not so much in Kubernetes.
For me the reason why VMware won is outlined in the article. Case in point: running OpenStack with shared storage on a SAN is a major PITA, while for VMware it is the recommended way to run things.
I believe that's a bit unfair. Different OpenStack components had different levels of quality.
The biggest problem I saw was that little to no thought was put into what the experience of an operator would be. It looked more like a playground / place to experiment and learn than something you would bet the farm on.
If someone would have cared enough to holistically drive this across the whole platform I think this could have gone in a different direction.
We are working on a project where we plan to manage our new HPC system with OpenStack. This will replace 3 legacy HPC systems that were managed with proprietary management systems. Due to shifting requirements from our customers (scientists), we decided to move to a cloud framework where probably still the majority of resources is dedicated to a batch scheduling system, but it would allow us to also provide more cloud-like services (JupyterHub, RStudio, databases, etc.)
It's quite an ambitious project and we (4 engineers) basically spent the last year understanding the ins and outs of OpenStack. We also went all in with integrating all kinds of datacenter components into OpenStack (NetApp, SDN, DNS, etc.)
Some lessons learned so far:
- OpenStack is very complex
- It's less of a product and more of a framework, and you need a dedicated engineering team with a cross-cutting skill set
- You definitely need a dev/staging environment to test upgrades and customizations
- Some of the reference implementations of OpenStack services (SDN) are fine for small deployments, but if you can replace them with dedicated hardware/appliances, you should do that.
>When you’re looking at other cloud products, think about similar conflicts of interest that might be affecting your favorite spokespersons today… (I’m looking at you, kubernetes)
Funny enough I'm deploying OpenStack right now for us as an internal playground. It's decent - but hell, the learning curve is nasty and the documentation is incomplete. Many things I could only get working after asking on IRC and waiting hours for a reply.
But still, it's better than having to manage KVM by hand and cheaper than buying VMware.
OpenStack was an extremely ambitious project, which requires so much cooperation and interoperability between so many companies that I believe it was doomed from the start.
In 2015 my company purchased "Flexpod" which is a solution that's certified by VMware, Cisco, and NetApp to work together. The result is nothing but a bunch of back and forth finger pointing with support, and even a critical vulnerability will take 6+ months to get patched and certified between all the different vendors.
I personally like the Ansible approach where each Storage/Compute/Network vendor provides APIs for management of their devices, and Ansible is just the glue between them.
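A rough sketch of that glue pattern - the endpoint and payload are invented for illustration; in practice you'd use the vendor's certified modules/collections rather than raw HTTP:

    - name: Create a volume via a storage vendor's management API (illustrative)
      hosts: localhost
      gather_facts: false
      tasks:
        - name: POST to the (hypothetical) array endpoint
          ansible.builtin.uri:
            url: "https://array.example.internal/api/v1/volumes"   # made-up endpoint
            method: POST
            headers:
              Authorization: "Bearer {{ storage_api_token }}"
            body_format: json
            body:
              name: app-data-01
              size_gb: 500
            status_code: 201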
TL;DR getting major tech vendors to play nice together is hard.
K8s is going in that direction: providing an API for storage providers to implement and letting them drive the implementation, versus trying to offer a monolithic, all-batteries-included solution.
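Concretely, that's the CSI model: the cluster only references a driver by name in a StorageClass, and the vendor ships the driver. A sketch using the AWS EBS CSI driver as the familiar example (parameters vary per driver):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast-ssd
    provisioner: ebs.csi.aws.com       # vendor-supplied CSI driver does the real work
    parameters:
      type: gp3                        # driver-specific knobs stay with the driver
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer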
> OpenStack was an extremely ambitious project, which requires so much cooperation and interoperability between so many companies that I believe it was doomed from the start.
OpenStack was doomed from the start, but the reasons were subtly different than the difficulties of integrating software from multiple vendors.
However difficult OpenStack was to get (and keep) running, it would have been worth it if, once you (or anyone else) got an OpenStack instance up and running, a developer could migrate their app (from or to any other OpenStack cloud) with no code or configuration changes.
That was never really possible, since every OpenStack-based cloud provider insisted on adding their own special sauce to the developer experience. That was, for most of them, the whole point of participating in OpenStack: Sharing the cost of developing the code necessary for building a public or private cloud, but locking in their customers just as firmly as AWS was doing.
As a result, none of those individual cloud providers ever got big enough to give AWS serious competition, and there was never a realistic portability story that could give them collective weight in the market.
The simplest way to ensure developed and deployed applications and services were truly (and trivially) portable between the different OpenStack cloud providers would have been to commit OpenStack to API-compatibility with AWS (which would also have given AWS customers a clear migration path to OpenStack), but this suggestion was rejected outright at the outset of the project.
Having real portability between different OpenStack-based cloud providers and AWS would mean that competition would largely have been on price (and quality/reliability) rather than features, both lowering AWS' margins and growing the market faster than Amazon could capture, as well as enabling higher-level businesses such as marketplaces with dynamic spot-pricing and rapid migration of jobs between providers.
Unfortunately, that isn't how it shook out. None of the cloud companies behind OpenStack wanted to compete with AWS on price instead of features, which meant they stuck their heads in the sand, kept their attractive margins, but effectively ceded the bulk of the market to Amazon. Prices stayed high enough that the market grew only about as fast as Amazon/MS/Google could collectively add capacity (roughly maintaining AWS market share), rather than the hockey-stick growth that would have happened if everyone could have got in on the act (like web hosting did in the 90s).
I've been using AWS for years, and had never heard of OpenStack until this year.
The only reason I'm aware of it, is because I'm studying for a degree part time - and OpenStack is taught in one of the modules. It's a shame really that they only mention AWS and advise against using it in case you accidentally spend money.
As an outsider, the concept of openstack was always really appealing but it felt like RackSpace’s answer to AWS. Never really learned more about it than what was on their website but that was always the impression I got.
I think the question is not "What happened to OpenStack" but rather, "Is OpenStack still total garbage?"
I've been at two companies that attempted to go down the OpenStack route. One wanted to start a cloud offering to their clients and hemorrhaged tons of money trying to just keep OpenStack stable. We couldn't even run our basic Logstash offerings on our OpenStack cluster without them having bizarre performance issues.
We had a really good manager too who had accounts on every other provider (Rackspace, RedHat, Canonical .. all the big ones) and time and time again he was like, "What is this? How are they doing this.." and we just figured they used a ton of specialized proprietary plugins they just weren't open sourcing or a ton of special patch sets.
The second shop had tried moving onto an OpenStack cluster to save on AWS prices. It could never run anything reliably, and they scrapped the entire project and re-purposed all the servers for DC/OS, which was super nice and reliable and which every team migrated hundreds of services onto.
Interesting that you cite stability concerns. I am not sure that's the case anymore.
My employer runs 5-6 complete openstack environments and those things have never had an unplanned outage that I'm aware of. My stuff hasn't ever gone down, I know that.
Back in 2013 we had to evaluate existing cloud/VM platforms in order to replace plain KVM/libvirt and to support and enable growth. oVirt was garbage (missing installation ISOs, randomly broken install process, cluster nodes not communicating, etc.), OpenNebula was buggy, OpenStack seemed to be quite hard to grasp, Hyper-V was Windows only, and VMware was expensive as hell (even now the TCO Calculator gives us 4000+ EUR/VM - this must be a joke).
We run several VMs with Docker and our apps, manage dedicated servers and their networks (VLANs as provider networks in OpenStack), provide IPsec VPNs for tenants, and run Kubernetes clusters on OpenStack. We also manage several dedicated servers that are not managed by OpenStack for historical reasons and hopefully will migrate them to the cloud.
If OpenStack makes our heads hurt, it is due to the lack of documented design patterns. After all these years, the documentation is good for the initial deployment and IMHO for developers (either API consumers or contributors), but not so much for network engineers or system architects.
Some design choices are pretty crucial upfront and you will pay the price to change the design. We ended up modifying database records several times and then slowly rolled the changes out to the compute nodes. Recently the Open vSwitch flow tables were populated non-deterministically after some network changes, and we had to inspect the sources and even then did not understand why we were experiencing the issues.
But we never encountered stability issues that were not caused by the wild actions of an administrator.
So I guess the typical node is the beefiest you could get in order to justify the license price?
I had this conversation with a colleague who offers VMware-managed Windows VMs for his clients and he told me a similar thing, but on the other hand, he was shocked at the prices of our hardware (approx. $6k per server) and was seriously considering migrating to OpenStack.
A 2U or 4U server is $10k to $20k. You're going to fit all you can in the box, including a minimum of 512GB of memory.
It's not just about license costs though, it's about hardware costs and capacity management. You want to have as few servers as possible for a given capacity; it's easier to manage and cheaper. You must have VMware to abstract the hardware, a bit like AWS. You work with virtual machines and it packs them onto the hosts.
The last time I bought it was a few years ago. VMware was $5000 per node for the full package. There was a free edition limited to about 100 GB of memory, but without cluster management and live migration of running VMs (vMotion).
Do your VMs have very long uptimes? This is an indicator of the stable system fallacy.
If the system was implemented correctly the first time, resource use never exceeds capacity, maintenance always works properly, versions are always up to date, and the infrastructure (power, network, host, storage, cooling) never has problems, then the system appears perfectly stable. But introduce changes and errors with increasing frequency and you quickly find out how robust it actually is.
To me the problem with OpenStack is a strategic one. The project tries to automate the status quo, putting the compatibility burden on the automation. A PaaS alternative like Kubernetes, on the other hand, turns this on its head and puts the burden on the components of the applications. This is a much better long-term strategy for keeping complexity curbed.
Also, OpenStack was open to fake vendor openness, where vendors could make a compatible API with extensions. This doesn't help the system integrator in the long run.
I think if you're going to commit to an offering like this, you need a strong team and they should be able to debug such issues.
Your performance issues are only really going to be related to the VM tech, overlay network, storage layer, the orchestrator settings, or Logstash itself.
Given that you can switch these in and out, you can isolate the problem and replace the broken part. You can also trace the app to see what syscalls are taking so long.
You can have similar issues with pretty much any environment if your team can't debug that, and if that's the case you should probably go for a popular vendor supported solution, but you'll be in a sad place when the vendor doesn't have the staff to debug their solution, so pick carefully.
This post isn't supposed to sound insulting to you or your ex colleagues, just pointing out that there is a gulf of knowledge between the guys who can get things to work, and the guys who can tell you why something doesn't work, and this gulf only really presents itself when shit hits the fan.
> We couldn't even run our basic Logstash offerings on our OpenStack cluster without them having bizarre performance issues.
I'd really be interested in post-mortems. As long as you're not using SDN/overlay networks/weird plugins for Cinder instead of plain NFS, many components of OpenStack are nothing more but a config generator and deployer for core Linux iptables/bridges/KVM virtualization.
> It could never run anything reliably, and they scrapped the entire project and re-purposed all the servers for DC/OS, which was super nice and reliable and which every team migrated hundreds of services onto.
We're moving our stuff away from DC/OS as we're sick of the instabilities and especially the UI and configuration changing every release. It's been two and a half years of banana-ware for us.
We started with 1.8 and upgraded to 1.12 finally end of last year.
Our biggest pain point, next to the tendency of amok-running deployments leading to disks filled up with useless logs (leading once to a totally corrupted master after a weekend), was/is that the "official" Jenkins package is the ultimate PITA to upgrade, massively lags behind despite security issues (current: 2.150.3 - mesosphere/jenkins: 2.150.1!) and you can't even run Jenkins outside of DC/OS because it needs the Marathon shared library to work.
Another thing that we dearly missed was the ability to "drain" a node - for example, if I want to perform maintenance on a node but cannot shut it down right now because a service on the node is being used, then I'd like to at least prevent new jobs from being spawned on that node. Or, during system upgrades, stopping the resolvconf generator does not restore the original resolv.conf, leading to broken DNS; or, when specifying NTP servers by name, the NTP server could not be resolved at boot time (as the resolv.conf still referred to the weird DC/OS round-robin DNS), leading to DC/OS not wanting to start because the clock was out of sync...
k8s - first in our OpenStack environment (once I figure out how to get external connectivity) and then once all the deployment jobs written for DC/OS are migrated to k8s, the DC/OS nodes will be reprovisioned with k8s on bare metal.
No cloud for that project, contractual prohibition - everything must be kept in-house.
This way of writing (like a journalist) is so annoying. I much prefer the scientific style of writing, where the main idea is given in the first sentence and suspense is avoided as much as possible.
The very first thing on the page tells you that it's a personal blog consisting of "online ramblings," so I think it's a trifle unfair to complain about the style.
The author chose to write a first-person narrative on their personal blog. What you prefer doesn't really matter, except insofar as you can choose not to read the piece.
Ironically, what you describe is the way journalists used to write before the internet. Nowadays, everyone wants to bore their readers with badly written narratives and nonsensical tangents thinking it's good writing. I hate it too and wish journalists and others would take a journalism class.