
You might not like this answer, but... play with it. In our environment, we're compiling and deploying Kubernetes by hand (I mean, not entirely by hand, but you get the idea). It was daunting at first but is getting easier and easier all the time, and we have a better handle on how things work as a result. What's going on under the hood, what the mass of command line parameters actually mean, and so on.

The documentation is lacking. Badly. And doesn't seem to keep pace with the features that are added in every point release, or in the alpha/beta versions. A lot of what we've learned has been by experimentation, poking at things, following the k8s user discussions, watching Github, reading code.

Can you elaborate on the information you wish was documented but isn't?

Also, have you seen the box at the bottom of the docs that says "I wish this page ..."? It goes right into their issue tracker, which increases the likelihood of something getting fixed.


To be fair, I have not seen or used that link, but I will take note of it for the future - thank you, that's very helpful.

I should keep notes on specifics, so I apologize that I can't highlight a particular thing that I've been frustrated by in a given moment. I'll certainly say that the documentation is improving.

As a general item, I think my biggest struggle has been hitting a wall with the documentation - there are some things that are left almost as exercises to the reader, especially getting into more advanced topics (how does one do rolling upgrades or canary deployments in more esoteric situations, how might one do load balancing properly in a master-slave style configuration versus something more round-robin oriented, etc.)

And I don't like to levy a complaint without acknowledging that anything open source is ripe for improvement by contribution, but my professional situation prevents me from contributing myself. Trust that I would love nothing more than to help, not just make an empty observation or moan about it.


So I typed out this whole answer and only realized at the end that I should have asked: have you tried their Slack channel: https://kubernetes.slack.com/messages/kubernetes-users/ and/or their mailing list https://groups.google.com/forum/#!forum/kubernetes-users for getting answers to your specific concerns?

how does one do rolling upgrades or canary deployments in more esoteric situations

So there are a couple of moving parts to that: foremost, Kubernetes is only a tool, so I doubt there is The One True Way(tm) of doing canary deployments -- and I'm taking liberties with your comment in that I presume you don't mean the literal rolling upgrade, which is `kubectl rolling-update` and is very smart. Having said "it's just a tool," the first thing that sprang to mind when reading "canary deployments" was the health checking built into ReplicationControllers. There are several ways to check on the "health" of a Pod, and k8s will pull any Pod out of rotation once it has declared itself ineligible for service.
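For reference, the "declare itself ineligible" mechanism is the readiness probe on the Pod template. A minimal sketch (the image name, port, and health endpoint are placeholders, not from this thread):

```yaml
# Sketch: a ReplicationController whose Pods expose a readiness check.
# While the probe fails, the Pod is dropped from Service rotation
# without being killed.
apiVersion: v1
kind: ReplicationController
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: example/frontend:v1   # placeholder image
        readinessProbe:
          httpGet:
            path: /healthz           # assumed health endpoint
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
```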

If the QoS metrics are too broad for any one given Pod to have insight into, k8s has a very robust API through which external actors can manipulate the state of the cluster, rolling back the upgrade if it turns out the new deploy is proving problematic.
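One common pattern that builds on this (a sketch, emphatically not The One True Way): run a stable and a canary ReplicationController behind a single Service, with labels arranged so the Service selector matches both tracks while each RC manages only its own Pods. All names here are illustrative.

```yaml
# The Service selects on `app` only, so Pods from both the stable
# and canary controllers receive a share of the traffic.
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend          # matches stable and canary Pods alike
  ports:
  - port: 80
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: frontend-canary
spec:
  replicas: 1              # small slice of traffic
  selector:
    app: frontend
    track: canary          # this RC manages only the canary Pods
  template:
    metadata:
      labels:
        app: frontend
        track: canary
    spec:
      containers:
      - name: frontend
        image: example/frontend:v2   # the candidate release
```

Rolling back is then just an external actor scaling the canary RC to zero through the API.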

I hope he doesn't throw tomatoes at me for suggesting this, but Kelsey Hightower <https://github.com/kelseyhightower> is an amazing, absolutely amazing, community resource. If you don't find inspiration from the talks he has given, then reach out on Twitter and ask for a good place to read up on your concerns. I bet if he doesn't know the answer, he will know the next contact for you to try.

load balancing properly in a master-slave style configuration versus something more round-robin oriented

Now there is something you and k8s may genuinely have a disagreement about - in its mental model (to the best of my knowledge) every Pod that matches the selector for a Service is a candidate to receive traffic. So in that way, there is no "slave" because everyone is equal. However, if I am hearing the master-slave question correctly, failover is automagic because of the aforementioned health check.
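That health-check-driven failover is directly observable: the Service's Endpoints object lists exactly the Pods currently considered eligible. A quick way to watch it (assuming a Service named `frontend`; the name is illustrative):

```shell
# List the Pod IPs currently backing the Service. A Pod that fails
# its health check disappears from this list until it recovers.
kubectl get endpoints frontend

# Watch membership change live during a failover:
kubectl get endpoints frontend --watch
```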


I have, yes, and there's no doubt they're both crucial in getting answers to some of those more advanced topics.

edit: I also want to add that the Kubernetes community meeting is fantastic. I don't attend every week, but I do catch up on the videos released afterward.

Regarding rolling updates, I have found that looking at the kubernetes API - and discerning what kubectl is doing under the hood - is helpful in some cases. We've taken to API inspection to assist with some of the more complex update patterns we're trying to address.
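One concrete trick for that kind of inspection (a sketch; assumes a kubectl of roughly this era): raise the client verbosity so kubectl prints the REST calls it makes, or talk to the API directly through the local proxy.

```shell
# Print the HTTP requests kubectl issues against the API server.
# -v=6 shows the URLs; -v=8 includes request/response bodies.
kubectl get pods -v=8

# Or hit the API directly via the authenticated local proxy:
kubectl proxy --port=8001 &
curl http://localhost:8001/api/v1/namespaces/default/pods
```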

Completely agreed on the value of Kelsey to the community. I hesitate about contacting any one person directly on a topic, but his guides and github repo are just outstanding.

On the load balancing/master-slave thing - I would have agreed completely with you. Master/slave configurations seem antithetical to some of the core Kubernetes concepts... but the waters are becoming muddied, because it seems like PetSets are a response to exactly that missing pattern. I think you're right on the mental model. Every pod is (from what I understand), or at least should be, seen as an equal candidate - the only question being "can it receive and service requests?" If yes, Kubernetes doesn't care which one it is.

Failover by letting Kubernetes do the dirty work is, of course, an option: if the master or a slave dies, it'll come back up. Except... that's also at odds with anything that handles its own clustering model (think Redis Sentinel or Redis Cluster, or InfluxDB back when it had clustering support). Sometimes "coming back up" in a system whose view of the world is orthogonal to Kubernetes's is challenging.

It also doesn't account for the situation where Kubernetes doesn't bring it back up for some reason - say the pod was killed for resource contention, or something along those lines. Now I have two slaves, no master, and nowhere to feed writes.

I don't complain too loudly about these things, because most often the correct solution is to think about the problem differently, but there are real-world systems that don't fit perfectly into the existing Kubernetes model - and we may still have to run them from time to time.


PetSet co-author here:

Think of PetSet as a design pattern for running a cluster on Kubernetes. It should offer a convenient pattern that someone can use to Do-The-Right-Thing.

The hard part of clustering software is configuration change management. If an admin adds a new member to the cluster by hand, they're usually correct (the human "understands" that the change is safe because no partition is currently going on). But an admin can also induce split brain. PetSets on Kube try to make configuration changes predictable by leveraging their own strong consistency (backed by the etcd quorum underneath the Kube masters) to tell each member the correct configuration at a point in time.

The PetSet goal is to allow relative experts in a particular technology to make a set of safe decisions for their particular clustered software that behaves exactly as they would predict, so that they can make a set of recommendations about how to deploy on the platform.

For instance, CoreOS is working to provide a set of recommendations for etcd on pet sets that would be correct for anyone who wants to run a cluster safely, and encode that into tutorials / docs / actual pet definitions. They're also driving requirements. Other folks have been doing that for elastic search, zookeeper, galera, cassandra, etc. PetSets won't be "done" until it's possible to take the real world considerations into account safely when writing one.
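For concreteness, here is roughly what an alpha-era PetSet definition looked like (a sketch against the `apps/v1alpha1` API; the names and image tag are illustrative, and the annotation shown was the alpha mechanism for gating initialization):

```yaml
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: etcd
spec:
  serviceName: etcd      # headless Service giving each pet a stable DNS name
  replicas: 3
  template:
    metadata:
      labels:
        app: etcd
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:v3.0.0   # illustrative image tag
        ports:
        - containerPort: 2379
```

Each pet gets a stable ordinal identity (etcd-0, etcd-1, ...) and a predictable DNS name, which is what makes configuration-change management tractable for quorum-based software.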


That's really valuable insight, thank you. The biggest struggle so far has been with zookeeper and Kafka. Things that are stateful have posed a lot of difficulty - from a mental perspective more than fighting against Kubernetes specifically, just trying to think and adhere more to microservices principles.

I'm following PetSets very closely and I think they're going to help a great deal.

Is it accurate that PetSets were introduced to compensate for the type of thing I've singled out? That certain models just don't quite fit with the "everything is equal and interchangeable" notion? Or does it just feel that way? I don't want to end up missing the point of PetSets or using them for something they're not truly intended for, or leaning on them only to hit a stumbling block.


Yes. PetSets are intended to be step one of "some things need to be stable and replaceable at the same time," so that the unit can then be composed into apps (i.e. Zookeeper + Kafka might be a PetSet for ZK plus multiple scale groups for Kafka). Basically, we're trying to boil the problem space down to "what does cluster-aware software need in order to be safe" - things like protection from split brain and predictable ordering.

There is likely to be more needed (on top or around) because we need to also solve problems for highly available "leaders" and active/passive setups. Think of what something like pacemaker does to ensure you can run two Linux systems in an HA fashion, and then map the necessary parts to Kube (fencing, ip failover, cloning, failure detection). Still a lot of hard problems to solve - PetSets are the first building block.


I agree with the parent comment, it is easiest to just play with it.

If you're particularly interested in Python webapps, I've written a tutorial on how Kubernetes can help deploy a Django app in a scalable and resilient fashion: https://harishnarayanan.org/writing/kubernetes-django/



