Kubernetes 1.9: Apps Workloads GA and Expanded Ecosystem (kubernetes.io)
151 points by el_duderino on Dec 18, 2017 | 58 comments



I cannot give enough thanks to everyone that works so hard on Kubernetes. The speed of these releases, community support, industry adoption, feature set, and ease of use are beyond anything I could have asked for.

I started a startup about 18 months ago and our product requires a massive amount of data pipelining. The infrastructure we have running on Kubernetes is beyond what a small group should be able to deploy and maintain by ourselves, but Kubernetes makes it not only possible, but enjoyable (really).

To everyone involved (and there are a lot): Thank you SO much. I can't wait to see what 2018 looks like for Kubernetes.


Great post. Could you describe further what you mean by data pipelining? Is this something you could also use Spark or Kafka for? I'd be interested to know if Kubernetes would work for me as well.


One promising piece of tech is Pachyderm [1], which runs natively on Kubernetes. It allows you to set up pipelines of batch jobs (no streaming yet, as far as I know) that operate on immutable, versioned data.

There are some gotchas, though. For one [2], it requires read-write access to your storage buckets (!), and so on GKE you have to modify your nodepool's access scopes to give the whole node read-write access to all your buckets. Not only is this terrible security practice, but access scopes are deprecated on Google Cloud. You should always grant access through service accounts, not scopes. We spent some time fighting this, as there's literally no way to make Pachyderm use a service account.

There are some other niggling issues, and the Pachyderm team has been less than enthusiastic in addressing them (e.g. see [3], which has zero activity despite being absolutely crucial to run in production), but it does seem quite promising. The architecture, at least, is sound.

[1] http://pachyderm.io

[2] https://github.com/pachyderm/pachyderm/issues/2538

[3] https://github.com/pachyderm/pachyderm/issues/2537


Hey lobster_johnson, Pachyderm CEO here. Thanks for the shout out and we're glad you feel Pachyderm is promising for these use cases. I just wanted to clarify / respond to a few things.

> no streaming yet, as far as I know

Pachyderm actually does do streaming; there's nothing in the API about it because it all happens automatically. This is one of the huge advantages of immutable data: the system can automatically detect whether it's already done a computation by looking at a hash of the input and the code. The system will never do the same computation twice, and all pipelines are streaming by default.
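(A rough sketch of that idea, purely to illustrate the content-hashing trick rather than our actual implementation; input.csv and transform.py are stand-in names:)

  # derive a cache key from the input data plus the code that processes it,
  # and skip the job if we've already produced a result for that exact pair
  key=$(cat input.csv transform.py | sha256sum | cut -d' ' -f1)
  if [ -e "cache/$key" ]; then
    echo "already computed, reusing cache/$key"
  else
    mkdir -p cache
    python transform.py < input.csv > "cache/$key"
  fi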

Sorry we've been less than enthusiastic in addressing these niggling issues. The truth is it's pretty hard to be enthusiastic about issues for specific deployment scenarios. AFAIK the Google Cloud Storage library we're using doesn't support service accounts, and the whole library has been deprecated (the second GCS library that's been deprecated underneath us). So upgrading it is a bit of a pain, but the bigger pain is that users with existing Google clusters will have their clusters break under them unless we can figure out a way to get scopes and service accounts to both work. And we have every reason to believe that all of this will be deprecated within the next 6 months and we'll be on to our 4th GCS library. And that's only one deployment scenario; we're doing the same thing for AWS, Azure and a host of local deployment options.

Anyways, these are our problems to solve, and we will solve them. We're thrilled that you want to use Pachyderm and we want to make that experience as smooth as possible. We just need to ask for a bit of patience regarding these deployment scenarios.


Thanks for responding. First of all, I don't buy that "deployment scenarios" shouldn't be a primary concern of yours. Both Pachyderm and Kubernetes are frameworks for deploying code.

Secondly, it's not like Google Cloud Storage is some kind of esoteric platform. You decided to support GCS, and with that comes the expectation of up-to-date support. The devil is in the details, and that's where you have to be diligent in ironing out the bumps.

Thirdly, while it's absolutely frustrating that Google shuffles the Go SDK around so much, it seems that the surface area you need to cover here is rather small, and GCS itself hasn't changed. As far as I can tell, the fix for #2538 requires just a single line change. (I'll comment on the issue.)

That said, I'm sure this is just a temporary speed bump while Pachyderm matures. I'm looking forward to using it.

As for streaming, I know that Pachyderm streams the data internally, but I was referring to real-time stream processing à la Spark, as opposed to batch jobs. From what I can tell, Pachyderm is not only batch-job-based, but its data model is based around files, so it's not suitable for situations where you have a continuous, fast, unbounded stream of data (e.g. analytics events).


Do you guys have any deployment docs or Kubernetes deployment for on-prem Pachyderm? I tried setting it up a while back on our cluster and it was pretty challenging.


We do: http://docs.pachyderm.io/en/latest/deployment/on_premises.ht...

Hopefully that should work for you. Deploying on-prem can mean a lot of different things, so some paths will work better than others. The most common on-prem deployments we see are backed by Minio.


Great, thanks for the info. Admittedly, this was a while back and I know docs change quickly in this space.

I agree that on-prem can be a challenge. You really have to know your stuff, and your environment. We run it in production and it has been quite a journey.


Deploying Pachyderm on-prem is very similar to deployment on a cloud provider, you just need Kubernetes and an object store. The challenging part in my experience is getting an on-prem Kubernetes cluster, but this is well-tested territory for Kubernetes these days. We point people towards this guide to figure out the best Kubernetes deployment option for their infra:

https://kubernetes.io/docs/setup/pick-right-solution/

Pachyderm on-prem guide: http://pachyderm.readthedocs.io/en/latest/deployment/on_prem...


As much as I like kubernetes, I find the whole ecosystem to be frustratingly immature and unreliable. I'm sure it's trending in the right direction with respect to reliability, but it just doesn't feel like it's there yet. Minikube seems especially buggy and is missing some obvious features.


Well, minikube is not for production, so maybe its bugginess is evidence they are focusing their time on the more important components.


Have you checked out Pachyderm (disclaimer: my company) for data pipelining on Kubernetes?

Our product has been available since the first Kubernetes release and is specifically designed for data analysis and pipelining tasks on Kubernetes.

github.com/pachyderm/pachyderm

pachyderm.io


That's so great to hear! I love to see how it empowers small teams to deliver big stuff.

What was the trickiest part for you when it comes to real production usage?


Using it on GKE. Literally no tricky parts except figuring out how to keep my images on GCR in sync with my manifests. For that this tool is quite helpful: https://github.com/dminkovsky/kube-cloud-build.

Otherwise it's been such a pleasure. PVs and StatefulSets are a total game changer. Looking forward to 2018.

EDIT: I should say RBAC has been a little frustrating. But obviously it's for the better.
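For anyone else just getting started with RBAC, the kind of thing you end up writing looks roughly like this (the service account and names here are made up):

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF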


Couldn’t have said it better myself. I am so grateful.


What type of infrastructures do you run kubernetes on?


We run Kubernetes on all three major clouds: AWS, GCP and Azure.


I really wanted to like Kubernetes, but I couldn't figure it out. I'm a pretty experienced Linux admin, and despite days of reading all the sources of documentation I could find I couldn't get the networking setup to where I thought it should be.

I don't know if that was a problem of my mental model being wrong and the documentation not educating me on the correct model, or the documentation of the CNI, Flannel/whatever just being too sparse, or what.

I started with the Google Ubuntu packages, and was able to get containers up fairly quickly, but being able to access them from anything other than the host machine I just couldn't figure out. Asking for help on IRC wasn't useful (aside: what's the deal with IRC channels with tons of users but no messages for days?)

I've since wiped that cluster and tried the Ubuntu Kubernetes installation using conjure-up, it seems like it did the right thing but I'm not sure yet how to get to a cluster from a single machine.

Trying to decide if I pursue Kubernetes more, or try Dokku/Flynn/Deis (which I just saw referenced in a previous HN discussion), because they look really promising...

I had hoped it would be the ganeti of containerization, but I had far fewer problems with ganeti.


You should take a look at Rancher[1]. I think it has the right mix of raw access to Kubernetes and associated tools, but also addresses some of the requirements you have.

1: https://rancher.com

Disclaimer: I've been working in the container ecosystem for a number of years, currently at Rancher.


FWIW, if you create a GKE cluster and create an nginx deployment:

kubectl apply -f https://k8s.io/docs/tasks/run-application/deployment.yaml

And simply create a Load Balancer:

https://kubernetes.io/docs/tasks/access-application-cluster/...

You have nginx exposed via an external IP address in about two minutes. Once you're comfortable with that, you can start working with Ingress and such to get more sophisticated.
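Concretely, assuming the deployment from that manifest is named nginx-deployment (as it is in the example), exposing it is roughly:

  # create a Service of type LoadBalancer in front of the deployment
  kubectl expose deployment nginx-deployment --type=LoadBalancer --port=80
  # watch until an EXTERNAL-IP shows up
  kubectl get service nginx-deployment --watch

Once the EXTERNAL-IP column fills in, you can hit nginx from anywhere.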

I agree the IRC/Slack channel communication isn't the best. I prefer the Kubernetes User Group. You can usually get a great response in less than a day. I've used it a lot.


I was going to suggest this. Create a GKE cluster and play with it for a while to see what you’re supposed to have. Then you can create one yourself from scratch once you understand the target.


Well, that's what I tried, and basically I get:

> gcloud container clusters get-credentials cluster-1 ...trimmed

> Fetching cluster endpoint and auth data.

> ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=404, message=The resource "projects/infinite-alcove-164114/zones/europe-west3-c/clusters/cluster-1" was not found.

> Could not find [cluster-1] in [europe-west3-c].

> Did you mean [cluster-1] in [europe-west3]?

note that neither gcloud alpha, nor gcloud beta, nor europe-west3-(a|b|c) worked.

Probably a bug when using `Regional (Beta)`.


I was in the Regional Alpha so I'm familiar with this, but it's been a while since I had the issue. The fix, I believe, was:

export CLOUDSDK_CONTAINER_USE_V1_API_CLIENT=false

Note: it's in beta now, and I believe --region was added to the recent gcloud command, but I could be incorrect. If you have problems, msg me on the k8s Slack (mikej) and I can run through it with you.

I don't believe you need this at all now that it's beta, but I set up my beta regional cluster a few weeks ago.

Also, re: IRC, there are 27k people on the k8s slack. I've used k8s for years and never gone into irc. I just don't use irc anymore.


Thanks for your help, but I guess I'm fine now (just trying out stuff). The env var probably would have helped, but there is also a config option: `gcloud beta config set container/use_v1_api_client false`, and I needed --region (I always used --zone, since that was what the GUI suggested).


Try:

gcloud beta container clusters get-credentials cluster-1 --region europe-west3

You need the beta command, and you need to set the region flag instead of the zone flag. Looks like there are some documentation and UX gaps, I'm opening tickets to get them fixed (I work for Google).


Hm... thanks. I also needed to set `gcloud beta config set container/use_v1_api_client false`. Well, I was also too lazy to just copy & paste from the connect window.


Make sure you set up gcloud and kubectl bash autocomplete. Kind of essential to me, especially kubectl.
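For bash that's roughly the following (the gcloud completion file's path depends on how you installed the SDK; this is the path for the tarball install):

  # kubectl ships a completion generator
  echo 'source <(kubectl completion bash)' >> ~/.bashrc
  # gcloud ships a static completion script with the SDK
  echo 'source ~/google-cloud-sdk/completion.bash.inc' >> ~/.bashrc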


I am not a Linux expert, but I deployed clusters to 5 Raspberry Pis and to Azure with custom Terraform scripts. With kubeadm, things are much more straightforward now. My suggestion is to start with minikube with the kubeadm bootstrapper, then move on to more complicated solutions. https://kubernetes.io/docs/setup/pick-right-solution


Most kube users hang out on Slack. I am not saying you will get definite answers, but as a k8s core contributor I can definitely say that most developers are on Slack rather than IRC.


I've found one way past it: kubespray

I spent the morning listening to presentations from Kubecon last week, and a guy mentioned it in passing in one of the presentations. Went and looked at it and it's an Ansible playbook for deploying Kubernetes clusters. Respun my test nodes with a bare install and ran kubespray, and I was up and running with Calico networking all set up; I'm able to get to the test pod I set up from anywhere.

So pro tip: Kubespray, especially if you are familiar with ansible!
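The actual run boils down to pointing the playbook at your inventory, something like this (the inventory path will be whatever you copied and edited from their sample):

  # cluster.yml is kubespray's main playbook
  ansible-playbook -i inventory/mycluster/hosts.ini --become cluster.yml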


Your experience is a real shame, truly. If Kubernetes could fix their horrendous documentation and onboarding story, I think you'd have a much better experience. After bruising past that battle, I found Kubernetes to be the perfect mix of sanity and opinionated architecture. I think part of it is also due to how rapidly it's evolved over the past 2 years, so hopefully you get a chance to revisit it once some of those rough edges are smoothed over!


Watch tutorials by Kelsey Hightower, he's been helping folks understand the differences and the models. e.g. https://www.youtube.com/watch?v=HlAXp0-M6SY


I don't know what IRC channels you're referring to, but the Kubernetes Slack is very active. I've gotten a lot of help there. It also has channels where the devs are active.


I feel what you are going through. I wrote this last week trying to help: http://blog.cloud66.com/8-components-you-need-to-run-contain...


Not sure if it helps, but I recently did a series of posts on my blog on installing Kubernetes from scratch on a set of CoreOS guests on Xen, with Flannel, RBAC, TLS and so on, and it does give you a cluster that you should be able to access from the outside, no problem.

Link in my profile. I tried to make it as low in magic as possible; I just ended up writing a couple of bash functions and a Python script to make it easier to assemble the Ignition files for the deployment, but besides that it should all be step by step.


No worries, we understand the pain, we've all been there.

The abstracted network is a constant source of headache when dealing with kubernetes and containers.


Kubecon just wrapped up and all the videos of the talks are up, lots of good content here: https://www.youtube.com/channel/UCvqbFHwN-nwalWPjPUKpvTA/vid...


I'm particularly excited about the advance of the Container Storage Interface (CSI).

Having developed two custom Docker volume plugins, I wonder whether Docker is going to adopt it for its cluster offering (swarm mode) or leave it aside as it did with CNI (real question). Does anybody know?


It remains to be seen, but the owners of the spec represent Mesos, Kubernetes, Docker and Cloud Foundry, so it seems like Docker is open to the idea. Source:

https://github.com/container-storage-interface/spec/blob/mas...


oh, I didn't look at that. Thanks!


It's really fascinating to see Windows support being added throughout the container ecosystem. Does anybody have experience actually moving legacy Windows applications into a containerized infrastructure?


Acknowledging the risk of sounding like "why can't I use rsync scripts instead of this Dropbox?",

Can someone explain to me, like I just got out of high school, why I need to complicate my setup with Kubernetes/Docker Swarm/Deis/Dolly/Flynn instead of spinning containers up and down using scripted LXC commands?

Someone was talking about using Kubernetes for data pipelines. For such a case, why is Kubernetes better than a script that spins up pre-made container images?

Not being snark at all. Genuinely asking a "why?". Appreciate any educating responses!


Well, if you took the risk of sounding snarky, I might as well take the risk of offending a number of projects as well. The following is my personal opinion only.

From touching this stack quite a bit over the past year, my lasting impression is that most people don't actually need it, but that at the same time one shouldn't decry the value of toying with a new tech stack.

(By the way, I get the same impression from many Hadoop users who carry this large infrastructural debt from building out much too early.) Many people gravitate towards these things in order to learn them better, which can be wise on a personal level should they become more popular.

For me personally, I think the value of Kubernetes is if it becomes a provider-independent cloud platform. Basically, Kubernetes takes the core of what OpenStack couldn't deliver and builds an orchestration platform on it that is actually useful. So while you shouldn't invest resources in it solely because it's a new stack, the design is sane and the community is there.


Mostly the answer is "if you want to do it all yourself, go for it";

Things like this, and especially Kubernetes, are about creating a common and agreed upon language for infrastructure - both patterns and specific "off the shelf" applications

It's very nice to be able to start a Proof of Concept data pipeline with `helm install incubator/kafka` (https://github.com/kubernetes/charts/tree/master/incubator/k...)

And likewise the common language/interface creates opportunities for wider coverage and easier adoption in tooling for monitoring, autoscaling, smart routing (see "service mesh"es), etc.

You may not need all of k8s functionality immediately but it's the agreed upon common core for modern distributed systems functionality

And to everyone's credit, it's also being done pretty much as modularly and extensibly as we could hope for.


Kubernetes is not supposed to be run on a single node, but on dozens or hundreds of machines. Scheduling, networking, and authorizing heterogeneous highly available workloads by possibly different teams + managing ingress + managing networked storage is what Kubernetes is built for.

I wouldn't know how to do this with LXC


The API for k8s is great. We mostly run large (50-core, 500 GB memory) machines co-located at client sites. We have hundreds of these to manage, but they are separated geographically. K8s' API/infrastructure would be great to manage/monitor them.


Sounds like an interesting problem. What is your use-case? Caching servers in ISPs? Kubernetes kind of assumes that any node can fail any time and reschedules workloads. This assumption might not be in line with your requirements.


In general, I tend to think that any tooling I can avoid writing (and documenting, and debugging, and maintaining...) is an advantage. As someone who has done the whole "running my own containers from bash scripts and cronjobs", Kubernetes (and in particular the hosted Kubernetes clusters provided by Google Cloud, Azure and AWS) does a great job of managing a lot of crap I don't want to deal with, such as:

- Networking setup and service discovery
- Managing container failure
- Secret storage
- Deployments (in particular, sensibly-managed Rolling Deployments)
- Container-cluster size management
- Ingress management (load balancers are a pain in the ass to manage from bash scripts)
- Resource monitoring (Kubernetes comes with a nice interface to watch the cluster and modify it dynamically)

There's nothing stopping you from building all of that from scratch, but why bother?


I agree with the "because it's fun to learn" answers.

But I'd also add the reliability factor of keeping a quorum of containers running. It's a tool for dealing with the unreliability of cloud infrastructure, and unexpected process death.


I've been in your shoes using LXC way back but moved to Docker fairly quickly when it came out, so I kinda get your position.

Now, Kubernetes/Docker provide better tooling with added benefits. Docker provides a Dockerfile-based declarative approach of building the setup, storing it in an image, and then running it as a container with greater flexibility and more features than a typical LXC-based script. Kubernetes takes the running of multiple containers to another level, capable of managing hundreds or even thousands of them for scalable operations.

RTFM for the details :-)

HTH!


Why would you write scripts or learn LXC commands when you can just write a k8s deployment? Even if you don't consider all the more advanced features, it's just easier and declarative and supported natively by several cloud providers.
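To make that concrete, a whole deployment can be as small as something like this (the image and names are placeholders):

  kubectl apply -f - <<EOF
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: hello
  spec:
    replicas: 3
    selector:
      matchLabels:
        app: hello
    template:
      metadata:
        labels:
          app: hello
      spec:
        containers:
        - name: hello
          image: gcr.io/my-project/hello:1.0
          ports:
          - containerPort: 8080
  EOF

Kubernetes then keeps three replicas of that image running, reschedules them if a node dies, and rolls out the change when you bump the image tag.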


Maybe you don't. I wouldn't recommend anyone to use something extremely complex for mission critical architecture, unless unavoidable. If you feel the same, it's probably a perfectly sound reason to avoid Kubernetes if there's no necessity.

When I wrote "extremely complex" I didn't mean all the models and concepts from the user documentation - they're the easy part (although I haven't used K8s much and am probably missing a lot about its user-facing parts). I mean the internals. Because when things break and the cluster goes down, it's the internals that matter.

I use Docker instead of plain LXC because it adds the convenience of layers and versioning while not introducing any significant complexity. I had some issues with libnetwork (which sometimes behaved oddly when containers crashed under high load or docker-proxy got stuck somehow), but I can live with it.

I also prefer Swarm to manual networking. It's trivial to set up and just works under normal circumstances. And I feel confident that I can fix issues reasonably fast. I had experimented with it under some odd scenarios (like a heterogeneous cluster with a mix of x86_64 and armv7h nodes) and when it had (obviously) failed, I was able to quickly find the relevant source code, read it, understand the details and correct the problem (which was trivial but not documented). And even if I hadn't been able to, I'm absolutely certain I would have been able to scrap the cluster and script reprovisioning with standalone nodes, Compose and ad-hoc VPNs - I did this when Rancher (an old version) had failed me badly.

Now, maybe I'm just too stupid, but I can't say this for Kubernetes. I've tried to experiment with it - also under explicitly unsupported conditions - and tried to dig into its documentation and code when it failed (obviously it did, I was asking for it). However, I was literally overwhelmed by how much there is. What I've learned is a tiny bit about CNI internals, but the primary result was the conclusion "sorry but nope, that's not something I feel comfortable dealing with if I'm to support it". And I don't mean that it's fail-prone or anything - just that every software project has its bugs, and sometimes they manifest themselves at really inconvenient times.

On the other hand, Kubernetes is very nice when it's not you who's responsible for its operation and you're just a user. Spin up a cluster on GKE and you know Google will take care of it - you just have to write some YAML to describe your project and enjoy. In this scenario, I think I can recommend it - even though I haven't run anything serious on K8s, some toy projects worked without issues and were easy to deploy and maintain. And Kubernetes feels more mature than other options, having lots of nice functionality, supposedly covering a lot of different use cases and scenarios. E.g. Swarm feels like it lacks a lot of stuff - I've subscribed to a lot of tickets for features that I wanted but that aren't there yet.

Just a personal opinion.


Keep up the good work. You guys/gals make my job easy. Infinity thanks, I owe you a drink (or three).


I have a simple service I would like to put in production. It is a simple ML code + Rest API + testing sandwich.

I only have one (virtual) server to run this on, but I still think Kubernetes looks really helpful, allowing me to throw new versions in test and update the production server without downtime.

Minikube is really nice to use, and everything seems to work well, however everywhere I read, I am told minikube shouldn't be used in production. Can anybody tell me if there's a reason for this? And if so what is the recommended setup for a single-node Kubernetes?

(Side question: Why does minikube run everything in a virtual machine? Doesn't that hurt performance?)


Minikube runs in a virtual machine because it's meant to be a cross-platform development environment, and Linux containers aren't natively supported on macOS or Windows.

Minikube does have a --none driver, which runs on bare metal and can be used to provision a cluster on virtualized hardware, such as a CI system or a GCE instance.
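That looks roughly like this on a Linux host (run as root):

  sudo minikube start --vm-driver=none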

Why not minikube in production? Upgrades, downgrades, clean-up and maintenance. We don't offer any real guarantees on these lifecycle operations unlike a more robust setup like GKE.

Disclaimer: I work on minikube


Kubernetes is meant to be a self-healing, always-up infrastructure; if your minikube goes down, it cannot heal fast enough, and your services will go down. It is recommended to have at least 3 nodes.


Huh, I've been wrapping my ML models into a flask app and deploying it as well. If you don't mind me asking, what are you using the ML model for, and what are the current suck-ass parts of deploying it?


You can get what you want with Capistrano and something like Unicorn that supports no-downtime deploys.




