> We would need to build/staff a full-time Compute team
This actually was a very real problem at my current job. The data pipeline was migrated to k8s and I was one of the engineers who worked on that migration. Unfortunately, neither I nor the other data engineer was a Kubernetes guy, so we kept running into dev-ops walls while also trying to build features and maintain the current codebase.
It was a nightmare. If you want k8s, you really do need people that know how to maintain it on a more or less full time schedule. Kubernetes is really not the magic bullet it's billed to be.
> Managed Kubernetes (EKS on AWS, GKE on Google) is very much in its infancy and doesn’t solve most of the challenges with owning/operating Kubernetes (if anything it makes them more difficult at this time)
Oh man this hits home. EKS is an absolute shitshow and their upgrade schedule is (a) not reliable, and (b) incredibly opaque. Every time we did a k8s version bump, we'd stay up the entire night to make sure nothing broke. We've since migrated to cloud functions (on GCP; but AWS lambdas could also work) and it's just been a breeze.
I also want to add that "auto-scaling" is one of the main reasons people are attracted to Kubernetes.. but in a real-life scenario, running like 2000 pods with an internal DNS, a few redis clusters, Elasticsearch, and yadda yadda... it's a complete pain in the butt to actually set up auto-scaling. Oh, and the implementation of Kubernetes cron jobs is complete garbage (spawning a new pod for every job is insanely wasteful).
I work on a 2-person project and decided to go with kubernetes (through digitalocean) for the cluster. I am managing everything with terraform and I don't have any big problems. I like that I can write everything as terraform manifests, have it diffed on git push and applied to prod if I want to.
Sure it had a learning curve, but now I just describe my deployments and k8s does the rest, which then reflects back on digitalocean. If I need more power for my cluster, I increase the nodes through digitalocean and k8s automatically moves my containers around however it sees fit.
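To make that concrete, a minimal sketch with the DigitalOcean and Kubernetes Terraform providers might look like this (names, sizes and version slugs are placeholders, not my actual config; provider credentials are omitted):

```
resource "digitalocean_kubernetes_cluster" "main" {
  name    = "example"
  region  = "fra1"
  version = "1.18.8-do.0"   # pick a version slug offered by DO

  node_pool {
    name       = "default"
    size       = "s-2vcpu-4gb"
    node_count = 1           # the only knob I touch to scale the cluster
  }
}

# kubernetes provider configuration (host, token, CA) omitted for brevity
resource "kubernetes_deployment" "web" {
  metadata {
    name = "web"
  }
  spec {
    replicas = 2
    selector {
      match_labels = { app = "web" }
    }
    template {
      metadata {
        labels = { app = "web" }
      }
      spec {
        container {
          name  = "web"
          image = "registry.example.com/web:latest"
        }
      }
    }
  }
}
```

`terraform plan` on push shows the diff, and applying is what actually changes the cluster.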
I used normal blue/green deployments on self-managed VMs in the past, then worked with beanstalk, heroku, appengine and I much prefer k8s. Yes it's easier on heroku, but try to run 2-3 different containers on the same dyno for dev to keep cost down. On k8s I can run my entire stack on one single small digitalocean $10 VM if I wanted to.
I wouldn't even know what else I could pick that gives me equal flexibility and power?
> I used normal blue/green deployments on self-managed VMs in the past, then worked with beanstalk, heroku, appengine and I much prefer k8s. Yes it's easier on heroku, but try to run 2-3 different containers on the same dyno for dev to keep cost down. On k8s I can run my entire stack on one single small digitalocean $10 VM if I wanted to.
So you already spent about a decade learning all the skills. What the other guy is talking about is coming from dev, not from ops. If you come from dev you don't necessarily know what an ingress or egress is, and might never have done a blue/green deployment, etc. This is all stuff that needs to be learned first. I worked with many, many teams who had zero skills in data center tech before they were moved to k8s full time.
I personally like learning all that stuff. And I love that my job requires it now. But it's more like vim than like node-red, and that was a shock for many people, from engineer to EVP.
I'm also on a 2-person project on DigitalOcean k8s, also very happy.
K8s is kind of messy compared to Heroku, which I don't love, but is also way more powerful and can be more secure. I don't know what I'd use instead of it, exactly as you said.
Also, we run a VPC-only K3s node for some simple internal tools that works great as well.
For those missing Heroku: there exists Dokku [1], a small Heroku-like implementation for container management. It uses the same underlying buildpacks and you get the same comfort as with Heroku. And it's free to use. You can't deploy to multiple host machines though. But for small projects that fit on a single host, it's very nice to use.
Looks great! Considering migrating from dokku. I also came across exoframe recently, which looks lower level but works with your existing docker projects: https://github.com/exoframejs/exoframe
It seems Dokku wins at "easier" since in many cases, you can just push the application code you used for development and the required stack is automatically detected. Adding a database is two commands. No need to know Dockerfiles.
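For a rough sense of what that looks like in practice (app and database names are made up, and it assumes the dokku-postgres plugin is installed):

```
# deploy: push your app and let buildpack detection do the rest
dokku apps:create myapp
git remote add dokku dokku@my-server:myapp
git push dokku master

# "adding a database is two commands"
dokku postgres:create myapp-db
dokku postgres:link myapp-db myapp
```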
It's easier because for many cases you don't even need to search the documentation to know what command to run to fire up a database. You just select it from a GUI list.
Looks interesting - but the installation instructions put me off a bit. Open a port on your server, and don't change the default password `captain42` - then run a cli tool from your dev machine.
I'll look more into it, but it didn't really inspire confidence.
> Also, we run a VPC-only K3s node for some simple internal tools that works great as well.
We do exactly the same thing! We have a one-node k8s for all these dev things that just works. Everything is containerized for local dev anyway so moving it to k8s was just writing the deployment manifest.
On heroku, all of these would be separate dynos (or one glued-together dyno that does everything). On a self-hosted VM we'd have to deal with managing that.
I liked this approach so much that I now have a small 1-node personal cluster that hosts all of my private hobby projects that aren't ready for prime time yet, that were on heroku previously. Costs me only $10 + (persistent storage + IP address if needed)
I feel like I’m witnessing two co-founding colleagues - who sit by each other day in and day out - discover the other’s persona on HackerNews.
Sure, maybe you two (@dvcrn and @arcticfox) don’t work together and don’t know each other, but it’s definitely more entertaining imagining the scenario above.
The point is to keep cost down while a project is in development, then scale depending on needs without having to worry about container distribution and resource utilization.
On dev I don't need a 10 node cluster if I had 10 containers running. One $10 vm is fine.
On prod I can start with a 3-node cluster for these 10 containers, then scale it up depending on traffic and needs while controlling my spending.
Not everyone has thousands of dollars of VC money to throw at hosting
The "adhoc" part is the problem. K8S is standardized and offers high-availability, failover, logging, monitoring, load balancing, networking, service discovery, deployments, stateful services, storage volumes, batch jobs, etc. And it can run self-contained on a single machine or scale out to 1000 nodes.
Why piece all of that functionality together yourself into some fragile framework instead of using an industry standard?
"Why piece all of that functionality together yourself into some fragile framework instead of using a industry standard?"
Quite recently developed "industry standard". Many tools mentioned have been used for tens of years, they work robustly, are well documented and there are lots of people who can use them. I personally would use the term "industry standard" a little bit differently.
You still have to put them all together into some custom solution just for your setup which adds overhead and fragility. New employees will have to learn that instead of using K8S APIs. Deploying new components can’t take advantage of the wide and fast growing ecosystem.
There’s really nothing like the full suite that kubernetes provides.
It's also very empowering for people whose job isn't to run things, but to build the things that are to run. I've worked on a dozen or so bespoke "industry standard" setups, and each and every one had a number of weird quirks, involved learning some new "industry standard" components, and either made it very hard and dangerous for non-ops/devops/infra people to run their things themselves, or had homegrown tooling that pretty much replicated what the k8s API can do, just only a small subset and badly. Some YAML and kubectl are well within what a typical data scientist can be expected to understand, more so if it means they can run their things on a dev cluster themselves, and in a pinch, that data scientist can debug prod issues of their things, because it all works the same way. We have a very useful bot that was built and deployed by someone decidedly not ops while waiting for jobs to finish – a simple k8s deployment YAML is like 50 LOC, with 40 being pretty much standard, et voilà, running bot, without having to build lots and lots of automation in-house, or having to take up ops time to deploy this for them, or having to grok advanced sysadmin-ing first. Used with appropriate caution and safeguards, it's super powerful.
> Quite recently developed "industry standard". Many tools mentioned have been used for tens of years, they work robustly, are well documented and there are lots of people who can use them.
k8s is based on Borg, which has existed for far longer.
That's not what I'm saying. Your point about the maturity of k8s would make sense if it had sprung from nowhere, but k8s encapsulates a lot of the "lessons learned" from a very stable and mature product (even if a proprietary one).
It can be worth it to piece things together yourself. A complex tool can also be fragile if you don't take the time to learn and understand every facet of it.
If you only need certain parts of what k8s offers, building those parts yourself can offer you more stability, control, and insight into what your application is doing.
I think using something like k8s prematurely is more an example of "resume driven development" than the other way around.
Building it yourself doesn't mean building all of it. For example, it's quite easy to get zero downtime deployments with a tiny bit of systemd configuration and the SO_REUSEPORT socket option. That seems easier for a team to understand than "here is kubernetes and everything that comes with it"
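A minimal sketch of that approach (unit name, paths and port are made up): the app binds its port with SO_REUSEPORT, so two instances can listen at once, and a template unit runs one instance per release:

```
# /etc/systemd/system/myapp@.service
[Unit]
Description=myapp (%i)

[Service]
# %i is the release name, e.g. myapp@blue / myapp@green
ExecStart=/opt/myapp/releases/%i/myapp --listen :8080
Restart=always

[Install]
WantedBy=multi-user.target
```

Deploying is then `systemctl start myapp@green`, wait for it to pass a health check, `systemctl stop myapp@blue`; because both instances bind the same port via SO_REUSEPORT, the kernel keeps spreading connections across whichever ones are up.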
That is, however, a very tiny slice of what might be needed.
You need to deliver the application to the machines somehow. You might need to configure the network to reach the application, etc. All quite common tasks.
(And honestly, systemd can bite you just as much if not worse than k8s definitions, because the API is much less cohesive and the defaults like to cut off your hands)
Building one yourself is a good exercise for understanding something, I agree.
The problem with that logic is everyone on your team will need to do it, so you're going to be stuck picking a standard. Should it be yours, should it be mine, or should we both just learn something with a large community behind it?
Nothing is perfect of course, but k8s makes a really good target for CI/CD, which is something you want when you're developing as part of a team. If you're not quite a team yet and you don't know how to bootstrap k8s and CI/CD, then you need to figure out when those types of things are important.
Probably lots of people could stick with a monolith and a VM for longer than they did, but automated testing will save you a fair bit of time if you're not figuring out how to do it at the same time.
k8s is nice, yes. But given its age, it's no more standardized now than MySQL was 15 years ago.
k8s also addresses a very small portion of the market. If you have to scale, yes, you might need k8s. Chances are, you don't. Really.
I stopped counting the (supposedly big) customers that burned themselves on k8s when they logistically didn't need to, and only brought organisational issues on themselves.
"some haproxy" - what is the haproxy configuration? I tried doing this myself and then realized setting up a cheap k8s cluster on DO was way easier. Now, yes, there have been occasional problems, but not since I just let DO handle the whole thing. I can do zero-downtime deploys, cronjobs (why is spawning a pod so wasteful? it spawns it, runs the job, and then kills it?), all with a single "kubectl apply" command that takes a split second to run.
If you're in AWS, you can use their ELB (Elastic Load Balancer) service instead of setting up your own haproxy. I worked on a team that used it for years without any issues orchestrating zero-downtime deployments. It was extremely easy and didn't require any real configuration.
Yeah, we started out with ansible as well.
But some stuff is just way harder, especially zero downtime.
haproxy is fragile, and so is sending files over ssh.
Docker is also a standardized package + repository format.
Before that we used some kind of hacky CDN solution, etc.
It was stuff glued together, written by myself, and I was the only guy who understood it and ever will be.
I use k3s to manage my Plex/NAS/home server machine. Just simply as a deployment tool, it's much easier for me than managing a collection of services deployed by Ansible/Puppet held together by custom scripts and systemd units.
Sadly I don't have many resources I can refer you to (maybe someone else can add some?) but DigitalOcean's Kubernetes guides are excellent, so I'd start with those: https://www.digitalocean.com/docs/kubernetes/how-to/
k8s has a loooot of stuff it can do and reading too many blog posts that go into too much detail can be intimidating, so my advice would be:
1. Create a dummy cluster on digitalocean (or docker desktop with k8s support / minikube), then setup kubectl to connect to it.
2. Start with the difference between resource types: What are deployments, what are pods?
3. Create a deployment manifest that just tries to pull a container from some registry, and apply it with kubectl apply -f foo.yml (see the example manifest below)
4. Play with kubectl to inspect things: kubectl get pods, kubectl get deployments, kubectl describe pod xxxxxx, kubectl logs xxxx
Deployments/services/(pods) are all you need in the beginning for running containers on k8s and exposing them. Of course there are also things like persistent storage, but if your app is built to run ephemerally, you likely have storage/db set up externally already.
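To give an idea of what step 3's manifest looks like (names and image are placeholders), a deployment plus a service to expose it is roughly:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: registry.example.com/hello:latest
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
```

`kubectl apply -f foo.yml`, then `kubectl get pods` to watch the pods come up.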
For running through CI, once you have your manifests you could run kubectl apply directly from the CI if you wanted to. We are using terraform in front of k8s with HashiCorp's hosted state, then run `terraform plan`. If it passes, on dev we automatically apply to the cluster; on prod there is a manual step through the HashiCorp admin UI that needs an apply trigger. Then there are more advanced tools like Spinnaker that can be used to set up more complex pipelines for what to do on push.
Gitlab CI is my weapon of choice here since it’s integrated nicely with Kubernetes. There’s a wealth of tools out there - but the work I do on managing PCGamingWiki is publicly available [1] to give you a starting point. I use Kustomize + kubectl, and when I need to rollback a deployment I can just do it from Gitlab’s environment page.
My experience is the same. I really like automating tedious and error prone parts of deployments, and Kubernetes is the best tool I've found for that. It is a lot to learn about, and there are a lot of missing features that people go to great lengths to build for themselves (see "service mesh" for example), but the core is very solid.
I like loosely coupling things, and Kubernetes is the first ecosystem where that has worked well for me. (OK, it worked great when I worked at Google, but a lot of effort was put into that by thousands of people.) For example, for the first time in my life, I automatically renewed a TLS certificate for my various personal projects. When I started using Let's Encrypt, I just manually ran certbot every 3 months when I got a warning email that my cert was expiring. That is fine, but it's kind of a waste of time. There are tightly coupled solutions to this problem, but they basically require you to totally commit to their approach (Caddy is a good example of this). I use Envoy, but Kubernetes let me not care. I run cert-manager, which just runs in the background and updates my certs when they need to be updated. It's stored as a Kubernetes secret, which can be mounted into my Pod as files. When the secret changes, the filesystem is atomically updated. Envoy can notice this and start using the new certificate. cert-manager doesn't know anything about Envoy and Envoy doesn't know anything about Let's Encrypt. So I'm not locked into any particular decision -- I can change my CA, and nothing about my frontend proxy has to change. I can change my frontend proxy, and nothing about my certificate management has to change. This, to me, is a big deal. I have one less thing to worry about, and I am not locked into any other decisions.
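For a sense of how small the glue is, a cert-manager Certificate is roughly the following (names are made up, the apiVersion depends on your cert-manager release, and it assumes a Let's Encrypt ClusterIssuer already exists):

```
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
spec:
  secretName: example-com-tls   # cert-manager creates and renews this Secret
  dnsNames:
  - example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```

The resulting Secret is what gets mounted into the frontend proxy's Pod as files.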
I also like the flexibility with which I can write programs to manage my infrastructure. All the primitives available to me as a programmer are high-level and well-tested, and are the same things that the CLI tools do. For example, in preparation for HTTP/3, I needed some way to get UDP traffic into my cluster. My cloud provider doesn't provide a load balancer for UDP, so instead I wrote a program that watches changes to Nodes from the Kubernetes API server, and updates a DNS record with the external IP addresses of all the healthy nodes. Then I can instruct browsers capable of HTTP/3 to use that DNS address to attempt an upgrade to HTTP/3, and it doesn't matter that my cloud provider can't do that at a lower layer in the stack. The alternative to this approach is to basically commit to having a certain IP address available, and keep that updated manually. It's fine, but again, one more thing to worry about. I can take this exact code, and it will work perfectly on any other Kubernetes provider -- so I'm not tied to DigitalOcean, and I'm not tied to any manual processes. One less thing to worry about.
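Not my actual code, but a sketch of the shape such a program can take with client-go (recent versions, where List/Watch take a context); the DNS update itself is stubbed out, since that part depends on your DNS provider:

```
// Watch Nodes and collect the external IPs of Ready nodes.
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // running inside the cluster
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	w, err := client.CoreV1().Nodes().Watch(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for range w.ResultChan() { // any node change triggers a re-list
		nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
		if err != nil {
			log.Print(err)
			continue
		}
		var ips []string
		for _, n := range nodes.Items {
			if !nodeReady(&n) {
				continue
			}
			for _, addr := range n.Status.Addresses {
				if addr.Type == corev1.NodeExternalIP {
					ips = append(ips, addr.Address)
				}
			}
		}
		log.Printf("would set DNS A records to %v", ips) // replace with your DNS provider's API call
	}
}

func nodeReady(n *corev1.Node) bool {
	for _, c := range n.Status.Conditions {
		if c.Type == corev1.NodeReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```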
I agree that a lot of people get into a situation where they have to move hundreds of apps and tens of nodes all at once, and under those circumstances, it sure is a lot of work to figure out Kubernetes compared to putting a band-aid on the problem and getting back to work. The biggest problem is that you are probably facing some sort of crisis, and have to decide, with very little experience, whether you want to use a managed offering or build it yourself. Building it yourself is quite complicated. What CNI plugin are you going to use (they all seem both wonderful and horrible on paper)? Why do you have to buy five nodes only dedicated to master tasks, like etcd? How are we going to upgrade to the next version with no downtime? You can go managed, but then you give up a lot of control. Who controls DNS at the node level (fun fact: container pulls don't go through the same DNS stack that the Pod will eventually use)? How can you use gVisor to isolate pods from the host kernel? (You can't! You will have to run it yourself.) Compromise fatigue is going to kill you here -- you have a crisis, and all the options are bad. (I've been there myself. I started using Kubernetes because our Convox Rack was so outdated that we couldn't deploy new software anymore. We tried upgrading things, but it broke things even more. So until we got k8s working, and converted every workload from a proprietary format, we couldn't deploy software. It was frustrating. But the reality is that I wanted to switch a long time ago, so the transition was quite smooth, with no prior real-world experience. And now this problem won't happen again, because tens of thousands of people know how to deal with Kubernetes.)
I also agree with this article that Amazon's managed Kubernetes offering is terrible. EKS was my first Kubernetes experience, and it was clear to me that Jeff Bezos walked into someone's office and said "we need Kuberthingie in two weeks or you're all fired." The team saved their jobs, but that's about it. It's very much the managed Kubernetes solution for people that are locked into AWS already. What people really want is not Managed Kubernetes but "namespace as a service". They just want to kubectl apply something and let a background task provision their machines. They don't want to screw around with RBAC, service meshes, managing the Linux distribution on their worker nodes, managing the master nodes, etc. That service unfortunately doesn't exist. Maybe send me an email if you want to work on something like this, though, because I certainly do ;)
In summary, I get the pain points, but I think they are worth embracing. Things aren't perfect, but you are going to have pain points at all the big breakpoints in infrastructure. Going from 0 applications to 1 application is going to be a major change for your team/company. Going from 1 application to 2 applications is also going to be a major change, but most people overcome this with sheer willpower and tedium until they hit something like 10 or 15 applications, and then are up a creek without a paddle. I recommend embracing future growth early, so that your second application is as easy to run as your first. It's not hard, it's not time consuming, it's just very different from "I'll pop in a Debian CD and rsync our app over."
> What people really want is not Managed Kubernetes but "namespace as a service". They just want to kubectl apply something and let a background task provision their machines. They don't want to screw around with RBAC, service meshes, managing the Linux distribution on their worker nodes, managing the master nodes, etc. That service unfortunately doesn't exist.
I think you just described AWS Fargate and Google Cloud Run.
AWS's version is pretty half-baked. You can't provision or use persistent volumes (so no stateful apps), and you have to use their load balancer which terminates TLS (preventing your software from being able to do ALPN, using Let's Encrypt, supporting HTTP/3, etc.).
Cloud Run just seems like standard "serverless" stuff, nothing to do with Kubernetes. (The downsides involve not being able to run applications that are designed to run on a generic Linux box; everything has to be specially developed. That is fine, and they have open-sourced all the tools necessary to move off of them so you aren't locked in, but it's a bigger paradigm shift.)
I use Cloud Run and nothing is specially developed for it, other than just being stateless. I can take my container and run it on a generic Linux box with zero changes (in fact, I run it in WSL2 on my Windows machine all the time). And I just install the normal Node.js, imagemagick, etc in my Dockerfile, no special builds or flags.
> > We would need to build/staff a full-time Compute team
I'm not sure I get this objection. Presumably in a company like Coinbase there is already an infrastructure team that runs the AWS instances, helps build the AMIs, etc. This team could re-tool and hire some k8s experts to help them make the shift. The promise of k8s (at least one of them) is that you can do more with less ops resource, since the system is so programmable. The idea that you'd need a completely new full-time team doesn't grok for me; that new team should replace another team that's no longer needed, or more likely, involve a combination of hiring some experts and retraining your existing engineers.
I do take seriously the other issues raised RE: security (though one GKE cluster per security boundary is a perfectly reasonable approach and gets you further than you might think).
> Unfortunately, neither I nor the other data engineer was a Kubernetes guy
I think this is a different issue than the OP was raising; in any case, in order for a technology to succeed, you need to have subject matter experts embedded in your dev teams, or a separate function that provides the service to the teams that use it.
In the context of the OP, I think your case would be more like saying "hey data team, you need to build your data jobs into AMIs, go figure it out". Regardless of the technology chosen, it's not going to succeed if the teams doing the work don't know how to use the tools.
> that new team should replace another team that's no longer needed, or more likely, involve a combination of hiring some experts and retraining your existing engineers.
The gotcha there is that it rarely goes that way unless you have a very clear direction from senior management, at least at big corps. In most cases, it's just another thing that gets added to the pile, and it's incredibly difficult to migrate entirely out of whatever the old solution was, so now you end up supporting both.
Note that they already have a team who has built their current compute platform, who built the pipeline to run containers/processes on VMs with auto scaling groups.
It’s great if their own solution works well for them at less cost, but that system didn’t build itself and has non-zero maintenance costs.
> Note that they already have a team who has built their current compute platform, who built the pipeline to run containers/processes on VMs with auto scaling groups.
In that case I would expect Coinbase to write blog posts on how their setup is the absolute best solution to their problem, and not how they refrain from adopting the best solution to their problem because they claim they don't have anyone on the team that is able to pull that off.
> Presumably in a company like Coinbase there is already an infrastructure team that runs the AWS instances, helps build the AMIs, etc. This team could re-tool and hire some k8s experts to help them make the shift.
The key is that there is a lot of additional services and interface points to handle. As the Coinbase article noted, you need extra pieces on top of k8s (storage, service mesh, config/secrets, etc) that need care and feeding. Even if the company moved 100% of their services into k8s there's now more work to be done for the same level of service.
The control points that k8s exposes are not simple "drop in your provider here" bits of integration. You would likely still have the same core providers (ex: EBS for storage) but there is now more code running to orchestrate them, and more access control to implement and verify.
My personal experience (4 years on GKE in production) has been the opposite; running on k8s has abstracted away a number of things that I’d otherwise have to engineer.
Volumes just get attached (using PersistentVolumeClaims), and automatically migrate to a new node if the original pod dies. Vs. having to do some sort of rsync between nodes to keep disks in sync.
Secrets get encrypted by k8s and mounted where needed. I would agree that RBAC is a bit tricky but I don’t think it’s harder than IAM provisioned with Terraform.
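For example, consuming a secret from the app’s point of view is just a volume in the pod spec (hypothetical names):

```
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
  - name: api
    image: registry.example.com/api:latest
    volumeMounts:
    - name: db-credentials
      mountPath: /etc/secrets   # secret keys show up as files here
      readOnly: true
  volumes:
  - name: db-credentials
    secret:
      secretName: db-credentials
```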
If you are not using a service mesh for your VMs then you don’t need one in k8s. (I don’t use one, and rolled TLS to the pod with less effort than it would take to maintain TLS to the VM.) The reason you want a service mesh is to abstract TLS and retry mechanics away from the application layer - i.e. to make your service authors more productive. If you don’t use a service mesh then you are back to managing TLS per-service, which is where you are with VMs already.
There are definitely more services you _could_ run, but in my experience these are additive, I.e. they are extra work, but give you a productivity boost.
Anyway, YMMV and I haven’t operated a system as large as Coinbase, so I could be missing something. Interested in hearing others’ experiences though.
> As the Coinbase article noted, you need extra pieces on top of k8s (storage, service mesh, config/secrets, etc) that need care and feeding.
The problem with that assertion is that it does not make any sense at all. For instance, storage and config/secrets are already supported out-of-the-box with Kubernetes. Even so, complaining about storage with Kubernetes is like complaining about EBS or EFS or arguably S3 in AWS. And if you have strong feelings about service meshes, you really aren't forced to use them.
> Even if the company moved 100% of their services into k8s there's now more work to be done for the same level of service.
There really isn't. For example, if they go with a managed Kubernetes solution then the only thing they need to worry about is actually designing their deployments, and it would be very strange if they couldn't pull that off. That's a ramp-up project for an intern if the solution architecture is already in place.
> You would likely still have the same core providers (ex: EBS for storage) but there is now more code running to orchestrate them
There really isn't. Kubernetes' equivalent to EBS is either setting up a volume or a persistent volume claim on a persistent volume. Just state how much storage you want and you're set.
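Something like the following is the entire ask (name and size are placeholders); on AWS the claim typically ends up backed by an EBS volume via the default storage class:

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```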
> If you want k8s, you really do need people that know how to maintain it on a more or less full time schedule.
What is the alternative to k8s that does not need people to have any technical knowledge?
To me Kubernetes is extremely attractive because it helps me avoid learning cloud vendors' proprietary technologies. K8s is learn once, use everywhere, which is fantastic.
I am a 1-person venture doing everything from JavaScript/React to maintaining backend infra, and I couldn't have done it without k8s.
Plain old linux is the alternative, which is also "learn once, use everywhere", whether it's AWS EC2 or GCP Instances or nearly any machine under the sun.
I don't see how k8s avoids the need to learn about cloud-vendor-specific tech. E.g. searching "aws RDS k8s" gives me a beta GitHub package and a bunch of blog posts on how to configure it right. It doesn't sound like much less work than learning how to use RDS without k8s - read their docs, figure out the API.
Maybe I'm an "old man yelling at new tech" but meh, I just see very little value in k8s because you inevitably need to understand the layer beneath - linux (k8s is far from a non-leaky abstraction imo), PLUS all the complexity of k8s itself. I do see the value when managing a big and complex infra with 100s of servers or something, but very few people have that problem.
How do you run an application on a cluster of plain old linux machines? How do you do load balancing? How do you scale up and down? How do you update your app without downtime? How do you roll back easily if something goes wrong? How do you ensure all your servers are running the same version of dependencies? How do you update those dependencies? How do you replicate your environment if you want to add a new server to your cluster? If your app has microservices how do services discover each other? How do you mount volumes from cloud storage? How do you update configuration? How do you automatically restart failed applications? How do you monitor if your applications are working? How do you make sure the right number of MongoDB replicas are running at all times? How do you view your log files remotely? How do you port-forward from localhost to your Linux server to test your app locally?
These are commonly raised concerns, all of which have answers much simpler than "install this giant distributed system". I'll go ahead and answer them since I take the questions to be in good faith...
> How do you run an application on a cluster of plain old linux machines?
Build a package, install it in an image, run that image in an autoscaling group (or whatever equivalent your cloud of choice offers).
> How do you do load balancing?
An Elastic Load Balancer (v1 or v2), HAProxy, an F5 - this is deployment environment specific (just like in Kubernetes).
> How do you update your app without downtime?
Blue-green deployment, or phased rollout.
> How do you ensure all your servers are running the same version of dependencies?
Build them from a common image.
> How do you update those dependencies?
Update the Packer template that builds that image.
> How do you replicate your environment if you want to add a new server to your cluster?
Start the server from the same image.
> If your app has microservices how do services discover each other?
Consul, or DNS, depending on your appetite.
> How do you mount volumes from cloud storage?
It's a bit unclear exactly what you mean here, but I'll assume you mean either block devices (just attach them at machine boot, or on startup if they need a claim), or NFS.
> How do you update configuration?
Either update Consul and have it propagate configuration, or update a configuration package and push it out.
> How do you automatically restart failed applications?
Systemd restart policy.
> How do you monitor if your applications are working?
From outside - something like pingdom, and some kind of continuous testing. It's critical that this is measured from the perspective of a user.
> How do you make sure the right number of MongoDB replicas are running at all times?
Somewhat flippant answer here: the right number of MongoDB servers is zero. More generally, by limiting the size of an autoscaling group.
> How do you view your log files remotely?
Cloudwatch, Syslog, SSH (depending on requirements).
> How do you port-forward from localhost to your Linux server to test your app locally?
SSH.
So if you'll indulge me -- this list is exactly why a system like Kubernetes is valuable and why I think personally that it contains a lot of essential complexity.
Kubernetes attempts to do all of the above, which is why it's so massive, and I'd argue it's actually less complex than knowing all the tools above -- but it's an equal degree less universally applicable. In this way, it's perfect for the dev who never wants to "do ops", and less so for the dev that already knows ops (or any regular sysadmin/ops person), because they already know all these answers.
Yeah, this is where it kicked in for me. Never mind the fact that all of that is AWS specific and absolutely doesn't help you if you ever move clouds. Great to know all the stuff below, but Kubernetes is a wonderful abstraction layer above that stuff, and it gets better every day.
CF could have become Kubernetes -- it was supposed to be, but it just never got the mixture right (and of course is AWS exclusive).
I find the ridiculous false dichotomy between Terraform for Kubernetes and Cloudformation for more basic infrastructure even more ironic given that I am still the eighth most prolific contributor to Terraform _over three years after leaving HashiCorp_.
> So if you'll indulge me -- this list is exactly why a system like Kubernetes is valuable and why I think personally that it contains a lot of essential complexity.
Yes. I would agree with your statement, precisely as an answer to @jen20.
Some things such as getting stateful systems, HPAs and persistent storage working were a little tricky initially, but a breeze afterwards.
But I do want to mention that you really, really need a team to look after it. Without one, it's bound to become another snowflake.
[edit]: <sigh/> i meant to say stateful when i wrote stateless.
Thank you. I use most of this, I've been using it for years and I just don't talk about it because it's hard to argue when people just want to force an idea that k8s is "really the best way of doing things".
Also, haproxy is one of the most reliable pieces of software I've ever used.
> Thank you. I use most of this, I've been using it for years and I just don't talk about it because it's hard to argue when people just want to force an idea that k8s is "really the best way of doing things".
I wouldn't call it the best way; rather a good way, because Kubernetes does encapsulate the really good bits from a scalability, development, security and reliability standpoint. It's not a panacea, but if you have the team bandwidth to run a k8s cluster, it's definitely worth a look.
I feel like your post describes exactly what Kubernetes and container images would bring to your infra.
If you were to deploy a solution like you described, you would get something more complex than simply running Kubernetes, except worse. I suspect that you believe your solution would be simpler only because you are more comfortable with those technologies than with k8s. The more I read criticism of k8s, the more I'm persuaded that what people call "old boring technologies" truly is "technologies I'm comfortable with".
On top of that, you'd need to separately document everything you do on your infra. The advantage of Docker images over AMI is that you have a file that describes how the image was built. With an AMI, you would need to hope that the guy who created the AMI documented it somewhere (or hope that he has not quit). Same goes for k8s, where configurations are committed into your repository.
At the end of the day, k8s remains a tool that you should use only when needed (and only if you have the capability to run it), but I think you shouldn't discard it simply because you are capable of producing the same result by other means. You get a lot more by using k8s, in my opinion.
> the more I'm persuaded that what people call "old boring technologies" truly is "technologies I'm comfortable with".
My infrastructure runs in Nomad on bare metal. I am by no means opposed to “progress”, I just don’t think Kubernetes is the be-all-and-end-all of infrastructure and would like to have a less hysterical debate about it than the parent to my original post presented.
To be fair though, that's not "Plain old linux" like zaptheimpaler suggested was somehow possible. That's linux plus AWS managed services plus software from Hashicorp. Which is a great stack to be on, but has its own complexities and tradeoffs.
Thanks for this great list of answers. I was startled to see someone vomiting a list of unresearched questions as if they constituted a rebuttal. "Taking the bait" was the right call. Thanks again.
Would you mind pointing out some examples of how you can achieve these on a bare-bones installation in an automated manner? Like, I mean sharing some real examples that I can install and forget. You can deploy the simplest "hello world" webserver in any language of your choice.
>> How do you run an application on a cluster of plain old linux machines?
> Build a package, install it in an image, run that image in an autoscaling group (or whatever equivalent your cloud of choice offers).
How is that any different from running the Docker image on Kubernetes? The image build process is the same, delegating running the image to the platform is the same; the only difference at this point is the name of the platform you are running on. Even if you were deploying ZIP archives to Elastic Beanstalk, if it doesn't work as expected you'd have to debug it as an outsider, and you'd still have to know about the technology. I don't see how it is any different from Kubernetes.
>> How do you update your app without downtime?
> Blue-green deployment, or phased rollout.
How exactly? There are a gazillion ways of doing them; they are rough concepts. What we need is a reliably working setup that requires as little effort from us as possible, and there are absolutely no standards on how to do them. Are you going to use Ansible? Maybe just SSH into the node and change a symlink? Maybe some other way?
>> How do you replicate your environment if you want to add a new server to your cluster?
> Start the server from the same image.
How do you do that? You'd either do that manually on AWS console, or build some tooling to achieve that. If you were to do that via the autoscaling options the vendor is providing, then it is no different than Kubernetes: if that doesn't work then you'd have to debug regardless of the platform that is managing the autoscaling.
>> If your app has microservices how do services discover each other?
> Consul, or DNS, depending on your appetite
What is the difference between trying to learn how Consul handles service discovery vs how Kubernetes handles it?
>> How do you mount volumes from cloud storage?
> It's a bit unclear exactly what you mean here, but I'll assume you mean either block devices (just attach them at machine boot, or on startup if they need a claim), or NFS.
Would you mind sharing examples that are not vendor-specific and that'd be easily configurable in a per-service fashion, hopefully without writing any code?
>> How do you update configuration?
> Either update Consul and have it propagate configuration, or update a configuration package and push it out.
How is this any better than pushing your changes to Kubernetes? I personally don't know how Consul works or how to update a configuration package and push it out somewhere; I don't even know where to push it. In this context, learning those is not any better than learning how to do them on Kubernetes.
>> How do you automatically restart failed applications?
> Systemd restart policy.
So, this means that you'd need to learn how to utilize Systemd properly in order to be able to start running your application and write the configuration for that somewhere, and also deal with propagating that configuration to all the machines.
>> How do you monitor if your applications are working?
> From outside - something like pingdom, and some kind of continuous testing. It's critical that this is measured from the perspective of a user.
The question was not really that. Tools like Pingdom won't help you if an internal dependency of your application starts failing suddenly. You need a standardized solution for gathering various standard metrics from your services - things like request rate, error rate and request durations - as well as for defining custom metrics on the application level such as open database connections, latencies of dependencies, and so on. You will definitely need a proper metrics solution for running any serious workload, and in addition you'll also want to be able to alert on some of these metrics. There is no standardized solution for these problems, which means you'll need to roll your own.
>> How do you view your log files remotely?
> Cloudwatch, Syslog, SSH (depending on requirements).
The proper alternative to the Kubernetes solution is CloudWatch, and even then the simplicity of `kubectl logs <pod_name>` is still better than trying to understand how CloudWatch works.
>> How do you port-forward from localhost to your Linux server to test your app locally?
> SSH.
This is not a trivial setup. Let's say you have a service A running remotely but not exposed, meaning you cannot reach it from your local machine, and you'd like to be able to use it while developing your service B locally. How would you set this up in an easy way?
The points regarding the images are the same points as any Docker image, so it really boils down to the choice and one doesn't have an advantage over the other in this context.
What I am trying to say is: there are quite a lot of problems when running any kind of serious workload, and there are thousands of alternative combinations for solving them, and they were solved even before Kubernetes existed; however, there was no standardized way of doing things, and that's what Kubernetes gives people. There are definitely downsides to Kubernetes, but pointing at specific examples like these doesn't help, as they are just names of individual pieces of software that also have a learning curve and all operate differently. I do wish there was a simpler solution, I wish Docker Swarm had succeeded as a strong alternative to Kubernetes for simpler cases as it is brilliant to work with locally, and I wish we didn't have to deal with all these problems, but it is what it is.
As of today, I can write a Golang web application, prepare a 10-line Dockerfile, write ~50 lines of YAML and I am good to go: I can deploy this application on any Kubernetes cluster on any cloud provider and get all the stuff defined above automatically. Do I need to add a Python application alongside it? I just write another 20-line Dockerfile for that application, again ~50 lines of YAML for the Kubernetes deployment and bam, that's it. For both of these services I have automated recovery, load balancing, auto-scaling, rolling deployments, stateless deployments and aggregated logging, without writing any code for any tooling.
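To put a number on the "10-line Dockerfile" claim, a typical multi-stage build for a Go web app is roughly this (image tags and the ./cmd/server path are placeholders):

```
FROM golang:1.14 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

FROM alpine:3.12
COPY --from=build /app /app
EXPOSE 8080
ENTRYPOINT ["/app"]
```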
The difficulty of all that stuff on plain old Linux is overstated and the difficulty of doing it on K8s is understated. And I agree with the grandfather’s point that if you don’t understand how to do that on plain old Linux you will struggle on K8s. K8s is OK for huge enterprises with tons of engineers, but somehow K8s advocates make it seem like learning and operating nginx on ubuntu is this huge challenge when it usually isn’t.
I think the point is more that the complexity of a cluster of VMs that manage the lifecycle of containers is often overkill for a service that would work with nginx installed on Ubuntu, and that oftentimes the former is sold as reducing complexity and the latter as increasing it.
No, the assertion that nginx on ubuntu is equivalent to kubernetes is mind-numbingly wrong, for starters for being entirely and completely oblivious to containers. The comparison isn't even wrong: it simply makes no sense at all.
And no, being able to run software is not equivalent to Kubernetes. It's not even in the same ballpark of the discussion. You are entirely free to go full Amish on your pet projects, but let's not pretend that managing a pet server brings any operational advantage over, say, being able to seamlessly deploy N versions of your app with out-of-the-box support for blue/green deployments, backed by a fully versioned system that lets you undo and resume operating with a past configuration. You don't get that by flaunting iptables and yum, do you?
If anyone needs to operate a cluster then they need tooling to manage and operate a cluster. Your iptables script isn't it.
DNS round-robin sends inbound traffic to 2 haproxy servers, which proxy traffic on to the rest of the cluster. Scaling means "add another box, install your app, add an haproxy entry". Service discovery is just an internal-facing haproxy front. If you must have "volumes from cloud storage", you use NFS (but if you can help it, you don't do that). Updates, restarts (including zero-downtime courtesy of draining the node out of haproxy), etc. are all quite doable with SSH and shell scripts. You run an actual monitoring system, because it's not like k8s could monitor server hardware anyway. Likewise, syslog is not exactly novel. I... don't understand why you're port forwarding? Either run on the dev cluster or on your laptop.
So yes, you'd need a handful of different systems, but "k8s" is hardly a single system once you have to actually manage it, and most of the parts are simpler.
Having done exactly that in the past, both by hand and with configuration management (including custom scripting that synchronized HAproxy configs etc.), I would say that I can do all of that much, much simpler in k8s.
Installing an application and routing it with load balancing, TLS, etc., with support for draining, is really simplified - something that used to require an annoying amount of time (it got better as Lua scripting was extended in HAproxy, but by that time I also had k8s).
Resource allocation, including persistent storage (including NFS), is so much easier it's not funny; it makes those old years painful to think about. Syslog is not novel, but getting all applications to log the same way was always a PITA, and at least with containers it's a bit easier (I still sometimes have to ship log files out of containers...).
As for monitoring, that's one of the newer and more advanced topics, but it's possible to integrate more complex server monitoring with k8s - it already provides a lot of OS/application-side monitoring that really makes it easier to set up observability - and now there's a quite simple way to integrate custom node states into kubernetes, giving you a quick at-a-glance way to check why a node is not accepting jobs, integrated with the system so a health check can actually flag that the node is in trouble and should not take jobs.
The easy answer is that you don't need to run your service distributed across 1000 VPS instances.
A handful of dedicated machines is enough. For example: stackoverflow.
None of these are difficult to answer. You can set up automated deployment/rollback however you want. You don't care about the dependencies because your code is compiled and Linux has binary compat. You don't need to split your app into 1000 unmanageable interdependent microservices. You have enough disk space on your dedicated machine or use NFS. You set up your systemd service file to auto-restart your service. Etc etc.
When you have k8s as a hammer, everything looks like a nail.
“Plain old Linux” really isn’t an alternative to K8S though. You would need a load balancer, service discovery, a distributed scheduler, a configuration management system (K8S is a very strong alternative to building things around Ansible IMO). You can do all of those things without K8S, of course, but not with “plain old Linux” (what would that be anyway? GNU coreutils and the kernel? Vanilla Debian stable?)
Maybe I’m way off, but aren’t all of those things required for k8s? Ingress controllers, etcd cluster, terraform modules, storage configuration, etc...
I guess if you pay for a hosted service a lot of the control plane is taken care of.
I’ve used k8s in orgs where it’s a great fit and really fills a need, but it is considerably more complex than running a web service balanced across a couple of machines, and it definitely requires a lot more upfront complexity (as in, you have to solve problems before they are actually problems).
> Kubernetes is designed to be cloud and vendor agnostic.
But it’s not. Connecting to EKS is completely different from connecting to GCP. Setting up worker nodes is completely different too, oh and load balancers.
We've migrated from Heroku to EKS using Terraform to set it up.
We did a small PoC for Azure, and within a day we had a small cluster and one of our apps running. The app terraform code required only 1 variable change.
Sure, it's far from drop & replace, but "completely different" is a huge misrepresentation in my experience. Running multi-cloud seemed quite doable.
I feel like what 98% of companies need is really just barebones linux with some good documentation on how to spawn new nodes.
To use k8s you need to know linux anyway, but to use linux you don’t need k8s knowledge. Most things are as easy to set up with linux, and the things where k8s really shines are not needed most of the time.
What I really wonder is where you learned k8s, and which parts you learned the most? It seems huge and I would love to be in your position.
K8s is a nice api on top of gnu/linux. Want an iptables rule? Write a yaml (network policy). Want storage for your app? Write a yaml (persistent volume), etc etc.
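For example, the "iptables rule" case is a NetworkPolicy along these lines (labels are made up): only pods labelled app=web may reach the database pods:

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-web
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web
```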
For people who already know Linux, kubernetes comes naturally because it is pretty obvious.
But indeed, by experience, many companies can go to "unicorn scale" with two or three boxes.
Couldn't you get nearly the same behavior using some basic Ansible playbooks? My impression was that the killer feature of k8s was scaling, automatic failover, etc., although to be fair it's been several years since I last looked into it.
Think of kubernetes as a cluster operating system. Instead of dealing with vms, you deal with your applications directly, without worrying about where they're running. It gives you a unified view and ability to manage a distributed system at the level of the application components.
Ansible can't give you anything like that. Even if you use Ansible to automate something like network setup, the commands and modules you use will be different between e.g. cloud providers. Kubernetes gives you a consistent abstraction layer that's almost unchanged across providers, with the exception of annotations that are needed in some cases for integration with certain provider services.
For me k8s kinda extends what you can do with ansible.
For example, if you use just kubelet (the main daemon) without the api, and drop some static yaml files (à la unit files) under /etc/kubernetes, it will be something like systemd. No big deal so far, but using the kubernetes api (just another daemon) allows you to run all the things anywhere.
I mean, you terminate one computer, k8s will figure out that the program should be running and it will be started on another computer (including detaching/attaching the block device from/to computers, that kind of thing).
There is not much secret to it.
For me it really feels like a (well designed/integrated) API on top of standard Linux technology (ipvs, netfilter, mount, process mgmt, virtual ip, etc) via yaml, running your stuff on 1..N computers that you manage as one big computer :)
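A sketch of that kubelet-only mode (paths and image are illustrative): point kubelet at a manifest directory with `--pod-manifest-path` and drop pod YAML there, much like dropping unit files for systemd:

```
# kubelet started with --pod-manifest-path=/etc/kubernetes/manifests
# /etc/kubernetes/manifests/web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx:1.19
    ports:
    - containerPort: 80
```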
I go in and out of development and ops and devops. 20 years.
To me as a developer heroku is the gold standard.
Docker-compose makes sense.
Kubernetes and helm are a giant soul-crushing WTF when it comes to getting anything done.
Now I’ll go and figure it out. Angular + rxjs + typescript was a similar experience after I took a break from front end for a few years. A few months of pain, then it starts clicking.
but this just seems insane for a basic web app.
So many different tools needed to get it going.
Tutorials all have many steps that don’t apply, or use other odd pieces swapped in.
Many tutorials are unfortunately bad, they actually got worse over time.
Oooold presentations (think 2016 and older) tended to talk more about basic building blocks and how they interacted, especially the design involving resources and their controllers working in a loop of "check requested state, check real state, do changes to implement requested state", and how those loops went up from single pods, through ReplicationControllers (now ReplicaSets), then Deployments, etc.
This is the exact problem I ran into on rxjs. Every tutorial was badly out of date. Even ones written six months earlier.
You had to become a master of figuring out the direction the community was going and staying on the bleeding edge.
Some people found this fun. I have many times enjoyed it myself. But when I want to get a boring feature built and handed off to the jr developers so I can spend some time with my kids, it’s frustrating.
The most annoying thing, to me, is that Kubernetes doesn't even move fast enough!
The old presentations? The ones that remember k8s 1.0? There's one major change (the move from ReplicationController to Deployment+ReplicaSet), and even that doesn't invalidate ~90% of the material, because the core stuff is about how controllers work!
Yet it seems more and more common to me that people don't learn the core mechanism of kubernetes unless perchance they got there writing CRDs :|
> Couldn't you get nearly the same behavior using some basic Ansible playbooks?
No. For starters, Ansible playbooks don't allow you to dynamically deploy and redeploy a set of containers onto whichever node happens to have the most available computational resources (such as bandwidth) at a given moment, nor do they respawn a container that just happened to crash.
Arguably Ansible only implements a single feature of Kubernetes: deploying an application. Even then, while with Ansible you need to know and check all the details of the underlying platform, with Kubernetes you essentially only need to say "please run x instances of these containers".
Ansible gets you like 80% of the value of Kubernetes, yes. But you have to allocate applications to hosts manually, your configuration is less declarative, and what do you gain?
What about service discovery? What if you need to power your node down for maintenance? What if your app goes crazy and consumes so much memory it OOMs some other services? Reliable multi-service deployments ARE HARD and always will be, regardless of what tech you use to achieve them. Of course you can use Ansible and do it the old-school way, and waste lots of underutilised VMs and custom idempotent scripts. But K8S solves many of these challenges in a standardised, data-driven way. It has a steep learning curve, but once it's learned, all resources/limits are properly set, logging is in place, etc., it works nicely.
Firstly, tell me more about your 1-person full-stack venture; but secondly, how come you, with barely any time for sitting down, can use k8s happily while it falls over for others? I am struggling to see the truth among the comments here :-(
TBF with a completely greenfield project and managed K8s (GKE or EKS) you can absolutely get a pretty well set up infra very quickly if you are willing to learn how to do so.
I often get the feeling a lot of the negativity comes (rightfully so) from trying to replicate a currently existing project onto kubernetes. This is true of almost any paradigm - try replicating a Java EE monolith in Erlang and you are going to have a lot of problems. The big thing to note is that starting a project in Erlang very well might solve the problems that a Java EE project ran into, but that is because they were able to solve them at the ground floor; just popping a Java EE project with all of its architecture into an Erlang project will probably end up in a worse spot.
I think that this is what often happens with k8s as well - if you or your company have a currently working implementation that isn't on k8s, of course you won't be able to just easily plop it into a k8s cluster and everything be all well and good, but I think the problem is that people are equating that issue with k8s itself, which is a completely different paradigm.
> managed K8s (GKE or EKS) you can absolutely get a pretty well set up infra very quickly
And then tear your hair out when something doesn't work for some reason and root causing it requires learning a stupid number of layers. k8s is easy until it goes wrong.
Not the GP, but I honestly couldn't tell you. A lot probably comes down to tooling, the applications you are deploying, security requirements, etc., as well as how familiar you are with k8s itself.
I migrated PCGamingWiki from running on some Hetzner boxes to DigitalOcean Kubernetes in a few days of work creating Dockerfiles and k8s manifests. I run a Kubernetes cluster at work fairly hands-free that hosts applications critical to our billing operations, and developers on the team deploy new applications with little or no support. Any of the issues I've hit are an artifact of migrating legacy applications not designed to run in more-or-less stateless environments, which is why the PCGW Community site still runs on its own server (Invision Community sucks).
I really don't see all the issues people have that aren't due to a mismatch of application design vs target environment (and no, it's not monolith vs microservice - monoliths run just fine on k8s; but you should be designing your application with a 12-factor environment in mind) or a misguided notion that you will be drowning in YAML hell (it's real, but you can manage it - and it's directly related to the complexity of the services you are deploying).
I can totally agree with this. Most of my customers that I see struggling with K8s are those that haven't internalized 12-factor principles: not just heard, read or understood, but really internalized. It is unfortunate that K8s talks / blog posts / articles do not focus enough on this prerequisite.
Seems like you are a thoughtful engineer who can also make sure you don't make any fundamental design flaws while building these systems well. Whenever I have seen Kubernetes fail, it's often because the engineer(s) who built it were not thoughtful at all and often didn't fully understand what they were doing. Perhaps k8s's failing is that it makes people who don't know enough think that they do.
I mean, I’m far from perfect - the trick has always been to KISS. I don’t use istio, it’s absolute overkill for my needs. I use nginx-ingress because it fits the bill; I know nginx, as do enough other people that they could exec into a pod to debug it. I don’t run stateful applications that aren’t prepared to have servers randomly vanishing, because it takes a LOT of work to get these running in-cluster. I don’t use public helm charts because they often suck, and making your own container is something you can do quickly if you were able to deploy the software on a traditional server. Every choice I make is done with the day 2 operations in mind, not what is hot, what gets initially deployed fastest - but what makes it so I can touch the thing as frequently as possible.
PCGW is a great example - installing a new Mediawiki extension, changing a config file, upgrading to a newer MW release is just updating a file or a git submodule and committing. I don’t get paid thousands a month to manage the site for the owner, so I make my time spent as efficient as can be done.
I think this is a big failing of the DevOps movement as a whole (at least what DevOps became in practice — devs doing ops) which results in things like passwordless mongodb exposed to the internet...
> I think this is a big failing of the DevOps movement as a whole (at least what DevOps became in practice — devs doing ops) which results in things like passwordless mongodb exposed to the internet...
Hardly. If anything at all, it tells you about the _team_ and/or the culture of the organisation. In any DevOps/SRE/Opsec culture worth its salt, an immediate blameless postmortem analysis would be performed to help with premortem analysis in the future.
DevOps is not about exposing unsecured endpoints. You've got it all wrong son.
I'm not your son, nor am I talking about what DevOps "is about", but about what it became in practice, which you would have understood had you not rushed to reply in the most condescending tone you managed to invoke.
> I'm not your son, nor am I talking about what DevOps "is about", but about what it became in practice, which you would have understood had you not rushed to reply in the most condescending tone you managed to invoke.
Fair enough. I am letting you know, though, where you got mixed up - that is not because of _what DevOps has become in practice_. That is precisely because of failings and shortcomings in the team culture and/or the organisation that practices DevOps.
This is pretty typical when a product is still climbing the adoption curve. I've helped small companies set up (managed) k8s clusters and migrate their apps to them, and when you know what you're doing, it's a super smooth experience that's basically all upside.
But, if you're approaching it for the first time with no assistance, there are lots of things that can trip you up, and lots to learn. That's not a reflection on k8s, it's just the nature of the large set of problems it's solving.
K8s is succeeding because it's very well designed, has a large and diverse ecosystem, and solves a set of important problems that very few other tools even try to tackle. Apache Mesos perhaps comes closest, but it's not quite as pragmatic, and its adoption level reflects that.
Also, because of k8s' scope, many people may not fully appreciate the range of problems it's solving, seeing it through the lens of their own background and focus.
People who don't like - or don't "get" - declarative systems tend to spend an inordinate amount of time and effort fighting them. I've seen the same thing with a declarative build system (maven), or with adopting an ORM - if you're willing to work with the tool then it will save you a lot of effort, but if you're determined to do things your own way then you can make it almost arbitrarily difficult.
Might also be the amount of experience a person has.
I have done software development for ~15 years, switched now to infrastructure, and built a GKE cluster. It was awesome: very logical, nice and easy to use.
Now I read stories from Coinbase and don't get it.
> What is the alternative to k8s that does not need people to have any technical knowledge?
Paying for a managed provider. Heroku, Elastic Beanstalk, GAE, GCP, Fargate, etc. You push some buttons, they manage your cluster/services.
People still think they can get a free lunch by downloading some free software. If that were so, Windows and Mac would be dead and Linux would be the only desktop OS. But good news: I hear 2030 will be the year of the Linux Desktop!
I'm in the same boat opting for docker-compose instead. docker-compose is much simpler to manage. Obviously it doesn't have feature parity with k8s but docker-compose does the basics well.
An inexpensive VPS runs compose well with more resources at a lower cost than the managed costs of k8s.
You just run scp with the compose file and run docker compose down; docker compose up
Bonus points if you just mount the compose file on an NFS share.
I've wasted enough man months on kubernetes that unless you tell me to manage 1000 nodes this approach will never cost me more time than the time I spent using and learning kubernetes.
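For what it's worth, the whole setup being described can be a single compose file along these lines (service names, ports and image are made up; the commands in the comments are just the scp + down/up flow mentioned above):

```yaml
# Hypothetical docker-compose.yml for a small VPS deployment.
# Deploy is roughly:
#   scp docker-compose.yml user@host:/srv/app/
#   ssh user@host 'cd /srv/app && docker compose pull && docker compose down && docker compose up -d'
services:
  web:
    image: registry.example.com/web:1.4.2   # placeholder image
    ports:
      - "80:8000"                           # host port 80 -> container port 8000
    restart: unless-stopped
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    volumes:
      - redis-data:/data                    # persist cache data across restarts
volumes:
  redis-data:
```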
Agreed. Good support for compose currently, and swarm is usually overkill. With a bit more elegant HA functionality, docker-compose could be the go-to for many more. The comment below claiming he doesn't understand your comment reminds me of all of those who will say "well it's not FOR production" - seems more like superstition than science.
It's a pity that docker swarm did not make it. It wasn't perfect but it was a lot simpler to setup and manage than kubernetes.
If you can get away with it, vanilla docker hosts running docker-compose provide most of the same benefits at a fraction of the cost. For most startups, that's a great way to avoid getting sucked into a black hole of non-value-adding devops activity. You lose some flexibility, but vanilla ubuntu hosts with vanilla docker installs are easy to set up and manage. We used packer and ansible to build amis for this a few years ago with some very minimal scripts for container deploys.
No matter what they claim, it's really not supported in the sense most commercial oss projects are. We finally switched off after a minor version introduced a segfault when adding nodes in certain conditions, and the issue was unfixed after 5 months.
It exists, but in terms of people using it or it being actively developed, it's dead as a doornail ever since Docker was more or less forced to also support kubernetes and basically gave in to the reality that world + dog was opting for kubernetes instead of swarm.
They never really retired it but at this point it's a footnote in Docker releases.
I've not actually encountered it in the wild in four years or so and never in a production setup.
Docker Swarm doesn't support multiple users and there is no remote-accessible API. Nomad doesn't implement network policy (Nomad Connect sidecars may be an option but sidecars bring new problems). Just learn K8S, Helm, Terraform & Terragrunt properly. Use proper tools (k9s, Loki, wrapper scripts around kubectl). Stop finding excuses for not using K8S. Stop putting proxies everywhere (that Istio/servicemesh bull*hit) and use Cilium CNI instead.
I'm also a fan of swarm, and still use it in production.
It's just so damn simple in comparison to k8s - basically, if you know Docker Compose, you know Docker Swarm.
I appreciate it doesn't have the full power of k8s, but it has what most apps need: simple deployments, zero-downtime updates, distributed configs and secrets.
You can't make the complexity disappear. Kubernetes just offers a very standardized, stable and relatively polished way to handle it.
The other alternatives, like running your own orchestration system, are just as complex if not more so, although you might be more familiar with it since you built it.
That being said, many companies don't need any of it in the first place for their scale, and that's probably the biggest issue with K8S today.
> That being said, many companies don't need any of it in the first place for their scale, and that's probably the biggest issue with K8S today.
Yes. I helped push out a k8s platform where I work, and it's been running well in production for ~3 years.
We most definitely did not start with "we want to deploy k8s". We started with "we're being asked to meet certain business requirements", which lead to "we will need to change how we do some of our development and deployment", which lead to "our platform will require these characteristics."
The easiest way to get those turned out to be buying a packaged k8s from a vendor (OpenShift, not that it matters; there's plenty of options).
Most of the pain with k8s seems to come down to people wanting to polish their CVs (by "having done k8s"), or people who sneer at packaged/hosted solutions because they want to build a cottage industry of building k8s services from scratch because it's their idea of a good time.
Unsurprisingly both take orgs down the route of pain, money, and regrets. Oftentimes the people driving this decision then prefer to say the problems is k8s, or containers. Or microservices or whatever fad it was they were chasing without understanding whether it would be a good idea.
It's impossible to make essential complexity disappear but it is certainly possible to reduce incidental complexity. Most software is much more complex than it needs to be and Kubernetes is no exception.
Before k8s every serious shop automated the crap out of their infra. Jump/kickstart recipes, rolling cluster patching that split RAID mirrors before applying, blue/green deployment scripts to tickle the loadbalancer, cron jobs to purge old releases ...
That stack is super complex and utterly bespoke to the company.
With k8s it’s standardised and usually better quality.
It's on a path to being standardised, but not there yet: etcd vs. others, different ingress controllers, providers replacing most of the network parts, storage being bumpy/not so standard, and deploys done via kubectl apply/helm/operator.
I would really appreciate a more mature ecosystem.
I'd say it the other way; k8s has to do all the things that another management system would have to do, but sufficiently tied together that you can't ease into it as needed.
I guess that depends primarily on whether you're installing and operating K8S yourself. Use something like GKE and it's a very seamless experience, with AKS getting pretty good and the rest being rather crummy.
Once you have a managed cluster, deploying apps is fairly easy. A single container/pod is a 1-liner and you can work your way up from there.
That's fair; all of this is heavily influenced by your operating environment. If you can run GKE or if you have an ops team to deal with that stuff, then yeah k8s is great. Unfortunately, I'm part of the ops team, and our company is too small to have a dedicated k8s team and too low-budget to (likely) do well with a managed service (we get absurd value per money out of bare metal servers, which is very much a tradeoff that is sometimes painful). So to me, k8s looks like a very iffy tradeoff. Bigger company, bigger budget, different constraints? Yeah, k8s would be great.
> we get absurd value per money out of bare metal servers, which is very much a tradeoff that is sometimes painful
What do you find most painful about your use of bare-metal servers? The thing that I like most about a hyperscale cloud provider is the level of redundancy, including even multiple data centers per region, and their built-in health checks and recovery (e.g. through auto-scaling groups) based on that redundancy. With bare-metal servers, I'd have to cobble together my own failover system for the occasional time when one of those servers goes down or becomes unreachable due to a network issue. And of course, I'd probably find out that my home-made failover system doesn't actually work at the worst possible time.
I don't know that it's a single big thing; it is indeed many small things that we have to manage ourselves. We backup the databases with our own scripts, we failover manually (I miss RDS), we use an overlay network because no VPC, and deployments involve ansible running docker-compose. There's basically no elasticity; we provision massive bare metal servers with fixed memory and disk installed. But, it is dirt cheap, so we manage, and all the pieces are small and easy enough to use.
k8s and even docker are trying to solve a problem not many people will ever face. However, being the sexy new things, loads of people get sucked into integrating them into their stack from the word go.
Being reluctant to adopt new technology unless you really, really need it might be a more sensible thing to do.
> This actually was a very real problem at my current job. The data pipeline was migrated to k8s and I was one of the engineers that worked to do that. Unfortunately, neither myself (nor the other data engineer) was a Kubernetes guy, so we kept running into dev-ops walls while also trying to build features and maintain the current codebase.
I've experienced this as well. At the last large company I worked at we had a Heroku-like system to run our apps on. That was deprecated for a Docker-based solution. And then _that_ has now been deprecated for a Kubernetes offering. We just ran some Python web apps–we didn't want to have to learn and support an entire system. And here's the thing, most big tech companies I've worked with are all made up of these small, "internal" services, that just want a simple place to run their services.
I've used k8s on an early-phase project I later moved off from, so I don't have much experience with it, but I got the impression that simple scenarios worked as documented. You already had a docker-based stack, so it sounds like hosted k8s shouldn't be that far off from what you were running.
What issues did you encounter? I had to spend a week working through it to get familiar with everything, but I wasn't experienced with docker previously (I've played with it a few times but never had to set up stuff like a custom registry, versioning, etc.)
I did end up in situations where the cluster was FUBAR but I eventually figured out that everyone just recommends rebuilding the cluster over diagnosing random stuff that went wrong during development.
Been there. Simple python django app. Max 20 APIs. Expecting 100 requests per month. But POs and directors wanted shiny DevOps tools. Problems with typical MNCs. Every quarter manager/director comes with some new hype.
If it's a hosted k8s, all you'd do is containerize your application, then create a deployment that pulls that container and exposes it through a loadbalancer. The container is the same if you'd push it to herokus container registry or beanstalk
On k8s, you write those env variables into the deployment manifest that describes the container. If they are sensitive, pack them into a k8s secret and add that to the container instead.
You can use the external services the same way if they are already setup with things like DATABASE_URL that holds a postgres connection string to amazon RDS, etc. Running k8s doesn't mean you have to move your entire stack into k8s, you can still use hosted services just fine.
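Roughly, the pattern described here looks like the sketch below (names and values are illustrative): plain config goes straight into the Deployment's env, and the sensitive DATABASE_URL is referenced from a Secret.

```yaml
# Hypothetical example: non-sensitive config inlined, sensitive values in a Secret.
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  DATABASE_URL: postgres://user:password@example-rds-host:5432/app   # e.g. an RDS connection string
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:2.0.0   # placeholder image
          env:
            - name: LOG_LEVEL                 # non-sensitive config in the manifest
              value: "info"
            - name: DATABASE_URL              # sensitive value pulled from the Secret
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: DATABASE_URL
```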
In all my time with k8s, I never used Helm. If you don't need it, don't use it and keep it simple. k8s can do a ton of stuff, but in reality you barely need more than the basics.
> If you want k8s, you really do need people that know how to maintain it on a more or less full time schedule.
I think this is roughly similar to saying "if you want linux, you need people that know how to maintain it." Which is to say, you can create an architecture where this is absolutely true, but it doesn't need to be true.
The big issue with K8S right now isn't K8S; its that there aren't (big, well established) solutions like Heroku or Zeit or whatever, for K8S, where you don't need to worry about "the cluster", just like those solutions don't make you worry about Linux. K8S really is two parts; the API and the Cluster. The API is the more valuable of the two.
And, you know, maybe it won't ever get there. Heroku and Zeit solve strikingly similar problems to K8S. Maybe K8S just is a platform like that, but for enterprises who want to home-grow, and maybe most companies shouldn't worry about it. But I think the platform, and thus the community, simply needs more time to figure out where it makes sense.
Most companies shouldn't touch K8S. You'll probably regret it. But, to your second point: AWS literally has nothing beyond EKS/ECS + Fargate which approaches a Heroku-like service. Beanstalk is supposed to be that, but its really just a layer on-top of EC2 which doesn't touch the "ultra low maintenance" of a Heroku, or Zeit, or App Engine. So if you're on AWS, and you want to use their other excellent managed services, you either go outside AWS, or you'll go EKS, or you'll end up trying to in-house something even worse.
> I was one of the engineers that worked to do that. Unfortunately, neither myself (nor the other data engineer) was a Kubernetes guy, so we kept running into dev-ops walls
This seems like an organizational problem to me. "DevOps walls" sounds like there is a "DevOps team" (a famous DevOps anti-pattern) and there are knowledge silos between development and ops, which ironically is the exact opposite of what DevOps is about. What this also means is that developers need to be aware of the environment in which their services run and should be very familiar with how k8s works, why not take that as an opportunity to learn?
But this makes no sense. Why would a cloud operator even bother with k8s if the customer is only interacting with functions? It’s much more efficient to bypass Kubernetes and run directly on the cloud’s native system (like borg).
I think you’re right though - Kubernetes is a massive red herring, we should ideally be running containers/functions on as close to bare metal as possible. Fundamentally, VMs are the wrong abstraction if all your code is containerized.
Joyent’s triton is the closest thing we have to this... I really don’t know why AWS/Azure/GCP haven’t cottoned on to this; it would massively reduce their COGS and improve our developer experience.
> Why would a cloud operator even bother with k8s if the customer is only interacting with functions?
Because there are various "tiers" of users, some companies (like coinbase) could actually leverage K8s in their Codeflow/Odin project and prevent a lock-in. But a regular developer looking to just "get things shipped" isn't meant to waste his/her time with pure K8s.
> Kubernetes is a massive red herring
We agree but on a different note. The biggest selling point of K8s is its API design. The entire industry needs to converge on one "de facto" standard of packaging and deployment. Google's Cloud Functions is a perfect example of this. The API is based on K8s and Knative, but under the hood it actually runs directly on GCE rather than GKE. What happens underneath is hidden from the business developer; you only care about the data in your yaml and your docker image.
>> I really don’t know why AWS/Azure/GCP haven’t cottoned on to this
Conflict of interest. If k8s yields the most revenue, why would they try to decrease that? If some customers are so delusional that they go for an inefficient abstraction, so be it. Btw, this is my experience with k8s too: people use it because it is a trend. Not a single company / developer could justify using it to me over leaner resources like EC2, ASG, cloud native resources, etc.
How about companies who don't want to rely on cloud and want on-prem as much as possible? Or those who don't want to be tied to a single vendor? Apple is already getting rid of their Mesos based PaaS and moving to Kubernetes.
The 'needing a team' aspect of Kubernetes sounds remarkably similar to conversations I had like 8 years ago when Openstack was the new hotness.
We went with ECS and have been happy with it. It plays well with all of AWS's other products and features. For the few things we have to run On-Prem we use Docker Swarm in single node mode and it works well (albeit missing a few features like crons from Kubernetes).
Yep. I remember the Openstack consultants swarming into the office trying to sell it as a solution and not a giant overhead / problem. Luckily they failed the POC, so we did not need to waste our lives on a non-issue trying to solve it with a non-solution. Now it is k8s' time to do the same. We will see how far this buzzword train is gonna go.
TBF I like k8s as a service or platform. Once metallb came out to solve the "bare metal load balancer" problem it could deliver the entire product I was looking for.
My big issue with it is the underlying complexity and house of cards nature of running your control plane as sidecar containers on your runtime infrastructure.
You CAN set it up and run the control plane out of band, but last I looked there wasn't step by step documentation for doing so. I also couldn't find anyone doing it in prod which to me is a nonstarter. If I can't figure it out myself and can't hire for it I'm not doing it.
I'm sure it's fine on google cloud, but ECS solves 90% of our problems AND integrates with everything else we're using already.
The first iteration (I actually wasn't around for that) was trying to run a cron for every "data ingestion job" -- at some points, we were doing about 50k+ API requests daily (FB/Instagram/Twitter/etc.) and that was absolutely not tenable using k8s cronjobs.
I wasn't there for this decision, but I assume cronjobs were being treated as "cloud functions" -- and to be fair, the k8s documentation kind of makes it seem that you could technically do that, but fails miserably if you try to do so in practice.
Run 11 person startup. Use hosted GKE. We spend less than 1 man-hour per week dealing with K8s or anything like that. K8S is a big reason we are able to out-execute our competition.
I agree with you completely, we use Kops and see a similar workload. The real boon for us is not in production, where the HA/error-tolerant/easy horizontal scaling certainly helps, but in development, where we can easily bring up ephemeral feature branch deploys as part of the CI/CD pipeline (which itself runs on k8s using Gitlab CI)
I mean, k8s has 1000s of features. I am not using every one of them. I carved out the subset I need, and it is FANTASTIC. I get
* Great rolling deploys
* Self healing clusters
* Resource sharing / bin packing (rather than have 20 half used servers, I can have 6 much more heavily utilized servers)
I could maybe frankenstein these features on top of something else.. but frankly it seems absurd. If these are the only 3 features I use to run docker pods, k8s is a HUGE win to me.
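For context, the "great rolling deploys" piece is mostly just a few declarative lines on a Deployment; a hedged sketch with made-up names and numbers:

```yaml
# Hypothetical rolling-update settings: replace pods gradually,
# never dropping below the desired replica count during a deploy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # keep full capacity while rolling
      maxSurge: 1            # bring up one extra pod at a time
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/worker:3.1.0   # placeholder image
          readinessProbe:                            # traffic only shifts once the new pod is ready
            httpGet:
              path: /healthz
              port: 8080
```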
I don’t understand the value of container tech (unless dealing with legacy code).
Serverless seems better and cheaper (lambda, GC Functions, Azure Functions). Pay as you go with no overhead costs. In many cases usage may never exceed the free credits.
For the database and persistent storage, you need a cloud native service like S3 and Postgres RDS. (Running a database in a container is asking for complete data loss from what I understand.) This is the primary cost for my tech stack and for lite loads $15 is about as minimal as possible.
So I don’t see how using containers could be cheaper unless you are keeping the data in the container, which is a bad idea since containers are supposed to be stateless.
So serverless cloud native tech seems better by far from everything I’ve experienced.
Also when using a tool like serverless framework, deploying an entire stack of serverless resources is dead simple.
I’ve tried a few times to see the value of containers, but I just don’t get it.
Your comment and the post and threads in general are making me chuckle, having just spent the last half hour reading this post, for no particular reason, from a few months ago about the transformation of racknut-and-perl roles like neteng and sysadmin into text editor-based devops:
I have to say, certain comments -- which I'm sure are just as real-world as yours -- lie in...let's call it tension...with the threads and comments here in this post. That everything in the old post appears to be subsumed into a "sysadmin II: internet boogaloo" of orchestrator expertise is humorously ironic.
Nomad + Consul has been pretty great to us over the last year. We're a small nonprofit and chose it specifically because we can't afford to pay someone to keep watch over k8s
We actually had a cold start tolerance of ±30 seconds and so far, very few jobs are out of that range. This was one of our main sticking points, and the Google guys gave us some great tips on how to reduce cold starts.
I had a similar problem: set up a small cluster and it worked fine until the next upgrade, and the nightmare repeated on each upgrade. I realized I was spending most of my time fixing k8s issues rather than doing any valuable work.
This reads like an oxymoron. 'DevOps' was supposed to break down walls between teams. You probably have a traditional "Ops" organization, regardless of team naming conventions.
Anecdata: Series B startup. I've found GKE to be almost completely painless, and I've been using it in production for more than 4 years now.
I don't think the article gave a fair representation on this count; sharing a link to a single GKE incident that (according to the writeup) spanned multiple days and only affected a small segment of users doesn't (for me) substantiate the claim that "it isn’t uncommon for them to have multi-hour outages".
In my experience, multi-hour control-plane outages are very rare, and I've only had a single data-plane outage (in this case IIRC it was a GKE networking issue). Worst case I see is once or twice a year I'll get a node-level issue, of the level of severity where a single node will become unhealthy and the healthchecks don't catch it; most common side-effect is this can block pods from starting. These issues never translate into actual downtime, because your workloads are (if you're doing it right) spread across multiple nodes.
I wouldn't be surprised if EKS is bad, they are three years behind GKE (GA released in 2018 vs 2015). EKS is a young service in the grand scheme of things, GKE is approaching "boring technology" at 5 yrs old.
AWS wants k8s to fail because it works against the significant lock-in AWS has tricked teams into building themselves into. They do not want people already in the AWS ecosystem to move to EKS, it is instead there to not lose potential customers.
And that is why pricing and features sit right at “good enough” and not great.
I'm sure it works fine if you can get it running (documentation didn't work when I played with it). I'm referring more to the $200/month it was per control plane.
That to me is a product they offer because someone else is offering it, but they don't want you to actually use it.
If you're actually looking to build isolation in AWS then you're going to need EC2 dedicated for your EKS member hosts. So you're not getting isolation for $200/month (I'd have to spec it out but dedicated hosts are pricey to the point that it'd be competing with physical hardware in a colo).
That $200/month for not-really-isolated is also per plane. So if you want a separate staging environment it's another $200/month. Client API sandbox? Same thing. It's wack.
(Note the GKE SLA is for regional clusters, which is what you should be doing if you care about uptime. The zonal cluster SLA is 2.5 nines. I couldn't find a difference in EKS, maybe there's an equivalent better SLA for regional clusters I couldn't find.)
So, per my original comment, I am surprised. (Having never used EKS directly I have no idea what their actual uptime is; in my experience GKE has been way higher than 3.5 nines, but obviously I don't have enough data to make statistically significant observations on this.)
I work for a mid-size company with 30-40 engineers managing 20-30 very diverse apps in terms of scale requirements and architectural complexity. It took our devops team (4-5 people) probably 18 months to learn and fully migrate all our apps to Kubernetes. The upfront cost was massive, but nowadays the app teams own their deployments, configurations, SLAs, monitors, and cost metrics.
Introducing Kubernetes into our org allowed us to do this; we would have never gotten here with our legacy deployment and orchestration Frankenstein. The change has been so positive, that I adopted Kubernetes for my solo projects and I am having a blast.
I understand Coinbase's position, and they need to stick to what works for them. I just wanted to bring up a positive POV for a technology I am becoming a fan of.
Do you have any intuition as to what percentage of the benefits your company has seen come from kubernetes specifically, and how much just from the exercise of spending 18 months working on a modern, coherent, efficient infrastructure?
Kubernetes is almost a standard, all our engineers are eager to learn it. It is extensible, some app teams have started building operators to tackle problems in a more efficient way. It allowed us to standardize metrics and monitoring, we didn’t have a clear story for this before. It is cloud agnostic and with great compatibility; I was part of a migration from GKE to EKS, and it was painless.
One other advantage of Kubernetes that is overlooked in the article is the benefit of instant auto-scaling on a heterogeneous cluster. For example, if you have 10 apps on a k8s cluster that each use the same resources, you can give the cluster a 20% buffer, which would let any single app use 300% of its allocation instantly. With VMs, you’ll be stuck waiting for VMs to spin up, or you'll have to give each app its own large buffer to handle bursts of traffic.
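A rough sketch of the mechanism being described (numbers are illustrative): each container is scheduled against a modest request but allowed to burst up to a much larger limit, borrowing the cluster's shared headroom.

```yaml
# Hypothetical resources: scheduled at 500m CPU / 512Mi of memory,
# but allowed to burst to 1500m / 1536Mi (300% of its request)
# using whatever slack the shared nodes currently have.
apiVersion: v1
kind: Pod
metadata:
  name: bursty-app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: 1500m
          memory: 1536Mi
```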
_any_ non-proprietary tool is "cloud agnostic". Kubernetes is bundled software, and achieves the things those pieces achieve. There is nothing holy about k8s specifically, the tools you train with are easy, and it's very easy to get skewed opinions on that.
For example a lot of people would find writing scripts cumbersome, but not a person who's written a lot of them. They're not any more fragile than other logic error capable software is
Any load balancer will let you do that. Scaling is a few lines of scripts to do on any platform, and well worth the couple minutes, you don't even need a tool for just that.
No tool "works the rest out". It's _always_ a compromise, because inherent complexity can never be removed, only moved. What you gain in one area, encumbers another.
You may freely use k8s, but it's not magical nor easier to use than any existing system. In fact adopting it often takes non-trivial time, and the web is full of failure stories with very benign warnings and catastrophic results.
Maybe I'm spoiled by GCP and it's not the same in other providers; but I can have a brand new debian VM configured and operational in less than 10s (less the time that I run my own startup script).
Debian machines spawn with incredible speed; not much slower (if at all slower) than a new container.
But: I've also been put in the terrible position of supporting a platform sold to run on kubernetes in an air gapped, on prem, bare metal environment.
But: I've also been put into the terrible position of fighting vendors armed with agile, k8s and microservices, selling that combined mayhem as a replacement for an OpenMP/OpenACC-based massively parallel, on prem, bare metal HPC system with a strongly conservative development and operational tradition.
I understand loving efficiency and time savings but not everything simply works better with k8s.
Exactly. A lot of the comments here are along the lines of "we set up a k8s cluster and now managing it is a huge burden", which is not surprising. The power of k8s is to allow separation of concerns in your technology organization. You can have a dedicated team to build and maintain the underlying cluster, and then app development teams are consumers who deploy their applications on the common infrastructure. Kubernetes provides a nice abstraction layer so those two teams/orgs can interact through a well-defined API. As a dev team, we can manage our own infrastructure and pipeline through declarative configurations and let someone else manage the underlying compute and network infrastructure. As long as you don't fall into the anti-pattern of "every team builds its own k8s cluster" then you should be able to derive some nice economies of scale.
As someone who's gone the opposite way (moving from ECS to Kubernetes), I think the author is understating how good managed Kubernetes solutions are.
At my current job, I use Azure's managed Kubernetes service, which does a great job at providing a consistent environment that's very easily managed, no unexpected updates, great dataviz, and if you choose, simple integrations to their data storage solutions (run stateless K8 clusters if you can) and key vault. We don't do much outside of our kubectl YAML files, which as commented below has a de-facto understanding by a large number of people.
CVEs will always exist, which is why network security is important. I think we can agree that the only ingress into your cloud environments should be through API servers your team builds, and everything else should be locked down to be as strict as possible (e.g. VPNs and SSO). With a system like K8, so many eyes on the code mean so many more CVEs will exist, so I don't find this argument compelling.
My team, and so many other teams worldwide, are betting that the K8 community will accelerate much faster than roll-your-own solutions, and K8 gives us the best opportunity to create cloud-agnostic architecture. Additionally, helm charts are easy to install, and afaict more software vendors are providing "official" versions - which means for a team like mine, which is happy to pay for services to manage state, in the same vein a company chooses AWS RDS over managing their own Postgres server, we can get the same benefits as the author with a cloud-agnostic solution.
One thing that is regrettable about K8s winning the orchestration wars so remarkably, is that it pretty much killed all other solutions. Swarm is dead, Nomad doesn't seem like it has much community support and Mesos feels like it's on life support. Mesos still has a lot of people working on it however, but the perception feels different.
Personally I've found Mesos much easier to manage, secure, and operate than k8s. However, when it first came out all the cool kids were using it, then most of them jumped ship to k8s. AirBnB's Chronos is now pretty much a dead project, Mesosphere's Marathon is now gimped (no UI) and major features moved into DCOS. At the same time, Mesosphere (now D2iQ?), now seems like it's more focused on k8s.
k8s is everything plus the kitchen sink, and managed k8s isn't the killer feature I thought it would be. I don't blame people for not jumping on the k8s bandwagon at all.
We're using Nomad to manage a large fleet of firecracker vms at fly.io. It's not as robust as k8s, but I think that's a feature. It's well documented, extensible, and predictable. Not a big community, but hashicorp folks are responsive on GitHub.
Nomad, Consul + Swarm is still a lovely solution and I prefer it a lot to K8s. K8s is a big monolith and often way too complex for my personal use cases.
I hope Hashicorp sooner or later builds a proper replacement for Swarm so that we can have overlay networks without hassle. I know there is Weave, but never tried it.
K8s is a lot of things, but a monolith it is not. Quite the opposite - the complexity comes from the large number of relatively simple components interacting in various ways.
Nomad already does the container part, and Consul Connect is the networking overlay / service mesh. Work is being done to get Nomad better integrated with Connect.
FWIW, where I work we're going with Nomad over K8S. It gives us everything we want and nothing more (plus it's from a source we trust and love: HashiCorp).
The nice thing with Nomad is that, thanks to its straightforward design/approach, it should be easy(ish) in the future to swap it out and go with something else if we outgrow it (or HashiCorp abandons it for some reason).
I tried using Nomad once after being a little worn out by Kubernetes' complexity. For some reason, the Nomad abstractions didn't click on the first couple attempts. In comparison, Kubernetes' abstractions mapped 1:1 to my understanding of the service oriented architecture.
I'd have probably gotten used to it had I spent more time using it, but it'd have taken some rewiring of thinking process.
> I tried using Nomad once after being a little worn out by Kubernetes' complexity. For some reason, the Nomad abstractions didn't click on the first couple attempts. In comparison, Kubernetes' abstractions mapped 1:1 to my understanding of the service oriented architecture.
<3 Nomad. However, Nomad only satisfies a really tiny part of Kubernetes ecosystem which is the ability to pack containers and schedule them efficiently in a cluster; Plus, it scales really well. Kubernetes provides a bit more than that.
I think it's more that Mesos committed suicide by DCOS - which sapped the strength from the community, something that first Google, then the CNCF, worked hard at avoiding.
Another thing that might have been just me, but I really couldn't see any structure in Mesos deployments. Yes, you could run X, Y, Z on top of it to get various features, but they all had separate APIs, separate input files, etc.
Having used k8s before that, it was a huge blow, since at the time (late 2018) I used to think that Mesos et al. were more mature and advanced, and instead I encountered something like k8s circa 1.2 with less cohesion :/
I'd agree that k8s has a lot of functionality built-in, another important thing to realise is what k8s doesn't do.
In addition to the well-known integration points (Container Runtime/Network/Storage Interfaces), there's things like the lack of a good built-in user authentication mechanism with Kubernetes, which means you pretty much always need some external authentication service for your clusters.
That's not too bad if you're on one of the big managed providers (GKE/AKS/EKS) but can get complex for people who want to deploy on-prem.
> That's not too bad if you're on one of the big managed providers (GKE/AKS/EKS) but can get complex for people who want to deploy on-prem.
Go spin up Keycloak, join it to your user-directory of choice (or not and just use the internal directory), configure it as your authentication provider, done.
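For the curious, the wiring on a self-managed cluster is essentially a handful of OIDC flags on the API server pointing at Keycloak (or any OIDC provider). A sketch with placeholder URLs, claim names and version:

```yaml
# Sketch (values are placeholders): OIDC auth is enabled by flags on kube-apiserver,
# typically in its static pod manifest on self-managed clusters; managed offerings hide this.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      image: registry.k8s.io/kube-apiserver:v1.18.0   # version is illustrative
      command:
        - kube-apiserver
        # ...the usual etcd/serving/authorization flags elided...
        - --oidc-issuer-url=https://keycloak.example.com/auth/realms/kubernetes
        - --oidc-client-id=kubernetes
        - --oidc-username-claim=preferred_username
        - --oidc-groups-claim=groups
```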
Right so in addition to the complexity of running k8s (which is the general point of the post) you now have to learn about OAuth servers and LDAP integration.
In many corporates you also now have the challenges of cross-team/department work, for the k8s team to work with the AD team to get it setup.
And still that won't get you away from the problem that without a first class user / group object in k8s people often end up running into problems with JML processes over time and mismatch between AuthN and AuthZ...
LOL. You clearly haven't worked with SSO or anything a bit more complex. It's a pretty hard problem; there are even companies whose whole portfolio is around authentication only!
We’ve been mostly happy users of Mesos, Apache Aurora, and consul. It works pretty well for us (200+ engineers). We have maybe 5 people dedicated to keeping it all alive, and they’d be able to maintain the pieces of it that we use. Aurora configuration kinda sucks, but I think job configurations might just suck in general.
That being said, it has been concerning to us to see the big institutional players move off of mesos and aurora. We wouldn’t like to be active maintainers, but we could.
I think that’s the main difference between Mesos and K8s: with K8s, we wouldn’t want to maintain it, and we wouldn’t be able to (since it’s so large). Somehow mesos and aurora feel more manageable.
But, to be fair, they’re not dedicated to only keeping the lights on, they probably spend 20% (1 eng year per year) of their collective hours fighting fires, so I don’t think burnout is too much of an issue. The remainder of their time is spent maintaining libraries and web interfaces. They’re a pretty standard “platform engineering” team.
In my perspective, it doesn’t matter what technology you go with (K8s or otherwise): a responsibly-managed platform requires at least one team with a full on-call rotation (i.e. at least 4 engineers), depending on how wide your golden path is.
I'm a former Apache Aurora maintainer. Aurora has been (and continues to be) awesome for us and I'm so happy to hear other folks are still using it and it's working out for them.
Funny that you mention the configuration part. At the most recent KubeCon in San Diego, CA, the folks at Reddit gave a talk in which they said they got sick and tired of dealing with yaml. They accidentally went on to recreate Pystachio as the remedy so I think you're right on the money with your statement.
When the Project Management Committee (PMC) voted to put Aurora in the attic we were all super bummed but we just ran out of interested developers :(.
Oh it’s super cool to see you in the wild! To clarify, I had a lot of qualifying thoughts running through my head when I said “kinda sucks” (hence the “kinda”!). :)
I actually think managing aurora configs is way easier than managing yml files, and I agree that I think aurora configs were ahead of the game: having access to python in your config feels like a super power. I feel like we’ll converge on something that compiles aurora configs into yml files, prior to runtime.
That being said, we’ve never been able to get good editor support for things like “go to definition”, with the whole “include” syntax. We have maybe 2-3k aurora config files, of which maybe 100 are shared boilerplate. Do you have any advice on this? I tell vim to treat them like python files, but pylint hates them :)
We were bummed by the PMC decision too. I think some people at my company have considered becoming maintainers over the years, but, for the most part, everything “just works”, so we haven’t felt a selfish need to, so to speak. I actually think it’s a kind of unintuitive credit to your project, that it doesn’t require a horde of maintainers. That being said, I’ll set aside some time this weekend to take a look at some issues. :)
Oh no worries, no offence taken at all with the comment, configurations files tend to suck in general :).
Pystachio was indeed very forward looking and the folks who worked on this at Twitter at the time deserve all the credit there.
I think what you mention is a general problem I've encountered with IDEs when it comes to dealing with Python (esp. the "go to" issue you mention). Even when I've had to touch the Aurora client code, which is full on Python code, I've come out pulling my hair thanks to PyCharm acting wonky.
> We have maybe 2-3k aurora config files
Those are some big files! The boiler plate stuff is definitely something I've heard before from users but, unfortunately, there doesn't seem to be a better answer.
When it comes to managing job configs, I'm pretty low on the pecking order in terms of knowledge since we ended up creating our own thrift client using Go to use with Aurora. (As a consequence, all our job definitions exist as Go code.)
Stephan Erb (https://twitter.com/erbstephan/) may have better advice in this case. Some of the Twitter folks may have good info too, but they've been radio silent for months.
> I actually think it’s a kind of unintuitive credit to your project, that it doesn’t require a horde of maintainers.
That's definitely a great point and a great compliment to the project. There's a lot of love that went into this project and I'd be ecstatic to get some new contributions, even if it something simple like fixing documentation or bumping up dependencies :D.
I know quite a few startups going the Nomad route. It might not be as mainstream, but I don't think it's going to be phased out any time soon. Rancher was still maintaining their own scheduler for a while (not sure if they still do) because there were a lot of legacy customers still on it. Rancher and RancherOS have pretty much moved to being a full k8s management shop though.
This article was written to address what seems like an internal debate or discussion about why they don’t use Kubernetes at Coinbase. As such, it boils down to: we already use something else, and moving to K8s comes with risks and challenges that outweigh the benefits, particularly security concerns. As a crypto firm, Coinbase rightfully seems zealous about security.
The article doesn’t claim that K8s is a bad solution. It only claims that migrating to K8s doesn’t work for them at this point in time.
If you are starting something new, or rebuilding a system entirely, and if you are on GCP, and you anticipate the need for scale, then K8s/GKE is a sane choice. What would be insane is trying to roll your own solution. There’s no avoiding the complexity of managing infrastructure, but at least with K8s you won’t be alone, and you won’t make the mistakes others have already made and fixed. Some people never seem to get K8s, in the same way some people never get functional programming. You might be one of them, so that’s something to take into account.
A lot of people have had bad experiences with K8s, but just as many have had bad experiences with Docker, or with Linux, or with computers in general. This doesn’t mean the technology is bad. It just means that not every good technology will work for you or your use-case. Kubernetes is a solid choice, especially GKE. But think carefully before deciding to use it. It is one of those tech decisions that will dominate the rest of your choices for years. In my case, it was a decision I‘ve never regretted.
- Odin kicks off a step function and begins to deploy your application. New VMs are stood up in AWS and loaded into a new ASG, your software is fetched from various internal locations, a load balancer starts health-checking these new instances, and eventually traffic is cut over in a Blue/Green manner to the new hosts in the new ASG behind the load balancer.
- To handle secrets and configuration management we have built a dynamic configuration service that provides libraries to all internal customers with a p95 of 6m
- re-scheduling/moving of your containers if your VM dies/becomes unhealthy in your ASG
So if your company has all of that, I agree no reason to use Kubernetes. But what if your company doesn't have any of above system? Kubernetes.
I have used bare EC2 VMs and deployed with Ansible, Capistrano, Chef, Docker... all of them. I even rolled my own autoscaling with Consul and SQS (for termination notices), and I had more downtime than when using Kubernetes.
With Kubernetes, the learning curve is very high, but once you get it, it's painless to bring in a new service: with AWS ACM to terminate TLS and an ingress controller, pretty much as long as you have a Dockerfile, you can run it.
Odin is just Step Function flow that deploys Auto Scaling groups. It's not that complicated, and not so different from your setup.. main difference is they deploy docker images to ec2 rather than run ansible and configure a running machine.
What about resource allocation? With Kubernetes I can specify both requests and limits of cpu and memory and it will fill my nodes to match. A node autoscaler gets me more nodes if needed. I can define vertical pod autoscaler that can dynamically modify those requests and get me an emptier or bigger node if needed. I can define a horizontal pod autoscaler to keep aggregate cpu at a set target by auto-spawning more containers for me, and k8s handles load balancing via dns. Do typical production setups not require some/most of these features? Mine do.
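For reference, the horizontal-scaling piece mentioned here is also just declarative config; a sketch with made-up names and targets (older clusters expose this as autoscaling/v2beta2 rather than autoscaling/v2):

```yaml
# Hypothetical HPA: keep average CPU across the "app" Deployment's pods near 70%,
# scaling between 2 and 20 replicas as load changes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```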
I dropped off midway through the post. It reads as a classic engineering justification for X vs A, B or C (the same justification can be made for anything vs anything).
I found it interesting that for someone that doesn't use Kubernetes they spend a lot of time describing it.
There are benefits in using mainstream solutions like Kubernetes. I've spent over a decade building distributed systems from Hadoop to Mesos and Kubernetes and have seen the pains of datacenter and AWS, Azure and GCP and all I can say is GKE works great so I don't buy the simpler in-house argument. Once you start properly integrating concerns that need to be end-2-end integrated, the complexity explodes and you end up with a partially working system.
I do believe that serverless will likely make Kubernetes and similar-level tech irrelevant for most users, however only part of a rich ecosystem that provides all other concerns.
Having used Kubernetes extensively at my last position, adoption of k8s strikes me as similar to the adoption of Linux as your desktop OS at the turn of the century. Some people are doing amazing things with it! Others never figure out how to get their WiFi to consistently work and are bitter that people keep talking about it.
Eventually something akin to Ubuntu will grow up in the k8s ecosystem and people will stop complaining that WiFi doesn’t work.
I was very much looking forward to OpenShift/OKD to be "the Ubuntu" of K8s, but their targeted scope just keeps getting bigger (or it always has been big, not so sure anymore).
On the other end, K3s from Rancher is a minimalist distribution, and I think it'll limit its adoption in sophisticated environments with larger teams.
Does any other K8s distro look like a good candidate to standardize on in the future?
https://jenkins-x.io/ has brought together a number of open source projects in theoretically a good way, but it's still pretty rough around the edges (at least last time I worked with it in March) and still falls short of an "Ubuntu" experience.
One thing people forget is containers don’t necessarily need to be created with Docker.
Lately I’ve been creating containers using NixOS and couldn’t be more happy with the ability to have everything in the container configured by a configuration.nix file. https://nixos.org/nixos/manual/#ch-containers
The idea of a Docker Ubuntu, Arch, Alpine, etc. base image is kind of silly when you think about it. The idea that we have a separate repo for sharing massive container images, sometimes multiple GB, rather than a simple file that can be tracked in git and will produce the same image every time, has been very eye-opening.
Dockerfiles are imperative and the order of steps changes the container image that ultimately gets built.
Nix packages are declarative and idempotent. Everything is based on a functional package manager and when you configure your container via nix you get the same thing every time and can easily mix and match other packages and dependencies.
A contrived example: you have a program that depends on multiple versions of Python. Because Nix does not set dependencies globally but rather links them locally, you can have both versions of Python without any conflicts.
Containers built from Dockerfiles are difficult to exactly reproduce too. A very common practice is to update the package manager, and depending on what the server returns, the result will be different from one day to the next, whereas Nix hash-addresses everything and builds from source (you can use caches to speed this up), so you get the same thing every time.
Note: NixOS isn’t perfect in terms of being purely functional, as many package builds use scripts in perl, bash, etc., but from my understanding these are usually source-specific reproducibility issues with building particular packages, and generally speaking things work as expected.
The declarative and idempotent container specification seems like a nice developer QOL improvement over docker but...
> Containers built from Dockerfiles are difficult to exactly reproduce
This isn't true in my experience, at least for a level of exactness that could have an impact on the behavior of the software running in the container. I have been working with docker in various contexts for 5 years and have never experienced any issues related to reproducibility (which is pretty much the primary selling point of docker). To your point though, I can see how reproducibility issues could happen in theory, but since its not a problem in practice it doesn't make for a very compelling reason to try nix instead. Just to be clear, I'm not trying to disparage nix, it honestly sounds pretty interesting, especially as a way to automatically provision a local workstation (docker is not good at this), but I am still curious as to the practical tradeoffs in terms of nix vs docker as a tool for building containers.
Then have your Dockerfile build from source. Many of the public layers that people build their containers upon do that for that exact reason.
If NixOS is only "generally speaking" correct because of the order it runs its scripts, then put it in a Docker container, where the whole point is to perfectly reproduce the order you run your scripts in order to minimize the errors that arise from doing otherwise. Still worried? That's why layers exist, so that every step you run you now have a static binary that you know will never shift under you, and that you can keep on building on top of.
In practice, building Dockerfiles is unreliable and can produce errors because they inherit all of the deficiencies of aptitude, python packaging, javascript packaging, et al.
In practice, building nix dependencies is reliable and reproducible because of the militant isolation of build environments, hash integrity checks on _all_ inputs, explicit dependencies, and the long tail of bug fixes related to non-determinism in unix tooling.
That's a great vote of confidence. I would've thought that nix would only manage packages on the OS level. How does it re-implement pip to be reliable? I currently have to stage a couple of `apt-gets` and `pip installs` before my `pip install requirements`, and it took a couple of tries to lock in the right sequence that worked. How does nix solve those problems?
Ha! I did this but the containers end up being somewhat large since you have all the NixOS package stuff in the container too.
However, building the container from a configuration.nix file gives you exactly what you need in the container and nothing more. It's like basing your container on `scratch` in a Dockerfile and copying in only the needed compiled dependencies and nothing more.
Another advantage is a unified development environment, as I can run the server in the container or directly on the host machine in an isolated nix-shell.
Yet another advantage: unlike the layers of a Docker image, dependencies are isolated and linked by the hash address/path of each dependency in the Nix store, so there is no need to worry about the order of steps, only about what depends on what. That makes cache misses in the Nix store cheaper than Docker layer cache misses. For example, if the first step in a Dockerfile changes, all subsequent steps need to be rebuilt, whereas if a dependency in a Nix file changes, say from python-2 to python-3, only the packages that depend on Python get re-linked and rebuilt. And if you already use python-3 in a nix-shell or in another container, odds are it's already cached and ready to go.
I haven't actually deployed to k8s; currently I'm using these containers to isolate local development. I know that (at least at one point) Kubernetes supported images other than Docker images, but even if it didn't, Nix supports generating Docker images...
I imagine a CI/CD flow that builds the image and installs your app via a local Nix package built from source in your git repo. All that needs to be shared is the git repo, and you have everything you need to reproduce the container running in production (rough sketch below).
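A rough sketch of what that image-building step could look like with nixpkgs' dockerTools helper (the package and name choices here are placeholders, not anything from this thread):

    # image.nix -- illustrative sketch only
    { pkgs ? import <nixpkgs> {} }:
    pkgs.dockerTools.buildImage {
      name = "my-service";
      tag = "latest";
      # Only the closure of what's listed here ends up in the image.
      contents = [ pkgs.python3 ];
      config.Cmd = [ "${pkgs.python3}/bin/python3" ];
    }

`nix-build image.nix` then produces a tarball you can `docker load` and push, and the image contains only the transitive closure of the listed packages.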
What I don't like about these kinds of posts is that the author seems to claim to know much more than he possibly could.
"For example, most folks that run large-scale container orchestration platforms cannot utilize their built-in secret or configuration management."
So, with thousands (maybe more?) of "folks" out there, each with different environments, requirements, backgrounds and so on, how can someone know what most of them can or cannot do?
Sure, there are a lot of them posting about it in blogs and presenting at conferences, but that's still a very small subset of all the companies running containers in the world today, and the majority will not expose any info about it, for a myriad of reasons.
Probably by "most" he just means he's talked to 20 friends who head up devops for such large-scale platforms, and 19 out of 20 explained how it didn't work for them, and he then assumed that's generally applicable for the other couple thousand out there.
While this may not be stated in the most diplomatic way, it's important and relevant for this discussion and I don't think it should be downvoted. It's very well established that Coinbase's infrastructure is unable to handle traffic spikes, despite the fact that traffic spikes are a fact of life in their industry and downtime can easily cost their customers millions of dollars.
They've worked to fix their trading system to address these 100X spikes in trading volume - it's less about _how_ they run their stuff and more about _what_ they were running...
Wanted to mention the same. Though I kind of disagree with the conclusion.
Though they have a lot of problems with the stability of their service, it also means they know what they want to solve with their current architecture. So we'll see.
It puzzles me a bit that they don't want to invest in k8s expertise (at their scale, especially security-wise, that can be tough, I guess), but at the same time they develop their own deploy system, which sounds a lot like Spinnaker, and they have their own secrets/config management system (HashiCorp's stack is pretty neat and battle-tested).
Coinbase hasn't scaled very well IMO, probably because of people like this. Once you try out Kraken, it's a night-and-day difference in terms of features and supported currencies.
Trading during high-volume hours was next to impossible: orders were not getting filled (sometimes they didn't even reach the order book!), the UI wasn't loading, etc.
Then they shut down for a while and rebuilt everything, starting from the matching engine. I guess they also reworked their architecture, because it's been smooth sailing ever since.
> The only way to sanely run Kubernetes is by giving teams/orgs their own clusters
Funny, we're running it sanely without doing that. We've separated our clusters based on use-case - delivery vs. back-end, aiming towards the "cell-based" architecture.
> Managed Kubernetes (EKS on AWS, GKE on Google) is very much in its infancy and doesn’t solve most of the challenges with owning/operating Kubernetes (if anything it makes them more difficult at this time)
...some details on the challenges they don't solve, or indeed make more difficult, would be good.
But yep, K8s is complex. So, to paraphrase `import this`, you only want to use it when you have sufficiently complicated systems that the complexity is worth it.
It will catch up with you. I was at one shop with an 11-person team dedicated to platforms. The shop moved really fast, and even with 500 employees they were able to move from OpenStack to DC/OS in 2~3 months. (We had CoreOS running on OpenStack but fully migrated over to DC/OS. Jenkins -> Gitlab also happened very rapidly; really good engineers.)
At my current shop, we struggle to maintain k8s clusters with an 8-person team. We inherited the debt of a previous team that had deployed k8s, and their old legacy stuff was full of dependency rot. We have new clusters, and we update them regularly, but it's taken nearly half a year so far and we don't have everything moved over.
You do need good teams to move fast; and good leaders to prioritize minimizing tech debt.
We've used a GitOps model (using Flux[1]) with a reviewer team made up of people from across our dev teams (and the sysops, natch) to ensure that people aren't just kubectl-ing or helm-installing random crap. We also put about 2 weeks of effort into getting RBAC right, so that everyone has read access to cluster resources, but only a subset (generally 1 or 2 per team) have what we call "k8ops" roles - those are the same people reviewing pull requests in the Flux repo - and the norm is to use the read-only role as the default. The only time I've recently had to use my k8ops role was to manually scale an experimental app that was spamming the logs down to 0 replicas so the devs responsible could sort it in the morning.
I think the way we've approached it achieves the same goal as just giving each team their own cluster to avoid them messing up other teams.
1) Those guys can't imagine their life outside of Amazon. This is bad. And yes, Amazon's k8s was bad; it's getting better, but it's still bad.
2) They said they would need a separate team, and instead they wrote and maintain their own sort-of-lightweight solution.
There's nothing more to that article.
One opinionated team decided to roll out something homemade so they could add it to their CVs later on.
I partially blame keyword-driven recruitment in tech for these kinds of responses to a platform or tooling choice. Kubernetes isn't a magic bullet - it is a platform which solves a very specific set of problems with scaling. And of course it doesn't come for free. You can't just throw k8s into your existing infrastructure and expect your devs to manage it in addition to their regular work.
And yet, we keep reading about teams falling into the trap because their lead engineer wanted to put "production kubernetes" on his resume. I hope the k8s team adds a huge "Who is this NOT for" disclaimer to their docs (if it doesn't already exist).
Oh God. This is what is happening at my work. We have an API that has 200 write users and a public front-end that can do reads. None of it is heavy though, with most writes occurring for a month in the winter and a month in the fall. In the unlikely event of heavy write loads we could just scale up CPU/RAM for those two months. Any read load could be solved with caching or by spending time on the worst offenders in SQL. The lead dev is gung-ho that it MUST be a microservice with k8s, Kafka, and I'm sure a bunch of other shit we don't need, for what is the same application that has been written for half a century: data in/out with business logic applied. The entire API has about 8 paths, with your basic HTTP methods for each. It's probably the smallest API I've ever worked on.
The positive for me is I am learning a bunch of stuff during the process. The negative for the project is he expects developers with no previous skillset in this space to design all this new (for us) tech without introducing technical debt. The downside for the client is...well guess.
It's sad. I am rewriting a problematic legacy application and creating a whole new one. I guess I can put k8s and the kitchen sink on my resume when I look for a new job just before this thing implodes on release, though. I just wanted to write clean code man, that's all. There is a sadistic part of me that wants to grab some popcorn and see how this explodes in his face... there is also a part of me that wants to be proved wrong... I hate this job.
So are you assuming that your service will never grow? Because vertical scaling is nice and easy as long as you know the limits, but once you cross a threshold, no matter how many CPUs you throw at a problem it just won't scale. Your senior lead seems to be anticipating that and preparing in advance.
Who assumes a system will never grow? No one. I'm looking at the 10-year-old legacy system we are replacing. That tells me a lot: where it's been, why it's the way it is, and a lot about where it's likely to go in the next 2-3 years. Its function is actually being reduced in this rewrite.
I'm not against the microservice idea. I'd just rather focus on solving the problems the legacy system had. None of those had to do with scaling. They were related to a certain federal agency changing their mind every 2-4 years and really poor coding practices. A microservice with Kafka and k8s doesn't really solve those issues.
Nomad has been brought up in a lot of comments, and it has a feature that nobody has mentioned yet, I think: it's multi-platform. It currently has official task drivers for Docker, Isolated/Raw Fork/Exec, Java, and QEMU. It has several community task drivers, including Windows IIS and FreeBSD Jails.
Kubernetes mostly supports Linux, although it has recently gained Windows node support.
Indeed, if you are coming from a world with a lot of legacy systems but want to make Docker and container orchestration your golden path while still streamlining all of your deployments, Nomad seems like a great choice. At some point you might need more than Nomad can give you, but by then you are at least familiar with concepts like Raft, declarative CI/CD pipelines for containers, Vault and secret handling, canary deployments, etc.
I am extremely happy with Nomad, it is one of the best decisions we have ever made in our organisation. We had a thin abstraction layer over Consul + Docker Swarm that we could port to Nomad in a matter of a few hours and it's been rock solid so far.
But sometimes I wonder whether we are hurting our own careers not going down the k8s route. It really seems it has a bright future.
> Nomad has been brought up in a lot of comments, and it has a feature that nobody has mentioned yet, I think: it's multi-platform. It currently has official task drivers for Docker, Isolated/Raw Fork/Exec, Java, and QEMU. It has several community task drivers, including Windows IIS and FreeBSD Jails.
I think `rkt` and `lxc` are missing from your list.
My personal mantra of being a late adopter when it comes to cloud deployment tools has served me well. I am even reluctant to integrate Docker into my workflow. Git pull, build, and deploy bash scripts are serving me well enough for now. Thank you very much.
One note on the security side of things -- if you're interested in seeing what a truly hardened k8s/GKE configuration looks like, check out the Vault examples:
In summary, for your security-critical workloads you're going to want to put them in their own cluster; treat k8s in this case as an API for updating the code that's running on your VMs. (Except your VMs can run a stripped-down read-only OS like Container-OS or CoreOS).
The pinnacle of devops effort to deliver apps before Kubernetes was AWS Elastic Beanstalk, which their setup replicates in great detail.
I'd take k8s over Beanstalk any time.
Kubernetes is not about scale; it is about defining primitives everybody can use to describe their setup. It is a DSL everyone converges on, and as a result of this unification, products from different vendors can be packaged and deployed in a uniform way.
I honestly think treating every company except R&D shops or startups like this makes sense. I've been playing firefighter at F&I orgs that have stood up k8s recently, and it's insane the amount of debt they've taken on in such a small amount of time. The infra spend is orders of magnitude greater than it would be if they had physical boxes, because getting approval for physical boxes is hard. But for some reason budget on AWS, GCP or Azure is easy.
The discussion and pressure always start on the engineers' side, but they normally don't consider that building a system from scratch on a technology is different from migrating an existing one to it.
The article is clear: Coinbase has a very strong, tested and validated infrastructure, and moving to Kubernetes 'just to be part of the hype' does not bring any benefits at this point. And a statement like that is a nightmare to some DevOps folks.
It's slightly faster than 6 seconds for me, but not much. The initial response for the html page already seems incredibly slow at around 500-1000ms. And then it does seem like there's at least a whole second worth of JS being executed before the site is actually loaded.
Many of the author's issues with kubernetes are specific comparisons with their current workflow. One I can definitely agree with is the burden of upgrades and keeping the platform current. There are techniques which can make that easier to handle. One statement that surprised me:
>> at Google it isn’t uncommon for them to have multi-hour outages with GKE
For what it's worth we've been running multiple GKE clusters in production for over three years. We're medium size, with some dozen or so in-house services handling perhaps a total of 20k rps. We are rarely affected by any GCP issue, and as far as I can recall we have never been down due to a GKE-specific problem. In addition to the basic orchestration features we make significant use of ingress and storage primitives. It all quite literally just works.
Their system seems very sane. I'm jealous, and I hope it stays that way (sane, not static). It's also entirely dependent on AWS features. I used to work for a company that had a very insane hybrid datacenter/AWS deployment environment, and containers provided some sanity.
Service discovery, cluster management, secret storage, these are all problems that we _already_ had. Containers (for us on mesos) just solved some parts of that picture.
I lived and breathed containers, distributed systems, config management, deployment pipelines etc. for years, and I forget that to many K8s is just seen as one magic bullet solution. You will have to pick it apart and interact with pieces of it if you really want to use it at a medium sized company. That takes a lot of research and understanding.
Coinbase built and maintains their own platform that's working for them.
Coinbase provided an analysis worth studying. The major takeaway for me: asking people to manage their own Kubernetes cluster is like asking people to manage their own hypervisors when they just want VMs.
Question: couldn't some of the security/management concerns be addressed via AWS Fargate?
The bottom line is: K8s is awesome, but it is complex.
It is software oriented toward Google's needs.
Sadly, the average project will not have the complexity you find at Google.
I can run Oracle Database in 100MB of RAM... you need at least 4GB to run a K8s node... a costly option.
Below ten servers I find no point in K8s: I highly suggest Docker Swarm.
Quick shout-out to Aptible (Heroku for HIPAA), which has been amazing for our healthcare startup.
Still, I honestly feel that we've taken a step back as an industry in going from EC2 --> Heroku --> k8s. But I know there are many people working hard to create the next infrastructure so we don't have to deal with containers and all that reinventing-devops nonsense for every project.
I'm a big fan of this - to me containers are another hype train, like cloud and serverless. There absolutely are good use-cases (we run a number of K8s clusters with great success at work), but it's absolutely not the solution to all (app deployment) problems. As said in the article, I think complex systems like K8s can often have a detrimental effect on productivity. KISS.
The history is a bit fuzzy. The interesting features introduced in 2.6.24 were PID and network namespaces. Containers were "complete" by Linux 3.8 with user namespaces. Cgroups are not that important for building containers (isolation first). There were other out-of-tree technologies before that, notably VServer and OpenVZ.
I find it interesting that many people seem to conflate the complexity of managing infrastructure and services with K8s.
K8s is complex because managing distributed services is. Not using it doesn't mean it goes away. The complexity migrates and ends up being bundled up in a separate tool or a runbook process or some script.
It's hard to maintain because the tools and apis are different from what some engineering teams are accustomed to using. Building an in-house tool gives them a warm fuzzy feeling and comfort that they can handle problems when they appear due to familiarity with their own code and design choices.
It's a fair trade-off. I do wonder how much of the time spent on this exercise could have been spent on K8s training.
I do feel that the K8s community does downplay how much of a PITA k8s configuration can be, and that the perceived robustness of cloud-managed K8s isn't up to scratch for something this complex.
IMO the best way to run K8s is via some managed provider. Sure, you can do it the hard way a la Kelsey Hightower, but in production, and if you're not staffed with k8s experts, I'd rather give that to a provider and focus on what I know: the code and the business logic.
I'm currently going through this, but the alternative is just too painful. I've tried Nomad, Deb packages, etc., but the tooling around all of that is basically build-your-own, versus Kubernetes, which has tooling for a lot of things.
It's all about scale. Large scale requires an orchestrated application management system with progressively integrated bells and whistles and minimal fiddling with userspace. Small scale doesn't factor in. Do small scale the old way if you want: binary artifacts, AMIs, home-grown integrations and playbooks/scripts. Having done it both ways, I continue to do it both ways. Run a monolith the old way, and run your web-tier, stateless, customer-facing, highly available services in k8s. If it's not broken don't fix it (but for some use cases (large scale) the old way _was_ broken).
Thanks for the article.
I've been in tech for the past 10 years, working in or around devops teams for the most part, but I don't get all the fuss about k8s; yes, it's an amazing tool that does a lot more than any other.
But it has a big learning curve, and setup & maintenance are very costly. I don't understand why most orgs are moving to k8s considering this. When talking to my peers I often hear numbers like 6 months to 2 years for a full migration, with very little added value - at least for 90%+ of the companies using it.
It usually boils down to attracting talent and keeping them excited by trying the new shit.
There is a learning curve either way. It’s either an in house container orchestration platform or one of the open source ones. Coinbase seems to have chosen the former.
K8s is simply a good default environment which provides rock-solid stability for your applications by outsourcing the distributed-systems complexity to your infrastructure team (whether it's internal to your company or a managed one like GKE). Teams are not using it just because it's “cool” (maybe some are); there is no need to develop in-house strategies to deploy an app, keep it running, and scale it (among other things; this is the lowest common denominator).
It’s the same reason why big data tech has somewhat standardized on a set of tech (spark, airflow etc): once people learn the system, they can focus on building products that provide value rather than building the products and the relevant infrastructure.
Can fully relate to the author. We have been struggling to do k8s effectively at our company for 2 years now. There's too much to learn just to get your first service into production. You end up writing wrapper after wrapper that makes you think “we can’t be the first guys to solve this”, and after googling, you find that every issue you hit is described as “just” one of the downsides of using k8s, along with how to overcome it.
I wish someone would make a deployment orchestrator for your private DC that is as simple to use as Heroku is.
Edit: fixed typo.
> I wish someone would make a deployment orchestrator for your private DC that is as simple to use as Heroku is.
This is what I'm working on, although it is built on top of Kubernetes! We feel Kube suffers from a disconnect between the insane complexity requirements of enterprise deployments and what most coders and businesses actually need - which is exactly what you said: a consistent Heroku-ish experience on their own hardware or whatever cloud is convenient. Kube needs what Git needed - a GitHub. Feel free to reach out to me (email in profile), I'd be happy to chat your ear off about it!
I don't think there is a disconnect; the K8s team knows about it, but this is not a problem meant to be solved by K8s alone. Projects like Knative, OpenDeis, Fargate, and Cloud Functions are supposed to provide a Heroku-like PaaS on top of Kubernetes.
We haven't even gotten to the ops stage yet. A small team has spent 6 months trying to repackage our app for OpenShift and it still barely works. I think everyone regrets even looking at it.
My feeling is that I just want AWS Lambda functions to allow for larger packages. And then clusters and scaling and OS updates etc. are Amazon's problem.
I would rather just be able to do that than get into ECS/Fargate, much less K8s. It seems like all of that stuff is just adding more complexity for me.
Of course none of my projects are gigantic or need to be highly secure.
I wonder if Mesos, with its somewhat more tidied-up infrastructure and codebase and fewer moving parts, could make it as a k8s replacement, if you absolutely must run Linux container workloads (though I understand Mesos can run anything, not just Docker-/runc-like images).
This is why I ditched systemd et al. and went with bare-bones embedded Linux on QEMU, then built it up using Makefiles. I probably could have used Yast, but didn't want the huge pull-in of libraries.
BUT, my prototyping turnaround is much faster and more stable than K8s. Way less staffing too.
IMO, running self-managed k8s in the cloud makes no sense. If you've already set up on prem servers, load balancers, config management, patching, access control, etc. to allow developers to run applications on VMs, k8s can provide an integrated experience with significantly less work. If you're running k8s in the cloud, then just use a hosted service and leverage enterprise support.
In the on prem case, you already have a dedicated infra team that probably has the tools to effectively deploy and manage a cluster.
> In the on prem case, you already have a dedicated infra team that probably has the tools to effectively deploy and manage a cluster.
Tools? Yes. Care and attention? Hell no.
The ops team at my work still has important services running on CentOS 6, and they are spending all their time trying to get Kubernetes configured and working.
You would be surprised at how much infrastructure out there is just glued together adhoc. Several senior engineers at my work are successfully blocking adoption of gitops and CD.
The most popular cloud provider (AWS) has a really shitty managed k8s offering. This is perhaps the main reason so many infra teams have put so much collective time into running k8s themselves. If EKS were as good as GKE, what you said would be absolutely spot on.
That's really oversimplifying. EVERY tech has a learning cost, and therefore a return-on-investment question. K8s has been so hyped that many developers think it's a mandatory skill. It's not. I've seen talented app developers fail to successfully deploy an app on K8s. For many, Heroku or Cloud Foundry is good enough.
Even in their case, it looks like they made a conscious decision and documented it. I wish more teams would do that. You are free to use k8s, and I'm sure right now it's still not a controversial choice, nor one that will run into walls.
There are also application engines at the major cloud providers, depending on your technology. Some tech stacks can run serverless. For self-hosted scenarios, things like Dokku could also work.
To me it looks like they saved quite a lot of engineering effort, and the price was lock-in. Seems like it was probably a fair trade for them.
At least if it's sane and well thought out, it can be taken back apart and repurposed for something different. There will be a cost; consider it paying back the loan. That's why it's called technical debt. The terms of the debt look good to me.
The CTO at a client of mine has spent over 1 year trying to deploy ~7 web services to k8s, hired 5 contractors to help and they still don't have a stable deployment.
I informed them my team could do it all in under two days using Terraform + AWS + EBS. Unfortunately, they didn't take us up on the offer.
Sunk cost, etc.
I've never seen a company use k8s and not end up with a giant expensive team maintaining it all.
"For example, most folks that run large-scale container orchestration platforms cannot utilize their built-in secret or configuration management. These primitives are generally not meant, designed, or built for hundreds of engineers on tens of teams and generally do not include the necessary controls to be able to sanely manage, own, and operate their applications. It is extremely common for folks to separate their secret and config. management to a system that has stronger guarantees and controls (not to mention scaling)."
Fyi to anyone who is reaching (or close to reaching) this point: this is exactly the problem we're trying to solve with EnvKey[1] in a secure, robust, and scalable way.
Our current product runs on the cloud (using zero-trust end-to-end encryption) and solves the problem quite well imho for companies with up to 50 or so engineers and moderately complex infrastructure. It fits easily into either containerized or non-containerized stacks.
But our v2 is almost done after years of work, and I think it will now be able to handle almost any scale and workload. The launch target is August 1st. Some cool features it will offer:
- Source available self-hosting with auto-scaling, HA, and strong consistency that just works.
- "Config blocks" that can be used in multiple projects, allowing you to de-duplicate configuration and secrets.
- Version control with simple and advanced rollback capability.
- Comprehensive access logs with simple and advanced filtering and auditing capabilities.
- Ability to manage local environments.
- Ability to react to updates and e.g. restart servers when config values change.
- A CLI that will have full parity with the UI, either for automation or those who prefer a CLI-based workflow.
- An option to use our UI via an incognito web browser pointed at localhost instead of Electron (for those who hate Electron)
- Faster, lighter, modernized end-to-end encryption built on NaCl (v1 uses OpenPGP, which is great, but it's time to move on).
- Device based auth with optional passphrases (think SSH) and an easy, secure workflow for granting access to new devices.
- Ability to authenticate and invite users via Github, Gitlab, Google, Okta, or SAML (including inviting from multiple sources within a single org).
- Teams/groups and advanced group management: grant teams of users access to groups of apps, connect groups of blocks to groups of apps, etc.
- Customizable everything: environments, sub-environments, and access roles can all be molded to fit your workflow.
Our goal is to fully solve this crucial piece of the stack so that it "just works" with minimal time spent on integration (for new projects it can be installed and integrated in minutes). If you're interested (and/or want early access), submit your email to the form at the bottom of our site: https://www.envkey.com -- and of course if the v1 can solve your configuration and secrets management needs as it already does for hundreds of our customers, give that a shot! Upgrading from v1 to v2 will be very quick so you won't be duplicating any work.
Also, we're hiring--remote anywhere in the US. We'll put up a jobs page with more detail soon, but our stack is TypeScript (node + react), Go, and polyglot (since we need to write lots of integrations). Shoot me an email if this sounds like a system you'd be excited to work on: dane@envkey.com
Unfortunately I live in not-US (Aus), but I have to say, envkey looks very good! I'm actually floating this and HashiCorp Vault by management at the moment, since even with 4 devs, keeping environment variables in sync is a PAIN!
If you don't mind me asking, how does the Go integration work? I initially thought it was actually some sort of alternate `os.Getenv()` that you imported, but that doesn't seem to be the case. And what would the latency be, for changes in environment variables being synced to running deployments?
When a process starts, envkeygo makes a request to our config service (via envkey-fetch) to fetch the encrypted config, then decrypts it and sets the values on the environment so they can be retrieved with `os.Getenv` (rough sketch below). Both the lookup id (for fetching the encrypted config) and the encryption passphrase for decryption are initially passed in via an ENVKEY=... environment variable--you can think of it as a single environment variable that "expands" into all the others that you need.
Latency on a request is generally in the 150-300ms range from the US (our primary servers are in us-east-1).
EnvKey is strongly consistent and transactional, so once you make a change to your config, it will be available immediately for any subsequent requests. For now, it's still up to you to restart servers/services yourself after a change. With the v2, this will be scriptable.
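To make that concrete, here is a minimal sketch of the Go side (assuming the import path of the public envkeygo repo; check its README for the exact, current usage -- the DATABASE_URL key is just an example):

    // main.go -- illustrative sketch only
    package main

    import (
        "fmt"
        "os"

        // On init, envkeygo uses the ENVKEY variable to fetch and decrypt
        // the config, then sets the results as environment variables.
        _ "github.com/envkey/envkeygo"
    )

    func main() {
        // After init, config values are ordinary environment variables.
        dbURL := os.Getenv("DATABASE_URL") // example key, not a required name
        fmt.Println("database url set:", dbURL != "")
    }

Run it with ENVKEY=... set in the environment, and the rest of the code just reads plain environment variables with `os.Getenv`.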
We'll eventually hire outside the US too, but for now I'm trying to keep the timezone spread and administrative burden low :)
At this point I’m of the belief that adopting k8s at most companies should result in shareholder/investor lawsuits for the near-criminal waste of corporate resources.