Moving from Heroku to Google Kubernetes Engine

markbnj · on April 5, 2019

We've been running in production on GKE for a little over two years and it's been a solid platform since day one. It's nice to read articles like this and see others coming to the same conclusions we've come to. If your practices and workflow are oriented to containers and you outgrow your PAAS then k8s is the logical place to land.

With respect to the choice of helm: we started out rolling our own pipeline with sed and awk and the usual suspects. When that became too complex helm was just taking off and we moved to that. We still use it to install various charts from stable that implement infrastructure things. For our own applications we found that there was just too much cognitive dissonance between the helm charts and the resulting resources.

Essentially the charts and the values we plugged into them became a second API to kubernetes, obfuscating the actual API below. The conventions around the "package manager" role that the tool has taken for itself also contribute to lessen readability due to scads of boilerplate and name mangling. We recently started deploying things in a new pipeline based on kustomize. We keep base yaml resources in the application repo and apply patches from a config repo to finalize them for a given environment. So far it's working out quite well and the applications engineers like it much better. Now with kubectl 1.14 kustomize's features have been pulled in to that tool, something I have mixed feelings about, but at least the more declarative approach does seem to be the way the wind is blowing.

physicles · on April 5, 2019

Perhaps I don’t understand the design decisions behind Helm, but it’s always struck me as having a severe impedance mismatch with k8s itself. It defines another entirely different schema, and relies on an agent running in your cluster that’s also trying to reconcile desired state with actual state (which k8s itself is also doing). I’m skeptical that you could use it extensively without also understanding the k8s stuff underneath it.

kustomize came along just as it’s become untenable for us to copy/paste config to multiple environments. I like that it’s pretty much the simplest possible way to customize yaml, and plan to dive in soon.

techntoke · on April 5, 2019

Once you start to see both Kustomize and Helm as a templating language, then you'll realize that Kustomize doesn't cover many use cases and is intentionally limited in scope. There is a reason almost every major project in the Kubernetes ecosystem has a Helm chart, but not a Kustomize configuration file. That doesn't mean that Helm doesn't have it's issues, because their implementation of Go templating is constrained, and Go templating is challenging in itself; however, it does a lot more than Kustomize can offer and covers a significantly larger amount of use cases than Kustomize patch files.

Kustomize is great for getting a job done quickly if you don't mind some duplication of code and effort throughout your projects, but Helm is ideal for managing dependencies and templating across a large amount of projects where you may want to re-utilize other charts.

I think it is ideal to not just discover these tools by reading about them, but actually start using them and spend a few hours trying them out and testing what they can and can't do.

markbnj · on April 5, 2019

You're right about these things in my view. There's a lot you can do in a helm chart that you can't do with kustomize and patches. The helm template functions are powerful in their own right; you have conditionals, loops over ranges, you can define your own helpers, and then you have the whole sprig library at your fingertips.

You can't do any of those things with kustomize. And that's really the (admittedly very opinionated) point. I think helm is perfect for the role the project assumed for itself, as a kubernetes package manager. It works well in that role. If you follow the conventions (as you must if you contribute to stable) then your thing has full config coverage with sensible defaults, you can install it multiple times and everything is namespaced.

Like any package manager it's somewhat difficult to write good packages that have all these attributes. And I think you have to admit that a well-written helm chart that covers all the bases is a lot less readable (albeit much more powerful, stipulated) than a simple yaml description of the object you want. It really does constitute a separate API between the developer and the system.

For our own deployable services we don't need all those package manager-like attributes. What we need is for the on-disk representation of our resources to be readable, and we want to be able to make simple changes to single files and not chase value substitution around through layers of indirection. We'll probably continue using helm for stable infrastructure things, but for our own services that are under continuous development and deployment it's come to feel like a mismatch.

Rotareti · on April 5, 2019

> Like any package manager it's somewhat difficult to write good packages that have all these attributes. And I think you have to admit that a well-written helm chart that covers all the bases is a lot less readable (albeit much more powerful, stipulated) than a simple yaml description of the object you want.

I agree to that, but I think the situations in which you have to write really flexible and complex charts are pretty rare. Most of the charts that we maintain and use internally, use very little templating, because they are tailored to fit our own environments, not every environment under the sun.

techntoke · on April 5, 2019

There are situations where it makes sense to create a shared chart that can be used by multiple teams to implement their services, but to be extensible enough to work with other common services. For example, cert-manager and external-dns. Or for a local environment, you may want to use Minio for AWS-compatible storage but Service Catalog and AWS Service Broker for other environments.

merb · on April 5, 2019

actually

> running in your cluster that’s also trying to reconcile desired state with actual state

is not needed anymore

sciurus · on April 5, 2019

Seconding this. We used helm just to render templates; we don't run tiller.

As I understand it the upcoming version of helm ditches tiller completely.

tlynchpin · on April 5, 2019

Glad to hear I'm not the only one. I appreciate the helm design and I don't dispute people have a fine time using it. I just need something simple to render yaml from a template without regard to cluster state, so we use good old fashioned sed.

What's annoying about helm is many cases would be fine as plain yaml template macro without any cluster state but many public projects are packaged as helm charts so to use them off the shelf you need to go full helm. Thankfully it's not too difficult to use helm template.

chupasaurus · on April 5, 2019

Yeah, who needs those pesky 3-way commits to assure that the actual state will be the one described in chart.

ecnahc515 · on April 6, 2019

Tiller actually doesn't even really look at the current state in the cluster. It basically just hard fails if the resource already exists when it shouldn't, but it doesn't attempt to reconcile manual edits to resources outside of the helm upgrade lifecycle.

atombender · on April 5, 2019

We use Helm, but we really only use it for two things: Templating and atomic deploys/deletes.

Helm templating is pretty terrible. Whoever thought generating YAML as text was a good idea deserves a solid wedgie. But it gets us where we need to be. During our prototyping of our GKE environment, we had lots of individual YAML files, which was not tenable.

Atomic deploys/rollbacks is essential. What Helm brings to the table is a high-level way of tying multiple resources together into a group, allowing you to both identify everything that belongs together, and to then atomically apply the next version (which will delete anything that's not supposed to be there anymore). Labels would be sufficient to track that, in principle, but you still need a tool to ensure that the label schema is enforced.

We don't use any of the other features of Helm -- they're just in the way. We don't use the package repo; we keep the chart for every app in the app's Git repo, so that it's versioned along with the code. We've written a nice wrapper around Helm so people just do "tool app deploy myapp -e staging", and it knows where to look for the chart, the values by environment etc. and invoke the right commands. (It also does nice things like check the CI status, lint the Kubernetes resources for errors, show a diff of what commits this will deploy, etc.)

I've looked at Kustomize, and I don't think it's sufficient. For one, as far as I can see, it's not atomic.

I'm hoping a clear winner will emerge soon, but nothing stands out. My favourite so far is Kubecfg, which is similar to the unnecessarily complex Ksonnet project, which has apparently been abandoned. Kubecfg is a very simple wrapper that only does Jsonnet templating for you.

I'd be interested in how Google does these things with Borg. My suspicion is that they're using BCL (which Jsonnet is based on, last I checked) to describe their resources.

renaudg · on April 6, 2019

Kapitan (https://kapitan.dev) is on my radar as a possible sweet spot between Kustomize and Helm.

Until now I've used Jinja2 templates for our Kubernetes definitions with a variables file for each environment, but this is awfully manual.

I'd love Kustomize to be sufficient for us as it's poised to become a standard thanks to now being part kubectl.

Unfortunately, in some ways its YAML patching philosophy is too limited, and coming from a templating system would be a step back even for relatively simple use cases : for example, you're very likely to need a few variables defined once and reused across k8s definitions (a host or domain name, project ID, etc). You can't really do that in a DRY way with Kustomize.

AFAIK, it also currently doesn't have a good story for managing special resources like encrypted secrets : it used to be able to run arbitrary helper tools for handling custom types (I use Sealed Secrets), but this has been removed recently for security reasons, prior to the Kubectl merge.

Kapitan seems to cover these grounds, and it doesn't carry the weight of those Helm features which are useless for releasing internal software, but I'm still a bit worried about the complexity and learning curve for dev teams.

Is there anything else out there that goes a little further than Kustomize, is simpler than Kapitan and Helm and fits well into a GitOps workflow ?

markbnj · on April 7, 2019

> for example, you're very likely to need a few variables defined once and reused across k8s definitions (a host or domain name, project ID, etc). You can't really do that in a DRY way with Kustomize.

I agree this is one of the areas where you feel the pinch of kustomize's rather puritan design philosophy. We've been able to work around those things in ways that aren't exactly elegant, but don't cause physical discomfort. For shared variables we keep a patch on disk and generate specialized copies of it during deployment. It's a hack, but it retains some of the benefits of a declarative approach. We also still use substitution in a couple of places. It's hard to use kustomize to update an image tag that changes with each build for example.

atombender · on April 6, 2019

I've only looked briefly at Kapita . It looks interesting, but I think what Helm gets right, and these other tools don't, is to have a real deployment story that developers can like. Helm doesn't excel here, but it's better than kubectl.

In short, I think the winning tool has to be as easy to use as Heroku. That means: The ability to deploy an app from Git with a single command.

It doesn't need to be by pushing to git. I built a small in-house tool that allows devs to deploy apps using a single command. Given some command line flags, it:

* Checks out a cached copy of the app from Git

* Finds the Git diff between what's deployed and current HEAD and pretty-prints it

* Checks the CI server for status

* Lints the Kubernetes config by building it with "helm template" plus a "kubectl apply --dry-run"

* Builds the Helm values from a set of YAML files (values.yml, values-production.yml etc.), some of which can be encrypted with GPG (secrets.yml.gpg) and which will be decrypted to build the final values.

* Calls "helm upgrade --install --chart <dir>" with the values to do the actual deploy.

The upshot is that a command such as "deploytool app deploy --red mybranch" does everything a developer would want in one go. That's what we need.

The tool also supports deploying from your own local tree, in which case it has to bypass the CI and build and push the app's Docker image itself.

Our tool also has useful things like status and diff commands. They all rely on Helm to find the resources belonging to an app, and we did this because Helm looked like a good solution back when we first started. But we now see that we could just rely on kubectl behind the scenes, because Helm's release system just makes things more complicated. We only need the YAML templating part.

I hate YAML templating, though, so I think something Kubecfg is the better choice there.

markbnj · on April 7, 2019

> The upshot is that a command such as "deploytool app deploy --red mybranch" does everything a developer would want in one go. That's what we need.

That tool for us is a gitlab pipeline, and I guess the logic in your tool is in our case split between the pipeline and some scaffolding in a build repo. The pipelines run on commit, the image is built, tested, audited, then the yaml is patched and linted as you describe before being cached in a build artifact. The deploy step is manual and tags/pushes the image and kubectl applies the yaml resources in a single doc so we can make one call. We recently added a step to check for a minimal set of ready pods and fail the pipe after x secs if they don't come up, but haven't actually started using it yet.

atombender · on April 7, 2019

That sounds similar, except you prepare some of the steps in the pipeline. Sounds like you still need some client-side tool to support the manual deploy, though. That's my point -- no matter what you do, it's not practical to reduce the client story to a single command without a wrapper around kubectl.

Interesting idea to pre-bake the YAML manifest. Our tool allows deploying directly from a local repo, which makes small incremental/experimental tweaks to the YAML very fast and easy. Moving that to the pipeline will make that harder.

Also, you still have to do the YAML magic in the CI. We have lots of small apps that follow exactly the same system in terms of deploying. That's why a single tool with all the scaffolding built in is nice. I don't know if Gitlab pipelines can be shared among many identical apps? If not, maybe you can "docker run" a shared tool inside the CI pipeline to do common things like linting?

markbnj · on April 5, 2019

> I've looked at Kustomize, and I don't think it's sufficient. For one, as far as I can see, it's not atomic.

Kustomize just applies structured edits to yaml. We run it to apply all the patches and output a single manifest file with all the resources, then send that to the master with kubectl apply. I suspect its as atomic as anything helm does, but I could be wrong.

atombender · on April 5, 2019

The "atomicity" (a misleading term, I agree, but I couldn't think of a better one as I was writing the comment) I was referring to was its ability to do a destructive diff/patch. In other words, if you apply state (A+B+C), then (A+B), it will remove C.

With plain "kubectl apply", there's the "--prune" flag, which is supposed to be able to track upstream resources via annotations. But it's still considered experimental alpha functionality, as least according to the "kubectl --help" for Kubernetes 1.11.9.

markbnj · on April 5, 2019

Yeah I read your reply above and I do see your point. For our own services that we continuously deploy this really just doesn't come up. If we have an http or rpc service it's going to have a deployment, a service, and maybe an ingress for pretty much all of time. If we needed to remove a thing in that scenario it might be the ingress if we change architecture, but it would be a big enough deal that cleaning up manually wouldn't be an added burden.

atombender · on April 5, 2019

Deletion is definitely less common, but we do this all the time. It keeps cruft from accumulating when people forget to delete resources.

It's also nice to be able to do "helm del myapp" and know that everything is wiped. You can do this with "kubectl delete -R -f", but I believe you need the original files. You can of course do something like "kubectl delete -l app=myapp", but this requires consistent use of a common label in all your resources.

ecnahc515 · on April 6, 2019

You can also use kubectl patch locally to apply a label to a set of manifests locally before piping into kubectl apply, eg:

  kubectl patch -f input.yaml --type merge -p '{"metadata": { "labels": {"key": "value"} } }' --dry-run -o json | kubectl apply -f - --prune -l key=value

ownagefool · on April 5, 2019

I might be misunderstanding something here bit is helm really atmoic?

Sure, it'll manage sets, but will it really flip versions in an atmoic way and does this really matter when it's doing 3 rolling upgrades, without anything to manage which traffic goes where?

It's possible it does more than I think it does, but I'm also wondering if atomic is the right word here?

jacques_chester · on April 5, 2019

> I might be misunderstanding something here bit is helm really atmoic?

No, the underlying Kubernetes API doesn't have multi-document transactions. What you're getting is closer to best-effort eventual consistency.

atombender · on April 5, 2019

Sorry, "atomic" was not the best term here.

I don't know if there is a word for it — it comes up in a lot of situations — but given a set of resources, Helm will diff against Kubernetes and wipe out anything superfluous. So if I've deployed a chart that has (A, B, C) and then do a new deploy of (A, B), then C will be deleted. "Destructive diffing"? I don't know.

Kubernetes itself is not atomic right now. I believe Etcd supports multi-key transactions now, so it could be done.

ownagefool · on April 7, 2019

Right.

But even if kubernetes and etcd supported multi-document transactions and thus gave you the ability to update data in an atomic way, you'd also need to be doing green/blue deploy and then atomically switching service calls, whilst maintaing old network traffic/service calls goes to old pods and new network traffic/service calls goes to new pods.

Pretty complicated and whilst it can be solved, I dont want us thinking that services will fully function during a helm or even a kubernetes update with major api changes between services. Likely your old service will call the new service and fail. This level of failure might be acceptable or you can work around it by having retries or keeping APIs backwards compatible for several versions.

Apologies for blabbering, I would consider the current default state of deploys with rolling upgades to be akin to eventual consistency, but it's possible it's more clever than that.

scott_s · on April 5, 2019

I believe Kubernetes itself supports that kind of a change - that is, re-applying a config and creating or deleting resources based on the update.

I would say that helm is batch, in that it's applying a bunch of stuff at once, but not doing it transactionally.

q3k · on April 6, 2019

> I'd be interested in how Google does these things with Borg. My suspicion is that they're using BCL (which Jsonnet is based on, last I checked) to describe their resources.

Yes. Kubecfg is the closest equivalent for k8s. And it also works the best for me (but I might be biased).

jacques_chester · on April 5, 2019

A colleague (Dmitriy Kalinin) recently created this: https://get-kapp.io/

Which does the grouping-of-resources thing and explicitly leaves templating up to you.

shosti · on April 5, 2019

Author here; yeah, helm is the part of our stack we’re least happy with tbh (it’s turned into a huge pile of templated yaml files for each project that seems like it might not be maintainable long-term). I’m curious about looking into kustomize, but the package management/rollback capabilities of helm are quite nice; is there a good “pre-baked” solution for that that doesn’t involve helm?

markbnj · on April 5, 2019

Great article, thanks for writing it.

> I’m curious about looking into kustomize, but the package management/rollback capabilities of helm are quite nice; is there a good “pre-baked” solution for that that doesn’t involve helm?

Not that I'm aware of. I'm only really familiar with helm, kustomize and ksonnet at this point, and of the three only helm took the approach of running a server-side stateful thing that could take responsibility for pseudo-resources called releases. I haven't followed work on the next version closely, but it will be interesting to see what changes as they ditch the client-server architecture. I assume it will be more like terraform with some sort of state file.

Rollbacks for us basically mean go back and redeploy the last good ref. The pipeline is completely deterministic from the commit. The underlying build system that produced the container might not be, but we don't have to rebuild that since it's still sitting in the registry.

mirchiseth · on April 5, 2019

Excellent write up, thank you. I was also looking to read on how the dev experience changed moving to k8s. Were they using heroku cli? Did you create an equivalent experience in k8s or adopted a new one?

tjungblut · on April 5, 2019

I had some really good experience with ArgoCD so far, it is agnostic to what you're using as a configuration tool (helm, pure yamls, etc) and it just works and has a nice UI.

SomaticPirate · on April 5, 2019

Second this, helm is good for day one operations of k8s. If you just want to have a redis deployment running now I think it’s fine for that. But then managing helm charts becomes its own task. Kustomize seems to be a happy medium. Also Helm v3 seems to be doubling down on that “second api” level with addition of embedding Lua in their templates.

techntoke · on April 5, 2019

Managing Kustomize configuration files is even worse in most regards, and you are likely to have a significantly larger amount of Kustomize files than Helm files, especially if you create a common chart and use dependencies correctly within Helm.

xnxn · on April 5, 2019

One intriguing new development is the possibility of leveraging Helm's chart ecosystem and Kustomize's patching mechanism—see Replicated Ship or the Kustomize Generators and Transformers KEP.

brown9-2 · on April 5, 2019

I think it’s important to separate the use cases of a) deploying off the shelf software like Mysql and Redis from b) deploying your own custom built software.

Helm seems a weird choice for B.

zapita · on April 5, 2019

Thank you for sharing this! I’m curious to learn more about your “mixed feelings” regarding built-in kustomize support in 1.14?

markbnj · on April 5, 2019

> Thank you for sharing this! I’m curious to learn more about your “mixed feelings” regarding built-in kustomize support in 1.14?

It's nothing too surprising :). Simply a preference for simple, compose-able tools. I personally feel like patching yaml client-side is outside the scope of a "ctl" tool.

_0nac · on April 5, 2019

If you can't wait for their teaser of "In a future post, I’ll cover the migration process itself", the GCP site has a hands-on tutorial of migrating an app which may prove interesting:

https://cloud.google.com/solutions/migrating-ruby-on-rails-a...

Disclaimer: I work for GCP and wrote most of that :D

At the end of the day, Heroku and GKE are rather different beasts with different philosophies, so migrations are never going to be 1:1. I expect this to become simpler over time as tooling matures though, eg. using https://buildpacks.io to build Docker images instead of having to craft them by hand seems promising.

ZeroCool2u · on April 5, 2019

I've been playing around with GCP, mostly app engine, the past couple weeks making a toy application to get a little more comfortable with building and designing serverless apps. Just wanted to say you, and whoever else, is writing those docs does an amazing job. Seriously, I've tried all 3 platforms, because of work and GCP is by far the easiest to become productive with and the main reason is the glorious documentation. Keep up the great work!

devlance · on April 15, 2019

Great to hear about your experience with the docs and appreciate the call out. You can use the "Send Feedback" button on any of the doc pages to let us know about any feedback you have. Real people read it.

ZeroCool2u · on April 18, 2019

I'll keep that in mind!

wrs · on April 5, 2019

We are happy with GKE, but have gone from Cloud SQL PostgreSQL back to self-managed PostgreSQL VMs. Cloud SQL is still stuck on version 9.6, and still has no point-in-time recovery ability. It's disappointing because the rest of the GCP offering is pretty well thought out and making rapid progress but Cloud SQL seems not to be getting much love.

philliphaydon · on April 5, 2019

AWS is definitely better for managed PostgreSQL compared to GCE/Azure.

wapoamspomw · on April 5, 2019

Aurora Postgres is pretty fantastic.

jdreaver · on April 5, 2019

We just switched to Aurora Postgres, and I agree wholeheartedly. The main DB client is a web app with ~4000 txns/second, and we instantly got a ~30% performance boost. We also got much more consistent performance for our slower queries thanks to Aurora's fast, custom storage engine.

riku_iki · on April 5, 2019

> The main DB client is a web app with ~4000 txns/second

curious what is your bill for this

jdreaver · on April 5, 2019

For Aurora we are on a db.r4.8xlarge, which is ~$3340 per month for the instance alone. Aurora also charges for IOPS against storage, and we pay about $350 per month for those IOPS. Our DB is ~400 GB, so ~$40 per month for that as well.

I am pretty sure we could live comfortably on a db.r4.4xlarge, but we have some bursty analytics applications that sometimes bring our load up for about a minute, and we like the peace of mind of headroom during peak hours (we are in edtech with super predictable traffic throughout US school hours).

Total side note: I just went to the pricing page and it looks like they just released db.r5 instances for Aurora (https://aws.amazon.com/rds/aurora/pricing/)! It might be time to try those out. I don't see an announcement just yet though...

philliphaydon · on April 5, 2019

We have some production PostgreSQL instances running on t2 instances, they are only ~100gb but with the optimized queries we don't exceed CPU credits and the execution time rarely exceeds 200ms. Our requests per second on these are < 50 tho.

pm90 · on April 5, 2019

They mentioned that there will be some interesting updates next week for Google Next; do you think your feelings might change if they announced support for newer versions and point-in-time recovery?

On a related note, if you have tried it, what do you think of their Cloud SQL MySQL offering?

Spartan-S63 · on April 5, 2019

Not the original commenter, but my team uses CloudSQL MySQL. It's not too bad. It's pretty performant, but we've run into some weird issues surrounding replicas.

As far as I know, MySQL 5.7 is as far as they go and like PostgreSQL, they don't support point-in-time recovery. Also, perplexingly (at least as of a year ago?), deleting the instance deletes all backups associated with it, so there's an opportunity to accidentally blow away all your data. I'm sure Google can recover it, but you'll have to submit a support ticket for that.

takeda · on April 5, 2019

Yes, when deleting an instance you can't reuse the same name for a week, so I'm quite sure it is there for at least that amount of time.

wrs · on April 5, 2019

At this point I’m reluctant to commit to it without evidence that they’ll keep it up to date. They can upgrade to v11, but how do I know it won’t still be at v11 in 2021?

We don’t use MySQL.

shosti · on April 5, 2019

Yeah AWS Aurora does have some nice capabilities compared to Cloud SQL; if it weren’t for the other issues mentioned in the article it would have been a driving factor towards choosing AWS. Still Cloud SQL seems “good enough” in general (other than the somewhat ungraceful way maintenance is handled); I’d be curious to hear if there are particular issues you ran into that made you switch to unmanaged SQL?

manigandham · on April 5, 2019

Its 2 major versions behind, has limited extensions, and a tight cap on number of connections.

zimbatm · on April 5, 2019

I was surprised when I found out that Cloud SDL doesn't automatically create snapshots before settings changes like Amazon's RDS does.

One time the Postgresql instance got stuck during a migration. There was no way of creating a fresh instance with a recent snapshot. Luckily Google's support team managed to unstuck the instance but that meant 24h of downtime.

xref · on April 5, 2019

Does AWS RDS have point-in-time recovery now? Last I used it a couple years ago it did not

mwarkentin · on April 5, 2019

Pretty sure they’ve had PITR (up to 5m ago) for years. It will restore from the latest snapshot and then replay logs to restore the state.

shay_ker · on April 5, 2019

> At one point we attempted to migrate to Heroku Shield to address some of these issues, but we found that it wasn’t a good fit for our application.

This part seems very hand wavy, given that Heroku Shield would've solved many (all?) of their problems.

> We were also running into limitations with Heroku on the compute side: some of our newer automation-based features involve running a large number of short-lived batch jobs, which doesn’t work well on Heroku (due to the relatively high cost of computing resources).

How much memory did their batch jobs actually need? If they're using Rails, then I'm assuming they're just running a bunch of Sidekiq jobs that are querying PG. I'm surprised that they'd need that much in terms of compute resources. They should be able to get very, very far by making PG do a lot of the work, or by streaming data from PG and not holding a lot of data in memory.

Even if they did need all this, the following two options seem WAY easier to manage:

1) Use dokku to run your super-intense Sidekiq batch jobs on beefy EC2 instances. You can still schedule them in your Rails app in Heroku, no big deal. Many engineering teams have to do this type of split-up anyway when it comes to Application Engineers and Data Engineers, this is just a simpler way to do it.

2) Similar to 1), use a different language runtime for the batch jobs. If you really need to run CPU intensive jobs, why are you using Ruby? If the jobs aren't so intense to mandate maintaining two languages (fwiw, not that hard), why will moving to k8s solve the issue?

Personally, I'm not sold on their decision to move to Kubernetes, and I use Kubernetes for my job.

shosti · on April 5, 2019

> This part seems very hand wavy, given that Heroku Shield would've solved many (all?) of their problems.

Author here; I don’t want to go into too much detail, but we tried Shield early on and had a negative experience that made us wary about using the platform (it seems to use a different tech stack under the hood from “normal” Heroku and lacks a lot of the things that make Heroku great). Also it’s very expensive compared to VPC-based solutions on AWS and GCP.

W.R.T. the batch jobs, I think I didn’t explain super well—we are using a different language and runtime from our “normal” background processing jobs (which use worker queues in Rails), it’s just that Heroku isn’t very well suited for the use case (which is basically FaaS-like but with long-lived jobs).

The “split” workflow you described is basically what we were doing (but with AWS Batch instead of Dokku); it’s just that it’s more cost-efficient to consolidate everything into one cluster (especially with preemptible gke nodes) and also better to have a common set of tooling for the Ops team.

To be fair, we haven’t yet completed the move from Batch to k8s so it’s possible that part of the plan won’t pan out as expected.

holografix · on April 5, 2019

Disclaimer: I work for Salesforce, Heroku’s parent company.

Heroku Shield is a service added on top of Heroku Private spaces.

You usually don’t need Shield unless you want to be compliant with things like HIPAA, etc

Which of course could be your case here.

ukd1 · on April 5, 2019

It is and we needed HIPAA. For me, it's priced aggressively (~600%, compared to zero for GCP) and wasn't ready when we looked - i.e. caused a few SEVs.

shay_ker · on April 5, 2019

> ~600%, compared to zero for GCP

I've always been curious. What do you need to do to be HIPAA compliant, from a technology standpoint? I figured it's similar to PCI compliance, but I'm not sure.

From what I've heard, though, the cost isn't quite zero, it's just that you have to own & implement all the work to be HIPAA compliant. But perhaps it's not that bad?

holografix · on April 6, 2019

I’m not in product or legal so take this with a grain of salt:

I know that for a customer I spoke to, keystroke logging on running dynos was something they were really interested in, from a compliance point of view.

I think being able to spin up Postgres DBs with rollbacks, fork and follow, HA etc etc (don’t want to sound like a sales rep) in this highly compliant environment also involves some serious infra wrangling.

oskari · on April 8, 2019

FWIW, Aiven PostgreSQL (http://aiven.io/postgresql) runs latest PG versions and is available in HIPAA compliant configurations on AWS and GCP. We don't charge extra for it, but have a minimum monthly commitment to justify the small setup overhead.

shay_ker · on April 5, 2019

Makes sense. It's hard to tell without understanding what the batch jobs are actually doing... it sounds like you're running something similar to EMR jobs?

We were about to use Heroku Shield at a previous gig. It's definitely expensive, but at our requirements it was still less than an engineer. I wouldn't run a ton of "big data" processing on Heroku nodes though. I'm sure/hope it exists, but I haven't seen a Heroku-ized version of data processing.

NightlyDev · on April 5, 2019

I've always wondered: Is kubernetes hard to host on your own on a couple of servers for production?

I've never tried, but I've heard a lot of people saying it's very hard, but people are often complaining about the most basic stuff as being hard.. Sooo..?

physicles · on April 5, 2019

More important than whether it’s easy or hard, is the fact that it’s unnecessary. GKE is great and you only pay for your compute (as opposed to EKS, which is over $140/mo for the control plane, a move which always struck me as a terrible business decision).

I maintain two prod and two test environments: AWS+kops (since before EKS existed), Alibaba+kubeadm (because China), and two local ones that used to be minikube but are now just kubeadm. I spent a total of about two months getting the knowledge to do that, and these days I spend about 4 hours a week on maintenance. I do minor version upgrades but put off major ones because there’s not much business value and upgrades can break stuff.

It’s my least favorite part of my job. I’d rather be coding or doing architecture, which is what I spend most of my time doing. We’ll be moving the AWS stuff to GKE in a couple months. Still haven’t heard much on the quality of Alibaba’s offering. Their IaaS is solid but some of their other services will sometimes throw errors that return 0 hits on google, no documentation, which scares me.

bauerd · on April 5, 2019

Depends on what you mean by hard. Bootstrapping a K8s control plane? Not so much, kubeadm does all the heavylifting for you. Keeping your install recent, etcd performant and backed up, maintaining and scaling underlying filesystems etc. is still a full-time job for production usage imho. Essentially you'll find yourself operating a number of distributed systems below your application stack.

flurdy · on April 5, 2019

You would have to follow this the whole way: https://github.com/kelseyhightower/kubernetes-the-hard-way

barbecue_sauce · on April 5, 2019

There are plenty of tools that do a lot of "the hard way" heavy lifting for personal deployments of Kubernetes, like kubeadm, though doing "the hard way" at least once is good so you know what actually is going on.

autotune · on April 5, 2019

Basically moved to my dream tech stack, jealous of the SRE’s over there.

maciejgryka · on April 5, 2019

We are hiring :) https://rainforestqa.com/careers

autotune · on April 5, 2019

Ha, would love to as a remote candidate out of NYC, don't see anything for SRE though.

neya · on April 5, 2019

I'm interested to know why they didn't move to Google AppEngine instead, which offers a better experience and more advanced features overall. Especially considering that AppEngine is a direct competitor to Heroku than Kubernetes engine.

tlrobinson · on April 5, 2019

The post has a whole section called "Why Kubernetes?"

neya · on April 5, 2019

It doesn't explain "Why not AppEngine"? Which was my question..

tlrobinson · on April 5, 2019

Presumably because of all the reasons they chose Kubernetes?

FTA:

* Kubernetes has a huge amount of traction in the DevOps landscape, with managed implementations from all the major cloud vendors and virtually endless training materials and complementary technologies.

* Kubernetes is open source, which was a major plus: it meant that we could avoid vendor lock-in and implement local development environments that mimic production.

* Kubernetes has a large feature set that fit well with our requirements, including our more exotic necessities like autoscaling based on custom metrics.

AFAIK none of those are true of AppEngine.

riku_iki · on April 5, 2019

> Kubernetes has a huge amount of traction in the DevOps landscape

the appeal of AppEngine is that you don't need DevOps, google manages and monitors services for you.

tlrobinson · on April 5, 2019

That's also true of "managed" Kubernetes platforms like Google Kubernetes Engine and Amazon EKS, the difference being no (or at least less) vendor lock-in.

riku_iki · on April 5, 2019

But you still need to write your app deployment scripts, integrate logs and monitoring.

ukd1 · on April 5, 2019

^ this is why it won

spyspy · on April 5, 2019

GAE comes with a full local dev setup out of the box. You can even run multiple apps with one command.

auslander · on April 6, 2019

> Kubernetes has a huge amount of traction in the DevOps landscape

Definition of 'hype' here :)

alasdair_ · on April 5, 2019

>I'm interested to know why they didn't move to Google AppEngine instead, which offers a better experience and more advanced features overall. Especially considering that AppEngine is a direct competitor to Heroku than Kubernetes engine.

AppEngine has an enormous number of limitations that you only hit once you scale up and gets very expensive very quickly.

latchkey · on April 5, 2019

It really bugs me when people make extremely blanket statements like this.

This is a huge success story that I was part of: https://cloud.google.com/customers/gearlaunch/

At our peak, we were processing $500,000 of online orders per day and spending about $5k/month on our entire GPC bill. It was a rounding error compared with our income.

The biggest expenditure was actually on the Postgres database servers that we needed for our analytics, not on AppEngine.

neya · on April 5, 2019

> AppEngine has an enormous number of limitations

I have used AppEngine in production for about 50+ clients and I am genuinely really curious to know what these limitations are. Maybe it is dependent on the programming language/framework? I run Phoenix/Elixir with Vue.JS + PostgreSQL as standard for most of my clients, it's really a breeze to work with.

In addition, these are the advantages of working with AppEngine:

https://news.ycombinator.com/item?id=17516530

clhodapp · on April 5, 2019

The AppEngine Standard Java 8 API is severely limited because it is coupled to the Servlet API, which severely screwed up on the design of its async API. As a Scala developer, this limitation pretty much sinks the platform for me, since the flexible environment is really expensive and has a worse value proposition vs managed Kubernetes.

haimez · on April 5, 2019

App engine flex is a “bring your own container” set up.

clhodapp · on April 5, 2019

Right, but at that point, you almost might as well move up to "full" Kubernetes if you know it. I'm not saying that there's no use case for App Engine flex, just that it is kind an uncomfortable middle-ground between standard and k8s.

devlance · on April 15, 2019

Worth pointing out that Cloud Run was just released and maybe better meets your needs? https://cloud.google.com/run/

rlancer · on April 5, 2019

Is standard more expensive in reality? Any data on this, flex is certainly very expensive

haimez · on April 5, 2019

Flex is not at all very expensive. Per instance cost is about 1/2 the cost compared to heroku, putting it on par with the likes of ECS in terms of cost but ships with usable monitoring, logging, and metrics out of the box with extremely generous free tiers that stay free way longer and that are still marginally cheaper than what you’d pay in AWS land if you exceed them.

rdsubhas · on April 5, 2019

I work at a large European startup. We are also heavily invested in and love kubernetes, but this is a totally apples to oranges comparison. Heroku is a PaaS and Kubernetes is a CaaS. K8s is great at what does, but to make it act like Heroku is a huge amount of effort and needs a team to manage the tooling around it. Assuming that such team is not needed, and k8s can simply be used like heroku with a bunch of extra CLI tools, usually leads to "Wild Wild West" clusters.

mcguireio · on April 5, 2019

Now if only they could quit spamming every inbox of every company I've worked at, id be impressed. Their sales automation is out of control. Honestly...I write them off before ever looking at their services.

zenlot · on April 5, 2019

For some reason those blogs become boring. Whoever moves to Kubernetes, or chooses to use some service of major cloud provider feels a need to write a blog about it having a same theme as everyone else.

Bonus points if it explains what containers are and differences between EKS, ECS and others.

techslave · on April 5, 2019

well you are missing the point.

the tech details are irrelevant in posts like this. no one, but no one, is throwing down knowledge and insight that anyone and their mama doesn’t know or can’t easily acquire. people just don’t give away their secret sauce that easily.

the 2 points are

a. a recruiting signal. it lets candidates know “we are like you”. we care about the tech stack and what we do. we care about elegant and “good” solutions.

b. it gives company devs a public sounding board. a chance to have a bigger voice than in house obscurity.

if you think these blogs are really about the tech, well then yes they are quite boring.

nitinreddy88 · on April 5, 2019

I guess it's becoming a trend "we moved from here to here" and let's make it to top of HN.

Unless current provider has horrible support/issues which you want to specify to help others, these articles are useless.

HN community making these articles to front page which neither describes architect challenges/scalability challenges is pointless

staticassertion · on April 5, 2019

I found the section on their evaluation of other potential systems like ECS to be interesting - it's something I'm currently considering myself.

ukd1 · on April 5, 2019

glad it was useful!

auslander · on April 6, 2019

> ..agile without hiring a large Ops team, .. we were beginning to outgrow Heroku. We ended up .. running on Google Kubernetes Engine

Or you could hire few cloud infra devs and stick to autoscaled VMs (EC2), not spending time on k8s? I bet k8s took you a while to set up and maintain.

kerng · on April 5, 2019

Crossing my fingers for you that GCP won't shut down your account because some mysterious Google AI decides so.

ec109685 · on April 5, 2019

One of the reasons for choosing GKE was reducing vendor lock in.

jasonvorhe · on April 5, 2019

That should no longer happen since they have added manual oversight to account closures.