Google Kubernetes Engine is introducing a cluster management fee on June 6 (cloud.google.com)
665 points by agoell on March 4, 2020 | 610 comments



Just received an email from Google Cloud -

"On June 6, 2020, your Google Kubernetes Engine (GKE) clusters will start accruing a management fee, with an exemption for all Anthos GKE clusters and one free zonal cluster.

Starting June 6, 2020, your GKE clusters will accrue a management fee of $0.10 per cluster per hour, irrespective of cluster size and topology.

We’re making some changes to the way we offer Google Kubernetes Engine (GKE). Starting June 6, 2020, your GKE clusters will accrue a management fee of $0.10 per cluster per hour, irrespective of cluster size and topology. We’re also introducing a Service Level Agreement (SLA) that’s financially backed with a guaranteed availability of 99.95% for regional clusters and 99.5% for zonal clusters running a version of GKE available through the Stable release channel. Below, you’ll find additional details about the new SLA and information to help you reduce your costs."


This is awful - I don’t think GCP is fully aware of their position in the market as the second, inferior choice. I took a bet on the underdog by using GCP and they bit me in return. Especially considering their ‘default’ Kubernetes config automatically sets you up with three(!) replicated control planes, that’s, as far as I understand, ~$300 added to our monthly bill, for nothing.

Oh, and per their docs, the three-control-plane decision is not reversible - I cannot, in fact, shut two of those down without shutting down my production cluster and starting a new one. https://cloud.google.com/kubernetes-engine/docs/concepts/typ...

Awful. Just so awful.

Edit: To answer some questions below - we have a single-tenant model where we run an instance of our async discussion tool (https://aether.app) per customer for better isolation, that’s why we had bought into Kubernetes / GCP. Since we have our own hypervisor inside the clusters, it makes me wonder whether we can just deploy multiple hypervisors into the same cluster, or remove the Kubernetes dependency and run this on a Docker runtime in a more classical environment.


Thank you for the feedback. The management fee is per cluster. You are not billed for replicated control planes. You can use the pricing calculator at https://cloud.google.com/products/calculator#tab=container to model pricing, but it works out to about $73/mo ($0.10/hour x ~730 hours in a month) regardless of node count or cluster size (again, because it's charged per cluster).

There's also one completely free zonal cluster for hobbyist projects.


Seth — I appreciate you being here to take feedback, and for the clarification as well. The surprising email I received this morning is very hazy on the details, and the docs linked from the email are not updated yet.

The main issue is that a free control plane and a paid control plane lead to two very different Kubernetes architectures, and as per your docs, those decisions made at the start are very much set in stone. You cannot change your cluster from a regional cluster to a single-zone cluster, for example. So you have customers who built their stacks around your free control plane, and you’re turning the screws by adding a cost for it — but they cannot change the type of their cluster to optimise their spend, since, per your docs, those decisions are set in stone. That’s entrapment.

You should keep existing clusters in the pricing model they’ve been built in, and apply this change for clusters created after today.

That said, many of us made a bet on GCP. For us in particular, we made that bet to the point that our SQL servers are on AWS, but we still switched to GCP for ‘better’ Kubernetes and to avoid the nickel-and-diming, since AWS had a charge that looked like it was designed to convey that they’d much rather have you use their own stuff than Kubernetes. It is a relatively trivial amount, but it makes a world of difference in how it feels, and you guys know better than anyone how many of these GCP vs AWS decisions are made not on data sheets but on the ‘general feel’, for lack of a better word.

AWS’ message is that they’re the staid, sort of old fashioned, but reliable business partner. GCP’s message, as of this morning, is stop using GCP.


Thank you <3. I apologize that the email was hazy on details. I can't un-send it, but I'll work with the product teams to make sure they are crystal clear in the future. I'm interested to learn more about what you mean by outdated docs; the documentation I'm seeing appears to have been updated. Can you drop me a screenshot, maybe on Twitter (same username, DMs are open)?

These changes won't take effect until June - customers won't start getting billed immediately. I'm sorry that you feel trapped, that's not our intention.

> You should keep existing clusters in the pricing model they’ve been built in, and apply this change for clusters created after today.

This is great feedback, but clusters should be treated like cattle, not pets. I'd love to learn more about why your clusters must be so static.


> This is great feedback, but clusters should be treated like cattle, not pets. I'd love to learn more about why your clusters must be so static.

What’s inside our clusters are indeed cattle, but the clusters themselves do carry a lot of config that is set via the GCP UI for trivial things like firewall rules. Of course we could script and automate it, but your CLI tool also changes fast enough that tracking it becomes an ongoing maintenance burden shifted from DevOps to engineers. In other words, it will likely incur downtime due to unforeseen small issues.

It’s also in you guys’ interest that we don’t do this and that clusters stay as static as possible right now, since if we are risking downtime and moving clusters anyway, we’re definitely moving that cluster back to AWS.


Hmm - have you considered a tool like Terraform or Deployment manager for creating the clusters? In general, it's best practice to capture that configuration as code.
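
For illustration, a minimal Deployment Manager sketch of a cluster captured as code might look roughly like this (cluster name, zone, and machine type are placeholders, and exact property names depend on the type provider and API version):

    resources:
    - name: example-gke-cluster        # placeholder name
      type: container.v1.cluster       # Deployment Manager's GKE cluster type
      properties:
        zone: us-central1-a            # zonal cluster; regional clusters use a different resource shape
        cluster:
          initialNodeCount: 3
          nodeConfig:
            machineType: n1-standard-1
            oauthScopes:
            - https://www.googleapis.com/auth/devstorage.read_only
            - https://www.googleapis.com/auth/logging.write
            - https://www.googleapis.com/auth/monitoring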


Managing clusters in Terraform is not enough to "treat clusters like cattle". Changing a cluster from a zonal cluster to a regional cluster in the Terraform configuration will, upon a terraform apply, first destroy the cluster and then re-create it. All workloads will be lost.

I'm sure there are tools out there to help with cluster migrations, but it is far from trivial.


I think it’s a bit optimistic to assume that your customers will just change their deployment model because you introduced a fee.

You provide a web interface, so it’s reasonable to assume people will use it.


Cloud providers assume everyone is just like them, or like Netflix: Load balancers in clusters balancing groups of clusters. Clusters of clusters. Many regions. Many availability zones. Anything can be lost, because an individual data centre is just 5% of the total, right?

Meanwhile most of my large government customers have a couple of tiny VMs for every website. That's it. That's already massive overkill because they see 10% max load, so they're wasting money on the rest of the resources that are there only for redundancy. Taking things to the next level would be almost absurd, but turning things off unnecessarily is still an outage.

This is why I don't believe any of the Cloud providers are ready for enterprise customers. None of you get it.


I think you're wrong -- containers aren't ready for legacy enterprise, VMs are a better choice of abstraction for an initial move to cloud.

Get your data centers all running VMWare, then VMDK import to AWS AMIs, then wrap them all in autoscaling groups, figure out where the SPOFs have moved to, and only then start moving to containers.

In the mean time, all new development happens on serverless.

Don't let anything new connect to a legacy database directly, only via API at worst, or preferably via events.


Have you considered Google App Engine on GCP in standard mode? That seems like a good fit based on your explanation, but I could be wrong.


I had a similar conversation with a government customer, saying that they should pool their web applications into a single shared Azure Web App Service Plan, because then instead of a bunch of small "basic" plans they could get a "premium" plan and save money.

They rejected it because it's "too complex to do internal chargebacks" in a shared cluster model.

This is what I mean: The cloud is for orgs with one main application, like Netflix. It's not ready for enterprises where the biggest concern is internal bureaucracy.


Why would one want lots of little GKE clusters, anyway? Google itself doesn't split up its clusters this way, AFAIK. I don't want a cluster of underutilized instances per application tier per project; I want a private Borg to run my instances on—a way to achieve the economies of scale of pod packing, with merely OS-policy-level isolation between my workloads, because they're all my workloads anyway.

(Or, really, I'd rather just run my workloads directly on some scale-free multitenant k8s cluster that resembled Borg itself—giving me something resembling a PaaS, but with my own custom resource controllers running in it. Y'know, the k8s equivalent to BigTable.)


We run lots of small clusters in our projects and identical infrastructure/projects for each of our environments.

Multiple clusters lets us easily firewall off communication to compute instances running in our account based on the allocated IP ranges for our various clusters (all our traffic is default-deny and has to be whitelisted). Multiple clusters lets us have a separate cluster for untrusted workloads that have no secrets/privileges/service accounts with access to gcloud.

Starting in June our monthly bill is going to go up by thousands. All regional clusters.


Namespaces handle most of these issues. A NetworkPolicy can prevent pods within a namespace from initiating or receiving connections from other namespaces, forcing all traffic through an egress gateway (which can have a well-known IP address, but you probably want mTLS which the ingress gateway on the other side can validate; Istio automates this and I believe comes set up for free in GKE.) Namespaces also isolate pods from the control plane; just run the pod with a service account that is missing the permissions that worry you, or prevent communication with the API server.
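
To make that concrete, here is a minimal sketch of such a NetworkPolicy (namespace and policy names are made up) that only allows ingress from pods in the same namespace; an analogous Egress section can force all outbound traffic through the egress gateway:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-same-namespace-only   # hypothetical name
      namespace: team-a                 # hypothetical namespace
    spec:
      podSelector: {}                   # applies to every pod in team-a
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector: {}               # only pods in team-a may connect; everything else is dropped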

GKE has the ability to run pods with gVisor, which prevents the pod from communicating with the host kernel, even maliciously. (I think they call these sandboxes.)
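
Opting in is roughly a one-line change on the pod, something like this sketch (it assumes a sandbox-enabled node pool exists; pod and image names are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: untrusted-workload          # hypothetical
    spec:
      runtimeClassName: gvisor          # RuntimeClass registered by GKE Sandbox
      containers:
      - name: app
        image: nginx:1.17               # placeholder image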

The only reason to use multiple clusters is if you want CPU isolation without the drawbacks of cgroups limits (i.e., awful 99%-ile latency when an app is being throttled), or you suspect bugs in the Linux kernel, gVisor, or the CNI. (Remember that you're in the cloud, and someone can easily have a hypervisor 0-day, and then you have no isolation from untrusted workloads.)

Cluster-scoped (non-namespaced) resources are also a problem, though not too prevalent.

Overall, the biggest problem I see with using multiple clusters is that you end up wasting a lot of resources because you can't pack pods as efficiently.


Aware of all of this, but we have a need to run things relatively identically in GKE/EKS/AKS and gVisor can't be run in EKS, for example.

We're okay with the waste as long as our software & deployment practices can treat any hosted Kubernetes service as essentially the same.



For those that didn't click through, I believe the parent is demonstrating that it is a best practice to have many clusters for a variety of reasons such as: "Create one cluster per project to reduce the risk of project-level configurations"


For robust configuration, yes. However, one can certainly collapse/shrink if having multiple clusters is going to be a burden cost-wise and operations-wise. These best practices were modeled on the most robust architecture.


This is it exactly.

Thank you.


Namespaces are not always well suited to hermetically isolating workloads.


It's probably not worth $75/month to prevent developer A's pod from interfering with developer B's pod due to an exploit in gVisor, the linux kernel, the hypervisor, or the CPU microcode. Those exploits do exist (remember Spectre and Meltdown), but probably aren't relevant to 99% of workloads.

Ultimately, all isolation has its limits. Traditional VMs suffer from hypervisor exploits. Dedicated machines suffer from network-level exploits (network card firmware bugs, ARP floods, malicious BGP "misconfigurations"), etc. You can spend an infinite amount of money while still not bringing the risk to zero, so you have to deploy your resources wisely.

Engineering is about balancing cost and benefit. It's not worth paying a team of CPU engineers to develop a new CPU for you because you're worried about Apache interfering with MySQL; the benefit is near-zero and the cost is astronomical. Similarly, it doesn't make sense to run the two applications in two separate Kubernetes clusters. It's going to cost you thousands of dollars a month in wasted CPUs sitting around, control plane costs, and management, while only protecting you against the very rare case of someone compromising Apache because they found a bug in MySQL that lets them escape the sandbox.

Meanwhile, people are sitting around writing IP whitelists for separate virtual machines because they haven't bothered to read the documentation for Istio or Linkerd which they get for free and actually adds security, observability, and protection against misconfiguration.

Everyone on Hacker News is that 1% with an uncommon workload and an unlimited budget, but 99% of people are going to have a more enjoyable experience by just sharing a pool of machines and enforcing policy at the Kubernetes level.


It doesn't have to be malicious. File descriptors aren't part of the isolation offered by cgroups; a misconfigured pod can exhaust FDs on the entire underlying node and severely impact all other pods running on that node. Network isn't isolated either: you can saturate the network on a node by downloading large amounts of data from, say, GCS/S3 and impact all pods on the node.

I agree with most things you’ve said around gVisor providing sufficient security, but it's not just about security, noisy neighbors are a big issue in large clusters.


IOPS and disk bandwidth aren't currently well protected either.


RLIMIT_NOFILE seems to limit FDs, or am I missing something?


CRDs can’t be safely namespaced atm, aiui.


We use Skaffold and it’s great. I’m talking about very minor unforeseen stuff that causes outages, not that we do it manually.


This is an interesting exchange if only for the thread developing instead of a single reply from a rep; it’s nice to see that level of engagement.

More importantly, this dialogue speaks volumes to Google’s stubbornness. Seth’s/Google’s position is: do it the Google way, sorry-not-sorry to all those that don’t fit into our model.

Like we haven’t heard of infrastructure as code? That can’t paper over basics like being unable to change your K8s cluster control plane. This is precisely the attitude that lands GCP as a distant #3 behind AWS and Azure.


Google stubbornly resists the idea that their platforms have actual users who depend on things not being broken for them constantly. It's cultural.

AWS has the complete opposite model.


Even still, it’s not like it’s trivial to just bring up and drop clusters. Just setting up peering with Cloud SQL or HTTPS certs with GKE ingress can be fraught with timing issues that can torpedo the whole provisioning process.


How is this helpful? Are they supposed to implement everything in Terraform or similar; is that your suggestion? Why don't you completely remove the editable UI then, if whoever is using it is doing it wrong? What a typical arrogant, out-of-touch-with-customers Google response.


It could be totally unrelated, but having an "equivalent Terraform" option alongside the REST and CLI options could dramatically speed up the configuration process.


Right now, k8s definitely is 'ongoing maintenance'. We allocate around 0.5-2 points per week just for that. If we didn't, most of our stuff would already be outdated.

I already know too many people who are stuck at a certain k8s version. Don't allow that to happen!


> clusters should be treated like cattle, not pets.

Off-topic, but is this really how people do k8s these days? Years ago when I was at Google, each physical datacenter had at most several "clusters", which would have fifty thousand cores and run every job from every team. A single k8s cluster is already a task management system (with a lot of complexity), so what do people gain by having many clusters, other than more complexity?


The most common thing I've heard is "blast radius reduction", i.e. the general public is not yet smart enough to run large shared infrastructures. That seems like something that should be obviously true.

People had exactly the same experiences with Mesos and OpenStack, but k8s has decent tooling for turning up many clusters, so there is an easy workaround.


I still feel like that would only work in very niche cases.

I mean, if people aren't smart enough to run a large shared infrastructure, how can I trust them to run a large number of clusters, even if each cluster is small? The final scale is still the same.


Updating 100 clusters carries less risk than updating a single giant one.


And no SRE would allow you to run your application in a single cluster. Borg Cells were federated but not codependent - Google's biggest outages were due to the few components that did not sufficiently isolate clusters from one another.

Clusters are probably still pets to most orgs, but the lessons about how to manage complexity still apply. Each of my terraform state files is a pet and I treat it like such... but I also use change-control to assure that even though I don't regularly recreate it from scratch, I understand all that was there.


There are potentially quite a few benefits of being able to spin up clusters on demand [1]:

* Fully reproducible cluster builds and deployments.

* The type of cluster becomes an implementation detail, making it easy to move between e.g. Minikube, Kops, EKS, etc. After all, K8s is just a runtime.

* Developers can create temporary dev environments or replicas of other clusters

* Promote code through multiple environments from local Minikube clusters to cloud environments

* Version your applications and dependent infrastructure code together

* Simplify upgrades by launching a brand new cluster, migrating traffic and tearing the old one down (blue/green)

* Test in-place upgrades by launching a replica of an existing cluster to test the upgrade before repeating it in production

* Increase agility by making it easier to rearchitect your systems - if you have a pet, modifying the overall architecture can be painful

* Frequently test your disaster recovery processes as a by-product for no extra effort (sans data)

* Reduced blast radius

[1] https://docs.sugarkube.io/#benefits-of-sugarkube


I think for one, you cannot easily have Masters span regions without risk of them falling out of communication. Similarly the workers should be located nearby. If there's a counterexample to this I'd love to see it.


> clusters should be treated like cattle, not pets

Heh... how many teams actually treat their clusters like cattle, though? Every time I advocate automation around cluster management, people start complaining that "you don't have to do that anymore, we have Kubernetes!"

Some people get it, yes, but even of that group, few have the political will/strength to make sure that automation is set up on the cluster level—especially to a point where you could migrate running production workloads between clusters without a potentially large outage / maintenance window.


> clusters should be treated like cattle, not pets

Sugarkube is designed to do exactly that.

[1] https://docs.sugarkube.io


For any real production system you have to use Terraform and its ilk to manage clusters, as you need to be spinning up and down dev/qa/prod clusters.

I don't know GCP, though. In the past I've seen Kubernetes cluster architectures that are very, very fragile as they spin up. If that's the case with GCP, I can see why you wouldn't do the above and would rather hand-hold their creation.


I would love it if this happened in the real world, but for every well-architected automated cluster management setup I’ve seen using Terraform, Ansible, or even shell scripts and bubble gum, there are five that were hand-configured in the console, poorly (or not at all) documented, and might not be re-creatable without a substantial multi-day effort.


GKE makes it incredibly easy to spin up + tear down GKE clusters. UI/CLI/Terraform etc, all just work for 99% of the cases.


> but clusters should be treated like cattle, not pets

Ha. They should, but they are absolutely not. Customers typically ask "why should we spend time on automating cluster deployment when we are going to do it just once?", and when I explain that it's for when (or if) the cluster goes away, they say it's an acceptable risk.

The truth of the matter is, even some huge international companies don't have the resources to keep up with the tool development needed for fully phoenix servers. They just want to write automation and have it work for the next 10 years, and that's definitely not how it plays out.


> This is great feedback, but clusters should be treated like cattle, not pets. I'd love to learn more about why your clusters must be so static.

Clusters often are not "cattle". If your operation is big enough, then yes, they might be. Usually they aren't; they are named systems and represent a mostly static entity, even if the components of said entity change every hour.

Personally, I'm running in production a cluster that by now has witnessed upgrades from 1.3 to 1.15, in GKE, with some deployments running nearly unchanged since then.

Treating it as cattle makes no sense, especially since at the API level, clusters aren't volatile elements.


I think our architects’ heads would explode if they were told we should treat them like cattle.

For us, Clusters are a promise to our developers. We can’t just spin up a new cluster because we feel like it. I must be missing something or maybe our culture is just different.


As an architect, I am currently working on our org's first cloud deployment initiative. Due to federal compliance / regulations, we have no write access in higher / production boundaries, and everything is automated via deployment jobs, IaC, etc. Given the experience of the teams involved, I took the opportunity (burden) of writing nearly all the automation. If your architects can't handle shooting sick cattle in prod, I'd say get new architects.


> If your architects can't handle shooting sick cattle in prod, I'd say get new architects.

For every 1 competent person who can develop a solution to fully automate everything, there are 99 others who can automate most of it, maybe minus a cluster or a DB or two, and another 500 who cannot do either, but can run a CentOS box at a reasonable service level.

You get to use great tools and your vast knowledge of k8s each day to do all that, and you have the support of your org, but those other folks may not have the tools, support, knowledge, or sometimes even the capability to attain the knowledge to do that. That doesn't mean they're useless to anyone, to be cast off at will!

The type of thinking that leads to, "get new engineers/developers/designers/architects if yours aren't perfect" needs to die, and needs to be replaced with, "let's do what we can to train and support our current employees to do a great job" because, frankly, there aren't enough "superstar" people who have your skills to do that at every org.

We need to work on accepting people for who they are-- helping them to strive to be a bit better each day of course-- and utilizing those skills in the right place, rather than trying to make everyone the same person with the same skills doing the same things.

Some applications don't need clusters which can be rebuilt and destroyed at will, so let's not make that the bar for every project.


That only means that the "named system" moves upwards, somewhere. It doesn't mean you don't have "pets".

P.S. I consider "pet vs cattle" to be a horrible metaphor that should be taken behind the barn and shot like the diseased plague bearer it is.


> I'm sorry that you feel trapped, that's not our intention.

Please don't do this. You can apologize for your actions and work to improve in the future, but you cannot apologize for how someone feels as a result of your actions.

Also, intent doesn't matter unless you plan to change your behavior to undo or mitigate the unintended result.


> clusters should be treated like cattle, not pets

So my understanding is that the official k8s way to upgrade your cluster is also to throw it away and start a new one (with some cloud-provider-proprietary alternatives).

Let's say there is something actually important, stateful, single-source-of-truth in my k8s cluster, like a relational DB that must not lose data. I don't want downtime for readers or writers, and I want at least one synchronous slave at all times (since the data is important). I also don't want to eat non-trivial latency overheads from setting up layers of indirection.

What's the recommended way of doing this?


In this case, one needs fault tolerance. One way to achieve it is through replication, where an extra copy (or perhaps a reconstruction recipe) of your DB instance runs somewhere else. Usually DBs achieve this through transactions. Additionally, you can have distributed DBs, which then use distributed transactions to achieve it.

I am no expert, but K8s handles task replication, and can either spawn or route a request to another task instance somewhere else. However, the application logic itself must handle the fault tolerance (by handling its state through transactions or something else) should an instance fail. K8s doesn't do that for you.


Distributed transactions are a non-starter – I already said I want to run master-slave with synchronous replication, which is basically what you want to do in >99% of cases where you have a DB with important stuff in it.


It is not really about what you want, but about how to migrate the DB* to another location while minimizing its downtime/slowdown.

You need to instantiate a secondary DB replica somewhere else and start the DB migration. Since there will be "two instances" of the same DB running, you will also need to set up a (temporary) proxy for routing and handling the DB requests w/ something like this:

1) If the data being requested has already been migrated, the request is handled by the (new) secondary replica. 2) Otherwise, the primary instance handles the request, and 2.1) the requested data should be migrated to the secondary replica (asynchronously, but note that a repeated request may invalidate a migration).

Turn the proxy router off once the whole state of your primary DB instance is fully migrated, making the secondary replica the primary one. That's really just a napkin recipe for completing a live migration, though.

* We are now getting into the distributed-transaction world, because you can never be 100% sure that writing to 2 databases will succeed or fail at the same time. There is a talk from someone who deals with a problem similar to yours: http://www.aviransplace.com/2015/12/15/safe-database-migrati...


It's kind of revealing that there are zero replies to this, 10 hours later.


Thanks for replying to feedback Seth. Stuff like this - following the Google Maps API massive pricing increase, the G Suite pricing increase - is what makes me wary about building stuff on GCP: I'm afraid that Google will increase prices for stuff I rely on. AWS has made users expect pricing for cloud services to only go down.


Can someone explain to me the cattle vs. pets analogy? I'm not sure I get it.


"In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line."

http://cloudscaling.com/blog/cloud-computing/the-history-of-...


https://devops.stackexchange.com/questions/653/what-is-the-d...

This is a pretty good in-depth explanation, but at a high level: if your server dies and you are extremely upset about it (similar to if your pet died), you are putting too many eggs in that single basket, with no secondary plan. Conversely, if you build your infra in such a way that your server dying is no worse to you than how a farmer sees one of his cattle dying (which are raised to be killed), you are much better prepared for the inevitable downtime and can very easily recover.


I think pets vs. corn would be a better analogy, as it's a high-value single instance vs. a commodity crop.


GKE can't offer financially backed SLAs without charging for the service. This is something that, I assume, significant customers want and that competitors already have:

https://aws.amazon.com/eks/sla/


Workers are not free and never were. So they were already charging.


Correct, but the control plane nodes _were_ free and had no SLA. This changes that. [edit: spelling]


_were_ free. (Emphasis yours.)


> and the docs linked from the email are not updated yet.

That about sums up most things Google does for developers.


I thought the standard advice for Google stuff was "there are always two systems - the undocumented one, and the deprecated one"


What I most frequently heard was "There are two solutions: the deprecated one, and the one that doesn't work yet."


That's a wonderful quote that applies to many companies. (I think that will resonate with the Unity developer community right now.)


I agree the rollout is a little bumpy but I'm curious what workloads you are using k8s for where a $74/mo (or $300/mo) bill isn't a rounding error in your capex?


Think about any medium sized dev agency managing 3x environments for 20x customers. That's 50k/year out of the blue.

My problem is that this fee doesn't look very "cloud" friendly. Sure the folks with big clusters won't even notice it, but others will sweat it.

The appeal of cloud is that costs increase as you go, and flat rates are typically there to add predictability (see BigQuery flat rate). This fee does the opposite.


It's charged per-cluster. GKE encouraged (and was great for) running multiple clusters for all kinds of isolation and security reasons.

This cost increases rapidly for those scenarios.


$3600/year is significant for a startup on a shoestring budget.


Then manage k8s yourself.

Or, better yet, don't use k8s. You don't need it, especially as a startup on a shoestring budget. You can migrate later if you decide you really need to, but just a plain LAMP gets you 99% of the way.


But then you can’t put k8s on you resume for when said startup implodes.


If there were a lower complexity way to deploy containerized apps supported widely I think tons of people would go for it. Currently there's not really much of a middle ground between Cloud Run and K8s offered. It's kind of absurd, honestly.


Google App Engine has been exactly this since 2008.


My impression of app engine is that you have to use all the cloud* services like SQL, cache, etc, which will make it significantly more expensive, even if it does that app layer fine. Is that wrong?


It's wrong today. It was true in 2008, when GAE was Google's entire cloud offering (and there was no Docker or K8s).

Around the time "Google Cloud Platform" became a thing, Google changed GAE from an encapsulated bubble into a basic frontend management system that interacts with normal services through public APIs (either inside or outside GCP). It's more expensive than GCE, but it's fully managed and lets you skip the devops team.


So ask for Google Cloud for Startups? One free cluster is enough to get started.


> Google Cloud for Startups is designed to help companies that are backed by VCs, incubators, or accelerators, so it's less applicable for small businesses, services, consultancies, and dev shops.[1]

This makes it seem like Google Cloud for Startups is aimed at startups that aren't really on a shoestring budget.

[1]: https://cloud.google.com/developers/startups/


Like every "special offer for startups", it's a vulture waiting for a funding round to close.


My boss viewed it as the main way to deploy containerized systems offered by cloud providers and figured we could run most of our internal only things in it for a couple hundred a month - we don't really need the guarantees and scale, and he saw it as a way to avoid creating excess numbers of dedicated VMs, as cloud run isn't sufficient for our non-static stuff. This view up until now has actually been quite accurate because of the dedicated usage discounts.

So I guess the big question in my mind is how do you run containerized apps in the major clouds besides K8s if it's a bulldozer and you just need a cargo bike? Is there something simpler?



E.g. https://aws.amazon.com/ecs/ - that was quite nice. https://aws.amazon.com/fargate/ - haven't tried.


You could consider the flexible app engine


How is it not a rounding error for Google?


I think you mean opex and not capex here.


Sorry for technical tangent but curious. Your decision making on GCP appears to appeal to best of breed + cost. But you put SQL Server on AWS? If you are saying SQL Server is better on AWS than on Azure it would be interesting to learn why.


We need MySQL 8 because of window functions, which GCP's managed offering does not provide. It is available on AWS.


My bad. A clever marketing decision made me see the capital SQL as SQL Server since I am used to people saying Postgres, MySQL or SQL.


I see. Curious about the latency between your GCP apps and the database on AWS - is it like 1 ms or 100ms? Does it affect the product?


About 4ms for us. However, we chose our data centres on both ends very carefully. There are tables of those inter-cloud pings you can find online; one such is here: https://medium.com/@sachinkagarwal/public-cloud-inter-region...

However this means we are paying for egress on both sides. This was something we chose to eat due to GCP Kubernetes, but considering today’s changes, it probably no longer makes sense.


So you decide to eat egress costs in perpetuity, which will scale as you go, but a one time increase of $70 per month is enough to make you go back? What are you even trying to optimize for?


sheesh, the lengths you guys went to build that monstrosity.


If this cost bothers you a great deal, why not just deploy a new cluster?


Hi Seth,

What about clusters that are used for lumpy work loads? Like data science pipelines? For example, our org has a few dozen clusters being used like that.

Each pipeline gets its own cluster instance as a way to enforce rough-and-ready isolation. Most of the time the clusters sit unused. To keep them alive we keep a small, cheap, preemptible node running on the idle cluster. When a new batch of data comes in, we fire up kube Jobs, which then trigger GKE autoscaling to process the workload.
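
For context, each batch is roughly a Job like the sketch below (name, image, and resource sizes are illustrative); it's the resource requests that make the autoscaler add nodes, and the pool scales back down once the Job completes:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pipeline-run-20200304       # hypothetical per-batch name
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: process-batch
            image: gcr.io/example-project/pipeline:latest   # placeholder image
            resources:
              requests:
                cpu: "4"                # large requests trigger cluster autoscaling
                memory: 16Gi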

This pricing change means we're looking at thousands of dollars more in billing per month, without any tangible improvement in service. (The keepalive node hack only costs $5 a month per cluster.) We could consolidate the segmented cluster instances into a single cluster with separate namespaces, but that would also cost thousands in valuable developer time.

I don't know how common our use pattern is, but I think we would be a lot better served by a discounted management fee when the cluster is just being kept alive and not actually using any resources. At $0.01, maybe even $0.02, per hour we could justify it. But paying $0.10 to keep empty clusters alive is just egregious.


Those empty clusters that you get for free cost Google money. Perhaps it never should have been free, because that skewed incentives towards models like this.


Unfortunately, even if they switch to dynamically started clusters, the latency of spinning a new cluster is much higher than the latency of adding a bunch of preemptible nodes to existing node pool :/


Google are (were) not the only ones offering this free control plane model, though. My managed DigitalOcean Kubernetes clusters tend toward unstable if they are used with too-small node pools. (I don't know why that is, but it seems like a good way to make sure I pay attention to the workloads and also spend at least $20/mo for each cluster I run with them.)

It will be interesting in any case to see if DigitalOcean and Azure are going to follow suit! I'd be very surprised if they do, (but I've also been wrong before, recently too.)


The term is "loss leader." GKE provides the control plane and cluster management so that we don't have to, and in exchange you sell more compute, storage, network, and app services. This is some ex-Oracle, "what can we do to meet growth objectives," "how can we tax the people who we own" thinking. They're customers, not assets, Tim. Your cloud portability play should be the last place to jerk them around.


Keep in mind that GKE cluster management was paid in the original GKE. GCP only stopped billing for cluster management when EKS released free cluster management.


When did EKS release free cluster management?


On GKE, you can use a single cluster with multiple node pools to achieve a similar effect. Just set the right affinity on your job resources.
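
For example, here is a rough sketch of "the right affinity" (node pool name, taint, and image are hypothetical), using the node label GKE applies to every node pool:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pipeline-a                  # hypothetical
    spec:
      template:
        spec:
          restartPolicy: Never
          nodeSelector:
            cloud.google.com/gke-nodepool: pipeline-a-pool   # GKE's built-in node pool label
          tolerations:                  # only needed if the pool is tainted to keep other pods off
          - key: dedicated
            value: pipeline-a
            effect: NoSchedule
          containers:
          - name: worker
            image: gcr.io/example-project/worker:latest      # placeholder image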


PTAL at doing Multi-Tenancy in GKE!

https://cloud.google.com/kubernetes-engine/docs/best-practic...

We don't recommend using node pools for isolation.


If it is only workload isolation, why not?


For secure isolation, we learned it's not sufficient. It's good for resource isolation though.

PTAL at https://www.youtube.com/watch?v=6rMGRvcjvKc


That guide looks nice. Have you guys thought about releasing a terraform module or even a cloud composer workflow that will set that up in a project?


Thanks! We actually do; it shipped together with the best practices.

https://github.com/GoogleCloudPlatform/gke-enterprise-mt

Please give us feedback there in case you hit any issue!


Yes, this is the general approach. However it unfortunately has security implications as you are putting MT workloads on a pool with access back to a shared control plane. Dealing with customer uploaded code is a nightmare.


Here you go:

https://github.com/rcarmo/azure-k3s-cluster (this is an Azure template that I use precisely for testing that kind of workloads - spinning up one of these, master included, takes a couple of minutes at most).

(full disclosure: I work at Microsoft - Azure Kubernetes Service works fine, but I built the above because I wanted full control over scaling and a very very simple shared filesystem)


Likely this model is precisely why they are introducing this fee.

I guess they realized they couldn't make cluster management MT.


We currently spin up dev clusters with a single node. $73/mo is going to basically double the cost of all of these.


This highlights a sorta-weird consequence of this pricing change: suddenly pricing incentivizes you to use namespacing instead of clusters for separating environments.

(As a security person: ugh.)


That’s interesting - I think you’re right. We might move our staging cluster into our main production deployment.

More likely though, AWS or OpenShift running on bare metal on a beefy ATX tower in the office. We want to have production and staging as close to each other as possible, so this is an additional reason and a p0 flag on reducing the dependency on Google-specific bits of Kubernetes as much as possible, hopefully also useful for our exit strategy as well.


Kubespray works well for me for setting up a bare bones kubernetes cluster for the lab.

I'll use Helm to install MetalLB for the load balancer, which you can then tie into whatever ingress controller you like to use.

For persistent storage a simple NFS server is the bees' knees. It works very well, and an NFS provisioner is a Helm install. Very nice, especially over 10GbE. Do NOT dismiss NFSv4 - it's actually very nice for this sort of thing. I just use a small separate Linux box with software RAID on it for that.
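
As a minimal sketch of what that looks like (server address, export path, and sizes are made up), a statically provisioned PersistentVolume plus a claim pods can mount:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-share                   # hypothetical
    spec:
      capacity:
        storage: 500Gi
      accessModes:
      - ReadWriteMany                   # many pods can mount the same export
      nfs:
        server: 10.0.0.10               # placeholder NFS server
        path: /exports/k8s              # placeholder export
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-data
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: ""              # bind to the static PV above, skip dynamic provisioning
      resources:
        requests:
          storage: 500Gi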

If you want to have the cluster self-host storage or need high availability then GlusterFS works great, but it's more overhead to manage.

Then you just use normal helm install routines to install and setup logging, dashboards, and all that.

OpenShift is going to be a lot better for people who want to do multi-tenant stuff in a corporate enterprise environment - like when you have different teams of people, each with their own realm of responsibility. OpenShift's UI and general approach is pretty good about allowing groups to self-manage without impacting one another. The additional security is a double-edged sword: fantastic if you need it, but an annoying barrier to entry for users if you don't.

As far as AWS goes... EKS recently lowered their cost from 20 cents per hour to 10 cents. So costs for the cluster is on par with what Google is charging.

Azure doesn't charge for cluster management (yet), IIRC.


(replying to freedomben): NFS has worked fairly well for persistent file storage that doesn't require high performance for reads/writes (e.g. good for media storage for a website with a CDN fronting a lot of traffic, good for some kinds of other data storage). It would be a terrible solution for things like database storage or other high-performance needs (clustering and separate PVs with high IOPS storage would be better here).


It's good to have multiple options if you want to host databases in the cluster.

For example you could use NFS for 90% of the storage needs for logging and sharing files between pods. Then use local storage, FCOE, or iSCSI-backed PVs for databases.

If you are doing bare hardware and your requirements for latency are not too stringent then not hosting databases in the cluster is also a good approach. Just used dedicated systems.

If you can get state out of the cluster then that makes things easier.

All of this depends on a huge number of other factors, of course.


> Have you used NFS for persistent storage in prod much?

I think NFS is heavily underrated. It's a good match for things like hosting VM images on a cluster and for Kubernetes.

In the past I really wanted to use things like iSCSI for hosting VM images and such, but I've found that NFS is actually a lot faster for a lot of things. There are complications to NFS, of course, but they haven't caused me problems.

I would be happy to use it in production, and have recommended it, but it's not unconditional. It depends on a number of different factors.

The only problem with NFS is: how do you manage the actual NFS infrastructure? How much experience does your org have with NFS? Do you already have an existing file storage solution in production that you can expand and use with Kubernetes?

Like if your organization already has a lot of servers running ZFS, then that is a nice thing to leverage for NFS persistent storage. Since you already have expertise in-house it would be a mistake not to take advantage of it. I wouldn't recommend this approach for people not already doing it, though.

If you can afford some sort of enterprise-grade storage appliance that takes care of dedupe, checksums, failovers, and all that happy stuff, then that's great. Use that and it'll solve your problems - especially if there is an NFS provisioner for it that Kubernetes supports.

The only place where I would say it's a 'hard no' is if you have some sort of high-scalability requirement - like if you wanted to start a web hosting company or needed to have hundreds of nodes in a cluster. In that case distributed file systems are what you need... self-hosted storage, aka "Hyper-Converged Infrastructure". The cost and overhead of managing these things is then relatively small compared to the size of the cluster and what you are trying to do.

It's scary to me to have a cluster self-host storage because storage can use a huge amount of ram and cpu at the worst times. You can go from a happy low-resource cluster, then a node fails or other component takes a shit, and then while everything is recovering and checksum'ng (and lord knows what) the resource usage goes through the roof right during a critical time. The 'perfect storm' scenarios.


I use an SMB file share for my node pools - here's how to set up a non-managed cluster on Azure that does that: http://github.com/rcarmo/azure-k3s-cluster


Have you used NFS for persistent storage in prod much? I know people do it, but numerous solutions architects have cautioned against it.


My experience with NFS over the years has taught me to avoid it. Yes, it mostly works. And then every once in a while you have a client that either panics or hangs, despite the versions of Linux, BSD, Solaris, and Windows changing over the decades. The server end is usually a lot more stable, but that's of little to no comfort when you know that, yes, the other clients are fine.

However, if you can tolerate client side failure then go for it.


What? Shouldn't you try to make the creation and deletion of your staging cluster cheap instead of moving it to somewhere else?

And if that is your central infrastructure, shouldn't it be worth the money?

I do get the issue with having cheap and beefy hardware somewhere else - I do that as well, but only for private use. My hourly salary spent (or wasted) on stuff like that costs the company more than just paying for an additional cluster with the same settings but perhaps far fewer nodes.

If more than one person is using it, the multiplication effect of suddenly unproductive people is much higher. That also decreases the per-head cost.


I suspect I'm in the minority on this, but I would love for k8s to have hierarchical namespaces. As much as they add complexity, there are a lot of cases where they're just reifying complexity that's already there, like when deployments are namespaced by environment (e.g. "dev-{service}", "prod-{service}", etc.) and so the hierarchy is already present but flattened into an inaccessible string representation. There are other solutions to this, but they all seem to extract their cost in terms of more manual fleet management.


Hey - I'm a member of the multitenancy working group (wg-multitenancy). We're working on a project called the Hierarchical Namespace Controller (aka HNC - read about it at http://bit.ly/38YYhE0). This tries to add some hierarchical behaviour to K8s without actually modifying k/k, which means we're still forced to have unique names for all namespaces in a cluster - e.g., you still need dev-service and prod-service. But it does add a consistent way to talk about hierarchy, some nice integrations and builtin behaviours.

Do you want to mention anything more about what you're hoping to get out of hierarchy? Is it just a management tool, is it for access control, metering/observability, etc...?

Thanks, A


Any reason why you put your link behind a URL shortener besides tracking number of clicks?

Since there are no character limits to worry about here unlike Twitter, better to put up the full URL so the community can decide for themselves if the domain linked to is worth clicking through or not.



Hey, thanks for asking! My interests in it are primarily for quota management -- in my experience, this is inevitably a hierarchical concern, in that you frequently run into the case of wanting to allot a certain cluster-wide quota to a large organizational unit, and similarly subdivide that quota between smaller organizational subunits. Being able to model that hierarchy with namespaces localizes changes more effectively: if you want to increase the larger unit's quota in a flat namespace world, for example, there's no way to talk about that unit's quota except as the sum of all of its constituent namespace quotas.
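
To illustrate the flat version of this today (names and numbers are arbitrary): each constituent namespace carries its own ResourceQuota, and the larger unit's allotment only exists as the sum of these objects.

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: team-alpha-dev         # hypothetical sub-unit
    spec:
      hard:
        requests.cpu: "40"
        requests.memory: 160Gi
    ---
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: team-alpha-prod        # hypothetical sub-unit
    spec:
      hard:
        requests.cpu: "80"
        requests.memory: 320Gi

Raising the team-wide ceiling means editing every one of these by hand, which is exactly the locality problem a hierarchy would solve.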


Thanks! We're not currently planning on implementing a hierarchical resource quota in HNC, but HNC is trying to define a definition of hierarchy that could certainly be used to create a HRQ. Give me a shout if you're interested in contributing.


Yes, namespace alone isn't sufficient for isolation. Would you be able to look at our latest Multi-Tenancy best practices?

https://cloud.google.com/kubernetes-engine/docs/best-practic...

It's a living product which comes with Terraform modules. We introduced various features to enable doing Multi-Tenancy as well (and more on their way!)


I’m sorry if I am reading it wrong, but this guide to multi-tenancy seems to suggest not being multi-tenant and instead running a separate cluster per project. This seems more like scaling single-tenancy than multi-tenancy (no bin-packing opportunity, for instance). Or did I read it wrong?


Sorry if it was confusing. You need to read further into how to set up in-cluster multi-tenancy.

We do recommend robust configurations for a production setup (e.g. dev, staging, and production); however, you can certainly squash and skip it if it's not necessary.

Thanks for the feedback though. We'll consider adding such notes explicitly.


>You need to read further into how to set up in-cluster multi-tenancy.

I am trying to do that. Where would you suggest? Throughout the comments on this post, when people suggest namespace, pod, or node-level separation, you ask them to PTAL at the link, which suggests the single-tenant cluster-per-project approach (that is under "Multi-tenant cluster", confusingly). The link you sent talks about cluster-per-project, which is not multi-tenancy as I understand it. Perhaps a different name would be less confusing (robust federated cluster administration?).


"This guide provides best practices to safely and efficiently set up multiple multi-tenant clusters for an enterprise organization."

This "multiple multi-tenant clusters" part isn't coming through. Please do jump to the "Securing the cluster" section to cut corners and learn what to do in a single cluster. We're fixing the sections to avoid the confusion. Thanks for the feedback!

https://cloud.google.com/kubernetes-engine/docs/best-practic...


You can dedicate nodes by namespace, at which point the isolation is pretty strong.


Nit: we don't recommend dedicated nodes for isolation. PTAL at https://www.youtube.com/watch?v=6rMGRvcjvKc And the guidance from GKE is at https://cloud.google.com/kubernetes-engine/docs/best-practic...


* Assuming you also configure strong RBAC, network isolation and don't let persistent volumes cross-talk


As also a security person (:wave:), you can use dedicated node pools and workload identity to isolate workloads in the same cluster.


Workload identity is a GCP-specific beta feature for mapping to GCP IAM, right?



yes


Or move to minikube and friends. If it's a dev environment you can usually get away with such things.


Kubernetes consumes a lot of CPU even when idle, due to its polling design, which makes Minikube a really poor fit for developer machines. It's well known [1] to sit there eating 20-30% CPU, draining your battery and frustrating your life while doing absolutely nothing. This applies to all the Kubernetes distributions, including Kind and Docker Desktop. Not sure if the same applies to K3s, though.

[1] https://github.com/kubernetes/minikube/issues/3207, https://github.com/docker/for-mac/issues/3065, https://github.com/docker/for-mac/issues/3539, etc; there must be dozens and dozens of these


It does. My k3s setups (like https://github.com/rcarmo/azure-k3s-cluster) take up nearly 100% of the puny master node I allocate to them, and kill my Raspberry Pi SD cards as well.

Swarm, in comparison, is much friendlier (and you can use it for dev/test across multiple machines just fine)


Assuming you can do that, and your system is not using namespacing for its own purposes.


I know kubeflow can use namespaces for its own purposes, but otherwise I thought that was quite rare. Namespaces are intended to be used for exactly this usecase (isolating teams and/or workloads).

What kind of system have you seen where this isn't true?


We've seen plenty of examples where people do this. Sometimes it's different teams (e.g. the ML team is namespaced away from the primary customer flow) and sometimes it's for different customers. Really, it sounds like we're in agreement about why this happens, the confusion is just whether or not that normally happens within a company in the course of doing business?


Gotcha. If you're looking for subnamespacing, HNC offers self-service subnamespace creation, have you looked into that? https://github.com/kubernetes-sigs/multi-tenancy/tree/master...

(My apologies if we've chatted about this before in another venue, I'm losing track of whom I've already talked to)


I use namespaces to isolate names. So I can have a service called memcached in namespace A and also one in namespace B.


It's still billed by the minute. If you run your dev clusters 24x7, then they apparently are critical enough.


For a dev environment, why not host your own hardware? Especially if cost is a concern, it seems like a no brainer.


Genuinely curious: isn’t Docker Desktop's Kubernetes an option?


It is - though, especially on OSX, it is very CPU- and memory-intensive.


> There's also one completely free zonal cluster for hobbyist projects.

Nice.


Too many people drank the cloud kool-aid. The move from day one was to create provider agnostic cloud architectures and repent the use of provider-specific services.

That said they do make it damn hard. Our k8s cluster is as basic as it comes, no databases, simple deployments, but we do still have a dependency on Google Cloud Loadbalancer (which we hate).

If pricing goes up too much from this we'll move, but the GCL dependency will be a PITA :/


We’re in the same situation — we’ve engineered for minimum provider-specific dependencies but GKE LoadBalancers were where they got us via arm twisting as well. There is no way to expose a cluster to the outside world in a production environment otherwise.


It's kind of ridiculous internal load balancers can't get automatic certs. We've had to do a stupid dance just to get certs via the LE DNS challenge out of band, and then regularly install them on internal LBs.


Same! I still manually provision some certificates because LEGO etc. just don't work with GCP + Google Cloud Load Balancer! And the docs for the entire subject are useless.


We paid for long-lasting wildcard certs because of that. Which Apple killed a few days ago. It’s going to be fun when they are close to expiry.


Maybe I don't understand your problem, but can't you just use Traefik (https://docs.traefik.io/user-guides/crd-acme/). It will get certs from letsencrypt for you.


TCP coming into your cluster means that you practically have to go through kube-proxy (because the load balancer and the Kubernetes scheduler aren't perfectly synchronized) and that the load-balancer can't balance per-request, only per-connection. If the load balancer terminates TLS, then it can just watch cluster endpoints and automatically route to the right node without any extra hop through kube-proxy, and it can also split large individual requests out of HTTP/2 and GRPC streams.

I'm guessing 99% of workloads won't notice either of these issues, but it is an actual issue.


Key word: "internal" -- these aren't on the internet. Traefik does the ALPN challenge, which means the LB itself has to be on the internet. (Or something else that leaks the cert to the LB, but that doesn't sound any less complicated than using the DNS challenge.)


Does cert-manager not work for your needs?

https://github.com/jetstack/cert-manager
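
As a rough, version-dependent sketch of what that can look like for internal load balancers, an ACME ClusterIssuer using the DNS-01 challenge via Cloud DNS, so the endpoint never has to be internet-reachable (API group and field names vary by cert-manager release; project and secret names are placeholders):

    apiVersion: cert-manager.io/v1      # older releases use a different API group/version
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-dns             # hypothetical
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: ops@example.com          # placeholder contact
        privateKeySecretRef:
          name: letsencrypt-dns-account-key
        solvers:
        - dns01:
            cloudDNS:
              project: example-gcp-project          # placeholder project
              serviceAccountSecretRef:
                name: clouddns-dns01-sa             # placeholder secret holding a DNS admin key
                key: key.json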


> As this project is pre-1.0, we do not currently offer strong guarantees around our API stability.

> Notably, we may choose to make breaking changes to our API specification (i.e. the Issuer, ClusterIssuer and Certificate resources) in new minor releases.


In practice, the cert-manager team has made breaking changes in probably close to 1/3 of minor releases (which is really fine pre-1.0, IMHO), but there has been comprehensive guidance that leads users or cluster admins through upgrading, walks through exactly what steps are needed, and, followed well, does not interrupt your cluster's service in any way.

It's not dark magic. It might make building integrations on top of it prohibitive, but they have done a great job making sure users can upgrade from one release to the next.

It is a little bit of a treadmill, but it certainly beats manually renewing certificates!


Do you also have occasional outages because the load balancer gets into a confused state and changes take 10+ minutes to propagate, with no recourse other than to destroy and re-create the entire resource?


There are ways to expose your cluster to the public and/or run your own load balancers on GKE (or any other cloud k8s deployment).


Load balancers are created by the pre-installed controller that each cloud provides to let external traffic reach the nodes. You don't have to use it.

It's no different than running your own load balancer like HAProxy pointed at the nodes, which forward to a NodePort service.

There's also MetalLB if you're running your own hardware: https://metallb.universe.tf/
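To make the DIY route concrete, a rough sketch of the NodePort Service an external HAProxy (or MetalLB on your own hardware) would point at; names, selectors and ports are placeholders:

    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-nginx
      namespace: ingress-nginx
    spec:
      type: NodePort
      selector:
        app.kubernetes.io/name: ingress-nginx   # placeholder selector
      ports:
      - name: http
        port: 80
        nodePort: 30080    # external LB forwards to <node-ip>:30080
      - name: https
        port: 443
        nodePort: 30443

Your own load balancer then just needs a health-checked backend list of node IPs on those ports.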


How about managing your own k8s running on VMs / bare metal?

Pretty much anyone who has worked in ops for a while understood from the get-go that it's impossible to be totally provider-agnostic. K8s is just a nice API on top of the provider's API that still requires provider-specific configuration.


Disclaimer: I work for Red Hat and am very biased, but this is my own honest opinion.

If you're going to run on bare-metal or in your own VMs, OpenShift is very much worth a look. There are hundreds, maybe thousands of ways to shoot yourself in the foot, and OpenShift puts up guard rails for you (which you can bypass if you want to). OpenShift 4 runs on top of RHCOS which makes node management much simpler, and allows you to scale nodes quickly and easily. Works on bare metal or in the cloud (or both, but make sure you have super low latency between data centers if you are going to do that). It's also pretty valuable to be able to call Red Hat support if something goes wrong. (I still shake my head over the number of days I spent debugging arcane networking issues on EKS before moving to OpenShift, which would have paid for a year or more of support just by itself).


So you move from vendor lock-in with the cloud provider to vendor lock-in with an expensive, proprietary* IBM k8s distribution with its strange nonstandard opinions about workflows, which you have to manage yourself?

Don't get me wrong, I appreciate RHAT's code contributions very much, they have done a lot for k8s! But running OKD on one's own is a bad idea, while paying for IBM support makes you as much a hostage as anything Google will do to you. Better to just stick with a distribution with better community support and wait for RHAT's useful innovations to be merged upstream (while avoiding the pitfalls of their failed experiments...)

* Yes it's open source, but the community version (okd) isn't really supported, nor is it widely used, so if you're serious about running this you're doing so for the Enterprise support and you're going to be writing those checks


Thanks for the edits (and acknowledging our contributions). I wasn't sure if you were just trolling or not before, so I didn't want to engage.

Your concern is valid, and I agree with you that OKD is not supported enough. I have my own theories as to why, but I will keep my criticism "in the family" (but do know there are people that want to see OKD be a first-class citizen, and know we are falling short right now). We had some challenges supporting OKD 4.x because the masters in 4.x now require Red Hat CoreOS (and it is highly recommended for the nodes too), but RHCOS was not yet freely available. This is obviously a big problem. Now that Fedora CoreOS is out, there is a freely distributable host OS on which to build OKD, so it will be better supported and usable. FWIW I have a personal goal to have a production-ish OKD cluster running for myself by end of the year.

I'll admit I am a little offended at being called a "proprietary IBM K8s distribution," but I don't think you meant to be offensive. IBM has nothing to do with OpenShift, beyond the fact that they are becoming customers of it. Every bit of our OpenShift code is open source and available. You are right that it's not in a production-usable state (although there are people using it), but it's a lot better than you'll get from other vendors. We are at least trying to get it to a usable state, unlike many of them. We are strapped for resources like everyone else, and K8s moves a mile a minute and requires significant effort to stay ahead of. This space is still really young and really hot, and I am confident we'll get the open source into a good, usable state, much like Fedora and CentOS are. I also don't think OpenShift is really all that expensive considering the enormous value it provides. The value really does shine at scale, and may not be there for smaller shops.

I don't blame you for waiting, I probably would too. Our current offering is made for enterprise scale, so isn't tailored to everyone. I've heard OpenShift online has gotten better, but haven't tried it myself. Eventually I plan to run all my personal apps on OKD (I have maybe a dozen, mostly with the only users being me and my family), but until then I've been using podman pods with systemd, which will be trivial to port to OKD once it's in a good state.


Yes, I realize I will come across as being overly negative here, and I apologize for this.

It's not that OpenShift is bad per se; I just don't imagine it solves many of the problems an org that is fretting about lock-in or GCP pricing will have. Such an org is probably cost-sensitive and looking for flexibility, but OpenShift is expensive, and if you adopt its differentiating features you are de facto locking yourself in. And if you do not leverage those features out of a desire to avoid lock-in, you are effectively paying a whole lot just for k8s support...

And I really should say, for certain orgs (especially bigcos) this may well be worth it; I just don't think it is a good option for anybody worried primarily about avoiding vendor lock-in and keeping costs in check.


Honestly, the fact that it requires RHEL or CentOS was enough to make it not feasible for us. I wish that would change, since I can't think of any reason the distribution should affect OpenShift.


There are several reasons, the main one being that the nodes themselves are managed by OpenShift operators. If you run (as cluster admin) `oc get clusteroperators` you'll see plenty that are for hosts, such as an upgrader and a tuner. If the operators had to be distro-agnostic it would be a support nightmare, and we wouldn't be able to do it. With RHCOS (an immutable OS) we also have enough guarantees to safely upgrade systems without human intervention. Can you imagine doing that in an Enterprise environment while trying to support multiple distributions? I can't.


Can you describe what kind of tuning the RHEL tuning tools do that is not available using the normal kernel constructs? Last I checked, tuna and others did everything you could do in Ubuntu, but without knowing the guts of the system.

Again, I think the idea of the OS is great, but you've lost us, and likely other big customers, because of that restriction. Having old kernels is just not an option for some people.


I'm not informed enough to tell you what the tuning tools do, so I'll dodge that question. But "[h]aving old kernels is just not an option for some people" is exactly the type of problem this solves. You literally don't have to know or care what kernel your node runs, because it doesn't matter! The OS is a very thin layer underneath K8s, a layer which is entirely managed by applications running as pods (supervised by an operator) on the system. Whatever apps/daemons/services you need to run move to pods on OpenShift. If you need to manage the node itself there is an API for it. If you truly need underlying access, then this is not for you, but you'd be amazed at how many people (myself included) started out balking at this and thought "no way, for compliance we need <tool>" but after re-thinking the system realized you really don't. By "complicating" the system with immutable layers, we actually simplify the system. It was much like learning functional programming to me. By "complicating" programming by taking away stuff (like global variables, side-effects, etc) it actually simplified it and reduced bugs by a huge margin.

If you are like me and are old school and think "huh, yeah that makes me nervous" I completely understand that, but we've seen some serious success with it. I'm a skeptical person, and telling me I can't SSH to my node freaks me out a bit, but I'm becoming a convert.

I would also note if you buy OpenShift you get the infrastructure nodes (masters, and some workers for running OpenShift operator pods) for free (typically, but I'm not a salesperson so don't hold me to that if I've misspoken :-P), so you aren't paying for the super locked-in OS. I suppose you do have to pay for RHEL 8 or RHCOS on the worker nodes running your pods, and we don't support other distros (because we expect a very specific SELinux config and CRI-O (container runtime) config, among other things), so I guess there's some dependence there, although I recommend RHCOS for all your nodes and then just use the Machine API if you need it.


BTW, I would never run OpenShift after the CoreOS debacle. That was a really sketchy move and still is.

Yes, RH did a lot for k8s, but killing off a working distribution without a direct migration path - other than "start again" - will make your customers angry.

Also, I think the OpenShift terminology is way too much, and OpenShift should be a much thinner layer on top of k8s.


I agree, the CoreOS thing went down grossly. It was a technical nightmare tho. They deeply merged CoreOS and Fedora/RHEL and created a hybrid animal. Creating an upgrade path would have been an insane challenge, and in the end the advice would have been to rebuild anyway to avoid unforeseen issues. They could have left CoreOS repos and stuff up tho and given a longer transition period.


I work for Microsoft and I quite like OpenShift (although we have AKS as the default, "vanilla" managed k8s service, you can also run OpenShift on Azure).

Sure it's opinionated, but at this point, which flavor of k8s isn't? Even k3s (which I play around with on https://github.com/rcarmo/azure-k3s-cluster) has its quirks.

Everyone who's targeting the enterprise market has a "value-added" brew of k8s at this point, so kudos to RH for the engineering effort :)


Don't want to sound snarky, but how about an upgrade path from 3.11 to 4.x? I am a heavy OpenShift user and it seems that RH just dumped whatever architecture they had with pre-4 clusters and switched to a Tectonic-like 4.x installation without any way to upgrade other than a new installation. This makes it hard to migrate with physical nodes.


Not snarky at all, you are more correct than you may realize. The upgrade is a challenge because we move from RHEL 7 to Red Hat CoreOS 8 as the host OS, as well as move all the OpenShift code to operators instead of shipping it in the binary. OpenShift itself can now also manage the nodes (thanks to RHCOS), as well as fully self-upgrade. For the container runtime we move from Docker to CRI-O (for a number of reasons, high on the list is security). It's a massive overhaul which is really more akin to a brand new product than a major version. I generally can't stand major overhauls because most of the time they don't give you much new and often bring regressions. However, this really was a tremendous improvement, the fruits of which are not even fully realized.

Because of that major change, the cluster upgrade path is a little more involved than usual. It's a complete reinstall of the OS and rebuild of the cluster. There are tools to help tho, and as the path is tread things will get easier and better supported. If you go through the same wave as me, you'll be annoyed at first but then once it is done and you have a 4.x cluster you'll be really happy with it (especially when you can manage everything as an operator).

Luckily from an app perspective very little will change since Kubernetes is the API. A sibling comment linked to some helpful documentation. I can't be specific right now but I can tell you that your need is known, and some very smart people are working on it. If you want to email me (check my HN bio page) or jump in the Keybase group called "openshift_okd," I'm happy to chat more about it (not in an official Red Hat support capacity, just as friend to friend :-) ). I haven't done a migration myself yet but I know people who have, and I plan to get into it personally soon as well.


There is a free tool to help with migrations from OpenShift 3.x to 4.x - https://access.redhat.com/documentation/en-us/openshift_cont...

It's not a rolling upgrade, but it allows you to move the apps, PVs, policies in as granular a manner as you wish.


I'm running my own on bare-metal dedicated servers. You will need to install a few extra things (MetalLB for LoadBalancer services, cert-manager for SSL, an ingress controller (nginx, Ambassador, Gloo), and one of the CSI plugins for your preferred storage method). It is extra work, but as a personal cluster for hobby work I'm paying $65/mo total for the cluster. The same specs would probably be $1000/mo at a public cloud provider.
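To give a concrete idea of the MetalLB piece, a minimal layer-2 config of the sort I mean, using the ConfigMap-based configuration MetalLB uses at the moment (the address range is just a placeholder - use whatever spare IPs your provider routes to the machines):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        address-pools:
        - name: default
          protocol: layer2
          addresses:
          - 203.0.113.240-203.0.113.250   # placeholder range

With that in place, Services of type LoadBalancer get an IP from the pool just like they would on a cloud provider.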


MetalLB looks fun, hadn't seen that one.

If you want something production-grade (i.e. doesn't say "beta" on the tin) then I think Calico should solve most of the same problems too (it does BGP peering to your ToR switch):

https://docs.projectcalico.org/networking/determine-best-net...

Does MetalLB do something extra I'm missing?


MetalLB had more features than Calico before Calico had the external service advertisement feature. Now that it does, services can use ECMP load balancers just as they do with MetalLB.


Traefik can also automatically get Let's Encrypt certs, if you don't want to use cert-manager. Traefik gives the added benefit of also acting as an ingress controller.


From what I've seen, it looks like managing k8s on your own often ends up requiring a dedicated team just to keep up with their insane release cycle.


Can confirm. Depending on your cluster size you will need at least 2 dedicated people on the "Kubernetes" team. You'll probably also end up rolling your own deployment tools because the K8s API is a little overwhelming for most devs.


To be honest, we are heavily using EKS and AKS in other teams, and each of those teams has a dedicated devops subteam to help them not only with k8s but also with other infrastructure, because bare k8s is pretty useless for business.

So either way you end up in a situation where you require a dedicated devops team or dedicated team members to keep up with changing requirements.


I started learning Kubernetes and was overwhelmed. The biggest problem was the missing docs. I filed a GitHub issue asking for the missing Kubernetes YAML docs:

https://github.com/kubernetes/website/issues/19139

Google will ignore it like all of the tickets I file. The fact is that Google is in the business of making money and they are focused on enterprise users. Enterprise users are not sensitive to integration difficulty since they can just throw people at any problem. So eventually everything Google makes will become extremely time-consuming to learn and difficult to use. They're becoming another Oracle.


"kubectl explain deployment"


Confirmed here too. I don't think management realized the amount of toil that Kube requires to stay up to date.


The big problem with running your own cluster is the extra machines you need for a high-availability control plane, which is expensive. That is why Amazon and now Google feel like they can charge for this; you can't really do it any better yourself.


Doing load balancing, DNS, and egress has been way uglier on Google Cloud k8s than I expected. It pushes projects towards doing it themselves in-cluster, IMO.


Interesting. As a mid-level GCP customer, it won't make a big dent in our bill specifically, but in the end I'm not sure this pricing move is a smart strategy.

With this fixed-fee model, the change will barely make a difference (in Google revenue) for the large customers who can spare the money, but it will create a significant entry barrier for the side projects and super-early-stage startups that are considering getting hooked on GCP, specifically GKE.

Then again, not my decision to make.


That's, for me, the most frustrating thing with GCP, AWS and Azure. I would never use them as a very early, small three-person startup or for personal projects.

There is no billing protection (which could make you very poor very fast), and every service has a certain cost and quality point which is just not feasible in the beginning.

Even GKE with its free Kubernetes master blocks off a lot of resources on the nodes: https://cloud.google.com/kubernetes-engine/docs/concepts/clu...

Also, there are a ton of great features in GKE that you will probably never use if you are too small. It is so much cheaper to just get cheap hardware somewhere and put your own k8s onto it if you have more time than money.

Even on DigitalOcean you have the load balancer problem: you need to use the provided, and also 'costly', LoadBalancer service. There is only one hacky way to avoid it, by exposing your ingress on the host and mapping that one IP, but then you lose all the self-healing stuff and load-balancing capability.


Both AWS and Google offer free tier products and pay-for-what-you-use products. Reserved instance pricing starts at around $25/year. Many other incredibly useful products (S3, Lambda, VPC, etc.) are free with an instance or start at $0.

You can set billing alerts that will project your monthly budget every hour, and send you an alert when it's projected to be exceeded.

IMHO your claim (that there is an entry cost barrier) is the opposite of reality. AWS and Google have brought incredible power and choice to developers starting at zero initial cost.


My main concern is that I can't define an upper limit. My billing alert is nice and I'm aware of it, but it doesn't help if someone takes over your account, mines bitcoin on expensive machines, and a day later you read your email.


AWS refunds you when that happens.

The fundamental issue with setting a limit is it's technically infeasible to decide what to do when it's exceeded. They have no way of knowing what assets to terminate. The way to avoid what you describe is to shut off access to APIs that you don't want to use, and keep your credentials safe.


AWS has budgets that do exactly what you want. For the "cost budget": "Monitor your costs against a specified dollar amount and receive alerts when your user-defined thresholds are met."
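If you'd rather script it than click through the console, something along these lines should work (the account ID, amounts and email are placeholders; the JSON keys follow the Budgets API, so double-check against the current docs):

    aws budgets create-budget \
      --account-id 111111111111 \
      --budget '{"BudgetName":"monthly-cap","BudgetLimit":{"Amount":"100","Unit":"USD"},"TimeUnit":"MONTHLY","BudgetType":"COST"}' \
      --notifications-with-subscribers '[{"Notification":{"NotificationType":"FORECASTED","ComparisonOperator":"GREATER_THAN","Threshold":80},"Subscribers":[{"SubscriptionType":"EMAIL","Address":"ops@example.com"}]}]'

Note it alerts rather than hard-stops spending, which is the limitation discussed above.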


If I’m running a startup trying to develop a product with limited staff, the last thing I want to be worried about is the “undifferentiated heavy lifting”. I want to be concentrating on what is going to add business value.

Then again, I know enough about AWS and how to control my cost.


> you lose all the self-healing stuff and load-balancing capability.

I mean, yes? You can either build your own for “free”, or pay for the value-add features DO provides that you have described. I don’t see where the problem is.


Good point about GKE blocking resources on the nodes. I wish they would at least allow more control over that with the introduction of the master fees.


The 'cluster control plane is free' selling point was basically the _only_ thing in GKE's favor that I heard from all the different groups I worked with. Yes, you get one free cluster, but anyone serious about using Kubernetes would have _at least_ two clusters (a prod and a non-prod staging cluster), so unless you're a true hobbyist (and the use case for K8s in that realm is pretty slim unless it's to backstop work projects) this effectively means you're going to pay as much for GKE cluster control planes as you do for EKS.


Can you help me understand how these changes would be _more_ than EKS?


Sorry, I misread the original post - it would be the same.


Maybe I'm misunderstanding your comment but at $0.10 an hour wouldn't GKE pricing be half of what the EKS pricing for managed control plane at $0.20 an hour?

https://aws.amazon.com/blogs/containers/cost-optimization-fo...


EKS is $0.10/hour


From my link above:

>"The EKS control plane is the easiest to understand with a fixed cost of $0.20 per hour."


AWS pricing changes over time (although they are careful not to trap or antagonize customers with changes like this one). You linked to a static blog post that has a price from the past.

https://aws.amazon.com/eks/pricing/ is the up to date page. https://aws.amazon.com/blogs/aws/eks-price-reduction/ is the announcement of the price cut.


Oh wow, that price reduction is fairly recent too. Thanks for the updated info!


My approach with any Google B2B product: always have a plan to migrate out of Google, and never agree to anything that locks you to Google.

After seeing what they did to Google Maps, and how API.AI / Dialogflow jumped from free to $5k overnight - I just can't trust them.


There is a whole generation of future CTO / VP of Engineering types who are coming up on these reputations for GCP, AWS, Azure, etc., and it'll be interesting to see how the biases play out over the next 5-10 years. I predict a strong move back to self-hosting once the pains of, e.g., self-managing a bare-metal K8s cluster come down, and as storage/RAM/CPU prices continue to drop. I for one welcome it.

There is a billion-dollar company on the horizon for whoever can best commoditize bare metal with an Apple-esque usability model.


Maybe Apple does it? They are good at building Apple-esque usability models, and their business is aligned with selling metal.


> never agree to anything that locks you to Google.

How is Google different from other cloud providers in terms of vendor lock-in?


Google has a habit of killing off "enterprise" services and applications. Google has a habit of raising prices on services, while I've only ever seen price decreases from AWS and Azure. There is a reason why Azure has seen significantly faster growth than Google, even though both were late entrants to the cloud computing space. It seems that Google is doing their best to shoot themselves in the foot.

I used to recommend GKE as the best way to get started with K8S. With this price change, that advantage is gone. This price change kills off any advantage Google had for getting started in the cloud. Their enterprise-unfriendly habits have already killed off any advantages Google had for established, larger customers of cloud computing.


Google is the most likely to kill off a service you rely on?

AWS _still_ has SimpleDB kicking around. It has effectively been retired as a product for more than 8 years now.

I haven't seen Azure do similar, either, but I haven't paid as much attention to them.


Which Google Cloud service have they killed? This was one of their first services and it's still running: https://cloud.google.com/appengine/docs/standard


Fair point - yes, I kind of try to avoid it for any provider. But the special thing about Google is that they raise prices or, worse, cancel or modify services at will.

With AWS and Azure I developed way more trust over time - that could be subjective - but it also could be that there is a reason Google is a distant 3rd in the game.


Being locked into a provider that does not increase prices or modify services in an incompatible way (cancellations, etc.) works much better than being locked into a provider that does.


GCP is actually more like the third, inferior option, behind Azure. Gartner lists Azure as just behind AWS for IaaS providers, and GCP a more distant third:

https://pages.awscloud.com/Gartner-Magic-Quadrant-for-Infras...


I don't have a horse in the race, but I work with Gartner a lot, and would encourage you to actually read their guidance and the Magic Quadrant methodology carefully. Gartner analysts go through the features and functions very closely, but the Magic Quadrant ratings are heavily weighted by Gartner customer and other peer feedback.

The Magic Quadrant isn't a Good Housekeeping seal of approval. It's a screener for an architect at a Fortune 500 or .gov to show social proof that their product selection isn't insane.

The "cautions" for GCP are about the nascent state of their field sales and solution architects, enterprise relationship management, and limited partner community.

The "cautions" for Azure are poor reliability, poor support, and sticker shock.

My takeaway was very different from yours. When you read the analysis, it was reflective of a mature, dominant player (AWS) and two highly capable challengers with different issues.

Google is a newer business that is missing some services (ie. anything user facing) and is transitioning from a weird sales model to a more conventional enterprise one. Microsoft has an established business process and best in class sales org, but they tend to use sticks of various types to force adoption and are organizationally poorly equipped to support customers.


Azure AKS is pretty terrible TBH in comparison to GKE.

Also, the lack of an SLA / a shady SLA does not help.

PS: Speaking as someone with hands-on experience.

PS2: Azure support is terrible and their response times are constantly breaking the SLA.


Counterpoint to your PS2 - I've used Azure support at both my enterprise job and my microISV - each time the responses have been quite quick, and each time they have been helpful.

Honestly, I've been pleasantly surprised.


The first response is often generated by an automaton - a generic "hello, this is X, I'll take care of your case" - and the next real-person response can come the very next day, even when the ticket urgency is CRITICAL. That's hilarious.

The best support engineer I had the displeasure to work with didn't even know how the service I was having an issue with even works... And it was a "technical" ticket.

It seems like after the last round of poaching of the best support engineers by AWS, Azure is left with only an outsourced mob.


I’m keen to hear what problems you’ve had with AKS if you’ve got time to share them here.


For example: the metrics-server provided by AKS is, from the get-go, running in a highly insecure manner. If you want to change that, you cannot, because their automated tooling keeps reverting your changes.

Another: constant disconnections of PVs.

Another: 1 in 3 times, a newly provisioned node in a VMSS has a broken kubelet and doesn't successfully register with the control plane. I was literally shocked when it happened twice in a single day. The response from support was that we are supposed to monitor that ourselves and drain the unsuccessfully provisioned node (we already were, and it was mentioned in the opening ticket). It makes scaling horizontally REALLY PAINFUL.

The CNI's default reservation of IPs per node (30) cannot be lowered - so if you have a service node running and you want only a few pods on it for HA, well, sucks to be you.

Kubenet, until very recently, not working with anything - for example Application Gateway (AG), though AG is a disaster of a service by itself.

Various API failures related to networking - sometimes the control plane lost its connection to the AKS subnet for some time (it fixed itself, but still...).


GCP is not GKE. In my opinion the GKE offering from GCP is the best one right now.


I agree. Though I will say that IAP and G Suite-group-backed IAM are nice.


I kind of agree, after trying AKS, EKS and GKE. GKE blows everyone out of the water.


Anecdotal, I know, but for prospective Latacora customers this is absolutely not reflected in market share. It's AWS first, GCP second, Azure very distant third. I'd happily believe Azure is dominating in some segments where MS showers prospective customers with millions of dollars in credit, but IMO a blind person can see Azure does not have the product offering to warrant a "completeness of vision" that is right on the heels of AWS.


According to Canalys, GCP is at 6% cloud market share (in dollars), Azure at 17.4% and AWS at a bit over 32%.

https://www.canalys.com/static/press_release/2020/Canalys---...

AliCloud and Rackspace are very close to GCP as well.

That being said, if you're planning on running Kubernetes, I'd choose GCP over any other offering - the tooling and support just seems better, in my entirely subjective opinion.


Anecdotally and in my opinion, Azure is more complete than GCP. Between stuff like this and Google's product-dropping stigma, most of my customers (in the cloud consulting space) are trying to get into Azure. This is across every industry we work in (retail especially). I've come across 2 customers in 3 years of consulting that want anything to do with GCP.


> Azure is more complete than GCP

It has more features, yes. How well those features work is another matter entirely.


Can you give more details on this? Anecdotal is fine

I loosely follow AWS, GCP and Azure but I always get mixed opinions on them, especially the last two


I use only small subsets of Azure, but every bit that I do use leaves me with a feeling that I'm the first user of a minimum viable product.

To pick a random example that I'm familiar with: Azure DNS Zones.

When I used AWS Route 53, the main issue I had with it was that I thought the cutesy marketing name was stupid. That's about it. By reading through the docs, I learned a little bit about DNS I didn't know, and I got to learn about the clever engineering that AWS did to work around issues with the DNS protocol itself. In the end, it had more features than I needed, and the basic stuff Just Worked.

When I tried to use Azure DNS, their import tool shredded my data. I then wrote a custom PowerShell import tool, but it took hours to import a mere few thousand records. The next day my account was locked out for "too many API calls" because I simply had the console web gui open. Not used. Open. The GUI showed entries different to the console tools. The GUI string limits don't match the console tool. The perf monitor graph was broken, and is still broken. Basic features were missing, broken, or "coming soon".

You would think DNS would be one of those services that "just works", but nope. Bug city.

Now mind you, most of those issues are fixed now, and they're adding more features and fixing the issues those new features are introducing.

But ask yourself: Why are buggy features being rolled out in production? Did nobody test this shit? Did they ever do a load test? Did they even try basic things like "have the console open with more than 10 records"? Why am I discovering this? Do they not have thousands of customers who have battle-tested this stuff?

Clearly they're just throwing things over the fence and letting support tickets be their QA feedback.

PS: It's even how they use DNS themselves that's just wrong. E.g.: If you use Azure CDN you end up with like 6 CNAME redirects in a row. The DNS standard says CNAMEs shouldn't point to other CNAMEs! At a minimum this is slower than it needs to be, but it's also less reliable because there's more points of failure...


What essential product offerings is Azure missing that AWS has?

My experience is that once AWS offers a new service that gets attention, a few months later Azure also offers it - and vice versa.


For what it's worth, at Latacora we're seeing GKE being the main reason prospective clients use GCP, to the point where I'd be surprised if someone was on GCP but _not_ using GKE. Obviously that's a small subset of all companies, but GKE does seem like the goose that lays the golden eggs for them, at least insofar as they care about startup market share.

I also think of them as the second, inferior cloud, but they're almost certainly the better k8s hoster. If you're serious about running k8s on AWS, there's a good chance you're doing something like CloudPosse's Terraform-based setup, not EKS.


I've used both extensively; GKE is superior to EKS in every way... except on cost, it now seems.


> I don’t think GCP is fully aware of their position in the market as the second, inferior choice.

I'm sorry, but first choice is AWS, second is Azure (mostly because retailers don't like AWS), and third is actually GCP.


Thank you for your feedback; we understand this was a surprise to you and many others.

For a cluster-per-customer architecture, would you be able to look into https://cloud.google.com/kubernetes-engine/docs/best-practic... to see if there is anything useful? We understand changing the architecture isn't easy at all, and we'd love to know how we can help.


I'm curious whether you're comfortable with this move or not. Something about your tone gives the impression you think this was a strategic error.


thockingoog's responses below articulate my feelings very well.

https://news.ycombinator.com/item?id=22487726 https://news.ycombinator.com/item?id=22487110

What I can do is lower the bar for aggregating clusters with the investments we've made so far. Hope that makes sense.


> I don’t think GCP is fully aware of their position in the market as the second, inferior choice

Aren't they the third choice, after AWS and Azure?


For comparison, running an HA control plane in London on n1-standard-1 instances will set you back ~$93 per month. On top of that, obviously, you'd be figuring out how Kubernetes works and what the best configuration is, among other things. I don't agree with the blatant bait and switch, but the managed offering still works out way better.


It's $73, not $300, and the first zonal cluster is free.

I'm not sure what your use case is such that you would choose GKE and be worried about $300 per month infra costs.

For work we use GKE. For private stuff I use self-hosted k3s, and for our startup a super cheap DigitalOcean cluster.


> worried about $300 per month infra costs

That feels like the wrong attitude.


The salary of the people working on and using those 'tools' / this infrastructure is higher than $300.

If your Kubernetes cluster is part of your core infrastructure, then $300 more or less should not be an issue at all (not to say that I think $300 is nothing).

That does not mean you should waste money, but often enough, if you buy cheap and your hardware breaks and your time & materials cost much more than what better hardware would have cost, then you wasted money by buying cheap.

Unfortunately with IT products, there are certain things which are not directly visible, like how secure your product is. GCP offers 2FA, Digital Ocean does not. How much money is it worth to you to have your whole infrastructure protected by 2FA? For me, in a business context, no 2FA would be a no-go.


> GCP offers 2FA, Digital Ocean does not.

Digital Ocean definitely supports 2FA[1]

[1]: https://www.digitalocean.com/docs/accounts/security/2fa/


Indeed. I did not see it in my account, but those limitations are still a no-go.


$300 is more than a month's minimum wage in some countries.


Context is critical. You should be able to get my context. We are not sitting around a fire in the middle of nowhere; we are discussing things on HN.


> I don’t think GCP is fully aware of their position in the market as the second, inferior choice.

They are 3rd for me; I will take AWS or Azure long before I would take GCP. Hell, for some projects I would even take DigitalOcean or Linode over GCP.


For obvious reasons, this anecdote reminds me of the adage that one of the things you have to watch out for when saving someone from drowning is that their flailing about will put your own life in danger.


GCP is not second, but third behind Azure.


> their position in the market as the second, inferior choice

Who's the superior choice? EKS?


EKS feels like a product where Jeff Bezos walked into a room and said "you're all fired if you don't have a Kubersomethings service up in the next two weeks". The team did it and haven't touched it since.

The amount of management they make you do is amazing. Need to upgrade the kernel on your nodes? Fill out a five page CloudFormation workflow, and if you copy-pasted anything wrong, your cluster will just randomly work strangely for no reason. This is what they consider "fully managed"! (There are also lots of neat bugs, like if you create a cluster in the web interface while logged in with SSO, all credentials necessary to access the cluster will disappear in an hour. By design, apparently. So even though they have a web interface, you have to create a role account and use the command-line.)

It's really a wondrous product and shows how complacent you can be when you're in first place.


Okay, so what's the best option? It can't be them lol


I've been using a DigitalOcean k8s cluster. It's been pretty good. (A few kinks with configuring load balancers, but other than that it's good.)


> I don’t think GCP is fully aware of their position in the market as the second, inferior choice.

That's because they are aware of their position as a distant fourth place. They just (well, last quarter) allocated an additional $2 billion to try and get a foothold in the enterprise, shook up the management team, and are on a spending spree as well.

Because Google search is so shitty I couldn't find the $2B announcement (though it was on HN at the time), but here's a recent article on the general subject: https://www.fool.com/investing/2020/02/27/google-buying-way-...



Dumb plan - underdogs don't have billion-dollar connections. Worse plan because Google has a reputation for bad service.


> I took a bet on the underdog by using GCP

That's a concerning take on business decisions on two levels.


It's considerably cheaper than EKS. Looks like $75-80 a month vs I believe around $200 per EKS cluster.

Everyone's having EKS cost problems while I'm just sitting over here paying nothing for ECS control planes.


I am not sure how you get to $200/month for an EKS cluster - it is $0.10/hr

https://aws.amazon.com/eks/pricing/


I might be thinking of the numbers I put together for dev/stage/prod.

Turned out to be irrelevant. Following their instructions I couldn't get EC2 runtime hosts to attach after a couple hours of fucking around (which is my standard for 'is this mature enough to use'), while with ECS I hit one button and was up and running. It wasn't a hard choice when we started dockerizing (especially since I could simplify most jobs even further by having dev use Fargate, albeit at a premium).

EKS struck me as a feature-parity product, not something you'd actually use.


It's the same price as EKS... https://aws.amazon.com/eks/pricing/


Who would be the first choice: Amazon or Microsoft?


For K8s, I think the best alternative is AWS + kops. EKS also has a high fee for the control plane, and has a ton of shortcomings aside from that. AKS, at least the last time I used it, was much worse in terms of functionality. But I can spin up a kops-based cluster really quickly. It is capable of creating truly production-grade clusters.

If I'm running a small cluster, I can use a small VM for the master node(s) and get my control plane costs down much lower than $73/month.
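For a sense of scale, the happy path looks roughly like this (the cluster name, zone and instance sizes are placeholders, and you need an S3 state bucket and DNS set up beforehand):

    export KOPS_STATE_STORE=s3://my-kops-state   # placeholder bucket
    kops create cluster \
      --name=k8s.example.com \
      --zones=eu-west-2a \
      --master-size=t3.small \
      --node-size=t3.medium \
      --node-count=3 \
      --yes
    kops validate cluster

A single small master like that is where the control-plane cost dips well under the managed price, at the cost of running it yourself.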


Give a DigitalOcean k8s cluster a try. It's been pretty decent for me.


The projects I've been on that used GCP over AWS fell into two categories.

1). CEO with delusions of grandeur who thought that Amazon was a direct competitor to their business and should not be given money.

2). Projects that used Kubernetes.

Two is the only type of project which doesn't result in tens of millions of dollars wasted.


> 1). CEO with delusions of grandeur who thought that Amazon was a direct competitor to their business and should not be given money.

I mean, look at Netflix.


Netflix chose AWS because it was the only viable cloud at the time. (They used Azure for backups.)

In fact, Netflix helped build AWS. There were long-standing, daily scheduled meetings with AWS developers to review new features and versions.

In addition, the policy from management was that they were fine being mono-cloud, since if they changed their mind they had the engineering resources to migrate whenever it made sense.

Source: worked at Netflix.


Time to dump GCP then. It's not even that the fee is that large, but rather that this is once again Google failing on a long-term commitment and shafting those on their platform. This was one of the benefits pushed by their sales team when they called us up to market GCP over AWS and its EKS offering. It doesn't matter that they are price-matching; Google's inability to actually commit to long-term support, servicing, pricing or features across any of their products is tiresome. Time to move business elsewhere, to AWS or Azure. They may be more expensive, but at least we know what we are paying for, and that it's going to stay that way for a significant length of time.


If the main value of GKE over DIY is $73, you should totally DIY.

I mostly try not to be too Google-focused here, but I have to say...

I'm pretty proud of GKE, and I think it offers a lot of value other than just being cheap. Managing clusters is not always easy. GKE handles all of that for you - including integrations, qualifications, upgrades, and patching clusters transparently BEFORE public security disclosures happen.

We have a large team of people who deal with making GKE the industry-leading Kubernetes experience that it is. They are on-call and active in every stage of the GKE product lifecycle, adding value that you maybe can't see every day, but I promise you is there. When things go sideways, there isn't a better team on the planet to field the tickets.

I don't understand the anger here - you're literally saying you'd rather pay more for a service of lower quality because... why? Because they will continue to charge you more? Does not compute.

For those people who use a large number of small clusters, I understand this may make you reconsider how you operate. As a Kubernetes maintainer, I WANT to say that a smaller number of larger clusters is generally a better answer. I know it's not always true, but I want to help make it true. GKE goes beyond pure k8s here, too. Things like NodePools and sandboxes give you even more robust control.

GKE is the best managed Kubernetes you can get. And we're always making it better. Those clusters actually DO have overhead for Google, and as we make GKE better, that overhead tends to go up. As someone NOT involved in this decision, it seems reasonable to me that things which are genuinely valuable have a price.

Also, keep in mind that a single (zonal) cluster is free, which covers a notable fraction of people using GKE.


I believe everything that you say. The value it provides is very good.

If Google Cloud had charged $73 from the start (or after beta), I think there wouldn't be so much anger.

The anger comes from the fact that a product was free and now it is not. A lot of people made architectural choices that depended on a price of 0. (You mentioned these cases in your post.)

However, I believe the bigger issue is that Google Cloud essentially broke a promise.

As a customer I need to be able to trust my cloud provider, because I am literally helpless without it.

Can I trust an entity that breaks promises? No, I can't. I need to worry, especially if I cannot follow the reasoning behind it.

If it is true that Google's overhead went up because of improvements, then it would have been better to have two kinds of clusters (better and paid vs. old-school and free). You would not have broken the promise, and people could choose at their own pace to upgrade if they needed to.

Also keep in mind that you carry the Google brand. Hence, if other Google teams break promises (e.g. Stadia), this also reflects on the Google Cloud team. Unless you keep a crystal-clear track record, I need to assume it can get worse than what you have done right now.

My conclusion is that I will design the cloud architecture I am responsible for such that it has minimal dependencies on Google Cloud specifics.


> However, I believe the bigger issue is that Google Cloud essentially broke a promise.

Which promise?


> I don't understand the anger here - you're literally saying you'd rather pay more for a service of lower quality because... why? Because they will continue to charge you more? Does not compute.

This response, right here, is everything you need to understand about why Google Cloud is failing to sell to the enterprise market.

The enterprise market only really cares about one thing: rock solid stability. It doesn't care about features, and it doesn't (really) care about price. It wants a product that it can forget is there.

What's really sad is, technically, GKE is that product. It just works. It is solid. You do get to forget that it's there. Until you get a random email telling you that you get to explain to your boss that your bill is going up next month and your project might end up running over budget as a result.

If you can understand why a large segment of the market prefers to pay a higher but stable charge over a lower but undependable charge, then you can understand why Google Cloud is failing at selling to enterprise.


In addition, AWS charges only ever go down. I don’t think I’ve ever seen a price increase.


They also make sure to charge enough for services so it's always profitable to them (smart).


Just a data point:

I'm the CTO at a very small company. All our stuff is running on GKE. Our monthly bill tends to be a lot less than $10,000/mo. We're currently in the process of splitting our stack into separate projects and clusters, because co-locating projects in a single cluster has gotten messy. We'll probably end up with 4-5. That will increase our bill by $292/mo, worst case, assuming the first cluster is free. For a company our size, it's not a huge expense. But these things add up.

Since moving from DigitalOcean, our Google Cloud setup has more than doubled our monthly bill. We're paying for more compute, but certainly not twice the amount, as we've only gone from 14-15 nodes to around 20; it's just more expensive across the board, both node cost and ingress/egress. We're even cost-cutting by using self-hosted services instead of Google's; for example, we use Postgres instead of Cloud SQL. I ran the numbers earlier today; the equivalent on Cloud SQL would be 3.4 times more expensive.

In short, Google Cloud is expensive, and it's not like the bill is getting smaller over time.

Developments like these factor into my choice of cloud provider for future projects.


Not sure what size a "very small company" is, but I'm just curious as to why you chose GKE. I make tech decisions for a (probably) much smaller company, and I found things like App Engine Flexible Environment, Cloud Run and Cloud Functions let me do much of the stuff I can do with k8s but with much, much less complexity (at least on my side of things). The main factor is that I don't have a full-time infrastructure expert, and my experience in the past is that k8s essentially requires that.


We have about 200 long-running pods right now. On Cloud Run, that would cost us more than $12,000 in CPU alone, and that's excluding memory and request costs.

That also excludes stateful apps like Elasticsearch that would not be able to run on Cloud Run. Not sure what Google product is appropriate there.
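Rough back-of-the-envelope behind that figure, assuming one vCPU per pod allocated around the clock at Cloud Run's list price of roughly $0.000024 per vCPU-second:

    200 vCPU x ~2,592,000 s/month x $0.000024/vCPU-s ≈ $12,400/month

Memory and request charges would come on top of that, which is why always-on workloads tend to favor plain nodes.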


So let's say you have a small pile of apps that are all <app_server>, <redis>, <jobqueue/workers>, <db>, <frontend>, but not all the same DB, language, etc. They're all low-traffic and you want to automate/simplify and containerize them. On developer machines docker-compose works great, but you need to deploy to a cloud provider. What choice do you have other than K8s?


AWS Lambda with SAM local?


Less than 15 employees. Several products, two teams, <= 20 nodes.

We migrated our stuff from DigitalOcean around 2018. At the time, we briefly toyed with the notion of self-hosting Kubernetes on DO, but it's complex to manage, and we don't have any dedicated ops staff. GKE is significantly easier to manage.

At the time we migrated, the things you mentioned weren't available/mature, I think. Even today, I'd choose Kubernetes over a complex mishmash of different systems. I like the unified, extensible ops model. In fact, I'd go so far as to say that I wish all of GCP could be managed as Kubernetes objects.


Re. managing GCP as Kubernetes resources: https://cloud.google.com/config-connector/docs/overview
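For the curious, once Config Connector is installed, GCP resources become CRDs you can kubectl apply like anything else. A rough sketch (the bucket name is a placeholder, and the apiVersion should be checked against the current docs):

    apiVersion: storage.cnrm.cloud.google.com/v1beta1
    kind: StorageBucket
    metadata:
      name: my-team-artifacts   # placeholder; becomes the GCS bucket name
    spec:
      location: EU

The controller then creates and reconciles the real GCS bucket to match the manifest.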


That's very cool, thanks! Note that this allows selectively creating Kubernetes resources backed by GCP resources. Looks like it will not automatically sync everything that already exists, which seems like a missed opportunity.


But DigitalOcean has managed K8s now: https://www.digitalocean.com/products/kubernetes/


DigitalOcean did not have Kubernetes then. Are you suggesting we should spend 6-12 man months migrating back?


How about contracting an ops-oriented person for a month who would do the migration for you? Where do those cost functions intersect?


Would never happen. Just the amount of time needed to dedicate to onboarding a temporary contractor would be really disruptive to the developers, not to mention the disruptive effect of the technical migration — databases to move over, persistent volumes to copy, DNS to repoint, lots of downtime, etc. There's a good reason companies don't switch clouds often.


If it takes more than a month to migrate a 20-node K8s cluster, then that's a red flag. Too much tech debt or a strong vendor lock-in? Either deserves attention.


Doing the migration might take a few late evenings; deploying all our apps and Helm charts to a new cluster takes just a few commands. Learning what needs to be migrated, deciding on how the configuration should look on the destination end of it, and designing a detailed plan and checklist for the whole process is the big task. Add the onboarding of someone who's never seen the inside of your company and you're talking about a longer, more disruptive process. Tech debt isn't a factor here.


> Learning what needs to be migrated, deciding on how the configuration should look on the destination end of it ...

Sounds like you've identified an area of risk you should address


No, my mistake - DOKS came out in late 2018.

I had been using it since May 2018 but it didn't come out of early access till December.


Indeed, for small apps that average 2 instances or fewer during the month, it's now cheaper to run App Engine Flex than GKE. Each instance costs about US$50/month in us-central.

For those that are not aware, App Engine Flex runs services based on Docker containers with autoscaling, similar to Kubernetes. It has way fewer features than GKE, but if you are just running a standard app that only needs to connect to a database, that is more than enough.

Bonus points: you can have multiple services, like a back-end and a front-end, and make them available on the same subdomain to avoid CORS problems. You can also host your front-end for almost nothing with App Engine Standard. It has an awesome built-in CDN if you know how to use it.


For sure, though I will say that if you're trying to cut costs, it's my understanding serverless is quite cheap. So if you can turn some of your services into serverless containers/functions, I'd highly recommend it.


How much would it cost for you to provision and colocate your own hardware, run a k8s cluster, and manage upgrades?

You might not be at the scale where this is feasible yet since that's probably multiple full-time engineers, but eventually the cost functions intersect.


Well, we don't even have a dedicated ops person.


> I don't understand the anger here ... Does not compute.

The saying "the market's perception is your reality" is especially apt here. Google's decision makers tend to forget that in the end they are dealing with human customers, not machines. Contrary to the concept of economic rationality, humans are notorious for exhibiting behavior that, to the untrained eye, appears irrational.

A commenter helpfully explained their perception of the new pricing change:

"The anger comes from the fact that a product was free and now it is not. A lot of people made architectural choices that depended on a price of 0. (You mentioned these cases in your post.) However, I believe the bigger issue is that Google Cloud essentially broke a promise."

IOW, from their perspective, the pricing change was framed as a loss [1], which opened up the host of negative emotions (anger, mistrust, etc.) that come with mitigating an imminent loss.

Google as an engineering company may look down on fields like psychology or behavioral economics, but if they genuinely want a fighting chance against AWS and Azure, they will need to court sales leaders with a strong humanities tinge, to avoid these kinds of decisions that achieve the opposite of the intended effect - eroding people's trust in GCP.

[1] https://en.wikipedia.org/wiki/Loss_aversion


>If the main value of GKE over DIY is $73, you should totally DIY.

It's not the fee itself, it's the worry that GKE will do what Google Maps did and massively increase fees with very little notice, causing people to scramble to migrate.

Google has a really bad reputation right now when it comes to cancelling projects that people have built their businesses upon, or jacking up fees quickly. The $73 is irrelevant on its own - the issue is (a lack of) customer trust.


Google and Google Cloud are largely different businesses, though I understand it's hard to keep that in mind in the context of things like this.

I encourage everyone to always stay nimble and keep your eyes on portability. I also encourage you to try to assess the REAL costs of doing things yourself. It's rarely as cheap as you think it is.

As a Kubernetes maintainer, I am fanatical about portability.

As part of the GKE team, I think we provide tremendous value that people tend to under-estimate.

NOTE: I was NOT involved in this decision, but I understand it, and I want to help other people understand it.


If you brand it as "Google," you can't expect the positive associations to transfer and the negative associations not to transfer. That's not how it works. You get both or you get neither.

Keep in mind that you aren't selling to me, you are selling to middle management who hears "it's different we promise" and then goes home to have nightmares about their rival manager piling on: "GCP canceled a service from under you? Who could possibly have seen that coming? Oh, the salesman told you it wouldn't happen, my mistake. (Everyone laughs at manager's stupidity.)"

GCP needs to give these guys ammunition. AWS burns goodwill like they've got a city to light: surprise bills, abandoned (but technically not canceled) services, poor performance, sticky abstractions, shameless grand announcements of services that upon further investigation only exist in the sense that you can ask support and wait 5 days (rDNS), etc. Broken software and subtle (or overt!) killer caveats abound. Go with AWS, we scale to the moon! Oh, "the moon" is >5GB of data through our time series ML? Lol no. Hard limit. Oh, we let your buddies exceed that limit and we're advertising that you can exceed the limit? Lol -- not our problem. Yet nobody holds them accountable. Nobody gets fired for choosing AWS, because AWS over-promising and under-delivering is not a meme. It's reality, especially by Google standards, so GCP marketing could make it a meme if they had any sense about them, but as far as I can tell they aren't interested.

If GCP stays the course, the results are 100% predictable and frustrating as hell, because AWS is a real steaming pile and I hate to see them continue to win The Game because Google, of all companies, can't figure out how to advertise and manage their reputation.


Google and Google Cloud are not largely different businesses to the world at large. That may be true internally, but what you do reflects on the other.

Aside from that, Google has a reputation for pulling this shit, and now Google Cloud does too.

The value you provide is irrelevant to how you make people feel with decisions like this.


Sorry that you have to work with the low-IQ people who made this decision. I've followed your work on k8s - thank you, it's better because of people like you and others.


I don't think that is at all a fair characterization; you just don't have the same data available to you.

Thanks for the props. It means a lot to me personally.


It's not about whether it's worth it. I have no idea if $73/mo/cluster is more or less than what GKE is worth, but you can't ignore the time-varying aspect of this change. If GCP had started at $100/mo and just announced a price drop to $73/mo, your customers would probably be cheering, even though you'd be charging them the same! They'd be happy about being loyal to you, thankful that your offering gets better over time without asking, etc.

Let me put it another way. Lunch perks at Google are probably worth way more than $73/mo. Googlers would probably not feel too great about having to start paying that amount... Even though it's also worth it, people working hard to prepare food, etc.

It happens time and time again that GCP paints itself in a corner with an unsustainable offering and finds itself in a situation where it has to dent customer trust by hiking prices or retiring products on short notice.

Why?

GCP is in no position to pull this off, competitively. Now doesn't seem like the time to pull unit margins up. Google could do so on ads or with Maps API because it has the best product by far in those areas. But not in Cloud, a market where stability is a feature, and GCP is a late, small player against larger competitors who are giving customers a much more stable product.

It doesn't matter if GKE is the best thing since sliced bread, and keeps getting better — as the CTO of a cloud-based business, you have to decide whether you want to build on it over the long run and take the risk of it becoming super expensive over time; there's not a lot of reassurance that this is even a concern at Google at this point.


> Now doesn't seem like the time to pull unit margins up.

Tbf, you don't have the data to say that. Maybe GCP hit the growth targets they had and are not interested in growing more; maybe they've grown so much that service quality is going down; maybe they want to stall and build up other parts of the offer... there are a lot of reasons why they might want to cash in. Whether they could have foreseen it happening at this specific point in time, and how they managed the communication, is another matter. As others said, Google is saddled by default with a perception of commercial unreliability among the tech-savvy, so any new service they launch should factor that issue in terms of long-term optics.


> Maybe GCP hit the growth targets they had and are not interested in growing more; maybe they've grown so much that service quality is going down; maybe they want to stall and build up other parts of the offer...

No, they are on a death march for market share. https://www.crn.com/news/cloud/google-reportedly-set-ambitio...


> If the main value of GKE over DIY is $73, you should totally DIY.

I'm with you here, and the price feels about right (looking at GCE prices)--it doesn't feel like it's a ripoff or someone's trying to squeeze out more revenue.

> I don't understand the anger here

This is Google being Google. The fee probably makes sense, but as flawed humans, we're subject to the endowment effect and feel like Google's taking something from us. Amazon's all about the customer. This move is all about accurately pricing services, and it's Google putting correctness before emotions.


Shouldn't the investment in Kubernetes bring its running cost down instead of increasing it?


The problem is you are charging more without providing a clear case that the value is worth it. It looks like a bad deal from an existing customer's viewpoint.

$73/month is not a lot, but it's still going back on the original value proposition. If it's such a small amount, absorb the cost as a cost of doing business, so you can retain customers better and grow new ones.


> As a Kubernetes maintainer, I WANT to say that a smaller number of larger clusters is generally a better answer.

I'm not trying to nitpick here, but that justification is awful. It goes against reliability engineering at a deeply fundamental level and pretty much guarantees that already-not-that-reliable things become even less reliable. Generally, the more isolated the entities and the smaller they are, the less they affect each other and the environment when something bad happens, the faster they can be recovered, and the fewer end users they affect. If I remember correctly, this is even how some Kubernetes people justified the ideas behind Kubernetes itself, ideas that you now want to drop.


That is not absolute truth. If it were you would eschew kubernetes altogether and just use VMs.

Everything is a tradeoff. If you want total isolation, you pay for it. If you don't want to pay for it, you make more value-based tradeoffs.

Concretely, Google runs "a handful" of "pretty reliable" services on a relatively small number of clusters.


Thank you for the feedback.

> Google's inability to actually commit to long term support...

This is _exactly_ what Google is doing in this case. We are providing an SLA - a legal agreement of availability and support. These changes introduce a guaranteed availability of the management control plane.


He means that the pitch from all the GCP sales guys was that there was no charge for that. 99.95% is not enough, IMO, to justify charging $73/mo.

As someone else noted, it breaks a lot of recommended architectures where you would have auto provisioning and a lot of clusters to separate concerns and keep costs down.

Finally, the pricing changes are starting to look like a pattern, every time Google deems the usage of a product is good enough, they will increase the price.

They are the Ryanair of the cloud.

Edit 1: Moreover, it will increase the cost of Composer, and on top of that, the cost of the recommended pattern where Composer is paired with a Kubernetes cluster for executing the workloads.


> They are the Ryanair of the cloud.

Isn't Ryanair literally the Ryanair of the cloud(s)?


EKS only gives you 99.9% uptime, and I'm uncertain as to whether you could achieve more than 99.9% uptime on your own by DIYing your cluster in a public cloud provider without doing multi-region.


To put that in perspective, three 9’s allows you about 9 hours of downtime a year, which will certainly require multi-region and a dedicated ops team.

Two and a half 9's is a whole different story. We kept it to about 20 hours of downtime last year, even without HA, on bare-metal k8s in Alibaba Cloud. But I'm uncertain whether that's a feat we can repeat this year.
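
For anyone who wants to sanity-check those numbers, here is a quick back-of-the-envelope conversion (assuming a 365-day year):

    # Convert an availability SLA into allowed downtime per year.
    HOURS_PER_YEAR = 24 * 365

    def allowed_downtime_hours(availability_percent):
        return HOURS_PER_YEAR * (1 - availability_percent / 100)

    for sla in (99.9, 99.95, 99.5):
        print(f"{sla}% -> {allowed_downtime_hours(sla):.1f} hours/year")
    # 99.9%  -> ~8.8 hours/year ("about 9 hours" above)
    # 99.95% -> ~4.4 hours/year
    # 99.5%  -> ~43.8 hours/year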


> Finally, the pricing changes are starting to look like a pattern, every time Google deems the usage of a product is good enough, they will increase the price.

To be fair this is hardly new and by no means limited to Google. Any number of SaaS startups that have survived to at least moderate success have done similar things.

Look at UserVoice as an example: started out with a free tier plus some reasonable paid tiers with transparent pricing, then a year or two back killed the free tier and moved to a non-transparent "enterprise" pricing model with absolutely exorbitant fees.

Plenty of other companies offer free to build their userbase and reach, then either water down the free tier, or remove it entirely. It's practically the SV modus operandi for the last decade.


>Any number of SaaS startups that have survived to at least moderate success have done similar things.

Google is not a startup, it is one of the largest companies on the planet.


I don't see how that's relevant: my point is that it's a tactic employed by a wide range of businesses including but not limited to startups and we shouldn't be surprised to see it here. It's not a pattern that has suddenly emerged.


Shouldn't that be opt-in? The management control plane is not something we consider critical to operations. I'd happily accept it being unavailable for a minute and a half a day rather than paying these additional costs.


That's great feedback. I'll relay that to the product team. IANAL, but I think it would be legally challenging.


Hard to understand how it would be legally challenging. ISP's do it all the time when differentiating their business plans from residential. Both services run over the same infrastructure and you typically get the same/similar speeds, but a key difference is an SLA with the business plan.


IANAL either, but I don't see why it would be? Just have a separate cluster type, e.g. SLA Zonal, SLA Regional. The SLA already differentiates the current cluster types, and Anthos clusters are also not subject to any additional fees, are they?

And having it opt-in would save face with those users of GKE for whom an additional $73/mo is significant.


Opt-in for the SLA and additional cluster cost would be fantastic. We run pretty small clusters but don't need any additional SLAs on top of what's already provided. Frankly, we couldn't care less about the control plane SLA.


By the way, I would prefer something like a cost reduction if the cluster runs 24/7. Currently we also do not need that level of SLA. We actually have a single cluster and are a really small customer that chose GKE precisely because there was no management fee (the nodes are already really expensive compared to non-cloud providers). We have never used more than one REGIONAL cluster (we have also never spun up a new one; we only change workers). And now it will cost us money. What a shame.

P.S.: The German sites have the pricing wrong.


Sure, in this case I can see that. I was referring to those four points with respect to Google services in general. I'm sure I don't need to dig up a list of features and services that have been merged, shuttered, price hiked or moved into a different product suite over the years. Admittedly a lot of the issues are with the GSuite side of things, but it's sad to see this coming to GCP as well.

On a hopefully more constructive note, if this is the way it's going to be from now on, I would at least expect to see an exemption from such a management fee/SLA for preemptible nodes - having an SLA and management fee on a cluster whose nodes can be killed in a 30-second window without prior warning seems more than a little pointless.


Even if your worker nodes are pre-emptible, the master nodes are not. The management fee covers running those master nodes and many other core GCP integrations (like Stackdriver logging and other advanced functionality). Billing is computed on a per-second basis for each cluster. The total amount will be rounded to the nearest penny at the end of the month.
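
Roughly, the accrual described above works out like this (an illustrative sketch only, not the actual billing code):

    # Illustrative: per-second accrual of the $0.10/hour management fee,
    # rounded to the nearest penny at the end of the month.
    FEE_PER_HOUR = 0.10
    FEE_PER_SECOND = FEE_PER_HOUR / 3600

    def monthly_management_fee(cluster_uptime_seconds):
        return round(cluster_uptime_seconds * FEE_PER_SECOND, 2)

    # A cluster that runs for a full ~730-hour month:
    print(monthly_management_fee(730 * 3600))  # -> 73.0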


> We are providing an SLA - a legal agreement of availability and support.

Do I still have to pay the bill first, fill out forms, get account managers involved, at some point receive a partial credit, and repeat this until the delta between what I expected as the SLA credit and what I got as the SLA credit is less than the cost of the time to fight for another cycle?


What? A $73 price difference from AWS was their main selling point?


It depends entirely on how a firm has its infrastructure set up - if you have small cluster(s) per client for isolation/compliance purposes, then with, for example, 250 clients, each cluster running for a full month (~730 billable hours), you end up with:

250 * 0.1 * 730 = $18,250/month.

This is quite the price hike for something that (a) was free until now, and (b) is not a service that warrants such a fee, given that it uses existing (GCE) resources and can frankly be done manually by one of the DevOps engineers with a few hours a month and some scripting. It seems to be just a charge for convenience.
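
To make the arithmetic explicit, here is a rough sketch (the client count and hours are assumptions for illustration, not anyone's real bill):

    # Rough fleet-cost estimate for the new per-cluster management fee.
    FEE_PER_CLUSTER_HOUR = 0.10
    HOURS_PER_MONTH = 730          # approximate average month
    clusters = 250                 # e.g. one small cluster per client

    monthly_fee = clusters * FEE_PER_CLUSTER_HOUR * HOURS_PER_MONTH
    print(f"${monthly_fee:,.0f}/month")  # -> $18,250/month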


You keep noting how "easy" it is to provision and manage a Kubernetes cluster. From experience, properly securing and maintaining a Kubernetes cluster is a multi-person full-time job.


We already do provision clusters, using the aforementioned tools. There is some setup involved, but once done, provisioning and upgrade is relatively simple. Indeed, we used to exclusively provision and upgrade via Terraform/Ansible. When we started using GCP, any data that could be stored by a US company without causing compliance issues was offloaded to GCP over other providers due to the auto provisioning/management at no cost.

If you guys find it hard to maintain/upgrade clusters, that's your business. All I am saying is that as a company giving you business, with this change, you are now no longer the cheapest, most reliable or most convenient. As a result, we will be moving to provision instances with other providers from now on.


No, the $73 was a selling point of GCP. You could easily spin up different clusters at no additional costs (at least for the cluster itself).


Never trust free without an escape hatch.


> Last modified: November 27, 2018 | Previous Versions

> As of November 28, 2017, Google Kubernetes Engine no longer charges a flat fee per hour per cluster for cluster management, regardless of cluster size, as provided at https://cloud.google.com/kubernetes-engine/pricing. Accordingly, Google no longer offers a financially-backed service level agreement for the Google Kubernetes Engine service. The service availability of nodes in a Google Kubernetes Engine-managed cluster is covered by the Google Compute Engine SLA at https://cloud.google.com/compute/sla.

> Uptime for Google Kubernetes Engine is nevertheless highly important to Google, and Google has an internal goal to keep the monthly uptime percentage at 99.5% for the Kubernetes API server for zonal clusters and 99.95% for regional clusters regardless of the applicability of a financially-backed service level agreement.

https://cloud.google.com/kubernetes-engine/sla

Interesting that they're now walking back from this...


Google, walking back their commitment to software? gasp



Good time for people to look into DigitalOcean managed Kubernetes (DOKS). I've been using it since it was in pre-release and it's been great so far. Their support has been very responsive as well.

https://www.digitalocean.com/products/kubernetes/


Take care with DigitalOcean's Kubernetes offering. Unless they have completely revamped their entire stack since their initial release, it's a security nightmare. The way they launched it made it unacceptable for any company to use. I immediately migrated my projects away and closed my account upon understanding how abysmally they screwed up the security of that offering. The fact they were willing to launch that means they cannot be trusted to host anything.

Not only are your cluster's administration ports exposed on public addresses with no ability to firewall them, but each node pulls an admin-level auth token to manage the DigitalOcean account behind the scenes. A single HTTP request made to an internal IP, from within any Docker container running within the cluster, grants the attacker full read/write access to, and control of, the underlying DigitalOcean account. This includes any developer who does so from code deployed into the cluster, giving that developer full access to the business's DigitalOcean account.


Are you referring to this? https://www.4armed.com/blog/hacking-digitalocean-kubernetes/

I'm curious as to whether that has been fixed and a proper security evaluation done. I've been avoiding k8s on DO for a while because of that (perfectly comfortable with their other services) but it would be good to get an update.


Yeah, that's the one. I'd done my own analysis before that came out and figured it was an unacceptable risk, and then that article came out with the actual attack vector in all its glory. The simple fact the k8s/etcd ports are exposed on a public address with no ability to firewall it off is bad enough, as you're relying on the security of the software running on those ports rather than a firewall restricting which source address(es) can even connect to begin with.

The credentials (certificates) being exposed via http://169.254.169.254/metadata/v1/user-data – from within any pod/container, not just from a physical node – was the final straw. I'd forgotten that the DO token wasn't directly listed there, but can be extracted from the etcd instance where it is stored (explained under "DigitalOcean Account Takeover" of that article).

Again, all of this may have been (and hopefully has been) mitigated since the original release. For me, it's too late to reevaluate; the fact that was considered releasable in the first place destroyed any credibility in my eyes.
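
If you run clusters there and want to check for yourself, a quick probe from inside any pod is enough; this is a minimal sketch assuming Python and the requests library are available in the container, using the metadata URL from the article above:

    # Probe the instance metadata endpoint from inside a pod.
    # A 200 response with a body means workloads can read the node's
    # user-data, which is where the sensitive bootstrap material was exposed.
    import requests

    METADATA_URL = "http://169.254.169.254/metadata/v1/user-data"

    try:
        resp = requests.get(METADATA_URL, timeout=2)
        if resp.status_code == 200 and resp.text.strip():
            print("metadata endpoint reachable from this pod; review your network policies")
        else:
            print(f"endpoint responded with HTTP {resp.status_code}")
    except requests.exceptions.RequestException:
        print("metadata endpoint not reachable from this pod")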


Another happy DO user here. I wanted a k8s cluster for my smaller project when they just started the beta.

Very happy with it; the resources available also helped a lot in actually understanding k8s. On GCP, the line between what was gcloud and what was k8s was a bit blurry for someone new like me, and doing just k8s on DO helped me really 'get it'.


I am using this as well for two smaller projects, and it has been amazing. I tied it into GitLab Auto DevOps in about 30 minutes, most of which was spent migrating to DO managed Postgres.


This strikes me as an attribution problem. Within large companies, business units are treated as standalone financial entities, so you are either making money or you are a cost center. I think what could be happening here is that the Kubernetes group is generating what look like un-paid-for costs, instead of being back-credited for ALL of the compute it is selling via the worker VMs. I would argue that people come to GCP for GKE, which means GKE should be seen as a revenue generator; in that light, it makes sense to keep the control plane open and 'free' as a loss leader to grow GKE as quickly as possible. This decision is bad for business.


Well then this is perhaps a big failure by the execs at GCP. It’s making the individual teams look good at the cost of unhappy customers.

Sometimes I think Google has phenomenal engineers and a massive ad cash cow from search, YouTube, etc. However, when it comes to business units that need to hand-hold their customers and need human interaction, they are pretty terrible. Empathy is perhaps a skill that's not important to them when they are hiring.


It would be cool if service teams were credited a referral fee for all future consumption, platform-wide, by consumers that started with their service. As a bonus it would incentivize each service team to prioritize easier onboarding.


We moved our cluster into Hetzner Cloud for Emvi [1]. Yes, they don't offer a managed solution, and it took us about two weeks to set up and test the cluster properly. But the cost is less than 1/6 of what gcloud costs us. If you have the resources and knowledge to maintain your own cluster, check it out. They have insanely good pricing (€2.96/month for the cheapest instance, which is faster than Google's $15/month VM).

Here is a very good tutorial on how to set up your own cluster: https://community.hetzner.com/tutorials/install-kubernetes-c...

[1] https://emvi.com/


Hetzner is solid from a perf / price perspective. Mind that the network peering outside Europe is not that great, so it could be a very good choice depending on your user base's location.


True. In our experience it's fast enough. Additionally, traffic is free unless you hit 20 TB of outgoing traffic per node (which will probably never happen to us). In contrast, gcloud charges about 12 cents per GB of outgoing traffic (!!!).


Do you use any special solution like Rancher or kubeadm?


kubeadm


After this, I've been exploring other places to host our team's clusters... copied pricing below

- EKS: $0.10/hour/cluster

- Digital Ocean: Free (only charges for the nodes)

- Azure: Free (only charges for the nodes)

In the long run we'll probably try and build our stack on vendor-agnostic tools..

- Rancher - https://rancher.com/products/rancher/

- Infra.app - https://infra.app (mentioned a few weeks back on the Kubernetes podcast)

- Prometheus https://prometheus.io/ - metrics

The cloud providers all include their own tooling (logging, monitoring) built-in but I'm worried this will only lock us on to further price increases.. has anyone found a good vendor-neutral logging system? We don't really want to use ELK stack right now since it's really heavy and costly to run...


OVH also offer a free control plane, but their service is relatively beta so far.


We've been running our own cluster on EC2 nodes built with kops and it's worked well so far. As for logging, which part of ELK is heavy? You can use the cloud operator to run it from within your cluster (https://www.elastic.co/elastic-cloud-kubernetes). We've also switched to https://vector.dev as a more lightweight alternative to filebeat/logstash.


One more shameless plug for my startup, https://kubesail.com (YCS19)!


Shameless plug: https://www.kubermatic.io Disclaimer: I work at Loodse (the company behind kubermatic)


www.scalyr.com Fastest, Cheapest, Easiest w/ no indexing



Honestly, I understand the hard work it takes to manage all the clusters, but this was a total bait and switch, and it hurts the reputation that everyone has of Google Cloud. Telling us to DIY because we cannot pay $73 just sounds like something someone who works at Google would say, and you do work at Google.

The sentiment with my clients before was that Google Cloud was a great choice because of the security and expertise with GKE. It's also free!

Meanwhile, in the back of my head I've always had this fear because of your reputation that you do not keep your promises and that you do not care about your users. Because of this fear, we have tried to make every infrastructure decision not use a managed service by Google even though it may be easier to do so short-term.

For the product I'm working on, we decided to use Kubernetes just in case you baited and switched us with the reputation you have. In terms of monitoring, we really wanted to use Stackdriver, but now we're 100% using fluent-bit + prometheus + loki + grafana. It's the only way to protect ourselves from your reputation which is becoming a reality.

So yeah, this is pretty sad and a bad decision. Should have priced GKE at $70 / month to begin with and we would have been fine with it. Now we're (actually) looking at EKS since Amazon doesn't seem to have this reputation and you've spooked us. We never would have thought about using any other provider until today.


I understand the emotional response here, but I don't think it's rational. GKE has to work as a business, or else the whole thing is in trouble.

I think GKE provides tons of value, but people tend to under-estimate that. In order to keep providing that value, we need to make sure it is sustainable.

I'm really, truly sad that you perceive it as bait-and-switch, but I disagree with that characterization. If you want to move off GKE, I'll go out of my way to help you, but I urge you to take a big-picture look at the TCO.


To be fair, it's unusual for a product at this scale to go from free to paid. It's also unusual for it to happen to a product which already went from paid to free once before.

I don't agree with the parent that it's a bait-and-switch, but I also don't think what's happening is an emotional response. For many people and companies, clusters being free have been a feature of Google Cloud. Making it a paid feature completely changes the dynamic.

It's an unexpected announcement that will further sour sentiment about Google as a company. It's really hard to build trust in this industry, and it's really easy to lose it. Google has this thing about announcing changes that blow up negatively on HN, and it could learn from this.

(For the record, I'm a big fan of Kubernetes, and I like GKE a lot.)


This kind of mentality is why Google is struggling. You forget that your customers are human and make emotion-driven decisions. This price increase proves that you are not making sustainable long-term decisions and that you are willing to dump the cost of that mistake on your customers.

We already don't trust Google to provide long-term, stable, reliable infrastructure and each time something like this happens, we become more convinced that Google isn't trustworthy.


I think part of the optics issue is that your peers seem to be offering similar services for free while remaining sustainable.


EKS has always had a fee.

AKS, well, I don't have any insight into their business, but I have my suspicions.


Oh wow, one of the biggest reasons we picked Google Cloud was that you did not have to pay a flat fee for their managed Kubernetes service. Luckily there is Kubernetes support across all Cloud Providers so we're happy we're not vendor locked in. (biggest reason we picked Kubernetes in the first place.)

We were thinking of using Stackdriver for logging, but we were scared of vendor lock-in due to price increases or other changes that we've been warned about with Google. In this case, I think it's safe to say we'll be using Prometheus + Grafana + Loki instead, since there may be a random Stackdriver flat fee introduced, or some other weird fee, and we may need to migrate out of Google.


Stackdriver is terrible, and super expensive for what it is. We ran a dedicated Fluentd in GKE for a long time to work around its shortcomings (GKE also uses/used Fluentd to shuffle logs into Stackdriver), then switched to using Loki + Promtail + Grafana, which has been excellent.


+1 for Loki + Promtail + Grafana. Really low maintenance once it’s set up.


Stackdriver is indeed horrible.


The last place I worked, with a couple of petabytes of monthly Stackdriver logs and a full embrace of almost every GKE/GCP tool, also switched to Prometheus + Grafana due to a lack of functionality within Stackdriver. I think you're making a good choice.



"Let's trade goodwill for short term profits." I suppose not really surprising given the maps fiasco and Oracle appointment but this really comes off poorly to me.

Flat fee is gonna suck for people running a lot of clusters. I bet there are some people out there spinning up a cluster per x who are going to be real unhappy about this.


I know the HN crowd hates this sort of thing, but it seems reasonable to me. $70/month is a very reasonable price, and most businesses likely have well under 10 clusters (I'd think many just have 2-3: a single cluster for prod and then 1-2 dev/test type environments). This is probably mostly to cut down on edge-case users who are spinning up crazy numbers of clusters for weird reasons and costing Google a bunch of $$$.


Furthermore, anyone spending enough on compute to warrant k8s shouldn't balk at all at $70/mo. I think the threshold for introducing the complexity and overhead of k8s probably isn't until at least $5-10k/mo of spend (and probably 3-10x that in the normal case). Less than that and k8s is a whole lot of overkill.


We use Google Cloud projects to isolate customers and environments (some of our clients are old school and VERY scared of cloud and multi-tenancy). So for our pretty small company, we have >60 projects, each with a k8s cluster. That is a pretty good bump in costs come this summer.

Historically, we have powered down all the compute in a project that isn't needed, but left the k8s cluster in place (with its compute nodes powered down) because we could then bring it back up in 2 min or less. That dropped our cost for each powered-down project to ~$25-$50/month, depending on how much disk space, etc., it was using. This will more than double (or triple) the cost of those projects. If we have to rebuild a full cluster from scratch, we then have to wait for the global load balancer to build our ingress, and then go authorize a TLS cert. That adds 20-30 min to re-activating our projects, which will suck.
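
For what it's worth, here is roughly how the power-down step can be scripted; this is a sketch assuming the google-cloud-container Python client, with hypothetical project/cluster names, so the exact field names may need checking against the current library:

    # Sketch: scale every node pool in a cluster to zero so only the
    # control plane is left running (hypothetical names below).
    from google.cloud import container_v1

    PROJECT, LOCATION, CLUSTER = "my-project", "us-central1-a", "customer-cluster"

    client = container_v1.ClusterManagerClient()
    cluster_path = f"projects/{PROJECT}/locations/{LOCATION}/clusters/{CLUSTER}"
    cluster = client.get_cluster(name=cluster_path)

    for pool in cluster.node_pools:
        # In practice you may need to wait for each resize operation to
        # finish before issuing the next one.
        client.set_node_pool_size(
            request={
                "name": f"{cluster_path}/nodePools/{pool.name}",
                "node_count": 0,
            }
        )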


Perhaps the right thing for GKE to do is introduce a cluster snapshot. Sounds like a great feature.


That would actually be pretty awesome.


I do think that $70/mo per cluster is reasonable, but don't dismiss the value of k8s even for small projects. I once brought a project from Ansible to k8s, and even though it used only 2 nodes (~$200/mo), the tooling, the abstraction, and the snappiness of GKE were very much worth the switch.


For two nodes you need neither ansible nor k8s.


Time to start looking into DigitalOcean more seriously.

G Cloud is already unreasonably expensive and nearly impossible to price manage. It's cool to see them double-down on that.


For the last 7 years I have been running on DO and never had any issues. I never understood why it is looked down on. In fact, I have faced many issues with AWS (particularly with their old hardware). In one case, our EC2 instance was rebooting frequently. The AWS team didn't acknowledge any issue on their end, and after a few weeks asked us to upgrade the instance because of bad health.

In my experience, AWS is a very expensive cloud with a clunky UI and a big brand name. During consulting gigs, I have seen many customers who want to go with AWS only because of the brand. And later they cry when the bills start to hit the roof and they are locked into the vendor.


I recently tried to spin up a VM for my own use in AWS, but I had to request a limit increase because I wanted a beefier machine. Easy peasy, right? My experience was comically bad.

====================== First email from AWS (several days after my request): ======================

Thank you for submitting your Limi Increase request.

I'm contacting your to inform you that we've received your Workspaces Application Manager - Total Products limit increase request, for a max of 5 in the Oregon region. I will be more than happy to submit this request on your behalf.

Please note that for a limit increase of this type, I will need to collaborate with our Service team to get approval. This process can take some time as the Service team must review your request first in order to proceed with the approval. This is to ensure that we can meet your needs while keeping existing infrastructure safe.

You may rest assured I will push towards expediting your request to be addressed as soon as possible. As soon as the Service team contacts me I will definitely let you know by email.

In the meantime, please feel free to let me know if you have any additional questions or concerns and I'll be happy to help!

I appreciate your patience while we evaluate your request.

====================== Second email: ======================

Thank you for your kind patience whiIe we continue to evaluate your Workspaces Application Manager - Total Products limit increase request.

I apologize for the time is taking to provide you with a resolution as we've always aimed to provide our customers with a rewarding experience that meets and goes beyond expectations. Unfortunately, from time to time there are cases where the final outcome is handled by another department and the time they take is completely out of our hands.

We certainly understand the sense of urgency that you have for this particular request and therefore, we have spent time communicating with the service team to let them know about it. Rest assured that your case is active, being looked into and the sense of priority has been transferred. As soon as we have an update from their end we'll be touching base with you immediately.

I am committed ensuring that you will get the help that you need as fast as possible, so we can ensure everything is being handled to your satisfaction, please feel free to let us know if you have any further questions or concerns through this case, so we can address them as soon as possible.

============= My response: =============

You can go ahead and cancel my request -- I've decided to not go forward with my project.

============= Their reply: =============

Greetings from Amazon Web Services.

We're sorry. You've written to an address that cannot accept incoming e-mail.

If you need to contact us, please visit http://www.aws.amazon.com/contact-us .

Thank you for your business.


I always suspect: do they still copy configuration manually? Is this delay because of that? There was an article about how AWS did it that way in the early days; even amazon.com was not running on AWS back then. I hope that is no longer the case.


It seems... strange to call Workspaces Application Manager “a VM you tried to spin up”

Workspaces are already a specialized product built on the AWS ecosystem; WAM is a niche management tool inside that niche.



DO is famous for being a pain in the ass. They’re great for tiny hobby things but honestly I’d never run a prod/serious/client workload there. Too many issues. I’ve had multiple clients lose a droplet due to a simple credit card expiration.

It’s a race to the bottom on price so this doesn’t surprise me. They chose this life.


To be fair, what is the best way to handle expiring credit cards? For one of my SaaS products, I give a 30 day grace period, then delete the data. If they didn't have a backup, that's on them...

If they delete the droplet the second a single CC payment fails, that's one thing, but I don't believe that's how their system works.


If you are literally in the business of enabling, storing, and protecting production workloads, data, etc., then catastrophic data loss should be an absolute last resort.

In both of these instances I am referring to a balance of less than $20.

So for less than $20 (a few weeks late) DO says, welp fuck this customer we are going to terminate all of their resources immediately.

This is what DO and others need to do: put it in your terms that you will keep racking up charges and then send the bill to collections. Charge interest, charge fees, do whatever you want. Turn $20 into $40. Why? Because businesses do not give a shit... if it is a choice between losing everything or a slap on the wrist (a monetary fee), they will choose the latter every time.

One of my clients had to painstakingly trudge through archive.org to recreate their missing blog posts. How fucking miserable is that? Over a few hundred megabytes of disk that DO could have kept around...

Also, actually make an effort to reach out before doing anything serious. Call phone numbers, email other members on the team to alert them to the issue, etc...

Too many times I have seen some script kiddie throw together a client's WP site and toss it on DO because it is 'so cheap and cool' and yet they forget about everything else: backups, security, managing the box, etc... and inevitably shit will hit the fan.

I was really rootin' for DO in the beginning. I even applied to work there when they were first starting out but did not want to relo to NY. Now I am moving three clients OFF of DO because they are all very unhappy with the level (or lack) of service they've received.


I think the saying "you get what you pay for" would apply in this case. People want to not pay for things, they don't get the things.


I think OP is saying that DO will delete all of those things if your credit card expires. Even if you have those features in place, you will lose everything.


Yep and I'm saying the reason DO isn't able to call you up to see why your card expired and to get you updated is because you're not paying enough to expect that level of service from them. There simply isn't enough "profit" from a service that costs so little to allow for that many customer service reps.


Or...if it was important enough to you that losing it hurts, then maybe pay attention to your emails and don't let things expire and pay your shit on time. And of course, a sane person would backup anything important.


Yep, and that is something I have instituted since taking the reins. Still... DO could turn this lemon of a situation into lemonade by increasing revenue and preventing unnecessary headaches for their customers.


Ok, let's say I store all my backups at AWS, Google and Azure. My credit card expires and all backups are gone. What's the point of additional backups in this scenario?


AWS, GCP and Azure are unlikely to delete your shit the moment your card expires?


I keep backups of my cloud data whenever possible, mostly a couple hundred MB for small projects. I have been bitten by the same situation in the past.


Why would you use the cloud but have that single point of failure? Billing is also a network activity. Why not have the backup infrastructure linked to another credit card?


Except no data was actually lost in that case? They got full access back to their account, and additionally, it wasn't a technical issue - they were accidentally flagged as a fraudulent / abusive account.


I linked two incidents. The first one required a trending HN post to get resolved. The second, the developer never got their data back.


If you knew how these storage services work under the hood, you would understand that durability is not guaranteed. It is the responsibility of the customer to ensure their data is backed up.


Goddammit.



DO's minimum for a cluster is $20/mo. That sure beats $73, although gcloud is offering the first zonal cluster for free, so it might still be cheaper to use gcloud for small things. I do really like DO though; I have a few personal projects hosted there.


Look at Vultr too. My last 2 support tickets were responded to within 2 minutes and solved within 10 minutes. Their support has always been good, but unlike almost all other companies, it seems to get better as they grow.

I run a Kubernetes cluster on Vultr and I haven't had any problems.


Outside of just hating Microsoft...why not Azure?


Azure is as expensive as G Cloud, and less robust.


DO is for garage projects. If you need anything serious it's AWS / GCP or MS.


I run a hobby project in multiple zones because low latency is important, and Kubernetes makes it easy to do so.

There's no way I'm going to pay an extra $73/mo -- I already pay for the computing resources, this should be free.

Looks like I'll be moving away from GKE. It's a shame, I _was_ a big advocate.


We offer one free zonal cluster, which is specifically designed for your use case :)


I have personal experience of a few companies in the UK where GCP are offering 90+% discounts to onboard. GCP are spending hundreds of millions to do this. K8S control-planes are a rounding error compared to this.

You could have grandfathered in current deployments but - nope. In the tech world this is up there with killing Google Reader.


I use more than one zone because I run TCP services which need low latency. I'll probably just switch to Digital Ocean.


It's interesting how naive the gcloud folks commenting here seem to be about what it is like to be an enterprise customer evaluating their service. Deploying to a cloud vendor is a bit like deciding to build a house on a rented lot. You place an incredible amount of trust in your landlord - unbelievable, really. It's a miracle people do it at all in some ways. This kind of action is like Google going to the land it rents out and posting earthquake signs in front of it. It is toxic to the most fundamental element of trust people rely on when they choose a cloud vendor. It can't possibly be worth whatever minuscule revenue they will accrue from implementing this.


It's not so much about an additional fee (honestly, $0.10 per hour is nothing); it's more about Google's practice of suddenly charging for services, shutting down services as they like, and not giving a shit about us customers. Azure and AWS are much more customer-friendly here.


A long-time Google Cloud customer here who recently migrated away to DO. Here are the things Google gets completely wrong:

- They are way behind AWS in terms of stability and features, and even then they always demand a similar premium (ref. recent incident: https://www.cbronline.com/news/google-cloud-down)

- Tax issues in foreign countries are never resolved. Due to Indian laws, we must pay TDS on our business payments, and Google has to refund them to us. It takes almost a year to get those refunds from Google. Not startup-friendly.

- No support!! If anything goes wrong, AWS or DO have solved issues, even technical ones at times, and even when you have no support plan purchased. Google is literally useless when it comes to support. They carry that attitude over from their search business.

- I see someone mentioned egress charges. In reality, cloud providers get super-cheap bandwidth yet charge 9-10 cents per GB for egress, which is outrageous. It's not only Google but AWS as well, but Google, being newer to the market, could have done a better job here.

- DO Kubernetes is free and, as someone mentioned, control plane downtime does not matter as long as workloads are not affected.

- In a nutshell, if I need stability, I will go to AWS; if I need a better price, DO is not a bad choice. I am not sure where Google is positioned.


Hey everyone - Seth from Google here. Please let us know if you have any questions! You can learn more about the pricing changes at https://cloud.google.com/kubernetes-engine/pricing.


Hey Seth,

Thanks for being the recipient of everyone's (justifiable) frustrations. They probably don't pay you enough.

I think what is especially frustrating about this is that we do already pay for the resources that are provisioned by our K8S clusters. We pay for the network traffic, the storage, the compute. I saw you mention Stackdriver... we pay for that as well.

I can appreciate that actually setting up and managing GKE backplanes is a non-trivial expense, but I generally assumed that that cost was amortized out, just like I don't pay for the backplane that runs GCE and the rest of GCP's service suite.

I also appreciate that you mention some customers are perhaps taking advantage of this "free" resource. But isn't that what quotas are for?

Frankly, more concerning than the fact that I now have a new $73/mo. fee attached to my account (which is not the end of the world) is that this really comes out of left field, in the context of concerns about the nature of GCP's new leadership and reports of Google leadership debating GCP as a going concern. I realize a lot of that isn't well founded, but it's surprises like this one that keep that narrative alive. AWS ain't no saint, but they are pretty consistently who they are: not full of bad surprises.

This just leaves a bad taste in the mouth, and makes me wonder if I can expect other surprising cost increases, or perhaps, if these don't "work", worse surprises like deprecation notices. Is this the precursor to you all discontinuing GKE because, as the DevRel class likes to tweet, nobody should be using Kubernetes if they can use (more expensive) services like Cloud Run?

Are we about to get Oracled?


> They probably don't pay you enough.

Can confirm :)

> ...we do already pay for resources that are provisioned by our K8S clusters

Customers are charged for worker nodes, but until this point, the control plane ("master") nodes have been free. In addition to the raw compute costs for those nodes, there's the SRE overhead for managing, upgrading, and securing them.

> ...but I generally assumed that that cost was amortized out

<googlehat>I'm not really sure.</googlehat> <civilian>My guess would be that, initially, this was the case. However, over time, people have created many zero-node clusters, and now the amortization doesn't work out. Again, pure speculation.</civilian>

> But, isn't that quotas are for?

See my comment above about zero-node clusters.

> I have a new $73/mo. fee attached to my account (which is not the end of the world) is that this really comes out of left field...

Acknowledged, but I do want to highlight that the changes take place a few months from now (June 2020), not immediately. Furthermore, each billing account gets one zonal cluster with no management fee.

> Is this the precursor to you all discontinuing GKE because, as the DevRel class likes to tweet, nobody should be using Kubernetes if they can use (more expensive) services like Cloud Run?

100% no. Also, Cloud Run is almost always cheaper than running a Kubernetes cluster.

> Are we about to get Oracled?

I'm not sure what you mean by that verb.


Thank you for the detailed response.

> In addition to the raw compute costs for those nodes, there's the SRE overhead for managing, upgrading, and securing them.

By that logic, can we expect to see charges for GCP Projects and the GCP Console? Cloud IAM?

> people have created many zero-node clusters

I'd be really curious what is driving folks to do that. Are they using the backplane for CRDs and custom controllers and no compute?

This feels like it could be addressed similarly to alpha clusters, or with a quota, e.g.: clusters with 0 nodes for > 24 hours will be terminated?

Separately, it seems like handing everyone 3 months to figure out what to do about a new $73 * X fee isn't the best plan. Including some kind of estimate in the emails that were sent out would have been helpful. There was a change in pricing for Stackdriver a while back that did this, and it was very helpful for understanding how we would be impacted.

> Furthermore, each billing account gets one zonal cluster with no management fee.

My feedback is that you would probably be getting way less blowback if that free tier didn't come across as inadequate. I can appreciate that there are use cases where it makes sense for you all to be charging. But one zonal cluster... It makes the whole thing feel punitive.

> I'm not sure what you mean by that verb.

I have a feeling we're all about to go on a journey of discovery together.


> I'd be really curious what is driving folks to do that.

I was one of those people. I got an email from Google this morning and thought "that's weird. I didn't even know I was running a Kubernetes cluster." I think I created it years ago to work through a Kubernetes tutorial and, since it was free, never bothered to delete it.

So, I can imagine this being a problem. Though it seems like having a minimum hourly charge per cluster would have been a better way to handle this (i.e. if your cluster is using less than $0.10/hr in resources, you get charged the difference).
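
Something like this, I mean; just sketching the proposal, not anything Google has announced:

    # Proposed "minimum hourly charge" model: idle or tiny clusters top up
    # to $0.10/hr, clusters with real spend pay no extra management fee.
    MINIMUM_PER_HOUR = 0.10

    def management_fee_for_hour(resource_spend_that_hour):
        return round(max(0.0, MINIMUM_PER_HOUR - resource_spend_that_hour), 2)

    print(management_fee_for_hour(0.00))  # forgotten tutorial cluster -> 0.1
    print(management_fee_for_hour(0.04))  # tiny cluster               -> 0.06
    print(management_fee_for_hour(2.50))  # real workload              -> 0.0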


That seems like a really good idea, maybe they should look at doing that? As noted, $73 should be a trivial charge both from Google's perspective and the customer's for an actual cluster.


If abuse of zero-node clusters is an issue, wouldn't it be better to introduce a zero-node cluster fee the same way you charge for unused reserved IP addresses?


Also, how much resource does a 0-node cluster actually use on the control plane?


>> Are we about to get Oracled?

> I'm not sure what you mean by that verb.

The CEO of Google Cloud is the former President of Product Development at Oracle Corporation. Oracle Corporation has a reputation for being incredibly hostile to their customers, which includes things like "finding creative new ways to charge our customers more money. I mean, what are they going to do, switch to Postgres? lol"

I think the fundamental problem is that Google Cloud's reputation is irreparably harmed by Google's overall reputation among developers. Treating this like a tactical or technical problem ("our solution is the best and cheapest!") is missing the forest for the trees.


> finding creative new ways to charge our customers more money.

This is what I'm worried about, because it's the second time this year that GCP started charging for something that was previously included. Back in January they started charging $2.92 per month for in-use IP addresses.

The IP charge might be justifiable because of IPv4 scarcity (although their major competitors still include IPs), but with today's announcement coming just a couple months later, I'm worried they're going to start nickel-and-diming us. I'll be skeptical of any new GCP services that are advertised as having no extra cost.


I think the reason a lot of people create zero node clusters is that they want to "turn off" their cluster without destroying its current configuration or state, which otherwise doesn't seem possible.

I may be missing something here, but my guess is that a lot of people turn to GKE to learn how to use K8s, and then are like "wait, I'm in the middle of this project/tutorial/etc., but I don't want to be billed overnight when it's literally just going to be doing nothing, what do I do?" and find Stack Overflow or something recommending you just scale it to zero. See questions like this: https://serverfault.com/questions/877619/turn-off-a-cluster-...


> Furthermore, each billing account gets one zonal cluster with no management fee.

This seems like a really important detail. For hobby projects, one cluster in one zone should be enough. Per your statement, those people will not be impacted. With this knowledge, I'm experiencing much less FUD.


So couldn't you charge for the control plane only on zero-node clusters?


Why is this change coming in? I can hardly see costs on Google's side to provision and 'manage' K8S having increased over the past 3 years, especially given that it's used in production there. Also, given that no-cost K8S clusters were pushed by your sales and marketing teams back in 2018 as a significant benefit of switching to GCP, it doesn't really inspire confidence in GCP if we're just going to be shafted further down the line. Lastly, $0.10/hour is expensive given that a 'managed' k8s cluster can be rolled out using Terraform and Ansible with a bunch of GCE nodes with minimal effort. This frankly just feels like a cash grab from those that are either inexperienced/unfamiliar with cluster management/provisioning, or from those that are in too deep with GCP and won't have any option other than to pay the piper, so to speak.


Thank you for the question. While I can't go into deep detail... as with most free things, people find a way to abuse the system. While we've invested significant effort to curtail such abuse, this is the road we've landed on.

To your point about running your own K8S cluster - two things:

1. That's something you have always been (and still are) entitled to do.

2. Having personally run large-scale K8S clusters, the challenge isn't provisioning, it's maintenance, security patches, upgrades, etc.


So...

You guys roll out a free service. You tell your sales people to hype it up as a benefit over other providers. You somehow don't anticipate that some users will "abuse" the free service, so you hike up rates for everyone?

Sorry, I don't think you're likely to find much empathy on this one.


> You guys roll out a free service. You tell your sales people to hype it up as a benefit over other providers.

Strange argument. It is basically what the whole world does: give away some free or heavily discounted product or service in the hope of gaining market share, and later on increase the price / start charging for that thing.


>It is basically what the whole world does

It's not what the whole world does. Many companies, gasp, start charging for something right away! Their entire sales pitch was that it was free. People made decisions based on that which are not so easy to turn around.

It's a legal form of bait and switch and it's hardly accepted as an ok thing to do by consumers.


This is really disappointing. I've been a big proponent of GKE, not only to my employer but to my friends as well. I think it's the best Kubernetes implementation available. The justification for a management fee because there are abusers just feels like an excuse for making some extra revenue. Surely with Google's prowess you can detect and deal with abusers without having to raise costs for everybody. I'm worried this is going to dampen the momentum of Kubernetes adoption, unfortunately...


It's not _just_ abuse. It's not _just_ the new SLA. It's also the additional functionality we've built beyond just Kubernetes and how simple we have made the offering and auto-scaling, etc.


Hi Seth - I am a LONG time lurker here on HN, but this news just forced me to create an account.

I am part of a small company which has separated our deployment into a number of sub projects, some of which are: dev, staging, production, ci, etc.

The difference for us will be several hundred dollars per month, and that will make an actual (negative) difference for us. We didn't need a "financially backed SLA" before and we don't need it now.

You asked for a question and here it is: Why isn't a financially backed SLA a part of a billing negotiation? I mean, there are some really cool features in "Anthos" but I am not picking up a phone to find out how much that is going to cost.

If a really useful feature like "Cloud Run for GKE" is awkwardly placed in the "Anthos" box, then why isn't the SLA part of "Anthos" too?

Free clusters was a huge part of why we selected GCP. If this SLA nonsense isn't made optional, our next project is not landing on GCP.


Thank you for the feedback. I'll relay this to the product team. I feel your frustration and, unfortunately, I do not have much to offer beyond my promise to relay this feedback and the items I've expressed in other responses.


Hi Seth, two messages this change is sending: 1) GCP can arbitrarily add additional fees to services we consume whenever a PM is under pressure to increase revenue. 2) GCP pricing only goes one direction: UP

I know you are just the messenger here, and I send my sincere sympathies that you have to work with a product manager there who can't compute the strategic impact of this change :)


Very disappointed. Not by the price increase per se... but by the lack of a reasonable 'always free' tier. I think you should strongly consider tweaking the pricing to provide one, two, or three multi-zone clusters for free instead of one single-zone cluster. Let us see the power of GKE without the extra charge and grow on your platform. This would allow new companies to choose GCP over AWS/Azure and start out with a proper highly available cluster or two/three in different regions, and grow to more clusters over time. With the new pricing you're forcing them to choose between a single-zone cluster or $70 per month per cluster (or another cloud). Please consider tweaking the new pricing to enable a lower price ramp-up for newer companies... why not offer three multi-zone, same-region clusters for free and then charge the more established enterprises using more than 3 clusters? I appreciate the money is in the big customers... but why scare away the small customers who want 1-3 highly available clusters behind a GCLB for higher availability and lower global latency? The mindshare of developers will move away from GKE if you're not careful... both to AWS/Azure and to others like DO Kubernetes.

I believe the community is keen to engage with you on this, based on the comments in this thread. If your team would like to talk to a disappointed (very small but hoping to grow) customer, I'd be happy to jump on a call. I hope others here would be happy to do the same.


I think you'd be interested in our Google Cloud for Startups program: https://cloud.google.com/developers/startups


It's a good program. I'm currently in the stage one step before that program. Would your team consider tweaking the pricing as I mentioned, with the goal of helping early stage startups choose GCP? GKE/kubernetes is increasingly not just for big enterprise. Personally I find GKE as easy as app engine or cloud run but much more future proof and more flexible/powerful... the real heart of a GCP to rival AWS. Just this week I set up Config Connector to provision a global load balancer and other GCP resources used by two clusters. An always free tier of two or three (ideally multi zone clusters) would I think go a long way to earn the trust and belief of many devs and early stage startups. As would coming back in the next few days with tweaked pricing based on community feedback.

Edit: Additional comments: You could limit the number of nodes in the always-free-tier clusters. Above n nodes the free tier clusters aren't free.

With the new pricing, I can't choose to use GKE instead of app engine/cloud run and get the same availability without having to pay for both the nodes and the new control plane cost. Those managed products run over multiple zones in a region. It's disappointing that even just one multi-zone cluster is charged.


> Would your team consider tweaking the pricing as I mentioned, with the goal of helping early stage startups choose GCP?

To be clear, it's not my team. I'm relaying feedback, but I can't make any guarantees or promises.

All this feedback is super valid and important, and it's being synthesized to the product team.


I appreciate that. Thanks for being available on hackernews and helping relay feedback.


Additional comments: You could limit the number of nodes in the always-free-tier clusters. Above n nodes the free tier clusters aren't free.

With the new pricing, I can't choose to use GKE instead of app engine/cloudrun and get the same availability (by this I mean multiple zones) without having to pay for both the nodes and the new control plane cost. Those managed products run over multiple zones in a region. It's disappointing that even just one multi-zone cluster is charged. I'd be very happy to see you include at least a single multi-zone cluster control plane in the free tier.


This change is pretty huge for non-revenue units and small teams at institutions and SMBs. These smaller teams often seem to run two clusters rather than try to split their production and dev environments within one cluster (I think this is even widely recommended for smaller, less experienced outfits). For many, the management fee will probably be a large percentage cost increase for units that are very cost-sensitive, and avoiding it will require significant re-engineering for units where engineer hours are a scarce resource.

Seems weird, given that GKE is basically the main reason people seem to use Google Cloud. These kinds of users aren't big fish, but I suspect a lot of them are going to run.


Yeah. Having 1 free cluster will just encourage SMBs and hobbyists to resort to the bad practice of commingling dev and production, won't it? It's the same attitude that GitLab has with a lot of their CI stuff. It's not only huge enterprises that want to try to follow best practices.


It's one free cluster per billing account. Have separate billing accounts for dev and prod usage of GCP. It's probably a good idea to use separate accounts on any cloud, for that matter.


This decision is penny wise and pound foolish.


Hey Seth, thanks for taking to the comments here; sad I won't be able to catch one of your talks at Next this year in person.

I'd like to share some feedback that echoes that of other commentators, from a different perspective.

I run a local cloud developer community with regional pull for attendees, as well as working directly with local early-stage startups looking to become cloud-native.

GCP has always been my go-to recommendation for our attendees (a mix of developers, technical founders, and some enterprise technology folks), given the affordability, the pathways to additional credits to flesh out ideas, learn new technologies, or stretch the limited runway of a new organization, and ultimately my belief that GCP is one of the best clouds for developers, if not the best, given the investment in documentation and DevRel engagement channels.

With the rollback of open-enrollment into a smaller plan of Google Cloud for Startups, and price changes like this, I'm fearing I've chosen the wrong hill to die on when talking with these new customers.

I appreciate the inclusion of a free zonal cluster per account, which will still afford me the opportunity to demonstrate k8s at meetups and to end users without taking more of an out-of-pocket hit, and for folks to learn on their own or maintain hobbyist projects on the same budgets they are accustomed to.

My fear with this announcement is that the negative repercussions will not be felt on the bottom line or the figures that more and more seem to be the priority of Google Cloud's leaders. Rather, they will be felt hardest by the smaller customers: the hobbyist developer or technical co-founder looking to learn new technologies and scale up their operations, who, at least in my experience, are driving growth in the mindspace around GCP in their communities.

Put another way, moves like this will further tarnish Google's reputation with the customers to whom sales engineers have, for the last two years, heavily promoted the absence of cluster management fees, unlike "the other guys," and in the eyes of many just starting out in these areas (most of whom, I recognize, will never become the big customers that satisfy the requirements of executives).

I hope that when the dust settles, this does not lead to a retreat from what makes Google Cloud great in my mind, which is specifically the developer experience and outreach.

With that said, I would suggest really driving home this change through dismissible in-console communication at the point of cluster creation, on the dashboard, and in email communication to the folks this will impact, with a clear picture of the impact on them. No one wants another large disruption of thousands of small organizations and users, as was the case with the Google Maps pricing change.

Personally, I'd love to see it increased to one free regional or zonal cluster per account for the remainder of 2020, and then making only one zonal cluster free per account effective 2021. Given the uncertainty around engineering capacity and scheduling during the ongoing human malware crisis affecting companies large and small, I think this could be a good middle ground that satisfies most customers affected by these changes, while still achieving the objective of moving this away from being a loss leader of sorts.


This feedback is super valuable - thank you for sharing. I'll be sure to relay it to the product team.



> One zonal cluster per billing account is free

For hobby projects nothing will change.


Note: Azure does not charge a master/cluster management fee. (Bias disclosure: I work for Microsoft.)


I'm not sure any provider appearing to capitalize on a momentary pricing decision is a great idea. All of these providers are too big, and things change too fast, to crow as if any of this is a long-term decision.

Replies on another thread by alleged Google employees appear to indicate they are learning on the fly that doing support matters and costs money.

There are a lot of comments from people wanting free things. You may get those things for a while but it won't last. Free only works for so long in anything no matter how hard that is for some people to figure out across all aspects of life.


Not looking to capitalize, just stating a fact that has been true since the service launched. In the earlier comments people were discussing/suggesting alternatives, and I believe AKS is worth considering.


Given this news, I suspect that won't last.


yup. also, AKS does not have an SLA -- https://azure.microsoft.com/en-in/support/legal/sla/kubernet...

>As a free service, AKS does not offer a financially-backed service level agreement. We will strive to attain at least 99.5% availability for the Kubernetes API server. The availability of the agent nodes in your cluster is covered by the Virtual Machines SLA. Please see the Virtual Machines SLA for more details.

Google didn't have one either, so they added one, plus a price for said SLA.


It wouldn't surprise me if Azure charges for control plane at some point. Seems like EKS and GKE can get away with charging so why not...


Attempting to capitalize on a competitor's announcement isn't cool and doesn't look great.


I don't work for Azure, and having used both GKE and AKS, I can say GKE is superior. However, I don't understand why this comment isn't cool. It's not like they're capitalizing on your misery. It's a decision you made, and it's fair game IMO for them to emphasize their advantage over you.


Seems fine to me. Frankly, what isn't cool and doesn't look great is you not disclosing you work for Google in this comment.


It's actually appreciated. When a vendor makes a change to screw customers, those customers, like me, appreciate hearing from the vendor's competitors.


Oh, so now in addition to Google's reputation for killing services, GCP wants a reputation for raising prices?


They have done it before. Remember Google Maps?


They already had this reputation.


I don't like the direction this is heading; it seems like the SLA and the accompanying charge could easily have been optional.

We have a couple of dozen clusters, two per client, and can't change the architecture. We use Helm and Terraform and can build new clusters quickly, but we can't treat them entirely like cattle because we don't own all the DNS. Our clients are not the sort to do things quickly - or even slowly.

Does anybody have any good and up-to-date resources comparing the current options for K8s providers? I'd like to get a feel for what it would take to switch.


As many have mentioned here already, $72/mo is most likely a rounding error on the workloads Kubernetes is designed for.
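
For context, here's a quick back-of-the-envelope on the fee itself (a rough sketch in Python; it assumes an average of ~730 billable hours per month, i.e. 8,760 hours/year divided by 12):

    # Rough monthly cost of the new GKE cluster management fee.
    FEE_PER_HOUR = 0.10
    HOURS_PER_MONTH = 730  # ~8760 hours per year / 12 months

    print(f"Per cluster: ${FEE_PER_HOUR * HOURS_PER_MONTH:.2f}/month")
    # Per cluster: $73.00/month

That's where the ~$72-73/mo figures in this thread come from.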

I think most customers will welcome the change because of the SLA: even one minute of downtime per year can be far more costly than ten years of cluster management fees.

This also shows Google's commitment to providing a great and reliable service.

If you're looking to run k8s "for free", DigitalOcean looks like the way to go, but again, these are two completely different sets of offerings, and if you chose Google Cloud in the first place then DO probably isn't a suitable alternative.


> As many have mentioned here already, $72/mo is most likely a rounding error on the workloads Kubernetes is designed for.

There are _many_ reasons to use k8s beyond just workload scale, and that amount per cluster per month isn't anywhere near a rounding error for many deployments.


I'd be interested to know when you might need Kubernetes for small workloads and simple architectures.

To my understanding, a support contract alone for Google Cloud will cost you around $150/mo.

For small to medium workloads there are plenty of tools if you want to use containers: Docker Swarm and Nomad. Docker Swarm is really simple, and most engineers already know it because of the `docker-compose.yml` files they use every day.

I can't really understand what type of workload you have, because Kubernetes cluster management requires at least two full-time DevOps engineers. What is $72 compared to the salary of two employees?

If you have an issue with $72 for an SLA guarantee, then I can't really understand why you need Google Cloud at all. They are effectively bound to maintain near-zero downtime because of the huge losses if something goes down for their entire customer base.


One simple example, of many: enterprise clients with low workloads but very sensitive data and strict infosec and segregation requirements, with several clusters per client, all managed with infrastructure as code.

There are lots of benefits to k8s other than just scale, and there are architecture choices, made for good reasons, that rely on separate clusters with no particular need for a contractual uptime SLA on the control plane.

This decision will cost us a lot more than $72 a month.


I think that CRDs are partially to blame for this. CRDs can tax the API Server and backing data store, without directly mapping to a revenue-generating activity.

I’ve noticed a trend where teams spin up new clusters for each application. Since CRDs are installed on the cluster level, it is not possible to namespace resource versions. It is easier for teams to take the cluster-per-application approach as opposed to mandating a specific version of cluster tooling.

More small clusters means more control planes, and more subsidizing if a cloud provider is giving away the control plane.

I just finished a blog post on this opinion that goes into more detail- https://caleblloyd.com/software/crds-killed-free-kubernetes-...


If the trade is an SLA for a management fee, that is a reasonable business decision, and largely a rounding error for a decently sized company with a well-designed clustering system. Lack of SLAs is a major issue IME with cloud providers.


"Cloud computing is a trap, warns GNU founder Richard Stallman" [2008]

https://www.theguardian.com/technology/2008/sep/29/cloud.com...


I know this is kinda against the guidelines, but just to highlight what an ideological war is being fought at this moment: my parent post has received 5 upvotes and 6 downvotes, with the last 4 downvotes all occurring in the last minute.

On the content: RMS chose colorful (insulting) language, and yes, do hold that against him, because that is wrong, but in my opinion his statements are, at their core, quite legit.


I am grateful for Stallman's efforts, but claiming the cloud is a nefarious plot, no matter the style, is not "legit".

As for the voting, join the club. Like all communities online, HN is an often hypocritical cliquefest where the mob will have its way. But at least it's not r ;)


This is seriously such a bummer. This was the main reason for us moving out of AWS


We also jumped over for this reason and had to deal with a large number of gotchas from GCP. I kinda wish I never spent the long nights on this...


That's about $72 a month, which matches Amazon EKS's lowered pricing. I guess that rules out my hopes of EKS not charging for the management plane in the near future.


I mean... if I were Amazon, I'd eliminate those management fees this afternoon just to spite them. I'm sure it's a rounding error as far as AWS is concerned.


It looks like Google's offering one free cluster per billing account if I read that right. If that's not the case, I'll be turning off my cluster before this kicks in.


One zonal cluster per billing account is free


Wow, this is a huge bummer. A lot of our infrastructure assumptions have been based around having several small GKE clusters.


I think this trend is not good overall, and people will eventually be very unhappy with it. I'd rather help you figure out how to use fewer clusters.


Our cloud service here at Confluent is designed around giving customers their own infrastructure. A lot of the time, that means giving them their own k8s cluster. The management overhead there isn't the issue, however.

The real issue comes into play when you try to make developer environments.

To give our developers any semblance of a "real production-like" workload, they need to work with an entire kubernetes cluster - maybe even a couple - to simulate what's happening in production.

This means at any given time, we have hundreds of GKE clusters because each developer needs a place to try things. Yes, these are ephemeral and can be tossed aside, and yes they cost a tiny bit in VM prices, but adding a per-cluster management fee is going to skyrocket this expense and push us towards trying to figure out ways to share these clusters between developers, which defeats the entire purpose of the project.
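
To put rough numbers on that (a sketch only; the cluster counts below are hypothetical, not our actual fleet size):

    # Back-of-the-envelope for a fleet of ephemeral dev clusters under the new fee.
    fee_per_cluster_month = 0.10 * 730  # ~$73/month per cluster

    for clusters in (50, 100, 200):
        print(f"{clusters} clusters: ${clusters * fee_per_cluster_month:,.0f}/month")
    # 50 clusters: $3,650/month
    # 100 clusters: $7,300/month
    # 200 clusters: $14,600/month

That's an expense that was effectively $0 before this change, on top of the VM costs we already pay.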

We'll have to seriously consider abandoning GKE for this use-case now and that sucks, because it's by far the fastest managed k8s solution we've found so far.


Try KIND. Much better devex.


ya we're using that in a few places, too actually.


Tim, would you be willing to elaborate on why you dislike the "many small clusters" pattern?


Many small clusters just do not deliver on a lot of the value of Kubernetes. Clusters are still hard boundaries to cross (working to fix that). Utilization and efficiency are capped. OpEx goes up quickly.

There are reasons to have multiple clusters, but I think the current trend takes that too far.

TO BE SURE - there's more work to do in k8s and in GKE.


As always, the insight is appreciated, Tim!


Actually, could you elaborate on the benefits of your approach? Edit: I'm asking because this is counterintuitive to anything I'd want to solve with K8s, especially when it comes as a managed service.


I'm sorry to hear that. You can use namespaces and separate node pools to isolate workloads. We'd love to hear more about your use case for having many small GKE clusters.


Hey Seth, I know you used to work at Hashicorp on vault. I think Vault recommends that if you want to deploy it on Kubernetes, it should have the cluster to itself.


That's correct. Vault Enterprise (at my last math) was ~$125k/yr, so that management cost is negligible :)


This is really disappointing. GKE was a staple of Kubernetes adoption, not only for the feature set but also because there were no overhead costs.

I hope GCP re-thinks this.


For folks just trying it out, 1 cluster is still free.


For now.


For folks just trying it out, 1 cluster is still free... in a single physical data centre. Sadly you'll be charged for running a cluster across two or three data centres (zones) in the same region (e.g. London).


This is a fair point. We don't have an HA (multi-master) zonal offering either, because mostly people don't want that.


$72/month per cluster, regardless of size. It's interesting that they're not charging per managed node (outside of the regular machine cost), which makes it steep if you want to keep a small cluster up.


Each billing account gets 1 free zonal cluster so the cost for keeping a small cluster up won't change.


How do I know if my cluster is zonal? Does it mean all nodes are VMs provisioned in the same zone?


>Single-zone clusters

>A single-zone cluster has a single control plane (master) running in one zone. This control plane manages workloads on nodes running in the same zone.

https://cloud.google.com/kubernetes-engine/docs/concepts/typ...


Currently I have a small cluster with 3 nodes running, each in a different zone (data centres next to each other, eg in London). Sadly this will now be charged.


You Probably Don't Need Kubernetes

If you're small, you can probably just run a few containers/VMs with automatic restart.

If you're big, you already made something custom like Borg.

Who's it for? Temporarily embarrassed unicorns?


Thomas Kurian is going to be the Steve Ballmer of Google, and he's currently tearing up Google from the inside. Google leadership need to wise up and give this guy his walking papers.


he is like GKE, once you start him, you can't stop :)


I am really disappointed by this. One of the reasons we moved to GCP from AWS was the ability to create multiple clusters at no extra charge. Now it looks like the pricing matches EKS.


I don't offhand remember AWS increasing prices for a service before but I might be wrong. How often does Google increase prices?

For a business, I prefer a company that starts with higher prices and then only lowers them over one that may increase them at any time.


Having worked at AWS, one of the things we got a lot of pushback on was offering something for free and then walking it back.

AWS focuses really strongly on gaining customer trust; they will only lower prices, never increase them. They won't turn things off until the last customer stops using them (though they might stop onboarding new customers).

I did not enjoy working for AWS so I left pretty fast, but some of the customer obsession there really impressed me.


For an AWS product manager, pricing is a one-way door: once you cross it, there is no way back.


I was wondering the same thing. Nothing comes to mind, but there are enough services I don't use that it's easily possible something slipped through.

AWS lowering compute costs is fairly widely publicized, but I am curious if anyone has compiled a list of the cloud providers (AWS, Azure, and GCP) increasing the costs of services.


they added IPv4 address rental costs earlier this year, nearly doubling the cost of a small VM

with no IPv6 option (of course)


Google Maps, GCE... this is the second time I've seen Google increase prices on highly important parts of a business. A third time and it will be a trend.

Makes me rethink whether I want to do any business with Google anymore.


I was an early adopter of Google App Engine when it was first released and the consistency guarantees of the datastore burned me pretty badly. What I experienced in the test environment was not even remotely close to what I got in production.

Just last week, I was so tempted to give Google another shot over AWS. By all accounts, they have come so far, and more importantly, started using more of my preferred toolset as well. This is exactly the kind of thing that really scares me off.

I have a huge amount of respect for the world class systems scientists and engineers at Google. Unfortunately, they seem to counter it with a customer experience full of uncomfortable, often expensive surprises.


We're a little bit too locked into GCP to migrate out... but I'll definitely stop evangelizing GCP to people.

I have now lost faith in you, GCP.


Same here. I'm losing face with my team now, after I argued for GCP (and won) against AWS.


And this is why, when Google's team comes knocking on your door enticing your company to move to GCP from AWS, you should ask for at least one year's worth of credit at minimum.

While you transition, leave as many doors open as possible to switch back.


The email definitely scared me and left a sour taste in my mouth. Sudden fee changes with somewhat vague details aren't a great way to build trust.

Especially when GCP doesn’t have a great rep like Azure and AWS.

To clarify, this is an hourly charge per cluster, right? If it's a small cost, ~$73/month, why can't GCP absorb it? DigitalOcean, which is a smaller player, has much simpler and saner pricing.

AFAIK GCP has higher compute pricing than AWS for similar instances (last I checked). How is this a good deal for customers?


I think this is fair: for hobbyists, a free zonal cluster is sufficient, and you probably wouldn't use more than one cluster. For businesses/revenue drivers, the $7.30/mo/cluster is nothing (EDIT: actually $73/mo/cluster, which may be a tougher sell, but if the business is in a position where it benefits from Kubernetes, it's still likely insignificant relative to the cost of actually running the VMs on it).


What's unfair about the situation is the fact that Google's sales people hyped this up as an advantage over other providers for years.


Just doubling down on the free zonal cluster as a hobby tier. Folks seem to be missing that in the announcement.


If I'm not mistaken, it should be $73.00+/mo


Oops, I can't math. Fixed.


The guarantees for uptime with 99.5% for regional and 99.5% for zonal clusters seem pretty good. Is this above the average in terms of cloud providers?


Small typo: 99.95% for regional. It's similar to AWS, which offers 99.9% for EKS clusters. Also not sure what GKE's new SLA reimbursement structure will be going forward.


99.95% availability (three and a half nines) is something single-datacenter infrastructure can deliver; it's even typical. But 99.5% is actually pretty low; that's worse than running a service from home on a single residential internet connection.
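
To make those numbers concrete, here's what each SLA target allows in downtime over a 30-day month (a quick sketch; it ignores the details of how GKE actually measures and credits downtime):

    # Downtime budget implied by each availability target over a 30-day month.
    MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

    for name, availability in [("regional, 99.95%", 0.9995), ("zonal, 99.5%", 0.995)]:
        allowed = MINUTES_PER_MONTH * (1 - availability)
        print(f"{name}: ~{allowed:.0f} minutes of allowed downtime per month")
    # regional, 99.95%: ~22 minutes of allowed downtime per month
    # zonal, 99.5%: ~216 minutes of allowed downtime per month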


There's this theory - "CRDs Killed the Free Kubernetes Control Plane"

https://caleblloyd.com/software/crds-killed-free-kubernetes-...

Basically running etcd in production with reasonable SLA and more than a little bit of data is super hard.


I haven't done cost calculations yet, but this might actually make me want to explore EKS pricing. I do love the experience of GKE.


It's the same as the new pricing that EKS has at $0.10/hour. Guess that rules out my hopes of EKS not charging for the management plane any time in the near future. :'(


It's slightly different than other cloud pricing because we include a free zonal cluster as a hobby tier.


You don't need to do the calculations by hand :). We've updated our pricing tool to account for these changes: https://cloud.google.com/products/calculator#tab=container.


Careful Google, free Kube is GCP's only big-shiny draw and you've just cut that into quarters.


This is penny-wise, pound-foolish, GCP.


On one side EKS is slashing their prices, and on the other GKE is increasing theirs :(


"Anthos GKE clusters are exempt from this fee"

Adopt Anthos or pay, basically.


As long as they don't support KVM, Anthos is going to be a tough sell, considering the VMware premium.


There actually was a $0.15/hr fee a few years back:

https://www.forbes.com/sites/janakirammsv/2017/11/29/google-...

YMMV, but I think the value prop of using a managed GKE cluster vs the raw costs and engineering time to run your own control plane is still strong.


I guess I need to start migrating my clusters, then. $73 per cluster on my prod, dev, QA, and test clusters is a significant increase for me. Looks like Google is doing their best to kill off GCP - https://www.cnbc.com/2019/12/17/google-reportedly-wants-to-b...


Curious where you are going to go that will be cheaper for the management plane and worker nodes.


AWS+kops is less expensive. Digital Ocean and Azure provide services without charging for master nodes.

I will likely choose AWS+kops.


The number of people that still trust google not to screw customers over is amazing.

They got you into their tech by offering something for free - now they're upping the price to match the leading competitor (in cloud that is).

Better than getting you into their ecosystem and discontinuing a service I suppose.

I hope this backfires on them and they end up providing it 'free forever', since they must have drawn in a lot of customers with the free GKE control plane offering.


Ahh, wonderful. You're taking away the only thing that made me pick GCP and GKE over AWS and EKS?

AWS used to charge me for the master node, and it cost more than the worker nodes. So the logical reaction to AWS lowering that cost to a flat $0.10 an hour was for Google to raise their own price point to match??????

As a GCP customer, I've got one simple question. Why exactly should I continue to use GKE?


The good ol' bait and switch tactic; lure new customers in with zero fees and switch out the fee structure once they're locked in.


I've been mentioning Google's penny-pinching here for a while. Simple things like Chrome's address bar showing Google searches before my bookmarks. It's all part of the monetization. Are fees like this a sign that Google is struggling to continue to grow? That's probably the most concerning thing for Google's future, rather than a $73 fee.


One of the reasons I recommend GKE over EKS is because of the lack of fees on the control plane.

I guess that one advantage is gone now...

Bad move Google Cloud. Bad move.


AKS still has a free control plane. GCP won my business for a bit, but quickly lost it based on some features. I still love Stackdriver and BigQuery, but I don't love doing business with GCP: the sales/support experience was pretty lacking, networking for serverless was immature for multi-region, and what they are doing with Anthos feels like Oracle (it is).


There are so many issues with this. I used GCP because I liked the service and was able to build all of my stuff well on it. My personal projects already make me shell out over $250 a month in running costs, and now you want to charge me hourly for a cluster I already pay hourly compute charges for? What in the actual hell...


The SLA guarantee for Stable is interesting, because at every turn the documentation and our TAMs encourage us to switch from the Stable channel to Regular.

Workload Identity, for example, is pushed hard, but it's beta, so there's no SLA, and it's broken in Stable, so if you use it you're going to have outages or be forced onto Regular.


Are there any other companies doing managed Kubernetes in GCP?

If there were interest, I could try to build a third-party clone of GKE running in all GCP regions and still managing GCP VMs, and I could run it for a lot less than $0.10 per hour (although obviously it would take many months of refinement before I could offer a decent SLA).


Azure and AWS both have managed Kubernetes ...


But a lot of businesses have a reason to stick with GCP, for example if you need access to TPUs... You can't easily use Azure Kubernetes with the actual VMs running in GCP...


Canonical has an offering.


Years ago, GKE had a $0.15/hr charge for clusters with more than 5 nodes. They dropped that and other clouds matched them. Now they're adding back a fee for all clusters.

Bringing back that original minimum cluster size would be a good idea and would probably solve a lot of the issues for people running dev/stage clusters.


I'm not gonna beat a dead horse, but with the GKE increase I need to explain why I chose GCP for GKE and other services like the new KFP. The increase is not much, TBH, but the point is that there is an increase, and tomorrow it could be BQ and later the APIs... I don't know what to say to my manager now...


I honestly can’t tell if this is satire.


Those who have had a chance to look deeper into the k8s control plane can understand this move. There's no free lunch: master nodes, etcd, and all that stuff in HA have their costs. That's it. Surprisingly, AWS announced a 50% price reduction for the EKS control plane in January this year :)


This is not cool. One of the best parts of GCP was that the control plane for k8s is free. :( This change is simply a money grab, at least optically.

How does the exemption for Anthos clusters work? If I enable it on my cluster, but it's still a GCP-native cluster and not an on-premises one, am I still billed?


I wish there were a "preemptible" or more cost-effective tier for GKE. I get why they're doing this, and ~$73/month seems reasonable for a premium service, but it would be nice to have a more cost-effective option with tradeoffs similar to "preemptible" nodes.


Now is the right time to Terraform stuff and get some portability in. We can help you out if you need to be quickly up and running on Infrastructure-as-Code ---> https://www.cloudskiff.com/


Does anyone have any opinions on alternative stacks to GKE within GCP? My side project is all dockerized. We love that aspect. Google Cloud Run looks promising, but it's not available in Sydney, which rules it out for us unfortunately.


This makes GKE pretty much impossible to use for side projects, given that it would cost ~$73 per cluster per month, not including instance costs.

Edit: A single zonal cluster per GCP account is exempt from this fee, so this comment is inaccurate.


Free (zonal) cluster per account, regardless of size, should cover a lot of this, no?


Great point! Didn't catch that one. I'll edit my comment


That’s kind of a lot. $876 per year for a toy cluster that might not do anything...


Why is the third-place cloud (GCP) making its offering worse than the second-place one (Azure)? DigitalOcean also doesn't charge per cluster. Not to mention how far behind on Kubernetes releases they are...


I'm not sure why everyone is so surprised. Part of Google's monetization model is to offer developer-friendly software for free, then charge a small fee once it crosses the headache-to-replace threshold.


GKE is the best Kubernetes offering according to my investigation of Azure/AWS/GCP. However, an extra $73/mo is not trivial for small teams like ours. That is the price of a medium-class VM...


Not that I had invested much in GCP, but this was just what I needed to stay away from it completely, for myself and my clients. Really awful decision. I'm sorry for those affected who trusted them.


What resources does cluster management actually require? I always thought k8s was the perfect cloud platform, since the management overhead was minimal and you're actually paying for the pod resources you use.


Now I'm not sure what we were paying for when we paid for our cluster. Isn't the price we pay to use/build the cluster the payment for the cluster?


And I just cancelled all my Kubernetes side projects on GCP


Wait until people realize GCP might not exist after 2022 if it's not profitable. Better to pay for it than have your platform yanked out from under you.


Better still to take this opportunity to get out before Google does what Google does best.


How many businesses really need Kubernetes? Can't you orchestrate infrastructure + rolling deploys with Terraform + Docker containers?


I'm not aware of any rolling-deploy functionality in Terraform or Docker.

Docker Swarm has it, but they are primarily using Swarm with Kubernetes these days.


Just a reminder to everyone: avoid proprietary services such as Google Datastore/Firestore, etc.! There is no (easy) migration path.


So what makes anyone think this is the last?


I can't remember the last time AWS raised the price of one of their services.

In fact, I can only recall them reducing prices.


Just got this email. $0.10 per hour is so much for something that was 0 per hour before. Wow these guys. The chutzpah! This isn’t just offensive monetarily, it’s offensive like price gouging is offensive. It’s emotionally offensive. Gotta look somewhere else. Shame.


Ah, the sound of former Oracle executives counting their GCP bonuses.


Why people think cloud providers are benevolent providers of infra is beyond me. Their margin exists because people are willing to pay it. Either run your own metal (k8s arguably makes this easier than VMs alone did in the past) or form a cloud co-op, but absolutely don't be shocked when a business takes money off the table because it can.

Don't marry yourself to a provider, stay portable, it's just good risk management. Pricing changes? Spin up a cluster elsewhere, migrate data, migrate traffic, profit. The terms of the agreement can change at any time.


Different companies make money using different approaches and it's perfectly valid to be upset at the approach a particular company is taking. Just because they provide a service doesn't mean they'll try to f* you over at every chance. For some it's bad long term business to do that.

AWS, for example, begins with high prices and then lowers them over time. It costs money but you know the maximum.

Google seems to grab you with cheap prices and then jack them up when you're committed (Google Maps is another example offhand). Maybe not on purpose, but bad initial pricing and ill intent have the same impact externally.


> AWS, for example, begins with high prices and then lowers them over time. It costs money but you know the maximum.

That's only true for official pricing though.

I know of multiple cases where AWS had initially given significant discounts, only to stop doing so once they believed the customer to be firmly tied to the platform.


> AWS, for example, begins with high prices and then lowers them over time. It costs money but you know the maximum.

Past performance is no guarantee of future benevolence or reasonable behavior.


> Past performance is no guarantee of future benevolence or reasonable behavior.

Sure, but pushing prices down and eating competitors' margins to the benefit of the customer has been Amazon's MO since inception.

Google seems to make a lot of missteps wrt pricing and the cloud. Remember the geo pricing change that put a lot of projects out of business??


Amazon's bandwidth charges are still extortionate. Have those ever gone down?


Same goes for metal, your data center can suddenly increase your prices massively or go bankrupt.


For sure! That's why I said, "Don't marry yourself to a provider, stay portable". Compute is a commodity, treat it as such.


Except that costs time and money, and for many people isn't worth it. Good business is not about eliminating risk but understanding and managing it.

For a startup, the risk of AWS doing something is tiny compared to all other risks so not worth spending effort to mitigate.

For a moderately sized company, true, being on multiple clouds may have an advantage.

For a large company, you get long term contracts with AWS that mitigate the risk.


Yes, I agree. But this industry is driven by fashion, not engineering or risk analysis, both of which are bent to fit the scenery. There are very few purists left who understand this.


I guess I won't be using GCP then.


Alternatively, don't use kubernetes.

https://doineedkubernetes.com/


Thems fightin words around these parts.


That's it, I'm out of GCP.

They could at least start working on fixing their horrendous documentation.


Try project Gardener.

It's fully open source and uses Kubernetes to run Kubernetes control planes and manage the underlying infrastructure across many infrastructure providers.

Manage homogeneous Kubernetes clusters across Azure, AWS, GCP, Alicloud, OpenStack, and VMware at scale. Kubernetes on bare metal with Packet, or using the open-source metal-stack, is coming soon.

Extensible to other infrastructures; contribute support for your favorite infrastructure.

Automation of day-2 operations, e.g. etcd management, including automated backup/restore.

Choose your Kubernetes version, DNS provider, operating system, network plugins, and container runtimes.

Extended cluster services like DNS management and TLS certificate services.

https://gardener.cloud/ https://github.com/gardener https://kubernetes.io/blog/2018/05/17/gardener/ https://kubernetes.io/blog/2019/12/02/gardener-project-updat... https://landscape.cncf.io/selected=gardener

https://github.com/gardener?q=extension-provider https://github.com/gardener?q=gardener-extension-os https://github.com/gardener?q=gardener-extension-networking https://github.com/metal-stack/gardener-extension-provider-m...

https://www.packet.com/developers/integrations/container-man... https://github.com/metal-stack https://knative.dev/docs/install/knative-with-gardener/


slowly killing the competition


A very basic truth that AWS understands is that you introduce a product at price X, and then either keep it at X or reduce the price moving forward. You never, ever, EVER increase prices on established products. You also basically never discontinue products, even if there's only a handful of customers still using it (ex: SimpleDB).

Apparently it's just too tempting for Google management to juice revenue in the short term, even if it screws customer trust and adoption in the longer term.


We see it again and again and again and again. I’m amazed that people still put trust into Google’s services.


Google is the worst at product management. Being Google, they don't really care, but this will push many SMBs to DigitalOcean.


I saw someone mention DigitalOcean's k8s products; right now Linode has a Linode Kubernetes Engine (LKE) beta program going on where they give you some credits to fool around with. I've been using it to learn k8s myself, and it couldn't be easier for a beginner to use.

https://www.linode.com/products/kubernetes/

I'm not even going to post or give anyone my Linode referral code, because I think their product is so good I'm happy to promote it for free, for no credit, especially if it can help someone leave Google for good.


[flagged]


> dick pic app so fuck em.

> fucking

> shit

> horseshit

> arrogant fucksteins

> crap

I'm not usually one to flout the HN guidelines, but I would suggest a read-through, and perhaps exploring different venues for venting frustrations.


[flagged]


It's not ok to bring in someone's personal history as ammunition in arguments on this site, so please don't.

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

It's also not ok to do personal attacks with abusive language, so please don't do that either.

https://news.ycombinator.com/newsguidelines.html


Welcome to HN - this is not reddit - please take a look at the comment rules: https://news.ycombinator.com/newsguidelines.html


What’s the betting line on GCP shutdown? Q3 2021? Q2?



