Licensing decision aside, I don’t feel a lot of sympathy for Hashicorp here. I think this is different than other scenarios where big MSPs sell tools based 99% on open source software.
This service will largely be used for deployments on Google Cloud, for which Google invests a lot of development effort in maintaining their own provider. It’s not like there’s not already significant contribution from Google to the code base.
This isn’t really a service. All it does is deploy an infrastructure template into your GCP project. It won’t largely be for deployments on Google Cloud. It’s for automating Terraform, and whatever providers the customer wants to use.
You, the customer, pay for everything deployed, and Google just pre-connects it to all their services for monitoring and maintenance. It would be like if Route53 on AWS was free, but it deployed a VM in your account, added gateways and nats, opened ports, etc., so that it all ran on discrete infrastructure for just you and you got charged for everything and had to do all the scaling.
If you’ve used their Apache Airflow product (Cloud Composer), it’s basically the same thing. With Cloud Composer, they are setting up an Airflow node and cluster on your behalf, in your account, that you pay for, and connecting it to their services.
This is no different than going to a consulting company and asking them to set up and maintain a Terraform automation platform in your account, which Hashicorp said was allowed. Google isn’t reselling it as a product. They’re setting it up on their platform on your behalf and giving it to you.
And there’s no reason you couldn’t switch it out to OpenTF.
They may well do, but to be honest my fundamental point extends beyond just Google. Hashicorp benefits extensively from third parties maintaining their own providers.
So from a product perspective they basically manage tfstate, you get to use “pre-packaged and recommended” providers (not clear if, say, the AWS provider is allowed), and they slapped IAM onto it? Seems like another one of those Frankenstein cloud designs…
This is definitely half baked IMO. Not a whole lot of benefits over running TF with Cloud Build and a storage bucket. You definitely have more control and a better UX that way than with the weird hoops you jump through to set this up.
There are numerous comments here about the licensing change and that this could be related, but HashiCorp announced this at Google Cloud Next (and on their own blog), so it seems like a fairly standard partnership arrangement.
- the situation: you build cloud infrastructure via CLI or console
- the problem: if something gets deleted, or if you make a bunch of changes and want to undo them, or if you need to make a change to a lot of stuff at once, or if you want to copy what you've built to a new region, how do you do it? or, if you're on a team, how do you as a team make and track changes to your cloud infrastructure?
- terraform as a solution: you describe your cloud infrastructure as config files written in terraform's HCL language (not yaml). terraform can figure out what is different between what's in your cloud infrastructure and what your config files say it should look like. and, it can make changes to your cloud to e.g. build it from scratch, make wide-ranging changes, make a copy of it, etc. (a tiny example is sketched below, after this list)
- since your config files are code, you can also create a repo and do PRs to make and track changes to your cloud infrastructure as it evolves over time
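here's roughly what one of those config files looks like, as a minimal sketch (the project, bucket name, and values are made up for illustration):

    # main.tf -- illustrative only; project and bucket names are placeholders
    provider "google" {
      project = "my-example-project"
      region  = "us-central1"
    }

    # declare the bucket you want to exist; terraform diffs this against
    # its recorded state and the real world, then plans the changes needed
    resource "google_storage_bucket" "logs" {
      name     = "example-logs-bucket"
      location = "US"
    }

running "terraform plan" shows the diff between this description and what actually exists; "terraform apply" makes the changes.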
Appreciate the answer. In hindsight my use of the word scripts was insufficient.
Looked at the TF code; my solution implements similar functionality to handle AWS CRUD ops. What I avoid is all the DSL parsing and such.
For me an AWS account is a struct with fields of AWS SDK resource types, which it seems is what TF resources map to (they handle a lot more so there's more to it, but kind of sort of if I squint just right). Either way I'm going to duplicate either the internal logic or chunks of DSL per project; I'd rather avoid the context switch between syntaxes and "learning the TF ecosystem".
What Terraform brings to the table for us is the capability of calculating the delta between "I want these resources" and "these resources are actually there", by keeping a separate state (stored e.g. as JSON in S3) to compare against your code: the world as it should be versus how it actually is. That saves us from reimplementing all of that.
Why not just write idempotent resource creation? Because Terraform also uses that state to calculate a "plan" that shows the diff between your changes and reality, which really helps to figure out what will happen to your RDS when executing, especially when more abstraction (in the form of Terraform modules) is involved.
We also used Terraform in a situation where writing custom code would have been "prettier" but would have required writing this actual-vs-desired-state logic ourselves; Terraform saved us that work.
The DSL of Terraform is sometimes quite cumbersome though, as it's derived from JSON and not an actual programming language.
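For reference, pointing Terraform at remote state like that is just a backend block; a minimal sketch with made-up bucket/key/table names:

    terraform {
      backend "s3" {
        bucket         = "example-tfstate-bucket"
        key            = "prod/terraform.tfstate"
        region         = "us-east-1"
        dynamodb_table = "example-tf-locks" # optional table for state locking
      }
    }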
To your last point, yeah, I think Terraform gets really painful when you have to do something involving derived values in a loop. Also, computed values in general don't have a great story around them (which is not necessarily Terraform's fault, but rather a symptom of what you are provisioning).
A simple example of what I mean by computed values: let's say you want to provision a k8s cluster on top of a network. The k8s provider might want the network name/id, which you could normally get by setting it upstream. The problem is you can't plan the network creation and the k8s cluster in a single pass, because you don't get the network name until it's actually provisioned. You actually need to apply the network TF first to get the inputs you need to plan the cluster. Meaning not only do you need to run TF twice, you also can't E2E plan infra provisioning.
If anyone has a solution/pattern for the above (or more generally how to chain these modules together when this limitation exists) I’m all ears
Can your example be solved by having the k8s cluster resource reference the network resource’s “name” attribute?
Doing that allows Terraform to create both resources in one plan/apply step, and it also helps Terraform understand the dependency between the resources so that they are created in the correct order.
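Something like this rough sketch (names and values are invented, and a real GKE cluster needs more configuration than shown here):

    resource "google_compute_network" "main" {
      name                    = "example-network"
      auto_create_subnetworks = true
    }

    resource "google_container_cluster" "main" {
      name     = "example-cluster"
      location = "us-central1"

      # referencing the network resource's attribute creates an implicit
      # dependency, so a single plan/apply handles both in the right order;
      # the value just shows up as "(known after apply)" in the plan
      network = google_compute_network.main.name

      initial_node_count = 1
    }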
The point of TF is working with your infrastructure declaratively. You write down what you want, how it should be integrated with each other, how IAM should be set up. And then that is what you get.
For me using terraform is quicker than clicking things up or using CLI even when initially developing things, as if something goes wrong I can just destroy the state and re-apply the terraform config.
> For me using terraform is quicker than clicking things up or using CLI even when initially developing things
then you obviously have every single resource name, required value, and relationship between them memorized, because in my experience post-facto encoding of something into TF can be valuable to the organization, but trying to _discover_ the providers, IAM, and required fields to achieve a desired outcome is crawling over broken glass compared to click-ops-ing something in place
Hell, there's even a browser extension to record the AWS calls so one can at least see what was done later for replay, but with GCP they have their own sneaky RPC something-something encoding so that idea's off the table
> then you obviously have every single resource name, required values, and relationships between them memorized
I don't.
> crawling over broken glass as compared to click-ops-ing something in place
I do not experience reading the terraform provider docs like crawling over broken glass but okay.
I program in Python too, and it's not like I have memorized every class and function either, but still, I and millions of other people somehow manage to get by.
Is there anything particularly painful about working with the Google Cloud Terraform provider? If there isn't, I would rather use OpenTF with that provider and manage state myself.
In my experience, running a plan with the Google provider is much less likely to catch a bad value than with the AWS provider.
Subjectively, the AWS provider will at least validate that fields have valid values during the plan step. The Google provider doesn't seem to validate actual values until apply, and then you get a failure.
I can’t help but feel… sad about this. I only recently picked up Terraform and am astounded that this is what goes as coding in the infrastructure world. I was coming from Ansible so there was only improvement to be had, but man did Terraform let me down so far.
It (well, the provider) doesn’t validate fields until apply. That’s just so… sad. How is that acceptable? It’s like a car without a steering wheel, and people just go along with it.
It's not really Terraform's fault. Terraform provides the capability to do all kinds of validations before running an apply, but it's up to the providers to implement the validations. If the provider doesn't implement the validation, then it's not there.
It gets hairier when you delve into the details. The provider is typically an official provider that wraps some company's API, so that company ought to have a good set of validations, since it's their own API, right? Wrong. The team that writes the Terraform provider is typically different from the team that creates API methods, and the API methods themselves don't typically expose "dry-run" style functionality, so there's little for the team writing the Terraform provider to check. Meanwhile, the business doesn't care - the Terraform provider checkbox is already checked and validations/dry-running isn't a feature that affects revenue.
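On the config side Terraform does at least expose hooks for pre-apply checks, e.g. variable validation blocks (provider-level validation is a separate mechanism in the provider's schema, which is what I mean above). A made-up illustration:

    variable "machine_type" {
      type    = string
      default = "e2-medium"

      # rejected at plan time, before any API call is made
      validation {
        condition     = can(regex("^(e2|n2)-", var.machine_type))
        error_message = "The machine_type must be an e2-* or n2-* type."
      }
    }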
Do you know how hard/tedious/pointless it is to write client side evaluations for everything you do on the server? The documentation for the Google Cloud provider is shit though and absolutely should be improved.
How is a terraform plan different from a dry run?
I always mentally mapped terraform plan == dry run to validate what changes will be made. Your comment throws a wrench into that understanding.
I just recently used it and found it significantly more verbose than the AWS provider. Which is unfortunate, because I've actually grown quite fond of GCP.
As someone who doesn't have any experience with GCP, but does have experience with AWS, I would have thought this would already be a thing.
I know AWS doesn't have managed TF support - and as a former AWS ProServe employee I know that they thought about it but didn't do it because they didn't want to step on Hashicorp's business.
AWS does have its own hosted IaC service, CloudFormation, which was introduced before TF was a thing.
Which is why it's all the more puzzling Google didn't spring for OpenTF here. They single-handedly could have proven it as the fork of choice but instead they're paying into HashiCorp's bad decision?
Google has a history of welcoming products onto their platform without taking ownership or responsibility for maintaining them. It does the same thing with Elastic, Confluent, and Redis. I'm guessing Hashicorp is deploying infrastructure on Google Cloud on the customers' behalf. It's not costing Google anything because they're not reselling it. AWS does something similar with its Marketplace, but customers prefer the AWS full-service product rather than adding it to their environment by network peering or deploying templates in their account.
Is that really relevant? CloudFormation is just another primitive at this point.
It's not much different from high-level code compiling down to machine code. The benefit of writing high level code isn't that machine code is entirely gone. The benefit is instead that as a user you can mostly forget that machine code exists.
I see a roughly even split of people using CFn, Terraform and (Python) CDK.
AWS shot themselves in the foot by making the Python version of CDK second-tier after Typescript; IaC is still done by DevOps people far more often than application people, and DevOps people use Python.
Another gripe is the number of services and new features which launch without CFn support, which also blocks CDK support; when Terraform supports a new platform feature before the vendor's own tools do, that's a sign the product teams are being driven by the wrong metrics.
And as far as TF supporting services before CFT goes: guess which is easier for an AWS employee to do - getting the CF service team to support a new service, or just contributing to Terraform's open source project?
I know of at least one service where the service team introduced the needed APIs and then an employee of AWS wrote the TF provider and contributed to the project before AWS’s own internal team added it to CFT.
Source: former AWS ProServe employee. I am not referring to myself as the author.
Terraform really drove CFN to even pretend to care about resource support. It’s typically day 1 now, but only because NAT Gateways took so long it was embarrassing for AWS.
Idk if I buy this. I’m mostly a backend service developer, but my team manages our own infra using CDK and we love it. And we’re glad they used typescript as it’s a fantastic language.
> Python version of CDK second-tier after Typescript
What? It's not second-tier at all, every single feature in Python is in sync with TypeScript, the library versions are in sync, and the docs for all the languages are auto-generated. They're not second-tier, they're 100% single tier.
IME quite a few of those auto-generated docs (mostly the examples) have slight inaccuracies, where the code snippet is a mangled mix of Python and TS.
Also, the tooling you need to drive CDK is all TS-based, which means I now need to think about NPM and Node versions occasionally, which are not relevant in any other part of my workflow.
Admittedly as someone in scientific computing I am unusually far from the JS/TS ecosystem - but all I can tell you is that it feels second tier as a user.
Presumably GitHub/GitLab support is coming, but this is quite a limited product at the moment. It doesn’t even support their own Cloud Source Repositories.
You can version the Terraform configuration, either in a public Git repository or in a Cloud Storage bucket. Use Object Versioning to version configurations in a storage bucket.
This seems likely to be a partnership between Google and Hashicorp; from what I understand, most GCP integrations like Redis are actually partnerships rather than Google just running the OSS independently. Potential license and trademark implications make it seem like a particularly bad time to try the latter with Terraform.
As others have mentioned, this is not all that useful of a service - it doesn't seem to even have the concept of a "plan", let alone any approval system, and seems to only allow for the most basic of workflows. Given that TF can already store state in a bucket with proper locking, I can't come up with any potential benefit this provides over using Terraform directly.
The main question is whether this will be improved in the future or is intentionally just "Terraform Trial Edition" with terms of partnership preventing anything encroaching on Terraform Cloud. Perhaps for the former, the trial is important to understand usage to better inform revenue sharing for a future improved product.
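For comparison, the do-it-yourself setup alluded to above is only a few lines; a sketch with made-up names (the GCS backend handles state locking on its own):

    terraform {
      backend "gcs" {
        bucket = "example-tfstate-bucket" # pre-created Cloud Storage bucket
        prefix = "env/prod"               # path prefix for the state object
      }
    }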
I was really hoping this would come with rollback support for deployments. That, to me, would be the big advantage of having gcp manage the tfstate. It doesn’t look like that’s currently supported, but maybe that’s coming in the future.
Yeah if I have multiple terraform projects in one repo it would be helpful to have a UI show deployments for each project with a list of rollouts so I can see which one I want to go back to. For SREs it might not be as helpful but if you delegate some scoped terraform module creation to the developers themselves it’d help them to have a simple UI for rollbacks
Ah yeah I see what you mean. At my last (big tech fwiw) shop as we were transitioning to GitOps, we ran into this issue a lot. My current gig we're too small for this to be a big issue for us but I remember the pain well.