After trying out most of the Kubernetes ecosystem in pursuit of a declarative language to describe and provision services, I found Nomad a breath of fresh air. It is so much easier to administer, and the Nomad job specs were exactly what I was looking for. I also noticed that a lot of k8s apps encourage the use of Helm or even shell scripts to set up key pieces, which defeats the purpose if you are trying to be declarative about your deployment.
https://kapitan.dev/ is the one-stop shop that covers true declarative configuration with jsonnet, python (kadet), or jinja, plus amazing secret management with support for gkms, awskms, gpg, and vault.
It is simpler than other tools, because you can get started without even touching jsonnet or python or anything else, when using our generators.
It does more than all the other tools combined, as it replaces helm+helmfile+gitcrypt or kustomize.
It's universal, so you can use it on non-kubernetes situations where other tools leave you high and dry.
Indeed. Helm offers great features, but it suffers from Kubernetes' unnecessary complexity and from using Go templates in YAML.
When I started with Kubernetes I converted my small Docker Compose files to Kubernetes manifests. Later I rewrote everything as Helm charts. Now there are almost more lines of YAML and Go templates than lines of business logic in my applications.
I'm considering going back to Docker Compose files. They're simple, readable, and easy to maintain.
Highly recommend trying Jsonnet (via https://github.com/bitnami/kubecfg and https://github.com/bitnami-labs/kube-libsonnet) as an alternative. It makes writing Kubernetes manifests much more expressive and integrates better with Git/VCS based workflows. Another language like Dhall or CUE might also work, but I'm not aware of a kubecfg equivalent for them.
Jsonnet in general is a pretty damn good configuration language, not only for Kubernetes. And much more powerful than HCL.
If you like those, I'd take a look at Grafana's Tanka [0]. It also uses jsonnet but has some additional features such as showing a diff of your changes before you apply, easy vendored libraries using jsonnet-bundler, and the concept of "environments" which prevents you from accidentally applying changes to the wrong namespace/cluster.
I looked at it, but I don't like it for the same reason I dislike many other tools in this space: it imposes its own directory structure, abstraction (environments) and workflow. I'm a fan of the kubecfg-style approach, where it lets you use whatever sort of structure makes sense for you and your project.
It's a 'framework' vs 'library' thing, but in the devops context.
The problem with templating YAML is that you're templating text in a syntax that is very sensitive to whitespace. By definition, Jsonnet avoids that because it operates on the data structures, not on their stringified representation.
My experience with Jsonnet varied: there's good Jsonnet code and there's bad Jsonnet code, too. Just like with any programming language, you have to apply good software engineering practices. Text-templated YAML, however, is terrible by design.
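To make that concrete, here's a minimal Jsonnet sketch (names and values are purely illustrative): the manifest is built as plain data and only serialized to JSON/YAML at the end, so there is no whitespace-sensitive text to get wrong.

```jsonnet
// Parameters live as ordinary data...
local app = {
  name: 'web',
  image: 'nginx:1.19',
  replicas: 3,
};

// ...and the manifest is just an expression over that data.
{
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: { name: app.name },
  spec: {
    replicas: app.replicas,
    selector: { matchLabels: { app: app.name } },
    template: {
      metadata: { labels: { app: app.name } },
      spec: {
        containers: [{ name: app.name, image: app.image }],
      },
    },
  },
}
```

Rendering it with something like `jsonnet deployment.jsonnet` (or via kubecfg) produces plain JSON that the API server accepts as-is.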
ytt is even more templating-yaml-with-yaml, so it all ends up being a bargain bin Helm. There's no reason to do this over just serializing plain structures into YAML/JSON/...
I'm aware, but I just don't understand the point of it. How is it in any way better than a purpose-specific configuration language like Dhall/CUE/Jsonnet? What's the point of having it look like YAML at all?
https://kapitan.dev/ is the one-stop shop that covers true declarative configuration with jsonnet, python (kadet), or jinja, plus amazing secret management with support for gkms, awskms, gpg, and vault. It can also render helm charts!
* https://github.com/kapicorp/tesoro a secret “webhook controller” to seamlessly handle Kapitan secrets in your cluster. Better than sealed-secrets because there is no need to convert secrets, and it supports KMS backends like Google and AWS together.
Get started with our blog: https://medium.com/kapitan-blog or join our kubernetes slack on #kapitan
It is. But k8s has no convenient way of parameterizing releases that can beat Helm. A simple stateless application needs:
- a deployment
- a service
- an ingress
- a config map (or several)
- a secret (or several)
It's even worse for stateful applications.
And each of the resource definitions is 60% boilerplate, 35% application-specific and 5% release- or environment-specific.
Helm would probably be a nice and neat tool if it had stopped at maintaining a simple map of variable names to values. But since applications need things like "if the user said SQLite, add a PVC, a ConfigMap and a Secret and refer to them in the StatefulSet; if she said Postgres, go pull another chart, deploy it with these parameters, then add this ConfigMap and this Secret and refer to them in the StatefulSet", Helm is an overcomplicated mess.
Sometimes I even wish they could embed a JavaScript interpreter... After all, YAML is almost equivalent to JSON, and the perfect templating language for JSON is, frankly, JavaScript.
Otherwise people have to keep inventing half-baked things.
> Helm charts are declarative way of deploying app(s) and their accompanying resources.
How do you make helm chart deployment declarative? `helm install` is not declarative (in my understanding `kubectl apply` is declarative and `kubectl create` is not. Let me know if my understanding of declarative is wrong). Thanks.
This is one way of doing declarative deployment using Helm, but it doesn't guarantee resource creation order. `helm install` creates resources in a defined order, while `kubectl apply` creates them in alphabetical order of filename. This may lead to some unexpected issues.
Maybe I was doing it wrong, but every guide was verb-based - “helm install X”. My declarative ideal ended up being a text file full of helm install commands, and that wasn't what I wanted.
Quick-start guides take the easiest path to get something running, which is `helm install` in the Helm world.
If you want complete control of what you're pushing to the API, use Helm as a starting point instead: run `helm template`, save the YAML output to a file, and publish it using `kubectl` or some other rollout tool. I recommend using `kapp` [1] for rollouts.
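As a rough sketch of that flow (Helm 3 syntax; the release, chart and file names are placeholders), render first and apply second, so the manifests themselves stay declarative and diffable in git:

```sh
# Render the chart to plain manifests without touching the cluster...
helm template my-release ./mychart --values values.yaml > manifests.yaml

# ...review/commit them, then apply idempotently.
kubectl apply -f manifests.yaml
```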
Great writeup, even if I'm longing for more details about things like Unimog and their configuration management tool.
Pay close attention to the section where they described why they went with Nomad (Simple, Single-Binary, can schedule different workloads and not just docker, Integration with Consul). Nomad is so simple to run that you can run a cluster in dev mode with one command on your laptop (even MacOS native where you can try scheduling non-Docker workloads). I'd go so far as saying that it would even pay off to use Nomad as a job scheduler when you only have one machine and might have used systemd instead. You can wget the binary, write a small systemd unit for Nomad, then deploy any additional workloads with Nomad itself. By the time you have to scale to multiple machines you just add them to the cluster and don't have to rewrite your job definitions from systemd.
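If you want to try that single-machine setup, a unit file along these lines is about all the glue you need (paths are illustrative); everything else then gets scheduled through Nomad itself:

```ini
# /etc/systemd/system/nomad.service (sketch)
[Unit]
Description=Nomad agent
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target
```

From there, `systemctl enable --now nomad` and `nomad job run` handle the actual workloads.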
The biggest hurdle with adopting nomad is the kubernetes ecosystem and related network effects. Things like operators only run on Kubernetes, and they're driving an entirely new paradigm of infrastructure and application management. HashiCorp is doing their best to keep up while supporting standard kube interfaces like CSI/CNI/CRI, but I don't know how they can possibly stay relevant with Kubernetes momentum.
In my opinion, HashiCorp should look at what Rancher did with K3s and offer something like that, integrated with the entire Hashi stack. The only reason most people choose Nomad is its (initial) simplicity (which quickly goes away once you realize how "on an island" you are with solutions for ingress etc). Deliver kube with that simplicity and integration and it's a much more compelling story than what Nomad delivers today.
This is what Nomad is, though... It works without their other products and also natively integrates with them. Deploying Vault, Consul, and Nomad gives you a very nice experience.
Also, with Consul 1.8 and Nomad 0.11 you'll get Consul Connect with ingress gateways, which solves some of the problems you mentioned.
Please make an attempt to understand a technology before making comments like this. Nomad has no requirement for a KV store, nor for secrets. It's an entirely separate thing from Consul and Vault; it simply integrates with Consul and Vault. It even runs its own raft store, so each product's backend is totally separate (Vault dropped its reliance on other backends as of 1.4 and can run its own raft store now).
Nomad template stanzas are simply consul-template (https://github.com/hashicorp/consul-template#plugins), so you could use `{{ with secret "" }}` and never touch Consul. You also have every consul-template function apart from `service` and the `key*` family (which require Consul). You could build pretty static or dynamic configurations using these blocks without ever touching Consul. On top of that, both Terraform templates and Levant work well for templating the job specs themselves, which contain template stanzas in them.
An example of something helpful: if you wanted to drop a very small binary that changes with each deploy, you could use https://github.com/hashicorp/consul-template#base64decode and just change the contents each time you deploy the job.
If you wanted to use redis keys in your consul-template, simply drop a binary on each server as a plugin to consul-template, then for example: `{{ "my-key" | plugin "redis" "get" }}`.
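A minimal sketch of what such a job could look like (the secret path, key names and encoded blob are made up; it assumes the cluster is wired up to Vault, while Consul never enters the picture):

```hcl
# Sketch only: a Nomad template stanza driven by consul-template functions.
job "render-demo" {
  datacenters = ["dc1"]

  group "app" {
    task "app" {
      driver = "exec"

      config {
        command = "/bin/cat"
        args    = ["local/app.conf"]
      }

      # Gives the task a Vault token so {{ with secret }} works.
      vault {
        policies = ["myapp-read"]
      }

      template {
        destination = "local/app.conf"
        data        = <<EOH
{{ with secret "secret/data/myapp" }}
db_password = "{{ .Data.data.password }}"
{{ end }}
embedded_blob = "{{ base64Decode "aGVsbG8gd29ybGQ=" }}"
EOH
      }
    }
  }
}
```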
Why not try running it yourself?
`nomad agent -dev` will get you a server running, then `nomad job init` will give you a full spec which you can run with `nomad job run example.job`.
I understand nomad, and what you're saying is not how anyone runs it. Directly from their documentation:
"Nomad schedules workloads of various types across a cluster of generic hosts. Because of this, placement is not known in advance and you will need to use service discovery to connect tasks to other services deployed across your cluster. Nomad integrates with Consul to provide service discovery and monitoring."
So one of the very basic features of an orchestrator, service discovery, requires Consul. Sure, I can use it without Consul to just start jobs that don't communicate, but obviously that's not the normal use case, and you can see that by looking at their issues list.
So basically, when CloudFlare was making the decision to adopt Nomad, they had already adopted Consul and had already built in-house a custom scheduler (unimog) for their customer traffic.
It's rather disingenuous to compare to Kubernetes-based installations by pointing out that Nomad is a single Go binary that's easy to install, because they're also running Consul and unimog. For what it's worth, kubelet (which schedules jobs, like Nomad) is also a single Go binary that's easy to install, as is kube-proxy (which helps services running on a node to send traffic to the right node, so, roughly analogous to Consul).
If anything, the author practically makes the case against Nomad. Practically speaking, the only reason why CloudFlare adopted Nomad is because the stars aligned on their stack. Most companies will actually reduce complexity by adopting Kubernetes instead, compared to running two different schedulers for jobs and externally-managed service discovery.
I agree with everything you said, but it’s important to note that the blogpost isn’t comparing nomad to K8s, but describing why nomad makes sense for cloudflare.
From cloudflare’s perspective, K8s is a riskier choice, precisely because it’s large and multifunctional. If Nomad (or Consul) upstream went south, CF would probably be able to maintain it without seriously growing their headcount. And they’d probably be playing the lead role in that ensemble.
If kubernetes upstream went south, the above statements would almost certainly not hold.
For them, picking a stack means picking something they need to stick to, potentially for the next decade or two. When your decisions are made with that long term in focus, the biggest, shiniest, and most exciting community is not always the right choice.
Like you said, it seems like this is the right choice for CF. It’s probably not the right choice for $TECH_STARTUP, for whom pinning the future of business critical infrastructure to Google and RedHat’s stewardship of K8s is the least of their concerns, and that’s OK.
Having worked extensively with both the k8s and HashiCorp stacks, let me tell you that there is something to be said regarding the simplicity of the latter.
First - companies will want service discovery for all managed resources. Consul is just outstanding, so there’s that.
On-prem, nomad is simple to manage and it gives you a form of freedom that k8s locks up in a black box.
After all, it’s a process scheduler, go ahead and schedule what you want.
A container, raw exec or batch job. Add parameters to the batch job and dispatch them at will.
If you start from scratch or can move absolutely everything to k8s? Well sure, but not a lot of places you'd call "enterprise" are in that position.
It's often _distributed_ as a container, but it ~~is~~ was a single monolithic go binary. However, while I was looking for supporting evidence to that claim, I discovered that it recently[1] was evicted from the github.com/k/k repo and the docker image that is now built uses a shell script named hyperkube and the Dockerfile just untars the underlying binaries into /usr/local/bin
I wasn't able to immediately track down the alleged new repo for hyperkube, though
Do people who aren't at Cloudflare scale really see the need for Kubernetes and/or Nomad?
Of the two Nomad seems much more sane because it does one thing only and is much simpler to manage and deploy.
That said, having used it, we are mostly moving away from it. Consul + Docker/Docker Compose with systemd in a "service per VM" model has proved much easier to administer at our scale (a couple of datacenters, around the 1k VM mark). It is actually so simple that it is boring. Instead of fiddling with infrastructure, the developers spend time solving business problems...
Our stack is really boring; developers can find their way around Ansible and share Galaxy roles to configure / provision infrastructure... Other cloud-native projects like Prometheus and Fluentd give us all the visibility we need in a very straightforward and boring way.
From my personal experience, yes, but not just for scale. I've used Kubernetes in a 300k person organization and now in a 35 person organization. Kubernetes isn't the simplest solution, but it's just reliability in a box. You can control the network, and the servers, and the load balancing all through the API which means for most things I can install a fully clustered solution with a handful of commands. Set the number of instances I want to run and forget the whole thing exists. Suddenly I'm cloud agnostic by default and the underlying K8s management is handled by the cloud provider.
In the 300k person org it was for scale and reliability. In the 35 person org it's so that 1 guy can manage everything without trying.
I see. My last work experiences have been introducing devops workflows into companies that already had a decent tech body (~200 people) and the traditional divide between IT/Dev.
I think the approach I mention ends up being a good compromise from both sides. If I were starting a company from scratch I would certainly consider it differently.
Imo kubernetes isn't about "scale". Folks like cloudflare often run a very limited number of services (proxy plus control plane), which can make features like quotas/affinities/limits less useful because 100% of the machine installed in the pop is dedicated to one process.
I see kubernetes as more useful when organizationally you have more heterogeneous services and separate dev/sre groups. This allows them to divide up responsibilities more easily.
I have a set of developers that can focus on their code and not have to worry about the infrastructure, TLS certs, DNS, storage or whatever. That all gets abstracted away.
For SRE, that gives one a common set of orchestration tooling one can develop against.
Interesting, I went the other direction recently, from systemd to Nomad.
I was motivated by a move away from config management in favor of commands wrapping Nomad API calls. For a "devs on-call" model this was preferable to being gatekeepers of PRs against config management.
To glue the whole thing together I got Consul Connect going in the Nomad jobs, so service config complexity was comparable to docker-compose.
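A sketch of the shape that took (group-level service with a Connect sidecar; the job name, image and port are made up for illustration):

```hcl
job "api" {
  datacenters = ["dc1"]

  group "api" {
    # Connect sidecars require bridge networking for the group.
    network {
      mode = "bridge"
    }

    # The service (and its Envoy sidecar) is registered in Consul for us.
    service {
      name = "api"
      port = "8080"

      connect {
        sidecar_service {}
      }
    }

    task "api" {
      driver = "docker"

      config {
        image = "example/api:1.0"
      }
    }
  }
}
```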
Saying this out loud makes me realize it was the organizational model I was pursuing that led me here (so called "production ownership"). And I'm not a big config management fan ;). I take it you have dedicated Ops teams or Devs willing to learn Ansible well?
I'm surprised Hashicorp hasn't repositioned this product.
Terraform was a huge breath of fresh air after Cloudformation. If you ask me, deploying apps via k8s is even better.
Everyone wants a free lunch and for me, momentum + cloud vendor support + ultimately the Nomad Enterprise features that come free with K8s made the choice easy.
> Terraform was a huge breath of fresh air after Cloudformation.
Humans shouldn't write CloudFormation templates. If you're doing that, here is an important free clue: use CDK.
One advantage of CDK over Terraform is that the 'state' is the CloudFormation stacks themselves as opposed to Terraform's state system. Another is that you get a real programming language (your choice of several) using CDK.
TF can still often be suffocating. Pulumi runs circles around it when it comes to the user experience of writing complex, modular, composable and reusable configurations.
By going with a "fork and patch" model, rather than shelling out to the existing terraform providers, they are putting themselves in a constant race to overtake TF's popularity.
To say nothing of the more esoteric providers that one can cheaply build and place in the `.terraform.d/plugins` directory and off you go, in contrast to trying to find the dark magick required to use some combination of https://github.com/pulumi/pulumi-terraform-bridge and https://github.com/pulumi/pulumi-tf-provider-boilerplate but ... err, ... then one has to own that new generated pulumi provider code? So like a fork but worse? Maybe they should make a bot that tracks the terraform-provider topic on GH and uses their insider knowledge to generate pulumi providers for them.
Don't get me wrong: I anxiously await the death of TF, but until there is a good story to tell my colleagues about an alternative to it, they'll continue to use it.
Except now you have to deal with javascript. It's hard enough to turn some operators towards an IaC approach, but throwing Javascript at them isn't going to help.
Pulumi is so good (even if it is just typed-SDK's over Terraform). Only sane approach to IaC IMO, using an actual programming language with types and IDE integration.
It’s also not typed SDKs over Terraform. The best Pulumi providers are native and offer a much richer experience than the wrapped Terraform providers (see the Kubernetes support for example). I for one look forward to those rolling out across the board!
They’re not interchangeable - kubernetes-x builds higher-level abstractions on top of the pulumi-kubernetes API, which is faithful to the OpenAPI spec supplied by Kubernetes.
To my mind, the real power of having general purpose languages building the declarative model instead of a DSL is that libraries like k-x (or indeed AWS-x) can actually be built!
Speaking as a 25-year sys/netadmin who writes a fair bit of code too: nearly anything.
I'd rather learn Go than deal with JavaScript. Python would be fine. Elixir would be great (I'd much prefer Erlang but there are only so many miracles I'm allotted in this lifetime). Perl, please.
Pulumi has support both for Python and Go along with TypeScript and JavaScript.
I never understand it when people say operators don't like programming languages. It always seems weird to me that a person who constantly works with computers would prefer a static markup language to a full programming language for managing the complexity of operating software in production environments.
My original comment was about this attitude. I personally think programming languages are much better than markup languages for almost everything. If you are presenting static content then markup languages are fine but if you're doing anything more than presenting static content then YAML just makes no sense.
A full programming language gives too much flexibility. In particular, that includes the flexibility to shoot oneself in the foot.
A person who constantly works with computers is painfully aware of people's ability to screw things up through an honest mistake when programming them. The more complex the thing, the easier it is to make a mistake, because human attention is finite.
Whenever you can limit the language to the task at hand, and build in guard rails that would prevent you from doing things you should not be doing, it's usually a productivity win, even if it precludes implementation of 3% of some most advanced and complex solutions. Reliability and simplicity go hand in hand, and reliability is often the thing devops people value very highly.
Having worked on several declarative config systems at a big tech company that tried to provide these guard rails: they erode!!
And it makes sense. As we demand more flexibility to not repeat ourselves (DRY), we get clever and add little features.
Suddenly you can stick an entire cluster config (or several!!) inside a lambda that returns the declared resources... and you've gone straight to hell. You're effectively Turing-complete by then, and engineers are reinventing conditionals and loops with lambdas all over your config codebase.
Might as well have just started and left it at Python/etc.
I can build DSLs (domain specific languages) with programming languages. I can't build DSLs with YAML. And when someone says "You will use YAML because you can't be trusted with a programming languages" then I find that very disrespectful and condescending.
The solution to finite attention isn't less powerful tools. The solution is more powerful tools that augment operators and let them extend their finite attention spans into longer time horizons.
How is your description and viewpoint different from what I just described?
Cloudflare, make sure you upgrade to 0.11.3, the new scheduling behavior is awesome for large clusters.
Also, a massive warning to anyone wanting to use hard CPU limits and cgroups (they do work in Nomad, it's just not trivial): they don't work like anyone expects and need to be heavily tested.
There's been a ton of improvements over time. That doesn't make it any less foot-gun-y, unfortunately.
I had the exact same problem with Kubernetes (and even just straight up containers), I just want to make sure folks are extremely aware of how big of a footgun it is, and to really, really, really test it well.
One point I never get for companies operating their own hardware: if your problem was having a number of well-known servers for the internal management services, and you then move to a Nomad cluster or Kubernetes to schedule them dynamically, you end up with the same problem as before for scheduling the well-known Nomad servers or Kubernetes masters. So is the only advantage here that the Nomad server images update less often than the images of the management services?
Not associated with CloudFlare but I've built similar stuff with Nomad.
The cost/maintenance trade-off works when you have more SPOF management hosts than Nomad servers (5). You decrease host images down to two (Nomad server and client) versus N management images.
Though it does sound a bit like they're using config management rather than pre-built images.
Bonus: Nomad servers are more failure-resistant, thanks to Raft consensus, versus any N management hosts. And for discovery, I found the optimal pattern is to put all of the Nomad servers behind a "cluster" A record for clients to easily join (the pattern works well for Consul too).
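Roughly what that looks like on the client side (the DNS name is made up; the point is simply that it resolves to every server's address):

```hcl
# Sketch of a Nomad client config joining via a single "cluster" record.
datacenter = "dc1"
data_dir   = "/opt/nomad"

client {
  enabled = true

  server_join {
    # One hostname resolving to all Nomad servers; new clients
    # (and replaced servers) need no per-host configuration.
    retry_join = ["nomad-servers.example.internal:4647"]
  }
}
```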
> "The Service is offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as a part of a Paid Service purchased by you, you agree to use the Service solely for the purpose of serving web pages as viewed through a web browser or other functionally equivalent applications and rendering Hypertext Markup Language (HTML) or other functional equivalents. Use of the Service for serving video (unless purchased separately as a Paid Service) or a disproportionate percentage of pictures, audio files, or other non-HTML content, is prohibited."
In other words, as soon as you start using significant CDN bandwidth for images, video, audio, etc. they will contact you and ask you to upgrade your account.
Thank you! The answer is from 2016, and yet this is the first time anyone has posted it, despite similar questions being raised on HN over the past few years.
There are people with free accounts moving GBs every month through their network and I imagine those free users must account for a very large percentage of their traffic.
I’d imagine at this point they are heavily peered in most markets, driven by said free users, so there isn't a significant opex hit bandwidth-wise. Space/power opex plus network/compute hardware capex probably dominates their spend.
So what you're saying is that Cloudflare is so big they can reach these sort of deals with connectivity providers where they don't pay for bandwidth themselves?
Well, to a certain extent. I didn't mean to imply that their monthly spend for bandwidth is zero -- I'm sure they aren't anywhere close to peering 100% of their traffic, they aren't in the DFZ, and, of course, they've gotta pay somebody (or, more correctly, several somebodies) to connect all of those datacenters together!
---
Long answer:
About six years ago, they described their connectivity in "The Relative Cost of Bandwidth Around the World" [0]:
> "In CloudFlare's case, unlike Netflix, at this time, all our peering is currently "settlement free," meaning we don't pay for it. Therefore, the more we peer the less we pay for bandwidth."
> "Currently, we peer around 45% of our total traffic globally (depending on the time of day), across nearly 3,000 different peering sessions."
Remember, that was about six years ago. I wouldn't be surprised if both their peers and peering sessions have increased by an order of magnitude since that article was published -- just think of all the datacenters that they're in today that weren't back then, especially outside of North America!
Additionally, they've got an "open" policy when it comes to peering, as well as a presence damn near everywhere [1,2]. Since they're "mostly outbound", the eyeball networks will come to them, wanting to peer.
Running an anycast network and being "everywhere" also has some other benefits. They perform large-scale "traffic engineering" -- deciding which prefixes they advertise where, when, and to whom, with the freedom to change that on the fly -- so they've got tremendous control over where traffic comes into and, perhaps more importantly, exits from their network (bandwidth is ~15x more expensive in Africa, Australia and South America than in North America, for example).
So, yes, CloudFlare is still paying for transit but, at their level, it's relatively "dirt cheap". Plus, in addition to the increases mentioned above, bandwidth is likely an order of magnitude cheaper -- at least -- than it was six years ago!
---
EDIT:
Two years later, in August 2016, CloudFlare published an update [3] to the article linked above. A few highlights:
> "Since August 2014, we have tripled the number of our data centers from 28 to 86, with more to come."
> "CloudFlare has an “open peering” policy, and participates at nearly 150 internet exchanges, more than any other company."
> "... of the traffic that we are currently able to serve locally in Africa, we manage to peer about 90% ..."
> ".... we can peer 100% of our traffic in the Middle East ..."
> "Today, however, there are six expensive networks that are more than an order of magnitude more expensive than other bandwidth providers around the globe ... these six networks represent less than 6% of the traffic but nearly 50% of our bandwidth costs."
"... here is the CPU usage over a day in one of our data centers where each time series represents one machine and the different colors represent different generations of hardware. Unimog keeps all machines processing traffic and at roughly the same CPU utilization."
Still a mystery to me why "balancing" has SO MUCH mindshare. This is almost certainly not the optimal strategy for user experience. It is going to be much better to drain traffic away from older machines while newer machines stay fully loaded, rather than running every machine at equal utilization factor.
I'm an engineer at Cloudflare, and I work on Unimog (the system in question).
You are right that even balancing of utilization across servers with different hardware is not necessarily the optimal strategy. But keeping faster machines busy while slower machines are idle would not be better.
This is because the time to service a request is only partly determined by the time it takes while being processed on a CPU somewhere. It's also determined by the time that the request has to wait to get hold of a CPU (which can happen at many points in the processing of a request). As the utilization of a server gets higher, it becomes more likely that requests on that server will end up waiting in a queue at some point (queuing theory comes into play, so the effects are very non-linear).
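For a rough sense of how non-linear that gets, the textbook M/M/1 result (purely illustrative here, not a model of any particular server) for the mean time a request spends in the system is:

```latex
% lambda = arrival rate, mu = service rate, rho = lambda / mu (utilization)
W = \frac{1}{\mu\,(1 - \rho)}
```

so waiting time grows roughly hyperbolically as utilization approaches 1, which is why keeping every server at moderate utilization can beat keeping the fastest servers busiest.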
Furthermore, most of the increase in server performance in the last 10 years has been due to adding more cores, and non-core improvements (e.g. cache sizes). Single thread performance has increased, but more modestly.
Putting those things together, if you have an old server that is almost idle, and a new server that is busy, then a connection to the old server will actually see better performance.
There are other factors to consider. The most important duty of Unimog is to ensure that when the demand on a data center approaches its capacity, no server becomes overloaded (i.e. its utilization goes above some threshold where response latency starts to degrade rapidly). Most of the time, our data centers have a good margin of spare capacity, and so it would be possible to avoid overloading servers without needing to balance the load evenly. But we still need to be confident that if there is a sudden burst of demand on one of our data centers, it will be balanced evenly. The easiest way to demonstrate that is to balance the load evenly long before it becomes strictly necessary. That way, if the ongoing evolution of our hardware and software stack introduces some new challenge to balancing the load evenly, it will be relatively easy to diagnose it and get it addressed.
So, even load balancing might not be the optimal strategy, but it is a good and simple one. It's the approach we use today, but we've discussed more sophisticated approaches, and at some point we might revisit this.
Thanks for the detailed reply. It would be interesting to see your plots of latency broken down by hardware class and plotted as a function of load. I'd be pretty surprised if optimal latency was achieved near idle, since in my experience latency is a U-shape with respect to utilization: bad at 100% but also bad at 0% since it takes time to wake resources that went to sleep.
I'm sure your system has its benefits, I just get triggered by "load balancing" since it is so pervasive while also being a highly misleading and defective metaphor.
Actually, I think K8s is much simpler to operate than Nomad.
With Nomad, you have too many options: you can run your service/job as a process or within containers. With Kubernetes there is only one way to do things -- containers -- which is really a simple choice to make.
Besides, Kubernetes has etcd built in, so I don't have to deploy and maintain Consul.
Last but not least, I still see containers mysteriously gone and have no idea how Nomad did that. With Kubernetes, such a thing never happened.
I love nomad and architected a large system that used it for several years. We are finally moving away from it, however, because of its lack of native support for autoscaling. I know there are third party solutions for it, but that doesn't work for us. I suspect a large number of k8s users could use nomad instead with way less overhead.
As you probably know, decisions like changing infrastructure architecture are team decisions, made with the information available at the time. The Nomad autoscaler just wasn't announced/released in time for us to seriously consider it. That, coupled with the uncertainty around the future of Fabio, made us look elsewhere. We remain staunch supporters of HashiCorp products overall. We will continue to use Terraform, Consul and a little Vault.
I hope they allow users to access documentation for past versions of Nomad. Currently there is no easy way to find out whether a particular configuration option mentioned in the documentation is available or not.
Among other things, Nomad lets you:
- have a restart policy outside of what the docker daemon offers
- automatically schedule across multiple machines based on constraints including affinity and anti-affinity, bin packing etc
- “dispatch” jobs
The Terraform Docker provider effectively provides a tiny subset of the functionality of Nomad, which provides similar functionality across many drivers, not just Docker.
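For anyone who hasn't seen one, a minimal Nomad job spec covering those points might look roughly like this (all values are illustrative, not a recommended config):

```hcl
job "web" {
  datacenters = ["dc1"]

  # Placement constraint: only schedule on Linux nodes.
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  group "web" {
    count = 3

    # Restart policy handled by Nomad, not the docker daemon.
    restart {
      attempts = 2
      interval = "5m"
      delay    = "15s"
      mode     = "fail"
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:1.19"
      }

      resources {
        cpu    = 200
        memory = 128
      }
    }
  }
}
```

The same job shape works for the `exec`, `java`, `qemu` and other drivers; only the task's `driver` and `config` blocks change.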
I worked at a hedge fund that also used nomad. The problem, however, is not how well it scales or whatever, but the fact that all the accompanying deployment info and literature is for kubernetes, and k8s has far more features.
I like the quality of products from HashiCorp, but k8s is far, far, ahead of where nomad is.
What I really want is better integration for Terraform and kubernetes. The current TF k8s support leaves much to be desired, too many things are missing or broken and I find there are several bugs that result in deployment flapping (i.e., constantly re-deploying the same thing when there are no changes).
That's a fair point, I guess it depends on your use case. The risk, however, is that the powers that be at HashiCorp one day decide to abandon Nomad once they realize it will never be a profit centre for them.
It would be very odd if he said the opposite, since customers could be spooked. But saying it's going to be around is very different from saying it will always have a lot of resources on it going forward, especially compared to their cash cows of consul and vault.
"standing by" until they're not. If their VCs start applying pressure to dump the unprofitable projects and seek more profits, their tune will change in an instant. At the end of the day these are for-profit entities, and the only thing that matters is profit.
The only reason they're pursuing this strategy is because they think they can get a piece of the kubernetes market. If that doesn't pan out, they will dump nomad like a bad habit.
For-profit companies aren't open source charities.
Nomad is open source (or, at least, a significant subset of it is). Anyone is able to continue to improve it, even if Hashicorp is no longer paying people to work on it.
That's not enough; Basho (creators of the Riak database) also made it open source. In fact, after they went out of business, Bet365 even purchased their proprietary code and made it open source, but the database is still considered dead.
I used to do sales for an enterprise "open source" software product, so I get it, but the truth is as soon as someone stops paying people to keep the project going it will die.
Seems like some projects survive their parent company abandoning them like illumos (successor to opensolaris). It's not common, but also not impossible.
Not at all. HashiCorp literally pays for all development of Nomad. They account for nearly all commits, short of a small number of PRs. Kubernetes commits are from a wide array of companies, and Google is only one.
And just as with Kubernetes and Google, Nomad development can continue outside of Hashicorp if Hashicorp no longer decides to support it. Which org has a longer track record of deprecating almost everything they release? Not Hashicorp, and frankly, I’ll always trust Hashicorp versus Google based on the historical behavior and forward incentives of both.
It's not quite the same. Hashicorp controls whether PRs get merged. Google does not control whether PRs get merged into kubernetes. There's a long list of companies that do, including IBM, redhat, Huawei, etc. Sure, you can fork it, but now you have a separate repo that requires people to know about it.
Crossplane https://crossplane.io might be what you’re looking for with its bunch of controllers and a nice composition API.
One of the best features is that you can bundle a CR to request a MySQL database and it will be satisfied with whatever config is in your cluster, so the app only declares the need and doesn't care how it's done.
Disclaimer: I’m one of the maintainers of Crossplane.
I think that'd be a bit challenging because Config Connector is highly opinionated; for example, a Kubernetes Namespace corresponds to a GCP project. Though it might be possible to use it as part of a Composition once we support namespaced CRs as composition members.
Every Config Connector resource can be annotated with the GCP project the resource should be created in. Namespace-to-GCP-project mapping is encouraged but not enforced; you are still free to create resources in multiple projects from a single namespace, as well as in a single project from multiple namespaces.
It isn't exactly declarative, nor really imperative; it's mostly but not entirely idempotent; and, worst of all, it has faux state that may or may not reflect the actual state of the services.
I used to love Terraform, but it broke my heart. Now we're done.
Which features is Nomad missing? Feature count comparisons are meaningless unless the features are tied to actual important use cases. Lots of software is encrusted with rarely used features that just add complexity.
So generally it's somewhat useless in organizations where multiple different teams should be able to coexist on a cluster without stepping on each other's toes, or even where you want a CI system to access the cluster in a safe manner.
We fully intend to migrate more features to OSS in the future -- especially as we build out more enterprise features. As you can imagine building a sustainable business is quite the balancing act, and there's constant internal discussion.
This is going to make our AI team very happy because they can just dump experiments into the cluster at low-priority so those'll be done when those are done.
It's also going to make the operators very unhappy because it'll be harder to monitor actual memory utilization (allocated memory vs memory really in use) in order to plan cluster extensions. Are there some tools around or work planned to make this kind of scaling and utilization easier?
I was going to say "yes!" because we do have some telemetry/metrics improvements coming up, but then I realized those won't address your use case. It seems like aggregating cluster resource usage by job priority is what we need for that. Our UI might be able to calculate that, but I don't think our metrics include job priority so there's no way for your existing tooling to display it.
Please file an issue, or link me to an existing issue if you have time. This seems really compelling!
For my personal use I don't need them, and for a business, Enterprise won't break the bank. You'd be surprised how much your management might be interested in having an escalation path if everything goes south and you need a hot fix. I can vouch that Enterprise support is worth it if you're on a small team that can't spend all day on this stuff.
We run everything in Nomad for all our teams, but we don’t use any of the features you mention, and it’s not causing issues. We wrote some metrics for how much memory all the allocations are claiming, and we throw another EC2 instance in the cluster when we’re running out. Works pretty well. Preemption seems like a nice feature, but I imagine getting teams to coordinate their preemption values would be a political nightmare.
For me the ability to squeeze in some jobs in between the cracks at best effort priority is crucial (ie. any sort of high latency batch processing / experiments). I wouldn't want these extremely low priority jobs to compete with, I don't know, an actual customer facing service.
Also, we run on bare metal, so there really isn't a way to request extra capacity within seconds.
Nice! It's definitely been one of the bigger holes.
I hope you'll also consider sequential tasks in a single job file. Without that, it's kind of awkward because you have to pass shell scripts around just to run a setup step before your actual workload.
For us it's not a feature missing, but the mindshare is low, and there's not really any prebuilt examples of how to run and maintain services long term reliably.