The Dhall Configuration Language (dhall-lang.org)
232 points by bjourne on July 14, 2022 | 188 comments



I work on a deployment tooling team. In 2019/2020 we did a deep dive into Dhall vs. Jsonnet^1 for standardizing config and kubernetes templating across my company (Zendesk). We ended up going with Jsonnet (although some Dhall evangelists in the company have kept the dream alive!), which I think is a more approachable language for many, but Dhall has a lot of cool features and good things going for it.

Jsonnet is far from perfect, but it gets the job done pretty well and has been relatively easy for engineers across the company to pick up.

That said, after a 3+ year journey, the shortfalls in our original design have become more noticeable and we're giving more thought to writing a more robust tool using a more traditional language like Go to solve some of our configuration/deployment and data templating problems.

Cue^2 is something I'm keeping an eye on these days as well.

[1]: https://jsonnet.org/

[2]: https://cuelang.org/


CUE's author created the Borg Configuration Language (BCL), which has been around since 2008.

BCL is the third-largest body of human-written code in Google's internal codebase. In April 2019, there were 180M lines of BCL, while C++/Java sat at ~300M.

The scale at which BCL configuration is used is probably beyond any other infrastructure-as-code use case known to humankind.

And the lessons and ideas from more than a decade of that are manifested in CUE.

Personally, this is enough to convince me to comfortably ignore anything else on the market.


> In April 2019, there were 180M lines of BCL, while C++/Java sat at ~300M.

It's... not obvious to me that that's a good thing? The ratio of configuration code to code in the things being configured being that close makes me think that BCL is something that's ill-suited to what it's now being asked to do but there's too much of it to realistically replace.

Maybe it's amazing and the problem it's solving is so complicated that even a language designed very well for solving that problem leaves you needing a lot of it, but I don't really consider "a staggeringly large amount of code has been written in this language" to necessarily be an endorsement of that language's quality. Citing Google's 300M lines of Java would also be a silly reason to pick Java in a project where you'll never interact with the Google ecosystem.


It's the configuration complexity clock again. https://mikehadlow.blogspot.com/2012/05/configuration-comple...


This is a really good mental image of what happens - thank you for sharing it.


The number of lines of BCL at Google says little-to-nothing about the efficacy of the language itself; it's more a reflection of the complexity and scale of the _systems_ it's used to configure.


BCL is great; it's just showing its age.

But the point is that this large-scale application offered room to explore configuration as code, infrastructure as code, and many related technical problems in designing and deploying configuration.


Is it a good language? Unknown.

Can it be used to successfully build and maintain configuration at extremely large volume and complexity scales? Yes.

Also we're not talking about using this language, but its spiritual successor.


> Is it a good language? Unknown.

I have personally written several thousand lines of GCL (the generic version of BCL used at Google) and I can say that it can be pretty frustrating.

The difficulty and complexity of defining configurations using it really depends on the system you are configuring since you are (generally) just defining a set of static fields that are packaged into a protobuf and fed into whatever system you are working with.

Outside of syntax issues, it's up to the system you are configuring to provide concise config semantics and helpful error messages.


Without knowing anything about Cue: just be careful about choosing a technology only because it scales to the largest infra on earth; that doesn't mean it scales equally well for smaller deployments.


For the record, when I was an SRE at Google, most people I talked to hated BCL with a passion.


You hate BCL because nobody loves it. It's like the poor kid supporting the prodigy that is Borg: it actually carries the Borg ecosystem on its back, while the prodigy grabs all the acclaim.

That's just typical of almost all human affairs. Some people work hard in support roles but get 20% of what they deserve, while a few get 800%.

As for BCL, it has had literally no investment since 2010, yet it still reliably marches onward with little support, a situation in which 99.99999% of software would simply fade into oblivion. That just shows BCL's power and strength.

In terms of the importance of any single piece of software to Google's success, the 120k lines of C++ in borgcfg stand on a high peak, looking down on all the other dwarfs with ease.


You could use nearly the same argument to sing the praises of PHP at Facebook. (Or Cobol at banks.) Doesn't make either language any better.


I don't know about PHP at Facebook.

But I can sure tell you that a few easy changes to the BCL ecosystem would make a huge difference:

* Testing: despite the common wisdom of testing everything, BCL has no testing facility. There's no manpower to support this easy (relatively speaking) feature.

* Code search: BCL has inheritance-like semantics, but code search has no support for navigating it. That's ridiculous. If anyone claims Google's C++ code could be supported without code-search navigation, then BCL can rightly be called mysterious and lacking in sane language design.

* Packaging: BCL is always half mixed with, and half separated from, the executable. That's just plain stupidity.

* Lack of sane abstraction from Borg: Borg was designed without a package concept. Borg packages were entirely an invention of BCL and borgcfg, implemented as an idempotent shell command executed before starting a Borg job. Go figure how shaky the ground BCL relies on is, and how much blame BCL has to take on behalf of Borg's lazy design.


It was in fact the running joke that everyone knew everyone hated BCL.

But the issues with it were well known, so it makes sense to me that after learning all that the guy who invented it could have come up with a really good successor.


Same here. It did some cool stuff but really bent your mind.

Agree with the other comment that the amount of it at Goog should have a mixed interpretation. Obviously it was useful, but although Google systems were complicated, it still doesn't seem right that it's on the same magnitude as the main system langs.


From talking with people who used BCL, Jsonnet and CUE: CUE is an attempt to learn from and fix the things that made BCL hated.


Yeah, I really like what I've seen from CUE, and obviously the core team behind it is the real deal. I've never worked at Google but I've done quite a bit of research into Borg/BCL. AFAIK Jsonnet is basically a direct descendant of BCL, whereas CUE is the next evolution that tries to fix what BCL got wrong.

CUE was in its infancy when we were evaluating Jsonnet (and I wasn't even familiar with it at the time). If we were picking a data templating language today there's a good chance we would choose it. We very well may end up migrating to it or incorporating it in some way.


Whilst I like the look of Cue... this argument reminds me of the Kubernetes trap ( We should use it... because Google does it and they're huuuuuuuge don't you know etc )


This is actually one of my main concerns with CUE and something I'd like the community overall to keep in mind. In my opinion there is a high chance that CUE ends up being adopted as a silver bullet for configuration just because it gets cargo-culted as something from Google, combined with the drive some open source users have toward dependence, similar to what happened with Kubernetes. That modules and package management in CUE are modeled after what Go did is also questionable IMO, but it makes sense since the project is Go-based. CUE is a good tool for validation and has its uses, but Dhall has some really neat innovations that make it an exciting project, so it's worth at least looking at both and comparing first.


CUE talks about unification; is that similar to the Prolog meaning of the term?


It certainly has that feel... that you can make statements (express constraints) about a given configuration item — in any order — and the final result falls out (having satisfied all constraints or "errors").
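A toy model of that "order doesn't matter" property, assuming a far simpler lattice than CUE's real one: a constraint here is either a Python type (meaning "any value of this type"), a concrete value, or a dict of constraints. `unify` and `unify_all` are hypothetical helpers, not CUE's API; they combine constraints or raise on conflict.

```python
from functools import reduce
from typing import Any


class UnificationError(ValueError):
    """Raised when two constraints cannot both hold."""


def unify(a: Any, b: Any) -> Any:
    """Combine two constraints; the result satisfies both, or we raise."""
    # Two structs: unify field by field; fields present on one side only pass through.
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = unify(out[key], value) if key in out else value
        return out
    # Normalize so that a lone type constraint, if any, sits in `b`.
    if isinstance(a, type) and not isinstance(b, type):
        a, b = b, a
    # A type constraint against a concrete value: the value must be an instance.
    if isinstance(b, type) and not isinstance(a, type):
        if isinstance(a, b):
            return a
        raise UnificationError(f"{a!r} is not a {b.__name__}")
    # Two types or two concrete values: they must agree exactly.
    if a == b:
        return a
    raise UnificationError(f"conflict: {a!r} vs {b!r}")


def unify_all(*constraints: Any) -> Any:
    """Fold any number of constraints together."""
    return reduce(unify, constraints)


schema = {"replicas": int, "image": str}
base = {"image": "nginx:1.23"}
prod = {"replicas": 3}
print(unify_all(schema, base, prod))  # same result as unify_all(prod, schema, base)
```

Stating the schema, the base values and the per-environment values in any order yields the same concrete result for inputs like these, and contradictory statements (e.g. `replicas` 3 vs 5) fail instead of silently overriding each other.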


Just my 2 cents: when I started at my current employer, we had a huge, convoluted Dhall project for kube. We ended up switching to a real language (Python in our case, for reasons; Go would be a more correct choice) and are very pleased with the results.


I thought the same thing. As a Node.js dev, I can quickly create a .mjs file and `export default` a JavaScript object that contains the configuration. That way, I can use template strings and/or functions to do what I want. I can even load JSON files natively.


I like Dhall for what it gives you with type safety etc., but the compile times are extremely slow on a large project and type signatures can get really long/hard. It is a bit of a Dhall monolith though ;)


> we're giving more thought to writing a more robust tool using a more traditional language like Go to solve some of our configuration/deployment and data templating problems.

You might want to give Pulumi (no affiliation) a look. I've been using it with Typescript, but it supports Go, too.


> a more robust tool using a more traditional language like Go

I heard you liked configuration languages, so I wrote a configuration language for your configuration language.

How long before you stick a Yaml config into your Go configuration library for easier maintenance? And so the cycle continues.


"Dhall... that you can think of as: JSON + functions + types + imports". Is this Typescript?


IIUC, Dhall is pure and has no side effects.


Unless you count the ability to do imports from around the web. But you can still make it side-effect-free by using caching and semantic hashes on those imports?


These are side effects performed by the compiler's runtime in a sandboxed environment. Dhall scripts are unable to perform side effects on their own. This is a completely different setting compared to the unrestricted side effects available within a script written in a general-purpose language.


So something like using TypeScript on Deno without extra permissions?


"Pure" means something specific for programming languages: https://en.wikipedia.org/wiki/Pure_function


jsonnet seemed like a great idea to me, but I've experienced extremely low performance. My Kubernetes manifest with a couple hundred lines of code in 4 files, which rendered instantly in Python before and renders instantly now with Helm, would take 10-20s with jsonnet. The "lazy evaluation" would probably go into some quadratic or exponential behavior, evaluating the same thing many times, but I sure couldn't see why (it was very straightforward code) or where (no debug tools...).
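A generic sketch of the suspected failure mode, not of how any particular Jsonnet implementation evaluates code: if a lazily evaluated value is recomputed on every reference (call-by-name) instead of being cached after its first use (call-by-need), a chain where each level refers to the previous one twice costs 2^(d+1) - 1 evaluations rather than d + 1. The names `make_thunk` and `evaluate_chain` are made up for illustration.

```python
from typing import Callable, List, Tuple

Thunk = Callable[[], int]


def make_thunk(compute: Thunk, memoize: bool) -> Thunk:
    """Wrap a computation; with memoize=True it runs at most once (call-by-need)."""
    if not memoize:
        return compute  # call-by-name: every reference re-runs the computation
    cache: List[int] = []

    def cached() -> int:
        if not cache:
            cache.append(compute())
        return cache[0]

    return cached


def evaluate_chain(depth: int, memoize: bool) -> Tuple[int, int]:
    """Each level refers to the previous level twice, like `{ x: prev.x + prev.x }`.

    Returns (final value, number of evaluations performed).
    """
    evaluations = [0]

    def leaf() -> int:
        evaluations[0] += 1
        return 1

    current = make_thunk(leaf, memoize)
    for _ in range(depth):
        # Bind the previous level as a default argument so each step sees its own `prev`.
        def step(prev: Thunk = current) -> int:
            evaluations[0] += 1
            return prev() + prev()

        current = make_thunk(step, memoize)
    return current(), evaluations[0]


print(evaluate_chain(10, memoize=False))
print(evaluate_chain(10, memoize=True))
```

With `depth=10` both versions produce 1024, but the uncached one performs 2047 evaluations versus 11 for the cached one, which is the kind of blow-up that stays invisible in "very straightforward code" without profiling tools.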

I really wouldn't use it for anything until tooling improves (but then again, I'd much rather use something like Starlark).


> jsonnet seemed like a great idea to me, but I've experienced extremely low performance.

Although each implementation of jsonnet has some quirks, take a look at sjsonnet^1 (scala-based) or go-jsonnet^2 for improved performance. We generally prefer go-jsonnet.

There's also a Rust version^3 that claims to be the fastest yet^4, but I haven't experimented with it at all.

[1]: https://github.com/databricks/sjsonnet

[2]: https://github.com/google/go-jsonnet/

[3]: https://github.com/CertainLach/jrsonnet

[4]: https://gist.github.com/CertainLach/5770d7ad4836066f8e0bd91e...


14s to 2s, pretty large improvement! Thanks for the recommendation.


just another 'have you looked at' : https://carvel.dev/ytt/

ytt lets you embed logic via a Python subset (Starlark) and also provides "overlays" as a "replace/insert" mechanism. All valid ytt files are valid YAML files, so they can be passed through other YAML parsing stages.


Thanks for sharing! We've done a little experimenting with ytt (we already use several tools from Carvel/k14s, mainly vendir and kbld), but it's been a while...probably a year or more...since I've played with it. Need to give it another look.


In my experience, CUE is far superior to Jsonnet. The validation and abstractions feel much more native.


> using a more traditional language like Go to solve some of our configuration/deployment and data templating problems.

Have you looked at https://github.com/kris-nova/naml ? :)


I had not seen that but it looks super interesting and potentially useful! Thanks for sharing!


Relying on configuration files to glue together a huge stack of tools and services in order to produce an end product is a huge mistake. I say it's a mistake fully knowing that nearly everyone in the industry is doing it.

Configuration is opaque. Undebuggable. Unmaintainable. By its very nature. No matter what "language" you use for it.

You should strive to keep configurable things to an absolute minimum.

Although it's worth making a distinction between "configurations" and "settings": when something is configured wrong, the application basically fails to operate as intended. For settings, there can't be "wrong" settings, all settings are valid, and if the input for a specific setting was invalid, the application can simply ignore it and use the default value. Example of a "setting": font size for a text editor. Example of configuration: the connection information for a database.
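A minimal Python sketch of that distinction, with hypothetical function names and an assumed valid font-size range of 6 to 72: the setting swallows bad input and falls back to its default, while the configuration refuses to proceed without a usable value.

```python
import logging
from typing import Optional

DEFAULT_FONT_SIZE = 12  # assumed default for this sketch


def font_size_setting(raw: Optional[str]) -> int:
    """A *setting*: no value is fatal; bad or missing input falls back to the default."""
    try:
        size = int(raw) if raw is not None else DEFAULT_FONT_SIZE
    except ValueError:
        logging.warning("ignoring invalid font size %r", raw)
        return DEFAULT_FONT_SIZE
    # Out-of-range values are ignored too, rather than treated as errors.
    return size if 6 <= size <= 72 else DEFAULT_FONT_SIZE


def database_url_config(raw: Optional[str]) -> str:
    """A *configuration*: there is no sensible default, so a bad value must stop the app."""
    if not raw or "://" not in raw:
        raise ValueError(f"database URL is missing or malformed: {raw!r}")
    return raw
```

The asymmetry is the point: `font_size_setting("huge")` quietly returns 12, while `database_url_config(None)` raises, because guessing a database to connect to would be worse than failing.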


Wholeheartedly agree - convention before configuration wins with tooling even at modest scale.

With larger team(s), free-form YAML configuration and templating quickly become messy.

Setting a boundary at “configuration” and exposing configuration options to dev teams through settings works really well.

It requires you to have a tooling team with a few developers, but it scales!


Convention over configuration is really the future, but not everything follows that philosophy. For example, OCaml codebases are basically a free for all and you can organize things however you want. So with that in mind, you need something to organize and build.


Convention seems to be nice in the beginning. But it's very hard to maintain.

Much better is default configuration. I.e. all "conventions" are explicitly generated into a default-configuration file which you can then change/overwrite/update in whatever way you want.
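A minimal sketch of that idea, with hypothetical names throughout (`CONVENTIONS`, `ensure_default_config`, and the JSON layout are assumptions): the tool writes its conventions to a file on first run and afterwards only reads that file, so every "convention" is visible and editable.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical conventions a tool would otherwise apply implicitly.
CONVENTIONS: Dict[str, Any] = {
    "source_dir": "src",
    "build_dir": "build",
    "test_glob": "tests/**/*_test.py",
}


def ensure_default_config(path: Path) -> Dict[str, Any]:
    """Write the conventions out explicitly on first run, then always read the file.

    Once the file exists it is the single source of truth: users can change any
    value, and nothing is applied "by convention" behind their back.
    """
    if not path.exists():
        path.write_text(json.dumps(CONVENTIONS, indent=2, sort_keys=True) + "\n")
    return json.loads(path.read_text())
```

A design consequence worth noting: once generated, the file no longer tracks changes to the tool's built-in defaults, which is the price of making them explicit.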


I think you can have both (for example, go and rust starter projects)


One problem with convention is that it isn't discoverable. At least config files guide you to where to look for stuff.

But even so I think you're probably right that convention is better because it forces everyone to use the same structure for stuff.


Convention before configuration doesn't solve the core problem.

Why do you need configuration? Because you build your application by gluing together a multitude of different applications that need to configured properly to be able to talk to each other.

Once you understand this, the solution is obvious: don't build your application this way.

Build it in one language. If you want to "reuse" existing solutions, reuse them as libraries, not as separate programs.


Say, a SQL database:

Teams have agreed to use Postgres for SQL; this is an agreed convention.

A tooling team implements HashiCorp Vault and builds a pg module/deployment that is packaged and accessible through whatever tooling you build.

This makes a datasource of type pg sql available to all teams in a couple of minutes.

These modules/deployments are governed by a bunch of conventions, but those are of no special concern to consuming dev teams.

Use “something” to keep track of service upstream/downstreams: consul, a database, or something else.

When a service is connected to a database, say using Consul upstreams, you have the job-gen API inject the appropriate env vars according to an agreed-upon convention.
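One way the job-generation step might turn the service graph into env vars, sketched in Python. Everything here is hypothetical: the graph shape, the `<UPSTREAM>_PG_URL` naming convention, and the `vault://` reference scheme, which stands in for however the platform hands out rotating credentials and is not a real Vault URL format.

```python
from typing import Dict, List, Mapping

# Hypothetical service graph: each service lists the databases it sits downstream of.
SERVICE_GRAPH: Dict[str, List[str]] = {
    "billing": ["orders-db", "ledger-db"],
    "search": [],
}


def env_for(service: str, graph: Mapping[str, List[str]]) -> Dict[str, str]:
    """Derive a service's database env vars purely from the graph and a naming convention.

    Convention (hypothetical): upstream "orders-db" becomes ORDERS_DB_PG_URL, whose
    value is a reference the platform resolves to a short-lived credential at
    runtime, never the secret itself.
    """
    env: Dict[str, str] = {}
    for upstream in sorted(graph.get(service, [])):
        name = upstream.upper().replace("-", "_")
        env[f"{name}_PG_URL"] = f"vault://database/creds/{upstream}"
    return env


print(env_for("billing", SERVICE_GRAPH))
```

The dev team's only input is the edge in the graph; the variable names and credential plumbing come entirely from the tooling team's convention.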

Now dev teams have an option to deploy incredibly secure pg databases, and by simply setting a service as downstream it will automatically receive rotating connection strings.

There’s no configuration to be managed by dev teams except keeping a service graph up-to-date.

Lot of conventions above, and a tooling dev team with a lot of infra and tooling expertise.

I believe that for scaling dev teams you should look at tooling like a product domain and treat it as such.


> Configuration is opaque. Undebuggable. Unmaintainable. By its very nature. No matter what "language" you use for it.

What makes you think so? How do you define configuration?

I mean, in the extreme case you can see Python as a configuration language for the behaviour of the Python interpreter. Does that make Python a configuration language? Where do you draw the line?


Configuration is some sort of information that is stored in a text file, for reading by one or more programs, where these programs cannot meaningfully operate without this critical information in the configuration files.

If you have ever worked in the web startup industry, their use is endemic in everything. You often cannot deploy any web application without properly understanding tons of configuration files.

Because everyone builds applications out of services that must be glued together, and these services often glue themselves together via configuration files. The Python side of the application code needs to figure out how to connect to all the databases (Postgres, Redis, Elastic, memcached, etc.). Each of these databases sometimes needs its own configuration before it can meaningfully operate. And because everyone is using Docker, you need Dockerfiles for all of your services. The list of things that need text-based configuration files goes on and on.

Go to any web company and ask around to see who can meaningfully understand and edit or maintain these configuration files, and you will find the majority of the people have no idea what's going on inside them. It's very common that only two people have a decent understanding of all the configs.

Now to answer your core question:

Why are they opaque and undebuggable?

Because to understand the configuration files, you have to understand the programs they are intended for, the environments they end up in, and the language they are written in.

For example, to understand a three-line docker compose configuration for ha_proxy, you have to understand the language of docker compose, and the way configuration files are aggregated. For example, you can define some VARIABLES in one config files and have them be available for reading by another docker compose config file in a totally different place. But you have to understand under what conditions which files are included together. For instance, you can have 10 files define different values for one VARIABLE and each of them runs in a different environment. You have to understand all of this. And we haven't even gotten to ha_proxy yet. Now you have to understand how ha_proxy is influenced by the configurations you have specified. Let's say it's telling it about another server to connect to, or a directory to serve static assets from. The path of this directory depends entirely on how other docker images are composed together, which in turn is usually influenced by the contents of many other docker compose config files that are - again - spread out across many files. You need to figure out _which_ config file is used to make a certain directory available at a certain location, etc etc.

There's no tool that can debug this mess. There's no substitute for deep and intimate knowledge of all the details of the system.

It's very possible for the whole system to collapse when you make one tiny small mistake in one configuration file. There's no tool that can help you debug what's going on.


> Because to understand the configuration files, you have to understand the programs they are intended for, the environments they end up in, and the language they are written in.

Seems like the right approach. To use something, one has to understand how to use it. Why is this unusual?


Your comment makes it sound like my problem with configurations is that they require understanding something.

But that's not what I said.

What I said is:

> Configuration is opaque. Undebuggable. Unmaintainable. By its very nature. No matter what "language" you use for it.

The underlying point, perhaps not mentioned explicitly, is that they are way too complex for what you achieve with them.

Of course, complex systems require a deep level of understanding. But everything being equal, complexity is a thing to avoid - when you can do it without loss of capability.


Dhall is my favorite configuration language that I never get around to using.

I manage DNS in Terraform, and since every Terraform provider uses different object definitions, and every object definition is rather verbose, Dhall would be a way to specify my own DRY types and leave the provider-specific details in one place. Adding new DNS entries and moving several domains between providers would be a matter of changing fewer lines.

Dhall also has Kubernetes bindings:

https://github.com/dhall-lang/dhall-kubernetes

Although I'm tempted to just stick to Helm here: even though it's less type-safe, Dhall's verbosity makes me reconsider.

If anyone has used dhall-kubernetes, I'd like to hear whether they like it.


I use Tanka/Jsonnet and cringe every time I read a Helm chart. Type safety would be nice, but the k8s API can verify validity on the server.

https://tanka.dev/


Tanka is what I want to use when the time comes.

Are there any pitfalls you have learned to avoid?


I think it might still have issues figuring out that it needs to apply CRDs first: https://github.com/grafana/tanka/issues/246 Besides that, I found it super-handy for deploying https://github.com/prometheus-operator/prometheus-operator and https://github.com/kubernetes-monitoring/kubernetes-mixin

LE: The "garbage collection" feature is very interesting, but I didn't get to experiment with it enough yet: https://tanka.dev/garbage-collection


Tanka helped me make peace with k8s (YAML), or at least made for a better learning environment for k8s than the other options. I wonder about other people's experiences, or their config-tool rite of passage.


We use https://github.com/octodns/octodns for some of our DNS records. It's flexible, much faster than Terraform for thousands of records, and the maintainer Ross has been responsive on issues and pull requests. Also see Cloudflare's blog for how they use it.


I think Dhall is good at pushing forward some ideas, but honestly I feel like Skylark (Python, but not Turing-complete) is the right way forward. Being able to specify dynamic functionality in a configuration file, when paired with a good configuration API, really makes stuff straightforward IMO. Gunicorn is the best example of this.

We have all of these tools that try to propose declarative configuration, then run into the fact that people really do want dynamic systems with some abstraction capabilities, and then have to overlay that into their systems.
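Gunicorn makes the point concrete: its config file is ordinary Python that Gunicorn executes, reading module-level names such as `bind` and `workers` as settings. A small file passed with `-c` might look like the sketch below; the `PORT` environment variable is an assumption about the deployment platform, and the worker formula is the (2 x cores) + 1 rule of thumb from Gunicorn's docs.

```python
# gunicorn.conf.py: executed as plain Python; module-level names become settings.
import multiprocessing
import os

# PORT is a hypothetical variable provided by the deploy platform.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"

# Rule of thumb from the Gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1
```

This is exactly the "dynamic system with some abstraction" that purely declarative formats end up having to bolt on: the worker count adapts to the machine without any templating layer.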


I feel increasingly alone in this position, but I absolutely hate those Python-like languages.

They look like Python, and they occasionally act like Python. But occasionally they don't, and the mask of looking like Python obscures those times. Then you never really know which Python facilities you have and which ones you don't.


Is Skylark (now called Starlark, btw) general-purpose? It's my understanding that it's largely used for configuring Bazel.


It is used by other tools like Tilt https://tilt.dev/ (a Kubernetes dev tool)



Comparison of similar projects, including Dhall:

https://github.com/oilshell/oil/wiki/Survey-of-Config-Langua...


> Can you spot the mistake?

Nope, so now I have no incentive to use your config format because it's established something is wrong and it's completely non-obvious.

Thanks for not wasting my time, I guess.


This opinion is ironic because this is exactly what the author intended to describe -- the typo IS hard to find.

The "Hello, World" example is nothing more than JSON, showing a string repeated 3 times. On the third time, "bill" is misspelled with "blil". The next tab shows how Dhall uses a variable definition to prevent this type of error.
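The same idea sketched in Python rather than Dhall (in Dhall it would be a `let` binding plus `"/home/${user}"` string interpolation). The key names and paths are assumptions, not copied from the Dhall homepage; the point is that the user name is written exactly once, so it cannot be misspelled in just one field.

```python
import json


def ssh_config(user: str) -> dict:
    """Build every path from one `user` value, so a typo can only happen in one place."""
    home = f"/home/{user}"
    return {
        "home": home,
        "privateKey": f"{home}/.ssh/id_ed25519",
        "publicKey": f"{home}/.ssh/id_ed25519.pub",
    }


print(json.dumps(ssh_config("bill"), indent=2))
```

A misspelling is still possible, but it now shows up in every field at once instead of hiding in the third line.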


I too spent the whole time scrutinizing the syntax and ignoring the values, since this is an intro to the language and not to Bill’s dotfiles. I was ultimately unable to deduce the presumed syntax error and independently came to the same misconclusion.


It can't be a syntax error though as it evaluates successfully as demonstrated by the right-hand side.


In general, if you think that someone’s claimed experience doesn’t make sense, you’re probably just missing information.

On small viewports, the “right-hand side” is down page and off screen, and the instructions I’m following reference only tabs visible above.


I took it to mean "can you find the syntax error". It wasn't clear they meant the text


Yeah, I eventually caught it but wished I hadn't wasted my time on it - seems like something they could better have shown with an animation


Sure, but how do I know that the user doesn't store their public key in /home/blil?


I mean... from context, it's probably a mistake, but on the other hand we don't actually know the structure of that filesystem or the author's intentions.


They are trying to tell a story about how Dhall helps you avoid hard to notice errors, but they aren't doing a great job.

Would be cool if they had a three panel story:

* using json with the typo

* using dhall with the typo

* using dhall with variables to avoid the typo

Might be easier to understand.


Meh, I think you're pointing out why doing this with a variable is better. In other words that mistake is the entire point of the example. Perhaps the definitions tab could make it more clear what the mistake was.


Yeah, I spent 3 minutes looking for a syntax error in 5 lines of code. I couldn’t find anything. I’m obviously too dumb to figure out this language so I guess I’ll just stick with YAML.


How could they make the mistake obvious? It's a mistake in a string value!


It had me confused too. I figured I was looking for a syntax error in the config format since it’s an introduction…to the config format.


How about not making the very first thing on the page a confusing puzzle?


They should at least reveal the answer somewhere immediately close by. I couldn't spot the error after a few minutes, came to the comments to find out what it was. Not a great introduction because you're asking someone to spot an error in something they don't know the syntax or ruleset for.

The project looks cool though.


Change the name from Bill to John/Jhon. That's more obvious than all the barcode characters next to each other.


Previous thread w/ many comments: https://news.ycombinator.com/item?id=20355405

Another ~similar project not mentioned in that thread is https://ucg.marzhillstudios.com


I like a lot of the ideas in Dhall, but I disagree with some of the design decisions.

The syntax for objects with dynamic keys seems unnecessarily verbose.

The arithmetic capabilities are very restrictive. No subtraction, division, or modulo. Addition and multiplication only work on Natural. No numeric comparison. == and != only work on booleans. There isn't really much motivation given for why it is so restricted.

Maybe I'm missing something, but it seems like if a type has a lot of optional fields, which is pretty common, you have to explicitly pass None for all of them. Maybe the pattern is to merge a default record with a smaller record that has what you actually want? But I didn't see any examples of that. Also, how do you deal with cases where null and the absence of a value are treated differently? For example, if leaving the value off means "use the default" and null means "turn the feature off". From what I can tell, optionals are either always null or always absent when None.

It also seems like it would be annoying to have to specify types whenever calling functions like List/length.

IDK, maybe in practice these aren't as bad as they seem.


> Maybe I'm missing something, but it seems like if a type has a lot of optional fields, which is pretty common, you have to explicitly pass None for all of them.

You can specify defaults for your record types via the Type/default pattern https://github.com/Gabriella439/dhall-manual/blob/develop/ma...
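For the defaults part, Dhall's record completion `T::r` is roughly sugar for `(T.default // r) : T.Type`, i.e. a right-biased merge of a defaults record with only the fields you set, followed by a type check. A rough Python analogue with a hypothetical schema follows; it also shows one way to keep "absent" and "explicit null" distinct using a sentinel, which is a Python trick and not how Dhall's `Optional` works.

```python
import json
from typing import Any, Dict, Mapping

# Sentinel meaning "leave this field out of the output entirely",
# as opposed to None, which is rendered as an explicit JSON null.
ABSENT = object()

# Hypothetical schema defaults: every optional field gets a value here,
# so callers only spell out what differs.
DEFAULTS: Dict[str, Any] = {
    "replicas": 1,
    "port": 8080,
    "tls": ABSENT,  # omitted unless set
    "proxy": None,  # explicitly null unless set
}


def complete(overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge overrides over DEFAULTS (right side wins); unknown fields are rejected."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def render(config: Mapping[str, Any]) -> str:
    """Serialize to JSON, dropping ABSENT fields but keeping None as null."""
    return json.dumps(
        {k: v for k, v in config.items() if v is not ABSENT}, sort_keys=True
    )


print(render(complete({"replicas": 3})))
```

Rejecting unknown keys plays the role of Dhall's type annotation here: a typo like `replica` fails loudly instead of being merged in and ignored.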


Count me as another big Dhall fan -- Dhall was a huge improvement on our codebase compared to Helm for large Kubernetes deployments.

Full disclosure: I maintain the [dhall](https://pypi.org/project/dhall/) package on PyPI


As much as I love Dhall, I think it's actually better to allow any Turing-complete programming language and require it to generate the config in a certain format (whatever it is, maybe even Dhall).

The only thing needed is to run the config-generation process in a sandbox and restrict how long it can run. If it takes too long, we stop it, the same as we should do with a Dhall config, even if Dhall will theoretically finish in a finite amount of time.

That way everyone can use their preferred language to describe and maintain the configuration. We just have to agree on the format that is being spit out at the end, such as json, yaml, dhall or even a proprietary format.
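A minimal sketch of the "any language, agreed output format" approach, assuming JSON on stdout as the contract; `generate_config` is a hypothetical helper. Note that it only enforces a time limit: `subprocess.run` kills the child and raises `TimeoutExpired` when `timeout` elapses, but real filesystem and network isolation would need a container or similar on top.

```python
import json
import subprocess
import sys
from typing import Any, List


def generate_config(cmd: List[str], timeout_s: float = 5.0) -> Any:
    """Run an arbitrary config generator and accept only JSON on its stdout.

    Raises subprocess.TimeoutExpired (after killing the child) if it runs too long,
    CalledProcessError on a non-zero exit, and json.JSONDecodeError on bad output.
    """
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout_s, check=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Any language works as long as it prints JSON; here the generator is Python itself.
    generator = [sys.executable, "-c", "import json; print(json.dumps({'replicas': 2}))"]
    print(generate_config(generator))
```

Because `timeout` measures wall-clock time, a loaded machine can push a borderline generator over the limit; on POSIX, capping CPU time with `resource.setrlimit(resource.RLIMIT_CPU, ...)` in a `preexec_fn` is less sensitive to load.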


> The only thing needed is to run the config-generation process in a sandbox and restrict how long it can run.

That's a very brittle restriction at best, and if implemented strictly on time, it's a very flaky one as well.


How so? Normally, generating a config, even a big one, is a matter of milliseconds even for slow languages.

We are not talking about applying the config; we are just talking about using some inputs (variables) and generating a string / text file that we return to the entity that then validates and uses the config.

Not sure why that would be flaky.


Generating a config may include downloading a number of remote definitions, their evaluation & caching for future reuse (Dhall does that, Nix too), which doesn't fit into milliseconds threshold on the first run.


Potentially yes. But if we compare it to Dhall, then the sandbox would just allow generating a config, potentially based on some inputs, but without disk access, network access, etc.

If you allow that for Dhall, then you also have the problem of a config potentially taking extremely long to load / to be ready. So you will want to set a timeout anyways and I don't see a big difference here from a practical point of view.


Flaky in the sense that you can have stuff that's fitting right into your cut-off time limit on some runs, and going over on some others.


I mean, theoretically yes - but I doubt that this would be an issue in any but the rarest cases.


Depends on how aggressive your timeouts are, I guess?


I love seeing useful languages that aren’t Turing complete. Tooling gets much more interesting when you don’t have so many impossibility results around all the interesting stuff.


What practical advantages are there? Someone can always write a program that runs until the heat death of the universe even in a Turing-incomplete language.


> What practical advantages are there?

The major advantage of a language that isn't Turing complete is not having the major risk inherent to Turing complete languages: asking if any non-trivial program will produce any given result or behavior is undecidable[1].

> write a program that runs until the heat death of the universe even in a Turing-incomplete language.

The Halting Problem is just a simple example of program behavior. The undecidability extends to any other behavior. Asking if a given program will behave maliciously is still undecidable even if we only consider the set of programs that do halt in a reasonable amount of time.

When you are using a regular language or a deterministic pushdown automaton, questions about the behavior, or even whether two implementations are equivalent, are decidable. It is at least possible to create software/tools to help answer the question "is this input safe?" When you use a non-deterministic pushdown automaton or anything stronger, your problem becomes provably undecidable.
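To make the decidability point concrete: equivalence of two DFAs (i.e. two regular languages) can be decided by exploring the finitely many reachable state pairs of their product automaton. A minimal Python sketch, assuming both DFAs are complete over the same alphabet; the tuple/dict representation is just one possible choice:

```python
from collections import deque

# A DFA is (start_state, accepting_states, transitions), where
# transitions[(state, symbol)] -> next_state is defined for every symbol.

def equivalent(a, b, alphabet) -> bool:
    """Decide L(a) == L(b) by breadth-first search over reachable state pairs."""
    start_a, accept_a, delta_a = a
    start_b, accept_b, delta_b = b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        # A reachable pair where exactly one side accepts means some input
        # string is accepted by one DFA and rejected by the other.
        if (p in accept_a) != (q in accept_b):
            return False
        for sym in alphabet:
            nxt = (delta_a[(p, sym)], delta_b[(q, sym)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # There are only finitely many pairs, so the search always terminates.
    return True

# "Even number of a's", written two different ways, plus its complement.
even2 = (0, {0}, {(0, "a"): 1, (1, "a"): 0})
even4 = (0, {0, 2}, {(i, "a"): (i + 1) % 4 for i in range(4)})
odd2 = (0, {1}, {(0, "a"): 1, (1, "a"): 0})
```

Here `equivalent(even2, even4, "a")` is true even though the two machines have different numbers of states, and `equivalent(even2, odd2, "a")` is false. No comparable general procedure exists once you move to Turing-complete programs.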

I highly recommend the talk "The Science of Insecurity"[2].

[1] https://en.wikipedia.org/wiki/Rice%27s_theorem

[2] video: https://archive.org/details/The_Science_of_Insecurity_ slides: [pdf] https://langsec.org/insecurity-theory-28c3.pdf


> The major advantage of a language that isn't Turing complete is not having the major risk inherent to Turing complete languages: asking if any non-trivial program will produce any given result or behavior is undecidable[1].

It's not obvious to me that I should care about this property in a configuration language. For a given configuration use case, I probably have a good idea about the extreme upper-bound for a correct program--say, 5s. If the program runs for 30s, the supervisor kills it.
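A minimal sketch of that supervisor approach, assuming the config generator runs as a separate process. `generate_config` and the 5-second default are illustrative, not taken from any particular tool. Python's `subprocess.run` kills the child when `timeout` expires and then raises `TimeoutExpired`:

```python
import subprocess
import sys

def generate_config(cmd: list[str], timeout_s: float = 5.0) -> str:
    """Run a config generator and return its stdout, or fail if it overruns."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired as exc:
        # By this point subprocess.run has already killed and reaped the child.
        raise RuntimeError(f"config generation exceeded {timeout_s}s") from exc
    return result.stdout

if __name__ == "__main__":
    fast = [sys.executable, "-c", "import json; print(json.dumps({'port': 8080}))"]
    print(generate_config(fast))
    slow = [sys.executable, "-c", "import time; time.sleep(60)"]
    try:
        generate_config(slow, timeout_s=1.0)
    except RuntimeError as err:
        print(err)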

> The Halting Problem is just a simple example of program behavior. The undecidability extends to any other behavior. Asking if a given program will behave maliciously is still undecidable even if we only consider the set of programs that do halt in a reasonable amount of time.

What's a malicious action in a configuration use case that a Turing complete program could muster but not a non-Turing-complete program?


In theory that's true for many of the kinds of Turing-incomplete languages we care about. (Eg it's not true for JPG or for (proper) regular expressions.)

From Dhall's docs (emphasis mine):

> Note that a “finite amount of time” can still be very long. For example, there are some short pathological programs that take longer than the heat death of the universe to evaluate. The main benefit of evaluation being finite is not to eliminate long-running programs but to make them significantly less probable. In practice, you will discover that you will rarely author a configuration file that takes a long time to evaluate by accident.

> For example, Dhall does not provide language support for recursion. If you try to define a recursive expression or function you will get a type error. Lists are the only recursive data structure and the only way to build or consume lists is through safe primitives guaranteed to terminate, like List/fold. This restrictive programming style keeps code simple and makes expensive code more obvious (both to the code author and reviewer).
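A rough Python analogue of that fold-only style, for readers who don't know Dhall. `list_fold` is a hypothetical helper that mirrors the list/cons/nil argument order of Dhall's `List/fold` (the real Dhall built-in also takes explicit type arguments). Because it does exactly one step per element, it terminates on any finite list:

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def list_fold(xs: List[A], cons: Callable[[A, B], B], nil: B) -> B:
    """Right fold: replace each element's cons cell with `cons` and the end with `nil`."""
    acc = nil
    # Walk the list from the end, one step per element, so this always terminates.
    for x in reversed(xs):
        acc = cons(x, acc)
    return acc

# Both a sum and a "map" can be written without recursion or open-ended loops.
total = list_fold([1, 2, 3], lambda x, acc: x + acc, 0)
doubled = list_fold([1, 2, 3], lambda x, acc: [2 * x] + acc, [])
```

The trade-off is the one the docs describe: anything you want to compute must be phrased as a fold (or another terminating primitive), which makes expensive code easier to spot but ordinary loops more awkward to express.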


I guess I don’t care much if a configuration language takes a long time in some pathological case (a user writes a loop that doesn’t terminate). Such cases are already too rare to justify a dedicated language. And if the concern is bad actors, then a bad actor wouldn’t have to try too hard to get a Dhall program to run for a long time (copy paste something from the Internet). The more compelling reason to use Dhall is the static type system, which is sadly uncommon among configuration languages.


Infinite loops are a pretty common bug in my experience.

(Same for the recursive equivalent.)


So common that it justifies a change of language? "Oops, this thing that should have run for 5s has run for 30s, something must be wrong, <cancel>". You can even automate that reasoning by way of a timeout. In a hypothetical world where I'm choosing between two otherwise exactly equal configuration languages, I suppose this would give the advantage to Dhall, but honestly I'd probably pick TypeScript because making an entire org learn a new syntax is far more trouble than that saved by non-Turing-completeness.


Totality checking like that in Idris is one nice example. That's kind of circular though - it isn't Turing complete because it has totality checking (i.e. you must prove the program terminates).


Used this for a fairly complex, large project.

It really is Haskell for configuration, and undoubtedly superior to YAML.

But for my grug brain, function currying and the lack of loops made it hard.

Functional languages continue to be a great place to steal from but pure functional requires warping your brain quite a bit.

Maybe Tanka is more my style


Also look into Starlark or Cue


I absolutely loved using this tool to generate many instances of Datadog resources in terraform json. Not only did it simplify a nasty mess of dynamically typed and schema-less bash/python/mako, it taught me a lot of functional programming paradigms along the way.


I've always wanted an excuse to use this but never had a reason. The closest I come is using Nix, and I know there is a Dhall-to-Nix compiler, but since Dhall can't represent everything possible in Nix, I haven't seen any good reason to use it.


Dhall is advertised as a configuration language but you can do a tad more. I use it in my blog-engine to fit a use-case I found was poorly addressed by other approaches: small-cardinality datasets that benefit from type-checking and templating (e.g., list of notes, a photo gallery). I don't claim the idea is especially novel but I found the use case rare and interesting enough to write some explanations, design, and a demo here: https://lucasdicioccio.github.io/dhall-section-demo.html


Related:

Dhall: A Non-Repetitive Alternative to YAML - https://news.ycombinator.com/item?id=20355405 - July 2019 (177 comments)


JSON is a strange choice, with all those unnecessary commas. S-exprs / EDN would be a better choice, especially as this is a functional language.


Strange choice for what? It's just one of the compile targets.


> Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports

At the top of the main page.

See also the Hello World and other examples.


I wonder if a version of Python with types and some way to force programs to be total would be a 'good enough' substitute. Just as JavaScript has `"use strict";`, a `# use total` pragma could ban recursion and loops over non-constants. You'd have to work hard to ensure things were really total (e.g. ban assigning functions to variables), but maybe you could make it strict enough to pay off.
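A rough sketch of what such a hypothetical `# use total` checker could look like, using Python's standard `ast` module. It only rejects `while` loops, direct self-recursion, and `for` loops whose iterable isn't a literal list/tuple or a `range(...)` of constants. Mutual recursion and calls through variables slip past it, which is exactly the "work hard" part:

```python
import ast

class TotalityChecker(ast.NodeVisitor):
    """Hypothetical `# use total` linter: flags constructs that might not terminate."""

    def __init__(self) -> None:
        self.errors: list[str] = []
        self._func_stack: list[str] = []  # names of enclosing function definitions

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self._func_stack.append(node.name)
        self.generic_visit(node)
        self._func_stack.pop()

    def visit_While(self, node: ast.While) -> None:
        self.errors.append(f"line {node.lineno}: while loops are not allowed")
        self.generic_visit(node)

    def visit_For(self, node: ast.For) -> None:
        if not _is_constant_iterable(node.iter):
            self.errors.append(f"line {node.lineno}: for loop over a non-constant iterable")
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        # Only direct self-recursion by name is caught; a real checker would
        # need a whole-program call graph.
        if isinstance(node.func, ast.Name) and node.func.id in self._func_stack:
            self.errors.append(f"line {node.lineno}: recursive call to {node.func.id}")
        self.generic_visit(node)

def _is_constant_iterable(expr: ast.expr) -> bool:
    """Accept literal lists/tuples of constants and range() with constant arguments."""
    if isinstance(expr, (ast.List, ast.Tuple)):
        return all(isinstance(e, ast.Constant) for e in expr.elts)
    if isinstance(expr, ast.Call) and isinstance(expr.func, ast.Name) and expr.func.id == "range":
        return all(isinstance(a, ast.Constant) for a in expr.args)
    return False

def check_total(source: str) -> list[str]:
    """Return a list of totality violations found in `source` (empty if none)."""
    checker = TotalityChecker()
    checker.visit(ast.parse(source))
    return checker.errors
```

For example, a function that sums `range(10)` passes, while one containing `while True:` or calling itself is rejected. This is the same design direction Starlark takes by disallowing `while` and recursion.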


Starlark is a version of this, and it's used by Bazel in particular. It's very cool tech, and sidesteps the difficulties of ensuring Python works without shipping a specific Python interpreter.


Didn't know about Starlark, that is awesome.


Assigning functions to variables doesn't have to ruin your totality.

But yeah, you'd need to carefully select your subset of the language that is both useful and total, and practical to implement.


I have been writing a fair amount of Dhall using autogenerated CloudFormation bindings ( https://github.com/jcouyang/dhall-aws-cloudformation/ ). It is a fantastic way to reduce boilerplate and factor out recurring blobs. My main frustration is that the type checker is not smart enough (or maybe the type system is undecidable?) - every time you want to use a polymorphic function, you must pass in the type parameters yourself (this is also true for empty lists and `None`). This makes simple FP idioms extremely noisy, to the point where you're better off writing longhand, in a language that's meant to alleviate YAML/JSON boilerplate.

It's still a massive improvement, but it could be so much better if the typechecker was smarter.


Whatever happened to Puppet, Ansible or Salt frameworks? They were the darling of the devops folks 7 years back.


They are different from Dhall. Puppet, Ansible and Salt are task runners. You give them configuration, they parse it, and then they run tasks on a destination machine based on that configuration.

Dhall is a tool to generate configuration such as JSON, YAML, or XML. It’s useful if you have *complex* configurations such as an AWS CloudFormation stack or Kubernetes YAML files.


Ah, OK, thanks for the details. Puppet was also called a configuration manager, and it could generate configuration files of any format dynamically (via variables, parameters, etc.).


Kubernetes ate their lunch, mostly.


No. Tools like Ansible are just wrappers around ssh when you need to run a command across many servers. They don't do anything k8s does.


I'd say partially right.

10 years ago people were running hundreds/thousands of VMs/Servers with full operating systems on them. They would often be long-lived and you needed puppet/ansible to update or run stuff on them.

With Kubernetes, deploying and updating your app/data is all native. The servers/nodes often run some sort of cut-down operating system and are immutable.


Or Terraform or Helm or ...


This looks super useful, but it's also horribly ugly.


I disagree regarding ugly, but it’s rooted in Haskell syntax fwiw.


I loved my experience with Dhall from the PureScript community. It prompted me to learn a lot more and I built some small libraries too. My experience was mostly positive. Supporting Unicode is the cherry on top.

Of all my gripes, only two still stick in my mind:

• No JavaScript implementation; I want to be able to use it in more places, but I always end up having to convert it to JSON to use it.
• The built-in formatter is way too aggressive; I'm fond of never-contract-only-expand formatters that don't pressure me to collapse things I don't want collapsed, and the current behavior doesn't optimize for Git diffs, where conflicts can arise.


Dhall needs to provide official bindings to all major platforms (these days that’s JS, Python, .NET, JVM). It looks like a really cool project but I don’t use Haskell or Ruby.

Has the situation improved?


    { home       = "/home/bill"
    , privateKey = "/home/bill/.ssh/id_ed25519"
    , publicKey  = "/home/blil/.ssh/id_ed25519.pub"
    }
This is the most awkward abuse of formatting I've seen in a long time. Just allow/require a final trailing comma, and this nonsense goes away


This is a by-product of Dhall coming from the Haskell community, where this formatting was preferred over final trailing commas. It has been internalised and carried on. I say this as someone who looks at this code and thinks "This is totally fine." -- but I admit, I'm environmentally damaged.


Yes. And, alas, you can't just allow a final trailing comma everywhere in Haskell.

E.g. ("Foo", 2,) is different from ("Foo", 2) in Haskell thanks to TupleSections. For innocent bystanders: in Haskell ("Foo", 2) is the tuple you'd expect it to be. But ("Foo", 2,) is a function that takes another argument and creates a three-tuple. A Python equivalent would be:

    lambda x: ("Foo", 2, x)


Or allow a leading comma:

    {
    , home       = "/home/bill"
    , privateKey = "/home/bill/.ssh/id_ed25519"
    , publicKey  = "/home/blil/.ssh/id_ed25519.pub"
    }
Or, to make it markdown-ish:

    {
    - home       = "/home/bill"
    - privateKey = "/home/bill/.ssh/id_ed25519"
    - publicKey  = "/home/blil/.ssh/id_ed25519.pub"
    }
This is just to play with the idea that leading punctuation may be preferable because it all aligns in the same column.


> Or allow a leading comma

This actually works just fine.


I would actually prefer no commas:

  {:home        "/home/bill"
   :private-key "/home/bill/.ssh/id_ed25519"
   :public-key  "/home/bill/.ssh/id_ed25519.pub"}


That's the same as a leading comma.


How so? Leading comma puts the terminator on the line after what it's terminating (which is why it bothers me, can't speak for others). This doesn't do that.


Are you mixing up the colons with the separator?


This style is common in Haskell, Nix and Dhall. It also sees more "mainstream" use, e.g. in SQL.


I much prefer this style because I can comment out lines without a trailing comma causing a syntax error. I use it wherever I can: SQL, Ruby…

Try it, I doubt you'll go back (unless someone's stupid parser doesn't let you).


You can do the same, if your language allows a final comma.

(And that's what the comment you reply to suggests: make Dhall allow a final comma.)


What's the point of allowing a trailing comma? A delimiter symbol is a delimiter symbol, be it an opening curly brace or a comma between elements, and one can simply align delimiters.


The main point is to make git diffs smaller when adding or removing elements, I think.
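A small illustration, using Python's `difflib` as a stand-in for git's line-based diff (git has its own diff algorithm, but for a simple append the result should be the same). Appending one element produces three +/- lines without a trailing comma, and only one with either a trailing comma or Dhall-style leading commas:

```python
import difflib

def changed_lines(before: list[str], after: list[str]) -> list[str]:
    """Return only the added/removed lines of a unified diff (no headers or context)."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]

# Appending 'gamma' to a list written in three comma styles: (before, after).
no_trailing = (["[", "  'alpha',", "  'beta'", "]"],
               ["[", "  'alpha',", "  'beta',", "  'gamma'", "]"])
trailing = (["[", "  'alpha',", "  'beta',", "]"],
            ["[", "  'alpha',", "  'beta',", "  'gamma',", "]"])
leading = (["[ 'alpha'", ", 'beta'", "]"],
           ["[ 'alpha'", ", 'beta'", ", 'gamma'", "]"])

if __name__ == "__main__":
    for name, (before, after) in [("no trailing", no_trailing),
                                  ("trailing", trailing),
                                  ("leading", leading)]:
        print(name, changed_lines(before, after))
```

The leading-comma style has the mirror-image weakness: prepending before the first element is the case that touches two lines.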


I feel this is the superior way to format something like that. Trailing commas are nothing but an ugly hack.


Yeah, I also really dislike this mixing up of indentation levels because of a language limitation. The "{" denotes a container, and the "," a separation between items of said container, so to me it makes sense for them to be nested a level deeper.


That's one way to interpret things, but not the only way.

In practice, this style reads just fine once you get used to it. Indentation is mostly there as a human convenience, so just has to work well with human brains (and be understood by a computer, if it's significant), but doesn't have to necessarily follow some abstract unified theory of syntax trees.


Both symbols are delimiters; one can choose to align them even when they are not the same character. Opening and closing braces aren't the same character either, but people have been aligning them for ages. I don't see why commas, being part of the same expression, should not follow the same principle.


For me it's not about being the same character (like you've mentioned, opening and closing braces aren't either), it's about commas and braces indicating different things in the hierarchy. Not to mention the symmetry breaking: an opening brace together with data in a line, then a lonely one at the bottom.


In a series of declarations, the lonely closing brace at the bottom can be treated as a substitute for an empty line between two entries, as it produces a similar sparse spacing.


Trailing comma is ugly as sin - I love the little "house" my leading commas put my record into :)


Dhall has a lot of really cool ideas, it would be great to have semantic integrity checks on imports in other languages too.


Comparison with a related language (perhaps biased, since opinion is from creator of that language).

"Comparisons between CUE, Jsonnet, Dhall, OPA, etc." https://github.com/cue-lang/cue/discussions/669


I have recently been looking for a program that does a similar thing, i.e. simplifies the creation of configs, and did not find a suitable one by Googling. Then I realized that PHP is perfect for this, and I already know it! The output of PHP does not have to be HTML or other web stuff; it can just as well be a config file.


Apparently I'm in the minority that feels config languages should be little more than namespaces and key-value storage and should not be programmable.

I would not want to use a bash script as a config file, for example.


I wouldn't want to use a bash script as a config file either. That's the point of Dhall: you get good static types and it's not Turing complete, so you get a lot of the safety and code-reuse tools without the pitfalls of a general-purpose programming language.

And, have you ever worked with giant configs in json or yaml? It becomes incredibly painful to manage.


I prefer to use multiple, smaller config files to keep them manageable.


My biggest pain when doing this is: set up an environment configuration. Then set up a test or stage environment that matches it. Then modify the environment over the next few years, validating each change in stage before it goes to production, while keeping the stage and production configs in sync so that testing and validation stay useful (the stage configuration actually matches production, so success or failure in staging predicts the same in production).

Even something as simple as variable substitution becomes very useful in these cases so that configuration can be a single file and substitution delivers the staging or production config. Functions and operators allow more complicated configurations to remain DRY. Checks prevent the stage servers from using the production database connection string, etc.
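A minimal Python sketch of that idea: one shared template, a small record of per-environment values, and a guard that refuses to render a non-production config pointing at the production database. All names and hosts here are made up for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvValues:
    """The only things allowed to differ between environments (hypothetical fields)."""
    name: str
    db_host: str
    replicas: int

STAGE = EnvValues(name="stage", db_host="db.stage.internal", replicas=1)
PROD = EnvValues(name="prod", db_host="db.prod.internal", replicas=3)

def render(env: EnvValues) -> dict:
    """One shared template, so stage and prod can't drift apart structurally."""
    config = {
        "service": "billing",
        "replicas": env.replicas,
        "database": {"url": f"postgres://{env.db_host}:5432/billing"},
    }
    # Guard: a non-production environment must never point at the production DB.
    if env.name != "prod" and env.db_host == PROD.db_host:
        raise ValueError(f"{env.name} config points at the production database")
    return config
```

Because both environments go through the same `render`, a structural change made for production automatically shows up in staging too; only the values in `EnvValues` can differ.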


People tend to evaluate these things based on what cool and neat things they can do it with. Not whether telling people they need to learn Haskell before they can update what would've been a 30 line configuration file on the project they just joined is a good use of peoples' time.


I think more programming power is useful in the service of stopping bugs; for instance, you could have a language where the part that "does stuff" isn't Turing complete, but the part that "stops incorrect stuff from happening" (the type system) is.


We use it for mina (https://github.com/MinaProtocol/mina). AMA


Dhall desperately needs a DefinitelyTyped-like repo with tooling support and it's bound to explode. It'd be just too good of a solution to ignore.


    {- You can optionally add types

       `x : T` means that `x` has type `T`
    -}

    let Config : Type =
Config's type is Type?

Clear as mud.


Config is a type, so its type is Type.


Figured, but it's a pretty goofy way to introduce it after that comment.

[EDIT] I mean, you can almost "who's on first?" this.

"OK, what type do you want this to be?"

"Type."

"Yes, what type do you want it to be?"

"Type."

"Great, ok, yes, the type, what type is Config?"

"Type."

"WHAT IS THE NAME OF THE TYPE YOU WANT CONFIG TO BE???"

"Type."

flips desk


Is it? My takeaway is "oh cool, first-class types". Experimenting with this, I can write the following:

  let ConfigOf : Type -> Type = \(type : Type) ->
        {- What happens if you add another field here? -}
        { home : type
        , privateKey : type
        , publicKey : type
        }
  
  let Config : Type = ConfigOf Text
and the rest of the example still works and evaluates the same.

Also in a later example it has the expression `generate 10 Config buildUser`, which also works because of first-class types. Instead of needing generics, you just take a type as a regular parameter.


Tried to use it and it didn't fit my use case: recursive types are a bit difficult.

Looks great otherwise


What kind of software configuration may require recursive types? I've never had to encode trees in my configs so far.


An explicit representation of JSON requires recursive types.
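For example, a JSON value is either a scalar or a list/object of further JSON values, so the type has to refer to itself. In Python that is a one-line recursive type alias. The structural checker below is a hypothetical helper, included only to show how the recursion plays out:

```python
from typing import Dict, List, Union

# The alias mentions itself: lists and objects contain further JSON values.
JSON = Union[None, bool, int, float, str, List["JSON"], Dict[str, "JSON"]]

def is_json(value: object) -> bool:
    """Check structurally that a Python value fits the recursive JSON type."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_json(v) for v in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_json(v) for k, v in value.items())
    return False
```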


github.com/coralogix/dhall-concourse


Yikes! Don't do it folks! Business logic belongs in the apps, not in the config.


anyone using this for kubernetes config generation? What's been your experience?


Don't. Use helm, kustomize or a decent code language. Dhall will constrict, slow you down, make onboarding a nightmare, and ultimately be as brittle as other alternatives (Only it's harder to find where it broke). I cannot advocate against dhall enough.


this is what I thought too. I've enjoyed working with helm and totally recommend it but was wondering if I'd missed something. A templating language and not an actual programming language seems to be the right balance for config


What are the advantages of this being its own language vs a DSL?


Why wouldn't you use Python as your configuration language?


Have you seen Starlark? It's not too far from that, but safer in a number of ways: https://github.com/bazelbuild/starlark


I said this above as well: ytt (https://carvel.dev/ytt/) lets you embed starlark into valid yaml, among other cute tricks for managing biz-logic in configs.


Huh. Clever strategy, though I'm not sure it'd be something I'd use...

It is a pretty decent "take your existing yaml and make incremental improvements" setup though. Rewriting all of your config from scratch is rarely an enjoyable experience.


For one thing, Dhall is not Turing complete. You can also freeze imports to prevent supply chain hacks, so it is much safer in theory.

They have a great writeup on their safety here: https://docs.dhall-lang.org/discussions/Safety-guarantees.ht...


Most of the time with configuration files, you want to be able to do at least basic validation on the config without having to run the whole piece of software. Python doesn't check types statically, so there's nothing to stop the user from creating a configuration file where, for example, a field intended to hold a port number is set to a string. This error will become obvious when you run the software (and it presumably fails some way in), but for some use cases that is too late. With Dhall you can specify that the port field must be not just an integer but a natural number, so you're able to catch plenty of config errors nice and early, and can go some way towards validating the config in isolation from the software it's intended for.
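For comparison, a Python sketch of the same check done when the config is loaded rather than when it is type-checked. `ServerConfig` and `load` are hypothetical names. Note that Dhall's `Natural` rules out strings and negative numbers but not values above 65535, so the upper bound has to be enforced some other way there too:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(self.port, int) or isinstance(self.port, bool):
            raise TypeError(f"port must be an integer, got {type(self.port).__name__}")
        if not 0 <= self.port <= 65535:
            raise ValueError(f"port {self.port} is outside 0-65535")

def load(raw: dict) -> ServerConfig:
    """Validate a parsed config before the application ever starts."""
    return ServerConfig(host=raw["host"], port=raw["port"])
```

`load({"host": "localhost", "port": "8080"})` fails with a `TypeError` immediately, instead of some way into a run.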


I believe you may want a non-turing complete language for strict configuration.

But sometimes, as you indicate, you want VERY dynamic configuration. But I would argue then that such logic goes in your application itself and is not in fact part of your configuration.


Sometimes you just explicitly want to output text or a structured format, but you'd rather have a template language do it because it being a lot less powerful actually makes it easier to write close to the output format.


I really hope that one day Google can open source GCL, which so many languages have been inspired by without ever really being as good as it is. They all claim to address pitfalls in GCL but at the end of the day they're just worse than GCL.


Can you say more about what GCL does better than all of the open source ones?

Anecdotally, I've heard a lot of GCL horror stories, and many Xooglers have chosen to create things like Jsonnet or Skycfg (https://github.com/stripe/skycfg) instead.


It creates nested data without looking like the end format of the data (why I really can't get into jsonnet), it's just obviously its own language but is pretty minimal.

Despite the minimalism it is not necessarily simple. There are features like inheritance and late binding. They can be quite complicated, but the thing that really stands out about GCL is it's not complicated to use. The simple cases are easy, straightforward, and readable. The complicated cases are made possible.
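Python classes can serve as a loose analogy for inheritance with late binding (this is not GCL syntax, and the field names are made up). A derived config overrides one field, and every value computed from that field picks up the override without being restated:

```python
class BaseService:
    env = "dev"
    replicas = 1

    @property
    def hostname(self) -> str:
        # Late binding: `self.env` is looked up on the final object,
        # so a subclass that overrides `env` also changes `hostname`.
        return f"{self.env}.example.internal"

class ProdService(BaseService):
    env = "prod"    # override one field...
    replicas = 3    # ...and another; hostname is never restated
```

`ProdService().hostname` evaluates to `"prod.example.internal"` even though `hostname` is only defined in the base.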

I've used it extensively to store and transform data that has to be manually edited by humans. It's a really great format because I can define all the translation rules which can get pretty complicated, but the complexity I expose to the humans who are writing the data is minimal. But if they need to do some transformations on the data or even just, like, string substitutions...the language is there and they can use it. That's what makes writing your data in a configuration language nice.

(Honestly though I think jsonnet is at least vastly superior to skycfg and the whole starlark ecosystem. I swear I'm so sick of languages that intentionally look like Python without being actual Python!)



Cue doesn't resemble GCL at all.

From that link:

> However, the early design of GCL went for something simpler that coincidentally was also incompatible with the notion of graph unification. This simpler approach proved insufficient, but it was already too late to move to the earlier foreseen approach. Instead, an inheritance-based override model was adopted. Its complexity made the earlier foreseen tooling intractable and they never materialized. The same holds for the GCL offsprings that copied its model.

Needless to say I disagree with the Cue author. I think the inheritance-based override model is fantastic and has made for a great and straightforward configuration language.


And yet, even if they did it right now, it'll be over there in the pile of "yet another configuration language" for the same reason we already have like 50 of them

Damn network effect, ruining everything :-/


Why not instead use one of the popular programming languages (C#, Java, ...), possibly with some dedicated library built on top of it? Also, reminds me of the Heptagon of Configuration [1].

[1] https://matt-rickard.com/heptagon-of-configuration



