In Defense of YAML (atomist.com)
160 points by rbanffy on March 21, 2019 | 170 comments



> This is not structured data. This is programming masquerading as configuration

I find myself expressing this same opinion to people on a frequent basis. See Ansible for another big example: Ansible has a try/catch equivalent in YAML [0]!

Unfortunately, YAML is more or less a lowest common denominator for these sorts of interactions - being usable from any language is a really big boon. Would love to see the world adopt a trivial lisp or similar for this sort of thing, but that feels a ways off.

Helm 3 introducing Lua, for example, is a big step forward. It's pretty much exactly what I've been envisioning for making k8s deployments less crummy, so I hope they end up with a good product. [1] Helm 2 uses Go-templated YAML, and it's terribly unergonomic and not particularly maintainable. Until you run the templating, it's not even valid YAML, so text editors are useless for linting.

[0]: https://docs.ansible.com/ansible/latest/user_guide/playbooks...

[1]: https://sweetcode.io/a-first-look-at-the-helm-3-plan/



Everything Dhall advertises itself as useful for is basically a replacement for situations where YAML has been abused or where the separation of concerns between code and config was improperly cut (this is largely how YAML files get to be mind-numbingly repetitive).

It does this by creating something not quite as powerful as a real programming language (sometimes a feature; probably not here though) while still complicated enough to effectively be unreadable to an average Joe.


You can make Tcl as non-Turing complete as you wish. Stripped of most or all builtin commands, it becomes a flexible configuration language. You can add back power in small controlled doses as it becomes necessary. You can also make it more palatable to the cool kids by adding infix assignment and expression evaluation with a few simple helper procs.


This was a big part of what motivated me to create OTPCL [1]; using full-blown Erlang or Elixir for configuration files feels dangerous and heavy-handed, and I want to be able to readily define DSLs for configuration files instead to offer better control/safety/ergonomics. Plus, I happen to like Tcl ;)

Imagine being able to go from a foo.app like this:

    {application,otpcl,
                 [{description,"Open Telecom Platform Command Language"},
                  {vsn,"0.1.0"},
                  {modules,[otpcl,otpcl_env,otpcl_eval,otpcl_init,otpcl_parse,
                            otpcl_shell,otpcl_stdlib,otpcl_stdmeta]},
                  {registered,[]},
                  {applications,[kernel,stdlib]},
                  {licenses,["ISC"]},
                  {links,[{"Bitbucket","https://bitbucket.org/YellowApple/otpcl"},
                          {"GitHub","https://github.com/YellowApple/otpcl"}]}]}.
to something more like a foo.app.otpcl:

    application otpcl {
        description "Open Telecom Platform Command Language"
        vsn 0.1.0
        modules otpcl otpcl_env otpcl_eval otpcl_init otpcl_parse \
                otpcl_shell otpcl_stdlib otpcl_stdmeta
        registered
        applications kernel stdlib
        licenses ISC
        links {
            Bitbucket "https://bitbucket.org/YellowApple/otpcl"
            GitHub "https://github.com/YellowApple/otpcl"
        }
    }
I dunno about you, but I like the latter a lot better :) It ain't a 1:1 representation, but it'd be straightforward to define each of those parameters as commands that emit the corresponding Erlang forms that make up an actual app spec.
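The "each parameter is a command that emits a form" idea can be sketched outside Erlang too. Below is a minimal Python illustration, assuming a flat Tcl-like syntax (one command per line, words split with `shlex`, backslash continuations, no nested braces like the `links { ... }` block); the handler table and the tuple output are hypothetical stand-ins for real app-spec forms. Unknown commands are rejected, so the config only has the power you explicitly grant it.

```python
import shlex

# Hypothetical handlers: each config command maps to a function that emits
# an (atom, value) pair, loosely mirroring Erlang app-spec entries.
HANDLERS = {
    "description": lambda args: ("description", " ".join(args)),
    "vsn": lambda args: ("vsn", args[0]),
    "modules": lambda args: ("modules", list(args)),
    "registered": lambda args: ("registered", list(args)),
    "applications": lambda args: ("applications", list(args)),
    "licenses": lambda args: ("licenses", list(args)),
}


def parse_app_spec(text):
    """Evaluate a flat, command-per-line config into a list of entries."""
    entries = []
    # Join backslash-continued lines before splitting into commands.
    for line in text.replace("\\\n", " ").splitlines():
        words = shlex.split(line)
        if not words:
            continue
        command, args = words[0], words[1:]
        if command not in HANDLERS:
            # Only whitelisted commands are allowed: no arbitrary code.
            raise ValueError(f"unknown command: {command}")
        entries.append(HANDLERS[command](args))
    return entries
```

Adding a command means adding one handler, which is roughly the "add back power in small controlled doses" approach described for Tcl above.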

While OTPCL's by no means ready for primetime yet, I'm hoping it'll eventually be good enough to fill that niche in the Erlang/OTP ecosystem the same way Tcl was meant to fill that niche for software in general.

[1]: https://otpcl.github.io / https://github.com/otpcl/otpcl


There's a reason MacPorts[1] uses Tcl for its package definitions. The packages for the most part look just like structured data, but it's just a Tcl DSL that allows them to e.g. write hooks in Tcl instead of having to embed a shell script[2].

[1]: https://www.macports.org

[2]: https://guide.macports.org/#development.examples


Dhall does indeed look like it starts to address this problem - haven't seen it before. I'd have to read deeper to decide whether I like it or not, though. I may try to convert some existing complex YAML I'm using into it to form an opinion.


Dhall is nice in that it can natively support configuration migrations via total functions. For example, Config1 can be migrated to Config2 by applying a function that always completes. It's also possible to constrain that function so it must accept a Config1 and return a Config2.
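Python can't check totality the way Dhall does, but the shape of a typed Config1-to-Config2 migration can be sketched with dataclasses. The field names are hypothetical; the point is a plain function with no loops, recursion, I/O, or parsing that could fail, whose signature pins the input and output types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config1:
    # Hypothetical v1 schema.
    host: str
    port: int
    debug: bool


@dataclass(frozen=True)
class Config2:
    # Hypothetical v2 schema: address combined, debug flag became a level.
    address: str
    log_level: str


def migrate(old: Config1) -> Config2:
    """Map every Config1 onto a Config2; every branch returns a value."""
    return Config2(
        address=f"{old.host}:{old.port}",
        log_level="debug" if old.debug else "info",
    )
```

In Dhall the type checker enforces that such a function terminates; here it is only a convention the code happens to follow.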

Dhall supports importing functions from URLs, and it's good practice to pin them using SHA-256 hashes of their internal representation.
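Pinning an import by hash comes down to refusing content whose digest doesn't match the one recorded next to the URL. The sketch below hashes raw bytes with SHA-256 via `hashlib`. This is a simplification: Dhall hashes the normalized expression rather than the file text, so merely reformatting a Dhall file does not break its pin. `load_pinned` is a hypothetical helper, and fetching is left to the caller.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when fetched content does not match its pinned digest."""


def load_pinned(content: bytes, expected_sha256: str) -> bytes:
    """Return content only if its SHA-256 hex digest matches the pin."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != expected_sha256.lower():
        raise IntegrityError(f"expected {expected_sha256}, got {actual}")
    return content
```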


Also https://jsonnet.org/ .

Both of them are basically BCL/GCL but with some semantic/scoping horrors removed.


I was involved with json-e a while ago: https://taskcluster.github.io/json-e/

Which to me seemed more intuitive.

But I'm also pretty excited about the UI-as-code features happening in Dart: https://medium.com/dartlang/making-dart-a-better-language-fo...

Mostly I prefer YAML with json-e, because it's declarative yet expressive enough for those few cases where you need fancy stuff.

But the new Dart features coming up make me think I could write config as code. We'll see :)

(Disclaimer: I work at Google)


Ansible deserves some credit at least. It tried to avoid the pitfalls Chef and Puppet fell into, which is having a DSL that allows arbitrary logic, i.e. basically just a Ruby script. These become horrible to trace and debug. Ansible only allowed conditionals or loop constructs in some well-defined places, which actually does help when trying to comprehend Ansible playbooks. (Although one might still argue a strict DSL would offer the best of both worlds.)


From my perspective--as somebody who slings this stuff a lot--what you describe as a "pitfall" is what I describe as "how to make things work."

I don't find that Chef cookbooks are all that onerous to debug at all. Cleanly-written Ruby (and you can write Chef cookbooks in very clean Ruby, it really isn't that big of an ask) isn't hard to trace and in the extreme case--and this has happened twice in my career--I just install Pry and drop a breakpoint into the Chef cookbook, then proceed to poke around, isolate the problem, and move on with my day.

This, contrasted with Ansible's flinging around of Jinja templates and trying to turn YAML into a bad programming language, is a relief. When I was a consultant I charged a premium for Ansible, first because it frustrated me, but second because it was generally harder for me to find out what was actually going on due to the verbosity, the commonplace copy-pasting, the relatively poor method of inventorying, and the "strap in, it's gonna get bad" moment anytime somebody decided to break out of the YAML mines and go write their own module.

There remains a very awkward sysadmin/developer divide in the devops world, and I think that criticisms like the one you describe generally tend to hail from the former. As a developer whose output on occasion happens to be configured systems (when it isn't web pages, etc.), I'd take a predictable programming environment over a "configuration language" every day (and it's one of the reasons I really wish that Noah's attempt to turn cookbooks into gems had paid off; having them be a completely separate artifact from a Ruby gem is just silly).


> and you can write Chef cookbooks in very clean Ruby

You can, but in reality this doesn't happen, especially for people who just want config management and don't want to learn Ruby, which is exactly the "sysadmin/developer divide" you've so aptly described. All I got from Chef was a loathing of Ruby and all its magic.

Anyhow, the point was to illuminate how and why Ansible came about. I don't really like or use Ansible much anymore.


It happens where people hire me, man. ;) And many other places where I've been a consultant. Popped in, looked around, and went "hey, this is pretty good!". About...hm...30% of my Chef engagements were with messy codebases. Another 50% were fine. 20% were better than the applications they supported, and by a lot.

I get that not everybody is a Ruby person. "Chef but in Python" probably would have done pretty well. But it is genuinely not that difficult and probably more widespread than people give it credit for being.


I feel like there is a large gap between shell scripts and Ansible. If we draw a line, you end up with something like this, with each option bringing more power, but also complexity with it.

Bash/Make => ? => Ansible => Chef/Puppet/Salt => CFEngine

There's a big jump in complexity from Bash/Make to Ansible. This is a place where Python or Ruby would make a lot of sense but without all the dsl/yaml/declarative nonsense. A small library of reusable functions would fit in well here, written in a language that isn't fundamentally broken (bash).
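As a rough illustration of that "small library of reusable functions", here is a hypothetical `ensure_line` helper in plain Python, similar in spirit to Ansible's `lineinfile` module but not its implementation: it is idempotent, reports whether it changed anything, and needs no DSL.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was changed and False if it was already correct,
    so running it twice is safe (idempotent).
    """
    p = Path(path)
    text = p.read_text() if p.exists() else ""
    if line in text.splitlines():
        return False
    # Keep the file newline-terminated before appending the new line.
    if text and not text.endswith("\n"):
        text += "\n"
    p.write_text(text + line + "\n")
    return True
```

The boolean return value is what lets a caller report "changed" vs. "ok", which is most of what a configuration tool needs from each step.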


It's unfortunate that "shell scripts" represent the imperative solution. Shell scripting is loaded with incidental complexity and footguns that other scripting languages (like Starlark) lack. The latter would be entirely appropriate for this sort of advanced configuration.


Another potential option is Lightbend Config:

https://github.com/lightbend/config

It adds includes/inheritance, substitutions, unit-of-measure parsing (e.g. durations and sizes), comments, and a few other things on top of JSON.

It does not have if-logic, loops, or arbitrary functions, however.

On plus side, the library is available in a number of languages, easily embeddable. And there is an IntelliJ plugin for it, to color the syntax of the config files.


HOCON is a very nice configuration format that should be preferred to JSON and YAML - it really deserves more adoption. I always reach for Lightbend Config on non-Spring projects.


It has a pretty nice format, but I've run into its limitations more than once. Array indexing is not possible in the config or the library, and nested environment-variable substitution is not possible (e.g. $ACCESS_KEY_{$ENV_NAME}).
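When the config format can't express it, the two-level lookup from that example is easy to do in the host language before handing values to the library. A minimal sketch, assuming the `ACCESS_KEY_<ENV_NAME>` naming scheme from the comment; the environment is passed in as a mapping so it can be tested without touching `os.environ`.

```python
import os


def access_key(env=None):
    """Resolve ACCESS_KEY_<ENV_NAME>, e.g. ACCESS_KEY_staging."""
    env = os.environ if env is None else env
    env_name = env["ENV_NAME"]  # first lookup picks the environment
    return env[f"ACCESS_KEY_{env_name}"]  # second lookup uses it in the key
```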


> being usable from any language is a really big boon.

Personally, I'm not particularly happy with the idea that I would use something inferior (like YAML) because various mainstream languages are bad and don't make more sophisticated languages possible.


> This is programming masquerading as configuration

On a slight side-note this is something that's been bothering me with markdown and other markup languages.

They're very good for short comments but when you're writing a blog post, or god forbid an article series or a book, sometimes having access to a programming language is fantastic.

It's why I like markup like Pollen's. X-expressions also fit very well for modeling a document.
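For a feel of why X-expressions suit documents, here is a small Python analogue: a document as nested `(tag, attrs, *children)` tuples rendered to HTML. This only illustrates the idea; it is not Pollen's API (Pollen itself is Racket), and the `note` helper is hypothetical.

```python
from html import escape


def render(node):
    """Render a nested (tag, attrs, *children) tuple tree to an HTML string."""
    if isinstance(node, str):
        return escape(node)
    tag, attrs, *children = node
    attr_text = "".join(
        f' {name}="{escape(str(value), quote=True)}"' for name, value in attrs.items()
    )
    inner = "".join(render(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


# Because the document is plain data, ordinary functions can build it.
def note(*children):
    return ("aside", {"class": "note"}, *children)
```

Reusable pieces like `note` are just functions, which is the "access to a programming language" that plain Markdown lacks.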


> Helm 3 introducing lua

Oh, this is cool! I've actually been thinking for a while that things like Helm or CloudFormation templates or other infra-as-code things would be better served by something like Starlark, but Lua probably suits as well.


Have you looked at the work Stripe is doing on skycfg? https://github.com/stripe/skycfg preserves protobuf message types as they're passed around, so you get type checking.


I'm aware of it, but I haven't looked much at it. I'm a little confused about its scope--is it for building out protobuf specs and Kubernetes config files? Or does it somehow use protobuf specs to build out Kubernetes config files?


It's the latter. You import protobuf specs (compiled into golang). You build a runtime that imports the specs and they're available in the python interpreter.


You want to see a real abuse of YAML? Take a look at SaltStack's Jinja-rendered YAML. I just grabbed a random formula as an example: https://github.com/saltstack-formulas/openssh-formula/blob/m...


I don't understand why people use a markup language when they need a programming language.

I mean, sure, you can convert one into the other syntax-wise, both form trees, but what do you gain?


From what I've experienced, it's mostly tools that target either sysadmins or sysadmins-turned-pseudo-devops, and those tools pretend that anything can be done with 'just configuration' as a marketing point. Programming is for programmers, you see.

It's easy for someone who never {wanted to, learned to} program to accept yet another configuration format (with all options specified up front) and then accidentally learn (bastardized) programming concepts. It'd take much more effort to turn them into a 'real' programmer, learning a language from scratch, practicing that, and only then introducing some tool in the form of a library.


I don't think this is fair to sysadmins, and I don't think sysadmins deserve your low opinion. The tools are designed the way they are not to appease lowly sysadmins, but because it is a pattern:

- Define the desired state of your system in a declarative way. This requires a DSL, and some tools such as Ansible and Kubernetes use YAML as that DSL.

- The configuration engine (Kubernetes, Ansible, Chef, Salt) will converge the system to the desired state in an idempotent way using the DSL.

- Many systems have mostly the same config, and many systems also run in environments that are mostly similar but with some differences. Creating separate but mostly identical desired-state configurations for every system/environment doesn't follow the DRY principle, so we need programming primitives and templating tools to accomplish it. It's no less valid than a web app templating HTML, for instance.

It's easy to conflate "mixing programming with configuration" with "using programming to solve configuration generation."
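The "mostly the same, with some differences" case is often handled by layering a per-environment override on a shared base. A minimal Python sketch of a recursive merge with hypothetical keys; lists and scalars in the override simply replace the base value, which is only one of several reasonable merge policies.

```python
import copy


def deep_merge(base, override):
    """Return a new dict: `override` merged recursively onto `base`."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            # Scalars and lists replace the base value outright.
            result[key] = copy.deepcopy(value)
    return result


base = {"app": {"replicas": 2, "image": "web:1.4"}, "log_level": "info"}
prod = deep_merge(base, {"app": {"replicas": 6}})
```

Here `prod` keeps the base image and log level and only changes the replica count, while `base` itself is left untouched.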


I think you're missing my point. I have nothing against using a DSL for declaratively defining the intended state of a system. It's just that this should be a proper programming language that lets you declare and use arrays, variables, and general logic.

Meanwhile, YAML is not a good DSL framework; it's not even a programming language, it's a markup language. Any logic defined in it will be inherently bolted on, likely based on text templating, and full of stringly-typed confusion. A classic CM tool that does get this right is Chef, where normal Ruby code is used to build the declarative intended state. You can very easily implement any logic you want to create an intended state (even retrieve it from other components) and then emit a desired, declarative resource state (and --why-run will then show what steps will be taken to converge it). You can do these things right; just don't try to turn a markup language into a Turing-complete DSL.
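The split described here (ordinary code computes a declarative desired state; the engine compares it with reality and can print the plan without applying it, as `--why-run` does) can be sketched in a few lines. The resource names and the flat `{name: properties}` shape are hypothetical simplifications, not Chef's resource model; resources that aren't declared are left alone.

```python
def plan(desired, current):
    """Compare desired vs. current resources and list converge actions.

    Both arguments map resource names to property dicts. Nothing is applied;
    the returned list is the dry-run report.
    """
    actions = []
    for name, props in sorted(desired.items()):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != props:
            actions.append(("update", name))
    return actions


# Ordinary code builds the desired state, e.g. one vhost per site.
sites = ["blog", "shop"]
desired = {f"vhost:{s}": {"port": 443, "root": f"/srv/{s}"} for s in sites}
current = {"vhost:blog": {"port": 80, "root": "/srv/blog"}}
report = plan(desired, current)
```

Running `plan` again after converging yields an empty list, which is the idempotence property the declarative model is after.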

You compare this to templating HTML - but nobody would ever seriously consider declaring such templating logic within HTML itself with some sort of Jinja2 templating engine duct-taped to the side. Instead, we use whatever is in control of serving HTML to render the final markup, with the bulk of the logic implemented in a normal programming language, while a templating engine takes care of rendering variables and loops.

Comparing Ansible's YAML and Kubernetes' YAML is also wrong. A better comparison would be comparing Ansible to Helm, since they both take the templated-YAML approach. Kubernetes by itself doesn't even let you just use YAML files as intent and idempotently apply them (some objects can be created from YAML using apply, but then not updated, and instead have to be recreated, and that's the logic that's pushed onto external tools like Helm, Kubecfg, ...).

And instead of even using a real DSL that's built into a particular tool, there's the second approach that I currently follow and recommend: use whatever programming language you're comfortable with to emit a desired state in whatever interchange format your management tool needs, and let that tool handle the domain-specific task of converging the state.


> A classic CM tool that does get this right is Chef, where normal Ruby code is used to build the declarative intended state.

If you're saying that, you should clarify for anyone not familiar with the various config management tools that, in order to provide you with Ruby, Chef compiles Ruby in one pass, then evaluates the result in Ruby again after enriching its environment with the results of the first pass.

This creates surprises when code apparently disappears (because it has already been evaluated). In addition, to accomplish some of its other goals, Chef collapses up to 18 layers of variables to make them available via the node object, which means that, depending on which evaluation phase has completed and what else may have enriched the various layers, the state of the node object can be very hard to understand.
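The layered-lookup part can be illustrated with `collections.ChainMap`, where the first mapping that contains a key wins. The three layer names below are hypothetical and far fewer than Chef's actual precedence levels; the point is only that the same key can read differently depending on which layers have been populated by the time you look.

```python
from collections import ChainMap

# Hypothetical layers, highest precedence first.
override_attrs = {}
role_attrs = {"nginx_workers": 8}
default_attrs = {"nginx_workers": 4, "nginx_port": 80}

node = ChainMap(override_attrs, role_attrs, default_attrs)

before = node["nginx_workers"]  # 8, from the role layer
override_attrs["nginx_workers"] = 16  # a later phase enriches a higher layer
after = node["nginx_workers"]  # 16: same lookup, different answer
```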

I say this as a big fan of Chef; it's quite a lovely product. But on the other side, Salt and Ansible work to provide fewer footguns in the default case, where the goal is to provide structured data to a function that will act on it. They let you footgun yourself differently. Your excitement about a programming language over YAML is understandable, but impractical: putting a programming language in charge of the messy reality of running a program on an operating system demands a lot of any language. It's where the rubber meets the road, and the road is neither smooth nor straight.


Would you expect all sysadmins to become programmers?


Yes.

And back in the day sysadmins wrote programs like tcp_wrappers, postfix, sendmail, etc. It was definitely common and encouraged if not expected.

The idea that system admins should be able to get away with never professionally stepping up to doing programming is a late-90s/early-00s aberration that was generated by the need to scale up during the dot com boom before good automation practices existed. And it is not a good thing and it stunts intellectual growth. Devops is in some sense back-to-the-future by tearing down an artificial wall that never should have been made in the first place.


Wietse, the author of two of those tools, is the only systems person at his university I'm aware of who has done a lot of tooling. Allman, author of sendmail, was not a sysadmin by any stretch of the imagination. IIRC he was a grad student, then a systems hacker, surrounded by Unix and programming.

Sure, lots of systems people program and love programming, but the job isn't to write tcp_wrappers, postfix, sendmail, etc.; it's to do what's needed to make sure that such things run, which is a different kind of programming.


Would it be beneficial? Yes. Is it necessary for most cases? No. I do think, however, that the moment your infrastructure becomes defined by code (e.g. you start using any CM system), you should be training your sysadmins to program more and duct-tape existing solutions less. Operations become vastly easier to scale and maintain if you can defer their logic to actual services with clearly defined business logic rather than a mess of Ansible configs and bash scripts.


Maybe some will?

My take is:

The complex admin work will be going to the cloud providers.

The rest (besides some niche stuff) will get so simple a programmer can do it on the side.

Admins that don't learn to program are left behind. Only those who have some cushy full time employment, where they can't be let go, will remain.


> Admins that don't learn to program are left behind. Only those who have some cushy full time employment, where they can't be let go, will remain.

While I personally am a Software Engineer + Systems Engineer combination perfect for modern Devops... I do not agree with you one bit.

There will always be a place for non-programming systems administrators -- especially as we get more and more systems over time.

Will those administrators be able to /scale/ to managing 100s or 1000s of system without learning some sort of programming language or configuration management system? Probably not.

However, there are plenty of businesses out there that only have a handful of servers and services which can totally (and more cheaply) be managed manually.

Automation has an up-front cost that doesn't always make sense for the business to spend.


Because it's often really hard to know which aspects of your product need the flexibility of code and which benefit from the simplicity of configuration.

I wish we started analyzing this as the hard problem it is rather than getting angry that people didn't always make the right decision by correctly anticipating how their software would end up being used.


Because the tools only support markup languages, so we have to use stuff like Jinja or go templates to create those.


Sure thing, but Jinja looks like a garbage fire.

Why couldn't they use a regular programming language to create the markup? I mean this is done with HTML and JS all the time.


jinja is great, IMO, and well suited for its intended purpose.

jinja is in wide use in the python community.


Maybe jinja is well-suited to whatever its intended purpose is, but it's not well-suited for programmatically building markup. Using a dynamically typed scripting language to generate the markup at least has the benefit of allowing you to define appropriate semantics via functions ("give me a blob of YAML that represents a Kubernetes service with these parameters").

Sure, you can probably build your own concept of functions on top of Jinja, either by embedding them or building your own paradigm on top of Jinja's template inheritance, but it's a hack either way.
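Here is a rough sketch of that "Kubernetes service with these parameters" function in Python. It builds the manifest as plain data and serializes it with `json.dumps`; because JSON is, for practical purposes, valid YAML and `kubectl apply -f` also accepts JSON manifests, no text templating is involved. The function name and its defaults are hypothetical; the fields (`apiVersion`, `kind`, `metadata`, `spec.selector`, `spec.ports`) are the standard ones for a v1 Service.

```python
import json


def k8s_service(name, app, port, target_port=None):
    """Return a v1 Service manifest as a dict (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"app": app}},
        "spec": {
            "selector": {"app": app},
            "ports": [
                {
                    "port": port,
                    # Default to forwarding to the same port on the pod.
                    "targetPort": target_port if target_port is not None else port,
                }
            ],
        },
    }


manifest = json.dumps(k8s_service("web", app="web", port=80, target_port=8080), indent=2)
```

Parameters get validated and composed as ordinary function arguments, which is exactly what is awkward to emulate with template inheritance.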


Well, I think it was originally designed with server-side html templating in mind. It works great for that purpose.

In the case of the thread OP, that template is pretty messy, yes. In Ansible, I would just create a custom module or action plugin in Python to spit out whatever I need. I don't know how flexible Salt is in this regard, but Ansible is quite flexible.

For simple cases, Jinja works great if you're just doing a find/replace-type operation in YAML.


We don't have to--you can generate the markup language with a scripting language and end up with something that is quite a lot cleaner. I've been toying around with Starlark as a yaml generator for Docker Compose, CloudFormation, and Kubernetes. This ends up working out a lot more nicely in practice than Helm's templating solution or CloudFormation's nested stacks.


With salt at least, that .sls file can just as easily be python. Just return the structure you need.


Thanks for pointing that out. It's exhausting reading people in this thread getting very excited and demonstrative about the fact that a flexible system like salt provides a way to do this that results in a serialized data stream. They assume that they know better how this should be done, and posture about their superiority vs. some kind of straw man.


Because their bosses dream up scenarios where customers will be generating config files to be used as templates, etc..., and they're just trying to solve whatever problem was put in front of them and "we'll use YAML", IME. Of course, those scenarios never play out, the thing is bastardized, some new responsibility gets created for some new role where it's their job to handle the template generation on a per customer / per site basis, and on and on we go.


YAML = YAML Ain't Markup Language

Please don't give markup a bad name because of YAML, or the clusterfuck linked by GP.


You got me wrong here. I don't think markup languages are bad.

I just don't understand why someone would create such a "clusterfuck linked by my GP"

If you reinvent a programming language in markup, you basically get a new custom programming language that "probably" can do the same as an existing, non-custom programming language, but without the benefit of many people already knowing it.

Why not integrate something like Lua or Dyon instead of YAML or XML?


I know what you mean, and agree with you ;) The inflationary use of the term "markup" just bothers me, when it denotes a very specific thing that's been in use since before computers existed.


You can use python to create that directly. Jinja is just another, default way to achieve this. Why? Because it turns out it's much cleaner that way. (I've done both)


Where's the abuse? They're just generating a YAML with Jinja2. Is there something egregious about their YAML structure?


It's really easy to generate invalid YAML with a templating language if you aren't careful. If you like tooling that protects you from your own and others' mistakes, then using a template language to generate YAML is going to ring warning bells for you.


True, but once you get it right once, it's really easy to generate the correct yaml over and over again because the input is not unknown random data.


The problem is that in practice you are modifying these somewhat frequently, and if you aren't careful, one of those modifications will result in a production outage.


For the things that change frequently, guardrails can be built (extra testing, extra safety). But over time most configuration ends up stable, with the changes not being wild-wested, in my experience.


Because the linked example is code, not just configuration data.

And if I'm writing code, it should look like code you'd actually want to read and edit. Not something with hacked-on delimiters {%- all over the place %}.


Yep, saltstack is a dumpster fire. Each jinja template can call arbitrary Python functions, and usually brings in external data from somewhere, so if something fails you have a jinja template that imported arbitrary data into a YAML format which defined multiple action items that are executed in some hard-to-understand order and good luck figuring out what went wrong where. Jinja/salt usually wouldn't even tell you what line of the original script caused the error.

My company's scripts have lots of "if fail, try two more times, and hope it succeeds" logic as a result. -_-


After years of using Salt (and contributing upstream patches and PRs), I've finally given up on it entirely. There are too many reasons to include here, but it has indeed turned from something useful and promising into a total dumpster fire.


Why would anyone generate YAML as a string when JSON is valid YAML and can be generated programmatically?! Why??
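A small standard-library illustration of the point: naive string templating breaks as soon as a value contains YAML-significant characters, while serializing a data structure escapes them for you. The template and the value below are made up for the example, and the `json.dumps` output stands in for the YAML document.

```python
import json

value = "web: frontend\nreplicas: 99"  # e.g. user-supplied text

# String templating splices raw text into the document: the injected line
# adds a second `replicas` key, and `name: web: frontend` is itself
# rejected by most YAML parsers.
templated = "name: {}\nreplicas: 2\n".format(value)

# Serializing a structure escapes the newline and keeps the colon inside a
# quoted string, so the value stays a single scalar.
serialized = json.dumps({"name": value, "replicas": 2})
```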


Something I've never liked about YAML, and I've not found a way around, is that you can't know if you have the entire file because there isn't an end marker. If someone cut'n'pastes a block of YAML you don't know if they missed a bit off the end by mistake (unless you know the schema). That makes it harder to debug problems with YAML config.


From the YAML spec[1]:

> Three dots ( “...”) indicate the end of a document without starting a new one, for use in communication channels

[1] https://yaml.org/spec/1.2/spec.html#id2760395


Three dots indicates the end of a block, not a document. Look at Example 2.8 in your link.

If the user sends:

    ---
    time: 20:03:20
    player: Sammy Sosa
    action: strike (miss)
    ...
...you can't know whether they actually meant to send:

    ---
    time: 20:03:20
    player: Sammy Sosa
    action: strike (miss)
    ...
    ---
    time: 20:03:47
    player: Sammy Sosa
    action: grand slam
    ...
Also, in my experience, which is mostly limited to Ansible config because I prefer JSON, I've never seen three dots in the wild. I don't think devops people like them.


They mark the end of a YML "document", but not the end of a YML "stream", which can contain multiple documents. It's the terminology that's causing confusion.


That's interesting, I have always seen the three dots as the first line in a YAML file, my exposure to YAML being almost exclusively Ansible configurations.


You could make a convention where the file has to end with a document separator, i.e.

  ---
But depending on the exact setup and used YAML parser, this may be hard to enforce.
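One way to enforce such a convention without relying on the parser is a plain text check before parsing. A minimal sketch that requires the stream to end with the explicit document-end marker `...` quoted from the spec earlier in the thread; using `---` instead, as suggested here, only changes the `marker` argument. `check_complete` is a hypothetical helper name.

```python
def check_complete(text, marker="..."):
    """Return True if the last non-blank line of `text` is exactly `marker`."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == marker
```

This can't tell a deliberately short stream from a truncated one if the cut happens to land right after a marker, but it does catch the common "missed a bit off the end" paste.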


Having a strictly enforced schema is the ideal solution to this (and several other common problems with yaml).

One really really nice aspect of YAML is how incredibly terse it is. That's a massive and very underrated boon for readability but the lack of syntactic cruft also means that it's much more susceptible to various type errors if it's parsed without a schema.
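A schema check can be as small as a table of required keys and expected types applied to the parsed data. The sketch below assumes the document has already been parsed into a dict by some YAML or JSON loader and uses hypothetical keys; it catches the classic silent coercions, such as a country code that arrived as a boolean.

```python
SCHEMA = {"name": str, "port": int, "country": str}


def validate(doc, schema=SCHEMA):
    """Return a list of error strings; an empty list means the doc is valid."""
    errors = []
    for key, expected in schema.items():
        if key not in doc:
            errors.append(f"missing key: {key}")
        elif type(doc[key]) is not expected:
            # Exact type check, so True is not accepted where int is expected.
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(doc[key]).__name__}"
            )
    for key in sorted(doc.keys() - schema.keys()):
        errors.append(f"unknown key: {key}")
    return errors
```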


I really want a good "universal" configuration language (like https://dhall-lang.org/) that can be used to generate arbitrary text files (e.g. nginx configurations, JSON, YAML, etc.)


So, my opinion is based only on perhaps 1h playing with it. I have mixed feelings about this. It's definitely neat, but I don't like that it adds quite a bit of cognitive overhead (again, this might be because I'm not fully used to it). I believe configurations shouldn't force you to think "how will this be rendered?". For my personal projects I started using protobuffer text formats as configurations. Having the proto definition is the only docs you need and you get meaningful errors when you misconfigured/forgot something.

Sure, the text format is quirky, but so far it works quite well for me and protobuffers (v3) can be easily rendered as JSON.


I agree with this sentiment. I work primarily in web dev, and the stacks are already so sprawling with different technologies... config file generation debugging just sounds like unnecessary hassle.

Had not heard of protobuffers before though, and it does look pretty neat.


Glad to hear :) I must admit that protobuffers v2 might be best for configurations files, given that they have the "required" keyword on fields and the generated code has methods like "hasFieldX() -> bool" which proto v3 dropped in favor of simplicity.


Is there any particular advantage to text format protobufs, rather than the JSON or YAML representations? I've avoided the text format, as I couldn't find it documented, and it seems easier to unmarshal YAML or JSON using the Python or Go APIs.

I do agree that your approach seems good. Alternatives such as jsonschema seem to have much worse syntax, tooling and language support. And we don't discuss XML schemas in polite company.


https://jsonnet.org/ exists, and I've been playing around with a project that has similar goals: https://ucg.marzhillstudios.com. ucg is very much not ready for prime time yet, but I think it's getting close.


Jsonnet looks cool! I look forward to seeing your project too.


At that point, why not just use bash?


This whole argument pivots on the idea that you don't need code in your data.

Sure, you should wrap up all the shell commands you plan to use into code fragments that you can reference. It's neater that way. Suddenly, you have 50 beautifully unit tested objects in your code, each encapsulating a different way you planned on using 'grep'.

And then you open-source your tool, and hundreds of people descend on it and yearn to use it in ways unimaginable to you. You then have to decide whether you're going to a) create a plugin system that allows nicely tested modules in whatever language the user is most familiar with, b) wrap all the functions yourself (good luck!), c) say "Sorry, my beautiful tool, which is 99% exactly what you need, is totally not for this. Fork it and be gone", or d) let people embed scripting statements or shell commands in some way.
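Option (a) doesn't have to be heavy. A minimal sketch of a registry where each plugin is a named, separately testable function and a config step can only reference registered names; the decorator, the `grep` plugin, and the step format are all hypothetical.

```python
import re

PLUGINS = {}


def plugin(name):
    """Register a function under `name` so config files can refer to it."""
    def decorator(func):
        PLUGINS[name] = func
        return func
    return decorator


@plugin("grep")
def grep(lines, pattern):
    # A tested, in-process stand-in for shelling out to grep.
    return [line for line in lines if re.search(pattern, line)]


def run_step(step, lines):
    """Run one config step such as {"use": "grep", "pattern": "error"}.

    Unknown plugin names raise KeyError instead of running anything.
    """
    args = {k: v for k, v in step.items() if k != "use"}
    return PLUGINS[step["use"]](lines, **args)
```

Users extend the tool by registering another function rather than embedding shell commands, though this still only helps for users who are willing to write Python.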

If you have the time for (a) and (b), and this is the issue on which you wish to sacrifice yourself, more power to you. If (c) makes more sense to you, then thank you for your input, sorry your project didn't quite take off like you planned.

But please, don't get upset if people choose (d) and get on with their lives. Yes, it _might_ cause pain further down the road, but it's their road to travel.


Thank you... couldn't agree more. It's probably the best option I've encountered for configurations... Hell, look at apache, Ms config files, xml config files, they are all nightmarish compared to YAML. Short of an actual shell script language, it's about as good as we'll ever see.


A substantial part of the French social and tax system is being coded as free software. One of the teams writes the source code in YAML.

It's not a programming language, but rather a specification for the calculable parts of the law. A JS program then generates Typeform-like simulators, a Node library, and an interactive online documentation (it is now compulsory for French administrations to explain their algorithms).

I'm not sure YAML is the right long-term design choice, but as we can't afford to write our own language, it has bootstrapped the project successfully.

Here is the main file: https://github.com/betagouv/syso/blob/master/source/règles/b... Don't be afraid of the number of lines, it's just a collection of variables.


Is something typed and better structured (e.g. protobufs) an option?


You can edit YAML with any text editor... how do you edit protobuf files?


There is a text format, but you can serialize YAML into protobufs. The benefit with protobufs is that by defining them you are specifying types and sub-messages etc. so when you assemble disparate pieces you get type checking at various levels.


Interesting. I just hope the resulting software is less buggy than the one for army wages...


Sorry, I should have noted that for the moment it is only a simplified implementation of the latest laws.

Bugs arise when you're forced to integrate rules that change over time.


The YAML 1.1 spec requiring "yes" to be turned into true and "no" into false is one of the most aggravating things about YAML.


I created a library that just ignores that part of the spec and parses everything that isn't defined in a schema as a string:

http://github.com/crdoconnor/strictyaml

It also does away with a few other anti-features (e.g. the object parsing that led to the epic RoR security hole) and there's a long justification for each.

I would like somehow to create a new spec out of this and see it implemented in programming languages other than python.


Also "on" and "off", and variations.

"y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF"

https://yaml.org/type/bool.html
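For illustration, here is a minimal Python sketch of how a YAML 1.1-style resolver could classify a plain (unquoted) scalar using the boolean pattern from that page. The name `resolve_bool` is hypothetical and this is not how any particular YAML library is implemented; real parsers also deal with quoting, explicit tags, and the YAML 1.2 core schema, which keeps only the true/false spellings.

```python
import re

# Boolean pattern from the YAML 1.1 bool type page (stray spaces removed).
BOOL_11 = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
# Lower-cased spellings that mean True; every other match means False.
TRUE_WORDS = {"y", "yes", "true", "on"}


def resolve_bool(plain_scalar: str):
    """Return True/False if an unquoted scalar matches the YAML 1.1 bool
    pattern exactly; otherwise return the string unchanged."""
    if BOOL_11.fullmatch(plain_scalar):
        return plain_scalar.lower() in TRUE_WORDS
    return plain_scalar


print([resolve_bool(s) for s in ["en", "is", "no", "ja", "fr", "On", "Yes"]])
# -> ['en', 'is', False, 'ja', 'fr', True, True]
```

Note that the pattern has to match the whole scalar, so the Norwegian language code only survives as a string if it is quoted.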


"NO" being false is particularly fun:

    - languages:
      - en # english
      - is # icelandic
      - no # norwegian
      - ja # japanese
      - fr # french

    [{"languages": ["en", "is", false, "ja", "fr"]}]


I feel like 99% of YAML's problems (this being the case in point) would be solved by just treating everything as a string and letting the application interpret it whichever way it sees fit.


That's basically the idea behind StrictYaml. Unless you provide a schema, all fields are parsed as string, list or OrderedDict.
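The "everything is a string until a schema says otherwise" idea can be sketched in a few lines of Python. This is not StrictYAML's API: the dict of converter functions and the `apply_schema` and `parse_bool` helpers are hypothetical, and the input is assumed to be an already-parsed tree of plain strings, lists and dicts.

```python
from typing import Any, Callable, Dict


def parse_bool(text: str) -> bool:
    """Accept only the explicit spellings true/false; anything else is an error."""
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"expected true or false, got {text!r}")


def apply_schema(tree: Dict[str, Any],
                 schema: Dict[str, Callable[[str], Any]]) -> Dict[str, Any]:
    """Convert only the keys the schema mentions; leave every other value as written."""
    result = {}
    for key, value in tree.items():
        converter = schema.get(key)
        if converter is None:
            result[key] = value  # unknown keys stay strings (or lists of strings)
        elif isinstance(value, list):
            result[key] = [converter(item) for item in value]
        else:
            result[key] = converter(value)
    return result


# A string-only parse of a config file (hypothetical input).
raw = {"languages": ["en", "is", "no", "ja", "fr"], "debug": "false", "workers": "4"}
typed = apply_schema(raw, {"debug": parse_bool, "workers": int})
print(typed)
```

Here the application, not the parser, decides what "no" means: the language code stays a string, and a value like `debug: yes` fails loudly instead of silently becoming a boolean.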


That's so entirely ridiculous...


Yeah, they probably should have had special syntax for booleans.


The problem is that they have too much special syntax for booleans.


I meant that they should have special syntax that isn't ambiguous (to a human reader). Just treating bare words as either strings or booleans depending on their value is what makes it confusing.


While we're righting the wrongs of badly specified interfaces, can we please kill this stupid obsession with "writing code" to get work done? Code is buggy. It's difficult to write well, you have to write 10,000 tests for it, you have to have a testing framework, complex procedures to move it through pipelines and roll it back and version control it and debug it and maintain it and deploy it. It's annoying.

I don't want to hand-craft a file like a 14th century woodworker carving out letters on blocks. I want a program to just make the config or do the appropriate action for me, based on questions it asks me. I shouldn't have to write code to do that, or treat a file as code, or use a "language" to get work done, whenever it is possible (i.e. most of the time).

The only times I ever want a "language" is when I need to do something complex and iterative, like crafting a search query, or constructing a simple one-line pipeline of pre-defined functions. Aside from that, crafting configuration and performing operations should have a real user interface.


> Code is buggy. It's difficult to write well, you have to write 10,000 tests for it, you have to have a testing framework, complex procedures to move it through pipelines and roll it back and version control it and debug it and maintain it and deploy it. It's annoying.

All of that applies to config, or to any other solution to the problem, because the complexity is (usually) inherent to the business problem you're solving. The difference is that in config, none of that tooling is available. If I put some complex logic in config I still want to version control it and debug it and test it - but I can't. Give me a real programming language where at least I have standard tools available for managing complexity.

> Aside from that, crafting configuration and performing operations should have a real user interface.

Code is the best user interface anyone has ever come up with for specifying complex logic, precisely. Mathematicians use something very similar, not because maths is done on computers but because they have the same need to communicate precisely. Lawyers use a "plain English" language that ends up being more verbose and less readable. Any "visual" format, including every configuration UI I've seen in practice, is even worse.


So while they're obviously a bit dated UI-wise, are CPAN's config script and/or the "make menuconfig" script for compiling a Linux kernel good examples of what you're talking about?

While I haven't run menuconfig since the '90s, I find myself setting up CPAN on just about every server I manage. Although it's a bit dated, like I said above, it's still very easy to use and provides clear instructions on why you should or should not choose certain options based on the environment you are setting up.

Just wondering if these are good examples or not. If not, do you have any other examples of good config programs you can point to?


Really any of them are preferable to nothing at all. Yep, CPAN at least does the bare minimum to get you up and running and saves its configs, you don't have to do a mini research project just to get it working. I think Jenkins is a good model for basic web UIs. I wish tools like Terraform and Consul had a setup UI, but I guess that's how Hashi makes money.


Thanks, I’m not familiar with Jenkins but I’ll check it out.

I have to admit that I used to spend time reading and painstakingly setting each and every option in CPAN, some of the time not even understanding why or what I was setting, but these days I just say yes to “do you want me to try and configure as much as possible automatically?”. It hasn’t failed me yet! :-)

Any tool that can do the heavy lifting by poking the environment a bit like that seemed to fit what you were talking about so that’s why I mentioned it.


I can't tell if this is sarcastic or not, but it doesn't seem to be.

Binary computers are precise in their operation; that's why we use programming languages: to make it easier to create the underlying instructions to perform calculations. You are assuming that human answers to human questions in imprecise human language can even be converted to binary instructions. If it were that easy, general AI would've already been invented and self-driving cars would already exist.


You're taking it too literally/assuming too much. It is super easy. Watch.

  $ awscli s3 ls
  Unable to locate credentials. You can configure credentials by running "aws configure".

That is stupid. Let's fix it.

  #!/bin/bash
  # Load previously saved answers, if any
  [ -r .awssettings ] && . .awssettings
  # Ask only for the values that are still missing
  [ -n "$USERN" ] || read -p "What is your username? " USERN
  if [ -z "$PASS" ] ; then
      read -s -p "Enter your password: " PASS ; echo ""
  fi
  [ -n "$REGION" ] || read -p "What region should I run this in? " REGION
  # Print the command instead of running it, then save everything except the password
  echo awscli --region "$REGION" --user "$USERN" --pass "$PASS" "$@" \
  && echo -en "USERN=$USERN\nREGION=$REGION\n" > .awssettings

  $ ./foo.sh s3 ls
  What is your username? peter
  Enter your password:
  What region should I run this in? us-east-1
  awscli --region us-east-1 --user peter --pass blahblah s3 ls

This is a contrived example, because of course AWS wants obscure tokens and secrets to log in, but the point is that it was trivial to write an interface that helped the user out. The fact that this isn't the default is stupid. (The argument that it fails by default to "help programs out" is dumb, because a simple option like --no-questions, which becomes the default when there is no TTY, is easy to add.)
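The same "ask only when a human is there" behaviour can be sketched in Python. The `--no-questions` idea comes from the comment above; `should_prompt` and `resolve_setting` are hypothetical helpers, not part of the AWS CLI or any other real tool.

```python
import sys
from typing import Callable, Optional


def should_prompt(no_questions: bool, stdin_is_tty: Optional[bool] = None) -> bool:
    """Prompt only when stdin is a terminal and the user has not opted out."""
    if stdin_is_tty is None:
        stdin_is_tty = sys.stdin.isatty()
    return stdin_is_tty and not no_questions


def resolve_setting(name: str, current: Optional[str], interactive: bool,
                    ask: Callable[[str], str] = input) -> str:
    """Use an existing value if there is one; otherwise ask a human,
    or fail fast when running non-interactively (scripts, CI jobs)."""
    if current:
        return current
    if not interactive:
        raise SystemExit(f"error: {name} is not set and there is no terminal to ask on")
    return ask(f"What {name} should I use? ")
```

Scripts and CI jobs keep predictable, non-interactive behaviour, while a person at a terminal is only asked for whatever is still missing.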


And how do I put process controls around that? How do I have confidence that someone I'm trying to help over the phone will see the same screens and get the same questions that I do? What do they do if it asks the wrong question?

Having simple, predictable, consistent behavior from our tools is far better than trying to guess what the user wanted.


You must live in a different universe from me. In my universe we have been successfully addressing those problems for over 60 years. Also, our tools aren't simple, or predictable, or consistent, which is why our universe created this software called "Docker". It helps, but there's no UI for anything (unless you pay a lot of money) so it's still a pain.


> In my universe we have been successfully addressing those problems for over 60 years.

In my universe people have been pushing "visual programming environments" for something close to 60 years, and they're still awful and haven't caught on. Over the last 10-20 years programs have taken a step away from "wizard" style interactions like you describe, as it turns out that a simple, automatable config format is more useful than a step-by-step interaction. API design has started to realise the importance of understandability over magic, e.g. the new generation of 3D graphics APIs are much less "do what I mean" than the previous generation.


I strongly agree that YAML is inappropriate for GitLab CI/CD configuration. It does work for simple use cases and/or in small repositories; however, doing anything slightly more complicated than a simple build/test/deploy pipeline takes an immense effort, with a range of hacks and workarounds.

Even the simplest things, like passing some data (e.g. a link) from one job to another, require quite a lot of overhead (using artifacts, storing it somewhere in a KV store, or something even worse).

Moreover, .gitlab-ci.yml is the one and only entry point for the CI configuration, so everything goes into it. Yes, it does have a concept of includes, but even that is quite limited and not sufficient for any reasonable workflow.


GitLab PM for Verify (CI) here.

I agree that a lot of complex pipelines can be tough to express in YAML. One of our key focuses this year is to make advanced use cases for GitLab CI/CD more lovable. Specifically, you can see the overall direction here: https://about.gitlab.com/direction/verify/. Some specific issues I'd love feedback from the community on that I'm thinking about are:

* Directed acyclic graphs (DAG) for pipelines: https://gitlab.com/gitlab-org/gitlab-ce/issues/47063

* Make it possible to use any language to generate `.gitlab-ci.yml` and pipelines: https://gitlab.com/gitlab-org/gitlab-ce/issues/45828

* First class support for multiple pipelines: https://gitlab.com/gitlab-org/gitlab-ce/issues/28592

* Self-modifying pipelines: https://gitlab.com/gitlab-org/gitlab-ce/issues/44199

(edited for readability)


DAGs are good. Let me define a DAG in imperative code (any language I want) using any techniques I want; then I hand the DAG off to you and you execute it. Anything else is introducing complexity for complexity's sake.
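One way to read that proposal, as a minimal Python sketch: the pipeline is built as a plain dependency graph in ordinary code, and a separate executor runs it in dependency order. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which also rejects cycles; the job names and the `run_job` callback are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def define_pipeline(services: List[str]) -> Dict[str, Set[str]]:
    """Build the DAG imperatively: each job maps to the set of jobs it depends on."""
    dag: Dict[str, Set[str]] = {"build": set()}
    for svc in services:
        dag[f"test-{svc}"] = {"build"}
    dag["deploy"] = {f"test-{svc}" for svc in services}
    return dag


def execute(dag: Dict[str, Set[str]], run_job: Callable[[str], None]) -> List[str]:
    """Run every job after all of its dependencies and return the order used.
    static_order() raises graphlib.CycleError if the graph has a cycle."""
    order = list(TopologicalSorter(dag).static_order())
    for job in order:
        run_job(job)
    return order


order = execute(define_pipeline(["api", "web"]), run_job=lambda job: print("running", job))
```

The order of the two test jobs relative to each other is unspecified; only the dependency constraints are guaranteed.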


Related: If you haven’t seen TOML (“Tom's Obvious, Minimal Language”), have a look. It seems pretty nice IMHO.

https://github.com/toml-lang/toml


Nested schemas are pretty disgusting in TOML. Granted, there's an argument against having deeply nested schemas in the first place.


Article fails to live up to its title and offer any defense of YAML, specifically.

Yes, a lot of YAML abuse consists of trying to pretend code is config, and that problem exists for any config format. However, YAML specifically is also a uniquely bad config format, even when used for plain config; it has too many obscure corner cases (e.g. the abbreviation for Norway turning into false, port mappings turning into base-60 numbers, i.e. times, when the container port is below 60...)
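The port-mapping trap comes from YAML 1.1's base-60 ("sexagesimal") integers. The Python sketch below reproduces just that one resolution rule, using the base-60 pattern from the YAML 1.1 int type; `resolve_plain` is a hypothetical name, this is not a full YAML resolver, and the YAML 1.2 core schema no longer has the rule.

```python
import re

# YAML 1.1 base-60 integer pattern (e.g. 190:20:30), from the YAML 1.1 int type.
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")


def resolve_plain(scalar: str):
    """Return an int if an unquoted scalar matches the base-60 rule, else the string."""
    if not SEXAGESIMAL.fullmatch(scalar):
        return scalar
    sign = -1 if scalar.startswith("-") else 1
    value = 0
    for part in scalar.lstrip("+-").replace("_", "").split(":"):
        value = value * 60 + int(part)  # each colon-separated part is one base-60 digit
    return sign * value


print(resolve_plain("22:22"), resolve_plain("8080:80"))  # -> 1342 8080:80
```

"8080:80" escapes only because 80 is not a valid base-60 digit; quoting the value ("22:22") keeps any mapping a string, which is why quoting port mappings is the usual advice.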


> Note the script block containing a list of shell scripts. Does this look like data?

Code is Data! Lisp-it or Quit!


One of these days, I want to come up with a non-Turing-complete, s-expression-based config format (basically something like a Lispy JSON) and post it here just to watch Lispers rage at not being able to write DSLs in it.


But...they will be able to write DSLs in it. Just create functions with the same name as the head atom of the top level lists, then run the config file as a program. Or write macros matching the syntax of your format to generate code.
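To make the "functions named after the head atom" trick concrete, here is a small Python sketch: a toy reader turns the config text into nested lists, and an evaluator looks up a handler by the first atom of each list. The reader ignores strings with spaces, comments and quoting, and the handler names (`server`, `host`, `port`) are hypothetical.

```python
from typing import Any, Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Split on whitespace after padding parentheses so they become tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens: List[str]) -> Any:
    """Turn a token stream into nested lists (toy reader: no strings or comments)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    return int(token) if token.isdigit() else token


def evaluate(expr: Any, handlers: Dict[str, Callable[..., Any]]) -> Any:
    """Evaluate arguments first, then call the handler named by the head atom."""
    if not isinstance(expr, list):
        return expr
    head, *args = expr
    return handlers[head](*(evaluate(arg, handlers) for arg in args))


handlers = {
    "server": lambda *settings: dict(settings),
    "host": lambda name: ("host", name),
    "port": lambda number: ("port", number),
}
config = evaluate(read(tokenize("(server (host example.org) (port 8080))")), handlers)
print(config)  # -> {'host': 'example.org', 'port': 8080}
```

Swapping in a different handlers table, for example one that only validates, reuses the same parsed structure for a different job.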

I mean, maybe there is something about your format that makes these specific approaches inconvenient. But once you give a Lisper a bunch of S-expressions, you've opened the door for them to unleash the full power of Lisp upon it.

So I think your plan to frustrate Lispers by making an S-expression based format popular, will just give them even more power.


I don't think many non-Lispers realize that you can pass the same structure to different parts of your program for different manipulations or executions. Truly, the code-as-data mantra is just too removed from most languages. It is too easy to think of eval as something that can just take in a string, at best, completely missing that you can literally change how your code is evaluated.


Kicad uses non-turing-complete S-expressions as a file format: https://github.com/KiCad/Connectors.pretty/blob/master/bnc-c...


Rational Rose also did that (does? Is that still around?)

The IMAP4 mail protocol uses S-expressions also.



I'm a big fan of using Lua for this. You can put all the configuration you want in a fairly readable format and then, if you need to, write custom Lua code to process the hard parts. The receiving end can decide whether it wants to run the custom code or not, for security reasons.

I saw the comment that Helm is going to include Lua, but I'm not seeing the advantage that Helm brings over plain Lua.


You can actually use Python as a configuration file pretty well, if you need more complicated logic and macros. Otherwise, sticking to a toml or ini file is my preference.


Are you gonna embed a Python interpreter into your C# programs though? Better to use something like dhall that's at least open to being loaded from multiple languages.


I like your point, it goes against the lambda-ultimate post linked separately in the thread. If you accept your point (cross-lang-loading) and accept the LTU argument about the benefits of internal DSLs (you're not slowly reinventing a useful turing-complete language) then you would maybe want to use a very simple, easily interpreted language for your DSL and embed an interpreter for that. (I (can't (think (of any though))))


> then you would maybe want to use a very simple, easily interpreted language for your DSL

Right, which is essentially what Dhall is. There aren't any established "real" programming languages that are restricted enough (non-Turing complete, etc.) to use for this kind of case.

> (I (can't (think (of any though))))

S-expressions are never going to succeed; there are too many subtly incompatible variations going around already, and their fans are unwilling to compromise and agree on a common standard.


This is what IronPython was invented for.


Hahahaha. No. I don't care to ever be forced to write a C# program. I write my applications in python, running in trusted environments. If I need optimized low-level code (which wouldn't be C# either), I access it as a library from within the python glue. Dynamically importing config files works extremely well.
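For a Python application, "dynamically importing config files" can be as small as the sketch below, which uses the standard library's `runpy.run_path` to execute a config file and keep only its UPPER_CASE names (a Django-like convention, assumed here). As the comment says, this only makes sense in trusted environments, since the config file runs arbitrary code.

```python
import runpy
from typing import Any, Dict


def load_python_config(path: str) -> Dict[str, Any]:
    """Execute a trusted config file and return its UPPER_CASE module-level names.
    Imports, helpers and dunder names such as __file__ are filtered out."""
    namespace = runpy.run_path(path)
    return {name: value for name, value in namespace.items() if name.isupper()}
```

Usage would look like `settings = load_python_config("settings.py")` (file name hypothetical).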


INI is the best config format IMO, since you can't do anything more complicated than assigning a value to a key in a namespace. The Dist::Zilla config file was a real eye-opener as to what you can achieve with this kind of simplicity.


You need to specify which INI config file format, unfortunately. There are many different ways to handle leading & trailing whitespace, quoting, multiline strings etc. and I think all of them have been implemented at some point.
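As one concrete data point, here is how Python's standard `configparser` (just one INI dialect among many) treats those cases: it strips surrounding whitespace, keeps quotes as literal characters, joins indented continuation lines with newlines, and lower-cases keys by default. Other INI parsers may well differ on every one of these.

```python
import configparser

INI_TEXT = """
[server]
Host =   example.com
name = "quoted value"
paths = /usr/lib
    /opt/lib
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
server = parser["server"]

print(repr(server["host"]))   # -> 'example.com'  (whitespace stripped, key lower-cased)
print(repr(server["name"]))   # -> '"quoted value"'  (quotes are kept)
print(repr(server["paths"]))  # -> '/usr/lib\n/opt/lib'  (continuation line joined)
```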


> the Dist::Zilla config file


Agreed, hopefully ini files have lost the Windows stigma enough to be used everywhere.


If you haven't seen EDN (Extensible Data Notation), it might be worth a look. It was developed by Rich Hickey as a JSON alternative, for use in Clojure. I've found it to be very useful, and not just in Clojure. For example there is an edn crate in Rust.

* https://github.com/edn-format/edn
* https://learnxinyminutes.com/docs/edn/
* https://www.compoundtheory.com/clojure-edn-walkthrough/


I’m glad I read the internal vs external DSL discussion two links away from the article:

http://lambda-the-ultimate.org/node/4560


Alternatively, if you're programming and you want a structured data format, use S-expressions! They are basically tailor-made for exactly this problem.

    (gitlab:assets:compile
     #%dedicate-no-docs-pull-cache-job
     (image "dev.gitlab.org:5005/gitlab/gitlab-build-images:ruby-2.5.3-git-2.18-chrome-71.0-node-8.x-yarn-1.12-graphicsmagick-1.3.29-docker-18.06.1")
     (dependencies
      setup-test-env)
     (services
      (docker stable-dind))
     (variables
      (NODE_ENV production)
      (RAILS_ENV production)
      (SETUP_DB false)
      (SKIP_STORAGE_VALIDATION true)
      (WEBPACK_REPORT true)
      ;; we override the max_old_space_size to prevent OOM errors
      (NODE_OPTIONS "--max_old_space_size=3584")
      (DOCKER_DRIVER overlay2)
      (DOCKER_HOST "tcp://docker:2375"))
     (script
      "node --version"
      "yarn install --frozen-lockfile --production --cache-folder .yarn-cache"
      "free -m"
      "bundle exec rake gitlab:assets:compile"
      "time scripts/build_assets_image"
      "scripts/clean-old-cached-assets")
     (artifacts
      (webpack-report
       (expire-in 31d)
       (paths
        "webpack-report"
        "public/assets/"))))

And that’s just a first cut; it could be made much nicer.


> And that’s just a first cut; it could be made much nicer.

This is exactly the problem with S-expressions. No-one is willing to define what "good enough" looks like and create a fixed, reusable standard for how you represent these things. Instead everyone hand-rolls their own, subtly incompatible variant.


Whatever is the final evaluator gets to define what is required. Beyond that, give your users the power to do it their way. Or they will find a way to get it themselves.


> Whatever is the final evaluator gets to define what is required.

Which is completely useless, because it means everyone will do it differently.

> Beyond that, give your users the power to do it their way. Or they will find a way to get it themselves.

Users put value on having a standard set of scaffolding. That's why these standardised config formats have succeeded.


Have you seen the plethora of templating engines used to generate the yaml configs out there? I am surrounded by folks that think Jinja to create yaml is a normal idea. Good, even.

Then there is the slow creep of Turing-completeness into config for the sake of dynamic config. It starts as a simple condition flag. Then you add if/then logic. It is an amusing trend.

So, yes. You can get a lot of variety in how configs look. But at the end of the day, they should all work, usually in much more explainable terms. Just look at most people's emacs config. There are good options to make those readable today, and it is still not that hard to see how most people's have worked for a long time.


Are those options agreed on and used consistently though? Other people's emacs configs are even more notoriously unreadable than Jinja-templated yaml, IME.


I've found emacs configs are actually not that bad, all told. There can be a lot there, and some are certainly more organized than others, but just figuring out what is happening is usually not hard.

Reading people's CloudFormation and related templates? Especially if they are done using jinja or some other yaml generation trick? Near impossible.

To directly answer your question, though. No, it does not appear that anyone agrees on how to generate config for things. Which is why every team I join seems to have invented a new way for doing it, all in the name of "maintainability." I often get dirty looks for suggesting that people not build automation on top of the config before they have proven it is truly needed with experience. (That is, if this is your first config to create, do it by hand for the reviewability, if for no other reason.)


Okay, you've re-serialized the data structure in a slightly different format. Is there a step 2 I'm not seeing?


OK, I totally get the argument that YAML is for data and not programming. But I'm not sure the line between them is always clear.

For example: I used to use YAML to define field mappings between external data and internal models in a Rails application.

like:

  - src: fieldA
    dest: field_a
    filter: name_of_some_filter_function

Is this programming? Data? Really it's configuration for an import library, but I find the line blurry.
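One way to keep such a mapping on the "data" side of the line is to let the YAML refer to filters only by name and have the code own a fixed registry of them. A minimal Python sketch, assuming the YAML has already been loaded into a list of dicts; the filter names and the `FILTERS` registry are hypothetical.

```python
from typing import Any, Callable, Dict, List

# The only code that can run is whatever the application registers here.
FILTERS: Dict[str, Callable[[Any], Any]] = {
    "strip": lambda value: value.strip(),
    "upcase": lambda value: value.upper(),
}


def apply_mappings(record: Dict[str, Any], mappings: List[Dict[str, str]]) -> Dict[str, Any]:
    """Copy each src field to its dest field, running the named filter if one is given."""
    result = {}
    for mapping in mappings:
        value = record[mapping["src"]]
        filter_name = mapping.get("filter")
        if filter_name is not None:
            if filter_name not in FILTERS:
                raise KeyError(f"unknown filter {filter_name!r}")
            value = FILTERS[filter_name](value)
        result[mapping["dest"]] = value
    return result


# Equivalent of a mapping like the YAML above after loading (hypothetical filter name).
mappings = [{"src": "fieldA", "dest": "field_a", "filter": "strip"}]
print(apply_mappings({"fieldA": "  hello "}, mappings))  # -> {'field_a': 'hello'}
```

Because the file can only name filters, never define them, it stays declarative, and an unknown name is rejected instead of being executed.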


IMO that's data, unless the filter is an executable script. For me, the programming part in YAML is any string (single-line, array or multiline) that will be compiled, executed or eval-ed.

In your case, it can be defended as documentation for data mapping, but executable scripts are rarely so.


I once built an interpreter that parsed YAML files as s-expressions, just to watch them scream. I called it YAMlisp, because it's better than a potato.


At my work we've started rolling out Pulumi as an alternative to YAML hell. The advantages are numerous. It's an integrated programming environment for devops, and engineers can split config and data or unify them as relevant. Sometimes a mix is good!

The TypeScript SDK is extremely thorough and type driven, enabling non-devops engineers to catch more errors before it's run, and it extensively uses asynchrony to build up a graph of resources needed, even if those are across platforms. e.g.: We create a project in Google Cloud, then create a service account, then we create a GKE cluster, then a namespace in the cluster when that's available, then we create roles, ... all of this falls out of simply using their SDK, the resolution of the DAG and awaiting of results is done automatically with some nifty types (almost all inputs can take a promise-like value instead of a POD data type.)
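The "dependency graph falls out of awaiting" idea can be illustrated without Pulumi. In this Python `asyncio` sketch each hypothetical resource is a task that awaits its input resources before "creating" itself, so creation order follows the data dependencies automatically. It is only an analogy and does not use or imitate Pulumi's actual SDK.

```python
import asyncio
from typing import List


async def create(kind: str, log: List[str], *inputs) -> str:
    """Wait for every input resource (tasks), then 'create' this one and return an id."""
    await asyncio.gather(*inputs)  # dependencies are just awaited values
    await asyncio.sleep(0)  # stand-in for a slow cloud API call
    log.append(kind)
    return f"{kind}-id"


async def main() -> List[str]:
    log: List[str] = []
    project = asyncio.create_task(create("project", log))
    account = asyncio.create_task(create("service-account", log, project))
    cluster = asyncio.create_task(create("cluster", log, project))
    namespace = asyncio.create_task(create("namespace", log, cluster))
    await asyncio.gather(account, namespace)
    return log


print(asyncio.run(main()))
```

Only the dependency order is guaranteed: the project always comes first and the namespace always after the cluster, while independent resources may be created in either order.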


The configuration method for the Gunicorn server (http://docs.gunicorn.org/en/stable/configure.html#configurat...) would be a great example to follow. Configuration files are just Python modules. You can specify the number of workers and set log file locations with the following:

  from multiprocessing import cpu_count

  def max_workers():
      # Gunicorn's suggested starting point: (2 x CPU cores) + 1
      return cpu_count() * 2 + 1

  workers = max_workers()
  errorlog = "/var/www/lcfs/logs/error_logs.log"
  accesslog = "/var/www/lcfs/logs/access_logs.log"


How would one integrate that with a (Rust|Go|C#|other-compiled-language) application on Windows? Yeah, you can do stuff like that in a scripted language. That doesn't make it a good option, even in the majority of cases.


You should see how people configure their emacs setups. :D


use-package is a nice declarative way to configure packages in .emacs


And it is just a standard macro. Nothing built into emacs to support it.

My point being that just having some code to configure your thing is not exactly novel. Just not commonly done. Outside of lisp communities.


Gunicorn probably got this idea from Django, which probably got the idea from Lisp (see also, emacs). :-)


Slightly related, I just made a Webpack plugin that lets you use Markdown as configuration data: https://www.npmjs.com/package/mdconf-loader

I've been playing with ways of entering test data for years. At Triggerz, where I have worked, we use Excel heavily. That works really well, but I have to open Excel or LibreOffice Calc to edit the files, which feels really slow and annoying compared to all the other files that I can keep in my editor. There is a read-only plugin for VSCode but not yet one that lets me edit.

Markdown tables are quite easy to write when the editor supports them (i.e. auto-formatting), and work pretty well for me.
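For simple pipe tables, turning Markdown into rows of test data takes little code. The Python sketch below is a deliberately naive parser (no escaped pipes, no inline formatting, the alignment row is simply skipped) and is not how `mdconf-loader` works internally; the column names are made up.

```python
from typing import Dict, List


def parse_markdown_table(text: str) -> List[Dict[str, str]]:
    """Parse a simple pipe table into one dict per row, keyed by the header cells."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]

    def cells(line: str) -> List[str]:
        # Drop the outer pipes, split on the inner ones, trim the padding.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        rows.append(dict(zip(header, cells(line))))
    return rows


TABLE = """
| user  | role  | active |
|-------|-------|--------|
| alice | admin | yes    |
| bob   | guest | no     |
"""
print(parse_markdown_table(TABLE))
```

Every cell comes back as a string, so the test code decides how to interpret values like "yes" and "no".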


I can't help but wonder how much better YAML would've been with minimal delimiters instead of significant whitespace. The big feature (X/YA)ML has over JSON is the distinction between properties and children, which I think is pretty cool for many use cases. Lack of comments is also a big limitation of JSON, and of course there's the huge legacy ecosystem behind XML. But I've never seen significant whitespace be a good thing; this alone makes YAML decidedly not "human readable", speaking as someone who doesn't work with it very often.


I usually run into YAML when I set up game servers for some indie games (I'm looking at you, Empyrion), and I find it extremely painful to handle.

When you have several configuration files where you have to change or tweak data, I always end up pulling my hair out over why stuff isn't working, only to find an extra space lying somewhere in the file.

Please don't use YAML for anything that has to be edited by a human.

INI, XML and JSON are all formats that are more forgiving in that regard. (Oh, and always trim anything you parse. The more you do to guard against accidental input, the fewer frustrated admins you'll have ;)


It's weird how languages like YAML, XML and JSON are very much designed for communication between machines, but are still the default choice for human input. Actual programming languages - designed for use by people - are rarely considered for high level configuration.

I have actually seen something similar to this happen:

1. We just need a few configuration options. Let's add an XML configuration file.

2. Keeping all the configuration files in sync for different environments is a lot of work and really error-prone. Let's generate all the configuration files. What about an XML meta-configuration file?

3. Some things are different between environments. We need conditionals in our XML meta-configuration language.

4. There is a lot of repetitive configuration. It would be more maintainable if we had loops, variables, integers, string interpolation, functions, ... in our XML meta-configuration language.

Great, now we invented our own awful programming language that lacks any tooling, documentation or libraries and isn't compatible with anything else.


>It's weird how languages like YAML, XML and JSON are very much designed for communication between machines, but are still the default choice for human input. Actual programming languages - designed for use by people - are rarely considered for high level configuration.

What are you talking about? YAML, XML (particularly HTML) and JSON were always intended to be written by and understood by human beings, no less so than any "programming language." All of these were designed to be "used by people" and machines.

>Actual programming languages - designed for use by people - are rarely considered for high level configuration.

And the rest of your comment illustrates why. What you have when you use a programming language for "high level configuration" isn't configuration. Configuration should describe constant state, not operate on or transform mutable state. What you have, then, is just more application layer, on top of your application.

Which you don't have with JSON. Or INI. And you do still kind of have with YAML, and definitely can with XML, but that's why a lot of people don't like YAML or XML when it gets too complex.

What it looks like your example shows is dumping unnecessary complexity into configuration in order to maintain "simplicity" in the application. You would have the exact same problem using a high level programming language, it would just be potentially infinitely worse with the explosion of complexity and feature creep that comes with it.


Many comments here make the point that configuration languages often pretend to be code and they do so badly. The conclusion is then that we should simply use code instead.

I'd like to propose that there is no fundamental distinction between code and configuration languages. The distinction is actually between total and non-total (Turing-complete) languages. Configuration languages are just programming languages of varying complexity. Some of them are total (i.e. non-Turing-complete) and that's a good thing.


I wish there were a .safe_yaml or something that would just use YAML.safe_load; it would get rid of most of the YAML issues for me.


I've recently been working on a Rails app and have been forced to use nano for editing on a remote server, and it's only now that I really see the niceness of YAML. I've always liked YAML with Ruby, but I think a lot of the great features are subtle and hard to appreciate. Inheritance! What a great idea for a configuration language. Greatly missed in JSON.


Anything you can express in YAML, you can express in JSON. I'm not saying I would choose JSON over YAML for configuration... but it's definitely not lacking in flexibility for expressing structure.


On the flip-side, if you were to embrace using YAML for programming (i.e. by creating a programming "language" by treating certain YAML constructs specially), you'd likely end up with a homoiconic Lispy language (albeit a very weird one), just using YAML instead of S-expressions (maybe call 'em Y-expressions?).


While there is no other format with a clear advantage over YAML, I wonder why nobody uses JavaScript modules as programmable configuration.

They can be split into multiple files and support comments, variables, functions or scripts, and with an additional extension they can import packages. The downside may be that the JSON-like syntax is noticeably bulkier than YAML.


I don't even want to think about the support nightmare of releasing an application with configuration beyond YAML or environment variables.


I think JSON is almost always better than YAML, except for very simple cases.


I feel it is always better for programs to accept YAML than JSON, because then you can feed it either YAML or JSON (all valid JSON is also valid YAML).


I can't be the only one that sees the historical irony in Rod Johnson being the one to write up this opinion? (Then again, you could argue that he's speaking from experience.)

Over fifteen years ago I was in programming as configuration hell with Interface21, and later Spring-based enterprise Java stuff...


This article is one of the best arguments I've read in favor of Lisp in a long time.

Code is data is code.


My favourite configuration style is the one used by Django: it's just a Python file. As a result, I can do things like have base.py, devel.py, staging.py and prod.py.

All of them can use any form of inheritance or programmatically derived settings I see fit.

ROS2 is going the same way too.


Why do we even have something like YAML? We already have JSON: same key-value format, but more reliable. I remember my first Docker experience was ruined because of YAML and wrong formatting.


YAML predates JSON as far as I know.


At the bottom of that blog post:

> Djikstra

Dijkstra. Please!


YAML is so badly designed, it shouldn't be used for data either.


JSON for everything. Done.


+1000 fake internet points to this article for keeping it short with clear examples.



