Hacker News
Always start with simple solution (buklijas.info)
67 points by sasa_buklijas on Jan 14, 2018 | 59 comments



JSON is not a human-friendly config format; it has harsh rules for quoting, brackets and commas, and most importantly it has no intuitive facility for comments. YAML should be preferred to JSON for config. However, ConfigParser is even better: a list of IPs need only be space-separated (e.g. hosts = config.get(section, 'hosts').split(), quite simple), or you could even make the value of the parameter a JSON list if you really like to type brackets and quotes. The "simplest" approach is in fact the most idiomatic approach that everyone recognizes, and if you're doing config files in Python, that means ConfigParser.
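A minimal sketch of that ConfigParser approach, assuming a hypothetical section named `devices`: `str.split()` with no arguments splits on any run of whitespace, so the same call also works when the addresses are written one per line as an indented multi-line value.

```python
import configparser

# Hypothetical config contents; the section name "devices" is an assumption.
CONFIG_TEXT = """
[devices]
# Full-line comments are allowed in INI files.
hosts = 10.0.0.1 10.0.0.2 10.0.0.3
"""


def load_hosts(text: str) -> list[str]:
    """Parse a whitespace-separated list of hosts from one INI option."""
    config = configparser.ConfigParser()
    config.read_string(text)
    # split() with no arguments splits on spaces, tabs and newlines alike.
    return config.get("devices", "hosts").split()
```

For example, `load_hosts(CONFIG_TEXT)` gives the three addresses as strings, and a value written as `hosts =` followed by indented lines (the style familiar from `setup.cfg`) parses the same way.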

Edit: it seems they went with a newly invented language called json-cfg that looks like a hybrid of JSON and C-style comments. That's fine, but using an invented config format that nobody knows isn't the "simplest" approach at all. It's the most complicated solution. When people use invented idioms in their projects for things like config and logging that have widely accepted idiomatic solutions in the Python standard library, that's an immediate red flag.


I personally find it very frustrating to hand-write YAML. I always get some indentation or substructure rule wrong. JSON isn't as human-friendly to write, but it's easier to avoid mistakes with it.

Personally, I think the INI format is probably the ideal config format: simple to write, one-dimensional, and it has comments.


Ansible is a whole programming platform that is based on YAML. We write thousands of lines in it at Red Hat. Ansible tells you immediately where you have syntax errors and it is much easier to read and write than JSON.

But for config, it is usually overkill. If ConfigParser isn't working for your config job you should try to see what it is you're doing that requires such complex config (which again is all about doing the simplest approach possible).


How does Ansible handle defining a subdirectory of a previously-defined directory? Is it more complicated than "${BASEDIR}/subdir"?


I like YAML, but it needs shell-style $ substitution tokens to allow constructing values from previously-defined values (such as when defining a PATH variable, or subdirectory).

To that end, I'm going down the path of writing my own extension to YAML. Can you recommend one that already exists that accomplishes this goal? I don't see an alternative, because I can't find something standard that actually meets the requirements.

Shell scripting languages are platform-specific, and not hierarchically structured, so that's what I'm trying to replace to begin with.

Ansible requires a central server and full commitment to immutable deployments, so it's too heavy for a package manager that can support piecemeal adoption or that could be used by researchers doing ad-hoc development. It doesn't support Windows anyway (Windows Subsystem for Linux doesn't count).

Conda is nice, but even though it uses YAML for defining dependencies, it nonetheless falls back to shell scripts for environment variables, and doesn't offer features for environment modularization.

EDIT: Hierarchical structure is needed to be able to both express and manipulate configs for software that expects input as XML, so ConfigParser isn't good enough. Setting a value to a blob of JSON or XML denies the ability to operate on that blob to do things like merge subtrees. Instead, I'm using ruamel.yaml as my starting point, and adding post-processing after it's converted to nested dict/lists.
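A minimal sketch of that post-processing step, assuming the loader (ruamel.yaml or any other) has already produced plain nested dicts/lists. The `${a.b}` dotted-path syntax is an assumption of this sketch, not a YAML or ruamel.yaml feature. The pass is repeated until nothing changes, so chained definitions such as SUBDIR2 built from SUBDIR1 built from BASEDIR resolve, and any token left over at the end (e.g. a circular reference) is reported.

```python
import re
from typing import Any

# ${a.b.c} tokens; the dotted-path lookup syntax is an assumption of this sketch.
TOKEN = re.compile(r"\$\{([^}]+)\}")


def lookup(root: dict, dotted: str) -> Any:
    """Follow a dotted path such as 'paths.BASEDIR' through nested dicts."""
    node: Any = root
    for part in dotted.split("."):
        node = node[part]  # KeyError if the referenced key is undefined
    return node


def has_token(node: Any) -> bool:
    """True if any string anywhere in the tree still contains ${...}."""
    if isinstance(node, dict):
        return any(has_token(v) for v in node.values())
    if isinstance(node, list):
        return any(has_token(v) for v in node)
    return isinstance(node, str) and TOKEN.search(node) is not None


def expand(root: dict, max_passes: int = 10) -> dict:
    """Substitute ${...} tokens in string values, repeating so chains resolve."""

    def walk(node: Any) -> Any:
        # Builds a new tree; lookups use the tree from the previous pass.
        if isinstance(node, dict):
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        if isinstance(node, str):
            return TOKEN.sub(lambda m: str(lookup(root, m.group(1))), node)
        return node

    for _ in range(max_passes):
        new_root = walk(root)
        if new_root == root:
            break  # nothing changed: fixed point reached
        root = new_root
    else:
        raise ValueError("substitution did not settle within max_passes")
    if has_token(root):
        raise ValueError("unresolved or circular ${...} reference")
    return root
```

Because it only touches plain dicts, lists and strings, the same function works regardless of which YAML library did the parsing; a literal `${` that should not be substituted would need an escape convention this sketch does not provide.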


> Ansible requires a central server and full commitment to immutable deployments

I don't know what that means. Ansible is like a much more structured form of shell scripting: you can run Ansible playbooks from anywhere and make them do anything. I don't know what an "immutable deployment" is.

Also, Ansible has nothing to do with "config"; there, YAML is used as a declarative scripting language rather than as a config file format.


Really, I just want to know how ansible provides for token substitution, so I could reuse a value defined elsewhere, so I could define all the subdirectories in a playbook relative to a single base directory.

YAML doesn't offer a syntax for writing "${BASEDIR}/subdir". There's gotta be something in Ansible that makes this possible, but I can't find it.


That "thing" that you're referring to is handled by Ansible, and not by YAML. It uses the Jinja2 templating engine. And I would assume Ansible uses some sort of scoping order/rules in order to provide values for the template generation.

E.g. look here in the Ansible docs:

http://docs.ansible.com/ansible/latest/playbooks_loops.html#...

It provides a construct that allows you to loop over a sub-list of YAML items, and then lets you use the looped variable in a string substitution via the templating language. That's the "{{ item }}" that you see there.

Edit. Typo


Thank you! This is what I've been looking for, but searches for token, substitution, and "defining subdirectories" have not led to this. Calling it a loop is confusing.

It seems the values have to be defined in some other file, so I couldn't chain them, though.

Suppose I want to write something like...

  paths:
     BASEDIR: /somedir
     SUBDIR1: {{ paths:BASEDIR }}/subdir1
     SUBDIR2: {{ paths:SUBDIR1 }}/subdir2
Doesn't seem like that would work. I would end up putting all my definitions into a separate file.


In what way does Ansible require a central server? You can ask Ansible to run a bunch of playbooks from the shell, without anything else involved.


This might just be my own misunderstanding.


True about the comments for JSON. But you could use "_comment": "Some comment" or a similar convention, though you'd have to handle it somehow when instantiating objects.
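One way to handle that convention with only the standard library is a sketch like the following: `json.loads` calls `object_hook` for every decoded object, nested ones included, so the `"_comment"` keys can be dropped before the rest of the program sees them. The exact key name is just the convention from the comment above.

```python
import json
from typing import Any


def strip_comments(obj: dict) -> dict:
    """object_hook that drops keys used as comments (the "_comment" convention)."""
    return {k: v for k, v in obj.items() if k != "_comment"}


def load_config(text: str) -> Any:
    # object_hook runs bottom-up on every JSON object, so nested comments go too.
    return json.loads(text, object_hook=strip_comments)
```

JSON objects cannot repeat a key meaningfully (the last one wins in Python's parser), so several comments in the same object would need distinct keys such as `"_comment_2"` and a matching change to the filter.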

I do like Python-style docstring comments that are available in the object graph.

That would allow checking (live) config comments at run time:

config.server => "10.x.x"

config.server.comment => "This is on a private ip network because..."


For example, here I needed a configuration file that will have a list of IP addresses, which I will iterate in for loop.

This was the smallest requirement that I needed for my problem.

My instinctive solution would be even simpler: a text file with one IP per line, similar to a HOSTS file.
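A minimal sketch of that one-IP-per-line reader, assuming `#` starts a comment as it does in a HOSTS file (the comment support is an assumption, not something the article specified):

```python
from typing import Iterable


def parse_hosts(lines: Iterable[str]) -> list[str]:
    """One host per line; blank lines and '#' comments are skipped, as in a HOSTS file."""
    hosts = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop trailing comment and whitespace
        if entry:
            hosts.append(entry)
    return hosts
```

Since an open file iterates line by line, usage is just `with open("ips.txt") as f: hosts = parse_hosts(f)` (the filename is hypothetical).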


Maybe this is an artifact of a generational divide, or with less generalization, a function of familiarity with tooling?

For their most basic requirement, a line-per-item format will suffice, and provides superior usability. You can edit it by hand, you can use all manner of widespread *nix-y userspace programs to manipulate the list or generate it dynamically, and it's dead simple to parse.

But if you've spent less time in that space, a canned 'configuration management' format sounds reasonably attractive, because it has well-defined semantics and you can build out your schema as you add more features without having to rip the underlying layer out.

Attractive as that may be, once you bring in a format whose syntax needs balanced markers, your editing usability goes down unless you use specialized tooling that understands that format. This is one of the reasons many find YAML more palatable than JSON for this kind of task, why CSV will never die, and why INI is a surprisingly good configuration format.

TOML [1] took the best of INI and YAML, and is worth looking at as an alternative to JSON and YAML, while for batch ingestion of data, few things beat the versatility of CSV. The one-record-per-line format can be implemented as a minimal one-column form of CSV, so that it can still be extended later if the need arises.

[1] https://github.com/toml-lang/toml
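A sketch of that one-column CSV idea, assuming a header row naming the column `ip` (the header and the later `sleep_seconds` column are hypothetical): `csv.DictReader` looks columns up by name, so adding a column later does not break a reader written for the one-column file.

```python
import csv
import io

# Hypothetical files: the original one-column form and a later extension.
ONE_COLUMN = "ip\n10.0.0.1\n10.0.0.2\n"
EXTENDED = "ip,sleep_seconds\n10.0.0.1,5\n10.0.0.2,10\n"


def read_ips(text: str) -> list[str]:
    """Read the 'ip' column by name; any extra columns are simply ignored."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["ip"].strip() for row in reader]
```

Both `read_ips(ONE_COLUMN)` and `read_ips(EXTENDED)` return the same two addresses, which is the "can still be extended later" property in practice.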


You are correct, that is the simplest way.

What I did not disclose in the article is that I also wanted some way to name the data.

I wanted this because I also knew that, besides the IP addresses, I would need to store a sleep duration in the configuration file.

I just find it clearer if the data also has a name.

Now I use https://github.com/pasztorpisti/json-cfg for storing configuration as JSON. json-cfg allows comments in the JSON file.


If you want to name your data, then you could have had a couple of differently named text files.


I meant data inside the text file.

Anyway, this is more a question of preference :-)


I find plain CSV a bit underrated in these kinds of applications. As long as you have flat data, you can read and edit the file with practically anything without too much trouble.


I have several hundred sites that I need to ssh into and run commands on. I use a csv file with IP addresses, etc. So yeah, a simple text file should have been the developer's first thought if all they needed was IP addresses.


Or how about a comma-separated list of IPs in ConfigParser format, which you then just split?

Just because the format doesn't support lists doesn't mean you need to make it parse [, ] etc on top.


then replace all the line breaks, <, > and single quotes with ","

split it on the double quotes(!?)

have some stupid ip validation like a minimum length and chars used (trim included ofc)

and voilà! Now it parses line break or comma separated as well as json, xml, html. Can throw any crap at it, if the crap has anything that looks like ip addresses wrapped in quotes and or line breaks it will work.

Make the ip validation better for even better results.


The whole point is to not have to do all that extra work, because you make your input file format simple enough.

Just read a line, take the first (or only) whitespace-separated part and feed it to inet_addr() or whatever networking function you use that takes hostnames and uses them. If it's invalid, it will tell you and you can print the offending one. No need to do anything extra.

My experience has taught me that it's fragile and stupid to do any sort of "validation" by reimplementing parts of lower layers. Need IPv6 support, or hostnames instead of IPs? If you hardcoded validation for IPv4 only, then you have to do extra work to either add that duplication, or (even better) get rid of the code completely. If you let the lower layers do the work, the additional functionality is free. (And if the data is invalid to the lower layer, it will complain for you.)
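In Python, a sketch of "let the lower layer validate" could hand the first whitespace-separated field of each line to `socket.getaddrinfo`, which already accepts IPv4, IPv6 and hostnames and raises `socket.gaierror` for anything it cannot use. The function name and the good/bad split are illustrative choices, not from the comment.

```python
import socket
from typing import Iterable


def resolve_targets(lines: Iterable[str], flags: int = 0):
    """Take the first field of each line and let getaddrinfo() judge it.

    Returns (usable_targets, [(rejected_target, error_message), ...]).
    """
    good, bad = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        target = fields[0]  # anything after the first field is ignored
        try:
            socket.getaddrinfo(target, None, flags=flags)
        except socket.gaierror as exc:
            bad.append((target, str(exc)))  # print the offending entry later
        else:
            good.append(target)
    return good, bad
```

With the default `flags=0`, hostnames are resolved through DNS; passing `socket.AI_NUMERICHOST` restricts input to numeric addresses without any network lookup.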


Related, in today's Farnam Street email, he linked to an article examining the psychology behind why humans prefer complex things to the simple[0].

It's especially interesting because most other psychological biases appear to be the result of a mental short-cut. Perhaps we see something complex and mistakenly infer the preconditions were complex, and that somehow makes it more valuable.

An especially interesting quotation in that article, relevant to this forum, is by a sportswriter: "Most geniuses—especially those who lead others—prosper not by deconstructing intricate complexities but by exploiting unrecognized simplicities."

[0] https://www.farnamstreetblog.com/2018/01/complexity-bias/


I wonder if ego has anything to do with it as well. Perhaps some people feel 'complexity' makes them look better, even if the simple solution would be better in every rational way. Then again, it could just be boredom. The more complex solution also usually has a bunch of 'novel' issues to solve, and solving those can be a lot more enjoyable for certain individuals than doing things the reasonable way.


Yeah, this creates some sort of "moat of knowledge" that increases status if you make something easy look difficult. Also could make someone appear to be less replaceable ("we would have to think about something that looks hard to replace person XYZ!").

There's also the danger of overfitting when building the mental model or process.


> At that time, I was accessing only one device (only one IP address).

> But could see that in future (in few months to one year), I will need to do the same set of command on more devices.

IMHO, this statement represents one of the biggest design smells in programming. It's usually a bad idea to write software you may need in the future. Circumstances change. Writing code is expensive and coding in anticipation of what you might need is often a waste of time.

Write code you need now, not code you may need later.

It makes sense to think ahead but if you're spending a lot of time writing code you might need, then you're probably doing something wrong.


> IMHO, this statement represents one of the biggest design smells in programming. It's usually a bad idea to write software you may need in the future. Circumstances change. Writing code is expensive and coding in anticipation of what you might need is often a waste of time.

Completely agree, as you said, "... is often a waste of time."

Unfortunately, I was not so lucky: literally two days after I had finished the first version of the program, I needed to add two more IP addresses.

I do agree that it is often a waste of time.



The first complexity in the ConfigParser example comes from trying to parse something like a python list.

Just accept IPs separated by spaces; users will be thankful.

The other complexity is using the raw loading and converting from bytes.

ConfigParser can give you back strings and read the file for you.

Here is a full example:

    import configparser  # Python 3 name of the old ConfigParser module
    import shlex

    def main():
        config = configparser.ConfigParser(allow_no_value=True)
        config.read('config_ini.ini')
        # shlex.split() splits on whitespace but keeps "quoted items" together
        testing_1 = shlex.split(config.get('other', 'list_queue1'))
        print(testing_1)

    if __name__ == '__main__':
        main()

This is the same as a comment I made on the website, but for the benefit of any python devs here.
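To see why `shlex.split` is used instead of a plain `str.split`, here is a self-contained variant that reads the same `[other]` / `list_queue1` option from an inline string (the values themselves are made up): ConfigParser keeps the quote characters in the value, and only `shlex` interprets them.

```python
import configparser
import shlex

# Hypothetical file contents; only the section/option names come from the example above.
CONFIG_TEXT = """
[other]
list_queue1 = alpha beta "gamma delta"
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
raw = config.get("other", "list_queue1")

# str.split() breaks on every space; shlex.split() honours shell-style quotes.
print(raw.split())       # ['alpha', 'beta', '"gamma', 'delta"']
print(shlex.split(raw))  # ['alpha', 'beta', 'gamma delta']
```

For a list of IP addresses, which never contain spaces, plain `split()` is enough; `shlex` only earns its place when items may contain spaces.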


This is a pretty confused article. The author starts with a list of useful things which he then deems 'too complicated', and ends up using a JSON-based config file. Ok. But the things he listed are not too complex, and the solution is a simple line-based config file of IP addresses:

   import netaddr, sys
   ips_to_connect_to = [ip for line in open(sys.argv[1]) if line.strip() for ip in netaddr.IPGlob(line.strip())]
Those two lines handle points 1, 2 and 3 of his 'nice to haves'. You could easily add 4 and 5 in under 10 lines. You could expand it to read from stdin rather than a fixed file easily enough as well.


Thanks for mentioning http://netaddr.readthedocs.io/en/latest/; I did not know about it.


The built-in ipaddress module is enough for your current use case; netaddr just adds glob handling.
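A stdlib-only sketch of that suggestion. The standard `ipaddress` module has no glob syntax, so CIDR notation stands in for netaddr's globs here (a swapped-in technique, not equivalent to `IPGlob`). Invalid entries raise `ValueError` from the module itself.

```python
import ipaddress
from typing import Iterable


def expand_targets(lines: Iterable[str]) -> list[str]:
    """Accept single addresses or CIDR networks, one per line (stdlib only)."""
    targets = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if "/" in text:
            # hosts() yields usable hosts, excluding network/broadcast addresses.
            targets.extend(str(ip) for ip in ipaddress.ip_network(text).hosts())
        else:
            targets.append(str(ipaddress.ip_address(text)))  # IPv4 or IPv6
    return targets
```

For instance, a file containing `10.0.0.1` and `192.168.1.0/30` expands to `10.0.0.1`, `192.168.1.1` and `192.168.1.2`.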


Command-line args would have been the simplest solution, IMHO. They work for simple cases, and power users will write a bash wrapper anyway.

Hard to be wise without knowing the context, but I avoid config files as much as possible, because:

- Where does the config file live on different machines?

- What if I want to have 2 configurations on one machine?

- What if I need to change the list dynamically?

`for ip in sys.argv[1:]` would give you the most flexibility.
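A minimal sketch of that idea, with one assumption added: when no arguments are given, fall back to reading one IP per line from another source such as standard input, so the list can also be generated dynamically and piped in.

```python
from typing import Iterable, Sequence


def get_targets(args: Sequence[str], fallback_lines: Iterable[str]) -> list[str]:
    """Prefer IPs passed as arguments; otherwise read one per line from fallback_lines."""
    if args:
        return list(args)
    return [line.strip() for line in fallback_lines if line.strip()]
```

In a script this would be called as `get_targets(sys.argv[1:], sys.stdin)`, so both `python script.py 10.0.0.1 10.0.0.2` and `some_generator | python script.py` work (the script name is hypothetical).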


I agree, but I wanted to avoid this:

... writing 34 IP addresses as CLI parameters, which is around 373 characters, is not a nice solution.


A bash script where you call the CLI program once for each IP would be a very simple solution.

    ./script 127.0.0.1 5
    ./script 127.0.0.2 5
    . . .
    ./script 127.0.1.254 10


I was on Windows. But OK, there are batch scripts too.

I had to call the program with all the IP addresses at once; calling it once per IP would just not work.

Your idea is fine, but it was just not acceptable in my use case.


You can write multiple arguments in a nice way:

  ./script \
    192.168.1.1 \
    192.168.1.2 \
    192.168.1.3 \
    ...


valid argument :-), accepted


It's sad that the perfect template for all non-binary data, configuration and otherwise, has been around for quite some time: S-expressions.

Nobody but lispers use them. Why?


How would the example from the post look with s-expressions?


    (test_list (test_1 test_2 test_3))
For the JSON configuration file, I suppose?
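To show how little machinery that format needs, here is a minimal recursive-descent reader for s-expressions like the one above. It is a sketch under simplifying assumptions: atoms are bare words kept as strings, and quoted strings, escapes and numbers are not handled.

```python
from typing import Union

SExpr = Union[str, list["SExpr"]]


def tokenize(text: str) -> list[str]:
    # Pad parentheses with spaces so str.split() separates them from atoms.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list[str]) -> SExpr:
    """Consume tokens from the front of the list and return one expression."""
    token = tokens.pop(0)  # IndexError here means unbalanced input
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token  # atoms stay plain strings in this sketch
```

`parse(tokenize("(test_list (test_1 test_2 test_3))"))` yields `['test_list', ['test_1', 'test_2', 'test_3']]`, i.e. the same nested list a JSON loader would give for the post's example.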


Is that really simpler than writing the list out in a file called config.py like

  ips = ['192.x....',
         '192.y',
         '192.z']
Then in your main

  from config import ips
Or does distributing as an .exe preclude this?

Edit: formatting


You are correct, that is the simplest way.

What I did not disclose in the article is that I also wanted some way to name the data.

I wanted this because I also knew that, besides the IP addresses, I would need to store a sleep duration in the configuration file.

I just find it clearer if the data also has a name.

Now I use https://github.com/pasztorpisti/json-cfg for storing configuration as JSON. json-cfg allows comments in the JSON file.


Could you not do exactly the same but with a python dictionary rather than a list?


I was distributing my Python code as EXE, so use of Python code as configuration was not possible.

Although, I think that Python code as configuration is a good solution if you are running from source and only developers (not an average user who does not know what Notepad is) will edit it.


I'm confused...

You're building a custom configuration file format for people who don't know how to edit configuration files?

And, as they say, "xml is like violence..."


I had to distribute my Python program as an EXE because the Windows PC where the program needed to run did not have Python installed.

That is why using Python as the configuration file format was not possible.

I hope I have explained it well.


YAML: I am not a big fan these days. Perhaps I have OCD about formatting, but with YAML, ordering is “up to the user”. At least INI has a “header/section” structure, so it looks more organized. The “yes Yes True TRUE true => True (Python)” behavior is flexible, but it can be seen as a negative if, like me, you are OCD about it. You can enforce a style guideline in your dev team, but for end users it is probably worth reconsidering your strategy.

The reasons I’d consider JSON for configuration are (1) when the configuration is really simple and short, and (2) when I don’t want an extra dependency. You don’t want to handcraft a larger data structure and context-switch between your terminal and a JSON validator.

I like INI-style configuration files. But ConfigParser’s API is horrible, and everyone seems to like to tweak and invent their own “INI” format.

Instead, for those who really need a good configuration file, I recommend TOML [1].

For data files, either YAML or JSON is fine. But each comes with gotchas. A trailing comma in JSON is invalid (which is probably the #1 “wtf, what’s wrong with my JSON”). With YAML you need to be very careful about “do I want an int or do I want a string.”

[1]: https://github.com/toml-lang/toml
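The trailing-comma gotcha is easy to reproduce with Python's standard `json` module, which follows the JSON grammar strictly. The helper name is made up for this sketch; the point is only that `json.JSONDecodeError` reports the position of the problem.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON; print the decoder's error otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        print(f"invalid JSON: {exc}")  # message includes line and column
        return False
    return True
```

`is_valid_json('["10.0.0.1", "10.0.0.2"]')` succeeds, while the same list with a comma after the last element, or an object like `{"sleep": 5,}`, is rejected.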


Now I use https://github.com/pasztorpisti/json-cfg for storing configuration as JSON. json-cfg allows comments in the JSON file.


How does TOML solve the “do I want an int or do I want a string” part? I only know YAML and this has caught me a few times; I would love to know how TOML does it.


None of them do, sorry if I was being too casual. The reason I named YAML is that in YAML you can just write

    name: bob
where bob is assumed to be a string by the YAML parser. This is bad if you use Ansible because "bob" could be the name of a variable.

    bob: "I am bob"
    name: bob
    # at runtime Ansible sees 'name: "I am bob"'
But both JSON and TOML need the users to be more explicit. So while users still need to be mindful of "1" vs 1, JSON and TOML don't assume as much as YAML does.

Let me show you in Python.

    import toml
    import yaml

    s1 = "name: bob"
   
    yaml.safe_load(s1)
    --> {'name': 'bob'}

    s2 = "name = bob"
    toml.loads(s2)

    Traceback (most recent call last):
     ....
    File ".../python2.7/site-packages/toml.py", line 664, in _load_value
    v = int(v)
    ValueError: invalid literal for int() with base 10: 'bob'

    s3 = "name = bob eve"
    toml.loads(s3)
    Traceback (most recent call last):
    raise TomlDecodeError("This float doesn't have a leading digit")
See how the version without a space and the one with a space yield different exceptions? I'm not sure whether that's an implementation quirk or what the spec says, though. But the point is that TOML doesn't assume "bob" without quotes is a string, which is a good thing.


Actually this is pretty simple if you use ConfigParser properly:

  # test.ini
  [host1]
  ip = 1.2.3.4
  [host2]
  ip = 5.6.7.8
  [host3]
  ip = 9.10.11.12

  # config.py
  import configparser

  c = configparser.ConfigParser()
  c.read("test.ini")
  ips = [ c[host]['ip'] for host in c.sections() ]


I prefer to state it this way: "Avoid speculative complexity".


Or "do the simplest thing which could possibly work":

https://ronjeffries.com/xprog/articles/practices/pracsimples...


I'm really a fan of JSON config files for simple configuration. In a strongly typed language you can parse it into a "Settings" object with one line of code and work with the object.

Creating the Settings class, at least for C#, is just a matter of pasting the JSON into a website (after removing all of the sensitive bits, of course).
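The Python analogue is close to a one-liner with a dataclass. This is a sketch with hypothetical field names (an IP list and a sleep duration, as in the article's use case). Unlike C#, Python does not enforce the declared types at runtime, but unknown keys are rejected.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Settings:
    # Hypothetical fields mirroring the article's IP list and sleep duration.
    ips: list[str] = field(default_factory=list)
    sleep_seconds: float = 1.0


def load_settings(text: str) -> Settings:
    """Map top-level JSON keys onto Settings fields in one call."""
    # An unexpected key raises TypeError, which catches typos in the file;
    # note that dataclasses do NOT check the value types at runtime.
    return Settings(**json.loads(text))
```

For example, `load_settings('{"ips": ["10.0.0.1"], "sleep_seconds": 5}').ips` is `['10.0.0.1']`, and missing keys fall back to the defaults.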


Not directly related to the article, but can someone explain why people from Eastern Europe and Russia tend to drop most of the articles (like "a", "an", "the") when speaking/writing English?


Likely because Russian doesn’t have articles.

> ”There are no definite or indefinite articles (such as the, a, an in English) in the Russian language. The sense of a noun is determined from the context in which it appears.”

https://en.wikipedia.org/wiki/Russian_grammar


Correct, I (author) am from Croatia, and in the Croatian language, there are no definite or indefinite articles.

I have noticed that I do not even see that I drop most of the articles. Only when I use a grammar checker do I notice it: 90% of the errors are missing articles.


English is my second language, and I make the same mistakes with “a”, “an” and “the”. Though I try to stay grammatically correct, I often get corrected by my daughter, who is more exposed to English than I was at her age. I am sure someone can point out a mistake or two in the two lines I just wrote.



