Show HN: Eno – A lightning fast, user-friendly YAML/TOML alternative
102 points by simonrepp on Aug 16, 2018 | 65 comments
eno [1] - A modern plaintext language w/ libraries [2] for JavaScript, Python, Ruby & soon more!

We migrated a big relational research database to a file-based solution - requirements were:

- Super fast and easy editability for users

- Highest performance for parsing/validating >10K documents on every user change.

Our trials with YAML/TOML showed us that we wanted something both faster [3][4] and easier [4], something tailored for file-based content management ... and after months of research & development it's now publicly available (under MIT license) for everyone!

Last but not least I also want to mention eno's document introspection capabilities - with a few lines of code you can build intelligent relational suggestion UIs as shown in [4] below.

[1] https://eno-lang.org/

[2] https://eno-lang.org/libraries/

[3] https://github.com/eno-lang/benchmarks/

[4] https://eno-lang.org/resources/introspection.mp4

PS. Your input for the Roadmap is highly welcome - what do you think should be in the next releases? More languages? (If so, which? Currently in progress: Rust/PHP, Currently planned: Go/Java) Additional IDE/editor support? (Currently supported: Atom/VSCode/Sublime) Or something else entirely? :) Looking forward to your feedback!




After a quick look I have a lot of questions

1) Unlike YAML or JSON this doesn't parse into a simple array structure, but a library dependent object hierarchy?

2) Is the API of the libraries also part of the language spec?

3) Can I assume that document.lookup() will be available as document->lookup() in the PHP library?

4) What about programming language specifics that may differ between languages? Will the PHP objects implement the Iterable interface? Or the ArrayAccess interface? (There are probably similar but slightly different concepts in other languages).

5) There is eno.parse(), but is there some kind of reverse mechanism to create a new Eno document? Like document.addList([...]).addSection('hey', 2) or something.


1) Yes! (You can directly dump it to a language-native structure with the raw() method too; this is not 1:1 YAML/TOML-style generic deserialization though, as there are no fixed types in eno.)
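As a toy sketch of that model (this is not the real enojs/enopy API, just invented names handling flat `key: value` lines and `>` comments):

```python
# Toy sketch of an eno-style "everything is a string until you load it" parser.
# NOT the real eno library; it only handles flat "key: value" lines.
class ToyDocument:
    def __init__(self, fields):
        self._fields = fields  # values stay raw strings; no type guessing

    def string(self, key):
        return self._fields.get(key)

    def raw(self):
        # Dump to a language-native structure, analogous to raw()
        return dict(self._fields)

def parse(text):
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('>'):  # '>' starts a comment in eno
            continue
        key, _, value = line.partition(':')
        fields[key.strip()] = value.strip()
    return ToyDocument(fields)

doc = parse("author: Jane Doe\nemail: jane@eno-lang.org")
assert doc.string('author') == 'Jane Doe'
assert doc.raw() == {'author': 'Jane Doe', 'email': 'jane@eno-lang.org'}
```

The point mirrored here is that values stay untyped strings until the consuming application asks for them, which is what distinguishes this model from generic YAML/TOML deserialization.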

2) Some detail aspects of whitespace parsing around the line continuation syntax will need to be specified per language. The shared official API I am implementing for the different platforms is fully open to improvement and future reinvention though; I'd love to see a completely new take on a library API if one comes up in the future. :)

3) Definitely!

4) I try to keep things as consistent as possible across the platforms, but if there are important language specific paradigms I think these should be taken advantage of! I can't answer details regarding the PHP implementation yet but keep in touch, I'm happy about a dialogue here! (Also I can't be good at everything :)).

5) I want one! Obviously there can't be a stable generic "just dump it already" implementation, but a smart builder-type API is definitely on the list. I even started one for enojs but had to re-prioritize because there was so much else to do for the whole ecosystem. ;)


What I was looking for on the website, and what I think matters more than implementing the parser in another language, is schema support. That is, you should provide something like XSD for XML, JSON Schema for JSON, or TOLS for TOML.

Why? There is a need (see the enumeration above) to declaratively specify what an Eno file should look like. I do not want validation to creep into my code, like you do with `document.string('author', required: true)`. This just scares the hell out of me. Say you want to parse some Eno file with different languages: you end up replicating validation, which means you end up maintaining it, or rather not maintaining it... Apply leverage by moving validation into your parser.
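To make the suggestion concrete, here is a minimal sketch of what moving validation out of application code could look like. The schema shape here is entirely hypothetical; eno has no schema format today:

```python
# Hypothetical declarative schema, validated outside application code.
# Nothing here is real eno API; it sketches what a portable schema could do.
schema = {
    'author': {'type': 'string', 'required': True},
    'email':  {'type': 'string', 'required': False},
}

def validate(fields, schema):
    errors = []
    for key, rules in schema.items():
        if rules.get('required') and key not in fields:
            errors.append(f"missing required field '{key}'")
    return errors

assert validate({'author': 'Jane Doe'}, schema) == []
assert validate({}, schema) == ["missing required field 'author'"]
```

Because the schema is data rather than code, every language implementation could consume the same file instead of replicating the rules.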

Another thing is that it appears you are implementing the parsers by hand instead of using a parser generator that consumes a grammar for Eno. What is your reasoning behind this? Is it performance? Did you benchmark using generated parsers (maybe wrapped in a nice API)?


As someone who has written parsers and worked with generated parsers, to add to what OP said about performance: in my experience there is a lot of noise in the output of a generated parser, and interacting with the code is quite unpleasant.

Also you have to decide on checking in garbage generated code into your source or adding a build step so you don’t check the code in (which is less trivial for certain languages/stacks).

Unless I'm making an MVP or a prototype I would write the parser by hand. It's not as hard as it sounds.


Yay thanks for the detailed input!

If there is demand or initiative for a portable schema solution I'll gladly support it! The native architecture in the eno libraries is programmatic because that has its own powerful merits, which the API design employs to the fullest. As always there's not one best choice, and 'validation creeping into code' can just as well be turned around into 'external schema definition creeping out of line with code'. ;) Do you have a concrete use case in mind, or planned, where we could explore what a portable schema solution for eno could look like? Drafting things from various real-life use cases has worked great for eno so far, so that's the route I would love to take here too!

Custom parser implementation is easier to answer: by now I've iterated through dozens of custom parser designs for eno in multiple languages, and I'm pretty confident that generated parsers will not stand a chance of being faster. They do the same thing I do, after all, only I can't hand-optimize what they produce afterwards. :) You can study the benchmarks I linked to under [3]; there are some generated TOML parsers included with rather disappointing performance, to put it mildly, and as it stands there's not much that's faster than the eno parsers in YAML/TOML land anyway, so I currently have little incentive to experiment in that domain. :) The long-term goal is to (optionally) integrate (generated or custom) C (respectively Rust) parser cores through native bindings as well, so that will surely bring up this question again then.


I've come to really appreciate the difference between "syntactically correct" (ie: is a file valid xml) vs "semantically correct" (ie: does it follow a specific dtd if it's valid xml). More than that, I've come to realize how many people don't have this appreciation even though they will identify problems that directly relate to this distinction in everyday usage.

To truly be able to have a portable file format, there needs to be a way to do both validations reliably in different contexts (eg: different languages). If you ignore this part of your design then it may become the slowest part of the eno ecosystem because your grammar will have quirks that you'll end up needing to support long-term. I suggest toying with this functionality now and providing something which is extremely pessimistic on what it will pass. Only loosen things up as people show a need and keep your entire spec as tight as possible.

I would imagine that you could even use eno syntax to describe document structure, much like XML and DTD parallel each other so strongly. Then you get the fast parser in both places essentially for free!

Finally, on the format of eno itself, I'm curious on your thoughts relating to unicode characters that visually masquerade as common characters. eg:

http://www.fileformat.info/info/unicode/char/ff1a/index.htm

Sample usage:

  ---
  author: Jane Doe
  email: jane@eno-lang.org
  ---

Does this parse?

How about this:

  ---
  author: Jane Doe:
  email: jane=doe@eno-lang.org
  ---

What do I do if I want a "#" in a name?

  ---
  # #twitter
  @hackernews = 0xC0FFEE
  ---

or:

  ---
  # \#twitter
  @hackernews = 0xC0FFEE
  ---

Are quotes optional somehow? Can I put arbitrary things into an identifier?

Cheers and keep up the great work!


Syntax vs Semantics is distinguished by ParseError vs ValidationError in the eno libraries - I'll keep the importance of distinguishing them in mind for the schema development too - thanks for pointing this out!

Right now only an ASCII colon is interpreted as an operator, but this looks like a question to thoroughly consider for the next and final spec (which is planned for 2019, currently we're in frozen RC) - work on this currently happens at https://github.com/eno-lang/eno.
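For reference, the character linked above (U+FF1A, FULLWIDTH COLON) is a distinct code point from the ASCII colon, though NFKC normalization folds it back, so a strict parser could detect lookalikes explicitly. A small illustration in Python:

```python
import unicodedata

fullwidth_colon = '\uFF1A'  # FULLWIDTH COLON, visually similar to ':'
assert fullwidth_colon != ':'  # a strict parser sees no operator here
assert unicodedata.normalize('NFKC', fullwidth_colon) == ':'  # folds to ASCII

# A pessimistic parser could flag lookalikes instead of silently ignoring them:
def suspicious_colons(line):
    return [i for i, ch in enumerate(line)
            if ch != ':' and unicodedata.normalize('NFKC', ch) == ':']

assert suspicious_colons('author\uFF1A Jane Doe') == [6]
```

Flagging such characters with a warning, rather than quietly treating the line as a free-form value, would fit the "extremely pessimistic" approach suggested above.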

There is escaping for arbitrary keys by using backticks - see the advanced language feature documentation at https://eno-lang.org/advanced/, in the case of # #twitter you wouldn't need it though unless you omit the space.

Thanks for your input, appreciate it!


What about types?

Do all values parse to strings? If not, please add date/datetime support (ideally as ISO) and decimal.

> More languages?

How about making the core in Rust and having the rest use it?

BTW: What do you use the introspection for? What editor is that? I like the auto-complete stuff!


This! As the introduction doc coyly says,

> so as a user we usually just concern ourselves with editing the values and a friendly developer takes care of specifying the names for us. :)

The friendly developer gets the job of conveying, somehow, that the values for landline: and mobile: have to be valid tel#s (but must they have the country prefix?) while "hire date:" must be in the form yyyy mm dd only... and also gets the friendly job of writing one-off, locally unique validation code for these fields and hacking it into the parser.


Conveying which types to enter is not an eno-specific problem; as a user without schema or code access you don't know which types a blank YAML/TOML file expects either!

Aside from the absolutely valid meta solutions (e.g. in-file comments, clear key naming, documentation) there is an additional way this is approached in eno: if you use the type loaders provided by the API (say 'my_var = document.url('website')'), and properly expose errors to the user, the user gets a localized (!) error message in their language, like "'website' must be a valid url (e.g. https://eno-lang.org/)".

In the long run we can have community packages for any number of important locally unique types (loaders are just simple functions, so they can be easily authored), so at some point you likely won't have to write any one-off validation code, nor the error messages or their localizations; you just pull them in as dependencies.
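Since loaders are described as plain functions from raw string to value, a community loader might look roughly like this. This is a sketch, not the actual enopy signature, and the phone pattern is deliberately naive:

```python
# Sketch of a custom loader: a plain function that turns a raw string into a
# value, raising with a human-readable message on failure. Not real eno API.
import re

def phone_loader(value):
    if not re.fullmatch(r'\+?[0-9 ()-]{7,}', value):
        raise ValueError(f"'{value}' must be a valid phone number (e.g. +43 123 4567)")
    return value

assert phone_loader('+43 123 4567') == '+43 123 4567'
try:
    phone_loader('not a number')
    assert False, 'expected a ValueError'
except ValueError:
    pass
```

Because a loader is just a function, publishing one as a package (with localized messages) is straightforward, which is what makes the community-package route plausible.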


How about type inference? You can look at rebol/red/tcl for inspiration; they already look like a config format but have a well-defined approach to types:

https://randomgeekery.org/2004/12/26/rebol-datatypes/

http://www.re-bol.com/rebol.html


I appreciate the input :) but the thing is that the typing concept in eno as it is now is essentially what makes eno eno. Every application that uses eno decides for itself what types it supports and requires, and that in turn is how eno manages to be so simple and usable on the language level, even for completely non-technical people who normally feel uncomfortable with the idea of editing their content as raw text files.

If I added types and type inference, I would essentially arrive back at YAML and TOML, and I don't want to reinvent them. ;)

But if I actually misunderstood you there, please let me know and do clarify!


I get that, but I wonder how to give a base set of types that avoids small inconsistencies.

JSON is a good example:

https://www.tutorialspoint.com/json/json_data_types.htm

It is so spartan that everyone needs to encode dates somehow, to take a simple example. I think these are the base types (based on my experience with RDBMSs, on building a relational language now, and on always having trouble with CSV, JSON, and other formats in ETL kinds of tasks):

- String

- Floats. This could be split into Ints/Floats, but sticking to just Float is OK. However, make it Float64.

- Date(Time). And make it ISO. No ambiguity.

- Boolean

- Decimal64. This is a pet peeve of mine. A lot of data in the business space is about money, and floats are not OK. What if, like in REBOL, $3.2 were decimal?

Then the composites.

ie: this is JSON + dates/decimal. And make it a single encoding (UTF-8?).

It's insane that, for example, you can save a CSV in Excel, open it again, and Excel gets lost and can't parse it correctly.

Apart from this: url, email, host, website, phone, cellphone, city, country, and state are so common that maybe they could be pulled in with an import like "!schema:common-fields" or something.
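Python's stdlib json illustrates the pain point: dates and decimals need an ad-hoc encoding shim that every project reinvents, which is exactly the argument for baking them into the base types:

```python
import json
from datetime import date
from decimal import Decimal

record = {'hired': date(2018, 8, 16), 'salary': Decimal('3.20')}

# json.dumps fails on date/Decimal out of the box; everyone writes this shim:
def encode(obj):
    if isinstance(obj, date):
        return obj.isoformat()  # ISO 8601, no ambiguity
    if isinstance(obj, Decimal):
        return str(obj)         # keep the exact value, avoid float rounding
    raise TypeError(type(obj))

text = json.dumps(record, default=encode)
assert json.loads(text) == {'hired': '2018-08-16', 'salary': '3.20'}
```

Note that the round trip comes back as plain strings: without the type in the format itself, the receiving side has to guess again.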


Regarding types check out https://eno-lang.org/javascript/#loaders - basically eno allows arbitrary types on the language level and provides loaders for all primitive types and currently also a small set of non-primitive types through the libraries. The extent of loaders provided out of the box might grow, likely also externalized into companion packages like https://github.com/eno-lang/enojs-exploaders/, which currently serves as experimentation ground for this.

The core might in fact be reimplemented in C or Rust and used across implementations through native bindings, although only a small portion of the actual parsing core can be outsourced like that, so it will depend on what the actual benchmarks say then. There's also a cost associated with passing the data around through bindings; the devil's in the details there, unfortunately. :)

The editor in the introspection demo is Atom. The introspection is based on the excellent autocomplete boilerplate at https://codersblock.com/blog/creating-an-autocomplete-plug-i... paired with a few lines that utilize https://eno-lang.org/javascript/#Section-lookup to determine the exact context for the autocomplete suggestions. Glad you like it, thanks for your interest!


It seems bizarre that you wouldn't create a C version of this library. If you create a C version, everyone can write bindings and consume it from their favorite language. I think maybe you can do that with Rust code, but it's typical to write the canonical version in C.

Your benchmarks are flawed, as they only compare different implementations within the same language. If you really care about performance, it seems bizarre that you'd use PHP or Ruby.


That road (a C or Rust parsing core accessed through bindings) will likely be taken, but for the initial development and jump-starting of the ecosystem it was important for me to start with implementations that can be quickly experimented with and iterated on, rather than spending a lot of extra time on dealing with segfaults, memory leaks, the different binding mechanisms on different platforms, etc. As things stand now, people are provided with multiple, fully functioning, pure implementations that are already faster than the majority of YAML/TOML parsers. In the coming months and years there will be plenty of time to make things even faster. :)

For me, caring about performance also means caring about performance on all platforms; why not, after all? You can take the tabular benchmark data I provide and paste it together, or use the raw data (also available as eno files in the repository) to compare language against language too. I initially did that but later dropped it because same-language comparison made more sense for libraries. If you want the quick rundown as far as I remember it: JavaScript parsers mostly lead the ranking, and Ruby parsers are a bit behind, just slightly ahead of Python.


(You can with Rust, yes)


The level of sophistication you got this to over the last few months is truly impressive. Supporting several languages and IDE/editors from the start takes serious dedication for one person. Kudos! Looking forward to using it for my next database-less project.


Thanks michael! <3


I'm left looking for a spec, like a real spec-flavored specification with all the gritty details, written like you would want if you were writing a parser. It doesn't seem like there is one, yet.


You're right, not yet! Jump-starting the whole ecosystem was a major time investment for me but now that there is public exposure providing a formal spec has a higher priority because someone might actually see it and do something with it ;) Keep an eye on https://github.com/eno-lang/eno, this is where I'm working on it, I'll also announce it on the newsletter (http://eepurl.com/dA9LcH) when it's there!


Related as a better config language (but universal, not for a use case like this) is dhall (non-turing complete, type-safe, remote imports). https://github.com/dhall-lang/dhall-lang


At first glance dhall looks a bit like some of the CSS preprocessors out there.


first off, I agree that file-based content management is the way to go. thank you for working to make better tools for it.

> we wanted something 1) faster and 2) easier

1) a) why didn't you write a new library using the same spec? b) do you have speed tests to show your libraries are better than existing ones?

2) how is this spec easier? (like a basic rundown)


> b) do you have speed tests to show your libraries are better than existing ones?

The OP linked to their benchmarks in [3]: https://github.com/eno-lang/benchmarks/


Hey, first off thanks! :)

1a) Because faster was only one aspect; it also needed to be easier, even more pressingly than that, in fact. 1b) See the answer by the other poster (thanks!). 2) Not whitespace-sensitive, no way to enter wrong types through syntax mistakes, hardly any learning curve for users because there is so little syntax to memorize, fully localized, hand-written parser and validation errors (provided on the library side) ... and so on, check out the website for more, it's all there! ;) Thanks!


Honestly, put a pair of parentheses around any of these plain text formats and you get back Lisp.

Greenspun's tenth rule rules!


Can you talk about why "We migrated a big relational research database to a file-based solution"?

I'm curious how modern DBs failed you here, more details of the problem space please.


Sure!

Cultural research = notoriously underfunded, so although they have and rely on a relational database that holds their data (previously Postgres) the cost and effort associated with maintaining and extending the system is pretty high.

With the new setup the thousands of eno files represent both the place of storage and the interface to edit the data, so by that we eliminated the development effort to provide and maintain a full web frontend to the database, and the effort to just maintain the actively deployed technology somewhere and keep it at least patched for security reasons.

All that remains now technology wise is an Atom plugin that is locally installed on each client at the institute and takes care of validating, provides relational autocomplete helpers as demonstrated in [4] and offers a few hooks to kick off local builds for multiple deployment targets and deploy them to live as well.

Hope this clarifies things! :)


Programming languages are more powerful than SQL, so it might make sense for "research" stuff.


Kinda worthless rant, but I'm tired of reconsidering configuration file markup languages every couple of years, so please remind me: why isn't TOML perfect?


If you already believe TOML is perfect, why do you reconsider configuration file markup languages every couple of years?


TOML/YAML:

You could read a truncated TOML file and not realize it.


On YAML, not if you enforce the usage of document end markers (e.g. an optional feature one could require if their use case demanded it). Though that's rarely a concern for configuration file formats in my opinion. If your transport (or storage) layer is that unreliable you probably don't need a human readable format in the first place.
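For what it's worth, YAML's optional document end marker is a line of three dots, so a truncation check can be purely textual, no YAML parser required. A sketch:

```python
# Detect a truncated YAML stream by requiring the optional '...' end marker.
# This is a textual convention check, not a full YAML parse.
def looks_complete(text):
    lines = [line.rstrip() for line in text.strip().splitlines()]
    return bool(lines) and lines[-1] == '...'

complete = "---\nauthor: Jane Doe\n...\n"
truncated = "---\nauthor: Ja"

assert looks_complete(complete)
assert not looks_complete(truncated)
```

This only works if producers agree to always emit the marker, which is the "optional feature one could require" point above.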


It looks like there is no support for writing data in this format. My inner hope was that there was a format I could parse, modify, and write again, preserving as many comments and as much formatting as is reasonable.

I looked at the Python implementation but did not see that type of functionality. Am I wrong?


ruamel.yaml in Python does that to a certain degree from what I've read; you might want to check it out if YAML is OK for your use case! (https://yaml.readthedocs.io/en/latest/)

I've given this some thought as well, and given that the eno libraries hold their own representation of data in memory this might actually be plausible to implement in some way. Still I fear this will turn out to be a hard, hard problem (as eno is not even generically serializable by design), so that's why I haven't explored it further. So for the moment I can only say - Maybe in the near future sometime, check back every once in a while! :)


I like this, but I'm wondering how strict the format is. It seems like a potentially useful format for capturing data from a non- or semi-technical user base, but then some degree of fault tolerance in the entered data would probably be desirable. Was this (or should it be?) one of the design goals?


One of the design considerations was and is that the format is very strict (and in that way predictable), but at the same time as helpful as possible in identifying, communicating and resolving issues.

To that end all error messages that can occur are handwritten, fully localized and shared across all eno libraries (see https://github.com/eno-lang/eno-locales/blob/master/specific...) and the API implicitly handles them for you when you write programs that consume eno.

So basically eno does no magic fallbacks of any sort when faults occur, but it is candid and friendly about it when it happens. :)


I just read the benchmarks and was surprised at how slow the TOML parsers are. I mean, its syntax is simpler to parse than YAML's (assuming spec v1.2), but they were still slower...

So I guess there are simply no high-performance TOML parser implementations? (in the tested languages, which don't include Rust)


From what I saw, I think for at least a few parsers this might be the case because they are built on generated parser code, and it's easy to run into unfavorable bits and pieces in the output that way, which can drag down performance completely even though 95% of the parser is just fine. Technically there's no reason why TOML parsers shouldn't be just as fast as, or faster than, YAML or even eno. :) In any case I'd be happy if the benchmarks stir up some movement and maybe kick off a high-performance TOML parser initiative; TOML is an awesome format, and 0.5.0 was just officially released, so there's a good reason to update the parsers now anyway. :)


I am working on a spectrum diagram rendering system and have been thinking hard about what syntax I should select for source documents. I'm going to give eno a hard look and the existence of JavaScript parser is not the least of my reasons. Thank you.


That's fantastic to hear! Do let me know via email or github etc. if you run into any issues, I'm eager to gather more insights from other people's usecases and work on improvements where needed!


This is amazing. We really needed an alternative to YAML, and this seems great.


Have you seen strictyaml?


No, but I'll look it up. The name is not bad.


Thanks for the kind words, much appreciated!


Is there support for dictionaries as list items / nested dictionaries?


Yes through sections! See https://eno-lang.org/introduction/.

You can nest as deeply as you want and multiple sections on the same level automatically turn into a list of sections. For just a list of flat dictionaries you can also use fieldsets, see https://eno-lang.org/advanced/. :)
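Based on the syntax shown in those docs, the two groupings might look roughly like this (unverified against the spec, so treat it as a sketch): sections with `#`, and a fieldset whose `=` entries end automatically at the next element:

```
# contacts

## jane
email: jane@eno-lang.org

## john
email: john@eno-lang.org

ratings:
eno = 10
yaml = 7
```

Here the two `## jane`/`## john` subsections on the same level would turn into a list of sections, while `ratings:` is a fieldset, i.e. a flat dictionary that needs no closing marker.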


Good name, a good possible alternative to YAML (I just hate JSON), and not least a Ruby gem! I will watch the project to see how it evolves...


How does it handle Oblique Strategies?


On npm that's in fact already covered, scrutinize the list https://www.npmjs.com/search?q=eno :)


Thanks, but I would keep using SDLang : https://sdlang.org/


Do you remember when people were writing plain C parsers first? Pepperidge farm remembers!


Why would I use a field set instead of a section? They seem functionally identical, is it meant to be a semantic decision?


Well spotted, good question!

It's also been asked in another thread on HN, I'm quoting myself here: "eno has neither indentation nor closing tags of any sort, that means if you use a section to group some values, you need to start another section to end the previous one (no closing tags!), that's why there are fieldsets, which allow short groupings that automatically end with the next field/list/fieldset."

Hope this explains it :)


no python 2 support means no python support


Python 2's end of life is in 1 year, 4 months. https://pythonclock.org/


This is a meaningless deadline, given that nobody is actually paid to support Python. Third party vendors will support it for years to come.


That was true several years ago, much less so today


Python 2 is still the only python provided out of the box on Mac.


Python 2 is EOL in just over a year. Why would anyone write for a piece of software that will be completely unsupported in 18 months?


Because it is installed on Macs, as I said. I don't want to have to teach people how to install Python 3 when they already have Python 2, which is fine.

Also, Red Hat has promised support for a while yet, and I expect them to keep it working until they replace it with Python 3 (or remove Python altogether?).


> Because, it is installed on Macs, as I said.

Apple also only supports a very ancient set of GNU tools. That doesn't mean that in the majority of cases it is a wise choice to lock yourself to those same outdated tools, especially on a platform that hasn't guaranteed the presence of said tools.

> I don't want to have to teach people how to install python 3, when they already have python 2, which is fine.

I haven't directly addressed anything to do with your personal choices. However, this thread started with the quote:

> no python 2 support means no python support

Which, frankly, isn't true. Python 2 is about to be EOL'd.

> Also, redhat had promised support for a while yet

I wouldn't depend on that being long-term, however. RHEL 8 drops Python 2, and RHEL 7.5 deprecated it. [0]

[0] https://www.phoronix.com/scan.php?page=news_item&px=RHEL-8-N...


Who are you teaching to use Python and a bleeding-edge library like the OP's? You're not teaching them how to install Python virtualenvs, or how to install homebrew?


Yes, but the fact that running `swift` on the command line while the system Python is active spews an unholy amount of error messages is an indication that even there it is going away.



