The primary intended use case of this, in contrast to, say, Extensible Data Notation (EDN), seems to be faster machine processing. The requirement that atoms be prefixed with their lengths (Pascal-string style) is the clue. An advantage here is that it's much easier to place a hard bound on memory and CPU when reading this format, which confers security properties like a systematically reduced possibility of buffer overflows. That's good for hard real-time, for example guidance systems that do all allocation only at startup.
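As an illustrative (untested) sketch of that property in Python: a reader can enforce its resource bounds before touching a single payload byte. MAX_ATOM_LEN and the 20-byte scan window for the length digits are arbitrary choices for this example, not anything from the spec:

    # Sketch: reading one length-prefixed csexp atom ("<len>:<bytes>")
    # with a hard cap, so worst-case memory is known up front.
    MAX_ATOM_LEN = 1 << 16  # arbitrary illustrative bound

    def read_atom(buf: bytes, pos: int) -> tuple[bytes, int]:
        colon = buf.index(b":", pos, pos + 20)  # length digits are bounded too
        length = int(buf[pos:colon])
        if length > MAX_ATOM_LEN:
            raise ValueError("atom exceeds bound")  # reject before allocating
        end = colon + 1 + length
        if end > len(buf):
            raise ValueError("truncated atom")
        return buf[colon + 1:end], end

    read_atom(b"5:hello", 0)  # -> (b'hello', 7)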
Beyond lists and byte-string atoms (or whatever the exact primitive set is), this format also makes an affordance for custom types, but as TFA points out, you still have to roll your own other / higher-order data types. Data types you almost certainly have on hand. Now we are talking about needing to do additional processing on the decoded output just to interpret common data structures like associative arrays and sets. And as a machine-first serialization format, if you are interchanging with other people, or with yourself in the future, you had better hope you have full agreement on those custom types.
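To make that concrete, here's a rough Python sketch of what "rolling your own" associative array might look like. The (3:map ...) tag is a convention invented for this example, not anything the format defines, which is exactly the out-of-band agreement problem:

    # Sketch: encoding a dict as a csexp list under an invented "map" tag.
    # Both ends of an interchange must agree on this convention out of band.
    def atom(b: bytes) -> bytes:
        return str(len(b)).encode() + b":" + b

    def encode_map(d: dict) -> bytes:
        # Sorting keys makes equal dicts serialize identically
        # (our rule for this example, not csexp's).
        pairs = b"".join(b"(" + atom(k) + atom(v) + b")"
                         for k, v in sorted(d.items()))
        return b"(" + atom(b"map") + pairs + b")"

    encode_map({b"b": b"y", b"a": b"x"})
    # -> b'(3:map(1:a1:x)(1:b1:y))'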
So what do you do: add libs? Roll your own? Well, competing alternatives already offer that complete picture as mature, battle-tested solutions. So I'm inclined to view Canonical S-Expressions merely as a waypoint on our path of technological evolution, worthy of fleeting, mild curiosity.
I would suggest the author also look at Amazon Ion:
* it can be used schema-less,
* it allows attaching metadata tags to values (which can serve as type hints[1]), and
* it encodes blobs efficiently.
I have not used it, but in the space of flexible formats it appears to have other interesting properties. For instance, it can encode a symbol table, making symbols really compact in the rest of the message. Symbol tables can be shared out of band.
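For a flavor of what that looks like, here's a small (untested) snippet in Ion's text syntax, going off the Ion docs; "point" is an arbitrary annotation name:

    // An annotation acting as a type hint on a struct, per [1]
    point::{ x: 1, y: 2 }

    // A blob, carried as base64 in the text encoding ("hello")
    {{ aGVsbG8= }}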
Canonical S-expressions seem remarkably similar to bencoding as used in BitTorrent files. They both use length prefixes written in ASCII digits followed by a colon.
Bencoding also manages to specify dictionaries, and yet still have a canonical encoding, by requiring dictionaries be sorted by key (and keys be unique).
It doesn't have the option for arbitrary type names; it just has actual types: integer, bytestring, list, and dictionary.
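A minimal sketch of that canonical property in Python (the four type prefixes and the sorted-keys rule are from the BitTorrent spec; the function itself is just illustrative):

    # Sketch: bencoding's four types; dict keys are sorted so a given
    # value has exactly one encoding.
    def bencode(v) -> bytes:
        if isinstance(v, int):
            return b"i%de" % v              # integer: i<digits>e
        if isinstance(v, bytes):
            return b"%d:%s" % (len(v), v)   # bytestring: <len>:<bytes>
        if isinstance(v, list):
            return b"l" + b"".join(map(bencode, v)) + b"e"
        if isinstance(v, dict):             # keys: unique bytestrings
            return b"d" + b"".join(bencode(k) + bencode(val)
                                   for k, val in sorted(v.items())) + b"e"
        raise TypeError("no arbitrary types, just these four")

    bencode({b"name": b"alice", b"age": 30})
    # -> b'd3:agei30e4:name5:alicee'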
FTA:
> Bencoding offers many of the same benefits of CSEXP, but because it also supports types, is a bit easier to work with.
We changed the url from https://en.wikipedia.org/wiki/Canonical_S-expressions to a non-Wikipedia article. (Wikipedia submissions are fine but if there's a good third-party source, those are usually preferred because they're less generic.)
Canonical S-Expressions aren't meant to replace something like C or assembly; they're for data serialization. They're meant to be compared against things like JSON, XML, ASN.1, or any other serialization format.