A Core Calculus for Documents [pdf] (brown.edu)
126 points by hasheddan on Dec 31, 2023 | 21 comments



> Any document language with a foreach-loop must somehow flatten the node list to one dimension; the key design question is where in the pipeline this should happen.

Or —and hear me out— if one uses streams (monoids) instead of lists (trees) as the basic semantics, one has associativity for free:

    (s; t); u == s; (t; u)
and need not kludge up unquote-splicing or other flattening schemes.

(the brave who proceed to use streams instead of lists not just on the right hand side, but also on the left hand side of the application operator, may discover that all their "monad law" proof obligations become trivial)
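
To make the contrast concrete, a minimal Haskell sketch (names are mine, not the paper's):

    -- with a monoid, grouping is unobservable:
    flatA, flatB :: String
    flatA = ("s" <> "t") <> "u"
    flatB = "s" <> ("t" <> "u")
    -- flatA == flatB evaluates to True; no flattening pass required

    -- with lists-as-trees, the grouping is observable structure:
    data Tree a = Leaf a | Node [Tree a] deriving (Eq, Show)

    nested1, nested2 :: Tree Char
    nested1 = Node [Node [Leaf 's', Leaf 't'], Leaf 'u']   -- [[s t] u]
    nested2 = Node [Leaf 's', Node [Leaf 't', Leaf 'u']]   -- [s [t u]]
    -- nested1 /= nested2: same leaves, different nesting;
    -- this is exactly what unquote-splicing exists to flatten away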


If someone has zero understanding of what you just wrote but is curious to learn, what resources would you recommend starting with?


No need for category theory (although that might be interesting in its own right, later) or Haskell (ditto).

I might start with https://en.wikipedia.org/wiki/Monoid#Definition or pp17-20 of https://www.cs.utoronto.ca/~hehner/aPToP/aPToP.pdf

Monoid is a Greek-derived name, but the structure itself is very simple: as the name implies, it's completely one-dimensional (aka linear: you can think of them as being like those whiteboard pens that snap together).

If I have a monoid value y, I can prepend x to it to get (x;y), or I can append z to it to get (y;z), or I can do both, in either order, to get (x;y;z). You are certainly familiar with this behaviour for addition: 1+(2+3) == (1+2)+3, and may be familiar with matrix multiplication, which differs from elementary school multiplication in that A*B is no longer necessarily equal to B*A, but it is still always true that A*(B*C) == (A*B)*C.

This is distinct from the behaviour of lists, where one order will produce [x [y z]] and the other order will produce [[x y] z] — despite having the same leaves, they are different lists, with distinct nesting patterns.

Finally, what distinguishes monoids from semigroups is that they have an identity: if we call it ε, then we have that ε;y == y == y;ε (in practice this is not a big distinction, because even without a natural identity one can almost always add a formal identity). Note that you are probably already very familiar with identities in various contexts: 0+2 == 2 == 2+0, 1*3 == 3 == 3*1, and ""++"foo" == "foo" == "foo"++"", etc. (and you may be familiar with the difference between finite automata which allow identity transitions —easy to build, not so efficient to use— and those which do not —efficient to use, not so easy to build—).
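
In Haskell terms (a sketch; the standard mempty plays the role of ε above):

    ghci> mempty <> "foo" :: String
    "foo"
    ghci> "foo" <> mempty :: String
    "foo"
    ghci> (mempty <> [1,2,3]) == [1,2,3] && ([1,2,3] <> mempty) == [1,2,3]
    True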

Does any of that make sense?


How does this differ from a one-dimensional array?


The other replies are massively overcomplicating things. A monoid is

- a set `X`

- together with an associative binary operation (let's call it `+`)

- and an element (`0`) such that `a + 0 = 0 + a = a` for all elements `a` in `X`.

Monoids are a class of structures; the collection of all arrays of a given type (with concatenation as the monoid operation, and the empty array as its identity element) is a particular example of a monoid.

Other examples include the natural numbers under addition, the integers under multiplication, functions `T => T` under function composition, booleans under and, booleans under or, and strings under concatenation. They're one of the simplest and most common algebraic structures.
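
A sketch of that three-part definition as a Haskell class (simplified; the real standard library splits it into Semigroup and Monoid, and the laws are the programmer's obligation, not the compiler's):

    class Monoid' x where
      (+.) :: x -> x -> x   -- the associative binary operation
      zero :: x             -- the identity element

    -- arrays (lists) under concatenation, with the empty array as identity:
    instance Monoid' [a] where
      (+.) = (++)
      zero = []

    -- booleans under and, with True as identity:
    instance Monoid' Bool where
      (+.) = (&&)
      zero = True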


Thank you. In ten+ years, I think this is the simplest and clearest explanation I've read.


It depends upon what you mean by "a one-dimensional array".

In the C or Fortran sense, it's easy to work with individual elements of the array, but not so much the array as a whole.

(in the example above, both y and x;y;z are values in the monoid, conceptually at the same level, like "f" and "foo" are both strings despite one being length-1, and like 5 and 60 are both numbers despite one being prime)

In more recent, but still Algol-like, languages, it's possible to work with slices of arrays as well as individual elements, but often these slices are views backed by physical arrays, so slices and arrays are similar but still not equivalent.

In array languages (the APL family), there is very little difference, as it is perfectly natural to treat entire arrays (or parts of them) as values and prepend or append them to each other.

(pedantry: these languages go well beyond the 1D model and allow N-dimensional cubical structures; after they started to allow ragged structures as well, they came much closer to the tree-orientation of LISP-likes)

As stated earlier, I think of whiteboard pens (snapping top to bottom, resulting in something equally snappable), or the bytestreams that flow through Unix pipes, where there is always a before and an after but nesting isn't part of the basic deal.

I believe many things in CS are best treated (involving fewer arbitrary decisions) as monoids and semilattices and interactions between these structures, but because of tradition we try to attack them with lists and sets, then wind up mired in accidental complexity trying to remove layers of nesting that would never have occurred had we started with simpler representations. (e.g. if you feel the need to invent a "programmable semicolon" to ensure that all your lists get flattened as-you-go, or the need to make "samefringe" break free of normal stack discipline, you may well make great strides in implementing Haskell or Go, but you may also be doing work which could have been completely avoided [ok, at least abstracted away] by a different choice of representation)

Finally, note that this whole digression was due to the (in my view "accidental") problem of unquote-splicing: using monoids (a) removes the need for two unquote forms, but (b) doesn't prevent anyone from expressing their computed SGML fragments via a monoid of tree values; a "forest" if you will.
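
A sketch of that last point in Haskell (hypothetical types, not the paper's): if the spliceable unit is a sequence of trees, plain concatenation does the work of both unquote forms:

    data Node   = Text String | Elem String Forest
    type Forest = [Node]   -- a monoid: (++) is the operation, [] the identity

    -- splicing a computed fragment is ordinary concatenation:
    items :: Forest
    items = concatMap (\s -> [Elem "li" [Text s]]) ["a", "b", "c"]

    doc :: Forest
    doc = [Elem "ul" items]   -- nesting only where it is actually wanted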


Fortran is literally a language designed to work with (multi-dimensional) arrays as a whole.


Ummm...no.

Fortran was originally designed as a language that made dealing with numbers in arrays easy, but the arrays themselves were containers that had a fundamentally different type than numbers. In a true array language like APL there's no difference.

Modern versions of Fortran have blurred the distinction somewhat, but in older Fortran arrays were very different from numbers. Arrays couldn't even be dynamically allocated or resized back in the day, much less "added together," whatever that might mean.

An analogy: Fortran has arrays like C has functions. But that doesn't make Fortran an array language any more than C is a functional language.

APL is an array language like Haskell is a functional language.

In Fortran, numbers are the fundamental data type. Arrays are just collections of numbers. In APL, arrays are fundamental. A number in APL is just a lonely array.


Part of the definition of a monad is how computation is "applied" to the underlying dataset (wrapping a value, and applying a function to the wrapped value, are both defined). The monad also defines the resultant type of the operation (unwrapping), and so in a language with monads, something like mapping a function over a dataset is a built-in "consequence" of the definition.

This makes it natural to express a program as a pipeline of operations over a variety of dataset inputs by wrapping, mapping, and unwrapping operations.

For example, the IO monad in Haskell allows a programmer to incorporate non-functional (side-effect) computations (e.g. writing to disk or to the console) naturally, whereas a 1D list (more like a buffer) doesn't naturally express the same semantics.
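
Roughly, in Haskell (a sketch of the shape, not a full treatment):

    -- a pure pipeline over a plain dataset:
    pipeline :: [Int] -> [Int]
    pipeline = map (* 2) . filter even

    -- the same shape with IO: effects are sequenced by the monad
    main :: IO ()
    main = do
      line <- getLine                 -- "unwrap" a value out of IO
      let n = read line :: Int
      print (pipeline [1 .. n])       -- mix pure mapping with side effects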


Category Theory for Programmers [1] covers monoids (and other concepts). There’s a YouTube series of lectures by Bartosz Milewski too.

To get a complete intuition spending some time learning Haskell will also help.

[1] https://github.com/hmemcpy/milewski-ctfp-pdf


In this case the main thing being described is the basic mathematical property of associativity. E.g. integer addition is associative because (1 + 2) + 3 gives the same result as 1 + (2 + 3), i.e. 6.

In that case the combining operation is addition. But we can generalize this to many data types and many possible operations.

For example, lists: in say Python, [1,2] + [3] gives [1,2,3], which is the same as the result of [1] + [2,3].

In that case Python is overloading the + operator for both datatypes, but they're really two different operations - integer addition vs. list concatenation. But in a sense, Python's overloading is in recognition of the commonality here.

Same goes for string concatenation: "ab" + "c" == "a" + "bc".

In mathematics, this commonality is generalized in an algebraic structure called a semigroup (https://en.wikipedia.org/wiki/Semigroup) - "a set together with an associative internal binary operation on it." The set contains all the possible values of the data type in question, such as the integers, lists, or strings in the above examples; the operation is the combining operation on those values - addition, string concatenation, list concatenation. But there are many more types and operations that can fit these criteria.

This is useful in both mathematics and programming because it allows you to generalize both mathematical proofs and programs.
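
For example, in Haskell you can write one fold against the monoid interface and reuse it at every instance (a sketch):

    combineAll :: Monoid m => [m] -> m
    combineAll = foldr (<>) mempty

    -- the same definition serves every monoid:
    --   combineAll [[1,2],[3],[4,5]]  == [1,2,3,4,5]
    --   combineAll ["ab", "c", ""]    == "abc"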

Of course the semigroup is a very simple structure. One thing that's often needed when working with such structures is an "empty" value - for example, zero for numbers, [] for lists, "" for strings. Given an empty element, we can add a rule alongside associativity - if you combine a value with the empty value of that type, the original value is unchanged. E.g.:

    x + 0 == x for all integer x 
    s + "" == s for all string s
    l + [] == l for all list l
A semigroup with an empty value (actually called an identity element) is called a monoid. All the examples so far are both semigroups and monoids.

Note that integer addition with the identity value of zero forms a monoid; but integer multiplication with an identity value of 1 also forms a monoid, since x*1 == x.
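
Haskell makes that pair explicit with newtype wrappers (Sum and Product from Data.Monoid), since one set can carry two different monoids:

    ghci> import Data.Monoid
    ghci> getSum (Sum 2 <> Sum 3)              -- identity: Sum 0
    5
    ghci> getProduct (Product 2 <> Product 3)  -- identity: Product 1
    6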

Having clearly defined, general structures with strict rules can be very useful, in both mathematics and programming.

Of course, in programming, you're probably already familiar with using "interfaces" or "traits" to do something similar, but these mathematical structures have essentially been aggressively refactored down to the simplest possible structures, and then more complex structures are based on those simple components.

The semigroup wiki page above has a diagram of some of the mathematical structures in question and their relationships, and here's a diagram of some of those structures in the Haskell language: https://wiki.haskell.org/File:Typeclassopedia-diagram.png

Haskell can be worth learning just to get some exposure to these concepts. The concepts aren't exclusive to Haskell by any means, but Haskell is the most widely used language that provides direct support for them.

There's also material out there that uses Haskell to teach these concepts. One example is "The Haskell Road to Logic, Math and Programming": https://fldit-www.cs.tu-dortmund.de/~peter/PS07/HR.pdf (you can buy a paper book version of this if you want.)


> if one uses streams (monoids) instead of lists (trees) as the basic semantics, one has associativity for free

I'm pretty sure that Krishnamurthi in particular is very well aware of monoids. (I'm less familiar with the other author.)

Are you thinking that the document's primary structure should be monoidal? That doesn't seem viable in the general case. You can, of course, represent documents in two different ways: a tree-structured index over a monoid or a monoidal representation over a tree - but that has a cost.


I think their choice of distinguishing stream level semantics from tree level semantics is probably correct if you want to talk about general documents.

I mean an HTML document is much better modelled as a tree of nodes than a long string.
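
The two levels meet only at render time; a sketch of the usual division (hypothetical names):

    -- tree level: structure you can query and transform
    data Html = Text String | Elem String [Html]

    -- stream level: the flat string exists only after rendering
    render :: Html -> String
    render (Text s)      = s
    render (Elem t kids) =
      "<" ++ t ++ ">" ++ concatMap render kids ++ "</" ++ t ++ ">"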


There will be a smidgen of irony if the authors have to convert this to Word to submit to their journal.


I have yet to see an ACM conference that requires Word. I am a pretty new grad student, but still. I think all the CS (esp. PL folks) would throw a fit and boycott such stupidity.

But yes. Oh how ironic that would be. It hurts me just thinking about it.


This. Is. Pure. Catnip. For. Me. What a new year's present. Thanks for sending this along.

As always, a bit miffed they don't mention Asciidoc, which has formal functionality in its core (transclusion, conditionals) without needing template languages or entering extension-landia.

But about that - I'm not convinced that non-natural-language functionality like transclusion even belongs in the content layer. Tech pubs teams always, always, always underestimate the complexity of an interface between natural language and formalist structures (transclusion, conditionals). Ah, and the authors go through this. Boy oh boy, can I say thanks again? I've been looking for a paper in this vein for literal decades, as I lack the deep math.


The argument against for-loops in Scheme returning the result of evaluating the last form seems confused to me. This is what all Lisps that I know of do for all blocks/functions.


The subtitle "Lambda: The Ultimate Document" is much more informative about the content of this paper than its actual title (for those familiar with the series of papers that have generated the programming language Scheme).


Yes indeed, "Passive documents and active programs now widely comingle", but they commingle more widely in ways that this article does not even mention, let alone discuss.

It's interesting to see a discussion of the "boundary of content and computation", but surprising that there is no mention of computational notebooks such as Org Mode and Jupyter, where that commingling takes on another level of interaction between the text of program code and its inputs and outputs.

Perhaps it goes beyond the scope of the article but there is a whole level of documents that its taxonomy does not (yet) consider.


They explicitly exclude computational notebooks in this paper (see page 5). While they don't mention org-mode, I'm fairly certain that a significant portion of org functionality can be understood using their concept of reforestation, at least for org-export functionality.



