TeX and Literate Programming (and Lisp) are my fundamental, day-to-day tools.
Code it, explain it, generate a Literate PDF containing the code.
The programming cycle is simple. Write the code in a latex block. Run make.
The makefile extracts the code from the latex, compiles it, runs the test cases (also in the latex), and regenerates the PDF. Code and explanations are always up to date and in sync.
You may be interested in using Org [1], in conjunction with literate-lisp [2] (for Common Lisp) and literate-elisp [3] (for Emacs Lisp). Org provides various outlining commands (among other things), letting you view your program at different levels of granularity. literate-lisp/elisp advise the Lisp reader so the Org file can be loaded and/or compiled directly, without requiring tangling. Consequently, tools like xref will jump to the source block in your Org file, rather than the tangled source. I wrote a little hack/guide to extend this to errors raised in Emacs' compilation-mode. [4]
(Unfortunately, the package that lets each Org source block behave as though it was using the corresponding language's Emacs major mode - poly-org-mode - has a ton of bugs. It was part of why I stopped using literate programming entirely for later projects.)
I should have mentioned that emacs is also a fundamental tool for all of my work. But since everyone, everywhere uses emacs that would be stating the obvious :-)
I tried orgmode. I even attended a course at CMU that used it for the "live notes". It is excellent for teaching. But it has the same flaw as the "live notebook" idea. There is no generally accepted structure to the approach.
Everyone "understands" books. They have a preface, chapters, an index, a bibliography, pictures, credits, and a table of contents. Literate programs leverage that shared understanding of the structure.
Think of a physics textbook. If you just copy every equation out of the book then you "have the code". All the rest is explanation. They belong together so the explanation and equations (code) are intermingled.
I'm a "primitivist". I work in straight text at an emacs buffer in fundamental mode.
The point of my code is to "talk to the machine". The point of my literate program is to "explain to other programmers (mostly 'future' me)". Note that this is NOT DOCUMENTATION. It is explanation, best presented in book form.
I work with the author of your link [2], and his literate programming documents for Common Lisp are awesome. Watching him during Zoom screen-shares is fun. I am not currently a user of literate-lisp but it is on my non-urgent to-do list.
I'm not really a programmer (I'm a sysadmin), but I've tried the literate way with org-mode for my own stuff, and I use LaTeX for all my "pretty docs" needs. However, while LaTeX excels at document production, literate programming AS A MERE CONCEPT does not work that well for me:
- simple things become uselessly lengthy and prolix
- complex things become too big to read easily, while the equivalent "pure code with comments" plus a small new-developer doc, even if it fails to give an equally deep knowledge of the code base, is FAR quicker and easier to access.
Maybe it's my style, I do not know, but while I do not find LaTeX painful (most of my docs are of the same kind, so a template/class produced once with care gets reused without issues), I can't really digest literate programming...
You may be interested to learn, then, that for the past 10 months I have (slowly) been working on a LaTeX package that does exactly this [1]. It writes all the text from between \begin{code} blocks out to a file, as well as using minted/listings to highlight it in the generated PDF (exactly what you described, but in LaTeX package form).
I am currently a physics student, and one of the main problems I have with day-to-day coding is being unable to include diagrams/equations/pictures in the source code. Moreover, because there is a fair amount of 'domain complexity', the comments are often longer than the code itself.
The package is still very much unstable and not even close to being ready for any sort of production use. However, I did use it for my master's thesis [2], and it was quite an interesting (in a good way) experience. (I recommend everyone actually try writing literate code at least once.)
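As a rough sketch of the underlying idea (not my actual package, just the stock filecontents environment from the LaTeX kernel plus minted, which needs -shell-escape and Pygments):

    \documentclass{article}
    \usepackage{minted}

    % Write the block verbatim to an external file...
    \begin{filecontents*}[overwrite]{snippet.py}
    def greet(name):
        return f"hello, {name}"
    \end{filecontents*}

    \begin{document}
    % ...then read it back into the PDF, syntax-highlighted.
    \inputminted{python}{snippet.py}
    \end{document}

(The filecontents environment has shipped with the LaTeX kernel since the 2019 release, so no extra package is needed for the writing-out half.)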
About 12 years ago, I used LaTeX to write a semantic web book, in two editions: Common Lisp and Java. It was very cool to have 2/3 of the manuscript as common text, and about 1/3 unique to the programming language.
For the last two evenings, I have been revisiting my old manuscript materials because I am thinking of updating the material and also creating additional editions for more programming languages: Python, JavaScript, and maybe Swift and Prolog.
I had forgotten how cool TeX and LaTeX are. It was very easy to start working with again, even after a 12-year gap.
I'm curious what literate style you use? I have tried getting into the style a few times, but I often find I'm not taking advantage of the full scrambling (reordering) of the code that is often possible.
Note that the C program is just a hand translation of the Lisp code.
The lisp code has an explanation and the necessary latex macros. The idea is to scan the latex, find each named code 'chunk', and add each one to a hash table. Then the hash table is scanned to dump the requested chunk to stdout. For example:
\begin{chunk}{part1}
code for part 1
\end{chunk}
Ordinary latex code between chunks.
\begin{chunk}{part2}
code for part 2
\end{chunk}
\begin{chunk}{part1}
this code will be appended to the prior chunk
\end{chunk}
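For display purposes, the environment itself can be a thin wrapper over the verbatim package (a sketch only; the actual macros differ in the details, and the extraction into compilable files is done by the external program scanning the latex source):

    \usepackage{verbatim}

    % A named code chunk: print the chunk name, then the body verbatim.
    % The extractor only looks for \begin{chunk}{name}...\end{chunk} pairs.
    \newenvironment{chunk}[1]%
      {\par\noindent\texttt{#1}\par\verbatim}%
      {\endverbatim}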
I've been struggling with keeping track of research experiments and code at the same time. This seems pretty cool! I like how this method is language agnostic and uses "matured" tools. Question: I'd love to give this a try; do you have any public code snippets?
Even if LaTeX3 brought some nice things to the TeX world, TeX programming is every year lagging more and more behind modern programming languages and practices. SILE has a long way to go before it can compete with math-heavy publications made with TeX, but I think it is already a good solution for typesetting technical documentation or prose fiction.
Better tools and formats have existed for some time; the problem remains adoption. Part of the issue with mentioning a flawed tool is the supporters who come out of the woodwork to insist that the sharp edges are required, and that anything more user-friendly would needlessly curtail the power they require.
Personally, I have given up on Latex and write documents in pandoc markdown which I can then convert to pdf through the intermediary latex.
For prose fiction, I don't see much here that is going to drive people away from Indesign. Designers and typesetters aren't really interested in doing that kind of work in a system that would require generating new output for every correction. Even relatively simple tasks like balancing columns in a spread would become a pain.
The whole point is you shouldn't have to manually balance columns in a spread. Obviously a round trip from source to output to fiddle with balancing every spread would be tedious. But if you can say "For this content I want to use spread X defined as this shape and balanced in this way" and then be able to fiddle with your content and have your spread always work, then where is the tedium? That's what SILE is supposed to do.
Many publishers' workflows involve a round of content editing bouncing a Word file back and forth, then a period where a typesetter uses InDesign or similar to lay it all out, then it goes to press. You can't keep copy-editing after the designer takes over. With a workflow using source documents in Markdown and typesetting handled by SILE, I am able to allow copy-edits to book manuscripts up until minutes before going to press.
Does it come with a context-free syntax? If not, that would be a grave mistake. Being unparseable is one of the biggest problems of TeX, IMO, not least because you cannot automatically work with TeX "documents".
TeX is a program for typesetting, and IMO if one is a programmer, it is best used as a target or output format: do all your "document" stuff somewhere else (don't use LaTeX), and use TeX just for typesetting. You can have your document in some parsable format, but not require it of the output .tex file, just as we expect high-level programming-language source code to be parsable but don't really expect to get much utility out of parsing assembly or object code.
Knuth initially added macros to TeX just as a convenience to save some typing here and there, and the tradition of (ab)using that macro ability as a language in which to program, inaugurated in a small way by Knuth himself and later taken to dizzying heights by others (such as Leslie Lamport with LaTeX, and hundreds/thousands of assorted "package" authors since, and the LaTeX team currently), is IMO a mistake.
If you use TeX just for typesetting, it is quite a nice program that does what you want, and has very good error messages (yes) and debugging output.
> TeX is a program for typesetting, and IMO if one is a programmer, it is best used as a target or output format
That's the only thing I ever found it to be good for. Even then, it's a PITA. The world would not be a poorer place if it disappeared and was replaced by something that separated style definition, markup, and content in a reasonable fashion, and used modern language techniques (just a CFG would be a start) for the first two.
Not totally disagreeing, but note that it is designed to be printed, and modern readers often just look at a PDF, which makes the weights look too light. Also it has some aliasing problem (it may be called something else, but basically, when you view the PDF some words look like a line is skipped, yet when you zoom in or print it out there is no problem).
Also, modern LaTeX is supposed to not use Computer Modern anymore, but Latin Modern (which imitates Computer Modern but fixes some of its shortcomings).
I work with TeX quite a lot and I can't even really understand intuitively the principles of the language design, let alone the grammar. Like 15 years later and I'm still copying and pasting spells. I'd love to see a CST of my document and figure out why I'm writing the characters I'm writing!
My experience with TeX is the complete opposite. I feel like I can do pretty much anything in plain TeX, with a very good feel for what it is doing in its 'guts'. I wrote parsers and compilers in pure TeX (the latter was admittedly a silly academic exercise). It took a very thorough reading of 'The TeXbook', but after that I am constantly amazed at how deliberate and tasteful Knuth's choices were. TeX is simply ... elegant. I am not sure what 'modern' bells and whistles TeX lacks that everyone is so eager to see. CSS is a nightmare; even with recent additions it seems clunky, and most of its choices look like an afterthought.
It is also somewhat puzzling that the same people who advocate for a better grammar would also push the idea of a better markup. So which is it: a better programming experience or a pure markup language that one wants? I feel TeX strikes a near perfect balance here. Most importantly, TeX has stayed nearly unchanged at its core for almost forty years! That is its greatest strength, not a weakness. I cannot see what the latest fads can improve in TeX.
I'm far from an expert, but I use LaTeX, and I distinctly remember there is some oddity such that you have to put an empty comment at the end of a specific line in the middle of some code that generates multi-column figures, or else the layout gets messed up. That sort of thing just shouldn't be possible in anything that has the right to be called "elegant".
>"The % (between \end{subfigure} and \begin{subfigure} or minipage) is really important; not suppressing it will cause a spurious blank space to be added, the total length will surpass \textwidth and the figures will end up not side-by-side."
I think many of the ills of this type can be traced, at least in part, to TeX's only-partial whitespace invariance.
The shifts in behavior with an added newline (or absence thereof) or a comment, as above, simply wouldn't happen if the language ignored whitespace outright.
Of course, requiring tags for newlines/paragraphs would make raw TeX as un-readable as raw HTML. It might be worth the trade for the predictability, but for the era in which TeX was written, one without IDEs or LyX, I suspect that Knuth made the right choice.
I am not going to argue about the tastes here but please consider the fact that TeX is a typesetting language. Ignoring whitespace altogether would make .tex files unreadable (as a comment below states). I am again puzzled by complaints about whitespace treatment in TeX (which has facilities for handling it, like \ignorespaces) at the same time having no objections to the choices made by, say, Python (which to me makes a totally wanton decision about such, also I am not suggesting that you personally are inconsistent in your opinions). Also, poorly written LaTeX macros in this case are not a reason to blame TeX itself.
I suspect that the vast majority of people who say they've used TeX have actually used LaTeX - and don't realize just how massively powerful TeX itself is. Much of the LaTeX macros are quite simple; TeX does the heavy lifting for so much of it.
An amazing thing about the TeX Book is that due to TeX's stability, it's mostly still current. It was written in a day when the graphical computer was a special thing that required a trip to the lab. So the stuff it has about how to run TeX is dated.
But once you know how to run TeX at your "site" (as they would have called it then) all the stuff on the language (that is, practically the whole book) is still good.
> one of the things that TeX can’t do particularly well is typesetting on a grid. This is something that people typesetting bibles really need to have. There are various hacks to try to make it happen, but they’re all horrible. In SILE, you can alter the behaviour of the typesetter and write a very short add-on package to enable grid typesetting.
> For two-sided prints with very thin paper, matching base lines would look much better. Especially in two-column documents it may be desirable to have baselines of adjacent lines at exactly the same height.
TeX usually tries to adjust the space between lines so that a single typeblock is more visually appealing. Unfortunately, doing this adjustment independently for side-by-side columns leads to output where the text lines don't line up, and that is way less visually appealing.
If you do a Google image search for [bible] and look at typical layouts used, I think this might be apparent.
Less obvious is that bibles are often printed on thin paper to fit a big document in a small space, so having things line up on one side of a leaf to the other is more important than in most other books.
Context-free languages have been the standard for decades now. They weren't in the very early days of computing, due to a lack of formal education among programmers and because they come with a certain, but tiny, memory requirement.
What Knuth did with TeX was a violation of KISS. There's no good reason, but several downsides, to mix markup, code, and interpreter state like this.
I am not sure what you mean by context-free languages, because literally none of the modern languages have context-free grammars. This includes HTML and XML, although I would not call them programming languages per se. C++'s grammar is undecidable. On the other hand, TeX's core typesetting language (i.e. not using its macro facilities) is context-free. Quite trivial, in fact. Pascal, which is the language used to build TeX, is context-free (even LL(1)) as well, which cannot be said about, say, C.
I would also be quite hesitant to claim that Knuth, who literally pioneered modern LR parsing theory (and deservedly got a Turing award for it) somehow lacked knowledge or awareness of cutting edge parser design techniques.
Finally, there are absolutely good reasons to design TeX as Knuth did. That TeX has withstood the test of time for over three decades is a perfect testimony to that.
As someone who is intimidated by the thought of programming TeX, I think the following comment, which asserts that it is badly designed, seems unlikely to be coming out of nowhere. (MetaFont is apparently better than TeX.)
I'll quote the key part here in case that link stops working.
> TeX has two programming systems, the "mouth" (which does macro expansion essentially) and the "stomach" (which typesets and does assignments). They run only loosely synchronised and on-demand.
> For programming purposes, they are a pairing of a blind and a lame system since the "stomach" is not able to make decisions based on the value of variables (conditionals only exist in the "mouth") and the "mouth" is not able to affect the value of variables and other state.
> While eTeX has added a bit of arithmetic facilities that can be operated in the mouth, as originally designed the mouth does not do arithmetic. There is a fishy hack for doing a given run-time specified number of iterations in the mouth that relies on the semantics of \romannumeral which converts, say, 11000 into mmmmmmmmmmm.
> Because of the synchronisation issues of mouth and stomach, there is considerable incentive to get some tasks done mouth-only. Due to the mouth being lame and suffering from dyscalculia, this is somewhat akin to programming a Turing machine in lambda calculus.
> TLDR: the programming paradigm of the TeX language is awful.
This description of TeX's process (originally by Knuth himself) is quite tongue in cheek but I fail to see why this makes TeX's programming paradigm 'awful'. Think of it this way: both C and C++ (even Rust has macros, which I heard is the new hotness) have a preprocessor, which is exactly what TeX's 'mouth' is. The fact that the language of the preprocessor looks very similar to the main language does not make it weird (Rust again...). 'Loosely synchronised and on demand' is a very uncharitable description of what is happening: would you call JavaScript paradigm awful just because events are asynchronously processed?
One tricky part of TeX is output routines running asynchronously with the main paragraph typesetting algorithm. This is indeed a compromise made by Knuth simply because in his days the thought of keeping the whole book in memory (which is what a perfect page breaking algorithm would require) was pure science fiction. A mildly augmented design will fix this (indeed, some modern implementations offer exactly this) but the solution found by Knuth is still incredibly beautiful.
Also, 'the mouth does not do arithmetic' is not true at all. While I am not a big fan of LaTeX, its developers are incredibly talented programmers and they have implemented a fully expandable (in TeX parlance, taking place entirely in its 'mouth') floating point library in TeX (it uses some eTeX extensions for efficiency but they can be avoided as well; I speak from experience, since I do not use eTeX).
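To make that concrete, here is a small sketch of expandable ('mouth-only') computation; \fpeval is the LaTeX team's l3fp floating-point engine (via the xfp package or a recent kernel), and \numexpr is the eTeX extension mentioned above:

    \documentclass{article}
    \usepackage{xfp}   % \fpeval: fully expandable floating-point evaluation
    \begin{document}

    \fpeval{2^10} \par          % 1024, computed entirely by expansion
    \the\numexpr 6*7\relax \par % 42, via the eTeX \numexpr extension
    \romannumeral 11000         % mmmmmmmmmmm, the \romannumeral trick from the quote

    \end{document}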
Finally, please do not let your fear stop you from enjoying TeX: most users do not do any serious programming and its output is aesthetically superb. If one day you decide to write some tricky macros, there is a vast TeXlore that awaits.
I'll probably try SILE just because it does not support macros.
I personally think the adoption of macros is the embodiment of non-intuitive programming languages, and why a modern programming language like Rust supports macros is beyond me.
I'm not alone in this regard: D, a modern successor of C and C++, does not support macros.
Hopefully one day Walter will write a document on "Macros Considered Harmful" to enlighten the subject.
I respect your stance (although I disagree strongly) but you might consider something other than Sile, as Chapter 6 of Sile's manual is called 'SILE Macros and Commands'. Also from reading about D, his strongest objection to using macros is that they do not respect the scope. This is probably the main source of macros' strength as code generators. Again, I respect your aversion to macros, I am not sure I understand the reasons, though.
The SILE approach to macros is very limited and sane. I think that's on purpose, to make sure it's simple to understand, intuitive, and maintainable. More complex activities or programming are delegated to Lua, as mentioned in Simon's TUGboat paper:
> SILE’s \define command provides an extremely restricted macro system for implementing simple tags, but you are deliberately forced to write anything more complex in Lua. (Maxim: Programming tasks should be done in programming languages!)
It's technically not a macro, as mentioned on the same linked page:
>C preprocessing expressions follow a different syntax and have different semantic rules than the C language. The C preprocessor is technically a different language. Mixins are in the same language.
D shows that it's possible to be a very powerful and potent programming language for DSL, etc, without all the conventional macro abuse and misuse.
If anyone asked me how Python has become so popular nowadays, the answer would be that it's one of the most intuitive and user-friendly programming languages ever designed, and that's mainly due to the fact that it does not support macros [1].
It's not a C preprocessor macro. Those are a very specific and limited subset of macros, and that is the main reason why they're criticized.
Broadly speaking, if something can generate parametrized chunks of AST on the go, it's a macro. Which is exactly what D mixins do.
As for Python, it doesn't really need macros because everything is runtime. But if you need it, compile() and eval() are there, and they can be abused much worse than any macro facility ever could.
It really depends on your definition of macro [1]. Modern programming languages like Scala, Julia, Nim, Rust, etc. support hygienic macros.
According to the D authors, mixins are not macros because mixins still look like the D language, while supporting macros would mean including a macro facility (hygienic or not) that, depending on the implementation, more often than not renders the language unrecognizable (i.e. non-intuitive). Again, if someone purposely writes a DSL in D for generating a new language's AST, that's perfectly fine. Essentially the output is another programming language, because it is the intentional product of the exercise, but the original programming language remains intuitive.
I suppose you can have runtime macros, as in VBA, but Python's language designers refused to incorporate them due to the issues mentioned beforehand.
Surely hygienic macros are a subset of macros, as follows from their name?
I still don't see why D's mixins aren't hygienic macros. And the D documentation doesn't make that claim, either - it only highlights the difference with C preprocessor, not macros in general.
OTOH I can't think of anything in VB6/VBA that resembles macros, runtime or otherwise?
Mixins are not macros, according to the D authors. If you dig into Walter's past posts, perhaps you can find better explanations there than mine.
You can have macros in any runtime, for example in Ruby, and that's the main reason for the maxim "Rails is not Ruby". That's also the reason Ruby is so powerful, and why creating a similar feat in Java would be close to impossible [2]. But at what cost? That's why Python is now much more popular even though Ruby is much more powerful. You can, however, create a Rails clone in D without all the macro nonsense [3].
> I work with TeX quite a lot and I can't even really understand intuitively the principles of the language design let alone the grammar.
Read the TeXbook. No, really, it is a beautifully written text. The program it describes is quite elegant. Most of the complexity comes from modern LaTeX packages, not from the core tool.
TikZ is the worst for this. I don't usually have any problem picking up esoteric programming languages, but whatever the hell is going on with TikZ/TeX will be forever beyond my comprehension.
Oh yes, I find TikZ very useful and it's an amazing piece of engineering. My point is more that I find it very difficult to understand how it works on a deep level.
It's damning that there is no truly functional grammar checker or spellchecker for LaTeX, even ignoring the crazy things that can be done with macros.
If some billionaire wanted to 'move the needle' and change society for the better, I think building a truly better LaTeX would be worth it. Each year some of the most brilliant people in the world sacrifice time on the altar of LaTeX that could instead be spent on more productive research.
> Each year some of the most brilliant people in the world sacrifice time on the altar of LaTeX that could instead be spent on more productive research.
Perhaps. But I'd wager that more time is lost to the bureaucracy of universities and grant administration than to fighting LaTeX quirks. I'm not sure that "fixing" (La)TeX would result in a noticeable improvement in research that could be done.
I'd like to know what a "better LaTeX" means, and how that would move the needle, given that the only things you can't easily do in modern (Xe)LaTeX are things that you shouldn't be using TeX itself for in the first place, but have other tools do so you can simply embed their output. Especially given the availability of WYSIWYG-esque editors with "one-click" compile buttons that run all the nonsense that we used to have to do by hand back in the dark days of "you want a GUI? Get out of my office, you simp" paired with the modern "just install missing packages without asking me, that's the whole point: I do the writing, you make that frictionless."
I have a difficult time with this comment. Everyone I know who uses LaTeX, including myself, is painfully aware of its shortcomings and fairly steep learning curve.
Maybe the graphical environments eliminate some of the pain around embedding images or using wrapped figures.
But there is no way they manage to get around the fundamental lack of composability: that sinking feeling you get when you add just one more thing, or try to put an X inside of Y, and the whole thing falls over in a pile.
As someone who's written multilingual books with custom typesetting needs in it, the learning curve can definitely be steep, but as someone who's also written regular old papers in it, the learning curve is "use our template", and that's kind of it. The only real learning curve is how to write maths, as long as you don't fall in the trap of trying to make LaTeX do graphics. For the love of all that is sane, use the tools best suited for that job instead. For everything else tex.stackexchange.com already has at least three posts covering any "how do I..." you might ever have.
And as someone who's even written packages: ...you're going to end up needing low level TeX, and TeX is insane, don't try to write packages. It's a world of hurt. (But then most people will never need to write their own packages)
Can you talk more about what you mean by the lack of composability? Or, of course, link to some page(s) that makes that point?
A proper programming language that is not meant to carry old silliness such as "oh, back in the day memory was important, so you have 256 registers", or "fixed-point arithmetic should be fine", or "it's meant for typesetting, don't use it for complicated stuff, so a figure can float 15 pages away, whatever".
We have PDF as the de facto medium for output and text files for holding the source. A better TeX means we are not using archaic madness to go from text to PDF but are able to utilize modern IT tools.
TeX is just old and died decades ago. Academia is holding it hostage because papers...
Disclaimer: ex-academic and a low-key maintainer of TikZ
Edit: TeX as a language is horrific, exactly what you would expect from a computer scientist, so it's not that Knuth is to blame; it's the rest who didn't get his vision, such as academia still leeching off it instead of using taxpayer money to build a proper tool.
So... XeLaTeX? Which has been around for over 15 years now? Modern unicode support out of the box? check. Compiles to PDF instead of DVI because obviously? Check. Normal system font support? very check. More than 256 registers? "what the fuck is a register and why would you ever need to care, it's not 1980 anymore."
Equal disclaimer: low-key maintainer of ucharclasses.
No difference, same TeX behavior with font support. Using system fonts cannot be a highlight of a tool in the 2020s. Mostdef not modern. Also, cough... "xdvi"... cough.
This is a red herring, I assert. There is no "boiling the problem down to a simple grammar" that has promise for a total solution. There are plenty of ways to combine the different parts that work well, and even more ways to combine them into nonsense. Disallowing the nonsense is probably not going to make the ways that work well any better.
Yes it comes with its own context-free syntax that (despite some surface resemblance to TeX) is quite easy to parse. It can also ingest XML and some other formats.
TeX is one of those bits of software that's so complex to replace, it'll take another few decades (and likely a few false starts) before we can get there.
The fundamentals of TeX's typesetting are amazing, but everything else is just bolted on. You'd probably want to rebuild it with its own, custom language, perhaps inspired more by modern XML/markdown rather than the old TeX language.
The biggest problem that we're going to have moving past TeX is that it's something of a standard for math representation in text. MathML never _really_ took off. Re-training millions of people skilled in typesetting math is going to be _tough_.
TeX itself is not really complex. The implementation is fine. The hard part is the ecosystem.
Any contender is going to face a daunting task of explaining why adopters should throw away 50 years of collective work of thousands of people that went into TeX packages in CTAN. Math notation is at best 1% of the reasons why people use TeX. Packages are.
The 66-page "Breaking Paragraphs into Lines" is a research paper, and much of it is is the "complexity" of the problem itself, rather than TeX's elegant solution in particular (e.g. pages 48–59 are about history). In a sense, the generality shown in the paper (the various different things that TeX's algorithm can accomplish) shows how TeX reduces the real-world complexity of typesetting with a simple approach that covers all of them.
The book "Digital Typography" is, somewhat misleadingly, not a book that was written about digital typography, but simply a collection (https://cs.stanford.edu/~knuth/selected.html) of several papers that Knuth wrote about TeX and Metafont and some related topics, over several decades.
In any case, I agree with the comment you're replying to: if you use TeX itself without dragging behind you the entire ecosystem and all the packages (in particular, use plain TeX rather than LaTeX) you'll see it's not really complex, and the implementation is fine. Especially today, with things like LuaTeX scripting (and, say, opTeX) it is feasible to bypass LaTeX and packages and do things yourself — that is, unless external circumstances require you to use LaTeX and packages etc, which is still often the case (e.g. submitting papers to journals).
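As a small taste of the 'do things yourself' route with LuaTeX (a sketch; plain TeX, no LaTeX, no packages; compile with luatex):

    % \directlua runs Lua code; tex.print() feeds its output back into TeX's input.
    \directlua{tex.print("The answer is " .. 6*7)}
    \bye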
I think the argument comes down to complexity. Getting the Markdown → Pandoc feature set up to a reasonable fidelity is the next stage, IMO, especially if Markdown can be made to support the most commonly downloaded TeX extension packages.
Not going to happen. If you look at the CommonMark discussions, you will find people adamant that X features not be included for varying reasons.
I think the best future option is going to be djot[0]. It is being created by the author of pandoc, who might possibly be the most qualified person in the world to appreciate all of the nuances of marking up text and parsing it.
It's a clean-slate design. I think pandoc markdown is a superset of markdown, which must parse the ambiguous parts of the spec. Djot has a goal to eliminate the corner cases and be able to represent all things without requiring the HTML escape hatch (e.g. bold text within a word).
Thanks, that makes sense. Although I’d like to point out that one of the listed problems with commonmark, and the problem you mention, are not problems with Pandoc markdown:
You make markdown support TeX by running it through a TeX compiler. The other way around is never in a million years going to happen, because markdown is all about not adding additional markup like \commands, or extensions without which a markdown renderer wouldn't be able to generate the content it's supposed to.
(which is why markdown still doesn't officially have any maths support, only certain non-standard flavours do)
I never figured out the purpose of MathML. It's ugly/impractical to type as a text-based format. Verbose and inefficient compared to a binary format. A middle ground some people might like, but I can't stand.
I'd prefer one of these:
- A human-writable text format that's displayed in a formatted way, eg TeX or MathJax
- A binary format that represents structs and enums in code with a clear documentation of how it's packed and unpacked. That can be formatted for reading, and perhaps has a TeX-like API (or is just constructed programmatically using an API)
MathML, in contrast, provides neither the speed nor the small size of a binary format, while being unreadable and unwriteable.
I think it was designed as a low-level browser standard, not an authoring language. It is supposed to provide a stable, standardized target for authoring languages to compile to. Think of it as WASM for math markup. It makes sense if math authoring languages are an unsettled design space where we need diversity and experimentation. As a browser standard, like WASM, MathML should enable people to experiment with new ideas and hopefully evolve a better authoring language than we would have got if we had tried to invent a language and standardize it at the same time.
Ideally there would be a replacement for TeX that shares little of its non-math typesetting capabilities but retains (or only slightly modifies) the math syntax. The rest of the system is where the problems lie, or at least where I waste a lot of time on little things.
I always had the hope that Lout[0] would take over LaTeX: very clean & reasonably small implementation, functional programming instead of macros. I guess it's just hard to move against the inertia that (La)TeX carries.
I've had a look at Lout years ago, but I think the development stalled a bit and the source wasn't that inviting for contributors. So the PDF support came (too) late and still isn't feature-complete with the PS output.
I am still fond of the Scribe[1]-like syntax, but then again, I also liked it in Texinfo or Borland Sprint.
I did typeset my master's thesis in ConTeXt. It worked okay enough, but for me it didn't provide any features that make up for the increased time spent searching for stuff, because there's less documentation and a much smaller community.
I also used it to typeset a project report which had some MS Word-specific formatting rules. I found it more customizable than LaTeX in that regard, but I did have to read the docs a lot more.
I didn't touch latex in years and I don't regret it one minute. Yes the result looks very clean, and yes it's the best system for writing math, but controlling the layout was an everyday fight.
I have yet to see an environment in which controlling the layout doesn't become a nightmare pretty quickly.
The entirety of the web world, with its Rube Goldberg interactions between way too many divs, isn't exactly giving me inspiration that modern approaches are going to be the answer here.
I found even typesetting math was a struggle for the layout, I still have nightmares of getting equations for my wife's PhD thesis split nicely across 2 lines.
I've never attempted to change the layout substantially except for using different document classes. I just hand off the manuscript at some point to professional editors and typesetters at an academic publisher and they do their thing. It still beats everything else as an authoring tool and source for professionals to start working from.
On a scale of LaTeX=0 (absolutely inscrutable) to Rust=10 (holds your hand through the fix) I would say SILE is about a 5. Caveat, I speak as one of the authors. It helpfully traces the location of any problem to an exact location in your document, mostly clarifying what it was trying to do, has a trace stack with code locations for everything it was trying to do leading up to the problem, and in many common cases has sensible descriptive errors. That being said if even as a contributing author I only give it a 5 it clearly has room to improve!
This is the most exciting feature of Sile to me -- since I currently rely on a slow and memory hungry Java toolchain to convert docbook to pdf with fop -- but I just tried it on two docbook documents (a book and also a short article) and it failed to convert both. Apparently the docbook support is currently very incomplete:
Yes, I think Sile has a lot of promise but it really needs some work done to bring these features up to scratch for end users. And inevitably volunteer time is in short supply...
Actually it just takes XML as an input format, but depending on your XML you need to provide a class that defines how to typeset each tag. For Docbook SILE has an example class with about 40% of the possible tags defined to something sensible. If you want to typeset Docbook you'd need to round that out to cover your use case. To date it is only supplied as an example of how to process XML.
SATySFi (pronounced in the same way as the verb “satisfy” in English) is a new typesetting system equipped with a statically-typed, functional programming language. It consists mainly of two “layers” ― the text layer and the program layer. The former is for writing documents in LaTeX-like syntax. The latter, which has OCaml-like syntax, is for defining functions and commands. SATySFi enables you to write documents markuped with flexible commands of your own making. In addition, its informative type error reporting will be a good help to your writing.
The main problem is that a lot of the documentation is in Japanese.
I've been using texmacs for a little over year now, and have loved it. For me, the nicest part is much simpler creation and editing of tables and aligned math. I also appreciate the image insertion; in LaTeX my images often end up not quite where I want them (I know, you can specify "here" for your images, but I'd prefer if they were just "there" to start with). Of course, depending on your use case, your mileage may vary.
For those who expect troff to be an ancient relic, Neatroff and Heirloom troff include modern features like microtypography. troff is still great for a lot of applications.
My needs are pretty simple - a program that makes grocery lists that I print. groff with the ms macros works very well. I used to use LaTeX but never figured out how to use it without installing gigabytes of stuff. This is my fault - surely I could have learned plain TeX, or learned how to work the TeX distribution to only install the stuff I needed. But ultimately it was just easier to get going with groff.
I'm excited for the project, but disappointed that they kept the arcane syntax of TeX/LaTeX. Math expressions are hideous to write and read in LaTeX syntax.
I actually love the TeX syntax for math! What I do hate about LaTeX is the arcane document formatting options—i.e. drawing a figure in place requires an entire package to be imported!
Weird, for me it's entirely the other way around. At least there exists a package for drawing a figure in place, but consistently defining and using semantic commands for math is pretty much impossible. Whenever I'm able to actually use semantic commands, I can write LaTeX math pretty much as quickly as I can write math by hand, but getting these commands to work has been a consistently terrible experience for me.
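For what it's worth, here is the kind of thing I mean by semantic commands when it does work (amsmath/amssymb assumed; the command names are only illustrative):

    \usepackage{amsmath,amssymb}

    % The source says what the object is, not how it is drawn:
    \newcommand{\abs}[1]{\left\lvert #1 \right\rvert}
    \newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
    \newcommand{\inner}[2]{\left\langle #1, #2 \right\rangle}
    \newcommand{\expect}[1]{\mathbb{E}\!\left[ #1 \right]}

    % Usage:
    % \[ \norm{u} = \sqrt{\inner{u}{u}} \qquad \expect{\abs{X}}^2 \le \expect{X^2} \]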
SILE can also read XML or Markdown. I use it for publishing books from Markdown via Pandoc (there are several paths from Markdown→SILE).
And for the record the native SIL syntax resembles TeX at first blush but is actually vastly simpler because it is regular and uses a smaller set of possible syntax variations. If you don't like the look or feel of the syntax you can use something else for your source format and provide your own reader that generates a document AST.
You probably meant ∑ (\sum) rather than Σ (\Sigma). The two are different symbols, and they are sometimes used in the same expressions. For example, in the field I (used to) work, a sum over all characters in the alphabet would be written as \sum_{c \in \Sigma}.
Mathematical syntax is complex, and the typesetting often becomes unreadable outside trivial cases. Using Unicode won't help much, especially because it's easy to confuse similar-looking but unrelated symbols. Proper syntax highlighting might help, but I've never seen a tool doing a good job with it.
The first obviously-better idea is to use a/b instead of \frac{a}{b} and (x) instead of \left(x\right).
The next obvious improvement is to allow (not require — allow) me to write α instead of \alpha, ∈ instead of \in, and ≤ instead of... whatever the code for that was.
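(For the record, something close to this already exists: under XeLaTeX or LuaLaTeX, the unicode-math package accepts the Unicode characters directly while the classic control sequences keep working. A rough sketch:)

    % Compile with XeLaTeX or LuaLaTeX.
    \documentclass{article}
    \usepackage{unicode-math}
    \setmathfont{Latin Modern Math}
    \begin{document}
    \[ ∀x ∈ A : α(x) ≤ β(x) \qquad \forall x \in A : \alpha(x) \le \beta(x) \]
    \end{document}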
Microsoft recently improved the math notation support in Word. You can press [alt]+[=] on Windows or [control]+[=] on OSX to enter "Math Mode" and begin typing mathematical characters. Word tries to guess what you mean as you type, and it's still much more cumbersome than typing plain-text code in (La)TeX; you still have to reach for a mouse for some notation.
This doesn't work well when XML is not hand-edited, but rather a dump of some in-memory structure - there are many cases where e.g. order is not important, so it can be essentially random during serialization, and that then shows up as spurious diffs.
In typical Microsoft fashion it has enough to check the box but not enough to actually be useful.
Does it generate diffs/patches that ultimately can be used to reconstruct the document? Can I take a document and apply a patch and get exactly the same as you get? Can I reorder and edit patches? The answer is no.
Interesting, I didn't know Tex users would want those sort of facilities. I've used Word with track changes in enterprise environments and it does the job, I've never had to use patches for it though (and wouldn't).
For enterprise environments, where demands usually are low to non-existent, Word does fine. In academic environments, when journals are involved, when a document is being worked on by several people simultaneously (in varying meanings of that word) and when data may come from several different outside sources, it is massively inadequate.
I think there is demand for these features among Word users in the enterprise. Version tracking and reusing templates are a huge productivity boost and also help reduce mistakes. It's just that enterprise users have never thought of it, because they don't know it is even possible.
Word with integrated git-like history would be a godsend if someone could develop it. As it is, it's easier to either cripple along with track changes and friends or teach everyone how to use Markdown or LaTeX.
The LaTeX input for Word equation editor makes it a lot faster, but it is still quite limited and often doesn't do what you expect for more complicated things. It makes the easy things easier, but the hard things harder.
LaTeX is an art form. The output is perfect, based only on the code used to make it. Typesetting mathematical expressions or even creating complex diagrams using a library such as `tkz-euclide` is just code. There is no WYSIWYG. No invisible bold spaces[1]. No forgetting which changes you made to try to get it to look right. Everything is right there in the code.
(I am aware of the difference between LaTeX and TeX; I likely haven't used TeX directly but the comparisons here should still work fine.)
There is a huge difference in workflow between generating a document from a complete code file and iterating on a final result with incremental tweaks. That is why I cannot use Acrobat, Word, Google Docs, etc. for creating papers that I would use LaTeX for. It is just not the same, and it is not the workflow I desire.
One of the most important things I've found for making LaTeX documents is do not worry about what it looks like until you're almost done. Otherwise you fall into the rabbit hole of adjusting things for hours to find your work is wasted anyway because you changed that page later.
But once the text is done, it's really nice to work with, and you never "accidentally break the whole document by adding a table" - at least not irrecoverably.
I simply don't like those binary formats. I can't use sed, grep, etc. with .docx files. My workflow for doing notes is primarily CLI-based, and thus using Word/Libreoffice is something I avoid if possible and if it makes sense.
Another pro is that I can use git for tracking changes.
Adobe's Acrobat editor is painful and barely usable even for ordinary prose. There's no way I'd try it for math. To be fair though, it does not seem intended for heavy-duty editing - it's more for: someone gave me this PDF, I don't have the source document, I need to make some little tweaks directly to the PDF.
Yeah Acrobat and ABBYY editors are really only useful as a last resort when you don't have time to recreate someone's PDF but you really need a couple words changed.
But they are life-savers when that's your situation.
Some more points: collaboration using e.g. git and its ecosystem; a representation that does not force you to use proprietary software (not sure about LibreOffice for formulae); and the ability to easily produce .tex files with other scripts, e.g. to compile maths exercises from a set of templates.
The title is inaccurate. It uses TeX's algorithms, seemingly (LuaLa)TeX-like syntax too, and "modern standards". But those don't justify calling it a rewrite. "Modern TeX-inspired system" is a better fit.
Edit: I initially said none of the upstream maintainers call it a rewrite, but I was wrong because the original author did in fact use that turn of phrase on several occasions including in the manual and in a talk.
I agree, though it's not an ideal way to describe it, because it has so many differences too. It's not a port of TeX to a different language. As you suggest, TeX-inspired is much nearer the mark. It's a from-scratch effort at addressing roughly the same problem space. It does re-implement some of the same algorithms. One of its input syntaxes resembles TeX (although the resemblance is not even skin deep). But no, it is not a rewrite, just a new take.
I've been using TeX for decades. No, not LaTeX, just TeX as in D. Knuth's The TeXbook.
I've been thrilled with TeX from my first usage to the present. Of course, when I want to write some math, I use TeX. I also have a collection of TeX macros I wrote for verbatim, cross-references, annotation of figures, foils, ordered lists, simple lists, etc. TeX is my standard for any good or better quality writing, e.g., serious letters. For my last published paper in applied math, right, I did that in TeX (I don't like to publish -- seems financially irresponsible). The journal was very happy to get my TeX source. I also included the source of the few macros of mine that I used in the paper, and the journal was also happy to receive those. For the core, original applied math for my startup, I wrote that in TeX.
Net, I really like TeX.
For "parsing" TeX as in this Hacker News thread, I have no idea what that might mean or why I would want to do that. TeX, just the way Knuth designed and documented it, is just fine with me. For me, TeX solves a big problem; I'm just thrilled to have that problem SOLVED; and I have no desire to invest time or energy in another solution to the problem.
Also relevant, in praise of simple text:
Uh, my most heavily used program is my favorite text editor, Kedit. To me, the most important view, at the most important level, of computing and/or computer usage is just simple text, i.e., it is still like old fashioned typing. So, I have 100+ macros I wrote for Kedit, and some of those help with typing TeX.
"Simple text" for computing? Yup. Early in my startup, I decided to go with Microsoft and Windows instead of Linux or other versions of Unix.
For the 100,000 lines of code (~24,000 programming language statements and the rest comments or blank lines) for my startup (a Web site), I wrote that with just Kedit. Once I tried to use Microsoft's Visual Studio, and for just the start on just a first program in Visual Basic .NET, I got a big directory of a lot of files I didn't understand. I could see I would need to invest a lot of time and energy into getting the software for my startup to run via Visual Studio and could see no important reason why I should make that investment. I've been happy with that decision: for Visual Basic .NET, I type that in via Kedit.
In praise of Visual Basic .NET:
I know; I know; according to a lot of people and industry norms, I'm supposed to use C, C++, C#, or other programming languages in the family of C and certainly nothing called basic. Well, as I recall, the original C documentation admitted that the syntax of C was "idiosyncratic". E.g., it appeared that

    i = ++j+++++k++

was legal -- increase each of j and k by 1; add them; assign the result to i; then again increase each of them by 1. Once I tried this statement on two C compilers, and they didn't agree on the results! So, it seemed that the syntax was too "idiosyncratic" even for the compiler writers! That was enough to warn me to stay away from C, and mostly I've been successful at that!
Then much of what I like and want from Microsoft is their .NET software and especially their documentation. As far as I can tell, Visual Basic .NET (VB.NET) is a perfectly good way to make full or nearly so use of .NET and the CLR (common language runtime or some such), exploit their documentation, etc. And with basic I get traditional programming language syntax. To me, that was enough evidence -- decision made. Problem solved. TODO list item checked off.
Yup, at one point my VB.NET calls some C code, right, LINPACK -- apparently the way to do that is to use "platform invoke", and I did and it works fine. I'm still happy as a clam. So, right, I type in my VB.NET code with just Kedit -- been thrilled! VB.NET and Kedit -- happy as a clam!
For TeX, I type that in via Kedit. For email, Kedit. For forum posts, usually Kedit. For working with collections of files in the Windows file system NTFS (abbreviates maybe New Technology File System), Kedit. For my log of food, exercise, sleep, ..., Kedit. Recipes, sure, Kedit. Shopping lists, Kedit. A very important file, my most important, of various facts, short notes, references, and links, right, Kedit. Generally I like to use just simple text, and, thus, Kedit for as much as possible.
For this thread and its "Sile: A Modern Rewrite of TeX", I looked at it and could make no sense out of what it was or what it was for. There was something about hyphenation in Turkish! Looks like I should stay with TeX!
I have really enjoyed reading this! My 'suite' is TeX+Emacs+C (although I write occasional code in Python, Perl, JavaScript, and a few others, a first love is a first love... ) but Kedit and BASIC sound fine to me:)
Am I having a stroke or are the examples not looking good? Subtle kerning issues and the page layout reminds me of a Word letter page. Maybe I'm not understanding its potential.
I don't know enough about the system, but good text justification is only the first step in a good document. TeX, through LaTeX and the various report/book/etc environments has centuries of man time dedicated to getting the best possible output for a given document type.
I dunno - that doesn't help anything with the layout. I still see overlapping text, tables with crooked columns, bad spacing between paragraphs and titles. The kerning got better, even though there is still something wrong with the stroke weight of the font. Maybe it's just a really bad font, and not something related to Sile. Choosing a bad font to showcase a typesetting system is a bit of a red flag, IMO.
I think the font choice is a lot of it, at least. Sile may be a great typesetting system technically, but the examples kind of look, well, like they were put together by someone who maybe doesn't know much about good layout and typography.
I have found no better toolset.