Another LaTeX-to-HTML tool is lwarp (https://github.com/bdtc/lwarp) which starts from the idea that there only exists one program that can parse LaTeX: the LaTeX compiler itself. Implementing a new parser is almost futile. So instead, the lwarp package redefines all the macros to output HTML. Something like
\renewcommand{\textbf}[1]{<b>#1</b>}
This way, compiling LaTeX gives you a PDF whose text is HTML code, so now you can extract the plain text from it and you have an HTML file. The advantage is that it can easily deal with custom macros etc., because these are natively resolved by the LaTeX compiler.
I use lwarp to make https://tikz.dev/, an HTML version of the TikZ manual, which is probably one of the most complicated LaTeX documents in existence.
You are the author of tikz.dev? I have always thought it was made by the tikz author. Mad props to you, the site is very functional and helpful to me. With it, using tikz feels a bit less like a chore.
Sphinx and reStructuredText are, IMHO, underrated powerhouses of document building. With extensions, you can hook them up to Zotero (or whatever)-managed bibtex files. You can render to beautiful HTML files, and you get latex PDFs and epubs for free. First class latex-math support, plenty of integrations with things like mermaid, graphviz, and the ability to build super-powerful custom directives to do basically anything. And way simpler/easier than pure LaTeX.
It is too complex compared to Markdown and hasn't got enough features to be comparable to Latex. And I still (almost) use the same Latex templates that I used at university, 25 years ago.
I feel the complexity is justified. One of the biggest gripes I have with markdown is that you never know whether your markdown implementation is github flavoured or some other implementation. Not to mention Sphinx checks that your links / references to other pages exist and gives you warnings if they don't.
One of the selling points of PDF is that it is a single self-contained file. I found this lacking in Sphinx and wrote an extension for it to zip and bundle the assets into a single HTML file: https://github.com/AdrianVollmer/Zundler
Also works with HTML documents produced in other ways.
If you just run sphinx-build with the latex builder and then run xelatex or pdflatex on the result you'll get one fully-consistent PDF with everything in it, including fully functional internal hyperlinks. That's what I do for PDF. I can make big documentation packages this way, building 2000 page pdfs in a minute or two on a modest laptop.
In the product of the singlehtml builder, you will have the entire document in one single DOM tree. For large documents, even modern browsers on a modern machine will be brought to their knees.
This is a huge document, and having this all rendered naively in one single page will not only be hard to navigate, it will also feel really sluggish if not crash the browser.
You're getting close to making your own CHM format, which Sphinx could make for you.
I always thought CHM files were a nice self-contained option for multi-page HTML docs. (Though they'd happily execute whatever JavaScript the author embedded in there... Maybe that's why they fell out of favor?)
It would be great if there was an open CHM-like format that was supported by all major browsers. The nice thing about browsers is that everyone already has one installed. They can even open PDFs natively these days. Sadly, they cannot even open epubs (which is almost like CHM without interactivity). I believe Firefox used to be able to open epubs, not sure what happened.
Edge supported epub until the bitter end of the Spartan renderer. It was only Microsoft's attempt at an ebook store that died long before that. Admittedly, most people's visibility into Edge epub support was through the Store and the sidebar dedicated to store purchases, but if you had no other book reader app take over the .epub file extension (or if you realized that you could drag and drop DRM-free .epub files into new tabs) Edge would still read them right up to the Chromium switch.
I think it was too. I also think a lot of people missed that there was an app in the Microsoft Store from some team adjacent to the Edge team at the time called the boring and easy to overlook name "Reader" that just had the PDF and EPUB viewers from Edge in a file-based UI instead of browser chrome UI. It was such a useful app and you could set it to default for PDF (in Windows 8 and the early years of 10) and EPUB files (in early Windows 10, with some effort). I never understood why their ebook store effort focused on a sidebar in Edge that didn't work like anything else in Edge instead of beefing up a file-based app like Reader. Reader also died when Edge went to Chromium and I still miss it as a lightweight and fast PDF reader.
I write a fair amount of reports professionally and I use word.
Getting data from my Python analysis into the reports is tedious at best, and updating numbers at the last minute is hair-pullingly frustrating.
But because of the good wysiwyg I can cheat on my adjustments when I need a graph to go “just there”, I can edit my paragraph wording such that I don’t get an almost completely blank page in between sections, etc, etc which is important to make a good looking report, imho.
How do you go about that with rst? I’d love to write a templated rst file that can be fed from my excel sheets and Python scripts, but how do I go about final layout adjustments?
I've gone a few routes. I have used sphinx's singlehtml builder to make a huge HTML file and then used pandoc to convert it into docx for final adjustments. This worked surprisingly well on a 2000-page document. But it's a bit kludgy.
Another (non-Sphinx) thing you can do is just write (portions of) your docx reports directly from Python using python-docx [1]. I use this approach when people give me strict docx templates that need to be filled in from Python in a very specific way. It can drop data-generated tables in at special placeholder sections and everything.
I will say that I've been more and more happy with just using sphinx straight to pdf for very professional looking reports. Given some latex preamble work in the config you can get it looking quite nice. I haven't personally struggled recently with too many egregious formatting issues on the sphinx-built latex stuff. You do have to swap over to landscape mode for large tables, etc. so it takes some work. But you're right that in many cases, formatting issues do still happen, so YMMV.
Another neat trick in sphinx is the csv-table directive [2], which loads table data directly from a csv file you have around, which you can obviously get from your xlsx.
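For anyone wanting to try the csv-table route, here is a minimal Python sketch (not the commenter's actual setup). It writes rows to a CSV file and produces the matching directive text. The function names, the file name results.csv and the sample numbers are made up for illustration; `:file:` and `:header-rows:` are standard csv-table options.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write a header row plus data rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def csv_table_directive(title, csv_path, header_rows=1):
    """Return reStructuredText that loads csv_path through the csv-table directive."""
    return "\n".join([
        f".. csv-table:: {title}",
        f"   :file: {csv_path}",
        f"   :header-rows: {header_rows}",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical analysis output; in practice these rows come from your own scripts.
    target = os.path.join(tempfile.gettempdir(), "results.csv")
    write_csv(target, ["sample", "mean"], [["A", 1.5], ["B", 2.25]])
    print(csv_table_directive("Measured means", "results.csv"))
```

The printed directive can be pasted into (or generated into) an rst page; the path given to `:file:` has to be adjusted to wherever the CSV actually lives relative to the docs.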
I do something similar for my reports. I write most of it in markdown using Typora and then I export the last draft to docx for fine tuning and distribution (the agencies I work with want docx submissions, not pdf, which always bothers me).
Typora uses pandoc to do the conversion. My reports are mainly text, charts, and lots of math formulae and it works great. You don't get fine adjustment of layout, but I find that a feature not a bug. I see so many people waste time to put a figure in just the right place. It doesn't matter. The goal is clear information transfer so just get the figure in the doc where it makes sense and go on.
There's a lot you can do with latex to automatically import data and update automatically from external sources, and while it might seem counter-intuitive it is much easier and less effort than Word's wysiwyg interface.
I'm jealous of how easy it is to import data when using a structured source code like format such as rst, markdown or latex. I'm sticking with word because I can easily do small layout adjustments like decreasing the margins of a table to make it fit on a page, or easily see when a paragraph is 1 or 2 words too long, causing it to shift all sorts of elements across pages.
You can do that with Latex as well? I use TexStudio which has a preview pane. Any time I make changes I hit f5 and it updates pretty quickly. It's not instant but pretty close to it, and there are already fewer problems with things shifting around because it manages that better than Word does, by design.
I've recently switched to Quarto[0] with RStudio desktop[1] as the editor. It's my preferred approach for all writing now:
1. Great markdown editor with both source and WYSIWYG views
2. Render to a wide range of formats including html, pdf, epub, docx
3. Generate books, web sites, single page docs, presentations
4. Incorporate code (like jupyter) except the source is plain text with fenced blocks
5. Supports code in a number of languages including Python and R.
6. Can use other editors too (iirc there's a plugin for VS Code though never tried it).
7. Built in support for MathJax for mathematical formulae and Mermaid for text-based diagramming with auto inline preview
I prefer it to Word for writing and jupyter for notebooks. No affiliation to Posit, the company that develops both Quarto & RStudio. Just a fan of the products.
Sphinx/reStructuredText supports math in LaTeX input format [1], so you can still go nuts with complex math expressions while still benefitting from the relative simplicity.
I have documented at least 10 x 10 matrices with rst math directives and found it to be pretty convenient. I don't understand what the benefit of pure latex is in this context.
As a certified grumpy old developer I spent years writing off the "X but in Rust" projects, but I have to confess that a lot of good things with meaningful improvements have come from the rewrite-everything-in-Rust movement.
I've not used Typst and not authored much LaTeX (but worked on a project with a group of scientists who used nothing but LaTeX) and can see obvious advantages to Typst. Same with many, many other Rust libraries.
I think that typically a rewrite in, well anything, can be helpful - simply because the first write wasn't sure of what may work or what the correct model for the system should be, or how to handle specific parts of the system etc.
A rewrite in Rust can be good for those reasons, as it removes the "cruft" of old implementation, but also gets the nice properties of speed and such.
But ultimately the thing I love most about Rust is not even the safety and such - it's the package management and build system. Just look at the horrible python/js scene for how bad packaging and build systems can be, and you'll understand why that basic uniform experience can be so nice.
So funny to me that people assume, oh it's written in Rust, so it must be a rewrite of something else just so they can use Rust.
They never imagine that people choose Rust for something they want to implement anyway and not just to replicate something existing, that they do not want to use since it's not implemented in Rust????
Yep even as a big fan of it...it's definitely a trope. And one that's very easy to either dismiss or make fun of. It would be a bit strange for fans to feel defensiveness or denial over that.
jamiedumont let out a rambunctious laugh to himself.
- Ah, you got me good you meddling kids!
jamiedumont was talking to himself again.
hackerbod slowly leaned over and squinted at the screen.
- Uh Typst?
- Yeah! It’s a typesetting markup language. It’s supposed to be better than things like latex.
- Ok. What’s so funny about it?
- Oh hehe, it’s written in—guess what?
- I dunno?
- Rust!
jamiedumont started giggling but hackerbod remained neutrally unamused.
- Oh come on! Rewrite in Rust? Language zealots? Young adults who can’t program without some Ruby syntax sprinkled in?
- So this “typt” thing—
- Typst.
- Right, Typst, this typesetting thing was created to promote Rust in some way?
- Oh I don’t think so.
- It doesn’t mention Rust on the homepage or something? You know, Written in Rust?
- Nope. Not to my recollection.
- So is it a rewrite of something else in—
- Nope.
- So then what does that have to do with—
- Ah, but you’re missing the bigger picture, hackerbod.
- Ok.
- Year after year of this eye-rolling promotion and nagging, blah blah blah memory unsafety is bad, blah blah this is why we used angle brackets for generics, and these sly bastards went and pulled off the most epic Trojan Horse that I’ve ever seen been—
- And what’s that?
- They made an actually useful language!
hackerbod had to scoot back as jamiedumont fell off his swivel chair because he was laughing so hard. hackerbod scratched his head.
jamiedumont finally recovered from the ab-induced euphoria.
- Ah hackerbod, I hate to admit it but they got me good! Those cursed language zealots got one over on me!
I agree! I've also been using this as a personal website (for academia). This works like a charm. It's easy to render any equation, and it's fast (because it's not bloated).
Sphinx/rst are a nice middle ground between the simplicity of markdown and complexity of LaTeX. I used it to generate a lot of html docs for test reports. I did try pdf generation via LaTeX and pdflatex for a bit, but stopped once the pdfs were passing multiple thousands of pages.
And it's really tweakable, especially with html output where you can provide your own templates, or add in your own CSS/scripts or even manual tags.
Providing my own templates is kind of a weird feature, because that's not really what I want (in the sense "people don't want to buy drills, they want to buy holes") - obviously that's a necessary feature, but I never ever want to make my own template, what I want instead is to have a template that does exactly what I need but that's made and maintained by someone else.
E.g. I don't care about a configurable formatting for bibliography, but I would want a pre-made template that implements the APA bibliography guidelines with all the tiny nuances correctly. I don't want to configure margins for columns, I want a template that does the IEEE formatting standard exactly. (95% compatibility doesn't work, if a single missing feature means the tool can't produce the required document because it's wrong at one spot on page 3, then I'd need to abandon the tool and pick something that works). And crucially, I want the separation between content and formatting so that I can easily take a blob of content that was formatted for one layout and just copy it in a completely different template and have it match the new formatting guidelines, e.g. automatically moving all the image captions to the other side, changing how they're numbered and referenced, etc.
Latex has all this baggage solved, almost everyone who wants a specific format from me will provide a Latex template with their weird typesetting fetishes included, and I just need to provide the content - while any upcoming tool has an uphill battle to become compatible and provide the same things, at the very least pre-made (and well made) templates for all the major formats (each discipline of science generally uses something different).
I forced myself to use it recently, I mostly found it to be both limited (cannot have part of a link in bold or italics) and inconvenient (each line of inline code must be indented).
It does have some limits, for sure. I haven't tried bolding a portion of a url before.
I have enjoyed including inline code using the literalinclude directive, which allows you to just include sections of code directly from a file on disk. This is great because you can cover your example code with unit tests while also talking about it in docs without replication. You can even use little border comments to mark snippet sections so that it's not sensitive to specific line numbers.
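To make the "border comments" idea concrete, here is a small Python sketch of the selection logic: keep only the lines between two marker comments. This is an illustration of the concept, not Sphinx's own code; in Sphinx itself the same effect comes from the `:start-after:` and `:end-before:` options of literalinclude. The marker strings and the sample file contents are invented for the example.

```python
def snippet_between(source, start_after, end_before):
    """Return the lines after the one containing start_after and
    before the one containing end_before."""
    selected = []
    inside = False
    for line in source.splitlines():
        if not inside:
            # Skip everything up to and including the start marker line.
            if start_after in line:
                inside = True
            continue
        if end_before in line:
            break
        selected.append(line)
    return "\n".join(selected)


# Hypothetical example file with marker comments around the documented part.
EXAMPLE = """\
import math

# [docs-start]
def area(r):
    return math.pi * r ** 2
# [docs-end]

print(area(2))
"""

if __name__ == "__main__":
    print(snippet_between(EXAMPLE, "[docs-start]", "[docs-end]"))
```

Because the selection keys on the markers rather than on line numbers, the example file can grow or be reformatted without the docs silently showing the wrong lines.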
This article really doesn't get what LaTeX does. Of course it is overkill to have 5 lines of text rendered with LaTeX into a PDF. But the point of LaTeX is exactly to set the typesetting of an output document in stone. PDF is meant to do that and HTML cannot do that.
A PDF conserves everything and that is precisely the point to have a set layout for printing or displaying on different devices.
Yes, there should be easy ways to display math on the web. No, this doesn't mean that LaTeX is obsolete.
Besides, what about references, both external and internal? Probably needs more "modern" tooling.
> to have a set layout for printing or displaying on different devices.
That’s a horrible way to go about it. Already in the 90s it was clear that varying display sizes was a problem, and it has gotten orders of magnitude worse since then.
The concept of a single set layout that is suitable for everyone is utterly absurd.
Then do not use a tool that was designed for typesetting printed pages, which is what LaTeX is for. The author of the article seems to think about LaTeX only for math rendering. But that is just a fraction of what it is used for. Complex diagrams with tikz, or typesetting entire books so that adding content in an arbitrary place still makes the rest of the book look good without breaking layout, are some of the examples of why I would use LaTeX instead of html.
So, I originally posted this last year. When I posted it, I was using tectonic as my LaTeX compiler, and since it didn't support HTML output yet, I didn't actually try the article's suggestion.
Today, when I saw that I got an invitation to repost this article from the mods, I thought I'd take the time to try it out.
The two commands that the article suggests can be combined into one:
I did a comparison[1] of pdflatex and latexml using some old assignments, and it looks like compiling to HTML isn't fully there yet: the spacing was off in some places, and manual line breaks didn't work. But, I remain hopeful. If this gets polished, viewing LaTeX documents on phones would be much nicer.
There's some good news... arXiv just adopted LaTeXML for in-house HTML conversions of its papers. They allow users to submit bug reports and have collected over 700 so far.
LaTeXML is maintained by a team at NIST, and they are actively responding to the bug reports on github issues.
The LaTeX team headed by Frank Mittelbach is also working to add more structural information to the output of LaTeX, which will make compiling to HTML much easier.
For me, the main problem with most tools that render to HTML was that they don't support all math typesetting libraries that latex supports. I used to work with category theory, where it's common to use the tikz-cd library to typeset commutative diagrams. tikz-cd is based on tikz, which is usually not supported for HTML output.
But apart from math typesetting, my latex documents were usually very simple: They just used sections, paragraphs, some theorem environments and references to those, perhaps similar to what the stack project uses [3]. Simple latex such as this corresponds relatively directly to HTML (except for the math formulas of course). But many latex to html tools try to implement a full tex engine, which I believe means that they lower the high-level constructs to something more low level (or that's at least my understanding). This results in very complicated HTML documents from even simple latex input documents.
So what would've been needed for me was a tool that can (1) render all math that pdflatex can render, but that apart from math only needs to (2) support a very limited set of other latex features. In a hacky way, (1) can be accomplished by simply using pdflatex to render each formula of a latex document in isolation to a separate pdf, then converting this pdf to svg, and then including this svg in the output HTML in the appropriate position. And (2) is simply a matter of parsing this limited subset of latex. I've prototyped a tool like that here [1]. An example output can be found here [2].
Of course, SVGs are not exactly great for accessibility. But my understanding is that many blind mathematicians are very good at reading latex source code, so perhaps an SVG with alt text set to the latex source for that image is already pretty good.
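A rough Python sketch of the per-formula approach described above, under loud assumptions: it is not the commenter's prototype, the regex only handles `$...$` and `\[...\]`, and the actual pdflatex run and the pdf-to-svg conversion (with whatever converter you have, e.g. pdf2svg or dvisvgm) are left out. All function names are hypothetical. It shows the three pieces: pulling formulas out, wrapping each in a standalone document, and emitting an image tag whose alt text is the latex source.

```python
import html
import re

# Deliberately simplistic: matches \[...\] display math and $...$ inline math.
MATH_PATTERN = re.compile(r"\\\[(.+?)\\\]|\$(.+?)\$", re.DOTALL)

# One tiny document per formula, to be compiled in isolation by pdflatex.
STANDALONE_TEMPLATE = r"""\documentclass{standalone}
\usepackage{amsmath}
\begin{document}
$\displaystyle FORMULA$
\end{document}
"""


def extract_formulas(latex_source):
    """Return the body of every formula found, in document order."""
    formulas = []
    for match in MATH_PATTERN.finditer(latex_source):
        body = match.group(1) or match.group(2)
        formulas.append(body.strip())
    return formulas


def standalone_document(formula):
    """Wrap a single formula in a minimal standalone LaTeX document."""
    return STANDALONE_TEMPLATE.replace("FORMULA", formula)


def img_tag(svg_path, formula):
    """Build an <img> tag that uses the latex source as alt text."""
    return '<img src="{}" alt="{}">'.format(
        html.escape(svg_path, quote=True), html.escape(formula, quote=True)
    )


if __name__ == "__main__":
    source = r"Let $a^2 + b^2 = c^2$ and \[ e^{i\pi} + 1 = 0 \] hold."
    for index, formula in enumerate(extract_formulas(source)):
        print(standalone_document(formula))
        print(img_tag(f"formula-{index}.svg", formula))
```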
If anyone can explain to me, a complete noob regarding html, how they achieve this result with html, css and whichever latex engine they use, I would be grateful. I want to make a personal webpage in this style.
I don't like the language, the ecosystem is too big, complicated and breaks, but the end result is hard to do any other way.
This applies to both the equations part and the text reflow part (I think of them as separate things, but they usually go together).
It should be possible to write text in HTML or markdown, and write the equations in latex or asciimath, and turn it into a beautiful/article style pdf, but sadly it is not.
Although CSS (colored and rounded boxes and such) + MathJax-SVG also can look nice.
Document formatting seems like one of those problems where 80% or so of the problem space is simple and the remaining 20% is an unfathomable pit of nightmares.
There are so many different ways people could want characters printed on a sheet of virtual paper that the problem is virtually unconstrained in its difficulty.
TeX was a major theoretical advance, and LaTeX is a nice enough UI layer on TeX that has gotten significant traction. But even outside of TeX, it feels like even software like MS Word is impossibly complex and clunky.
You can make something nicer by dramatically simplifying or cutting the feature set. I think that's probably how Google Docs has a pretty simple interface. But I'm not convinced there's a real replacement for the incumbents that simply tries to improve UI without having a deep technical insight about document layout the way Knuth had with TeX.
Latex has a lot of baked in design mistakes which are never going away.
Unfortunately typst seems to have replicated the primary one - inventing a new turing complete programming language rather than piggybacking off an existing one.
It's possible to conceptualize a much better latex but it would take years to build properly and build the ecosystem around it to do all the odd things people need when doing markup requiring 1000-2000 community packages.
Thing is, you can't really cut the feature set much. Nobody needs 90% of the features but for almost everyone there's some 10% of the less-used features that's a must-have, a total dealbreaker if the other tool doesn't have them or does them poorly; and that's a different 10% for different people, so if you have a cut-down feature set you lose many people - some because you don't have A, some because you don't have B, some because you don't have Z, and they all instead use the same old, complex tool that has support for "their thing".
Every time I encounter LaTeX, I think of something I heard: "You shouldn't need a build environment for a word processor." I can't get away from that sentiment. Almost nobody I've seen using LaTeX has actually been using it for typesetting. Usually they're using a typesetter for word processing.
Sometimes it feels like they're only using LaTeX because they "learned it in college." You ever notice that? So many people in LaTeX threads say they learned it in college, or they've been using the same setting since college, or whatever. People learn LaTeX to make college papers look nice, and then they never need to configure it again? Isn't that strange?
The worst part, though, is that people complain if you call it latex. Which I think says quite a lot about its userbase.
It will be hard to replace LaTeX. I still use it. It's virtually bug-free and compiles documents from 30 years ago. I sincerely think it will be around for another 30. It's tried and tested and that's hard to find in the software world. Typst looks interesting though. I'll keep my eye on it...
How do you handle internationalization, and, in particular, hyphenation? That’s the main reason I use LaTeX (well, specifically XeTeX & Tectonic, which are pretty modern). Without those two features, one might as well use LibreOffice, no?
Might still be pretty limited, but I've been looking for something with a more modern syntax for years, and this seems a good candidate! Thanks for sharing.
Of course it will take years to replace LaTeX, but we need to begin working on it.
Typst is still fairly limited. Luckily it has a strong webassembly based plugin interface. I am currently using it for anything I'd otherwise use latex for.
I started using it in the last couple of days after reading this and I find it amazing. It's limited in the sense that it may lack templates and a lot of other things, but it's so easy to code for it, that I expect the community will make everything that is needed really quickly. I am SO impressed. I love it.
Talks about "htmldocs" (which shows maths formulas on one of their templates) but there are also various other alternatives mentioned in the discussion.
Well you need to install the appropriate texlive dependencies which can be somewhat complicated, but once that's done it's just writing inline Latex
$$\like{this}$$
into your Markdown files and then doing
pandoc -f markdown -t pdf -o output.pdf input.md
Haven't used this in a while and just tried it again, was just a matter of searching a few error messages, gleaning the missing texlive package names from the results, and installing them. Works like a charm now.
I also had this working for Markdown to HTML conversion back in the day when I needed it, but that requires the website using a JS library like Mathjax.
The recommendation to use Markdown+MathJax falls short when you want to write longer documents with numbered sections, subsections, and theorem/definition/figure etc. tracking and referencing.
I'm sure with Sphinx and reStructuredText you can get that large-scale document tracking stuff, but with LaTeX it just works for the most part and you don't need to juggle a bunch of different side-projects and extensions. Plus you get things like automatic index generation (for a physical book).
Markdown actually works great for larger documents when you use it with pandoc [1]. That way you get HTML output and PDF output via Latex, without the HTML being a second class citizen.
I wrote my thesis (50 pages) and multiple published papers this way. Maybe it seems janky but honestly my experience with Latex and its 10 incompatible compilers and thousands of semi-incompatible packages has been much worse.
I also don't understand why (academic) publishing is so PDF focused. It's a horrible format to read on screens (think multi-column PDFs, and scrolling / jumping up and down to find references), and who actually prints stuff anymore?
The thing I love most about Pandoc is that my notes can just slowly turn into a fully fledged document. Like bullet points - The syntax in Latex is far too verbose to make taking notes with it comfortable.
It's also much easier to extend, I wrote a simple tool that automatically converts URLs into full and correctly formatted citations, so I don't even need a citation manager to get the same results:
The GAN was first introduced in [@gan](https://papers.nips.cc/paper/5423-generative-adversarial-nets).
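For illustration only, here is a Python sketch of the first half of such a tool; it is an assumption about how one could be built, not the commenter's code. It finds `[@key](url)` links, records which URL belongs to which citation key, and rewrites the text to plain `[@key]` pandoc citations. Fetching metadata for each URL and writing the bibliography entries is the hard part and is left out. The function name is hypothetical.

```python
import re

# Matches links of the form [@key](http(s)://...), as in the example line above.
CITATION_LINK = re.compile(r"\[@([\w-]+)\]\((https?://[^)\s]+)\)")


def collect_citations(markdown):
    """Return (rewritten_text, {key: url}) for every [@key](url) link found."""
    found = {}

    def replace(match):
        key, url = match.group(1), match.group(2)
        found[key] = url
        # Leave a plain pandoc citation behind.
        return f"[@{key}]"

    rewritten = CITATION_LINK.sub(replace, markdown)
    return rewritten, found


if __name__ == "__main__":
    line = (
        "The GAN was first introduced in "
        "[@gan](https://papers.nips.cc/paper/5423-generative-adversarial-nets)."
    )
    text, citations = collect_citations(line)
    print(text)
    print(citations)
```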
> I also don't understand why (academic) publishing is so PDF focused
Because academics still often publish physical books.
You prefer to have lots of tools and write custom extensions to programs. And you'll have to maintain those tools forever, and migrate them when the upstream software breaks, or the links you use die. Most academic authors don't want to do that, and with latex they can take the same typeset equations and diagrams (without learning any new tools):
- Publish a paper
- Write a talk
- Publish a book
- Manage a unified bibliography across all of these
I searched for a comment that supports the fact that LaTeX shines in certain areas.
My memory of LaTeX has weakened over the years, since I am not writing long texts with lots of figures and such, but I know it's more than this statement lets on in the article: "Something that is more modern than learning a hundred bits of print typesetting that your student will never, ever need?"
What exactly, in the end, is 'modern'? Is it because there is less syntax in Markdown to remember and the Modern is syntax-averse? :D Aren't there editors for these in the first place to avoid the daily grind of remembering syntax?
Modern as in "more recent" (and not as in "the modern era" that ended decades ago). More recent doesn't mean better though: the likes of Overleaf, Google Docs, Github are also "more modern" than some of their alternatives, yet ought to be avoided like the plague.
I honestly don’t see the point of using LaTeX if you’re generating HTML. The great strength of LaTeX, in my view, is the precise control it provides over typography and formatting. As such, it works best with an output format which can faithfully render these documents — such as PDF. For an output format like HTML, which encourages reflowability over faithful rendering, I’d much prefer to use an ‘easier’ document format like Markdown or reStructuredText.
Exactly, there is a triangle of tradeoffs here: prettyness vs easyness vs responsiveness. You can only have 2 of them. pretty and easy is Latex. The reason people call CSS a nightmare is because responsiveness fundamentally makes it much more difficult to make a document pretty. So HTML+CSS gives you pretty + responsive or easy + responsive. That's not the same functionality as a pdf for a fixed scientific document.
I spent a few weeks last year doing the opposite, HTML to LaTeX in order to print and nicely typeset top HN articles, so I'd have a nicely printed booklet each morning. I think creating hard copies of web content for offline reading holds a lot of promise, but the internet is a beast.
I like LaTeX for the quality of its pdf output. I use it for docs that need to be "printed" (not necessarily on paper, but still in a fixed typographical form for potentially long-term archiving), not for anything else. And yes, I DO HATE pdfs because of their design, but PostScript is not very common these days and, while a bit better in certain aspects, is not much better in general; dvi is even worse.
For my notes, for anything that need to be "live" I use org-mode because:
- it's a far more natural markup than anything else
- it's rendered INLINE, no need to jump between a source form and a rendered one, a thing MD lovers fail to understand
- it's an outlining tool, another thing most other tools fail miserably to understand
- it easily incorporates live things in other languages (org-babel), a thing no modern REPL-alike DocUI like Jupyter can do
Long story short I prefer the best tool depending on the job. HTML might be the least common denominator tool, making it the worst in essentially all cases. XML, and SGML in general, are good for machine usage, but they are very impractical in current usage: just see the actual crappy state of things for e-invoicing with XML/XADES docs + XSL to render them in the end as pdf for the human. They are a good tool in some cases, but again not the best for any specific case.
When I use LaTeX, it's because I want a way to store book manuscripts and their layout as code in version control. I never use any of the math layout. I get the impression that my use case is rather in the minority.
I would use CSS+HTML for layout, but what do I do about automatically generating tables of contents and indexes?
Looks like Pandoc can generate tables of contents for HTML, though I don't see anything about indexes. Roff and friends, and Texinfo, can do both, though with their own tradeoffs.
This is from 2013, so the bet that "nobody will want to read [PDFs] in 5 years" can be considered failed. If anything, PDF has become the lingua franca of the academic web, crowding out even DjVu at the thing that DjVu was made for and PDF was not.
I have not been following the development of mathjax, pandoc, etc. carefully, so I'm wondering: Have the main issues been solved? By these I mean
(1) support for most popular packages,
(2) automatically breaking long outputs into small pages that don't overheat my laptop or crash my browser and yet reference each other properly,
(3) printability (without lines broken in half, senseless overflows and the like) or cross-compilability with a regular PDF compiler?
I know the ar5iv project is getting closer and closer to (1) and (3), but is that available to regular users?
But don't worry, 2024 is going to be the Year Of Math On The Web.
(I've been trying to do 'math on the web' (ish) since 2002, and it's always sucked in some way; and all that time, images/pdf have Just Worked(TM). The emphasis in the OP on how much you'll have to report/chip in/fix is telling...)
I've started auto-exporting Zotero-managed references to a BibTeX file using Better BibTeX [1] and then using Sphinx and reStructuredText to process them uniformly into nicely formatted HTML, PDF, and EPUB using sphinxcontrib-bibtex [2].
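For anyone wanting to try this, the Sphinx side comes down to a couple of lines in `conf.py`. This is only a minimal sketch and not the commenter's actual setup; the filename `references.bib` is an assumption standing in for wherever the auto-exported file lands.

```python
# conf.py (fragment): a minimal sketch, not a complete Sphinx configuration.

# Enable the extension provided by the sphinxcontrib-bibtex package.
extensions = ["sphinxcontrib.bibtex"]

# BibTeX files to load, relative to conf.py.
# "references.bib" is a hypothetical name for the auto-exported file.
bibtex_bibfiles = ["references.bib"]
```

In the reStructuredText sources, entries are then cited with the `:cite:` role, and a `.. bibliography::` directive marks where the formatted reference list should be rendered.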
> don't just produce PDFs that nobody can read on small screens
I was thinking about this recently. If you get pedantic enough* about it, the typesetting quality you can get from a LaTeX+PDF is strictly better than what can be achieved using (sane) HTML.
I wanted to blog in LaTeX, and to solve the screen-size issue I thought I'd pre-bake to a wide range of page geometries, and then serve up an appropriate one to the client using pdf.js.
Fortunately for everyone, I decided against it in the end and continued blogging in markdown+html (with mathml support)
*well beyond what most readers would possibly care about
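The pre-baking idea above mostly comes down to a lookup on the serving side. A rough sketch, assuming one PDF has been compiled per page width and that the viewport width is known; the list of widths, the URL layout, and both function names are hypothetical and not taken from pdf.js or from the commenter's setup:

```python
# Hypothetical set of pre-rendered page widths in CSS pixels, one PDF per width.
PREBAKED_WIDTHS = [320, 480, 640, 800, 1024, 1280]


def pick_prebaked_width(viewport_width, widths=PREBAKED_WIDTHS):
    """Return the widest pre-baked width that fits the viewport, else the narrowest."""
    fitting = [w for w in widths if w <= viewport_width]
    return max(fitting) if fitting else min(widths)


def pdf_path(slug, viewport_width):
    """Build the (hypothetical) path of the PDF variant to hand to the viewer."""
    return f"/posts/{slug}/{pick_prebaked_width(viewport_width)}.pdf"
```

Choosing the widest variant that still fits avoids horizontal scrolling, at the cost of some unused margin on screens that fall between two pre-baked sizes.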
When I recently changed up my HTML and PDF toolstack, I was surprised not just by how good pandoc was, but by the entire ecosystem that had emerged around it, including pandocomatic and pandoc-resume.
Typst is pretty close to markdown for simple things, and scales nicely to hard things. So you don't really need to worry about the markdown-pandoc shuffle anymore.
LaTeXML has come a long way. Even arXiv uses LaTeXML internally to offer HTML5 versions as of late 2023. It does have limitations: it does not support all packages, and it does not produce a high-quality translation in all cases.
If you don't need to convert entire LaTeX documents, MathJax and KaTeX are really good at rendering a subset of LaTeX as MathML/SVG. I run MathJax + an xypic extension for commutative diagrams with server-side rendering on my website, and it works great in practice.
Asciidoc has potential. Last time I dug into it the ecosystem was lacking, but there were glimmers of a reboot. I hope that pulls through because it’s a great format.
Edit: yeah it’s managed through the Eclipse Foundation now. They’re slowly working towards a formal spec, haven’t hit 1.0 yet.
You also have Asciidoctor ( https://asciidoctor.org/ ), which is alive and well. I am using it for technical CS documentation internally, but only for single-page documents. I did not try to deploy their whole multi-document setup called Antora ( https://antora.org/ ).
I had experience with AsciiDoc and personally not a fan. IMO it has weird features like totally illegible compact table syntax (seriously, that stuff is worse than XML) and the spec looks abandoned. But I keep seeing it being used, I guess it appeals to people who want something more flexible than Markdown (and who like Ruby, or they would go with RST)
One solution is to embed alternatives within PDF itself. LibreOffice can embed inside a PDF the original editable source in ODF format. You could also embed ePub. That would mean you would have a single file that could be processed in many useful ways.
Although I use markdown (and similar) for memos, I turn to latex for longer and more complex material.
A lot of this is just because latex has been a standard for publishers in my field since I started (approximately a thousand years ago).
When writing for journals, latex saves a lot of work. Publishers provide latex templates that ensure that articles have a prescribed format and scope of content. Being able to see a good facsimile of the final published form is quite handy for authors. Oh, this paragraph is going on for over a column -- I'll break it up. That sort of thing.
This still applies when writing for longer things, such as textbooks and course notes, but another factor (for me, the larger one) is that latex (more properly, the tex upon which latex sits) is a programming language. Macros can be written to do lots of things that would be a pain if done manually, and once a macro is written, altering an entire text is easy. I did this in a book I wrote a while back, writing macros to colourize text that would be indexed, add margin notes for things I wanted to return to, categorize paragraphs by function, and so on. I could turn all those macros on and off by uncommenting a line. This is really quite helpful in writing something that takes months to years to complete. Frankly, I use this macro approach even in memos written in markdown. Inside almost all of my markdown documents, there are latex commands.
As for reading things on a small screen, which I guess is really the topic here, I must admit that this is something I rarely do within my own field. Sure, I do it if reading one of those 10-km overview articles in Science or Nature. But when it comes to my own field, things are technical and demand long periods of study. I don't try to read this stuff on the bus or in a coffee queue. I need time (hours or days) and I need to be able to take notes.
Another reason I prefer PDF is that it is fixed. My brain puts information into a sort of spatial framework. Somehow, if I look at a paper I first read 40 years ago, I still know what information is on which page, which of the diagrams summarizes the whole thing, and which of the citations is key. This may be a flaw in my brain functioning, but I just don't find these sorts of memories forming when I read content that has a plastic format, with paragraph breaks changing if I adjust my view. But maybe this is just my age talking, I suppose.
I learned LaTeX in grad school in 2013, starting with LyX. Yesterday, I compiled an Rmarkdown document into an APA6-conformant PDF with just a bit of YAML, with a tex file as an intermediate output.
We're almost there for skipping LaTeX entirely, but in my experience, Google Docs and Overleaf still offer vastly superior collaboration tools. Now if we could just edit {.md; .rmd; .ipynb} files directly on Overleaf, with comments and track changes, we'd be in business...
If I'm using LaTeX, I'm writing scientific articles. I expect scientific articles to be read by people on computers with normal screen sizes or printed off. Therefore there's no reason to bother with anything other than PDF. PDF works great.
That's certainly one use case. I might be the exception, trying to look up something on my phone, or following a link in a blog or HN post. Stuff in PDFs is hard to read, especially two-column journal articles. I'm often not at my desk, since I might be in a meeting or in the lab.
Don’t you have a computer in your lab? Also I actually think the 2 column format works well on phone bc you can zoom in to fit the column to the screen
A computer at every lab bench, usually tied to specific experiments. Sometimes I sit down at an adjacent server and read something, sometimes pull out my phone.
Anything can be accommodated. One thought is to provide the source code, then people can adapt it to their display preferences. Like how HTML was originally envisioned.
I love the author’s “if you want to leave a comment email me”. I saw this somewhere else and it motivated me to make an automated system that works like that: https://r3ply.com
That is old news. MathJax 3 is a lot faster nowadays than it used to be, and it supports more LaTeX keywords than KaTeX. In particular, the important \label and \ref are still not supported by KaTeX.