The biggest advantage of latex is that by (mostly) separating content from presentation, you can use revision control systems like git or svn to collaborate on papers. You can be in that final hour before the paper submission deadline, on a skype call with four authors scattered around the world, all simultaneously editing the paper (each using whatever tools they prefer), and be reasonably confident it won't all end in tears. That's when you really understand the advantage of this latex-style markup. Don't get me wrong - there are lots of things I hate about latex, but I'll almost certainly keep using it because of the workflow it enables.
I agree with this. My thesis committee wanted my thesis written as a research paper (irrespective of whether my work was going to be published or not), but in the final stages the graduate college declined to approve it in spite of my committee's approval, because they require a predefined thesis format. The change of format took me a few minutes, thanks to latex. I can't begin to imagine how much work and anxiety it would have taken to change the entire format at the last minute otherwise.
The latter point is the reason I stopped using LaTeX, even though in years past I was proficient enough to type up lecture notes in realtime during class with little thought (lots of math). Collaboration with it requires everyone involved to know it.
With my adviser and collaborators, I ended up compiling PDFs, printing them, and then they would mark them up by hand because that was fastest. It was not ideal by any stretch.
Word has its faults (cough, figure placement and cross-references), but track changes and adding comments to the manuscript is just spot on and easy. Getting comments and revisions done in Word is an order of magnitude easier, and just leaves me at the end to make sure all the figures are positioned properly with captions. And now that Word finally seems to have gotten Styles and sections working properly (so many horrible disasters with this in previous iterations), it's pretty workable for technical documents.
Even the equation editor now very nearly straight up accepts latex markup for the math to the point where it's good enough for all but heavy theory papers.
But with LaTeX, asking your busy adviser to look at the compiled output, and then mark up a separate file they have to parse in their head to comment on is just too much effort.
My school's thesis template was incompatible with the packages I had to use for a specific notation. I spent the rest of the year constantly wrestling with it. The OP makes a good point: LaTeX ultimately does not separate content and presentation, nor is it declarative.
I'm old enough to have collaborated on papers using troff (strictly speaking psroff) and sccs as a revision control system, long before latex became popular and before PDF existed. So, yes, the primary requirement is, as you say, an ASCII-based markup system to enable this workflow.
However, latex is extremely good at flowing text while applying good style guidelines. Most of the time you know that extending a paragraph by a line won't make anything bad happen - no titles left at the bottom of a page, acceptable stretch of line spacing, and things like that. Of course you can only push this so far before you have to fiddle about moving figures to the right page, etc. This is probably unavoidable, although it would be better to have a bit more direct control over where floats end up. The key thing is that latex does mostly get the small stuff right, so you're not constantly yelling at your collaborators for breaking anything.
I always see this claim that latex separates content from presentation, but I don't see how it's true. As the article says, \emph{hello} is implemented in terms of font commands; it entangles the semantics (emphasis) and the presentation (italicisation), and there is no way to extract the semantics of a latex document and render it in a different format the way you can with e.g. Markdown.
\emph does not always render as italics, it actually depends on the document class you're using, and you're free to redefine it as you see fit. It just indicates that the argument should be emphasized, not how.
Bad example, because \emph means emphasis, not italics, and you can represent the emphasis in any way you want — the italics is just the default. So there is separation of format and content.
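To make that concrete, here is a minimal, runnable sketch (standard article class and the xcolor package; the choice of blue small caps is arbitrary) showing that a preamble can change what \emph means without touching the markup in the body:

    \documentclass{article}
    \usepackage{xcolor}
    % Redefine emphasis to render as blue small caps instead of italics.
    \renewcommand{\emph}[1]{\textcolor{blue}{\textsc{#1}}}
    \begin{document}
    This sentence \emph{emphasizes} a word without ever asking for italics.
    \end{document}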
There is a big difference between the web (HTML + CSS) and LaTeX (LaTeX + styles) worlds in terms of culture, and that defeats the idea of "separating content from presentation".
On the web, the separation of content and presentation is technically a possibility and culturally an actuality. That is, while you have the technical tools to apply such a separation, the toolkit does not force it on you; you can choose to write HTML with in-place defined colors and font sizes and other style elements. That the standards are higher than that, and one is expected to write clean structural HTML and specify style separately with CSS, is just a cultural phenomenon.
In the LaTeX world (the scientific community), separation of content and presentation is also technically feasible, but the culture that maintains it as a standard is missing. Most LaTeX documents are written by scientists who just want the damn thing to look as they prefer, and who apply every nasty trick the system offers to get there. The more computer-savvy ones engage in macro writing to save themselves a few keystrokes, but that's far from a way of document authoring that keeps content and presentation separate.
Over the years I've written nearly 200 papers with about 70 co-authors, probably 75% of them done in latex (most of the rest in some nroff/troff variant). The process of latex authoring usually involves iteration, with different authors contributing different sections, then a certain amount of rewriting of each other's content. Papers typically have a page limit - often 12 pages. But you ignore all that while coming up with the first few drafts. A couple of days out from the submission deadline, someone (often me) starts to panic that the paper is 16 pages and needs to be 12. That's when you start going through and copy-editing, trimming non-essential content and rephrasing for conciseness. Only during the last 48 hours do you start to worry about tuning the layout, because you know from experience that it will likely all change in the last day. In the last day you're doing fine tuning, panicking that the paper is still 13 pages and needs to be 12. Now you do a layout tuning pass, and it's now that authors tend to get into all the nasty tricks like negative vspace, tuning caption distances, and so forth. Even then, your co-authors are probably still modifying content without paying too much attention to the effects on layout. Finally, two hours before the deadline, you've got it down to 12 pages. Now you have to stop changing text without regard to layout.
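For anyone who hasn't lived through that last pass, the nasty tricks usually look something like this - a hedged sketch, with arbitrary lengths and a hypothetical figure file, of the local overrides that creep in:

    % Typical last-48-hours layout hacks; the specific lengths are arbitrary.
    \documentclass[twocolumn]{article}
    \usepackage[demo]{graphicx}               % demo: placeholder box instead of a real image
    \usepackage[font=small,skip=2pt]{caption} % tighten space around captions
    \begin{document}
    \section{Results}
    Some text that absolutely must end on page 12.
    \vspace{-0.8em}   % claw back vertical space before the next paragraph
    \begin{figure}[t]
      \centering
      \includegraphics[width=0.9\columnwidth]{results-plot} % hypothetical file name
      \caption{Throughput versus offered load.}
      \vspace{-1em}   % squeeze the space after the caption
    \end{figure}
    More text that has to fit.
    \end{document}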
Anyway, my point is that in the weeks/months of writing a paper, the messy mixup of content and presentation really only becomes relevant in the last two hours before the submission deadline. Up until that point, everyone is mostly using the same subset of latex, not worrying too much about the presentation part, because the coarse-grain paper layout is usually handled by the style sheet you get from the conference.
There are a few nice things about Latex for scientific publications:
1. It's style driven. I'm not sure how well you can do this in Word now, but in Latex it's pretty easy to reformat your document to match a journal or thesis style (see the sketch after this list).
2. It's scriptable. I don't mean that it's Turing complete; I mean you can drive it with a Makefile. This is great for scientific publications, I've found. You can script it such that Make will re-build your tools, re-run the analysis, generate new figures, and then regenerate the article. This saves a lot of time when you're iterating over a publication.
3. Well integrated reference management. Bibtex itself is a mess, but if you don't need to alter the reference format, it works well.
4. Equations!
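To sketch points 1, 3 and 4 in one place (hedged: the class, the citation key and the refs.bib database are placeholders), swapping the single \documentclass line restyles the whole document, \cite pulls from a BibTeX database, and math is first-class:

    \documentclass{article}   % swap this one line for a journal or thesis class to restyle everything
    \begin{document}
    Prior work~\cite{knuth1984} established the identity
    \[ e^{i\pi} + 1 = 0 . \]
    \bibliographystyle{plain}
    \bibliography{refs}       % refs.bib is a hypothetical BibTeX database containing knuth1984
    \end{document}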
I think it's probably overkill for what he's doing. Sounds like Markdown + a reference manager would be fine for him. But for a lot of scientific publications it has handy features. I too would like to find something else, but I've not seen anything.
I guess you could write everything in HTML! But it's obviously not well suited to this application.
1. has been possible in Word for ages (according to an article recently on the front page, it's been the main difference between Word and WordPerfect), and it's pretty much how you write documents of any size without going insane. I'm actually astonished how many people think that using Word means using direct formatting and eschewing styles.
I really wish people would use Word's features - having the entire document marked up with appropriate headings makes it a breeze to go through and restyle a document.
I find that while collaborating on things, I have to make a pass that involves purely going through and marking up heading text appropriately.
----
I've found the best compromise is to have a sort of technical prologue that discusses how to edit the document, or to force people to work in something like markdown, where the lack of font sizes forces the use of a heading-class notation.
I prefer https://stackedit.io/editor because it includes mathjax support which means you get access to a large swath of LaTeX style math tools.
LaTeX is a good idea with a terrible implementation. The popularity of markdown (+variants) is a testament to the usefulness of plain text writing. However, LaTeX syntax is clunky and the ecosystem is a scrapheap-challenge amalgamation of packages with assorted cross-incompatibilities.
Latex to PDF converters are also shockingly slow for this day and age: a simple document can take several seconds to compile, compared to browsers which can re-flow complex HTML documents in milliseconds.
IMHO this is because of the ad-hoc nature of people using Latex; it's been cobbled together by researchers based on their needs at the time, while HTML+browsers have been carefully designed and optimized by people who know the intricacies of document rendering. Researchers just aren't very good software engineers as a rule, so perhaps it's not surprising that they produce something that more or less works but is not very well designed.
I've said it before and I'll say it again: the single most effective use of ~$1 million for advancing math and physics research (two disciplines for which no non-LaTeX solutions exist) would be to hire some developers for a couple of years and make an enterprise-quality successor to TeX. Keep the math syntax, make it handle infinite pages for the Web, and fix all the awful bits that waste hundreds of thousands of grad-student-man-hours each year.
This isn't fantasy. Zotero is evidence that custom built academic software funded by charitable foundations can be a tremendously positive service to the academic community.
To be fair, LaTeX has a more complicated method of processing text since it considers a lot of typographic issues that browsers do not, so it has to be slower than a browser when rendering text. That said, I don't know enough to say whether the amount it's slower is proportional or not.
There are a couple of issues here. First, Markdown and HTML simply punt on the vast majority of the issues that TeX solves. Just as the author rightly comments that TeX is not geared toward online publication, HTML is geared toward only that model. If you want to paginate HTML or Markdown, you do it yourself. Widows and orphans are (obviously) your problem to deal with. Compared to the work that TeX is doing, Markdown is, to a first order approximation, just catting the file. HTML can reflow a document in real-time because it's doing a really poor job of reflowing the document. Even when they work, they're just putting line breaks in whenever the width would otherwise be too wide. TeX is running a dynamic programming algorithm to minimize the "badness" of the line breaks across multiple lines and even paragraphs. And quite a lot of the time, the browser just throws its hands up and says, "fuck it, you can just scroll horizontally to read the rest of this line". You can't do that on paper. So of course it's faster. You might as well be complaining that Preview is faster than Photoshop.
HTML and Markdown don't do automatic hyphenation (across multiple languages). They don't do ligatures. They don't do proper text justification (neither does Microsoft Word or Libre Office for that matter). They don't do cross reference tracking (i.e., having automatically numbered sections, tables, figures, etc. with automatically updated references). They have no logic at all for automated float placement. Font handling is specified by a human instead of relying on algorithmic font selection and substitution when necessary. I could go on for pages of this.
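As a concrete illustration of the cross-reference point, this minimal sketch (label names made up, graphicx's demo option standing in for a real image) shows the kind of automatic numbering and reference tracking that comes for free:

    \documentclass{article}
    \usepackage[demo]{graphicx}   % demo: draws a placeholder box instead of loading an image
    \begin{document}
    \section{Evaluation}\label{sec:eval}
    \begin{figure}[htbp]
      \centering
      \includegraphics[width=0.6\textwidth]{latency}
      \caption{Latency distribution.}\label{fig:latency}
    \end{figure}
    As Figure~\ref{fig:latency} in Section~\ref{sec:eval} shows, the numbers
    and references update themselves whenever anything moves.
    \end{document}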
I think the idea that web browser vendors are better at this sort of thing than TeX and LaTeX is so wrong I don't know where to start. The author complains that some of his 20 year old LaTeX articles rely on outdated files to render properly. While this is true, and very occasionally a problem, it's only very recently that you had even the vaguest hope that your HTML document would render the same way on two different computers owned by the same person today! Arguably, the biggest slice of the software industry is now devoted to making things render on browsers. And for Markdown, we quite recently saw that even the simplest text rendered in no fewer than 17 different ways depending on which software (and version) you processed it with. If my goal is to be able to reproduce the output of today 15 or 20 years from now, HTML would be the absolute worst choice I could think of, unless again, you stick with <b> tags and the like, and the subset of LaTeX you can reliably assume will always work gives you much broader coverage of the space of typesetting issues than the subset of HTML that doesn't change monthly does. Not to mention, I can still more easily go get an old LaTeX implementation to rebuild a document that doesn't compile anymore (but in 15 years, I've never had to). It's quite a lot harder to get Netscape Navigator 3 up and running to attempt to faithfully render a document I wrote in 1997.
Also, web browsers have historically been just about the buggiest, most insecure, and transient pieces of software we've ever written as a field, and TeX is famously maybe the highest quality piece of software ever written. It's more or less fine that the web changes every 18 months. It's a problem for archivists, but the web isn't really intended for that. Academic publications are though, and the impedance mismatch is, in my opinion, brutal.
The interface (by which I mean the programming language) of TeX and LaTeX is indeed pretty dreadful, but this is a really minor issue compared to the rest of it. There are a lot of things I dislike about LaTeX, but I don't see how HTML or Markdown is an improvement. You'd need a completely new thing that supported everything that LaTeX supports, and while you could certainly do so with a nicer language, you couldn't do it with something as clean and simple as Markdown -- there are just too many things you need to be able to tell it you want it to do.
I disagree that browsers (and I do mean modern browsers; I recognize it hasn't always been this way) are somehow solving an easier problem than tex or doing it in a half-arsed way - on the contrary, they solve the very hard problem of correctly rendering content that might be badly formed or underdefined. I don't think there's anything in tex that you can't do in html5 and CSS - including ligatures, auto numbering, and so on.
As for markdown, that's just an example of how there is a demand for text-based writing (I could also give Restructured Text, which has a much stricter spec than markdown). I think markdown could evolve to fill the Latex niche.
For a better implementation look at pandoc, which very cleanly parses documents to an internal data structure and converts that to a range of outputs; I think that's a much better basis for a document system. At the moment it has to go via Latex to produce PDF - in fairness latex still has the most mature pdf rendering system. I for one would like to see that change, I think we can do better.
As far as I know, every system that can go to LaTeX as an export option gives you a basic LaTeX document. I don't know how you tell Pandoc, for instance, "OK, I need three authors in the author block, centered horizontally, with their affiliations below their names. But authors 1 and 2 have the same affiliation, so only include that information once, but center it below both names as a unit."
How do I tell CSS that I want my bibliography to be sorted by author last name, and have the inline citations be of the form (Author, Year), except when I'm using the author's name in the text as a noun, in which case it should be just "Author (Year) showed that blah blah blah"? For that matter, I don't think CSS can even do justification properly (by properly, I mean not treating each line as an independent unit, but shifting text around within an entire paragraph to minimize deviation from the desired inter-word spacing globally). I know someone implemented TeX's algorithm in Javascript once upon a time, but I'm willing to bet it's not any faster than TeX.
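For the record, in LaTeX that particular citation distinction is a one-word change with the natbib package - a minimal sketch, with a made-up key and a hypothetical refs.bib:

    \documentclass{article}
    \usepackage[authoryear,round]{natbib}
    \begin{document}
    Earlier results support this view \citep{lamport1994}.      % -> (Lamport, 1994)
    \citet{lamport1994} showed that the claim holds in general. % -> Lamport (1994)
    \bibliographystyle{plainnat}
    \bibliography{refs}   % a refs.bib with a lamport1994 entry is assumed
    \end{document}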
I have no real argument against the idea that you could build something that does everything LaTeX does just as well. Clearly you can. I am arguing that LaTeX has a huge amount of really important things already built in, and people use those things every single day. You have to (a) have all that stuff ready on day one if you want people to use a new thing, and (b) getting from where you are today to that point will necessarily involve taking the nice clean thing that seems so much nicer than LaTeX and making it messier, uglier, and more complex. The only thing that makes Markdown, for instance, nice for people to use is that it only does a handful of common things, so it can make those common things simple and conventional. Bold to bold something. (Amusing and apropos to the topic, HN's version of Markdown appears to not allow me to type star-starBoldstar-star. Not with backslashes or any other way I can find). If you want to build a LaTeX clone though, you need to decide: what's going to be the simple, easy-for-people convention we use to denote "don't put a line break here, because these two characters are someone's initials" and "stack these equations in a group, centered on the equal signs, and include the individual equations on lines 1, 3, 5, and 8 in the global numbering of equations, but not the others." You're going to have to define a stylesheet of some sort to govern the rendering engine's myriad options (do I indent the first line of a paragraph, or should everything be left-aligned, but with extra vertical space between paragraphs).
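For reference, here is roughly how LaTeX spells those two examples today - a tie to forbid the break between initials, and amsmath's align with per-line control of numbering (a minimal sketch; the equations are arbitrary):

    \documentclass{article}
    \usepackage{amsmath}
    \begin{document}
    D.~E. Knuth wrote it.   % the tie (~) forbids a line break between the initials

    \begin{align}
      f(x) &= (x+1)^2        \\           % numbered
      g(x) &= x^2 + 2x + 1   \nonumber \\ % this line gets no equation number
      f(x) &= g(x)                        % numbered
    \end{align}
    \end{document}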
CSS is arguably already uglier, messier, and more complex, and while I'm sure it's improving all the time, as of about five years ago, I think the entire internet was almost exclusively composed of porn and articles about how to center something vertically, in roughly equal proportion. Epub is an HTML+CSS based format specifically geared at the kind of thing that you'd need, and just like every other technology we're mentioning, it's terrible unless you're doing left-to-right, top-to-bottom, figure-less, table-less, text where formatting doesn't matter. Just like CSS3, we can say, Epub3 supports more stuff now! Someone let me know when it's safe to buy ebooks with code samples in them instead of getting the paper version.
CSS3 actually supports pagination, automatic numbering and referencing, hyphenation, justified text, and footnotes. At least that's what the spec says...
I'm happy to be corrected on that point, but then the question becomes: what gives us any confidence that the CSS spec is going to be followed in exactly the same way by multiple browser vendors consistently between now and 2034?
Certainly nothing in the history of client side rendering on the web gives me any faith in that proposition.
Also, and this is probably just me, but I find CSS even harder to use for bespoke layouts than TeX. Which gets to the last point I made -- certainly you could replace LaTeX with an equally capable substitute, but it's not clear that the substitute wouldn't necessarily recreate a lot of what people hate about LaTeX. Markdown is almost universally loved precisely because it can't do very much. The more features you add, the more cumbersome the mechanism to select them needs to be, and at some point, you just have LaTeX with angle brackets and tag selectors instead of curly braces.
For ACM style papers you need a two-column layout. On the first page at the bottom of the left column must be a copyright notice. As far as I know, CSS cannot do that.
> HTML+browsers have been carefully designed and optimized by people who know the intricacies of document rendering.
You're joking, right? HTML+CSS requires heaps of workarounds to achieve the most trivial layouts. The people behind these standards have no understanding of documents and no taste in software: they deem the absence of variables in CSS a feature, and the result is Less, scss, and similar preprocessors.
Had the CSS committee at least the sense to copy the boxes-and-glue model from TeX, things might not be so grim. As is, we seem to be stuck with their clumsiness for a long time.
The typesetting quality of web browsers doesn't even compare to that of TeX, which uses a dynamic programming algorithm to minimize the "badness" caused by line breaks in various places. This is aside from TeX's ability to typeset math.
> HTML+browsers have been carefully designed and optimized by people who know the intricacies of document rendering
I think HTML+browsers is something that has been cobbled together as well. Many times. With the added joy of useful features killed by political or profit driven reasons.
This is a really misleading statement, especially since it comes after him stating that "LaTeX is free in both senses".
LaTeX input files apparently lack a standard specification, which is admittedly bad, but then again, so do the input files of many programming languages that some people on HN are writing on a daily basis[0].
This is not the same thing as them being proprietary; anyone can write a new LaTeX parser and there is nothing stopping them legally or technically from doing so.
[0] I may be wrong, but I believe PHP and Ruby both fall under this category. Markdown is another example (everyone parses it in a slightly different way, and while it's generally consistent, there are definitely warts around the edge cases where it's clear that Markdown would benefit from having a standard).
By his definition, every document ever written in any DSL is in a proprietary format.
Personally, I'm perfectly happy if I can open a document in a text editor and get the content that way. That obviously works perfectly fine with LaTeX, it doesn't work at all with doc.
> By his definition, every document ever written in any DSL is in a proprietary format.
I think that's basically true. The good formats are formats that have a) a standard but more importantly b) multiple independent implementations.
> Personally, I'm perfectly happy if I can open a document in a text editor and get the content that way. That obviously works perfectly fine with LaTeX, it doesn't work at all with doc.
You can get the content, but not the formatting, which was presumably important if you were using latex. It's pretty trivial to extract the plain text from a word doc too.
I like Latex (Math/CS background), but I would definitely like to try some alternatives (like Asciidoc discussed in https://news.ycombinator.com/item?id=8509062 ). The problem is I never can tell which Markdown/Docbook inspired systems actually have working implementations and which ones are hot air. I don't want to end up with a big SGML mess that I can't do anything with or have to edit CSS just to render a book in a standard format.
Any recommendations/tutorials?
My ideal system would allow something like literate-programming/sweave/knitr. The notation could be any of markdownish/xml/ascii. I would have to be able to do call-outs/annotations on listings with tags (not insane/fragile region specifications). I need figures and charts. And I would have to be able to produce at least HTML5, PDF, epub, mobi. And I need support for footnotes, endnotes, tables of contents, indexes, and bibliographies. Flexibility in back-ends (like being able to render to Latex) would also be good.
Edit: the sweave/knitr thing I can live without (it could probably be arranged with a pre-processing step).
asciidoc is quite nice. It can output DocBook which I think is pretty well supported.
I'm currently writing a book in asciidoc which produces output in all your mentioned formats and includes footnotes, endnotes, call-outs, code listings, index etc. However, whilst I write almost exclusively in asciidoc, a lot of the styling etc is done by the publisher's docbook workflow.
Sphinx has most of these, though not all (it doesn't have the literate programming part, it could probably be done though; and the standard builders are html4). And I'll acknowledge that ReST is hard to love.
I wish the meme about the supposed superiority of "declarative" languages would go away. They have tradeoffs, like everything else. "Make" is also declarative — and horrific.
TeX has plenty of warts by modern standards (and the LaTeX macro package even more so), but the suggestion that HTML+CSS work better for general layout use is ridiculous (the standards committee only heard that multi-column layouts are impossible without major hackery what, last year?). I tried docbook for a document a while ago, and it was horrible. SGML might be acceptable for machine generation, but not for human writing. The toolchain is even worse than TeX's, hard as that may be to believe.
A replacement for TeX would be fantastic, but its absence over the last 30 years suggests that it's difficult to get right and achieve critical mass.
His trouble seems to be not with LaTeX, but with materialism.
No matter what medium he puts his texts down in, he will find that the text is somewhat bound to that medium and that it'll take work to modify and/or transfer it.
Pen and paper is "proprietary" in this line of thinking.
I don't think he understands what the word "proprietary" means. It means that the file format is secret or there are legal constraints on its use. Until recently Word documents clearly counted as proprietary, but now that Microsoft has been forced to document its format to some degree, .doc is listed as "controversially" proprietary on Wikipedia.
Even today you will need to buy Microsoft Word to "properly" read .doc files. There are other readers (libreoffice) but they only do the basics, and in my experience they usually mangle the file.
LaTeX has never been proprietary because it has always been publicly documented.
The point is, are there any other applications that will render a LaTeX document correctly that aren't LaTeX itself? It's open source, but in practice the code is so complicated and large no one has ever duplicated it. It's not proprietary in the traditional sense, but if you want your LaTeX file to not be mangled, you must use LaTeX itself.
There are many programs that implement subsets of LaTeX/TeX. For example, for math layout I believe Mathjax and matplotlib have both essentially copied Knuth's program, based on his "TeXBook" and "TeX: The Program" books, which document the TeX code extremely thoroughly using his "literate programming" technique. TeX is one of the best documented programs in existence.
Google "Latex implementation" and you will see a lot of hits. I see a Java implementation, Windows implementations, LaTeX3 and LuaTeX are referred to as reimplemetations 'nearly from scratch', etc.
And here's the problem: all of them are mutually incompatible in most situations. Even moving .tex documents over different platforms is an enormous pain, and pdflatex/xelatex don't possess the error reporting you need to quickly find which packages are missing on which systems.
Don't get me wrong, I love LaTeX, and I agree that the author misinterprets 'proprietary', but from a user standpoint the problem is the same: old documents are not rendered correctly, and new documents don't work with old compilers. It's a mess.
> all of them are mutually incompatible in most situations.
I haven't compiled against every implementation, but I did just recently rerun a report I created 4 years ago under a different engine. I spent about 20 minutes addressing the new complaints, and when I finished, the generated pdf looked exactly the same as the old one - but with up-to-date data. Try that with html :) Heck, I've had the exact same experience switching C compilers. I do agree with you about the crappy error reporting. There is a reason why the Library of Congress is bundling data with binaries now; this is a very common problem - but in my experience Latex has fared much better than most formats.
The multitude of third-party .doc readers would seem to disprove that assertion.
In any case, it's clear from the text that he uses "proprietary" to mean "specified only by the canonical implementation". In this respect, .tex qualifies but .doc no longer does, although .doc is so bizarre and complex that writing another parser from the spec is... challenging.
Libraries such as wv have been built by reverse engineering the format, not from official specifications. The latter turned out to be pretty much useless, as they didn't contain enough information to actually parse .doc files in the wild.
> I am sorry but I don't see this point... Everybody is free to write a parser for .tex files and use it for whatever reason they want...
It's great that you have the freedom to do that in theory. But it doesn't work in practice. The .tex format doesn't have a spec or independent implementations; it's complex and idiosyncratic, and there are no good general-purpose conversions from .tex to other formats (e.g. markdown, html). The only program you can really use .tex with is latex.
Pandoc can't replicate everything LaTeX does. It can take a heavily restricted subset of LaTeX and convert it to other markup languages. Nobody to date has duplicated LaTeX quirk for quirk.
FTA: > LaTeX input files are proprietary to LaTeX, just as .doc is Word.
I must be missing something --- LaTeX .tex documents are written in plain ASCII text files with pseudo-English tags indicating generally how text is to be processed (italics, bold, etc.).
FTC: > Everybody is free to write a parser for .tex files and use it for whatever reason they want...
Exactly. Pandoc supposedly converts from LaTeX into many other formats (although I haven't personally tried any of those particular conversions).
He means "de-facto proprietary" in that there is no standard for the output of LaTeX, except whatever LaTeX outputs. That means anyone who wants to build anither version has a huge amount of work to do endlessly duplicating the quirks of the original implementation. Imagine HTML being defined as "whatever Firefox does". You're chasing a very complicated moving target and you'll always be behind if you aren't just copying the source wholesale.
As to your second point, he mentions LaTeX converters in the article, saying you must write in a very restricted subset of LaTeX for it to convert properly. Obviously, pandoc doesn't have any way to turn everything LaTeX does into a markdown file.
That being said, I personally like LaTeX a lot. But I wanted to clarify the points the author was making.
I worked in Windows Server when Microsoft was under the US DOJ consent decree and had to document everything that looked at all like an API--even internal things that were just APIfied for design reasons / ease of testability / to make servicing simpler.
I can say with some confidence that no one gave a shit about producing good quality docs. Without exception, people viewed the government requirement as onerous and excessive and we produced docs that were perhaps technically correct, but did not provide insight into why things were the way they were. No effort at ease of readability was made, either.
It is really a shame that you guys didn't use this new requirement to improve your product and internal process. Your comment comes off as a group that was just obeying the letter of the law, not the spirit of it, and I can only guess that this attitude would easily spill over into all cases of documentation, even the cases where it matters. Having a large group of developers believe that it isn't worth the time to make good APIs and so produce worse-than-horrible docs is really sad. Taking the time to create a good API, even for internal use, can uncover design flaws, reduce errors, make it faster to make changes, easier to test, and faster to bring in new developers. Here, with a government mandate, you could have used it as an excuse to grow as a group and become better at creating software.
I can see how this comes off as an insular group sticking it to the government, but that's not the case.
If I gave the impression that we didn't create good APIs or good docs, I apologize.
We did, but that's not what the government wanted, so we gave them what they would accept. The government just was not very good at deciding what has to be documented and what doesn't. E.g., we had to document sample wire traces of messages that are all auto-generated through IDLs and sent over a standard protocol. Rather than 2 pages of IDL and a comment saying we use transport X (which is defined in RFC blah), we were actually required to submit 100 pages of traces. That obscures; it does not help.
Even if you wanted to do a great job of producing docs, we quickly learned that the process wasn't about creating great docs; it was about producing docs that the government would accept. Have you seen Office Space? It's that. It's thankless, because you're generating shit docs that aren't relevant that are judged by people who don't have the skills to judge them.
Even a half-assed effort to produce a document no one cares about is "very well" compared to the majority of mission critical and/or open source systems out there for which the only documentation is a README and, if you're lucky, some mailing list archives.
My understanding was that it'd be impossible to make a 100% compatible docx parser even if armed with those docs. As an example, when the EU forced the issue I remember seeing stories about XML fields which simply contained undocumented blobs.
If you don't already know how to implement them you aren't supposed to implement them. The spec even tells you not to implement them (and Microsoft does not implement them). They are there for third parties who reverse engineered ancient Word and WordPerfect formats and built tool chains around them, and want to move to a newer format but need to mark places where they depend on quirks of those ancient programs.
Here's the use case this is aimed at. Suppose I run, say, a law office, and we've got an internal document management system that does things like index and cross reference documents, manage citation lists, and stuff like that. The workflow is based on WordPerfect format (WordPerfect was for a long time the de facto standard for lawyers).
Now suppose I want to start moving to a newer format for storage. Say I pick ODF, and start using that for new documents, and make my tools understand it. I'd like to convert my existing WordPerfect documents to ODF. However, there are things in WordPerfect that cannot be reproduced exactly in ODF, and this is a problem. If my tools need to figure out what page something is on, in order to generate a proper citation to that thing, and I've lost some formatting information converting to ODF, I may not get the right cite.
So what am I going to do? I'm going to add some extra, proprietary markup of my own to ODF that lets me include my reverse engineered WordPerfect knowledge when I convert my old documents to ODF, and my new tools will be modified to understand this. Now my ODF workflow can generate correct cites for old documents. Note that LibreOffice won't understand my additional markup, and will presumably lose it if I edit a document, but that's OK. The old documents I converted should be read-only.
Of course, I'm not the only person doing this. Suppose you also run a law office, with a WordPerfect work flow, and are converting to an ODF work flow. You are likely going to add some proprietary markup, just like I did. We'll both end up embedding the same WordPerfect information in our converted legacy documents, but we'll probably pick different markup for it. It would be nice if we could get together, make a list of things we've reverse engineered, and agree to use the same markup when embedding that stuff in ODF.
And that's essentially what they did in OOXML. They realized there would be people like us with our law offices, who have reverse engineered legacy data, that will be extending the markup. So they made a list of a bunch of things from assorted past proprietary programs that were likely to have been reverse engineered by various third parties, and reserved some markup for each.
I use latex daily and tolerate it. If you can offer me a tool that does a similar job (conforms to ieee journal style specifications and deals with citations well) then I would love to hear about it.
The biggest reason I do like latex, is that it allows me to put 1 sentence per line and has a text based format that git handles well. This makes collaborative editing and writing much more manageable.
I understand some of the negative things about LaTeX but it generates outstanding documents that no other processing system can match so far. Specifically the typography:
>LaTeX will never be able to effectively render a LaTeX document to a web browser’s finite view of an infinitely tall document or read it aloud, because its output primitives are oriented around print media, and its apparently declarative constructs are defined in terms of them.
That is somewhat untrue. You can lay out content in a box of fixed width, and then set the page size to that plus margins. Now perhaps the performance isn't good enough for real time rendering, but setting up custom schemes is very possible and not as difficult as people might fear.
And LaTeX already supports PDF search, so I don't see why it could not support accessibility features like speaking the text.
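A minimal sketch of that fixed-width idea, using the geometry package with arbitrary dimensions (certainly not the only way to set it up):

    \documentclass{article}
    % Page size derived from a fixed content width plus margins.
    \usepackage[paperwidth=12cm, paperheight=18cm, margin=1cm]{geometry}
    \begin{document}
    Body text laid out in a fixed-width column on a page sized to fit it.
    \end{document}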
Good news for the OP: The OP seems to want more output options than just paper or PDF. It appears that the OP also wants HTML output. Okay.
TeX and LaTeX say next to nothing about the final physical form of the document and, instead, leave all that to an appropriate device driver. That is, TeX (and likely LaTeX) puts out a device-independent file, with the three-letter file name extension DVI. Basically a DVI file says: put this character here on the page, then put the next character there on the page, etc. Then move to a new page.
Well, then, it would appear that there could be a device driver that would convert a DVI file to HTML. And there should be a way to have the HTML file make use of suitable fonts and the special math symbols. Besides, Unicode now has a lot of characters and symbols.
It appears that the OP feels that typing a paper into TeX or LaTeX somehow locks him into TeX in a bad way. But TeX is fully open source, with some of the best and most beautiful software documentation ever written.
There are latex to HTML converters and they only work for a subset of latex functionality. It is not as simple as defining a driver that outputs HTML.
Latex and HTML work in fundamentally different ways: latex typesets for a fixed paper size, and specifies its coordinates in physical dimensions. HTML is free flowing; if the user resizes the window the layout should adapt, the layout has to work on mobile devices, etc. The HTML way asks for a completely different way of designing layout, and Latex is simply not the right tool for that job.
Incidentally I believe that the Latex way of typesetting for a specific paper size is superior, because it allows the typesetter to manually arrange everything until it looks just right, whereas with HTML there are x number of browsers, with y number of screen sizes, and you have much less control over the final look.
Let me be more clear: Long ago a friend kept suggesting that I write a converter from TeX (maybe also LaTeX) to HTML. I kept telling him that that was essentially impossible, because TeX is a programming language, likely Turing machine equivalent, complete with if-then-else, allocate-free, and file read-write, while HTML is just a text markup language. No doubt JavaScript is Turing machine equivalent, but I'd have a tough time believing that HTML is.
So, my suggestion here was not to convert TeX input to HTML.
Instead my suggestion was just to convert TeX output, that is, a DVI file, to HTML. Why? Because a DVI file is essentially just text, or, as I outlined, it specifies: put this character at these coordinates on the page, put that character there on the page, go to a new page, etc.
To be more clear about the file reading and writing: that happens when the TeX program reads the user's TeX input and before the DVI file is generated. Given only the DVI file and displaying it, there is no file reading or writing.
So, it looks like one could convert TeX DVI output to HTML.
You pointed out that maybe HTML with a browser has more flexibility than TeX output. Okay, maybe. But I didn't claim that, given an HTML file, there would be a TeX input file and a corresponding TeX DVI output file that my envisioned converter would convert to the given HTML file. Instead, I just claimed that for a given TeX and DVI file, the converter would generate an HTML file.
Or, the converter would be a function from the set of all TeX DVI files to the set of all HTML files. That is, for each TeX DVI file there would be a corresponding HTML file from the converter. But the function would not be onto the set of HTML files; that is, not all HTML files would be a value of the converter. Not all HTML files could be obtained by using TeX input, the TeX program, the DVI file, and the envisioned converter.
You also mentioned some ways in which HTML, say, with <div>, is more flexible than TeX. Fine. But I was discussing just converting TeX DVI to HTML.
And, again, I see no way to convert TeX input, which is a programming language, to HTML, which is not a programming language.
The point isn't to convert a TeX program into an equivalent HTML program, it's to have HTML be an output of a TeX program. For example, make \emph{foo} output "<i>foo</i>" instead of "/Times-Italic 12 selectfont (foo) show" or whatever the PS output would be.
> The point isn't to convert a TeX program into an equivalent HTML program.
You are correct, of course. And one of my main points is that such a conversion is essentially impossible. E.g., TeX can read and write files, but, thankfully for Internet security, HTML can't.
So, my solution and envisioned converter is to convert TeX output in a DVI file to an HTML file. Such a converter seems doable and would address a concern the OP had.
Further, my envisioned converter from DVI to HTML would do just what you are describing. The DVI file has to put the 'f' of 'foo' at some coordinates (x,y) on the page in some font, say, some bold font. Fine. TeX can handle lots of fonts, the many standard ones and more, if you want to make routine use of the ability of TeX to handle essentially any font given in the form TeX wants.
Want to create your own fonts? Knuth has been there, done that, and left a terrific tool, MetaFont, open source, with beautiful documentation. Create all the fonts you want and have TeX use them. Then create equivalent fonts for HTML that a Web browser can use. Such work with fonts is just making routine use of what TeX has had for decades.
So, from the DVI, write to the HTML the markup string <b>f</b> at a position given by absolute coordinates, while also specifying the desired font. That's about all there is to it. Seems quite doable to me.
Want to convert to PS? Okay, from the times I read the big, red Adobe books on PS, converting from DVI to PS is also quite doable. Indeed, there is likely a TeX device driver for that conversion now, as there is from DVI to PDF -- which I use heavily. Indeed, checking, my script for converting DVI to PDF uses the EXE I:\protex1p2_run\miktex\bin\dvipdfm.exe, and that EXE is standard in the TeX world. It works fine.
Any reasoning about TeX being able to do things that HTML can't is irrelevant. TeX -> PDF can be done without an intermediate DVI stage using pdftex. There could therefore be a similar "htmltex" which could directly convert TeX -> HTML.
In the same way that pdftex has the advantage of knowing its output format (and can e.g. write pdf metadata), this hypothetical "htmltex" would know that its output is html, and could do things like allowing paragraph re-flow and embedding maths using MathJax.
Of course, this wouldn't be easy, you'd likely need to fork TeX to implement it correctly (or only support a subset of LaTeX features like the current TeX->HTML converters), but it's far from impossible.
You are correct. And I am correct. But we are not talking about even a little bit of the same thing. Once again I will try to be clear:
Knuth's work resulted in a computer program, TeX, as an EXE file, say, tex.exe.
A user of TeX as a word processor types in a file with the three-letter extension TEX, say, my_math.tex. This file, my_math.tex, actually is a computer program; that is, it has allocate-free storage, if-then-else, file read-write, arithmetic, string manipulations, etc. This computer program my_math.tex is not Knuth's program tex.exe.
Yes, maybe not all TeX users have their TeX input files, say, my_math.tex, do file reading or writing, but such file manipulations are just routine usage of TeX that I do nearly always. And I have some TeX macros I wrote that do storage allocation and freeing. Maybe not all TeX users do such things, but they are routine usage of TeX, and I do them.
To be more clear on just why file my_math.tex is a computer program: when Knuth's tex.exe runs file my_math.tex (interpretively), the program my_math.tex can read files. Then the output my_math.dvi can vary depending on what was in the file, say, my_math.dat, that program my_math.tex read.
Well, there can be no file my_math.htm that will read a file my_math.dat, that is, read the file and process it the way my_math.tex can. So, if only for this reason, program my_math.tex can never be translated to a file my_math.htm. And program my_math.tex can't be translated to my_math.pdf or my_math.ps either.
But a file my_math.dvi, produced from my_math.tex and a particular my_math.dat, can be translated to a file my_math.pdf or my_math.ps. And in this thread I have been suggesting that there could be a program that would translate my_math.dvi to my_math.htm.
> TeX -> PDF can be done without an intermediate DVI stage using pdftex.
Although this is a small point, for pdftex I am quite sure that internally a DVI file is generated, if only because that is what Knuth's program tex.exe generates, and rewriting Knuth's TeX code, likely now in C, say, tex.c, would be both unnecessary and the difficult approach. Just generating the DVI file is the easy approach, even if you don't make the user aware of the intermediate DVI file.
What pdftex does, I do frequently by putting in the extra step of going to DVI and then from DVI to PDF. Fine. I want the DVI file because I like the DVI preview program I have, and I like it much more than using a PDF viewer. When I get something that looks good with my DVI preview program, then usually I go ahead and make the PDF file.
However, what I am doing in getting a PDF file, and what you are talking about with pdftex, are not, in the sense I am discussing, a translation of TeX to PDF. Not at all.
> Any reasoning about TeX being able to do things that HTML can't is irrelevant.
True for what you are talking about. False for my point that a file my_math.tex can't be translated to a file my_math.htm. Or, for a short explanation: you are saying that a file my_math.dvi can be translated to file types PS and PDF and maybe also HTM, and I agree. But I am also saying that a file my_math.tex cannot ever be translated to a file my_math.htm.
To be still more clear, HTML is a mark-up language, and TeX looks like it is also a mark-up language, so one might try to translate TeX mark-up to HTML mark-up. Well, such a translation is just impossible, and always will be.
>HTML is free flowing; if the user resizes the window the layout should adapt, the layout has to work on mobile devices, etc. The HTML way asks for a completely different way of designing layout, and Latex is simply not the right tool for that job.
I agree that HTML should re-flow. Let's try with the OP article.
Though he points out this mostly applies to philosophy papers. Many of the points do not really apply in some other scientific fields (I usually had no trouble submitting LaTeX papers to CS journals/conferences).
The primary strength of using LaTeX is math typesetting. If you're not writing equations the argument gets to be very subjective. If you are writing equations there's nearly no alternative (at least one nearly as well proven).
> What is the alternative? The author does not propose any.
The author is an academic. He is concerned about writing papers. His solution is to write papers in Word, submit to journals, and let the publisher worry about the final layout.
The article mentions markdown, HTML. I think markdown is more practical for actually writing in, and combined with pandoc it can be very powerful. Both of these formats should work well with version control. Personally, I find that markdown's minimalism goes very nicely with git. As long as you use suitable line wrapping it generates very concise and helpful diffs.
Bonus: if you're using pandoc you get native use of LaTeX's math mode.
Even if you're using pandoc + markdown you're still dependent on LaTeX, so the question to ask is really: why not write in LaTeX directly to begin with? If you already know it, and you do if you are in certain parts of academia, it's probably the easiest route. Or put differently: nobody has ever gotten rejected for typing in LaTeX, if you pardon the expression.
Wrote my thesis in IDML, worked surprisingly well, yet Adobe InDesign is a beast of its own. And once you run into more problems you're pretty much on your own.
Surely any modern complete (La)TeX replacement would be a good thing to have, but I haven't found one yet, so LaTeX IMHO still remains one of the best choices when it comes to writing/publishing stuff.
I think that reStructuredText could be a nice foundation for some more generic writing/publishing solution, where TeX notation could still be used for math environments (as I don't know any better one for that). Markdown is too vague, imprecise and inflexible, and CommonMark - a strongly specified, highly compatible implementation of Markdown - is not much better, mostly due to Markdown compatibility.
EDIT: AsciiDoc could be also used instead of reST.
I too agree with all of his points... but what's a realistic alternative stack that satisfies those points without drastically cutting down on the available rendering tools and packages for specialized tasks?
If .rtf/.doc is in such high demand, can't we output to those formats using LaTeX? I think of it as just another output alongside dvi/pdf/etc, but I know very little of the internals that would generate those additional formats.
I mean, I view .DOC as worse than Latex in terms of the ability to correctly render it in the future, the ability to generate complex documents correctly from originals, the ability to programmatically interact with it, and generally anything to do with the future.
I'm tempted to go down some XML path, because that separates concerns between the semantic structuring of the document/corpus and the rendering of it, but is that really better than just using a declarative subset of LaTeX and worrying about correctly implementing the styling scripts to render them as desired?
I have my doubts it would really be an improvement.
For context, I have a project at work coming up for which I have a bit of time to establish a toolchain and our format for things like documentation, specifications, etc. I'm open to the suggestion I should spend some of that time working on a system to make sure we don't hit a rendering issue on a technical manual in a few years when technologies change. (I'd also like to look in to literate programming tools, so semantic demarcation for automatic selection of certain kinds of elements in the document is high on my list of things to look in to, as well as relationships between and metadata in those blocks.)
I'm just not convinced that trying to replace Latex with XML or anything of that nature is actually going to make my life better in those regards, rather than being a waste of time.
(If you haven't noticed, XML is sort of the main alternative to Latex in my mind for the things I'm trying to do; perhaps there are better options.)
It would be more rational to output to HTML5, since there are insane amounts of HTML to X converters around (APIs and tools alike). PDF or Doc from HTML is utterly trivial at this stage.
I am very tempted to try to write my next publication in HTML. However, I seriously worry about things like footnotes, code examples, floating figures and references. CSS3 seems to have support for many of these, but I wonder how well the convert-to-PDF pipeline really works, and how flexible it really is.
It's bad enough if I have to convert my original source to some other format years down the line, but it is absolutely critical that I can at least create the initial PDF correctly.
When I started grad school, I initially used LaTeX to prepare my articles. But then my advisor tried to open one of my documents on his machine (Windows, I'm using a Mac) and some kind of weird error came up. So we spent one whole advising session with him searching for MiKTeX and installing it, whereupon my document still wouldn't load for whatever reason.
Then once BibTeX got in the mix, it became even messier. Well, I finally finished the article and submitted it to a journal, but they had so many requirements for LaTeX submissions that it took a while to change everything for it. Once I finally submitted it -- surprise! -- their online LaTeX compiler came back with another weird error. It took me about 8 more submission attempts to isolate the "bug" (which was some kind of issue due to differences in versions and default packages installed on their machine vs mine), and I finally got a PDF generated. Except it still didn't look the same as mine for whatever reason.
Well, the article got rejected from that journal, and so as I'm submitting to another journal, I read "Word documents only".
So I gave up on LaTeX. Too much of a hassle for me. Maybe it's a smooth process for everyone else, but I don't have any problems just typing a Word document and sending it off. (Although I do miss LaTeX's equation typesetting system.)
TeX sources are only for co-authors. Everybody else gets the generated pdf. Advisors et al can annotate the pdf or print-and-scribble. All my publishing venues only gave templates and required a pdf.
I think LaTeX has one disadvantage, and it is not mentioned in the linked article. It has nothing to do with this "cargo cult" thing, which I'm not sure is used correctly in this text, but never mind that.
The disadvantage is, that while LaTeX has an excellent support for PDF or PS its support for e.g. EPUB is awful.
Otherwise there is nothing better than LaTeX for writing longer texts.
> The disadvantage is, that while LaTeX has an excellent support for PDF or PS its support for e.g. EPUB is awful.
Well, yeah, but ePub is just a packaging format for HTML as input to a system that does its own layout and pagination, while LaTeX is a layout/pagination system. Using it to generate ePub makes about as much sense as using an ePub reading system to generate LaTeX.
It'd be better to just have end-user, device-side apps that compile LaTeX to PS/PDF/etc. after plugging in a device-appropriate page size, using LaTeX as the distribution format and the rendered format as the viewing format, than to use LaTeX to generate ePub.
LaTeX output is tied to a page layout, but the code of a LaTeX document isn't. You should be able to create whatever you want out of it. The commands in the file (just) have to be interpreted differently, and yes, this "just" is the big point, because it is not easy, especially for a language as old as LaTeX.
For me it is odd; I currently rely on LaTeX for my workflow. Creating documents which include other PDF files and follow a certain predefined layout is (for me) very easy to do in LaTeX, especially once the groundwork (the layout) is done.
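For what it's worth, here is a minimal sketch of the kind of thing I mean, using the pdfpages package (the file name is made up):

    \documentclass{article}
    \usepackage{pdfpages}
    \begin{document}
    Some introductory text in the predefined house layout.
    % pull every page of an external PDF into this document
    \includepdf[pages=-]{external-report.pdf}
    \end{document}

Once that skeleton exists, swapping in a different report or tweaking the layout is a very small change.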
It's funny to hear this compared to Markdown. The reference implementation of Markdown uses regexes applied in order---an operationally defined language. Only niche implementations like the beautiful Pandoc---also by a philosopher---use a real parser to provide a less leaky declarative semantics.
LaTeX, and TeX, are the last gasp of a long line of development. That line starts with MIT's RUNOFF, which begat roff, nroff ("New ROFF"), troff, ditroff, psroff, and finally TeX. It's the last of the programming-language-like word processing systems.
TeX assumes your final output is paper. That's an obsolete assumption. The math features are concerned with presentation. You can't cut a formula from TeX and paste it into Mathematica or MathCAD and have it understood. TeX doesn't understand mathematical notation; it just formats it for printing.
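To illustrate what "presentation only" means in practice (the formula itself is arbitrary):

    % This encodes "a built-up fraction with d over dx, the roman word
    % sin, an italic x, an equals sign, the roman word cos, an italic x".
    % Nothing in the markup says "derivative of the sine function";
    % a computer algebra system has to guess that from the rendering.
    \[ \frac{d}{dx}\sin x = \cos x \]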
As much as I like LaTeX, I think it has outlived its usefulness. Let me elaborate a bit...
LaTeX is pretty good for typesetting lots of text, like novels, if you want to output something that looks professional without much effort. It's also good for typesetting lots of mathematical formulas, as the GUI alternatives are pretty tedious to use and the result doesn't look as good.
It's also interesting as a template language, to produce documents from applications, but here's where it starts to become obsolete: there are a lot of alternatives for this that don't force the person creating the templates to be technically inclined or to learn LaTeX.
Also, LaTeX becomes downright irritating when trying to make complex documents. The rules say you let LaTeX choose the looks of your document while you focus on content, but it's impossible not to spend hours fighting it because it's not breaking pages or placing figures where you want it to (and iterating over this endlessly as the document is modified). You just cannot force yourself to let LaTeX do everything by itself.
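The usual escalation in that fight looks roughly like this (a sketch; the packages named are the standard ones, the figure contents are omitted):

    \begin{figure}[htbp]  % let LaTeX choose here/top/bottom/own page
      % ...
    \end{figure}

    % when that fails, \usepackage{float} and pin it down:
    \begin{figure}[H]     % "put it exactly HERE, no floating"
      % ...
    \end{figure}

    % or \usepackage{placeins} and scatter \FloatBarrier commands so
    % figures stop drifting past the section they belong to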
So, it's useful for novels (which might as well be written in plain text, leaving the publisher to do the typesetting) or documents with a lot of maths (which are nowadays most likely also complex documents that cannot look good without human intervention).
I like LaTeX for the technical aspect of it, but in the end it's just a tool to accomplish some goal and, sorry, but Word doesn't consume that many resources anymore on a modern machine...
> LaTeX becomes downright irritating when trying to make complex documents.
Sorry, but this is simply not true. Maybe I could agree with a general "it is irritating to make complex documents", though.
After many bitter fights with WYSIWYG software and 300+ page technical documents, I firmly believe that LaTeX is by far the least irritating way of making them.
Personally, I could just as well use Word for most of my work, but I like not having to use shortcuts or plough through menus to format my text. Markdown is a nice alternative, but it doesn't support referencing figures and tables, nor does it support citations.
I don't know why many people, including the author, skip (or don't even see?) the important point: we are not just miles and miles away, we are in an entirely different reality from one where automatic typography is any good. So the more "declarative" it is (whatever that means), the worse it will look. This is our ugly reality. And a LaTeX document made by someone without a clue (≈ how we want it, simple and declarative) is f*cking MSWord-level ugly.
So LaTeX sucks, but there's not a good alternative?
I'm in the midst of creating a document from scratch with LaTeX for the first time (as opposed to using a template provided to me), and while some things have been annoying, it's mostly been the learning curve of figuring out how to do what I wanted. Tables are a mess, though. It seems like I need to stitch three different packages together to do what I want with my tables.
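For the record, one common trio, roughly sketched (the column contents are invented):

    \usepackage{booktabs}   % \toprule, \midrule, \bottomrule
    \usepackage{multirow}   % cells that span several rows
    \usepackage{tabularx}   % tables stretched to a given width

    \begin{tabularx}{\textwidth}{l X r}
      \toprule
      Name                 & Description & Count \\
      \midrule
      \multirow{2}{*}{Foo} & first item  & 1     \\
                           & second item & 2     \\
      \bottomrule
    \end{tabularx}

Three packages for one table is exactly the kind of stitching that takes getting used to.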
Markdown is a great idea, but are there Markdown-based solutions for 1) figure numbering and caption handling, 2) references, and 3) automatically updating an index?
Anyone in academia needs to worry about all of these, and the solutions in Word are sufficiently worse than LaTeX's, which automatically updates all references and heading numbers, regenerates figures that have changed, and tracks your index for longer documents like theses.
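For anyone who hasn't seen it, the LaTeX side looks roughly like this (label names are made up; the index additionally needs \makeindex in the preamble and \printindex at the end):

    \section{Evaluation}\label{sec:eval}

    As Figure~\ref{fig:setup} in Section~\ref{sec:eval} shows \ldots
    \index{setup!experimental}

    \begin{figure}
      \centering
      \includegraphics[width=0.8\textwidth]{setup.pdf}
      \caption{The experimental setup.}\label{fig:setup}
    \end{figure}

All the numbers sort themselves out over the next compile or two; there is no field updating by hand.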
I don't get the point about declarative vs operational paradigms. LaTeX does a great job separating the data and the processes (it is declarative in this sense), although it allows you to mess with both inside the document for flexibility.
I can't see the fundamental difference from HTML: it indeed lacks "operational" commands, but it is massively used with JavaScript that implements the dirty stuff.
Philosophy journals won't accept manuscripts typeset in LaTeX?!
Journal publishers charge thousands of dollars for subscriptions to individual journals. It seems reasonable to occasionally expect publishers to, you know, do the work for which they are being paid. (And to do it competently.)
As the author mentions, it's a seller's market: the publishers are in a position both to charge thousands of dollars for subscriptions and to make their own lives as convenient as possible at the expense of others'.
Anyway, surely it's no wonder many philosophy journals don't accept LaTeX documents? As far as I know, LaTeX isn't really a thing outside hard sciences.
Maybe specialized applications that give you a WYSIWYG editor while using LaTeX in the background would be useful? Like the one we have built: www.cvsintellect.com. It's a CV / Résumé builder built on LaTeX, but the user does not need to know LaTeX.
LaTeX has one large advantage over lesser text processors which this article does not touch upon:
Quidquid LaTeX dictum sit, altum videtur ("whatever is said in LaTeX seems profound").
Put two documents in front of someone used to quickly scanning scientific articles. One of them is formatted using LaTeX, the other using whatever document template Word happened to be launched with. In my experience the LaTeX-formatted article will be seen as a more reliable source than the Word-formatted one, even if the contents are similar.
In my experience this is something that I do subconsciously, but it's often justified. People who write in Word do so because they are unaware of LaTeX or because they don't know how to use it. The first means they are usually not familiar enough with the field to understand that LaTeX is the default; this lack of understanding is then also present in the content. The second reason is usually indicative of someone who is lazy or is writing junk. If they can't write LaTeX, then it is unlikely that they can write other code.