The biggest advantage of latex is that by (mostly) separating content from presentation, you can use revision control systems like git or svn to collaborate on papers. You can be in that final hour before the paper submission deadline, on a skype call with four authors scattered around the world, all simultaneously editing the paper (each using whatever tools they prefer), and be reasonably confident it won't all end in tears. That's when you really understand the advantage of this latex-style markup. Don't get me wrong - there are lots of things I hate about latex, but I'll almost certainly keep using it because of the workflow it enables.
I agree with this. My thesis committee wanted my thesis written as a research paper (irrespective of whether my work was going to be published or not), but in the final stages the graduate college declined to approve it in spite of my committee's approval, because they require a predefined thesis format. The change of format took me a few minutes, thanks to latex. I can't begin to imagine how much work and anxiety it would have taken to change the entire format at the last minute otherwise.
The latter point is the reason I stopped using LaTeX, even though in years past I was proficient enough to type up lecture notes in realtime during class with little thought (lots of math). Collaboration with it requires everyone involved to know it.
With my adviser and collaborators, I ended up compiling PDFs, printing them, and then they would mark them up by hand because that was fastest. It was not ideal by any stretch.
Word has its faults (cough, figure placement and cross-references), but track changes and adding comments to the manuscript is just spot on and easy. Getting comments and revisions done in Word is an order of magnitude easier, and just leaves me at the end to make sure all the figures are positioned properly with captions. And now that Word finally seems to have gotten Styles and sections working properly (so many horrible disasters with this in previous iterations), it's pretty workable for technical documents.
Even the equation editor now very nearly straight up accepts latex markup for the math to the point where it's good enough for all but heavy theory papers.
But with LaTeX, asking your busy adviser to look at the compiled output, and then mark up a separate file they have to parse in their head to comment on is just too much effort.
My school's thesis template was incompatible with the packages I had to use for a specific notation. I spent the rest of the year constantly wrestling with it. The OP makes a good point: LaTeX ultimately does not separate content and presentation, nor is it declarative.
I'm old enough to have collaborated on papers using troff (strictly speaking psroff) and sccs as a revision control system, long before latex became popular and before PDF existed. So, yes, the primary requirement is, as you say, an ASCII-based markup system to enable this workflow.
However, latex is extremely good at flowing text while applying good style guidelines. Most of the time you know that extending a paragraph by a line won't make anything bad happen - no titles left at the bottom of a page, acceptable stretch of line spacing, and things like that. Of course you can only push this so far before you have to fiddle about moving figures to the right page, etc. This is probably unavoidable, although it would be better to have a bit more direct control over where floats end up. The key thing is that latex does mostly get the small stuff right, so you're not constantly yelling at your collaborators for breaking anything.
I always see this claim that latex separates content from presentation, but I don't see how it's true. As the article says, \emph{hello} is implemented in terms of font commands; it entangles the semantics (emphasis) and the presentation (italicisation), and there is no way to extract the semantics of a latex document and render it in a different format the way you can with e.g. Markdown.
\emph does not always render as italics, it actually depends on the document class you're using, and you're free to redefine it as you see fit. It just indicates that the argument should be emphasized, not how.
Bad example, because \emph means emphasis, not italics, and you can represent the emphasis in any way you want — the italics is just the default. So there is separation of format and content.
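To make that concrete, here is a minimal, runnable sketch (standard article class and the xcolor package; the choice of blue small caps is arbitrary) showing that a preamble can change what \emph means without touching the markup in the body:

    \documentclass{article}
    \usepackage{xcolor}
    % Redefine emphasis to render as blue small caps instead of italics.
    \renewcommand{\emph}[1]{\textcolor{blue}{\textsc{#1}}}
    \begin{document}
    This sentence \emph{emphasizes} a word without ever asking for italics.
    \end{document}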
There is a big difference between the web (HTML + CSS) and LaTeX (LaTeX + styles) worlds in terms of culture, and that defeats the idea of "separating content from presentation".
On the web, the separation of content and presentation is technically a possibility and culturally an actuality. That is, while you have the technical tools to apply such a separation, the toolkit does not force it on you; you can choose to write HTML with in-place defined colors and font sizes and other style elements. That the standards are higher than that, and one is expected to write clean structural HTML and specify style separately with CSS, is just a cultural phenomenon.
In the LaTeX world (the scientific community), separation of content and presentation is also technically feasible, but the culture that maintains it as a standard is missing. Most LaTeX documents are written by scientists who just want the damn thing to look as they prefer, and who apply every nasty trick the system offers to get there. The more computer-savvy ones engage in macro writing to save themselves a few keystrokes, but that's far from a way of document authoring that keeps content and presentation separate.
Over the years I've written nearly 200 papers with about 70 co-authors, probably 75% of them done in latex (most of the rest in some nroff/troff variant). The process of latex authoring usually involves iteration, with different authors contributing different sections, then a certain amount of rewriting of each other's content. Papers typically have a page limit - often 12 pages. But you ignore all that while coming up with the first few drafts. A couple of days out from the submission deadline, someone (often me) starts to panic that the paper is 16 pages and needs to be 12. That's when you start going through and copy-editing, trimming non-essential content and rephrasing for conciseness. Only during the last 48 hours do you start to worry about tuning the layout, because you know from experience that it will likely all change in the last day. In the last day you're doing fine tuning, panicking that the paper is still 13 pages and needs to be 12. Now you do a layout tuning pass, and it's now that authors tend to get into all the nasty tricks like negative vspace, tuning caption distances, and so forth. Even then, your co-authors are probably still modifying content without paying too much attention to the effects on layout. Finally, two hours before the deadline, you've got it down to 12 pages. Now you have to stop changing text without regard to layout.
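For anyone who hasn't lived through that last pass, the nasty tricks usually look something like this - a hedged sketch, with arbitrary lengths and a hypothetical figure file, of the local overrides that creep in:

    % Typical last-48-hours layout hacks; the specific lengths are arbitrary.
    \documentclass[twocolumn]{article}
    \usepackage[demo]{graphicx}               % demo: placeholder box instead of a real image
    \usepackage[font=small,skip=2pt]{caption} % tighten space around captions
    \begin{document}
    \section{Results}
    Some text that absolutely must end on page 12.
    \vspace{-0.8em}   % claw back vertical space before the next paragraph
    \begin{figure}[t]
      \centering
      \includegraphics[width=0.9\columnwidth]{results-plot} % hypothetical file name
      \caption{Throughput versus offered load.}
      \vspace{-1em}   % squeeze the space after the caption
    \end{figure}
    More text that has to fit.
    \end{document}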
Anyway, my point is that in the weeks/months of writing a paper, the messy mixup of content and presentation really only becomes relevant in the last two hours before the submission deadline. Up until that point, everyone is mostly using the same subset of latex, not worrying too much about the presentation part, because the coarse-grain paper layout is usually handled by the style sheet you get from the conference.
There are a few nice things about Latex for scientific publications:
1. It's style driven. I'm not sure how well you can do this in Word now, but in Latex it's pretty easy to reformat your document to match a journal or thesis style (see the sketch after this list).
2. It's scriptable. I don't mean that it's Turing complete; I mean you can drive it with a Makefile. This is great for scientific publications, I've found. You can script it such that Make will re-build your tools, re-run the analysis, generate new figures, and then regenerate the article. This saves a lot of time when you're iterating over a publication.
3. Well integrated reference management. Bibtex itself is a mess, but if you don't need to alter the reference format, it works well.
4. Equations!
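To sketch points 1, 3 and 4 in one place (hedged: the class, the citation key and the refs.bib database are placeholders), swapping the single \documentclass line restyles the whole document, \cite pulls from a BibTeX database, and math is first-class:

    \documentclass{article}   % swap this one line for a journal or thesis class to restyle everything
    \begin{document}
    Prior work~\cite{knuth1984} established the identity
    \[ e^{i\pi} + 1 = 0 . \]
    \bibliographystyle{plain}
    \bibliography{refs}       % refs.bib is a hypothetical BibTeX database containing knuth1984
    \end{document}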
I think it's probably overkill for what he's doing. Sounds like Markdown + a reference manager would be fine for him. But for a lot of scientific publications it has handy features. I too would like to find something else, but I've not seen anything.
I guess you could write everything in HTML! But it's obviously not well suited to this application.
1. has been possible in Word for ages (according to an article recently on the front page, it's been the main difference between Word and WordPerfect), and it's pretty much how you write documents of any size without going insane. I'm actually astonished how many people think that using Word means using direct formatting and eschewing styles.
I really wish people would use Word's features - having the entire document marked up with appropriate headings makes it a breeze to go through and restyle a document.
I find that while collaborating on things, I have to make a pass that involves purely going through and marking up heading text appropriately.
----
I've found the best compromise is to have a sort of technical prologue that discusses how to edit the document, or to force people to work in something like markdown, where the lack of font sizes forces the use of a heading-class notation.
I prefer https://stackedit.io/editor because it includes mathjax support which means you get access to a large swath of LaTeX style math tools.
LaTeX is a good idea with a terrible implementation. The popularity of markdown (+variants) is a testament to the usefulness of plain text writing. However, LaTeX syntax is clunky and the ecosystem is a scrapheap-challenge amalgamation of packages with assorted cross-incompatibilities.
Latex to PDF converters are also shockingly slow for this day and age: a simple document can take several seconds to compile, compared to browsers which can re-flow complex HTML documents in milliseconds.
IMHO this is because of the ad-hoc nature of people using Latex; it's been cobbled together by researchers based on their needs at the time, while HTML+browsers have been carefully designed and optimized by people who know the intricacies of document rendering. Researchers just aren't very good software engineers as a rule, so perhaps it's not surprising that they produce something that more or less works but is not very well designed.
I've said it before and I'll say it again: the single most effective use of ~$1 million for advancing math and physics research (two disciplines for which no non-LaTeX solutions exist) would be to hire some developers for a couple of years and make an enterprise-quality successor to TeX. Keep the math syntax, make it handle infinite pages for the Web, and fix all the awful bits that waste hundreds of thousands of grad-student-man-hours each year.
This isn't fantasy. Zotero is evidence that custom built academic software funded by charitable foundations can be a tremendously positive service to the academic community.
To be fair, LaTeX has a more complicated method of processing text since it considers a lot of typographic issues that browsers do not, so it has to be slower than a browser when rendering text. That said, I don't know enough to say whether the amount it's slower is proportional or not.
There are a couple of issues here. First, Markdown and HTML simply punt on the vast majority of the issues that TeX solves. Just as the author rightly comments that TeX is not geared toward online publication, HTML is geared toward only that model. If you want to paginate HTML or Markdown, you do it yourself. Widows and orphans are (obviously) your problem to deal with. Compared to the work that TeX is doing, Markdown is, to a first order approximation, just catting the file. HTML can reflow a document in real-time because it's doing a really poor job of reflowing the document. Even when they work, they're just putting line breaks in whenever the width would otherwise be too wide. TeX is running a dynamic programming algorithm to minimize the "badness" of the line breaks across multiple lines and even paragraphs. And quite a lot of the time, the browser just throws its hands up and says, "fuck it, you can just scroll horizontally to read the rest of this line". You can't do that on paper. So of course it's faster. You might as well be complaining that Preview is faster than Photoshop.
HTML and Markdown don't do automatic hyphenation (across multiple languages). They don't do ligatures. They don't do proper text justification (neither does Microsoft Word or Libre Office for that matter). They don't do cross reference tracking (i.e., having automatically numbered sections, tables, figures, etc. with automatically updated references). They have no logic at all for automated float placement. Font handling is specified by a human instead of relying on algorithmic font selection and substitution when necessary. I could go on for pages of this.
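As a concrete illustration of the cross-reference point, this minimal sketch (label names made up, graphicx's demo option standing in for a real image) shows the kind of automatic numbering and reference tracking that comes for free:

    \documentclass{article}
    \usepackage[demo]{graphicx}   % demo: draws a placeholder box instead of loading an image
    \begin{document}
    \section{Evaluation}\label{sec:eval}
    \begin{figure}[htbp]
      \centering
      \includegraphics[width=0.6\textwidth]{latency}
      \caption{Latency distribution.}\label{fig:latency}
    \end{figure}
    As Figure~\ref{fig:latency} in Section~\ref{sec:eval} shows, the numbers
    and references update themselves whenever anything moves.
    \end{document}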
I think the idea that web browser vendors are better at this sort of thing than TeX and LaTeX is so wrong I don't know where to start. The author complains that some of his 20 year old LaTeX articles rely on outdated files to render properly. While this is true, and very occasionally a problem, it's only very recently that you had even the vaguest hope that your HTML document would render the same way on two different computers owned by the same person today! Arguably, the biggest slice of the software industry is now devoted to making things render on browsers. And for Markdown, we quite recently saw that even the simplest text rendered in no fewer than 17 different ways depending on which software (and version) you processed it with. If my goal is to be able to reproduce the output of today 15 or 20 years from now, HTML would be the absolute worst choice I could think of, unless again, you stick with <b> tags and the like, and the subset of LaTeX you can reliably assume will always work gives you much broader coverage of the space of typesetting issues than the subset of HTML that doesn't change monthly does. Not to mention, I can still more easily go get an old LaTeX implementation to rebuild a document that doesn't compile anymore (but in 15 years, I've never had to). It's quite a lot harder to get Netscape Navigator 3 up and running to attempt to faithfully render a document I wrote in 1997.
Also, web browsers have historically been just about the buggiest, most insecure, and transient pieces of software we've ever written as a field, and TeX is famously maybe the highest quality piece of software ever written. It's more or less fine that the web changes every 18 months. It's a problem for archivists, but the web isn't really intended for that. Academic publications are though, and the impedance mismatch is, in my opinion, brutal.
The interface (by which I mean the programming language) of TeX and LaTeX is indeed pretty dreadful, but this is a really minor issue compared to the rest of it. There are a lot of things I dislike about LaTeX, but I don't see how HTML or Markdown is an improvement. You'd need a completely new thing that supported everything that LaTeX supports, and while you could certainly do so with a nicer language, you couldn't do it with something as clean and simple as Markdown -- there are just too many things you need to be able to tell it you want it to do.
I disagree that browsers (and I do mean modern browsers; I recognize it hasn't always been this way) are somehow solving an easier problem than tex or doing it in a half-arsed way - on the contrary, they solve the very hard problem of correctly rendering content that might be badly formed or underdefined. I don't think there's anything in tex that you can't do in html5 and CSS - including ligatures, auto numbering, and so on.
As for markdown, that's just an example of how there is a demand for text-based writing (I could also give Restructured Text, which has a much stricter spec than markdown). I think markdown could evolve to fill the Latex niche.
For a better implementation look at pandoc, which very cleanly parses documents to an internal data structure and converts that to a range of outputs; I think that's a much better basis for a document system. At the moment it has to go via Latex to produce PDF - in fairness latex still has the most mature pdf rendering system. I for one would like to see that change, I think we can do better.
As far as I know, every system that can go to LaTeX as an export option gives you a basic LaTeX document. I don't know how you tell Pandoc, for instance, "OK, I need three authors in the author block, centered horizontally, with their affiliations below their names. But authors 1 and 2 have the same affiliation, so only include that information once, but center it below both names as a unit."
How do I tell CSS that I want my bibliography to be sorted by author last name, and have the inline citations be of the form (Author, Year), except when I'm using the author's name in the text as a noun, in which case it should be just "Author (Year) showed that blah blah blah"? For that matter, I don't think CSS can even do justification properly (by properly, I mean not treating each line as an independent unit, but shifting text around within an entire paragraph to minimize deviation from the desired inter-word spacing globally). I know someone implemented TeX's algorithm in Javascript once upon a time, but I'm willing to bet it's not any faster than TeX.
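For the record, in LaTeX that particular citation distinction is a one-word change with the natbib package - a minimal sketch, with a made-up key and a hypothetical refs.bib:

    \documentclass{article}
    \usepackage[authoryear,round]{natbib}
    \begin{document}
    Earlier results support this view \citep{lamport1994}.      % -> (Lamport, 1994)
    \citet{lamport1994} showed that the claim holds in general. % -> Lamport (1994)
    \bibliographystyle{plainnat}
    \bibliography{refs}   % a refs.bib with a lamport1994 entry is assumed
    \end{document}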
I have no real argument against the idea that you could build something that does everything LaTeX does just as well. Clearly you can. I am arguing that LaTeX has a huge amount of really important things already built in, and people use those things every single day. You have to (a) have all that stuff ready on day one if you want people to use a new thing, and (b) getting from where you are today to that point will necessarily involve taking the nice clean thing that seems so much nicer than LaTeX and making it messier, uglier, and more complex. The only thing that makes Markdown, for instance, nice for people to use is that it only does a handful of common things, so it can make those common things simple and conventional. Bold to bold something. (Amusing and apropos to the topic, HN's version of Markdown appears to not allow me to type star-starBoldstar-star. Not with backslashes or any other way I can find). If you want to build a LaTeX clone though, you need to decide: what's going to be the simple, easy-for-people convention we use to denote "don't put a line break here, because these two characters are someone's initials" and "stack these equations in a group, centered on the equal signs, and include the individual equations on lines 1, 3, 5, and 8 in the global numbering of equations, but not the others." You're going to have to define a stylesheet of some sort to govern the rendering engine's myriad options (do I indent the first line of a paragraph, or should everything be left-aligned, but with extra vertical space between paragraphs).
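For reference, here is roughly how LaTeX spells those two examples today - a tie to forbid the break between initials, and amsmath's align with per-line control of numbering (a minimal sketch; the equations are arbitrary):

    \documentclass{article}
    \usepackage{amsmath}
    \begin{document}
    D.~E. Knuth wrote it.   % the tie (~) forbids a line break between the initials

    \begin{align}
      f(x) &= (x+1)^2        \\           % numbered
      g(x) &= x^2 + 2x + 1   \nonumber \\ % this line gets no equation number
      f(x) &= g(x)                        % numbered
    \end{align}
    \end{document}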
CSS is arguably already uglier, messier, and more complex, and while I'm sure it's improving all the time, as of about five years ago, I think the entire internet was almost exclusively composed of porn and articles about how to center something vertically, in roughly equal proportion. Epub is an HTML+CSS based format specifically geared at the kind of thing that you'd need, and just like every other technology we're mentioning, it's terrible unless you're doing left-to-right, top-to-bottom, figure-less, table-less, text where formatting doesn't matter. Just like CSS3, we can say, Epub3 supports more stuff now! Someone let me know when it's safe to buy ebooks with code samples in them instead of getting the paper version.
CSS3 actually supports pagination, automatic numbering and referencing, hyphenation, justified text, and footnotes. At least that's what the spec says...
I'm happy to be corrected on that point, but then the question becomes: what gives us any confidence that the CSS spec is going to be followed in exactly the same way by multiple browser vendors consistently between now and 2034?
Certainly nothing in the history of client side rendering on the web gives me any faith in that proposition.
Also, and this is probably just me, but I find CSS even harder to use for bespoke layouts than TeX. Which gets to the last point I made -- certainly you could replace LaTeX with an equally capable substitute, but it's not clear that the substitute wouldn't necessarily recreate a lot of what people hate about LaTeX. Markdown is almost universally loved precisely because it can't do very much. The more features you add, the more cumbersome the mechanism to select them needs to be, and at some point, you just have LaTeX with angle brackets and tag selectors instead of curly braces.
For ACM style papers you need a two-column layout. On the first page at the bottom of the left column must be a copyright notice. As far as I know, CSS cannot do that.
> HTML+browsers have been carefully designed and optimized by people who know the intricacies of document rendering.
You're joking, right? HTML+CSS requires heaps of workarounds to achieve the most trivial layouts. The people behind these standards have no understanding of documents and no taste in software: they deem the absence of variables in CSS a feature, and the result is Less, scss, and similar preprocessors.
Had the CSS committee at least the sense to copy the boxes-and-glue model from TeX, things might not be so grim. As is, we seem to be stuck with their clumsiness for a long time.
The typesetting quality of web browsers doesn't even compare to that of TeX, which uses a dynamic programming algorithm to minimize the "badness" caused by line breaks in various places. This is aside from TeX's ability to typeset math.
> HTML+browsers have been carefully designed and optimized by people who know the intricacies of document rendering
I think HTML+browsers is something that has been cobbled together as well. Many times. With the added joy of useful features killed by political or profit driven reasons.
This is a really misleading statement, especially since it comes after him stating that "LaTeX is free in both senses".
LaTeX input files apparently lack a standard specification, which is admittedly bad, but then again, so do the input files of many programming languages that some people on HN are writing on a daily basis[0].
This is not the same thing as them being proprietary; anyone can write a new LaTeX parser and there is nothing stopping them legally or technically from doing so.
[0] I may be wrong, but I believe PHP and Ruby both fall under this category. Markdown is another example (everyone parses it in a slightly different way, and while it's generally consistent, there are definitely warts around the edge cases where it's clear that Markdown would benefit from having a standard).
By his definition, every document ever written in any DSL is in a proprietary format.
Personally, I'm perfectly happy if I can open a document in a text editor and get the content that way. That obviously works perfectly fine with LaTeX, it doesn't work at all with doc.
> By his definition, every document ever written in any DSL is in a proprietary format.
I think that's basically true. The good formats are formats that have a) a standard but more importantly b) multiple independent implementations.
> Personally, I'm perfectly happy if I can open a document in a text editor and get the content that way. That obviously works perfectly fine with LaTeX, it doesn't work at all with doc.
You can get the content, but not the formatting, which was presumably important if you were using latex. It's pretty trivial to extract the plain text from a word doc too.
I like Latex (Math/CS background), but I would definitely like to try some alternatives (like Asciidoc discussed in https://news.ycombinator.com/item?id=8509062 ). The problem is I never can tell which Markdown/Docbook inspired systems actually have working implementations and which ones are hot air. I don't want to end up with a big SGML mess that I can't do anything with or have to edit CSS just to render a book in a standard format.
Any recommendations/tutorials?
My ideal system would allow something like literate-programming/sweave/knitr. The notation could be any of markdownish/xml/ascii. I would have to be able to do call-outs/annotations on listings with tags (not insane/fragile region specifications). I need figures and charts. And I would have to be able to produce at least HTML5, PDF, epub, mobi. And I need support for footnotes, endnotes, tables of contents, indexes, and bibliographies. Flexibility in back-ends (like being able to render to Latex) would also be good.
Edit: the sweave/knitr thing I can live without (it could probably be arranged with a pre-processing step).
asciidoc is quite nice. It can output DocBook which I think is pretty well supported.
I'm currently writing a book in asciidoc which produces output in all your mentioned formats and includes footnotes, endnotes, call-outs, code listings, index etc. However, whilst I write almost exclusively in asciidoc, a lot of the styling etc is done by the publisher's docbook workflow.
Sphinx has most of these, though not all (it doesn't have the literate programming part, it could probably be done though; and the standard builders are html4). And I'll acknowledge that ReST is hard to love.
I wish the meme about the supposed superiority of "declarative" languages would go away. They have tradeoffs, like everything else. "Make" is also declarative — and horrific.
TeX has plenty of warts by modern standards (and the LaTeX macro package even more so), but the suggestion that HTML+CSS work better for general layout use is ridiculous (the standards committee only heard that multi-column layouts are impossible without major hackery what, last year?). I tried docbook for a document a while ago, and it was horrible. SGML might be acceptable for machine generation, but not for human writing. The toolchain is even worse than TeX's, hard as that may be to believe.
A replacement for TeX would be fantastic, but its absence over the last 30 years suggests that it's difficult to get right and achieve critical mass.
His trouble seems to be not with LaTeX, but with materialism.
No matter what medium he puts his texts down in, he will find that the text is somewhat bound to that medium and that it'll take work to modify and/or transfer it.
Pen and paper is "proprietary" in this line of thinking.
I don't think he understands what the word "proprietary" means. It means that the file format is secret or there are legal constraints on its use. Until recently Word documents clearly counted as proprietary, but now that Microsoft has been forced to document its format to some degree, .doc is listed as "controversially" proprietary on Wikipedia.
Even today you will need to buy Microsoft Word to "properly" read .doc files. There are other readers (libreoffice) but they only do the basics, and in my experience they usually mangle the file.
LaTeX has never been proprietary because it has always been publicly documented.
The point is, are there any other applications that will render a LaTeX document correctly that aren't LaTeX itself? It's open source, but in practice the code is so complicated and large no one has ever duplicated it. It's not proprietary in the traditional sense, but if you want your LaTeX file to not be mangled, you must use LaTeX itself.
There are many programs that implement subsets of LaTeX/TeX. For example, for math layout I believe Mathjax and matplotlib have both essentially copied Knuth's program, based on his "TeXBook" and "TeX: The Program" books, which document the TeX code extremely thoroughly using his "literate programming" technique. TeX is one of the best documented programs in existence.
Google "Latex implementation" and you will see a lot of hits. I see a Java implementation, Windows implementations, LaTeX3 and LuaTeX are referred to as reimplemetations 'nearly from scratch', etc.
And here's the problem: all of them are mutually incompatible in most situations. Even moving .tex documents over different platforms is an enormous pain, and pdflatex/xelatex don't possess the error reporting you need to quickly find which packages are missing on which systems.
Don't get me wrong, I love LaTeX, and I agree that the author misinterprets 'proprietary', but from a user standpoint the problem is the same: old documents are not rendered correctly, and new documents don't work with old compilers. It's a mess.
> all of them are mutually incompatible in most situations.
I haven't compiled against every implementation, but I did just recently rerun a report I created 4 years ago under a different engine. I spent about 20 minutes addressing the new complaints, and when I finished, the generated pdf looked exactly the same as the old one - but with up-to-date data. Try that with html :) Heck, I've had the exact same experience switching C compilers. I do agree with you about the crappy error reporting. There is a reason why the Library of Congress is bundling data with binaries now; this is a very common problem - but in my experience Latex has fared much better than most formats.
The multitude of third-party .doc readers would seem to disprove that assertion.
In any case, it's clear from the text that he uses "proprietary" to mean "specified only by the canonical implementation". In this respect, .tex qualifies but .doc no longer does, although .doc is so bizarre and complex that writing another parser from the spec is... challenging.
Libraries such as wv have been built by reverse engineering the format, not from official specifications. The latter turned out to be pretty much useless, as they didn't contain enough information to actually parse .doc files in the wild.
> I am sorry but I don't see this point... Everybody is free to write a parser for .tex files and use it for whatever reason they want...
It's great that you have the freedom to do that in theory. But it doesn't work in practice. The .tex format doesn't have a spec or independent implementations; it's complex and idiosyncratic, and there are no good general-purpose conversions from .tex to other formats (e.g. markdown, html). The only program you can really use .tex with is latex.
Pandoc can't replicate everything LaTeX does. It can take a heavily restricted subset of LaTeX and convert it to other markup languages. Nobody to date has duplicated LaTeX quirk for quirk.
FTA: > LaTeX input files are proprietary to LaTeX, just as .doc is Word.
I must be missing something --- LaTeX .tex documents are written in plain ASCII text files with pseudo-English tags indicating generally how text is to be processed (italics, bold, etc.).
FTC: > Everybody is free to write a parser for .tex files and use it for whatever reason they want...
Exactly. Pandoc supposedly converts from LaTeX into many other formats (although I haven't personally tried any of those particular conversions).
He means "de-facto proprietary" in that there is no standard for the output of LaTeX, except whatever LaTeX outputs. That means anyone who wants to build anither version has a huge amount of work to do endlessly duplicating the quirks of the original implementation. Imagine HTML being defined as "whatever Firefox does". You're chasing a very complicated moving target and you'll always be behind if you aren't just copying the source wholesale.
As to your second point, he mentions LaTeX converters in the article, saying you must write in a very restricted subset of LaTeX for it to convert properly. Obviously, pandoc doesn't have any way to turn everything LaTeX does into a markdown file.
That being said, I personally like LaTeX a lot. But I wanted to clarify the points the author was making.
I worked in Windows Server when Microsoft was under the US DOJ consent decree and had to document everything that looked at all like an API--even internal things that were just APIfied for design reasons / ease of testability / to make servicing simpler.
I can say with some confidence that no one gave a shit about producing good quality docs. Without exception, people viewed the government requirement as onerous and excessive and we produced docs that were perhaps technically correct, but did not provide insight into why things were the way they were. No effort at ease of readability was made, either.
It is really a shame that you guys didn't use this new requirement to improve your product and internal process. Your comment comes off as a group that was just obeying the letter of the law, not the spirit of it, and I can only guess that this attitude would easily spill over into all cases of documentation, even the cases where it matters. Having a large group of developers believe that it isn't worth the time to make good APIs and so produce worse-than-horrible docs is really sad. Taking the time to create a good API, even for internal use, can uncover design flaws, reduce errors, make it faster to make changes, easier to test, and faster to bring in new developers. Here, with a government mandate, you could have used it as an excuse to grow as a group and become better at creating software.
I can see how this comes off as an insular group sticking it to the government, but that's not the case.
If I gave the impression that we didn't create good APIs or good docs, I apologize.
We did, but that's not what the government wanted, so we gave them what they would accept. The government just was not very good at deciding what has to be documented and what doesn't. E.g., we had to document sample wire traces of messages that are all auto-generated through IDLs and sent over a standard protocol. Rather than 2 pages of IDL and a comment saying we use transport X (which is defined in RFC blah), we were actually required to submit 100 pages of traces. That obscures; it does not help.
Even if you wanted to do a great job of producing docs, we quickly learned that the process wasn't about creating great docs; it was about producing docs that the government would accept. Have you seen Office Space? It's that. It's thankless, because you're generating shit docs that aren't relevant that are judged by people who don't have the skills to judge them.
Even a half-assed effort to produce a document no one cares about is "very well" compared to the majority of mission critical and/or open source systems out there for which the only documentation is a README and, if you're lucky, some mailing list archives.
My understanding was that it'd be impossible to make a 100% compatible docx parser even if armed with those docs. As an example, when the EU forced the issue I remember seeing stories about XML fields which simply contained undocumented blobs.
If you don't already know how to implement them you aren't supposed to implement them. The spec even tells you not to implement them (and Microsoft does not implement them). They are there for third parties who reverse engineered ancient Word and WordPerfect formats and built tool chains around them, and want to move to a newer format but need to mark places where they depend on quirks of those ancient programs.
Here's the use case this is aimed at. Suppose I run, say, a law office, and we've got an internal document management system that does things like index and cross reference documents, manage citation lists, and stuff like that. The workflow is based on WordPerfect format (WordPerfect was for a long time the de facto standard for lawyers).
Now suppose I want to start moving to a newer format for storage. Say I pick ODF, and start using that for new documents, and make my tools understand it. I'd like to convert my existing WordPerfect documents to ODF. However, there are things in WordPerfect that cannot be reproduced exactly in ODF, and this is a problem. If my tools need to figure out what page something is on, in order to generate a proper citation to that thing, and I've lost some formatting information converting to ODF, I may not get the right cite.
So what am I going to do? I'm going to add some extra, proprietary markup of my own to ODF that lets me include my reverse engineered WordPerfect knowledge when I convert my old documents to ODF, and my new tools will be modified to understand this. Now my ODF workflow can generate correct cites for old documents. Note that LibreOffice won't understand my additional markup, and will presumably lose it if I edit a document, but that's OK. The old documents I converted should be read-only.
Of course, I'm not the only person doing this. Suppose you also run a law office, with a WordPerfect work flow, and are converting to an ODF work flow. You are likely going to add some proprietary markup, just like I did. We'll both end up embedding the same WordPerfect information in our converted legacy documents, but we'll probably pick different markup for it. It would be nice if we could get together, make a list of things we've reverse engineered, and agree to use the same markup when embedding that stuff in ODF.
And that's essentially what they did in OOXML. They realized there would be people like us with our law offices, who have reverse engineered legacy data, that will be extending the markup. So they made a list of a bunch of things from assorted past proprietary programs that were likely to have been reverse engineered by various third parties, and reserved some markup for each.
I use latex daily and tolerate it. If you can offer me a tool that does a similar job (conforms to ieee journal style specifications and deals with citations well) then I would love to hear about it.
The biggest reason I do like latex, is that it allows me to put 1 sentence per line and has a text based format that git handles well. This makes collaborative editing and writing much more manageable.
I understand some of the negative things about LaTeX but it generates outstanding documents that no other processing system can match so far. Specifically the typography:
>LaTeX will never be able to effectively render a LaTeX document to a web browser’s finite view of an infinitely tall document or read it aloud, because its output primitives are oriented around print media, and its apparently declarative constructs are defined in terms of them.
That is somewhat untrue. You can lay out content in a box of fixed width, and then set the page size to that plus margins. Now perhaps the performance isn't good enough for real time rendering, but setting up custom schemes is very possible and not as difficult as people might fear.
And LaTeX already supports PDF search, so I don't see why it could not support accessibility features like speaking the text.
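A minimal sketch of that fixed-width idea, using the geometry package with arbitrary dimensions (certainly not the only way to set it up):

    \documentclass{article}
    % Page size derived from a fixed content width plus margins.
    \usepackage[paperwidth=12cm, paperheight=18cm, margin=1cm]{geometry}
    \begin{document}
    Body text laid out in a fixed-width column on a page sized to fit it.
    \end{document}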
Good news for the OP: The OP seems to want more output options than just paper or PDF. It appears that the OP also wants HTML output. Okay.
TeX and LaTeX say next to nothing about the final physical form of the document and, instead, leave all that to an appropriate device driver. That is, TeX (and likely LaTeX) puts out a device-independent file, with the three-letter file name extension DVI. Basically a DVI file says: put this character here on the page, then put the next character there on the page, etc. Then move to a new page.
Well, then, it would appear that there could be a device driver that would convert a DVI file to HTML. And there should be a way to have the HTML file make use of suitable fonts and the special math symbols. Besides, Unicode now has a lot of characters and symbols.
It appears that the OP feels that typing a paper into TeX or LaTeX somehow locks him into TeX in a bad way. But TeX is fully open source, with some of the best and most beautiful software documentation ever written.
There are latex to HTML converters and they only work for a subset of latex functionality. It is not as simple as defining a driver that outputs HTML.
Latex and HTML work in fundamentally different ways: latex typesets for a fixed paper size, and specifies its coordinates in physical dimensions. HTML is free flowing; if the user resizes the window the layout should adapt, the layout has to work on mobile devices, etc. The HTML way asks for a completely different way of designing layout, and Latex is simply not the right tool for that job.
Incidentally I believe that the Latex way of typesetting for a specific paper size is superior, because it allows the typesetter to manually arrange everything until it looks just right, whereas with HTML there are x number of browsers, with y number of screen sizes, and you have much less control over the final look.
Let me be more clear: Long ago a friend kept suggesting that I write a converter from TeX (maybe also LaTeX) to HTML. I kept telling him that that was essentially impossible, because TeX is a programming language, likely Turing machine equivalent, complete with if-then-else, allocate-free, and file read-write, while HTML is just a text markup language. No doubt JavaScript is Turing machine equivalent, but I'd have a tough time believing that HTML is.
So, my suggestion here was not to convert TeX input to HTML.
Instead my suggestion was just to convert TeX output, that is, a DVI file, to HTML. Why? Because a DVI file is essentially just text, or, as I outlined, it specifies: put this character at these coordinates on the page, put that character there on the page, go to a new page, etc.
To be more clear about the file reading and writing: that happens when the TeX program reads the user's TeX input and before the DVI file is generated. Given only the DVI file and displaying it, there is no file reading or writing.
So, it looks like one could convert TeX DVI output to HTML.
You pointed out that maybe HTML with a browser has more flexibility than TeX output. Okay, maybe. But I didn't claim that, given an HTML file, there would be a TeX input file and a corresponding TeX DVI output file that my envisioned converter would convert to the given HTML file. Instead, I just claimed that for a given TeX and DVI file, the converter would generate an HTML file.
Or, the converter would be a function from the set of all TeX DVI files to the set of all HTML files. That is, for each TeX DVI file there would be a corresponding HTML file from the converter. But the function would not be onto the set of HTML files; that is, not all HTML files would be a value of the converter. Not all HTML files could be obtained by using TeX input, the TeX program, the DVI file, and the envisioned converter.
You also mentioned some ways in which HTML, say, with <div>, is more flexible than TeX. Fine. But I was discussing just converting TeX DVI to HTML.
And, again, I see no way to convert TeX input, which is a programming language, to HTML, which is not a programming language.
The point isn't to convert a TeX program into an equivalent HTML program, it's to have HTML be an output of a TeX program. For example, make \emph{foo} output "<i>foo</i>" instead of "/Times-Italic 12 selectfont (foo) show" or whatever the PS output would be.
> The point isn't to convert a TeX program into an equivalent HTML program.
You are correct, of course. And one of my main points is that such a conversion is essentially impossible. E.g., TeX can read and write files, but, thankfully for Internet security, HTML can't.
So, my solution and envisioned converter is to convert TeX output in a DVI file to an HTML file. Such a converter seems doable and would address a concern the OP had.
Further, my envisioned converter from DVI to HTML would do just what you are describing. The DVI file has to put the 'f' of 'foo' at some coordinates (x,y) on the page in some font, say, some bold font. Fine. TeX can handle lots of fonts, the many standard ones and more, if you want to make routine use of the ability of TeX to handle essentially any font given in the form TeX wants.
Want to create your own fonts? Knuth has been there, done that, and left a terrific tool, MetaFont, open source, with beautiful documentation. Create all the fonts you want and have TeX use them. Then create equivalent fonts for HTML that a Web browser can use. Such work with fonts is just making routine use of what TeX has had for decades.
So, from the DVI, write to the HTML the markup string <b>f</b> at a position given by absolute coordinates, while also specifying the desired font. That's about all there is to it. Seems quite doable to me.
Want to convert to PS? Okay, from the times I read the big, red Adobe books on PS, converting from DVI to PS is also quite doable. Indeed, there is likely a TeX device driver for that conversion now, as there is from DVI to PDF -- which I use heavily. Indeed, checking, my script for converting DVI to PDF uses the EXE I:\protex1p2_run\miktex\bin\dvipdfm.exe, and that EXE is standard in the TeX world. It works fine.
Any reasoning about TeX being able to do things that HTML can't is irrelevant. TeX -> PDF can be done without an intermediate DVI stage using pdftex. There could therefore be a similar "htmltex" which could directly convert TeX -> HTML.
In the same way that pdftex has the advantage of knowing its output format (and can e.g. write pdf metadata), this hypothetical "htmltex" would know that its output is html, and could do things like allowing paragraph re-flow and embedding maths using MathJax.
Of course, this wouldn't be easy, you'd likely need to fork TeX to implement it correctly (or only support a subset of LaTeX features like the current TeX->HTML converters), but it's far from impossible.
You are correct. And I am correct. But we are not talking about even a little bit of the same thing. Once again I will try to be clear:
Knuth's work resulted in a computer program, TeX, as an EXE file, say, tex.exe.
A user of TeX as a word processor types in a file with the three-letter extension TEX, say, my_math.tex. This file, my_math.tex, actually is a computer program; that is, it has allocate-free storage, if-then-else, file read-write, arithmetic, string manipulations, etc. This computer program my_math.tex is not Knuth's program tex.exe.
Yes, maybe not all TeX users have their TeX input files, say, my_math.tex, do file reading or writing, but such file manipulations are just routine usage of TeX that I do nearly always. And I have some TeX macros I wrote that do storage allocation and freeing. Maybe not all TeX users do such things, but they are routine usage of TeX, and I do them.
To be more clear on just why file my_math.tex is a computer program: when Knuth's tex.exe runs file my_math.tex (interpretively), the program my_math.tex can read files. Then the output my_math.dvi can vary depending on what was in the file, say, my_math.dat, that program my_math.tex read.
Well, there can be no file my_math.htm that will read a file my_math.dat, that is, read the file and process it the way my_math.tex can. So, if only for this reason, program my_math.tex can never be translated to a file my_math.htm. And program my_math.tex can't be translated to my_math.pdf or my_math.ps either.
But a file my_math.dvi, produced from my_math.tex and a particular my_math.dat, can be translated to a file my_math.pdf or my_math.ps. And in this thread I have been suggesting that there could be a program that would translate my_math.dvi to my_math.htm.
> TeX -> PDF can be done without an intermediate DVI stage using pdftex.
Although this is a small point, for pdftex I am quite sure that internally a DVI file is generated, if only because that is what Knuth's program tex.exe generates, and rewriting Knuth's TeX code, likely now in C, say, tex.c, would be both unnecessary and the difficult approach. Just generating the DVI file is the easy approach, even if you don't make the user aware of the intermediate DVI file.
What pdftex does, I do frequently by putting in the extra step of going to DVI and then from DVI to PDF. Fine. I want the DVI file because I like the DVI preview program I have, and I like it much more than using a PDF viewer. When I get something that looks good with my DVI preview program, then usually I go ahead and make the PDF file.
However, what I am doing in getting a PDF file, and what you are talking about with pdftex, are not, in the sense I am discussing, a translation of TeX to PDF. Not at all.
> Any reasoning about TeX being able to do things that HTML can't is irrelevant.
True for what you are talking about. False for my point that a file my_math.tex can't be translated to a file my_math.htm. Or, for a short explanation: you are saying that a file my_math.dvi can be translated to file types PS and PDF and maybe also HTM, and I agree. But I am also saying that a file my_math.tex cannot ever be translated to a file my_math.htm.
To be still more clear, HTML is a mark-up language, and TeX looks like it is also a mark-up language, so one might try to translate TeX mark-up to HTML mark-up. Well, such a translation is just impossible, and always will be.
>HTML is free flowing; if the user resizes the window the layout should adapt, the layout has to work on mobile devices, etc. The HTML way asks for a completely different way of designing layout, and Latex is simply not the right tool for that job.
I agree that HTML should re-flow. Let's try with the OP article.
Though he points out this mostly applies to philosophy papers. Many of the points do not really apply in some other scientific fields (I usually had no trouble submitting LaTeX papers to CS journals/conferences).
The primary strength of using LaTeX is math typesetting. If you're not writing equations the argument gets to be very subjective. If you are writing equations there's nearly no alternative (at least one nearly as well proven).
> What is the alternative? The author does not propose any.
The author is an academic. He is concerned about writing papers. His solution is to write papers in Word, submit to journals, and let the publisher worry about the final layout.
The article mentions markdown, HTML. I think markdown is more practical for actually writing in, and combined with pandoc it can be very powerful. Both of these formats should work well with version control. Personally, I find that markdown's minimalism goes very nicely with git. As long as you use suitable line wrapping it generates very concise and helpful diffs.
Bonus: if you're using pandoc you get native use of LaTeX's math mode.
Even if you're using pandoc + markdown you're still dependent on LaTeX, so the question to ask is really: why not write in LaTeX directly to begin with? If you already know it, and you do if you are in certain parts of academia, it's probably the easiest route. Or put differently: nobody has ever gotten rejected for typing in LaTeX, if you pardon the expression.
Wrote my thesis in IDML, worked surprisingly well, yet Adobe InDesign is a beast of its own. And once you run into more problems you're pretty much on your own.
Surely any modern complete (La)TeX replacement would be a good thing to have, but I haven't found one yet, so LaTeX IMHO still remains one of the best choices when it comes to writing/publishing stuff.
I think that reStructuredText could be a nice foundation for some more generic writing/publishing solution, where TeX notation could still be used for math environments (as I don't know any better one for that). Markdown is too vague, imprecise and inflexible, and CommonMark - a strongly specified, highly compatible implementation of Markdown - is not much better, mostly due to Markdown compatibility.
EDIT: AsciiDoc could be also used instead of reST.
I too agree with all of his points... but what's a realistic alternative stack that satisfies those points without drastically cutting down on the available rendering tools and packages for specialized tasks?
If .rtf/.doc is in such high demand, can't we output to those formats using LaTeX? I think of it as just another output alongside dvi/pdf/etc, but I know very little of the internals that would generate those additional formats.
I mean, I view .DOC as worse than Latex in terms of the ability to correctly render it in the future, the ability to generate complex documents correctly from originals, the ability to programmatically interact with it, and generally anything to do with the future.
I'm tempted to go down some XML path, because that separates concerns between the semantic structuring of the document/corpus and the rendering of it, but is that really better than just using a declarative subset of LaTeX and worrying about correctly implementing the styling scripts to render them as desired?
I have my doubts it would really be an improvement.
For context, I have a project at work coming up for which I have a bit of time to establish a toolchain and our format for things like documentation, specifications, etc. I'm open to the suggestion I should spend some of that time working on a system to make sure we don't hit a rendering issue on a technical manual in a few years when technologies change. (I'd also like to look in to literate programming tools, so semantic demarcation for automatic selection of certain kinds of elements in the document is high on my list of things to look in to, as well as relationships between and metadata in those blocks.)
I'm just not convinced that trying to replace Latex with XML or anything of that nature is actually going to make my life better in those regards, rather than being a waste of time.
(If you haven't noticed, XML is sort of the main alternative to Latex in my mind for the things I'm trying to do; perhaps there are better options.)
It would be more rational to output to HTML5, since there are insane amounts of HTML to X converters around (APIs and tools alike). PDF or Doc from HTML is utterly trivial at this stage.
I am very tempted to try to write my next publication in HTML. However, I seriously worry about things like footnotes, code examples, floating figures and references. CSS3 seems to have support for many of these, but I wonder how well the convert-to-PDF pipeline really works, and how flexible it really is.
It's bad enough if I have to convert my original source to some other format years down the line, but it is absolutely critical that I can at least create the initial PDF correctly.
When I started grad school, I initially used LaTeX to prepare my articles. But then my advisor tried to open one of my documents on his machine (Windows, I'm using a Mac) and some kind of weird error came up. So we spent one whole advising session with him searching for MiKTeX and installing it, whereupon my document still wouldn't load for whatever reason.
Then once BibTeX got in the mix, it became even messier. Well, I finally finished the article and submitted it to a journal, but they had so many requirements for LaTeX submissions that it took a while to change everything for it. Once I finally submitted it -- surprise! -- their online LaTeX compiler came back with another weird error. It took me about 8 more submission attempts to isolate the "bug" (which was some kind of issue due to differences in versions and default packages installed on their machine vs mine), and I finally got a PDF generated. Except it still didn't look the same as mine for whatever reason.
Well, the article got rejected from that journal, and so as I'm submitting to another journal, I read "Word documents only".
So I gave up on LaTeX. Too much of a hassle for me. Maybe it's a smooth process for everyone else, but I don't have any problems just typing a Word document and sending it off. (Although I do miss LaTeX's equation typesetting system.)
TeX sources are only for co-authors. Everybody else gets the generated pdf. Advisors et al can annotate the pdf or print-and-scribble. All my publishing venues only gave templates and required a pdf.
I think LaTeX has one disadvantage, and it is not mentioned in the linked article. It has nothing to do with this "cargo cult" thing, which I'm not sure is used correctly in this text, but never mind that.
The disadvantage is, that while LaTeX has an excellent support for PDF or PS its support for e.g. EPUB is awful.
Otherwise there is nothing better than LaTeX for writing longer texts.
> The disadvantage is, that while LaTeX has an excellent support for PDF or PS its support for e.g. EPUB is awful.
Well, yeah, but ePub is just a packaging format for HTML as input to a system that does its own layout and pagination, while LaTeX is a layout/pagination system. Using it to generate ePub makes about as much sense as using an ePub reading system to generate LaTeX.
It'd be better to just have end-user, device-side apps that compile LaTeX to PS/PDF/etc. after plugging in a device-appropriate page size, using LaTeX as the distribution format and the rendered format as the viewing format, than to use LaTeX to generate ePub.
LaTeX output is tied to a page layout, but the code of a LaTeX document isn't. You should be able to create whatever you want out of it. The commands in the file (just) have to be interpreted differently, and yes, this "just" is the big point, because it is not easy, especially for a language as old as LaTeX.
For me it is odd; I currently rely on LaTeX for my workflow. Creating documents which include other PDF files and follow a certain predefined layout is (for me) very easy to do in LaTeX, especially once the groundwork (the layout) is done.
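For what it's worth, here is a minimal sketch of the kind of thing I mean, using the pdfpages package (the file name is made up):

    \documentclass{article}
    \usepackage{pdfpages}
    \begin{document}
    Some introductory text in the predefined house layout.
    % pull every page of an external PDF into this document
    \includepdf[pages=-]{external-report.pdf}
    \end{document}

Once that skeleton exists, swapping in a different report or tweaking the layout is a very small change.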
It's funny to hear this compared to Markdown. The reference implementation of Markdown uses regexes applied in order---an operationally defined language. Only niche implementations like the beautiful Pandoc---also by a philosopher---use a real parser to provide a less leaky declarative semantics.
LaTeX, and TeX, are the last gasp of a long line of development. That line starts with MIT's RUNOFF, which begat roff, nroff ("New ROFF"), troff, ditroff, psroff, and finally TeX. It's the last of the programming-language-like word processing systems.
TeX assumes your final output is paper. That's an obsolete assumption. The math features are concerned with presentation. You can't cut a formula from TeX and paste it into Mathematica or MathCAD and have it understood. TeX doesn't understand mathematical notation; it just formats it for printing.
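To illustrate what "presentation only" means in practice (the formula itself is arbitrary):

    % This encodes "a built-up fraction with d over dx, the roman word
    % sin, an italic x, an equals sign, the roman word cos, an italic x".
    % Nothing in the markup says "derivative of the sine function";
    % a computer algebra system has to guess that from the rendering.
    \[ \frac{d}{dx}\sin x = \cos x \]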
As much as I like LaTeX, I think it has outlived its usefulness. Let me elaborate a bit...
LaTeX is pretty good for typesetting lots of text, like novels, if you want to output something that looks professional without much effort. It's also good for typesetting lots of mathematical formulas, as the GUI alternatives are pretty tedious to use and the result doesn't look as good.
It's also interesting as a template language, to produce documents from applications, but here's where it starts to become obsolete: there are a lot of alternatives for this that don't force the person creating the templates to be technically inclined or to learn LaTeX.
Also, LaTeX becomes downright irritating when trying to make complex documents. The rules say you let LaTeX choose the looks of your document while you focus on content, but it's impossible not to spend hours fighting it because it's not breaking pages or placing figures where you want it to (and iterating over this endlessly as the document is modified). You just cannot force yourself to let LaTeX do everything by itself.
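The usual escalation in that fight looks roughly like this (a sketch; the packages named are the standard ones, the figure contents are omitted):

    \begin{figure}[htbp]  % let LaTeX choose here/top/bottom/own page
      % ...
    \end{figure}

    % when that fails, \usepackage{float} and pin it down:
    \begin{figure}[H]     % "put it exactly HERE, no floating"
      % ...
    \end{figure}

    % or \usepackage{placeins} and scatter \FloatBarrier commands so
    % figures stop drifting past the section they belong to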
So, it's useful for novels (which might as well be written in plain text, leaving the publisher to do the typesetting) or documents with a lot of maths (which are nowadays most likely also complex documents that cannot look good without human intervention).
I like LaTeX for the technical aspect of it, but in the end it's just a tool to accomplish some goal and, sorry, but Word doesn't consume that many resources anymore on a modern machine...
> LaTeX becomes downright irritating when trying to make complex documents.
Sorry, but this is simply not true. Maybe I could agree with a general "it is irritating to make complex documents", though.
After many bitter fights with WYSIWYG software and 300+ page technical documents, I firmly believe that LaTeX is by far the least irritating way of making them.
Personally, I could just as well use Word for most of my work, but I like not having to use shortcuts or plough through menus to format my text. Markdown is a nice alternative, but it doesn't support referencing figures and tables, nor does it support citations.
I don't know why many people, including the author, skip (or don't even see?) the important point: we are not just miles and miles away, we are in an entirely different reality from one where automatic typography is any good. So the more "declarative" it is (whatever that means), the worse it will look. This is our ugly reality. And a LaTeX document made by someone without a clue (≈ how we want it, simple and declarative) is f*cking MSWord-level ugly.
So LaTeX sucks, but there's not a good alternative?
I'm in the midst of creating a document from scratch with LaTeX for the first time (as opposed to using a template provided to me), and while some things have been annoying, it's mostly been the learning curve of figuring out how to do what I wanted. Tables are a mess, though. It seems like I need to stitch three different packages together to do what I want with my tables.
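For the record, one common trio, roughly sketched (the column contents are invented):

    \usepackage{booktabs}   % \toprule, \midrule, \bottomrule
    \usepackage{multirow}   % cells that span several rows
    \usepackage{tabularx}   % tables stretched to a given width

    \begin{tabularx}{\textwidth}{l X r}
      \toprule
      Name                 & Description & Count \\
      \midrule
      \multirow{2}{*}{Foo} & first item  & 1     \\
                           & second item & 2     \\
      \bottomrule
    \end{tabularx}

Three packages for one table is exactly the kind of stitching that takes getting used to.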
Markdown is a great idea, but are there Markdown-based solutions for 1) figure numbering and caption handling, 2) references, and 3) automatically updating an index?
Anyone in academia needs to worry about all of these, and the solutions in Word are sufficiently worse than LaTeX's, which automatically updates all references and heading numbers, regenerates figures that have changed, and tracks your index for longer documents like theses.
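For anyone who hasn't seen it, the LaTeX side looks roughly like this (label names are made up; the index additionally needs \makeindex in the preamble and \printindex at the end):

    \section{Evaluation}\label{sec:eval}

    As Figure~\ref{fig:setup} in Section~\ref{sec:eval} shows \ldots
    \index{setup!experimental}

    \begin{figure}
      \centering
      \includegraphics[width=0.8\textwidth]{setup.pdf}
      \caption{The experimental setup.}\label{fig:setup}
    \end{figure}

All the numbers sort themselves out over the next compile or two; there is no field updating by hand.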
I don't get the point about declarative vs operational paradigms. LaTeX does a great job separating the data and the processes (it is declarative in this sense), although it allows you to mess with both inside the document for flexibility.
I can't see the fundamental difference from HTML: it indeed lacks "operational" commands, but it is massively used with JavaScript that implements the dirty stuff.
Philosophy journals won't accept manuscripts typeset in LaTeX?!
Journal publishers charge thousands of dollars for subscriptions to individual journals. It seems reasonable to occasionally expect publishers to, you know, do the work for which they are being paid. (And to do it competently.)
As the author mentions, it's a seller's market: the publishers are in a position both to charge thousands of dollars for subscriptions and to make their own lives as convenient as possible at the expense of others'.
Anyway, surely it's no wonder many philosophy journals don't accept LaTeX documents? As far as I know, LaTeX isn't really a thing outside hard sciences.
Maybe specialized applications that give you a WYSIWYG editor while using LaTeX in the background would be useful? Like the one we have built: www.cvsintellect.com. It's a CV / Résumé builder built on LaTeX, but the user does not need to know LaTeX.
LaTeX has one large advantage over lesser text processors which this article does not touch upon:
Quidquid LaTeX dictum sit, altum videtur ("whatever is said in LaTeX seems profound").
Put two documents in front of someone used to quickly scanning scientific articles. One of them is formatted using LaTeX, the other using whatever document template Word happened to be launched with. In my experience the LaTeX-formatted article will be seen as a more reliable source than the Word-formatted one, even if the contents are similar.
In my experience this is something that I do subconsciously, but it's often justified. People who write in Word do so because they are unaware of LaTeX or because they don't know how to use it. The first means they are usually not familiar enough with the field to understand that LaTeX is the default; this lack of understanding is then also present in the content. The second reason is usually indicative of someone who is lazy or is writing junk. If they can't write LaTeX, then it is unlikely that they can write other code.