
> LaTeX input files are proprietary to LaTeX, just as .doc is Word. The only definition of LaTeX as a language is “whatever LaTeX does”.

I am sorry but I don't see this point... Everybody is free to write a parser for .tex files and use it for whatever reason they want...

You are not free to do so with .doc or other proprietary format...




His trouble seems to be not with LaTeX, but with materialism. No matter what medium he put his texts down in, will he find that the text is somewhat bound to that medium now and it'll take work to modify and/or transfer it. Pen and paper is "proprietary" in this line of thinking.


I don't think he understands what the word "proprietary" means. It means that the file format is secret or there are legal constraints on its use. Until recently Word documents clearly counted as proprietary, but now that Microsoft has been forced to document its format to some degree, .doc is listed as "controversially" proprietary on Wikipedia.

http://en.wikipedia.org/wiki/Proprietary_format

Even today you will need to buy Microsoft Word to "properly" read .doc files. There are other readers (LibreOffice), but they only handle the basics, and in my experience they usually mangle the file.

LaTeX has never been proprietary because it has always been publicly documented.


The point is, are there any applications other than LaTeX itself that will render a LaTeX document correctly? It's open source, but in practice the code is so complicated and large that no one has ever duplicated it. It's not proprietary in the traditional sense, but if you want your LaTeX file to not be mangled, you must use LaTeX itself.


There are many programs that implement subsets of LaTeX/TeX. For example, for math layout I believe MathJax and matplotlib have both essentially copied Knuth's program, based on his "The TeXbook" and "TeX: The Program" books, which document the TeX code extremely thoroughly using his "literate programming" technique. TeX is one of the best-documented programs in existence.
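
For instance, display math like the following is handled by MathJax in the browser and (wrapped in $...$ strings) by matplotlib's mathtext, neither of which ever runs TeX itself:

    \[ \int_0^\infty e^{-x^2} \, dx = \frac{\sqrt{\pi}}{2} \]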

Google "Latex implementation" and you will see a lot of hits. I see a Java implementation, Windows implementations, LaTeX3 and LuaTeX are referred to as reimplemetations 'nearly from scratch', etc.


TeX, pdfTeX, XeTeX, LuaTeX... there are a lot of engines that can properly render a *.tex file.
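
For example, assuming a document named report.tex (the filename is mine), each engine is invoked as a drop-in compiler:

    pdflatex report.tex   # pdfTeX engine: the common default, PDF output
    xelatex report.tex    # XeTeX engine: native Unicode and system fonts
    lualatex report.tex   # LuaTeX engine: Lua-scriptable TeX internals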


And here's the problem: all of them are mutually incompatible in most situations. Even moving .tex documents between platforms is an enormous pain, and pdflatex/xelatex don't provide the error reporting you need to quickly find which packages are missing on which system.
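
A small mitigation for the missing-package hunt (a stock TeX distribution tool, not a fix for the error reporting itself): kpsewhich will tell you up front whether a given package is installed:

    kpsewhich booktabs.sty   # prints the file's path if installed, nothing otherwise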

Don't get me wrong, I love LaTeX, and I agree that the author misinterprets 'proprietary', but from a user standpoint the problem is the same: old documents are not rendered correctly, and new documents don't work with old compilers. It's a mess.


> all of them are mutually incompatible in most situations.

I haven't compiled against every implementation, but I did just recently rerun a report I created 4 years ago under a different engine. I spent about 20 minutes addressing the new complaints, and when I finished, the generated PDF looked exactly the same as the old one - but with up-to-date data. Try that with HTML :) Heck, I've had the exact same experience switching C compilers. I do agree with you about the crappy error reporting. There is a reason why the Library of Congress is bundling data with binaries now; this is a very common problem - but in my experience LaTeX has fared much better than most formats.


> no one has ever duplicated it

But it is Free, so why would you need to duplicate it? I must be missing something.


The multitude of third-party .doc readers would seem to disprove that assertion.

In any case, it's clear from the text that he uses "proprietary" to mean "specified only by the canonical implementation". In this respect, .tex qualifies but .doc no longer does, although .doc is so bizarre and complex that writing another parser from the spec is... challenging.


Libraries such as wv have been built by reverse engineering the format, not from official specifications. The latter turned out to be pretty much useless, as they didn't contain enough information to actually parse .doc files in the wild.


> I am sorry but I don't see this point... Everybody is free to write a parser for .tex files and use it for whatever reason they want...

It's great that you have the freedom to do that in theory. But it doesn't work in practice. The .tex format doesn't have a spec or independent implementations; it's complex and idiosyncratic, and there are no good general-purpose conversions from .tex to other formats (e.g. Markdown, HTML). The only program you can really use .tex with is LaTeX.
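
To make "complex and idiosyncratic" concrete: TeX's category codes let a document reprogram the tokenizer while it runs, so a parser that doesn't actually execute TeX can misread even a two-liner like this (a standard catcode trick, not from the article):

    \catcode`\@=0 % from here on, "@" is an escape character, just like "\"
    @emph{only a running TeX engine tokenizes this line correctly}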


You might want to try checking out Pandoc: http://johnmacfarlane.net/pandoc/
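
For example (article.tex is a hypothetical input file; -f and -t are Pandoc's from/to format flags):

    pandoc -f latex -t markdown article.tex -o article.md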


Pandoc can't replicate everything LaTeX does. It can take a heavily restricted subset of LaTeX and convert it to other markup languages. Nobody to date has duplicated LaTeX quirk for quirk.


FTA: > LaTeX input files are proprietary to LaTeX, just as .doc is Word.

I must be missing something --- LaTeX .tex documents are written as plain ASCII text files with pseudo-English tags indicating generally how text is to be processed (italics, bold, etc.).
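
For example, this is a complete, compilable LaTeX document:

    \documentclass{article}
    \begin{document}
    Plain text with \textbf{bold} and \emph{italic} markup.
    \end{document}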

FTC: > Everybody is free to write a parser for .tex files and use it for whatever reason they want...

Exactly. Pandoc supposedly converts from LaTeX into many other formats (although I haven't personally tried any of those particular conversions).


He means "de-facto proprietary" in that there is no standard for the output of LaTeX, except whatever LaTeX outputs. That means anyone who wants to build anither version has a huge amount of work to do endlessly duplicating the quirks of the original implementation. Imagine HTML being defined as "whatever Firefox does". You're chasing a very complicated moving target and you'll always be behind if you aren't just copying the source wholesale.

As to your second point, he mentions LaTeX converters in the article, saying you must write in a very restricted subset of LaTeX for it to convert properly. Obviously, pandoc doesn't have any way to turn everything LaTeX does into a markdown file.

That being said, I personally like LaTeX a lot. But I wanted to clarify the points the author was making.


> You are not free to do so with .doc or other proprietary format...

Eh?


The .doc format is not documented, obviously.


The doc (and docx) formats are actually very well documented, thanks to pressure from the EU:

[MS-DOC]: Word (.doc) Binary File Format

http://msdn.microsoft.com/en-us/library/office/cc313153(v=of...

[MS-DOCX]: Word Extensions to the Office Open XML (.docx) File Format

http://msdn.microsoft.com/en-us/library/dd773189(v=office.12...


"Very well" is a euphemism here, I assume?

I worked in Windows Server when Microsoft was under the US DOJ consent decree and had to document everything that looked at all like an API--even internal things that were just APIfied for design reasons / ease of testability / to make servicing simpler.

I can say with some confidence that no one gave a shit about producing good quality docs. Without exception, people viewed the government requirement as onerous and excessive and we produced docs that were perhaps technically correct, but did not provide insight into why things were the way they were. No effort at ease of readability was made, either.


It is really a shame that you guys didn't use this new requirement to improve your product and internal processes. Your comment comes off as a group that was just obeying the letter of the law, not the spirit of it, and I can only guess that this attitude would easily spill over into all cases of documentation, even the cases where it matters. Having a large group of developers believe that it isn't worth the time to make good APIs, and producing worse-than-horrible docs, is really sad. Taking the time to create a good API, even for internal use, can uncover design flaws, reduce errors, make changes faster, ease testing, and speed up onboarding new developers. With a government mandate in hand, you could have used it as an excuse to grow as a group and become better at creating software.


I can see how this comes off as an insular group sticking it to the government, but that's not the case.

If I gave the impression that we didn't create good APIs or good docs, I apologize.

We did, but that's not what the government wanted, so we gave them what they would accept. The government just was not very good at deciding what has to be documented and what doesn't. E.g., we had to document sample wire traces of messages that are all auto-generated through IDLs and sent over a standard protocol. Rather than 2 pages of IDL and a comment saying we use transport X (which is defined in RFC blah), we were actually required to submit 100 pages of traces. That obscures; it does not help.

Even if you wanted to do a great job of producing docs, we quickly learned that the process wasn't about creating great docs; it was about producing docs that the government would accept. Have you seen Office Space? It's that. It's thankless, because you're generating shit docs that aren't relevant that are judged by people who don't have the skills to judge them.


Even a half-assed effort to produce a document no one cares about is "very well" compared to the majority of mission critical and/or open source systems out there for which the only documentation is a README and, if you're lucky, some mailing list archives.


My understanding was that it'd be impossible to make a 100% compatible .docx parser even if armed with those docs. As an example, when the EU forced the issue, I remember seeing stories about XML fields which simply contained undocumented blobs.


Yep, having options/tags whose definition is LITERALLY "do whatever [some ancient version of Word] does" is totes well-documented.

Implementable, on the other hand, not so much...


If you don't already know how to implement them you aren't supposed to implement them. The spec even tells you not to implement them (and Microsoft does not implement them). They are there for third parties who reverse engineered ancient Word and WordPerfect formats and built tool chains around them, and want to move to a newer format but need to mark places where they depend on quirks of those ancient programs.

Here's the use case this is aimed at. Suppose I run, say, a law office, and we've got an internal document management system that does things like index and cross reference documents, manage citation lists, and stuff like that. The workflow is based on WordPerfect format (WordPerfect was for a long time the de facto standard for lawyers).

Now suppose I want to start moving to a newer format for storage. Say I pick ODF, and start using that for new documents, and make my tools understand it. I'd like to convert my existing WordPerfect documents to ODF. However, there are things in WordPerfect that cannot be reproduced exactly in ODF, and this is a problem. If my tools need to figure out what page something is on, in order to generate a proper citation to that thing, and I've lost some formatting information converting to ODF, I may not get the right cite.

So what am I going to do? I'm going to add some extra, proprietary markup of my own to ODF that lets me include my reverse engineered WordPerfect knowledge when I convert my old documents to ODF, and my new tools will be modified to understand this. Now my ODF workflow can generate correct cites for old documents. Note that LibreOffice won't understand my additional markup, and will presumably lose it if I edit a document, but that's OK. The old documents I converted should be read-only.
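
For illustration only (the namespace and attribute are made up): ODF tolerates foreign attributes that conforming consumers simply ignore, so the extra markup could be as small as:

    <text:p xmlns:law="urn:example:legacy-wp" law:wp-page="17">
      The paragraph as converted from WordPerfect...
    </text:p>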

Of course, I'm not the only person doing this. Suppose you also run a law office, with a WordPerfect work flow, and are converting to an ODF work flow. You are likely going to add some proprietary markup, just like I did. We'll both end up embedding the same WordPerfect information in our converted legacy documents, but we'll probably pick different markup for it. It would be nice if we could get together, make a list of things we've reverse engineered, and agree to use the same markup when embedding that stuff in ODF.

And that's essentially what they did in OOXML. They realized there would be people like us with our law offices, who have reverse engineered legacy data, that will be extending the markup. So they made a list of a bunch of things from assorted past proprietary programs that were likely to have been reverse engineered by various third parties, and reserved some markup for each.
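
For instance, in WordprocessingML those reserved flags live in the document settings under <w:compat>; the element names below appear in the transitional spec (the comments are mine):

    <w:compat>
      <w:autoSpaceLikeWord95/>   <!-- emulate Word 95 full-width character spacing -->
      <w:footnoteLayoutLikeWW8/> <!-- place footnotes the way Word 6.x/95/97 did -->
    </w:compat>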



