Wow does simple_markup ever totally miss the point of Markdown. By replacing #-headers with !, and replacing 1.-lists with #, and replacing _-emphasis with __, "Simple" is violating the spirit of the format to make implementation easier.
The point of Markdown isn't to create a super-flexible text2html replacement. The point of Markdown is to come up with rules that match the way people write existing email/Usenet messages as closely as possible. Any time you add a semantic to the format (like !-headers) that is virtually never used in email or Usenet, you might as well go all the way off the reservation and implement Textile. I hear Textile does tables, too; I'm sure someone's got an implementation that will generate HTML FORM tags as well. Knock yourself out!
That's interesting, but it doesn't really have any bearing on the thrust of the article; none of those things is a cause of the slowness of current Markdown implementations.
(Also, numbered lists are one of my pet peeves with Markdown, since renumbering lists sucks. His way seems more usable.)
Still, I can see that bugging me. I could number a list with all 1.s, but that's like writing code with no indentation or extra spacing. The style of source material matters to me.
Right on the money. Markdown is great and doesn't get too crazy with added syntax for every html entity under the sun that you might want to produce.
In the past I've tried using Markdown for all my docs and failed. Sometimes you just need more control and must use something like LaTeX. Other times you just want to write a README or a few paragraphs that may be converted to html -- these uses are Markdown's bread and butter.
The paragraph type in his OCaml program is a compelling illustration of why OCaml is well-suited to writing compilers:
    type paragraph =
        Normal of par_text
      | Pre of string * string option
      | Heading of int * par_text
      | Quote of paragraph list
      | Ulist of paragraph list * paragraph list list
      | Olist of paragraph list * paragraph list list
The type definition is stated almost the same as the grammar rule would be.
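To make that concrete, here is a rough sketch of how a renderer might walk that type, with one match arm per constructor. This is not the author's code: par_text is simplified to a plain string, the render functions are hypothetical names, the second component of Pre is assumed to be an optional tag that gets ignored, and the two parts of Ulist/Olist are assumed to mean "first item, remaining items". No HTML escaping is done.

```ocaml
(* Assumption: the real par_text is presumably a richer inline type;
   it is a plain string here so the sketch stands alone. *)
type par_text = string

type paragraph =
    Normal of par_text
  | Pre of string * string option
  | Heading of int * par_text
  | Quote of paragraph list
  | Ulist of paragraph list * paragraph list list
  | Olist of paragraph list * paragraph list list

(* Hypothetical renderer: each constructor of the type gets one arm.
   Text is emitted as-is, with no HTML escaping. *)
let rec render (p : paragraph) : string =
  match p with
  | Normal text -> "<p>" ^ text ^ "</p>"
  (* Assumption: the optional string is some kind of tag; ignored here. *)
  | Pre (code, _kind) -> "<pre>" ^ code ^ "</pre>"
  | Heading (level, text) -> Printf.sprintf "<h%d>%s</h%d>" level text level
  | Quote ps -> "<blockquote>" ^ render_all ps ^ "</blockquote>"
  (* Assumption: (first item, remaining items); each item is a paragraph list. *)
  | Ulist (first, rest) -> "<ul>" ^ render_items (first :: rest) ^ "</ul>"
  | Olist (first, rest) -> "<ol>" ^ render_items (first :: rest) ^ "</ol>"

(* Render a sequence of paragraphs back to back. *)
and render_all ps = String.concat "" (List.map render ps)

(* Wrap each list item (itself a paragraph list) in <li> tags. *)
and render_items items =
  String.concat ""
    (List.map (fun item -> "<li>" ^ render_all item ^ "</li>") items)
```

The nice part is that if a constructor is later added to the type and the match isn't updated, the compiler warns that the match is not exhaustive, so the renderer and the "grammar" can't silently drift apart.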
OCaml is exceptionally well-suited to writing compilers, or anything dealing with complex structures of tagged data, really. (See, for instance, http://flint.cs.yale.edu/cs421/case-for-ml.html )
It's a neat language. It's got some implementation and usability quirks, and it seems to have a rather small / quiet community (so there aren't many books, though this one is quite good, IMHO: http://caml.inria.fr/pub/docs/oreilly-book/ ), but it's worth a look.
I've been reading through The Objective Caml Tutorial, http://www.ocaml-tutorial.org/, but I've yet to have an opportunity to implement anything with OCaml.
I've actually read that essay already about how OCaml is good for compilers, but it really struck me with this example.
...and the OCaml version is a tenth as many lines of code as the C, and a quarter as many as Python, Ruby, and Perl. Also, it uses only a seventh more memory than the C version.
For what it's worth, the OCaml version is a markup processor of the author's design, and not exactly Markdown-compatible. He argues that it's close enough for benchmarking purposes, though. (I agree.)