Hacker News
Sile, a typesetting system inspired by TeX and InDesign (sile-typesetter.org)
151 points by colinprince on Oct 1, 2014 | 77 comments



I've worked in typesetting software a long time. And I've seen several OSS projects reach roughly the point this one has before dying. I can recall only one or two that made it significantly past this point. The challenge is not the coding but the domain problems. It's easy to get a core set of features with very basic operations going. But pushing on into the required capabilities for more than markdown-level formatting is very, very difficult. The issues are mirages, b/c they appear easy to explain, but they're extremely difficult to implement.

For example, footnotes. We all understand how footnotes should be placed--right there at the bottom of the page. However, placement becomes very difficult if you have a long footnote that appears in the next-to-last line of the page. Now, if you include the footnote, there's not enough room for the last few lines of regular text, and the footnoted word is bumped to the next page. Figuring out how to get out of this one problem algorithmically involves heavy-duty backtracking work, projections of layout, recalculations, etc. Its difficulty is compounded if other factors enter in, for example if the text and footnote appear just before a full-page image.
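
To make the failure mode concrete, here is a minimal Python sketch of a naive page builder. It is an illustration only, not how TeX or SILE works: the function name `paginate`, the tuple layout and the unitless heights are all assumptions. Each line reserves room for its own footnote, and a line that no longer fits is simply bumped to the next page.

```python
def paginate(lines, page_height):
    """Naive first-fit pagination (hypothetical helper, for illustration).

    lines: list of (line_height, footnote_height) tuples.
    Returns a list of pages, each a list of line indices.
    """
    pages, current, used = [], [], 0
    for index, (line_h, note_h) in enumerate(lines):
        needed = line_h + note_h  # a footnote must share the page with its line
        if current and used + needed > page_height:
            # The line plus its footnote does not fit: close the page early.
            pages.append(current)
            current, used = [], 0
        current.append(index)
        used += needed
    if current:
        pages.append(current)
    return pages


# Ten lines of height 10 on a page of height 100; line 8 carries a tall footnote.
plain = [(10, 0)] * 10
with_note = [(10, 0)] * 8 + [(10, 30)] + [(10, 0)]
print(paginate(plain, 100))      # everything fits on one page
print(paginate(with_note, 100))  # line 8 is bumped, leaving page 1 short
```

The second call leaves 20 units of blank space at the bottom of the first page. Avoiding that means splitting the footnote across pages or going back and re-breaking earlier text, which is where the backtracking comes in; this sketch does neither.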

Another classic difficulty is widow and orphan control. Knuth's algorithm on this topic alone requires almost 100 pages to explain [Digital Typography, 1999].

And so on. Suffice it to say that Donald Knuth spent 10 years writing TeX, much of which he did full-time, supported by various grants.

Typesetting is a really, really hard problem domain and while the author here expects to add features like equations soon, to those of us in the field, it feels like he doesn't really grok the difficulty of the problem domain.

The odds are small that he can climb the cliff in front of him unless he's willing to work on this full time for a very long time.

Nonetheless, I wish him good luck.


I have looked at the PDF manual linked to from the homepage. If that PDF is produced by Sile itself, I'm not impressed because the rendered text looks crowded. Perhaps the choice of the font is at fault, perhaps the engine lacks TeX's font kerning, perhaps... Whatever the reason, the output is IMO ugly.


The rendering of another open source project in the "TeX-made-after-Y2K" category, Patoline, is much more appealing:

http://patoline.org/patobook.pdf


One of the more interesting features to me was the "document zippers" that let you programmatically navigate the doc as you might a tree structure in many functional languages.

In addition, the output is certainly more convincing at this time.


Hmm. Patoline sounds cool, but in that book, I have a lot of trouble distinguishing text from code. They don't put a box or even whitespace around code blocks.


For a trivial but high-visibility example, the very first two paragraphs of the document have "orphaned" words on the last line--a big typographical no-no.

In the first paragraph, the orphaned word is "of". If there's no way to squeeze those three characters ("of.") into the preceding lines then one should probably be a little more generous with the character spacing in the second line so that the "of." doesn't look so lonely.

It is possible that as it happens there is no clean way to format that paragraph, or that Sile simply prioritizes other factors over orphan lines, but for a paragraph that opens with "SILE is a typesetting system. Its job is to produce beautiful printed documents.", leaving three characters orphaned on the final line is unfortunate. If nothing else I'd reword the paragraph to address this problem.


Well, they claim that it splits lines just like TeX, so maybe it was just an unfortunate mix of wording and font.


I agree, I opened that PDF and the first page has two off-centre lines... not particularly impressed. And the line heights and kerning are all over the place. It's actually straight up terrible. Sorry, OP.


The design issues with this manual do not necessarily detract from the technical capabilities of this new system. But it's bad advertising of course.


Thank you for the gentle reminder that we might be well-served to examine more than the output. It is v0.9.0, after all. It was easy to say, "what is this!?" and start looking to find bugs/mistakes, when what's more interesting (to me) are the (planned?) placement of its capabilities with respect to InDesign and TeX.


If you take a look at the home page [1], he says he's having problems with the cairo pdf library not supporting things he wants, so that may be the source of this (he does mention letter spacing specifically).

1. http://www.sile-typesetter.org/index.html


The manual shows about 80 characters per line. It should be around 66 for optimal readability.

One of the things I like about LaTeX is that you don't have to know about this stuff, and since it's not entirely trivial to change the page margins it's really hard for a novice to make such a mistake.


It can't even get consistent line spacing in the table of contents. See http://imgur.com/ejg2FTS


skimmed briefly. page 16 has four words in it.


The header font changes randomly as well (compare pages numbered 48 and 50, actually 52 and 54 in the document).


Agreed; look at section 2.1 of the documentation:

- There are two sections demarked with horizontal lines, but the upper one is further to the right.

- The horizontal lines spill over into the right margin on the first one.

- There is an unreasonable amount of space above and below the second one.

- The paragraph "To begin with" looks stranded; it's indented because the rule is no doubt that all paragraphs apart from the first must be indented, but it looks strange in this case.


I agree, doesn't look attractive. Spacing is off and text is waaay too close to the image in the example here: http://imgur.com/udshoSy


Footnote typesetting is abominable, too.


Does SILE offer any advantages when compared to LuaTeX? (http://www.luatex.org/)

LuaTeX seems to have all the same benefits, but in contrast to SILE, it also benefits from the CTAN archive.

So what's the selling point of SILE over LuaTeX?


LuaTeX gives you hooks into the TeX system to customise certain aspects of the program's behaviour. But you can't customise the core TeX algorithms. Because the whole of SILE is written in Lua, you can replace any of the core components.

For instance, grid typesetting is a known difficult problem in TeX and its derivatives, but it's very easy in SILE because you just change the way vboxes are added to the vertical list; it's about 30 lines of code.
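
As a rough illustration of that idea (not SILE's actual code, which is Lua), here is a Python sketch. The function name `add_on_grid`, the `("glue", h)` / `("box", h)` tuples and the integer point units are assumptions made for the example. Before a box is appended to the vertical list, just enough glue is inserted so that the bottom of the box lands on a multiple of the grid spacing.

```python
def add_on_grid(vlist, box_height, grid):
    """Append a box to `vlist`, padded so its bottom edge sits on the grid.

    vlist: list of ("glue" | "box", height) tuples, heights in whole points.
    grid:  grid spacing in whole points.
    """
    position = sum(height for _, height in vlist)
    bottom = position + box_height
    # Round the bottom edge up to the next grid line (integer ceiling division).
    target = -(-bottom // grid) * grid
    glue = target - bottom
    if glue > 0:
        vlist.append(("glue", glue))
    vlist.append(("box", box_height))
    return vlist


vlist = []
for height in (10, 14, 12):
    add_on_grid(vlist, height, grid=12)
print(vlist)
```

With a 12pt grid the three boxes end at 12, 36 and 48 points, so every box bottom stays on a grid line regardless of the individual box heights.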

The frames support is pretty compelling for me as well, and will be even more so as soon as balanced frames are finished.


And I wonder what the author means with "more flexible than TeX". I have never seen (except for emacs) a more flexible software than TeX. Fully programmable (although the programming language is somewhat different than the commonly used ones). And with LuaTeX, you can even use Lua to program the system (like SILE) and you can use LuaTeX as a library to create your own frontend while using the power TeX gives (glue/box model, line breaking algorithm, ...)


To me, it looks like this is a tool mainly for people who aren't already comfortable and at home in the *TeX environment. As one of those people, I'm very interested in what the syntax of this project ends up looking like. My few experiences with TeX were fairly frustrating, with multiple ways to do things (is it \bf or \textbf?) and a relatively large amount of boilerplate/magic incantations required to get a document up and running at all.

Contrast that with Markdown, where the setup is literally "type what you want in a box and we'll format it for you in a reasonable way". There are more advanced features that require a special syntax (embedded code snippets for example), but they are included in such a way that you don't have to worry about them until you need to use them, and when you do they behave like you would expect.

I know of course that TeX is incomparably more powerful than Markdown, but I don't believe that that means that the onboarding experience needs to be incomparably more complex. Easy things easy and all that.


This isn't designed for people who don't know TeX, although of course they're welcome to use it. It was designed to solve particular typesetting problems (which came up primarily in the context of typesetting Bibles, parallel texts, and especially in non-Latin alphabets) which were difficult to achieve in TeX.


According to the doc it's a less-hackish grid-based layout, apart from the usual benefits of a clean room reimplementation. But yes, in general TeX is hard to beat and I'd like to see some good comparisons about tricky stuff, especially considering that you won't switch because you find TeX's syntax too baroque…

I did get a bit more interested after I read who is writing this, as Simon Cozens is pretty well-known in Perl circles.


Just to clarify - the grid layout isn't compulsory; it's an optional package. But it's a good example of a thing which is a hard problem in TeX (http://tex.stackexchange.com/questions/1418/grid-system-in-l...) and an easy one in SILE.


I'm a little surprised that SILE and Patoline [1] both seem to stick with the TeX formatting tradition of \begin{foo} ... \end{foo}. Is this really the best way to express layout? Or is this so that users will feel comfortable with new systems? I would have thought that taking a cue from some of the markup systems like Markdown, ReStructuredText, ASCIIDoc, etc. might have been a consideration. Just curious about this design choice.

[1] http://patoline.org/


One markup language is much the same as another. The real issue here is the fact that typography is not inherently easy to specify in a serial language. It's an intuitive, visual activity, for which direct manipulation and WYSIWYG are ideal. Any non-trivial layout contains both typographic hierarchy and spatial hierarchy, grid systems and the like. Knuth blazed an important trail but you couldn't typeset a magazine in TeX. It's all very well to scratch one's own itch, of course. It is just that the specification of layouts is a more complex, intuitive and holistic problem than TeX-like languages acknowledge. You need to be able to work with intrinsically visual and spatial things like clip paths, not to mention checking optically for rivers etc. It seems to be common for a certain kind of programmer to be detail-oriented when it comes to coding but tone deaf to the richness of typography as craft. Grrr.


>> The job of a word processor is to produce a document that looks exactly like what you type on the screen. SILE takes what you type and considers it instructions for producing a document that looks as good as possible.

>> SILE doesn’t show you where the lines will break, because it doesn’t know yet.

Genuinely asking: Why cannot the above be done in real-time as the user types? (I understand that would blur the line between a typesetting system and a word processor, but if output of the typesetting system could be produced in real-time, then that boundary should not need to exist.)


I'm not sure about SILE, but TeX's line-breaking algorithm optimizes each paragraph as a whole rather than breaking greedily line by line, which costs more than a simple word wrap. Even on a modern machine, a moderately complex document will take a few seconds to run; in the 80s and 90s, my professors told me it'd take minutes to run a TeX pass.

The reason why Google Docs / Word / etc. can show the line breaks in real time is that they use a faster (but less "optimal") algorithm.
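
To make the trade-off concrete, here is a minimal Python sketch comparing the two approaches. It is not TeX's actual algorithm (Knuth-Plass works with stretchable glue, penalties and hyphenation); the function names and the cost function (squared trailing space on every line except the last) are simplifying assumptions. `greedy_breaks` fills each line as far as it will go, while `optimal_breaks` uses dynamic programming to pick the set of breaks with the lowest total cost for the whole paragraph.

```python
def line_length(words, i, j):
    """Width of words[i:j] set on one line with single spaces between them."""
    return sum(len(w) for w in words[i:j]) + (j - i - 1)


def greedy_breaks(words, width):
    """First-fit: put as many words as possible on each line."""
    lines, start = [], 0
    for end in range(1, len(words) + 1):
        # words[start:end] no longer fits, so break before the last word.
        if end - start > 1 and line_length(words, start, end) > width:
            lines.append(words[start:end - 1])
            start = end - 1
    lines.append(words[start:])
    return lines


def optimal_breaks(words, width):
    """Minimise the sum of squared trailing space (last line is free)."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: lowest cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            length = line_length(words, i, j)
            if length > width and j > i + 1:
                break  # adding more words only makes the line longer
            slack = max(width - length, 0)  # an overlong single word costs 0
            cost = 0 if j == n else slack ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6))   # "aaa bb" / "cc" / "ddddd"
print(optimal_breaks(words, 6))  # "aaa" / "bb cc" / "ddddd"
```

The greedy version leaves "cc" alone on a nearly empty line; the optimal version accepts a slightly shorter first line to even things out. Note that the dynamic program only looks at candidate lines that fit, so its running time is polynomial in the number of words, but it cannot commit to any break until it has seen the whole paragraph.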


Great, this is the type of answer I was looking for. Can you supply or point to more details please? E.g. what is the optimal line-breaking algorithm used for TeX, and what and how much is the negative impact of the less optimal one? Thanks


I'd start by reading through the links on Wikipedia: http://en.wikipedia.org/wiki/Word_wrap#Knuth.27s_algorithm


You can type directly into layouts in InDesign and see your line and page breaks in realtime. I think where the problem lies in InDesign is that it is, at its core, a typesetting and layout application and not a word processing application. There is a lot of work to do to get to the point of being able to write in your layout, and pages don't get added dynamically when adding new content as easily as they do in a typesetting application.


> pages don't get added dynamically when adding new content as easily as they do in a typesetting application.

InDesign will actually do this. I think it's called "Dynamic Flow" or something? It can automatically append new pages to fit the content.

In general, it's not very eager to do this because when you consider spreads (distinct left-hand and right-hand pages), inserting a new page will affect the layout of all subsequent pages. You could insert a new spread, but that may not be what the user wants.

In other words, if the user is laying out pages, actual pages matter. They aren't just an implementation detail derived from the length of the text.


You wouldn't necessarily be "writing in your layout". You could do something where you have a separate preview pane that updates in real time. Similar to what Apple showed earlier this year with their Swift playground IDE, only for typesetting.


> Why cannot the above be done in real-time as the user types?

But why would you want to do that? The display of text that is good for editing is not necessarily the same as the display of text that is good on the page.


I take that. Now we need to get to the next step to resolve that 'necessarily'. Where are the respective maxima, and what specifically are the differences? PS: Answer to the above may already be well-known and obvious, so I am just genuinely asking since I do not understand it as yet, being inexperienced in this area.


> Why cannot the above be done in real-time as the user types?

This is what InDesign does.


Great question. I've been wondering why nobody (AFAIK) has done this. Seems like the only worthwhile reason to do something as silly as rewriting TeX.


I foresee mashing a robust typesetting application and a robust word processing application as being an extremely difficult thing to do well. Both are very different use cases and resulting environments.


A detailed understanding of this difficulty is what I am seeking. Can you explain more about why such a combined thing would be difficult and/or would be unable to satisfy both use cases? Thanks.


A few things come to mind.

Each use case, typesetting and word processing, has enough standardized tools to fill an entire application's interface. I think making the compromises needed to be able to do both adequately would be too much to make either task worthwhile. Maybe that wouldn't necessarily be true for digital, like ePub, but I think that would quickly become true for print.

I don't feel a typesetting interface/context is amenable to writing long-form documents. When I take into consideration multi-column layouts, running headers and footers, art wrapping, page flow across multiple pages, there appear to be a lot of potential distractions from the writing process. Word processing documents, when I hide as much of the interface as I can, allow for the tight focus needed for long-form writing. Someone else in this thread suggested having writing and editing happen in another window, but that's still creating an abstraction from the layout, and not really having edits happen "in real time" if only because then editing is not happening actually within the layout.

Finally, I think there would be a significant barrier to training for many authors trying to work in layouts. There are applications, like QuarkXpress, that enable word-processing-like tools in the layout. But I couldn't imagine asking an author to learn enough about the above to ask them to write a long-form paper, much less an entire book, in that environment. Fonts are handled differently, styles are handled differently, etc.


TeXmacs[1] more or less does this. As a bonus, as well as rewriting TeX, it also includes a rewrite of emacs.

1: http://www.texmacs.org/


The last time I checked, the implementation was still half-baked. Hyperlinks for example were not handled in real-time, nor were the equations if I recall correctly.


Looks interesting, the number 1 feature for me is compile time though - I want instant previews which I can't get with latex.


Have you tried Gummi? It's an editor with auto-updating PDF preview. It doesn't improve on compile time because it just runs pdflatex behind the scenes, but I find it usable when designing figures using tikz.


I use texpad which has auto-updating PDF preview, it does partial builds so often even with a large complex document it only takes about a second to update (although this is too long in my opinion). Occasionally it can't do partial builds (some libraries screw it up) and then I'm stuck waiting 10 seconds or more for the preview to update.


So, essentially, this is a rewrite of TeX. But the reality is that creating complex layouts really, truly requires GUI layout tools. No matter how good the output is for this application, it's entering into a firmly established market with a few large, expensive players, and not a lot of action.

Publishing automation tools are nothing new, but one has to give up a certain amount (usually a lot) of control to create a document on the cheap. Even for those workflows that are intensely reliant on templates, designers are still working in InDesign for the initial design, which is then handed off to a person, or more frequently a system, to translate into something to be automated.

Even as a long-time TeX user, I'm not sure what the appeal would be here, but I could have been in publishing too long to see this for what it really is.


Some sample inputs–outputs would be great.


From the PDF manual file[0] it is at a glance almost identical to LaTeX.

  \begin[papersize=a4]{document}
  \chapter{Hi there}
  Hello world

  \include[src=chapter2]
  \end{document}

Unfortunately, LaTeX's powerful maths display and vectorized graphs are completely absent. It also states that image handling is still rudimentary (only PNG files, for example). I also am not really impressed by the typesetting in the PDF, but this might be due to its focus on the engine and not the aesthetics of the typesetting (which will hopefully be addressed soon to convince TeX users!).

Personally I don't "program" in TeX, I only write. The benefits compared with (La)TeX therefore seem minor to me. However, if you program in TeX (like, designing templates perhaps?) I can understand you don't want to fool around with an ancient system like TeX, and SILE can be an excellent alternative. The source is on GitHub [1], so everyone can contribute!

[0] https://raw.githubusercontent.com/simoncozens/sile/master/do... [1] https://github.com/simoncozens/sile

PS. Because the author writes SILE, not sile or Sile; is SILE an acronym?


Two clarifications: Image handling is rudimentary at present but I'm switching to a different PDF engine (see http://tex.stackexchange.com/questions/166261/would-it-make-... and http://www.sile-typesetter.org/2014/09/24/Whats-happening-in...) which will vastly improve image support.

Yes, SILE is an acronym but to be perfectly honest I can't remember what for. I promise it was good. "Simon's Improved Layout Engine" is a backronym.

But if TeX does what you want, please use TeX. I don't see much point in "convincing" people who are happily using a piece of software which fits their needs to switch to something else which may not. Do whatever works. TeX has CTAN which is the product of many years of work, and it'll be a long time before SILE can even "compete" on level terms with that. But I'm not interested in competing; I'm interested in being the best in a particular niche. If that happens to be useful for others, then great.


> Yes, SILE is an acronym but to be perfectly honest I can't remember what for.

Would I be wrong to bet at substantial odds that the SIL portion of it, at least, was for http://www.sil.org/ ?


Talking of the syntax, I am a bit disappointed that it is similar to TeX's. In my opinion, TeX's style syntax is one of the most visible features that haven't stood the test of time (in comparison to, say, an XML-like syntax).

(Of course, in the case of (La)TeX it is made worse by the ridiculously unhelpful messages in case of syntax errors.)


I found a lot of examples in the full manual here: https://raw.githubusercontent.com/simoncozens/sile/master/do...

The syntax is very similar to LaTeX, but it's more modern in that it has native support for fonts and images, and uses Lua as its scripting language.


I don't know, but I'm guessing the manual[1] is created with SILE?

[1] https://raw.githubusercontent.com/simoncozens/sile/master/do...



[deleted]


I think it's intended as a physical document with all new chapters starting on the right hand page.
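
A small Python sketch of that convention, assuming page 1 is a right-hand page so that right-hand pages are the odd-numbered ones (the function name `chapter_start_pages` is made up for the example):

```python
def chapter_start_pages(chapter_lengths, first_page=1):
    """Return the page each chapter starts on when chapters open on a recto.

    Assumes odd-numbered pages are right-hand pages.
    """
    starts, page = [], first_page
    for length in chapter_lengths:
        if page % 2 == 0:
            page += 1  # leave the left-hand page blank
        starts.append(page)
        page += length
    return starts


# Chapters of 3, 4 and 2 pages: pages 4 and 8 stay blank.
print(chapter_start_pages([3, 4, 2]))
```

Any chapter with an odd number of pages is followed by one blank left-hand page, which is the kind of empty page that looks odd on screen but is expected in print.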


Whoops, I deleted my comment (asking why a page was blank). You have a really good point, I had not even thought about what it would look like when printed. Thank you for clarifying that, it makes a lot of sense.


"...text is flowed into \em{frames} on the page..."

Is that the example?


Kinda. The transformation from SILE code to HTML was apparently not completely flawless.


I would like to see some comparison of DocBook and SILE too. Any reasons for using SILE instead of DocBook? The simpler syntax on its own is good, but not enough to switch.


How is Pollen (http://pollenpub.com/) coming along?

Have there been any more books written using it, apart from Butterick’s Practical Typography (http://practicaltypography.com/)?


Pollen is still in development -- I've been playing with it, and reported a trivial (and now fixed) bug that probably hadn't come up because I was the first person to typeset dialogue with it. Many of my current quibbles with it could be addressed with a "cookbook" of recipes, or even a library that can play the role of LaTeX to its TeX, although that's probably longer term.

Having said that, though, Pollen's tackling a different problem space; SILE is aiming for print and not web, as near as I can tell.


Does it support typesetting mathematical formulas like tex/latex does? That's a very important feature for any kind of scientific publishing.


It doesn't, yet; TeX is very good at that, so if that's a need you have, stick with TeX. I'm not trying to corner the whole typesetting market!

However, it shouldn't be difficult (especially now that MathJax is in Javascript, a language not a million miles away from Lua) to add support for maths typesetting. Patches welcome!


Very nice system, Simon. Can I ask if you have given math any thought? Obviously TeX does great with math but there are still some things that many people think could bear improvement, with thirty years of hindsight (including, as I understand it, additional math categories and omitting the restriction on just a few fonts).


There seem to be a couple of conversion glitches in the HTML page: leftover "\em" and "\supereject" in the text.


At first I thought that it was left there on purpose, but it seems like it may be accidental.


I'll stick to ShareLatex.com. It makes all of the pain of writing and compiling TeX go away and adds trivial simultaneous editing.


News@11: Haphazard rewrite of popular standard piece of software in random scripting language X revolutionizes the field, says naive programmer.


How about "Idle HN commentator wastes time mocking other people's work instead of offering any constructive input at all."

Seriously. There are no claims about 'revolutionising' anything, just some comparisons with existing software.


The only comparison is the block level layout and then it goes into claims about being better than TeX and InDesign without offering any feature besides "scripted in a different language". The OP does not even begin to understand the complexities involved in the named software to make their featureset happen. Look at the SILE codebase, it's amateur hour... Things like this are fun little hobbyist practice.



"amateur hour".. you have no idea who 'the OP' is, do you? :)


While you're being more hyperbolic than most HN readers will like, I tend to agree with your point and wish you'd made it without the barbs.

This package is not even a rewrite, but the implementation of a tiny, tiny subset of TeX. In addition, I've always understood (and followed this rule in my OSS work) that you don't point out weaknesses in other OSS projects as justification for your own. Apart from it being bad form, it doesn't help promote the project.


Due to its very nature I don't take anonymous internet typing too seriously, and sometimes I like hyperbole to start off a conversation because it cuts right through to the core bits. I totally agree with you though about talking bad about the project you literally just stole verbatim code from (text layout algo).


For people that have used the TeX macro language for a long time and have great comfort with it I think it is difficult to grasp how difficult that language is to learn/use even for someone fluent in many other languages. The difficulty of using the TeX macro language is even greater if you have the misfortune of needing to use it, wait several months, repeat.


Looks very interesting, I might give this a try.



