Hacker News
Haskell Snap Framework templating 3000x faster with new release (snapframework.com)
114 points by LukeHoersten on Dec 10, 2012 | 32 comments



5 microseconds translates to 5000 nanoseconds. Assuming a ~1 GHz clock and an average CPI of 0.5, we have about 10,000 instructions to render the template. Or 30-50 main-memory fetches. This isn't too shabby.

But the numbers before the improvement. Ouch. Those were bad.


I'm not sure measuring the number of CPU instructions makes much sense anymore.

In my experience micro-optimizing code, the majority of time is spent waiting for memory (bandwidth waits or even worse: latency costs).


Precisely. This is why I mentioned the 30-50 mems. A typical main memory fetch is somewhere in the 100-200ns range on modern hardware. I tend to measure the effectiveness of an algorithm based on, roughly, how it accesses memory.

Caches play a role as well, of course. An L1 hit is around 1 ns, sometimes even less; an L2 hit is in the vicinity of 5-10 ns, which easily translates to some 30-40 instructions on modern hardware.

It also tells you that hunting for faster execution by compressing the instruction count is not going to buy you much extra speed nowadays. The key to fast programs is data representation. Good data representation.
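Putting these numbers together as a sketch (assuming a ~1 GHz clock, i.e. one cycle per nanosecond; the 150 ns fetch cost is a midpoint of the 100-200 ns range above):

```haskell
-- Back-of-envelope budget for a 5-microsecond template render.
renderNs, cpi, memFetchNs :: Double
renderNs   = 5 * 1000   -- 5 microseconds, in nanoseconds
cpi        = 0.5        -- cycles per instruction
memFetchNs = 150        -- one main-memory fetch, ~100-200 ns

-- At 1 cycle/ns, cycles = renderNs; instructions = cycles / CPI.
instructions :: Double
instructions = renderNs / cpi          -- 10,000 instructions

-- Or the same budget spent entirely on main-memory fetches:
fetchBudget :: Double
fetchBudget = renderNs / memFetchNs    -- ~33 fetches, i.e. the "30-50" range
```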


I wanted to ask how that is even possible, then read:

> However, we realized that a lot of the transformations could be done at load time and preprocessed to an intermediate representation.

Lazy computation has its disadvantages. Still, impressive gain.


This isn't about deferring evaluation within a phase; it's about compilation over interpretation, and the phase distinction.


> Lazy computation has its disadvantages. Still, impressive gain.

After reading the article I don't see anything about lazy evaluation. What am I missing?


I don't think you ARE missing anything. From my reading, the gains are obtained by pre-computing the string concatenation a single time (rather than while rendering) for all cases in which it can be.

It's similar to moving something from runtime to compile time (although those distinctions don't quite apply here).
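The idea can be sketched with a toy template type (hypothetical types, not Heist's actual API): collapse adjacent static text once at load time, so the per-request render only fills in the dynamic holes.

```haskell
-- Toy template AST; hypothetical, not Heist's actual types.
data Node = Text String | Var String
  deriving (Eq, Show)

data Chunk = Static String | Dynamic String
  deriving (Eq, Show)

-- Naive render: walks the tree and re-concatenates on every request.
renderNaive :: [(String, String)] -> [Node] -> String
renderNaive env = concatMap step
  where
    step (Text s) = s
    step (Var v)  = maybe "" id (lookup v env)

-- Load-time pass: merge runs of static text once, up front.
compile :: [Node] -> [Chunk]
compile = merge . map toChunk
  where
    toChunk (Text s) = Static s
    toChunk (Var v)  = Dynamic v
    merge (Static a : Static b : rest) = merge (Static (a ++ b) : rest)
    merge (c : rest)                   = c : merge rest
    merge []                           = []

-- Per-request render now only splices in the dynamic parts.
renderCompiled :: [(String, String)] -> [Chunk] -> String
renderCompiled env = concatMap step
  where
    step (Static s)  = s
    step (Dynamic v) = maybe "" id (lookup v env)
```

Both renders produce the same string; only the second amortizes the static concatenation across all requests.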


Ah, that was exactly how I read it. Basically what Yesod uses Template Haskell for in a variety of cases.


>I don't think you ARE missing anything. From my reading, the gains are obtained by pre-computing the string concatenation a single time (rather than while rendering) for all cases in which it can be.

Lazy evaluation: waiting until a computation needs to be done to perform it.

Problem here: it was inefficient to do a particular computation at the very moment before it was needed.

So, how is this not a lazy evaluation problem?


Laziness is a specific property of how variable binding and application works in a language, which is not at issue here.

Unless I am mistaken, they didn't change the template language's evaluation strategy from call-by-need to call-by-value.

They did change the implementation from an interpreter to a compiler, though.


I had the same line of thinking as SilasX. Conceptually, the change is that instead of deferring work to the last moment available, it is now done immediately. This is the very difference between lazy and eager computation. As I'm not a Haskeller, I didn't immediately realize the strong connotation with language features. Sorry for the confusion.


Lazy evaluation implies that you keep the result for re-use later if it is shared between multiple users. So if laziness causes performance problems, it is generally:

* A space leak: The deferred computation holds lots of data alive for when it will be needed, whereas computing it would reduce all that data into a small result.

* A latency problem (sometimes called a "time leak") where we may idle around for a while, and only when some value is desperately needed, start computing it. We could preemptively compute the result to hide its latency.

Neither was the problem here. The problem was that part of the computation of the result was redone on every render.

Sharing that work between invocations was not trivial, and to get it they found it easier to move the computation to template loading time ("compile time").
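The space-leak bullet can be illustrated with the classic foldl example (unrelated to Heist; it just shows the general phenomenon):

```haskell
import Data.List (foldl')

-- Lazy foldl defers every (+), building a chain of thunks that keeps
-- the whole traversal alive; strict foldl' reduces to a small Int as
-- it goes. Same result, very different memory behavior on large input.
lazySum, strictSum :: [Int] -> Int
lazySum   = foldl  (+) 0   -- O(n) thunks before the first addition runs
strictSum = foldl' (+) 0   -- constant-space accumulator
```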


Thanks. I agree that it does seem too good to be true. I was also quite surprised when I saw how much of a difference it was the first time I ran the benchmarks. The great thing is that the bigger your page is, the bigger the improvement will likely be.


I find it interesting that Heist looks to be inspired by or based on XSLT, yet XSLT is not mentioned anywhere in the documentation. Is it just a happy coincidence?


My inspiration for Heist came from Lift's template system. That and FBML. Heist is essentially a generalized system for building domain specific markup languages.


Independent of Haskell, Heist is one of my favorite HTML/XML template engines so it's great to see such huge advancements.


Could you expand on why it is that you favor it?

I ask because I like the idea of a stateless xmlish template language, but I wonder what this offers over the zillion existing solutions.

"Separates view and business logic" and "enables DRY design" are valuable goals, but most template languages have them.


Good question. A few reasons:

1. Heist allows you to define your own HTML/XML tags in the host language (Haskell in this case). This means you're only dealing with (an extended) XML document when doing layout and design so all the normal XML tools still work.

2. Some popular template engines try to separate logic and design but end up letting you cheat a little. Any time you want/have to cheat and put logic in the template, it really was a shortcoming of the template engine. In Heist, you can't cheat but you never want to.

3. The reason #1 and #2 work is because Heist's "recursively applied splices" is just the right abstraction. My HTML templates end up looking just as pretty as well factored Haskell code. Heist makes the perfectionist in me happy.

In short, I would say you're right here: '"Separates view and business logic" and "enables DRY design" are valuable goals, but most template languages have them.' But just because most template languages have these goals doesn't mean they've achieved their goals. Heist, in my experience and opinion, does achieve these goals.
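Point 1 and the "recursively applied splices" idea can be sketched like this (hypothetical simplified types; real Heist splices are monadic and operate on xmlhtml's node type):

```haskell
-- Hypothetical simplified model, not Heist's actual API.
data Node = Element String [Node] | TextNode String
  deriving (Eq, Show)

-- A splice: the custom tag's children in, replacement nodes out.
type Splice = [Node] -> [Node]

-- A <currentUser/> tag implemented in the host language:
currentUser :: String -> Splice
currentUser name _children = [TextNode name]

-- Recursively apply bound splices: splice output is itself re-processed,
-- so splices compose with ordinary markup and with each other.
apply :: [(String, Splice)] -> [Node] -> [Node]
apply splices = concatMap go
  where
    go (Element tag kids) =
      case lookup tag splices of
        Just sp -> apply splices (sp kids)
        Nothing -> [Element tag (apply splices kids)]
    go t = [t]
```

Because the document stays (extended) XML, everything that isn't a bound tag passes through untouched, which is why standard XML tooling keeps working.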


From their compiled heist docs [1]:

There are two things that compiled Heist loses: the ability to bind new splices on the fly at runtime and splice recursion/composability.

I haven't checked or read the doc thoroughly, but if it means what I think it means - all we get is hierarchical splices. Which is still a lot, but it's not quite as magical.

[1]: http://snapframework.com/docs/tutorials/compiled-splices


We still keep some of the magic by allowing you to run the old "interpreted" style splices at load time. These don't have access to dynamic data, but they do have recursion/composability. This combination retains most of the power while allowing a huge speed increase. It just means that to take advantage of both you have to structure things in a certain way.

At this point it seems to me that this structure also ends up being a desirable one for organizational reasons. But the jury is still out as far as whether there will still be reason to want more. We're aware that there might be good reasons to support this extra power and I have a pretty good idea of how it would be implemented. But I want to get more people using it in the real world before we address that issue.


I'm not him, but personally I like it because html is actually a pretty good language for markup. I find any custom syntax to be worse than just html, and you lose the ability to use standard html tools, syntax highlighting, etc.

And as much as it may be possible to separate logic from presentation in a typical PHP/ASP/JSP style template, I've never actually seen it done. When something is made awkward, people tend to choose the more convenient approach, so you see an unfortunate amount of nested loops and conditionals in most templates. Being able to have designers write templates by simply telling them "anywhere you want dynamic content, just make up an appropriately named tag for it and pretend it is part of html" is really nice.


Right on here. Exactly how I feel.

Generating HTML is really the whole point of a web framework so it better be awesome at doing so. Heist does this well.


thank you both, but I think I failed to express myself: what I meant to ask is: how is this better than other xml based templates, such as wicket, TAL, Kid, Genshi etc.

I would be led to understand, given your comments, that Heist does not allow control structures in the templates, but looking at some snap code[0] it would seem iteration is right there. Which makes sense I guess.

Or am I missing something, and there is a fundamental difference between Heist's

    <posts:reverseChronological>
      <a href="${post:url}">stuff</a>
    </posts:reverseChronological>

and, say, Genshi's

    <a py:for="post in reverseChronological" href="${post.url}">stuff</a>


Is the difference, and thus your preference, in the fact that posts:reverseChronological works more like a function call taking the content as argument, rather than a "classic" loop?

[0] https://github.com/snapframework/snap-website/blob/master/bl...


I think the difference is that Genshi's py:for appears to be a construct provided by the template system. In Heist, posts:reverseChronological, like you say, is just a function call. Looping doesn't happen in the template; it happens on the Haskell side. Genshi has to have a different construct, py:if, for conditionals. Heist doesn't need another construct: a function call serves both purposes. This has several positive effects. First, it means Heist's core is simpler, since it only has one fundamental abstraction with enough expressive power to implement all of Genshi's special-case keywords. And second, I think it gives a clearer separation between logic and view.
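Concretely, the host-side loop might look like this (a sketch with hypothetical simplified types, not the real splice from the snap-website code):

```haskell
-- Hypothetical simplified types; not the actual Heist API.
data Node = Element String [Node] | TextNode String
  deriving (Eq, Show)

-- Fill a <post:url/>-style hole in the tag body (simplified to a tag
-- per variable rather than ${...} attribute syntax).
fillUrl :: String -> Node -> Node
fillUrl url (Element "post:url" _) = TextNode url
fillUrl url (Element tag kids)     = Element tag (map (fillUrl url) kids)
fillUrl _   t                      = t

-- The loop lives in Haskell: the splice repeats its children once per
-- post. The template itself contains no iteration construct.
reverseChronological :: [String] -> [Node] -> [Node]
reverseChronological urls body = concat [ map (fillUrl u) body | u <- urls ]
```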


>how is this better than other xml based templates, such as wicket, TAL, Kid, Genshi etc.

I don't think it is. To me it is just the Haskell template engine in that style (which is the style I prefer). That style of template engine is in the minority, so most comparisons are vs. either mixed styles (PHP/ASP/JSP/Rails) or custom syntaxes (Mustache, Haml, etc.). What is good about Heist certainly applies to similar template engines like Lift's, Zope's, etc.


"Built for speed from the bottom up. Check out some benchmarks." from the frontpage of snapframework.com

"When we originally wrote Heist, speed was not our goal." from the link submitted here

Feels like a contradiction; couldn't they just say that it turned out fast enough on the first try?


I think that comment is referring to the framework / server, which is quite fast. Heist is a templating system authored by the same people that was not intended to be fast (though now it is). It is somewhat confusing, but I don't think the work they did on the framework/server (which is totally separate from Heist) should be discounted as "fast enough on the first try."


Correct. We always marketed Heist as a more experimental part of the framework as a whole. The server and associated API was initially our primary focus.


so why did the api have to change? couldn't the compilation be done on first use? is the api change simply a change of the encapsulating monad (guessing wildly)?

not trying to bash haskell, but i think there's an interesting q about how well it (or any other language) can hide changing implementation details (particularly major ones like a compilation phase) behind an unchanging api.

or maybe that would have been possible, but the api changed for other reasons (the general cleanup)?

really interesting article btw. would have loved more detail... an explanation of introducing compilation in haskell with example would be pretty cool (pretty sure either pg or norvig has written one - with lisp - that i vaguely remember reading years ago).


The changes were significant because Heist isn't just an API. It's an inversion of control where you provide routines that get run for various parts of your DOM. They used to be functions that took a node and returned a list of nodes. In order to do the optimizations that we wanted to do we had to change the type signature of the callbacks that the users write to something that took a node and returned a special data structure.

We did actually preserve the old API, so you can migrate without making significant changes to your code. Most of those changes are because of the general cleanup. So maybe my statement about big breaking changes was misleading: they're big breaking changes IF you want the performance increase; otherwise things still work the way they did before. In fact, the process of implementing this refactoring impressed upon me that the old paradigm was even more important than I initially realized.
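The shape of that signature change can be sketched like this (hypothetical simplified types, not the exact new Heist API): instead of returning nodes at render time, the callbacks yield a chunk list at load time, where static chunks are already-finished text and dynamic chunks are actions run per request.

```haskell
import Data.IORef

-- Hypothetical simplified model of the compiled output. Static chunks
-- were rendered once at load time; Runtime chunks fetch dynamic data
-- on every request.
data Chunk = Static String | Runtime (IO String)

-- Per-request rendering just runs down the chunk list in order.
render :: [Chunk] -> IO String
render = fmap concat . mapM run
  where
    run (Static s)  = pure s
    run (Runtime a) = a
```

The payoff is that all the purely static structure collapses into a handful of Static chunks, so each request only pays for the genuinely dynamic parts.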

If you're interested in more detail, check out the rest of the docs linked at the end. They describe the concepts in more detail with a focus on how to use them. In January I will also be giving a presentation to the New York Haskell Users Group (http://www.meetup.com/NY-Haskell/) about some of the things I learned while implementing this new approach and merging it back into the original Heist code base.


But this doesn't apply to splices that need data at runtime, like say pulled from a database right? Isn't that typically going to be 95% of your splices? The performance increase seems a bit overstated if it only applies to splices that are just simple substitutions.


This might apply to 95% of splices, but not to 95% of your template. This particular benchmark does show the best case, but the typical case of a few dynamic splices will not affect things much, because the page is still getting converted into a concatenative style and a ton of the splice processing of things like <bind>, <apply>, etc. is happening at load time.



