Hacker News new | past | comments | ask | show | jobs | submit login
Computers are made of metal, not category theory (chrisstucchio.com)
80 points by stakent on Oct 1, 2014 | hide | past | favorite | 74 comments



If you take a look at how the GHC Haskell compiler (A "Sufficiently Smart Compiler(tm)" in my opinion) works, for example, it is not naively pushing allocating objects, creating thunks and emitting trampoline code.

Instead, it analyzes the program structures from graphs and emits rather efficient machine code in the end. Not too dissimilar from what your native code emitting C compiler does.

If you look at the machine code for something as "stupid" as the Haskell example below, the output object code does not resemble the semantics of the source program at all. (it's not quite as efficient as the same from a C compiler, but still proves a point)

    foreign export ccall fac :: Int -> Int
    fac :: Int -> Int
    fac n = foldl (*) 1 . take n $ [1..]
Compiler and programming language research is a very important topic that yields real performance benefits as well as better programmer productivity. That includes using Category theory to reason about program correctness.

If you're interested in how the Haskell compiler works, "Implementation of Functional Programming languages" is a good (albeit a bit outdated) starting point. The whole book is freely available here: http://research.microsoft.com/en-us/um/people/simonpj/papers...

I do agree with the title a bit, though. Some of our computer programming environments are just ridiculously slow. Being slow also means "consumes a lot of power" which is important when more and more computers are powered by batteries.


The OP (which I totally agree with!) also states:

> Such a compiler could properly determine that intermediate steps are unused, merge them (e.g. translate x.map(f).map(g) to x.map(x => g(f(x))))

Ironically, GHC does that already[0], and will also do TCO.

[0] http://metagraph.org/papers/stream_fusion.pdf


Correct me if I'm wrong, but teaching a computer to produce good low level code from high level code seems very similar to the machine translation problem (translating English to X). Why are people so hopeful about solving the former problem when the latter problem is so hard? Or are these two problems so different that the failure so far to solve one has no impact on the other?


I'd say those two things are only superficially similar. One is trying to correctly interpret a syntax which

- is tailored to the characteristics of extremely parallel computers which spend years unconsciously learning somewhat pointless statistical micro-rules, and then layer a formal grammar on top of that (both layers need to be correctly maintained); and

- sometimes, due to ambiguity, can't be parsed without understanding the nature of what is being referred to (which can be anything one can imagine).

The other is trying to transform expressions which are already formally specified (with semantics to a large extent based on computers' whims rather than humans', even in very high-level languages) into a more efficient form. Doing this as well as humans can could require AI-ish "reasoning" about parts of the program, but it's basically a different task. There's no need to understand everything in the world to know what a Haskell program does.


Even a high-level programming language has a strict grammar with no ambiguity, albeit maybe with some syntactic sugar and "higher-level" atoms. The same cannot be said about natural languages.


Designing a mapping from a new language to machine code is easier than retroactively trying to map two languages together that were never meant to be. Especially if those languages are complex and ill-defined.


Seconding on the SPJ book! It's a solid start for learning about the particulars of FP implementation.


The complaints in the article don't really generalize at all. It seems to just be a rant about why targeting the JVM from a functional language can yield non optimal code without certain program transformations. Haskell and OCaml don't have the problems mentioned here at all. I wish the title actually reflected the content rather than just being kind of antagonistic about functional programming.


For anyone curious, here is the dissasembly of the above code:

http://pastebin.com/R3QiCBsK

And the entire resulting object file:

http://pastebin.com/a21pqaKW


What do all the call statements do? They seem a little redundant as they skip forward to the next statement each time.


Note that the operand in each case is 0x00000000.

I suspect this is a disassembly of the .o file, or maybe it's some runtime dynamic linking thing? There'll be a fixup table somewhere and this stuff will be sorted out before it's executed.


The Glasgow Haskell Compiler Haskell Compiler? Never heard of that one :P


Aircraft are made of metal, not fluid dynamics. Rockets are made of metal, not ballistics equations. Nuclear bombs are made of metal, not quantum theory.

Well, metal is important, but to put it to any use other than very naïve fiddling with it, good abstractions are indispensable.

Flamebait titles, on the other hand, don't help it the smallest bit.


When you first wrote about how hadoop is a waste of time if you don't have multi-TB worth of data (http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html ), I thought that was classic linkbait. And then I actually started seeing real benefits of not doing map-reduce for smaller datasets ( small = 1GB-1TB) & just sticking to plain old scala ( poso as opposed to pojo :) Similarly, this article seems linkbait on the surface but makes a lot of sense if you do anything performance intensive. I recently tried implementing a multi layer neural net in Scala - I eventually ended up rewriting your mappers as while loops & introduce some mutables, because at some point, all this FP prettiness is nice but not very performant. It looks good as textbook code for toy examples, but takes too long to execute. Am still a huge fan of FP, but nowadays I don't mind the occasional side effect & sprinkling some vars around tight loops. Its just much faster, & sometimes that is important too.


I'd like to add that "sprinkling side effects & vars around tight loops" doesn't generalize to all languages. For isntance, GHC can optimize pure code MUCH more than impure code.

Meaning that with purity you get increased performance in many cases.


What I took from this post, is that we should keep working on compilers. They could provably optimize away most of the performance issues and move us closer to doing category theory instead of dealing with the metal.


I'd much rather deal with the metal than with category theory.

(I'm not trying to be snarky here, but I've studied category theory and it seems like a whole lot of effort for very little benefit. On the other hand, it's fun to figure out the metal. (Although silicon's not exactly a metal...))


Yes. To oversimplify, it seems like there are two schools of software: the car-mechanic school and the math-professor school. I happen to be of the car mechanic variety. That doesn't mean I'm not borrowing energetically from people pursuing the math-professor approach, but by now I'm pretty sure where my strengths lie. And given that computers (and cars) are mainly used for practical outcomes, I'm totally ok with that bias.


>the car-mechanic school and the math-professor school.

Heh heh. Having worked mostly with math professors, I would say the code they write almost exclusively belongs to the car-mechanic school. Math professors never deck up their code with CT. The majority of the time a math prof has to write code, its to do heavy-duty applied math, engg stuff ( solve pde's, fluid mechanics, EE math, fourier transforms, gradients etc. ) or to display some cool visualization to the students ( here's how a vector space looks! here is what happens when you apply linear transformation!! here are all the elements of a quotient group!!! ) - those tend to be done in Matlab/Mathematica/Maple/Gap...and these tools have zero CT. The applied stuff tends to use netlib, gams, gsl, colt, apache math...mostly a lot of gsl these days...again no CT there. Most of them are very happy to declare giant matrices & happily mutate away.


Heh. Yeah, a lot of the actual PhD-types I know are amateur coders on a mission, and their code matches that.

I'm sure you get it, but just for the record, by math-professor I meant "very focused on things that to most people seem incredibly abstract", with a side of "passionate about solving intellectual challenges for their own sake".


Agree with you there. Programmers need to understand the semantics of their language and use it to their advantage. I think this is especially the case in functional languages, especially in Haskell, where two seemingly similar pieces of code can execute in different ways, especially given the presence of lazy semantics. Knowing how to write efficient code in your language can be significantly different than knowing how to write code that works, depending on the language.


But that's why the abstraction is leaky. You no longer need to think at just the level of the abstraction, but also at the level of the implementation of the abstraction. This is fine, but it's one of the reasons abstractions tend to be less useful than they seem. If you're always thinking about what's really going on, you might as ditch the abstraction and only think about what's really going on. The less you have to think about its implementation, the better an abstraction is.


You might as well ditch the abstraction? So you program everything in assembler then?

OOP is leaky, maybe we should ditch it. The ADT and typeclasses in Haskell are much more general and well-behaved than objects in my experience.


I sort of buried the lede in my comment... My main point was that while all abstractions are leaky, it's reasonable to judge them on the relative extent of their leakiness.


Title is misleading. This is about some Scala microbenchmarks. There is no CT or metal here.


The metal is also made of category theory:

http://conal.net/blog/posts/circuits-as-a-bicartesian-closed...


Related links:

Cost semantics http://lambda-the-ultimate.org/node/5021

Recent efforts to bring together "logical" CS (lambda calculus, type theory, etc.) and "combinatorial" CS (machine models, big-O, etc.) http://existentialtype.wordpress.com/2014/09/28/structure-an...


I'm no expert on these matters, but it seems a bit ridiculous to call out entire swaths of the programming/computer science world while referencing the JVM as a benchmark.


I'm (author here) not "calling them out" - most people doing research in FP are well aware of these issues. This is aimed at people like me who merely implement stuff in those languages.

Also, most of the Scala code I'm describing compiles down to very straightforward instructions that would be the same on almost any compiler. If you can show me any other language which will handle a linked list as rapidly as the JVM handles an array, I'll be extremely impressed.


But your post makes more claims than merely computers not being able to arbitrage data structures.

In your first example, while I don't have any experience with Scala, any competent compiler for C++, Haskell, Rust, etc. will inline a function like "map", so there is no "function call overhead" whatsoever. (This example is too simple to benefit from stream fusion as such.) The resulting machine code will look fairly similar; if there is a performance difference, it would be far more subtle than your 10x, and would probably have to do with whether the map implementation in question is specialized for the Array->Array case. One that is will allocate the right number of elements up front and perhaps even skip bounds checks that the compiler might or might not be able to optimize out from the explicit version, but one that isn't will not only include bounds checks but repeatedly have to grow the result array and copy the elements so far to the new allocation.


Yes, in the conclusion I remark that a Sufficiently Smart Compiler can do all of this. I don't know that either Haskell or C++ compilers will inline your maps, but if so, that's very cool.

And yes, specializing for the Array -> Array case and eliminating bounds checks will be necessary to get real performance.


> handle a linked list as rapidly as the JVM handles an array

A linked list is not an array. Particular operations will have different complexities. One data structure is not "more rapid" than another.


This is precisely the kind of thinking people need to be careful about. Some data structures are more rapid than others even with more complex operations, because they conform to the reality of the underlying hardware better.


Can you please provide some examples? It would be much appreciated.


http://youtu.be/YQs6IC-vgmo

Stroustrup describes why –on modern hardware– vectors are faster than linked list for insertion and deletion.


I think the critique is with the implied (but not always understood) overhead of some of FP's common types of abstraction.


Good read. Minor quibble: I'm not sure calling Julia a functional language is really a fair statement. Yes it's LISP-like, but if you want to go there, Ruby is a LISP-like language in the same ways Julia is. I don't really encounter many folks making the claim that Ruby is a functional language.


> I don't really encounter many folks making the claim that Ruby is a functional language.

Ruby is a multiparadigm language, and functional is one of the important paradigms it supports. A rather popular essay on Ruby from nearly a decade ago (surprising for me to discover that it was that long ago!) was "Why Ruby is an acceptable LISP" [1]

Its true that recently much of the recent focus on functional languages has been on statically-typed and/or pure functional languages, which neither Ruby nor Lisp is...

[1] http://www.randomhacks.net/2005/12/03/why-ruby-is-an-accepta...


The most important thing here is to measure. Always measure. It doesn't matter what you think is going to be fast or not. The combination of super smart (or not so smart) compilers, multi-level hierarchal caches, pipelining, branch (mis)prediction, etc means you can't just look at a piece of code in a high level language and know how fast it will be. You always have to run the code and measure how long it takes. For anything where you really care about latency, measure.


...And then have something in an entirely unrelated set of code change and make the code you oh-so-carefully benchmarked become half as fast. Because you just caused cache line aliasing, or your pushed your function code across a boundary, or you're now killing your branch predictor, or, or...

At this point, in any optimized language benchmarking parts of your code is practically useless for finer optimization. There are some things you will pick up on, sure, but trying to squeeze the last drop of performance out of your code? Good luck. Too many global interactions. I mean: even GCC doesn't generate statistically significantly faster code on O3 versus O2.


Computers are metal but programming languages are for humans. The purpose of functional abstractions is primarily to help humans produce correct code, not necessarily to optimize for speed. Referentially transparent functions, immutable data and type checking are meant to provide guarantees that reduce complexity, make a program easier to reason about and help stop the programmer from introducing costly and dangerous bugs. It doesn't matter how fast your "fugly imperative code" is, if it's incorrect. And the lesson we've learned the hard way over and over again is that humans are just not smart enough to reliably write correct fugly imperative code.


I'm not sure I buy your argument, but even if I did, there is no evidence as of yet that says that fugly functional abstractions provide for lower error rates.

Further, performance is part of correctness for nearly any real word example.


> there is no evidence as of yet that says that fugly functional abstractions provide for lower error rates.

Sure there is. Compare the incidence of buffer overflows in C programs vs Haskell ones.

> performance is part of correctness for nearly any real word example.

No it's not. See the definition of correctness [1].

[1] http://en.wikipedia.org/wiki/Correctness_%28computer_science...


"Compare the incidence of buffer overflows in C programs vs Haskell ones."

It's true that 1 particular instance of a functional language removes 1 particular class of problems from another non functional language. But there is no evidence that these error rates are lower in general because other classes of errors creep in. For instance, space leaks due to laziness are a class of error you are unlikely to encounter in C, but are common in Haskell.


Sure, but in C you get memory leaks due to simply forgetting to call free...


In most real world software, performance constraints are part of the specification, and thus, performance is part of correctness by the definition you cite.

(And where they aren't part of the formal specification, it often is later discovered that that was simply a failure of requirements gathering/analysis, in that the specification did not completely capture the expectations of the entity that the software was being built for.)


What you're saying is that performance constraints are often part of the business requirements.

Business requirements are not the same thing as program correctness.


Ok, if you are going to be pedantic about it, you need to go back to your original claim about it not being important how fast your code is if it isn't correct. Proving something is partially correct is not important, if the program in question doesn't provide an answer in the required timeframe.


Also true, and that's when you need to optimize your code, which sometimes means doing away with abstractions that are slowing it down, which can mean replacing elegant functional code with fugly imperative code like the OP discusses. But then after you optimize it to meet the business requirements on speed and size, it still needs to be correct.

And of course, while we're talking about business requirements, there is always project speed and budget size to account for. When optimizing for program running time and memory cost, you don't want to blow up the project's running time and dollar cost. Productivity is a huge reason why abstractions need to exist.


Quick question: when the article calls a pair of mutually-recursive functions "corecursive", is this a commonly-used meaning of the term, or just a mistake (since they're certainly not corecursive in the coinductive sense)


That's the meaning I use.


Computers are "made of metal", and category theory does often lead far away from that reality. But that doesn't mean function calls are slow, or that we need Sufficiently Smart Compilers just to use map efficiently.

What it means is that abstractions should be designed with both the use and the implementation in mind. One way to do that is "zero-cost abstractions" a la C++, where the abstractions are pretty minimal. Another way is things like stream fusion and tco, where it's easy to accidentally stray out of the efficiently representable subset.

But there are a lot of ways to get abstractions that are both higher-level and "zero-cost" (generating the code you would have if you hadn't used them). For example, Python generators and C# iterators (coroutine-looking functions internally transformed into state machines) look a lot like Haskell higher-order functions but the benefits of laziness and stream fusion and tco are just built into the abstraction, rather than optimizations that may or may not happen, depending on the language/compiler. They also turn out to be more flexible, since the iterator state is reified.

Another example is entity-component systems in game engines. You still get a nice abstraction of game-world objects composed from behaviors, like you might see in an OO hierarchy, but the cache behavior is vastly improved and the behaviors are again more flexible.


Love the title.

A more common mistake I notice people making is writing code that makes more memory allocations than necessary.

  # Bad: Makes an extra instantiation of a list with 1 in
  # it which then needs to be read 
  x = set([1])

  # Good
  x = set()
  x.add(1)

Overall, I think it is important to remember that when you write a program that every step translates to a set of operations. And this applies to all kinds of programming, not just functional programming.


Any half-decent optimizing compiler will turn those two implementations into the exact same assembly.

That being said, that looks like Python source code - and as the default Python interpreter doesn't do much in the way of optimization, you are correct in that case. (Although I wonder about PyPy.)


When I started programming it was ZX Spectrum BASIC. No abstractions other than some symbols. Then I moved on to Perl, then PHP and eventually C# - all increasing abstraction from the hardware, but all mutable state and based off how hardware works.

Recently I've started moving on to F# and Haskell, and it's really opened my eyes.

While computers keep getting faster the humans who input the programs do not. While programming is about getting a computer to do things the most important part is making it do what you want it to to. Anything that helps humans reason about what the computer will do - rather than exactly how - is a good thing in my book.


Stalin and MLton are sufficiently smart. Probably too smart in some cases.


One can just as easily make the counter-argument people understand theory, not computer instructions. So the cleaner and more functional your code is, the easier (and faster) it becomes for other people to build on top of / around it.

But that counter-argument only works given that the current application requires tons of people of work on your code. Just as the article's argument works only if you actually need to squeeze out those extra ms on your 1 machine.

My only take away from this is "use the right tools for the right job".


I'm a little confused: the author is speaking of "metal" but isn't writing assembly.

(And I'm a little snarky today.)


Surely the point is to use an effective language (for your particular definition of effective) and then optimize based on performance testing. Otherwise you lose time writing fast low level code that doesn't need to be fast or low level, and possibly never get to the important stuff.


This. "No optimization without quantification" should be pasted up beside Knuth's laws of optimization. Performance testing and profiling will tell you where the code is really spending its time. Fix those problems.

The past 20 years have shown us that naive optimization, which was never all that good, breaks completely in modern compilers, which often emit code that is optimized for the underlying architecture in ways that it's difficult for the programmer to anticipate. Like everyone who used to do a lot of hand optimization, in the late 90's I started seeing more and more cases where my clever tricks were making things worse, not better. So you can't just optimize by rote: you have to understand the specific problem you're dealing with.


Well, OK. I have never seen anyone argue that functional programming is beneficial for performance reasons.


Well, Ok.

Haskell (or any other pure functional language) can often make it viable for you to write code that is much more performant than what you'd have writen in C (or any ohter mainly imperative language). Mainly so if the problem is paralelizable and can use asynchronous IO.

Done. Now you've seen somebody arguee that functional programming is benefical for performance reasons.

Of course, the previous statement is not true if you have unlimitted budget and time. At least not yet.


TFA serves as the rejoinder to the argument you just made :)


Really? I hear it all the time.

The argument paraphrased is "we've maxed out what we can do in a serial fashion, so we need to start working concurrently and/or in parallel. Functional paradigms are much easier to reason about concurrency so we switch to them for the win!"


More than that.

Functional programming in general tends to be better in terms of cache locality and number of dereferences required. Largely because it tends to encourage storing things "column-oriented". Ever looked at cache locality and the number of dereferences required in an overly-object-oriented program? Not pretty. Pointer chasing all over the place.

Although this is more of a push away from OO programming than a push towards functional programming. OO implicitly assumes RAM - and in modern computers memory is decidedly not random-access.


"Functional programming in general tends to be better in terms of cache locality"

I've seen no evidence to support the "in general" part of this claim. I will agree that pointer chasing kills cache locality and that in some object systems dereferences have a bad impact on this. But in many functional systems you encounter the same problem with cache locality due to the nature of immutable data structures.


Hence the next paragraph: "Although this is more of a push away from OO programming than a push towards functional programming."


silicon is not really metal[1], it's rock. It's the most abundant element on earth & the universe, which I find interesting. Most of our planet and universe could be turned into a giant computer.

[1] it's a metaloid


Silicon is not rock, rock is Silicon Dioxide, which is not the same thing.

It's not the most abundant element on earth - Iron is. It's not even the most abundant on the earth's crust, oxygen is. (Silicon is second place.)

And it's certainly not the most abundant in the universe - hydrogen is. Silicon is 8th.


Yes I should check my factoids before regurgitation. In atom counts there is more silicon than iron though, but more oxygen than either. Still interesting though that silicon is so plentiful. Doesn't matter if its bonded to an oxygen. I suppose the bonding explains why the oxygen count on earth is so high also. Oxygen is also (part of) rock.


> Still interesting though that silicon is so plentiful.

It makes sense if you look at https://en.wikipedia.org/wiki/Stellar_nucleosynthesis since carbon, silicon, oxygen and iron are some of the main results.

What's more odd is just how useful silicon-dioxide and iron is for building planets! Some of the most plentiful elements, just so happen to be perfect for making planets.

And carbon with its amazing array of compounds no other element can match is also plentiful.

Then you have iron which is the end result of this process, which just so happens to be perfect for building man-made things. It's strong, yet unlike the other strong elements, iron is also easy to work.

And water - oxygen and hydrogen, very common elements. And water expands when it freezes, if not for that small small, non-obvious thing, life could not exist.

The most abundant elements, the ones that don't need a supernova, just so happen to also be the most useful ones. Well, it's pretty obvious to me this universe was designed specifically for us.


   - There is nothing absolute in the world.
   - Metal rails are.


He keeps using the word "pointer" ...


"No, don't take away my beautiful abstractions!"




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: