John Carmack on Inlined Code (2014) (number-none.com)
395 points by rinesh on July 19, 2016 | 199 comments



I have had the pleasure of working with lots of other people's code of varying styles and quality. Nothing is harder to read and understand than code that consists of deeply nested calls from one little helper (or wrapper) function to another. Nothing is easier to read and understand than code which just flows straight through from top to bottom of a big function.

There are other tradeoffs too, involving code reuse, speed, worst-case speed, and the probability of introducing bugs. If you haven't read the article, do; it's worth it.

I love that Carmack tries to measure which styles introduce more bugs! Who else does that? Seriously, I would love to see more of that.


I don't think I can agree with "Nothing is easier to read and understand than code which just flows straight through from top to bottom of a big function" in the general case.

I will agree that if you are coding in the kind of environment Carmack is talking about, where you are writing code which interacts with massive shared global state, that purely procedural structured code is easiest to understand, because you can be sure that you can see all interactions with global data structures in one place.

But if you don't have massive shared global state, I'd argue that calling subfunctions is a hugely valuable aid to understanding, since it allows you to reason much more about the way data dependencies flow through code when you can be sure that a call to a function means that that function can only act on the data structures passed in to it and is guaranteed not to affect anything else. I would much rather in that case see:

   a = subOperationA(arg)
   b = subOperationB(a)
   c = subOperationC(a, b)
   d = subOperationD(b, c)
   return d
than see those suboperations inlined and have to figure out for myself how the data dependencies flow through the code.


Isn't this a bit of a strawman, though? All too often, you instead get the likes of:

    a = subOperationA(a, b, c, d, e, f, g, h, i)
    b = subOperationB(a, d, e, f, g, h, i, j, k)
    c = subOperationC(d, e, f, g, h, i, j, k, l)
    return c
And then some poor fool breaks each of those up into about 8 functions. So you click into subOperationA to find it is subOperationAa chaining to subOperationAb, ...

My gripe is when we argue against having the full solution in your mind as you work the pieces. Logically, it makes sense to reduce everything down to named things. Cognitively, that is expensive.

And yes, I'm responding with an equally opposing strawman. I am beginning to reject that there is a "one true answer."


This is not really about subroutines, it's about abstractions. Good abstractions decrease cognitive load, bad ones increase it. The one true answer is to aim for good abstractions, and not factor code into subroutines at arbitrary boundaries.

For example, adding two n-length vectors is a good abstraction ("vector_add"). Adding two n-length vectors then multiplying the magnitude of the result by 3 is a bad abstraction.
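
A rough sketch of that contrast, with hypothetical names:

    /* Good: the name carries the whole meaning. */
    void vector_add(const float *a, const float *b, float *out, int n) {
        for (int i = 0; i < n; i++)
            out[i] = a[i] + b[i];
    }

    /* Bad: one caller's arbitrary "then triple the magnitude" step is
       baked into the abstraction, so every reader has to stop and ask why. */
    /* void vector_add_then_triple_magnitude(...); */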


> I am beginning to reject that there is a "one true answer."

There are basically no guidelines that always apply. Novices should stick to them, and people with more experience can recognize when they don't apply.


It's usually very safe and convenient to not inline pure functions. At least while debugging.


The sooner you learn that there are no hard and fast rules, the easier life will be for your teammates and for anyone you try to lead. Most of the conflicts I have with my lead boil down to him wanting to resolve a decision using a hard always/never dichotomy, and me wanting to use my best judgement.


Does any language have, as a feature, blocks that you must explicitly pass variables into? I tend to prefer larger functions but agree with your point. If I ever write (or extend) a language, I am going to include huge numbers of ways to make assertions or explicit limitations of various forms.

I like Jens Gustedt's proposal for anonymous functions in C (at the end of Modern C[1]), which extends the syntax for compound literals. That would in a way give parameterized blocks, except that what you pass would be at the end (although if you name the parameters the same as what you pass, it should work).

I think the main point is that it is helpful to be able to easily follow all the code that executes from point A to point B (at least up to the point of whatever portability layer you use). If you can reasonably structure it so that you just need to hit page down to read that code (without duplicated code) then that will often be easier to read IMO. I don't find one or two levels of function calls to be hard to follow, but after that it quickly gets more difficult. I suspect preferences here may to some extent be influenced by how good your short term memory is (and how good your IDE is). But I suspect there are also significant differences in how often different programmers try to step through code.

[1] http://icube-icps.unistra.fr/img_auth.php/d/db/ModernC.pdf


You could do something like that with C++ lambdas and explicit capture.

    int a, b, c;
    // ...
    [a, &b]() {   // a is captured by value, b by reference; c is not accessible
      // do stuff
    }();


Decomposition into functions and routines can make some sort of conceptual sense. This can be like the "chunking" that advanced chess players do, allowing them to remember entire board positions. Often, programs that are a pleasure to read have a structure which makes internal sense or which can be attached to an outside concept. Doing this optimizes for programmer reading time.


Good Lord yes. When I was working on point of sale machines for Toshiba, there was this unbelievable depth of inheritance and a ridiculously deep call stack at all times, and it made fixing any minor bug take weeks.

* abstract class Connection would have maybe 3 or 4 methods.

* DataConnection would extend Connection and add a couple more methods that were specific to some proprietary protocol.

* POSDataConnection would extend DataConnection and wrap this proprietary protocol for POS machines.

* ControllerDataConnection would extend POSDataConnection because a Point of Sale controller is technically a POS machine with a bit more functionality (really just a couple of flags turned on).

* There was plenty more in between; it's been so long now that I've forgotten it all.

It's just like we all learned in college! Object Oriented programming is supposed to model real-life! Except no, it's not. That's stupid and complicated.

Now the part of the OS that was C/C++ was absolutely beautiful. It took the UNIX style of programming / applications seriously; every little piece was its own program, and it worked flawlessly. Anyone could jump in and get to work immediately because it was so well written. You could follow any program top to bottom and it just... made sense!


> it made fixing any minor bug take weeks.

My god.

A lot of these fancy language features sound good on paper but can be so abused that even just coming to understand the code takes up too much space in your head.

https://twitter.com/davidcrawshaw/status/507268175256231936


Emphasis belongs on abused. Interfaces are swell. Subclasses of classes which implement an Interface are swell too, when they extend simply by altering the way said interface is achieved. When you subclass to "add more stuff", however, you get the kind of nonsense described.


The problem, in my opinion, is that the Java community (generalizing, I know) decided to bless that abuse as a best practice.


This is part of why I am so in love with Go. I have never read Go code I couldn't understand. It's very clear at heart. I have read through much of the standard library and it all makes sense.


Do you think that's because the language inherently makes things readable, or is it simply because the language is too new to have popular anti-patterns?


The simplicity of the language and some of its choices really force you to do things a certain way that comes out almost universally legible instead of going with what may feel easier or "clever" to write. This is why it's so widely criticized (which doesn't mean that some of its criticism isn't fair).

Also gofmt having the last word on what code is acceptable and what isn't helps kill a lot of arguments that aren't worth having.


In a way, it's like the Go team chose to write something that has some of the coder UX of Smalltalk, but in a C-like statically compiled language. There is a useful region of language design space around the "simple language" concept, and they seem to be making the most of that particular design-space maximum.


The team heavily promotes dead-simple code over PL abstractions. This is taken from some Go conference slides, where they said there would be nothing new in Go for a while, only perf improvements and bug removal.


It's an active choice of the language to disallow much of the syntactic sugar that gives code a much higher learning curve.

https://www.youtube.com/watch?v=rFejpH_tAHM is a good overview.


In many ways I'm terrified by Java programmers who have never heard of pattern abuse.


"It's just like we all learned in college! Object Oriented programming is supposed to model real-life!"

It is - in execution (message passing), not in taxonomies. You can't model the taxonomies because all set-in-stone languages are inadequate for that (see SICP, for example). Also, it sounds like someone had an inheritance mania on your project. That's an instant loss right there. (But maybe the language used didn't allow for anything better? C++ is notoriously bad in this respect, for example.)


This is definitely not what you learn in college. If you use inheritance instead of composition, it is your problem, not an OOP problem. You can make the same mess with procedures and structs. OOP is not here to magically fix programmers' bad design decisions.


> This is definitely not what you learn in college.

That is definitely the way I was taught OOP in college. I came out thinking of it as a steaming, overcomplicated pile of shit. I have since learned better. Of course, there is a whole world of enterprise programmers for whom what was described is "proper" OOP.


Did you attend an "encapsulation, polymorphism, inheritance are our holy buzzwords" (yuck!) kind of technical college or a "focus on messaging patterns" (yay!) kind of technical college? The former kind tends to do that to people.


I actually got this out of a College of Engineering at an actual university. It was mostly the former.


When I learned OOP it was all about inheritance. Only later did I realize that aggregation is much easier to handle, and that functions that take well-defined input and return a value without side effects are easier to maintain.


It's never been all about inheritance, they just forgot to tell that to Stroustrup and Gosling. ;)


Maybe I didn't go to a good enough college, but this is what I was taught. Either way though, this is not how I do OOP.

It is what I had to deal with, though.


After working on some "enterprise" systems myself, I came to the cynical conclusion that Java/C++/C# were never meant to actually solve problems, but to introduce unnecessary complexity and brittleness which eventually, inevitably, leads to the hiring of "consultants" for support or new features at exorbitant prices.

These languages, as well as things like WSDL/SOAP, are the late 90's/early 00's snake oil designed to woo clueless manager-types into funneling money to consultants and 'architects'.

I worked a lot with those types and saw the futility of suggesting 'simple' solutions to simple problems - everything had to be over-complicated to justify their salaries.

It left me pretty bitter towards that whole business model. Then again, there's lots of money to be made there....


Ugh, yes. The inheritance tree is just a line, of 4 or more segments.

  interface -> abstract -> base -> TheOnlyImplementationInstantiated
Repeat for a couple more chains, and one-line bug fixes require visiting a dozen files to figure out.


While I agree deeply nested calls are bad, I've also had the pleasure of inheriting an application with several multi-thousand-line functions, and those are not fun either.

In code as in life, all in moderation.


Couldn't agree with you more. I get the impression that a lot of people in this thread have never worked on behemoth, million-line enterprise code bases. Cognitively speaking, 4 nested thousand-line function calls is equally as bad as 15 nested 10-line function calls.

> In code as in life, all in moderation.

Beautifully said. Both methods (procedural vs. object oriented) give you different ways to shoot yourself in the foot. Instead of picking one or the other, it's more important to manage the scope at which your project grows. You have to avoid falling into the traps of each style.


Wait, you are comparing 4000 lines of code to 150 lines of code? How does that make sense?

If you modify your example to 400 nested 10-line function calls, how does that change your comparison?


The worst example of this I ever saw was an old Mac OS helpdesk application (circa 1995) that had a 14,000-line main event loop; the entire application was a single 29,000-line C file. Just typing at the bottom of the file was agonizing. We tore the whole thing apart and put it back together as a real app with include files and everything.


Sometimes that is just inherent in the problem. If a few thousand lines are actually needed to solve the problem, it will be somewhat hard to deal with whether they are all in one big function or broken up into many functions. One option may be marginally less annoying than the other, depending on what you are doing, or on preference perhaps.

But sometimes it is just a "Such is life" situation.


I think it depends on whether the problem lends itself to abstraction or not. If I have a problem that requires 1000 lines to solve several large systems of linear equations, I'm absolutely going to be separating that out. On the other hand, if I have 1000 lines of "if in country A, apply this tax rate", then I'm not going to separate it, because there is no underlying abstraction to use.


That's actually one of the more reasonable uses for OO: have an interface that defines 'apply_tax_rate' and a bunch of implementations for each country, so you only have a single line in your function: (instance).apply_tax_rate. Of course, that leads to your actual implementing code being scattered across different classes, and that can be a pain, too.

In a functional language, I'd prefer each being its own function, with the case matching in a single, standalone file that returns the appropriate function based on whatever. That way you still get just apply_tax_rate(type, amount) in your main calculation function.
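
A minimal sketch of that shape in C++ (names and rates hypothetical):

    #include <functional>
    #include <stdexcept>
    #include <string>

    using TaxFn = std::function<double(double)>;

    // All the per-country case matching lives in this one standalone place.
    TaxFn tax_fn_for(const std::string &country) {
        if (country == "A") return [](double amount) { return amount * 0.20; };
        if (country == "B") return [](double amount) { return amount * 0.05; };
        throw std::invalid_argument("unknown country: " + country);
    }

    // The main calculation then only ever contains:
    //     double tax = tax_fn_for(country)(amount);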


You are saying something I highly doubt you want to be saying.

My "problem" is an online store. It is about 75,000 lines of code. It needs all 75,000 to solve the problem. What you are saying is that it would be "somewhat" hard to deal with whether it is all in one big function or many functions.

But in reality, one big function would be incomprehensible. It is vastly easier to deal with broken out into many functions.


A problem that genuinely takes a few thousand lines to solve doesn't sound "hard to deal with". It sounds like just the right size for a module or tool that solves a problem big enough to be non-trivial, but small enough to be solved well.

However you end up solving it, it is way too big for a single function. On the other hand, a moderate bunch of functions can do it nicely -- and at that scale you don't need big complicated object-oriented abstractions other than the external interface itself.


I was thinking about the issue of nested code the other day -- isn't it (mostly) an editing environment problem? I mean, if my IDE can detect that a method is pure, couldn't it do a little magic (e.g. provide a different coloured background, some dotted boxes, etc.) and show me the inlined code right there? It doesn't seem wise to architect your codebase around the deficiencies in your tools, since one is (hopefully) going to long outlive the other.

Related:

[1]: https://dubroy.com/blog/method-length-are-short-methods-actu...


Oh, I like this idea. Imagine your IDE being able to substitute calls to single-line methods with their bodies, for viewing only. Or even being able to refactor them inline, but keep them in private methods?


With Visual Studio, you can visually inline the definition of a function below its call by selecting "Peek Definition", which is close to what you're saying.

You can only do it one at a time, and it's for quick scanning of what the function does, though.


I think JetBrains IDEs also have this. It's been a while, though.


Hmm, wonder if I can coax Sublime into doing that. It's a little closer to IDE-type functionality, which it's pretty weak on (refactoring is terrible for Ruby with it).


Particularly if the editor can do things like constant folding and branch elimination based on constants that are being passed at that point.


That's just the editor doing the compiler's work at edit time, though. I know that Chuck Moore has always advocated extreme early binding, but that really is usually associated with Forth, not with mainstream languages.


I didn't mean that the editor would change what was presented to the compiler, just that it would simplify for the user when expanding the use of helper functions.


> By making your methods shorter, you’re just trading one kind of complexity for another.

Oh, I want to frame this and put it on the wall.

I can't stand code where even the simplest thing is implemented as a giant tree of sub-5-line functions nested 15 deep (and probably, for bonus points, scattered across half a dozen files).


Visual Studio has had that for a while now.

http://msdn.microsoft.com/en-us/library/dn160178.aspx


VSCode has this as well.


I don't know, I'd prefer:

    config_filename = get_config_filename()
    config = read_config(config_filename)
    endpoint_url = config["endpoint_url"]
    auth_data = get_auth_data(endpoint_url)
to just dumping all of that inline.


Except that is a false dichotomy. This isn't about inlining 'pure' functions (I know your example is IO, but they aren't mutating your program's state), it is about avoiding functions that are only used once and/or that mutate state.

I've used the same technique - instead of creating a function that might be awkward due to all the inputs and outputs, I'll create an inner scope {} with nothing around it. Only the variables that it modifies/initializes are left on the outside. This is a simple way to enforce more of a data flow style within a function and ends up being a useful organizational tool. It is also similar to 'let' in some languages.


This. In the article, Carmack does suggest just this, although it's not as explicitly called out.

If your methods are functional, then

    int fancyNumber = calculateFancyNumber();
can be inlined cleanly/equivalently as:

    int fancyNumber;
    {
        // [Mathy stuff redacted]
        fancyNumber = someIntermediateVariable / anotherIntermediateVariable;
    }


And with C++11 you can use a lambda capture to make it even cleaner (I'm not sure if this is the right syntax):

    [&fancyNumber] {
        // [Mathy stuff redacted]
        fancyNumber = someIntermediateVariable / anotherIntermediateVariable;
    }();


Best of both worlds in JavaScript. Use an "IIFE" (immediately invoked function expression) to make an anonymous function and call it on the spot. The editor lets you elide the function if you don't want to see the details, or leave it "open" if you want to read it.

If it turns out said little dance needs to be done twice, instead, assign the function expression to a name (in an appropriate scope) and call it twice.

Nested funcs/procs in Pascal were useful for this kind of local partitioning as well (before C, C++, and Java came and peed in the development mindshare pool), even if you couldn't make closures out of them.


That's all good, unless you need to do it twice. Then you're repeating yourself, and you're in a whole other world of pain next time you want to change some of the code.


2x copy-paste is a "whole other world of pain"? I've worked in shops where the rule was you didn't make a method or otherwise refactor until there were 3 repeats of the same code. If you're in an environment where you can make the prophylactic search for that code pattern, this works well. (I could easily make searches with full syntactic expressiveness.)


> 2x copy-paste is a "whole other world of pain"?

Yep, namely when someone fixes a bug in one of the paths but forgets to update the second.


Sorry, but this isn't a "whole other world of pain." Increase the multiple to dozens, hundreds, or thousands. That is a whole other world of pain. (I've worked in a project that started with literally 500 re-implementations of doubly linked lists.)


Is get_config_filename() a large function that gets the filename directly or is the end result nested another hundred lines of callstack down? If it is a large function that directly returns results then that is my pick, high level "story" with shallow, direct access to the details. No dicking about for half an hour trying to figure out where A, B, or C happen, yet hidden details for when you are focusing on a different part of the code.


> Nothing is easier to read and understand than code which just flows straight through from top to bottom of a big function

It really depends on how long the function is. Ignoring stuff like variable declarations and verifying constraints at the beginning of a function, I personally find anything longer than a screen or two is usually better off (in terms of readability) being in its own function.


I wonder if we've over-focused on the inlining vs functions-for-modularity thing here. Fundamentally, the primary objective is to express code in a way that is clear to the reader for maintenance. Secondarily, the code must work with the compiler to achieve the desired level of efficiency in speed or resource usage.

I have come to interpret this as: We are language designers. This isn't about functions. This is about building a language for the business case, one composed of primitive expressions, means of combination, and means of abstraction. If the language is clear to the reader, expressing the problem well, it should be easier to detect problems or extend the existing language and its uses.

We are language designers already: If you build a traditional class with a bunch of methods, that is a language with how to deal with the concept embodied by the class. It must be held to the same standards of any language, DSL, or API design.

I like to remind people that we don't tend to dig into the code behind printf(); we trust what it does. We have years of experience using that primitive. It's a great example of a function that has been through many revisions due to security issues, untrustworthy in its inception. What is key here is trust that a primitive does as advertised so that one does not have to dig into its source code repeatedly. Nested functions are not an issue in the presence of trust.

My "secondarily" clause has a fatal flaw: Many languages are not suited for building languages while simultaneously not trading off speed and resource usage. The ones with pre-runtime macros/templates assist us developers in the building of expressions beyond the limitations of the base language with minimal fuss.


"This isn't about functions. This is about building a language for the business case that is comprised of primitive expressions, means of combination, and means of abstraction."

Heh. Anyone up for a Sussman&Abelson drinking game? :D


My experience, limited though it is, is that there is a trade-off between how easy it is to write code vs how easy it is to read.

Abstracting with functions is very beneficial to the writer because, well, they come up with the abstractions and know what they mean.

I feel they are less helpful to the reader, except perhaps at a very cursory level. If the reader is trying to actually understand the code to be able to modify it, many abstractions are actually a hindrance.

For example, in lisp, it's a fairly common practice to essentially write a DSL for the problem you are solving. It's a great tool to be able to do this easily and quickly. You can build massive and complex programs without overloading your brain.

However, for a reader new to your code base, there is a huge cognitive load to try to decipher the DSL. It's intuitive for the writer, because they invented it, but for the reader, it is a hurdle to overcome.

Once you learn someone's DSL, it's a very powerful tool for you too. But when every project has its own, it's really too much to bear.


If you write and document your DSL well, the cognitive load is minimal, unless you need to plunge into the implementation. And it's not hard to write a DSL: I wrote what looks a heck of a lot like a DSL for a very simplistic piece of IF (interactive fiction), with maybe 5 exposed functions and no macros.


Big functions are hard to unit-test.

The article includes comments from Carmack from 2014 saying he now favours breaking the code up into pure functions.


I had the "pleasure" of working on a system that made recursive calls that made other recursive calls that made other recursive calls, which made recursive calls to the database. I'd helped design it originally but this was not at all what I intended. I had a medical emergency though that knocked me out of work for quite a while and when I got back this unreasonable thing was what they had built.


From my experience, a lot of nested functions usually means spaghetti code created from huge functions which were split at random points into small functions (which mix abstraction levels and still do multiple things at once). If you properly design your data model, there will be no huge functions and no deeply nested calls.


You can definitely break up a function's contents into logical nested functions... Why are you saying "random points"?

Nested functions can actually allow you to even out your abstraction levels... Just group lower-level operations in more deeply nested functions. It's sort of the functional equivalent of extracting a group of similarly-leveled concepts into a new object in the OO approach.

Not sure if you were talking about C++, where my point probably doesn't apply... (Lack of nested function support, AFAIK)


I think the biggest problem with the "big function" is that it tends to lead to code duplication. If taskY is in bigFunction, but taskY is needed in otherPlaces then people tend to just copy paste taskY in otherPlaces.

If another developer joins the team, then he might not have noticed that there is a taskY and may end up writing his own, completely different, implementation in yet another place.

For C, I tend to go the big function route but I usually try to put extra braces with comments to isolate subtasks from each other within the big function. Then I can pull out the tasks into functions if I see that I need them in more places. I don't know how my old code is doing so I don't know if anyone has followed my lead in those projects.
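
Something like this, as a sketch (contents made up):

    #include <stdio.h>

    void big_function(void) {
        int total = 0;  /* only values that cross subtask boundaries
                           live at function scope */

        /* subtask: tally the items */
        {
            int items[4] = {1, 2, 3, 4};  /* locals stay local */
            for (int i = 0; i < 4; i++)
                total += items[i];
        }

        /* subtask: report the result */
        {
            printf("total: %d\n", total);
        }
    }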


I don't know if you read the whole article but the exact same thing is suggested towards the end.


We do have Cyclomatic Complexity (https://en.wikipedia.org/wiki/Cyclomatic_complexity). I see this often enough to understand how important it is.


Isn't cyclomatic complexity just about the number of paths through code? It seems like (statically resolved) nested calls, as long as they don't loop or branch themselves, shouldn't increase cyclomatic complexity.
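
For instance (hypothetical code):

    /* Cyclomatic complexity = decision points + 1. */
    static int helper(int x) { return x * 2 + 1; }  /* straight line: CC = 1 */

    int f(int x) {
        if (x > 0)           /* the only branch, so CC(f) = 2 */
            x = helper(x);   /* the non-branching call adds no path */
        return x;
    }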


There may be subtle branches, like error handling, that can impact it for otherwise non-branching code.


Maybe an editor can allow you to expand function calls so they appear in-line when you want to read them as such. Often I do have the nested call hell, but the alternative is an absurd amount of writing/testing things twice.


I'd have thought at least a few people measure bug ratios of different coding styles?

It's a big motivator in terms of deciding on best practices. And whenever I fix a bug, I try to identify the root cause. If a coding pattern was to blame (as opposed to bad process or design), I look for ways to eliminate that pattern.


My most recent real-life example: Jackson + Afterburner

I hand-wrote an LA(1)-style parser (and lexer) for JSON, because I wanted to experiment with a source-code-generation notion I had (vs. reflection or bytecode generation). I was surprised that in micro benchmarks, Jackson + Afterburner is still twice as fast as my stuff. I had to understand why.

Turns out it does the lex/parse equivalent of unrolling and inlining (cut and paste) absolutely everything. The code is utterly baffling.

My implementation's jar is ~30Kb, fast enough (faster than all but Jackson + Afterburner, Boon), stupid simple, and works the way I want, so I'll keep using it for my projects.


There is a very interesting discussion on how to go about doing this in this video https://youtu.be/QM1iUe6IofM?t=37m41s


C++11 has given us a new and wonderful world of this small function debugging horror, since we can now gin up lambda functors that have no name, that have a type with no known name, and don't play well with the debugger (good luck figuring out why your program is spending all of its time in "operator()").


Even relatively old gdb and lldb show location information correctly for lambdas.

Templates are usually more of a pain to debug.


How do you mean they show them correctly? Perhaps you are right but I often see them attributed to std::function::operator() in the header <functional>, because someone has assigned the lambda to a class member field or function argument of type std::function. When the function has more than one caller it can be a nuisance trying to figure out which caller's lambda is the one being called. The only way I know of to identify them is to examine the full template expansion which is often a kilobyte of metaprogram gibberish.


Can you not see the call stack for your lambda, as for any function?


YES.

Nothing is more annoying than a pointlessly huge call tree where the function is called from a single place.


Procedural code is easier to reason about. OOP is easier to change. Unless it is Evil OOP, with deeeeeep hierarchies and done without understanding why and what for.


Interestingly, Casey Muratori accidentally demonstrates during one of his Handmade Hero sessions that the compiler won't always be able to optimize certain bits of code that are put in a function as opposed to being inline.

In the video, he inlines a very simple function and his game gets twice as fast for no apparent reason. It's instructive to watch him dive into the generated assembly to figure out why.

https://www.youtube.com/watch?v=B2BFbs0DJzw


The compiler probably turned

  for(int I = 0; I < 4; ++I) {
        real32 PixelPx = (real32)(XI + I);
        real32 PixelPy = (real32)Y;
        real32 dx = PixelPx - Origin.x;
        real32 dy = PixelPy - Origin.y;
  
        real32 U = dx*nXAxis.x + dy*nXAxis.y;
        real32 V = dx*nYAxis.x + dy*nYAxis.y;
  
        //rest of the loop
  }
into something like

  real32 PixelPy = (real32)Y;
  real32 dy = PixelPy - Origin.y;
  
  real32 PixelPx = (real32)(XI);
  real32 dx = PixelPx - Origin.x;
  
  real32 U = dx*nXAxis.x + dy*nXAxis.y;
  real32 V = dx*nYAxis.x + dy*nYAxis.y;
  
  for(int I = 0; I < 4; ++I) {
      //rest of the loop, using U and V
      
      U += nXAxis.x;
      V += nYAxis.x;
  }
PixelPy and dy are not affected by the counter in the loop, which means they can safely be moved outside the loop.

This also results in the subexpressions dy*nXAxis.y and dy*nYAxis.y being lifted outside the loop.

Now we've moved half of the code outside the loop but we aren't done yet.

The same can be done with PixelPx and dx; the trick is to then replace dx*nXAxis.x with

  (dx + I)*nXAxis.x
Expanding

  (dx + I)*nXAxis.x
yields

  dx*nXAxis.x + I*nXAxis.x
We can now lift the subexpression

  dx*nXAxis.x
out of the loop.

The only thing that is now done in the loop is

  I*nXAxis.x
which can be further simplified to

  U += nXAxis.x
The same happens with nYAxis.x.

EDIT: Sorry for the bad formatting. The markdown parser ate my asterisks so I put things into code blocks which requires a new line each time.


Effectively this means the speedup comes from optimizations that assume the code in question is only ever run in that context. When the code is inline, this is an easy call to make. For a function, it's trickier. I would hope some compilers generate a general function to handle arbitrary contexts but try inlining at individual call sites to see if significant gains such as this can be made.

It's another hurdle for the sufficiently smart compiler, though. You need to know how the program will be run to know which is the better form. Once you get into code-size vs. speed trade-offs, things get murky with instruction caches etc.


Having const parameters in the function V2i might have helped the compiler, I think.


TIL, another reason to perform 'defactor inline-method' (with directed feedback of course).


I'm not sure it should be considered as a reason for anything. The most important point of the article, imo, is:

  If a function is only called from a single place, consider inlining it.
You should consider inlining your function, not always do it. Recently, I made a mod for a game and had to draw a UI in code, and there it made sense to use one-time functions because they made the code easier to read (super-expressive functions like DrawLeftPane() or DrawHeader(), and next to no ties between functions).

Most of the time, code readability should be prioritized over performance.


Totally agree, but in perf-critical inner loops, it might be interesting to speculatively inline different functions and measure perf. Overly factored code inside an inner loop was shown in the video to cause compiler confusion.


People write perf-intensive inner loops so infrequently that this really ought to be discounted as an argument.

I work on two performance-sensitive projects, both C++, and this has yet to be a reason to inline code. Algorithm choice is the optimization of first resort and, so far, of last resort too.


Another example to set aside for the next "So you think you are smarter than the compiler" person.


Is that a helpful response to that person? Isn't that person (despite sounding like an ass) more or less right at any given time?

Excepting environments where performance is critical (games comes to mind), shouldn't we bias toward improving the code for human readability?


Technically, you could have some stronger keyword than inline in future C and C++ standards, akin to constexpr, for unconditionally forced inlining.

For now, there are macros if an inline function does not work properly. Attributes to force inlining exist in some compilers; at least GCC and clang support them.

Additionally, marking a function as pure, where applicable, can help optimisers as well.
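
For instance, with the GCC/clang spellings (a sketch; the exact semantics are in each compiler's docs):

    /* Inline regardless of the optimizer's own cost heuristic: */
    static inline int add(int a, int b) __attribute__((always_inline));
    static inline int add(int a, int b) { return a + b; }

    /* "pure": the result depends only on arguments and readable memory.
       "const" is stronger still: no memory reads at all. */
    int hash(const char *s) __attribute__((pure));
    int square(int x) __attribute__((const));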


You may find the Nim language to be interesting. The entire language is processed as an AST, so you can do a lot of magic stuff like write normal functions, functions with forced inlining, and AST-transforming macros all in the same syntax and all processed in one pass at compile time.

http://nim-lang.org


__forceinline in MSVC.



That usually works, with caveats as mentioned on its documentation page.

(Security attributes, and recursive calls that may remain recursive calls instead of stack-utilising loops.)

GCC and clang variants do not have this issue.

MSVC is generally not really known for high performance of generated code, which is partly why newest versions support a clang backend.


MSVC's latest versions support a Clang frontend with Microsoft's code generation as the backend. Are you referring to something else?


constexpr functions exist, and they are also inline :)


Imagine how fantastic it must be working with someone like Carmack. Sure, the first few code reviews would be fairly traumatic - as you quickly realise just how much faster and generally better he is than you - but I think after a little while you could just relax and try to absorb as much as possible.

I love how everything in these emails is delivered as a calm series of reflections, chronicling with great honesty his own changing opinions over time - nothing is a diktat.

I also found it rather heartening that he makes the same copy/paste mistakes that the rest of us do - how many times have you duplicated a line and put "x" or "width" on both lines...? Seemingly Carmack can actually tell you how many times he's done that!


A good read about Michael Abrash's experience of this:

http://blogs.valvesoftware.com/abrash/valve-how-i-got-here-w...

(And interesting history about Valve)


> the first few code reviews would be fairly traumatic

Hopefully because he is saying "did you really think vain attempts at premature optimization were going to impress me?".



Honestly, this is why working for someone like Google is so great. All the smart people you can learn from.


Carmack is smart, but that isn't what makes him interesting. He constantly re-evaluates his beliefs in the face of new facts, the vast majority of people do not do this, regardless of their intelligence.


My theory is that this is what separates the good from the great. You can't improve unless you're continually considering what you might be doing wrong, or what you could have done better.


Brian Hook wrote about Carmack's productivity here: http://bookofhook.blogspot.ca/2013/03/smart-guy-productivity...

It's an interesting read. Previous HN discussion: https://news.ycombinator.com/item?id=5383650


Another perspective in defense of long functions is that they enable you to spot common expressions/statements within the body, for example:

    void long_func(void) {
        ...
        if (player.alive && player.health == 100) {
            ....
        }
        ...
        if (some_other_condition && player.alive && player.health == 100) {
        }
    }
Conventional wisdom says that you should write a function `is_player_untouched` and substitute the composite expressions with function calls, but the code in question can be refactored in a much more straightforward way:

    void long_func(void) {
        ...
        const bool player_untouched = is_player_untouched();

        if (player_untouched) {
            ....
        }
        ...
        if (some_other_condition && player_untouched) {
        }
    }
Had the function body been split into more functions for "clarity", you would be doing duplicate calls to `is_player_untouched()` which go unnoticed because they would be buried deep in the call graph.


This leads to brittle code. It's easy to determine whether the is_player_untouched state changes between those two conditions when writing the code, but when someone else edits that code it's also easy to introduce something that will break it; the monolithic/big function makes it hard to keep track of assumptions like this, and it leads to a lot of code churn in those big functions as well. Even worse, these kinds of bugs usually end up being hard to catch if you're not covered by unit tests, and games rarely are, and big functions go against testing practices.


It sure leads to brittle code if assumptions and contracts in the code are not crystal clear, but Carmack's point in this particular article was about writing "consistently performing" code which doesn't degrade under specific conditions.

In my refactored example, you avoid eventually calling `is_player_untouched()` a second time when `some_other_condition` is true.


I have nothing to add to this, other than... "OK this guy has clearly worked on the Quake source code..." :)


Hopefully, that should be something the compiler could do on its own; assuming the language has appropriate semantics, it could identify the common expression, prove that its value can't have changed in between, and then do the substitution on its own.

Is there any language where this is implemented, or is the effort too great for the gains?


This would maybe be possible in C if the "functional" keyword were added (as suggested in the letter), so the compiler could be sure that the function doesn't use anything apart from its (const) arguments to generate a return value.

Then it also would need to make sure that those parameters can't change between different calls within the same function.

This wouldn't be feasible in the example above, since that function explicitly depends on outside variables. If you were to supply both `health` and `alive` as arguments, you could pretty much write the check yourself from the beginning.

Of course, there are many bigger checks that could benefit from this, but still, you (or the compiler) have to make sure that all functions are pure, that no arguments can change etc etc. I imagine that this could make compile times quite slow (and also complicated to write the compiler itself).
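
GCC and clang do expose a weaker form of this today through attributes, which is roughly the "functional" keyword idea (a sketch):

    /* Promise: the return value depends only on the arguments and there
       are no side effects, so repeated identical calls may be merged. */
    __attribute__((const)) int is_untouched(int alive, int health);

    void long_func(int alive, int health, int other) {
        if (is_untouched(alive, health)) { /* ... */ }
        /* ... */
        if (other && is_untouched(alive, health)) { /* ... */ }
        /* with the attribute, the second call can fold into the first */
    }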


If you occasionally inline all the functions and unroll all the loops, you can occasionally find optimizations that even the compiler won't be able to make.

For example, in quaternion-based rotation math, there exists a "sandwich product" where you take the (non-commutative) product of the transform and the input, followed by the product of that result and the conjugate of the transform.

It turns out that several of the embedded multiplication terms cancel out in that double operation, and if you avoid calculating the canceled terms in the first place, you can do a "sandwich product" in about 60% of the total floating-point operations of two consecutive product operations.

In the application that used spatial transforms and rotations, the optimized quaternion functions were faster than the 4x4 matrix implementation, whereas the non-optimized quaternion functions were slightly slower. That change alone (adding an optimized sandwich product function) cut maybe 30 minutes off of our longest bulk data processing times.

You would never be able to figure that out from this.

  out = ( rotation * in ) * ( ~rotation );
You have to inline all the operations to find the terms that cancel (or collapse into a scalar multiplication).
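
For reference, the collapsed form usually lands on something like this (a sketch, assuming a unit-length quaternion; not necessarily the exact derivation described above):

    typedef struct { float x, y, z; } Vec3;
    typedef struct { float w, x, y, z; } Quat;  /* assumed unit length */

    static Vec3 cross(Vec3 a, Vec3 b) {
        Vec3 r = { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
        return r;
    }

    /* v' = v + w*t + (u x t), where u = (x,y,z) and t = 2*(u x v):
       roughly 18 multiplies versus ~28 for two full quaternion products. */
    Vec3 rotate(Quat q, Vec3 v) {
        Vec3 u = { q.x, q.y, q.z };
        Vec3 t = cross(u, v);
        t.x *= 2; t.y *= 2; t.z *= 2;
        Vec3 ut = cross(u, t);
        Vec3 out = { v.x + q.w*t.x + ut.x,
                     v.y + q.w*t.y + ut.y,
                     v.z + q.w*t.z + ut.z };
        return out;
    }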


I think there's a big point that's being missed here. Carmack is conflating inlining code with writing functional code. These are different things.

I'd agree that if the majority of your code is mutating state, it makes sense to mash all that together in one place. You want to keep an eye on the dirty stuff.

But on the other hand, inlining pure functions that don't use or mutate any global state doesn't make sense to me. Why is making it "not possible to call the function from other places" a benefit?

How about calling that code from a unit test!


>> Why is making it "not possible to call the function from other places" a benefit?

When it's a pure function that's not a problem. When it changes state then you lose track of ordering and such. That's his point, state changes need to be kept in the one big function so you can keep track of them easily.

And of course, almost all interesting software has mutable state. Otherwise you're just doing a computation and looking for a single output.


>> When it's a pure function that's not a problem.

That was my point. He's conflating two different things. I understand why inlining mutation has benefits. Just not inlining functional code.

>> And of course, almost all interesting software has mutable state.

Of course, but I think most programmers overestimate how prevalent state needs to be throughout a program.

I once wrote an RSS aggregator as an eight-stage pipeline. It checked about 40k feeds, each every 60 seconds. Every stage had a 'main' file where the vast majority of state was kept. The rest was functional libraries. I suppose that would be a demonstration of what Carmack is proposing, with the difference being that my pure functions (the majority of the code) had clear names and were unit-tested.

It worked so well that almost every large program I've written since has been designed the same way!


Carmack addressed this, at least as I read it. One of his concluding guidelines was that purely functional methods are a virtue, and code should be made purely functional wherever possible (he phrases it in terms of "const"). He did also say, however, that in his domain non-trivial pure-functional methods rarely apply.


The thing that struck me was Carmack's relentless pursuit of perfection. I can't think of many people who'd describe a single frame of input latency as a cold sweat moment!


When you are making videogames, a (video) frame of latency is a big deal.

When I worked on Guitar Hero and Rock Band, we worried about sub-frame latency (timing is more important when you're hitting a drum than when you're firing a gun).


How did you handle CRT vs non-CRT screens?

I still use CRT screens, not because of latency, but because of better contrast and colour reproduction, and the capability to use whatever resolution I want.

I noticed that in new games, and using newer video cards, there is some kind of weird lag, as if they were geared on purpose for slow LCDs (there are seemingly even some variables that you can control on AMD cards, using the Windows Registry or by tweaking the Linux driver, related to screen input lag; they are in the "PowerPlay" part of the drivers for some reason, though, and I couldn't yet figure out what they do exactly).

EDIT: Also, I stopped playing music games almost entirely; I found many of them completely unplayable on my setup, and I just can't find the correct settings to make the timing work. The least aggravating one is "Necrodancer", which seemingly is really good at calibration.


There are so many AV setups that we had to leave calibration to the user (we tried auto-calibration but it could not always be perfect).

The fundamental problem is that there are two independent delays that both depend on your individual system: the delay from the time that the console produces a video frame to the time that the user sees it, and the delay from the time that the console produces a sound to the time that the user hears it. In a beatmatching game, you really need the user's perceptions to be in sync, which means delaying either the video or the audio. Of course, the more you delay one or the other, the more repercussions you run into.

In a regular video game, it's not a big deal if you fire a gun and hear the shot 50ms later, but in a beatmatching game, that delay is really noticeable.


In a competitive twitch shooter, 10 ms lag is an important handicap. And that is less than one frame at 60 FPS.


Many modern games buffer frames to render, so the rendering is up to 3 frames behind the simulation (in the UE4 case, anyway). In a multiplayer game, you've got these 3 frames of rendering, plus network latency both ways, plus a frame of simulation time on the server, and then (depending on the engine) a possible extra frame of input latency when taking the input from the controller to processing it.


Ah, the times of getting a perfect setup for music games: the combination of a simple TV with low lag (introduced by picture scalers and other processing) and the simplest stereo connected with RCA cables.


Both Rock Band and Guitar Hero allow you to calibrate your display latency to compensate for that - since you're playing songs with static timing that's possible.


I'm not sure what you mean by "music games" but I've been playing StepMania for a while now and it seems to handle latency just fine.


LOL :)


I particularly liked the analogy with hardware where you do an operation unconditionally and eventually inhibit the result. Thinking about this, I would compare software written in that fashion with a well-oiled, smooth engine.


John Carmack's version of hell must be extremely laggy for him, but no one else notices.


From what I've read, dropping frames is enough to fail console certification. Console gaming may be one of the few (non-mission critical) software areas with actual high quality standards. Meanwhile, after an iOS update, I have to swipe up to 5 times to pick up a call on my iPhone 5s...


I wonder how strict they are about it? Particularly near the end of the 360/PS3 generation, Digital Foundry comparisons used to be full of inconsistent frame rates etc.


The answer is: it depends. They will silently bend the "TRC" a little if there is a business case for releasing something now and not later.

But most of the requirements center around nitpicks of software polish: Specific words and phrases used to discuss the device, loading screens must not just be a black screen, the game should not crash if the user mashes the optical eject button, etc. These things add a level of consistency but aren't the same as "solid 60hz" or "no input lag". The latter sort of issues can be shipped most of the time, they just impact the experience everywhere.


No. Not only is it laggy, but Richard M. Stallman continually holds speeches about how GNU/Linux is the future.


I'm sure Carmack would have favorable views of such a future. I want to say that he made most (all?) of the id tech engines on Linux, but I can't find a source.

RMS's hell is entirely within closed source software and everyone there calls it "Linux".


No, Carmack isn't a fan of Linux as a gaming system. Look it up. Which is to say, on the consumer end. idtech was coded on NeXTSTEP workstations until at most Quake 3, so he liked Unix for development. IIRC, the Linux ports of idtech were done by Dave Taylor, not Carmack. Carmack basically just gave a shrug and signed off on them, because they worked, so why not?

And RMS's hell is one where everyone uses open source to some extent, calls it open source, has no philosophical reason for using it, only practical ones, and has no qualms about mixing it with closed source software.


Long functions might read easier, but you lose some testing precision. I think the recent focus on testing has led to shorter functions with as little responsibility as possible. When short functions fail a test, you have a smaller surface area to search for the cause.


Doesn’t this imply the need for a new language feature? So that well-defined sections of inline code can be pulled out, initial conditions set in a testing environment, and then executed independently.

I guess this could trip up if the compiler optimisations available when considering all the code at once mean that the out-of-context code actually does something different in testing...


Pull out well-defined sections of inline code that can be executed independently for testing? Sure, that's breaking it down into functions.


I think there is such a feature. In most languages, these are called "functions".


Game developers don't test, they debug. That's why he draws wrong conclusions like "inline everything manually".


This is a pet peeve of mine. If you made a block of code into a separate function, I'm assuming it's called from multiple places. Or maybe it used to be. Or will be soon. But still.


It's hard to unit test large functions.


Some good previous discussion here: https://news.ycombinator.com/item?id=8374345


I'm surprised at the enthusiasm for really long functions here. I find my experience is just the opposite—I find it much more difficult to read code written that way than when the different sections are broken up into smaller functions.

It is of course essential that the smaller functions be well-named and manage side-effects carefully. That is, they should either be pure functions, or the side effects should be "what the function does", so that readers of the main function don't generally need to read the function's code to understand its side effects.


Yes, a long monolithic block is hard to read. But he's suggesting using comments and braces to visually separate it into blocks, so the end result is a happy halfway point between monolithic and broken-down functions.

The intro suggests that he agrees that pure functions are an even better solution.


Well, yes, but in games, you have a lot of global state. You wind up manipulating so much state that you may indeed want this kind of large function so that you can see everything.


A side benefit to having separate, clearly named functions: they often play better with tools.

You can jump to them by name from a completely different part of the code (with some editor support). You see them in the headers of your Git hunks so you don't lose context.

In languages that are heavy on type inference, like Haskell or Python+MyPy, they make for a convenient boundary at which you can assert your types, to help with type errors.


I wonder how much of this change is because he can no longer keep track of so much state in his head (or just doesn't want to).

I only say this because I've gone through a similar transition of valuing my mental computation time in the last 20 years of coding :).

The efficiency of inlining is compelling when you code the whole thing at once, in one session. Once you decide to break the work up over multiple sessions, it's too much to keep in your head over multiple days (or weeks).


Although I am a fan of the LISP school of program design (minimal global state; build small functions and macros, make sure they work, and then build more functions and macros on top of that, until you have an abstraction that you can build your app on), Carmack raises some interesting points: if you're handling a lot of global, mutable state, you may want to abstract minimally, so that you can see where that state is mutating, which makes bugs easier to spot.

Not a bad idea.


Maybe the key idea is to put all the global state mutations as closely together as possible, so they can be compared and contrasted as easily as possible.


Well, yeah, that's a big part of it.


About style A vs B vs C:

Robert C. Martin encourages style B because it reads top-down and replaces comments with names.


Don't we write code for other programmers first, then for the system?

It seems counter-intuitive but in the long run this mentality best serves the business.


Depends if you're working on some LOB application with average programmers or if you're writing a game that pushes the limits of modern technology.


The argument in the article is actually a "for programmers" argument. Mainly that sometimes it's easier to reason about inline code than deeply chained functions, because you don't necessarily know all the implications of calling that function. Or you don't know the potential starting states when writing the function.


>I now strongly encourage explicit loops for everything, and hope the compiler unrolls it properly.

I get why this is a thing. Sometimes an unrolled loop is faster. But if this is really an issue, why isn't there a [UnRoll] modifier or a preprocessor or something that handles that for you?

Something like this:

  for (int i = 0; i < x; i++;) {
    dothing(x[i]);
  }
versus:

  unroll for (int i = 0; i < x; i++;) {
    dothing(x[i]);
  }
Only the compiler / preprocessor would unroll the second one. You have the best of both worlds with a reduced chance of subtle errors.


This is essentially the thinking behind the 'register' keyword. The idea was to make it possible to mark which variables were supposed to go in registers and which could be stored in memory. That may have made sense back in the 70's, but these days, the compiler's heuristic is usually better. This also applies to the 'inline' keyword. Maybe you're right... but maybe you're wrong and inlining the function blows the cache, etc.

I think the same logic applies to a putative 'unroll' keyword. Even if it's a short-term win, the environmental properties that make it a win are likely to change before the code is retired. To me, that argues for relying on the heuristic.

One note to this is that MSVC has both the usual 'inline' keyword as well as a proprietary stronger '__forceinline' keyword. __forceinline overrides the heuristic and forces the inlining of the function even if the compiler doesn't agree it makes sense. I can see how that kind of compiler-specific annotation might be useful tactically. (ie: You've found the compiler to be making the wrong choice for a specific platform and you wish to overrule.) But not a full-fledged language keyword...


And letting the compiler decide means you can choose between -Os and -O3 later, depending on how your constraints change. Really, for performance the only keywords you should be using are 'const' and 'static'. Both just tell the compiler it's free to make certain kinds of optimizations it might not otherwise figure out that it's allowed to do.
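
For example (a sketch):

    /* Internal linkage: the compiler sees every call site in this file
       and can inline or clone freely, without assuming outside callers. */
    static int clamp01(int v) {
        return v < 0 ? 0 : (v > 1 ? 1 : v);
    }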


Many compilers let you supply pragmas that give hints as to how many iterations a loop will usually run, whether that count will be a multiple of x, etc. The compiler can then use this info to decide whether to apply various optimisations. I think it's not part of C because it's an underlying detail: explicitly informing the compiler how it should optimise something shouldn't really be part of the language, and pragmas are good for this.
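
For example, GCC and Clang both accept unrolling pragmas along these lines (the exact spelling is compiler-specific):

    void scale(float *a, int n) {
    #pragma GCC unroll 4 // GCC: request 4x unrolling of the next loop
        for (int i = 0; i < n; i++) {
            a[i] *= 2.0f;
        }
    }
    // Clang spells it: #pragma clang loop unroll_count(4)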


This kind of pre-processor / compiler-specific keyword is available in most embedded C/C++ compilers.

See http://www.keil.com/support/man/docs/armcc/armcc_chr13591249...

For example.


The compiler has heuristics for unrolling, and ought to do it automatically when appropriate. Sometimes unrolling hurts performance, as you make the code larger and therefore reduce i-cache performance.


You mean something like...

    __attribute__((optimize("unroll-loops")))
? :-)


Btw, there shouldn't be a semicolon after i++ (i.e. the code above won't compile).

Any compiler worth its salt should be able to unroll without an explicit demand from the developer.


One of his definitions of a pure function is one that only has by-value parameters and doesn't change state. Am I correct in thinking that, in C++, the advent of C++11 lambdas lets you be explicit about this and prevents the compiler from allowing you to accidentally use variables from outside the scope of the function's parameters? Writing lambdas (named, if necessary, like a normal function) with empty capture lists ("[]") would force you to work in a more pure style. In C++, what other methods might help you enforce purity?


You can still freely access global variables inside lambdas, with or without captures.


I have been testing this, and I'm not convinced. A lambda with no capture at all (just []) will not compile if it accesses any variable in the scope of the lambda definition (including globals). I must add [=] or similar to get access to them. What were you referring to?

The compiler error is clear: "variable 'lam' cannot be implicitly captured in a lambda with no capture-default specified"
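

Globals don't need capturing at all; only locals from the enclosing scope do. A minimal example (variable names invented):

    #include <cstdio>

    int g_counter = 0; // global: visible inside any lambda, no capture

    int main() {
        int local = 41;

        auto bumpGlobal = [] { g_counter++; };      // fine: no capture
        // auto readLocal = [] { return local; };   // error: 'local'
        //                                          // must be captured
        auto readLocal = [local] { return local; }; // fine

        bumpGlobal();
        std::printf("%d %d\n", g_counter, readLocal());
    }
If the 'lam' in your error message was a local rather than a true global, that would explain what you saw.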



Indeed, I noticed after my comment that automatic access to globals behaves differently from access to other variables outside the lambda's scope; it's only the non-globals that need capturing.

Thanks for the example.


Thanks for the clarification. I thought the capture list was required for this. I suppose the capture list is for capturing variables that are local to the enclosing scope of the lambda's definition.


Correct me if I'm wrong (I only skimmed it), but this is less about not liking inlining than about wanting deterministic/time-bounded performance.

These are two separate/orthogonal issues; I doubt he would turn his nose up at the processor doing less work iff it was also deterministic and had predictable worst-case timing.


I would sum it up as "reducing variability in code paths", i.e. not causing degraded performance when certain conditions change.


Yes, but the intention behind that is to have control over, and knowledge of, the maximum time a function will take.

What he is effectively saying is to treat all code in the same manner as you would for a hard real-time system.

I certainly agree with this for performance-critical code (performance being overall duration or latency), but this is not a one-size-fits-all solution. There are a lot of cases where this is not appropriate.


It depends on whether we optimize for throughput or latency.


(I mentioned overall duration or latency, which covers your point.)

But those two cases are the same: they're both performance-critical code.

Not all code is.


I would argue that in this context, "optimize running time" is the different one, while "optimize latency" and "don't care about performance" are similar.

I mean, if you don't care about total running time, it makes sense to remove special cases that don't change the worst case, for the sake of readability and fewer bugs, regardless of whether you care about latency.


Both games and low-level real time systems have one big loop executed at a fixed rate. That leads to the architecture Carmack describes.
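
Roughly this shape (stub names invented):

    #include <chrono>
    #include <thread>

    void readInput() {}
    void simulate()  {} // each system stepped once per tick, in order
    void render()    {}

    void runLoop() {
        using clock = std::chrono::steady_clock;
        const auto tick = std::chrono::milliseconds(16); // ~60 Hz
        auto next = clock::now() + tick;
        for (;;) {
            readInput();
            simulate();
            render();
            std::this_thread::sleep_until(next);
            next += tick;
        }
    }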

It's not particularly helpful to a server that's fielding vast numbers of requests of various types.


I think Carmack is conflating FP with good abstractions.

Haskell abstractions are often good because they flow from category theory and there are usually well established mathematical laws associated with them. I'm thinking of the "monad laws" and the "monoid laws."

Mathematicians tend to create abstractions if the abstraction satisfies coherent and provable properties. Programmers tend to be less rigorous about what and how they abstract.

There is nothing about C++ that prevents making good abstractions. It's just the culture of the language. Industry programmers are taught to not duplicate code and to keep functions short but they are not taught the fundamentals of what makes a good abstraction.


Yes, once you understand functional programming, you never want to go back to implicit state changes, where the contents of any of your variables can change without your knowledge or explicit consent.


A snarky remark: 'citation needed'.

Functional programming ain't a panacea, either.


Have you read the article? John Carmack suggests FP and writing pure functions is better.


Not really. He suggests that if a function is called from multiple places, one should try to make it pure to avoid subtle bugs. If called from a single place it may be better to inline it.

Citation: "I don’t think that purely functional programming writ large is a pragmatic development plan, because it makes for very obscure code and spectacular inefficiencies, but if a function only references a piece or two of global state, it is probably wise to consider passing it in as a variable."
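
A small sketch of what that suggestion looks like in practice (names invented):

    static float g_timeScale = 1.0f;

    // before: hidden dependency on g_timeScale
    float advanceImplicit(float pos, float vel) {
        return pos + vel * g_timeScale;
    }

    // after: the dependency is explicit, and the function is pure
    float advance(float pos, float vel, float timeScale) {
        return pos + vel * timeScale;
    }
    // call site: advance(pos, vel, g_timeScale);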


Your citation is from his 2007 thoughts.

His 2014 thoughts: "No matter what language you work in, programming in a functional style provides benefits. You should do it whenever it is convenient, and you should think hard about the decision when it isn't convenient."


Which is still a very different statement from the way functional zealots would put it.

Carmack's talking about pure functions at the architecture/design level. Within those functions, there's still lots of temporary mutable state, I'd be willing to bet. He's writing graphics code; he's probably not passing functions to functions in order to sum an array, he'll just do the fast, iterative thing.


I'm not going to do the whole spiel about performance, but I know a lot of FPers who wouldn't care about the implementation of map, so long as the external contracts are kept.

The beauty of functional programming is that it doesn't matter how map works. So you can make map work as fast as possible through all the techniques you want, since code can't rely on the behaviour. Only on the input.
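
Sketch of what I mean (hypothetical code): callers depend only on the contract, so the implementation can change freely underneath them.

    #include <vector>

    // v1: the obvious loop
    std::vector<int> mapDouble(const std::vector<int>& in) {
        std::vector<int> out;
        out.reserve(in.size());
        for (int x : in) out.push_back(x * 2);
        return out;
    }
    // v2 could be unrolled, vectorized, or parallelized; as long as it
    // returns the same output for the same input, no caller can tell.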


Interfaces you supply will necessarily constrain implementation details. So it winds up mattering.

Passing a function to a function is a bunch of indirection and extra stack frames(!) compared to updating very small, memory aligned mutable state in-line with the work you're doing. It's even worse with closures where you're creating anonymous data structures and passing them around. You can read up on TLBs and the speed difference between L1 cache and main memory if you'd like to know more.
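
To make the two shapes concrete (hypothetical code, not a benchmark):

    #include <cstddef>

    // indirect: a call through a function pointer per element, unless
    // the optimizer happens to see through it
    long sumWith(const int* a, std::size_t n, long (*step)(long, int)) {
        long acc = 0;
        for (std::size_t i = 0; i < n; i++) acc = step(acc, a[i]);
        return acc;
    }

    // direct: the accumulator stays in a register for the whole loop
    long sumInline(const int* a, std::size_t n) {
        long acc = 0;
        for (std::size_t i = 0; i < n; i++) acc += a[i];
        return acc;
    }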

You might not care about the above if it's more 'beautiful' to you, but it's vastly, vastly less performant.


that's if you implement it as a function :)

You could totally implement these things as compile-time macros, or do many different optimisation passes, or so many other things.


The old school of algorithm developers used to introduce hard, explicitly verified preconditions and postconditions. Typically this is done in C and C++ using assertions.

In general, any dependence on external mutable state should be asserted or otherwise verified. Those checks can be disabled for performance later. This meshes very well with actual tests too.
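
One sketch of that style (hypothetical function; the asserts compile away when NDEBUG is defined):

    #include <cassert>

    // precondition: lo <= hi; postcondition: result is within [lo, hi]
    int clampChecked(int x, int lo, int hi) {
        assert(lo <= hi);                 // verified precondition
        const int r = x < lo ? lo : (x > hi ? hi : x);
        assert(lo <= r && r <= hi);       // verified postcondition
        return r;
    }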


It's become pretty standard on here to downvote without being constructive. Maybe I misinterpreted the article but I thought it was fairly clearly saying FP is a good thing. I got to over a thousand points without going loopy, that's enough!

Good night.


Reposting my comment from the last time this was posted. There was a lot of nice discussion there: https://news.ycombinator.com/item?id=8374345

===

The older I get, the more my code (mostly C++ and Python) has been moving towards mostly-functional, mostly-immutable assignment (let assignments).

Lately, I've noticed a pattern emerging that I think John is referring to in the second part. The situation is that often a large function will be composed of many smaller, clearly separable steps that involve temporary, intermediate results. These are clear candidates to be broken out into smaller functions. But, a conflict arises from the fact that they would each only be invoked at exactly one location. So, moving the tiny bits of code away from their only invocation point has mixed results on the readability of the larger function. It becomes more readable because it is composed of only short, descriptive function names, but less readable because deeper understanding of the intermediate steps requires disjointly bouncing around the code looking for the internals of the smaller functions.

The compromise I have often found is to reformat the intermediate steps as control blocks that resemble function definitions. The pseudocode below is not a great example because, to keep it brief, the control flow is so simple that it could have been just a chain of method calls on anonymous return values.

    AwesomenessT largerFunction(Foo1 foo1, Foo2 foo2)
    {
        // state the purpose of step1
        ResultT1 result1; // inline ResultT1 step1(Foo1 foo1)
        {
            Bar bar = barFromFoo1(foo1);
            Baz baz = bar.makeBaz();
            result1 = baz.awesome(); // return baz.awesome();
        }  // bar and baz no longer require consideration

        // state the purpose of step2
        ResultT2 result2; // inline ResultT2 step2(Foo2 foo2)
        {
            Bar bar = barFromFoo2(foo2); // 2nd bar's lifetime does not overlap with the 1st
            result2 = bar.awesome(); // return bar.awesome();
        }

        return result1.howAwesome(result2);
    }
If it's done strictly in the style that I've shown above then refactoring the blocks into separate functions should be a matter of "cut, paste, add function boilerplate". The only tricky part is reconstructing the function parameters. That's one of the reasons I like this style. The inline blocks often do get factored out later. So, setting them up to be easy to extract is a guilt-free way of putting off extracting them until it really is clearly necessary.

===

In the earlier discussion, sjolsen did a good job of illustrating how to implement this using lambdas: https://news.ycombinator.com/item?id=8375341 Improvements on his version would be to make everything const and the lambda inputs explicit.

    AwesomenessT largerFunction(Foo1 foo1, Foo2 foo2)
    {
        const ResultT1 result1 = [foo1] {
            const Bar bar = barFromFoo1(foo1);
            const Baz baz = bar.makeBaz();
            return baz.awesome();
        } ();

        const ResultT2 result2 = [foo2] {
            const Bar bar = barFromFoo2(foo2);
            return bar.awesome();
        } ();

        return result1.howAwesome(result2);
    }
It's my understanding that compilers are already surprisingly good at optimizing out local lambdas. I recall a demo from Herb Sutter where std::for_each(someLambda) was faster than a classic for (int i = 0; i < 100000; i++) loop with a trivial body, because the for_each internally unrolled the loop and the lambda body was therefore inlined as unrolled.


50% of HN comments have misread this! The first few paragraphs mentioning FP were written in 2014 and retract the opinion of the long email about inlining, which is from 2007.


Not... quite. He's not retracting so much as saying he's much more positive now about FP's ability to solve the issues his 2007 email talks about.

It's not a retraction so much as an expansion of solutions to include (with caveats) FP.

Also it clarifies some drawbacks to the approach on mobile/limited resource platforms.


I think this is a great example of something that is different for an exceptional developer as opposed to an average one.

A developer like Carmack and likely the teams he works with are able to keep a much larger system in their head at one time than an average developer.

And this is typically why they can write larger functions like that and get away with it.

A less talented developer will be much more likely to introduce bugs near the top of that function over time as they struggle to maintain the entire function in their head.

Sometimes choosing the correct tool has more to do with the craftsman than the craft.


As an average developer, I find it much easier to keep the system in my head when it's not cut into 10-line-long pieces in random order.

Hiding the complexity doesn't suddenly make it irrelevant. That's how you get code that does the same thing 5 times in 5 different branches of a highly nested call tree, "just to be sure".


Functions at the correct level of abstraction don't hide the complexity; they categorize it and put names to the different categories.


> and I was quite surprised at how often copy-paste-modify operations resulted in subtle bugs that weren’t immediately obvious.

I noticed this quite some time ago; it is also a major source of the bugs I write. That is, until I decided to stop copy-pasting anything longer than a single word and to retype everything character by character when I need it again. Interestingly enough, this saves a lot of time, because the bugs I would otherwise generate cost far more time than a bit of typing.



