As usual, the programmer should look at the generated assembly when reasoning about performance. Sometimes compilers are smart, optimizing away things people wouldn't imagine they could, and sometimes they are really dumb, pessimizing code in unimaginable ways (to the extent that simply deleting instructions from the output, ones that don't correspond to anything in the source code, makes the output better). Relying on the compiler to perform magic without verifying the results is an antirational stance. Personally, I'd prefer it if optimizations were applied based on what's visible in the source code, instead of the Wild West we have now. E.g. inline function calls based on annotations left by the programmer, not because you (the compiler) feel like it.
The Rust compiler can be very good at optimizing iterator pipelines; however, a while ago I wrote a function which didn't have any branches whatsoever in it, but the generated assembly was riddled with them. What could have been 5 instructions consisting of simple bit operations (everything done in registers, not loading anything from memory) ended up as 12 (? IIRC), riddled with conditional jumps. No matter how I changed the source code or which operations I used, the output was the same. Getting to the desired result shouldn't be stifled by the compiler having a mind of its own, making its own decisions. A sane default could be set, but some annotations giving greater control over the result would be appreciated.
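A made-up stand-in for the kind of function I mean (not the original, which I no longer have):

    // Pure register bit operations; nothing here branches in the source.
    fn pack_sign_bits(x: i32, y: i32) -> u32 {
        let sx = (x as u32) >> 31; // sign bit of x, as 0 or 1
        let sy = (y as u32) >> 31; // sign bit of y
        sx | (sy << 1)             // a handful of shifts and ORs, no jumps
    }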
Optimizing compilers also make debugging harder. Bringing the output and the source code closer together would help in making sense of a debugging session.
This sounds tempting, but it would trigger a wave of unreadable, irrelevant micro-optimizations by many programmers.
If you have a desired assembly result, write it yourself in the first place.
The one concession to this I would like is a "constant time" marker for cryptography functions, where it is an absolutely critical property that there be no data-dependent branches.
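For concreteness, the kind of function such a marker would have to guard (a sketch; the names are mine):

    #include <stddef.h>

    /* Constant-time byte-string comparison: the loop always runs to the end
       and accumulates differences with OR, so timing never depends on the
       secret data -- exactly the property the marker would have to enforce. */
    int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
    {
        unsigned char diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
        return diff == 0;
    }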
(This is a little different on GPUs, where branches are so unreasonably expensive; it sounds like that would also benefit from "no branches here" markers. The downside is that this makes valid code uncompilable if the compiler can't find a branchless way of doing it.)
I've been designing a programming language where nothing is a side effect – including calling conventions, processor caches, branch predictors, locks, graphics modes, etc. “Runtime has to be independent of the value of the input” would be how that language expresses "constant time" – and it would take into account every part of the compiler's model of the target architecture (which, by the way, is also described in the code), so for a sufficiently simple, sufficiently well-modelled architecture, the generated code could be provably Spectre-proof – given sufficient assumptions (Eis's take on Rust's `unsafe`).
It's very early days, but I hope to get something usable by 2025.
Unless I’m misunderstanding your point, I don’t think what you want is possible, due to microcode. You give the AMD/Intel/Arm/etc. platform some instructions to execute, and it executes them in ways that are not guaranteed to stay the same between microcode updates or between hardware revisions. The only guarantee is that the results you get should be the same as if the instructions you requested were executed in the order you requested them.
The sophisticated version is difficult, but there are a lot of things you can treat as "opaque, everything mutates this, lots of stuff depends on it" and still be able to make decent optimisations based on what you know.
It requires state that can be inferred and manipulated. For the cache, assuming you know how large the cache is, you can just fill up the cache with useless pages, evicting whatever might be vulnerable to a timing attack, before having it loaded back in. Very slow, but it works. (You might be able to get away with less, depending on the OS's guarantees and what you can assume about how your program will run – and the language requires you to make those assumptions explicit, even if it's a case of "assume this manually-written assembly does the right thing without violating any invariants".)
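A sketch of that brute-force eviction (the 8 MiB figure is an assumption about the target, and set-associativity can defeat it):

    #include <stddef.h>

    /* Touch a buffer as large as the cache so that every line holding
       secret data is (hopefully) evicted.  volatile keeps the loop from
       being optimized away. */
    #define ASSUMED_CACHE_SIZE (8u * 1024u * 1024u)
    static volatile unsigned char scratch[ASSUMED_CACHE_SIZE];

    void evict_by_filling(void)
    {
        for (size_t i = 0; i < ASSUMED_CACHE_SIZE; i++)
            scratch[i]++;
    }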
Yes, these tricks don't have forwards-compatible guarantees in the general case – but if the processor manufacturer makes guarantees about all members of a family, then you can optimise to those, and in any case having something that provably works on all supported processors (and can be ported to later processors if necessary, just by using an extra library – if the library is correct) is still useful.
No. You cannot infer the layout of the cache based on size alone. You don’t know how many ways there are, how many sets there are, etc. You don’t even know the LRU scheme.
For mutating branch predictor state at will, you will need your compiler to write self-modifying code. This will pollute the icache and further mutate the state based on where the cache line lies and the current state of the icache.
You could maybe restrict your target to open source hardware and come up with something.
> You cannot infer the layout of the cache based on size alone.
No – but the compiler would presumably be told how the specific processor's cache works. If it's possible for a human to write cache-flushing code, it's possible to describe it to a computer. (And that was just an example.)
> For mutating branch predictor state at will, you will need your compiler to write self-modifying code.
Only for arbitrary modifications. For specific modifications, it's fine to go with ordinary code – though self-modifying code isn't actually all that hard to model, if you generalise “state” to also encompass the state of the self-modifying section. (Compilers already have “all branches” type things.)
> You could maybe restrict your target to open source hardware
There's no need to make this restriction. You only need sufficiently-understood hardware; you underestimate the ability of reverse-engineers.
How exactly would you modify branch predictor state without self modifying code? I am very interested in this.
Another major question: a lot of x86 ops for manipulating things you want to manipulate are ring 0 instructions.
We haven’t even discussed the effect of the somewhat non-deterministic delays caused by contention from other processes. These can change the true execution order of instructions inside the pipeline, even if you somehow manage to enforce in-order issue.
This is one of the things I like so much about SBCL: not only does it compile the Lisp code directly to native code, but with the disassemble function you can print out the generated code for each function to see how much the compiler was able to optimize it. This is especially important in the context of dynamic languages, to see where generic operators have to be called and where the type inferencer was able to compile the operation down to an assembly instruction like integer multiplication.
What makes SBCL special is a) using type information as compile-time assertions and b) the level of diagnostic output during compilation. The compiler explains in detail when operations can't be optimized further, for example because of lack of type information.
Well, yes. How SBCL uses the type annotations as well as type inferencing to completely eliminate type checking throughout sections of code and generate highly efficient assembly makes the assembly output of SBCL especially interesting to look at. It gives you good feedback on whether your type annotations had the desired optimization outcomes. I didn't mean to imply that other Lisp implementations don't offer native compilation or a disassemble function.
Many implementations also give us type inferencing and optimizations, but usually not compile time type checking like in SBCL. I would also kind of think that the level of type inferencing in SBCL is greater.
That's a good point, but a bad example, since 'x < 0 ? 1 : 0' is just the same as 'x < 0', which is also implemented (on x86) as a CMP followed by SETcc.
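Roughly (illustrative x86-64 output at -O2; exact instructions vary by compiler):

    int is_negative(int x)
    {
        return x < 0 ? 1 : 0;   /* identical codegen to: return x < 0; */
    }
    /* typical output:
           mov  eax, edi
           shr  eax, 31        ; sign bit as 0/1 -- no conditional jump
           ret                 ; (or a test/setl pair, also branch-free)  */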
Some targets don't support integer division/remainder or some floating point ops in hardware and automatic vectorization can add/remove branches in places that won't be easily predictable.
Early GPU “compilers” [1] and their corresponding architectures really meant that “an if statement means go slow”. Also, IIRC, the original mask depth on Fermi was ... 32 branches, and afterwards you’d spill. So it was an adaptation to “don’t make the compiler work harder when you can fold the branch for it”.
I always found this to be worse for autovectorization with icc. Every once in a while, it would impress you by calculating masks and then blending. Usually it would not.
[1] Many of these “compilers” were really just blindly emitting code with no optimizations. The point was it ran correctly, and was effectively autovectorized. If you cared, you could make it faster.
The disassembly can lie, however: the CPU can do a lot of work in the time a cache miss takes (or some other dependency-induced stall, or a misprediction). You have to measure it, in the situation where the code is actually being used; contrived benchmarks will often lie too, as they usually don't fill the cache and are easy to predict.
Testing with only one if statement is not convincing at all.
I'm pretty sure that once you have a couple of mixes and steps etc., the branch prediction becomes exponentially harder and you are better off with the one-line mixes, steps, clamps etc. In general, from experience writing a lot of shaders, I want to avoid if statements, function calls and, most importantly, loops.
Now maybe if statements are efficient under some conditions, but if you want to add new details to the same shader some months later, you can't be sure the generated instructions and branch prediction will be as efficient.
There is also coding style: I'd rather stick to a bunch of mixes, steps and clamps than a salad of mixes, steps and clamps with some multiline if statements in between.
Not to mention that branches and branch prediction can be a cause of understudied (and possibly vulnerable) interactions like Spectre/Meltdown.
It's also worth mentioning that branching is a critical part of any recursive function; otherwise it would not know when to terminate. In some cases the function call could be linearized, but that again comes down to the compiler's runtime cost analysis.
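(The classic shape, where the base-case test is the branch the compiler has to keep in some form:)

    unsigned factorial(unsigned n)
    {
        return n <= 1 ? 1 : n * factorial(n - 1);  /* n <= 1 is the terminating branch */
    }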
Ifs are generally bad for readability though, because they force the reader to understand your control flow. Replacing an if with a ternary makes it more readable, not because the generated code is any different, but because a reader immediately knows that they don't need to scan through the two sides for control flow constructs (because the two sides of the ternary are guaranteed to be expressions rather than blocks). An if might occasionally be clearer than some convoluted algebra, but not often.
I have been reading HN for a long time but your comment is the first time I violently felt the need to create an account and respond. I want you to consider how significant that is. Ternaries are NEVER easier to read.
I don't want to come off as insensitive; it sounds like you've had some really bad experiences with ternaries in the past, and it's important to acknowledge your experience. But I just want to say that not all ternaries are like that.
In a more serious tone, ternaries in places where they should be used, i.e. as expressions, are in my opinion much clearer to read than replacing them with an if-else statement. In my mind, an if/else block is an imperative entity that describes two different actions that could happen. Even if in reality both branches are trivial, my mind will be unnecessarily occupied with the fact that it needs to reason about code here, not just data. A ternary, on the other hand, is a declarative entity, saying that this expression takes on this or that value, usually in such a way that each branch of it maintains some invariant with respect to what's in the conditional.
> In my mind, an if/else block is an imperative entity that describes two different actions that could happen. Even if in reality both branches are trivial, my mind will be unnecessarily occupied with the fact that it needs to reason about code here, not just data
That's true in some languages. But in some languages if-else can also be used as an expression, e.g. in Rust I can write something like:
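    let x = if cond { a } else { b };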
Yes, the : and ? are single characters that are easily lost/overlooked among the rest of the code on that line. Even worse when there are nested ternaries.
When I find a place where a complex ternary is probably the best choice, I'm a big fan of the following format, because I find it clear that each line is a new condition:
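For example (with placeholder names):

    const char *category = score >= 90 ? "high"
                         : score >= 50 ? "medium"
                         : score >= 10 ? "low"
                         : "none";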
> where a complex ternary is probably the best choice
Under what circumstances would that happen? I've never encountered a nested ternary that wouldn't be better replaced with readable code, so now I'm wondering what experiences I've been missing. :)
I don't have an example handy, but it's useful in exactly the same cases where you would use a Lisp "cond" construct; testing a bunch of conditions one after the other. Using an "if / else if / ... / else" construct can do the same thing, but it can be less clear sometimes (both to intent and because it takes up more screen real estate, which makes understanding the overall code harder).
Yes, good point. Still, when working on a shared codebase, it's good to take more potential beholders into account. I suppose if there is a common, consistent style that you can expect everyone to be familiar with, you can get used to reading a sensible amount of chained ternaries.
And, to illustrate my point, here's a hypothetical "if statement is also a conditional expression" syntax, showing how much more readable "ifs" are than "?:":
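One possible rendering (this is imaginary syntax, so take the exact spelling loosely):

    int x = if (cond) a else b;   /* hypothetical if-as-expression */
    int x = cond ? a : b;         /* today's C */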
While I agree that "if statements are expressions" is a nice feature, I disagree that your version is more readable. That being said, it's readable enough that, if the rest of my team preferred it, I'd be ok using it.
I don't advocate the use of this macro, but if you do use it you should really wrap the parameters and the whole expression in parentheses to avoid precedence issues:
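That is, assuming the macro was something along the lines of `#define IF(a, b, c) a ? b : c`, the safe version wraps everything:

    #define IF(a, b, c) ((a) ? (b) : (c))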
That isn't more readable. It's non-standard. It also looks like a function call, but doesn't behave like one (in a function call, a, b, and c would all be evaluated). It's going to confuse other C programmers.
As Liquid_Fire points out, it's also probably incorrect.
Maybe ternaries aren't easier to read, but they make the surrounding code easier to read because they duck out of the way? Devoting several lines to what is a very quick if/else expression could be distracting I suppose. Playing devil's advocate.
It's important to note that ternaries are expressions, they evaluate to a value. When I see a ternary I know that a boolean is being mapped to a value. This is true both technically and almost always in practice.
If-statements are the wild west. Who knows what an if statement will do? An if-statement may be as simple as a ternary, or it might be a branch point and the resulting logic streams will never meet again.
There’s the condition (evaluated first), the true case and the false case. Only one of the cases is evaluated.
I’m assuming you were alluding to some sort of “x++ = x++ * *x++” style situation, but the ternary expression isn’t a statement and there are sequence points in between. As a bonus, the first result for “ternary operator evaluation order” gave me someone who linked to the standard [1]. Quoting from the standard that they quoted:
> The first operand is evaluated; there is a sequence point between its evaluation and the evaluation of the second or third operand (whichever is evaluated). The second operand is evaluated only if the first compares unequal to 0; the third operand is evaluated only if the first compares equal to 0; the result is the value of the second or third operand (whichever is evaluated), converted to the type described below.
I was not thinking about local state; you're right that the sequence points ensure an orderly evaluation there.
My point was that since ternaries can call functions, and functions can have side effects on global state, the concept of an "if" being a "statement" vs a ternary being an "expression / value" is not nearly as meaningful as it's made out to be. A ternary can trade stocks, fire a 500W laser, unlock a velociraptor enclosure, etc, just as easily as an if can.
Meh. When sufficiently terse, ternary expressions are frequently more readable than a more verbose if/else equivalent. I would agree that ternaries can get out of hand quickly though.
In C and C++, ternaries are frequently used on the right hand side of an assignment statement. if/else can't be used in such a context.
Here's a concrete example:
    const int min = x < y ? x : y;

versus:

    int min;
    if (x < y)
        min = x;
    else
        min = y;
Note also that this prevents the variable min from being made const, unless you are using C++ and wrap the entire if/else in an immediately invoked lambda, which of course would require additional verbosity.
We can't objectively say whether they're easier to read or not. Opinions differ.
We can objectively say that ternaries are shorter and simpler however. (Simpler because they can do fewer things than an if-statement or if-expression.)
The question mark applies to the condition and not the 5, and this also highlights the two possibilities of the value more clearly: the two possible values are directly stacked, with nothing in front of them.
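For instance (the 6 is my own stand-in alongside the 5):

    int x = cond ?
        5 :
        6;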
I am surprised about this. Very often you must duplicate a lot of the statement in each branch of the if statement, and it's not clear that those duplicated parts are exactly the same and must be exactly the same.
The ternary and unary increment/decrement operators are things I thought I’d miss, but in fact absolutely do not miss. May they forever perish. If-as-expression always!
Firstly, you can't compare if statements to ternary expressions. To compare apples to apples, you need to compare: 1) if statements with single assignment statements in both branches, assigning to the same variable, and 2) assignments with ternary expressions on the right side.
In that apples-to-apples comparison, it's really a toss up which is more readable. Looking at the if statement, anyone who has been reading code for any amount of time is going to immediately recognize the pattern. There's no "scanning two sides".
Do you know if there's any evidence that ternaries beat ifs for people understanding code in practice or in people's self-assessments? Would not have been my guess at all.
If Hillel Wayne’s talk on empirical software design is any good measure, then either method is fine [0], so long as you and the team you work with can agree about its utility in code review. If it’s just you on the project, then do whatever is most comfortable.
I usually just transform functions with 2+ if statements into functions without `if` statements by extracting the `if` statements into other, smaller functions where I can write them with an early return.
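E.g. (made-up names):

    /* extracted helper: the if collapses into an early return */
    static int fee(int is_member, int base_fee)
    {
        if (is_member)
            return 0;       /* early return, no else needed */
        return base_fee;
    }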
Sorry, perhaps I don't understand how ternary operators are any different from if/then/else. In fact, for readability I very much prefer guard statements that don't include an else at all.
if/else can contain arbitrary control flow (return, break, throw, goto) in the bodies, so you have to read the whole body to know what's going on. Ternaries are just expressions so however complex the body is, you know it's always just going to evaluate to a value.