
> Return Infinity goes back to the roots of computer programming with pure Assembly code. As we are programming at the hardware level, we can achieve a runtime speed that is not possible with higher-level languages like C/C++, VB, and Java.

When will this reasoning finally die?




It won't die because it's true, at least in some cases.

The LuaJIT 2.0 interpreter, written in x86-64 assembly language, is 2-5x the speed of the plain Lua interpreter, written in C. Note that this is with the JIT disabled -- it is an apples-to-apples comparison of interpreter-vs-interpreter: http://luajit.org/performance_x86.html

I recently wrote a protobuf-decoding assembly code generator that is 2-3x the speed of C++ generated code: http://blog.reverberate.org/2011/04/25/upb-status-and-prelim...

What is your evidence in support of the idea that assembly cannot be faster?


> What is your evidence in support of the idea that assembly cannot be faster?

I made no such claim.

My gripe was with using it as a feature, claiming that since it's in assembly, it's certainly faster, which is simply not true. In most cases, it's the algorithm that determines performance, not the details of its implementation. Assembly certainly has its place, but arguing that a kernel implemented entirely in assembly is faster simply because of the abstraction level it's written at does not carry much weight. Of course you'll be able to find hand-tuned algorithms that are much faster in assembly than in a higher-level language, but it does not follow that "complex software written in X is generally slower than complex software written in assembly".

Also, Lua is a poor example. Its performance was much more heavily influenced by portability and embeddability.


Of course, if you take two different pieces of software and implement one in assembly language and the other in a high-level language, you can't claim a valid comparison.

But if you take one algorithm and implement it in both languages, the assembly implementation will always be faster; that's the basis for their claim.


That's equivalent to claiming Assembly is always faster than other languages, because every program is just an algorithm. It's completely incorrect - I guarantee you I can write an implementation of an algorithm in Assembly that is slower than the same algorithm implemented in Ruby.


I think the person is trying to point out that you need to be a good programmer to write good assembly, and the assumption that you will always have a good programmer can break down. An algorithm may be an algorithm, but an inexperienced programmer can easily make it slower.


> assembly implementation will always be faster

only if you don't suck at assembly.


I don't know, I always found the notion that humans will always be able to optimize better than machines to be somewhat... naïve. Is it an NP-complete problem our own heuristics are currently better at estimating?

No one has performed controlled studies on these things - maybe they had some crummy bottlenecks, used some language feature that their compiler couldn't optimize away, maybe the benchmarks they use to determine performance are trivial (which is very often the case), etc.

Not to mention that most of an operating system's time post-boot is spent doing... what? Having the scheduler swap processes in and out? If you're running a single program that fits inside RAM... it's totally fucking pointless, there's nothing left to optimize.

I have a better question. These guys are clearly smart. What the hell are they still doing in Atwood, Ontario?


> Is it an NP complete problem our own heuristics are currently better at estimating?

Some of the important problems are NP-complete (like register allocation). Another problem is that compilers aren't that good at telling fast-paths from slow-paths (and keeping everything in registers for the fast paths). For more info see this message from the author of LuaJIT: http://article.gmane.org/gmane.comp.lang.lua.general/75426
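
A toy illustration of the fast-path point (my own sketch, not from the linked message; handle_negative is a hypothetical helper and the noinline attribute and __builtin_expect assume GCC/Clang): the call sitting in the rarely-taken branch is enough to constrain register allocation for the whole hot loop, because the loop's state has to survive a possible call on every iteration.

    __attribute__((noinline))               /* keep the call real for the example */
    long handle_negative(int x) { return -(long)x; }   /* stand-in helper */

    long sum(const int *v, long n) {
        long acc = 0;
        for (long i = 0; i < n; i++) {
            if (__builtin_expect(v[i] < 0, 0))   /* slow path, rarely taken */
                acc += handle_negative(v[i]);
            else
                acc += v[i];                     /* fast path */
        }
        return acc;
    }

A human writing this by hand can keep acc, v and i pinned in registers and push all the save/restore work into the slow path, which is much harder to coax a compiler into doing.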


There's a new formulation of register allocation that's computationally tractable: http://compilers.cs.ucla.edu/fernando/projects/puzzles/exper...


Thanks for the link. That email was informative enough that I thought it deserved its own submission: http://news.ycombinator.com/item?id=2588696


I think the problem is that optimization is AI-complete. Without a lot of context about what your program is doing under what circumstances, the problem is not solvable. You need to know when and how a specific code path is run.


Yes, though that doesn't mean that humans will be better at it.


Agreed. I guess that when we have built smarter compilers, a centaur approach would work best (like in chess). The computer can do a whole lot by brute force and smart algorithms, and the human uses his knowledge of the context to steer it in the right direction.

Unfortunately we're not there yet.


Does the plain Lua interpreter also JIT? PyPy is written in Python and it's a lot faster than CPython in many cases, thanks in large part to the JIT.


The numbers I quoted are when LuaJIT has the JIT disabled. It's an interpreter-to-interpreter comparison.


Oh, cool, thanks for clarifying.


> What is your evidence in support of the idea that assembly cannot be faster?

I don't think that is the key; the key is the cost of that speed improvement. Say you spend a week writing the protobuf decoder in assembly, so now it can decode in 30usec instead of 60usec. So you have an impressive 2x speed gain.

But then say you are writing the data to a disk. Well, maybe it doesn't really matter how fast you are decoding the protobuf if next you are sitting there for ages waiting for that data to be written out. That 30usec gain is nothing on top of the 10msec wait time that is coming next, so was that week a good investment if you did it purely for the speed improvement? (Well, you might have done it as a learning exercise, in which case speed doesn't really matter.)


Although this is a valid point in many cases, I don't think this is one of those cases. It's in these "infrastructure" type projects like kernels, compilers, interpreters, and parsers where "micro" optimizations are actually really important.

> But then say you are writing the data to a disk. Well, maybe it doesn't really matter how fast you are decoding the protobuf if next you are sitting there for ages waiting for that data to be written out. That 30usec gain is nothing on top of the 10msec wait time that is coming next, so was that week a good investment if you did it purely for the speed improvement? (Well, you might have done it as a learning exercise, in which case speed doesn't really matter.)

haberman's parser (1460 MB/s) outperforms Google's C++ parser (260 MB/s) by more than 5x. Note that even in the disk example, a fast SSD has enough bandwidth to saturate the CPU running Google's parser. On top of that, this is FOSS, which means his weeks of investment are multiplied every time someone downloads and uses his code.


> On top of that, this is FOSS, which means his weeks of investment are multiplied every time someone downloads and uses his code.

Excellent point.

Also, I didn't mean to talk specifically about his parser; it was just a general example.

It is just that in my experience, engineers (I am guilty too) have a tendency to spend time micro-optimizing without, in the end, making a difference in the overall user experience. For example, choosing to write a GUI app in C++ when it could have been whipped up in Python in a fraction of the time and lines of code. The menus will open in 10ms instead of 3ms, but maybe it doesn't really matter from the user's point of view.

The same holds for most data that ends up in IO choke-points. Even memory in today's SMP architectures is a choke-point. Spend time hand-optimizing CPU-bound code only to find out that it ends up waiting on a lock, a disk, a network buffer, or some user input.

Also, micro-optimizations are often not future-proof. Many cache-friendly data structures and algorithms, for example, assume a particular cache-line size, or particular characteristics of hardware that just happen to change. Even in the assembly case, today we have 32-bit, 64-bit, and ARM as common target architectures, each with various levels of SSE extension support and other features, so one can spend a lot of time maintaining and tweaking all of them.


In this case though, they say their market is for HPC clusters and embedded computing, which are two areas where most processes are likely to be CPU-intensive.


An interpreter isn't a fair comparison though - in assembly you can use a few tricks like threaded code (http://en.wikipedia.org/wiki/Threaded_code) to get a big speed boost, but these techniques aren't broadly applicable to programs in general.
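
For the curious, here's a minimal sketch of the threaded-code trick, assuming GCC/Clang's computed-goto extension (the opcodes and the tiny program are made up for illustration): each handler jumps straight to the next one instead of going back through a switch.

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const int *code) {
        /* table of label addresses: one entry per opcode */
        static void *dispatch[] = { &&do_push, &&do_add, &&do_print, &&do_halt };
        int stack[64], sp = 0;
        const int *pc = code;

    #define NEXT() goto *dispatch[*pc++]
        NEXT();
    do_push:  stack[sp++] = *pc++;              NEXT();
    do_add:   sp--; stack[sp - 1] += stack[sp]; NEXT();
    do_print: printf("%d\n", stack[--sp]);      NEXT();
    do_halt:  return;
    #undef NEXT
    }

    int main(void) {
        int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        run(prog);  /* prints 5 */
        return 0;
    }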

And in any case, it's still not an argument for writing an entire OS in assembly, but rather only a few important segments of the code.


OK, but if it's a JIT, then how much time is the interpreter actually running?

Wouldn't we expect it be executing the JIT-compiled code (i.e., doing useful work) most of the time?

If so, doesn't that really make the opposite point, that compiler (JIT or no) generated code is plenty fast?


LuaJIT has both an interpreter (written in assembly) and a JIT. The 2-5x I quoted is only for the interpreter (i.e. with the JIT disabled). The speedup for the actual JIT is 2-130x vs. the interpreter written in C.


The Lua C implementation seems very conservatively written for portability and maintainability, but it's not slow either.

Handwritten assembly really can be faster than compiler generated code. The proof is that we can always look at the output of the compiler and invest more time improving on it by hand, whereas the compiler is required to complete in a short amount of time and usually without actually timing its code on the target machine.

Now if you take someone experienced in hand-tuning assembly like that and ask them to write the fastest possible code using a compiler, they're going to beat the pants off an ordinary coder who hasn't been benchmarking everything he writes all along.

But the real lesson here is that Lua is just freaking awesome.


Actually, porting LuaJIT to BareMetal might be a neat idea. :-)


When the Singularity happens and computers are at least as smart as humans.

Until then, compilers will be mindbogglingly retarded piles of crap that produce code 10, 20, or more percent slower than a human. Doubly so on anything other than x86. Add a factor of 10 if SIMD is involved.


And the downvote brigade arrives, consisting entirely of programmers who don't read the assembly outputted by their compiler.

Part of the problem is simply that compilers typically cannot know everything the programmer knows: assumptions about alignment and aliasing, for example.
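
A small, concrete example of the aliasing point (a minimal sketch, assuming C99): without the restrict qualifiers the compiler must assume dst and src might overlap and generate conservative code, while the programmer often knows they never do, along with alignment and trip-count facts that are even harder to communicate.

    /* restrict promises the compiler that dst and src never alias,
       which is what allows it to vectorize this loop aggressively */
    void scale(float *restrict dst, const float *restrict src,
               float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }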

But even if they did, there are plenty of cases where "producing good assembly code for a given algorithm" is infeasible with a brute-force approach, requiring the imprecise-but-effective pattern-matching of a human brain -- or something similarly powerful.


I don't understand why this reasoning is inappropriate for the application (HPC). If they were suggesting this as a way to develop massive enterprise applications with vast compatibility requirements and rapid application development needs that would be one thing, but for specialized supercomputing applications do you disagree with this approach?


Modern compilers are generally considered smart enough to do a better job at optimizing your code than you. Maybe if you're really, really smart you can do a better job than the compiler. But I suspect most people who say this are not, in fact, smarter than the compiler (but are simply suffering from self-serving bias). And by saying it, they're fooling a bunch more people into thinking that modern compilers are stupid, leading them into incorrect decisions like actually writing applications in assembly "for performance reasons".

So while yes, there are probably a small number of cases where it's still worthwhile to write stuff in assembly, it's not worthwhile to talk about it as though it's a good thing. (I would guess you should only do it when it becomes a necessity, and complain about it a lot, rather than presenting it as a feature.)


Compilers are great at some things and really bad at other things. The C and Fortran ABIs are too permissive in some cases, making certain optimizations impossible for the compiler to do (without combinatorial growth in generated code size). On x86-64, you can go a long way using SSE intrinsics, but that is pretty close to the assembly level. IBM did a bad job designing the PowerPC intrinsics, so they are nearly useless. There are a few computational kernels that I have sped up by a factor of two relative to what the compiler could produce or the best published result. The x264 project writes a huge amount of assembly and provides consistently better performance than other implementations. There is still a place for assembly, although it should be kept localized.
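
For what it's worth, here's roughly what the "intrinsics are pretty close to the assembly level" point looks like in practice (my own sketch, assuming SSE, that n is a multiple of 4, and that the arrays don't overlap):

    #include <xmmintrin.h>   /* SSE intrinsics */

    void add4(float *dst, const float *a, const float *b, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);    /* load 4 floats */
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
        }
    }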


Take a piece of assembly code generated by a compiler and try to optimize it. You'll see that you don't actually need to be that smart to obtain a significant perf improvement.

Compilers are good, but they must ensure correctness for any source code. On the other hand, you know exactly what you need, so you can drastically simplify/optimize the assembly code.


As someone who writes assembly, I think this is overstating the case. There are a lot of algorithms (particularly short ones) where good C compilers generate nearly optimal code that you would be hard-pressed to improve on.

In my experience the benefit you get from writing assembly comes largely from your ability to do better register allocation for your fast-paths, in cases where your compiler would spill registers to the stack.

There are cases where the compiler does something that is genuinely stupid (http://blog.reverberate.org/2011/03/19/when-a-compilers-slow...) but in modern compilers these are pretty rare.


I don't think they are saying that you and I (application developers) should be writing assembly (they mention C/C++, and refer to C++ libraries elsewhere), unless we want to.

What they do emphasize is that the operating system was written in assembly, and while I don't know the members of the team personally, I'd guess anyone who has completed a project such as this (with this level of polish and utility) is at least potentially capable of being smarter than your average compiler.

I'd also like to state that I do agree: for the vast majority of software development, the convenience of higher-level languages and APIs outweighs the associated performance disadvantages. But for some programs, the ones that talk directly to hardware and whose library functions are called billions of times a second by application programs (and, let's add, that are written far less frequently than application-level code), the assembly approach is justified.

Of course you're welcome to build something similar in a compiled language and prove us all wrong :)


I'd also like to say that while you may not enjoy programming in assembly, that doesn't make it a bad thing.

Programming assembly is actually fun (for some of us) and gives you a level of intimacy and insight into the machine that no other language can provide.

OK maybe FORTH


Great! Go right ahead and program everything you want in assembly. If it's for fun, or for educational purposes, or whatever, awesome. But don't go around saying it's going to perform better than Haskell code, or whatever higher-level language you want, that produces the same output, without testing it.


> languages like C/C++, VB, and Java.

That's kind of funny if you think about it. Are people really writing OS's in VB these days?


HN user daeken has been writing an OS in .NET http://daeken.com/renraku-future-os


When people fab chips that run Java/VB/C++ natively. I saw a demo once using Java on a chip, but it was far from consumer-level.


Chips will never run java/vb/C++ natively. Chips execute instructions, and those languages contain constructs which are not instructions.

See a great discussion at http://electronics.stackexchange.com/questions/14527/any-pro...

(The Java demo you saw was probably Jazelle, which is a processor module on ARM chips that runs (some) Java bytecode instructions natively, instead of using a virtualized processor. That's possible for a lot of VM-based languages, but it's not running Java.)


Modern chips don't run x86_64 assembly language either. That's just a compatibility layer that is translated away as soon as possible.


You mean like Jazelle http://en.wikipedia.org/wiki/Jazelle that hardly anyone used?


Gosling came to my school and did a demo of an RV using something like that -- I wish I remembered the chip name.

"hardly anyone used" <-- this is why assembly still survives. I still use a fair bit of it for commands that glibc doesnt wrap (i.e. RDTSC)



