The story I heard was that the author is one of those obsessive genius types who was having a debate about compiler efficiency, so he wrote an entire compiler-exploration tool to prove his point.
I can understand that.
This is a great article. Being able to descend into machine code/assembly is a valuable “tool of last resort.”
Personally, I avoid that like the plague, but an OS developer probably still needs to do this (I developed system stuff, back in the Dawn Times).
I started off with Machine Code, so it’s been a long, strange trip, for me...
IMHO you don't have to be obsessive to use godbolt.org in your daily work, nor should it be a "tool of last resort".
It really does help a lot to understand what the compiler will actually turn your high-level code into, because quite often the results are very surprising (if only to be shocked by the amount of assembly code that's generated from a few lines of C++ template code). Comparing the differences in output between different compilers and optimization levels is also very useful.
I consider Godbolt to be an educational and “thinking cap” tool, as opposed to a critical path optimization tool.
It is pretty damn cool, though.
It’s sort of like how an algebraic graphing app is most useful for students, as opposed to working engineers.
Nowadays, with on-chip threading, ASM stack dumps are a lot less useful than they used to be. Really, for me, munged function names are what I use to figure out the general vicinity of a bug, which I then pin down by examining the source.
I don't understand this tool. My compiler has a flag for generating assembly output. That should be sufficient, no? Also, it more accurately reflects what my compiler actually generates given all the other flags.
Your compiler does, yes. Godbolt gives you everyone else's compilers, across a huge range of versions and architectures, without having to install anything.
It's quite often easier to just slam something into godbolt than to do it locally with a temporary file and gcc -S.
An important feature of godbolt is quickly comparing the output and language feature support of different compilers, or different versions of the same compiler (e.g. what versions of MSVC support a specific C99 feature, or which new C++ feature is safe to use across clang, gcc and MSVC).
And most importantly, you can check what your code would look like when compiled for 6502 or Z80 ;)
I see, but I'm not convinced a web-tool is the best way to go about it if optimizing assembly is part of one's dayjob.
I do understand that it's nice to play with, and that it can have educational value. I would personally love to see WASM support, and a way to see e.g. LLVM intermediate code. But I'd want to see some more documentation on the website e.g. about the ABI so I know how the arguments of a function are passed in etc.
> I'm not convinced a web-tool is the best way to go about it if optimizing assembly is one's dayjob.
It isn't, because that's not what it's for. Besides, few people have optimizing assembly as a dayjob, and in that case you'd be writing it rather than reading it. Trying to juice a compiler into producing specific assembly output is frustrating and brittle. I suppose it's useful for seeing whether your autovectorisation has worked.
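For example (my own illustration, not something from the thread), a loop like this is the kind of thing worth pasting in at -O2 or -O3 to see whether the vectoriser kicked in:

/* If autovectorisation worked, the hot loop shows packed instructions
   (mulps/addps, or vmulps/vaddps with AVX) instead of scalar mulss/addss. */
void scale(float *dst, const float *src, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
}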
But the best use of godbolt - uniquely as a web tool - is to show other people what the output is.
(I wish more people would stop before saying "tool X is useless for workflow Y" and ask themselves what the intended workflow of tool X is. Not everything is for everybody, that would be impossible.)
> more documentation on the website e.g. about the ABI so I know how the arguments of a function are passed in etc
It just calls the underlying compiler executables. Consult their documentation.
There's quite a range between "looking at compiler output is sometimes useful" and "optimizing assembly is one's day job".
We use it all the time, it's certainly not just a toy.
I have the luxury of having a build farm with a large range of compilers and platforms available, so I could use that to rig up something like godbolt. But that'd mean spending probably days on setting something up, plus ongoing maintenance, instead of just using an existing thing that does the job. If this kind of thing were really all I did, it'd probably be worth it to be able to customize etc. But it's not. It being a service (and thus basically automatically a web tool) is a valuable feature.
It's also a great tool to share and communicate results of such investigations. Again, easy for a web tool.
It's eye-opening how the most trivial, functionally equivalent code generates practically identical native binary output regardless of the language, be it C, Java, JavaScript or .NET.
It's eye-opening because it shows how small the actual differences in the executed code really are.
When I first tried this about 10 years ago, I was very surprised by how good the code generated by JavaScript JITs was. I expected type checks, lots of unnecessary memory accesses, etc. all over, but to my surprise the inner loop was practically the same as the gcc -O2 output for an equivalent C program.
Of course the same is not true for bigger blocks of code, because JavaScript does need to do a lot of bookkeeping behind the scenes.
> Note that my tutorial uses the AT&T assembly language syntax instead of Intel syntax.
This makes me curious: how widely used is the AT&T syntax these days? I could easily be wrong, but my impression was that most work is done in the Intel syntax.
This may be ignorance on my part, as the Intel syntax is all I've ever used on Intel processors. I've coded in a dozen other assembly languages for various processors, but Intel chips seem to be the only ones with two different assembly language syntaxes, right down to the source/destination order.
It's still the same division as always. Linux and other UNIXes on x86 and x86_64 generally use AT&T syntax. The rest of the x86 world uses Intel syntax. And of course *nix has always been a minority on x86.
> I've coded in a dozen other assembly languages for various processors, but Intel chips seem to be the only ones with two different assembly language syntaxes,
It's a historical quirk. Generally, today if there is a widely used and accepted assembler syntax targeting the architecture, UNIX porters would use that assembler's syntax. GAS accepts normal ARM style syntax, for example.
But that's today. x86 is ancient. And so is UNIX on the x86. Going all the way back to the late 1970s, the 8086 was one of the first microprocessors that could host a decent UNIX environment. The rush was on as soon as the chip was released. (Most of the porting fervour would shift to the 68000 about 18 months later when that came out.)
There was no established assembler aside from the one from Intel. MASM wouldn't be written for another couple years. UNIX at this point already had a full quasi-portable toolchain, which of course was familiar to the porters. And they planned to live in an all-UNIX world anyway. The only assembly anyone would ever write would, hopefully, be a dozen lines in the kernel. So they just adapted what they had and what worked. The old PDP-11 UNIX assembler, backwards syntax and prefixes and all. And here we are 40 years later.
The M68k has two different syntaxes too. The argument order is still the same (src, dest; the correct order, in my opinion), but other things are different, the main difference being indirect addressing, which is (a0) in Motorola syntax but a0@ in AT&T syntax.
So while the changes might not be as huge, it's definitely not just Intel.
I’m going to ignore characters such as % and $, and also the parameter order (sort of).
There’s also the fact that the Intel and AMD x86 manuals list them as
MNEMONIC dst, src1
GAS does this:
MNEMONIC src1, dst
Which is fine, but what about 3- and 4-operand instructions? I always have to look that up. 8087 FPU instructions in GAS are also, due to a historical quirk, in “Intel” form:
MNEMONIC dst, src
Then there’s that whole indirect syntax of:
segment:displacement(base,index,scale)
Whereas “Intel” syntax is much clearer:
segment:[base + index*scale + displacement]
This leads to weird things like:
[ebx*2+10] ; I can see what that’s doing
10(,%ebx,2) ; I need to remember what the order is and convert it in my head
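Both spellings encode the same effective-address computation; as a reading aid (my own aside, not the parent’s), it is just:

#include <stdint.h>

/* The address computed by both "[base + index*scale + disp]" (Intel)
   and "disp(base, index, scale)" (AT&T): */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            uintptr_t scale, intptr_t disp) {
    return base + index * scale + (uintptr_t)disp;
}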
I’m not trying to convince anyone here. If you like “AT&T” syntax, that’s fine with me. I just prefer “Intel” syntax because of these reasons.
AT&T syntax is best syntax. Linux, GCC, LLVM, etc. all use it. Scott Wolchok writes it in a style that's more verbose than it needs to be. It's probably a good call since this is an educational blog post. Like there's some nuance to when you need suffixes like q on instructions, e.g. `movq %rdi,8(%rsp)`. For example, this is fine: `mov %rdi,8(%rsp)`, but you need the suffix if you move an immediate to memory, e.g. `movq $123,8(%rsp)`. I like to avoid them because mnemonics like `orb` don't appear to the untrained eye as the `or` instruction! The GNU-style suffixes can also appear multiple times within an instruction name in various combinations, e.g. pclmullqlqdq vs. pclmulhqhqdq.
I didn’t write it, I just copied the compiler output from Godbolt so that the article could be read on mobile, where Godbolt’s UI doesn’t work so well and switching tabs back and forth is difficult.
One thing is reading assembly language written by a human, another thing is reading assembly language generated by a compiler. Compilers are insane, especially when you turn on optimizations.
Another thing is that there is more than 1 asm syntax. Popular ones are the Intel syntax and the GNU assembly syntax. The way you read those is different.
If you want to get started in a simple way with little friction I suggest you do this.
1. Go to godbolt.org (Compiler Explorer)
2. Write a simple function (not a program; that will complicate things), such as one with no parameters that adds two hardcoded numbers. Like this:
int sum() {
    int result = 0;
    result += 1;
    return result;
}
3. Compile with no optimizations (-O0).
4. Read the output. Note that the colors represent each statement. Pay attention to that. Also, by hovering over an instruction for a few moments you can get a definition of what it does.
Then you can start making adjustments such as using parameters, adding control flow like if statements, invoking other functions, using float instead of int, etc.
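For instance (my own sketch, not part of the parent comment), a natural next step is giving the function parameters and watching how they arrive:

/* With gcc or clang on x86-64 at -O0, expect to see the arguments spilled
   from %edi and %esi to the stack, added, and the result returned in %eax;
   the exact instructions vary by compiler and version. */
int sum2(int a, int b) {
    return a + b;
}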
Another thing is calling conventions. Functions do not exist in assembly; they're an abstraction. There are many ways to represent a function call, and each platform's ABI picks one (which registers carry the arguments, what goes on the stack, who saves which registers).
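As a rough illustration of one such convention (assuming the x86-64 System V ABI used by Linux and macOS; the function itself is made up):

/* Under the x86-64 System V ABI, the first integer/pointer arguments arrive
   in %rdi, %rsi, %rdx, %rcx, %r8, %r9, and the integer result leaves in %rax. */
long weighted(long a, long b, long c) {   /* a -> %rdi, b -> %rsi, c -> %rdx */
    return a + 2 * b + 3 * c;             /* result -> %rax */
}
/* A caller loads those registers before `call` and reads %rax afterwards;
   which registers it must preserve across the call is also part of the convention. */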
(Edit: I just noticed (yes, I am slow) that you were probably referring to code written by a human for educational purposes. Sorry for the misunderstanding)
> One thing is reading assembly language written by a human, another thing is reading assembly language generated by a compiler. Compilers are insane, especially when you turn on optimizations.
In my experience, it's the opposite. Compiler-generated code has a very homogeneous global structure that is easy to understand even if the compiler does complex stuff at the instruction level. Human-written code does things that force me to be permanently on my guard concerning the control flow and the meaning of data, and I am not even talking about code (like malware) designed intentionally to be confusing. Examples from computer games:
- Calling a function X and wanting to jump directly to another function Y afterwards, by first pushing the address of Y on the stack and then calling X. Sometimes combined with a jump into the middle of another function.
- Manipulating stack frames, accessing local variables of the caller function, etc.
- Using the same location in memory to store completely different data types (imagine a C program using unions everywhere, even for a simple counter variable)
- Organising data at the bit level
- Something extremely popular in older games: Collection types with non-uniform elements. Imagine a kind of array or list where some elements are 4 bytes long and some are 6 bytes.
> Something extremely popular in older games: Collection types with non-uniform elements. Imagine a kind of array or list where some elements are 4 bytes long and some are 6 bytes.
There's this really funny reaction from the audience when Mike Acton suggests writing different types of scene graph nodes into a void* buffer to avoid unnecessary padding which would pollute the data cache. One of the Q&A questions at the end expresses distaste for it because it's using void* instead of proper types.
It's the objectively better solution in terms of performance, but it's just too yuck for many people.
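A rough sketch of that kind of layout in C (hypothetical node kinds, not Acton's actual code): variable-size records packed back to back in one byte buffer, with a leading tag that says how far to step.

#include <stdint.h>

enum { NODE_POS = 1, NODE_POS_ANGLE = 2 };   /* made-up node kinds */

/* Count records in a packed buffer where 5-byte and 7-byte records sit
   back to back with no padding between them. */
int count_nodes(const uint8_t *p, const uint8_t *end) {
    int n = 0;
    while (p < end) {
        switch (*p) {
        case NODE_POS:        p += 1 + 4; break;   /* tag + two int16 coordinates     */
        case NODE_POS_ANGLE:  p += 1 + 6; break;   /* tag + coordinates + int16 angle */
        default:              return -1;           /* unknown tag: bail out           */
        }
        n++;
    }
    return n;
}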
-O0 output is unnecessarily bloated (lots of pointless loading data into a register, saving it to the stack, and loading it back into the same register again), which also makes it hard to read.
I find -Og (on gcc) or -O1 to be the sweet spot.
Compile factorial with AVX enabled and the compiler will often go mad and churn out a 150-instruction body, even though anything but a small n overflows and thus invokes undefined behaviour anyway.
The major breakthrough I had in being able to comprehend assembly code was when I learned to look at it in basic blocks, rather than as a straight line. I don't know if that's super obvious (it seems obvious, written out that way), but just taking a listing and adding newlines at block boundaries is massively helpful to me.
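A tiny illustration (made-up function; the comment marks roughly where the boundaries fall):

/* Roughly how a compiler carves this into basic blocks:
     B0: entry        -- i = 0
     B1: loop test    -- i < n ?        (branch to B2 or B5)
     B2: body test    -- a[i] < 0 ?     (branch to B3 or B4)
     B3: early return -- return i
     B4: increment    -- i++            (jump back to B1)
     B5: exit         -- return -1
   Splitting the listing at the labels and branches gives you these chunks,
   which are far easier to follow than one long column of instructions. */
int first_negative(const int *a, int n) {
    for (int i = 0; i < n; i++)
        if (a[i] < 0)
            return i;
    return -1;
}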
That's definitely how a compiler does it, especially at the IR stage. It's much easier to track values within a localized region; this applies to both people and computers.
Sure, and I'm guessing I learned to read assembly this way from IDA Pro; I'd had to deal with assembly long before IDA was a thing, but I was never an especially able reader of other people's assembly.
But even today, like, when the eBPF verifier coughs up a hairball, I end up pulling the assembly into Emacs and then doing a quick pass marking up the basic blocks before I try to read it.
Back years ago, when I had much, much more free time, I binged the below to learn how to read x86 assembly and use things like gdb. It's a lot of information to take in, but I was getting into reverse engineering back then and have a bit of an obsessive personality, so I thought it was great. Still do, so I refer people to the resources anytime something like this pops up.
The perennial problem for me is the bifurcation from "AT&T" and "Intel" syntax. I don't really care if register names have a $, a %, or no prefix at all. It's the reversal of source and destination operands that is absolutely maddening.
Personally I prefer that destination operands are on the left, because that tracks with assignments in normal programming languages. I don't really understand why you would want the opposite.
And then apply a prefix syntax correction, to end up with
+= foo bar
(Or, equivalently, always mentally add the word "TO" to each operator that modifies an argument, which is most)
Also, most Intel instructions use one or two arguments as input and modify one of them as output. It sort of makes sense to have the output (or input/output) argument always be in the same position, regardless of whether an instruction takes 1 or 2 args. BTW, even in RISC-V, where "add" takes three args (i.e. the form is a = b + c), the output-only argument a comes first.
Yeah, this is how I think about it too. It's natural to have the destination operand always as the first one (i.e. left hand side). I think the GNU tools all use the AT&T syntax (destination is last), and I hate it.
AT&T simply adapted the assembly syntax of earlier machines, which ultimately reflected the architectures of even earlier hardware from a time when there weren't really higher-level languages.
Some bizarre high level conventions, like using '=' for assignment, were also adopted from earlier languages but are now commonplace.
A way to squint at assembly language: as backwards functional code where almost all subexpressions have been recursively extracted out into local definitions (except for leaf un- and binops) ... and then the resulting temporaries reused.
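A toy example of that squint (my own, in C rather than assembly): d = (a + b) * (a - c) becomes a chain of single-operation temporaries, much like the registers in the compiled output.

int combine(int a, int b, int c) {
    int t0 = a + b;     /* one temporary per leaf operation (an addl)   */
    int t1 = a - c;     /* another (a subl)                             */
    int t2 = t0 * t1;   /* temporaries, i.e. registers, then get reused */
    return t2;
}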
Why does assembly use cryptic instruction and variable names? imulq could be integer_multiply_64bit. Everyone has big monitors these days; there's no space that needs saving. Is it just historical inertia and/or “it’s not so bad; if you are smart like me, it’s easy”? Or are there good reasons to keep things so terse?
Because they're called mnemonics for a reason ;) They stop being cryptic after a few days of use, but you'd be stuck forever with overly long descriptive instruction names that repeat over and over and over again.
It's not about saving space, but about reducing visual noise and simplifying "visual pattern matching".
PS: Other CPUs do have somewhat friendlier assembly dialects (Motorola 68k comes to mind), but they all have short 3..5 letter mnemonics.
But that's approximately as true for other programming languages. (EDIT: Good point elsewhere in the discussion where it’s pointed out that assembly is significantly more verbose than nearly any other language so it’s desirable to keep it short.)
It'd be interesting to see an editor mode that could jump back and forth between a mnemonic view and a more descriptive one, perhaps even with argument labels to get rid of src vs. dest confusion.
I think much of the "modern confusion" around assembly code comes from being mainly exposed to raw disassembled compiler output instead of "sane" assembly code written by humans.
Back when writing assembly code was more or less mainstream, "high-level" macro assemblers were used to wrap assembly snippets into fairly advanced macros, which could lead to assembly code that was nearly on the same abstraction level as C: you could define structs and named constants, write complex constant expressions, and so on.
There were also dedicated assembly IDEs like ASM-ONE on the Amiga or Turbo Assembler on the PC, which made assembly programming quite comfortable (I guess the same can be achieved today with relatively little effort by writing a VSCode plugin).
The good reason for this terseness is that assembly is inherently verbose. A single line of code in a high-level language can easily become a dozen in assembly. Your proposal would be taking an already verbose language and making it even more verbose.
“imulq” is not just less typing than “integer_multiply_64bit”, it is less reading too.
If you are annoyed by something innocent like i-mul-q, then check SSE and AVX.
Btw, Intel syntax doesn’t have these b/w/l/q suffixes (it uses operand typing instead) and looks cleaner for simple instructions like mov, mul, add, and, etc. Personally, I don’t see any reason for them to be any wordier.
Upd: it is also easier to read when operands are aligned in the same column, because values move between registers from line to line.
There is a fixed set of instructions so anyone with a bit of experience will know what they mean. It’s different than being verbose with variable names that can be different in different programs.
There are a few CPUs, for which the official assembly language is based on algebraic expressions: instructions look like "R1 = R2 * 4", or "R4 = R1 AND R3". See for example the SHARC instruction set: https://fayllar.org/sharc-instruction-set.html
Algebraic assembly is obviously a brilliant idea, yet so few assemblers have followed suit. I guess the tradition of cryptic mnemonics is too strong...
I don't see how that would help much, as registers are a terrible way to name variables. I think you just want to avoid assembly as much as possible, in general.
Came here to comment and say the same thing. Assemblers already "know" what the opcode is doing (division, etc.) so it's not like it's a high-level-language intrusion to just write it out in a form which is readable in this manner.
The cryptic instruction names in assembly drive me nuts, too.
`integer_multiply_64bit` is much easier and faster for my eye to parse than `imulq`, which is a blob I have to stop and tease apart. I would also be fine with `imul_64` or `int_mul_64`; over time I could probably get used to `imul_q`.
But `imulq` jammed all together is a nope. I still hate `strrchr` and `atof` and all those silly names which parse poorly from the C standard library, and I've been programming C for many years.
I figure you could create a 1:1 assembly transpiler which does two things:
* Maps a bunch of aliases like `integer_multiply_64bit` to their official names (a toy sketch follows below).
* Uses parentheses, argument order, and dereferencing operators according to conventions which are more in line with what people are accustomed to seeing in popular modern programming languages.
Register naming seems a bit tougher, though, and I haven't figured out an approach for deriving intuitive aliases. Suggestions?
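The alias half is just a lookup table; a toy sketch (the long names are my inventions, only the right-hand mnemonics are real):

struct alias { const char *verbose; const char *mnemonic; };

const struct alias aliases[] = {
    { "integer_multiply_64bit",       "imulq" },
    { "move_64bit",                   "movq"  },
    { "jump_if_not_equal",            "jne"   },
    { "load_effective_address_64bit", "leaq"  },
};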
> “it’s not so bad. if you are smart like me, it’s easy”?
There are going to be responses that read like this, well-intentioned or no, but we just have to press on regardless.
The question is: why the effort to make assembly even more verbose?
People who write assembly (they still exist, I guess) will prefer "imulq" after a short time because it's much faster to type. Remember that a line like "a=2+b*f(x+1)" corresponds to 5 to 10 machine instructions.
People who have to read assembly don't really care because after a few minutes you know the mnemonics of the 20 most frequently used instructions (which constitute probably 99% of the code) anyway.
Computers that have to write or read a lot of assembly (some compilers do not directly generate binary machine code) are more efficient with a compact representation.
Okay, if we ignore the above cases, there are maybe five or six people in the world who might prefer "integer_signed_multiply_32bit" or "jump_relative_if_unsigned_less_or_equal".
> People who have to read assembly don't really care
I have to read assembly, and I do care. I'd much rather read something like imul.q than imulq. Assemblers could allow something like this by ignoring such periods or underscores, the same way that many modern programming languages allow you to write 1_000_000 as a synonym for 1000000.
Though a lenient assembler frontend wouldn't necessarily help me, since the assembly code I most often read is dumped by a disassembler I have no control over.
Each letter takes space, and back in the day there weren’t 8TB hard drives and 128-bit CPUs flowing from the distribution centers of Amazon with free 2-day shipping. You had to save space every way you could: mnemonic commands, tabs over spaces, etc...
Note to author: Because this is a personal blog, your RSS feed might as well include the entire post instead of the first few sentences. Just a suggestion.
This is what I get for duplicating the examples across Godbolt and the post itself. It was originally 3D for no particular reason and I cut it to keep things short.
This bothered me as well. The only thing I can think of is that one of the calls to libc functions is consuming rax off the stack. The code as displayed by Godbolt is the code as generated by clang, so it's not an error there.
EDIT: I've just noticed this line:
leaq -24(%rbp), %rsp
It does a manual pop of rax by restoring rsp to what it was before the prologue.
The calls to push and pop R15, R14, and RBX in the perilogue are manipulating the save area. The space below the save area is the locals area, and there are local variables in the function.
The inner save and restore of the stack pointer in R15 is adding further, block-scoped, locals to the locals area for the duration of that block, which is why it hasn't been combined with the outer perilogue (one of the things asked about in a footnote). If this function were more complex, the restoration of the original stack pointer from R15 would be well before the function epilogue, demonstrating the disconnect more clearly.
The actual epilogue begins with moving the stack pointer back up to the bottom of the save area, from wherever it happens to have been after the locals area was allocated, so that the save area can then be popped. In the absence of alloca() and variable-length arrays, this could be done by ADDing to the stack pointer register, as the size of the locals area is known by the compiler and fixed at compile time.
alloca() and variable-length arrays make things more complex, though, and this function contains at least one of those. Getting the amount to ADD right would involve retaining the sizes of the relevant alloca()s and variable-length arrays. (Also, compilers prefer LEA to ADD for the stack pointer, but for the sake of simplicity let's just treat this as ADD.)
Fortunately, the address of the bottom of the save area is known, and working out how much to ADD is thus unnecessary and the problem is moot. The bottom of the save area is by its nature at a fixed offset from the frame pointer, with a size known from which registers had to be saved, so the generated code just resets the stack pointer to that offset relative to the frame pointer, effectively popping off the entire locals area that way.
There's no reason to POP what was pushed from RAX back into the register, because that part of the stack is not part of the save area. It's a dummy local variable that ensures that the stack remains 16-byte aligned once all four of the return address, saved frame pointer, save area, and locals area are in place. As another footnote notes, PUSH RAX is just a quick way of subtracting from the stack pointer without using the ALU, which a SUB instruction would. RAX isn't actually being saved.
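A minimal C sketch of the kind of function being described (hypothetical code, not the article's): once a block contains a variable-length array or alloca(), the size of the locals area is no longer a compile-time constant, which is why the epilogue resets the stack pointer from a frame-pointer-relative address instead of ADDing a fixed amount.

#include <stddef.h>

int sum_of_squares(size_t n, const int *src) {
    int total = 0;
    {
        int tmp[n];                 /* VLA: frame size unknown at compile time */
        for (size_t i = 0; i < n; i++)
            tmp[i] = src[i] * src[i];
        for (size_t i = 0; i < n; i++)
            total += tmp[i];
    }   /* the saved stack pointer is restored here, at block scope,
           well before the function's own epilogue */
    return total;
}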
Ahh, I remember sweating bullets trying to defuse assembly bombs for labs in college. I love how looking at GDB starts out like literally staring at hieroglyphics, but by the end of the semester you get such an intuitive sense for things.
Haven't really 'used' that knowledge directly, and I don't think I will. But it taught me a ton about how computers work, which probably helps me somehow!
I've heard of people writing custom assembly to get more performance... what level of assembly expertise is required to achieve that? And, what does that look like?
Do you let the compiler emit assembly code and then start editing it to make it faster? Or, does it start from scratch? Are there known areas that the compiler has troubles with?
It's almost never 'assembly expertise' - it's architecture expertise - knowing a specific instruction you want to use in a specific way to get some kind of fusion or fill some functional unit using some knowledge that the compiler doesn't have.
The actual mechanics of how to write a program in assembly are trivial if you already know a language like C.
The few times I've been stuck writing code for a platform without access to a compiler, I just wrote it in pseudo-C then hand-compiled it. Once I got comfortable I could "compile" on full mental autopilot, checked out and watching TV at the same time. But it's horribly dull work, like doing arithmetic by hand.
I’d be happy to port the article to ARM64 if there was sufficient demand for that, but I think it’s still fairly niche among developers, right? (Apple, recent Android, niche server environments?)
A subset of 32-bit ARM is quite common these days in education, often replacing where MIPS was once used.
ARM64 has really swept in over the last couple of years. My two-year-old budget Android smartphone has an ARM64 chip. So does my Raspberry Pi. Lots of hardware hacker types likely to play around with assembly would be using ARM, and often 64-bit these days.
That said, it is relatively niche. A lot of those ARM64 machines are running 32 bit software still. But I do expect it to be the dominant architecture in just a couple years.