
As mentioned in this article, x ^ x == 0. Fun fact, this is frequently used by compilers as a "cheap" way to zero out a register.

In addition, there are comparatively few cases in programming where we XOR. Sure, it happens in things like games quite a lot, but the main use is actually _cryptography_.

Between these two facts (more like hints really), I managed to reverse engineer the bulk of a piece of malware I was given to analyse in an internship. I was handed the malware and a copy of IDA Pro, and given a few days to see what I could find. All I could remember when presented with a wall of hex-encoded machine code were the hints above. I looked for XORs of different values, assumed it was crypto, and extrapolated from there. I found a routine happening three times in quick succession and guessed it was triple-DES. Then I guessed that writing your own 3DES from scratch was unlikely, so I googled for crypto libraries and happened to find one that nearly matched (I think an earlier/unmodified version), and worked my way up tagging the operations until I got to the purpose: exfiltrating various registry keys and browser history to [somewhere].

It was a fun exercise, so I'm sure these facts will stay with me for far longer than they remain accurate!




Xor'ing registers isn't a compiler trick or arcane piece of lore, it's the canonical way to zero a register on most architectures. It's the only universally recommended way for both Intel and AMD x86 and x64 processors.


Not necessarily true for most architectures. Many RISC architectures have a "zero" register, so a canonical "move short immediate" instruction could be "OR Rn, R0, #n" ("OR R0 which is always zero with n and store into Rn") or the same with ADD. Then clearing Rn will usually be "OR Rn, R0, R0", "ADD Rn, R0, #0" or something like that.


I should have been clearer: when I say most architectures I mean most x86/x64 architectures, not most ISAs. Obviously ISAs with a dedicated zero register don't need the zeroing idioms of x86.


In that case it's all of them. Some might like having both a MOV and a XOR in some special cases (one to avoid partial register stalls and one to avoid partial flags stalls, if I remember correctly) but even in those cases it's usually easy enough to avoid partial register stalls in some other way and just use XOR.


In fact, modern x86 CPUs know that the result of "x XOR x" is independent of x, and use this fact to optimize operations that would otherwise have a dependency.


You are correct, but using programmers' terminology here is confusing.

The opcodes 31 C0 mean "Set EAX register to zero"

The opcodes 31 D8 mean "Set EAX to EAX xor EBX"

The trick part is in the mnemonics used by assemblers and literature that transcribe the first as "XOR EAX, EAX".

It isn't actually implemented as such, so it isn't really an "optimization"


I'm not sure what you mean by "programmers' terminology".

0x31 0xC0 is disassembled to "xor eax, eax".

This is not nitpicking, as there's an important difference. The "xor eax, eax" instruction affects CPU flags [1], while "mov eax, 0" doesn't [2].

> It isn't actually implemented as such

The implementation is independent of the meaning of the instruction set. There are many implementations of x86 instructions with differing levels and kinds of optimization, so we can't make general statements about that.

[1] https://c9x.me/x86/html/file_module_x86_id_330.html

[2] https://c9x.me/x86/html/file_module_x86_id_176.html


You make it sound as if there were four Intel/AMD x86/x64 architectures. And well, there aren’t. The vendor split is not a thing, we’re all running the same software on Intel and AMD x86/x64 processors. And you could argue that the x86/x64 split doesn’t really matter for this, since x64 is a superset of x86 and inherits this tradition from the 16-bit and 32-bit eras.


There are dozens of x86/x64 uarchs; x86 is just an ISA, and it doesn't say anything about how the chip is built.


Thanks! Learning even more!


I feel like xor comes up all the time in programming! Perhaps generations that have almost entirely focused on web front ends and backends never have use for it, but the libraries underneath it all certainly do.


xor reg, reg is indeed the standard way to zero out a register in x86 assembly (it's not just a compiler trick) as it is both shorter and faster than loading the register with zero via a mov.

Apart from that, I'd say that a common use of XOR operations in general is interaction with hardware peripherals, where manipulating bit fields is needed.


Why does mov'ing a constant end up having overhead?


It's not overhead, it's about dependency breaking. 32-bit xors on a single register are universally recognized as a zeroing idiom, which means the CPU doesn't have to wait for the results of previous operations in order to set the value of the applicable register to zero.

In modern CPUs zeroing idioms aren't even executed; they only get as far as the register allocator. The register allocator will allocate a zeroed physical register for the architectural register that had the idiom applied to it, and the job is done.


Could you share a source about register allocation describing optimizations such as the one you described?


You might find what you are looking for by googling "register renaming"


The magic words for this is "zeroing idiom".


It's larger, because it needs to fit a 32-bit value of 0 in the instruction, and thus e.g. on x86 needs 5 bytes, whereas xor reg,reg needs 2. As such it was a common code size optimization, which in turn has led to CPU manufacturers optimizing their CPUs to recognize it and treat it even more efficiently.
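To make the size difference concrete, here's a minimal C sketch with the usual x86 encodings in the comments (an optimizing compiler targeting x86 will typically pick the xor form):

    #include <stdint.h>

    /* Zeroing a 32-bit value. Typical x86 encodings:
         mov eax, 0     ->  B8 00 00 00 00   (5 bytes: opcode + imm32)
         xor eax, eax   ->  31 C0            (2 bytes: opcode + ModRM)
       The xor form is smaller, and modern CPUs also recognize it as a
       dependency-breaking zeroing idiom. */
    uint32_t zero(void) {
        return 0;
    }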


A comprehensive answer:

https://stackoverflow.com/questions/33666617/what-is-the-bes...

Basically, xor leads to smaller code and more efficient use of resources.


IIRC immediate operands have to be the same size as the destination, so to zero a 32-bit register you need a 32-bit constant, and it simply makes the whole instruction larger than a simple xor.


A somewhat common use of XOR is p != q on booleans: testing two truth values for inequality is actually an XOR.

Another useful way to think of XOR is "either p or q is true, but not both, and not neither."


> Another useful way to think of XOR is "either p or q is true, but not both, and not neither."

That's literally what eXclusive OR means. This should be the first way to think about it.


I think most people think of XOR as “difference of bits”, which it is.


FWIW, I mostly think of it as addition in Z_2^n.


That is more traditional. It's addition without carry.


Although in Z_2^n, "addition without carry" and "subtraction without carry" are the same operation.
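A tiny C check of both facts, with arbitrarily chosen values:

    #include <assert.h>
    #include <stdint.h>

    int main(void) {
        uint8_t a = 0x6, b = 0x3;        /* 0110 and 0011 */
        assert((a ^ b) == 0x5);          /* per-bit sum mod 2: 0101, no carries */
        assert(((a ^ b) ^ b) == a);      /* xor-ing b again undoes it: every
                                            element is its own additive inverse */
        return 0;
    }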


Compilers can optimize "a != b" to "a xor b" if they know that both operands are 0 or 1. There are crazy many expression rewriting rules.


They don’t need to be 0 or 1. We already know that a ^ a = 0, a ^ 0 = a, and a ^ (b ^ c) = (a ^ b) ^ c, so a ^ b = 0 exactly when a = b. Therefore, in the C convention where 0 means false, a ^ b = not (a = b) = (a != b) (this equality only holds for expressions that are going straight into a Boolean operation, or a test in e.g. an if statement, as a != b proper should always evaluate to 0 or 1).

The compiler may use this by xoring a and b and then jumping if the zero flag was set.
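For illustration, a small C sketch of the equivalence (not what any particular compiler is guaranteed to emit):

    #include <stdbool.h>
    #include <stdint.h>

    /* a ^ b is zero exactly when a == b, so inequality can be tested by
       xor-ing and checking whether the result is zero (on x86, a branch
       on the zero flag). */
    bool differs(uint32_t a, uint32_t b) {
        return (a ^ b) != 0;    /* same truth value as a != b */
    }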


> (this equality only holds for expressions that are going straight into a Boolean operation

Right, though that's more commonly turned into just a cmp instruction (a subtraction).

However, if the result is assigned to a variable, on x86 "a != b" would be two or three instructions (possibly clearing a register because setne takes an 8-bit register operand only, and then a cmp/setne pair). Instead the xor would be one instruction, or two if a mov is needed.


"In addition, there are comparatively few cases in programming where we XOR. Sure, it happens in things like games quite a lot, but the main use is actually _cryptography_"

I'd be curious if that's actually true. I know XOR is used for non-cryptographic checksums, parity bits, maintaining key traversal order for associative arrays, overflow detection, etc. Lots of general purpose "stuff" that isn't cryptography.
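As one concrete non-crypto example, a plain xor (longitudinal parity) checksum, sketched here with illustrative names:

    #include <stddef.h>
    #include <stdint.h>

    /* Xor every byte together: flipping any single bit of the input
       flips the corresponding bit of the result. */
    uint8_t xor_checksum(const uint8_t *data, size_t len) {
        uint8_t acc = 0;
        for (size_t i = 0; i < len; i++)
            acc ^= data[i];
        return acc;
    }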


XOR is used a ton in the theoretical underpinnings of cryptography. It's used in the one-time pad, which is essentially the "smallest" cryptographic scheme that is perfectly secure ("perfectly secure" has a mathematical definition in this context; it's not saying there can never be any attacks).

In general the reason why is that if you have two random variables x and y, where x has any distribution (so for example x could even be "attack Normandy on June 6" with certainty) and y is uniformly distributed across all n-bit strings and independent of x (so it could be any string of n zeros and ones with equal probability), then you can show that x ^ y is also uniformly distributed across all n-bit strings.

Because of this property it's used frequently in many higher order methods as well.
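A minimal sketch of that property in use, i.e. a one-time pad in C (names are illustrative; the pad must be uniformly random, as long as the message, and never reused):

    #include <stddef.h>
    #include <stdint.h>

    /* One-time pad: xor the message with a random pad of the same length.
       The ciphertext is itself uniformly distributed, so it reveals nothing
       about the message. Applying the same xor again decrypts. */
    void otp_xor(uint8_t *msg, const uint8_t *pad, size_t len) {
        for (size_t i = 0; i < len; i++)
            msg[i] ^= pad[i];
    }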


A lot of that sort of thing is abstracted away by common standard libraries. I don't tend to run into a whole lot of XOR on a day-to-day basis, apart from when digging into low-level libraries.


Hashing is another big use case.

For example, XOR is the most convenient combining function for Zobrist hashes in chess programs. OR/AND would be bad choices: given random input, their output is biased to the values 1 and 0 (respectively).
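A rough sketch of how that looks in a chess program (the table and names are illustrative, not from any particular engine):

    #include <stdint.h>

    #define PIECES  12   /* 6 piece types x 2 colours */
    #define SQUARES 64

    /* One random 64-bit key per (piece, square), generated once at startup. */
    extern uint64_t zobrist_keys[PIECES][SQUARES];

    /* Moving a piece updates the position hash incrementally: xor out the old
       (piece, square) key, xor in the new one. Since xor is its own inverse,
       unmaking the move is the exact same operation. */
    uint64_t update_hash(uint64_t hash, int piece, int from, int to) {
        hash ^= zobrist_keys[piece][from];
        hash ^= zobrist_keys[piece][to];
        return hash;
    }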



