Be careful abusing these tricks. Over 10 years ago I decided to implement an RC4 (arcfour) cypher to generate pseudorandom noise for a test program.
The algorithm looks like (from wikipedia):
    i := 0
    j := 0
    while GeneratingOutput:
        i := (i + 1) mod 256
        j := (j + S[i]) mod 256
        swap values of S[i] and S[j]
        K := S[(S[i] + S[j]) mod 256]
        output K
    endwhile
Being a smartass 1337 coder (and declaring intermediate variables always being a bit of a bother in C89), I decided to implement the swap using the xor trick:
    s[i] ^= s[j];
    s[j] ^= s[i];
    s[i] ^= s[j];
I then ran a few basic test vectors and everything seemed to work fine.
A little while later I noticed that the output of the cypher was weird: the longer it ran, the more zeroes I would get, eventually getting only zeroes.
The keen reader will already have spotted the problem: whenever i and j happen to be equal instead of doing nothing the code sets the entry to 0 (since x ^ x == 0 for any x).
And that's the story of how I never used this trick ever again.
In general it's always better to write clear, concise code and let the compiler optimize it. If it turns out that the code is performance critical and the compiler doesn't do a good enough job, then you can see if the smart solution is worth it.
And if you fail an interview because the interviewer expected you to know how to find a duplicate in an integer list by using xor then you probably dodged a bullet anyway. What a silly trivia question.
Not the values (the values being equal doesn't break the trick), but the pointers. That is, if you use the XOR trick to swap a value with itself, then it will be set to zero, instead.
At that point it's probably almost always better to just use some temporary space, rather than dealing with all the overhead of a branch (what if it's mispredicted, what about the resources in the branch predictor tied up by this that might cause something else to be mispredicted).
On the other hand, there are many situations where you can guarantee that x and y are not the same place in memory (for example because they are distinct local variables). There this trick might still be interesting (for the compiler or assembly programmer).
The xor swap trick is rarely better. It causes pipeline stalls, and so is likely to be significantly slower than the trivial swap. The temporary storage will be in a register, so it's (usually) not causing memory accesses.
Yeah, exactly this. It only really makes sense in some very limited situations in embedded software, for example when you might be in an interrupt handler and have no free registers.
On the topic of intermediate variables, when I'm forced to use c89 I'm really shameless about introducing extra code blocks in the middle of other blocks. For example,
    {int i; for(i=0; i<N; i++){
        /**/
    }}
Still bothersome but better than having to declare everything at the top of the function, IMO.
I do something like this in "modern" Java to get for-each with index:
    { int i = -1; for (V v : list) { i++;
        // stuff
    } }
While declaring i = -1 to start is a little gross, I like that it has some of the same properties as a normal for loop. No effect on parent scope, and all loop-logic is concentrated on the first (textual) line.
As an example, in our C++ codebase at work we always place mutex locks inside their own scope blocks along with the critical sections of code they’re synchronizing. Helps readability and the scope determines when the mutex is released.
Indeed! I somewhat over-use this in Rust to initialize an immutable variable with mutating code.
    let var: Vec<u32>; // this one is immutable, will be initialized later
    {
        let mut var_; // mutable
        … // initialize var_ with some mutating code
        var = var_; // now move var_ to var, initializing the latter
    }
The XOR trick also has the disadvantage of being 3 dependent instructions that cannot overlap. The one with a temp variable has only 2 of them dependent, and can therefore save one cycle on an out-of-order CPU.
> And that's the story of how I never used this trick ever again.
I agree with your point, and I would go on to say that this is an excellent example of the value of tests. Especially with something so relatively small and self-contained, which makes it easy to test.
I sometimes use your "bug" on purpose in tests, when checking that two values are either both defined or both undefined, but not one defined and one undefined:
    $ python
    >>> a, b = 1, 2
    >>> bool(a) ^ bool(b)
    False
    >>> a, b = 1, 1
    >>> bool(a) ^ bool(b)
    False
    >>> a, b = None, None
    >>> bool(a) ^ bool(b)
    False
    >>> a, b = None, 1
    >>> bool(a) ^ bool(b)
    True
(Note: this doesn't work if a value can be 0, because bool(0) is False in Python).
This helps to avoid having to write something like `(a is not None and b is not None) or (a is None and b is None)`.
Lesson for myself: check my working. The test I actually used the XOR in, that I was referring to, uses the first example you wrote - I agree, this is much better. However, I also like your second example, which is arguably clearer. Thanks!
Hah, I wrote that code on the fly and didn't check the aforementioned test implementation, where I had `(a is None) ^ (b is None)` like the other commenter suggested.
I like this as an example of how compilers cause bugs. This code looks reasonable and concise, but it shouldn't. If you were writing microcode for a non-superscalar processor, you would do two separate memory-to-register loads, then three xors, then two register-to-memory stores. But the compiler teaches us to use shorthand and it will fill in the details.

In this case the resulting code is ridiculous, with loads and stores on each xor because the compiler can't guarantee that i != j, and it would normally perform terribly. But we've architected CPUs around compilers to use caches that make up for these problems, and they are so prevalent that it's worth it to dedicate most of the CPU die area to branch prediction based on conventions of the C compiler, and most of the compiler code to conventions of the CPU branch predictor. It's getting quite ridiculous, and maybe it's time to move to architectures and languages that make everything explicit.
Edit: also, there is an xchg instruction that you have to pull out of intrinsics to use in C (good luck in most other HLL). It seems like a language should be more about how to use vocabulary than restricting it.
The "std::tie(b, a) = std::make_tuple(a, b)" is an attempt to replicate the Python style in C++... which, I agree, is certainly not clear, and, unlike using the "swap" algorithm, it's also unclear whether it will result in efficient code.
> because the interviewer expected you to know how to find a duplicate in an integer list by using xor
Maybe they expect you to come up with a solution right there. This way they can see that you're curious and trying to solve a problem even if you don't know its solution beforehand. (And by looking at your attempts they can see whether your mind is even working in the right direction. They could give you small hints and observe how you process them.)
I never conducted interviews myself, but some years ago my supervisor asked me for advice on interview problems for his interns. He wanted something that would help spot a person inclined to algorithmic thinking, but the job was not 100% algorithmic; he needed programmers. If he asked me the same thing today, I would advise him to look at these xor search problems.
> how to find a duplicate in an integer list by using xor
I've never heard of that trick and would never have thought of it on my own. It takes a moment's thought to see why it works, and I like it.
Bit twiddling is a bit like symbolic integration. There isn't a systematic approach, it's just an accumulation of formulae that people have stumbled upon over time. You can buy books of them. (These days, of course, you just use a program that has them all hardcoded into it.)
That suffers from the same problem if a and b are pointers to the same thing (which was the issue in my code, since I indexed the same cell of the array).
Note that the problem is not that a and b have the same value, or even that one of them is 0, it's that a and b alias to the same memory location, effectively being two handles to the same variable.
If you stick to multiplication and addition then the beauty of modular arithmetic ensures that over and underflow are only a problem if your final result under/overflows.
In particular you can solve the problem in this article by just subtracting the numbers from 1+2+...+(n-1)+n. The remainder will be your missing number (though be careful with the 1+2+...+(n-1)+n = n(n+1)/2 formula; division by 2 is very much not compatible with arithmetic modulo a power of 2, so divide first then multiply, or keep a running total if you want to be flexible).
Worse. The zero problem does not exist if the two variables are at different memory locations. The overflow problem of this solution always exists. Plus, if a and b are at the same memory location, it suffers from the same zero problem.
You're wrong. Remember that XOR works on individual bits. If what you're saying was true, then swapping bits that are both set to 1 would also fail, which means this algorithm wouldn't work at all.
Edit: Ignore this post, I misread the original post as saying swapping the same values would fail.
They're not saying swapping equal valued variables breaks it. It's when the pointer is the same, using the trick to swap a variable with itself will set the variable to 0.