Be careful abusing these tricks. Over 10 years ago I decided to implement an RC4 (arcfour) cypher to generate pseudorandom noise for a test program.
The algorithm looks like (from wikipedia):
    i := 0
    j := 0
    while GeneratingOutput:
        i := (i + 1) mod 256
        j := (j + S[i]) mod 256
        swap values of S[i] and S[j]
        K := S[(S[i] + S[j]) mod 256]
        output K
    endwhile
Being a smartass 1337 coder (and declaring intermediate variables always being a bit of a bother in C89), I decided to implement the swap using the xor trick:
    s[i] ^= s[j];
    s[j] ^= s[i];
    s[i] ^= s[j];
I then ran a few basic test vectors and everything seemed to work fine.
A little while later I noticed that the output of the cypher was weird: the longer it ran, the more zeroes I would get, eventually getting only zeroes.
The keen reader will already have spotted the problem: whenever i and j happen to be equal instead of doing nothing the code sets the entry to 0 (since x ^ x == 0 for any x).
And that's the story of how I never used this trick ever again.
In general it's always better to write clear, concise code and let the compiler optimize it. If it turns out that the code is performance critical and the compiler doesn't do a good enough job, then you can see if the smart solution is worth it.
And if you fail an interview because the interviewer expected you to know how to find a duplicate in an integer list by using xor then you probably dodged a bullet anyway. What a silly trivia question.
Not the values (the values being equal doesn't break the trick), but the pointers. That is, if you use the XOR trick to swap a value with itself, then it will be set to zero, instead.
At that point it's probably almost always better to just use some temporary space, rather than dealing with all the overhead of a branch (what if it's mispredicted, what about the resources in the branch predictor tied up by this that might cause something else to be mispredicted).
On the other hand, there are many situations where you can guarantee that x and y are not the same place in memory (for example because they are distinct local variables). There this trick might still be interesting (for the compiler or assembly programmer).
The xor swap trick is rarely better. It causes pipeline stalls, and so is likely to be significantly slower than the trivial swap. The temporary storage will be in a register, so it's (usually) not causing memory accesses.
Yeah, exactly this. It only really makes sense in some very limited situations in embedded software, for example when you might be in an interrupt handler and have no free registers.
On the topic of intermediate variables, when I'm forced to use c89 I'm really shameless about introducing extra code blocks in the middle of other blocks. For example,
    {int i; for(i=0; i<N; i++){
        /**/
    }}
Still bothersome but better than having to declare everything at the top of the function, IMO.
I do something like this in "modern" Java to get for-each with index:
    { int i = -1; for (V v : list) { i++;
        // stuff
    } }
While declaring i = -1 to start is a little gross, I like that it has some of the same properties as a normal for loop. No effect on parent scope, and all loop-logic is concentrated on the first (textual) line.
As an example, in our C++ codebase at work we always place mutex locks inside their own scope blocks along with the critical sections of code they’re synchronizing. Helps readability and the scope determines when the mutex is released.
Indeed! I somewhat over-use this in Rust to initialize an immutable variable with mutating code.
    let var: Vec<u32>; // this one is immutable, will be initialized later
    {
        let mut var_; // mutable
        … // initialize var_ with some mutating code
        var = var_; // now move var_ to var, initializing the latter
    }
The XOR trick also has the disadvantage of being 3 dependent instructions that cannot overlap. The one with a temp variable has only 2 of them dependent, and can therefore save one cycle on an out-of-order CPU.
> And that's the story of how I never used this trick ever again.
I agree with your point, and I would go on to say that this is an excellent example of the value of tests. Especially with something so relatively small and self-contained, which makes it easy to test.
I sometimes use your "bug" on purpose in tests, when checking that two values are either both defined or both undefined, but not one defined and one undefined:
    $ python
    >>> a, b = 1, 2
    >>> bool(a) ^ bool(b)
    False
    >>> a, b = 1, 1
    >>> bool(a) ^ bool(b)
    False
    >>> a, b = None, None
    >>> bool(a) ^ bool(b)
    False
    >>> a, b = None, 1
    >>> bool(a) ^ bool(b)
    True
(Note: this doesn't work if a value can be 0, because bool(0) is False in Python).
This helps to avoid having to write something like `(a is not None and b is not None) or (a is None and b is None)`.
Lesson for myself: check my working. The test I actually used the XOR in, that I was referring to, uses the first example you wrote - I agree, this is much better. However, I also like your second example, which is arguably clearer. Thanks!
Hah, I wrote that code on the fly and didn't check the aforementioned test implementation, where I had `(a is None) ^ (b is None)` like the other commenter suggested.
I like this as an example of how compilers cause bugs. This code looks reasonable and concise, but it shouldn't. If you were writing microcode for a non-superscalar processor, you would do two separate memory-to-register loads, then three xors, then two register-to-memory stores. But the compiler teaches us to use shorthand and it will fill in the details.

In this case the resulting code is ridiculous, with loads and stores on each xor because the compiler can't guarantee that i != j, and it would normally perform terribly. But we've architected CPUs around compilers to use caches that make up for these problems, and they are so prevalent that it's worth it to dedicate most of the CPU die area to branch prediction based on conventions of the C compiler, and most of the compiler code to conventions of the CPU branch predictor. It's getting quite ridiculous, and maybe it's time to move to architectures and languages that make everything explicit.
Edit: also, there is an xchg instruction that you have to pull out of intrinsics to use in C (good luck in most other HLL). It seems like a language should be more about how to use vocabulary than restricting it.
The "std::tie(b, a) = std::make_tuple(a, b)" is an attempt to replicate the Python style in C++... which, I agree, is certainly not clear, and, unlike using the "swap" algorithm, it's also unclear whether it will result in efficient code.
> because the interviewer expected you to know how to find a duplicate in an integer list by using xor
Maybe they expect you to come up with a solution right there. This way they can see that you're curious and trying to solve a problem even if you don't know its solution beforehand. (And by looking at your attempts they can see whether your mind is even working in the right direction. They could give you small hints and observe how you process them.)
I never conducted interviews myself, but some years ago my supervisor asked me for advice on interview problems for his interns. He wanted something that would help spot a person inclined to algorithmic thinking, but the job was not 100% algorithmic; he needed programmers. If he asked me the same thing today, I would advise him to look at these xor search problems.
> how to find a duplicate in an integer list by using xor
I've never heard of that trick and would never have thought of it on my own. It takes a moment's thought to see why it works, and I like it.
Bit twiddling is a bit like symbolic integration. There isn't a systematic approach, it's just an accumulation of formulae that people have stumbled upon over time. You can buy books of them. (These days, of course, you just use a program that has them all hardcoded into it.)
That suffers from the same problem if a and b are pointers to the same thing (which was the issue in my code, since I indexed the same cell of the array).
Note that the problem is not that a and b have the same value, or even that one of them is 0, it's that a and b alias to the same memory location, effectively being two handles to the same variable.
If you stick to multiplication and addition then the beauty of modular arithmetic ensures that over and underflow are only a problem if your final result under/overflows.
In particular you can solve the problem in this article by just subtracting the numbers from 1+2+...+(n-1)+n. The remainder will be your missing number (though be careful with the 1+2+...+(n-1)+n = n(n+1)/2 formula; division by 2 is very much not compatible with arithmetic modulo a power of 2, so divide first then multiply, or keep a running total if you want to be flexible).
Worse. The zero problem does not exist if the two variables are at different memory locations. The overflow problem of this solution always exists. Plus, if a and b are at the same memory location, it suffers from the same zero problem.
You're wrong. Remember that XOR works on individual bits. If what you're saying was true, then swapping bits that are both set to 1 would also fail, which means this algorithm wouldn't work at all.
Edit: Ignore this post, I misread the original post as saying swapping the same values would fail.
They're not saying swapping equal valued variables breaks it. It's when the pointer is the same, using the trick to swap a variable with itself will set the variable to 0.