Wait, this isn't right is it?: void execute_loop(int & data, const size_t loop_s...

Jtsummers · on Nov 19, 2021

The loop size is all that matters. The thing that's being optimized is just:

  for (int i = 0; i < loop_size; i++) {
    some_work();
  }
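Unrolled with Duff's device, that loop doesn't need an index at all. Here is a minimal sketch that runs the body exactly `loop_size` times, unrolled by 8. It assumes a hypothetical `some_work(int &)` that just bumps a counter, since the real body is unspecified, and `execute_loop_duff` is likewise a made-up name:

```cpp
#include <cstddef>

// Hypothetical stand-in for the loop body: the real work is unspecified,
// so this just increments a counter passed by reference.
inline void some_work(int &data) { ++data; }

// Runs some_work(data) exactly loop_size times using Duff's device,
// unrolled by 8. No index variable appears in the body.
void execute_loop_duff(int &data, std::size_t loop_size) {
    // The do/while always runs at least once, so 0 must be handled separately.
    if (loop_size == 0) return;
    std::size_t passes = (loop_size + 7) / 8;  // trips through the 8-way body
    switch (loop_size % 8) {                   // first pass covers the leftovers
    case 0: do { some_work(data);
    case 7:      some_work(data);
    case 6:      some_work(data);
    case 5:      some_work(data);
    case 4:      some_work(data);
    case 3:      some_work(data);
    case 2:      some_work(data);
    case 1:      some_work(data);
            } while (--passes > 0);
    }
}
```

The `switch` jumps into the middle of the `do`/`while` to handle the `loop_size % 8` leftover calls on the first pass. After that, each trip runs eight calls for a single test-and-branch.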

Whatever some_work may be, you want to unroll the loop to avoid most of the jumps. It could be an operation like:

  c[i] = a[i] + b[i];

In that case you'd do what you suggest. But it could also just be some repeated operation on an entire data set, like a simulation where you want to run loop_size iterations.
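For the indexed case, the same structure just carries the index along. A sketch, assuming plain `int` arrays of length `n` (the name `add_arrays_duff` is made up for illustration):

```cpp
#include <cstddef>

// Computes c[i] = a[i] + b[i] for every i in [0, n), unrolled by 8 with
// Duff's device. Because the body depends on the index, each unrolled
// copy advances i by one.
void add_arrays_duff(int *c, const int *a, const int *b, std::size_t n) {
    if (n == 0) return;                  // the do/while would otherwise run once
    std::size_t i = 0;
    std::size_t passes = (n + 7) / 8;    // trips through the 8-way body
    switch (n % 8) {                     // first pass handles the n % 8 leftovers
    case 0: do { c[i] = a[i] + b[i]; ++i;
    case 7:      c[i] = a[i] + b[i]; ++i;
    case 6:      c[i] = a[i] + b[i]; ++i;
    case 5:      c[i] = a[i] + b[i]; ++i;
    case 4:      c[i] = a[i] + b[i]; ++i;
    case 3:      c[i] = a[i] + b[i]; ++i;
    case 2:      c[i] = a[i] + b[i]; ++i;
    case 1:      c[i] = a[i] + b[i]; ++i;
            } while (--passes > 0);
    }
}
```

With optimization enabled, compilers such as GCC and Clang will often unroll or vectorize the plain version of this loop on their own, so it's worth checking the generated code before hand-rolling it.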

wang_li · on Nov 19, 2021

>There needs to be a dependency on the index for duff's device to work right?

For it to work "right", the work inside the loop has to be quite minor. The point is that it saves CPU cycles by reducing the number of times the loop evaluates i < loop_size and i++, and the number of times it has to branch back to the beginning of the loop. It's almost always some kind of memory copy, or a simple read, lookup, and write, where the loop body completes in a few CPU cycles.
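To make the savings concrete, here is an instrumented sketch that counts how many times each version evaluates its loop-continuation test. The `LoopStats`, `plain_loop`, and `duff_loop` names are hypothetical. These are source-level counts, not measured cycles, and the Duff's version also pays for one computed jump into the `switch`:

```cpp
#include <cstddef>

// Tally of loop-control work, for illustration only.
struct LoopStats {
    std::size_t body_runs = 0;  // times the (trivial) body executed
    std::size_t tests = 0;      // times the loop-continuation test was evaluated
};

// Plain loop: one test (i < n), one increment, and one backward branch
// per iteration, plus the final failing test.
LoopStats plain_loop(std::size_t n) {
    LoopStats s;
    std::size_t i = 0;
    while (true) {
        ++s.tests;
        if (!(i < n)) break;
        ++s.body_runs;  // stand-in for some_work()
        ++i;
    }
    return s;
}

// Duff's device unrolled by 8: one test per pass through eight bodies.
LoopStats duff_loop(std::size_t n) {
    LoopStats s;
    if (n == 0) return s;
    std::size_t passes = (n + 7) / 8;
    switch (n % 8) {
    case 0: do { ++s.body_runs;
    case 7:      ++s.body_runs;
    case 6:      ++s.body_runs;
    case 5:      ++s.body_runs;
    case 4:      ++s.body_runs;
    case 3:      ++s.body_runs;
    case 2:      ++s.body_runs;
    case 1:      ++s.body_runs;
                 ++s.tests;  // counts the while test that follows
            } while (--passes > 0);
    }
    return s;
}
```

For `n = 1000`, `plain_loop` reports 1001 tests (one per iteration plus the final failing one), while `duff_loop` reports 125, i.e. `(n + 7) / 8`. Both run the body 1000 times.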

If the work inside the loop costs 100x as much as the loop overhead itself, there is no point.
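A rough cost model puts numbers on that. Assume, as a simplification, that each iteration costs w for the body plus o for the loop overhead (test, increment, branch), and that unrolling by a factor k divides the overhead by k, ignoring the cost of handling the remainder. The fraction of time saved is then:

$$
s = \frac{o - o/k}{w + o} = \frac{o\,(1 - 1/k)}{w + o}
$$

$$
w = 100\,o,\ k = 8:\quad s = \frac{7/8}{101} \approx 0.87\% \qquad\qquad w = o,\ k = 8:\quad s = \frac{7/8}{2} \approx 44\%
$$

So the same 8x unroll that matters for a one-cycle copy is noise when the body dominates.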