
Wait, this isn't right, is it?

  void execute_loop(int & data, const size_t loop_size) {
    for (int i = 0 ; i < loop_size/4 ; ++i) {
      computation(data);
      computation(data);
      computation(data);
      computation(data);
    }
  }

Shouldn't it be something like this?

  void execute_loop(int data[], const size_t loop_size) {
    for (int i = 0 ; i < loop_size/4 ; ++i) {
      computation(data[4*i]);
      computation(data[4*i+1]);
      computation(data[4*i+2]);
      computation(data[4*i+3]);
    }
  }

Something like that? There needs to be a dependency on the index for Duff's device to work, right?




The loop size is all that matters. The thing that's being optimized is just:

  for (int i = 0; i < loop_size; i++) {
    some_work();
  }
Whatever some_work may be, you want to unroll the loop to avoid the jumps. It could be an operation like:

  c[i] = a[i] + b[i]
In which case you'd do as you suggest. But it could also just be some repeated operation on an entire data set, like in a simulation where you want to run loop_size iterations.


> There needs to be a dependency on the index for Duff's device to work, right?

For it to work "right", the work inside the loop has to be quite minor. The point is that it saves CPU cycles by reducing the number of times it evaluates i < loop_size and i++, and the number of times it has to branch back to the beginning of the loop. It's almost always some kind of memory copy, or a simple read, lookup, and write, where the inside of the loop completes in a few CPU cycles.

If the work inside the loop costs 100x as much as the loop overhead itself, there is no point.
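For reference, here is a minimal sketch of the memory-copy case written as an actual Duff's device, unrolled by 4. The name duff_copy and its signature are hypothetical. The switch jumps into the middle of the do/while to handle the remainder on the first pass, so no separate tail loop is needed. A modern compiler will often unroll a plain loop on its own, so treat this as an illustration rather than a recommendation.

```cpp
#include <cstddef>

// Hypothetical copy routine (illustration only): Duff's device, unrolled by 4.
void duff_copy(int* dst, const int* src, std::size_t count) {
    if (count == 0) return;                // the do/while below always runs at least once
    std::size_t passes = (count + 3) / 4;  // trips through the do/while, rounded up
    switch (count % 4) {                   // enter mid-loop to copy the remainder first
    case 0: do { *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

With count = 5, the switch enters at case 1 and copies one element, then the loop runs one full pass of four. Each copy is a single cheap statement, which is exactly the situation where the saved compare and branch are a noticeable fraction of the total work.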



