Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Memcmp is the most efficient way to compare large chunks of data (mymicrocontroller.com)
3 points by arash9k on May 4, 2018 | hide | past | favorite | 4 comments


The author's proposed for-loop version is:

  int *p = (int *)0x80000000;
  int *q = (int *)0x90000000;
  while(p < q){
	if(*p++ != *q++){
		flag = 1;
	}	
  }
There must have been a transcription error as that while loop doesn't look like it will exit until q reaches the end of memory and wraps back to 0.

On my machine, the following takes about 161 ms (that's the best of 5 timings on a non-quiet machine):

  result = memcmp(p, q, SIZE);
The author's for-loop is optimistic, and doesn't short-circuit exit on mismatch, so I'll replace it with a non-branching equivalent:

  result = 0;
  for (int i=0; i<SIZE/sizeof(long); i++) {
      result = result || (p[i] != q[i]);
  }
That takes 310 ms (compiled with Apple LLVM version 8.0.0, using -O3). About 1/2 the performance of built-in.

Replace the int with a long and it's 200ms. Manually unroll by 4 and it's 175ms. My final code is:

  result = 0;
  for (int i=0; i<SIZE/sizeof(long); i+=4) {
     result = result
       || (p[i] != q[i])
       || (p[i+1] != q[i+1])
       || (p[i+2] != q[i+2])
       || (p[i+3] != q[i+3]);
  }

That's only somewhat slower than the memcmp, not the 10x that the author reports.

Of course, memcmp does short-circuit evaluation, and I know the size is a multiple of sizeof(long), so this test case is biased in favor of my code.

I would, of course, always favor the tested, often optimized library code over rolling my own unless there are good reasons otherwise.


Op implemented a poor version of timingsafe_memcmp, which is of course slow. His other HN post today is also full of such errors.

I even don't agree with the major premise that the libc version should be always preferred. In fact memcmp and memcpy can be made much faster.

At http://blogs.perl.org/users/rurban/2014/08/perfect-hashes-an... I wrote a jit like memcmp with switching on wordsized chunks which is 200x faster than memcmp.

And I benchmarked my safelibc implementation of memcpy_s with a good compiler, clang >4, beating the handoptimized and unsafe glibc assembler version by ~5%. gcc stinks there. https://github.com/rurban/safeclib/blob/master/tests/perf_me... Let the compiler optimize it.

But I saw much faster memcpy than this.


Sure, they can be made faster.

I'm not going to spend my life on those sorts of optimizations unless there are good reasons.


You don't see the point. The point is to write proper loops in C which the compiler can optimize. This the fastest way and benefits all platforms. Special cases which are actually slower should be removed. Compilers which are not able to optimize proper loop patterns should be fixed and documented as such. Haven't checked gcc-8 yet, but older versions should not be used for such core functions. libc should be compiled with clang.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: