
> And why 32 entries? I ran this benchmark with a bunch of different bucket sizes and 32 worked well. I have no idea why that worked out to be the best.

If you were using 2-byte ints, this is likely because cache lines are 64 bytes, so a 32-entry bucket is exactly one cache line. Each cache line then holds an entire bucket, reducing those expensive main-memory transfers.




I really like the way Knuth benchmarks many of his later programs. He essentially keeps a counter of how many times something has to be loaded from memory. I'd be curious whether you could approximate how many cache lines you have to fill, in the same way?


Yeah, when benchmarking across batch sizes it's common to see huge jumps at the boundaries of the memory hierarchy:

- word size (64 bits)
- cache line / fetch size (generally 64 bytes, as mentioned above)
- OS page size (4–16 KB)
- L1 size (~80 KB/core)
- L2 size (low single-digit megabytes)


With lots of bizarre artifacts if you don’t force alignment.



