I think the main point of the blog post is that instead of an exact match in O(N) worst-case time, you change the algorithm to replace the entry when adding, so that lookup is also always O(1) and the size is bounded. The cache-line thing is just an optimization to get the best out of the single cache miss you incur. The table offset I use is the manual version of what you would do with a struct, with more or less no difference in the generated code: the entry and the timestamp are stored at sequential addresses. Something like the sketch below.
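To make it concrete, here is a minimal sketch of the idea, not the post's actual code (the names `Entry`, `TABLE_SIZE`, `slot`, and the hash are mine): a fixed-size, direct-mapped table where adding simply overwrites whatever occupies the slot, so lookup is a single probe and memory use is bounded.

```c
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE 4096 /* power of two, fixed: memory use is bounded */

typedef struct {
    uint64_t key; /* the stored entry (here just a 64-bit key) */
    uint64_t ts;  /* timestamp, adjacent in memory: same cache line */
} Entry;

static Entry table[TABLE_SIZE];

static size_t slot(uint64_t key) {
    /* any decent hash works; Fibonacci hashing as a stand-in */
    return (key * 11400714819323198485ull) & (TABLE_SIZE - 1);
}

/* Adding never chains or probes: it replaces the slot's occupant.
 * This is what makes lookup O(1) worst case, at the cost of
 * false negatives (evicted entries are simply forgotten). */
void add(uint64_t key, uint64_t now) {
    Entry *e = &table[slot(key)];
    e->key = key;
    e->ts = now;
}

/* One probe, one potential cache miss; the timestamp comes along
 * "for free" because it sits right next to the key. */
int lookup(uint64_t key, uint64_t *ts) {
    Entry *e = &table[slot(key)];
    if (e->key != key) return 0;
    *ts = e->ts;
    return 1;
}
```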
I can see that a little. It's a good article regardless of what you drew from it. I read it more as a lossy set-membership test that used packed cache lines to help performance (over chaining), but I don't really know what his use case was. Good write-up nonetheless.