Caching in theory and practice

jcampbell1 · on Oct 16, 2012

This was a disappointing read:

It is an interesting question -- Given actual dropbox usage patterns, is there there an caching approach that is better than LRU for real Dropbox customers?

The answer I got was, hybrid LFU/LRU algorithms could work better, but they are complicated and were never tested against real Dropbox access patterns.

arnsholt · on Oct 17, 2012

The ARC algorithm is patented (by IBM according to Wikipedia), so that might be why it's not used at least.

eblume · on Oct 16, 2012

Only so much an intern can do in a summer, I suppose.

namukang · on Oct 17, 2012

He says that he was an intern back in '11. I'm guessing he's full-time now. Not very clear.

dudey · on Oct 17, 2012

He's a junior now :-)

pencilcode · on Oct 16, 2012

is there any known algorithm that looks at the relationship between cached items? Ie. File A is accessed, then B, then C, then D, then E, and so on. A would a have stronger relationship to B, one step away, than to D, two steps away. So if we later access file A, the algorithm would know there's a higher probability that we need File B next, so it could check if file B is in the cache and if not, prefetch it and save it in the cache.

eblume · on Oct 16, 2012

Seems like a natural place for an application of a 1st-order markov model - one state for every entity (file), record the load events from each previously loaded file to the next file, and then your cache can predictively load a set of files by computing the most likely transitions.

Going higher than 1-order might make it even smarter, but with the cost of taking more memory (increasing the likelyhood of thrashing).

It actually seems like a very novel and interesting approach, although I don't know if it's been done before.

(Quick googling reveals that there is a LOT of work done on this approach, but I don't have the time right now to see if it's a good one or not.)

azth · on Oct 17, 2012

Didn't Microsoft introduce something similar in Windows Vista, where it tries to keep the system cache occupied in order to have a better response time for the user?

dllthomas · on Oct 16, 2012

In the case of viewing a slideshow of photographs on a webpage (and, doubtless, some other things), the browser cache probably further reduces the usefulness of LRU on the server side and reduces the chance of worst-case for MRU.

aidenn0 · on Oct 17, 2012

I still like random replacement; it's really fast and works better than you think.