Hacker News new | past | comments | ask | show | jobs | submit login
Caching in theory and practice (dropbox.com)
123 points by lowe on Oct 16, 2012 | hide | past | favorite | 10 comments



This was a disappointing read:

It is an interesting question -- Given actual dropbox usage patterns, is there there an caching approach that is better than LRU for real Dropbox customers?

The answer I got was, hybrid LFU/LRU algorithms could work better, but they are complicated and were never tested against real Dropbox access patterns.


The ARC algorithm is patented (by IBM according to Wikipedia), so that might be why it's not used at least.


Only so much an intern can do in a summer, I suppose.


He says that he was an intern back in '11. I'm guessing he's full-time now. Not very clear.


He's a junior now :-)


is there any known algorithm that looks at the relationship between cached items? Ie. File A is accessed, then B, then C, then D, then E, and so on. A would a have stronger relationship to B, one step away, than to D, two steps away. So if we later access file A, the algorithm would know there's a higher probability that we need File B next, so it could check if file B is in the cache and if not, prefetch it and save it in the cache.


Seems like a natural place for an application of a 1st-order markov model - one state for every entity (file), record the load events from each previously loaded file to the next file, and then your cache can predictively load a set of files by computing the most likely transitions.

Going higher than 1-order might make it even smarter, but with the cost of taking more memory (increasing the likelyhood of thrashing).

It actually seems like a very novel and interesting approach, although I don't know if it's been done before.

(Quick googling reveals that there is a LOT of work done on this approach, but I don't have the time right now to see if it's a good one or not.)


Didn't Microsoft introduce something similar in Windows Vista, where it tries to keep the system cache occupied in order to have a better response time for the user?


In the case of viewing a slideshow of photographs on a webpage (and, doubtless, some other things), the browser cache probably further reduces the usefulness of LRU on the server side and reduces the chance of worst-case for MRU.


I still like random replacement; it's really fast and works better than you think.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: