
I know disk space is cheap these days, but at 16MB/metric/<level-of-granularity>, it seems like your metric dataset would grow pretty quickly. With just 10 metrics tracked daily, that's another gigabyte per week (10 metrics × 16MB × 7 days ≈ 1.1GB). Of course, it does come with the benefit of maintaining all the raw data, since you never roll up or aggregate it... so the pros probably outweigh that con. :)



I was thinking that too, but the 16MB is keeping track of data for 128 million users (one bit per user). Assuming you don't have that many, the number is potentially a lot less.

2 million users' actions could be tracked in 250KB per metric. 10 metrics per day is 2.5MB per day; × 7 days is back to just over 16MB (17.5MB).
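
A minimal sketch of the bitmap approach under discussion, assuming redis-py; the "login:2024-01-15" key scheme and the user ID are hypothetical, not from the thread:

    import redis

    r = redis.Redis()  # assumes a local Redis instance

    # One bit per user: the bit offset is the user ID, so 128M users
    # fit in a ~16MB Redis string.
    r.setbit("login:2024-01-15", 42, 1)  # user 42 logged in today

    # Distinct users that day (BITCOUNT, available in Redis 2.6+).
    daily_actives = r.bitcount("login:2024-01-15")

    # Memory used: the string grows with the highest bit offset set,
    # not with the number of users counted.
    size_bytes = r.strlen("login:2024-01-15")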


Redis stores everything in RAM, and RAM is not as cheap as disk. Adding GBs of RAM every week will quickly get rather expensive. But I guess you could dump old data to disk and load it back to Redis only when you need it. It might even compress well, depending on what the metrics track.
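
A minimal sketch of that dump-and-restore idea, assuming redis-py: since a bitmap is just a Redis string, GET/SET round-trips it byte-for-byte, and gzip often shrinks sparse bitmaps considerably (key and file names are hypothetical):

    import gzip
    import redis

    r = redis.Redis()

    # Archive an old bitmap: GET returns the raw bytes of the string,
    # which we compress to disk, then delete the key to free RAM.
    raw = r.get("login:2024-01-08")
    if raw is not None:
        with gzip.open("login-2024-01-08.bin.gz", "wb") as f:
            f.write(raw)
        r.delete("login:2024-01-08")

    # Later, load it back when a query needs that day.
    with gzip.open("login-2024-01-08.bin.gz", "rb") as f:
        r.set("login:2024-01-08", f.read())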


"But I guess you could dump old data to disk and load it back to Redis only when you need it."

Redis has a mode that does this automatically, I believe (and it's the default, if I remember correctly).


Isn't Redis still single-threaded for queries, even though saving happens in the background? That seems a little risky: you've got your 100 million users setting bits in your bitsets, and suddenly everything blocks for 10 seconds while old data is loaded from disk.


The "virtual memory" feature is now deprecated, I think.


It only really needs to be captured like that in Redis while the collection period is running; once the period is up, you could move the data to disk, or even keep just the aggregate information on disk.
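
A minimal sketch of that end-of-period aggregation, assuming redis-py (key names hypothetical): keep only the count and drop the raw bitmap, trading away the ability to run per-user queries on old periods.

    import redis

    r = redis.Redis()
    day_key = "login:2024-01-15"

    # When the collection period is up, keep only the aggregate:
    # store the distinct-user count, then drop the raw bitmap.
    count = r.bitcount(day_key)  # BITCOUNT, Redis 2.6+
    r.hset("login:daily_counts", "2024-01-15", count)
    r.delete(day_key)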



