This was posted a while ago, and I have since implemented bitmaps myself. One thing I learned from the documentation[1] is that setting an initial bit at a very high offset (like 2^30 - 1) forces Redis to allocate the whole bitmap at once, which takes a while (compared to Redis's normal speed) and blocks other operations until it finishes.
In my case, and it appears to be true for Spool too, I don't know which bit will be set first. It could be 12 or it could be 2938251, so to avoid a slowdown when the first bit lands at a high offset, I use buckets of bitmaps, each holding around 8 million bits.
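The bucketing idea can be sketched as follows; this is a minimal illustration, not the commenter's actual code, and the names (BUCKET_BITS, bucket_for) are my own. Splitting one huge bitmap into fixed-size buckets means no single SETBIT ever forces a large allocation.

```python
# Bucketed bitmaps: map a user ID to a small per-bucket key plus an
# in-bucket offset, so each bucket stays around 1 MB and allocates lazily.

BUCKET_BITS = 8_000_000  # ~8 million bits (~1 MB) per bucket, as in the comment

def bucket_for(metric: str, user_id: int) -> tuple[str, int]:
    """Return (Redis key, bit offset within that bucket) for a user ID."""
    bucket, offset = divmod(user_id, BUCKET_BITS)
    return f"{metric}:bucket:{bucket}", offset

# With a live redis-py connection, setting a bit would then look like:
#   key, offset = bucket_for("dau:2024-01-01", 2938251)
#   r.setbit(key, offset, 1)

print(bucket_for("dau", 20_000_000))  # ('dau:bucket:2', 4000000)
```

Counting users for a metric then means summing BITCOUNT over each bucket key, which is a small extra bookkeeping cost for the bounded allocation latency.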
I know disk space is cheap these days, but at 16MB/metric/<level-of-granularity>, it seems like your metric dataset would grow pretty quickly. With just 10 metrics tracked daily, that's another gigabyte per week. Of course it does come with the benefit of keeping all the raw data, since you never roll up or aggregate...so the pros probably outweigh that con. :)
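The back-of-envelope math checks out, assuming the 16MB figure comes from a bitmap covering roughly 2^27 (~134 million) user IDs at one bit each:

```python
# Rough storage math for the figures above; 2**27 user IDs is an assumption
# chosen so each daily bitmap comes out at exactly 16 MiB.
bits_per_metric = 2**27
bytes_per_metric = bits_per_metric // 8      # 16 MiB per metric per day
weekly_bytes = 10 * bytes_per_metric * 7     # 10 metrics, one bitmap per day
print(weekly_bytes / 2**30)                  # ~1.09 GiB per week
```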
Redis stores everything in RAM, and RAM is not as cheap as disk. Adding GBs of RAM every week will quickly get rather expensive. But I guess you could dump old data to disk and load it back into Redis only when you need it. It might even compress well, depending on what the metrics track.
Isn't Redis still single-threaded for queries, but saving in the background? That seems a little risky: you've got your 100 million users setting bits in your bitsets and suddenly everything blocks for 10 seconds while old data is being loaded from disk.
It only really needs to be captured like that in Redis; when the collection period is up you could move the bitmap to disk, or even store just the aggregate information.
The only problem with this method is that it requires that IDs are integers, start at 1 and increment by 1.
I'm using MongoDB and IDs are 12-byte values of which the first four are a timestamp. Does anyone know of a way to make this method work, ideally without adding another field to the collection?
The comments on the article address this - the OP is using UUIDs as the primary key for their users, but each user is also assigned an "analytics key", which is an integer that started at one. You can even use the Redis INCR command to generate these on demand.
There is a way in Mongo to replace the _id with an auto-incrementing number - have a look at the docs. It's also helpful if you want to use the ID as a base62 value for URLs.
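For the base62-for-URLs part, a minimal encoder looks like the following; the alphabet ordering (digits, then lowercase, then uppercase) is an assumption, so pick one and stick with it:

```python
# Encode an auto-incremented integer ID as a short base62 URL slug.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n: int) -> str:
    """Return the base62 representation of a non-negative integer."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

print(base62(125))  # "21"
```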
I still think the user numbers in the title evoke a mental image of a certain type of load with a certain type of response time. If you dropped the response time, it would be less linkbaity, because then it's clear that your focus is on the amount of storage it would take. It's not very important either way.
[1] See the warning at http://redis.io/commands/setbit