This was posted a while ago, and I have since implemented bitmaps myself. One thing I learned from the documentation[1] is that setting an initial bit at a very high offset (like 2^30 - 1) forces Redis to allocate the whole bitmap at once, which takes a while (compared to Redis's normal speed) and blocks other operations until it finishes.
In my case, and it appears to be true for Spool too, I don't know which bit will be set first. It could be 12 or it could be 2938251, so to avoid a slowdown when the first bit lands at a high offset, I use buckets of bitmaps, each holding around 8 million bits.
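The bucketing idea can be sketched as follows; this is a minimal illustration, not the commenter's actual code, and the names (BUCKET_BITS, bucket_for) are my own. Splitting one huge bitmap into fixed-size buckets means no single SETBIT ever forces a large allocation.

```python
# Bucketed bitmaps: map a user ID to a small per-bucket key plus an
# in-bucket offset, so each bucket stays around 1 MB and allocates lazily.

BUCKET_BITS = 8_000_000  # ~8 million bits (~1 MB) per bucket, as in the comment

def bucket_for(metric: str, user_id: int) -> tuple[str, int]:
    """Return (Redis key, bit offset within that bucket) for a user ID."""
    bucket, offset = divmod(user_id, BUCKET_BITS)
    return f"{metric}:bucket:{bucket}", offset

# With a live redis-py connection, setting a bit would then look like:
#   key, offset = bucket_for("dau:2024-01-01", 2938251)
#   r.setbit(key, offset, 1)

print(bucket_for("dau", 20_000_000))  # ('dau:bucket:2', 4000000)
```

Counting users for a metric then means summing BITCOUNT over each bucket key, which is a small extra bookkeeping cost for the bounded allocation latency.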
I know disk space is cheap these days, but at 16MB/metric/<level-of-granularity>, it seems like your metric dataset would grow pretty quickly. With just 10 metrics tracked daily, that's another gigabyte per week. Of course it does come with the benefit of keeping all the raw data, since you never roll up or aggregate...so the pros probably outweigh that con. :)
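The back-of-envelope math checks out, assuming the 16MB figure comes from a bitmap covering roughly 2^27 (~134 million) user IDs at one bit each:

```python
# Rough storage math for the figures above; 2**27 user IDs is an assumption
# chosen so each daily bitmap comes out at exactly 16 MiB.
bits_per_metric = 2**27
bytes_per_metric = bits_per_metric // 8      # 16 MiB per metric per day
weekly_bytes = 10 * bytes_per_metric * 7     # 10 metrics, one bitmap per day
print(weekly_bytes / 2**30)                  # ~1.09 GiB per week
```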
Redis stores everything in RAM, and RAM is not as cheap as disk. Adding GBs of RAM every week will quickly get rather expensive. But I guess you could dump old data to disk and load it back into Redis only when you need it. It might even compress well, depending on what the metrics track.
Isn't Redis still single-threaded for queries, but saving in the background? That seems a little risky: you've got your 100 million users setting bits in your bitsets and suddenly everything blocks for 10 seconds while old data is being loaded from disk.
It only really needs to be captured like that in Redis; when the collection period is up you could move the bitmap to disk, or even store just the aggregate information.
The only problem with this method is that it requires that IDs are integers, start at 1 and increment by 1.
I'm using MongoDB and IDs are 12-byte values of which the first four are a timestamp. Does anyone know of a way to make this method work, ideally without adding another field to the collection?
The comments on the article address this - the OP is using UUIDs as the primary key for their users, but each user is also assigned an "analytics key", which is an integer that started at one. You can even use the Redis INCR command to generate these on demand.
There is a way in Mongo to replace the _id with an auto-incrementing number - have a look at the docs. It's also helpful if you want to use the ID as a base62 value for URLs.
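For the base62-for-URLs part, a minimal encoder looks like the following; the alphabet ordering (digits, then lowercase, then uppercase) is an assumption, so pick one and stick with it:

```python
# Encode an auto-incremented integer ID as a short base62 URL slug.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n: int) -> str:
    """Return the base62 representation of a non-negative integer."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

print(base62(125))  # "21"
```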
I still think the user numbers in the title evoke a mental image of a certain type of load with a certain type of response time. If you dropped the response time, it would be less linkbaity, because then it's clear that your focus is on the amount of storage it would take. It's not very important either way.
[1] See the warning at http://redis.io/commands/setbit