
I know disk space is cheap these days, but at 16MB/metric/<level-of-granularity>, it seems like your metric dataset would grow pretty quickly. With just 10 metrics tracked daily, that's another gigabyte per week (10 metrics × 16MB × 7 days ≈ 1.1GB). Of course, it does come with the benefit of maintaining all the raw data, since you never roll up or aggregate it... so the pros probably outweigh that con. :)



I was thinking that too, but the 16MB is keeping track of data for 128 million users (one bit per user). Assuming you don't have that many, the number is potentially a lot less.

2 million users' actions could be tracked in 250KB per metric. 10 metrics per day is 2.5MB per day; × 7 days is back to just over 16MB (17.5MB).
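
A minimal sketch of the bitmap approach under discussion, assuming redis-py; the "login:2024-01-15" key scheme and the user ID are hypothetical, not from the thread:

    import redis

    r = redis.Redis()  # assumes a local Redis instance

    # One bit per user: the bit offset is the user ID, so 128M users
    # fit in a ~16MB Redis string.
    r.setbit("login:2024-01-15", 42, 1)  # user 42 logged in today

    # Distinct users that day (BITCOUNT, available in Redis 2.6+).
    daily_actives = r.bitcount("login:2024-01-15")

    # Memory used: the string grows with the highest bit offset set,
    # not with the number of users counted.
    size_bytes = r.strlen("login:2024-01-15")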


Redis stores everything in RAM, and RAM is not as cheap as disk. Adding GBs of RAM every week will quickly get rather expensive. But I guess you could dump old data to disk and load it back to Redis only when you need it. It might even compress well, depending on what the metrics track.
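
A minimal sketch of that dump-and-restore idea, assuming redis-py: since a bitmap is just a Redis string, GET/SET round-trips it byte-for-byte, and gzip often shrinks sparse bitmaps considerably (key and file names are hypothetical):

    import gzip
    import redis

    r = redis.Redis()

    # Archive an old bitmap: GET returns the raw bytes of the string,
    # which we compress to disk, then delete the key to free RAM.
    raw = r.get("login:2024-01-08")
    if raw is not None:
        with gzip.open("login-2024-01-08.bin.gz", "wb") as f:
            f.write(raw)
        r.delete("login:2024-01-08")

    # Later, load it back when a query needs that day.
    with gzip.open("login-2024-01-08.bin.gz", "rb") as f:
        r.set("login:2024-01-08", f.read())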


"But I guess you could dump old data to disk and load it back to Redis only when you need it."

Redis has a mode that does this automatically, I believe (and it's the default, if I remember correctly).


Isn't Redis still single-threaded for queries, even though saving happens in the background? That seems a little risky: you've got your 100 million users setting bits in your bitsets, and suddenly everything blocks for 10 seconds while old data is loaded from disk.


The "virtual memory" feature is now deprecated, I think.


It only really needs to be captured like that in Redis while the collection period is running; once the period is up, you could move the data to disk, or even keep just the aggregate information on disk.
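
A minimal sketch of that end-of-period aggregation, assuming redis-py (key names hypothetical): keep only the count and drop the raw bitmap, trading away the ability to run per-user queries on old periods.

    import redis

    r = redis.Redis()
    day_key = "login:2024-01-15"

    # When the collection period is up, keep only the aggregate:
    # store the distinct-user count, then drop the raw bitmap.
    count = r.bitcount(day_key)  # BITCOUNT, Redis 2.6+
    r.hset("login:daily_counts", "2024-01-15", count)
    r.delete(day_key)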



