Yeah. We're starting to contemplate a new monitoring/performance/etc. system, and I'm thinking I don't want to do any of the long-term storage aggregation like in RRD. We'll store the full data for each item forever. Not sure if we can afford the storage, but I want to try it for a while and see if it works.
We did something similar at Netflix. We had all the aggregations but also stored all the raw data. The raw data would be pushed out of the live system, replaced with aggregates, and then stored as flat text in S3. If for some reason you needed the old data, you just put in a ticket with the monitoring team to load the data back into the live system (I think this is even self-service now).
The system would then load the data from S3 back into the live system via Hadoop. Turns out it was pretty cheap to store highly compressible files in S3.
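To give a rough idea of what the Hadoop-side reload can look like: you can point an external Hive table at the S3 prefix and the compressed flat text becomes queryable again, with Hive decompressing gzipped text transparently. This is only a sketch of that idea, not the actual Netflix pipeline, and the bucket, paths, and columns are made up:

    -- Minimal HiveQL sketch, assuming the archive is tab-separated,
    -- gzipped text under a date-partitioned S3 prefix.
    -- Bucket, paths, and column names are hypothetical.
    CREATE EXTERNAL TABLE archived_metrics (
      metric_name STRING,
      epoch_ts    BIGINT,
      value       DOUBLE
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION 's3://example-metrics-archive/raw/';

    -- Register one day's worth of archived data; after this it can be
    -- queried like any other table.
    ALTER TABLE archived_metrics ADD PARTITION (dt = '2014-01-15')
      LOCATION 's3://example-metrics-archive/raw/dt=2014-01-15/';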
I've built a similar system as well. Raw data is compressed and stored in S3, but aggregations are stored in Postgres. The data stored in Postgres is a compressed binary representation of a histogram, and I added a few C functions to Postgres so you can do things like select x, y, histogram_percentile(histogram_merge(data), 99) ... group by x, y.
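Roughly, the query side looks like this; histogram_merge and histogram_percentile are the custom C aggregate/function mentioned above, and the table and column names are made up for illustration:

    -- Sketch of the query pattern (Postgres). histogram_merge is a custom
    -- aggregate over the compressed binary histograms, and
    -- histogram_percentile extracts a percentile from the merged result.
    -- Table and column names are hypothetical.
    SELECT
      host,
      metric,
      histogram_percentile(histogram_merge(data), 99) AS p99
    FROM metric_rollups
    WHERE ts >= now() - interval '1 hour'
    GROUP BY host, metric;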
My company stores all raw data for pretty much everything forever (and, in finance, there are a lot of things). It's binary, compressed with xz, and stored in several places. The grand total Glacier cost is something like $50/month.
(Do NOT, for the love of your sanity, emulate this design exactly. Use Backblaze or Google's nearline storage. Do not touch Glacier at all if you can avoid it. When I wrote the Glacier integration, I did it because it was far cheaper than its competitors. That's no longer the case.)