Is Time Series Database a Thing?

detaro · on March 31, 2015

Without benchmarks this post seems a bit strange. I don't think anyone doubted that you can put that kind of data into a traditional database, the interesting question IMHO is how efficient it is in comparison.

jnazario · on March 31, 2015

i concur.

while it's likely i've implemented my SQL horribly, i can say that after a few days of millions of hits per day in my time series database, searches became horribly slow and interactive use became unresponsive. in my case it was a set of botnet sinkholes that i was recording.

so yes you can, but on the high volume side of things (for some cutoff of "high volume") it falls over pretty dramatically and continues to degrade.

time series data has a few unique properties that a full SQL solution doesn't optimize around, like write-once/read-many. a purpose-built TSDB solution is built for this.

mborch · on April 2, 2015

The article states the opposite: that it's write-heavy.

The difficulty in managing time series data is that you need to do roll-ups and generally avoid doing the same work twice - that is, read the same rows over and over again.

If you're doing the same work over and over, it's always going to be slow. Don't do that! InfluxDB could presumably be built on top of PostgreSQL. It just manages the data lifecycle. But that would be a polyglot mashup project then and not something you could sell to VCs.
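A minimal sketch of the roll-up idea, assuming a fixed 60-second bucket and an in-memory store. The `RollupCounter` class and its method names are hypothetical and are not taken from InfluxDB, PostgreSQL, or the article. Each write updates a per-bucket aggregate, so a later query reads a handful of buckets instead of rescanning the raw rows.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # assumed roll-up granularity, chosen for illustration


class RollupCounter:
    """Keeps per-bucket event counts that are updated at write time."""

    def __init__(self, bucket_seconds=BUCKET_SECONDS):
        self.bucket_seconds = bucket_seconds
        # Maps bucket start (epoch seconds) to the number of events in it.
        self.counts = defaultdict(int)

    def record(self, timestamp):
        # Align the timestamp down to the start of its bucket.
        bucket = timestamp - (timestamp % self.bucket_seconds)
        self.counts[bucket] += 1

    def count_between(self, start, end):
        """Sum the counts of buckets whose start falls in [start, end)."""
        return sum(c for b, c in self.counts.items() if start <= b < end)


rollup = RollupCounter()
for ts in [0, 10, 59, 60, 61, 125]:
    rollup.record(ts)
```

The query touches one entry per bucket rather than one per event. A real system would also persist the buckets and index them by time so the query does not have to iterate over every bucket.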

gtrubetskoy · on March 31, 2015

Yeah, but is that really TS-specific? High volume is high volume, regardless of the type of data, and to address this you may need to use something like a fast key-value store.

jnazario · on March 31, 2015

maybe i could have shoehorned it into a KV store and done range queries, but again this was stuff like "timestamp, srcip, srccc, srcasn, eventid". the main vector is a timestamp, and every query has a timestamp range associated with it. these are written once, never updated. other data stores don't optimize for those parameters.
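A rough sketch of what optimizing for those parameters could look like, assuming rows arrive in timestamp order and are never updated. The `AppendOnlyLog` class is hypothetical; only the field names come from the comment above. Because the log stays sorted, a timestamp range query is two binary searches plus a slice.

```python
import bisect


class AppendOnlyLog:
    """Write-once event log kept sorted by timestamp."""

    def __init__(self):
        self.timestamps = []  # sorted list of timestamps, used as the index
        self.rows = []        # full rows, in the same order as timestamps

    def append(self, timestamp, srcip, srccc, srcasn, eventid):
        # Assumption: writes arrive in non-decreasing timestamp order.
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("out-of-order write")
        self.timestamps.append(timestamp)
        self.rows.append((timestamp, srcip, srccc, srcasn, eventid))

    def range(self, start, end):
        """Return rows with start <= timestamp < end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return self.rows[lo:hi]


log = AppendOnlyLog()
log.append(100, "192.0.2.1", "US", 64500, 1)
log.append(200, "192.0.2.2", "DE", 64501, 2)
log.append(300, "192.0.2.3", "JP", 64502, 3)
hits = log.range(150, 400)
```

Rejecting out-of-order writes keeps the sketch short. A real store would need to handle late-arriving events, for example by buffering and merging them.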

gtrubetskoy · on March 31, 2015

I'd argue that it would be as efficient as it gets if you use the database properly. You might be able to squeeze more performance out of something "thinner" like one of the key-value DBs out there, but sooner or later you will need to scale horizontally. This is where something like Cassandra is more attractive, but it lacks a lot of typical database features, e.g. aggregations.

beamatronic · on March 31, 2015

I was surprised there was no mention of NoSQL in this at all. Surely NoSQL can be part of a time series database solution.

gtrubetskoy · on March 31, 2015

I just don't like the term "NoSQL". I do mention Cassandra, Redis and InfluxDB - which are all "NoSQL" if you will.