Time-Series Database Requirements

damian2000 · on March 9, 2015

Anyone heard of Operational Data Historians? Like OSISoft's PI, Honeywell's PHD, GE's Proficy. They're expensive suites of software that have optimal real-time ability, plus historical access. They usually work in process control/operations of factories or manufacturing plants. Each item being measured is called a tag.

Just thought I'd throw this out there since its a specialised area that not many people know about. I've done some work with them in terms of writing adaptors to a time series data visualisation product.

http://en.wikipedia.org/wiki/Operational_historian

http://en.wikipedia.org/wiki/OSIsoft

https://www.honeywellprocess.com/en-US/training/programs/adv...

http://www.geautomation.com/products/proficy-historian

On the topic of Historians vs Relational Databases, theres a blog post here about it ...

https://osipi.wordpress.com/2010/07/05/relational-database-v...

... admittedly this is by the developer OSISoft so it may be biased, but their points seem valid. Especially the swinging door algorithm reference and the fact they are far more efficient in storage.

snowwindwaves · on March 9, 2015

I just installed and threw 200 tags at the rockwell factory talk branded version of osi. I always wanted to work with it but unfortunately a "historian" was just one of many deliverables so I only got to do the bare minimum that ticked the box. It is noticeably faster at retrieving data from the historian than the rockwell factory talk SE HMI binary datalog files though.

digitalzombie · on March 9, 2015

Cassandra seems like a good fit.

Writes are faster than read, it's an AP, and you shouldn't really update frequently it unless you want tombstone hell. There's also TTL too.

Is there any cons of using Cassandra as a Time Series Database? I'd like to hear it.

The biggest thing for Cassandra is you should know your queries before hand before you data model.

kodablah · on March 9, 2015

I too believe it is the best fit, but the "aggregate functions" gets most people. The use of counter columns is very limiting and many engineers don't want to struggle with storing state during streaming writes to precalculate the aggregates on write. Also, engineers tend not to want to do large reads to rebuild large aggregate values on small data changes.

It is what we use, and we use spark streaming for the rollups. We had evaluated Influx, OpenTSDB, and Druid also. So long as you know the exact read patterns for your client I think Cassandra is definitely the best fit for most things.

salex89 · on March 9, 2015

Did you try out the Cassandra-backed KairosDB? I'm very interested in the results if you did.

kodablah · on March 9, 2015

I did not. The Cassandra schema appears similar. We had enough custom needs for how we aggregated data (e.g. ewma) that we probably needed to do this ourselves anyways.

dmichulke · on March 9, 2015

I also use cassandra heavily.

From what I read (so not confirmed) one problem is that it uses space inefficiently. Since I predefine the columns anyway, they might as well use an efficient storage instead of mongodb style kv-pairs.

I am also not convinced of the partition key necessity (although it doesn't hurt either once you got it).

Finally, since my application runs on the JVM, I'd actually like to see a direct integration / an API that allows me to skip the socket overhead and launch cassandra directly on start-up. The advantage is mainly memory (just need half of it) and latency although I agree this is more difficult to maintain in a distributed scenario.

otterley · on March 9, 2015

Cassandra is the de facto backing store for pretty much every metrics service out there - Datadog etc.

obstinate · on March 9, 2015

Multi-dimensionality optional/drawback? Strongly disagree.

This may be my experience as a Googler talking, but I also somewhat disagree with the notion that the data can't fit in memory. I operate a X0,000 task service, and our monitoring data could fit into memory in a large server, if need be.

Of course, we don't keep per-task data permanently, that would be prohibitive at a 5s monitoring interval like the one we use, even if it were put on disk. Instead, we accomplish what I described by aggregating away some dimensions, mostly task number, and then holding the aggregated series in memory, for fast queries. There are some nuances, particularly around having foresight over the cases where you do want to see individual tasks.

But suffice it to say that this person's experience does not match mine in terms of what I need from a TSDB. Perhaps his ops background comes from a different set of needs than mine, but if you're building a TSDB for many customers, I wouldn't take this list as gospel.

devdas · on March 9, 2015

If I need a few terabytes of data to be stored reliably enough that I can lose a node (or two), support mostly writes and some reads, for a duration of a few years (at least three), would you still suggest storing this in RAM?

obstinate · on March 9, 2015

My team keeps replicated copies of the data in RAM, replacing nodes that get lost. The data is also echoed to disks, of course.

k1w1 · on March 9, 2015

I built exactly the database that you are describing here. Unfortunately it is not open source. However it is no secret that we used sqlite for the storage. Inserts and queries in sqlite are fast, even for databases with billions of rows. Deletes are very, very slow, so we created a new sqlite database for each week, and simply deleted an entire database file when the retention period expired. Sqlite is ACID, supports all of the complex queries that could be imagined and is easy to embed into the overall engine code base. We use aggregate tables to more efficiently display common aggregations of the data (e.g. graph metric over the last day, or average metric over the last date). The aggregates were updated as data was inserted, so that real-time views were always available.

biokoda · on March 9, 2015

Sounds like you implemented something similar to http://www.actordb.com/

thrownaway2424 · on March 9, 2015

"Reads need to be fast, even though they are rare. There are generally two approaches to dealing with this. The first is to write efficiently, so the data isn’t read-optimized per-series on disk, and deploy massive amounts of compute power in parallel for reads, scanning through all the data linearly. The second is to pay a penalty on writes, so the data is tightly packed by series and optimized for sequential reads of a series."

As the little girl says in the GIF: why don't we have both? Write to a write-optimized store of limited size that requires full access during reads, and re-write that into a read-optimized format hourly or daily. Because it's limited in size, you won't care that the most recent data isn't very efficient for reading, or isn't particularly compact.

manigandham · on March 9, 2015

Recently came across Prometheus (http://prometheus.io)

There's also OpenTSDB (http://opentsdb.net) that's been around for a while.

otterley · on March 9, 2015

Prometheus has no scaling model at this time. Sharding is currently described as an exercise for the user. This surprises me if they're actually still using it at SoundCloud.

Filligree · on March 9, 2015

Why do you need scaling? A single machine should be enough to monitor thousands of others.

Or, in short, if you do need scaling you probably can afford to implement it.

devdas · on March 9, 2015

Because you end up being write limited, and there is only so much you can do with SSDs.

otterley · on March 9, 2015

Not if you ever want to read the data back and aggregate it.

rodionos · on March 9, 2015

I'm one of the developers behind Axibase Time-Series Database which runs on top of HBase. ATSD is two years into development and has a built-in rule engine, forecasting,and visualization: http://axibase.com/products/axibase-time-series-database/vis.... The rule engine allows you to write expressions such as abs(forecast_deviation(avg())) > 2.0 to trigger url/email/command actions if sliding window average is outside of 2.0 sigmas from Holt-Winters/ARIMA forecast.

The license is commercial and there's a free CE version which can be scaled vertically without any throughput constraints. Tags are supported for series as well as for entities and metrics to avoid storing long-term metadata such as location, type, category etc. along with data itself.

I wouldn't be surprised if functional differences between TSDBs and historians will disappear in just a few years. Right now the historians are good at compressing repetitive data at source and on disk which makes sense given their heritage in archiving data from SCADA systems.

welder · on March 9, 2015

This sounds like a job for http://www.aerospike.com/

nivertech · on March 9, 2015

How would you model timeseries on top of Aerospike?

Using 1KB chunks of datapoints as simple K/V?

Or using Large Data Types [1] like Large Stack [2]?

    Large Stack
    
    A Large Stack collection is naturally aligned with time series data
    because the stack preserves the insert order.
    Stacks provide Last In First Out (LIFO) order, 
    so it's a convenient mechanism for tracking "recent" data.

    Usage examples include:

        Show me the last N events
        Search the last N actions with a filter
        Show me all documents in reverse insert order

[1] https://www.aerospike.com/docs/guide/ldt.html

[2] https://www.aerospike.com/docs/client/java/usage/ldt/ldt.htm...

chupy · on March 9, 2015

Ok...but does it really? What made you say that?

Rapzid · on March 9, 2015

I would actually be very interested in hearing more about this MySQL implementation.

preetamjinka · on March 9, 2015

I work with Baron, the author. He gave a talk recently at SCaLE[0] about our time series storage using MySQL. The slides[1] are also available in case you don't want to watch the entire talk.

[0] http://youtu.be/EoUfkkrIbPg

[1] http://www.slideshare.net/vividcortex/scaling-vividortexs-bi...

Rapzid · on March 9, 2015

Thanks very much. I have experimented with used Riak as a backend with its search 2.0(lucene) for indexing the streams. I'm very interested in how the MySQL tables are laid out; On the watch/reading list for tonight :)

Rapzid · on March 9, 2015

I'm wondering if it has been considered to buffer the metrics into a memory table so they can participate in queries, and then let MySQL flush them to durable tables at regular intervals?

preetamjinka · on March 9, 2015

My personal opinion on that is that it's another layer of complexity without too much to gain. It's something else to manage (operational overhead), and something else to interact with (programmer overhead).

MySQL by itself is plenty fast for us. As Baron mentioned, we don't even run the database servers at full capacity.

nimish · on March 8, 2015

Druid is quite a good one

zodvik · on March 9, 2015

How has your experience being using Druid? It seems like a good fit for our use cases, but there doesn't seem to be much of a community around it.

aaa667 · on March 9, 2015

Would be keen to know what people think about: https://github.com/ambiata/ivory

sethev · on March 9, 2015

I'm kind of curious about the no "tagging" point. Won't there always be some data that doesn't fit the [timestamp double] format?

preetamjinka · on March 9, 2015

We don't use tags, but we basically encode tags into the metric name of our time series. An example would be "host.queries.c.1374c6821ead6f47.tput". This tells you the category, type of query, query ID, and the metric associated with it. In this case, it's the time series for the throughput of the query with ID 1374c6821ead6f47.

philsnow · on March 9, 2015

If you have multiple tags, do you canonicalize the ordering of the metric name? If so, where does that canonicalization happen (on the metric origination side or on the server/collection side)? It would seem to be a pain if you end up with metrics named

    host.queries.type=GET.result=200.tput

and

    host.queries.result=200.type=GET.tput

preetamjinka · on March 9, 2015

Good question. We do have a specific ordering, and that happens when the metrics are generated. Again, we don't really treat them as tags for a specific metric, but rather unique metrics. Having a different name means it's a different metric.

doodlebugging · on March 9, 2015

This might not be what he's looking for but it is one way to manage huge amounts of data:

[HDF5](http://www.hdfgroup.org/)

rspeer · on March 9, 2015

HDF5 is mostly a storage format. It's not what he's looking for because it doesn't meet the requirements of a database.

For example, only one process at a time can open an HDF5 file for writing.

hartror · on March 8, 2015

This smells like a job for Apache Kafka [1], I've yet to use it personally but its feature set appears to hit the mark though it lacks SQL. The application described sounds like it uses something similar to event sourcing [2] which people have used Kafka for successfully. If you're not familiar with Kafka there is a very good interview with Jun Rao [3] on se radio.

[1] http://kafka.apache.org/

[2] http://martinfowler.com/eaaDev/EventSourcing.html

[3] http://www.se-radio.net/2015/02/episode-219-apache-kafka-wit...

otterley · on March 9, 2015

Kafka is a message bus (for data in transit), not a store for data at rest. Sure, you could abuse it as an append-only log-structured primary data store, but then again, when you have a hammer, everything looks like your thumb, I guess.

Also, as another poster said, it has no indices, just an offset pointer.

preetamjinka · on March 9, 2015

Kafka is great, but it doesn't solve the problem the author describes since you can't specify an index. Data arrives in (timestamp, metric) order, but it needs to be indexed by (metric, timestamp).

jat850 · on March 9, 2015

Would Apache Samza fold in here better, perhaps?

preetamjinka · on March 9, 2015

I don't know much about Samza, but I don't think stream processors are what we (I work with the author) are looking for. We don't really have a lot of "stream processing" to do, and aggregate functions are usually computed on-demand. Also, still have to put results from the stream process somewhere, right? Back into Kafka is something people do, but we still need indexing capabilities. As the post mentions, we use MySQL for time series storage, but we also use Kafka in front as a durable log.

jat850 · on March 9, 2015

You're welcome to email me if you like - it's in my profile. Kafka and Samza are intended (generally speaking) to go hand in hand. Samza is a re-imagined datastore that Kafka can shuttle data into. I've been investigating Samza quite heavily specifically for time series data storage. I'd be happy to share thoughts.

andrioni · on March 9, 2015

While I'm not using Samza, Spark Streaming also works pretty nicely in this case, although it is not so focused on keeping state (it can, though, using the checkpointing system and the `updateStateByKey` transformation) and thus might not perfectly stable if you require to handle failures without reprocessing.