Time series database Graphite seems to be falling into disfavor (vividcortex.com)
127 points by js2 on Nov 9, 2015 | 79 comments



Obligatory response from Graphite contributor and author Jason Dixon, who is shouted out in the VividCortex intro.

http://obfuscurity.com/2015/11/Everybody-Loves-Graphite

Personal response:

I've used Graphite, OpenTSDB, Ganglia, Cacti, and a bunch more solutions. Recently, work transitioned from OpenTSDB to a hosted solution from a startup called Wavefront.

https://www.wavefront.com/

This has been a smash hit.

Scale matters. If you are a small shop and can work around Graphite's atrocious UI shortcomings, by all means go for it. As you start to get larger, OpenTSDB looks attractive. We went too far down that path but were able to quickly (< 3 months) transition over to Wavefront and haven't looked back.


A closed silo seems like a distinct step backwards compared to a stack you can maintain yourself.


Read a little more into the second part of your sentence, especially the word "maintain."


Wavefront looks like it's probably closed source. Is this true?


Yes, sorry if that wasn't clear.


And do they have a self-hosted solution?

The proprietary-vs-open argument is less important than the argument over where the data is physically held and who owns it.


Agreed. I built an in-house mobile crash reporting solution even though the commercial options range from decent to excellent, because none of them offered self-hosting or a way to export your data. We really wanted the data because it has allowed us to determine the cause of crashes and answer other questions that we otherwise couldn't without the raw data. There were also privacy concerns, even though in theory crash reports shouldn't have any PII. And we really don't need third parties knowing our DAU numbers and whatever else in any case.

There are finally some self-hosted commercial options in this space, but we've already built our solution and it works well.


I've been building my own time-series database for some time.

https://github.com/akumuli/Akumuli

It's a standalone solution without dependencies on other services or databases. It can handle more than a million data points per second and uses a fixed amount of memory to store data on disk, but without Graphite's shortcomings (everything is compressed and timestamps are stored with nanosecond precision).

I'm focused mostly on real-time time-series analysis. Akumuli has a built-in anomaly detector: EWMA, Holt-Winters, and sketch-based methods. It can generate different time-series representations, like SAX and PAA. DTW and correlation search are in progress now.
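
To give a flavor of the EWMA approach, here's a generic sketch in Python (not Akumuli's actual code; alpha and k are illustrative tuning knobs): flag points that sit far from an exponentially weighted running mean.

    # Generic EWMA anomaly-detector sketch, not Akumuli's implementation.
    def ewma_anomalies(points, alpha=0.3, k=4.0):
        avg, var = points[0], 0.0
        for x in points[1:]:
            if var > 0 and abs(x - avg) > k * var ** 0.5:
                yield x  # far outside the smoothed history
            diff = x - avg
            avg += alpha * diff                              # EWMA update
            var = (1 - alpha) * (var + alpha * diff * diff)  # EW variance

    print(list(ewma_anomalies([10, 12, 10, 13, 11, 50, 12, 10])))  # -> [50]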


I've been in search of a good time-series database solution for some time now, and have pretty much given up on it and am in the process of rolling my own: https://github.com/grisha/timeriver

My issue with the present state of TS isn't volume. I was looking to use TS outside of the DevOps world: everyday things like your heart rate over time, the price of gas at the nearest station, the number of people in line at your coffee shop, etc. All of these are interesting, and Graphite/RRDTool/InfluxDB/etc. did not seem like appropriate storage, because to use it with your other data (which is most likely in a relational DB of some kind) you need to export/import it, and who wants that?

I call this problem "data seclusion". When data exists in some kind of an incompatible format (e.g. Whisper files), it will end up ignored because of that extra conversion step necessary to link it with your other data. Data in Graphite and such is mostly good for generating charts, but TS analysis is so much more than that, even at its simplest.

I think the good old relational database is fine storage for TS, and we gave up on it way too early, especially given what's new in PostgreSQL. Making it horizontally scalable, distributed, using consensus protocols, etc. - these are not time-series problems, these are database problems, and we do not yet have a good solution for them. (We have many that "kind of" work, supporting some features but not others, e.g. Cassandra.) Projects like InfluxDB are mired in solving the wrong problem, which will eventually get solved at the DB level.

More thoughts on the subject: http://grisha.org/blog/2015/03/28/on-time-series/ http://grisha.org/blog/2015/09/23/storing-time-series-in-pos... and http://grisha.org/blog/2015/05/04/recording-time-series/
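
To illustrate the "TS in plain PostgreSQL" idea, here's a minimal sketch from Python; the schema, series name, and values are invented for illustration:

    # Hedged sketch of time series in a plain relational DB.
    import psycopg2

    conn = psycopg2.connect("dbname=ts")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS readings (
                series text        NOT NULL,
                ts     timestamptz NOT NULL,
                value  double precision,
                PRIMARY KEY (series, ts))""")
        cur.execute("INSERT INTO readings VALUES (%s, now(), %s)",
                    ("gas_price.station_42", 2.89))
        # the win: aggregation happens in SQL, right next to your other data
        cur.execute("""
            SELECT date_trunc('hour', ts) AS hour, avg(value)
            FROM readings WHERE series = %s
            GROUP BY hour ORDER BY hour""", ("gas_price.station_42",))
        print(cur.fetchall())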


Couldn't agree more. Time-series data analysis should be considered the most important part, not storage. Storage is a solved problem. Let's imagine that we're getting ECG data from many patients. What are most time-series databases able to do with this data? They can draw a nice graph. OK. They can resample this data. Well... that's mostly meaningless for ECG. Can graphite/opentsdb/whatever find specific patterns indicating a disease (for example, T-wave inversion)? No, but this is OK, because they're not designed for this. We should build new tools to solve new problems, not the old ones (like monitoring in DevOps).


> Can graphite/opentsdb/whatever find specific patterns indicating a disease (for example, T-wave inversion)? No, but this is OK, because they're not designed for this. We should build new tools to solve new problems, not the old ones (like monitoring in DevOps).

You know, that sounds like exactly the kind of thing you'd want to do in a DevOps world (leverage your data and new ideas to predict problems).

So you're right that we need to look at time series not just because there are many interesting new problems we might solve with them, but also because there are still many problems we want to solve in DevOps.


I mean, that's sort of a "solved" problem. DevOps folks (well, they were called greybeard Perl developers, but they effectively filled the role of devops 20 years ago) munged data with Perl and Unix tools to 'leverage' it, i.e., everything from simple transforms all the way to complicated aggregate analysis. (Did you know you can topologically sort a di-graph with the out-of-the-box default UNIX tool 'tsort'? Yeah, you can.)

It's kind of accessible to the masses now (previously you'd have to pay SAP a few million for BusinessObjects, then EMC another million and a half for enough storage to set up a DW), but open source tools have been available for at least a decade. Now it just has a fancy new marketable name, "predictive analysis", that Oracle et al. can charge an extra couple million for with their price-gouging RAC licenses.


I'm responsible for a solution at $dayjob where I ended up feeding the data into multiple stores. I basically have incoming "events" that get fed into Sentry (which in turn stores them in MySQL and Redis), InfluxDB (to use with Grafana), and soon Elasticsearch (to use with Kibana). Each of these just provides a different way to visualize the data.


I find it confusing that these solutions seem to conflate time series storage/querying with visualization. Is there a reason for this?

I recall that Kibana is a plugin for Elasticsearch but don't know about the others. Isn't it possible to connect a hypothetical standalone visualization frontend to any TSDB?


Grafana is sort of that. It's a visualization tool that can use a bunch of different TSDBs as the backend. I use it with InfluxDB to track and plot sensor data from my home IoT setup. Basically all sensors (primarily temp, because it controls my thermostat) are connected to an isolated WLAN and broadcast their readings via MQTT. A process on the server collects them and stores them in Influx, which Grafana can use to make nice plots. It's quite easy to use, actually.
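
The collector piece is surprisingly small; roughly like this sketch (paho-mqtt plus the InfluxDB Python client; the topic layout and field names here are simplified stand-ins, not my exact code):

    # Hedged sketch: subscribe to sensor topics, write readings to InfluxDB.
    import paho.mqtt.client as mqtt
    from influxdb import InfluxDBClient

    influx = InfluxDBClient(host="localhost", database="sensors")

    def on_message(client, userdata, msg):
        # e.g. topic "home/livingroom/temp", payload "21.4"
        influx.write_points([{
            "measurement": "temperature",
            "tags": {"topic": msg.topic},
            "fields": {"value": float(msg.payload.decode())},
        }])

    c = mqtt.Client()
    c.on_message = on_message
    c.connect("localhost")
    c.subscribe("home/+/temp")
    c.loop_forever()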

The TSDBs I tested when I was building this setup (Graphite, OpenTSDB and InfluxDB) all had some sort of basic plotting facilities, but they were extremely minimal and all pointed towards Grafana if you wanted something more substantial.


> sensor data from my home IoT setup

Have you documented your setup anywhere?

I'm DIY-ing something similar and I'm always curious how others have approached things.


I haven't, unfortunately. I'm planning on doing it eventually, but it'll probably be a while. The tl;dr though is a RasPi hosting a local-only wifi network and Mosquitto + ESP8266s everywhere. I was originally going to go with Arduino+NRF24 wireless, but the ESP8266 is cheaper, smaller, longer range, simpler, and still compatible with the Arduino library.


Kibana 3 and earlier was a plugin for Elasticsearch. However, Kibana 4 is a standalone application. Also, Kibana 4 uses Elasticsearch aggregations instead of Lucene facets.


Thanks for your thoughts Gregory!

We're also in the process of finding a good storage solution for time-series data at my dayjob. We're not storing any server metrics, but rather personal user health data and related metrics. So we want the flexibility of being able to write new metrics onto a user's personal timeline without altering a schema. Most reads would query a single user's timeline to fetch their data and deliver it through APIs for rendering in clients. Of course we also want to do analysis across all users' timelines, find correlations, etc., but those queries are less frequent and not so time critical. I'm starting on a prototype now with MongoDB. Other DBs that have come up have been InfluxDB, Riak TS, Amazon DynamoDB, and probably some others I don't remember. I haven't actually thought that much about Postgres, but thanks for the links to your blog posts; I will read up on how Postgres might work. Most other non-time-series data would still be in our MySQL setup.


I highly recommend looking at Keogh's work, particularly the iSAX and iSAX2 stuff: http://www.cs.ucr.edu/~eamonn/SAX.htm

While it is used for machine learning, it actually makes a lot of sense to follow a similar approach for more general time series applications.


I'm familiar with SAX; it's remarkably powerful at discerning patterns and identifying similarity between series. Good stuff, thanks for the link.


SAX requires a different type of storage. You can store SAXified time series in Elasticsearch or Solr, but a time-series database isn't a good fit for this. Time-series databases should be able to generate a SAX representation.
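
For anyone who hasn't met SAX: it turns a numeric series into a short symbolic word by z-normalizing, averaging over segments (PAA), and discretizing against equiprobable N(0,1) breakpoints. A minimal sketch, with arbitrary segment and alphabet sizes:

    # Minimal SAX sketch; window/alphabet sizes are illustrative.
    from statistics import NormalDist, mean, pstdev

    def sax(series, segments=8, alphabet="abcd"):
        mu, sigma = mean(series), pstdev(series) or 1.0
        z = [(v - mu) / sigma for v in series]        # z-normalize
        n = len(z)
        paa = [mean(z[i * n // segments:(i + 1) * n // segments])
               for i in range(segments)]              # piecewise means
        a = len(alphabet)
        cuts = [NormalDist().inv_cdf((i + 1) / a) for i in range(a - 1)]
        return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)

    print(sax([1, 2, 3, 4, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4, 5, 6]))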


> You can store SAXified time series in Elasticsearch or Solr, but a time-series database isn't a good fit for this.

That's making some presumptions about the time-series database use case. SAX is convenient for storing and retrieving the data, as well as for identifying trends or recurring behaviour. What more do you need?


You can't retrieve the original time-series data from SAX storage because of normalization. To query time-series data by content (approximate 1-NN, motif discovery, etc.) you need an inverted index.


The normalized data is the index. It can still point to the raw data. The nice thing is that the index organizes the data in a way that makes it easily (losslessly) compressible.


> It can still point to the raw data.

Yep. Each SAX word should be mapped to a list of seriesid:timestamp pairs. Such a list is often referred to as a postings list in information retrieval, and the resulting data structure is an inverted index. The SAX and iSAX papers describe an inverted-index variant (a really bad one) based on a folder structure, but one can use conventional IR tools for this.


iSAX2 yields a much better inverted index style model.

The thing is, particularly with time series data, a lot of times it is sufficient to at least start with the summary data in the index.


I actually love Graphite's API. I've yet to find another solution that is as easy to use and has as many available functions (http://graphite.readthedocs.org/en/latest/functions.html).

For that reason we still use Graphite's API, but not the UI or the datastore. Neither Whisper nor whatever the newer one is would survive the load we put on it.

Our setup looks like:

grafana -> graphite-api -> graphite-influxdb -> InfluxDB <- statsite (C impl of statsd) <- metrics
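
The appeal is that functions compose right in the render target and you get JSON back. A rough example (the host and metric paths here are invented):

    # Sketch of hitting Graphite's render API; endpoint/paths are made up.
    import requests

    resp = requests.get("http://graphite.example.com/render", params={
        "target": "sumSeries(movingAverage(stats.web*.requests, '5min'))",
        "from": "-1h",
        "format": "json",
    })
    for series in resp.json():
        print(series["target"], series["datapoints"][-1])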


Is it the case even with the latest Grafana (2.5) that you still get more functionality by running graphite-influxdb as a middleman?

I have Grafana 1.9 pointed directly at InfluxDB 0.8 and here's one thing you _can't_ do: group a series by X, then plot only the top Y groups. Is that the sort of thing that using graphite-influxdb provides?


There are a lot of things missing from Influx's query language. The most glaring issue is that you can't do 'multi-step math', e.g. something like 'average series A and series B into 5-minute buckets, then sum them'.


Try out Prometheus; you can get it up and running in a few minutes. It supports all the functions Graphite does, bar sin() and the Holt-Winters functions, which we plan on adding.


What does a modern metrics stack look like? There are so many words... Grafana, Kibana, StatsD, Graphite, etc. Which bits should I choose and attach together?


I've got to advise against Kibana/Elasticsearch until the company gets more mature. I was 100% on board with them up until the redesign.

Not so much for the design (oh look it's white instead of black, who cares), but the way they've handled it subsequently.

The 3.x -> 4.x transition for Kibana left a product that was missing really basic features (like the ability to set graph colors, for one - a bug/feature request that's been open since last January).

As it stands, upgrading from Kibana 3 to Kibana 4 is a step backwards. You lose functionality rather than gaining it.

They also decided to require a major version bump to Elasticsearch (to 2.x) with a point release of Kibana.

I used to be super optimistic about the Elastic guys, but some of these decisions are just head-scratchingly awful.


After many years of using Splunk at my job, switching to ELK for personal project use was quite a disappointment for me. Of course, ELK is free and Splunk is very expensive, but I was still surprised at the gap.

This might be derailing the thread a bit, but is there any log management platform like ELK or Splunk that has an expressive and versatile query language like Splunk's? My biggest issue with ELK is that analytics is mostly expected to be done through Kibana's GUI, while with Splunk you can craft terse queries to do almost any sort of transformation and visualization imaginable. I don't like how ELK is so GUI-oriented.


Not sure what kind of queries you want to do, but Elasticsearch has a fairly extensive Query DSL that'll let you do all sorts of aggregations:

https://www.elastic.co/guide/en/elasticsearch/reference/curr...

There is a Python implementation that makes creating complex queries pretty easy:

https://github.com/elastic/elasticsearch-dsl-py

I agree with the criticisms of Kibana, but I have had no problems querying Elasticsearch directly. It also supports scripted queries if the built-in aggregations aren't enough.

Of course, then you have to build your own visualizations with the results...


I'm not sure I'd consider that a DSL. It's unwieldy to write a full JSON object for every one-off query I want to do.

Using one of their examples, this:

    {
      "query": {
        "filtered": {
          "query": {
            "bool": {
              "must": [{"match": {"title": "python"}}],
              "must_not": [{"match": {"description": "beta"}}]
            }
          },
          "filter": {"term": {"category": "search"}}
        }
      },
      "aggs" : {
        "per_tag": {
          "terms": {"field": "tags"},
          "aggs": {
            "max_lines": {"max": {"field": "lines"}}
          }
        }
      }
    }
would be the following Splunk query:

    title=python description!=beta | stats max(lines) by tags
It would be nice if there were some kind of query compiler that could generate ES JSON from an expressive query language.


The linked Python DSL does pretty well. Not as nice as your reduced DSL, but close.

  from elasticsearch import Elasticsearch
  from elasticsearch_dsl import Search, Q

  client = Elasticsearch()  # connection the original snippet assumed
  s = Search(using=client, index="my-index") \
      .filter("term", category="search") \
      .query("match", title="python") \
      .query(~Q("match", description="beta"))


https://github.com/NLPchina/elasticsearch-sql/

does exactly that with a reasonably good subset of SQL and ES query language mixed in.

It operates in two modes: in one it runs the query; in the other it spits back out the equivalent ES JSON query. We use this as a quick prototyping tool and modify the ES query as needed, as most of us here still "think" in SQL for a lot of things.


I know, right? I recently did a pilot project with a Splunk competitor called Jut, that was totally query language oriented. It was great, hugely refreshing after Kibana. They're planning on going open source, too, so maybe the language will even survive.


There are also some serious unresolved issues in the current versions of the ELK stack bits, such as https://github.com/elastic/kibana/issues/5170, https://github.com/elastic/kibana/issues/1961, https://github.com/elastic/logstash/issues/3440. Even with all that though, it's still an incredibly useful tool for log analysis. It's a much different scenario than what one would use statsd, graphite, etc. for.


Kibana 4.x is atrocious. In addition to generally being as heavy as a ton of bricks.


I've been using the latest Grafana release as an alternative to Kibana for visualizations. It now supports Elasticsearch as a db backend. Seems much faster and more stable than Kibana 4.


Belated thanks for pointing this out! Grafana is now getting some serious play here, and it would get more if I could figure out how to make the term searches work :)


Kibana is incredibly heavyweight in comparison to other dashboards, e.g. Grafana. Kibana loads something like 4MB+ of JS and data. It is designed for intranet usage where your Elasticsearch cluster is near you.


Prometheus is really the answer here.

http://prometheus.io


> What does a modern metrics stack look like? There are so many words... Grafana, Kibana, StatsD, Graphite, Prometheus etc. Which bits should I choose and attach together?


Kibana is more for logging, but Prometheus covers the StatsD and Graphite requirements out of the box. Grafana is one visualisation option to go with it (http://www.robustperception.io/setting-up-grafana-for-promet...).


Prometheus with Grafana for visualization. I've got some ideas about trying to run automated correlation queries for peaks and the like, but they're probably above my skill level.

I have had some decent luck with sending syslog into mtail and counting generic word events like 'error' and 'warning' as a way to do "something might be wrong" alerting.


Grafana + InfluxDB is the simplest to get up and running. Literally two binaries that you download and run; that's it. You can't beat the simplicity of this setup, and you can get a lot done with it. If you are just starting your metrics infrastructure, I can recommend it.


With http://bosun.org (Stack Overflow's alerting system: expressions, notification templates, historical testing, etc.) we have hedged our bets by adding different query functions to the expression language. Currently it can query:

- OpenTSDB

- InfluxDB

- Graphite

- Elastic (Expects to be populated by logstash)

OpenTSDB was the original time series database behind Bosun, so it has some extra UI features for graphing compared to the others. My hope is that InfluxDB will mature to the point, availability-wise, that we can use it and ditch the HBase dependency. But currently OpenTSDB is the best option for us.


If you prefer running an open-source solution yourself, Prometheus (http://prometheus.io/) addresses exactly those points and works especially well in a dynamic cloud / microservices / container scheduler world.

It has a dimensional data model, a powerful query language to go with it, and covers aspects from instrumentation to storing data, all the way to alerting and dashboarding. The latest version of Grafana has native Prometheus support now too. Many tools (like Kubernetes or etcd) already export Prometheus metrics natively, so you can monitor them with Prometheus right out of the box. Support for many kinds of service discovery (Kubernetes, Marathon, EC2, Consul, ...) make it work very well to monitor dynamically scheduled services as well. Disclaimer: Prometheus author.
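
To give a flavor of the instrumentation side, here's a minimal sketch with the official Python client (the metric names are invented); Prometheus then scrapes the /metrics endpoint this exposes:

    # Hedged instrumentation sketch using prometheus_client.
    import random, time
    from prometheus_client import Counter, Histogram, start_http_server

    REQUESTS = Counter("app_requests_total", "Requests handled", ["method"])
    LATENCY = Histogram("app_request_seconds", "Request latency")

    if __name__ == "__main__":
        start_http_server(8000)   # serves /metrics for scraping
        while True:
            with LATENCY.time():  # observe elapsed time into the histogram
                time.sleep(random.random() / 10)
            REQUESTS.labels(method="GET").inc()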


I chose InfluxDB about a year ago because it was the only option that allowed me to put each individual event into a time series and then do the grouping/counting after the fact. The Prometheus docs used to say it wasn't suited to that use case. Is that still the case?

edit: yup, the Prometheus docs indicate InfluxDB is better suited to this use case.

http://prometheus.io/docs/introduction/comparison/

Aside, I really appreciate this comparison page and the prometheus docs in general are well done.


Yes, Prometheus is fundamentally a store for numeric time series (with a set of dimensions attached), not a store for individual events or log entries.


It's buried about 10 paragraphs down, but it should be noted that the author is a Graphite competitor.


It looks like they (deliberately?) didn't mention the open source projects that were specifically built to address the shortcomings of Graphite, like Prometheus or OpenTSDB.


The author most likely didn't mention them because they suffer from the same limiting assumptions that Graphite does. When you're dealing with multidimensional, changing metrics, OpenTSDB has the same issues Graphite does.


Graphite's data model doesn't support multidimensional metrics, which is in my opinion its biggest shortcoming. To support multidimensional data, a suitable data model is needed. AFAIK InfluxDB stores all label dimensions and the value for each data point; no matter how high your cardinality is, the storage requirement is the same. If you need to have, let's say, a dimension 'client_ip' on a metric http_response_time, that's a reasonable model. I'd argue, though, that outside analytics/data warehousing such high cardinality is rarely needed. That's why Prometheus (I can't speak for OpenTSDB) stores the dimensions identifying a metric (metric key/label pairs) once; for all data points it just needs to store the value, which makes reading and writing less expensive. However VividCortex stores its metrics, it has to choose from similar trade-offs. Sure, they host it for you; still, there are open source projects out there addressing exactly these issues.


Though I'm a Prometheus author, I need to defend InfluxDB here :) Since 0.9.0 they support tags (vs. fields), which store indexed dimensions similarly to how Prometheus does it, so this argument doesn't hold anymore. Still, there are many other differences in scope and functionality between the two systems.


So is InfluxDB now ready to be used as Prometheus's long-term storage?


OpenTSDB may be broken in the same way, but Prometheus isn't.


I dunno. It really rocks. And we were so envious that Linux folks have Graphite/StatsD that we've ported it to .NET/Windows: https://bitbucket.org/aeroclub-it/statsify


Graphite is ugly. Graphite is hard to use beyond just clicking a few graphs. Graphite requires some hard work to extend. But I am still using Graphite because:

1) it is simple to set up!

2) it's nice to have a fixed-size flat file as the database, although performance degrades very quickly; cache misses are too frequent

3) sending data over a raw socket is also quite attractive (see the sketch below)
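
For context on point 3, Graphite's plaintext protocol is literally "path value timestamp\n" over TCP (port 2003 by default). A minimal sketch, with an invented host and metric path:

    # Push one data point over Graphite's plaintext protocol.
    import socket, time

    sock = socket.create_connection(("graphite.example.com", 2003))
    line = "myapp.requests.count 42 %d\n" % int(time.time())
    sock.sendall(line.encode())
    sock.close()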

No other solution can compete with Graphite for its simplicity. But there is so much more than just sending data to Graphite from collectd or your custom Python program...

Whenever I see OpenTSDB I sigh. Do we really need Hadoop-stack technology here? Yeah, I work for a small shop, but do I really want to maintain OpenTSDB... when I already have so many databases? I am choosing between Cassandra and PostgreSQL for my TSDB.


At the scale where it starts breaking, one doesn't directly consume the metrics. In our setup of over 500K metrics/min, we had scripts consuming app metrics from Graphite and pushing them back into a different, higher-level hierarchy, which was then consumed by Graphiti/Grafana.

The Graphite URL API was visionary, and still makes the cut in most common use-cases.


The UI leaves a lot to be desired; there could be better tooling around metric selection (maybe something like XPath?) for making rollup dashboards. Also, saving/editing dashboards could use a better UI, but that's only like 10% of the time I put into the software.


Live by the sword, die by the sword.

It is kind of amazing how hard it is for existing time series database systems to track the changing needs of the marketplace.


You mean for open source time series database systems run entirely by volunteers in their spare (personal) time? I know, right? What jerks.


What? I'm not saying they are jerks. I'm pointing out that it is harder for existing projects to track changes in the marketplace. It is hard for a new open source project to launch and build up enough of a community to be viable.


Can someone change the heading so it is understood as the software and not actual graphite?


Ok, we changed the title to use representative language from the article.

Btw, this article was heavily flagged. It's not really legit to flag a story just because people don't like the title. Plenty of good stories have problematic titles. Depriving others of a chance to read the content, especially when there's a good discussion going on in the thread, is a bad use of flagging power.


I didn't flag it, but perhaps others did because the author is a Graphite competitor, yet this isn't disclosed until the end.


It's not like this is a Medium post or something, it's on the company blog.


Not a lot of useful information for such a long article.


Agreed. I clicked because I was curious why people stopped liking the carbon form.


I had assumed it was this Graphite: http://scripts.sil.org/cms/scripts/page.php?site_id=projects... , probably because I'm a font and writing systems geek.


I also thought this was the physical substance graphite.


Seriously


Here I was thinking I would read some informative article on graphite, graphene and related carbon derivatives. Dear programmers: pick better names.


Naming is a hard problem in computer science :)


Like they say, the two fundamentally difficult problems in CS are cache invalidation, naming things, and off by one errors.



