If someone would suggest a reliable, free source of realtime data that won't fall under peak load from HN, I'd be happy to replace the random walk data source with something more realistic. (For example, stock data, wind measurements, etc.) I added a sine to the random walk to make it a bit more interesting, but these visualizations are much more compelling with real data.
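For the curious, the demo data is just a custom Cubism metric callback; a rough sketch of a random walk with a sine mixed in (the clamp and the sine coefficients here are illustrative, not the exact demo values):

    var context = cubism.context()
        .step(1e3)    // one value per second
        .size(960);   // 960 values per horizon

    // Synthetic metric: a bounded random walk plus a sine term,
    // so the chart shows both noise and a slow oscillation.
    function randomPlusSine(name) {
      var value = 0, values = [], i = 0, last;
      return context.metric(function(start, stop, step, callback) {
        start = +start, stop = +stop;
        if (isNaN(last)) last = start;
        while (last < stop) {
          last += step;
          value = Math.max(-10, Math.min(10,
              value + Math.random() - .5 + .4 * Math.sin(i += .2)));
          values.push(value);
        }
        // keep only the values that fall inside [start, stop]
        callback(null, values = values.slice((start - stop) / step));
      }, name);
    }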
We're now using this for production metrics at Square - thanks to the built-in Graphite support - and it's changed everything about how we can visualize them. This is faster, more intuitive, and more beautiful than anything that rrdtool graph or Graphite's grapher can kick out. Absolutely recommended to anybody who needs realtime visual feedback about system performance.
My startup uses it in production. We probably send a couple million events a day and have a big dashboard with around 40 graphs on it, plus other dashboards split out for viewing specific things. It works pretty well now that we've got MongoDB tuned properly. We're receiving events via UDP (which we had to add ourselves, but which I believe is supported out of the box in the new version).
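For anyone curious what that looks like on the wire, here's a rough sketch of emitting a single event to a Cube UDP collector from Node; the port, host, and field values are placeholders for your own setup, but the type/time/data shape follows Cube's JSON event format:

    var dgram = require("dgram");

    // One Cube event: "type" names the event stream, "data" is arbitrary JSON.
    var event = JSON.stringify({
      type: "request",
      time: new Date().toISOString(),
      data: { path: "/checkout", duration_ms: 42 }
    });

    var socket = dgram.createSocket("udp4");
    var message = new Buffer(event);
    socket.send(message, 0, message.length, 1180, "127.0.0.1", function() {
      socket.close();
    });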
I don't think it's meant to be used as a full-on data warehouse. The closest thing people usually compare it to is Graphite.
This new library looks great; it basically lets you do fancy visualizations of time series with data from any time series database. There's Graphite import code already. Square's related project Cube includes its own datastore; Cubism.js just pulls out the visualization part to use with your own datastore.
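If you already run Graphite, wiring an existing series into a horizon chart is short. A sketch, with the Graphite URL and target expression as placeholders:

    var context = cubism.context()
        .step(10e3)   // 10-second resolution
        .size(960);

    var graphite = context.graphite("http://graphite.example.com");
    var cpu = graphite.metric("servers.web01.cpu.user");

    d3.select("#charts").selectAll(".horizon")
        .data([cpu])
      .enter().append("div")
        .attr("class", "horizon")
        .call(context.horizon());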
This is interesting, though I do want to ask why you went with stacked graphs as opposed to overlaid line graphs. Stacking makes sense when comparing disparate data points (Load vs Memory vs HTTP Requests), but are you doing this with similar data points as well?
The full screen demo is really what triggered the question for me. I've got a bunch of servers that we monitor and have found that a single graph that shows load for every server gives me a better indicator for those that are outliers. Perhaps you're just not using this for that purpose.
Would love to hear some of the use cases, are you doing all of your graphing with this?
We use it for both, but most commonly we use it to compare a single metric (such as CPU or network utilization) across hosts.
Line graphs work well when you only have a few hosts, but start to suffer when you try to plot many hosts simultaneously. Depending on how noisy your metrics are, line graphs are good for showing the envelope, but are less effective at revealing when one or two hosts are behaving oddly; the anomalies get lost in the mess of lines. (It doesn't help that the default colors in Graphite are bad; the host that happens to be assigned bright yellow against a white background becomes much harder to see.)
Small multiples give each host a dedicated row, so you don't have to worry about occlusion or distraction. However, that requires more vertical space, which is why you need horizon graphs or a similar technique to compress them. Scrolling with a vertical rule also helps detect coincident anomalies across metrics.
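A rough sketch of that small-multiples layout with Cubism (host names and the metric path are placeholders):

    var context = cubism.context().step(10e3).size(960);
    var graphite = context.graphite("http://graphite.example.com");

    // One horizon row per host for the same metric.
    var hosts = ["web01", "web02", "web03", "db01"];
    var metrics = hosts.map(function(host) {
      return graphite.metric("servers." + host + ".cpu.user");
    });

    d3.select("#hosts").selectAll(".horizon")
        .data(metrics)
      .enter().append("div")
        .attr("class", "horizon")
        .call(context.horizon().height(30));   // short rows keep many hosts on screen

    // Shared vertical rule for spotting coincident anomalies across rows.
    d3.select("#hosts").append("div")
        .attr("class", "rule")
        .call(context.rule());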
Thanks for the insight. I hadn't seen horizon graphs previously; they're interesting and a great way to compact data while maintaining readability.
Regarding line graphs, there is some visibility loss when a single line graph shows a ton of servers. I tend to think that deviations worth noting stick out. That isn't always going to be the case, I suppose, but nothing is 100%.
I'm just not sure how this scales if I've got 100 servers with 2-3 metrics each that I'm tracking. Perhaps this is better suited to a detailed view than to a dashboard where more density is needed. They're certainly more readable than what we use today.
Really nice graphics. It might be interesting to be able to select a region and have a pop-out that zooms into that region and maybe shows additional information. For example, when showing CPU performance, I would like to select a region where performance declines and get a pop-out with a more detailed view that includes network, disk access, and memory statistics. I am sure there are many more uses though. Keep up the nice work!
Edit: Not really the purpose I know but it was just something that struck me as a possible use case.
Lots of things might not work in nightly builds. That's why they're nightly. ;) I tried the latest Mac OS X WebKit r115090 built on 24 April 2012, and it worked fine for me. If the latest nightly doesn't work for you, please let me know by filing an issue on GitHub and giving more details about your OS and version so that I can track down the problem. Thanks!
If by "amazing" means "I need an explanation to see what I'm seeing", then I agree with you.
Analytics visualizations are great when they're pretty, but if they aren't comprehensible, you've failed. In the vast (>1) user testing I've done with horizon graphs, they just aren't grokkable.
People in general don't want to look stupid, so they aren't going to volunteer a "hey, what the heck am I looking at here?" They'll just assume they aren't smart enough to get it and be quiet.
Agreed; horizon charts take some explanation. Once you understand what they're showing, though, they become quite natural. That makes them suitable for dashboards monitored by a regular audience (say, production engineers or stock traders), since the viewer learns the technique once and then benefits from the concise representation. I probably wouldn't recommend them for a mass audience in a one-off visualization, though, where the effort to understand might exceed the viewer's patience for a single viewing.
I think most visualizations require a bit of explanation to understand. The standard rrdtool graphs in Munin, for instance, are surprisingly complicated to understand the first time you see them. And smokeping is a fabulous tool despite having a totally idiosyncratic display. Horizon charts also take a little time to get, but not much.
This appeals to an intuitive sense, but I question the validity of claiming that there exists an information dense format that does not require training to read. Do you have an example of one?
Part of the Cube 0.2.0 release: https://github.com/square/cube/wiki/Release-Notes http://square.github.com/cube/