At any given time, there are a near infinite number of things your business "needs" to do: add a dashboard, A/B test your landing pages, test your database backups, fix your security holes, improve your customer service, add more metrics tracking, etc, etc. The list goes on and on.
A big part of running your business involves looking at this list and picking the ones that make sense for your business, not what the blog-post-du-jour is telling you.
Those are all good things but to me there's only one thing on your list that can instantly end just about any company with no way to recover. That's not having working backups. I would always put that one first!
I'd argue that having a well illustrated overview of your whole company helps you prioritize this list.
So put business and technological metrics on that dashboard. And then you can decide whether you should be focusing most on customer support, A/B testing, or that network bottleneck.
I won't disagree, but my main point is that everything has a cost. These blog posts often present what your business "needs" to do in a context-free manner.
If you are sitting on your thumbs looking for something to do, then definitely add a dashboard. If you're like me and have a backlog a mile long, maybe a dashboard isn't the best way to spend your time.
> "maybe a dashboard isn't the best way to spend your time."
The point of the dashboard is that when you have one, you have actual accessible data from which you can plausibly determine what is the best use of your time.
e.g. If you can see that conversions are already coming in faster than the database is scaling then you know that you need to solve the scaling problem before you worry about A/B testing. Or vice-versa. You don't need to guess (as much).
That's a rather simplistic view and a slippery slope.
Spending time on a pretty dashboard is one of the easiest ways to get drawn into vanity metrics and fapalytics.
Proper monitoring is important, but that usually means completely different things for different departments.
The CEO doesn't need to see database performance charts because he can't make sense of them anyway. Likewise the admin doesn't need to see conversion charts because they are not relevant to his work.
Trying to squeeze unrelated data into a common dashboard often leads to false correlations and entire herds of shaved yaks.
You provided a nice example for a false correlation yourself: The conversion-rate almost never relates to database performance in any meaningful way.
I think the biggest benefit to the dashboard is (compared to something like A/B testing) it is much less iterative, so with the exception of adding another graph here or there, when it's "done" it's done. If you have the dashboard at all, it continues being useful. I've never heard anyone say, "Finally! We're done with our A/B testing! On to the next to do item..."
I think it can be argued that a dashboard isn't as great as this post makes it sound. The implication is that there's one person or one team looking at the dash and making sure things are running smoothly. Instead you can break up that big dashboard into a series of smaller ones, each watched over by the team it applies to. If your company's teams all work well together, then the suits can focus on business and everyone else works together to keep things working, arguably more efficiently. It's like having decentralization of responsibility instead of a top-down system where one person/group is charged with knowing what's going on everywhere at once.
I worked at a place doing network appliance stuff. We managed and developed the machines, and product to end users. A simple dashboard increased our effectiveness by an order of magnitude. Some simple graphs and deep knowledge of the code really really helped. Some data from other departments combined with this was amazing. A simple color-coded call queue count told us there was a problem before the front-line tech support people could even report it. Simple analysis of that queue told us where to look for problems: a predominance of one area code told us to examine specific machines (and which subset!), while a wide array of area codes told us that it was probably in billing or comms infrastructure. A couple of graphs of network traffic and system load told me what the problem could be. I don't know how, I couldn't write code to do it, but I certainly could get to the problem much faster with that minimal input from graphs.
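For the curious, the area-code heuristic above can be roughed out in a few lines of Python. This is only a sketch of the idea, not what that team ran: the function name, the "localized"/"widespread" labels, and the 50% dominance threshold are all assumptions made for illustration.

```python
from collections import Counter


def classify_call_queue(area_codes, dominance_threshold=0.5):
    """Guess whether queued calls point at a local or a system-wide problem.

    area_codes: list of area-code strings, one per call waiting in the queue.
    dominance_threshold: hypothetical cutoff; if a single area code makes up
    at least this share of the queue, treat the problem as localized.
    """
    if not area_codes:
        return ("quiet", None)
    counts = Counter(area_codes)
    top_code, top_count = counts.most_common(1)[0]
    share = top_count / len(area_codes)
    if share >= dominance_threshold:
        # One region dominates: look at the machines serving that area code.
        return ("localized", top_code)
    # Calls come from all over: suspect shared systems (billing, comms).
    return ("widespread", None)
```

In practice the threshold would need tuning against your normal call mix, since a region that always produces most of the calls would otherwise look like a permanent outage.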
My working theory on dashboards is this: the human mind is amazing at pattern recognition. We're wired for it. We can make intuitive leaps and see patterns that may be very hard to describe in math/stats or code. Particularly visual ones. So if you provide that data to your brain to crunch, you are enabling and augmenting your natural tooling. The graphs and stats should be as specific and pre-thought-out as possible, but they aren't perfect. Fortunately, as time progresses you learn what "looks right" and what "looks like a problem in subsystem Foo".
This isn't a silver bullet, but certainly it is a great tool. Since then, I've tried to never do work without some sort of visual feedback I can background my innate pattern matching on. Even if it is just scrolling logs -- these patterns emerge and provide clues even if you can't express what they are.
(Anecdote: I had built a demo a while back, and it hiccuped during the live presentation. Fortunately I was in the back of the room with my logs scrolling, and I noticed the logs looked wrong, so I found out a script had died. I restarted it, causing a weird blip in one of our display graphs, but the presenter noticed it before calling attention to that graph in the course of presentation. He glanced at me and I gave him the thumbs up and the audience never even noticed. What was the pattern? The scrolling in one of my log windows slowed down....)
I generally paraphrase this effect as such ;-): "There is no aberration detection mechanism more sophisticated than your marketing guy glancing at a dashboard while drinking his coffee."
I completely agree with your point 'the human mind is amazing at pattern recognition'. From personal experience, I can say I am a 'visual' person. (So much so that my startup is entirely based on this premise, because I think our eyes are amazing analytics tools.) But to my point: is that the case for everybody? Is everyone 'visual', or do some folks think better 'visually' than others?
On one side- yes it's supercool to have a dashboard with maps and metrics and everything like that. And I can see it could be really useful too when something's going wrong.
On the other hand, I'd be wary of building anything like this until I had a good number of customers paying me money for a service. Is the product so complete (I would ask myself) and are my users so happy that we have no more important task than building a dashboard to look at in the office? Don't some simple email alerts cover most of the important bases (server down, site down, etc.) for now?
Your startup needs customers (or failing that, users). Your users don't care if you've got a dashboard, they care that your product does what it says.
My sense is that people continue to think that monitoring is hard mostly because it used to be. There are great open source tools and services available now to make both collecting the data and watching it pretty simple to set up.
Yes you need some customers, otherwise you have no data to look at. However, as soon as you have _any_ customers if you aren't looking at real data to decide what is important to do next you are essentially just guessing.
For a long time that's what we did because tracking what was relevant was just too hard. Now what I see in a lot of startups that are leading the pack is that they are monitoring everything whether they think they'll need it or not - then when they have a critical question the data is already there.
Dashboards don't have to be expensive to build. My company uses Graphite now, but before we were at that level, I just had engineering give me credentials for our DBs, and used ODBC + Excel.
Next thing you know, you have a dashboard tracking:
- Real time revenue/revenue projections
- Real time payments funnel metrics (impressions on payments page, conversions, average transaction size)
- Real time tutorial completion rate by hour (aka application health monitoring)
- Other KPIs tracked by the day (revenue, install, etc.)
In fact, there was a period of time when my dashboard was the most effective health monitoring in the company, and detected an issue before engineering or ops.
I can't agree more. First you need a product that works at least for some users, then when you are expanding the user base it's time for dashboards.
A dashboard answers the question: is my product working for my users right now?
Eventually it can also show trends in terms of resource usage, strange peaks, etc. But the primary function of a dashboard is answering the question: "are my users getting the service?"
Additional note: a dashboard works wonders for getting us engineers to understand that there are real users out there right now, which makes us want to build an even better product.
This seems pedantic. Of course the product needs to work for some users before you can measure anything about it. Otherwise your dashboard is filled with zeros, which should be obvious because the product doesn't work for anyone (and isn't a product).
Once you've gotten to the MVP stage, you should be tracking the key metrics, and those should be easily viewable in some sort of dashboard. Tracking up-time (or if the service is working) is a small part of what should be measured. When you're just starting off, you need to quickly figure out if your market hypothesis is correct. You can't do that without data, and a dashboard is both easy to build and helps make more informed decisions for everyone involved.
No one is arguing that you should build a dashboard before you have an MVP. Once that happens, you need to be measuring things.
Dashboards are not just for "Is my product working for my users?"
They have a variety of purposes. E.g., our webapp is an open book: we display a live dashboard (https://my.infocaptor.com) of the app itself, because it is meant to be a dashboard platform.
The main purpose of dashboards is to keep a bird's-eye view and take action when something is off.
Don't track metrics that you don't have the time or resources to act on.
I have taken the time to build a graphite/statsd setup. It is over-engineered for now, but it grows and is really easy to add new metrics to. (there is even a fab file to install on ubuntu buried in github/lifeisstillgood/frozone)
but, you are right - a dashboard that measures how the site is going is useful. One that measures cashflow, email conversion rates and more is more useful. Linking those into graphite is not a good idea. A more traditional data collection is needed.
Graphite/carbon/whisper is an RRD-like tool - you give it a number of ticks labelled with a certain metric (facebook.photo.upload) and it counts the incoming ones over a fixed period, averages them, stores the average and goes on and on. Then it draws you a graph over those periods
this is great for "what is normal" and "something has changed over past 5 mins, 5 days, 5 weeks"
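To make the "average over a fixed period" idea concrete, here is a rough Python sketch. It is not Graphite's actual code, just the bucketing concept; the function name and the 60-second default period are assumptions for illustration.

```python
from collections import defaultdict


def bucket_averages(samples, period_seconds=60):
    """Average (unix_timestamp, value) samples into fixed-width time buckets.

    Returns a dict mapping each bucket's start timestamp to the mean of the
    values that arrived during that bucket. The individual samples are
    discarded, which is why this kind of store suits trends, not lookups.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in samples:
        # Round the timestamp down to the start of its bucket.
        bucket_start = int(timestamp) - int(timestamp) % period_seconds
        sums[bucket_start] += value
        counts[bucket_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

For example, samples at t=0 (value 2) and t=30 (value 4) land in the same one-minute bucket and are stored as a single average of 3.0.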
some things you want on your dashboard are absolutes that don't vary well with time (cash on hand/ assets vs liabilities)
also the whisper database is awkward to query if you are not aggregating - so you cannot store which customers actually responded to which email campaign - that needs a real RDBMS
So it's really good for watching stuff over time - like you used to use tail -f for. But other stuff needs to be captured robustly and then maybe time graphed
tl;dr
Some things are trends - you want to know easily and quickly which ones are going up, and when the trend goes haywire. Use graphite et al.
other things are "action this day" - trend or not you need the email addresses of all users on the blue campaign.
Using UDP you lose a not insignificant number as well
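For anyone wondering why packets go missing: a StatsD-style client typically just fires a small text datagram and never waits for a reply. A minimal sketch in Python, assuming a StatsD daemon on the conventional port 8125; the function names and the localhost default are illustrative, not taken from any particular client library.

```python
import socket


def format_counter(metric, value=1):
    """Build a StatsD counter line, e.g. b"signups:1|c"."""
    return f"{metric}:{value}|c".encode("ascii")


def send_counter(metric, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a counter increment over UDP.

    There is no acknowledgement and no retry: if the datagram is dropped
    on the way, or nothing is listening, the sender never finds out.
    host and port are assumptions; adjust them to wherever your daemon runs.
    """
    payload = format_counter(metric, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

That trade-off is deliberate: the app never blocks on metrics, at the cost of counts that are approximate rather than exact.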
Can't agree more. I almost always have email triggers built into the system for every critical error or failure of the system. When we are being lean on the core product itself, building fewer features with a build-it-when-you-need-it approach, having a dashboard is kind of a luxury.
Metrics are also about the state of your business. What is your growth rate? What is your churn? Are people only clicking on certain features? Where should you improve performance? These sorts of questions are essential for startups, especially in the early stages where you need to be careful where to allocate precious resources.
Yes you can sort of gauge that with Google Analytics but only to a certain granularity. Graphite is incredibly flexible.
I think your parent post is saying that metrics are for control and not planning. Once you've identified what your business depends on, and have identified good metrics that (at best) are proxies for those things, a dashboard is a great idea. If on the other hand you don't actually understand your business model, your first goal is to start to understand it, and fixating on a particular set of metrics will only make that harder IMO.
My new strategy is to instrument the hell out of things -- but only to look at data when enough of it has accumulated to be meaningful.
I think you can make a mistake of measuring and trying to understand data in increments that are so small as to be a waste of time. It can become just another way to not be minding your knitting. Server down? You need to know right now. Knowing that somebody considering buying your product is using an iPhone in New Zealand? Not so much. Computers are great at catching and storing all sorts of stuff. Unless it's critical, best to play with the bells and whistles in controlled deep dives, not as part of a Tony Stark video-game-master-of-the-universe thing. I love bringing up my real-time site metrics and watching all sorts of folks that I am helping. I also like to watch my stock portfolio move throughout the day. But I've found that there's a lot of navel-gazing going on there and not much of anything useful.
Don't look at measurements that generate no subsequent management activity.
The Grammar of Graphics (Statistics and Computing) by Leland Wilkinson is also great, though more focused on building graphing systems. I believe it is the inspiration for a lot of d3.js
http://www.amazon.com/The-Grammar-Graphics-Statistics-Comput...
That Stephen Few book is pretty much all you need, but you can also check out Edward Tufte's books, and Juice Analytics' white papers (http://www.juiceanalytics.com/).
We don't really cover any serious metrics, as we're mostly client work - but we did have a big screen that we wanted to use. It's become one of the coolest things in our office.
Something like this would have to be everywhere. I know I always wait a few days, before sending feedback that something doesn't work for me and sometimes I just stop using features like that. The chance to fix something before users would even start to complain is priceless.
Like any doctor would say, the sooner you find out something is wrong, the better. And if nothing goes wrong, you have a huge cool display of how awesome your company is doing.
Dashboards provide value on multiple levels. They radiate information out into your team so everyone is working off of the same shared understanding about the state of the business. They make it easier to detect aberrations. They enable your experts to curate a set of top-level metrics on a per-service basis to look at first when fire-fighting. Every service in your architecture should have its own dashboard.
If you're interested in dashboards, our startup recently launched a service that makes it simple to hook in your metrics through OSS tools like StatsD and build real-time dashboards with a few clicks: https://metrics.librato.com You just provide the data, we do all the rest. Would love any feedback you might have!
The only issue is the pricing can be a little unfair if you are aiming for more of a real time system since you can't define retention rate. It would be great to be able to model the same system used in Graphite i.e. real time for last 6 hours, then capture every 5 minutes for 6-24 hours and then every 10 minutes from 24 hours onwards.
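To put rough numbers on that tiered scheme, here is a small Python sketch that parses a Graphite-style retention string (the `precision:history` format used in storage-schemas.conf) and counts the data points each tier keeps. The 10-second "real time" resolution and the 30-day final tier are my assumptions; the comment above doesn't specify either.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(spec):
    """Convert a spec such as "10s", "5m", "6h" or "30d" to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def points_per_archive(retentions):
    """Return [(archive, number_of_points)] for a retention string."""
    result = []
    for archive in retentions.split(","):
        archive = archive.strip()
        precision, history = archive.split(":")
        result.append((archive, to_seconds(history) // to_seconds(precision)))
    return result


# Assumed tiers: 10s points for 6 hours, 5-minute points out to 1 day,
# 10-minute points out to 30 days.
RETENTIONS = "10s:6h,5m:1d,10m:30d"
```

With these assumed tiers the finest archive holds the most points per hour covered, which is presumably why per-resolution pricing bites hardest for real-time use.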
Thanks for the feedback :). We do provide a defined set of rollups today: 1m, 15m, 1h ... but in the future we intend to let users configure that themselves. If you want to shoot me an email, we can chat about the realtime scenario, I'm always interested in testing our pricing hypothesis against real-world use-cases.
All of this is well and good, but it's kind of like a "step 1, step ...., PROFIT" joke. The difficult part is picking the right metrics and, often, creating and maintaining them.
It's worth noting that dashboards and instrumentation are one of my key job functions, as they are for all Google SREs. Having the data to tell when you're doing well and when you're failing is key. Without knowing you're broken, how do you know what to fix?
I agree with this. Our version is a plain ol' chalkboard, but it covers the entire wall. We mainly use it for the current state of the company - projects, action steps, projections, etc. Incredibly useful and has a UI everyone understands.
It's not only start-ups that need dashboards or visual metrics; established companies can benefit from more information about how their services/applications/etc work.
It's one thing to give the numbers to the management group, but they're more readily accepted when combined in a visual manner, i.e. a dashboard.
My current employer is only now beginning to see the advantage of collecting good metrics and how it shows the behaviour of both the users and the application.
Their decision making and planning becomes more accurate over time due to having good information at their fingertips rather than an educated guess because originally the system was a black box.
Dashboards are necessary but not sufficient. As your company grows, you also need to send periodic report summaries to people who can act on them. While you certainly need instant pings (emails/IMs/Tweets) in case of major issues (server down, $10k+ shipment got delayed etc.), for the most part, you need to let someone spend 15-30 mins a week to analyze the data and make rational decisions based on that. Dashboards can't help with that, and creating too many dashboards will take up valuable resources that could have been used to make more effective reports.
The coolest "hack" about this is that they put the dashboard on several wall-mounted flatscreens.
It doesn't even matter what sort of logging/stats drawing software they use, it could just as well be a tiled window manager with some Google Analytics views on auto-refresh and a colour-coded log tail :)
Not saying you can't do much better than that by picking smart visualisations, but the biggest win is having a few dedicated screens mounted on the wall continuously displaying info, for everyone in the office to glance at.
Not to mention it looks damn cool and impressive :)
GeckoBoard - Dashboard as a service - Can be coerced into creating whatever kind of charts/graphs you want on a slick looking board, but is set to do polling of data which is less helpful for system operations stuff, but nicer for marketing, MAU/DAU monitoring, signups, etc.
I'd also be careful to avoid vanity metrics. I think your time is better spent building a dashboard for actionable metrics. A map of all of your api calls might look cool but what are you going to do with that? I'm more interested in average customer life-time value, churn, customer acquisition cost, retention etc... I've been looking into using geckoboard for this as it seems like it would make setup go much faster.
If you suddenly see a huge cluster of API calls in a single geographic area, that might point to an issue, such as a problem connecting to the local CDN. The stuff you're talking about is actionable from a business standpoint, but the stuff in the article is meant to be actionable from a performance and problem resolution standpoint.
I really, really want to see a beautiful, open source dashboard app, preferably written with Rails.
So far, I've only seen beautiful but closed source SaaS apps, and pretty ugly open source PHP or Python projects.
I'm very tempted to start my own, but I just wish someone like Panic would open source their awesome design.
Many moons ago (20+ years) I had a summer job in an industrial consumer analog photo lab. At the entrance to the coffee machine / resting area there were hand-plotted charts, updated daily, showing the velocity of orders and our fulfillment compared with expected orders, along with historical highs and lows. This let the workforce (40+) transparently see what shape we were all in with regard to the business ...
If you are looking for custom charts for graphite, shoot me an email: zack@zacharymaril.com. I've done d3 consulting previously and can get it done for you. </plug>
For those that are considering this you can capture from all three areas of your 'stack' straight into Graphite:
1) Munin. There are hundreds of munin plugins around for monitoring every part of your infrastructure from CPU to MongoDB replication lag to number of Nginx 404s. Use Munin2Graphite bridge.
2) App. There are lots of statsd clients which allow you to measure all your business metrics. Signups. Retention rate. Revenue rates etc.
3) Front End. With the NginxStatsd module you can push stats directly into statsd from your JavaScript methods. Use it to do end-to-end timing, capture what the user is doing, etc. without putting any load on your app.
Sure it's a bit of work but done well it can save you a lot of money over products like MixPanel or NewRelic.