
The Graphite instance had the metrics, not Grafana. I don't know how many were actually graphed. One thing I can be sure of is that they'd all been pushed within the last week; otherwise they'd have been deleted.
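
For anyone wondering what "deleted if not pushed within a week" can look like in practice, here is a rough sketch. It assumes metrics are stored as Whisper `.wsp` files on disk and that staleness is judged by file modification time. That is a common setup, but I'm not claiming it's exactly what this instance did. The function name and the root path in the usage note are made up for illustration.

```python
import os
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # one week, in seconds


def stale_whisper_files(root, max_age_seconds=SEVEN_DAYS, now=None):
    """Yield paths of .wsp files under `root` whose mtime is older than the cutoff.

    Assumption: a metric that has not been written to recently has an old
    file modification time, so mtime is a usable proxy for "last pushed".
    """
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue  # ignore anything that is not a Whisper file
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) > max_age_seconds:
                yield path
```

Something like `for p in stale_whisper_files("/path/to/whisper"): os.remove(p)` run from a daily job would then do the cleanup. List the results before deleting anything.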

At the time there were about 200 dashboards. Each was controlled and curated by the team that owned it. It was pretty much the only shared tool that worked well. The only thing I encouraged was tagging, but even then, teams mostly did it themselves to make finding things easier.

There were about 80 active products, most had _a_ dashboard.

The crucial thing was that it doesn't cost much to record those metrics. This means that post-incident we can easily put an alert in, or prove x affects y because z.

Limiting the number of metrics recorded is frankly silly. Enforcing rules about quality and location, certainly; it's something I spend a reasonable amount of time on.

For example, the front end was a microservice. Each HTTP call of each microservice was graphed, which allowed quick and simple diagnostics for general performance. Most of the time it's not needed, but when you _do_ need it, it's critical to have context.
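
To show how little it takes to record a per-call timing, here is a minimal sketch using Graphite's plaintext protocol (one `path value timestamp` line per datapoint, conventionally sent to port 2003). The metric path, the helper names and the host default are all assumptions for illustration, not what the services described above actually used.

```python
import socket
import time
from contextlib import contextmanager


def format_metric(path, value, timestamp):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp\\n'."""
    return f"{path} {value} {int(timestamp)}\n"


def send_lines(lines, host="localhost", port=2003):
    """Send pre-formatted lines to a carbon plaintext listener (host/port assumed)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall("".join(lines).encode("ascii"))


@contextmanager
def timed(path, sink, clock=time.time):
    """Time the wrapped block and append a millisecond datapoint to `sink`.

    `sink` is any list-like buffer. A single clock is used for both the
    duration and the timestamp to keep the sketch short.
    """
    start = clock()
    try:
        yield
    finally:
        end = clock()
        elapsed_ms = round((end - start) * 1000, 3)
        sink.append(format_metric(path, elapsed_ms, end))
```

Usage would be along the lines of `with timed("frontend.http.get_user.ms", buffer): call_service()`, then flushing `buffer` with `send_lines` every few seconds. Batching like that keeps the cost per call down to a list append.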



