Very nifty. The automatic selection is a great innovation.

I built a somewhat similar system a while ago on top of statsd/graphite. Mine wasn't designed for production deployment, though, just as a test platform. (I was basically using graphite to store and query metric data. Not optimal, but that problem was out of scope and graphite was easy to abuse that way.) The tool let a user manually select a set of metrics and create fault classifiers from those metrics.

These classifiers could detect not only the presence of faults but also what type of fault had occurred, provided there was sufficient training data. Of course, you could train new classifiers with data you collected in production, so training becomes an ongoing activity. We were only testing geometric classification, but using any sort of classifier to identify complex fault types seems like an idea with promise.
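
To make the idea concrete, here is a rough sketch of one possible reading of "geometric classification": a nearest-centroid classifier over vectors of selected metrics. This is not the code from my system. The metric choices (CPU utilization, I/O latency in ms), the fault labels, and the function names `train_centroids` and `classify` are all made up for illustration.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Average the metric vectors seen for each label.

    samples: iterable of (label, metric_vector) pairs.
    Returns a dict mapping each label to its centroid.
    """
    sums = {}
    counts = defaultdict(int)
    for label, vec in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, vec):
    """Return the label whose centroid is closest (Euclidean) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical training data: [cpu_utilization, io_latency_ms] per sample.
training = [
    ("healthy", [0.20, 50.0]),
    ("healthy", [0.30, 60.0]),
    ("disk_full", [0.25, 900.0]),
    ("disk_full", [0.35, 1100.0]),
    ("cpu_saturated", [0.95, 70.0]),
    ("cpu_saturated", [0.99, 90.0]),
]

centroids = train_centroids(training)
print(classify(centroids, [0.97, 80.0]))
```

In practice you would probably want to normalize the metrics first, since here the latency axis swamps the CPU axis in the distance calculation. Retraining with production data is then just a matter of calling `train_centroids` again on the larger labeled set.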