I read the "Why do this project" section on the site, and am not convinced. Nagios is a very actively maintained project, and did have major features added about a year and a half ago. It is totally suited for use in large installations, especially with the help of existing third-party enhancements like GroundWork Monitor. And on the non-issue of wanting a flexible Nagios: a lack of flexibility is NOT one of Nagios' problems. It is insanely flexible, and has tons of add-ons and plugins. Also, being written in C, it performs very well, especially the current version. I've looked at the code, and it's not bad - maybe the Shinken authors are just intimidated by C.
Is there anyone out there who uses Nagios (and by use, I mean with more than a hundred services being actively monitored) who finds it lacking? Nagios is basically an engine for state tracking, notification, and command dispatch. Ironically, the base set of Nagios packages isn't capable of doing any monitoring - you need to add plugins to fill out the command library to do that.
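The plugin side of that split is a small contract: a plugin prints one status line (optionally followed by `|` and performance data) and reports its state through its exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. A minimal Python sketch of a disk check, assuming the hypothetical `evaluate` helper and the 80%/90% thresholds picked purely for illustration:

```python
import shutil
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(pct_used: float, warn: float, crit: float) -> tuple[int, str]:
    """Map a disk-usage percentage to a status code and a plugin output line."""
    if pct_used >= crit:
        code = CRITICAL
    elif pct_used >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is for humans; text after it is performance data
    # in the label=value[UOM];warn;crit form.
    line = f"DISK {LABELS[code]} - {pct_used:.1f}% used | used={pct_used:.1f}%;{warn};{crit}"
    return code, line


def main(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate(100.0 * usage.used / usage.total, warn, crit)
    print(line)
    return code


if __name__ == "__main__":
    # Nagios reads the state from the process exit code.
    sys.exit(main())
```

Nagios itself never looks inside the check; it only runs the command, records the exit code for state tracking, and decides whether to notify.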
Nagios, to me, is like cvs, jira, cacti, sendmail, bind - just one of those sysadmin toolkit apps that works pretty well, and I've never been inspired to invest the time in learning the alternatives - never really needed them. (Yes - I know git/subversion blow cvs away, and postfix is probably the right MTA to use. I think bind is actually the only one on my list that still has mindshare. Issue tracking (jira) is so all over the place that I'm guessing there is no "one popular" system out there.)
We've been using a different Nagios alternative, Zabbix, for quite a while, and I'm always surprised how few people have heard of it. The UI takes getting used to, but it's full-featured and dependable.
Happy Zabbix user here. It's true, the UI is not especially obvious, but it works quite well, and with its graphing capabilities it is also a reasonable replacement for Cacti.
Does anyone have experience running large-scale deployments of this? I am not a fan of Nagios itself (last I looked at the code it was a horrible Perl mess), but the idea itself is great.
Not very likely, as the project itself only got a wiki and a mailing list in January... But the architecture seems better than Nagios, tbh: more distributed, and with more options for redundancy.
It's up on GitHub now. It works, but we're still working out the config file format, and it's only just now able to send alerts. Lots of work to do before we could even call it alpha...
I've always thought that Nagios configuration was pretty good. For small organizations, it's simple enough with its object/template infrastructure (particularly now that you can create custom properties and pass arguments to NRPE) that most sysadmins can get a monitored environment up in 30 minutes, and then add devices at a rate of 2-3 minutes per system, each one inheriting all the monitors for its object class.
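The object/template inheritance described above works by having a definition `use` a template; templates carry `register 0` so they are never monitored themselves, and values set locally override inherited ones. The Python sketch below only models that resolution (it is not how Nagios implements it), handles a single template per `use` line, and uses illustrative names and values:

```python
# Hypothetical object store modelling Nagios host definitions. Templates carry
# register=0, as in Nagios, so they are never monitored themselves.
OBJECTS = {
    "generic-host": {"register": "0", "max_check_attempts": "3",
                     "check_interval": "5", "notification_period": "24x7"},
    "linux-server": {"register": "0", "use": "generic-host",
                     "check_interval": "2", "contact_groups": "admins"},
    "web01": {"use": "linux-server", "host_name": "web01",
              "address": "10.0.0.11"},
}


def resolve(name: str, objects: dict = OBJECTS) -> dict:
    """Flatten an object's `use` chain; nearer definitions win."""
    obj = objects[name]
    parent = obj.get("use")
    merged = resolve(parent, objects) if parent else {}
    merged.pop("register", None)  # 'register' is not inherited
    merged.update(obj)            # local values override inherited ones
    merged.pop("use", None)
    return merged


if __name__ == "__main__":
    for key, value in sorted(resolve("web01").items()):
        print(f"{key:20} {value}")
```

Adding another web server is then a three-line definition that picks up everything `linux-server` and `generic-host` already specify.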
When organizations get larger, the configs are straightforward enough that machine generation (which is absolutely necessary when you're monitoring tens of thousands of services) is also relatively straightforward.
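Machine generation of this kind can be as simple as rendering definitions from an inventory. A minimal Python sketch, assuming a hypothetical `Host` inventory record and that the `linux-server` template from Nagios's sample configuration is defined; service checks are assumed to be attached to the generated hostgroups by hand-written service definitions, so each new host inherits its role's monitors:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    address: str
    role: str  # hypothetical inventory field, e.g. "web" or "db"


def render(hosts: list[Host], template: str = "linux-server") -> str:
    """Emit Nagios host and hostgroup definitions for an inventory."""
    blocks = []
    groups: dict[str, list[str]] = defaultdict(list)
    for h in hosts:
        groups[h.role].append(h.name)
        blocks.append(
            "define host {\n"
            f"    use        {template}\n"
            f"    host_name  {h.name}\n"
            f"    alias      {h.name}\n"
            f"    address    {h.address}\n"
            "}\n"
        )
    # One hostgroup per role; services defined against the hostgroup
    # apply to every member automatically.
    for role, members in sorted(groups.items()):
        blocks.append(
            "define hostgroup {\n"
            f"    hostgroup_name  {role}\n"
            f"    alias           {role} servers\n"
            f"    members         {','.join(members)}\n"
            "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    inventory = [
        Host("web01", "10.0.0.11", "web"),
        Host("web02", "10.0.0.12", "web"),
        Host("db01", "10.0.0.21", "db"),
    ]
    print(render(inventory))
```

Because the inventory is the single source of truth, regenerating the files after each change keeps thousands of definitions consistent without hand-editing.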
My company is dropping north of $1mm on ITIL practices, ticketing, trending, and KPI generators to depart from our current Jira/Cacti/Nagios world, but Nagios is one of the products that we're continuing to use for the foreseeable future. (Eventually that will change - I've discovered that billion-dollar corporations would rather spend $500K per framework and then start adding developers (at $150K fully loaded) to extend that platform with proprietary in-house systems, rather than simply buy an open-source product for $5K and do the same thing.)
It may just be that the Nagios developers have a different cognitive style than I do, because I find the configuration language irritating in the extreme. So much so that I end up using templates to script the configuration as much as possible. Somehow monit has always felt more comfortable in my hands.
Interesting. About the only "Nagios annoyance" I have (and I don't really see a good way around it) is that you don't specify the commands you are running for a service check; instead you have to reference your command library, which in turn holds the name and calling order of the actual parameters.
I realize there is no way to update the command/parameters you are calling in one place without this, but the command entry usually maps almost _completely_ onto the command I'm calling (parameters, command name, etc.), so its addition has never seemed that critical, and it always results in one more "lookup" when trying to figure out what a service check is doing.
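The indirection being described can be modeled in a few lines: a service's `check_command` names an entry in the command library plus `!`-separated arguments, and the final command line comes from substituting `$ARGn$` and other macros into that entry. A simplified Python sketch, assuming a hypothetical two-entry command library, a `$USER1$` of `/usr/local/nagios/libexec`, and only the `$USER1$`, `$HOSTADDRESS$` and `$ARGn$` macros (real Nagios supports many more, plus escaping of `!`):

```python
import re

# Hypothetical command library, in the spirit of Nagios's commands.cfg.
COMMANDS = {
    "check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
    "check_http": "$USER1$/check_http -H $HOSTADDRESS$ -u $ARG1$ -w $ARG2$ -c $ARG3$",
}
# Assumed resource macro; normally defined in resource.cfg.
USER_MACROS = {"USER1": "/usr/local/nagios/libexec"}


def expand(check_command: str, host_address: str) -> str:
    """Resolve 'name!arg1!arg2...' into the command line that would be run."""
    name, *args = check_command.split("!")
    macros = dict(USER_MACROS, HOSTADDRESS=host_address)
    macros.update({f"ARG{i}": a for i, a in enumerate(args, start=1)})
    # Replace each $MACRO$; unknown macros are left untouched.
    return re.sub(r"\$([A-Z0-9_]+)\$",
                  lambda m: macros.get(m.group(1), m.group(0)),
                  COMMANDS[name])


if __name__ == "__main__":
    print(expand("check_nrpe!check_disk", "10.0.0.11"))
```

Running `expand("check_nrpe!check_disk", "10.0.0.11")` yields `/usr/local/nagios/libexec/check_nrpe -H 10.0.0.11 -c check_disk`. That final command line never appears in the service definition itself, which is exactly the extra lookup being complained about.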