If you’ve followed my blog over the past few weeks, you know I’ve been doing quite a bit of work on the Munin and Nagios packaging (I leave Icinga to prometheanfire!), as well as working closely with Munin upstream by feeding them patches. Yesterday I actually got access to the Munin contrib repository, so now I can help make sure that the plugins reach a state where they can actually be redistributed and packaged.
I also spent some time cleaning up what were once called nagios-nsca and nagios-nrpe (now just nsca and nrpe, since they work just fine with Icinga as well and the nagios- part was never part of the upstream names anyway; kudos to Markos for noticing I didn’t check correctly for revdeps, by the way). You now get minimal USE flags that you can turn on to avoid building their respective daemons, although you have to remember to enable minimal for nsca on the nodes, and for nrpe on the master. Both also come with new init scripts that are simplified and leverage the new functionality in OpenRC.
There is, though, something that keeps me from sleeping well, and that’s besides the usual problems I have with sleeping. Let me try to explain.
Some of the Nagios/Icinga tests obviously can’t be executed remotely as they are: things like S.M.A.R.T. monitoring need to be executed on the box they monitor. So how do you fix this? You use nrpe, the Nagios Remote Plugin Executor. This is basically a daemon that executes commands on the node (more or less the way Munin’s node works). Unfortunately, unlike Munin, neither Icinga proper nor NRPE allows you to choose on a per-plugin basis which user to run as (to do so, Munin has its node running as root).
Instead, everything is executed by the nagios user, and if you need to access something that the user can’t access, you can work around it by using a setuid-root plugin (these are tied to the suid USE flag for nagios-plugins in Gentoo). But this, of course, only works for binaries, not scripts. And here’s the first problem: to check the S.M.A.R.T. status of an IDE drive, you can use the check_ide_smart tool, which reimplements the whole protocol; to check the status of a SATA drive you should use check_smart.pl, which uses SmartMonTools to take care of it.
But how can the script access the disk? Well, it does it in the simplest way: it uses sudo. Of course this means that the nagios user has to have sudo access. Afraid that this would get people to give unconditional sudo access to the nagios user, I decided to work around it by installing my own configuration file for sudo in the ebuild, making use of the new /etc/sudoers.d folder, which means that on a default install, only the commands that are expected will be allowed for the nagios user. And this is good.
But sometimes the plugins themselves don’t bother to use sudo directly; instead they rely on being executed by a user that has enough privileges. For this reason, the nrpe configuration allows you to prefix all commands with any command of your choice, with the default being… sudo! And their documentation suggests making sure that the user running nrpe does not have write access to the directory, to avoid security issues… you can understand that it’s not the only bad idea you could have, there.
Sigh, this stuff is a complete security nightmare, truly.
I have been looking into Op5-Ninja as a replacement UI for Nagios/Icinga (Icinga will require some changes to make this work, but we will look at those after the release of Nagios4). There are some extra checks provided by the Op5-plugin project; those rely on sperl, which causes some complications, as sperl has already been dropped upstream while Op5 builds much of its stuff on top of RHEL, which uses a quite old version of perl, so your sudoers stuff may come in handy.
You might also be interested in taking a look at the Shinken Project (http://shinken-project.org). It’s a Python-based re-implementation of Nagios with a strong focus on decentralized setups, modularization, massively parallel processing and distributed configurations. I haven’t taken a closer look at it yet, but to me it seems that its decentralized and modularized concept might solve some of your security nightmares.
Sorry, had the wrong URL in my previous post. The correct one is: http://www.shinken-monitori…
From a security perspective, the first question should be: what kind of exposure do I have with the chosen piece of software? NRPE waits for check requests, and it is the first point of entry to be exploited. After that, running a bunch of checks with sudo privileges and read/write access to all kinds of directories (required by some plugins) is indeed a nightmare. One alternative is having the checks run from cron and placing their output in a directory that is read-only. The NRPE daemon reads this data and returns the output. Or push this out even more: have the check results returned using some type of NSCA mechanism (*MQ, NSCA, TSCA (Shinken), WebNSCA, collectd, etc…). This eliminates the daemon exposure and can permit some checks to actually run as root or as another, less privileged user. Is this perfect? No… There is no possibility to modify the behavior of remote checks for dynamically changing monitoring aspects. These remain within the configuration management sphere (Puppet/Chef/Cfengine/etc). By definition, monitoring systems that have remote access (read/write) to anything are bad news! As the monitoring systems talk to everything, they are very exposed to becoming a choice target.
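The cron-plus-cache idea in this comment can be sketched roughly as follows. This is only an illustration under assumptions, not something NRPE or any plugin ships: the JSON file layout and the function names `write_result` and `read_result` are made up for the example. The privileged cron job writes each result atomically, and the unprivileged reader only ever opens the file for reading, reporting UNKNOWN when the cached result is missing or too old.

```python
import json
import os
import tempfile
import time

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def write_result(path, status, message, now=None):
    """Privileged side (run from cron): store one check result atomically."""
    record = {"status": status, "message": message,
              "timestamp": time.time() if now is None else now}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
    os.chmod(tmp_path, 0o644)   # readable by everyone, writable only by the owner
    os.replace(tmp_path, path)  # atomic rename: readers never see a partial file


def read_result(path, max_age, now=None):
    """Unprivileged side: return (status, message) from the cached file."""
    now = time.time() if now is None else now
    try:
        with open(path) as handle:
            record = json.load(handle)
        age = now - record["timestamp"]
        status, message = record["status"], record["message"]
    except (OSError, ValueError, KeyError, TypeError):
        return UNKNOWN, "no usable cached result"
    if age > max_age:
        # A stale file usually means the cron job stopped running.
        return UNKNOWN, "cached result is %d seconds old" % age
    return status, message
```

The staleness check matters as much as the read itself: without it, a cron job that silently died would keep reporting its last good result forever.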
To be honest, there are only a few plugins which need sudo or root access; the majority of checks do not need such access. Most of your services can be monitored via SNMP. The problem exists only where you have no SNMP access. On my own installation I need 3 plugins which use sudo: one to ask for the S.M.A.R.T. status, another to ask for the md_raid status, and the third is rndc, to get BIND statistics. So I have 122 services in Icinga and around 10 of them use sudo; that is around 10% which need this kind of access. I wouldn’t say it is a security nightmare. I do not really need to know the BIND statistics and the S.M.A.R.T. status of a hard disk, since I am able to get the most important information via SNMP. So generally speaking, I agree with you pointing it out as a security risk, but only if you need certain specific information from a service or a piece of hardware that you are not able to get via SNMP. And it is not a problem with Icinga, Nagios or Munin; in my opinion it is more a problem of the particular plugins that need such access.
I think people here are astronomically missing the point. Yes, I know that _most_ checks don’t need root access; the few that do can be worked around with proper ACLs or things like SELinux. The security nightmare is that the default suggestion for Nagios and NRPE is to let them use sudo, unconditionally. Setting them up with sudoers.d? That’s safe enough for me.
Yo Diego, you really should check out JFFNMS. It’s based on PHP, easy to understand and very extensible. Not only that, the dev team is quite responsive and encourages both patches and ideas on the direction the software should go. It used to be in Portage, but got orphaned after a dev left… I’d greatly appreciate your evaluation of JFFNMS vs. Nagios et al. James
Hi Diego, you might want to have a look at this: https://github.com/mxey/ssh… With the proper restrictions (no X11, no forwarding, etc.) in authorized_keys, the user can only execute pre-defined pseudo-commands, to which he can pass parameters. With proper sudo rules, you can have a kind of “login user” that executes only some of the commands (where needed) with sudo.
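To make the pseudo-command idea a bit more concrete, here is a minimal sketch of a dispatcher that could sit behind a forced command in authorized_keys. It is not the code of the linked project; the `ALLOWED` table, the plugin paths and the function names are hypothetical placeholders. The one thing it relies on from OpenSSH is that, when a forced command is configured, the command the client asked for is exposed in the `SSH_ORIGINAL_COMMAND` environment variable.

```python
import os
import shlex

# Hypothetical pseudo-command table: names and paths are placeholders only.
ALLOWED = {
    "check_smart": ["sudo", "/usr/lib/nagios/plugins/check_smart.pl"],
    "check_load": ["/usr/lib/nagios/plugins/check_load"],
}


def resolve(original_command, allowed=ALLOWED):
    """Map the client's request to a fixed argv, or return None to refuse it."""
    try:
        words = shlex.split(original_command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    if not words or words[0] not in allowed:
        return None
    # Parameters become separate argv entries; no shell ever interprets them.
    return allowed[words[0]] + words[1:]


def main():
    """Entry point for the forced command: run the request or refuse it."""
    argv = resolve(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    if argv is None:
        print("UNKNOWN: command not allowed")
        return 3  # UNKNOWN in the Nagios plugin convention
    os.execvp(argv[0], argv)  # replaces this process; does not return on success
```

Ending the script with `sys.exit(main())` (after importing `sys`) would turn it into the forced command. Note that the parameters are still passed through to the plugin untouched, so anything run through sudo here needs matching sudo rules that restrict which arguments are acceptable.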
Hi, I spent a lot of time on Nagios some time ago, and I think you are starting from the wrong approach. Active checks are really a nightmare, all the more so if they need root; what works really well in this case are passive checks. Wait for the host to send a notification itself; raise an alarm if it doesn’t happen. Launch whatever script you like as root or as whatever privileged user you need, and put the security concerns on the script and not on the communication between hosts. Passive checks are also more effective in terms of performance. I used one server to check about 200 services within 5 minutes and it lagged (Nagios has an auto-optimization of the queue, but it needs a lot of time to become effective); with some services converted to passive checks I saved a lot of CPU cycles. Things like nsca and that approach are a quick and dirty way to solve some checks, but they should be avoided as much as possible (for the sake of security).
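The “alarm if it doesn’t happen” half of passive checking is what Nagios calls freshness checking. As a rough sketch of the idea only (the class name `FreshnessTracker` and its methods are invented for this example and are not part of Nagios or Icinga), the master just needs to remember when each service last reported and flag the ones that went silent:

```python
import time


class FreshnessTracker:
    """Remember when each passive check last reported and flag silent ones."""

    def __init__(self, threshold_seconds):
        self.threshold = threshold_seconds
        self.last_seen = {}  # (host, service) -> time of the last report

    def record(self, host, service, now=None):
        """Call whenever a passive result arrives from a node."""
        self.last_seen[(host, service)] = time.time() if now is None else now

    def stale(self, now=None):
        """Return the sorted (host, service) pairs silent for longer than the threshold."""
        now = time.time() if now is None else now
        return sorted(key for key, seen in self.last_seen.items()
                      if now - seen > self.threshold)
```

A service that has never reported at all is not tracked by this sketch, so a real setup would register the expected services up front instead of learning them from the first result.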