If you’ve followed my blog in the past few weeks, I’ve been doing quite some work between Munin and the Nagios packaging (I leave Icinga to prometheanfire!), as well as working closely with Munin upstream by feeding them patches — yesterday I actually got access to the Munin contrib repository so now I can help make sure that the plugins reach a state where they can actually be redistributed and packaged.
I also spent some time clearing up what was once called nagios-nsca
and nagios-nrpe
(which are now just nsca
and nrpe
since they work just fine with Icinga as well, and the nagios-
part was never part of the upstream names anyway; kudos to Markos for noticing I didn’t check correctly for revdeps, by the way) — now you got minimal USE flags that you can turn on to avoid building their respective daemon, although you have to remember that you have to enable minimal for nsca on the nodes, and for nrpe on the master. They also both come with new init scripts that are simplified and leverage the new functionalities in OpenRC.
There is though something that is not making me sleep well — and that’s beside the usual problems I have with sleeping. Let me try to explain.
Some of the Nagios/Icinga tests can’t be executed remotely as they are, obviously — things like S.M.A.R.T. monitoring need to be executed on the box they have to monitor obviously. So how do you fix this? You use nrpe
— the Nagios Remote Plugin Executor. This is basically a daemon that is used to execute commands on the node (more or less the way Munin’s node works). Unfortunately, unlike Munin, both Icinga proper and NRPE don’t allow you to choose on a per-plugin basis which user to use (to do so, Munin has its node running as root).
Instead, everything is executed by the nagios user, and if you need to access something that the user can’t access, you can work it around by using a setuid-root plugin (these are tied to the suid USE flag for nagios-plugins
in Gentoo). But this, of course, only works for binaries, not scripts. And here’s the first problem: to check the S.M.A.R.T. status of an IDE drive, you can use the check_ide_smart
tool that reimplements the whole protocol… to check the status of a SATA drive you should use check_smart.pl
that uses SmartMonTools to take care of it.
But how can the script access the disk? Well, it does it in the simplest way: it uses sudo. Of course this means that the nagios user has to have sudo access… afraid that this would get people to give unconditional sudo access to the nagios user, I decided to work it around by installing my own configuration file for sudo in the ebuild, making use of the new /etc/sudoers.d
folder, which means that on a default install, just the commands that are expected will be allowed for the nagios user. And this is good.
But sometimes the plugins themselves don’t seem to care about using sudo directly; instead they rely on being executed with an user that has enough privileges; for this reason, the nrpe configuration allows you to prefix all commands with any command of your choice, with the default being… sudo! And their documentation suggest to make sure that the user running nrpe does not have write access to the directory to avoid security issues… you can understand that it’s not the only bad idea you could have, there.
Sigh, this stuff is a complete security nightmare, truly.