You might wonder why I’m working so much on Munin, given that I should be worried about other things such as my own job… well, it turns out that I’m using Munin at my job, and my bosses are so impressed with actually having monitoring of our resources (while they were originally sceptical that we needed it) that I can easily carve out more paid time to work on improvements.
The funny thing is that we’re actually developing proprietary software based on Free Software components (don’t worry, we respect the licenses!) but our main FLOSS contributions are probably outside the area of development on which our business is based. I guess this is just the way it is; after all, the open source contributions of Facebook have little to do with social networking.
Anyway, while trying to set up monitoring for the servers and devices we care about, I was trying to solve the issue of knowing what the sensor readings are on the servers, which are mostly HP (with the exception of my Excelsior, which is monitored, yes, but on a different Munin instance anyway) and for which lm_sensors is useless: the data is not fed to the main system but rather to the IPMI management board.
Thankfully, there are a number of different tools that allow you to access that IPMI data, and Munin already had a plugin, ipmi_, that uses ipmitool to fetch the sensors’ data. The problem with it is that the fetching takes time, and if the plugin doesn’t reply within 10 seconds, Munin will consider it unavailable. On the HP server where I started setting this up, the reply time is well over 10 seconds.
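To see whether a given command fits inside that 10-second window, something like the following rough sketch can help. The helper name `time_command` is made up for illustration, and the `ipmitool sensor` invocation in the usage note is only an assumption about what a plugin might run, not necessarily what the original plugin calls.

```python
import subprocess
import sys
import time

# The limit described above; treated here as a plain constant.
MUNIN_TIMEOUT_SECONDS = 10.0


def time_command(argv, limit=MUNIN_TIMEOUT_SECONDS):
    """Run argv, discarding its output.

    Returns (elapsed_seconds, finished_in_time). The child process is
    killed if it runs longer than `limit` seconds.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit,
            check=False,
        )
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return time.monotonic() - start, finished


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs anywhere.
    elapsed, ok = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f}s, within limit: {ok}")
```

On a machine with an IPMI board, `time_command(["ipmitool", "sensor"])` would be the interesting call; if it comes back with `False`, the plugin has no chance of answering in time.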
To get around this limitation, I decided to take a different approach: I wrote my own plugin. Or rather, I rewrote the original plugin using FreeIPMI, which I know well (I maintain it in Gentoo and I have sent patches upstream before). And this seems to be a win on all counts.
First of all, the ipmi-sensors command caches the so-called SDR data, which describe the sensors available on a system, so after the first execution it doesn’t spend as much time parsing what it’s receiving. Then, since version 1 at least, it has a number of parameters that make it very easy to filter the output and receive it in a format that is suitable for script-based parsing. In particular, you can filter by the type of sensors you want data from (Temperature or Fan) and ignore the values that are not available (e.g. missing fans), all while getting the output in CSV form.
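To give an idea of why the CSV form makes parsing so easy, here is a minimal sketch in Python (the real plugin uses awk, so this is only an illustration). The flag names in `IPMI_SENSORS_CMD` are the ones I recall from FreeIPMI 1.x and should be checked against your version's man page; the column layout (ID, Name, Type, Reading, Units, Event) and the sample rows are likewise assumptions, not captured output.

```python
import csv
import io

# Assumed invocation: flag names as recalled from FreeIPMI 1.x,
# verify them against `man ipmi-sensors` before relying on this.
IPMI_SENSORS_CMD = [
    "ipmi-sensors",
    "--quiet-cache",
    "--comma-separated-output",
    "--no-header-output",
    "--ignore-not-available-sensors",
    "--sensor-types=Temperature,Fan",
]

# Made-up sample in the assumed header-less column layout:
# ID,Name,Type,Reading,Units,Event
SAMPLE = """\
1,CPU 1,Temperature,40.00,C,'OK'
2,Ambient,Temperature,21.00,C,'OK'
3,Fan 1,Fan,N/A,RPM,N/A
4,Fan 2,Fan,5400.00,RPM,'OK'
"""


def parse_sensors_csv(text):
    """Turn the CSV text into a list of dicts, skipping unusable rows."""
    readings = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5:
            continue  # blank or malformed line
        sensor_id, name, sensor_type, reading, units = row[:5]
        try:
            value = float(reading)
        except ValueError:
            continue  # non-numeric reading such as "N/A"
        readings.append(
            {
                "id": sensor_id.strip(),
                "name": name.strip(),
                "type": sensor_type.strip(),
                "value": value,
                "units": units.strip(),
            }
        )
    return readings


if __name__ == "__main__":
    for r in parse_sensors_csv(SAMPLE):
        print(r["type"], r["name"], r["value"], r["units"])
```

The extra `N/A` check is belt and braces: with the ignore switch the tool should already drop those rows, but the parser stays safe if it doesn't.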
The net result is that instead of a page-long gawk script to parse the lines, filter them, generate unique names and so on, I’m using a couple of awk commands. Yes, I tested them with mawk, but they also use very, very simple syntax, so I wouldn’t be surprised if they keep working for a very long time. Basically, the plugin is now very fast, very short, and very simple. And instead of just expecting to always have both temperature and fan data, this time it actually checks before suggesting anything at all.
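The two ideas here, unique field names and checking what is available before suggesting it, can be sketched roughly as follows. This is not the plugin's code: the `ipmi<ID>_<name>` naming scheme, the `temp` and `fans` suffixes, and the sample readings are all hypothetical, chosen only so that the names come out unique and valid as Munin field names (letters, digits and underscores, not starting with a digit).

```python
import re

# Hypothetical readings, in the shape a CSV parser might produce.
READINGS = [
    {"id": "1", "name": "CPU 1", "type": "Temperature", "value": 40.0},
    {"id": "2", "name": "Ambient", "type": "Temperature", "value": 21.0},
    {"id": "4", "name": "Fan 2", "type": "Fan", "value": 5400.0},
]

# Hypothetical mapping from sensor type to plugin suffix.
KINDS = (("Temperature", "temp"), ("Fan", "fans"))


def field_name(sensor_id, name):
    """Build a unique, Munin-safe field name from the sensor ID and name."""
    slug = re.sub(r"[^A-Za-z0-9_]", "_", name.strip()).lower()
    return f"ipmi{sensor_id}_{slug}"


def suggest(readings):
    """List only the graph kinds for which some sensor actually reported."""
    return [
        kind
        for sensor_type, kind in KINDS
        if any(r["type"] == sensor_type for r in readings)
    ]


def config_lines(readings, sensor_type, title):
    """Produce the lines a plugin would print when called with `config`."""
    lines = [f"graph_title {title}", "graph_category sensors"]
    for r in readings:
        if r["type"] == sensor_type:
            lines.append(f"{field_name(r['id'], r['name'])}.label {r['name']}")
    return lines


if __name__ == "__main__":
    print("\n".join(suggest(READINGS)))
    print("\n".join(config_lines(READINGS, "Fan", "Fan speeds")))
```

Prefixing the sensor ID is what keeps two sensors with the same label apart, and a board with no fan readings simply yields no `fans` suggestion.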
Unfortunately, right now it is missing two features that are present in the original plugin. The first is the critical and warning thresholds, which are not printed by the current version of FreeIPMI. This is okay, because the next release will likely have a switch to print those as well (which means that it’ll reach feature parity for those two sensor types). The other is that the original plugin has undocumented support for power metering.
Unfortunately, none of my boards support power metering, so I’m stuck without that kind of data, and I can’t be sure how to reimplement it right now: the ticket says that HP servers have the data, but none of the ones I have here report it. But they do report a few temperatures…
But with the exception of that, the plugin is much better than the one that was there before. Hopefully at some point it’ll be in the default Munin set instead of just being available for Gentoo users.