Munin, sensors and IPMI

In my previous post about Munin I said that I was still working on making sure that the async support would reach Gentoo in a way that actually worked. Now with version 2.0.7-r5 this is vastly possible, and it’s documented on the Wiki for you all to use.

Unfortunately, while testing it, I found out that one of the boxes I’m monitoring, the office’s firewall, was going crazy if I used the async spooled node, reporting fan speeds way too low (87 RPMs) or way too high (300K), and with similar effects on the temperatures as well. This also seems to have caused the fans to go out of control and run constantly at their 4KRPM instead of their usual 2KRPM. The kernel log showed that there was something going wrong with the i2c access, which is what the sensors program uses.

I started looking into the sensors_ plugin that comes with Munin, which I knew already a bit as I fixed it to match some of my systems before… and the problem is that for each box I was monitoring, it would have to execute sensors six times: twice for each graph (fan speed, temperature, voltages), one for config and one for fetching the data. And since there is no way to tell it to just fetch some of the data instead of all of it, it meant many transactions had to go over the i2c bus, all at the same time (when using munin async, the plugins are fetched in parallel). Understanding that the situation is next to unsolvable with that original code, and having one day “half off” at work, I decided to write a new plugin.

This time, instead of using the sensors program, I decided to just access /sys directly. This is quite faster and allows to pinpoint what data you need to fetch. In particular during the config step, there is no reason to fetch the actual value, which saves many i2c transactions even just there. While at it, I also made it a multigraph plugin, instead of the old wildcard one, so that you only need to call it once, and it’ll prepare, serially, all the available graphs: in addition to those that were supported before, which included power – as it’s exposed by the CPUs on Excelsior – I added a few that I haven’t been able to try but are documented by the hwmon sysfs interface, namely current and humidity.

The new plugin is available on the contrib repository – which I haven’t found a decent way to package yet – as sensors/hwmon and is still written in Perl. It’s definitely faster, has fewer dependencies and it’s definitely more reliable at leas ton my firewall. Unfortunately, there is one feature that is missing: sensors would sometimes report an explicit label for temperature data.. but that’s entirely handled in userland. Since we’re reading the data straight from the kernel, most of those labels are lost. For drivers that do expose those labels, such as coretemp, they are used, though.

Also we lose the ability to ignore the values from the get-go, like I describe before but you can’t always win. You’ll have to ignore the graph data from the master instead. Otherwise you might want to find a way to tell the kernel to not report that data. The same probably is true for the names, although unfortunately…

[temp*_label] Should only be created if the driver has hints about what this temperature channel is being used for, and user-space doesn’t. In all other cases, the label is provided by user-space.

But I wouldn’t be surprised if it was possible to change that a tinsy bit. Also, while it does forfeit some of the labeling that the sensors program do, I was able to make it nicer when anonymous data is present — it wasn’t so rare to have more than one temp1 value as it was the first temperature channel for each of the (multiple) controllers, such as the Super I/O, ACPI Thermal Zone, and video card. My plugin outputs the controller and the channel name, instead of just the channel name.

After I’ve completed and tested my hwmon plugin I moved on to re-rewrite the IPMI plugin. If you remember the saga I first rewrote the original ipmi_ wildcard plugin in freeipmi_, including support for the same wildcards as ipmisensor_, so that instead of using OpenIPMI (and gawk), it would use FreeIPMI (and awk). The reason was that FreeIPMI can cache SDR information automatically, whereas OpenIPMI does have support, but you have to tackle it manually. The new plugin was also designed to work for virtual nodes, akin to the various SNMP plugins, so that I could monitor some of the servers we have in production, where I can’t install Munin, or I can’t install FreeIPMI. I have replaced the original IPMI plugin, which I was never able to get working on any of my servers, with my version on Gentoo for Munin 2.0. I expect Munin 2.1 to come with the FreeIPMI-based plugin by default.

Unfortunately, like for the sensors_ plugin, my plugin was calling the command six times per host — although this allows you to filter for the type of sensors you want to receive data for. And that became even worse when you have to monitor foreign virtual nodes. How do I solve that? I decided to rewrite it to be multigraph as well… but shell script then was difficult to handle, which means that it’s now also written in Perl. The new freeipmi, non-wildcard, virtual node-capable plugin is available in the same repository and directory as hwmon. My network switch thanks me for that.

Of course unfortunately the async node still does not support multiple hosts, that’s something for later on. In the mean time though, it does spare me lots of grief and I’m happy I took the time working on these two plugins.

One thought on “Munin, sensors and IPMI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s