Munin no[dt]es

Back when I was looking at entropy I took Jeremy’s suggestion and started setting up Munin on my boxes to have an idea of their average day situation. This was actually immensely useful to note that the EntropyKey worked fine, but the spike in load and processes caused entropy depletion nonetheless. After a bit of time spent tweaking, trying, and working with munin, I now have a few considerations I’d like to make you a part of.

First of all, you might remember my custom ModSecurity ruleset (which seems not to pick the interest of very many peopole). One of the things that my ruleset filters is requests coming from user-agents only reporting the underlying library version (such as libCurl, http-access2, libwww-perl, …), as most of the time they seem to be bound to be custom scripts or bots. As it happens, even the latest sources for Munin do not report themselves as being part of anything “bigger”, with the final result that you can’t monitor Apache when my ModSecurity ruleset is present, d’oh!

Luckily, Christian helped me out and provided me with a patch (that is attached to bug #370131 which I’ll come back to later) that makes munin plugins show themselves as part of munin when sending the requests, which stops my ruleset from rejecting the request altogether.

In the bug you’ll also find a patch that changes the apc_nis plugin, used to request data to APC UPSes though apcupsd, to graph one more variable, the UPS internal temperature. This was probably ignored before because it is not present on most low-end series, such as the Back-UPS, but it is available in my SmartUPS, which last year did indeed shut down abruptly, most likely because of overheating.

Unfortunately it’s not always as easy to fix the trouble; indeed one of the most obnoxious, for me, issues is that Munin does not support using IPv6 to connect to nodes. Unfortunately the problem lies in the Perl Net-Server module, and there is no known solution, barring some custom distribution patches — which I’d rather not ask to implement in Gentoo. For now I solved this by using ssh with port forwarding over address 127.0.0.2, which is not really nice, but works decently for my needs.

Another interesting area relates to sensors handling: Munin comes with a plugin to interpret the output of the sensors program from lm_sensors, which is tremendously nice to make sure that a system is not overheating constantly. Unfortunately this doesn’t always work as good. Turns out that the output format expected by Munin has changed quite a bit in the latest versions, namely the min/max/hyst tuple of extra values vary their name depending on the used chip. Plus it doesn’t take into consideration the option that one of the sensors is altogether disabled.

This last problem is what hit me on Raven (my frontend system). The sensors of the board – Asus AT5IONT-I – were not supported on Linux up to version 2.6.38; I could only watch over the values reported by coretemp, the Intel CPU temperature sensor. With versions 2.6.39, driver it8721 finally supported the board and I could watch over all the temperature values. But of the three temperature values available from the sensor, only two are actually wired, respectively to a thermal diode and a thermistor; the third results disabled, but is still reported by the sensors output with a value of –128°C which Munin then graphs. The only way I found to disable that, was to create a local sensors.d configuration file to stop the value from appearing.

One interesting note relates to the network I/O graphing: Munin didn’t seem to take very nice the rollover at the reboot of my router: it reported a peak of petabits I/O at the time. The problem turns out to be easy to solve, but not as easy to guess. When reading the ifconfig output, the if_ plugin does not discard any datapoint, even if it is bigger than the speed of any given interface, unless it can read the interface’s real speed through mii-tool. Unfortunately it can only do that if the plugin is executed as root, which obviously is not the default.

Another funny plugin is hddtemp_smartctl — rather than using the hddtemp command itself, it uses smartctl which is part of smartmontools, which I most definitely keep around more often than the former. Unfortunately it also has limitations; the most obnoxious is that it doesn’t allow you to just list a bunch of device paths. Instead you have to first provide it with a set of device names, and then you can override their default device path; this is further complicated by the fact that each of the paths is forced to have /dev/ prefixed. Since Yamato has a number of devices, and not always they seem to get the same letters, I had to set it up this way:

[hddtemp_smartctl]
user root
group disk
env.drives sd1 sd2 sd3 sd4 sd5 sd6
env.dev_sd1 disk/by-id/ata-WDC_WD3202ABYS-01B7A0_WD-WCAT15838155
env.dev_sd2 disk/by-id/ata-WDC_WD3202ABYS-01B7A0_WD-WCAT16241483
env.dev_sd3 disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATR4499656
env.dev_sd4 disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATR4517732
env.dev_sd5 disk/by-id/ata-WDC_WD10EARS-00Z5B1_WD-WMAVU2720459
env.dev_sd6 disk/by-id/ata-WDC_WD10EARS-00Z5B1_WD-WMAVU2721970

For those curious, the temperatures vary between 39°C for sd6 and 60°C for sd3.

I have to say I’m very glad I started using Munin, as it helps understanding a few important things, among which is the fact that I need to separate my storage to a separate system, rather than delegating everything to Yamato (I’ll do so as soon as I have a real office and some extra cash), and that using a bridge to connect the virtualised guests to my home network is not a good idea (having Yamato with the network card in promiscuous mode means that all the packets are received by it, even when they are directed to the access point connecting me to the router downstairs).

One thought on “Munin no[dt]es

  1. You can just list the devices in env.drives, like this:[hddtemp_smartctl]user rootgroup diskenv.drives disk/by-id/ata-WDC_WD360GD-00FNA0_WD-WMAH91043197 disk/by-id/ata-SAMSUNG_HD321KJ_S0MQJDWQ155517 disk/by-id/ata-ST3250410AS_9RY2F1RL disk/by-id/ata-ST3250410AS_9RY2F0RD disk/by-id/ata-WDC_WD2500YS-01SHB1_WD-WCANY3985334Unfortunately there is a bug that breaks parsing of some of those device IDs, but it’s easy to fix: http://munin-monitoring.org…Also, can’t wait for munin IPv6 support as well!

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s