Home Assistant and CGG1 Sensors

You may or may not know that I’m one of the few tech people who don’t seem to scream against IoT and I actually have quite a few “smart home” devices in my apartment — funnily enough, one less now that we don’t have normal heating at home and we had to give up the Nest Thermostat. One of the things that I have not set up until now was Home Assistant, simply because I didn’t really need anything from it. But when we moved to this flat, I felt the need to have some monitoring of humidity and temperature in the “winter garden” (a part of the flat that is next to the outside windows, and is not heated), since we moved up in floors and we no longer have the “repair” of being next to another building.

After talking with my friend Srdjan, who had experience with this, I settled on ordering Xiaomi-compatible “ClearGrass” CGG1 sensors from AliExpress. These are BLE sensors, which mean they can broadcast their readings without requiring active connections, and they have a nice eInk display, which wastes very little power to keep running. There’s other similar sensors, some cheaper in the short run, but these sounded like the right compromise: more expensive to buy, but cheaper to run (the batteries are supposed to last six months, and cost less than 50p).

Unfortunately, getting them to work turned out to be a bit more complicated than either of us planned at the beginning. The sensors arrived on a day off, less than two weeks after ordering (AliExpress is actually fairly reliable when ordering to London), and I spent pretty much the whole day, together with Srdjan over on videocall, to get them to work. To save this kind of pain for the next person who come across these issues, I decided to write this up.

Hardware Setup Advices

Before we dig into the actual configuration and setup, let me point out a few things about setting up these devices. The most annoying part is that the batteries are, obviously, isolated to avoid running out in shipping, but the “pull out tab” doesn’t actually pull out. It took quite a bit of force to turn the battery compartment door, open it up, and then re-set it in. Be bold in taking them out.

The other advice is not from me but from Srdjan: write the MAC address (or however it’s called in Bluetooth land) on each of the sensors. Because if you only have one sensor, it’s very easy to tell which one it is, but if you bought more (say, four like I did), then you may have issues later to identify which one is which. So it’s easier to do while you’re setting them up, by turning them on one at a time.

To tell what’s the address, you can either use an app like Beacon Simulator, and listen to the broadcast, or you can wait until you get to a point when you’re ready to listen to the broadcast, later in the process. I would recommend the former, particularly if you live in a block of flats. Not only there’s a lot of stuff that broadcast BLE beacons, but nowadays pretty much every phone is broadcasting them as well due to the Covid 19 Exposure Notifications.

And to make sure that you don’t mix them up, I definitely suggest to use a label maker — and if you are interested in the topic, make sure to check out the Museum of Curiosity Series 13, Episode 6.

Finally, there’s a bit of a spoiler of where this whole process is going to end up going to — I’m going to explicitly suggest you avoid using USB Bluetooth dongles, and instead get yourself an ESP32 kit of some kind. ESP32 devkits are less than £10 on Amazon at the time of writing, and you can find them even cheaper on AliExpress — and they will be much more useful, as I’ll go ahead and tell you.

Home Assistant Direct Integration

So first of all, we spent a lot of time mucking around with the actual Home Assistant integrations. There’s a mitemp_bt sensor integration in Home Assistant, but it’s an “active” Bluetooth implementation, that is (supposedly) more power hungry, and require associating the sensors with the host running Home Assistant. There’s also an alternative implementation, that is supposed to use passive BLE scans.

Unfortunately, even trying to install the alternative implementation turned out to be annoying and difficult — the official instructions appears to expect you install another “store-like” interface on top of Home Assistant, which appears to not be that easy to do when you use their “virtual appliance” image in the first place. I ended up hacking it up a bit, but got absolutely nothing out of it: there isn’t enough logging to know what’s going on at any time, and I couldn’t tell if any packet was even received and parsed.

There is also a clear divergence between the Home Assistant guidelines on how to build new integration, and the way the alternative implementation is written — one of the guides (which I can’t now find easily, and that might speak to the reason for this divergence) explicitly suggests not to write complex parsing logic in the integration, and instead build an external Python library to implement protocols and parsers. This is particularly useful when you want to test something outside of Home Assistant, to confirm it works first.

In this case, having a library (and maybe a command line tool for testing) would have made it easier to figure out if the problem with the sensors was that nothing was received, or that something was wrong with the received data.

This was made more annoying too by the fact that for this to work, you need a working Bluetooth adapter connected to your Home Assistant host — which in my case is a virtual machine. And the alternative implementation tells you that it might interfere with other Bluetooth integrations, so you’re suggested to keep multiple Bluetooth interfaces, one for each of the integrations.

Now this shouldn’t be too hard, but it is: the cheapest Bluetooth dongles I found on Amazon are based on Realtek chipsets, which while supported by (recent enough) Linux kernels, need firmware files. Indeed the one dongle I got requires Linux 5.8 or later, or it requests the wrong firmware file altogether. And there’s no way to install firmware files in the Home Assistant virtual appliance. I tried, quite a few times by now.

ESPHome Saves The Day

ESPHome is a project implementing firmware (or firmware building blocks, rather) for ESP8266 and ESP32 boards and devices, that integrates fairly easily with Home Assistant. And since ESP32 supports BLE, ESPHome supports Xiaomi-compatible BLE sensors, such as the CGG1. So the suggestion from Srdjan, which is what he’s been doing himself, is to basically use an ESP32 board as a BLE-to-WiFi bridge.

This was easy because I had a bunch of ESP32 boards in a box from my previous experiments with acrylic lamps, but as I said they are also the same price, if not cheaper, than Bluetooth dongles. The one I’m using is smaller than most breadboard-compatible ESP32 boards and nearly square — it was a cheap option at the time, but I can’t seem to find one available to build now. It’s working out well by size, because it also doesn’t have any pin headers soldered, so I’m just going to double-side-tape it to something and give it a USB power cable.

But it couldn’t be as easy as to follow the documentation, unfortunately. While configuring the ESPhome is easy, and I did manage to get some readings almost right away, I found that after a couple of minutes, it would stop seeing any signal whatsoever from any of the sensors.

Digging around, I found that this was not uncommon. There’s two ESPhome issues from 2019: #317 and #735 that report this kind of problems, with no good final answer on how to solve them, and unfortunately locked to collaborators, so I can’t leave breadcrumbs for the next person in there — and it’s why I am now writing this, hopefully it’ll save headaches for others.

The problem, as detailed in #735, is that the BLE scan parameters need to be adjusted to avoid missing the sensors’ broadcasts. I tried a few combinations, and at the end found that disabling the “active” scan worked — that is, letting the ESP32 passively listen to the broadcasts, without trying to actively scan the Bluetooth channels seemed to let it stay stable, now for over 24 hours. And it should also be, as far as I can tell, less battery-draining.

The final configuration looks something like this:

    duration: 300s
    window: 48ms
    interval: 64ms
    active: False

  - platform: xiaomi_cgg1
    mac_address: "XX:XX:XX:XX:XX:XX"
      name: "Far Corner Temperature"
      name: "Far Corner Humidity"
      name: "Far Corner Battery Level"

The actual scan parameters should be, as far as I can tell, ignored when disabling the active scan. But since it works, I don’t dare to touch it yet. The code in ESPhome doesn’t make it very clear if changing those parameters when disabling active scan is entirely ignored, and I have not spent enough time going through the deep stack to figure this out for certain.

The only unfortunate option of having it set up this way, is that by default, Home Assistant will report all of the sensors in the same “room”, despite them being spread all over the apartment (okay, not all over the apartment in this case but in the winter garden).

I solved that by just disabling the default Lovelace dashboard and customizing it. Because turns out it’s much nicer to customize those dashboards than using the default, and it actually makes a lot more sense to look at it that way.

Looking To The Future

So now I have, for the first time, a good reason to use Home Assistant and to play around with it a bit. I actually have interesting “connected home” ideas to go with it — but they mostly rely on getting the pantograph windows working, as they don’t seem currently to be opening at all.

If I’m correct in my understanding, we’ll need to get the building managers to come and fix the electronic controls, in which case I’ll ask for something I can control somehow. And then it should be possible to have Home Assistant open the windows in the morning, assuming it’s not raining and the temperature is such that a bit of fresh air would be welcome (which is most of the summer, here).

It also gives me a bit more of an incentive to finish my acrylic lamps work — it would be easy, I think, to use one of those as a BLE bridge that just happens to have a NeoPixel output channel. And if I’m going “all in” to wire stuff into Home Assistant, it would also allow me to use the acrylic lamps as a way to signal for stuff.

So possibly expect a bit more noise on this front from me, either here or on Twitter.

Update 2020-12-06: The sensors I wrote about above are an older firmware CGG1 sensors that don’t use encryption. There appear to be a newer version out that does use encryption, and thus require extracting a bind key. Please follow the reported issue for updates.

Update 2021-06-26: I forgot to update this, but ESPHome 1.17.1 and later support encrypted CGG1 out of the box as my pull request was merged.

Munin, sensors and IPMI

In my previous post about Munin I said that I was still working on making sure that the async support would reach Gentoo in a way that actually worked. Now with version 2.0.7-r5 this is vastly possible, and it’s documented on the Wiki for you all to use.

Unfortunately, while testing it, I found out that one of the boxes I’m monitoring, the office’s firewall, was going crazy if I used the async spooled node, reporting fan speeds way too low (87 RPMs) or way too high (300K), and with similar effects on the temperatures as well. This also seems to have caused the fans to go out of control and run constantly at their 4KRPM instead of their usual 2KRPM. The kernel log showed that there was something going wrong with the i2c access, which is what the sensors program uses.

I started looking into the sensors_ plugin that comes with Munin, which I knew already a bit as I fixed it to match some of my systems before… and the problem is that for each box I was monitoring, it would have to execute sensors six times: twice for each graph (fan speed, temperature, voltages), one for config and one for fetching the data. And since there is no way to tell it to just fetch some of the data instead of all of it, it meant many transactions had to go over the i2c bus, all at the same time (when using munin async, the plugins are fetched in parallel). Understanding that the situation is next to unsolvable with that original code, and having one day “half off” at work, I decided to write a new plugin.

This time, instead of using the sensors program, I decided to just access /sys directly. This is quite faster and allows to pinpoint what data you need to fetch. In particular during the config step, there is no reason to fetch the actual value, which saves many i2c transactions even just there. While at it, I also made it a multigraph plugin, instead of the old wildcard one, so that you only need to call it once, and it’ll prepare, serially, all the available graphs: in addition to those that were supported before, which included power – as it’s exposed by the CPUs on Excelsior – I added a few that I haven’t been able to try but are documented by the hwmon sysfs interface, namely current and humidity.

The new plugin is available on the contrib repository – which I haven’t found a decent way to package yet – as sensors/hwmon and is still written in Perl. It’s definitely faster, has fewer dependencies and it’s definitely more reliable at leas ton my firewall. Unfortunately, there is one feature that is missing: sensors would sometimes report an explicit label for temperature data.. but that’s entirely handled in userland. Since we’re reading the data straight from the kernel, most of those labels are lost. For drivers that do expose those labels, such as coretemp, they are used, though.

Also we lose the ability to ignore the values from the get-go, like I described before but you can’t always win. You’ll have to ignore the graph data from the master instead. Otherwise you might want to find a way to tell the kernel to not report that data. The same probably is true for the names, although unfortunately…

[temp*_label] Should only be created if the driver has hints about what this temperature channel is being used for, and user-space doesn’t. In all other cases, the label is provided by user-space.

But I wouldn’t be surprised if it was possible to change that a tinsy bit. Also, while it does forfeit some of the labeling that the sensors program do, I was able to make it nicer when anonymous data is present — it wasn’t so rare to have more than one temp1 value as it was the first temperature channel for each of the (multiple) controllers, such as the Super I/O, ACPI Thermal Zone, and video card. My plugin outputs the controller and the channel name, instead of just the channel name.

After I’ve completed and tested my hwmon plugin I moved on to re-rewrite the IPMI plugin. If you remember the saga I first rewrote the original ipmi_ wildcard plugin in freeipmi_, including support for the same wildcards as ipmisensor_, so that instead of using OpenIPMI (and gawk), it would use FreeIPMI (and awk). The reason was that FreeIPMI can cache SDR information automatically, whereas OpenIPMI does have support, but you have to tackle it manually. The new plugin was also designed to work for virtual nodes, akin to the various SNMP plugins, so that I could monitor some of the servers we have in production, where I can’t install Munin, or I can’t install FreeIPMI. I have replaced the original IPMI plugin, which I was never able to get working on any of my servers, with my version on Gentoo for Munin 2.0. I expect Munin 2.1 to come with the FreeIPMI-based plugin by default.

Unfortunately, like for the sensors_ plugin, my plugin was calling the command six times per host — although this allows you to filter for the type of sensors you want to receive data for. And that became even worse when you have to monitor foreign virtual nodes. How do I solve that? I decided to rewrite it to be multigraph as well… but shell script then was difficult to handle, which means that it’s now also written in Perl. The new freeipmi, non-wildcard, virtual node-capable plugin is available in the same repository and directory as hwmon. My network switch thanks me for that.

Of course unfortunately the async node still does not support multiple hosts, that’s something for later on. In the mean time though, it does spare me lots of grief and I’m happy I took the time working on these two plugins.

Munin and lm_sensors

I’ve already posted about some munin notes before, when I had to fight with the hddtemp_smartctl plugin and with the bogus readings on my frontend’s sensors output. Today I’ll write a few more notes related to the sensors_ plugin, which heavily tie into lm_sensors territory.

Beside simply monitoring and graphing the input data, Munin provides support for notifying values that get too high, or too low, and that might require direct action. When possible, these values are provided by the plugin itself, which means, for what concerns lm_sensors that the same min/max values as reported by the sensors command will be used, something like this:

Adapter: ISA adapter
in0:          +3.06 V  (min =  +2.99 V, max =  +2.28 V)  ALARM
in1:          +3.06 V  (min =  +2.27 V, max =  +1.20 V)  ALARM
in2:          +1.13 V  (min =  +1.79 V, max =  +1.20 V)  ALARM
in3:          +2.94 V  (min =  +1.88 V, max =  +2.53 V)  ALARM
in4:          +2.71 V  (min =  +1.03 V, max =  +0.80 V)  ALARM
in5:          +2.86 V  (min =  +3.05 V, max =  +1.93 V)  ALARM
in6:          +1.46 V  (min =  +0.78 V, max =  +2.54 V)
3VSB:         +4.08 V  (min =  +4.20 V, max =  +2.66 V)  ALARM
Vbat:         +3.43 V  
fan2:        3901 RPM  (min =  121 RPM)
temp1:        +85.0°C  (low  = -61.0°C, high = +100.0°C)  sensor = thermal diode
temp2:        +79.0°C  (low  = +115.0°C, high = +123.0°C)  sensor = thermistor

As you can see from the output, not always the values are very meaningful: the box (which is the same AT5IONT-I I referred to in the previous post) turns off much sooner than reaching those temperatures: as soon as temp2 reaches 90°C it is already too late, and having a fan run at 121 RPM would mean that the box is gone for good.

I usually don’t care much about those values as I keep an eye on the boxes’ health at least once a day, lately a bit more because I’m monitoring a new production server for a customer. And my main home router always reported issues with its fans, even though the fans themselves where running properly. I never cared much because I knew it was a fluke there.

But when today, after the summer holidays, another customer’s backup server started showing warnings in temperature, I was much more worried — after all, these are very hot days and I’m myself almost going to pass out because of it. The warning turned out to be another fluke: I updated the server to kernel 2.6.39 before holidays, which has a driver for the sensor in that box.. unfortunately the sensors output reported 0°C as the critical temperature, and munin assumed it was bad.

While there are one or two ways to work this around on Munin side, I found it more solid to actually resolve the issue on the sensors themselves. You can do that by creating a /etc/sensors.d/local file, with a few directives for the sensors modules to handle:

chip "nct6775-isa-0680"
     set temp1_max 80
     set temp1_max_hyst 75

But just creating this file is not enough to actually have it working, if you tried it and it failed: you have to tell the kernel about it, because the data sensors displays, it simply fetches straight from the kernel. This is done through sensors -s or through the lm_sensors init script if you have INITSENSORS=yes in /etc/conf.d/lm_sensors.

What was the problem with my router’s sensors then? Well, one of the fans’ minimum value was set to over twenty thousands rotations per minute.. it was just a matter of resetting it to 1500 (for a fan that averages on 2000 RPM it should be okay to warn at that point) and it works quite nicely without warning me.

I’m actually now considering to spend some more time to get the limits set to actually relevant values so that I would be warned in case of a serious issue, but for now I think that just not having a visual clue for a fluke would be enough.

Finally, I wish to thank Steve Schnepp for both implementing native IPv6 support in the upcoming Munin 2.0 version (which should simplify a lot the current mess of aggregating data for the monitored hosts on my system, together with the native SSH transport), and for merging the three patches I sent them, which will be part of the new release.

Munin no[dt]es

Back when I was looking at entropy I took Jeremy’s suggestion and started setting up Munin on my boxes to have an idea of their average day situation. This was actually immensely useful to note that the EntropyKey worked fine, but the spike in load and processes caused entropy depletion nonetheless. After a bit of time spent tweaking, trying, and working with munin, I now have a few considerations I’d like to make you a part of.

First of all, you might remember my custom ModSecurity ruleset (which seems not to pick the interest of very many peopole). One of the things that my ruleset filters is requests coming from user-agents only reporting the underlying library version (such as libCurl, http-access2, libwww-perl, …), as most of the time they seem to be bound to be custom scripts or bots. As it happens, even the latest sources for Munin do not report themselves as being part of anything “bigger”, with the final result that you can’t monitor Apache when my ModSecurity ruleset is present, d’oh!

Luckily, Christian helped me out and provided me with a patch (that is attached to bug #370131 which I’ll come back to later) that makes munin plugins show themselves as part of munin when sending the requests, which stops my ruleset from rejecting the request altogether.

In the bug you’ll also find a patch that changes the apc_nis plugin, used to request data to APC UPSes though apcupsd, to graph one more variable, the UPS internal temperature. This was probably ignored before because it is not present on most low-end series, such as the Back-UPS, but it is available in my SmartUPS, which last year did indeed shut down abruptly, most likely because of overheating.

Unfortunately it’s not always as easy to fix the trouble; indeed one of the most obnoxious, for me, issues is that Munin does not support using IPv6 to connect to nodes. Unfortunately the problem lies in the Perl Net-Server module, and there is no known solution, barring some custom distribution patches — which I’d rather not ask to implement in Gentoo. For now I solved this by using ssh with port forwarding over address, which is not really nice, but works decently for my needs.

Another interesting area relates to sensors handling: Munin comes with a plugin to interpret the output of the sensors program from lm_sensors, which is tremendously nice to make sure that a system is not overheating constantly. Unfortunately this doesn’t always work as good. Turns out that the output format expected by Munin has changed quite a bit in the latest versions, namely the min/max/hyst tuple of extra values vary their name depending on the used chip. Plus it doesn’t take into consideration the option that one of the sensors is altogether disabled.

This last problem is what hit me on Raven (my frontend system). The sensors of the board – Asus AT5IONT-I – were not supported on Linux up to version 2.6.38; I could only watch over the values reported by coretemp, the Intel CPU temperature sensor. With versions 2.6.39, driver it8721 finally supported the board and I could watch over all the temperature values. But of the three temperature values available from the sensor, only two are actually wired, respectively to a thermal diode and a thermistor; the third results disabled, but is still reported by the sensors output with a value of –128°C which Munin then graphs. The only way I found to disable that, was to create a local sensors.d configuration file to stop the value from appearing.

One interesting note relates to the network I/O graphing: Munin didn’t seem to take very nice the rollover at the reboot of my router: it reported a peak of petabits I/O at the time. The problem turns out to be easy to solve, but not as easy to guess. When reading the ifconfig output, the if_ plugin does not discard any datapoint, even if it is bigger than the speed of any given interface, unless it can read the interface’s real speed through mii-tool. Unfortunately it can only do that if the plugin is executed as root, which obviously is not the default.

Another funny plugin is hddtemp_smartctl — rather than using the hddtemp command itself, it uses smartctl which is part of smartmontools, which I most definitely keep around more often than the former. Unfortunately it also has limitations; the most obnoxious is that it doesn’t allow you to just list a bunch of device paths. Instead you have to first provide it with a set of device names, and then you can override their default device path; this is further complicated by the fact that each of the paths is forced to have /dev/ prefixed. Since Yamato has a number of devices, and not always they seem to get the same letters, I had to set it up this way:

user root
group disk
env.drives sd1 sd2 sd3 sd4 sd5 sd6
env.dev_sd1 disk/by-id/ata-WDC_WD3202ABYS-01B7A0_WD-WCAT15838155
env.dev_sd2 disk/by-id/ata-WDC_WD3202ABYS-01B7A0_WD-WCAT16241483
env.dev_sd3 disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATR4499656
env.dev_sd4 disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATR4517732
env.dev_sd5 disk/by-id/ata-WDC_WD10EARS-00Z5B1_WD-WMAVU2720459
env.dev_sd6 disk/by-id/ata-WDC_WD10EARS-00Z5B1_WD-WMAVU2721970

For those curious, the temperatures vary between 39°C for sd6 and 60°C for sd3.

I have to say I’m very glad I started using Munin, as it helps understanding a few important things, among which is the fact that I need to separate my storage to a separate system, rather than delegating everything to Yamato (I’ll do so as soon as I have a real office and some extra cash), and that using a bridge to connect the virtualised guests to my home network is not a good idea (having Yamato with the network card in promiscuous mode means that all the packets are received by it, even when they are directed to the access point connecting me to the router downstairs).