Reverse Engineering an LG Aircon Control Panel — Low-Speed Serial Issues

In the previous part of the LG aircon reverse engineering, I gave the impression that the circuitry design was a problem mostly solved. After all, I found working LINbus transceivers, and I figured out a way to keep one of them “quiet” when not in use. And indeed, when I started drafting the post, it was supposed to be followed by a description of the protocol I identified, and should have been talking about how I wrote tools to simulate this protocol to test the implementation without actually touching the HVAC unit.

And that’s what I set out to do myself on a live stream, nearly two months ago now — the video says “Attempt 2” because on the first, short stream I ended up showing on stream my home wifi password and, even though it’s not likely that a bad actor would show up at my place to take over the wifi, it’s not good opsec, so I stopped the stream, rotated the password of all the devices at home, and came back to the stream.

So what happened? Well, I was trying to figure out how to build an ESPHome custom component for the “climate” platform, but sending sequences of bytes through the serial port appeared not to work correctly: instead of being sent at the selected polling frequency, they would be “bunched up” together, sending three bytes instead of one, or twelve instead of six. It worked “fine” if I flushed the serial port, but the flush operation would then take longer than the time between the commands I wanted to send, so that didn’t sound like a good plan.

As you can imagine from the title, this particular problem only happened with the slow, 104 bps 8n1 configuration that the LG aircon needs. It didn’t happen at all with higher baudrates such as 9600, which suggested the problem was related to the timing of the connection. That is not uncommon: a lot of UART implementations that include FIFOs tend to define some timing based on the timing of a “space” or of a full character.

What also suggested that to me is that someone, somewhere, was complaining that the ESP32 couldn’t do the slow speed that this aircon needs, and that they preferred using the ESP8266 because that one came with a software serial implementation. Unfortunately, I can no longer find where I read that, to link it here and to point out that the code for the ESP8266 software serial actually works without significant modifications on the ESP32. It’s just that the lack of need for it means it’s not readily available.

So indeed, I managed to get the ESP8266 software serial to work… except for the fact that it was not quite reliable. At 104 bps (which is the speed the aircon protocol needs) sending a six-byte sequence (which is the size of an aircon packet) takes about half a second. Add another half second for the response (which is also six bytes), and you have a recipe for disaster: one second every two seconds (which is the frequency of command exchange between panel and HVAC) would be spent just on serial communication, and anything else happening during that time and messing up the timing meant bad communication.
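To put numbers on that budget, here is a minimal sketch of the arithmetic. It assumes 8n1 framing (10 bits on the wire per byte) and the two-second exchange period mentioned above; the function and variable names are made up for illustration.

```python
def transfer_time_s(n_bytes, baud=104, bits_per_char=10):
    """Seconds needed to clock out n_bytes; 8n1 means 10 bits per byte."""
    return n_bytes * bits_per_char / baud

request_s = transfer_time_s(6)   # panel -> HVAC packet
response_s = transfer_time_s(6)  # HVAC -> panel packet
cycle_s = 2.0                    # one exchange every two seconds

busy_fraction = (request_s + response_s) / cycle_s
print(f"one packet: {request_s:.3f} s")
print(f"bus busy:   {busy_fraction:.0%} of each cycle")
print(f"at 9600 Bd: {transfer_time_s(6, baud=9600) * 1000:.2f} ms per packet")
```

With a software serial implementation the microcontroller has to keep bit timing accurate for that whole busy fraction, which is why anything else running at the same time becomes a problem.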

Another nearly-software-based alternative I attempted, and also kind-of worked, was using the RMT peripheral. This is the remote control peripheral included in ESP32 — and the reason why Circuit Python made it harder to send pulse trains on FeatherS2: it’s no longer just implemented in software, but it relies on hardware buffers to allow sending and receiving pulse trains. It’s awesome to offload that, but it also comes with limitations. In particular, while I did manage to implement a 104 bps serial transmission through this interface, it would only allow me one serial pair rather than two, severely limiting what I could be doing with the aircon board.

Content Warning: though I’m personally aiming at following more modern conventions, the terminology in use by datasheet and other content that I’m about to link to still use older, less inclusive terminology.

UART — But Discretely

So instead, I used my overcomplication skill to come up with an alternative: discrete UARTs! You see, back in the days when personal computers came with serial ports, and before chipsets started having everything in a single physical chip, and even before most of the functionality of basic peripherals was merged into a Super I/O chip, we had multiple competing discrete UART chips available. The most common one was the 16550, at least in my personal experience. These are still available, and you can indeed still use the 16550, although to do so you need to use a lot of I/O lines, as the 16550 and compatibles usually have a parallel I/O interface, which is suitable for an ISA bus connection, but not so suitable for a microcontroller even when it’s quite generous with its GPIO lines.

As an alternative, I started looking into using a SC16IS741A, which is an I²C (and SPI) chip. Instead of using a lot of separate I/O lines for sending commands and data to it, you send them over the usual two wire interface, and it internally decodes them into whatever format it needs. You may wonder what the difference is between using the actual UART and sending data over to an I²C hardware UART. The answer is a bit complicated to explain theoretically, but I think a visualization of it will go a long way to explain:

What you see here is a screenshot from the Saleae Logic software capturing three I/O lines: the I²C bus (SCL above, SDA below), and the TX line of the discrete UART. This is a very much not optimized transaction that shows the sending of one byte at the 104 bps configuration that my aircon control needs. It’s one order of magnitude slower to send the byte than to set up the UART to send the message and send it, and this is with a relatively slow bus as I²C is. And it doesn’t scale linearly, even.

Basically, the discrete UART allows offloading the whole process of keeping up with the timing, and it does so in a very efficient way. Receiving is even more interesting, because it does not require the microcontroller to pay attention to the received bytes until it’s ready to process them, and keeps them in the FIFO in the meantime. But this kind of feature already exists in most microcontrollers (often referred to as “hardware UART”), and when it works, that’s awesome… but clearly sometimes it doesn’t quite work.

This particular device would be useful on boards based around the older ESP8266 micro, as that only has a single hardware UART, which is used for the logging. With one (or more) of these or similar chips you would be able to control a much wider number of serial-controlled devices, and that makes them valuable.

Unfortunately, ESPHome does not really have a way to provide a UART bus outside of the core platform, at least right now. If I had ended up working more down this particular route, I would probably have paid more attention to integrating it. It’s not hard to provide the same interface so that it would be a drop-in replacement, but it does require some reshuffling of the core hierarchy so I don’t think I can pull that off just yet.

Writing a Device Driver, a Component, a Library

In either case, whether you want to integrate this directly in ESPHome, or use it from another software stack, you need to implement its own command set. This is what you usually refer to as a “device driver” for operating systems such as Linux and Windows, as it provides access to the underlying device with some standard interface. And indeed, the Linux kernel has a driver for a set of related NXP UART peripherals: sc16is7xx.

Now, while the NXP-provided datasheet has all the information needed to program for this chip, having the source reference on Linux made things significantly easier, particularly because unless you know what to look for, you most likely will misread the datasheet. I have some (though not much) experience with I²C devices, but there were a few things that ended up confusing me enough that I wasted hours down the wrong route.

The first problem was figuring out the addressing convention. I²C addresses should, by convention, be 7 bits long. While the protocol sends a whole byte for addressing, it uses the last bit in it to specify whether you’re issuing a read or a write. But despite this being the common convention, and the one that ESPHome, CircuitPython, and just about anything else expects you to use, some datasheets do not follow that, and provide the full 8 bit “address”. You can see more details on that on the Total Phase website, which was instrumental for me to get to the bottom of why things kept disagreeing with what I was writing.
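The conversion between the two conventions is a single shift. A minimal sketch, with made-up helper names, and with 0x90 used purely as an illustrative 8 bit value rather than a claim about how any particular board is strapped:

```python
def to_7bit(address_8bit):
    """Drop the read/write bit from a datasheet-style 8 bit address."""
    return address_8bit >> 1

def to_8bit(address_7bit, read=False):
    """Build the byte that actually goes on the wire for a 7 bit address."""
    return (address_7bit << 1) | int(read)

datasheet_value = 0x90  # illustrative only
print(hex(to_7bit(datasheet_value)))                      # what most libraries expect
print(hex(to_8bit(to_7bit(datasheet_value), read=True)))  # same device, read transfer
```

If a device does not answer at the address printed in the datasheet, trying the value shifted right by one is a cheap first check.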

Once the peripheral was addressed, the next question was about the register addresses. In my first attempt at configuring the chip I was falling short. A quick look through the Linux sources told me that I was missing a left shift of the register address… which made me go “Huh?” for a while. Indeed, the datasheet provides explicit tables explaining the register addressing: registers have a 4-bit address, but it sits in bits 6:3 of a byte, with bit 7 (MSB) being used in SPI only to select between reading and writing to the register. Of the remaining three bits, two are used to select between channels, because some of the matching chips by NXP include more than one UART on board (though unfortunately I couldn’t find one of those that I could easily work with on a breadboard). The last one (LSB)… is not used. It’s always off and reserved. But more interestingly, the Linux driver only shifts the address by two, not by three like I had to. So I’m wondering if this means that this chip is only mostly compatible with the ones I was looking at.
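The byte layout described above can be sketched as follows. This only encodes the layout as I read it from the datasheet tables (bit 7 read/write for SPI, bits 6:3 register, bits 2:1 channel, bit 0 reserved); the function name is made up, and register 0x03 is used as an example because that is where the line control register sits in 16550-style register maps.

```python
def register_byte(register, channel=0, spi_read=False):
    """Pack a 4-bit register address and a channel into the sub-address byte."""
    if not 0 <= register <= 0x0F:
        raise ValueError("register address must fit in 4 bits")
    if not 0 <= channel <= 0x03:
        raise ValueError("channel must fit in 2 bits")
    # bit 7: read flag (SPI only), bits 6:3: register, bits 2:1: channel,
    # bit 0: reserved and left at zero.
    return (int(spi_read) << 7) | (register << 3) | (channel << 1)

print(hex(register_byte(0x03)))             # register 0x03 on channel 0
print(hex(register_byte(0x03, channel=1)))  # same register, second channel
```

Forgetting the shift does not raise any error on the bus: the chip simply reads or writes a different register, which is what makes this mistake so time consuming to spot.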

So after one full day of figuring out how to properly run my component over ESPHome, I decided I needed something for prototyping this I2C faster.

Enter Circuit Python

By now you probably know that I do like Circuit Python. And as it turns out, I had already written some code for I²C in Circuit Python when I extended the MCP230xx library to include the older ’16 model. So it didn’t feel too odd to go ahead and use a Trinket M0 with Circuit Python to play around with the UART.

The choice of the Trinket M0 over the more capable Feathers was not random: while the Trinket has a physical UART and the pins to use it, it’s also a very tiny device. The fact that you can use multiple physical UARTs through the I²C bus allows a significant expansion of the I/O abilities of that class of microcontrollers.

At the end, I not only ended up writing a CircuitPython compatible library that allowed me to use the UART, but also re-writing it to leverage the Adafruit_CircuitPython_Register library, making it significantly easier to add support for more features.

The library supports an interface that is nearly identical to the one provided by the built-in serial, although I don’t think there’s a way to make sure it really is, because similarly to ESPHome it doesn’t look like Circuit Python ever considered the need to support UARTs that are not part of the original hardware design. Understandably so, as these discrete UARTs are still fairly uncommon.

But I went one step further: when I read the datasheet the first time I wasn’t sure just how strong the suggestion for 1.8432 MHz crystals was, for the divisor. Turns out it’s not strong at all, so the whole amount of crystals I bought at that frequency are not particularly helpful. Worse yet, it turns out I don’t need any clock, because even the Trinket M0 is able to create a 50% duty cycle PWM output at a frequency that is high enough to use as driving clock for the UART.
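The reason the exact clock frequency matters so little at these speeds shows up in the divisor arithmetic. The sketch below assumes the usual 16550-style relationship (baud = clock / (prescaler × 16 × divisor), with a 16 bit divisor); the 1 MHz PWM clock is a hypothetical figure picked for illustration, not the value I actually used.

```python
def divisor_for(baud, clock_hz, prescaler=1):
    """Nearest integer divisor for the requested baud rate."""
    divisor = round(clock_hz / (prescaler * 16 * baud))
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("divisor does not fit the 16 bit divisor latch")
    return divisor

def actual_baud(divisor, clock_hz, prescaler=1):
    """Baud rate the UART really produces with that divisor."""
    return clock_hz / (prescaler * 16 * divisor)

for clock_hz in (1_843_200, 1_000_000):  # crystal vs. hypothetical PWM clock
    divisor = divisor_for(104, clock_hz)
    error = actual_baud(divisor, clock_hz) / 104 - 1
    print(f"{clock_hz} Hz: divisor {divisor}, error {error:+.3%}")
```

At 104 Bd the divisor is in the hundreds or thousands, so rounding it to an integer costs a tiny fraction of a percent whatever the clock is. The "nice" crystal frequencies only pay off at high baud rates, where the divisor is small.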

That means that I can fit, in a half sized breadboard, the whole circuitry I need, including only two passives (the pull-up resistors on the SCL/SDA lines), providing the clock as a PWM output from the Trinket while also piggybacking its reset line. This was a surprisingly good setup, and would actually allow me to control the two sides of my aircon (panel and hvac) if I was still going with the discrete UART idea.

But it turns out I really don’t need any of that: ESP32’s UARTs worked out just fine in the end, at least in the most recent firmware as uploaded by ESPHome. So I decided to set the discrete UARTs aside again and try instead to control the aircon at least with a USB-to-UART adapter. But that’s a story for another post.

Bonus Chapter: Dual-UART chips

As I said earlier, there are some options out there for multiple UART chips that would be interesting to use for cases like mine, in which I need two independent, yet identically configured UARTs. Dual UART chips are not uncommon, but I²C controlled ones are.

If you look around for I²C Dual-UART options, you most likely will end up on the DFRobot DFR0627, which is an “IIC to Dual UART Module”. IIC is the name you’ll find used on Chinese products to refer to I²C (it’s like TF card instead of SD card, don’t ask me).

So why did I not even consider this particular option? Well, the first issue is that this is a full module, that uses the Gravity connector (which is similar to, but as far as I know not compatible with, the Stemma QT connector that Adafruit uses), the second issue is that there’s no documentation to go with how to use it.

Since I want to, at the end of the whole process, have a printed board I can just hang on the wall (possibly with a 3D Printed case, but that’s further along the way), I need to be able to get the components I want in retail-enough options that I can buy them and solder them in myself. I also need to be able to control those components with arbitrary software.

The DFRobot modules tend to have Arduino components, which you may still be able to use for ESPHome, but you wouldn’t be able to use with Circuit python during the more iterative side of the project. Since these components are open source you could go ahead and reverse engineer it from those, but it would be much easier to develop for something that has a datasheet and some documentation.

Indeed, the DFRobot website does not even tell you what chip is on the module, though if you look around in the forums you can find a reference to WEIKAI WK2132-ISSG, which is available through LCSC and comes with a datasheet. In Chinese.

If you just look at the pictures, you can at least confirm that this device is similar in functionality to what the NXP part I’ve been working with provides, except for the fact that it does not have the full RS-232 style CTS/RTS lines. So it really would be an interesting part if I ever decided to go back to the idea of using discrete UARTs, but it would require at the very least for me to get one of my Chinese-reading friends to translate enough of this 28-page datasheet to be able to tell what to do. That is unlikely.

Reverse Engineering an LG Aircon Control Panel — Buses and Cars

This is part two of my tale of reverse engineering the air conditioning control panel in our apartment. See the first part for further details.

If you are binging on retrocomputing videos like I’ve been doing myself, you may have the wrong impression that a bus has to have multiple lines, like the ISA and PCI buses do. But the truth is that a single-wire bus is not unheard of or even uncommon. It just means that the communication needs to be defined in such a way that there’s no confusion as to who is sending at any given time. In this case, it’s clear that the control panel is sending six bytes which are immediately (and I do mean immediately) followed by a six-byte response from the HVAC.

So the next step was to figure out what those six bytes were, and thanks to Saleae’s recent licensing of sample high-level analyzers, this became a piece of cake. While I’m not at liberty to share the code, at the time of writing, I ended up writing an analyzer that would frame together the 6 bytes from the panel and the 6 bytes from the HVAC. Once I had that, it was also easier to notice that the checksum byte was indeed the same as in other LG protocols, it’s just that it applied separately to the two 6-byte packets, which means there are only five bytes in each message that need to be decoded.
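This is not the analyzer code, just a minimal sketch of the framing idea in plain Python. It assumes the checksum is the last byte of each packet, which the text above does not spell out, and it takes the checksum function as a parameter because the algorithm is not described here; `placeholder_checksum` is a stand-in so the sketch runs, not the LG algorithm.

```python
PACKET_LEN = 6

def split_exchange(raw):
    """Split one 12-byte exchange into (panel_packet, hvac_packet)."""
    raw = bytes(raw)
    if len(raw) != 2 * PACKET_LEN:
        raise ValueError("an exchange is a 6-byte request plus a 6-byte response")
    return raw[:PACKET_LEN], raw[PACKET_LEN:]

def split_packet(packet):
    """Return (payload, checksum), assuming the checksum comes last."""
    return packet[:-1], packet[-1]

def is_valid(packet, checksum_fn):
    """Check a packet against whatever checksum function is supplied."""
    payload, checksum = split_packet(packet)
    return checksum_fn(payload) == checksum

def placeholder_checksum(payload):
    # NOT the LG algorithm: a stand-in so that the sketch is runnable.
    return sum(payload) & 0xFF

capture = bytes(range(12))  # made-up bytes, not a real capture
panel, hvac = split_exchange(capture)
print(split_packet(panel), split_packet(hvac))
```

Treating the two halves as independent packets is what leaves only five payload bytes per direction to decode.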

A screenshot of the Logic 2 software showing an analyzed trace with the high level analyzer loaded.

With a bit of trial and error, I already decoded what I think will give me most of the important controls for my plan: how to change the mode between the aircon, heat pump, fan, dehumidifier, and how to change the fan speed. The funniest part is that the “Auto” mode is actually not a mode at all, and just means that the thermostat appears to be sending the “aircon” or “heat pump” as needed.

What got even more interesting, is that if you leave the control panel by itself, after a few minutes it appears to notice the lack of an HVAC connected, and goes into an error state where it alternates the display between “Ch” and “3”. Either it’s reporting its own channel for diagnostics (assuming it’s misconfigured) or it’s just showing a particular error status. In either case, that threw a spanner in my plans.

The first problem is that obviously you wouldn’t be able to connect the 12V data wire to the ESP32 directly. That’s kind of obvious: the ESP32 is a 3.3V microcontroller, and if you tried to use a 12V wire with it it’ll just… go. My original intent was to use two optocouplers: one to receive the data from the control panel, and the other to inject my messages onto the wire. But that won’t work quite the same way for a bus, and while I could try to build up the right circuitry with discrete components, I would have rather used a ready-made transceiver.

The problem with that is that transceivers are made for specific buses, and so the first question is to find the right bus, the one used by LG. A lot of HVAC systems (particularly at industrial scale) use Modbus over RS-485. I have experience with this since the second company I ever worked for is a multinational that works in the industrial HVAC sector, so I learnt quite a bit of how those fit together. But an RS-485 connection would require two wires, since it uses differential signaling, and that’s already excluded.

Going pretty much by Google searches, I finally nailed down something useful. In the automotive industry, there’s a number of standards for on-board diagnostics (OBD). The possibly most famous (and nowadays most common) of those is the CAN bus, which is widely used outside that one industry, as well. LG is not using that. But one of the other protocols used is ISO 9141-2, which includes a K-Line bus on it, which according to Wikipedia is an asynchronous serial connection over a single bidirectional wire without handshakes — though it is using a 10.4kBd signal which is… exactly 100 times faster than the LG signal.

Through these, I found out about the LIN (Local Interconnect Network) bus, which is also used in automotive and specifies a higher level implementation on top of ISO 9141 compatible electrical signaling. It happens to be a good position to start the work from. Indeed, there are a number of LIN bus transceivers that are pretty oblivious of the addressing and framing of the protocol, on purpose, because the specifications have changed over the years. But what they are good for is connecting to a 12V, high recessive bus, and providing microcontroller-level RX and TX signals.

An example of these transceivers is Microchip’s MCP2003, so I decided to set myself up to redesign the board based on that. But since the control panel also needed to receive “acknowledgements” from the HVAC, it meant that each “smart controller” needs two transceivers: one where it fakes the controller to the HVAC, and another one where it fakes the HVAC to the controller. And both of those needed to have the ability to just go into a “lurking” state where they wouldn’t be sending signals if I flipped a physical switch.

Screw It, I’m Doing It Live

So here’s where things got a bit more interesting in multiple directions. In the days just before this work, I was being asked a few pointers about reverse engineering — and unfortunately I don’t know how to “teach” RE, but I can at least “go through the motion”. After all, that was the more interesting part of my Cats Protection streaming week, so once the DigiKey order arrived with the transceivers and all the various passives to add around it, I decided to set up a camera, and try breadboarding the basic circuitry.

Now, setting aside the fact that I do not particularly enjoy streaming with an actual camera, and indeed the end results left a lot to be desired, the two-hour stream was fairly productive. I found that the PL2303 USB-to-serial adapters actually work quite well at both 100 and 104 bps, and that indeed the transceiver mostly works fine.

It also showed an interesting effect that I did not expect: as I said earlier, after a few minutes without getting an answer from the HVAC, the control panel enters into an error state (Ch/3). I assumed that what it needed was a valid packet from the HVAC, with checksum and information. Instead, it seems like just filling up a buffer, even with invalid packets, is enough to keep the control panel working: as I typed random words onto the serial port, while connected to the bus, the Ch/3 error vanished, and the panel went back to a working state.

This was surprising for one more reason: at least some of the packets sent from the HVAC to the panel had to include the capabilities the HVAC system has to begin with. The reason why I knew that is that the control panel appears to have a lot more functions when it’s running standalone, compared to when it’s installed on the wall. Things like a “power” fan mode for the aircon, the swiveling ventilation, and so on.

Spoiler: it turned out to indeed be the case: the first two commands sent from the panel to the HVAC appear to be some sort of inquiry, that provide some state to the panel to know which features are supported, including the heat-pump mode and the different fan speeds. But for now, let’s move on.

Before I could go and try to figure out which bit related to which capability I hit a snag, which is where I got stuck at the end of the stream: sending the character ‘H’ on the serial port (a very random character that just happens to be the start of the string “Hello, world!”) showed me something was… not quite right.

A screenshot from Logic2 showing a 0x68 sequence being interpreted as 0x6C.

This is not easy to see, beside for the actual value changing, but in the image above the first row (Channel 0) is the 12V bus (which you can read on the fourth line is actually 10V), the second and fifth rows (Channel 4) are a probe connected to the RXD pin of the MCP2003, and the third and sixth (Channel 5) are a probe connected to the TXD pin (which is in turn connected to the TXD of the USB-to-serial adapter).

Visibly, the problem is that somehow the bus went from “dominant” (0V) to “recessive” (10V rather than the nominal 12V) too fast, making the second and third bits look like 1s instead of 0s. But why? My first thought was that it was an electrical characteristic I missed – I did skimp on capacitors and diodes on my breadboarding – but after the stream terminated, I grabbed my Boox, and checked the datasheet more carefully and…

1.5.5.1 TXD Dominant Time-out

If TXD is driven low for longer than approximately 25 ms, the LBUS pin is switched to Recessive mode and the part enters TOFF mode. This is to prevent the LIN node from permanently driving the LIN bus dominant. The transmitter is reenabled on the TXD rising edge.

MCP2003/4/3A/4A Datasheet, DS20002230G, page 10

25ms is nearly exactly how long the dip to dominant state is on Channel 0 (and about the same on Channel 4): it’s also nearly exactly 2.5 baud.
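The effect can be reproduced on paper. The sketch below is a simplified model with several assumptions: the receiver samples each bit in the middle of its slot, the time-out is exactly 25 ms (the datasheet only says approximately), the bit time is the nominal 1/104 s, and the transmitter is re-enabled as soon as TXD rises, as the datasheet quote says. The function names are made up. It uses the 0x68 value from the capture.

```python
BIT_TIME_MS = 1000 / 104  # nominal bit time at 104 Bd
TIMEOUT_MS = 25.0         # approximate figure from the datasheet

def frame_levels(byte):
    """TXD levels for an 8n1 frame: start bit, 8 data bits LSB first, stop bit."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def received_byte(byte, bit_time_ms=BIT_TIME_MS, timeout_ms=TIMEOUT_MS):
    """Byte seen on the bus when the transceiver cuts long dominant pulses."""
    result = 0
    low_since = None  # time at which TXD last went low, None while it is high
    for index, level in enumerate(frame_levels(byte)):
        start = index * bit_time_ms
        if level == 0 and low_since is None:
            low_since = start
        elif level == 1:
            low_since = None  # rising edge re-enables the transmitter
        sample_time = start + bit_time_ms / 2
        # The bus only stays dominant until the time-out expires.
        dominant = level == 0 and sample_time - low_since < timeout_ms
        if 1 <= index <= 8 and not dominant:
            result |= 1 << (index - 1)
    return result

print(hex(received_byte(0x68)))
print(hex(received_byte(0x68, bit_time_ms=1000 / 9600)))
```

In this model the start bit plus the first two data bits fit inside the 25 ms window, and the third data bit is sampled after the bus has already been released, so it reads as a 1. At 9600 Bd the whole frame is far shorter than the time-out and nothing is altered.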

A Note About Baudrate

I have complained loudly before of how I’m annoyed at people who think those younger than them know nothing and should just be made fun of. I don’t believe in that, and I think we should try our best to explain the more “antique” knowledge when we have a chance.

Folks who have been doing computers and modems well before me appear to love teasing people about the difference between “baudrate” and “bits per second”. The short version of that is that the baud rate relates to the speed of sending a single impulse, while the bits per second (bps) is (usually, but not always) meant to be taking the speed of the actual data transmitted. The relation between the two is usually fixed per protocol, and depends on how you send those bits.

In an asynchronous serial protocol (including RS-232 and this LG abomination), you define how you send your bits with an expression such as “8n1” or “7-odd-2” (also called the framing parameters), or a number of other similar expressions with different values in them. These indicate that each character sent is respectively eight or seven bits in size, that the parity is not present in the first case, and is odd in the latter, and that the first includes only one stop bit while the second is providing two. In addition to this, there’s always a single start bit.

8n1 is probably the most common of the framings, and that means you’re actually sending 10 bits for each character. A baudrate of 9600 Bd/s thus gives you a raw connection of 960 characters per second. The 104 value for LG is the actual baudrate, as I can measure one of the impulses from the original control panel at 9.745ms, which would actually put it around 103 Bd/s.
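The relation between framing, baud rate and throughput is simple enough to write down. A minimal sketch, with made-up function names, where the parity argument is "n", "e" or "o":

```python
def bits_per_character(data_bits, parity, stop_bits):
    """Bits on the wire per character: start + data + optional parity + stop."""
    parity_bits = 0 if parity == "n" else 1
    return 1 + data_bits + parity_bits + stop_bits

def characters_per_second(baud, data_bits=8, parity="n", stop_bits=1):
    """Best-case character rate for a given baud rate and framing."""
    return baud / bits_per_character(data_bits, parity, stop_bits)

print(bits_per_character(8, "n", 1))   # 8n1
print(bits_per_character(7, "o", 2))   # 7-odd-2
print(characters_per_second(9600))     # 8n1 at 9600 Bd
print(characters_per_second(104))      # 8n1 at the LG speed

# Going the other way: a measured 9.745 ms pulse gives the baud rate.
print(1000 / 9.745)
```

The last line is why the measured speed lands a little under the nominal 104.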

Which is where my assertion that 25ms is nearly exactly 2.5 baud comes from (2.57 to be a bit more precise): you take the length (25ms) and divide it by the time needed to send a single baud (9.745ms).

What this means in practice is that the MCP2003 series (including the more modern MCP2003B that includes the same time-out behaviour) has a minimum baud rate as well as a maximum one. The maximum one is documented in the datasheet as 20 Kb/s, but the minimum is affected by this timeout: a frame of all zeros would be the worst case scenario in this condition, as the line would be asserted low (“dominant”) for the longest time. While theoretically you can define framings the way you prefer, the common configurations vary between 5 and 9 data bits per frame (though I would have no clue how to process the 9 bits per frame to be honest!), which means that the maximum number of space (‘0’) baud would vary between 6 and 11.

Why six and eleven? Well, the “start” baud is also a space (logical zero) – which means that if your framing is 5n1, the 0x00 value would be sent with six “spaces”. And if you use nine data bits per frame with even parity, 0x000 would then be followed by a “space” in parity (to maintain the number of ‘1’ bits even), bringing it up to 11 (start, nine zeros, and parity).

The minimum baudrate for a certain framing configuration is thus calculated by dividing the maximum number of consecutive spaces by the timeout in seconds (0.025), which leads to a minimum baudrate of 240 Bd/s when using 5n1, 440 Bd/s for 9e1, and 360 Bd/s for the most commonly used 8n1 framing. Which is over three times faster than what these LG units are using.
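The same calculation as a minimal sketch, assuming the time-out is exactly 25 ms (the datasheet says approximately) and using made-up function names. The worst case is an all-zero character, where the start bit, every data bit and an even parity bit are all spaces; with odd parity the parity bit of an all-zero character is a mark, so it does not extend the run.

```python
TIMEOUT_S = 0.025  # approximate TXD dominant time-out

def max_consecutive_spaces(data_bits, parity):
    """Longest run of '0' bits a single character can put on the line."""
    run = 1 + data_bits  # start bit plus an all-zero payload
    if parity == "e":
        run += 1         # even parity of all zeros is another zero
    return run

def min_baud(data_bits, parity, timeout_s=TIMEOUT_S):
    """Slowest baud rate at which that run still fits inside the time-out."""
    return max_consecutive_spaces(data_bits, parity) / timeout_s

for data_bits, parity in ((5, "n"), (8, "n"), (9, "e")):
    print(f"{data_bits}{parity}1: {min_baud(data_bits, parity):.0f} Bd minimum")

print(f"8n1 minimum vs. LG: {min_baud(8, 'n') / 104:.2f}x")
print(f"25 ms at the measured bit time: {25 / 9.745:.2f} baud")
```

Any framing at 104 Bd therefore trips the time-out as soon as a character starts with a couple of zero bits.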

I Need A New Bus Transceiver

Since I couldn’t use the MCP2003, I ordered a few MCP2021. Note that Microchip also says that these are not recommended in new designs, suggesting instead the ATA663232 — which as I’ll get to has all of the disadvantages of all the various options for LIN bus transceivers.

When I received the meter, I decided to take another stab at streaming setting up the emulator on camera:

If you watch the whole video you will see me at some point put a finger on the chip and yelp — turns out I ended up with a near-dead short on its embedded regulator. Thankfully, since the chip is designed for the automotive market, the stress did not cause it to fail at all, just… overheat. And as I showed on stream, I did manage to keep the control panel running with my “emulator”, although I did note some noise on the I/O towards the end.

So a little bit more exploration later told me that a) the PL2303 seems to be a bit unreliable with the 3.3V without tying the VREG with the 3.3V coming from the device, and b) even on the CH341 I would get some strange noise in addition to the signal. I think the reason for that is that the chip uses a comparator against its own regulator to decide whether the transmitter should be on. Since, as Monty and Hector suggested, it’s a bad idea to tie multiple regulators together, I decided that even the MCP2021 is not the transceiver I wanted.

Unfortunately, that made it harder to find the right transceiver. Microchip’s suggested replacement, the ATA6632xx series, has all of the disadvantages, as I said: it has the “TXD Dominant Timeout” feature (so it cannot send the 104bps signal I need to send), it includes a voltage regulator that cannot be disabled, and it is only available in VDFN package that is not possible to hand-solder.

On Digi-Key (which is by now my usual supplier), Microchip’s MCP20xx series are the only PDIP-8 through-hole components, so the next best thing is SOIC-8, which is surface mount (so not easily breadboardable) but still hand-solderable (with a steady hand, a magnifying glass, and an iron tip). Looking at those, I found at least two that fit.

ON Semiconductor’s NCV7327 was a very obvious choice because they explicitly say in the features list «Transmission Rate up to 20 kbps (No low limit due to absence of TxD Timeout function)», and it was the only one that I found explicitly noting that the TxD Timeout imposes a floor on the speed (as I explained above). Unfortunately, the SOIC-8 version was not available at the time of order on Digi-Key, with a 22-week backorder.

So instead, I settled for Texas Instruments’ TLIN1027DRQ1. This is pretty much… the same. From what I can see, both ON’s and TI’s SOIC-8 devices are pin compatible, and they are nearly pin compatible with Microchip’s SOIC-8 variants, insofar as the power, bus, RXD, and TXD pins are in the same position.

There is, though, a rake just waiting for you there. The Enable/Chip Select pins on both the TLIN1027DRQ1 and the NCV7327 do not correspond to the MCP20xx Transmission Enable semantics, despite sharing the same position. With the MCP20xx you could leave a transceiver connected to a chatty bus, with the TXEnable off, and you would still receive the traffic from the bus.

But with the other two, you’re turning off the whole transceiver at once, which wouldn’t be too bad if it wasn’t that both of these pull TXD to ground (dominant) if you leave it unconnected. Again, this isn’t a big problem in and of itself: as long as the firmware is told not to transmit when the bus is connected directly between the panel and the HVAC, nothing should be transmitted, right?

But this does break one assumption I was making: if I disable the smart controller board, I want to be able to remove the ESP32 devkit altogether. This is important because beside OTA (Over The Air) updates, I would need to be able to disconnect the ESP32 to update the firmware on it. Which means I don’t want to rely on the firmware being running and not holding the bus busy.

A schematic diagram of the Panel-side bus transceiver block.

So what I ended up adding to the design is a way for the bus selector to decide whether transmission is to be allowed on the transceiver. I think this is the first time I even considered the idea of using a 74-logic component in my designs (to the point that I had to figure out how to use that with the EAGLE-provided symbols; hint: use the invoke command), but this seemed to me the easiest option to implement what I needed.

The tie-up-both-inputs for the NAND is literal textbook electronics, but turns out to work very well since the cheapest 74 logic NAND chip I found contains four of them, and I only need one other.

Note that of course this is only one of the “logical blocks” of the board — and actually not even the final form of it. As I get into more details later, you’ll find out that this only turned out to be one of the possible solutions, and (at the time of writing) there’s no guarantee that this is actually going to be the one I’m going to be using.

Reverse Engineering an LG Aircon Control Panel — Introduction

I like reverse engineering stuff. It’s not just the fact that it’s a nice puzzle to solve, but I enjoy the thrill of “Oh, that’s how that works.” I’m sure I’m not alone, as can be clearly seen following marcan’s Asahi Linux work, or following Foone on Twitter, or Big Clive on YouTube (and many, many others).

Sometimes, a lot more rarely, my reverse engineering is actually geared towards something I want to make use of, rather than just for the sake of finding answers — this is one of those cases. If you have been following me on Twitter or decided to watch me work on this live on Twitch, you probably already know what I’m talking about. If not, be warned that this is going to be the first part of a (possibly long) series of posts on the same topic. It turned out to be very long for a single post, and I decided to split it instead.

You see, when we moved from the last apartment, we sold our Nest smart thermostat to a friend. The new apartment has an aircon system with heat pump, rather than a “classic” heating system, which is really important as the balcony can easily reach 40°C in the mornings when the sun shines. And unlike in the US, where thermostats are pretty much standardized, Europe’s landscape of thermostats is different enough that Nest gave up, and does not support aircon systems.

Aside: I do have a bit of a rant about Nest Thermostats in Europe, but some of that might be a bit tricky to phrase for me without risking breaching confidentiality with my previous employer, which I don’t want to do. So I will leave a question here for European Nest Thermostats users: can you finally enable hot water boost with the Google Home app?

To be honest, this also kind of makes sense: in a flat that is cooled and heated with an HVAC, it makes sense to have multiple thermostats so that each room can set a different required temperature. If we’re spending the evening in the living room, what’s the point of heating up the bedroom? If I’m on vacation and not spending time in the office, why would I turn on the air conditioning? And so on.

Unfortunately what we ended up with is three thermostat units from LG, model number LG-PQRCUDS0 (provided for ease of searching and finding this blog post), which are definitely not smart, and also not convenient. These are wired, non-smart control panels, that do support features like timing, but do not provide any way to control them without tapping on the screen. As far as I know, these are configured to read a temperature sensor that is not on the panel itself, but on the other hand, the placement of those sensors is a bit on the unfortunate side: in particular in the bedroom it appears located in a position that is too natural to fit a wardrobe in, making it always register a higher temperature than the room actually has.

This had been particularly annoying during the winter but it was proving to be worse during the summer: as I said the temperature in the balcony can reach 40°C in the morning, as we’re facing east and it’s an all-glass external wall. That means that the temperature inside the apartment can easily reach 30°C quite suddenly. This is not good for electronics already, but it’s doubly non-good for things like food and medicine, including insulin, which I very much depend on.

While we could just try leveraging the timer mode to turn on the AC in the morning, the complication of where the sensor is makes it very hard to judge the temperature to set it at. And since, as Alec points out on the video, the thermostat’s job is only to turn something on or off (in theory, at least)… well, there has to be an easier way.

So I embarked on this quest of reverse engineering my aircon control panel, with the intent of introducing an ESPHome-compatible add-in that would allow me to control the HVAC through Home Assistant.

Inspection

The first thing to do when setting off to reverse engineer something is to figure out what it is, whether there is any documentation for it, and whether someone else already reverse engineered it. The model number, as I said, is LG-PQRCUDS0 and LG has user and installation manuals online describing it as a Deluxe Wired Remote Controller (together with the -B and -S variants of the part number).

Reverse image search for the panel actually seemed to strike gold at first, as this Instructables post showed exactly the same UI as mine, and included a lot of information about the protocol. But the comments also pointed to a couple of different models that all seemed similar but a bit different. So instead of going ahead and trying to build on the already reversed protocol, I wanted to confirm how it all worked myself.

A close up of the door behind my LG aircon control panel showing a JST ZR connector, and a yellow-red-black cable going to the wall.

The first question is going to be what electrical “protocol” it’s using. The back of the panel has a door that hides the inbound connection from “the wall” (that is, the actual HVAC unit), which is three wires and terminates in a JST ZR connector.

With my multimeter I could confirm that the voltage would be around 12V — but I couldn’t confirm whether it would be differential data or what else, since I’m still using an older multimeter and it doesn’t have any option to indicate there’s a signal on a wire. If someone has a good suggestion for a multimeter that does that, please leave a comment below the video in this post as I’d love to get a good one.

Now this is good news, overall. The fact that the plug, and the cable itself, can be bought off the shelf means I don’t have to take risky approaches, which is great, given that we’re renting, so any reverse engineering and replacement implementation needed to be non-destructive and reversible.

So I took out my Logic Pro, a very long USB 3.0 cable, and I ordered just enough components from Digikey to debug this thing. And a bench power supply — because I didn’t have a bench power supply, and given this thing needed 12V, it sounded like something handy to have for this. The end result is the following:

With this connected, I used the Logic 2 software to check the voltage levels, and figured out that the yellow wire is data, while the red wire (in the middle) is 12V supply. The data turned out to, indeed, be a 104 Bd serial connection, which would make it share a lot of the information from the previous reverse engineering…

Except that something was off: what I could see on the wire was a burst of 12 bytes in a single stream, exactly once a second, which I assumed at that point to be unidirectional from the panel to the HVAC. But when trying to verify the checksum it didn’t match what the instructions on the other project suggested: sum everything, modulo 256, and xor with 0x55 (the confusing ‘U’ in the various descriptions is actually a bit pattern). So while I could figure out that the first byte seemed to include the mode of operation, and the third one appeared to include the fan speed, I couldn’t figure out for the life of me the checksum, so I thought I wouldn’t be able to send commands to the HVAC to do what I wanted.
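For reference, the checksum as the other project describes it can be sketched in a few lines of Python. This is only the rule quoted above (sum, modulo 256, xor with 0x55), which did not match what my unit sends. The function name and the sample bytes are made up for illustration, and which bytes of the frame go into the sum is an assumption on my part.

```python
def described_checksum(payload: bytes) -> int:
    """Checksum as described by the other project: sum, modulo 256, xor 0x55."""
    return (sum(payload) % 256) ^ 0x55


# 0x55 is the ASCII code for 'U', which is where the confusing 'U' comes from.
assert chr(0x55) == "U"

# Made-up sample payload, not a real capture from the panel.
sample = bytes([0x01, 0x02, 0x03])
print(hex(described_checksum(sample)))  # 0x53
```

Comparing the output of a function like this against the last byte of each captured frame is a quick way to confirm or rule out a candidate checksum.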

On the other hand, in the worst case scenario I could have just replayed the commands I could record from the panel, so I decided to try my luck at drawing and ordering a PCB that would have just enough components for me to play around with.

Drawing the PCB

I’m far from being even a passable expert on electronics, but I could at least figure out some of the things I wanted from a “smart controller” unit to attach to this aircon. So I started with a list of requirements:

  • Since I wanted it to use ESPHome, it should be designed around an ESP32 module. I already attempted this once with the acrylic lamps, and I have yet to get a working board out of that. On the other hand, this time I’m much less space constrained, so I decided to go for a full DEVKIT module, one of those with already the full board of regulators, USB port and serial adapters. This turned out to be a further blessing in disguise, since the current chip shortage appears to have affected the CP2104 module I used in my previous design and I wouldn’t have been able to replicate it.
  • While I don’t expect that the HVAC power supply has been limited in power significantly (after all there’s even more deluxe WiFi-enabled controllers in other versions), I didn’t want to increase the load on the 12V supply significantly. Which meant I went for the more complex, but also more efficient, route of building in a buck converter to 3.3V to power up the ESP32.
  • Also, since I know all too well that relying on my code for “enjoyment-critical” use cases can be frustrating, I wanted a physical way to hard-disconnect a possibly misbehaving automation, and go back to using the old controller, without having to fidget with cables.

With these conditions, and the assumption that the twelve bytes I was seeing were being sent directly from the controller to the HVAC, I drew and manufactured the above board. Feel free to spot the error at the top of the board, if you may.

Now, since JLCPCB turnaround is usually fairly fast, I went ahead and got that manufactured while I was still fighting with figuring out the checksum. So when the boards arrived and I populated them, I was planning to just keep changing settings to find more possible combinations of bytes to see how the checksum would behave.

And that’s when I found out I was very wrong in my assumption, and it’s possible that either the reverse engineering notes I’ve seen for other units are missing a big chunk of information, or LG has many different ways to achieve roughly the same endgame. Once I powered up the panel from the bench supply, I could see that the panel was only sending six bytes, rather than the twelve I expected. It’s a bidirectional communication on a single wire, a bus.

That meant going back to the literal drawing board, finding the right components to implement this, and starting what turned out to be a much larger sidequest of complicating matters.

Glucometer Notes: GlucoRx Q

This article is going to be spending some time to talk about the meter, the manufacturer, and my reverse engineering. The more “proper” review of the device will be at the end, look for Review as title.

So despite having had no progress in months with my reverse engineering efforts started last year, I have not been ignoring my pastime of acquiring and reverse engineering the protocols of glucometers. And since I felt a bit bored, I went onto Amazon UK and searched for something new to look at. Unfortunately, as usual, Amazon is becoming more and more a front for drop-ship sellers from AliExpress and similar sources, so most of the results are Sinocare. Eventually, I found the GlucoRx Q, which looked interesting, given that I have already reverse engineered the GlucoRx Nexus.

Let’s start with a few words about the brand, because that’s one of those “not-so-secret secrets” that is always fun to remind people of: GlucoRx does not actually design, or manufacture, these devices or the strips they use. Instead, they “rebadge” white label glucometers. The Nexus meter I previously looked at was also marketed by the Italian company Menarini and the German (if I understand that correctly) Aktivmed, but was actually manufactured by TaiDoc, a Taiwanese company, as the TD-4277. I say this is not so secret because… it’s not a secret. The name of TaiDoc, and the original model number are printed on the label at the bottom of the device.

Now, some manufacturers doing this kind of white label rebadging don’t really have “loyalty” to a single manufacturer, so when I saw that the Q required different strips from the Nexus, I thought it would be a different manufacturer this time, which brought up my hopes that I would have a fun reverse engineering project on my hands, but those hopes were dashed very quickly, as the device said it’s a TaiDoc TD-4235B.

A quick search on Wikidata turned out to be even more interesting than I expected, and showed that GlucoRx markets more of the TaiDoc devices too, including the Nexus Voice TD-4280. Interesting that the company does not have an entity at the time of writing, and that even the retracted article names TaiDoc twice, but GlucoRx 45 times. To make a comparison with computers, it’s like writing an article about a Surface Book laptop and keep talking about the CPU as if it was Microsoft’s.

Anyway, even though the manufacturer was the same, I was still hoping to have some fun reverse engineering it. That hope was dashed too: it took me longer to set up Windows 10 in a virtual machine than it took me to make glucometerutils download the data from the meter. It looks like TaiDoc has a fairly stable protocol, which honestly surprised me, as a lot of the manufacturers appear to just try to make it harder to support their devices.

Indeed this meter also shows up with a CP2110-compatible HID endpoint, which meant I could reuse my already-written chatter script to extract the back-and-forth between the Windows app and the device, and confirm that it was pretty much the same as the Nexus. The only differences were the model number (which is still issued in little-endian BCD), and a couple of unknown bytes that weren’t as constant as I thought they were. I also updated the documentation.
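To make the “little-endian BCD” remark concrete, here is a minimal sketch of decoding such a field in Python. The function name, the two-byte width and the sample bytes are assumptions for illustration rather than a dump from the meter. The only thing taken from the protocol is the encoding itself: each nibble is one decimal digit, and the least significant byte comes first.

```python
def decode_bcd_model(raw: bytes) -> str:
    """Decode a little-endian BCD field into a TaiDoc-style model name."""
    value = int.from_bytes(raw, "little")
    digits = f"{value:0{2 * len(raw)}x}"
    # In BCD every nibble must be a decimal digit (0-9).
    if not digits.isdecimal():
        raise ValueError(f"not valid BCD: {raw.hex()}")
    return f"TD-{digits}"


# Hypothetical bytes as they would appear on the wire for a 4235 model.
print(decode_bcd_model(bytes([0x35, 0x42])))  # TD-4235
```

Reading the integer as little-endian and printing it in hexadecimal is enough here, because the hexadecimal digits of a BCD value are its decimal digits.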

Why did I say “CP2110-compatible” instead of just CP2110? Well, here’s the thing: the GlucoRx Q shows up on the kernel logs (and in Windows hardware notifications) as “Silicon Laboratories C8051F34x Development Board”. Sounds like someone forgot to flash in the magic strings, and that pretty much “broke the magic” of which platform these devices are based on. Again, not the biggest secret, but it’s always interesting.

As the name might already have given away, the Silicon Labs C8051F34x is an 8-bit microcontroller based on the 8051. Yes, the same architecture I used for Birch Books, and for which I complained about the lack of good FLOSS support (since there doesn’t seem to be any institutional money to improve). It appears that these MCUs don’t just include the 8051 core but also a whole suite of components that do make them very versatile for the use on glucometers, namely fast and precise Analog-to-Digital Converters (ADCs). It also appears to have an integrated USB-to-serial through the same HID protocol as the CP2110.

So, yeah, I’m considering doing one run of the Birch Books controller based on this particular MCU out of curiosity, because they come in a package that is still hand-solderable and include already USB support. But that’s a project for another time.

So putting the reverse engineering (or rather, the lack of need for it) aside, let’s take a quick look at the meter as a meter.

Review

There is not much to say about this meter, because it’s your average “cheap” meter that you can find on online stores and pharmacies. I’m still surprised that most people don’t just get a free meter from one of the big names (in Italy, Ireland, and UK they are usually free — the manufacturers make their money on the strips), but this is not uncommon.

The GlucoRx Q is a fairly comfortable meter — unlike the Nexus, it’s “pill-shaped”, reminding me a lot of the Contour Next One. Unlike the Contour, this meter is not backlit, which means it’s not usable in dark places, but it also has a significantly bigger display.

The size does not compare entirely favourably with the FreeStyle Libre, part of the reason for which is that it runs off a single AAA battery — which makes it easy to replace, but puts some constraints on the size. On the bright side, the compartment door is captive to the main body so you don’t risk losing it if you were to change the battery on a moving vehicle, for instance.

The fitting of the strips is nice and solid, but I have to say getting blood onto them was quite a bit harder than with other meters, including the already mentioned Sinocare. Unlike other meters, there’s no lever to eject the strip without touching it — which makes me wonder once again if it’s a cultural reason for most of the Chinese meters to have it.

As usual for these reviews, I can’t really give an opinion on the accuracy — despite having spent many years looking at glucometers, I haven’t figured out a general way to test these for accuracy. Most meters appear to have a calibration solution, but that’s not meant to be compatible between devices, so I have no way to compare them to each other.

I don’t see any particular reason for getting or avoiding this particular device, to be honest. It seems to just be working fine, but at the same time I get other meters for free, and the strips are covered by NHS for me and all the diabetics — if anyone has any other reason on why to prefer this meter, I’d love to hear about it.

USB Captures Yak Shaving

Months ago I complained about the state of USB captures solutions in 2020. One of the issues is that you can’t easily provide a capture filter to libpcap, because they don’t want to implement user-mode capturing, and Linux does not provide BPF-based filtering for usbmon.

While I do still find it an interesting idea to add BPF filtering there, my kernel-fu is still fairly limited, and I thought I would start with something easier: filtering in userspace with a custom capture program. This also got me a bit more comfortable with the actual capture API, that I have been ignoring for the most part.

As I said before, languages are tools, and I could have tried implementing the tool in a different programming language. But on the other hand, I’m trying to get this done to integrate with the rest of the chatter-extraction tools I released as part of usbmon-tools, so why stray (too much) from the path? Well, turns out that the usbmon interface is a bit too complicated to implement in pure Python, but Cython makes for a good extended language for it, and it’s something I’m familiar enough with — including for something fairly similar with the SGIO implementation.

It was yet another interesting exercise in Yak Shaving though. Beside the documentation being obtuse at times, and trying to explain the interfaces in their chronological order, with the most useful ones last, I found myself partially stumped when I realised that the ioctl() constants you have to use to get any useful information are not available on any userspace header of the Linux kernel! Indeed, it seems the main implementation of usbmon, as part of libpcap, just copies enough of the structures to be able to read the information — and, by the way, does not actually follow the documented process: it sets a value for the buffer size, rather than getting the one that is already set.

I’ve now engaged to make sure that the structures and constants are available to userspace, because at the very least that needs to be addressed properly. I’ve also added unrolled constants for the two ioctl calls that are needed to set the capture up, which keep the amount of copy-paste from kernel headers to a minimum.

While I have committed a monitoring tool that allows printing the output of packets, this is far from the end. It only outputs text format right now, it doesn’t do URB re-tagging, and it only does naïve filters. My next few steps will likely involve getting python-pcapng write support merged in, and start writing pcapng files with the new tool. Then I can start looking at a more common, more interesting filtering set.

Once the capturing is properly taken care of, I have two main needs that I need to address, in the toolset: one is to be able to unpack PL2303 serial protocols — because the programmer that is failing me is using PL2303 and I would like to see how the conversation with the bootloader is going. While the stcgal tool has debug output, having a general chatter printer feels like it would be useful in the future. The other is USB Mass Storage parsing and inspecting, because I need it for the beurer, but also because I would like to turn some of my past reverse engineering blog posts into a talk, and I would like to have some more examples of how the tools make it easier to find the meat of the information.

So yeah that’s where a Sunday went for me…

Investigating Chinese Acrylic Lamps

A couple of months ago I built an insulin reminder light, roughly hacking around what I would call an acrylic lamp. The name being a reference to the transparent acrylic (or is it polycarbonate?) shape that you fit on top, and that lights up with the LEDs it’s resting on top. They are totally not a new thing, and even Techmoan looked at them three years ago. The relatively simple board inside looked fairly easy to hack around, and I thought it would make a good hack project to look more into them.

They are also not particularly expensive. You can go on AliExpress and get them for a few bucks each with so many different shape designs. There’s different “bases” to choose from, too — the one I hacked the Pikachu on was a fairly simple design with a translucent base, and no remote control, although the board clearly showed space for a TSOP-style infrared decoder. So I ended up ordering four different variants — although all of them without remotes because that part I didn’t particularly care for: one translucent base, one black base with no special features, one with two-colour shapes and LEDs, and one with self-changing LEDs and mains power only.

While waiting for those to turn up, I also found a decent deal on Amazon on four bases without the acrylic shapes on them for about £6 each. I took a punt and ordered them, which turned out to be a better deal than expected.

These bases appear to use the same board design, and the same remote control (although they shipped four remotes, too!), and you can see an image of it on the right. This is pretty much the same logic on the board as the one I hacked for my insulin reminder, although it has slightly different LEDs, which are not common anode in the package, but are still wired in a common-anode configuration.

For both the boards, the schema above is as good a reversing as I managed on my own. I did it on the white board, so there might be some differences in the older green one, particularly in the number of capacitors, but all of that is not important for what I’m getting to right now. I shortened the array to just four LEDs to show, but this goes on for all of the others too. The chip is definitely not a Microchip one, but it fits the pinout, so I kept that one, similarly to what I did for the fake candle. Although in this case there’s no crystal on the board, which suggests this is a different chip.

I kind of expected that all the remaining boards would be variations on the same idea, except for the multi-color one, but I was surprised to figure out that only two of them shared the same board design (but took different approaches as to how to connect the IR decoder — oh yeah, I didn’t select any of the remote-controlled lamps, but two of them came with IR decoders anyway!)

The first difference is due to the base itself: there’s at least two types of board that relate to where the opening for the microUSB port is in relation to the LEDs: either D-shaped (connector inline with the LEDs) or T-shaped (connector perpendicular to the LEDs). Another difference is in the placement of the IR decoder: on most of the bases, it’s at 90° from the plug, but in at least one of them it’s directly opposite.

Speaking of bases, the one that was the most different was the two-colours base: it’s quite smaller in size, and round with a smooth finish, and the board was proper D shaped and… different. While the LEDs were still common-anode and appeared wired together, each appears to have its own paired resistor (or two!), and the board itself is double-sided! That was a surprise! It also is changing the basic design quite a bit more than I expected, including only having one Zener, and powering up the microcontroller directly over 4.5V instead of using a 3V regulator.

It also lacks the transistor configuration that you’d find on the other models, which shouldn’t surprise, given how it needs to drive more than the usual three channels. Which actually had me wonder: how does it drive two sets of RGB LEDs with an 8-pin microcontroller? Theoretically, if you don’t have any inputs at all, you could do it: VDD and VSS take two pins, each set of LEDs take three pins for the three colour channels. But this board is designed to take an IR decoder for a remote control, which is an input, and it comes with a “button” (or rather, a piece of metal you can ground with your finger), which is another input. That means you only have four lines you can toggle!
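The pin budget is simple enough to write down as arithmetic. Here is a tiny sketch of it in Python; the variable names are mine, and the counts are just the ones from the paragraph above.

```python
TOTAL_PINS = 8          # 8-pin microcontroller package
POWER_PINS = 2          # VDD and VSS
INPUT_PINS = 2          # IR decoder and the touch "button"
LED_GROUPS = 2          # outer and inner rings
CHANNELS_PER_GROUP = 3  # red, green and blue

# Lines left over for driving LEDs versus lines a naive design would need.
available_outputs = TOTAL_PINS - POWER_PINS - INPUT_PINS
needed_outputs = LED_GROUPS * CHANNELS_PER_GROUP

print(f"available: {available_outputs}, needed: {needed_outputs}")
```

Four available lines against six needed ones means the board has to be sharing or dropping channels somewhere.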

At first I thought that the answer was to be found on the other six-pin chip on the left, but turns out that’s not the case. That one is marked 8223LC and that appears to correspond to a “touch controller”, the Shouding SD8223L, and is related to the metal circlet that all of these bases use as input.

Instead, the answer became apparent when using the multimeter in continuity mode: since it provides a tiny bit of current, you can turn on LEDs by placing the probes between the anode and cathode of the diode. Since the RGB cathodes on the single LED package are already marked on the board, that’s also not difficult to do, and while doing that I found their trick: the Blue cathodes are common to all 10 LEDs, they are not separate for outer and inner groups, and more interestingly the Green cathodes are shorted to the anodes for the inner four LEDs — that means that only the outer LEDs have the full spectrum of colours available, and the only colour combination that makes the two groups independent is Green/Red.

So why am I this interested in these particular lamps? Well, they seem to be a pretty decent candidate to do some “labor of love” hack – as bigclive would call it – with making them “Internet of Things” enabled: there’s enough space to fit an ESP32 inside, and with the right stuff you should be able to create a lamp that is ESPHome compatible — or run MicroPython on it, either to reimplement the insulin reminder logic, or something else entirely!

A size test print of my custom designed PCB.

Indeed, after taking a few measurements, I decided to try my hand at designing a replacement board that fits the most bases I have: a D-shaped board, with the inline microUSB, has just enough space to put an ESP32 module on it, while keeping the components on the same side of the board like in the original models. And while the ESP32 would have enough output lines to control at least the two groups of LEDs without cheating, it wouldn’t have enough to address normal RGB LEDs individually… but that doesn’t need to stop a labor of love hack (or an art project): Adafruit NeoPixel are pretty much the same shape and size, and while they are a bit more expensive than the regular RGB LEDs they can be individually addressed easily.

Once I have working designs and code, I’ll be sharing, particularly in the hopes that others can improve on them. I have zero designing skills when it comes to graphics or 3D designing, but if I could, I would probably get to design my own base as well as the board: with the exception of the translucent ones, the bases are otherwise some very bland black cylinders, and they waste most of the space to allow 3×AAA batteries (which I don’t think would last for any amount of time). Instead, a 3D printed base, with hooks to hold it onto a wall (or a door) and a microUSB-charged rechargeable battery, would be a lovely replacement for the original ones. And if we have open design for the board, there’s pretty much no need to order and hope for a compatible base to arrive.

FreeStyle Libre 2 More Encryption Notes

Foreword: I know that I said I wouldn’t put reverse engineering projects as part of the Monday schedule, but I find myself having an imbalance between the two sets of posts, and I wanted to get this out sooner rather than later, in the hope someone else can make progress.

You may remember I have been working on the FreeStyle Libre 2 encrypted communication protocol for a few months. I have actually taken a break from my Ghidra deep dive while I tried sorting my future out – and failing, thanks to the lockdown – but I got back to this a couple of weeks ago, since my art project completed, and I wanted to see if sleeping it over a bit meant getting a clearer view of it.

Unfortunately, I don’t think I’m any closer to figuring out how to speak to Libre 2 readers. I did manage to find some more information about the protocol, including renaming one of the commands to match the debug logs in the application. I do have some more information about the encoding though, which I thought I would share with the world, hoping it will help the next person trying to get more details on this — and hoping that they would share it with the world as well.

While I don’t have a final answer on what encryption they use on the Libre 2, I do have at least some visualization of what’s going on in the exchange sequence.

There’s 15 bytes sent from the Libre 2 reader to the software. The first eight are the challenge, while the other seven look like a nonce of some kind, possibly an initialization vector, which is used in the encryption phase only.
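In code, splitting that message is a one-liner, but writing it down makes the layout explicit. This is a minimal sketch: the class and function names are hypothetical, the sample payload is made up, and calling the second field a “nonce” reflects my guess above, not anything confirmed.

```python
from typing import NamedTuple


class ReaderChallenge(NamedTuple):
    challenge: bytes  # first eight bytes
    nonce: bytes      # remaining seven bytes, possibly an initialization vector


def split_challenge(payload: bytes) -> ReaderChallenge:
    """Split the 15-byte message from the reader into its two parts."""
    if len(payload) != 15:
        raise ValueError(f"expected 15 bytes, got {len(payload)}")
    return ReaderChallenge(challenge=payload[:8], nonce=payload[8:])


# Made-up payload, not a real capture.
parts = split_challenge(bytes(range(15)))
print(parts.challenge.hex(), parts.nonce.hex())
```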

To build the challenge response, another eight bytes are filled with random data returned by CryptGenRandom, which is a fairly low level, and deprecated, API. This is curious, given that the software itself is using Qt for the UI, but makes more sense when you realise that they use the same exact code in the driver used for uploading to the LibreView service, which is not Qt based. It also likely explains why the encryption is not using the QtCryptography framework at all.

This challenge response is then encrypted with a key — there are two sets of keys: Authorization keys are used only for this challenge phase, and Session keys are used to handle the rest of the communication. Each set includes an Encryption and a MAC key. The Authorization keys are both seeded with just the serial number of the device in ASCII form, and two literal strings, as pictured above: AuthrEnc and AuthrMAC. The session keys’ seeds include a pair of 8-bytes values as provided by the device after the authorization completes.

The encryption used is either a streaming cipher or a 64-bit block cipher. I know that, because I have multiple captures from the same device in which the challenge started with the same 8 bytes (probably because it lacked enough entropy to be properly random at initialization time), and they encrypted to exactly the same output bytes. Since the cleartext adds a random component, if it was a 128-bit block cipher, you would expect different ciphertext in the output — which kind of defeats the purpose of those 8 random bytes I guess?
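The block-size argument can be illustrated with a toy cipher. The sketch below is emphatically not the algorithm Abbott uses (which I have not identified): it is a made-up Feistel network built on SHA-256, used block by block with no chaining, and the key and sample bytes are invented. It only demonstrates that with 8-byte blocks the first eight ciphertext bytes ignore the random half of the cleartext, while with a 16-byte block they do not. A stream cipher with a repeated keystream would show the same behaviour as the 8-byte case.

```python
import hashlib


def toy_encrypt_block(block: bytes, key: bytes, rounds: int = 4) -> bytes:
    """Toy Feistel network: a keyed permutation, NOT a real cipher."""
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for round_no in range(rounds):
        digest = hashlib.sha256(key + bytes([round_no]) + right).digest()
        # zip() stops at the length of the half-block, truncating the digest.
        left, right = right, bytes(a ^ b for a, b in zip(left, digest))
    return left + right


def toy_encrypt(data: bytes, key: bytes, block_size: int) -> bytes:
    """Encrypt each block independently (no chaining between blocks)."""
    if len(data) % block_size:
        raise ValueError("data must be a multiple of the block size")
    return b"".join(
        toy_encrypt_block(data[i:i + block_size], key)
        for i in range(0, len(data), block_size)
    )


KEY = b"not the real key"
CHALLENGE = bytes.fromhex("0011223344556677")  # same challenge in both runs
RANDOM_A = bytes.fromhex("a1a2a3a4a5a6a7a8")   # stand-ins for the random half
RANDOM_B = bytes.fromhex("b1b2b3b4b5b6b7b8")

small_a = toy_encrypt(CHALLENGE + RANDOM_A, KEY, block_size=8)
small_b = toy_encrypt(CHALLENGE + RANDOM_B, KEY, block_size=8)
wide_a = toy_encrypt(CHALLENGE + RANDOM_A, KEY, block_size=16)
wide_b = toy_encrypt(CHALLENGE + RANDOM_B, KEY, block_size=16)

print("64-bit blocks, same first 8 bytes:", small_a[:8] == small_b[:8])
print("128-bit block, same first 8 bytes:", wide_a[:8] == wide_b[:8])
```

The first comparison is true by construction, since the first block only ever sees the challenge. The second one mixes the random bytes into every output byte, which is the behaviour the captures rule out.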

The encrypted challenge response is then embedded in the response message, which includes four constant bytes (they define the message type, the length, and the subcommand, plus an extra constant byte thrown in), and then processed by the MAC algorithm (with the Authorization MAC key) to produce a 64-bit MAC, that is tacked onto the end of the message. Then the whole thing is sent to the device, which will finally start answering.

As far as I can tell, the encryption algorithm is the same for Authorization and Session — with the exception of the different seed to the key generation. It also includes a different way to pass a nonce — the session encryption includes a sequence number, on both the device and the software, which is sent in clear text and fed into the encryption (shifted left by 18 bits, don’t ask me!) In addition to the sequence number, the encrypted packets have an unencrypted MAC. This is 4 bytes, but it’s actually done with the same algorithm as the authorization. The remaining 4 bytes are just dropped on the floor.

There’s a lot more that I need to figure out in the code, because not knowing anything about cryptography (and also not being that good with Ghidra). I know that the key generation and the encryption/decryption functions are parameterized with an algorithm value, which likely corresponds to an enum from the library they used. And that the parameterized functions dispatch via 21 objects (but likely not C++ objects, as they don’t seem to use vtables!), which can either point at a common function that returns an error (pretty much “not implemented”) or to an actual, implemented function — the functions check something: the enum in the case of key creation (which is, by the way, always 9), or some attribute of the object passed in for encryption and decryption.

These are clearly coming from a library linked in statically. I can tell because the code style for these is totally different from any other part of Abbott’s code, and otherwise makes no sense. It is also possibly meant to be obfuscated, or at least made difficult to follow: it’s not always the same object out of the 21 that answers the encrypt/decrypt call for a given object, which makes it difficult to find which code is actually being executed.

I think at this point, the thing that is protecting their Libre 2 protocol the most is just the sheer amount of different styles of code in the binary: Qt, C++ with STL, C++ with C-style arrays, Windows APIs, this strange library, …

By the way, one thing that most likely would help with figuring this out would be if we could feed selected command streams into the software. While devices such as the Facedancer can help, given that most of this work is done in virtual machines, I would rather have my old idea implemented. I might look for time to work on this if I can’t find anyone interested, but if you find that this is a useful idea, I would much prefer being involved but not leading its implementation. Honestly, if I had more resources available, I would probably just pay someone to implement it, rather than buy a hardware Facedancer.

FreeStyle Libre 2: Notes From The Deep Dive

As I wrote last week, I’ve started playing with Ghidra to dive into the FreeStyle Libre 2 software, to try and figure out how to speak the encrypted protocol, which stands in the way of accessing the Libre 2 device as we already access the Libre 1.

I’m not an expert when it comes to binary reverse engineering — most of the work I’ve done around reverse engineering has been on protocols that are not otherwise encrypted. But as I said in the previous post, the binary still included a lot of debug logs. In particular, the logs included the name of the class, and the name of the method, which made it fairly easy to track down quite a bit of information on how the software works, as well as the way the protocols work.

I also got lucky and found a second implementation of their software protocol. At least a partial one. You see, there are two pieces of software that can communicate with the old Libre system: the desktop software that appears to be available in Germany, Australia, and a few other countries, and the “driver” for LibreView, a service that allows GPs, consultants, and hospitals to remotely access the blood sugar readings of their patients. (I should write about it later.) While the main app is a single, mostly statically linked Qt graphical app, the “driver” is composed of a number of DLL modules, which makes it much easier to read.

Unfortunately it does not appear to support the Libre 2 and its encryption, but it does help to figure out other details around the rest of the transport protocol, since it’s much better logged, and provides a clearer view of the function structure. It seems like the two packages actually come from the same codebase, as a number of classes share the same name between the two implementations.

The interesting part is trying to figure out what the various codenames mean. I found the names Orpheus and Apollo in the desktop app, and I assumed the former was the Libre and the latter the Libre 2, because the encryption is implemented only on the Apollo branch of the hierarchy, in particular in a class called ApolloCryptoLib. But then again, in the “driver” I found the codenames Apollo and Athena — and since the software says it supports the “Libre Pro” (which as far as I know is the US-only version that was released a few years ago), I’m wholly confused on what’s what now.

But as I said, the software does have parallel C++ class hierarchies, implementing lower-level and higher-level access controls for the two codenames. And because the logs include the class name it looks like most functions are instantiated twice (which is why I found it easier to figure out the flow for the non-crypto part from the “driver” module.) A lot of the work I’m doing appears to be manual vtable decoding, since there’s a lot of virtual methods all around.

What also became very apparent is that my hunch was right: the Libre 2 system uses basically the same higher level protocol as the Libre 1. Indeed, I can confirm not only that the text commands sent are the same (and the expected responses are the same, as well), but also that the binary protocol is parsed in the same way. So the only obstacle between glucometerutils and the Libre 2 is the encryption. Indeed, it seems like all three devices use the same protocol, which is either called Shazam, AAP or ATP — it’s not quite clear given the different set of naming conventions in the code, but it’s still pretty obvious that they share the same protocol, not just the HID transport, but also for defining higher level commands.

Now about the encryption, what I found from looking at the software is that there are two sets of keys that are used. The first is used in the “authentication” phase, which is effectively a challenge-response between the device and the software, based on the serial number of the device, and the other is used in the encrypted communication. This was fairly easy to spot, because one of the classes in the code is named ApolloCryptoLib, and it included functions with names like Encrypt, Decrypt, and GenerateKeys.

Also, one important note: the patch (sensor) serial number is not used for the encryption of the reader’s access. This is something that comes up time and time again. Indeed at least a few people have been telling me on Twitter that the Libre 2 sensors (or patches, as Abbott calls them) are also encrypted and that clearly they use the same protocol for the reader. But that’s not the case at all. Indeed, the same encryption happens when no patch was ever initialized, and the information on the patches is fetched from the reader as the last part of the initialization.

Another important piece of information that I found in the code is that the encryption uses separate keys for encryption and MAC. This means that there’s an actual encryption transport layer, similar to TLS, but not similar enough to worry me so much regarding the key material present.

With the code at hand, I also managed to confirm my original basic assumptions about the initialization using sub-commands, where the same message type is sent with follow-up bytes including information on the command. The confirmation came from a log message calling the first byte in the command… subcmd. The following diagram is my current best understanding of the initialization flow:

Initialization sequence for the FreeStyle Libre 2 encryption protocol.

Unfortunately, most of the functions that I have found related to the encryption (and to the binary protocol, at least in the standalone app) ended up being quite complicated to read. At first I thought this was a side effect of some obfuscation system, but I’m no longer sure. It might be an effect of the compile/decompile cycle, but at least on Ghidra these appear as huge switch blocks, with what is effectively a state machine jumping around, even for the most simple of the methods.

I took a function that is (hopefully) the least likely to get Abbott upset for me reposting it. It’s a simple function: it takes an integer and returns an integer. I called it int titfortat(int) because it took me a while to figure out what it was meant to do. It turns out to normalize the input to either 0, 1 or -1 — the latter being an error condition. It has an invocation of INT3 (a debugger trap), and it has the whole state machine construct I’ve seen in most of the other functions. What I found about this function is that it’s used to set a variable based on whether the generated keys are used for authentication or session.

The main blocker for me right now to figure out how the encryption is working, is that it looks like there’s an array of 21 different objects, each of which comes with what looks like a vtable, and only partially implemented. It does not conform to the way Visual C++ is building objects, so maybe it’s a static encryption library linked inside, or something different altogether. The functions I can reach from those objects are clearly cryptography-related: they include tables for SHA1 and SHA2 at least.

The way the objects are used is also a bit confusing: an initialization function appears to assign to each pointer in the array the value returned by a different function, but each of those functions appears to only return the value of a (different) global. Whenever the vtable-like structure is not fully implemented, it appears to be pointing at code that simply returns an error constant. And when the code is calling those objects, if an error is returned it skips the object and goes to the next.
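To picture the pattern, here is how I would model it in Python. This is a sketch of my understanding only: the names (`Provider`, `dispatch_encrypt`), the error constant, and the slot that happens to be implemented are all hypothetical, and the reversed-bytes "encryption" is obviously a placeholder.

```python
ERROR_NOT_IMPLEMENTED = -1  # hypothetical value for the error constant


def not_implemented(data: bytes) -> int:
    """Shared stub that every unimplemented slot points at."""
    return ERROR_NOT_IMPLEMENTED


class Provider:
    """One of the 21 objects: a table of function slots."""

    def __init__(self, name, encrypt=not_implemented):
        self.name = name
        self.encrypt = encrypt


def dispatch_encrypt(providers, data: bytes):
    """Try each object in turn, skipping those that return the error."""
    for provider in providers:
        result = provider.encrypt(data)
        if result == ERROR_NOT_IMPLEMENTED:
            continue
        return provider.name, result
    raise LookupError("no object implements encrypt")


# 21 objects, of which only one (arbitrarily, slot 13) implements encrypt.
providers = [Provider(f"slot{index}") for index in range(21)]
providers[13] = Provider("slot13", encrypt=lambda data: data[::-1])

print(dispatch_encrypt(providers, b"abc"))
```

Seen this way, finding which code actually runs means finding which slot does not point at the shared stub, for each operation separately.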

On the other hand, this exercise is giving me a lot of insight into the overall HID transport as well as the protocol inside of it. For example, I finally found the answer to which checksum the binary messages include! It’s a modified CRC32, except that it’s calculated over 4 bits at a time instead of the usual 8, and thus requires a shortened lookup table (16 entries instead of 256). If you think that this is completely pointless, I tend to agree with you. I also found that some of the sub-commands for the ATP protocol include an extra byte before the actual sub-command identifier. I’m not sure how those are interpreted yet, and it does not seem to be a checksum, as they are identical for different payloads.
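For reference, this is what a nibble-at-a-time CRC32 with a 16-entry table looks like. Note the loud caveat: this is the standard reflected CRC-32 (the one `zlib.crc32` computes), not Abbott's modified variant, whose differences are not shown here. It only illustrates the 4-bit technique.

```python
import zlib

POLYNOMIAL = 0xEDB88320  # standard reflected CRC-32 polynomial


def _make_nibble_table():
    """Build the 16-entry table: one entry per possible 4-bit value."""
    table = []
    for value in range(16):
        crc = value
        for _ in range(4):
            crc = (crc >> 1) ^ POLYNOMIAL if crc & 1 else crc >> 1
        table.append(crc)
    return table


NIBBLE_TABLE = _make_nibble_table()


def crc32_nibble(data: bytes) -> int:
    """Standard CRC-32, consuming each byte as two 4-bit halves."""
    crc = 0xFFFFFFFF
    for byte in data:
        # Low nibble first, then high nibble (reflected bit order).
        crc = NIBBLE_TABLE[(crc ^ byte) & 0xF] ^ (crc >> 4)
        crc = NIBBLE_TABLE[(crc ^ (byte >> 4)) & 0xF] ^ (crc >> 4)
    return crc ^ 0xFFFFFFFF


print(hex(crc32_nibble(b"123456789")))
print(crc32_nibble(b"123456789") == zlib.crc32(b"123456789"))
```

The only thing the smaller table buys is 960 fewer bytes of constants, at the cost of two lookups per byte.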

Anyway, this is clearly not enough information yet to proceed with implementing a driver, but it might be just enough information to start improving the support for the binary protocol (ATP) if the Libre 2 turns out not to understand the normal text commands. Which I find very unlikely, but we’ll have to see.

Leveling up my reverse engineering: time for Ghidra

In my quest to figure out how to download data from the Abbott FreeStyle Libre 2, I decided that figuring it out just by looking at the captures was a dead end. While my first few captures had me convinced the reader would keep sending the same challenge, and so I could at least replay that, it turned out to be wrong, and that excluded most of the simplest/silliest of encryption schemes.

As Pierre suggested to me on Twitter, the easiest way to find the answer would be to analyze the binary itself. This sounded like a hugely daunting task for myself, as I don’t speak fluent Intel assembly, and particularly I don’t speak fluent Windows interfaces. The closest I got to this in the past has been the reverse engineering of the Verio, in which I ran the software on top of WinDbg and discovered, to my relief, that they had kept not just some of the logging on, but also a whole lot of debug logs that made it actually fairly easy to reverse the whole protocol.

But when the options are learning enough about cryptography and cryptanalysis to break the encoding, or spending time learning how to reverse engineer a Windows binary, yeah, I thought the latter would suit me better. It’s not like I have not dabbled in reversing my own binaries, or built my own (terrible) 8086 simulator in high school, because I wanted to be able to see how the registers would be affected, and didn’t want to wait for my turn to use the clunky EEPROM system we used in the lab (this whole thing is probably a story for another day).

Also, since the last time I considered reversing a binary (when I was looking at my laptop’s keyboard), there’s a huge development: the NSA released Ghidra. For those who have not heard, Ghidra is a tool to reverse engineer binaries that includes a decompiler, and a full blown UI. And it’s open source. Given that the previous best option for this was IDA Pro, with thousands of dollars of licenses expected, this opened a huge amount of doors.

So I spent a weekend in which I had some spare time trying my best at reversing the actual code coming from the official app for the Libre 2. I’ll provide more detail on that once I understand it better, and I know which parts are critical to share, and which ones would probably get me in trouble. In general, I did manage to find out quite a bit more about the software, the devices, and the protocol, if nothing else because Abbott left a bunch of debug logging disabled, but built in. (And no, this time the WinDbg trick didn’t work, because they seem to have added an explicit anti-debugger exception, although I guess I could learn to defeat that, while I’m at it.)

Because I was at first a bit skeptical about my ability to do anything at all with this, I also have been running it in an Ubuntu VM, but honestly I’m considering switching back to my normal desktop because something on the Ubuntu default shell appears to mess with Java, and I can’t even run the VM at the right screen size. I have also considered running this in a Hyper-V virtual machine on my Gamestation, but the problem with that appears to be graphics acceleration: installing OpenSUSE onto it was very fast, but trying to use it was terribly sloppy. I guess the VM option is a bit nicer in the sense that I can just save it to power off the computer, as I did to add the second SSD to the NUC.

After spending the weekend on it, and making some kind of progress, and printing out some of the code to read it on paper in front of the TV with pen and marker, well… I think I’m liking the idea of this but it’ll take me quite a while, alone, to come up with enough of a description that it can be implemented cleanroom. I’ll share more details on that later. For the most part, I felt like I was for the first time cooking something that I’ve only seen made in the Great British Bake Off — because I kept reading the reports that other (much more experienced) people wrote and published, particularly reversing router firmwares.

I also, for once, found a good reason to use YouTube for tutorials. This video by MalwareTech (yes the guy who got arrested after shutting WannaCry down by chance) was a huge help to figure out features I didn’t even know I wanted, including the “Assume Register” option. Having someone who knows what he’s doing explore a tool I don’t know was very helpful, and indeed it felt like Adam Savage describing his workshop tools — a great way to learn about stuff you didn’t know you needed.

My hope is that by adding this tool to my toolbox (like Adam Savage indeed says in his Every Tool’s A Hammer, hugely recommended reading by the way) I’ll be able to use it not just to solve the mystery of the Libre 2’s encryption, but also that of the nebulous Libre 1 binary protocol, which I never figured out (there’s a few breadcrumbs I already found during my Ghidra weekend). And maybe even to figure out the protocol of one of the gaming mice I have at home, which I never completed either.

Of course all of this assumes I have a lot more free time than I have had for the past few years. But, you know, it’s something that I might have ideas about.

Also as a side note: while doing the work to figure out which address belongs to what, and particularly figure out the jumps through vtables and arrays of global objects (yeah that seems to be something they are doing), I found myself needing to do a lot of hexadecimal calculations. And while I can do conversions from decimal to binary in my head fairly easily, hex is a bit too much for me. I have been using the Python interactive interpreter for that, but that’s just too cumbersome. Instead, I decided to get myself a good old physical calculator — not least because the Android calculator is not able to do hex, and it seems like there’s a lack of “mid range” calculators: you get TI-80 emulators fairly easily, but most of the simplest calculators don’t have hex. Or they do, but they are terrible at it.

I looked up on Amazon for the cheapest scientific calculator that I could see the letters A-F on, and ordered a Casio fx-83GT X — that was a mistake. When it arrived, I realized that I didn’t pay attention to finding one with the hex key on it. The fx-83GT does indeed have the A-F inputs — but they are used for defining variables only, and the calculator does not appear to have any way to convert to hexadecimal nor to do hexadecimal-based operations. Oops.

Instead I ordered a Sharp WriteView EL-W531, which supports hex just fine. It has slightly smaller, but more satisfying, keys, but it’s yet another black gadget on my table (the Casio is light blue). I’ll probably end up looking out for a cute sticker to put on it to see it when I close it for storage.

And I decided to keep the Casio as well — not just because it’s handy to have a calculator at home when doing paperwork, even with all the computers around, but also because it might be interesting to see how much of the firmware is downloadable, and whether someone has tried flashing a different model’s firmware onto it, to expand its capabilities: I can’t believe the A-F keys are there just for the sake of variables, my guess is that they are there because the same board/case is used by a higher model that does support hex, and I’d expect that the only thing that makes it behave one way or the other is the firmware — or even just flags in it!

At any rate, expect more information about the Libre 2 later on this month or next. And if I decide to spend more time on the Casio as well, you’ll see the notes here on the blog. But for now I think I want to get at least some of my projects closer to completion.

USB capturing in 2020

The vast majority of the glucometer devices I reverse the protocol of use USB to connect to a computer. You could say that all of those that I successfully reversed up to now are USB based. Over the years, the way I capture USB packets to figure out a protocol changed significantly, starting from proprietary Windows-based sniffers, and more recently involving my own opensource trace tools. The process evolution was not always intentional — in the case of USBlyzer, it was pretty much dead a few years after I started using it, plus the author refused to document the file format, and by then even my sacrificial laptop was not powerful enough to keep running all the tools I needed.

I feel I’m close to another step on the evolution of my process, and once again it’s not because of me looking to improve the process as much as is the process not working on modern tools. Let me start by explaining what the situation is, because there are two nearly separate issues at play here.

The first issue is that either OpenSUSE or the kernel changed the way the debugfs is handled. For those who have not looked at this before, debugfs is what lives in /sys/kernel/debug, and provides the more modern interface for usbmon access; the old method via /dev/usbmonX is deprecated, and Wireshark will not even offer the ability to capture USB packets without debugfs. Previously, I was able to manually change the ownership of the usbmon debugfs paths to my user, and start Wireshark as my user to do the capturing, but as of January 2020, it does not seem to be possible to do that anymore: the debugfs mount is only accessible to root.

Using Wireshark as root is generally considered a really bad idea, because it has a huge attack surface, in particular when doing network captures, where the input would literally be at the discretion of external actors. It’s a teensy bit safer when capturing USB because even when the device is fairly unknown, the traffic is not as controllable, so I would have flinched, but not terribly, to use Wireshark as root, except that I can’t sudo wireshark and have it paint on X. So the remaining alternative is to use tshark, which is a terminal utility that implements the same basics as Wireshark.

Unfortunately here’s the second problem: the last time I ran a lot of captures was when I was working on the Beurer glucometer (which I still haven’t gotten back to, because Linux 5.5 is still unreleased at the time of writing, and that’s the first version that’s not going to go into a reset loop with the device), and I was doing that work from my laptop, and that’s relevant. While the laptop’s keyboard and touchpad are USB, the ports are connected to a different bus internally. Since usbmon interfaces are set by bus, that made it very handy: I only needed to capture on the “ports” bus, and no matter how much and what I typed, it wouldn’t interfere in my captures at all.

You can probably see where this is going: I’m now using a NUC on my desk, with an external keyboard and the Elecom trackball (because I did manage to hurt my wrist while working on the laptop, but that’s a story for another post). And now all the USB 2.0 ports are connected to the same bus. Capturing the bus means getting all the events for keypresses, mouse movements, and so on.

If you have some experience with tcpdump or tshark, you’d think that this is an easy problem to solve: it’s not uncommon having to capture network packets from an SSH connection, which you want to exclude from the capture itself. And the solution for that is to apply a capture filter, such as port not 22.

Unfortunately, it looks like libpcap (which means Wireshark and tshark) does not support capture filters on usbmon. The reasoning provided is that since the capture filters for network are implemented in BPF, there’s no fallback for usbmon that does not have any BPF capabilities in the kernel. I’m not sure about the decision, but there you go. You could also argue that adding BPF to usbmon would be interesting to avoid copying too much data from the kernel, but that’s not something I have particular interest in exploring right now.

So how do you handle this? The suggested option is to capture everything, then use Wireshark to select a subset of packets and save the capture again. This should allow you to have a limited capture that you can share without risking having shared a keylogger off your system. But it also made me think a bit more.
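The same filtering can be done outside Wireshark once the raw packets are in hand. A minimal sketch, assuming each packet starts with the 64-byte usbmon binary header described in the kernel documentation, and that the capture comes from a little-endian machine; the function names are my own, not usbmon-tools API.

```python
import struct

# Layout of struct usbmon_packet (64 bytes): id, type, xfer_type, epnum,
# devnum, busnum, flag_setup, flag_data, ts_sec, ts_usec, status, length,
# len_cap, setup[8], interval, start_frame, xfer_flags, ndesc.
USBMON_HEADER = struct.Struct("<QBBBBHbbqiiII8siiII")


def bus_and_device(packet: bytes):
    """Return the (bus number, device address) pair of a raw packet."""
    fields = USBMON_HEADER.unpack_from(packet)
    return fields[5], fields[4]


def keep_device(packets, busnum: int, devnum: int):
    """Keep only the packets exchanged with one device on one bus."""
    return [p for p in packets if bus_and_device(p) == (busnum, devnum)]
```

Calling `keep_device(packets, 1, 5)` would keep only the traffic for device 5 on bus 1, dropping the keyboard and trackball events before the capture is shared.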

The pcapng format, which Wireshark stores usbmon captures in, is a fairly complicated one, because it can include a lot of different protocol information, and it has multiple typed blocks to store things like hardware interface descriptions. But for USB captures, there’s not much use in the format: not only the Linux and Windows captures (the latter via usbpcap) are different formats altogether, but also the whole interface definition is, as far as I can tell, completely ignored. Instead, if you need a device descriptor, you need to scan the capture for a corresponding request (which usbmon-tools now does.)

I’m now considering just providing a simpler format to store captured data with usbmon-tools, either a simple 1:1 conversion from pcapng, with each packet just size-prefixed, and a tool to filter down the capture on the command line (because honestly, having to load Wireshark to cut down a capture is a pain), or a more complicated format that can store the descriptors separately, and maybe bundle/unbundle them across captures so that you can combine multiple fragments later. If I was in my bubble, I would be using protocol buffers, but that’s not particularly friendly to integrate in a Python module, as far as I can tell. Particularly if you want to be able to use the tools straight out of the git clone.
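The simplest of those options, a stream of size-prefixed packets, fits in a handful of lines. This is only a sketch of the idea: the 4-byte little-endian prefix and the function names are arbitrary choices of mine, not an existing usbmon-tools format.

```python
import io
import struct

LENGTH_PREFIX = struct.Struct("<I")  # arbitrary choice: 32-bit little-endian


def write_packets(stream, packets):
    """Write each packet preceded by its length."""
    for packet in packets:
        stream.write(LENGTH_PREFIX.pack(len(packet)))
        stream.write(packet)


def read_packets(stream):
    """Yield packets back from a size-prefixed stream."""
    while True:
        prefix = stream.read(LENGTH_PREFIX.size)
        if not prefix:
            return  # clean end of file
        if len(prefix) < LENGTH_PREFIX.size:
            raise ValueError("truncated length prefix")
        (length,) = LENGTH_PREFIX.unpack(prefix)
        packet = stream.read(length)
        if len(packet) != length:
            raise ValueError("truncated packet")
        yield packet


buffer = io.BytesIO()
write_packets(buffer, [b"\x01\x02\x03", b"", b"\xff" * 64])
buffer.seek(0)
restored = list(read_packets(buffer))
```

A command-line filter then becomes a matter of reading one stream, dropping the unwanted packets, and writing another.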

I guess that since I’m already using construct, I could instead design my own simplistic format. Or maybe I could just bite the bullet, use base64-encoded bytearrays, and write the whole capture session out in JSON.
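The JSON option would look something like the following sketch, using only the standard library. The document layout (a `description` string and a `packets` list) is made up for illustration.

```python
import base64
import json


def dump_session(packets, description=""):
    """Serialize raw packets as base64 strings inside a JSON document."""
    document = {
        "description": description,
        "packets": [base64.b64encode(p).decode("ascii") for p in packets],
    }
    return json.dumps(document, indent=2)


def load_session(text):
    """Recover the raw packets from a JSON capture session."""
    document = json.loads(text)
    return [base64.b64decode(entry) for entry in document["packets"]]


session_text = dump_session([b"\x00\x01", b"\xde\xad\xbe\xef"], "example")
session_packets = load_session(session_text)
```

It is wasteful (base64 inflates the data by about a third), but it needs no extra dependency and can be inspected with any text editor.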

As I said above, pcapng supports Windows and Linux captures differently: on Linux, the capture format is effectively the wire format of usbmon, while on Windows, it’s the format used by usbpcap. While I have not (yet, at the time of writing) added support to usbmon-tools to load the usbpcap captures, I don’t see why it shouldn’t work out that way. If I do manage to load usbpcap files, though, I would need a custom format to copy these to.

If anyone has suggestions, I’m open to them. One thing that I may try is to use Protocol Buffers but submit the generated source files to parse and serialize the objects.