Reverse Engineering an LG Aircon Control Panel — Fit It All Together

Since we moved to this flat, one of my wishlist items was to have a way to control the HVAC (heating, ventilation, and air conditioning) without having to get out of bed, or go to a different room. A few months ago, I started on a relatively long journey of reverse engineering the protocol that the panel used so that I could build my own controller, and I’ve been documenting pretty much every step along the way, either on Twitter, this blog, or Twitch. As I type this, while I can’t say that the project is all done and dusted, I’ve at least managed to get to the point where I’m reaping the benefits of this journey.

You may remember that at the end of the previous chapter in this saga I was looking at controlling the actual HVAC with a Python command line tool. I built that on stream, to begin with, and while I didn’t manage to make it work the way I was hoping (with a curses-based UI), I did manage to get something that worked to emulate both the panel and (to a minor extent) the HVAC itself.

I actually used this “in production” (that is, to control the air conditioning in my home office) for a little over a week. The way I was using it was through a Beagle Bone Black which I had lying around for a long while – I regret not signing up for the RISC-V preview, but honestly I wouldn’t have had time – connected through a USB-to-UART adapter (a CH340 based one, because they can actually do 104bps!), and a breadboarded TLIN1206 on a SOP-8-to-DIP adapter. A haphazard and janky setup if you’ve ever seen one, but it was controllable by phone… with JuiceSSH and Tailscale.

While absolutely not ergonomic, this setup still allowed me to gain the experience needed with the protocol, in order to send the PCB design to manufacture. I did that about a week in, and as usual, JLCPCB’s turnaround is phenomenally fast, and by the following Monday I had my boards.

The design is pretty much the same as the one I spoke about in part two, with two bus transceiver blocks, a 3.3V DC-DC converter, and the quad-NAND to make sure the code cannot send data to the bus if the physical switch is turned to “disable”.

While the design has proven it needs a little bit more work to be optimal, it’s a great starting point because it just works fine. And while I did originally plan to have this support the original panel with the switch turning on “pass-through mode”, I decided that for the moment, this is not needed, nor desired.

My original intention was to allow the physical switch to just prevent the custom controller from talking to the HVAC engine, and let the original panel take over. Unfortunately this does not work: the panel is not entirely stateful, but there is a bit in the command packet that says “Listen to me, I’m changing configuration” — and if the configuration in the panel and the HVAC don’t match when that bit is not set, an error state is introduced.

This basically means I wouldn’t be able to seamlessly switch between the old panel and my custom controller. Instead what I’m going to be doing if I need to is to first flip the switch to disable the custom logic, and then connect the panel to the secondary bus. That way it would initialize directly on the bus. If I do need to re-design this board, I’m going to make the switch more useful, and add a power disconnection so that it can be all connected without power on at all.

There was another reason why I originally planned to support the original control panel: it reports a temperature. I thought that I would be able to use that temperature as a “default” sensor, while still allowing me to change the source of the temperature at the time of configuration. But the panel’s temperature sensor is pretty much terrible, and it’s only able to measure 16°C~30°C, plus it’s easily fooled by… a lamp. So not exactly something you can call reliable for this use, and not something I would care to add to my monitoring.

To my own astonishment, my first attempt at soldering the full board was successful: two out of three of the boards work like a charm, while the third one is a bit iffy, but it might just have a less-than-perfect component on it. I’ll have to see, but also I don’t want to make a new respin now that even some of the components I need are getting harder (and more expensive) to find. Take the JST ZR connector: in the picture above you can see it’s white rather than cream. That’s because it’s not an official JST part but an AliExpress clone that fits the same footprint, and one I could actually buy.

Custom ESPHome Climate

Once I had the board, I rigged up a quick test bed on my desk with a breadboard and another CH340 adapter (I have a few around, they are fairly cheap and fairly versatile), and started off to complete the Custom Climate component.

Since I started this project, the ESPHome Climate API actually changed a couple of times, particularly as Nabu Casa is now sponsoring the project, and its development moved to a monthly release kind of deal. Somehow the Climate components were among those that most required work.

But the end result is that the API was clearer, and actually easier to implement, by the time I had this ready. So I wrote down the code to generate the six-byte packets I needed to send, and ran it against the emulator… and it seemed to work fine. I had wired this with a test component in Home Assistant and I could easily change the mode, the temperature, and everything else just fine, and I could see in my emulator that it was getting the right data in just fine.

At that point I was electrified, and I thought it would just be a matter of putting it on the wall and seeing it work. You can imagine my disappointment when I called in my wife to assist me in my victory… and it didn’t work. And I was ready to detach all of it to spend the next day debugging what was going on, until I realized that I forgot to send the “settings changed” flag. I have to say that the protocol turns out to be a bit more complex than I expected it to be, and I should probably write a bit more documentation about it, not just scatter it around the (now multiple) implementations.

After that, I actually went ahead and replaced all three of the control panels with my custom ones, and connected them to Home Assistant. That turned out to be a much easier prospect than anything else we have been trying up to now: we can decide which temperature reading to use to control the room, rather than being stuck with the silly temperature sensor in the panel, and we can use the two-point set temperature in the way most modern smart thermostats can (which the panel didn’t support, despite having an “auto mode” that would turn on cooling or heating “as needed”).

The first couple of days led to a few adjustments being necessary, including implementing a feature that my wife requested: when not cooling or heating, the original panel would enter “fan only” mode. Which I enjoy for myself in the office, but bothers my wife. The original panel does not have an option to turn the fan off, but I could implement that in the custom controller. This allows us to keep our thermostat on the heat-cool mode most of the time, and just make sure the range is what we actually care for.

I also made the mistake at first of not counting on hysteresis. That turned out to be a bit more annoying to implement, not in the matter of code, but in the matter of the logic behind it — but it should now be working: it means that there is more friction to change the state of the air conditioning, which means the temperature is not as constant, but it should be significantly easier to run. To be honest I was impressed by how stable the temperature was when I left it to short cycle…
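To make the hysteresis idea concrete, here is a minimal sketch in Python of a two-point (heat-cool) thermostat with a dead band. It is only an illustration and not the code running on the boards: the class name, the `hysteresis` parameter, and the 0.5°C default are all assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IDLE = "idle"
    HEATING = "heating"
    COOLING = "cooling"


@dataclass
class TwoPointThermostat:
    """Hypothetical heat-cool thermostat with hysteresis (illustration only)."""

    low: float                # lower bound of the comfort range, in °C
    high: float               # upper bound of the comfort range, in °C
    hysteresis: float = 0.5   # assumed dead band around each bound, in °C
    action: Action = Action.IDLE

    def update(self, temperature: float) -> Action:
        if self.action is Action.IDLE:
            # Only start once the reading is clearly outside the range.
            if temperature > self.high + self.hysteresis:
                self.action = Action.COOLING
            elif temperature < self.low - self.hysteresis:
                self.action = Action.HEATING
        elif self.action is Action.COOLING:
            # Keep cooling until the reading is clearly back inside the range.
            if temperature <= self.high - self.hysteresis:
                self.action = Action.IDLE
        elif self.action is Action.HEATING:
            # Keep heating until the reading is clearly back inside the range.
            if temperature >= self.low + self.hysteresis:
                self.action = Action.IDLE
        return self.action


thermostat = TwoPointThermostat(low=20.0, high=25.0)
for reading in (25.2, 25.6, 25.0, 24.5):
    print(reading, thermostat.update(reading).value)
```

The gap between the start and stop thresholds is what adds the “friction”: a reading hovering right around the upper bound no longer flips the unit on and off at every update, at the cost of letting the temperature drift a little further before anything happens.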

Home Assistant Integration

This was probably the simplest part of the whole work! Nabu Casa is doing an awesome job at keeping the two projects integrated very well, and with the help of “packages” configuration, replicating the configuration for the three separate boards took basically no time.

The only problem I had was that I couldn’t seem to be able to flash the first ESPHome firmware onto my ESP32 devkits using the WebSerial support. I have used it multiple times in the past, particularly to update my BLE Bridge, which I just need to connect to my main workstation rather than the standalone power supply, to upgrade, but for the pristine ESP32 devkits it didn’t work out quite as well.

The UI is very similar to the one that Google Home exposes for Nest thermostats where they do support air conditioning. And indeed, with the addition of the Home Assistant Cloud service, the same UI shows up for these thermostats.

And at that point it was just a matter of configuring the expected range of operation, both for the “daytime” and for the “night scene”. Which is one of the reasons why I wanted to have thermostats that we can control with Home Assistant.

You see, some time ago I set up so that we have a routine phrase (“To the bedroom”) and Flic buttons in both the living room and the bedroom, that prepare us to get to bed: turn off the TV, subwoofer, air fresheners, lights everywhere except bedroom and bathroom, set the volume of the bedroom speaker for the relaxing night sound, and so on.

Recently, thanks to a new Dyson integration, it has also been setting the humidifier to raise the humidity in the bedroom (it gets way too dry!), and turning on the night mode on the other purifiers, which has been a great way not just to make it easier not to forget things, but also to save us from leaving the humidifier running 24/7: it’s easier for us to keep it running overnight at a higher humidity, than trying to keep it up during the whole day.

Now, with the climate controls in place, we can also change the temperatures before going to bed, rather than turning it off, which is the only option we had before. And this is a big deal, because particularly for the living room, we don’t want it to get too scorching hot even if we’re not there: it’s where the food cupboards are, among other things, and during the heatwave we exceeded 30°C for a couple of hours every other day. Being able to select different ranges while we’re still sleeping gives us a bit more safety, without having to keep running the air conditioning overnight.

Features And Remote Control

Similarly, our scheduled morning routine, configured to go off together with the alarm clock, can scale back the range in my home office to something suitable for my work day (it can get warm fast with computers and stuff running, and the door closed for meetings), so I don’t have to over-run the office in the morning when having my first meeting: it starts automatically, and it welcomes me with a normal, working temperature for my first meeting.

The final point is that, since we can actually set a very wide range on the thermostats, and rely on much more accurate thermometers that are not restricted to the 16°C to 30°C range, we can leave the thermostat on, with a very wide range, when we’re not actually going to be at home. This is particularly interesting now that we might be able to travel again, at least to see families and friends: we don’t want to leave the air conditioning or heating running all the time, but we also want to have some safeguards against the temperature dropping or rising out of control. This became very clear when the CGG1 in our winter garden hit the 50°C ceiling it could report while we were out playing Pokémon Go and we couldn’t turn on the air conditioning at all. Thankfully the worst result of that was that the body of one of the fake candles we have in the winter garden for ambiance melted… and now it looks even more realistic as a candle (it also damaged the moving flame part, but who cares.)

Now, the insulation of this flat is not great. In particular the home office tends to drop down close to freezing temperatures in the winter, because the window does not seal even when closed. This means I can’t easily follow Alec’s suggestion on energy storage. But having a bit more control on automation does make it easier to keep the temperature in the flat more stable. In the winter, I expect we’ll make sure that we keep a minimum temperature overnight to avoid having to force a much higher differential when we wake up, for instance.

There’s a few features that I have not yet implemented, and that I definitely should look into implementing soon. To start with, as soon as summer gives way to autumn, we’re likely going to want to be using the dehumidifier more, without turning on the heat pump, particularly in the living room as we cook and eat. Since the CGG1 provides humidity readings together with temperature, it means I can set up an automation that, if nothing else is running, turns the dehumidifier on if the humidity reaches a certain set point, for instance.
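The decision behind such an automation could look roughly like the Python sketch below. Everything in it is hypothetical: the function name, the `"idle"` action string, the 60% set point, and the 5% margin used to avoid toggling right at the threshold are assumptions for illustration, not part of any existing configuration.

```python
def should_dehumidify(
    humidity: float,
    hvac_action: str,
    currently_on: bool,
    set_point: float = 60.0,  # assumed relative humidity threshold, in %
    margin: float = 5.0,      # assumed margin before turning back off, in %
) -> bool:
    """Decide whether the dehumidifier should run (illustrative sketch)."""
    # Only step in if nothing else (heating, cooling, ...) is running.
    if hvac_action != "idle":
        return False
    if currently_on:
        # Keep running until the humidity is comfortably below the set point.
        return humidity > set_point - margin
    # Otherwise start only once the humidity reaches the set point.
    return humidity >= set_point


print(should_dehumidify(65.0, "idle", currently_on=False))
print(should_dehumidify(65.0, "cooling", currently_on=False))
```

In a real setup the humidity would come from the CGG1 reading and the result would drive the dehumidifier mode on the climate entity; the function only captures the “if nothing else is running” condition described above.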

There’s also two switches that I have not implemented yet, but should probably do, soon. One turns on resistive heating – and this will make sense again if you watch Alec’s video on heat pumps – while the other has to do with the Plasma Filter.

What’s a plasma filter? That’s a good question, and one that I’m not sure I have the right answer for. I know that this is something that the original control panel suggests is present in our HVAC, although I have no way to know for sure (we don’t have the manual of the actual engines). The manual for the PQRCUDS0 says that «you can use the plasma function» but also states «If the product is not compatible with the Plasma function, it will not do the Plasma function even though the indicator is turned on.» This suggests that unlike other features like the swirl/swing, it’s not part of the feature query that the panel sends at turn on.

When googling further to look for information about LG’s plasma filter, I did find another manual, for an actual unit, rather than the control panel. Not the unit we’ve got, I think, but at least a unit. And this one has a description for the plasma functionality:

Plasma filter is a technology developed by LG to get rid of microscopic contaminants in the intake air by generating a plasma of high charge electrons. This plasma kills and destroys the contaminants completely to provide clean and hygienic air.

This is quite interesting, and next to this, a video from LG refers to this as a “Plasma/Ionizer”, which pretty much suggested to me that this is one of Big Clive’s favourite toys: ozone generators. Which makes sense given that one of his favourites is a Sharp Plasmacluster.

Code And Next Steps

First of all, the code and the board designs are all available on my GitHub. I originally considered making this a normal component for ESPHome, but since it relies on a very custom board, it doesn’t feel like the right thing to do. It does mean I need to manually keep in mind all the various changes in APIs, but hopefully that will not take too much of my time.

As I said previously, I have not actually implemented the panel bus handler — the panel will enter into error mode if it does not get an expected reply from the HVAC engine, so connecting it right now would not work at all, except if you were to disable the actual ESP32 control. I’m likely going to leverage that behaviour to test some more error handling in the future.

I would like to put a box around the board: right now it’s literally stuck to the wall with some adhesive feet, in all three rooms. And while the fixed red LED is not too annoying overnight, it is noticeable if you wake up in the middle of the night. My original idea was to find someone who can help me 3D print a box that fits on the same posts the panel fits, and provide a similar set of posts for the original panel. But it also involved me finding a way to flip the switch without taking it off the wall.

But since figuring out 3D printing with no experience is going to take a lot of investment, I am not going to take a look at that option until I’m absolutely certain I’m not changing anything in the design. And I know I would want to change the design a little bit.

First of all, I want to have a physical cut-off for the connection — since the power to run the ESP32 module comes from the same cable controlling the HVAC, the only way to turn off the power is to disconnect the cable, right now. Having a physical switch that just disconnects the power and data would just make it easier to, say, replace the devkit module.

Similarly, as I said, the panel is not that useful to keep running all the time. So instead I would change the switch implementation to keep the panel off most of the time, and only power it on when disabling the ESP32. This would also save some components, since there would be no need to have the second bus transceiver and its passives.

I’m also wondering if it would make sense to have some physical feedback and access, in addition to the Home Assistant integration and the voice assistant controls: in particular I’m considering having an RGB LED on the board to tell the current action being taken (optionally, I wouldn’t want to have that in the bedroom, as it would be way too bright) together with a button to at least turn the HVAC “soft off”.

Finally, there’s a couple of optimizations that could be done to make the board a bit cheaper. One of the capacitors is ceramic, but could be replaced with a polymer one for ⅓ of the price, and the TVS diode pair (which was actually a legacy part of the design, recommended by the MCP2021, but not in the reference designs for the TLIN1027) could be replaced with a single integrated TVS diode; it would just be “a bit” harder to solder by hand in TO-236 packages.

These are all minor though — the main cost behind the board is actually the ESP32-DEVKIT-32D that it’s designed around. It would be much cheaper to use only the ESP32 WROOM module, and not have the USB support components on the board. But I have had bad experiences with trying to integrate that in my designs, so I’m feeling a bit sceptical of going down that route — it also would mean that a botched board sacrifices the whole module (I did sacrifice two or three of those already) unless you get very good at desoldering them (or you have a desoldering station).

So most likely it will take me a few months before I actually get to the point of trying to build a 3D printed cover for it. With a bit of luck by then we’ll be back in an office at least part of the time, and I can get someone to teach me how to use the 3D printer there.

Also as a final note: the final BOM for the boards suggests that building one of them costs around £25 or so, without the case. As I said there’s a few cost saving measures that I could take for the next round, though it’s questionable, because it would require me to get more components I don’t currently have. Of course the actual cost of building three of them turned out to be… significantly higher.

I think this is the part that is sometimes hard to explain to people who have not had this type of experience: the BOM costs are only one of the problems you need to solve. You can really screw up a project by choosing the wrong components and bringing the BOM price too high… but a low BOM cost does not make for a cheap project to finish, particularly when you’re developing it from nowhere.

In this case, if I wanted to tally up the cost of building these custom thermostat panels, I would have to, at the very least, count the multiple orders from DigiKey in which I ordered the wrong components (like the two failed attempts at using Microchip’s LIN bus transceivers rather than the TLIN, or the discrete UARTs with their own set of passives). But then there’s the cost of having all of the various tools that were needed to get this all done. Thankfully most of those tools have been used (and sometimes abused) for different projects, but they are a good metaphor for the cost of R&D that many products need, so it makes sense that what you end up buying costs more than the “simple” expense of the BOM. So keep that in mind next time you see an open source hardware device costing more than what you expect it to.

Burning Out From «Always Be Helpin’»

There is a phrase that I heard more people using than I would have liked, in my life: Always be hustlin’, which is often connected with Uber, though I highly doubt they are the source of it in the first place. I don’t like the phrase nor the spirit it embodies because I met those people. The guys who kept pestering me for years to build their dream site, game, app, always with a catch to make money out of it by hooking people in. For the most part these would be badly thought out clones stealing ideas from one or two successful existing examples, but who am I to go and keep score.

I prefer a variant of it which would be «Always Be Helpin’», inspired in part by Susan Calman’s talk of kindness. The way I tend to express this is: if helping someone does not cause me to lose time, energy, or money, helping them is close to a duty. Unfortunately, it turns out this is not as easy as I thought when I expressed it.

I have been applying this particular rule both at work, for which I introspected on the role of senior engineers a couple of times, and in personal time, by trying to keep all of my work “visible”, either as blog posts or as Twitch streams. I’ve been trying to formulate ideas in a way that is more easily understood by newcomers (including my wife who’s been learning Python for the past couple of years and can now follow at least some of my projects), and I even ended up buying stands so that I could more easily stream the hardware implementation of my aircon controller.

But, it turns out, this is leading me to burn out, in both environments, and I seriously need to scale this back.

At work things are actually quite good, but I did sign up to host an intern and, though I’m getting a lot of satisfaction from the experience overall, it reminded me why I don’t want to be a manager. It’s too much pressure for me, and it ends up draining my energy with the worry of having someone’s career in my hands. And that is probably the hardest problem in my role: to maintain the expectations of my role, I need to maintain a healthy impact on “people and org”, and that’s something that taxes me more than the technical requirements of the role.

This is something I struggle with. It makes total sense for a senior engineering role to have an impact on the work of other people, and I have been doing my best over time to “scale myself up” as they say, but it’s not something that comes naturally to me and it takes energy. At the same time, I’m not the perfect code monkey that can live forever off that, so I really need to find a way that works for me without completely burning out.

In personal time, I found myself trying to find time that works to go ahead and work on stuff “on stream” and explain it there so that random viewers can figure out what I’m doing. But that does not really work that well. If I don’t schedule it ahead of time, nobody turns up, but scheduling it ahead of time adds to the pressure of not being able to do a number of other things I may have to do.

And the “sizeable chunk of uninterrupted time” that the streaming has been using would be the same reserve I would have for relaxing: after work, before or after dinner (depending on how early it ends up being), and not when we’re out with friends. Since now going out with friends is again an option, at least for some friends who are double-jabbed, this means there’s a lot fewer “slots” for me to stream in. I’ve been pretty much forcing myself to find that time, but this is now getting in the way not just of my own relaxation, but also of family and social life.

When I started the streams, it was because I wanted to figure out how to convey information over video calls, since that has become the norm for work related content as well. And I think I did get a bit more practical at that than I used to. But at the same time, I realize that it’s not something that can be done without a significant investment in the right tools for the job.

I ended up getting the microphone and enough mounts that allow me to keep my already owned camera (and lens) in a position that is useful, and I got the light, which admittedly I can use for a number of other things (and I already used twice for completely different purposes). But then again I can see that to get something of a decent enough quality to make sense, you do end up needing preset selection buttons (Streamdeck or the like), and if you want to have a bit more audience, you end up having to pay one of the providers to get you restreaming options (or technically I could do it from home, if I could be bothered with setting up a restreaming server here, since Hyperoptic does provide 1Gbps service, but it’ll be even more work for me to do).

And that is without considering the fact that, even though I have now the videos posted on YouTube for the future, I do not have captions on them. And I have personal experience reminding me that this is not a good way to have content up, but it’s also not something I can do myself, so I would have to pay someone to do it… and it’s not a lot of money, but it’s more money than I wanted to invest just to randomly try stuff out and maybe explain something to the wide world.

Money investment is one thing, but time investment is another. At some point, I have been considering how to make it easier to explain some concepts that need a bit more dynamic points of view. For that I was considering learning Synfig, which looks like a pretty powerful tool to explain. But it turns out this is not the type of tool I want in my toolbelt at the moment: it’s not an obvious tool unless you know enough of the concepts around animations, and while it would probably be extremely impactful if I was trying to focus on teaching… that’s not my job, not my role, not what I can have most impact with.

So what is it that I can have the most impact with right now? I have no idea! For sure, I need to recover from this spell of burnout. It means that I’ll probably not be streaming any of my work for the foreseeable future. And while I have not settled on taking a break from blogging, it might be a bit more sporadic, skipping some of the Tuesdays at least until the internship is finished and a few other things fall into place.

For now, I won’t be waiting for streams to work on the air conditioner anymore, and in particular as I finish this blog post I just assembled one of the new boards, which hopefully will turn into a working air conditioning control board this very week. But hopefully this means you’ll see more finalized code and designs being posted, soon.

And, honestly, I’m starting to see why the various social network signals (likes, subscribes, comments, …) are important to creators. It definitely would feel less taxing to me if I knew that the stuff I’m doing is actually helping people rather than shouting into the void and hoping someone is actually listening. So, if you enjoyed my streams and/or blog posts on various topics, while I’m on a break, please share your favourite posts or videos with others, and comment to state whether you agree or disagree; in general, if you care about the content, show it. Because it just feels so draining to put effort into things people don’t seem to notice.

If You Party, Think of The Non-Drinkers

This post is scheduled to be published the day after the UK (where I live) is “un-locking” by removing pretty much all of the Covid restrictions. This means that, even more than before, people will want to party. Not that the full unlock was necessary for that, as a number of us who already got both doses of vaccine have been meeting up, following mask and distance protocols, in the past few weeks.

What will likely change is that work outings, parties, and general meetups (tech and not) will begin again, at least until we get to a point where it’s too scary to keep open, and we’ll bounce back into a partial lockdown (yes I’m not particularly optimistic about the whole situation). And with that comes the usual equivalence of tech with beer. And while Kara’s Model View Culture article is still my favourite pointer to give people to make it clear that alcohol preferences are a diversity dimension, I thought I could make a few more explicit points from the point of view of a non-drinker that is otherwise playing at the lowest difficulty setting.

First of all, let’s clarify one thing: I’m not arguing that those who like to drink alcohol at parties should not be allowed to. While I’m uneasy in events where alcohol is flowing too easily, I’m not going to say that my preference should override that of other people. My wife enjoys wine, spirits, and cocktails, and we have fun out together and with friends without problems. It just means that we choose where to go with a little more attention to the menus and the options provided.

The other thing to talk about, before I go into the nitty gritty details, is that there are many different reasons why people don’t drink, and some of the people will be more open than others to discuss them, even when they are the same reasons. I don’t drink for health reasons, but there are people who don’t drink because they don’t like the taste, people who don’t drink because of their religion or philosophy, and there’s also people that may not be drinking for the time being, and that again can be for a multitude of reasons: driving and pregnancy are two of the often listed reasons, but there’s a multitude of other reasons, that can include things as normal as not feeling like it. You don’t get to know why people are not drinking, and you shouldn’t insist on asking a group if they are all cool with alcoholic drinks being the only thing on the menu.

Soft Drinks, Softer Drinks, Softish Drinks

In my experience, a number of drinkers think that the world of soft drinks is boring and thus as long as some drink is available that doesn’t have alcohol, they think everything is done. This is neither true from the point of view of requirements nor from the point of view of preference. Which usually means that, if you only provide one type of soft drink, you’re likely to still disappoint some people.

For instance, most of the readers of this blog know that I’m diabetic, so I generally avoid sugary drinks (but can still have one or two a day without too much risk depending on the drink, the sugar, the meals, and the physical activity), but I also avoid drinks with ginger because it tends to over-exert my pancreas, and just makes me feel sick afterwards.

What follows is going to make it sound fairly negative, and complicated, as I’m going to shoot down every single common answer to the soft drinks choices made available to non-drinkers. This is not to make us sound extremely picky, but to illustrate that soft drinks can, and should in my opinion, take more space on a menu than the small corner next to the kids’ options.

Let’s start with a classic: non-alcoholic beer, the common choice for drinkers that need to drive. I’ve actually had that suggested to me a few times, and while I did use to drink Bavaria-branded non-alcoholic beer, that was when not drinking was a personal choice, rather than a survival matter. But it’s generally not a great idea: beside the fact that a lot of non-drinkers would probably not like the taste, way too many beers are sold out there calling themselves non-alcoholic, but having 0.5% ABV (Alcohol-by-volume) which makes them… pretty much the same as normal beer, from many points of view, both philosophical and physiological.

The next, near ubiquitous option is colas – whether brand name like Coca-Cola or Pepsi, store brand, or independent – the vast majority of these being fizzy, sugary, and caffeinated. I’m usually okay with these, as long as they are sugar-free, since as most of the readers of this blog know, I’m diabetic. I can have one glass of sugary Coke if my blood sugar level is falling rapidly due to exercise, but I’m unlikely to get a second. But I know a number of people who can’t, or don’t want to, have caffeine. Again, this is both philosophical and physiological: some people are affected more than others, and while I survive on caffeine, other people would have strong negative reactions to it.

To wrap together another point about fizzy drinks: they don’t go well for everyone. Whether it’s a problem with gassiness, with teeth, or anything else, it’s always nice to have a flat option of some kind. I mean, would you trust a restaurant that only had sparkling water, and no tap water option?

Speaking of fizzy things called beer but that are not alcoholic, ginger beer and root beer are not as common here in the UK as they would be in the US and Canada, particularly the latter. Unfortunately for me, ginger beer (or ginger ale, or most things with ginger) is not a great idea: it looks like ginger stimulates the pancreas and it makes me feel just ill.

Fruit juices are often a safer option, although again you should be careful with sugars, and there are a few combinations of juices that tend to be unsafe for some people. For instance, strawberry allergy is not uncommon, but since it’s usually a nuisance rather than life-threatening, it’s not usually listed as an allergen. It definitely wouldn’t be a good time for someone who would get a reaction to it, though.

Another interesting option is milkshakes and smoothies, which tend to be something different, but also include many of the already noted risks (sugar, allergies). Plus, they are often dairy based, which means they come with their own limitations for those who are allergic, or vegan. Despite me not being one, and having heard all of the jokes, this is still an important diversity dimension.

My personal favourite, honestly, is mocktails. The problem with them is that they are not usually around in the same place you organize a “beer party”. It might sound fancy, but I want to say that for the most part this is my usual fare at hotels, and not just when I’m actually staying in one. They still include some of the issues noted before: sometimes dairy, often sugary, and in certain cases a risk for allergies, but with enough choice, these can be overcome.

If we do not restrict ourselves to hot, thirsty weather, we should also consider hot drinks such as coffee, tea, and various types of herbal teas. For conferences, these are pretty much the standard fare for non-alcoholic drinks. The latter in particular, since they require making available hot water, and a selection of teabags, or variation thereof. Remember though, that as I said above not everyone will be happy with caffeine, so it’s a good idea to have something that is caffeine free available, even if it’s just camomile tea (which I hate, to be quite clear, but that’s me.)

And to finish the list off with the worst option you can give non-drinkers: spa water. If you think you’re giving plenty of options for non-drinkers by providing multiple “flavours” of spa water (that is, water infused with fruits and vegetables), then please go back to read this whole post from the beginning, and just don’t do it. As a non-drinker who has been told before that four flavours of spa water was a “selection”, I want to say that this adds insult to injury: spa water is pretty much “less fresh water”, as it ends up just sitting in a container, often under lights making it warmer than room temperature. Just don’t count on this as an option.

Again, going through all the various options is not to make non-drinkers appear picky and choosy. It’s to remind you that there is no one magical option that everyone will be happy about. If you intend to have an inclusive event (which I’d consider a requirement, if you’re organizing a work event), then you should consider a selection of non-alcoholic beverages to be available.

Event Organization Is Hard

I’m not good at organizing events myself, which is why I try not to be the one doing it, as my preferred place would be a bookstore, ignoring the chit-chatting and rather exploring books and tastes. Actually, now I think I would suggest that as a possible activity one day — I wonder if it would be possible to book Waterstones Piccadilly for an event, and just have people browsing through and convincing other colleagues to read their favourite books.

So for the most part, events that I have attended have been organized by other people, more often than not by the administrative business partners of the org I was working in. And they were almost always perfect organizers, with… a few exceptions. I want to talk about the exceptions as pitfalls, not to blame the organizers of said events, but to show possible mistakes that I’ve seen happening, and that should really be avoided.

Before going into the problems though, let me give a huge shout out and a hug to the one organizer who tried turning the stereotype on its head and organized a “dry” holiday party for once, ending up on the receiving end of complaints, rants, and abuse. Folks, if you really can’t imagine spending an evening with your colleagues while all of you are sober at a party, then think for a moment about why your non-drinking colleagues would want to spend time at a party while they are sober and you are not! If you think this is a perfectly reasonable state of affairs, we should have a talk, because it really isn’t.

The first common answer that I got a few times in the past is that alcohol is brought in, but the fridges at the office are stocked up with soft drinks, so “just grab one on your way to the event”. This makes us non-drinkers feel less important, because it means that we’re not worth the extra effort to bring something special in, while the rest of the folks are getting something that is indeed not readily available at any moment. It waters down the event for the non-drinkers, and makes us feel unwelcome. But even more so, it makes us not even worth the effort to put our drinks next to the others: we need to bring our own.

If you think I’m being a bit harsh here, let me give some more context: in particular in bigger companies, event organization and catering are contracted out to vendors. Often, those vendors are different from those that stock the fridges in snack areas and similar. To run an event, the organizers are constrained by a budget: any drink (alcoholic and not) ordered from the “event catering vendor” is charged against that budget, but the drinks in the fridge are not accounted against the event, as they are part of the regular budget for the office.

This means that, for an organizer wanting to maximise the booze available, asking the non-drinkers to just pick up drinks from the regular fridges means having more budget available for the drinkers, practically subsidizing the good time for the drinkers. Now, you could argue that it doesn’t matter since it’s company money, but let’s remember that for work events, this is the same type of policy that reminds you not to take bribes or misuse funds — fair is fair, right?

I do have a suggestion for this, which is to make sure that in the catering ordering process (which exists, either as a third-party software, or a set of forms that need to be filled in — every big company would have event organization forms!) there is no space for an order to be made of alcoholic drinks only. That is, ordering 𝑥 servings of alcohol should require ordering 𝑎𝑥 soft drinks — either automatically, or by rejecting the form until corrected.

Speaking of company policies, sometimes they have alcoholic drink rules at work-sponsored events, such as a limit of two drinks per event. Enforcement of these rules is… varying, in my experience. Fairness (again that word) would want the rules to apply to everyone, but I have experienced different approaches from organizers, some of which only paid lip service to the rules by ordering two drinks per invited person (which makes the presence of non-drinkers a useful way to let people breach the rule), though most of the time this is implemented by providing tokens to exchange for drinks.

Drink token implementations also vary. The “best” implementation I can think of was when soft drinks were unlimited, with the tokens being used to get the alcoholic drinks. Unfortunately this has limits as well, particularly if the needed amount of soft drinks was underestimated. Plus in one case I found myself being scoffed at by an organizer for not having given my drink tokens to them or someone else for extra drinks, since I wouldn’t drink. To balance the scale, let me say that I have more experiences with good organizers that were happy to take the tokens back off me to avoid misuse. Kudos to them.

Where things go very wrong is where beers are free (with a token) but soft drinks are not. This only happened to me once, and hopefully won’t ever happen again. It’s more common when the event is hosted in a catered facility such as a restaurant or a bar, since it would be really strange for an in-office event to even allow you to pay for anything.

Another similar situation, that has less of an effect on non-drinkers – except for the part where it creates a two-tier system of which company policies are enforced and which ones a blind eye is turned to – is the one in which an event is held at a resort or other similar venue, and beers require a token (limited by company policy) but cocktails and spirits can be purchased separately at the venue. Not my problem, but keep an eye out for those situations.

Finally, let me say that non-drinkers are not monsters. We understand that different people perceive drinking and having fun differently. I personally wouldn’t go and make character judgements on organizers for falling into one of the mistakes I listed here, particularly at their first event, or the first event with a new group. But please own the mistakes, and don’t hide them with lies.

One of the worst interpersonal experiences in my career has been related to the lack of soft drinks at a company event. When I raised the issue, trying to understand if there’s any process failsafe that could be added to avoid this (such as the ordering limits I wrote about earlier), I was told that the order was done correctly, but the caterer “forgot to unload the soft drinks”. Well, it happens, right? The problem was when a month or so later, at the following event, the selection of soft drinks came to be four types of spa water (see above) and a half-tray of smoothies… and this was after an explicit reminder of what happened at the previous event. Difficult to trust the organisers after that.

Drinkers Come From Bars, Everyone Else, Venues

As I said above, I’m not usually in the business of organizing large events. But at least when it comes to my small social circle, I have organized a number of dinners, so I can at least give a few pointers about what may make it smoother to make non-drinkers feel more included in a social event.

The first thing I would suggest anyone looking to organize even a small, inclusive event is to check out what the menu of the venue (restaurant, pub, bar) is, and whether they have a drinks menu with more than three items in the non-alcoholic section. I consider it a red flag if the only mention of soft drinks is in the kids’ menu, because that would just be silly. Do note, though, that pubs and bars in particular tend not to have a browseable drinks menu because most of their alcoholic choice is seasonal. A quick call to the place, or dropping in, would be useful to confirm in those cases.

What I find confuses people quite a bit is that sometimes medium-sized chains are better at that than both small independent venues and larger chains. This is understandable when you consider their scale, but it left a number of people confused that I would rather go to a Brewdog pub than to a local. In my experience every Brewdog always had at least a handful of soft drink options in addition to the usual Coca-Cola or Pepsi varieties, while locals tend to be happy with the supply of soft drink kegs they receive on subscription from their distributor. But at the same time bigger chains (think McDonald’s and Burger King) also have no interest in stocking small-batch independent soft drinks, so you’ll end up with the same, boring selection of drinks. If you’re lucky, at least in North America, you may find a Coca-Cola Freestyle machine (or the equivalent Pepsi), that at least allows you to get a Coke Zero with lime.

If you go to a “craft pub”, which is very common in the UK and Ireland in my experience, you may also need to pay some attention to what is on the food menu. A lot of pubs are very proud of their beer-based sauces and beer-marinated meats, so make sure that you don’t end up at a place where the only safe option is french fries, which I have been to before. I personally look for places that serve at least partial halal food, since that requires the food to be cooked without alcohol, which makes it safe for me, but do remember that just because something is okay for one set of people doesn’t mean that it’s okay for everyone else.

Conferences are often organized in hotels, many of which are designed for it, either by being or being attached to a convention center. In my experience, any hotel with a bar, above the “budget level”, tends to have a long menu of alcoholic drinks but also a page or two of soft drinks and mocktails. In particular because it doesn’t take that many more ingredients to make a Virgin Mary, if you’re already serving a Bloody Mary. And did you know that the bars in most hotels don’t require you to be staying at the hotel itself to serve you? I used to grab drinks with friends at the St Pancras Renaissance, which was next door to my office. Not only did it serve a great virgin mojito, but it always had a few pages of mocktails available.

The only thing I would ask everyone is to be also mindful of pricing. In the tech field, price sensitivity is often ignored, because a bar bill tends to be “a rounding error” for the most part. But when you hear drinkers complaining about their pint (568 ml) coming to over £5, you may feel less pressed to join them to grab a small Coke (250 ml) for £3. It’s obvious, when you think about it, that particularly for alternative, small-batch and independent soft drinks the costs are higher. But you should keep in mind that if the non-alcoholic options are there, but cost more than the beer, the non-drinkers are unlikely to feel welcome.

To put some context on this last problem: many venues in the UK offer a “two for one” deal on cocktails, either during happy hours, dinner time, or even constantly. But only a few of them extend the same deal to mocktails and other soft drinks. This means that when you look at the menu without the offer in place, the soft drink may be cheaper than alcohol… but if you factor in the offer, it no longer is. This is not a great problem for the non-drinkers (after all, it literally just means the cost is the same for them, offer or not) but fairness comes back into consideration, and we would feel more welcome in a place that extends the deal to our soft drinks of choice. To put some names to these issues, Brewdog runs similar 2×1 offers during happy hours, but doesn’t extend them to soft drinks, while Las Iguanas does, and they even have a long list of non-alcoholic drinks and mocktails (huge kudos to them!)

At this point you may have noticed a pattern: with the exception of coffee and tea houses, which for the most part don’t serve alcohol (the only exception I know for sure is Notes that serves great coffee, as well as wine), you end up having to raise the bar (pun not intended) in the venue selection to provide enough options to non-drinkers, at least in the UK. This is unlikely to change, at least until we socialize more the idea that soft drinks can be as interesting, and as varied, as alcoholic ones. Given the amount of ads I keep receiving for “hard lemonade” and similar drinks (so much for ads knowing “too much”), marketed towards a “macho” culture where drinking is the only way to have fun… I don’t expect much.

Personally, what I miss in London is a place as interesting as Dernier Bar Avant La Fin Du Monde in Paris (and now, it seems, Lille): their menu is filled with both alcoholic and non-alcoholic options, that cover pretty much any variation of preference or requirement. By feeling of geekiness, the closest places I know here are The Ludoquist in Croydon and Library Pot in Richmond — but while both of them have a selection of Fentimans Botanic Brews, they end up costing more than the beers!

For the time being, one of the best places – in terms of selection of non-alcoholic Mocktails – is Luna Gin Bar in North London. Not a cheap place, let’s be clear, but it has something different, and the Atomic Cat is quite something.

Options Exist, If You Look For Them

As I said, there are plenty of options to choose from besides the “usual suspects” (Coke, Pepsi, 7up, Sprite, …). I think this is important to say because, when I previously complained about work events not providing any interesting non-alcoholic drink, I was asked to provide examples. So I came up with a list of them, never to be heard of again.

So let’s start with the one I already named: Fentimans Botanic Brews. Fentimans makes and markets a number of different soft drinks including various types of tonic waters, lemonades, colas, and even ginger beer. As noted above, these are often the choice of more indie venues here in London, in my experience. While ordering direct provides a significant choice, bars and cafes only appear to keep a limited selection of them. They also have some surprising combinations, as their Victorian Lemonade appears to have a significant (to me) amount of ginger — unfortunately their store pages don’t list ingredients nor nutritional values.

A similar selection in North America (and, as it turned out for me, in China) is available from Boylan. I had their Cane Cola a few times while visiting Shanghai (but only if my sugars allowed it — they didn’t have the Diet version available) and it tasted great. I haven’t had a chance to taste more of their options, but from their website, it looks extremely interesting. Ordering them in London does not appear to be trivial though, so I guess I’ll have to taste them after the pandemic is really over.

Geeks who have been at hacking conferences in Europe, particularly around CCC, definitely already know Fritz-Kola, the Hamburg-based brand of vastly caffeinated drinks (that feels almost an opposite impression of Boylan, which markets mostly caffeine-free options). I have found that some Brewdog pubs have them available in London, which is awesome — but even more awesome is that you can buy them online in UK through different retailers. Personally I’ve been buying mine from The Belgian Beer Company (referral link), since they also have a selection of beers including my wife’s favourite.

More recently, I found a different, non-fizzy source of caffeine: ChariTea Black, which is sold by the same company that also sells Lemonaid, and is also based in Hamburg, like Fritz! They sell online in the UK through their own webshop (referral link), although I’ll warn that the packaging feels a bit lacking for shipping long distance. Both ChariTea and Lemonaid taste awesome, at least the ones I can have (no ginger), and I’m more than happy to risk the delivery disruption to get them directly from the source.

To come back to local brands, Square Root is a London soda company that has a long list of interesting flavours, although I have only tried one of them up to now. They are often available at Brewdog, though only one or two flavours at a time it seems. Square Root is interesting for another reason: they use a variation of a “crown cap” for the bottle, which is typical for beers, Coke, and Fritz. Possibly knowing that their audience is unlikely to have a bottle opener on them, they come with a pull-off ring. Neat!

I haven’t tried Franklin & Sons sodas themselves, though they make the base of mocktails at Luna Gin Bar referenced earlier, and they can’t be that bad. Also even though they were sugary, they weren’t horribly sugary, so that sounds like a good option for me to try again.

There are more, obviously. I was going to talk about the non-alcoholic Senser spirits, but their website is gone, and I guess even the lockdown didn’t help them. They tasted… different, but not something I would be getting again any time soon. And if you go out to randomly look for alcohol-free alternatives to certain drinks, you can find something, but quality is very… variable. That’s why I prefer trying new soda brands at a pub or a bar, it allows me to try something without committing to a huge mistake.

Not-Interested Based Advertising

Full disclosure: I work for a company that is known for IBA (Interest Based Advertising), and that is often criticised for it. I have already expressed my opinion, repeatedly, so I don’t represent the view of anyone but myself.

With the exception of Senser (which appears to have gone away in less than two years), all of the other brands I found by trying the soda at a venue. You would expect that, as a non-drinker, I could actually see a bunch of ads about sodas like these. Clearly I’m the perfect target for them. Instead, I get ads for Brewdog beer subscriptions (because I visit their locations for the soft drinks), and for “hard lemonade” (because they seem to be spending all of the money).

This is a clear example of failure of IBA, and it shows how much less “surveillance” and more “stereotype” this targeting is. But it also may be that the crowd that works on independent soda manufacturing is not interested in using the more “big tech” corporations. Which is fair, but it also makes me fear it hides these options from the “mainstream” of culture. Because whether we like it or not, alcohol is part of that mainstream, and that makes it very hard for anything else to take its spot.

So what we can do instead, is to socialize the idea that non-alcoholic drinks don’t need to be boring. And hoping that organizing events that cater to non-drinkers becomes easier, particularly now that we have a “new normal” in front of us.

If you do have a suggestion for other drinks to try, either available in the UK, or for local venues in other parts of the world that may be offering them, please leave a comment. I’m honestly looking forward to trying more alcohol-free options in other countries when we can finally start traveling again.

Reverse Engineering an LG Aircon Control Panel — Low-Speed Serial Issues

In the previous part of the LG aircon reverse engineering, I gave the impression that the circuitry design was a problem mostly solved. After all, I found working LINbus transceivers, and I figured out a way to keep one of them “quiet” when not in use. And indeed, when I started drafting the post, it was supposed to be followed by a description of the protocol I identified, and should have been talking about how I wrote tools to simulate this protocol to test the implementation without actually touching the HVAC unit.

And that’s what I set out to do myself on a live stream, nearly two months ago now — the video says “Attempt 2” because on the first, short stream I ended up showing on stream my home wifi password and, even though it’s not likely that a bad actor would show up at my place to take over the wifi, it’s not good opsec, so I stopped the stream, rotated the password of all the devices at home, and came back to the stream.

So what happened? Well, I was trying to figure out how to build an ESPHome custom component for the “climate” platform, but sending sequences of bytes through the serial port appeared not to work correctly: instead of being sent at the selected polling frequency, they would be “bunched up” together, sending three bytes instead of one, or twelve instead of six. It worked “fine” if I flushed the serial port, but the flush operation would then take longer than the time between the commands I wanted to send, so that didn’t sound like a good plan.

As you can imagine from the title, this particular problem only happened with the slow, 104 bps 8n1 configuration that the LG aircon needs. It didn’t happen at all with higher baudrates such as 9600, which suggested the problem was related to the timing of the connection. That is not uncommon: a lot of UART implementations that include FIFOs tend to define some timeouts based on the duration of a “space” or of a full character.

What also suggested this to me is that someone, somewhere, was complaining that the ESP32 couldn’t do the slow speed that this aircon needs, and that they preferred using the ESP8266 because that one came with a software serial implementation. Unfortunately, I can no longer find where I read that, to link it here and to point out that the code for the ESP8266 software serial actually works without significant modifications on the ESP32; it’s just that the lack of need for it means it’s not readily available.

So indeed, I managed to get the ESP8266 software serial to work… except for the fact that it was not quite reliable. At 104 bps (which is the speed the aircon protocol needs) sending a six-byte sequence (which is the size of an aircon packet) takes a bit over half a second. Add about the same again for the response (which is also six bytes), and you have a recipe for disaster: more than one second out of every two (which is the period of the command exchange between panel and HVAC) would be spent just on serial communication, and anything else happening during that time and messing up the timing meant bad communication.
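The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope check, assuming 8n1 framing (one start bit, eight data bits, one stop bit) and the two-second exchange period mentioned above; the function and variable names are made up for this illustration.

```python
# Rough timing estimate for the 104 bps link; all names here are illustrative.

def transfer_seconds(n_bytes, baud=104, data_bits=8, parity_bits=0, stop_bits=1):
    """Time on the wire for n_bytes, counting the start bit of each frame."""
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits  # 10 bits for 8n1
    return n_bytes * bits_per_frame / baud

PACKET_BYTES = 6        # size of one aircon packet, per the text above
EXCHANGE_PERIOD = 2.0   # seconds between command exchanges, per the text above

command = transfer_seconds(PACKET_BYTES)
exchange = 2 * command  # the command plus the six-byte response
busy_fraction = exchange / EXCHANGE_PERIOD

print(f"one packet: {command:.3f} s")
print(f"command + response: {exchange:.3f} s")
print(f"share of each period spent on the wire: {busy_fraction:.0%}")
```

Under these assumptions a single packet takes slightly under 0.6 seconds, so the full exchange occupies more than half of every period, which leaves very little slack for a bit-banged implementation.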

Another nearly-software-based alternative I attempted, and also kind-of worked, was using the RMT peripheral. This is the remote control peripheral included in ESP32 — and the reason why Circuit Python made it harder to send pulse trains on FeatherS2: it’s no longer just implemented in software, but it relies on hardware buffers to allow sending and receiving pulse trains. It’s awesome to offload that, but it also comes with limitations. In particular, while I did manage to implement a 104 bps serial transmission through this interface, it would only allow me one serial pair rather than two, severely limiting what I could be doing with the aircon board.

Content Warning: though I’m personally aiming at following more modern conventions, the terminology in use by datasheet and other content that I’m about to link to still use older, less inclusive terminology.

UART — But Discretely

So instead, I used my overcomplication skill to come up with an alternative: discrete UARTs! You see, back in the days when personal computers came with serial ports, and before chipsets started having everything into a single physical chip, and even before most of the functionality of basic peripherals was merged into a Super I/O chip, we had multiple competing discrete UART chips available. The most common one being the 16550, at least for my personal experience. These are still available, and you can indeed still use the 16550, although to do so you need to use a lot of I/O lines, as 16550 and compatible have usually a parallel I/O interface, which is suitable for ISA bus connection, but not so suitable for a microcontroller even when it’s quite generous with its GPIO lines.

As an alternative, I started looking into using a SC16IS741A, which is an I²C (and SPI) chip. Instead of using a lot of separate I/O lines for sending commands and data to it, you send them over the usual two-wire interface, and it internally decodes them into whatever format it needs. You may wonder what the difference is between using the built-in UART and sending data over I²C to a hardware UART. The answer is a bit complicated to explain theoretically, but I think a visualization of it will go a long way to explain:

What you see here is a screenshot from the Saleae Logic software capturing three I/O lines: the I²C bus (SCL above, SDA below), and the TX line of the discrete UART. This is a very much not optimized transaction that shows the sending of one byte at the 104 bps configuration that my aircon control needs. The UART takes one order of magnitude longer to actually transmit the byte than the microcontroller takes to set up the UART and hand the byte over, and this is with a bus as relatively slow as I²C. And it doesn’t even scale linearly.

Basically, the discrete UART allows offloading the whole process of keeping up with the timing, and it does so in a very efficient way. Receiving is even more interesting, because it does not require the microcontroller to pay attention to the received bytes until it’s ready to process them, and keeps them in the FIFO in the meantime. But this kind of feature already exists in most microcontrollers (often referred to as “hardware UART”), and when it works, that’s awesome… but clearly sometimes it doesn’t quite work.

This particular device would be useful on boards based around the older ESP8266 micro, as that only has a single hardware UART, which is used for the logging. With one (or more) of these or similar chips you would be able to control a much wider range of serial-controlled devices, and that makes them valuable.

Unfortunately, ESPHome does not really have a way to provide an uart bus outside of the core platform, at least right now. If I did end up working more down this particular route, I would probably have paid more attention to integrate it — it’s not hard to provide the same interface so that it would be a drop-in replacement, but it does require some reshuffling of the core hierarchy so I don’t think I can pull that out just yet.

Writing a Device Driver, a Component, a Library

In either case, whether you want to integrate this directly in ESPHome, or use it from another software stack, you need to implement the chip’s command set. This is what you usually refer to as a “device driver” for operating systems such as Linux and Windows, as it provides access to the underlying device with some standard interface. And indeed, the Linux kernel has a driver for a set of related NXP UART peripherals: sc16is7xx.

Now, while the NXP-provided datasheet has all the information needed to program for this chip, having the source reference on Linux made things significantly easier, particularly because unless you know what to look for, you most likely will misread the datasheet. I have some (though not much) experience with I²C devices, but there were a few things that ended up confusing me enough that I wasted hours down the wrong route.

The first problem was figuring out the addressing convention. I²C addresses should, by convention, be 7 bits. While the protocol sends a whole byte for addressing, it uses the last bit in it to specify whether you’re issuing a read or a write. But despite this being the common convention, and the one that ESPHome, CircuitPython, and just about anything else expects you to use, some datasheets do not follow it, and provide the full 8-bit “address”. You can see more details on that on the Total Phase website, which was instrumental for me to get to the bottom of why things kept disagreeing with what I was writing.
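To make the two conventions concrete, here is a minimal sketch of the conversion in plain Python. The helper names are hypothetical, and 0x90 is only an illustrative 8-bit value of the kind a datasheet might print, not something to copy without checking your own chip’s address pins.

```python
# Converting between 8-bit "datasheet" addresses and 7-bit I2C addresses.
# Helper names and the example value are illustrative assumptions.

def to_7bit(address_8bit):
    """Drop the read/write bit from an 8-bit style address."""
    return address_8bit >> 1

def wire_bytes(address_7bit):
    """Return the (write, read) address bytes that appear on the bus."""
    if not 0 <= address_7bit <= 0x7F:
        raise ValueError("7-bit addresses must be in the range 0x00-0x7F")
    write_byte = address_7bit << 1        # R/W bit clear: write
    read_byte = (address_7bit << 1) | 1   # R/W bit set: read
    return write_byte, read_byte

datasheet_value = 0x90                    # 8-bit style, as some datasheets print it
address = to_7bit(datasheet_value)        # what most software stacks expect
print(hex(address), [hex(b) for b in wire_bytes(address)])
```

If a logic analyser shows a different first byte on the bus than the address you configured, this off-by-one-bit mismatch is a likely first suspect.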

Once the peripheral was addressed, the next question to answer was about the register addresses. In my first attempt at configuring the chip I was falling short. A quick look through the Linux sources told me that I was missing a left shift of the register address… which made me go “Huh?” for a while. Indeed, the datasheet provides explicit tables explaining the register addressing: registers have a 4-bit address, but it is placed in bits 6:3 of a byte, with bit 7 (MSB) being used in SPI only to select between reading and writing to the register. Of the remaining three bits, two are used to select between channels, because some of the matching chips by NXP include more than one UART on board, though unfortunately I couldn’t find one of those that I could easily work with on a breadboard. The last one (LSB)… is not used. It’s always off and reserved. But more interestingly, the Linux driver only shifts the address by two, not by three like I had to. So I’m wondering if this means that this chip is only mostly compatible with the ones I was looking at.
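The layout described above can be sketched as a small helper that builds the byte sent to select a register. This is an illustration of the bit packing only, not the code from my library: the function name is made up, and the channel field is assumed to sit in bits 2:1, which is what the description of the remaining bits implies.

```python
# Packing a register number into the sub-address byte, as described above:
#   bit 7    : read/write flag, SPI only (left clear here)
#   bits 6:3 : 4-bit register address
#   bits 2:1 : channel select (assumed position)
#   bit 0    : reserved, always zero

def register_byte(register, channel=0):
    if not 0 <= register <= 0x0F:
        raise ValueError("register address must fit in 4 bits")
    if not 0 <= channel <= 0x03:
        raise ValueError("channel must fit in 2 bits")
    return (register << 3) | (channel << 1)

# Register 0x03 is used purely as an example value.
print(hex(register_byte(0x03)))             # shifted by three
print(hex(register_byte(0x03, channel=1)))  # same register, second channel
print(hex(0x03 << 2))                       # what a shift by two alone would give
```

Printing the shifted value next to a naive shift by two makes it easy to see why a write that looks right in the code can land on the wrong register.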

So after one full day of figuring out how to properly run my component over ESPHome, I decided I needed something for prototyping this I2C faster.

Enter Circuit Python

By now you probably know that I do like Circuit Python. And as it turns out, I had already written some code for I²C in Circuit Python when I extended the MCP230xx library to include the older ’16 model. So it didn’t feel too odd to go ahead and use a Trinket M0 with Circuit Python to play around with the UART.

The choice of the Trinket M0 over the more capable Feathers was not random: while the Trinket has a physical UART and the pins to use it, it’s also a very tiny device. The fact that you can use multiple physical UARTs through the I²C bus allows a significant expansion of the I/O abilities of that class of microcontrollers.

At the end, I not only ended up writing a CircuitPython compatible library that allowed me to use the UART, but also re-writing it to leverage the Adafruit_CircuitPython_Register library, making it significantly easier to add support for more features.

The library supports an interface that is nearly identical to the one provided by the built-in serial, although I don’t think there’s a way to make sure it really is, because, similarly to ESPHome, it doesn’t look like Circuit Python ever considered the need to support UARTs that are not part of the original hardware design. Understandably so, as these discrete UARTs are still fairly uncommon.

But I went one step further: when I read the datasheet the first time I wasn’t sure just how strong the suggestion for 1.8432 MHz crystals was, for the divisor. Turns out it’s not strong at all, so the whole amount of crystals I bought at that frequency are not particularly helpful. Worse yet, it turns out I don’t need any clock, because even the Trinket M0 is able to create a 50% duty cycle PWM output at a frequency that is high enough to use as driving clock for the UART.
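As a rough illustration of why the exact clock frequency matters so little at these speeds, here is a sketch of the divisor arithmetic. It assumes the classic 16550-style scheme, where the baud rate is the input clock divided by 16 and by an integer divisor; that formula is my assumption for this sketch, so check the datasheet of the actual part before relying on it. The function names are hypothetical.

```python
def divisor_for(clock_hz, baud, prescaler=1):
    """Nearest integer divisor, assuming baud = clock / (prescaler * 16 * divisor)."""
    return round(clock_hz / (prescaler * 16 * baud))


def actual_baud(clock_hz, divisor, prescaler=1):
    """Baud rate actually produced by a given integer divisor."""
    return clock_hz / (prescaler * 16 * divisor)


# The "suggested" 1.8432 MHz crystal divides cleanly for standard rates...
print(divisor_for(1_843_200, 115_200))

# ...but for an odd rate like 104 any clock gives a large divisor,
# so the rounding error stays small whatever the source is.
for clock in (1_843_200, 1_000_000):  # crystal, and a hypothetical 1 MHz PWM clock
    divisor = divisor_for(clock, 104)
    print(clock, divisor, actual_baud(clock, divisor))
```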

That means that I can fit, in a half sized breadboard, the whole circuitry I need, including only two passives (the pull-up resistors on the SCL/SDA lines), providing the clock as a PWM output from the Trinket while also piggybacking its reset line. This was a surprisingly good setup, and would actually allow me to control the two sides of my aircon (panel and HVAC) if I was still going with the discrete UART idea.

But it turns out I really don’t need any of that: ESP32’s UARTs worked out just fine at the end — at least in the most recent firmware as uploaded by ESPHome, so I decided to set aside again the UARTs and try instead to control the aircon at least with a USB-to-UART adapter. But that’s a story for another post.

Bonus Chapter: Dual-UART chips

As I said earlier, there are some options out there for multiple UART chips that would be interesting to use for cases like mine, in which I need two independent, yet identically configured UARTs. Dual UART chips are not uncommon, but I²C controlled ones are.

If you look around for I²C Dual-UART options, you most likely will end up on the DFRobot DFR0627 which is an “IIC to Dual UART Module” — IIC being the name you’ll find used on Chinese products to refer to I²C (it’s like TF card instead of SD card, don’t ask me.)

So why did I not even consider this particular option? Well, the first issue is that this is a full module, that uses the Gravity connector (which is similar to, but as far as I know not compatible with, the Stemma QT connector that Adafruit uses), the second issue is that there’s no documentation to go with how to use it.

Since I want to, at the end of the whole process, have a printed board I can just hang on the wall (possibly with a 3D Printed case, but that’s further along the way), I need to be able to get the components I want in retail-enough options that I can buy them and solder them in myself. I also need to be able to control those components with arbitrary software.

The DFRobot modules tend to have Arduino components, which you may still be able to use for ESPHome, but you wouldn’t be able to use with Circuit Python during the more iterative side of the project. Since these components are open source you could go ahead and reverse engineer the protocol from those, but it would be much easier to develop for something that has a datasheet and some documentation.

Indeed, the DFRobot website does not even tell you what chip is on the module, though if you look around in the forums you can find a reference to WEIKAI WK2132-ISSG, which is available through LCSC and comes with a datasheet. In Chinese.

If you just look at the pictures, you can at least confirm that this device is similar in functionality to what the NXP part I’ve been working with provides, except for the fact that it does not have the full RS-232 style CTS/RTS lines. So it really would be an interesting part if I ever decided to go back to the idea of using discrete UARTs, but it would require at the very least for me to get one of my Chinese-reading friends to translate enough of this 28-page datasheet to be able to tell what to do. That is unlikely.

Extending the Saleae Logic

One of the reasons why I have always appreciated FLOSS is the ability to adapt, modify, and reuse tools and code and designs to build just what you need. Over the years I have done that many, many times, including writing ruby-elf to run my analysis and taking over unpaper to handle the scanned document.

What I have not, until lately, realized, is the usefulness of building your own physical tools, or rather I have figured that out by watching Adam Savage videos on YouTube, but I didn’t actually build physical tools until pretty much last year, when I started the Birch Books project.

Now, even though Saleae Logic Pro is not open source, neither in its hardware nor its software, I was recommended it many years ago, and I think it’s one of my best purchases in a long time, because it’s very compact, the software works great on Linux and Windows (which is handy because my Windows machine always had a lot more RAM), and the support folks are awesome at addressing issues (I reported a few wishlists, and a couple of bugs, and I think the only one that took a long time was a licensing issue). And since supporting and maintaining those tools is not my main hobby, I’m okay with accepting some closed source tools if they allow me to build more open source.

(I have also ordered a Glasgow, obviously – I mean, Hector working on it made it kind of an obvious thing for me to do – but I have had the Logic Pro for multiple years and the Glasgow is expected to deliver next year, so…)

Tip-Ring-Ring-Sleeve Man-In-The-Middle

So anyway, last year I started looking at ways I could make my life easier, and since I was already getting stuff printed at JLCPCB (not a sponsor), I thought I would try to at least build one of the devices I needed: a board that would let me pass through two TRRS connectors while “copying” the signals to the Saleae. The end result is in the following pictures (two different versions of the same concept — originally I made the board a bit too long, and the connector didn’t fit quite right).

As you can see from the image on the right (or the bottom on mobile), the main use I have for this is to be able to observe the transmission between computers and glucometers, since that has been my main target for reverse engineering over the years, and very often they are connected via a TRS 2.5mm or 3.5mm serial plug, similar to how Osmocom has introduced the world of open hardware many years ago.

The board is simplistic, but also provides a few features to make my life easier. First of all, it uses TRRS connectors, because they are compatible with the good old TRS connectors, but also support more modern ones. Secondly, it’s only 3.5mm plugs on purpose: finding 2.5mm cables is annoying, but finding 3.5 to 2.5 and vice versa is easy, and that’s why I use it with those cables at the end.

Finally, killer feature for me is that switch, which selects between CTIA and OMTP ordering of the connectors. If you’re not aware of it, over time two separate standards have been in use for wiring four conductor TRRS cables, with one of them often incorrectly referred to as “Android” or “Samsung” and the other referred to as “Apple”. The CTIA/OMTP naming is the correct naming, and basically what this does is just changing which of the two pins of the connectors is provided as ground to the Saleae.

Oh yeah and eventually, I released it. I did that under 0BSD, because it’s a really obvious design and if someone wants to reuse it, I’m only happy. I have considered whether this should be released under the CERN Open Hardware License and I can’t imagine why, but if you want to make an argument for it I’m likely going to be swayed by it, feel free.

I also originally drew up plans for a USB-to-Serial adapter using CP2104 — in two versions, one that would just be a simpler USB-to-UART with no level shifting, and one that would be a full-blown level shifting RS-232 port. The latter was something that came to me as an idea after reading a lot of Foone on Twitter, to the point that I sent them some of the untested boards (because I got stuck with the move and stuff, so it took me months to get back to them!)

Unfortunately, something didn’t work out quite right. The serial adapter didn’t work out at all in my case, and the RS-232 one keeps browning out, probably because I skimped on the capacitors. I would be ready to get a respin to try this again, but the current chip shortage does not allow me to make more orders for SMT boards with that particular chip in it.

Serial Adapters Harnesses

Not quite related to the Saleae, but related to the recent work on my aircon reverse engineering, was starting to think about labelled wire harnesses for serial adapters. And thanks to Meowy’s reply I went down the rabbit hole looking at professional label makers.

Now, let me be clear, I have had a label maker for quite a long time. I have blogged about it in 2011, when even the simplest of the label makers (Dymo LetraTag) was to be considered a risky investment (that’s what happens when you’re self-employed, coming from a blue-collar family, and in the backwaters of everything — one day I need to write more about that). I still use that label maker, pretty much our whole spice cabinet depends on it.

But there’s a different class of label makers, namely electricians’ label makers, such as the one shown off in the Big Clive video on the side. Of all the features that he shows off in that video, though, one is missing: printing on heatshrink, which you can use to make labelled harnesses like the ones shown in the tweet.

So, while the original price I could see (£110) was a bit annoying to invest for something I wasn’t sure I would be using very often, following Clive’s advice of waiting for Toolstation having it at a discount (which it did, at £69) was worth it — well worth it for me. Though to be honest, I got some off-brand heatshrink cartridges.

So the first thing I did with these was creating a wire harness for one of the USB serial adapters that I have been working on the aircon with:

For this particular adapter, I also used a 1×5 Dupont plastic block — I did not crimp the cables, they come from a box of male-to-female cables that I bought off AliExpress some time ago, though you can refer to Big Clive on crimping, too. One of the things he shows in this last video, is that you can just break the plastic tooth to release the metal Dupont connector. Which is what I did: I broke off the plastic teeth, and re-fit the already crimped cable into the 1×5 housing.

I could have just prepared my own cables, to be honest. But if I wanted to do that I would probably grab some good quality soft cables like the ones that come with the Saleae Logic, and I don’t have any of that stuff at home.

The other important part with this is that I used male-to-female on the cables, because these are the cables I need right this moment, because I’m working on the air conditioning reverse engineering, and that means I need to connect the serial adapter straight into the breadboard.

So About That Breadboard

You may have noticed one thing I said: I’m currently working on the aircon reverse engineering with breadboards, so having the cables end with a male connector helps. If you’re familiar with the Saleae Logic, you would know that the harnesses it comes with are 2×4 dupont cables with a female end to connect to boards. Which works great when you’re connecting to I/O pins on a board, or using the provided probes — but not really if you’re doing work on a breadboard.

So in the same vein as I did for the serial adapter above, I also decided to go ahead and make my own set of additional harnesses. They are once again not perfect, as the cables are the same AliExpress cables I used for the other harness, but they will do the job for the time being.

I have kept the colours the same – or at least the closest that I could get to with the AliExpress cables – because the Logic software uses the colours to represent the channels. And that means I don’t need to worry anymore about matching colours with colours all over the place. On the other hand, since working with a breadboard means I don’t have that many different ground positions, I decided to only put one ground wire per block. This is also because I ran out of black cables in the pile of AliExpress ones (I should order more).

This turned out to be extremely useful. Being able to just grab the cables and plug them straight into the breadboard on one side, and the Logic Pro on the other, has been a huge timesaver during my work, both off-camera and during streams.

A Note On Board Design

After this experience, I also want to share some of the considerations I made my mind up on, after these trials and errors. The first is that it’s very hard to find more than 1×8 Dupont connectors. That means that in many, many cases it’s much easier to go ahead and use 2×5 connectors if you have multiple pins that need to be connected together.

It also makes sense to space them to “key” them as a single connector, rather than using multiple, out of line pin lines, which I have done in the past. Indeed, in the custom CP2102 that I spoke about earlier, I wouldn’t be able to have a single harness, and would rather need two. That was a bad idea for a design, and when I’m going to re-do it (because I am going to re-do it), I’ll make sure the pins are arranged 2×4 instead, so that it can be connected with a custom labelled harness… or one of the harnesses that comes with the Logic!

These are the kind of important and useful notes that I like reading or finding in videos, and that I wish were collated in more practical material for wannabe PCB designers. They definitely carry over to my current designs of the air conditioning control board.

Customizing The Software

In addition to the custom hardware dongles I’ve been playing with, I also started looking at using more advanced software for the analysis.

In the Software Defined Remote Control repository, I already had a binary that would be able to receive a CSV exported by Logic and interpret it. Unfortunately for this to be easily parsed from within Logic, I would need to write a C++ extension for it.

On the other hand, for the air conditioning, I just needed to write a Python High-Level Analyzer, which provides me with at least a bit of the understood meaning of the various bytes in the packets.

I hope that as time goes on, and I find myself reverse engineering different hardware, I will be able to build up a good library of various analyzers — hopefully sharing enough code between them. Which is something I definitely need to engage with Saleae on: it does seem like right now you cannot depend on external Python modules even from HLAs, but it would make sense to be able to use libraries like construct, or even higher level libraries for things like my aircon or some of the glucometers I have worked on before.

Reverse Engineering an LG Aircon Control Panel — Buses and Cars

This is part two of my tale of reverse engineering the air conditioning control panel in our apartment. See the first part for further details.

If you are binging on retrocomputing videos like I’ve been doing myself, you may have the wrong impression that a bus has to have multiple lines, like the ISA and PCI buses do. But the truth is that a single-wire bus is not unheard of or even uncommon. It just means that the communication needs to be defined in such a way that there’s no confusion as to who is sending at any given time. In this case, it’s clear that the control panel is sending six bytes which are immediately (and I do mean immediately) followed by a six-byte response from the HVAC.

So the next step was to figure out what those six bytes were, and thanks to Saleae’s recent licensing of sample high-level analyzers, this became a piece of cake. While I’m not at liberty to share the code, at the time of writing, I ended up writing an analyzer that would frame together the 6 bytes from the panel and the 6 bytes from the HVAC. Once I had that, it was also easier to notice that the checksum byte was indeed the same as other LG protocols, it’s just that it applied separately to the two 6-byte packets, which means there’s only five bytes in each message that need to be decoded.
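The framing idea can be sketched outside of Logic in plain Python. This is not the analyzer mentioned above, just an illustration working on an already-captured run of bytes. It assumes the capture starts on a panel packet and that the checksum is the last byte of each packet (the position is my assumption); the checksum algorithm itself is deliberately left out.

```python
PACKET_LEN = 6  # bytes sent by the panel, and again by the HVAC in reply


def pair_packets(data):
    """Split captured bus bytes into (panel, hvac) pairs of 6-byte packets.

    Assumes the capture begins with a panel packet; any incomplete
    exchange at the end of the capture is dropped.
    """
    data = bytes(data)
    exchange = 2 * PACKET_LEN
    pairs = []
    for offset in range(0, len(data) - exchange + 1, exchange):
        panel = data[offset:offset + PACKET_LEN]
        hvac = data[offset + PACKET_LEN:offset + exchange]
        pairs.append((panel, hvac))
    return pairs


def split_checksum(packet):
    """Return (five payload bytes, checksum byte), assuming checksum comes last."""
    return packet[:-1], packet[-1]


# Dummy data standing in for two captured exchanges.
for panel, hvac in pair_packets(bytes(range(24))):
    print(panel.hex(), hvac.hex())
```

A real analyzer would frame on timing rather than on byte counts, so that a single dropped byte does not shift every following packet.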

A screenshot of the Logic 2 software showing an analyzed trace with the high level analyzer loaded.

With a bit of trial and error, I already decoded what I think will give me most of the important controls for my plan: how to change the mode between the aircon, heat pump, fan, dehumidifier, and how to change the fan speed. The funniest part is that the “Auto” mode is actually not a mode at all, and just means that the thermostat appears to be sending the “aircon” or “heat pump” as needed.

What got even more interesting, is that if you leave the control panel by itself, after a few minutes it appears to notice the lack of an HVAC connected, and goes into an error state where it alternates the display between “Ch” and “3”. Either it’s reporting its own channel for diagnostics (assuming it’s misconfigured) or it’s just showing a particular error status. In either case, that threw a spanner in my plans.

The first problem is that obviously you wouldn’t be able to connect the 12V data wire to the ESP32 directly. That’s kind of obvious: the ESP32 is a 3.3V microcontroller, and if you tried to use a 12V wire with it it’ll just… go. My original intent was to use two optocouplers: one to receive the data from the control panel, and the other to inject my messages onto the wire. But that won’t work quite the same way for a bus, and while I could try to build up the right circuitry with discrete components, I would have rather used a ready-made transceiver.

The problem with that is that transceivers are made for specific buses, and so the first question is to find the right bus, the one that is used by LG. A lot of HVAC systems (particularly in industrial scale) use Modbus over RS-485 — I have experience with this since the second company I ever worked for is a multinational that works in the industrial HVAC sector, so I learnt quite a bit of how those fit together. But an RS-485 connection would require two wires, since it uses differential signaling, and that’s already excluded.

Going pretty much by Google searches, I finally nailed down something useful. In the automotive industry, there’s a number of standards for on-board diagnostics (OBD). The possibly most famous (and nowadays most common) of those is the CAN bus, which is widely used outside that one industry, as well. LG is not using that. But one of the other protocols used is ISO 9141-2, which includes a K-Line bus on it, which according to Wikipedia is an asynchronous serial connection over a single bidirectional wire without handshakes — though it is using a 10.4kBd signal which is… exactly 100 times faster than the LG signal.

Through these, I found out about the LIN (Local Interconnect Network) bus, which is also used in automotive, specifies a higher level implementation on top of ISO 9141 compatible electrical signaling, but happens to be a good position to start the work with. Indeed, there are a number of LIN bus transceivers that are pretty oblivious of the addressing and framing on the protocol — on purpose, because the specifications have changed over the years. But what they are good for, is to connect to a 12V, high recessive bus, and provide microcontroller-leveled RX and TX signals.

An example of these transceivers is Microchip’s MCP2003, so I decided to set myself up to redesign the board based on that. But since the control panel also needed to receive “acknowledgements” from the HVAC, it meant that each “smart controller” needs two transceivers: one where it fakes the controller to the HVAC, and another one where it fakes the HVAC to the controller. And both of those needed to have the ability to just go into a “lurking” state where they wouldn’t be sending signals if I flipped a physical switch.

Screw It, I’m Doing It Live

So here’s where things got a bit more interesting in multiple directions. In the days just before this work, I was being asked for a few pointers about reverse engineering — and unfortunately I don’t know how to “teach” RE, but I can at least “go through the motions”. After all, that was the more interesting part of my Cats Protection streaming week, so once the DigiKey order arrived with the transceivers and all the various passives to add around them, I decided to set up a camera, and try breadboarding the basic circuitry.

Now, setting aside the fact that I do not particularly enjoy streaming with an actual camera, and indeed the end results left a lot to be desired, the two-hour stream was fairly productive. I found that the PL2303 USB-to-serial adapters actually work quite well at both 100 and 104 bps, and that indeed the transceiver mostly works fine.

It also showed an interesting effect that I did not expect: as I said earlier, after a few minutes without getting an answer from the HVAC, the control panel enters into an error state (Ch/3). I assumed that what it needed was a valid packet from the HVAC, with checksum and information. Instead, it seems like just filling up a buffer, even with invalid packets, is enough to keep the control panel working: as I typed random words onto the serial port, while connected to the bus, the Ch/3 error vanished, and the panel went back to a working state.

This was surprising for one more reason: at least some of the packets sent from the HVAC to the panel had to include the capabilities the HVAC system has to begin with. The reason why I knew that is that the control panel appears to have a lot more functions when it’s running standalone, compared to when it’s installed on the wall. Things like a “power” fan mode for the aircon, the swiveling ventilation, and so on.

Spoiler: it turned out to indeed be the case: the first two commands sent from the panel to the HVAC appear to be some sort of inquiry, that provide some state to the panel to know which features are supported, including the heat-pump mode and the different fan speeds. But for now, let’s move on.

Before I could go and try to figure out which bit related to which capability I hit a snag, which is where I got stuck at the end of the stream there: sending the character ‘H’ on the serial port (a very random character that just happens to be the start of the string “Hello, world!”) showed me something was… not quite right.

A screenshot from Logic2 showing a 0x68 sequence being interpreted as 0x6C.

This is not easy to see, beside for the actual value changing, but in the image above the first row (Channel 0) is the 12V bus (which you can read on the fourth line is actually 10V), the second and fifth rows (Channel 4) are a probe connected to the RXD pin of the MCP2003, and the third and sixth (Channel 5) are a probe connected to the TXD pin (which is in turn connected to the TXD of the USB-to-serial adapter).

Visibly, the problem is that somehow the bus went from “dominant” (0V) to “recessive” (nominally 12V, 10V in practice) too fast, making the second and third bits look like 1s instead of 0s. But why? My first thought was that it was an electrical characteristic I missed – I did skimp on capacitors and diodes on my breadboarding – but after the stream terminated, I grabbed my Boox, and checked the datasheet more carefully and…

1.5.5.1 TXD Dominant Time-out

If TXD is driven low for longer than approximately 25 ms, the LBUS pin is switched to Recessive mode and the part enters TOFF mode. This is to prevent the LIN node from permanently driving the LIN bus dominant. The transmitter is reenabled on the TXD rising edge.

MCP2003/4/3A/4A Datasheet, DS20002230G, page 10

25ms is nearly exactly how long the dip to dominant state is on Channel 0 (and about the same on Channel 4): it’s also nearly exactly 2.5 baud.
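A quick way to convince yourself that this time-out explains the corrupted byte is to simulate it. The sketch below is not code from the project. It assumes an 8n1 frame, a bit time of 9.745ms (the value measured from the panel), a time-out of exactly 25ms (the datasheet only says “approximately”), and a receiver that samples in the middle of each bit.

```python
BIT_TIME_MS = 9.745  # measured length of one impulse from the panel
TIMEOUT_MS = 25.0    # assumed exact; the datasheet says "approximately 25 ms"


def frame_bits(byte):
    """8n1 frame as driven on TXD: start (0), eight data bits LSB first, stop (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def bus_level(bits, t_ms):
    """Logical level on the bus at t_ms, with the dominant time-out applied."""
    index = int(t_ms // BIT_TIME_MS)
    if bits[index] == 1:
        return 1  # recessive, nothing to time out
    # Find where the current run of dominant (0) bits started.
    start = index
    while start > 0 and bits[start - 1] == 0:
        start -= 1
    low_for_ms = t_ms - start * BIT_TIME_MS
    # Past the time-out the transmitter releases the bus until TXD rises again.
    return 1 if low_for_ms > TIMEOUT_MS else 0


def received_byte(byte):
    """Byte a mid-bit sampling receiver would decode from the bus."""
    bits = frame_bits(byte)
    sampled = [bus_level(bits, (i + 0.5) * BIT_TIME_MS) for i in range(len(bits))]
    data = sampled[1:9]  # drop start and stop bits
    return sum(bit << i for i, bit in enumerate(data))


print(hex(received_byte(ord("H"))))
```

Under those assumptions, sending 0x68 (‘H’) comes back as 0x6C, which matches the trace in the screenshot above.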

A Note About Baudrate

I have complained loudly before of how I’m annoyed at people who think those younger than them know nothing and should just be made fun of. I don’t believe in that, and I think we should try our best to explain the more “antique” knowledge when we have a chance.

Folks who have been doing computers and modems well before me appear to love teasing people about the difference between “baudrate” and “bits per second”. The short version of that is that the baud rate relates to the speed of sending a single impulse, while the bits per second (bps) is (usually, but not always) meant to be taking the speed of the actual data transmitted. The relation between the two is usually fixed per protocol, and depends on how you send those bits.

In an asynchronous serial protocol (including RS-232 and this LG abomination), you define how you send your bits with an expression such as “8n1” or “7-odd-2” (also called the framing parameters) — or a number of other similar expressions with different values in them. These indicate that each character sent is respectively eight or seven bits in size, that the parity is not present in the first case, and is odd in the latter, and that the first includes only one stop bit while the second is providing two. In addition to this, there’s always a single start bit.

8n1 is probably the most common of the framings, and that means you’re actually sending 10 bits for each character. A baudrate of 9600 Bd/s gives you a raw connection of 960 characters per second. The 104 value for LG is the actual baudrate, as I can measure one of the impulses from the original control panel at 9.745ms — which actually would put it around 103 Bd/s.

Which is where my assertion that 25ms is nearly exactly 2.5 baud comes from — 2.57 to be a bit more precise: you take the length of the dip in milliseconds (25) and divide it by the time needed to send a single baud (9.745).
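For anyone who wants to redo the arithmetic, here it is as a few lines of Python, using only the figures quoted in the text (a 9.745ms impulse, a 25ms time-out, and 8n1 framing at 9600 baud for comparison). The variable names are just for illustration.

```python
BIT_TIME_S = 9.745e-3  # measured length of one impulse from the panel
TIMEOUT_S = 25e-3      # TXD dominant time-out from the MCP2003 datasheet

# One impulse per 9.745 ms works out to a bit under 103 impulses per second.
measured_baud = 1 / BIT_TIME_S

# How many bit times fit into the time-out window.
bits_before_timeout = TIMEOUT_S / BIT_TIME_S

# With 8n1 each character costs 10 bits on the wire (start + 8 data + stop).
chars_per_second_at_9600 = 9600 / 10

print(f"{measured_baud:.1f} Bd")
print(f"{bits_before_timeout:.2f} bit times")
print(f"{chars_per_second_at_9600:.0f} characters per second")
```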

What this means in practicality is that the MCP2003 series (including the more modern MCP2003B that includes the same time-out behaviour) has a minimum baud rate as well as a maximum one. The maximum one is documented in the datasheet as 20 Kb/s, but the minimum is affected by this timeout: a frame of all zeros would be the worst case scenario in this condition, as the line would be asserted low (“dominant”) for the longest time. While theoretically you can define framings the way you prefer, the common configurations vary between 5 and 9 data bits per frame (though I would have no clue how to process the 9 bits per frame to be honest!) — which means that the maximum number of space (‘0’) baud would vary between 6 and 11.

Why six and eleven? Well, the “start” baud is also a space (logical zero) – which means that if your framing is 5n1, the 0x00 value would be sent with six “spaces”. And if you use nine data bits per frame with even parity, 0x000 would then be followed by a “space” in parity (to maintain the number of ‘1’ bits even), bringing it up to 11 (start, nine zeros, and parity).

The minimum baudrate for a certain framing configuration is thus calculated by dividing the maximum number of consecutive spaces by the timeout in seconds (0.025), which leads to a minimum baudrate of 240 Bd/s for when using 5n1, 440 Bd/s for 9e1, and 360 Bd/s for the most commonly used 8n1 framing. Which is over three times faster than what these LG units are using.
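The same calculation as a small Python sketch, so it can be repeated for other framings or for transceivers with a different time-out. The function names are hypothetical, and only none, even, and odd parity are handled.

```python
def max_consecutive_spaces(data_bits, parity="n"):
    """Longest run of dominant (0) bits a single frame can produce.

    The worst case is an all-zero character: the start bit plus every
    data bit. With even parity the parity bit is also 0 for that
    character; with odd parity it is 1, which ends the run.
    """
    spaces = 1 + data_bits
    if parity == "e":
        spaces += 1
    return spaces


def minimum_baud(data_bits, parity="n", timeout_s=0.025):
    """Slowest baud rate at which the worst-case frame still beats the time-out."""
    return max_consecutive_spaces(data_bits, parity) / timeout_s


for data_bits, parity in ((5, "n"), (8, "n"), (9, "e")):
    print(f"{data_bits}{parity}1: {minimum_baud(data_bits, parity):.0f} Bd")
```

Stop bits do not enter into it, since they are always sent as marks and so never extend a run of spaces.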

I Need A New Bus Transceiver

Since I couldn’t use the MCP2003, I ordered a few MCP2021. Note that Microchip also says that these are not recommended in new designs, suggesting instead the ATA663232 — which as I’ll get to has all of the disadvantages of all the various options for LIN bus transceivers.

When I received the meter, I decided to take another stab at streaming setting up the emulator on camera:

If you watch the whole video you will see me at some point put a finger on the chip and yelp — turns out I ended up with a near-dead short on its embedded regulator. Thankfully, since the chip is designed for the automotive market, the stress did not cause it to fail at all, just… overheat. And as I showed on stream, I did manage to keep the control panel running with my “emulator”, although I did note some noise on the I/O towards the end.

So a little bit more exploration later told me that a) the PL2303 seems to be a bit unreliable with the 3.3V without tying the VREG with the 3.3V coming from the device, and b) even on the CH341 I would get some strange noise in addition to the signal. I think the reason for that is that the chip uses a comparator against its own regulator to decide whether the transmitter should be on. Since, as Monty and Hector suggested, it’s a bad idea to tie multiple regulators together, I decided that even the MCP2021 is not the transceiver I wanted.

Unfortunately, that made it harder to find the right transceiver. Microchip’s suggested replacement, the ATA6632xx series, has all of the disadvantages, as I said: it has the “TXD Dominant Timeout” feature (so it cannot send the 104bps signal I need to send), it includes a voltage regulator that cannot be disabled, and it is only available in VDFN package that is not possible to hand-solder.

On Digi-Key (which is by now my usual supplier), Microchip’s MCP20xx series are the only PDIP-8 through-hole components, so the next best thing is SOIC-8, which is surface mount (so not easily breadboardable) but still hand-solderable (with a steady hand, a magnifying glass, and a fine iron tip). Looking at those, I found at least two that fit.

ON Semiconductor’s NCV7327 was a very obvious choice because they explicitly say in the features list «Transmission Rate up to 20 kbps (No low limit due to absence of TxD Timeout function)», and it was the only one that I found explicitly noting that the TxD Timeout imposes a floor to the speed (as I explained above). Unfortunately, the SOIC-8 version was not available at the time of order on Digi-Key, with a 22-week backorder.

So instead, I settled for Texas Instruments’ TLIN1027DRQ1. This is pretty much… the same. From what I can see, both ON’s and TI’s SOIC-8 devices are pin compatible, and they are nearly pin compatible with Microchip’s SOIC-8 variants, insofar as the power, bus, RXD, and TXD pins are in the same position.

There is, though, a rake just waiting for you there. The Enable/Chip Select pins on both the TLIN1027DRQ1 and the NCV7327 do not correspond to the MCP20xx Transmission Enable semantics, despite sharing the same position. With the MCP20xx you could leave a transceiver connected to a chatty bus, with the TXEnable off, and you would still receive the traffic from the bus.

But with the other two, you’re turning off the whole transceiver at once, which wouldn’t be too bad if it wasn’t that both of these pull TXD to ground (dominant) if you leave it unconnected. Again, this isn’t a big problem in and of itself: as long as the firmware is told not to transmit when the bus is connected directly between the panel and the HVAC, nothing should be transmitted, right?

But this does break one assumption I was making: if I disable the smart controller board, I want to be able to remove the ESP32 devkit altogether. This is important because besides OTA (Over The Air) updates, I would need to be able to disconnect the ESP32 to update the firmware on it. Which means I don’t want to rely on the firmware running and not holding the bus busy.

A schematic diagram of the Panel-side bus transceiver block.

So what I ended up adding to the design is a way for the bus selector to decide whether transmission is to be allowed on the transceiver. I think this is the first time I even considered the idea of using a 74-logic component in my designs (to the point that I had to figure out how to use that with the EAGLE-provided symbols; hint: use the invoke command), but this seemed to me the easiest option to implement what I needed.

The tie-up-both-inputs for the NAND is literal textbook electronics, but turns out to work very well since the cheapest 74 logic NAND chip I found contains four of them, and I only need one other.
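As a truth-table sketch of the idea: a NAND with both inputs tied together behaves as an inverter, and a second NAND can then gate the transmit line. The exact wiring of the schematic is not reproduced here, so the `gated_txd` arrangement and the signal names below are assumptions chosen for illustration. `True` stands for a high (recessive) level and `False` for low (dominant).

```python
def nand(a: bool, b: bool) -> bool:
    """One gate of a quad-NAND chip."""
    return not (a and b)


def inverter(a: bool) -> bool:
    """Textbook trick: tie both NAND inputs together to get NOT."""
    return nand(a, a)


def gated_txd(tx_allowed: bool, mcu_txd: bool) -> bool:
    """Hypothetical gating: the transceiver's TXD follows the MCU only
    when the bus selector allows transmission; otherwise it is held
    high (recessive), even if the MCU pin is low or floating low."""
    return nand(tx_allowed, inverter(mcu_txd))


# Print the full truth table for the assumed arrangement.
for tx_allowed in (False, True):
    for mcu_txd in (False, True):
        print(tx_allowed, mcu_txd, gated_txd(tx_allowed, mcu_txd))
```

With this arrangement, two of the four gates are used per gated line, and a disabled selector keeps the line recessive regardless of what the microcontroller (or its absence) does.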

Note that of course this is only one of the “logical blocks” of the board — and actually not even the final form of it. As I get into more details later, you’ll find out that this only turned out to be one of the possible solutions, and (at the time of writing) there’s no guarantee that this is actually going to be the one I’m going to be using.

Service Providers, Business Sustainability, And Society

I’m not sure how many people think consciously about the business plans of the service providers that they start using, or at least relying upon. I don’t do it terribly often, but I do sometimes, and I thought I would share some words on the topic, because I do think that the world would be a better place if we did think about the effect of our choices on the wider world.

I have for instance wondered aloud, over the years, on Twitter and even on this blog, about the fintech company Curve. Their services are actually interesting and valid… but they do not offer any interesting value at a premium – I would never pay for their “Curve Metal” tiers, because it just doesn’t add up. I just couldn’t figure out how they were expecting to keep running, given I expect the majority of their “customers” are consumer end-users that would follow the same procedure I followed: sign up for the service, using the £5 offer, use it for the 90 days of free cashback, possibly use it a time or two while traveling, and otherwise just keep it as a backup card. Before the pandemic I also wrote how they were giving out more free money, but more recently they decided to start crowdfunding. And they became very loud when they did. Which is what reminded me I had a half-drafted post on the topic (this post), which I should probably resurrect (which I did, since you’re reading this now).

Before going back to Curve and their crowdfunding, I want to point at two sayings that you’ll keep finding around you, when you discuss businesses, products, Internet, and privacy: «If it sounds too good to be true, it probably is» and «If you are not paying for it, you’re not the customer, you’re the product». These are good places to start a discussion, although I think there is a lot of nuance that is lost when trying to (over) simplify with these.

Full disclosure: I work for a big company that, primarily, offers services to end users free with ads, though the product I personally work on not only does not relate to ads, but does not even have ads in it. And while my previous employer was another big company that is part of the “AdTech” business (and in there I did work on ads systems for a long while), I have discussed ads in the past, and I even ran ads on this blog (back in the days before stable employment), so you can imagine that what I’m going to be writing about is my personal opinion and does not represent that of my current, past, or future employers.

So why do I think it’s important to figure out the sustainability of service providers? Well, because it becomes a problem for the whole of society when a fraudulent or scammy provider gets a certain amount of market share, even if sometimes not evenly and not in a way that most people would be able to connect together. For instance you can take the examples of Enron, Bernie Madoff, Theranos, or Wirecard: organizations that promised too-good-to-be-true services and profits, and ended up bust, with different blast radiuses. For the last one we don’t even quite know yet what the blast radius will be once things settle down: the German financial services environment is likely going to be reshaped by last year’s scandal, and so, it appears, will be EY.

Though this is clearly not limited to financial services (VPN providers seem to be pretty much in the same position), it does look like the likes of Curve and Revolut are easily the most visible cases where a company apparently lacked a sustainable business plan, and decided to turn to a crowdfunding campaign — Curve just had one this past May, and it was so noisy that I know a couple of people who went on to find a way to delete their account (and the app) simply because they got tired of their pushy notifications (not a typo).

Now, that might sound not too bad — after all, crowdfunding for the most part just means someone is willingly going to pay to subsidise other people’s “free money”. But the next thing that Curve did after that was to increase referral bonuses for new users to £20, from the previous £5, and that smells to me even more fishy — because that sounds like trying to bring in a mass of users hoping that enough of them can’t figure out that the premium options of Curve are not worth the money.

On the other side of the tracks, Revolut has been pushing more and more for cryptocurrencies, which I’m not going to even pretend is a neutral thing. I care enough about the environment that their consumption alone makes me angry, but even more so, I find that the amount of scams related to cryptocurrencies at this point is wide enough to show that the whole concept is hostile to society. I do not support nor recommend Revolut to new users unless they live in countries like Ireland where there is no other option – in London, using Revolut feels like subsidizing scammers by lending respectability to cryptocurrencies.

But at this point, neither of those appear to have reached the full consumer scams, so it should be fine, right?

Well, let’s take a different example with the VPN market. I have complained over on Twitter a few times how I blame us geeks for the amount of VPN scams that are out there. Privacy maximalists tend to scare people with the idea that your ISP, the Starbucks, or the airport lounge you’re using can see everything you do – and while it is definitely the case that there’s a lot more data going around than you may think, ads such as those ExpressVPN pays the otherwise excellent No Such Thing As A Fish podcast to air, which suggest that your ISP would be able to tell what you’re Googling, are not just falsehoods, but proper FUD.

But even accepting that ExpressVPN has no ulterior motive and is totally legit – I have no idea about that, I only used them before while in China – and leaving aside the fact that VPNs have huge targets painted on their backs, there’s still the matter that you need to trust your VPN provider. Which may or may not be more trustworthy than your home ISP, the two of them having pretty much the same power. Most of the review websites seem to be talking more about commissions than trust, because the worst part is that there is nothing that allows you to verify their statement that they are not logging your traffic in the same way they keep insisting your ISP is doing.

So how do you trust a VPN provider? Well, for a start you may want to consider who their founders are and how they get their money. And you’d be surprised how many dots you can connect this way. For instance, last year CNET wrote about Kape Technologies, a company that bought a Romanian VPN service called CyberGhost. In that article, they also noted that in addition to CyberGhost, the same parent company bought two more VPNs:

After buying CyberGhost, Kape then bought VPN ZenMate in 2018 and more recently Private Internet Access, a US-based VPN, in a move which Erlichman said in a press release would allow Kape to “aggressively expand our footprint in North America.”

Now, the problem is less about a single parent company owning multiple VPN services as if they were different brands – this happens all the time in many other fields. Just look at the relationship between Tesco Mobile and O2, or banks such as Halifax and Lloyds. But the rest of the article does make for a good build-up for why the whole situation is a bit suspect.

But more importantly, you may have heard of Private Internet Access before — they are the company that started heavily sponsoring Freenode a few years ago. And if you have been paying attention to Free Software projects’ communication in the past few months, you probably know by now that Freenode is a trash fire now. So given those connections, would you trust anything that has connections to these organizations and people? I clearly wouldn’t.

This same problem with trust and business sense applies to other businesses. With the exception of B Corporations, most companies out there are intended to make money. If nothing else, they need to make money so that they can pay the wages of the people working there. So I don’t generally trust companies that appear to be giving everything away, and rather prefer those that, if they are making money with ads, say so outright.

In the case of Fintech services, Wise (formerly known as TransferWise) is my example of choice for a company that is transparent when it comes to the cost associated with their services, and makes a good case for why they charge you, and how much so. I really wish more of them did the same because it would make it easier for people to choose how much trust to put in a company. Unfortunately it appears that the current trend in the market is to push as much growth as possible for companies to grab a captive audience before turning on the monetization screw.

Important note: this blog post was written before Wise announced they intend to go public (it was previously rumored, but I didn’t spot that). I guess I should now disclose that I will most likely consider buying some stock of the company, though probably not on the IPO day. We shall see. As I said, I do like their business sense.

Going back for a moment to that «if you’re not paying for it, you’re the product» as well: I don’t agree in full, but this is something that people do need to be keyed in to look out for. In particular, I don’t think that ad-supported businesses should disappear, and that everything should be hidden behind a paywall, because I do think that having wider access to information without making it costly is a good thing. But I also think that there are services that often cross the line into being creepily interested in your data rather than “trading it” for useful information.

But I also think the scrutiny is often placed more on the big, established companies rather than the “scrappy” startups, or the more consulting-like companies. Heck, a few of you reading this are probably already ready to complain that both my current and past employers are seen as data hungry – but I can tell you that both companies, at least during my tenure, would never have someone state on a stage that collecting data from IoT sensors and just throwing it at an ML pipeline to gather unexpected insights is a good idea, as it would go against every one of the privacy and data handling trainings and commitments…

And yet John Roese from Dell EMC stated that in his opening thoughts for LISA 16 (go to minute 44 in the open access video) in what sounds terribly like advice to startups. To be honest, that’s not the only cringey thing in that opening talk – from a technical point of view, his insisting that persistent memory means you can’t just reboot a computer to reset the state of memory (as if re-loading the data in memory from scratch wouldn’t happen on request, whether this is persistent or not) is probably a worse one.

What I’m trying to say is that you need to be sure who your friends are, and it’s not as easy as to expect that all small players are ethical and all big ones are not. And asking yourself “how are they making money, if at all?” is not just allowed — it should sometimes be considered a necessity.

Reverse Engineering an LG Aircon Control Panel — Introduction

I like reverse engineering stuff. It’s not just the fact that it’s a nice puzzle to solve, but I enjoy the thrill of “Oh, that’s how that works.” I’m sure I’m not alone, as can be clearly seen following marcan’s Asahi Linux work, or following Foone on Twitter, or Big Clive on YouTube (and many, many others).

Sometimes, a lot more rarely, my reverse engineering is actually geared towards something I want to make use of, rather than just for the sake of finding answers — this is one of those cases. If you have been following me on Twitter or decided to watch me work on this live on Twitch, you probably already know what I’m talking about. If not, be warned that this is going to be the first part of a (possibly long) series of posts on the same topic. It turned out to be very long for a single post, and I decided to split it instead.

You see, when we moved from the last apartment, we sold our Nest smart thermostat to a friend. The new apartment has an aircon system with heat pump, rather than a “classic” heating system, which is really important as the balcony can easily reach 40°C in the mornings when the sun shines. And unlike in the US, where thermostats are pretty much standardized, Europe’s landscape of thermostats is different enough that Nest gave up, and does not support aircon systems.

Aside: I do have a bit of a rant about Nest Thermostats in Europe, but some of that might be a bit tricky to phrase for me without risking breaching confidentiality with my previous employer, which I don’t want to do. So I will leave a question here for European Nest Thermostats users: can you finally enable hot water boost with the Google Home app?

To be honest, this also kind of makes sense: in a flat that is cooled and heated with an HVAC, it makes sense to have multiple thermostats so that each room can set a different required temperature. If we’re spending the evening in the living room, what’s the point of heating up the bedroom? If I’m on vacation and not spending time in the office, why would I turn on the air conditioning? And so on.

Unfortunately what we ended up with is three thermostat units from LG, model number LG-PQRCUDS0 (provided for ease of searching and finding this blog post), which are definitely not smart, and also not convenient. These are wired, non-smart control panels, that do support features like timing, but do not provide any way to control them without tapping on the screen. As far as I know, these are configured to read a temperature sensor that is not on the panel itself, but on the other hand, the placement of those sensors is a bit on the unfortunate side: in particular in the bedroom it appears to be located in a position that is too natural to fit a wardrobe in, making it always register a higher temperature than the room actually has.

This had been particularly annoying during the winter but it was proving to be worse during the summer: as I said the temperature in the balcony can reach 40°C in the morning, as we’re facing east and it’s an all-glass external wall. That means that the temperature inside the apartment can easily reach 30°C quite suddenly. This is not good for electronics already, but it’s doubly non-good for things like food and medicine, including insulin, which I very much depend on.

While we could just try leveraging the timer mode to turn on the AC in the morning, the complication of where the sensor is makes it very hard to judge the temperature to set it at. And since, as Alec points out on the video, the thermostat’s job is only to turn something on or off (in theory, at least)… well, there has to be an easier way.

So I embarked on this quest of reverse engineering my aircon control panel, with the intent of introducing an ESPHome-compatible add-in that would allow me to control the HVAC through Home Assistant.

Inspection

The first thing to do when setting off to reverse engineer something is to figure out what it is, whether there is any documentation for it, and whether someone else already reverse engineered it. The model number, as I said, is LG-PQRCUDS0 and LG has user and installation manuals online describing it as a Deluxe Wired Remote Controller (together with the -B and -S variants of the part number).

Reverse image search for the panel actually seemed to strike gold at first, as this Instructables post showed exactly the same UI as mine, and included a lot of information about the protocol. But the comments also pointed to a couple of different models that all seemed similar but a bit different. So instead of going ahead and trying to build on the already reversed protocol I wanted to confirm how it all worked myself.

A close up of the door behind my LG aircon control panel showing a JST ZR connector, and a yellow-red-black cable going to the wall.

The first question is going to be what electrical “protocol” it’s using. The back of the panel has a door that hides the inbound connection from “the wall” (that is, the actual HVAC unit), which is three wires and terminates in a JST ZR connector.

With my multimeter I could confirm that the voltage would be around 12V — but I couldn’t confirm whether it would be differential data or what else, since I’m still using an older multimeter and it doesn’t have any option to indicate there’s a signal on a wire. If someone has a good suggestion for a multimeter that does that, please leave a comment below the video in this post as I’d love to get a good one.

Now this is good news, overall. The fact that the plug, and the cable itself, can be bought off the shelf means I don’t have to take risky approaches, which is great, given that we’re renting, so any reverse engineering and replacement implementation needed to be non-destructive and reversible.

So I took out my Logic Pro, a very long USB 3.0 cable, and I ordered just enough components from Digi-Key to debug this thing. And a bench power supply, because I didn’t have one, and given this thing needed 12V, it sounded like something handy to have for this. The end result is the following:

With this connected, I used the Logic 2 software to check the voltage levels, and figure out that the yellow wire is data, while the red wire (in the middle) is 12V supply. The data turned out to, indeed, be a 104 Bd serial connection, which would make it share a lot of the information from the previous reverse engineering…

Except that something was off: what I could see on the wire was a burst of 12 bytes in a single stream, exactly once a second, which I assumed at that point to be unidirectional from the panel to the HVAC. But when trying to verify the checksum it didn’t match what the instructions on the other project suggested: sum everything, modulo 256, and xor with 0x55 (the confusing ‘U’ in the various descriptions is actually a bit pattern). So while I could figure out that the first byte seemed to include the mode of operation, and the third one appeared to include the fan speed, I couldn’t figure out for the life of me the checksum, so I thought I wouldn’t be able to send commands to the HVAC to do what I wanted.
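For reference, the checksum rule described by the other project (sum everything, modulo 256, then XOR with 0x55) can be sketched as below. The assumption that the checksum is the last byte of a frame, and the `verify` helper itself, are illustrative guesses rather than something confirmed for this panel; the sample bytes are made up and not captured traffic.

```python
def checksum(payload: bytes) -> int:
    """Sum all payload bytes, reduce modulo 256, then XOR with 0x55.

    0x55 is the bit pattern 01010101, which happens to be ASCII 'U'.
    """
    return (sum(payload) % 256) ^ 0x55


def verify(frame: bytes) -> bool:
    """Check a frame, ASSUMING its last byte carries the checksum."""
    if len(frame) < 2:
        return False
    return checksum(frame[:-1]) == frame[-1]


# Made-up example payload, NOT real data from the bus.
payload = bytes([0xA8, 0x00, 0x12, 0x00, 0x05])
frame = payload + bytes([checksum(payload)])
print(frame.hex(" "), verify(frame))
```

Running `verify` over different slices of a capture is a cheap way to test hypotheses about where a frame starts and ends.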

On the other hand, in the worst case scenario I could have just replayed the commands I could record from the panel, so I decided to try my luck at drawing and ordering a PCB that would have just enough components for me to play around with.

Drawing the PCB

I’m far from being even a passable expert on electronics, but I could at least figure out some of the things I wanted from a “smart controller” unit to attach to this aircon. So I started with a list of requirements:

  • Since I wanted it to use ESPHome, it should be designed around an ESP32 module. I already attempted this once with the acrylic lamps, and I have yet to get a working board out of that. On the other hand, this time I’m much less space constrained, so I decided to go for a full DEVKIT module, one of those with already the full board of regulators, USB port and serial adapters. This turned out to be a further blessing in disguise, since the current chip shortage appears to have affected the CP2104 module I used in my previous design and I wouldn’t have been able to replicate it.
  • While I don’t expect that the HVAC power supply has been limited in power significantly (after all there are even more deluxe WiFi-enabled controllers in other versions), I didn’t want to increase the load on the 12V supply significantly. Which meant I went for the more complex, but also more efficient, route of building in a buck converter to 3.3V to power up the ESP32.
  • Also, since I know full well that relying on my code for “enjoyment-critical” use cases can be frustrating, I wanted a physical way to hard-disconnect a possibly misbehaving automation, and go back to using the old controller, without having to fidget with cables.

With these conditions, and the assumption that the twelve bytes I was seeing were being sent directly from the controller to the HVAC, I drew and manufactured the above board. Feel free to spot the error at the top of the board, if you may.

Now, since JLCPCB’s turnaround is usually fairly fast, I went ahead and got that manufactured while I was still fighting with figuring out the checksum. So when the boards arrived and I populated them, I was planning to just keep changing settings to find more possible combinations of bytes to see how the checksum would behave.

And that’s when I found out I was very wrong in my assumption, and it’s possible that either the reverse engineering notes I’ve seen from others are missing a big chunk of information, or LG has many different ways to achieve roughly the same endgame. Once I powered up the panel from the bench supply, I could see that the panel was only sending six bytes, rather than the twelve I expected. It’s a bidirectional communication on a single wire, a bus.

That meant going back to the literal drawing board, finding the right components to implement this, and starting what turned out to be a much larger sidequest of complicating matters.

Ten Years of IPv6, Maybe?

It seems like it’s now a tradition that, once a year, I will be ranting about IPv6. This usually happens because either I’m trying to do something involving IPv6 and I get stumped, or someone finds one of my old blog posts and complains about it. This time is different, to a point. Yes, I sometimes throw some of my older posts out there, and they receive criticism in the form of “it’s from 2015” – which people think is relevant, but isn’t, since nothing really changed – but the occasion this year is celebrating the ten-year anniversary of World IPv6 Day, the so-called 24-hour test of IPv6 from the big players of network services (including, but not limited to, my current and past employer).

For those who weren’t around or aware of what was going on at the time, this was a one-time event in which a number of companies and sites organized to start publishing AAAA (IPv6) records for their main hostnames for a day. Previously, a number of test hostnames existed, such as ipv6.google.com, so if you wanted to work on IPv6 tech you could, but you had to go and look for it. The whole point of the single day test was to make sure that users wouldn’t notice if they started using the v6 version of the websites — though as history will tell us now, a day was definitely not enough to find that many of the issues around it.

For most of these companies it wasn’t until the following year, on 2012-06-06, that IPv6 “stayed on” on their main domains and hostnames, which should have given enough time to address whatever might have come out of the one day test. For a few, such as OVH, the test looked good enough to keep IPv6 deployed afterwards, and that gave a few of us a preview of the years to come.

I took part in the test day (as well as the launch) – at the time I was exploring options for getting IPv6 working in Italy through tunnels, and I tried a number of different options: Teredo, 6to4, and eventually Hurricane Electric. If you’ve been around enough in those circles you may be confused by my lack of Sixxs as an option – I have encountered their BOFH side of things, and got my account closed for signing up with my Gmail address (that was before I started using Gmail For Business on my own domain). Even when I was told that if I signed up with my Gentoo address I would have had extra credits, I didn’t want to deal with that behaviour, so I just skipped that option.

So ten years on, what lessons did I learn about IPv6?

It’s A Full Stack World

I’ve had a number of encounters with self-defined Network Engineers, who think that IPv6 just needs to be supported at the network level. If your ISP supports IPv6, you’re good to go. This is just wrong, and shouldn’t even need to be debated, but here we are.

Not only does supporting IPv6 require using slightly different network primitives at times – after all, Gentoo has had an ipv6 USE flag for years – but you need to make sure anything that consumes IP addresses throughout your application knows how to deal with IPv6. For an example, take my old post about Telegram’s IPv6 failures.

As far as I know their issue is solved, but it’s far from uncommon – after all it’s an obvious trick to feed legacy applications a fake IPv4 if you can’t adapt them quickly enough to IPv6. If they’re not actually using it to initiate a connection, but only using it for (short-term) session retrieval or logging, you can get away with this until you replace or lift the backend of a network application. Unfortunately that doesn’t work well when the address is shown back to the user – and the same is true for when the IP needs to be logged for auditing or security purposes: you cannot map arbitrary IPv6 into a 32-bit address space, so while you may be able to provide a temporary session identifier, you would need to have something mapping the session time and the 32-bit identifier together, to match the original source of the request.

Another example of where the difference between IPv4 and IPv6 might cause hard-to-spot issues is in anonymisation. Now, I’m not a privacy engineer and I won’t suggest that I’ve got a lot of experience in the field, but I have seen attempts at “anonymising” user IPv4s by storing (or showing) only the first three octets of them. Besides the fact that this doesn’t work if you are trying to match up people within a small pool (getting to the ISP would be plenty enough in some of those cases), this does not work with IPv6 at all: you can have 120 of the 128 bits of it and still pretty much be able to identify a single individual.
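A small sketch with Python’s standard `ipaddress` module shows why dropping the last eight bits does so little for IPv6. The addresses are from the documentation ranges, and the idea that a single subscriber is delegated (at least) a whole /64 is an assumption about common ISP practice, not a universal rule.

```python
import ipaddress


def truncate(addr: str, keep_bits: int):
    """'Anonymise' an address by keeping only its first keep_bits bits."""
    return ipaddress.ip_network(f"{addr}/{keep_bits}", strict=False)


# IPv4: keeping three octets hides the host among 256 addresses.
v4 = truncate("203.0.113.57", 24)
print(v4, v4.num_addresses)

# IPv6: keeping 120 bits also leaves 256 candidates...
v6 = truncate("2001:db8:1234:5678:abcd:ef01:2345:6789", 120)
print(v6, v6.num_addresses)

# ...but all of them sit inside the same /64, which (under the
# assumption above) belongs to one subscriber's network.
subscriber = truncate("2001:db8:1234:5678:abcd:ef01:2345:6789", 64)
print(v6.subnet_of(subscriber))
```

In other words, the bits that identify the household are in the first half of the address, so trimming the tail removes almost nothing of interest.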

You’re Only As Prepared As Your Dependencies

This is pretty much a truism in software engineering in general, but it might surprise people that this applies to IPv6 even outside of the “dependencies” you see as part of your code. Many network applications are frontends or backends to something else, and in today’s world, with most things being web applications, this is true for cross-company services too.

When you’re providing a service to one user, but rely on a third party to provide you a service related to that user, it may very well be the case that IPv6 will get in your way. Don’t believe me? Go back and read my story about OVH. What happened there is actually a lot more common than you would think: whether it is payment processors, advertisers, analytics, or other third party backends, it’s not uncommon to assume that you can match the session by the source address and time (although that is always very sketchy, as you may be using Tor, or any other setup where requests to different hosts are routed differently).

Things get even more complicated as time goes by. Let’s take another look at the example of OVH (knowing full well that it was 10 years ago): the problem there was not that the processor didn’t support IPv6 – though it didn’t – the problem was that the communication between OVH (v6) and the payment processor (v4) broke down. It’s perfectly reasonable for the payment processor to request information about the customer that the vendor is sending through, through a back-channel: if the vendor is convinced they’re serving a user in Canada, but the processor is receiving a credit card from Czechia, something smells fishy – and payments are all about risk management after all.

Breaking when using Tor is just as likely, but that can also be perceived as a feature, from the point of view of risk. But when the payment processor cannot understand what the vendor is talking about – because the vendor was talking to you over v6, and passed that information to a processor expecting v4 – you just get a headache, not risk management.

How did this become even more complicated? Well, at least in Europe a lot of payment processors had to implement additional verification through systems such as 3DSecure, Verified By Visa, and whatever Amex calls it. It’s often referred to as Strong Customer Authentication (SCA), and it’s a requirement of the 2nd Payment Service Directive (PSD2), but it has existed for a long while and I remember using it back before World IPv6 Day as well.

With SCA-based systems, a payment processor has pretty much no control on what their full set of dependencies is: each bank provides their own SCA backend, and to the best of my understanding (with the full disclosure that I never worked on payment processing systems), they all just talk to Visa and MasterCard, who then have a registry of which bank’s system to hit to provide further authentication — different banks do this differently, with some risk engine management behind that either approves straight away, or challenges the customer somehow. American Express, as you can imagine, simplifies their own life by being both the network and the issuer.

The Cloud is Vastly V4

This is probably the one place where I’m just as confused as some of the IPv6 enthusiasts. Why do neither AWS nor Google Cloud provide IPv6 as an option, for virtual machines, to the best of my knowledge?

If you use “cloud native” solutions, at least on Google Cloud, you do get IPv6, so there’s that. And honestly if you’re going all the way to the cloud, it’s a good design to leverage the provided architecture. But there’s plenty of cases in which you can’t use, say, AppEngine to provide a non-HTTP transport, and having IPv6 available would increase the usability of the protocol.

Now this is interesting because other providers go different ways. Scaleway does indeed provide IPv6 by default (though, not in the best of ways in my experience). It’s actually cheaper to run on IPv6 only — and I guess that if you do use a CDN, you could ask them to provide you a dual-stack frontend while talking to them with an IPv6-only backend, which is very similar to some of the backend networks I have designed in the past, where containers (well before Docker) didn’t actually have IPv4 connectivity out, and they relied on a proxy to provide them with connections to the wide world.

Speaking of CDNs – which are probably not often considered part of Cloud Computing but I will bring them in anyway – I have mused before that it’s funny how a number of websites that use Akamai and other CDNs appear to not support IPv6, despite the fact that the CDNs themselves do provide IPv6-frontend services. I don’t know for sure this is not related to something “silly” such as pricing, but certainly there are more concerns to supporting IPv6 than just “flipping a switch” in the CDN configuration: as I wrote above, there’s definitely full-stack concern with receiving inbound connections coming via IPv6 — even if the service does not need full auditing logs of who’s doing what.

Privacy, Security, and IPv6

If I were to say “IPv6 is a security nightmare”, I’d probably get a divided room. I think there’s a lot of nuance needed to discuss privacy and security about IPv6.

Privacy

First of all, it’s obvious that IPv6 was designed and thought out at a different time than the present, and as such it brought with it some design choices that, looking at them nowadays, look wrong or even laughable. I don’t laugh at them, but I do point out that they were indeed made with a different idea of the world in mind, one that I don’t think is reasonable to keep pining for.

The idea that you can tie up an end-user IPv6 address with the physical (MAC) address of an adapter is not something that you would come up with in 2021, and indeed IPv6 was retrofitted with at least two proposals for “privacy-preserving” address generation options. After all, the very idea of “fixed” MAC addresses appears to be on the way out: mobile devices started using random MAC addresses tied to specific WiFi networks, to reduce the likelihood of people being tracked between different networks (and thus different locations).

Given that IPv6 is being corrected, you may expect then that the privacy issue is now closed, but I don’t really believe that. The first problem is that there’s no real way to enforce what your equipment inside the network will do from the point of view of the network administrator. Let me try to build a strawman for this — but one that I think is fairly reasonable, as a threat model.

While not every small run manufacturer would go out of their way to be assigned an OUI to give their devices a “branded” MAC address – many of the big ones don’t even do that and leave the MAC provided by the chipset vendor – there are a few of them who do. I know that we did that at one of my previous customers, where we decided not only to get an OUI to use for setting the MAC addresses of our devices, but also to use the MAC address as the serial number for the device itself. And I know we’re not alone.

If some small-run IoT device is shipped with a manufacturer-provided MAC address with their own OUI, it’s likely that the addresses themselves are predictable. They may not quite be sequential, and they probably won’t start from 00:00:01 (they didn’t in our case), but it might well be possible to figure out at least a partial set of addresses that the devices might use.

At that point, if these don’t use a privacy-preserving ephemeral IPv6 address, it shouldn’t be too hard to “scan” a network for the devices, by calculating the effective IPv6 address on the same /64 network from a user request. This is simplified by the fact that, most of the time, ICMPv6 is allowed through firewalls, because some of it is needed for operating IPv6 altogether, and way too often even I left stuff more open than I would have liked to. A smart gateway would be able to notice this kind of scan, but… I’m not sure how most routers do with things like this, still. (As it turns out, the default UniFi setup at least seems to configure this correctly.)
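To make the strawman concrete, here is a minimal sketch of how the candidate addresses could be calculated, assuming the devices use the classic modified EUI-64 interface identifier (MAC split in half, ff:fe inserted in the middle, universal/local bit flipped). The prefix, the OUI, and the serial range are all made up for illustration; the function names are mine.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 address for `mac` inside a /64 `prefix`."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("EUI-64 interface identifiers assume a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network[int.from_bytes(interface_id, "big")]


def candidate_addresses(prefix: str, oui: str, first_serial: int, count: int):
    """Yield addresses for `count` consecutive device serials under one OUI."""
    for serial in range(first_serial, first_serial + count):
        suffix = ":".join(f"{b:02x}" for b in serial.to_bytes(3, "big"))
        yield eui64_address(prefix, f"{oui}:{suffix}")


if __name__ == "__main__":
    # Hypothetical values: documentation prefix, invented OUI and serial range.
    for address in candidate_addresses("2001:db8:1:2::/64", "00:11:22", 0x334455, 4):
        print(address)
```

The point is only that the search space collapses from 2^64 possible hosts to however many serial numbers the manufacturer has plausibly shipped.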

There’s another issue: even with privacy extensions, IPv6 addresses are often provided by ISPs in the form of /64 networks. These networks are usually static per subscriber, as if they were static public IPv4 addresses, which again is something a number of geeks would like to have… but it also has the side effect of making it possible to track a whole household in many cases.

This is possibly controversial with folks, because the move from static addresses to dynamic dialup addresses marks the advent of what some annoying greybeards refer to as Eternal September (if you use that term in the comments, be ready to be moderated away, by the way). But with dynamic addresses came some level of privacy. Of course the ISP could always figure out who was using a certain IP address at a given time, but websites wouldn’t be able to keep users tracked day after day.

Note that the dynamic addresses were not meant to address the need for privacy; it was just incidental. And the fact that you would change addresses often enough that the websites wouldn’t track you was also not by design: it was just expensive to stay connected for long periods of time, and even on flat rates you may have needed the phone line to make calls, or you may just have lost connection because of… something. The game (eventually) changed with DSLs (and cable and other similar systems), as they didn’t hold the phone line busy and would be much more stable, and eventually the usage of always-on routers instead of “modems” connected to a single PC made cycling to a new address a rare occurrence.

Funnily enough, we once again gained some tidbit of privacy in this space with the advent of carrier-grade NATs (CGNAT), which were once again not designed at all for this. But since they concentrated multiple subscribers (sometimes entire neighbourhoods or towns!) into a single outbound IP address, they would make it harder to tell the person (or even the household) accessing a certain website, unless you are the ISP, clearly. This, by the way, is kind of the same principle that certain VPN providers use nowadays to sell their product as a privacy feature; don’t believe them.

This is not really something that the Internet was designed for: protection against tracking didn’t appear to be one of the worries of an academic network that considered the idea that each workstation would be directly accessible to others, and shared among different users. The world we live in, with user devices that are increasingly single-tenant, and that are not meant to be accessible by anyone else on the network, is different from what the original design of this network was visualizing. And IPv6 carried on with such a design to begin with.

Security

Now on the other hand there’s the problem with actual endpoint security. One of the big issues with network protocols is that firewalls are often designed with one, and only one, protocol in mind. Instead of a strawman, this time let me talk about an episode from my past.

Back when I was in high school, our systems lab was the only laboratory that had a mixed IP (v4, obviously, it was over 16 years ago!) and IPX network, since one of the topics that they were meant to teach us about was how to set up a NetWare network (it was a technical high school). All of the computers were set up with Windows 95, with the little security features that were possible to use, and included a software firewall (I think it was ZoneAlarm, but my memory is fuzzy around this point). While I’m sure that trying to disable it or work around it would have worked just as well, most of us decided to not even try: Unreal Tournament and ZSNES worked over IPX just as well, and ZoneAlarm had no idea of what we were doing.

Now, you may want to point out that obviously you should make sure to secure your systems to work for both IP generations anyway. And you’d be generally right. But given that sometimes systems have been lifted and shifted many times, it might very well be that there’s no certainty that a legacy system (OS image, configuration, practice, network, whatever it is) can be safely deployed on a v6 world. If you’re forced to, you’ll probably invest money and time to make sure that it is the case; if you don’t have an absolute need besides the “it’s the Right Thing To Do”, you most likely will try to live as much as you can without it.

This is why I’m not surprised to hear that for many sysadmins out there, disabling IPv6 is part of the standard operating procedure of setting up a system, whether it is a workstation or a server. This is not even helped by the fact that on Linux it’s way too easy to forget that ip6tables is different from iptables (and yes I know that this is hopefully changing soon).

Software and Hardware Support

Probably this is the aspect of the IPv6 fan base where I feel most at home. For operating systems and hardware (particularly network hardware) to not support IPv6 in 2021 feels like we’re being cheated. IPv6 is a great tool for backend networks, as you can avoid a lot of legacy features of IPv4 quickly, and use cascading delegation of prefixes in place of NAT (I’ve done this, multiple times), so not supporting it at the very basic level is damaging.
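As a rough illustration of what cascading delegation looks like on paper, the sketch below carves a site prefix into smaller prefixes for downstream routers, and then into /64 networks for the segments behind each router. The /56 prefix and the choice of /60 delegations are assumptions of mine, not a description of any specific network; real deployments would do this via DHCPv6 prefix delegation rather than by hand.

```python
import ipaddress
from itertools import islice


def delegate(parent: str, new_prefixlen: int, count: int):
    """Return the first `count` sub-prefixes of length `new_prefixlen`."""
    network = ipaddress.IPv6Network(parent)
    return list(islice(network.subnets(new_prefix=new_prefixlen), count))


if __name__ == "__main__":
    # Hypothetical /56 handed to the site (documentation address space).
    site = "2001:db8:ab00::/56"
    # Each downstream router gets a /60, i.e. room for sixteen /64 segments.
    for router_prefix in delegate(site, 60, 2):
        print("router:", router_prefix)
        for segment in delegate(str(router_prefix), 64, 2):
            print("  segment:", segment)
```

Every segment ends up with globally unique addresses, so there is nothing to translate at any of the hops.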

Microsoft used to push for IPv6 with a lot of force: among other things, they included Teredo in Windows XP (add that to the list of reasons why some professionals still look at IPv6 suspiciously). Unfortunately WSL2 (and as far as I understand Hyper-V) does not allow using IPv6 for the guests on a Windows 10 workstation. This has gotten in my way a couple of times, because I am otherwise used to just jumping around my network with IPv6 addresses (at least before Tailscale).

Similarly, while UniFi works… acceptably well with IPv6, it is still considering it an afterthought, and they are not exactly your average home broadband router either. When even semi-professional network equipment manufacturers can’t get you a good experience out of the box, you do need to start asking yourself some questions.

Indeed, if I could have a v6-only network with NAT64, I might do that. I still believe it is useless and unrealistic, but since I actually do develop software in my spare time, I would like to have a way to test it. It’s the same reason why I own a number of IDN domains. But despite having a lot of options for 802.1x, VLAN segregation, and custom guest hotspots, there’s no trace of NAT64 or other goodies like that.
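For readers who have not met NAT64: the gateway exposes the whole IPv4 Internet inside an IPv6 prefix, so a v6-only client connects to a synthesized address and the gateway translates. A minimal sketch of that mapping, assuming the well-known 64:ff9b::/96 prefix from RFC 6052 (a network can use its own prefix instead), looks like this; the function names are mine.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052; an operator may pick a different one.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize(ipv4: str, prefix=WELL_KNOWN_PREFIX) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    embedded = int(prefix.network_address) | int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(embedded)


def extract(ipv6: str, prefix=WELL_KNOWN_PREFIX) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from a synthesized NAT64 address."""
    address = ipaddress.IPv6Address(ipv6)
    if address not in prefix:
        raise ValueError("address is not inside the NAT64 prefix")
    return ipaddress.IPv4Address(int(address) & 0xFFFFFFFF)


if __name__ == "__main__":
    print(synthesize("192.0.2.33"))
    print(extract("64:ff9b::c000:221"))
```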

Indeed, the management software is pretty much only showing you IPv4 addresses for most things, and you need to dig deep to find the correct settings to even allow IPv6 on a network and set it up correctly. Part of the reason is likely that the clients have a lot more weight when it comes to address selection than in v4: while DHCPv6 is a thing, it’s not well supported (still not supported at all on WiFi on Android as far as I know, all thanks to another IPv6 “purist”), and the router advertisement and neighbour discovery protocols allow for passive autoconfiguration that, on paper, is so much nicer than the “central authority” of DHCP, but makes it harder to administer “centrally”.

iOS, and Apple products in general, appear to be fond of IPv6. More than Android for sure. But most of my IoT devices are still unable to work on an IPv6-only network. Even ESPHome, which is otherwise an astounding piece of work, does not appear to provide IPv6 endpoints, and I don’t know how much of that is because the hardware acceleration is limited to v4 structures, and how much of it is just because it’s possibly consuming more memory on such a small embedded device. The same goes for CircuitPython when using the AirLift FeatherWing.

The folks who gave us the Internet of Things name sold the idea of every device in the world being connected to the Internet through a unique IPv6 address. This is now a nightmare for many security professionals, a wet dream for certain geeks, but most of all an unrealistic situation that I don’t expect will become reality in my lifetime.

Big Names, Small Sites

As I said at the beginning explaining some of the thinking behind World IPv6 Day and World IPv6 Launch, a number of big names, including Facebook and Google, have put their weight behind IPv6 from early on. Indeed, Google keeps statistics of IPv6 usage with per-country split. Obviously these companies, as well as most of the CDNs, and a number of other big players such as Apple and Netflix, have had time, budget, and engineers to be able to deploy IPv6 far and wide.

But as I have ventured before, I don’t think that they are enough to make a compelling argument for IPv6-only networks. Even when the adoption of IPv6 in addition to IPv4 might make things more convenient for ISPs, the likelihood of being able to drop IPv4 compatibility tout court is approximately zero, because the sites people actually need are not going to be available over v6 any time soon.

I’m aware of trackers (for example this one, but I remember seeing more of those) that tend to track the IPv6 deployment for “Alexa Top 500” (and similar league tables) websites. But most of the services that average people care about don’t seem to be usually covered by this.

The argument I made in the linked post boils down to this: the day to day of an average person is split between a handful of big name websites (YouTube, Facebook, Twitter), and a plethora of websites that are very far from global. Irish household providers are never going to make any of the Alexa league tables, and the same is likely true for most other countries that are not China, India, or the United States.

Website league tables are not usually tracking national services such as ISPs, energy suppliers, mobile providers, banks and other financial institutions, grocery stores, and transport companies. There are other lists that may be more representative here, such as Nielsen Website Ratings, but even those are targeted at selling ad space, and suppliers and banks are not usually interested in that at all.

So instead, I’ve built my own. It’s small, and it mostly only cares about the countries I experienced directly; it’s IPv6 in Real Life. I’ve tried listing a number of services I’m aware of, and it should give a better idea of why I think the average person is still not using IPv6 at all, except for the big names we listed above.

There’s another problem with measuring this when resolving hosts (or even connecting to them — which I’m not actually doing in my case). While this easily covers the “storefront” of each service, many use separate additional hosts for accessing logged-in information, such as account data. I’m covering this by providing a list of “additional hosts” for each main service. But while I can notice where the browser is redirected, I would have to go through the whole network traffic to find all the indirect hosts that each site connects to.
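The kind of check described here can be sketched in a few lines. This is not the code behind IPv6 in Real Life, just an illustration of the approach under my own assumptions: a service counts as “ready” only if its main host and every listed additional host resolve to at least one IPv6 address. The hostnames in the usage example are placeholders, and the resolver is injectable so that the logic can be exercised without touching the network.

```python
import socket


def has_ipv6(host: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if `host` resolves to at least one IPv6 address."""
    try:
        results = resolver(
            host, 443, family=socket.AF_INET6, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # No AAAA record (or the name does not resolve at all).
        return False
    return len(results) > 0


def check_service(main_host: str, additional_hosts=(), resolver=socket.getaddrinfo):
    """Map each host of a service to its IPv6 status, plus an overall verdict."""
    hosts = {h: has_ipv6(h, resolver) for h in (main_host, *additional_hosts)}
    return {"hosts": hosts, "ready": all(hosts.values())}


if __name__ == "__main__":
    # Placeholder hostnames; substitute a real storefront and its login host.
    print(check_service("www.example.com", ["login.example.com"]))
```

Note that this only resolves names; as said above, it does not attempt to connect, and it cannot discover hosts that only show up in the browser’s network traffic.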

Most services, including the big tech companies, often have separate hosts that they use to process login requests and similar high-stake forms, rather than using their main domain. Or they may use a different domain for serving static content, maybe even from a CDN. It’s part and parcel of the fact that, for the longest time, we considered hostnames to be a security perimeter. It’s also a side effect of wanting to make it easier to run multiple applications written in widely different technologies: one of my past customers did exactly this using two TLDs, where the marketing pages were on a dot-com domain, while the login to the actual application would be on the dot-net one.

Because of this “duality”, and the fact that I’m not really a customer of most of the services I’m tracking, I decided to just look at the “main” domain for them. I guess I could try to aim higher and collect a number of “service domains”, but that would be a point of diminishing returns. I’m going to assume that if the main website (which is likely simpler, or at least with fewer dependencies) does not support IPv6, their service domains don’t, either.

You may have noticed that in some cases, smaller companies and groups appear to have better IPv6 deployments. This is not surprising: not only can you audit smaller codebases much faster than the extensive web of dependencies of big companies’ applications, but also the reality of many small businesses is that the system and network administrators do get a bit more time to learn and apply skills, rather than having to follow through a stream of tickets from everyone in the organization that is trying to deploy something, or has a flaky VPN connection.

It also makes it easier for smaller groups to philosophically think of “what’s the Right Thing To Do” versus the medium-to-big company reality of “What does the business get out of spending time and energy on deploying this?” To be fair, it looks like Apple’s IPv6 requirements might have pushed some of the right buttons for that, except for the part where they are not really requiring the services in use by the app to be available on IPv6: it’s acceptable for the app to connect with NAT64 and similar gateways.

Conclusions

I know people paint me as a naysayer, and sometimes I do feel like one. I fear that IPv6 is not going to become the norm during my lifetime, definitely not during my career. It is the norm to me, because working for big companies you do end up working with IPv6 anyway. But most users won’t have to care for a long while yet.

What I want to point out to the IPv6 enthusiasts out there is that the road to adoption is harsh, and it won’t get better any time soon. Unless some killer application of IPv6 comes out, where supporting v4 is no longer an option, most smaller players won’t bother. It’s a cost to them, not an advantage.

The performance concerns of YouTube, Netflix, or Facebook will not apply to your average European bank. The annoyance of going through CGNAT that Tailscale experiences is not going to be a problem for your smart lightbulb manufacturer who just uses MQTT.

Just saying “It’s the Right Thing To Do” is not going to make it happen. While I applaud those who are actually taking the time to build IPv6-compatible software and hardware, and I think that we actually need more of them taking the pragmatic view of “if not now, when?”, this is going to be a cost. And one that for the most part is not going to benefit the average person in the medium term.

I would like to be wrong, honestly. I would want to wish that next year I’ll magically get firmware updates for everything I have at home and be working with IPv6, but I don’t think I will. And I don’t think I’ll replace everything I own just because we ran out of address space in IPv4. It would be a horrible waste, to begin with, in the literal sense. The last thing we want is to tell people to throw away anything that does not speak IPv6, as it would just pile up as e-waste.

Instead I wish that more IPv6 enthusiasts would get to carry the torch of IPv6 while understanding that we’ll live with IPv4 for probably the rest of our lives.

Home Automation: Physical Contact

In the previous post on the matter, I described the setup of lighting solutions that we use in the current apartment, as well as the previous apartment, and my mother’s house. Today I want to get into a little bit more detail of how we manage to use all of these lights without relying solely on our phones, or on the voice controlled assistants.

First of all, yes, we do have both Google Assistant and Alexa at home. I only keep Assistant in the office because it’s easiest to disable the mic on it with the physical switch, but otherwise we like the convenience of asking Alexa for the weather, or letting Google read Sarah Millican for us. To make things easier to integrate, we also signed up for Nabu Casa to integrate our Home Assistant automations with them.

While this works fairly decently for most default cases, sometimes you don’t want to talk, for instance because your partner is asleep or dozing off, and you still want to control the lights (or anything else) without talking. The phone (with the Home Assistant app) is a good option, but it is often inconvenient, particularly if you’re going around the flat with pocketless clothes.

As it turns out, one of the good things that smart lights and in general IoT home automation bring to the table, is the ability to add buttons, which usually do not need to be wired into anything, and that can be placed just about anywhere. These buttons also generally support more than one action connected to them (such as tap, double-tap, and hold), which should allow providing multiple controls at a single position more easily. But there are many options for buttons, and they are not generally compatible with each other, and I got myself confused for a long while.

So to save the day, Luke suggested some time ago that I look into Flic smart buttons, which were actually quite the godsend for us, particularly before we had a Home Assistant set up at all. The way these work is that they are Bluetooth LE devices that can pair with a proprietary Bluetooth LR “hub” (or with your phone). The hub can either connect with a bunch of network services, or work with a variety of local network devices, as well as send arbitrary HTTP requests if you configure it to.

While Flics were our first foray into adding physical control to our home automation, I’m not entirely sure if I would recommend them now. While they are quite flexible at first glance, they are less than stellar in a more complex environment. For instance, while the Flic Hub can talk directly to LIFX lights on the local network (awesome, no Internet dependency), it doesn’t have as much control over the results of that: when we used the four LIFX spots in the previous flat’s office, local control was unusable, as nearly every other click would be missing one spot, making it nearly impossible to synchronise. Thankfully, LIFX is also available as a “Cloud” integration, that could handle the four lights just fine.

The Flic Hub can talk to a Hue Bridge as well, to toggle lights and select scenes, but this is still not as well integrated as an original Hue Smart Button: the latter can be configured to cycle between scenes on multiple taps, while I could only find ways to turn the light on or off, or to select one scene per action with the default Flic interface.

We also wanted to use Flic buttons to control some of the Home Assistant interactions. While buttons on the app are comfortable, and usually Google Assistant can understand when we say “to the bedroom”, there are other times when we would rather have a faster response than the full Google Assistant round-trip. Unfortunately this is an area where Flic leaves a lot to be desired.

First of all, the Flic Hub does not support IPv6 (surprise), which means I can’t just point it at my Home Assistant hostname: I need to use the internal IPv4 address. Because of that, it cannot validate the TLS certificate either. Second, Flic does not have a native Home Assistant integration: for both Hue and LIFX, you can configure the Hub against a Cloud account or the local bridge, then configure actions to act on the devices connected to them, but for Home Assistant there is nothing, so the options are limited to setting up manual HTTP requests.

This is where things get fairly annoying. You can either issue a bearer token to log in to Home Assistant, in which case you can configure the Flic to execute a script directly, or you can use the webhook trigger to send the action to Home Assistant and handle it there. The former appears to be slightly more reliable in my experience, although I have not figured out if it’s the webhook request not being sent by the hub, or the fact that HA is taking time to execute the automations attached to it; I should spend more time debugging that, but I have not had the time. Using the bearer tokens is cumbersome, though. Part of the problem is that the token itself is an extremely long HTTP header, and while you can add custom headers to requests in the Flic app, the length of this header means you can’t copy it, or even remove it. If you need to replace the token you need to forget the integration with the HTTP request and create a new one altogether.
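For reference, these are roughly the two HTTP requests in question, written as a Python sketch rather than as Flic configuration. The base URL, token, script name, and webhook ID are placeholders; the endpoint paths follow Home Assistant’s REST API and webhook trigger as I understand them, so do check them against your own instance. The code only builds the requests, it does not send anything.

```python
import json
import urllib.request


def script_request(base_url: str, token: str, script_name: str) -> urllib.request.Request:
    """Build the authenticated request that runs a Home Assistant script."""
    body = json.dumps({"entity_id": f"script.{script_name}"}).encode()
    return urllib.request.Request(
        f"{base_url}/api/services/script/turn_on",
        data=body,
        headers={
            # Long-lived access token; this is the very long header value.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def webhook_request(base_url: str, webhook_id: str, payload: dict) -> urllib.request.Request:
    """Build the unauthenticated request that fires a webhook trigger."""
    return urllib.request.Request(
        f"{base_url}/api/webhook/{webhook_id}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder address, token, and identifiers.
    request = script_request("http://192.0.2.10:8123", "LONG_LIVED_TOKEN", "tv_input_ps4")
    print(request.get_method(), request.full_url)
    request = webhook_request("http://192.0.2.10:8123", "flic-bedroom", {"action": "hold"})
    print(request.get_method(), request.full_url)
```

The webhook variant needs no header at all, which is what makes it attractive given how the Flic app handles long header values; the webhook ID itself is the only secret.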

Now on the bright side, Flic has recently announced (by email, but not on their blog) that they launched a Software Development Kit that allows writing custom integrations with the Flic Hub. I have not looked deeply into it, because I have found other solutions that work for me to augment my current redundant Flics, but I would hope that it means we will have a better integration with Home Assistant one day in the future.

To explain what the better alternatives we’re using are, we need to point out the obvious one first: the native Hue smart buttons. As I said in the previous post, I did get them for my mother, so that the lights on the staircase turn on and off the same way as they did before I fixed the wiring. We considered getting them here, but it turns out that those buttons are not cheap. Indeed, the main difference between the different buttons we have considered (or tried, or are using) is to be found in the price. At the time of writing, the Hue buttons go for around £30, Flics for £20, Aqara (Xiaomi) buttons for around £8 and Ikea ones for £6 (allegedly).

So why not use cheaper options? Well, the problem is that all of these (on paper) require different bridges. Hue, Aqara and Ikea are all ZigBee based, but they don’t interoperate. They also have different specs and availability. The Aqara buttons can be easily ordered from AliExpress and they are significantly cheaper than the equivalent from Hue, but they are also bigger, and of a shape just strange enough to make it awkward to place next to the wallplates with the original switches of the apartment, unlike both Flic and Hue. The Ikea ones are the cheapest, but unless you have a chance to pop in their store, it seems like they won’t be shipping in much of a hurry. As I write this blog post, it’s been nearly three weeks since I ordered them and they still have not shipped, with the original estimate for delivery of just over a month. Updated before even posting this: Ikea (UK) cancelled my order and the buttons are no longer available online, which meant I also didn’t get the new lights that I was waiting for. Will update if any new model becomes available. In the meantime I checked the instructions and it looks like these buttons only support a single tap action.

This is where things get more interesting thanks to Home Assistant. Electrolama sells a ZigBee stick that is compatible with Home Assistant and that can easily integrate with pretty much any ZigBee device, including the Philips Hue lights and the Aqara buttons. And even the Aqara supports tap, double-tap, and hold in the same fashion as Flic, but with a lot less delay and no lost events (again, in my experience). It turned out that at the end of the day the answer for us is to use cheaper buttons from AliExpress and configure those, rather than dealing with Flic. At the moment we have not removed the Flics around the apartment at all, though: we have rather decided to use them for slightly different purposes, for automation that can take a little bit more time to operate.

Indeed, the latency is the biggest problem of using Flic with Home Assistant right now: even when the event does not get lost, it can sometimes take a few seconds before it is fully processed, and in that time you probably would have gotten annoyed enough that you would have asked a voice assistant, which sometimes causes the tap to be registered after the voice request, turning the light back off. The Aqara button, by contrast, is pretty much instantaneous. I’m not entirely sure what’s going on there; it feels like the bridge is “asleep” and can’t send the request fast enough for Home Assistant to register it.

It is very likely that we will be replacing at least a couple of the Flics we already have set up with the Aqara buttons, when they arrive. They support the same tap/double-tap/hold patterns as the Flic, but with significantly lower latency. They are bigger, though, and they do seem to be made of very cheap, brittle plastic: I nearly made it impossible to change the battery on my first one, because trying to open the compartment with a 20p coin completely flattened the back!

Once you have working triggers with ZigBee buttons, by the way, connecting more controls becomes definitely easier. I really would consider making a “ZigBee streamerdeck” to select the right inputs on the TV, to be honest. Right now we have one Flic to double-tap to turn on the Portal (useful in case one of our mothers is calling), and another one to select PS4 (tap), Switch (double-tap), or Kodi (hold).

Wiring automation, and selection of specific scenes, is the easiest thing you can do in Home Assistant, so you get a lot of power for a little investment in time, from my point of view. I’m really happy to have finally set it up just the way I want it. Although it’s now time to consider updating the setup to no longer assume that either of us is always at home at any time. You know, with events happening again, and the end of lockdown in sight.