Quarantine Project: Birch Books Smart Home

By now, everybody knows we’ll be spending a few more weeks closed up in our houses, flats, or rooms. And so everybody is looking for new ways to keep busy. While I do have a significant number of incomplete open source projects to dedicate my time to, I also felt I needed something that was just for myself. So I decided to start what could be described as an “art project”.

A few months ago, I bought myself a Birch Books Lego set. It’s a lovely set, and I have a particular place in my heart for bookstores in general (despite having switched to ebooks many years ago). I wanted to take good pictures of it, particularly since I have a few macro lenses that should do it justice, but to do so I felt the need to light it up.

Now, you can buy light kits on Amazon for it (and for most other kits out there), but I wanted to do something myself. So I ended up buying a bunch of battery holders, white LEDs, and a box of assorted Lego bricks to experiment with for modifications. I also got lucky, and found a number of bricks in the right colours for the set.

And since I’m a geek and I love overkill solutions, I started thinking “can I wire this to a microcontroller, and have it control the lights over time?” — Originally I considered keeping it aligned to the wallclock (as in, the time in the real world), but it turned out that it’s not that interesting to have it change so slowly — and it’s actually much harder. Having it cycle at a fixed speed is much easier, and with a couple of button controls you can do pretty much all you need.

As it turned out, I had most of the stuff I needed at home already: for Valentine’s Day, I ordered three heart-shaped LED kits on eBay (three because that’s the quantity they were sold in), which came with a microcontroller — an STC89C51, an Intel 8051-compatible micro with 35 I/O ports. Back then I also decided I would like to try programming one for custom patterns, and bought a programmer/development board as well, which came with another micro. While this seemed like a fairly niche micro, it made prototyping easy, since it’s a DIP package that I can just put on a breadboard.

The first problem I had was figuring out how to flash anything on the micro. What I found online suggested using a set of Windows tools that are only available in Chinese. I then found another page that suggests just using SDCC and a tool called stcgal, a Python command-line tool that allows one-command programming of the micro. This worked great — the Special Function Registers are described in the datasheet, so it shouldn’t be hard to describe them in a header.

So I turned to EAGLE first, and KiCad second, trying to figure out how I would lay out what I need on a board, and couldn’t find the STC anywhere. And then I started thinking that maybe this is a clone of something else. A few Google searches later, I found the AT89S52 — which turns out to have exactly the same pinout, and most likely the same registers as well. The STC89C51 is (mostly) a clone of the AT89S52. It does not appear to share the same programming protocol, and it actually appears to provide a number of additional capabilities and registers that the Microchip MCU does not have, but it does mean I can just write the code targeting the AT89S52, and be done with it.

Now let me remind you that I’m not particularly versed in electronics, so I’ll probably be making tons of mistakes throughout this experience. But I’ll also be trying to provide regular updates on how the project is going and, assuming I do manage to get it to work, I’ll be publishing all the source code, and any schematics I might end up drawing for the project.

I’m actually happy to have found out that the chip is effectively just a Microchip/Atmel chip instead — because this increases not just the usefulness of me talking about the design and opening up the sources, but also its usefulness to myself: I would rather play with something that I can reuse later than with some specific micro I just happen to have at home.

Also, I might end up designing a few alternatives for this anyway. The original draft of this blog post was written when I had just started thinking this through, but I’m now adding a few more edits after having fleshed out some of my intentions for the project, and it might just be that I’ll run with a number of different options. We’ll see.

Artist Spotlight: Sezzadactyl

A sampler of different art by Sezzadactyl

The times are bleak, with a fourth of the world’s population under lockdown or quarantine. I don’t particularly like dark times, and so I would rather bring some fresh air and happiness to people instead. For this reason, I decided to add a third weekly post to my blog, dedicated not to rants or technical content, but rather to showing off artists who deserve appreciation, and probably the support of those who can give it, given that conventions are being cancelled left and right.

Sezzadactyl was at the top of my list. Her art is awesome: we stumbled across her booth at MCM Comic Con London last year, but we were in a hurry at first (I had a photoshoot booked), and we just promised to come back. Unfortunately we forgot to write down which booth she was at, so it took us a while to find her again, but we were very happy to have a chance to get some prints of hers.

And we didn’t stop there. My wife got me her 2018 and 2019 art books as presents, for my birthday and Christmas. I was (and am) so happy, because I really couldn’t choose which prints to get!

As you can tell from the Instagram feed, she covers a wide range of subjects, from Pokémon (which admittedly was the first thing that caught our eyes) to Studio Ghibli characters, to video games, to dragons and pets.

Her Etsy store is currently on hiatus, but if you like her work, I do recommend signing up to be notified when it reopens. As you can see we already have quite a few pieces, but we’ll be looking forward to getting more in the future!

By the way, I want to give a shout out to Frame Company, where we bought the frames and mounts you see in the picture above. We have so far ordered more than a dozen frames from them, and they don’t just have a great selection to match any art style, they also do custom mounts, including multi-opening ones (you’ll see more of them in the future).

The importance of reliability

For the past seven years I have worked at Google as a Site Reliability Engineer. I’m actually leaving the company — I’m writing this during my notice period. I’m currently scheduled to join Facebook as a Production Engineer at the end of May (unless COVID-19 makes things even more complicated). Both roles are related to the reliability of services – Google even put it in the name – so you could say I have more than a passing idea of what is involved in maintaining (if not writing) reliable software and services.

I had to learn to be an SRE — this was my first 9-5 job (well, not really 9-5, given that SREs tend to have very flexible and very strange hours), and I hadn’t worked at such a scale before this. And as I wrote before, the job got me used to expecting certain things. But it also made me realise how important it is for services and systems to be reliable, as well as secure. And just how unevenly that reliability is distributed out there.

During my tenure at Google, I’ve been oncall for many different services. Pretty much all of them have been business critical in one way or another — some much more than others. But none of them were critical to society: I’ve never joined the Google Cloud teams, any of the communication teams, or the Maps teams. I have been on the Search team, but while it’s definitely important to the business, I think society would rather go without search results than without a way to contact their loved ones. But that’s just my personal opinion.

The current huge jump in WFH users due to COVID-19 concerns has clearly shown how much more critical to society some online services have become, services that even ten years ago wouldn’t have been considered that important: Hangouts, Meet, Zoom, Messenger, WhatsApp, and the list goes on. Video calls appear to be the only way to get in touch with our loved ones right now, as well as, for many, the only way to work. Thankfully, most of these services are provided by companies that are big enough to be able to afford reliability in one form or another.

But at least in the UK, this has also shown how many other services are clearly critical for society, yet not provided by companies who can afford reliability. Online grocery shopping became the thing to do, nearly overnight. Ocado, possibly the biggest grocery delivery company, came under so much pressure that they had to scramble, first introducing a “virtual queue” system, and then eventually taking down the whole website. As I type this, their front page informs you that login is only available to those who already have a slot booked for this weekend, and is otherwise not available to anyone — no new slots can be booked.

In similar fashion, online retailers, GP surgery online systems, online prescription services, and banks also appeared to be smothered in requests. I would be surprised if libraries, bookstores, and the websites of restaurants that don’t rely on the big delivery companies weren’t also affected.

And that has made me sad, and at least in part made me feel ashamed of myself. You see, while I was looking for a new job I had been interviewing at another place. Not a big multinational company, but a smaller one, a utility. And while the offer was very appealing, it was also a more challenging role, and I decided to pass on it. I’m not saying that I’d have made a bigger difference for them than any other “trained” SRE, but I do think that a lot of these “smaller” players need their fair dose of reliability.

The problem is that there’s a mixture of different attitudes, and actual costs, related to doing reliability the way Google and the other “bigs” do it. In the case of Google, more often than not the answer to something not working very well is to throw more resources (CPU, memory, storage) at it. That’s not something you can do quickly when your service is running “on premise” (that is, in your own datacenter cabinet), and not something you can do cheaply when you run on someone else’s cloud solution.

The thing is, Cloud is not just someone else’s computer. It’s a lot of computers, and it does add a lot of flexibility. And it can even be cheaper than running your own server, sometimes. But it’s also a risk, because if you don’t know when to say “enough”, you end up with budget-wrecking bills. Or sometimes with a problem “downstream”. Take Ocado — the likelihood is that it wasn’t the website that was being overloaded. It was the fulfillment. Indeed, the virtual queue approach was awesome: it limited the whole human interaction, not just the browser requests. And indeed, the queue worked fine (unlike, say, the CCC ticket queue), and the website didn’t look overloaded at all.

But saying that on-premise equipment does not scale is not trying to market cloud solutions — it’s admitting the truth: if you start getting that many requests at short notice, you can’t go out, buy, image, and set up another four or five machines to serve them — but you can tell Google Cloud, Amazon, or Azure to go and triple the amount of resources available. And that might or might not make things better for you.

It’s a tradeoff. And not one I have answers for. I can’t say I have experience managing this tradeoff, either — all the teams I worked on had nearly blank cheques for internal resources (not quite, but nearly), and while resource saving was and is a thing, it never gets to be a real dollar amount that, as an SRE, you end up dealing with. Other companies, particularly smaller ones, need to pay a lot of attention to that.

From my point of view, what I can do is try to be more open in discussing design decisions in my software, particularly when I think it’s my experience talking. I still need to work actively on Tanuga, and I am even considering making a YouTube video of me discussing the way I plan to implement it — as if I were discussing it during a whiteboard design interview (I have quite a bit of experience with those, this year).

The GPL is not an EULA

Before I get to the meat of this blog post, let me make sure nobody mistakes me for a lawyer. What you’re about to read is a half-rant about Free Software projects distributing Windows binaries with pretty much default installer settings. It is in no way legal advice, and it is not being provided by someone with any semblance of legal training.

If you follow Foone or me on Twitter, you probably have noticed one time or another our exchanges of tweets about GPL in Windows installers.

The reason for this annoyance is that, as Foone pointed out many times in the past, licenses such as GPL and MIT are not EULAs: End-User License Agreements. They are, by design, licensing the distribution of the software, rather than its use. Now, in 2020 a lot of people are questioning this choice, but that’s a different topic altogether.

What this means for a consumer is that you are not required to agree to the GPL (or LGPL, or MIT) to install and use a piece of software. You’re required to agree to it if you decide to redistribute it. And as such, the install wizards’ license dialogs, with their “I accept the terms” checkboxes, are pretty much pointless. And an annoyance, because you need to actually figure out where to click instead of just clicking “Next” — yes, I realise that the tiniest violin may be playing at that annoyance, but that’s not the point.

Indeed, the reason why I make fun of these installers is that, at least to me, they show the cultural mark of proprietary software on Windows, and the general lack of interest in the politics of Free Software from pretty much everybody involved. The reason why the installers default to saying “EULA” and insisting that you agree to it, is that non-Free Software on Windows usually does have EULAs. And even the FLOSS installer frameworks ended up implementing the same pattern.

VLC, unsurprisingly, cares about Free Software ideals and its politics, and went out of its way many years ago to make sure that the license is correctly shown in its installer. For a few other projects, I have sent patches myself to correct them, whenever I could. For others… well, it’s complicated. The WiX installer toolkit was released years ago by Microsoft as open source, and is used by Calibre and Mono among others, but it seems like the only way to have it show a non-EULA screen is to copy one of its built-in dialogs and edit it.

As I said recently on Twitter, we need a reference website, with instructions on how to correctly display non-EULA Free Software licenses on Windows (and any other operating system). Unfortunately I don’t have time to go through the release process as I’m about to leave the company in a few weeks. So either it’ll have to wait another couple of weeks (when I’m free from those obligations), or it’ll have to be started by someone else.

Until then, I guess I’ll provide this blog post as a reference for anyone who asks me why I’m even complaining about those licenses.

Success Story: Mergify, GitHub and Pre-Merge Checks

You may remember that when I complained about bubbles, one of the things I complained about is that I had no idea how to get continuous integration right. And this kept being a problem for me in a few projects where I do actually get contributions.

In particular, glucometerutils is a project that I don’t want to be “just mine” in the future. It is released under a very permissive license precisely because I hope that others will keep contributing to it. But while I did manage to get Travis CI set up for it, I kept forgetting to run the checks myself before pushing, which is annoying.

One of the solutions that was proposed to me for that particular project was to use pre-commit, which clearly is a good starting point, but as the mypy integration shows, it’s not perfect: it requires you to duplicate quite a bit of information regarding dependencies. And honestly, the problem is not so much whether things work on a per-commit basis, as whether they are fine on a per-push basis. Which often they haven’t been for me.

On the other hand, pull requests coming from other users have been much less likely to break stuff, because Travis CI would tell me if something was wrong. So I was basically looking for something that would put me through exactly the same level of checking, but at the same time would let me push (or merge in) my code as soon as integration passed.

While I was looking around for this, I found a blog post by Debian developer Julien Danjou about his company Mergify, which looked like pretty much exactly what I wanted: it allows me to say that if either I approved of a pull request, or made it myself, and the continuous integration reports no problems, the pull request should just be rebased into the master branch.

The next problem was how to make it less cumbersome for me to keep developing the project, but thankfully Julien came through for that as well by introducing me to git-pull-request, although a bit of work was needed for that, because of some advanced settings that have been in my git configuration for the past few years, and also because I’m lazy and don’t always capitalize the F in Flameeyes when I type my username. Hopefully all of that will be upstreamed by the time you read this blog post.

The end result of this? I moved glucometerutils to be part of the same organization as the Protocols (which is also using Mergify now), and instead of git push, I’m using git pull-request. If I didn’t break anything, it gets merged by the bot. If someone sends me a pull request, I just need to approve it, and once again the bot takes care of it.

I’ll look for ways to keep doing this for repositories that are not part of any organization, but at the very least this solved the issue for the two main repositories for which I have active contributors. And it reduces the risk of me being the single point of failure for the projects.

Also, this is a perfect example of why Randall Munroe is wrong, for once, or twice. Automating the merges will definitely not save me more time than I spent just trying to get this to work. The fragment of time Julien and I spent figuring out why GitHub was throwing non-obvious validation errors will never be repaid by the time I save not clicking on the pull request link after git push. But saving time is not the only thing automation is about.

In particular, this time automation is about fairness, consistency, and resiliency: while I’m still special in the Mergify configuration, I now go through the same integration test as everyone else to merge to the repository, and it’s a bot doing the rebase-merge, rather than me, so it’s less likely it’ll make mistakes.

Anyway, thank you Julien, thank you Mergify, and thank you all who contribute. Hopefully the next few months will be a bit more active for me, between the forced work from home and the new job.

Blog Redirects, Azure Style

Last year, I set up an AppEngine app to redirect the old blog’s URLs to the WordPress install. It’s a relatively simple Flask web application, although it turned out to be around 700 lines of code (quite a bit just to serve redirects). While it ran fine for over a year on Google Cloud without me touching anything, and fit into the free tier, I had to move it as part of my divestment from GSuite (which is only vaguely linked to me leaving Google).

I could have just migrated the app to a new consumer account on AppEngine, but I decided to try something different, to avoid the bubble, and to compare other offerings. I decided to try Azure, which is Microsoft’s cloud offering. The first impressions were mixed.

The good thing about the Flask app I used for redirection being that simple is that nothing ties it to any one provider: the only things you need are a Python environment, and the ability to install the requests module. For the same codebase to work on AppEngine and Azure, though, one simple change seems to be needed. Both providers appear to rely on Gunicorn, but AppEngine looks for an object called app in the main module, while Azure looks for it in the application module. This is trivially solved by defining the whole Flask app inside application.py and having the following content in main.py (the command line support is for my own convenience):

#!/usr/bin/env python3

import argparse

from application import app


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--listen_host', action='store', type=str, default='localhost',
        help='Host to listen on.')
    parser.add_argument(
        '--port', action='store', type=int, default=8080,
        help='Port to listen on.')

    args = parser.parse_args()

    app.run(host=args.listen_host, port=args.port, debug=True)
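
The application.py side is where the actual redirect logic lives; the real thing is the roughly 700 lines mentioned above, but a minimal sketch of its shape (the paths and target URLs below are made up for illustration, not the actual redirect table) looks something like this:

#!/usr/bin/env python3

import flask

app = flask.Flask(__name__)

# Hypothetical mapping from old blog paths to the new WordPress URLs; the
# real table is much bigger.
_REDIRECTS = {
    '/articles/2009/01/01/example-post':
        'https://example.wordpress.com/2009/01/01/example-post/',
}


@app.route('/<path:old_path>')
def redirect_old_url(old_path):
    # Look up the old path and send a permanent redirect to the new URL.
    new_url = _REDIRECTS.get('/' + old_path)
    if new_url is None:
        flask.abort(404)
    return flask.redirect(new_url, code=301)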

The next problem I encountered was with the deployment. While there are plenty of guides out there on using different builders to set up the deployment on Azure, I was lazy and went straight for the most clicky one, which uses GitHub Actions to deploy from a (private) GitHub repository straight into Azure, without having to install any command line tools (sweet!). Unfortunately, I hit a snag in the form of what I think is a bug in the Azure GitHub Action template.

You see, the generated workflow for the deployment to Azure pretty much zips up the content of the repository, after creating a virtualenv directory to install the requirements defined for it. But while the workflow creates the virtualenv in a directory called env, the default startup script for Azure looks for it in a directory called antenv. So for me it was failing to start until I changed the workflow to use the latter:

    - name: Install Python dependencies
      run: |
        python3 -m venv antenv
        source antenv/bin/activate
        pip install -r requirements.txt
    - name: Zip the application files
      run: zip -r myapp.zip .

Once that problem was solved, the next issue was figuring out how to set up the app on its original domain and have it serve TLS connections as well. This turned out to be a bit more complicated than expected, because I had set up CAA records in my DNS configuration to only allow Let’s Encrypt, while Microsoft uses DigiCert to provide the (short-lived) certificates; so until I removed that record, the certificate couldn’t be issued (oops).

With everything set up, here are a few more of the differences I noticed between the two services.

First of all, Azure does not provide IPv6, although since they use CNAME records this can change at any time in the future. This is not a big deal for me, not only because IPv6 is still dreamland, but also because the redirection points to WordPress, which does not support IPv6. Nonetheless, it’s an interesting point to make that, despite Microsoft having spent years preparing for IPv6 support, and having even run Teredo tunnels, they also appear not to be ready to provide modern service entrypoints.

Second, and related, it looks like on Azure there’s a DNAT in front of the requests sent to Gunicorn — all the logs show the requests coming from 172.16.0.1 (a private IP address). This is the opposite of AppEngine, which shows the actual request IP in the log. It’s not a huge deal, but it does make it a bit annoying to figure out if someone is trying to attack your hostname. It also makes the lack of IPv6 support funnier, given that the application itself apparently does not need to support the new addresses.

Speaking of logs, GCP exposes structured request logs. This is a pet peeve of mine, and one that GCP at least makes easier to deal with. In general, structured logs allow you to filter much more easily for requests terminated with an error status, which is something I paid close attention to in the weeks after deploying the original AppEngine redirector: I wanted to make sure my rewriting code didn’t miss some corner cases that users were actually hitting.

I couldn’t figure out how to get a similar level of detail in Azure, but honestly I have not tried too hard right now, because I don’t need that level of control for the moment. Also, while there does seem to be an entry in the portal’s menu to query logs, when I try it out I get a message «Register resource provider ‘Microsoft.Insights’ for this subscription to enable this query» which suggests to me it might be a paid extra.

Speaking of paid, the question of costs is something that clearly needs to be kept in sight, particularly given recent news cycles. Azure seems to provide a 12-month free trial, but it also gives you £150 of credit for 14 days, which doesn’t seem to match up properly to me. I’ll update the blog post (or write a new one) with more details after I have some more experience with the system.

I know that someone will comment complaining that I shouldn’t even consider Cloud Computing as a valid option. But honestly, from what I can see, I will likely be running a couple more Cloud applications out there, rather than keep hosting my own websites and running my own servers. It’s just more practical, and it’s a different trade-off between costs and time spent maintaining things, so I’m okay with it going this way. But I also want to make sure I don’t end up locking myself into a single provider, with no chance of migrating.

Publishing Documentation

I have been repeating for years that blogs are not documentation in and of themselves. While I have spent a lot of time over the years making sure that my blog’s links are not broken, I also know that many of my old blog posts are no longer relevant at all. The links out of the blog can be broken, and it’s not particularly easy to identify them. What might have been true in 2009 might not be true in 2020. The best option for implementing something has likely changed significantly, given how, ten years ago, Cloud Computing was barely a thing on the horizon, and LXC was considered an experiment.

This is the reason why Autotools Mythbuster is the way it is: it’s a “living book” — I can update and improve it, but at the same time it can be used as a stable reference of best practices: when they change it gets updated, but the link is still a pointer to the good practice.

At work, I pretty much got used to “Radically Simple Documentation” – thanks to Riona and her team. It means I only needed to care about the content of the documentation, rather than dealing with how it would render, either in terms of pipeline or style.

And just like other problems with the bubble, when I try to do the same outside of it, I get thoroughly lost. The Glucometer Protocols site has been hosted on GitHub Pages for a few years now — but I now wanted to add some diagrams, as more modern protocols (as well as some older, but messier, ones) would be much simpler to explain with UML sequence diagrams to go with the text.

The first problem was of course to find a way to generate sequence diagrams out of code that can be checked in and reviewed, rather than as binary blobs — and thankfully there are a few options. I settled on blockdiag because it’s the easiest to set up in a hurry. But it turned out that integrating it is far from as easy as it would seem.

While GitHub Pages uses Jekyll, it uses such an old version that reproducing it on Netlify is pretty much impossible. Most of the themes that are available out there are dedicated to personal sites, or ecommerce, or blogs — and even when I found one that seemed suitable for this kind of reference, I couldn’t figure out how to get the whole thing to work. And it didn’t help that Jekyll appears to be very scant on debug logging.

I tried a number of different static site generators, including a few in JavaScript (which I find particularly annoying), but the end result was almost always that they seemed more geared towards “marketing” sites (in a very loose sense) than references. To this day, I miss the simplicity of g3doc.

I ended up settling on Foliant, which appears to be geared more towards writing actual books than reference documentation, but it wraps around MkDocs, and it provides a plugin that integrates with Blockdiag (although I still have a pending pull request to support more diagram types). And with a bit of playing around with it, I managed to get Netlify to build this properly and serve it. Which is what you get now.

But of course, since MkDocs (and a number of other Python-based tools I found) appears to rely on the same Markdown library, they are not even completely compatible with the Markdown as written for Jekyll and GitHub Pages: the Python implementation is much stricter when it comes to indentation, and misses some of the features. Most of those appear to have been works in progress at some point, but there doesn’t seem to be much movement on the library itself.

Again, these are relatively simple features I came to expect for documentation. And I know that some of my (soon-to-be-former) colleagues have been working on improving the state of opensource documentation frameworks, including Lisa working on Docsy, which looks awesome — but it relies on Hugo, which I still dislike, and which seems to have taken a direction that is moving further and further away from me (the latest, as I was trying to set this up, is that to use Hugo on Linux they now seem to require you to install Homebrew, because clearly having something easy for Linux packagers to work with is not worth it, sigh).

I might reconsider that if Hugo finds a way to build images out of other tools, but I don’t have strong expectations that the needs of reference documentation will be considered in future updates to Hugo, given how it was previously socialized as a static blog engine, only to pivot to needs that would make it more “marketable”.

I even miss GuideXML, to a point. This was Gentoo’s documentation format back in the days before the Wiki. It was complex, and probably more complicated than it should have been, but at least the pipeline to generate the documentation was well defined.

Anyhow, if anyone out there has experience in setting up reference documentation sites, and wants to make it easier to maintain a repository of information on glucometers, I’ll welcome help, suggestions, pull requests, and links to documentation and tools.

Oh Gosh, Trying to Find a New Email Provider in 2020

In the year 2020, I decided to move out of my GSuite account (née Google Apps for Business), which allowed me to use Gmail for my personal domain, and which I have used for the past ten years or so. It’s not that I have a problem with Gmail (I have worked nearly seven years at Google now, why would it be a problem?) or that I think the service is not up to scratch (as this experience is proving to me, I’d argue that it’s still the best service you can rely upon for small and medium businesses — which is the area I focused on when I ran my own company). It’s just that I’m not a business, and the features that GSuite provides over the free Gmail no longer make up for the services I’m missing.

But I still wanted to be able to use my own domain for my mail, rather than going back to the standard Gmail domain. So I decided to look around, and migrate my mail to another paid, reliable, and possibly better solution. Alas, the results after a week of looking and playing around are not particularly impressive to me.

First of all I discarded, without even looking at it, the option of self-hosting my mail. I don’t have the time, nor the experience, nor the will to have to deal with my own email hosting. It’s a landmine of issues and risks and I don’t intend to accept them. So if you’re about to suggest this, feel free to not comment. I’m not going to entertain those suggestions anyway.

I ended up looking at what people have been suggesting on Twitter a few times and evaluated two options: ProtonMail and FastMail. I ended up finding both lacking. And I think I’m a bit more upset with the former than the latter, for reasons I’ll get to in this (much longer than usual) blog post.

My requirements for a replacement solution were to have a reliable webmail interface, with desktop notifications. A working Android app. And security at login. I was not particularly interested in ProtonMail’s encrypt-and-sign everything approach, but I could live with that. But I wanted something that wouldn’t risk letting everyone in with just a password, so 2FA was a must for me. I was also hoping to find something that would make it easy to deal with git send-email, but I ended up accepting right away that nothing would be anywhere close to the solution that we found with Gmail and GSuite (more on that later.)

Bad 2FA Options For All

So I started by looking at the two-factor authentication options for the two providers. Google being the earliest adopter of the U2F standard means, of course, that this is what I’ve been using, and what I would love to keep using once I replace the service. But of the two providers I was considering, only FastMail explicitly states that it supports U2F. I was told that ProtonMail expects to add support for it this year, but I couldn’t even tell that from their website.

So I tried FastMail first, which has a 30-day free trial. To set up the U2F device, you need to provide a phone number as a recovery option — which gets used for SMS OTP. I don’t like SMS OTP, because it’s not really secure (in some countries taking over a phone number is easier than taking over an email address), and because it’s not reliable the moment you don’t have mobile network service. It’s easy to mistake “no access to the mobile network” for “no access to the Internet” and say that it doesn’t really matter, but there are plenty of places where I would be able to reach the Internet and not receive SMS: planes, tube platforms, the office when I arrived in London, …

But surely U2F is enough, so why am I even bothering to complain about SMS OTP, given that you can disable it once the U2F security key is added? Well, it turns out that when I tried to log in on the Android app, I was just sent an SMS with the OTP to log myself in. Indeed, after I removed the phone number backup option, the Android app threw me a lovely error of «U2F is your only two-step verification method, but this is not supported here.» On Android, which can act as a U2F token.

As I found out afterwards, you can add a TOTP app as well, which solves the issue of logging in on Android without mobile network service, but by that point I had already started looking at ProtonMail, because it was not the best first impression to start with.

ProtonMail and the Bridge of Destiny

ProtonMail does not provide standard IMAP/SMTP access, because encryption (that’s the best reason I can get from the documentation; I’m not sure at all what this was all about, but honestly, that’s as far as I care to look into it). If you want to use a “normal” mail agent like Thunderbird, you need to use a piece of software, accessible to paying customers only, that acts as a “bridge”. As far as I can tell after using it, it appears to be mostly a way to handle the authentication rather than the encryption per se. Indeed, you log into the Bridge software with username, password and OTP, and then it provides localhost-only endpoints for IMAP4 and SMTP, with a generated local password. Neat.

Except it’s only available in Beta for Linux, so instead I ended up running it on Windows at first.
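As a rough idea of what the Bridge gives you, the snippet below shows the kind of plain IMAP access a local script gets once the Bridge is running. The host, port and credentials are placeholders: the Bridge shows its actual local endpoint and the generated password in its own interface, so don't take these values as the real defaults:

import imaplib

# Placeholder values: the Bridge displays the actual local endpoint and the
# generated password after you log in with username, password and OTP.
BRIDGE_HOST = '127.0.0.1'
BRIDGE_IMAP_PORT = 1143
BRIDGE_USER = 'user@example.com'
BRIDGE_LOCAL_PASSWORD = 'generated-by-the-bridge'

# The Bridge terminates ProtonMail's own encryption, so locally this behaves
# like any other IMAP server bound to localhost.
imap = imaplib.IMAP4(BRIDGE_HOST, BRIDGE_IMAP_PORT)
# imap.starttls()  # enable this if the Bridge's local endpoint requires it
imap.login(BRIDGE_USER, BRIDGE_LOCAL_PASSWORD)
imap.select('INBOX')
_, data = imap.search(None, 'UNSEEN')
print('Unread messages:', len(data[0].split()))
imap.logout()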

This is an interesting approach. Gmail implemented, many years ago, a new extension to IMAP (and SMTP) that allows using OAuth 2 for logins. This effectively delegates the login action to a browser, rather than executing it inline in the protocol, and as such it allows requesting OTPs, or even supporting U2F. Thunderbird on Windows works very well with this and even supports U2F out of the box.
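
For comparison, this is roughly what that OAuth 2 login looks like from an IMAP client's point of view, using the SASL XOAUTH2 mechanism that Gmail documents. The token acquisition is elided, since that is the part that gets delegated to the browser:

import imaplib

def gmail_xoauth2_login(user: str, access_token: str) -> imaplib.IMAP4_SSL:
    """Log into Gmail's IMAP endpoint using the SASL XOAUTH2 mechanism.

    The access token is obtained out of band through the usual OAuth 2
    browser flow; that is the step that lets the provider ask for OTPs
    or a U2F touch, instead of trusting a long-lived password.
    """
    auth_string = f'user={user}\x01auth=Bearer {access_token}\x01\x01'
    imap = imaplib.IMAP4_SSL('imap.gmail.com')
    imap.authenticate('XOAUTH2', lambda _challenge: auth_string.encode())
    return imap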

Sidenote: Thunderbird seems to have something silly going on. When you add a new account to it, it has a drop-down box to let you select the authentication method (or go for “Autodetect”). Unfortunately, the drop-down does not have the OAuth2 option at all. Even if you select imap.gmail.com as the server — I know hardcoding is bad, but not allowing it at all sounds worse. But if you cheat and give it 12345 as password, and select password authentication just to go through with adding the account, then you can select OAuth 2 as authentication type and it all works out.

Anyway, neither ProtonMail nor FastMail appears to have implemented this authentication method, despite the fact that, if I understood correctly, it’s supported out of the box by Thunderbird, Apple’s Mail, and a bunch of other mail clients. Indeed, if you want to use IMAP/SMTP with FastMail, they only appear to give you the option to use application-specific passwords, which is a shame.

So why did I need IMAP access to begin with? Well, I wanted to import all my mail from Gmail into ProtonMail, and I thought the easiest way to do so would be through Thunderbird, manually copying the folders I needed. That turned out to be a mistake: Thunderbird crashed while trying to copy some of the content over, and I effectively spent more time waiting for it to index things than instructing it on what to do.

Luckily there are alternative options for this.

Important Importing Tooling

ProtonMail provides another piece of software, in addition to the Bridge, to paying customers: an Import Tool. This allows you to log in to another IMAP server, and copy over the content. I decided to use that to copy over my Gmail content to ProtonMail.

First of all, the tool does not support OAuth2 authentication. To be able to access Gmail or GSuite mailboxes, it needs to use an Application-Specific Password. Annoying, but not a dealbreaker for me, since I’m not enrolled in the Advanced Protection Program, which among other things disables “less-secure apps” (i.e. those apps using Application-Specific Passwords). I generated one, logged in, and selected the labels I wanted to copy over, then went to bed, a little, but not much, concerned over the 52 and counting messages that it said it was failing to import.

I woke up to the tool reporting only 32% of around fifty thousand messages imported. I paused, then resumed, the import, hoping to get it unstuck, and left to play Pokémon with my wife, only to come back to a computer stuck at exactly the same point. I tried stopping and closing the Import Tool, but that didn’t work: it hung. I tried rebooting Windows and it refused to, because my C: drive was full. Huh?

When I went to look into it, I found a 436GB text file: the log from the software. Since the file was too big to open with nearly anything on my computer, I used good old type, and besides the initial part possibly containing useful information, most of the file repeated the same error message about not being able to parse a mime type, with no message ID or subject attached. Not useful. I had to delete the file, since my system was rejecting writes because of the drive being full, but it also does not bode well for the way the importer is written: clearly there’s no retry limit on some action, no log coalescing, and no safety check to go “Hang on, am I DoSing the local system?”

I went looking for tools I could use to sync IMAP servers manually. I found isync/mbsync, which, as a slight annoyance, is written in C and needs to be built, so it’s not easy to run on Windows, where I do have the ProtonMail Bridge; but it’s not something I can’t overcome. When I was looking at the website, it said to check the README for workarounds needed with certain servers. Unfortunately, at the time of writing, that document’s Compatibility section refers to “M$ Exchange” — which in 2020 is a very silly, juvenile, and annoying way to refer to what is possibly still the largest enterprise mail server out there. Yes, I am judging a project by its README the way you judge a book by its cover, but I would expect that a project unable to call Microsoft by its name in this day and age is unlikely to have added support for OAuth2 authentication or any of the many extensions that Gmail provides for efficient listing of messages.

I turned to FastMail to see how they implement this: importing Gmail or GSuite content can be done directly on their side: they require you to provide OAuth2 access to all your email (but then again, if you’re planning to use them as your service provider, you kind of are already doing that). It does not allow you to choose which labels you want to import: it’ll clone everything, even your trash/bin folder. At the time of writing it was importing around 180k messages. It took a while, and it showed the funny result of saying «175,784 of 172,368 messages imported.» Bonus points to FastMail for actually sending the completion note as an email, so that it can be fetched accordingly.

A side effect of FastMail doing the imports server side is that there’s no way for you to transfer ProtonMail boxes to FastMail, or any other equivalent server with server-side import: the Bridge needs to run on your local system for you to authenticate. It’s effectively an additional lock-in.

Instead of insisting on self-hosting options, I honestly feel that the FLOSS activists should maybe invest a little more thought and time on providing ways for average users with average needs to migrate their content, avoiding the lock-in. Because even if the perfect self-hosting email solution is out there, right now trying to migrate to it would be an absolute nightmare and nobody will bother, preferring to stick to their perfectly-working locked-in cloud provider.

Missing Features Mayhem

At that point I was a bit annoyed, but I had no urgency to move the old email away, for now at least. So instead I went on to check how ProtonMail worked as a primary mail interface. I changed the MX records around, set up the various verification methods, and waited. One of the nice things about migrating mail providers is that you end up realizing just how many mailing lists and other subscriptions you keep receiving, which you previously just filed away with filters.

I removed a bunch of subscriptions to open source mailing lists for projects I am no longer directly involved in, and unlikely to go back to, and then I started looking at other newsletters and promotions. For at least one of them, I thought I would probably be better served by NewsBlur‘s newsletter-to-RSS interface. As documented in the service itself, the recommended way to use this is to create a filter that takes the input newsletter and forwards them to your newsblur alias.

And here’s the first ProtonMail feature that I’m missing: there’s no way to set up forwarding filters. This is more than a bit annoying: there was mail coming to my address that I used to forward to my mother (mostly bills related to her house, before I set up a separate domain with multiple aliases that point at our two addresses), and there are still a few messages that come to me only, which I forward to my wife, where using our other alias addresses is not feasible for various reasons.

But forwarding is not the only thing missing. When I looked into ProtonMail’s filter system, I found it very lacking. You can’t filter based on an arbitrary header. You cannot filter based on a list-id! Despite the webmail being able to tell that an email came through a mailing list, and providing an explicit Unsubscribe button based on the headers, it has neither a “Filter messages like these” option like Gmail has, nor a way to select this manually. And that is a lot more annoying.

FastMail, by comparison, provides much more detailed rule support, including the ability to write rules directly in the Sieve language, and it allows forward-and-delete of email as well, which is exactly what the NewsBlur integration needs (although, to note, while you can see the interface for doing that, trial accounts can’t set up forwarding rules!) And yes, the “Add Rule from Message” flow defaults to the list identifier for the messages. Also, to one-up even Gmail on this, you can set those rules from the mobile app as well — and if you think this is not that big of a deal, just think of how much more likely you are to have spare time for this kind of boring task while waiting for your train (if you commute by train, that is).

In terms of features, it seems like FastMail has the clear upper hand. Even ignoring the calendar provided, it supports the modern “Snooze” concept, letting mail show up later in the day or the week (which is great when, say, you don’t want the unread email about your job interviews showing up in your inbox at the office), and it even has the ability to permanently delete messages in certain folders after a certain number of days — just like gmaillabelpurge! I think this last feature is the one that made me realize I really just need to use FastMail.

Sending It All Out

As I said earlier, even before trying to decide which one of the two providers to try, I gave up on the idea of being able to use either of them with git send-email to send kernel patches and similar. Neither of them supports OAuth2 authentication, and I was told there’s no way to set up a “send-only” environment.

My solution to this was to bite the bullet and deal with a real(ish) sendmail implementation again, by using a script that would connect over SSH to one of my servers, and use the postfix instance there (noting that I’m trying to cut down on having to run my own servers). I briefly considered using my HTPC for that, but then I realized that it would require me to put my home IP addresses in the SPF records for my domain, and I didn’t really want to publicise those as much.

But it turned out the information I found was incorrect. FastMail does support SMTP-only Application Specific Passwords! This is an awesomely secure feature that not even Gmail has right now, and it makes it a breeze to configure Git for it, and the worst that can happen is that someone can spoof your email address, until you figure it out. That does not mean that it’s safe to share that password around, but it does make it much less risky to keep the password on, say, your laptop.
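
As an illustration of how little is needed once you have an SMTP-only password, a plain smtplib session is enough to send a patch out. The hostname and port below are from memory rather than verified, and the addresses are obviously placeholders, so double-check them against FastMail's own documentation:

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'me@example.com'
msg['To'] = 'maintainer@example.org'
msg['Subject'] = '[PATCH] example'
msg.set_content('Patch body goes here.')

# The hostname and port are assumptions on my part; the password is the
# SMTP-only application-specific one generated in FastMail's settings.
with smtplib.SMTP_SSL('smtp.fastmail.com', 465) as smtp:
    smtp.login('me@example.com', 'smtp-only-app-password')
    smtp.send_message(msg)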

I would even venture that this is safer than the sendgmail approach that I linked above, as the other one requires full mail access with the token, which can easily be abused by an attacker.

Conclusion

So at the end of this whole odyssey, I decided to stick with FastMail.

ProtonMail sounds good on paper, but it gives me the impression that it’s overengineered in implementation, and not thought out enough in feature design. I cannot otherwise see how so many basic features (forwarding filters, send-only protocol support, C-S-c to add a CC line) would be missing. And I’m very surprised about the security angle of the whole service.

FastMail does have some rough edges, particularly in their webapp. Small things, like being able to right-click to get a context menu, would be nice. U2F support is clearly lacking: having it work in their Android app would be a huge step forward for me. And I should point out that FastMail has a much friendlier way to test its service, as the 30-day free option includes nearly all of the functionality and enough space to test an import of the data from a ten-year-old Gmail account.

LibreView online reporting service

You may remember I complained about cloud-based solutions before. I have had harsh words about what are to me irresponsible self-hosting suggestions, and I’m not particularly impressed by how every other glucometer manufacturer appears to want their tools to be used, uploading to their cloud solutions what I would expect is a trove of real-world blood sugar reports from diabetics.

But as it happens, since I’m using the FreeStyle LibreLink app on my phone, I get the data uploaded to Abbott’s LibreView anyway. The LibreView service is geo-restricted, so it might not be available in all the countries where FreeStyle Libre is present, which probably is why the standalone Windows app still exists, and why the Libre 2 does not appear to be supported by it.

I hadn’t used the service at all until this past month, when I visited the diabetic nurse at the local hospital (I had some blood sugar control issues), and she asked me to connect with their clinic. Turns out that (with the correct authorization) the staff at the clinic can access the real-time monitoring that I get from the phone. Given that this is useful to me, I find this neat, rather than creepy. Also, it seems to require authorization on both sides, and it includes an email notification, so possibly they didn’t do that bad of a job with it.

The site is also a good replacement for the desktop app, when using the app with the phone rather than the reader. It provides the same level of detail in the reports, including the “pattern insights”, and a full day-to-day view aligned by week. Generally, those reports are very useful. And they are available on the site for yourself too, not just for the clinics, which is nice.

It also turns out that the app tracks how many phones you’ve been using to scan the sensor — in my case, six, although it’s been over 1⅓ years since I last used a different one. I couldn’t see a way to remove the old phones, but at the same time, they are not reporting anything in, and there doesn’t seem to be a limit on how many you can have.

Overall it’s effectively just a web app version of the information that was already available on the phone (but hard to extract and share) or on the reader (if you are still using that). I like the interface and it seems fairly friendly.

Also, you may remember (or notice, if you read the links above) that I had an aside pointing out how Diabetes Ireland misunderstood the graphs shown in the report when the Libre reached Ireland. I guess they were not alone, because in this version of the report Abbott explicitly labels the 10th–90th percentile highlight, the 25th–75th percentile highlight, and the median line. Of course this assumes that whoever is reading the graph is aware of what “percentile” and “median” stand for — but that’s at least a huge step in the right direction.

FreeStyle Libre 2: Notes From The Deep Dive

As I wrote last week, I’ve started playing with Ghidra to dive into the FreeStyle Libre 2 software, to try and figure out how to speak the encrypted protocol that stands in the way of accessing the Libre 2 device the way we already access the Libre 1.

I’m not an expert when it comes to binary reverse engineering — most of the work I’ve done around reverse engineering has been on protocols that are not otherwise encrypted. But as I said in the previous post, the binary still includes a lot of debug logging. In particular, the logs include the name of the class and the name of the method, which made it fairly easy to track down quite a bit of information on how the software works, as well as how the protocols work.

I also got lucky and found a second implementation of their software protocol. At least a partial one. You see, there are two pieces of software that can communicate with the old Libre system: the desktop software, which appears to be available in Germany, Australia, and a few other countries, and the “driver” for LibreView, a service that allows GPs, consultants, and hospitals to remotely access the blood sugar readings of their patients. (I should write about it later.) While the main app is a single, mostly statically linked Qt graphical app, the “driver” is composed of a number of DLL modules, which makes it much easier to read.

Unfortunately it does not appear to support the Libre 2 and its encryption, but it does help in figuring out other details around the rest of the transport protocol, since it’s much better logged and provides a clearer view of the function structure — it seems like the two packages actually come from the same codebase, as a number of classes share the same name between the two implementations.

The interesting part is trying to figure out what the various codenames mean. I found the names Orpheus and Apollo in the desktop app, and I assumed the former was the Libre and the latter the Libre 2, because the encryption is implemented only on the Apollo branch of the hierarchy, in particular in a class called ApolloCryptoLib. But then again, in the “driver” I found the codenames Apollo and Athena — and since the software says it supports the “Libre Pro” (which as far as I know is the US-only version that was released a few years ago), I’m wholly confused on what’s what now.

But as I said, the software does have parallel C++ class hierarchies, implementing lower-level and higher-level access controls for the two codenames. And because the logs include the class name, it looks like most functions are instantiated twice (which is why I found it easier to figure out the flow of the non-crypto part from the “driver” module). A lot of the work I’m doing appears to be manual vtable decoding, since there are a lot of virtual methods all around.

What also became very apparent is that my hunch was right: the Libre 2 system uses basically the same higher-level protocol as the Libre 1. Indeed, I can confirm not only that the text commands sent are the same (and the expected responses are the same as well), but also that the binary protocol is parsed in the same way. So the only obstacle between glucometerutils and the Libre 2 is the encryption. Indeed, it seems like all three devices use the same protocol, which is called either Shazam, AAP or ATP — it’s not quite clear, given the different naming conventions in the code, but it’s still pretty obvious that they share the same protocol: not just the HID transport, but also the way higher-level commands are defined.
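
To give an idea of what “same protocol” means in practice, here is a rough sketch of how I understand the shared HID framing from the Libre 1 work in glucometerutils. Take the layout, and especially the message-type constant, as assumptions on my part rather than verified values:

def frame_message(message_type: int, payload: bytes) -> bytes:
    """Frame a command into a fixed-size 64-byte HID report.

    My understanding of the layout: one byte of message type, one byte of
    payload length, then the payload, zero-padded to the report size. The
    Libre 2 appears to wrap this same framing inside its encryption layer.
    """
    if len(payload) > 62:
        raise ValueError('payload too long for a single report')
    report = bytes([message_type, len(payload)]) + payload
    return report.ljust(64, b'\x00')

# Hypothetical example: a text command as the Libre 1 accepts it; 0x60 is a
# placeholder for the text-command message type, not a verified constant.
packet = frame_message(0x60, b'$serlnum?')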

Now about the encryption, what I found from looking at the software is that there are two sets of keys that are used. The first is used in the “authentication” phase, which is effectively a challenge-response between the device and the software, based on the serial number of the device, and the other is used in the encrypted communication. This was fairly easy to spot, because one of the classes in the code is named ApolloCryptoLib, and it included functions with names like Encrypt, Decrypt, and GenerateKeys.

Also, one important note: the patch (sensor) serial number is not used for the encryption of the reader’s access. This is something that comes up time and time again. Indeed, at least a few people have been telling me on Twitter that the Libre 2 sensors (or patches, as Abbott calls them) are also encrypted, and that clearly the reader uses the same protocol. But that’s not the case at all. Indeed, the same encryption happens even when no patch was ever initialized, and the information on the patches is fetched from the reader as the last part of the initialization.

Another important piece of information that I found in the code is that the encryption uses separate keys for encryption and MAC. This means that there’s an actual encryption transport layer, similar to TLS, but not similar enough to worry me so much regarding the key material present.

With the code at hand, I also managed to confirm my original basic assumptions about the initialization using sub-commands, where the same message type is sent with follow-up bytes including information on the command. The confirmation came from a log message calling the first byte in the command… subcmd. The following diagram is my current best understanding of the initialization flow:

Initialization sequence for the FreeStyle Libre 2 encryption protocol.

Unfortunately, most of the functions that I have found related to the encryption (and to the binary protocol, at least in the standalone app) ended up being quite complicated to read. At first I thought this was a side effect of some obfuscation system, but I’m no longer sure. It might be an effect of the compile/decompile cycle, but at least in Ghidra these appear as huge switch blocks, with what is effectively a state machine jumping around, even for the simplest of methods.

I took a function that is (hopefully) the least likely to get Abbott upset about me reposting it. It’s a simple function: it takes an integer and returns an integer. I called it int titfortat(int) because it took me a while to figure out what it was meant to do. It turns out to normalize the input to either 0, 1 or -1 — the latter being an error condition. It has an invocation of INT3 (a debugger trap), and it has the whole state-machine construct I’ve seen in most of the other functions. What I found out about this function is that it’s used to set a variable based on whether the generated keys are used for authentication or for the session.
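
In Python terms, my best reading of what that whole state machine boils down to is something as small as this (the exact mapping of inputs is my own interpretation, not a verified one):

def titfortat(value: int) -> int:
    # My reading of the decompiled function: the two expected states pass
    # through unchanged, anything else collapses to -1, the error condition.
    if value in (0, 1):
        return value
    return -1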

The main blocker for me right now in figuring out how the encryption works is that there appears to be an array of 21 different objects, each of which comes with what looks like a vtable, only partially implemented. It does not conform to the way Visual C++ builds objects, so maybe it’s a statically linked encryption library, or something different altogether. The functions I can reach from those objects are clearly cryptography-related: they include tables for SHA1 and SHA2 at least.

The way the objects are used is also a bit confusing: an initialization function appears to assign to each pointer in the array the value returned by a different function — but each of those functions appears to only return the value of a (different) global. Whenever the vtable-like structure is not fully implemented, it appears to point at code that simply returns an error constant. And when the code calls those objects, if an error is returned it skips the object and goes to the next.

On the other hand, this exercise is giving me a lot of insight into the overall HID transport, as well as the protocol inside of it. For example, I finally found the answer to which checksum the binary messages include! It’s a modified CRC32, except that it’s calculated 4 bits at a time instead of the usual 8, and thus requires a shortened lookup table (16 entries instead of 256) — and if you think that this is completely pointless, I tend to agree with you. I also found that some of the sub-commands for the ATP protocol include an extra byte before the actual sub-command identifier. I’m not sure how those are interpreted yet, and it does not seem to be a checksum, as they are identical for different payloads.
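
For the curious, this is what a 4-bit-at-a-time CRC32 looks like in general. Note that the polynomial, the initial value and the final XOR below are the standard CRC-32 ones; I have not verified that Abbott's modified variant uses the same parameters, so treat them as placeholders:

import binascii

# Standard reflected CRC-32 polynomial; whether Abbott's variant uses the
# same polynomial, initial value and final XOR is an assumption on my part.
_POLY = 0xEDB88320

def _make_nibble_table(poly: int = _POLY) -> list:
    """Build the 16-entry lookup table used by the 4-bit-at-a-time variant."""
    table = []
    for nibble in range(16):
        crc = nibble
        for _ in range(4):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _make_nibble_table()

def crc32_nibblewise(data: bytes) -> int:
    """CRC-32 computed 4 bits at a time, needing only 16 table entries."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 4) ^ _TABLE[(crc ^ byte) & 0xF]         # low nibble first
        crc = (crc >> 4) ^ _TABLE[(crc ^ (byte >> 4)) & 0xF]  # then high nibble
    return crc ^ 0xFFFFFFFF

# With the standard parameters this matches the byte-wise implementation.
assert crc32_nibblewise(b'123456789') == binascii.crc32(b'123456789')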

Anyway, this is clearly not enough information yet to proceed with implementing a driver, but it might be just enough to start improving the support for the binary protocol (ATP), in case the Libre 2 turns out not to understand the normal text commands. Which I find very unlikely, but we’ll have to see.