FreeStyle Libre 2: Notes From The Deep Dive

As I wrote last week, I’ve started playing with Ghidra to dive into the FreeStyle Libre 2 software, to try and figure out how to speak the encrypted protocol, which is in the way to access the Libre 2 device as we already access the Libre 1.

I’m not an expert when it comes to binary reverse engineering — most of the work I’ve done around reverse engineering has been on protocols that are not otherwise encrypted. But as I said in the previous post, the binary still included a lot of debug logs. In particular, the logs included the name of the class, and the name of the method, which made it fairly easy to track down quite a bit of information on how the software works, as well as the way the protocols work.

I also got lucky to find a second implementation of their software protocol. At least a partial one. You see, there’s two software that can communicate with the old Libre system: the desktop software that appears to be available in Germany, Australia, and a few other countries, and the “driver” for LibreView, a service that allows GPs, consultants, and hospitals to remotely access the blood sugar readings of their patients. (I should write about it later.) While the main app is a single, mostly statically linked Qt graphical app, the “driver” is composed of a number of DLL modules, which makes it much easier to read.

Unfortunately it does not appear to support the Libre 2 and its encryption, but it does help to figure out other details around the rest of the transport protocol, since it’s much better logged, and provides clearer view of the function structure — it seems like the two packages actually come from the same codebase, as a number of classes share the same name between the two implementations.

The interesting part is trying to figure out what the various codenames mean. I found the names Orpheus and Apollo in the desktop app, and I assumed the former was the Libre and the latter the Libre 2, because the encryption is implemented only on the Apollo branch of the hierarchy, in particular in a class called ApolloCryptoLib. But then again, in the “driver” I found the codenames Apollo and Athena — and since the software says it supports the “Libre Pro” (which as far as I know is the US-only version that was released a few years ago), I’m wholly confused on what’s what now.

But as I said, the software does have parallel C++ class hierarchies, implementing lower-level and higher-level access controls for the two codenames. And because the logs include the class name it looks like most functions are instantiated twice (which is why I found it easier to figure out the flow for the non-crypto part from the “driver” module.) A lot of the work I’m doing appears to be manual vtable decoding, since there’s a lot of virtual methods all around.

What also became very apparent is that my hunch was right: the Libre 2 system uses basically the same higher level protocol as the Libre 1. Indeed, I can confirm not only that the text commands sent are the same (and the expected responses are the same, as well), but also that the binary protocol is parsed in the same way. So the only obstacle between glucometerutils and the Libre 2 is the encryption. Indeed, it seems like all three devices use the same protocol, which is either called Shazam, AAP or ATP — it’s not quite clear given the different set of naming conventions in the code, but it’s still pretty obvious that they share the same protocol, not just the HID transport, but also for defining higher level commands.

Now about the encryption, what I found from looking at the software is that there are two sets of keys that are used. The first is used in the “authentication” phase, which is effectively a challenge-response between the device and the software, based on the serial number of the device, and the other is used in the encrypted communication. This was fairly easy to spot, because one of the classes in the code is named ApolloCryptoLib, and it included functions with names like Encrypt, Decrypt, and GenerateKeys.

Also one note that important: the patch (sensor) serial number is not used for the encryption of the reader’s access. This is something that comes up time and time again. Indeed at least a few people have been telling me on Twitter that the Libre 2 sensors (or patches, as Abbott calls them) are also encrypted and that clearly they use the same protocol for the reader. But that’s not the case at all. Indeed, the same encryption happens when no patch was ever initialized, and the information on the patches is fetched from the reader as the last part of the initialization.

Another important piece of information that I found in the code is that the encryption uses separate keys for encryption and MAC. This means that there’s an actual encryption transport layer, similar to TLS, but not similar enough to worry me so much regarding the key material present.

With the code at hand, I also managed to confirm my original basic assumptions about the initialization using sub-commands, where the same message type is sent with a follow-up bytes including information on the command. The confirmation came from a log message calling the first byte in the command… subcmd. The following diagram is my current best understanding of the initialization flow:

Initialization sequence for the FreeStyle Libre 2 encryption protocol.

Unfortunately, most of the functions that I have found related to the encryption (and to the binary protocol, at least in the standalone app) ended up being quite complicated to read. At first I thought this was a side effect of some obfuscation system, but I’m no longer sure. It might be an effect of the compile/decompile cycle, but at least on Ghidra these appear as huge switch blocks, with what is effectively a state machine jumping around, even for the most simple of the methods.

Preview(opens in a new tab)

I took a function that is (hopefully) the least likely to get Abbott upset for me reposting it. It’s a simple function: it takes an integer and returns an integer. I called it int titfortat(int) because it took me a while to figure out what it was meant to do. It turns out to normalize the input to either 0, 1 or -1 — the latter being an error condition. It has an invocation of INT3 (a debugger trap), and it has the whole state machine construct I’ve seen in most of the other functions. What I found about this function is that it’s used to set a variable based on whether the generated keys are used for authentication or session.

The main blocker for me right now to figure out how the encryption is working, is that it looks like there’s an array of 21 different objects, each of which comes with what looks like a vtable, and only partially implemented. It does not conform to the way Visual C++ is building objects, so maybe it’s a static encryption library linked inside, or something different altogether. The functions I can reach from those objects are clearly cryptography-related: they include tables for SHA1 and SHA2 at least.

The way the objects are used is also a bit confusing: an initialization function appears to assign to each pointer in the array the value returned by a different function — but each of the functions appear to only return the value of a (different) global. Whenever the vtable-like is not fully implemented, it appears to be pointing at code that simply return an error constant. And when the code is calling those objects, if an error is returned it skips the object and go to the next.

On the other hand, this exercise is giving me a lot of insights about the insight of the overall HID transport as well as the protocol inside of it. For example, I finally found the answer to which checksum the binary messages include! It’s a modified CRC32, except that it’s calculated over 4-bit at a time instead of the usual 8, and thus requires a shortened lookup table (16 entries instead of 256) — and if you think that this is completely pointless, I tend to agree with you. I also found that some of the sub-commands for the ATP protocol include an extra byte before the actual sub-command identifier. I’m not sure how those are interpreted yet, and it does not seem to be a checksum, as they are identical for different payloads.

Anyway, this is clearly not enough information yet to proceed with implementing a driver, but it might be just enough information to start improving the support for the binary protocol (ATP) if the Libre 2 turns out not to understand the normal text commands. Which I find very unlikely, but you we’ll have to see.

Leveling up my reverse engineering: time for Ghidra

In my quest to figure out how to download data from the Abbott FreeStyle Libre 2, I decided that figuring it out just by looking at the captures was a dead end. While my first few captures had me convinced the reader would keep sending the same challenge, and so I could at least replay that, it turned out to be wrong, and that excluded most of the simplest/silliest of encryption schemes.

As Pierre suggested me on Twitter, the easiest way to find the answer would be to analyze the binary itself. This sounded like a hugely daunting task for myself, as I don’t speak fluent Intel assembly, and particularly I don’t speak fluent Windows interfaces. The closest I got to this in the past has been the reverse engineering of the Verio, in which I ran the software on top of WinDbg and discovered, to my relief, that they not just kept some of the logging on, but also a whole lot of debug logs that made it actually fairly easy to reverse the whole protocol.

But when the options are learning enough about cryptography and cryptanalysis to break the encoding, or spend time learning how to reverse engineer a Windows binary — yeah I thought the latter would suit me better. It’s not like I have not dabbled in reversing my own binaries, or built my own (terrible) 8086 simulator in high school, because I wanted to be able to see how the registers would be affected, and didn’t want to wait for my turn to use the clunky EEPROM system we used in the lab (this whole thing is probably a story for another day).

Also, since the last time I considered reversing a binary (when I was looking at my laptop’s keyboard), there’s a huge development: the NSA released Ghidra. For those who have not heard, Ghidra is a tool to reverse engineer binaries that includes a decompiler, and a full blown UI. And it’s open source. Given that the previous best option for this was IDA Pro, with thousands of dollars of licenses expected, this opened a huge amount of doors.

So I spent a weekend in which I had some spare time to try my best on reversing the actual code coming from the official app for the Libre 2 — I’ll provide more detail of that once I understand it better, and I know which parts are critical to share, and which one would probably get me in trouble. In general, I did manage to find out quite a bit more about the software, the devices, and the protocol — if nothing else, because Abbott left a bunch of debug logging disabled, but built in (and no, this time the WinDbg trick didn’t work because they seems to have added an explicit anti-debugger exception (although, I guess I could learn to defeat that, while I’m at it).

Because I was at first a bit skeptical about my ability to do anything at all with this, I also have been running it in an Ubuntu VM, but honestly I’m considering switching back to my normal desktop because something on the Ubuntu default shell appears to mess with Java, and I can’t even run the VM at the right screen size. I have also considered running this in a Hyper-V virtual machine on my Gamestation, but the problem with that appears to be graphics acceleration: installing OpenSUSE onto it was very fast, but trying to use it was terribly sloppy. I guess the VM option is a bit nicer in the sense that I can just save it to power off the computer, as I did to add the second SSD to the NUC.

After spending the weekend on it, and making some kind of progress, and printing out some of the code to read it on paper in front of the TV with pen and marker, well… I think I’m liking the idea of this but it’ll take me quite a while, alone, to come up with enough of a description that it can be implemented cleanroom. I’ll share more details on that later. For the most part, I felt like I was for the first time cooking something that I’ve only seen made in the Great British Bake Off — because I kept reading the reports that other (much more experienced) people wrote and published, particularly reversing router firmwares.

I also, for once, found a good reason to use YouTube for tutorials. This video by MalwareTech (yes the guy who got arrested after shutting WannaCry down by chance) was a huge help to figure out features I didn’t even know I wanted, including the “Assume Register” option. Having someone who knows what he’s doing explore a tool I don’t know was very helpful, and indeed it felt like Adam Savage describing his workshop tools — a great way to learn about stuff you didn’t know you needed.

My hope is that by adding this tool to my toolbox – like Adam Savage indeed says in his Every Tool’s A Hammer (hugely recommended reading, by the way) – is that I’ll be able to use it not just to solve the mystery of the Libre 2’s encryption. But also that of the nebulous Libre 1 binary protocol, which I never figured out (there’s a few breadcrumbs I already found during my Ghidra weekend). And maybe even to figure out the protocol of one of the gaming mice I have at home, which I never completed either.

Of course all of this assumes I have a lot more free time than I have had for the past few years. But, you know, it’s something that I might have ideas about.

Also as a side note: while doing the work to figure out which address belongs to what, and particularly figure out the jumps through vtables and arrays of global objects (yeah that seems to be something they are doing), I found myself needing to do a lot of hexadecimal calculations. And while I can do conversions from decimal to binary in my head fairly easily, hex is a bit too much for me. I have been using the Python interactive interpreter for that, but that’s just too cumbersome. Instead, I decided to get myself a good old physical calculator — not least because the Android calculator is not able to do hex, and it seems like there’s a lack of “mid range” calculators: you get TI-80 emulators fairly easily, but most of the simplest calculators don’t have hex. Or they do, but they are terrible at it.

I looked up on Amazon for the cheapest scientific calculator that I could see the letters A-F on, and ordered a Casio fx-83GT X — that was a mistake. When it arrived, I realized that I didn’t pay attention to finding one with the hex key on it. The fx-83GT does indeed have the A-F inputs — but they are used for defining variables only, and the calculator does not appear to have any way to convert to hexadecimal nor to do hexadecimal-based operations. Oops.

Instead I ordered a Sharp WriteView EL-W531, which supports hex just fine. It has slightly smaller, but more satisfying, keys, but it’s yet another black gadget on my table (the Casio is light blue). I’ll probably end up looking out for a cute sticker to put on it to see it when I close it for storage.

And I decided to keep the Casio as well — not just because it’s handy to have a calculator at home when doing paperwork, even with all the computers around, but also because it might be interesting to see how much of the firmware is downloadable, and whether someone has tried flashing a different model’s firmware onto it, to expand its capabilities: I can’t believe the A-F keys are there just for the sake of variables, my guess is that they are there because the same board/case is used by a higher model that does support hex, and I’d expect that the only thing that makes it behave one way or the other is the firmware — or even just flags in it!

At any rate, expect more information about the Libre 2 later on this month or next. And if I decide to spend more time on the Casio as well, you’ll see the notes here on the blog. But for now I think I want to get at least some of my projects closer to completion.

USB capturing in 2020

The vast majority of the glucometer devices I reverse the protocol of use USB to connect to a computer. You could say that all of those that I successfully reversed up to now are USB based. Over the years, the way I capture USB packets to figure out a protocol changed significantly, starting from proprietary Windows-based sniffers, and more recently involving my own opensource trace tools. The process evolution was not always intentional — in the case of USBlyzer, it was pretty much dead a few years after I started using it, plus the author refused to document the file format, and by then even my sacrificial laptop was not powerful enough to keep running all the tools I needed.

I feel I’m close to another step on the evolution of my process, and once again it’s not because of me looking to improve the process as much as is the process not working on modern tools. Let me start by explaining what the situation is, because there are two nearly separate issues at play here.

The first issue is that either OpenSuse or the kernel changed the way the debugfs is handled. For those who have not looked at this before, debugfs is what lives in /sys/kernel/debug, and provides the more modern interface for usbmon access; the old method via /dev/usbmonX is deprecated, and Wireshark will not even show up the ability to capture USB packets without debugfs. Previously, I was able to manually change the ownership of the usbmon debugfs paths to my user, and started Wireshark as user to do the capturing, but as of January 2020, it does not seem to be possible to do that anymore: the debugfs mount is only accessible to root.

Using Wireshark as root is generally considered a really bad idea, because it has a huge attack surface, in particular when doing network captures, where the input would literally be to the discretion of external actors. It’s a tinsy bit safer when capturing USB because even when the device is fairly unknown, the traffic is not as controllable, so I would have flinched, but not terribly, to use Wireshark as root — except that I can’t sudo wireshark and have it paint on X. So the remaining alternative is to use tshark, which is a terminal utility that implements the same basics as Wireshark.

Unfortunately here’s the second problem: the last time I ran a lot of captures was when I was working on the Beurer glucometer (which I still haven’t gotten back to, because Linux 5.5 is still unreleased at the time of writing, and that’s the first version that’s not going to go into a reset loop with the device), and I was doing that work from my laptop, and that’s relevant. While the laptop’s keyboard and touchpad are USB, the ports are connected to a different bus internally. Since usbmon interfaces are set by bus, that made it very handy: I only needed to capture on the “ports” bus, and no matter how much and what I typed, it wouldn’t interfere in my captures at all.

You can probably see where this is going: I’m now using a NUC on my desk, with an external keyboard and the Elecom trackball (because I did manage to hurt my wrist while working on the laptop, but that’s a story for another post). And now all the USB 2.0 ports are connected to the same bus. Capturing the bus means getting all the events for keypresses, mouse movements, and so on.

If you have some experience with tcpdump or tshark, you’d think that this is an easy problem to solve: it’s not uncommon having to capture network packets from an SSH connection, which you want to exclude from the capture itself. And the solution for that is to apply a capture filter, such as port not 22.

Unfortunately, it looks like libpcap (which means Wireshark and tshark) does not support capture filters on usbmon. The reasoning provided is that since the capture filters for network are implemented in BPF, there’s no fallback for usbmon that does not have any BPF capabilities in the kernel. I’m not sure about the decision, but there you go. You could also argue that adding BPF to usbmon would be interesting to avoid copying too much data from the kernel, but that’s not something I have particular interest in exploring right now.

So how do you handle this? The suggested option is to capture everything, then use Wireshark to select a subset of packets and save the capture again. This should allow you to have a limited capture that you can share without risking having shared a keylogger off your system. But it also made me think a bit more.

The pcapng format, which Wireshark stores usbmon captures in, is a fairly complicated one, because it can include a lot of different protocol information, and it has multiple typed blocks to store things like hardware interface descriptions. But for USB captures, there’s not much use in the format: not only the Linux and Windows captures (the latter via usbpcap) are different formats altogether, but also the whole interface definition is, as far as I can tell, completely ignored. Instead, if you need a device descriptor, you need to scan the capture for a corresponding request (which usbmon-tools now does.)

I’m now considering just providing a simpler format to store captured data with usbmon-tools, either a simple 1:1 conversion from pcapng, with each packet just size-prefixed, and a tool to filter down the capture on the command line (because honestly, having to load Wireshark to cut down a capture is a pain), or a more complicated format that can store the descriptors separately, and maybe bundle/unbundle them across captures so that you can combine multiple fragments later. If I was in my bubble, I would be using protocol buffers, but that’s not particularly friendly to integrate in a Python module, as far as I can tell. Particularly if you want to be able to use the tools straight out of the git clone.

I guess that since I’m already using construct, I could instead design my own simplistic format. Or maybe I could just bite the bullet, use base64-encoded bytearrays, and write the whole capture session out in JSON.

As I said above, pcapng supports Windows and Linux captures differently: on Linux, the capture format is effectively the wire format of usbmon, while on Linux, it’s the format used by usbpcap. While I have not (yet, at the time of writing) added support to usbmon-tools to load the usbpcap captures, I don’t see why it shouldn’t work out that way. If I do manage to load usbpcap files, though, I would need a custom format to copy these to.

If anyone has a suggestion I’m open to them. One thing that I may try is to use Protocol Buffers but submit the generated source files to parse and serialize the object.

FreeStyle Libre 2: encrypted protocol notes

When I had to write (in a hurry) my comments on Abbott’s DMCA notice to GitHub, I note that I have not had a chance to get my hands onto the Libre 2 system yet, because it’s not sold in the UK. After that, Benjamin from MillionFriends reached out to me to hear if I would be interested in giving a try to figure out how the reader itself speaks with the software. I gladly obliged, and spent most of the time I had during a sick week to get an idea of where we are with this.

While I would lie if I said that we’re now much closer to be able to download data from a Libre 2 reader, I can at least say that we have some ideas of what’s going on right now. So let me try to introduce the problem a second, because there’s a lot of information out there, and while there’s some of it that is quite authoritative (particularly as a few people have been reverse engineering the actual binaries), a lot of it is guesswork and supposition.

The Libre and Libre 2 systems are very similar — they both have sensors, and they both have readers. Technically, you could say that the mobile app (for Android or iOS) also makes up part of the system. The DMCA notice from Abbott that stirred so much trouble above was related to modifications to the mobile application — but luckily for me, my projects and my interests lay quite far away from that. The sensors can be “read” by their respective reader devices, or with a phone with the correct app on it. When reading the sensors with the reader, the reader itself stores the historical data and a bunch more information. You can download that data from the reader onto a computer with Windows or macOS with the official software from Abbott (assuming you can download a version of it that works for you, I couldn’t find a good download page for this on the UK website, but I was provided an EU version of the software as well.)

For the Libre system, I have attempted reversing the protocol, but ultimately it was someone else contributing the details of an usable protocol that I implemented in glucometerutils. For the Libre 2, the discussion already suggested it wouldn’t be the same protocol, and that encryption was used by the Libre 2 reader. As it turns out, Abbott really does not appear to appreciate customers having easy access to their data, or third parties building tools around their devices, and they started encrypting the communication between the sensors and the reader or app in the new system.

So what’s the state with the Libre 2? Well, one of the good news is that the software that I was given works with both the Libre and Libre 2 systems, so I could compare the USB captures on both systems, and that will (as I’ll show in a moment) help significantly. It also showed that the basics of the Abbott HID protocol were maintained: most of the handshake, the message types and the maximum message size. Unfortunately it was clear right away that indeed most of the messages exchanged were encrypted, because the message length made no sense (anything over 62 as length was clearly wrong).

Now, the good news is that I have built up enough interfaces in usbmon-tools that I could use to build a much more reusable extractor for the FreeStyle protocol, and introduce more of the knowledge of the encryption to it, so that it can be used for others. Being able to release these extraction tools I write, instead of just using them myself and letting them rot, was my primary motivation behind building usbmon-tools, so you could say that that’s an achieved target.

So earlier I said that I was lucky the software works with both the Libre and Libre 2 readers. The reason why that was luck, it’s because it shows that the sequence of operations between the two is nearly the same, and that some of the messages are not actually encrypted on the Libre 2 either (namely, the keepalive messages). Here’s the handshake from my Libre 1 reader:

[ 04] H>>D 00000000: 

[ 34] H<<D 00000000: 16                                                . 

[ 0d] H>>D 00000000: 00 00 00 02                                       .... 

[ 05] H>>D 00000000: 

[ 15] H>>D 00000000: 

[ 06] H<<D 00000000: 4A 43 4D 56 30 32 31 2D  54 30 38 35 35 00        JCMV021-T0855. 

[ 35] H<<D 00000000: 32 2E 31 2E 32 00                                 2.1.2. 

[ 01] H>>D 00000000: 

[ 71] H<<D 00000000: 01                                                . 

[ 21] H>>D 00000000: 24 64 62 72 6E 75 6D 3F                           $dbrnum? 

[ 60] H<<D 00000000: 44 42 20 52 65 63 6F 72  64 20 4E 75 6D 62 65 72  DB Record Number
[ 60] H<<D 00000010: 20 3D 20 33 37 32 39 38  32 0D 0A 43 4B 53 4D 3A   = 372982..CKSM:
[ 60] H<<D 00000020: 30 30 30 30 30 37 36 31  0D 0A 43 4D 44 20 4F 4B  00000761..CMD OK
[ 60] H<<D 00000030: 0D 0A                                             .. 

[ 0a] H>>D 00000000: 00 00 37 C6 32 00 34                              ..7.2.4 

[ 0c] H<<D 00000000: 01 00 18 00                                       .... 

Funnily enough, while this matches the sequence that Xavier described for the Insulinx, and that I always reused for the other devices too, I found that most of this exchange is for the original software to figure out which device you connected. And since my tools require you to know which device you’re using, I actually cleaned up the FreeStyle support code in glucometerutils to shorten the initialization sequence.

To describe the sequence in prose, the software is requesting the serial number and software version of the reader (commands 0x05 and 0x15), then initializing (0x01) and immediately using the text command $dbrnum? to know how much data is stored on the device. Then it starts using the “binary mode” protocol that I started on years ago, but never understood.

For both the systems, I captured the connection establishment, from when the device is connected to the Windows virtual machine to the moment when you can choose what to do. The software is only requesting a minimal amount of data, but it’s still quite useful for comparison. Indeed, you can see that after using the binary protocol to fetch… something, the software sends a few more text commands to confirm the details of the device:

[ 0b] H<<D 00000000: 12 3C AD 93 0A 18 00 00  00 00 00 9C 2D EA 00 00  .<..........-...
[ 0b] H<<D 00000010: 00 00 00 0E 00 00 00 00  00 00 00 7C 2A EA 00 00  ...........|*...
[ 0b] H<<D 00000020: 00 00 00 16 00 00 00 00  00 00 00 84 2D EA 00 21  ............-..!
[ 0b] H<<D 00000030: C2 25 43                                          .%C 

[ 21] H>>D 00000000: 24 70 61 74 63 68 3F                              $patch? 

[ 0d] H>>D 00000000: 3D 12 00 00                                       =... 

[ 60] H<<D 00000000: 4C 6F 67 20 45 6D 70 74  79 0D 0A 43 4B 53 4D 3A  Log Empty..CKSM:
[ 60] H<<D 00000010: 30 30 30 30 30 33 36 38  0D 0A 43 4D 44 20 4F 4B  00000368..CMD OK
[ 60] H<<D 00000020: 0D 0A                                             .. 

[ 21] H>>D 00000000: 24 73 6E 3F                                       $sn? 

[ 60] H<<D 00000000: 4A 43 4D 56 30 32 31 2D  54 30 38 35 35 0D 0A 43  JCMV021-T0855..C
[ 60] H<<D 00000010: 4B 53 4D 3A 30 30 30 30  30 33 32 44 0D 0A 43 4D  KSM:0000032D..CM
[ 60] H<<D 00000020: 44 20 4F 4B 0D 0A                                 D OK.. 

[ 21] H>>D 00000000: 24 73 77 76 65 72 3F                              $swver? 

[ 60] H<<D 00000000: 32 2E 31 2E 32 0D 0A 43  4B 53 4D 3A 30 30 30 30  2.1.2..CKSM:0000
[ 60] H<<D 00000010: 30 31 30 38 0D 0A 43 4D  44 20 4F 4B 0D 0A        0108..CMD OK.. 

My best guess on why it’s asking again for serial number and software version, is that the data returned during the handshake is only used to select which “driver” implementation to use, while this is used to actually fill in the descriptor to show to the user.

If I look at the capture of the same actions with a Libre 2 system, the initialization is not quite the same:

[ 05] H>>D 00000000: 

[ 06] H<<D 00000000: 4D 41 47 5A 31 39 32 2D  4A 34 35 35 38 00        MAGZ192-J4558. 

[ 14] H>>D 00000000: 11                                                . 

[ 33] H<<D 00000000: 16 B1 79 F0 A1 D8 9C 6D  69 71 D9 1A C0 1A BC 7E  ..y....miq.....~ 

[ 14] H>>D 00000000: 17 6C C8 40 58 5B 3E 08  A5 40 7A C0 FE 35 91 66  .l.@X[>..@z..5.f
[ 14] H>>D 00000010: 2E 01 37 88 37 F5 94 71  79 BB                    ..7.7..qy. 

[ 33] H<<D 00000000: 18 C5 F6 DF 51 18 AB 93  9C 39 89 AC 01 DF 32 F0  ....Q....9....2.
[ 33] H<<D 00000010: 63 A8 80 99 54 4A 52 E8  96 3B 1B 44 E4 2A 6C 61  c...TJR..;.D.*la
[ 33] H<<D 00000020: 00 20                                             .  

[ 04] H>>D 00000000: 

[ 0d] H>>D 00000000: 00 00 00 02                                       .... 

[ 34] H<<D 00000000: 16                                                . 

[ 05] H>>D 00000000: 

[ 15] H>>D 00000000: 

[ 06] H<<D 00000000: 4D 41 47 5A 31 39 32 2D  4A 34 35 35 38 00        MAGZ192-J4558. 

[ 35] H<<D 00000000: 31 2E 30 2E 31 32 00                              1.0.12. 

[ 01] H>>D 00000000: 

[ 71] H<<D 00000000: 01                                                . 

[x21] H>>D 00000000: 66 C2 59 40 42 A5 09 07  28 45 34 F2 FB 2E EC B2  f.Y@B...(E4.....
[x21] H>>D 00000010: A0 BB 61 8D E9 EE 41 3E  FC 24 AD 61 FB F6 63 34  ..a...A>.$.a..c4
[x21] H>>D 00000020: 7B 7C 15 DB 93 EA 68 9F  9A A4 1E 2E 0E DE 8E A1  {|....h.........
[x21] H>>D 00000030: D6 A2 EA 53 45 2F A8 00  00 00 00 17 CF 84 64     ...SE/........d 

[x60] H<<D 00000000: 7D C1 67 28 0E 31 48 08  2C 99 88 04 DD E1 75 77  }.g(.1H.,.....uw
[x60] H<<D 00000010: 34 5A 88 CA 1F 6D 98 FD  79 42 D3 F2 4A FB C4 E8  4Z...m..yB..J...
[x60] H<<D 00000020: 75 C0 92 D5 92 CF BF 1D  F1 25 6A 78 7A F7 CE 70  u........%jxz..p
[x60] H<<D 00000030: C2 0F B9 A2 86 68 AA 00  00 00 00 F9 DE 0A AA     .....h......... 

[x0a] H>>D 00000000: 9B CA 7A AF 42 22 C6 F2  8F CA 0E 58 3F 43 9C AB  ..z.B".....X?C..
[x0a] H>>D 00000010: C7 4D 86 DF ED 07 ED F4  0B 99 D8 87 18 B5 8F 76  .M.............v
[x0a] H>>D 00000020: 69 50 4F 6C CE 86 CF E1  6D 9C A1 55 78 E0 AF DE  iPOl....m..Ux...
[x0a] H>>D 00000030: 80 C6 A0 51 38 32 8D 01  00 00 00 62 F3 67 2E     ...Q82.....b.g. 

[ 0c] H<<D 00000000: 01 00 18 00                                       .... 

[x0b] H<<D 00000000: 80 37 B7 71 7F 38 55 56  93 AC 89 65 11 F6 7F E6  .7.q.8UV...e....
[x0b] H<<D 00000010: 31 03 3E 15 48 7A 31 CC  24 AD 02 7A 09 62 FF 9C  1.>.Hz1.$..z.b..
[x0b] H<<D 00000020: D4 94 02 C9 5F FF F2 7B  3B AC F0 F7 99 1A 31 5A  ...._..{;.....1Z
[x0b] H<<D 00000030: 00 B8 7B B7 CD 4D D4 01  00 00 00 E2 D4 F1 13     ..{..M......... 

The 0x14/0x33 command/response are new — and they clearly are used to set up the encryption. Indeed, trying to send out a text command without having issued these commands has the reader respond with a 0x33 reply that I interpret as a “missing encryption” error.

But you can also see that there’s a very similar structure to the commands: after the initialization, there’s an (encrypted) text command (0x21) and response (0x60), then there’s an encrypted binary command, and more encrypted binary responses. Funnily enough, that 0x0c response is not encrypted, and the sequence of responses of the same type is very similar between the Libre 1 and Libre 2 captures as well.

The similarities don’t stop here. Let’s look at the end of the capture:

[x0b] H<<D 00000000: A3 F6 2E 9D 4E 13 68 EB  7E 37 72 97 6C F9 7B D6  ....N.h.~7r.l.{.
[x0b] H<<D 00000010: 1F 7B FB 6A 15 A8 F9 5F  BD EC 87 BC CF 5E 16 96  .{.j..._.....^..
[x0b] H<<D 00000020: EB E7 D8 EC EF B5 00 D0  18 69 D5 48 B1 D0 06 A6  .........i.H....
[x0b] H<<D 00000030: 30 1E BB 9B 04 AC 93 DE  00 00 00 B6 A2 4D 23     0............M# 

[x21] H>>D 00000000: CB A5 D7 4A 6C 3A 44 AC  D7 14 47 16 15 40 15 12  ...Jl:D...G..@..
[x21] H>>D 00000010: 8B 7C AF 15 F1 28 D1 BE  5F 38 5A 4E ED 86 7D 20  .|...(.._8ZN..} 
[x21] H>>D 00000020: 1C BA 14 6F C9 05 BD 56  63 FB 3B 2C EC 9E 3B 03  ...o...Vc.;,..;.
[x21] H>>D 00000030: 50 B1 B4 D0 F6 02 92 14  00 00 00 CF FA C2 74     P.............t 

[ 0d] H>>D 00000000: DE 13 00 00                                       .... 

[x60] H<<D 00000000: CE 96 6D CD 86 27 B4 AC  D9 46 88 90 C0 E7 DB 4A  ..m..'...F.....J
[x60] H<<D 00000010: 8D CC 8E AA 5F 1B B6 11  4E A0 2B 08 C0 01 D5 D3  ...._...N.+.....
[x60] H<<D 00000020: 7A E9 8B C2 46 4C 42 B8  0C D7 52 FA E0 8F 58 32  z...FLB...R...X2
[x60] H<<D 00000030: DE 6C 71 3F BE 4E 9A DF  00 00 00 7E 38 C6 DB     .lq?.N.....~8.. 

[x60] H<<D 00000000: 11 06 1C D2 5A AC 1D 7E  E3 4C 68 B2 83 73 DF 47  ....Z..~.Lh..s.G
[x60] H<<D 00000010: 86 05 4E 81 99 EC 29 EA  D8 79 BA 26 1B 13 98 D8  ..N...)..y.&....
[x60] H<<D 00000020: 2D FA 49 4A DF DD F9 5E  2D 47 29 AB AE 0D 52 77  -.IJ...^-G)...Rw
[x60] H<<D 00000030: 2E EB 42 EC 7E CF BB E0  00 00 00 FE D4 DC 7E     ..B.~.........~ 

… Yeah many more encrypted messages …

[x60] H<<D 00000000: 53 FE E5 56 01 BB C2 A7  67 3E A6 AB DB 8E B7 13  S..V....g>......
[x60] H<<D 00000010: 6D F7 80 5C 06 23 09 3E  49 B4 A7 8B D3 61 92 C9  m..\.#.>I....a..
[x60] H<<D 00000020: 72 1D 5A 04 AE E3 3E 05  2E 1B C7 7C 42 2D F8 42  r.Z...>....|B-.B
[x60] H<<D 00000030: 37 88 7E 16 D9 34 8B E9  00 00 00 11 EE 42 05     7.~..4.......B. 

[x21] H>>D 00000000: 01 84 3F 02 36 1E A6 82  E2 C5 BF C2 40 78 B9 CD  ..?.6.......@x..
[x21] H>>D 00000010: E9 55 17 BE E9 16 8A 52  D2 D9 85 69 E4 D5 96 7A  .U.....R...i...z
[x21] H>>D 00000020: 55 6D DF 2E AF 96 36 53  64 C5 C7 D1 B6 6F 1A 1A  Um....6Sd....o..
[x21] H>>D 00000030: 4F 2F 25 FF 58 F4 EE 15  00 00 00 F6 9A 52 64     O/%.X........Rd 

[x60] H<<D 00000000: 19 F4 D4 F0 66 11 E3 CE  47 DE 82 87 22 48 3C 8D  ....f...G..."H<.
[x60] H<<D 00000010: BA 2D C0 37 12 25 CD AB  3A 58 C2 C4 01 88 60 21  .-.7.%..:X....`!
[x60] H<<D 00000020: 15 1E D1 EE F2 90 36 CA  B0 93 92 34 60 F5 89 E0  ......6....4`...
[x60] H<<D 00000030: 64 3C 20 39 BF 4C 98 EA  00 00 00 A1 CE C5 61     d< 9.L........a 

[x21] H>>D 00000000: D5 89 18 22 97 34 CB 6E  76 C5 5A 23 48 F4 5E C6  ...".4.nv.Z#H.^.
[x21] H>>D 00000010: 0E 11 0E C9 51 BD 40 D7  81 4A DF 8A 0B EF 28 82  ....Q.@..J....(.
[x21] H>>D 00000020: 1F 14 47 BC B8 B8 FA 44  59 7A 86 14 14 4B D7 0F  ..G....DYz...K..
[x21] H>>D 00000030: 37 48 CC 1F C5 A2 9E 16  00 00 00 00 A3 EE 69     7H............i 

[x60] H<<D 00000000: 62 33 4B 90 3B 68 3A D1  01 B1 15 4C 48 A1 6E 20  b3K.;h:....LH.n 
[x60] H<<D 00000010: 12 6F BC D5 50 33 9E C3  CC 35 4E C8 46 81 3E 6B  .o..P3...5N.F.>k
[x60] H<<D 00000020: 96 17 DF D5 8C 22 5C 3A  B7 52 C2 D9 37 71 B7 E2  ....."\:.R..7q..
[x60] H<<D 00000030: 5F C4 88 81 2A 91 65 EB  00 00 00 69 E2 A8 DE     _...*.e....i... 

These are once again text commands. In particular one of them gets a response that is long enough to span multiple encrypted text responses (0x60). Given that the $patch? command on the Libre 1 suggested it’s a multirecord command, it might be that the Libre 2 actually has a long list of patches.

So my best guess of this is that, aside for the encryption, the Libre 2 and Libre 1 systems are actually pretty much the same. I’m expecting that the only thing between us and being able to download the data out of a Libre 2 system is to figure out the encryption scheme and whether we need to extract keys to be able to do so. In the latter case that is something we should proceed carefully with, because it’s probably going to be the way Abbott is going to enforce their DMCA requests.

What do we know about the encryption setup, then? Well, I had a theory, but then it got completely trashed. I still got some guesses that for now appear solid.

First of all, the new 0x14/0x33 command/reply: this is called multiple time by the software, and the reader uses the 0x33 response to tell you either the encryption is missing or wrong, so I’m assuming these commands are encryption related. But since there’s more than one meaning for these commands, it looks like the first byte for each of these selects a “sub-command”.

The 0x14,0x11 command appears to be the starting point for the encryption; the device responds with what appears to be 15 random bytes. Not 16! The first byte again appears to be a “typing” specification and is always 0x16. You could say that the 0x14,0x11 command gets a 0x33,0x16 response. In the first three captures I got from the software, the device actually sent exactly the same bytes. Then it started giving a different response for each time I called it. So I guess it might be some type of random value, that needs some entropy to be re-generated. Maybe it’s a nonce for the encryption?

The software then sends a 0x14,0x17 command, which at first seemed to have a number of constant bytes in positions, but now I’m not so sure. I guess I need to compare more captures for that to be the case. But because of the length, there’s at most 25 bytes that are sent to the device.

The 0x33,0x18 response comes back, and it includes 31 bytes, but the last two appear to be constant.

Also if I compare the three captures, two that received the same 0x33,0x16 response, and one that didn’t, there are many identical bytes between the two with the same response (but not all of them!), and very few with the third one. So it sounds like either this is a challenge-response that uses the provided nonces, or it actually uses that value to do the key derivation.

If you’re interested in trying to figure out the possible encryption behind this, the three captures are available on GitHub. And if you find anything else that you want to share with the rest of the people looking at this, please let us know.

Abbott, the Libre 2, and the takedown

A few people today messaged and mentioned me on twitter regarding the news that Abbott has requested the takedown of something related to their Libre 2. I gave a quick hot take on this on Twitter, but I guess it’s worth having something in long form to be referenced, since I’m sure this will be talked about a lot more, not least because of the ominous permalink chosen by Boing Boing (“they-literally-own-you”) and the fact that, game of telephone style, the news went from the original takedown, to Reddit phrasing it as “Abbott asserts copyright on your data”, which is both silly and untrue.

So let’s start with a bit of background, that most of the re-posters of this story probably don’t know much about. The Libre 2 is an upgrade on the FreeStyle Libre system that I wrote a lot about and that I use daily. It comes with both a reader device and with support in the LibreLink app for both Android and (on more recent iPhones) iOS. The main difference with the Libre system is that the sensors provide both NFC and BLE capabilities, with the ability to proactively notify of high- or low-blood sugar conditions, that the old NFC-only sensors cannot provide, which is more similar to CGM solutions like Dexcom‘s.

In both the Libre and Libre 2 systems, the sensors don’t report blood sugar values, like in most classic glucometers. Instead they report a number of “raw” values, including from a number of temperature sensors. There’s a great explanation of these from Pierre Vandevenne, here and here. To get a real blood sugar measurement, you need to apply some algorithm, that Abbott still refines. The algorithm is what I usually refer to as “secret sauce”, and is implemented in both the reader’s firmware and the LibreLink app itself.

Above I used the word “something” to refer to what was taken down. The reason why I say that is that Boing Boing in the title straight up calls this a “tool” — but when you read the linked post from the affected person, it is described as “details of how to patch the LibreLink app”. Since I have not seen what the repository was before it was taken down, I have no idea which one to believe exactly. In either case, it looks like Abbott does not like someone to effectively leverage their “secret sauce” to use in a different application, but in particular, it does not look like we’re talking about something like glucometerutils, that implemented the protocol “clean”, without derivation off the original software.

Indeed, Boing Boing seems to make a case that this is equivalent of implementing a file format: «[…] just because Apple’s Pages can read Word docs, it doesn’t mean that Pages is a derivative of MS Office.» Except that it’s not as clear cut. If you implemented support for one format by copying the implementation code into your software, that actually would make it a derivative work, quite obviously. In this case, if I am to believe the original report instead, the taken down content were instructions to modify Abbott’s app — and not a redistribution of it. Since I’m not a lawyer, I have no idea where that stands, but it’s clearly not as black-and-white as Boing Boing appears to make it.

As I said on twitter, this does not affect either of my projects, since neither is relying on the original software, and are rather descriptions of the protocols. They also don’t include any information or support for the Libre 2, since the protocol appears to have changed. There’s an open issue with discussion, but it also appears that this time Abbott is using some encryption on the protocol. And that might be an interesting problem, as someone might have to get up close and personal with the code to figure that part out — but if that’s the case, we’re back at needing a clean-room design for implementing it.

I also want to quote Pierre explicitly from the posts I linked above:

[…] in the Libre FRAM, what we are seeing is a real “raw” signal. While the measure of the glucose signal itself is fairly reliable, it is heavily post-processed by the Libre firmware. Specifically – and in no particular order – temperature compensation, delay compensation, de-noising… all play a role. That understanding and, to some extent, my MD training, led me to extreme caution and prevented me from releasing my “solution”, which I knew to be both incomplete and unable to handle some error conditions.

The main driver behind my decision was the well known “first do no harm” (primum non nocere) motto, an essential part of the Hippocratic Oath which I symbolically took. I still stick by it today. […]

[…]

Today, there are a lot of add-on devices that aim to transform the Libre into a full CGM. To be honest, in general, I do not like either the results they provide or their (in)convenience. None of those I have tried delivered results that would lead to an approval by a regulatory agency, none of them were stable for long periods of time. But, apparently, patients still feel they are helpful and there is now a thriving community that aims at improving them.

Pierre Vandevenne

While I have not sworn a Hippocratic Oath myself, I have similar concerns to Pierre, and I have explicitly avoided documenting the sensors’ protocol, and I won’t be merging code that tries to read them directly, even if provided.

And when it comes to copyright issues, I do weigh them fairly heavily: they are the fundamental way that Free Software even works, by respecting licenses. So I will prefer someone to provide me with the description of Abbott’s encryption protocol, rather than an implementation of it where I may be afraid of a “poisonous tree.”

Glucometer notes: GlucoRx Nexus

This is a bit of a strange post, because it would be a glucometer review, except that I bought this glucometer a year and a half ago, teased a review, but don’t actually remember if I ever wrote any notes for it. While I may be able to get a new feel for the device to write a review, I don’t even know if the meter is still being distributed, and a few of the things I’m going to write here suggest me that it might not be the case, but who knows.

I found the Nexus as an over-the-counter boxed meter at my local pharmacy, in London. To me it appears like the device was explicitly designed to be used by the elderly, not just because of the large screen and numbers, but also because it comes with a fairly big lever to drop out the test strip, something I had previously only seen in the Sannuo meter.

This is also the first meter I see with an always-on display — although it seems that the backlight turns on only when the device is woken up, and otherwise is pretty much unreadable. I guess they can afford this type of display given that the meter is powered by 2 AAA batteries, rather than CR2032 like others.

As you may have guessed by now from the top link about the teased review, this is the device that uses a Silicon Labs CP2110 HID-to-UART adapter, for which I ended up writing a pyserial driver, earlier this year. The software to download the data seems to be available from the GlucoRx website for Windows and Mac — confusingly, the website you actually download the file from is not GlucoRx’s but Taidoc’s. TaiDoc Technology Corporation being named on the label under the device, together with MedNet GmbH. A quick look around suggests TaiDoc is a Taiwanese company, and now I’m wondering if I’m missing a cultural significance around the test strips, or blood, and the push-out lever.

I want to spend a couple notes about the Windows software, which is the main reason why I don’t know if the device is still being distributed. The download I was provided today was for version 5.04.20181206 – which presumes the software was still being developed as of December last year – but it does not seem to be quite tested to work on Windows 10.

The first problem is that that the Windows Defender malware detection tool actually considers the installer itself as malware. I’m not sure why, and honestly I don’t care: I’m only using this on a 90-days expiring Windows 10 virtual machine that barely has access to the network. The other problem, is that when you try to run the setup script (yes, it’s a script, it even opens a command prompt), it tries to install the redistributable for .NET 3.5 and Crystal Reports, fail and error out. If you try to run the setup for the software itself explicitly, you’re told you need to install .NET 3.5, which is fair, but then it opens a link from Microsoft’s website that is now not found and giving you a 404. Oops.

Setting aside these two annoying, but not insurmountable problems, what remains is to figure out the protocol behind the scenes. I wrote a tool that reads a pcapng file and outputs the “chatter”, and you can find it in the usbmon-tools repository. It’s far from perfect and among other things it still does not dissect the actual CP2110 protocol — only the obvious packets that I know include data traffic to the device itself.

This is enough to figure out that the serial protocol is one of the “simplest” that I have seen. Not in the sense of being easy to reverse, but rather in term of complexity of the messages: it’s a ping-pong protocol with fixed-length 8-bytes messages, of which the last one is a simple checksum (sum-modulo-8-bit), a fixed start byte of 0x51, and a fixed end with a bit for host-to-device and device-to-host selection. Adding to the first nibble of the message to always have the same value (2), it brings down the amount of data to be passed for each message to 34-bit. Which is a pretty low amount of information even when looking at simple information as glucose readings.

At any rate, I think I already have a bit of the protocol figured out. I’ll probably finish it over the next few days and the weekend, and then I’ll post the protocol in the usual repository. Hopefully if there are other users of this device they can be well served by someone writing a tool to download the data that is not as painful to set up as the original software.

Introducing usbmon-tools

A couple of weeks ago I wrote some notes about my work in progress to implement usbmon captures handling code, and pre-announced I was going to publish more of my extraction/inspection scripts.

The good news is that the project is now released, and you can find it on GitHub as usbmon-tools with an Apache 2.0 license, and open to contributions (with a CLA, sorry about that part). This is the first open source project I release using my employer’s releasing process (for other projects, I used the IARC process instead), and I have to say I’m fairly pleased with the results.

This blog post is meant mostly as a way to explain what’s going on my head regarding this project, with the hope that contributors can help it become reality. Or that they can contribute other ideas to it, even when they are not part of my particular plans.

I want to start with a consideration on the choice of language. usbmon-tools is written in Python 3. And in particular it is restricted to Python 3.7, because I wanted to have access to type annotations, which I found extremely addictive at work. I even set up Travis CI to run mypy as part of the integration tests for the repository.

For other projects I tend to be more conservative, and wait for Debian stable to have a certain version before requiring that as a minimum, but as this is a toolset for developers primarily, I’m going to expect its public to be able to deal with Python 3.7 as the requirement. This version was released nearly a year ago, and that should be plenty of time for people to have one at hand.

As for what the project should achieve in my view, is an easy way for developers to dissect an USB snooping trace. I started by building a simplistic tool that recreates a text format trace from the pcapng file, based on the official documentation of usbmon in the kernel (I have some patches to improve on that, too, but that probably will become a post in by itself next week). It’s missing isochronous support, and it’s not totally tested, but it at least gave me a few important insight on the format itself, including the big caveat that the “id” (or tag) of the URBs is not unique.

Indeed, I think that alone is one of the most important pieces of the puzzle in the library: in addition to parsing the pcapng file itself, the library can re-tag the events so that they get a real unique identifier (UUID), making it significantly easier to analyze the traces.

My next steps on the project are to write a more generic tool to convert a USB capture into what I call my “chatter format” (similar to the one I used to discuss serial protocols), and a more specific one that converts HID traces (because HID is a more defined protocol, and we can go a level deeper in exposing this into a human-readable source). I’m also considering if it would be within reach to provide the tool a HID descriptor blob, parse it and have it used to parse the HID traffic based on it. It would make some debugging particularly easier, for instance the stuff I did when I was fixing the ELECOM DEFT trackball.

I would also love to be able to play with a trace in a more interactive manner, for instance by loading this into Jupyter notebook, so that I could try parsing the blobs interactively, but unless someone with more experience with those contributes the code, I don’t expect I’ll have much time for it.

Pull requests are more than welcome!

Working with usbmon captures

Two years ago I posted some notes on how I do USB sniffing. I have not really changed much since then, although admittedly I have not spent much time reversing glucometers in that time. But I’m finally biting the bullet and building myself a better setup.

The reasons why I’m looking for a new setup are multiple: first of all, I now have a laptop that is fast enough to run a Windows 10 VM (with Microsoft’s 90 days evaluation version). Second, the proprietary software I used for USB sniffing has not been updated since 2016 — and they still have not published any information about their CBCF format, despite their reason being stated as:

Unfortunately, there is no such documentation and I’m almost sure will
never be. The reason is straightforward – every documented thing
should stay the same indefinitely. That is very restrictive.

At this point, keeping my old Dell Vostro 3750 as a sacrificial machine just for reverse engineering is not worth it anymore. Particularly when you consider that it started being obsoleted by both software (Windows 10 appears to have lost the ability to map network shares easily, and thus provide local-network backups), and hardware (the Western Digital SSD that I installed on it can’t be updated — their update package only works for UEFI boot systems, and while technically that machine is UEFI, it only supports the CSM boot).

When looking at a new option for my setup, I also want to be able to publish more of my scripts and tooling, if nothing else because I would feel more accomplished by knowing that even the side effects of working on these projects can be reused. So this time around I want to focus on all open source tooling, and build as much of the tools to be suitable for me to release as part of my employer’s open source program, which basically means not include any device-specific information within the tooling.

I started looking at Wireshark and its support for protocol dissectors. Unfortunately it looks like USB payloads are a bit more complicated, and dissector support is not great. So once again I’ll be writing a bunch of Python scripts to convert the captured data into some “chatter” files that are suitable for human consumption, at least. So I started to take a closer look at the usbmon documentation (the last time I looked at this was over ten years ago), and see if I can process that data directly.

To be fair, Wireshark does make it much nicer to get the captures out, since the text format usbmon is not particularly easy to parse back into something you can code with — and it is “lossy” when compared with the binary structures. With that, the first thing to focus on is to support the capture format Wireshark generates, which is pcapng, with one particular (out of many) USB capture packet structures. I decided to start my work from that.

What I have right now, is an (incomplete) library that can parse a pcapng capture into objects that are easier to play with in Python. Right now it loads the whole content into memory, which might or might not be a bad limitation, but for now it will do. I guess it would also be nice if I can find a way to integrate this with Colaboratory, which is a tool I only have vague acquaintance with, but would probably be great for this kind of reverse engineering, as it looks a lot like the kind of stuff I’ve been doing by hand. That will probably be left for the future.

The primary target right now is for me to be able to reconstruct the text format of usbmon given the pcapng capture. This would at least tell me that my objects are not losing details in the construction. Unfortunately this is proving harder than expected, because the documentation of usbmon is not particularly clear, starting from the definition of the structure, that mixes sized (u32) and unsized (unsigned int) types. I hope I’ll be able to figure this out and hopefully even send changes to improve the documentation.

As you might have noticed from my Twitter rants, I maintain that the documentation needs an overhaul. From mention of “easy” things, to the fact that the current suggested format (the binary structures) is defined in terms of the text format fields — except the text format is deprecated, and the kernel actually appears to produce the text format based on the binary structures. There are also quite a few things that are not obviously documented in the kernel docs, so you need to read the source code to figure out what they mean. I’ll try rewriting sections of the documentation.

Keep reading the blog to find updates if you have interests in this.

Reverse Engineering and Serial Adapter Protocols

In the comments to my latest post on the Silicon Labs CP2110, the first comment got me more than a bit upset because it was effectively trying to mansplain to me how a serial adapter (or more properly an USB-to-UART adapter) works. Then I realized there’s one thing I can do better than complain and that is providing even more information on this for the next person who might need them. Because I wish I knew half of what I know now back when I tried to write the driver for ch314.

So first of all, what are we talking about? UART is a very wide definition for any interface that implements serial communication that can be used to transmit between a host and a device. The word “serial port” probably bring different ideas to mind depending on the background of a given person, whether it is mice and modems connected to PCs, or servers’ serial terminals, or programming interfaces for microcontrollers. For the most part, people in the “consumer world” think of serial as RS-232 but people who have experience with complex automation systems, whether it is home, industrial, or vehicle automation, have RS-485 as their main reference. None of that actually matters, since these standards mostly deal with electrical or mechanical standards.

As physical serial ports on computer stopped appearing many years ago, most of the users moved to USB adapters. These adapters are all different between each other and that’s why there’s around 40KSLOC of serial adapters drivers in the Linux kernel (according to David’s SLOCCount). And that’s without counting the remaining 1.5KSLOC for implementing CDC ACM which is the supposedly-standard approach to serial adapters.

Usually the adapters are placed either directly on the “gadget” that needs to be connected, which expose a USB connector, or on a cable used to connect to it, in which case the device usually has a TRS or similar connectors. The TRS-based serial cables appeared to become more and more popular thanks to osmocom as they are relatively inexpensive to build, both as cables and as connectors onto custom boards.

Serial interface endpoints in operating systems (/dev/tty{S,USB,ACM}* on Linux, COM* on Windows, and so on) do not only transfer data between host and device, but also provides configuration of parameters such as transmission rate and “symbol shape” — you may or may not have heard references to something like “9600n8” which is a common way to express the transmission protocol of a serial interface: 9600 symbols per second (“baud rate”), no parity, 8-bit per symbol. You can call these “out of band” parameters, as they are transmitted to the UART interface, but not to the device itself, and they are the crux of the matter of interacting with these USB-to-UART adapters.

I already wrote notes about USB sniffing, so I won’t go too much into detail there, but most of the time when you’re trying to figure out what the control software sends to a device, you start by taking a USB trace, which gives you a list of USB Request Blocks (effectively, transmission packets), and you get to figure out what’s going on there.

For those devices that use USB-to-UART adapters and actually use the OS-provided serial interface (that is, COM* under Windows, where most of the control software has to run), you could use specialised software to only intercept the communication on that interface… but I don’t know of any such modern software, while there are at least a few well-defined interface to intercept USB communication. And that would not work for software that access the USB adapter directly from userspace, which is always the case for Silicon Labs CP2110, but is also the case for some of the FTDI devices.

To be fair, for those devices that use TRS, I actually have considered just intercepting the serial protocol using the Saleae Logic Pro, but beside being overkill, it’s actually just a tiny fraction of the devices that can be intercepted that way — as the more modern ones just include the USB-to-UART chip straight onto the device, which is also the case for the meter using the CP2110 I referenced earlier.

Within the request blocks you’ll have not just the serial communication, but also all the related out-of-band information, which is usually terminated on the adapter/controller rather than being forwarded onto the device. The amount of information changes widely between adapters. Out of those I have had direct experience, I found one (TI3420) that requires a full firmware upload before it would start working, which means recording everything from the moment you plug in the device provides a lot more noise than you would expect. But most of those I dealt with had very simple interfaces, using Control transfers for out-of-band configuration, and Bulk or Interrupt1 transfers for transmitting the actual serial interface.

With these simpler interfaces, my “analysis” scripts (if you allow me the term, I don’t think they are that complicated) can produce a “chatter” file quite easily by ignoring the whole out of band configuration. Then I can analyse those chatter files to figure out the device’s actual protocol, and for the most part it’s a matter of trying between one and five combinations of transmission protocol to figure out the right one to speak to the device — in glucometerutils I have two drivers using 9600n8 and two drivers using 38400n8. In some cases, such as the TI3420 one, I actually had to figure out the configuration packet (thanks to the Linux kernel driver and the datasheet) to figure out that it was using 19200n8 instead.

But again, for those, the “decoding” is just a matter to filtering away part of the transmission to keep the useful parts. For others it’s not as easy.

0029 <<<< 00000000: 30 12                                             0.

0031 <<<< 00000000: 05 00                                             ..

0033 <<<< 00000000: 2A 03                                             *.

0035 <<<< 00000000: 42 00                                             B.

0037 <<<< 00000000: 61 00                                             a.

0039 <<<< 00000000: 79 00                                             y.

0041 <<<< 00000000: 65 00                                             e.

0043 <<<< 00000000: 72 00                                             r.

This is an excerpt from the chatter file of a session with my Contour glucometer. What happens here is that instead of buffering the transmission and sending a single request block with a whole string, the adapter (FTDI FT232RL) sends short burts, probably to reduce latency and keep a more accurate serial protocol (which is important for device that need accurate timing, for instance some in-chip programming interfaces). This would be also easy to recompose, except it also comes with

0927 <<<< 00000000: 01 60                                             .`

0929 <<<< 00000000: 01 60                                             .`

0931 <<<< 00000000: 01 60                                             .`

which I’m somehow sceptical they come from the device itself. I have not paid enough attention yet to figure out from the kernel driver whether this data is marked as coming from the device or is some kind of keepalive or synchronisation primitive of the adapter.

In the case of the CP2110, the first session I captured starts with:

0003 <<<< 00000000: 46 0A 02                                          F..

0004 >>>> 00000000: 41 01                                             A.

0006 >>>> 00000000: 50 00 00 4B 00 00 00 03  00                       P..K.....

0008 >>>> 00000000: 01 51                                             .Q

0010 >>>> 00000000: 01 22                                             ."

0012 >>>> 00000000: 01 00                                             ..

0014 >>>> 00000000: 01 00                                             ..

0016 >>>> 00000000: 01 00                                             ..

0018 >>>> 00000000: 01 00                                             ..

and I can definitely tell you that the first three URBs are not sent to the device at all. That’s because HID (the higher-level protocol that CP2110 uses on top of USB) uses the first byte of the block to identify the “report” it sends or receives. Checking these against AN434 give me a hint of what’s going on:

  • report 0x46 is “Get Version Information” — CP2110 always returns 0x0A as first byte, followed by a device version, which is unspecified; probably only used to confirm that the device is right, and possibly debugging purposes;
  • report 0x41 is “Get/Set UART Enabled” — 0x01 just means “turn on the UART”;
  • report 0x50 is “Get/Set UART Config” — and this is a bit more complex to parse: the first four bytes (0x00004b00) define the baud rate, which is 19200 symbols per second; then follows one byte for parity (0x00, no parity), one for flow control (0x00, no flow control), one for the number of data bits (0x03, 8-bit per symbol), and finally one for the stop bit (0x00, short stop bit); that’s a long way to say that this is configured as 19200n8.
  • report 0x01 is the actual data transfer, which means the transmission to the device starts with 0x51 0x22 0x00 0x00 0x00 0x00.

This means that I need a smarter analysis script that understands this protocol (which may be as simple as just ignoring anything that does not use report 0x01) to figure out what the control software is sending.

And at the same time, it needs code to know how “talk serial” to this device. Usually the out-of-bad configuration is done by a kernel driver: you ioctl() the serial device to the transmission protocol you need, the driver sends the right request block to the USB endpoint. But in the case of the CP2110 device, there’s no kernel driver implementing this, at least per Silicon Labs design: since HID devices are usually exposed to userland, and in particular to non-privileged applications, sending and receiving the reports can be done directly from the apps. So indeed there is no COM* device exposed on Windows, even with the drivers installed.

Could someone (me?) write a Linux kernel driver that expose CP2110 as a serial, rather than HID, device? Sure. It would require fiddling around with the HID subsystem a bit to have it ignore the original device, and that means it’ll probably break any application built with Silicon Labs’ own development kit, unless someone has a suggestion on how to have both interfaces available at the same time, while it would allow accessing those devices without special userland code. But I think I’ll stick with the idea of providing a Free and Open Source implementation of the protocol, for Python. And maybe add support for it to pyserial to make it easier for me to use it.


  1. All these terms make more sense if you have at least a bit of knowledge of USB works behind the scene, but I don’t want to delve too much into that.
    [return]

Yak Shaving: Silicon Labs CP2110 and Linux

One of my favourite passtimes in the past years has been reverse engineering glucometers for the sake of writing an utility package to export data to it. Sometimes, in the quest of just getting data out of a meter I end up embarking in yak shaves that are particularly bothersome, as they are useful only for me and no one else.

One of these yak shaves might be more useful to others, but it will have to be seen. I got my hands on a new meter, which I will review later on. This meter has software for Windows to download the readings, so it’s a good target for reverse engineering. What surprised me, though, was that once I connected the device to my Linux laptop first, it came up as an HID device, described as an “USB HID to UART adapter”: the device uses a CP2110 adapter chip by Silicon Labs, and it’s the first time I saw this particular chip (or even class of chip) in my life.

Effectively, this device piggybacks the HID interface, which allows vendor-specified protocols to be implemented in user space without needing in-kernel drivers. I’m not sure if I should be impressed by the cleverness or disgusted by the workaround. In either case, it means that you end up with a stacked protocol design: the glucometer protocol itself is serial-based, implemented on top of a serial-like software interface, which converts it to the CP2110 protocol, which is encapsulated into HID packets, which are then sent over USB…

The good thing is that, as the datasheet reports, the protocol is available: “Open access to interface specification”. And indeed in the download page for the device, there’s a big archive of just-about-everything, including a number of precompiled binary libraries and a bunch of documents, among which figures AN434, which describe the full interface of the device. Source code is also available, but having spot checked it, it appears it has no license specification and as such is to be considered proprietary, and possibly virulent.

So now I’m warming up to the idea of doing a bit more of yak shaving and for once trying not to just help myself. I need to understand this protocol for two purposes: one is obviously having the ability to communicate with the meter that uses that chip; the other is being able to understand what the software is telling the device and vice-versa.

This means I need to have generators for the host side, but parsers for both. Luckily, construct should make that part relatively painless, and make it very easy to write (if not maintain, given the amount of API breakages) such a parser/generator library. And of course this has to be in Python because that’s the language my utility is written in.

The other thing that I realized as I was toying with the idea of writing this is that, done right, it can be used together with facedancer, to implement the gadget side purely in Python. Which sounds like a fun project for those of us into that kind of thing.

But since this time this is going to be something more widely useful, and not restricted to my glucometer work, I’m now looking to release this using a different process, as that would allow me to respond to issues and codereviews from my office as well as during the (relatively little) spare time I have at home. So expect this to take quite a bit longer to be released.

At the end of the day, what I hope to have is an Apache 2 licensed Python library that can parse both host-to-controller and controller-to-host packets, and also implement it well enough on the client side (based on the hidapi library, likely) so that I can just import the module and use it for a new driver. Bonus points if I can sue this to implement a test fake framework to implement the tests for the glucometer.

In all of this, I want to make sure to thank Silicon Labs for releasing the specification of the protocol. It’s not always that you can just google up the device name to find the relevant protocol documentation, and even when you do it’s hard to figure out if it’s enough to implement a driver. The fact that this is possible surprised me pleasantly. On the other hand I wish they actually released their code with a license attached, and possibly a widely-usable one such as MIT or Apache 2, to allow users to use the code directly. But I can see why that wouldn’t be particularly high in their requirements.

Let’s just hope this time around I can do something for even more people.