NeoPixel Acrylic Lamp: A Few Lessons Learnt

Last month I wrote some notes about Chinese acrylic lamps, the same kind as I used successfully for my insulin reminder. As I said, I wanted to share the designs as I went along, and I do now have a public repository for what I have, although it has to be said that you shouldn’t trust it just yet: I have not managed to get this to turn on properly. So instead, in the spirit of learning from others’ mistakes, let me show you my blooper reel.

Lesson #1: Don’t Trust The Library

A picture of two SMT-manufactured acrylic lamp boards side-by-side.

These were unusable.

You may remember that in the previous post I showed the “sizing test”, which was a print of the board with no components on it, which I used to make sure that the LEDs and the screw holes would align correctly in the base. Since I had to take the measurements myself I was fairly worried I would get some of them wrong.

The size test was sent to print before I came to the conclusion that I actually wanted to use NeoPixel LEDs, instead of simple RGB LEDs, so it was printed to host “standard” 3535 common-anode LEDs. I then changed the design for a sparser one that used the WS2812B-Mini, which is a 3535 package (meaning it’s 3.5mm by 3.5mm in size), and is compatible with the NeoPixel libraries. This meant more capacitors (although modern versions of the WS2812B don’t seem to require them) but less logic around connecting these up.

As you can see from the image above, I also added space on the side to solder headers to connect past the USB-to-UART and directly to the NeoPixel row, which was probably my best choice ever. When the boards arrived, the first thing I did was connect the NeoPixel control pins to try turning them on and… I discovered that nothing worked.

The answer turned out to be that I trusted the Adafruit Eagle library too much: the pinout for the 3535 variant of the WS2812B (the “mini”) has been wrong since it was added, and an issue about it has existed since 2018. I sent a pull request to correct the pinout, but it looks like Adafruit is not really maintaining this repository anymore.

Because of the pinout mistake, there’s no way to access the NeoPixels on any of the ten boards I had printed in this batch, and I also missed one connection on the CP2104, which meant I couldn’t even use these as bulky USB-to-UART adapters as they are. But I can put this down as experience, and not worry too much about it, since it’s still a fairly cheap build by comparison with some of the components I’ve been playing with.

Lesson #2: Datasheets Can Still Lie To You

So I ordered another set of boards with a new revision: I replaced the 3535 version with the 5050 after triple checking that the pinout would be correct — while I did have a fixed part, I thought it would be better not to even suggest using a part that has a widespread broken pinout, and I did confirm I could fit the 5050 through the aperture by then. I also decided to move some components around to make the board a bit tighter and with a more recognizable shape.

The boards arrived, without the ESP32-WROOM module on them — that’s because it’s not in the list of parts that JLCPCB can provide, despite it being sold by their “sister company” LCSC. That’s alright because I procured myself the modules separately on AliExpress (before I figured out that ordering from LCSC is actually cheaper). And I started the testing in increasing order of satisfaction: can the NeoPixel be addressed? Yes! Can the CP2104 enumerate? Yes! Does it transmit/receive over serial? Yes! Does esptool recognize the newly soldered ESP32? Yes! Does it fit properly in the base with the module on, and have space to connect the USB plug? Yes! Can I flash MicroPython on it? Well…

This is where things got annoying and took me a while to straighten out. I could flash MicroPython on the ESP32 module. The programming worked fine, and I could verify the content of the flash, but I never got the REPL prompt back over serial. What gives?

Turns out I only read part of the datasheet for the module, and not the Wiki: there’s something that is not quite obvious otherwise, and that is that GPIO0 and GPIO2 are special, and shouldn’t be used for I/O. Instead, the two of them are used to select the boot mode and to enter flashing mode. Which is why GPIO0 is usually tied to a “BOOT” switch on the ESP32 breakout boards.

How does esptool handle these usually? By convention it expects the RTS and DTR lines of the serial adapter to be connected respectively to EN (reset) and GPIO0 (boot), together with “additional circuitry” to avoid keeping the board in reset mode if hardware flow control is enabled. Of course I didn’t know this when I sent these to manufacture, and I am still not sure what that additional circuitry looks like (there’s some circuitry on the SparkFun Thing Plus, but it’s not quite clear if it’s the same as what Espressif is talking about).
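To make the reset problem concrete, here is a small Python model of the two wiring options. This is a sketch under assumptions, not a verified schematic: it assumes the adapter’s DTR and RTS outputs are active-low (asserting a line pulls the pin low), and it assumes the “additional circuitry” behaves like the two-transistor cross-coupling that appears on many ESP32 development boards, where EN is pulled low only when RTS alone is asserted and GPIO0 only when DTR alone is asserted. The function names are made up for illustration.

```python
# Logic-level model only; True means the pin on the ESP32 side is high.

def direct_wiring(dtr_asserted: bool, rts_asserted: bool) -> tuple[bool, bool]:
    """RTS wired straight to EN and DTR straight to GPIO0 (assumed active-low)."""
    en_high = not rts_asserted
    gpio0_high = not dtr_asserted
    return en_high, gpio0_high


def cross_coupled(dtr_asserted: bool, rts_asserted: bool) -> tuple[bool, bool]:
    """Assumed two-transistor circuit: a pin is only pulled low when exactly
    one of the two lines is asserted."""
    dtr_pin_high = not dtr_asserted
    rts_pin_high = not rts_asserted
    en_high = not (dtr_pin_high and not rts_pin_high)     # low only for RTS alone
    gpio0_high = not (rts_pin_high and not dtr_pin_high)  # low only for DTR alone
    return en_high, gpio0_high


if __name__ == "__main__":
    print("DTR   RTS   | direct (EN, GPIO0) | cross-coupled (EN, GPIO0)")
    for dtr in (False, True):
        for rts in (False, True):
            print(dtr, rts, "|", direct_wiring(dtr, rts), "|", cross_coupled(dtr, rts))
```

In this model, a terminal program that asserts both lines when it opens the port holds EN low with direct wiring, which would be consistent with flashing working but the REPL never showing up. With the cross-coupled variant both pins stay high in that state and the chip keeps running.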

I have seen a few schematics for breadboard-compatible modules for ESP32, but I have not really found a good “best practices to include an ESP32 module into your design”, despite looking around for a while. I really hope at least documenting what can go wrong will help someone else in the future.

Next Steps

I have a third revision design that should address the mistakes I made, although at the time of writing this blog post I still need to find the “additional circuitry”, which I might just forgo and remind myself that hardware flow control with ESP32 is probably a bad idea anyway, since the lines are used for other purposes.

I also made sure this time to add reset and boot buttons, although that turned out to be a bit more of a headache just to make sure they would fit with the base. The main issue with using “classic” top-actuated buttons is that putting them on the top of the board makes it hard to find them once mounted, and putting them on the bottom risks them getting pressed once I fit the bottom of the base back in. I opted for side-actuated buttons, so that they are reachable when the board is mounted in the base, and marked on the bottom of the board, the same way as the connectors are.

I’m also wondering if I should at least provide the ability to solder in the “touch button” that is already present on the base, and maybe add a socket for connecting an IR decoder to reuse the remote controls that I now have plenty of. But that might all be over-engineering, who knows!

Unfortunately, I don’t think I’ll be making an order of a new set of boards in the immediate future. I’ve already said this on Twitter, and it’ll deserve an eventual longer-form post, but it seems like we might be moving sooner, rather than later. And while JLCPCB and LCSC are very fast at shipping orders, I will be mostly trying not to make new orders of anything until we have some certainty of where we’ll be in a few months. This is again a good time to put everything into a project box, and take it out when the situation feels a bit more stable.

My Password Manager Needs a Virtual USB Keyboard

You may remember that a couple of years ago, Tavis convinced me to write down an idea of what my ideal password manager would look like. Later the same year I also migrated from LastPass to 1Password, because it made family sharing easier, but like before this was a set of different compromises.

More recently, I came to realise that there’s one component that my password manager needs, and I really wish I could convince the folks at 1Password to implement it: a virtual USB keyboard on a stick. Let me try to explain, because this already generated negative reactions on Twitter, including from people who didn’t wait to understand how it all fits together first.

Let me start with a note: I have been thinking about something like this for a long while, but I have not been able to figure out how to make it safe and secure, which means I don’t recommend just using my random idea as it is. I then found out that someone else already came up with pretty much the same idea and did some of the legwork to get this to work, back in 2014… but nothing came of it.

The suggestion is to have some kind of USB hardware token that can be paired with a phone, say over Bluetooth, and be instructed to “type out” text via USB HID. Basically a “remote keyboard” controlled with the phone. Why? Let’s take a step back and see.

Among security professionals, there’s a general agreement that the best way to have safe passwords is to use unique, generated passwords and have them saved somewhere. There are differences in the grade of security you need, so while I do use and recommend a password manager, practically speaking, having a “passwords notebook” in a safe place is pretty much as good, particularly for less technophile people. You may disagree on this but if so please move on, as this whole post is predicated on wanting to use password managers.

Sometimes, though, you may need a password for something that cannot use a password manager. The example that comes to my mind is trying to log in to PlayStation Network on my PS3/PS4, but there are a number of other cases like that in addition to gaming consoles, such as printers/scanners, cameras (my Sony A7 needs to log in to the Sony Online account to update the onboard apps, no kidding!), and computers that are just being reinstalled.

In these cases, you end up making a choice: for something you may have to personally type out more often than not, it’s probably easier to use a so-called “memorable password”, which is also commonly (but not quite correctly) called a Diceware password. Or, alternatively, a 936 password. You may remember that I prefer a different form of memorable passwords, when it comes to passwords you need to repeatedly type out yourself very often (such as the manager’s master password, or a work access password), but for passwords that you can generate, store in a manager, and just seldom type out, 936-style passwords are definitely the way to go in my view.

In certain cases, though, you can’t easily do this either. If I remember this correctly, Sony required passwords to have digits and symbols, and not to repeat the same digit more than a certain number of times, which makes Diceware passwords not really usable for that either. So instead you get a generated password you need to spend a lot of time reading and typing, and in many cases you have to do that with on-screen keyboards that are hard to use. I often time out on my 1Password screen while doing so, and need to log in again, which is a very frustrating experience in and of itself.

But it’s not the only case where this is a problem. When you set up a computer for the first time, no matter what the operating system, you’ll most likely find yourself having to set up your password manager. In the case of 1Password, to do so you need the secret key that is stored… in 1Password itself (or, in my case, printed out and put in the safe). But typing that secret key is frustrating, and being able to just “send” it to the computer would make it a significantly easier task.

And speaking again of reinstalling computers, Windows BitLocker users will likely have their backup key connected to their Microsoft account so that they can quickly recover the key if something goes wrong. Nothing of course stops you from saving the same key in 1Password, but… wouldn’t it be nice to be able to just ask 1Password to type it for you on the computer you just finished reinstalling?

There’s one final case for which this is useful, and that’s going to be a bit controversial: using the password on a shared PC where you don’t want to log in with your password manager. I can already hear the complaints that you should never log in from a shared, untrusted PC and that’s a recipe for disaster. And I would agree, except that sometimes you just have to do that. A long time ago, I found myself using a shared computer in a hotel to download and print a ticket, because… well, it was a whole lot of multiple failures why I had to do it, but it was still required. Of course I went on and changed the password right after, but it also made me think.

When using shared computers, either in a lounge, hotel, Internet cafe (are they still a thing?), or anything like that, you need to see the password to type it in, which makes it susceptible to shoulder surfing. Again, it would be nice to have the ability to type the password in with a simpler device.

Now, the biggest complaint I have received about this suggestion is that this is complex, increases the attack surface by making the dongle a target, and that instead the devices should be properly fixed not to need any of this. All of that is correct, but it’s also trying to fight reality. Sony is not going to go and fix the PlayStation 3, particularly not now that the PS5 got announced and revealed. And some of these cases cannot be fixed: you don’t really have much of an option for the BitLocker key, aside from reading it off your Microsoft account page and typing it on a keyboard.

I agree that device login should be improved. Facebook Portal uses a device code that you need to type in on a computer or phone that is already logged in to your account. I find this particular login system much easier than typing the password with a gamepad that Sony insists on, and I’m not saying that because Facebook is my employer, but because it just makes sense.

Of course to make this option viable, you do need quite a few critical bits to be done right:

  • The dongle needs to be passive, the user needs to request a password typed out explicitly. No touch sensitive area on the dongle to type out in the style of a YubiKey. This is extremely important, as a compromise of the device should not allow any password to be compromised.
  • The user should be explicit on requesting the “type out”. On a manager like 1Password, an explicit refresh of the biometric login is likely warranted. It would be way too easy to exfiltrate a lot of passwords in a short time otherwise!
  • The password should not be sent in (an equivalent of) cleartext between the phone and the device. I honestly don’t remember what the current state of the art of Bluetooth encryption is, but it might not be enough to use the BT encryption itself.
  • There needs to be defense against tampering, which means not letting the dongle’s firmware be rewritten directly with the same HID connection that is used for type out. Since the whole point is to make it safe to use a manager-stored password on an untrusted device, having firmware flashing access would make it too easy to tamper with.
    • While I’m not a cryptography or integrity expert, my first thought would be to make sure that a shared key is negotiated between the dongle and the phone, and that on the dongle side, this is tied to some measurement registers similar to how TPM works. This would mean needing to re-pair the dongle when updating the firmware on it, which… would definitely be a good idea.
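As a rough illustration of the last point, here is a minimal Python sketch of tying the shared key to a firmware measurement. It is only meant to show why a firmware update would force a re-pairing, and it is not a reviewed security design: the function names, the label string, and the choice of SHA-256 with HMAC as the derivation are all my assumptions, and a real dongle would keep the pairing secret and the measurement in hardware rather than in application code.

```python
import hashlib
import hmac


def firmware_measurement(firmware_image: bytes) -> bytes:
    """Hash of the firmware, standing in for a TPM-style measurement register."""
    return hashlib.sha256(firmware_image).digest()


def derive_link_key(pairing_secret: bytes, firmware_image: bytes) -> bytes:
    """Derive the key protecting phone-to-dongle traffic from the secret
    negotiated at pairing time and the current firmware measurement."""
    measurement = firmware_measurement(firmware_image)
    # The label is a hypothetical domain separator, not part of any standard.
    return hmac.new(pairing_secret, b"typeout-link-key" + measurement,
                    hashlib.sha256).digest()


if __name__ == "__main__":
    secret = bytes(range(32))  # placeholder for the negotiated pairing secret
    paired_key = derive_link_key(secret, b"firmware v1")
    # Same firmware: the dongle derives the key the phone expects.
    print(paired_key == derive_link_key(secret, b"firmware v1"))
    # Modified firmware: the derived key no longer matches, so re-pair.
    print(paired_key == derive_link_key(secret, b"firmware v2"))
```

The phone would remember the key from pairing time. A dongle running different firmware derives a different key, so anything the phone sends is useless to it until the user pairs again.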

I already asked 1Password if they would consider implementing this… but I somewhat expect this is unlikely to happen until someone makes a good proof of concept of it. So if you’re better than me at modern encryption, this might be an interesting project to finish up and get to work. I even have a vague idea of a non-integrated version of this that might be useful to have: instead of being integrated with the manager, having the dongle connect with a phone app that just has a textbox and a “Type!” button would make it less secure but easier to implement today: you’d copy the password from the manager, paste it into the app, and ask it to type it out on the dongle. It would be at least a starting point.

Now if you got to this point (or you follow foone on Twitter), you may be guessing what the other problem is: USB HID doesn’t send characters but keycodes. And keycodes are dependent on the keyboard layout. That’s one of the issues that YubiKeys and similar solutions have: you either need to restrict yourself to a safe set of characters, or you end up on the server/parser side having to accept equivalence of different strings. Since this is intended for use with devices and services that are not designed for it, neither option is really feasible. In particular, the option of just allowing a safe subset doesn’t work: it would reduce the options in the alphabet due to qwerty/qwertz/azerty differences, but also would not allow some of the symbol classes that a number of services require you to use. So the only option there would be for the phone app to do the conversion between characters and keycodes based on the configured layout, and to let users change it.
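To show what that conversion involves, here is a small Python sketch that turns text into 8-byte boot-keyboard reports (modifier byte, reserved byte, six key slots) for two layouts. It is deliberately partial and the function names are hypothetical: it covers only letters and digits, and the “German” table models just the y/z swap, which is enough to show that the same character needs different key usage IDs depending on the layout the host is set to.

```python
SHIFT = 0x02  # left-shift bit in the modifier byte of the report


def us_layout() -> dict[str, tuple[int, int]]:
    """Map characters to (modifier, key usage ID) for a US QWERTY host."""
    table = {}
    for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz"):
        table[ch] = (0, 0x04 + i)              # 'a' is usage 0x04
        table[ch.upper()] = (SHIFT, 0x04 + i)
    for i, ch in enumerate("1234567890"):
        table[ch] = (0, 0x1E + i)              # '1' is usage 0x1E, '0' is 0x27
    return table


def de_layout() -> dict[str, tuple[int, int]]:
    """Partial QWERTZ model: same as US, but the y and z keys trade places."""
    table = us_layout()
    for a, b in (("y", "z"), ("Y", "Z")):
        table[a], table[b] = table[b], table[a]
    return table


def reports_for(text: str, layout: dict[str, tuple[int, int]]) -> list[bytes]:
    """Build press and release reports; raises KeyError for unmapped characters."""
    reports = []
    for ch in text:
        modifier, usage = layout[ch]
        reports.append(bytes([modifier, 0, usage, 0, 0, 0, 0, 0]))  # key down
        reports.append(bytes(8))                                    # all keys up
    return reports


if __name__ == "__main__":
    for name, layout in (("us", us_layout()), ("de", de_layout())):
        print(name, [r.hex() for r in reports_for("Zy", layout)])
```

Sending the US reports to a host configured for QWERTZ would type “Yz” instead of “Zy”, which is exactly the kind of silent corruption that makes a user-selectable layout in the phone app necessary.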

Documentation needs review tools, not Wikis

I’m a strong believer in documentation being a fundamental feature of open source, although I’m probably bad at following my own advice. While I do write down a lot of running notes on this blog, as I said before, blogs don’t replace documentation. I have indeed complained about how hard it seems to be to publish documentation that is not tied to a particular codebase, but there’s a bit more that I want to explore.

I have already discussed code reviews in the past few months, pointing out how the bubble got me used to review tooling (back in the days this would be called CASE). The main thing that I care for, with these tools, is that they reduce the cost of the review, which makes it less likely that a patch is left aside for too long. Say for three weeks, because one reviewer points out that the code you copied from one file to another is unsafe, and the other notes they missed it the first time around, but now it’s your problem to get it fixed.

In a similar spirit, “code reviews” for documentation are an incredibly powerful tool. Not just for the documentation quality, but also because of their inclusiveness. Let me explain, focusing in particular on documentation that is geared toward developers, because that’s what I know the most about. Product documentation, and documentation that is intended for end users, is something I have had barely any contact with, and I don’t think I would have the experience to discuss the matter.

So let’s say you’re looking at a tool’s wiki page, and follow the instructions in it, but get a completely different result than you expected. You think you know why (maybe something has changed in one of the tool’s dependencies, maybe the operating system is different, or maybe it never worked in the first place), and you want to fix the documentation. If you just edit the wiki, and you’re right, you’re saving a lot of time and grief for the next person that comes over to the documentation.

But what happens if you’re wrong? Well, if you’re wrong you may be misinterpreting the instructions, and maybe giving a bad suggestion to the next person coming over. You may be making the equivalent change of all the bad howto docs that say to just chmod 0777 /dev/something to make some software work, and the next person will find instructions that work, but open a huge gaping security hole in their system.

Do you edit the Wiki? Are you sure that there are enough senior engineers who know the tool and can notice you edited the wiki, and revert your change if it is wrong? You may know who has the answer, and decide to send them a note with the change, “Hey, can you check if I did it right?”, but what if they just went on a three-week vacation? What if they end up in the hospital after writing about LED lights?

And it’s not just a matter of how soon someone might spot a mistaken edit. There’s the stress of not knowing (or maybe knowing) how such a mistake would be addressed. Will it be a revert with “No, you dork!”, or will it be an edit that goes into further details of what the intention was and what the correct approach should have been in the first place? Wikipedia is an example of something I don’t enjoy editing, despite doing it from time to time. I just find some of its policies absurd: they gave me a hard time while I was trying to correct some editor’s incorrect understanding of my own project, while at the same time I found a minor “commercial open source” project with what I would call close to an advertisement piece, with all the references pointing at content written by the editor themselves, who happens to be the main person behind that project.

Review-based documentation systems – including Google’s g3doc, but also the “humble” Google Docs suggested edits! – alleviate this problem, particularly when you do provide a “fast path” for fixing obvious typos without going through the full review flow. But otherwise, they allow you to make your change, and then send it to someone who can confirm it’s right, or start discussing what the correct approach should be. And if you happen to be the person doing the review, be the rake collector, and help clear up the documentation!

Obviously, it’s not perfect: if all your senior engineers are jerks who would call a newcomer names for making a mistake in documentation, the review would be just as stressful. But it gives a significant first mover advantage: you can (often) choose who to send the review to. And let’s be honest: most jerks are bullies, and they will be less likely to call the newcomer names when the newcomer has already got a sign-off from another senior person.

This is not where it ends, either. Even when you are a senior engineer, or very well acquainted with a certain tool, you may still want to run documentation changes through someone else because you’re not sure how they will be read. For me, this often is related to the fact that English is not my native language — I may say something in such a way that is, in my head, impossible to misunderstand, and yet confuse everybody else reading it, because I’m using specialised terms, uncommon words, or I keep insisting on using a word that doesn’t mean what I think it means.

As an aside, if you read most of my past writing, you may have noticed I keep using the word sincerely when I mean honestly or truthfully. This is a false friend from Italian, where sincero means truthful. It’s one particular oddity that I was made aware of and tried very hard to get rid of, but still goes through at times. For the same reason, I tend to correct other people with the same oddity, as I trained myself to notice it.

And while non-native English speakers may think of this problem more often, that’s not to say that native English speakers never pay attention to this, or that they shouldn’t have someone else read their documentation first. In particular, when writing a tutorial it is necessary to get someone from the target audience to read through it! That means someone who is not yet acquainted with the tool, because they will likely ask you questions if you start using terms that they have never heard before, but that are completely obvious to you.

Which is why I insist that having documentation in a reviewable (not necessarily requiring a review) repository, rather than a Wiki is an inclusiveness issue: it reduces the stress for newcomers, non-native English speakers, less aggressive people, and people who might not have gone to schools with debating clubs.

And at the same time, it reduces the risk that security-hole-enabling documentation is left, even for a little while, unreviewed but live. Isn’t that good?

Windows 10, NVMe SSDs, VROC

It sounded like an easy task: my main SSD was running out of space and I decided to upgrade to a 2TB Samsung 970 NVMe drive. It would usually be an easy task, but clearly I shouldn’t expect things to be easy with the way I use a computer, even 20 years after I started doing incredibly rare stuff.

It ended up with me reinstalling Windows 10 three times, testing the Acronis backup restore procedure, buying more adapters than I ever thought I would need, and cursing my laziness when I set up a bunch of stuff in the past.

Let’s start with a bit of setup information: I’m talking about the gamestation, which I bought after moving to London because someone among the moving companies (AMC Removals in Ireland, and Simpsons Removals in London) stole my previous one. It uses an MSI X299 SLI PLUS motherboard, and when I bought it, I bought two Crucial M.2 SSDs, 1TB each: one dedicated to the operating system and applications, and the other to store the ever-expanding photos library.

At some point a year or so ago, the amount of pictures I took crossed the 1TB mark, and I needed more space for the photos. So thanks to the fact that NVMe SSDs became more affordable, and that you can pretty much turn any PCIe 3.0 x4 slot into an NVMe slot with a passive adapter, I decided to do just that, and bought a Samsung 970 EVO Plus 1TB, copied the operating system to it, and made the two older Crucial SSDs into a single “Dynamic Volume” to have more space for pictures.

At first I used a random passive adapter that I bought on Amazon, and while that worked perfectly well to connect the device, it had some trouble keeping the temperature down: Samsung’s software reported a temperature between 68°C and 75°C, which it considers “too high”. I spent a lot of time trying to find a way around this, and I ended up replacing all the fans on the machine, adding more fans, and managed to bring it down to around 60°C constantly. Phew.

A few months later, I found an advertisement for the ASUS Hyper M.2 card, which is a pretty much passive card that allows you to use up to four NVMe SSDs on a PCI-E x16 slot as long as your CPU supports “bifurcation”, which I checked that both my CPU and motherboard support. In addition to allowing you to add a ton of SSDs to a motherboard, the Hyper M.2 has a big aluminium heatsink and a fan, which makes it interesting for keeping the temperature of the SSD under control. Although I’ll be honest and say that I’m surprised that ASUS didn’t even bother adding PWM fan control: it has an on/off switch that pokes out of the chassis and that’s about it.

Now fast forward a few more months, and my main drive is also full, and also Microsoft has deprecated Dynamic Volumes in favour of Storage Spaces. I decided that I would buy a new, bigger SSD for the main drive, and then use this chance to migrate the photos to a storage space bundling together all three of the remaining SSDs. Since I already had the Hyper M.2 and I knew my CPU supported bifurcation, I thought it wouldn’t be too difficult to have all four SSDs connected together…

Bifurcation and VROC

The first thing to know is that the Hyper M.2 card, when loaded with a single NVMe SSD, behaves pretty much the same way as a normal PCI-E-to-M.2 adapter: the single SSD gets the four lanes, and is seen as a normal PCI-E device by the firmware and operating system. If you connect two or more SSDs, now things are different, and you need bifurcation support.

PCI-E bifurcation allows splitting an x8 or x16 slot (8 or 16 PCI-E lanes) into two or four x4 slots, which are needed for NVMe. It requires support from the CPU (because that’s where PCI-E lanes terminate), and from the BIOS (to configure the bifurcation), and from the operating system, for some reason that is not entirely clear to me, not being a PCI-E expert.

So the first problem I found with trying to get the second SSD to work on the Hyper M.2 is that I didn’t realise how complicated the whole selection of which PCI-E slot has how many lanes is on modern motherboards. Some slots are connected to the chipset (PCH) rather than the CPU directly, but you want the videocard and the NVMe to go to the CPU instead. When you’re using the M.2 slots, they take some of the lanes away, and it depends on whether you’re using SATA or NVMe mode which lanes they take away. And it depends on your CPU how many lanes you have available.

Pretty much, you will need to do some planning and maybe some pen-and-paper diagram to follow through. In particular, you need to remember that where the lanes are distributed is statically chosen. Even though you do have a full x16 slot at the bottom of your motherboard, and you have 16 free lanes to connect, that doesn’t mean those two are connected. Indeed it turned out that the bottom slot only has x8 at best on my CPU, and instead I needed to move the Hyper M.2 two slots up. Oops.
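The lane arithmetic behind that “Oops” is simple enough to sketch in Python. This is only a back-of-the-envelope helper with made-up names: it assumes each NVMe drive takes a fixed x4 link, that a slot without bifurcation exposes only the first drive on a passive card, and that what counts is the number of lanes electrically wired to the slot rather than its physical size.

```python
NVME_LANES = 4  # each NVMe SSD on the card uses an x4 link


def drives_on_slot(electrical_lanes: int, bifurcation: bool) -> int:
    """How many SSDs on a passive multi-drive card a slot can expose."""
    if electrical_lanes < NVME_LANES:
        return 0  # not enough lanes for a full x4 link in this simple model
    if not bifurcation:
        return 1  # without bifurcation only a single device is seen
    return electrical_lanes // NVME_LANES


if __name__ == "__main__":
    # A physical x16 slot wired as x8, like the bottom slot on my board.
    print(drives_on_slot(8, bifurcation=True))
    # A slot with all 16 lanes wired to the CPU.
    print(drives_on_slot(16, bifurcation=True))
```

With four drives planned, only a slot that really has 16 lanes from the CPU is enough, which is why the card had to move two slots up.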

The next problem was that despite Ubuntu Live being able to access both NVMe drives transparently, and the firmware being able to boot from them, Windows refused to boot, complaining about an inaccessible boot device. The answer for this one is to be found in VROC: Virtual RAID on CPU. It’s Intel’s way to implement bifurcation support for NVMe drives, and despite the name it’s not only for when you plan on using your drives in a RAID configuration. Although, let me warn here, from what I understand, bifurcation should work fine without VROC, but it looks like most firmware just enables the two together, so at least on my board you can’t use bifurcated slots without VROC enabled.

The problem with VROC is that while Ubuntu seems to pass through it natively, Windows 10 doesn’t. Even 20H1 (which is the most recent release at the time of writing) doesn’t recognize SSDs connected to a bifurcated host unless you provide it with a driver, which is why you end up with the inaccessible boot device. It’s the equivalent of building your own Linux kernel, and forgetting the disk controller driver or the SCSI disk driver. I realized that when I tried doing a clean install (hey, I do have a backup for a reason!), and the installer didn’t even see the drives, at all.

This is probably the closest I’m getting to retrocomputing, by reminding me of installing Windows XP for a bunch of clients and friends, back when AHCI became common, and having to provide a custom driver disk. Thankfully, Windows 10 can take that from USB, rather than having to fiddle around with installation media or CD swap. And indeed, the Intel drivers for VROC include a VMD (Volume Management Device) driver that allows Windows 10 to see the drives and even boot from them!

A Compromising Solution

So after that I managed to get a Windows 10 installed and set up — and one of my biggest worries went away: back when my computer was stolen and I reinstalled Windows 10, the license was still attached to the old machine, I had to call tech support to get it activated, and I wasn’t sure if it would let me re-activate it; it did.

Now, the next step for me was to make sure that the SSD had the latest firmware and was genuine and correctly set up, so I installed the Samsung Magician tools, and… it didn’t let me do any of that, because it reported Intel as the provider for the NVMe driver, despite Windows reporting the drive to be supported by its own NVMe driver. I guess what it means is that the VROC driver interferes with direct access to the devices. But it means you lose access to all SMART counters from Samsung’s own software (I expect other software might still be able to access them), with no genuineness checks and in particular no temperature warning. Given I knew that temperature had been an issue in the past, this worried me.

As far as I could tell, when using the Hyper M.2, you not only lose access to the SSD manufacturer tooling (like Magician), but I’m not even sure if Windows can still access the TRIM facilities — I didn’t manage to confirm for good, I got an error when I tried using it, but it might have been related to another issue that will become apparent later.

And to top this all off, if you do decide to move the drives out of the Hyper M.2 card, say to bring them back to the motherboard, you are back to square one with the boot device being inaccessible, because Windows will look for the VROC VMD, which will be gone.

At that point I pretty much decided that the Hyper M.2 card and the whole VROC feature wouldn’t work out for me, too many compromises. I decided to take a different approach, and instead of bringing the NVMe drives away from the M.2 slots, I planned to take the SATA drives away from the M.2 slots.

You see, the M.2 slots can carry either NVMe drives using PCI-E directly, or still common SATA SSDs — the connector is keyed, although I’m not entirely sure why, as there’s nothing preventing you from trying to connect a SATA M.2 SSD to a connector that only supports NVMe (such as the Hyper M.2), but that’s a different topic that I don’t care to research myself. What matters is that you can buy passive adapters that convert an M.2 SSD to a normal 2.5″ SATA one. You can find those on AliExpress, obviously, but I needed them quickly, so I ordered them from Amazon instead — I got Sabrent ones because they were available for immediate dispatching, but be careful because they sell both M.2 and mSATA converters, as they all use the same protocol and you just need a passive adapter.

Storage Space and the return of the Hyper M.2

After installing with the two Samsung SSDs on the motherboard’s M.2 slots I finally managed to get the Samsung Magician working, which confirmed not only that the drive is genuine, but also that it already has the latest firmware (good). Unfortunately it also told me that the temperature of the SSD was “too high”, at around 65°C.

The reason for that is that the motherboard predates the more common NVMe drives, and unlike LGR’s, it doesn’t have full aluminium heatsinks to bolt on top of the SSDs to keep the temperature down. It came instead with a silly “shield” that might be worse than not having it, and it positioned the first M.2 slot… right underneath the video card. Oops! Thankfully I do have an adapter with a heatsink that allows me to connect the single SSD to a PCI-E slot without needing to use VROC… the Hyper M.2 card. So I planned on re-opening the computer, moving the 2TB SSD to the Hyper M.2, and being done with that. Easy peasy, and since I already had the card this is probably worth it.

Honestly if I didn’t have the card I would probably have gone for one of those “cards” that have both a passive NVMe adapter and a passive SATA adapter (needing the SATA data cable, but not the power), since at that point I would have been able to keep one SATA SSD on the motherboard (they don’t get as hot it seems), but again, I worked with what I had at hand.

Then, as I said above, I also wanted to take this chance to migrate my Dynamic Volumes to the new Storage Spaces, which are supposed to be better supported and include more modern features for SSDs. So once I got everything reinstalled, I tried creating a new pool and setting it up… to no avail. The UI didn’t let me create the pool. Instead I ended up using the command line via PowerShell, and that worked fine.

Though do note the commands on Windows 10 2004/20H1 are different from older Server versions, which makes looking for solutions on ServerFault and similar very difficult. Also it turns out that between deleting Dynamic Volumes from two disks and adding them to a Storage Spaces Pool, you need to reboot your computer. And the default way to select the disk (the “Friendly Name” as Windows 10 calls it) is to use the model number — which makes things interesting when you have two pairs of SSDs with the same name (Samsung doesn’t bother adding the size to the model name as reported by Windows).

And then there’s the kicker, which honestly got me less angry than everything else that went on, but did annoy me more than I showed: Samsung Magician lost access to all the disks connected to the Storage Spaces pool! I assume this is because, from the moment they are added to the pool, Windows 10 does not show them in the Disk Management interface either, and Magician is not updated to identify disks at a lower level. It’s probably a temporary situation, but Storage Spaces are also fairly uncommon, so maybe they will not bother fixing that.

The worst part is that even the new SSD disappeared, probably for the reason noted above: it has the same name as the disk that is in the Storage Spaces Pool. Which is what made me facepalm — given I once again lost access to Samsung’s diagnostics, although I confirmed the temperature is fine, the firmware has not changed, and the drive is genuine. I guess VROC would have done just as well, if I confirmed the genuineness before going on with the reinstalling multiple times.

Conclusion

Originally, I was going to say that the Hyper M.2 is a waste of time on Windows. The fact that you can’t actually monitor the device with the Samsung software is more than just annoying — I probably should have looked for alternative monitoring software to see if I could get to the SMART counters over VROC. On Linux of course there’s no issue with that given that Magician doesn’t exist.

But if you’re going to install that many SSDs on Windows, you’re likely going to need to use Storage Spaces — in which case the fact that Magician doesn’t work is also moot, as it wouldn’t work either. The only thing you need to do is make sure that you have the drivers to install this correctly in the first place. Using the Hyper M.2 – particularly on slightly older motherboards that don’t have good enough heatsinks for their M.2 slots – turns out to be fairly useful.

Also Storage Spaces, despite being a major pain in the neck to set up on Windows 10, appear to do a fairly good job. Unlike Dynamic Volumes they do appear to balance the writing to multiple SSDs, they support TRIM, and there’s even support for preparing a disk to be removed from the pool, moving everything onto the remaining disks (assuming there’s enough space), and freeing up the drive.

If I’m not getting a new computer any time soon (and I would hope I won’t have to), I have a feeling I’ll go back to using the Hyper M.2 in VROC mode, even if it means reinstalling Windows again. Adding another 2TB or so of space for pictures wouldn’t be the cheapest idea, but it would allow expansion at a decent rate until whatever next technology arrives.

Service Announcement: Live Whiteboarding this Thursday

I’m breaking the usual post schedule to point out that this Thursday (2020-07-30) I’ve decided to attempt (again) an online whiteboarding session, this time with virtual whiteboard software instead of a physical one.

The plan is to have one hour of me rambling on and ranting about some of my projects that are not formed enough to be blog posts (such as my electronics projects), but that I would love to share with the wider community sooner rather than later.

I plan on streaming this on Twitch, for the first time ever, since their Studio software appears to be the most straightforward way to stream a single window on the screen, and it should have a decent chat system as well. So if you’re interested in hearing some of my thought process, or you’re an ex colleague who misses my office rants, you’re welcome to join us there.

The set time is 8pm London time, and I plan to rant for an hour. If there’s going to be enough interest in this, I’ll try to make this a more regular thing.

CircuitPython-powered Birch Books

This past April, after leaving one bubble and before joining the next, I found myself deciding to work on some basic electronics, to brush up my down-to-hardware skills and learn new stuff. And since I had just assembled the Lego Bookstore (Birch Books), I thought I would improve on that by adding LEDs and controlling them with a programmed microcontroller.

I’ve been back and forth on choosing which MCU to use, settling on an 8051 clone, which was not entirely straightforward to get to work, but eventually I thought I did. Until I got the actual boards to mount it on, and found out that I couldn’t get any new code on my chips, for some reasons I still haven’t figured out. Instead I decided to take a different approach and use a higher-level programming language and higher-level board, too.

Due to the new job having a different set of contribution guidelines, I had to wait a bit before making the new boards and code available, but I have described most of the plans for it in the previous blog post. The code is now dropped in the repository. And I also spent some time to tidy up some of the other boards based on the various bits and pieces I learned from the past few months of trial and error.

The CircuitPython-based implementation relies on either the MCP23016 or the MCP23017 GPIO expanders. The trick for that is that the code is looking for the former at address 32 and the latter at address 33. The pull request to support the older, clunkier ’16 expander is approved but hasn’t been merged yet at the time of writing. I briefly considered making yet another alternative board based on the MCP23018 just to add support for it to the Adafruit library, but… I’ll leave that to someone else for now.
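To make the address trick a bit more concrete, here is a minimal sketch in plain Python. It assumes the scan result is just a list of 7-bit I²C addresses (on CircuitPython that would come from something like an I²C bus scan), and `pick_expander` is a hypothetical helper name of mine, not necessarily what the repository uses.

```python
# Hypothetical helper: decide which GPIO expander is present from an I2C scan.
# The addresses match the text: MCP23016 at 32 (0x20), MCP23017 at 33 (0x21).
MCP23016_ADDRESS = 0x20  # decimal 32
MCP23017_ADDRESS = 0x21  # decimal 33


def pick_expander(scanned_addresses):
    """Return (name, address) for the first known expander found, or None."""
    found = set(scanned_addresses)
    if MCP23016_ADDRESS in found:
        return ("MCP23016", MCP23016_ADDRESS)
    if MCP23017_ADDRESS in found:
        return ("MCP23017", MCP23017_ADDRESS)
    return None


# Example with a made-up scan result: one expander plus an unrelated device.
print(pick_expander([0x21, 0x68]))
```

The point of keying on the address is that the same firmware can run on either board without a build-time switch.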

And while I used a Feather M4 to have it working at first, I ended up building the boards to the tinier Trinket M0, which turned out not just to be a good fit for the project, but also a good way to make sure the code would work on more boards. Turns out that on the Trinket, there’s no time.time() and I had to switch to monotonic() instead, and that led to me finding a bug in CircuitPython.

In truth, you can hotwire anything to these boards as long as they have I²C and can talk to the MCP23016/7. And while the connector on them is designed with my actuator board in mind, what it has is pretty much the whole 16 I/O lines coming from the expander. Were you looking for an easy way to connect a Trinket M0 with 16 I/O lines to a breadboard? There you go — you might just have to connect the top header as a male pin header on the bottom of the board. Making an adapter to use this with a Feather would be relatively straightforward.

The (nearly) end result is fairly pleasing:

The version you see there is not the most recent design that you’ll find in the repository, but rather the first prototype of this version of the board that, surprisingly, managed to work the first time — nearly. The actuator board needed some work after I set up the Trinket: at first it kept blinking really fast, because it kept going in and out of test mode (which turns on all the LEDs), and in and out of “fast forward” mode (I’ll get to that in a moment). The reason turned out to be that the inputs needed an explicit pull-down. Which is why the new version of the actuator has three resistors to the side of the buttons. I also learnt that I really should pay attention to the orientation of components on the design, since at first I messed up and pulled up one of the inputs constantly.

Let’s talk a moment about the “fast forward” mode. When I originally came to design the inputs for the project, I decided I would want to have a way to run the whole cycle faster, to make videos out of it without needing to use a time-lapse or something like that. This turned out to be a good idea for testing, as well. But it’s implemented in fundamentally different ways between the 8052 firmware and its CircuitPython counterpart.

In the 8052 version, the time is kept as a number of ticks increased by a timer interrupt. When the fast-forward mode is engaged, instead of adding one tick per interrupt, it adds sixty-four, which is the speed-up factor of the fast forward. It also means that fast-forward works similarly to a fast-forward on an old cassette tape, as it speeds up time while pressed, but it starts and ends with normal time.
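As a rough illustration, this is a small Python simulation of that tick accumulator. It is not the 8052 firmware itself, and the function and constant names are made up for the example; only the one-versus-sixty-four step comes from the description above.

```python
TICKS_PER_INTERRUPT = 1   # normal speed: one tick per timer interrupt
FAST_FORWARD_TICKS = 64   # speed-up factor while the button is held


def advance(ticks, fast_forward_pressed):
    """Simulate one timer interrupt and return the new tick count."""
    step = FAST_FORWARD_TICKS if fast_forward_pressed else TICKS_PER_INTERRUPT
    return ticks + step


ticks = 0
# 10 normal interrupts, 5 with the button held, then 10 normal ones again.
for pressed in [False] * 10 + [True] * 5 + [False] * 10:
    ticks = advance(ticks, pressed)

print(ticks)  # 10 + 5 * 64 + 10 = 340
```

Because the counter only ever accumulates, releasing the button simply resumes normal speed from wherever the fast-forward left off.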

In the CircuitPython version, the “virtual hour” (as in, which scene we’re at) is wholly dependent on the monotonic time since start up, and the fast forward mode just changes the divisor between real time and virtual time. Which means that pressing fast-forward completely changes the virtual time. Oops. I couldn’t find a decent way around that that would not have made the firmware overly complicated, so I pretty much gave up.
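To show why the virtual time jumps, here is a minimal sketch in plain Python. The divisors (60 seconds per virtual hour normally, 1 second in fast-forward) and the 24-hour cycle are placeholder assumptions of mine for illustration, not the values from the actual firmware, and `virtual_hour` is a hypothetical name.

```python
NORMAL_SECONDS_PER_HOUR = 60.0  # assumed divisor at normal speed
FAST_SECONDS_PER_HOUR = 1.0     # assumed divisor in fast-forward mode
HOURS_IN_CYCLE = 24             # assumed length of the scene cycle


def virtual_hour(monotonic_seconds, fast_forward):
    """Derive the current scene purely from time since start-up."""
    divisor = FAST_SECONDS_PER_HOUR if fast_forward else NORMAL_SECONDS_PER_HOUR
    return int(monotonic_seconds // divisor) % HOURS_IN_CYCLE


now = 600.0  # ten minutes after start-up
print(virtual_hour(now, fast_forward=False))  # 600 // 60 = 10
print(virtual_hour(now, fast_forward=True))   # 600 % 24 = 0
```

At the very same instant the two modes disagree on which hour it is, so pressing the button jumps the scene rather than just speeding it up.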

On the bright side, since I already have a repository for flame effect code for CircuitPython, it might be a good time for me to include the design for the fireplace, since it should be trivial to do so with the boards as they are right now!

Overall, I’m fairly happy with the result, although it feels less “self-contained” than I originally planned. I was honestly afraid at first that the Trinket M0 would be more power-hungry than the STC89, but it might be the opposite. Indeed I’m now having some issues with at least one of the LEDs that I put in the bookstore being “burned down” — I fear my current limiting calculation was too tied to the STC89, as noticed when the ground floor LEDs turned out to be much brighter than with the breadboard. I need to make sure I got my numbers right — and if I need to, I have some new, fancier LEDs I can install, although those were rated for 12V, in which case I might need some rethinking of the whole setup.
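For reference, the usual current limiting calculation is just Ohm’s law on the voltage left over after the LED’s forward drop. A minimal sketch follows; the supply voltage, forward voltage and target current are placeholder numbers for illustration, not measurements from my boards, and both function names are hypothetical.

```python
def series_resistor_ohms(supply_v, forward_v, target_ma):
    """Resistor needed to limit an LED to target_ma milliamps."""
    if target_ma <= 0:
        raise ValueError("target current must be positive")
    if supply_v <= forward_v:
        raise ValueError("supply must exceed the LED forward voltage")
    return (supply_v - forward_v) * 1000.0 / target_ma


def led_current_ma(supply_v, forward_v, resistor_ohms):
    """Current through the LED for a given series resistor, in milliamps."""
    return (supply_v - forward_v) * 1000.0 / resistor_ohms


# Placeholder values: 5 V supply, 2 V forward drop, 10 mA target.
print(series_resistor_ohms(5.0, 2.0, 10))
# Same resistor, but with a lower assumed forward drop: more current.
print(led_current_ma(5.0, 1.8, 300))
```

The second call is the interesting one: a resistor sized for one set of assumptions lets more current through when the actual voltages differ, which is the kind of mismatch I need to rule out.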

Now, to wrap this up, a few words about this whole experience: I have worked on hardware before, but never down to this level. I have worked on firmware for very tiny systems that go into HVAC control planes, and for bigger systems built with COTS components but treated and sold as embedded solutions. But the closest to anything like this for me has been high school, which is over fifteen years ago!

This journey has been a learning experience, and I want to thank the folks over on the Adafruit Discord server. I also welcome any comment and critique on the code or the board designs — the latter particularly because I really have no idea what I was doing, and I had to improvise.

I also want to keep talking about mistakes as I make them, and even wonder out loud when I think I did something wrong, for two reasons: first of all, as I just said, I welcome more critiques of my thought process — if I’m going around something the wrong way, please let me know. The second reason is a bit more “squishy”: while making these mistakes is a sure way to learn more, they are not cheap mistakes. Ordering another set of five prototypes is around £5, but then you need to batch a few and spread over the shipping costs, plus the “bill of materials” tends to add up, even if the single components are just a few cents. I hope I’ll be saving someone else’s money when they look around for “how did someone else do this for this stuff?”

Windows 10, OpenSSH and YubiKey

You may remember that a few months ago I suggested that Windows 10 is an interesting FLOSS development platform now, and that I decided to start using Windows 10 on my Dell XPS laptop (also in the hope that the problem I had with the battery would be caused by Linux — and the answer to that is “nope”, indeed the laptop’s battery is terrible.) One of the things I realised setting all of those up, is that I found myself unable to use my usual OpenPGP-based token, and I thought I would try using a YubiKey 5 instead.

Now, between me and Yubico there’s not much love lost, but I thought I would try to make my life easier by using a smartcard that seemed to have a company interested in this kind of usage behind it. Turns out that this was only partially working, unfortunately.

The plan was to set up the PIV mode of the YubiKey 5 to provide the authentication certificate, rather than trying to use the OpenPGP mode. The reason for that is to be found on Yubico’s own website:

GPG4Win’s smart card support is not rock solid; occasionally you might get error messages when trying to access the YubiKey. It might happen after removing and re-inserting the YubiKey, or after your computer has been in sleep mode, etc. This can be resolved by restarting gpg-agent [snip]

Given that GnuPG’s own smartcard support is kind of terrible already, and not wanting to get into the yak shaving of getting that to work on Windows, I was hoping to use the more common (on Windows) interface of PKCS#11, which OpenSSH supports natively (sort of). To give a very quick and oversimplified summary, PKCS#11 is the definition of an API/ABI that end user software, such as OpenSSH, can use to interface with middleware that provides access to PKI-related functions. Many smartcard manufacturers provide ready made middleware implementing a PKCS#11 interface, which I thought Windows supported directly, but I may be wrong. Mozilla browsers rely on this particular interface to handle CA certificates as well, to the point that the NSS library that Mozilla uses is pretty much a two-part component with a PKCS#11 provider and a PKCS#11 client.

As it turns out, Yubico develops a PKCS#11 middleware for YubiKey as part of yubico-piv-tool, and provides documentation on how to use it for SSH authentication. Unfortunately the instructions don’t really extend to the information needed for using this on Windows, as they explicitly say at the top of the page. But that would never stop me, after all. Most of the setup described in that document is perfectly applicable to Windows, by the way — until you get to the first issue…

The first issue with setting this up is that while Windows 10 does ship with OpenSSH client (and server), it does not ship with PKCS#11 support enabled. Indeed, the version provided even with 20H1 (the current most recent non-Insider build) is 7.7p1, while the current upstream release would be 8.3p1. Thankfully, Microsoft is providing a more up to date build, although that’s also still blocked at 8.1p1. The important part is that these binaries do include PKCS#11 support.

For this whole thing to work, you need to have both the OpenSSH binaries provided by Microsoft, and the Yubico libraries (DLL) in folders that are part of the PATH environment variable. And they also need to match the ABI. So if you’re setting this up on an x64 system, and used the 64-bit OpenSSH build, you should install the 64-bit Yubico PIV Tool, and vice-versa for 32-bit installs.

Now, despite the installer warning you that to use the PKCS#11 provider you need to have the bin folder in the PATH variable, and that loading the provider with the full path will not be enough… the installer does not offer to modify the PATH itself, unlike the Git installer that does, to make it easy to use globally. This is not too terrible, because you also need to add the new OpenSSH to the PATH. For myself, I decided to use a simple OpenSSH folder in my home.

Modifying the environment variables in (English) Windows 10 is fairly straightforward: hit the Search function, and type Environment — it’ll come up with the right control panel, and you can then edit the PATH variable and just browse for the right folder.

There is one more thing you need to do, and that is to create a .ssh/config file in your home directory with the following content:

PKCS11Provider libykcs11.dll

This instructs OpenSSH to look for the Yubico PKCS#11 provider automatically instead of having to specify it on the command line. Note once again that while you could provide the full path to the DLL file, if you didn’t add it to the PATH, it would likely not load — Windows 10 is stricter in where to look for dependencies when dynamically loading a DLL. And also, you’ll get a “not a valid win32 application” error if you installed/configured the wrong version of the Yubico tool (32-bit vs 64-bit).
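Since most of the failure modes here come down to what is (or is not) on the PATH, a quick sanity check can save some head scratching. This is a minimal sketch in Python that walks the PATH entries looking for a given file; `find_on_path` is a hypothetical helper of mine, and it only checks that the files exist, not whether the 32-bit/64-bit builds match.

```python
import os


def find_on_path(filename, path_value=None, sep=os.pathsep):
    """Return the first PATH folder entry containing filename, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for folder in path_value.split(sep):
        folder = folder.strip('"')  # PATH entries are sometimes quoted
        if not folder:
            continue
        candidate = os.path.join(folder, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# The two files this setup depends on: the OpenSSH client and the provider.
for name in ("ssh.exe", "libykcs11.dll"):
    print(name, "->", find_on_path(name) or "not found on PATH")
```

If the DLL shows up as not found, that would be consistent with the provider failing to load even when given a full path.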

After that is done, ta-dah! It should work fine!

Screenshot of Windows PowerShell using a YubiKey 5 to authenticate to a Gentoo Linux system.

This works, when using PowerShell. You get asked to enter the PIN for the YubiKey, and you login just fine. Working exactly as intended there.

Unfortunately, the next step I wanted to use this for is to use VSCode to connect to my NUC, and work on things like usbmon-tools remotely, so for that to work, I needed to be able to use this authentication method through the Visual Studio Code remote host mode… and that’s not working at the time of writing. The prompt comes up, but VSCode does not appear to proxy it to anything in its UI for me to answer it.

I’m surprised, because as far as I can tell, the code responsible for the prompt uses the correct read_passphrase() function call for it to be a prompt proxied to the askpass implementation, which I thought was already taken care of by VSCode. I have not spent too much time debugging this problem yet, but if someone is more familiar than me with VSCode and can track down what should happen there, I’d be very happy to hear about it. For now, I filed an issue.

Update 2020-08-04: Rob Lourens from Microsoft moved the issue to the right repository and pointed to another issue (filed later but in the right place).

The workaround to use this from VSCode is to make sure that "remote.SSH.useLocalServer": true is set, and click on the Details link at the bottom-right corner when it’s trying to connect, to type in the PIN. At which point everything seems to work fine, and it even uses the connection multiplexer to avoid requesting it all the time.

Screenshot of Visual Studio Code showing a remote SSH connection to my Linux NUC with usbmon-tools open.

Cultural Diversity

I would say that in most cases, I’m the worst person to talk about diversity as, like John Scalzi said, I’m playing at the lowest difficulty setting. There is a small exception to this when the matter relates to cultural diversity in a vastly USA-based environment, which is what I would like to spend a few words on this time.

Let’s start with language. For open source developers, working with people from different countries, and for which English is not the native language, is not uncommon at all — indeed, projects such as Gentoo and VideoLan tend to have overall more people for whom English is a second (if not third) language. There is a difference, though, when it comes to work environments such as my previous bubble, where you have to speak English a lot — impromptu, on the spot.

It took adjustment when I moved to Dublin, despite having spent most of the previous year in Los Angeles: on one hand, South-Californian English and Dublin English are significantly different in tone and intentions, and on the other hand, it required “checking the gain” on what people coming from other languages meant with certain words. Again, not something totally new for those who spent time in various Open Source projects, but even IRC allowed you to take a moment to type an answer back, or to re-read what the other person said. And while the voice tone and body language help, it’s still harder to process, understand, and form a reply in your second (or third) language in real-time than it would be over an asynchronous medium.

London was another kettle of fish altogether — maybe because I have listened to enough Radio 4 to grasp many Londoners’ expressions faster than I picked them up in Dublin, or maybe because I have built up the experience there. But that doesn’t mean it wasn’t hard — indeed in my experience I found that British-born-and-raised people tend to be (unwittingly) less forgiving for mis-speaking, expecting every word used to have been carefully chosen by the speaker, including any obvious-to-them rude turn of phrase. This doesn’t appear to be my impression only — the FT’s Michael Skapinker wrote about it, more than once, and I would suggest reading his articles to both the English native speakers, and those of us “second-languagers” that find it hard to work productively with them.

Now, before somebody says that I’m painting the whole group of native English speakers with a single brush — that is certainly not the case. I already singled out the Dubliners before, and I have plenty of friends, colleagues, and ex-colleagues that have learnt not to assume that every word is perfectly weighted beforehand, particularly in spoken lines, and have asked me more than a few times if the word I used was meant to sound as aggressive as it did.

There’s another fun interaction that I learnt to appreciate: talking shop with people coming from cultures that are very direct, such as some of my Romanian friends — we can start cussing and repeat the word shit many times in a row, get excited, and maybe even disagree vehemently on concepts, solutions, and decisions, … and then go grab coffee together like nothing happened. At least once I’ve been asked not quite directly whether I have been a hypocrite — but no, it’s just that for me (us) technical disagreements among friends are just that… technical disagreements.

If all of this boggles your mind and sounds like me trying to justify my tone or behaviour, I would suggest you read The Culture Map. The book is a fascinating read, and goes into a lot of examples of how different culture “baselines” differ — with repeated reminders that a culture baseline does not mean that everyone in that culture behaves exactly the same way, we’re not in a world of hats.

Another point that I feel should be spelled out explicitly is about general (popular) cultural references — they don’t really translate very well, along different axes. I have a friend who gets annoyed at Harry Potter references in documentation and service naming, because they didn’t actually care for the series, and so anything that feels “obvious” to a fan goes straight over their head, and I sympathise.

I know of similar complaints with most other “big fandoms”: Star Wars, Star Trek, Game of Thrones, Lord of the Rings, Dragonball, Naruto, … it’s a type of gatekeeping that is subtle, but still present: if you don’t happen to have at least dabbled in most of these, things will be slightly harder for you, and give you a feeling that you’re not welcome. Also it turns out that those who read/watched some of those big names in another language might be just as annoyed, because a lot of times names and terms got translated to something different, maybe closer to the target culture, that makes the reference even harder to grasp.

Also, let me be clear that this is not only a problem within the tech spaces dominated by white male geeky engineers. A few years ago I found myself having an argument over the fact that I missed the window of time to submit intern project ideas, because I went on to check the old site, which was handily named “iwantanintern”, rather than finding out that they switched over to a new one named “redfishbluefish”. When I pointed out it’s a very opaque name to me, I was informed by a surprised HR person that they were expecting everybody to know the book by Dr. Seuss – which turns out it doesn’t even appear to be translated into Italian according to Wikipedia – and that it fit perfectly well with the companion Waldo website, named after the North American name of the protagonist from Where is Wally? As it turns out, I at least knew Wally as a kid, but most of the other children’s books I was given were either Italian or Disney, or both, so Dr. Seuss never figured into my upbringing.

So what’s the point of bringing up all of this? Well, I wanted to point out that there still is quite a bit of discrimination that can be felt among those of us that are otherwise well into a privileged class — with the hope that this can both make it easier to empathise with those who are less privileged, and as an answer – that I should have provided there and then, and out loud – to those I have heard before saying that I shouldn’t complain, since I can completely pass for British if I wanted to (cannot, and will not!) and so I have nothing to fear from drunk right wingers at a pub the day after Brexit deadlines.

Investigating Chinese Acrylic Lamps

A couple of months ago I built an insulin reminder light, roughly hacking around what I would call an acrylic lamp. The name is a reference to the transparent acrylic (or is it polycarbonate?) shape that you fit on top, and that lights up with the LEDs it’s resting on. They are totally not a new thing, and even Techmoan looked at them three years ago. The relatively simple board inside looked fairly easy to hack around, and I thought it would make a good hack project to look more into them.

They are also not particularly expensive. You can go on AliExpress and get them for a few bucks each with so many different shape designs. There are different “bases” to choose from, too — the one I hacked the Pikachu on was a fairly simple design with a translucent base, and no remote control, although the board clearly showed space for a TSOP-style infrared decoder. So I ended up ordering four different variants — although all of them without remotes because that part I didn’t particularly care for: one translucent base, one black base with no special features, one with two-colour shapes and LEDs, and one with self-changing LEDs and mains power only.

While waiting for those to turn up, I also found a decent deal on Amazon on four bases without the acrylic shapes on them for about £6 each. I took a punt and ordered them, which turned out to be a better deal than expected.

These bases appear to use the same board design, and the same remote control (although they shipped four remotes, too!), and you can see an image of it on the right. This is pretty much the same logic on the board as the one I hacked for my insulin reminder, although it has slightly different LEDs, which are not common anode in the package, but are still wired in a common-anode configuration.

For both the boards, the schema above is as good a reversing as I managed on my own. I did it on the white board, so there might be some differences in the older green one, particularly in the number of capacitors, but all of that is not important for what I’m getting to right now. I shortened the array to just four LEDs to show, but this goes on for all of the others too. The chip is definitely not a Microchip one, but it fits the pinout, so I kept that one, similarly to what I did for the fake candle. Although in this case there’s no crystal on the board, which suggests this is a different chip.

I kind of expected that all the remaining boards would be variations on the same idea, except for the multi-colour one, but I was surprised to figure out that only two of them shared the same board design (but took different approaches as to how to connect the IR decoder — oh yeah, I didn’t select any of the remote-controlled lamps, but two of them came with IR decoders anyway!)

The first difference is due to the base itself: there are at least two types of board that relate to where the opening for the microUSB port is in relation to the LEDs: either D-shaped (connector inline with the LEDs) or T-shaped (connector perpendicular to the LEDs). Another difference is in the placement of the IR decoder: on most of the bases, it’s at 90° from the plug, but in at least one of them it’s directly opposite.

Speaking of bases, the one that was the most different was the two-colours base: it’s quite a bit smaller in size, and round with a smooth finish, and the board was proper D-shaped and… different. While the LEDs were still common-anode and appeared wired together, each appears to have its own paired resistor (or two!), and the board itself is double-sided! That was a surprise! It also changes the basic design quite a bit more than I expected, including only having one Zener, and powering up the microcontroller directly over 4.5V instead of using a 3V regulator.

It also lacks the transistor configuration that you’d find on the other models, which shouldn’t surprise, given how it needs to drive more than the usual three channels. Which actually had me wonder: how does it drive two sets of RGB LEDs with an 8-pin microcontroller? Theoretically, if you don’t have any inputs at all, you could do it: VDD and VSS take two pins, each set of LEDs take three pins for the three colour channels. But this board is designed to take an IR decoder for a remote control, which is an input, and it comes with a “button” (or rather, a piece of metal you can ground with your finger), which is another input. That means you only have four lines you can toggle!
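The pin budget is simple arithmetic, but spelling it out makes the mismatch obvious. A tiny sketch, with hypothetical function names, using only the numbers from the paragraph above:

```python
def free_output_pins(total_pins, power_pins, input_pins):
    """Pins left to drive LEDs once power and inputs are accounted for."""
    return total_pins - power_pins - input_pins


def channels_needed(led_groups, colours_per_group=3):
    """Independent lines needed to drive every colour of every group."""
    return led_groups * colours_per_group


# 8-pin microcontroller, VDD and VSS, plus IR decoder and touch button inputs.
available = free_output_pins(total_pins=8, power_pins=2, input_pins=2)
needed = channels_needed(led_groups=2)

print(available, needed)  # 4 lines available, 6 channels wanted
```

So at least two of the six colour channels cannot be driven independently, which is what sent me looking for the trick.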

At first I thought that the answer was to be found on the other six-pin chip on the left, but it turns out that’s not the case. That one is marked 8223LC and that appears to correspond to a “touch controller” Shouding SD8223L and is related to the metal circlet that all of these bases use as input.

Instead, the answer became apparent when using the multimeter in continuity mode: since it provides a tiny bit of current, you can light an LED by placing the probes across its anode and cathode. Since the RGB cathodes of each LED package are already marked on the board, that’s not difficult to do, and while doing that I found their trick. The Blue cathodes are common to all 10 LEDs, rather than separate for the outer and inner groups. More interestingly, the Green cathodes are shorted to the anodes for the inner four LEDs. That means that only the outer LEDs have the full spectrum of colours available, and the only colour combination that makes the two groups independent is Green/Red.
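To make that wiring easier to reason about, here is a small Python model of it. The four drive-line names are hypothetical, and treating the inner Red cathodes as a line separate from the outer ones is my reading of the Green/Red remark rather than something traced on the board.

```python
from itertools import product

# Hypothetical names for the four lines the microcontroller can toggle.
# Assumption: the inner Red cathodes have their own line, which is what
# would allow "outer Green, inner Red"; this was not confirmed on the board.
LINES = ("blue_all", "green_outer", "red_outer", "red_inner")


def group_colours(blue_all, green_outer, red_outer, red_inner):
    """Return the (R, G, B) on/off state for the outer and inner groups."""
    outer = (red_outer, green_outer, blue_all)
    # Inner Green cathodes are shorted to the anodes, so inner Green never lights.
    inner = (red_inner, False, blue_all)
    return outer, inner


def reachable():
    """Enumerate every (outer, inner) colour pair the four lines can produce."""
    return {group_colours(*state) for state in product((False, True), repeat=len(LINES))}


pairs = reachable()
outer_colours = {outer for outer, _ in pairs}
inner_colours = {inner for _, inner in pairs}
print(len(outer_colours), "outer colours,", len(inner_colours), "inner colours")
```

In this model Blue always matches across both groups, which is why it cannot be used to tell them apart, while the inner group is limited to combinations of Red and Blue.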

So why am I this interested in these particular lamps? Well, they seem to be a pretty decent candidate for a “labor of love” hack (as bigclive would call it) that makes them “Internet of Things” enabled: there’s enough space to fit an ESP32 inside, and with the right stuff you should be able to create a lamp that is ESPHome compatible. Or you could run MicroPython on it, either to reimplement the insulin reminder logic, or something else entirely!

A size test print of my custom designed PCB.

Indeed, after taking a few measurements, I decided to try my hand at designing a replacement board that fits most of the bases I have: a D-shaped board, with the inline microUSB, has just enough space to put an ESP32 module on it, while keeping the components on the same side of the board like in the original models. And while the ESP32 would have enough output lines to control at least the two groups of LEDs without cheating, it wouldn’t have enough to address normal RGB LEDs individually… but that doesn’t need to stop a labor of love hack (or an art project): Adafruit NeoPixels are pretty much the same shape and size, and while they are a bit more expensive than regular RGB LEDs, they can be individually addressed easily.

Once I have working designs and code, I’ll be sharing them, particularly in the hope that others can improve on them. I have zero design skills when it comes to graphics or 3D modelling, but if I could, I would probably design my own base as well as the board: with the exception of the translucent ones, the bases are some very bland black cylinders, and they waste most of their space to allow for 3×AAA batteries (which I don’t think would last for any amount of time). Instead, a 3D printed base, with hooks to hold it onto a wall (or a door) and a microUSB-charged rechargeable battery, would be a lovely replacement for the original ones. And if we have an open design for the board, there’s pretty much no need to order a base and hope that a compatible one arrives.

Diagonal Contributions

This is a tale that starts at my previous day job. My role as an SRE had been (for the most part) one of support, with teams dedicated to developing the product, and my team making sure that it would perform reliably and without waste. The relationship with “the product team” has varied over time and depending on both the product and the SRE team’s disposition, sometimes in not particularly healthy ways either.

In one particular team, I found myself supporting (together with my team) six separate product teams, spread between Shanghai, Zurich and Mountain View. This put particular pressure on the dynamics of the team, particularly when half of the members (based in Pittsburgh) didn’t even have a chance to meet the product team of two services (based in Shanghai), as they would be, in the normal case, 12 hours apart. It’s in this team that I started formulating the idea I keep referring to as “diagonal contributions”.

You see, there’s often a distinction between horizontal and vertical contributions. Vertical refers to improving everything about one service, from the code itself to its health checks, release, deployment, rollout, … Horizontal refers to improving one thing across every service, such as making every RPC-based server be monitored through the same set of metrics. There are different schools of thought on which option is valid and which one should be incentivised, so it usually depends on your manager and their manager which of the two approaches you’ll be rewarded for taking.

When you’re supporting so many different teams directly, vertical contributions are harder on the team overall: when you go all in to identify and fix all the issues for one of the products, you end up ignoring the work needed for the others. In these cases a horizontal approach might pay off faster, from an SRE point of view, but it comes with a cost: the product teams would then have little visibility into your work, which can turn into a nasty confrontation, particularly depending on the management you find yourself dealing with (on both sides).

It’s in that situation that I came up with “diagonal contributions”: improve a pain point for all the services you own, and cover as many services as you can. In a similar fashion to rake collection, this is not an easy balance to strike, and it takes experience to get it right. You can imagine from the previous post that my success at working on this diagonal has varied considerably depending on teams, time, and management.

What did work for me was finding some common pain points between the six products I supported, and trying to address those not with changes to the products, but with changes to the core libraries they used or the common services they relied upon. This allowed me to show actual progress to the product teams, while solving issues that were common to most of the teams in my area, or even in the company.

It’s a similar thing with rake collection for me. Say there’s a process you need to follow that takes two to three days to go through, and four out of your six teams are supposed to go through it: it’s worth investing four to six days to reduce the process to something that takes even just a couple of hours. You need fewer net people-days even just looking at the raw numbers, which is very easy to tell, but that’s not where it stops! A process that takes more than a day adds significant risks: something can happen overnight, the person going through the process might have to take a day off, or they might have a lot of meetings the following day, adding an extra day to the total, and so on.
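Putting rough numbers on that example: the sketch below assumes an eight-hour working day, a “couple of hours” meaning two, and that each of the four teams goes through the process exactly once. None of those is stated above, so treat the figures as illustrative.

```python
HOURS_PER_DAY = 8   # assumption: one person-day is eight hours
TEAMS = 4           # four of the six teams go through the process


def total_person_days(teams, days_per_run, investment_days=0.0):
    """Person-days spent overall: up-front investment plus one run per team."""
    return investment_days + teams * days_per_run


automated_run = 2 / HOURS_PER_DAY  # assumption: "a couple of hours" is two

# Worst case for the investment: the process was quick (2 days)
# and the tooling was slow to build (6 days).
worst_saving = total_person_days(TEAMS, 2) - total_person_days(TEAMS, automated_run, 6)

# Best case: the process was slow (3 days) and the tooling was quick (4 days).
best_saving = total_person_days(TEAMS, 3) - total_person_days(TEAMS, automated_run, 4)

print(f"saving between {worst_saving} and {best_saving} person-days")
```

Even under the least favourable combination the investment comes out ahead on raw person-days, before counting the reduced risk of a process that no longer spans several days.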

This is also another reason why I enjoy this kind of work. As I said before, I disagree with Randall Munroe when it comes to automation. It’s not just a matter of saving time on something trivial that you do rarely: automation is much less likely to make one-off mistakes (it’s terrifyingly good at making repeated mistakes, of course), and even if it doesn’t take less time than a human would, it doesn’t take human time to do stuff. So a three-day-long process that is completed by automation is still a better use of time than a two-day-long process that relies on a person having two consecutive days to work on it.

So building automation or tooling, or spending time making core libraries easier to use, are in my book good ways to make contributions that are valuable beyond your immediate team, while not letting your supported teams feel like they are being ignored. But this only works if you know which pain points your supported teams have, and you can make a case that your work directly relates to those pain points. I’ve seen situations where a team has been working on very valuable automation… that relieved no pain for the supported team, giving them a feeling of not being taken into consideration.

In addition to a good relationship with the supported team, there are two other things that help. Actually I would argue that they do more than just help, and are absolute requirements: credibility and management support. The former, in my experience, is a tricky one to understand (or accept) for many engineers, including me. That’s because often enough credibility in this space is related to the actions of your predecessors. Even when you’re supporting a new product team, it’s likely its members have had interactions with support teams (such as SRE) in the past, and those interactions will colour the initial impression of you and your team. This is even stronger when the product team was assigned a new team, or you’re a new member of a team, or you’re part of the “new generation” of a team that went through a bit of churn.

The way I have attacked that problem is by building up my credibility: listening, and asking which problems the team feels are causing them the most trouble. Principles of reliability and best practices are not going to help a team that is struggling to find the time to work even on basic monitoring because they are under pressure to deliver something on time. Sometimes you can take some of their load away, in a way that is sustainable for your own team, that gains credibility, and that furthers the relationship. For instance, you may be able to spend some time writing the metric-exposing code, with the understanding that the product team will expand it as they introduce new features.

The other factor, as I said, is management, and this is another of those things that might bring a feeling of unfairness. I have encountered managers who seem more concerned about immediate results than the long-term picture, and managers who appear afraid of suggesting projects that are not strictly within the scope of reliability, even when they would increase the team’s overall credibility. For this, I unfortunately don’t have a good answer. I found myself overall lucky with the selection of managers I have reported to, on average.

So for all of you out there in a position of supporting a product team, I hope this post helped give you ideas on how to build a more effective, healthier relationship.