Service Announcement: Live Whiteboarding this Thursday

I’m breaking the usual post schedule to announce that this Thursday (2020-07-30) I’ve decided to attempt (again) an online whiteboarding session, this time with virtual whiteboard software instead of a physical whiteboard.

The plan is to have one hour of me rambling and ranting about some of my projects that are not formed enough to be blog posts (such as my electronics projects), but that I would love to share with the wider community sooner rather than later.

I plan on streaming this on Twitch, for the first time ever, since their Studio software appears to be the most straightforward way to stream a single window on the screen, and it should have a decent chat system as well. So if you’re interested in hearing some of my thought process, or you’re an ex-colleague who misses my office rants, you’re welcome to join me there.

The set time is 8pm London time, and I plan to rant on for about an hour. If there’s enough interest, I’ll try to make this a more regular thing.

CircuitPython-powered Birch Books

This past April, after leaving one bubble and before joining the next, I found myself deciding to work on some basic electronics, to brush up my down-to-hardware skills and learn new stuff. And since I had just assembled the Lego Bookstore (Birch Books), I thought I would improve on it by adding LEDs and controlling them with a programmed microcontroller.

I went back and forth on choosing which MCU to use, settling on an 8051 clone, which was not entirely straightforward to get to work, but eventually I thought I had. Until I got the actual boards to mount it on, and found out that I couldn’t get any new code onto my chips, for some reason I still haven’t figured out. So I decided to take a different approach and use a higher-level programming language and a higher-level board, too.

Due to the new job having a different set of contribution guidelines, I had to wait a bit before making the new boards and code available, but I described most of the plans for it in the previous blog post. The code has now landed in the repository. I also spent some time tidying up some of the other boards based on the various bits and pieces I learned from the past few months of trial and error.

The CircuitPython-based implementation relies on either the MCP23016 or the MCP23017 GPIO expander. The trick is that the code looks for the former at address 32 and the latter at address 33. The pull request to support the older, clunkier ’16 expander is approved but hasn’t been merged yet at the time of writing. I briefly considered making yet another alternative board based on the MCP23018 just to add support for it to the Adafruit library, but… I’ll leave that to someone else for now.
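
A minimal sketch of that detection logic, assuming the pending pull request ends up exposing an MCP23016 class next to the existing MCP23017 one (that class name and import path are my guess):

    import board
    import busio
    from adafruit_mcp230xx.mcp23017 import MCP23017

    # The MCP23016 class only exists once the pull request mentioned above
    # lands; its name and import path are my assumption.
    try:
        from adafruit_mcp230xx.mcp23016 import MCP23016
    except ImportError:
        MCP23016 = None

    i2c = busio.I2C(board.SCL, board.SDA)

    expander = None
    if MCP23016 is not None:
        try:
            # The older, clunkier '16 board variant sits at address 32 (0x20).
            expander = MCP23016(i2c, address=0x20)
        except ValueError:
            expander = None
    if expander is None:
        # The '17 board variant uses address 33 (0x21) instead.
        expander = MCP23017(i2c, address=0x21)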

And while I used a Feather M4 to get it working at first, I ended up building the boards around the tinier Trinket M0, which turned out to be not just a good fit for the project, but also a good way to make sure the code would work on more boards. Turns out that on the Trinket there’s no time.time(), and I had to switch to monotonic() instead, which led to me finding a bug in CircuitPython.

In truth, you can hotwire anything to these boards as long as it has I²C and can talk to the MCP23016/7. And while the connector on them is designed with my actuator board in mind, what it carries is pretty much all 16 I/O lines coming from the expander. Were you looking for an easy way to connect a Trinket M0 with 16 I/O lines to a breadboard? There you go — you might just have to mount the top header as a male pin header on the bottom of the board. Making an adapter to use this with a Feather would be relatively straightforward.

The (nearly) final result is fairly pleasing:

The version you see there is not the most recent design that you’ll find in the repository, but rather the first prototype of this version of the board that, surprisingly, managed to work the first time — nearly. The actuator board needed some work after I set up the Trinket: at first it kept blinking really fast, because it kept going in and out of test mode (which turns on all the LEDs), and in and out of “fast forward” mode (I’ll get to that in a moment). The reason turned out to be that the inputs needed an explicit pull-down. Which is why the new version of the actuator has three resistors to the side of the buttons. I also learnt that I really should pay attention to the orientation of components on the design, since at first I messed up and pulled up one of the inputs constantly.

Let’s talk a moment about the “fast forward” mode. When I originally came to design the inputs for the project, I decided I would want to have a way to run the whole cycle faster, to make videos out of it without needing to use a time-lapse or something like that. This turned out to be a good idea for testing, as well. But it’s implemented in fundamentally different ways between the 8052 firmware and its CircuitPython counterpart.

In the 8052 version, time is kept as a number of ticks incremented by a timer interrupt. When fast-forward mode is engaged, instead of adding one tick per interrupt, it adds sixty-four, which is the speed-up factor of the fast-forward. It also means that fast-forward works much like the fast-forward on an old cassette tape: it speeds up time while pressed, but starts and ends at normal speed.

In the CircuitPython version, the “virtual hour” (as in, which scene we’re at) is wholly dependent on the monotonic time since startup, and fast-forward mode just changes the divisor between real time and virtual time. Which means that pressing fast-forward completely changes the virtual time. Oops. I couldn’t find a decent way around that which wouldn’t have made the firmware overly complicated, so I pretty much gave up.
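
To make the difference concrete, here is a rough Python rendering of both approaches (the real 8052 firmware is written in C; all the names and constants here are made up for illustration):

    # Rough illustration only: constants and function names are made up.
    TICKS_PER_VIRTUAL_HOUR = 600
    SECONDS_PER_VIRTUAL_HOUR = 300
    FAST_FORWARD_FACTOR = 64

    # 8052-style: time is an accumulator, so fast-forward just makes it grow
    # faster while the button is held, and time continues from wherever it got to.
    ticks = 0

    def on_timer_interrupt(fast_forward):
        global ticks
        ticks += FAST_FORWARD_FACTOR if fast_forward else 1
        return (ticks // TICKS_PER_VIRTUAL_HOUR) % 12

    # CircuitPython-style: the virtual hour is derived from the monotonic clock,
    # and fast-forward only shrinks the divisor -- so toggling it makes the
    # virtual hour jump, which is the problem described above.
    import time

    START = time.monotonic()

    def virtual_hour(fast_forward):
        divisor = SECONDS_PER_VIRTUAL_HOUR
        if fast_forward:
            divisor /= FAST_FORWARD_FACTOR
        return int((time.monotonic() - START) // divisor) % 12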

On the bright side, since I already have a repository of flame effect code for CircuitPython, it might be a good time for me to include the design for the fireplace, as it should be trivial to do with the boards as they are right now!

Overall, I’m fairly happy with the result, although it feels less “self-contained” than I originally planned. I was honestly afraid at first that the Trinket M0 would be more power-hungry than the STC89, but it might be the opposite. Indeed, I’m now having some issues with at least one of the LEDs that I put in the bookstore being burned out — I fear my current-limiting calculation was too tied to the STC89, as I noticed when the ground floor LEDs turned out to be much brighter than on the breadboard. I need to make sure I got my numbers right — and if I need to, I have some new, fancier LEDs I can install, although those are rated for 12V, in which case I might need to rethink the whole setup.

Now, to wrap this up, a few words about the whole experience: I have worked on hardware before, but never down to this level. I have worked on firmware for very tiny systems that go into HVAC control planes, and for bigger systems built with COTS components but treated and sold as embedded solutions. But the closest I’ve come to anything like this was in high school, which was over fifteen years ago!

This journey has been a learning experience, and I want to thank the folks over on the Adafruit Discord server. I also welcome any comments and critiques on the code or the board designs — the latter in particular because I really had no idea what I was doing, and had to improvise.

I also want to keep talking about mistakes as I make them, and even wonder out loud when I think I did something wrong, for two reasons. First of all, as I just said, I welcome more critiques of my thought process — if I’m going about something the wrong way, please let me know. The second reason is a bit more “squishy”: while making these mistakes is a sure way to learn more, they are not cheap mistakes. Ordering another set of five prototypes is around £5, but you need to batch a few orders to spread out the shipping costs, and the bill of materials tends to add up, even if the individual components cost just a few cents. I hope I’ll be saving someone else money when they go looking for “how did someone else do this?”

Windows 10, OpenSSH and YubiKey

Update 2020-08-23: I found WinCryptSSH by chance, and that seems to take care of setting up an actual agent system as well, so that this works with WSL! Give that a try instead of following most of the advice in this post! You can still read it for context, though.

You may remember that a few months ago I suggested that Windows 10 is an interesting FLOSS development platform now, and that I decided to start using Windows 10 on my Dell XPS laptop (also in the hope that the problem I had with the battery was caused by Linux — and the answer to that is “nope”, the laptop’s battery really is terrible). One of the things I realised while setting all of this up is that I couldn’t use my usual OpenPGP-based token, so I thought I would try using a YubiKey 5 instead.

Now, between me and Yubico there’s not much love lost, but I thought I would try to make my life easier by using a smartcard that seemed to have a company interested in this kind of usage behind it. Turns out this only partially worked, unfortunately.

The plan was to set up the PIV mode of the YubiKey 5 to provide the authentication certificate, rather than trying to use the OpenPGP mode. The reason for that is to be found on Yubico’s own website:

GPG4Win’s smart card support is not rock solid; occasionally you might get error messages when trying to access the YubiKey. It might happen after removing and re-inserting the YubiKey, or after your computer has been in sleep mode, etc. This can be resolved by restarting gpg-agent [snip]

Given that GnuPG’s own smartcard support is kind of terrible already, and not wanting to get into the yak shaving of making that work on Windows, I was hoping to use the more common (on Windows) interface of PKCS#11, which OpenSSH supports natively (sort of). To give a very quick and oversimplified summary, PKCS#11 is the definition of an API/ABI that end-user software, such as OpenSSH, can use to interface with middleware that provides access to PKI-related functions. Many smartcard manufacturers provide ready-made middleware implementing a PKCS#11 interface, which I thought Windows supported directly, but I may be wrong. Mozilla browsers rely on this particular interface to handle CA certificates as well, to the point that the NSS library Mozilla uses is pretty much a two-part component with a PKCS#11 provider and a PKCS#11 client.

As it turns out, Yubico develops a PKCS#11 middleware for the YubiKey as part of yubico-piv-tool, and provides documentation on how to use it for SSH authentication. Unfortunately, the instructions don’t really extend to the information needed to use this on Windows, as they explicitly say at the top of the page. But that would never stop me, after all. Most of the setup described in that document is perfectly applicable to Windows, by the way — until you get to the first issue…

The first issue with setting this up is that while Windows 10 does ship with an OpenSSH client (and server), it does not ship with PKCS#11 support enabled. Indeed, the version provided even with 20H1 (the most recent non-Insider build at the time of writing) is 7.7p1, while the current upstream release is 8.3p1. Thankfully, Microsoft provides a more up-to-date build, although that is also still stuck at 8.1p1. The important part is that these binaries do include PKCS#11 support.

For this whole thing to work, you need to have both the OpenSSH binaries provided by Microsoft and the Yubico libraries (DLLs) in folders that are part of the PATH environment variable. They also need to match in ABI: if you’re setting this up on an x64 system and used the 64-bit OpenSSH build, you should install the 64-bit Yubico PIV Tool, and vice versa for 32-bit installs.

Now, despite the installer warning you that to use the PKCS#11 provider you need to have the bin folder in the PATH variable, and that loading the provider by full path will not be enough… the installer does not offer to modify the PATH itself, unlike the Git installer, which does. This is not too terrible, because you also need to add the new OpenSSH to the PATH. For myself, I decided to use a simple OpenSSH folder in my home directory.

Modifying the environment variables in (English) Windows 10 is fairly straightforward: hit the Search function, and type Environment — it’ll come up with the right control panel, and you can then edit the PATH variable and just browse for the right folder.

There is one more thing you need to do, and that is to create a .ssh/config file in your home directory with the following content:

PKCS11Provider libykcs11.dll

This instructs OpenSSH to look for the Yubico PKCS#11 provider automatically, instead of you having to specify it on the command line. Note once again that while you could provide the full path to the DLL file, if you didn’t add it to the PATH it would likely not load — Windows 10 is stricter about where it looks for dependencies when dynamically loading a DLL. Also, you’ll get a “not a valid win32 application” error if you installed/configured the wrong version of the Yubico tool (32-bit vs 64-bit).
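
For a one-off connection you can also pass the provider directly on the command line with OpenSSH’s -I option, instead of the config entry (the user and host here are just placeholders):

    ssh -I libykcs11.dll user@example.com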

After that is done, ta-dah! It should work fine!

Screenshot of Windows PowerShell using a YubiKey 5 to authenticate to a Gentoo Linux system.

This works when using PowerShell. You get asked to enter the PIN for the YubiKey, and you log in just fine. Working exactly as intended there.

Unfortunately, the next step was to use this with VSCode to connect to my NUC and work on things like usbmon-tools remotely, and for that to work I needed to be able to use this authentication method through the Visual Studio Code remote host mode… which is not working at the time of writing. The prompt comes up, but VSCode does not appear to proxy it to anything in its UI for me to answer it.

I’m surprised, because as far as I can tell, the code responsible for the prompt uses the correct read_passphrase() function call for it to be proxied to the askpass implementation, which I thought VSCode already took care of. I have not spent too much time debugging this problem yet, but if someone is more familiar with VSCode than I am and can track down what should happen there, I’d be very happy to hear about it. For now, I filed an issue.

Update 2020-08-04: Rob Lourens from Microsoft moved the issue to the right repository and pointed to another issue (filed later but in the right place).

The workaround to use this from VSCode is to make sure that "remote.SSH.useLocalServer": true is set, and to click on the Details link at the bottom-right corner when it’s trying to connect, to type in the PIN. At that point everything seems to work fine, and it even uses the connection multiplexer to avoid asking for it all the time.
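
For reference, that corresponds to this snippet in VSCode’s settings.json:

    {
      "remote.SSH.useLocalServer": true
    }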

Screenshot of Visual Studio Code showing a remote SSH connection to my Linux NUC with usbmon-tools open.

Cultural Diversity

I would say that in most cases, I’m the worst person to talk about diversity as, like John Scalzi said, I’m playing at the lowest difficulty setting. There is a small exception to this when the matter relates to cultural diversity in a vastly USA-based environment, which is what I would like to spend a few words on this time.

Let’s start with language. For open source developers, working with people from different countries, for whom English is not the native language, is not uncommon at all — indeed, projects such as Gentoo and VideoLAN tend to have more people overall for whom English is a second (if not third) language. There is a difference, though, when it comes to work environments such as my previous bubble, where you have to speak English a lot — impromptu, on the spot.

It took adjustment when I moved to Dublin, despite having spent most of the previous year in Los Angeles: on one hand, Southern Californian English and Dublin English are significantly different in tone and intentions; on the other, it required “checking the gain” on what people coming from other languages meant with certain words. Again, not something totally new for those who have spent time in various Open Source projects, but even IRC allowed you to take a moment to type an answer back, or to re-read what the other person said. And while voice tone and body language help, it’s still harder to process, understand, and form a reply in your second (or third) language in real time than it would be over an asynchronous medium.

London was another kettle of fish altogether — maybe because I have listened to enough Radio 4 to pick up many Londoners’ expressions faster than I picked them up in Dublin, or maybe because I had built up the experience by then. But that doesn’t mean it wasn’t hard — indeed, in my experience British-born-and-raised people tend to be (unwittingly) less forgiving of mis-speaking, expecting every word used to have been carefully chosen by the speaker, including any obvious-to-them rude turn of phrase. This doesn’t appear to be just my impression — the FT’s Michael Skapinker wrote about it, more than once, and I would suggest his articles both to native English speakers and to those of us “second-languagers” who find it hard to work productively with them.

Now, before somebody says that I’m painting the whole group of native English speakers with a single brush — that is certainly not the case. I already singled out the Dubliners before, and I have plenty of friends, colleagues, and ex-colleagues who have learnt not to assume that every word is perfectly weighted beforehand, particularly in spoken conversation, and who have asked me more than a few times whether the word I used was meant to sound as aggressive as it did.

There’s another fun interaction that I learnt to appreciate: talking shop with people coming from cultures that are very direct, such as some of my Romanian friends — we can start cussing and repeat the word shit many times in a row, get excited, and maybe even disagree vehemently on concepts, solutions, and decisions, … and then go grab coffee together like nothing happened. At least once I’ve been asked, not quite directly, whether I was being a hypocrite — but no, it’s just that for me (us) technical disagreements among friends are just that… technical disagreements.

If all of this boggles your mind and sounds like me trying to justify my tone or behaviour, I would suggest you read The Culture Map. The book is a fascinating read, and goes into a lot of examples of how different cultural “baselines” differ — with repeated reminders that a cultural baseline does not mean that everyone in that culture behaves exactly the same way; we’re not in a world of hats.

Another point that I feel should be spelled out explicitly is about general (popular) cultural references — they don’t really translate very well, along several different axes. I have a friend who gets annoyed at Harry Potter references in documentation and service naming, because they didn’t actually care for the series, so anything that feels “obvious” to a fan goes straight over their head, and I sympathise.

I know of similar complaints about most other “big fandoms”: Star Wars, Star Trek, Game of Thrones, Lord of the Rings, Dragonball, Naruto, … it’s a type of gatekeeping that is subtle, but still present: if you don’t happen to have at least dabbled in most of these, things will be slightly harder for you, and give you a feeling that you’re not welcome. It also turns out that those who read or watched some of those big names in another language might be just as annoyed, because a lot of the time names and terms got translated to something different, maybe closer to the target culture, which makes the reference even harder to grasp.

Also, let me be clear that this is not only a problem within tech spaces dominated by white male geeky engineers. A few years ago I found myself having an argument over the fact that I missed the window of time to submit intern project ideas, because I went to check the old site, which was handily named “iwantanintern”, rather than finding out that they had switched over to a new one named “redfishbluefish”. When I pointed out that it’s a very opaque name to me, I was informed by a surprised HR person that they expected everybody to know the book by Dr. Seuss – which, it turns out, doesn’t even appear to have been translated into Italian, according to Wikipedia – and that it fit perfectly well with the companion Waldo website, named after the North American name of the protagonist of Where’s Wally? As it turns out, I at least knew Wally as a kid, but most of the other children’s books I was given were either Italian or Disney, or both, so Dr. Seuss never figured into my upbringing.

So what’s the point of bringing up all of this? Well, I wanted to point out that there is still quite a bit of discrimination that can be felt by those of us who are otherwise well into a privileged class — with the hope that this can both make it easier to empathise with those who are less privileged, and serve as an answer – one that I should have provided there and then, and out loud – to those I have heard saying that I shouldn’t complain, since I could completely pass for British if I wanted to (I cannot, and will not!) and so have nothing to fear from drunk right-wingers at a pub the day after Brexit deadlines.

Investigating Chinese Acrylic Lamps

A couple of months ago I built an insulin reminder light, roughly hacking around what I would call an acrylic lamp. The name is a reference to the transparent acrylic (or is it polycarbonate?) shape that you fit on top, which lights up from the LEDs it rests on. They are totally not a new thing, and even Techmoan looked at them three years ago. The relatively simple board inside looked fairly easy to hack around, and I thought it would make a good hack project to look more into them.

They are also not particularly expensive. You can go on AliExpress and get them for a few bucks each, in many different shape designs. There are different “bases” to choose from, too — the one I hacked the Pikachu onto was a fairly simple design with a translucent base and no remote control, although the board clearly showed space for a TSOP-style infrared decoder. So I ended up ordering four different variants — all of them without remotes, because that part I didn’t particularly care for: one translucent base, one black base with no special features, one with two-colour shapes and LEDs, and one with self-changing LEDs, mains-powered only.

While waiting for those to turn up, I also found a decent deal on Amazon on four bases without the acrylic shapes on them for about £6 each. I took a punt and ordered them, which turned out to be a better deal than expected.

These bases appear to use the same board design, and the same remote control (although they shipped four remotes, too!), and you can see an image of it on the right. This is pretty much the same logic on the board as the one I hacked for my insulin reminder, although it has slightly different LEDs, which are not common anode in the package, but are still wired in a common-anode configuration.

For both of the boards, the schematic above is as good a reverse engineering as I managed on my own. I did it from the white board, so there might be some differences in the older green one, particularly in the number of capacitors, but none of that is important for what I’m getting to right now. I shortened the array to just four LEDs in the drawing, but it goes on the same way for all the others. The chip is definitely not a Microchip one, but it fits the pinout, so I kept that one, similarly to what I did for the fake candle. Although in this case there’s no crystal on the board, which suggests this is a different chip.

I kind of expected that all the remaining boards would be variations on the same idea, except for the multi-colour one, but I was surprised to find that only two of them shared the same board design (and they took different approaches as to how to connect the IR decoder — oh yeah, I didn’t select any of the remote-controlled lamps, but two of them came with IR decoders anyway!)

The first difference is due to the base itself: there are at least two types of board, depending on where the opening for the microUSB port is in relation to the LEDs: either D-shaped (connector inline with the LEDs) or T-shaped (connector perpendicular to the LEDs). Another difference is in the placement of the IR decoder: on most of the bases it’s at 90° from the plug, but in at least one of them it’s directly opposite.

Speaking of bases, the one that was the most different was the two-colour base: it’s quite a bit smaller, round with a smooth finish, and the board was properly D-shaped and… different. While the LEDs were still common-anode and appeared wired together, each appears to have its own paired resistor (or two!), and the board itself is double-sided! That was a surprise! It also changes the basic design quite a bit more than I expected, including only having one Zener, and powering the microcontroller directly off 4.5V instead of using a 3V regulator.

It also lacks the transistor configuration that you’d find on the other models, which shouldn’t be surprising, given that it needs to drive more than the usual three channels. Which actually had me wondering: how does it drive two sets of RGB LEDs with an 8-pin microcontroller? Theoretically, if you don’t have any inputs at all, you could do it: VDD and VSS take two pins, and each set of LEDs takes three pins for the three colour channels. But this board is designed to take an IR decoder for a remote control, which is an input, and it comes with a “button” (or rather, a piece of metal you can ground with your finger), which is another input. That means you only have four lines you can toggle!

At first I thought the answer was to be found in the other six-pin chip on the left, but it turns out that’s not the case. That one is marked 8223LC, which appears to correspond to a “touch controller”, the Shouding SD8223L, and is related to the metal circlet that all of these bases use as input.

Instead, the answer became apparent when using the multimeter in continuity mode: since it provides a tiny bit of current, you can turn on an LED by putting the probes across the anode and cathode of the diode. Since the RGB cathodes of the single LED packages are already marked on the board, that’s not difficult to do, and while doing that I found their trick: the blue cathodes are common to all 10 LEDs, not separate for the outer and inner groups, and, more interestingly, the green cathodes are shorted to the anodes for the inner four LEDs. That means only the outer LEDs have the full spectrum of colours available, and the only colour combination that makes the two groups independent is green/red.

So why am I this interested in these particular lamps? Well, they seem to be a pretty decent candidate for some “labor of love” hack – as bigclive would call it – to make them “Internet of Things” enabled: there’s enough space to fit an ESP32 inside, and with the right work you should be able to create a lamp that is ESPHome compatible — or run MicroPython on it, either to reimplement the insulin reminder logic, or something else entirely!

A size test print of my custom designed PCB.

Indeed, after taking a few measurements, I decided to try my hand at designing a replacement board that fits most of the bases I have: a D-shaped board, with the inline microUSB, has just enough space to fit an ESP32 module while keeping the components on the same side of the board as in the original models. And while the ESP32 would have enough output lines to control at least the two groups of LEDs without cheating, it wouldn’t have enough to address normal RGB LEDs individually… but that doesn’t need to stop a labor of love hack (or an art project): Adafruit NeoPixels are pretty much the same shape and size, and while they are a bit more expensive than regular RGB LEDs, they can easily be addressed individually.
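
As a taste of how little code that would take, here is a minimal MicroPython sketch driving a string of NeoPixels from an ESP32; the pin number, pixel count, and group split are placeholders for whatever the final board ends up using:

    import machine
    import neopixel

    PIXEL_PIN = 4     # placeholder: whichever GPIO the data line ends up on
    PIXEL_COUNT = 10  # placeholder: ten LEDs, like the original boards
    OUTER_COUNT = 6   # placeholder: assuming the first six pixels are the outer ring

    np = neopixel.NeoPixel(machine.Pin(PIXEL_PIN), PIXEL_COUNT)

    # Light the outer and inner groups independently, with colour combinations
    # the original board can only manage for green/red.
    for i in range(PIXEL_COUNT):
        np[i] = (0, 0, 32) if i < OUTER_COUNT else (32, 16, 0)
    np.write()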

Once I have working designs and code, I’ll be sharing them, particularly in the hope that others can improve on them. I have zero design skills when it comes to graphics or 3D modelling, but if I could, I would probably design my own base as well as the board: with the exception of the translucent ones, the bases are otherwise very bland black cylinders, and they waste most of their space making room for 3×AAA batteries (which I don’t think would last for any amount of time). Instead, a 3D-printed base, with hooks to hold it onto a wall (or a door) and a microUSB-charged rechargeable battery, would be a lovely replacement for the original ones. And with an open design for the board, there’s pretty much no need to order and hope for a compatible base to arrive.

Diagonal Contributions

This is a tale that starts at my previous dayjob. My role as an SRE had been (for the most part) one of support, with teams dedicated to developing the product, and my team making sure that it would perform reliably and without waste. The relationship with “the product team” has varied over time, depending on both the product and the SRE team’s disposition, sometimes in not particularly healthy ways.

In one particular team, I found myself supporting (together with my team) six separate product teams, spread between Shanghai, Zurich and Mountain View. This put particular pressure on the dynamics of the team, especially since half of the members (based in Pittsburgh) didn’t even have a chance to meet the product teams of two of the services (based in Shanghai), as they would normally be 12 hours apart. It’s in this team that I started formulating the idea I keep referring to as “diagonal contributions”.

You see, there’s often a distinction drawn between horizontal and vertical contributions. Vertical means improving everything about one service: the code itself, its health checks, release, deployment, rollout, … Horizontal means improving one thing about every service, such as making every RPC-based server expose the same set of monitoring metrics. There are different schools of thought on which option is valid and which should be incentivised, so it usually depends on your manager and their manager which of the two approaches you’ll be rewarded for taking.

When you’re supporting so many different teams directly, vertical contributions are harder on the team overall — when you go all in to identify and fix all the issues for one of the products, you end up ignoring the work needed for the others. In these cases a horizontal approach might pay off faster, from an SRE point of view, but it comes with a cost: the product teams then have little visibility into your work, which can turn into a nasty confrontation, depending on the management you find yourself dealing with (on both sides).

It’s in that situation that I came up with “diagonal contributions”: improve a pain point shared by the services you own, covering as many services as you can. In a similar fashion to rake collection, this is not an easy balance to strike, and it takes experience to get it right. You can imagine from the previous post that my success at working on this diagonal has varied considerably depending on teams, time, and management.

What did work for me was finding some common pain points between the six products I supported, and trying to address those not with changes to the products, but with changes to the core libraries they used or the common services they relied upon. This allowed me to show actual progress to the product teams, while solving issues that were common to most of the teams in my area, or even in the company.

It’s a similar thing with rake collection for me: say there’s a process you need to follow that takes two to three days to go through, and four out of your six teams are supposed to go through it — it’s worth investing four to six days to reduce that process to something that takes even just a couple of hours: four teams saving two-plus days each more than repays the investment in raw people-days, which is very easy to see, but that’s not where it stops! A process that takes more than a day adds significant risks: something can happen overnight, the person going through the process might have to take a day off, or they might have a lot of meetings the following day, adding an extra day to the total, and so on.

This is also another reason why I enjoy this kind of work — as I said before, I disagree with Randall Munroe when it comes to automation. It’s not just a matter of saving time on something trivial that you do rarely: automation is much less likely to make one-off mistakes (it’s terrifyingly good at making repeated mistakes, of course), and even if it doesn’t take less time than a human would, it doesn’t take human time to do the work — so a three-day-long process completed by automation is still a better use of time than a two-day-long process that relies on a person having two consecutive days to work on it.

So building automation or tooling, or spending time making core libraries easier to use, is in my book a good way to make contributions that are valuable beyond your immediate team, while not letting your supported teams feel like they are being ignored. But this only works if you know which pain points your supported teams have, and you can make a case that your work directly relates to those pain points — I’ve seen situations where a team has been working on very valuable automation… that relieved no pain for the supported team, giving them a feeling of not being taken into consideration.

In addition to a good relationship with the supported team, there’s another thing that helps. Actually, I would argue that it does more than just help, and is an absolute requirement: credibility. And management support. The former, in my experience, is a tricky one for many engineers, including me, to understand (or accept) — that’s because often enough, credibility in this space is tied to the actions of your predecessors. Even when you’re supporting a new product team, it’s likely its members have had interactions with support teams (such as SRE) in the past, and those interactions will colour the initial impression of you and your team. This is even stronger when the product team has just been assigned a new support team — or you’re a new member of a team, or you’re part of the “new generation” of a team that went through a bit of churn.

The way I have attacked that problem is by building up my credibility: by listening, and by asking questions about which problems the team feels are causing them the most issues. Principles of reliability and best practices are not going to help a team that is struggling to find the time to work even on basic monitoring because they are under pressure to deliver something on time. Sometimes you can take some of their load away, in a way that is sustainable for your own team, gains credibility, and furthers the relationship. For instance, you may be able to spend some time writing the metric-exposing code, with the understanding that the product team will expand it as they introduce new features.

The other factor, as I said, is management — this is another of those things that might bring a feeling of unfairness. I have encountered managers who seem more concerned about immediate results than the long-term picture, and managers who appear afraid of suggesting projects that are not strictly within the scope of reliability, even when they would increase the team’s overall credibility. For this, I unfortunately don’t have a good answer. I found myself, on average, lucky with the managers I have reported to.

So for all of you out there in a position of supporting a product team, I hope this post helped give you ideas on how to build a more effective, healthier relationship.

Art Projects Setbacks: Birch Books’ New Brain

You may remember that two months ago I declared my art project completed, and posted pictures of it. I said back then that it wasn’t quite all done and dusted, because I was still waiting for PCBs to arrive from PCBWay, an online PCB ordering service that has been heavily advertising on YouTube makers’ videos for the past year or so. I ordered those back in April and, now that it’s July, they still haven’t turned up! So I decided to once again follow the advice of bigclive and try JLCPCB instead, which turned out to be a very good idea: ordered on Sunday, the boards arrived on Friday!

So on Friday I set out to spend some of the time listening to training talks and soldering the boards, which turned out to be a bit less practical than I intended — I might try soldering microUSB connectors again, but next time I’ll follow Richard’s advice and try without the shielding on the back. After some screwups, I managed to get working boards, programmed a new STC89, and went on to connect it to the LEGO set — and it all lit up at once, which it wasn’t meant to!

After trying a few combinations of different Darlington arrays, and flashing a new build of the firmware, I ended up figuring out that all the micros I flashed anew were completely stuck, with all the I/O ports staying at 5V with an occasional momentary pull-down. At first I thought I had screwed up the R/C network for the reset line, but no: I checked it with the Saleae Logic Pro and the reset line behaved exactly as it should. I also popped in the original STC I used before the ones I ordered on AliExpress arrived from China, and it worked fine, if a bit misconfigured (the new firmware has a few fixes). I also tried it and one of the “pristine” ones on the programmer, and they all work fine, but anything I program anew fails the same way.

It’s not a firmware problem either: I tried going back in time and building even the first draft version; I tried demos written by someone else; I tried four different versions of SDCC — all to no avail. Nothing changed in stcgal, so it doesn’t sound like the problem is there… I’m at a complete loss as to what’s going on. I honestly felt desperate about it, because it worked perfectly fine two months ago, and now it suddenly stopped.

So instead, I spent Saturday working on my plan B: using CircuitPython for the main scene logic, with an Adafruit board — in particular, I decided to start working with the Feather M0, but I’m targeting the Trinket M0, which is cheaper, smaller, and still plenty powerful for what I need. This was made easier because I designed the actuator board to be separate from the MCU, and in particular to be possible to plug into a breadboard, which meant that half of the design was already known to work and didn’t need a redesign.

Unfortunately, what I didn’t think of was producing a test board that would just plug into the pins on the actuator board to test it, without connecting it to the bookstore… so I ended up having to manually wire a lot of LEDs on the breadboard to turn them on. I’m going to use this as a visible example of why you should always build a fake of your service if you’re writing a client, to run integration testing with! I am addressing this by ordering an explicit testing board to connect on top of the actuator, so that I can just use that. Also, fun fact: it looks like the LEDs I used in these pictures are… more sensitive than the other ones, and Sparky the Blue Smoke Monster came to visit me again when I gave them a straight 5V.

There’s more jankiness going on with this board, though. When I was looking at expanding the I/O capabilities of the Feather, I ended up buying a few MCP23016 expanders. These are I²C chips that provide access to 16 input or output lines while only requiring two wires on the MCU. I can’t for the life of me remember or figure out why I went with this particular model, which currently sports a «Not Recommended for new designs.» warning at the top of the Microchip product page. I may well have mistyped the MCP23017, which is the modern version of the same chip.

Besides not being (at all) pin-compatible, the documentation on Adafruit’s website is written for the later (’17) variant, and indeed the older (’16) version is not currently supported by CircuitPython (although I’m addressing this). It doesn’t stop there: the ’16 requires an additional R/C network, which turned out to be very unreliable: it ended up working better with a single resistor on the CLK line, and I’m not at all sure why. So in general, the difference between the MCP23016 and the MCP23017 is that the latter is much nicer to use. Since I do have a few ’16s, and the components needed for the RC network, I’ve started writing the CircuitPython code based on that one first, but I also designed a separate board to fit the ’17, using a non-default address, so that I can distinguish between the two at startup.

Part of the reason for that is also that the ’16 is very finicky: when it receives a command it doesn’t entirely like, it decides to simply crash, and requires a full power cycle (not just a reset), which isn’t very good.

There’s another bit of jankiness that you can spot on the right side of the board: why is there a transistor in this fairly minimalistic design? I mean, okay, the pull-up resistors for the I²C lines and the RC network already ruined the idea of having only the daughterboard and the expander. But a transistor? Well, it turns out that when I designed the actuator board I made a silly mistake: I didn’t consider that reset lines (RST) are not always active-high (as they are on the 8051), but are sometimes actually ¬RST — and as you can imagine by now, both the Feathers and the Trinket M0 are indeed active-low.

And since pushing the Reset button bridges the RST line of the actuator board to VIO (hey, at least I did realise that if I went to a different platform I would need a different “logic high”!), it wouldn’t work at all. Instead, it now activates the transistor, which bridges the RST line to ground. Basically, a NOT gate implemented with a single transistor — which I might have remembered thanks to Richard Feynman’s Lectures on Computation, a book I would actually recommend to those interested.

And if you’re wondering why the I/O lines are all orange: I checked which roll of equipment wire I had the most of, and it turned out to be orange.

Also, a word of advice that I also found out by pure chance: the default for “quote dimensions” in Eagle is to put them in as traces on the top layer of the board! When I sent the generated Gerber files to JLCPCB this time, they noted some dead shorts and asked me to confirm whether that was intentional — so kudos to them for noticing. They allowed me to upload new, fixed Gerber files after that.

Was Acronis True Image 2020 a mistake?

You may remember that a few months ago I complained about Acronis True Image 2020. I have since been mostly happy with the software, despite it still being fairly slow when uploading a sizable number of changed files, such as after shooting a bunch of pictures at home. This would have been significantly more noticeable if we had actually left the country since I started using it, as I usually shoot at least 32GB of new pictures on a trip (and sometimes twice as much), but with lockdown and all, that didn’t really happen.

But, aside from that, the software worked well enough. Backups happened regularly, both to the external drive and to the cloud, and I felt generally safe using it. Until a couple of weeks ago, when it suddenly stopped working and failed with Connection Timeout errors. The failures didn’t correlate with anything: I did upgrade to Windows 10 20H1, but that was a couple of weeks before, and backups went through fine until then. There was no change in my network, no change from my ISP, and so on.

So what gives? None of the tools available from Acronis reported errors, ports were not marked as blocked, and I was running the latest version of everything. I filed a ticket, and was called on the phone by one of their support people, who actually seemed to know what he was doing — TeamViewer at hand, he checked once again for connectivity, and once again found that everything was alright; the only thing he found to change was disabling the True Image Mounter service, which is used to get quick access to the image files, and thus is not involved in the backup process. I had to disable that one anyway because, years after Microsoft introduced WSL, enabling it breaks WSL filesystem access altogether, so you can’t actually install any Linux distro, change passwords in the ones you already installed, or run apt update on Debian.

This was a week ago. In the meantime, support asked me to scan the disks for errors, because their system report flagged one of the partitions as having issues (if I read their log correctly, that’s one of the recovery images, so it’s not at all related to the backup), and more recently to give them a Process Monitor log while running the backup. Since they don’t actually give you a list of processes to limit the capture to, I ended up having to kill most of the other running applications to take the log, as I didn’t want to leak more information than I was required to. It still provided a lot of information I’m not totally comfortable with having provided. And I still have no answer, at the time of writing.

That’s not all of it — the way you provide all these details to them is fairly clunky: you can’t just email them, or attach them through their web support interface, as even their (compressed) system report is more than 25MB for my system. Instead, what they instruct you to do is to take the compressed files and upload them through FTP with the username/password pair they provide to you.

Let me repeat that. You upload compressed files, which include at the very least most of the filenames you’re backing up, and possibly even more details of your computer, over FTP. Unencrypted. Not SFTP, not FTPS, not HTTPS. FTP. In 2020.

This is probably the part that makes my blood boil. Acronis has clearly figured out that the easiest way for people to get support is to use something that they can use very quickly. Indeed, you can still put an FTP URL in the location bar of your Windows 10 File Explorer, and it will let you upload and download files over it. But it does that in a totally unencrypted, plain-text manner. I wonder how much more complicated it would be to use at least FTPS, or to have an inbound-only, password-protected file upload system, like Google Drive or Dropbox; after all, they are a cloud storage provider themselves!

As for myself, I found a temporary workaround while waiting for the support folks to figure out what they have likely screwed up in their London datacenter: I’m backing up my Lightroom pictures to the datacenter they provide in Germany. It took three days to complete, but it at least gives me peace of mind that, if something goes horribly wrong, the dearest part of my backup is saved somewhere else.

And honestly, using a different backup policy for the photos than for the rest of the system is probably a good idea: I set it to “continuous backup”, because generally speaking the library stays the same most of the time, until I go and prepare another set to publish; then a lot changes quickly, and then nothing again until the next time I get to it.

Also, I do have the local backup — that part is still working perfectly fine. I might actually want to use it soon, as I’m of two minds between copying my main OS drive over from its 1TB SSD to a 2TB SSD, and just getting a 2TB SSD and installing everything anew onto it. If I do go that route, I will also reuse the 1TB SSD in my NUC, which right now is running with half SATA and half NVMe storage.

Conclusions? Well, compared to Amazon Glacier + FastGlacier (which has not been updated in just over two years now, and still sports a Google+ logo and +1 button!), it’s still good value for money. I’m spending a fraction of what I used to spend with Amazon, and even in its half-broken state it’s backing up more data and has significantly faster access. The fact that you can set different policies for different parts of the backup is also a significant plus. I just wish there was a way to go from a “From Folders to Cloud” backup to a tiered “From Folders to External, plus Cloud” one — or maybe I’ll bite the bullet and, if it’s really this broken, just reconfigure the Lightroom backup to use the tiered option as well.

But Acronis, consider cleaning up your support act. It’s 2020, you can’t expect your customers to throw you all their information via unencrypted protocols, for safety’s sake!

Update 2020-06-30: the case is now being escalated to the “development and cloud department” — and if this is at all in the same ballpark as the companies I’ve worked for, it means that something is totally messed up in their datacenter connectivity and I’m the first one to notice it and report it to them. We’ll see.

Update 2020-07-16: well, the problem is “solved”. In the sense that, after I asked them, they moved my data out of the UK (London) datacenter into the German one, which works fine and has no issues. They also said they will extend my subscription by the month during which I didn’t have a working backup. But yeah, it turns out that nobody on their side seems to have a clear idea of what was going on, and the UK datacenter just disappeared off my dashboard. I wonder how many others had this problem.

Kodi, NUC, and CEC adapters

Welcome to this year’s yearly Kodi post. The fact that I don’t write about Kodi very often should be a sign that it’s actually fairly stable, and indeed, aside from the random fight with X11 over whether DPMS should or should not be enabled, my HTPC setup works fairly well, even though it’s getting used less often — the main content I have on it is stuff that we own on DVD — either because we ripped it ourselves or… found it, to make it easier on us.

In part due to my poking around my other TV-related project, and in part because my phone seems to have issues keeping its IPv6 address stable – which means I can’t control Kodi with the Kore app until I turn wifi off and on again – I decided to address the biggest missing feature of Intel’s NUC: the lack of HDMI CEC support — and as far as I know that’s a missing feature even on their most recent NUCs.

With CEC, HDMI-connected devices can provide a level of control, including signalling a request to turn on or off, changing the input selection, and so on. It also allows a TV remote control to send most button presses to a connected device. This allows you, for instance, to control Kodi’s menus with a single remote, rather than having to set up a second remote just for that, which is what I was originally planning to do, using a vintage SVHS remote control.

As I said, Intel doesn’t support CEC on their NUCs, to this day. Instead, they suggest you buy an adapter from a company such as Pulse-Eight, which thankfully still sells the harness for my six-year-old model. The adapter is a microcontroller connected to an Intel-provided header: on one side it receives the CEC signals coming from the HDMI port, and on the other it exposes a USB device through a motherboard header, which libCEC – that Pulse-Eight themselves developed – can decode to provide information to other software.

The Pulse-Eight installation instructions gloss over the most interesting part: which pins it needs on the Custom Solutions Header, and, more interestingly, how you get to them. It turns out that I had to disassemble most of the NUC to get to the connector, and only found the right pins thanks to Kingj, who reviewed this kit five years ago. The thing that his review seemed to ignore, though, is that the connectors are just about as tall as the space there is to use that header — and the cable rubs against the WiFi antennas (which I never actually use, so I was tempted to remove them, until I remembered they handle Bluetooth, too). And the suggested position to stick the small board onto the Ethernet connector bumps into the heat foam that’s meant to protect the mSATA drive. But besides those two issues, it all fit together, and with a bit of worry and fear of breaking stuff, and possibly one of the black clip retainers broken, I did manage to get the whole thing installed just fine.

Troubleshooting Pulse-Eight CEC Adapter Issues

After installing the adapter and booting up the NUC, I found myself nearly crying from frustration when I saw it turn up in lsusb as Atmel at90usb162 DFU bootloader. This appears to be a common bane of multiple people, and at least on Windows people suggested that a firmware update would help — the firmware update being only available as an option on Windows (and possibly Wine?), but not released on Linux. Turns out that this is a red herring.

DFU stands for Device Firmware Update — it’s the mode you need a microcontroller (MCU) to be in to load a new program onto it. Usually there are two ways to enter this mode: a physical one and a logical one (sending a specific command via USB, for instance), and while I didn’t expect them to be stateful, they appear to be. At first I was afraid I had grounded something I shouldn’t have (which would have put it into firmware download mode), or that the device had left the factory without being programmed (which would indeed have been fixed by running the firmware update).

Instead, it turns out it might just be stateful, and was left in programming mode when it left the factory — and you can fix that with a single command on Linux: dfu-tool attach. The command is provided by fwupd (LVFS), and worked like a charm on my board, which then showed up as the /dev/ttyACM0 it needed to be.

The next problem was figuring out why Kodi was not acting on any of the remote button presses. According to a lot of blogs, forum posts, and wikis, this might have had to do with remote.xml. It didn’t, at least for me. The problem is that Kodi recognizes the presence of the CEC device by looking at the connected USB devices, but it doesn’t check whether it has permission to open the device until it tries to. And the device (/dev/ttyACM0, which is usually the easiest no-drivers-needed way to connect a USB-to-UART bridge) is usually owned by the dialout group, which my Kodi user was not part of.

Why dialout and what the heck does it mean? Given it is 2020 as I write this, it might be worth giving a bit of a history lesson, without being condescending to those who might not have heard the term before. This group is usually “in charge” of serial port devices, such as ttyS0, ttyACM0 and ttyUSB0 — the reason being that back in the old days of phone-line connectivity (which for some is still the present day), the main use for serial ports was to connect modems (or in some cases faxes), and that meant that whoever could access the serial ports would be able to “dial out” — that is, make a phone call. Since phone calls could be (and possibly still are) very expensive, accessing those ports couldn’t be made too easy. The name is now totally legacy, but… well, it’s hard to change these things, particularly in UNIX-land.

I solved this by adding the Kodi user to the group. The NUC doesn’t have any other serial port, so it’s not an issue to do that. The alternative would be a udev rule that explicitly sets the device as owned by the user running Kodi, similar to the rules in glucometerutils.
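
For completeness, this is roughly what the two options look like; the udev vendor and product IDs are placeholders (check lsusb for the adapter’s actual ones), and kodi stands in for whatever user runs Kodi:

    # Option 1: add the user running Kodi to the dialout group
    # (a re-login of that user is needed afterwards).
    sudo usermod -a -G dialout kodi

    # Option 2: a udev rule, e.g. /etc/udev/rules.d/99-cec.rules, handing the
    # device to the Kodi user directly; the IDs below are placeholders.
    SUBSYSTEM=="tty", ATTRS{idVendor}=="xxxx", ATTRS{idProduct}=="xxxx", OWNER="kodi"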

So now the device works, Kodi receives the commands just fine, and I don’t have to bother getting the Kore app out just to select the next episode of whatever we’re watching. The only thing that remains annoying to me is that I can’t access the CEC device from any other process. I was wondering if I could implement most of what I wanted in the project I described previously (namely, controlling TV inputs) through CEC (answer: probably), but that’s a moot point because serial devices can only be accessed by a single process at a time (the same is true of most other devices — this is the reason why Windows has device drivers, and why you end up with so many “daemons” in Linux and other *nix: you need something that “multiplexes” the access across different processes).

If this were ten years ago, you’d have me design a CEC daemon and write a proof-of-concept integration for Kodi. Time being what it is, I’m unlikely to be working on this any time soon. But if someone feels like it’s a worthy task, I’ll be happy to chat, discuss designs, or review code to implement it.

Falsehoods in Tutorials: Database Schemas

It’s quite possible that a number of people reading this post have already stumbled across a few of the “Falsehoods Programmers Believe…” documents. If not, there appears to be a collection of them, although I have honestly only read through the ones about names, addresses, and time. The short version of all of this is that interfacing software with reality is complicated, and in many cases programmers don’t realise just how complicated it is. And sometimes this turns into effectively institutional xenophobia.

I have already mused that tutorials and documentation are partially to blame, by spreading code memes and reality-hostile simplifications. But now I have some more evidence of this being the case, without me building an explicit strawman like I did last time. And that brings me to another interesting point: the rising importance of getting stuff right up front, as the cost of correcting these mistakes keeps rising.

You see, with lockdown giving us a lot of spare time, I spent some of it on artsy projects and electronics, while my wife spent hers learning about programming, Python, and more recently databases. She found a set of tutorials on YouTube that explain the basics of what a database is and how SQL works. And they were full of the falsehoods I just linked above.

The tutorials use what I guess is a fairly common example: a database of employees, customers, and branches of a company. And the example includes separate fields for first name and last name. Which, frankly, is a terrible mistake — with very few exceptions (banks and airlines among them), there’s no need to distinguish between the components of a name; a single full name field works just as well, and it doesn’t cause headaches for people from cultures that don’t split names the same way. The fact that I recently ranted about this on Twitter against VirusTotal is not totally coincidental.
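
To make it concrete, here’s a minimal sketch of what I mean — the table name, column names, and types are mine, not the tutorial’s, and are purely illustrative:

-- A single free-form name column: no assumptions about how a name splits.
CREATE TABLE Employee (
  ID        INTEGER PRIMARY KEY,
  Full_Name VARCHAR(300) NOT NULL,
  Email     VARCHAR(254)
);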

It goes a bit beyond that, though, by trying to explain ON DELETE triggers by attaching them to the deletion of an employee from the database. Now, I’m not a GDPR lawyer, but it’s my understanding that employee rosters are one of those things you’re allowed to keep for essential business needs — and you most likely never want to delete employees, their commission payment history, or their tax records.
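
What that usually means in practice — and this is my sketch building on the table above, not anything from the tutorial — is that instead of deleting the row, you record that the employment ended:

-- Keep the row; mark the employment as ended (a “soft delete”).
-- Employment_End stays NULL while the person is still employed.
ALTER TABLE Employee ADD COLUMN Employment_End DATE;

UPDATE Employee
SET Employment_End = CURRENT_DATE
WHERE ID = 42;  -- hypothetical employee ID, for illustration only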

I do understand that a lot of tutorials need to use simple examples, as setting up a proper HR-compatible database would take a lot more time, particularly with compartmentalizing information so that a random sales analyst doesn’t have access to the home phone numbers of their colleagues.

I have no experience designing employee-related database schemas, so I don’t want to dig myself into a hole I can’t climb out of by running with that example. I do have experience designing database schemas for product inventory, though, so that’s the example I’ll run with. I think it was a different tutorial that covered those, but I’ll admit I’m not sure, because I wasn’t paying much attention by then: I was getting annoyed at the quality.

So this other tutorial focused on products, orders, and sales totals — its schema was naïve, and not the type of database any real order-history system would use. Most noticeably, it assumed that an order just needs to link to its products, with the price attached to the product row. In truth, most databases of that kind need to attach the price at which an item was sold to the order itself — because products change price over time.

At the same time, it’s fairly common to want to keep the history of price changes for an item — which includes the ability to pre-approve time-limited discounts — so a products table is fairly unlikely to carry the price of each item as a column. Instead, I’ve commonly seen these databases have a prices table that references the items and provides start and end dates for each price. This way it’s possible to know, at any point in time, what the “valid price” of an item is. And as some of my former customers had to learn on their own, it’s just as important to track which VAT rate applies at which time.

Example ER diagram showing a more realistic shop database.

There are five tables. * indicates the primary key.

Order (*ID, Customer_ID, Billing_Address, Shipping_Address)
Order_Products(*Order_ID, *Product_ID, Gross_Price, VAT_Rate)
Product(*ID, Name)
Product_VAT(*Product_ID, *Start_Date, End_Date, VAT_Rate)
Product_Price(*Product_ID, *Start_Date, End_Date, Gross_Price)
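
Translating the price-history part of this into actual SQL, it could look something like the sketch below — the types and constraints are my own guesses, purely for illustration:

CREATE TABLE Product (
  ID   INTEGER PRIMARY KEY,
  Name VARCHAR(200) NOT NULL
);

-- Price history: one row per product per validity window.
CREATE TABLE Product_Price (
  Product_ID  INTEGER NOT NULL REFERENCES Product(ID),
  Start_Date  DATE    NOT NULL,
  End_Date    DATE    NOT NULL,
  Gross_Price DECIMAL(10, 2) NOT NULL,
  PRIMARY KEY (Product_ID, Start_Date)
);

-- VAT history, kept separately since rates change on their own schedule.
CREATE TABLE Product_VAT (
  Product_ID INTEGER NOT NULL REFERENCES Product(ID),
  Start_Date DATE    NOT NULL,
  End_Date   DATE    NOT NULL,
  VAT_Rate   DECIMAL(5, 2) NOT NULL,
  PRIMARY KEY (Product_ID, Start_Date)
);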

This is, again, fairly simplified. Most of the shopping systems you might encounter use structures that might appear redundant, particularly when you’ve been taught that SQL requires databases in normal form — but that’s the theory; practice is different. Significantly so, at times.

Among other things, if you have an online shop that caters to multiple countries within the European Union, the table holding the products’ VAT information might need to be extended to include the country each rate applies to. Conversely, if you only need to account for VAT in a single country, you may be able to reduce this to VAT categories — but keep in mind that products can and do change VAT category over time.
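
For the multi-country case, the Product_VAT table from the sketch above could instead carry a country column as part of its key — again just an illustration, not a complete design:

-- One VAT rate per product, per country, per validity window.
CREATE TABLE Product_VAT (
  Product_ID INTEGER NOT NULL REFERENCES Product(ID),
  Country    CHAR(2) NOT NULL,  -- ISO 3166-1 alpha-2 code, e.g. 'IE' or 'FR'
  Start_Date DATE    NOT NULL,
  End_Date   DATE    NOT NULL,
  VAT_Rate   DECIMAL(5, 2) NOT NULL,
  PRIMARY KEY (Product_ID, Country, Start_Date)
);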

Some people might now start wondering why you would go through this much trouble for an online store, which only needs to know what the price is right now. That’s a fair point, particularly if you have hundreds of megabytes of database to go through just to find the current price of a product. In the example above you would probably need a query such as

SELECT Product.ID, Product.Name, Product_Price.Gross_Price, Product_VAT.VAT_Rate
FROM Product
  LEFT JOIN Product_Price ON Product_Price.Product_ID = Product.ID
  LEFT JOIN Product_VAT ON Product_VAT.Product_ID = Product.ID
WHERE
  Product.ID = '{whatever}' AND
  Product_Price.Start_Date <= CURRENT_DATE AND
  Product_Price.End_Date > CURRENT_DATE AND
  Product_VAT.Start_Date <= CURRENT_DATE AND
  Product_VAT.End_Date > CURRENT_DATE;

It sounds like an expensive query, doesn’t it? And it seems silly to keep scanning the price and VAT tables over and over within the same day. It might also be entirely incorrect, depending on where it is used — I don’t know the rules of billing, but it may well be possible for an order to be placed close to a VAT change boundary, in which case the customer could have to pay the gross price as of the time of the order, but the VAT as of shipping time!

So what you end up using in many places, for online ordering, is a different database, one that is not the canonical copy. The term often used for this is ETL, which stands for Extract, Transform, Load. It basically means you can build new, read-only tables once a day, and select out of those in the web frontend. For instance, the above schema could be ETL’d to include a new, disconnected WebProduct table:

The same ER diagram as before, but this time with an additional table:

WebProduct(*ID, *Date, Name, Gross_Price, VAT_Rate)

Now with this table, the query would be significantly shorter:

SELECT ID, Name, Gross_Price, VAT_Rate
FROM WebProduct
WHERE ID = '{whatever}' AND Date = CURRENT_DATE;

The question that comes up when seeing this schema is “Why on Earth do you have a Date column as part of the primary key, and why do you need to query for today’s date?” I’m not suggesting that the new table should be generated to include every single day in existence, but it can be useful to let an ETL pipeline generate more than one day’s worth of data — you almost always want to generate today’s and tomorrow’s, so that you don’t need to take down your website for maintenance around midnight. And if you don’t expect prices to fluctuate on a daily basis, it is more resource-friendly to run the pipeline every few days instead of daily. It’s a compromise, of course, but that’s what system design is for.
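
As a rough idea of what that ETL step might look like — in reality it would more likely be a batch pipeline than a single statement; the sketch below is PostgreSQL-flavoured and the date arithmetic varies between engines — it could boil down to something like:

-- Rebuild the denormalised web view for today and tomorrow.
-- (The Date column may need quoting as an identifier in some dialects.)
DELETE FROM WebProduct;

INSERT INTO WebProduct (ID, Date, Name, Gross_Price, VAT_Rate)
SELECT Product.ID, d.Day, Product.Name,
       Product_Price.Gross_Price, Product_VAT.VAT_Rate
FROM (VALUES (CURRENT_DATE), (CURRENT_DATE + 1)) AS d(Day)
  CROSS JOIN Product
  JOIN Product_Price ON Product_Price.Product_ID = Product.ID
    AND Product_Price.Start_Date <= d.Day
    AND Product_Price.End_Date > d.Day
  JOIN Product_VAT ON Product_VAT.Product_ID = Product.ID
    AND Product_VAT.Start_Date <= d.Day
    AND Product_VAT.End_Date > d.Day;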

Note that in all of this I have ignored the issue of stock. That’s a harder problem, and one that might not actually be well suited to being solved with a simple database schema — you need to come to terms with compromises around availability, and with the fact that you need a single source of truth for how many items you’re allowed to sell… consistency is hard.

Closing my personal rant on database design, there’s another problem I want to put a spotlight on. When I started working on Autotools Mythbuster, I explicitly wanted to be able to update the content quickly. I have had multiple revisions of the book on the Kindle Store and Kobo, but even those lagged behind the website a few times. Indeed, I think the only reason they are not lagging behind right now is that most of the changes to the website in the past year or two have been cosmetic, and don’t apply to the ePub.

Even for a project like that, which uses the same source of truth for all the content, there’s a heavy difference in the time cost of updating the website versus updating the “book”. For real books that cost is even bigger — and that’s without even going into the realm of print. Producing content is hard, which is why I realised many years ago that I wouldn’t be able to carve out enough time to make a good author.

Even adding diagrams to this blog post has a slightly higher cost than just me ranting “on paper”. That’s why there are times when I could add more diagrams to illustrate my ideas, but I don’t, because the cost of producing them and keeping them current would be too high. The Glucometers Protocols site has a few rough diagrams, but they are generated with blockdiag so that they can be edited quickly.

When it comes to online tutorials, though, there’s an even bigger problem: possibly the vast majority of them are nowadays on YouTube, shot as videos with a person in frame, more like a teacher in a classroom who can explain things. If something in a video is only slightly incorrect, it’s unlikely to be re-shot — that would be an immense cost in time. Also, you can’t just update a YouTube video the way you update a Kindle book — you lose comments, likes, and view counts, and those things matter for monetization, which is what most of those tutorials are made for. So unless the mistakes in a video tutorial are Earth-shattering, it’s hard to expect the creators to go and fix them.

Which is why I think it’s incredibly important to get the small things right. Stop using first and last name fields in databases, objects, forms, and whatever else you are teaching people to build! Think a bit harder about what a product inventory database would actually look like! Be explicit in pointing out that you’re simplifying to an extreme, rather than providing a real-world-capable database design! And maybe, just maybe, start using examples that are ridiculous enough that they don’t risk being copied by a junior developer into the real world.

And let me be clear on this: you can’t blame junior developers for mistakes such as using a naïve database schema, if that’s all they have been taught! I had been saying this at my previous dayjob for a while: you can’t complain about the quality of newbies’ code unless you have given them the right information in the documentation — which is why I spent more time than average on example code and tutorials, tidying up the trimmings and making it easier to copy-paste the example code into a working change that follows best practices. In the words of a colleague wiser than me: «Example code should be exemplar.»

So save yourself some trouble in the future by making sure the people you’re training get the best experience, and can build your next tool to the best of specs.