Revisiting Open Source Washing Machines (in Theory)

A number of years ago I wrote a post about the idea of Free Software washing machines, using it as an allegory to point out how silly it is to attack people for using products with closed-source firmware when no alternative is readily available. At the time, the idea of wanting an open source washing machine was fairly ludicrous, despite my pointing out that it would have some interesting side effects for programmability.

Well, in 2020 I find myself really wishing we had more Open Source washing machines, and in general more hackable home appliances (or “white goods” as they are called over here), particularly now that connected appliances are both trendy and the butt of many jokes.

It’s not just that having access to the firmware of everything is an interesting exercise: most appliances last for years, which means they are “left behind” as technology moves on, user interfaces included.

Let’s take washing machines, for instance. Despite being older than the one in my apartment, my mother’s LG washing machine has a significantly more modern user interface. The Zanussi one we have here in London has one big knob to select the program – mostly vestigial from the time when it would be spring-loaded and move to step through the various phases – and then a ton of buttons to select things like drying mode (it’s a washer/dryer combo), drying time, and the delay start (awesome feature). You can tell that the buttons were an addition to the interface, and that the knob is designed to be as similar to the previous generation as possible. And it turns out the buttons are not handy: drying time and delay each have a single button, which means you can only increase those values; if you miss your target, you need to wrap around to zero and count up again.

On the other hand, my mother’s LG also has a knob, but that knob is just a free-spinning rotary encoder connected to a digital controller. While her model is not a dryer, I’m reasonably sure that the machine has a delay start feature, which is configured by pressing one button and then rotating the wheel. A more flexible interface, with a display a bit more capable than the two multi-segment ones our current machine has, would do wonders for usability, and that’s before going into any of the possible features of a “connected appliance”. Even observe-only ones: I would love to see a notification on our phones when the washing machine completes, so that we don’t forget we have clean clothes that need to be hung to dry. Yes, we actually forget sometimes, particularly before the pandemic when we left them to delay-run from the morning.

Replacing a washing machine just because the user interface is bad is a horrible thing to do to the planet. And in particular when living in rented accommodation, you don’t own the white goods; even when a defective one is replaced, you don’t get to choose it. You end up most of the time with whichever is the cheapest one in the shop, power efficiency be damned, since the landlord rarely pays for the electricity. So having hackable, modular washing machines would be just awesome: I could ask our landlord “Hey, can I get a smart module installed for the washing machine? I’ll pay the £50 it costs!” (Seriously, if it costs more than that, it would be a rip-off: most of the controls you need for this can hardly be more complicated than a Feather M4!)

Oh, and even if I just had access to the firmware of this washer/dryer, I might be able to fix the bug where the “program finished” beeper does not wait for the door’s lock magnet to disengage before sounding. The number of times I need to set a timer to remind myself to go and take the towels out in five minutes is annoying as heck.

But it’s not just washing machines that would be awesome to have hackable and programmable. We have a smallish microwave and convection oven combo. I got it in Dublin, and I chose this model because it was recommended by an acquaintance for its insistent beeping when the timer completes. If you have ever experienced hyperfocus to any degree, you probably understand why such a feature is useful.

But in addition to the useful feature, the oven comes with a bunch of pretty much useless ones. There’s a number of “pre-programmed” options for defrosting, making popcorn, and other things like that, which we would never use. Not just because we don’t eat those foods, but also because such presets are rarely recommended: if you ever watch cooking channels such as How To Cook That, you find that professionals usually suggest specific ways to use the microwave — including Ann Reardon’s “signature” «put it in the microwave for 30 seconds, stir, put it back for 30 seconds, stir, …».

And again in terms of user interfaces: the way you configure the convection temperature is by clicking the “Convection” button to step down from full power (200W), and if you overshoot, oops, you get to cycle through again. Then if you turn the knob (a free-spinning one this time, at least), you’re setting the timer, without pre-heating. If you want to pre-heat, you need to cancel it all and start over, and… you see my point.

This is a very simple appliance, and it works perfectly fine. But if I could just open it and replace the control panel with something nicer, I would love to. I think I would like something I can connect to a computer (or maybe plug a USB thumb drive into), to configure my own saved parameters, selecting for instance “fish fingers” and “melted butter”, which are the most likely uses of the oven for us at home.

But again, this would require a significant change in the design of appliances, which I don’t think is going to happen any year now. It would be lovely, and I think there might be a chance for Open Source and Open Hardware amateurs to at least demonstrate the possibility — but it’s the kind of project I can only wish for, with no hope of getting to work on it myself, not just for the lack of time but for the lack of space: if you wanted to try hacking on a washing machine, you’d definitely need a more spacious and better-equipped workshop. My limit is still acrylic lamps.

Why I Care About Licensing

Both on the blog and on Twitter, I have ranted at length about projects missing licensing information altogether, not providing licensing information for specific files, or providing conflicting licensing information. As you can imagine, this is a topic I’m very attached to, which is why I have been following the REUSE guidelines to make sure that all my (currently active) projects follow the specification.

Unfortunately this care is not shared by many developers, even those who consider themselves part of the Free Software movement, and this causes friction, poisons the well in both directions, and is overall detrimental to the community and the movement. Even more so than when people care deeply and disagree on the “correct” licensing terms.

While I am most definitely not a lawyer, and I speak most definitely only for myself and not my employer, let me try to give you a rundown of what’s going on here.

First of all, we need to start with a simplification and accept, hand-wavily, that without an explicit license allowing it, the distribution, modification, and integration of source code is not allowed, or at least that’s the way it is perceived in the wider world. Free Software licenses, more or less permissive, spell out the terms under which distribution and usage are allowed.

It Is But An Example

As far as I can tell, there’s no provision anywhere exempting source code used in documentation from these limitations, except insofar as the license on the documentation itself applies if not otherwise overridden. And that’s how I started engaging with Adafruit: the documentation for most of their CircuitPython libraries provides a lot of useful examples — and as it turns out they were already released under an open source license (MIT), but that was not obvious when looking at the docs sites themselves. So I convinced them to add SPDX headers to all their source code, including the examples — and now you can read an example and see immediately which license it’s released under. Isn’t that cool?
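For illustration, this is all it takes: two comment lines at the top of the file, and the rest is ordinary code (the module and function below are made up for the example, not taken from any Adafruit library):

```python
# SPDX-FileCopyrightText: 2020 Jane Example <jane@example.com>
# SPDX-License-Identifier: MIT
"""Hypothetical example module: the two SPDX tags above carry all the
licensing information a reader (or a tool like reuse lint) needs."""


def duty_cycle(high_us, period_us):
    """Return the fraction of the period that the signal spends high."""
    return high_us / period_us
```

Anyone copying this example out of a documentation page can now tell at a glance that it’s MIT-licensed, without digging through the repository.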

Unfortunately, some developers are stubborn, find adding two lines to their documentation examples a distraction, and argue against it, making it annoying for others to use their example source code without either infringing copyright or going the long way around to find the right answers.

Websites, PDFs, Books, they are all equal

But this goes double for code that is explicitly written only as example material! Let me take a bit of a detour: my wife went through the awesome Python Crash Course a few months ago. While it suffers from a few of the issues I have already complained about when it comes to splitting names, the book is fairly well written and has hands-on exercises that provide enough of a stretch for “my first Python project”. In the later parts of the book, one of the long-building exercises is writing a clone of Space Invaders with PyGame, which turned out to be interesting not just for her writing it, but for me reviewing it as well, as game programming is definitely not a skill I ever spent time acquiring.

Now, remember I said there’s space to stretch? While the book guides you through building the very basic framework for “Alien Invasion” with full code to go with it, it leaves a lot of holes to be filled. Not just the assets (which it pretty much suggests you Google for and find somewhere online, without any discussion of what you can and cannot use — shout out to the Noun Project, which I use for my own projects nowadays), but also some of the more advanced gameplay, and a lot of the refactoring — the way you write the game following the book is definitely aimed more at teaching than at maintainability. So when my wife finished with the book, I started showing her examples of how to refactor the code and introduce new features. While the basic skeleton is the same as the original from the book, the version she ended up with was nearly fully rewritten. And it’s all in a Git repository!

But she has nothing to show for it. The source code in the book does not provide any licensing information. I reached out to Eric Matthes (the book’s author) on Twitter asking if he’d consider applying an open source license to the code, so that she could publish it on her GitHub account to show off to some of her friends – mentioning explicitly that I’d have liked to use it as a base to try out BeeWare projects and see about contributing to some. He said he’d think about it, but that he wouldn’t feel right releasing it under a permissive license that would allow someone to take it and sell it on an app store or similar. So her options were to ignore licensing and publish the code anyway (after all, nobody cares, and I’m sure I can find plenty of people who did exactly that), or to comply with the (lack of) license, keep it to herself, and only show her friends a video of it working. She went for the latter, as we had already had a long discussion about copyright when J Salmeron brought up the topic (and dang, we missed the opportunity to shake his hand as we were standing right behind him at the Beast in Black concert in Amsterdam last year!).

Provide It And They Will Build

There is one case that, personally, drained my will to contribute to an ecosystem even more than the example above. After all, Python Crash Course is a great book, and the only really good reason to publish the code is for “bragging rights” — which is not to say it’s not something, but it’s not the end of the world either.

When a commercial vendor provides you with an extensible ecosystem to build upon, but doesn’t play by the same rules, it’s just… disappointing. In this case the issue is with Saleae, the manufacturer of the Logic Pro 16 analyzer I use for a bunch of different things. You may have noticed me providing screenshots from it when talking about fake candles and infrared. As a vendor, Saleae has very good user support: when I complained on Twitter that I had wasted two hours chasing ghosts because I didn’t realise I had forgotten to connect the USB cable to the analyzer, and the software didn’t make it clear enough that it was showing me demo garbage, they engaged, asked me what I would have done differently, and delivered the fix in less than a month. That was awesome support.

So where does it go wrong? Well, in June they updated their software to support Python-based extensions for the analysis of specific protocols. I was actually interested in adding support for IR decoding to make my life easier in my TV-controlling project, so when they posted that one of their employees had built a duty-cycle measurement tool and published it on GitHub, I was thrilled!

Except… the repository is there, the source code is there, but there is no license. The extension is pretty much a tutorial by itself on how to build what I needed, but it comes with no license attached, and as such I can’t use its code as a base for my own extension. And while I could possibly learn from it, it’s also a poison pill: with no license, if I copy it too literally, am I infringing copyright? Maybe, who knows? The author says I should «feel free to look, copy and use [his] Logic 2 extensions in any way [I] would like», but that’s not exactly a comforting statement when you’re contributing while employed by a company.

Final Thoughts

Just be yourself (this is pre-recorded). If you do care about Free Software, please take licensing seriously. If you don’t care about Free Software, because you don’t believe in the ideals behind it, or you’re just not part of the ecosystem, then I can’t really blame you for disrespecting its licenses; but then again, if you expect people to respect proprietary software licenses, you probably should respect all licenses. It’s the same problem as with software piracy.

I do believe that the folks at REUSE are doing a great service for all of us by making it possible to spell out licenses clearly and openly, and making it easy for others to modify and copy the code that we want to be out there in the world. It doesn’t take much time to use the tool to add a few lines to a text file, or an additional text file for binary files. Please take the chance to sort this out!

Converting Unpaper to Meson

You may remember that I took over Unpaper a number of years ago, as the original author was not interested in maintaining it in the long term. One of the first things I did was replace the hand-rolled build system with Autotools, as that made packaging it significantly simpler. That was followed by replacing the image decoding, first with libav and more recently with FFmpeg.

For various reasons, I did not spend much time on Unpaper over the years I spent in my last bubble. When I joined my current employer, I decided that I cared more about getting Unpaper back to being a project people can contribute to, and less about retaining the copyright on my contributions, which makes it easier for me to work on it without having to carve out some special time for it.

One of the main requests on the GitHub project over these years has been Windows support. And I did say that Windows 10 is becoming increasingly interesting for Free Software development. So when I was asked to rant a bit about build systems, which I did over on Twitch, I decided to take a stab at figuring out whether Meson (which supports Windows natively) would be a good fit. And I did that using Visual Studio Code and Debian/WSL!

If you haven’t seen the video yet, spoiler alert: I tried, got it working within ten minutes, and it worked like a charm. I’m not kidding you, it pretty much worked at the first attempt (well, the first session, not the first execution), and the way it works made total sense. You can tell that the folks involved in building Meson (including Gentoo’s own Nirbheek!) knew what they were setting out to do and how to make it all fit together. Even small touches like keeping large file support always enabled made me a very happy user.

I now have a branch on GitHub for the Meson build, although it’s incomplete: it doesn’t install any of the docs, it doesn’t install the man page, and it doesn’t build or run the tests. For all of those I created a project to track what needs to be done: time to move on from the current implementation, it’s 2020!

The test system in the current Autotools version of Unpaper leverages Automake’s make check implementation, together with a C program that compares the expected outputs with what is generated by the newly-built unpaper binary. It also needs to allow a threshold of difference between the two, because precision is not guaranteed in either direction. This is incredibly fragile, and indeed it is currently failing for… I’m not sure which reason. Getting a “diff” of generated versus expected output in C is fairly hard and deserves its own project. Relying on Python to instrument and run the tests would instead make them much easier to maintain, as you wouldn’t need to keep at least three integration points working together.
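The threshold comparison itself would be a few lines of Python. This is a sketch under my own assumptions — the function name and the exact tolerance semantics are made up, not taken from Unpaper’s C comparator:

```python
def images_match(expected, actual, threshold=2.0):
    """Compare two equally-sized raw grayscale buffers.

    Accepts the pair when the mean absolute per-pixel difference stays
    within `threshold`, since the output is not guaranteed to be
    bit-identical across runs, in either direction.
    """
    if len(expected) != len(actual):
        return False
    total = sum(abs(a - b) for a, b in zip(expected, actual))
    return total / len(expected) <= threshold
```

Producing a human-readable diff of which regions went out of tolerance from here is a loop and a print statement, rather than its own C project.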

Something similar goes for documentation: right now it is split between some Markdown files and a single DocBook (XML) file that is converted to a man page with xsltproc. Once the Python bullet is bitten, there’s no reason not to just use reStructuredText and Sphinx, which already provides integration to generate man pages — and honestly, nowadays I feel the XML version is not a good source format, because it can’t be read by itself.
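Sphinx’s man page builder only needs a short declaration in conf.py — the description and author fields below are placeholders, not the actual project metadata:

```python
# conf.py fragment: tell Sphinx's "man" builder what to generate.
# Tuple fields: (source start file, output name, description,
#                authors list, manual section).
man_pages = [
    ("unpaper", "unpaper",
     "post-processing tool for scanned sheets of paper",
     ["unpaper contributors"], 1),
]
```

After that, running sphinx-build with the man builder produces the troff output that xsltproc currently generates from the DocBook source, and the same reStructuredText stays readable on its own.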

What I have not done yet is making sure that the Meson build system allows a Windows build of Unpaper. The reason is relatively simple: while I have started using Visual Studio Code, and clang is available for Windows in the form of a binary, and so is FFmpeg, fitting it all together will probably take me more time, as I’m not used to this development environment. If someone is interested in making sure this works out of the box, I’m definitely happy to review pull requests.

So yeah, I guess the Autotools Mythbuster can provide a seal(ion) of approval to Meson. Great work, folks! Happy to see we have a modern build system available that compromises in the right places instead of being too dogmatic!

The Rolodex Paradigm

Silhouette of a rolodex.

Created by Marie-Pierre Bauduin from Noun Project.

In my previous bubble, I used as my “official” avatar a clipart picture of a Rolodex. This confused a lot of people, because cultures differ and, more importantly, generations differ: it turned out that a lot of my colleagues and teammates had never seen or heard of a Rolodex. To quote one of the managers of my peer team, when my avatar showed up gigantic on the conference room’s monitor: «You cannot say that you don’t know what a Rolodex is, anymore!»

So, what is a Rolodex? Fundamentally, it’s a fancy address book. Think of it as a physical HyperCard. As Wikipedia points out, though, the name is sometimes used «as a metonym for the sum total of an individual’s accumulated business contacts», which is how I’m usually using it — the avatar is intentionally tongue-in-cheek. Do note that this is most definitely not the same as a Pokédex.

And what I call the Rolodex Paradigm is mainly the idea that the best way to write software is not to know everything about everything, but to know who knows something about what you need. This is easier said than done, of course, but let me try to illustrate what I mean by all of this.

One of the things I have always known about myself is that I’m mainly a generalist. I like knowing a little bit about a lot of things, rather than a lot about a few things. Which is why on this blog you’ll find superficial posts about fintech, electronics, the environment, and cloud computing. You’ll rarely find in-depth articles about anything these days, because to get into that level of detail I would need to get myself “in the zone”, and that is hardly achievable while maintaining work and family life.

So what do I do when I need information I don’t have? I ask. And to do that, I try to keep in mind who knows something about the stuff that interests me. It’s the main reason why I used to use IRC heavily (I’m still around, but not active at all), why I got on identi.ca, why I follow blogs and write this very blog, and why I’m on social networks including Twitter and Facebook – although I’ve switched from using my personal profile to maintaining a blog’s page – and have been fairly open with providing my email address to people. Because to be able to ask, you need to make yourself available to answer.

This translates similarly to the workplace: at bigger companies that come with their own bubble, it’s very hard to know everything about everything, so it can be easier to build up a network of contacts who work in different areas within the company, and in particular not just in engineering. In a big company this even has a different set of problems to overcome, compared to the outside, open source world.

When asking someone for help in the open source world, you need to remember that nobody is working for you (unless you’re explicitly paying them, in which case it’s less about asking for help and more about hiring it), and that while it’s possible that you’re charismatic enough (or well known enough) to pull off convincing someone to dedicate a significant amount of time to solving your problems, people are busy and may have other priorities.

In a company setting, there’s still a bit of friction in asking someone to dedicate a significant amount of time to solving your problem rather than theirs. But if the problem is also a problem for the company, it’s much more likely that you can find someone who will at least consider putting your problem in their prioritised list, as long as they can show something for the work done. The recognition is important not just as a way to justify the time (which is itself reason enough), but also because in most big companies your promotion depends on demonstrating impact in one way or another.

Even where more formal approaches to recognition (such as Google’s Peer Bonus system) are not present, consider sending a message to the manager of whoever helped you. Highlight how they helped not just you personally, but the company: for instance, they may have dedicated one day to implementing a feature in their system that saved you a week or two of work, either implementing the same feature yourself (without expertise in the system) or working around it; or they might have agreed to join a loosely-sketched one-hour meeting to provide insights into the historical business needs for a service, stopping you from making a bad decision in a project. It will go a long way.

Of course, another problem is finding the people who know about the stuff you need — particularly if they are outside of your organization and outside of your role. I’m afraid this has become a lot harder, given that we’re now all working remotely from different houses with very little to no social overlap. So this really relies on two points: company culture and manager support.

From the company’s point of view, letting employees build up their networks is convenient, which is why so many big companies provide spaces for, and foster, interactions between employees that have nothing to do with work itself. While game rooms and social events are often sold as “perks” when recruiting, they really provide relaxed “water cooler” moments that build those all-too-precious networks that don’t fit into an org structure. And that’s why inclusive social events are important.

So yeah, striking up conversations with colleagues who are virtual strangers, talking about common interests (photography? Lego? arts?), can lead to knowing what they are working on, and once they are no longer strangers, you’ll feel more inclined to ask for help later. The same goes for meeting colleagues at courses: I remember going to a negotiation training based around Stuart Diamond’s Getting More, and meeting one of the office’s administrative business partners, who is also Italian and likes chocolate. When a few months later I was helping to organize VDD ’14, I asked for her help navigating the amount of paperwork required to get outsiders into the office over a weekend.

Meeting people is clearly not enough, though. Keeping in touch is also important, particularly in companies where teams and roles are fairly flexible, and people may be working on very different projects after months or years. What I used to do for this was make sure to spend time with colleagues I knew from something other than my main project when traveling. I used to travel from Dublin to London a few times a year for events, and I ended up sitting close to teams I didn’t work with directly, which led me to meet a number of colleagues I wouldn’t otherwise have interacted with at all. And later on, when I moved to London, I actually worked with some of them on my own team!

And that’s where the manager support is critical. You won’t be very successful at growing a network if your manager, for example, does not let you clear your calendar of routine meetings for the one week you’re spending in a different office. And similarly, without a manager that supports you dedicating some more time for non-business-critical training (such as the negotiation training I’ve mentioned), you’ll end up with fewer opportunities to meet random colleagues.

I think this was probably the starkest difference between my previous employer’s offices in Dublin and London: my feeling was that the latter had far fewer opportunities to meet people outside of your primary role and cultivate those connections. But it might also be caused by the fact that many more people live far enough from the office that commuting takes longer.

How is this all going to work in a future where so many of us are remote? I honestly don’t know. For me, the lost time sitting at the coffee area talking about things with colleagues I didn’t share a team with is one of the main reasons why I hope that one day the lockdown will be over. For the rest, I’m trying to get used to talking over the Internet more.

Birch Books With ATmega48

I last wrote about the Trinket M0 version of Birch Books, which uses CircuitPython and a much simplified code structure, relying on one of two possible I/O expanders. But it felt like a bit of a cheat to implement this with a Trinket M0, which is a very powerful MCU for something as simple as keeping a few lights on in a LEGO set.

So, as suggested by a colleague, I looked into the ATmega48, a close relative of the ATmega328 that the Arduino is based on. These two are actually pin- and source-compatible, so while I have now implemented the project with the 48, the 328 (which is more commonly available) should work the same way.

The ATmega48 is an interesting compromise between the 8052 and the Trinket M0. It’s an 8-bit microcontroller, and it requires writing low-level C to program, but at the same time it doesn’t require any passive components (as long as you don’t need more than 8MHz, and you didn’t make the mistake I made of having an active-high RST button). It turned out that the suggestion was a very good one, and the ATmega48 does indeed seem to fit this project very nicely.

Programming the ATmega48 is done via a 6-pin connector that provides power, the SPI bus, and the RST line. This can be achieved with Adafruit’s FT232H board: you connect the SPI lines just as described on the breakout, use D3 for the RST line, and then tell avrdude to use the UM232H programmer variant. No other configuration is needed. I actually plan on crimping my own harness for this later, so I don’t need to use a breadboard all the time.
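The resulting invocation is a single line (the firmware file name below is hypothetical, and of course it assumes the wiring just described):

```shell
# Flash over the FT232H breakout: SPI lines as per the breakout's
# silkscreen, D3 wired to RST, no extra avrdude configuration needed.
avrdude -c UM232H -p m48 -U flash:w:birch_books.hex
```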

The other main difference between the ATmega48 and the 8052 is that the former has nearly half the pins, but it still has enough I/O lines (14 outputs, 2 inputs) for Birch Books that there is no need to add an expander. There are a few more I/O lines as well, but they are “overcommitted”: by default they are configured with different meanings. For instance, the external clock pins can be re-configured as additional I/O lines, and even the RST line can be disabled by fuses to gain one more I/O.

Let’s talk about the clock! The ATmega48 ships from the factory with its fuses configured for a 1MHz clock, but can be configured for up to 8MHz. Alternatively it can use an external clock and run up to… I forgot, the point being that I don’t need any of that. I did order 100 pieces of 12MHz crystals just to have them at hand when I made an order from LCSC, but for this project the 1MHz clock speed is plenty. So I left it configured the way it came from the factory, and that works perfectly fine.

The code ended up being very similar to the 8052 version, of course: it initializes the ports, sets up a timer interrupt, and in the main loop checks for the test button and otherwise keeps updating the port state depending on the calculated virtual hour. While the ATmega48 would have allowed more fine-grained control of the timing, and thus allowed me to use the originally planned schedule, I decided to keep the same 1/16th of a second “tick” timer, and the same formulas to calculate the schedule.
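To give an idea of the shape of it — the compression factor and the on/off thresholds below are made up for illustration, not the actual values from the firmware:

```c
#include <stdint.h>

/* The timer interrupt bumps a tick counter 16 times per second and
 * everything else is derived from it.  Here one "virtual hour" is 256
 * ticks (16 real seconds), so a full day replays every 6.4 minutes. */
#define TICKS_PER_VIRTUAL_HOUR 256u

static uint8_t virtual_hour(uint16_t ticks)
{
    return (uint8_t)((ticks / TICKS_PER_VIRTUAL_HOUR) % 24u);
}

/* The main loop then maps the virtual hour onto the output port:
 * lights on through the evening and early morning, off during the
 * virtual "day" (thresholds are placeholders). */
static uint8_t lights_on(uint8_t hour)
{
    return hour >= 19u || hour < 6u;
}
```

Keeping the schedule as pure functions of the tick counter is what made the 8052 and AVR versions easy to compare side by side.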

Another thing allowed by the ATmega48, or rather by AVR-GCC, would be making the code more modular. As I ranted before, SDCC is not the most aggressive compiler when it comes to optimizing around functions, which is why, instead of having more structured code, I ended up with a lot of spaghetti. To make it easier to compare and maintain the two codebases together, I decided not to refactor the spaghetti in the AVR version, but I think that’s okay: the simplification around the timer handling is already a huge step forward anyway.

On the logic side, I decided to implement a one-second power-on self-test that turns on all of the lights. This was meant not just to make it easy to test the code while I had it on my desk, but also as a general “healthiness check” for the LEDs. I backported it to the CircuitPython code while I was at it. And I removed the “knuckle pattern strobing” (which turned on every other LED alternately), because it doesn’t work quite right on the real LEGO set (where two rooms use three LED lines tied together), simplifying the test logic.

The other thing I found out by using the ATmega48 is that there is some difference in the current limiting between the Trinket M0 and the other two controllers. When I installed the Trinket M0-based controller, the lights in the bookstore were much brighter, and even though that looked nice, it had me worried, because I knew that the LEDs I’ve been using tend to brown out quickly if driven at maximum current. I managed to pretty much kill one LED by keeping it running for just two days without a current-limiting resistor. And indeed, a month later, some of the LEDs are now nearly completely dark and will need to be replaced.

Since the actuator board is the same, powered from 5V, and with 47 Ohm resistor networks for the LEDs in each case, I’m not entirely sure why this is happening. I might need to add some inline current metering to see how it all fits together. But that’s for another time.

Newcomers, Teachers, and Professionals

You may remember that I already had a go at tutorials, after listening in on one that my wife had been going through. Well, she’s now learning C after hearing me moan about higher- and lower-level languages, and she started with Harvard’s CS50 class, which is free to “attend” on edX. I am famously not a big fan of academia, but I didn’t think it would make my blood boil as much as it did.

I know that it’s easy to rant and moan about something that I’m not doing myself. After all, you could say “Well, they are teaching at Harvard, you are just ranting on a C-list blog that is followed by less than a hundred people!” and you would be right. But at the same time, I have over a decade of experience in the industry, and my rants explicitly contrast what they say in the course with what “we” do, whether in open source projects or in the bubble.

I think the first time I found myself boiling and got onto my soapbox was when the teacher said that the right “design” (they keep calling it design, although I would argue it’s style) for a single source file is to have the includes, followed by the declarations of all the functions, followed by main(), followed by the definitions of all the functions. That is not something I’ve ever seen happen in my experience, because it doesn’t make much sense: duplicating declarations and definitions in C is already an unfortunate chore due to headers, so why force even more of it within the same source file?

Indeed, one of my “pre-canned comments” in reviews at my previous employer was a long form of “Define your convenience functions before calling them. I don’t want to have to jump around to see what your doodle_do() function does.” Now it is true that in 2020 we have the technology (VSCode’s “show definition” curtain is one of the most magical tools I can think of), but if you’re anything like me, you may even sometimes print out the source code to read it, and having it flow in natural order helps.

But that was just the beginning. Some time later, as I dropped by to see how things were going, I saw a strange string type throughout the code. It turns out the course provides a special header, (later) described as “training wheels”, that includes typedef char *string;. Possibly understandable, given that it takes some time to get to arrays, pointers, and from there to character arrays, but… could it have been called something other than string, given the all-too-similarly named std::string of C++?

Then I made the mistake of listening in on more of that lesson, and that just had me blow a fuse. The lesson takes a detour to try to explain ASCII: the fact that characters are just numbers looked up in a table, and that the table is typically 8-bit, with no mention of Unicode. Yes, I understand Unicode is complicated, and UTF-8 and other variable-length encodings will definitely give a headache to a newcomer who has never seen a programming language before. But it’s also 2020, and it might be a good idea to at least put out the idea that there is such a thing as variable-length encoded text, and that no, 8-bit characters are not enough to represent people’s names! The fact that my own name has a special character might have something to do with this, of course.

It got worse. The teacher decided to show some uppercase/lowercase trickery on strings, and explained how you add or subtract 32 to go from one case to the other. That is limited not only by character set, but most importantly by locale; I guess the teacher never heard of the Turkish four Is, or maybe there is some lack of cultural diversity in the writing room for these courses. I went on a rant on Twitter over this, but let me reiterate it here because it’s important: there is no reason why a newcomer to any programming language should be taught to add or subtract 32 from 7-bit ASCII characters to change their case, because it is not something you want to do outside of very tiny corner cases. It’s not safe in some languages. It’s not safe with characters outside the 7-bit Latin alphabet. It is rarely the correct thing to do. The standard library of any programming language has locale-aware functions to uppercase or lowercase a string, and that is what you need to know!

Today (at the time of writing) she got to allocations, and I literally heard the teacher go for malloc(sizeof(int)*10). Even granting that you can start with a bad example and improve on it, why on Earth they bother teaching malloc() first, instead of calloc(), is beyond my understanding. But what do I know? It’s not like I spent a whole lot of time fixing these mistakes in real software twelve years ago. I will avoid complaining too much about the teacher suggesting that the behaviour of malloc() was decided by the clang authors.

Since there might be newcomers reading this and feeling a bit lost as to why I’m complaining: calloc() is a (mostly) safer alternative for allocating an array of elements, as it takes two parameters, the size of a single element and the number of elements to allocate. With this interface it is no longer possible to have an integer overflow when calculating the total size, which reduces security risks. In addition, it zeroes out the memory rather than leaving it uninitialized. This does have a performance cost, but if you’re a newcomer just learning the language, you should err on the side of caution and use calloc() rather than malloc().

Next up is my facepalm at the explanation of memory layout. Be prepared, because this is the same teacher who, in a previous lesson, said that an integer variable’s address might vary but can be asserted to be 0x123 for the sake of the explanation, completely ignoring the whole concept of alignment. To explain “by value” function calls, they digress again, this time explaining heap and stack, and describe a linear memory layout where the code of the program is followed by the globals and then the heap, with the stack at the bottom growing up. Which might have been true in the ’80s, but hasn’t been true in a long while.

Memory layout is not simple. If you wanted to explain a realistic memory layout, you would have to cover the differences between physical and virtual memory, memory pages and page tables, hugepages, page permissions, W^X, copy-on-write, ASLR, … So I get that the teacher might want to simplify, skip over a number of these details, and give a simplified view of the memory layout. But as a long-time professional in the industry, I would appreciate them being upfront with a “By the way, this is an oversimplification; reality is very different.” Oh, and by the way, the stack grows down on x86/x86-64.

This brings me to another interesting mess, in my opinion. The course comes with some very solid tools: a sandbox environment already primed for the course, an instance of the AWS Cloud9 IDE with the libraries already installed, a fairly recent version of clang… but then it sticks to a dubious old style of C, with strcpy() and strcmp() and no reference to more modern, safer options (nevermind that glibc still refuses to implement the C11 Annex K safe string functions).

But then they decide to show the newcomers how to use Valgrind, of all things, and not just briefly: they even show them how to use a custom post-processor for Valgrind’s report output, because it is otherwise hard to read. And this in a course using clang, which could rely on tools such as ASAN and MSAN to report the same information in a more concise way.

I find this contrast particularly gruesome: the teacher appears to think that memory leaks are an important defect to avoid, so much so that they hand a power tool such as Valgrind to a class of newcomers… but they don’t find Unicode, and correctness in representing people’s names (because of course they talk about names), to be as important. I find these priorities totally inappropriate in 2020.

Don’t get me wrong: I understand that writing a good programming course is hard, and that professors and teachers have a hard job in front of them when it comes to explaining complex concepts to people who are more eager to “make” something than to learn how it works. But I do wonder whether sitting a dozen professionals through these lessons wouldn’t make for a better course overall.

«He who can, does; he who cannot, teaches» is a phrase attributed to George Bernard Shaw. I don’t really agree with it as it stands, because I have met awesome professors and teachers; I already mentioned my Systems teacher, who I’m told retired just a couple of months ago. But in this case I can tell you that I wouldn’t want to have to review the code (or documentation) written by that particular teacher, as I’d have a hard time keeping my comments constructive after so many facepalms.

It’s a disservice to newcomers that this is what they are taught. And it’s professionals like me who are causing this, by (clearly) not pushing back enough on academia to be more practical, or not building better courseware for teachers to rely on. But again, I rant on a C-list blog; I don’t teach at Harvard.

NeoPixel Acrylic Lamp: A Few Lessons Learnt

Last month I wrote some notes about Chinese acrylic lamps, the same kind I used successfully for my insulin reminder. As I said then, I wanted to share the designs as I went along, and I now have a public repository for what I have, although it has to be said that you shouldn’t trust it just yet: I have not yet managed to get a board to properly turn on. So instead, in the spirit of learning from others’ mistakes, let me show you my blooper reel.

Lesson #1: Don’t Trust The Library

A picture of two SMT-manufactured acrylic lamp boards side by side.

These were unusable.

You may remember that in the previous post I showed the “sizing test”: a print of the board with no components on it, which I used to make sure that the LEDs and the screw holes would align correctly in the base. Since I had to take the measurements myself, I was fairly worried I would get some of them wrong.

The size test was sent to print before I came to the conclusion that I actually wanted to use NeoPixel LEDs instead of simple RGB LEDs, so it was printed to host “standard” 3535 common-anode LEDs. I then changed the design to a sparser one using the WS2812B-Mini, which comes in a 3535 package (meaning it’s 3.5mm by 3.5mm in size) and is compatible with the NeoPixel libraries. This meant more capacitors (although modern versions of the WS2812B don’t seem to require them), but less logic around connecting everything up.

As you can see from the image above, I also added space on the side to solder headers, to connect past the USB-to-UART and directly to the NeoPixel row. That was probably my best choice ever. When the boards arrived, the first thing I did was connect the NeoPixel control pins to try turning them on and… I discovered that nothing worked.

The answer turned out to be that I trusted the Adafruit Eagle library too much: the pinout for the 3535 variant of the WS2812B (the “mini”) has been wrong since it was added, and an issue about it has existed since 2018. I sent a pull request to correct the pinout, but it looks like Adafruit is not really maintaining this repository anymore.

Because of the pinout mistake, there’s no way to access the NeoPixels on any of the ten boards I had printed in this batch, and I also missed one connection on the CP2104, which means I can’t even use them as bulky USB-to-UART adapters as they are. But I can put this down to experience and not worry too much about it, since it’s still a fairly cheap build compared with some of the components I’ve been playing with.

Lesson #2: Datasheets Can Still Lie To You

So I ordered another set of boards with a new revision: I replaced the 3535 version with the 5050, after triple-checking that the pinout would be correct. While I did have a fixed part in my library, I thought it better not to even suggest using a part with a widespread broken pinout, and by then I had confirmed that the 5050 would fit through the aperture. I also moved some components around to make the board a bit tighter and give it a more recognizable shape.

The boards arrived without the ESP32-WROOM module on them; that’s because it’s not in the list of parts that JLCPCB can provide, despite it being sold by their “sister company” LCSC. That’s alright, because I had procured the modules separately on AliExpress (before I figured out that ordering from LCSC is actually cheaper). And I started the testing in increasing order of satisfaction: can the NeoPixels be addressed? Yes! Can the CP2104 enumerate? Yes! Does it transmit/receive over serial? Yes! Does esptool recognize the newly soldered ESP32? Yes! Does it fit properly in the base with the module on, with space to connect the USB plug? Yes! Can I flash MicroPython on it? Well…

This is where things got annoying and took me a while to straighten out. I could flash MicroPython on the ESP32 module. The programming worked fine, and I could verify the content of the flash, but I never got the REPL prompt back over serial. What gives?

It turns out I had only read part of the datasheet for the module, and not the Wiki: there is something not quite obvious otherwise, which is that GPIO0 and GPIO2 are special and shouldn’t be used for general I/O. Instead, the two of them are sampled at reset to select the boot mode and to enter flashing. This is why GPIO0 is usually tied to a “BOOT” switch on ESP32 breakout boards.

How does esptool usually handle these? By convention it expects the RTS and DTR lines of the serial adapter to be connected to EN (reset) and GPIO0 (boot) respectively, together with “additional circuitry” to avoid keeping the board in reset mode if hardware flow control is enabled. Of course I didn’t know this when I sent the boards to manufacture, and I am still not sure what that additional circuitry looks like (there is some circuitry on the SparkFun Thing Plus, but it’s not quite clear whether it’s the same thing Espressif is talking about).

I have seen a few schematics for breadboard-compatible ESP32 modules, but I have not really found a good “best practices for including an ESP32 module in your design” document, despite looking around for a while. I really hope that at least documenting what can go wrong will help someone else in the future.

Next Steps

I have a third revision of the design that should address the mistakes I made, although at the time of writing this blog post I still need to figure out the “additional circuitry”, which I might just forgo, reminding myself that hardware flow control with the ESP32 is probably a bad idea anyway, since the lines are used for other purposes.

I also made sure to add reset and boot buttons this time, although that turned out to be a bit of a headache to make sure they would fit with the base. The main issue with “classic” top-actuated buttons is that putting them on the top of the board makes them hard to find once mounted, while putting them on the bottom risks them getting pressed once I fit the bottom of the base back in. I opted for side-actuated buttons, so that they are reachable when the board is mounted in the base, and marked them on the bottom of the board, the same way as the connectors.

I’m also wondering if I should at least provide the ability to solder in the “touch button” that is already present on the base, and maybe add a socket for connecting an IR decoder, to reuse the remote controls I now have plenty of. But that might all be over-engineering, who knows!

Unfortunately, I don’t think I’ll be ordering a new set of boards in the immediate future. I’ve already said this on Twitter, and it’ll deserve an eventual longer-form post, but it seems like we might be moving sooner rather than later. And while JLCPCB and LCSC are very fast at shipping orders, I will mostly be trying not to order anything new until we have some certainty of where we’ll be in a few months. This is again a good time to put everything into a project box, and take it out when the situation feels a bit more stable.

My Password Manager Needs a Virtual USB Keyboard

You may remember that a couple of years ago, Tavis convinced me to write down an idea of what my ideal password manager would look like. Later the same year I also migrated from LastPass to 1Password, because it made family sharing easier, but as before this was a set of different compromises.

More recently, I came to realise that there’s one component my password manager needs, and I really wish I could convince the folks at 1Password to implement it: a virtual USB keyboard on a stick. Let me try to explain, because this has already generated negative reactions on Twitter, including from people who didn’t wait to understand how it all fits together first.

Let me start with a note: I have been mulling over an idea like this for a long while, but I have not been able to figure out how to make it safe and secure, which means I don’t recommend just running with my rough idea as it is. I then found out that someone else already came up with pretty much the same idea, and did some of the legwork to get it to work, back in 2014… but nothing came of it.

What I am suggesting is some kind of USB hardware token that can be paired with a phone, say over Bluetooth, and be instructed to “type out” text via USB HID. Basically, a “remote keyboard” controlled from the phone. Why? Let’s take a step back and see.

Among security professionals, there’s general agreement that the best way to have safe passwords is to use unique, generated passwords and save them somewhere. There are differences in the grade of security you need; while I do use and recommend a password manager, practically speaking, having a “passwords notebook” in a safe place is pretty much as good, particularly for less technophile people. You may disagree on this, but if so please move on, as this whole post is predicated on wanting to use password managers.

Sometimes, though, you may need a password for something that cannot use a password manager. The example that comes to my mind is trying to log in to PlayStation Network on my PS3/PS4, but there are a number of other cases like that beyond gaming consoles, such as printers/scanners, cameras (my Sony A7 needs to log in to the Sony Online account to update the onboard apps, no kidding!), and computers that are just being reinstalled.

In these cases, you end up making a choice: for something you may have to personally type out more often than not, it’s probably easier to use a so-called “memorable password”, also commonly (but not quite correctly) called a Diceware password, or, again alternatively, a 936 password. You may remember that I prefer a different form of memorable password when it comes to passwords you need to repeatedly type out yourself very often (such as the manager’s master password, or a work access password), but for passwords that you can generate, store in a manager, and only seldom type out, 936-style passwords are definitely the way to go in my view.

In certain cases, though, you can’t easily do this either. If I remember correctly, Sony enforces passwords to have digits and symbols, and not to repeat the same digit more than a certain number of times, which makes Diceware passwords not really usable there either. So instead you get a generated password that you need to spend a lot of time reading and typing, in many cases with on-screen keyboards that are hard to use. I often time out on my 1Password screen while doing so, and need to log in again, which is a very frustrating experience in and of itself.

But that’s not the only case where this is a problem. When you set up a computer for the first time, no matter the operating system, you’ll most likely find yourself having to set up your password manager. In the case of 1Password, to do so you need the secret key that is stored… in 1Password itself (or, in my case, printed out and put in the safe). But typing that secret key is frustrating; being able to just “send” it to the computer would make it a significantly easier task.

And speaking again of reinstalling computers, Windows BitLocker users will likely have their backup key connected to their Microsoft account so that they can quickly recover the key if something goes wrong. Nothing of course stops you from saving the same key in 1Password, but… wouldn’t it be nice to be able to just ask 1Password to type it for you on the computer you just finished reinstalling?

There’s one final case where this is useful, and it’s going to be a bit controversial: using a password on a shared PC where you don’t want to log in with your password manager. I can already hear the complaints that you should never log in from a shared, untrusted PC, and that it’s a recipe for disaster. And I would agree, except that sometimes you just have to. A long time ago, I found myself using a shared computer in a hotel to download and print a ticket, because… well, it was a whole series of failures that led me to it, but it was still required. Of course I went and changed the password right after, but it also made me think.

When using shared computers, whether in a lounge, a hotel, an Internet cafe (are they still a thing?), or anywhere like that, you need to see the password, which makes it susceptible to shoulder surfing. Again, it would be nice to have the ability to type the password in with a simpler device.

Now, the biggest complaint I have received about this suggestion is that it is complex, that it increases the attack surface by making the dongle a target, and that instead the devices should be properly fixed so that none of this is needed. All of that is correct, but it’s also trying to fight reality. Sony is not going to go back and fix the PlayStation 3, particularly not now that the PS5 has been announced and revealed. And some of these cases cannot be fixed: you don’t really have much of an option for the BitLocker key, aside from reading it off your Microsoft account page and typing it in on a keyboard.

I agree that device login should be improved. Facebook Portal uses a device code that you type in on a computer or phone that is already logged in to your account. I find that login system much easier than typing the password with a gamepad, as Sony insists on; and I’m not saying that because Facebook is my employer, but because it just makes sense.

Of course to make this option viable, you do need quite a few critical bits to be done right:

  • The dongle needs to be passive: the user has to explicitly request a password to be typed out. No touch-sensitive area on the dongle to trigger type-out in the style of a YubiKey. This is extremely important, as a compromise of the device should not allow any password to be compromised.
  • The user should be explicit in requesting the type-out. In a manager like 1Password, an explicit refresh of the biometric login is likely warranted; it would be way too easy to exfiltrate a lot of passwords in a short time otherwise!
  • The password should not be sent in (the equivalent of) cleartext between the phone and the device. I honestly don’t remember what the current state of the art of Bluetooth encryption is, but it might not be enough to rely on the BT encryption itself.
  • There needs to be defense against tampering, which means not letting the dongle’s firmware be rewritten over the same HID connection that is used for type-out. Since the whole point is to make it safe to use a manager-stored password on an untrusted device, firmware flashing access would make it too easy to tamper with.
    • While I’m not a cryptography or integrity expert, my first thought would be to make sure that a shared key is negotiated between the dongle and the phone, and that on the dongle side it is tied to measurement registers similar to how a TPM works. This would mean needing to re-pair the dongle when updating its firmware, which… would definitely be a good idea.

I already asked 1Password if they would consider implementing this… but I somewhat expect it’s unlikely to happen until someone makes a good proof of concept. So if you’re better than me at modern encryption, this might be an interesting project to finish up and get working. I even have a vague idea for a non-integrated version that might be useful: instead of integrating with the manager, have the dongle connect to a phone app that just has a textbox and a “Type!” button. That would be less secure but easier to implement today: you’d copy the password from the manager, paste it into the app, and ask it to type it out via the dongle. It would be at least a starting point.

Now, if you got to this point (or you follow foone on Twitter), you may guess what the other problem is: USB HID doesn’t send characters, but keycodes, and keycodes depend on the keyboard layout. This is one of the issues that YubiKeys and similar solutions have: you either restrict yourself to a safe set of characters, or you end up having to accept, on the server/parser side, different strings as equivalent. Since this is intended for use with devices and services that are not designed for it, neither option is really feasible. In particular, allowing only a safe subset just doesn’t work: it would reduce the options in the alphabet due to qwerty/qwertz/azerty differences, and it would not allow some of the symbol classes that a number of services require you to use. So the only option is for the phone app to do the conversion between characters and keycodes based on the configured layout, and to let users change that layout.

Documentation needs review tools, not Wikis

I’m a strong believer in documentation being a fundamental feature of open source, although I’m probably bad at following my own advice. While I do write down a lot of running notes on this blog, as I said before, blogs don’t replace documentation. I have indeed complained about how hard it seems to be to publish documentation that is not tied to a particular codebase, but there’s a bit more I want to explore.

I have already discussed code reviews in the past few months, pointing out how the bubble got me used to review tooling (back in the day this would have been called CASE). The main thing I care about with these tools is that they reduce the cost of the review, which makes it less likely that a patch is left aside for too long: say, for three weeks, because one reviewer points out that the code you copied from one file to another is unsafe, and the other notes they missed it the first time around, but now it’s your problem to get it fixed.

In a similar spirit, “code reviews” for documentation are an incredibly powerful tool, not just for documentation quality, but also for inclusiveness. Let me explain, focusing in particular on documentation that is geared toward developers, because that’s what I know best. Product documentation, and documentation intended for end users, is something I have had barely any contact with, and I don’t think I have the experience to discuss it.

So let’s say you’re looking at a tool’s wiki page, and you follow the instructions in it, but get a completely different result than you expected. You think you know why (maybe something changed in one of the tool’s dependencies, maybe the operating system is different, or maybe it never worked in the first place), and you want to fix the documentation. If you just edit the wiki, and you’re right, you’re saving a lot of time and grief for the next person who comes to the documentation.

But what happens if you’re wrong? Well, you may be misinterpreting the instructions, and end up giving a bad suggestion to the next person who comes along. You may be making the equivalent of the change in all those bad howto docs that say to just chmod 0777 /dev/something to make some software work; the next person will then find instructions that work, but that open a huge gaping security hole in the software.

Do you edit the wiki? Are you sure there are enough senior engineers who know the tool to notice you edited it, and revert your change if it is wrong? You may know who has the answer, and decide to send them a note with the change: “Hey, can you check if I did it right?” But what if they just left for a three-week vacation? What if they end up in the hospital after writing about LED lights?

And it’s not just a matter of how soon someone might spot a mistaken edit. There’s the stress of not knowing (or maybe knowing) how such a mistake would be addressed. Will it be a revert with a “No, you dork!”, or will it be an edit that goes into further detail about what the intention was and what the correct approach should have been in the first place? Wikipedia is an example of something I don’t enjoy editing, despite doing it from time to time. I just find some of its policies absurd, including having been given a hard time while trying to correct an editor’s incorrect understanding of my own project, while at the same time finding a minor “commercial open source” project with what I would call something close to an advertisement piece, with all the references pointing at content written by the editor themselves, who happens to be the main person behind said project.

Review-based documentation systems – including Google’s g3doc, but also the “humble” Google Docs suggested edits! – alleviate this problem, particularly when you provide a “fast path” for fixing obvious typos without going through the full review flow. Otherwise, they let you make your change and then send it to someone who can confirm it’s right, or start a discussion on what the correct approach should be. And if you happen to be the person doing the review: be the rake collector, and help clean up the documentation!

Obviously, it’s not perfect: if all your senior engineers are jerks who would call a newcomer names over a documentation mistake, the review would be just as stressful. But it gives a significant first-mover advantage: you can (often) choose who to send the review to. And let’s be honest: most jerks are bullies, and they will be less likely to call the newcomer names when the change already has a sign-off from another senior person.

This is not where it ends, either. Even when you are a senior engineer, or very well acquainted with a certain tool, you may still want to run documentation changes through someone else because you’re not sure how they will read. For me, this is often related to the fact that English is not my native language: I may say something in a way that is, in my head, impossible to misunderstand, and yet confuse everybody else reading it, because I’m using specialised terms, uncommon words, or I keep insisting on using a word that doesn’t mean what I think it means.

As an aside, if you read most of my past writing, you may have noticed I keep using the word sincerely when I mean honestly or truthfully. This is a false friend from Italian, where sincero means truthful. It’s one particular oddity I was made aware of and tried very hard to get rid of, but it still slips through at times. For the same reason, I tend to correct other people with the same oddity, as I have trained myself to notice it.

And while non-native English speakers may think of this problem more often, that’s not to say that native English speakers don’t need to pay attention to it, or that they shouldn’t have someone else read their documentation first. In particular, when writing a tutorial it is necessary to get someone in its target audience to read through it! That means someone who is not yet acquainted with the tool, because they will ask you questions when you start using terms that they have never heard before but that are completely obvious to you.

Which is why I insist that having documentation in a reviewable repository (not necessarily one requiring review), rather than in a wiki, is an inclusiveness issue: it reduces the stress for newcomers, non-native English speakers, less aggressive people, and people who might not have gone to schools with debating clubs.

And at the same time, it reduces the risk that security-hole-enabling documentation is left, even for a little while, unreviewed but live. Isn’t that good?

Windows 10, NVMe SSDs, VROC

It sounded like an easy task: my main SSD was running out of space, and I decided to upgrade to a 2TB Samsung 970 NVMe drive. It would usually be an easy task, but clearly I shouldn’t expect things to be easy with the way I use a computer, still doing incredibly rare stuff 20 years after I started.

It ended up with me reinstalling Windows 10 three times, testing the Acronis backup restore procedure, buying more adapters than I ever thought I would need, and cursing my past laziness in the way I had set up a bunch of stuff.

Let’s start with a bit of setup information: I’m talking about the gamestation, which I bought after moving to London because someone among the moving companies (AMC Removals in Ireland, and Simpsons Removals in London) stole the previous one. It uses an MSI X299 SLI PLUS motherboard, and when I bought it, I bought two Crucial M.2 SSDs, 1TB each — one dedicated to the operating system and applications, the other to store the ever-expanding photos library.

At some point a year or so ago, the amount of pictures I took crossed the 1TB mark, and I needed more space for the photos. So, since NVMe SSDs had become more affordable, and since you can pretty much turn any PCIe 3.0 x4 slot into an NVMe slot with a passive adapter, I decided to do just that: I bought a Samsung 970 EVO Plus 1TB, copied the operating system to it, and merged the two older Crucial SSDs into a single “Dynamic Volume” to have more space for pictures.

At first I used a random passive adapter that I bought on Amazon, and while it worked perfectly well for connecting the device, it had trouble keeping the temperature down: Samsung’s software reported a temperature between 68°C and 75°C, which it considers “too high”. I spent a lot of time trying to find a way around this, and I ended up replacing all the fans on the machine and adding more, which brought it down to a consistent 60°C or so. Phew.

A few months later, I found an advertisement for the ASUS Hyper M.2 card, which is a pretty much passive card that allows you to use up to four NVMe SSDs on a single PCI-E x16 slot, as long as your CPU supports “bifurcation” — which I checked that both my CPU and motherboard support. In addition to letting you add a ton of SSDs to a motherboard, the Hyper M.2 has a big aluminium heatsink and a fan, which help keep the SSD’s temperature under control. Although I’ll be honest and say I’m surprised that Asus didn’t even bother adding PWM fan control: there’s an on/off switch poking out of the chassis, and that’s about it.

Now fast forward a few more months: my main drive is also full, and Microsoft has deprecated Dynamic Volumes in favour of Storage Spaces. I decided I would buy a new, bigger SSD for the main drive, and then use this chance to migrate the photos to a Storage Space bundling together the three remaining SSDs. Since I already had the Hyper M.2 and knew my CPU supported bifurcation, I thought it wouldn’t be too difficult to have all four SSDs connected together…

Bifurcation and VROC

The first thing to know is that the Hyper M.2 card, when loaded with a single NVMe SSD, behaves pretty much the same way as a normal PCI-E-to-M.2 adapter: the single SSD gets the four lanes, and is seen as a normal PCI-E device by the firmware and operating system. If you connect two or more SSDs, things are different, and you need bifurcation support.

PCI-E bifurcation allows splitting an x8 or x16 slot (8 or 16 PCI-E lanes) into two or four x4 slots, which is what NVMe drives need. It requires support from the CPU (because that’s where the PCI-E lanes terminate), from the BIOS (to configure the bifurcation), and from the operating system, for some reason that is not entirely clear to me, not being a PCI-E expert.

So the first problem I found when trying to get the second SSD to work on the Hyper M.2 is that I didn’t realise how complicated the selection of which PCI-E slot gets how many lanes is on modern motherboards. Some slots are connected to the chipset (PCH) rather than directly to the CPU, but you want the video card and the NVMe drives to go to the CPU instead. When you’re using the M.2 slots, they take some of the lanes away, and which lanes they take depends on whether you’re using them in SATA or NVMe mode. And how many lanes you have available in the first place depends on your CPU.

Pretty much, you will need to do some planning, and maybe draw a pen-and-paper diagram to follow through. In particular, you need to remember that the distribution of lanes is statically chosen: even though you have a full x16 slot at the bottom of your motherboard, and 16 free lanes to connect, that doesn’t mean the two are connected to each other. Indeed, it turned out that the bottom slot only gets x8 at best with my CPU, and I instead needed to move the Hyper M.2 two slots up. Oops.

The next problem was that, despite Ubuntu Live being able to access both NVMe drives transparently, and the firmware being able to boot from them, Windows refused to boot, complaining about an inaccessible boot device. The answer to this one is to be found in VROC: Virtual RAID on CPU. It’s Intel’s way of implementing bifurcation support for NVMe drives, and despite the name, it’s not only relevant if you plan on using your drives in a RAID configuration. Although, let me warn here: from what I understand, bifurcation should work fine without VROC, but it looks like most firmware just enables the two together, so at least on my board you can’t use bifurcated slots without VROC enabled.

The problem with VROC is that, while Ubuntu seems to handle it natively, Windows 10 doesn’t. Even 20H1 (the most recent release at the time of writing) doesn’t recognize SSDs connected to a bifurcated slot unless you provide it with a driver, which is why you end up with the inaccessible boot device. It’s the equivalent of building your own Linux kernel and forgetting the disk controller driver or the SCSI disk driver. I realized that when I tried doing a clean install (hey, I do have a backup for a reason!), and the installer didn’t see the drives at all.

This is probably the closest I’m getting to retrocomputing: it reminded me of installing Windows XP for a bunch of clients and friends back when AHCI became common, and having to provide a custom driver disk. Thankfully, Windows 10 can take the driver from USB, rather than requiring you to fiddle with installation media or CD swaps. And indeed, the Intel drivers for VROC include a VMD (Volume Management Device) driver that allows Windows 10 to see the drives, and even boot from them!

A Compromising Solution

So after that I managed to get Windows 10 installed and set up — and one of my biggest worries went away: back when my computer was stolen and I reinstalled Windows 10, the license was still attached to the old machine, and I had to call tech support to get it activated. I wasn’t sure if it would let me re-activate it this time; it did.

Now, the next step for me was to make sure that the SSD had the latest firmware, was genuine, and was correctly set up, so I installed the Samsung Magician tools, and… they didn’t let me do any of that, because they reported Intel as the provider of the NVMe driver, despite Windows reporting the drive as supported by its own NVMe driver. I guess what they mean is that the VROC driver interferes with direct access to the devices. But it means you lose access to all the SMART counters from Samsung’s own software (I expect other software might still be able to access them), with no genuineness checks and, in particular, no temperature warning. Given that I knew this had been an issue in the past, it worried me.

As far as I could tell, when using the Hyper M.2 you not only lose access to the SSD manufacturer’s tooling (like Magician), but I’m not even sure Windows can still access the TRIM facilities. I didn’t manage to confirm this for good: I got an error when I tried using it, but that might have been related to another issue that will become apparent later.
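For what it’s worth, Windows does have a couple of built-in ways to at least check whether TRIM is enabled on the system side — though a clean answer there still doesn’t prove the commands reach the drive through the VROC driver. A sketch, from an elevated PowerShell prompt (the drive letter is just an example):

```powershell
# Query whether delete notifications (i.e. TRIM) are enabled;
# "DisableDeleteNotify = 0" means TRIM is enabled.
fsutil behavior query DisableDeleteNotify

# Explicitly ask the optimizer to re-send TRIM for a given volume.
Optimize-Volume -DriveLetter C -ReTrim -Verbose
```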

And to top it all off, if you do decide to move the drives out of the Hyper M.2 card, say to bring them back to the motherboard, you are back to square one with the boot device being inaccessible, because Windows will look for the VROC VMD, which will be gone.

At that point I pretty much decided that the Hyper M.2 card and the whole VROC feature wouldn’t work out for me: too many compromises. So I took a different approach: instead of moving the NVMe drives away from the M.2 slots, I planned to move the SATA drives away from the M.2 slots.

You see, the M.2 slots can carry either NVMe drives, using PCI-E directly, or the still-common SATA SSDs. The connector is keyed, although I’m not entirely sure why, as there’s nothing preventing you from trying to connect a SATA M.2 SSD to a connector that only supports NVMe (such as the Hyper M.2), but that’s a different topic that I don’t care to research myself. What matters is that you can buy passive adapters that convert an M.2 SATA SSD into a normal 2.5″ SATA one — passive, because both use the same protocol and only the connector differs. You can find those on AliExpress, obviously, but I needed them quickly, so I ordered them from Amazon instead. I got Sabrent ones because they were available for immediate dispatch, but be careful, because they sell both M.2 and mSATA converters that look very similar.

Storage Space and the return of the Hyper M.2

After reinstalling with the two Samsung SSDs in the motherboard’s M.2 slots, I finally got Samsung Magician working, which confirmed not only that the drive is genuine, but also that it already has the latest firmware (good). Unfortunately, it also told me that the temperature of the SSD was “too high”, at around 65°C.

The reason for that is that the motherboard predates the more common NVMe drives, and unlike LGR’s, it doesn’t have full aluminium heatsinks to bolt on top of the SSDs to keep the temperature down. It came instead with a silly “shield” that might be worse than not having one, and it positions the first M.2 slot… right underneath the video card. Oops! Thankfully, I do have an adapter with a heatsink that allows me to connect a single SSD to a PCI-E slot without needing VROC: the Hyper M.2 card. So I opted to re-open the computer, move the 2TB SSD to the Hyper M.2, and be done with it. Easy peasy, and since I already had the card, this was probably worth it.

Honestly, if I didn’t have the card, I would probably have gone for one of those “cards” that combine a passive NVMe adapter with a passive SATA adapter (needing the SATA data cable, but not the power), since that would have let me keep one SATA SSD on the motherboard (those don’t seem to get as hot), but again, I worked with what I had at hand.

Then, as I said above, I also wanted to take this chance to migrate my Dynamic Volumes to the new Storage Spaces, which are supposed to be better supported and include more modern features for SSDs. So once I got everything reinstalled, I tried creating a new pool and setting it up… to no avail: the UI didn’t let me create the pool. Instead I ended up using the command line via PowerShell, and that worked fine.

Though do note that the commands on Windows 10 2004/20H1 are different from those on older Server versions, which makes looking for solutions on ServerFault and similar sites very difficult. Also, it turns out that between deleting the Dynamic Volumes from two disks and adding them to a Storage Spaces pool, you need to reboot your computer. And the default way to select a disk (by its “Friendly Name”, as Windows 10 calls it) is to use the model number — which makes things interesting when you have two pairs of SSDs with the same name (Samsung doesn’t bother adding the size to the model name as reported by Windows).
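For reference, the PowerShell incantation is roughly along these lines. This is only a sketch: the pool and volume names are made up, it assumes the poolable disks are exactly the ones you want, and listing disks by serial number first is my workaround for the identical-model-name problem:

```powershell
# List candidate disks; SerialNumber disambiguates identical model names.
Get-PhysicalDisk | Select-Object FriendlyName, SerialNumber, Size, CanPool

# Create a pool out of every disk Windows considers poolable.
New-StoragePool -FriendlyName "PhotosPool" `
    -StorageSubSystemFriendlyName "Windows Storage*" `
    -PhysicalDisks (Get-PhysicalDisk -CanPool $true)

# Carve a simple (non-resilient) virtual disk spanning the whole pool,
# then bring it online, partition it, and format it as NTFS.
New-VirtualDisk -StoragePoolFriendlyName "PhotosPool" -FriendlyName "Photos" `
    -ResiliencySettingName Simple -UseMaximumSize
Get-VirtualDisk -FriendlyName "Photos" | Get-Disk |
    Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS
```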

And then there’s the kicker, which honestly got me less angry than everything else that went on, but did annoy me more than I let on: Samsung Magician lost access to all the disks connected to the Storage Spaces pool! I assume this is because, from the moment they are added to the pool, Windows 10 no longer shows them in the Disk Management interface either, and Magician has not been updated to identify disks at a lower level. It’s probably a temporary situation, but Storage Spaces are also fairly uncommon, so maybe they will never bother fixing it.

The worst part is that even the new SSD disappeared, probably for the reason noted above: it has the same name as a disk that is in the Storage Spaces pool. Which is what made me facepalm, given that I once again lost access to Samsung’s diagnostics — although by then I had confirmed that the temperature is fine, the firmware has not changed, and the drive is genuine. I guess VROC would have done just as well, had I confirmed the genuineness before going through the multiple reinstalls.

Conclusion

Originally, I was going to say that the Hyper M.2 is a waste of time on Windows. The fact that you can’t actually monitor the device with the Samsung software is more than just annoying; I probably should have looked for alternative monitoring software to see if I could get to the SMART counters over VROC. On Linux, of course, there’s no issue with that, given that Magician doesn’t exist there in the first place.

But if you’re going to install that many SSDs on Windows, you’re likely going to need Storage Spaces anyway — in which case the fact that Magician doesn’t work is moot, as it wouldn’t work either way. The only thing you need to do is make sure you have the drivers to install everything correctly in the first place. Using the Hyper M.2 – particularly on slightly older motherboards that don’t have good enough heatsinks for their M.2 slots – turns out to be fairly useful.

Also, Storage Spaces, despite being a major pain in the neck to set up on Windows 10, appear to do a fairly good job. Unlike Dynamic Volumes, they appear to balance writes across multiple SSDs, they support TRIM, and there’s even support for preparing a disk for removal from the pool, moving everything onto the remaining disks (assuming there’s enough space), and freeing up the drive.

If I’m not getting a new computer any time soon (and I would hope I won’t have to), I have a feeling I’ll go back to using the Hyper M.2 in VROC mode, even if it means reinstalling Windows again. Adding another 2TB or so of space for pictures wouldn’t be the cheapest idea, but it would allow expansion at a decent rate until whatever technology comes next.