Why I Care About Licensing

Both on the blog and on Twitter, I have ranted at length about projects missing licensing information altogether, or not providing licensing information on specific files, or providing conflicting licensing information. As you can imagine, this is a topic that I’m very attached to, which is why I have been following the REUSE guidelines to make sure that all my (currently active) projects follow the specification.

Unfortunately this care is not shared by many developers, even those who consider themselves part of the Free Software movement. This causes friction, poisons the well in both directions, and is overall detrimental to the community and the movement, even more so than when people care deeply and disagree on the “correct” licensing terms.

While I am most definitely not a lawyer, and I speak most definitely only for myself and not my employer, let me try to give you a run down of what’s going on here.

First of all, we need to start with a simplification, and hand-wavingly accept that without an explicit license allowing it, the distribution, modification, and integration of source code is not allowed, or at least that’s the way we perceive it in the wider world. Free Software licenses, more or less permissive, spell out the terms under which distribution and usage are allowed.

It Is But An Example

As far as I can tell, there’s no provision anywhere that source code used in documentation is exempt from these limitations, except insofar as the license on the documentation itself would apply if not otherwise overridden. And that’s how I started engaging with Adafruit: the documentation for most of their CircuitPython libraries provides a lot of useful examples, and as it turns out they were already released under an open-source license (MIT), but that was not obvious when looking at the docs sites themselves. So I convinced them to add SPDX headers to all their source code, including the examples, and now you can read an example and see immediately which license it’s released under. Isn’t that cool?
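As a rough sketch of what those headers look like in practice, here is a hypothetical Python example file carrying REUSE-style SPDX comments. The author name, the email address, and the declared_license() helper are made up for illustration; only the two SPDX tag names come from the specification.

```python
# SPDX-FileCopyrightText: 2020 Example Author <author@example.com>
#
# SPDX-License-Identifier: MIT

"""Hypothetical documentation example that carries its own license header."""

import re

# Matches the license tag wherever it appears in a source file.
SPDX_LICENSE_RE = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def declared_license(source_text):
    """Return the SPDX license identifier declared in source_text, or None."""
    match = SPDX_LICENSE_RE.search(source_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    licensed = "# SPDX-License-Identifier: MIT\nprint('blink')\n"
    unlicensed = "print('blink')\n"
    print(declared_license(licensed))    # the identifier from the header
    print(declared_license(unlicensed))  # None: the reader has to go hunting
```

With the header in place, anyone copying the example out of a docs page takes the licensing information along with it.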

Unfortunately, sometimes developers are stubborn and find adding two lines to their documentation examples a distraction, and argue against it, making it annoying for others to use their example source code without either infringing the copyright or going the long way to find the right answers.

Websites, PDFs, Books, they are all equal

But this goes double for code that is explicitly written only as example material! Let me take a bit of a detour: my wife went through the awesome Python Crash Course a few months ago. While it suffers from a few of the issues I already complained about when it comes to splitting names, the book is fairly well written and has hands-on exercises that provide enough of a stretch for a “my first Python project”. In the later parts of the book, one of the long-building exercises is writing a clone of Space Invaders with PyGame, which turned out to be interesting not just for her writing it, but for me reviewing it as well, as game programming is definitely not a skill I ever spent time acquiring.

Now, remember I said there’s space to stretch? While the book guides you through building the very basic framework for “Alien Invasion” with full code to go with it, it leaves a lot of holes to be filled. Not just the assets (that it pretty much suggests you Google for and find somewhere online, without any discussion on what you can and cannot use — shout out to the Noun Project which I use for my own projects nowadays), but also some of the more advanced gameplay, and a lot of the refactoring — the way you write the game following the book is definitely more aimed at teaching than at maintaining. So when my wife finished with the book, I started showing her examples of how to refactor the code and introduce new features. So while the basic skeleton is the same as the original from the book, the version she ended up with was nearly fully rewritten. And it’s all in a Git repository!

But she has nothing to show for it. The source code in the book does not provide any licensing information. When I reached out to Eric Matthes (the book’s author) on Twitter asking him if he’d consider applying an open-source license to the code, so that she could publish it on her GitHub account to show off to some of her friends – and with an explicit mention that I’d have liked to use it as a base to test out BeeWare projects and see whether I could contribute to some – he said he’d think about it, but that he wouldn’t feel right releasing it under a permissive license that would allow someone to take it and sell it on the App Store and similar. So her options are to ignore licensing and publish the code anyway (after all, nobody cares, and I’m sure I can find plenty of people who did exactly that), or to comply with the (lack of) license and keep it for herself, and only show her friends a video of it working. She went for the latter, as we already had a long discussion of copyright when J Salmeron brought up the topic (and dang, we missed the opportunity to shake his hand as we were standing right behind him at the Beast in Black concert in Amsterdam last year!).

Provide It And They Will Build

There is one case that, personally, drained my will to contribute to an ecosystem even more than the example above. After all, Python Crash Course is a great book, and the only really good reason to publish the code is for “bragging rights” — which is not to say it’s not something, but it’s not the end of the world either.

When a commercial vendor is providing you with an extensible ecosystem for you to build upon, but doesn’t play by the same rules, it’s just… disappointing. In this case the issue is with Saleae, the manufacturer of the Logic Pro 16 analyzer I use for a bunch of different things. You may have noticed me providing screenshots off it when talking about fake candles and infrared. As a vendor, Saleae has very good user support: when I complained on Twitter that I wasted two hours chasing ghosts because I didn’t realise I forgot to connect the USB cable to the analyzer, and the software didn’t make it clear enough it was showing me demo garbage, they engaged, asked me what I would have done differently, and delivered the fix in less than a month. That was awesome support.

So where does it go wrong? Well, in June they updated their software to support Python-based extensions for analysis of specific protocols. I was actually interested in adding support for IR decoding to make my life easier in my TV controlling project, and so when they posted that one of their employees built a duty cycle measure tool and posted it on GitHub I was thrilled!

Except… the repository is there, the source code is there, but there is no license. The extension is pretty much a tutorial by itself on how to build what I needed, but it’s coming with no license attached, and as such I can’t use its code as a base for my own extension. And while I could possibly learn from it, it’s also a poison pill… there’s no license, if I copy it too literally, am I infringing copyright? Maybe, who knows? The author says I should «feel free to look, copy and use [his] Logic 2 extensions in any way [I] would like», but that’s not exactly a very comforting statement when you’re contributing while part of a company.

Final Thoughts

Just be yourself (this is pre-recorded). If you do care about Free Software, please take licensing seriously. If you don’t care about Free Software, because you don’t believe in the ideals behind it, or you’re just not part of the ecosystem, then I can’t really blame you for disrespecting licenses, but then again if you rely on proprietary software licenses, you probably should respect all of them. It’s the same problem with software piracy.

I do believe that the folks at REUSE are doing a great service for all of us by making it possible to spell out licenses clearly and openly, and making it easy for others to modify and copy the code that we want to be out there in the world. It doesn’t take so much time to use the tool to add a few lines to a text file, or an additional text file for binary files. Please take the chance to sort this out!

Newcomers, Teachers, and Professionals

You may remember I already had a go at tutorials, after listening in on one that my wife had been going through. Well, she’s now learning about C after hearing me moan about higher and lower level languages, and she did that by starting with Harvard’s CS50 class, which is free to “attend” on edX. I am famously not a big fan of academia, but I didn’t think it would make my blood boil as much as it did.

I know that it’s easy to rant and moan about something that I’m not doing myself. After all you could say “Well, they are teaching at Harvard, you are just ranting on a C-list blog that is followed by fewer than a hundred people!” and you would be right. But at the same time, I have over a decade of experience in the industry, and my rants are explicitly contrasting what they say in the course to what “we” do, whether it is in open-source projects, or a bubble.

I think the first time I found myself boiling and went onto my soapbox was when the teacher said that the right “design” (they keep calling it design, although I would argue it’s style) for a single-source file program is to have includes, followed by the declaration of all the functions, followed by main(), followed by the definition of all the functions. Which is not something I’ve ever seen happening in my experience, because it doesn’t really make much sense: duplicating declarations/definitions in C is an unfortunate chore due to headers, but why force even more of that in the same source file?

Indeed, one of my “pre-canned comments” in reviews at my previous employer was a long-form of “Define your convenience functions before calling them. I don’t want to have to jump around to see what your doodle_do() function does.” Now it is true that in 2020 we have the technology (VSCode’s “show definition” curtain is one of the most magical tools I can think of), but if you’re anything like me, you may even sometimes print out the source code to read it, and having it flow in natural order helps.

But that was just the beginning. Some time later as I dropped by to see how things were going I saw a strange string type throughout the code — turns out that they have a special header that they (later) define as “training wheels” that includes typedef char *string; — possibly understandable given that it takes some time to get to arrays, pointers, and from there to character arrays, but… could it have been called something else than string, given the all-too-similarly named std::string of C++?

Then I made the mistake of listening in on more of that lesson, and that just had me blow a fuse. The lesson takes a detour to try to explain ASCII — the fact that characters are just numbers that are looked up in a table, and that the table is typically 8-bit, with no mention of Unicode. Yes I understand Unicode is complicated and UTF-8 and other variable-length encodings will definitely give a headache to a newcomer who has not seen programming languages before. But it’s also 2020 and it might be a good idea to at least put out the idea that there’s such a thing as variable-length encoded text and that no, 8-bit characters are not enough to represent people’s names! The fact that my own name has a special character might have something to do with this, of course.
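To make the point concrete, here is a minimal Python sketch of why one byte per character is not enough for names. The name used is a made-up example and fits_in_ascii() is a hypothetical helper, not something from the course.

```python
# A hypothetical name containing "ò" (U+00F2), written with an escape so the
# code point is unambiguous regardless of how this file is saved.
name = "Nicol\u00f2"

# UTF-8 is a variable-length encoding: ASCII characters take one byte each,
# while "ò" takes two.
utf8_bytes = name.encode("utf-8")


def fits_in_ascii(text):
    """Return True if every character of text is representable in 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(len(name))            # 6 characters
    print(len(utf8_bytes))      # 7 bytes
    print(fits_in_ascii(name))  # False
```

The character count and the byte count disagree as soon as the name leaves ASCII, which is exactly the case a char-array-based lesson never gets to see.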

It got worse. The teacher decided to show some upper-case/lower-case trickery on strings to show how that works, and explained how you add or subtract 32 to go from one case to the other. Which is limited not only by character set, but most importantly by locale. Oops, I guess the teacher never heard of the Turkish Four Is, or maybe there’s some lack of cultural diversity in the writing room for these courses. I went on a rant on Twitter over this, but let me reiterate this here as it’s important: there’s no reason why a newcomer to any programming language should know about adding/subtracting 32 to 7-bit ASCII characters to change their case, because it is not something you want to do outside of very tiny corner cases. It’s not safe in some languages. It’s not safe with characters outside the 7-bit safe Latin alphabet. It is rarely the correct thing to do. The standard library of most programming languages has Unicode-aware or locale-aware functions to uppercase or lowercase a string, and that’s what you need to know!
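A small Python sketch of how the trick falls apart. The naive_upper() function is my own reconstruction of the "subtract 32" idea applied blindly, not the code shown in the lesson.

```python
def naive_upper(text):
    """Uppercase by subtracting 32 from every code point, with no checks at all."""
    return "".join(chr(ord(ch) - 32) for ch in text)


if __name__ == "__main__":
    # Works only while every character is a lowercase ASCII letter.
    print(naive_upper("hello"))        # HELLO

    # "ß" is U+00DF; minus 32 gives U+00BF, an inverted question mark.
    print(naive_upper("stra\u00dfe"))  # STRA¿E

    # The Unicode-aware library call knows that "ß" uppercases to "SS".
    print("stra\u00dfe".upper())       # STRASSE

    # Caveat: str.upper() is Unicode-aware but not locale-aware, so it maps
    # "i" to "I" even for Turkish text, where the dotted capital "İ" is
    # expected. Getting that right needs a locale-aware library such as ICU.
    print("i".upper())                 # I
```

Even the library call is not the whole story for Turkish, which is one more reason not to teach newcomers to do the arithmetic by hand.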

Today (at the time of writing) she got to allocations, and I literally heard the teacher going for malloc(sizeof(int)*10). Even if the idea is to start with a bad example and improve from there, why on Earth they even bother teaching malloc() first, instead of calloc(), is beyond my understanding. But what do I know, it’s not like I spent a whole lot of time fixing these mistakes in real software twelve years ago. I will avoid complaining too much about the teacher suggesting that the behaviour of malloc() was decided by the clang authors.

Since there might be newcomers reading this and being a bit lost as to why I’m complaining, here it is: calloc() is a (mostly) safer alternative for allocating an array of elements, as it takes two parameters, the number of elements that you want to allocate and the size of a single element. With this interface the multiplication happens inside the library, which can detect an integer overflow and fail the allocation instead of silently handing back a buffer that is too small, and that reduces security risks. In addition, it zeroes out the memory, rather than leaving it uninitialized. While this means there is a performance cost, if you’re a newcomer to the language and just about learning it, you should err on the side of caution and use calloc() rather than malloc().
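The overflow is easier to see with numbers. The following is only a Python simulation of the size arithmetic, assuming a platform with a 32-bit size_t; both function names are hypothetical and no memory is actually allocated.

```python
SIZE_BITS = 32               # assumption: a platform with a 32-bit size_t
SIZE_MAX = 2**SIZE_BITS - 1


def malloc_style_bytes(count, elem_size):
    """Bytes requested by malloc(count * elem_size): the product wraps silently."""
    return (count * elem_size) & SIZE_MAX


def calloc_style_bytes(count, elem_size):
    """Bytes a checking calloc(count, elem_size) would allocate.

    Returns None where the real call would fail and return NULL.
    """
    total = count * elem_size
    if total > SIZE_MAX:
        return None
    return total


if __name__ == "__main__":
    count = 0x40000001  # an element count influenced by untrusted input
    print(malloc_style_bytes(count, 4))  # 4: a tiny buffer for a huge array
    print(calloc_style_bytes(count, 4))  # None: the allocation is refused
    print(calloc_style_bytes(10, 4))     # 40
```

In the wrapped case the caller believes it has room for over a billion integers and actually has room for one, which is how heap overflows start.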

Next up there’s my facepalm on the explanation of memory layout — be prepared, because this is the same teacher who in a previous lesson said that the integer variable’s address might vary but for his explanation can be asserted to be 0x123, completely ignoring the whole concept of alignment. To explain “by value” function calls, they decide to digress again, this time explaining heap and stack, and they describe a linear memory layout, where the code of the program is followed by the globals and then the heap, with the stack at the bottom growing up. Which might have been true in the ’80s, but hasn’t been true in a long while.

Memory layout is not simple. If you want to explain a realistic memory layout you would have to cover the differences between physical and virtual memory, memory pages and page tables, hugepages, page permissions, W^X, Copy-on-Write, ASLR, … So I get it that the teacher might want to simplify and skip over a number of these details and give a simplified view of how to understand the memory layout. But as a professional in the industry for so long I would appreciate it if they’d be upfront with the “By the way, this is an oversimplification, reality is very different.” Oh, and by the way, stack grows down on x86/x86-64.

This brings me to another interesting… mess in my opinion. The course comes with some very solid tools: a sandbox environment already primed for the course, an instance of AWS Cloud9 IDE with the libraries already installed, a fairly recent version of clang… but then decides to stick to this dubious old style of C, with strcpy() and strcmp() and no reference to more modern, safer options (never mind that glibc still refuses to implement C11 Annex K safe string functions).

But then they decide not only to briefly show the newcomers how to use Valgrind, of all things, but even to show them how to use a custom post-processor for Valgrind’s report output, because it’s otherwise hard to read. This for a course using clang, which can rely on tools such as ASAN and MSAN to report the same information in a more concise way.

I find this contrast particularly gruesome — the teacher appears to think that memory leaks are an important defect to avoid in software, so much so that they decide to give a power tool such as Valgrind to a class of newcomers… but they don’t find Unicode and correctness in names representation (because of course they talk about names) to be as important. I find these priorities totally inappropriate in 2020.

Don’t get me wrong: I understand that writing a good programming course is hard, and that professors and teachers have a hard job in front of them when it comes to explaining complex concepts to a number of people that are more eager to “make” something than to learn how it works. But I do wonder if sitting a dozen professionals through these lessons wouldn’t make for a better course overall.

«He who can, does; he who cannot teaches» is a phrase attributed to George Bernard Shaw. I don’t really agree with it as it is, because I met awesome professors and teachers. I already mentioned my Systems’ teacher, who I’m told retired just a couple of months ago. But in this case I can tell you that I wouldn’t want to have to review the code (or documentation) written by that particular teacher, as I’d have a hard time keeping to constructive comments after so many facepalms.

It’s a disservice to newcomers that this is what they are taught. And it’s the professionals like me that are causing this by (clearly) not pushing back enough on Academia to be more practical, or building better courseware for teachers to rely on. But again, I rant on a C-list blog, not teach at Harvard.

Falsehoods in Tutorials: Database Schemas

It’s well possible that a number of people reading this post have already stumbled across a few of the “Falsehoods Programmers Believe…” documents. If not, there appears to be a collection of them, although I have honestly only read through the ones about names, addresses, and time. The short version of all of this, is that interfacing software with reality is complicated, and in many cases, programmers don’t know how complicated it is at all. And sometimes this turns into effectively institutional xenophobia.

I have already mused that tutorials and documentation are partially to blame, by spreading code memes and reality-hostile simplifications. But now I have some more evidence of this being the case, without me building an explicit strawman like I did last time, and that brings me to another interesting point, in regards to the rising importance of getting stuff right beforehand, as the costs to correct these mistakes are rising.

You see, with lockdown giving us a lot of spare time, I spent some of it on artsy projects and electronics, while my wife spent it learning about programming, Python, and more recently databases. She found a set of tutorials on YouTube that explain the basics of what a database is, and how SQL works. And they were full of those falsehoods I just linked above.

The tutorials use what I guess is a fairly common example of using a database for employees, customers, and branches of a company. And it includes in the example the fields for first name and last name. Which frankly is a terrible mistake: with very few exceptions that include banks and airlines, there’s no need to distinguish between components of a name, and a simple full name field would work just as well, and doesn’t end up causing headaches to people from cultures that don’t split names the same way. The fact that I recently ranted about this on Twitter against VirusTotal is not totally coincidental.

It goes a bit beyond that though, by trying to explain ON DELETE triggers by attaching them to the deletion of an employee from the database. Now, I’m not a GDPR lawyer, but it’s my understanding that employee rosters are one of those things that you’re allowed to keep for essential business needs — and you most likely don’t want to ever delete employees, their commissions payment history, and tax records.

I do understand that a lot of tutorials need to be using simple examples, as setting up a proper HR-compatible database would probably take a lot more time, particularly with compartmentalizing information so that your random sales analyst doesn’t have access to the home phone numbers of their colleagues.

I have no experience with designing employee-related database schemas, so I don’t really want to dig myself into a hole I can’t come out of, by running with this example. I do have experience with designing database schemas for product inventory, though, so I will run with that example. I think it was a different tutorial that was talking about those, but I’ll admit I’m not sure, because I didn’t pay too much attention as I was getting annoyed at the quality.

So this other tutorial focused on products, orders and sales total — its schema was naïve and not the type of databases any real order history system would use — noticeably, it assumed that an order would just need to connect with the products, with the price attached to the product row. In truth, most databases like those would need to attach the price for which an item was sold to the order — because products change prices over time.

And at the same time, it’s fairly common to want to keep the history of price changes for an item, which includes the ability to pre-approve time-limited discounts, so a table of products is fairly unlikely to have the price for each item as a column. Instead, I’ve commonly seen these databases have a prices table that references the items, and provides start and end dates for the price. This way, it’s possible to know at any time what the “valid price” for an item is. And as some of my former customers had to learn on their own, it’s also important to separate which VAT is used at which time.

Example ER diagram showing an example of a more realistic shop database.

There are five tables. * indicates the primary key.

Order (*ID, Customer_ID, Billing_Address, Shipping_Address)
Order_Products(*Order_ID, *Product_ID, Gross_Price, VAT_Rate)
Product(*ID, Name)
Product_VAT(*Product_ID, *Start_Date, End_Date, VAT_Rate)
Product_Price(*Product_ID, *Start_Date, End_Date, Gross_Price)

This is again fairly simplified. Most of the shopping systems you might encounter use designs that might appear redundant, particularly when you’re taught that SQL requires normal form databases, but that’s just in theory. Practice is different, significantly so at times.

Among other things, if you have an online shop that caters to multiple countries within the European Union, then your table holding products’ VAT information might need to be extended to include the country for each one of them. Conversely, if you are limited to accounting for VAT in a single country you may be able to reduce this to VAT categories — but keep in mind that products can and do change VAT categories over time.
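The five tables from the diagram can be sketched with Python’s sqlite3 module. This is only an illustration under my own assumptions: prices are stored as integer cents, VAT as whole percent, dates as ISO 8601 text with an exclusive end date, and the helpers price_on() and add_to_order() as well as the sample rows are invented for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Product (
    ID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Product_Price (
    Product_ID INTEGER NOT NULL REFERENCES Product(ID),
    Start_Date TEXT NOT NULL,      -- ISO 8601 date, inclusive
    End_Date TEXT NOT NULL,        -- ISO 8601 date, exclusive
    Gross_Price INTEGER NOT NULL,  -- assumption: stored in cents
    PRIMARY KEY (Product_ID, Start_Date)
);
CREATE TABLE Product_VAT (
    Product_ID INTEGER NOT NULL REFERENCES Product(ID),
    Start_Date TEXT NOT NULL,
    End_Date TEXT NOT NULL,
    VAT_Rate INTEGER NOT NULL,     -- assumption: whole percent
    PRIMARY KEY (Product_ID, Start_Date)
);
CREATE TABLE "Order" (
    ID INTEGER PRIMARY KEY,
    Customer_ID INTEGER NOT NULL,
    Billing_Address TEXT NOT NULL,
    Shipping_Address TEXT NOT NULL
);
CREATE TABLE Order_Products (
    Order_ID INTEGER NOT NULL REFERENCES "Order"(ID),
    Product_ID INTEGER NOT NULL REFERENCES Product(ID),
    Gross_Price INTEGER NOT NULL,  -- copied at the time of sale
    VAT_Rate INTEGER NOT NULL,     -- copied at the time of sale
    PRIMARY KEY (Order_ID, Product_ID)
);
"""


def price_on(conn, product_id, day):
    """Return (gross_price, vat_rate) valid for product_id on the ISO date day."""
    return conn.execute(
        """
        SELECT p.Gross_Price, v.VAT_Rate
        FROM Product_Price AS p
        JOIN Product_VAT AS v ON v.Product_ID = p.Product_ID
        WHERE p.Product_ID = ?
          AND p.Start_Date <= ? AND p.End_Date > ?
          AND v.Start_Date <= ? AND v.End_Date > ?
        """,
        (product_id, day, day, day, day),
    ).fetchone()


def add_to_order(conn, order_id, product_id, day):
    """Record a sale, copying the price and VAT valid on day into the order."""
    gross_price, vat_rate = price_on(conn, product_id, day)
    conn.execute(
        "INSERT INTO Order_Products VALUES (?, ?, ?, ?)",
        (order_id, product_id, gross_price, vat_rate),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Product VALUES (1, 'Widget')")
conn.executemany(
    "INSERT INTO Product_Price VALUES (?, ?, ?, ?)",
    [(1, "2020-01-01", "2020-07-01", 1000), (1, "2020-07-01", "2021-01-01", 1200)],
)
conn.executemany(
    "INSERT INTO Product_VAT VALUES (?, ?, ?, ?)",
    [(1, "2020-01-01", "2020-07-01", 19), (1, "2020-07-01", "2021-01-01", 16)],
)
conn.execute(
    'INSERT INTO "Order" VALUES (1, 42, ?, ?)',
    ("1 Example Street", "1 Example Street"),
)
# The order keeps the June price and VAT even after both change in July.
add_to_order(conn, 1, 1, "2020-06-30")
```

Because Order_Products carries its own copy of the price and VAT rate, later changes to the price tables never rewrite the history of what a customer actually paid.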

Some people might start wondering now why you would go through this much trouble for an online store, that only needs to know what the price is right now. That’s a good point, if you happen to have several hundred megabytes of database to go through to query the current price of a product. In the example above you would probably need a query such as

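-- Current price and VAT rate for one product.
-- :product_id is a bound parameter supplied by the application.
-- CURRENT_DATE is the standard SQL spelling; some engines use another name.
SELECT Product.ID, Product.Name, Product_Price.Gross_Price, Product_VAT.VAT_Rate
FROM Product
  LEFT JOIN Product_Price ON Product_Price.Product_ID = Product.ID
  LEFT JOIN Product_VAT ON Product_VAT.Product_ID = Product.ID
WHERE
  Product.ID = :product_id AND
  Product_Price.Start_Date <= CURRENT_DATE AND
  Product_Price.End_Date > CURRENT_DATE AND
  Product_VAT.Start_Date <= CURRENT_DATE AND
  Product_VAT.End_Date > CURRENT_DATE;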

It sounds like an expensive query, doesn’t it? And it seems silly to go and scan the price and VAT tables all the time throughout the same day. It also might be entirely incorrect, depending on its placement: I do not know the rules of billing, but it may very well be possible that an order is placed close to a VAT change boundary, in which case the customer could have to pay the gross price at the time of order, but the VAT at shipping time!

So what you do end up using in many places for online ordering is a different database. Which is not the canonical copy. Often the term used for this is ETL, which stands for Extract, Transform, Load. It basically means you can build new, read-only tables once a day, and select out of those in the web frontend. For instance the above schema could be ETL’d to include a new, disconnected WebProduct table:

The same ER diagram as before, but this time with an additional table:

WebProduct(*ID, *Date, Name, Gross_Price, VAT_Rate)

Now with this table, the query would be significantly shorter:

-- Today's precomputed row for one product (:product_id is a bound parameter).
SELECT ID, Name, Gross_Price, VAT_Rate
FROM WebProduct
WHERE ID = :product_id AND Date = CURRENT_DATE;

The question that comes up with seeing this schema is “Why on Earth do you have a Date column as part of the primary key, and why do you need to query for today’s date?” I’m not suggesting that the new table is generated to include every single day in existence, but it might be useful to let an ETL pipeline generate more than just one day’s worth of data — because you almost always want to generate today’s and tomorrow’s, that way you don’t need to take down your website for maintenance around midnight. But also, if you don’t have any expectation that prices will fluctuate on a daily basis, it would be more resource-friendly to run the pipeline every few days instead of daily. It’s a compromise of course, but that’s what system designing is there for.
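As a minimal sketch of such a pipeline step, the following plain Python builds WebProduct rows for today and tomorrow from in-memory rows. The function names, the tuple layouts, and the sample data are assumptions for illustration; a real pipeline would read from and write to the databases themselves.

```python
from datetime import date, timedelta


def valid_on(rows, product_id, day):
    """Return the value of the row covering day (Start_Date <= day < End_Date)."""
    for row_product_id, start, end, value in rows:
        if row_product_id == product_id and start <= day < end:
            return value
    return None


def build_web_product(products, prices, vat_rates, first_day, days=2):
    """Build WebProduct rows (ID, Date, Name, Gross_Price, VAT_Rate).

    Generates one row per product and day, starting at first_day, so that
    tomorrow's prices are already in place when midnight comes.
    """
    rows = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        for product_id, name in products:
            price = valid_on(prices, product_id, day)
            vat = valid_on(vat_rates, product_id, day)
            # Skip products that cannot be sold on that day.
            if price is not None and vat is not None:
                rows.append((product_id, day, name, price, vat))
    return rows


if __name__ == "__main__":
    products = [(1, "Widget")]
    prices = [
        (1, date(2020, 1, 1), date(2020, 7, 1), 1000),
        (1, date(2020, 7, 1), date(2021, 1, 1), 1200),
    ]
    vat_rates = [(1, date(2020, 1, 1), date(2021, 1, 1), 19)]
    for row in build_web_product(products, prices, vat_rates, date(2020, 6, 30)):
        print(row)
```

Running the step on the day before a price change produces one row with the old price and one with the new, so the frontend query for the current date keeps working across midnight without a rebuild.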

Note that in all of this I have ignored the issue of stock. That’s a harder problem, and one that might not actually be suited to be solved with a simple database schema — you need to come to terms with compromises around availability and the fact that you need a single source of truth for how many items you’re allowed to sell… consistency is hard.

Closing my personal rant on database design, there’s another problem I want to shine a spotlight on. When I started working on Autotools Mythbuster, I explicitly wanted to be able to update the content, quickly. I have had multiple revisions of the book on the Kindle Store and Kobo, but even those lagged behind the website a few times. Indeed, I think the only reason why they are not lagging behind right now is that most of the changes on the website in the past year or two have only been cosmetic, and not applying to ePub.

Even for a project like that, which uses the same source of truth for the content, there’s a heavy difference in the time cost of updating the website rather than the “book”. When talking about real books, that’s an even bigger cost — and that’s without going into the print books realm. Producing content is hard, which is why I realised many years ago that I wouldn’t have the ability to carve out enough time to make a good author.

Even adding diagrams to this blog post has a slightly higher cost than just me ranting “on paper”. And that’s why sometimes I could add more diagrams with my ideas, but I don’t, because the cost of producing them, and keeping them current, would be too high. The Glucometers Protocols site has a few rough diagrams, but they are generated with blockdiag so that they can be edited quickly.

When it comes to online tutorials, though, there’s an even bigger problem: possibly the vast majority of them are, nowadays, on YouTube, as videos shot with a person in frame, to be more like a teacher in a classroom, who can explain things. If something in the video is only minimally incorrect, it’s unlikely that those videos would be re-shot, as it would be an immense cost in time. Also, you can’t just update a YouTube video like you do a Kindle book: you lose comments, likes, view-counts, and those things matter for monetization, which is what most of those tutorials out there are made for. So unless the mistakes in a video-tutorial are Earth-shattering, it’s hard to expect the creators to go and fix them.

Which is why I think that it’s incredibly important to get the small things right: stop using first and last name fields in databases, objects, forms, and whatever else you are teaching people to make! Think a bit harder about what a product inventory database would look like! Be explicit in pointing out that you’re simplifying to an extreme, rather than providing a real-world-capable design of a database! And maybe, just maybe, start using examples that are ridiculous enough that they don’t risk being used by a junior developer in the real world.

And let me be clear on this: you can’t blame junior developers for making mistakes such as using a naïve database schema, if that’s all they are taught! I have been saying this at my previous day job for a while: you can’t complain about the quality of the code of newbies unless you have provided them with the right information in the documentation. That is why I spent more time than average on example code and tutorials, to fix up trimmings and make it easier to copy-paste the example code into a working change that follows best practices. In the words of a colleague wiser than me: «Example code should be exemplar.»

So save yourself some trouble in the future, by making sure the people that you’re training get the best experience, and can build your own next tool to the best of specs.

Are tutorials to blame for basic IT problems?

It’s now effectively impossible to spend a month following IT news (and not just IT news) without hearing of breaches, “hacks”, or general security fiascos. Some of these are tracked down to very basic mistakes in the configuration or coding of software, including the lack of hashing of passwords in databases. Everyone in the industry, including me, has at some point expressed the importance of proper QA and testing, and of budgeting for them in the development process. But what if the problem is much higher up the chain?

Falsehoods Programmers Believe About Names is now over seven years old, and yet my just barely complicated full name (first name with a space in it, surname with an accent) can’t be easily used by most of the services I routinely use. Ireland was particularly interesting, as most services would support characters in the “Latin extended” alphabet, due to the Irish language use of ó, but they wouldn’t accept my surname, which uses ò. This is not a first: I had trouble getting invoices from French companies before because they only know about ó as a character.

In a related, but not directly connected topic, there are the problems an acquaintance of mine keeps stumbling across. They don’t want service providers to attach a title to their account, but it looks like most of the developers that implement account handling don’t actually think about this option at all, and make it hard to not set an honorific at all. In particular, it appears that not only do UIs tend to include a mandatory drop-down list of titles, but the database schema (or whichever other model is used to store the information) also provides the title as an enumeration within a list. That is apparent from the way my acquaintance has had their account reverted to a “default” value, likely the “zeroth” one in the enumeration.

And since most systems don’t end up using British Airways’s honorific list but are rather limited to the “usual” ones, that appears to be “Ms” more often than not, as it sorts (lexicographically) before the others. I have had that happen to me a couple of times too, as I don’t usually fill in the “title” field on paper forms (I have never seen much of a point in it), and I guess somewhere in the pipeline a model really expects a person to have a title.

All of this has me wondering, oh-so-many times, why most systems appear to want to store a name in separate entries for first and last name (or variation thereof), and why they insist on having an honorific title that is “one of the list” rather than a freeform field (which would accept the empty string as a valid value). My theory on this is that it’s the fault of the training, or of the documentation. Multiple tutorials I have read, and even followed, over the years defined a model for a “person” – whether it is a user, customer, or any other entity related to the service itself – and many of these use the most basic identifying information about a person as fields to show how the model works, which gives you “name”, “surname”, and “title” fields. Bonus points for using an enumeration for the title rather than a freeform field, or validation that the title is one of the “admissible” ones.
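For contrast, a tutorial model could just as easily look like the following Python sketch. The Person class, its field names, and the sample names are hypothetical; the point is only a single name field and a free-form title whose default is the empty string.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical account model: one name field, optional free-form title."""

    full_name: str   # stored exactly as the person wrote it
    title: str = ""  # free-form; the empty string means "no title"

    def display_name(self):
        """Return the name for display, with the title only if one was given."""
        return f"{self.title} {self.full_name}".strip()


if __name__ == "__main__":
    print(Person("Sam Example").display_name())              # Sam Example
    print(Person("Sam Example", title="Mx").display_name())  # Mx Sam Example
```

With no enumeration there is no "zeroth" value for an account to fall back to, and a missing title stays missing.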

You could call this a straw man argument, but the truth is that it didn’t take me any time at all to find an example tutorial (See also Archive.is, as I hope the live version can be fixed!) that did exactly that.

Similarly, I have seen sample tutorial code explaining how to write authentication primitives that oversimplify the procedure by either ignoring the salt-and-hashing or using obviously broken hashing functions such as crypt() rather than anything solid. Given many of us know all too well how even important jobs that are not flashy enough for a “rockstar” can be pushed into the hands of junior developers or even interns, I would not be surprised if a good chunk of these weak authentication problems that are now causing us so much pain are caused by simple bad practices that are (still) taught to those who join our profession.
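For reference, a salted, deliberately slow hash does not take much more code than the broken versions. This is a minimal sketch using PBKDF2-HMAC-SHA256 from Python’s standard library, which is my choice for the illustration rather than something taken from any tutorial mentioned here; the function names and the iteration count are assumptions that you should check against current guidance.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to your hardware and current guidance


def hash_password(password, iterations=ITERATIONS):
    """Return (salt, iterations, digest) for storage; never store the password."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password, salt, iterations, digest):
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    salt, iterations, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, iterations, digest))
    print(verify_password("wrong guess", salt, iterations, digest))
```

Storing the iteration count next to the salt and digest lets the cost be raised later without invalidating existing accounts.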

I am afraid I don’t have an answer for how to fix this situation. While musing, again on Twitter, the only suggestion for a good text on writing correct authentication code was the NIST recommendations, but these are, unsurprisingly, written in a tone that is not useful to teach how to do things. They are standards first and foremost, and they are good, but that makes them extremely unsuitable for newcomers to learn how to do things correctly. And while they do provide very solid ground for building formally correct implementations of common libraries to implement the authentication, I somehow doubt that most systems would care about the formal correctness of their login page, particularly given the stories we have seen up to now.

I have seen comments on social media (different people on different media) noting that what makes a good source of documentation changes depending on your expertise, which is quite correct. Giving a long list of things that you should or should not do is probably a bad way to introduce newcomers to development in general. But maybe we should make sure that examples, samples, and documentation are updated so that they show the current best practice rather than overly simplified, or artificially complicated (sometimes at the same time) examples.

If you’re writing documentation, or new libraries (because you’re writing documentation for new libraries you write, right?) you may want to make sure that the “minimal” example is actually the minimum you need to do, and does not skip over things like error checks, or full initialisation. And please, take a look at the various “Falsehoods Programmers Believe About” lists, and see if your example implementation makes those assumptions. And if so fix them, please. You’ll prevent many mistakes from happening in real world applications, simply because the next junior developer who gets hired to build a startup’s latest website will not be steered towards the wrong implementations.