Computer-Aided Software Engineering

Fourteen years ago, fresh off translating Ian Sommerville’s Software Engineering (no, don’t buy it, I don’t think it’s worth it), and approaching the FLOSS community for the first time, I wrote a long article for the Italian edition of Linux Journal on Computer-Aided Software Engineering (CASE) tools. Recently, I decided to post that article on the blog, since the original publisher is gone and I thought it would be useful to just have it around. And because the OCR is not really reliable, I ended up having to retype a good chunk of it.

And that reminded me how, despite having been wrong plenty of times before, some ideas have stuck with me and I still find them valid. CASE is one of those, even though we often don’t refer to the tools involved as CASE.

UML is the usual example of a CASE tool. It confuses a lot of people, because the “language” part suggests it’s actually used to write programs, but that’s not what it is for: it is a way to represent similar concepts in similar ways, without having to re-explain the same iconography. Sequence diagrams, component diagrams, and entity-relationship diagrams standardise the way you express certain relationships and information. That’s what it is all about, and while you could draw all of those diagrams without any specific tool, with LibreOffice Draw, Inkscape, or Visio, tools specific to UML are meant to help (aid) you with the task.

My preferred tool for UML is Visual Paradigm, which is a closed-source, proprietary solution — I have not found a good open source toolkit that could replace it. PlantUML is an interesting option, but it doesn’t have nearly all the aid that I would expect from an actual UML CASE tool — you can’t properly express relationships between different components across diagrams, as you don’t have a library of components and models.

But setting UML aside, there’s a lot more that should fit into the CASE definition. Tools for code validation and review, which are some of my favourite things ever, are also aids to software engineering. And so are linters, formatters, and sanitizers. It’s easy to just call them “dev tools”, but I would argue that, particularly when it comes to automating the code workflows, it makes sense to consider them CASE tools, and to reduce the stigma attached to the concept of CASE, particularly in the more “trendy” startups and in open source, where I still feel pushback against using UML, auto-formatters, or integrated development environments.

Indeed, most of these tools are already considered a category of their own: “developer productivity”. Which is not wrong, but it significantly understates their impact — it’s not just about developers, or coders. I like to say that Software Engineering is a teamwork practice, and not everybody on a Software Engineering team is a coder — or even a software engineer.

A proper repository of documents, kept up to date with the implementation, is not just useful for the developers who come later and need to implement features that integrate with the existing system. It’s useful for the SRE/Ops folks who are debugging something on fire and are looking at the interaction between different components. It’s useful for the customer support folks who are being asked why only a particular type of request is failing in one of the backends. It’s useful for the product managers, to see clearly which use cases are implemented by the service, and which components are involved in specific user journeys.

And the same extends to other types of tools. A code review tool that can enforce updates to the documentation. A dependency tracking system that can match known vulnerabilities. A documentation repository that allows full reviews. An issue tracker that can identify who most recently changed code affecting the component an issue was filed against.

And from here you can see why I’m sceptical about single-issue tools being “good enough”. Without integration, these tools are only as useful as the time they save, and often that means they are “negative useful”: it takes time to set up the tools, to remember to run them, and to address their concerns. Integrated tools, instead, can provide benefits that go beyond their immediate features.

Take a linter as an example: a good linter with a low false-positive rate is a great tool to make sure your code is well written. But if you have to run it manually, it’s likely that, in a collaborative project, only a few people will run it after each change, slowing themselves down while not making much of a difference for everyone else. It gets easier if the linter is integrated in the editor (or IDE), even easier if it’s also integrated into code review – so that those who are not using the same editor can still be advised by it – and much better if it’s integrated with something like pre-commit, so that the issues are fixed before the review is even sent out.
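A minimal sketch of that last step, assuming a plain git hook rather than the pre-commit framework itself: the script lints only the staged files, and a non-zero exit status blocks the commit. The file suffixes and the use of the standard library’s `py_compile` as a stand-in linter are assumptions for illustration; a real project would plug in its own linter command.

```python
#!/usr/bin/env python3
"""Sketch of a git pre-commit hook that lints only the staged files."""
import subprocess
import sys
from typing import List

# Assumptions: lint Python files, and use the stdlib byte-compiler as a
# stand-in for a real linter (it only catches syntax errors).
LINT_SUFFIXES = (".py",)
LINT_COMMAND = [sys.executable, "-m", "py_compile"]


def staged_files() -> List[str]:
    # Added, copied or modified files in the index: what is about to be committed.
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def select_lintable(paths: List[str]) -> List[str]:
    # Only hand the linter the files it understands.
    return [p for p in paths if p.endswith(LINT_SUFFIXES)]


def run_lint(paths: List[str]) -> int:
    if not paths:
        return 0  # nothing to check, let the commit through
    return subprocess.run(LINT_COMMAND + paths).returncode


def main() -> int:
    return run_lint(select_lintable(staged_files()))
```

To try it, save it as `.git/hooks/pre-commit`, make it executable, and end it with `sys.exit(main())`: git aborts the commit whenever the hook exits with a non-zero status. The pre-commit framework does the same job declaratively, and also takes care of installing the linters, which is one reason it tends to be preferred for shared projects.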

Looking at all these pieces together, the integrations and the user journeys, is itself Software Engineering. FLOSS developers in general appear to have built a lot of the components and tools that would allow building those integrations, but until recently I would have said that there had been no real progress in turning them into proper software engineering. Nowadays, I’m happy to see some progress, even something as simple as EditorConfig, which avoids having to fight over which editors to support in a repository, and which ones not to.

Hopefully this type of tooling is not going to be relegated to textbooks in the future, and we’ll get used to having a bunch of CASE tools in our toolbox, to make software… better.

Diagonal Contributions

This is a tale that starts at my previous day job. My role as an SRE had been (for the most part) one of support, with teams dedicated to developing the product, and my team making sure that it would perform reliably and without waste. The relationship with “the product team” has varied over time, depending on both the product and the SRE team’s disposition, sometimes in not particularly healthy ways either.

In one particular team, I found myself supporting (together with my team) six separate product teams, spread between Shanghai, Zurich and Mountain View. This put particular pressure on the dynamics of the team, especially since half of its members (based in Pittsburgh) didn’t even have a chance to meet the product team behind two of the services (based in Shanghai), as they would normally be 12 hours apart. It’s in this team that I started formulating the idea I keep referring to as “diagonal contributions”.

You see, there’s often a distinction between horizontal and vertical contributions. Vertical refers to improving every aspect of one service, from the code itself to its health checks, release, deployment, rollout, … Horizontal refers to improving one aspect of every service, such as making every RPC-based server monitored through the same set of metrics. There are different schools of thought on which option is valid and which one should be incentivised, so which of the two approaches you’ll be rewarded for usually depends on your manager, and their manager.

When you’re supporting so many different teams directly, vertical contributions are harder on the team overall: when you go all in to identify and fix all the issues for one of the products, you end up ignoring the work needed for the others. In these cases a horizontal approach might pay off faster from an SRE point of view, but it comes with a cost: the product teams would then have little visibility into your work, which can turn into a nasty confrontation, particularly depending on the management you find yourself dealing with (on both sides).

It’s in that situation that I came up with “diagonal contributions”: improve a pain point shared by the services you own, and cover as many of those services as you can. In a similar fashion to rake collection, this is not an easy balance to strike, and it takes experience to get it right. You can imagine from the previous post that my success at working along this diagonal has varied considerably depending on teams, time, and management.

What did work for me was finding some common pain points between the six products I supported, and trying to address those not with changes to the products, but with changes to the core libraries they used or the common services they relied upon. This allowed me to show actual progress to the product teams, while solving issues that were common to most of the teams in my area, or even in the company.

It’s a similar thing with rake collection for me: say there’s a process you need to follow that takes two to three days to go through, and four out of your six teams are supposed to go through it. It’s worth investing four to six days to reduce the process to something that takes even just a couple of hours: you need fewer net people-days even just looking at the raw numbers, which is very easy to tell, but that’s not where it stops! A process that takes more than a day adds significant risks: something can happen overnight, the person going through the process might have to take a day off, or they might have a lot of meetings the following day, adding an extra day to the total, and so on.
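The raw numbers are easy to check. A rough sketch, assuming an 8-hour working day, reading “a couple of hours” as two hours, and counting the investment as a one-off cost; the function name is made up for illustration:

```python
def net_people_days_saved(teams: int, days_per_run: float, hours_after: float,
                          investment_days: float, hours_per_day: float = 8.0) -> float:
    """People-days saved by streamlining a process that several teams must follow.

    Ignores the extra risks of multi-day processes (overnight incidents,
    days off, meeting-heavy days), which only strengthen the case.
    """
    cost_before = teams * days_per_run
    cost_after = investment_days + teams * hours_after / hours_per_day
    return cost_before - cost_after


# Worst case of the ranges in the text: a 2-day process, 6 days invested.
print(net_people_days_saved(teams=4, days_per_run=2, hours_after=2, investment_days=6))  # 1.0
# Best case: a 3-day process, 4 days invested.
print(net_people_days_saved(teams=4, days_per_run=3, hours_after=2, investment_days=4))  # 7.0
```

Even at the worst end of the ranges the investment comes out ahead, before accounting for any of the risks that make multi-day processes more expensive than they look.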

This is also another reason why I enjoy this kind of work — as I said before, I disagree with Randall Munroe when it comes to automation. It’s not just a matter of saving time on something trivial that you do rarely: automation is much less likely to make one-off mistakes (it’s terrifyingly good at making repeated mistakes, of course), and even if it doesn’t take less time than a human would, it doesn’t take human time — so a three-day process completed by automation is still a better use of time than a two-day process that relies on a person having two consecutive days to work on it.

So building automation or tooling, or spending time making core libraries easier to use, are in my book good ways to make contributions that are valuable beyond your immediate team, while not letting your supported teams feel ignored. But this only works if you know which pain points your supported teams have, and you can make a case that your work directly relates to those pain points — I’ve seen situations where a team had been working on very valuable automation… that relieved no pain for the supported team, giving them the feeling of not being taken into consideration.

In addition to a good relationship with the supported team, there’s another thing that helps. Actually, I would argue that it does more than just help, and is an absolute requirement: credibility. And management support. The former, in my experience, is a tricky one to understand (or accept) for many engineers, including me — that’s because, often enough, credibility in this space is related to the actions of your predecessors. Even when you’re supporting a new product team, it’s likely its members have had interactions with support teams (such as SRE) in the past, and those interactions will colour their initial impression of you and your team. This is even stronger when the product team has been assigned a new support team — or you’re a new member of a team, or you’re part of the “new generation” of a team that went through a bit of churn.

The way I have attacked that problem is by building up my credibility: listening, and asking the team which problems they feel are causing them issues. Principles of reliability and best practices are not going to help a team that is struggling to find the time to work even on basic monitoring because they are under pressure to deliver something on time. Sometimes you can take some of their load away, in a way that is sustainable for your own team, that gains credibility, and that furthers the relationship. For instance, you may be able to spend some time writing the metric-exposing code, with the understanding that the product team will expand it as they introduce new features.

The other factor, as I said, is management — this is another of those things that might bring a feeling of unfairness. I have encountered managers who seem more concerned about immediate results than the long-term picture, and managers who appear afraid of suggesting projects that are not strictly within the scope of reliability, even when they would increase the team’s overall credibility. For this, I unfortunately don’t have a good answer. On average, I have been lucky with the managers I have reported to.

So to all of you out there in a position of supporting a product team: I hope this post gave you some ideas on how to build a more effective, healthier relationship.

Don’t Ignore Windows 10 as a Development Platform for FLOSS

Important Preface: This blog post was originally written on 2020-05-12, and scheduled for later publication, inspired by this short Twitter thread. As such, it predates Microsoft’s announcement of expanding WSL2 support to graphical apps. I considered trashing the blog post, or seriously re-editing it in light of the announcement, but I honestly lack the energy to do that now. It leaves a bad taste in my mouth to know that it will likely get drowned out in the noise of the new WSL2 features announcement.

Given the topic of this post, I guess I need to add a preface to point out my “FLOSS creds”, because I have already seen too many attacks on people who use Windows at all. I have been an opensource developer for over fifteen years now, and part of the reason why I left my last bubble was that it made it difficult for me to contribute to various opensource projects. I say this because I’m clearly a supporter of Free Software and Open Source, wherever possible. I also think that different people have different needs, and that ignoring that is a failure of the FLOSS movement as a whole.

The “Year of Linux on the Desktop” is now a meme that has been running its course to the point of being annoying. Despite what FLOSS advocates keep saying, “Linux on the Desktop” is not really moving, and while I do have some strong opinions on this, that’s for another day. Most users, and in particular newcomers to FLOSS (both as users and developers), are probably using a more “user friendly” platform — if you leave a comment with the joke about UNIX being selective with its friends, you’ll end up in my plonkfile, be warned.

About ten years ago, it seemed like the trend was for FLOSS developers to use MacBooks as their daily laptops. I did that for a while myself — a UNIX-based platform with all the tools of the trade, which allowed quite a bit of work to be done without access to a Linux platform: SSH, Emacs, GCC, Ruby, and so on. And at the same time, you had the stability of Mac OS X, with great battery life and hardware that all worked out of the box. But more recently, Apple’s move towards “walled gardens” seemed to be eroding this.

But back to the main topic. Over the past many years, I’ve been using a “mixed setup” — using a Linux laptop (or more recently desktop) for development, and a Windows (7, then 10) desktop for playing games, editing photos, designing PCBs, and for logic analysis. The latter is because Saleae Logic takes a significant amount of RAM when analysing high-frequency signals, and I have been giving my gamestations as much RAM as I can just for Lightroom, so it makes sense to run it on the machine with 128GB of RAM.

But more recently I have been exploring the ability of using Windows 10 as a development platform. In part because my wife has been learning Python, and since also learning a new operating system and paradigm at the same time would have been a bloody mess, she’s doing so on Windows 10 using Visual Studio Code and Python 3 as distributed through the Microsoft Store. While helping her, I had exposure to Windows as a Python development platform, so I gave it a try when working on my hack to rename PDF files, which turned out to be quite okay for a relatively simple workflow. And the work on the Python extension keeps making it more and more interesting — I’m not afraid to say that Visual Studio Code is better integrated with Python than Emacs, and I’m a long-time user of Emacs!

In the last week I have actually stepped up how much development I’m doing on Windows 10 itself. I have been using Hyper-V virtual machines for Ghidra, to make use of the bigger screen (although admittedly I’m just using RDP to connect to the VM, so it doesn’t really matter that much where it’s running), and in my last dive into the Libre 2 code I felt the need for a fast and responsive editor to work through parts of the disassembled code by executing them, to figure out what it’s trying to do — so once again, Visual Studio Code to the rescue.

Indeed, Windows 10 now comes with an SSH client, and Visual Studio Code integrates very well with it, which meant I could just edit the files saved in the virtual machine and have the IDE also build them with GCC and execute them to get myself an answer.

Then while I was trying to use packetdiag to prepare some diagrams (for a future post on the Libre 2 again), I found myself wondering how to share files between computers (to use the bigger screen for drawing)… until I realised I could just install the Python module on Windows, and do all the work there. Except for needing sed to remove an incorrect field generated in the SVG. At which point I just opened my Debian shell running in WSL, and edited the files without having to share them with anything. Uh, score?

So I have been wondering: what’s really stopping me from giving up my Linux workstation most of the time? Well, there’s hardware access — glucometerutils wouldn’t really work on WSL unless Microsoft is planning a significant amount of compatibility interfaces to be integrated. The same goes for using hardware SSH tokens — despite PC/SC being a Windows technology to begin with. Screen and tabbed shells are definitely easier to run on Linux right now, but I’ve seen tweets about modern terminals being developed by Microsoft and even released as FLOSS!

Ironically, I think it’s editing this blog that is the most miserable experience for me on Windows. And not just because of the different keyboard (as I share the gamestation with my wife, the keyboard is physically a UK keyboard — even though I type US International), but also because I miss my compose key. You may have noticed already that this post is full of em-dashes and en-dashes. Yes, I have been told about WinCompose, but last time I tried using it, it didn’t work and even screwed up my keyboard altogether. I’m now trying it again, at least on one of my computers, and if it doesn’t explode in my face again, I may give it a try on the others later.

And of course it’s probably still not as easy to set up a build environment for things like unpaper (although at that point, you can definitely run it in WSL!), or to have a development environment for actual Windows applications. But this is all a matter of different sets of compromises.

Honestly speaking, it’s very possible that I could survive with a Windows 10 laptop for my on-the-go opensource work, rather than the Linux one I’ve been using. With the added benefit of being able to play Settlers 3 without having to jump through all the hoops from the last time I tried. Which is why I decided that the pandemic lockdown is the perfect time to try this out, as I barely use my Linux laptop anyway, since I have a working Linux workstation all the time. I have indeed reinstalled my Dell XPS 9360 with Windows 10 Pro, and installed both a whole set of development tools (Visual Studio Code, Mu Editor, Git, …) and a bunch of “simple” games (Settlers, Caesar 3, Pharaoh, Age of Empires II HD); Discord ended up in the middle of both, since it’s actually what I use to interact with the Adafruit folks.

This doesn’t mean I’ll give up on Linux as an operating system — but I’m a strong supporter of “software biodiversity”, so the same way I try to keep my software working on FreeBSD, I don’t see why it shouldn’t work on Windows. And in particular, I have always found providing FLOSS software on Windows a great way to introduce new users to the concept of FLOSS — and focusing on providing FLOSS development tools gives people an even bigger chance to build more FLOSS tools.

So is everything ready and working fine? Far from it. There are a lot of rough edges that I found myself, which is why I’m experimenting with developing more on Windows 10, to see what can be improved. For instance, I know that the reuse-tool has some rough edges with the encoding of input arguments, since PowerShell still appears not to default to UTF-8. And I failed to use pre-commit for one of my projects, although I have not yet looked closely enough at what failed to start fixing it.

Another rough edge is documentation. Too much of it assumes a UNIX environment, and a lot of it, if it supports Windows at all, assumes “old school” batch files are in use (for instance for Python virtualenv support), rather than the more modern PowerShell. This is not new — a lot of the time, modern documentation is only valid for bash, and if you were to use an older operating system such as Solaris, you would find yourself lost with the tcsh differences. You could probably find similar concerns back in the days when bash was not standard, and maybe we’ll have to go back to that kind of deal. Or maybe we’ll end up with some “standardization” of documentation that can be translated between different shells. Who knows.
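One way to picture that “standardization” is a single table of per-shell instructions that documentation is rendered from. A minimal sketch, covering only the Python virtual environment activation case: the commands mirror the table in the Python `venv` documentation, while the `ACTIVATORS` table layout, the shell names used as keys, and the `activation_command` helper are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# shell -> (path flavour, activation script inside the venv, command prefix).
# Entries follow the activation table in the Python venv documentation;
# the table layout and helper name are hypothetical.
ACTIVATORS = {
    "bash": (PurePosixPath, "bin/activate", "source "),
    "zsh": (PurePosixPath, "bin/activate", "source "),
    "fish": (PurePosixPath, "bin/activate.fish", "source "),
    "tcsh": (PurePosixPath, "bin/activate.csh", "source "),
    "cmd": (PureWindowsPath, "Scripts/activate.bat", ""),
    "powershell": (PureWindowsPath, "Scripts/Activate.ps1", ""),
}


def activation_command(venv: str, shell: str) -> str:
    """Render the command that activates `venv` in the given shell."""
    try:
        path_flavour, script, prefix = ACTIVATORS[shell]
    except KeyError:
        raise ValueError(f"no activation instructions for shell {shell!r}") from None
    # Pure paths use their own platform's separators, whatever we run on.
    return prefix + str(path_flavour(venv) / script)


# One "source" instruction, rendered for every shell in the table.
for shell in ACTIVATORS:
    print(f"{shell:>10}: {activation_command('.venv', shell)}")
```

The point of the design is that the knowledge lives in one table: adding a shell is one new entry, rather than one more paragraph that every README has to remember to copy.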

But to wrap this up, I want to give a heads-up to all my fellow FLOSS developers: Windows 10 shouldn’t be underestimated as a development platform. And if they intend to be widely open to contributions, they should probably give some thought to how their code works on Windows. I know I’ll have to keep this in mind for my future.

On Rake Collections and Software Engineering

autumn, earth's back scratcher

Matthew posted a metaphor on Twitter about rakes and software engineering – well, software development, but at this point I would argue that anyone arguing over these distinctions has nothing better to do, for good or bad – and I ran with it a bit by pointing out that in my previous bubble, I should have used “Rake Collector” as my job title.

Let me give a bit more context on this one. My understanding of Matthew’s metaphor is that senior developers (or senior software engineers, or senior systems engineers, and so on) complain that their coworkers are making mistakes (“stepping onto rakes”, also sometimes phrased as “stepping into traps”), while at the same time making their environment harder to navigate (“spreading more rakes”, also “setting up traps”).

This is not a new concept. Ex-colleague Tanya Reilly expressed a very similar idea in her “Traps and Cookies” talk.

I’m not going to repeat all of the examples of traps that Tanya gives in her talk, which I thoroughly recommend watching to people working with computers — not only developers, system administrators, or engineers. Anyone working with a computer.

Probably not even just people working with computers — Adam Savage expresses yet another similar concept in his Every Tool’s a Hammer under Sweep Up Every Day:

[…] we bought a real tree for Christmas every year […]. My job was always to put the lights on. […] I’d open the box of decorations labeled LIGHTS from the previous year and be met with an impossible tangle of twisted, knotted cords and bulbs and plugs. […] You don’t want to take the hour it’ll require to separate everything, but you know it has to be done. […]

Then one year, […] I happened to have an empty mailing tube nearby and it gave me an idea. I grabbed the end of the lights at the top of the tree, held them to the tube, then I walked around the tree over and over, turning the tube and wrapping the lights around it like a yuletide barber’s pole, until the entire six-string light snake was coiled perfectly and ready to be put back in its appointed decorations box. Then, I forgot all about it.

A year later, with the arrival of another Christmas, I pulled out all the decorations as usual, and when I opened the box of lights, I was met with the greatest surprise a tired working parent could ever wish for around the holidays: ORGANIZATION. There was my mailing tube light solution from the previous year, wrapped up neat and ready to unspool.

Adam Savage, Every Tool’s a Hammer, page 279, Sweep up every day

This is pretty much the definition of Tanya’s cookie for the future. And I have a feeling that if Adam were made aware of Tanya’s trap concept, he would probably point at a bunch of tools with similar properties. Actually, I have a feeling I might have heard him say something about throwing out a tool that had some property opposite to what everything else in the shop did, making it dangerous. I might be wrong, so don’t quote me on that; I tried looking for a quote from him on that and failed to find anything. But it is something I would definitely do with my own tools.

So what about the rake collection? Well, one of the things that I’m most proud of in my seven years at that bubble is the work I’ve done trying to reduce complexity. This took many different forms, but the main one has been removing multiple optional arguments from library interfaces that are used across the whole (language-filtered) codebase. Since I can’t go into much detail about it, you’ll find the example a bit contrived, but please bear with me.

When you write libraries that are used by many, many users, and you decide that you need a new feature (or that an old feature needs to be removed), you’re probably going to add a parameter to toggle the feature. Then you either expect the “modern” users to set it or, if you can, you do a sweep over the current users to have them explicitly request the current behaviour, and then you change the default.
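To make the pattern concrete, here is a hypothetical library call going through it. The `connect_v1`/`connect_v2` functions and the `use_secure_transport` flag are invented for illustration: version 1 adds the flag with the old behaviour as the default, the sweep makes existing callers explicit, and version 2 flips the default without changing what those callers get.

```python
# Hypothetical library API before the default flips: the new behaviour is opt-in.
def connect_v1(host: str, *, use_secure_transport: bool = False) -> str:
    return f"secure://{host}" if use_secure_transport else f"legacy://{host}"


# The same hypothetical API after the sweep: the default selects the new behaviour.
def connect_v2(host: str, *, use_secure_transport: bool = True) -> str:
    return f"secure://{host}" if use_secure_transport else f"legacy://{host}"


# The sweep rewrote every existing caller to pass the flag explicitly...
explicit_legacy_call = dict(host="old.internal", use_secure_transport=False)
# ...so flipping the default does not change what those callers get.
assert connect_v1(**explicit_legacy_call) == connect_v2(**explicit_legacy_call)

# Callers relying on the default, instead, silently switch behaviour.
print(connect_v1("new.internal"), connect_v2("new.internal"))
```

Note that both versions still carry the flag, and every swept caller still spells out the legacy value: the migration is finished, but the parameter stays behind.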

The problem with all of this, is that cleaning up after these parameters is often seen as not worth it. You changed the default, why would you care about the legacy users? Or you documented that all the new users should set the parameter to True, that should be enough, no?

That is a rake. And one that is left very much in the middle of the office floor by senior managers all the time. I have seen this particular pattern play out dozens, possibly hundreds of times, and not just at my previous job. The fact that the option is there to begin with already increases the complexity of the library itself – and sometimes that complexity gets very expensive for the already over-stretched maintainers – but it’s also going to make life hard for the maintainers of the library’s consumers.

“Why does the documentation say this needs to be True? In this code my team uses it’s set to False and it works fine.” “Oh, this is an optional parameter, I guess I can ignore it, since it already has a default.” *Copy-pastes from a legacy tool that is using the old code path and that nobody wanted to fix.*

As a newcomer to an environment (not just a codebase), it’s easy to step on those rakes (sometimes uttering exactly the words above), and not know it until it’s too late. This is particularly dangerous when, for instance, a parameter controls whether you use a more secure interface instead of an old one that new users are not expected to use. When you become more acquainted with the environment, the rakes become easier and easier to spot — and my impression is that for many newcomers, that “rake detection” is the kind of magic that puts them in awe of the senior folks.

But rake collection means going a bit further. If you can detect the rake, you can pick it up, and keep it from smashing into the face of the next person who doesn’t have that detection ability. This will likely slow you down, but an environment full of rakes slows down all the newcomers, while a mostly rake-free environment would be much more pleasant to work in. Unfortunately, that’s not something that aligns with business requirements, or with the incentives provided by management.

A slight aside here. Also on Twitter, I have seen threads going by about game development tending to be a time-to-market challenge that leaves all the hacks around, because that’s all you care about. I can assure you that the same is true for some non-game development too. Which is why “technical debt” feels like it’s rarely tackled (on that note, Caskey Dickson has a good technical debt talk). This is the main reason why I’m talking about environments rather than codebases. My experience is with long-lived software, and libraries that had existed for twice as long as I worked at my former employer, so my main environment was codebases, but that is far from the end of it.

So how do you balance rake collection with the need to get work done quickly? I don’t have a really good answer — my balancing results have varied from team to team, and they have often been related to my personal sense of achievement outside of the balancing act itself. But I can at least give an idea of what I do about this.

I described this to my former colleagues as a rule of thumb of “three times” — to keep with the rake analogy, we can call it “three notches”. When I found something that annoyed me (inconsistent documentation, required parameters that made no sense, legacy options that should never be used, and so on), I would try to remember it, rather than going out of my way to fix it. The second time, I might flag it down somehow (e.g. by adding a more explicit deprecation notice, logging a warning if the legacy codepath is executed, etc.) And the third time I would just add it to my TODO list and start addressing the problem at the source, whether it would be within my remit or not.
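In a Python library, the second and third notches could look like the sketch below. The `connect` function, its `use_secure_transport` flag and `connect_collected` are hypothetical: the second notch is a `DeprecationWarning` emitted only when the legacy path actually runs, the third is removing the parameter altogether once nobody passes it.

```python
import warnings


# Second notch (hypothetical API): flag the legacy code path whenever it runs.
def connect(host: str, *, use_secure_transport: bool = True) -> str:
    if not use_secure_transport:
        warnings.warn(
            "use_secure_transport=False is deprecated and will be removed; "
            "rely on the default secure transport instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this line
        )
        return f"legacy://{host}"
    return f"secure://{host}"


# Third notch: once no caller passes the flag any more, the parameter goes away
# and the rake is gone for good.
def connect_collected(host: str) -> str:
    return f"secure://{host}"
```

`stacklevel=2` points the warning at the call site, which is what the owner of that code needs to find it. Keep in mind that Python only shows `DeprecationWarning` by default when it is attributed to code in `__main__`, so in practice it mostly surfaces in test runs, which typically enable it.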

This does not mean that it’s a universal solution. It worked for me, most of the time. Sometimes I got scolded for having spent too much time on something that had little to no bearing on my team; sometimes I got celebrated for unblocking people who had been fighting with legacy features for months, if not years. I do think that it was always worth my time, though.

Unfortunately, rake collection is rarely incentivised. The time spent cleaning up after the rakes left in the middle of the floor eats into one’s own project time, if it’s not the explicit goal of their role. And the fact that newcomers don’t step on those rakes and hurt themselves (or slow down, afraid of bumping into yet another rake) is rarely quantifiable enough to get managers to agree to it.

What could he tell them? That twenty thousand people got bloody furious? That you could hear the arteries clanging shut all across the city? And that then they went back and took it out on their secretaries or traffic wardens or whatever, and they took it out on other people? In all kinds of vindictive little ways which, and here was the good bit, they thought up themselves. For the rest of the day. The pass-along effects were incalculable. Thousands and thousands of souls all got a faint patina of tarnish, and you hardly had to lift a finger.

But you couldn’t tell that to demons like Hastur and Ligur. Fourteenth-century minds, the lot of them. Spending years picking away at one soul. Admittedly it was craftsmanship, but you had to think differently these days. Not big, but wide. With five billion people in the world you couldn’t pick the buggers off one by one any more; you had to spread your effort. They’d never have thought up Welsh-language television, for example. Or value-added tax. Or Manchester.

Good Omens page 18.

Honestly, I often felt like Crowley: I rarely ever worked on huge, top-to-bottom cathedral projects. But I would be sweeping around a bunch of rakes, so that newcomers wouldn’t hit them, and that all of my colleagues would be able to build stuff more quickly.

Have you seen some gold?

Since I have two binutils problems on my TODO list (the warning on softer --as-needed and the fix for the PulseAudio build), I also started wondering why I haven’t heard, or rather read, anything about the gold linker.

Saying that I’m disappointed would not really be accurate, to be honest, since I don’t really wish to switch to a linker written in C++ any time soon. But I really hoped that it would generate enough momentum to find a solution. Because, yes, the ld linker that ships with binutils is tremendously slow at linking C++ code, and as Linkers & Loaders has now helped me understand, the problem is not just the length of the (mangled) symbol names, but also the way that templates are expanded and linked together.

But still, I think it’s really worth investigating some alternative, which in my opinion need not be written in C++, with all the problems that brings. Saying that the gold linker is fast just because of the language it is written in is absolutely naïve, since the problems lie much deeper than that.

The main problem is that the current ld implementation is based, like the rest of the binutils tools, upon libbfd, an abstraction layer that supports multiple binary formats, not just ELF. It allows using mostly the same interface on different operating systems with different executable formats: ELF under Linux, BSD and Solaris, Mach-O under Mac OS X, PE under Windows, and more. While this makes for a much more powerful ld command, it’s also a bit of a bottleneck.

Even though it is designed well enough not to crumble easily, it is probably a good area to investigate to find out why ld is so slow. Having an alternative, ELF-only linker available for users, Gentoo users especially, would likely be a good test. This would follow what Apple does on OS X (GCC calls Apple’s own linker) as well as what Sun does under Solaris with their copy of GCC.

While I’m all for generic code, sometimes you need specialised tools if you want to access advanced features of files, or if you want fast, optimised software.
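To give an idea of how direct an ELF-only tool can be compared to going through libbfd, here is a minimal sketch that inspects an ELF header using nothing but the structures in glibc’s <elf.h>. The function names (elf_class_of, host_elf_data, elf64_type) are hypothetical, and for brevity it only reads e_type when the file’s byte order matches the host’s; a real tool would byte-swap instead.

```c
#include <assert.h>
#include <elf.h>    /* glibc's ELF structure definitions */
#include <stddef.h>
#include <string.h>

/* Returns ELFCLASS32 or ELFCLASS64 if buf starts with an ELF identification
   block, ELFCLASSNONE otherwise. The e_ident bytes have the same layout in
   every ELF file, so no byte swapping is needed for this check. */
static int elf_class_of(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return ELFCLASSNONE;
    if (buf[EI_CLASS] == ELFCLASS32 || buf[EI_CLASS] == ELFCLASS64)
        return buf[EI_CLASS];
    return ELFCLASSNONE;
}

/* Byte order of the running host, expressed as an ELF EI_DATA value. */
static int host_elf_data(void) {
    const unsigned int one = 1;
    return *(const unsigned char *)&one == 1 ? ELFDATA2LSB : ELFDATA2MSB;
}

/* Returns e_type (ET_EXEC, ET_DYN, ...) of a 64-bit ELF header, or -1 if the
   buffer is not 64-bit ELF or its byte order differs from the host's. */
static int elf64_type(const unsigned char *buf, size_t len) {
    Elf64_Ehdr hdr;
    if (elf_class_of(buf, len) != ELFCLASS64 || len < sizeof hdr)
        return -1;
    if (buf[EI_DATA] != host_elf_data())
        return -1;
    memcpy(&hdr, buf, sizeof hdr);  /* copy out to avoid unaligned access */
    return hdr.e_type;
}
```

There is no format dispatch at all: the header is read as exactly one known structure, which is the kind of shortcut a generic, multi-format library cannot take.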

The same can be said for the analysis tools provided by binutils: as I’ve written in my post about elfutils, the nm, readelf and objdump tools provided by binutils, being generic, lack some of the useful defaults and the different interface that elfutils has. This goes to show why specialised tools could help here too. I know that FreeBSD was working on providing replacements for these tools, under the BSD license as usual. While that’s certainly an important step, I don’t remember reading anything about a new linker.

As it is, I haven’t gone out of my way to see if there are already some alternative linkers that work under Linux, besides the one provided by Sun’s compiler in Sun Studio Express (which has lots of problems of its own). If there is one already, we should look at how it stands feature-wise.

What we want from a specialised linker, besides speed, is proper support for the .gnu.hash section, --as-needed-like features, no text relocations emitted in the code (which is a problem gold used to have, at least), and possibly better support for garbage collection of unused sections, good enough to use in production code without the huge performance impact that seems to come with -fdata-sections and -ffunction-sections.

I’m not going to work on this myself, but if somebody is interested in my opinion about using any particular linker in Gentoo, I’d be glad to look at it; just so you know, though, I’m not going to mince words.

GCC features and shortcomings

Like any other Free Software developer, I think I have a love/hate relationship with the GNU Compiler Collection or, as it’s most commonly called, GCC. While GCC is a very good modern compiler, with tons of features, warning heuristics and very good optimisations, it’s very slow, and it’s not exactly foolproof when it comes to warnings.

In particular, I already wrote a couple of times that I very much dislike the way GCC cannot identify variables whose values are set but never used, even though it should be trivial to do during SSA (and indeed the variables are not emitted in the final code), but it’s not just that.
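To make "set but never used" concrete, here is a small hypothetical example: the variable last is assigned on every iteration but never read, so the optimiser drops it. GCC 4.6 and later report this kind of dead variable with -Wunused-but-set-variable, which -Wall enables; the compiler discussed in this post does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function: `last` is written on every iteration but never
   read, so it is a variable that is set but never used. */
static int sum_ints(const int *v, size_t n) {
    int total = 0;
    int last = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        last = v[i];    /* dead store: no code is emitted for it */
        total += v[i];
    }
    return total;
}
```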

Starting from version 4.3, GCC added new support for warnings: the -Werror= option. With earlier versions you could turn every warning into an error with -Werror, or you had a couple of special cases such as -Werror-implicit-function-declaration. With -Werror= you can set a specific class of warnings as errors, without turning the rest into errors. This is very good, since some warnings like -Wreturn-type are truly errors rather than just warnings, and it’s indeed a good idea to stop the build if they are raised.

So -Werror= is good, and I started using it in more than a couple of my projects, to make sure that no code gets in that could break something. On the other hand, -Werror= requires that you know the name of the -W flag that enables a given warning. That’s very easy to find out by using -fdiagnostics-show-option:

% gcc -x c -Wall -Wextra -fdiagnostics-show-option -c -o /dev/null - < …
<stdin>: In function ‘foo’:
<stdin>:2: warning: control reaches end of non-void function [-Wreturn-type]

Cool, now we know that to turn that warning into an error we have to pass -Werror=return-type. Now this should be enough to turn any warning into an error, you’d think, but unfortunately that’s not the case. Take for instance the following case:

% gcc -x c -fdiagnostics-show-option -o /dev/null -c - < …
<stdin>: In function ‘foo’:
<stdin>:2: warning: return makes integer from pointer without a cast

You can notice two particular things here. The first is the obvious one: gcc is not reporting a warning option next to the warning itself, which prevents us from using -Werror=. The other is that I didn’t pass any -W flag to enable the warning. This is one of a class of warnings that gcc considers most important, so important that they are always enabled even when developers and users don’t ask for them, so important that the only way to disable them is to pass the -w flag to disable all warnings, as there is no -Wno- flag for them. But for these very same reasons, it is not possible to turn them into errors!

As you might guess, this is a paradoxical situation: the most important, most useful warnings (the ones that most likely mean trouble) cannot be turned into errors because there is no way to disable them. And yet, there seems to be no development on this front, at least judging from the bug I reported.
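For warnings that do have a name, the promotion can also live in the source file itself, through GCC’s diagnostic pragmas; this is roughly the file-local counterpart of -Werror=return-type. The function below is hypothetical. Unnamed warnings like the one above still can’t be handled this way, although newer GCC releases attach that particular diagnostic to -Wint-conversion.

```c
#include <assert.h>

/* From this point on in the file, treat -Wreturn-type diagnostics as
   errors rather than warnings. */
#pragma GCC diagnostic error "-Wreturn-type"

/* Hypothetical function: every path must return a value. */
static int sign_of(int x) {
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    assert(x == 0);
    /* Dropping this return should now stop the build with an error
       instead of a warning. */
    return 0;
}
```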

Sometimes I find GCC is funny…

Unit testing frameworks

For a number of reasons, I’m interested in writing unit tests for a few projects, one of which will probably be my 1.3 branch of xine-lib (yes, I know 1.2 hasn’t even been released in beta form yet).

While unit tests can be written quite easily without much of a framework, having at least a basic framework would help make automated testing possible. Since the projects I have to deal with use standard C, I started looking at some basic unit test frameworks.
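As a rough idea of the "without much of a framework" end of the spectrum, here is a minimal sketch of a hand-rolled harness: one CHECK macro that counts checks and failures and reports each failure with file and line, plus a runner whose return value can be used as the process exit status (a zero status is what Automake’s make check treats as a pass). All names, including the clamp function under test, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

static int tests_run = 0;
static int tests_failed = 0;

/* Records one check; on failure prints the location and the expression. */
#define CHECK(cond)                                               \
    do {                                                          \
        tests_run++;                                              \
        if (!(cond)) {                                            \
            tests_failed++;                                       \
            fprintf(stderr, "%s:%d: check failed: %s\n",          \
                    __FILE__, __LINE__, #cond);                   \
        }                                                         \
    } while (0)

/* Hypothetical function under test. */
static int clamp(int v, int lo, int hi) {
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

static void test_clamp(void) {
    CHECK(clamp(5, 0, 10) == 5);
    CHECK(clamp(-3, 0, 10) == 0);
    CHECK(clamp(42, 0, 10) == 10);
}

/* Runs every test; returns 0 if all checks passed, 1 otherwise. */
static int run_all_tests(void) {
    test_clamp();
    printf("%d checks, %d failures\n", tests_run, tests_failed);
    return tests_failed == 0 ? 0 : 1;
}
```

A test program would then just do `return run_all_tests();` from main. What this lacks, compared to a real framework, is exactly the automation side: test registration, isolation between tests, and machine-readable reports.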

Unfortunately, it seems like both CUnit and Check last saw a release in 2006, and their respective repositories seem quite calm. In the case of CUnit I also noticed a quite broken build system.

Glib seems to have some basic support for unit tests, but even the Glib developers don’t use it, so I doubt it’d be a good choice. There are quite a few unit testing frameworks for particular environments, like Qt or GNOME, but I haven’t found anything generic.

It seems odd that, even though people always seem to cheer for test-driven development, there isn’t a good enough framework for C. I think Ruby, Java, Perl and Python already have their well-established frameworks, and most software uses them, but there is neither a standard nor a widely accepted framework for C.

I could probably write my own framework, but that’s not really an option: I don’t have that much free time on my hands. I suppose the least effort would be to contribute to one of the frameworks already out there, so that I can fix whatever I need fixed and have it working as I need. Unfortunately, I’d have to start by looking at all of them and finding the least problematic one, and that is not useful if the original authors have gone MIA or similar, especially since CUnit, at least, was still developed using CVS.

If somebody has suggestions on how to proceed, they are most welcome. I could probably fork one of them if I have to, although I dislike the idea. From what I gathered in a brief look, CUnit’s XML output of results might be useful to gather test statistics for an automated testing facility.

Porting to Glib

In the past few days I’ve been working on porting part of lscube to Glib. The idea is that instead of inventing our own lists, ways to iterate over them, ways to find data and so on and so forth, using the Glib versions should be much easier and also faster, as they tend to be optimised.

Interestingly enough, Glib does seem to provide almost every feature out there, although I still think its logging facilities lack something.

It is very interesting to finally be able to ditch custom-made data structures for ones that have already been tested. Unfortunately, rewriting big parts of the code is, as usual, prone to human mistakes. But it’s fun.

I’m starting to wonder how much duplication of effort we have in a system; I’m pretty sure there is at least some. xine-lib, for instance, does not use Glib, and it re-implements some structures and abstractions that could very well be taken from Glib. Once I feel better and can pay more attention to xine-lib again, I’m tempted to start a new branch and see how it would work to move xine-lib to Glib. Considering that even Qt4 now pulls it in, I guess most of the frontends would already be using Glib, so it makes sense.

Actually, some of xine’s own plugins make use of Glib, so it would probably be a good idea to at least try to reduce the duplication by using Glib there too. Things like Glib’s GString would make reply generation for HTTP, RTP/RTSP and MMS much easier than the current methods, and probably much safer. I would even expect it to be faster, but I won’t swear on that just yet.

So this is one more TODO for xine-lib, at this point probably for the 1.3 series; it would also be nice to start using Ragel for demuxers and protocol parsers.

I guess I should be playing by now rather than writing blog posts about technical stuff or rewriting code to use Glib or Ragel, sigh.

On patching and security issues

Jeff, I think your concerns are very real. The problem here, though, is not whether Debian users should be advised against filing bugs upstream; the problem is that Debian should not go out of its way to patch stuff around.

Of course this is not entirely Debian’s fault: there are a few projects for which dealing with upstream is a tremendous waste of time of cosmic proportions, as they ignore distributors, think that their needs are totally bogus and stuff like that. Now, not all projects are like that, of course. Projects like Amarok are quite friendly with downstream (to the point that all the patches in Gentoo, at least those added by me, were committed to SVN at the same time), and most of the projects that don’t suit any distribution most likely just don’t know what distributors need.

I did write about this in the past, and you can find my ideas in the “Distribution-friendly Projects” article, published on LWN (part 1, part 2 and part 3). I suggest reading it to anybody who has an upstream project and would like to know what distributors need.

But the problem here is that Debian is known for patching the blood out of a project to adapt it to their needs. Sometimes this is good, as they turn a totally distribution-unfriendly package into a decent one; sometimes it’s very bad.

You can find a few good uses of Debian’s patches in Portage; it’s not uncommon for a patched project to be used. On the other hand, you can think of at least two failures that, at least for me, showed how easily Debian can fail:

  • a lesser-known failure in autotoolising metamail, a dependency of HylaFAX that I tried to run on FreeBSD before. They did use autoconf and automake, but they set them up so that they only work under Linux, proving they don’t know autotools that well;
  • the EPIC FAIL of the OpenSSL security bug, where people wanted to fix a problem reported by Valgrind without knowing Valgrind (if you have ever looked at the Valgrind docs, there is a good reference about suppression files, which are the way to go rather than patching code you don’t understand).

Of course, this by itself means nothing; even in Gentoo there have been good patches and bad patches. I have yet to see an EPIC FAIL like the OpenSSL debacle, but you never know.

The problem lies in the fact that Debian also seems to keep a “holier than thou” attitude toward any kind of collaboration, as you can easily notice in Marco d’Itri’s statements regarding udev rules (see this LWN article). I know a few Debian developers who are really nice guys whom I love to work with (like Reinhard Tartler, who packages xine, and Russell Coker, whose blog I love to follow for both its technical and its “green” posts, just to name two), but Debian developers behaving like d’Itri is far from unheard of, and actually not uncommon either.

I’m afraid that the good in Debian is being contaminated by people like these, and by the attitude of trusting no one but themselves in every issue. And I’m sorry to see that because Debian was my distribution of choice when I started using Linux seriously.

What did Enterprise do?

Now that Enterprise has died (or at least is pretty sick), I am aiming for a high-end system. I can understand that it is difficult to accept that I don’t just get the cheapest box I can find at the local store and be done with it.

Why is this? Well, the first problem is that prices in Italy are very strange. It’s not unusual for me to find components at half the price, or less, when looking them up in other European shops. In particular, in the local shops a good enough 450W PSU like the one I had before would cost me €140. Consider that I paid €100 for mine two years ago. I could get one from Germany for less, shipping included, than I would pay in Italy, but I’m not sure the PSU is the culprit, so I’m not counting on it. Why? Because there is a burning plastic smell when Enterprise is on, and it does not come from the PSU.

So rather than getting a new PSU, waiting to see if it’s the motherboard, or the CPU, or the memory, and then getting those one at a time, paying for shipping multiple times, I’m keen on replacing the box entirely. I was actually already planning the upgrade; the problem here is the timing: if it weren’t happening while my health is in this state, I would have had enough availability to just replace Enterprise straight away.

But why am I spending €1300 on a system rather than spending, say, $600 to get the cheapest Intel quadcore available? First, I don’t think I can get much for such a price here: US prices are quite a bit lower than Italian ones, even considering taxes. I checked out Newegg before, and the prices were almost half those of European shops, which means a quarter of the prices of Italian suppliers. Unfortunately they don’t ship overseas. Of course I could get it sent to me through some loophole, but again: getting it through customs would cost me between 40% and 50% of the nominal price, shipping costs included, and it’d be impossible to get any warranty out of it. And going without a warranty is not a good idea for most consumer-grade hardware.

On the other hand, a cheap Intel quadcore with a decent amount of memory could work well as a workstation; the problem is that Enterprise has never been your usual workstation.

Enterprise not only worked as my workstation, and used to be my media center, but most of all, it’s a development box. I’m not just rebuilding the projects I work on; I rebuild the whole of Portage many times over. When I first updated to GCC 4.3, the first thing I did was rebuild world; when glibc 2.8 was released, I rebuilt world; when a new autoconf or automake version is released, I rebuild world. Why? Because I can usually fix the resulting problems, or at least give a good indication of how to fix them.

The faster these rebuilds are, the faster I can fix the problems, and the faster the fixes usually enter Portage. But it’s not just that.

For instance, Enterprise had much more aggressive --as-needed support: I forced it through GCC specs. The result was that it stress-tested linking, which meant working around libtool brokenness and similar issues.

But this alone could warrant a multicore system; why go high-end? Well, together with the standard system in /, Enterprise had a series of chroots: one handles the updates for the vserver where my blog is (but also xine’s Bugzilla, which is something useful for F/OSS, not just me), others handle corner-case tests. Those are the ones building, for instance, a system with OpenPAM instead of Linux-PAM to see which parts of Portage can work with it, or testing cowstats with PIE enabled, to find programs that rely on the fact that they don’t need data relocations outside shared libraries.

It’s kinda like a tinderbox but it isn’t a tinderbox. It was a system that was almost never idling.

And, to be honest, one thing I haven’t worked on or improved in a few months is the linking collisions detection. The reason I stopped is that even using PostgreSQL it takes a long time. And it wasn’t yet specifically testing for possibly embedded libraries.
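The core question behind collision detection can be sketched in memory, assuming that a collision means the same symbol name being exported by more than one distinct library. This is a toy illustration, not the PostgreSQL-backed tool mentioned above; the struct, the function names and the library names in the usage data are all hypothetical. Sorting by symbol and then by library puts every definition of a symbol next to each other, so one linear pass counts the collisions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one exported symbol defined by one library. */
struct sym_def {
    const char *symbol;
    const char *library;
};

/* Orders records by symbol name, then by library name. */
static int by_symbol(const void *a, const void *b) {
    const struct sym_def *x = a, *y = b;
    int c = strcmp(x->symbol, y->symbol);
    return c != 0 ? c : strcmp(x->library, y->library);
}

/* Sorts defs in place and returns how many symbol names are defined by
   more than one distinct library. */
static size_t count_collisions(struct sym_def *defs, size_t n) {
    size_t collisions = 0;
    assert(defs != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(defs, n, sizeof *defs, by_symbol);
    for (size_t i = 0; i < n; ) {
        size_t j = i + 1;
        int distinct_libs = 1;
        /* Walk the run of records sharing defs[i].symbol. */
        while (j < n && strcmp(defs[j].symbol, defs[i].symbol) == 0) {
            if (strcmp(defs[j].library, defs[j - 1].library) != 0)
                distinct_libs++;
            j++;
        }
        if (distinct_libs > 1)
            collisions++;
        i = j;
    }
    return collisions;
}
```

The real problem is harder mostly because of scale (every library in the system) and because embedded copies of a library show up as whole groups of colliding symbols rather than single ones.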

While I do like devoting my time to Free Software development, a faster box means I can make better use of my time, which, considering my health problems, is probably a good thing (doing the same stuff in less time means I have more time to spend on other things, like going in and out of hospitals, or relaxing when I don’t feel good enough). Maybe I’m selfish, but I’d rather spend money on a fast system with users’ help than spend little money on a cheap system and be forced to work less on Free Software to make room for hospitals and rest.

So, thanks to all the users helping me with this, I’m doing my best to secure the money for ordering the box ASAP, so that it can resume its tasks while I’m in the hospital too. And as soon as my health stops its downslide, I’ll be working on Free Software again.