Making it easy to contribute (code)

Now that I left one bubble, and before I join the next one, I thought it would be a good time to discuss one thing that would otherwise be perceived as a shill by many (I’m sure it still will, but at least I gave it a shot): how to make it easier for employers of big corporations (case in point, Google as I just left that) to make code contributions to your project.

I’ve heard complains, and in some cases complained myself, about big companies not contributing back improvements, or deciding to “do their own thing” rather than contribution to an already existing project. But after spending seven years working for them, I have a clearer idea that there’s a few things that the projects can do, to make it easier to take contributions. And because I don’t want to be thought as I’m talking of secret information, I’ll be talking about it while referencing the Google Open Source Docs site, which is a (nearly complete) mirror of the documentation available to Googlers.

First of all, as the landing page states, there is some content missing on that site, so that’s not secret either. I no longer have access to the internal version, so I can’t even compare it right now, but the times I have consulted it, the missing content is really not relevant to external discussion. No secret cabals hidden there.

So the first obvious thing is that you need a license. This is something that effectively everybody has been saying forever. No license in a repository means there’s no legal way to use, distribute, or contribute to the code. And pretty much every company out there will forbid its employees from contributing to such projects. Google is quite explicit on this, and includes a (strangely non-comprehensive) list of forbidden licenses — it goes into further details with the list of banned software licenses, that include non-open source licenses too.

There’s a funny tidbit here — Google bans both WTFPL and CC-0 licenses — but explicitly allows Unlicense without review (assuming a bunch of other requirements are met). This is in contrast to the Fedora Project, that goes the other direction, recommending CC-0 over Unlicense. Personally, with the objective of this post, I would suggest the MIT license — it provides pretty much the same widely permissive license, and is accepted by effectively everybody that I know of. It definitely covers both Google and Fedora Project at the same time.

Update (2020-06-09): it was brought to my attention that a week or two after this post was published, Google’s Open Source documentation was amended to include Unlicense together with CC-0 and most other public domain dedications, except for US Government works and BSD0. I’ll keep my advice for using MIT for those files, if at all possible.

When it comes to licensing repositories, remember to license the repository with websites, too. Some time ago I wanted to send a correction for, but I couldn’t because the repository for the site didn’t have a license. Oops! (I believe this was fixed.)

At this point, with the right license, most open source projects are “patchable” — as in, you can provide a patch to the right reviewers and they can give you a green light (or tell you why you can’t do it). This is not particularly difficult work, but it is a bit toily, and it assumes that you’re able to write a patch and get it reviewed quickly — it makes the difference between sending a patch being around ten minutes at once, to be about half an hour to prepare, and then wait a day or two.

In particular for me the problem with this workflow had been transferring the patch from my Linux development workstation into my corp machine to be able to fill in the form. You can’t use GitHub private repositories either, or email it to yourself — what I used to do was to use Google Drive to share it with my corporate account, but that’s also a bit painful. I have given up a few times on some patches because of this reason, until…

Eventually, in the seven years I worked at Google, the policy relaxed a bit, and in particular, there is no review necessary for patching a subset of public GitHub repositories. You still need to make sure the patch is to a project with an accepted license (actually, a subset of licenses), and not be one of the banned projects, or one of the projects that need SVP approval (you really don’t want to wait on an SVP, from what I have been told), but then you can patch most GitHub repositories easily.

What this means in practice is that while I was working on usbmon-tools, I could keep patching python-pcapng — but I would have had to ask for review to patch hexdump: the latter has an acceptable license (Unlicense), but it’s not hosted on GitHub, but rather on BitBucket. The same is true for GitLab.

I’m not quite clear why this difference, but there you go. My recommendation here is to have at least a GitHub mirror — it doesn’t really cost much to set up mirroring, and it makes it easier for Googlers (and I’m sure other corporate employees as well), and for newcomers who might only have seen GitHub up to then.

There is one more thing that I would suggest you to do. There’s an additional bit of documentation around having an AUTHORS file. I have followed this advice for most of my projects, whether released under Google’s open sourcing policy or not. It makes it much easier to make sure that everybody is credited, while not having to keep adding more and more copyright lines into the files. It also matches well with the advice given by Matija, when it comes to include SPDX metadata to the files themselves.

With all these (small) things considered, I would say that a project is in the best state possible to accept code contributions from “corporate actors” — or at least Googlers. I think this approach, of documenting clearly (or as clear as it’s feasible) how to contribute to open source project, is a great starting point — it makes it possible for projects to not unwittingly put up barriers to contributions. And at the same time it gives the chance for those projects who really don’t want corporate contributions to set up as many barriers as they want. I’m sure it’s a signal that some find positive, I really don’t, but that’s just me.

Working in a bubble, contributing outside of it

The holiday season is usually a great time for personal projects, particularly for people like me who don’t go back “home” with “the family” — quotes needed, since for me home is where I am (London) and where my family is (me and my wife.) Work tends to be more relaxed – even with the added pressure of completing the OKRs for the quarter, and to define those for the next – and given that there is no public transport going on, the time saved in commuting also adds up to an ideal time to work on hobbies.

Unfortunately, this year I’m feeling pretty useless on this front, and I thought this uselessness feeling is at least something I can talk about for the dozen-or-so remaining readers of this blog, in an era of social media and YouTube videos. If this sounds very dismissive, it’s probably because that is the feeling of irrelevancy that took over me, and something that I should probably aim to overcome in 2020, one way or another.

If you are reading this post, it’s likely that you noticed my FLOSS contributions waning and pretty much disappearing over the past few years, except for my work around glucometerutils, and the usbmon-tools package (that kind-of derives off it.) I have contributed the odd patch to the Linux kernel, and more recently to some of the Python typing tooling, but those are really drive-by contributions as I found time for.

Given some of the more recent Twitter threads on Google’s policies around open source contributions, you may wonder if it is related to that, and the answer is “not really”. Early on, I was granted an IARC approval for me to keep working on unpaper (which turned out possibly overkill), for the aforementioned glucometerutils, and for some code I wrote while reverse engineering my gaming mouse. More recently, I’ve leveraged the simplified patching policy, and granted approval for releasing both usbmon-tools and tanuga (although the latter is only released as a skeleton right now.)

So I have all the options, and all the opportunities, to contribute FLOSS projects while in employment of a big multinational Internet company. Why don’t I do that more, then? I think the answer is that I work in a bubble for most of the day, and when I try to contribute something on my spare time, I find myself missing the support structure that the bubble gives me.

I want to make clear here that I’m not saying that everything is better in the bubble. Just that the bubble is soft and warm enough that makes the world outside of it scary, sometimes annoying, but definitely more vast. And despite a number of sensible tools being available out there (and in many cases, better tools), it takes a significant investment in researching the right way to do something, to the point that I suffer from CBA syndrome.

The basic concepts are not generally new: people have talked out loud at conferences about the monorepo, my friend Dinah McNutt spoke and wrote at length about Rapid, the release system we use internally, and that drives the automatic releases, and so on. If you’re even more interested in the topic, this March the book Software Engineering at Google will be released by O’Reilly. I have not read it myself, but I have interacted on and off with two of the curators and I’m sure it’s going to be worth its weight in gold.

Some of the tools are also being released, even if sometimes in modified ways. But even when they are, the amount of integration you may have internally is lost when trying to use them outside. I have considered using Bazel for glucometerutils in the past — but in addition to be a fairly heavy dependency, there’s no easy way to reference most of the libraries that glucometerutils need. At the end of the day, it was not worth trying to use it, despite making my life easier by reducing the cognitive load of working on opensource projects in my personal time.

Possibly the main “support beam” of the bubble, though, is the opinionated platform, which can be seen from the outside in form of the style guides but extends further. To keep the examples related to glucometerutils, while the tests do use absl‘s parameterized class, they are written in a completely different style than I would do at work, and they feel wrong when it comes to importing the local copy of the module to test it. When I looked around to figure out what’s the best practice to write tests in Python, I could find literally dozens of blog posts, StackOverflow answers, documentation for testing frameworks, that all gave slightly different answers. In the bubble you have (pretty much) one way to write the basic test — and while people can be creative even within those guidelines, creativity is usually frown upon.

The same is true for release engineering. As I noted and linked above, all of the release grunt work is done by the Rapid tool in the bubble — and for the most part it’s automated. While there’s definitely more than one way to configure the tool, at least you know which tool to use. And while different teams have often differing opinions on those configurations, you can at least find the opinion of your team, or the closest team to you with an Opinion (with the capital O) and follow that — it might not be perfect for your use, but if it’s allowed it usually means it was reviewed and vouched for (or copy-pasted from something else that was.)

An inside joke from the Google bubble is that the documentation is always out of date and never to be trusted. Beside the unfairness of the joke to the great tech writers I had pleasure to work with, who are more than happy to make sure the documentation is not out of date (but need to know that’s the case, and most of them don’t find out until it’s too late), the truth is that at least we do have documentation for most processes and tools. The outside world has tons of documentation, and some of it is out of date, and it’s very hard to tell whether it’s still correct and valid.

Trying to figure out how to configure a CI/CD tool for a Python project on GitHub (or worse, trying to figure out how to make it release valid packages on PyPI!) still feels like going by the early 2000s HOWTOs, where you hope that the three years old description of the XFree86 configuration file is still matching the implementation (hint: it never did.) Lots of the tools are not easy to integrate, and opting into them takes energy (and sometimes money) — the end result of which is that despite me releasing usbmon-tools nearly a year ago, you still need an unreleased dependency, as the fix I needed for it is not present in any released version, and I haven’t dared bothering the author to ask for a new release yet.

It’s very possible that if I was not working in a bubble all of these issues wouldn’t be be big unknowns — probably if I spend a couple of weeks reviewing the various options for CI/CD I can come up with a good answer for setting up automated releases, and then I can go to the dependency’s author and say “Hey, can I set this up for you?” and that would solve my problem. But that is time I don’t really have, when we’re talking about hobby projects. So I end up opening up the editor in the Git repository I want to work on, add a dozen line or so of code to something I want to do, and figure out that I’m missing the tool, library, interface, opinion, document, procedure that I need, feel drained, and close the editor without having committed – let alone pushed – anything.

We Need More Practical, Business-Oriented Open Source — Case Study: The Pizzeria CRM

I’m an open source developer, because I think that open source makes for safer, better software for the whole community of users. I also think that, by making more software available to a wider audience, we improve the quality, safety and security of every user out there, and as such I will always push for more, and more open, software. This is why I support the Public Money, Public Code campaign by the FSFE for opening up the software developed explicitly for public administrations.

But there is one space that I found is quite lacking when it comes with open source: business-oriented software. The first obvious thing is the lack of good accounting software, as Jonathan has written extensively about, but there is more. When I was consulting as a roaming sysadmin (or with a more buzzwordy, and marketing-friendly term, a Managed Services Provider — MSP), a number of my customers relied heavily on nearly off-the-shelf software to actually run their business. And in at least a couple of cases, they commissioned me custom-tailored software for that.

In a lot of cases, there isn’t really a good reason not to open-source this software: while it is required to run certain businesses, it is clearly not enough to run them. And yet there are very few examples of such software in the open, and that includes from me: my customers didn’t really like the idea of releasing the software to others, even after I offered a discount on the development price.

I want to show the details of an example of one such custom software, something that, to give a name to it, would be a CRM (Customer Relationship Manager), that I built for a pizzeria in Italy. I won’t be opening the source code for it (though I wish I could do so), and I won’t be showing screenshots or provide the name of the actual place, instead referring to it as Pizza Planet.

This CRM (although the name sounds more professional than what it really was), was custom-designed to suit the work environment of the pizzeria — that is to say, I did whatever they asked me, despite it disagreeing with my sense of aesthetics and engineering. The basic idea was very simple: when a customer calls, they wanted to know who the customer was even before picking up the phone — effectively inspecting the caller ID, and connecting it with the easiest database editing facility I could write, so that they could give it a name and a freeform text box to write down addresses, notes, and preferences.

The reason why they called me to write this is that they originally bought a hardware PBX (for a single room pizzeria!) just so that a laptop could connect to it and use the Address Book functionality of the vendor. Except this functionality kept crashing, and after many weeks of back-and-forth with the headquarters in Japan, the integrator could not figure out how to get it to work.

As the pizzeria was wired with ISDN (legacy technology, heh), to be able to take at least two calls at the same time, the solution I came up with was building a simple “industrial” PC, with an ISDN line card and Asterisk, get them a standard SIP phone, and write the “CRM” so that it would initiate a SIP connection to the same Asterisk server (but never answer it). Once an inbound call arrived, it would look up if there was an entry in a simple storage layer for the phone number and display it with very large fonts, to be easily readable while around the kitchen.

As things moved and changed, a second pizzeria was opened and it required a similar setup. Except that, as ISDN are legacy technology, the provider was going to charge up to the nose for connecting a new line. Instead we decided to set up a VoIP account instead, and instead of a PC to connect the software, Asterisk ran on a server (in close proximity to the VoIP provider). And since at that point the limitation of an ISDN line on open calls is limited, the scope of the project expanded.

First of all, up to four calls could be queued, “your call is very important to us”-style. We briefly discussed allowing for reserving a spot and calling back, but at the time calls to mobile phones would still be expensive enough they wanted to avoid that. Instead the calls would get a simple message telling them to wait in line to contact the pizzeria. The CRM started showing the length of the queue (in a very clunky way), although it never showed the “next call” like the customer wanted (the relationship between the customer and the VoIP provider went South, and all of us had to end up withdrawing from the engagement).

Another feature we ended up implementing was opening hours: when call would arrive outside of the advertised opening hours, an announcement would play (recorded by a paid friend, who used to act in theatre, and thus had a good pronunciation).

I’m fairly sure that none of this would actually comply with the new GDPR requirements. At the very least, the customers should be advised that their data (phone number, address) will be saved.

But why am I talking about this in the context of Open Source software? Well, while a lot of the components used in this set up were open source, or even Free Software, it still requires a lot of integration to become usable. There’s no “turnkey pizzeria setup” — you can build up the system from components, but you need not just an integrator, you need a full developer (or development team) to make sure all the components fit together.

I honestly wish I had opensourced more of this. If I was to design this again right now, I would probably make sure that there was a direct, real-time API between Asterisk and a Web-based CRM. It would definitely make it easier to secure the data for GDPR compliance. But there is more than just that: having an actual integrated, isolated system where you can make configuration changes give the user (customer) the ability to set up things without having to know how the configuration files are structured.

To set up the Asterisk, it took me a week or two reading through documentation, books on the topic, and a significant amount of experimentation with a VoIP number and a battery of testing SIM cards at home. To make the recordings work I had to fight with converting the files to G.729 beforehand, or the playback would use a significant amount of CPU.

But these are not unknown needs. There are plenty of restaurants (who don’t have to be pizza places) out there that probably need something like this. And indeed services such as Deliveroo appear to now provide a similar all-in-one solution… which is good for restaurants in cities big enough to sustain Deliveroo, but probably not grate for the smaller restaurants in smaller cities, who probably would not have much of a chance of hiring developers to make such a system themselves.

So, rambling asides, I really wish we had more ready-to-install Open Source solutions for businesses (restaurants, hotels, … — I would like to add banks to that but I know regulatory compliance is hard). I think these would actually have a very good social impact on all those towns and cities that don’t have a critical mass of tech influence, that they come with their own collection of mobile apps, for instance.

If you’re the kind of person who complains that startups only appear to want to solve problems in San Francisco, maybe think of what problems you can solve in and around your town or city.

Diabetes management software, online apps, and my projects

So my previous post with glucometerutils news got picked up by Hackaday, and though the comments ended up mostly talking about the (more physical, less practical) note about fiddling with the glucometers hardware themselves (which would suggest me the editor should probably have avoided moving the spotlight in the post, but never mind), I ended up replying to a few comments that were actually topical, to the point that I thought I should be writing about this more extensively.

In the comments, someone brought up Tidepool, which is a no-profit in California that develops what to me appears to be its own data storage and web application for diabetics. This is not far from what Glucosio is meant to be — and you might remember that an interaction with them, had me almost leave open source development, at least for what diabetes is concerned.

The problem with both projects, and a number of others that I’ve been pointed to over the years, is that I find most of them either not practical or web-oriented, or a mixture of the two. With not practical I mean that while building an “universal glucometer” capable of using any random strip is an interesting proposal, it does nothing to improve the patients’ life, and it actually can significantly increase the risks of misreading values and thus, risk the life of the user. For this reason, plus the fact that I do not have enough of a biochemistry understanding to figure out how to evaluate the precision of the meters that are already certified, I don’t invest any time looking into these projects.

Web-based applications such as Tidepool and similar are also far from my interests. I do not have a personal problem with accessing my blood sugar readouts for the sake of research, but I do have some concerns about which actors are allowed access to them. So in particular a startup like Glucosio is not someone I’d be particularly fond of giving access to my data to. Tidepool may be a no-profit, but that does not really make me feel much better, particularly because I would expect that an US-based no-profit would not have gone through all the possible data processing requirements of EU legislation, unlike, say, Abbott. I have already written a lot about why I don’t find self-hosting a good solution so I don’t think I need to spend much time on it here.

Except, there is one extra problem with those apps that require you to set up your own instance — like some of the people who have not been waiting some time ago. While running an app for my own interest may sound like an interesting thing to do, particularly if I want to build up the expertise to run complicated web app stacks, my personal ultimate goal is to have my doctor know what my blood sugar levels are over time. This is the whole point why I started that tool, I wanted to be able to output a PDF that my doctor could see without having to jump around a number of hoops to produce it — I failed to do so, but in part because I lost interest after I started using the awesome Accu-Chek Mobile.

If I were to tell my doctor «Log in on this site here with these credentials and you can see my readouts» he might actually do it, but mostly because of novelty and because he entertains my geekery around trying different meters and solutions. If he started to get this request from dozens of his patients, not only he’d have to keep a password managers just to deal with credentials, but he’d probably just couldn’t have the time to deal with it. The LibreLink app does have the ability to share data with a few services, and he did suggest me to look into diasend, but it looks like it got merged into something else that might or might not work for now, so I gave up.

Now, here is an interesting prospect, and why such apps are not completely worthless in my opinion. If the protocols are open to be used, and the apps are open source and can be set up by anyone, there is space for the doctors to have their own instance set up so that their patients can upload their data. Unfortunately, the idea that being open source this does not involve a significant investment in time and money is patently false. Particularly for important data like this, there has to be proper security, starting from every session being encrypted with TLS, and the data encrypted at rest (it is ironic that neither Tidepool nor Glucosio, at the time of writing, use TLS for their main websites). So I still don’t expect doctors in the public sector to be using these technologies any time soon. But on the other hand, there are more and more apps for this being built by the diabetes tech companies, so maybe we’ll see something happening in the future.

Where does this leave my project? Well, to begin with it’s not a single project but two of them. glucometerutils was born as a proof of concept and is still a handy tool to have. If someone manages to implement output to HTML or to PDF of the data, that would make it a very useful piece of software that does not need to interact with any remote, online application. The protocols repository serves a distinct need: it provides a way for more people to contribute to this ecosystem without requiring each of them to invest significant time in reversing the protocols, or getting in bed with the manufacturers, which – I can only guess – involves NDAs, data-sharing agreements, and similar bureaucracy that most hobbyist developers can’t afford.

Indeed, I know of at least one app, built for iOS, proprietary and commercial (as in, you have to pay for it), that has built support for meters thanks to my repository (and the author gave back in form of corrections and improvements on the documentation!). This is perfectly in line with my reasons to even have such a repository. I don’t care if the consumers and contributors to the repository build closed-source tools, as long as they share the knowledge on how to get to the data. And after that, may the best tool win.

As I said before, smartphones are no longer a luxury and for many people they are the only way they can access the Internet. It makes sense that the same way, for many diabetics it is their only way to analyse their readouts. This is why Contour Next One comes with Bluetooth and a nice app, and why there even are standard Bluetooth specification for glucometers (GLP/GLS) and continuous monitors (CGMP/CGMS). If my work on an open-source tool brings more people the ability to manage their diabetes, even with closed-source software, I’ll consider myself satisfied.

Now, there is one more interesting bit with Tidepool, though: they actually publish a Chrome-based uploader app that is able to download data from many more glucometers than my own tool (and the intersection between the two is minimal). This is great! But, as it happens, it comes with a little bit of a downside: the drivers are not documented at all. I confirmed the reason is that the access to the various meters’ protocols is subject to NDA — so while they can publish the code that access those meters, they cannot publish the specs of the protocols themselves, and that appears to include in-code comments that would make it easy to read what’s going on.

So, one of the things I’m going to do is read through those drivers, and try to write a protocol spec for the meters. It appears that they have a driver for Contour Next meters, which may or may not work for the Contour Next One which I’ve been trying to reverse engineer — I know there is at least one other open-source implementation of accessing data from Contour Next meters, but the other one is GPL-2 and, like OpenGlucose, I’ve avoided looking too closely to the code.

Projects such as Tidepool are extremely important to provide a proper alternative to the otherwise closed-garden of proprietary cloud diabetes management software. And if they become simple, and secure enough to set up, it is possible that some of the doctors will start providing their own instances where their patients can upload the readings, and that will make them also practical. But for now, to me they are only a good source of confrontation to figure out a way forward for my own tools.

Distributions are becoming irrelevant: difference was our strength and our liability

For someone that has spent the past thirteen years defining himself as a developer of a Linux distribution (whether I really am still a Gentoo Linux developer or not is up for debate I’m sure), having to write a title like this is obviously hard. But from the day I started working on open source software to now I have grown a lot, and I have realized I have been wrong about many things in the past.

One thing that I realized recently is that nowadays, distributions lost the war. As the title of this post says, difference is our strength, but at the same time, it is also the seed of our ruin. Take distributions: Gentoo, Fedora, Debian, SuSE, Archlinux, Ubuntu. They all look and act differently, focusing on different target users, and because of this they differ significantly in which software they make available, which versions are made available, and how much effort is spent on testing, both the package itself and the system integration.

While describing it this way, there is nothing that scream «Conflict!», except at this point we all know that they do conflict, and the solutions from many different communities, have been to just ignore distributions: developers of libraries for high level languages built their own packaging (Ruby Gems, PyPI, let’s not even talk about Go), business application developers started by using containers and ended up with Docker, and user application developers have now started converging onto Flatpak.

Why the conflicts? A lot of time the answer is to be found in bickering among developers of different distributions and the «We are better than them!» attitude, which often turned to «We don’t need your help!». Sometimes this went all the way to the negative side to the point of «Oh it’s a Gentoo [or other] developer complaining, it’s all their fault and their problem, ignore them.» And let’s not forget of the enmity between forks (like Gentoo, Funtoo and Exherbo), in which both sides are trying to prove being better than the other. A lot of conflict all over the place.

There were of course at least two main attempts to standardise parts of how a distribution works: the Linux Standard Base and The former is effectively a disaster, the latter is more or less accepted, but the problem lies there: in the more-or-less. Let’s look at these two separately.

The LSB was effectively a commercial effort, which was aimed at pleasing (effectively) only the distributors of binary packages. It really didn’t make much of an assurance of the environment you could build things in, and it never invited non-commercial entities to discuss the reasoning behind the standard. In an environment like open source, the fact that the LSB became an ISO standard is not a badge of honour, but rather a worry that it’s over-specified and over-complicated. Which I think most people agree it is. There is also quite an overreach of specifying the presence of binary libraries, rather than being a set of guidelines for distributions to follow.

And yes, although technically LSB is still out there working, the last release I could find described in Wikipedia is from 2015, and I couldn’t even find at first search whether they certified any distribution version. Also, because of the nature of certifications, it’s impossible to certify a rolling-release distribution, which as it happens are becoming much more widespread than they used to.

I think that one of the problem of LSB, both from the adoption and usefulness point of views, is that it focused entirely too much on providing a base platform for binary and commercial application. Back when it was developed, it seemed like the future of Linux (particularly on the desktop) relied entirely on the ability for proprietary software applications to be developed that could run on it, the way they do on Windows and OS X. Since many of the distributions didn’t really aim to support this particular environment, convincing them to support LSB was clearly pointless. is in a much better state in this regard. They point out that whatever they write is not standards, but de-facto specifications. Because of the de-facto character of these, they started by effectively writing down whatever GNOME and RedHat were doing, but then grown to be significantly more cross-desktop, thanks to KDE and other communities. Because of the nature of the open source community, FD.o specifications are much more widely adopted than the “standards”.

Again, if you compare with what I said above, FD.o provides specifications that make it easier to write, rather than run, applications. It provides you with guarantees of where you should be looking for your file and which icons should be rendering, and which interfaces are exposed. Instead of trying to provide an environment where a in-house written application will keep running for the next twenty years (which, admittedly, Windows has provided for a very long time), it provides you building blocks interfaces so that you can create whatever the heck you want and integrate with the rest of the desktop environments.

As it happens, Lennart and his systemd ended up standardizing distributions a lot more than LSB or FD.o ever did, if nothing else by taking over one of the biggest customization points of them all: the init system. Now, I have complained about this before that it probably could have been a good topic for a standard even before systemd, and independently from it, that developers should have been following, but that’s another problem. At the end of the day, there is at least some cross-distribution way to provide init system support, and developers know that if they build their daemon in a certain way, then they can provide the init system integration themselves, rather than relying on the packagers.

I feel that we should have had much more of that. When I worked on ruby-ng.eclass and fakegem.eclass, I tried getting the Debian Ruby team, who had similar complaints before, to join me on a mailing list so that we could discuss a common interface between Gems developers and Linux distributions, but once again, that did not actually happen. My afterthought is that we should have had a similar discussion for CPAN, CRAN, PyPI, Cargo and so on… and that would probably have spared us the mess that is go packaging.

The problem is not only getting the distributions to overcome their differences, both in technical direction and marketing, but that it requires sitting at a table with the people who built and use those systems, and actually figuring out what they are trying to achieve. Because in particular in the case of Gems and the other packaging systems, who you should talk with is not only your distribution’s users, but most importantly the library authors (whose main interest is shipping stuff so that people can use them) and the developers who use them (whose main interest is being able to fetch and use a library without waiting for months). The distribution users are, for most of the biggest projects, sysadmins.

This means you have a multi-faceted problem to solve, with different roles of people, and different needs of them. Finding a solution that does not compromise, and covers 100% of the needs of all the roles involved, and requires no workflow change on any one’s part is effectively impossible. What you should be doing is focusing on choosing the very important features for the roles critical to the environment (in the example above, the developers of the libraries, and the developers of the apps using those libraries), requiring the minimum amount of changes to their workflow (but convince them to change the workflow where it really is needed, as long as it’s not more cumbersome than it was before for no advantage), and figuring out what can be done to satisfy or change the requirements of the “less important” roles (distribution maintainers usually being that role).

Again going back to the example of Gems: it is clear by now that most of the developers never cared of getting their libraries to be carried onto distributions. They cared about the ability to push new releases of their code fast, seamlessly and not have to learn about distributions at all. The consumers of these libraries don’t and should not care about how to package them for their distributions or how they even interact with it, they just want to be able to deploy their application with the library versions they tested. And setting aside their trust in distributions, sysadmin only care to have a sane handling of dependencies and being able to tell which version of which library is running on their production, to upgrade them in case of a security issues. Now, the distribution maintainers can become the nexus for all these problems, and solve it once and for all… but they will have to be the ones making the biggest changes in their workflow – which is what we did with ruby-ng – otherwise they will just become irrelevant.

Indeed, Ruby Gems and Bundler, PyPI and VirtualEnv, and now Docker itself, are expressions of that: distribution themselves became a major risk and cost point, by being too different between each other and not providing an easy way to just provide one working library, and use one working library. These roles are critical to the environment: if nobody publish libraries, consumers have no library to use; if nobody consumes libraries, there is no point in publishing them. If nobody packages libraries, but there are ways to publish and consume them, the environment still stands.

What would I do if I could go back in time, be significantly more charismatic, and change the state of things? (And I’m saying this for future reference, because if it ever becomes relevant to my life again, I’ll do exactly that.)

  • I would try to convince people that even on divergence of technical direction, discussing and collaborating is a good thing to do. No idea is stupid, idiotic or any other random set of negative words. The whole point of that is that you need to make sure that even if you don’t agree on a given direction, you can agree on others, it’s not a zero-sum game!
  • Speaking of, “overly complicated” is a valid reason to not accept one direction and take another; “we always did it this way” is not a good reason. You can keep using it, but then you’ll end up with Solaris a very stagnant project.
  • Talk with the stakeholders of the projects that are bypassing distributions, and figure out why they are doing that. Provide “standard” tooling, or at least a proposal on how to do things in such a way that the distributions are still happy, without causing undue burden.
  • Most importantly, talk. Whether it is by organizing mailing lists, IRC channels, birds of a feather at conferences, or whatever else. People need to talk and discuss the issues at hand in clear, in front of the people building the tooling and making the decisions.

I have not done any of that in the past. If I ever get in front of something like this, I’ll do my best to, instead. Unfortunately, this is a position that, in the current universe we’re talking about, would have required more privilege than I had before. Not only for my personal training and experience to understand what should have been done, but because it requires actually meeting with people and organizing real life summits. And while nowadays I did become a globetrotter, I could never have afforded that before.

On the conference circuit

You may remember that I used not to be a fan of travel, and that for a while I was absolutely scared by the idea of flying. This has clearly not been the case in a while, given that I’ve been working for US companies and traveling a lot of the time.

One of the side effects of this is that I enjoy the “conference circuit”, to the point that I’m currently visiting three to four conferences a year, some of which for VideoLAN and others for work, and in a few cases for nothing in particular. This is an interesting way to keep in touch with what’s going on in the community and in the corporate world out there.

Sometimes, though, I wish I had more energy and skills to push through my ideas. I find it curious how nowadays it’s all about Docker and containers, while I jumped on the LXC bandwagon quite some time ago thanks to Tiziano, and because of that need I made Gentoo a very container-friendly distribution from early on. Similarly, O’Reilly now has a booklet on static site generators which describe things not too far from what I’ve been doing since at least 2006 for my website, and for xine’s later on. Maybe if I wasn’t at the time so afraid of traveling I would have had more impact on this, but I guess (to use a flying metaphor) I lost my slot there.

To focus bit more on SCaLE14x in particular, and especially about Cory Doctorow’s opening keynote, I have to say tht the conference is again a good load of fun. Admittedly I rarely manage to go listening to talks, but the amount of people going in and out of the expo floor, and the random conversation struck there are always useful.

In the case of Doctorow’s keynote, while he’s (as many) a bit too convinced, in my opinion, that he has most if not all the answers, his final argument was a positive one: don’t try to be “pure” (as FSF would like you to be), instead hedge your bets by contributing (time, energy, money) to organizations and projects that work towards increasing your freedom. I’ve been pleasantly surprised to hear Cory name, earlier in that talk, VLC and Handbrake — although part of the cotnext in which he namechecked us is likely going to be a topic for a different post, once I have something figured out.

My current trip brings me to San Francisco tonight, for Enigma 2016, and on this note I would like to remember to conferencegoers that, while most of us are aiming for a friendly and relaxed atmosphere, there is some opsec you should be looking into. I don’t have a designated conference laptop (just yet, I might get myself a Chromebook for it) but I do have at least a privacy screen. I’ve seen more than a couple corp email interfaces running on laptops while walking the expo floor this time.

Finally, I need to thank TweetDeck for their webapp. The ability to monitor hashtags, and particularly multiple hashtags from the same view is gorgeous when you’re doing back-to-back conferences (#scale14x, #enigma2016, #fosdem.) I know at least one of them is reading, so, thanks!

Report from SCaLE13x

This year I have not been able to visit FOSDEM. Funnily enough this confirms the trend of me visiting FOSDEM only on even-numbered years, as I previously skipped 2013 as I was just out for my first and only job interview, and 2011 because of contract related timing. Since I still care for going to an open source conference early in the year, I opted instead for SCaLE, the timing of which fit perfectly my trip to Mountain View. It also allowed me to walk through Hermosa Beach once again.

So Los Angeles again it was, which meant I was able to meet with a few Gentoo developers, a few VideoLAN developers who also came all the way from Europe, and many friends who I have met at various previous conferences. It is funny how I end up meeting some people more often through conferences than I meet my close friends from back in Italy. I guess this is the life of the frequent travelers.

While my presence at SCaLE was mostly a way to meet some of the Gentoo devs that I had not met before, and see Hugo and Ludovic from VideoLAN who I missed at the past two meetings, I did pay some attention to the talks — I wish I could have had enough energy to go to more of them, but I was coming from three weeks straight of training, during which I sat for at least two hours a day in a room listening to talks on various technologies and projects… doing that in the free time too sounded like a bad idea.

What I found intriguing in the program, and in at least one of the talks I was able to attend, was that I could find at least a few topics that I wrote about in the past. Not only now containers are now all the rage, through Docker and other plumbing, but there was also a talk about static site generators, of which I wrote in 2009 and I’ve been using for much longer than that, out of necessity.

All in all, it was a fun conference and meeting my usual conference friends and colleagues is a great thing. And meeting the other Gentoo devs is what sparked my designs around TG4 which is good.

I would like to also thank James for suggesting me to use Tweetdeck during conferences, as it was definitely nicer to be able to keep track of what happened on the hashtag as well as the direct interactions and my personal stream. If you’re the occasional conferencegoer you probably want to look into it yourself. It also is the most decent way to look at Twitter during a conference on a tablet, as it does not require you to jump around between search pages and interactions (on a PC you can at least keep multiple tabs open easily.)

Is ohloh here to stay?

You probably know ohloh Open Hub — it’s a website that provides statistical information about a number of Free Software (but not limited to) projects, fetching data from various source repositories and allowing developers to “aggregate” their commit statistics; it works partly like CIA but rather than receiving commit messages, it fetches the whole commits (since it analyses the code as well as the commits).

While I like it, and blogged about it before, I do start having some reserves to it; there are quite a few problems related to it that made some of its useful features moot, and at the same time it seems like it grew some extra features (like download servers) that seem, nowadays, pretty pointless.

Don’t get me wrong, I love the idea itself; and I’m pretty sure developers love statistics, but as I said there are quite a few issues, especially when you add to the story some things like the huge increment in use of distributed version control systems (Git, Bzr, Mercurial, …), and the increased popularity of among free software developers. I’m afraid that some of these environment changes are going to kill off ohloh by this pace, mostly because it really doesn’t seem like it’s going to adapt anytime soon.

You might remember my post about the journal feature which was, at the end, simply a tweaked microblogging application; I say tweaked because it had one fundamental feature: hash-tags weren’t simply invented, they directly related to the ohloh projects. Unfortunately, even I abandoned that feature. The reason for that was not only that it seemed to fail to reach the critical mass for such services to be useful, but also that the implementation had quite a few problems that made it more of a nuisance than something useful. The Jabber bot happened to die more often than not, and even when it worked sometimes it failed to update the website at all. I don’t know whether a proper API was defined for that, but it didn’t get support from desktop microblogging software like gwibber for instance, which could have helped build-up the critical mass needed.

Another issue is with the explosion of DVCS, as I said: since now any people wanting to branch a software to apply some fixes or changes is able to have their own repository, there has to be some filtering in which repositories do get enlisted in ohloh: who decides which repositories are official? This is probably one of the reasons why managers were added to the projects; unfortunately this came probably in too late, and as far as I can see most projects lack a manager at all.

And another problem still: seems like any project that involves changing something in the Linux kernel ended up importing a whole branch of the Linus repository (for obvious reasons), which makes my contributions list projects such as Linux ACPI LinuxSH LTTng OpenMoko (this actually created a bit of a fuzz with a colleague of mine some time ago) OpenEZX KVM linux-omap and linux-davinci and that’s just for one patch, mostly (a few already picked up my second named patch to the kernel, which is even more trivial than the first one; I got a third I’ll have to re-send around sooner or later).

But this by itself would just mean that, like many other projects, of all possible kind out there, ohloh has problems to face and solve, no sh*t Sherlock. Why do I go a step further saying that it might not be around for long still? Well, some time ago, I think it was in relation to the blog post I named above about journals, I was contacted by Jason Allen which is ohloh’s head, who asked my help to clear out some problems with indexing gentoo’s repositories (the problem still exists, by the way, there is a huge timeframe indexed but nowhere near the 10 years we just celebrated). I was able for a while to contact him when some problem came up with ohloh and that was fine; unfortunately I have been unable to contact him for a few months now (around the time SourceForge acquired them if that says anything), and this includes pinging him on ohloh’s own journal feature. I hope he’s alright, and simply too busy with the joined stuff to answer, but still that doesn’t profile well for ohloh as a website.

There are other problems as well, don’t you worry: for instance, the projects allow to set up feeds to publish in the project’s page: they used to have problem with UTF-8 and thus garbled my surname (not the only ones mind you), but this became even worse with time because requests go out without an User-Agent (which means my current mod_security configuration is rejecting the requests); of course I could whitelist the ohloh’s server IP address, but… it doesn’t look like a complex bug to fix does it?

And finally, the other day I was considering making use of the ohloh data to prepare a script showing a tagcloud-like list of projects I contribute to, I wanted something that could show easily what I really do… ohloh makes available an API that most likely had everything I needed, but, for a website that proposes you to “Grok Open Source” (that’s what the homepage says), having a clause like this in the API documentation seems a bit… backwards:

It is important not to share API keys. In order to access or modify account data, your application must be granted permission by an individual Ohloh account holder. This permission is granted on a per-key basis.

Now, I know that it’s pretty difficult for webservices to properly authenticate applications using services, and thus why API keys are used; but at the same time, doesn’t this block the whole idea of open-source clients using those APIs?

Journey in absurd: open-source Windows

Today’s interesting reading is certainly Stormy Peter’s post about hypothetically open-sourcing Windows, while I agree that the conclusion is that Windows is unlikely to get open sourced any time soon, I don’t sincerely agree on other points.

First of all, I get the impression that she’s suggesting that the only reason Linux exists is to be a Free replacement for WIndows, which is certainly not the case; even if Windows were open-source by nature, I’m sure we’d have Linux, and FreeBSD, and NetBSD, and OpenBSD, and so on so forth. The reason for this is that the whole architecture behind the system is different, and is designed to work for different use-cases. Maybe we wouldn’t have the Linux desktop as we know it by now, but I’m not sure of that either. Maybe the only project that would then not have been created, or that could be then absorbed back into Windows, would be ReactOS.

Then there is another problem: confusing Free Software and Open Source. Even if Microsoft open-sourced Windows, adopting the same code would likely not be possible even for projects like Wine and ReactOS that would be able to use it as it is, because the license might well be incompatible with the rest of them.

And by the way, most of the question could probably be answered by looking at how Apple open sourced big chunks of its operating system . While there is probably no point in even trying to get GNU/Darwin to work, the fact that Apple releases code for most of its basic operating system does provide useful insights for stuff like filesystem hacking and even SCSI MMC commands hacking, even just being able to read its sources. It also provides access to the actual software which for instance give you access to the fsck command for HFS+ volumes on Linux (I should update it by the way).

Or if you prefer, at how Sun created OpenSolaris, although one has to argue that in the latter case there is much more similarity with Linux and the rest of *BSD systems that it says very little about how a similar situation with Windows would turn out to be. And in both cases, people still pay for Solaris and Mac OS X.

In general, I think that if Microsoft were to open-source even just bits of its kernel and basic drivers, the main advantages would again come out of filesystem support (relatively, since the filesystems of FreeBSD, Sun Solaris, NetBSD and OpenBSD are really not that well supported by Linux already), and probably some ACPI support that might be lacking in Linux for now. It would be nice, though, if stuff like WMI would then be understandable.

But since we know already that open-sourcing Windows is something that is likely to happen in conjunction with Duke Nukem Forever release, all this is absolutely absurd and should not be thought too much about.

Fedora 9 and VirtualBox

For a job I have been hired to do now, I have the need to keep a virtual machine with the development environment. The reason for this is that there are a few quirks in that environment which caused me some headaches before.

As it’s not like the other virtual machine (on the laptop), requiring a perfect Windows support, I decided to go with VirtualBox again. Which by the way forces me to keep GCC 4.2 around. But oh well, that’s not important, is it?

The choice of distribution inside wasn’t difficult, I thought. I didn’t want to rebuild system on a vbox, no Gentoo. I avoided Debian and Debian-based, after the debacle about OpenSSL I won’t trust them easily. Yes I am picking on them because of that, it’s because it was a huge problem. While I found openSUSE nice to install on a computer last month, it didn’t suit well my needs I thought, so I decided to go with Fedora 9.

I used Fedora a couple of times before, it’s a nice development environment when I need something up quickly and cleanly. Unfortunately I found Fedora 9 a bit rough, more than I remembered.

I wasn’t planning on giving it Internet access at first, because of my strange network setup (I will make a schema as talking with Joshua made me think it’s pretty uncommon); but then the package manager refused to install me the kernel-devel package out of the DVD, it had to use the network. So I tried to configure the network with a different IP and netmask, but this didn’t work cleanly either, the network setting interface seemed to clash with NetworkManager. I admit I didn’t want to spend too much time looking for documentation, so I just created a “VBOX” entry on NetworkManager which I’m selecting at each boot.

After this was done, I updated all the packages as the update manager asked me to do, and tried to install the new kernel-devel. This was needed by the VirtualBox guest tools, which I wanted to install to have the “magic” mouse grab. But VirtualBox refused to do so, because Fedora 9 is shipping with a pre-release Xorg 1.5 that they don’t intend to support. Sigh.

I’m not blaming Fedora here. Lots of people blamed them for breaking compatibility with nVidia’s drivers, but they did give enough good reasons to use that particular Xorg version (I know I read a blog about this, but I don’t remember which Planet it was in, nor the title). What I’m surprised of is that VirtualBox, although being vastly opensourced, seems to keep the additions closed-sourced, which in turn cause a few problems.

Different Linux distributions have different setups, some might use different Xorg versions, other different kernel building methods, and I sincerely think the answer is not LSB. Interestingly, you can get the VMware mouse and video drivers directly from Xorg nowadays (although I admit I haven’t checked how well do they work), but you cannot get the VirtualBox equivalents.

If any Sun/ex-Innotek employee is reading me now, please consider freeing your guest additions! Then your problems with supporting different Linux distributions would be very much slimmed down: we could all package the drivers, so instead of having to connect the ISO image of the additions, mount it, install the kernel and Xorg development files, and compiling modules and drivers, the only required step would be for the distribution to identify VirtualBox like it was any other “standard” piece of real hardware.

I hope the serial device forwarding is working properly as I’ll need that too, and it did throw me a couple of errors since I started installing Fedora… I haven’t tried it yet though. I hope there are picocom packages for Fedora.