Working in a bubble, contributing outside of it

The holiday season is usually a great time for personal projects, particularly for people like me who don’t go back “home” with “the family” (quotes needed, since for me home is where I am, London, and my family is me and my wife). Work tends to be more relaxed, even with the added pressure of completing the OKRs for the quarter and defining those for the next, and given that there is barely any public transport running, the time saved in commuting also adds up, making it an ideal time to work on hobbies.

Unfortunately, this year I’m feeling pretty useless on this front, and I thought this feeling of uselessness is at least something I can talk about for the dozen-or-so remaining readers of this blog, in an era of social media and YouTube videos. If this sounds very dismissive, it’s probably because that is the feeling of irrelevance that has taken over me, and something that I should probably aim to overcome in 2020, one way or another.

If you are reading this post, it’s likely that you noticed my FLOSS contributions waning and pretty much disappearing over the past few years, except for my work around glucometerutils and the usbmon-tools package (which kind of derives from it). I have contributed the odd patch to the Linux kernel, and more recently to some of the Python typing tooling, but those are really drive-by contributions, made as I found time for them.

Given some of the more recent Twitter threads on Google’s policies around open source contributions, you may wonder if it is related to that, and the answer is “not really”. Early on, I was granted IARC approval to keep working on unpaper (which turned out to be possibly overkill), on the aforementioned glucometerutils, and on some code I wrote while reverse engineering my gaming mouse. More recently, I’ve leveraged the simplified patching policy, and was granted approval to release both usbmon-tools and tanuga (although the latter is only released as a skeleton right now).

So I have all the options, and all the opportunities, to contribute to FLOSS projects while employed by a big multinational Internet company. Why don’t I do that more, then? I think the answer is that I work in a bubble for most of the day, and when I try to contribute something in my spare time, I find myself missing the support structure that the bubble gives me.

I want to make clear here that I’m not saying that everything is better in the bubble. Just that the bubble is soft and warm enough to make the world outside of it scary, sometimes annoying, but definitely more vast. And despite a number of sensible tools being available out there (in many cases, better tools), it takes a significant investment to research the right way to do something, to the point that I suffer from CBA (“can’t be arsed”) syndrome.

The basic concepts are not generally new: people have talked out loud at conferences about the monorepo, my friend Dinah McNutt spoke and wrote at length about Rapid, the release system we use internally and that drives the automated releases, and so on. If you’re even more interested in the topic, this March the book Software Engineering at Google will be released by O’Reilly. I have not read it myself, but I have interacted on and off with two of the curators and I’m sure it’s going to be worth its weight in gold.

Some of the tools are also being released, even if sometimes in modified forms. But even when they are, the amount of integration you have internally is lost when trying to use them outside. I have considered using Bazel for glucometerutils in the past, but in addition to being a fairly heavy dependency, there’s no easy way to reference most of the libraries that glucometerutils needs. At the end of the day, it was not worth using, even though it would have made my life easier by reducing the cognitive load of working on open source projects in my personal time.

Possibly the main “support beam” of the bubble, though, is the opinionated platform, which can be seen from the outside in the form of the style guides, but extends further. To keep the examples related to glucometerutils: while its tests do use absl’s parameterized class, they are written in a completely different style than I would use at work, and they feel wrong when it comes to importing the local copy of the module under test. When I looked around to figure out the best practice for writing tests in Python, I could find literally dozens of blog posts, StackOverflow answers, and testing framework docs that all gave slightly different answers. In the bubble you have (pretty much) one way to write a basic test, and while people can be creative even within those guidelines, creativity is usually frowned upon.
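To give an idea of what that one way looks like, here is a minimal sketch of an absl parameterized test; the convert_glucose helper is made up for this example, and is not glucometerutils’ actual code:

```python
# A minimal sketch of an absl parameterized test. The convert_glucose
# helper is made up for illustration; it is not glucometerutils code.
from absl.testing import absltest
from absl.testing import parameterized


def convert_glucose(value_mgdl):
    """Convert a glucose reading from mg/dL to mmol/L (approximate)."""
    return round(value_mgdl / 18.0, 1)


class ConvertGlucoseTest(parameterized.TestCase):

    # Each tuple is expanded into a separately-named test case.
    @parameterized.parameters(
        (90, 5.0),
        (126, 7.0),
        (180, 10.0),
    )
    def test_conversion(self, mgdl, expected_mmoll):
        self.assertEqual(convert_glucose(mgdl), expected_mmoll)


if __name__ == "__main__":
    absltest.main()
```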

The same is true for release engineering. As I noted and linked above, all of the release grunt work in the bubble is done by the Rapid tool, and for the most part it’s automated. While there’s definitely more than one way to configure the tool, at least you know which tool to use. And while different teams often have differing opinions on those configurations, you can at least find the opinion of your team, or of the closest team to you with an Opinion (with a capital O), and follow that. It might not be perfect for your use, but if it’s allowed, it usually means it was reviewed and vouched for (or copy-pasted from something else that was).

An inside joke in the Google bubble is that the documentation is always out of date and never to be trusted. Setting aside the unfairness of the joke to the great tech writers I’ve had the pleasure to work with, who are more than happy to make sure the documentation is not out of date (but need to know that’s the case, and most of them don’t find out until it’s too late), the truth is that at least we do have documentation for most processes and tools. The outside world has tons of documentation too, some of it out of date, and it’s very hard to tell whether any given piece is still correct and valid.

Trying to figure out how to configure a CI/CD tool for a Python project on GitHub (or worse, trying to figure out how to make it release valid packages on PyPI!) still feels like going by the early 2000s HOWTOs, where you hoped that the three-year-old description of the XFree86 configuration file still matched the implementation (hint: it never did). Lots of the tools are not easy to integrate, and opting into them takes energy (and sometimes money). The end result is that despite me releasing usbmon-tools nearly a year ago, you still need an unreleased dependency, as the fix I needed is not present in any released version, and I haven’t dared bother the author to ask for a new release yet.
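To be clear, the packaging side by itself is not much code; a minimal setup.py along these lines (all names and versions here are illustrative, not any of my projects’ actual configuration) is enough for setuptools to build a source distribution you can upload to PyPI. It’s choosing and wiring up the automation around it that drains the energy:

```python
# Minimal packaging sketch; the project name, version and dependency
# are made up for illustration, not any real project's configuration.
from setuptools import find_packages, setup

setup(
    name="example-tool",
    version="0.1.0",
    description="An example command-line tool.",
    packages=find_packages(),
    python_requires=">=3.7",
    install_requires=[
        "construct>=2.9",  # hypothetical dependency
    ],
    entry_points={
        "console_scripts": ["example-tool=example_tool.main:main"],
    },
)
```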

It’s very possible that if I were not working in a bubble, all of these issues wouldn’t be big unknowns. If I spent a couple of weeks reviewing the various options for CI/CD, I could probably come up with a good answer for setting up automated releases, and then I could go to the dependency’s author and say “Hey, can I set this up for you?”, and that would solve my problem. But that is time I don’t really have, when we’re talking about hobby projects. So I end up opening the editor in the Git repository I want to work on, add a dozen lines or so of code to something I want to do, figure out that I’m missing the tool, library, interface, opinion, document, or procedure that I need, feel drained, and close the editor without having committed, let alone pushed, anything.

How Flattr grew back for me

I wrote about Flattr more than a couple of times in the past. In particular, I’ve complained about the fact that its system makes it difficult for people not to take their money out, as Flattr takes a continuous 10% cut out of each person’s revenue monthly. Also, the revenue from Flattr, at least for me, has for a while been just a notch above that of Google’s AdSense, which does not require direct interaction from users to begin with.

But one of the things they started this year made it possible to increase significantly (well, depending on your habits) the amount of money that runs in the system. Socialvest is a very neat service that uses the various affiliate systems to gather funds for you, which you can then donate straight to a non-profit (including Flattr itself!); if you link it with your Flattr account, you’ll also see that money transferred to your Flattr funds, which you can then use to flattr others.

For the user it’s actually extremely simple: you install a browser extension, and then go around doing your online shopping as usual. Some websites will show a ribbon telling you that you can use Socialvest with them, in which case the extension injects the needed affiliate code into the order forms so that you get your “rebate”. Considering that Amazon has a 4% affiliate fee, it’s extremely interesting, as I do most of my shopping on Amazon (ThinkGeek should also be supported, but when I tried, it seemed like it didn’t work as intended, unfortunately). The nicest part is that it seems to work fine with gift cards as well.
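Mechanically, this kind of injection boils down to adding an affiliate tag to the shop’s URLs or forms. Here is a rough sketch of the idea for Amazon’s “tag” parameter; the tag value is made up, and the real extension surely does something more involved than rewriting a URL:

```python
# Rough sketch of affiliate-tag injection; the tag value is made up,
# and a real extension would hook the browser rather than rewrite URLs.
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

AFFILIATE_TAG = "example-21"  # hypothetical Amazon Associates tag


def add_affiliate_tag(url: str) -> str:
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["tag"] = AFFILIATE_TAG  # Amazon reads the affiliate ID from "tag"
    return urlunparse(parts._replace(query=urlencode(query)))


print(add_affiliate_tag("https://www.amazon.co.uk/dp/B000EXAMPLE"))
# https://www.amazon.co.uk/dp/B000EXAMPLE?tag=example-21
```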

Using Socialvest hasn’t really changed my spending habits, although it did change my preference in where to buy TV series and music, from Apple’s iTunes Store to Amazon’s stores. This was helped by me getting a Kindle Fire and Amazon releasing an Instant Video app for iPad, and now by the fact that Amazon launched the MP3 Store in Italy as well. Furthermore, it seems like Amazon’s J-Pop catalogue is quite a bit bigger than Apple’s, and that’s good news for me.

So go on: if you’re using Flattr, head to Socialvest to have more funds to flattr the content you care about. There’s nothing to lose, in my opinion.

Motivating users and teaching them good practices

I’m quite sure that one of the impressions stirred by my previous posts, and by my less-than-happy comments on Sunrise, is that I dislike or look down on user contributions. Quite the opposite: I actually think that user contributions are the heart and blood of Gentoo. But I have some particular views about them.

One of the most common approaches to user contributions in Gentoo when I joined was never to do the work in place of the users: get them to fix their ebuild, point out what needs to be worked on, and leave it to them to submit an updated ebuild. I shared this sentiment for a while, but nowadays I don’t think it’s the best approach. Since, as I already expressed, there is no good documentation for Gentoo development, it’s difficult for users to produce good ebuilds by themselves. And honestly, there is too much old cruft in the tree that just provides a bunch of bad examples.

This works only up to a point: those most interested in the inner workings of Gentoo might pick up the opportunity to improve their ebuilds; many others might just feel turned down and will stop caring about their ebuilds, or about Gentoo as a whole! This post by Stephen Gallagher points at something similar.

Now, most of us will agree that the situation where the user stops caring about the ebuild, or decides to keep it in his or her own overlay, is not good for Gentoo as a whole, but there are two approaches to solving this. The one that I openly dislike and attack, and that is followed by the one developer I kept criticising, is to take the ebuild, however “suboptimal”, and commit it to the tree as the user contributed it. I think it is easy to see how that can be a problem.

My favourite approach, instead, is to fix the ebuild: try it out, get it polished, and if needed contact upstream to fix possible issues. It might take extra work, even for something you don’t use or care much about, but it produces results. To take an example from the past months: Pavel contacted me about gearmand and drizzle; while I’m using neither, and I wouldn’t have had the time nor the interest to write ebuilds for them out of thin air, I gave him a few pointers, and he got back to me with updated ebuilds. They weren’t perfect, but they were a good starting point. I polished the ebuilds a bit further, sent upstream a few patches, and now Pavel is still maintaining them, and they seem quite fit for the job to me.

Saying that Pavel’s original ebuilds weren’t fit for the tree, and weren’t good enough, is not the same as insulting Pavel. I’m quite sure he agrees that they weren’t perfect; I’m also quite sure that his next ebuilds “out of thin air” will be better, since he has been able to see how some of the things were to be done. That is something that, lacking good documentation, is impossible for users to do without developers’ help.

Sometimes it’s also easier to go upstream and fix some of the troublesome build systems before going further with the ebuild, as Enrico will probably remember from the Gource work. But again, you shouldn’t confuse showing users (and other developers, or developers from other projects) how to do the right thing with criticising them.

*The perfect is the enemy of the done.*

How much the tinderbox is worth

After some of the comments on the previous post explaining the tinderbox, I’ve been thinking more about the idea of moving it to a dedicated server, or, even better, making it an ensemble of dedicated servers, maybe trying to tackle more than just one architecture, as user99 suggested.

This brings up the problem of costs, and of the worth of the effort itself. Let’s start from the specifics: right now the tinderbox runs on Yamato, my main workstation, a dual quad-core Opteron (eight 2.0GHz cores in total) with 16GB of registered ECC RAM and over 2TB of storage, connected to the net through my own DSL line, which is neither that stable nor that fast. As I said, the main bottleneck is the I/O rather than the CPUs, although when packages have proper parallel build systems, the multiple CPUs work quite well. Not all the resources are dedicated to the tinderbox effort as things stand: storage space especially is only partially given over to the tinderbox, as it doesn’t need that much. I use this box for other things besides tinderboxing, some related to Gentoo, others to Free Software in general, and others still to my work or my leisure.

That said, looking through the offers of OVH (which is what Mauro suggested to me before, and which seems to have the friendliest staff and good prices), a dedicated server with more or less the same specs as Yamato costs around €1800/year. It’s definitely too much for me to invest in the tinderbox project alone, but it’s not absolutely too much (buying the hardware would cost me more, and this outsources all the hardware maintenance problems). Considering two boxes, so that the out-of-tinderbox resources could also be shared (one box to hold the distfiles and the tree, the other to deal with the logs), it would be €3600/year, plus the ability to deal with both x86 and amd64 problems. Again, too much for me alone, but not absolutely too much.

Let me consider how resources are actually used right now, one by one.

The on-disk space used by the tinderbox is relatively difficult to evaluate properly: I use separate but shared partitions for the distfiles, the build directories and the installed system. For what concerns the distfiles and the tree, I could get some partial results from the mirrors’ statistics; right now I can tell you that 127GB is the size of the distfiles directory on Yamato. The installed system is 73GB (right now), while the space needed for the build directories never went over 80GB (which is the maximum size of the partition I use), with the nasty exception of qemu, which fills the directory entirely. So in general, it doesn’t need that much hard disk space.

Network traffic is not much of a problem either, I’d say: besides the first round of fetching all the needed distfiles, I don’t usually fetch more than 1GB per sync (unless big stuff like a new KDE4 release is handled). This would also be made largely moot if the internal network had its own Gentoo mirror (I’m not sure whether that’s the case for OVH, but I’ll get to that later).

So the only real demands are CPU and I/O usage, which is what a dedicated server is all about, so no problem there I guess. Whoever ended up hosting the (let’s assume) two tinderboxes would only have to mind the inter-box traffic, which is usually also not a problem if they are physically on the same network. I looked into OVH because that’s what was suggested to me; I also checked out the prices for Bytemark, which is already a Gentoo sponsor, but at least their price to the public is in another league entirely. Ideally, given that it’s going to be me investing the man-hours to run the tinderbox, I’d like the boxes to be located in Europe rather than in America, where as far as I know most of Gentoo’s current infrastructure is. If you have any other possible option you can share, I’d very much like to compare, first of all, the public prices of various providers, given a configuration in this range: LXC-compatible kernel, two quad-core CPUs with large cache, 16GB RAM minimum, 500GB RAID1 storage (backup is not necessary).

Now, I said that I cannot afford to pay for even one single dedicated server for the tinderbox, so why am I pondering this? Well, as many asked before, “Why is this not on official Gentoo infra?” is a question I’m not sure how to answer; last I knew, infra wasn’t interested in this kind of work. On the other hand, even if it’s not proper infra, I’d very much like to have some numbers to propose to the Gentoo Foundation so it could pay for the effort. This would allow extending the reach of the tinderbox, without having me pray for help every other day (I would most likely not stop using Yamato for tinderboxing, but two more instances would probably help).

Also, even if the Foundation didn’t directly have the kind of money to sustain this for a long period, it might still be better to have it pay for it, sustained by users’ donations. I cannot get that kind of money clearing through my books directly, but the Gentoo Foundation might help with that.

So it is important, in my opinion, to have a clear, objective figure for the kind of money this would cost. It would also help to have some kind of status indicator: “tinderboxes covered to run for X months, keep them running”.

And before somebody wonders: this is all my own crazy idea; I haven’t even tried to talk with the Foundation yet, and I’ll do so once I can at least present some data to them.

About patches and contributions

In my last post I mentioned that users of packages lacking a maintainer in Portage should submit patches for the open bugs; while this is all good and fine, it is also true that there are often bugs with patches that stay there to rot. I wish to point out some things here that might not be obvious to users and developers alike.

The first point I want to make is that the problem is not limited to user-submitted bugs and patches; bugs and patches submitted by developers can follow the same road too, waiting for months, if not years, before they are accepted and merged. The problems here are many, some technical, some social, and some difficult to fit into a single category.

The biggest technical problem I can find is that there is no easy way to identify, with a simple search, the bugs that have patches waiting for review. Modifying the Bugzilla workflow is probably too complex to be worth it, but there is an easy way around this, although it requires more coordination between reporters and assignees: using the “Status whiteboard” field to write “Patch waiting review” or something like that. The status whiteboard appears in searches by default, and would be useful to signal stuff like this (I used it to ask developers to flag for me particular cases where gnuconfig_update couldn’t be removed).
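As a sketch of how that would pay off, here is a rough example of querying for such bugs through buglist.cgi’s CSV export; the whiteboard string is just the convention proposed above, and the exact CSV column names vary between Bugzilla versions, hence the defensive lookups:

```python
# Rough sketch: list bugs whose status whiteboard mentions a patch
# waiting for review, using buglist.cgi's CSV export. The whiteboard
# string is the convention proposed above; column names vary between
# Bugzilla versions.
import csv
import io
import urllib.parse
import urllib.request

BUGLIST = "https://bugs.gentoo.org/buglist.cgi"

params = urllib.parse.urlencode({
    "status_whiteboard_type": "allwordssubstr",
    "status_whiteboard": "Patch waiting review",
    "ctype": "csv",  # machine-readable output instead of HTML
})

with urllib.request.urlopen(f"{BUGLIST}?{params}") as response:
    reader = csv.DictReader(io.TextIOWrapper(response, encoding="utf-8"))
    for row in reader:
        summary = row.get("short_desc") or row.get("short_short_desc", "")
        print(row.get("bug_id"), summary)
```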

Another technical problem is that a developer might be interested in fixing bugs with patches for a particular package that lacks a maintainer, without becoming its maintainer; or a user might like to submit patches for new bugs about a particular package. Adding themselves to the maintainer-needed alias, or watching it, is most likely going to throw at them a huge amount of bug mail they are absolutely not interested in; I’m sure I sent some hundreds of bugs the m-n way in the last month (I’ll have fun watching the bug statistics in the next GMN), and I’m sure most people wouldn’t care about all those bugs.

The feasible way to handle this in the current infrastructure would be to set up bug whining, filtering by package name in the subject, but that’s not the nicest option I guess, although it is doable without changing anything. Another idea I had would require a huge effort from our infrastructure team and might not really be that feasible: creating multiple package-name aliases; basically creating a packages.g.o domain, and then creating dynamic aliasing for addresses like app-arch+libarchive@p.g.o, so that mail would be redirected to the maintainer (developer or team) by looking at the metadata. This is of course a very complex solution, requires technical changes, and might well be quite worse in other respects: for instance, it would be a huge mess to search for all the bugs for a given herd, and it would increase the amount of spam that each alias receives in a HUGE manner. On the other hand, the domain could be set up to only receive mail through Bugzilla, with the addresses being otherwise invalid.
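The alias resolution itself would mostly be a matter of reading the package’s metadata.xml. Here is a rough sketch of the idea; the address format and the path are just the hypothetical scheme described above, not anything that exists:

```python
# Rough sketch of the hypothetical alias resolution described above:
# map the local part "app-arch+libarchive" to the maintainers listed
# in the package's metadata.xml. Path and address format are made up.
import xml.etree.ElementTree as ElementTree

PORTDIR = "/usr/portage"


def maintainers_for_alias(local_part):
    """Return the maintainer e-mail addresses for a category+package alias."""
    category, _, package = local_part.partition("+")
    metadata_path = f"{PORTDIR}/{category}/{package}/metadata.xml"
    tree = ElementTree.parse(metadata_path)
    # metadata.xml carries one <email> element per <maintainer> entry.
    return [
        email.text
        for email in tree.findall("./maintainer/email")
        if email.text
    ]


print(maintainers_for_alias("app-arch+libarchive"))
```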

There are then some social problems: for instance, Gentoo developers are volunteers, so you can’t force them to accept patches, or to improve submitted patches that are not seen fit for merging into the tree. This means that if you want your patch to be merged, you need to make sure you have a good case for it, and that you improve it as much as is needed for it to be merged, which might take a considerable amount of time. Of course not all users are programmers, but you cannot expect all developers to take charge of improving patches indefinitely until they can be merged, if they have no need for such patches. You can always try to find some person interested in helping out who can help you improve the patch; in the worst case, you can resort to paying someone to fix your patch up.

Also, most developers are likely swamped and might require some prodding to remember to take care of patches, and this is probably the heaviest showstopper; the only way to fix that is to join Gentoo and start improving the situation. Of course that requires training; it’s not like we can accept every user as a developer, as there are standards to consider. Yes, I know there are some developers who don’t always live up to those standards, but that’s a problem in itself. I’m not saying that new developers should be perfect before joining, but they have to be open to criticism and ready to learn; “works for me, it’s your problem” is not a good answer.

Does anybody have better ideas on how to fix these problems?