Code Validation and Reviews

During my “staycation” week I decided to spend some time looking at my various Python projects, trying to make them easier, nicer and cleaner to work with, both for contributors and for myself. I also decided to spend some time improving some of the modules I use, based on various lessons I learned from my own tools — and that got me to learn more about those tools.

First of all, thanks to Ben, glucometerutils has already been fully re-formatted with black and isort. And he set it up with pre-commit to make sure that new changes don’t break the formatting. This is awesome.

As I was recently discussing with Samuele on Twitter, it’s not that I always agree with black’s formatting choices; it’s that it takes the subjective argument about formatting off the table. Black does not always make the code look the way I would, and I could argue that I can do a better job than black. But it’s also true that it makes everybody’s code look the same, which is actually a great way to settle the matter.

This is something I definitely picked up in my dayjob — I have been part of the Python Readability program (you can find more about it in the Building Secure and Reliable Systems book, downloadable for free), and through that I have reviewed literally hundreds of changes coming from all different parts of Alphabet. While my early reviews had lots of comments on code formatting, once Python formatting tools became more reliable and widely used, the “personal load” of doing those reviews went down significantly, as I could finally focus on pointing out overly complex functions, mismatches between code and documentation, and so on. For everything else, my answer was unchanging: “let the formatter do its job, maybe just add a comma there as a suggestion to it.”

This is why I’m happy with black — not because I think its formatting is 100% what I would do, but because it gets close enough, and removes the whole argument, and uncertainty, around it. The same applies to isort as well.

While applying pretty much the same set of presubmits and formatting to python-pcapng, I also found out about flake8. This is another useful tool to reduce the work needed for reviews, and it can also be configured to run as part of the pre-commit hooks, making sure that violations are identified sooner rather than later. While the tool is designed to catch style guide violations, it also turned out to identify a few outright mistakes in glucometerutils. I’m now going to apply it throughout, whenever I can.

There are more checks I would actually want to integrate — today I was going through all the source files in glucometerutils to update the type annotations, since I dropped Python 3.6 support. As I went to do that I realised that one of the files created in a pull request I approved recently was actually missing licensing information. I have now added both license and copyright annotations as suggested by Matija (who is an actual lawyer, unlike me) — but I would love a pre-commit check that just ensured that all the files have a license and a copyright notice, and that they carry the expected license, for instance.

There are a few more trivial checks available in pre-commit that I may actually enable throughout: checks for trailing whitespace, and for missing newlines at the end of files. All of those are easily fixed, and the fixers do exactly that, which is also a great way to make the tests easier on newcomers and contributors: you don’t just get told “it’s wrong”, but also “let me fix that for you already”.
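
For reference, a minimal .pre-commit-config.yaml tying all of these together might look something like the following sketch; the revisions are placeholders (pin whatever is current when you set this up), and you should double-check the hook repository locations against their documentation:

repos:
  - repo: https://github.com/psf/black
    rev: 19.10b0  # placeholder revision
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 4.3.21  # placeholder revision
    hooks:
      - id: isort
  - repo: https://github.com/PyCQA/flake8
    rev: 3.7.9  # placeholder revision
    hooks:
      - id: flake8
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.5.0  # placeholder revision
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer

After a one-off pre-commit install in the checkout, the hooks run on every commit, and pre-commit run --all-files applies them to the whole tree.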

It’s not quite as encompassing as the bubble I’m used to, but it seems to be the closest I’m getting to it right now. Maybe I should just start building all the hooks that I feel I need, and see if someone else will adopt them afterwards. It seems to be a common thing to do after all.

Anyone who has written Gentoo ebuilds, by the way, has most likely recognized similarities with repoman, the tool used to validate ebuilds before submitting them to the tree. I think that’s possibly why I’m so interested in this: I do agree that tools like repoman are the way to go, and I have myself insisted that repoman be extended to cover more and more cases over time, as it stops divergence.

I honestly hope to get to a point where there’s no argument over whether a code change complies with the style or not — but rather leaving the enforcement (and the fixing) to computers, whenever it is possible. And that also means helping the computers make it possible, by being less picky about things that can be overlooked.

Success Story: Mergify, GitHub and Pre-Merge Checks

You may remember that when I complained about bubbles, one of the things I complained about was that I had no idea how to get continuous integration right. And this kept being a problem for me for a few projects where I do actually get contributions.

In particular, glucometerutils is a project that I don’t want to be “just mine” in the future. I am releasing it with a very permissive license, and I hope that others will continue contributing. But while I did manage to get Travis CI set up for it, I kept forgetting to run the checks myself before pushing, which is annoying.

One of the solutions that was proposed to me for that particular project was to use pre-commit, which is clearly a good starting point, but as the mypy integration shows, it’s not perfect: it requires you to duplicate quite a bit of information regarding dependencies. And honestly the problem is not so much whether things work on a per-commit basis, as whether they are fine on a per-push basis. Which often they haven’t been, for me.

On the other hand, pull requests coming from other users have been much less likely to break stuff, because Travis CI would tell me if something was wrong before I merged them. So I was basically looking for something that would put me through exactly the same level of checking, but at the same time would let me push (or merge in) my code as soon as integration passed.

While I was looking around for this, I found a blog post by Debian developer Julien Danjou about his company Mergify, which looked like pretty much exactly what I wanted: it allows me to say that if I either approved a pull request or made it myself, and the continuous integration reports no problems, the pull request should just be rebased into the master branch.
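
To give an idea of what that rule looks like (and only an idea: the condition names and the Travis status context below are from memory, so treat this as a hedged sketch rather than a copy-paste configuration), the .mergify.yml boils down to two rules, one for my own pull requests and one for those I approved:

pull_request_rules:
  - name: merge my own pull requests once CI passes
    conditions:
      - author=Flameeyes
      - status-success=Travis CI - Pull Request
    actions:
      merge:
        method: rebase
  - name: merge approved pull requests once CI passes
    conditions:
      - "#approved-reviews-by>=1"
      - status-success=Travis CI - Pull Request
    actions:
      merge:
        method: rebase

Conditions within a rule are ANDed together, while having two separate rules acts as the “or” between the two cases.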

The next problem was how to make it less cumbersome for me to keep developing the project, but thankfully Julien came through for that as well by introducing me to git-pull-request, although we had a bit of work to do for that, partly because I have been carrying the same advanced settings in my git configuration for the past few years, and partly because I’m lazy and don’t always capitalize the F in Flameeyes when I type my username. Hopefully all of that will be upstreamed by the time you read this blog post.

The end result of this? I moved glucometerutils to be part of the same organization as the Protocols repository (which is also using Mergify now), and instead of git push, I’m using git pull-request. If I didn’t break anything, it gets merged by the bot. If someone sends me a pull request, I just need to approve it, and once again the bot takes care of the rest.

I’ll look for ways to keep doing this for repositories that are not part of any organization, but at the very least this solved the issue for the two main repositories for which I have active contributors. And it reduces the risk of me being the single point of failure for those projects.

Also, this is a perfect example of why Randall Munroe is Wrong, for once, or twice. Automating the merges will definitely not save me more time than I spent just trying to get this to work. The fragment of time Julien and I spent figuring out why GitHub was throwing non-obvious validation errors will never be repaid by the time I save by not clicking on the pull request link after git push. But saving time is not the only thing automation is about.

In particular, this time automation is about fairness, consistency, and resiliency: while I’m still special in the Mergify configuration, I now go through the same integration tests as everyone else to merge into the repository, and it’s a bot doing the rebase-merge, rather than me, so it’s less likely to make mistakes.

Anyway, thank you Julien, thank you Mergify, and thank you all who contribute. Hopefully the next few months will be a bit more active for me, between the forced work from home and the new job.

Why is `git send-email` so awful in 2017?

I set out to send my first Linux kernel patch from my new Dell XPS13, after someone contacted me to ask for help supporting a new it87 chip in the gpio-it87 driver I originally contributed.

Writing the (trivial) patch was easy, since they had some access to the datasheet, but then came the problem of figuring out how to send it to the right mailing list. And that took me significantly more time than it should have, and significantly more time than writing the patch, too.

So why is it that git send-email is still so awful, in 2017?

So the first problem is that the only way you can send these emails is either through a sendmail-compatible interface, which is literally an interface older than me (by two years), or through SMTP directly (this is even older, as RFC 821 is from 1982 — but being a protocol, that I do expect). The SMTP client at least supports TLS, provided you have the right Perl modules installed, and authentication, though it does not support more complex forms of authentication such as Gmail’s XOAUTH2 protocol (ignore the fact that it says IMAP; it is meant to apply to both IMAP and SMTP).

Instead, the documented (in the man page) approach for users with Gmail and 2FA enabled – which should be anybody who wants to contribute to the Linux kernel! – is to request an app-specific password and save it through the credential store mechanism. Unfortunately the default credential store just saves it as unencrypted plaintext. There are, instead, a number of credential helpers you can use, based on GNOME Keyring or libsecret, and so on.
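
For the record, the setup that the man page nudges you towards ends up looking more or less like this; the address is a placeholder, and the last line assumes a git-credential-libsecret helper is actually built and installed (git only ships its source in contrib/), otherwise you are back to the plaintext store helper:

% git config --global sendemail.smtpserver smtp.gmail.com
% git config --global sendemail.smtpserverport 587
% git config --global sendemail.smtpencryption tls
% git config --global sendemail.smtpuser example.user@gmail.com
% git config --global credential.helper libsecret

The first time you run git send-email it will prompt for the app-specific password, and the configured credential helper will remember it from then on.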

Microsoft maintains and releases its own Credential Manager which is designed to support multi-factor login to a number of separate services, including GitHub and BitBucket. Thank you, Microsoft, although it appears to only be available for Windows, sigh!

Unfortunately it does not appear that there is a good credential helper for either KWallet or LastPass, which would have been interesting — to a point, of course. I would probably never give LastPass an app-specific password to my Google account, as it would defeat the point of not keeping that particular password in a password manager.

So I started looking around, and I found that there is a tool called keyring2 which supposedly has KWallet support, though on Arch Linux it does not appear to be working (the KWallet support, that is, not the tool, which appears to work fine with GNOME Keyring). So I checked out the issues: the defaulting to gnome-keyring is known, and there is a feature request for a LastPass backend. That sounds promising, right? Except that the author suggests building it as a separate library, which makes sense up to a point. Unfortunately the implicit reference to their keyrings.alt package (which does not appear to support KDE/Plasma) drove me away from the whole thing. Why?

License is indicated in the project metadata (typically one or more of the Trove classifiers). For more details, see this explanation.

And the explanation then says:

I acknowledge that there might be some subtle legal ramifications with not having the license text with the source code. I’ll happily revisit this issue at such a point that legal disputes become a real issue.

Which effectively reads to me as “I know what the right thing to do is, but it cramps my style and I don’t want to do it.” The fact that people have already pointed out the problem, and that multiple issues have been reported and then marked as duplicates of this master issue, should speak clearly enough.

In particular, if I wanted to contribute anything to these repositories, I would have no hope of doing so except in my free time, and only if I applied for a personal project request, as these projects are likely considered “No License” given the sheer lack of copyright information or license texts.

Now, I know I have not been the best at this either. But at least for glucometerutils I have made sure that each file lists its license clearly, and the license is spelt out in the README file too. And I will be correcting some of my past mistakes at some point soon, together with certain other mistakes.

But okay, so this is not a viable option. What else remains? Well, it turns out that there is an actual FreeDesktop.org specification, or at least a draft, which appears to have been last touched seven years ago, for a common credential-storage API shared between GNOME and KWallet, and for which there are a few other implementations already out there… but the current KWallet does not support it, and the replacement (KSecretService) appears to be stalled/gone/deprecated. And that effectively means you can’t use that either.

Now, on Gentoo I know I can use msmtp integrated with KWallet and the sendmail interface, but I’m not sure whether it would work correctly on Arch Linux. After all, I even found out that I needed to install a number of Perl modules manually, because they are not listed in the dependencies, and I don’t think I want to screw with PKGBUILD files if I can avoid it.

So at the end of the day, why is git send-email so awful? I guess the answer is that in so many years we still don’t have a half-decent, secure replacement for sending email. We need what they would now call “disruptive technology”, akin to how SSH killed Telnet, to bring up a decent way to send email, or at least submit Git patches to the Linux kernel. Sigh.

Update 2020-08-29: if you are reading this to try to make sense of how to use git send-email with Gmail or GSuite, you may want to instead turn to the sendgmail binary released in the gmail-oauth2-tools repository. It’s not great, particularly as the upstream maintainer has been very unresponsive, even when I was a co-worker, and it’s not the easiest thing to set up either (it needs you to have a Google Cloud account and enable the right API key), but it does work. If you feel like forking it, merging the requisite pull requests, and releasing it as its own application, please be my guest. I’m not using Gmail anymore myself, so…

Bad branching, or a Quagga call for help

You might remember that for a while I worked on getting quagga into shape in Gentoo. The reason I was doing that is that I needed quagga for the ADSL PCI modem I was using at home to work. Since right now I’m on the other side of the world, and my router decided to die, I’m probably going to stop maintaining Quagga altogether.

There are other reasons as well, which is probably why for a while we had a Quagga ebuild with invalid copyright headers (it was a contribution from somebody working somewhere, but over time it had been rewritten to the point that it didn’t really make sense not to use our standard copyright header). On one side there is the bad state of the documentation, which makes it very difficult to understand how to set up even the most obvious of situations; but the main issue is the way the Quagga project handles branching.

So let’s take a step back and note one thing about Quagga: when I picked it up, there were two or three external patches configured by USE flags; these are usually very old, and they are not included in the main Quagga sources. They are not minimal patches either: they introduce major new functionality, and they are very intrusive (which is why they are not simply always included). This is probably due to the fact that Quagga is designed to be the routing daemon for Linux, with a number of possible protocol frontends connecting to the same backend (zebra). Over time, instead of self-contained, easily outdated patches implementing new protocols, we started having whole new repositories (or at least branches) with said functionality, thanks to the move to GIT, which makes forking all too easy — even if that’s not always a bad thing.

So now you get all these repositories with extra implementations, not all of which are compatible with one another, and most of which are not supported by upstream. Is that enough trouble? Not quite. As I said before, Paul Jakma, who’s the main developer of the project, is of the idea that he doesn’t need a “stable” release, so he only makes releases when he cares to, and maintains that it’s the vendors’ task to keep backports. In that spirit, some people started the Release Engineering effort for Quagga, but…

When you think about a “Release Engineering” branch, you think of something akin to Greg’s stable kernel releases: you take the latest version, and then you patch over it to make sure that it works fine, backporting the fixes that hit master. Instead what happens here is that Quagga-RE forked off version 0.99.17 (we’re now at 0.99.21 on master, although Gentoo is still on .20 since I really can’t be bothered), and they are applying patches over that.

Okay, so that’s still something: getting the backports from master onto a known-good revision is a good idea, isn’t it? Yes, it would be a good idea, if it weren’t that… it’s actually new features being applied over the old version! If you check, you see that they have implemented a number of features in the RE branch which are not in master… with the result that master is neither a superset nor a subset of the RE branch.

Add to this that some of the contributors of new code don’t seem to have a clear idea of what a license is, causing discussions on the mailing list about the interpretation of the code’s license, and you can probably see why I don’t care about keeping this running, given that I’m not using it in production anywhere.

Beforehand I still cared about this, knowing that Alin was using it and co-maintaining it… but now that Alin has been retired, I’d be the sole maintainer of a piece of software that rarely works correctly, and is schizophrenic in its development, so I really don’t have extra time to spend on this.

So, to finish this post with a very clear message: if you use Gentoo and rely on Quagga working in production, please step up now, or it might just break without notice, as nobody’s caring for it! And if a Quagga maintainer reads this: please, please start making sense with your releases, I beg you.

BerliOS, and picking up “dead” projects

So, Tomáš also posted on the gentoo-dev mailing list about the BerliOS shutdown, which I had noted after forking unpaper, which is (or was) hosted on that platform as well.

And yes, I do note the irony that I’m the one talking about forks, after what I wrote on the subject — but there are times when a fork is indeed necessary, at least to continue development on a project. And I should probably consider unpaper more of a takeover than a fork, given that the original developer seems to be unreachable (he hasn’t answered my mail yet).

Of course nobody expected unpaper to be the only project hosted on BerliOS, nor the only one dead. Indeed, back in the days when the SF.net interface was obnoxious but still usable, BerliOS was considered a quite decent alternative, if only because it had Subversion support quite a bit before SourceForge supported anything other than CVS. Even I started not one, but two projects on BerliOS. One is the same unieject that I have now mostly abandoned and that is available on Gitorious; the other was an Ultima Online server emulator, which was, really, my first try at coordinating a Free Software project.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015, which then shut down the service. My projects previously hosted there are now hosted on GitHub.

Said project was started by me and a handful of friends — some of whom were players on the same unofficial “shard” as me, while another was a fellow developer on a similar piece of software (NoX-Wizard) — and was basically a from-scratch implementation, in what at the time I considered modern C++ (it might even have been, considering that we had just come out of the GCC 2.96 trouble). It was also my first encounter with Python used as a scripting environment within another piece of software. The code was originally developed by me in CVS; then it was moved to SVN in a local repository, then again to BerliOS… with the result that my commits actually showed up under a long series of names, d’oh!

Well, a couple of weeks ago I decided to import the code to GitHub — and with a bit of help from git svn I was also able to consolidate my commits under a single name (and those of another developer as well, not under mine of course). It’s impressive how straightforward it is to import a whole repository’s history nowadays. I remember going crazy doing the same thing at the time, when moving from CVS to SVN, and when importing the local SVN to BerliOS.
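
For the curious, the import itself is not much more than a git svn clone with an authors file that collapses the various old usernames into one identity. Repository and project names here are made up for the example, and authors.txt is just one “olduser = Real Name <email>” line per username:

% git svn clone --stdlayout --authors-file=authors.txt svn://svn.berlios.de/projectname projectname
% cd projectname
% git remote add github git@github.com:example/projectname.git
% git push github master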

This should actually be considered a starting point: the fact that it’s relatively trivial to import a repository’s history nowadays should make it much easier to preserve the most important repositories of BerliOS itself — I just wonder if there’s hope to save all the content on BerliOS. This becomes quite interesting when you note that it comes not long after the Kernel.org KO, which has seen a number of projects migrate here and there, including Linux-PAM, which now seems to be maintained on Fedora’s hardware (and in GIT, finally! — this means that the next time I have to patch Linux-PAM, which I hope will be far into the future, I’ll be able to provide proper backports in Gentoo infrastructure, like I do for other packages including quagga).

Changing times?

Revisiting my opinion of GitHub — How do you branch a readonly repository?

I have expressed quite a bit of discontent with GitHub before, regarding the way they keep suggesting that people “fork” projects. I’d like for once to state that while I find “fork” the wrong word, the idea is not too far from ideal.

Fast, reliable and free distributed SCMs have defined a new landscape in the world of Free Software, as many have probably noted already; but without the help of GitHub, Gitorious and BitBucket I wouldn’t expect them to have made such an impact. Why? Because hosting a repository is generally not an easy task for non-sysadmins, and finding where the original code is supposed to be is often not that easy either.

Of course, you cannot “rely on the cloud” to be the sole host for your content, especially if you’re a complex project like Gentoo, but it doesn’t hurt to be able to tell users “You want to make changes? Simply branch it here and we’ll take care of the management”, which is what those services enabled.

Why am I writing about this now? Well, it has happened more than once before that I needed to publish a branch of a package whose original repository was hosted on SourceForge or on the GNOME infrastructure, where you cannot branch it to make changes. To solve that problem I set up a system on my server to clone and mirror the repositories over to Gitorious; the automirror user is now tracking twelve repositories and copying them over to Gitorious every six hours.
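
The mirroring itself is nothing fancy; the cron job behind the automirror user does, for each tracked project, roughly the equivalent of the following (URLs are examples only):

% git clone --mirror git://git.gnome.org/someproject someproject.git
% cd someproject.git
% git remote add gitorious git@gitorious.org:automirror/someproject.git
% git push --mirror gitorious

Subsequent runs then just need a git fetch --prune origin followed by another git push --mirror gitorious.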

As it happens, last night I was hacking a bit at ALSA — mostly I was looking into applying what I wrote yesterday regarding hidden symbols to the ALSA plugins, as my script found duplicated symbols between the two Pulse plugins (one is the actual sound plugin, the other is the control one) — but I ended up doing general fixes to their build system, as it was slightly broken. I sent the patches to the mailing lists, but I wanted to have them available as a branch as well.

Well, now you have most of the ALSA project available on Gitorious for all your branching and editing needs. Isn’t Git so lovely?

Before finishing, though, I’d like to point out that there is one thing I’m not going to change my opinion on: the idea of “forking” projects is, in my opinion, very bad, as I wrote in the article linked at the top of the page. I actually like Gitorious’s “clones” better, as that’s what they should be: clones with branches, not forks. Unfortunately, as I wrote some other time, Gitorious is not that great when it comes to licensing constraints, so for some projects of mine I’ve been using GitHub instead. Both services are, in my opinion, equally valuable for the Free Software community, and not only for it.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015, which then shut down the service. This means that all the references above are now useless. Sorry.

Maintaining backports with GIT

I wrote last week about the good feeling of merging patches upstream – even though since then I don’t think I got anything else merged… well, besides the bti fixes that I sent Greg – so this week let’s start with the opposite problem: how can you handle backports sanely, and have a quick way to check what was merged upstream? Well, the answer, at least for software that is managed upstream with GIT, is quite easy for me.

Note: yes, this is a more comprehensive rehashing of what I posted last December, so if you’ve been following my blog for a long time you might not be extremely surprised by the content.

So let’s start with two ideas: branches and tags. For my system to work out properly, you need upstream to have tagged their releases properly; so if the foobar project just released version 1.2.3, we need to have a tag available that is called foobar-1.2.3, v1.2.3, or something along those lines. From that, we’ll start a new “scratch branch”; it is important to note that it’s a scratch branch, because that means it can be force-pushed and might require a complete new checkout to work properly. So we have something like the following:

% git clone git://git.foobar.floss/foobar.git
% cd foobar
% git checkout -b 1.2.3-gentoo v1.2.3

This gives us the 1.2.3-gentoo branch as the scratch branch, and we’ll see how that behaves in a moment. If upstream fails to provide tags you can also try to track down which exact commit a release corresponds to – it is tricky but not unfeasible – and replace v1.2.3 with the actual SHA hash of the commit or, even better as you’ll guess by the end of the post, tag it yourself.

The idea of using a scratch branch, rather than an actual “gentoo branch”, is mostly out of simplicity for me. Most of the time I make more than a couple of changes to a project when I’m packaging it – mostly because I find it easier to just fix minor autotools issues before they actually spread throughout the package, and to other packages as well – but only the actual fixes I want to apply to the packaged version belong on the scratch branch; cleanups, improvements and optimisations I send upstream and wait for the next release. I didn’t always do it this way, I admit… I changed my opinion when I started maintaining too many packages to follow all of them individually. For this reason I usually have either a personal or a “gentoo” branch where I make the changes meant for the master branch, which get sent upstream and merged, and a scratch branch to handle the patches. It also makes it no different to add a custom patch or a backport to a specific version (do note, I’ll try to use the word “backport” whenever possible, to stress the importance of getting the stuff merged upstream so that it will, hopefully, be present in the future).

So we know that in the upstream repository there have been a few commits to fix corner-case crashers that, incidentally, always seem to apply to Gentoo (don’t laugh, it happens more often than you would think). The commits have the short hashes 1111111 2222222 3333333 — I have no imagination for hashes, so sue me.

% git cherry-pick 1111111
% git cherry-pick 2222222
% git cherry-pick 3333333

Now you have a branch with three commits: cherry-picked copies (with different hashes) of the commits you need. At this point, what I usually do is tag the current state (in a few paragraphs you’ll understand why), so that we can get the data out properly; the way you name the tag depends vastly on how you will release the backport, so let me get to that right away.

The most common way to apply patches in Gentoo, for good or bad, is adding them to the files/ subdirectory of a package; to be honest this is my least preferred way unless they are really trivial stuff, because it means that the patches will be sent down the mirrors to all users, whether they use the software or not; also, given that you can use GIT for patch storage and versioning, it duplicates the effort. With GIT-stored patches, it’s usually easiest to create a files/${PV}/ subdirectory and store the patches there as exported by git format-patch — easy, yes; nice, nope: given that, as I’ll explain, you’ll be picking the patches again when a new version is released, they’ll always have different hashes, and thus the files will always differ, even if the patch itself is the same patch. This not only wastes time, it makes the files non-deduplicable, and it also gets around the duplicated-files check. D’oh!

A more intelligent way to handle these trivial patches is to use a single, combined patch; while patchutils has a way to combine patches, it’s not really smart; on the other hand GIT, like most other source control managers, can provide you with diffs between arbitrary points in the repository’s history… you can thus use git diff to export a combined, complete patch in a single file (though lacking history, attribution and explanation). This helps quite a lot when you have a few, or a number of, very small patches, one or two hunks each, that would cause too much overhead in the tree. Combining bigger patches this way can also work, but then you’re more likely to compress the result and upload it to the mirrors, or to some storage area, and add it to SRC_URI.

A third alternative, which also requires you to have a storage area for extra distfiles, is using a so-called “patchset tarball”, as a lot of packages already do. The downside of this is that if you have a release without any patch tarball at all, it becomes less trivial to deal with. At any rate, you can just put into a compressed tar archive the files created, once again, by git format-patch; if you add them in a subdirectory such as patches/ you can then use the epatch function from eutils.eclass to apply them sequentially, simply pointing it at the directory. You can then use the EPATCH_EXCLUDE variable to remove one patch without re-rolling the entire tarball.

Note: epatch itself was designed to use a slightly different patchset tarball format, which included specifying the architecture, or all to apply to all architectures. This was mostly because its first users were the toolchain-related packages, where architecture-dependent patches are very common. On the other hand, using conditional patches is usually discouraged, and mostly frowned upon, for the rest of the software. The reason being that it’s quite a bit more likely to make a mistake when conditionality is involved; and that’s nothing new, since it was the topic of an article I wrote over five years ago.

If you export the patches as multiple files in filesdir/, you’re not really going to have to think much about naming the tag; for the other two cases you have multiple options: tie the name to the ebuild release, tie it to the CVS revision indication, and so on. My personal preference is to use a single incremental, non-version-specific number for patch tarballs and patches, and to mix that with the upstream release version in the tag; in the example above, it would be 1.2.3-gentoo+1. This is, though, just a personal preference.
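
So, to tie this back to the tagging step from before: with this scheme, marking the state of the scratch branch is nothing more than the following (assuming the 1.2.3-gentoo branch is checked out):

% git tag 1.2.3-gentoo+1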

The reason is simple to explain, and I hope it makes sense to others besides me. If you tie it to the release of the ebuild (i.e. ${PF}), like the Ruby team did before, you end up in trouble when you want to add a build-change-only patch – take for instance the Berkeley DB 5.0 patch: it doesn’t change what is already installed on a system built with 4.8; it only allows building anew with 5.0; given that, bumping the release in the tree is going to waste users’ time – while using the CVS revision will create quite a few jumps (if you use the revision of the ebuild, that is), as many times you change the ebuild without changing the patches. Removing the indication of the upstream version is also useful, albeit rarely, when upstream does not merge any of your patches and you can simply reuse the same patchset tarball as the previous release; it’s something that comes in handy especially when security releases are made.

At this point, as a summary you can do something like this:

  • mkdir patches; pushd patches; git format-patch v1.2.3..; popd; tar jcf foobar-gentoo-1.tar.bz2 patches — gets you a patchset tarball with the patches (similarly you can prepare split patches to add to the tree);
  • git diff v1.2.3.. > foobar-gentoo-1.patch — creates the complete patch that you can either compress, upload to the mirrors, or (if very, very little) put directly in the tree.

Now, let’s say upstream releases version 1.2.4, and integrates one of our patches. Redoing the patches is quick with GIT as well.

% git fetch origin
% git checkout -b 1.2.4-gentoo 1.2.3-gentoo
% git rebase v1.2.4

If the changes are compatible, the patches will be re-applied just fine, updated so that they no longer apply with fuzz; any patch that was already applied upstream will count as “empty” and will simply be dropped from the branch. At that point, you can just reiterate the export as described above.

When pushing to the repository, remember to push the various gentoo branches explicitly, and make sure to push --tags as well. If you’re a Gentoo developer, you can host such a repository on git.overlays.gentoo.org (I host a few of them already: lxc, libvirt, quagga…); contributors, even if not developers, can probably ask for similar repositories to be hosted there.
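
In practice, since these are scratch branches that may be rewritten, the push ends up being something along these lines; “gentoo” here is just whatever name you gave to the hosted remote:

% git push --force gentoo 1.2.3-gentoo 1.2.4-gentoo
% git push --tags gentoo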

I hope this can help out other developers dealing with GIT-bound upstreams to ease their workload.

Backports, better access to

One of the tasks that distributions have to deal with on a daily basis is ensuring that the code they package is as free of bugs as humanly possible, without having to run after upstream continuously. The usual solution to this problem is to use the upstream-provided releases, but also to apply, on top of those, patches to fix issues and, most importantly, backports.

Backports are patches that upstream has already accepted and merged into the current development tree, applied over an older version (usually the latest released version). Handling these together with external patches tends to get tricky: on one side you need to track down, for each new release, which ones have already been merged (just checking which patches still apply doesn’t do the trick, since you would probably find non-merged patches that no longer apply because source files changed between the two versions); on the other, they often apply with fuzz, which has proven to be a liability with the latest GNU patch version.

Now, luckily, with time better tools have been created to handle patching: quilt is a very good one when you have to deal with generic packages, but even better than that is the git source control manager. When you have an upstream repository in git, you can clone it, create a new branch stemming from the latest release tag, and apply your patches, committing them right away exactly as they are. And thanks to the cherry-pick and cherry commands, handling backports (especially verifying whether they have been merged upstream) is a piece of cake.
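
As a sketch of what that verification looks like (reusing the foobar naming from the previous section), git cherry compares the scratch branch against the new upstream tag:

% git cherry -v v1.2.4 1.2.3-gentoo

Commits prefixed with a minus sign have an equivalent change already upstream and can be dropped; those prefixed with a plus still need to be carried over.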

It gets even better when you use the format-patch command to generate the patchset, since the patches will be ordered, and described, right away; the only thing it lacks is creating a patchset tarball right out of that, though it’s not overly difficult to do (I should probably write a script and publish it; the gist is sketched below). Add tags, and the ability to reset branches, and you can see how dealing with distribution patching via git is much nicer than it was before git came along.
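
The gist of such a script, hedged since I haven’t actually published one, would be little more than this, again with the foobar naming:

% git format-patch -o patches/ v1.2.3..1.2.3-gentoo
% tar jcf foobar-patches-1.tar.bz2 patches/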

But even this is of relative usefulness when you keep the stuff available only locally. So to solve this problem, and especially to remove myself as a single point of failure (given my background that’s quite something — even more so considering that lately I had to spend most of my time on paid work projects, as life suddenly, and sadly, became quite complicated), I’ve decided to prepare and push out branches.

So you can see there is one repository for lxc (thanks to Adrian and Sven for the backports), and one for libvirt (thanks to Daniel who, after daring me, went on to merge the patches I made to the schema; two out of three are in, and the last one is the only reason why nxml integration for libvirt files is not in yet).

Now, there is one other project whose patches I’d like to publish this way, and that project is iscsitarget; unfortunately upstream is still using Subversion, so I’ve been using a git-svn bridge, which is far from nice. On the other hand, my reason for publishing that is that I dropped iscsitarget! — which means that if you’ve been relying on it, from the next kernel release onward you’ll probably encounter build failures (I already fixed it to build with 2.6.32 beforehand, since I was using release candidates for a while). Myself, I’ve moved to sys-block/tgt (thanks to Alexey), which does not require an external kernel module, but rather uses the SCSI target module that Linux already provides out of the box.

Movin!!

For a while I have been quoting songs, anime and other media when choosing posts’ titles; then I stopped. Today, it looks perfectly fine to quote the title of one of the Bleach anime endings, by Takacha, since it suits what my post is about… just so you know.

So, since my blog experienced technical difficulties last week, as you might know, I want to move out of the current vserver (midas), which is thankfully sponsored by IOS for the xine project, to a different server that I can handle just for the blog and a couple more things. I’m now waiting for a few answers (from IOS to start with) to see where this blog is going to be deployed next (I’m looking for Gentoo Linux vservers again).

The main problem is that the big, expensive factor in all this is the traffic; midas is currently serving lots of it: this blog alone averages over 300 MB/day, which adds up to about 10 GB of traffic a month. But the big hits come from the git repositories, which means that a relatively easy way to cut down the traffic expense of the server is to move the repositories out.

For this reason I’ve migrated my overlay back onto Gentoo hardware (layman included), while Ruby-Elf is the first of my projects to be hosted at Gitorious (I’m going to add Autotools Mythbuster soon too).

As for why I decided to go with Gitorious over GitHub, the reasons are both technical and political. Technical, because I like the interface better; political, both for the AGPL3 license used by Gitorious and for the fact that it does not highlight the “fork it” method that GitHub seems to have built itself around. On the other hand, I actually had difficulties finding where to clone the unofficial PulseAudio repository to prepare my local copy, and the project interface does show the “Merge Requests” counter pretty well.

At any rate there will be some of my stuff available at github at the end of the day, mostly the things that started or are now maintained within github, like Typo itself (for which I have quite a few changes locally, both bug fixes and behaviour changes, that I’d like to get merged upstream soonish).

This starts to look like a good training for when I’ll actually move out of home too.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015, which then shut down the service. This not only means this post is now completely useless, but also that I gave up and joined the GitHub crowd, since that service “won the war”. Unfortunately some of my content from Gitorious has been lost, because I wasn’t good at keeping backups.

I still dislike github

When github was unveiled, I was a bit concerned with the idea that it seemed to foment forking software all over the place, instead of branching, with results that are, in my opinion, quite upsetting in the way some software is handled (see also this David Welton post, which is actually quite to the point – I don’t always bash Debian, you know, and at least with the Debian Ruby team I seem to often be on the same page). I was so concerned that I even wrote an article for LWN about forking and the problems it comes with.

Thinking about this, I should tell people to read that when they talk about the eglibc mess. And when I can find the time I should see about translating my old article about MySQL from Italian to English – and maybe I should change the articles page to link the articles directly in HTML form rather than just PDF and DVI.

At any rate, the “fork it” button is not what I’m going to blog about today, but rather what happened yesterday when I decided to update hpricot, which is now hosted strictly on github. Indeed there is no download page other than the one on github, which points at the tags of the git repository to download.

The idea that just tagging a release is enough to get it out there — no testing, no packaging, nothing else — is getting increasingly common. For Ruby stuff gems are prepared, but that’s it (and I think that github integrates enough logic to not even need that). It’s cool, isn’t it? No it’s not, not for distributions and not for security.

There is one very important feature for distributions in released code, and that is the verifiability of the release archives. It might be a bit too much to ask for all upstream projects to have a verifiable GnuPG signature and to sign all their releases, but at least making sure that a release tarball is always available, identical, to everybody who downloads it would be useful. I’ll let you guess that github does not do that, which is giving me headaches since it means I have to create the tarballs manually and push them to the Gentoo mirrors for them to be available (git archive makes it not too difficult, but it’s still more difficult than just fetching the release from upstream).
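
This is the kind of tarball I end up rolling by hand, sketched here with the hypothetical foobar naming used elsewhere in these posts, since the exact package doesn’t matter:

% git archive --prefix=foobar-1.2.3/ v1.2.3 | bzip2 > foobar-1.2.3.tar.bz2

The resulting file then still has to be uploaded to the Gentoo mirrors by hand before the ebuild can reference it.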

I wonder how it might be possible to explain to the Ruby community (because here it’s not just the Rails community, I’d say) that distributions are key to proper management, and not something to hinder at every turn.