2010-09-07

Maintaining backports with GIT

Last week I wrote about the good feeling of getting patches merged upstream, even though I don’t think I have had anything else merged since then (well, besides the bti fixes that I sent Greg). This week, let’s start with the opposite problem: how can you handle backports sanely, and have a quick way to check what was merged upstream? Well, at least for software that is managed upstream with GIT, the answer is quite easy as far as I’m concerned.

Note: yes, this is a more comprehensive rehash of what I posted last December, so if you’ve been following my blog for a long time you might not be extremely surprised by the content.

So let’s start with two ideas: branches and tags. For my system to work properly, you need upstream to have tagged their releases properly; so if the foobar project just released version 1.2.3, we need to have a tag available that is called foobar-1.2.3, v1.2.3, or something along these lines. From that tag, we’ll start a new “scratch branch”; it is important to note that it’s a scratch branch, because it means that it can be force-pushed, and anyone else tracking it might need a completely new checkout for it to work properly. So we have something like the following:

% git clone git://git.foobar.floss/foobar.git
% cd foobar
% git checkout -b 1.2.3-gentoo v1.2.3

This gives us the 1.2.3-gentoo branch as the scratch branch, and we’ll see how that behaves in a moment. Note that tags are not namespaced under the remote, so the tag is referenced as v1.2.3 rather than origin/v1.2.3. If upstream fails to provide tags you can also try to track down which exact commit a release corresponds to (it is tricky but not infeasible) and replace v1.2.3 with the actual SHA hash of the commit or, even better as you’ll guess by the end of the post, tag it yourself.
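As a minimal sketch of the "tag it yourself" case, the script below builds a throwaway repository standing in for upstream, tags the release commit after the fact, and starts the scratch branch from that tag. The repository, file and tag names are all made up for illustration, and the identity variables are only there so the commits work on a machine without a configured GIT user.

```shell
#!/bin/sh
# Sketch: tag an untagged release commit, then branch from the tag.
# Everything here (foobar, README, v1.2.3) is a hypothetical stand-in.
set -eu

export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

work=$(mktemp -d)
git init -q "$work/foobar"
cd "$work/foobar"
git symbolic-ref HEAD refs/heads/master

# The commit that corresponds to the 1.2.3 release...
echo "release 1.2.3" > README
git add README
git commit -q -m "Release 1.2.3"
release_commit=$(git rev-parse HEAD)

# ...followed by later development on master.
echo "development" >> README
git commit -q -a -m "Post-release work"

# Upstream did not tag the release, so tag the commit we tracked down.
git tag -a v1.2.3 -m "foobar 1.2.3 (tagged downstream)" "$release_commit"

# Start the scratch branch from the tag.
git checkout -q -b 1.2.3-gentoo v1.2.3

current_branch=$(git symbolic-ref --short HEAD)
scratch_base=$(git rev-parse HEAD)
echo "on $current_branch at $scratch_base"
```

A local tag like this is not shared until you push it, so it only helps other people if it ends up in the repository where the scratch branches are published.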

Using a scratch branch, rather than an actual “gentoo branch”, is mostly a matter of simplicity for me. Most of the time I make more than a couple of changes to a project if I’m packaging it, mostly because I find it easier to fix possible minor autotools issues before they spread throughout the package and to other packages as well; but only the actual fixes are what I want to apply to the packaged version. Cleanups, improvements and optimisations I send upstream, and then I wait for the next release. I didn’t always do it this way, I admit; I changed my opinion when I started maintaining too many packages to follow all of them individually. For this reason I usually have either a personal or a “gentoo” branch where I make changes meant for the master branch, which get sent upstream and merged, and a scratch branch to handle patches. This also means that adding a custom patch is no different from adding a backport to a specific version (do note, I’ll try to use the word “backport” whenever possible to stress the importance of getting the stuff merged upstream so that it will, hopefully, be present in the future).

So we know that in the upstream repository there have been a few commits to fix corner-case crashers that, incidentally, seem to always apply to Gentoo (don’t laugh, it happens more often than you might think). The commits have the short hashes 1111111, 2222222 and 3333333; I have no imagination for hashes, so sue me.

% git cherry-pick 1111111
% git cherry-pick 2222222
% git cherry-pick 3333333

Now you have a branch with three commits, cherry-picked copies (with different hashes) of the commits you need. At this point, what I usually do is tag the current state (and in a few paragraphs you’ll understand why), so that we can get the data out properly. How you name the tag depends heavily on how you will release the backport, so I’ll get to that right away.
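To make the "copies with different hashes" point concrete, here is a small sketch in a throwaway repository: one upstream fix is cherry-picked onto the scratch branch, and git patch-id is used to show that the two commits carry the same change even though their hashes differ. The history is invented; the -x flag (which records the original hash in the commit message) and the tag name, which follows the scheme suggested later in this post, are choices made for the example rather than requirements.

```shell
#!/bin/sh
# Sketch: cherry-pick an upstream fix onto a scratch branch and tag the result.
# Repository contents and names are hypothetical.
set -eu

export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

work=$(mktemp -d)
git init -q "$work/foobar"
cd "$work/foobar"
git symbolic-ref HEAD refs/heads/master

printf 'one\ntwo\nthree\n' > code.txt
git add code.txt
git commit -q -m "Release 1.2.3"
git tag v1.2.3

# Upstream keeps working on master after the release...
echo "news" > NEWS
git add NEWS
git commit -q -m "Unrelated upstream work"

# ...and eventually lands the fix we want to backport.
printf 'one\ntwo fixed\nthree\n' > code.txt
git commit -q -a -m "Fix corner-case crash"
fix=$(git rev-parse HEAD)

# Scratch branch from the release tag, with the fix picked on top.
git checkout -q -b 1.2.3-gentoo v1.2.3
git cherry-pick -x "$fix" >/dev/null
backport=$(git rev-parse HEAD)

# Different commit hashes, same change: patch IDs only look at the diff.
fix_id=$(git show "$fix" | git patch-id | cut -d' ' -f1)
backport_id=$(git show "$backport" | git patch-id | cut -d' ' -f1)

# Record the state that is about to be exported.
git tag -a "1.2.3-gentoo+1" -m "foobar 1.2.3, Gentoo patchset 1"

echo "upstream $fix / backport $backport / patch-id $backport_id"
```

The matching patch IDs are what GIT relies on later to recognise that a backport has already been merged upstream.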

The most common way to apply patches in Gentoo, for good or bad, is adding them to the files/ subdirectory of a package. To be honest this is my least preferred way unless the patches are really trivial, because it means that they will be sent down the mirrors to all users, no matter whether they use the software or not; also, given that you can use GIT for patch storage and versioning, it duplicates the effort. With GIT-stored patches, it’s usually easiest to create a files/${PV}/ subdirectory and store there the patches as exported by git format-patch. Easy, yes; nice, no: given that, as I’ll explain, you’ll be picking the patches again when a new version is released, they’ll always have different hashes, and thus the files will always differ, even if the patch itself is the same. This not only wastes time, it also makes the files impossible to deduplicate and gets around the duplicated-files check. D’oh!

A more intelligent way to handle these trivial patches is to use a single, combined patch. While patchutils has a way to combine patches, it’s not really smart; on the other hand GIT, like most other source control managers, can provide you with diffs between arbitrary points in the repository’s history. You can thus use git diff to export a combined, complete patch in a single file (though lacking history, attribution and explanation). This helps quite a lot when you have a handful of very small patches, one or two hunks each, that would cause too much overhead in the tree. Combining bigger patches this way can also work, but then you’re more likely to compress the result and upload it to the mirrors, or to some storage area, and add it to SRC_URI.

A third alternative, which also requires you to have a storage area for extra distfiles, is using a so-called “patchset tarball”, as a lot of packages already do. The downside is that if you have a release without any patch tarball at all, it becomes less trivial to deal with. At any rate, you can just put the files created, once again, by git format-patch in a compressed tar archive; if you add them in a subdirectory such as patches/ you can then use the epatch function from eutils.eclass to apply them sequentially, simply by pointing it at the directory. You can then use the EPATCH_EXCLUDE variable to skip one patch without re-rolling the entire tarball.

Note: epatch itself was designed for a slightly different patchset tarball format, in which each patch name specifies the architecture it applies to, or all to apply to all architectures. This was mostly because its first users were the toolchain-related packages, where architecture-dependent patches are very common. On the other hand, using conditional patches is usually discouraged, and mostly frowned upon, for the rest of the software, the reason being that you’re much more likely to make a mistake when conditionality is involved; and that’s nothing new, since it was the topic of an article I wrote over five years ago.

If you export the patches as multiple files in files/, you’re not really going to have to think much about naming the tag; in both other cases you have multiple options: tie the name to the ebuild release, tie it to the CVS revision of the ebuild, and so on. My preferred choice is to use a single incremental, non-version-specific number for patch tarballs and patches, and to combine that with the upstream release version in the tag; in the example above, it would be 1.2.3-gentoo+1. This is, though, just a personal preference.

The reason is simple to explain, and I hope it makes sense to others besides me. If you tie the name to the release of the ebuild (i.e. ${PF}), like the Ruby team did before, you end up in trouble when you want to add a build-change-only patch. Take for instance the Berkeley DB 5.0 patch: it doesn’t change what is already installed on a system built with 4.8; it only allows building anew with 5.0. Given that, bumping the release in the tree is going to waste users’ time. Using the CVS revision, on the other hand, will create quite a few jumps (if you use the revision of the ebuild, that is), since many times you change the ebuild without changing the patches. Leaving the upstream version out of the name is also useful, albeit rarely, when upstream does not merge any of your patches and you can simply reuse the same patchset tarball as for the previous release; it comes in handy especially when security releases are done.

At this point, as a summary, you can do something like this:

% mkdir patches; pushd patches; git format-patch v1.2.3..; popd; tar jcf foobar-gentoo-1.tar.bz2 patches

gets you a patchset tarball with the patches (similarly you can prepare split patches to add to the tree);

% git diff v1.2.3.. > foobar-gentoo-1.patch

creates the complete patch that you can compress, upload to the mirrors or (if it is very small) put in the tree.
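The two exports can be sketched end to end in a throwaway repository. This version is a variation rather than a copy of the commands above: it writes the split patches with format-patch's -o option instead of pushd/popd, it uses gzip instead of bzip2 purely so that the sketch does not depend on bzip2 being installed, and it checks that the combined patch applies to a pristine v1.2.3 tree. All file and repository names are hypothetical.

```shell
#!/bin/sh
# Sketch: export a scratch branch as a patchset tarball and as one combined patch.
set -eu

export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

work=$(mktemp -d)
git init -q "$work/foobar"
cd "$work/foobar"
git symbolic-ref HEAD refs/heads/master

printf 'one\ntwo\nthree\n' > code.txt
git add code.txt
git commit -q -m "Release 1.2.3"
git tag v1.2.3

# Two patches on the scratch branch.
git checkout -q -b 1.2.3-gentoo v1.2.3
printf 'one\ntwo fixed\nthree\n' > code.txt
git commit -q -a -m "Backport: fix corner-case crash"
echo "tweak" > build.txt
git add build.txt
git commit -q -m "Backport: build fix"

# Split patches, one file per commit, written outside the working tree.
out="$work/out"
mkdir -p "$out/patches"
git format-patch -q -o "$out/patches" v1.2.3..
patch_count=$(ls "$out/patches" | wc -l | tr -d ' ')

# Patchset tarball containing the patches/ directory.
tar -czf "$out/foobar-gentoo-1.tar.gz" -C "$out" patches

# Single combined patch, without per-commit history.
git diff v1.2.3.. > "$out/foobar-gentoo-1.patch"

# Sanity check: the combined patch applies to a pristine 1.2.3 tree.
git checkout -q --detach v1.2.3
git apply --check "$out/foobar-gentoo-1.patch"

echo "exported $patch_count patches to $out"
```

Writing the output outside the working tree keeps the exported files from showing up as untracked files on the scratch branch.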

Now, let’s say upstream releases version 1.2.4, and integrates one of our patches. Redoing the patches is quick with GIT as well.

% git checkout -b 1.2.4-gentoo
% git rebase v1.2.4

If the changes are compatible, the new patches will apply just fine, and will be refreshed so that they no longer apply with fuzz; any patch that was already applied upstream will count as “empty” and will simply be dropped from the branch. At that point, you can just repeat the export as described above.
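The behaviour described here can be tried out safely in a throwaway repository. The sketch below invents a small history in which upstream merges one of two patches before releasing 1.2.4; it first uses git cherry as one possible quick check of what was merged upstream (a "-" line means an equivalent change is already in the new release, a "+" line means the patch is still pending), then performs the rebase. File names and commit messages are made up.

```shell
#!/bin/sh
# Sketch: one backport that upstream later merged, plus one local-only patch,
# moved from v1.2.3 to v1.2.4. The whole history is hypothetical.
set -eu

export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

work=$(mktemp -d)
git init -q "$work/foobar"
cd "$work/foobar"
git symbolic-ref HEAD refs/heads/master

# Upstream history: release 1.2.3, some work, a fix, then release 1.2.4.
printf 'one\ntwo\nthree\n' > code.txt
echo "1.2.3" > VERSION
git add code.txt VERSION
git commit -q -m "Release 1.2.3"
git tag v1.2.3

echo "news" > NEWS
git add NEWS
git commit -q -m "Unrelated upstream work"

printf 'one\ntwo fixed\nthree\n' > code.txt
git commit -q -a -m "Fix corner-case crash"
fix=$(git rev-parse HEAD)

echo "1.2.4" > VERSION
git commit -q -a -m "Release 1.2.4"
git tag v1.2.4

# Scratch branch for 1.2.3: the backported fix plus a local-only patch.
git checkout -q -b 1.2.3-gentoo v1.2.3
git cherry-pick "$fix" >/dev/null
echo "gentoo tweak" > build.txt
git add build.txt
git commit -q -m "Local build fix"

# Quick check of what upstream already has ("-") and what is pending ("+").
merged=$(git cherry v1.2.4 1.2.3-gentoo | grep -c '^-' || true)
pending=$(git cherry v1.2.4 1.2.3-gentoo | grep -c '^+' || true)

# New scratch branch for 1.2.4; the already-merged backport is dropped.
git checkout -q -b 1.2.4-gentoo 1.2.3-gentoo
git rebase -q v1.2.4
remaining=$(git rev-list --count v1.2.4..HEAD)

echo "merged upstream: $merged, pending: $pending, left after rebase: $remaining"
```

Creating 1.2.4-gentoo explicitly from 1.2.3-gentoo leaves the old scratch branch untouched, so the 1.2.3 patchset can still be regenerated if needed.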

When pushing to the repository, remember to explicitly push the various gentoo branches, and make sure to push --tags as well. If you’re a Gentoo developer, you can host such a repository on git.overlays.gentoo.org (I host a few of them already: lxc, libvirt, quagga …); contributors who are not developers can probably ask for similar repositories to be hosted there.
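As a rough sketch of the publishing step, the script below pushes a scratch branch and the tags to a local bare repository that stands in for the real hosting, then rewrites the branch and force-pushes it, which is the situation the "scratch branch" warning is about. The bare repository path and the remote name origin are stand-ins; a real setup would use whatever remote points at your hosted repository.

```shell
#!/bin/sh
# Sketch: push a scratch branch plus tags, then force-push after a rewrite.
# hosting.git is a hypothetical local stand-in for the hosted repository.
set -eu

export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

work=$(mktemp -d)
git init -q --bare "$work/hosting.git"
git init -q "$work/foobar"
cd "$work/foobar"
git symbolic-ref HEAD refs/heads/master

echo "release 1.2.3" > README
git add README
git commit -q -m "Release 1.2.3"
git tag v1.2.3

git checkout -q -b 1.2.3-gentoo v1.2.3
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Backport fix"
git tag "1.2.3-gentoo+1"

git remote add origin "$work/hosting.git"

# Branches are named explicitly; --tags adds everything under refs/tags.
git push -q --tags origin 1.2.3-gentoo

# Scratch branches get rewritten, so later pushes need --force.
git commit -q --amend -m "Backport fix (reworded)"
git push -q --force origin 1.2.3-gentoo

local_branch=$(git rev-parse 1.2.3-gentoo)
remote_branch=$(git ls-remote origin refs/heads/1.2.3-gentoo | cut -f1)
remote_tags=$(git ls-remote --tags origin | wc -l | tr -d ' ')

echo "remote branch at $remote_branch, $remote_tags tags published"
```

The tag keeps pointing at the commit that was exported even after the branch is rewritten, which is why the tags need to be published along with the branches.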

I hope this can help other developers dealing with GIT-bound upstreams to lighten their load.

Flameeyes 2845 posts

Comments 8

user99 says:

2010-09-08 at 01:37

Really I’m scratching my head at Gentoo sometimes. The docs, and by that I mean the handbook, are pushing installs of hald and iwconfig. Yet there is http://bugs.gentoo.org/show… and iwconfig, I was told on the kernel mailing list, is outdated, and iw, crda, wpa_supplicant the preferred tools according to http://linuxwireless.org/en… Then stable is way behind and would not seem to mesh well with current gentoo sources. Perhaps no release dates isn’t good for Gentoo. Gnome 2.28 afaik isn’t even stable and most other distros have gone on to 2.30.x (see hal above). If there was a freeze and push for a new profile set, could out with the old in one swoop maybe.

Nickolaj Stjujsckij says:

2010-09-08 at 17:18

I’d say “backports” is quite an obscure name; I think of something like “Debian backports” when I read it. By the way, I see you use Git as primary SCM (like most Ruby-ists). Did you hear about Mercurial Queues? It’s a Mercurial extension for that very purpose: controlling a patch stack.

Flameeyes says:

2010-09-08 at 18:04

Any packager knows what a backport is; and yes, I know Mercurial and queues, given that I used to be upstream for xine-lib, which uses Mercurial as its main repository. I even wrote about it a few times. Sure, queues are designed to work with this; on the other hand, I noted here that the post is about upstreams who use GIT already.

Nickolaj Stjujsckij says:

2010-09-08 at 21:07

No cookies for temporary sessions or trackback rss. It’s a pity.

> Sure queues are designed to work with this, on the other hand, I noted here that the post is about upstreams who use GIT already.

Well, of course, but there’s always dev-vcs/hg-git at your service. Of course, that makes sense only if MQ are really superior to usual distributed version control. And if we add dev-vcs/hgsubversion here, we have a unified way of maintaining patches for all the major (at the moment) SCMs.

Flameeyes says:

2010-09-08 at 21:11

Yeah, the cookies are a bit of a PITA… Mercurial uses too much space to deal with so many branches anyway, and having to convert them is, well, just not acceptable.

user99 says:

2010-09-09 at 02:14

as for clones… one for this link would be nice unless you’d rather I just edit and email suggestions. http://www.gentoo.org/proj/… http://www.linux.com/archiv… is really well written as far as grammar and syntax; still, there is at least an article missing, just skimming it. These are listed as further reading in the Autotools-Mythbuster Guide. Haven’t even glanced at the --as-needed link, just set it after other articles, though it’s default now.

Nickolaj Stjujsckij says:

2010-09-09 at 13:45

What do you mean by “using too much space”? Mercurial compresses repo quite effectively, and doesn’t need periodical `repack`s

lu_zero says:

2010-09-10 at 07:53

mercurial has many shortcomings and quirks for people using/used to git (see my gripes about git grep vs hg grep or the revspec management)

