2010-01-19

Splitting packages: when to bump?

One of the reasons I think splitting packages carries much more overhead than it’s worth is version and revision bumping, and in particular deciding whether a bump has to happen at all. This is not limited to patching; most of the time it applies to new version releases as well.

The problem with patching occurs when you have to apply a patch that spans different split packages: if it changes the internal interface between components, you now have to bump the requirements of the internal packages so that they require the new version; if there are circular dependencies in there, you’re definitely in trouble. It also requires you to cut the patch into multiple parts (and sometimes to apply the same parts on both sides).

The problem with bumping is instead somewhat simpler: when you have a package that is shipped monolithic, but quite separated logically, it’s not rare for upstream to release a new version that only fixes some of the internal “sub-packages”; this is what happens most of the time with alsa-tools and alsa-plugins (the former even more so, as each tool has its own configure script instead of a single big one). In these cases, the split packages might not have to be bumped at all, even when the main one is bumped. And this is quite a burden for the (Gentoo) packagers. You cannot just rely on >=${PV} dependencies (as they might not always be satisfied), and you shouldn’t bump the split packages unnecessarily (users don’t like to rebuild the same code over and over again).

In particular, if you end up having the same version bumps for both even if the code hasn’t changed (or you still have to rebuild them at every revision bump), then you are just making it more complex than it should be: regroup the package into a single ebuild! That is, unless upstream decides to break up the code themselves, like dbus did. If your problem is providing proper dependencies (as seems to have happened with poppler), then it is solved by the (now-available) USE dependencies, rather than by splitting into hair-thin packages and increasing the number of “virtual” ebuilds. The same applies to gtk-sharp (and yes, I know both were done by Peter, and yes, he knows I don’t like that solution).

Right now, I maintain one split package (by the standard notion of split), and one that comes very near for our purpose: gdbserver and perf. The former is an almost-standalone sub-package of gdb itself (the GNU debugger), and ships in the same tarball; the latter is part of the Linux kernel source tarball (and patches), but is not tied to the rest of the source either.

In the first case, deciding whether to bump or not is quite simple: you extract the gdbserver sources from the old and new tarball, then diff them together. If there is no change, you ignore the bump (which is what I have done with the recent 7.0.1 release). It’s a bit boring but since the gdb sources aren’t excessively huge it’s not too bad.
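The extract-and-diff check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both tarballs are already unpacked, the function name `changed_files` is made up for this example, and the sub-package is assumed to live in a single subdirectory (such as `gdb/gdbserver`) with no shared files elsewhere in the tree.

```python
import filecmp
from pathlib import Path


def list_files(root):
    """Return the set of file paths under `root`, relative to it."""
    root = Path(root)
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def changed_files(old_root, new_root, subdir):
    """List files under `subdir` that differ between two unpacked trees.

    `subdir` is a hypothetical location of the sub-package sources,
    for example "gdb/gdbserver".
    """
    old_dir = Path(old_root) / subdir
    new_dir = Path(new_root) / subdir
    old_files = list_files(old_dir)
    new_files = list_files(new_dir)

    # Files that exist on only one side are changes by definition.
    changed = old_files ^ new_files

    for rel in old_files & new_files:
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting file size and modification time alone.
        if not filecmp.cmp(old_dir / rel, new_dir / rel, shallow=False):
            changed.add(rel)

    return sorted(p.as_posix() for p in changed)
```

An empty result from something like `changed_files("gdb-old", "gdb-new", "gdb/gdbserver")` (directory names are placeholders) would suggest the bump can be skipped; any listed file means the split package needs a closer look.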

It’s a different matter for perf: since it’s shipped within the kernel itself, any new release candidate of the kernel is a possible release candidate for perf as well! Luckily I can usually just rely on the release changelog and grep for a perf: prefix at the start of the git log entries. It might not be the best choice, as it’s error-prone, but unpacking (and patching, in case of release candidates) the Linux sources that are needed for perf is a non-trivial task by itself, and it takes much more time than it would be worth.
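The changelog grep can be made slightly less error-prone by only looking at commit subject lines rather than every line of the log. The sketch below assumes the release changelog is plain `git log` output (a `commit` header followed by a subject indented by four spaces), and that perf changes carry a subject prefix such as `perf:` or `perf tools:`. The function name and the regular expression are illustrative choices, not an established convention.

```python
import re

# Assumed subject convention: "perf:", "perf tools:", "perf record:", ...
PERF_SUBJECT = re.compile(r"^perf( [\w-]+)*:")


def perf_commits(changelog_text):
    """Return the subjects of commits whose subject has a perf prefix."""
    subjects = []
    expect_subject = False
    for line in changelog_text.splitlines():
        if line.startswith("commit "):
            # A new entry begins; its subject is the first indented,
            # non-empty line that follows the header.
            expect_subject = True
        elif expect_subject and line.startswith("    ") and line.strip():
            subject = line.strip()
            if PERF_SUBJECT.match(subject):
                subjects.append(subject)
            # Later indented lines belong to the commit body; skip them.
            expect_subject = False
    return subjects
```

Restricting the match to subjects avoids false positives from commit bodies that merely mention perf, but it still misses perf changes committed under another prefix, so it stays a heuristic.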

Flameeyes

Comments 13

RealNC says:

2010-01-20 at 00:25

Amen. The amount of split packages in Gentoo is ridiculous with zero point 5 benefits. Big lack of manpower to be doing this stuff (that really no one needs) in the first place.

mv says:

2010-01-20 at 01:14

I do not agree with the policy of not bumping the split ebuild if the corresponding sub-tree has not changed:

For users who prefer to keep the source files around (e.g. for recompilation without internet access), recompiling a package unnecessarily can waste far fewer resources than keeping two huge tarballs just because a bump was skipped to avoid a possible recompilation. I would welcome at least a masked bump with a corresponding comment, so that the user can use the same tarball for the split projects if he wants (and the mask would prevent users who do not care about the sources from recompiling unnecessarily).

Moreover, your suggestion, e.g. to make one package with USE flags out of poppler, would also have the effect that all subprojects installed on a machine have to be recompiled after a bump, even if the upstream bump involved none of these (installed) projects nor their interfaces.

Flameeyes says:

2010-01-20 at 01:38

@mv you’re either confused or confusing. You’re basically asking “bump even unchanged projects” and then “don’t force me to rebuild unchanged (sub) projects”.

If we don’t do (a) we’re going to have to do (b), or vice-versa… or we should be having an ebuild with version X that uses the tarball of version Y, which is so much of a mess I don’t even want to start.

mv says:

2010-01-20 at 18:16

I never said “don’t force me to rebuild unchanged (sub) projects” – I want quite the opposite, i.e. I want to be forced (with the advantage that I need only one tarball). I just remarked in the last paragraph of my reply that this side effect happens anyway if the poppler maintainer follows your suggestion.

Flameeyes says:

2010-01-20 at 19:05

Well, whether they follow my suggestion or not, currently it seems to me like we’re *always* rebuilding all the split packages out there (with the notable exception, I think, of KDE; at least it was a notable exception with KDE3, and I would like to expect that the same applies to KDE4).

Generally speaking, splitting packages *is hard*, and getting it right is even harder, which is why the (rare) good sides of it are not important enough to keep on going this way. Poppler is probably the worst case out there because of the way it is also duplicated with virtual ebuilds, but it’s far from the only one.

And among other things, the way these dependencies are applied makes it quite non-obvious how they are supposed to be upgraded, causing angst among users when Portage finds itself unable to resolve the blockers. What is worse? Forcing a (worst case scenario) one-hour rebuild of a package on USE flag change, or stopping users from upgrading until they apply for a voodoo rite?

mv says:

2010-01-20 at 23:11

> currently it seems to me like we’re always rebuilding all the split package

As you mentioned, it was different with KDE3 (I am happy that this does not happen yet with KDE4; it was always a lot of work to create KDE3 packages for the non-updated ones in my overlay to use the same tarballs, which is why I said it would be nice if it were available at least masked: for the maintainer it is just a copy and an entry in the mask). But there are also examples, although one can of course argue whether these are “split” packages: at the moment I recall dev-libs/klibc and dev-libs/ffcall, which bring in an additional (usually much older) version of the kernel and clisp tarballs, respectively.

Maciej Mrozowski says:

2010-01-21 at 06:52

Yes, poppler in Gentoo is really epic fail, considering the tarball is just ~1.6MiB.

Since the former maintainer is more or less inactive (and yngwin picked it up as no one was interested), I have a monolithic ebuild in a local overlay already, and it should appear in the testing tree when I feel it’s ready (approximately in a week).

As for KDE4, their release plan incorporates a feature freeze so that there are rarely invasive *and* urgent patches between KDE SC patch releases (4.x.y->4.x.y+1), and most refactoring (which would cause multiple split packages to be bumped simultaneously) happens between minor releases (4.x.y->4.x+1.z), so everything is bumped in such a case anyway.

Note that everything is bumped even between patch releases.

nico says:

2010-01-21 at 11:31

What happens if one of the qt packages fails to build after qt-core has already been upgraded? I’m always worried I’ll break things whenever a new release is available.

markuz says:

2010-01-21 at 11:52

I think your reasons are pointless, more or less. Take openoffice, not splitting it means that if I want to enable/disable a feature or a langpack I have to recompile a whole 300Mb blob wasting hours of CPU time. So, as a binary distro user, I am for intelligent splits, reducing inter-dependencies and optimizing space usage, but this is, in most cases, up to software developers themselves, and in case of OO, it’s a real mess because its build system is just a huge amount of crap.

Zeev Tarantov says:

2010-01-21 at 16:59

RE: kernel source. Can someone who understands python IO performance take a look at why unpacking the kernel sources takes so much time?

$ time tar xjf /usr/portage/distfiles/linux-2.6.32.tar.bz2
real 0m17.326s
user 0m15.946s
sys 0m1.597s
$ cd linux-2.6.32/
$ time bzcat /usr/portage/distfiles/patch-2.6.32.4.bz2 | patch -p1
real 0m0.215s
user 0m0.102s
sys 0m0.066s

while using portage:

$ time sudo emerge -v =sys-kernel/vanilla-sources-2.6.32.4 > /dev/null
real 8m40.103s
user 1m24.562s
sys 6m14.719s

Flameeyes says:

2010-01-21 at 17:13

Zeev: that’s probably the collision detection taking a *huge* time… try FEATURES="-collision-protection -protect-owned".

Zeev Tarantov says:

2010-01-22 at 18:51

This is for 2.6.33_rc5

$ time sudo FEATURES="-collision-protection -protect-owned" emerge -v sys-kernel/vanilla-sources > /dev/null
real 10m18.040s
user 1m37.181s
sys 7m38.159s

This is perf top while it’s doing that:

------------------------------------------------------------------------------
 PerfTop: 23157 irqs/sec kernel:81.0% [100000 cycles], (all, 2 CPUs)
------------------------------------------------------------------------------

 samples      pcnt    kernel function
 _______      _____   _______________

 100055.00 - 34.7% : copy_page_range
  70068.00 - 24.3% : unmap_vmas
  18438.00 -  6.4% : page_remove_rmap
  14164.00 -  4.9% : release_pages
   7897.00 -  2.7% : copy_page_c
   7605.00 -  2.6% : page_fault
   6090.00 -  2.1% : vm_normal_page
   3969.00 -  1.4% : free_pages_and_swap_cache
   2460.00 -  0.9% : _raw_spin_lock
   2171.00 -  0.8% : clear_page_c
   2143.00 -  0.7% : schedule
   2018.00 -  0.7% : _raw_spin_lock_irqsave
   1983.00 -  0.7% : flush_tlb_page
   1870.00 -  0.6% : do_wp_page
   1510.00 -  0.5% : __wake_up_bit

Obviously, portage or python are doing something terribly bad. I can’t imagine what, myself. mmaping and munmapping each byte? fsync’ing each byte? I don’t know.

RealNC says:

2010-01-23 at 01:31

Portage also takes a huge amount of time here to install the kernel source. You might want to open a bug about it. I’ll confirm it.
