A couple of thoughts about package splitting

In my post regarding remote debugging (which I promised to finish with a second part, I just didn’t have time to test a couple of things), I’ve suggested the idea I’d like to have some kind of package splitting in Portage, to create multiple binary packages out of a single source package and ebuild, similarly to what distributions based on RPM or deb do (let’s call them RedHat and Debian, for historical reasons).

Now, I want to make sure nobody misunderstand me: I don’t intend to propose this as a way of removing the fine-grained control USE flags give us; I sincerely love that; and I also love not having to worry about installing -dev and -devel packages on my machines to be able to build software, even outside of the package manager’s control. I really find these two are strengths of Gentoo, rather than weakness, so I have no intention to fiddle with them. On the other hand, I think there are enough uses that would allow for an even finer control on binpkg level.

I’ve already given a scenario in my post about remote server debugging, but let’s try to show something different, something I’ve actually been thinking about myself. Yes I know this is a very vested interest for me, but I also think this is what makes Free Software great most of the time; we’re not solutions looking for problem, but usually solutions to problem one had at least at one point in time. Just like my writing support for NFS export on the HFS+ filesystem in Linux.

So let me try to introduce the scenario I’ve been thinking about. As it happens, I tend to a series of boxes in many offices for friends and friends of friends in my spare time, on the side. It’s not too bad, it does not pay my bills, but it does pay for some side things, which is good. Now since these offices usually use Windows, even though I obviously install Firefox as the second step after doing the system updates, it’s not unlikely that every other time I go there I have to clean up the systems. I think there are computers I’ve wiped up and reinstalled a few times already. I’ve now been thinking about setting up some firewalls based on Snort or similar. Since I am who I am, these would end up being Gentoo-based (as a side note, I’m tempted to set it up here so I can finally stop having trouble with Vista-based laptops that mess up my network). Oh and please, I know it might sound very stupid considering there are solutions good for this already, but considering how much I’m paid and the amount of money they are ready to spend (read: near to none), I would find it nicer to be paid to work on some Gentoo-related stuff than be paid to just look up and learn how to use already made equipment. Of course if you have suggestion, they are welcome anyway.

So anyway, in this situation I’d have to set up boxes that would usually feel very embedded-like: a common basis, the minimum maintenance possible, upgrades when needed. Donnie’s idea of using remote package fetching and instant deletion is not that good for this because it still requires a huge pipe to shove the data around; not only I don’t have so much upload bandwidth to employ for binpkging a whole system with debug information, it would also be a hit that most of my users wouldn’t like to have, on their bandwidth (if they want to use BitTorrent or look up p0rn from the office is not my problem).

With this in mind, I’d sincerely find it much nicer to be able to split packages, Portage-side, into multiple binary packages that can be fetched, synced, or whatever else, independently, as needed. As I proposed, a binpkg for the debug information files, but also a binpkg for documentation (including man and info pages), one for development data (headers, pkg-config), and maybe one for the prepared sources, that I want to talk about in a moment. With an environment variable it shouldn’t be much of a problem to choose which ones of these split binary packages to install in the system without explicit request; with a default including all of them but the debug informations and the sources. This would also replace the INSTALL_MASK approach as well as noinfo, noman, nodoc FEATURES. It wouldn’t be like a logical split of a package in multiple entries in the system, but rather a way to choose which parts to install, complementary to USE flags.

As for packaging the sources as I said above, there are two interesting points to be made for that, or maybe three. The first problem is that when you have to distribute a system based on Gentoo, you cannot just provide the binaries; since many packages are released under the GNU GPL version 2, even if you didn’t change the sources at all you should be distributing them alongside the binaries; and we modify a lot of sources. For license compliance we should also provide the full set of sources from which the code is derived. This is especially tricky for embedded systems. By packaging up the sources used for the builds, embedded distributors would be able to just provide all the -src subpackages as the full sources for the system.

The second point is that you can use the source packages for debugging too. Since there is, as far as I know,no way to fully embed the source code of software in the debug section of the files generated from that, the only way for GDB to display the source code lines during debugging is having the source files used for build available during the debugging session. This can easily be done by packaging up the sources and installing them in, say, /usr/src/portage/ when they are needed, from a subpackage.

A final point would be that by packaging sources in sub-packages, and distributing them, we could be reducing the overhead for users to unpack (maybe with uncommon package formats) and prepare sources (maybe with lots of patches and autotools rebuilding). Let’s say that every 6 hours a server produces md5-based source subpackages for all the ebuilds of the tree, or a subset of them. Users would then use those sources primarily, but still having the ebuilds to provide all the data and workflow so that the original untouched source would be enough to compile the package. Of course this would then require us to express dependencies on a per-phase basis, since then autotools wouldn’t be required at buildtime at all.

Okay I guess I’m really dreaming lately, but I think that throwing around some ideas is still better than not doing so, they can always be picked up and worked on; sometimes it worked.

One thought on “A couple of thoughts about package splitting

  1. With FEATURES=installsources portage already installs sources to the system for debugger to use. It finds out which source files are actually used in the binary, installs everything used in /usr/src/$caregory/$package folder and adds or modified some ELF section in the other ELF files (not sure if the /usr real stuff, or the file with other debug sections in /usr/lib/debug) to tell the debugger where the source files are (otherwise it’d expect to find them from inside PORTAGE_TMPDIR).Presumably this step could be done from a source package as well, as I don’t think anything other than having all the files exist at some path is necessary – though can get complicated with generated during build source files.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s