Proposal: reduce the size of system packages set

This is not an official proposal. I repeat, this is not an official proposal. It’s just an idea I had, which I should propose officially on gentoo-dev one day; but as I’m not subscribed right now and don’t want to be yet, I’ll wait for another day to do that.

I already ranted about the fact that the dependency tree of our ebuilds is vastly incomplete, as many lack a dependency on zlib; trying to get this fixed was impossible, as Donnie and others insisted that, as zlib was in system, we shouldn’t depend on it at all. I disagree, and I would like to know why we can’t depend on a system package, but whatever.

Anyway, as having a complete dependency tree is almost impossible because of that, I have an alternative proposal: reducing the size of the system package set.

Right now system contains stuff like ncurses, readline, zlib, autoconf, automake and m4, perl, gnuconfig, and so on. Those are packages that would certainly be part of any base Gentoo system, but are they actually part of the system set of packages? I sincerely doubt it.

The system package set exists first and foremost to break circular dependencies: for instance, if every package that used the C compiler depended on gcc, then the packages that gcc itself depends upon would create a circular dependency between gcc and itself. Also, specifying libc in almost every ebuild would be quite pointless, and the same goes for coreutils (or freebsd-bin/ubin) for cp, mv, install, …
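To make the circular dependency concrete, here is a minimal Python sketch. The graph is a toy: the package names are real, but the edges are assumptions made up for illustration, not the dependencies actually declared in the tree. `find_cycle` and `without_system` are hypothetical helpers, not Portage functions.

```python
from typing import Dict, List, Optional, Set

# Toy dependency graph. The edges are illustrative assumptions,
# not the real dependencies declared in the tree.
DEPS: Dict[str, List[str]] = {
    "gcc": ["zlib", "binutils"],
    "zlib": ["gcc"],       # zlib needs a C compiler to build
    "binutils": ["gcc"],   # so does binutils
}


def find_cycle(deps: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return one dependency path leading from `start` back to itself, or None."""
    stack = [(start, [start])]
    seen: Set[str] = {start}
    while stack:
        node, path = stack.pop()
        for dep in deps.get(node, []):
            if dep == start:
                return path + [dep]
            if dep not in seen:
                seen.add(dep)
                stack.append((dep, path + [dep]))
    return None


def without_system(deps: Dict[str, List[str]], system: Set[str]) -> Dict[str, List[str]]:
    """Drop every edge that points at a package in the (assumed) system set."""
    return {pkg: [d for d in ds if d not in system] for pkg, ds in deps.items()}


if __name__ == "__main__":
    # With explicit compiler dependencies, gcc ends up depending on itself.
    print(find_cycle(DEPS, "gcc"))
    # Treating gcc as an implicit system package removes those edges.
    print(find_cycle(without_system(DEPS, {"gcc"}), "gcc"))
```

In this toy model the only package that has to be implicit to break the loop is gcc itself; nothing in it requires zlib to be implicit as well.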

But why autoconf and automake? Well, the easy answer is that those are often used without making sure they are depended upon explicitly… or at least this was the case until Martin and I added autotools.eclass to the tree; nowadays all the ebuilds using autotools should already have proper autoconf/automake dependencies, and if they don’t they are broken anyway. So why leave them in system? And what about m4? m4 is not part of a common Unix system, it’s just an autoconf dependency, so why isn’t it just an autoconf dependency?

As for the three main libraries, there aren’t that many packages using zlib directly nowadays. This is especially easy to spot on a system built with --as-needed; without it you do see zlib linked into every other binary because of indirect dependencies. Nor are there tons and tons of packages using readline or ncurses. Actually, in my own vserver’s chroot I only found four packages using readline, none of them part of system: ruby with the readline extension (uhm, I wonder if I should ask for this to become a USE flag; I certainly don’t need it and I’d rather get rid of it), psql from postgresql (which is maybe still good to have with readline compiled in, but which I could easily get rid of), bc (which is just an e2fsprogs build-dep, and which I could build without readline just fine), and mysql.
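One way to do this kind of survey is to look at the NEEDED entries in the dynamic section of each binary, as printed by `readelf -d`. The Python sketch below only parses text in that shape; the sample input is made up, and `needed_libs` and `links_directly` are hypothetical helper names. On a system built without --as-needed, the NEEDED list also contains libraries the binary never calls itself, so the result is only meaningful with that flag in use.

```python
import re
from typing import List

# Matches the NEEDED lines printed by `readelf -d`, for example:
#  0x0000000000000001 (NEEDED)   Shared library: [libz.so.1]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libs(readelf_output: str) -> List[str]:
    """Return the sonames listed as NEEDED, in the order they appear."""
    return NEEDED_RE.findall(readelf_output)


def links_directly(readelf_output: str, soname_prefix: str) -> bool:
    """True if any NEEDED soname starts with the given prefix."""
    return any(lib.startswith(soname_prefix) for lib in needed_libs(readelf_output))


# Made-up sample in the shape of `readelf -d` output (not from a real binary).
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x1000
"""

if __name__ == "__main__":
    print(needed_libs(SAMPLE))
    print(links_directly(SAMPLE, "libreadline.so"))
```

Feeding it the real output of `readelf -d` for every installed binary, and mapping each file back to its owning package, would give a list like the one above.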

The status of ncurses is a little different: it is used by screen, gettext (only a build-dep, not needed at runtime on Linux anyway), procps, psmisc and util-linux (and I wonder why we don’t have a switch to turn it off), texinfo (I wonder why we can’t remove it entirely, actually) and yet again ruby. Still, I wonder why ncurses is in system rather than being properly listed in the dependencies of those packages.

As for perl, that’s probably a bit more justified: there are tons of packages using perl directly or indirectly, especially in build systems. But I would like those to depend on perl properly rather than having perl in system, as there are cases where perl is not really needed, at runtime at least.

And the only users of gnuconfig are the packages making use of the old and deprecated gnuconfig.eclass, or portage’s econf. Why can’t it be a dependency of portage then?

There are probably other packages that should, in my opinion, be removed from system, but these are certainly some of the most common. Unfortunately there’s a recursive problem here: to remove the packages from system without breaking stuff you’d need a proper deptree, and to get a proper deptree you need to remove the packages from system. This is what actually stops me from proposing this right away…
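One way out of the loop would be to audit the tree while the packages are still in system, listing every package that uses a system package without declaring it. The Python sketch below shows the idea under heavy assumptions: the `SYSTEM`, `DECLARED` and `USED` data are invented, and obtaining a reliable "used" mapping for the whole tree is exactly the hard part.

```python
from typing import Dict, List, Set


def hidden_system_deps(
    declared: Dict[str, List[str]],
    used: Dict[str, List[str]],
    system: Set[str],
) -> Dict[str, List[str]]:
    """Map each package to the system packages it uses but does not declare."""
    report: Dict[str, List[str]] = {}
    for pkg, uses in used.items():
        hidden = sorted((set(uses) & system) - set(declared.get(pkg, [])))
        if hidden:
            report[pkg] = hidden
    return report


# Invented sample data, for illustration only.
SYSTEM = {"zlib", "ncurses", "perl"}
DECLARED = {"screen": [], "ruby": ["ncurses"]}
USED = {"screen": ["ncurses"], "ruby": ["ncurses", "zlib"]}

if __name__ == "__main__":
    for pkg, missing in hidden_system_deps(DECLARED, USED, SYSTEM).items():
        print(pkg, "is missing explicit dependencies on:", ", ".join(missing))
```

Once such a report comes back empty for a given package, that package could in principle leave system without breaking anything.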

14 thoughts on “Proposal: reduce the size of system packages set”

  1. The system package set should be as minimal a build environment as possible. That’s my way of seeing things, as I like to abuse Gentoo to build custom non-Portage systems.

  2. Actually, if we make portage properly depend on Python, that would also be a good thing to do.

  3. Yes, this is a great idea. And yes, Python should not be in system, but be a proper dependency. However, migration to a leaner system set would be a huge task if, as I assume, dependencies that are implicit now cannot be found in an automated way. Also: stuff like perl, sed, gawk and python might be used in the ebuilds themselves, so eliminating them would be quite some trouble. Should they be added to DEPEND only for the building? I think it would be essential to clearly specify what is allowed in an ebuild in a much stricter sense than it is done now (restricting ebuilds to POSIX would be a good start). A perl package depending (now explicitly) on perl should be allowed to use perl in the ebuild. General-purpose ebuilds should use sed and gawk and sh. They should use python and perl only if it would be _very_ cumbersome to use sed and gawk for the task. Also, there should be one and only one “swiss army knife” used for stuff not implementable in sed/gawk/coreutils. Having three packages pull in perl, python and ruby just for the building of packages is a Bad Thing(tm). So Python (as a natural ebuild scripting language choice on Gentoo) should not be a system package, but it would probably end up on every system anyway – if only because of ebuilds needing it for package building.

  4. To be honest, embedded Linux is what is going to be the most exciting and interesting thing in the near future. Linux as a server is a solved problem, and the desktop will still take a while to go mainstream. So removing as much of the random GNU bloat as possible from Gentoo’s system set might especially help embedded systems that are oriented towards busybox. Though portage does have rather a lot of deps, I wonder if they are all really required? Being really Unixy and having millions of dependencies is normally the best idea because it makes the thing easier to maintain; however, with something low-level, especially when you think about embedded systems, as few dependencies as possible is best. Since Portage already depends extremely heavily on Python, perhaps it should use the standard library more instead of bash and all the GNU tools? I bet the Python standard library can do many of the things that Portage’s current deps do.

  5. And in the same vein, you mention coreutils. I think programs should not expect a particular implementation of cp, mv, etc., as it could be coreutils, it could be busybox and it could be freebsd-bin/ubin, as you point out. So ideally coreutils should not be in the system set either if Gentoo wants to be a first-class distribution for embedded usage. (I suppose a more hackish solution would be to have an embedded system set, but that would be giving up, saying that Gentoo and portage in their current form are not flexible enough to handle embedded properly.)

  6. Actually, the base packages should indeed probably skip over all the details of who provides cp, mv and other stuff; for those there _is_ already an embedded profile, which updates the system set with busybox and uclibc. This is the same thing Gentoo/FreeBSD does to put in its own packages rather than GNU’s. This is the flexibility portage already has. As for not using bash stuff in ebuilds, that’s not really something that we should waste energy on at the current time. While I completely agree that we should _not_ tie up to a particular implementation of the commands, and you can easily dig up my older posts on the other blog where I wrote about degnuification of the tree, something I was mostly able to complete and which should sustain itself quite easily in the long run, trying to be “more posix” in ebuilds is not useful in the end, for embedded or anything else. First, the tools you need at build time are not required on the end system. I actually did prepare a gentoo based build system for an embedded project last year, so I know what I’m saying ;) Unfortunately there are currently more complex concerns to address before crosscompile and embedded system builds can be handled by portage natively. But reducing the system set is certainly a step in the right direction for that too. Also, a lot of ebuilds will certainly end up depending on Perl at least at build time, because build systems often do use Perl (if you think about it, even oxy-cursors’s build system used Perl before I rewrote it with two Makefiles and a few sed commands); Perl is also needed for autotools, which a lot of packages depend on. The result will probably be the same for end-users, as perl will be installed, but at least they’d have an option, and the deptree would be clear enough, rather than “oh, why the hell do I have perl installed at all?”.

  7. > you can easily dig up my older posts on the other blog where I wrote about degnuification of the tree,

     Cheers, I’ll look into that.

     > I completely agree that we should not tie up to a particular implementation of the commands,

     Cool.

     > I actually did prepare a gentoo based build system for an embedded project last year, so I know what I’m saying ;)

     Hopefully I’ll find that out when I look into the above-mentioned posts!

  8. @Flameeyes: Well said. For the user, the question “oh, why the hell do I have perl installed at all?” is not as important as “will my system be fubared when I remove it?”. Explicit deps should help there … ;-)

  9. I have two machines that use Compact Flash cards of 512MB instead of hard disk drives. Squeezing a working system into such a small space was tough for me … in large part because it’s hard to distinguish between the packages needed to maintain/upgrade the base system [gcc, perl, etc.] and the ones it actually needs to do useful work other than system administration [ogg123, etc.]. This proposal would make that difference clearer, so I support it enthusiastically.

  10. Great idea; proper dependency tracking is definitely needed. Trying to avoid bash is a dead end imo; posix sh is very minimal. Bash may not be as nice as ksh or zsh but it’s far more common and has been ported just about everywhere that matters. As you say, embedded systems are rarely built on the target. I feel the same about GNU and python if I’m honest: a set of aliases for GNU tools (eg sed=gsed) would then be sufficient. Seems like there are two sets of system deps here: one for what ebuilds need to run (eg sed, grep and awk), and one for what the applications need to compile. The latter really should be split into build-tools and link-deps (not sure what else goes in there).

  11. And bash is very heavily targeted by hackers. So there should be an option to use ksh, ash, or dash as the shell via a configuration option. The thing about Bash is that early Linux distros, like Slackware, used it, and it continues to be a very popular shell because of that. And I was thinking that maybe portage can be rewritten in C? Really, the sole advantage Python has over C is that it has a regex ability that C doesn’t have. But it couldn’t be that hard to write a regex library to use with C that could be used by Portage. Maybe take a look at pkgsrc, used by NetBSD. I believe Portage was influenced heavily by it. And pkgsrc doesn’t require Python. Yes, it does rely on make and Makefiles, unfortunately, but I feel that Portage could still be doable in C instead of Python.

  12. I think that saying that “bash is heavily targeted” is bending the truth a bit. It is being targeted *now* because of shellshock, but it was not before, otherwise all these problems would have been apparent years ago. And the configuration option exists: `eselect sh`. But I doubt you’d care, given that you seem to be either playing devil’s advocate or not understanding how development works. C is not in any way more secure than other programming languages. If anything it can be less so. For Portage, Python is a pretty good choice and C is a terrible choice. Let alone the string handling problem in C, which is terrible; but why on earth would you suggest rewriting something that works because you don’t like (or even understand, it seems) the language? Seriously, sometimes I’m tempted to “curate” comments as well.
