Gentoo Failed Us Again

Kudos to Markos who basically gave me the title for this blog post!

I’ve spent the past week or so away from computers; I’m having some personal trouble, tied with bad migraines that would have burnt the hell out of me. I came back to updating my systems today and received a nasty surprise: the newly unmasked libpng 1.4 is wreaking havoc on so many systems that it’s not even funny.

I’m not complaining about the fact that we’ve finally unmasked the new libpng; it was needed, and we should probably proceed to get it stable soonish as well. What I am complaining about is that we’re hitting the same obstacles we hit with libexpat:

  • we still have not enabled --as-needed by default, which would considerably reduce the number of packages that actually need to be rebuilt after such an update (and, by the way, not using --as-needed also tremendously increases the chance that some program will load both the 1.2 and 1.4 versions, with the usual trouble of symbol collisions);
  • we still haven’t solved the problem with libtool archives, which require the rebuild (or a nasty hack) of a number of packages for no good reason.

The worst part is that I have been preaching about both things for a while, a few years I’d say, and yet we have not gotten our heads out of the sand, so we hit users in the face with this kind of update. Still.

Using --as-needed, only a fraction of the packages installed in the system will actually link against libpng, and only those would need to be rebuilt; without it, very likely almost all the libtool-using packages, as well as most pkg-config-using packages, will be linking in libpng as a dependency of other libraries, such as GTK+ or Qt. And since you will start updating from those libraries, the newly started packages will have problems because both libpng versions will be loaded at the same time: one from the library and one from the application.
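A rough way to tell whether a given binary links libpng directly, rather than only through another library, is to look at its DT_NEEDED entries. The following is only a sketch: it assumes readelf from binutils is installed, and the helper names direct_needed and links_libpng_directly are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `readelf -d` output on stdin and print the
# library names from the (NEEDED) entries, i.e. the direct dependencies.
direct_needed() {
    sed -n 's/.*(NEEDED).*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: succeed if the ELF file given as $1 lists a
# libpng library among its direct dependencies.
links_libpng_directly() {
    readelf -d "$1" 2>/dev/null | direct_needed | grep -q '^libpng'
}
```

Calling links_libpng_directly on an installed binary and checking the exit status would flag the files that genuinely need a rebuild; a binary that only shows libpng in ldd output, but not here, picks it up through one of its libraries.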

As for the .la files, the problem is mostly at build time, and it makes it very difficult to get out of the mess caused by the update, as a number of packages will start complaining about missing targets for -lpng12. The solution would obviously be to carefully remove .la files within the ebuilds; this way we reduce the chances that the dependencies end up polluting packages that would otherwise have no involvement whatsoever with libpng.
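To get an idea of which libtool archives still drag the old library along, one can search their dependency_libs lines for libpng12. This is a minimal sketch, not what any existing tool does; the function name stale_png_la_files is hypothetical, and the directory to scan is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: list the .la files under directory $1 whose
# dependency_libs line still mentions libpng12, either as -lpng12
# or as a path to libpng12.la.
stale_png_la_files() {
    find "$1" -type f -name '*.la' \
        -exec grep -l -E -e "^dependency_libs=.*(-lpng12|libpng12\.la)" {} +
}
```

Running it against a library directory such as /usr/lib prints one path per line; those are the archives that can make unrelated packages fail at build time.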

Unfortunately, removing all the .la files indiscriminately is a Bad Idea™ (and yes, I know some people experimented with that; I still maintain it’s a bad idea!). What you want to do is reduce their impact as much as possible, but to do so you have to do some extra work, and that requires developers to understand the problem and accept working on a solution, even a temporary, imperfect one, to avoid staying in the problem area.

Do you remember when I checked the eog (Eye of GNOME) .la files and stated clearly that they are totally useless? Well, eog still installs them; sure enough, they are not excessively important in this situation, as they are not linked against, but they will create false positives in revdep-rebuild, for instance. Even my flowchart has gone mostly unused by developers.

And we still don’t have any way to sanitise those files within Portage; lafilefixer does solve some stuff, but it’s not part of Portage proper, nor is it integrated with it. If you want to reduce your system’s pollution in the future, do something like this:

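# /etc/portage/bashrc
# Portage runs this hook after each package's src_install phase.
post_src_install() {
    # Fix up the .la files in the image directory before it is merged.
    lafilefixer "${D}"
}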

This way the files will be sanitised before being merged in the system, and you won’t have to fix them manually.

Will we ever learn?

21 thoughts on “Gentoo Failed Us Again”

  1. Thank you for writing what I felt this weekend. One update and my system stops being able to run any programs. No warning, no advice. Revdep-rebuild crashed due to a few things still wanting qt 3. emerge -e system/world didn’t fix all the problems. This kind of stuff shouldn’t be this hard. This, plus xorg not finding my mouse/keyboard after an update, gets old.

  2. Well said, sir. Based on this post, I assume that --as-needed is still unsupported? What are the disadvantages? The Textile link is down, btw.

  3. Sorry, I disagree; 1.4 needed unmasking, and it sucks that the transition was so quick and harmful. There is too much emphasis on keeping ~arch working perfectly, while ‘arch’ users are left feeling ignored. It’s nice that ~arch users get a nice wake up every so often :) Even better, that some devs make the bump to libpng, and attempt to get it into stable as quickly as possible. While writing nasty blog posts is nice to relieve stress, it really doesn’t help the situation much.

  4. Glad you’re feeling better. Knew there was an issue; unlike you to be silent so long ;) Please investigate cinnamon, as you might find the ‘real’ cinnamon very beneficial, i.e. Ceylon cinnamon.

  5. Got bit by this myself. As for arch needing to run smoothly, for me that’s not the issue (it would be nice, but I know some breakage will happen); a simple news item to alert me of systemwide damage if I updated after the emerge --sync (eix-sync in my case) would have been greatly appreciated. I’m pretty sure we got one for libexpat. It would have saved me from posting a bug for gvim about it failing to work after a libpng update followed by a gvim update, as gtk is unusable from the upgrade. Diego, to enable “--as-needed”, just add it to cflags? Also, how much of the tree fails to build with --as-needed?

  6. Hmm, is there anything holding lafilefixer back from being part of portage? What is the current status of the --as-needed support? Can users help?

  7. Pavel, as-needed support seems pretty good. Diego still finds and files some problems with it, but it seems like regular users are now running with as-needed on real systems and things mostly work. Part of the disconnect between the parts of that statement is that Diego uses a spec file that forces as-needed, which exposes problems in packages that ignore LDFLAGS. Users who just put it in make.conf do not get as-needed for such packages, but they also do not get the breaks that he sees.

  8. This nonsense sucked up a good entire day of messing about. I would like to stress that it wasn’t just people moving to 1.4, but in my case it was even 1.2.43-r2.

  9. Hi all, I have followed your blog for some time and now it’s the first time I am really surprised. By chance I did a -eD last weekend and 800 pkgs went well, 34 of them depending on libpng. Not one failed, not one is not running as expected. All done with --as-needed. I could not tell whether you are talking about a large part of the tree and my 34 pkgs are just running well by accident. Or is it that I missed the need to use the unstable versions of libpng? The system here is running perfectly with 1.2.40. I understand the general .la and libtool disaster, but not the “1.4-hype”. Could you just give a hint? Thanks, Karl

  10. Hi again, so after verifying and syncing again two hours ago: yes, it is also messy here. So I have to apologize. 1.2.40 only runs if the now missing amd64 keyword is added again. So the questions left are: why have 1.2.43-r1 in Slot 1.2 and 1.2.43-r2 in Slot 0 been stabilized (not looking at Slot/Versions), and why was the amd64 keyword in 1.2.40 deleted? Cheers, Karl

  11. I’ve been running a number of LDFLAGS for some time, including --as-needed, since I first read Flameeyes’ blogs on the subject. The ld manpage lists what they do (with the gcc manpage listing the -Wl, prefix). Here’s what I run (a couple of them are, I believe, Gentoo default):

    LDFLAGS="-Wl,-z,now,--as-needed,-O1,--hash-style=gnu,--sort-common"

    Thanks to the fact that my system has been --as-needed for years and that I run lafilefixer --justfixit in the same script that invokes revdep-rebuild for me (lafilefixer is faster than even one package rebuild and can often avoid several, so it was worth putting in the script), PLUS the fact that I had the kde-4.4.3 upgrade to do at the same time (IIRC, the libexpat fiasco happened shortly /after/ a kde update, and being before I was running --as-needed, I ended up rebuilding most of that kde update over again), by the time I was done with my normal emerge --newuse --update --deep --keep-going @world, I had only 17 packages needing to be revdep-rebuilt after lafilefixer did its thing. One of those (a gtk related package, IDR which) was to be updated in the @world, but failed then due to libpng errors. And the first revdep-rebuild had a couple of failures as well, leaving two for a second round. But the second lafilefixer and revdep-rebuild round got ’em all, and the update was *FAR* simpler than the libexpat fiasco.

    BTW, Cynyr said he thought there was a news item for the libexpat thing. I believe he’s mistaken, as IIRC the libexpat fiasco was what triggered the whole news item idea, which became a GLEP, which was finally implemented, all because of the trouble the /last/ time something like this happened with no warning.

    So it disturbs me that there wasn’t at least a news item for this, since this is precisely the thing the whole news item infrastructure was designed for. Maybe there will be by the time it goes stable. But come on, it’s not like the problem couldn’t have been foreseen and a proper news item issued, even if, because I run --as-needed, it wasn’t such a big problem for me this time around. Had the timing of the kde upgrade hit me a bit differently, I’d have had a number of kde packages to rebuild as well, and I may well have chosen to delay the kde upgrade to coincide with the libpng upgrade (as I did, I think purely by chance timing), had there been a news item about it.

    (Sending this blind, as preview never has worked for me on this site. Maybe it requires scripting and submit doesn’t, don’t know, but it has never worked, while submit does.)

    Duncan

  12. Q: Will we ever learn? A: No, because a lot of my last attempts to fix ebuilds that install lafiles (by sending ebuild patchsets to the bugzilla) have been rejected or ignored just because the ebuild’s maintainers *don’t want to break user systems as the removal of lafiles leads to libraries breakage*. It’s obviously true that the lafiles removal *can* (very rarely) lead to libraries breakage, but *totally ignoring the problem* is leading to libraries breakage too, as reported by you :D So what is the advantage of being so conservative with the *.la problems? Simply, there aren’t any advantages at all, so why not just start to massively inject USE ‘static-libs’ and remove lafiles from the ebuilds without thinking of the *users consequences*? After all, conservative or not, it just ends with the same results.

  13. It’s a bit of a pity that the “parallel libraries” issue is not solved properly with the solution that makes berkeley db transitions bearable (symbol versioning). You can do it yourself, though, using -Wl,--default-symver for the new library (or globally).

  14. Paul, that’s quite true… although I don’t think it’s such a good idea to enable it from “the outside”, as it (at least) used to overwrite the otherwise-provided symbol versioning, and also breaks the dlvsym() calls for unversioned symbols. It also has one other problem: versioning is a GNU extension (which, if I remember correctly, is implemented by FreeBSD as well) but is not part of the generic ELF specifications, so not all the linkers/loaders out there support it.

    On the other hand, libpng does not really need to be parallel-installable; the current slotting trick to install libpng12 as well as 14 is there mostly for compatibility with binary software, similarly to how it’s done with readline. The problem with keeping the old libpng around for preserved-rebuild to work is something I contested before (https://blog.flameeyes.eu/2…); it would work much better if we moved to --as-needed by default.

    And also, let’s be honest, the current situation with BerkDB is not perfect either: packages link unconditionally to the latest version, or at least try to, and every update makes it a new game to fix the packages to build properly (and 5.0 seems to get worse than before!).

  15. Well, almost 2 years later, and we still have a mess. libpng was recently “updated” to have only 1.4.9, and 1.5.{10,11}. As a result, and for reasons I do not know, KDE starts and runs with blank icons. Stderr reports that some applications were linked against libpng 1.4.8 but are running against 1.5.11. I deleted any and all KDE pieces on the system and re-built KDE four times. And still there is a problem! I delete /usr/local/lib64/libpng* (it was 1.4.8) and aterm crashes because it is missing libpng 1.4.8. Ldd tells me that aterm needs _both_ versions, and indeed after re-installing 1.4.8 (in /usr/local/lib64), aterm will emerge and run. I have no idea what to do next.

    I actually installed Ubuntu on the machine, but it is just as messy once you leave the straight and narrow path of a plain installation, and after a week I decided it is worse than Gentoo and am back to struggling with KDE. Do you know what compiles without a hitch, and runs perfectly? Enlightenment e16. Yes, this 1998 version is still more viable than KDE (in most versions of KDE). If only it had a dock/panel (or a way I knew to manage the contents of a systray) I would be happy.
