Your symbol table is not your ABI

You’d think that, given how much I have written on topics related to shared libraries, ABIs, visibility and so on, I would be able to write a whole book about the subject. Unfortunately, between me not being a native speaker – thus requiring more editing – and the topic not being about any shiny web technology, there seems to be no publisher interested, again. At least this time it’s not that somebody else wrote it already. Heh.

When a wise man points at the moon, the fool looks at the finger.

Yesterday (well, today, but I’m going to post this in the morning or afternoon) I hit an interesting bug in the tinderbox: a game failing to build, apparently because of a missing link to libesd (the client library of the now-obsolete ESounD). This seemed trivial to me, as it appeared that the new libgnome didn’t link to libesd any longer. I had to look a bit more to see how bad the situation was. Indeed, libgnome used to bring in libesd via pkg-config; then it started using Requires.private, so it was only brought in indirectly (transitively) through the NEEDED entries. Finally, with this release, it is not even in the NEEDED entries, as long as you’re using --as-needed.

What happened? Well, with the latest release, the old, long-deprecated esd interfaces for sound support are finally gone, and all of the gnome-sound API is based on Lennart’s libcanberra; this is the good part of the story. The other part of the story is that two functions that were tightly tied to the old ESounD interface (to load a sample into the ESounD session, and to get a handle to the currently running esd session, respectively) no longer work. Since GNOME is not supposed to break ABI (Application Binary Interface) between releases within version 2, the developers decided to keep the symbols around, and, quoting them, to keep linking to libesd to maintain binary compatibility.

Well, first off, the “binary compatibility” that they wish to keep by linking to libesd is really compatibility with underlinked software, which is in itself a bad thing we shouldn’t be condoning at all. Software that uses esd should link against it, not rely on the fact that libgnome would bring it in.

On the other hand, they seem to assume that the ABI is little more than what the symbol table provides. Of course the exported symbols are part of the ABI, and so are their types; for functions, also the type and order of their parameters. You also have to add the structures: the types of their members, their size, and their order. All of this is part of the ABI, yet the only thing that the linker will enforce is the symbol table, which is why in my title I wanted to make it clear that the symbol table is not the only part of the ABI; in other words, the ABI is not only what the linker can enforce — nor is the API only what the compiler can enforce!
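
To make that concrete, here is a contrived C sketch (all names are made up, nothing taken from libgnome) of two ABI changes that the linker cannot catch, because the symbol table stays identical:

/* plugin.h: a hypothetical library header */

struct plugin_info {
    int         api_version;
    /* Inserting a member here in a later release (say, a `void *reserved')
     * changes the structure's size and the offset of `name', breaking every
     * binary compiled against the old header, even though the library's
     * symbol table stays byte-for-byte identical. */
    const char *name;
};

/* Likewise, swapping these two parameters, or widening `flags', in a later
 * release changes the ABI (and the API) without touching the exported
 * symbol `plugin_register' at all. */
int plugin_register(const struct plugin_info *info, unsigned int flags);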

What is it that I’m referring to? Well, as I said, there are two functions now that make no sense at all; they weren’t removed, though, as that would have broken the part of the ABI visible to the linker. Instead, the functions now return the value -1 (i.e. there has been an error) without doing anything (almost: the sample loading is actually done through libcanberra, but that hardly matters, given that the sample handle you’re asking for is not returned anyway). Even though this won’t cause a startup error, or a build-time error, it’s still breaking the ABI: software that was built relying on the two functions working will stop working starting with this version.
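
A minimal sketch of what such a “kept” function ends up looking like (hypothetical name and code, not the actual libgnome implementation):

/* The symbol and the prototype are unchanged, so the linker, which only
 * checks the symbol table, is satisfied. */
int sound_sample_load(const char *filename)
{
    (void)filename;  /* the argument is simply ignored now */

    /* Callers built against the old behaviour, which returned a usable
     * handle, now always see the error path; they break at run time
     * instead of at link time. */
    return -1;
}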

You’re not maintaining binary compatibility, you’re sweeping your own ABI breakage under the rug!

This is not dissimilar from the infamous undefined symbol problem: a mistake that could easily have been caught at build time, by making the build fail where the symbol is used, is instead shuffled off to run time, by forcing an error condition where the code is unlikely to expect one.

In this particular case, only two packages directly express the need for esd support to be enabled in libgnome2; they should either be fixed or simply dropped from Portage. ESounD has been deprecated for many years now, so it makes little sense to go looking for a way to fix these packages: they are more than likely dead, and if their authors didn’t write a fallback already, they have probably stopped developing them.

Fixed in overlay (read: not fixed)

One unfortunately still too common practice I find among my fellow developers is relying too much on overlays to get users involved; my personal preference in this matter is getting people to proxy-maintain packages, and the reason is that this way I can make sure the fixes propagate to all the other users in a timely manner, as well as being able to intercept mistakes before I commit them to the tree.

But there are other reasons why I dislike overlays; for instance, they often clash with each other, or mix should-be-working packages with don’t-even-try ones … which is the case with the current Gnome overlay. I used to use the Gnome overlay so I could test and help report bugs before the next release hit ~arch; unfortunately, for a while now the overlay has contained Gnome3/Gtk+3 packages that really shouldn’t be mixed in on a system that is actually used.

This became obnoxious to me the moment I went to actually try Rygel (so that I could finally get rid of MediaTomb, if it worked and I could add it to the tree — that code is noxious!). The problem is not Rygel’s heavy reliance on Vala; that would be fine, given that we have it in the tree. The problem is that the Rygel UI (and after trying it out I can safely say that you don’t want to try it without the UI) requires either Gtk+3 (no way!) or Gtk+ 2.21… which is the “devel” branch and is present only in the Gnome overlay. Not even masked in the tree.

It wouldn’t have been too bad, if it weren’t for the fact that upstream (finally!) split gdk-pixbuf out of gtk+ itself, so you should finally be able to use librsvg without X11 on the system (which is why my charts are available only as SVG and cannot be seen by some browsers that have trouble displaying embedded SVG). Unfortunately, this also means that they changed the path gdk-pixbuf uses to load the loaders (no pun intended); and the current ~arch librsvg won’t pick that up. Again, the librsvg in the overlay has automagic-deps trouble, and requires both Gtk+2 and Gtk+3 to be present to work. D’oh!

This is nothing new, so what is the problem? Well, beside the fact that Arun blogs about something we can’t have ;) — not your fault, I know, I’m not picking on you, don’t worry Arun!

The problem is that I’ve asked before why the Gnome stuff is not pushed into the main tree under p.mask, the way most other teams work, especially given that I can use the tinderbox to check reverse dependencies before it’s unleashed, rather than having to report them afterwards. Indeed, Gtk+ and other libraries’ updates tend to be quite boring because there is way too much software that defines GTK_NO_DEPRECATED and similar macros, which should only be used during development, and thus fails to build when the interfaces it uses get deprecated. Of course, even if they didn’t define that, the code would fail at the following update, when the deprecated interfaces get removed, but that’s beside the point now.
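
For reference, this is roughly the mechanism involved, sketched with made-up names (these are not the real Gtk+ headers or macros):

/* mylib.h: a hypothetical library header */
#ifndef MYLIB_DISABLE_DEPRECATED
/* Declared only when the application does not opt out of deprecated API.
 * Once the library moves a function inside this guard, any application that
 * defines MYLIB_DISABLE_DEPRECATED stops compiling where it calls it, even
 * though the symbol is still exported and existing binaries keep working. */
int mylib_old_way(int value);
#endif

int mylib_new_way(int value, unsigned int flags);

That is why defining such macros is appropriate while developing against the library, but not in released code.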

Interestingly enough, though, the effect of (at least some of) the more recent deprecations seems to be causing the same kind of issue as the recent glibc-2.12 release (if, by the mere fact that we’re talking about Gtk+, with lesser severity), in the form of undefined symbols where GTK_* macros are used.

As you might suspect, the tinderbox has already stumbled across a few of these packages; while the quick fix is generally quick (drop the NO_DEPRECATED definition), the complete fix (using the correct, non-deprecated API) takes a while, and I can’t blame the maintainers for waiting to hear from upstream on the matter, especially given the way gtk+ is always dropped like a bombshell. Just to be on the safe side, I’ve now added some further tests to ensure that neither the “symbols” requirements caused by gtk+-2.20 nor those caused by glibc-2.12 will be left standing without further notice. If the tinderbox ever builds such a broken package, it’ll report it to me so that I can file the proper bug.

Now, the gtk+-2.21 situation seems to be starting off just as well; gtkhtml fails to build, and it’s even part of Gnome itself. I will be begging the Gnome team again, starting from here, to add the ebuilds to the main tree, under p.mask, as soon as they are usable, so that the tinderbox can start churning through them.

But since people seem to think I write too negatively, I have to say that at least a few developers do seem to keep in mind that the tinderbox is available; Samuli, Alexis and Jeroen asked for feedback before unmasking (XFCE, Ocaml and libevent-2 respectively), and the problems found have been taken care of much more quickly than even I expected, for the most part. So if you’re a package maintainer and want the reverse dependencies of your package tested before unmasking a version into ~arch, just drop me a mail and I’ll set up a special run. It can take anything from half a day to a week or two, depending on the size of the revdep tree and the queued-up runs (right now it’s completing a full-system-set rebuild to see if there were more issues with glibc-2.12; it turns out I only hit one other problem, and that was related to GNU make 3.82 instead, another “good”, scary bump).

After this post, you can guess that the following run is going to target gtk+-2.20. If there are no further runs queued, after that it’ll resume the daily build of the tree.

Please keep in mind though: package.mask-ed packages are fine, but the tinderbox will not test any overlay. Get your fixes in the tree proper!

I don’t like what people tell me is good for me

— That drink was individually tailored to meet your nutritional requirements and pleasure.
— So I’m a masochist on a diet!

Arthur Dent and the Nutri-Matic machine; The Hitchhiker’s Guide To the Galaxy — Secondary Phase

One of the reasons given for Free Software’s popularity among geeks and other technical people is that it is, for many, a simple way to scratch their own itches; it’s probably the same reason why I keep using Gentoo: it allows me to scratch my own itches pretty easily. Since people scratch their own itches, they do things the way they like best, and that turns out to be successful because both great minds and lots of geeks think alike.

At the same time, there is a very strong drive to bring Free Software to the masses… this drive is ethical for some, commercial for others, but the bottom line can generally be summarised as “Free Software needs to be done the right way”. This covers many aspects of Free Software: from code quality, to maintainability, to usability of the interfaces. And once again, to get results you have to accept that there are going to be rules, standards and common practices. The problem is: how do you forge them? And how far should they distance themselves from the “older” ways?

Now, for once, don’t let me get into the technicalities of code practices, QA and so on and so forth… I’ll focus on something that, I have to admit, I have next to no working knowledge of: interface usability. I’m a developer, and like many developers, I suck at designing interfaces that are not programming interfaces: websites, GUIs, CLIs… you name it, I suck at it. That’s why I find it very helpful that there are usability experts out there who work hard to make software interfaces better to use for the average user and (possibly) for me as well.

— The ventilation system; you had a go at me yesterday.
— Yes, because you keep filling the air with cheap perfume.
— You like scented air, it’s fresh and invigorating.

Arthur Dent and the Heart of Gold ventilation system; The Hitchhiker’s Guide To the Galaxy — Secondary Phase

Unfortunately, I’m afraid stuff like that soon goes overboard, because people start to take a liking to dictating how other people should use their computer. This is, among other things, one of the most common criticisms directed toward Apple, as they tend to allow you only a certain degree of use of both their hardware and their software; and the obvious challenge is to get their hardware (at least) to do something it wasn’t designed for (a second hard drive on MacBooks, XBMC on the AppleTV, the iPhone jailbreak…).

Now, sometimes the diktats on how to do something turn out for the best, and people get hooked on the new interfaces and paradigms (take as an example the original iMac’s lack of a floppy disk drive; I wouldn’t be surprised if Apple were at some point to drop optical drives across their whole line of computers and then ship OSX on read-only USB media). This might create a trend that is then followed by other developers, or manufacturers, as well. Without getting into the merits of the iPhone versus the wave of Android phones, just think of when Apple pushed iTunes with their iPods: the average Windows user used WinAMP before, and iTunes has a completely different interface; on Linux, XMMS first and Audacious later were the norm, both using the same interface as WinAMP. After iTunes, and on Linux especially after Amarok (around version 1.3), we have a number of playlist-centric players instead.

Now, once upon a time, KDE users and developers laughed at GNOME’s purported usability studies that hid all the settings, and caused Nautilus to become “spatial” (I remember one commenter on the issue, supporting the then-new spatial Nautilus by saying that tabbed browsing wasn’t usable because it would have been the same as glueing together newspapers to read them… now that was a silly thing to say, especially in that context). With time the situation reversed, at least for a while, with KDE deciding to “move for usability” and “new concepts” with KDE 4… and breaking the shit out of it all, for many people, me included. I think a very iconic example here would be some of the complaints I heard about the latest Amarok development in #gentoo-it: that the application is supposedly made “more usable” by changing so many things around that even long-time users can’t feel at home any longer.

While Amarok always had this edgy feeling that it could screw up your workflow by simply deciding that something is better done the opposite way from before, it worked out because the ideas caught on pretty quickly: people moaned and ranted, but after a month or two nearly everybody was enthusiastic, and wondered why the other players didn’t do the same. This trend seems to have changed with Amarok 2, as I have heard almost nothing but rants, and very few enthusiasts outside of the core developers. And I’m not speaking about the technical side of things here (like the usage of MySQL Embedded — which in my opinion has been a very bad move… mostly because MySQLe was definitely not ready at the time, as Jorge might tell you).

But my safe haven of GNOME is starting to feel disturbed; while I’ve read good things about the “Usability Hackfest” that happened a couple of weeks ago in London, sponsored among others by Canonical if I recall correctly, some of the posts coming from there looked positively worrisome. In particular, Seth Nickell’s posts about “Task Pooper” (maybe I’m biased, but projects choosing such names feel like a very bad start to me) reminded me a lot of Seigo’s posts about Plasma, and while I hear most people are happy with it as currently implemented, I also remember the huge rants during the first iterations, when the whole interaction was designed out of thin air… I’ll quote the Ars Technica article (whose title is, in my opinion, a bit too forceful):

Despite his protest that the new design isn’t “handwavy,” I had a hard time seeing how all the pieces fit together after reading the initial document. [snip]

Actually, I think Nickell went on to say that his design, as it stands now, is not exactly what he made it out to be. Going all the way to declaring the New Majestic Paradigm Of Desktops is the first bad move if you want something good, I think. Not only will it pile a lot of expectations onto a project that is, for now, just designed out of thin air, but it also makes him sound way too convinced about his own stuff. I like it much better when designers are not so convinced about their stuff, as that means they’ll think about it a lot more… it’s a challenge of second-guessing oneself and improving step by step. If you think you have already reached the top, you’re going to stop thinking about it.

At any rate, the point I wanted to make was simply that people need to complain and to rant about things, if you want them to be good. So please don’t always take my rants as negative: I do rant, and sometimes I rant a lot, but I usually do so because I want to improve the situation.

P.S.: if GNOME 3 turns out to break as many things as KDE 4.0 did, I might consider trying whatever the latest version of KDE is at that time. Unfortunately I have heard too many bad things about KMail eating email… so I’m still a bit wary. I really like the idea of GNOME developers already working on 3.0, even though 2.30 is still to be released… branching is good!

Sorry Sput, but Quassel has to go from my systems

I’m going to get rid of Quassel in the next few days unless something changes drastically, but since I really think that Sput has been doing a hell of a good job, I’d like to point out what, in my opinion, the problems are.

There’s nothing wrong with the idea (I love it) nor with the UI (it’s not bad at all); having it be cross-platform also helps a lot. What I really feel is a problem, though, is the dependencies creeping into it. That is not Sput’s fault for the most part, but it is a good example of why I think Qt and KDE development is getting farther and farther from what I liked about it in the past.

With KDE, the last straw was when I noticed that to install Umbrello I had to install Akonadi, which in turn required me to install MySQL. I don’t use MySQL myself: I used it for a couple of web development jobs, but I’d really like it to stay stopped, since I don’t need it on a daily basis. On the other hand, I have a running PostgreSQL instance that I use for my actual work, like the symbol collision analysis. I doubt it would actually have required me to start MySQL or Akonadi to run Umbrello, but the problem is with the build system. Just like the KDE guys bastardised autotools into one of the most overcomplex build systems man was able to create for the KDE 3 series, they have now made CMake even worse than it is as released by Kitware (which, on the other hand, somehow seems to have made it a bit less obnoxious — not that I like it any better, but if one really needs to build under Windows, it can be dealt with better than some custom build systems I’ve seen).

So the new KDE4 build system seems to pick up the concept of shared checks from KDE3, which basically turns out to be a huge number of checks that are unneeded by most of the software but executed by all of it, just because actually splitting the “modules” into per-application releases, like GNOME already does, is just too difficult for SuSE, sorry, KDE developers.

This time the dependency creep hit Quassel badly. The recent releases of Quassel added a dependency on qt-webkit to show a preview of a link when it is posted in IRC. While I think this is a bad idea (because, for instance, if there were a security issue in qt-webkit, it would be tremendously easy to get users to load the offending page), and it still has implementation issues when the link points to a big binary file rather than a webpage or an image, it can be considered a useful feature, so I never complained about it.

Today, after setting up the new disks, the update proposed by Portage contained an interesting request to install qt-phonon. Which I don’t intend to install at all! The whole idea of having to install Phonon for an application like Quassel is just outside my range of acceptable doings.

I was the first to complain that GNOME required/requires GStreamer, but thanks to Lennart’s efforts we now have an easy way to play system sounds without needing GStreamer; KDE, on the other hand, is still sticking with a huge number of layers and complex engines to do the easiest of tasks. I’m not saying that the ideas behind Solid and the like are entirely wrong, but it does feel wrong for them to be KDE-only, just like it feels wrong for other technologies to be GNOME-only. Lennart’s libcanberra shows that there is room for desktop-agnostic technologies implementing the basic features needed by all of them; it just requires work and coordination.

So now I’m starting up Quassel to check on my messages, and then I’ll log out of it, after installing X-Chat or something.

Prank calls and threats

Disclaimer: please take this post with a grain of salt and consider it tongue-in-cheek. While the situation it sprouts from is very real, the rest is written in a joking tone. In particular, I wish to state beforehand that I’m not trying to tie anything to Ciaran and his group, or any other group for that matter.

As it turns out, last night some very funny guy thought it was a nice idea to call me and tell me I’m going to die, obviously with the caller ID withheld. It’s almost certainly a prank call (as a good rationalist: 99.98% a prank call, 0.02% you never know), but with the cold and the meds in me, I didn’t have the quickness of response to say “you go first, and don’t spoil for me how it is”.

Just to cover all bases, I’m now considering who might actually want me dead. Which, it turns out, if we consider extreme personality cases like Hans Reiser’s, might be quite a few people. I wouldn’t count Ciaran on the list though, since a) I respect him enough to trust he wouldn’t do it anonymously if he wanted to, and b) Stephen is more the kind of person to slander rather than threaten. Besides, that area has been quiet for long enough that I had almost forgotten about it.

The last time I was threatened was at the time of the XMMS removal, more than two years ago by now. I don’t think this is related to that at all. But staying on the multimedia side of the fence, I can see a possible issue with people disliking PulseAudio for no good reason (the link is to a positive post); but even though I do lend a hand to Lennart with autotools, I sincerely doubt that my involvement is enough for people to want to get rid of me just for that.

It could have been some anti-Apple activist gone crazy over my previous post praising some of Apple’s products; I guess I should have started with a list of things I don’t like about Apple, or with a list of things I do for Free Software each day, which is not going to stop just because I can settle for Apple products for now. But chances are higher that, if somebody wants me dead, it’s for dissing some project he likes; it’s not like I haven’t criticised quite a few before, like cmake.

But if we expect this to be tied to something that happened recently, I shouldn’t rule out my criticism of Ruby 1.9, as well as my not-so-recent move from KDE to GNOME (I have to say: why is it that if I move from KDE to GNOME, having been a KDE developer, it doesn’t even make a news site, while if Linus does he gets on LWN? I dare say this is unjust!). These sound more likely to attract crazy guys, just because they might feel “betrayed”, since I was on their side before and then turned away; whereas as far as XMMS, PulseAudio, Apple and CMake are concerned, I haven’t changed my opinion (much).

Another option, if we follow what the mass media have shown of black-hat hackers (even outside our general Western culture), is that somebody got upset about my recent security-oriented work, either because I found some security issue that they tried to keep hidden, or because another security analyst is upset that I found so many issues all at once.

All in all, I guess I would have enough reasons to worry if enough FOSS people suffered from the Reiser syndrome. Hopefully this is not the case. The good news is that nobody has left me threats on the blog or via e-mail, so I really don’t think I have to worry about FOSS people. And please, if you think it would be fun to leave some now, just don’t, okay?

For A Parallel World. Case Study n.6: parallel install versus install hooks

Service note: I’m starting to fear for one of my drives; as soon as my local shop restocks the Seagate 7200.10 drives I’ll go get two more to replace the 250GB ones and put those under thorough testing.

I’ve already written in my series about some issues related to parallel install. Today I wish to show a different type of parallel install failure, which I found while looking at the logs of my current tinderbox run.

Before starting, though, I wish to explain one thing that might not be tremendously obvious to most people not used to working with build systems. While parallel build failures are most of the time related to non-automake-based build systems, which fail to properly express dependencies, or in which the authors mistook one construct for another, parallel install failures are almost always related to automake. This is due to the fact that almost all custom-tailored build systems don’t allow parallel install in the first place. For most of them, the install target is just one single serial rule, which always works fine even when using multiple parallel jobs, but obviously slows down modern multicore systems. Since automake supports parallel install targets, which makes installing packages quite a bit faster, it also adds the complexity that can cause parallel install failures.

So let’s see what the failure I’m talking about is; the package involved is gmime, with the Mono bindings enabled; Gentoo bug #248657, upstream bug #567549 (thanks to Jeffrey Stedfast, who quickly solved it!). The log of the failure is the following:

Making install in mono
make[1]: Entering directory `/var/tmp/portage/dev-libs/gmime-2.2.23/work/gmime-2.2.23/mono'
make[2]: Entering directory `/var/tmp/portage/dev-libs/gmime-2.2.23/work/gmime-2.2.23/mono'
make[2]: Nothing to be done for `install-exec-am'.
test -z "/usr/share/gapi-2.0" || /bin/mkdir -p "/var/tmp/portage/dev-libs/gmime-2.2.23/image//usr/share/gapi-2.0"
test -z "/usr/lib/pkgconfig" || /bin/mkdir -p "/var/tmp/portage/dev-libs/gmime-2.2.23/image//usr/lib/pkgconfig"
/usr/bin/gacutil /i gmime-sharp.dll /f /package gmime-sharp /root /var/tmp/portage/dev-libs/gmime-2.2.23/image//usr/lib
 /usr/bin/install -c -m 644 'gmime-sharp.pc' '/var/tmp/portage/dev-libs/gmime-2.2.23/image//usr/lib/pkgconfig/gmime-sharp.pc'
 /usr/bin/install -c -m 644 'gmime-api.xml' '/var/tmp/portage/dev-libs/gmime-2.2.23/image//usr/share/gapi-2.0/gmime-api.xml'
Failure adding assembly gmime-sharp.dll to the cache: Strong name cannot be verified for delay-signed assembly
make[2]: *** [install-data-local] Error 1
make[2]: Leaving directory `/var/tmp/portage/dev-libs/gmime-2.2.23/work/gmime-2.2.23/mono'
make[1]: *** [install-am] Error 2
make[1]: Leaving directory `/var/tmp/portage/dev-libs/gmime-2.2.23/work/gmime-2.2.23/mono'

To make it much more readable, the command and the error line in the output are the following:

/usr/bin/gacutil /i gmime-sharp.dll /f /package gmime-sharp /root /var/tmp/portage/dev-libs/gmime-2.2.23/image//usr/lib
Failure adding assembly gmime-sharp.dll to the cache: Strong name cannot be verified for delay-signed assembly

So the problem comes from the gacutil program, which in turn comes from Mono, and which seems to be working on the just-installed file. But was it installed? If you check the complete log above, there is no install(1) call for the gmime-sharp.dll file that gacutil complains about, and indeed that is the problem. Just as I experienced earlier, Mono-related error messages need to be interpreted to be meaningful. In this case, the actual error should be a “File not found” over /var/tmp/portage/dev-libs/gmime-2.2.23/image//usr/lib/mono/gmime-sharp/gmime-sharp.dll.

The rule that causes this is, as make reports, install-data-local, so let’s check that in the mono/Makefile.am file:

install-data-local:
        @if test -n '$(TARGET)'; then \
          if test -n '$(DESTDIR)'; then \
            echo "$(GACUTIL) /i $(ASSEMBLY) /f /package $(PACKAGE_SHARP) /root $(DESTDIR)$(prefix)/lib"; \
            $(GACUTIL) /i $(ASSEMBLY) /f /package $(PACKAGE_SHARP) /root $(DESTDIR)$(prefix)/lib || exit 1; \
          else \
            echo "$(GACUTIL) /i $(ASSEMBLY) /f /package $(PACKAGE_SHARP) /gacdir $(prefix)/lib"; \
            $(GACUTIL) /i $(ASSEMBLY) /f /package $(PACKAGE_SHARP) /gacdir $(prefix)/lib || exit 1; \
          fi; \
        fi

So it’s some special code that is executed to register the Mono/.NET assembly with the rest of the system; it does not look broken at first glance, and indeed this is a very subtle build failure, because it does not look wrong at all unless you already know automake well enough. The build log, though, helps a lot in finding this out.

The gmime-sharp.dll file is handled as part of the DATA class of files in automake, but install-data-local does not depend on them directly, and its execution order is not guaranteed by automake at all. The install-data-hook rule, on the other hand, is called after install-data has completed, and thus after the DATA files are actually installed. So the solution is simply to replace -local with -hook. And there you go.

Next…

Security mitigation strategies or, the only secure computer is the one yet to be assembled.

Short preamble: I’m in a very depressed mood, like I haven’t been in months; this is very bad for my health, but it usually means I can focus on things much better, so you might actually find I’m getting more done than usual. Of course you also have to factor in that I’m working through the holidays, so it’s not going to be all nice, even leaving my depression aside.

As I’ve written before, I don’t trust closed-source software in the slightest; and even though free software is not necessarily much better, process-wise, at dealing with bundled libraries (as the bundled libs bug shows), with free software, or at least open-source software, there is at least the chance to check out the sources and fix whatever issues turn up.

This means that I won’t be using closed-source software where security is a major concern, but since sometimes I have to use closed-source software, like Skype or Sun’s compiler, it’s obvious that I have to find a compromise so I can still use it and yet feel reasonably safe. This is what is usually called having a mitigation strategy.

One of the most complex and well-known mitigation strategies is of course SELinux, which makes a Linux system more like an APC than a computer. But such a setup can probably safely be considered overkill for most systems, especially power-user desktop systems.

Since this is, as I said, overkill, I’m more prone to look at smaller strategies, one of which I have already discussed: pam_mktemp. This module allows creating per-user private directories that make it much harder to exploit insecure temporary file vulnerabilities. Which is very nice, since this seems to be a very common class of vulnerabilities, and my data shows that there is way too much software that still uses insecure functions to create temporary files, closed and open source alike.

Unfortunately, as you can read in my earlier blog post, this is not automatically a way out of the problem. The start-stop-daemon command from OpenRC plays nice with this only as of the latest release, and even with that, there are problems. The first problem is that, the way pam_mktemp works, the software calling PAM needs to open the session in order to properly set up the environment with its changes (which is what s-s-d lacked in previous versions). This causes, for instance, the gnome-keyring daemon to start with the wrong temporary directory when started by the PAM session chain: even though pam_mktemp is invoked before the daemon, by the time the daemon is started the TMPDIR variable is not yet set in the environment. The reason for this is that the variable should not be changed if the session chain aborts the login.

The second problem is that not all software supports TMPDIR properly; Emacs has been fixed recently, and the emacs daemon now starts up properly, but other software ignores TMPDIR altogether. VirtualBox (about which I still have more to say) does not respect it, for instance, which means that the module wouldn’t have spared you from the recent vulnerability that involved that software.
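
For reference, here is a minimal sketch of what supporting TMPDIR properly looks like (generic example code, not taken from any of the projects mentioned): honour the variable when it is set, fall back to /tmp otherwise, and create the file with mkstemp() rather than with a predictable name:

#include <stdio.h>
#include <stdlib.h>

/* Create a private temporary file in the directory named by TMPDIR,
 * falling back to /tmp; mkstemp() picks a non-predictable name and
 * returns an open file descriptor, or -1 on failure. */
int open_private_tempfile(char *path, size_t pathlen)
{
    const char *tmpdir = getenv("TMPDIR");

    if (tmpdir == NULL || *tmpdir == '\0')
        tmpdir = "/tmp";

    snprintf(path, pathlen, "%s/myapp-XXXXXX", tmpdir);
    return mkstemp(path);
}

Software that hardcodes /tmp instead, or that assumes other users can reach the directory, defeats the per-user directory that pam_mktemp sets up.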

The third problem is that sometimes software expects the TMPDIR directory to be world-readable, which is a bad assumption; Samba does this, and since s-s-d is now fixed, it now fails to work on my system. I still haven’t found out whether the PAM session chain is called at that point, and it just duplicates the s-s-d problem with a different symptom, or whether it fails to call it entirely. In either case, it’s something that has to be fixed to make sure that mitigation strategies like this one can become part of users’ default setups.

But again, this is just one part of the problem, and one part of the mitigation. Other problems relate to the way we run some of the services, a lot of which still run as root rather than under an unprivileged user; while the git-daemon issue is now solved and the default install does not run as root any longer, there are more daemons with the same problem.

Just as an example, I noticed that the iSCSI daemon ietd still runs as root, and I’ve added that to the list of software I have to check to see if I can improve it. Similarly, the init script for mpd does not use s-s-d to switch users but leaves that to mpd itself, spawning it by default with unneeded root privileges, and additionally preventing pam_mktemp from creating a new temporary directory for the mpd user (I have to spend some time on that, since I’d also like to provide an alternative init script with multiplexing, which would then allow running multiple mpd instances for different users, and in my case having the single mpd run as my own user rather than as a different user entirely).

At any rate, I’m going to keep doing my best to make sure that secure defaults are in place in Gentoo, and that further mitigation strategies can be made available, so that users forced to use proprietary closed-source software don’t need to just accept whatever comes their way. Please join my efforts, if you can, by checking which software ignores TMPDIR and asking upstream nicely to fix the issue.

For A Parallel World. Case Study n.1: automake variables misuse

Following my post about parallel builds, today I started to tackle some issues with packages not building properly with parallel make. Most of them end up being quite easy to fix; some of them don’t have to be fixed at all and just need the -j1 dropped from the ebuild, because they already build fine (this is usually due to an older version failing and the ebuild never being revisited).

As I haven’t yet been able to find the time and energy to go back to writing full-fledged guides (the caffeine starvation doesn’t help), I decided to start writing some “case studies”. What I mean is that I’ll try to blog about some common problems I find in a particular package, and show the process of fixing them. Hopefully, this way it’ll be easier for others to fix similar problems in the future. This also goes toward the goal of showing more of what Yamato does (by the way, once again thanks to everybody who contributed, and you are all still able to chip in if you want to help me).

The first case study on the list is libbtctl (which I think is deprecated, from what I can understand of its author’s comment).

When building with -j8 (and dropping the ebuild serialisation), the build will fail with an error similar to this:

libtool: compile:  x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I.. -g -I../intl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include -I/usr/include/pygtk-2.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/python2.5 -I/usr/include -DDATA_DIR="/usr/share/libbtctl" -DGETTEXT_PACKAGE="libbtctl" -march=barcelona -O2 -ftracer -pipe -ftree-vectorize -Wformat=2 -Wno-error -Wno-pointer-sign -g -ggdb -Wstrict-aliasing=2 -Wno-format-zero-length -MT btctl-pymodule.lo -MD -MP -MF .deps/btctl-pymodule.Tpo -c btctl-pymodule.c -o btctl-pymodule.o >/dev/null 2>&1
libtool: compile:  x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I.. -g -I../intl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include -I/usr/include/pygtk-2.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/python2.5 -I/usr/include -DDATA_DIR="/usr/share/libbtctl" -DGETTEXT_PACKAGE="libbtctl" -march=barcelona -O2 -ftracer -pipe -ftree-vectorize -Wformat=2 -Wno-error -Wno-pointer-sign -g -ggdb -Wstrict-aliasing=2 -Wno-format-zero-length -MT btctl-py.lo -MD -MP -MF .deps/btctl-py.Tpo -c btctl-py.c -o btctl-py.o >/dev/null 2>&1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btlist] Error 1
make[3]: *** Waiting for unfinished jobs....
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btctl-async-test] Error 1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btctl-discovery-test] Error 1
libtool: link: cannot find the library `libbtctl.la' or unhandled argument `libbtctl.la'
make[3]: *** [btsignal-watch] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

It’s an easy error to understand: it cannot find libbtctl.la, piece of cake. Finding the cause is more of a problem if you don’t know it beforehand.

The first thing to note here is that the build system used is standard autotools; standard autotools, if used with their internal rules, are not subject to parallel-make failures. They don’t build directories in parallel, but they build everything else with as much parallelism as they can. This means that the package is either using a custom rule, or misusing autotools.

Another common cause of “cannot find the library” errors from libtool is when the library is in a different directory and the order of the subdirectories is wrong; this rarely creeps into the distributed tarball, if upstream is smart enough to run make distcheck or at least to build their own tarballs, but you never know; usually you find this while trying to change the way interdependent libraries link against each other so that they can be built with --as-needed.

But there’s a tell-tale sign in the message: the library is not prefixed with any path, so it’s not being built in a different directory but in the same one. This makes it very suspicious.

The first error comes from btlist, so let’s extract the source tarball and look at src/Makefile.am (because that’s the most likely directory where it is defined; we could have grepped, but it’s easier this way):

noinst_PROGRAMS=btlist [...]

[...]

btlist_LDFLAGS = \
        libbtctl.la  $(BTCTL_LIBS) \
        $(BLUETOOTH_LIBS) $(OPENOBEX_LIBS)

What do you know? This is the only property defined for the btlist target, and indeed it doesn’t look right: the LDFLAGS variable should be used to pass flags to the linker (like -Wl,--as-needed), not the names of libraries. Even worse, names of libraries that have to be built as prerequisites of the target.

Edit: Rémi pointed out that I didn’t give the actual solution here, for those who don’t know automake so well. The correct variable to pass the libraries in is either LIBADD (for other libraries) or LDADD (for final executables). As btlist is in PROGRAMS, the latter is what we need to use.

And obviously the same mistake is repeated for almost every target in the Makefile.am. But luckily there’s a very active upstream, and the bug can be solved the same day it is reported.

It’s not so difficult once you see how to do it, is it?

Dualhead, 16:10 and XRandR

With my move to Gnome I decided to try out the graphical login again. I used to use the standard console login before, because KDM lacked too many features, and while for some time I kept using GDM, it just didn’t feel right. Unless I was just going to do some system administration, or test PAM, the only command I was going to type was startx, followed by sudo shutdown -h once I was done.

To do so, I had to give up keychain, at least for entering the private key’s passphrase. Instead of using that, I decided to change my PAM setup to use pam_ssh (and this is also what brought me to plan the removal of pam_ssh_agent… I just remembered I forgot to add that to the Gentoo Calendar! — done!), so that instead of typing my password and my SSH key passphrase, I just have to type in my passphrase at login and I’m done. The nice thing is that the standard Unix password, which is much shorter, is also accepted, so I can use that to get out of the lock screen that Gnome applies when I walk away from the system.

Again, I will probably put in a request for an electronic ID card next September and consider using a smartcard reader (I wonder if I can use old, invalid credit cards as smartcards too, once they are formatted).

So okay, I’ll write about pam_ssh some other time; it’s not what I want to write about right now.

With GDM I’m having one problem: my current setup is dualhead, with a 16:10 monitor (and a 4:3 one), using XRandR 1.2 (radeon driver). I have the screen layout set up in my xorg.conf, but I’m not able to set the proper mode for the 16:10 monitor in the configuration file. If I set PreferredMode in xorg.conf, X11 refuses to start, without any apparent error either.

So for now I’m running two xrandr commands at the session-manager level to set up the screens properly, but there is one problem: synergys does not seem to expect the geometry of the screens to change, so it still wraps my pointer out at the coordinates where it should have wrapped before the xrandr fixes.

I should probably try the latest masked version of xorg and check whether it works there. Although I do think Synergy should have considered the case of a changing screen layout. Maybe one day it will be rewritten using XCB, and it will handle that problem too ;)

Oh yeah I should be writing about XCB too…

I’m running Gnome

As it turns out, I’m starting to dislike the way the KDE project is proceeding, and I don’t mean the Gentoo KDE project, but the KDE project as a whole.

I dislike the way KDE 4 is being developed, with a focus on eye candy rather than on features. This is easily shown by the Oxygen style; not only does it take up an amount of screen real estate for widgets that reminds me of Keramik (and if you remember, one thing that made a huge number of users happy was the switch from Keramik to Plastik as the default style in KDE 3.3), but it’s also tremendously slow. And I’m sure of this, it’s not just an impression: as soon as I switch Qt to use Oxygen, it takes five seconds for Quassel to draw the list of buffers; once I use QtCurve, it takes just one second. I don’t know whether this is because Enterprise is using XAA and not EXA, but it certainly doesn’t look like something the default theme should do.

And no, I shouldn’t be expected to use a computer that’s less than a year old, with a hyper-powerful gaming video card, just to be able to use KDE.

But this is just one of the issues I have had with KDE recently. There are some policies I really, really dislike in KDE. The first is one I have already mentioned quite often: the move to CMake. The only “good” reason to move to CMake is to be able to build under Windows using Microsoft’s Visual C++ compiler; yet instead of just saying “we needed cmake because it’s necessary to build for Windows”, I see so many devs saying “cmake is just better than everything else out there”. Bullshit.

The other policy I dislike regards the way KDE is developed and released as a single, huge, monolithic thing. One of the things that made KDE difficult to package in Gentoo (and other source-based distributions) was the fact that, by default, the source has to be built as those huge amorphous packages. And if the autotools-based build system of KDE sucked so much, it was also because of that.

But even leaving aside the way the releases are made, it’s just not possible for everything to fit into a single release cycle. There are projects that are more mature and projects that are less so. Forcing all of them into a single release cycle makes it difficult to provide timely bugfixes for the mature projects, and makes it impossible for the not-so-mature projects to be tested incrementally. The last straw I could bear to see from this stupid way of releasing was learning that Konversation in KDE 4 will probably lose IRC-over-SSL support because KSSL was removed from the base libraries.

And now KDE 4.1 is on the verge of release, and Kopete still segfaults as soon as you connect to Jabber. Yet when I tried (multiple times) to gather information about the possible cause in #kopete (so I could at least try to debug it myself), I got no feedback at all; maybe it’s because I run Gentoo, although the same happens on (K)Ubuntu. Yeah, not the kind of people I like to deal with.

I’m not saying that I think Gnome is perfect in its policies and other things. I dislike the fact that it’s increasingly Linux-/Solaris-centric rather than cross-platform-centric; but from what I read, I think KDE4 was a setback in that regard too. And Gnome’s release method does look a lot saner.

I started using Linux with KDE 2. I moved to Gnome while KDE 3 was being worked on. I came back to KDE just a bit before the 3.3 release. Now I’m going to try Gnome for a while, and if I like it, I’ll think more than twice before going back to KDE. Yeah, sure, I liked KDE 3 better than I liked Gnome before it, but it’s just not feasible for me to have to switch desktop environments every time they want to make a new release.

Besides, since I last used it, Gnome seems much more mature and nicer to deal with.