The pain of installing RT in Gentoo

Since some of my customers tend to forget what they asked me to do and then complain that I’m going over budget or over time, I’ve finally decided to bite the bullet and set up some kind of tracker. Of course, given that almost all of said customers are not technical at all, using Bugzilla was completely out of the question. The choice fell on RT, which has a nice email-based interface for them to use.

Setting this up seemed a simple process after all: you just have to emerge the package, deal with the obnoxious webapp-config (you can tell I don’t like it at all!), and set up the database and mod_perl for Apache. It turned out not to be so easy. The first problem is one I already wrote about at least in passing: Apache started segfaulting on me when loading mod_perl on this server, and I didn’t care enough to actually go and debug why.

But fear not: since I’ve already rented a second server, as I said, I decided to try deploying RT there; it shouldn’t be trouble, no? Hah, I wish.

The first problem was that Apache refused to start because the webmux.pl script couldn’t be launched. Which was at the very least bothersome, since it also refused to show me any error message besides repeating that it couldn’t load it. I decided against trying mod_perl once again and moved to a more “common” configuration: lighttpd with reverse proxying, using FastCGI.

And here the trouble starts getting even nastier. To begin with, FastCGI requires you to start RT with its own init script; the one provided by the current 3.8.10 ebuild is pretty much outdated and won’t work at all. I rewrote it (and I’ll see to pushing the rewrite into Portage soon), and got it to at least try starting up. But even then it won’t start. Why is that?

It has to do with the way I decided to set up the database: since the new server will at some point run a series of WordPress instances (don’t ask!), it’ll have to run MySQL, but there will be other web apps that should use PostgreSQL, and as long as performance is not that big an issue, I wanted to keep one database package per server. This meant connecting to PostgreSQL running on Earhart (which is on the same network anyway), and to do so, besides limiting access through iptables, I set it to use SSL. That was a mistake.
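For reference, the RT side of that connection is configured in RT_SiteConfig.pm along these lines; this is a sketch reconstructed from the DSN visible in the error output further down, and $DatabaseRequireSSL is what produces the requiressl=1 bit:

```perl
# RT_SiteConfig.pm -- sketch of the relevant database settings
Set($DatabaseType, 'Pg');
Set($DatabaseHost, 'earhart.flameeyes.eu');
Set($DatabaseName, 'rt3');
Set($DatabaseUser, 'rt');
Set($DatabaseRequireSSL, 1);
```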

Even though you may set authentication to trust in the pg_hba.conf configuration file, the client-side PostgreSQL library checks whether there are authentication tokens to use, which in the case of SSL can be of two kinds: passwords and certificates. The former is the usual clear-text password; the latter, as the name implies, is an SSL user certificate that can be used to validate the secure connection from one point to the other. I had no interest in using user certificates at that point, so I didn’t care much about procuring or producing any.
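On the server side, the pg_hba.conf entry looks something like this (the address is made up; the point is that trust on the server does not stop the client library from hunting for certificates):

```
# pg_hba.conf on the PostgreSQL server -- require SSL, but trust the
# connection coming from the web server's address
# TYPE     DATABASE  USER  ADDRESS          METHOD
hostssl    rt3       rt    192.0.2.10/32    trust
```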

So when I start the rt service (without using --background, that is… I’ll solve that before committing the new init script), I get this:

 * Starting RT ...
DBI connect('dbname=rt3;host=earhart.flameeyes.eu;requiressl=1','rt',...) failed: could not open certificate file "/dev/null/.postgresql/postgresql.crt": Not a directory at /usr/lib64/perl5/vendor_perl/5.12.4/DBIx/SearchBuilder/Handle.pm line 106
Connect Failed could not open certificate file "/dev/null/.postgresql/postgresql.crt": Not a directory
 at //var/www/clienti.flameeyes.eu/rt-3.8.10/lib/RT.pm line 206
Compilation failed in require at /var/www/clienti.flameeyes.eu/rt-3.8.10/bin/mason_handler.fcgi line 54.
 * start-stop-daemon: failed to start `/var/www/clienti.flameeyes.eu/rt-3.8.10/bin/mason_handler.fcgi'                                                                       [ !! ]
 * ERROR: rt failed to start

Obviously /dev/null is the home directory of the rt user, which is what I’m trying to run this as; of course it is not a directory itself, so trying to handle it as one will make the calls fail exactly as expected. And if you see this, your first thought is likely to be “PostgreSQL does not support connecting via SSL without a user certificate, what a pain!”… and you’d be wrong.

Indeed, if you look at a strace of psql run as root (again, don’t ask), you’ll see this:

stat("/root/.pgpass", 0x74cde2a44210)   = -1 ENOENT (No such file or directory)
stat("/root/.postgresql/postgresql.crt", 0x74cde2a41bb0) = -1 ENOENT (No such file or directory)
stat("/root/.postgresql/root.crt", 0x74cde2a41bb0) = -1 ENOENT (No such file or directory)

So it tries to find the certificate, doesn’t find it, and proceeds to look for a different one; if even that doesn’t exist, it gives up. But that’s not what happens in the case above. The reason is probably a silly one: the library checks that errno is ENOENT before ignoring the error, while the other values (like the ENOTDIR above) are likely considered fatal.
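The difference is easy to reproduce from a shell: a lookup under a path component that exists but is not a directory fails with ENOTDIR, while a plainly missing file fails with ENOENT:

```shell
# /dev/null exists but is not a directory -> stat(2) fails with ENOTDIR
notdir_msg=$(stat /dev/null/.postgresql/postgresql.crt 2>&1)
echo "$notdir_msg"

# a path that simply doesn't exist -> stat(2) fails with ENOENT
noent_msg=$(stat /no-such-file-anywhere 2>&1)
echo "$noent_msg"
```

Only the second error is the one libpq expects, which is presumably why the first one turns fatal.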

So how do you deal with such an issue? The obvious answer would be to make the home directory point to the RT installation directory, so that it’s also writeable by the user; in most cases this only requires you to set the $HOME variable, but that’s not the case for PostgreSQL, which instead decides to be smarter than that, and looks up the home directory of the user from the passwd file…
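To see that for yourself: getent answers from the same passwd database that getpwuid(3) uses, and it is completely unaffected by $HOME (a sketch, assuming a glibc-style system where getent is available):

```shell
# Override $HOME; anything honouring the variable would now see /nonexistent...
HOME=/nonexistent
export HOME

# ...but a getpwuid()-style lookup still returns the real passwd entry
passwd_home=$(getent passwd "$(id -u)" | cut -d: -f6)
echo "$passwd_home"
```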

So why not change the user’s home directory to the given directory, then? One reason is that you could have multiple RT instances on the same system, mostly thanks to webapp-config; another is that even with a single RT instance, the path to the installed code has the package’s version in it, so you would have to change the user’s home at each update, which is not something you should be looking forward to.

How to solve this, then? Well, there is one “solution”, which is what I’m going to do: set up RT on the same system as PostgreSQL, either with lighttpd or by using FastCGI directly within Apache, I have yet to decide. Then there is the actual solution: get the PostgreSQL client library to respect $HOME and, at the same time, make it not throw a fit if the home directory is not really a directory. I just don’t think I have time to dedicate to the real fix for now.

Switching to the Native American

So after yesterday’s post, and having seen mod_access_dnsbl (which I still haven’t tried, to be honest), I decided to move to Apache instead of lighty.

The reason for this is that lighttpd makes it very cumbersome to add new redirects every time a site has a broken link to my blog, like OSGalaxy does, and it also had a few quirks that started to look awful to me. Also, it was not well able to deal with high load, and spammers have caused me a lot of high loads lately; yesterday I found the server’s load at more than 60, which is not good.

Now, I know that the problem here was mostly in Typo itself and the fact that it does some crazy stuff, but still, Apache is helping with the situation here, with some things that I couldn’t get lighttpd to do properly.

For instance, the redirection of the huge list of articles that are linked around with the wrong names is now done through a RewriteMap backed by a Berkeley DB file, which should make lookups much faster. Also, mod_negotiation together with MultiViews will allow proper image selection for various browsers. The whole set of redirections that lighty was doing is now replaced by a series of RewriteRules.
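The setup looks more or less like this (file names are made up; the dbm file is compiled from a plain key/value text file with httxt2dbm):

```apache
# Build the map once:  httxt2dbm -i broken-links.txt -o broken-links.map
RewriteEngine On
RewriteMap fixlinks "dbm:/etc/apache2/broken-links.map"
# Redirect only when the requested path actually has an entry in the map
RewriteCond ${fixlinks:$1} !=""
RewriteRule ^/(.+)$ ${fixlinks:$1} [R=301,L]
```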

And of course Typo is now served through Passenger.

There are still a few issues left; I haven’t been able to make Bugzilla work with mod_perl, for instance. But it’s there at least, and hopefully this way I’ll soon be able to have it available over HTTPS rather than plain HTTP.

Anyway, I’ve left lighttpd configured, so if something does not seem to work properly, I’ll just switch back and look for new solutions.

Redirecting URLs after Typo update

This is a note post that might be useful to others, since I had to fight with this myself. Since I moved to the new version of Typo (the blog engine that powers this blog, which is in turn powered by Ruby on Rails), there has been a huge number of redirection hits on the server. The reason is that the application stopped using the /articles path prefix on all the pages, and moved the feeds (Atom and RSS) around. While Typo does take care of those requests by itself, it is far from optimal at it, since it hits the Ruby code just to redirect a URL, and seemed to keep the load average quite high. So I added this code to the list of redirections I already had (more on this later in this post):

url.redirect += (
    # Convert the old URL scheme to the new one at lighttpd level to avoid
    # hitting Rails' redirect controller (much slower than this)
    "^/xml/(atom?|rss)(10|20)?/(category|tag)/(.*)/feed.xml.*" => "/$3/$4.$1",
    "^/xml/(atom?|rss)(10|20)?/feed.xml.*" => "/articles.$1",
    "^/xml/(atom?|rss)(10|20)?/comments/feed.xml.*" => "/comments.$1",
    "^/articles/(.*)$" => "/$1",
)

This will translate the URLs directly in lighttpd so that the call won’t hit FastCGI, Ruby, Rails and all the rest.

I would sincerely start to think about moving to Cherokee if it were well supported by webapp-config (for Bugzilla); while I hate Cherokee’s configuration file with all its exclamation marks, lighttpd sometimes seems quite silly to me: mod_access, for instance, is only able to deny access by trailing strings. What if I want to deny access to a subtree (like /trackback or /admin on a non-SSL version of the server)?

And keeping a map that translates broken URLs into working ones is a mess, but I’m forced to, since OSGalaxy not only still miswrites my name, but also truncates the URLs to my actual blog entries. I started noticing this through Google’s Webmaster Tools control panel, since it reported 404s on URLs that I knew were broken; it took me a while to find where these URLs came from. In the meantime I have this huge redirection table that I update every three days with the new scan results from Google…

My reasons for an eventual resurrection of Gitarella

You might or might not remember gitarella, an old project of mine to write a replacement for gitweb. Some of the reasons why I started the project (like gitweb URLs being not so friendly, and other things) are probably no longer relevant, since gitweb has improved a lot. Gitarella, on the other hand, I haven’t worked on for such a long time that it is now almost certainly broken, since it uses deprecated git commands and so on.

For a while I thought of just giving up on Gitarella in favour of cgit, which is written in C and so should be much faster, but I didn’t look into it much more, because gitweb at the time was suiting my needs well enough. Since lately I’ve been working on quite a lot of different projects with some simple patches or series of patches, many of which are available in git repositories that I can just republish with my patches applied as a branch, my git repositories started looking tremendously heavy for the simple gitweb.

One thing I really can’t understand is how it is possible for gitweb not to generate any kind of cache for its pages. This is quite a mistake in my opinion; while admittedly it’s impossible to properly cache the index page, for instance, which requires a lot of queries, the commit description pages, the patch pages and so on can easily be cached, as they cannot really change (you can use the SHA-1 id as the key for those pages).
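The kind of caching I have in mind is trivial to sketch; render_page here is a hypothetical stand-in for whatever generates the HTML of a commit page, and the SHA-1 id works as the cache key because the content behind it never changes:

```shell
cache_dir=$(mktemp -d)

# hypothetical renderer: in gitweb's case this would be the expensive part
render_page() { echo "<html>commit $1</html>"; }

# serve from the cache, rendering (and storing) on the first miss only
cached_page() {
    if [ ! -f "$cache_dir/$1" ]; then
        render_page "$1" > "$cache_dir/$1"
    fi
    cat "$cache_dir/$1"
}

cached_page 0123abc
cached_page 0123abc   # second call never touches the renderer
```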

If I remember correctly what I was told a longish time ago, cgit does cache pages on disk, which would make it an ideal candidate, if it weren’t for the fact that it doesn’t execute git commands at all but rather links in libgit.a; feel free to check your dev-util/git install: there is no libgit.a, as that is not an officially-supported way to interface with git. As you can guess, I don’t like it.

So I guess I might resume my work on Gitarella; especially considering that lighttpd now supports SCGI, it looks quite interesting.

Running git-daemon under an unprivileged user

Some more notes about the setup I’m following for the vserver (where this blog and xine’s bugtracker run); the last post, I think, was similar, but about awstats. You can see the running thread: running stuff as a user rather than root, especially if the user has no access to any particular resource.

So this time my target is git-daemon; I already set it up some time ago to run as nobody (just by changing the start-stop-daemon parameters), but there is a problem with that: you either give access to the git repositories to the world, or to nobody, which might be used by other services too. Also, there has been a lot of development on git-daemon in the past months, to the point that it now really looks like a proper daemon, including privilege dropping, which makes it very nice with respect to security issues.

So here we go with setting up git-daemon to run as an unprivileged user; please note first of all that along these lines I’ll also apply the fix for bug 238351 (git-daemon reported crashed by OpenRC), since I’m doing the same thing locally and it looks quite nicer to do it all at once rather than multiple times.

I’m also going to assume a slightly different setup from the one I’m using myself, as it’s the one I’d like to convert to soon enough: a multi-user setup, with multiple system users (not gitosis-based) being able to publish their git repositories. This means that together with a git-daemon user, we’re going to have a git-users group for the users who are able to publish (and thus write to) git repositories:

# useradd -m -d /var/lib/git -s /sbin/nologin git-daemon
# groupadd git-users
# chmod 770 /var/lib/git
# setfacl -d -m u:git-daemon:rx /var/lib/git
# setfacl -m g:git-users:rwx /var/lib/git
# setfacl -d -m g:git-users:- /var/lib/git

This way, the users in the git-users group may create subdirectories (and files) in /var/lib/git, but only git-daemon would be able to read them (edit: I modified the commands so that other git-users don’t, by default, have access to other users’ subdirectories; each user can extend the access further if needed). Since most likely you want to provide gitweb (or equivalent) access to the repositories, you might also want to allow your webserver to read that; in my case, gitweb runs as the lighttpd user (I haven’t got around to splitting the various CGIs yet), so I add this:

# setfacl -m u:lighttpd:rx /var/lib/git
# setfacl -d -m u:lighttpd:rx /var/lib/git

This way lighttpd will have access to /var/lib/git and by default to all the subdirectories created.

Now it’s time to set up the init script correctly, since the one that dev-util/git ships at the moment will execute git-daemon only as root. We’re going to change both the start and stop functions, and the init script in general. The first step is to add a PIDFILE variable outside the functions; we’re going to call it /var/run/git-daemon.pid. The proper fix for the bug linked above allows an override for PIDFILE, but that is not important, to me at least, and should stay only in the official script. Of course /var/run is not writable by the git-daemon user, but that is not a problem: since git-daemon drops privileges by itself, it will create the pid file beforehand, while still running as root. In my original idea, I was thinking of using start-stop-daemon directly to switch to the git-daemon user, so that git-daemon wouldn’t run as root at all, but since the git wrapper is mostly a hack when it comes to running as a different user, I decided (for now) to let git-daemon handle its own privilege dropping. (It would be nice if some security people assessed which is the best way to handle this, so it can become official policy to follow.) In the start-stop-daemon case, a subdirectory of /var/run would have been needed, since the pid file would have to be written by an unprivileged user.

After adding the PIDFILE variable, you’ll have to change the invocation of start-stop-daemon: remove the --background option from ssd and replace it with the --detach option to git-daemon itself, and also add the --pid-file, --user and --group options to the latter; you also need to add the --pidfile option (look here: no dash in the middle!) to both invocations of ssd (not to the git-daemon parameters!). At that point, just restart git-daemon and you’re done. The final script looks like this:

#!/sbin/runscript
# Copyright 1999-2008 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/dev-util/git/files/git-daemon.initd,v 1.3 2008/09/16 17:52:15 robbat2 Exp $

PIDFILE=/var/run/git-daemon.pid

depend() {
        need net
        use logger
}

start() {
        ebegin "Starting git-daemon"
        start-stop-daemon --start \
                --pidfile ${PIDFILE} \
                --exec /usr/bin/git -- daemon ${GITDAEMON_OPTS} \
                --detach --pid-file=${PIDFILE} \
                --user=git-daemon --group=nogroup
        eend $?
}

stop() {
        ebegin "Stopping git-daemon"
        start-stop-daemon --stop --quiet --name git-daemon \
                --pidfile ${PIDFILE}
        eend $?
}

I also dropped the --quiet option to ssd, since I had to look for some error at startup some time ago (I also have the --syslog option configured through the conf.d file).

Although most of these changes could be handled by conf.d alone, I think it’d be nice if the default script shipped with dev-util/git included not only the fix for the bug but also the privilege dropping: running git-daemon as root is, in my opinion, craziness.

Making awstats run as an unprivileged user

I admit I never was a great sysadmin. My most important sysadmin work was almost a decade ago, for Dragons.it (the site has been dead for a long time now, too), on a Windows 2000 Server system on the other side of Italy (in Genova), which was running the site (quite simple), forum software (and I tried quite a few at the time) and the Sphere emulator for Ultima OnLine. If you were an Italian player at the time you might have heard of Dragons’ Land, or *Heaven*…

Anyway, since last November I’ve been sysadmining a vserver to keep this blog running, and the xine bugtracker too (and by the way, thanks again to IOS for the hosting). There were a few things that I left as TODOs before, and I’m now doing them as I find time for them.

One of these things is to let awstats run as an unprivileged user, instead of as root as it was doing before. I’m writing down what I did here, so that I’ll remember if I ever have to do this again.

The first step is of course to create an awstats user, and give it full access to its home directory:

# useradd -d /var/lib/awstats -s /sbin/nologin awstats
# chown -R awstats /var/lib/awstats

As the configuration files need to be read by that user too, let’s make them accessible read-only to it, leaving write access to root:

# chown -R awstats:root /etc/awstats
# chmod 570 /etc/awstats
# chmod 460 /etc/awstats/*

Now, you need to let this user access the webserver logs; in my case I’m using lighttpd, so the directory I keep the logs in is /var/log/lighttpd, which is owned by the webserver user. To be able to restrict the awstats user’s access, I need to use ACLs:

# setfacl -m u:awstats:r /var/log/lighttpd/*
# setfacl -m u:awstats:rx /var/log/lighttpd
# setfacl -d -m u:awstats:rx /var/log/lighttpd

Now it’s time to change the script that executes awstats, in my case over multiple virtual hosts. I’m not posting the whole script, as it’s quite fugly; the general rule is:

# su -s /bin/sh -c 'perl $path_to_awstats/hostroot/cgi-bin/awstats.pl -config=$your_config -update' awstats

Now of course your awstats CGI has to access both the configuration and the data files. ACLs come in useful again:

# setfacl -m u:lighttpd:r /etc/awstats/* /var/lib/awstats/*
# setfacl -m u:lighttpd:rx /etc/awstats /var/lib/awstats
# setfacl -d -m u:lighttpd:rx /etc/awstats /var/lib/awstats

Now of course you can guess that you cannot ask the CGI to parse the logs to regenerate the data, because it doesn’t have permission to write to the datafiles, but that’s exactly what I want :)

Now awstats runs with the lowest privilege possible, but is still able to access what it needs to; hopefully this is a nice mitigation strategy.

[Now of course if someone knows I made a mistake, I’d very much like to hear about it :)]

Summer of Code ideas for other projects

I know I already filled the Gentoo SoC project page with ideas, but I still have a few more to propose, for organisations which I’m not even sure will be in SoC themselves. Think of this post just as a braindump of stuff I’d like from other projects and which I would see as well suited for Summer of Code.

  • for lighttpd, a PAM-based authentication module, so that, for instance, all the xine developers with access to the server where the xine Bugzilla runs could also access a private HTTP directory on it with a single user and password database (the system one);
  • for libarchive (FreeBSD), built-in support for the LZMA (de)compression algorithm, so that it could handle GNU’s .tar.lzma files on its own;
  • for glib, a confuse-like configuration file parser, so that I could get rid of that dependency in unieject.

New blog engine: Episode III

I haven’t written about this in the last few days because I was pretty much tired of trying to get Roller working. Even after configuring it (which is far from trivial, as the instructions are not only MySQL-centric, but also make a lot of assumptions about the user being an advanced Tomcat user), I wasn’t able to get it to work: first I got some ClassNotFoundExceptions, then some NullPointerExceptions due to the configuration being “incomplete”. I was able to fix the CNFEs by copying over some of the system’s JARs rather than leaving in place the ones actually provided with Roller, as I had to copy over some other JARs anyway, since Hibernate isn’t distributed together with the package because of license restrictions. My last resort would have been writing a source ebuild for Roller, but I know nothing of Java ebuilds, let alone ant itself; and I’m on vacation.

Luckily, yesterday blojsom’s author David Czarnecki answered me, pointing me in the right direction. I cannot just use blojsom as it is, because, well, I do have some extra requirements that are not currently satisfied (being able to use Textile on new entries, rather than having to change the database every time, for instance), so I’ll have to look into JSP a bit and try to write an extra plugin to add a Textile checkbox to the page; I’ll probably also have to check why the <notextile> tag is not respected by the Textile rendering classes that blojsom is using (even though that string is inside the class files, or at least strings tells me so); I looked at it last night, but I was offline (thanks to my ISP, as usual) so I couldn’t get to the API documentation of textile4j to see what it was all about.

My database conversion script is not that bad right now: I can easily convert blog entries, comments, trackbacks and categories (although this is a special case for my blog, as I’ve been using English and Italian as categories, while on the new one I’ll be treating them as languages rather than as different categories, as blojsom does not allow more than one category per post); I found an easy way to get hold of the tags from Typo’s database and then import them into blojsom’s entry metadata, thanks to PostgreSQL and the ability to create SQL-level functions and aggregates. Basically, all I care about is now imported correctly, leaving out a few particularities, like the notextile thing I wrote above, and <pre> tags that I’m still unsure how to handle (basically I’d just have to keep the <pre> tags themselves and make sure that < and > are escaped as &lt; and &gt; inside them).
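The escaping step itself is a one-liner; here is a hedged sketch with sed (it doesn’t yet restrict itself to the contents of the <pre> tags, which is the part I’m unsure about):

```shell
# Escape the characters that HTML would otherwise interpret as markup.
# Order matters: & must be escaped first, or the later substitutions
# would get double-escaped.
escape_html() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

escaped=$(escape_html 'if (a < b && b > c)')
echo "$escaped"
```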

I do have to edit the template a bit if I want to make the links to the comments feed and the per-category feeds (which are needed for Planet Gentoo, for instance) more prominent. And I should make the conversion script take care of giving me a table of redirects between Typo’s permalinks and blojsom’s, so that whatever webserver I choose can maintain compatibility with links to the current blog. One thing I would actually miss from Typo is most likely the ability to browse by tag, but it shouldn’t be much of a problem, although I do plan to do something like changing a few tags (like xine) into categories to better suit what I actually take care of nowadays.

My problem now is deciding which server software will support my blog when I move it to blojsom. Lighty does not seem to really like doing reverse proxying for Tomcat (it gets timeouts during connection), I’m not sure how much Tomcat would like to start serving all the static content too, plus it does not support the FastCGI I use for Gitarella, and Resin still fails to work here. I should probably try older versions of Resin as nelchael suggested; I just haven’t had time yet (both today and tomorrow afternoon I’ll be busy, so I can’t check it out, and at night I’ve been either relaxing or working on something else lately).

Oh well, sooner or later, I’ll move the blog :)