Yesterday I ranted about installing RT for lighttpd, and it was suggested that I look at OTRS. I did, and even though the lack of webapp-config support/requirement was appealing, having everything installed in /var/lib didn’t sound too good, so I simply set it aside and tried again with RT.
Since the whole lighttpd issue at least let me read an error, I decided to assume that the PostgreSQL bug I was hitting was the reason why it wouldn’t load webmux.pl either, which meant I could then try Apache once again (I did change the user’s home directory so that it wouldn’t try to use /dev/null). Lo and behold, I was right: webmux.pl was loaded properly… but then I got Apache to crash, exactly like it did on my original server (where the PostgreSQL bug wouldn’t have hit in the first place). Okay, time to debug it.
Since this time it didn’t require me to keep the production web server offline to debug, I decided to spend some time and bandwidth to get to the bottom of the problem. So after a rebuild of Apache, apr and mod_perl with debug symbols, a push to the server, and a binpkg reinstall, I was looking at the backtrace, which pointed at this little piece of code:
/* httpd core open_logs handler re-opens s->error_log, which might
 * change, even though it still points to the same physical file
 * (.e.g on win32 the filehandle will be different. Therefore
 * reset the tracing logfile setting here, since this is the
 * earliest place, happening after the open_logs phase.
 * Moreover, we need to dup the filehandle so that when the server
 * shuts down, we will be able to log to error_log after Apache
 * has closed it (which happens too early for our likening).
 */
MP_RUN_CROAK(apr_file_dup(&dup, s->error_log, pconf),
             "mod_perl core post_config");
In particular, the call to apr_file_dup gets NULL as its second parameter. What’s going on here? Well, it’s a bit complex. The first problem is that, for whatever reason, up to last night the mod_perl ebuild enabled debug information and tracing by default, without controlling that with a USE flag. While it is true that if you didn’t enable tracing you wouldn’t get any extra log, the tracing code, as you can see, is quite invasive, so I want to thank Christian for fixing this up with a new debug USE flag since last night.
Anyway, when tracing support is built into mod_perl (even though it is not enabled in the configuration file), the module decides to save a copy of the file pointer in its own configuration, to know where to output tracing if it ever gets requested. But what it doesn’t check is whether said pointer exists at all.
In my case, Apache is set with ErrorLog syslog; this is simply because I pass most of the information to metalog, which can then filter down the information, and take care of rotating the log files by itself. This is also one of the very few differences between the Apache configuration for xine-project.org and my servers, since the former just doesn’t get as many ModSecurity hits (it doesn’t have that module enabled at all, to be precise).
It goes without saying that when the error log is set to output to syslog, and not to a file, there is no s->error_log file pointer, which explains the NULL parameter and, in turn, the Apache crash. I reported this as issue #75240, which is yet to be fixed.
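To make the missing check concrete, here is a minimal sketch of the kind of guard I mean. It is not the mod_perl code and not a proposed patch: struct server_sketch and trace_fd_for are hypothetical names of mine, and I use a plain FILE pointer and POSIX dup() in place of APR's file type and apr_file_dup, just to keep it self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the one field of Apache's server record
 * that matters here; the real one holds an APR file handle instead. */
struct server_sketch {
    FILE *error_log;   /* NULL when errors go to syslog, not a file */
};

/* Return a duplicated descriptor to use for tracing, or -1 when the
 * server has no error log file to duplicate (or when dup() fails). */
int trace_fd_for(const struct server_sketch *s)
{
    if (s->error_log == NULL) {
        /* ErrorLog syslog case: nothing to dup, so don't try. */
        return -1;
    }
    return dup(fileno(s->error_log));
}
```

With a check like this, a server configured for syslog would simply end up without a tracing file, rather than handing a NULL pointer to the duplication call.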
So I spent the best part of a month fighting with a very stupid bug. Lovely.
Unfortunately the painful installation is not done yet. There is one more issue: the var directory within the RT installation is marked as writeable by the user the code will run as (so the rt user in case of lighttpd/FastCGI, and apache in case of Apache’s mod_perl)… but the mason_data sub-directory isn’t! And of course Mason needs to write its cache files there, which are then served to the user. So the default installation still requires quiiite a bit of fiddling.
Are we done yet? Not a chance. As you might have guessed at this point, if you know the software I’m working with, I encountered another bothersome point: when using mod_perl, Apache does not drop privileges to a different user, which means that all the RT code is processed as the Apache user. And I don’t like that at all.
It is true that just dropping privileges is not enough all by itself: before moving to a KVM guest, I had quite a few problems with Passenger dropping to a different user but not applying resource limits, which didn’t solve the problem of a too-greedy Ruby process that should have been killed and wasn’t. Luckily you can solve that with grsec by enforcing per-user limits.
So my next step is probably getting Apache’s FastCGI set up and getting RT to use that. After that I will probably move it back to this server, so that the remote PostgreSQL connection is avoided altogether, since the problem with Apache crashing has been found anyway, and even if I have to fall back to mod_perl I know it’ll work just fine as long as I keep the debug USE flag disabled.
Prepare to hear again from me, I’m afraid.