ModSecurity and Debian, let the challenge begin

Some of you might have already read about my personal ruleset, which I developed to protect my blog from the tons of spam comments it receives daily. It is a set of configuration files for ModSecurity for Apache that denies crawlers, spammers and other malicious clients access to my websites.

I was talking with Jean-Baptiste of VLC fame over the past two days about using the same ruleset to protect their wiki, which has even worse spam problems than my blog. Judging from the logs j-b has shown me, my rules already cover most of the requests he’s seeing (a very positive note for my ruleset); on the other hand, configuring their web host to properly make use of them is proving quite tricky.

In Gentoo, when you install ModSecurity you get both the Apache module, with its basic configuration, and a separate package with the Core Rule Set (CRS). This split is an idea of mine to solve the problem of updating the rules, which are sometimes updated even when the engine itself is unchanged; that’s the whole point of making the rules independent of the engine. With the split package layout, the updater script that upstream ships alongside ModSecurity is not useful on Gentoo, so it’s not even installed, even though it is supposedly flexible enough that I could make it usable with my ruleset as well.

In Debian, though, the situation is quite a bit more complex. First of all, there is no configuration installed with the libapache-mod-security package, which only installs the file to load the module, and the module itself. At a minimum, for ModSecurity to work you have to configure the SecDataDir directive, and then give it a set of rules to use. The CRS files, including the basic configuration files, are installed by the Debian packages as part of the documentation, in /usr/share/doc/mod-security-common/examples/rules/.
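To give an idea, this is roughly the minimal configuration a Debian admin has to write by hand. A sketch only: the directives are standard ModSecurity 2.x ones, but the module path and the directory you copy the CRS into are illustrative choices of mine, not anything Debian provides:

```apache
# Load the module (Debian's package already provides this line).
LoadModule security2_module /usr/lib/apache2/modules/mod_security2.so

<IfModule mod_security2.c>
    # Turn the engine on and give it a writable directory for its
    # persistent collections; without a data directory nothing works.
    SecRuleEngine On
    SecDataDir /var/cache/modsecurity

    # Rules must be pulled in explicitly, e.g. a copy of the CRS
    # taken out of /usr/share/doc/mod-security-common/examples/rules/.
    Include /etc/modsecurity/*.conf
</IfModule>
```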

I’ve now improved my ruleset to provide an init configuration file that can be used without the CRS… but it seriously makes me wonder how Debian admins deal with ModSecurity at all.

Finally, a consideration: the next version of ModSecurity will have support for looking posted URLs up in the Google Safe Browsing database, which is very good as an antispam measure… I have hopes that either the next release or the one after will also bring Project Honey Pot http:BL support, given that the existing Apache module for it was totally messed up and unusable. That would make it a sweet tool to block crawlers and spammers!

Rails is not for fire-and-forget hosting!

In my position as a Gentoo Ruby team member (and thus Ruby packager), I’d like to give a couple more insights regarding what Jürgen wrote on the Diaspora topic. I guess my opinion should carry slightly more weight than his, not only because I dislike Python myself (which seems to be the main criticism leveled at Jürgen) but also because I did work with Rails, and this very blog runs on Rails!

I have some personal grudges with Rails, but in general I don’t dislike the MVC approach it takes. Unfortunately, while the theory is often pretty solid, the implementation leaves a lot to be desired. A framework that suggests doing agile “Test Driven Development” and fails its own tests is not something nice to look at. But that’s not the problem at this point.

The first problem lies in the way the Diaspora developers worked on the project; someone else already dissected some of the security issues (some, not all of them, do note), and it shows that whoever wrote the code wasn’t a huge Rails expert to begin with. Besides the number of mistakes in security matters, working on the older Rails 2 framework is, by itself, a bad idea; considering they are targeting a “later” release, working on the newer Rails 3 branch would have reduced the possible upgrade headaches in the future.

Here comes the biggest problem of all, though: Rails is far from a fire-and-forget hosting framework. While PHP has had a long history of incompatibilities between versions, I don’t think it ever reached Rails’ number of incompatibilities between one release and the next. Basically, when you write an application with Rails, unless you rely only on the very basic architecture, there is an almost 100% chance that the application will only work with the exact version of Rails it was developed against: write it on 2.3.5 and it might not work on 2.3.8. And Typo, which I use for this blog, still only works on 2.3.8, not 2.3.9!

To complicate the matter, the single-version-compatibility problem extends not only to Rails itself (which includes its own helper gems, such as activerecord) but also to its dependencies, such as Rack, and to the other gems that the application might require. This is one of the things that make maintaining Ruby and Rails packages in a distribution such a hell. To “solve” this problem, Rails 3 mandates the use of Bundler, which, well, creates bundled libraries in every Rails application. Indeed this “solves” (or, to be precise, hides) the problem of different versions of packages for different applications, but at the price of possibly leaving older libraries around for older applications.
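To make the point concrete, Bundler’s pinning looks like this in practice; a hypothetical Gemfile for an application stuck on the 2.3 series (the version numbers are illustrative, not Diaspora’s actual ones):

```ruby
# Gemfile: every dependency locked to the exact version the code was
# developed against; once Gemfile.lock exists, `bundle install` will
# refuse to use anything newer.
source "http://rubygems.org"

gem "rails", "2.3.5"  # not "~> 2.3": even 2.3.8 might break the app
gem "rack",  "1.0.1"  # Rails' own dependencies need the same treatment
```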

One thing has to be said for Bundler at least: it should make it much easier to update an application running on Rails, as it takes care of keeping the dependencies at the right version, without causing all the dependency hell that happened before. Unfortunately, dependency hell is not the only thing that makes upgrading a Rails application (or, for that matter, any other web application) complicated; the other is migrating the database. Rails supposedly provides tools for migrating databases in a clean way, but often enough they only work with a subset of the database drivers Rails comes with. This is supposed not to be a problem with the “NoSQL” databases, but I’m sincerely quite sceptical about those. Finally, there is the problem of customisations, even if it’s just themes, since they tend to embed enough Ruby logic to require full porting.

This brings us to two points: updating the dependencies of a Rails app only happens when updating the app, and updating a Rails app is far from trivial. This easily translates to “updating a Rails app often happens only when the administrator is forced to”: a newer version of the interpreter breaking the old application, a new host, new protocol versions, and so on and so forth.

And if you are about to tell me that Rails is a pretty secure framework, please note that there are at least six bugs in the GLSA archive since 2006, which means an average of slightly over one a year. I can’t think of many security notices ever coming out for third-party gems, and that makes it very difficult to assess their security status.

Up to now, I don’t think I have seen much interest in security-evaluating third-party gems, as most of them don’t see that much known use (as far as I can tell, most of the Rails applications out there are developed as closed-source, proprietary web applications, not Free applications). Those used by more widespread applications such as Typo and Redmine probably undergo more scrutiny, but even those can be considered small fish (WordPress, Trac and Drupal definitely look like much meatier targets). With the coming of Diaspora, especially with its idea of a “distributed social network” (see the related post for more details), these gems are likely to become an interesting target as well, especially the versions that Diaspora releases will be pinned to, which will eventually grow old.

Do I think a different language or framework would have worked better? Sincerely, I’m not sure; I don’t like PHP for many reasons, but I have been told that newer versions have much better frameworks than what we had around version 4.2, which is the last version I actively used. Since PHP is (even more than Ruby) designed for web applications, one would suppose it has gained more and more features that make it difficult to make mistakes, even when that makes it harder to use for general purposes. As far as I’m concerned, I doubt Python would get much better results either.

Rails is, in my opinion, a pretty good framework if you actually maintain a web application; it’s not good if you write an application for somebody to use and then leave it there. By its own definition it’s “agile” development, but agile development requires that you actually follow it. Do you think that the average “Diaspora enthusiast” is going to follow the agile development cycle, or will they just set up an instance and stop caring about updating it when it becomes too difficult?

It’s not a matter of language or framework, it’s a matter of architecture: in a project whose basic idea was to allow each user to maintain their own instance, the choice should have gone toward software that makes it difficult to misuse, mismanage or keep out-of-date instances running. In this, they definitely seem to have failed.

Humoristic note of the day: if my concerns prove true, and Diaspora becomes a vessel for more spam, I’ll be renaming it DiaSPAMora…

Tinderbox — Help needed for PHP

This time I get to ask the lazyweb, or at least the Gentoo power users, for some help with my tinderbox, specifically with PHP and the PHP packages. As an introduction, since I know not all users understand what the tinderbox is, and this time I’m asking all users: consider my tinderbox the ultimate Gentoo system. It’s a simple chroot where as many packages as possible get built, installed and merged; or at least it tries to.

Most of the problems found in the tinderbox are not really tinderbox-specific; they are generic failures with the latest compiler and glibc, or failures with the most recent version of a library, in a package that is rarely used; basically they are bugs that could be reported by anybody, but the tinderbox makes it easier to report them since I receive all of them in series.

In recent weeks I started reporting test-phase failures too; thanks to Portage’s test-fail-continue feature, I don’t need to skip over whole packages when their tests fail, as would happen with tests just enabled normally, but it still takes time, and more than a couple of packages tend to freeze during their test phases.
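For reference, this is the kind of make.conf setup I mean (both values are real Portage FEATURES flags):

```shell
# /etc/portage/make.conf (excerpt)
# "test" runs each package's src_test phase; "test-fail-continue"
# records a failed test phase but lets the merge finish instead of
# aborting it, which is what makes mass testing feasible.
FEATURES="test test-fail-continue"
```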

Unfortunately, I’m having some problems with the testing of some PHP PECL extensions: the only thing that the log says is the following:

>>> Test phase [test]: dev-php5/pecl-timezonedb-2008.8
make -j14 -j1 test 

Build complete.
Don't forget to run 'make test'.

ERROR: Cannot run tests without CLI sapi.

The obvious answer here is to make sure that the CLI (command line interface) version of PHP is built, but, well, it is! The cli USE flag on the PHP ebuild is enabled in the tinderbox, and yet the tests fail. I’ve asked Christian (hoffie) about this but his interest in PHP does not extend to PECL, and the rest of the PHP team seems to be missing in action.

If somebody has an idea on what the problem is here, I’d be very glad to hear about it.

Similarly, all, or almost all, the PEAR-based packages fail to install in my tinderbox; I’ve opened a bug (http://bugs.gentoo.org/show_bug.cgi?id=276944) where you can find the details, although there aren’t many: the pear command ends up failing without any notice at all.

And of course this also means that if you’re interested in working on our PHP team, especially on the PEAR and PECL extensions, your help is very much needed right now!

The xine website: intro

As it turns out, the usual xine website went offline a few days ago. Since then, Darren set up a temporary page on the SourceForge.net servers, and I’ve changed the redirect of xine-project.org, which is now sort of live with the same page that was on SourceForge.net, and the xine-ui skins ready to be downloaded.

Since this situation cannot be allowed to go on for much longer, I’ve decided to take up the task of rebuilding the site on the new domain I acquired to run the Bugzilla installation. Unfortunately the original site (which is downloadable from the SourceForge repositories) is written in PHP, with MySQL used for user polls and news posting, and the whole thing looks like a contraption I don’t really want to run myself. In particular, the site itself is pretty static; the only real use of PHP on it is not having to write boilerplate HTML for each release, but rather write a file describing it, which is something I used to do myself for my site.

Since having a dynamic website for static content is far from my usual work practises, I’m going to do just what I did for my own website: rewrite it in XML and use XSLT to generate the static pages to be served by the webserver. This sounds complex but it really isn’t, once you know the basic XML and XSLT tricks, which I’ve learnt, unfortunately for me, with time. On an interesting note, when I worked on my most complex PHP project, a custom CMS – back when CMSes weren’t this widespread! – for an Italian gaming website, now dead, I already looked into using XSLT for the themes, but at the time support for it in PHP was almost never enabled.
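To give an idea of what I mean, here is a toy version of the approach: a release list kept as plain XML, and an XSLT stylesheet that expands it into the boilerplate HTML. The element names, versions and dates are made up for the example; the real site’s schema will differ. The two files are shown in one listing for brevity:

```xml
<!-- releases.xml: one element per release, no HTML boilerplate -->
<releases>
  <release version="1.1.16" date="2009-02-10"/>
  <release version="1.1.15" date="2009-01-10"/>
</releases>

<!-- releases.xsl: run with `xsltproc releases.xsl releases.xml` -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/releases">
    <ul>
      <xsl:for-each select="release">
        <li>xine-lib <xsl:value-of select="@version"/>
            (released <xsl:value-of select="@date"/>)</li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```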

I’m still working on it and I don’t count on being able to publish it this week, but hopefully once the site is up again it’ll be entirely static content. And while I want to keep all the previously-available content, and keep the general design, I’m going to overhaul the markup. The old site is written mostly using tables, with very confused CSS and superfluous spacer elements. It’s not an easy task, but I think it’s worth doing, especially since the result should be much more usable for mobile users, of which I’m one from time to time.

If I find some interesting technicality while preparing the new website I’m going to write it here, so keep reading if you’re interested.

Flex and (linking) conflicts, or a possible reason why PHP and Recode are so crashy

So, this past week I was off teaching a course about programming and Linux at a company I’ve been working for for a while now. Some of the insights about what people need to know, and rarely do know, are helping me decide what to focus on in my blog (and not only there) in the future.

Today, though, I want to blog not about something I explained during the course, but about something that was explained to me, about Bison and Flex. It’s related to the output of my linking script:

Symbol yytext@@ (64-bit UNIX System V ABI AMD x86-64 architecture) present 7 times
  /usr/bin/text2pcap
  /bin/zsh-4.3.4
  /usr/lib64/libgpsimcli.so.0.0.0
  /usr/lib64/librecode.so.0.0.0
  /usr/lib64/php5/bin/php-cgi
  /bin/zsh
  /usr/lib64/php5/bin/php

yytext, together with a few other yy-prefixed symbols, is generated by Flex when it emits the code for a lexer (for what most of my readers are concerned, these are part of a parser). These symbols are private to a single parser and should not be exported to other parsers. I wasn’t sure about their privateness, so I hadn’t reported them up to now, but now I am sure: they should not be shared between two different parsers.

Both librecode and PHP export their parser’s symbols, which creates a situation where the two parsers are sharing buffers, and… well, you wouldn’t want to share a plate between someone eating Nutella and someone else eating pasta, would you?

This might actually cause quite a few problems, and as hoffie said, recode is used by PHP and is often broken when used together with other extensions. I can’t be sure the problems are all limited to this, but it is certainly a good starting point if we want to fix them.

The easy way out would be to make sure that the PHP executables don’t export symbols that extensions don’t need to use; the proper way out would be to also add proper symbol visibility inside recode itself, but I wonder if it’s still maintained: release 3.6 is quite old, and even patching it is a hard task as it doesn’t even recognise AMD64 by default.
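For the record, Flex itself provides a third option that sidesteps the clash at the source level: the generated symbols can be renamed per lexer, so two lexers loaded into the same process can never step on each other’s yytext. A sketch only; %option prefix is the real Flex feature, but the file name, prefix and rules are made up:

```lex
/* recode-lexer.l: with a prefix option, Flex emits recode_yytext,
   recode_yylex, recode_yyin and so on instead of the default yy*
   names, so they cannot collide with another parser's symbols. */
%option prefix="recode_yy"
%option noyywrap
%%
[0-9]+    { /* handle a number token */ }
.|\n      { /* ignore everything else */ }
%%
```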

And what about imported libraries?

Following up on the previous post, here is also a list of projects that seem to like importing libraries, causing code duplication even for code that was designed to be shared.

  • cdrkit, again, contains a stripped down version of libdvdread, added, of course, by our beloved Jörg Schilling; bug #206939; additionally it contains a copy of cdparanoia code; bug #207029

  • ImageMagick comes with a copy of libltdl; bug #206937

  • not even KDE4 seems to have helped libkcal, which even in its newest incarnation ships with an internal copy of libical, causing me to have three copies of it installed on my system;

  • libvncserver comes with a copy of liblzo2; actually there are two, one in libvncserver and one in libvncclient; even the source files are duplicated!; bug #206941

  • SDL_sound, Wine and LAME seem to share some mp3 decoding code, which seems to come originally from mpg123;

  • cmake couldn’t stay out of this, it comes with a copy of libform (which is part of ncurses); follow bug #206920

  • I’m not sure what it is, but DigiKam, Numeric (for Python) and numpy have a few functions in common; the latter two seem to have even more than that in common; bug #206931 for Numeric and numpy, and bug #206934 for DigiKam.

  • ghostscript comes with internal copies of zlib, libpng, jpeg and jasper; unfortunately jasper is also modified, for the other three there’s bug #206893; by the way, the copies are present in both the gs command and in the libgs library;

  • OpenOffice comes with loads of duplicated libraries; in particular, it comes with its own copy of icu libraries; see on bug #206889

  • TiMidity++ comes with a copy of libmikmod; bug #206943

  • Korundum for KDE3 has a copy of qtruby embedded, somehow; I wonder if it isn’t a fluke of our buildsystem; bug #206936

  • gdb contains an internal copy of readline; bug #206947

  • tork contains a copy of some functions coming from readline; bug #206953

  • KTorrent contains a copy of GeoIP (and to think I removed the one in TorK as soon as I spotted it); bug #206957

  • both ruby and php use an internal copy of – I think – oniguruma; I haven’t looked if it’s possible to add that as a system library and then use it; bug #206963

  • MPlayer seems to carry a copy of libogg together with tremor support; bug #206965

  • pkg-config ships with an internal copy of glib; bug #206966

  • tor has an internal copy of libevent’s async dns support; funny, as it links to libevent; bug #206969

  • gettext fails to find the system copy of libxml2, falling back to use the internal copy; at least it has the decency of using a proper commodity library; bug #207018

  • both Perl and Ruby have a default extension based on SDBM, a NDBM workalike; there seems not to be a shared version of it, so they just build the single source file in their own extensions directly, without hiding the symbols; beside the code re-use not being available, if a process loads both libperl and libruby, and in turn they load their sdbm extension, stuff’s gonna hurt;

  • enchant has an internal copy of Hunspell; probably due to the fact that old Hunspell built only static non-PIC libraries, and enchant uses plugins; bug #207025; upstream fixed this in their Subversion repository already;

  • gnome-vfs contains an internal copy of neon; funny as it depends on neon already, in the ebuild; bug #207031

  • KOffice’s Karbon contains an internal copy of gdk-pixbuf; bug #209561;

  • kdegraphics’s KViewShell contains an internal copy of djvulibre; bug #209565;

  • doxygen contains internal copies of zlib and libpng; bug #210237; this time I used a different method to identify it, as doxygen does not export the symbols;

  • rsync contains an internal copy of zlib; bug #210244.

Unfortunately, making sure that what I’m reading is real data and not a false positive when looking at the output of my script becomes more difficult now due to the presence of multiple Sun JDK versions; I have to add support for alternatives, so that different libraries implementing the same interface don’t show up as colliding (they are that way by design).

Sometimes bad experiences let you learn something

Like how having to rebuild one’s /usr/lib64 tree helps one learn that there are quite a few duplicated files installed on a system.

The first thing I have to suggest to anybody who happens to have my problem is: make sure you remove the debug-info files before starting the procedure: they are big and numerous, and if you, like me, still have partial directories present, it’s simple to find them and remove them altogether, which will save you a lot of md5sum calls. The script I’m using (actually it’s a oneliner, albeit with two while statements in it) is still a test run with echo rather than the commands themselves, but when I’m sure it works as intended, I’ll see to posting it here, in case someone else might need it.

Then there are the tricky parts: the script being what it is, it will create a bit of a stir when a given file is present with the same md5sum in different places. This is easiest to see with empty files, or files containing just a newline, which all share the same MD5 sum (.keep files are the most common offenders here); to avoid having to copy those files back all over (especially since their mtime would be changed, and that is bad), I’ve added a simple -size +1 to skip over files of 1 byte or less. Hopefully that should take care of it.
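Since the oneliner itself isn’t ready to be published, here is a sketch of the md5sum part of it (the function name is mine, and unlike the real thing it only prints; verify the output before acting on it):

```shell
#!/bin/sh
# find_dups DIR: hash every file under DIR bigger than one byte
# (-size +1c) and print the groups of paths sharing an MD5 sum,
# one blank-line-separated group per duplicated content.
find_dups() {
  find "$1" -type f -size +1c -print0 \
    | xargs -0 -r md5sum \
    | sort \
    | uniq -w32 --all-repeated=separate
}
```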

But of course there are duplicated files. PHP is a major offender here: not only does it install a copy of the config.guess and config.sub files, it also has some duplicated libpcre header files. But the absolute winner of the “let’s bloat a system” contest is vmware-server, as it comes with a copy of Perl itself, and some of the files are identical down to the MD5 sum!

In addition to this, my script showed that there are packages installing stuff in /usr/lib when they shouldn’t. The multilib-strict warning usually makes it possible to find these packages, but in the case of xc, for instance, there are no arch-dependent files, so multilib-strict does not trigger (obviously). It is not really a problem, as arch-independent files are fine in /usr/lib, but as far as I can see, those files should instead go to /usr/share/xc.

* scribbles something on his TODO list about this *

Preparing a new box

So, I’ve started preparing a new box for Gentoo/FreeBSD beside farragut. Actually, I started yesterday during my so-called break, as I’m going to need it for a job (I need to work with PHP and MySQL 4.1, and I don’t want to have them on my workstation or on farragut itself ;) ). This will also give me the opportunity to keyword more stuff, and you’ll now see that PHP 4.4 is also available on Gentoo/FreeBSD (at least with basic options), yuppie! :)

Unfortunately, it seems like I did something wrong either when setting it up or with the stage, I’m not yet sure which, as rpc.lockd dies at fork, and so does MySQL :/

Apart from that, I now have two boxes where distcc sends the data to build; watching freebsd-lib compile at -j5 is really nice ;) I also already installed the cross-compiler for PowerPC on that box, so that I can use the three of them as distcc servers for the poor iBook.

New box at hand, I also keyworded a few more things, like hunspell and myspell-{it,en}… you probably won’t know why, but they are dependencies of OpenOffice, and I remembered people working on having it built with GCJ, which would help to get it working on Gentoo/FreeBSD ;) Thanks a lot to Kevin for committing the libtool patch to have it built as a shared library, too :)

On a different note, my “no coke” policy is doing fine: it’s now 5 days, and it’s the longest I’ve gone without Coca-Cola in the last few years :) Although, I admit, it’s going to be difficult as the temperature is increasing too much for my liking.
Why oh why do I live in southern Europe? :P