Get the thorn out

Sometimes it’s necessary to stand by your choices even when they are controversial. We all know that by now. One nice thing about volunteer Free Software is that if you don’t like a controversial decision, you can just leave, or fork, or in any case get away from whoever made the decision you don’t like.

I left when a decision was made that I didn’t like, I came back when the situation was, to my eyes, corrected. It was and is my freedom.

It so happens that the Council made a decision, probably the only real decision since the Council was formed, the first time the Council actually grew the balls to do something even if that wasn’t going to please everyone.

Am I happy about the decision? Well not really, as it seems to me somewhat silly that we had to go down the road of actually making this decision. But I’m not displeased by the outcome. I think we should have taken this decision a long time ago actually.

To stop speaking like an abstract class in C#, the decision was to retire a number of developers who, from what the Council could gather and agree upon, are all considered poisonous to the project. Poisonous does not mean they have zero contribution, just that their contribution is overshadowed by disruption to the wellness of the project. This disruption consists of a lot of actions, not just one or two. They might not even be huge by themselves, but if there are a lot of them, well, their size stops counting (the so-called death of a thousand cuts).

This is not meant as a signal that you shouldn’t be criticising Gentoo. Criticism is welcome if it is constructive. You can also work in parallel on competing products (hell, Greg KH is listed as a Gentoo Developer but works for Novell!), just as long as you don’t start to use your rights as a Gentoo developer to force people to move to something else, I’d suppose.

It doesn’t even relate only to actions on the Forums, as our Forums Admins are able to tackle those problems on their own (and I do trust them with it). It relates to a lot of small things, once again.

In general, the signal that we’re trying to get across is “don’t poison your contribution to Gentoo”. You can criticise, you can joke, but if the people you joke about don’t laugh with you at the joke, then apologise and stop it! Otherwise you’re just walking poison and we’re going to get rid of you, sooner or later. Hopefully sooner next time, before developers resign or reduce their involvement because of your actions.

For the Italian readers who read my political rant from yesterday (for those who can’t read Italian, it’s a piece talking about job politics, what Italian unionists and politicians do and how it harms the system), you can see a slight similarity between the two issues. In both cases you have to get rid of some people to avoid leaving everybody out at some point.

Oh, and if we wanted to get rid of people working on Paludis, you could expect all of them to be gone, so no, that’s not the cause either. It’s just incidental.

And for what it’s worth, nobody is trying to get rid of everybody they disagree with. Otherwise me and Donnie would be trying to get rid of each other ;) As I said before, we don’t always agree on how to proceed with things, and we can often be found on opposite sides of an argument. Still we work together, and I’d say we do that quite happily, precisely because of our difference in views: it usually stops us from going to extremes. But you’ll never find me and Donnie exchanging snide comments, or insulting each other.

In Italy it seems to be officially spring, and this spring cleaning was long overdue.

Spring cleaning in your $HOME: spamassassin with SQL backend

This is going to be the first of a series of posts about «spring cleaning» of your home directory. We’re also in the right season, so I’m not that Off Topic for now :)

Why do I care about having a clean home directory? Well, it largely depends on my setup, but I think this is common enough to warrant some discussion about it. I have my /home in a partition that is set up with DM to be replicated on my two hard drives, providing me a basic RAID1 setup for that single partition; this allows me to be relatively safe from a hard disk crash, as far as my important data is concerned, like SSH and GPG keys, configuration files, mail and so on.

The problem with this is that everything that gets written to my home directory has to be written to two disks, which is often a performance drawback; for this reason, I tend to scatter the non-essential data (like repository checkouts and similar) across different partitions, as they also don’t require much backup most of the time. This also brings me to hate the software that uses my home directory to save cache data, because it ends up using RAID1 for disposable data that I wouldn’t want to have backed up together with really important data.

So, this series of posts is going to explain how I try to keep my home directory clean from cache data, in part to help someone else who might want to do the same, in part for me to remember how and why I did something ;)

One of the first services using data in my home directory that I thought of was SpamAssassin; while the amount of spam mail I receive has decreased a lot since I left Gentoo (as I’m no longer on 10 aliases), I still receive quite a bit, so I’m not yet ready to remove my local SpamAssassin filter; keeping it is probably a sane idea, especially since for xine-lib I’m going to repeat my email address over and over at every commit ;)

SpamAssassin saves some data in ~/.spamassassin, namely the Bayesian tokens database, the automatic whitelist and your extra preferences. As I don’t have extra per-user preferences (I use SpamAssassin in a single-user environment), I don’t need those, but I do need bayes and awl to work. Since I already have Amarok using PostgreSQL on this box, I decided to use PostgreSQL to also save SpamAssassin data.

Unfortunately, as it is, the ebuild does not allow you to easily add postgres support, but this is probably going to be fixed in the future; I have a better ebuild in my overlay and I’ll see about sending the changes to the Perl team now; in the meantime, there is not that much to change.

The documentation on setting up SpamAssassin with an SQL backend can be found on the SpamAssassin Wiki, and it applies to PostgreSQL as well as MySQL, even if some things have to be changed around; nothing major though.

Throughout this post I’ll assume that both PostgreSQL and SpamAssassin are only reachable on localhost, and that you don’t need extra security measures like a password to the database or something like that.

First of all, stop SpamAssassin (if your mail system is not mission critical) and start backing up the bayesian database:

% sudo /etc/init.d/spamd stop
% sa-learn --backup > sa-bayes-backup

This will create a sa-bayes-backup file with the Bayesian tokens currently saved in your home directory in a Berkeley DB file.
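Since the ~/.spamassassin directory gets removed later on, it is worth making sure the backup is really there before going further. What follows is just a minimal sketch of such a check; the check_bayes_backup function is a hypothetical helper of mine, not something SpamAssassin ships, and it only looks at the file size and line count, not at the content.

```shell
# Hypothetical helper: refuse to go on if the backup is missing or empty.
check_bayes_backup() {
    backup="${1:-sa-bayes-backup}"
    if [ ! -s "$backup" ]; then
        echo "backup $backup is missing or empty" >&2
        return 1
    fi
    # Count the lines of the dump, just to get a feeling of its size.
    lines=$(wc -l < "$backup" | tr -d ' ')
    echo "$backup: $lines lines"
}
```

Calling check_bayes_backup without arguments looks for sa-bayes-backup in the current directory; a non-zero return value means you should not delete anything yet.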

After this, change the USE flags for mail-filter/spamassassin: disable the berkdb USE flag and enable the postgres one; ignore the warning currently thrown by the ebuild saying that the Bayesian filter needs the DB_File module: it works just as well with PostgreSQL as backend, but you have to configure it. You might also want to enable the doc USE flag, as right now it unfortunately controls the installation of user-serviceable documentation; alternatively, just get an extracted copy of SpamAssassin’s tarball to use as a reference.
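For the USE flag change described above, something like the following should do. Take it as a sketch: set_spamassassin_use is a hypothetical helper, and the default path assumes that your /etc/portage/package.use is a plain file (pass a different path as first argument if it is not).

```shell
# Hypothetical helper: append the USE flag line for SpamAssassin to a
# package.use file. The default path is an assumption about your setup.
set_spamassassin_use() {
    use_file="${1:-/etc/portage/package.use}"
    printf '%s\n' 'mail-filter/spamassassin -berkdb postgres doc' >> "$use_file"
}

# Usage (as root), followed by a rebuild of the package:
#   set_spamassassin_use
#   emerge --ask --oneshot mail-filter/spamassassin
```

The function only appends, so if you already have a line for mail-filter/spamassassin in there you’re better off editing it by hand.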

Now, it’s time to create the user and the database to store the data into.

% sudo -u postgres -i
postgres % createuser spamassassin
postgres % createdb -O spamassassin spamassassin
postgres % bzcat /usr/share/doc/spamassassin-3*/sql/bayes_pg.sql.bz2 | 
  psql -U spamassassin spamassassin
postgres % bzcat /usr/share/doc/spamassassin-3*/sql/awl_pg.sql.bz2 | 
  psql -U spamassassin spamassassin

You could also use per-user preferences stored in the SQL backend if you really need them; as I don’t need them, I instead edited /etc/conf.d/spamd, replacing the -c option (which forces spamd into creating per-user configuration files if missing) with -x (which tells spamd to ignore per-user options, that is just what I need).
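If you’d rather not edit the file by hand, the swap of -c with -x can be sketched with sed. The swap_spamd_prefs_flag function is a hypothetical helper, and it assumes that the options live in a line starting with SPAMD_OPTS= with the flags inside double quotes; check your own /etc/conf.d/spamd before trusting it.

```shell
# Hypothetical helper: print a conf.d file with -c swapped for -x inside
# the SPAMD_OPTS line. It writes to standard output, so the original
# file is left alone. The SPAMD_OPTS variable name is an assumption.
swap_spamd_prefs_flag() {
    sed -E '/^SPAMD_OPTS=/ s/([[:space:]"])-c([[:space:]"])/\1-x\2/' "$@"
}

# Usage: review the result before copying it over the real file.
#   swap_spamd_prefs_flag /etc/conf.d/spamd > /tmp/spamd.new
```

Only the first standalone -c in that line is touched; every other line is printed unchanged.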

Now it’s time to set up the database connection from SpamAssassin; although the ebuild suggests using the file that is not readable by users to configure the connection to the database, if you plan to use sa-learn as your own user you might prefer to just put the settings in a world-readable file, especially if you don’t have any security concerns about the use of the spamassassin PostgreSQL database; this is what I have done anyway:

bayes_store_module      Mail::SpamAssassin::BayesStore::PgSQL
bayes_sql_dsn           DBI:Pg:dbname=spamassassin;host=localhost
bayes_sql_username      spamassassin
bayes_sql_override_username     spamassassin

auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn            DBI:Pg:dbname=spamassassin;host=localhost
user_awl_sql_username   spamassassin

At this point, SpamAssassin will only use PostgreSQL for its databases, so you can just remove your ~/.spamassassin directory; it will not be recreated. Let’s then start PostgreSQL (or make sure it’s started already), and then restore the Bayes database:

% sudo /etc/init.d/postgresql start
% sa-learn --restore sa-bayes-backup

Now you could restart spamd and have your system back already, but there is one problem with the current ebuild (the one in my overlay does not need this change though): the init script does not depend on PostgreSQL. On one hand that’s correct: you might not be using the localhost pgsql to store the data, and in that case you don’t have to care about starting spamd after postgresql. But if you’re going to use a local configuration, you certainly don’t want spamd to start before the PostgreSQL database is up, so you have to edit the /etc/init.d/spamd script and add a simple use postgresql line to the depend() function; then add postgresql to your default runlevel, and that should be it.
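To make the init script change more concrete, this is roughly how the depend() function would look afterwards. It’s only a sketch: I’m not reproducing whatever your /etc/init.d/spamd already declares in there, the single addition is the use postgresql line.

```shell
# Fragment of /etc/init.d/spamd; only "use postgresql" is the addition.
depend() {
    # (whatever the script already declares here stays untouched)
    use postgresql
}

# Afterwards, as root, add PostgreSQL to the default runlevel:
#   rc-update add postgresql default
```

As use is a soft dependency, spamd is only ordered after postgresql when the latter is actually in the runlevel, which is why the rc-update step matters.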

At this point you’re set: just restart your spamd, and it won’t use your home directory to store cache data anymore!

% sudo /etc/init.d/spamd start