Last night I ended up in Bizarro World, hacking at Jürgen’s gmaillabelpurge (which he actually wrote on my request, thanks once more Jürgen!). Why? Well, the first reason was that I found out that it hasn’t been running for the past two and a half months, because, for whatever reason, the default Python interpreter on the system where it was running was changed from 2.7 to 3.2.
So I tried first to get it to work with Python 3 keeping it working with Python 2 at the same time; some of the syntax changes ever so slightly and was easy to fix, but the
2to3 script that it comes with is completely bogus. Among other things, it adds parenthesis on all the
You might be asking how comes I didn’t notice this before. The answer is because I’m an idiot! I found out only yesterday that my firewall configuration was such that postfix was not reachable from the containers within Excelsior, which meant I never got the fcron notifications that the job was failing.
While I wasn’t able to fix the Python 3 compatibility, I was able to at least understand the code a little by reading it, and after remembering something about the IMAP4 specs I read a long time ago, I was able to optimize its execution quite a bit, more than halving the runtime on big folders, like most of the ones I have here, by using batch operations, and peeking, instead of “seeing” the headers. At the end, I spent some three hours on the script, give or take.
But at the same time, I ended up having to workaround limitations in Python’s imaplib (which is still nice to have by default), such as reporting fetched data as an array, where each odd entry is a pair of strings (tag and unparsed headers) and each even entry is a string with a closed parenthesis (coming from the tag). Since I wasn’t able to sleep, at 3.30am I started re-writing the script in Perl (which at this point I know much better than I’ll ever know Python, even if I’m a newbie in it); by 5am I had all the features of the original one, and I was supporting non-English locales for GMail — remember my old complain about natural language interfaces? Well, it turns out that the solution is to use the Special-Use Extension for IMAP folders; I don’t remember this explanation page when we first worked on that script.
But this entry is about Python and not the script per-se (you can find on my fork the Perl version if you want). I have said before I dislike Python, and my feeling is still unchanged at this point. It is true that the script in Python required no extra dependency, as the standard library already covered all the bases … but at the same time that’s about it: it is basics that it has; for something more complex you still need some new modules. Perl modules are generally easier to find, easier to install, and less error-prone — don’t try to argue this; I’ve got a tinderbox that reports Python tests errors more often than even Ruby’s (which are lots), and most of the time for the same reasons, such as the damn unicode errors “because LC_ALL=C is not supported”.
I also still hate the fact that Python forces me to indent code to have blocks. Yes I agree that indented code is much better than non-indented one, but why on earth should the indentation mandate the blocks rather than the other way around? What I usually do in Emacs when I’m getting stuff in and out of loops (which is what I had to do a lot on the script, as I was replacing per-message operations with bulk operations), is basically adding the curly brackets in different place, then select the region, and C-M- it — which means that it’s re-indented following my brackets’ placement. If I see an indent I don’t expect, it means I made a mistake with the blocks and I’m quick to fix it.
With Python, I end up having to manage the space to have it behave as I want, and it’s quite more bothersome, even with the
C-c < and
C-c > shortcuts in Emacs. I find the whole thing obnoxious. The other problem is that, while Python does provide basics access to a lot more functionality than Perl, its documentation is .. spotty at best. In the case of imaplib, for instance, the only real way to know what’s going to give you, is to print the returned value and check with the RFC — and it does not seem to have a half-decent way to return the UIDs without having to parse them. This is simply.. wrong.
The obvious question for people who know would be “why did you not write it in Ruby?” — well… recently I’ve started second-guessing my choice of Ruby at least for simple one-off scripts. For instance, the
deptree2dot tool that I wrote for OpenRC –
available here – was originally written as a Ruby script … then I converted it a Perl script half the size and twice the speed. Part of it I’m sure it’s just a matter of age (Perl has been optimized over a long time, much more than Ruby), part of it is due to be different tools for different targets: Ruby is nowadays mostly a long-running software language (due to webapps and so on), and it’s much more object oriented, while Perl is streamlined, top-down execution style…
I do expect to find the time to convert even my
scan2pdf script to Perl (funnily enough,
gscan2pdf which inspired it is written in Perl), although I have no idea yet when… in the mean time though, I doubt I’ll write many more Ruby scripts for this kind of processing..