Last night I ended up in Bizarro World, hacking at Jürgen’s gmaillabelpurge (which he actually wrote on my request, thanks once more Jürgen!). Why? Well, the first reason was that I found out that it hasn’t been running for the past two and a half months, because, for whatever reason, the default Python interpreter on the system where it was running was changed from 2.7 to 3.2.
So I first tried to get it to work with Python 3 while keeping it working with Python 2 at the same time; some of the syntax changes only ever so slightly and was easy to fix, but the 2to3 script that it comes with is completely bogus. Among other things, it adds parentheses to all the print calls… which would be correct if it checked that said parentheses weren’t there already. In a script like the one aforementioned, the noise on the output is so high that there is really no signal worth reading.
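For reference, one common way to keep a single file's print calls working under both interpreters is to opt in to the function form of print. This is only a minimal sketch: the `report` helper and its message are made up for illustration and are not taken from gmaillabelpurge.

```python
# No-op on Python 3; on Python 2.6+ it turns print into a function.
from __future__ import print_function

import sys


def report(label, count, stream=sys.stdout):
    """Write one status line; behaves the same on Python 2.7 and 3.x."""
    print("purged", count, "messages from", label, file=stream)


if __name__ == "__main__":
    report("[Gmail]/Trash", 42)
```

With the import in place, every print in the file is already a function call, so no parentheses need adding.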
You might be asking how come I didn’t notice this before. The answer is: because I’m an idiot! I found out only yesterday that my firewall configuration was such that postfix was not reachable from the containers within Excelsior, which meant I never got the fcron notifications that the job was failing.
While I wasn’t able to fix the Python 3 compatibility, I was able to at least understand the code a little by reading it, and after remembering something about the IMAP4 specs I read a long time ago, I was able to optimize its execution quite a bit, more than halving the runtime on big folders (like most of the ones I have here) by using batch operations, and by peeking at the headers instead of “seeing” them. In the end, I spent some three hours on the script, give or take.
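To give an idea of what batching and peeking look like with imaplib, here is a rough sketch rather than the actual change made to the script. The helper names, the batch size of 500 and the choice of the Date header are all assumptions; `conn` is expected to be a logged-in `imaplib.IMAP4` object with a mailbox selected.

```python
def uid_batches(uids, size=500):
    """Yield comma-separated UID sets of at most `size` entries each."""
    for start in range(0, len(uids), size):
        yield ",".join(str(u) for u in uids[start:start + size])


def fetch_dates(conn, uids, size=500):
    """Fetch the Date header of many messages with one FETCH per batch.

    BODY.PEEK[...] reads the headers without marking the message as
    seen, which a plain BODY[...] fetch would do.
    """
    results = []
    for uid_set in uid_batches(uids, size):
        typ, data = conn.uid(
            "FETCH", uid_set, "(BODY.PEEK[HEADER.FIELDS (DATE)])"
        )
        if typ != "OK":
            raise RuntimeError("FETCH failed for %s" % uid_set)
        results.extend(data)  # raw imaplib data, still unparsed
    return results
```

One round trip per few hundred messages instead of one per message is where most of the time goes away on big folders.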
But at the same time, I ended up having to work around limitations in Python’s imaplib (which is still nice to have by default), such as reporting fetched data as an array, where each odd entry is a pair of strings (tag and unparsed headers) and each even entry is a string with a closed parenthesis (coming from the tag). Since I wasn’t able to sleep, at 3.30am I started rewriting the script in Perl (which at this point I know much better than I’ll ever know Python, even if I’m a newbie in it); by 5am I had all the features of the original one, and I was supporting non-English locales for GMail. Remember my old complaint about natural language interfaces? Well, it turns out that the solution is to use the Special-Use Extension for IMAP folders; I don’t remember this explanation page being there when we first worked on that script.
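As an illustration of the Special-Use approach, the sketch below picks mailboxes by their attribute (such as `\Trash`) instead of by their localized name. It assumes the LIST response lines arrive as plain bytes, the way imaplib's `list()` hands them back; the function name and the folder names in the usage example are made up, and the regular expression deliberately ignores literals and escaped quotes.

```python
import re

# One LIST response line: (flags) "delimiter" "mailbox name"
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) (?:"[^"]*"|NIL) (?P<name>.+)')

# A subset of the special-use attributes defined by the extension.
WANTED = {b"\\All", b"\\Drafts", b"\\Junk", b"\\Sent", b"\\Trash"}


def special_use_folders(list_lines):
    """Map special-use attributes such as b'\\Trash' to mailbox names."""
    found = {}
    for line in list_lines:
        if not isinstance(line, bytes):
            continue  # literals come back as tuples; not handled here
        match = LIST_LINE.match(line)
        if match is None:
            continue
        name = match.group("name").strip(b'"')
        for flag in match.group("flags").split():
            if flag in WANTED:
                found[flag] = name
    return found
```

Fed a line like `(\HasNoChildren \Trash) "/" "[Gmail]/Cestino"`, it maps `\Trash` to `[Gmail]/Cestino` without ever needing to know the Italian word for trash.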
But this entry is about Python and not the script per se (you can find the Perl version on my fork if you want). I have said before I dislike Python, and my feeling is still unchanged at this point. It is true that the script in Python required no extra dependency, as the standard library already covered all the bases … but at the same time that’s about it: the basics are all it has; for something more complex you still need some new modules. Perl modules are generally easier to find, easier to install, and less error-prone. Don’t try to argue this; I’ve got a tinderbox that reports Python test errors more often than even Ruby’s (which are lots), and most of the time for the same reasons, such as the damn unicode errors “because LC_ALL=C is not supported”.
I also still hate the fact that Python forces me to indent code to have blocks. Yes, I agree that indented code is much better than non-indented code, but why on earth should the indentation mandate the blocks rather than the other way around? What I usually do in Emacs when I’m getting stuff in and out of loops (which is what I had to do a lot on the script, as I was replacing per-message operations with bulk operations), is basically adding the curly brackets in a different place, then selecting the region, and C-M-\ it, which means that it’s re-indented following my brackets’ placement. If I see an indent I don’t expect, it means I made a mistake with the blocks and I’m quick to fix it.
With Python, I end up having to manage the whitespace myself to have it behave as I want, and it’s quite a bit more bothersome, even with the C-c < and C-c > shortcuts in Emacs. I find the whole thing obnoxious. The other problem is that, while Python does provide basic access to a lot more functionality than Perl, its documentation is… spotty at best. In the case of imaplib, for instance, the only real way to know what it’s going to give you is to print the returned value and check it against the RFC, and it does not seem to have a half-decent way to return the UIDs without having to parse them. This is simply… wrong.
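For the record, this is roughly the kind of parsing that ends up being needed; a sketch, not the code from the script. The function name is invented, and it assumes the server puts the UID before the header literal in each response. If a server reports the UID after the literal, it lands in the trailing string instead and this sketch will miss it.

```python
import re

UID_RE = re.compile(rb"\bUID (\d+)")


def headers_by_uid(fetch_data):
    """Turn imaplib's raw FETCH data into a {uid: header_bytes} dict."""
    parsed = {}
    for item in fetch_data:
        if not isinstance(item, tuple):
            continue  # the bare b")" entries that close each response
        prefix, payload = item
        match = UID_RE.search(prefix)
        if match:
            parsed[int(match.group(1))] = payload
    return parsed
```

The input is the second element of what `fetch()` or `uid("FETCH", ...)` returns, that is, the list alternating between pairs and closing parentheses.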
The obvious question for people who know would be “why did you not write it in Ruby?” Well… recently I’ve started second-guessing my choice of Ruby, at least for simple one-off scripts. For instance, the deptree2dot tool that I wrote for OpenRC (available here) was originally written as a Ruby script … then I converted it to a Perl script half the size and twice the speed. Part of it, I’m sure, is just a matter of age (Perl has been optimized over a long time, much more than Ruby); part of it is due to them being different tools for different targets: Ruby is nowadays mostly a language for long-running software (due to webapps and so on), and it’s much more object oriented, while Perl is streamlined, with a top-down execution style…
I do expect to find the time to convert even my scan2pdf script to Perl (funnily enough, gscan2pdf, which inspired it, is written in Perl), although I have no idea yet when… in the meantime though, I doubt I’ll write many more Ruby scripts for this kind of processing.
TL;DR: I don’t know Python and I know other tools that do the job.
I don’t think it’s fair to say that the cause is simply: “he doesn’t know python”

sure: he’s the first one to say he’s a python newbie… but his point is more like: “I had to use a low-level and poorly documented library, and I don’t like the syntactical/philosophical choices of the language… so I’ll use other tools”

though it seems really weird to me that one would willingly choose perl in 2013

for the syntactical/philosophical choice, I don’t really like Ruby either, but I understand how one can have different priorities and prefer other tradeoffs

for the library, you say that you’re seeing lots of test failures on your tinderbox, and so you prefer to avoid relying on external dependencies

I assume that since CPAN has been around for a lot longer, indeed it’s probably more stable, but I don’t know where these logs are stored to understand if the failures really are so maddening

I personally don’t have many problems when using pip, but the issue probably is with less tested packages, and the quality of the ebuilds?

(surely, when I have .debs available, I usually use those… but for simple scripts like these I came to the conclusion that the lack of a proper package in the linux distribution isn’t really bothering me)

concerning 2to3, what you were looking for is the -x flag (--nofix) :)

explicitly:

2to3 -x print

also, when writing python2 code that is really future-thinking… usually people rely on imports like:

from __future__ import print_function

and, lo and behold, 2to3 automatically skips the print fixer :)

moreover, in the general case adding parentheses to all the print calls is the correct thing to do, in fact the correct translation from python2 to python3 of

print(1,2)

is

print((1,2))

and it’s not that the 2to3 programmers have been lazy: in fact even if they had looked into the print’s arguments, you can’t really distinguish the previous 2 snippets: the tuple’s parentheses are obvious in the AST… but the grouping parentheses instead disappear

so, the only safe thing to do is to always wrap, unless explicitly instructed otherwise

btw, yesterday evening I downloaded the code, tried to run it… and in about 20 minutes (+5 of wrangling with git and 5 of making it work again with python2) I got it running in python3 :)

(then, today I fixed some small annoyances, and added an imap4-utf7 encoder to be able to work with unicode labels… a while ago I sent a pull request to Jürgen 🙂)
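To make the print(1,2) point concrete, here is a small sketch meant to be run under Python 3 only; the `printed` helper is made up for illustration. Under Python 2 the statement `print(1,2)` printed the tuple `(1, 2)`, so only the double-parenthesis form keeps that output after conversion.

```python
import io
from contextlib import redirect_stdout


def printed(*args):
    """Return exactly what print(*args) writes to stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        print(*args)
    return buf.getvalue()


# Python 3 reads print(1,2) as a call with two arguments...
as_two_arguments = printed(1, 2)
# ...while the Python 2 statement printed one tuple, like this call does.
as_one_tuple = printed((1, 2))
```

Here `as_two_arguments` is `"1 2\n"` and `as_one_tuple` is `"(1, 2)\n"`, which is the difference the always-wrap behaviour protects against.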