OpenOffice trouble, again, again, again

So I already noted my frustration with OpenOffice yesterday, but today I’m definitely reaching my limit. And I’m going to rant, yes, it’s going to be pure rant about OpenOffice, encompassing some of the most annoying problems I have had with it in the past years. So if you’re one of those people who can’t stand it when other Free Software developers complain about projects that are not perfect, please move along. I’m not going to bless OpenOffice as perfection just because it’s Free; it’ll have to improve for that. Oh, and yes, I know that version 3.2 was just released, and I’ll test that one, but then again I’m not sure that it improves any of this; I’ll have to see for myself, and right now it’s not available in Gentoo for me to try.

So let’s begin with one of the most hyped and, apparently, still incomplete features: interoperability thanks to the OASIS OpenDocument Format. With OpenOffice 2, there should have been total interoperability between Free Software aimed at managing documents, presentations, spreadsheets and so forth. Unfortunately, just after the release of the first two suites using that format, I came across a difference in implementation that caused the two of them to export the same content (lists) in different ways, both allowed by the specification for the format, and yet not equally implemented. So much for interoperability. At the time, I even considered Microsoft’s much more complex solution more feasible; two years after my first rant, the bug was still open, and it was only fixed a year later with OpenOffice 3.1.

Interestingly enough, yesterday Morten Welinder, of Gnumeric fame, posted an unrelated rant that finds its root in the same problem: OpenDocument wasn’t specified nearly as deeply as it should have been to begin with. In this particular case, the problem is even worse, as the lack of a standardised interface for formulae makes it almost useless as a format for complex spreadsheets. On a more related note, I also complained about the formula support, especially the fact that formulae in OpenOffice change function names depending on the language in use, and that the comma/dot change as decimal separator makes it almost impossible to use properly in Italian.

To defend OASIS here, ODT is definitely not the only format designed for interoperability that falls short of it by a long way. For instance, SVG caused me headaches as well, with OpenOffice once again in the mix.

Other problems are intrinsic to the way OpenOffice is developed, I guess, and to the priorities its developers have. I already complained about the attempted imitation of Word in their work toward adding “HTML editor” to the list of features Writer has. I’m not sure how to read the fact that they moved the MediaWiki editor support to an extension… I didn’t even know it had such a feature… wasn’t the point of Wikis to not have to use a full-fledged editor to begin with?

There are definitely a number of minor annoyances with OpenOffice as it is. The fact that to set a colour, anywhere, I first need to get to the options and define it as a custom entry in a palette, rather than having a simple colour picker like any other software (Inkscape, Gimp, …), is just the start. You also need to install extensions to have any kind of document template available, and quite a few of them also look totally broken.

Most annoyances seem indeed to come from the areas of drawing and spreadsheet handling (especially charting). You might have noticed my Ruby-related charts and graphs, but you might not have noticed one bad thing about them. Besides the minor annoyance that I have to copy the graph from Calc to Draw (and thus lose the correlation with the data I’m charting) to be able to export it properly, if I copy the legend out and add text boxes… OpenOffice Draw exports the spell-checker warnings! Look at the images, and notice that the text is zigzag-underlined in red. I couldn’t believe it the first few times.

And today’s problem is due once again to Calc, and to the charts mentioned above: I have a spreadsheet file where I keep the scores for the various implementations, to see the trend in porting (for instance I know that nobody else besides me is handling JRuby support nowadays, as it’s stuck when I don’t touch it).

One of the problems I had before is that area and line charts have different ideas on how to handle empty values. If I chart a 30-line spreadsheet area, which is only half-filled in, with a line chart, the graph stops in mid-air, and that’s it (which works out fine, since it gets automatically extended when I add more lines); on the other hand, if I do the same with an area chart, then it assumes that the empty cells have zero as their value, and will draw a line falling down to the X-axis at the first empty position. Annoying, but at least acceptable.

On the other hand, what is definitely not acceptable to my way of using this is the way it handles the X-axis values! I was away for FOSDEM, and then had a bit of personal trouble, so I stopped gathering data on February 3rd and I resumed yesterday, February 10th. I added the data to the spreadsheet and… the graph didn’t jump, the same distance that applies between February 2nd and 3rd is applied between 3rd and 10th. This is not right. Indeed, Microsoft Excel gets it perfectly right in this case, and this is exactly what I was expecting: keeping proportions. And before you suggest I keep the same value as 3rd for all the missing days, that’s not how it was, and it’s not what I’m interested in charting. I’ll have to see if I can get some other software to deal with that kind of data (interestingly enough, Gnumeric does not seem to have that feature either, but it might be I just don’t know how to use the charting tool there).
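To make the difference concrete, here is a minimal Python sketch of the two ways a chart can place points along the X-axis. It is only an illustration: the function names and the sample dates (including the year) are assumptions, and it does not reproduce how Calc or Excel work internally.

```python
from datetime import date


def category_positions(dates):
    """Place each point one step after the previous one, ignoring gaps."""
    return list(range(len(dates)))


def proportional_positions(dates):
    """Place each point at its distance, in days, from the first date."""
    first = dates[0]
    return [(d - first).days for d in dates]


# Hypothetical sampling days: daily until February 3rd, then a jump to the 10th.
samples = [date(2010, 2, 1), date(2010, 2, 2), date(2010, 2, 3), date(2010, 2, 10)]

print(category_positions(samples))      # evenly spaced, the gap disappears
print(proportional_positions(samples))  # the week-long gap stays visible
```

With the first function the last two points are as close as any other pair; with the second, the distance between the 3rd and the 10th is seven times the distance between the 2nd and the 3rd, which is the behaviour expected above.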

So, should I accept the compromises just because this is Free Software? I don’t know what you think, but I don’t think so. I’ll look for better software; if it’s Free it’ll get extra points, but since there is a boolean gate to the “better” definition (either it does what I need or it does not), right now OpenOffice is definitely not better than anything else for this task (it does not do what I need). It’d be definitely absurd if I ended up relying on Microsoft Excel to provide the graphs for Gentoo’s Ruby porting trends, but right now, that does pass the boolean gate. Please, do provide suggestions! I do want to find better software, and Free Software, to do this.

And to keep on the ranting note, do you remember the “OpenOffice Mouse”? That abomination of design that was announced around the time of the past OpenOffice Conference in Italy? The one that a lot of people – me included – thought and hoped was just a joke, playing on Apple’s then just-unveiled Magic Mouse? Well, I haven’t heard anything about that for a while, besides Luca confirming that it had actually been developed, and even produced. I went to look it up today, as I wanted to add to the rant the fact that instead of getting a better quality product, time was wasted on products like that… and the results are even funnier. The link above, while having “OpenOfficeMouse” in the domain, talks about “OOMouse” (trademark, anybody?) and if you drop the actual page, you end up finding that it’s now the “WarMouse Meta” which, and I quote, “compares to the Magic Mouse, the G9, the Naga, and other multi-button and multi-touch mice”… oh, the irony.

I do support FSFE… it’s positive!

A Free Coffee

I have written before about my concerns regarding the way the Free Software Foundation is working nowadays, and the fact that I feel RMS is taking his role as a “semi-religious” figure (and the whole “Church of Emacs” business) too seriously. On the other hand, I’m happy to be a supporter of Free Software Foundation Europe. I do find the two take pretty different stances on a lot of things.

Before leaving for FOSDEM, I read (and re-dented) Lydia’s link to a post by Joe Brockmeier (of OpenSUSE fame; of course there will be a vocal minority that will see his involvement with OpenSUSE, and thus Novell, as a bad sign to begin with, and I feel happy that I’m not that closed-minded) that summarised quite well my feelings about the way the Free Software Foundation is behaving nowadays. Let me quote Joe:

Update (2017-04-21): Joe’s article is gone from the net and I can’t even find it on the Wayback Machine. Luckily I quoted it here!

It isn’t that the folks at the Free Software Foundation are wrong that DRM is bad for users, it’s that they are taking an entirely negative and counter-productive approach to the problem. Their approach to “marketing” may resonate with some in the FLOSS community, but their efforts are not at all likely to win hearts and minds of users who don’t get out of bed in the morning singing the Free Software Song.

While Defective By Design highlights legitimate problems with the iPad (and other products) where are the alternatives? Stop telling people what they shouldn’t buy, and make it easier for them to get hands on some kit that lets them do what they want to do with free software. In other words, stop groaning about Apple and deliver a DRM [I guess he meant DRM-free here — Flameeyes] device of your own, already.

And I agree with him wholeheartedly (of course as long as my note above is right): we should propose alternatives, and they need to be valid alternatives. When I say that I use an iPod because it has 80GB of storage space on it (well, my current, old version has 80GB, newer versions have 160GB of course), people suggest, as an alternative, that I not carry around so much music. Well, I do want to carry around that much music! If you can get me a player with equivalent disk space and feature set I’d be grateful to get rid of Apple’s lock-ins… while that’s not available, I don’t really care about reducing my music library, as long as I can use it with Rhythmbox and other Free Software tools.

On the other hand, I cannot praise enough one FSFE project in particular: the PDFreaders.org website. Instead of telling the users how bad Adobe is, the site provides them with valid alternatives, specific to their operating system! This includes even the two biggest proprietary operating systems, Windows and Mac OS X. Through this website I was actually able to get more people used to Free Software, as they are glad to use something that is, in many ways, better than Adobe’s own Reader.

As I keep repeating, to bring Free Software to the masses, we need to be able to reach and improve over the quality of proprietary software. We are able to do that, we did so before, and we keep doing so in many areas (it’s definitely not by chance that FFmpeg is one of the most widely used Free Software projects, sometimes even unbeknownst to its users, on the most varied platforms). When we settle for anything less, we’re going to lose. When we say that something is better and everybody should use it just because it’s Free, then we’re deluding ourselves.

I’m not sure what will happen with OpenOffice now that Oracle ate Sun as a snack, but if this brings enough change to the project, it might actually make it really go mainstream. Right now, myself, I feel it has so many holes that it’s not even funny… on the other hand, as I wrote, it has some very important strong points, including the graphing capabilities (not charting!), and of course, the fact that it is Free Software.

(Mis)feature by (mis)feature porting

There is one thing that doesn’t upset me half as much as it should, likely because I’m almost never involved in end-user software development nowadays (although it can be found in back-end software as well): feature-by-feature “ports” (or rather, re-implementations).

Say there is a hugely-known, widely-used piece of proprietary software, and lots of people feel that a free alternative to that software is needed (which happens pretty often, to be honest, and is the driving force for the Free Software movement, in my opinion); you have two main roads, among a gazillion possible choices, that you can take: you can focus on the use cases for the software, or you can re-implement it feature-by-feature. I learnt, through experience, that the former is always better than the latter.

When I talk about experience, I don’t mean the user experience but rather the actual experience of coding such ports. A long time ago, one of my first projects with Qt (3) under Linux was an attempt at porting the ClrMame Pro tool (for Windows); interestingly enough, I cannot find the homepage of the tool on Google, I rather get the usual spam trap links from the search. My reason to try re-implementing that software, at the time, was that I used to be a huge MAME player (with just a couple of ROMs) and that the program didn’t work well under Wine (and the few tries I took at fixing Wine didn’t work out as well as I’d have hoped, although I think a few of my patches made it through to Wine, even if I doubt the code persists today).

Feature-by-feature porting is usually far from easy, especially for closed-source applications, because you try to deduce the internal workings from the external interface (be it user interface or programming interface) and that rarely works out as well as you would like. Given this is often called reinventing the wheel, you should consider this like trying to reinvent the wheel after being given just a cart without wheels, looking at the way they should connect. For open source software, this is obviously easier to do.

Now, while there is so much software out there that makes the same mistake, I’d like to look first at one that, luckily, ended up breaking off from the feature-by-feature idea and started working on a different method, albeit slowly and still being tied too much, in my opinion, to the original concept: Evolution. Those who used the first few versions of Evolution might remember that it clearly, and unbearably, tried to imitate, feature-by-feature, Microsoft Outlook 2000. The same icon pane on the left side, same format for the contacts’ summary, and same modules. The result is … not too appealing, I’d say. As I said, the original concept creeps in today as well, as you still have essentially the same modules: mail, contacts, calendar, tasks and notes, the last two being those that I find quite pointless today (especially considering the presence of Tomboy and GNote). A similar design can be found in KDE’s Kontact “shell” around the separated components of the PIM package.

On the other hand, I’d like to pick up a different, proprietary effort: Apple’s own PIM suite. While they tend to integrate their stuff quite tightly, they have also taken a quite different approach for their own programs: Apple’s Mail, iCal and Address Book. They are three different applications, and they share the information they store with one another (so that you can send and receive meeting invites through Mail, picking up the contacts’ emails), but they have widely different, sometimes inconsistent interfaces when you put one near the other. On the other hand, each interface seems to make sense on its own, and in my opinion ends up faring pretty well on the usability scale. What Apple does not try to do is what Microsoft did, that is forcing the same base graphical interface over a bunch of widely different use cases.

It shouldn’t be surprising, then, that the other case of a feature-by-feature (or in this case, misfeature-by-misfeature) port is again attached to Microsoft from the “origin” end: OpenOffice. Of course, it is true that its original implementation comes from a different product (StarOffice) that didn’t really have the kind of “get the same” approach that Evolution and other projects have taken, I guess. On the other hand, they seem to keep going that way, at least to me.

The misfeature that brought me to write this post today is a very common one: automatic hyperlink transformation of URLs and email addresses… especially email addresses. If I consider the main target output of OpenOffice, I’d expect printed material (communications, invoices, and so on) to be at the top. And in that kind of product you definitely don’t need, nor want, those things hyperlinked; they would not be useful and would be mostly unusable. Even if you do produce PDFs out of it (which do support hyperlinks), I don’t think that just hyperlinking everything with an at-character in it would be a sane choice. As I have been made aware, one of the most likely reasons for OpenOffice to do that is that… Word does. But why does Word do it in the first place?

It’s probably one of two reasons. At the time of Office 2000 (or was it 97? I said 97 before on identi.ca, but thinking about it for a bit, it might have been 2000 instead), Microsoft tried to push Word as a “web editor”: the first amateur websites started to crop up, and FrontPage was still considered much more top-level than Word; having auto-hyperlinking there was obviously needed. The other option dates to about the same time, when Microsoft tried to push Word as … Outlook’s mail editor (do you remember the time when you received mail from corporate contacts that was only an attached .doc file?).

So in general, the fact that some other software has a feature does not really justify implementing that feature in a new one. Find out why the feature would be useful, and then consider it again.

Why natural language interfaces suck

While I’m a fervent proponent of native language support in all kinds of software, which includes not only being able to display and make use of native characters (like the ò character in my surname) but also user interface translation and adaptation for the user’s language, I have a huge beef with what I’ll call “natural language interfaces” in this post.

The most widely known natural language interface is the formula language used by spreadsheet software, like OpenOffice Calc and Microsoft Excel. Since both applications are designed to be used by accountants for the most part, they try not to require of them any kind of generic programming skill. Which seems to still include “no knowledge of English”, even though nowadays I’d expect all of them to have it at their fingertips anyway.

At any rate, the language used for the formulas is not independent from the user’s language: both function names and data formats change depending on the selected language. So not only does the SUM() function become SOMMA() in Italian, the decimal separator character also changes from . to , with the obvious problems tied to that (if they are not obvious to you, comma is still the parameters’ separator as well!). I’m honestly not sure whether internally the two spreadsheets save a generic ID of the function or the name in the local language; I sincerely hope the former, but either way the thing is already quite brain-damaged for me.
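The ambiguity is easy to show with a few lines of Python. This is only a sketch under stated assumptions: the function names are made up for the example, the translation table is a tiny hypothetical one, and nothing here reflects how Calc or Excel actually parse or store formulas.

```python
def parse_decimal(text, decimal_sep):
    """Parse a number written with the given decimal separator."""
    return float(text.replace(decimal_sep, "."))


def naive_split_arguments(arguments, arg_sep=","):
    """Split an argument list on the separator, with no other context."""
    return [piece.strip() for piece in arguments.split(arg_sep)]


# Hypothetical, tiny translation table; a real suite would need far more entries.
ITALIAN_TO_ENGLISH = {"SOMMA": "SUM", "MEDIA": "AVERAGE"}


def canonical_name(localized_name):
    """Map a localized function name to a language-neutral one."""
    upper = localized_name.upper()
    return ITALIAN_TO_ENGLISH.get(upper, upper)


# Were these the two values 1,5 and 2,5 or the four values 1, 5, 2 and 5?
print(naive_split_arguments("1,5,2,5"))
print(parse_decimal("1,5", decimal_sep=","))
print(canonical_name("somma"))
```

Once the comma is both the decimal separator and the argument separator, the splitter has no way to tell the two readings apart, and a formula typed under one language setting only stays readable under another if something like `canonical_name` is applied before saving.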

But you don’t have to go all the way down to programming languages to find places where natural language interfaces do suck. One other example is something so widespread one would probably not think of it: GMail, or Google Mail (why I specify both names will become obvious in a moment). I guess this also counts as a further example of Google’s mediocrity, but I’m not stressing that; it’s one (somewhat smaller) fault in a product that is, otherwise, great, especially for Google.

Now, you might not know – I didn’t either till a few months back – that GMail is not called GMail in Germany; Jürgen explained this to me when he wrote gmaillabelpurger (one heck of a magic tool for me; it has already saved me so much time, especially load time for IMAP access): because of trademark issues they had to fall back to calling it “Google Mail” there, thus creating one further domain (even though users are mapped 1:1 on both, which makes most of the point moot I guess). When the user has registered in Germany, it’s not only the web interface that changes, but also the IMAP folder hierarchy: the [Gmail] prefix in the service folders’ names changes to [Google Mail].

This would only have mattered for the small error I got when I first tried Jürgen’s script (as he wrote it with the German interface in mind) if not for another issue. Using GMail with the default English language selects the “American” variant. And that variant also affects the dates shown in the web interface; and since I don’t usually like dealing with stupid date formats (don’t try to say that mm/dd/yyyy is not stupid!), the other day, when I needed to use it to look up a timeline for work mail messages, I switched the interface to “English UK”, which solved the problem for me at the time.

Fast forward a couple of days and I notice that the script is not behaving as it should, as messages are not deleted; a quick look showed me the problem: Gmail’s IMAP interface is affected by the language settings in the web interface! What it comes down to is that the old Trash folder gets renamed to Bin; d’uh! And even worse, setting the UK variant for the language causes some quite large confusion with the trademarked names: the web interface still reports GMail, but on the other hand, [Google Mail] is used in the IMAP interface. And that’s with me still connecting from an Italian IP address.
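One way for a script to avoid depending on the localized names is to look the folder up by its role rather than by what it is called. The Python sketch below assumes the server reports special-use attributes such as `\Trash` in its LIST responses (as described in RFC 6154); whether that was available on this service at the time of writing is not something the post says, and the sample lines are made up. This is not how gmaillabelpurger works, just an illustration.

```python
import re

# Matches lines such as: (\HasNoChildren \Trash) "/" "[Google Mail]/Bin"
# Simplification: a NIL delimiter and escaped quotes are not handled.
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" "?(?P<name>[^"]*)"?')


def find_folder_by_flag(list_lines, flag):
    """Return the first folder whose attributes include flag, or None."""
    wanted = flag.lower()
    for raw in list_lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        match = LIST_LINE.match(line)
        if match is None:
            continue
        flags = [f.lower() for f in match.group("flags").split()]
        if wanted in flags:
            return match.group("name")
    return None


# Made-up LIST response lines, in the bytes form imaplib returns them.
sample = [
    rb'(\HasNoChildren) "/" "INBOX"',
    rb'(\HasChildren \Noselect) "/" "[Google Mail]"',
    rb'(\HasNoChildren \Trash) "/" "[Google Mail]/Bin"',
]

print(find_folder_by_flag(sample, r"\Trash"))
```

The same lookup works whether the folder shows up as Trash or Bin, and whether the prefix is [Gmail] or [Google Mail], because the name is only ever read, never guessed.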

Now, thanks to Jürgen the script works again and thus my problem is solved. But it really should show that writing interfaces that depend on the language of the user isn’t really an excessively smart move.

I also start to wonder how soon I’ll get used to moving my mail to the bin, rather than trashing it.

And finally, the Portage Tree overhead data

I’m sorry it took so long, but I had more stuff to write about in the meantime, and I’m really posting stuff as it comes, with some pretty randomly ordered things.

In the post about the Portage Tree size I blandly and incompletely separated the overhead due to the filesystem block allocation from the size of the components themselves. Since the whole data was gathered on a night when I was bored and trying to fix up my kernel to have both Radeon’s KMS and the Atheros drivers working, it really didn’t strike me as a complete work, and indeed it was just meant to give some sense of proportion on what is actually using up the space (and as you might have noticed, almost all people involved do find the size, and amount, of ChangeLogs a problem). Robin then asked for some more interesting statistics to look at, in particular the trend of the overhead depending on the size of the filesystem blocks.

This post, which comes after quite some angst, is going to illustrate the results, although they tend to be quite easy to see from the graphs involved. I hope this time the graphs work for everybody out of the box; last time I used Google Docs to produce the output and linked it directly, which saved a lot of traffic on my side, but didn’t work for everybody. This time I’m going to use my blog’s server to publish all the results, hoping it won’t create any stir on it…

First of all, the data; I’m going to publish all the data I collected here, so that you can make use of it in any way you’d like. Please note that it might not be perfect: filesystems aren’t my favourite subject, so while it should be pretty consistent, there might be side-effects I didn’t consider. For instance, I’m not sure whether directories always have the same size, and whether that size is the same for any filesystem out there; I assume both of these to be true, so if I made any mistake you might have to adapt the data a bit.

I also haven’t considered the number of inodes used for each different configuration, and this is because I really don’t know for certain how that behaves, and how to find how much space is used by the filesystem structures that handle inodes’ and files’ data. If somebody with better knowledge of that can get me some data, I might be able to improve the results. I’m afraid this is actually pretty critical to have a proper comparison of efficiency between differently-sized blocks because, well, the smaller the block the more blocks you need, and if you need more blocks, you end up with more data associated with them. So if you know more about filesystems than me and want to suggest how to improve this, I’ll be grateful.

I’m attaching the original spreadsheet as well as the tweaked charts (and the PDF of them for those not having OpenOffice at hand).

Overhead of the Gentoo Tree Size

This first graph should give an idea of how the storage efficiency of the Gentoo tree changes depending on the block size: on the far left you have the theoretical point, 100% efficiency, where only the actual files that are in the tree are stored; on the far right an extreme case, a filesystem with 64KiB blocks… for those who wonder, the only way I found to actually have such a filesystem working on Linux is using HFS+ (which is actually interesting to know, I should probably put the video files I have in such a filesystem…). While XFS supports that in the specs, the Linux implementation doesn’t: it only supports blocks of the same size as a page, or smaller (so less than or equal to 4KiB). I’m not sure why that’s the case; it seems kinda silly, since at least HFS+ seems to work fine with bigger sizes.

With the “default” size of 4KiB (page size) the efficiency of the tree is definitely reduced: it goes down to 30%, which is really not good. This really should suggest to everybody who cares about storage efficiency to move to 1KiB blocks for the Portage tree (and most likely, not just that).
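The efficiency figure boils down to simple arithmetic: every file occupies a whole number of blocks, so a small file wastes most of its last (often only) block. The Python sketch below shows that calculation under the same simplifications noted earlier; it ignores directories, inodes and any filesystem-specific tricks, the function names are made up, and the file sizes are invented rather than taken from the Portage tree.

```python
def allocated_size(file_size, block_size):
    """Bytes taken on disk when a file is stored in whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def efficiency(file_sizes, block_size):
    """Ratio between the bytes of actual data and the bytes allocated."""
    used = sum(file_sizes)
    allocated = sum(allocated_size(size, block_size) for size in file_sizes)
    if allocated == 0:  # only empty files: nothing is wasted
        return 1.0
    return used / allocated


# Invented file sizes, in bytes, just to exercise the functions.
sizes = [300, 1200, 5000, 150]

for block_size in (512, 1024, 4096, 65536):
    print(block_size, round(efficiency(sizes, block_size), 3))
```

Running it over the real list of file sizes in a tree, for each block size of interest, gives one point per block size on a chart like the one described above.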

Distribution of the Gentoo Tree Size

This instead should show you how the data inside the tree is distributed; note that I dropped the 64KiB-blocks case, because the graph would have been unreadable: on such a filesystem, the grand total amounts to just a bit shy of 9GB. This is also why I didn’t go one step further and simulate all the various filesystems to compare the actual used/free space in them, and the number of inodes.

*This is actually interesting: the fact that I wanted to comment on the charts, rather than leaving them to speak for themselves, let me find out that I had made a huge mistake and was charting the complete size and the overhead instead of the theoretical size and the overhead in this chart. But it also says that it’s easier to notice these things in graphical form rather than just looking at the numbers.*

So how do we interpret this data? Well, first of all, as I said, on a filesystem with 4KiB blocks, Portage is pretty inefficient: there are too many small files. Here the problem is not with the ChangeLogs (which still have a non-trivial overhead), but rather with the metadata.xml files (most of them are quite small), the ebuilds themselves, and the support files (patches, config files, and so on). The highest offender for overhead in such a configuration is, though, the generated Portage metadata: the files are very small, and I don’t think any of them is using more than one block. We also have a huge number of directories.

Now, the obvious solution to this kind of problem is, quite reasonably actually, using smaller block sizes. From the efficiency chart you can already see that, without going for the very efficient 512-byte block size (which might run out of inodes), a 1KiB block size yields 70% efficiency, which is not bad, after all, for a compromise. On the other hand, there is one problem with accepting that as the main approach: the default for almost all filesystems is 4KiB blocks (and actually, I think that for modern filesystems that’s also quite a bad choice, since most of the files that a normal desktop user would be handling nowadays are much bigger, which means that maybe even 128KiB blocks would prove much more efficient), so if there is anything we can do to reduce the overhead for that case, without hindering the performance on 512-byte blocks, I think we should look into it.

As others have said, “throwing more disks at it” is not always the proper solution (mostly because while you can easily find how to add more disk space, it’s hard to get reliable disk space). I just added two external WD disks to have a two-level backup for my data…

So comments, ideas about what to try, ideas about how to make the data gathering more accurate and so on are definitely welcome! And so are links to this post on sites like Reddit, which seems to have happened in the past few days, judging from the traffic on my webserver.

Charting and Graphing

You might remember my last post about the Gentoo Portage size which contained some pie charts showing the proportion of space used by the various pieces that make up Portage itself.

On a related note, yes I know that pie charts are often the wrong tool; I still think that for the use case of that post, the pie chart was the best option because I only had to give proportions and not a comparison; as you’ll see though, I’m going to use it sparingly.

Finding a decent way to plot the charts has always been a problem for me, since more than once I would have wanted to give some graphical representation of improvements and benchmarks, but the tools available all have their own set of quirks, which means that I have to fight with them a lot more than I can afford to:

  • gnuplot (which, by the way, does not make pie charts at all), is tremendously complex, to the point that even the quite good book about it does not help to easily make use of it, if you’re not already an expert in statistical analysis;
  • R, if anything, is even worse, to the point I don’t really want to discuss it!
  • using gruff should allow me to draw all the needed charts, given that you can easily represent the values in Ruby; unfortunately it doesn’t really work extremely well, and more than once, both for pie charts and bar charts, I found the colours not properly covering one another, with quite shitty results;
  • using Google Docs, with the spreadsheet component, looked almost good, if it wasn’t for the fact that lots of people have had trouble loading the charts in my previous post; while the Google application is definitely well-designed, especially for what concerns user interface and basic functionalities (just as an example, the ability to move the graph to its own dedicated sheet, which I remember being available on Microsoft Excel 97, is not available in OpenOffice, while it is present in Google’s spreadsheet), it also lacks some more “public” features: there is no way to ask for the graphs of a given size when exporting (for instance for thumbnails), and at the same time, the auto-generated text in the public, exported chart seems to always be in the locale the generating interface was set in… guess what? I have it in Italian;
  • I ended up reconsidering OpenOffice; it worked great with flowcharts so I wanted to see the good “old” suite at work to do something that it should probably be designed for.

Now, since I’m not sure whether I’ll post this before or after the results (I’m writing it before the results’ post, but that’s not saying much to be honest, since I have a queue of posts already written, as usual), I cannot really say much about the results themselves, but my area of analysis this time has been the distribution of sizes and overhead with different block sizes (as suggested by Robin). I used Ruby to gather the data, and I’ve copied it into a large sheet in Calc (reaching column AL). Incidentally, the amount of data to handle is the reason why I didn’t go with Google Docs this time: with Firefox it is definitely too slow to work with; probably it’s designed to be faster with Chrome. Then it was time to condition the data…

OpenOffice definitely has some usability issues in that matter! First of all, when selecting the range of data to plot, there is no easy way to select non-contiguous columns, since once you release the mouse button the interface returns to the chart wizard. The trick is to choose the columns manually, using the form A1:A8,C1:C8 and so forth. I again used Ruby to generate the list of columns for me, otherwise it would definitely have been a mess… I gave up when I had to re-do the graph for the third time because I hadn’t selected some stuff, so I just used another sheet to copy the information I needed, and then filtered out what I wouldn’t be needing.
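Generating that list by script is short work. The original was done in Ruby and is not shown here; the sketch below is a Python equivalent written for illustration, with made-up function names, producing the same A1:A8,C1:C8 form for any set of columns.

```python
def column_letters(index):
    """Convert a 1-based column index to spreadsheet letters (1 is A, 27 is AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def range_list(columns, first_row, last_row):
    """Build a comma-separated list of single-column ranges."""
    return ",".join(
        f"{column_letters(c)}{first_row}:{column_letters(c)}{last_row}"
        for c in columns
    )


# Every other column among the first five, rows 1 to 8.
print(range_list([1, 3, 5], 1, 8))
```

The resulting string can be pasted into the data range field of the chart wizard instead of being typed by hand, which matters once the sheet reaches columns like AL.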

As I noted above, there is no way that I could find, or Google, to create a sheet that only holds a chart. I’m pretty sure that Microsoft Excel 97 had a feature like that… and I’m definitely certain version 2007 has it (because I have a fully licensed Office 2007 here). Google Docs, as I said, has it as well. The reason why I’m upset that it lacks that feature is that it would have made it quite a lot easier to export the charts for publication; instead the only way I found was to copy the chart, and then paste it into a Draw document: at that point, while the chart was still tweakable to reorder columns and stuff like that, I had to re-tweak it every time I noticed a flaw in the data, since it was disconnected from its original data source.

Another area where OpenOffice definitely has to improve is the handling of colours: wherever you select colours, you’re not allowed to freely pick one; you have to add it to the OOo palette first… which in turn requires a restart of OOo itself, since sometimes it fails to pick it up in all the instances. This might not be such a huge deal for most users, who just need “a” colour, but it really is upsetting when you know exactly which colour you want. And indeed it reduces the usability of Draw: for a word processor or a spreadsheet, precise colours might not be that important, but for software like Draw (or Impress), the ability to choose an arbitrary colour without having to jump through a long series of hoops is definitely important!

This is definitely something that sometimes upsets me: OpenOffice has almost all the cards ready to be a perfect poker of productivity software, but there is a number, tending toward infinity, of details that need to be fixed up (just to add another quickly: the fact that the packages, ebuilds included, don’t install the templates by default, and you have to install them from the Sun extensions site, which by the way installs them in your home directory, and I don’t really like that). I really hope that this is going to get fixed in the future, but counting in the go-oo split there is really a lot of mess around OpenOffice, like around a lot of other huge projects (OpenJDK/IcedTea, Mozilla/IceCat/IceWeasel, …). Why can’t Free Software developers really get along together beyond their own itches?

Removing .la files, for dum^W uncertain people

Since I am still fighting with the damned .la files, and I’m pretty sure that even though I have explained some use cases most of my colleagues haven’t really applied them, I decided to go with a different approach this time: graphical guides.

Since the post about the tree size has gotten so much feedback, probably because the graphs impacted on people, this might actually prove useful.

Note: I first tried to draw the chart with Inkscape, but the connector it provides only draws straight lines, which are unusable for stuff like this; I found no way to anchor lines to an arbitrary point of objects either, so I gave up; dia is tremendously bad to work with; kivio 2 is neither in Portage nor available as a binary package for either Windows or OSX; OpenOffice came to the rescue and worked almost flawlessly. Unfortunately I didn’t want to waste time defining customised colours, so you get the bad and boring ones in the image.

As you can see from this graph, my idea is that, in the end, every .la file is removed. Of course this is not immediate and depends on a series of factors; this graph shows at least the basic questions you have to ask yourself when you have to deal with shared libraries. Please note that this does not apply in the same way to plugins, and for those I’ll post another, different flow chart.

  • Does the package install internal libraries only? A lot of packages provide convenience libraries to share code between different executable programs (see this post for more information about it); this can be detected easily: there are no include files installed by default, and the library is installed outside the ld path (in a directory such as /usr/lib/packagename). In this case, the .la files are not useful at all, and can be removed straight away.
  • Does the package only install shared objects? The .la files are only meaningful for static libraries, which carry no dependency information of their own; if a package is not installing static libraries (.a files), it does not need the .la files.
  • Do the libraries in the package need other libraries? If the libraries are standalone, and only depend on the C library (libc.so), then there is no useful dependency information in the .la file, and it can be dropped.
  • Is pkg-config the official way to link to the libraries? When using pkg-config, the dependency information is moved inside the .pc file, so the copy in the .la file is redundant, and thus unnecessary.
  • Is the package new? When adding a new package to Portage, there is no reason to keep the .la files around when the conditions shown above apply. For packages that are already in Portage, the removal of .la files needs to be considered carefully, or you’ll get the same kind of fire I got for trying to remove some (useless) .la files out of the blue. Not a situation that I like, but such is life.
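For those who prefer code to charts, the questions above can be sketched as a small decision function. This is only my reading of the list, written in Python for illustration: the Package fields, the function name and the three answers it returns are hypothetical names, and the final “keep for now” branch is an assumption about what happens when none of the conditions applies.

```python
from dataclasses import dataclass


@dataclass
class Package:
    """Answers to the questions in the list; all field names are illustrative."""
    internal_libraries_only: bool
    installs_static_libraries: bool
    needs_other_libraries: bool
    uses_pkg_config: bool
    already_in_tree: bool


def la_file_advice(pkg):
    """Return an (action, reason) pair for the .la files of a library package."""
    if pkg.internal_libraries_only:
        reason = "internal libraries only, nobody links to them"
    elif not pkg.installs_static_libraries:
        reason = "only shared objects are installed"
    elif not pkg.needs_other_libraries:
        reason = "no dependency information worth keeping"
    elif pkg.uses_pkg_config:
        reason = "dependency information already lives in the .pc file"
    else:
        # Assumption: static libraries with dependencies and no pkg-config
        # are the case where the .la files still carry useful information.
        return ("keep-for-now", "static libraries with dependencies, no pkg-config")

    if pkg.already_in_tree:
        # Existing packages: removal is fine but should not come out of the blue.
        return ("remove-with-care", reason)
    return ("remove", reason)


if __name__ == "__main__":
    new_shared_only = Package(False, False, True, False, False)
    print(la_file_advice(new_shared_only))
```

As said above, this does not cover plugins, which need a different set of questions.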

Who Pays the Price of Pirated Programs

I have to say sorry before all, because most likely you’ll find typos and grammar mistakes in this post. Unfortunately I have yet to receive my new glasses so I’m typing basically blind.

Bad alliteration in the title, it should have been “pirated software” but it didn’t sound as good.

I was thinking earlier today about who is really paying the price of pirated software in today’s world; we all know that the main entity losing from pirated software is, of course, the software’s publisher and developer. And of course most of them, starting from Microsoft, try their best to reverse the game, saying that the cost is mostly on the users themselves (remember Windows Genuine Advantage?). I know this is going to be a flamethrower, but I happen to agree with them nowadays.

Let me explain my point: when you use pirated software, you end up not updating the software at all (’cause you either have no valid serial code, or you have a crack that would go away); and this includes fixes for security vulnerabilities, which often enough, for Windows at least, lead to viruses infecting the system. And of course, the same problem applies, recursively, to antivirus software. And this is without counting the way most of that software is procured (eMule, torrents, and so on; note that I have ethical uses for torrent sites, for which I’d like at least some sites to be kept alive), which is often the main highway for viruses to infect systems.

So there is already a case for staying legit with all the software; there is one more reason why you, a Linux enthusiast, should also make sure that your friends and family don’t use pirated software: Windows (as well as Linux, but that’s another topic) botnets send spam to you as well!

Okay, so what’s the solution? Microsoft – obviously – wants everybody to spend money on their licenses (and in Italy they cost twice as much; I had to buy a Microsoft Office 2007 Professional license – don’t ask – in Italy it was at €622 plus VAT; from Amazon UK it was €314, with VAT reimbursed; and Office is multi-language enabled, so there is not even the problem of Italian vs. English). I don’t entirely agree with that; I think that those who really need to use proprietary software that costs money should probably be paying for it; this will give them one more reason to want a free alternative. All the rest should be replaced with Free (or at least free) alternatives.

So for instance, when a friend/customer is using proprietary software, I tend to replace it along these lines: Nero can be replaced with InfraRecorder (I put this first because it’s the least known); Office with the well-known OpenOffice and Photoshop with Gimp (when there are no needs for professional editing at least).

The main issue here is that I find a lot of Free Software enthusiasts who seem to accept, and foster, pirated software; sorry, I’m not along those lines, at all. And this is because I loathe proprietary software, not because I like it! I just don’t like being taken for a hypocrite.

Inconsistent Scalable Vector Graphics

The one job I’m taking care of at the moment involves me drawing some stuff using SVG in C code, without using any support libraries. Without going into much detail, since I cannot because of an NDA, I can say the generated file has to be as small as possible since, as you might guess by now, it has to be done on an embedded system.

The task itself is not too difficult, but today I started the actual reduction of the code so that it fits in the software that I have to develop, and here the problems start. The first issue was that I was tired of looking up the correct attributes for each SVG element, so I ended up doing the same I did for DocBook 5 and added a new ebuild to Portage: app-emacs/nxml-svg-schemas:1.1, which installs the SVG 1.1 schemas so that Emacs’s nxml-mode can tab-complete the elements and attributes. I positively love Emacs and nXML, since it lets me have support for specific XML variants by just adding their schemas to the system!

A little note now about nXML though: I’ll have to contact upstream because I found one nasty limitation of it: I cannot make it locate the correct schemas on a version basis, which means I won’t be able to provide SVG 1.2 schemas alongside 1.1 with the code as it is; if I can get a new locating rules schema that can also detect the correct schema to use through the version, that’s going to solve not only SVG 1.2 but also future DocBook versions. So this enters my TODO list. Also, am I the only one using nXML in Gentoo? I’m maintaining all three schemas ebuilds; it’s not like it’s a big hassle, but I wonder what would happen if I were to leave Gentoo, or more likely at this point if I were to end up in the hospital again; I hope I’m fine now but one is never sure, and my mindset is pretty pessimistic nowadays.

At any rate, I’ve been studying the SVG specifications to find a way to reduce the useless data in the generated file, without burdening the software with doing manual calculations. The easy way out is to use path and polyline elements to draw most of the lines in the file, which would be fine if it weren’t that they only accept coordinates in “pixels” (which are not actual pixels, but just the basic unit for the SVG file itself). This is not too bad, since you can define a new viewport which can have an arbitrary size in “pixels”, and is stretched over the area. The problem is with supporting the extra viewports.
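To make the viewport trick a bit more concrete, here is a minimal sketch of the kind of file I’m talking about. It is written in Python purely for illustration (the real code is C, is under NDA and is not shown here), and every name and size in it is made up: an inner svg element is placed at 2×2 cm and given a viewBox, so that the polyline inside it can use plain numbers.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def nested_viewport_svg(points, view_width=1000, view_height=500):
    """Return an SVG document with one nested viewport holding a polyline.

    The outer sizes (14cm x 9cm) and the inner placement (2cm, 2cm,
    10cm x 5cm) are arbitrary example values.
    """
    coords = " ".join(f"{x},{y}" for x, y in points)
    return (
        f'<svg xmlns="{SVG_NS}" version="1.1" width="14cm" height="9cm">'
        # The nested svg element establishes a new viewport; its viewBox
        # defines the coordinate system used by the elements inside it.
        f'<svg x="2cm" y="2cm" width="10cm" height="5cm" '
        f'viewBox="0 0 {view_width} {view_height}">'
        f'<polyline points="{coords}" fill="none" stroke="black"/>'
        "</svg></svg>"
    )


if __name__ == "__main__":
    document = nested_viewport_svg([(0, 500), (250, 100), (500, 300), (1000, 0)])
    ET.fromstring(document)  # raises ParseError if the XML is not well-formed
    print(document)
```

The point of the design is that the program only ever emits unitless coordinates inside the inner viewport, and leaves the stretching to whatever renders the file.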

The target of the file to generate is to work on as many systems as possible, but it’s a requirement that it works on Windows with Internet Explorer, as well as Firefox. For SVG files under Internet Explorer there is the old, unmaintained and deprecated Adobe SVG plugin (which is still the default Internet Explorer will try to install) and the examotion Renesis Player which is still maintained. So I take out my test file, and try it.

I wrote the file testing it with eog (I’m not sure which SVG library it uses for the rendering) and with rsvg, which obviously uses librsvg; with those, my test file was perfect. The problem has been with other software, since I got the following results:

  • Inkscape wouldn’t load it properly at all and just drew crazy stuff;
  • batik 1.6 worked;
  • Firefox, Safari and Opera showed me grey and red rectangles rather than the actual lines I wrote in the SVG;
  • Renesis Player showed me lines, but way too thick for what I wanted;
  • OpenOffice showed it with the right dimensions but didn’t translate it 2×2 cm down from the upper left corner like I instructed the svg to.

After reporting the issue on examotion’s tracker, since that is the most important failure in that list for my current requirements, I got a suggestion to switch the definition of font-size to a direct attribute rather than going through style, so as to change the actual svg measure unit. This made no difference for the three implementations that worked before, nor for examotion’s player, but it actually got me one step closer to the desired result:

  • Inkscape still has problems: the white rectangle I draw to get a solid white background is positioned over the rest of the elements, rather than under them like I’d expect, since it’s the first element in the file; it also does not extend the grid like it should, so the viewBox attribute is not properly handled;
  • OpenOffice still has the problem with translation but for the rest seems fine;
  • Safari still has the same problems;
  • Opera 9.6 on Windows finally renders it perfectly, but fails under Ubuntu (?!);
  • Firefox official builds for Windows and OSX, as well as under Ubuntu, work fine; under Gentoo, it does not, and still shows the rectangles;
  • Adobe SVG plugin works fine.

At this point I should have enough working implementations to proceed with my job task, but this actually made me think about the whole thing about SVG, and it reminded me tremendously of the OASIS OpenDocument smoke which I had a fight with more than three years ago. I very much like XML-based technologies for the sake of interoperation, but it’d be nice if the implementations actually had a way to produce a proper result.

Like in OpenDocument, where the specifications allow two different styles for lists, and different software implements just one of them, making themselves incompatible one with the other, SVG defines some particular features that are not really understood or used by some implementations, or can create compatibility issues between implementations.

In this case, it seems like my problem is the way I use SVG subdocuments to establish new viewports, and then use the viewBox feature to change their unit space. This is perfectly acceptable and well described by the specification, but it seems to cause further issues down the line with the measure units inside and outside these subdocuments. But it seems like the problem is not just one-way: from this other bug report on Inkscape you can also see that Inkscape does not generate SVG that is as pure as it should be.

While XML has been properly designed to be extensible, thanks to things like namespaces and similar, one would expect that the features provided by a given format would be used before creating your own extensions; in this case from that bug report you can see (and I indeed double checked that it still is the case) that Inkscape does not use SVG’s own features to establish a correspondence between “SVG pixels” and the size in real-world units of the image; indeed, it adds two new attributes to the main document: inkscape:export-xdpi and inkscape:export-ydpi, while SVG expects you to use the viewBox for providing that information.
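As a rough illustration of how the viewBox carries that information: if the width of an svg element is given in real-world units, dividing it by the width of the viewBox gives the size of one “SVG pixel”. The sketch below is in Python and makes a few simplifying assumptions that are mine alone: it only handles mm, cm and in, only looks at the horizontal axis, and ignores preserveAspectRatio entirely. The function name is made up as well.

```python
import re

# Millimetres per unit, for the few absolute units handled by this sketch.
UNIT_IN_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}


def user_unit_in_mm(width, view_box):
    """Return the horizontal size in millimetres of one viewBox unit.

    width is a length such as "10cm"; view_box is the attribute value,
    such as "0 0 1000 500".
    """
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)(mm|cm|in)", width.strip())
    if match is None:
        raise ValueError(f"unsupported width: {width!r}")
    value, unit = float(match.group(1)), match.group(2)

    # viewBox is "min-x min-y width height", separated by spaces or commas.
    numbers = [float(v) for v in view_box.replace(",", " ").split()]
    if len(numbers) != 4 or numbers[2] <= 0:
        raise ValueError(f"invalid viewBox: {view_box!r}")

    return value * UNIT_IN_MM[unit] / numbers[2]


if __name__ == "__main__":
    # A 10cm wide viewport mapped onto 1000 units.
    print(user_unit_in_mm("10cm", "0 0 1000 500"))
```

With a 10cm wide viewport and a viewBox 1000 units wide, each unit is a tenth of a millimetre, with no extra attribute involved.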

Sigh, I just wished to get my graph working.

Working with .NET, OpenOffice, and Mono

So I did hint at a job having to do with Mono that I was commissioned. It’s actually quite a simple thing that by itself has no relationship with Mono, to be honest.

Basically, this friend of mine is self-employed, and sometimes has to prepare a budget for customers, to tell them how much a given service would cost them. I originally wrote such a piece of software for him a few years ago, using Java, producing an HTML page that could be printed for the customers, and archived in a Firebird database. Unfortunately he lost the database and decided not to go with that, since he preferred having the page published in a different way to file away.

After a year without my software he decided that, yeah, it’s a good idea to have a database for this, and asked me to change the software around. Since he now runs Vista, I’m tempted to use something different this time, something that also has a more suitable look and feel, which is one thing he complained about before, since the Java application didn’t look native. Of course the choice here is .NET, and as for how to handle the printed result, OpenOffice seems to have everything I need: I just need to generate a document with the right fields filled in, and then ask OpenOffice to convert it to PDF for filing or mailing, and so forth. The good thing is that he’s already using OpenOffice.

Now I just need to find out how to interface .NET (and Mono on Linux for development) with OpenOffice, and see if I can actually command it like I need to. The alternative would be to create macros directly in OpenOffice, but I guess it’d be a bit of a mess for me to write code in that language, while C# I can work with quite nicely nowadays.

I guess that if I can make it generic enough, it’d be a nice thing to have around as Free Software, at least as a sample.