Wondering about feeds

You might have noticed in the past months a series of issues with my presence on Planet Gentoo. Sometimes posts didn’t appear for a few days, then there were issues with entries figuratively posted in the future, and a couple of bouts of planet spam made my posts quite obnoxious to many. I didn’t like it either; it seems I had some problems with Typo when I moved from lighttpd to Apache, and then there were issues with Planet and its handling of Atom feeds. Now these problems should be solved: Planet has moved to the Venus software, and it uses the Atom feeds again, which are much more easily updated.

But this is not my topic today; today I wish to write about how you can really mess things up with XML technologies. Yesterday I wanted to prepare a feed for the news on the xine website so that it could be shown on Ohloh too. Since the idea is to use static content, I wanted to generate the feed with XSLT, starting from the same data used to generate the news page. Not too difficult actually; I do something similar for my website as well.

But, since my website only needs to sort-of work, while the xine site needs to actually be usable, I decided to validate the generated content using the W3C validator; the results were quite bad. Indeed, the content in an RSS feed needs to be either escaped or plain text; no raw XHTML is allowed.

So I turned to check Atom, which is supposedly better at these things, and is already being used for a lot of other stuff as well. That really looks like XML technology for once, using the thing that actually makes it work nicely: namespaces. But if I look at my blog’s feed I do see a very complex XML file. I gave up on it for a while and went back to RSS, but while the feed is simple around the entries, the entries themselves are quite a bit to deal with, especially since they require the RFC 822 date format, which is not really the nicest thing to handle (for one, it expects day and month names in English, and it’s far from easy for a machine to parse it back into a generic date that can be rendered in the feed user’s locale).
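
Just to illustrate the point about RFC 822 dates: even producing one correctly from C requires pinning the time locale, since strftime() otherwise emits localised day and month names. This is only a small sketch to show the annoyance, not code from the site:

#include <locale.h>
#include <stdio.h>
#include <time.h>

int main(void) {
  char buf[64];
  time_t now = time(NULL);
  struct tm tm;

  gmtime_r(&now, &tm);

  /* %a and %b follow the current locale, but RFC 822 wants the English
   * abbreviations, so the locale has to be forced to "C" (or the names
   * hardcoded) before formatting the pubDate field */
  setlocale(LC_TIME, "C");
  strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S +0000", &tm);
  puts(buf);
  return 0;
}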

I reverted to Atom, created a new ebuild for the Atom schema for nxml (which, by the way, fails at allowing auto-completion in XSL files; I need to contact someone about that), and started looking at what is strictly needed. The result is a very clean feed which should work just fine for everybody. The code, as usual, is available in the repository.

As soon as I have time I’ll look into switching my website to also provide an Atom feed rather than an RSS feed. I’m also considering the idea of redirecting the requests for the RSS feed on my blog to Atom, if nobody gives me a good reason to keep RSS. I have already hidden them from the syndication links on the right, which now only present Atom feeds, and those are already requested more than the RSS versions. For the ones who can’t see why I’d like to standardise on a single format: I don’t like redundancy where it’s not needed, and in particular, if there is no practical need to keep both, I can reduce the amount of work done by Typo by hiding the RSS feeds and redirecting them from within Apache rather than letting them hit the application. Considering that Typo creates feeds for each one of the tags, categories and posts (the latter I already hide and redirect to the main feed, since they make no sense to me), it’s a huge amount of requests that would be merged.

So if somebody has reasons for which the RSS feeds should be kept around, please speak now. Thanks!

Warning: noise!

Compiler warnings are one of the most important features of modern optimising compilers. They replaced, for the most part, the use of the lint tool to identify possible errors in source code. Ignoring too many warnings usually means that the software hasn’t been looked at and properly polished before release. More importantly, warnings can be promoted to errors with time, since they often point to obsolete code structure or unstable code. An example of this is the warning about the size mismatch between integers and pointers, which became an error with the GCC 3.4 release, to make sure that code would run smoothly on 64-bit machines (which were starting to become much more common).

Usually, a new batch of warnings is added with each minor or major GCC release, but for some reason the latest micro release (GCC 4.3.3) enabled a few more of them by default. This is usually a good sign, since a stricter compiler usually means that code improves with it. Unfortunately I’m afraid the GCC developers have been a little too picky this time around.

You might remember how the most important warnings cannot become errors, even though support for turning particular warnings into errors is present in recent GCC releases. But before that I also had another problem with GCC warnings, in particular warnings against system headers.

That latter problem was quite nasty, since by default GCC would hide from me cases where my code defined some preprocessor macro that the system headers were also going to define (maybe with different values). For that reason, I started building xine-lib with -Wsystem-headers to make sure that these cases could be handled. It wasn’t much of a problem, since the system headers were always quite clean, without throwing around warnings and similar. But this is no longer the case: with GCC 4.3.3 I started to get a much lower signal-to-noise ratio, since a lot of internal system headers started throwing warnings like “no previous prototype”. This is not good and really shows how glibc and GCC aren’t really well synchronised (similarly, readline and gdb, but that’s a story for another day).

So, okay, I disabled -Wsystem-headers since it produces too much noise, and went looking for the remaining issues. The most common problem in xine-lib that the new GCC shows is that all the calls to asprintf() ignore the return value. This is technically wrong, but it should lead to few problems there, since a negative value from asprintf means that there is not enough memory to allocate the string, and thus most likely the program would want to abort. As it is, I’m not going to give them too much weight, but rather remind myself I should branch xine-lib-1.2 and port it to glib.
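
For reference, the “proper” fix is straightforward; a minimal sketch (the function name is made up, this is not actual xine-lib code) just checks the return value and treats a negative result as an allocation failure:

#define _GNU_SOURCE
#include <stdio.h>

char *describe_bitrate(int kbps) {
  char *desc;

  /* asprintf() returns a negative value when it cannot allocate the
   * string; in that case desc is left undefined, so don't use it */
  if (asprintf(&desc, "bitrate: %d kbit/s", kbps) < 0)
    return NULL;

  return desc;
}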

Unfortunately, while the “warn about unused return value” idea is quite good for many functions, it is not for all of them. And somehow the new compiler also ignores the old workaround to shut the warning up (a cast to void of the value returned by the function). While, again, it is technically good to have those warnings, sometimes you just don’t care whether a call succeeded or not, because either way you’re going to proceed in the same way, so you don’t spend time checking the value (usually because you check it somehow later on). What the “warn about unused return value” should really point out is whether there are leaks due to allocation functions whose return value (the address of the newly-allocated memory area) is ignored, since there is no way you can ignore that without introducing an error.
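
To make the point about the ignored workaround concrete, here is a minimal self-contained example (not from xine-lib) using the same attribute that glibc’s __wur macro expands to; the cast to void does not silence the warning here:

__attribute__((warn_unused_result))
static int try_something(void) {
  return -1;
}

int main(void) {
  /* the traditional idiom to say "I really don't care": cast the
   * result to void. With warn_unused_result the compiler still emits
   * the unused-result warning for this line. */
  (void)try_something();
  return 0;
}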

One quite stupid place where I have seen the new compiler throw a totally useless warning is related to the nice() system call; this is a piece of code from xine-lib:

#ifndef WIN32
  /* nice(-value) will fail silently for normal users.
   * however when running as root this may provide smoother
   * playback. follow the link for more information:
   * http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/multimedia_sim/
   */
  nice(-1);
#endif /* WIN32 */

(don’t get me started with the problems related to this function call, it’s not what I’m concerned with right now).

As you can see there is a comment about failing silently, which is exactly what the code wants: use it if it works, don’t if it doesn’t. Automagic, maybe, but I don’t see a problem with that. With the new compiler, though, this throws a warning because the return value is not checked. So it’s just a matter of checking it and possibly logging a warning to make the compiler happy, no? It would be, if it weren’t for the special case of the nice() return value.

On success, the new nice value is returned (but see NOTES below). On error, -1 is returned, and errno is set appropriately.

[snip]

NOTES

[snip]

Since glibc 2.2.4, nice() is implemented as a library function that calls getpriority(2) to obtain the new nice value to be returned to the caller. With this implementation, a successful call can legitimately return -1. To reliably detect an error, set errno to 0 before the call, and check its value when nice() returns -1.

So basically the code would have to morph into something like the following:

#ifndef WIN32
  /* nice(-value) will fail silently for normal users.
   * however when running as root this may provide smoother
   * playback. follow the link for more information:
   * http://cambuca.ldhs.cetuc.puc-rio.br/~miguel/multimedia_sim/
   */
  errno = 0;
  res = nice(-1);
  if ( res == -1 && errno != 0 )
    lprintf("nice failed");
#endif /* WIN32 */

And just for the sake of argument, the only error that may come out of the nice() function is a permission error. While it certainly isn’t an enormous amount of code needed for the check, it really is superfluous for software that is just interested in making use of the call if it has the capability to do so. And it becomes even more pointless to bother with this when you consider that xine is a multithreaded program, and that the errno interface is… well… let’s just say it’s not the nicest way to deal with errors in multithreaded software.

What’s the bottom line of this post? Well, I think that the new warnings added with GCC 4.3.3 aren’t bad per se, they actually are quite useful, but just dumping a whole load of new warnings on developers is not going to help, especially when there are so many situations where you would just be adding error messages to the application that might never be read by the user. The number of warnings raised by a compiler should never be so high that the output is filled with them, otherwise the good warnings will just get ignored and software will not improve.

I think for xine-lib I’ll check out some of the warnings (I at least found some scary code in one situation) and then I’ll locally disable the warn-unused-return warning, or sed away the warnings about the ignored asprintf() and nice() return values. In either case I’m going to have to drop some true positives to avoid having to deal with a load of false positives I shouldn’t be caring about. Not nice.

Update (2017-04-28): I feel very sad to have found out over a year and a half later that Michael died. The links in this and other posts to his blog are now linked to the archive kindly provided and set up by Jan Kučera. Thank you, Jan. And thank you, Michael.

Distributed SCM showdown: GIT versus Mercurial

Although I admit it’s tempting, I’m not going to enter the mess of complaints (warranted or not) about GIT that have found a place on Planet GNOME. I don’t intend to go into what my issues are with bzr either, since I think I exposed them already. I’m going to comment on a technical issue I have with Mercurial, and show why I find GIT more useful, at least in that case.

If you remember, xine moved to Mercurial almost two years ago. The choice of Mercurial at the time was pushed because it seemed much more stable (git has indeed had a few big changes since then), it was already being used for gxine, and it had better multi-platform support (running git on Solaris at the time was a problem, for instance). While I don’t think it’s (yet) time to reconsider, especially since I haven’t been active in xine development for so long that my opinion wouldn’t matter, I’d like to share some insight about the problems I have with Mercurial, or at least with the Mercurial version that Alioth is using.

Let’s not start with the fact that hg does not seem to play too well with permissions, and the fact that we have a script to fix them on Alioth to make sure that we can all push to the newly created repositories. So if you think that setting up a remote GIT repository is hard, please try doing so with Mercurial, without screwing permissions up.

For what concerns the command line interface, I agree that hg follows the principle of least surprise more closely, and indeed has an interface much more similar to CVS/SVN than git has. On the other hand, it requires quite a bit of wandering around to do stuff like git rebase, and it requires enabling extensions that are not enabled by default, for whatever reason.

The main problem I have with hg, though, is the lack of named branches. I know that newer versions should support them, but I have been unable to find documentation about them, and anyway Alioth is not updated so it does not matter yet. With the lack of named branches, you basically have one repository per branch; while that makes it easier to deal with multiple build directories, it becomes quite space-hungry, since the changesets are not shared between these repositories, while they are in git (if you clone one linux-2.6 repository, then decide you need a branch from another developer, you just add that remote and fetch it, and it’ll download the minimum amount of changesets needed to fill in the history, not a whole copy of the repository).

It also makes it much more cumbersome to create a scratch branch before doing more work (even more so because you lack a single-command rebase and you need to update, transplant and strip each time), which is why sometimes Darren kicked me for pushing changes that were still work in progress.

In git, since the changesets are shared between branches, a branch is quite cheap and you can branch N times almost without feeling it; with hg, it’s not that simple. Indeed, now that I’m working on a git mirror for the xine repositories I can show you some interesting data:

flame@midas /var/lib/git/xine/xine-lib.git $ git branch -lv
  1.2/audio-out-conversion   aafcaa5 Merge from 1.2 main branch.
  1.2/buildtime-cpudetection d2cc5a1 Complete deinterlacers port.
  1.2/macosx                 e373206 Merge from xine-lib-1.2
  1.2/newdvdnav              e58483c Update version info for libdvdnav.
* master                     19ff012 "No newline at end of file" fixes.
  xine-lib-1.2               e9a9058 Merge from 1.1.
flame@midas /var/lib/git/xine/xine-lib.git $ du -sh .
34M	.

flame@midas ~/repos/xine $ ls -ld xine-lib*   
drwxr-xr-x 12 flame flame 4096 Feb 21 12:01 xine-lib
drwxr-xr-x 13 flame flame 4096 Feb 21 12:19 xine-lib-1.2
drwxr-xr-x 13 flame flame 4096 Feb 21 13:00 xine-lib-1.2-audio-out-conversion
drwxr-xr-x 13 flame flame 4096 Feb 21 13:11 xine-lib-1.2-buildtime-cpudetection
drwxr-xr-x 13 flame flame 4096 Feb 21 13:12 xine-lib-1.2-macosx
drwxr-xr-x 12 flame flame 4096 Feb 21 13:28 xine-lib-1.2-mpz
drwxr-xr-x 13 flame flame 4096 Feb 21 13:30 xine-lib-1.2-newdvdnav
drwxr-xr-x 13 flame flame 4096 Feb 21 13:50 xine-lib-1.2-plugins-changes
drwxr-xr-x 12 flame flame 4096 Feb 21 12:53 xine-lib-gapless
drwxr-xr-x 12 flame flame 4096 Feb 21 13:56 xine-lib-mpeg2new
flame@midas ~/repos/xine $ du -csh xine-lib* | grep total
805M	total
flame@midas ~/repos/xine $ du -csh xine-lib xine-lib-1.2 xine-lib-1.2-audio-out-conversion xine-lib-1.2-buildtime-cpudetection xine-lib-1.2-macosx xine-lib-1.2-newdvdnav  | grep total
509M	total

As you might guess, the contents of ~/repos/xine are the Mercurial repositories. You can see the size difference between the two SCMs. Sincerely, even though I have tons of space, on the server I’d rather keep git than Mercurial.

If some Mercurial wizard knows how to work around this issue, I might reconsider it; otherwise, for the future it’ll always be git for me.

The xine website: design choices

After spending a day working on the new website I’ve been able to identify some of the problems and design some solutions that should produce a good enough result.

The first problem is that the original site not only used PHP and a database, but also misused them a lot. The usual way to use PHP to avoid duplicating the style is to have a generic skin template, and then one or more scripts per page that get included in the main one depending on parameters. This usually results in a mostly-working site that, while doing lots of work for nothing, still does not hog down the server with unneeded work.

In the case of xine’s site, the whole thing loaded either a static HTML page that got included, or a piece of PHP code that would define variables which, care of the main script, would then be substituted into a generic skin template. The menu would also not be written once, but generated on the fly for each page request. And almost all the internal links in the pages were generated by a function call. Adding the padding to the left side of the menu entries for sub-pages was done by creating (with PHP functions) a small table before the image and text that formed the menu link. In addition to all this, the SourceForge logo was cycling on a per-second basis, which meant that a user browsing the site would load about six different SourceForge images into the cache, and that no two requests would have gotten the same page.

The download, release, snapshots and security pages loaded the data on the fly from a series of flat files that contained some metadata about them, and then produced the output you’d have seen. And to add client-side time waste to what was already a time waste on the server side, the changes in shade of the left-hand menu were done using JavaScript rather than the standard CSS2 :hover option.

Probably because of the bad way the PHP code was written, the site had all crawlers stopped by robots.txt, which is a huge setback for a site aiming to be public. Indeed, you cannot find it in Google’s cache because of that, which meant that last night I had to work with the Wayback Machine to see how the site appeared earlier. And the snapshot was from one year ago, not what we had a few weeks ago. (This has since stopped being a problem, since Darren gave me a static snapshot of the thing as seen on his system.)

To solve these problems I decided a few things for the new design. First of all as I’ve already said it has to be entirely static after modification, so that the files served are just the same for each request. This includes removing visit counters (who cares nowadays, really), and the changing SourceForge logo. This ensures that crawlers and users alike will see the exact same content over time if it doesn’t change, keeping caches happy.

Also, all the pages will have to hide their extensions, which means that I don’t have to care whether the page is .htm, .html or .xhtml. Just like on my site, all the extensions will be hidden, so even a switch to a different technology will not invalidate the links. Again, this is for search engines and users alike.

The whole generation is done with standard XSLT, without implementation-dependent features, which means it’ll work with libxslt just as well as with Saxon or anything else, although I’m going to use libxslt for now since that’s what I’m using for my site as well. By using standard technologies it’s possible to reuse them in the future without relying on particular versions of libraries and similar. And thanks to the way XSLT has been designed, it’s very easy to decouple the content from the style, which is exactly what a good site should do to be maintainable for a long time.

Since I dislike custom solutions, I’ve been trying very hard to avoid using custom elements and custom templates outside the main skin; the idea is that XHTML usually works by itself, and adding a proper CSS will take care of most of the remaining stuff. This isn’t too difficult once you get around the problem that the original design was entirely based upon tables rather than proper div elements, but the whole thing has been manageable.

Besides, with this method adding a dynamically-generated (but statically-served) sitemap is also quite trivial, since it’s just a different stylesheet applied over the same general data for the rest of the site.

Right now I’m still working on fixing up the security page, but the temporary, not-yet-totally-live site is available for testing, and the repository is also accessible if you wish to see how it’s actually implemented. I’ve actually added some sophistication to the xine site that I didn’t use for my own, but that will come with time.

The site does not yet validate properly, but the idea is that it will once it’s up; I “just” need to get rid of the remaining usage of tables.

Too much web content

I’m not much of a web development person. I try to keep my fiddling with my site to a minimum and I focus most of my writing on this blog so that it’s all kept in the same place. I also try not to customise my blog too much beyond not having it appear like any other Typo-based blog (the theme is actually mostly custom). For the design of both the site and the blog I relied on OSWD and adapted the designs found there.

I also tend not to care about web services, web applications and all that related stuff; it’s out of my sphere, and I also try not to comment about web-centric news since I sincerely don’t care. But unfortunately, like most developers out there, I often get asked about possibilities for web applications, site development and so on and so forth.

For this reason, I’ve come to be quite opinionated, and probably at odds with the majority of those who “shape” the net as it is now.

One of my opinions is that you shouldn’t use on-request generated pages for static content, which is what most sites do, with CMSs, wikis, no-comment blogs and stuff like that. The only reasons why I’m using a web application for my blog are that, first of all, I happen to write entries while I’m on the go, and second, I allow user comments, which is what makes it a blog rather than a crappy website. If I didn’t allow comments, I would have no reason to use a web application and could probably just do with a system fetching the entries from an email account.

Another opinion is that you shouldn’t reinvent the wheel because it’s cool. I’m sincerely tired of the amount of sites that include social networking features like friendship and similar. I can understand it when it’s the whole idea of the site (Facebook, FriendFeed), but do I care about it on sites like Anobii? (On the other hand, I’m glad that Ohloh does not have such a feature.)

I’ve been asked at least three times about developing a website with social networking features, with friendship and the like, and two out of three times the goal of the project was “making money”. Sure, okay, keep on trying.

Every other site out there has a CMS to manage the news entries, which could also be acceptable when you have a huge archive and the ability to search through it, but do I need to know what time it is right now? I have a computer in front of me, I can check it on that (unless of course I’m looking to find out whether it’s actually correctly synchronised). Does every news or group site have to have a photo gallery with its own software on it? There are things like Picasa and Flickr too.

But one thing I sincerely loathe is all the sites that are set up with Trac or MediaWiki to provide some bit of content that rarely needs to be edited. Even the FreeDesktop.org site is basically one big huge wiki with the developers having write access. Why, I don’t know, since you could easily make the thing use DocBook and process the files with a custom stylesheet to produce the pages shown to the user. It’s not like this is overly complex. Especially when just a subset of the people browsing the site have access to edit it.

Similarly, I still wonder why every other WordPress blog requires me to register with the main WordPress site to leave comments. I can understand Blogger and LiveJournal requiring a login either with them or with OpenID (and I use my Flickr/Yahoo OpenID for that), but why should I do that on a per-site basis, repeatedly?

But even counting that in, I’m tired of the amount of sites that just duplicate information. Why did xine’s site have its own “security advisory” kind of thing? It’s not like we’re a distribution. Thankfully, Darren started just using the assigned CVE numbers a few years ago, so there is no further explosion of pages. Hopefully, I can cut out some of the pointless content of the site to reduce it.

In the day of I-do-everything sites, I’m really looking forward to smaller, tighter sites that only provide the information they have to, instead of duplicating it over and over again. The good web is the light web.

The xine website: intro

As it turns out, the usual xine website went offline a few days ago. Since then, Darren set up a temporary page on the SourceForge.net servers, and I’ve changed the redirect of xine-project.org, which is now sorta live with the same page that there was on SourceForge.net, and the xine-ui skins ready to be downloaded.

Since this situation cannot be left to linger for long, I’ve decided to take up the task of rebuilding the site on the new domain I’ve acquired to run the Bugzilla installation. Unfortunately the original site (which is downloadable from the SourceForge repositories) is written in PHP, with use of MySQL for user polling and news posting, and the whole thing looks like a contraption I don’t really want to run myself. In particular, the site itself is pretty static; the only real use of PHP on it is not having to write boilerplate HTML for each release, but rather write a file describing them, which is something I used to do myself for my site.

Since having a dynamic website for static content is far from my usual work practices, I’m going to do just what I did for my own website: rewrite it in XML and use XSLT to generate the static pages to be served by the webserver. This sounds complex but it really isn’t, once you know the basic XML and XSLT tricks, which I’ve learnt, unfortunately for me, with time. On an interesting note, when I worked on my most complex PHP project, which was a custom CMS – when CMSs weren’t this widespread! – for an Italian gaming website, now dead, I already looked into using XSLT for the themes, but at the time the support for it in PHP was almost never enabled.

I’m still working on it and I don’t count on being able to publish it this week, but hopefully once the site is up again it’ll be entirely static content. And while I want to keep all the previously-available content, and keep the general design, I’m going to overhaul the markup. The old site is written mostly using tables, with very confused CSS and superfluous spacer elements. It’s not an easy task but I think it’s worth doing, especially since it should be much more usable for mobile users, of which I’m one from time to time.

If I find some interesting technicality while preparing the new website I’m going to write it here, so keep reading if you’re interested.

More tinderboxing, more analysis, more disk space

Even though I had a cold I’ve kept busy in the past few days, which was especially good because today was most certainly a Monday. For the sake of mental sanity, I decided a few months ago that the weekend is off work for me, and Monday is dedicated to summing up what I’m going to do during the rest of the week, sort of a planning day. Which usually turns out to mean a lot of reading and very little action and writing.

Since I cannot sleep right now (I’ll have to write a bit about that too), I decided to start with the writing to make sure the plans I figured out will be enacted this week. Which is especially worth doing considering I also had to spend some time labelling, as usual this time of the year. Yes, I’m still doing that, at least until I can get a decent stable job. It works and helps pay the bills at least a bit.

So anyway, you might have read Serkan’s post regarding the java-dep-check package and the issues that it found once run on the tinderbox packages. This is probably one of the most interesting uses of the tinderbox: large-scale testing of packages that would otherwise keep such a low profile that the issues would never come out. To make more of a point, the tinderbox is now running with the JAVA_PKG_STRICT variable set, so that the Java packages will have extra checks and will be much more safely tested on the tree.

I also wanted to add further checks for bashisms in configure scripts. This sprouted from the fact that, on FreeBSD 7.0, the autoconf-generated configure script no longer discards the /bin/sh shell. Previously, the FreeBSD implementation was discarded because of a bug, and thus the script re-executed itself using bash instead. This was bad (because bash, as we know really well, is slow) but also good (because then all the scripts were executed with the same shell on both Linux and FreeBSD). Since the bug is now fixed, the original shell is used, which is faster (and thus good); the problem is that some projects (unieject included!) use bashisms that will now fail. Javier spent some time trying to debug the issue.

To check for bashisms, I’ve used the script that Debian makes available. Unfortunately the script is far from perfect. First of all, it does not really have an easy way to just scan a subtree for actual sh scripts (using egrep is not totally fine since autoconf m4 fragments often have the #!/bin/sh string in them), which forced me to write a stupid, long and quite faulty script to scan the configure files.

But even worse, the script is full of false positives: instead of actually parsing the semantics, it only scans for substrings. For instance, it identified the strange help output in gnumeric as a bash-specific brace expansion, when it was in a HEREDOC string. Instead of this method, I’d rather have a special parameter in bash that tells the interpreter to output warnings about bash-specific features; maybe I should write it myself.

But I think that there are some things that should be addressed in a much different way than the tinderbox itself. As I have written before, there are many tests that should actually be executed on source code, like static analysis of the source code, and analysis of configure scripts to fix issues like canonical targets when they are not needed, or misaligned ./configure --help output, and so on and so forth. This kind of scan should not be applied only to released code, but more importantly to the code still in the repositories, so that the issues can be killed before the code is released.

I had this idea when I went to look for different conditions on Lennart’s repositories (which are, as usual, available in my own repositories with changes, fixes and improvements on the build system – a huge thanks to Lennart for allowing me to be his autotools-meister). By build-checking his repositories before he makes a release, I can ensure the released code works for Gentoo just fine, instead of having to patch it afterwards and queue the patch for the following release. It’s the step beyond upstreaming the patches.

Unfortunately this kind of work is not only difficult because it’s hard to write static analysis software that gets good results; the US DHS-funded Coverity Scan, although lauded by people like Andrew Morton, had tremendously bad results in my opinion with the xine-lib analysis: lots of issues were never reported, and the ones reported were often enough either false positives or inside the FFmpeg code (which xine-lib used to import); and the analysis was almost never updated. If it didn’t pick up the change to the Mercurial repository, that would have been understandable – I don’t expect them to follow the repository moves of all the projects they analyse – but the problem was there since way before the move. It also reported the same exact problems every single day, repeated over and over; for a while I tried to keep track of them and marked the ones we had already dealt with, or which were false positives, or were parts of FFmpeg (and may even have been fixed already).

So one thing to address is to have an easy way to keep track of various repositories and their branches, which is not so easy since all SCM programs have different ways to access the data. Ohloh (now Open Hub) probably has lots of experience with that, so I guess that might be a start; it has to be considered, though, that it only supports the three “major” SCM products – GIT, Subversion and the good old CVS – which means that extending it to any repository at all is going to take a lot more work, and it has had quite a bit of problems with accessing Gentoo repositories, which means that it’s certainly not fault-proof. And even if I were able to hook up a similar interface on my system, it would probably require much more disk space than I’m able to have right now.

For sure now the first step is to actually write the analysis script that first checks the build logs (since anyway that would already provide some results, once hooked up with the tinderbox), and then find a way to identify, through static analysis of the source code, some of the problems we most care about in Gentoo. Not an easy task, nor something that can be done in spare time, so if you’ve got something to contribute, please do; it would be really nice to get the pieces of the puzzle together.

For A Parallel World. Home Exercise n.1: a drop-in dynamic replacement for memcpy()

Since I’ve written about OpenMP I’ve been trying to find time to test it on real usage scenarios; unfortunately between health being far from optimal the past few days with general aches, and work piling up, I haven’t been able to get to work on it at all. It would be much nicer if I could get a job that would allow me to spend time on these things, but I don’t want to rant about that since I’m happy to have jobs from time to time, as it is.

Yesterday I toyed around a bit with OpenMP and xine-lib: I wanted to try implementing a memcpy() replacement that could use OpenMP to work in parallel; it’s especially useful for multimedia applications. Besides some issues with autotools and OpenMP, which I’m going to address in a future post, I ended up with a few more things on my mind (the usual problem with trying out new things: you want to achieve one result, and you get material for three other tests; now I know why Mythbusters starts with one idea and then ends up doing four or five similar tests).

My parallel memcpy() replacement was just as lame as my earlier byteswapping attempt: a single for loop parallelised with the proper pragma. Just to make it not too lame, I used 64-bit copies (unaligned, but I would expect that not to matter on x86-64 at least; it was just a test). The reason why I didn’t go for a less lame method is that from a second test on byteswapping, which I haven’t had time to write about yet, I found that using more complex tricks does not really help. Splitting the memory area to swap into X equally-sized areas, with X being the maximum number of threads OpenMP is managing, identified dynamically, caused a slight improvement on the huge areas (half a gigabyte and a full gigabyte), but it didn’t make any appreciable difference (considering the cost of the more complex code) on smaller blocks, and I really doubt that such huge memory areas would ever get swapped all at once. Splitting the area into page-sized (4KiB) blocks actually made the code slower; I guess (since I didn’t dig deeper to check) that the problem is that the threads are usually all executed on the same core, or anyway on the same CPU, which means that the pages are all mapped on the memory directly connected to that CPU; splitting it up in pages might make it bounce between the two different CPUs and thus make it slower. I’ll look more deeply into that when I have time.
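
For the curious, a sketch of what such a lame replacement might look like is below; this is not the exact code I tested, and it assumes unaligned 64-bit loads and stores are acceptable on the target (as they are on x86-64) and that the areas do not overlap:

#include <stdint.h>
#include <string.h>
#include <sys/types.h>

void *omp_memcpy(void *to, const void *from, size_t len) {
  uint64_t *dst = to;
  const uint64_t *src = from;
  const size_t words = len / sizeof(uint64_t);
  ssize_t i;

  /* one parallel for over 64-bit words; OpenMP splits the iterations
   * among the available threads */
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (i = 0; i < (ssize_t)words; i++)
    dst[i] = src[i];

  /* whatever is left when len is not a multiple of 8 */
  memcpy((uint8_t *)to + words * sizeof(uint64_t),
         (const uint8_t *)from + words * sizeof(uint64_t),
         len % sizeof(uint64_t));

  return to;
}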

Unfortunately, using this particular memcpy() implementation didn’t let me start xine properly, I probably made a mistake, maybe unaligned 64-bit copies on x86-64 don’t work, just like on any other architecture, but I didn’t go around trying to fix that for the very same reason why I’m writing this post.

It turns out that xine, just like MPlayer and other multimedia applications, has its own implementation of a “fast memcpy()”, using SIMD instructions (MMX, MMXEXT, SSE, AltiVec, …). They benchmark at runtime which one gives the best result (on my system it’s either the Linux kernel implementation, not sure which version, or the MMX version), and then they use that. This has some problems that are obvious, and some that are much less obvious. The first problem is that the various implementations have to take care of similar issues, which causes code duplication (handling of small memory area copies, handling of unaligned copies and so on). The second is much more subtle, and it’s what I think is the main issue to be solved.
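
A simplified sketch of the runtime-probing idea looks something like the following; the names are made up, the “variants” just wrap memcpy() as stand-ins for the real SIMD routines, and the actual projects benchmark far more carefully than a single timed call:

#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void *(*memcpy_fn)(void *to, const void *from, size_t len);

/* stand-ins for the real MMX/SSE/AltiVec implementations */
static void *memcpy_variant_a(void *to, const void *from, size_t len) {
  return memcpy(to, from, len);
}
static void *memcpy_variant_b(void *to, const void *from, size_t len) {
  return memcpy(to, from, len);
}

/* the pointer every caller goes through after probing */
memcpy_fn fast_memcpy = memcpy;

static uint64_t time_one(memcpy_fn fn, void *dst, const void *src, size_t len) {
  struct timespec a, b;
  clock_gettime(CLOCK_MONOTONIC, &a);
  fn(dst, src, len);
  clock_gettime(CLOCK_MONOTONIC, &b);
  return (uint64_t)(b.tv_sec - a.tv_sec) * 1000000000ULL
       + (uint64_t)(b.tv_nsec - a.tv_nsec);
}

/* time each candidate over a scratch buffer and keep the fastest one */
void probe_fast_memcpy(void *scratch_dst, const void *scratch_src, size_t len) {
  const memcpy_fn candidates[] = { memcpy, memcpy_variant_a, memcpy_variant_b };
  uint64_t best = UINT64_MAX;
  size_t i;

  for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
    uint64_t t = time_one(candidates[i], scratch_dst, scratch_src, len);
    if (t < best) {
      best = t;
      fast_memcpy = candidates[i];
    }
  }
}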

When a programmer in a C program uses functions like memcpy(), strlen() and others, the compilation process (with optimisations) will hit some particular compiler code called “builtins”. Basically, the compiler knows how to deal with the function, and emits different machine code depending on the way the function is called. This usually happens when the parameters to the call are known at build time, because they are either constant or can be derived (for static functions) from the way the function is called. How this affects mathematical functions and functions like strlen() can be better understood by reading an old article of mine; for what concerns memcpy(), I’ll try to be brief but explain it here.

Let’s take a very easy function that copies an opaque type that is, in all truth, a 64-bit data field:

#include <string.h>

void copy_opaque(void *to, void *from) {
  memcpy(to, from, 8);
}

Building this code on x86-64 with GCC 4.3 and no optimisation enabled will produce this code:

copy_opaque:
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $16, %rsp
        movq    %rdi, -8(%rbp)
        movq    %rsi, -16(%rbp)
        movq    -16(%rbp), %rsi
        movq    -8(%rbp), %rdi
        movl    $8, %edx
        call    memcpy
        leave
        ret

As you can see there is a call to memcpy() after setting up the parameters, just like one would expect. But turn on the optimisation with -O2 and the resulting code is quite different:

copy_opaque:
        movq    (%rsi), %rax
        movq    %rax, (%rdi)
        ret

The function has been reduced to two instructions, plus the return, with no stack usage. This is because the compiler knows that for 64-bit copies it can just emit straight memory accesses and simplify the code quite a bit. The memcpy() function is not a static inline, but the compiler knows its interface and can produce optimised code just fine when using the builtin. Similarly, when using -O2 -fno-builtin to ask the compiler not to use its builtins knowledge, for instance because you’re using special access functions, you can see that the resulting code is still composed of two instructions, but of a different type:

copy_opaque:
        movl    $8, %edx
        jmp     memcpy

Let’s go back to the builtin though, since that’s what’s important to know before I can explain why the dynamically-chosen implementation in xine and company is quite suboptimal.

When you change the size of the memory area to copy in copy_opaque() from 8 to a different constant, you can see that the code changes accordingly. If you use a number that is not a multiple of 8 (that is the biggest size that x86-64 can deal with without SIMD), you can see that the “tail” of the area is copied using smaller move operations, but it’s still expanded. If you compare the output with multiple power-of-two values, you can see that up to 128 it inlines multiple movq instructions, while starting with 256, it uses rep movsq. With very big values, like (1 << 20), the compiler emits a straight memcpy() call. This is because the compiler can assess the overhead of the call and decide when it’s big enough to use a function rather than inlining code.

It can also decide this based on what level of optimisation is requested; for instance, I said above that rep movsq starts to get used from 256 (1 << 8) onwards, but that holds for the -O2 level; with -Os it is used already when you have more than two 64-bit words.

Since the library functions like memcpy() and similar are very widely used, the fact that the compiler can emit much simpler code is very useful. But this works just as long as the compiler knows about them. As I said, turning off the builtin replacement will cause the code to be compiled “literally” with a call to the function, which might have a much higher overhead than a straight copy. Now it might be quite easier to grasp what the problem is with the dynamic memcpy() replacement used by xine and other software.

Let’s change the code above to something like this:

#include <string.h>

extern void *fast_memcpy(void *to, void *from, size_t foo);

void copy_opaque(void *to, void *from, size_t foo) {
  fast_memcpy(to, from, 8);
}

Now, even turning on the optimisations won’t make any difference: the compiler will always emit a call to fast_memcpy():

copy_opaque:
        movl    $8, %edx
        jmp     fast_memcpy

As you might guess, this is certainly slower than the straight copy that we had before, even if the memcpy() replacement is blazing fast. The jump will also require a symbol resolution, since fast_memcpy() is not statically defined, so it’ll have to pass through the PLT (Procedure Linkage Table), which is an expensive operation. Even if the symbol were defined internally to the same library, this would still most likely have to go through the GOT (Global Offset Table) for shared objects.

By redefining the memcpy() function, xine and others are actually slowing the code down, at least when the size of the copy is known, a constant at build time. GCC extensions actually allow you to define a macro, or even better a static inline function, that can discern whether a compile-time constant is used, and then fall back to the original memcpy() call, which the compiler will handle as it prefers; but this is quite complex, and in my opinion not worth the bother.
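
Just to show what I mean, a sketch of that GCC-extension approach could look like this, where fast_memcpy() is the hypothetical runtime-chosen routine from the earlier example, not an existing xine symbol:

#include <string.h>

extern void *fast_memcpy(void *to, const void *from, size_t len);

static inline void *smart_memcpy(void *to, const void *from, size_t len) {
  /* __builtin_constant_p() is true when len is a compile-time constant;
   * in that case call memcpy() and let the builtin expand it inline */
  if (__builtin_constant_p(len))
    return memcpy(to, from, len);
  return fast_memcpy(to, from, len);
}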

Why do I say this? Well the first issue is that sometimes even if a value is not properly a constant at build-time, the compiler can find some particular code path where the function can be replaced, and thus emit adaptive code. The second is that you might just as well always use properly optimised memcpy() functions when needed, and if the C library does not provide anything as fast, you just need to use the Force ELF.

When the C library does not provide functions optimised for your architecture, for compatibility or any other reason, you can try to replace them through the ELF feature called symbol interposing, which basically works in the same way as symbol collisions (I have some “slides” I’ve been working on for a while on the subject, but I’ll talk more extensively about this in a future post), and allows intercepting or replacing calls to C library functions. It’s the same method used to implement the sandbox used by Portage, or the OSS wrappers for ALSA, PulseAudio, ESounD and so on.
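
A minimal sketch of such an interposed replacement – with a plain byte loop as a stand-in for the real optimised routine – could look like this:

#include <stddef.h>

/* stand-in for a real optimised routine (SIMD, parallel, whatever);
 * a real implementation would be hand-written, also because a plain
 * loop like this risks being pattern-matched back into a memcpy()
 * call by the optimiser */
static void *my_fast_memcpy(void *to, const void *from, size_t len) {
  unsigned char *d = to;
  const unsigned char *s = from;
  while (len--)
    *d++ = *s++;
  return to;
}

/* the interposed definition: once this object is loaded with
 * LD_PRELOAD, dynamically-resolved memcpy() calls end up here instead
 * of in the C library (inlined builtin copies are unaffected) */
void *memcpy(void *to, const void *from, size_t len) {
  return my_fast_memcpy(to, from, len);
}

Built as a shared object (for instance with gcc -shared -fPIC -o fastmem.so fastmem.c) and loaded through LD_PRELOAD, this would replace the C library’s memcpy() for the program it is preloaded into.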

What I’d like to see is a configurable library that would allow choosing between different memcpy() implementations – maybe on a per-size basis too, parallel and non-parallel – at runtime, through a file in /etc. This is quite possible, and similar features, with replacements for many common library functions, are available with freevec (which unfortunately only implements AltiVec on 32-bit PPC). But a more arch-neutral way to handle this would most likely help out.

Anyway, if somebody is up to take the challenge, I’d be glad to test, and to add to the tinderbox to test on the packages’ testsuites too. Let’s see what happens.

For A Parallel World: Programming Pointers n.1: OpenMP

A new sub-series of For A Parallel World, to give some development pointers on how to make software make better use of parallel computing resources available with modern multicore systems. Hope some people can find it interesting!

While I was looking for a (yet unfound) scripting language that could allow me to easily write conversion scripts with FFmpeg that execute in parallel, I decided to finally take a look at OpenMP, which is something I had wanted to do for quite a while. Thanks to some Intel documentation, I tried it out on the old benchmark I used to compare glibc, glib and libavcodec/FFmpeg for what concerned byte swapping.

The code of the byte swapping routine was adapted to this (note the fact that the index is ssize_t, which means there is an opening for breakage when len is bigger than SSIZE_MAX):

void test_bswap_file(const uint16_t *input, uint16_t *output, size_t len) {
  ssize_t i;

#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (i = 0; i < len/2; i++) {
    output[i] = GUINT16_FROM_BE(input[i]);
  }
}

and I compared the execution time, through the rdtsc instruction, between the standard code and the OpenMP-powered one, with GCC 4.3. The result actually surprised me: I expected I would have needed to make this much more complex, like breaking the work into multiple chunks, to make it fast. Instead, it actually worked nicely with the single loop simply split across the cores:

[Figure: bar graph showing the difference in speed between the standard serial bswap and the simple OpenMP-powered variant.]

The results are the average of 100 runs for each size; the file is on disk, accessed through the VFS, not urandom. The values themselves are not important, since they are the output of rdtsc(), so they only work for comparison.
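
For reference, reading the time-stamp counter on x86/x86-64 can be done with a small inline assembly helper like the following (a sketch; the actual benchmark code may differ):

#include <stdint.h>

/* read the CPU time-stamp counter: rdtsc returns the low 32 bits in
 * EAX and the high 32 bits in EDX */
static inline uint64_t rdtsc(void) {
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
  return ((uint64_t)hi << 32) | lo;
}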

Now of course this is a very lame way to use OpenMP; there are much better, more complex ways to make use of it, but all in all I see quite a few interesting patterns arising. One very interesting note is that I could make use of this in quite a few of the audio output conversion functions in xine-lib, if I were able to resume dedicating time to working on that. Unfortunately lately I’ve been quite limited in time, especially since I don’t have a stable job.

But anyway, it’s interesting to know; if you’re a developer interested in making your code faster on modern multicore systems, consider taking some time to play with OpenMP. With some luck I’m going to write more about it in the future on this blog.

To-Do lists, tasks, what to do

These holidays really sucked, for a long series of reasons, and in general I’m not feeling well, either emotionally or physically. But they offered me the time to think about what I want to do. I’ve been working on my collision detection script lately and I’m now confident I can make it work as a proper tool to identify issues, and I’d love to work on fixing those issues with upstream, but the problem is that I lack the time to do that.

Even if I do improve a project a day, it’s never going to be enough, because in a few months I’m going to need to do it again, because the code will have rotted and so on. I need help with this. One thing I’m going to do is work on a personal archive of autoconf macros I can use on different projects; the attributes.m4 file that comes with xine, lscube and Lennart’s projects is already a personal archive of macros in some respects, the problem being that its history is spread among different repositories, which is very nasty. Up to now I’ve avoided creating a repository for it and splitting it up into different macro files (since it’s far from being just attribute checks any longer), but I think I should look for a solution to that problem rather than keep procrastinating.

Today I stopped procrastinating on getting rid of JFS for my root filesystem: since kernel 2.6.28 was released, I’m now starting the long-awaited conversion of my partitions to ext4, starting from the repositories (and here git wins against Mercurial big time: once the git repositories are repacked, copying them over is very, very quick), while copying over the openjdk, icedtea and xine repositories that use Mercurial takes so much longer.

Talking about xine, I’m going to do some more work on that in the next days, mostly code cleanup if I can, but I’m also planning on setting up a Transifex instance on this server for xine (and my own projects); hopefully it’ll make it simpler to provide translated versions of xine-lib, xine-ui and gxine, as well as of my own tools that need to be translated one way or another.

There are so many things I’d have to do that I haven’t been able to in months; reading is one of those, but I’m going to reserve that for when I go to the hospital next month for check-ups. I’m not going to bring my laptop with me this time, nor any handheld console. I’ll be around on the cellphone a bit maybe, but that’ll be it. For the rest of the time I’ll be reading and listening to music (I’m not going to leave the iPod at home; knowing hospitals, it will come in handy). Actually, since I just have to have a CT scan, a chest X-ray and an MRCP (MRI), I don’t strictly need to stay in the hospital; but not having a driving licence does not help, although I guess even if I had one, I’d better not be driving after they run tests on me.

I’m going to spend New Year’s Eve at home, alone or maybe with my mom and my sister with her husband and my nephew. On the whole, it’s not going to be much of a holiday either, so like it happened on Christmas, I’m going to spend most of the day working on some analysis or similar. I’ve seen that some of the issues I’ve brought up lately have started being taken care of, which is very good.

I know this post sounds pretty incoherent, I guess I’m incoherent myself at the moment. Anyway if you wish to help out with anything at all, feel free to drop a line.