It would take so little to be compatible

You might remember that I worked on my own static website template engine and that I made it Free some time ago. This template engine uses XSLT at its heart, just like the Gentoo XML format and DocBook. This makes it slightly less powerful than something written in a “proper” programming language, but still powerful enough for most of my uses.

One of the features I implemented last year was linking to Facebook, so that my friend could share his projects directly there. At the time I implemented some basic OpenGraph support, and added some more links to help Google and other search engines find the pages in the right place, including using a direct reference to the YouTube video on those pages that focus almost entirely on the video.

Unfortunately that last feature was not used at all by Facebook at the time, since it meant loading a third-party website within their page, which was at the very least frowned upon. It didn’t really matter; at some later point I wanted to make use of HTML5 <video> elements, but doing so meant spending quite a bit of time trying to get it just right and supporting as many browsers as possible… for the moment I gave up.

A couple of days ago, my friend published a new video on his site, and I went to link it on the Facebook page… right away, instead of simply creating a link with the given thumbnail, it found the actual YouTube embedding code and put it at the side of the provided link and title. Cool, it works!

So today I was spending some more time polishing FSWS, and I went to Facebook’s Open Graph page to see what else I could make use of. Interestingly enough, I note that even the Open Graph page is now more complete as to which options are available. I don’t know why, but they decided to add support for indicating the width/height of the video in the page, which is all good and dandy. Until you note the name of the attribute: og:video:width.

What about that? Well, in my system, given I’m speaking about XSLT, I translate the OpenGraph meta tags into attributes of the fsws:page element and related ones. This looks like the most RDF-compatible implementation as well, to me. Too bad that for this to work, the attributes need to have a valid QName, and a valid QName can contain at most one colon character. D’oh.
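To make the problem concrete, here is a minimal sketch of the QName rule that og:video:width trips over; the NCName pattern is a simplified ASCII-only subset (the real grammar allows more characters), so treat this as illustrative rather than a full validator:

```python
import re

# Simplified NCName: a name with no colon in it (ASCII subset only).
_NCNAME = re.compile(r'[A-Za-z_][A-Za-z0-9._-]*$')

def is_valid_qname(name):
    """True if `name` is a QName: an NCName, optionally preceded
    by a single NCName prefix and one colon."""
    parts = name.split(':')
    if len(parts) > 2:        # two or more colons: never a QName
        return False
    return all(_NCNAME.match(p) for p in parts)

print(is_valid_qname('og:title'))        # True
print(is_valid_qname('og:video:width'))  # False: second colon
```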

The end result of this is that I’ll probably have to rethink most of what I did up to now, and avoid using attributes to represent Open Graph properties, almost certainly breaking the few websites already using FSWS. But I wonder if they couldn’t just be a bit more compatible and avoid using the colon character as a separator there…

I hate the Facebook API

So, my friend is resuming his movie-making projects (site in Italian) and I decided to spend some time making the site better integrated with Facebook, since that is actually his main contact medium (and for what he does, it’s a good tool). The basic integration was obviously adding the infamous “Like” button.

Now, adding a button should be easy, no? It’s almost immediate with the equivalent Flattr button, so there should be no problem adding the Facebook version. Unfortunately the documentation tends to be self-referencing to the point of being mostly useless.

Reading around Google, it seems like what you need is:

  • add OpenGraph semantic data to the page; while the OpenGraph protocol itself only mandates title, url, type and image, there is an obscure Facebook doc that actually shows that, to work, it needs one further data type, fb:admins;
  • since we’re adding fb:-prefixed fields, we’re going to declare the namespace when using XHTML; this is done by adding xmlns:fb="http://www.facebook.com/2008/fbml" to the <html> element;
  • at this point we have to add some HTML code to actually load the Javascript SDK… there are a number of ways to do so asynchronously… unfortunately they don’t rely, like Flattr, on loading after the rest of the page has loaded, but need to be loaded first, following the declaration of a <div id="fb-root" /> that they use to store the Facebook data.
window.fbAsyncInit = function() {
  FB.init({ cookie: true, xfbml: true, status: true });
};

(function() {
  var e = document.createElement('script');
  e.src = document.location.protocol + '//connect.facebook.net/en_US/all.js';
  e.async = true;
  document.getElementById('fb-root').appendChild(e);
}());

  • then you can add a simple Like button by adding <fb:like/> to your code! In theory.

In practice, the whole process didn’t work at all for me on the FSWS-generated website, with either Chromium or Firefox; the documentation and forums don’t really give much help, most people seem to forget fields and so on, but from what I can tell, my page is written properly.

Trying to debug the official JavaScript SDK is something I wouldn’t wish on anybody, especially since it’s minimised. Luckily, they have released the code under the Apache License and it is available on GitHub, so I went on and checked out the code; the main problem seems to happen in this fragment of JavaScript from connect-js/src/xfbml/xfbml.js:

      var xfbmlDoms = FB.XFBML._getDomElements(
        dom,
        tagInfo.xmlns,
        tagInfo.localName
      );
      for (var i=0; i < xfbmlDoms.length; i++) {
        count++;
        FB.XFBML._processElement(xfbmlDoms[i], tagInfo, onTagDone);
      }

For completeness: dom is actually document.body, tagInfo.xmlns is "fb" and tagInfo.localName in this case should be "like". After the first call, xfbmlDoms is still empty. D’uh! So what does the FB.XFBML._getDomElements function look like?

  /**
   * Get all the DOM elements present under a given node with a given tag name.
   *
   * @access private
   * @param dom {DOMElement} the root DOM node
   * @param xmlns {String} the XML namespace
   * @param localName {String} the unqualified tag name
   * @return {DOMElementCollection}
   */
  _getDomElements: function(dom, xmlns, localName) {
    // Different browsers behave slightly differently in handling tags
    // with custom namespace.
    var fullName = xmlns + ':' + localName;

    switch (FB.Dom.getBrowserType()) {
    case 'mozilla':
      // Use document.body.namespaceURI as first parameter per
      // suggestion by Firefox developers.
      // See https://bugzilla.mozilla.org/show_bug.cgi?id=531662
      return dom.getElementsByTagNameNS(document.body.namespaceURI, fullName);
    case 'ie':
      // accessing document.namespaces when the library is being loaded
      // asynchronously can cause an error if the document is not yet ready
      try {
        var docNamespaces = document.namespaces;
        if (docNamespaces && docNamespaces[xmlns]) {
          return dom.getElementsByTagName(localName);
        }
      } catch(e) {
        // introspection doesn't yield any identifiable information to scope
      }

      // It seems that developer tends to forget to declare the fb namespace
      // in the HTML tag (xmlns:fb="http://www.facebook.com/2008/fbml") IE
      // has a stricter implementation for custom tags. If namespace is
      // missing, custom DOM dom does not appears to be fully functional. For
      // example, setting innerHTML on it will fail.
      //
      // If a namespace is not declared, we can still find the element using
      // GetElementssByTagName with namespace appended.
      return dom.getElementsByTagName(fullName);
    default:
      return dom.getElementsByTagName(fullName);
    }
  },

So it tries to work around a missing xmlns declaration by looking up the full name fb:like, rather than looking for like within their own namespace! To work around it in turn, I first tried adding this code before the FB.init call:

// Monkey-patch: always do the namespace-aware lookup.
FB.XFBML._getDomElements = function(a, b, c) {
  return a.getElementsByTagNameNS('http://www.facebook.com/2008/fbml', c);
};

This actually should work as intended, and finds the <fb:like /> elements just fine. Except that the behaviour of the rest of the code then went haywire, and the button only worked intermittently.

At the end of the day, I decided to go with the <iframe>-based version, like most of the other websites out there do; it’s quite unfortunate, I have to say. I hope that at some point FBML is replaced with something more along the lines of what Flattr does, with simple HTML anchors that get replaced; no extra namespaces to load…

The bright side is that I can actually use FSWS to replace the XFBML code with the proper <iframe> tags, and thus once it’s released it’ll allow using at least part of the FBML markup to produce static websites that do not require the Facebook JavaScript SDK to be loaded at all… now, if the FSF were to answer me on whether I can use the text of the Autoconf exception as a model for my own licensing, I could actually release the code.
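The kind of static rewrite described above can be sketched roughly as follows; the element name matches the XFBML markup from earlier, but the like.php parameters and the sample page structure are illustrative, not FSWS’s actual code:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

FB_NS = 'http://www.facebook.com/2008/fbml'

def xfbml_to_iframe(root, page_url):
    # Walk the tree; replace each <fb:like/> with a plain <iframe>
    # pointing at the like.php plugin (URL shape is illustrative).
    for parent in root.iter():
        for child in list(parent):
            if child.tag == '{%s}like' % FB_NS:
                idx = list(parent).index(child)
                iframe = ET.Element('iframe', {
                    'src': 'https://www.facebook.com/plugins/like.php?'
                           + urlencode({'href': page_url}),
                    'scrolling': 'no',
                    'frameborder': '0',
                })
                iframe.text = ''   # force a separate closing tag
                parent.remove(child)
                parent.insert(idx, iframe)

doc = ET.fromstring(
    '<div xmlns:fb="http://www.facebook.com/2008/fbml"><fb:like/></div>')
xfbml_to_iframe(doc, 'https://example.com/page')
print(ET.tostring(doc, encoding='unicode'))
```

The result is a page that needs no JavaScript SDK at all: the browser only loads the iframe itself.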

I’m a Geek!

The title, by itself, is pretty obvious; I’m a geek, otherwise I wouldn’t be doing what I’m doing, even with the bile refluxes I end up having, just for the gratitude of a few dozen users (whom I’ll thank once again from my heart; you make at least part of the insults I receive bearable). But it’s more a reminder for those who have been following me for a long time, and who might remember that I started this blog over four years ago, using Rails 1.1 and a Gentoo/FreeBSD install.

Well, at the time my domain wasn’t “flameeyes.eu”, which I only bought two years ago, but rather the more tongue-in-cheek Farragut.Flameeyes.Is-A-Geek.org, where Farragut was the name of the box (which was contributed by Christoph Brill, and served Gentoo/FreeBSD as the main testing and stagebuilding box until its PSU/motherboard gave up).

At any rate, I’ve been keeping the hostname, hoping one day to be able to totally phase it out and get rid of it; this because, while at the start it was easy to keep it updated, DynDNS has been pressing more and more for free users to register for the “pro” version. Up to now, I’ve just been refreshing it whenever it was on the verge of expiring, but… since the latest changes will not allow me to properly re-register the same hostname if it lapsed, and a lot of websites still link to my old addresses, I decided to avoid problems and scams, and registered the hostname with DynDNS Pro for two years, which means it won’t risk expiration.

Given that situation, I decided to change the Apache (and AWStats) configuration so that the old URLs for the blog and the site no longer redirect straight to the new sites, but rather accept the request and show the page normally. Obviously, I’d still prefer the new canonical name to be used. Hopefully, at some point in time, browsers and other software will support the extended metadata provided by OpenGraph, which not only breaks down the title into site and page titles (rather than the current mess of different separators between the two in the <title> element!), but also provides a “canonical URL” value that can solve the problem of multiple hostnames as well (yes, that means that if you were to link one post of mine on Facebook with the old URLs, it’ll be automatically translated to the new, canonical URLs).
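As a sketch of what such consuming software would have to do, this is roughly how a page’s canonical URL could be pulled out of its OpenGraph metadata; the class name and the sample URL are illustrative only:

```python
from html.parser import HTMLParser

class OGCanonical(HTMLParser):
    """Pull og:url out of a page's <meta property="og:url" .../> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # handle_startendtag() delegates here for self-closing tags too
        if tag == 'meta':
            a = dict(attrs)
            if a.get('property') == 'og:url':
                self.canonical = a.get('content')

p = OGCanonical()
p.feed('<head><meta property="og:url" '
       'content="https://blog.flameeyes.eu/post"/></head>')
print(p.canonical)
```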

But it doesn’t stop here; in the spirit of old times, I also ended up looking at some of the articles I wrote around that time, or actually before that time, for NewsForge/Linux.com (as I noted in my previous post). At the time I wasn’t even paid for them; the only requirement was a one-year exclusive, and the last one was published in December 2005, so the exclusive definitely expired a long time ago. So, since their own website (now only Linux.com, and under a different owner as well) is degrading (broken links, comments with different text formatting functions, spam, …), I decided to re-publish them on my own website in the newly refurbished articles section and, wait for it…

I decided to re-license all three of the articles I wrote in 2005 under CreativeCommons Attribution-ShareAlike license.

Update (2017-04-21): as it happens my articles section is now gone and instead the articles are available as part of the blog itself.

Okay, nothing exceptional I guess, but given there were some doubts about my choice of licenses, this actually makes a good chunk of my work available under a totally free license. I could probably ask Jonathan whether I could do something like that with the articles I wrote for LWN, but since that site is still well maintained, I see no urgency.

I should also convert the rest of the articles on that index page from PDF/TeX to HTML, but they are also not high on my priority list.

Finally, I’m still waiting on the FSF to give me an answer regarding the FSWS licensing — Matija helped me adapt the Autoconf exception into something usable for the web… unfortunately the license of the exception itself is quite strict, so I had to ask the FSF for permission to use it… the request has been logged into their RT; I hope it’ll get me an answer soon… who knows, FSWS might be my last “gift” before I throw in the towel.

Trying again to find a license for FSWS

FSWS, standing for the pretty unimaginative “Flameeyes’s Static Web Site”, is simply a bunch of XSL template files that I developed for my own website to start with, and that now serve the sites of a couple of friends of mine. It is a close relative of what Darren and I worked on for the xine website, but not derived from it directly; rather, I took decisions based on my experience with that, and with the non-generic templates I used for my website before that.

Now, I have written before about my search for a license for the framework a few months back, and I’ve not yet found something that works for me. I asked Matija for help, but ended up without the time to mail him the details and… what the heck, why should I mail just him the details of what I’m looking for? Isn’t this what the “social” web is all about?

So again, let’s see the specifics of what I’m after: I want FSWS to be Free Software, by any standard, and I want it to be copyleft as well: if you modify it, I’d like to see the improvements and make use of them, since if I modify it, my changes are available to everybody to begin with. This rules out the simplest licenses, such as permissive CreativeCommons variants or MIT. Ideally, GPL should do the trick, but as we all should know, it’s a license that works best with non-Web software; for the “new world” of the Web, even the FSF created a new license, the AGPL. I’ve used AGPL-3 before, for my rbot plugins (the bugzilla one is used by the willikins bot that is on the #gentoo channels on Freenode).

But is AGPL-3 a good choice as it is? Nope, probably not. As I noted in that previous post, using AGPL-3 would make the generated website licensed under AGPL-3 as well, since it’s templates we’re talking about. As you can guess, that’s not going to make a similarly-licensed system any fun to use. And this is something the FSF knows themselves; the equivalent “old world Unix” situation is autoconf and its M4 macro files, for which they created a licensing exception that happens to be more or less what I need for my own code… of course, the exception as it is, is not really general enough to apply to my use case.

In general, there are just a few points I need to make sure are respected:

  • the templates themselves are the core of the project; if you edit them, make them available; for what I’m concerned, I’m not interested in having a link to FSWS visible on the websites using it, just as long as there is a link to a downloadable tarball of them, something like a <link rel="fsws:sources"> tag;
  • the generated website should be able to have any license at all; CC-ND, AGPL, proprietary, nothing should be stopped;
  • you can override and extend FSWS with more template elements; I’m actually a bit undecided on how to handle them; for what I’m concerned they should probably be allowed just like the resulting output, but it then gets murky when you re-use code from the original templates…

So now, what should I do for licensing this work, and publishing it? I want to get it right the first time, rather than deal with the fallout of bad decisions later!

Maybe not-too-neat Apache tricks; Part 1

In general, I’m not an expert sysadmin; my work usually involves development rather than administration, but like many other distribution developers, I had to learn system administration to make sure that packages work on users’ systems. This gets even messier when you deal with Gentoo and its almost infinite number of combinations.

At any rate, I end up administering not only my local systems, but also two servers (thanks to IOS Solutions, who provide xine with its own server for the site and Bugzilla). I started out using lighttpd, but through a long series of circumstances I ended up moving to Apache (mostly for content negotiation). I had to learn the hard way about a number of issues — luckily security was never involved.

My setup moved from a single lighttpd instance, to one Apache running two static websites, one Bugzilla and one Typo instance, to two Apaches on two servers: one running a static website and a Bugzilla instance, the other running a few static websites and a Typo instance via Passenger. The latter is more or less what I have now.

From one side, Midas is keeping up the xine website (static, generated via XSLT after commit and push); from the other, Vanguard – the one I pay for – keeps this blog, my website and a few more running. I used to have a gitweb instance (and Gitarella before that), but I stopped providing the git repositories myself; it’s much easier to push them to Gitorious or GitHub as needed.

The static websites use my own generator, for which I still have to find a proper license. Most of these sites are mine or belong to friends of mine, but with things changing a bit for me, I’m going to start offering that as a service package for my paying customers (you have no idea how many customers would just be interested in having a simple, static page updated once every few months… as long as it looks cool).

But since I have, from time to time, to stop Apache to make changes to my blog – or, in the past, Passenger went crazy and Apache stopped answering requests at all – I’m not very convinced about running the two alongside each other for much longer. I’ve thus decided it was a good idea to start figuring out an alternative approach; the solution I’m thinking of requires two Apache instances on the same machine. Since I cannot use different ports for them (well, I could run my blog over 443/SSL, but I don’t think that would be a good idea for the read-only situation), I’ve now requested a second IP address (the current vserver solution I’m renting should support up to 4), and I’ll run the two instances over the two different IP addresses.

Now, one of the nice things of splitting the two instances this way is that I don’t even need ModSecurity on the instance that only serves the static sites; while they are not really as static as a stone (I make use of content negotiation to support multiple languages on the same site, and of mod_rewrite to set the forcing), there is no way I can think of for any security issue to be triggered while serving them. I could even use something different from Apache to serve them, but the few advanced features I make use of don’t make it easy to switch (content negotiation is one; another is rewrite maps to recover moved/broken URLs). And obviously, I wouldn’t need Passenger either.

But all the other modules? Well, those I’d need; and since by default they are built as shared modules (I ranted about that last November), loading two copies of them means duplicating the .data.rel and the other copy-on-write sections. Not nice. So I finally bit the bullet and, knowing that Apache upstream allows building them in statically, I set out to find out whether the Gentoo packaging allows for that situation. Indeed it does, but it mishandles the static USE flag, which made it quite hard to find out. After enabling that flag, disabling the mem_cache, file_cache and cache modules (which are not loaded by default but are still built, and would be built in when using the static USE flag), and restarting Apache, the process map looked much better: the apache2 processes now have far fewer files open (and thus a much neater memory map).

One thing that is interesting to note: right now, I’ve not been using mod_perl for Bugzilla because of the configuration trouble; one day I might actually try that. Possibly with a second Apache instance on Midas, open only on SSL, with a CACert certificate.

Now it might very well be that you need a particular module only in one case, such as mod_ssl to run a separate process for an SSL-enabled Apache 2 instance… in that case, one possible solution, even though not an extremely nice one, is to use the EXTRA_ECONF trick that I already described… in this case, you could create an /etc/portage/env/www-servers/apache file with this content:

export EXTRA_ECONF="${EXTRA_ECONF} --enable-ssl=shared"

On a separate note, I think one of the reasons why our developers let the default be dynamic modules is more related to the psychology of calling them “shared”. It makes it sound like you’re wasting memory when you have multiple processes using a “non-shared” module… when in reality you’re creating many more private memory mappings with the shared version. Oh well.

Unfortunately, as it happens, the init system we have in place does not allow for more than one Apache instance to be running; it really requires different configuration files and probably a new init script, so I’ll have to come back to this stuff in the next few days for the remaining parts.

There are though three almost completely unrelated notes that I want to sneak in:

  • I’m considering a USE=minimal (or an inverse, default-enabled, USE=noisy) for pambase; it would basically disable modules such as pam_mail (tells you if you have unread mail in your queue — only useful if you have a local MDA), pam_motd (gives you the Message of the Day of the system) and pam_tally/pam_lastlog (keep track of login/su requests). The reason is that these modules are kept loaded in memory by, among others, sshd sessions, and I can’t find any usefulness in them for most desktop systems, or single-user managed servers (I definitely don’t leave a motd to myself).
  • While I know that Nathan complained to me about it, I think I’m starting to understand why the majority of websites seem to stick with www or some other third-level domain: almost no DNS service seems to actually allow a CNAME to be used on the origin record (that is, the second-level domain); this means that you end up with the two-level domain pointing directly to an IP, and changing a lot of those is not a fun task if you’re switching hosting from one server to another.
  • CACert and Google Chrome/Chromium don’t seem to get along at all. Not only have I been unable to tell it to accept the CACert root certificate, but while trying to generate a new client certificate with it, the page froze solid. And if I try to install it after generating it with Firefox, well… it errors out entirely.

Choosing a license for my static website framework

You might remember that some time ago I wrote a static website generator and that I wanted to release it as Free Software at some point in time.

Well, right now I’m using that code to create three different websites – mine, the one for my amateur director friend, and the one for a metal band – which is not too shabby for something that started as a way to avoid repeating the same code over and over again; it actually grew bigger than I expected at first.

Right now, it not only generates the pages, but also the sitemap and, to some extent, the robots.txt (by providing, for instance, the link to the sitemap itself). It can generate pages that link to Flickr photos and albums, including providing descriptions and having a gallery-like showcase page, and it also has some limited support for YouTube videos (the problem there is that YouTube does not have a RESTful API; I can implement REST calls through XSLT, but I don’t think I would be able to speak the GData protocol with it).
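Conceptually, the sitemap side of that boils down to something like this sketch; the page list and URL joining here are hypothetical stand-ins, not FSWS’s real input format:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

def build_sitemap(base_url, pages):
    """Build a minimal sitemap.xml tree from a list of page paths."""
    ET.register_namespace('', SITEMAP_NS)   # emit as the default namespace
    urlset = ET.Element('{%s}urlset' % SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, '{%s}url' % SITEMAP_NS)
        loc = ET.SubElement(url, '{%s}loc' % SITEMAP_NS)
        loc.text = base_url.rstrip('/') + '/' + page.lstrip('/')
    return urlset

tree = build_sitemap('https://example.com', ['home', 'projects'])
print(ET.tostring(tree, encoding='unicode'))
```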

Last week, I was cleaning up the code a bit more, because I’m soon going to use it for a new website (for a game – not a video game – that a friend of mine invented and is producing), and ended up finding some interesting documentation from Yahoo! on providing semantic information for their search engine (and, I guess, to some extent Google as well).

This brought up two questions to me:

  • is it worth keeping working on this framework based on XSLT alone? As I said, Flickr support was a piece of cake, because the API they use is REST-based, but YouTube’s GData-based API definitely requires something “more”. And at the same time, even the Flickr gallery wrapping has been a bit of a problem, because I cannot really paginate properly using XSLT 1.0 (and libxslt does not support XSLT 2.0, where the iterators I needed are implemented). While I like the consistent generation of code, I start to feel like it needs something to pre-process the data before sending it out; for instance, I could make some program just filter the references to YouTube videos, write down an XML description of them, downloaded with GData, and then let XSLT handle that. Or cache the Flickr photos (which would be very good to avoid requesting all the photos’ details every time the website is updated);
  • I finally want to publish FSWS to the public; even if – or maybe especially if – I want to discontinue it or part of it, or morph it into something less “pure” than what I have now. What I’m not sure about is which license to use. I don’t want to make it just GPL, as that implies that you can modify it and never give anything back, since you won’t redistribute the framework, but only the results; AGPL-3 sounds more like it, but I don’t want the pages generated by the framework to have that license applied to them. Does anybody have an idea?
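The pre-processing idea from the first point could be sketched like this; the <video ref="…"/> placeholder element and the fetch_meta callback are hypothetical stand-ins for the real source format and the GData call:

```python
import xml.etree.ElementTree as ET

def collect_video_cache(source, fetch_meta):
    """Scan a source document for <video ref="..."/> placeholders
    (hypothetical element) and build an XML cache of their metadata,
    which a later XSLT pass could pull in via document()."""
    cache = ET.Element('videos')
    for ref in sorted({v.get('ref') for v in source.iter('video')}):
        meta = fetch_meta(ref)          # the network call, stubbed here
        entry = ET.SubElement(cache, 'video', id=ref)
        ET.SubElement(entry, 'title').text = meta['title']
    return cache

src = ET.fromstring('<page><video ref="abc123"/></page>')
cache = collect_video_cache(src, lambda r: {'title': 'Demo clip'})
print(ET.tostring(cache, encoding='unicode'))
```

The same shape works for caching Flickr photo details: run the fetch once per site update, and let the stylesheets read only the cached XML.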

I’m also open to suggestions on how something like this should work. I really would prefer it if the original content were written simply in XML: it’s close enough to the output format (XHTML/HTML5) and shouldn’t be much trouble to write. The least vague idea I have on the matter is to use multiple steps of XML conversion; the framework already uses a nasty two-pass conversion of the input document (it splits it into N branches depending on the configured languages, then processes the branches almost independently to produce the output), and since some content is generated by the first pass, it’s also difficult to make sure that all the references are there for links and the like.

It would be easier if I could write my own XSLT extension functions: I could just replace an element referring to a YouTube video with a reference to a (cached) XML document, and similarly for Flickr photos. But to do so, I guess I’ll either have to use JavaScript and an XSLT processor that supports it, or write my own libxslt-based processor that understands some special functions to deal with GData and the like.

The status of some deep roots

While there are quite a few packages that are known to be rotting in the tree, and are thus now being pruned away step by step, there are some more interesting facets to the status of Gentoo as a distribution nowadays.

While the more interesting and “experimental” areas seem to have enough people working on them (Ruby to a point, Python more or less, KDE 4, …), there are quite a few deeper areas that are just left to rot as well, but cannot really be pruned away. This includes for instance Perl (for which we’re lagging behind a lot, mostly due to the fact that tove is left alone maintaining that huge piece of software), and SGML, which in turn includes all the DocBook support.

I’d like to focus for a second on that latter part, because I am partly involved in it; I like using DocBook, and I actually use the stylesheets to produce the online version of Autotools Mythbuster with the packages that are available in Portage. Now, when I wanted to make use of DocBook 5, the stylesheets for the namespaced version (very useful to write with emacs and nxml) weren’t available, so I added them, adding support for them to the build-docbook-catalog script. With time, I ended up maintaining the ebuilds for both versions of the stylesheets, and that hasn’t always been the cleanest thing, given that upstream dropped the tests entirely in the newer versions (well, technically they are still there, but they don’t work; it seems like they lack some extra stuff that is documented nowhere).

Now, I was doing quite well with this; I had just requested stable for the new ebuilds of the stylesheets (both variants) and I could have kept just doing that, but… yesterday I noticed that the list of examples in my guide had broken links, and after mistakenly opening a bug on the upstream tracker, I noticed that the bug is already fixed in the latest version. Which made me smell something: why did nobody complain that the old stylesheets were broken? Looking at the list of bugs for the SGML team, you can see that lots of stuff was actually ignored for way too long a time. I tried cleaning up some of it, duping bugs that were obviously the same, and fixing one in the b-d-c script, but this is one of the internal roots that is rotting, and we need help to save it.

For those interested in helping out, I have taken note of a few things that should probably be done with medium urgency:

  • make sure that all the DTDs are available in the latest release, and that they are still available upstream; I had to seed an old distfile today because upstream dropped it;
  • try to find a way to install the DocBook 5 schemas properly; right now the nxml-docbook5-schemas package installs its own copy of the Relax-NG Compact file; on Fedora 11, there is a package that installs more data about DocBook 5, and we should probably use the same original sources; the nxml-docbook5-schemas package could then either be merged with that package or simply use the already-installed copy;
  • replace b-d-c, making it both more generic and based on a framework that already exists (like eselect) instead of reinventing the wheel; the XML/DTD catalog can easily be used for more than just DocBook. While I know the Gentoo documentation team does not want the Gentoo DTD to just be available as a package to install on the system (which would make it much easier to keep updated for the nxml schemas, but sigh), I would love to be able to make fsws available that way (once I finish building the official schema for it and publish it; again, more on that in the future);
  • find out how one should be testing the DocBook XSL stylesheets, so that we can run tests for them; it would have probably avoided the problem I had with Autotools Mythbuster in the past months;
  • package the stylesheets for Xalan and Saxon, which are different from the standard ones; b-d-c already has support for them to a point (although not having to spell this kind of thing out in the b-d-c replacement is desirable), but I didn’t have a reason to add them.

I don’t think I’ll have much time to work on these in the future, so user contributions are certainly welcome; if you do open any bugs for these issues, please CC me directly, since I don’t intend (yet) to add myself to the sgml alias.

Yes, again more static websites

You might remember I like static websites and that I’ve been working on a static website framework based on XML and XSLT.

Out of necessity, I’ve added support to that framework for multi-language websites; this is both because people asked for my website to be translated into Italian (since my assistance customers don’t usually know English, not that well at least), and because I’m soon working on the website for a metal group, which is to be available in both languages too.

Now, making this work in the framework wasn’t an easy job: as it is now, there is a single actual XML document that the stylesheet, with all its helper templates, gets applied to. It already applies a two-pass translation, so that custom elements (like the ones that I use for the projects’ page of my site – yes, I know it gets stuck when loading) are processed properly and translated into fsws proper elements.

To make this work I then applied a similar method (although now I start to feel I did it in the wrong order): once for each language the website is written in, I create a temporary document, keeping only the elements that either have no xml:lang attribute or carry the right language in it. Then I apply the rest of the processing to this data.
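The filtering pass amounts to a near-identity transform. Here is a minimal sketch of the idea – the parameter name is a made-up placeholder, not the actual fsws code; also note that libxslt speaks XSLT 1.0, where match patterns cannot reference variables, hence the xsl:if:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- the language to keep for this run, e.g. 'en' or 'it' -->
  <xsl:param name="fsws.lang" select="'en'" />

  <!-- identity: copy everything else through untouched -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- elements carrying xml:lang survive only if it matches -->
  <xsl:template match="*[@xml:lang]">
    <xsl:if test="@xml:lang = $fsws.lang">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()" />
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Running this once per language produces the temporary per-language documents that the rest of the pipeline then works on.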

Since all the actual XHTML translation happens in the final pass, this pass becomes almost transparent to the rest of the processing; at the same time, pages like the articles index can share the whole list of articles between the two versions, since I just change the section element of the intro instead of creating two separate page descriptions.

Now, I’ll open fsws up one day, once this is all sorted out, described, and so on; for now I’m afraid it’s still too much in flux to be useful (I haven’t written a schema of any kind just yet, and I want to do that soon so I can validate my own websites). For now, though, I can share the code I’m currently using to handle the translation of the site. As usual, I don’t rely on any kind of dynamic web application to serve the content (the framework generates it in static form); instead I rely on Apache’s mod_negotiation and mod_rewrite (which ship with the standard distribution).

This is the actual configuration that vanguard is using to do the serving:

AddLanguage en .en
AddLanguage it .it

DefaultLanguage en
LanguagePriority en it
ForceLanguagePriority Fallback

RewriteEngine On

RewriteRule ^(/[a-z]{2})?/$     $1/home [R=permanent]

RewriteCond %{REQUEST_FILENAME} !-F
RewriteRule ^/([a-z]{2})/(.+)$ /$2.$1

(I actually have a few more rules in that configuration file but that’s beside the point now).

Of course this also requires that the MultiViews option is enabled, since that’s what makes Apache pick up the correct file without having type-map files around. Since the files are all named like home.en.xhtml and home.it.xhtml, requesting the explicit language as a suffix lets Apache pick the correct file, without having to mess with extra per-file configuration.
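Putting mod_rewrite and MultiViews together, a request gets resolved roughly like this (a hypothetical trace, assuming the file names above):

```
GET /it/home
  mod_rewrite:      /it/home  ->  /home.it       (no such file on disk)
  mod_negotiation:  /home.it  ->  home.it.xhtml  (MultiViews suffix match)

GET /home            (no language prefix in the URL)
  mod_negotiation picks home.en.xhtml or home.it.xhtml from the
  Accept-Language header, falling back to English
  (LanguagePriority + ForceLanguagePriority Fallback)
```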

Right now there are a few more things I have to work on; for instance, the language selection at the top should really bring you to the other language’s version of the same page, rather than to the homepage. Also, single-language sites only work correctly if you never use xml:lang; I should special-case that. For this to work I have to add a little more code to the framework, but it should be feasible in the next weeks. Then there are some extra features I haven’t even started implementing, just planned: an overlay-based photo gallery, and some calendar management for ICS files and other things like that.

Okay this should be it for the teasing about fsws; I really have to find time to set up a repository for, and release, my antispam rules, but that will have to wait for next week I guess.

What’s going on with me

If you haven’t noticed, I’ve been less involved in Gentoo in the past weeks, and I’m still not sure how long it will continue this way. The main reason is that I’m currently handling multiple work projects at once, and that requires most of my time, stealing it away from Gentoo, which I’m not paid for, and which isn’t involved in my current work projects.

I’m currently fighting jobs on various fronts; most of what I’m working on is closed-source, some is web-based and some totally isn’t: it’s a real mix and match of problems. At the same time I’m also expanding into other kinds of computer-related services, including standard home/SoHo PC assistance (pushing for Linux whenever it fits), and into reselling/hosting of domains. Probably stuff that most of my colleagues and most of the Gentoo power users have done or are still doing. I really don’t like this, but it pays the bills.

On this note, I’ve also been trying to extend my static website generator to support multi-language websites; this comes from my own need for a website where I can describe my services to Italian customers, who prefer an Italian-language site, while at the same time serving my usual English-language content. It’s going to be tricky, and I already foresee it’s going to require me to rewrite most of the code to handle the XML language attributes, but sooner or later I’ll get it to work.

Of course, once I’m done with the website generator side, I’ve got to set up Apache to serve the correct language while still allowing overrides, and that’s where the fun is going to start, since I really want the most automation and the most flexibility in a single Apache configuration, with dynamically-served static files!

On other notes, my desktop views are currently full of terminals and other cruft on both computers, Yamato (12 desktops) and Merrimac (9 desktops): there are terminals with the tinderbox, terminals and emacs with the feng sources, two MonoDevelop instances (yes, I know it’s crazy), and a number of terminals and emacs windows split between Yamato and Vanguard (this server), since I spent most of the day trying to help track down a caching bug in Typo (now solved, luckily, since the blog without caching tends to take quite a hit).

And finally, if you’re interested in anything at all in particular from me, since I really don’t have much free time myself, you should either wait, ask me, bribe me or hire me (especially if you’re in Europe, since I can actually invoice you there!). Yes, I should try to sell myself better on my site rather than doing these shameless plugs.

More XSL translated websites

I have written before that, over CMS- or wiki-based websites, I prefer static websites, and that with a bit of XSL and XML magic you can get results that look damn cool. I also worked on the new xine site, which is entirely static and generated from XML sources with libxslt.

When I wrote the xine website, I also reused some of the knowledge from my own website, even though the two are pretty different in many respects: my website used one XML file per page, with an index page and a series of extra stylesheets that converted some even higher-level structures into the mid-level blocks that then translated to XHTML; the xine website used a single XML file with XInclude to merge in many fragments, one single document for everything, similarly to what DocBook does.
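For reference, the XInclude approach looks roughly like this; the file and element names here are illustrative, not the actual xine sources:

```xml
<?xml version="1.0"?>
<site xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- each fragment lives in its own file and is merged at parse time -->
  <xi:include href="pages/home.xmli" />
  <xi:include href="pages/releases.xmli" />
  <!-- a fallback keeps the build going if a fragment is missing -->
  <xi:include href="pages/news.xmli">
    <xi:fallback><page id="news" /></xi:fallback>
  </xi:include>
</site>
```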

Using the same framework, made a bit more generic, I wrote the XSL framework (which I locally call “Flameeyes’s Static Website”, or fsws for short) that generates the website for a friend of mine, an independent movie director (hosted on vanguard too). I chose to go down this road because he needed something cheap, and he didn’t care much about interaction (there’s Facebook for that, mostly). In this framework I implemented some transformation code that covers part of the flickr REST API, and also a shorthand to blend in YouTube videos.
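Such a shorthand boils down to a single template that expands a custom element into embed markup; a sketch along these lines – the fsws:youtube element and the exact embed markup are my illustrative assumptions, not the actual fsws code, and the fsws namespace would have to be declared on the stylesheet:

```xml
<!-- expands e.g. <fsws:youtube id="VIDEO_ID"/> into an embedded player -->
<xsl:template match="fsws:youtube">
  <object type="application/x-shockwave-flash" width="560" height="340"
          data="http://www.youtube.com/v/{@id}">
    <param name="movie" value="http://www.youtube.com/v/{@id}" />
  </object>
</xsl:template>
```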

Now I’m extending the same framework, keeping it abstract from the actual site usage and allowing different options for setting up the pages, to rewrite my own website with a cleaner structure. Unfortunately it’s not as easy as I thought: while my original framework is extensible enough, and I was able to fold enough of my previous stylesheets’ fragments into it without changing it all over, there are a few things that I could probably share between different sites without recreating them each time, but that require me to make extensive changes.

I hope that once I’m done with the extension, I’ll be able to publish fsws as a standard framework for the creation of static websites; for now I’m going to extend it just locally, and for a selected number of friends, until I can confidently say “yes, it works” – the first thing I’ll do then will be the xine website. But I’m sure that at least this kind of work will help me get a better understanding of XSLT that I can use for other purposes too.

Oh, and in the meantime I’d like to give credit to Arcsin, whose templates I’ve been using both for my own and others’ sites… I guess I know who I’ll be contacting if I need some specific layout.