You might have noticed a series of issues with my presence on Planet Gentoo over the past months. Sometimes posts didn’t appear for a few days, then entries showed up dated in the future, and a couple of rounds of planet spam made my posts quite obnoxious to many. I didn’t like it either. It seems I had some problems with Typo when I moved to Apache from lighttpd, and then there were issues with Planet and its handling of Atom feeds and the like. These problems should now be solved: Planet has moved to the Venus software, and it uses the Atom feeds again, which are much more easily updated.
But this is not my topic today; today I wish to write about how you can really mess things up with XML technologies. Yesterday I wanted to prepare a feed for the news on the xine website so that it could be shown on Ohloh too. Since the idea is to use static content, I wanted to generate the feed with XSLT, starting from the same data used to generate the news page. Not too difficult actually; I do something similar for my website as well.
But since my website only needs to sort-of work, while the xine site needs to actually be usable, I decided to validate the generated content using the W3C validator; the results were quite bad. Indeed, the content in an RSS feed needs to be either escaped or plain text; raw XHTML is not allowed.
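To illustrate what "escaped" means here, this is a minimal sketch in Python (not the XSLT used for the site). The `body` string is a made-up news item; the point is only that the markup ends up as escaped text inside `description` rather than as child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical news body containing XHTML markup.
body = "<p>A new release is <em>out</em>.</p>"

# Assigning the markup as text (not as child elements) makes the
# serializer escape '<', '>' and '&', which is what RSS expects.
description = ET.Element("description")
description.text = body
serialized = ET.tostring(description, encoding="unicode")

# A feed reader parsing the element gets the original markup back as a string.
roundtrip = ET.fromstring(serialized).text
```

The serialized element contains `&lt;p&gt;` rather than a real `p` element, so the feed stays valid while readers can still recover the markup.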
So I turned to Atom, which is supposedly better at these things and is already being used for a lot of other stuff as well. For once it really looks like XML technology, using the thing that actually makes XML work nicely: namespaces. But when I looked at my blog’s feed I saw a very complex XML file. I gave up on it for a while and went back to RSS, but while the feed is simple around the entries, the entries themselves are quite a bit to deal with, especially since they require the RFC822 date format, which is not really the nicest thing to handle (for one, it expects day names and month names in English, and it’s far from easy for a machine to parse into a generic date that can then be rendered in the locale of the feed’s reader).
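The difference between the two date formats is easy to see side by side. The sketch below uses Python’s standard library and an arbitrary example timestamp (an assumption, not a date taken from the xine news); RSS wants the RFC822 style, while Atom uses RFC 3339 timestamps.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Arbitrary example timestamp, chosen only for illustration.
stamp = datetime(2008, 5, 17, 14, 30, 0, tzinfo=timezone.utc)

# RSS: RFC822-style date with English day and month names.
rss_date = format_datetime(stamp)    # 'Sat, 17 May 2008 14:30:00 +0000'

# Atom: RFC 3339 timestamp, purely numeric and locale-neutral.
atom_date = stamp.isoformat()        # '2008-05-17T14:30:00+00:00'

# Both can be parsed back, but the RSS form needs a dedicated parser
# that knows the English names.
from_rss = parsedate_to_datetime(rss_date)
from_atom = datetime.fromisoformat(atom_date)
```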
I went back to Atom, created a new ebuild for the Atom schema for nxml (which, by the way, fails to allow auto-completion in XSL files; I need to contact someone about that), and started looking at what is strictly needed. The result is a very clean feed which should work just fine for everybody. The code, as usual, is available in the repository.
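For a sense of how little is strictly needed, here is a minimal sketch of such a feed built in Python rather than XSLT; it is not the code from the repository. The function name, the tuple layout and the `example.org` URLs are all hypothetical. As far as I read the Atom specification, a feed needs an `id`, a `title`, an `updated` timestamp and an author, and each entry needs its own `id`, `title` and `updated`, plus an alternate link when it carries no content.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag):
    """Return the namespace-qualified name of an Atom element."""
    return "{%s}%s" % (ATOM_NS, tag)


def build_feed(feed_id, title, author, entries):
    """Build an Atom feed from (id, title, updated, link) tuples.

    Hypothetical helper: assumes at least one entry and that all
    'updated' values are RFC 3339 strings in the same UTC offset,
    so that comparing them as strings orders them by time.
    """
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    # The feed is as fresh as its most recent entry.
    ET.SubElement(feed, atom("updated")).text = max(e[2] for e in entries)
    person = ET.SubElement(feed, atom("author"))
    ET.SubElement(person, atom("name")).text = author

    for entry_id, entry_title, updated, link in entries:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = entry_id
        ET.SubElement(entry, atom("title")).text = entry_title
        ET.SubElement(entry, atom("updated")).text = updated
        # Entries without content need a link to the full item.
        ET.SubElement(entry, atom("link"), rel="alternate", href=link)

    return ET.tostring(feed, encoding="unicode")


feed_xml = build_feed(
    "http://example.org/news",          # hypothetical feed id
    "Example news",
    "Example Author",
    [("http://example.org/news/1", "First item",
      "2008-05-17T14:30:00Z", "http://example.org/news/1.html")],
)
```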
As soon as I have time I’ll look into switching my website to also provide an Atom feed rather than an RSS feed. I’m also considering redirecting the requests for the RSS feed on my blog to Atom, if nobody gives me a good reason to keep RSS. I have already hidden the RSS feeds from the syndication links on the right, which now only present Atom feeds, and the Atom feeds are already requested more often than the RSS versions. For those who can’t see why I’d like to standardise on a single format: I don’t like redundancy where it’s not needed. In particular, if there is no practical need to keep both, I can reduce the amount of work done by Typo by hiding the RSS feeds and redirecting them from within Apache rather than letting those requests hit the application. Considering that Typo creates feeds for each one of the tags, categories and posts (the latter I already hide and redirect to the main feed, since they make no sense to me), it’s a huge number of requests that would be merged.
So if somebody has a reason why the RSS feeds should be kept around, please speak now. Thanks!