From Textile to Markdown

You probably remember by now my blog’s woes with the piece of crap that passes for a blog engine I’m using right now. Since enough people asked me to please not kill the blog, and since I would like, if I keep the blog around, to keep it up to date (if it were to go stale forever I would proceed with my idea of killing it altogether, and maybe make a book out of it), I’m looking into what options I have.

The options for me are either to move to a different blog engine altogether, or to just cut features out of the currently-running version of Typo to remove the errors and the stuff I don’t need. Please do not suggest that I use a static generator, as I don’t care for one: I want to be able to edit my blog online, and I want to be able to have comments. And I’m not going to maintain two different pieces of software, a static engine and a comment system, and no, I’m not going to use Disqus or similar. Full stop.

Whichever way it goes, one thing that I need to change is the text format the posts are written in. Right now most of it is written in Textile. I’m not sure if the choice was simply due to Markdown sucking at the time, or this being the format used by Serendipity that I was using before. In any case, Textile lost the text format war and everything is Markdown nowadays. So I changed the settings for the new posts and I’m writing them in Markdown. The problem is converting the old posts.

Now thankfully I’ve already been pointed at pandoc, which is a great tool… but its support for Textile, like that of most platforms, is not really perfect. For instance, lists are not properly converted: bullet lists are only evaluated correctly if the line does not start with a space (even though the Ruby gem for Textile supports lines starting with a space), and image URLs, which are expressed between exclamation marks, are matched across lines, making a mess of posts where more than one exclamation mark is present.

I can probably work around those two issues during the conversion (I already have a script that can pass all the posts through pandoc to convert them to Markdown), but there are bound to be more issues, which means I’ll have to go through all my posts (or most of them at least) to make sure that they have been converted correctly. Is anybody with a liking for Haskell willing to fix these smaller issues for me?
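For what it’s worth, the workaround for the first issue could look something like the following Ruby sketch. This is not the actual conversion script: the function names are made up for illustration, and the only assumption about pandoc is that the binary is in the PATH and accepts `-f textile -t markdown`. It strips the indentation in front of bullet markers before handing the text over, and does nothing about the exclamation mark problem.

```ruby
require "open3"

# Hypothetical helper: remove spaces or tabs in front of Textile bullet
# markers ("*", "**", ...) so that the line starts with the marker itself.
# Only lines where the marker is followed by whitespace are touched, so an
# indented "*bold*" span is left alone.
def unindent_textile_bullets(text)
  text.gsub(/^[ \t]+(\*+[ \t])/, '\1')
end

# Hypothetical helper: convert one post body from Textile to Markdown by
# piping the cleaned-up text through the pandoc command-line tool.
def textile_to_markdown(text)
  output, status = Open3.capture2(
    "pandoc", "-f", "textile", "-t", "markdown",
    stdin_data: unindent_textile_bullets(text)
  )
  raise "pandoc failed" unless status.success?
  output
end
```

Keeping the clean-up in its own function means it can be checked against a handful of problematic posts without running pandoc at all.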

Hopefully, I’ll soon be able to stop relying on Textile, and the multiple text filters are going to be the first thing I get rid of in Typo, as their execution requires database access for no good reason…

This blog might just go away

Sorry to say.

Today I lost not one, but two drafts, because the idiotic JavaScript that Typo uses to save the drafts shows you a green “draft autosaved” message, even though the answer to the autosave request is a 404.

This drove me crazy. Seriously crazy. Crazy is the right word. I’m so angry with this pile of detritus that I literally (and not figuratively!) almost threw my laptop out of the window.

I’ve gotten tired of Typo, or Publify as it’s called now. They started bolting on tons of features, visual editors, carousels, S3-compatible resource upload, and so on and so forth, and now most of it is completely useless to me.

I cannot update my install to “Publify” because there are too many failed merges due to the complete rename of the interface. Plus it requires way too many new dependencies that I really don’t care about. I cannot keep the current one because, as I said, it’s partially broken.

Either I’m going to build my own system on top of the parts of Typo that work over the next week or so, or I find a replacement platform, or most likely I’ll just close everything down and rm -rf the database.

Blog pages numbering

This post is mostly trivial and useless, you can skip it. Seriously.

I was musing about something the other day: Typo allows you to consult the whole history of my blog over time, by using a complete archive, for the whole content as well as for tags and categories. These are numbered as “pages” in the archives. But they are not permanent.

Indeed, the homepage you see is counted as “page 1” — so while the pages grow further and further, the content always moves. A post that is on page 12 today will not be there in a couple of months. Sure, it’s still possible to find it in the monthly archives (as long as the month is complete) but it’s far from obvious.

This page numbering is common on most systems where you want the most recent, or most relevant, content first, such as search engines and, indeed, most news sites or blogs. But while the bottom-up order of the posts in the single page makes sense to me, the numbering still doesn’t.

What I would like would be for pages to start from page 1 (the oldest posts) and continue further, 10-by-10, until reaching page 250 (which is pretty near at this point, for this blog). Unfortunately this breaks badly if the homepage corresponds to the last page: once post number 2501 is published, the homepage would only have one article. So what is it that I would like?

Well, first of all, I would say that the homepage (as well as the landing pages for tags and categories) is “page 0”, and page 0 is out of the order of the archives altogether. Page 0 is bottom-up, just like we have now, and has a fixed number of entries. Page 1 is the oldest ten (or fewer) posts, top-down (in ascending date order), and so forth.

What does this achieve? Well, first of all a given post will always be at a given page. There is no more sliding around of old posts, making pages actually useful links; this includes the ability for search engines to actually have meaningful search results to those pages, instead of an ever-moving target — even though I would say that they should probably check the semantic data when reading the archive pages.
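The numbering described above boils down to a bit of integer arithmetic. The following Ruby sketch is only an illustration of the idea, not anything Typo implements: posts are identified by their position in the archive (1 being the oldest), and all the function names are invented for the example.

```ruby
PER_PAGE = 10

# Archive page holding the post at the given position (1 = oldest post).
# The result depends only on the position, never on how many posts exist.
def page_for(position, per_page = PER_PAGE)
  (position - 1) / per_page + 1
end

# Positions shown on an archive page, in ascending (top-down) order.
def positions_on(page, total, per_page = PER_PAGE)
  first = (page - 1) * per_page + 1
  last = [page * per_page, total].min
  (first..last).to_a
end

# Number of the last archive page, i.e. the one still being filled.
def last_page(total, per_page = PER_PAGE)
  (total + per_page - 1) / per_page
end

# "Page 0": the newest posts, newest first, outside the archive numbering.
def front_page(total, count = PER_PAGE)
  total.downto([total - count + 1, 1].max).to_a
end
```

With 2501 posts, `page_for(2501)` is 251 and that page holds a single post, while `front_page(2501)` still lists ten entries; the post at position 115 stays on page 12 no matter how many posts are published later.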

At first I thought this would have reduced the cache use as well, as stopping the sliding means that the content of a given page is not changing at every single post… unfortunately at most it can help cache fragments, as adding more pages means that there will be a different “last page number” (or link), at the bottom of the page. Of course it would be possible to use a /page/last link and only count the pages immediately before and after the current one.

Oh well, I guess this adds to the list of changes I’d like to make to Typo (but I can’t, due to time, right now).

When the tests are more important than the functions

I’ve got to admit that the whole “Test-Driven Development” hype was not something that appealed to me as much — not because I think tests are wrong, I just think that while tests are important, focusing almost exclusively on them is just as dumb as ignoring them altogether.

In the Ruby world, there is so much talk about tests, and yet it’s still very common to find gems that don’t work at all with newer Ruby versions, but whose specs pass just fine. Or the tests pass fine for the original author, but they fail on everyone else’s system because they depend heavily on custom configuration — sometimes, they depend on case-insensitive filesystems because the gem was developed on Windows or Mac, and never tested on Linux. Indeed, for the longest time, Rails’ own tests failed to work at all, and the “vendored” code they brought in never had a working testsuite. Things have improved nowadays, but not significantly.

Indeed, RubyGems does not make it easy to perform testing upon install, which means that many of the gems distributed lack part of the testsuite altogether — sometimes this is an explicit choice; in the case of my own RubyElf gem the tests are not distributed because they grow and grow, and they are quite a few megabytes at this point; if you want to run them you fetch the equivalent snapshot from GitHub — the ebuild in Gentoo uses that as a basis for that reason.

Sometimes even gems coming from people who are considered nearly Ruby Gods, like rails_autolink (https://rubygems.org/gems/rails_autolink) by tenderlove, end up failing tests, badly, in their release — the version we have in Portage is patched up, and the patch is sent upstream. Only the best for our users.

Now unfortunately, as I noted in the post’s title, some projects care more about the tests than the functionality — the project in question is the very same Typo that I use for this blog, and which I already mused about forking to implement fixes that are not important for upstream. Maybe I should have done that already, maybe I will do that.

So I sent a batch of changes and fixes upstream, some of them fixing issues caused by their own changes, others implementing changes to allow proper usage of Typo over SSL vhosts (yes, my blog is now available over SSL — I have to fix a few links and object load paths in some of the older posts, but it will soon work fine), others again simply making it a bit more “SEO”-friendly, since that seems to be a big deal for the developers.

What kind of response do I get about the changes? “They fail spec” — no matter that the one commit I’m first told breaks specs actually fixes editing of blog posts after a change that went straight to master, so it might break specs, but it solves a real-life issue that makes the software quite obnoxious. So why did I not check specs?

group :development, :test do
  gem 'thin'
  gem 'factory_girl', '~> 3.5'
  gem 'webrat'
  gem 'rspec-rails', '~> 2.12.0'
  gem 'simplecov', :require => false
  gem 'pry-rails'
end

I have no intention of looking into this whole set of gems just to be able to run the specs for a blog, specs which I find vastly messed up. Why do I think so? Well, among other reasons, I’ve been told before quite a few times that they wouldn’t ever pass on PostgreSQL — which happens to be the database that has been powering this very instance for the past eight years. I’m pretty sure it’s working well enough!

Well, after asking (a few times) for the specs output — turns out that most of the specs broken are actually those that hardcode http:// in the URLs. Of course they break! My changes use protocol-relative URIs which means that the output changes to use // — no spec is present that tries to validate the output for SSL-delivered blogs which would otherwise break before.
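To make the mismatch concrete, here is a small Ruby sketch of what a protocol-relative URI looks like and why a spec that hardcodes http:// stops matching. Neither function exists in Typo; both names and the example host are invented for illustration.

```ruby
# Hypothetical helper: turn an absolute http(s) URL into a protocol-relative
# one, so the browser reuses whatever scheme the page was loaded with.
# Anything that is not an absolute http(s) URL is returned unchanged.
def protocol_relative(url)
  url.sub(%r{\Ahttps?://}, "//")
end

# Hypothetical spec helper: compare two URLs while ignoring the scheme, so
# an expectation does not depend on whether the blog is served over SSL.
def same_resource?(expected, actual)
  protocol_relative(expected) == protocol_relative(actual)
end
```

A spec comparing the generated output literally against "http://blog.example.com/files/a.png" fails as soon as the code emits "//blog.example.com/files/a.png", even though both point at the same file; comparing with something like `same_resource?` would cover both the plain and the SSL-delivered case.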

And what is upstream’s response to my changes? “It breaks here and there, would you mind looking into it?” Nope! “The commit breaks specs.” — No offer (until I complained loudly enough on IRC) to look into it and fix either the patches or the specs as needed. No suggestion that there might be something to change in the specs.

Not even a cherry-pick of the patches that do not break specs.

Indeed as of this writing, even the first patch in the series, the only one that I really would care about getting merged (because I don’t want to get out of sync with master’s database, at least until I decide to just go for the fork), is still lingering there, even if there is no way in this world that it breaks specs, as it introduces new code altogether.

Am I going to submit a new set of commits with at least the visible specs’ failures fixed? Not sure — I really could care more about it, since right now my blog is working and has the features I need, the only one missing being the user agent forwarding to Akismet. I don’t see friendliness coming from upstream, and I keep thinking that a fork might be the best option at this point, especially since, when I suggested using Themes for Rails to replace the current theme handling, so that it works properly with the assets pipeline (one of the best features of Rails 3), the answer was “it’s not in our targets” — well it would be in mine, if I had the time! Mostly because being able to use SCSS would make it easier to share the stylesheets with my website (even though I’m considering getting rid of my website altogether).

So my plea to the rest of the development community, which I hope I can be considered part of, is to not be so myopic that you care more about tests passing than features working. For sure Windows didn’t reach its popularity level by being completely crash-proof — and at the same time I’m pretty sure that they did at least a basic level of testing on it. The trick is always in the compromise, not in the absolute care or negligence for tests.

ModSecurity and my ruleset, a release

After the recent Typo update I had some trouble with Akismet not working properly to mark comments as spam, at least the very few spam comments that could get past my ModSecurity Ruleset — so I set off to deal with it a couple of days ago to find out why.

Well, to be honest, I didn’t really want to focus on why at first. The first thing I found out while looking at the way Typo uses Akismet is that it still used a bundled, hacked, ancient akismet library. Given that the API key I got was valid, I jumped to the conclusion, right or wrong as it was, that the code was simply using an ancient API that had been retired, and decided to look around for a newer Akismet version; lo and behold, a 1.0.0 gem was released not many months ago.

After fiddling with it a bit, the new Akismet library worked like a charm, and spam comments passing through ModSecurity were again marked as such. A pull request and its comments later, I got a perfectly working Typo which marks comments as spam as well as before, with one less library bundled within it (and I also got the gem into Portage so there is no problem there).

But this left me with the problem that some spam comments were still passing through my filters! Why did that happen? Well, if you remember, my idea behind it was validating the User-Agent header content… and it turns out that the latest Firefox versions have such a small header that almost every spammer seems to have been able to copy it just fine, so they weren’t killed off as intended. So more digging in the requests.

Some work later, and I was able to find two rules with which to validate Firefox, and a bunch of other browsers; the first relies on checking the Connection: keep-alive header that is always sent by Firefox (tried in almost every possible combination), and the other relies on checking the Content-Type on the POST request for a charset being defined: browsers will have it, but whatever the spammers are using nowadays doesn’t.
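The actual rules are written for ModSecurity and live in the ruleset; the Ruby below is only a sketch of the two heuristics just described, applied to a plain hash of request headers. The function names are invented, and the logic is deliberately simplified (for instance, it only looks for the string “Firefox” in the User-Agent).

```ruby
# Look up a request header by name, ignoring the case of the name.
def header(headers, name)
  headers.each do |key, value|
    return value if key.to_s.downcase == name.downcase
  end
  nil
end

# First heuristic: a client claiming to be Firefox is only believed if it
# also sends "Connection: keep-alive". Other clients are not judged here.
def plausible_firefox?(headers)
  agent = header(headers, "User-Agent").to_s
  return true unless agent.include?("Firefox")
  header(headers, "Connection").to_s.downcase.include?("keep-alive")
end

# Second heuristic: the Content-Type of the comment POST declares a charset.
def post_declares_charset?(headers)
  header(headers, "Content-Type").to_s.downcase.include?("charset=")
end

# A comment POST is treated as suspicious if either check fails.
def suspicious_comment_post?(headers)
  !(plausible_firefox?(headers) && post_declares_charset?(headers))
end
```

In this form both checks are trivial for a spammer to satisfy once they are known, which is exactly the cat-and-mouse game these rules are part of.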

Of course, the problem is that once I actually describe and upload the rules, spammers will just improve their tools to not commit these mistakes, but in the meantime I’ll have a calm, spamless blog. I still won’t give in to captchas!

At any rate, besides adding these validations, thanks to another round of testing I was able to fix Opera Turbo users (now they can comment just fine), and that led me to the choice of tagging the ruleset and… releasing it! Now you can download it from GitHub or, if you use Gentoo, just install it as www-apache/modsec-flameeyes — there’s also a live ebuild for the bravest.

The usual Typo update report

You probably got used to reading about me updating Typo at this point — the last update I wrote about was almost a year ago when I updated to Typo 6, using Rails 3 instead of 2. Then you probably remember my rant about what I would like from my blog…

Well, yesterday I was finally able to get rid of the last Rails 2.3 application that was running on my server, as a nuisance of a customer’s contract finally expired, and so I was finally able to update Typo without having to worry about the Ruby 1.8 compatibility that was dropped upstream. Indeed since the other two Ruby applications running on this server are Harvester for Planet Multimedia and a custom application I wrote for a customer, the first not using Rails at all, and the second written to work on both 1.8 and 1.9 alike, I was able to move from having three separate Rails slots installed (2.3, 3.0 and 3.1), to having only the latest 3.2, which means that security issues are no longer a problem for the short term either.

The new Typo version solves some of the smaller issues I had with it before — starting from the way it uses Rails (now no longer requiring a single micro-version, but accepting any version after 3.2.11), and the correct dependency on the new addressable. At the same time it does not solve some of the most long-standing issues, as it insists on using the obsolete coderay 0.9 instead of the new 1.0 series.

So let’s go in order: the new version of Typo brings in another bunch of gems — which means I have to package a few more. One of them is fog, which includes a long list of dependencies, most of which are from the same author, and reminds me of how bad the dependencies issue is with Ruby packages. Luckily for me, even though the dependency is declared mandatory, some quick hacking around got rid of it just fine — okay, hacking might be too much, it really is just a matter of removing it from the Gemfile and then removing the require statement for it, done.

For the moment I used the gem command to install the required packages — some of them are actually available on Hans’s overlay and I’ll be reviewing them soon (I was supposed to do that tonight, but my job got in the way) to add them to the main tree. A few more require me to write them from scratch so I’ll spend a few days on that soon. I have other things in my TODO pipeline but I’ll try to cover as many bases as I can.

While I’m not sure if this update finally solves the issue of posts being randomly marked as password-protected, at least this version fixes the header in the content admin view, which means that I can finally see what drafts I have pending — and the default view also changed to show me the available drafts to finish, which is great for my workflow. I haven’t looked yet at whether the planning for future-published posts works, but I’ll wait for that.

My idea of forking Typo is still on, even though it might be more like a set of changes over it instead of being a full-on fork; we’ll see.

What I’d like from my blog

My blog is, at this point, a vital part of my routine. I use my blog to write about my personal projects, I write about the non-restricted parts of my jobs, and I write about the work that goes into Gentoo Linux and other projects I follow.

I have over 2100 posts over time, especially thanks to the recent import of my original blog on Gentoo infrastructure. I don’t really know if it’s a lot, but sometimes Typo seems to miss something about it. Unfortunately I’m also running an older version of Typo, because I haven’t switched that virtual server to Ruby 1.9 yet as one of my customers is running a version of Radiant that is not going to work otherwise.

Said customer also bitched so hard, and screamed not to keep the site on my server, but as it happens the new webmasters that are supposed to pick up the website, and should have been cheaper and faster than me… have been working since June and still delivered nothing. Hopefully they’ll be done soon and I can kick said customer from the server.

Anyway, at this point there are a few things that I’d like to get out of my blogging platform in the future, which might require me to fork Typo and create my own version, which is likely going to be stripped down — as there are many things added here that I really don’t care about, like the short URLs, which I might just export as I think I used them at some point, but which I would then handle through mod_rewrite rather than on the Rails side.

So let’s see what I don’t like about the current Typo I’m using:

  • The database access is more than a bit messed up; it probably has to do with the fact that upstream only cares about MySQL, while I want to run it on PostgreSQL; and this causes more than a couple of problems — have you noticed that sometimes my posts end up password-protected? Well, what happens is that the settings for the single posts are serialized in YAML and de-serialized, but sometimes something bad happens and the YAML becomes invalid, causing the password-protection to kick in. I know there is an ActiveRecord extension that allows for key-value pairs to be stored in PostgreSQL-specific column types instead of having to (de)serialize them all the time, but again, this wouldn’t be something upstream would use.
  • Alternatively I’ve been toying with the idea of using MongoDB as a backend. Even with the issues that I have pointed out before, I think it might work well for a blog, especially since then the comments would be tied to the post itself, rather than having the current connected tables.
  • There is a problem with the tags handling, again something upstream doesn’t seem to care about – at some point I remember reading they were mostly interested in making every single word in a post a tag to cross-connect posts with the same word; it’s one of the reasons why I’m not sure if I want to update it. If I change the title of one of the tags to make it more descriptive, then I edit a post that has that tag, it creates one more tag for each word in that title, instead of preserving the older tags. I really should clean up the tags I got right now.
  • I would also like that when I get to the “new post” page it would create it already and then get me back to editing it — this is important to me because sometimes if I have to restart Chromium, or suspend the laptop, something goes very wrong and it creates multiple drafts for the same post. And cleaning them up is a long task.
  • A better implementation of notification for new posts, and integration with Flattr, would be also very good. While IFTTT makes it easy to post the new entries to Twitter and LinkedIn, its lack of integration for Flattr is a major pain, and the fact that right now, to use auto-submit, I have to duplicate part of the content in the HTML of the pages, is also a problem. So being able to create a “Flattr thing” the moment when I actually post something would be a major plus for me.
  • Since I’m actually quite the paranoid, another thing I would like to have would be either two-factor authentication with Google Authenticator on a cellphone, or (actually, in addition to) certificate-based authentication for the admin interface. Having a safe way to make sure that I’m the only one logging in would make me remove some of the administrative interface rules on ModSecurity, which would in turn let me write posts from public WiFi networks sidestepping the problem I posted about the other day.
  • Scheduled posting. This used to be supported, but it’s been completely broken for years at this point; it was very useful to me a long time ago since I would just write a bunch of posts and schedule them to be posted once a day. I suppose this should now be changed so that the planned posts are only actually posted when a process is called to check whether their scheduled time has “elapsed”… but again this is something that I’d like to have, and you readers would probably enjoy, as it would probably make for more and better content overall.
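The scheduled posting item is the easiest one to sketch. The Ruby below is a stand-alone illustration, not Typo code: the Post struct and the function name are invented, and in practice the function would be called periodically (for instance from cron) against real records.

```ruby
# Hypothetical, minimal stand-in for a blog post record.
Post = Struct.new(:title, :publish_at, :published)

# Mark as published every not-yet-published post whose scheduled time has
# elapsed at the given moment, and return the posts that were just published.
def publish_due_posts(posts, now = Time.now)
  due = posts.select { |post| !post.published && post.publish_at <= now }
  due.each { |post| post.published = true }
  due
end
```

Since the posts only go live when this check runs, nothing needs to stay resident in the application between the moment a post is written and the moment it is due.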

I definitely do not want to go with WordPress, I just wish I had the time to write my own Typo fork, and make it more usable for what I do, rather than hoping that the upstream development for Typo does not go in a direction I don’t like at all. Maybe somebody else has the same requirements and would like to join me in this project; if so, send me an email. Maybe it’ll finally be the time I decide to start on the fork itself.

MongoDB, after a Meetup

Last night I had my first glimpse of what can be called, stretching it, a social life. I was invited by one of the 10gen people to go to a MongoDB meetup/party held in honour of the release of MongoDB 2.2 — I came to know about this after I was ranting on Twitter about how the new point release for 2.0 is still using the outdated Boost filesystem version 2.

So first of all thanks to 10gen for the drinks and chips — the event was fun and it was nice to meet people from around here, although I was surprised by the seeming scarcity of Linux users… too bad that Davide couldn’t be there because he was working…

You probably remember I was quite a bit disappointed in MongoDB when I tried it out, not for its own design but for some nuances that make it a good fit or not in a Unix environment — one of which (the syslog support) is finally solved in the new release, 2.2. But most importantly because of the way the Ruby driver has been managed.

Now I didn’t expect much of a technical meeting since it was intended as a party, but the main answer I was given for the trouble (well, they asked, when I said I was sceptical) was that “the drivers are being worked on just recently”. I have to say that if it wasn’t for Matthew, I would have felt a bit disappointed in the handling… but since he at least engaged in a technical discussion, at least in the short run, I actually got a possible reason for the situation. The “just recently” meant that the drivers have been brought to feature parity… and it seems like somebody interpreted that as “drop features that are not available elsewhere” without considering the API compatibility issue.

Also kudos to him for taking the time to look into whether the Boost 1.50 compatibility was fixed in the new release or not (seems like it is, so that’s good for Gentoo).

The feeling I got is that my doubts on MongoDB are not really just mine — I heard many people saying that they like the idea, and they are using it for side projects but… they don’t feel ready to prefer it over traditional solutions at the moment for their main systems. Some are having trouble with the way it handles small amounts of data (too big an overhead), others with the way it handles complex queries (spidermonkey is too slow, v8 is too unstable). So I guess I’m not more sceptical than the average.

My personal complaint right now? I doubt I’ll have time to try the one experiment I’d like to see: converting Typo to use MongoDB — with my usual rate of writing, this blog has now over 1740 articles, with plenty of comments already in – and I don’t get hit by spam so much thanks to my ruleset – but it still is sluggish from time to time. Given the kind of data that a blog is composed of, I think there is something to gain from going with an object tree model instead of flat tables… but I don’t have the time to work on this any time soon. Maybe when I have time to play.
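To show what “object tree instead of flat tables” means here, the following Ruby sketch folds rows from two flat tables into one nested document per post, with the comments embedded. It uses plain hashes only: no MongoDB driver is involved, and the field names are invented for the example rather than taken from Typo’s schema.

```ruby
# Build one nested document per post, embedding its comments, starting from
# two flat lists of rows linked by :post_id (the relational layout).
def embed_comments(posts, comments)
  by_post = comments.group_by { |comment| comment[:post_id] }
  posts.map do |post|
    embedded = by_post.fetch(post[:id], []).map do |comment|
      # The foreign key is redundant once the comment lives inside the post.
      comment.reject { |key, _| key == :post_id }
    end
    post.merge(comments: embedded)
  end
end
```

Rendering a post with its comments then needs a single document instead of a join between two tables, which is the gain hinted at above; whether it would actually help with the sluggishness is untested.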

All in all, I still think that if there is an alternative (when the use cases make sense of course) to your average RDBMS, MongoDB at the moment is your best bet, which might say something about what the remaining bets are.

Ruby pains, May 2012 edition

While I’m still trying to figure out how to get the logs analysed for the tinderbox, I’ve been spending some time to work on Gentoo’s Ruby packaging again, which is something that happens from time to time as you know.

In this case the spark is the fact that I want to make sure that my systems work with Ruby 1.9. Mostly, this is because the blog engine I’m using (Typo) is no longer supported on Ruby 1.8, and while I did spend some time to get it to work, I’m not interested in keeping it that way forever.

I started by converting my box database so that it would run on Ruby 1.9. This was also particularly important because Mongoid 3 is also not going to support Ruby 1.8. This was helped by the fact that finally bson-1.6 and mongo-1.6 are working correctly with Ruby 1.9 (the previous minor, 1.5, was failing tests). Next step of course will be to get them working on JRuby.

Unfortunately, while now my application is working fine with Ruby 1.9, Typo is still a no-go… reason? It still relies on Rails 3.0, which is not supported on 1.9 in Gentoo, mostly due to its dependencies. For instance it still wants i18n-0.5, which doesn’t work on 1.9, and it tries to get ruby-debug in (which is handled in different gems altogether for Ruby 1.9, don’t ask). The end result is that I’ve still not migrated my blog to the server running 1.9, and I’m not sure when and if that will happen, at this point… but things seem to fall into place, at least a bit.

Hopefully, before the end of the summer, Ruby 1.9 will be the default Ruby interpreter for Gentoo, and next year we’ll probably move off Ruby 1.8 altogether. At some later point, I’d also like to try using JRuby for Rails, since that seems to have its own advantages — my only main problem is that I have to use JDBC to reach PostgreSQL, as the pg gem does not work (and that’s upsetting as that is what my symbol collision analysis script is using).

So, these are my Ruby 1.9 pains for now, I hope to have better news in a while.