Blog Redirects, Azure Style

Last year, I set up an AppEngine app to redirect the old blog’s URLs to the WordPress install. It’s a relatively simple Flask web application, although it turned out to be around 700 lines of code (quite a bit just to serve redirects). While it ran fine for over a year on Google Cloud without me touching anything, and fit into the free tier, I had to move it as part of my divestment from GSuite (which is only vaguely linked to me leaving Google).

I could have just migrated the app to a new consumer AppEngine account, but I decided to try something different, to get out of the bubble and to compare other offerings. I decided to try Azure, which is Microsoft’s cloud offering. The first impressions were mixed.

The good thing about the Flask app I used for redirection being that simple is that nothing ties it to any one provider: the only things you need are a Python environment and the ability to install the requests module. For the same codebase to work on both AppEngine and Azure, though, one small change is needed. Both providers appear to rely on Gunicorn, but AppEngine looks for an object called app in the main module, while Azure looks for it in the application module. This is trivially solved by defining the whole Flask app inside application.py and having the following content in main.py (the command line support is for my own convenience):

#!/usr/bin/env python3

import argparse

from application import app


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--listen_host', action='store', type=str, default='localhost',
        help='Host to listen on.')
    parser.add_argument(
        '--port', action='store', type=int, default=8080,
        help='Port to listen on.')

    args = parser.parse_args()

    app.run(host=args.listen_host, port=args.port, debug=True)
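
For reference, application.py then only needs to expose the app object that Gunicorn loads on either platform. A minimal sketch follows — the route and mapping below are illustrative placeholders, not my actual redirect logic:

    # application.py — sketch only: the real redirect maps and logic are not shown.
    from flask import Flask, redirect

    app = Flask(__name__)

    # Hypothetical example mapping; the real app builds this from its URL maps.
    _REDIRECTS = {
        '/old-post': 'https://blog.example.com/new-post/',
    }

    @app.route('/<path:path>')
    def redirect_old_url(path):
        target = _REDIRECTS.get('/' + path)
        if target is None:
            return 'Not found', 404
        return redirect(target, code=301)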

The next problem I encountered was with the deployment. While there are plenty of guides out there on using different builders to set up deployment to Azure, I was lazy and went straight for the most clicky one, which uses GitHub Actions to deploy from a (private) GitHub repository straight into Azure, without having to install any command line tools (sweet!). Unfortunately, I hit a snag in the form of what I think is a bug in the Azure GitHub Actions template.

You see, the generated workflow for the deployment to Azure pretty much zips up the content of the repository, after creating a virtualenv directory and installing the defined requirements into it. But while the workflow creates the virtualenv in a directory called env, the default startup script for Azure looks for it in a directory called antenv. So for me the app was failing to start until I changed the workflow to use the latter:

    - name: Install Python dependencies
      run: |
        python3 -m venv antenv
        source antenv/bin/activate
        pip install -r requirements.txt
    - name: Zip the application files
      run: zip -r myapp.zip .

Once that problem was solved, the next issue was figuring out how to set up the app on its original domain and have it serve TLS connections as well. This turned out to be a bit more complicated than expected, because I had set up CAA records in my DNS configuration to only allow Let’s Encrypt, but Microsoft uses DigiCert to provide the (short lived) certificates, so until I removed that restriction the certificate couldn’t be issued (oops).
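
I just removed the records; the alternative would have been to extend them to also allow DigiCert, with something along these lines (zone-file syntax, hostname illustrative):

    example.com.  IN  CAA  0 issue "letsencrypt.org"
    example.com.  IN  CAA  0 issue "digicert.com"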

After everything was set up, here are a few more differences between the two services that I noticed.

First of all, Azure does not provide IPv6, although since they use CNAME records this could change at any time in the future. This is not a big deal for me, not only because IPv6 is still dreamland, but also because the redirection points to WordPress, which does not support IPv6 either. Nonetheless, it’s an interesting point to make: despite Microsoft having spent years preparing for IPv6 support, and having even run Teredo tunnels, they also appear not to be ready to provide modern service entrypoints.

Second, and related, it looks like on Azure there’s a DNAT in front of the requests sent to Gunicorn — all the logs show the requests coming from 172.16.0.1 (a private IP address). This is the opposite of AppEngine, which shows the actual request IP in the log. It’s not a huge deal, but it does make it a bit annoying to figure out whether someone is trying to attack your hostname. It also makes the lack of IPv6 support funnier, given that the setup does not appear to require the application itself to support the new addresses.
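
If the front end does set X-Forwarded-For (an assumption I have not verified on Azure), Werkzeug’s ProxyFix middleware should be enough to get the real client address back into Flask’s logs — a sketch:

    # Sketch: trust a single proxy hop for X-Forwarded-For, so that
    # request.remote_addr (and thus the access logs) shows the original
    # client instead of the DNAT address.
    from flask import Flask
    from werkzeug.middleware.proxy_fix import ProxyFix

    app = Flask(__name__)
    app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1)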

Speaking of logs, GCP exposes structured request logs. Request logging is a pet peeve of mine, and GCP at least makes it easier to deal with. In general, structured logs allow you to filter much more easily to find instances of requests terminated with an error status, which is something I paid close attention to in the weeks after deploying the original AppEngine redirector: I wanted to make sure my rewriting code didn’t miss some corner cases that users were actually hitting.

I couldn’t figure out how to get a similar level of detail in Azure, but honestly I have not tried too hard yet, because I don’t need that level of control for the moment. Also, while there does seem to be an entry in the portal’s menu to query logs, when I try it out I get the message «Register resource provider ‘Microsoft.Insights’ for this subscription to enable this query», which suggests to me it might be a paid extra.

Speaking of paid, the question of costs is something that clearly needs to be kept in sight, particularly given recent news cycles. Azure seems to provide a 12-month free trial, but it also gives you £150 of credit for 14 days, which doesn’t seem to add up properly to me. I’ll update the blog post (or write a new one) with more details after I have some more experience with the system.

I know that someone will comment complaining that I shouldn’t even consider Cloud Computing as a valid option. But honestly, from what I can see, I will likely be running a couple more Cloud applications out there, rather than keep hosting my own websites and running my own servers. It’s just more practical, and it’s a different trade-off between costs and time spent maintaining things, so I’m okay with it going this way. But I also want to make sure I don’t end up locking myself into a single provider, with no chance of migrating.

Blog Redirects & AppEngine

You may remember that when I announced I moved to WordPress, I promised I wouldn’t break any of the old links, particularly as I have kept them working since I started running the blog underneath my home office’s desk, on a Gentoo/FreeBSD box, just shy of thirteen years ago.

This is not a particularly trivial matter, because Typo used at least three different permalink formats (and two different formats for linking to tags and categories), and Hugo used different ones for all of those too. In addition to this, one of the old Planet aggregators I used to be on had a long-standing bug and truncated URLs to a certain length (actually, two different lengths, as they extended it at some point), and since those ended up indexed by a number of search engines, I ended up maintaining a long mapping between broken URLs and what they were meant to be.

And once I had such a mapping, I ended up also keeping in it the broken links that other people have created towards my blog. And then when I fixed typos in titles and permalinks I added all of those to the list too. And then, …

Oh yeah, and there is the other thing — the original domain of the blog, which I set up to redirect to the newer one nearly ten years ago.

The end result is that I have been holding onto, for nearly ten years, an unwieldy mod_rewrite configuration for Apache, which also prevented me from migrating to any other web server. Migrating to a new hostname when I moved to WordPress was always my plan, if nothing else to avoid having to deal with all those rewrites in the same configuration as the webapp itself.

I have kept, until last week, the same abomination of a configuration, running on the same vserver the blog used to run on. But between stopping relationships with customers (six years ago, when I moved to Dublin), moving the blog out, and removing the website of a friend of mine who decided to run his own WordPress, the amount of work needed to maintain the vserver is no longer commensurate with the results.

While discussing my options with a few colleagues, one idea that came out was to just convert the whole thing to a simple Flask application, and run it somewhere. I ended up wanting to try my employer’s own offerings, and ran it on AppEngine (but the app itself does not use any AppEngine specific API, it’s literally just a Flask app).

This meant having the URL mapping in Python, with a bit of regular expression magic to make sure the URLs from previous blog engines are replaced with WordPress-compatible ones. It also meant that I can have explicit logic about what to re-process and what not to, which with Apache was not easily done (but still possible).
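
To give an idea of the kind of rewriting involved, here is a sketch with made-up patterns — the real maps are much longer and include the truncated-URL fixups mentioned above:

    import re

    # Illustrative patterns only; the target hostname is a placeholder.
    _PATTERNS = [
        # Typo-style dated permalinks -> WordPress-style slug permalinks.
        (re.compile(r'^/articles/\d{4}/\d{2}/\d{2}/(?P<slug>[^/]+)/?$'),
         'https://blog.example.com/{slug}/'),
        # Old tag pages.
        (re.compile(r'^/tags/(?P<slug>[^/]+)/?$'),
         'https://blog.example.com/tag/{slug}/'),
    ]

    def rewrite(path):
        """Return the new URL for an old path, or None if it is unknown."""
        for pattern, template in _PATTERNS:
            match = pattern.match(path)
            if match:
                return template.format(**match.groupdict())
        return None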

Using an actual programming language instead of Apache configuration also means that I can be a bit smarter about how I process the requests. In particular, before returning the redirect to the requester, I’m now verifying whether the target exists (or rather, whether WordPress returns an OK status for it), and using that to decide whether to return a permanent or temporary redirect. This means that most of the requests to the old URLs will return permanent (308) redirects, and whatever is not found raises a warning I can inspect, to see if I should add more entries to the maps.
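
In Flask terms the decision looks roughly like this — building on the rewrite() sketch above, and again only an outline of the approach rather than the actual code:

    import requests
    from flask import Flask, abort, redirect

    app = Flask(__name__)

    @app.route('/<path:path>')
    def old_url(path):
        target = rewrite('/' + path)  # rewrite() as sketched earlier
        if target is None:
            abort(404)  # logged as a warning I can review later
        # Only commit to a permanent redirect if WordPress actually serves
        # the target; otherwise fall back to a temporary one.
        try:
            target_ok = requests.head(target, allow_redirects=True, timeout=5).ok
        except requests.RequestException:
            target_ok = False
        return redirect(target, code=308 if target_ok else 302)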

A very simple UML Sequence Diagram of the redirector, at a high level.

The best part of all of this is of course that the AppEngine app effectively stays below the free tier quota marker, and as such has effectively zero cost. And even if it didn’t, the fact that it’s a simple Flask application with no dependency on AppEngine itself means I can move it to any other hosting option that I can afford.

The code is quite a mess right now, not generic and fairly loose. It has to work around an annoying Flask issue, and as such it’s not in any state for me to open-source, yet. My plan is to do so as soon as possible, although it might not include the actual URL maps, for the sake of obscurity.

But what is very clear to me from this is that if you want to have a domain whose only task is to redirect to other (static) addresses, like projects hosted off-site, or affiliate links – two things that I have been doing on my primary domain together with the rest of the site, by the way – then the option of using AppEngine and Flask is actually pretty good. You can get that done in a few hours.

Some of my thoughts on comments in general

One of the points that is the hardest for me to make when I talk to people about my blog is how important comments are for me. I don’t mean comments in source code as documentation, but comments on the posts themselves.

You may remember that one of the less appealing compromises I made when I moved to Hugo was accepting to host the comments on Disqus. A few people complained when I did that because Disqus is a vendor lock-in. That’s true in more ways than one may imagine.

It’s not just that you are tied into a platform with difficulty of moving out of it — it’s that there is no way to move out of it, as it is. Disqus does provide you the ability to download a copy of all the comments from your site, but they don’t guarantee that’s going to be available: if you have too many, they may just refuse to let you download them.

And even if you manage to download the comments, you’ll have a fun time trying to do anything useful with them: Disqus does not let you re-import them, say into a different account, as they explicitly don’t allow their own export format to be imported. Nor does WordPress: when I moved my blog I had to hack up a script that took the Disqus export format and a WXR dump of the blog (which is just a beefed-up RSS feed), and produced a third file, attaching the Disqus comments to the WXR as WordPress would have exported them. This was tricky, but it resolved the problem, and now all the comments are on the WordPress platform, allowing me to move them as needed.
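
The gist of that script, heavily simplified — element and namespace names below are from my recollection of the two export formats, so treat this purely as an outline of the approach:

    import xml.etree.ElementTree as ET

    DSQ = '{http://disqus.com}'                       # Disqus export namespace
    DSQ_ID = '{http://disqus.com/disqus-internals}id' # thread/post id attribute
    WP = 'http://wordpress.org/export/1.2/'           # WXR "wp" namespace
    ET.register_namespace('wp', WP)

    def disqus_comments(path):
        """Map thread URL -> list of (author, date, message)."""
        root = ET.parse(path).getroot()
        threads = {t.get(DSQ_ID): t.findtext(f'{DSQ}link')
                   for t in root.findall(f'{DSQ}thread')}
        comments = {}
        for post in root.findall(f'{DSQ}post'):
            url = threads.get(post.find(f'{DSQ}thread').get(DSQ_ID))
            comments.setdefault(url, []).append((
                post.findtext(f'{DSQ}author/{DSQ}name'),
                post.findtext(f'{DSQ}createdAt'),
                post.findtext(f'{DSQ}message')))
        return comments

    def merge(wxr_path, disqus_path, out_path):
        comments = disqus_comments(disqus_path)
        tree = ET.parse(wxr_path)
        for item in tree.getroot().iter('item'):
            for n, (author, date, message) in enumerate(
                    comments.get(item.findtext('link'), []), start=1):
                c = ET.SubElement(item, f'{{{WP}}}comment')
                ET.SubElement(c, f'{{{WP}}}comment_id').text = str(n)
                ET.SubElement(c, f'{{{WP}}}comment_author').text = author
                ET.SubElement(c, f'{{{WP}}}comment_date').text = date
                ET.SubElement(c, f'{{{WP}}}comment_content').text = message
                ET.SubElement(c, f'{{{WP}}}comment_approved').text = '1'
        tree.write(out_path, xml_declaration=True, encoding='utf-8')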

Many people pointed out that there are at least a couple of open-source replacements for Disqus — but when I looked into them I was seriously afraid they wouldn’t really scale that well for my blog. Even WordPress itself appears sometimes not to know how to deal with a blog of more than 2400 entries. The WXR file is, by itself, bigger than the maximum accepted by the native WordPress import tool — luckily the Automattic service has higher limits.

One of the other advantages of having moved away from Disqus is that the comments now render without needing any JavaScript or third-party service, which makes them searchable by search engines and, most importantly, preserves them in the Internet Archive!

But Disqus is not the only thing that disappoints me. I have a personal dislike for the design, and business model, of Hacker News and Reddit. It may be a bit of a situation of “old man yells at cloud”, but I find that these two websites, much more than Facebook, LinkedIn and other social media, are designed to take the conversation away from the authors.

Let me explain with an example. When I posted about Telegram and IPv6 last year, the post was sent to Reddit, which I found out because I have a self-stalking recipe on IFTTT that informs me if any link to my sites gets posted there. And people commented on that — some missing the point and some providing useful information.

But if you read my blog post you won’t know about that at all, because the comments are locked into Reddit, and if Reddit were to disappear the day after tomorrow there would be no history of those comments at all. And this is without going into the issue of the “karma” going to the reposter (whom I know in this case), rather than the author — who is actually discouraged in most communities from submitting their own writings!

This applies in the same or similar fashion to other websites, such as Hacker News, Slashdot, and… is Digg still around? I lost track.

I also find that moving the comments off-post makes people nastier: instead of asking questions, ready to understand and talk things through with the author, they assume the post exists in isolation, and that the author knows nothing of what they are talking about. And I’m sure that at least a good chunk of that is because they don’t expect the author to be reading them — they know full well they are “talking behind their back”.

I have had the pleasure to meet a lot of people on the Internet over time, mostly through comments on my or other blogs. I have learnt new things and been given suggestions, solutions, or simply new ideas of what to poke at. I treasure the comments and the conversation they foster. I hope that we’ll have more rather than fewer of them in the future.

WordPress, really?

If you’re reading this blog post, particularly directly on my website, you probably noticed that it’s running on WordPress and that it’s on a new domain, no longer referencing my pride in Europe, after ten years of using it as my domain. Wow that’s a long time!

I had three reasons for the domain change: the first is that I didn’t want to keep carrying the full chain of redirects for extremely old links onto whichever new blogging platform I would select. The second is that it made it significantly easier to set up a WordPress.com copy of the blog while I tweaked and set it up, rather than messing with the domain all at once. The third one will come with a separate rant very soon, but it’s related to the worrying statement from the European Commission regarding the usage of dot-EU domains in the future. But as I said, that’s a separate rant.

A few people were surprised when I talked over Twitter about the issues I faced during the migration, so I want to give some more context on why I went this way.

As you may remember, last year I complained about Hugo – to the point that a lot of the referrers to this blog still come from the Hacker News thread about that – and I started looking for alternatives. And when I looked at WordPress I found that setting it up properly would take me forever, so I kept my mouth shut and doubled down on Hugo.

Except, because of the way it was set up, it meant not having an easy way to write or correct blog posts from a computer that is not my usual Linux laptop with the SSH token and everything else. Which was too much of a pain to keep working with. While Hector and others suggested flows that involved Git-based web editors, it all felt too Rube Goldberg to me… and since moving to London my time is significantly more limited than before, so I can either spend time setting everything up, or work on writing more content, which can hopefully be more useful.

I ended up deciding to pay for the personal tier of WordPress.com services, since I don’t care about monetization of this content, and even the few affiliate links I’ve been using with Amazon are not really that useful at the end of the day, so I gave up on setting up OneLink and the likes here. It also turned out that Amazon’s image-and-text links (which use JavaScript and iframes) are not supported by WordPress.com even with the higher tiers, so those were deleted too.

Nobody seems to have published an easy migration guide from Hugo to WordPress, as most of my search queries produced results for the other way around. I will spend some time later on trying to refine the janky template I used and possibly release it. I also want to release the tool I wrote to “annotate” the generated WXR file with the Disqus archive… oh yes, the new blog has all the comments of the old one, and does not rely on Disqus, as I promised.

On the other hand, there are a few things that did get lost in the transition: while the Jetpack plugin gives you the ability to write posts in Markdown (otherwise I wouldn’t even have considered WordPress), it doesn’t seem like the importer knows how to import Markdown content at all. So all the old posts have been imported pre-rendered — a shame, but honestly it doesn’t happen very often that I need to go through old posts. Particularly now that I have merged the content from all my older blogs, first into Hugo, and now into this one massive blog.

Hopefully expect more posts from me very soon now, and not just rants (although probably just mostly rants).

And as a closing aside, if you’re curious about the picture in the header, I have once again used one of my own. This one was taken at the maat in Lisbon. The white balance on this shot was totally off, but I liked the result. And if you’re visiting Lisbon and you’re an electronics or industrial geek you definitely have to visit the maat!

How blogging changed in the past ten years

One of the problems that keeps poking back at me every time I look for alternative software for this blog is that it somehow became not your average blog, particularly not in 2017.

The first issue is that there is a lot of history. While the current “incarnation” of the blog, with the Hugo install, is fairly recent, I have been porting over a long history of my pseudo-writing, merging back into this one big collection the blog posts coming from my original Gentoo Developer blog, as well as the few posts I wrote on the KDE Developers blog and a very minimal amount of content from my (mostly Italian) blog when I was in high school.

Why did I do it that way? Well, the main thing is that I don’t want to lose the memories. As some of you might know already, I have faced my mortality before, and I came to realize that this blog is probably the only thing of substance that I had a hand in that will outlive me. And so I don’t want to just let migrations, service turndowns, and other similar changes take away what I did. This is also why I published to this blog the articles I wrote for other websites, namely NewsForge and Linux.com (back when they were part of Geeknet).

Some of the recovery work actually required effort. As I said above, there’s a minimal amount of content that comes from my high school days’ blog. And it’s in Italian, which does not make it particularly interesting or useful. I had deleted that blog altogether years and years ago, so I had to use the Wayback Machine to recover at least some of the posts. I will be going through all my old backups in the hope of finding that one last backup that I remember making before tearing the thing down.

Why did I tear it down in the first place? It’s clearly a teenager’s blog and I am seriously embarrassed by the way I thought and wrote. It was 13 or 14 years ago, and I admitted last year that I have been wrong many times. But this is not the change I want to talk about.

The change I want to talk about is the second issue with finding good software to run my blog: blogging is not what it used to be ten years ago. Or fifteen years ago. It’s not just that a lot of money got involved in the meantime, so that now there is a significant number of “corporate blogs” that end up being either product announcements in a different form, or another outlet for not-quite-magazine content. I know of at least a couple of Italian newspapers that provide “blogs” for their writers, which look almost exactly like the paper’s website, but do not have to be reviewed by the editorial board.

In addition to this, a lot of people’s blogs stopped providing as many details of their personal life as they used to. Likely, this is related to the fact that we now know just how nasty people on the Internet can be (read: just as nasty as people off the Internet), and a lot of the people who used to write lightheartedly don’t feel as safe, and rightly so. But there is probably another reason: “Social Media”.

The advent of Twitter and Facebook made it so that there is less need to post short personal entries, too. And Facebook in particular appears to have swallowed most of the “cutesy memes” such as quizzes and lists of things people have or have not done. I know there are still a few people who insist on not using these big names social networks, and still post for their friends and family on blogs, but I have a feeling they are quite the minority. And I can tell you for sure that since I signed up for Facebook, a lot of my smaller “so here’s that” posts went away.

Distribution chart of blog post sizes over time

This is a bit of a rough plot of blog post sizes. In particular I have used the raw file size of the Markdown sources used by Hugo, in bytes, which makes it not perfect for Unicode symbols, and it includes the “front matter”, which means that all the non-Hugo-native posts in particular have their title effectively doubled by the slug. But it shows trends particularly well.
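
For the curious, the raw data is trivial to gather — something like the following, assuming a standard Hugo layout with the sources under content/ and, for simplicity, bucketing by the files’ modification year rather than parsing the front matter:

    import os
    from collections import defaultdict
    from datetime import datetime

    sizes_by_year = defaultdict(list)
    for root, _dirs, files in os.walk('content'):
        for name in files:
            if name.endswith('.md'):
                path = os.path.join(root, name)
                year = datetime.fromtimestamp(os.path.getmtime(path)).year
                # Raw size in bytes, front matter included, as described above.
                sizes_by_year[year].append(os.path.getsize(path))

    for year in sorted(sizes_by_year):
        sizes = sizes_by_year[year]
        print(year, len(sizes), sum(sizes) // len(sizes))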

You can see from that graph that some time around 2009 I almost entirely stopped writing short blog posts. That is around the time Facebook took off in Italy, and a lot of my interaction with friends started happening there. If you’re curious about the visible lack of posts around the middle of 2007, that was the pancreatitis that had me disappear for nearly two months.

With this reduction in scope of what people actually write on blogs, I also have a feeling that lots of people were left without anything to say. A number of blogs I still follow (via NewsBlur since Google Reader was shut down), post once or twice a year. Planets are still a thing, and I still subscribe to a number of them, but I realize I don’t even recognize half the names nowadays. Lots of the “old guard” stopped blogging almost entirely, possibly because of a lack of engagement, or simply because, like me, many found a full time job (or a full time family), that takes most of their time.

You can definitely see from the plot that even my own blogging has slowed down significantly over the past few years. Part of it was the tooling giving up on me a few times, but part is also the lack of energy to write all the time as I used to. Plus there is another problem: I now feel I need to be more accurate in what I’m saying and in the words I’m using. This is in part because I grew up, and know how much words can hurt people even when meant the right way, but also because it turns out that when you put yourself in certain positions it’s too easy to attack you (been there, done that).

A number of people argue that it was the demise of Google Reader [1] that caused blogs to die, but as I said above, I think it’s just the evolution of the concept veering towards other systems, which turned out to be more easily reachable by users.

So are blogs dead? I don’t think so. But they are getting harder to discover, because people use other platforms and it gets difficult to follow all of them. Hacker News and Reddit are becoming many geeks’ default way to discover content, and that has the unfortunate side effect of less of the conversation happening in shared media. I am indeed bothered by those people who prefer discussing the merits of my posts on those external websites rather than actually engaging in the comments, if nothing else because I do not track those platforms, and so the feeling I get is of people talking behind my back — I would prefer if people actually told me when they share my posts on those platforms; for Reddit I can at least use IFTTT to self-stalk the blog, but that’s a different problem.

Will we still have blogs in 10 years? Probably yes, but most likely they will not look like the ones we’re used to. The same way as nowadays there still are personal homepages, but they clearly don’t look like Geocities, and there are social media pages that do not look like MySpace.


  1. Usual disclaimer: I do work for Google at the time of writing this, but these are all personal opinions that have no involvement from the company. For reference, I signed the contract before the Google Reader shutdown announcement, but started after it. I was also sad, but I found NewsBlur a better replacement anyway.

Tiny Tiny RSS: don’t support Nazi sympathisers


XKCD #1357 — Free Speech

After complaining about the lack of cache hits from feed readers, figuring out what was going on with NewsBlur (which was doing the right thing), and then fixing that problem, I started looking at which other readers remained unfixed. It turned out that about a dozen people used to read my blog using Tiny Tiny RSS, a PHP-based personal feed reader for the web. I say “used to” because, as of 2017-08-17, TT-RSS is banned from accessing anything on my blog via a ModSecurity rule.
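
The rule itself is nothing fancy — along these lines, with the rule id and message being illustrative rather than the exact ones I deployed, and the match based on the User-Agent string TT-RSS reports:

    SecRule REQUEST_HEADERS:User-Agent "@contains Tiny Tiny RSS" \
        "id:1000001,phase:1,deny,status:403,msg:'TT-RSS is not welcome here'"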

The reason why I went to this extent is not merely technical, which is why the title of this post is the way it is. It all started with me filing requests to support modern HTTP features for feeds, particularly regarding the semantics of permanent redirects, but also about the lack of If-Modified-Since support, which allows a significant reduction in the bandwidth usage of a blog [1]. Now, the first response I got about the permanent redirect request was disappointing, but it was a technical answer, so I provided more information. After that?
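
For context, this is what a well-behaved reader does on every poll — remember the validators from the previous fetch and send them back, so an unchanged feed costs a 304 and a handful of bytes instead of a full download. A sketch in Python:

    import requests

    def poll(url, state):
        """Fetch a feed, reusing the ETag/Last-Modified from the last poll."""
        headers = {'Accept-Encoding': 'gzip'}
        if state.get('etag'):
            headers['If-None-Match'] = state['etag']
        if state.get('last_modified'):
            headers['If-Modified-Since'] = state['last_modified']
        response = requests.get(url, headers=headers, timeout=30)
        if response.status_code == 304:
            return None  # nothing changed, nothing to parse
        state['etag'] = response.headers.get('ETag')
        state['last_modified'] = response.headers.get('Last-Modified')
        return response.text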

After that the responses stopped being focused on the technical issues, and rather appeared to be – and that’s not terribly surprising in FLOSS of course – “not my problem”. Except, the answers also came from someone with a Pepe the Frog avatar [2]. And this is August of 2017, when America has shown it has a real Nazi problem, and willingly associating yourself with the alt-right effectively makes you a Nazi sympathizer. The tone of the further answers also shows that this is no mistake or misunderstanding.

You can read the two bugs here: and . Trigger warning: extreme right and ableist views ahead.

While I try not to spend too much time on political activism on my blog, there is a difference between debating whether universal basic income (or even universal health care) is a right or not, and arguing for ethnic cleansing and the death of part of a population. So no, there is no way I’ll refrain from commenting or throwing a light on this kind of toxic behaviour from developers in the Free Software community. Particularly when they are not even keeping these beliefs to themselves but effectively boasting about them by using a loaded avatar on their official support forum.

So what can you do about this? If you get to read this post, and have subscribed to my blog through TT-RSS, you now know why you don’t get any updates from it. I would suggest you look for a new feed reader. As usual I will suggest NewsBlur, since its implementation is the best one out there. You can also set it up yourself, since it’s open source. Not only will you be cutting your support to Nazi sympathisers, but you will also save bandwidth for the web as a whole, by using a reader that actually implements the protocol correctly.

Update (2017-08-06): as pointed out in the comments by candrewswpi, FreshRSS is another option if you don’t want to set up NewsBlur (which admittedly may be a bit heavy). It uses PHP, so it should be easier to migrate to given the same or a similar stack. It supports at least proper caching, but I’m not sure about permanent redirects; that needs testing.

You could of course, as the developers said on those bugs, change the User-Agent string that TT-RSS reports, and keep using it to read my blog. But in that case, you’d be supporting Nazi sympathisers. If you don’t mind doing that, I would ask you a favour: stop reading my blog altogether. And maybe reconsider your life choices.

I’ll repeat here that the reason why I’m going to this extent is that there is a huge difference between the political opinions and debates that we can all have, and supporting Nazis. You don’t have to agree with my political point of view to read my blog, and you don’t have to agree with me to talk with me or be my friend. But if you are a Nazi sympathiser, you can get lost.


  1. you could try to argue that in this day and age there is no point in worrying about bandwidth, but then you don’t get to ever complain about the existence of CDNs, or the fact that AMP and similar tools are “undemocratizing” the web.
  2. Update (2017-08-03): as many people have asked: no, it’s not just any frog or any Pepe that automatically makes you a Nazi sympathiser. But the avatar was not one of the original illustrations, and the attitude of the commenter made it very clear what their “alignment” was. I mean, if they were fans of the original character, they would probably have the funeral scene as their avatar instead.

Modern feed readers

Four years ago, I wrote a musing about the negative effects of the Google Reader shutdown on content publishers. Today I can definitely confirm that some of the problems I foretold have materialized. Indeed, thanks to the fortuitous fact that people have started posting my blog articles to Reddit and Hacker News (neither of which I’m fond of, but let’s leave that aside), I can declare that the vast majority of the bandwidth used by my blog is consumed by bots, and in particular by feed readers. But let me start from the beginning.

This blog is syndicated over a feed, whose URL and format have changed a number of times before, mostly along with the software, or with updates to the software. The most recent change was due to switching from Typo to Hugo, with the feed name changing. I could have kept the original feed name, but it made little sense at the time, so instead I set up permanent redirects from the old URLs to the new ones, as I always do. I say I always do because I keep even the original URLs working, from back when I ran the blog off my home DSL.

Some services and feed reading software know how to deal with permanent redirects correctly, and will (eventually) replace the old feed URL with the new one. For instance, NewsBlur will replace URLs after ten fetches replied to with a permanent redirect (which is sensible, to avoid accepting a redirection that was set up by mistake and soon rolled back, and to avoid data-poisoning attacks). Unfortunately, it seems like this behaviour is extremely rare, and so on August 14th I received over three thousand requests for the old Typo feed URL (admittedly, that was the most persistent URL I used). In addition to that, I also received over 300 requests for the very old Typo /xml/ feeds, 122 of which still pointed at my old dynamic domain, which now points at the normal domain for my blog. This has been the case for almost ten years now, and yet some people still have subscriptions to those URLs! At least one Liferea and one Akregator still point at them.

But while NewsBlur implements sane semantics for handling permanent redirects, it is far from a perfect implementation. In particular, even though I have brought this up many times, NewsBlur is not actually sending If-Modified-Since or If-None-Match headers, which means it will take a full copy of the feed at every request. Even though it does support compressed responses (no fetch of the feed is allowed without compressed responses), NewsBlur is requesting the same URL more than twice an hour, because it seems to have two “sites” described by the same URL. At 50KiB per request, that makes up about 1% of the total bandwidth usage of the blog. To be fair, this is not bad at all, but one has to wonder why they can’t be saving the last-modified or etag values — I guess I could install my own instance of NewsBlur and figure out how to do that myself, but who knows when I would find the time for that.

Update (2017-08-16): Turns out that, as Samuel pointed out in the comments and on Twitter, I wrote something untrue. NewsBlur does send the headers, and supports this correctly. The problem is an Apache bug that causes 304 never to be issued when using If-None-Match and mod_deflate.

To be fair, even rawdog, which I use for Planet Multimedia, does not appear to support these properly. Oh, and speaking of Planet Multimedia: if someone would be interested in providing a more modern template, so that Monty’s pictures don’t take over the page, that would be awesome!

There actually are a few other readers that do support these values correctly, and indeed receive 304 (Not Modified) status codes most of the time. These include Lighting (somebody appears to be still using it!) and at least one yet-another-reader-service, Willreadit.com — the latter appears to be in beta and invite-only; it’s probably the best HTTP implementation I’ve seen for a service with such a rough website. Indeed, the bot landing page points out how it supports If-Modified-Since and gzip-compressed responses. Alas, it does not appear to learn from permanent redirects though, so it’s currently fetching my blog’s feed twice, probably because there are at least two subscribers for it.

Also note that supporting If-Modified-Since is a prerequisite for supporting delta feeds which is an interesting way to save even more bandwidth (although I don’t think this is feasible to do with a static website at all).

At the very least it looks like we won the battle for supporting compressed responses. The only 406 (Not Acceptable) responses for the feed URL are for Fever, which is no longer developed or supported. Even Gwene, which I pointed out was hammering my blog last time I wrote about this, is now content to get the compressed version. Unfortunately it does not appear like my pull request was ever merged, which means it’s likely the repository itself is completely out of sync with what is being run.

So in 2017, what is the current state of the art in feed reader support? NewsBlur has recently added support for JSON Feed, which is not particularly exciting – when I read the post I was reminded, by the screenshot of choice there, where I had heard of Brent Simmons before: Vesper, which is an interesting connection, but I should not go into that now – but at least it shows that Samuel Clay is actually paying attention to the development of the format — even though that development right now appears to be mostly about avoiding XML. Which, to be honest, is not that bad of an idea: since HTML (even HTML5) does not have to be well-formed XML, you need to provide it as CDATA in an XML feed. And the way you do that makes it very easy to implement incorrectly.

Also, as I wrote this post I realized what else I would like from NewsBlur: the ability to subscribe to an OPML feed as a folder. I still subscribe to lots of Planets, even though they seem to have lost their charm, but a few people are aggregated in multiple planets and it would make sense to be able to avoid duplicate posts. If I could tell NewsBlur «I want to subscribe to this Planet, aggregated into a folder», it would be able to tell which feeds are duplicated, and mark the posts as read on all of them at the same time. Note that what I’d like is something different from just importing an OPML description of the planet! I would like the folder to be kept in sync with the OPML feed, so that if new feeds are added, they also get added to the folder, and the same for removed feeds. I should probably file that on GetSatisfaction at some point.

IPv6: WordPress has a long way to go, too

I recently complained about Hugo and the fact that its development seems to have been taken over by SEO types, who changed its defaults to something I’m not comfortable with. In the comments to that post I let it be understood that I’ve been looking into WordPress as an alternative once again.

The reason why I’m looking into WordPress is that I expected it to be a much easier setup, and (assuming I don’t go crazy on the plugins) an easily upgradeable install. Jürgen told me that they now support Markdown, and of course moving to WordPress means I don’t need to keep using Disqus for comments, and I can own my comments again.

The main problem with WordPress, like most PHP apps, is that it requires particular care to be set up securely and safely. Luckily I do have some experience with this kind of work, and I thought I might as well share my experience and my setup once I got it running. But here is where things got complicated, to the point that I’m not sure if I have any chance of getting this working, so I may have to stick with Hugo for much longer than I was hoping. And almost all of the problems fall back to the issue that my battery of test servers is IPv6-only. But don’t let me get ahead of myself.

After installing, configuring, and getting MySQL, Apache, and PHP-FPM to all work together nicely (which believe me was not obvious), I tried to set up the Akismet plugin, which failed. I ignored that, removed it, and then figured out that there is no setting to enable Markdown at all. Turns out it requires a plugin, which, according again to Jürgen, is the Jetpack plugin from WordPress.com itself.

Unfortunately, I couldn’t get the Plugins page to work at all: it would just return an error connecting to WordPress.org. A quick tcpdump told me that WordPress tried connecting to api.wordpress.org, which, despite having eight separate IP addresses to respond from, has no IPv6. Well, that’s okay, I have a TinyProxy running on the host system that I use to fetch distfiles from the “outside world” that is not v6-compatible, so I just need to set this up, right? After all, I was already planning on disallowing direct network access to WordPress, so that’s not a big deal.

Well, the first problem is that the way to set up proxies with WordPress is not documented in the default wp-config.php file. Luckily I found that someone else wrote it down. And that started me in the right direction. Except it was not enough, as the list of plugins and the search page would come up, but the plugins wouldn’t download, with the same error about not being able to establish a (secure) connection to WordPress.org — at first only in the Apache error log; the page itself would show a debug trace if you ask WP to enable debug reporting.
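
For reference, the relevant constants end up in wp-config.php and look something like this — host and port here are just the defaults for a local TinyProxy, not necessarily what you’d use:

    /* Route WordPress's outbound HTTP requests through the local proxy. */
    define('WP_PROXY_HOST', '127.0.0.1');
    define('WP_PROXY_PORT', '8888');
    /* Hosts that should not go through the proxy. */
    define('WP_PROXY_BYPASS_HOSTS', 'localhost');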

Quite a bit of debugging later, with tcpdump and editing the source code, I found the problem: some of the requests sent by WordPress target HTTP endpoints, and others (including the downloads, correctly) target HTTPS endpoints. The HTTP endpoints worked fine, but the HTTPS ones failed. And the reason why they failed is that they tried to connect to TinyProxy with TLS. TinyProxy does not support TLS, because it really just performs the minimal amount of work needed of a proxy. And for what it’s worth, in my setup it only allows local connections, so there is no real value in adding TLS to it.

Turns out this bug is only present if PHP does not have curl support, in which case WordPress falls back to fsockopen. Enabling the curl USE flag for the PHP ebuild was enough to fix the problem, and I reported the bug. I honestly wonder if the Gentoo ebuild should actually force curl on for WordPress, but I don’t want to go there yet.

By the way, I originally didn’t want to say this in this blog post, but since it effectively went viral: I also found out at that point that the reason why I could get a list of plugins is that when the connection to api.wordpress.org over HTTPS fails, the code retries explicitly with HTTP. It’s effectively a silent connection downgrade (you’d still find the warning in the log, but nothing would appear to break at first). This appears to include the “new version check” of WordPress, which makes it an interesting security issue. I reported it via WordPress’s HackerOne page before my tweet went viral — and sorry, I didn’t realize at first just how bad that downgrade was.

So now I have an installation of WordPress, mostly secure, able to look for, fetch and install plugins. Let me install Jetpack to get that famous Markdown support that is, to me, a requirement and a dealbreaker. For some reason (read: because WordPress is more Open Core than Open Source), it requires activating with a WordPress.com account. That should be easy, yes?

Error Details: The Jetpack server was unable to communicate with your site https:// [OMISSIS] [IXR -32300: transport error: http_request_failed cURL error 7: ]

I hid away the URL for my test server, simply to avoid spammers. The website is public, and it has a valid certificate (thank you Let’s Encrypt), and it is not firewalled or requiring any particular IP to connect to. But, IPv6 only. Which makes it easy for me, as it reduces the amount of scanners and spammers while I try it out, and since I have an IPv6-enabled connection at both home and the office, it makes it easy to test with.

Unfortunately it seems like the WordPress infrastructure is not only unreachable from the IPv6 Internet, but it does not even egress onto it either. Which once again shows how infeasible IPv6-only networks still are. Contacting WordPress.com on Twitter ended up with them opening a support ticket for me, and a few exchanges and logs later, they confirmed their infrastructure does not support IPv6 and, as they said, «[they] don’t have an estimate on when [they] may».

Where does this leave me? Well, right now I can’t activate the normal Jetpack plugin, but they have a “development version”, which they assert is fully functional for Markdown, and that should let me keep evaluating WordPress without this particular hurdle. Of course this requires more time, and may end up with me hitting other roadblocks; at this point, I’m not sure. So we’ll see.

Whether I get it to work or not, I will share my configuration files in the near future, because it took me a while to get them set up properly, and some of them are not really explained anywhere. You may end up seeing a new restyle of the blog in the next few months. It’s going to bother me a bit, because I usually prefer to keep the blog the same way for years, but I guess that needs to happen this time around. Also, changing the posts’ paths again means I’ll have to set up another chain of redirects. If I do that, I have a bit of a plan to change the hostname of the blog too.

Why I do not like Hugo

Not even a year ago, I decided to start using Hugo as the engine for this blog. This has mostly served me well, except for the fact that it relies on me having some kind of access to a console, a text editor, and my Bitbucket account, which made posting stuff while travelling a bit harder. So I opted instead for writing drafts, and then staggering their publication — which is why you now see that for the most part I post something once every three days, except for the free ideas.

Hugo was sold to me as a static generator for blogs, and indeed when I looked into it, that’s what it was clearly aiming to be. Sure, the support for arbitrary taxonomies makes it possible to use it in slightly different setups than a blog, but at that point it was seriously focusing on blogs and a few other similar site types. The integration with Disqus was pretty good from the start, as much as I’m not really happy about that choice, and the conversion proceeded mostly smoothly, although it took me weeks to make sure the articles were converted correctly, and even months in I dedicated a few nights a month just to going through the posts to make sure their formatting was right, or through the tags to collapse duplicates.

All in all, while imperfect, it was not as horrible as having to maintain my own Typo fork. Until last week.

I finally decided that maintaining a separate website for the projects is a bad idea. Not just for the style being out of sync between the two, but most importantly because I barely ever update that content, as most of my projects are dead or have their own website already (like Autotools Mythbuster) or they effectively are just using their GitHub repository as the main description, even though it pains me. So the best option I found is to just build the pages I care about into Hugo, particularly using a custom taxonomy for the projects, and be done with it. Except.

Except that to be able to do what I had in mind, I needed a feature that was committed after the version of Hugo I had frozen myself at, so I had to update. Updates with Typo were always extremely painful because of new dependencies, and new features, and changes to the database layout, and all those kinds of problems. Certainly Hugo won’t have these problems! Except it decided it could no longer render the theme I was using, as one function got renamed from RSSlink to RSSLink.

That was an easy fix; a bit less easy at first was figuring out that someone decided that RSS feeds should include, unconditionally, the summary of the article, not the full text, because, and I quote: «This is a somewhat breaking change, but is what most people expect from their RSS feeds.»

I’m not sure who these “most people” are. And I’d say that if you want to change such a default, maybe you want it to be an option, but that does not seem to be Hugo’s style, as I’ll show later. But this is not why I’m angry. I’m angry because changing the RSS from full content to summary is a very clear change of intent.

An RSS feed that has the full article content is an RSS feed for a blog (or other site) that wants to be read. You can use this feed to syndicate on Planets (yes, they still exist), read it on services like Feedly or NewsBlur (no, they did not all disappear with the death of Google Reader), and have it at hand in offline readers on your mobile devices, too.

RSS feeds that only carry summaries are there to drive traffic to a site. And this is where the nasty smell of SEOs and similar titles comes back in from under the door. I totally understand that if one is trying to make a living off their website, they want to be able to bring in traffic, which includes ad views and the like. I have spoken about ads before, and though I recently removed them from the blog altogether for lack of any useful profit, I totally empathise with those who actually can make a profit and want people to see their ads.

But the fact that the tools decide to switch to this mode makes me feel angry and sick, because they are no longer empowering people to make their views visible, they are empowering them to trick users into opening a website, to either get served ads, or (if they are geek enough to use Brave) give bitcoin to the author.

As it turns out, it’s not the only thing that happens to have changed with Hugo, and the changes all sound like someone decided to follow the path of WordPress, which went from a blogging engine to a total solution for managing websites — which is kind of what Typo did when becoming Publify. Except that instead of going for a general website solution, they decided to one-up all of them. From the same release notes of the version that changed the RSS feed defaults:

Hugo 0.20 introduces the powerful and long sought after feature Custom Output Formats; Hugo isn’t just that “static HTML with an added RSS feed” anymore. Say hello to calendars, e-book formats, Google AMP, and JSON search indexes, to name a few ( #2828 ).

Why would you want to build e-book formats and calendars with the same tool you used to build a blog? Sure, if it actually was practical I could possibly make Autotools Mythbuster use this, but I somehow doubt it would have enough support for what I want to get out of the output, so I don’t even want to consider that for now. But all in all, it looks like the target field is being widened a little too much.

Anyway, I went and reverted the changes for my local build of Hugo. I ended up giving up on that, by the way, and just applied a local template replacement instead, since that way I could also re-introduce another fix I needed for the RSS that was not merged upstream (the ability to put the taxonomy data into the feed, so you can use NewsBlur’s intelligence trainer to filter out some of my blog’s content). Of course maintaining a forked copy of the builtin template also means that it can break when I update, if they decide that it should be FeedLink next time around.
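
For the record, the replacement lives in the site layouts (layouts/rss.xml overrides the embedded template, if I recall correctly), and the two changes boil down to something like this inside each item — from memory, so paths and variable names may differ between Hugo versions:

    <item>
      <title>{{ .Title }}</title>
      <link>{{ .Permalink }}</link>
      <guid>{{ .Permalink }}</guid>
      <!-- Full content instead of the new default summary. -->
      <description>{{ .Content | html }}</description>
      <!-- Taxonomy data, so NewsBlur's trainer can filter on it. -->
      {{ range .Params.tags }}<category>{{ . }}</category>{{ end }}
    </item>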

Then I pushed the new version, including the updated donations page – which is not redirected from the old one yet, I’m still working on that – and stopped paying too much attention to it. I did this (purposefully) in the 3-day break between two posts, so that if something broke I would have time to fix it, but it looked like everything was alright.

Until I noticed that I had somehow flooded Planet Gentoo with a bunch of posts dating back as far as 2006! And someone pinged me on Hangouts for the same reason. So I rolled back to the old version (which did not solve the flooding, unfortunately), regenerated, and started digging into what happened.

In the version of Hugo I used originally, the RSS feeds were fixed to 15 items. This is a perfectly reasonable default for a blog, as I didn’t go anywhere near it even at the time when I was spending more time blogging than sleeping. But since Hugo is no longer aiming to be a blog engine, that’s not enough. “News” sites (and I use that in quotes, because too many of those are actually either aggregators of other things, or just outright scammers, or fake news sites) would have many more than that per day, so 15 is clearly not a good option for them. So in Hugo 0.19 (the version before the one that changed to use summaries), this change can be found:

Make RSS item limit configurable #3035

This is reasonable. The default is kept to 15, but now you can change it in the configuration file to whatever you want it to be, be it 20, 50, or 100.
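
For reference, that’s a single key in the site configuration — in TOML form it would be something like:

    # config.toml — cap the number of items in the generated feeds.
    rssLimit = 15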

What I did not notice at that point was the change in the following version:

Raise the default rssLimit #3145

That still sounds good, no? It raises the limit. To what?

hugolib: Default rssLimit to unlimited

Of course this is perfectly fine for small websites that have a hundred or two hundred pages. But this blog counts over 2400 articles, written over the span of 12 years (as I have recovered a number of blog posts from previous platforms, and I’m still always looking to see if I can find the backups with the posts of my 15-year-old self). It ended up generating a 12MB RSS feed with every single page published up to then.

So what am I doing now? That, unfortunately, I’m not sure of. This is the kind of bad upgrade path that frustrated the heck out of me with Typo. Unfortunately the only serious alternative I know of is WordPress, and that still does not support Markdown unless you use a strange combination of plugins, and I don’t even want to get into that.

I am tempted to see what’s out there for Ruby-based blog engines, although at this point I’m ready to pay for a solution that runs natively on AWS EC2 [1], to avoid having to manage it myself. I would like to be able to edit posts without needing a console and a git client, and I would like to have an integrated view of the comments, instead of relying on Disqus [2], which at least a few people hate, and I don’t particularly enjoy.

For now, I guess I’ll have to be extra careful if I want to update Hugo. But at least I should be able to avoid breaking things this easily again, as I’ll be checking the output before and after the change.

Update (2017-08-14): it looks like this blog post got picked up by the Internet’s own peanut gallery, which not only doesn’t seem to understand why I’m complaining here (how many SEOs are there?), but also appears to have started suggesting, with more or less care, a number of other options. I say “more or less” because some are repeats or aimed at solving different problems than mine. There are a few interesting ones I may spend some time looking into, either this week or the next while on the road.

Since this post is over a month old by now, I have not been idle, and started instead trying out WordPress, with not particularly stunning results either. I am still leaning towards that option though, because WordPress has the best upgrade guarantees (as long as you’re not using all kinds of plugins), and it solves the reliance on Disqus by having proper commenting support.


  1. Update (2017-08-14): I mean running the stack, rather than pushing and storing to S3. If there’s a pre-canned EC2 instance I can install and run, that’ll be about enough for me.
  2. don’t even try suggesting isso. Maybe my comments are not big data, but at 2400+ blog posts, I don’t really want to have something single-threaded that accesses a SQLite file!

Let’s have a serious talk about Ads

I have already expressed my opinion on Internet ads months ago, so I would suggest you start reading from there, as I don’t want to repeat myself on this particular topic. What I want to talk about right now is whether ads actually work at all for things like my blog, or Autotools Mythbuster.

I’ll start by describing the “monetization” options that I use, and then talk a bit about how much they make, look into the costs and then take a short tour of what else I’m still using.

Right now, there are two sources of ads that I use on this blog: Google AdSense and Amazon Native Ads. Autotools Mythbuster only has AdSense, because the Amazon ads don’t really fit it well. On mobile platforms, the only thing you really see is AdSense, as the Native Ads are all the way at the bottom (they don’t do page-level ads as far as I can tell); on desktop you only get to see the Amazon ads.

AdSense pays you both for clicks and for views of the ads on your site, although of course the former gives you significantly higher revenue. Amazon Native Ads only pay you for the items people actually buy after clicking on the ads on your site, as they are part of the Amazon Affiliate program. I started using the Amazon Native Ads as an experiment over April and May, mostly out of curiosity about how they would perform.

The reason why I was curious about the performance is that AdSense, while it used to mostly make decisions on which ads to show based on the content of the page, has been mostly doing remarketing, which appears to creep some people out (I will make no further comments on this), while the idea that Amazon could show ads for content relevant to what I talked about appealed to me. It also turned out to have been an interesting way to find a bug with Amazon’s acapbot, because of course crawlers are hard. As it turns out, the amount of clicks coming from Amazon Native Ads is near zero, and the buying rate is even lower, but it still stacks up fairly well against AdSense.

To understand what I mean I need to give you some numbers, which is something people don’t seem to be very comfortable with in general. Google AdSense, overall, brings in a gross between €3 and €8 a month, with a very few rare cases in which it went all the way up to a staggering €12. Amazon Affiliates (which, as I’ll get to, does not only include Native Ads) varies very widely month after month, and sometimes even reaches $50. Do note that all of this is still pre-tax, so you have to just about cut it in half to estimate the net (it’s actually closer to 35%, but that’s a longer story).

I would say that between the two sources, over the past year I probably got around €200 before tax, so call it €120 net. I would have considered that not bad when I was self-employed, but nowadays I have different expectations, too. Getting the actual numbers for how much the domains cost me per year is a bit complicated, as some of those, including flameeyes.eu, are renewed for a block of years at the same time, but I can give you makefile.am as a point of reference (and yes, that is an alias for Autotools Mythbuster) at €65.19 a year. The two servers (one storing configuration for easy re-deployment, and the other actually being the server you read the blog from) cost me €7.36/month (for the two of them), and the server I use for actually building stuff costs me €49/month. This already exceeds the gross revenue of the two advertising platforms. Oops.
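
Back-of-the-envelope, using the figures above (rounded, and ignoring the multi-year domain renewals):

    hosting = (7.36 + 49) * 12        # the three servers, per year: ~676 EUR
    domains = 65.19                   # makefile.am alone, per year
    total_costs = hosting + domains   # ~741 EUR a year
    ads_gross = 200                   # rough yearly gross from both ad platforms
    # Even before tax, the ads cover well under a third of the infrastructure.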

Of course there is another consideration to make. Leaving aside my personal preferences on lifestyle, and thus where I spend my budget for things like entertainment and food, there is one expense I’m okay with sharing, and that is my current standing donations. My employer not only matches donations, but also makes it very easy to set up a standing donation that gets taken directly at payroll time. Thanks to making this very simple, I have a standing €90/month donation, spread between Diabetes Ireland, the EFF, the Internet Archive, and a couple of others, which I rotate every few months. And then there are the Patreons I subscribe to.

This means that even if I were to just put all the revenue from those ads into donations, it would barely make an impact. Which is why, by the time you read this post, my blog will have no ads left on it (Autotools Mythbuster will keep them for a month or two, just so that the next payment is processed and not left in the system). It would be okay to leave them there even if they make effectively no money, except that they still require paperwork to be filed for taxes, and this is why I have considered using Amazon Native Ads.

As I said, Amazon Native Ads are part of their Affiliate program, and while you can see in the reports how much revenue is coming from ads versus links, the payments, and thus the tax paperwork, are merged with the rest of the affiliate program. And I have been using affiliate links in a number of places, not just on my blog, because in that case there is no drawback: Amazon is not tracking you any more or less than before, and it’s not getting in your way at all. The only places where I actually let Amazon count the impression (view) are where I’m actually reviewing a product (book, game, hardware device), and even that is fairly minimal, and not any different from me just providing the image and a link to it — except I don’t have to deal with the images and the link breakage connected with that.

There is another reason why I am keeping the affiliates: while they require people to actually spend money for me to get anything, they give you a percentage of the selling price of what was sold. Not what you linked to specifically, but what is sold in the session that the user initiated when they clicked on your link. This makes it interesting and ironic when people click on the link to buy Autotools Mythbuster and end up buying Autotools by Calcote instead.

Do I think this experience is universal or generally applicable? I doubt it. My blog does not get that many views anyway, and they went down significantly since I stopped blogging daily, and in particular since I no longer talk about Gentoo that much. I guess part of the problem is that, aside from people looking for particular information finding me on Google, the vast majority of people end up on my blog either because they already read it, or because they follow me on various social media. I have an IFTTT recipe to post most of my entries to Twitter and LinkedIn (not Google+, because there is no way to do that automatically), and I used to have it auto-post new entries that would go to Planet Gentoo on /r/gentoo (much as I hate Reddit myself).

There is also the target audience problem: as most of the people reading this blog are geeks, it is very likely that they have an adblocker installed, and they do not want to see the ads. I think uBlock may even break the affiliate links while it’s at it. They do block things like Skymiles Shopping and similar affiliate aggregators, because “privacy” (though there is not really any privacy problem there).

So at the end of the day, there is no gain for me in keeping ads running, and there is some discomfort for my readers, thus I took them down. If I could, I would love to be able to just run ads for charities only, with no money going to me at all, but reminding people of the importance of donating, even a little bit, to organizations such as the Internet Archive, which saved my bacon multiple times as I fixed the links from this blog to other sites that in the meantime moved without redirects, or just got shut down. But that’s a topic for another day, I think.