Blog Redirects, Azure Style

Last year, I set up an AppEngine app to redirect the old blog’s URLs to the WordPress install. It’s a relatively simple Flask web application, although it turned out to be around 700 lines of code (quite a bit to just serve redirects). While it ran fine for over a year on Google Cloud without me touching anything, and fit into the free tier, I had to move it as part of my divestment from GSuite (which is only vaguely linked to me leaving Google).

I could have just migrated the app to a new consumer account on AppEngine, but I decided to try something different, both to get out of the bubble and to compare other offerings. I went with Azure, Microsoft’s cloud offering. The first impressions were mixed.

The good thing about the redirection app being such a simple Flask application is that nothing ties it to any one provider: all you need is a Python environment and the ability to install the requests module. For the same codebase to work on both AppEngine and Azure, though, one small change seems to be needed. Both providers appear to rely on Gunicorn, but AppEngine appears to look for an object called app in the main module, while Azure looks for it in the application module. This is trivially solved by defining the whole Flask app inside application.py and having the following content in main.py (the command line support is for my own convenience):

#!/usr/bin/env python3

import argparse

from application import app


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--listen_host', action='store', type=str, default='localhost',
        help='Host to listen on.')
    parser.add_argument(
        '--port', action='store', type=int, default=8080,
        help='Port to listen on.')

    args = parser.parse_args()

    app.run(host=args.listen_host, port=args.port, debug=True)
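To make the reason this works a bit more concrete, here is a minimal sketch of how a "module:attribute" target can be resolved. It is a simplification for illustration, not Gunicorn's actual loader, and the module names sketch_application and sketch_main are hypothetical stand-ins for application.py and main.py, registered by hand so the example runs on its own.

```python
import importlib
import sys
import types


def resolve_target(spec):
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, _, attribute = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Stand-in for application.py: it owns the app object.
application_stub = types.ModuleType("sketch_application")
application_stub.app = object()  # placeholder for the Flask app
sys.modules["sketch_application"] = application_stub

# Stand-in for main.py: "from application import app" simply binds the
# same object under the same name in a second module.
main_stub = types.ModuleType("sketch_main")
main_stub.app = application_stub.app
sys.modules["sketch_main"] = main_stub

same_app = (resolve_target("sketch_main:app")
            is resolve_target("sketch_application:app"))
print(same_app)
```

Whichever module a provider happens to look in, it ends up with the very same object, so there is only one place where the routes are defined.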

The next problem I encountered was with the deployment. While there are plenty of guides out there on using different builders to set up the deployment on Azure, I was lazy and went straight for the most clicky one, which uses GitHub Actions to deploy from a (private) GitHub repository straight into Azure, without having to install any command line tools (sweet!). Unfortunately, I hit a snag in the form of what I think is a bug in the Azure GitHub Action template.

You see, the generated workflow for the deployment to Azure is pretty much zipping up the content of the repository, after creating a virtualenv directory to install the requirements defined for it. But while the workflow creates the virtualenv in a directory called env, the default startup script for Azure is looking for it in a directory called antenv. So for me it was failing to start until I changed the workflow to use the latter:

    - name: Install Python dependencies
      run: |
        python3 -m venv antenv
        source antenv/bin/activate
        pip install -r requirements.txt
    - name: Zip the application files
      run: zip -r myapp.zip .

Once that problem was solved, the next issue was to figure out how to set up the app on its original domain and have it serve TLS connections as well. This turned out to be a bit more complicated than expected because I had set up CAA records in my DNS configuration to only allow Let’s Encrypt, but Microsoft uses DigiCert to provide the (short lived) certificates, so until I removed those records the certificate could not be issued (oops).
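For those who have not met CAA before, the check a certificate authority performs boils down to something like the sketch below. It is heavily simplified (no flags, parameters, wildcard handling, or walking up the DNS tree), and the issuer identifiers letsencrypt.org and digicert.com are the ones I believe the two CAs document, so treat them as assumptions and check before copying them into a zone.

```python
def ca_may_issue(caa_records, ca_domain):
    """Return True if the CAA record set lets `ca_domain` issue.

    `caa_records` is a list of (tag, value) pairs taken from the CAA
    record set that applies to the hostname.
    """
    issuers = [value.split(";")[0].strip().lower()
               for tag, value in caa_records
               if tag.lower() == "issue"]
    if not issuers:
        # Without any "issue" property, issuance is not restricted.
        return True
    return ca_domain.lower() in issuers


only_lets_encrypt = [("issue", "letsencrypt.org")]
both_allowed = only_lets_encrypt + [("issue", "digicert.com")]

print(ca_may_issue(only_lets_encrypt, "digicert.com"))
print(ca_may_issue(both_allowed, "digicert.com"))
print(ca_may_issue([], "digicert.com"))
```

The second record set shows an alternative to removing the records altogether: listing both issuers, which should keep the restriction in place for everybody else.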

With everything set up, here are a few more differences between the two services that I noticed.

First of all, Azure does not provide IPv6, although since they use CNAME records this can change at any time in the future. This is not a big deal for me, not only because IPv6 is still dreamland, but also because the redirection points to WordPress, which does not support IPv6 either. Nonetheless, it’s an interesting point to make: despite Microsoft having spent years preparing for IPv6 support, and having even run Teredo tunnels, they do not appear to be ready to provide modern service entrypoints.

Second, and related, it looks like on Azure there’s some form of NAT or reverse proxy in front of the requests sent to Gunicorn: all the logs show the requests coming from 172.16.0.1 (a private IP address). This is the opposite of AppEngine, which shows the actual request IP in the log. It’s not a huge deal, but it does make it a bit annoying to figure out whether someone is trying to attack your hostname. It also makes the lack of IPv6 support funny, given that the application itself would not appear to need to support the new addresses.
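If the frontend passes the original address along in an X-Forwarded-For header (an assumption: I have not verified what Azure actually sends), a small helper along these lines could recover something more useful for the logs. The function name is hypothetical, and since the header is client-controlled the result is only good for logging, never for access control.

```python
import ipaddress


def client_address(remote_addr, forwarded_for=None):
    """Best-effort guess at the real client address, for logging only."""
    if forwarded_for and ipaddress.ip_address(remote_addr).is_private:
        # By convention the leftmost entry is the original client.
        first = forwarded_for.split(",")[0].strip()
        try:
            return str(ipaddress.ip_address(first))
        except ValueError:
            # Not a bare IP address (garbage, or address plus port).
            pass
    return remote_addr


print(ipaddress.ip_address("172.16.0.1").is_private)
print(client_address("172.16.0.1", "2001:db8::1, 10.0.0.2"))
```

The header is only consulted when the peer address is private, so a direct connection from a public address is always logged as-is.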

Speaking of logs, GCP exposes structured request logs. Logging is a pet peeve of mine, and GCP at least appears to make it easier to deal with. In general, structured logs let you filter much more easily for requests that terminated with an error status, which is something that I paid close attention to in the weeks after deploying the original AppEngine redirector: I wanted to make sure my rewriting code didn’t miss some corner cases that users were actually hitting.

I couldn’t figure out how to get a similar level of detail in Azure, but honestly I have not tried too hard right now, because I don’t need that level of control for the moment. Also, while there does seem to be an entry in the portal’s menu to query logs, when I try it out I get a message «Register resource provider ‘Microsoft.Insights’ for this subscription to enable this query» which suggests to me it might be a paid extra.

Speaking of paid, costs are something that needs to be kept in clear sight, particularly given recent news cycles. Azure seems to provide a 12-month free trial, but it also gives you £150 of credit for 14 days, and the two don’t seem to match up properly to me. I’ll update the blog post (or write a new one) with more details after I have some more experience with the system.

I know that someone will comment complaining that I shouldn’t even consider Cloud Computing as a valid option. But honestly, from what I can see, I will likely be running a couple more Cloud applications out there, rather than keep hosting my own websites and running my own servers. It’s just more practical, and it’s a different trade-off between costs and time spent maintaining things, so I’m okay with it going this way. But I also want to make sure I don’t end up locking myself into a single provider, with no chance of migrating.

Blog Redirects & AppEngine

You may remember that when I announced I moved to WordPress, I promised I wouldn’t break any of the old links, particularly as I have kept them working ever since I started running the blog underneath my home office’s desk, on Gentoo/FreeBSD, just shy of thirteen years ago.

This is not a particularly trivial matter, because Typo used at least three different permalink formats (and two different formats for linking to tags and categories), and Hugo used different ones for all of those too. In addition to this, one of the old Planet aggregators I used to be on had a long-standing bug and truncated URLs to a certain length (actually, two certain lengths, as they extended it at some point), and since those ended up indexed by a number of search engines, I ended up maintaining a long mapping between broken URLs and what they were meant to be.
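As an illustration of the truncation problem, here is a sketch of how such a map could be generated from a list of permalinks rather than maintained by hand. The paths and the two lengths are made up for the example (the real aggregator's limits are not something I'm stating here), and this is not how the actual map was built.

```python
def truncation_map(permalinks, lengths):
    """Map truncated forms of each permalink back to the full one."""
    candidates = {}
    for full in permalinks:
        for length in lengths:
            if len(full) > length:
                candidates.setdefault(full[:length], set()).add(full)
    known = set(permalinks)
    # Keep only truncations that point at exactly one permalink, and
    # never shadow a path that is itself a real permalink.
    return {short: next(iter(fulls))
            for short, fulls in candidates.items()
            if len(fulls) == 1 and short not in known}


posts = ["/2009/05/a-long-post-title", "/2009/05/a-long-rant"]
table = truncation_map(posts, lengths=(14, 20))
print(table)
```

With these inputs the 14-character prefix is shared by both posts and gets dropped as ambiguous, which is exactly the kind of entry that still needs a human to decide.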

And once I had such a mapping, I ended up also keeping in it the broken links that other people have created towards my blog. And then when I fixed typos in titles and permalinks I also added all of those to the list. And then, …

Oh yeah, and there is the other thing: the original domain of the blog, which I turned into a redirect to the newer one nearly ten years ago.

The end result is that I have kept holding, for nearly ten years, an unwieldy mod_rewrite configuration for Apache, which also prevented me from migrating to any other web server. Migrating to a new hostname when I migrated to WordPress was always my plan, if nothing else so as not to have to deal with all those rewrites in the same configuration as the webapp itself.

I have kept, until last week, the same abomination of a configuration, running on the same vserver the blog used to run on. But between ending relationships with customers (six years ago when I moved to Dublin), moving the blog out, and removing the website of a friend of mine who decided to run his own WordPress, the amount of work needed to maintain the vserver is no longer commensurate with the results.

While discussing my options with a few colleagues, one idea that came out was to just convert the whole thing to a simple Flask application, and run it somewhere. I ended up wanting to try my employer’s own offerings, and ran it on AppEngine (but the app itself does not use any AppEngine specific API, it’s literally just a Flask app).

This meant having the URL mapping in Python, with a bit of regular expression magic to make sure the URLs from previous blog engines are replaced with WordPress compatible ones. It also means that I can have explicit logic for what to re-process and what not to, which with Apache was not easily done (but still possible).
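To give an idea of what that looks like, here is a minimal sketch. The patterns, paths, and names (EXPLICIT, REWRITES, rewrite) are all hypothetical: the real permalink formats and maps are not listed here, and this is not the redirector's actual code.

```python
import re

# Hand-maintained entries are returned as-is, with no re-processing.
EXPLICIT = {
    "/2009/05/a-long-post": "/2009/05/a-long-post-title/",
}

# (pattern, template) pairs for the old engines' permalink formats.
REWRITES = [
    # /articles/2008/05/17/some-title -> /2008/05/some-title/
    (re.compile(r"^/articles/(\d{4})/(\d{2})/\d{2}/([^/]+)/?$"),
     r"/\1/\2/\3/"),
    # /tags/gentoo or /tag/gentoo -> /tag/gentoo/
    (re.compile(r"^/tags?/([^/]+)/?$"), r"/tag/\1/"),
]


def rewrite(path):
    """Return the new path for an old one, or None if nothing matches."""
    if path in EXPLICIT:
        return EXPLICIT[path]
    for pattern, template in REWRITES:
        match = pattern.match(path)
        if match:
            return match.expand(template)
    return None


print(rewrite("/articles/2008/05/17/some-title"))
print(rewrite("/tags/gentoo"))
```

Checking the explicit map first is what makes the "what not to re-process" part trivial: anything listed there never goes through the regular expressions at all.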

Using an actual programming language instead of Apache configuration also means that I can be a bit smarter about how I process the requests. In particular, before returning the redirect to the requester, I’m now verifying whether the target exists (or rather, whether WordPress returns an OK status for it), and using that to decide whether to return a permanent or temporary redirect. This means that most of the requests to the old URLs will return permanent (308) redirects, and whatever is not found raises a warning I can inspect to see if I should add more entries to the maps.

A very simple UML Sequence Diagram of the redirector, at a high level.
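The decision at the end of that sequence can be sketched as follows. The function names are hypothetical, and 307 as the temporary code is my assumption for the example (only 308 is stated above). The status check is passed in as a callable so the sketch runs without network access; in a real deployment it would presumably be a request made with the requests module, whose exceptions derive from OSError.

```python
import logging
from http import HTTPStatus

log = logging.getLogger("redirector")


def redirect_status(target_url, fetch_status):
    """Pick the redirect code based on whether the target resolves.

    `fetch_status` is any callable that takes a URL and returns the
    HTTP status code the target answers with.
    """
    try:
        found = fetch_status(target_url) == HTTPStatus.OK
    except OSError as error:
        log.warning("could not check %s: %s", target_url, error)
        return HTTPStatus.TEMPORARY_REDIRECT
    if not found:
        log.warning("target not found: %s", target_url)
        return HTTPStatus.TEMPORARY_REDIRECT
    return HTTPStatus.PERMANENT_REDIRECT


def always_ok(url):
    return 200


def always_missing(url):
    return 404


print(int(redirect_status("https://example.com/a/", always_ok)))
print(int(redirect_status("https://example.com/b/", always_missing)))
```

Falling back to a temporary redirect on any doubt means a mistake in the maps is never cached forever by browsers or search engines.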

The best part of all of this is of course that the AppEngine app is effectively always below the free tier quota marker, and as such has an effectively zero cost. And even if it wasn’t, the fact that it’s a simple Flask application with no dependency on AppEngine itself means I can move it to any other hosting option that I can afford.

The code is quite a mess right now, not generic and fairly loose. It has to work around an annoying Flask issue, and as such it’s not in any state for me to open source, yet. My plan is to do so as soon as possible, although it might not include the actual URL maps, for the sake of obscurity.

But what is very clear to me from this is that if you want to have a domain whose only task is to redirect to other (static) addresses, like projects hosted off-site, or affiliate links (two things that I have been doing on my primary domain together with the rest of the site, by the way), then the option of using AppEngine and Flask is actually pretty good. You can get that done in a few hours.