“Planets” in the World of Cloud

As I have written recently, I’m trying to reduce the number of servers I directly manage, as it’s getting annoying and, honestly, out of touch with what my peers are doing right now. I already hired another company to run the blog for me, although I do keep access to all its information at hand and can migrate where needed. I also gave Firebase Hosting a try for my tiny photography page, to see if it would be feasible to replace my homepage with that.

But one of the things that I still definitely need a server for is to keep running Planet Multimedia, despite its tiny userbase and dwindling content (if you work in FLOSS multimedia, and you want to be added to the Planet, drop me an email!)

Right now, the Planet is maintained through rawdog, which is a Python script that works locally with no database. This is great to run on a vserver, but in a world where most of the investments and improvements go into Cloud services, that’s not really viable as an option. And to be honest, the fact that this is still using Python 2 worries me more than a little, particularly when the author insists that Python 3 is a different language (it isn’t).

So, I’m now in the market to replace the Planet Multimedia backend with something that is “Cloud native” — that is, designed to be run on some cloud, and possibly lightweight. I don’t really want to start dealing with Kubernetes, running my own PostgreSQL instances, or setting up Apache. I really would like something that looks more like the redirector I blogged about before, or like the stuff I deal with for a living at work. Because it is 2019.

So sketching this “on paper” very roughly, I expect such software to be along the lines of a single binary with a configuration file, that outputs static files that are served by the web server. Kind of like rawdog, but long-running. Changing the configuration would require restarting the binary, but that’s acceptable. No database access is really needed, as caching can be maintained at the process level, although that would mean that permanent redirects couldn’t be rewritten in the configuration. So maybe some configuration database would help, but it seems most clouds support some simple unstructured data storage that would solve that particular problem.

From experience with work, I would expect the long-running binary to be itself a webapp, so that you can either inspect (read-only) what’s going on, or make changes to the database configuration with it. And it should probably have independent parallel execution of fetchers for the various feeds, that then store the received content into a shared (in-memory only) structure, that is used by the generation routine to produce the output files. It may sound like over-engineering the problem, but that’s a bit of a given for me, nowadays.
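A very rough sketch of that shape, in Python only for brevity: parallel fetchers feed a shared in-memory store, and a generation routine reads a snapshot of it. The names `FeedStore`, `fetch_all` and `render`, the worker count, and the idea of passing the fetcher in as a function are all assumptions made for illustration, not a design.

```python
import concurrent.futures
import threading
from typing import Callable, Dict, Iterable, List


class FeedStore:
    """Shared, in-memory only structure holding the latest entries per feed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[str, List[str]] = {}

    def update(self, feed_url: str, entries: List[str]) -> None:
        with self._lock:
            self._entries[feed_url] = list(entries)

    def snapshot(self) -> Dict[str, List[str]]:
        # Copy under the lock so the generator never sees a half-updated feed.
        with self._lock:
            return {url: list(items) for url, items in self._entries.items()}


def fetch_all(feeds: Iterable[str],
              fetcher: Callable[[str], List[str]],
              store: FeedStore) -> None:
    """Run one fetcher per feed in parallel; a failing feed does not stop the rest."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(fetcher, url): url for url in feeds}
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                store.update(url, future.result())
            except Exception:
                # Keep whatever was stored previously for this feed.
                continue


def render(store: FeedStore) -> str:
    """Generation routine: turn the current snapshot into static output."""
    lines = []
    for url, entries in sorted(store.snapshot().items()):
        for title in entries:
            lines.append(f"{title} ({url})")
    return "\n".join(lines)
```

A real fetcher would download and parse each feed; here it is left as a parameter so the sketch stays self-contained.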

To be fair, the part that makes me most uneasy is authentication, but Identity-Aware Proxy might be a good solution for this. I have not looked into that but used something similar at work.

I’m explicitly ignoring the serving-side problem: serving static files is a problem that has mostly been solved, and I think all cloud providers have some service that allows you to do that.

I’m not sure if I will be able to work more on this, rather than just providing a sketched-out idea. If anyone knows of something like this already, or feels like having a go at building this, I’d be happy to help (employer-permitting of course). Otherwise, if I find some time to build stuff like this, I’ll try to get it released as open-source, to build upon.

Free Idea: structured access logs for Apache HTTPD

This post is part of a series of free ideas that I’m posting on my blog in the hope that someone with more time can implement. It’s effectively a very sketched proposal that comes with no design attached, but if you have time you would like to spend learning something new, but no idea what to do, it may be a good fit for you.

I have been commenting on Twitter a bit about the lack of decent tooling to deal with Apache HTTPD’s Combined Logging Format (inherited from NCSA). For those who do not know about it, this is the format used by standard access_log files, which include information about requests, including the source IP, the time, the requested path, the status code and the User-Agent used.

These logs are useful for debugging but are also consumed by tools such as AWStats to produce useful statistics about the request patterns of a website. I used these extensively when writing my ModSecurity rulesets, and I still keep an eye out on them for instance to report wasteful feed readers.

The files are simple text files, and that makes it easy to act on them: you can use tail and grep, and logrotate needs no special code beside moving the file and reloading Apache to have it re-open the paths. The format does, however, make it hard to query for particular entries in fields, such as to get the list of User-Agent strings present in a log. Some of the suggestions I got over Twitter to solve this were to use awk, but as it happens, these logs are not actually parseable with a straightforward field separation.
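To make the field separation problem concrete, here is a small Python sketch. The regular expression is my own approximation of the Combined format, and it assumes that quotes inside the quoted fields are escaped with a backslash; treat it as an illustration rather than a complete parser.

```python
import re

# Approximation of the Combined Log Format. Quoted fields may contain
# spaces, and are assumed to escape embedded quotes as \".
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '
    r'"(?P<agent>(?:[^"\\]|\\.)*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


SAMPLE = (
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /feed.xml HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) Feed Reader"'
)

fields = parse_line(SAMPLE)
naive_agent = SAMPLE.split()[11]  # what whitespace splitting would give you
```

Splitting on whitespace cuts the User-Agent at its first space, and the timestamp and request line also spread over several "fields", which is why awk with the default separator does not get you very far.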

Failing to find a good set of tools to handle these formats directly, I have been complaining that we should probably start moving away from simple text files into more structured log formats. Indeed, I know that there used to be at least some support for logging directly to MySQL and other relational databases, and that there is more complicated machinery often used by companies and startups that process these access logs into analysis software and so on. But all of these tend to be high overhead, much more than what I or someone else with a small personal blog would care about implementing.

Instead I think it’s time to start using structured file logs. A few people including thresh from VideoLAN suggested using JSON to write the log files. This is not a terrible idea, as the format is at least well understood and easy to interface with most other software, but honestly I would prefer something with an actual structure, a schema that can be followed. Of course I don’t mean XML, and I would rather suggest having a standardized schema for proto3. Part of that I guess is because I’m used to using this at work, but also because I like the idea of being able to just define my schema and have it generate the code to parse the messages.

Unfortunately there is currently no support or library to access a sequence of protocol buffer messages. Using a single message with repeated sub-messages would work, but it is not append-friendly, so there is no way to just keep writing this to a file, or to truncate it and resume writing to it, which is a property needed for a proper structured log format to actually fit in the space previously occupied by text formats. This is something I don’t usually have to deal with at work, but I would assume that a simple LV (Length-Value) or LVC (Length-Value-Checksum) encoding would be okay to solve this problem.
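As a sketch of what an LVC framing could look like, here it is in Python with the payload treated as opaque bytes (in practice, a serialized message). The 4-byte little-endian length, the CRC-32 checksum, and the function names are all assumptions for illustration; none of this is an existing format.

```python
import struct
import zlib
from typing import BinaryIO, Iterator

LENGTH = struct.Struct("<I")    # 4-byte little-endian payload length
CHECKSUM = struct.Struct("<I")  # 4-byte CRC-32 of the payload


def append_record(stream: BinaryIO, payload: bytes) -> None:
    """Append one Length-Value-Checksum record to an open binary stream."""
    stream.write(LENGTH.pack(len(payload)))
    stream.write(payload)
    stream.write(CHECKSUM.pack(zlib.crc32(payload)))


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield payloads until the end, a truncated tail, or a bad checksum."""
    while True:
        header = stream.read(LENGTH.size)
        if len(header) < LENGTH.size:
            return
        (length,) = LENGTH.unpack(header)
        payload = stream.read(length)
        trailer = stream.read(CHECKSUM.size)
        if len(payload) < length or len(trailer) < CHECKSUM.size:
            return  # the last record was cut short, e.g. by a crash
        (expected,) = CHECKSUM.unpack(trailer)
        if zlib.crc32(payload) != expected:
            return  # corrupted record: stop rather than guess
        yield payload
```

Because every record carries its own length and checksum, a writer can open the file in append mode and keep going, and a reader can detect a half-written tail instead of misparsing it.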

But what about other properties of the current format? Well, the obvious answer is that, assuming your structured log contains at least as much information (but possibly more) as the current log, you can always have tools that convert on the fly to the old format. This would for instance allow having a special tail-like command and a grep-like command that provide compatibility with the way the files are currently looked at manually by your friendly sysadmin.

Having more structured information would also allow easier, or deeper analysis of the logs. For instance you could log the full set of headers (like ModSecurity does) instead of just the referrer and User-Agent. And allow for customizing the output on the conversion side rather than lose the details when writing.

Of course this is just one possible way to solve this problem, and just because I would prefer working with technologies that I’m already friendly with it does not mean I wouldn’t take another format that is similarly low-dependency and easy to deal with. I’m just thinking that the change-averse solution of not changing anything and keeping logs in text format may be counterproductive in this situation.

Free Idea: a filtering HTTP proxy for securing web applications

This post is part of a series of free ideas that I’m posting on my blog in the hope that someone with more time can implement. It’s effectively a very sketched proposal that comes with no design attached, but if you have time you would like to spend learning something new, but no idea what to do, it may be a good fit for you.

Going back to a previous topic I wrote about, and the fact that I’m trying to set up a secure WordPress instance, I would like to throw out another idea I won’t have time to implement myself any time soon.

When running complex web applications, such as WordPress, defense-in-depth is a good security practice. This means that in addition to locking down what the code itself can do to the state of the local machine, it also makes sense to limit what it can do to the external state and the Internet at large. Indeed, even if you cannot drop a shell on a remote server, there is value (negative for the world, positive for the attacker) to at least being able to use it for DDoS (e.g. through an amplification attack).

With that in mind, if your app does not require network at all, or the network dependency can be sacrificed (like I did for Typo), just blocking the user from making outgoing connections with iptables would be enough. The --uid-owner option makes it very easy to figure out who’s trying to open new connections, and thus stop a single user transmitting unwanted traffic. Unfortunately, this does not always work because sometimes the application really needs network support. In the case of WordPress, there is a definite need to contact the WordPress servers, both to install plugins and to check if it should self-update.

You could try to limit which hosts the user can access. But that’s not easy to implement right either. Take WordPress as an example still: if you wanted to limit access to the WordPress infrastructure, you would effectively have to allow it to access *.wordpress.org, and this can’t really be done in iptables, as far as I know, since those connections go to IP literal addresses. You could rely on FcRDNS to verify the connections, but that can be slow, and if you happen to have access to poison the DNS cache of the server, you’re effectively in control of this kind of ACL. I ignored the option of just using “standard” reverse DNS resolution, because in that case you don’t even need to poison DNS, you can just decide what your IP will reverse-resolve to.
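For reference, the FcRDNS check amounts to the following, sketched in Python. The function name and the injectable `reverse` and `forward` parameters are my own choices so the logic can be exercised without touching the network; by default they use the standard resolver calls. The comparison is a plain string match, which is an assumption that may not hold for every textual form of an IPv6 address.

```python
import socket


def fcrdns_name(ip, reverse=socket.gethostbyaddr, forward=socket.getaddrinfo):
    """Return the reverse (PTR) name of ip only if it resolves forward to ip.

    A name that the owner of the address simply made up in their reverse
    zone fails the forward step, which is the whole point of the check.
    """
    try:
        name, _aliases, _addresses = reverse(ip)
        infos = forward(name, None)
    except OSError:
        # Covers socket.herror and socket.gaierror: no PTR, or no forward record.
        return None
    forward_addresses = {info[4][0] for info in infos}
    return name if ip in forward_addresses else None
```

Note that this is two lookups per connection, which is where the slowness comes from, and both answers are only as trustworthy as the resolver the server uses.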

So what you need to do is actually filter at the connection-request level, which is what proxies are designed for. I’ll be assuming we want to have a non-terminating proxy (because terminating proxies are hard), but even in that case you can now know what (forward) address you want to connect to, and in that case *.wordpress.org becomes a valid ACL to use. And this is something you can actually do relatively easily with Squid, for instance. Indeed, this is the whole point of tools such as ufdbguard (which I used to maintain for Gentoo), and the ICP protocol. But Squid is particularly designed as a caching proxy, it’s not lightweight at all, and it can easily become a liability to have it in your server stack.
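With a non-terminating proxy, the client names its destination in the CONNECT request line, so the hostname is available before any connection is opened. A minimal Python sketch of extracting it follows; the function name is hypothetical and the parsing is deliberately strict.

```python
def connect_target(request_line):
    """Parse 'CONNECT host:port HTTP/1.1' into (host, port), or None."""
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0].upper() != "CONNECT":
        return None
    host, separator, port = parts[1].rpartition(":")
    host = host.strip("[]").lower()  # IPv6 literals arrive in brackets
    if not separator or not host or not port.isdigit():
        return None
    return host, int(port)
```

The proxy never sees inside the TLS stream; it only decides, based on this pair, whether to open the tunnel at all.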

Up to now, what I have used to reduce the attack surface of my webapps is to set them behind a tinyproxy, which does not really allow for per-connection ACLs. This only provides isolation against random non-proxied connections, but it’s a starting point. And here is where I want to provide a free idea for anyone who has the time and would like to provide better security tools for server-side defense-in-depth.

A server-side proxy for this kind of security usage would have to be able to provide ACLs, with both positive and negative lists. You may want to provide all access to *.wordpress.org, but at the same time block all non-TLS-encrypted traffic, to avoid the possibility of downgrade (given that WordPress has a silent downgrade for requests to api.wordpress.org, that I talked about before).
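As an illustration of positive and negative lists, here is a small Python sketch using shell-style patterns. The class name, the deny-wins ordering, and the shortcut of treating "TLS only" as "port 443 only" are all assumptions of mine; a real proxy would more likely look at the request method and scheme.

```python
from fnmatch import fnmatchcase


class HostAcl:
    """Allow/deny lists of shell-style host patterns, with deny taking priority."""

    def __init__(self, allow, deny=(), tls_only=True):
        self.allow = [pattern.lower() for pattern in allow]
        self.deny = [pattern.lower() for pattern in deny]
        self.tls_only = tls_only

    def permits(self, host, port):
        host = host.lower().rstrip(".")
        if self.tls_only and port != 443:
            # Crude stand-in for "no non-TLS traffic": only tunnel to 443.
            return False
        if any(fnmatchcase(host, pattern) for pattern in self.deny):
            return False
        return any(fnmatchcase(host, pattern) for pattern in self.allow)


acl = HostAcl(allow=["*.wordpress.org"], deny=["downloads.wordpress.org"])
```

With this, `acl.permits("api.wordpress.org", 443)` is allowed, while the same host on port 80, the explicitly denied host, and anything outside the pattern are all refused. The denied hostname is only there to show the negative list.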

Even better, such a proxy should have the ability to distinguish the ACLs based on which user (i.e. which webapp) is making the request. The obvious way would be to provide separate usernames to authenticate to the proxy — which again Squid can do, but it’s designed for clients for which the validation of username and password is actually important. Indeed, for this target usage, I would ignore the password altogether, and just use the user at face value, since the connection should always only be local. I would be even happier if instead of pseudo-authenticating to the proxy, the proxy could figure out which (local) user the connection came from, by inspecting the TCP socket connection, kind of like querying the ident protocol used to work for IRC.

So to summarise, what I would like to have is an HTTP(S) proxy that focuses on securing server-side web applications. It does not have to support TLS transport (because it should only accept local connections), nor should it be a terminating proxy. It should support ACLs that allow/deny access to a subset of hosts, possibly per-user, without needing a user database of any sort, and even better if it can tell by itself which user the connection came from. I’m more than happy if someone tells me this already exists, or if not, someone starts writing this… thank you!

IPv6 Horror Story: Telegram

This is starting to become an interesting series of blog posts, my ranting about badly implemented IPv6. Some people may suggest that this is because I have a problem with IPv6, but the reality is that I like IPv6 and I want more of it, and properly implemented.

IPv6 enthusiasts often forget that implementing IPv6 is not just a matter of having a routable address, and being able to connect to the IPv6 network. Indeed, the network layer is just one of the many layers that need to be updated for your applications to be IPv6-compatible, and that is not even going into the problem of optimizing for IPv6.

Screenshot of Telegram from Android declaring my IP as

To the right you can see the screenshot of Telegram messenger on my personal Android phone – I use Telegram to keep in touch with just a handful of people, of course – once I log in on Telegram from my laptop on the web version.

The text of the message I receive on Android at that login is

Diego Elio,

We detected a login into your account from a new device on 06/08/2017 at 10:46:58 UTC.

Device: Web
Location: Unknown (IP =

If this wasn’t you, you can go to Settings – Privacy and Security – Sessions and terminate that session.

If you think that somebody logged in to your account against your will, you can enable two-step verification in Privacy and Security settings.

The Telegram Team

This may not sound as obviously wrong to you as it did to me. Never mind the fact that the IP is unknown. is a private network address, as all the IPv4 addresses starting with 10. Of course when I noticed that the first thing I did was checking whether web.telegram.org published a AAAA (IPv6) record, and of course it does.

A quick check using TunnelBear (that does not support IPv6) also shows that the message is correct when logging in from an IPv4 connection instead, showing the public egress IP used by the connection.

I reported this problem to Telegram well over a month ago and it’s still unfixed. I don’t think this counts as a huge security gap, as the message still provides some level of information on a new login, but it does remove the ability to ensure the connection is coming directly from the egress you expect. Say for instance that your connection is being VPN’d to an adversarial network: you may notice that your connecting IP appears to be from a different country than the one you’re in, and you know there’s a problem. When using IPv6, Telegram is removing this validation, because not only do they not show you a location, but they give you a clearly wrong IPv4 in place of the actual IPv6.

Since I have no insight into what the Telegram internal infrastructure looks like, I cannot say exactly which part of the pipeline is failing for them. I would expect that this is not a problem with NAT64, because that does not help to receive inbound connections. I find it more likely that it’s either the web frontend looking for an IPv4, not finding it, and having some sort of default (maybe its own IP address) to pass along to the next service, or alternatively the service that is meant to send the message (or possibly the one that tries to geocode the address) mis-parsing the IPv6 into an IPv4 and discarding the right address.

Whatever the problem on their side is, it makes a very good example of how IPv6 is not as straightforward to implement as many enthusiasts who have never looked into converting a complicated app stack would think. Documenting this here is my way to remind people that these things are more complicated than they appear, and that it will still take a significant amount of time to fix all these problems, and until then it’s unreasonable to expect consumers to deal with IPv6-only networks.

Screenshot of Telegram from Android declaring three open sessions for

Update (2017-08-07): a helpful (for once) comment on a reddit thread over this pointed me at the fact that even the Active Sessions page is also showing addresses in the private ten-dot space — as you can see now on the left here. It also shows that my Android session is coming (correctly) from my ISP’s CGNAT endpoint IPv4.

What this tells us is that not only is there a problem with the geocoding of an IP address that Telegram notifies the user of on connection, but as long as the connection is proxied over IPv6, the user is not able to tell whether the connection was hijacked or not. If the source IP address and location was important enough to add in the first place, it feels like it should be important enough to get right. Although a fix may be just waiting to be pushed, as the commenter on Reddit pointed out they could see the right IPv6 addresses nowadays; I can’t.

Also, I can now discard the theory that they may be parsing an IPv6 as if it was IPv4 and mis-rendering it. Of the three connections listed, one is from my office that uses an entirely different IPv6 prefix from the one at my apartment. So there.

The fact that my Android session appears on v4, despite my phone being connected to a working IPv6 network, also tells me that the Telegram endpoint for Android (that is, their main endpoint) is not actually compatible with IPv6, which makes it all the more surprising that they would actually publish a record for it for their web endpoint, and show off the amount of missing details in their implementation.

I would also close the update by taking a look at one of the comments from that reddit thread:

In theory, could be that telegram is actually accepting ipv6 on their internet facing port and translating it to v4 inside the network. I’d change my opinion but given the lack of information like the IP address you had on your machine or the range you use for your network, I really don’t see enough to say this is an issue with ipv6 implementation.

Ignore the part where the commenter thinks the fact I did not want to disclose my network setup is at all relevant, which appears to imply I’m seeing my own IP space reflected there (I’m not) like it was something that was even possible, and focus on the last part. «I really don’t see enough to say this is an issue with ipv6 implementation.»

This is an issue with Telegram’s IPv6 implementation. Whether they are doing network translation at the edge and then missing an X-Forwarded-For, or one of their systems defaults to a local IP if it can’t parse the remote one (as it’s in a different format), does not really matter. An IPv6 implementation does not stop at the TCP level; it requires changes at the application level just the same. I have unfortunately met multiple network people who think that as long as an app connects and serves data, IPv6 is supported. For a user, this is complete bollocks, because they want to be able to use the app, use the app correctly, and use the app safely. Right now that does not appear to be the case for Telegram.
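To show what "changes at the application level" means in the smallest possible case, here is a Python sketch of handling a remote address of either family without substituting a default when parsing fails. The function names are hypothetical, and this says nothing about how Telegram's services are actually written.

```python
import ipaddress


def login_address(remote):
    """Parse a remote address of either family; never fall back to a default."""
    try:
        address = ipaddress.ip_address(remote)
    except ValueError:
        return None
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d), common on dual-stack sockets.
    if isinstance(address, ipaddress.IPv6Address) and address.ipv4_mapped:
        address = address.ipv4_mapped
    return address


def describe(remote):
    """Build the text a login notification could show for the address."""
    address = login_address(remote)
    if address is None:
        return "unknown address"
    scope = "private" if address.is_private else "public"
    return f"IPv{address.version} {address} ({scope})"
```

The important part is the `None` branch: showing "unknown address" is honest, while silently replacing the value with some internal address is what produces a misleading notification.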

More on IPv6 feasibility

I didn’t think I would have to go back to the topic of IPv6, particularly after my last rant on the topic. But of course it’s the kind of topic that lends itself to harsh discussions over Twitter, so here I am back again (sigh).

As a possibly usual heads-up, and to make sure people understand where I’m coming from: it is correct that I do not have a network background, and I do not know all the details of IPv6 and the related protocols, but I do know quite a bit about it, have been using it for years, and so my opinion is not one of the lazy sysadmin that sees a task to be done and wants to say there’s no point. Among other things, because I do not like that class of sysadmins any more (I used to). I also seem to have given some people the impression that I am a hater of IPv6. That’s not me, that’s Todd. I have been using IPv6 for a long time, I have IPv6 at home, I set up IPv6 tunnels back in the days of having my own office and contracting out, and I have a number of IPv6-only services (including the Tinderbox).

So with all this on the table, why am I complaining about IPv6 so much? The main reason is that, like I said in the other post, geeks all over the place appear to think that IPv6 is great right now and you can throw away everything else and be done with it right this moment. And I disagree. I think there’s a long way to go, and I also think that this particular attitude will make the way even longer.

I have already covered in the linked post the particular issue of IPv6 having originally been designed for globally identifiable, static IPv6 addresses, and the fact that there have been at least two major RFCs to work around this particular problem. If you have missed that, please go back to the post and read it there because I won’t repeat myself here.

I want instead to focus on why I think IPv6 only is currently infeasible for your average end user, and why NAT (including carrier-grade NAT) is not going away any year now.

First of all, let’s define what an average end user is, because that is often lost on geeks. An average end user does not care what a router does, they barely care what a router is, and a good chunk of them probably still just call it a modem, as their only interest is referring to “the device that the ISP gives you to connect to the Internet”. An average user does not care what an IP address is, nor cares how DNS resolution happens. And the reason for all of this is that the end user cares about what they are trying to do. And what they are trying to do is browse the Internet, whether it is the Web as a whole, Facebook, Twitter, YouTube or whatever else. They read and write their mail, they watch Netflix, NowTV and HBO GO. They play games they buy on Steam or EA Origin. They may or may not have a job, and if they do they may or may not care to use a VPN to connect to whatever internal resources they need.

I won’t make any stupid or sexist stereotype example for this, because I have combined my whole family and some of my friends in that definition, and they are all different. They all don’t care about IPv6, IPv4, DSL technologies and the like. They just want an Internet connection, and one that works and is fast. And with “that works” I mean “where they can reach the services they need to complete their task, whichever that task is”.

Right now that is not an IPv6 only network. It may be in the future, but for a number of reasons I won’t hold my breath that this is going to happen in the next 10 years, despite the increasing pressure and the increasing growth of IPv6 deployment to end users.

The reason why I say this is that right now, there are plenty of services that can only be reached over IPv4, some of which are “critical” (for some definition of critical of course) to end users, such as Steam. Since the Steam servers are not available over IPv6, the only way you can reach them is either through IPv4 (which will involve some NAT) or NAT64. While the speed of the latter, at least on closed-source proprietary hardware solutions, is getting good enough to be usable, I don’t expect it to be widely deployed any time now, as it has the drawback of not working with IP literals. We all hate IP literals, but if you think no company ever issues their VPN instructions with an IP literal in them, you are probably going to be disappointed once you ask around.
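The IP literal limitation follows from how NAT64 is usually paired with DNS64: the resolver synthesizes an IPv6 address by embedding the IPv4 one into a translation prefix, and the well-known prefix for that is 64:ff9b::/96. A short Python sketch of that embedding follows; the function name is my own, and a given network may use a different prefix.

```python
import ipaddress

# Well-known NAT64 prefix; operators can also use a network-specific one.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize(ipv4_text, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    ipv4 = ipaddress.IPv4Address(ipv4_text)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(ipv4))


synthetic = synthesize("192.0.2.33")
```

This mapping happens when a hostname is resolved. An application that is handed a bare IPv4 literal never asks DNS anything, so on a v6-only network it has nothing it can connect to unless something on the client does the same embedding for it.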

There could be an interesting post about this level of “security by obscurity”, but I’ll leave that for later.

No ISP wants to receive calls from their customers that access to a given service is not working for them, particularly when you’re talking about end users that do not want to care about tcpdump and traceroute, and customer support that wouldn’t know how to start debugging that NowTV will send a request to an IPv4 address (literal) before opening the stream, and then continue the streaming over IPv4. Or that Netflix refuses to play any stream if the DNS resolution happens over IPv4 and the stream tries to connect over IPv6.

Which I thought Netflix finally fixed until…

Now, to be fair, it is true that if you’re using an IPv6 tunnel you are indeed proxying. Before I had DS-Lite at home I was using TunnelBroker and it appeared like I was connecting from Scotland rather than Ireland, and so for a while I unintentionally (but gladly) sidestepped country restrictions. But this also happened a few times on DS-Lite, simply because the GeoIP and WhoIs records didn’t match between the CGNAT and the IPv6 blocks. I can tell you it’s not fun to debug.

The end result is that most consumer ISPs will choose to provide a service in such a way that their users feel the least possible inconvenience. Right now that means DS-Lite, which involves a carrier-grade NAT, which is not great, as it is not cheap to run, and it still can cause problems, particularly when users use Torrent or P2P heavily, in which case they can very quickly exhaust the 200-port forwarding blocks that are allocated for CGNAT. Of course DS-Lite also takes away your public IPv4, which is why I heard a number of geeks complaining loudly about DS-Lite as a deployment option.

Now there is another particular end user, in addition to geeks, that may care about IP addresses: gamers. In particular online gamers (rather than, say, Fallout 4 or Skyrim fans like me). The reason for that is that most of the online games use some level of P2P traffic, and so require you to have a way to receive inbound packets. While it is technically possible to set up IGD-based redirection all the way from the CGNAT address to your PC or console, I do not know how many ISPs implement that correctly. Also, NAT in general introduces risks for latency, and requires more resources on the passing routers, and that is indeed a topic that is close to the heart of gamers. Of course, gamers are not your average end user.

An aside: back in the early days of ADSL in Italy, it was a gaming website, building its own ISP infrastructure, that first introduced Fastpath to the country. Other ISPs did indeed follow, but NGI (the ISP noted above) stayed for a long while a niche ISP focused on the need of gamers over other concerns, including price.

There is one caveat that I have not described yet, but I really should, because it’s one of the first objections I receive every time I speak about the infeasibility of IPv6 only end user connections: the mobile world. T-Mobile in the US, in particular, is known for having deployed IPv6 only 3G/4G mobile networks. There is a big asterisk to put here, though. In the US, and in Italy, and a number of other countries to my knowledge, mobile networks have classically been CGNAT before being v6-only, and with a large amount of filtering in what you can actually connect to, even without considering tethering – this is not always the case for specialised contracts that allow tethering or for “mobile DSL” as they marketed it in Italy back in the days – and as such, most of the problems you could face with VPN, v4 literals and other similar limitations of v6-only with NAT64 (or proxies) already applied.

Up to now I have described a number of complexities related to how end users (generalising) don’t care about IPv6. But ISPs do, or they wouldn’t be deploying DS-Lite either. And so do a number of other “actors” in the world. As Thomas pointed out over Twitter, not having to bother with TCP keepalive for making sure a certain connection is being tracked by a router makes mobile devices faster and use less power, as they don’t have to wake up for no reason. Certain ISPs are also facing problems with the scarcity of IPv4 blocks, particularly as they grow. And of course everybody involved in the industry hates pushing around the technical debt of the legacy IPv4 year after year.

So why are we not there yet? In my opinion and experience, it is because the technical debt, albeit massive, is spread around too much: ISPs, application developers, server/service developers, hosting companies, network operators, etc. Very few of them feel enough pain from v4 being around that they want to push hard for IPv6.

A group of companies that did feel a significant amount of that pain organized the World IPv6 Day. In 2011. That’s six years ago. Why was this even needed? The answer is that there were too many unknowns. Because of the way IPv6 is deployed in dual-stack configurations, and the fact that a lot of systems have to deal with addresses, it seems obvious that there is a need to try things out. And while opt-ins are useful, they clearly didn’t stress test enough of the usage surface of end users. Indeed, I stumbled across one such problem back then: when my hosting provider (which was boasting IPv6 readiness) sent me to their bank infrastructure to make a payment, the IP addresses of the two requests didn’t match, and the payment session failed. Interactions are always hard.

A year after the test day, the “Launch” happened, normalizing the idea that services should be available over IPv6. Even though that is the case, it took quite a while longer for many services to be normally available over IPv6, and I think that, despite Microsoft being one of the biggest proponents and pushers of IPv6, its update servers only started providing v6 support by default in the last year or so. Things improved significantly over the past five years, and thanks to the forced push of mobile providers such as T-Mobile, it’s a minority of the connections of my mobile phones that still connect to the v4 world, though there are still too many of them to ignore.

What are the excuses for those? Once upon a time, the answer was “nobody is using IPv6, so we’re not bothering supporting it”. This is getting less and less valid. You can see the Google IPv6 statistics that show an exponential growth of connections coming from IPv6 addresses. My gut feeling is that the wider acceptance of DS-Lite as a bridge solution is driving that – full disclosure: I work for Google, but I have no visibility into that information, so I’m only guessing this out of personal experience and experience gathered before I joined the company – and it’s making that particular excuse pointless.

Unfortunately, there are still “good” excuses. Or at least reasons that are hard to argue with. Sometimes, you cannot enable IPv6 for your web service, even though you have done all your work, because of dependencies that you do not control, for instance external support services such as the payment system in the OVH example above. Sometimes, the problem is to be found in another piece of infrastructure that your service shares with others and that needs to be adapted, as it may have code expecting a valid IPv4 address at some particular place, and an IPv6 would make it choke, say in some log analysis pipeline. Or you may rely on hardware for the network layer that just still does not understand IPv6, and you don’t want to upgrade because you still have not found enough of an upside for you to make the switch.

Or you may be using a hosting provider that insists that giving you a single routable IPv6 address is giving you a “/64” (it isn’t: they are giving you a single address in a /64 they control). Any reference to a certain German ISP I had to deal with in the past is not coincidental at all.
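
To make the difference concrete, here is a minimal Python sketch using the standard ipaddress module. The 2001:db8:0:1::/64 prefix is taken from the documentation range and is only an illustration, not anything a real provider assigned.

```python
import ipaddress

# Documentation prefix, used here purely as an example.
routed_prefix = ipaddress.ip_network("2001:db8:0:1::/64")
single_address = ipaddress.ip_interface("2001:db8:0:1::2/64")

# A routed /64 gives you 2**64 addresses to assign as you please.
print(routed_prefix.num_addresses == 2**64)

# One address inside the provider's /64 is still just one address (a /128),
# even though its interface configuration mentions the /64 prefix length.
own = ipaddress.ip_network(f"{single_address.ip}/128")
print(own.num_addresses)
print(single_address.network == routed_prefix)
```

In other words, seeing “/64” in the interface configuration says nothing about how many addresses are actually routed to you.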

And here is why I think that the debt is currently too spread around. Yes, it is true that mobile phone battery life can be improved thanks to IPv6. But that’s not something your ISP or the random website owner cares about – I mean, there are websites so bad that they take megabytes to load a page, and fixing that would help even more! – and of course a pure IPv6 network without CGNAT is a dream of ISPs all over the world, but it is very unlikely that Steam would care about them.

If we all acted “for the greater good”, we’d all be investing more energy to make sure that v6-only becomes a feasible reality. But in truth, outside of controlled environments, I don’t see that happening any time soon, as I said. Controlled environments in this case can refer not only to your own personal network, but also to situations like T-Mobile’s mobile data network, or an office’s network; after all, it’s unlikely that an office, outside of Sky’s own, would care whether they can connect to NowTV or Steam. Right now, I feel v6-only networks (without even NAT64) are the realm of backend networks, because you do not need v4 for connecting between backends you control, such as your database or API provider, and if you push your software images over the same backend network, there is no reason why you would even have to hit the public Internet.

I’m not asking to give a pass to anyone who’s not implementing v6 access now, but as I said when commenting on the FOSDEM network, it is not by bothering the end users that you’ll get better v6 support; it is by asking the services to be reachable.

To finish off, here are a few personal musings on the topic that did not quite fit into the discourse of the post:

  • Some ISPs appear to not have as much IPv4 pressure as others; Telecom Italia still appears to not have reassigned or rerouted the /29 network I used to have routed to my home in Mestre. Indeed, whois information for those IPs still has my old physical address as well as an old (now disconnected) phone number.
  • A number of protocols that provide out-of-band signalling, such as RTSP and RTMP, required changes to be used in IPv6 environments. This means that just rebuilding the applications using them against a C library capable of understanding IPv6 would not be enough.
  • I have read of at least one Debian developer giving up on running IPv6 on their web server, because their hosting provider was sometimes unreliable and they had no way to make sure the site was actually correctly reachable at all times; this may sound like a minimal problem, but there is a long tail of websites that are not actually hosted on big service providers.
  • Possibility is not feasibility. Things may be possible, but not really feasible. It’s a subtle distinction but an important one.

Does your webapp really need network access?

One of the interesting things that I noticed after Shellshock was the number of vulnerability probes that counted on webapp users having direct network access. Not only ping to known addresses to just verify the vulnerability, or wget or curl with unique IDs, but even very rough nc or even /dev/tcp connections to give remote shells. The fact that the probes are there makes it logical to me to expect that, for at least some of the systems, these actually worked.

The reason why this piqued my interest is that I realized that most people don’t take the one obvious step to mitigate this kind of problem: removing (or at least limiting) the network access of their web apps. So I decided it might be worthwhile to spend a moment describing why you should think about that. This is in part because I found out last year at LISA that not all sysadmins have enough training in development to immediately pick up how things work, and in part because I know that even if you’re a programmer it might be counterintuitive for you to think that web apps should not have access, well, to the web.

Indeed, if you think of your app in the abstract, it has to have access to the network to serve the response to the users, right? But what happens generally is that you have some division between the web server and the app itself. People who looked into Java in the early noughties have probably heard the term Application Server, which usually comes in the form of Apache Tomcat or IBM WebSphere, but there is essentially the same “actor” for Rails apps in the form of Passenger, or for PHP with the php-fpm service. These “servers” are effectively self-contained environments for your app, which talk with the web server to receive user requests and serve responses. This essentially means that in the basic web interaction, the application service needs no network access of its own.

Things get a bit more complicated in the Web 2.0 era though: OAuth2 requires your web app to talk, from the backend, with the authentication or data providers. Similarly, even my blog needs to talk with some services, both to ping them to tell them that a new post is out, and to check with Akismet for blog comments that might or might not be spam. WordPress plugins that create thumbnails exist, have a bad security history, and fetch external content to process, such as videos from YouTube and Vimeo, or images from Flickr and other hosting websites. So there is a good amount of network connectivity needed for web apps too, which means that rather than just isolating apps from the network, what you need to implement is some sort of filter.

Now, there are plenty of ways to remove access to the network from your webapp: SELinux, GrSec RBAC, AppArmor, … but if you don’t want to set up a complex security system, you can do the trick even with the bare minimum of the Linux kernel, iptables and CONFIG_NETFILTER_XT_MATCH_OWNER. Essentially what this allows you to do is to match (and thus filter) connections based on the originating (or destination) user. This of course only works if you can isolate your webapps on a separate user, which is definitely what you should do, but not necessarily what people are doing. Especially with things like mod_perl or mod_php, separating webapps by user is difficult – they run in-process with the webserver, and negate the split with the application server – but at least php-fpm and Passenger allow for that quite easily. Running as separate users, by the way, has many more advantages than just network filtering, so start doing that now, no matter what.

Now, depending on what webapp you have in front of you, you have different ways to achieve a near-perfect setup. In my case I have a few different applications running across my servers: my blog, a WordPress blog of a customer, phpMyAdmin for that database, and finally a webapp for an old customer which is essentially an ERP. These have different requirements, so I’ll start from the one that has the fewest.

The ERP app was designed to be as simple as possible: it’s a basic Rails app that uses PostgreSQL to store data. The authentication is done by Apache via HTTP Basic Auth over HTTPS (no plaintext), so there is no OAuth2 or other backend interaction. The only expected connection is to the PostgreSQL server. The requirements for phpMyAdmin are pretty similar: it only has to interface with Apache and with the MySQL service it administers, and the authentication is also done on the HTTP side (also encrypted). For both these apps, the network policy is quite obvious: deny any outside connectivity. This becomes a matter of iptables -A OUTPUT -o eth0 -m owner --uid-owner phpmyadmin -j REJECT, and the same for the other user.
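
If you have more than a couple of isolated users, it may be easier to generate these rules than to type them. The following is only a sketch in Python: the rule itself is the one quoted above, while the reject_rule helper and the “erp” user name are made up for illustration.

```python
def reject_rule(user, interface="eth0"):
    """Return the iptables argv that rejects outbound traffic for one user."""
    return [
        "iptables", "-A", "OUTPUT",
        "-o", interface,
        "-m", "owner", "--uid-owner", user,
        "-j", "REJECT",
    ]

# "phpmyadmin" comes from the text; "erp" is a hypothetical name for the
# user running the Rails app.
ISOLATED_USERS = ["phpmyadmin", "erp"]

rules = [reject_rule(user) for user in ISOLATED_USERS]
for rule in rules:
    # Print instead of executing, so the output can be reviewed first.
    print(" ".join(rule))
```

Since the rule only matches traffic leaving through eth0, connections to a database over localhost or a UNIX socket are not affected.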

The situation for the other two apps is a bit more complex: my blog wants to at least announce that there are new blog posts, and it needs to reach Akismet; both actions use HTTP and HTTPS. WordPress is a bit more complex because I don’t have much control over it (it has a dedicated server, so I don’t have to care), but I assume it is also mostly HTTP and HTTPS. The obvious idea would be to allow ports 80, 443 and 53 (for resolution). But you can do something better. You can put a proxy on your localhost, and force the webapp to go through it, either as a transparent proxy or by using the environment variable http_proxy to convince the webapp to never connect directly to the web. Unfortunately that is not straightforward to implement, as neither Passenger nor php-fpm has a clean way to pass environment variables per user.
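
As an illustration of how the http_proxy convention works from the application side, here is a minimal sketch in Python rather than Ruby or PHP. The proxy address 127.0.0.1:3128 is an assumption; use whatever your local proxy actually listens on.

```python
import os
import urllib.request

# Hypothetical local proxy address, set here only for the demonstration;
# normally the environment would be prepared by whatever starts the app.
os.environ["http_proxy"] = "http://127.0.0.1:3128"
os.environ["https_proxy"] = "http://127.0.0.1:3128"

# getproxies_environment() collects the <scheme>_proxy variables.
proxies = urllib.request.getproxies_environment()
print(proxies["http"])

# An explicit ProxyHandler makes every request through this opener use them.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

With this in place, the filter can reject direct outbound connections from the webapp’s user, and the proxy’s logs show exactly which external services the app tries to reach.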

What I’ve done for now is to hack the environment.rb file to set ENV['http_proxy'] = '' so that Ruby will at least respect it. I’m still looking for a solution for PHP, unfortunately. In the case of Typo, this actually showed me two things I did not know: when looking at the admin dashboard, it makes two main HTTP calls: one to Google Blog Search – which was shut down back in May – and one to Typo’s version file, which is now a 404 page since the move to the Publify name. I’ll soon be disabling both, since I really don’t need them. Indeed, Publify development still seems to go toward “let’s add all possible new features that other blogging sites have” without considering the actual scalability of the platform. I don’t expect to go back to it any time soon.

How not to sell me something — Why I won’t be maintaining Yubikey software directly in Gentoo

You probably remember my previous notes about WordPress, FTP and the problem with security. In the end, after a (boring) setup session, I was able to get vsftpd to provide FTPS service, which should be usable both by WordPress and by Dreamweaver, so that my friend the webmaster can upload through it directly.

This is important because, as it happens, I have another prospective customer who’s going to run WordPress, and FTPS now starts to look more interesting than SSH, as it doesn’t require me to give shell access to the server either.

Unfortunately I’m a bit worried (maybe more than I should be) about the use of standard passwords rather than certificates or keypairs for authentication, which meant I tried to think of other alternatives… of which there are mostly two: Google Authenticator and YubiKey.

The latter I knew by name already because I proxy-maintain the required software for Brant, and I know it’s outdated already and would require a new maintainer who can deal with those packages – I already posted about hardware-related maintenance for what it’s worth – so it was my first choice: while it meant I had to spend some money, it would have solved my problem and improved Gentoo, even if just for a tiny bit. The price for YubiKey devices is also low enough that, if I felt like providing more FTPS access to customers, I could simply bill it to them without many complaints.

So I went on the manufacturer’s (Yubico’s) website and tried to buy two of them (one for me to test and set up, and one to give my friend to access the server). Despite publishing the prices in dollars, they sell through Sweden and the UK, which means they are part of the EU’s VAT area, and since I am a registered business within the EU, I should receive a reverse-charge invoice by stating my own VAT ID. I never had much of a problem with that: as many of my suppliers are scattered across Europe, I registered for the “foreign-enabled” registry right when I opened my business. Don’t ask me why Italian (and Spanish, as far as I can tell) business owners are not enabled by default to have intra-union suppliers.

Now trouble starts: since, as I just noted, not all VAT IDs are valid to use for intra-union trade, there has to be a way to ensure you’re dealing with an acceptable party. This is implemented through VIES, the VAT Information Exchange System, which, as far as Italian businesses are concerned, only tells you a boolean result of valid/invalid (and not the full registration data that most other states seem to provide). I knew VIES from a previous business agreement, but I never cared much. It turns out, though, that most e-shops I encountered validate the VAT ID after the order is completed; in the case of Amazon, it seems like they check their internal database as well as VIES.

Yubico instead validates the request through VIES at the time of registration:

VAT Number could not be validated with VIES at this time. This typically happens when the service is under maintenance. Please retry after some time. For urgent orders, please contact order@yubico.com

Considering that the VIES website has a long disclaimer (which I can’t quote here for reasons that will be clear in a moment) stating that they do not guarantee the availability of the service at any time, and only seem to guarantee the validity of the data to the extent that the law asks them to (which probably means “as long as the states’ own databases are correct”), relying on such a service for registration is… bad.

The VIES website has indeed been down since at least 11am today (over four hours ago as I write this); for a moment they also gave me an interesting page (which I forgot to save), telling me that there were too many failed requests from “my IP address” … listing an IP address in the 212/8 range, while my actual IP address is in the 94/8 range.

What’s the end result here? I’ll probably waste some more time trying to get Google Authenticator; Yubico basically lost a customer and a (possible) contributor by trying and failing to be smarter and won’t have a dedicated maintainer in Gentoo in the near future. It’s sad, because it seems to be easily the most cost- and time-effective solution out there (Google Authenticator is free, but it requires a greater investment of time, and time is money as we all should know).

The web application security culture

Okay, I love to rant, so what?

Just the other day I complained about Rails’s suggestion for world-writable logs, and solved it by making it use syslog, and now I’m in front of another situation that makes me think that people still don’t know how to stop themselves from creating software that is pretty much insecure by design.

So what’s up? For a customer of mine I ended up having to install a full LAMP stack, rather than my usual LAPR. In particular, this is for a website that will have to run WordPress. Thankfully, I have ModSecurity to help me out, especially since not even two hours after actually setting up the instance, Spiderlabs announced two more security issues including an extract of their commercial rules.

Anyway, the WordPress instance will have to be managed/administered by a friend of mine, who has already had some trouble before with a different hoster, where the whole WordPress instance was injected with tons of malware, so he was quite keen on letting me harden the security as much as I could… the problem here is that it seems like there’s not much that I can do!

The first problem is that I don’t have a clean way to convert the admin section to forced SSL: not only is wp-login.php outside of the admin subdirectory, but most of WordPress seems to use fully qualified, absolute URIs rather than relative URLs such as the ones I’m used to with Rails, which in the case of both Typo and Radiant let me restrict the admin/ directory to SSL quite easily. Why is that so important to me? Because I would have used an admin URL outside of the website’s domain for SSL: I don’t own a certificate for the website’s domain, which is not mine, nor do I want to add it to the list of aliases of my own box. Oh well, for now they’ll live with the “invalid certificate” warning.

Next stop is updating the webapp itself; I was sure at that point that “updating the webapp” meant letting the web server write to the WordPress deployment directory… yes, but that’s just part of it. As it happens, plugins are updated via FTP, like my friend told me, but not in the sense of “downloaded from an FTP website and written to the filesystem”; it’s the other way around: you have to tell WordPress how to access its own deployment via FTP. In a clear-text web form. Admittedly, it supports FTPS, but it’s still not much fun.

I’m unsure if it was a good idea on my part to accept hosting WordPress: we’re talking about installing MySQL, PHP and vsftpd, and enabling one more service on the box (vsftpd), just to get a blogging platform. Comparatively, Rails looks like a lightweight approach.

Configuring Visual Paradigm Server on Tomcat in Gentoo — Sorta

This post might offend some of you as I’m going to write about a piece of proprietary software. It’s software I’ve already discussed before: the UML modeller I’m using in my day jobs as well as in FLOSS work to help me out with design decisions. If you don’t care about non-Free software, feel free to skip this post.

A couple of months ago I discussed the trouble of getting JSP, Rails and mod_perl to play all together in the same Apache configuration. The reason why I had JSP in the mix was that the Visual Paradigm software I bought a couple of years ago is (optionally) licensed with a seat-counting floating license, which I much prefer to a single-box license, as I often have to move around from one computer to the other.

Back in November, it seemed like I was finally going to work with someone else assisting me, so I bought a license for a second seat, and moved the license server (which is a JSP application, or to be precise a module in a complex JSP application) from my local Yamato to the server that serves this blog as well. The idea was that by making it accessible outside of my own network I could use it on my laptops as well as allowing a colleague to access it to coordinate design decisions.

Unfortunately I needed to get it running fast, and at the end of the day I didn’t set it up properly at all, just hacky enough to work… until a Tomcat update arrived. With the update, the ROOT web application was replaced by Tomcat’s own, taking the place of the VPServer application… and all hell broke loose. I didn’t have time to debug this until today, when I really felt the need to have my UML models in front of me again, so I decided to spend some time to understand how to set this up properly.

My current final setup is something like this: Apache is the frontend server; it handles SSL and proxies the host – https://whatever.flameeyes.eu/ – to the Tomcat server. The Tomcat server is configured with an additional WebApp (rather than replacing the ROOT one) for the VPServer application, which means that I have to do some trickery with mod_rewrite to get the URLs straightened out (Visual Paradigm does not like it if the license server is not top-level, but the admin interface does not like it if it’s accessed with a different prefix between Tomcat and Apache).

The application does not only provide floating license entrypoints; it also performs duties for three other modules, mostly collaborative designing tools that need to be purchased separately to be enabled, which I don’t really care about. Possibly because of this, it allows more than just file-based data storage, which is still the default. You can easily select a MySQL or PostgreSQL instance to store the data; in my case I decided to use PostgreSQL, since the server already had one running, and I’m very happy to lift the I/O task of managing storage from the Java process. For whatever reason, though, the JDBC connector for PostgreSQL is unable to connect to the UNIX socket path, so I had to use TCP over localhost. Nothing major, just bothersome.

At the end of the day, all I needed to do was fetch the VPServer WebApp package (.zip file), extract it and move its ROOT/ directory to /var/lib/tomcat-6/webapps/VPServer, make the WEB-INF sub-directory writable by the tomcat user (I don’t like it but it seems to be required; the code would like to write to the whole directory structure to be able to auto-update, which I’m not really keen on), and then configure Apache this way:

DocumentRoot /var/lib/tomcat-6/webapps/VPServer

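# Static files of the webapp can be served directly by Apache.
<Directory /var/lib/tomcat-6/webapps/VPServer>
Order allow,deny
Allow from all
</Directory>

# WEB-INF must never be served directly.
<Directory /var/lib/tomcat-6/webapps/VPServer/WEB-INF>
Order deny,allow
Deny from all
</Directory>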

RewriteEngine On

RewriteRule ^/VPServer(.*)$ $1 [PT]

ProxyPassMatch ^/(VPServer|images).*$ !
ProxyPassMatch ^/.*\.css$ !
ProxyPass / ajp://localhost:8009/VPServer/
ProxyPassReverse / ajp://localhost:8009/VPServer/

SecRuleRemoveById flameeyes-2

(Before you ask, the SecRuleRemoveById above is just for documentation: the problem was that up to a couple of versions ago, Visual Paradigm left the default Java User-Agent string, which was filtered by my ModSecurity ruleset; nowadays it properly reports more details about its version and the operating system it is running on.)

The end result is pleasant, finally: with all this in mind it should be possible for me to create a (non-QA-compliant, unfortunately) ebuild for the VPServer software for my overlay, to avoid managing it all by hand and risking forgetting how to set it up properly. I’m afraid though that it’ll take me a long time to properly unbundle all the JARs, but in the meantime I can at least make it easier for me to update it.

What do you mean it’s not IPv6-compatible?

For those who wonder where I disappeared to, I’ve had a bit of a health scare, which is unfortunately common for me during summertime. This time the problem seems to be tied to anxiety, and should probably pass once most of the stress related to taxes and work project deadlines lets up.

Earlier this month we had the IPv6 World Test Day, and things seem to have moved quite a bit since then. Even the Gentoo Infra team had a bit of work to do to get ready for the test day and to set it up, and if you follow Apple’s software updates you probably know that they “improved IPv6 reliability” with their latest release.

I’m very interested in the IPv6 technology myself, and I’d very much like to rely more on it; unfortunately, as it happens, my hosting provider still hasn’t provided me with IPv6 addresses, nor does that seem likely to happen soon. On the other hand, I’ve deployed it at home, even backing off from 6to4, which was my original hope to avoid tunnels (Hurricane Electric is much more reliable, and faster). While I can’t remember an IPv6 address by heart, I can set up proper, visible hostnames for my boxes so that I can compare the logs and not be forced to use NATed addresses all the time.

Now, given that IPv6 is fully deployed in my home network, if a website is set up to use IPv6, then I’ll be using IPv6 to reach it. It could be a bit of a slow-down when you consider that I use a tunnel to get to the IPv6 network, but generally it seems to behave just as well, possibly because my home network is slow enough by itself. Of course, the website needs to be IPv6-compatible, and not just “IPv6 ready”.

What happens is that a number of websites enabled IPv6 during the World Test Day, and when they saw that enough users were able to access them just fine, they kept the extra addresses on: why do twice the work to turn it off? But that kind of testing is sometimes just not good enough. While the main obstacle to IPv6 support is listening for and responding to IPv6-bound requests, there is code that deals with remote and local host addresses in most applications, including web applications. Validating addresses, applying ACLs, and similar tasks all require knowledge of the addresses the code has to deal with, and many times that code expects dotted-quad IPv4 addresses.
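
To show the kind of assumption I mean, here is a small sketch in Python. It is not taken from any real application: the regular expression, the function names and the ACL are all made up, and the prefixes come from the documentation ranges.

```python
import ipaddress
import re

# The kind of check that silently assumes IPv4: four dot-separated numbers.
DOTTED_QUAD = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def is_valid_legacy(addr):
    """Accepts only strings that look like a dotted-quad."""
    return DOTTED_QUAD.match(addr) is not None

# Hypothetical ACL; both prefixes are documentation ranges.
ALLOWED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_allowed(addr):
    """Family-agnostic ACL check; returns False for unparseable input."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    return any(ip in net for net in ALLOWED)

print(is_valid_legacy("2001:db8::1"))  # the legacy check rejects a valid client
print(is_allowed("2001:db8::1"))
```

The legacy check is not wrong for the world it was written in; it simply stops being correct the moment an AAAA record is published for the service.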

I’m still fighting with one such real-world scenario. Most of my domains are registered through the French provider OVH, who also started providing IPv6 access to their website after the World Day. All the management services work just fine (even though last I checked they didn’t provide a dynamic AAAA record, which is why I had to search for complex alternative approaches which, actually, I’m still keeping up with), as do the pages detailing their products and services. But when I had to renew one of the domains, it stopped at the point where I was supposed to be redirected to pay (via credit card), with an internal server error (HTTP 500 Error).

After waiting over the weekend (and a bit, given I was swamped with work), I decided to call to see if it was a known issue: it wasn’t, the system was supposedly working fine, and they suggested I try a different computer. After testing with Firefox on Windows (no go), I tried the infamous iPad and… it worked. A quick doubt I had was related to the connection protocol, and bingo: it all works fine with IPv4, but fails badly with v6.

This is a very plain example of how just listening for v6 connections is not enough: you need to ensure that the interactions between the pieces of the software still work. In this instance, I can think of two possible reasons why it doesn’t work correctly with IPv6:

  • the system logs all the payment requests separately, and to do so, it assumes the remote host address to be a dotted-quad;
  • since the page redirects to their processing bank’s site, it probably notifies the bank of the incoming request to avoid cross-site forgery, and again it assumes a dotted-quad address.

Whatever the problem, right now they fail to process payments properly, and when I reported it they shut me down twice, first on the phone (“oh it’s not our problem, but the bank’s!”), then by mail (“everything should be fine now!”; no, it isn’t)… and still they are publishing AAAA records for their website.

If even a Europe-wide ISP fails this badly at implementing IPv6 on their website (for a piece of infrastructure as critical as payment processing!), I’m afraid that we have very little hope for IPv6 to get deployed worldwide painlessly.