Does your webapp really need network access?

One of the interesting things I noticed after Shellshock was the number of vulnerability probes that counted on webapp users having direct network access. Not just ping to known addresses to verify the vulnerability, or wget or curl with unique IDs, but even very rough nc or /dev/tcp connections to open remote shells. The fact that these probes are out there makes it logical to expect that, on at least some systems, they actually worked.

The reason this piqued my interest is that I realized most people skip the one obvious step that mitigates this kind of problem: removing (or at least limiting) their web apps' access to the network. So I decided it might be worth taking a moment to describe why you should think about that. This is in part because I found out last year at LISA that not all sysadmins have enough development training to immediately pick up how these things work, and in part because I know that even if you're a programmer it might be counterintuitive to think that web apps should not have access, well, to the web.

Indeed, if you think of your app in the abstract, it has to have access to the network to serve responses to users, right? But generally there is a division between the web server and the app itself. People who looked into Java in the early noughties have probably heard the term Application Server, usually in the form of Apache Tomcat or IBM WebSphere, but essentially the same "actor" exists for Rails apps in the form of Passenger, or for PHP with the php-fpm service. These "servers" are effectively self-contained environments for your app, which talk to the web server to receive user requests and serve back responses. This essentially means that in the basic web interaction, no network access is needed by the application service.
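As a concrete illustration of that split, here is a minimal sketch, assuming a reasonably recent Apache with mod_proxy_fcgi and a hypothetical socket path, of the web server handing PHP requests to a local php-fpm pool over a unix socket; no TCP connection is involved anywhere:

<FilesMatch "\.php$">
    # Hand PHP requests to the local php-fpm pool over a unix socket; the
    # socket path is an assumption, match it to whatever your pool listens on.
    SetHandler "proxy:unix:/run/php-fpm/myblog.sock|fcgi://localhost/"
</FilesMatch>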

Things get a bit more complicated in the Web 2.0 era though: OAuth2 requires your web app to talk, from the backend, with authentication or data providers. Similarly, even my blog needs to talk to some services, either to ping them to announce that a new post is out, or to check comments against Akismet to tell whether they are spam. WordPress plugins that create thumbnails are known to exist (and to have a bad security history), and they fetch external content to process: videos from YouTube and Vimeo, or images from Flickr and other hosting websites. So a fair amount of network connectivity is needed by web apps too, which means that rather than just isolating apps from the network, what you need to implement is some sort of filter.

Now, there are plenty of ways to remove network access from your webapp: SELinux, GrSec RBAC, AppArmor, … but if you don't want to set up a complex security system, you can pull off the trick with the bare minimum of the Linux kernel, iptables and CONFIG_NETFILTER_XT_MATCH_OWNER. Essentially, this allows you to match (and thus filter) connections based on the originating (or destination) user. Of course this only works if you can isolate your webapps under separate users, which is definitely what you should do, but not necessarily what people are doing. Especially with things like mod_perl or mod_php, separating webapps into their own users is difficult – they run in-process with the webserver, and negate the split with the application server – but at least php-fpm and Passenger allow for that quite easily. Running as separate users, by the way, has many more advantages than just network filtering, so start doing that now, no matter what.
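On the php-fpm side, giving each app its own user is just a pool definition away. A minimal sketch, where the pool name, user, group and socket path (myblog) are all hypothetical and the file location depends on your distribution:

; e.g. /etc/php-fpm.d/myblog.conf: one pool per webapp, each with its own user
[myblog]
user = myblog
group = myblog
; talk to the web server over a unix socket, so no TCP is needed here either
listen = /var/run/php-fpm/myblog.sock
pm = ondemand
pm.max_children = 4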

Now, depending on what webapp you have in front of you, there are different ways to achieve a near-perfect setup. In my case I have a few different applications running across my servers: my blog, a customer's WordPress blog, phpMyAdmin for that customer's database, and finally a webapp for an old customer which is essentially an ERP. These have different requirements, so I'll start with the one whose requirements are lowest.

The ERP app was designed to be as simple as possible: it's a basic Rails app that uses PostgreSQL to store data. Authentication is done by Apache via HTTP Basic Auth over HTTPS (no plaintext), so there is no OAuth2 or other backend interaction. The only expected connection is to the PostgreSQL server. The requirements for phpMyAdmin are pretty similar: it only has to interface with Apache and with the MySQL service it administers, and authentication is also done on the HTTP side (also encrypted). For both of these apps the network policy is quite obvious: deny any outside connectivity. This becomes a matter of iptables -A OUTPUT -o eth0 -m owner --uid-owner phpmyadmin -j REJECT, and the same for the other user.
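Spelled out for both apps, and assuming the two users are called erp and phpmyadmin, the whole policy is two rules; PostgreSQL and MySQL are reached over a unix socket or the loopback interface, which the eth0 match leaves alone:

# deny any outbound traffic on the external interface for the two app users;
# local database connections are unaffected
iptables -A OUTPUT -o eth0 -m owner --uid-owner erp -j REJECT
iptables -A OUTPUT -o eth0 -m owner --uid-owner phpmyadmin -j REJECT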

The situation for the other two apps is a bit more complex: my blog wants to at least announce that new posts are out, and it needs to reach Akismet; both actions use HTTP and HTTPS. WordPress is harder to pin down because I don't have much control over it (it sits on a dedicated server, so I don't have to care too much), but I assume it's mostly HTTP and HTTPS as well. The obvious idea would be to allow ports 80, 443 and 53 (for DNS resolution). But you can do something better: put a proxy on localhost and force the webapp to go through it, either as a transparent proxy or by using the http_proxy environment variable to convince the webapp never to connect directly to the web. Unfortunately that is not straightforward to implement, as neither Passenger nor php-fpm has a clean way to pass environment variables per user.
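On the firewall side this is a small variation on the rules above. A sketch, assuming the blog runs as a hypothetical blog user and squid listens on 127.0.0.1:3128:

# let the blog user reach the local proxy, and nothing else outside
iptables -A OUTPUT -o lo -p tcp --dport 3128 -m owner --uid-owner blog -j ACCEPT
iptables -A OUTPUT -o eth0 -m owner --uid-owner blog -j REJECT

Since the proxy resolves hostnames on the application's behalf, the app does not even need port 53; only squid, running as its own user, gets to talk to the outside world.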

What I've done for now is hack the environment.rb file to set ENV['http_proxy'] = 'http://127.0.0.1:3128/' so that Ruby will at least respect it. I'm still looking for a solution for PHP, unfortunately. In the case of Typo, this actually showed me two things I did not know: when looking at the admin dashboard, it makes two main HTTP calls: one to Google Blog Search – which was shut down back in May – and one to Typo's version file, which is now a 404 page since the move to the Publify name. I'll soon be shutting both calls down since I really don't need them. Indeed, Publify development still seems to go in the direction of "let's add all possible new features that other blogging sites have" without considering the actual scalability of the platform, so I don't expect to go back to it any time soon.
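Coming back to the proxy hack above: for reference, it is a single line near the top of Typo's config/environment.rb, with the address being whatever your local squid listens on:

# config/environment.rb
# Ruby HTTP clients that honour http_proxy will now send pingbacks and
# Akismet checks through the local, filtered squid instance.
ENV['http_proxy'] = 'http://127.0.0.1:3128/'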

Yes, again spam filtering

You might remember that I reported success with my filters based on the User-Agent value reported by clients; unfortunately it seems I was speaking way too soon. While the amount of spam I had to manually remove from the blog decreased tremendously, which allowed me to disable both the 45-day limit on commenting and comment moderation, it still didn't cut it, and it caused a few false positives.

The main problem is that the filter on HTTP/1.0 behaviour was hitting almost anybody who tried to comment through a proxied connection: the default squid configuration doesn't use HTTP/1.1 and so downgrades everything to 1.0. Thanks to binki and moesasji I was able to track down the issue, and my ruleset (which I'm attaching at the end of the post) now checks for the Via header to identify proxies. Unfortunately, the result is that I now get much more spam; lots and lots of comment spam comes through open proxies, which are far from uncommon.

I guess one option would be to use the SORBS DNSBL to filter out known open proxies; unfortunately, either I misconfigured the dnsbl lookup module for Apache (which I thought I already had working) or the proxies I'm receiving spam from are not listed there at all. I was also told that mod_security can handle the lookup itself, which is probably better since I can limit the lookups to the cases where a proxy is actually involved.
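If I go down that road, it would probably look something like the following sketch; the @rbl operator is part of mod_security 2.x, and dnsbl.sorbs.net is just the zone I would try first, not something I have tested:

# only bother with the DNSBL lookup when the comment came through a proxy
SecRule TX:FLAMEEYES_VIA_PROXY "@eq 1" \
    "chain,log,msg:'Comment posted through a blacklisted open proxy.',deny,status:403"
SecRule REMOTE_ADDR "@rbl dnsbl.sorbs.net"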

I was also told to look at the rules from Got Root, which also include some user-agent-based filtering. I haven't done so yet, though, because I'm starting to get worried: my rules already execute a number of regular-expression matches on the User-Agent header, and I'm trying my best to keep the expressions generic but not too broad. Got Root's rules, on the other hand, seem to be straight matches against a long series of user agents, which means lots and lots of added checks; they also seem to be either absolute (for any requested URL) or specific to WordPress-based blogs, which means I'd have to adapt or tinker with them, since I'm currently limiting the antispam measures through Apache's Location block (previously LocationMatch, but the new Typo version uses a single URL for all comment posting).
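For what it's worth, the scoping itself is just a matter of wrapping the ruleset in a Location block. A sketch, where /comments stands in for whatever URL your Typo version actually posts comments to:

<Location /comments>
    # only the comment-posting URL pays the cost of the User-Agent checks;
    # the full ruleset below would be placed (or Include'd) in here
    SecRule REQUEST_HEADERS:User-Agent "^$" \
        "log,msg:'Empty User-Agent when posting comments.',deny,status:403"
</Location>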

What I'd like to see is some kind of Apache module that can match a User-Agent against a known list of bad User-Agents, as well as against a list of regular expressions compiled into some kind of bytecode, so that it's much, much faster than the "manual" parsing done now. Unfortunately I have neither the time nor the Apache expertise to take care of that myself, which means either someone else does it, or I'll stick with mod_security for a while longer.

Anyway here’s the beef!

SecDefaultAction "pass,phase:2,t:lowercase"

# Ignore get requests since they cannot post comments.
SecRule REQUEST_METHOD "^get$" "pass,nolog"

# 2009-02-27: Kill comments where there is no User-Agent at all; I
# don't care if people like to be "anonymous" in the net, but the
# whole thing about anonymous browsers is pointless.
SecRule REQUEST_HEADERS:User-Agent "^$" \
    "log,msg:'Empty User-Agent when posting comments.',deny,status:403"

# Since we cannot check for _missing_ user agent we have to check if
# it's present first, and then check whether the variable is not
# set. Yes it is silly but it seems to be the only way to do this with
# mod_security.
SecRule REQUEST_HEADERS_NAMES "^user-agent" \
    "setvar:tx.flameeyes_has_ua=1"
SecRule TX:FLAMEEYES_HAS_UA "!1" \
    "log,msg:'Missing User-Agent header when posting comments.',deny,status:403"

# Check if the comment arrived through a proxy; if that's the case we
# cannot rely on the HTTP version that is provided, because it's not
# the one of the actual browser. We can, though, check the client
# against an open proxy blacklist.
SecRule REQUEST_HEADERS_NAMES "^via" \
    "setvar:tx.flameeyes_via_proxy=1,log,msg:'Commenting via proxy'"

# If we're not going through a proxy, and it's not lynx, and yet we
# have an HTTP/1.0 comment request, then it's likely a spambot with a
# fake user agent.
#
# Note the order of the rules is explicitly set this way so that the
# majority of requests from HTTP/1.1 browsers (legit) are ignored
# right away; then all the requests from proxies, then lynx.
SecRule REQUEST_PROTOCOL "!^http/1.1$" \
    "log,msg:'Host has to be used but HTTP/1.0, posting spam comments.',deny,status:403,chain"
SecRule TX:FLAMEEYES_VIA_PROXY "!1" "chain"
SecRule REQUEST_HEADERS:User-Agent "!lynx"


# Reject very old Mozilla versions (not modern browsers, and often user
# agents that never existed at all) and pre-2 versions of Firefox.
#
# Also ignore comments coming from IE 5 or earlier since we don't care
# about such old browsers. Note that Yahoo feed fetcher reports itself
# as MSIE 5.5 for no good reason, but we don't care since it cannot
# _post_ comments anyway.
#
# 2009-02-27: Very old Gecko versions should not be tolerated; grant
# the 2007-2009 period a grace period for now.
#
# 2009-03-01: Ancient Opera versions usually posting spam comments.
#
# 2009-04-22: Some spammers seem to send requests with "Opera "
# instead of "Opera/", so list that as an option.
SecRule REQUEST_HEADERS:User-Agent "(mozilla/[0123]|firefox/[01]|gecko/200[0123456]|msie ([12345]|7.0[ab])|opera[/ ][012345678])" \
    "log,msg:'User-Agent too old to be true, posting spam comments.',deny,status:403"

# The Mozilla/4.x and /5.x agents have 0 as minor version, nothing
# else.
SecRule REQUEST_HEADERS:User-Agent "(mozilla/[45].[1-9])" \
    "log,msg:'User-Agent sounds fake, posting spam comments.',deny,status:403"

# Malware and spyware that advertises itself on the User-Agent string,
# since a lot of spam comments seem to come out of browsers like that,
# make sure we don't accept their comments.
SecRule REQUEST_HEADERS:User-Agent "(funwebproducts|myie2|maxthon)" \
    "log,msg:'User-Agent contains spyware/adware references, posting spam comments.',deny,status:403"

# Bots usually provide an http:// address to look up their
# description, but those don't usually post comments. Consider any
# comment coming from a similar User-Agent as spam.
SecRule REQUEST_HEADERS:User-Agent "http://" \
    "log,msg:'User-Agent spamming URLs, posting spam comments.',deny,status:403"

SecRule REQUEST_HEADERS:User-Agent \
    "^mozilla/4.0+" "log,msg:'Spaces converted to + symbols, posting spam comments.',deny,status:403"

# We expect Windows XP users to upgrade at least to IE7. Or use
# Firefox (even better) or Safari, or Opera, ...
#
# All the comments coming from the old default OS browser have a high
# chance of being spam, so reject them.
#
# 2009-04-22: Note that we shouldn't check for 5.0 and 6.0 NT versions
# specifically, since Server and x64 editions can have different minor
# versions.
SecRule REQUEST_HEADERS:User-Agent "msie 6.0;( .+;)? windows nt [56]." \
    "log,msg:'IE6 on Windows XP or Vista, posting spam comments.',deny,status:403"

# List of user agents only ever used by spammers
#
# 2009-04-22: the "Windows XP" declaration is never used by official
# MSIE agent strings; they use "Windows NT 5.1" instead, so if you find
# it, just kill it.
SecRule REQUEST_HEADERS:User-Agent "(libwen-us|msie .+; .*windows xp)" \
    "log,msg:'Confirmed spam User-Agent posting spam comments.',deny,status:403"

Another appeal for CJK improvements

I'm making a new appeal looking for proxy maintainers. Thanks to Patrick, we're finding more and more packages whose ebuilds are rotting away, not getting updated and sometimes not even bumped.

Unfortunately, I'm not good enough with Japanese (never mind the other languages, of which I really know nothing), so all I can do is simple testing and checking whether things seem to work fine. Zhang Le is already helping by taking care of zhcon, but there are plenty of ebuilds that need to be cleaned up and fixed. I fixed ochusha a few minutes ago to install its desktop file in the right place, so that it appears in menus, but I can only fix these things when I stumble across them.

So, once again, if you're interested in helping Gentoo support CJK packages, please contact me or the CJK team. If you're maintaining a particular ebuild locally or in an overlay because the one in Portage is outdated or broken, open a bug for it and state clearly that you'd like to be its proxied maintainer, and I'll look into it.

I'll be trying to run an announcement in the GWN about this too; let's hope I can find enough people to help ;)

Too many overlays will bring us down

So, apart from closing envelopes, last night and today I've been working to fix as many bugs as I can in the time I'm given, as I want to use the time I have before starting new jobs as much as possible for Gentoo, in case I don't have enough time later (although I hope to still have some afterward).

Today I'll be working on patching some software for FLAC 1.1.3 support, thanks to Josh Coalson (FLAC upstream) who has already provided three patches: kdemultimedia, k3b and vorbis-tools. Last night instead, as I still had the chroot on pitr open, I decided to spend more time testing and fixing the CJK bugs that Patrick reported through his tinderbox effort.

The result is that I dedicated this morning to fixing CJK bugs too. Considering that my Japanese is waaay limited, and that I decided to help the herd just because I had free time, they lacked manpower, and I had a few changes to merge, being able to fix stuff there makes me feel a bit better about my involvement. Unfortunately, as I said, I can't really use most of the software there, which also keeps me from trying it out :(

It's sad to see so few people involved lately :( Most of the CJK packages are unmaintained. I fixed the cannadic eclass and the canna packages so they no longer have a canna USE flag, but this doesn't help all that much… I've been told there's a gentoo-china overlay, but I cannot find it in layman and I don't remember seeing much from it in Bugzilla…

Then there's the news from the sound herd. ReZound is masked and pending removal next month if nobody steps up to maintain it. The problem is that, currently, under the sound herd you can find desktop multimedia applications like Amarok and Audacious, libraries mostly used by video software like a52dec and libdts, and professional audio software like Audacity and ReZound. In the past there were also the ALSA drivers and packages, but they are now in their own herd, which means that if you look up alsa-driver you get a more truthful answer (phreak and me being the ones handling its bugs) rather than the whole sound herd… As for professional audio software, almost nobody has been taking care of it lately, especially now that kito is being retired.

And then I'm told by lavish on #gentoo-it that there's a proaudio overlay. Why fragment the efforts this way? Why can't they become official developers instead of creating an overlay? :|

I'll try to write an appeal for the next GWN to get more people involved with professional audio software in Portage, and maybe something for CJK could help too. The problem is establishing at least some kind of trust relationship, so that the packages can be proxy-maintained… I don't know a thing about pro audio software, but I know quite enough about ebuild development to handle a proxy maintainership, I suppose ;)

Anyway, if you’re interested, feel free to drop me a line, I’ll try to answer as soon as I can!