The importance of reliability

For the past seven years I have worked at Google as a Site Reliability Engineer. I’m actually leaving the company — I’m writing this during my notice period. I’m currently scheduled to join Facebook as a Production Engineer at the end of May (unless COVID-19 makes things even more complicated). Both roles are related to the reliability of services – Google even put it in the name – so you could say I have more than a passing idea of what is involved in maintaining (if not writing) reliable software and services.

I had to learn to be an SRE — this was my first 9-to-5 job (well, not really 9-to-5, given that SREs tend to keep very flexible and very strange hours), and I hadn’t worked at such a scale before. And as I wrote before, the job got me used to expecting certain things. But it also made me realise how important it is for services and systems to be reliable, as well as secure, and just how unevenly that reliability is distributed out there.

During my tenure at Google, I’ve been on-call for many different services. Pretty much all of them were business-critical in one way or another — some much more than others. But none of them were critical to society: I never joined the Google Cloud teams, any of the communication teams, or the Maps teams. I did spend time on the Search team, but while Search is definitely important to the business, I think society would rather go without search results than without a way to contact their loved ones. But that’s just my personal opinion.

The current huge jump in work-from-home users due to COVID-19 concerns has clearly shown how much more critical to society some online services are, services that even ten years ago wouldn’t have been considered this important: Hangouts, Meet, Zoom, Messenger, WhatsApp, and the list goes on. Video calls appear to be the only way to get in touch with our loved ones right now, as well as, for many, the only way to work. Thankfully, most of these services are provided by companies big enough to afford reliability in one form or another.

But at least in the UK, this has also shown how many other services are clearly critical to society, yet not provided by companies that can afford reliability. Online grocery shopping became the thing to do, nearly overnight. Ocado, possibly the biggest grocery delivery company, came under so much pressure that they had to scramble, first introducing a “virtual queue” system, and then eventually taking down the whole website. As I type this, their front page informs you that login is available only to those who already have a slot booked for this weekend; no new slots can be booked.

In similar fashion, online retailers, GP surgeries’ online systems, online prescription services, and banks also appeared to be smothered in requests. I would be surprised if libraries, bookstores, and the websites of restaurants that don’t rely on the big delivery companies weren’t also affected.

And that made me sad, and at least in part ashamed of myself. You see, while I was looking for a new job, I interviewed at another place. Not a big multinational company, but a smaller one, a utility. And while the offer was very appealing, it was also a more challenging role, and I decided to pass on it. I’m not saying I’d have made any bigger difference for them than any other “trained” SRE, but I do think that a lot of these “smaller” players need their fair dose of reliability.

The problem is that there’s a mixture of different attitudes, and actual costs, involved in doing reliability the way Google and the other “bigs” do it. In the case of Google, more often than not the answer to something not working very well is to throw more resources (CPU, memory, storage) at it. That’s not something you can do quickly when your service runs “on premise” (that is, in your own datacenter cabinet), and not something you can do cheaply when you run on someone else’s cloud solution.

The thing is, Cloud is not just someone else’s computer. It’s a lot of computers, and it adds a lot of flexibility. It can even be cheaper than running your own server, sometimes. But it’s also a risk, because if you don’t know when to say “enough”, you end up with budget-wrecking bills. Or sometimes with a problem “downstream”. Take Ocado: the likelihood is that it wasn’t the website being overloaded, but the fulfilment. Indeed, the virtual queue approach was awesome: it limited the whole human interaction, not just the browser requests. And indeed, the queue worked fine (unlike, say, the CCC ticket queue), and the website didn’t look overloaded at all.

But saying that on-premise equipment does not scale is not marketing for cloud solutions — it’s admitting the truth: if you start getting this many requests at short notice, you can’t go out, buy, image, and set up another four or five machines to serve them — but you can tell Google Cloud, Amazon, or Azure to go and triple the amount of resources available to you. And that might or might not make things better for you.

It’s a tradeoff. And not one I have answers for. I can’t say I have experience managing this tradeoff, either — all the teams I worked on had nearly blank cheques for internal resources (not quite, but nearly), and while resource saving was and is a thing, it never became a real dollar amount that you, as an SRE, end up dealing with. Other companies, particularly smaller ones, need to pay a lot of attention to that.

From my point of view, what I can do is try to be more open about discussing design decisions in my software, particularly when I think it’s my experience talking. I still need to work actively on Tanuga, and I’m even considering making a YouTube video of me discussing the way I plan to implement it — as if I were discussing it during a whiteboard design interview (I have quite a bit of experience with those, this year).

Fantasyland: in the world of IPv6 only networks

It seems to be that time of the year when geeks think that IPv6 is perfect, ready to be used, and the best thing since sliced bread (or canned energy drinks). Over on Twitter, someone pointed out to me that FontAwesome (which is used by the Hugo theme I’m using) is not accessible over an IPv6-only network, and as such the design of the site breaks. I’ll leave aside my comments on FontAwesome because they are not relevant to the rant at hand.

You may remember I called IPv6-only networks unrealistic two years ago, and I called IPv6 itself a geeks’ wet dream last year. You should then not be surprised to find me calling this Fantasyland a year later.

First of all, I want to make perfectly clear that I’m not advocating that IPv6 deployment should stop or slow down. I really wish it were actually faster, for purely selfish reasons I’ll get to later. Unfortunately I had to take a step back when I moved to London, as Hyperoptic has not deployed IPv6, at least in my building, yet. But they provide a great service for a reasonable price, so I have no intention of switching to something like A&A just to get good IPv6 right now.

$ host has address has address mail is handled by 0

$ host has address has address

$ host has address

$ host is an alias for has address
Host not found: 2(SERVFAIL)

$ host is an alias for is an alias for has address

$ host is an alias for has address has address has IPv6 address 2001:8b0:0:30::65 has IPv6 address 2001:8b0:0:30::68

I’ll get back to this later.

IPv6 is great for complex backend systems: each host gets its own uniquely-addressable IP, so you don’t have to bother with jumphosts, proxy commands, and so on and so forth. Depending on the complexity of your backend, you can containerize single applications and then have a single address per application. It’s a gorgeous thing. But as you move towards user-facing frontends, things get less interesting. You cannot get rid of IPv4 on the serving side of any service, because most of your visitors are likely reaching you over IPv4, and that’s unlikely to change for quite a while longer still.

Of course IPv4 address exhaustion is a real problem, and it’s hitting ISPs all over the world right now. Mobile providers have already started deploying networks that only provide users with IPv6 addresses, and then use NAT64 to let them connect to the rest of the world. This is not particularly different from using an old-school IPv4 carrier-grade NAT (CGN), which is a requirement of DS-Lite, but I’m told it can get better performance and cost less to maintain. It also has the advantage of reducing the number of different network stacks that need to be involved.
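
To make the NAT64 idea concrete, here is a small Python sketch of the address mapping defined by RFC 6052: the translator synthesizes an IPv6 address by embedding the IPv4 destination in the low 32 bits of the well-known prefix 64:ff9b::/96. The addresses used here are documentation ranges, not any real service.

```python
import ipaddress

# NAT64 address synthesis per RFC 6052: the IPv4 destination is embedded
# into the low 32 bits of the well-known prefix 64:ff9b::/96.
NAT64_PREFIX = ipaddress.IPv6Address("64:ff9b::")

def nat64_embed(v4: str) -> ipaddress.IPv6Address:
    """Map an IPv4 address into the NAT64 well-known prefix."""
    return ipaddress.IPv6Address(int(NAT64_PREFIX) | int(ipaddress.IPv4Address(v4)))

# 192.0.2.1 is a documentation address (TEST-NET-1).
print(nat64_embed("192.0.2.1"))  # prints 64:ff9b::c000:201
```

An IPv6-only client asks a DNS64 resolver for a hostname; if the host has no AAAA record, the resolver synthesizes one exactly like this, and the NAT64 gateway undoes the mapping on the way out.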

And in general, having to deal with CGN and NAT64 adds extra work, latency, and generally worse performance to a network, which is why gamers, as an example, tend to prefer having a single-stack network, one way or the other.

$ host has address

$ host is an alias for is an alias for has address

$ host is an alias for is an alias for has address

$ host is an alias for is an alias for is an alias for has address has IPv6 address 2a02:26f0:a1:29e::71e has IPv6 address 2a02:26f0:a1:280::71e

$ host is an alias for is an alias for has address

But multiple other options sprang up to tackle the address exhaustion problem, faster than the deployment of IPv6 is happening. As I already noted above, backend systems, where the end-to-end is under the control of a single entity, are perfect soil for IPv6: there’s no need to allocate real IP addresses to these, even when they have to talk over the proper Internet (with proper encryption and access control, it goes without saying). So we won’t see more allocations like Xerox’s or Ford’s, whole /8s, for backend systems.

$ host is an alias for is an alias for has address

$ host is an alias for is an alias for has address

$ host is an alias for has address has address has address has address has IPv6 address 2a04:4e42::67 has IPv6 address 2a04:4e42:200::67 has IPv6 address 2a04:4e42:400::67 has IPv6 address 2a04:4e42:600::67 mail is handled by 10 mail is handled by 20 mail is handled by 30 mail is handled by 30 mail is handled by 30 mail is handled by 30 mail is handled by 20

Another technique that slowed down the exhaustion is SNI. This TLS extension allows applications with multiple certificates to share the same socket. Similarly to HTTP virtual hosts, which are now what just about everyone uses, SNI allows the same HTTP server instance to deliver secure connections for multiple websites that do not share a certificate. This may sound totally unrelated to IPv6, but before SNI became widely usable (it’s still not supported by very old Android devices, and Windows XP, but both of those are vastly considered irrelevant in 2018), if you needed to provide different certificates, you needed different sockets, and thus different IP addresses. It was not uncommon for a company to lease a /28 and point it all at the same frontend system just to deliver per-host certificates — one of my old customers did exactly that, until XP became too old to support, at which point they declared it so, and migrated all their webapps behind a single IP address with SNI.
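
The mechanics are simple enough to sketch: the client sends the hostname it wants in the TLS ClientHello, and the server uses it to pick a certificate before completing the handshake. The hostnames and certificate paths below are made up for illustration; real servers do this through their TLS library’s servername callback.

```python
# Hypothetical certificate store for a frontend serving several sites from
# one IP address; with SNI the server learns the requested hostname before
# the handshake completes and can pick the matching certificate.
CERTS = {
    "shop.example.com": "/etc/ssl/shop.pem",
    "blog.example.com": "/etc/ssl/blog.pem",
}

DEFAULT_CERT = "/etc/ssl/fallback.pem"

def pick_certificate(sni_hostname):
    # Clients without SNI (very old Android, anything on Windows XP) never
    # send a hostname, so they can only ever get the default certificate;
    # that is why, pre-SNI, every certificate needed its own IP address.
    if sni_hostname is None:
        return DEFAULT_CERT
    return CERTS.get(sni_hostname, DEFAULT_CERT)
```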

Does this mean we should stop caring about the exhaustion? Of course not! But if you are a small(ish) company and you need to focus your efforts to modernize infrastructure, I would not expect you to focus on IPv6 deployment on the frontends. I would rather hope that you’d prioritize a TLS (HTTPS) implementation instead, since I would rather not have malware (including but not limited to “coin” miners) executed on my computer while I read the news! And that is not simple either.

$ host is an alias for has address has address

$ host is an alias for has address has address has address has address

$ host has address has address has address has address has address has address has address has address

Okay, I know these snippets are getting old and I’m probably beating a dead horse. But what I’m trying to bring home here is that there is very little to gain in supporting IPv6 on frontends today, unless you are an enthusiast or a technology company yourself. I work for a company that believes in it and provides tools, data, and its own services over IPv6. But it’s one company. And, in full disclosure, I have no involvement in this particular field whatsoever.

In all of the examples above, which are of course neither complete nor statistically meaningful, you can see a few interesting exceptions. In the gaming world, XBox appears to have IPv6 frontends enabled, which is not surprising when you remember that Microsoft even developed one of the first tunnelling protocols to kickstart the adoption of IPv6. And of course XKCD, being run by a technologist and technology enthusiast, couldn’t possibly ignore IPv6 — but that’s not what the average user needs from their Internet connection.

Of course, your average user spends a lot of time on platforms created and maintained by technology companies, and Facebook is another big player in the IPv6 landscape, so it has been available over it for a long while — though that’s not the case for Twitter. But at the same time, those users need their connection to access their bank…

$ host is an alias for has address

$ host has address

$ host has address

$ host has address

$ host has address

$ host has address

$ host is an alias for has address

$ host has address

to pay their bills…

$ host has address

$ host has address

$ host has address

$ host is an alias for is an alias for has address

$ host has address

$ host is an alias for has address mail is handled by 10 mail is handled by 30

$ host is an alias for has address

to do shopping…

$ host is an alias for is an alias for is an alias for is an alias for is an alias for has address

$ host is an alias for is an alias for has address

$ host is an alias for is an alias for has address

$ host is an alias for is an alias for has address

$ host is an alias for is an alias for has address

to organize fun with friends…

$ host is an alias for is an alias for has address

$ host is an alias for has address

$ host is an alias for is an alias for is an alias for is an alias for has address

$ host has address

and so on and so forth.

This means that, for an average user, an IPv6-only network is not feasible at all, and I think the idea that it’s a concept worth validating is dangerous.

What it does not mean is that we should just ignore IPv6 altogether. Instead we should prioritize it accordingly. We’re in a 2018 in which IoT devices are vastly insecure, so the idea of having a publicly-addressable IP for each of the devices in your home is not just uninteresting, but actively frightening to me. And for the companies that need the adoption, I would hope that the priority right now would be proper security, instead of adding an extra layer that would create more unknowns in their stack (because, and again it’s worth noting, as I had a discussion about this too, it’s not just the network that needs to support IPv6, it’s the full application!). And if that means that non-performance-critical backends are not going to be available over IPv6 this century, so be it.

One remark that I’m sure will arrive from at least a part of the readers of this post is that a significant number of the examples I’m giving here appear to be hosted on Akamai’s content delivery network, which, as we can tell from XBox’s website, supports IPv6 frontends. “It’s just a button to press, and you get IPv6; it’s not difficult, they are slackers!” is the follow-up I expect. For anyone who has worked in the field long enough, this warrants a facepalm.

The fact that your frontend can receive IPv6 connections does not mean that your backends can cope with it. Whether it is for session validation, fraud detection, or just market analysis, lots of systems need to be able to tell which IP address a connection came from. If your backend can’t cope with IPv6 addresses, your experience may vary between being unable to buy services and receiving useless security alerts. It’s a full-stack world.
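
As a toy illustration of the full-stack problem (the functions and schema here are made up, not any real system’s): backend code that assumes a client address fits in 32 bits works fine until the first IPv6 visitor shows up, while family-agnostic handling works for both.

```python
import ipaddress

def naive_ip_key(addr):
    """Pack a dotted-quad IPv4 address into a 32-bit int, the way many
    legacy fraud-detection or analytics schemas store client IPs."""
    a, b, c, d = (int(x) for x in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def ip_key(addr):
    """Family-agnostic replacement: works for IPv4 and IPv6 alike."""
    return int(ipaddress.ip_address(addr))

print(naive_ip_key("192.0.2.1"))  # works for IPv4
print(ip_key("2001:db8::1"))      # works for both families
# naive_ip_key("2001:db8::1") would raise ValueError: the full stack
# has to change, not just the frontend.
```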

Personal Infrastructure Services Security and Reliability

I started drafting this post just before I left Ireland for Enigma 2017. While at ENIGMA I realized how important it is to write about this, because it is too damn easy to forget about it altogether.

How secure and reliable are our personal infrastructure services, such as our ISPs? My educated guess is, not much.

I have already talked about the start of this story: my card got cloned and I had to get it replaced. Among the various services in which I needed to replace it, there were providers in both Italy and Ireland: Wind and Vodafone in Italy, 3 IE in Ireland. As to why I use an Irish credit card in Italy: SEPA Direct Debit does not actually work, so my Italian services cannot debit my Irish account directly, as I would like, but they can charge (nearly) any VISA or MasterCard credit card.

Changing the card on Wind Italy was trivial, except that when (three weeks later) I went to restore the original Tesco card, Chrome 56 reported the site as Not Secure, because the login page is served over a non-secure connection by default (which means it can be hijacked by a MITM attack). I bookmarked the HTTPS copy (which loads non-encrypted resources, making it still unsafe) and will keep using that for the near future.

Vodafone Italy proved more interesting in many ways. The main problem was that I could not actually set up the payment with the temporary card I intended to use (Ulster Bank Gold): the website would just error out on me with a backend error message. After annoying Vodafone Italy over Twitter, I found out that the problem is the BIN of the credit card; the Tesco Bank one is whitelisted in their backend, but the Ulster Bank one is not. But that is not all: all the pages of the “Do it yourself” area make mixed-content requests, making them not completely secure. This, though, is not completely uncommon.

What was uncommon, and scary, was that while I was trying to force them into accepting the card, I got to the point where Chrome would not auto-fill the form because it was not secure. Uh? It turned out that, unlike news outlets, Vodafone decided that their website, with payment information, invoices, and call details, does not need to be hardened against MITM, and instead allows stripping HTTPS just fine: non-secure cookies and all.

In particular, what happened was that the left-side navigation link to “Payment methods” used an explicit http:// link, and the further “Edit payment method” link is a relative link… so it would bring up the form on a non-encrypted page. I brought it up on Twitter (together with the problems with changing the credit card on file), and they appear to have fixed that particular problem.

But almost a month later, when I went to replace the card with the new Tesco replacement card, I managed to find something else with a similar problem: when going through the “flow” to change the way I receive my bill (I wanted the PDF attached), the completion stage redirected me to an HTTP page. And from there, even though the iframes are then loaded over HTTPS, the security is lost.

Of course there are two other problems: the login pane is rendered over HTTP, which means that Chrome 56 and the latest Firefox consider it not secure, and since the downgrade from HTTPS to HTTP does not log me out, the cookies are not secure, which makes it possible for an attacker to steal them without much difficulty. Particularly as the site does not seem to send any of the HTTP headers that would make the connection safe (per Mozilla Observatory).
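
For reference, this is roughly the set of response headers that would have prevented the downgrade described above. The values are illustrative, not Vodafone’s actual configuration: HSTS pins the browser to HTTPS, and the cookie attributes keep the session token off plain-HTTP requests.

```python
# Illustrative security headers for a site handling payment data; the
# values are examples, not a drop-in configuration.
SECURITY_HEADERS = {
    # Once seen, the browser refuses plain-HTTP connections to the site.
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    # Blocks mixed-content requests like the ones described above.
    "Content-Security-Policy": "default-src https:",
}

# The session cookie itself should never travel over HTTP, nor be
# readable from JavaScript.
SESSION_COOKIE = "session=opaque-token; Secure; HttpOnly"
```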

Okay, so these two Italian providers have horrible security, but at least I have to say that they mostly worked fine when I was changing the credit cards — despite the very cryptic error that Vodafone decided to give me because my card was foreign. Let’s now see two other (related) providers: Three Ireland and Three UK — ironically enough, between me having to replace the card and writing this post, Wind Italy completed its merger with Three Italy.

Both Three websites are actually fairly secure, as they have a SAML flow on a separate host for login, and then yet another host for account management. Even though they, too, get a bad grade on Mozilla Observatory.

What is more interesting about these two websites is their reliability, or lack thereof. For almost a month now, the Three Ireland website has not allowed me to check my connected payment cards, or change them. Which means the automatic top-up does not work and I have to top up manually. Whenever I try to get to the “Payment Cards” page, it starts loading and then decides to redirect me back to the homepage of the self-service area. It also appears to use a redirection method that is not compatible with some Chrome policy, as a complicated warning message shows up on the console when that happens.

Three UK is slightly better, but not by much. All of this frustrating experience happened just before I left for my trip to the USA for ENIGMA 2017. As I wrote previously, I generally use 3 UK roaming there. To use the roaming I need to enable an add-on (after topping up the prepaid account, of course), but the add-ons page kept throwing errors. And the documentation suggested calling the wrong number to enable the add-ons by phone. They gave me the right one over Twitter, though.

Without going into more examples of failures from phone providers, the question for me is: why is it that all we hear about security and reliability comes from either big companies like Google and Facebook, or startups like Uber and AirBnb, but not from ISPs?

While ISPs stopped being the default provider of email for most people years and years ago, they are still the one conduit we need to connect to the rest of the Internet. And when they screw up, they screw up big. Why is it that they are not driving the reliability efforts?

Another obvious question is whether the open source movement can actually improve the reliability of ISPs by building more tools for management and accounting, just as it used to be more useful to ISPs by building mail and news servers. Unfortunately, that would require admitting that sometimes you need to be able to restrict the “freedom” of your users, and that’s not something the open source movement has ever been able to accept.

Random quality

We all know that random numbers might not be very random unless you are very careful. Indeed, as the (now old) Debian OpenSSL debacle showed, a not-random-enough random number generator can be a huge breach in your defences. The other problem is that if you want really random numbers you need a big pool of entropy; otherwise, code requiring a huge chunk of random bytes will stall until enough data is available.

Luckily there are a number of ways to deal with this: one is to use the EntropyKey, while others involve either internal sources of entropy (which is what timer_entropyd and haveged do) or external ones (audio_entropyd, but a number of custom circuits and software exist as well). These fill the entropy pool, hopefully at a higher rate than it is depleted, providing random data that is still of high quality (there are other options, such as prngd, but as far as I can tell those are slightly worse in terms of quality).
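
On Linux you can watch the pool directly; a quick sketch (Linux-specific, procfs assumed) to check whether your entropy daemons are keeping up with consumers:

```python
# Read the kernel's current entropy estimate (in bits) from procfs.
# Linux-specific; the path does not exist on other systems.
def entropy_available(path="/proc/sys/kernel/random/entropy_avail"):
    with open(path) as f:
        return int(f.read().strip())

print(entropy_available())
```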

So, the other day I was speaking with Jaervosz, who’s also an EntropyKey user, and we were reflecting on whether, if there is not enough entropy during crypto operations, the process would stall or the generation would become less secure. In most cases, this shouldn’t be a problem: any half-decent crypto software will make sure not to process pseudo-random numbers (this is why OpenSSL key generation tells you to move your mouse or something).

What we ended up wondering about was how much software uses /dev/urandom (which re-uses the entropy pool when it’s starving) rather than /dev/random (which blocks on entropy starvation). It turns out there are quite a few. For instance, on my systems, I know that Samba uses /dev/urandom, and so does netatalk — neither of which makes me very happy.

A few ebuilds allow you to choose which one you want through the (enabled-by-default) urandom USE flag… but those I noted above aren’t among them. I suppose one thing we could do would be to go over a few ebuilds and see if we can make the choice configurable; for those of us who make sure to have a stable source of entropy, this change would be a very good way to stay safe.

Are you wondering if any of your mission-critical services are using /dev/urandom ? Try this:

# fuser -v /dev/{,u}random
                     USER        PID ACCESS COMMAND
/dev/random:         root      12527 F.... ekey-egd-linux
/dev/urandom:        root      10129 f.... smbd
                     root      10141 f.... smbd
                     root      10166 f.... afpd
                     flame     12356 f.... afpd

Also, if you want to make sure that any given service is started only after the entropy services, you can simply make it depend on the virtual service entropy (provided by haveged, or ekeyd if set to kernel output, or ekey-egd-linux if set to EGD output). A quick way to do so, without having to edit the init script yourself, is to add the following line to /etc/conf.d/$SERVICENAME:

rc_need="entropy"

Backing up cloud data? Help request.

I’m very fond of backups, after the long series of issues I had before I started doing incremental backups. I still have some backup DVDs around, some of which are almost unreadable, and at least one that is compressed with the xar archiver, in a format that is no longer supported, especially on 64-bit systems.

Right now, my backups are all managed through rsnapshot, with a bit of custom scripting over it to make sure that if a host is not online, the previous backup is kept. This works almost perfectly, if you exclude the problems with restored files and the fact that a rename causes files to be duplicated, as rsnapshot does not really apply any data de-duplication (and fdupes and similar programs tend to be… a bit too slow to use on 922GB of data).
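
For what it’s worth, the core of such a de-duplication pass is not complicated; the slow part is hashing terabytes of snapshots. A rough sketch (not rsnapshot functionality, just an illustration) that replaces byte-identical files under a directory with hard links:

```python
import hashlib
import os

def dedupe(root):
    """Replace byte-identical regular files under root with hard links.
    Returns the number of files linked. Reads whole files into memory,
    so this is only a sketch, not something to run on 922GB of data."""
    by_hash = {}
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            original = by_hash.setdefault(digest, path)
            if original != path and not os.path.samefile(original, path):
                os.unlink(path)
                os.link(original, path)
                linked += 1
    return linked
```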

But there is one problem that rsnapshot does not really solve: backup of cloud data!

Don’t get me wrong: I do back up the (three) remote servers just fine, but this does not cover the data that is present in remote, “cloud” storage, such as GitHub, Gitorious and BitBucket repositories, or delicious bookmarks, GMail messages, and so on and so forth.

Cloning the bare repositories and backing those up is relatively trivial: it’s a simple script to write. The problem starts with the less “programmatic” services, such as the aforementioned bookmarks and messages. Especially with GMail: copying the whole 3GB of data from the server each time is unlikely to work well, so it has to be done properly.
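
The repository part really is a few lines; a sketch of what I mean (URLs and paths are placeholders), building the git command to run for each repository — a mirror clone on the first run, a pruning fetch afterwards:

```python
import os

def backup_repo_cmd(url, backup_dir):
    """Return the git command to back up one hosted repository: a mirror
    clone on the first run, a pruning fetch on subsequent runs."""
    name = url.rstrip("/").split("/")[-1]
    if not name.endswith(".git"):
        name += ".git"
    dest = os.path.join(backup_dir, name)
    if os.path.isdir(dest):
        return ["git", "--git-dir", dest, "fetch", "--prune", "origin"]
    return ["git", "clone", "--mirror", url, dest]

# Run each returned command with subprocess.check_call(), then let
# rsnapshot pick up the resulting bare repositories.
```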

Has anybody any pointers on the matter? Maybe there’s already a smart backup script, similar to tante’s smart pruning script, that can take care of copying the messages via IMAP, for instance…

Service limits

There is one thing that, as PAM maintainer, I absolutely dread. No, it’s not cross-compilation, but rather the handling of pam_limits where services are concerned, and, in particular, start-stop-daemon.

I have said before that if you don’t properly use start-stop-daemon to start your services, pam_limits is not executed, and thus your startup won’t have limits applied. That is, indeed, the case. Unfortunately, the situation is not as black and white as I hoped, or as most people expected.

Christian reported to me the other day that he was having trouble getting user-based limits properly respected by services; I had run into similar situations before, but I had never gone as far as checking them out properly. So I went to check out his particular use case: dovecot.

Dovecot processes have their user and group set to specific limited users; on the other hand, dovecot has to be started as root to begin with: not only are the runtime directories writeable only by root, but it would also fail to bind the standard IMAP ports as a non-root user (since they are lower than 1024), and it would fail further on when starting user processes, which are most likely run partly with the privileges of the user logging in.

So with the following configuration in /etc/security/limits.conf, what happens?

* hard nproc 50
root hard nproc unlimited
root soft nproc unlimited

dovecot hard nproc 300
dovecot soft nproc 100

The first problem is that, as I said, dovecot is started as root, not as the dovecot user; and when the privileges are actually dropped, that happens directly within dovecot, and does not pass through pam_limits! So the obvious answer is that the processes are started with the limits of root, which, with the previous configuration, are unlimited. Unfortunately, as Christian reported, that was not the case: the nproc limit was set to 50 (and that was low enough that gradm killed it).
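
One way to see which limits a process actually inherited is to ask from inside it; a quick Python check (not part of dovecot, just a diagnostic sketch) of the nproc limit as pam_limits, or the lack of it, left it:

```python
import resource

# RLIMIT_NPROC is the per-user process-count limit that pam_limits sets
# from limits.conf; whatever value we read here is what this process
# inherited at startup.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("nproc soft:", soft, "hard:", hard)
```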

The first guess was that the user’s session limits are imposed after starting the service; but this is exactly what we’re trying to avoid by using start-stop-daemon. So, at first glance, we’ve got a problem. A quick check of OpenRC’s start-stop-daemon code shows what the problem actually is:

                if (changeuser != NULL)
                        pamr = pam_start("start-stop-daemon",
                            changeuser, &conv, &pamh);
                else
                        pamr = pam_start("start-stop-daemon",
                            "nobody", &conv, &pamh);

So here’s the problem: unless we’re using the --user parameter, start-stop-daemon applies the limits of the user nobody, not root. This means that we cannot easily configure per-user limits for the users that services use to drop their privileges, and that is a bad thing. How can we solve this problem?

The first obvious solution is adding something like --user foo --nochuid, which would make start-stop-daemon abide by the limits of the user provided, without performing any setgid() or setuid() call, leaving it to the software itself to take care of that. This is fast, but partly hacky. The second option is not exclusive, and probably should be implemented anyway: set up proper file-based capabilities on the software, then run it as the user directly in s-s-d. With maybe a bit of help from pam_cap to set them per-user rather than per-file.

At any rate, this is one thing that we should be looking into. Sigh!

To finish off, there is one nasty situation that I haven’t checked yet, and it actually worries me: if you set the “standard user” limits lower than those of nobody (since that is what services would get), users can probably work around their limits by using start-stop-daemon to start their own code. I’m not sure if it works, but if it does, we’ve got a security issue on our hands, at least in OpenRC!

Some notes about Google Wave

I’m still not sure about the whole hype around Google’s new service, Wave. Thanks to Jürgen, I got an invite as well, and I’ve been fiddling with it from time to time… I’m not saying it’s useless, but I don’t think it’s excessively useful either.

What I think Google was able to do here was a lot of pre-hype of something that, generally, is once again mediocre (and the code definitely was: in the first days I tried it out, the “something went wrong, please refresh” message was absolutely common). And again, the whole “invite frenzy” is working very well for them. The idea that it’s something only a “limited set” of people can look at makes the product much more desired than it would be if it were simply accessible to anybody.

And to be honest, every time I read about people “stealing invites” and tricking others out of entering the preview, I start to worry about the destiny of humanity as a species. At least, I have yet to see a literal telephone sanitizer around, although I’m not entirely positive that this will remain the case in the future. Again, don’t get me wrong: I was curious about Wave as well, given how much I read about it, also on twitter/identica from other FLOSS developers, but at the same time, I wasn’t really going to jump through any hoops to find out how relevant it actually was.

So, the first note I have to make is that the interface really seems designed to be one of those web applications that try to replace the standard desktop, with widgets that behave like standard windows and so on. I don’t really like that idea because I still think a standard desktop is very useful (I’m a bit worried about Gnome Shell as well, to be honest): I don’t make much use of Apple’s Dashboard, nor do I use stuff like iGoogle, or the widget support in my Bravia LCD TV. But I guess this might actually be Google’s strategy for their Chrome OS thing.

Behind all the hype, I define Wave (to Luca’s laughs) as the mailing list’s equivalent of what IM is to email: never going to replace it, but sometimes easier to deal with. That’s probably somewhat of a good thing, given that we’re still using IRC as the main many-to-many communication channel… which is definitely not something I like (for the multitude of shortcomings of the IRC protocol). On the other hand, I find Wave quite crippled by the fact that there is no way to define groups, or lists, of contacts. It would be nice to have them, because then I could just “send a wave” to the Gentoo developers in there to ask for some help or plan something out, and so on. It’s a strange feature to lack, given that both Facebook and Twitter seem to have taken pride in implementing such lists in the months that passed between the Wave announcement and the actual opening of the public beta.

One interesting thing is that, while Google implemented a new schema for addresses – which sounds quite pointless to me; one thing I liked about Google Talk is that it allowed me to use the same address for email, Jabber and MSN alike – it adds your Google Talk contacts to the Google Wave contact list by default as they register. I guess this can be considered a minimal form of feature sharing (the same contact handling applies to Google Reader subscribers). But what I definitely liked about all this is the way it handles contacts’ names.

For those who actually set up a proper name in their Google profile, Google Wave uses the first name for display by default (so you’d probably find me as Diego Elio — or Diego, I’m not sure); when more than one contact shares the same first name, though, it displays the start of the surname as well (so I have Jason S and Jason A in my contacts right now). Other software should probably learn from that — both open source and proprietary.
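The rule as I understand it can be sketched in a few lines of awk (the names here are made up, and display_names is just my own name for the sketch): show the first name alone, unless two contacts share it, in which case append the initial of the surname:

```shell
# Disambiguate display names: first name only, plus the surname's first
# letter when two or more contacts share the same first name.
display_names() {
    # input: one "First Last" contact per line
    awk '{
        count[$1]++; first[NR] = $1; last[NR] = $2
    }
    END {
        for (i = 1; i <= NR; i++)
            if (count[first[i]] > 1)
                print first[i] " " substr(last[i], 1, 1)
            else
                print first[i]
    }'
}

printf 'Jason Smith\nJason Allen\nDiego Petteno\n' | display_names
# prints: Jason S / Jason A / Diego
```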

All in all, what I can judge for now is mostly the interface at first glance; while my contact list is starting to fill up, I don’t see anything in there yet that makes it more usable than a standard IM chat. It would be even less useful if Jabber/GTalk had working multi-user chats akin to MSN’s or Skype’s (don’t get me started on the “usability” of Jabber rooms). The fact that it needs the page to stay open (and that its JavaScript seems to positively slow Firefox down — I guess that’s their main reason to push for Chrome at this point, or the other way around: Wave is their way to push for Chrome) makes the whole thing a lot less useful overall; even just adding a bot to GTalk to tell you when waves are updated would have been much more useful.

And finally, just one little, tiny note for Google: why on earth can’t you seem to settle on a single interface style across applications? Google Reader and Gmail already have different interfaces; Wave has a drastically different one as well; Google Code even has the navigation bar on the right (when all the rest have it on the left). The two services with the most similar interfaces seem to be Gmail and Google Calendar, but there are quite a few subtle differences between the two… and that only applies to the default Gmail theme anyway.

Multiple password recovery failures

For safety, I never reuse the same exact password, except for the very generic one I use on services I don’t care about at all; any service that really keeps information about me, like Amazon and various other hardware (and software) suppliers, gets a different password. I try to stick, whenever I can, with the same username, although sometimes I’m assigned a username already (and sometimes they use my surname, including the accented “ò” that ensures funny stuff will happen).

Now, with so many different passwords, it’s almost inevitable that at some point I’ll forget one; I actually rely on the save-password feature of the various OSs/browsers to remember them for me (on the other hand, I do change some passwords periodically). Sometimes, though, when I reset Firefox, change computer, or simply use a new box, I find myself in a spot of trouble because I can’t remember which password I was using on a given site.

This is usually not too bad, since almost all sites nowadays provide a “Lost Password” feature. The problem is that such a feature is, often enough, implemented in one of many bad ways:

  • don’t send me my old password! If you’re able to send me my old password, you already have two failure points: first, you have my password saved in clear text in your database (which is bad, because if your database is compromised, your users’ passwords are readable); second, you sent me an email, most likely over a clear-text channel, with the password in clear text;
  • don’t just change my password! What if somebody else asked for my password to be changed, just to waste my time? Send me a token to change the password, please;
  • don’t just send me a permanent new password! Even though I’m smart enough to change it right away, make it a one-time temporary password that requires me to change it immediately, pretty please; this way nobody can find a working password in my mail archive later (the stolen-laptop kind of problem).
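A minimal sketch of the token approach from the list above (the “store in the DB” step is just an illustration, not a real storage layer): generate an unguessable one-time token, mail the token itself to the user, and keep only its hash server-side, so even a database leak doesn’t expose valid reset links:

```shell
# Generate a random, single-use reset token. The token goes into the
# email; only its hash goes into the database, and the record should be
# invalidated on first use and after a short expiry.
token=$(openssl rand -hex 32)
token_hash=$(printf '%s' "$token" | sha256sum | cut -d' ' -f1)

echo "mail to the user: ${token}"
echo "store in the DB:  ${token_hash}"
```

When the user follows the link, you hash what they present and compare it with the stored value — the clear-text token never needs to exist on the server after the email is sent.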

While I’m not the kind of paranoid person who would use one-time passwords everywhere (well, the banking account aside), I’m paranoid enough to be doubtful when a service does not provide SSL-based login (okay, even my own blog doesn’t, but I mean for important stuff), and I seriously get scared when a service that remembers – for instance – my credit card sends me an email with my password in clear text. Which is why I use different passwords in the first place.

I learnt this the hard way, actually: the ASP web application used for the forum of an ancient gaming site I was involved in stored the passwords in clear text, in an Access database file that was readable via HTTP if you knew the path. That got hacked quite easily (I only started administering that box after it happened), and I was using the same password for lots of services.

Facebook, usefulness of

Seems to me like Facebook is seen in either of two ways: the coolest website in the world, or the most useless one. Given that I am subscribed too, one would expect me not to be in the latter category; but I really would take offence if you put me in the former!

Indeed, I actually fall into a third category: I find it useful up to a point, but not very much. I do use it, and I have both friends and a few work-related contacts (not many); I also have Gentoo users and developers, but I tend to be selective about whose requests I accept (so if I spoke with you once or twice, it’s unlikely; if you’re somebody I happened to collaborate with for a while at least, then I’ll probably accept). I don’t feel like it’s an invasion of my privacy, to be honest, since my statuses are almost the same there as they are on identica and twitter; my notes are usually just my own blog posts; I might do some non-idiotic meme from time to time (more on that later); and I don’t really do quizzes or use strange applications. I might have some not-so-public photos, but those are really nothing you’d have fun seeing me in, since they are usually nights out with friends; and if it were just up to me they could be public too, I don’t care. I do have my full details – phone numbers, street address, email and IM usernames – but they are not really private details, given that my phone numbers and addresses correspond to my work details, and for the rest, well, let’s just say I don’t have much imagination and you can find me as Flameeyes almost everywhere.

So what’s the usefulness of Facebook for me at this point in time? Well, I do aggregate my blog there, to show it to my friends and let them share it with others so that more people can read what I write (I hoped my post about Venetian public offices would end up shared more, but it doesn’t seem like my friends are interested in real politics, beyond that of parties); I reach more people who don’t follow identica or twitter, and I follow them too, so it doesn’t really add much there either. When somebody I have as a contact on Facebook asks me for my details, my answer is just “look at my Facebook profile” (it’s there for a reason). In general, it’s just another medium, like this blog, like planet aggregators and so on. It does not really add much; it’s little more than an overhyped address book.

One note that is often made is that finding “people you haven’t seen in years” is pointless because… you haven’t seen them in years for a reason. Sometimes, though, it might just be a matter of having lost contact: people go different ways but are still interested in getting back in touch and hearing from each other from time to time, so it works as a medium for that too.

And on a similar note, why do I find memes interesting, or even useful? Well, sometimes you know somebody, or at least have met somebody, but not well enough to know the little personal details; memes might strengthen a bond between people by providing opportunities to compare and identify similar tastes and other common ground. In particular, note-based (or blog-based) memes don’t require you to use stupid third-party applications for that. Yes, I know it might sound silly, but I can use the example of an ex-classmate of mine whom I hadn’t seen in almost ten years for various reasons, until Facebook came along and we found that we now have common interests; people grow up and change.

Unfortunately, in all this I don’t see anything that can save Facebook from its financial problems: it really does not work for advertising, most of the applications seem to border on fraud, there is no entry fee, nor does there seem to be any particularly interesting or important paid service (as a counter-example, Flickr’s paid version, with no limit on photo uploads and access to the original images, is a service that even I pay for!). For this reason, I really don’t rely (sorry for the play on words) on Facebook to store important information (so yes, I do keep my address book updated outside of it), and I wouldn’t be surprised if next month they started charging for the service, or if in four they closed down entirely. Nor would I miss them.

And to finish, why on earth am I writing about Facebook now? Well, I just want to warn my readers about why, in the next few days, they might find some Italian posts talking about Facebook; in turn, that is part of my plan to try teaching my friends and acquaintances how to behave on the network, and with a computer. Hopefully that will allow me to write it down once rather than re-explain it every time I have to take over a PC to clean it up from viruses and other issues.

User Services

A little context for those reading me: I’m writing this post on a Friday night when I was planning to meet friends (why I didn’t is a long story, and not important now). After I accepted that I wouldn’t have a friendly night out, I decided to finish a job-related task, but unfortunately I’ve had some issues with my system. Somehow the latest radeon driver is unstable (well, it is an experimental driver after all), and it messes up compiz; in turn, after a while either X crashes or I’m forced to restart it. This wouldn’t be a problem if the emacs daemon worked as expected. Since it doesn’t, I lose my workspace, with the open files and everything related to them. It’s obnoxious. Since this has happened four times already today, I decided to take the night off, but I wasn’t in the mood for playing, so I settled for watching Pirates of the Caribbean 2 on Blu-ray and writing out some notes on the topics I’ve wanted to cover for quite a while.

The choice of topic is related to the context I’ve just described. As I said, GNU Emacs behaves badly when it comes to the daemon. The idea of the daemon is to share buffers (open files) between ttys, network and graphical sessions, and to allow restarting those sessions without losing your settings, your data and your open files; but it’s pretty badly implemented.

A few months ago I reported that as soon as X was killed by anything (or even closed properly), the whole emacs daemon went down. After some debugging it turned out to be a problem with the handling of message logging: when a client closed, it sent a message to be logged by the emacs daemon, and since the daemon had no TTY session to write it to, it died. That problem has been solved.

Now the problem appears to be the same, mirrored: after X dies, the emacs daemon process keeps running, but as soon as I open a new client, it dies. I guess it’s still trying to log. As of today the problem still happens with the CVS version.

So anyway, this reminded me of a problem I had already wanted to discuss on the blog: user-tied services. Classically, you had user-level software that is started by a user, and services that are started by the init system when the system boots. With time, software became less straightforward: we have hotplugged services, which start up when you connect hardware like, for instance, a Bluetooth dongle; and we have session software, which is started when you log in and stopped once you exit.

Now, most session-related software is started when you log into X and stopped when you exit; sometimes, though, you want processes to persist between sessions. This is the case with emacs, but also my use case for PulseAudio, since I want it to keep running from before I log in until the system shuts down. There are more cases of similar issues, but let’s start with these for now.

So how do we handle these cases? Well, for PulseAudio we have an init script for the system-wide daemon. It works, but it’s not the suggested way to run PulseAudio (on the other hand, it’s probably the only way to have a multi-user setup with more than one user able to play sound, but that’s for another day too). For emacs, we have a multiplexed init script that provides one service per user; a similar method is available for other services. Indeed, on my list of things to work on for PulseAudio is adding a similar multiplexed init script to run per-user sessions of PulseAudio without using the system-wide instance (which should solve a number of problems).
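Such a per-user multiplexed init script could look roughly like the following sketch — not the actual script in the tree, just the general idea, assuming a baselayout/openrc-style runscript. A symlink named, say, pulseaudio.flameeyes would run a daemon for the user flameeyes, with the user name derived from the service name itself:

```shell
#!/sbin/runscript
# Hypothetical sketch of a multiplexed per-user init script: the user
# name is everything after the first dot in the service name, so a
# symlink pulseaudio.flameeyes starts PulseAudio for user flameeyes.
EUSER="${SVCNAME#*.}"

depend() {
    need localmount
}

start() {
    ebegin "Starting PulseAudio for user ${EUSER}"
    start-stop-daemon --start --background \
        --user "${EUSER}" \
        --make-pidfile --pidfile "/var/run/pulseaudio-${EUSER}.pid" \
        --exec /usr/bin/pulseaudio -- --daemonize=no
    eend $?
}

stop() {
    ebegin "Stopping PulseAudio for user ${EUSER}"
    start-stop-daemon --stop \
        --pidfile "/var/run/pulseaudio-${EUSER}.pid"
    eend $?
}
```

One script, one symlink per user; but as noted below, adding the symlink to a runlevel and starting or stopping the service still requires root.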

So is the issue solved by the multiplexed per-user init script? Unfortunately, no. To add the init scripts to the runlevels, you need root privileges. To start, stop and restart the services, you also need root privileges. While you can use sudo to let users run the start/stop commands of the init script, this is far from a proper solution.

What I’d like to have one day is a way to have user-tied services, in three types of runlevels: always running (started when the machine boots, stopped when it shuts down); after first login (started at the user’s first login, never stopped until shutdown); and while logged in (started at the user’s first login, stopped when their last session logs out).

At that point it would be possible to provide init scripts capable of per-user multiplexing for stuff like mpd too, so that users would actually have the flexibility of choosing how to run any software in the tree.

Unfortunately I don’t have any idea on how to implement this right now, but I guess I could just throw this in for the Summer of Code ideas.