We need Free Software Co-operatives, but we probably won’t get any

The recent GitHub craze got a number of Free Software fundamentalists to hurry away from GitHub towards other hosting solutions.

Whether it was GitLab (a fairly natural choice given the nature of the two services), BitBucket, or SourceForge (which is trying to rebuild a reputation as a Free Software friendly hosting company), there is no shortage of SaaS providers to choose from.

At the same time, a number of projects have been boasting (and maybe a bit too smugly, in my opinion) that they self-host their own GitLab or similar software, and suggesting that other projects do the same to be “really free”.

A lot of the discourse appears to be missing nuance on the compromises involved in using SaaS hosting providers, self-hosting for communities, and self-hosting for single projects, so I thought I would gather my thoughts around this in a single post.

First of all, you probably remember my thoughts on self-hosting in general. Any solution that involves self-hosting will require a significant amount of ongoing work: you need to make sure your services keep working, and stay safe and secure. Particularly for FLOSS source code hosting, it’s of primary importance that the integrity and safety of the source code are maintained.

As I already said in the previous post, this style of hosting works well for projects that have a community, in which one or more dedicated people can look after the services. And in particular for bigger communities, such as KDE, GNOME, FreeDesktop, and so on, this is a very effective way to keep stewardship of code and community.

But for one-person projects, such as unpaper or glucometerutils, self-hosting would be quite bad. Even for xine, with a single person maintaining just the site and Bugzilla, it got fairly bad. I’m trying to convince the remaining active maintainers to migrate this to VideoLAN, which is now probably the biggest Free Software multimedia project and community.

This is not a new problem. Indeed, before people rushed to GitHub (or Gitorious), they rushed to other services that provided similar integrated environments. When I became a FLOSS developer, the biggest of them was SourceForge — which, as I noted earlier, was recently bought by a company trying to rebuild its reputation after a significant loss of trust. These environments don’t only include SCM services, but also issue (bug) trackers, contact email, and so on and so forth.

Using one of these services is always a compromise: not only do they require an account on each service to be able to interact with them, but they also come with a level of lock-in, simply because of the nature of URLs. Indeed, as I wrote last year, going through my old blog posts to identify those referencing dead links reminded me of just how many project hosting services shut down, sometimes dragging along (Berlios) and sometimes abruptly (RubyForge).

This is a problem that does not only involve services provided by for-profit companies. Sunsite, RubyForge and Berlios didn’t really have companies behind them, and the last one is probably one of the closest things to a Free Software co-operative that I’ve seen outside of the FSF and friends.

There is of course Savannah, the FSF’s own Forge-lookalike system. Unfortunately, for one reason or another, it has always lagged behind the featureset (particularly around security) of other project management SaaS. My personal guess is that this is due to the political nature of hosting any project on the FSF’s infrastructure, even outside of the GNU project.

So what we need is a politically-neutral, project-agnostic hosting platform run as a co-operative effort. Unfortunately, I don’t see that happening any time soon. The main problem is that project hosting is expensive, whether you use dedicated servers or cloud providers, and it takes full-time system administrators to keep it running smoothly and securely. You need professionals, too — or you may end up like lkml.org, down when its one maintainer goes on vacation and something happens.

While there are projects that receive enough donations to sustain these costs (see KDE, GNOME, VideoLAN), I’d be skeptical that an unfocused co-operative would be able to take care of this. Particularly if it does not restrict the creation of new projects and repositories, as that requires particular attention to abuse, and good guidelines about which content is welcome and which isn’t.

If you think that that’s an easy task, consider that even SourceForge, with a review process that used to take a significant amount of time, managed to let joke projects use their service and trade on their credentials.

A few years ago, I would have said that the SFLC, SFC and SPI would be the right actors to set up something like this. Nowadays? Given their infighting, I don’t expect them to be of any use.

Project Memory

Through a series of events whose start I can’t quite pin down, I began fixing broken links in my old blog posts, most of which I ended up having to find on the Wayback Machine. While doing that, I ended up finding some content from my very old blog: one that was hosted on Blogspot, written only in Italian, and frankly written by an annoying brat who needed to learn something about life. Which I did, of course. The story of that blog is for a different time and post; for now I’ll focus on a different topic.

When I started looking at this, I went through a lot of my blog posts and updated a number of broken links, either by pointing them at the Wayback Machine or by removing the link altogether. I focused on those links that can easily be grepped for, which turns out to be a very good side effect of having migrated to Hugo.
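As a sketch of what that grepping amounts to, here is roughly the kind of scan involved; the content path and the list of dead hosts are hypothetical, and a real run would need tweaking:

    #!/usr/bin/env python3
    # Scan Hugo content for links pointing at known-dead hosts.
    # A sketch: the content path and the host list are hypothetical.
    import pathlib
    import re

    DEAD_HOSTS = ("berlios.de", "rubyforge.org", "gitorious.org",
                  "code.google.com", "gemcutter.org")

    LINK_RE = re.compile(r'https?://[^\s)"\'<>]+')

    for post in pathlib.Path("content/posts").rglob("*.md"):
        text = post.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            for url in LINK_RE.findall(line):
                if any(host in url for host in DEAD_HOSTS):
                    print(f"{post}:{lineno}: {url}")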

This meant, among other things, removing references to identi.ca (which came to my mind because of all the hype I hear about Mastodon nowadays), removing links to my old “Hire me!” page, and so on. And that’s where things started showing a pattern.

I ended up updating or removing links to Berlios, Rubyforge, Gitorious, Google Code, Gemcutter, and so on.

For many of these, it turned out I don’t even have a local copy (at hand, at least) of some of my smaller projects (mostly the early Ruby stuff I’ve done). But I almost certainly have some of that data in my backups, some of which I actually have in Dublin and want to start digging into at some point soon. Again, this is a story for a different time.

The importance of the links to those project management websites is that, for many projects, those pages were all you had about them. And for some of those, all the important information was captured by those services.

Back when I started contributing to free software projects, SourceForge was the badge of honor of being an actual project: it would give you space to host a website, as well as the ability to have source code repositories. And this was the era before Git, before Mercurial and the other DVCSes, which meant either you had SourceForge, or you likely had no source control at all. But SourceForge admins also reviewed (or at least claimed to review) every project that was created, so creating a project on the platform was not straightforward; you would do that only if you really had the time to invest in the project.

A few projects were big enough to have their own servers, and a few were hosted on other “random” project management sites, which for a while appeared to sprout up because the Forge software used by SourceForge was (for a while at least) released as free software itself. Some of those websites were specific in nature, others more general. Over time, BerliOS appeared to become the anti-SourceForge, with a streamlined application process and, most importantly, with Subversion years before SF would gain support for it.

Things got a bit more interesting later, when the likes of Bazaar, Mercurial and Git started appearing on the horizon, because at that point proper source control could be had without needing special servers (without publishing it, at least, although there were ways around that). This at the same time made some project management websites redundant, and others more appealing.

But let’s take a look at the list of project management websites that I have used and that are now completely or partly gone, with or without their history:

  • The aforementioned BerliOS, which teetered back and forth a couple of times. I had a couple of projects over there, which I ended up importing to GitHub, and I also forked unpaper there. The service and the hosting were taken down in 2014, but (all?) the projects hosted on the platform were mirrored on SourceForge. As far as I can tell they were mirrored read-only, so for instance I can’t de-duplicate the unieject projects, since I originally wrote it on SourceForge and then migrated it to BerliOS.

  • The Danish SunSITE, which hosted a number of open-source projects for reasons that I’m not entirely clear on. NoX-Wizard, an open-source Ultima OnLine server emulator, was hosted there, for reasons that are even murkier to me. The site got renamed to dotsrc.org, but they dropped all the hosting services in 2009. I can’t seem to find an archive of their data; NoX-Wizard was migrated to SourceForge during my time there, so that’s okay by me.

  • RubyForge used the same Forge app as SourceForge, and was focused on Ruby module development. It was abruptly terminated in 2014, and as it turns out I made the mistake of not importing my few modules explicitly. I should have them in my backups if I start looking for them; I just haven’t done so yet.

  • Gitorious set itself up as an open, free software competitor to GitHub. Unfortunately it clearly was not profitable enough, and it got acquired, twice. The second time was by the competing service GitLab, which had no interest in running the software. A brittle mirror of the project repositories (no user pages) is still online, thanks to Archive Team. I originally used Gitorious for my repositories rather than GitHub, but I came around and moved everything over before they shut the service down. Well, almost everything: as it turns out, some of the LScube repos were not saved, because they were only mirrors… except that the domain for that project expired, so we lost access to the main website and Git repository, too.

  • Google Code was Google’s project hosting service, which started by offering Subversion repositories, downloads, issue trackers and so on. Very few of the projects I tracked used Google Code to begin with, and it was finally shut down in 2015, with all projects made read-only except for setting up a redirection to a new homepage. The biggest project I followed on Google Code was libarchive, and they migrated it fully to GitHub, including migrating the issues.

  • Gemcutter used to be a repository for Ruby gem packages. I actually forget why it was started, but for a while it was the alternative repository where a lot of the cool kids stored their libraries. Gemcutter got merged back into rubygems.org, and the old links now appear to redirect to the right pages. Yay!

With such a list of project hosting websites going the way of the dodo, an obvious conclusion to draw is that hosting things on your own servers is the way to go. I would still argue otherwise. Despite the number of hosting websites going away, it feels to me like the vast majority of the information we have lost over the past 13 years is from blogs and personal websites badly used for documentation. With the exception of RubyForge, all the above examples were properly archived one way or another, so at least the majority of the historical memory is not gone at all.

Not using project hosting websites is obviously an option. Unfortunately it comes with the usual problems, and with even higher risks of losing data. Even GitLab’s snafu had a higher chance of being fixed than whatever your one-person project has when the owner gets tired, runs out of money, graduates from university, or even dies.

So what can we do to make things more resilient to disappearing? Let me suggest a few points of action, which I think are relevant and possible right now to make things better for everybody.

First of all, let’s all make sure that the Internet Archive stays around, by donating. I set up a €5/month donation, which gets matched by my employer. The Archive provides, among other things, the Wayback Machine, which is how I can still fetch some of the content both from my past blogs and from the blogs of people who deleted or moved them, or even passed away. The Internet is our history; we can’t let it disappear without an effort to preserve it.

Then, for what concerns the projects themselves, it may be a bit less clear-cut. The first thing I’ll be much more wary about in the future is relying on the support sites when writing comments or commit messages. Issue trackers get lost, or renumbered, and so references to them break too easily. Be verbose in your commit messages, and if needed quote the issue, instead of just writing “Fix issue #1123”.
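For example, something along these lines (the issue number and wording are made up, just to show the shape):

    Bad:  Fix issue #1123

    Good: Fix crash when parsing empty configuration files

          The parser assumed at least one section was present and
          dereferenced a null pointer otherwise. Reported as issue
          #1123 ("segfault on empty config"), quoted here because
          the tracker may not outlive this repository.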

Even mailing lists are not safe. While Gmane is supposedly still online, most of the Gmane links from my own blog are broken, and I need to find replacements for them.

This brings me to the following problem: documentation. Wikis made documenting things significantly cheaper, as you don’t need to learn much, neither in terms of syntax nor in terms of process. Unfortunately, backing up wikis is not easy because a database is involved, and when taking over a project whose maintainers are unresponsive, it’s very hard to find a good way to import the wiki. GitHub makes things easier thanks to GitHub Pages, and that’s at least a starting point. Unfortunately it makes the process a little messier than a wiki, but we can survive that, I’m sure.

Myself, I decided to use a hybrid approach. Given that some of my projects, such as unieject, managed to migrate from SourceForge to BerliOS, to Gitorious, to GitHub, I have now set up a number of redirects on my website, so that their official website will read https://www.flameeyes.eu/p/glucometerutils, and it’ll redirect to wherever I’m hosting them at the time.
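I haven’t reproduced the actual configuration here, but assuming Apache, such permanent redirects could be as simple as a couple of mod_alias rules; the target URLs below are illustrative, only the /p/ scheme reflects the convention above:

    # Stable "official" URLs that follow the projects wherever they move.
    # Targets are examples, not necessarily the current homes.
    Redirect permanent /p/glucometerutils https://github.com/Flameeyes/glucometerutils
    Redirect permanent /p/unieject        https://github.com/Flameeyes/unieject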

Revisiting my opinion of GitHub — How do you branch a read-only repository?

I have expressed quite a bit of discontent with GitHub before, regarding the way they keep suggesting that people “fork” projects. I’d like for once to state that, while I find “fork” the wrong word, the idea is not too far from ideal.

Fast, reliable and free distributed SCMs have defined a new landscape in the world of Free Software, as many have probably noted already; but without the help of GitHub, Gitorious and BitBucket I wouldn’t expect them to have made such an impact. Why? Because hosting a repository is generally not an easy task for non-sysadmins, and finding where the original code is supposed to be is also not that easy, a lot of the time.

Of course, you cannot “rely on the cloud” to be the sole host for your content, especially if you’re a complex project like Gentoo, but it doesn’t hurt to be able to tell users “You want to make changes? Simply branch it here and we’ll take care of the management”, which is what those services enabled.

Why am I writing about this now? Well, it has happened more than once that I needed to publish a branch of a package whose original repository was hosted on SourceForge, or on the GNOME source repositories, where you cannot branch it to make changes. To solve that problem I set up a system on my server to clone and mirror the repositories over to Gitorious; the automirror user is now tracking twelve repositories and copying them over to Gitorious every six hours.
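The automirror setup itself isn’t published, but the core of such a system is just a bare mirror clone refreshed and pushed from cron; a minimal sketch, with a made-up repository list:

    #!/usr/bin/env python3
    # A sketch of an "automirror" cron job; the repository pair below
    # is illustrative, not one of the actual twelve.
    import pathlib
    import subprocess

    MIRRORS = {
        "example": ("git://git.example.org/example.git",
                    "git@gitorious.org:example/example.git"),
    }

    BASE = pathlib.Path.home() / "mirrors"
    BASE.mkdir(parents=True, exist_ok=True)

    for name, (upstream, mirror) in MIRRORS.items():
        repo = BASE / f"{name}.git"
        if not repo.exists():
            # --mirror clones every ref, not just branch heads.
            subprocess.run(["git", "clone", "--mirror", upstream, str(repo)],
                           check=True)
        subprocess.run(["git", "fetch", "--prune", "origin"],
                       cwd=repo, check=True)
        # Push all refs, removing any that disappeared upstream.
        subprocess.run(["git", "push", "--mirror", mirror],
                       cwd=repo, check=True)

Run from cron every six hours, that is essentially all the automirror user does.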

As it happens, last night I was hacking a bit at ALSA — mostly I was looking into applying what I wrote yesterday regarding hidden symbols to the ALSA plugins, as my script found duplicated symbols between the two Pulse plugins (one is the actual sound plugin, the other is the control one), but I ended up doing some general fixing of their build system as well, since it was slightly broken. I sent the patches to the mailing lists, but I wanted to have them available as a branch as well.
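The script itself isn’t reproduced here, but the core of that duplicated-symbol check can be sketched with binutils’ nm; the plugin file names below are what I’d expect for the Pulse plugins, but treat them as assumptions:

    #!/usr/bin/env python3
    # Find exported symbols defined by more than one shared object.
    # A sketch of the kind of check described above; the default
    # plugin paths are assumptions. Relies on binutils' nm.
    import subprocess
    import sys

    def defined_symbols(library):
        # -D: dynamic symbols; --defined-only: skip undefined references.
        out = subprocess.run(["nm", "-D", "--defined-only", library],
                             capture_output=True, text=True,
                             check=True).stdout
        return {line.split()[-1] for line in out.splitlines() if line.strip()}

    libs = sys.argv[1:] or ["libasound_module_pcm_pulse.so",
                            "libasound_module_ctl_pulse.so"]
    seen = {}
    for lib in libs:
        for sym in defined_symbols(lib):
            seen.setdefault(sym, []).append(lib)

    for sym, where in sorted(seen.items()):
        if len(where) > 1:
            print(f"{sym}: {', '.join(where)}")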

Well, now you’ve got most of the ALSA project available on Gitorious, for all your branching and editing needs. Isn’t Git lovely?

Before finishing, though, I’d like to point out that there is one thing I’m not going to change my opinion on: the idea of “forking” projects is, in my opinion, very bad, as I wrote in the article at the top of the page. I actually like Gitorious’s “clones” better, as that’s what they should be: clones with branches, not forks. Unfortunately, as I wrote some other time, Gitorious is not that great when it comes to licensing constraints, so for some projects of mine I’ve been using GitHub instead. Both services are, in my opinion, equally valuable for the Free Software community, and beyond.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015, and the service was shut down. This means that all the references above are useless now. Sorry.

What’s wrong with release notifications?

Distributions like Gentoo have one huge issue with users: they all demand their updates the same moment they are released. This is why many people, including me, have ranted before about the meme of 0day bumps. Generally speaking, we tend to know about a new release of a package we maintain, because we follow its development, tightly or loosely. Unfortunately, it’s quite possible that a new release passes into the background for whatever reason, and the result is, well, a package that doesn’t get bumped. Note here: it’s entirely possible for a developer to forget to bump his own (upstream) package; shit happens sometimes.

Most of the time, to solve this kind of problem, we can use one of the many tools at our disposal to receive release notifications… unfortunately this is not all that feasible nowadays: it used to be better, and it has definitely gotten worse in the past months! Given that most upstreams barely have a release publishing procedure, most of us preferred notifications that are not “actively handled” by the developers, but rather happen as a by-product of the release itself: this way even sloppier releases had their notifications sent out.

The biggest provider of by-product release notifications was, once upon a time, SourceForge — I say “once upon a time” because they stopped doing that. While I can understand that a lot of the services offered by SF were redundant, and that most projects ended up setting up better, if less integrated, software anyway (such as phpBB, Mantis – as Bugzilla wouldn’t work – or various wikis), and I can appreciate that the old File Release System was definitely overcomplex, I can’t see why they stopped allowing users to subscribe to notifications. The emails they used to send are now loosely replaced by the RSS feed of released files… the problem is that the feed is huge (as it lists all the files ever released for a project) and not sorted chronologically. Sure, there is still Freshmeat, but to have notifications working there you’re asking the upstream maintainer to explicitly remember to bump the version on a different website, and that’s a bit too much for most people.
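To give an idea of the busywork this leaves packagers with, here is a sketch that fetches such a files feed and sorts it chronologically itself; the feed URL is illustrative, not a stable endpoint:

    #!/usr/bin/env python3
    # Fetch a project's released-files RSS feed and print the newest
    # entries. A sketch of the polling a packager is left with; the
    # feed URL below is illustrative.
    from email.utils import parsedate_to_datetime
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    FEED = "https://sourceforge.net/projects/someproject/rss"

    tree = ET.parse(urlopen(FEED))
    items = []
    for item in tree.iter("item"):
        title = item.findtext("title", default="?")
        pubdate = item.findtext("pubDate")
        if pubdate:
            items.append((parsedate_to_datetime(pubdate), title))

    # The feed itself is not chronological, so sort it ourselves.
    for date, title in sorted(items, reverse=True)[:10]:
        print(date.date(), title)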

You’d expect that the other sites that took the place of SourceForge got better at handling these things, wouldn’t you? But that’s definitely not true. Strangely enough, the good examples here seem to come from the Ruby community (I say “strangely enough” because you might remember that I ranted so much about missing practises, mandatory procedures, metadata and so on).

First of all, RubyForge still provides release notifications by mail (good!), and second, kudos to Gemcutter, which allows you to subscribe to a (semi-)private RSS feed with the new releases of just a subset of all the gems released (the Gentoo Ruby team has a common one where the gems present in Portage are subscribed — if I remember to add them, that is). It works, sorta. It still requires you to poll something, although in the end you’re just switching the mail reader for the feed reader, so it’s not much of a change. The one problem is that you end up receiving a bit more noise than I’d like, as gems that are available in binary form for Windows and Java are listed more than once after an update. But it’s good that it actually is integrated with the gem release procedure.

On the other hand, excluding all the packages that have no hosting facility at all (which sometimes, as in sudo’s case, have better announcement systems), there are two sites that I count as major screw-ups, to different degrees: Google Code and Launchpad. The former is just partly screwed: starring a project does not subscribe you to updates, but at least there is a feed of all the files released by the project. What I find definitely strange is that there is no integrated “Subscribe in Google Reader” link, which would have been definitely more friendly.

Launchpad, instead, looks much worse. I recently picked up co-maintainership of a trio of projects, and not only is there no email notification, there is no feed of either releases or files! Which means that the only way to find out whether a project made a new release is to check its homepage. Yippee. I opened a bug for that on Launchpad, but I have now lost the link: it was duped against something else, and is no longer visible through Launchpad’s own interface, which is, in my book, yet another failure.

Why is it so difficult to accept that packagers need these notifications? It gets even sillier when you consider that I’m sure the main argument against notifications is going to be “but users won’t have to care, as the package will be available in their distribution”.

Is Ohloh here to stay?

You probably know Ohloh (now Open Hub) — it’s a website that provides statistical information about a number of (mostly Free Software, but not only) projects, fetching data from various source repositories and allowing developers to “aggregate” their commit statistics; it works partly like CIA, but rather than receiving commit messages, it fetches the whole commits (since it analyses the code as well as the commits).

While I like it, and have blogged about it before, I’m starting to have some reservations about it; there are quite a few problems related to it that made some of its useful features moot, and at the same time it seems to have grown some extra features (like download servers) that seem, nowadays, pretty pointless.

Don’t get me wrong, I love the idea itself, and I’m pretty sure developers love statistics; but as I said there are quite a few issues, especially when you add to the story some changes in the environment, like the huge increase in the use of distributed version control systems (Git, Bzr, Mercurial, …) and the increased popularity of identi.ca among free software developers. I’m afraid that some of these environment changes are going to kill off Ohloh at this pace, mostly because it really doesn’t seem like it’s going to adapt anytime soon.

You might remember my post about the journal feature, which was, in the end, simply a tweaked microblogging application; I say tweaked because it had one fundamental feature: hash-tags weren’t simply made up, they related directly to Ohloh projects. Unfortunately, even I abandoned that feature. The reason was not only that it seemed to fail to reach the critical mass needed for such services to be useful, but also that the implementation had quite a few problems that made it more of a nuisance than something useful. The Jabber bot died more often than not, and even when it worked it sometimes failed to update the website at all. I don’t know whether a proper API was ever defined for it, but it didn’t get support from desktop microblogging software like Gwibber, for instance, which could have helped build up the critical mass needed.

Another issue is the explosion of DVCS use, as I said: since anyone wanting to branch a piece of software to apply some fixes or changes can now have their own repository, there has to be some filtering of which repositories get listed on Ohloh: who decides which repositories are official? This is probably one of the reasons why managers were added to projects; unfortunately this probably came in too late, and as far as I can see most projects lack a manager at all.

And another problem still: it seems like any project that involves changing something in the Linux kernel ended up importing a whole branch of Linus’s repository (for obvious reasons), which makes my contributions list projects such as Linux ACPI, LinuxSH, LTTng, OpenMoko (this one actually created a bit of a fuss with a colleague of mine some time ago), OpenEZX, KVM, linux-omap and linux-davinci; and that’s just for one patch, mostly (a few already picked up my second patch to the kernel, which is even more trivial than the first one; I have a third I’ll have to re-send sooner or later).

But this by itself would just mean that, like many other projects of all possible kinds out there, Ohloh has problems to face and solve; no sh*t, Sherlock. Why do I go a step further and say that it might not be around for much longer? Well, some time ago, I think in relation to the blog post about journals I mentioned above, I was contacted by Jason Allen, who heads Ohloh, asking for my help to clear out some problems with indexing Gentoo’s repositories (the problem still exists, by the way: there is a huge timeframe indexed, but nowhere near the 10 years we just celebrated). For a while I was able to contact him when some problem came up with Ohloh, and that was fine; unfortunately I have been unable to reach him for a few months now (around the time SourceForge acquired them, if that says anything), and this includes pinging him on Ohloh’s own journal feature. I hope he’s alright, and simply too busy with the merged operations to answer, but still, that doesn’t bode well for Ohloh as a website.

There are other problems as well, don’t you worry: for instance, projects allow you to set up feeds to publish on the project’s page. These used to have problems with UTF-8 and thus garbled my surname (not the only ones to do so, mind you), but this got even worse with time, because the requests now go out without a User-Agent (which means my current mod_security configuration rejects them); of course I could whitelist Ohloh’s server IP address, but… it doesn’t look like a complex bug to fix, does it?

And finally, the other day I was considering making use of the Ohloh data to prepare a script showing a tagcloud-like list of the projects I contribute to; I wanted something that could easily show what I really do… Ohloh makes available an API that most likely has everything I needed, but, for a website that proposes you “Grok Open Source” (that’s what the homepage says), having a clause like this in the API documentation seems a bit… backwards:

It is important not to share API keys. In order to access or modify account data, your application must be granted permission by an individual Ohloh account holder. This permission is granted on a per-key basis.

Now, I know that it’s pretty difficult for web services to properly authenticate applications using their services, and that’s why API keys are used; but at the same time, doesn’t this defeat the whole idea of open-source clients using those APIs?
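For what it’s worth, the read-only side of the API looks simple enough; here is a sketch of fetching a project’s data, where the XML-over-HTTP endpoint shape and the api_key parameter reflect my reading of the documentation, so treat them as assumptions:

    #!/usr/bin/env python3
    # A sketch only: the endpoint shape and response layout follow my
    # reading of the documentation; the key is obviously a placeholder.
    from urllib.parse import urlencode
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    API_KEY = "your-key-here"  # per-application key, not to be shared

    def project_info(name):
        url = ("https://www.ohloh.net/projects/%s.xml?%s"
               % (name, urlencode({"api_key": API_KEY})))
        tree = ET.parse(urlopen(url))
        project = tree.find("result/project")
        if project is None:
            raise RuntimeError("unexpected response shape")
        return {child.tag: child.text for child in project}

    info = project_info("gentoo")
    print(info.get("name"), info.get("homepage_url"))

The problem is right there in the first lines: an open-source client would have to ship its key in its sources, which is exactly what that clause forbids.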

Unieject moves to Git

It’s not like I love Git unconditionally; I think Mike has quite a point about it. But it makes handling repositories way easier than Mercurial does, so I am using it for almost all the projects I maintain alone.

Up to now, unieject was still using Subversion on SourceForge.net; the problem was that git-svn didn’t grasp a rename that I made early in the project’s life, when I imported the local Subversion repository to BerliOS.

Today, after I couldn’t commit to SourceForge because my password had expired (is this something new?), I tried git-svn again and… it worked! It imported the repository correctly. After a bit of fiddling to replace the tag branches with actual tags, I was able to get my new repository online on the server.
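That “bit of fiddling” boils down to turning git-svn’s remote tag branches into proper annotated tags; roughly like this sketch, where the refs/remotes/tags/ layout is an assumption, since git-svn’s layout depends on how the clone was made:

    #!/usr/bin/env python3
    # A sketch of converting git-svn's remote "tag branches" into real
    # annotated tags; assumes tags live under refs/remotes/tags/.
    import subprocess

    def git(*args):
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    for ref in git("for-each-ref", "--format=%(refname)",
                   "refs/remotes/tags").splitlines():
        name = ref.rsplit("/", 1)[-1]
        # Create an annotated tag pointing at the same commit...
        git("tag", "-a", "-m", "Tag %s imported from Subversion" % name,
            name, ref)
        # ...and drop the now-redundant remote ref.
        git("update-ref", "-d", ref)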

I’ve now disabled SourceForge’s SVN for unieject; the code can be found at https://www.flameeyes.eu/p/unieject.

I’m now debating with myself whether to resume work on gitarella, or abandon it for cgit… the problem is that I’d have to prepare an ebuild for cgit at least, and I’ve never tried to understand how to make an ebuild for a webapp. If somebody from the webapp team can give me some of their time to either teach me how to make an ebuild for cgit, or directly create one, I’d be quite happy :)