Project Memory

For a series of events that I’m not entirely sure the start of, I started fixing some links on my old blog posts, fixing some links that got broken, most of which I ended up having to find on the Wayback Archive. While looking at that, I ended up finding some content for my very old blog. A one that was hosted on Blogspot, written in Italian only, and seriously written by an annoying brat that needed to learn something about life. Which I did, of course. The story of that blog is for a different time and post, for now I’ll actually focus on a different topic.

When I started looking at this, I ended up going through a lot of my blogposts and updated a number of broken links, either by linking to the Wayback machine or by removing the link. I focused on those links that can easily be grepped for, which turns out to be a very good side effect of having migrated to Hugo.

This meant, among other things, removing references to identi.ca (which came to my mind because of all the hype I hear about Mastodon nowadays), removing links to my old “Hire me!” page, and so on. And that’s where things started showing a pattern.

I ended up updating or removing links to Berlios, Rubyforge, Gitorious, Google Code, Gemcutter, and so on.

For many of these, turned out I don’t even have a local copy (at hand at least) of some of my smaller projects (mostly the early Ruby stuff I’ve done). But I almost certainly have some of that data in my backups, some of which I actually have in Dublin and want to start digging into at some point soon. Again, this is a story for a different time.

The importance to the links to those project management websites is that for many projects, those pages were all that you had about them. And for some of those, all the important information was captured by those services.

Back when I started contributing to free software projects, SourceForge was the badge of honor of being an actual project: it would give you space to host a website, as well as the ability to have source code repositories and websites. And this was the era before Git, before Mercurial, and the other DVCS, which means either you had SourceForge, or you likely had no source control at all. But SourceForge admins also reviewed (or at least alleged to) every project that was created, and so creating a project on the platform was not straightforward, you would do that only if you really had the time to invest on the project.

A few projects were big enough to have their own servers, and a few projects were hosted in some other “random” project management sites, that for a while appeared to sprout out because the Forge software used by SourceForge was (for a while at least) released as free software itself. Some of those websites were specific in nature, others more general. Over time, BerliOS appeared to become the anti-SourceForge, with a streamlined application process, and most importantly, with Subversion years before SF would gain support for it.

Things got a bit more interesting later, when things like Bazaar, Mercurial, GIT and so on started appearing on the horizon, because at that point having proper source control could be achieved without needing special servers (without publishing it at least, although there were ways around that. Which at the same time made some project management website redundant, and others more appealable.

But, let’s take a look at the list of project management websites I have used and are now completely or partly gone, with or without history:

  • The aforementioned BerliOS, which has been teetering back and forth a couple of times. I had a couple of projects over there, which I ended up importing to GitHub, and I also forked unpaper. The service and the hosting were taken down in 2014, but (all?) the projects hosted on the platform were mirrored on SourceForge. As far as I can tell they were mirrored read-only, so for instance I can’t de-duplicate the unieject projects since I originally wrote it on SourceForge and then migrated to BerliOS.

  • The Danish SunSITE, which hosted a number of open-source projects for reasons that I’m not entirely clear on. NoX-Wizard, an open-source Ultima OnLine server emulator was hosted there, for reasons that are even murkier to me. The site got renamed to dotsrc.org, but they dropped all the hosting services in 2009. I can’t seem to find an archive of their data; Nox-Wizard was migrated during my time to SourceForge, so that’s okay by me.

  • RubyForge used that same Forge app as SourceForge, and was focused on Ruby module development. It was abruptly terminated in 2014, and as it turns out I made the mistake here of not importing my few modules explicitly. I should have them in my backups if I start looking for them, I just haven’t done so yet.

  • Gitorious set itself up as being an open, free software software to compete with GitHub. Unfortunately it clearly was not profitable enough and it got acquired, twice. The second time by competing service GitLab, that had no interest in running the software. A brittle mirror of the project repositories only (no user pages) is still online, thanks to Archive Team. I originally used Gitorious for my repositories rather than GitHub, but I came around to it and moved everything over before they shut the service down, well almost everything, as it turns out some of the LScube repos were not saved, because they were only mirrors… except that the domain for that project expired so we lost access to the main website and GIT repository, too.

  • Google Code was Google’s project hosting service, that started by offering Subversion repositories, downloads, issue trackers and so on. Very few of the projects I tracked used Google Code to begin with, and it was finally turned down in 2015, by making all projects read-only except for setting up a redirection to a new homepage. The biggest project I followed from Google Code was libarchive and they migrated it fully to GitHub, including migrating the issues.

  • Gemcutter used to be a repository for Ruby gems packages. I actually forgot why it was started, but it was for a while the alternative repository where a lot of cool kids stored their libraries. Gemcutter got merged back into rubygems.org and the old links now appear to redirect to the right pages. Yay!

With such a list of project hosting websites going the way of the dodo, an obvious conclusion to take is that hosting things on your own servers is the way to go. I would still argue otherwise. Despite the amount of hosting websites going away, it feels to me like the vast majority of the information we lost over the past 13 years is for blogs and personal websites badly used for documentation. With the exception of RubyForge, all the above examples were properly archived one way or the other, so at least the majority of the historical memory is not gone at all.

Not using project hosting websites is obviously an option. Unfortunately it comes with the usual problems and with even higher risks of losing data. Even GitLab’s snafu had higher chances to be fixed than whatever your one-person-project has when the owner gets tired, runs out of money, graduate from university, or even dies.

So what can we do to make things more resilient to disappearing? Let me suggest a few points of actions, which I think are relevant and possible right now to make things better for everybody.

First of all, let’s all make sure that the Internet Archive by donating. I set up a €5/month donation which gets matched by my employer. The Archive not only provides the Wayback Machine, which is how I can still fetch some of the content both from my past blogs and from blogs of people who deleted or moved them, or even passed on. Internet is our history, we can’t let it disappear without effort.

Then for what concerns the projects themselves, it may be a bit less clear cut. The first thing I’ll be much more wary about in the future is relying on the support sites when writing comments or commit messages. Issue trackers get lost, or renumbered, and so the references to those may be broken too easily. Be verbose in your commit messages, and if needed provide a quoted issue, instead of just “Fix issue #1123”.

Even mailing lists are not safe. While Gmane is currently supposedly still online, most of the gmane links from my own blog are broken, and I need to find replacements for them.

This brings me to the following problem: documentation. Wikis made documenting things significantly cheaper as you don’t need o learn lots neither in form of syntax nor in form of process. Unfortunately, backing up wikis is not easy because a database is involved, and it’s very hard, when taking over a project whose maintainers are unresponsive, to find a good way to import the wiki. GitHub makes thing easier thanks to their GitHub pages, and it’s at least a starting point. Unfortunately it makes the process a little more messy than the wiki, but we can survive that, I’m sure.

Myself, I decided to use a hybrid approach. Given that some of my projects such as unieject managed to migrate from SourceForge to BerliOS, to Gitorious, to GitHub, I have now set up a number of redirects on my website, so that their official websites will read https://www.flameeyes.eu/p/glucometerutils and it’ll redirect to wherever I’m hosting them at the time.

Revisiting my opinion of GitHub — How do you branch a readonly repository?

I have expressed before quite a bit of discontent of GitHub before, regarding the way they continue suggesting people to “fork” projects. I’d like for once to state that while I find the fork word the wrong one the idea is not too far from ideal.

Fast, reliable and free distributed SCMs have defined a new landscape in the world of Free Software, as many probably noted already; but without the help of GitHub, Gitorious and BitBucket I wouldn’t expect them to have made such an impact. Why? Because hosting a repository is generally not an easy task for non-sysadmins, and finding where the original code is supposed to be is also not that easy, a lot of times.

Of course, you cannot “rely on the cloud” to be the sole host for your content, especially if you’re a complex project like Gentoo, but it doesn’t hurt to be able to tell users “You want to make changes? Simply branch it here and we’ll take care of the management”, which is what those services enabled.

Why am I writing about this now? Well, it happened more than once before that I was in need to publish a branch of a package whose original repository was hosted on SourceForge, or the GNOME source , where you cannot branch it to make any change. To solve that problem I set up a system on my server to clone and mirror the repositories over to Gitorious; the automirror user is now tracking twelve repositories and copying them over to gitorious every six hours.

As it happens, last night I was hacking a bit at ALSA — mostly I was looking into applying what I wrote yesterday regarding hidden symbols on the ALSA plugins, as my script found duplicated symbols between the two Pulse plugins (one is the actual sound plugin the other is the control one), but ended up doing general fixing of their build system as it was slightly broken. I sent the patches to the mailing lists, but I wanted to have them available as a branch as well.

Well, now you got most of the ALSA project available on Gitorious for all your branching and editing needs. Isn’t Git so lovely?

Before finishing though I’d like to point out that there is one thing I’m not going to change my opinion on: the idea of “forking” projects is, in my opinion, very bad, as I wrote on the article at the top of the page. I actually like Gitorious’s “clones” better, as that’s what it should be: clones with branches, not forks. Unfortunately as I wrote some other time, Gitorious is not that great when it comes to licensing constraints, so for some projects of mine I’ve been using GitHub instead. Both the services are, in my opinion, equally valuable for the Free Software community, and not only.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015 and turned down the service. This means that all the references are totally useless now. Sorry.

Movin!!

For a while I have been quoting songs, anime and other media when choosing posts’ titles; then I stopped. Today, it looks perfectly fine to quote the title of one of the Bleach anime endings, by Takacha, since it suits what my post is about… just so you know.

So, since my blog has been experiencing technical difficulties last week, as you might know, I want to move out of the current vserver (midas), which is thankfully sponsored by IOS for xine project to a different server that I can handle just for the blog and a couple more things. I’m now waiting for a few answers (from IOS to start) to see where this blog is going to be deployed next time (I’m looking for Gentoo Linux vservers again).

The main problem is that the big, expensive factor in all this is the traffic; midas is currently serving lots of traffic: this blog alone averages over a 300 MB/day, which gets down to about 10GB of traffic a month. But the big hits come from the git repositories, which means that a relatively easy way to cut down the traffic expense of the server is to move the repositories out.

For this reason I’ve migrated my overlay back over Gentoo hardware (layman included), while Ruby-Elf is the first of my projects to be hosted at gitorious (I’m going to add Autotools Mythbuster soon too).

As for why I decided to go with gitorious over GitHub, it’s a technical and political reason for me. Technical, because I like the interface better; political both for the AGPL3 license used by gitorious and for the fact that it does not highlight the “fork it” method that github seem to have based itself off. On the other hand, I actually had difficulties finding where to clone the unofficial PulseAudio repository to prepare my local copy, and the project interface shows pretty well the “Merge Requests” counter.

At any rate there will be some of my stuff available at github at the end of the day, mostly the things that started or are now maintained within github, like Typo itself (for which I have quite a few changes locally, both bug fixes and behaviour changes, that I’d like to get merged upstream soonish).

This starts to look like a good training for when I’ll actually move out of home too.

Update (2017-04-22): as you may know, Gitorious was acquired by GitLab in 2015 and turned down the service. Which not only means this post is now completely useless, but I gave up and joined the GitHub crowd, since that service “won the war”. Unfortunately some of my content from Gitorious has been lost because I wasn’t good at keeping backups.