We need Free Software Co-operatives, but we probably won’t get any

The recent GitHub craze got a number of Free Software fundamentalists to hurry away from GitHub towards other hosting solutions.

Whether it was GitLab (a fairly natural choice given the nature of the two services), BitBucket, or SourceForge (which is trying to rebuild its reputation as a Free Software friendly hosting company), there is no shortage of SaaS providers to choose from.

At the same time, a number of projects have been boasting (maybe a bit too smugly, in my opinion) that they self-host their own GitLab or similar software, and suggesting that other projects do the same to be “really free”.

A lot of the discourse appears to be missing nuance on the compromises involved in using SaaS hosting providers, in self-hosting for communities, and in self-hosting for single projects, so I thought I would gather my thoughts around this in one single post.

First of all, you probably remember my thoughts on self-hosting in general. Any solution that involves self-hosting will require a significant amount of ongoing work. You need to make sure your services keep working, and keep safe and secure. Particularly for FLOSS source code hosting, it’s of primary importance that the integrity and safety of the source code is maintained.

As I already said in the previous post, this style of hosting works well for projects that have a community, in which one or more dedicated people can look after the services. And in particular for bigger communities, such as KDE, GNOME, FreeDesktop, and so on, this is a very effective way to keep stewardship of code and community.

But for one-person projects, such as unpaper or glucometerutils, self-hosting would be quite bad. Even for xine, with a single person maintaining just the site and Bugzilla, it got fairly bad. I’m trying to convince the remaining active maintainers to migrate this to VideoLAN, which is now probably the biggest Free Software multimedia project and community.

This is not a new problem. Indeed, before people rushed to GitHub (or Gitorious), they rushed to other services that provided similar integrated environments. When I became a FLOSS developer, the biggest of them was SourceForge, which, as I noted earlier, was recently bought by a company trying to rebuild its reputation after a significant loss of trust. These environments don’t only include SCM services, but also issue (bug) trackers, contact email and so on and so forth.

Using one of these services is always a compromise: not only do they require an account on each service to be able to interact with them, but they also come with a level of lock-in, simply because of the nature of URLs. Indeed, as I wrote last year, just going through my old blog posts to identify those referencing dead links reminded me of just how many project hosting services have shut down, sometimes dragging it out (Berlios) and sometimes abruptly (RubyForge).

This is a problem that does not only involve services provided by for-profit companies. Sunsite, RubyForge and Berlios didn’t really have companies behind them, and that last one is probably one of the closest things to a Free Software co-operative that I’ve seen outside of the FSF and friends.

There is of course Savannah, the FSF’s own Forge-lookalike system. Unfortunately, for one reason or another, it has always lagged behind other project management SaaS in features (particularly around security). My personal guess is that this is due to the political nature of hosting any project on the FSF’s infrastructure, even outside of the GNU project.

So what we need would be a politically-neutral, project-agnostic hosting platform that is a co-operative effort. Unfortunately, I don’t see that happening any time soon. The main problem is that project hosting is expensive, whether you use dedicated servers or cloud providers. And it takes full-time people working as system administrators to keep it running smoothly and securely. You need professionals, too, or you may end up like lkml.org, down because its one maintainer went on vacation and something happened.

While there are projects that receive enough donations to sustain these costs (see KDE, GNOME, VideoLAN), I’d be skeptical that an unfocused co-operative would be able to take care of this. Particularly if it does not restrict the creation of new projects and repositories, as that requires particular attention to abuse, and good guidelines about which content is welcome and which isn’t.

If you think that’s an easy task, consider that even SourceForge, with a review process that used to take a significant amount of time, managed to let joke projects use their service and trade on its credibility.

A few years ago, I would have said that the SFLC, SFC and SPI would be the right actors to set up something like this. Nowadays? Given their infighting, I don’t expect them to be of any use.

Two words about my personal policy on GitHub

I was not planning on posting on the blog until next week, trying to stick to a weekly schedule, but today’s announcement of Microsoft acquiring GitHub is forcing my hand a bit.

So, Microsoft is acquiring GitHub, and a number of Open Source developers are losing their minds, in all possible ways. A significant proportion of the comments I have seen on my social media sound like doomsaying, as if this spells the end of GitHub, because Microsoft is going to ruin it all for them.

Myself, I think that if it spells the end of anything, it is the end of the one-stop shop for working on any project out there, not because of anything Microsoft did or is going to do, but because a number of developers are now leaving the platform in protest (protest of what? One company buying another?).

Most likely, it’ll be the fundamentalists who will move their projects away from GitHub. And depending on what they decide to do with their projects, it might not even show up on anybody’s radar. A lot of people are pushing for GitLab, which is both an open-core self-hosted platform and a PaaS offering.

That is not bad. Self-hosted GitLab instances already exist for VideoLAN and GNOME. Big, strong communities are, in my opinion, in the perfect position to dedicate people to support the core infrastructure that makes open source software development easier. In particular because it’s easier for a community of dozens, if not hundreds, of people to find dedicated people to work on it. For one-person projects, that’s overhead, distracting, and destructive as well, as fragmenting into micro-instances makes forking projects painful; and at the same time, allowing any user who just registered to fork the code in any instance is prone to abuse and a recipe for disaster…

But this is all going to be a topic for another time. Let me try to go back to my personal opinions on the matter (to be perfectly clear that these are not the opinions of my employer and yadda yadda).

As of today, what we know is that Microsoft acquired GitHub, and they are putting Nat Friedman of Xamarin fame (the company that stood behind the Mono project after Novell) in charge of it. This choice makes me particularly optimistic about the future, because Nat’s a good guy and I have the utmost respect for him.

This means I have no intention to move any of my public repositories away from GitHub, except if doing so would bring a substantial advantage. For instance, if there was a strong community built around medical devices software, I would consider moving glucometerutils. But this is not the case right now.

And because I still root most of my projects around my own domain, even if I did move them, the canonical URLs would still be valid. This is a scheme I devised after getting tired of fixing up where unieject ended up.

Microsoft has not done anything wrong with GitHub yet. I will give them the benefit of the doubt, and not rush out of the door. It would and will be different if they were to change their policies.

Rob’s point is valid, and it would be a disgrace if various governments pushed Microsoft into a corner, requiring it to purge content that the smaller, independent GitHub would have left alone. But unless that happens, we’re debating hypotheticals at the same level as “If I was elected supreme leader of Italy”.

So, as of today, 2018-06-04, I have no intention of moving any of my repositories to other services. I’ll also reply with a link to this post, and no accompanying comment, to anyone who suggests I should do so without any benefit for my projects.

Book Review: Instant Mercurial Distributed SCM Essentials How-to

Okay, the title is a mouthful for sure, but this new book from Packt Publishing is an interesting read for those who happen to use Mercurial only from time to time and tend to forget most of the commands and workflows, especially where they differ quite a bit from the Git ones.

While I might disagree with some very unsafe examples (changing the owner of /etc/apache to your user to experiment on it? Really?), the book is a very quick read, and for the price Packt sells it at (don’t get distracted by the cover above, which links to Amazon) it’s worth a read, and worth keeping on one’s shelf or preferred ebook reader.

Well, I’m not sure I can add much more to this; I know it sounds like filler, but the book is short enough that going into more detail about the various recipes it proposes would probably mean repeating it whole. As I said, in general, if you have to work with Mercurial for whatever reason, go for it!

Common sense isn’t quite as common, that’s why the QA team is there

Long title, hopefully catchy enough that both users and fellow developers get to read this, since it’s half a rant, half an explanation of why the QA team can be quite anal when it comes to bugs that, for most developers, and especially for the maintainers of the packages involved, might look minor or unlikely to cause problems for the general population of users.

Today’s problem, for instance, was with packages that, for non-live (thus, snapshot or release) versions, used the SCM eclasses, fetching directly from CVS, Subversion, GIT, Mercurial and so on. QA has already stated that ebuilds using SCM eclasses should be masked (it’s in the devmanual if you wish to look it up), and by extension that should tell you that using them for released code is bad. Among the various reasons for not using SCM eclasses for proper versions of the software, no matter whether upstream has made any serious release or not, there are safety implications (you sidestep the Manifest in Portage) and problems with proxies (not all SCMs use HTTP for fetching), but here is what creates quite a bit of a problem for me with the tinderbox: those ebuilds don’t abide by the fetch command! When I run the tinderbox, I launch in parallel both a build sequence and a fetch sequence (I cannot use the parallel fetch feature because I launch each package one by one); when the packages using SCM eclasses hit, they spend time not building, which is bad for the tinderbox. And it gets even worse when you add stuff like the old, removed rubinius that spent time timing out because the server went away.
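
To make the tinderbox problem concrete, here is a minimal sketch of the kind of parallel fetch-and-build run I’m describing; the package list file and the exact options are hypothetical, but the point stands: an ebuild that fetches through an SCM eclass ignores the fetch pass entirely and stalls the build pass instead.

# packages.list is a hypothetical file with one package atom per line
# fetch pass: download distfiles ahead of the builds
while read -r atom; do
	emerge --fetchonly "${atom}"
done < packages.list &

# build pass: build one package at a time, relying on the fetch pass
# to have already filled the distfiles directory
while read -r atom; do
	emerge --oneshot "${atom}"
done < packages.list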

But the same class of problems involves overly big files in the tree: when you add a 100K patch for a package that just a few users are going to merge, you’re wasting bandwidth and disk space for a huge number of people who just don’t care. That’s why we stress the need for keeping filesdir as lightweight as possible, without messing with the development process of course. Again, this is just another little task for the developer: package the patches and upload them to the mirrors, then add them to the ebuild so that they are only downloaded by those who do need them.
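
As a sketch of what that looks like in practice (the package name, patchset tarball and URLs here are made up), the big patchset moves out of filesdir and into SRC_URI, so only the users who actually merge the package ever download it:

# hypothetical ebuild: the large patchset is distributed as a tarball on
# the mirrors instead of being committed under ${FILESDIR}
inherit eutils

SRC_URI="https://example.org/releases/${P}.tar.gz
	mirror://gentoo/${P}-gentoo-patches-1.tar.bz2"

src_unpack() {
	unpack ${A}
	cd "${S}"
	epatch "${WORKDIR}"/patches/*.patch
}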

And again it’s the same problem with packages ignoring compilers, compiler flags, linker flags, or prestripping binaries when they shouldn’t. These are a problem because one of the selling points of Gentoo is its customisability, at all the different levels. So while they might be seen as minor points, these are all the little details that the QA team has to answer for. And it goes on and beyond this: making sure software builds, that it builds in parallel if possible, that it doesn’t fail with new versions of dependencies, that it builds with the correct kernel.
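
For the compiler and flags part, the fix is usually a one-liner once the maintainer cares; here is a minimal sketch of what QA expects from a build phase, assuming the standard toolchain-funcs.eclass helpers (the exact variables a given build system needs will of course vary):

inherit toolchain-funcs

src_compile() {
	# pass the user's compiler and flags down to the build system,
	# instead of letting it hardcode gcc and its own optimisations
	emake \
		CC="$(tc-getCC)" \
		CFLAGS="${CFLAGS}" \
		LDFLAGS="${LDFLAGS}" || die "emake failed"
}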

So please help us help everybody, and don’t just ignore our requests, or start a pissing contest about why your package should be special and not abide by the common rules and directions of the rest of Gentoo. Sorry, but unless it is really special, and that’s pretty rare, your software will have to abide by those rules one way or another, and if I have to piss you off by forcing the decision as QA, then I will. I hope I won’t have to, though.

Packaging Ruby extensions is no fun task

As you might know, among other things I’m in the Gentoo Ruby team and I take care of a few sparse packages. I’ve started working on the new Ruby eclasses because, besides liking Ruby as a language, one of the tasks I had for my job (which is now running out, and soon won’t be a job any longer) required me to use some extensions that weren’t packaged for Gentoo at all.

After some time away, I resumed work on that particular project yesterday, and a few more dependencies were added, which I went on to package; doing so really reminded me why packaging Ruby extensions in Gentoo is so difficult, so for reference I’m writing it down here.

The main problem is that we shouldn’t package gems, not because, like Debian, we dislike the layout alone, but also because they don’t comply with our packaging phases, including testing, which is what allows our packages to be what they are on the system (the best-integrated option). Indeed, while we can package gems somehow, doing so comes with a few drawbacks:

  • the content of the gem is installed unconditionally, which means we install more than we should, like tests;
  • the gem itself gets installed in the so-called cache, which means it wastes even more space on disk;
  • installation for multiple Ruby implementations duplicates even more content;
  • tests are not run during the ebuild test phase, so it’s very difficult to make sure that the installed code is properly handled;
  • documentation doesn’t get installed in our specific documentation system.

The main problem with this is, obviously, the missing test phase: test phases are designed to make sure that the package is fine and all its dependencies are present, and dropping them is a huge problem (indeed, I identified more than a couple of packages whose failing test suites were the only thing showing that they were broken by an --as-needed build).

But the problem does not stop there; to not package gems, we need to make use of tarballs; unfortunately, most Ruby extension projects have lately decided that making tarballs is not trendy enough, and they just release gems; this wouldn’t be a terrible problem if they tagged their sources in their repositories, whatever those are, but that also seems to be going out of fashion. Actually, our best ally on this matter has become GitHub; since they fixed their problem with downloads, and EAPI 2 introduced redirected naming for downloads, downloading tagged tarballs is quite easy and fast; when the project tags releases, at least.
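
For reference, this is roughly what the GitHub route looks like in an ebuild; the project and author names are made up, the tarball URL follows the download layout GitHub uses as far as I recall, and the EAPI 2 arrow gives the download a sane, unique name:

# hypothetical gem "foobar", released only as a tag on GitHub
EAPI=2

SRC_URI="http://github.com/example/foobar/tarball/v${PV} -> ${P}.tgz"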

Indeed, tagging does seem to be quite hectic between projects; sometimes I’m quite sure it’s a mistake (git tags not pushed), other times there is a systematic lack of tagging, which is quite a problem. To get around that, the best option I have is usually to just make a snapshot of the thing; again, GitHub helped here when I had to package ttfunk and pdf-inspector yesterday.

It really would be much simpler if projects consistently tagged their releases, and used a public source control manager that allows downloading an arbitrary tag. But I guess I won’t be counting on this for quite a while.

Distributed SCM showdown: GIT versus Mercurial

Although I admit it’s tempting, I’m not going to enter the mess of complaints (warranted or not) about GIT that have found a place on Planet GNOME. I don’t intend to go over my issues with bzr either, since I think I have exposed them already. I’m going to comment on a technical issue I have with Mercurial, and show why I find GIT more useful, at least in that case.

If you remember, xine moved to Mercurial almost two years ago. The choice of Mercurial at the time was pushed because it seemed much more stable (git has indeed had a few big changes since then), it was already being used for gxine, and it had better multi-platform support (running git on Solaris at the time was a problem, for instance). While I don’t think it’s (yet) the time to reconsider, especially since I haven’t been active in xine development for so long that my opinion wouldn’t matter, I’d like to share some insight into the problems I have with Mercurial, or at least with the Mercurial version that Alioth is using.

Let’s not start with the fact that hg does not seem to play too well with permissions, and the fact that we have a script to fix them on Alioth to make sure that we can all push to the newly created repositories. So if you think that setting up a remote GIT repository is hard, please try doing so with Mercurial, without screwing permissions up.

As far as the command-line interface is concerned, I agree that hg follows the principle of least surprise more closely, and indeed has an interface much more similar to CVS/SVN than git has. On the other hand, it requires quite a bit of wandering around to do stuff like git rebase, and it requires enabling extensions that are not enabled by default, for whatever reason.

The main problem I have with Hg, though, is the lack of named branches. I know that newer versions should support them, but I have been unable to find documentation about them, and anyway Alioth is not updated so it does not matter yet. Without named branches, you basically have one repository per branch; while that makes it easier to deal with multiple build directories, it becomes quite space-hungry, since the history is not shared between these repositories, while it is in git (if you clone one linux-2.6 repository, then decide you need a branch from another developer, you just add that remote and fetch it, and it’ll download the minimum amount of changesets needed to fill in the history, not a whole copy of the repository).
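
To make the git side of that concrete, pulling another developer’s branch into an existing clone is just this, reusing all the objects you already have (the remote name and URL here are made up):

# inside an existing clone of xine-lib
git remote add darren git://example.org/~darren/xine-lib.git
git fetch darren        # downloads only the objects that are missing locally
git checkout -b 1.2/newdvdnav darren/1.2/newdvdnav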

It also makes it much more cumbersome to create a scratch branch before doing more work (even more so because you lack a single-command rebase and you need to update, transplant and strip each time), which is why Darren sometimes kicked me for pushing changes that were still work in progress.
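
Roughly, this is the difference between the two workflows; the branch and directory names are, of course, made up:

# git: a scratch branch is one cheap command, and a single rebase
# replays it on top of the updated mainline later
git checkout -b scratch-gapless
# ... commit work in progress ...
git rebase master

# hg without named branches: a whole clone per scratch branch, then
# transplant the good changesets back and strip the leftovers
hg clone xine-lib-1.2 xine-lib-1.2-scratch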

In git, since the changesets are shared between branches, a branch is quite cheap and you can branch N times almost without feeling it; with Hg, it’s not that simple. Indeed, now that I’m working on a git mirror for the xine repositories, I can show you some interesting data:

flame@midas /var/lib/git/xine/xine-lib.git $ git branch -lv
  1.2/audio-out-conversion   aafcaa5 Merge from 1.2 main branch.
  1.2/buildtime-cpudetection d2cc5a1 Complete deinterlacers port.
  1.2/macosx                 e373206 Merge from xine-lib-1.2
  1.2/newdvdnav              e58483c Update version info for libdvdnav.
* master                     19ff012 "No newline at end of file" fixes.
  xine-lib-1.2               e9a9058 Merge from 1.1.
flame@midas /var/lib/git/xine/xine-lib.git $ du -sh .
34M	.

flame@midas ~/repos/xine $ ls -ld xine-lib*   
drwxr-xr-x 12 flame flame 4096 Feb 21 12:01 xine-lib
drwxr-xr-x 13 flame flame 4096 Feb 21 12:19 xine-lib-1.2
drwxr-xr-x 13 flame flame 4096 Feb 21 13:00 xine-lib-1.2-audio-out-conversion
drwxr-xr-x 13 flame flame 4096 Feb 21 13:11 xine-lib-1.2-buildtime-cpudetection
drwxr-xr-x 13 flame flame 4096 Feb 21 13:12 xine-lib-1.2-macosx
drwxr-xr-x 12 flame flame 4096 Feb 21 13:28 xine-lib-1.2-mpz
drwxr-xr-x 13 flame flame 4096 Feb 21 13:30 xine-lib-1.2-newdvdnav
drwxr-xr-x 13 flame flame 4096 Feb 21 13:50 xine-lib-1.2-plugins-changes
drwxr-xr-x 12 flame flame 4096 Feb 21 12:53 xine-lib-gapless
drwxr-xr-x 12 flame flame 4096 Feb 21 13:56 xine-lib-mpeg2new
flame@midas ~/repos/xine $ du -csh xine-lib* | grep total
805M	total
flame@midas ~/repos/xine $ du -csh xine-lib xine-lib-1.2 xine-lib-1.2-audio-out-conversion xine-lib-1.2-buildtime-cpudetection xine-lib-1.2-macosx xine-lib-1.2-newdvdnav  | grep total
509M	total

As you might guess, the ~/repos/xine contents are the Mercurial repositories. You can see the size difference between the two SCMs. Sincerely, even though I have tons of space, on the server I’d rather keep git than Mercurial.

If some Mercurial wizard knows how to work around this issue, I might consider it again; otherwise, for the future it’ll always be git for me.

More tinderboxing, more analysis, more disk space

Even though I had a cold I’ve kept busy in the past few days, which was especially good because today was most certainly Monday. For the sake of mental sanity, I’ve decided a few months ago that the weekend is off work for me, and Monday is dedicated at summing up what I’m going to do during the rest of the week, sort of a planning day. Which usually turns out to mean a lot of reading and very little action and writing.

Since I cannot sleep right now (I’ll have to write a bit about that too), I decided to start with the writing, to make sure the plans I figured out will be enacted this week. Which is especially important considering I also had to spend some time labelling, as usual at this time of the year. Yes, I’m still doing that, at least until I can get a decent stable job. It works and helps pay the bills, at least a bit.

So anyway, you might have read Serkan’s post regarding the java-dep-check package and the issues that it found once run over the tinderbox packages. This is probably one of the most interesting uses of the tinderbox: large-scale testing of packages for problems that would otherwise keep such a low profile that they would never come out. To make more of a point, the tinderbox is now running with the JAVA_PKG_STRICT variable set, so that the Java packages get extra checks and are tested much more thoroughly on the tree.
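
If you want to reproduce that on your own box, it should be just a matter of setting the variable, either for a single run or persistently; if I recall correctly, the eclasses only check that it is set, and the package atom below is obviously just a placeholder:

# one-off: stricter Java checks for a single test build
JAVA_PKG_STRICT=1 emerge --oneshot dev-java/some-package

# persistently, for everything built on the box
echo 'JAVA_PKG_STRICT="1"' >> /etc/make.conf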

I also wanted to add further checks for bashisms in configure scripts. This sprouted from the fact that, on FreeBSD 7.0, the autoconf-generated configure script does not discard the /bin/sh shell any longer. Previously, the FreeBSD implementation was discarded because of a bug, and thus the script re-executed itself using bash instead. This was bad (because bash, as we should know all too well, is slow) but also good (because then all the scripts were executed with the same shell on both Linux and FreeBSD). Since the bug is now fixed, the original shell is used, which is faster (and thus good); the problem is that some projects (unieject included!) use bashisms that will now fail. Javier spent some time trying to debug the issue.

To check for bashisms, I’ve used the script that Debian makes available. Unfortunately the script is far from perfect. First of all, it does not really have an easy way to just scan a subtree for actual sh scripts (using egrep is not totally fine since autoconf m4 fragments often have the #!/bin/sh string in them). This forced me to write a stupid, long and quite faulty script to scan the configure files.
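
The idea was something along these lines; this is a rough sketch rather than the actual script, and the portage temporary directory and the shebang check are assumptions about where and how the generated configure scripts end up:

# scan unpacked build trees for generated configure scripts and feed the
# ones that really are /bin/sh scripts to Debian's checkbashisms
find /var/tmp/portage -type f -name configure | \
while read -r script; do
	# skip files that merely contain the #!/bin/sh string somewhere
	head -n 1 "${script}" | grep -q '^#! */bin/sh' || continue
	checkbashisms "${script}"
done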

But even worse, the script is full of false positives: instead of actually parsing the semantics, it only scans for substrings. For instance, it identified the strange help output in gnumeric as a bash-specific brace expansion, when it was inside a heredoc string. Instead of this method, I’d rather have a special parameter in bash that tells the interpreter to output warnings about bash-specific features; maybe I should write it myself.

But I think that there are some things that should be addressed in a much different way than through the tinderbox itself. As I have written before, there are many tests that should actually be executed on source code, like static analysis of the source code, and analysis of configure scripts to fix issues like canonical targets when they are not needed, or misaligned ./configure --help output, and so on and so forth. This kind of scan should not be applied only to released code, but more importantly to the code still in the repositories, so that the issues can be killed before the code is released.

I had this idea when I went to look for different conditions in Lennart’s repositories (which are, as usual, available in my own repositories with changes, fixes and improvements to the build system; a huge thanks to Lennart for allowing me to be his autotools-meister). By build-checking his repositories before he makes a release, I can ensure the released code works for Gentoo just fine, instead of having to patch it afterwards and queue the patch for the following release. It’s the step beyond upstreaming the patches.

Unfortunately this kind of work is not only difficult because it’s hard to write static analysis software that gets good results; the US DHS-funded Coverity Scan, although lauded by people like Andrew Morton, had tremendously bad results in my opinion with the xine-lib analysis: lots of issues were never reported, and the ones reported were often enough either false positives or inside the FFmpeg code (which xine-lib used to import); and the code was almost never updated. If it had merely failed to pick up the move to the Mercurial repository, that would have been understandable (I don’t expect them to follow the repository moves of all the projects they analyse), but the problem was there since way before the move. And it also reported the same exact problems every single day, repeated over and over; for a while I tried to keep track of them and marked the ones we had already dealt with, or which were false positives, or were parts of FFmpeg (and may even have been fixed already).

So one thing to address is an easy way to keep track of various repositories and their branches, which is not so easy since all SCM programs have different ways to access the data. Ohloh Open Hub probably has lots of experience with that, so I guess that might be a start; it has to be considered, though, that Open Hub only supports the three “major” SCM products, GIT, Subversion and the good old CVS, which means that extending it to any repository at all is going to take a lot more work, and it had quite a few problems accessing Gentoo repositories, which means that it’s certainly not fault-proof. And even if I was able to hook up a similar interface on my system, it would probably require much more disk space than I’m able to have right now.
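
The glue itself would not need to be clever; a nightly cron job that knows how to refresh a local mirror for each SCM type would be a start, something like this sketch (the mirror path and layout are made up):

# refresh one local mirror per project, using whatever SCM the
# checkout happens to be
for repo in /srv/mirrors/*; do
	if [ -d "${repo}/.git" ]; then
		( cd "${repo}" && git fetch --quiet )
	elif [ -d "${repo}/.hg" ]; then
		( cd "${repo}" && hg pull --quiet )
	elif [ -d "${repo}/.svn" ]; then
		( cd "${repo}" && svn update --quiet )
	fi
done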

For sure, the first step now is to actually write the analysis script that checks the build logs (since that would already provide some results once hooked up to the tinderbox), and then find a way to identify some of the problems we most care about in Gentoo from static analysis of the source code. Not an easy task, nor something that can be done in spare time, so if you have something to contribute, please do; it would be really nice to get the pieces of the puzzle together.

Resuming work on Gitarella

After the release of version 0.001, Gitarella became famous for a couple of days, as it was also posted on the official git mailing list. This showed me that there is a bit of interest in it, and that it might be useful to someone else.

I then restarted working on it tonight, and started implementing tags support, now that I have a tag in a local repository. I hope to release 0.002 in the next few days, just to let people know it’s not dead ;)

On a different note, autoconf 2.59d/2.60 incompatibilities are still spread all over the tree. From a few packages that seemed problematic at the start, we now have quite a few failing. I was able to find a couple thanks to Pitr, but I still haven’t been able to complete a GNOME emerge (one of the failing packages was Totem, for completeness).

Also, yesterday I received Amazon’s package with a gift from Claes Mogren, whom I want to thank for it :) The Magic of the Wizard’s Dream by Rhapsody is really, really good, and the Italian version, with Christopher Lee singing, gives me chills of pleasure :)

Too bad the Lene Marlin DVD came scratched, so I had to start the procedure to exchange it; I’ve already prepared the box to send back, and will do that tomorrow.

A huge thanks to Norbert Thiebaud too, for his notable donation :)

Okay, now it’s time to return to Gitarella and to implement a few more things :)

Visibility, I’m not yet satisfied

So, I think I talked some time ago about my efforts to make FFmpeg build with hidden visibility when available (GCC 4.1). I have in my overlay a patch that works for that svn snapshot, but unfortunately I think I opened a can of worms with that :)

Not that I didn’t expect something like this. One of the good things about visibility is that you know exactly what you’re going to show to your users, and especially when you first patch in the visibility support you end up looking at what you want to export or not. Our story starts with some symbols declared but no longer defined, and a lot of functions that should not be exported at all in the public header files.

Now, as I’m a known masochist (;)) I decided to start looking a bit more deeply into these functions so that they can be moved around and so on. I originally took a simple Subversion checkout, but handling the patches on that would be quite… difficult, considering that I will be working on the visibility patch and the series of side patches to move around the declarations that have to be hidden.

Anyway, on Luca’s suggestion, I’m now trying to use git-svnimport to manage the sources locally; maybe this way I’ll be able to get around to preparing some patches… if I feel masochistic enough I might end up trying to slim down the diff between xine-lib’s copy of FFmpeg and the original FFmpeg.

I have to say that, starting this way, my poor computer is really doing as much work as it can :/ having to compile parallel copies of ffmpeg, xine-lib, xine-ui and so on does not take little time. I could use some new memory, but that’s anything but cheap now. If you want to help the cause, donating a few euros is of course welcome ;) but I think I really need to plan on taking some new job with enough money to put something aside to upgrade this box…

Oh, ffmpeg compiled without some of the not-to-be-exported symbols is now down to 179 symbols against the previous 229. Not bad at all!
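
For reference, the count above is simply the number of symbols exported by the built shared object; one quick way to get that kind of number (the library path is just whatever your build produced) is:

# count the dynamic symbols actually exported by the shared object;
# adjust the path to wherever your build put the library
nm -D --defined-only libavcodec/libavcodec.so | wc -l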

Gitarella does FastCGI, too

So, tonight I almost didn’t sleep at all, so I decided to implement something more in Gitarella. Right now it works as a FastCGI application, if configured correctly and if ruby-fcgi is installed :)

The interesting thing about this is that the connection to memcached is never dropped this way, making access even faster, and it also caches the configuration file for the whole session. I can definitely see the speed increase of Gitarella over gitweb, although it still misses quite a few features.

Last night I implemented summary, log and shortlog pages, today the commit summary page.

Something that scares me is having to find a way to display the diffs in a decent, web-suited way, as well as the colourised output and the raw one. Sigh :P

But considering I’ve only been working on Gitarella for a few days, and in my spare time, it’s proceeding quite well. I might consider putting it into production on Farragut soon…

For those interested in taking a look, just get the git repository at https://www.flameeyes.eu/p/gitarella and if you want to help, feel free to mail me.