Why a tinderbox is not enough (not even two)

I’ve already written something about automation for Gentoo bug search, but I think it’s sometimes hard to see that a huge tinderbox, even a distributed one, is not going to help much in making sure that software works. The problem is that software that builds fine may still break at runtime, and even though tests help when they are handled properly, they are far from a complete solution.

Problems like the one described in the post above for hfsplusutils are just impossible to catch at build time without running the tests on sample data (which reminds me of the uif2iso problem); but this can get even more subtle. While it’s true that most --as-needed failures happen at build time, there are quite a few that will not hit until runtime. One such case happened to me with libcompizconfig, compizconfig-python and fusion-icon: the last one wouldn’t start because Python failed to load the second, because the first wasn’t linking to libX11.
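To give an idea of what checking for this at runtime looks like, here is a minimal sketch that simply tries to load a shared object and reports the loader's error. The helper name `try_load` and the example path are my own assumptions for illustration; this is not something either tinderbox actually runs.

```python
import ctypes
import os


def try_load(path):
    """Try to load a shared object; return (ok, error_message)."""
    # Ask for eager symbol resolution where the platform exposes RTLD_NOW,
    # so that unresolved symbols are reported at load time.
    mode = getattr(os, "RTLD_NOW", ctypes.DEFAULT_MODE)
    try:
        ctypes.CDLL(path, mode=mode)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # Hypothetical path; adjust it to the library you want to check.
    ok, err = try_load("/usr/lib/libcompizconfig.so")
    print("loads fine" if ok else "fails to load: " + err)
```

Keep in mind that a check like this only catches the problem if nothing else in the process has already pulled in the missing library, which is exactly why this class of bug is so easy to miss.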

Now of course this could have been found if either libcompizconfig or compizconfig-python had a testsuite, but as I already said, a tinderbox run with testsuites enabled is probably not something that I would like to do on a daily basis.

Especially for old software, there are problems like endianness issues, 64-bit arches and PIC code that are almost impossible to figure out at build time, and that need to be checked while the software is in use. But it’s not just that. Binary packages, especially those of proprietary software, usually have no testsuite, so their executables are rarely checked at runtime for consistency. This becomes a problem because sometimes you have software that links against old versions of libraries. This is the same reason why adding the VirtualBox 2.1.0 binary package to the tree is going to take a while for Alessio: it uses the old ABI of libcap, which will require resurrecting, and maintaining, an old version of libcap just for that. (I have a few more issues to talk about regarding the new VirtualBox release, but I’ll get to that another time.)
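A very basic consistency check for a prebuilt executable is to look at what `ldd` says about it and pick out the libraries it cannot resolve. The following is only a sketch: the function name `missing_libraries` is mine, and the sample output is made up for illustration rather than taken from a real VirtualBox binary.

```python
def missing_libraries(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        # Lines without " => " (the vDSO, the dynamic loader) are skipped.
        if sep and target.strip().startswith("not found"):
            missing.append(name)
    return missing


# Made-up sample output, only here to show the format being parsed.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd3a5f2000)
\tlibcap.so.1 => not found
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f3c1a200000)
"""

print(missing_libraries(SAMPLE))  # → ['libcap.so.1']
```

In practice you would feed it the output of running `ldd` on the binary in question. This only tells you that a library is missing, though, not that the binary misbehaves with the version that is installed, so it is far from a replacement for actually running the software.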

And yet, the reasons why neither my tinderbox nor Patrick’s can replace a more thorough approach to package testing do not end here. But before proceeding, I have to make a distinction between the different approaches Patrick and I took. Patrick’s tinderbox removes all the superfluous packages from the system when installing a new one, which is very good for testing for missing dependencies; my method instead iterates over each of the packages in the tree and installs them one by one in the system, filling up the space, which can easily miss missing dependencies but provides more interesting results regarding the interaction between particular ebuilds.

So while my method glosses over broken runtime and build-time dependencies, like a missing dependency on pkg-config and similar, Patrick’s method is not going to hit problems like dev-scheme/chicken breaking most of the Mono packages (which would pick up /usr/bin/csc as the C# compiler rather than mcs), or collisions between unrelated packages.
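The difference in blind spots is easier to see on a toy model. The sketch below is purely illustrative: the package database, its field names and both functions are invented, and neither function reflects how either tinderbox is really implemented. It only shows why one strategy catches undeclared dependencies and the other catches file collisions.

```python
# Toy package database; every entry is invented for illustration.
# "declared" is what the ebuild lists, "needs" is what the build really uses.
PACKAGES = {
    "pkg-config": {"declared": [], "needs": [], "files": ["/usr/bin/pkg-config"]},
    "libfoo": {"declared": [], "needs": ["pkg-config"], "files": ["/usr/lib/libfoo.so"]},
    "tool-a": {"declared": [], "needs": [], "files": ["/usr/bin/frob"]},
    "tool-b": {"declared": [], "needs": [], "files": ["/usr/bin/frob"]},
}


def clean_run(packages):
    """Build each package with only its declared dependencies present."""
    problems = []
    for name, pkg in packages.items():
        installed = set(pkg["declared"])
        for dep in pkg["needs"]:
            if dep not in installed:
                problems.append((name, "missing dependency", dep))
    return problems


def accumulating_run(packages):
    """Install packages one after another and never remove anything."""
    problems = []
    installed = set()
    owner = {}
    for name, pkg in packages.items():
        # Whatever is already on the system silently satisfies the build.
        for dep in pkg["needs"]:
            if dep not in installed and dep not in pkg["declared"]:
                problems.append((name, "missing dependency", dep))
        for path in pkg["files"]:
            if path in owner:
                problems.append((name, "file collision with " + owner[path], path))
            else:
                owner[path] = name
        installed.add(name)
    return problems


print(clean_run(PACKAGES))
print(accumulating_run(PACKAGES))
```

With this data the clean run only reports libfoo’s undeclared use of pkg-config, while the accumulating run only reports tool-b colliding with tool-a; neither of them sees both problems.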

This means that either one of the two tinderboxes alone is not enough to find all the issues, and even the two of them together won’t be enough. Even adding AutoTua to that is just not going to cut it. As Jeremy said on a blog post of mine, we need humans (developers and users) to report issues. I’m starting to feel we also need some real numbers on how many users use each package. Yes, I know that’s going to be a popularity contest, and it’s likely that some people would just submit fake results, but even for tree cleaning it’s important to know whether packages have no bugs filed against them because they are good, or just because nobody has used them in a long time.

Oh, and so that you know, I currently have a little less than 1500 open bugs that I reported (and over 3000 bugs reported since I started contributing to Gentoo), and all of them were reported by hand; there are still issues that keep me from using scripts like pybugz. I’ll try to write about them; maybe Zac can find a solution to those, like he has been doing quite a lot lately for me. Thanks Zac!

3 thoughts on “Why a tinderbox is not enough (not even two)”

  1. Tinderboxing, standalone or distributed, makes sense. It will at least find SOME bugs. Better to have some automation than none.

     Anyway, Zac said “we need humans (developers and users) to report issues” … The key might be to allow more users to do more analysis without interaction with a dev. If a less experienced user encounters an error, he can often be unsure whether he caused the problem (misconfiguration) or whether the problem is in the package, and will tend to ignore it. To get these guys involved, I’d suggest making a package that would contain:

     - a script to create a local tinderbox. Sometimes lots of packages need to be rebuilt to test something or to track down a bug. Some users may fear that they might mess something up during the testing process and end up with a crippled system. Automatic tinderbox creation should help with that. The script could also have an option to mirror the current system and packages, so that users can test either in a clean tinderbox or in the setup where they encounter the problem.
     - a script that would automatically gather all relevant data, write it to some text files and compress them. I’ve seen users fail to submit emerge --info and other relevant info many times; this could help get very detailed information about the system, the problematic package etc. Additionally, information from gdb could be gathered too (I’m not sure if gdb can be run non-interactively; if not, using an expect + gdb combination might help).

     Otherwise, it might help to write some documentation introducing users to bug chasing, i.e. summarize in one document what /var/log is about (usage of cat, grep, tail), how to use gdb (i.e. http://developer.pidgin.im/… ), how to get more info about the machine (/proc/*, lshw, lspci, dmesg, etc.), how to use scanelf … Maybe it’s already available somewhere, I didn’t really search for it …

  2. I don’t disagree that the combination of tinderboxing plus users is the key to the widest coverage. And I agree there is a need to provide users with better tools to report issues, so we can have more information to work on; I have actually been trying to clear up my mind on something related to that since your first comment on the matter.

     As far as documentation is concerned, there is “the meaningful backtrace guide”:http://www.gentoo.org/proj/… that provides most of the basic information.

     If you also read “my post about remote debugging”:https://blog.flameeyes.eu/2… and “the one about debugging information”:https://blog.flameeyes.eu/2… you can start to see a _fil rouge_ in my push to provide users with more powerful debugging means, for servers too.

  3. BTW, Lucas Nussbaum from Debian has access to Grid5000, a cluster of 5000 amd64 nodes in France. If you ever want to run something more time-consuming that could benefit from parallelism, you might try asking him to run a job for you.
