“Put it there and wait for users to break” isn’t a valid QA method

So it seems like Jeremy feels we’re asking too much of maintainers by asking them to test their packages. As I said in the comments on his blog, picking poppler as a case study is a very bad idea because, well, poppler has history.

Historically, recent upstream poppler releases have been handled by the guys working on the KDE frontends, when it suited their needs, rather than being coordinated with other developers. This isn’t much of a problem for them, because binary distributions have no trouble shipping two versions of the same library with different ABI versions, and even building packages against different versions is quite okay for them; it definitely isn’t for us.

In turn, within Gentoo, poppler has recently been maintained by the KDE team; mostly by the one developer who, a couple of days ago and not for the first time, wanted to bump it to 0.16.0 without first adding a mask and ensuring that all the packages in the tree that use poppler would work correctly with it. Jeremy states that doing such a test would make ~arch akin to stable; I, and I’m sure others among my colleagues, rather see it as “not screwing users up”.

First of all, this is not like bumping something that you expect to be trouble-free. Bumping grep and finding out that half the packages whose tarballs were built on Gentoo with a very, very old version of our elibtoolize patches now fail is unexpected. Finding out that, once again, the new minor version of poppler has API breakage is not surprising; it’s definitely expected. That means that when committing the version bump you can have only one of two thoughts in mind: “I don’t care whether GNOME packages break” or “Ooh shiny!”. In either case, you should learn that such thoughts are not the right state of mind for committing something to the tree. I said that before about two other developers, and I maintain my opinion now.

And just to be clear, I’m not expecting that we spend months in overlays before reaching the main tree; I’m just saying that if you know it’ll break something, you should mask it beforehand and give a heads-up to the developers maintaining the other packages, so that they can do their job and fix it.

Now, in the comments of that post, Jeremy insists that factoring in my tinderbox is not an option for him, because it is “counting on a single developer cpu/time”. Right, because there is no other way to test reverse dependencies, huh? The tinderbox is meant to help the general situation, but it’s definitely not the only way; even though I’d have been glad to run it for poppler to find and report the problems, the task of checking its reverse dependencies is far from impossible for a single developer to deal with. There are a grand total of … thirty-nine (39!) packages that depend on poppler! So it’s not a “can’t be done” situation, it’s a “can’t be arsed”. This also brings Jeremy’s ratio of “7/14000” packages with a problem to a more worrisome 7/39. See the difference?
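To put numbers on that difference, here is a quick back-of-the-envelope sketch. The three figures are the ones quoted above; the variable and function names are only illustrative.

```python
# Figures quoted in the post: 7 packages with a problem, roughly 14000
# packages in the whole tree, 39 packages that depend on poppler.
broken = 7
whole_tree = 14000
reverse_deps = 39


def failure_rate(failed: int, total: int) -> float:
    """Return the share of failed packages as a percentage."""
    return 100.0 * failed / total


tree_rate = failure_rate(broken, whole_tree)
revdep_rate = failure_rate(broken, reverse_deps)

print(f"measured against the whole tree:         {tree_rate:.2f}%")
print(f"measured against poppler's reverse deps: {revdep_rate:.1f}%")
```

The same seven failures are a rounding error against the whole tree, but close to one in five of the packages that actually link against the library.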

Simply put, do your own homework: learn about the package you’re maintaining, try hard not to break reverse dependencies; if it happens that you break a package unexpectedly, hey, it’s ~arch, we can deal with it. But do not commit stuff that you know will break a good deal of the packages depending on yours. Especially if you’re not out there to fix them yourself.
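As a rough illustration of how small that homework can be, the sketch below collapses a list of versioned reverse dependencies into one entry per package, which is the list a maintainer would want to build-test before a bump. It assumes the input is one `category/package-version` atom per line, optionally followed by other text, roughly what a tool such as gentoolkit’s `equery depends -a` (mentioned in the comments below) prints; that input format, the simplified version pattern and the package names in the sample are assumptions made for the example, not a description of any tool’s exact output.

```python
import re

# Simplified pattern for a versioned atom such as "app-office/foo-1.2-r1".
# This is an approximation for the sketch, not the full Gentoo version syntax.
ATOM_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9+_.-]+/[A-Za-z0-9+_-]+?)"  # category/package
    r"-(?P<version>\d+(?:\.\d+)*[a-z]?"              # 1.2.3 or 1.2b
    r"(?:_(?:alpha|beta|pre|rc|p)\d*)*"              # _rc1, _p2, ...
    r"(?:-r\d+)?)$"                                  # ebuild revision
)


def unique_packages(lines):
    """Collapse versioned atoms to a sorted list of category/package names."""
    names = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        match = ATOM_RE.match(parts[0])
        if match:  # header lines and other noise simply do not match
            names.add(match.group("name"))
    return sorted(names)


# Hypothetical sample input; these package names are made up.
sample = [
    " * These packages depend on app-text/poppler:",
    "app-office/example-suite-1.0 (app-text/poppler)",
    "app-office/example-suite-1.1-r2 (app-text/poppler)",
    "kde-base/example-viewer-4.5.4 (>=app-text/poppler-0.12)",
]

for package in unique_packages(sample):
    print(package)
```

With a list of a few dozen names like this, build-testing each package against a masked bump before unmasking it is an afternoon’s work, not a tinderbox-sized job.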

17 thoughts on “‘Put it there and wait for users to break’ isn’t a valid QA method”

  1. I would agree with the entire post, except “I don’t care whether GNOME packages break”; that’s kosher.

  2. Also, http://bugs.gentoo.org/show…

     There have been a number of bumps that change API/ABI without checking reverse dependencies.

     http://bugs.gentoo.org/show…

     A patch is applied without checking the package builds.

     There is a big difference between a distro and a method of tracking upstream release notifications, downloading tarballs and expecting everything to build and work.

     Is ~arch supposed to be a distro, or a “spray and pray”?

  3. Sigh, I guess I need to defend myself here too, since I’ve had more than one prospective employer read my blog and I can’t afford to have my name dragged around like this. So here is where you mentioned my name and my replies…

     > “So it seems like Jeremy feels we’re asking too much of maintainers by asking them to test their packages.”

     I was hoping for a discussion of what arch/~arch should mean. I tried to provoke this by my “hard questions” at the end. The bottom line is that I don’t care so much about *this* issue (poppler-0.16.0) but rather about future ones. The related baggage in the post concerned the QA team; it was based on an IRC conversation that you weren’t present for and that I did not document, because I felt no need to get names involved. Anyway, there were a number of devs present for the conversation and I was not being a lone wolf or devil’s advocate in my ideas. Provoking interesting discussion cannot be counted as a fault. [I wasn’t trolling or flaming, either]

     Nowhere did I say that maintainers don’t need to test; that wouldn’t be too responsible.

     > “Jeremy states that doing such a test would make ~arch akin to stable”

     Again, as I told another dev, I alluded to the fact that ~arch is becoming a stable tree [in my opinion]. I did not explicitly say that doing any test would make ~arch akin to arch.

     I, and others, feel that Gentoo Users and Gentoo Devs have different concepts of what arch/~arch means. This isn’t good for anyone.

     > “Jeremy insists that factoring in my tinderbox is not an option for him”

     As I said, relying on one man is not viable. I even thanked you for your efforts multiple times and appreciate what you do. I did not “insist” or “dis” anything involving you. I simply cannot accept the “ask Diego to do $X” as a REQUIRED step in the QA process. I don’t think you want that responsibility either. At least for me, that would become a job, and jobs are not as fun as what you want to do yourself.

     > “This also brings Jeremy’s ratio of “7/14000” packages with a problem to a more worrisome 7/39.”

     Noted. Wouldn’t it be nice if there was a tool for maintainers that found those (39) and compiled them? The only way I know of right now is manually looking it up. I’ll discuss this with the dev-portage people to see if it is an enhancement possibility. I wish to not talk about specific numbers though, because no two groups will agree on a magic number of acceptability that they are satisfied with.

     On a final note, I’m looking forward to more collaborative, productive conversations in the future between us. I’ll try to do a better job too, of course.

  4. Speaking as a Gentoo user with about 5 years’ worth of experience with Gentoo, most of which I have spent on ~x86 and ~amd64, I expect ~arch to be reasonably unstable, that is, some minor and irregular/unintentional program b0rks that are fixed within a few days or weeks at worst are expected and fine, but stuff that makes the system unbootable, very unusable or makes you just think “here we go again” when you just know that trouble is ahead (as in the case of libpng, for example).

     Thank god that I, much like other ~arch users, also use Portage 2.2, and therefore things are a lot better than they were, say, in 2006, when upgrades that left the whole system unusable happened at least once or twice a year.

  5. Lost end of sentence while rephrasing:

     …but stuff that makes the system unbootable, very unusable or makes you just think “here we go again” when you just know that trouble is ahead (as in the case of libpng, for example)… is not really fine (but if I wasn’t willing to withstand that for the sake of being on the bleeding edge, I wouldn’t be using Gentoo in the first place, or at least not ~arch).

  6. Diego, I agree with your points completely. Sure, running ~arch should have some level of risk, and I think our users are comfortable with that. However, that doesn’t mean that we should commit unmasked builds into ~arch when they are as likely as not to cause serious problems.

     Taking out significant portions of entire desktop environments would be a serious problem.

     Here is how I view the appropriate level of quality in Gentoo:

     ~arch: generally should be usable by a power-user who won’t get too flustered if once every month something odd happens, which is either fixable by checking a bug and getting a reasonably quick answer, or by waiting a few days at most and doing an emerge -uD world. Even in ~arch packages should not stay broken for weeks on end. ~arch should be release-candidate level of quality, not alpha. The expectation should be that normally packages flow from ~arch to stable without further work.

     stable: any distro-related problem at all in a package is a failure to some extent in Gentoo QA (the process, not the team, though the team should be kicking butt to keep the process in gear). Upgrades that require non-obvious work to deal with should ALWAYS have a news item warning about them in advance. Gentoo isn’t Debian, and we don’t need to backport patches to Firefox v1.2, but stable should still be suitable as a starting point even for serious production environments. Sure, we aren’t 100% there yet, but we do pretty well.

     Nobody is in a better place to anticipate problems than the package maintainer: they are the ones with the history on the package and they should have some idea when something could go wrong.

     I don’t see ~arch as another stable tree. I think the problem is that people think that stable means “I don’t think this will cause problems.” Stable really means “this has been demonstrated to not cause problems.” If it hasn’t been used seriously, then it isn’t stable, period. So, ~arch will NEVER be stable (even if it were bug-free) by virtue of the fact that packages hit it first, so they are unproven by definition.

  7. > Wouldn’t it be nice if there was a tool for maintainers
     > that found those (39) and compiled them?

     Well, it doesn’t compile them, but listing them is simple enough:

     equery depends --all-packages (-a) PKG

     Now that lists individual package versions, but that’s useful info as well.

     By my (quick) count that lists 41 packages, but a couple of those might be renames, or maybe a couple are masked, or I was just too quick and double-counted two versions of a package. It’s ball-park to the 39 mentioned above, however.

     Using the --indirect (-i) option can be informative too, as that’ll yield “leaf” dependencies, arguably more apropos to a “breaks the tree” claim. (Caution, equery depends -a -i will take a LONG time. Plan to do something else for a while and come back for the results.) With this there’s also the --depth= option, tho for a “breaks the tree” claim you’d want the whole depth.

     I’d argue that another option, to list only the top version of each package, could be useful as well. Presumably it’d have sub-options to choose top version period, top-~arch and top-stable (for some particular arch). This would make the output rather more manageable.

     Arguably, a package maintainer should be reasonably familiar with at least the equery depends -a output, as it’s his package’s direct reverse-dependencies, thus, what breaks (directly) if his package breaks. If, as here, it’s only a few packages, it’s feasible to build-test them all when one does bumps, and that’s what Flameeyes appears to be arguing for, tho obviously there’s some number at which that’s no longer feasible, and toolchain packages will need a different testing strategy entirely. Regardless of whether build-testing them all is feasible or not, I’d argue that at a minimum, a maintainer should have some idea of how to get that list, and what’s in it in general.

  8. 1) Thank you to all the QA guys for all of your hard work, even when things slip by (I’m looking at you, 32 bit libjpeg and libjpeg, not being bumped at the same time 2-5 times now, breaking wine every time).

     2) I can see both sides of this “argument”. I wonder if a “dump a list suitable for rebuilding” option should be added to equery depends. That would allow a very simple wrapper for emerge, so you could test building packages at least. I’ll look into it and see if it is simple.

     3) I posted this question on Jeremy Olexa’s blog as well. Is there a way for a power user to discover non-live masked versions of packages already installed on the system?

     Number 3 preferably would have a wrapper that would allow for “testing a p.mask’ed package and filing a bug, and restoring the old package” all in one shot. If #2 was working, it could even be incorporated for helping to test libs and reverse dependencies (just the ones installed on the user’s system by default).

  9. >> “Jeremy insists that factoring in my tinderbox is not an option for him”

     > As I said, relying on one man is not viable. I even thanked you for your efforts multiple times and appreciate what you do. I did not “insist” or “dis” anything involving you. I simply cannot accept the “ask Diego to do $X” as a REQUIRED step in the QA process. I don’t think you want that responsibility either. At least for me, that would become a job and jobs are not as fun as what you want to do yourself.

     Sounds about true, but wouldn’t it be nice if you had a public tinderbox (farm) where you could look at upcoming breakage for the testing/stable tree? Or is that idea totally useless?

  10. My 2 cents on how (as a user) I’ve viewed arch vs ~arch over the past 8ish yrs:

      arch: best stability from what is available.

      ~arch: all dependencies are met and expected to build correctly. Candidate for stable if no issues come up, or if it’s found to be more dependable than current stable. (i.e. last call for bug reports and unanticipated corner cases)

      masked: untested, dependencies may not be available or work. Staging area while dependency issues are worked out.

  11. Well… It is difficult for me to see that devs/maintainers are actually even enabling the ‘QA’ flags and fixing anything, since the prevailing sentiment seems to be ‘let the users file upstream bugs’.

      Even for such things as missing headers, and ‘breaks strict-aliasing rules’.

      The sheer number of string format errors in some packages is mind-boggling, and at present there are no QA checks for those.

  12. Personal name calling is not the most productive way of sharing ideas, especially when it’s misdirected.

      Jeremy was certainly not a lone wolf during the mentioned IRC discussion. It was actually myself explaining the basics (one would think..) of integration tests and why running full ~arch is not helping.

      Jeremy happened to agree with me and he was very kind to summarize all those ideas and publish them on his blog.

      > I and I’m sure other of my colleagues, rather
      > see it as “not screwing users up”.

      That’s the fundamental problem: users are not supposed to run full ~arch.

  13. Too bad that _users do and users will_, and *you* are not supposed to commit crap to ~arch just because you feel like it. Either you learn that, or as soon as a strong QA is in place you’re going to suffer from it.

      Your personal behaviour has caused problems for many users up to now; if it were up to me, your commit rights would have been revoked a long time ago already.

  14. I’m not going to repeat myself; Jeremy already explained the basics of integration tests (of ~arch packages in the stable tree) on his blog. If you or anyone else is unwilling to learn, that’s not my problem. I’m not going to support two independent stable trees. Period.

      Please, feel free to request revoking of my commit rights when you feel ready to enter a discussion on release coordination using points not based on your feelings. Thanks.

      A package that happens to cause reverse dependency issues in the testing tree is not crap. The testing tree is purposely kept for that reason: to detect such issues.

      Let me rephrase it: nobody is supposed to run full ~arch. It’s unproductive from an integration testing point of view. Those who are silly enough to do it should be ready to grab the pieces.

  15. In my experience with Gentoo, if a bug turns up in ~arch the usual reply is ‘we accept patches’ and ‘report to upstream’, where you usually need to create another account and file a report, but first you need to pull the latest svn or git clone and check it there. After searching bug reports, of course.

      Debian has a new QA thing going now too, http://qa.debian.org/daca/, with a nice list of tools. I’ve been trying out cppcheck, which is in portage. It gives warnings for memory leaks etc.; I haven’t tried the others yet.

  16. As a 4 year Gentoo user, I remember reading in the forums early on and coming away with the thought that most 64bit users were using ~amd64. The thought being that running a stable amd64 made it harder to run up to date packages. I have not minded doing so myself, because I was always able to fix the issues that came up, even major ones, with the help of the Gentoo community.

      This begs the question now though: is that perception now outdated? Since my Gentoo systems were always personal and in constant flux, the unstable branch was fine. Now it seems the stable 64bit branch should be emphasized over the unstable one for general use.

      Just a thought, somewhat off topic…
