Gentoo’s Quality Status

So I have complained that the fallout from the libpng 1.4 upgrade could for the most part have been avoided by using --as-needed (which is now finally enabled by default in the profiles!) and by starting much earlier to remove the .la files that I have been writing about for so long.

I also had to quickly post about Python breakage; we should all be glad that I noticed it by pure chance, and that Brian was around to find a decent fix to trickle down to users as fast as possible.

And today we’re back with a broken, stable Python caused by the blind backport of over 200k of patches from the Python upstream repository.

And in all this, the QA team only seems to have me complaining aloud about the problem. I started writing a personal QA manifesto, so that I could officially request new elections on the public gentoo-qa mailing list, but I’m currently finding it hard to complete; Mark refused to call the elections himself, and no one else in the QA team seems to think that we’re being too “delicate”.

But again, all the fuck-ups with Gentoo and Python could have been seen from a mile away; we’ve had a number of eclass changes, some of which broke compatibility, packages trying to force users into using the 3.x version of Python that even upstream considers not yet ready for production use, and an absolute rejection of working together with others. Should we have been surprised that sooner or later shit would hit the fan?

So we’re now looking for a new Python team to pick up the situation and fix the problem, which will require more (precious) Tinderbox time to make sure we can pull this off without risking breaking more stuff in the middle of it. And as someone already said to me, whoever is going to pick up Python again will need to replace the “new” Python packaging framework with something more similar to what the Ruby team has been doing with Ruby NG, which could actually have been done once and for all before this.

Now, thankfully there are positive developments; one is --as-needed entering the defaults, if not yet as strongly as I’d have liked; another is Alexis and Samuli asking me specifically for OCaml and Xfce tinderbox runs to identify problems beforehand, and now Brian with the new Python revision.

Markos is also trying to raise awareness of packages that do not respect the LDFLAGS variable; since the profile sets --as-needed in that variable, a build that ignores the variable ignores the flag too. (My method of using GCC_SPECS actually sidesteps that problem entirely.) I’ve opened a bunch of bugs on the subject today as I added the test to the tinderbox; it’s going to be tricky, because most of the Ruby packages respect the flags set when the Ruby implementation was built, rather than their own, as those are saved in a configuration file. This is a long-standing problem and not limited to Ruby, actually. I’ve been manually working around the problem on some extensions such as eventmachine, but it’s tricky to solve in a general way.
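To make “ignoring the variable” concrete, here is a minimal sketch; it is not the test I run on the tinderbox. It only compares a link command line against the flags that LDFLAGS asked for. The function name, the LDFLAGS value and the two command lines are made up for illustration.

```python
import shlex


def missing_ldflags(command, ldflags):
    """Return the LDFLAGS entries that do not appear in a link command."""
    args = set(shlex.split(command))
    return [flag for flag in shlex.split(ldflags) if flag not in args]


# Illustrative values only: an assumed LDFLAGS setting and two
# hypothetical link commands as they might appear in a build log.
LDFLAGS = "-Wl,-O1 -Wl,--as-needed"
respecting = "gcc -Wl,-O1 -Wl,--as-needed -o demo demo.o -lpng"
ignoring = "gcc -o demo demo.o -lpng"

print(missing_ldflags(respecting, LDFLAGS))  # []
print(missing_ldflags(ignoring, LDFLAGS))    # ['-Wl,-O1', '-Wl,--as-needed']
```

A build system that hardcodes its own link line produces the second case: the flag is set in the profile, but it never reaches the linker.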

And this is without adding further problems, such as the one pointed out by Kay and Eray, which I could have found earlier if I had more time to work on my linking collision script; it is designed to find just those error cases, but it needs a lot of boring manual lookup to identify the issues.

Now, I’d like to be able to do more about this, but as you can guess, it already eats up enough of my time that I even have trouble fitting in enough work to cover the costs of running it (Yamato is not really cheap to work on, it’s power-hungry, has crunched a couple of hard disks already, and needs a constant flow of network data to work clearly, and this is without adding the time I pour into it to keep it working as intended). Given these points, I’m actually going to ask whether somebody can get either one of two particular pieces of hardware to me: either another 16GB of Crucial CT2KIT51272AB667 memory (it’s Registered ECC memory) or a Gigabyte i-RAM (or anything equivalent; I’ve been pointed at the ACard’s ANS-9010 as an alternative) device with 8 or 16GB (or more, but that much is good already) of memory. Either option would allow me to build on a RAM-based device, which would reduce the build time and make it possible to run many, many more tests.

21 thoughts on “Gentoo’s Quality Status”

  1. I pulled in that broken Python, built a few things, saw your rss, re-synced, rebuilt Python without noticing any issues. Overall my gentoo python experience has been much better than some other things like java. Gentoo seems to handle multiple python versions better than other distros. I’m basically free to use python3 for personal stuff using mostly standard libs without breaking the system. I also haven’t had any issues with python3 + portage 2.2 since it became possible to do so. Unhandled exceptions (usually the fault of upstream) are probably the most common thing I run across with Python packages and Gentoo. Most of the time it’s just a matter of adding IndexError or similar to an except clause.

  2. I now regret that we didn’t hunt down all the “not respecting LDFLAGS” bugs before we made --as-needed the default. *sigh*

  3. Well, i-ram, Acard’s ANS-9010… for me they are all the same in the end, guess I should change the text a bit. But anything able to build at RAM speed is going to be a godsend for the tinderbox, simply put.

  4. Not enough ram on the system and if I were to enable swap it would be worse, because the I/O hit of using swap will grind everything to a halt way before most testsuites complete.

  5. Given I cannot afford anything at all for the tinderbox, whatever users would be able to spare. And I’m pretty sure _that_ is not going to fly that well…

  6. I think this is enough for me to finally move away from gentoo :( Its QA has been awful lately, and I’m getting tired of managing non-broken items myself.

  7. It would be nice if some corporate sponsor were to actually pay for QA to work on this stuff. Unfortunately it seems like nearly nobody is interested in helping to provide a stable Gentoo…

  8. And we all know that, but if we had more horsepower for QA we could have checked for backward-compatibility faster. Seriously, the one developer screwed up tremendously, and there is no excuse for that; Brian is now going to work on sorting out the issue. But for better quality, we need better QA; and better QA requires more horsepower, and more eyes.

  9. That is enterprise-grade hardware; it needs a corporate sponsor indeed, and that will only happen if we become friendlier to corporate use. As far as I can see, that is hard work if we cannot provide a good way to redistribute binaries; that is a big shortcoming for corporate use, as they don’t want to rebuild all packages on thousands of hosts.

  10. Actually, we have a number of ways we can provide binaries; I maintain two almost identical (in hardware, not software) vservers with a single build host. The problem is that when things break, automatic rebuilding is also broken; thus, we need better QA. Sincerely, I don’t think that it would be too much if even just a couple of businesses that actually use Gentoo in production (and I know there are quite a few) could spare a one-time €200 fee to make sure that their stuff won’t break further…

  11. Well, I still don’t think this is right, but I’ll stop commenting :( You are basically saying “please buy me more hardware, so that it’s easier to babysit other devs from breaking the tree.” To me, while it might work, it’s a band-aid to a FAR bigger problem.

  12. No, what I’m saying is “please help us make sure the tree won’t break anymore”. Example:
      * crazy developer Alpha commits a broken Python version (and it’s not known to be broken by the other devs yet);
      * security needs a new version of Python stabled, and the target is the just-committed version;
      * arches won’t have time to make sure _everything_ works, and since it’s a security stable, the usual 30 days to find out what would break are not feasible;
      * the QA team launches a specific-aim tinderbox run for that particular Python version, and finds that it breaks a long series of packages;
      * QA, arches and security ask for a proper ebuild to stable, and for the broken one to be punted.
      Unfortunately right now this cannot happen because the tinderbox is a _personal_ project of mine that nobody’s funding constantly; it takes CPU-time to build all the Python reverse dependencies, and even more time to sort through the bugs. And since it’s personal I choose the target, the target being @~arch@ because that’s what I run on my workstation, and where _usually_ there is more breakage to sort through. Obviously a corporate-funded project would have the stable tree as the aim.
      And this does not mean that we’re not currently failing to act on such findings: the crazy developer Alpha above should already have been stopped from committing a few months ago, way before this situation really hit the fan. I’m not going to deny that *Gentoo’s QA team failed once again because we still try to keep a “delicate” approach*. I’m not going to deny that *there is neither technical nor social protection against crap entering the tree*, and both problems should be solved. But again, I still feel alone doing this stuff.
      If you try to “interpret” what I’m saying, I’ll tell you how all this feels from where I stand: “Do you expect to be thanked for spending more than half of your free time looking after the tree, after you allowed a broken python to enter stable?” I would expect at least a “thank you”, yes… I know we as a whole failed, but you can’t say that I alone am not doing more than my share to get things working properly.

  13. Diego, you are doing an excellent job with the tinderbox for the whole Gentoo community! …but I think that there should have been a tinderbox in gentoo-infra (i.e. not funded as a personal project) a long time ago, and I wonder why no one from gentoo-foundation is pushing for it. (Is it really that hard to get 1 or 2 good servers?) I think that having a tinderbox for the stable tree is a _key_ issue for Gentoo, because any dev that makes a mistake (and mistakes _do_ happen) can break the stable tree. A broken stable tree is pretty much a show-stopper for any corporate gentoo user and they just migrate to something else. On the other hand, Gentoo is a great distro for anyone who wants to get the max. out of his hardware (and corp. users like that too ;-) ), but if it means that something breaks now and then, then they will probably choose something more “stable”. I wonder how the FreeBSD project deals with ports, because I’ve personally never seen a broken ports tree. PS: who is/was the dev that broke the tree? Was it the one that went on bumping packages a few months ago? :)

  14. The problem with boxes for Gentoo Infra is that they need a long term plan; we should have one box ready to run a parallel tinderbox, but I haven’t been able yet to get access to it properly; I probably made a mistake in the first configuration there. That doesn’t mean that it would solve everything; even having two tinderboxes crunching at the same time is not going to help anybody if there isn’t a way to make sure the output is aggregated decently; I’m actually thinking of looking at what Måns wrote as “FATE frontend”:http://git.mansr.com/?p=fat… as it might as well be suitable for the task of reporting the logs. Also, given that the “identi.ca reporting bot”:http://identi.ca/flameeyest… that I’ve been using was banned from the server ten days ago, to coordinate more tinderboxes it might be a good idea to have a frontend running a status.net instance.

  15. Flameeyes, do you need money to purchase the memory, or do you need someone to purchase the memory and send it to you since you don’t have access to such devices? How much does it cost (in US dollars please)?

  16. Randy, the hardware itself would probably be easier, since I wouldn’t have to pass through the income taxes for that… Storage-wise, someone already offered me an SSD so it might at least reduce the problem; more memory is still welcome as it could allow for multiple runs in parallel more easily. The Crucial memory noted above is priced US $379.99 directly from “Crucial”:http://www.crucial.com, but I’d need two of those kits to keep the two CPUs in sync.
