Rose Tinted Glasses: On Old Computers and Programming

The original version of this blog post was going to be significantly harder to digest and it actually was much more of a rant than a blog post. I decided to discard that, and try to focus on the positives, although please believe me when I say that I’m not particularly happy with what I see around me, and sometimes it takes strength not to add to the annoying amount of negativity out there.

In the second year of the Coronavirus pandemic, I (and probably many more people) have turned to YouTube content more than ever, just to keep myself entertained in lieu of having actual office mates to talk with day in and day out. This meant, among other things, noticing the retrocomputing trend a lot more: a number of channels are either dedicated to talking about games and computers from the 80s and 90s, or seem to at least spend a significant amount of time on them. I’m clearly part of the target audience, having grown up with some of those games and systems and now being in my 30s with disposable income, but it does make me wonder sometimes about how we are treating that nostalgia.

One of the things that I noted, and that actually does make me sad, is seeing videos insisting that old computers were better, or that the people who used them were smarter, because many (Commodore 64, Apple II, BBC Micro) came with only a BASIC interpreter, and you were incentivised to learn programming to do pretty much anything with them. I think this thesis is myopic, lacking not just in empathy but also in understanding of the world at large. Which is not to say that there couldn’t be good ways to learn from what worked in the past, and make sure the future is better.

A Bit Of Personal History

One of the things that is clearly apparent watching different YouTube channels is that there are chasms between different countries when it comes to having computers available at an early age, particularly in schools. For instance, it seems like a lot of people in the USA had access to a PET in elementary or junior high school. In the UK, instead, the BBC Micro was explicitly designed as a learning computer for kids, and clearly the ZX Spectrum became the symbol of an entire generation. I’m not sure how much bias there is in this storytelling — it’s well possible that for most people, all of these computers were not really within reach, and only a few expensive schools had access to them.

In Italy, I have no idea what the situation was when I was growing up, outside of my own experience. What I can say is that, until high school, I had never seen a computer in school. I know for sure that my elementary school didn’t have any computers, not just for the students, but even for the teachers and admins; and it was in that school that one of the teachers took my mother aside one day and told her to make me stop playing with computers because «they won’t have a future». In junior high, there definitely were computers for the admins, but no student was given access to anything. Indeed, I knew that one of the laboratories (which we barely ever saw, and really never used) had a Commodore (either 64 or 128) in it. These were the same years in which I finally got my own PC at home: a Pentium 133MHz. You can see there is a bit of a difference in generations there.

Indeed, it might even sound strange that I had a Commodore 64 at all. As far as I know, I was the only one in my school who did: a couple of other kids had a family PC at home (which later I kind of did too), and a number of them had a NES or Sega Master System, but the Commodore’s best years were long gone by the time I could read. So how did I end up with one? Well, as it turns out, not as a legacy from anyone older than me, which would be the obvious option.

My parents bought the Commodore 64 around the time I was seven, or at least that’s the best I can date it. It was, to the best of my knowledge, after my grandfather died, as I think he would have talked a bit more sense into my mother. Here’s the thing: my mother has always had a quirk for encyclopaedias and other book collections, so when my sisters and I were growing up, the one thing we never lacked was access to general knowledge. Whether it was a generalist encyclopedia with volumes dedicated to the world, history, and science, or a “kids’ encyclopedia” that pretty much only covered material aimed at preteens, or a science one that went into the details of the state of the art of scientific thinking in the 80s.

So when a company selling a new encyclopedia, supposedly compiled and edited locally, called my parents up and offered a deal of 30 volumes, bound in a nice green cover and printed in full colour, together with a personal computer, they lapped it up fairly quickly. Well, mostly my mother did; my father was never one for books, and generally couldn’t give a toss about computers.

Now, to be honest, I have fond memories of that encyclopedia, so it’s very possible that this was indeed one of the best purchases my parents undertook for me. Not only was most of it aimed at elementary-to-junior-high ages, including a whole volume on learning grammar rules and two on math, but it also came with some volumes full to the brim with questionable computer knowledge.

In particular, the first one (Volume 16, I still remember the numbers) came with a lot of text describing computers, sometimes in detail so silly that I still don’t understand how they put it together: it is here that I first read about core memory, for instance. It also went into long detail about the videogames of the time, including text and graphical adventures. I really think it would be an interesting read for me nowadays, now that I understand and know a lot more about the computers and games of that era.

The second volume focused instead on programming in BASIC. Which would have been a nice connection to the Commodore 64, were it not that the language it described was not the one the Commodore 64 actually used, and it didn’t really go into the details of how to use the hardware, with POKE and PEEK and the like. Instead, it tried to describe some support for printers and graphics that never worked on the computer I actually had. Even when my sister got a (second) computer, it came with GW-BASIC, and that was also not compatible.

What the second volume did teach me, though, was something more subtle, which would take me many years to understand fully: that programming is, for most people, a means to an end. The very first example of a program in the book is a father-daughter exercise in writing a BASIC program to calculate the area of the floor of a room by splitting it into triangles and applying Heron’s formula. It led with a practical application, rather than teaching concepts first, and that may be the reason why I liked learning from it to begin with.

Now, let me rant for a moment as an aside: the last time I wrote something about teaching, I ended up tuning out of some communities, because I got tired of hearing someone complain that I cannot possibly have an opinion on teaching materials without having taught in academia. I have a feeling that this type of behaviour is connected to the hatred for academia that a number of us have. Just saying.

You may find it surprising that these random volumes of an encyclopedia my mother brought home when I could barely read would stay with me this long, but the truth is that I pretty much carried them along for many years. Indeed, the book had two examples that I nearly memorized, and that were connected to each other. The first was a program that calculated the distance in days between two dates, explaining how the Gregorian calendar works, including the rules for leap years around centuries. The second used this information to let you calculate a “biorhythm”, which was sold as some ancient Greek theory but was clearly just a bunch of “mumbo-jumbo”, as Adam Savage would say.
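The first of those two programs translates naturally into any language. A sketch in C of the day-counting logic, using the Gregorian rules the encyclopedia spelled out (my own reconstruction, not the book’s BASIC):

```c
/* The Gregorian leap-year rules: divisible by 4, except for centuries,
   except again for multiples of 400. */
static int is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
}

/* Days elapsed since 0001-01-01 in the proleptic Gregorian calendar. */
static long days_from_civil(int y, int m, int d)
{
    /* Cumulative days before each month in a non-leap year. */
    static const int cumulative[] =
        { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 };
    long days = (long)(y - 1) * 365
              + (y - 1) / 4 - (y - 1) / 100 + (y - 1) / 400;
    days += cumulative[m - 1] + (m > 2 && is_leap(y));
    return days + d - 1;
}

/* Distance in days between two dates. */
long days_between(int y1, int m1, int d1, int y2, int m2, int d2)
{
    return days_from_civil(y2, m2, d2) - days_from_civil(y1, m1, d1);
}
```

The century rule is the part the book made a point of explaining: 1900 was not a leap year, but 2000 was.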

The thing with this biorhythm idea, though, is that it’s relatively straightforward to implement: the way they describe it, there are three sinusoidal functions that track three “characteristics” over different period lengths, so you calculate the “age in days”, apply a simple mathematical formula, et voilà! You have some personalised insight that is worth nothing, but that some people believe in. I can’t tell for sure whether I ever really believed in those, or whether I was just playing along like people do with horoscopes. (One day I’ll write my whole rant on why I expect people may find horoscope sign traits believable. That day is not today.)

So, having a basis to play along with, I pretty much reimplemented this same idea over, and over, and over again. It became my go-to “hello world” example, and with enough time it allowed me to learn a bit more about different systems. For example, when I got my Pentium 133 with Windows 95, and one of the Italian magazines made Visual Basic 5 CCE available, I reimplemented it for that. When the same magazine eventually included a free license for Borland C++ Builder 1.0, as I was learning C++, I reimplemented it there. When I started spending more of my time on Linux and wanted to write something, I did the same.

I even got someone complaining that my application didn’t match the biorhythm calculated by some other app, and I had to find a diplomatic way to point out that there’s nothing scientific about either of those, so why should they even expect two apps to agree?

But now I’m digressing. The point I’m making is that I have, over the years, kept the lessons learned from those volumes with me, in different forms and in different contexts. As I said, it wasn’t until a few years back that I realized that, for most people, programming is not an art or a fun thing to do in their spare time: it’s just a means to an end. They don’t care how beautiful, free, or well designed a certain tool is, as long as the tool works. But it also means that knowing how to write some level of software empowers people. It gives them the power to build the tools they don’t have, or to modify what is already there but doesn’t quite work the way they want.

My wife trained as a finance admin; she used to be an office manager, and has some experience with CAFM software (Computer-Aided Facilities Management). Most CAFM suites allow extensions in Python or JavaScript, to implement workflows that would otherwise be manual and repetitive. This is the original reason she had to learn programming: even in her line of work, it is useful knowledge to have. It also comes with the effect of making it easier to understand spreadsheets and Excel — although I would say that there are plenty of people who may be great at writing Python and C, but would be horrible Excel wranglers. Excel wrangling is its own set of skills, and I defer to those who actually have them.

So Were Old Computers Better?

One of the often-repeated lines is that old computers were better, because either they were simpler to hold in one’s mind, or because they all provided a programming environment out of the box. Now, this is a particularly contentious point for me, because pretty much every Unix environment has always had that same ability to provide a programming environment. But I also think that the problem here is what I would call a “bundling of concerns”.

First of all, I definitely think that operating systems should come with programming and automation tools out of the box. But in fact, that has (mostly) been the case for me personally since the time of the Commodore 64. On my sister’s computer, MS-DOS came with GW-BASIC first (4.01), and QBasic later (6.22). Windows 98 came with VBScript, and when I first got to Mac OS X it came with some ugly options, but options nonetheless. The only operating system that didn’t have a programming environment for me was Windows 95, but as I said above, Visual Basic 5 CCE covered that need. It was even better with Active Desktop!

Now, as it turns out, even Microsoft appears to be working on making it easier to code on Windows, with Visual Studio Code being free, Python being available in the Microsoft Store, and all those trimmings. So it’s hard to argue that there aren’t more opportunities to start programming now than there were in the early ’90s. What might be arguable is that nowadays you do not need to program in order to use a computer. You can use a computer perfectly fine without ever having learnt a programming language, and you don’t really need to know the difference between firmware and operating system most of the time. The question becomes whether you find this good or bad.

And personally, I find it good. As I said, I find it natural that people are interested in using computers and software to do something, not just for the experience of using a computer. In the same way, I think most people use a car to go to the places they need to go, rather than just for the sake of driving. And in the same spirit, there are people who enjoy the feeling of driving even when they have no reason to drive, and there are people who want to make unnecessary things a requirement when it comes to computers and technology.

I wish I found it surprising, but I just find it saddening that so many developers seem to fall into the trap of thinking that, just because they became creative by writing programs (or games, or whatever), computer users who no longer have to learn programming must be less creative. John Scalzi clearly writes it better than me: there’s a lot of creativity in modern devices, even those that are attacked for being “passive consumption devices”. And a lot of that creativity is not about programming in the first place.

What I definitely see is a pattern of repeating the behaviour of the generation that came before us, or maybe the one that came before them, I’m not sure. I see a number of parents (though thankfully by no means all of them) insisting that, since they learnt their trade and their programming a certain way, their kids should have the same level of tools available, no more and no less. It saddens me, and sometimes even angers me, because it feels so similar to the way my own father kept telling me I was wasting my time indoors, and wanted me to go out and play soccer as he did in his youth.

This is certainly not only my experience: I have talked and compared stories with quite a few people over the years, and there’s definitely a huge number of geeks who have been made fun of by their parents, and left scarred by it. And some of them are going to do the same to their kids, because they think their kids’ choice of hobbies is not as good as the ones we had in the good old days.

Listen, I have said in the past that I do not want to have children. Part of it has always been the fear of repeating the behaviour my father had with me. So of course I should not be the one to judge what others who do have kids do. But I do see a tendency in some of them to rebuild the environment they grew up in, expecting that their kids will just pick up the same strange combination of geekiness they have.

At the same time I see a number of parents feeding the geekiness in their children with empowerment, giving them tools and where possible a leg up in life. Even this cold childfree heart warms up to see kids being encouraged to learn Scratch, or Minecraft.

What About All The Making, Then?

One of the constant refrains I hear is that older tools and apps were faster and more “creative”. I don’t think I have much in the way of qualifications to evaluate that. But I’m also thinking that, for the longest time, creativity tools and apps were only free if you pirated them. This is obviously not to dismiss the importance of FLOSS solutions (otherwise why would I still be writing on the topic?), but a lot of the FLOSS solutions for creativity appear to have a similar spirit to the computers of the ’80s: build the tools you want to be creative with.

I’m absolutely sure that there will be people arguing that you can totally be creative with Gimp and Inkscape. I have also heard a lot more professionals laugh in the face of such suggestions, given the important features those tools have lacked, for many years, in comparison with proprietary software. They are not bad programs per se, but they find their audience in a niche compared to Photoshop, Illustrator, or Affinity Designer. And this is not to say that FLOSS tools can’t become that good: I have heard the very same professionals who sneered (and still sneer) at Inkscape point out how Krita (which has a completely different target audience) is a fascinating tool.

But when we look back at the ’90s, not even many FLOSS users would consider Gimp a useful photo-editing tool. If you didn’t have the money for the creative software, your options were most likely a pirated copy of Photoshop, or, if you were lucky and an Italian magazine gave it away, a license for Macromedia xRes 2.0. Or maybe FreeHand. Or Micrografx Windows Draw!.

The thing is, a lot of free-but-limited tools online are actually the first time that a wide range of people have finally been able to be creative. Without having to be “selected” as a friend of Unix systems. Without having to pirate software to be able to afford it, and without having to pony up a significant investment for something that they may not be able to make good use of. So I honestly welcome that, when it comes to creativity.

Again: the fact that someone cannot reason around code, or the way that Inkscape or Blender work, does not mean that they are less creative, or less skilled. If you can’t see how people using other tools are being just as creative, you’re probably missing a lot of points I’m making.

But What About The Bloated Web?

I’ve been arguing for less bloat in… pretty much everything, for the past 17 years, on blogs and other venues. I wrote tools to optimize (even micro-optimize, in some cases) programs and libraries so that they perform better on tiny systems. I have worked on Gentoo Linux, which pretty much allows you to turn off everything that can possibly be turned off, so you can build the most minimalistic system you can think of. So I really don’t like bloat.

So is the web bloated? Yes, I’d say so. But not all of it is bloat, even when people complain about it. I see people suggesting that UTF-8 is bloat. That dynamic content is bloat. That emojis are bloat. Basically anything they don’t need directly is bloat.

So it’s clearly easy to see how your stereotypical 30-something, US-born-and-raised, English-only-speaking “hacker” would think that an unstyled, white-on-black (or worse, green-on-black) website in ASCII would be the apotheosis of the usable web. But that is definitely not what everyone would find perfect. People who speak languages needing more than ASCII exist, and are out there. Heck, people for whom UTF-8’s optimization for ASCII is the actual bloat (compared to UTF-16) are probably the majority of the world! People who cannot read on a black background exist, and they are sometimes developers themselves (I’m one of them, which is why all my editors and terminals use light backgrounds: I get migraines from black backgrounds and dark themes).

Again, I’m not suggesting that everything is perfect and nothing needs to change. I’m actually suggesting that a lot needs to change, but not everything. So if you decide to tell me that Gmail is bloated and slow, and use that as the only comparison to ’90s mail clients, I would point out to you that Gmail has tons of features meant to keep users from shooting themselves in the foot, that it is a lot more reliable than Microsoft Outlook Express or Eudora (which I know has lots of loyal followers, though I could never get behind it myself), and also that there are alternatives.

Let me beat this dead horse a bit more. Over on Twitter, when this topic came up, I was given the example of ICQ vs Microsoft Teams. Now, the first thing is, I don’t use Teams. I know that Teams is an Electron app, and I know that most Electron apps are annoyingly heavy and use a ton of resources. So, fair, I can live with calling them “bloated”. I can see why they chose this particular route, and I disagree with it, but there is another important thing to note here: ICQ in 1998 is barely comparable with a tool like Teams, which is pretty much a corporate beast.

So instead, let’s try to compare something a bit closer: Telegram (which I’m already known to use, rather than anything on which I would have a conflict of interest). How fast is Telegram to launch on my PC? It’s pretty much a single click to start, and it takes less than a second on the beast that is my Gamestation. It also takes less than a second on my phone. How long did ICQ take to load? I don’t remember, but quite a lot longer, because I remember seeing a splash screen. Which may as well have been timed to stay on the screen for a second or so because the product manager requested it, as happened at one of my old jobs (true story!)

And on that note, would ICQ provide the same features as Telegram? No, not really. First of all, it was just messages. Yes, it’s still instant messaging, and in that it didn’t really change much, but it didn’t have the whole “send and receive pictures” experience we have in modern chat applications: you ended up having to do peer-to-peer transfers, and good luck with that. It also had pretty much *no* server-side support for anything, at least when I started using it in 1998: your contact list was entirely client-side, and even the “authorization” to add someone to your friend list was a simple local check. There were plenty of ways to avoid these checks, too. Back in the day, I got in touch with a columnist from the Italian The Games Machine, Claudio Todeschini (who I’m still in touch with, though, because life is strange, we met in person in a completely different situation many, many years later); the next time I reinstalled my computer, having forgotten to back up my ICQ data, I didn’t have him in my contacts anymore, and, unsure whether he would remember me, I actually used a cracked copy of ICQ to re-add him to my contacts.

Again, this was the norm back then. It was a more naive world, where we didn’t worry as much about harassment, we didn’t worry as much about SWATting, and everything was just, well, simpler. But that doesn’t mean it was good. It only meant that if you did have reason to worry about harassment, if someone was trying to track you down, if the technician at your ISP was tapping your TCP sessions, they would be able to do it. ICQ was not encrypted for many years after I started using it: not even client-to-server, let alone end-to-end like Telegram’s secret chats (and other chat clients) are.

Someone joked about trying to compare software running on the same machine to judge the performance fairly, but that is an absolute non sequitur. Of course we use a lot more resources in absolute terms compared to 1998! Back then I still had my Pentium 133MHz, with 48MiB of RAM (I upgraded!), a Creative 3D Blaster Banshee PCI (because there were no AGP slots, and the computer came with a Cirrus Logic card that was notorious for not working well with the Voodoo 2), and a radio card (I really liked radio, okay?). Nowadays, my phone has an order of magnitude or two more resources, and you can find 8051 microcontrollers just as fast.

Old tech may be fascinating, and easier to get into when it comes to learning how it all fits together, but usable modern tech is meant to make more and more trade-offs in favour of the users. That’s why we have UIs, that’s why we have touch inputs, that’s even why we have voice-controlled assistants, much as a number of tech enthusiasts appear to want to destroy them all.

Again, this feels like a number of people are yelling “Kids these days”, and repeating how “in their days” everything was better. But also, I fear there are a number of people who just don’t appreciate how a lot of the content you see on YouTube, particularly in the PC space of the ’90s and early ’00s, is not representative of what we experienced back then.

Let me shout out to two YouTubers that I find are doing it right: LGR and RetroSpector78. The former is very open to point out when he’s looking at a ludicrous build of some kind, that would never be affordable back in the day; the latter is always talking about what would be appropriate for the vintage and usage of a machine.

Just take all of the videos that use CF2IDE or SCSI2SD to replace the “spinning rust” hard drives of yore. This alone is such a speed boost for loading things that most people wouldn’t even imagine it. If you were to load a program like Microsoft Works on a system that would be period-correct except for the storage, you would experience a significantly different loading time than people did back in the day.

And, by the way, I do explicitly mean Microsoft Works, not Office, because, as Avery pointed out on Twitter, the latter was optimized for load speed — by starting a ton of processes early on, trading memory usage for startup speed. The reason why I say that is because, short of pirated copies of Office, most of the people I knew in the ’90s could at best use Works, because it came pre-installed on their systems.

So, What?

I like the retrocomputing trend, mostly. I love Foone’s threads, because one of the most important things he does is explain stuff. And I think that, if what you want is to learn how a computer works in detail, it’s definitely easier to do that with a relatively uncomplicated system first, and build up to more modern systems. But at the same time, I think there are plenty of abstractions that don’t need to be explained if you don’t want them to be. This is the same reason why I don’t think that using C to teach programming and memory is a great idea: you need to know too many details that newcomers are not actually meant to understand.

I also think that understanding the techniques used in both designing, and writing software for, constrained systems such as the computers we had in the ’80s and ’90s does add to the profession as a whole. Figuring out which trade off was and was not possible at the time is one step, finding and possibly addressing some of the bugs is another. And finally there is the point we’re getting to a lot lately: we can now build replacement components with tools that are open to everyone!

And you know what? I do miss some of the constrained systems, because I have personal nostalgia for them. I did get myself a Commodore 64 a couple of years ago, and I loved the fact that, in 2021, I can get the stuff I could never have afforded (or that didn’t even exist) back when I was using it: fast loaders, SD2IEC, a power supply that wouldn’t double as a bludgeoning instrument, and a SCART cable for a nice, sharp image, rather than the fuzzy one I used to get from the RF input.

I have been toying with the idea of trying to build some constrained systems myself. I think it’s a nice stretch for something I can do, but with the clear note that it’s mostly art, and not something that is meant to be consumed widely. It’s like Birch Books to me.

And finally, if you take only a single thing away from this post, it is that you should always remember that a usable “bloated” option will always win over a slim option that nobody but a small niche of people can use.

More threads is not always a good solution

You might remember that some time ago I took over unpaper, which I used (and sort of still use) to pre-filter the documents I scan to archive in electronic form. While I’ve been very busy in the past few months, I’m now trying to clear out my backlog, and it turns out that a good deal of it involves unpaper. There are bugs to fix and features to implement, but also patches to evaluate.

One of the most recent patches I received is designed to help with the performance of the tool, which is definitely not what Jens, the original author, had in mind when he came up with the code. Better performance is something that just about everybody would love to have at this point. Unfortunately, it turns out that the patch is not working out as intended.

The patch is two-fold: on one side, it makes (optional) use of OpenMP to parallelize the code in the hope of speeding it up. Given that most of the code is a bunch of loops, it seemed obvious that using multithreading would speed things up, right? Well, the first try after applying it showed very easily that it slows things down, at least on Excelsior, which is a 32-core system. While the first test would take less than 10 seconds to run without OpenMP, it took over a minute with it, spinning up all the cores for 3000% CPU usage.

A quick test shows that forcing the number of threads to 8, rather than leaving it unbound, actually makes it faster than the non-OpenMP variant. This means there are a few different variables in play that need to be tuned for performance to improve. Without going into profiling the code, I can figure out a few things that could go wrong with unchecked multithreading:

  • extensive locking while the worker threads run, either because they are all accessing the same memory pages, or because the loop needs a “reduced” result (e.g. a value has to be calculated as the sum of values computed within a parallelized loop); in the case of unpaper, I’m sure both situations happen fairly often in the looping codepaths;
  • cache thrashing: as the worker threads jump around the memory area being processed, memory is no longer fetched linearly;
  • something else entirely more complicated.
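The “reduced result” case from the first bullet is worth a sketch. Assuming a pixel-summing loop of the kind unpaper runs all the time (hypothetical function name, not unpaper’s actual API), OpenMP’s reduction clause gives each worker thread a private accumulator and combines the partial results at the end, instead of letting all threads contend on one shared variable:

```c
#include <stddef.h>

/* Sum all pixel values in a buffer. With the reduction clause, each
   thread sums into its own private copy of `sum`, and the copies are
   combined once per thread rather than locking once per pixel. Without
   OpenMP enabled, the pragma is ignored and the loop runs serially,
   producing the same result. */
long sum_pixels(const unsigned char *buf, size_t len)
{
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

Of course, none of this addresses the thread-count problem above: spawning 32 workers for a loop that finishes in microseconds costs more in startup and cache misses than the parallelism gains.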

Besides making it obvious that doing the “stupid” thing and just making all the major loops parallel is not going to work, this situation is bringing me to the point where I will finally make use of the Using OpenMP book that I got a few years back, and only started reading after figuring out that OpenMP was not yet ready for prime time. Nowadays, OpenMP support on Linux has improved to the point that it’s probably worth taking another look at it, and I guess unpaper is going to be the test case. You can expect a series of blog posts on the topic at this point.

The first thing I noticed while reading about the way OpenMP handles shared and private variables is that the const indication is much stronger when using OpenMP. The reason is that if you tell the code that a given datum is not going to change (it’s a constant, not a variable), it can easily assume that direct access from all the threads will work properly; the variable is shared by default among all of them. This is something that, for non-OpenMP programs, is usually inferred from the SSA form — I assume that, for whatever reason, OpenMP makes SSA weaker.

Unfortunately, this also means that there is one nasty change that might be required to make code friendlier to OpenMP, and that is a change in the prototypes of functions. The parameters to a function are, as far as OpenMP is concerned, variables, and that means that unless you declare them const, it won’t share them by default. Within a self-contained program like unpaper, changing the signatures of the functions so that parameters are declared, for instance, const int is painless — but for a library it would be API breakage.
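To make the prototype point concrete, here is a hypothetical unpaper-style loop (invented names, not the actual API). Declaring the dimension and threshold parameters const int documents that they cannot change inside the function, which is exactly what lets OpenMP treat them as safely shared across all worker threads without further annotation:

```c
/* Count pixels darker than a threshold, the kind of scan unpaper does
   when looking for black areas. `width` and `threshold` are const:
   every thread may read them directly, since they cannot change while
   the parallel region runs. The pragma is a no-op without OpenMP. */
long count_dark(const unsigned char *row, const int width,
                const int threshold)
{
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < width; i++)
        if (row[i] < threshold)
            count++;
    return count;
}
```

In a self-contained program, sprinkling const over parameters like this is cheap; in a public library header, the same change ripples out to every caller.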

Anyway, just remember: adding multithreading is not the silver bullet you might think it is!

P.S.: I’m currently trying to gauge interest in a volume that would collect, re-edit, and organize everything I have written up to now on ELF. I’ve created a page on Leanpub where you can note down your interest, and how much you think such a volume would be worth. If you would like to read such a book, just register your interest through the form; as soon as at least a dozen people are interested, I’ll spend all my free time working on the editing.

Why would an executable export symbols?

I think this post might be interesting for those people interested in trying to get all the performance power out of a box, without breaking anything in the process.

I’ve blogged before about the problems related to exporting symbols from final executables, but I haven’t really dug deep enough to actually provide useful information to developers and users about what those exported symbols represent, for an executable.

First of all, let’s start by saying that an executable under Linux, and on most modern Unixes, is the same kind of file as a shared object (a shared library, if you prefer; what would be called a DLL if you come from a Windows background). And exactly like shared libraries, executables can export symbols.

Exported symbols are resolved by the dynamic (runtime) linker through the process of dynamic binding, and they might collide. I’ll return to the way the runtime linker works at another time. For now, let’s just say that exported symbols require some extra steps to be taken during the execution of a program, and that this process takes time.

Executables don’t usually need to export symbols, and most of them don’t export any at all. There are rare cases where executables are required to export symbols, for instance because some of the libraries they link to use them as a “callback” from the library into the program, or because a C++ program needs them for RTTI to work properly; but most of the time, symbols are exported just because of libtool.

By default, when you link a program, its symbols are not exported: they are hidden and thus resolved directly at build time, at least for those symbols defined in the program’s own source files. When you move code into a convenience library built with libtool, though, something changes: the symbols defined inside that library are exported even when it is linked statically into the final executable.

This causes quite a few drawbacks and, as I said, is not usually needed for anything:

  • the symbols are resolved at runtime through dynamic binding, which takes time; even though each lookup is usually very cheap on a normal system, the repeated time wasted during dynamic binding can add up to a good deal of time;

  • the symbols might collide with those of a library loaded afterwards; this is, for instance, why recode breaks PHP;

  • using --gc-sections won’t help much, because exported symbols are seen as always used, and this might increase the amount of code added to the executable for no good reason;

  • prelink will likely set the wrong addresses for symbols that collide, which in turn cancels out the improvement of using prelink entirely, at least for some packages.

The easy solution is for software packages to actually check whether the compiler supports hidden visibility, and if it does, hide all the symbols except those in the public API of their libraries. In the case of software like cmake, which installs no shared objects, hidden visibility could be forced by the ebuild; but to give back to the community, the best thing is to get as much software as possible to use hidden visibility upstream, thus reducing the number of symbols exported by both binaries and shared libraries.
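For reference, this is roughly what the hidden-visibility approach looks like at the source level (the macro and function names here are made up for illustration): combined with building with -fvisibility=hidden, only the explicitly marked symbols end up exported:

```c
/* Hypothetical illustration of visibility control with GCC-style
 * attributes; the macro and function names are invented for this
 * sketch. Built with -fvisibility=hidden, everything defaults to
 * hidden, and only symbols marked "default" are exported. */
#if defined(__GNUC__)
# define MYLIB_PUBLIC __attribute__((visibility("default")))
#else
# define MYLIB_PUBLIC
#endif

/* Part of the public API: remains exported even with
 * -fvisibility=hidden. */
MYLIB_PUBLIC int mylib_add(int a, int b)
{
    return a + b;
}

/* Internal helper: hidden when built with -fvisibility=hidden, so it
 * cannot collide with same-named symbols from other objects loaded at
 * runtime, and calls to it bind directly at build time. */
int mylib_internal_double(int x)
{
    return x * 2;
}
```

You can verify the effect by building with and without -fvisibility=hidden and comparing the output of nm -D on the resulting object.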

I hope these few notes might actually help Gentoo maintainers understand why I’m stressing this point. It would be nice if we all could improve the software we maintain, even one step at a time.

As for what concerns my linking collision scripts, the bad packages’ list got a few more entries today: KViewShell with djvulibre, Karbon with gdk-pixbuf, and gcj, with both boehm-gc and libltdl.

And now I can actually start seeing the true collisions, like the gfree symbol having two different definitions in libgunicode.so (fontforge) and poppler/libkpdfpart (xpdf code), or the scan_token symbol in ghostscript with a completely different definition in libXfont/libt1.

Talking about libXfont and libt1 (or t1lib): I wonder if there is hope that in the future one will use the other, rather than both using the same parser code for Type1 fonts. I’ll have to check the FreeDesktop bugzilla tomorrow to see if it was ever discussed. At the moment they duplicate a lot of symbols with each other.

I have to say, PostgreSQL is an important speed improvement that will allow me to complete my task in a shorter time. Now I’m waiting for Patrick to run my script over the whole set of packages in Gentoo; that might actually be something. If only there was an easy way to make building and testing code faster (for COW reduction) without changing hardware, that would be awesome. Unfortunately, that part I need to do locally :(

Reminding a weakness of Prelink

For extra context about this entry, please refer to the previous one, which talks about arrays of strings and PIC.

As I said in the other post, prelink can reduce the amount of dirty RSS pages caused by COW (Copy-on-Write) of PIC code. Since prelink assigns every library a predefined load address in memory, which is either truly unique or unique within the set of programs known to be able to load that library, there is no COW during load: the loader doesn’t have to change the addresses loaded from the file, and is thus able to share the same pages among many processes. This is how prelink saves (sometimes a lot of) memory.

Unfortunately, there is one big problem, especially with modern software architectures: many programs now use runtime-loaded plugins for their functionality; the whole KDE architecture is based on this, even in KDE 4, as are xine and many others.

The problem is that prelink can’t really take plugins into account, as it doesn’t know about them. For instance, it can’t understand that amarok is able to load the xine engine, so amarokapp is not going to be prelinked for libxine.so. Additionally, it can’t understand that libxine.so is able to load xineplug_decode_ff.so, which in turn depends on libavcodec.so. This means that, for instance when using the -m switch, it could be assigning libqt3-mt.so and libavcodec.so the same address, causing a performance hit rather than an improvement at runtime, when the linker has to relocate all the code of libavcodec.so, triggering a COW.

The same is true for almost all scripting languages that use C-compiled extensions (Perl, Ruby, Python), as you can’t tell that the interpreter is able to load them just by looking at the ELF file, which is all prelink does.

A possible way around this is to declare post-facto, that is after compilation, which shared objects a program can load. It could probably be done through a special .note section in the ELF file and a modified prelink, but I’m afraid it would be quite difficult to implement properly, especially in ebuilds. On the other hand, it might give quite a performance improvement; as I said, today’s software architectures are often based on on-demand loading of code through plugins, so it could be quite interesting.