After everything I wrote before, I’m sure at least some people think that every COW is a bad COW, and that you should never use COW sections in your programs and libraries.
It’s not exactly this way. There are times when using a copy on write section like .bss is a good choice over the alternatives.
This is true for instance for buffers. There are mainly three ways to handle them: buffers allocated with malloc(), automatic buffers allocated on the stack, and static buffers that are added to .bss.
A buffer allocated on the stack has the main advantage of not having to be explicitly free()’d, but big buffers on the stack, especially if not well guarded, might cause security issues. Plus, they might require a big stack.
A buffer allocated on the heap through malloc() is more flexible, as you only request memory as needed, and free it as soon as it’s not needed anymore (for stack-based buffers, you need to wait for the end of the block, or create a block around the instructions that use the buffer). This reduces the memory footprint over time, but it has a little overhead as malloc() and free() are called.
Another option is to use static arrays as buffers. Non-initialised static arrays are put into .bss, which is a copy on write section that is usually backed by the zero page (although I’m not yet sure how the changes in Linux 2.6.24 about the zero page affect this statement). The good thing about static buffers is that you don’t need to manage their lifetime, either explicitly or implicitly, as they are already allocated at the start of the program.
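To put the three options side by side, here is a minimal sketch in C. The function names, the `BUF_SIZE` value and the `value=%d` format are made up for the example; only the placement of each buffer matters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 256  /* arbitrary size chosen for the example */

/* Automatic buffer: lives on the stack and disappears when the
 * function returns, with no explicit free(). */
size_t format_on_stack(int value)
{
    char buf[BUF_SIZE];
    int len = snprintf(buf, sizeof(buf), "value=%d", value);
    return (len < 0) ? 0 : (size_t)len;
}

/* Heap buffer: requested only when needed and released as soon as
 * possible, at the cost of a malloc()/free() pair per call. */
size_t format_on_heap(int value)
{
    char *buf = malloc(BUF_SIZE);
    if (buf == NULL)
        return 0;
    int len = snprintf(buf, BUF_SIZE, "value=%d", value);
    free(buf);
    return (len < 0) ? 0 : (size_t)len;
}

/* Static buffer: uninitialised, so it is placed in .bss and exists
 * for the whole life of the program. */
static char static_buf[BUF_SIZE];

const char *format_in_bss(int value)
{
    int len = snprintf(static_buf, sizeof(static_buf), "value=%d", value);
    assert(len >= 0 && (size_t)len < sizeof(static_buf));
    return static_buf;
}
```

Note that the static variant is the only one that can hand its buffer back to the caller without copying, which is part of why it is tempting.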
This is not good for libraries, as you might have a static buffer in .bss which is never used, but still takes up memory once other, used, .bss variables sharing its page are modified and trigger the copy on write. It works better for simple fire-off programs, which start and terminate quickly (non-interactive programs).
It’s also important to note that libraries should always be designed to work in multi-threaded software as a good design principle, and that static variables and arrays are not very useful there, unless access to them is serialised by a mutex (which will reduce performance). For this reason, .bss is a bad thing for libraries in almost all cases.
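As a rough sketch of the trade-off, assuming POSIX threads: the first function below keeps a static scratch buffer and has to take a mutex around it, while the second one lets the caller supply the buffer and needs no lock at all. The names (`make_label_locked`, `make_label_r`, `LABEL_LEN`) are hypothetical and not taken from any real library. On some systems this needs `-pthread` at build time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define LABEL_LEN 64  /* hypothetical scratch size */

/* Shared scratch buffer in .bss, protected by a mutex. */
static char label_buf[LABEL_LEN];
static pthread_mutex_t label_lock = PTHREAD_MUTEX_INITIALIZER;

/* Static-buffer variant: every caller serialises on the lock, and the
 * result still has to be copied out before the lock is released. */
void make_label_locked(int id, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    pthread_mutex_lock(&label_lock);
    snprintf(label_buf, sizeof(label_buf), "item-%d", id);
    snprintf(out, out_len, "%s", label_buf);
    pthread_mutex_unlock(&label_lock);
}

/* Reentrant variant: the caller owns the buffer, so there is no
 * shared state, no lock, and nothing added to .bss. */
void make_label_r(int id, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    snprintf(out, out_len, "item-%d", id);
}
```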
For fire-off programs, as I said, this is less of a problem, as the buffer might just be used a couple of times during the life of the program, and if it’s reasonably sized, it might not even impact the overall memory usage of the program (even a single static variable, once changed, will require you to waste a 4KiB memory page, so if you add a 100-byte variable, that will not change; it will change if you use a 4KiB, or bigger, static buffer).
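The page arithmetic can be sketched with a tiny helper. This is a simplification under stated assumptions: it takes the 4KiB page size from the text as a parameter, and it ignores alignment padding and the fact that .bss may share its first page with .data. `pages_needed` is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole pages needed to hold `bytes` bytes of writable data,
 * rounding up to the next page boundary. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

With a single 4-byte static variable, `pages_needed(4, 4096)` is 1. Adding a 100-byte variable gives `pages_needed(104, 4096)`, still 1. Adding a 4KiB buffer instead gives `pages_needed(4 + 4096, 4096)`, which is 2.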
So sometimes you just have to give up: the static buffer might be increasing the performance of the program, so it just has to stay there. This is why I don’t really fight with .bss too much. The only things that I don’t think should ever go to .bss are tables: calculating them at runtime is useful only for single-task embedded systems, so there has to be a way to opt out of that by using hardcoded tables calculated before or right at build time.
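To illustrate the difference, here is a small sketch with a made-up 16-entry table (the number of set bits in a nibble). The runtime variant fills a writable array that lives in .bss; the hardcoded variant is `const`, so the toolchain can typically place it in read-only data that is shared between processes. Function and variable names are hypothetical.

```c
#include <assert.h>

/* Runtime-computed table: writable and uninitialised, so it lands in
 * .bss and its page becomes private as soon as it is filled in. */
static unsigned char bits_runtime[16];
static int bits_runtime_ready;

unsigned nibble_bits_runtime(unsigned nibble)
{
    assert(nibble < 16);
    if (!bits_runtime_ready) {
        for (unsigned i = 0; i < 16; i++)
            bits_runtime[i] = (unsigned char)((i & 1) + ((i >> 1) & 1) +
                                              ((i >> 2) & 1) + ((i >> 3) & 1));
        bits_runtime_ready = 1;
    }
    return bits_runtime[nibble];
}

/* Hardcoded table, calculated ahead of time: constant data that never
 * needs to be written, so nothing is copied at runtime. */
static const unsigned char bits_const[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

unsigned nibble_bits_const(unsigned nibble)
{
    assert(nibble < 16);
    return bits_const[nibble];
}
```

For bigger tables the hardcoded array would usually be produced by a small generator program run at build time rather than typed in by hand.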
Another good use of .bss is when the memory would just be allocated at the start of the program and then freed at exit. This is often the case in xine-ui for instance, as there are big structures with all the state information for a certain feature. That data cannot be shared between instances, and has a lifetime so long that it’s just easier to allocate it all at once, rather than using up heap or stack for it (in xine-ui, especially, some structures were accessed through a .bss pointer, which was set to an allocated memory area once the feature was activated, and freed either when the feature was temporarily not used anymore, or when the program exited; while you don’t always get the stream information, trying to save a few KiBs of memory by using heap memory might not be a good idea if you have to access the data through a pointer, rather than having the structure in .bss and skipping over the pointer).
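The two layouts look roughly like this. `struct feature_state` and the functions around it are invented for the example and are not xine-ui’s actual code; the point is only that variant A goes through a pointer stored in .bss on every access, while variant B addresses the structure directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-feature state. */
struct feature_state {
    int active;
    int width, height;
    char title[128];
};

/* Variant A: only a pointer lives in .bss; the structure itself is
 * allocated on the heap when the feature is switched on. */
static struct feature_state *state_ptr;

int feature_enable_heap(void)
{
    if (state_ptr == NULL)
        state_ptr = calloc(1, sizeof(*state_ptr));
    if (state_ptr == NULL)
        return -1;
    state_ptr->active = 1;
    return 0;
}

void feature_disable_heap(void)
{
    free(state_ptr);
    state_ptr = NULL;
}

int feature_is_active_heap(void)
{
    /* Every access loads the pointer first, then the field. */
    return state_ptr != NULL && state_ptr->active;
}

/* Variant B: the whole structure lives in .bss; no allocation, no
 * lifetime management, and no extra indirection. */
static struct feature_state state;

void feature_enable_bss(void)
{
    state.active = 1;
}

int feature_is_active_bss(void)
{
    return state.active;
}
```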
So for this reason, while I’d be happy if we could find ways to avoid using COW sections at all and still be fast, I’m not targeting moving all the numbers to zero; I just want to make sure that there aren’t memory areas where the space is just wasted.
On the other hand, there are cases which show that something in the design of the program might just be way out of the sane league; for instance, the 10MB of .bss that is used by the quota support of xfsprogs is tremendously suspicious.
New patches! Just for our Gentoo users – and for developers of other distributions who want to take the patches 😉 – I’ve added a few more patches to my overlay: giflib’s patch is now in sync with what upstream applied, moving back a constant to variable as needed, while app-arch/libarchive and sys-apps/file got one patch each to reduce their COW memory usage. Both had good results by applying character arrays in structures.
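For those wondering what character arrays in structures look like in practice, here is a generic sketch; it does not reproduce the actual libarchive or file patches, and all the names and entries are made up. In position-independent code a constant table whose entries hold pointers needs a relocation per pointer at load time, so its pages end up private; storing the strings inline removes the pointers and lets the table stay in shared read-only data.

```c
#include <assert.h>
#include <string.h>

/* Pointer members: each `name` is an address that has to be fixed up
 * when position-independent code is loaded. */
struct entry_ptr {
    const char *name;
    int value;
};

static const struct entry_ptr table_ptr[] = {
    { "gzip", 1 }, { "bzip2", 2 }, { "compress", 3 },
};

/* Character-array members: the strings are stored inline, so the
 * table contains no addresses and needs no relocations. The array
 * size must fit the longest string plus its terminating NUL. */
struct entry_arr {
    char name[12];
    int value;
};

static const struct entry_arr table_arr[] = {
    { "gzip", 1 }, { "bzip2", 2 }, { "compress", 3 },
};

int lookup_ptr(const char *name)
{
    for (size_t i = 0; i < sizeof(table_ptr) / sizeof(table_ptr[0]); i++)
        if (strcmp(table_ptr[i].name, name) == 0)
            return table_ptr[i].value;
    return -1;
}

int lookup_arr(const char *name)
{
    for (size_t i = 0; i < sizeof(table_arr) / sizeof(table_arr[0]); i++)
        if (strcmp(table_arr[i].name, name) == 0)
            return table_arr[i].value;
    return -1;
}
```

The price is some padding for short strings, so this pays off mostly when the strings have similar lengths.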
Thanks for all the work you put into Gentoo + applications! These are the kind of optimizations that make Linux such a great OS, that even runs, unlike popular M$ OSes, on decent hardware.
Premature optimization may seem bad but things like this or the project to reduce unrequired wakeups to save power are what makes free software really good.