In my previous post I noted that there are some cases where --as-needed stops a program from building even though the cause is not an indirect link. I like to call this class of failures the “misguided link” failures.
Consider the following diagram showing such a case:
We have a given piece of software linking to libssl but actually using libcrypto. This is the inverse of the indirect case I wrote about last time, but it still features a link relationship with no use relationship, which is going to be cut by --as-needed. This is one of the most interesting cases, since it’s really difficult to identify without checking either the source code or the missing symbols. It’s not limited to the OpenSSL libraries and it’s actually pretty common in general, but it happens quite a lot with them since people forget that OpenSSL is more than just libssl.
So how can we identify this problem? Well, the first step is to understand what can cause it. Let’s say we have a simple program that calculates the MD5 of its standard input, something like this:
#include <stdint.h>
#include <stdio.h>
#include <openssl/md5.h>

int main(void) {
    MD5_CTX md5;
    uint8_t md5digest[MD5_DIGEST_LENGTH];
    char buff[4096];
    size_t nread;
    size_t i;

    MD5_Init(&md5);
    /* Loop on fread()'s return value rather than on feof(), which only
     * becomes true after a read has already hit end-of-file. */
    while ((nread = fread(buff, 1, sizeof(buff), stdin)) > 0)
        MD5_Update(&md5, buff, nread);
    MD5_Final(md5digest, &md5);

    /* Print the digest as lowercase hex, followed by a newline. */
    for (i = 0; i < sizeof(md5digest); i++)
        printf("%02x", md5digest[i]);
    printf("\n");
    return 0;
}
Now if we try to compile this on a system without forced --as-needed (and no --as-needed in LDFLAGS), linking it with -lssl, it will work just fine:
% GCC_SPECS="" gcc md5-ssl.c -o md5-ssl -lssl
% scanelf -n md5-ssl
TYPE NEEDED FILE
ET_EXEC libssl.so.0.9.8,libc.so.6,libcrypto.so.0.9.8 md5-ssl
% ldd md5-ssl
linux-vdso.so.1 => (0x00007fff11bfe000)
libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f070961e000)
libc.so.6 => /lib/libc.so.6 (0x00007f07092ab000)
libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f0708f19000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0709870000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f0708d15000)
But if we try to compile it with forced --as-needed, or even just --as-needed in LDFLAGS, the results are quite different:
% gcc md5-ssl.c -o md5-ssl -lssl
/tmp/.private/flame/cc8kRKqi.o: In function `main':
md5-ssl.c:(.text+0x10): undefined reference to `MD5_Init'
md5-ssl.c:(.text+0x8d): undefined reference to `MD5_Update'
md5-ssl.c:(.text+0xae): undefined reference to `MD5_Final'
collect2: ld returned 1 exit status
% GCC_SPECS="" gcc -Wl,--as-needed md5-ssl.c -o md5-ssl -lssl
/tmp/.private/flame/ccVWCirl.o: In function `main':
md5-ssl.c:(.text+0x10): undefined reference to `MD5_Init'
md5-ssl.c:(.text+0x8d): undefined reference to `MD5_Update'
md5-ssl.c:(.text+0xae): undefined reference to `MD5_Final'
collect2: ld returned 1 exit status
A lot of people would be thrown off at this point: the library is there, it comes after the source files (or object files), and there are no commodity libraries involved, so the linking line should be correct. Yet it fails, and the problem lies in using the wrong library.
As the name tells you, libssl contains the functions that implement the Secure Sockets Layer; while MD5 is used in that implementation, the MD5 functions are not part of the library’s interface.
Now, since even the man page for these functions does not tell you which library to find them in (while most Linux, *BSD and Solaris man pages tell you which library a function comes from), you have to rely on either experience or testing to find the correct library.
Let’s try two different approaches here, just so that people can understand how I end up debugging these things in the first place.
To begin with, let’s check whether libssl provides the symbols we’re missing. We don’t expect it to, since the link failed. The easy way to do this? nm and grep:
% nm -D /usr/lib/libssl.so | egrep 'MD5_(Init|Update|Final)'
%
There is neither a defined nor an undefined symbol with those names, which means no MD5 interface is defined or used in that library, and that explains why the link failed. Now, since we know the build works without --as-needed, we check which libraries libssl brings in as dependencies:
% ldd /usr/lib/libssl.so
linux-vdso.so.1 => (0x00007fff1dbfe000)
libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x00007f2d1551d000)
libc.so.6 => /lib/libc.so.6 (0x00007f2d151aa000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f2d14fa5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2d15b28000)
The first library is the virtual dynamic shared object of the Linux kernel, so let’s ignore it; the last is the dynamic linker (or loader) itself, which we also want to ignore. We can exclude libc too: it is always brought in, so if the symbols were there the link wouldn’t have failed. We’re left with two candidates: libdl and libcrypto. Now let’s be very dumb and ignore the name “crypto”, as well as the fact that libdl is home to dlopen() and other well-known functions, and look in both of them for the symbols:
% nm -D --defined-only /lib/libdl.so.2 | egrep 'MD5_(Init|Update|Final)'
% nm -D --defined-only /usr/lib/libcrypto.so.0.9.8 | egrep 'MD5_(Init|Update|Final)'
000000000006a5e0 T MD5_Final
000000000006a5a0 T MD5_Init
000000000006a6e0 T MD5_Update
So we found the problem, and indeed you can verify for yourself that requesting -lcrypto directly in the build of the program above will make it work just fine with and without --as-needed, with the added benefit that libssl is not loaded when running the software.
Now this is a slightly boring and long approach. The alternative, which works just fine in Gentoo, requires just one command:
% scanelf -ql -s +MD5_Init
MD5_Init /usr/lib64/libcrypto.so.0.9.8
MD5_Init /usr/lib64/libgnutls-openssl.so.26.11.3
The scanelf call here goes searching for the library we need, although it might confuse you, since it may report different implementations or totally unrelated libraries in case of symbol collisions (which is something I use to identify broken software, by the way). Note that I targeted just one symbol here: the current version of scanelf, 0.1.18, is not working properly with regex-based search. With the current CVS version you could use scanelf -gqls 'MD5_(Init|Update|Final)', but it would just find the first anyway.
Is this easy enough to fix, in your opinion? Also consider that if the software were to use pkg-config, right now it would get -lssl -lcrypto -ldl, which would stop --as-needed from breaking the build; that is most likely going to break in the future, though, if libssl.pc is updated to use Requires.private to list libcrypto.
Note that being able to link to a library higher up in the chain, instead of the one where the functions are actually defined, and have that work properly is a necessary requirement for reducing some bloat in shared libraries in a certain way. I think (at least initially) that --as-needed is the thing to fix here: don’t remove stuff that is actually needed, don’t go against the very meaning of “as-needed”. libcrypto is needed, so don’t remove it from the chain of things to be used.
Here’s an example where relying on such a thing (which you show leads to undefined references) is beneficial. Let’s take libxml2: it’s a great, much-used, DOM-based XML library that does all kinds of useful things. However, one of the things it does necessarily involves relatively huge amounts of private dirty memory, while not many of the things linking against -lxml2 actually need it. A possible solution with minimal impact on ABI? Split the more widely used API that doesn’t involve lots of private dirty memory out into a separate new library; for lack of imagination let’s name it libxml2-lite.so. Make libxml2 the one implementing the less-used parts and link it against libxml2-lite (which of course it actually needs).
Now everything using -lxml2 should keep working just fine and dandy, while the many users of the library that do not need those advanced features can start linking with -lxml2-lite at their own pace on systems whose libxml2 package has the split, somewhat reducing runtime memory usage thanks to less private dirty memory. On some systems you would never need those advanced things at all, and would not even need that code mmapped in if all users link to the smaller library with the commonly used API.
--as-needed crudely breaks this backwards compatibility, and while I argue that static initialization is something that shouldn’t be labelled a breakage from --as-needed, I believe the case you are describing here should be labelled as breaking things and fixed at the binutils level.
I actually think that binutils is right here, and I’d rather take the OpenSSL approach: have libxml2.pc reply with both -lxml2 -lxml2-lite, so that the software gets whichever of the two it actually needs, as a “grace period” before making the two stand alone.
Since the NEEDED lines specify a given ABI, while -lfoo takes the latest ABI, there is room for breakage if mixed ABIs are loaded when climbing up the NEEDED tree.
I see no reasoning as to why binutils is right in this case, and I believe quite a few things unfortunately use -lxml2 directly instead of pkg-config. There’s no reason this couldn’t work: -lxml2 still guarantees the same ABI if its DT_NEEDED entries are checked for symbols before outright purging stuff (-lxml2 in this case) from the link. If --as-needed behaved in a non-breaking way here, applications only using the common API would automatically be leaner, without the application author even knowing he could now link only to libxml2-lite.
Diego, thank you for all your Gentoo development and technical articles. I added something to the coffee fundraiser. Cheers from Germany.
Thanks a lot Tom :) Keep on reading the blog, because I have some more to write about the topic, which I’ll probably upload next week!
You may want to look at vtd-xml as the state of the art in XML processing, consuming far less memory than DOM.