Progress Logging and Results Logging

There is one thing that my role as software mechanic keeps drawing me to, and that’s the importance of logging information. Logging is one of those areas that tend to bring out opinions, and the idea of widening it into observability has spawned entire businesses (shout out to friends at Honeycomb.io). But even in smaller realities, I found myself caring about logging, setting up complex routing with metalog, or hoping to get a way to access Apache logs in a structured format.

Obviously, when talking about logging in bubbles, there’s a lot more to consider than just which software you send the logs to — even smaller companies nowadays need to be careful with PII, since GDPR makes most data toxic to handle. I can definitely tell you that some of the analysis I used to do for User-Agent filtering would not pass muster for a company at the time of GDPR — in a very similar fashion as the pizzeria CRM.

But leaving aside the whole complicated legal landscape, there’s a distinction in logs that I have not seen well understood by engineers – no matter where they are coming from – and that is the difference between what I call progress logging and results logging. I say that I call them this way, because I found a number of other different categorizations of logs, but none that matches my thoughts on the matter, and I needed to give it names.

Distinctions that I did hear people talk about are more like “debug logs” versus “request logs”, or “text logs” versus “binary logs”. But this all feels like it’s mixing media and message, in too many cases — as I said in my post about Apache request logs, I would love for structured (even binary would do) request logs, which are currently “simple” text logs.

Indeed, Apache (and any other server) request logs to me fit neatly in the category of results logging. They describe what happened when an action completed: the result of the HTTP request includes some information of the request, and some information of the response. It provides a result of what happened.

If you were to oversimplify this, you could log each full request and each full response, and call that results logging: a certain request resulted in a certain response. But I would expect that there is a lot more information available on the server, which does not otherwise make it to the response, for many different reasons (e.g. it might be information that the requestor is not meant to find out, or simply doesn’t have to know, and the response is meant to be as small as possible). In the case of an HTTP request to a server that acts as a reverse proxy, the requestor should not be told which backend handled the request, but that would be a useful thing to log as part of the result.

When looking at the practicality of implementing results logging, servers tend to accrue the information needed for generating the result logs in data structures that are kept around throughout the request (or whatever other process) lifetime, and then extract all of the required information from them at the time of generating the log.
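To make the pattern concrete, here is a minimal sketch in C of a record that is filled in during the request's lifetime and turned into a single log line at the end. The structure, its field names and the key=value output format are all made up for illustration; this is not how Apache or any other specific server does it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-request record; the field names are illustrative only. */
struct request_result {
    char method[8];
    char path[128];
    char backend[32];   /* known to the proxy, never sent to the client */
    int  status;
    long bytes_sent;
};

/* Called when the request arrives: start accumulating what we know. */
void result_begin(struct request_result *r, const char *method,
                  const char *path)
{
    memset(r, 0, sizeof(*r));
    snprintf(r->method, sizeof(r->method), "%s", method);
    snprintf(r->path, sizeof(r->path), "%s", path);
}

/* Called once the response is complete: emit one line for the whole result. */
int result_format(const struct request_result *r, char *out, size_t outlen)
{
    return snprintf(out, outlen,
                    "method=%s path=%s backend=%s status=%d bytes=%ld",
                    r->method, r->path, r->backend, r->status,
                    r->bytes_sent);
}
```

The handler would fill in backend, status and bytes_sent as it learns them, and call result_format() only once, right before the record is discarded.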

This does mean that if the server terminates (either because it’s killed, the power goes off, or the request caused it to crash), and the result is not provided, then you don’t get a log about the request happening at all — this is the dirty secret of Apache request logs (and many other servers): they’re called request logs but they actually log responses. There are ways around this, by writing parts of the results logs as they are identified – this helps both in terms of persistence and in terms of memory usage (if you’re storing something in memory just because you should be logging it later) but that ends up getting much closer to the concept of tracing.

Progress logs, instead, are closer to what is often called shotgun debugging or printf() debugging. They are log statements emitted as the code goes through them, and they are usually free-form for the developer writing the code. This is what you get with libraries such as Python’s logging, and they can take a more or less structured form depending on a number of factors. For instance, you can have a single formatted string with maybe the source file and line, or you may have a full backtrace of where the log event happened and what the local variables in each of the function calls were. What usually makes you choose between the two is cost, and signal-to-noise ratio, of course.

For example, Apache’s mod_rewrite has a comprehensive progress log that provides a lot of details of how each rewrite is executed, but if you turn that on, it’ll fill up your server’s filesystem fairly quickly, and it will also make the webserver performance go down the drain. You do want this log, if you’re debugging an issue, but you most definitely don’t want it for every request. The same works for results logs — take for instance ModSecurity: when I used to maintain my ruleset, I wouldn’t audit-log every request, but I had a special rule that, if a certain header was provided in the request, would turn on audit-logging. This allowed me to identify problems when I was debugging a new possible rule.

Unfortunately, my experience straddling open-source development and industry bubbles means I don’t have overall good hopes for an easy way to implement logging “correctly”. Both because correctly is subjective, and because I really haven’t found a good way to do this that scales all the way from a simple tool like my pdfrename to a complex Cloud-based solution. Indeed, while the former would generally care less about structured logs and request tracing, Cloud software like my planned-and-never-implemented Tanuga would get a significant benefit from using OpenTelemetry to connect feed fetching and rendering.

Flexible and configurable logging libraries, such as are available for Python, Ruby, Erlang, and many more, provide a good “starting point” but by experience they don’t scale well between in and out of an organization or unit. It’s a combination of problems similar to the schema issue and the RPC issue: within an organization you can build a convention of what you expect logs to be, and you can pay the cost of updating the configuration for all sorts of tools to do the right thing, but if you’re an end user, that’s unlikely — besides, sometimes that’s untested.

So it makes sense that, up to this day, we still have a significant reliance on “simple”, unstructured text logs. They are the one universally accessible way to provide information to users. But I would argue that we would be better off building an ecosystem of pluggable, configurable backends, where the same tool, without being recompiled or edited, can be made to output simple text on the standard error stream, or to a more structured event log. Unfortunately, judging by how the FLOSS world took the idea of standardizing services’ behaviours with systemd, I doubt that’s going to happen any time soon in the wide world… but you can probably get away with it in big enough organizations that control what they run.

Also, as a fun related tidbit, verbose (maybe excessively so) progress logging is what made my reverse engineering of the OneTouch Verio so easy: on Windows the standard error is not usually readable… unless you run the application through a debugger. So once I did that, I could see every single part of the code as it processed the requests and responses for the device. Now, you could think that just hiding the logs by default, without documenting the flag to turn them on, would be enough; but as it turns out, as long as the logging calls are built into a binary, it’s not too hard to understand them while reverse engineering.

What this is meant to say is that, while easy access to logs is a great feature for open source tools, and for most internal tools in companies and other institutions, the same cannot be said for proprietary software: indeed, the ability to obfuscate logs, or even generate “encrypted” logs, is something that proprietary software (and hardware) thrives on, as it makes it harder to reverse engineer. So it’s no surprise that logs are a complicated landscape with requirements that are not only divergent, but at times opposite, between different stakeholders.

Why strcasecmp() and similar functions should not be used when parsing data

It might sound obvious to most experienced programmers, but it certainly is not obvious to most, which I’m afraid is a very bad thing since I’d really like to expect people who write code to understand at least a little bit of logic behind it.

I’m not going to talk about the problems regarding case-insensitive comparison and locale settings (just remember that i and I are not the same character in Turkish), which I still expect most developers to ignore; that is totally beside the point here, and they are justified by not being linguists (unless they are Turks, and then I’d worry).

What I’m talking about is the logic behind the comparison itself. In a normal string comparison you have a very easy workflow: the strings are compared character by character, stopping at the first one that differs, or finishing when both reach the end. When you want to compare two strings case-independently, the comparison cannot just happen over the characters as they are: they first have to be brought to the same case.

To achieve that you have many different options: lookup equivalence tables (up to 256 by 256 elements for ascii), lookup case-changing tables (twice), check if the character is in a given range, and so on. At any rate, it’s much more work than a simple comparison.

You can expect the library you’re using to be optimised enough that the comparison does not take too long, so using strcasecmp() for a one-shot comparison is fine. What is not fine, though, is using it for parsing: taking some token out of a file, and then comparing it case-insensitively to a series of known tokens. That’s a no-no, since you’re going to require the same lookups or transformations on the same token many times in a row.

The easy way out of this is to ensure that all the reference tokens have a given case (lowercase or uppercase does not matter), and then convert the read token to the same case, so that you can just use the standard, fast, and absolutely non-complex case-sensitive string comparison.
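As a minimal sketch of that approach: the keyword table and function names below are made up for illustration, and tolower() only handles what the current locale considers letters (plain ASCII in the default "C" locale), so the Turkish caveat above still applies.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table: every reference token is stored lowercase. */
static const char keywords[][8] = { "begin", "end", "include" };

/* Lowercase the token in place, paying the case-conversion cost only once. */
void token_to_lower(char *token)
{
    for (; *token != '\0'; ++token)
        *token = (char)tolower((unsigned char)*token);
}

/* Return the index of the token in the table, or -1 if it is unknown. */
int lookup_keyword(char *token)
{
    size_t i;

    token_to_lower(token);
    for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); ++i) {
        /* Plain case-sensitive comparison: no per-character case handling. */
        if (strcmp(token, keywords[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

The conversion happens once per token read, no matter how many reference tokens it is then compared against.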

It’s not that difficult, is it?

Update (2017-04-28): I feel very sad to have found out over a year and a half later that Michael died. The links in this and other posts to his blog are now linked to the archive kindly provided and set up by Jan Kučera. Thank you, Jan. And thank you, Michael.

Learning ADA and extending ELF analysis?

It seems like my motivation, since I left the hospital, is always falling down. Unless I’m doing something new and interesting, I’m unable to keep myself focused.

This is my reason to start the work on ruby-elf and the whole analysis thing. Unfortunately, doing the analysis that way does not seem to be the easiest way at all.

Add to that the missed challenge with C#. When I was first told I had to develop in .NET with C#, beside a first understandable visceral reaction to that, I was excited at the idea of learning a new language. It has been quite some time since I learned my last useful language. While I tried to learn LISP (ELISP to be exact), that is quite a big jump for me, as I’m way too used to non-functional programming languages.

I’ve been wanting to look at ADA for quite a while, and after the last In Our Time podcast (I’m podcast-addicted lately), I decided it was the right time to at least start looking at the thing. It does sound quite interesting after reading a bit about it, so I’ll be trying to read about it in the next week in my spare time. It might come in handy the next time I get a job working on embedded stuff.

I admit I’m not sure how good ADA support for SQL databases is, but if there is any kind of support, I’m tempted to rewrite part of my ELF analysis code in ADA (and even if there isn’t any, maybe I can do that to cowstats at least). The intrinsic support for multi-threading is what fascinates me most, especially for things like cowstats that could easily analyse multiple files at once, rather than doing it sequentially.

I am really afraid of what the pancreatitis did to me on a spiritual/mental level, more than physical, lately. Not like the physical damage is nothing, it’s actually quite a lot; luckily I didn’t smoke or drink before, as now I can’t do it for sure (well, it wouldn’t have been good to do even if I didn’t have the pancreatitis, but who had similar experience knows what I mean ;) ). But the spiritual damage seems to be more than just fear to me. I really am thinking a lot of how much time I’m left, and how much I wasted my time before. I really wanted one day to find the right person, have a family, children, … and while the idea itself was already quite faint before (I’m too geeky to find a girl who can tolerate me), now it seems to be impossible altogether.

But never mind these depressing thoughts, I sincerely think Summer of Code will give me at least something new to work on, with the students to mentor… or at least I hope so ;) So please start working already on your applications!

Some notes about multi-threading

Ah, multithreading, such a wonderful concept, as it allows you to create programs in such a way that you don’t have to implement an event loop (for the most part). Anybody who ever programmed in the days of DOS has at least once implemented an event loop to write an interactive program. During my high school days I wrote a quite complex one for a game I designed for a class. Too bad I lost it :(

Multithreading is also an interesting concept nowadays that all the major CPUs are multi-core; for servers they were already for some time, but we know that mainstream is always behind on these things, right?

So, now that multithreading is even more interesting, it’s important to design programs, and even more importantly libraries, to be properly multithreaded.

I admit I’m not such a big expert on multithreading problems, but I do know a few things that come in useful when developing. One of these is that static variables are evil.

The reason why static variables are evil is because they are implicitly shared between different threads. For constants this is good, as they use less memory, but for variables this is bad, because you might overwrite the data another thread is using.

For this reason, one of the easiest things to check in a library to tell if it’s multithread-safe or not is whether it relies on static variables. If it does, it’s likely not thread safe, and almost surely not thread optimised.

You could actually be quite thread safe even when using static variables: the easy way to do that is to have a mutex protecting each and every access to the variable, so that only one thread at a time can access it, and no one can overwrite someone else’s data.

That causes a problem though, as this serialises the execution of a given function. Let’s take for instance a function that requests data through the net with an arbitrary protocol (we don’t care which protocol it is), saves it in a static buffer, and then parses it, filling a structure with the data received and parsed. If such a function is used in a multithreaded program, it has to be protected by a mutex, as it uses a static buffer. If four threads require access to that function almost simultaneously (and that might happen, especially on multi-core systems!), then the first one arriving will get the mutex, and the other three will wait until the first one has completed processing. Although in general, on a multicore system you’d then have other processes scheduled to be executed at that point, you’re going to waste time by waiting for a thread to complete its operation before the next one can be resumed.

This is extremely annoying, especially now that the future of CPUs seems to be an increase in number of cores, rather than in their speed (as we’re walking around a physical limit of 3GHz as far as I can see). The correct way to handle such a situation is not to use a static buffer, but rather use a heap-allocated buffer, even if that is slightly slower for a single thread (as you have to allocate the memory and free it afterward); this way the four threads are independent and can be executed simultaneously. For this reason, libraries should try to never use static buffers, as they might not know if the software using them is multi-threaded or not.
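A minimal sketch of the difference, without any networking: both function names and the "reply-N" format are invented for illustration. You don't even need threads to see the static buffer being clobbered, two consecutive calls are enough.

```c
#include <stdio.h>
#include <stdlib.h>

/* Non-reentrant: every caller shares the one static buffer, so a second
 * call (from this or any other thread) overwrites the first result. */
const char *format_reply_static(int id)
{
    static char buffer[32];

    snprintf(buffer, sizeof(buffer), "reply-%d", id);
    return buffer;
}

/* Reentrant: each call works on its own heap-allocated buffer, so no mutex
 * is needed. The caller owns the result and must free() it. */
char *format_reply_heap(int id)
{
    char *buffer = malloc(32);

    if (buffer != NULL)
        snprintf(buffer, 32, "reply-%d", id);
    return buffer;
}
```

The heap version pays for a malloc() and a free() on every call, which is the "slightly slower for a single thread" cost mentioned above.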

When a library is blatantly not thread-safe, there is an even bigger problem, which can be solved in two ways: the first is to limit access to that library to a single thread. This way there are no problems with threading, but then all the requests that need to be sent to that library have to be passed to that thread, and the thread has to answer them; while cheaper than inter-process communication (IPC), inter-thread communication (ITC) is still more expensive than using a properly thread-safe library.

The other option is to protect every use of the library with a mutex. This makes a library thread-safe if it’s at least re-entrant (that is, no function depends on the status of global variables set by other functions), but acts in the same way as the “big kernel lock” does: it does not allow you to run the same function from two threads at once, or even any function of that library from two threads at once – if the functions use shared global variables.

How should libraries behave, then, when they need to keep track of the state? Well, the easiest way is obviously to have a “context” parameter, a pointer to a structure that keeps all the needed state data, allowing two threads to use different contexts, and call the library simultaneously.
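As a sketch of what such an API might look like (the structure and function names here are hypothetical, not taken from any real library):

```c
#include <stddef.h>

/* Hypothetical library state, owned by the caller instead of living in
 * static variables inside the library. */
struct counter_ctx {
    unsigned long packets_seen;
    unsigned long bytes_seen;
};

/* Reset a context before first use. */
void counter_init(struct counter_ctx *ctx)
{
    ctx->packets_seen = 0;
    ctx->bytes_seen = 0;
}

/* Every library call takes the context it should work on. */
void counter_record(struct counter_ctx *ctx, size_t packet_len)
{
    ctx->packets_seen += 1;
    ctx->bytes_seen += packet_len;
}
```

Two threads that each own a struct counter_ctx can call counter_record() at the same time without a mutex, as the calls never touch the same memory. Sharing a single context between threads would still need locking.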

Sometimes, though, you just need to keep something similar to an errno variable, that is global and set by all your functions. There’s no way to handle that case gracefully through mutexes, but there’s an easy way to do that through Thread-Local Storage. If you mark the variable as thread-local, then every thread will see its own copy of that variable, and you don’t need an explicit mutex to handle that (the implementation might use a mutex, I don’t really know the underlying details).
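Here is a small sketch of such an errno-like variable. The mylib_ names and the error value are invented for the example, and __thread is the GCC/Clang spelling (C11 calls it _Thread_local); it assumes a POSIX threads environment.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical errno-like variable: each thread gets its own copy. */
__thread int mylib_error = 0;

/* Hypothetical library call: fails on negative input and records why. */
int mylib_check(int value)
{
    if (value < 0) {
        mylib_error = 22;   /* arbitrary illustrative error code */
        return -1;
    }
    mylib_error = 0;
    return 0;
}

/* Thread body: trigger a failure and report the code this thread sees. */
static void *failing_worker(void *arg)
{
    int *seen = arg;

    mylib_check(-1);
    *seen = mylib_error;
    return NULL;
}

/* Run failing_worker in a second thread and return the error code it
 * observed, or -1 if the thread could not be started. */
int error_seen_by_other_thread(void)
{
    pthread_t thread;
    int seen = 0;

    if (pthread_create(&thread, NULL, failing_worker, &seen) != 0)
        return -1;
    pthread_join(thread, NULL);
    return seen;
}
```

The failure recorded by the worker thread never shows up in the calling thread's copy of mylib_error, and no mutex appears anywhere in the code.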

This is also quite useful for multi-threaded programs that would like to use global variables rather than having to pass a thread structure to all the functions. Take this code for instance:

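/* Instantiated a few times simultaneously */
void *mythread(void *address) {
  /* Per-thread state, allocated on the heap and passed to every helper */
  mythread_state_t *state = malloc(sizeof(mythread_state_t));

  if (state == NULL)
    return NULL;

  set_address(state, address);
  do_connection(state);
  check_data(state);

  do_more(state);

  free(state);
  return NULL;
}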

While for library API calls having a context parameter is an absolutely good idea, if the code has no reason to be reentrant, passing it as parameter might be a performance hit. At the same time, while using global variables in libraries is a very bad idea, for programs it’s not always that bad, and it can actually be useful to avoid passing parameters around or using up more memory. You could then have the same code done this way:

/* One instance per thread, reachable from every function without a parameter */
__thread mythread_state_t thread_state;

/* Instantiated a few times simultaneously */
void *mythread(void *address) {
  set_address(address);
  do_connection();
  check_data();

  do_more();

  return NULL;
}

The thread_state variable would be one per thread, needing neither a mutex to protect it, nor to be passed to every function.

There are a few notes about libraries and thread safety which I’d like to discuss, but I’ll leave those for another time. Two tech posts a day is quite a lot already, and I need to resume my paid job now.

Some more about arrays of strings

If you want to read this entry, make sure you read my previous entry about array of strings too, possibly with comments.

Mart (leio) commented about an alternative to using const char* const for arrays of strings. While it took me a while to get it, he has a point. First let me quote his comment:

If you have larger strings, or especially if you have strings with wildly differing length, then you can also use a long constant string that contains all the strings and a separate array that stores offsets into that array.

For example:

static const char foo[] =
    "Foo\0"
    "Longer string\0"
    "Bar";

static const int foo_index[3] = { 0, 4, 18 };

and then you can just do

#include <stdio.h>
#define N_ELEMENTS(array) (sizeof((array)) / sizeof ((array)[0]))

int main(void)
{
    int i;
    for(i = 0; i < N_ELEMENTS(foo_index); ++i)
    {
        printf("%s\n", foo + foo_index[i]);
    }
}

There’s perl and other scripts to auto-generate such an array pair from a more readable form. Some low-level GNOME libraries have one, and Ulrich Drepper’s DSO Howto does too (they differ, having different pros and cons).

Thought it might also be useful to someone if dealing with largely differently sized strings :)

Of course if they are similarly sized, then it’s better to use the method described here. Sometimes it even makes sense to split the array up into two parts – one for the small strings that could use this method, and one for the larger variable sized ones using this method or just a different size that fits them all without wasting much

For a simple example, the code he published is actually quite pointless, as the method of using const char* const works just the same way, putting everything in .rodata.

Where this makes sense is in shared objects when using PIC. In those cases, even const char* const doesn’t get to .rodata, but goes into .data.rel.ro. This is due to the way that type of array is implemented. As we’ve seen for the const char* case, the strings are saved in the proper .rodata section, but then the pointers in the array were saved to .data as being non-constant.

When we’re using PIC, the address at which the sections are loaded is not known at compile time; it’s the ELF loader (or whatever loader is used for the current executable format) which has to fill in the addresses of the variables. This means that while for the purpose of the C language the pointers are constant, they are filled in at runtime, and thus cannot reside on the read-only shared pages. They also can’t be shared because two processes might load the same page at different addresses; this is how PIE is useful together with address randomisation in hardened setups.

When using PIC, the code emitted would be:

        .section        .rodata
.LC0:
        .string "bar"
.LC1:
        .string "foobar"
        .section        .data.rel.ro.local,"aw",@progbits
        .align 16
        .type   foo, @object
        .size   foo, 16
foo:
        .quad   .LC0
        .quad   .LC1

What this method tries to address is the constant 4KiB dirty RSS page that every process loading a given library built with PIC would have, just to keep the .data.rel.ro information. So Mart’s method uses up a bit more processing time (an extra load) compared to the array of arrays of characters, with more or less the same performance as const char *const, allowing for variable-sized strings without padding and without wasting a 4KiB memory page, but trading readability for that.

It’s not entirely a bad idea, actually, and I should consider it more for xine-lib, although I doubt I can spare the 4KiB page for all of them, as having a structure to pass information like ID, description and other stuff like that ends up putting more stuff in .data.rel.ro anyway.

On the other hand, .data.rel.ro is a COW page, but the COW could be avoided by using prelink: prelinking gives a suitable default address to a shared object, which in turn should fill the .data.rel.ro sections with the right values already. Of course, prelink does not work with PIE and randomised addresses.

I think we should try to make use of array of arrays of characters whenever possible anyway ;) Faster and easier.

In APIs you should always accept the stricter pointer

Yet another entry on some insights of C/C++ low level code, this time, rather than a performance issue, it’s a correctness issue, related to one of the warnings added by default by GCC 4.2: deprecated conversion from string constant to ‘char*‘. It will also be a shorter entry as I don’t have to dig into ASM code generated.

The content of this entry formed in my mind when I saw a lot of those warnings on the newest version of QSynth (which I’m committing to the tree right now). I first thought there was a pointer simply declared as char* rather than const char*, and then assigned a literal (“this is a literal” if you didn’t know the term).

Unfortunately the problem is not this simple; the warnings appear when using literals to call the fluidsynth API (of which QSynth is a frontend). This is because the fluidsynth API declares all the string parameters as char*.

As the title of the entry suggests, this is not really good. You should always accept the strictest pointer you can: this means that if you need strings, you should accept a pointer to constant characters (const char*), unless you need to actually modify the string. In almost all standard cases you don’t need to modify the string.

This makes it possible for the compiler to stop you from changing the data in the string (which also stops you from using = rather than == if you’re going to do a comparison), and removes the cause of the warnings above.

As you saw in yesterday’s post, the const specifier in front of char makes it a pointer to constant characters, so the pointer can be changed, but the pointed characters can’t. This means that you’re not passing a constant parameter (like in the case of a const int parameter), you’re passing a variable pointer to a constant string.

The const specifier does not in any way require that the object resides on .rodata section, so that the content is certainly constant, as it’s more an indication to the compiler. On the other hand, not specifying const requires, even if GCC does not enforce this, that the object does not reside on .rodata section. You can probably read in this difference the main point of this entry:

You can pass a pointer to non-constant objects to a function expecting a pointer to constant objects, you shouldn’t pass a pointer to constant objects to a function expecting a pointer to non-constant objects.
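To illustrate the rule, a minimal sketch with two made-up functions: one that only reads the string, and therefore takes const char*, and one that really has to modify it.

```c
#include <ctype.h>
#include <stddef.h>

/* Only reads the string, so it accepts the stricter pointer type. Both
 * literals and writable buffers can be passed without any warning. */
size_t count_char(const char *text, char wanted)
{
    size_t count = 0;

    for (; *text != '\0'; ++text) {
        if (*text == wanted)
            ++count;
    }
    return count;
}

/* Actually modifies the string, so a non-const pointer is justified.
 * Passing a string literal here would be a bug. */
void upcase_first(char *text)
{
    if (*text != '\0')
        *text = (char)toupper((unsigned char)*text);
}
```

Calling count_char("banana", 'a') with a literal is fine; had the parameter been declared as plain char*, the very same call is what triggers the deprecated conversion warning discussed above when built as C++.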

Note that I used the form “shouldn’t” because there are libraries whose functions take non-constant object pointers, even if they don’t change the content at all (and thus should accept constant object pointers). This seems to be the case for fluidsynth.

For this reason, if your API accepts a pointer to an object, and you don’t modify the object in any way, you should always use const in the parameter declaration. There are more implications when passing pointers to constant structures that have pointers to other structures, but that’s a topic for another day.

Now, to return to my original problem: do I fix fluidsynth to accept constant strings, and send the patch upstream? Or leave that to upstream to deal with?

Array of pointers and array of arrays

While doing microoptimisations on xine, I started considering one particular optimisation that might actually make some difference in the grand scheme of things: changing some arrays of pointers to arrays of arrays.

Talking on #gentoo-it, I realised that it’s not clear to everybody that there is a substantial difference between these two forms:

static const char *foo[] = { "foo1", "foo2", "foo3" };

static const char foo[][8] = { "foo1", "foo2", "foo3" };

The first is an array of pointers containing three pointers to the three literals, that will be anonymous entries in the .rodata section of a file.

The second is an array of arrays of characters, a single object containing the three strings.

It’s easier to understand this when you actually see the assembler representation of the two above.

The first, in AMD64 assembler is compiled this way:

        .section        .rodata
.LC0:
        .string "foo1"
.LC1:
        .string "foo2"
.LC2:
        .string "foo3"
        .data
        .align 16
        .type   foo, @object
        .size   foo, 24
foo:
        .quad   .LC0
        .quad   .LC1
        .quad   .LC2

While the second is compiled this way:

        .section        .rodata
        .align 16
        .type   foo, @object
        .size   foo, 24
foo:
        .string "foo1"
        .zero   3
        .string "foo2"
        .zero   3
        .string "foo3"
        .zero   3
        .text
        .type   print_foo, @function

As you can see, the size is (in this case) the same; if I had avoided padding the strings to [8], setting them to [5] which is the minimum required, the second would be smaller. As we’ll see, the first is actually using more memory, but that’s up for later.

In the first case, three strings are defined, they get an automatic label (LC0, 1 and 2) assigned, and then at the foo object, the addresses of the three strings are listed as quad (quad-words, 64-bit sized words, as are addresses on 64-bit architectures).

I’m not going to show the assembler code to access a member of the two objects, but I’ll try to describe it in words, as that’s also quite interesting:

foo[2];

Accessing the third member of foo (index 2) returns a string (a pointer to characters, or an array of characters). But the way this is achieved with the two objects is different. In both cases there are actions that are actually calculated at buildtime, and others that need to be done at runtime. I won’t try to separate them, as they will probably change depending on optimisations and on the architecture used.

In the first object, the offset from the address of foo is calculated: this is 2 (the index to read) multiplied by the size of a pointer (8 bytes on 64-bit architectures). Then the 8 bytes forming the pointer that is returned are read at the address (foo+offset).

In the second object, the offset from the address of foo is calculated once again, but this time it’s 2 (the index to read) multiplied by the size of the sub-arrays (again, in this case, 8 bytes). Then the pointer that is returned is (foo+offset).

It might seem the same, but if you look closely, you can see that in the first case the pointer is read at (foo+offset), while in the latter the pointer is (foo+offset). This actually saves you a memory load.

As I said, the size of foo in both cases is the same, but the first case takes up more memory. This is because in the first case foo contains the pointers to the strings, not the strings themselves, so you have to add the size of the strings to the size of the array itself to get the actual occupied space. The result is then (8 (size of a pointer) * 3) + (5 (size of a string of 4 characters plus terminator) * 3), which turns out to be 39; for the second case instead we just have 8 (size of an array of characters) * 3, which is 24.
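The same arithmetic can be sketched in C with sizeof and strlen(). The table and function names are made up for the example, and the result is a count of the bytes the declarations ask for, not a measurement of what the linker ends up laying out (which may add alignment or merge identical literals).

```c
#include <stddef.h>
#include <string.h>

/* The two forms discussed above, under hypothetical names. */
static const char *const ptr_table[] = { "foo1", "foo2", "foo3" };
static const char arr_table[][8] = { "foo1", "foo2", "foo3" };

/* Array of pointers: the pointers themselves, plus each string with its
 * terminating NUL stored elsewhere. */
size_t ptr_table_footprint(void)
{
    size_t total = sizeof(ptr_table);
    size_t i;

    for (i = 0; i < sizeof(ptr_table) / sizeof(ptr_table[0]); ++i)
        total += strlen(ptr_table[i]) + 1;
    return total;
}

/* Array of arrays: the object is all there is. */
size_t arr_table_footprint(void)
{
    return sizeof(arr_table);
}
```

With 8-byte pointers the first function gives the 39 computed above, while the second gives 24 on any architecture.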

Now let me explain why in the example I’m using [8] as size rather than [5], which is the actual space the three strings in the example require. As you’ve read, the offset is calculated by multiplying the size of the string by the index we’re accessing. If the size of the array is 5, it would be 5*i; if it’s 8, it would be 8*i. Multiplication is an operation that even on modern CPUs takes a bit of time, but even on the oldest 8086, 8*i can be simplified.

The value 8 is a power of two: it’s exactly 2 to the third power. As computers use a binary representation of data, multiplying by a power of two is just a matter of appending zeros to the right of a number (like we do in usual math when multiplying by a power of ten); for 8, that means three zeros. In programming that operation is called a (binary) left shift, and it’s usually implemented through a very fast CPU instruction.

This means that the operation 8*i can be replaced by i<<3 (left shift by three bits), which is faster than the 5*i in almost every possible CPU. So by padding the elements to a power of two boundary, access is made faster. And as we see in this case, we’re still using less memory than we would have with the “usual” method.
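Just to show the equivalence, here is a tiny sketch with a hypothetical table: one function lets the compiler index the array, the other computes the address by hand with the shift. You wouldn't normally write the second form yourself, since compilers tend to do this kind of strength reduction on their own.

```c
#include <stddef.h>

/* Hypothetical table with elements padded to 8 bytes, as in the text. */
static const char table[][8] = { "foo1", "foo2", "foo3" };

/* What you would normally write: the compiler computes 8*i for you. */
const char *element_by_index(size_t i)
{
    return table[i];
}

/* The same address computed by hand: base + (i << 3), that is base + 8*i. */
const char *element_by_shift(size_t i)
{
    return (const char *)table + (i << 3);
}
```

Both functions return the same pointer for every valid index, and neither needs to read a pointer from memory first.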

Now, there is another thing that you might have noticed in the two assembler listings: in the first one, the section is first set to .rodata to write down the three strings, and then it’s moved to .data to define the foo array. This is due to the way we declared foo:

static const char *foo[] = { "foo1", "foo2", "foo3" };

declares a static array of pointers to constant characters. So the characters are constant, and thus go to .rodata, but the pointers are not, so they need to go in .data, as you may modify them (and thus trigger a COW – Copy On Write – of the section’s memory page). This is actually a classic mistake and can often generate dirty RSS pages due to the possible COW.

In the case of the array, we declared it as

static const char foo[][8] = { "foo1", "foo2", "foo3" };

which declares a static constant array of arrays. In this case, the constant applies to the array and its members down to the characters, so the whole object is written into .rodata, making it shareable, and avoiding a possible COW trigger. Much nicer.

As you can’t always use the array of arrays (because the strings might differ too much in size, and then padding them to a power of 2 can waste way too much memory), you can change the array of pointers declaration in this way:

static const char *const foo[] = { "foo1", "foo2", "foo3" };

which declares a static array of constant pointers to constant characters. As you can guess, now that even the pointers are constant, the result is the following code:

        .section        .rodata
.LC0:
        .string "foo1"
.LC1:
        .string "foo2"
.LC2:
        .string "foo3"
        .align 16
        .type   foo, @object
        .size   foo, 24
foo:
        .quad   .LC0
        .quad   .LC1
        .quad   .LC2

As you can see, now everything is once again in the .rodata section, with no COW triggers around.
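The three declarations can be put side by side to see what the const qualifiers allow. The table and function names are hypothetical, and the section mentioned in each comment is what the listings in this post show; the actual placement depends on compiler, target and flags (position-independent builds in particular may place pointer tables elsewhere), so it is worth checking the output of gcc -S on your own setup.

```c
/* Pointers are writable, so the table itself cannot live in .rodata. */
static const char *mutable_table[] = { "foo1", "foo2", "foo3" };

/* Pointers and characters are both constant. */
static const char *const const_table[] = { "foo1", "foo2", "foo3" };

/* No pointers at all: one constant block of 3 * 8 characters. */
static const char padded_table[][8] = { "foo1", "foo2", "foo3" };

/* Legal: only the characters are const, the pointer slot is not.
 * This possible write is why the table has to be in writable memory. */
void retarget_first(void)
{
    mutable_table[0] = "bar1";
}

/* The equivalent assignment on const_table[0] or padded_table[0]
 * is rejected at compile time. */

const char *first_mutable(void) { return mutable_table[0]; }
const char *first_const(void)   { return const_table[0]; }
const char *first_padded(void)  { return padded_table[0]; }
```

After calling retarget_first(), first_mutable() returns "bar1", while the other two tables cannot be changed this way.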

Now I hope this entry will be useful for those of you who care about these microscopic optimisations, and for those of you who are interested in what the compilers produce at the end of the day.

Functional programming

As a programmer, I started quite early, when I learnt GW-BASIC on my sister’s PC, and then continued (yes, I know that it’s more of a regression, but..) with the C64 BASIC and then QBASIC on MS-DOS 6. Probably most of you wouldn’t consider BASIC a programming language, but I was 7 years old, and it was enough already. I then learnt Visual BASIC 5 through a CCE distribution that was free as in soda with some magazines (laugh as you wish, but this helped me when I was in high school, as one of the teachers was obsessed with Visual BASIC, even though at the time I would have preferred working with Borland C++ Builder; I would probably have been even faster at writing the exercises if I had it).

Of course, when I was fifteen I understood that Visual BASIC was bad and didn’t allow me to do what I wanted, and so I started studying C++; it wasn’t until the second year of high school that I ended up meeting Pascal (for school), but it was a piece of cake after studying C.

Then, I ended up learning basic Python, and PHP, and of course Java. Not for school, no, as we studied C++ in high school, and I already knew enough of it to slack off, but I decided to look at those languages, and they paid off: even if I’m not too confident with Python, I was able to fix up Portage to run on FreeBSD at the start of Gentoo/FreeBSD.

Ruby came more or less last year: I was reading so much good about it that I wanted to try it, and I loved it. I was never able to get past basic Perl; it still makes me puke after too much of it.

So, this whole introduction was just to say that up until now I never got interested in any functional programming language. But lately I’ve been using Emacs, and from time to time I need to understand why some particular mode does not work, especially since I like looking for modes doing more stuff for me so I don’t have to do all of that myself (nxml, nxhtml, quilt.el, ebuild-mode… well okay, ebuild-mode I originally tried to write, and mostly failed, but thanks to our magnificent Emacs team we now have gentoo-syntax that works flawlessly). And sometimes I have wishes that I’d like to implement myself rather than doing what I did today (I did go to Ulrich asking him to implement something for me ;) ).

So in light of this, I asked the LISP wizards in #gentoo-lisp for a good way to learn LISP, in particular the variant used by Emacs, and Ulrich suggested the introductory text on Emacs LISP. I started reading it tonight, and I have to say that at least now I know how to read basic LISP.

In addition, I found the podcast for CS1A from U.C. Berkeley on iTunes, and loaded it on my iPod; okay, it teaches Scheme, but it’s still functional programming, and some GNU tools (including LilyPond) seem to be written in Scheme, so I might need it one day, continuing to work as a maintainer :)

Oh, and for those following xine’s bugtracker story, tomorrow I should have an interesting update for you all.