Parameters handling and COWs.

One common problem while fixing copy on write pages is finding an array of structures that hold parameters name and their value, usually loaded up either from network or from configuration files. One of the classical declarations of those structures is:

struct {
    char name[16];
    int value;
} parameters[20] = {
  { "foo", 34 },
  { "bar", 50 }
};

I put a character array in the struct so to show that using them does not make everything magically working.

What is wrong with this structure? Well it is not constant, so it gets written to .data, which is a COW section. You can’t really make it constant either as the value of the parameters has to change, if you’re using it to set the parameters for the program to work with. As long as the user is not going to change the default parameters, it’s not going to use extra memory, luckily, but if the user changes even just one parameter, COW is triggered.

This is actually quite common, it’s even used by Xorg drivers like radeon, often with extra flags like type, and unions, and so on.

Most people would react to this by using pointers to strings so that those get to .rodata and just their pointer is added to the parameters object. In the best case, those pointer will end up in .data, when using PIC, they’ll be added to .data.rel; the difference between the two is, again, that the first is only copied over when changing the values from the default, while the latter is copied over every time the executable or shared library is loaded (unless using prelink, if that works at all actually).

My proposed solution to that is simply to use two different objects. Somehow one limitation of C shows off here: being unable to write in columns ;)

static const char param_names[16][20] = { "foo", "bar" };
char param_values[20] = { 34, 50 };

If there is more to the names, one might use a structure instead, think for instance of this example, where the parameter can be either an integer, a float or a string:

static const struct {
    char name[16];
    enum { PARAM_INT, PARAM_FLOAT, PARAM_STRING } type;
} parameters[20] = {
  { "foo", PARAM_INT },
  { "bar", PARAM_FLOAT },
  { "baz", PARAM_STRING }
};

union {
    int intval;
    float floatval;
    char *stringval;
} param_values[20] = {
  { .intval = 34 },
  { .floatval = 50.0 },
  { .stringval = "default" }
};

Now in these last two examples the metadata about parameters (name, and type) can be written off to .rodata, while the values are saved in .data; at the same time, saved for the last one with the string, the parameter values will be shared as long as the defaults are not changed. Nice uh?

The only thing to remember is to make sure the index of the two arrays is the same. This is why I think it’s a pity that there is no way to write in columns in C, it would make maintaining this code quite easier.

When the list of parameters to maintain is quite big, the best way to tackle down the problem is to write a script that translate a table of metadata and default values into C code. It should be quite easy to do with most tools like Perl or Ruby or AWK. This way there is little to no overhead in maintaining the two arrays separate, and the memory use will be optimised.

As for the reason why I used an union in the last example, I have seen more than once structures defined the same way of that struct, carrying all three kinds at once. Those structures tend to waste a lot of space. Once you’re certain that only one of the three entries is used for a given parameter, using an union also limit the amount of memory the table use. This is one thing unions have been designed for, after all.

23 thoughts on “Parameters handling and COWs.

  1. Uh huh. So saving a cow is more important than having readable, maintainable, sane, efficient code?

    Like

  2. Uh this is sane and efficient code. As for maintainable and readable, I agree it is not (it would be if you could write C code in column), so I already proposed a solution to that:

    When the list of parameters to maintain is quite big, the best way to tackle down the problem is to write a script that translate a table of metadata and default values into C code.

    Like

  3. The solution you really want is std::initializer_list…But no, the code is not efficient as soon as you stick a union in there because unions force the compiler to assume aliasing. It is also not efficient as soon as you start keeping parameters in different arrays because it breaks locality in typical access patterns (where several members of what would have been the struct are accessed one after another).

    Like

  4. Sure, the solution _for C_ is to use a _C++_ feature, and you talk about sane code? Please shut up if you can’t give an answer that doesn’t involve using a fragile thing like C++ and STL? Or you’re doing that just so you can waste more of my time?Anyway, it also breaks caching to keep it in a COW page as a big structure could end up split between two different pages. As for unions, modern compilers should be able to optimise their usage. Plus parameters are not you usually accessed many times once you have loaded them (from whatever source), so the eventual hit of using union is usually worth, compared to the probably doubled memory usage.Sequential access to the parameters structure is only needed during load, by the way: most other time what you’re doing is random access to it to get the value of a given parameter. Using a script to generate the code you’ll also have define with the array index, to avoid iterating through the whole parameters list to find the one you need.Do you have any idea about efficiency that does not come straight out of a theory book?

    Like

  5. C++ is the solution to C’s shortcomings. And std::initializer_list has got nothing to do with the STL.No, modern compilers cannot optimise unions. They aren’t allowed to.And we’re not talking sequential access of parameters. We’re talking sequential access of members of the same struct. In the real world, if you access whatever->name, there is a very high chance that an access to whatever->value will follow very soon after. If instead you have to do *whatever_name and *whatever_value, you’re pulling from different cache lines and are much more likely to stall. Doubling the number of not-in-cache reads is a slowdown, not a speedup.But then, that’s the problem here — your ideas of efficiency are based upon a few rather arbitrary things that you happen to have written a tool to measure. Your ideas of efficiency should instead be based upon real world usage measurements, and your optimisations should be directed at only those things that make a measurable difference in an area where it has been demonstrated that performance matters.

    Like

  6. C++ is not the solution to anything but over-engineers’ dreams.And yeah you usually have to access the value when you access the name, but this does not always happen, if you have to analyse 10 parameters given by the commandline on a 100 parameters array, you usually have to iterate 100 times around the array to find the right one, for each parameters, this mean 1000 accesses to the names’ array and just 10 for the values’ array; you can reduce the iterations by stopping as soon as you find the parameter of course; and even better, you can make sure the array is sorted and use a binary search, but that is not sequential again..And you don’t usually have to _read_ the parameter’s value during those iterations, just set it, so the cacheline argument is bogus.Your problem I suppose is to only read textbooks and being unable to see what most real life code does. And no, most real life code does not use C++.

    Like

  7. No, you do not need 1000 accesses, nor even 10 * log(100) accesses. You need 10 accesses. You also need a better data structure, which is conveniently available as std::tr1::unordered_set<>, and conveniently constructible in rodata using std::initializer_list<>.Doing this is what we call ‘writing efficient code’.

    Like

  8. And yet again, your only way out of a problem is C++ and STL.Go rewrite the Linux kernel code in C++ then, just stop trying to waste my time with your “only C++ is good” solutions.

    Like

  9. C++ is not the only solution. It is merely the easiest efficient solution — implementing std::tr1::unordered_set<> in C and getting an implementation half as good as the one shipped with your C++ compiler would take several days. And, uh, no-one’s talking STL here.You also appear to be confusing userland programming and kernel programming. Unix, by nature of its public interface, is inherently tied to not using any programming language feature invented since 1970, and as such is an extremely different problem domain. I wouldn’t tell you to use C++ for writing small web applications either…

    Like

  10. Uh, any Unix? Are you sure what you are saying? Are you sure that you can’t use for instance C99 to write a Unix-compatible kernel? Do you have any clue what you’ve been saying up to now or do you need an UML activity diagram to understand it?

    Like

  11. There is a huge difference between what C99 adds to C and what C++ adds to C. C99 is merely a few little extras that don’t affect the underlying execution, flow or data model.

    Like

  12. spb, you’re just bored because you can’t troll on the forums anymore, uh? You should also try to understand that it’s not C++, it’s just Paludis to be broken when switching to a different linking rules model. Stop acting like an idiot. Thanks.Ciaran, you just changed idea then that you can use some “programming language feature” that was invented after 1970? Or you just try to cover your ass after saying stupid stuff? Please stop here, you’re just wasting my time.

    Like

  13. I’m fairly sure that wasn’t spb… I’m also fairly sure Paludis works just fine with as-needed, despite as-needed imposing restrictions that the standards do not…And by “feature” I do not mean a few new data types, keywords and bits of syntactic sugar. I mean something of the scope of templates, the C++ class system, namespaces, the C++ overload resolution rules, exceptions, concepts, l/r-value references, a defined memory model, compiled lambdas or non-copy variable semantics.

    Like

  14. > it’s just Paludis to be broken when switching to a different linking rules modelSo… don’t switch to a different linking rules model and then blame the applications? (Besides what ciaranm already said about Paludis not actually breaking.)

    Like

  15. Well if that isn’t spb, it’s still as bored and as clueless.And if you count as “feature” of a language… a whole different language, I do think you’re totally insane. And as I’m bored by now, comment moderation has been enabled. You can guess that constructive comments (like the first one Ciaran made) will be allowed; stupid comments won’t be.

    Like

  16. Going through the parameters list will not incur in any penalty when the code Diego suggested is used on “modern” (less than 5 years old) x86 CPUs. On PPC and other non register-starving architectures (x86_64) with decent L1 caches, the new code could be even faster.Please show code (asm code with annotation) to prove that the new code to read the params will be slower.The aliasing in the union will enable aliasing, but that is not a big problem at all as (well designed) parameter checking is done at beginning of the app only.And please, let’s talk about facts, not speculation or personal attacks.

    Like

  17. looks like I showed up late to the party…..anyways, back to the topic at hand,I did some experimentation on this exact subject a few weeks ago and was happy with the results of just marking the string portion of the struct as const. The result being, as you said, a pointer .data to addresses in .rodata. I think the “cost” of having the pointers in .data is worth the huge increase in readability.

    Like

  18. It really depends. If you assume you’re never going to build that code with PIC, it might work out, but if you add PIC (and PIE) to the thing, it most likely will add up to the cost.While the readability of the code shown here is quite bad, it should be quite easy to produce working code out of a very readable description file. If I have more time I’ll propose a solution to that in the next days, using AWK.

    Like

  19. I agree, generating the code would certainly work, it just sounds messy. I split up my test code to group the constants and the variables in different arrays, and with PIC, the difference in the variable/const storage is (combined, split):43c43< .section .data.rel.local,”aw”,@progbits—> .section .data.rel.ro.local,”aw”,@progbitsBut the asm for manipulating the variable data is significantly reduced when they are split up. So you may be right about the additional costs adding up.The meta-programming approach to making the split arrays is probably a good idea any time it can be done cleanly.

    Like

  20. You just need to apply the appropriate Makefile magic to have it work seamlessy for the rest of the project lifetime :)e.g parameters.c is built from parameters.tmpl (pick a better extension) using this awk line. As soon as the timestamp differs (you update the template file that has it in readable form), the generated file will be automatically rebuilt if all is right.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s