Parsing configuration files

One of the remaining big issues I have with feng, and one of the areas that we haven’t rewritten completely since I joined, is the code that parses its configuration file. I wrote about it not too long ago, but I didn’t receive enough comments to move me away from the status quo. Now that we’re trying our best to implement whole new features that will require new configuration file options, it’s time for the legacy code to be killed.

In the previous post I linked, I was considering an ISC-style configuration file derived from the syntax used by dhcpd.conf; since then I decided to switch to the very similar bind-style syntax used by named.conf. It’s almost the same, with the difference that there are no “top-level” options, which should make it easier to parse. I even looked into re-using the parsing code directly from bind.

Interestingly, the bind distribution contains an almost standalone configuration file parser, with a prebuilt lexer. Unfortunately, re-using that code directly was not an option: while it provides a clean, separate interface, its backend code is very much entangled with the bind internals, including, but not limited to, their own memory allocators. Reducing these dependencies would require as much work as writing something anew.

So the other night I decided I would find a way to implement the new parser… I spent the whole night, till 10am the morning after, looking for possible alternatives. The obvious choice would have been the good old classic Unix tools, lex & yacc… unfortunately the tutorials that I could find on Google, and the not-really-mine copy of the eponymous O’Reilly book, didn’t help me at all. After deciding that half the stuff I was reading was obsolete, I decided to tackle the problem from another direction.

John Levine – of Linkers & Loaders fame, a cornerstone for linker geeks like me – was one of the authors of the original lex & yacc book, the last edition of which was published in 1992; he has since updated it as flex & bison, which describes the modern versions of the tools. Thanks to O’Reilly’s digital distribution, buying a copy of the book was both fast and cheap: with the 50% discount on two books I got both that and Being Geek – a book I have been curious about for a while, but not enough to buy it on its own – for $20 total. Nice deal.

More importantly, not half an hour after downloading the PDF version of the book I was able to complete the basic parser… which makes it a totally worthwhile investment. The problems I was facing were mostly due to the legacy of the old tools, and with modern documentation at hand it became trivial to implement what I needed. On the other hand, it turned out to be a verbose and repetitive task, so once again I resorted to my “usual” Rube Goldberg machines, using XML and XSLT… (I really, really have to find a different approach sooner or later).

The code I’m currently working on has a single, short XML document that describes the various sections and entries in the configuration file, and a number of (not-too-long) stylesheets that produce the input for flex and bison, plus a header file with the C structures holding the options. On top of this there is a separate source file with pure-C callbacks that commit the configuration data just read into the C structures, presetting defaults where needed and validating the semantics of the configuration.
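To give a rough idea of what such a commit callback could look like, here is a minimal sketch. It is not the actual feng code: the structure names, the fields, the `vhost_commit` function, the default values and the connection cap are all made up for illustration, and I'm assuming that the generated parser leaves unset entries at zero or NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical raw values as a generated parser might hand them over;
 * a zero or NULL field is assumed to mean "not set in the file". */
struct raw_vhost {
    const char *document_root;
    long port;
    long max_connections;
};

/* Hypothetical validated structure the rest of the server would read. */
struct vhost_config {
    const char *document_root;
    unsigned short port;
    unsigned int max_connections;
};

/* Illustrative defaults and limits, not taken from feng. */
enum {
    DEFAULT_PORT = 554,
    DEFAULT_MAX_CONNECTIONS = 100,
    MAX_CONNECTIONS_CAP = 100000
};

/* Commit callback: preset defaults where needed, then validate.
 * Returns true on success; on failure *out is left untouched. */
bool vhost_commit(const struct raw_vhost *raw, struct vhost_config *out)
{
    assert(raw != NULL && out != NULL);

    /* The document root has no sensible default, so it is mandatory. */
    if (raw->document_root == NULL || raw->document_root[0] == '\0')
        return false;

    /* Fall back to the defaults for entries missing from the file. */
    long port = raw->port != 0 ? raw->port : DEFAULT_PORT;
    long max_conn = raw->max_connections != 0
        ? raw->max_connections : DEFAULT_MAX_CONNECTIONS;

    /* Semantic checks the grammar alone cannot express. */
    if (port < 1 || port > 65535)
        return false;
    if (max_conn < 1 || max_conn > MAX_CONNECTIONS_CAP)
        return false;

    out->document_root = raw->document_root;
    out->port = (unsigned short)port;
    out->max_connections = (unsigned int)max_conn;
    return true;
}
```

Keeping this step separate from the grammar means the generated flex/bison input only has to care about syntax, while defaults and range checks stay in plain, hand-written C.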

Right now, feng has the new config parser in place; together with the code I developed for it, I also posted patches for the Autoconf Archive to improve the checks for flex and bison themselves (rather than plain lex/yacc), along with a few more fixes there. All in all, it seems to have shrunk the compiled object code by many kilobytes.