This Time Self-Hosted
dark mode light mode Search

Parsing configuration files

One of the remaining big issues I have with feng, one of the areas that we haven’t yet rewritten completely from since I joined, is the code to parse its configuration file. I have written about it not too long ago but I hadn’t received enough comments to move me away from the status quo yet. But since we’re now trying our best to implement whole new features that will require new configuration file options, it’s time for the legacy code to be killed.

In the previous post I linked, I was considering the use of an ISC-style configuration file derived from the syntax used by dhcpd.conf; since then I decided to switch on the very-similar bind-style syntax used by named.conf – it’s almost the same, with the difference that there are no “top-level” options, and that should make it easier to parse – and then I even tried to look into re-using the parsing code directly from bind.

Interestingly, the bind distribution contains an almost standalone configuration file parser code, with a prebuilt lexer. Unfortunately, re-using that code directly was not an option: while it was providing a clean, split interface, its backend code is very much entangled with the bind internals; including, but not limited to, their own memory allocators. Trying to reduce these dependencies would require as much work as creating something anew.

So the other night I decided I would find a way to implement the new parser… I spent the whole night, till 10am the morning after, to find possible alternatives; the obvious choice would have been using the good old classic unix tools: lex & yacc… unfortunately the tutorials that I could Google over, and the not-really-mine copy of the eponymous O’Reilly book didn’t help me at all. After deciding that half the stuff I was reading was obsoleted and old, I decided to tackle the problem in another direction.

John Levine – of Linkers & Loaders fame, a cornerstone for linker geeks like me – was one of the authors of the original lex & yacc book, the last edition of which was published in 1992; he has since updated the book by writing flex & bison and describing the modern versions of the tools. Thanks to O’Reilly’s digital distribution buying a copy of the book was both fast and cheap: with the 50% off discount over two books I got both that and Being Geek – a book I have been curious about for a while but I didn’t care just enough to buy it alone – for $20 total. Nice deal.

More importantly, not half an hour after downloading the PDF version of the book I was able to complete the basic parser… which makes it a totally worthwhile investment. Some of the problems I was facing were mostly due to legacy of the old tools, and with modern documentation at hand, it became trivial to implement what I needed. On the other hand, it turned out to be a verbose, and repetitive task, so once again I resolved to my “usual” Rube Goldberg machines by using XML and XSLT… (I really really have to find a different approach sooner or later).

The current code I’m working on has a single, short XML document that describes the various sections and entries in the configuration file, and a number of (not-too-long) stylesheets that produce the input for flex/bison, and a header file with the C structures with the options; add to this, is a separate source file with pure-C callbacks to commit the read configuration data into C structures, presetting defaults where needed, and validating the semantic of the code.

Right at this moment, feng has the new config parser set in; together with the stuff I developed for it, I also posted patches for the Autoconf Archive to improve the checks for flex and bison themselves (not simple lex/yacc), and a few more fixes there. All in all, it seems to be a decrease of many kilobytes of compiled object code size.

Comments 4
  1. for configuration files I still prefer Domain Specific Languages (DSL) with a curly brackets syntax (C-like); on the subject there is the excellet book “Domain Specific Languages” from Martin Fowler:http://my.safaribooksonline…p.s.: I use a curly brachets syntax because is more easy to convert the configuration file in other data exchange formats (like JSON)

  2. My favorite hammer for configuration files is libConfuse (…. Syntax is a bit off from “ISC-style” (assignments require “=” signs), but it does everything I have ever needed to do (including the sections, nesting, and lists of “ISC-style”).Also, I have to disagree with “equilibrium”. I find it utterly frustrating that every developer thinks configuration files are an opportunity to brush up on lexing/parsing, and invent their own language for it. Think of the poor sysadmins..

  3. I don’t consider libConfuse nowadays more than in passing. I have tried using it before, but the code is far from stable, upstream is not really communicative and their insistence on keeping with CVS is, simply put, appalling.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.