Tuesday, October 16, 2018

An API for configurations in a UNIX environment

Ruthless in purpose insidious in method

-- Orphan Black TV Series, season 3 Episode 8

Introduction

>>>The code, together with Getting started documentation, is available on GitHub.<<<
>>>For a quick start please see: github.<<<

Some months ago I ran into a problem related to passing configuration information to a set of applications. I wanted to retrieve configuration information in the form of wrkdir = `pwd`/"wrkdir"-${HOSTNAME}/${USER}. Additionally, I needed to access the configuration information from bash scripts, C++ programs and Python programs.

A configuration standard together with an API that would support what I needed was nowhere to be found.

In this post I describe a first cut of a small, simple but quite powerful configuration language together with a supporting API. It can be used in bash programs as well as in C++ programs. In later posts I will show an implementation of the corresponding Python API.

A small example

I'll use the following configuration file - test.cfg - to illustrate how the configuration machinery works:

namespace sys{
  du=`df -k . | grep '%' | grep -v Use | awk '{print $5}'`
  user=${USER}                                                
  usermachine=@"%{user}@`hostname -i`"
}
${USERMACHINE}=sys.usermachine    # export USERMACHINE environment variable

The configuration code illustrates:
  • line#1: namespace for grouping of variables
  • line#2: assign the output from execution of a command to a configuration file variable
  • line#3: assign an environment variable to a configuration file variable
  • line#4: interpolate a string containing the value of a configuration file variable as well as the result of a command
  • line#6: export the environment variable USERMACHINE into the environment

We can view the configuration by using the xconfig utility:

xconfig test.cfg

The output from xconfig is:

sys_du="51%"
sys_user="hansewetz"
sys_usermachine="hansewetz@10.0.0.4"

We can also read the configuration file from a C++ program as shown here:

#include "xconfig/XConfig.h"
#include <iostream>
using namespace std;
using namespace xconfig;

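int main(){
  // parse and evaluate the configuration file
  XConfig xfg("test.cfg");
  // print every variable in the configuration as "name: value"
  for(auto&&[name,value]:xfg())cout<<name<<": "<<value<<endl;
}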

The output after running the program is:

sys.du: 51%
sys.user: hansewetz
sys.usermachine: hansewetz@10.0.0.4

The for loop prints all variables from the configuration file. The separator between the namespace and a variable name is a dot in this output. In the output from xconfig the separator is an underscore so that the variable can be used in a bash context.
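The mapping between the two naming styles is mechanical. Here is a minimal sketch of the idea, assuming a hypothetical helper named toShellAssignment (it is not part of the xconfig API) and assuming the value needs no shell escaping:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not part of xconfig:
// turns the pair ("sys.du", "51%") into the bash assignment sys_du="51%".
// Assumption: the value contains no characters that need shell escaping.
std::string toShellAssignment(std::string name, const std::string& value) {
  // bash variable names cannot contain dots, so use underscores instead
  std::replace(name.begin(), name.end(), '.', '_');
  return name + "=\"" + value + "\"";
}
```

Calling toShellAssignment("sys.user", "hansewetz") produces a line of the same shape as the xconfig output shown earlier.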

xconfig utility program

Not yet done

Requirements on configuration language

There are a few basic requirements that I put on the configuration language and the supporting API. To start with, it must be possible to access environment variables. I also want to be able to set variables in the current environment.

Next, I should be able to retrieve the output from a command and store it in a variable. I must also be able to concatenate strings and/or integers to form longer strings.

To avoid name clashes in the space of variable names it must be possible to create logical groupings of variables. In the previous example the namespace construct was used to group variables in a sys namespace.

For convenience, I want to be able to concatenate environment variables, output from commands, normal variables as well as strings and integers inside a string. In the example above I used interpolation (activated by prefixing the string with an at sign, @) to evaluate the value assigned to the variable usermachine.

Requirements on the configuration API

Requirements on the API are fairly simple. First it must be possible to get a complete listing of all variables in the configuration file. For convenience it must also be possible to select variables by name, by regular expression or by namespace.
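To make the selection requirement concrete, here is a rough sketch of selecting by regular expression and by namespace. It assumes the evaluated configuration is available as a plain map from full variable name to value; the Config alias and both function names are hypothetical and are not the xconfig API:

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical stand-in for an evaluated configuration: full name -> value.
using Config = std::map<std::string, std::string>;

// Select the variables whose full name matches a regular expression.
Config selectByRegex(const Config& cfg, const std::string& pattern) {
  const std::regex re(pattern);
  Config out;
  for (const auto& [name, value] : cfg)
    if (std::regex_match(name, re)) out.emplace(name, value);
  return out;
}

// Select the variables that live in (or below) a namespace such as "sys".
Config selectByNamespace(const Config& cfg, const std::string& ns) {
  const std::string prefix = ns + ".";
  Config out;
  for (const auto& [name, value] : cfg)
    if (name.compare(0, prefix.size(), prefix) == 0) out.emplace(name, value);
  return out;
}
```

Selecting by exact name is then just a lookup in the map.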

Before retrieving variables the configuration must be readable from a file or a stream. The API should expose a single method to make this as painless as possible.

Finally, for debugging purposes it must be possible to dump various types of information related to the parsing of a file. For example, in a complex configuration file we might want to see how the internal machinery of the parser evaluates certain expressions.

Additional requirements

The configuration language should not be a full-fledged programming language. The syntax should be simple, easy to read and fairly easy to learn. Specifically, I want to avoid an XML type syntax since large XML files are notoriously difficult for humans to read.

Configuration language

The language follows a C-like style where variable names consist of alphanumeric characters and the underscore character. Variables can be assigned string or integer values:

user = "hansewetz"             # 'user' is a string
maxthreads = 20                # 'maxthreads' is an integer

Environment variables are accessed by using a bash style notation:

user = ${USER}             # 'user' is assigned the value of environment variable 'USER'
user1 = $USER              # also works without braces

Normal variables are accessed by using their name directly or by prefixing them with '%' (written as %{name}):

user = @"${USER}"                        # 'user' is assigned the value of environment variable 'USER'
userAtMachine= @"%{user}@`hostname -i`"  # access variable 'user' inside a string

Commands can be executed by enclosing the command in back-quotes:

machine = `hostname`   # get name of machine
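On the C++ side, capturing the output of a command can be done with the POSIX popen call. The following is only a sketch of that general technique, not the code used in xconfig; the function name runCommand is hypothetical, and stripping trailing newlines is an assumption that mirrors what a shell does for back-quotes:

```cpp
#include <stdio.h>  // popen, pclose (POSIX), fgets

#include <stdexcept>
#include <string>

// Run a command through the shell and return what it wrote to standard output.
// Trailing newlines are stripped (assumed behaviour, as in shell `...` substitution).
std::string runCommand(const std::string& cmd) {
  FILE* pipe = popen(cmd.c_str(), "r");
  if (!pipe) throw std::runtime_error("popen failed: " + cmd);
  std::string out;
  char buf[256];
  while (fgets(buf, sizeof buf, pipe)) out += buf;  // read until end of output
  pclose(pipe);
  while (!out.empty() && out.back() == '\n') out.pop_back();
  return out;
}
```

With this helper, runCommand("hostname") would return the machine name as a single line without the trailing newline.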

The plus operator concatenates strings and adds integers:

user = "hans"+"ewetz"         # user = "hansewetz"
minthreads = 10
maxthreads = minthreads + 8   # maxthreads = 18
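One way to picture the plus operator is as a function over a value that is either an integer or a string. The sketch below uses std::variant for that; the Value alias and the function names are hypothetical, and treating a mixed integer/string pair as concatenation is my reading of the requirement that strings and integers can be combined into longer strings:

```cpp
#include <string>
#include <variant>

// A configuration value is either an integer or a string (assumed representation).
using Value = std::variant<long, std::string>;

// Render a value as text.
std::string toString(const Value& v) {
  if (const long* i = std::get_if<long>(&v)) return std::to_string(*i);
  return std::get<std::string>(v);
}

// '+' adds two integers; any other combination is concatenated as text.
// The mixed integer/string case is an assumption, not confirmed behaviour.
Value plus(const Value& a, const Value& b) {
  if (std::holds_alternative<long>(a) && std::holds_alternative<long>(b))
    return std::get<long>(a) + std::get<long>(b);
  return toString(a) + toString(b);
}
```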

An expression on the right-hand side of the assignment operator can access a variable using dot-separated namespaces:

namespace system{
  user = $USER                     # user = "hansewetz"
  machine = `hostname -i`          # machine = "10.0.0.4"
}
userAtMachine = @"%{system.user}@%{system.machine}"   # userAtMachine = "hansewetz@10.0.0.4"

As the previous example showed, strings can be interpolated. During interpolation, environment variables, normal variables and commands are all evaluated. A variation of the previous example is shown here:

namespace system{
  user = @"${USER}"                # user = "hansewetz"
  machine = `hostname -i`          # machine = "10.0.0.4"
}
userAtMachine = @"%{system.user}@%{system.machine}"   # userAtMachine = "hansewetz@10.0.0.4"
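To give a feel for what interpolation involves, here is a small sketch that expands %{name} and ${NAME} references in a string. It is not the xconfig implementation: the function name interpolate is hypothetical, variables are looked up by their full name in a plain map, back-quoted commands are left out, and an unset environment variable is assumed to expand to nothing:

```cpp
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <string>

// Expand %{name} (configuration variable) and ${NAME} (environment variable).
// Back-quoted commands are not handled in this sketch.
std::string interpolate(const std::string& text,
                        const std::map<std::string, std::string>& vars) {
  std::string out;
  for (std::size_t i = 0; i < text.size();) {
    const char c = text[i];
    const bool isRef =
        (c == '%' || c == '$') && i + 1 < text.size() && text[i + 1] == '{';
    const std::size_t close = isRef ? text.find('}', i + 2) : std::string::npos;
    if (close == std::string::npos) {  // ordinary character or unterminated reference
      out += c;
      ++i;
      continue;
    }
    const std::string name = text.substr(i + 2, close - i - 2);
    if (c == '%') {
      const auto it = vars.find(name);
      if (it == vars.end()) throw std::runtime_error("unknown variable: " + name);
      out += it->second;
    } else {
      const char* env = std::getenv(name.c_str());
      out += env ? env : "";  // assumption: unset variables expand to nothing
    }
    i = close + 1;  // continue after the closing brace
  }
  return out;
}
```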

There are some restrictions on how variables can be used. A variable cannot be assigned to once it has a value. Neither can a namespace qualified variable be used on the left hand side of the assignment operator.

Design notes

The design is relatively straightforward:

  • bison/flex are used for parsing the configuration
  • during parsing, opcodes are written as a program into a virtual machine
  • the virtual machine executes the program, populating the virtual machine's memory with the result
  • a user uses an extractor to retrieve data from the memory of the virtual machine

The design is shown in this diagram:

Implementation notes

The virtual machine is implemented as a simple stack machine tailored specifically for this project. The name of the virtual machine is MMVM - Mickey Mouse Virtual Machine. The MMVM currently supports 13 opcodes. Among them are simple operations such as 'push value on stack' or 'store value in memory'. More complex operations such as 'evaluate a command in a shell and store output on stack' are also supported.
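For readers who have not seen a stack machine before, the following toy version shows the general idea. It is not the MMVM: the four opcode names are made up for illustration, only integers are handled, and the single-assignment check reflects the language rule that a variable cannot be assigned to once it has a value:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical opcode set; the real MMVM has 13 opcodes with its own names.
enum class Op { PushInt, LoadVar, Add, Store };

struct Instr {
  Op op;
  long num = 0;      // operand for PushInt
  std::string name;  // operand for LoadVar and Store
};

// Execute a program and return the machine's memory (variable name -> value).
std::map<std::string, long> run(const std::vector<Instr>& program) {
  std::vector<long> stack;
  std::map<std::string, long> memory;
  for (const Instr& in : program) {
    switch (in.op) {
      case Op::PushInt:
        stack.push_back(in.num);
        break;
      case Op::LoadVar:
        stack.push_back(memory.at(in.name));  // throws if the variable is unknown
        break;
      case Op::Add: {
        assert(stack.size() >= 2);
        const long rhs = stack.back();
        stack.pop_back();
        stack.back() += rhs;  // replace the two operands with their sum
        break;
      }
      case Op::Store:
        assert(!stack.empty());
        // a variable cannot be assigned to once it has a value
        if (!memory.emplace(in.name, stack.back()).second)
          throw std::runtime_error("variable already assigned: " + in.name);
        stack.pop_back();
        break;
    }
  }
  return memory;
}
```

In this toy encoding, the configuration lines minthreads = 10 and maxthreads = minthreads + 8 would become the program PushInt 10, Store minthreads, LoadVar minthreads, PushInt 8, Add, Store maxthreads.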

Now, one might wonder if it is not overkill to implement a VM just for parsing a configuration file. As it turns out, using a VM makes it almost effortless to support and experiment with new features in the configuration language. Since the MMVM implementation is only around 330 lines of C++ code, it can be implemented in only a few hours.

The MMVM currently supports strings and integers. The support is not extensive: two integers can be added and two strings can be concatenated.

Evaluation of a configuration file is done in two steps:

  • compile the file into a program (generate a program consisting of opcodes for MMVM)
  • execute the program (the sequence of opcodes)

Namespaces are used to scope variables. The full name of a variable is a series of dot-separated namespaces followed by the name of the variable. It is not necessary to use the fully namespace-qualified name of a variable. When namespaces are left out, the compiler looks for the variable in the enclosing scope(s). For example:

# test1.cfg
a = 18
namespace ns{
  namespace ns1{
    b = 19
    namespace ns2{
      c = b       # 'b' is found one step up
      d = ns1.b   # same as previous
      e = a       # 'a' is found in the outermost scope
    }
  }
}

The output from executing: xconfig test1.cfg:

a="18"
ns_ns1_b="19"
ns_ns1_ns2_c="19"
ns_ns1_ns2_d="19"
ns_ns1_ns2_e="18"
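The lookup rule can be sketched as a walk from the innermost namespace outwards. This is an illustration of the rule as described above, not the compiler's actual code; the function name resolve and the use of a set of full variable names are assumptions:

```cpp
#include <optional>
#include <set>
#include <string>

// Resolve 'name' as seen from namespace 'scope' (for example "ns.ns1.ns2") by
// trying the innermost scope first and then each enclosing scope in turn.
// 'known' holds the full names of all variables defined so far.
std::optional<std::string> resolve(const std::set<std::string>& known,
                                   std::string scope, const std::string& name) {
  for (;;) {
    const std::string candidate = scope.empty() ? name : scope + "." + name;
    if (known.count(candidate)) return candidate;
    if (scope.empty()) return std::nullopt;  // the outermost scope was searched too
    const std::size_t dot = scope.rfind('.');
    // step out one namespace level
    scope = (dot == std::string::npos) ? std::string() : scope.substr(0, dot);
  }
}
```

With known = {"a", "ns.ns1.b"} and scope "ns.ns1.ns2", both b and ns1.b resolve to ns.ns1.b, and a resolves to a, which matches the values in the xconfig output for test1.cfg.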

When interpolating strings it must be possible to handle namespaces the same way as they are handled when referencing variables outside string interpolation. In order to reference variables without specifying the full namespace path, we need a table that contains information about the location of each variable within the namespaces. This table is built at compilation time and used at runtime during string interpolation.

Downloading and building the xconfig API

Please see xconfig under github.

Bugs

Not yet done

Improvements

Not yet done