
Example 1: Modifying a config file

This problem comes from a #bash user. We have about 90% of the specification, which is far better than average. I rate this problem's difficulty as medium.

Specification

Goal: Modify a configuration file (example shown below). For each line that identifies a server (format shown below), extract the IP address from the line, and attempt to ping it. If the ping is unsuccessful, comment out the line. If the ping is successful, uncomment the line. Server lines may or may not already be commented. If any line is altered, restart the foobard service.

File format:

options => {
  log_stats => 86400
  tcp_timeout => 15 ; zonefile-style comment
  include_optional_ns => true
  listen => [ 0.0.0.0 ]
}

plugins => {
  weighted => {
    multi = false # default
    service_types = up
    up_thresh => 0.1 # default

    xlaunch => {
      server1 = [ 1.1.1.1, 50 ]
      server2 = [ 2.2.2.2, 50 ]
      server3 = [ 3.3.3.3, 50 ]
     }
  }
}

Lines which identify a server have a hostname beginning with the literal string "server", and conform to this format:

      hostname = [ IP, nn ]

IP addresses are in IPv4 dotted quad format. Server lines may already be commented out (with "#"), or not.

It is not stated whether the "zonefile-style comment" character (";") may be used to comment out a server line. It is not stated whether trailing in-line comments may be present on a server line, or whether they should be preserved if found.

Discussion

At this point, the specification is sufficient for a program to be written. To know whether the program is 100% correct for all inputs would require additional information, but this level of clarity is enough for the purposes of this document. You may want to try writing a solution before reading the rest of this page.

The #bash user's initial thought was to make one pass over the input file to retrieve a list of all the server IPs, ping each one, and then use a GNU sed -i command to edit the file (requiring an additional pass + copy) for every single IP.

<Prelude2004c> i was thinking sed -i
<greycat> Do not think sed.
<greycat> Think "while read".

<greycat> Read FAQ 1.  Read the input file, write the output file.  One line at a time.  No sed, no grep.
<Prelude2004c> sorry greycat. but don't i replace text inside existing file ?

<Prelude2004c> if i wrote to new file. i have to then move file back to original right?
<greycat> Yes.

<Prelude2004c> doesn't something like sed -i do in-place replacemnt ?
<greycat> Doesn't matter what sed -i does because it's the *wrong tool* for this job.

(Lots more in this vein.) Getting people to stop pursuing dead ends like "I know! I'll launch one hundred sed commands... by the way, I can't figure out how to write the sed commands" is one of our major themes.

Strategy

As I implied, the preferred strategy for this problem is to read from one file, and write to another. Each line that we read from the input file is either a "server line", or not. If it's not, we just write it out without any processing or modification. If it's a server line, then we have to parse it, do a ping test on the IP address, and construct a new server line to write to the output file.

The complex nested structure of the input file, it turns out, can be ignored for this particular problem. We were asked only to parse server lines, where the hostname begins with the literal string "server". No knowledge is required of the semantics of the file outside of these lines. We don't need to know what section we're in, how deeply nested we are, or anything else.

The hardest part, obviously, is parsing the server line. It's clear that we need to extract the IP address. But in order to properly reconstruct the new server line, we actually need five (or possibly six) pieces of information:

The hostname
The IP address
The second number
The amount of leading whitespace
Whether the line was originally commented out
(Unspecified: trailing in-line comments)

Let's assume that we don't need to preserve trailing comments. In addition to these five pieces of information, we also need to know whether we're actually looking at a server line, or not.

There is a tool that allows us to determine whether an input string matches a given format, and if so, to extract various pieces of it: the RegularExpression. It's not pretty, but it seems like a good match for this problem.

Bash's [[ command has an =~ operator that lets us match a string against a regular expression (ERE), with the matching portion and each matching parenthesized subexpression being saved in an array named BASH_REMATCH. I will be using this tool.
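As a quick illustration of how =~ fills BASH_REMATCH, here is a minimal sketch using a toy pattern. The pattern and the input string are made up for this demonstration and are not part of the problem.

```bash
#!/bin/bash
# Toy pattern (an assumption for this demo only): a lowercase key, '=', digits.
re='^([a-z_]+)=([0-9]+)$'
input='tcp_timeout=15'

# The pattern variable is left unquoted so that it is treated as a regex.
if [[ $input =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # the entire matching portion
    key=${BASH_REMATCH[1]}     # first parenthesized subexpression
    value=${BASH_REMATCH[2]}   # second parenthesized subexpression
    printf 'whole=<%s> key=<%s> value=<%s>\n' "$whole" "$key" "$value"
fi
```

Keeping the regex in a variable and expanding it unquoted sidesteps most quoting trouble. If the right-hand side is quoted, bash matches it as a literal string instead of a regex.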

Implementation

We need to read a line at a time from one file, and write to another. So our primary program structure will be a while read loop. We can't use IFS field splitting here, because we need to preserve the non-server lines verbatim, and because our parsing will be done by a regular expression, not by field splitting. So, we start with:

while IFS= read -r line; do

done < inputfile > outputfile &&
mv outputfile inputfile
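To see why the loop uses IFS= and -r, here is a small sketch that compares the two forms of read on one line. The sample line is invented for the demonstration.

```bash
#!/bin/bash
# Sample line (made up): leading whitespace plus a backslash in a comment.
line_in='    up_thresh => 0.1 # a\b'

# IFS= keeps leading/trailing whitespace; -r keeps backslashes as-is.
IFS= read -r verbatim <<< "$line_in"

# Default IFS and no -r: whitespace is trimmed and the backslash is eaten.
read stripped <<< "$line_in"

printf 'verbatim=<%s>\n' "$verbatim"
printf 'stripped=<%s>\n' "$stripped"
```

Only the first form gives back the line exactly as it was, which is what we need in order to copy non-server lines through untouched.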

We also wanted to keep track of whether we have made any modifications, and if so, to restart something (which was not well specified, so I'll just provide a generic restart command, which can be altered later). So, let's add a variable:

changed=0
while IFS= read -r line; do

done < inputfile > outputfile &&
mv outputfile inputfile

if ((changed)); then
  service foobard restart
fi

Those are the easy parts. The hard part will be processing the server lines.

Next, I need to write a regular expression that does two things:

Validates whether an input line is a server line or not
Extracts the 5 pieces of information we need

The file format we were given appears to have highly flexible whitespace between tokens, so I'm going to assume that all whitespace is optional -- meaning, I will have to put [[:space:]]* in between virtually every pair of tokens in the regex. This is going to make it long and cumbersome, so I'll do my best to ease readability. I'll start by making a variable to hold that piece, and reusing it liberally.

s='[[:space:]]*'
re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"

Note the 5 captures (parenthesized sub-regexes): one for the leading whitespace, one for the comment character (assuming we can only comment these lines with "#" and not with ";"), one for the server name (assuming no hyphens, underscores, dots, ...), one for the IP, and one for the second number. If the input matches this regex, then we'll know it's a server line, and we'll have the 5 pieces of it that we need to ping-test the IP, and construct the new server line.

Since I built up the regex using a variable, I had to do it inside double quotes (which is unusual). Therefore, I want to double-check that I have my backslashes correct. The \$ at the end must reach the regex as a plain $ (rather than being treated by the shell as the start of an expansion), because it will be the end-of-string anchor. The \[ and \] must keep their backslashes, because they stand for literal [ and ] characters in the regex. So, I test at a shell prompt:

$ echo "\[\]\$"
\[\]$

Looks good. Next, I will test the entire regex at a shell prompt:

$ s='[[:space:]]*'
$ re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"
$ in='      server1 = [ 1.1.1.1, 50 ]'
$ [[ $in =~ $re ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]="      server1 = [ 1.1.1.1, 50 ]" [1]="      " [2]="" [3]="server1" [4]="1.1.1.1" [5]="50")

We got output from declare, so the regex validation passed. We ignore array element 0, because that's just the entire input line. Array element 1 is our leading whitespace, 2 is our optional comment indicator, 3 is the hostname, 4 is the IP, and 5 is that other number. Looks great!
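It is worth trying a few more inputs before moving on, including a commented server line and some lines that should not match. The following is one possible way to do that. The check function is a throwaway helper I made up for this test, not part of the final script.

```bash
#!/bin/bash
s='[[:space:]]*'
re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"

# Hypothetical helper: say whether a line matches, and show the key captures.
check() {
    if [[ $1 =~ $re ]]; then
        printf 'server line: comment=<%s> host=<%s> ip=<%s>\n' \
            "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
    else
        printf 'not a server line: %s\n' "$1"
    fi
}

check '      #server2 = [ 2.2.2.2, 50 ]'   # commented server line
check '      # server3=[3.3.3.3,50]'       # odd but legal whitespace
check '    up_thresh => 0.1 # default'     # ordinary setting
check '    xlaunch => {'                   # section opener
```

The first two should be reported as server lines with a "#" captured in the comment slot, and the last two should be rejected.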

Now, we just put it all together:

#!/bin/bash
changed=0
s='[[:space:]]*'
re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"

while IFS= read -r line; do
    if [[ "$line" =~ $re ]]; then
        # This is a server line.  Use simple names for the extracted pieces.
        white=${BASH_REMATCH[1]}
        wascomment=${BASH_REMATCH[2]}
        host=${BASH_REMATCH[3]}
        ip=${BASH_REMATCH[4]}
        number=${BASH_REMATCH[5]}

        # Perform a ping test of the IP.  Linux ping syntax.
        if ping -c 1 "$ip" >/dev/null 2>&1; then
            comment=
        else
            comment="#"
        fi
        if [[ "$comment" != "$wascomment" ]]; then
            changed=1
        fi

        # Construct and write output server line.
        printf '%s%s%s = [ %s, %s ]\n' "$white" "$comment" "$host" "$ip" "$number"
    else
        # Not a server line.
        printf '%s\n' "$line"
    fi
done < inputfile > outputfile &&
mv outputfile inputfile

if ((changed)); then
    service foobard restart
fi
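The script above assumes that only "#" comments out a server line and that trailing comments need not be preserved. If those unspecified points turned out the other way, the regex could be loosened along these lines. This is only a sketch under those assumptions. The name re2, the sixth capture, and the sample input are mine, not part of the original problem.

```bash
#!/bin/bash
s='[[:space:]]*'
# Assumptions: '#' or ';' may comment out a server line, and an optional
# trailing comment (starting with '#' or ';') may follow the closing bracket.
re2="^($s)([#;]?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s([#;].*)?\$"

in='  ;server2 = [ 2.2.2.2, 50 ] # backup'
if [[ $in =~ $re2 ]]; then
    white=${BASH_REMATCH[1]}
    wascomment=${BASH_REMATCH[2]}
    host=${BASH_REMATCH[3]}
    ip=${BASH_REMATCH[4]}
    number=${BASH_REMATCH[5]}
    trailing=${BASH_REMATCH[6]}    # empty when there is no trailing comment

    # Rebuild the line uncommented, keeping the trailing comment if present.
    rebuilt=$(printf '%s%s = [ %s, %s ]%s' \
        "$white" "$host" "$ip" "$number" "${trailing:+ $trailing}")
    printf '%s\n' "$rebuilt"
fi
```

Accepting ";" raises a further question that the specification does not answer: which character to write when commenting a line back out. That would have to be settled with whoever owns the file.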

Testing

Testing is not optional, especially with a task of this complexity.

You probably don't want to actually restart the service during testing, so put an "echo" in front of the restart line for now. Remember to remove it when we're done testing. (Alternative: test as a user who is not privileged to restart the service; the restart command should fail noisily.)

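Start with the input file we were given, and run the script. Then examine the "inputfile". Every server line whose IP did not answer the ping should now be commented out, and if any line changed, the script should have echoed the service restart line. (The example IPs were meant as placeholders, but some of them, such as 1.1.1.1, are real public addresses that may respond, so check which ones are reachable from your network first.) If you run it again under the same network conditions, the "inputfile" should remain unchanged, and the script shouldn't echo anything to the terminal.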

Now try putting in a real IP that you can actually ping (like 127.0.0.1 or 8.8.8.8). Start with that line commented out, and make sure the comment character gets removed. Start with it uncommented, and make sure it stays that way.
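If you would rather not depend on the network while testing, one option is to shadow ping with a shell function that gives predictable answers. The sketch below does that. The stub, the process function (the script's loop reading stdin and writing stdout), and the sample addresses are all assumptions for testing, not part of the installed script.

```bash
#!/bin/bash
# Stub (testing only): pretend that just 127.0.0.1 answers.
ping() {
    local ip=${!#}              # the last argument is the address
    [[ $ip == 127.0.0.1 ]]
}

s='[[:space:]]*'
re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"

# Same per-line logic as the script, as a stdin-to-stdout filter.
process() {
    local line white was host ip number comment
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            white=${BASH_REMATCH[1]} was=${BASH_REMATCH[2]}
            host=${BASH_REMATCH[3]} ip=${BASH_REMATCH[4]} number=${BASH_REMATCH[5]}
            if ping -c 1 "$ip" >/dev/null 2>&1; then comment=; else comment='#'; fi
            if [[ $comment != "$was" ]]; then changed=1; fi
            printf '%s%s%s = [ %s, %s ]\n' "$white" "$comment" "$host" "$ip" "$number"
        else
            printf '%s\n' "$line"
        fi
    done
}

changed=0
workdir=$(mktemp -d) || exit
printf '%s\n' 'xlaunch => {' '  server1 = [ 192.0.2.1, 50 ]' \
    '  #server2 = [ 127.0.0.1, 50 ]' '}' > "$workdir/in"
process < "$workdir/in" > "$workdir/out"
result=$(<"$workdir/out")
rm -rf -- "$workdir"

printf '%s\n' "$result"
echo "changed=$changed"
```

With the stub in place, server1 should come out commented, server2 should come out uncommented, and changed should be 1. Feeding the output back through process should leave it untouched.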

When you're satisfied that it works, remove the "echo", change "inputfile" to the actual pathname that you want it to operate on, change "outputfile" to a usable temporary pathname in the same file system as the "inputfile", and install the script.
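For the temporary pathname, one approach is to have mktemp create the file in the same directory as the target, so that the final mv is a rename within one file system. Here is a sketch, assuming a mktemp that accepts a template argument (mktemp is not standardized by POSIX, but the common Linux and BSD versions do). The edit_in_place function is a hypothetical wrapper, and tr merely stands in for the real line-processing loop.

```bash
#!/bin/bash
# Hypothetical wrapper: run a filter (a command that reads stdin and writes
# stdout) over FILE, and replace FILE only if the filter succeeds.
edit_in_place() {
    local file=$1 tmp
    shift
    # Create the temp file next to the target, on the same file system.
    tmp=$(mktemp "$(dirname -- "$file")/tmp.XXXXXX") || return
    if "$@" < "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Demonstration on a scratch file, with tr standing in for the real filter.
demo=$(mktemp) || exit
printf 'a\nb\n' > "$demo"
edit_in_place "$demo" tr a-z A-Z
demo_result=$(<"$demo")
rm -f -- "$demo"
printf '%s\n' "$demo_result"
```

One thing to watch: mktemp creates the file readable and writable by its owner only, so the replaced config file may end up with different permissions or ownership than the original. Check that the service can still read it.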