Example 1: Modifying a config file
This problem comes from a #bash user. We have about 90% of the specification, which is far better than average. I rate this problem's difficulty as medium.
Specification
Goal: Modify a configuration file (example shown below). For each line that identifies a server (format shown below), extract the IP address from the line, and attempt to ping it. If the ping is unsuccessful, comment out the line. If the ping is successful, uncomment the line. Server lines may or may not already be commented. If any line is altered, restart the foobard service.
File format:
options => {
    log_stats => 86400
    tcp_timeout => 15 ; zonefile-style comment
    include_optional_ns => true
    listen => [ 0.0.0.0 ]
}

plugins => {
    weighted => {
        multi = false # default
        service_types = up
        up_thresh => 0.1 # default
        xlaunch => {
            server1 = [ 1.1.1.1, 50 ]
            server2 = [ 2.2.2.2, 50 ]
            server3 = [ 3.3.3.3, 50 ]
        }
    }
}
Lines which identify a server have a hostname beginning with the literal string "server", and conform to this format:
hostname = [ IP, nn ]
IP addresses are in IPv4 dotted quad format. Server lines may already be commented out (with "#"), or not.
It is not stated whether the "zonefile-style comment" character (";") may be used to comment out a server line. It is not stated whether trailing in-line comments may be present on a server line, or whether they should be preserved if found.
Discussion
At this point, the specification is sufficient for a program to be written. To know whether the program is 100% correct for all inputs would require additional information, but this level of clarity is enough for the purposes of this document. You may want to try writing a solution before reading the rest of this page.
The #bash user's initial thought was to make one pass over the input file to retrieve a list of all the server IPs, ping each one, and then use a GNU sed -i command to edit the file (requiring an additional pass + copy) for every single IP.
<Prelude2004c> i was thinking sed -i
<greycat> Do not think sed.
<greycat> Think "while read".
<greycat> Read FAQ 1. Read the input file, write the output file. One line at a time. No sed, no grep.
<Prelude2004c> sorry greycat. but don't i replace text inside existing file ?
<Prelude2004c> if i wrote to new file. i have to then move file back to original right?
<greycat> Yes.
<Prelude2004c> doesn't something like sed -i do in-place replacemnt ?
<greycat> Doesn't matter what sed -i does because it's the *wrong tool* for this job.
(Lots more in this vein.) Getting people to stop pursuing dead ends like "I know! I'll launch one hundred sed commands... by the way, I can't figure out how to write the sed commands" is one of our major themes.
Strategy
As I implied, the preferred strategy for this problem is to read from one file, and write to another. Each line that we read from the input file is either a "server line", or not. If it's not, we just write it out without any processing or modification. If it's a server line, then we have to parse it, do a ping test on the IP address, and construct a new server line to write to the output file.
The complex nested structure of the input file, it turns out, can be ignored for this particular problem. We were asked only to parse server lines, where the hostname begins with the literal string "server". No knowledge is required of the semantics of the file outside of these lines. We don't need to know what section we're in, how deeply nested we are, or anything else.
The hardest part, obviously, is parsing the server line. It's clear that we need to extract the IP address. But in order to properly reconstruct the new server line, we actually need five (or possibly six) pieces of information:
- The hostname
- The IP address
- The second number
- The amount of leading whitespace
- Whether the line was originally commented out
- (Unspecified: trailing in-line comments)
Let's assume that we don't need to preserve trailing comments. In addition to these five pieces of information, we also need to know whether we're actually looking at a server line, or not.
There is a tool that allows us to determine whether an input string matches a given format, and if so, to extract various pieces of it: the regular expression. It's not pretty, but it seems like a good match for this problem.
Bash's [[ command has an =~ operator that lets us match a string against a regular expression (ERE), with the matching portion and each matching parenthesized subexpression being saved in an array named BASH_REMATCH. I will be using this tool.
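As a small illustration of how =~ fills BASH_REMATCH, here is a minimal sketch. The sample string and the variable names (input, key, value, whole) are made up for demonstration and have nothing to do with the config file. Note that the regex is stored in a variable and expanded unquoted on the right-hand side, so that it is treated as a regex rather than as a literal string.

```bash
#!/bin/bash
# Hypothetical sample input, for demonstration only.
input='color=blue'
re='^([[:alpha:]]+)=([[:alpha:]]+)$'

if [[ $input =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # the entire matching portion
    key=${BASH_REMATCH[1]}     # first parenthesized subexpression
    value=${BASH_REMATCH[2]}   # second parenthesized subexpression
    printf 'whole=%s key=%s value=%s\n' "$whole" "$key" "$value"
else
    echo "no match"
fi
```

Element 0 always holds the whole match, and the captures are numbered from 1 in the order of their opening parentheses.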
Implementation
We need to read a line at a time from one file, and write to another. So our primary program structure will be a while read loop. We can't use IFS field splitting here, because we need to preserve the non-server lines verbatim, and because our parsing will be done by a regular expression, not by field splitting. So, we start with:
while IFS= read -r line; do
    printf '%s\n' "$line"    # placeholder: copy each line through unchanged
done < inputfile > outputfile &&
mv outputfile inputfile
We also wanted to keep track of whether we have made any modifications, and if so, to restart something (which was not well specified, so I'll just provide a generic restart command, which can be altered later). So, let's add a variable:
changed=0
while IFS= read -r line; do
    printf '%s\n' "$line"    # placeholder: copy each line through unchanged
done < inputfile > outputfile &&
mv outputfile inputfile

if ((changed)); then
    service foobard restart
fi
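It is worth confirming that a variable set inside the loop is still set after the loop ends. A loop fed by a redirection runs in the current shell, whereas a loop on the receiving end of a pipeline runs in a subshell under bash's default settings (that is, without the lastpipe option). The following sketch contrasts the two, using made-up input and hypothetical variable names:

```bash
#!/bin/bash
# Loop fed by a redirection (here, a here-document): runs in the current shell.
changed_redirect=0
while IFS= read -r line; do
    changed_redirect=1
done <<'EOF'
some line
EOF

# Loop fed by a pipeline: runs in a subshell by default,
# so the assignment is lost when the loop ends.
changed_pipe=0
printf '%s\n' 'some line' | while IFS= read -r line; do
    changed_pipe=1
done

echo "redirect: $changed_redirect  pipeline: $changed_pipe"
```

This is why the skeleton reads from "< inputfile" instead of piping the file into the loop: the changed flag has to survive until the restart check.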
Those are the easy parts. The hard part will be processing the server lines.
Next, I need to write a regular expression that accomplishes two goals:
- Validates whether an input line is a server line or not
- Extracts the 5 pieces of information we need
The file format we were given appears to have highly flexible whitespace between tokens, so I'm going to assume that all whitespace is optional -- meaning, I will have to put [[:space:]]* in between virtually every pair of tokens in the regex. This is going to make it long and cumbersome, so I'll do my best to ease readability. I'll start by making a variable to hold that piece, and reusing it liberally.
s='[[:space:]]*'
re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"
Note the 5 captures (parenthesized sub-regexes): one for the leading whitespace, one for the comment character (assuming we can only comment these lines with "#" and not with ";"), one for the server name (assuming no hyphens, underscores, dots, ...), one for the IP, and one for the second number. If the input matches this regex, then we'll know it's a server line, and we'll have the 5 pieces of it that we need to ping-test the IP, and construct the new server line.
Since I built up the regex using a variable, I had to do it inside double quotes (which is unusual). Therefore, I want to double-check that I have my backslashes correct. I want the $ at the end to be literal, because it will be the end-of-string anchor in the regex. I want the \[ and \] to be literal with backslashes, because I want them to stand for literal [ and ] characters in the regex, respectively. So, I test at a shell prompt:
$ echo "\[\]\$"
\[\]$
Looks good. Next, I will test the entire regex at a shell prompt:
$ s='[[:space:]]*'
$ re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"
$ in=' server1 = [ 1.1.1.1, 50 ]'
$ [[ $in =~ $re ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]=" server1 = [ 1.1.1.1, 50 ]" [1]=" " [2]="" [3]="server1" [4]="1.1.1.1" [5]="50")
We got output from declare, so the regex validation passed. We ignore array element 0, because that's just the entire input line. Array element 1 is our leading whitespace, 2 is our optional comment indicator, 3 is the hostname, 4 is the IP, and 5 is that other number. Looks great!
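One thing this test does not show is how permissive the IP capture is: [[:digit:].]* accepts any run of digits and dots, including an empty one. If stricter validation were wanted, one possible variation (a sketch only; the names o, ip, strict, loose, and matches are made up here) is to require four groups of one to three digits. It still does not check that each octet is in the range 0-255.

```bash
#!/bin/bash
s='[[:space:]]*'
o='[[:digit:]]{1,3}'      # one octet: 1 to 3 digits (no range check)
ip="$o\.$o\.$o\.$o"       # four octets separated by literal dots

# Stricter variant (an assumption, not the regex used in this document).
strict="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s($ip)$s,$s([[:digit:]]+)$s\]$s\$"
# The regex from this document, for comparison.
loose="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"

# Print "yes" if string $1 matches regex $2, otherwise "no".
matches() { if [[ $1 =~ $2 ]]; then echo yes; else echo no; fi; }

good=' server1 = [ 1.1.1.1, 50 ]'
bad=' server1 = [ ..., 50 ]'

strict_good=$(matches "$good" "$strict")
strict_bad=$(matches "$bad" "$strict")
loose_good=$(matches "$good" "$loose")
loose_bad=$(matches "$bad" "$loose")

printf 'strict: good=%s bad=%s\n' "$strict_good" "$strict_bad"
printf 'loose:  good=%s bad=%s\n' "$loose_good" "$loose_bad"
```

The octet pattern is written out four times instead of using a repeated group such as (...){3}, because an extra pair of parentheses would add a capture and shift the BASH_REMATCH indices.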
Now, we just put it all together:
#!/bin/bash
changed=0
s='[[:space:]]*'
re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"

while IFS= read -r line; do
    if [[ "$line" =~ $re ]]; then
        # This is a server line. Use simple names for the extracted pieces.
        white=${BASH_REMATCH[1]}
        wascomment=${BASH_REMATCH[2]}
        host=${BASH_REMATCH[3]}
        ip=${BASH_REMATCH[4]}
        number=${BASH_REMATCH[5]}

        # Perform a ping test of the IP. Linux ping syntax.
        if ping -c 1 "$ip" >/dev/null 2>&1; then
            comment=
        else
            comment="#"
        fi
        if [[ "$comment" != "$wascomment" ]]; then
            changed=1
        fi

        # Construct and write output server line.
        printf '%s%s%s = [ %s, %s ]\n' "$white" "$comment" "$host" "$ip" "$number"
    else
        # Not a server line.
        printf '%s\n' "$line"
    fi
done < inputfile > outputfile &&
mv outputfile inputfile

if ((changed)); then
    service foobard restart
fi
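The script drops any trailing in-line comment on a server line, or more precisely, a server line with a trailing comment will not match the regex at all and is passed through untouched. If it turned out that such comments exist and must be preserved, one possible variation is to add an optional sixth capture. This is a sketch under that assumption; the sample line and the names trailing and newline are hypothetical.

```bash
#!/bin/bash
s='[[:space:]]*'
# Same regex as in the script, plus an optional sixth capture
# for a trailing "# ..." comment after the closing bracket.
re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s(#.*)?\$"

# Hypothetical input: a commented-out server line with a trailing comment.
line='    #server1 = [ 1.1.1.1, 50 ]   # primary'

if [[ $line =~ $re ]]; then
    trailing=${BASH_REMATCH[6]}    # empty if there was no trailing comment
    # Re-emit the line uncommented, appending " # ..." only if one was found.
    printf -v newline '%s%s = [ %s, %s ]%s' "${BASH_REMATCH[1]}" \
        "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}" \
        "${trailing:+ $trailing}"
    printf '%s\n' "$newline"
fi
```

The whitespace before the trailing comment is normalized to a single space here, in the same way that the script normalizes the spacing inside the server line.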
Testing
Testing is not optional, especially with a task of this complexity.
You probably don't want to actually restart the service during testing, so put an "echo" in front of the restart line for now. Remember to remove it when we're done testing. (Alternative: test as a user who is not privileged to restart the service; the restart command should fail noisily.)
Start with the input file we were given, and run the script. Then examine the "inputfile". Since all three IPs are fake, they should all be commented out, and the script should have echoed the service restart line. If you run it again, the "inputfile" should remain unchanged, and the script shouldn't echo anything to the terminal.
Now try putting in a real IP that you can actually ping (like 127.0.0.1 or 8.8.8.8). Start with that line commented out, and make sure the comment character gets removed. Start with it uncommented, and make sure it stays that way.
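For repeatable tests that do not depend on the network, one option is to shadow ping with a shell function. The following is only a sketch: the stub, the process_line helper, and the choice of 127.0.0.1 as the single "reachable" address are all assumptions made for testing, not part of the script above.

```bash
#!/bin/bash
# Test stub that shadows the real ping. Only 127.0.0.1 "answers".
# The last argument is taken to be the address being pinged.
ping() {
    [[ ${!#} == 127.0.0.1 ]]
}

s='[[:space:]]*'
re="^($s)(#?)$s(server[[:alnum:]]*)$s=$s\[$s([[:digit:].]*)$s,$s([[:digit:]]*)$s\]$s\$"

# Hypothetical helper wrapping the body of the main loop:
# takes one input line, prints the corresponding output line.
process_line() {
    local line=$1 comment
    if [[ $line =~ $re ]]; then
        if ping -c 1 "${BASH_REMATCH[4]}" >/dev/null 2>&1; then
            comment=
        else
            comment="#"
        fi
        printf '%s%s%s = [ %s, %s ]\n' "${BASH_REMATCH[1]}" "$comment" \
            "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}"
    else
        printf '%s\n' "$line"
    fi
}

up=$(process_line '  #server1 = [ 127.0.0.1, 50 ]')    # should be uncommented
down=$(process_line '  server2 = [ 2.2.2.2, 50 ]')     # should be commented out
other=$(process_line 'multi = false # default')        # should pass through
printf '%s\n' "$up" "$down" "$other"
```

A stub like this exercises the parsing and reconstruction logic only; a test against a real, pingable address is still needed to check the ping invocation itself.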
When you're satisfied that it works, remove the "echo", change "inputfile" to the actual pathname that you want it to operate on, change "outputfile" to a usable temporary pathname in the same file system as the "inputfile", and install the script.
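One way to get a temporary pathname in the same file system is to create it with mktemp in the directory that holds the input file. The sketch below demonstrates the idea in a scratch directory; the variable names and the scratch setup are assumptions for the demo, and the real pathname would replace them.

```bash
#!/bin/bash
# Demo setup: a scratch directory stands in for the real config location.
workdir=$(mktemp -d) || exit 1
config=$workdir/config          # hypothetical pathname
printf '%s\n' 'server1 = [ 1.1.1.1, 50 ]' > "$config"

# Create the temporary file next to the config file, so that mv is a
# rename within one file system rather than a copy across two.
tmp=$(mktemp "${config%/*}/tmp.XXXXXX") || exit 1

while IFS= read -r line; do
    printf '%s\n' "$line"       # the real per-line processing goes here
done < "$config" > "$tmp" &&
mv -- "$tmp" "$config"

result=$(cat "$config")
rm -rf -- "$workdir"            # clean up the demo directory
echo "$result"
```

Note that mktemp creates the file with restrictive permissions, so after the mv the config file may need its original mode and ownership restored before the service can read it.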