Diff for "BashFAQ/110"

Differences between revisions 1 and 9 (spanning 8 versions)

How can I perform a substitution with arbitrary values ("s/$foo/$bar/") safely, without treating either value as a regular expression or worrying about other special characters?

Sed is not the right tool for this, and neither is ed. At best, attempting it with either will result in an escaping nightmare that is extremely prone to bugs. (The exception is when your input is already sanitized, but then you probably wouldn't need this FAQ, would you?)

First, what are we performing the substitution on? If it's a string, it can be done very simply with a parameter expansion.

var='some string'
search=some
replace=another
printf '%s\n' "${var//"$search"/$replace}"

This is discussed in more detail in Faq #100.
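To see why the search string is quoted inside the expansion, here is a small sketch (the sample values are made up for illustration). Without the inner quotes, bash treats the search value as a glob pattern rather than a literal string.

```bash
#!/usr/bin/env bash
# Sample values chosen for illustration only.
var='a*b*c'
search='*'
rep='-'

# Inner quotes: "*" is matched as a literal asterisk.
literal=${var//"$search"/$rep}

# No inner quotes: "*" is a glob that matches the whole string at once.
globbed=${var//$search/$rep}

printf '%s\n' "$literal" "$globbed"
```

The first line printed is `a-b-c`; the second is just `-`, because the unquoted pattern swallowed the entire value.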

If it's a file or stream, things get a bit trickier. One way to accomplish this would be to combine the previous method with Faq #1.

search=foo
rep=bar

# file
while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < "$file"

# command output
while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < <(my_command)

my_command | while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done

The second of the two command-output examples (the pipeline) runs the while loop in a subshell. See Faq #24 for more information on that.
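The practical effect of that subshell can be sketched as follows. The `count` variable and the sample input are assumptions added for illustration, and the result assumes bash's default settings (the `lastpipe` option is not enabled).

```bash
#!/usr/bin/env bash
search=foo

# Pipeline: the loop runs in a subshell, so its changes to count are lost.
count=0
printf '%s\n' 'foo one' 'two' | while IFS= read -r line; do
  [[ $line == *"$search"* ]] && count=$((count + 1))
done
pipeline_count=$count    # still 0

# Process substitution: the loop runs in the current shell.
count=0
while IFS= read -r line; do
  [[ $line == *"$search"* ]] && count=$((count + 1))
done < <(printf '%s\n' 'foo one' 'two')
procsub_count=$count     # 1

printf 'pipeline: %s, process substitution: %s\n' "$pipeline_count" "$procsub_count"
```

If the loop only prints, as in the examples above, the difference does not matter; it matters as soon as the loop needs to set variables that are used afterwards.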

All of the above examples print to stdout; none of them actually edits the file in place. Of course, this could be resolved with something like:

# create a temp file, die on failure
tmp=$(mktemp) || exit

while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < "$file" > "$tmp" && mv "$tmp" "$file"
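The same approach can be wrapped in a function. The following is only a sketch: the name `replace_in_file` is hypothetical, and the `|| [[ -n $line ]]` test is an addition so that a final line without a trailing newline is not silently dropped by `read`.

```bash
#!/usr/bin/env bash
# usage: replace_in_file SEARCH REP FILE
# Hypothetical helper (not part of the FAQ): replaces every literal
# occurrence of SEARCH with REP in FILE, going through a temp file.
replace_in_file() {
  local search=$1 rep=$2 file=$3 tmp line

  # create a temp file, give up on failure
  tmp=$(mktemp) || return

  # read returns non-zero on a last line that lacks a newline, but still
  # fills in $line; the extra test makes sure that line is processed too
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "${line//"$search"/$rep}"
  done < "$file" > "$tmp" && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }
}
```

Note that the output always ends with a newline, and that after the `mv` the file carries the permissions of the temp file rather than those of the original.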

On large data sets, you'll notice that this is quite slow, because the shell has to read and process the input one line at a time. The following functions use awk, and are quite a bit faster:

# usage: sub_literal STR REP
# Replaces the first instance (on each line) of STR with REP, treating them as
# literal strings and not regexes. Reads stdin and writes to stdout.
# Similar to sed 's/STR/REP/'

sub_literal() {
  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    # if the search string is in the line
    (i = index($0, str)) {
      # replace the first occurrence with rep
      $0 = substr($0, 1, i-1) rep substr($0, i + len);
    }

    # print each line
    1
  '
}

# usage: sub_literal_f STR REP FILE
# Replaces the first instance (on each line) of STR with REP in FILE, treating 
# them as literal strings and not regexes.
# Similar to sed -i 's/STR/REP/' FILE

sub_literal_f() {
  local tmp
  if ! [[ -f $3 && -r $3 && -w $3 ]]; then
    printf '%s does not exist or is not readable or writable\n' "$3" >&2
    return 1
  fi

  trap 'rm -rf "$tmp"' RETURN
  tmp=$(mktemp) && cp "$3" "$tmp" || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    # if the search string is in the line
    (i = index($0, str)) {
      # replace the first occurrence with rep
      $0 = substr($0, 1, i-1) rep substr($0, i + len);
    }

    # print each line
    1
  ' "$tmp" > "$3"
}


# usage: gsub_literal STR REP
# Replaces all instances of STR with REP, treating them as literal strings 
# and not regexes. Reads stdin and writes to stdout.
# Similar to sed 's/STR/REP/g'

gsub_literal() {
  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

# usage: gsub_literal_f STR REP FILE
# Replaces all instances of STR with REP in FILE, treating them as literal 
# strings and not regexes.
# Similar to sed -i 's/STR/REP/g' FILE

gsub_literal_f() {
  local tmp
  if ! [[ -f $3 && -r $3 && -w $3 ]]; then
    printf '%s does not exist or is not readable or writable\n' "$3" >&2
    return 1
  fi

  trap 'rm -rf "$tmp"' RETURN
  tmp=$(mktemp) && cp "$3" "$tmp" || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  ' "$tmp" > "$3"
}
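The backslash doubling in these functions is needed because awk processes escape sequences in values assigned with `-v`. The sketch below shows that effect, and then an alternative that passes the strings through the environment, where no escape processing happens. The name `gsub_literal_env` and the sample strings are assumptions for illustration; this is a variant, not one of the functions above.

```bash
#!/usr/bin/env bash
s='C:\new\table'

# Passed with -v as-is: awk expands \n and \t, so the value is altered.
raw_len=$(awk -v v="$s" 'BEGIN { print length(v) }')    # 10, not 12

# Backslashes doubled, as in the functions above: the value survives intact.
esc_out=$(awk -v v="${s//\\/\\\\}" 'BEGIN { print v }')

# usage: gsub_literal_env STR REP
# Hypothetical variant: STR and REP are read from ENVIRON, so they need no
# escaping at all. Reads stdin and writes to stdout.
gsub_literal_env() {
  str=$1 rep=$2 awk '
    BEGIN {
      str = ENVIRON["str"]
      rep = ENVIRON["rep"]
      len = length(str)
    }
    {
      out = ""
      # len > 0 guards against looping on an empty search string
      while (len > 0 && (i = index($0, str))) {
        out = out substr($0, 1, i-1) rep
        $0 = substr($0, i + len)
      }
      print out $0
    }
  '
}

# Regex metacharacters, "&" and backslashes are all treated literally.
result=$(printf '%s\n' 'a.*b a.*b' | gsub_literal_env '.*' '\n&')
printf '%s\n' "$raw_len" "$esc_out" "$result"
```

The design trade-off is small: the environment variant avoids the escaping step, while the `-v` form keeps the values out of the child's environment.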

The mktemp(1) command used in some of the examples above is not completely portable, although it will work on most systems. More information on safely creating temp files can be found in Faq #62.

- Revision 1 as of 2012-03-21 17:23:44
  Size: 1333
  Editor: e36freak
  Comment:
+ Revision 9 as of 2012-03-21 19:26:45
  Size: 5846
  Editor: e36freak
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 2:
-== How can i perform a substitution (s/foo/bar/) safely, without treating either value as a regular expression? ==

Sed is not the right tool for this. At best, it will be an escaping nightmare, and extremely prone to bugs.
+== How can i perform a substitution with arbitrary values ("s/$foo/$bar/") safely, without treating either value as a regular expression or worrying about other special characters? ==
Sed is not the right tool for this. Nor is ed. At best, attempting either will result in an escaping nightmare, and will be extremely prone to bugs. (The exception to this rule is when you have sanitized input, but then you probably wouldn't be needing this faq, would you?)
-Line 10:
+Line 9:
-echo "${var//some/another}"
+search=some
replace=another
printf '%s\n' "${var//"$search"/$replace}"
-Line 13:
+Line 14:
-This is discussed in more detail in [[BashFAQ/100][Faq #100]].
+This is discussed in more detail in [[BashFAQ/100|Faq #100]].
-Line 15:
+Line 16:
-If it's a file or stream, things get a bit trickier. One way to accomplish this would be to combine the previous method with [[BashFAQ/001][Faq #1]].
+If it's a file or stream, things get a bit trickier. One way to accomplish this would be to combine the previous method with [[BashFAQ/001|Faq #1]].
-Line 18:
+Line 19:
+search=foo
rep=bar
-Line 20:
+Line 24:
-  printf '%s\n' "${line//foo/bar}"
done < file
+  printf '%s\n' "${line//"$search"/$rep}"
done < "$file"
-Line 25:
+Line 29:
-  printf '%s\n' "${line//foo/bar}"
+  printf '%s\n' "${line//"$search"/$rep}"
-Line 29:
+Line 33:
-  printf '%s\n' "${line//foo/bar}"
+  printf '%s\n' "${line//"$search"/$rep}"
-Line 33:
+Line 37:
-The second of the two command examples there creates a subshell. See [[BashFAQ/024][Faq #24]] for more information on that.
+The second of the two command examples there creates a subshell. See [[BashFAQ/024|Faq #24]] for more information on that.
-Line 35:
+Line 39:
-Both of the above examples print to stdout. Neither actually edits the file in place. Of course this could be resolved with something like:
+Both of the above examples print to stdout; neither actually edits the file in place. Of course this could be resolved with something like:
-Line 37:
+Line 41:
+# create a temp file, die on failure
tmp=$(mktemp) || exit
-Line 38:
+Line 45:
-  printf '%s\n' "${line//foo/bar}"
done < file > new_file && mv new_file file
+  printf '%s\n' "${line//"$search"/$rep}"
done < "$file" > "$tmp" && mv "$tmp" "$file"
-Line 41:
+Line 48:
+On large data sets, you'll notice that this is quite slow.  The following functions use awk, and are quite a bit faster:
{{{
# usage: sub_literal STR REP
# Replaces the first instance (on each line) of STR with REP, treating them as
# literal strings and not regexes. Reads stdin and writes to stdout.
# Similar to sed 's/STR/REP/'

sub_literal() {
  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    # if the search string is in the line
    (i = index($0, str)) {
      # replace the first occurance with rep
      $0 = substr($0, 1, i-1) rep substr($0, i + len);
    }

    # print each line
    1
  '
}

# usage: sub_literal_f STR REP FILE
# Replaces the first instance (on each line) of STR with REP in FILE, treating 
# them as literal strings and not regexes.
# Similar to sed -i 's/STR/REP/' FILE

sub_literal_f() {
  local tmp
  if ! [[ -f $3 && -r $3 && -w $3 ]]; then
    printf '%s does not exist or is not readable or writable\n' "$3" >&2
    return 1
  fi

  trap 'rm -rf "$tmp"' RETURN
  tmp=$(mktemp) && cp "$3" "$tmp" || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    # if the search string is in the line
    (i = index($0, str)) {
      # replace the first occurance with rep
      $0 = substr($0, 1, i-1) rep substr($0, i + len);
    }

    # print each line
    1
  ' "$tmp" > "$3"
}


# usage: gsub_literal STR REP
# Replaces all instances of STR with REP, treating them as literal strings 
# and not regexes. Reads stdin and writes to stdout
# Similar to sed 's/STR/REP/g'

gsub_literal() {
  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

# usage: gsub_literal_f STR REP FILE
# Replaces all instances of STR with REP in FILE, treating them as literal 
# strings and not regexes.
# Similar to sed -i 's/STR/REP/g' FILE

gsub_literal_f() {
  local tmp
  if ! [[ -f $3 && -r $3 && -w $3 ]]; then
    printf '%s does not exist or is not readable or writable\n' "$3" >&2
    return 1
  fi

  trap 'rm -rf "$tmp"' RETURN
  tmp=$(mktemp) && cp "$3" "$tmp" || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  ' "$tmp" > "$3"
}
}}}

The mktemp(1) command used in some of the examples above is not completely portable. While it will work on most systems, more information on safely creating temp files can be found in [[BashFAQ/062|Faq #62]].