Diff for "BashFAQ/021"

Differences between revisions 2 and 56 (spanning 54 versions)

How can I replace a string with another string in a variable, a stream, a file, or in all the files in a directory?

There are a number of techniques for this. Which one to use depends on many factors, the biggest of which is what we're editing.

Contents

Files
1. Using nonstandard tools
Variables
Streams

Files

Editing files is tricky. The only standard tool that actually edits a file is ed. Other methods could be used, but they involve a temp file and mv (or nonstandard tools, or extensions to POSIX).

ed is the standard UNIX command-based editor. Here are some commonly-used syntaxes for replacing the string olddomain.com by the string newdomain.com in a file named file. All four commands do the same thing, with varying degrees of portability and efficiency:

# Bash
ed -s file <<< $'g/olddomain\\.com/s//newdomain.com/g\nw\nq'

# Bourne (with printf)
printf '%s\n' 'g/olddomain\.com/s//newdomain.com/g' w q | ed -s file

printf 'g/olddomain\\.com/s//newdomain.com/g\nw\nq' | ed -s file

# Bourne (without printf)
ed -s file <<!
g/olddomain\\.com/s//newdomain.com/g
w
q
!

To replace a string in all files of the current directory, just wrap one of the above in a loop:

for file in ./*; do
    [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done

To do this recursively, the easy way would be to enable globstar in bash 4 (shopt -s globstar, a good idea to put this in your ~/.bashrc) and use:

for file in ./**/*; do
    [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done

If you don't have bash 4, you can use find. Unfortunately, it's a bit tedious to feed ed stdin for each file hit:

find . -type f -exec bash -c 'printf "%s\n" "g/old/s//new/g" w q | ed -s "$1"' _ {} \;

If shell variables are used as the search and/or replace strings, ed is not suitable. Nor is sed, or any tool that uses regular expressions. Consider using the awk code at the bottom of this FAQ with redirections, and mv.

gsub_literal "$search" "$rep" < "$file" > tmp && mv tmp "$file"

1. Using nonstandard tools

sed is a Stream EDitor, not a file editor. Nevertheless, people everywhere tend to abuse it for trying to edit files. It doesn't edit files. GNU sed (and some BSD seds) have a -i option that makes a copy and replaces the original file with the copy. An expensive operation, but if you enjoy unportable code, I/O overhead and bad side effects (such as destroying symlinks), this would be an option:

sed -i    's/old/new/g' ./*  # GNU
sed -i '' 's/old/new/g' ./*  # FreeBSD

Those of you who have perl 5 can accomplish the same thing using this code:

perl -pi -e 's/old/new/g' ./*

Recursively using find:

find . -type f -exec perl -pi -e 's/old/new/g' {} \;   # if your find doesn't have + yet
find . -type f -exec perl -pi -e 's/old/new/g' {} +    # if it does

If you want to delete lines instead of making substitutions:

# Deletes any line containing the perl regex foo
perl -ni -e 'print unless /foo/' ./*

To replace for example all "unsigned" with "unsigned long", if it is not "unsigned int" or "unsigned long" ...:

find . -type f -exec perl -i.bak -pne \
    's/\bunsigned\b(?!\s+(int|short|long|char))/unsigned long/g' {} \;

All of the examples above use regular expressions, which means they have the same issue as the sed code earlier; trying to embed shell variables in them is a terrible idea, and treating an arbitrary value as a literal string is painful at best.

Moreover, perl can be used to pass variables into both search and replace strings with no unquoting or potential for conflict with sigil characters:

in="input (/string" out="output string" perl -pi -e $'$quoted_in=quotemeta($ENV{\'in\'}); s/$quoted_in/$ENV{\'out\'}/g' ./*

Or, simpler:

in=$search out=$replace perl -pi -e 's/\Q$ENV{"in"}/\Q$env{"out"}/g' ./*

Variables

If it's a variable, this can (and should) be done very simply with Bash's parameter expansion:

var='some string'; search=some; rep=another

# Bash
var=${var//"$search"/$rep}

It's a lot harder in POSIX:

# POSIX function

# usage: string_rep SEARCH REPL STRING
# replaces all instances of SEARCH with REPL in STRING
string_rep() {
  # initialize vars
  in=$3
  unset out

  # SEARCH must not be empty
  test -n "$1" || return

  while true; do
    # break loop if SEARCH is no longer in "$in"
    case "$in" in
      *"$1"*) : ;;
      *) break;;
    esac

    # append everything in "$in", up to the first instance of SEARCH, and REP, to "$out"
    out=$out${in%%"$1"*}$2
    # remove everything up to and including the first instance of SEARCH from "$in"
    in=${in#*"$1"}
  done

  # append whatever is left in "$in" after the last instance of SEARCH to out, and print
  printf '%s%s\n' "$out" "$in"
}

var=$(string_rep "$search" "$rep" "$var")

# Note: POSIX does not have a way to localize variables. Most shells (even dash and 
# busybox), however, do. Feel free to localize the variables if your shell supports
# it. Even if it does not, if you call the function with var=$(string_rep ...), the
# function will be run in a subshell and any assignments it makes will not persist.

In the bash example, the quotes around "$search" prevent the contents of the variable to be treated as a shell pattern (also called a glob). Of course, if pattern matching is intended, do not include the quotes. If "$rep" were quoted, however, the quotes would be treated as literal.

Parameter expansions like this are discussed in more detail in Faq #100.

Streams

If it's a stream, then use the stream editor:

some_command | sed 's/foo/bar/g'

sed uses regular expressions. In our example, foo and bar are literal strings. If they were variables (e.g. user input), they would have to be rigorously escaped in order to prevent errors. This is very impractical, and attempting to do so will make your code extremely prone to bugs. Embedding shell variables in sed commands is never a good idea.

You could also do it in Bash itself, by combining a parameter expansion with Faq #1:

search=foo; rep=bar

while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < <(some_command)

some_command | while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done

If you want to do more processing than just a simple search/replace, this may be the best option. Note that the last example runs the loop in a subshell. See Faq #24 for more information on that.

You may notice, however, that the bash loop above is very slow for large data sets. So how do we find something faster, that can replace literal strings? Well, you could use AWK. The following function replaces all instances of STR with REP, reading from stdin and writing to stdout.

# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

some_command | gsub_literal "$search" "$rep"


# condensed as a one-liner:
some_command | awk -v s="${search//\\/\\\\}" -v r="${rep//\\/\\\\}" 'BEGIN {l=length(s)} {o="";while (i=index($0, s)) {o=o substr($0,1,i-1) r; $0=substr($0,i+l)} print o $0}'

CategoryShell

-  ⇤ ← Revision 2 as of 2007-08-06 13:44:36 → 
  Size: 2946
  Editor: 74
  Comment: removed loop for sed -i
+   ← Revision 56 as of 2012-07-18 15:48:02 → ⇥
  Size: 8405
  Editor: e36freak
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-[[Anchor(faq21)]]
== How can I replace a string with another string in all files? ==
{{{sed}}} is a good command to replace strings, e.g.

{{{
    sed 's/olddomain\.com/newdomain\.com/g' input > output
}}}

To replace a string in all files of the current directory:

{{{
    for i in *; do
        sed 's/old/new/g' "$i" > atempfile && mv atempfile "$i"
    done
}}}

GNU sed 4.x (but no other version of sed) has a special {{{-i}}} flag which makes the loop and temp file unnecessary:

{{{
      sed -i 's/old/new/g' *
+#pragma section-numbers 3
<<Anchor(faq21)>>
== How can I replace a string with another string in a variable, a stream, a file, or in all the files in a directory? ==
There are a number of techniques for this. Which one to use depends on many factors, the biggest of which is ''what we're editing''.

<<TableOfContents>>

=== Files ===

Editing files is tricky. The only standard tool that actually edits a file is `ed`.  Other methods could be used, but they involve a temp file and `mv` (or nonstandard tools, or extensions to POSIX).

`ed` is the standard UNIX command-based editor.  Here are some commonly-used syntaxes for replacing the string `olddomain.com` by the string `newdomain.com` in a file named `file`.  All four commands do the same thing, with varying degrees of portability and efficiency:

{{{
# Bash
ed -s file <<< $'g/olddomain\\.com/s//newdomain.com/g\nw\nq'

# Bourne (with printf)
printf '%s\n' 'g/olddomain\.com/s//newdomain.com/g' w q | ed -s file

printf 'g/olddomain\\.com/s//newdomain.com/g\nw\nq' | ed -s file

# Bourne (without printf)
ed -s file <<!
g/olddomain\\.com/s//newdomain.com/g
w
q
!
}}}

To replace a string in all files of the current directory, just wrap one of the above in a loop:

{{{
for file in ./*; do
    [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done
}}}

To do this recursively, the easy way would be to enable globstar in bash 4 (`shopt -s globstar`, a good idea to put this in your `~/.bashrc`) and use:

{{{
for file in ./**/*; do
    [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done
}}}

If you don't have bash 4, you can use [[UsingFind|find]].  Unfortunately, it's a bit tedious to feed `ed` stdin for each file hit:

{{{
find . -type f -exec bash -c 'printf "%s\n" "g/old/s//new/g" w q | ed -s "$1"' _ {} \;
}}}

If shell variables are used as the search and/or replace strings, ed is not suitable. Nor is sed, or any tool that uses regular expressions. Consider using the awk code at the bottom of this FAQ with redirections, and mv.
{{{
gsub_literal "$search" "$rep" < "$file" > tmp && mv tmp "$file"
}}} 

==== Using nonstandard tools ====
`sed` is a '''Stream EDitor''', not a '''file''' editor.  Nevertheless, people everywhere tend to abuse it for trying to edit files.  It doesn't edit files.  GNU `sed` (and some BSD `sed`s) have a `-i` option that makes a copy and replaces the original file with the copy.  An expensive operation, but if you enjoy unportable code, I/O overhead and bad side effects (such as destroying symlinks), this would be an option:

{{{
sed -i    's/old/new/g' ./*  # GNU
sed -i '' 's/old/new/g' ./*  # FreeBSD
-Line 26:
+Line 69:
-    perl -pi -e 's/old/new/g' *
}}}

Recursively:

{{{
    find . -type f -print0 | xargs -0 perl -pi -e 's/old/new/g'
+perl -pi -e 's/old/new/g' ./*
}}}

Recursively using `find`:

{{{
find . -type f -exec perl -pi -e 's/old/new/g' {} \;   # if your find doesn't have + yet
find . -type f -exec perl -pi -e 's/old/new/g' {} +    # if it does
}}}

If you want to delete lines instead of making substitutions:

{{{
# Deletes any line containing the perl regex foo
perl -ni -e 'print unless /foo/' ./*
-Line 38:
+Line 89:
-    perl -i.bak -pne 's/\bunsigned\b(?!\s+(int|short|long|char))/unsigned long/g' $(find . -type f)
}}}


Finally, here's a script that some people may find useful:

{{{
    :
    # chtext - change text in several files

    # neither string may contain '|' unquoted
    old='olddomain\.com'
    new='newdomain\.com'

    # if no files were specified on the command line, use all files:
    [ $# -lt 1 ] && set -- *

    for file
    do
        [ -f "$file" ] || continue # do not process e.g. directories
        [ -r "$file" ] || continue # cannot read file - ignore it
        # Replace string, write output to temporary file. Terminate script in case of errors
        sed "s|$old|$new|g" "$file" > "$file"-new || exit
        # If the file has changed, overwrite original file. Otherwise remove copy
        if cmp "$file" "$file"-new >/dev/null 2>&1
        then rm "$file"-new              # file has not changed
        else mv "$file"-new "$file"      # file has changed: overwrite original file
        fi
    done
}}}

If the code above is put into a script file (e.g. {{{chtext}}}), the resulting script can be used to change a text e.g. in all HTML files of the current and all subdirectories:

{{{
    find . -type f -name '*.html' -exec chtext {} \;
}}}

Many optimizations are possible:
 * use another {{{sed}}} separator character than '|', e.g. ^A (ASCII 1)
 * some implementations of {{{sed}}} (e.g. GNU sed) have an "-i" option that can change a file in-place; no temporary file is necessary in that case
 * the {{{find}}} command above could use either {{{xargs}}} or the built-in {{{xargs}}} of POSIX find

Note: {{{set -- *}}} in the code above is safe with respect to files whose names contain spaces.  The expansion of * by {{{set}}} is the same as the expansion done by {{{for}}}, and filenames will be preserved properly as individual parameters, and not broken into words on whitespace.

A more sophisticated example of {{{chtext}}} is here: http://www.shelldorado.com/scripts/cmds/chtext
+find . -type f -exec perl -i.bak -pne \
    's/\bunsigned\b(?!\s+(int|short|long|char))/unsigned long/g' {} \;
}}}

All of the examples above use regular expressions, which means they have the same issue as the sed code earlier; trying to embed shell variables in them is a terrible idea, and treating an arbitrary value as a literal string is painful at best.

Moreover, perl can be used to pass variables into both search and replace strings with no unquoting or potential for conflict with sigil characters:

{{{
in="input (/string" out="output string" perl -pi -e $'$quoted_in=quotemeta($ENV{\'in\'}); s/$quoted_in/$ENV{\'out\'}/g' ./*
}}}

Or, simpler:

{{{
in=$search out=$replace perl -pi -e 's/\Q$ENV{"in"}/\Q$env{"out"}/g' ./*
}}}

=== Variables ===

If it's a variable, this can (and should) be done very simply with Bash's parameter expansion:

{{{
var='some string'; search=some; rep=another

# Bash
var=${var//"$search"/$rep}
}}}

It's a lot harder in POSIX:

{{{
# POSIX function

# usage: string_rep SEARCH REPL STRING
# replaces all instances of SEARCH with REPL in STRING
string_rep() {
  # initialize vars
  in=$3
  unset out

  # SEARCH must not be empty
  test -n "$1" || return

  while true; do
    # break loop if SEARCH is no longer in "$in"
    case "$in" in
      *"$1"*) : ;;
      *) break;;
    esac

    # append everything in "$in", up to the first instance of SEARCH, and REP, to "$out"
    out=$out${in%%"$1"*}$2
    # remove everything up to and including the first instance of SEARCH from "$in"
    in=${in#*"$1"}
  done

  # append whatever is left in "$in" after the last instance of SEARCH to out, and print
  printf '%s%s\n' "$out" "$in"
}

var=$(string_rep "$search" "$rep" "$var")

# Note: POSIX does not have a way to localize variables. Most shells (even dash and 
# busybox), however, do. Feel free to localize the variables if your shell supports
# it. Even if it does not, if you call the function with var=$(string_rep ...), the
# function will be run in a subshell and any assignments it makes will not persist.
}}}

In the bash example, the quotes around "$search" prevent the contents of the variable to be treated as a shell pattern (also called a [[glob]]). Of course, if pattern matching is intended, do not include the quotes. If "$rep" were quoted, however, the quotes would be treated as literal.

Parameter expansions like this are discussed in more detail in [[BashFAQ/100|Faq #100]].

=== Streams ===

If it's a stream, then use the '''s'''tream '''ed'''itor:

{{{
some_command | sed 's/foo/bar/g'
}}}

`sed` uses [[RegularExpression|regular expressions]].  In our example, `foo` and `bar` are literal strings.  If they were variables (e.g. user input), they would have to be rigorously escaped in order to prevent errors. This is very impractical, and attempting to do so will make your code extremely prone to bugs. Embedding shell variables in sed commands is '''never''' a good idea.

You could also do it in Bash itself, by combining a parameter expansion with [[BashFAQ/001|Faq #1]]:
{{{
search=foo; rep=bar

while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < <(some_command)

some_command | while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done
}}}

If you want to do more processing than just a simple search/replace, this may be the best option. Note that the last example runs the loop in a subshell. See [[BashFAQ/024|Faq #24]] for more information on that.

You may notice, however, that the bash loop above is very slow for large data sets. So how do we find something faster, that can replace literal strings? Well, you could use `AWK`. The following function replaces all instances of STR with REP, reading from stdin and writing to stdout.

{{{
# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

some_command | gsub_literal "$search" "$rep"


# condensed as a one-liner:
some_command | awk -v s="${search//\\/\\\\}" -v r="${rep//\\/\\\\}" 'BEGIN {l=length(s)} {o="";while (i=index($0, s)) {o=o substr($0,1,i-1) r; $0=substr($0,i+l)} print o $0}'
}}}

----
CategoryShell