How can I replace a string with another string in a variable, a stream, a file, or in all the files in a directory?
There are a number of tools available for this. Which one to use depends on a lot of factors, the biggest of which is of course what we're editing.
Variables
If it's a variable, this can (and should) be done very simply with parameter expansion. Forking an external tool for string manipulation is extremely slow and unnecessary.
{{{
var='some string'; search=some; rep=another

# Bash
var=${var//"$search"/$rep}

# POSIX function

# usage: string_rep SEARCH REPL STRING
# replaces all instances of SEARCH with REPL in STRING
string_rep() {
  # initialize vars
  in=$3
  unset out

  # SEARCH must not be empty ([ ] rather than [[ ]], which is not POSIX)
  [ -n "$1" ] || return

  while true; do
    # break loop if SEARCH is no longer in "$in"
    case "$in" in
      *"$1"*) : ;;
      *) break;;
    esac

    # append everything in "$in", up to the first instance of SEARCH, and REPL, to "$out"
    out=$out${in%%"$1"*}$2
    # remove everything up to and including the first instance of SEARCH from "$in"
    in=${in#*"$1"}
  done

  # append whatever is left in "$in" after the last instance of SEARCH to out, and print
  printf '%s%s\n' "$out" "$in"
}

# arguments follow the usage line above: SEARCH, REPL, STRING
var=$(string_rep "$search" "$rep" "$var")

# Note: POSIX does not have a way to localize variables. Most shells (even dash
# and busybox), however, do. Feel free to localize the variables if your shell
# supports it. Even if it does not, if you call the function with
# var=$(string_rep ...), the function will be run in a subshell and any
# assignments it makes will not persist.
}}}
In the bash example, the quotes around "$search" prevent the contents of the variable from being treated as a shell pattern (also called a "glob"). Of course, if pattern matching is intended, do not include the quotes. If "$rep" were quoted, however, the quotes would be treated as literal.
Parameter expansions like this are discussed in more detail in Faq #100.
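As a small illustration of the quoting point above, the following bash sketch compares a quoted and an unquoted search value. The sample strings and the variable names literal and globbed are made up for this example.

```shell
# bash sketch; the sample values are illustrative only.
var='a*b and axxb'
search='a*b'

# Quoted: $search is matched literally, so only the text "a*b" is replaced.
literal=${var//"$search"/X}

# Unquoted: $search is a glob, and "a*b" matches the longest run that
# starts with an "a" and ends with a "b".
globbed=${var//$search/X}

printf 'quoted:   %s\nunquoted: %s\n' "$literal" "$globbed"
```

Here the unquoted form swallows the whole string, because the sample text starts with "a" and ends with "b".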
Streams
If it's a file or a stream, things get a little bit trickier. The standard tools available for this are sed or AWK (for streams), and ed (for files).
Of course, you could do it in bash itself, by combining the previous method with Faq #1:
{{{
search=foo; rep=bar

while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < <(some_command)

some_command | while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done
}}}
If you want to do more processing than just a simple search/replace, this may be the best option. Note that the last example runs the loop in a subshell. See Faq #24 for more information on that.
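The subshell remark matters as soon as the loop is supposed to set a variable. A minimal sketch, assuming bash with its default settings (lastpipe off); count, piped, and redirected are names invented for the demonstration:

```shell
# bash sketch; names and input are illustrative.
count=0
printf '%s\n' foo foo | while IFS= read -r line; do
  count=$((count + 1))      # changes the subshell's copy only
done
piped=$count                # the parent shell's count was never touched

count=0
while IFS= read -r line; do
  count=$((count + 1))      # runs in the current shell
done < <(printf '%s\n' foo foo)
redirected=$count

printf 'pipeline: %s, process substitution: %s\n' "$piped" "$redirected"
```

Other shells, and bash with lastpipe enabled, may run the last pipeline element in the current shell, so the first result is specific to this setup.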
Another option would, of course, be sed:
{{{
# replaces all instances of "search" with "replace" in the output of "some_command"
some_command | sed 's/search/replace/g'
}}}
sed uses regular expressions. Unlike the bash parameter expansion above, "search" and "replace" would have to be rigorously escaped in order to be treated as literal strings. This is very impractical, and attempting it will make your code extremely prone to bugs. Embedding shell variables in sed is never a good idea.
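To make the risk concrete, here is a short sketch with made-up sample values in which a harmless-looking search string changes meaning once sed reads it as a regular expression:

```shell
# bash sketch; search and input are illustrative values.
search='a.c'
input='abc a.c'

# sed reads $search as a regular expression, so "." also matches the "b":
sed_out=$(printf '%s\n' "$input" | sed "s/$search/X/g")

# bash parameter expansion with a quoted pattern matches only the literal text:
bash_out=${input//"$search"/X}

printf 'sed:  %s\nbash: %s\n' "$sed_out" "$bash_out"
```

A search value containing "/" would be worse still, since it would end the s command early and produce a syntax error or a different command altogether.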
You may notice, however, that the bash loop above is very slow for large data sets. So how do we find something faster, that can replace literal strings? Well, you could use AWK. The following function replaces all instances of STR with REP, reading from stdin and writing to stdout.
{{{
# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

some_command | gsub_literal "$search" "$rep"

# condensed as a one-liner:
some_command | awk -v s="${search//\\/\\\\}" -v r="${rep//\\/\\\\}" 'BEGIN {l=length(s)} {o="";while (i=index($0, s)) {o=o substr($0,1,i-1) r; $0=substr($0,i+l)} print o $0}'
}}}
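The backslash doubling in the awk -v assignments can look like noise, so here is a small sketch of what it guards against. The sample value and the names raw and escaped are invented for this example:

```shell
# bash sketch; rep holds an illustrative value containing a backslash.
rep='C:\new'

# Without doubling, awk processes escape sequences in -v assignments,
# so the "\n" inside the value turns into a newline:
raw=$(awk -v r="$rep" 'BEGIN { print r }')

# With every backslash doubled, awk turns "\\" back into a single "\",
# and the value arrives unchanged:
escaped=$(awk -v r="${rep//\\/\\\\}" 'BEGIN { print r }')

printf 'raw:     %s\nescaped: %s\n' "$raw" "$escaped"
```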
Files
Actually editing files gets even trickier. The only tool listed that actually edits a file is ed. The other methods can be used, but doing so involves a temp file and mv (or non-POSIX extensions such as sed -i).
ed is the standard UNIX command-based editor. Here are some commonly-used syntaxes for replacing the string olddomain.com by the string newdomain.com in a file named file. All four commands do the same thing, with varying degrees of portability and efficiency:
{{{
# Bash
ed -s file <<< $'g/olddomain\\.com/s//newdomain.com/g\nw\nq'

# Bourne (with printf)
printf '%s\n' 'g/olddomain\.com/s//newdomain.com/g' w q | ed -s file

printf 'g/olddomain\\.com/s//newdomain.com/g\nw\nq' | ed -s file

# Bourne (without printf)
ed -s file <<!
g/olddomain\\.com/s//newdomain.com/g
w
q
!
}}}
To replace a string in all files of the current directory:
{{{
for file in ./*; do
  [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done
}}}
To do this recursively, the easy way is to enable globstar in bash 4 or later (shopt -s globstar; putting this in your ~/.bashrc is a good idea) and use:
{{{
for file in ./**/*; do
  [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done
}}}
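If globstar is not enabled, the loop above still runs but silently covers less than intended, because ** then behaves like a plain *. A quick sketch in a scratch directory (the layout and the array names plain and recursive are made up for the demonstration):

```shell
# bash 4+ sketch; the directory layout is illustrative.
dir=$(mktemp -d) || exit
mkdir -p "$dir/a/b"
touch "$dir/top" "$dir/a/mid" "$dir/a/b/deep"

shopt -u globstar
plain=("$dir"/**/*)       # ** acts like *: only entries exactly two levels down

shopt -s globstar
recursive=("$dir"/**/*)   # ** also matches zero or more directory levels

printf 'without globstar: %d entries, with globstar: %d entries\n' \
  "${#plain[@]}" "${#recursive[@]}"
```

Note that the recursive match includes directories as well as files, which is why the loop keeps its [[ -f $file ]] test.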
If you don't have bash 4, you can use find. Unfortunately, it's a bit tedious to feed ed stdin for each file hit:
{{{
find . -type f -exec bash -c 'printf "%s\n" "g/old/s//new/g" w q | ed -s "$1"' _ {} \;
}}}
sed is a Stream EDitor, not a file editor. Nevertheless, people everywhere tend to abuse it to edit files. It doesn't edit files. GNU sed (and some BSD seds) have a -i option that makes a copy and replaces the original file with the copy. This is an expensive operation, but if you enjoy unportable code, I/O overhead and bad side effects (such as destroying symlinks), it is an option:
{{{
sed -i    's/old/new/g' ./*  # GNU
sed -i '' 's/old/new/g' ./*  # BSD

# POSIX sed, uses a temp file and mv:
# remove all temp files on exit, in case sed fails and they weren't moved
trap 'rm -f "${temps[@]}"' EXIT
temps=()
for file in ./*; do
  if [[ -f $file ]]; then
    tmp=$(mktemp) || exit
    temps+=("$tmp")
    sed 's/old/new/g' "$file" > "$tmp" && mv "$tmp" "$file"
  fi
done
}}}
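The symlink problem comes from replacing the file with a new one, which mv does too. If keeping symlinks (and the original inode and permissions) matters more than atomicity, one possible variation is to copy the result back into the existing file instead of moving the temp file over it. This is only a sketch: edit_keep_link is a hypothetical helper name, and the scratch directory exists purely for the demonstration.

```shell
# POSIX sh sketch. edit_keep_link is a hypothetical helper, not part of the FAQ.
# usage: edit_keep_link FILE SED_SCRIPT
edit_keep_link() {
  _tmp=$(mktemp) || return
  # Run sed into a temp file, then copy the result back over the existing file.
  # Redirecting into "$1" follows a symlink and keeps the target's inode and mode.
  sed "$2" "$1" > "$_tmp" && cat "$_tmp" > "$1"
  _rc=$?
  rm -f "$_tmp"
  return "$_rc"
}

# Demonstration in a scratch directory:
dir=$(mktemp -d) || exit
printf 'old text\n' > "$dir/target"
ln -s target "$dir/link"
edit_keep_link "$dir/link" 's/old/new/g'
ls -l "$dir"    # link is still a symlink; target holds the edited text
```

The trade-off is that the write-back is not atomic: if it is interrupted, the file can be left truncated, which the temp file and mv approach avoids. Like the loop above, it applies the same sed substitution; only the write-back step differs.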
Those of you who have perl 5 can make the same substitution with this code:
{{{
perl -pi -e 's/old/new/g' ./*
}}}
Recursively using find:
{{{
find . -type f -exec perl -pi -e 's/old/new/g' {} \;   # if your find doesn't have + yet
find . -type f -exec perl -pi -e 's/old/new/g' {} +    # if it does
}}}
If you want to delete lines instead of making substitutions:
{{{
# Deletes any line containing the perl regex foo
perl -ni -e 'print unless /foo/' ./*
}}}
For example, to replace every "unsigned" with "unsigned long", unless it is already followed by "int", "short", "long", or "char":
{{{
find . -type f -exec perl -i.bak -pne \
    's/\bunsigned\b(?!\s+(int|short|long|char))/unsigned long/g' {} \;
}}}
All of the tools listed above use regular expressions, which means they have the same issue as the sed code earlier; trying to embed shell variables in them is a terrible idea, and treating an arbitrary value as a literal string is painful at best. This brings us back to our while read loop, or the awk function above.
The while read loop:
{{{
# overwrite a single file
tmp=$(mktemp) || exit
trap 'rm -f "$tmp"' EXIT
while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < "$file" > "$tmp" && mv "$tmp" "$file"
}}}
To replace the string in all files in a directory:
{{{
trap 'rm -f "${temps[@]}"' EXIT
temps=()
for f in ./*; do
  if [[ -f $f ]]; then
    tmp=$(mktemp) || exit
    temps+=("$tmp")
    while IFS= read -r line; do
      printf '%s\n' "${line//"$search"/$rep}"
    done < "$f" > "$tmp" && mv "$tmp" "$f"
  fi
done
}}}
The above glob could be changed to './**/*' in order to use globstar (mentioned above) to be recursive, or of course we could use find:
{{{
# this example uses GNU find's -print0. Using POSIX find -exec is left as an exercise to the reader
trap 'rm -f "${temps[@]}"' EXIT
temps=()
while IFS= read -rd '' f <&3; do
  tmp=$(mktemp) || exit
  temps+=("$tmp")
  while IFS= read -r line; do
    printf '%s\n' "${line//"$search"/$rep}"
  done < "$f" > "$tmp" && mv "$tmp" "$f"
done 3< <(find . -type f -print0)
}}}
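One possible take on that exercise, as a sketch rather than a definitive answer: let find -exec ... {} + hand batches of file names to an inner bash, and pass the search and replacement values through the environment so they never need to be embedded in the inline script. The function name replace_tree and its argument order are invented for this example.

```shell
# bash sketch. replace_tree is a hypothetical name; usage: replace_tree DIR SEARCH REP
replace_tree() {
  # SEARCH must not be empty
  [[ $2 ]] || return

  # search and rep are exported only for this find call and inherited by the inner bash
  search=$2 rep=$3 find "$1" -type f -exec bash -c '
    for f do
      tmp=$(mktemp) || exit
      while IFS= read -r line; do
        printf "%s\n" "${line//"$search"/$rep}"
      done < "$f" > "$tmp" && mv "$tmp" "$f" || rm -f "$tmp"
    done
  ' _ {} +
}

# Demonstration in a scratch directory:
d=$(mktemp -d) || exit
mkdir "$d/sub"
printf 'a.b axb\n' > "$d/sub/f"
replace_tree "$d" 'a.b' 'X'
cat "$d/sub/f"
```

Only the literal "a.b" is replaced; "axb" is left alone because the quoted pattern is not treated as a glob. As with the loops above, the rewritten file takes on the temp file's permissions.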
And of course, we can adapt the 'AWK' function above. The following function replaces all instances of STR with REP in FILE, actually overwriting FILE:
{{{
# usage: gsub_literal_f STR REP FILE
# replaces all instances of STR with REP in FILE
gsub_literal_f() {
  local tmp

  # make sure FILE exists, is a regular file, and is readable and writable
  if ! [[ -f $3 && -r $3 && -w $3 ]]; then
    printf '%s does not exist or is not readable or writable\n' "$3" >&2
    return 1
  fi

  # STR cannot be empty
  [[ $1 ]] || return

  tmp=$(mktemp) || return
  trap 'rm -f "$tmp"' RETURN

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  ' "$3" > "$tmp" && mv "$tmp" "$3"
}
}}}
This function, of course, could be called on all of the files in a dir, or recursively.
Notes:
For more information on sed or awk, you can visit the ##sed and #awk channels on freenode, respectively.
mktemp(1), used in many of the examples above, is not completely portable. While it will work on most systems, more information on safely creating temp files can be found in Faq #62.