Differences between revisions 15 and 25 (spanning 10 versions)
Revision 15 as of 2012-04-12 18:30:27
Size: 5513
Editor: e36freak
Comment:
Revision 25 as of 2021-09-30 00:41:01
Size: 5264
Editor: emanuele6
Comment: since ${///} is bash specific, let's use local
Line 2: Line 2:
== How can I perform a substitution with arbitrary values ("s/$foo/$bar/") safely, without treating either value as a regular expression or worrying about other special characters? ==
Sed is not the right tool for this. Nor is ed. At best, attempting either will result in an escaping nightmare, and will be extremely prone to bugs.
== How do I copy a file to a remote system, and specify a remote name which may contain spaces? ==
All of the common tools for copying files to a remote system (ssh, scp, rsync) send the filename as part of a shell command, which the remote system interprets. This makes the issue extremely complex, because the remote shell will often mangle the filename. There are at least three ways to deal with the problem: NFS, careful encoding of the filename, or submission of the filename as part of the data stream.
Line 5: Line 5:
First, what are we performing the substitution on? If it's a string, it can be done very simply with a parameter expansion.
First let's look at what '''does not''' work:
{{{
# Will not work
scp "my file" remote:"your file"
}}}

scp is basically a thin wrapper on top of ssh, which works by instructing the remote system's shell to open a file for writing. Since the filename is passed to the remote shell in the most naive way imaginable, the remote shell sees the space as an argument separator, and ends up creating a file named ''your''.

Similar problems plague most of the "obvious" (but wrong) attempts to address the problem with other tools:
{{{
# Will not work
ssh remote cat \> "your file" < "my file"
}}}
Line 8: Line 20:
var='some string'
search=some
rep=another
printf '%s\n' "${var//"$search"/$rep}"
# Will not work
rsync "my file" remote:"your file"
Line 14: Line 24:
This is discussed in more detail in [[BashFAQ/100|Faq #100]].
So, what works?
Line 16: Line 26:
If it's a file or stream, things get a bit trickier. One way to accomplish this would be to combine the previous method with [[BashFAQ/001|Faq #1]].
=== NFS ===
If you mount the remote host's file system onto your local host with NFS (or any other competent network file system sharing technology, including sshfs, or possibly even smbfs) then you can just perform a direct copy:
Line 19: Line 29:
search=foo
rep=bar

# file
while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < "$file"

# command output
while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < <(my_command)

my_command | while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done
}}}
Note that the last example creates a subshell. See [[BashFAQ/024|Faq #24]] for more information on that.

Both of the above examples print to stdout; neither actually edits the file in place. Of course this could be resolved with something like:
{{{
# create a temp file, die on failure
tmp=$(mktemp) || exit

while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < "$file" > "$tmp" && mv "$tmp" "$file"
cp "my file" /remote/"your file"
Line 48: Line 32:
=== Carefully encoding the remote name ===
Now, obviously if you ''know'' the remote name at the time you're writing the command, you can encode it in a way that you know the remote shell will be able to decipher. Usually this means adding one extra layer of quotes. For example, this works:
{{{
scp "my file" remote:"'your file'"
}}}
Line 49: Line 38:
On large data sets, you'll notice that this is quite slow. The following functions use awk, and are quite a bit faster:
But in the general case, we ''won't'' know the exact remote filename at the time we're writing a script. It will be given to our script as an argument, or an environment variable, etc. In that case, we have to be clever enough to encode ''any'' possible filename.
Line 51: Line 40:
The problem is further complicated by the fact that we don't necessarily know which shell the remote user is using. Just because you're using bash on your client workstation, that doesn't mean the remote system's sshd is going to spawn bash to parse your command. (And remember, scp sends a shell command over ssh, which ''some unknown remote shell'' is going to have to parse.) So, any solution we use must be as shell-agnostic as possible. That rules out bash's `printf %q` for example.
Line 52: Line 42:
The first two here are similar to sed 's/STR/REP/': they only replace the first instance on each line.
The first function operates on stdin and writes to stdout; the second overwrites FILE.
Given these constraints, the only remaining approach is to wrap single quotes around the entire filename. This means we also have to modify any existing single quotes that are already in the filename. So, our encoding goes like this:
Line 55: Line 44:
# usage: sub_literal STR REP
sub_literal() {
  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }
q=\'
dest="'${dest//$q/$q\\$q$q}'"
}}}
Line 64: Line 48:
    # if the search string is in the line
    (i = index($0, str)) {
      # replace the first occurrence with rep
      $0 = substr($0, 1, i-1) rep substr($0, i + len);
    }
This gives us a modified `dest` which has literal single quotes at the start and end, and which has replaced all internal `'` characters with `'\''`. When this is passed to a remote shell for parsing, the result is our original filename.
Line 70: Line 50:
    # print each line
    1
  '
}

# usage: sub_literal_f STR REP FILE
sub_literal_f() {
  local tmp
  if ! [[ -f $3 && -r $3 && -w $3 ]]; then
    printf '%s does not exist or is not readable or writable\n' "$3" >&2
    return 1
  fi

  trap 'rm -f "$tmp"' RETURN
  tmp=$(mktemp) || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    # if the search string is in the line
    (i = index($0, str)) {
      # replace the first occurance with rep
      $0 = substr($0, 1, i-1) rep substr($0, i + len);
    }

    # print each line
    1
  ' "$3" > "$tmp" && mv "$tmp" "$3"
So, a full copy function would look something like this:
{{{
# copyto <sourcefile> <remotehost> <remotefile>
copyto() {
    local q dest
    q=\'
    dest="'${3//$q/$q\\$q$q}'"
    scp "$1" "$2":"$dest"
Line 105: Line 61:
The next two functions are similar to 's/STR/REP/g', replacing every instance. Just like above, the first reads stdin and writes to stdout, the second actually edits FILE.
=== Sending the filename in the data stream ===
This approach is a bit less portable, because it requires that bash be installed on the remote host (though not necessarily as the remote user's login shell). It is a more generalized solution, because in theory ''any'' kind of data can be passed in the stream, as long as you can write a parser for it (but remember, you have to ''send the parser'' to the remote system for execution, so it needs to be simple).
Line 107: Line 64:
In this example, we are going to send a data stream which has two things in it: a filename, and the file's contents. They will be separated by a NUL byte. We use bash to parse this stream on the remote system, because it is one of the very few shells that can parse NUL-delimited data streams.
Line 108: Line 66:
# usage: gsub_literal STR REP
gsub_literal() {
  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

# usage: gsub_literal_f STR REP FILE
gsub_literal_f() {
  local tmp
  if ! [[ -f $3 && -r $3 && -w $3 ]]; then
    printf '%s does not exist or is not readable or writable\n' "$3" >&2
    return 1
  fi

  trap 'rm -f "$tmp"' RETURN
  tmp=$(mktemp) || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  ' "$3" > "$tmp" && mv "$tmp" "$3"
# copyto <sourcefile> <remotehost> <remotefile>
copyto() {
    { printf '%s\0' "$3"; cat < "$1"; } |
    ssh "$2" bash -c \''read -rd ""; cat > "$REPLY"'\'
Line 179: Line 72:
For more information on how these work, and awk in general, I recommend the #awk channel on freenode.
Line 181: Line 73:
The mktemp(1) command used in some of the examples above is not completely portable. While it will work on most systems, more information on safely creating temp files can be found in [[BashFAQ/062|Faq #62]].
{{{#!html
<div style="float: right; border: 1px solid; padding: 10px; margin-left: 20px">
"It is a riddle, wrapped in a mystery, inside an enigma."<br>
-- Winston Churchill
</div>
}}}
Our parser is the bash command `read -rd ""; cat > "$REPLY"`. This reads the filename (terminated by NUL) into the shell variable `REPLY`, then calls cat to read the remainder of the stream. There are two quoting layers around the parser, because we need to quote it for our local shell ''and'' for the remote shell. So, we avoid ''all'' use of single quotes in the parser, use single quotes for the local layer, and escaped single quotes for the remote layer.

This version does not use scp, so it doesn't copy the file's permissions. If you want to do that, you could pass the permissions as another object in the data stream, parse it out, and call chmod. (There is no portable way to retrieve a local file's permissions, so that's actually the hardest part.)

How do I copy a file to a remote system, and specify a remote name which may contain spaces?

All of the common tools for copying files to a remote system (ssh, scp, rsync) send the filename as part of a shell command, which the remote system interprets. This makes the issue extremely complex, because the remote shell will often mangle the filename. There are at least three ways to deal with the problem: NFS, careful encoding of the filename, or submission of the filename as part of the data stream.

First let's look at what does not work:

# Will not work
scp "my file" remote:"your file"

scp is basically a thin wrapper on top of ssh, which works by instructing the remote system's shell to open a file for writing. Since the filename is passed to the remote shell in the most naive way imaginable, the remote shell sees the space as an argument separator, and ends up creating a file named your.
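
The word splitting can be reproduced locally, without any network. In this minimal sketch, `sh -c` stands in for the remote shell and `printf` stands in for whatever command receives the filename on the remote side. Both are stand-ins chosen for illustration, not what scp actually runs, and `show_remote_args` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# show_remote_args (hypothetical helper): hand an unquoted string to a second
# shell, much as ssh hands its command line to the remote shell, and print
# each argument that shell ends up with on its own line, wrapped in <>.
show_remote_args() {
    sh -c "printf '<%s>\n' $1"
}

show_remote_args "your file"
# prints:
# <your>
# <file>
```

The second shell receives two separate words, which is why the naive commands end up operating on a file named your.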

Similar problems plague most of the "obvious" (but wrong) attempts to address the problem with other tools:

# Will not work
ssh remote cat \> "your file" < "my file"

# Will not work
rsync "my file" remote:"your file"

So, what works?

NFS

If you mount the remote host's file system onto your local host with NFS (or any other competent network file system sharing technology, including sshfs, or possibly even smbfs) then you can just perform a direct copy:

cp "my file" /remote/"your file"

Carefully encoding the remote name

Now, obviously if you know the remote name at the time you're writing the command, you can encode it in a way that you know the remote shell will be able to decipher. Usually this means adding one extra layer of quotes. For example, this works:

scp "my file" remote:"'your file'"

But in the general case, we won't know the exact remote filename at the time we're writing a script. It will be given to our script as an argument, or an environment variable, etc. In that case, we have to be clever enough to encode any possible filename.

The problem is further complicated by the fact that we don't necessarily know which shell the remote user is using. Just because you're using bash on your client workstation, that doesn't mean the remote system's sshd is going to spawn bash to parse your command. (And remember, scp sends a shell command over ssh, which some unknown remote shell is going to have to parse.) So, any solution we use must be as shell-agnostic as possible. That rules out bash's printf %q for example.

Given these constraints, the only remaining approach is to wrap single quotes around the entire filename. This means we also have to modify any existing single quotes that are already in the filename. So, our encoding goes like this:

q=\'
dest="'${dest//$q/$q\\$q$q}'"

This gives us a modified dest which has literal single quotes at the start and end, and which has replaced all internal ' characters with '\''. When this is passed to a remote shell for parsing, the result is our original filename.
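
One way to convince yourself that the encoding round-trips is to let a second shell parse the result. The sketch below assumes bash locally (for `${var//pattern/replacement}` and `local`), uses `sh -c` as a stand-in for the unknown remote shell, and wraps the encoding in a hypothetical helper named `shquote`:

```shell
#!/usr/bin/env bash
# shquote (hypothetical helper name): apply the single-quote encoding
# described above to its first argument and print the result.
shquote() {
    local q=\'
    printf '%s' "'${1//$q/$q\\$q$q}'"
}

name="it's  a *file*"
encoded=$(shquote "$name")
printf '%s\n' "$encoded"     # prints: 'it'\''s  a *file*'

# Let a second shell (standing in for the remote one) parse the encoded name.
decoded=$(sh -c "printf '%s' $encoded")
printf '%s\n' "$decoded"     # prints: it's  a *file*
```

The decoded string matches the original, including the embedded single quote, the double space, and the glob characters.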

So, a full copy function would look something like this:

# copyto <sourcefile> <remotehost> <remotefile>
copyto() {
    local q dest
    q=\'
    dest="'${3//$q/$q\\$q$q}'"
    scp "$1" "$2":"$dest"
}

Sending the filename in the data stream

This approach is a bit less portable, because it requires that bash be installed on the remote host (though not necessarily as the remote user's login shell). It is a more generalized solution, because in theory any kind of data can be passed in the stream, as long as you can write a parser for it (but remember, you have to send the parser to the remote system for execution, so it needs to be simple).

In this example, we are going to send a data stream which has two things in it: a filename, and the file's contents. They will be separated by a NUL byte. We use bash to parse this stream on the remote system, because it is one of the very few shells that can parse NUL-delimited data streams.

# copyto <sourcefile> <remotehost> <remotefile>
copyto() {
    { printf '%s\0' "$3"; cat < "$1"; } |
    ssh "$2" bash -c \''read -rd ""; cat > "$REPLY"'\'
}

"It is a riddle, wrapped in a mystery, inside an enigma."
-- Winston Churchill

Our parser is the bash command read -rd ""; cat > "$REPLY". This reads the filename (terminated by NUL) into the shell variable REPLY, then calls cat to read the remainder of the stream. There are two quoting layers around the parser, because we need to quote it for our local shell and for the remote shell. So, we avoid all use of single quotes in the parser, use single quotes for the local layer, and escaped single quotes for the remote layer.
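
The two quoting layers can be exercised without a remote host. In this sketch, `fake_ssh` is a hypothetical stand-in for ssh: like ssh, it joins its command arguments with spaces and hands the resulting string to a shell. The temporary directory and file names are made up for the demonstration, and `mktemp -d` is assumed to be available:

```shell
#!/usr/bin/env bash
# fake_ssh (hypothetical): mimic how ssh treats its command arguments.
fake_ssh() {
    shift            # drop the host name, which is ignored here
    sh -c "$*"       # join the remaining arguments with spaces, then parse
}

# Same body as copyto above, with fake_ssh substituted for ssh.
copyto_demo() {
    { printf '%s\0' "$3"; cat < "$1"; } |
    fake_ssh "$2" bash -c \''read -rd ""; cat > "$REPLY"'\'
}

workdir=$(mktemp -d) || exit
printf 'hello\n' > "$workdir/my file"

# The destination name contains spaces and single quotes.
copyto_demo "$workdir/my file" remote "$workdir/your 'odd' file"

result=$(cat "$workdir/your 'odd' file")
printf '%s\n' "$result"      # prints: hello
rm -rf "$workdir"
```

Because the filename travels in the data stream rather than on the command line, no shell ever parses it, so no encoding of the name is needed.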

This version does not use scp, so it doesn't copy the file's permissions. If you want to do that, you could pass the permissions as another object in the data stream, parse it out, and call chmod. (There is no portable way to retrieve a local file's permissions, so that's actually the hardest part.)
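
A rough sketch of that idea follows. It assumes GNU `stat` (`stat -c %a`), which is exactly the non-portable part mentioned above, and it sends the mode as an extra NUL-terminated field ahead of the filename. `copyto_mode` and `fake_ssh` are hypothetical names; `fake_ssh` only stands in for ssh so the sketch can run locally, and would be replaced by `ssh` in real use:

```shell
#!/usr/bin/env bash
# fake_ssh (hypothetical): join the command arguments with spaces and let a
# shell parse them, as ssh would do on the remote side.
fake_ssh() { shift; sh -c "$*"; }

# copyto_mode <sourcefile> <remotehost> <remotefile>
copyto_mode() {
    local mode
    # Assumption: GNU stat. Other systems need a different command here.
    mode=$(stat -c %a -- "$1") || return
    # Stream layout: mode, NUL, filename, NUL, file contents.
    { printf '%s\0%s\0' "$mode" "$3"; cat < "$1"; } |
    fake_ssh "$2" bash -c \''IFS= read -rd "" mode; IFS= read -rd "" name; cat > "$name" && chmod "$mode" "$name"'\'
}

workdir=$(mktemp -d) || exit
printf 'data\n' > "$workdir/my file"
chmod 640 "$workdir/my file"

copyto_mode "$workdir/my file" remote "$workdir/your file"

copied_mode=$(stat -c %a -- "$workdir/your file")
copied_data=$(cat "$workdir/your file")
rm -rf "$workdir"
```

Unlike the single-field parser, this one names its variables, so `IFS=` is needed to stop `read` from trimming leading or trailing whitespace in the filename.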

BashFAQ/110 (last edited 2021-09-30 00:41:01 by emanuele6)