Differences between revisions 2 and 46 (spanning 44 versions)
Revision 2 as of 2007-09-15 18:50:34
Size: 1946
Editor: irc2samus
Comment:
Revision 46 as of 2021-02-08 16:03:51
Size: 26917
Editor: GreyCat
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
[[Anchor(faq71)]]
== How do I convert an ASCII character to its decimal (or hexadecimal) value and back? ==

This task is quite easy while using the {{{printf}}} builtin. You can either write two simple functions as shown below or use the plain {{{printf}}} constructions alone.

{{{
   # chr() - converts decimal value to its ASCII character representation
   # ord() - converts ASCII character to its decimal value
 
   chr() {
     printf \\$(printf '%03o' $1)
   }
 
   ord() {
     printf '%d' "'$1"
   }

   hex() {
      printf '%x' "'$1"
   }

   # examples:
 
   chr $(ord A) # -> A
   ord $(chr 65) # -> 65
}}}

The {{{ord}}} function above is quite tricky. It can be re-written in several other ways (use that one that will best suite your coding style or your actual needs).

 ''Q: Tricky? Rather, it's using a feature that I can't find documented anywhere -- putting a single quote in front of an integer. Neat effect, but how on '''earth''' did you find out about it? Source diving? -- GreyCat''

 ''A: It validates The Single Unix Specification: "If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote." (see [http://www.opengroup.org/onlinepubs/009695399/utilities/printf.html printf()] to know more) -- mjf''

{{{
   ord() {
     printf '%d' \"$1\"
   }
}}}

Or:

{{{
   ord() {
     printf '%d' \'$1\'
   }
}}}

Or, rather:

{{{
   ord() {
     printf '%d' "'$1'"
   }
}}}

Etc. All of the above {{{ord}}} functions should work properly. Which one you choose highly depends on particular situation.



There is also an alternative when printing characters by their ascii value that is using escape sequences like:
{{{
   echo $'\x27'
}}}
which prints a literal ' (there 27 is the hexadecimal ascii value of the character).
<<Anchor(faq71)>>
== How do I convert an ASCII character to its decimal (or hexadecimal) value and back? How do I do URL encoding or URL decoding? ==
If you have a known octal or hexadecimal value (at script-writing time), you can just use `printf`:

{{{#!highlight bash
# POSIX
printf '\047\n'

# bash/ksh/zsh and a few other printf implementations also support:
printf '\x27\n'
}}}

In [[locale|locales]] where the character encoding is a superset of ASCII, this prints the literal ' character (47 is the octal ASCII value of the apostrophe character) and a newline. The hexadecimal version can also be used with a few `printf` implementations including the bash builtin, but is not standard/POSIX.

`printf` in bash 4.2 and higher, and in ksh93, supports Unicode code points as well:

{{{#!highlight bash
# bash 4.2, ksh93
printf '\u0027\n'
}}}

Another approach: bash's `$'...'` [[Quotes|quoting]] can be used to expand to the desired characters, either in a variable assignment, or directly as a command argument:

{{{#!highlight bash
ExpandedString=$'\x27\047\u0027\U00000027\n'
printf %s\\n "$ExpandedString"

# or, more simply
printf %s\\n $'\x27\047\u0027\U00000027\n'
}}}

If you need to convert characters (or numeric ASCII values) that are not known in advance (i.e., in variables), you can use something a little more complicated. '''Note''': These functions ''only'' work for single-byte character encodings.

{{{#!highlight bash
# POSIX
# chr() - converts decimal value to its ASCII character representation
# ord() - converts ASCII character to its decimal value

chr() {
  [ "$1" -lt 256 ] || return 1
  printf "\\$(printf %o "$1")"
}
}}}

Even better to avoid using a subshell is to pass the value inside a variable instead of the command output.
faster as it avoids the subshell
{{{#!highlight bash
chr () {
  local val
  [ "$1" -lt 256 ] || return 1
  printf -v val %o "$1"; printf "\\$val"
  # That one requires bash 3.1 or above.
}
}}}

{{{#!highlight bash
ord() {
  # POSIX
  LC_CTYPE=C printf %d "'$1"
}

# hex() - converts ASCII character to a hexadecimal value
# unhex() - converts a hexadecimal value to an ASCII character

hex() {
   LC_CTYPE=C printf %x "'$1"
}

unhex() {
   printf "\\x$1"
}

# examples:

chr "$(ord A)" # -> A
ord "$(chr 65)" # -> 65
}}}

The {{{ord}}} function above is quite tricky.

 . ''Tricky? Rather, it's using a feature that I can't find documented anywhere -- putting a single quote in front of a character. Neat effect, but how on '''earth''' did you find out about it? Source diving? -- GreyCat''
  . ''It validates The Single Unix Specification: "If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote." (see [[http://www.opengroup.org/onlinepubs/009695399/utilities/printf.html|printf()]] to know more) -- mjf''

=== URL encoding and URL decoding ===
Note that [[https://tools.ietf.org/html/rfc3986|URL encoding]] is defined only at the ''byte'' (octet) level. A URL-encoding of a multibyte (e.g. UTF-8) character is done by simply encoding each byte individually, then concatenating everything.

Also note that the urldecode function below performs no error checking; getting it to yield a sensible error message when you feed it malformed input is left as an exercise for the reader.

{{{#!highlight bash
urlencode() {
    # urlencode <string>
    local LC_ALL=C c i n
    for (( i = 0, n = ${#1}; i < n; i++ )); do
        c=${1:i:1}
        case $c in
            [[:alnum:].~_-]) printf %s "$c" ;;
            *) printf %%%02X "'$c" ;;
        esac
    done
}
}}}

{{{#!highlight bash
urldecode() {
    # urldecode <string>
    local s
    s=${1//\\/\\\\}
    s=${s//+/ }
    printf %b "${s//'%'/\\x}"
}
}}}

{{{#!highlight bash
# Alternative urlencode, prints all at once (requires bash 3.1)
urlencode() {
    # urlencode <string>
    local LC_ALL=C c i n=${#1}
    local out= tmp
    for (( i=0; i < n; i++ )); do
        c=${1:i:1}
        case $c in
            [[:alnum:].~_-]) printf -v tmp %s "$c" ;;
            *) printf -v tmp %%%02X "'$c" ;;
        esac
        out+=$tmp
    done
    printf %s "$out"
}
}}}

=== More complete examples (with UTF-8 support) ===

The command-line utility `nkf` can decode URLs:
{{{#!highlight bash
echo 'https://ja.wikipedia.org/wiki/%E9%87%8E%E8%89%AF%E7%8C%AB' | nkf --url-input
}}}

==== Note about Ext Ascii and UTF-8 encoding ====
 ''The following example was never peer-reviewed. Everyone is terrified of it. Proceed at your own risk.''

  * for values 0x00 - 0x7f Identical
  * for values 0x80 - 0xff conflict between UTF-8 & ExtAscii
  * for values 0x100 - 0xffff Only UTF-8 UTF-16 UTF-32
  * for values 0x100 - 0x7FFFFFFF Only UTF-8 UTF-32

  || value || EAscii || UTF-8 || UTF-16 || UTF-32 ||
  || 0x20 || "\x20" || "\x20" || \u0020 || \U00000020 ||
  || 0x20 || "\x7f" || "\x7f" || \u007f || \U0000007f ||
  || 0x80 || "\x80" || "\xc2\x80" || \u0080 || \U00000080 ||
  || 0xff || "\xff" || "\xc3\xbf" || \u00ff || \U000000ff ||
  || 0x100 || N/A || "\xc4\x80" || \u0100 || \U00000100 ||
  || 0x1000 || N/A || "\xc8\x80" || \u1000 || \U00001000 ||
  || 0xffff || N/A || "\xef\xbf\xbf" || \uffff || \U0000ffff ||
  || 0x10000 || N/A || "\xf0\x90\x80\x80" || \ud800\udc00 || \U00010000 ||
  || 0xfffff || N/A || "\xf3\xbf\xbf\xbf" || \udbbf\udfff || \U000fffff ||
  || 0x10000000 || N/A || "\xfc\x90\x80\x80\x80\x80" || N/A || \U10000000 ||
  || 0x7fffffff || N/A || "\xfd\xbf\xbf\xbf\xbf\xbf" || N/A || \U7fffffff ||
  || 0x80000000 || N/A || N/A || N/A || N/A ||
  || 0xffffffff || N/A || N/A || N/A || N/A ||

{{{#!highlight bash
  ###########################################################################
  ## ord family
  ###########################################################################
  # ord <Return Variable Name> <Char to convert> [Optional Format String]
  # ord_hex <Return Variable Name> <Char to convert>
  # ord_oct <Return Variable Name> <Char to convert>
  # ord_utf8 <Return Variable Name> <Char to convert> [Optional Format String]
  # ord_eascii <Return Variable Name> <Char to convert> [Optional Format String]
  # ord_echo <Char to convert> [Optional Format String]
  # ord_hex_echo <Char to convert>
  # ord_oct_echo <Char to convert>
  # ord_utf8_echo <Char to convert> [Optional Format String]
  # ord_eascii_echo <Char to convert> [Optional Format String]
  #
  # Description:
  # converts character using native encoding to its decimal value and stores
  # it in the Variable specified
  #
  # ord
  # ord_hex output in hex
  # ord_hex output in octal
  # ord_utf8 forces UTF8 decoding
  # ord_eascii forces eascii decoding
  # ord_echo prints to stdout
  function ord {
          printf -v "${1?Missing Dest Variable}" "${3:-%d}" "'${2?Missing Char}"
  }
  function ord_oct {
          ord "${@:1:2}" "0%c"
  }
  function ord_hex {
          ord "${@:1:2}" "0x%x"
  }
  function ord_utf8 {
          LC_CTYPE=C.UTF-8 ord "${@}"
  }
  function ord_eascii {
          LC_CTYPE=C ord "${@}"
  }
  function ord_echo {
          printf "${2:-%d}" "'${1?Missing Char}"
  }
  function ord_oct_echo {
          ord_echo "${1}" "0%o"
  }
  function ord_hex_echo {
          ord_echo "${1}" "0x%x"
  }
  function ord_utf8_echo {
          LC_CTYPE=C.UTF-8 ord_echo "${@}"
  }
  function ord_eascii_echo {
          LC_CTYPE=C ord_echo "${@}"
  }

  ###########################################################################
  ## chr family
  ###########################################################################
  # chr_utf8 <Return Variale Name> <Integer to convert>
  # chr_eascii <Return Variale Name> <Integer to convert>
  # chr <Return Variale Name> <Integer to convert>
  # chr_oct <Return Variale Name> <Octal number to convert>
  # chr_hex <Return Variale Name> <Hex number to convert>
  # chr_utf8_echo <Integer to convert>
  # chr_eascii_echo <Integer to convert>
  # chr_echo <Integer to convert>
  # chr_oct_echo <Octal number to convert>
  # chr_hex_echo <Hex number to convert>
  #
  # Description:
  # converts decimal value to character representation an stores
  # it in the Variable specified
  #
  # chr Tries to guess output format
  # chr_utf8 forces UTF8 encoding
  # chr_eascii forces eascii encoding
  # chr_echo prints to stdout
  #
  function chr_utf8_m {
    local val
    #
    # bash only supports \u \U since 4.2
    #

    # here is an example how to encode
    # manually
    # this will work since Bash 3.1 as it uses -v.
    #
    if [[ ${2:?Missing Ordinal Value} -le 0x7f ]]; then
      printf -v val "\\%03o" "${2}"
    elif [[ ${2} -le 0x7ff ]]; then
      printf -v val "\\%03o" \
        $(( (${2}>> 6) |0xc0 )) \
        $(( ( ${2} &0x3f)|0x80 ))
    elif [[ ${2} -le 0xffff ]]; then
      printf -v val "\\%03o" \
        $(( ( ${2}>>12) |0xe0 )) \
        $(( ((${2}>> 6)&0x3f)|0x80 )) \
        $(( ( ${2} &0x3f)|0x80 ))
    elif [[ ${2} -le 0x1fffff ]]; then
      printf -v val "\\%03o" \
        $(( ( ${2}>>18) |0xf0 )) \
        $(( ((${2}>>12)&0x3f)|0x80 )) \
        $(( ((${2}>> 6)&0x3f)|0x80 )) \
        $(( ( ${2} &0x3f)|0x80 ))
    elif [[ ${2} -le 0x3ffffff ]]; then
      printf -v val "\\%03o" \
        $(( ( ${2}>>24) |0xf8 )) \
        $(( ((${2}>>18)&0x3f)|0x80 )) \
        $(( ((${2}>>12)&0x3f)|0x80 )) \
        $(( ((${2}>> 6)&0x3f)|0x80 )) \
        $(( ( ${2} &0x3f)|0x80 ))
    elif [[ ${2} -le 0x7fffffff ]]; then
      printf -v val "\\%03o" \
        $(( ( ${2}>>30) |0xfc )) \
        $(( ((${2}>>24)&0x3f)|0x80 )) \
        $(( ((${2}>>18)&0x3f)|0x80 )) \
        $(( ((${2}>>12)&0x3f)|0x80 )) \
        $(( ((${2}>> 6)&0x3f)|0x80 )) \
        $(( ( ${2} &0x3f)|0x80 ))
    else
      printf -v "${1:?Missing Dest Variable}" ""
      return 1
    fi
    printf -v "${1:?Missing Dest Variable}" "${val}"
  }
  function chr_utf8 {
          local val
          [[ ${2?Missing Ordinal Value} -lt 0x80000000 ]] || return 1

          if [[ ${2} -lt 0x100 && ${2} -ge 0x80 ]]; then

                  # bash 4.2 incorrectly encodes
                  # \U000000ff as \xff so encode manually
                  printf -v val "\\%03o\%03o" $(( (${2}>>6)|0xc0 )) $(( (${2}&0x3f)|0x80 ))
          else
                  printf -v val '\\U%08x' "${2}"
          fi
          printf -v ${1?Missing Dest Variable} ${val}
  }
  function chr_eascii {
          local val
          # Make sure value less than 0x100
          # otherwise we end up with
          # \xVVNNNNN
          # where \xVV = char && NNNNN is a number string
          # so chr "0x44321" => "D321"
          [[ ${2?Missing Ordinal Value} -lt 0x100 ]] || return 1
          printf -v val '\\x%02x' "${2}"
          printf -v ${1?Missing Dest Variable} ${val}
  }
  function chr {
          if [ "${LC_CTYPE:-${LC_ALL:-}}" = "C" ]; then
                  chr_eascii "${@}"
          else
                  chr_utf8 "${@}"
          fi
  }
  function chr_dec {
          # strip leading 0s otherwise
          # interpreted as Octal
          chr "${1}" "${2#${2%%[!0]*}}"
  }
  function chr_oct {
          chr "${1}" "0${2}"
  }
  function chr_hex {
          chr "${1}" "0x${2#0x}"
  }
  function chr_utf8_echo {
          local val
          [[ ${1?Missing Ordinal Value} -lt 0x80000000 ]] || return 1

          if [[ ${1} -lt 0x100 && ${1} -ge 0x80 ]]; then

                  # bash 4.2 incorrectly encodes
                  # \U000000ff as \xff so encode manually
                  printf -v val '\\%03o\\%03o' $(( (${1}>>6)|0xc0 )) $(( (${1}&0x3f)|0x80 ))
          else
                  printf -v val '\\U%08x' "${1}"
          fi
          printf "${val}"
  }
  function chr_eascii_echo {
          local val
          # Make sure value less than 0x100
          # otherwise we end up with
          # \xVVNNNNN
          # where \xVV = char && NNNNN is a number string
          # so chr "0x44321" => "D321"
          [[ ${1?Missing Ordinal Value} -lt 0x100 ]] || return 1
          printf -v val '\\x%x' "${1}"
          printf "${val}"
  }
  function chr_echo {
          if [ "${LC_CTYPE:-${LC_ALL:-}}" = "C" ]; then
                  chr_eascii_echo "${@}"
          else
                  chr_utf8_echo "${@}"
          fi
  }
  function chr_dec_echo {
          # strip leading 0s otherwise
          # interpreted as Octal
          chr_echo "${1#${1%%[!0]*}}"
  }
  function chr_oct_echo {
          chr_echo "0${1}"
  }
  function chr_hex_echo {
          chr_echo "0x${1#0x}"
  }

  #
  # Simple Validation code
  #
  function test_echo_func {
    local Outcome _result
    _result="$( "${1}" "${2}" )"
    [ "${_result}" = "${3}" ] && Outcome="Pass" || Outcome="Fail"
    printf "# %-20s %-6s => " "${1}" "${2}" "${_result}" "${3}"
    printf "[ "%16q" = "%-16q"%-5s ] " "${_result}" "${3}" "(${3//[[:cntrl:]]/_})"
    printf "%s\n" "${Outcome}"


  }
  function test_value_func {
    local Outcome _result
    "${1}" _result "${2}"
    [ "${_result}" = "${3}" ] && Outcome="Pass" || Outcome="Fail"
    printf "# %-20s %-6s => " "${1}" "${2}" "${_result}" "${3}"
    printf "[ "%16q" = "%-16q"%-5s ] " "${_result}" "${3}" "(${3//[[:cntrl:]]/_})"
    printf "%s\n" "${Outcome}"
  }
  test_echo_func chr_echo "$(ord_echo "A")" "A"
  test_echo_func ord_echo "$(chr_echo "65")" "65"
  test_echo_func chr_echo "$(ord_echo "ö")" "ö"
  test_value_func chr "$(ord_echo "A")" "A"
  test_value_func ord "$(chr_echo "65")" "65"
  test_value_func chr "$(ord_echo "ö")" "ö"
  # chr_echo 65 => [ A = A (A) ] Pass
  # ord_echo A => [ 65 = 65 (65) ] Pass
  # chr_echo 246 => [ $'\303\266' = $'\303\266' (ö) ] Pass
  # chr 65 => [ A = A (A) ] Pass
  # ord A => [ 65 = 65 (65) ] Pass
  # chr 246 => [ $'\303\266' = $'\303\266' (ö) ] Pass
  #


  test_echo_func chr_echo "65" A
  test_echo_func chr_echo "065" 5
  test_echo_func chr_dec_echo "065" A
  test_echo_func chr_oct_echo "65" 5
  test_echo_func chr_hex_echo "65" e
  test_value_func chr "65" A
  test_value_func chr "065" 5
  test_value_func chr_dec "065" A
  test_value_func chr_oct "65" 5
  test_value_func chr_hex "65" e
  # chr_echo 65 => [ A = A (A) ] Pass
  # chr_echo 065 => [ 5 = 5 (5) ] Pass
  # chr_dec_echo 065 => [ A = A (A) ] Pass
  # chr_oct_echo 65 => [ 5 = 5 (5) ] Pass
  # chr_hex_echo 65 => [ e = e (e) ] Pass
  # chr 65 => [ A = A (A) ] Pass
  # chr 065 => [ 5 = 5 (5) ] Pass
  # chr_dec 065 => [ A = A (A) ] Pass
  # chr_oct 65 => [ 5 = 5 (5) ] Pass
  # chr_hex 65 => [ e = e (e) ] Pass

  #test_value_func chr 0xff $'\xff'
  test_value_func chr_eascii 0xff $'\xff'
  test_value_func chr_utf8 0xff $'\uff' # Note this fails because bash encodes it incorrectly
  test_value_func chr_utf8 0xff $'\303\277'
  test_value_func chr_utf8 0x100 $'\u100'
  test_value_func chr_utf8 0x1000 $'\u1000'
  test_value_func chr_utf8 0xffff $'\uffff'
  # chr_eascii 0xff => [ $'\377' = $'\377' (�) ] Pass
  # chr_utf8 0xff => [ $'\303\277' = $'\377' (�) ] Fail
  # chr_utf8 0xff => [ $'\303\277' = $'\303\277' (ÿ) ] Pass
  # chr_utf8 0x100 => [ $'\304\200' = $'\304\200' (Ā) ] Pass
  # chr_utf8 0x1000 => [ $'\341\200\200' = $'\341\200\200' (က) ] Pass
  # chr_utf8 0xffff => [ $'\357\277\277' = $'\357\277\277' (���) ] Pass
  test_value_func ord_utf8 "A" 65
  test_value_func ord_utf8 "ä" 228
  test_value_func ord_utf8 $'\303\277' 255
  test_value_func ord_utf8 $'\u100' 256



  #########################################################
  # to help debug problems try this
  #########################################################
  printf "%q\n" $'\xff' # => $'\377'
  printf "%q\n" $'\uffff' # => $'\357\277\277'
  printf "%q\n" "$(chr_utf8_echo 0x100)" # => $'\304\200'
  #
  # This can help a lot when it comes to diagnosing problems
  # with read and or xterm program output
  # I use it a lot in error case to create a human readable error message
  # i.e.
  echo "Enter type to test, Enter to continue"
  while read -srN1 ; do
    ord asciiValue "${REPLY}"
    case "${asciiValue}" in
      10) echo "Goodbye" ; break ;;
      20|21|22) echo "Yay expected input" ;;
      *) printf ':( Unexpected Input 0x%02x %q "%s"\n' "${asciiValue}" "${REPLY}" "${REPLY//[[:cntrl:]]}" ;;
    esac
  done

  #########################################################
  # More exotic approach 1
  #########################################################
  # I used to use this before I figured out the LC_CTYPE=C approach
  # printf "EAsciiLookup=%q" "$(for (( x=0x0; x<0x100 ; x++)); do printf '%b' $(printf '\\x%02x' "$x"); done)"
  EAsciiLookup=$'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023'
  EAsciiLookup+=$'\024\025\026\027\030\031\032\E\034\035\036\037 !"#$%&\'()*+,-'
  EAsciiLookup+=$'./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi'
  EAsciiLookup+=$'jklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210'
  EAsciiLookup+=$'\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227'
  EAsciiLookup+=$'\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246'
  EAsciiLookup+=$'\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265'
  EAsciiLookup+=$'\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304'
  EAsciiLookup+=$'\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323'
  EAsciiLookup+=$'\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342'
  EAsciiLookup+=$'\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361'
  EAsciiLookup+=$'\362\363\364\365\366\367\370\371\372\373\374\375\376\377'
  function ord_eascii2 {
    local idx="${EAsciiLookup%%${2:0:1}*}"
    eval ${1}'=$(( ${#idx} +1 ))'
  }

  #########################################################
  # More exotic approach 2
  #########################################################
  #printf "EAsciiLookup2=(\n %s\n)" "$(for (( x=0x1; x<0x100 ; x++)); do printf '%-18s' "$(printf '[_%q]="0x%02x"' "$(printf "%b" "$(printf '\\x%02x' "$x")")" $x )" ; [ "$(($x%6))" != "0" ] || echo -en "\n " ; done)"
  typeset -A EAsciiLookup2
  EAsciiLookup2=(
    [_$'\001']="0x01" [_$'\002']="0x02" [_$'\003']="0x03" [_$'\004']="0x04"
    [_$'\005']="0x05" [_$'\006']="0x06" [_$'\a']="0x07" [_$'\b']="0x08"
    [_$'\t']="0x09" [_'']="0x0a" [_$'\v']="0x0b" [_$'\f']="0x0c"
    [_$'\r']="0x0d" [_$'\016']="0x0e" [_$'\017']="0x0f" [_$'\020']="0x10"
    [_$'\021']="0x11" [_$'\022']="0x12" [_$'\023']="0x13" [_$'\024']="0x14"
    [_$'\025']="0x15" [_$'\026']="0x16" [_$'\027']="0x17" [_$'\030']="0x18"
    [_$'\031']="0x19" [_$'\032']="0x1a" [_$'\E']="0x1b" [_$'\034']="0x1c"
    [_$'\035']="0x1d" [_$'\036']="0x1e" [_$'\037']="0x1f" [_\ ]="0x20"
    [_\!]="0x21" [_\"]="0x22" [_\#]="0x23" [_\$]="0x24"
    [_%]="0x25" [_\&]="0x26" [_\']="0x27" [_\(]="0x28"
    [_\)]="0x29" [_\*]="0x2a" [_+]="0x2b" [_\,]="0x2c"
    [_-]="0x2d" [_.]="0x2e" [_/]="0x2f" [_0]="0x30"
    [_1]="0x31" [_2]="0x32" [_3]="0x33" [_4]="0x34"
    [_5]="0x35" [_6]="0x36" [_7]="0x37" [_8]="0x38"
    [_9]="0x39" [_:]="0x3a" [_\;]="0x3b" [_\<]="0x3c"
    [_=]="0x3d" [_\>]="0x3e" [_\?]="0x3f" [_@]="0x40"
    [_A]="0x41" [_B]="0x42" [_C]="0x43" [_D]="0x44"
    [_E]="0x45" [_F]="0x46" [_G]="0x47" [_H]="0x48"
    [_I]="0x49" [_J]="0x4a" [_K]="0x4b" [_L]="0x4c"
    [_M]="0x4d" [_N]="0x4e" [_O]="0x4f" [_P]="0x50"
    [_Q]="0x51" [_R]="0x52" [_S]="0x53" [_T]="0x54"
    [_U]="0x55" [_V]="0x56" [_W]="0x57" [_X]="0x58"
    [_Y]="0x59" [_Z]="0x5a" [_\[]="0x5b" #[_\\]="0x5c"
    #[_\]]="0x5d"
                      [_\^]="0x5e" [__]="0x5f" [_\`]="0x60"
    [_a]="0x61" [_b]="0x62" [_c]="0x63" [_d]="0x64"
    [_e]="0x65" [_f]="0x66" [_g]="0x67" [_h]="0x68"
    [_i]="0x69" [_j]="0x6a" [_k]="0x6b" [_l]="0x6c"
    [_m]="0x6d" [_n]="0x6e" [_o]="0x6f" [_p]="0x70"
    [_q]="0x71" [_r]="0x72" [_s]="0x73" [_t]="0x74"
    [_u]="0x75" [_v]="0x76" [_w]="0x77" [_x]="0x78"
    [_y]="0x79" [_z]="0x7a" [_\{]="0x7b" [_\|]="0x7c"
    [_\}]="0x7d" [_~]="0x7e" [_$'\177']="0x7f" [_$'\200']="0x80"
    [_$'\201']="0x81" [_$'\202']="0x82" [_$'\203']="0x83" [_$'\204']="0x84"
    [_$'\205']="0x85" [_$'\206']="0x86" [_$'\207']="0x87" [_$'\210']="0x88"
    [_$'\211']="0x89" [_$'\212']="0x8a" [_$'\213']="0x8b" [_$'\214']="0x8c"
    [_$'\215']="0x8d" [_$'\216']="0x8e" [_$'\217']="0x8f" [_$'\220']="0x90"
    [_$'\221']="0x91" [_$'\222']="0x92" [_$'\223']="0x93" [_$'\224']="0x94"
    [_$'\225']="0x95" [_$'\226']="0x96" [_$'\227']="0x97" [_$'\230']="0x98"
    [_$'\231']="0x99" [_$'\232']="0x9a" [_$'\233']="0x9b" [_$'\234']="0x9c"
    [_$'\235']="0x9d" [_$'\236']="0x9e" [_$'\237']="0x9f" [_$'\240']="0xa0"
    [_$'\241']="0xa1" [_$'\242']="0xa2" [_$'\243']="0xa3" [_$'\244']="0xa4"
    [_$'\245']="0xa5" [_$'\246']="0xa6" [_$'\247']="0xa7" [_$'\250']="0xa8"
    [_$'\251']="0xa9" [_$'\252']="0xaa" [_$'\253']="0xab" [_$'\254']="0xac"
    [_$'\255']="0xad" [_$'\256']="0xae" [_$'\257']="0xaf" [_$'\260']="0xb0"
    [_$'\261']="0xb1" [_$'\262']="0xb2" [_$'\263']="0xb3" [_$'\264']="0xb4"
    [_$'\265']="0xb5" [_$'\266']="0xb6" [_$'\267']="0xb7" [_$'\270']="0xb8"
    [_$'\271']="0xb9" [_$'\272']="0xba" [_$'\273']="0xbb" [_$'\274']="0xbc"
    [_$'\275']="0xbd" [_$'\276']="0xbe" [_$'\277']="0xbf" [_$'\300']="0xc0"
    [_$'\301']="0xc1" [_$'\302']="0xc2" [_$'\303']="0xc3" [_$'\304']="0xc4"
    [_$'\305']="0xc5" [_$'\306']="0xc6" [_$'\307']="0xc7" [_$'\310']="0xc8"
    [_$'\311']="0xc9" [_$'\312']="0xca" [_$'\313']="0xcb" [_$'\314']="0xcc"
    [_$'\315']="0xcd" [_$'\316']="0xce" [_$'\317']="0xcf" [_$'\320']="0xd0"
    [_$'\321']="0xd1" [_$'\322']="0xd2" [_$'\323']="0xd3" [_$'\324']="0xd4"
    [_$'\325']="0xd5" [_$'\326']="0xd6" [_$'\327']="0xd7" [_$'\330']="0xd8"
    [_$'\331']="0xd9" [_$'\332']="0xda" [_$'\333']="0xdb" [_$'\334']="0xdc"
    [_$'\335']="0xdd" [_$'\336']="0xde" [_$'\337']="0xdf" [_$'\340']="0xe0"
    [_$'\341']="0xe1" [_$'\342']="0xe2" [_$'\343']="0xe3" [_$'\344']="0xe4"
    [_$'\345']="0xe5" [_$'\346']="0xe6" [_$'\347']="0xe7" [_$'\350']="0xe8"
    [_$'\351']="0xe9" [_$'\352']="0xea" [_$'\353']="0xeb" [_$'\354']="0xec"
    [_$'\355']="0xed" [_$'\356']="0xee" [_$'\357']="0xef" [_$'\360']="0xf0"
    [_$'\361']="0xf1" [_$'\362']="0xf2" [_$'\363']="0xf3" [_$'\364']="0xf4"
    [_$'\365']="0xf5" [_$'\366']="0xf6" [_$'\367']="0xf7" [_$'\370']="0xf8"
    [_$'\371']="0xf9" [_$'\372']="0xfa" [_$'\373']="0xfb" [_$'\374']="0xfc"
    [_$'\375']="0xfd" [_$'\376']="0xfe" [_$'\377']="0xff"
  )
  function ord_eascii3 {
        local -i val="${EAsciiLookup2["_${2:0:1}"]-}"
        if [ "${val}" -eq 0 ]; then
                case "${2:0:1}" in
                        ]) val=0x5d ;;
                        \\) val=0x5c ;;
                esac
        fi
        eval "${1}"'="${val}"'
  }
  # for fun check out the following
  time for (( i=0 ; i <1000; i++ )); do ord TmpVar 'a'; done
  # real 0m0.065s
  # user 0m0.048s
  # sys 0m0.000s

  time for (( i=0 ; i <1000; i++ )); do ord_eascii TmpVar 'a'; done
  # real 0m0.239s
  # user 0m0.188s
  # sys 0m0.000s

  time for (( i=0 ; i <1000; i++ )); do ord_utf8 TmpVar 'a'; done
  # real 0m0.225s
  # user 0m0.180s
  # sys 0m0.000s

  time for (( i=0 ; i <1000; i++ )); do ord_eascii2 TmpVar 'a'; done
  # real 0m1.507s
  # user 0m1.056s
  # sys 0m0.012s

  time for (( i=0 ; i <1000; i++ )); do ord_eascii3 TmpVar 'a'; done
  # real 0m0.147s
  # user 0m0.120s
  # sys 0m0.000s

  time for (( i=0 ; i <1000; i++ )); do ord_echo 'a' >/dev/null ; done
  # real 0m0.065s
  # user 0m0.044s
  # sys 0m0.016s

  time for (( i=0 ; i <1000; i++ )); do ord_eascii_echo 'a' >/dev/null ; done
  # real 0m0.089s
  # user 0m0.068s
  # sys 0m0.008s

  time for (( i=0 ; i <1000; i++ )); do ord_utf8_echo 'a' >/dev/null ; done
  # real 0m0.226s
  # user 0m0.172s
  # sys 0m0.012s

}}}

How do I convert an ASCII character to its decimal (or hexadecimal) value and back? How do I do URL encoding or URL decoding?

If you have a known octal or hexadecimal value (at script-writing time), you can just use printf:

   1 # POSIX
   2 printf '\047\n'
   3 
   4 # bash/ksh/zsh and a few other printf implementations also support:
   5 printf '\x27\n'

In locales where the character encoding is a superset of ASCII, this prints the literal ' character (47 is the octal ASCII value of the apostrophe character) and a newline. The hexadecimal version can also be used with a few printf implementations including the bash builtin, but is not standard/POSIX.

printf in bash 4.2 and higher, and in ksh93, supports Unicode code points as well:

   1 # bash 4.2, ksh93
   2 printf '\u0027\n'

Another approach: bash's $'...' quoting can be used to expand to the desired characters, either in a variable assignment, or directly as a command argument:

   1 ExpandedString=$'\x27\047\u0027\U00000027\n'
   2 printf %s\\n "$ExpandedString"
   3 
   4 # or, more simply
   5 printf %s\\n $'\x27\047\u0027\U00000027\n'

If you need to convert characters (or numeric ASCII values) that are not known in advance (i.e., in variables), you can use something a little more complicated. Note: These functions only work for single-byte character encodings.

   1 # POSIX
   2 # chr() - converts decimal value to its ASCII character representation
   3 # ord() - converts ASCII character to its decimal value
   4 
   5 chr() {
   6   [ "$1" -lt 256 ] || return 1
   7   printf "\\$(printf %o "$1")"
   8 }

Even better to avoid using a subshell is to pass the value inside a variable instead of the command output. faster as it avoids the subshell

   1 chr () {
   2   local val
   3   [ "$1" -lt 256 ] || return 1
   4   printf -v val %o "$1"; printf "\\$val"
   5   # That one requires bash 3.1 or above.
   6 }

   1 ord() {
   2   # POSIX
   3   LC_CTYPE=C printf %d "'$1"
   4 }
   5 
   6 # hex() - converts ASCII character to a hexadecimal value
   7 # unhex() - converts a hexadecimal value to an ASCII character
   8 
   9 hex() {
  10    LC_CTYPE=C printf %x "'$1"
  11 }
  12 
  13 unhex() {
  14    printf "\\x$1"
  15 }
  16 
  17 # examples:
  18 
  19 chr "$(ord A)"    # -> A
  20 ord "$(chr 65)"   # -> 65

The ord function above is quite tricky.

  • Tricky? Rather, it's using a feature that I can't find documented anywhere -- putting a single quote in front of a character. Neat effect, but how on earth did you find out about it? Source diving? -- GreyCat

    • It validates The Single Unix Specification: "If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote." (see printf() to know more) -- mjf

URL encoding and URL decoding

Note that URL encoding is defined only at the byte (octet) level. A URL-encoding of a multibyte (e.g. UTF-8) character is done by simply encoding each byte individually, then concatenating everything.

Also note that the urldecode function below performs no error checking; getting it to yield a sensible error message when you feed it malformed input is left as an exercise for the reader.

   1 urlencode() {
   2     # urlencode <string>
   3     local LC_ALL=C c i n
   4     for (( i = 0, n = ${#1}; i < n; i++ )); do
   5         c=${1:i:1}
   6         case $c in
   7             [[:alnum:].~_-]) printf %s "$c" ;;
   8             *) printf %%%02X "'$c"  ;;
   9         esac
  10     done
  11 }

   1 urldecode() {
   2     # urldecode <string>
   3     local s
   4     s=${1//\\/\\\\}
   5     s=${s//+/ }
   6     printf %b "${s//'%'/\\x}"
   7 }

   1 # Alternative urlencode, prints all at once (requires bash 3.1)
   2 urlencode() {
   3     # urlencode <string>
   4     local LC_ALL=C c i n=${#1}
   5     local out= tmp
   6     for (( i=0; i < n; i++ )); do
   7         c=${1:i:1}
   8         case $c in
   9             [[:alnum:].~_-]) printf -v tmp %s "$c" ;;
  10             *) printf -v tmp %%%02X "'$c"  ;;
  11         esac
  12         out+=$tmp
  13     done
  14     printf %s "$out"
  15 }

More complete examples (with UTF-8 support)

The command-line utility nkf can decode URLs:

   1 echo 'https://ja.wikipedia.org/wiki/%E9%87%8E%E8%89%AF%E7%8C%AB' | nkf --url-input

Note about Ext Ascii and UTF-8 encoding

  • The following example was never peer-reviewed. Everyone is terrified of it. Proceed at your own risk.

    • for values 0x00 - 0x7f Identical
    • for values 0x80 - 0xff conflict between UTF-8 & ExtAscii

    • for values 0x100 - 0xffff Only UTF-8 UTF-16 UTF-32
    • for values 0x100 - 0x7FFFFFFF Only UTF-8 UTF-32

      value

      EAscii

      UTF-8

      UTF-16

      UTF-32

      0x20

      "\x20"

      "\x20"

      \u0020

      \U00000020

      0x20

      "\x7f"

      "\x7f"

      \u007f

      \U0000007f

      0x80

      "\x80"

      "\xc2\x80"

      \u0080

      \U00000080

      0xff

      "\xff"

      "\xc3\xbf"

      \u00ff

      \U000000ff

      0x100

      N/A

      "\xc4\x80"

      \u0100

      \U00000100

      0x1000

      N/A

      "\xc8\x80"

      \u1000

      \U00001000

      0xffff

      N/A

      "\xef\xbf\xbf"

      \uffff

      \U0000ffff

      0x10000

      N/A

      "\xf0\x90\x80\x80"

      \ud800\udc00

      \U00010000

      0xfffff

      N/A

      "\xf3\xbf\xbf\xbf"

      \udbbf\udfff

      \U000fffff

      0x10000000

      N/A

      "\xfc\x90\x80\x80\x80\x80"

      N/A

      \U10000000

      0x7fffffff

      N/A

      "\xfd\xbf\xbf\xbf\xbf\xbf"

      N/A

      \U7fffffff

      0x80000000

      N/A

      N/A

      N/A

      N/A

      0xffffffff

      N/A

      N/A

      N/A

      N/A

   1   ###########################################################################
   2   ## ord family
   3   ###########################################################################
   4   # ord        <Return Variable Name> <Char to convert> [Optional Format String]
   5   # ord_hex    <Return Variable Name> <Char to convert>
   6   # ord_oct    <Return Variable Name> <Char to convert>
   7   # ord_utf8   <Return Variable Name> <Char to convert> [Optional Format String]
   8   # ord_eascii <Return Variable Name> <Char to convert> [Optional Format String]
   9   # ord_echo                      <Char to convert> [Optional Format String]
  10   # ord_hex_echo                  <Char to convert>
  11   # ord_oct_echo                  <Char to convert>
  12   # ord_utf8_echo                 <Char to convert> [Optional Format String]
  13   # ord_eascii_echo               <Char to convert> [Optional Format String]
  14   #
  15   # Description:
  16   # converts character using native encoding to its decimal value and stores
  17   # it in the Variable specified
  18   #
  19   #       ord
  20   #       ord_hex         output in hex
  21   #       ord_hex         output in octal
  22   #       ord_utf8        forces UTF8 decoding
  23   #       ord_eascii      forces eascii decoding
  24   #       ord_echo        prints to stdout
  25   function ord {
  26           printf -v "${1?Missing Dest Variable}" "${3:-%d}" "'${2?Missing Char}"
  27   }
  28   function ord_oct {
  29           ord "${@:1:2}" "0%c"
  30   }
  31   function ord_hex {
  32           ord "${@:1:2}" "0x%x"
  33   }
  34   function ord_utf8 {
  35           LC_CTYPE=C.UTF-8 ord "${@}"
  36   }
  37   function ord_eascii {
  38           LC_CTYPE=C ord "${@}"
  39   }
  40   function ord_echo {
  41           printf "${2:-%d}" "'${1?Missing Char}"
  42   }
  43   function ord_oct_echo {
  44           ord_echo "${1}" "0%o"
  45   }
  46   function ord_hex_echo {
  47           ord_echo "${1}" "0x%x"
  48   }
  49   function ord_utf8_echo {
  50           LC_CTYPE=C.UTF-8 ord_echo "${@}"
  51   }
  52   function ord_eascii_echo {
  53           LC_CTYPE=C ord_echo "${@}"
  54   }
  55 
  56   ###########################################################################
  57   ## chr family
  58   ###########################################################################
  59   # chr_utf8   <Return Variale Name> <Integer to convert>
  60   # chr_eascii <Return Variale Name> <Integer to convert>
  61   # chr        <Return Variale Name> <Integer to convert>
  62   # chr_oct    <Return Variale Name> <Octal number to convert>
  63   # chr_hex    <Return Variale Name> <Hex number to convert>
  64   # chr_utf8_echo                  <Integer to convert>
  65   # chr_eascii_echo                <Integer to convert>
  66   # chr_echo                       <Integer to convert>
  67   # chr_oct_echo                   <Octal number to convert>
  68   # chr_hex_echo                   <Hex number to convert>
  69   #
  70   # Description:
  71   # converts decimal value to character representation an stores
  72   # it in the Variable specified
  73   #
  74   #       chr                     Tries to guess output format
  75   #       chr_utf8                forces UTF8 encoding
  76   #       chr_eascii              forces eascii encoding
  77   #       chr_echo                prints to stdout
  78   #
  79   function chr_utf8_m {
  80     local val
  81     #
  82     # bash only supports \u \U since 4.2
  83     #
  84 
  85     # here is an example how to encode
  86     # manually
  87     # this will work since Bash 3.1 as it uses -v.
  88     #
  89     if [[ ${2:?Missing Ordinal Value} -le 0x7f ]]; then
  90       printf -v val "\\%03o" "${2}"
  91     elif [[ ${2} -le 0x7ff        ]]; then
  92       printf -v val "\\%03o" \
  93         $((  (${2}>> 6)      |0xc0 )) \
  94         $(( ( ${2}     &0x3f)|0x80 ))
  95     elif [[ ${2} -le 0xffff       ]]; then
  96       printf -v val "\\%03o" \
  97         $(( ( ${2}>>12)      |0xe0 )) \
  98         $(( ((${2}>> 6)&0x3f)|0x80 )) \
  99         $(( ( ${2}     &0x3f)|0x80 ))
 100     elif [[ ${2} -le 0x1fffff     ]]; then
 101       printf -v val "\\%03o"  \
 102         $(( ( ${2}>>18)      |0xf0 )) \
 103         $(( ((${2}>>12)&0x3f)|0x80 )) \
 104         $(( ((${2}>> 6)&0x3f)|0x80 )) \
 105         $(( ( ${2}     &0x3f)|0x80 ))
 106     elif [[ ${2} -le 0x3ffffff    ]]; then
 107       printf -v val "\\%03o"  \
 108         $(( ( ${2}>>24)      |0xf8 )) \
 109         $(( ((${2}>>18)&0x3f)|0x80 )) \
 110         $(( ((${2}>>12)&0x3f)|0x80 )) \
 111         $(( ((${2}>> 6)&0x3f)|0x80 )) \
 112         $(( ( ${2}     &0x3f)|0x80 ))
 113     elif [[ ${2} -le 0x7fffffff ]]; then
 114       printf -v val "\\%03o"  \
 115         $(( ( ${2}>>30)      |0xfc )) \
 116         $(( ((${2}>>24)&0x3f)|0x80 )) \
 117         $(( ((${2}>>18)&0x3f)|0x80 )) \
 118         $(( ((${2}>>12)&0x3f)|0x80 )) \
 119         $(( ((${2}>> 6)&0x3f)|0x80 )) \
 120         $(( ( ${2}     &0x3f)|0x80 ))
 121     else
 122       printf -v "${1:?Missing Dest Variable}" ""
 123       return 1
 124     fi
 125     printf -v "${1:?Missing Dest Variable}" "${val}"
 126   }
 127   function chr_utf8 {
 128           local val
 129           [[ ${2?Missing Ordinal Value} -lt 0x80000000 ]] || return 1
 130 
 131           if [[ ${2} -lt 0x100 && ${2} -ge 0x80 ]]; then
 132 
 133                   # bash 4.2 incorrectly encodes
 134                   # \U000000ff as \xff so encode manually
 135                   printf -v val "\\%03o\%03o" $(( (${2}>>6)|0xc0 )) $(( (${2}&0x3f)|0x80 ))
 136           else
 137                   printf -v val '\\U%08x' "${2}"
 138           fi
 139           printf -v ${1?Missing Dest Variable} ${val}
 140   }
 141   function chr_eascii {
 142           local val
 143           # Make sure value less than 0x100
 144           # otherwise we end up with
 145           # \xVVNNNNN
 146           # where \xVV = char && NNNNN is a number string
 147           # so chr "0x44321" => "D321"
 148           [[ ${2?Missing Ordinal Value} -lt 0x100 ]] || return 1
 149           printf -v val '\\x%02x' "${2}"
 150           printf -v ${1?Missing Dest Variable} ${val}
 151   }
 152   function chr {
 153           if [ "${LC_CTYPE:-${LC_ALL:-}}" = "C" ]; then
 154                   chr_eascii "${@}"
 155           else
 156                   chr_utf8 "${@}"
 157           fi
 158   }
 159   function chr_dec {
 160           # strip leading 0s otherwise
 161           # interpreted as Octal
 162           chr "${1}" "${2#${2%%[!0]*}}"
 163   }
 164   function chr_oct {
 165           chr "${1}" "0${2}"
 166   }
 167   function chr_hex {
 168           chr "${1}" "0x${2#0x}"
 169   }
 170   function chr_utf8_echo {
 171           local val
 172           [[ ${1?Missing Ordinal Value} -lt 0x80000000 ]] || return 1
 173 
 174           if [[ ${1} -lt 0x100 && ${1} -ge 0x80 ]]; then
 175 
 176                   # bash 4.2 incorrectly encodes
 177                   # \U000000ff as \xff so encode manually
 178                   printf -v val '\\%03o\\%03o' $(( (${1}>>6)|0xc0 )) $(( (${1}&0x3f)|0x80 ))
 179           else
 180                   printf -v val '\\U%08x' "${1}"
 181           fi
 182           printf "${val}"
 183   }
 184   function chr_eascii_echo {
 185           local val
 186           # Make sure value less than 0x100
 187           # otherwise we end up with
 188           # \xVVNNNNN
 189           # where \xVV = char && NNNNN is a number string
 190           # so chr "0x44321" => "D321"
 191           [[ ${1?Missing Ordinal Value} -lt 0x100 ]] || return 1
 192           printf -v val '\\x%x' "${1}"
 193           printf "${val}"
 194   }
 195   function chr_echo {
 196           if [ "${LC_CTYPE:-${LC_ALL:-}}" = "C" ]; then
 197                   chr_eascii_echo "${@}"
 198           else
 199                   chr_utf8_echo "${@}"
 200           fi
 201   }
 202   function chr_dec_echo {
 203           # strip leading 0s otherwise
 204           # interpreted as Octal
 205           chr_echo "${1#${1%%[!0]*}}"
 206   }
 207   function chr_oct_echo {
 208           chr_echo "0${1}"
 209   }
 210   function chr_hex_echo {
 211           chr_echo "0x${1#0x}"
 212   }
 213 
 214   #
 215   # Simple Validation code
 216   #
 217   function test_echo_func {
 218     local Outcome _result
 219     _result="$( "${1}" "${2}" )"
 220     [ "${_result}" = "${3}" ] && Outcome="Pass" || Outcome="Fail"
 221     printf "# %-20s %-6s => "           "${1}" "${2}" "${_result}" "${3}"
 222     printf "[ "%16q" = "%-16q"%-5s ] "  "${_result}" "${3}" "(${3//[[:cntrl:]]/_})"
 223     printf "%s\n"                       "${Outcome}"
 224 
 225 
 226   }
 227   function test_value_func {
 228     local Outcome _result
 229     "${1}" _result "${2}"
 230     [ "${_result}" = "${3}" ] && Outcome="Pass" || Outcome="Fail"
 231     printf "# %-20s %-6s => "           "${1}" "${2}" "${_result}" "${3}"
 232     printf "[ "%16q" = "%-16q"%-5s ] "  "${_result}" "${3}" "(${3//[[:cntrl:]]/_})"
 233     printf "%s\n"                       "${Outcome}"
 234   }
 235   test_echo_func  chr_echo "$(ord_echo  "A")"  "A"
 236   test_echo_func  ord_echo "$(chr_echo "65")"  "65"
 237   test_echo_func  chr_echo "$(ord_echo  "ö")"  "ö"
 238   test_value_func chr      "$(ord_echo  "A")"  "A"
 239   test_value_func ord      "$(chr_echo "65")"  "65"
 240   test_value_func chr      "$(ord_echo  "ö")"  "ö"
 241   # chr_echo             65     => [                A = A               (A)   ] Pass
 242   # ord_echo             A      => [               65 = 65              (65)  ] Pass
 243   # chr_echo             246    => [      $'\303\266' = $'\303\266'     (ö)  ] Pass
 244   # chr                  65     => [                A = A               (A)   ] Pass
 245   # ord                  A      => [               65 = 65              (65)  ] Pass
 246   # chr                  246    => [      $'\303\266' = $'\303\266'     (ö)  ] Pass
 247   #
 248 
 249 
 250   test_echo_func  chr_echo     "65"     A
 251   test_echo_func  chr_echo     "065"    5
 252   test_echo_func  chr_dec_echo "065"    A
 253   test_echo_func  chr_oct_echo "65"     5
 254   test_echo_func  chr_hex_echo "65"     e
 255   test_value_func chr          "65"     A
 256   test_value_func chr          "065"    5
 257   test_value_func chr_dec      "065"    A
 258   test_value_func chr_oct      "65"     5
 259   test_value_func chr_hex      "65"     e
 260   # chr_echo             65     => [                A = A               (A)   ] Pass
 261   # chr_echo             065    => [                5 = 5               (5)   ] Pass
 262   # chr_dec_echo         065    => [                A = A               (A)   ] Pass
 263   # chr_oct_echo         65     => [                5 = 5               (5)   ] Pass
 264   # chr_hex_echo         65     => [                e = e               (e)   ] Pass
 265   # chr                  65     => [                A = A               (A)   ] Pass
 266   # chr                  065    => [                5 = 5               (5)   ] Pass
 267   # chr_dec              065    => [                A = A               (A)   ] Pass
 268   # chr_oct              65     => [                5 = 5               (5)   ] Pass
 269   # chr_hex              65     => [                e = e               (e)   ] Pass
 270 
 271   #test_value_func chr          0xff   $'\xff'
 272   test_value_func chr_eascii   0xff   $'\xff'
 273   test_value_func chr_utf8     0xff   $'\uff'      # Note this fails because bash encodes it incorrectly
 274   test_value_func chr_utf8     0xff   $'\303\277'
 275   test_value_func chr_utf8     0x100  $'\u100'
 276   test_value_func chr_utf8     0x1000 $'\u1000'
 277   test_value_func chr_utf8     0xffff $'\uffff'
 278   # chr_eascii           0xff   => [          $'\377' = $'\377'         (�)   ] Pass
 279   # chr_utf8             0xff   => [      $'\303\277' = $'\377'         (�)   ] Fail
 280   # chr_utf8             0xff   => [      $'\303\277' = $'\303\277'     (ÿ)  ] Pass
 281   # chr_utf8             0x100  => [      $'\304\200' = $'\304\200'     (Ā)  ] Pass
 282   # chr_utf8             0x1000 => [  $'\341\200\200' = $'\341\200\200' (က) ] Pass
 283   # chr_utf8             0xffff => [  $'\357\277\277' = $'\357\277\277' (���) ] Pass
 284   test_value_func ord_utf8     "A"           65
 285   test_value_func ord_utf8     "ä"          228
 286   test_value_func ord_utf8     $'\303\277'  255
 287   test_value_func ord_utf8     $'\u100'     256
 288 
 289 
 290 
 291   #########################################################
 292   # to help debug problems try this
 293   #########################################################
 294   printf "%q\n" $'\xff'                  # => $'\377'
 295   printf "%q\n" $'\uffff'                # => $'\357\277\277'
 296   printf "%q\n" "$(chr_utf8_echo 0x100)" # => $'\304\200'
 297   #
 298   # This can help a lot when it comes to diagnosing problems
 299   # with read and or xterm program output
 300   # I use it a lot in error case to create a human readable error message
 301   # i.e.
 302   echo "Enter type to test, Enter to continue"
 303   while read -srN1 ; do
 304     ord asciiValue "${REPLY}"
 305     case "${asciiValue}" in
 306       10) echo "Goodbye" ; break ;;
 307       20|21|22) echo "Yay expected input" ;;
 308       *) printf ':( Unexpected Input 0x%02x %q "%s"\n' "${asciiValue}" "${REPLY}" "${REPLY//[[:cntrl:]]}" ;;
 309     esac
 310   done
 311 
 312   #########################################################
 313   # More exotic approach 1
 314   #########################################################
 315   # I used to use this before I figured out the LC_CTYPE=C approach
 316   # printf "EAsciiLookup=%q" "$(for (( x=0x0; x<0x100 ; x++)); do printf '%b' $(printf '\\x%02x' "$x"); done)"
 317   EAsciiLookup=$'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023'
 318   EAsciiLookup+=$'\024\025\026\027\030\031\032\E\034\035\036\037 !"#$%&\'()*+,-'
 319   EAsciiLookup+=$'./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi'
 320   EAsciiLookup+=$'jklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210'
 321   EAsciiLookup+=$'\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227'
 322   EAsciiLookup+=$'\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246'
 323   EAsciiLookup+=$'\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265'
 324   EAsciiLookup+=$'\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304'
 325   EAsciiLookup+=$'\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323'
 326   EAsciiLookup+=$'\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342'
 327   EAsciiLookup+=$'\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361'
 328   EAsciiLookup+=$'\362\363\364\365\366\367\370\371\372\373\374\375\376\377'
 329   function ord_eascii2 {
 330     local idx="${EAsciiLookup%%${2:0:1}*}"
 331     eval ${1}'=$(( ${#idx} +1 ))'
 332   }
 333 
 334   #########################################################
 335   # More exotic approach 2
 336   #########################################################
 337   #printf "EAsciiLookup2=(\n    %s\n)" "$(for (( x=0x1; x<0x100 ; x++)); do printf '%-18s'  "$(printf '[_%q]="0x%02x"' "$(printf "%b" "$(printf '\\x%02x' "$x")")" $x )" ; [ "$(($x%6))" != "0" ] || echo -en "\n    " ; done)"
 338   typeset -A EAsciiLookup2
 339   EAsciiLookup2=(
 340     [_$'\001']="0x01" [_$'\002']="0x02" [_$'\003']="0x03" [_$'\004']="0x04"
 341     [_$'\005']="0x05" [_$'\006']="0x06" [_$'\a']="0x07"   [_$'\b']="0x08"
 342     [_$'\t']="0x09"   [_'']="0x0a"      [_$'\v']="0x0b"   [_$'\f']="0x0c"
 343     [_$'\r']="0x0d"   [_$'\016']="0x0e" [_$'\017']="0x0f" [_$'\020']="0x10"
 344     [_$'\021']="0x11" [_$'\022']="0x12" [_$'\023']="0x13" [_$'\024']="0x14"
 345     [_$'\025']="0x15" [_$'\026']="0x16" [_$'\027']="0x17" [_$'\030']="0x18"
 346     [_$'\031']="0x19" [_$'\032']="0x1a" [_$'\E']="0x1b"   [_$'\034']="0x1c"
 347     [_$'\035']="0x1d" [_$'\036']="0x1e" [_$'\037']="0x1f" [_\ ]="0x20"
 348     [_\!]="0x21"      [_\"]="0x22"      [_\#]="0x23"      [_\$]="0x24"
 349     [_%]="0x25"       [_\&]="0x26"      [_\']="0x27"      [_\(]="0x28"
 350     [_\)]="0x29"      [_\*]="0x2a"      [_+]="0x2b"       [_\,]="0x2c"
 351     [_-]="0x2d"       [_.]="0x2e"       [_/]="0x2f"       [_0]="0x30"
 352     [_1]="0x31"       [_2]="0x32"       [_3]="0x33"       [_4]="0x34"
 353     [_5]="0x35"       [_6]="0x36"       [_7]="0x37"       [_8]="0x38"
 354     [_9]="0x39"       [_:]="0x3a"       [_\;]="0x3b"      [_\<]="0x3c"
 355     [_=]="0x3d"       [_\>]="0x3e"      [_\?]="0x3f"      [_@]="0x40"
 356     [_A]="0x41"       [_B]="0x42"       [_C]="0x43"       [_D]="0x44"
 357     [_E]="0x45"       [_F]="0x46"       [_G]="0x47"       [_H]="0x48"
 358     [_I]="0x49"       [_J]="0x4a"       [_K]="0x4b"       [_L]="0x4c"
 359     [_M]="0x4d"       [_N]="0x4e"       [_O]="0x4f"       [_P]="0x50"
 360     [_Q]="0x51"       [_R]="0x52"       [_S]="0x53"       [_T]="0x54"
 361     [_U]="0x55"       [_V]="0x56"       [_W]="0x57"       [_X]="0x58"
 362     [_Y]="0x59"       [_Z]="0x5a"       [_\[]="0x5b"      #[_\\]="0x5c"
 363     #[_\]]="0x5d"
 364                       [_\^]="0x5e"      [__]="0x5f"       [_\`]="0x60"
 365     [_a]="0x61"       [_b]="0x62"       [_c]="0x63"       [_d]="0x64"
 366     [_e]="0x65"       [_f]="0x66"       [_g]="0x67"       [_h]="0x68"
 367     [_i]="0x69"       [_j]="0x6a"       [_k]="0x6b"       [_l]="0x6c"
 368     [_m]="0x6d"       [_n]="0x6e"       [_o]="0x6f"       [_p]="0x70"
 369     [_q]="0x71"       [_r]="0x72"       [_s]="0x73"       [_t]="0x74"
 370     [_u]="0x75"       [_v]="0x76"       [_w]="0x77"       [_x]="0x78"
 371     [_y]="0x79"       [_z]="0x7a"       [_\{]="0x7b"      [_\|]="0x7c"
 372     [_\}]="0x7d"      [_~]="0x7e"       [_$'\177']="0x7f" [_$'\200']="0x80"
 373     [_$'\201']="0x81" [_$'\202']="0x82" [_$'\203']="0x83" [_$'\204']="0x84"
 374     [_$'\205']="0x85" [_$'\206']="0x86" [_$'\207']="0x87" [_$'\210']="0x88"
 375     [_$'\211']="0x89" [_$'\212']="0x8a" [_$'\213']="0x8b" [_$'\214']="0x8c"
 376     [_$'\215']="0x8d" [_$'\216']="0x8e" [_$'\217']="0x8f" [_$'\220']="0x90"
 377     [_$'\221']="0x91" [_$'\222']="0x92" [_$'\223']="0x93" [_$'\224']="0x94"
 378     [_$'\225']="0x95" [_$'\226']="0x96" [_$'\227']="0x97" [_$'\230']="0x98"
 379     [_$'\231']="0x99" [_$'\232']="0x9a" [_$'\233']="0x9b" [_$'\234']="0x9c"
 380     [_$'\235']="0x9d" [_$'\236']="0x9e" [_$'\237']="0x9f" [_$'\240']="0xa0"
 381     [_$'\241']="0xa1" [_$'\242']="0xa2" [_$'\243']="0xa3" [_$'\244']="0xa4"
 382     [_$'\245']="0xa5" [_$'\246']="0xa6" [_$'\247']="0xa7" [_$'\250']="0xa8"
 383     [_$'\251']="0xa9" [_$'\252']="0xaa" [_$'\253']="0xab" [_$'\254']="0xac"
 384     [_$'\255']="0xad" [_$'\256']="0xae" [_$'\257']="0xaf" [_$'\260']="0xb0"
 385     [_$'\261']="0xb1" [_$'\262']="0xb2" [_$'\263']="0xb3" [_$'\264']="0xb4"
 386     [_$'\265']="0xb5" [_$'\266']="0xb6" [_$'\267']="0xb7" [_$'\270']="0xb8"
 387     [_$'\271']="0xb9" [_$'\272']="0xba" [_$'\273']="0xbb" [_$'\274']="0xbc"
 388     [_$'\275']="0xbd" [_$'\276']="0xbe" [_$'\277']="0xbf" [_$'\300']="0xc0"
 389     [_$'\301']="0xc1" [_$'\302']="0xc2" [_$'\303']="0xc3" [_$'\304']="0xc4"
 390     [_$'\305']="0xc5" [_$'\306']="0xc6" [_$'\307']="0xc7" [_$'\310']="0xc8"
 391     [_$'\311']="0xc9" [_$'\312']="0xca" [_$'\313']="0xcb" [_$'\314']="0xcc"
 392     [_$'\315']="0xcd" [_$'\316']="0xce" [_$'\317']="0xcf" [_$'\320']="0xd0"
 393     [_$'\321']="0xd1" [_$'\322']="0xd2" [_$'\323']="0xd3" [_$'\324']="0xd4"
 394     [_$'\325']="0xd5" [_$'\326']="0xd6" [_$'\327']="0xd7" [_$'\330']="0xd8"
 395     [_$'\331']="0xd9" [_$'\332']="0xda" [_$'\333']="0xdb" [_$'\334']="0xdc"
 396     [_$'\335']="0xdd" [_$'\336']="0xde" [_$'\337']="0xdf" [_$'\340']="0xe0"
 397     [_$'\341']="0xe1" [_$'\342']="0xe2" [_$'\343']="0xe3" [_$'\344']="0xe4"
 398     [_$'\345']="0xe5" [_$'\346']="0xe6" [_$'\347']="0xe7" [_$'\350']="0xe8"
 399     [_$'\351']="0xe9" [_$'\352']="0xea" [_$'\353']="0xeb" [_$'\354']="0xec"
 400     [_$'\355']="0xed" [_$'\356']="0xee" [_$'\357']="0xef" [_$'\360']="0xf0"
 401     [_$'\361']="0xf1" [_$'\362']="0xf2" [_$'\363']="0xf3" [_$'\364']="0xf4"
 402     [_$'\365']="0xf5" [_$'\366']="0xf6" [_$'\367']="0xf7" [_$'\370']="0xf8"
 403     [_$'\371']="0xf9" [_$'\372']="0xfa" [_$'\373']="0xfb" [_$'\374']="0xfc"
 404     [_$'\375']="0xfd" [_$'\376']="0xfe" [_$'\377']="0xff"
 405   )
 406   function ord_eascii3 {
 407         local -i val="${EAsciiLookup2["_${2:0:1}"]-}"
 408         if [ "${val}" -eq 0 ]; then
 409                 case "${2:0:1}" in
 410                         ])  val=0x5d ;;
 411                         \\) val=0x5c ;;
 412                 esac
 413         fi
 414         eval "${1}"'="${val}"'
 415   }
 416   # for fun check out the following
 417   time for (( i=0 ; i <1000; i++ )); do ord TmpVar 'a'; done
 418   #  real 0m0.065s
 419   #  user 0m0.048s
 420   #  sys  0m0.000s
 421 
 422   time for (( i=0 ; i <1000; i++ )); do ord_eascii TmpVar 'a'; done
 423   #  real 0m0.239s
 424   #  user 0m0.188s
 425   #  sys  0m0.000s
 426 
 427   time for (( i=0 ; i <1000; i++ )); do ord_utf8 TmpVar 'a'; done
 428   #  real       0m0.225s
 429   #  user       0m0.180s
 430   #  sys        0m0.000s
 431 
 432   time for (( i=0 ; i <1000; i++ )); do ord_eascii2 TmpVar 'a'; done
 433   #  real 0m1.507s
 434   #  user 0m1.056s
 435   #  sys  0m0.012s
 436 
 437   time for (( i=0 ; i <1000; i++ )); do ord_eascii3 TmpVar 'a'; done
 438   #  real 0m0.147s
 439   #  user 0m0.120s
 440   #  sys  0m0.000s
 441 
 442   time for (( i=0 ; i <1000; i++ )); do ord_echo 'a' >/dev/null ; done
 443   #  real       0m0.065s
 444   #  user       0m0.044s
 445   #  sys        0m0.016s
 446 
 447   time for (( i=0 ; i <1000; i++ )); do ord_eascii_echo 'a' >/dev/null ; done
 448   #  real       0m0.089s
 449   #  user       0m0.068s
 450   #  sys        0m0.008s
 451 
 452   time for (( i=0 ; i <1000; i++ )); do ord_utf8_echo 'a' >/dev/null ; done
 453   #  real       0m0.226s
 454   #  user       0m0.172s
 455   #  sys        0m0.012s

BashFAQ/071 (last edited 2021-02-08 16:03:51 by GreyCat)