Differences between revisions 6 and 22 (spanning 16 versions)
Revision 6 as of 2008-11-22 21:54:31
Size: 1968
Editor: GreyCat
Comment: first-line and }}}
Revision 22 as of 2016-07-18 22:32:49
Size: 3493
Editor: dsl
Comment:
How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?

Carriage return characters (CRs) are used in line ending markers on some systems. There are three different kinds of line endings in common use:

  • Unix systems use Line Feeds (LFs) only.
  • MS-DOS and Windows systems use CR-LF pairs.
  • Old Macintosh systems use CRs only.
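To see what these conventions look like at the byte level, here is a quick sketch using printf and od -c. Both are standard utilities. The sample strings are arbitrary, and the exact column spacing of od output may vary a little between systems.

```shell
# Print the raw bytes of each convention; od -c shows CR as \r and LF as \n.
printf 'unix\n'  | od -An -c   # LF only
printf 'dos\r\n' | od -An -c   # CR followed by LF
printf 'mac\r'   | od -An -c   # CR only
```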

If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems.

Testing for line terminator type

A simple check is to look at the output of sed -n l. It prints each line in an unambiguous form: a CR appears as \r and the end of each line is marked with $:

sed -n l yourscript

The output should match one of the forms shown in the ASCII table below:

+--------------------+---------------------+-------------------------------+
| LF (unix)          | CR-LF (dos/windows) | CR (old mac)                  |
+--------------------+---------------------+-------------------------------+
|command$            |command\r$           |command\r\ranother command\r$  |
|$                   |\r$                  |                               |
|another command$    |another command\r$   |                               |
+--------------------+---------------------+-------------------------------+
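To reproduce the table yourself, the sketch below creates one small sample file per style and runs sed -n l on each. The file names unix.txt, dos.txt and mac.txt are hypothetical.

```shell
# Hypothetical sample files, one per line-ending style in the table.
printf 'command\n\nanother command\n'       > unix.txt
printf 'command\r\n\r\nanother command\r\n' > dos.txt
printf 'command\r\ranother command\r'       > mac.txt

# sed -n l prints each line unambiguously: CR as \r, end of line as $.
for f in unix.txt dos.txt mac.txt; do
    printf '== %s\n' "$f"
    sed -n l "$f"
done
```

The old-Mac file comes out as a single line because sed splits its input only on LF.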

Another method is to use the file utility, if it is available, to guess at the file type:

file yourscript

On GNU/Linux, the output mentions CR or CRLF line terminators if the file contains them. On other operating systems, the output of file is unpredictable. It should still contain the word "text" somewhere if the file "kind of looks like a text file of some sort, maybe".

imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo:            commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators

In a script, it's harder to say what the most reliable method is, and anything you do is going to be a heuristic. In theory, a non-corrupt file created by a working Unix utility should contain only LFs. A file created by a DOS utility should contain no bare LFs, meaning every LF is preceded by a CR.

# Bash / Ksh / Zsh

if grep -qv $'\r$' < File; then
    echo 'File contains at least one newline not preceded by a CR'
else
    echo 'File contains only CRLFs (or is empty)'
fi
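The same grep idea can be extended into a rough classifier. The sketch below assumes bash for the $'...' quoting, and classify_eol is a hypothetical helper name. Like any check of this kind, it is only a heuristic. For example, a DOS file whose last line lacks a final CRLF is reported as mixed.

```shell
#!/usr/bin/env bash
# Rough line-ending classifier (heuristic). classify_eol is a hypothetical
# name; it prints unix, dos, mixed, or "mixed or old-mac".
# An empty file is reported as unix.
classify_eol() {
    local file=$1
    if ! grep -q $'\r' < "$file"; then
        echo unix                # no CR anywhere
    elif grep -q $'\r.' < "$file"; then
        echo 'mixed or old-mac'  # a CR followed by more text on the same line
    elif grep -qv $'\r$' < "$file"; then
        echo mixed               # some lines end in CR, others do not
    else
        echo dos                 # every line ends in CR before its LF
    fi
}
```

Usage: classify_eol yourscript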

Converting files

ex is a good standard way to convert CRLF to LF, and probably one of the few reasonable methods for doing it in-place from a script:

# works with vim's ex but not vi's ex
ex -sc $'%s/\r$//e|x' file

# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file

# Using ed.
ed -s file <<< $'%s/\r$//g\nwq'

Of course, any of the more powerful dynamic languages can do this with relative ease. For example, in Perl:

perl -pi -e 's/\r\n/\n/' filename

Some systems have special conversion tools available to do this automatically. dos2unix, recode, and fromdos are some examples.

It can also be done manually with an editor like nano:

nano -w yourscript

Press Ctrl-O and, before confirming, press Alt-D (DOS) or Alt-M (Mac) to toggle the corresponding format off, so the file is saved with Unix line endings.

Or, in Vim, use :set fileformat=unix and save with :w. Make sure the value of fenc (fileencoding) is correct (probably utf-8).

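To simply strip all CRs from an input stream, you can use tr -d '\r' <infile >outfile. Of course, you must make sure these are not the same file. The shell truncates outfile before tr reads anything, so using the same name would leave you with an empty file.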
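When the goal is to change the file itself, one way to respect that rule is to go through a temporary file. The sketch below assumes bash and the widely available, but non-POSIX, mktemp. strip_cr is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# strip_cr is a hypothetical helper: remove every CR from a file by way of
# a temporary file, so tr never reads and writes the same file at once.
strip_cr() {
    local file=$1 tmp status
    tmp=$(mktemp) || return 1          # mktemp is common but not POSIX
    if tr -d '\r' < "$file" > "$tmp"; then
        # Copy back with cat instead of mv so the original file keeps its
        # inode, owner and permissions (for example its execute bit).
        cat -- "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f -- "$tmp"
    return "$status"
}
```

Usage: strip_cr yourscript. Copying back with cat is not atomic. Using mv would be atomic, but it would replace the script with a file carrying mktemp's restrictive permissions (typically 0600), which loses the execute bit.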

BashFAQ/052 (last edited 2022-01-30 01:59:53 by larryv)