Diff for "BashFAQ/052"

Differences between revisions 4 and 27 (spanning 23 versions)

How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?

Carriage return characters (CRs) are used in line ending markers on some systems. There are three different kinds of line endings in common use:

Unix systems use Line Feeds (LFs) only.
MS-DOS and Windows systems use CR-LF pairs.
Old Macintosh systems use CRs only.

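If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems: a CR is not a word separator for the shell, so it becomes part of the last word on each line (a command name, an argument, or a keyword such as then or do).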
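A minimal sketch of the problem, assuming bash and a scratch file created with mktemp: the CR survives into the argument that echo receives. In real scripts, typical symptoms are error messages such as `$'\r': command not found` (the exact wording varies between bash versions).

```shell
#!/usr/bin/env bash
# Write a one-line script whose line ends in CR-LF (DOS style).
demo=$(mktemp) || exit 1
printf 'echo hello\r\n' > "$demo"

# CR is not a word separator for bash, so it stays attached to the last word:
# echo receives the argument "hello<CR>", not "hello".
out=$(bash "$demo")

# %q quotes the string so that the otherwise invisible CR becomes visible.
printf 'output: %q\n' "$out"

rm -f -- "$demo"
```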

Testing for line terminator type

A simple check is to look at the output of sed -n l:

sed -n l yourscript

which prints each CR as \r and marks the end of each line with $, so the script should appear in one of these formats:

LF (Unix)           CR-LF (DOS/Windows)    CR (Old Mac OS)
command$            command\r$             command\r\ranother command\r$
$                   \r$
another command$    another command\r$
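To reproduce the table yourself, here is a small sketch that writes one sample file per convention into a scratch directory (the helper and file names are hypothetical) and runs sed -n l on each. The old Mac sample contains no LF at all, so sed sees the whole file as a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one sample file per line-ending convention into $1.
make_samples() {
    local dir=$1
    printf 'command\n\nanother command\n'       > "$dir/unix.txt"  # LF
    printf 'command\r\n\r\nanother command\r\n' > "$dir/dos.txt"   # CR-LF
    printf 'command\r\ranother command\r'       > "$dir/mac.txt"   # CR
}

samples=$(mktemp -d) || exit 1
make_samples "$samples"

# Show each sample as in the table: \r for each CR, $ at the end of each line.
for kind in unix dos mac; do
    printf '== %s ==\n' "$kind"
    sed -n l "$samples/$kind.txt"
done
```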

Another method is to use the file utility if available to guess at the file type:

file yourscript

The output tells you whether the text contains CRs (for example, "with CRLF line terminators"). Note: this is only true on GNU/Linux. On other operating systems, the output of file is unpredictable, except that it should contain the word "text" somewhere if the file "kind of looks like a text file of some sort, maybe". Here is the same file checked from two different systems:

imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo:            commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators

In a script, it's harder to say which method is the most reliable; anything you do is going to be a heuristic. In theory, a non-corrupt file created by a non-broken Unix utility contains only LFs, and a file created by a DOS utility contains no bare LFs (every LF is preceded by a CR). The following check relies on that second property:

# Bash / Ksh / Zsh

if grep -qv $'\r$' File; then
    echo 'File contains at least one newline not preceded by a CR'
else
    echo 'File contains only CRLFs (or is empty)'
fi
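The grep test above only separates "some bare LF" from "nothing but CRLFs". A slightly finer heuristic, sketched below as a hypothetical bash function, also recognizes old Mac files and files that mix conventions. It reads the whole file into a variable, so it assumes a reasonably small text file without NUL bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (a heuristic, not a definitive test): print "unix",
# "dos", "mac", "mixed" or "none" for the file named by $1.
line_ending_style() {
    local content rest
    # Slurp the file; the extra "x" stops $(...) from eating trailing newlines.
    content=$(cat -- "$1"; printf x)
    content=${content%x}
    # Delete every CR-LF pair; any CRs or LFs that remain are "bare".
    rest=${content//$'\r\n'/}

    if [[ $content != *$'\r'* && $content != *$'\n'* ]]; then
        echo none      # empty, or a single line with no terminator at all
    elif [[ $content != *$'\r'* ]]; then
        echo unix      # LFs only
    elif [[ $content != *$'\n'* ]]; then
        echo mac       # CRs only
    elif [[ $rest != *$'\r'* && $rest != *$'\n'* ]]; then
        echo dos       # every CR and every LF belongs to a CR-LF pair
    else
        echo mixed     # both CRs and LFs, but not only as CR-LF pairs
    fi
}

# Example (hypothetical file name):
# line_ending_style yourscript
```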

Converting files

ex is a good standard way to convert CRLF to LF, and probably one of the few reasonable methods for doing it in-place from a script:

# works with vim's ex but not vi's ex
ex -sc $'%s/\r$//e|x' file

# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file

# Using ed.
ed -s file <<< $'%s/\r$//g\nwq'
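If neither ex nor ed is installed, the same per-line substitution (drop a CR only when it ends a line) can be sketched in pure bash as a stream filter. This is a hypothetical helper, not a standard tool: it is much slower than the external utilities, and unlike ex or ed it cannot edit a file in place, so the output must go to a different file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to stdout, removing one CR from the end of
# each line (like %s/\r$// above). CRs elsewhere in a line are kept.
crlf_to_lf() {
    local line
    # "|| [[ -n $line ]]" also handles a last line that has no trailing LF;
    # such a line gets an LF added on output.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line%$'\r'}"
    done
}

# Example (hypothetical file names; they must be different files):
# crlf_to_lf < dosfile > unixfile
```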

Of course, any of the more powerful dynamic languages can do this with relative ease. For example, Perl can edit the file in place:

perl -pi -e 's/\r\n/\n/' filename

Some systems have special conversion tools available to do this automatically. dos2unix, recode, and fromdos are some examples.

It can also be done manually with an editor like nano (the -w option keeps nano from hard-wrapping long lines):

nano -w yourscript

Press Ctrl-O and, before confirming the file name, press Alt-D (DOS) or Alt-M (Mac) to toggle that format off, so that the file is saved with Unix line endings.

Or, in Vim, use :set fileformat=unix and save with :w. Also make sure the value of fenc (fileencoding) is correct (probably utf-8).

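To strip every CR from an input stream, wherever it occurs (not just at the end of a line), you can use tr -d '\r' <infile >outfile. Of course, infile and outfile must not be the same file: the shell truncates outfile before tr reads anything, so the data would be lost.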
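One way to respect that rule is to write to a temporary file first and then copy the result back. The sketch below is a hypothetical helper that assumes mktemp is available. Copying back with cat, rather than moving the temporary file with mv, keeps the original file's permissions (such as a script's execute bit), at the cost of not being atomic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every CR from the file named by $1.
strip_crs() {
    local f=$1 tmp rc
    tmp=$(mktemp) || return 1
    # tr reads the original and writes a separate file, so the shell never
    # truncates the file tr is still reading. The result is then copied back
    # with cat (not mv), so the original keeps its inode, permissions and owner.
    tr -d '\r' < "$f" > "$tmp" && cat -- "$tmp" > "$f"
    rc=$?
    rm -f -- "$tmp"
    return "$rc"
}

# Example (hypothetical file name):
# strip_crs yourscript
```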

- Revision 4 as of 2008-08-08 02:41:35
  Size: 1958
  Editor: GreyCat
  Comment: explain why
+ Revision 27 as of 2021-12-27 19:13:27
  Size: 3499
  Editor: GreyCat
  Comment: tr doesn't support filename arguments
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-[[Anchor(faq52)]]
+<<Anchor(faq52)>>
 Line 3:
+Carriage return characters (CRs) are used in line ending markers on some systems.  There are three different kinds of line endings in common use:
-Line 4:
+Line 5:
-Carriage return characters (CRs) are used in line ending markers on some systems.  There are three different kinds of line endings in common use:
 Line 9:
-If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems.  You can check the kind of line endings in use by running:
+If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems.

=== Testing for line terminator type ===

A simple check is to look at the output of `sed -n l`:
-Line 11:
+Line 15:
-cat -e yourscript}}}

If you see something like this:
{{{
command^M$
^M$
another command^M$}}}

then you need to remove the CRs.  There are a plethora of ways to do this.

All these are from the [http://www.student.northpark.edu/pemente/sed/sed1line.txt sed one-liners page]:
{{{
sed 's/.$//' dosfile              # assumes that all lines end with CR/LF
sed 's/^M$//' dosfile             # in bash/tcsh, press Ctrl-V then Ctrl-M
sed 's/\x0D$//' dosfile           # GNUism - does not work with Unix sed!
+sed -n l yourscript
-Line 28:
+Line 18:
-If you want to remove all CRs regardless of whether they are at the end of a line, you can use {{{tr}}}:
+which should write the script in one of these formats:

|| '''LF (Unix)''' || '''CR-LF (DOS/Windows)''' || '''CR (Old Mac OS)''' ||
||<style="border-bottom:none;"> command$ ||<style="border-bottom:none;"> command\r$ ||<style="border-bottom:none;"> command\r\ranother command\r$ ||
||<style="border-top:none; border-bottom:none"> $ ||<style="border-top:none; border-bottom:none;"> \r$ ||<style="border-top:none; border-bottom:none;"> ||
||<style="border-top:none;"> another command$ ||<style="border-top:none;"> another command\r$ ||<style="border-top:none;"> ||

Another method is to use the `file` utility if available to guess at the file type:
-Line 31:
+Line 28:
-tr -d '\r' < dosfile
+file yourscript
-Line 34:
+Line 31:
-If you want to use the second {{{sed}}} example above, but without embedding a literal CR into your script:
+The output tells you whether the text contains CRs (for example, "with CRLF line terminators").  Note: this is only true on GNU/Linux.  On other operating systems, the output of `file` is unpredictable, except that it should contain the word "text" somewhere if the file "kind of looks like a text file of some sort, maybe".
-Line 37:
+Line 34:
-sed $'s/\r$//' dosfile            # BASH only
+imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo:            commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators
-Line 40:
+Line 41:
-All of the previous examples write the modified file to standard output.  Redirect the output to a new file, and then {{{mv}}} it over top of the original.
+In a script, it's harder to say which method is the most reliable; anything you do is going to be a heuristic. In theory, a non-corrupt file created by a non-broken UNIX utility contains only LFs, and a file created by a DOS utility contains no bare LFs (every LF is preceded by a CR).
-Line 42:
+Line 43:
-There are many more ways:
 * Some systems have a {{{dos2unix}}} command which can do this.  Or {{{recode}}}, or {{{fromdos}}}.
 * In {{{vim}}}, you can use {{{:set fileformat=unix}}} to do it.
 * You can use Perl:
  {{{
  perl -pi -e 's/\r\n/\n/' filename}}}
 This has the advantage of overwriting the original file, so you don't have to mess with temporary files.
+{{{#!highlight bash
# Bash / Ksh / Zsh

if grep -qv $'\r$' File; then
    echo 'File contains at least one newline not preceded by a CR'
else
    echo 'File contains only CRLFs (or is empty)'
fi
}}}

=== Converting files ===

`ex` is a good standard way to convert CRLF to LF, and probably one of the few reasonable methods for doing it in-place from a script:

{{{#!highlight bash numbers=disable
# works with vim's ex but not vi's ex
ex -sc $'%s/\r$//e|x' file

# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file
}}}

{{{#!highlight bash numbers=disable
# Using ed.
ed -s file <<< $'%s/\r$//g\nwq'
}}}

Of course, any of the more powerful dynamic languages can do this with relative ease.

{{{
perl -pi -e 's/\r\n/\n/' filename
}}}

Some systems have special conversion tools available to do this automatically. `dos2unix`, `recode`, and `fromdos` are some examples.

It can also be done manually with an editor like nano:

{{{
nano -w yourscript
}}}
Type Ctrl-O and before confirming, type Alt-D (DOS) or Alt-M (Mac) to change the format.

Or in Vim, use `:set fileformat=unix` and save with `:w`. Ensure the value of `fenc` is correct (probably utf-8).

To simply strip all CRs from some input stream, you can use `tr -d '\r' <infile >outfile`. Of course, you must ensure these are not the same file.