How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?

Carriage return characters (CRs) are used in line ending markers on some systems. There are three different kinds of line endings in common use:

  • Unix systems use Line Feeds (LFs) only.
  • MS-DOS and Windows systems use CR-LF pairs.
  • Old Macintosh systems use CRs only.

If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems. The shell does not treat a CR as whitespace, so each CR becomes part of the last word on its line.
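To see the effect, here is a minimal sketch, assuming bash and mktemp are available. It writes a one-line script with a CR-LF terminator and runs it. The CR stays attached to echo's argument, so it appears in the output.

```shell
#!/usr/bin/env bash
# Create a one-line script with a DOS (CR-LF) line ending.
tmp=$(mktemp)
printf 'echo hello\r\n' > "$tmp"

# Run it. bash does not strip the CR, so echo's argument is "hello" plus a CR.
out=$(bash "$tmp")
rm -f -- "$tmp"

# Show the bytes; od -c prints the trailing CR as \r.
printf '%s' "$out" | od -c
```

The same thing happens to command names and file names, which is why CR-LF scripts tend to fail with confusing "not found" errors.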

Testing for line terminator type

A simple check is to look at the output of sed -n l:

sed -n l yourscript

which should write the script in one of these formats:

LF (Unix):

command$
$
another command$

CR-LF (DOS/Windows):

command\r$
\r$
another command\r$

CR (Old Mac OS):

command\r\ranother command\r$
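The three cases above can be reproduced with a sketch like the following, assuming bash and a sed whose l command writes a CR as \r and marks each line end with $, as POSIX specifies. The file names lf.txt, crlf.txt and cr.txt are arbitrary.

```shell
#!/usr/bin/env bash
# Write the same content with each kind of line terminator.
dir=$(mktemp -d)
printf 'command\n\nanother command\n'       > "$dir/lf.txt"    # Unix
printf 'command\r\n\r\nanother command\r\n' > "$dir/crlf.txt"  # DOS/Windows
printf 'command\r\ranother command\r'       > "$dir/cr.txt"    # old Mac OS

# sed -n l shows each line unambiguously: \r for CR, $ for end of line.
for f in lf crlf cr; do
    printf '== %s\n' "$f"
    sed -n l "$dir/$f.txt"
done
```

Remove the directory with rm -r -- "$dir" when you are done experimenting.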

Another method, if the file utility is available, is to use it to guess at the file type:

file yourscript

On GNU/Linux, the output says whether the text has CR line terminators, if it does. On other operating systems, the output of file is unpredictable, except that it should contain the word "text" somewhere if the file "kind of looks like a text file of some sort, maybe".

imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo:            commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators

In a script, it's more difficult to say what the most reliable method should be. Anything you do is going to be a heuristic. In theory, a non-corrupt file created by a non-broken UNIX utility should contain only LFs, and a file created by a DOS utility should contain no bare LFs (LFs not preceded by a CR).

# Bash / Ksh / Zsh

# -v selects lines that do not end in a CR; -q only reports whether any exist.
if grep -qv $'\r$' File; then
    echo 'File contains at least one newline not preceded by a CR'
else
    echo 'File contains only CRLFs (or is empty)'
fi
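The same grep test can be extended into a small classifier. This is a sketch using a hypothetical function name, line_endings, assuming bash and a grep that treats CR as an ordinary character. Like any check here it is a heuristic: an old Mac OS file with CRs only looks to grep like a single line ending in CR, so it is reported as dos.

```shell
#!/usr/bin/env bash
# line_endings FILE: print "unix", "dos" or "mixed" (a heuristic guess).
line_endings() {
    local f=$1
    if ! grep -q $'\r' "$f"; then
        echo unix    # no CR anywhere (also the result for an empty file)
    elif grep -qv $'\r$' "$f"; then
        echo mixed   # CRs present, but at least one line does not end in CR
    else
        echo dos     # every line ends in CR
    fi
}
```

For example, line_endings yourscript prints dos when every line of the file ends in CR-LF.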

Converting files

ex is a good standard way to convert CRLF to LF, and probably one of the few reasonable methods for doing it in-place from a script:

# works with vim's ex but not vi's ex (vim's e flag suppresses the error when no line matches)
ex -sc $'%s/\r$//e|x' file

# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file

# Using ed.
ed -s file <<< $'%s/\r$//g\nwq'

Of course, any of the more powerful dynamic languages can do this with relative ease.

perl -pi -e 's/\r\n/\n/' filename

Some systems have special conversion tools available to do this automatically. dos2unix, recode, and fromdos are some examples.

It can also be done manually with an editor such as nano:

nano -w yourscript

Type Ctrl-O and, before confirming, press Alt-D (if the file is in DOS format) or Alt-M (if it is in Mac format) to toggle that format off, so the file is written with Unix line endings.

Or, in Vim, use :set fileformat=unix and save with :w. Also make sure the value of fenc (fileencoding) is correct (probably utf-8).

To simply strip all CRs from some input stream, you can use tr -d '\r' <infile >outfile. Of course, you must ensure these are not the same file: the shell truncates outfile before tr starts reading, so the data would be lost.
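When the goal is to rewrite the file itself, one way to respect that rule is to let tr read from a temporary copy. The following is a sketch using a hypothetical function name, strip_crs, assuming bash and mktemp. Writing back into the original file (rather than moving the temporary file over it) keeps the original's inode, so its permissions, owner and hard links are preserved.

```shell
#!/usr/bin/env bash
# strip_crs FILE: delete every CR from FILE, rewriting it in place.
strip_crs() {
    local f=$1 tmp
    tmp=$(mktemp) || return 1
    # tr must not read and write the same file, so it reads from a copy.
    if cp -- "$f" "$tmp" && tr -d '\r' < "$tmp" > "$f"; then
        rm -f -- "$tmp"
    else
        printf 'strip_crs: failed on %s; temporary copy left in %s\n' "$f" "$tmp" >&2
        return 1
    fi
}
```

Like the tr command above, this removes every CR, including any that are not part of a CR-LF pair.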

BashFAQ/052 (last edited 2022-01-30 01:59:53 by larryv)