Differences between revisions 18 and 29 (spanning 11 versions)
Revision 18 as of 2013-07-27 07:10:16
Size: 3056
Comment: `sed -n l` is the POSIX equivalent of `cat -e`.
Revision 29 as of 2022-01-30 01:59:53
Size: 3518
Editor: larryv
Comment: minor clarifications, stylistic revisions

How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?

Carriage return (CR) characters are used in line ending markers on some systems. There are three different kinds of line endings in common use:

  • Unix systems use line feed (LF) characters only.
  • MS-DOS and Windows systems use CR-LF pairs.
  • Old Macintosh systems use CRs only.

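If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only). Otherwise the shell treats each CR as part of the line it ends, which typically produces errors such as "$'\r': command not found".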

Testing for line terminator type

A simple check is to look at the output of sed -n l:

sed -n l yourscript

This should output the script in one of these formats:

LF (Unix)         CR-LF (DOS/Windows)   CR (Old Mac OS)
command$          command\r$            command\r\ranother command\r$
$                 \r$
another command$  another command\r$
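To see these formats without hunting for a real file, you can feed sed a small sample in each style. This is only an illustrative sketch: the sample text is made up, and the exact escaping of other non-printing characters can vary between sed implementations.

```shell
# Generate a three-line sample in each line-ending style and show it
# the way "sed -n l" does. The sample text is arbitrary.

# Unix: LF only
printf 'command\n\nanother command\n' | sed -n l

# DOS/Windows: CR-LF pairs
printf 'command\r\n\r\nanother command\r\n' | sed -n l

# Old Mac OS: CR only (sed sees a single line, since there is no LF)
printf 'command\r\ranother command\r' | sed -n l
```

Each command's output should match the corresponding column above.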

Another method is to guess at the file type using the file utility, if available:

file yourscript

On GNU/Linux, the output of file tells you whether a text file contains CRs (for example, "with CRLF line terminators"). On other operating systems, the output is unpredictable, except that it should contain the word "text" somewhere if the input "kind of looks like a text file of some sort, maybe".

imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo:            commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators

In a script, it's more difficult to say what the most reliable method should be. Anything you do is going to be a heuristic. In theory, a non-corrupt file created by a non-broken UNIX utility should only contain LFs, and one created by a DOS utility should not contain any LFs that are not preceded by a CR.

# Bash / Ksh / Zsh

if grep -qv $'\r$' File; then
    echo 'File contains at least one newline not preceded by a CR'
else
    echo 'File contains only CRLFs (or is empty)'
fi
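The same grep test can be stretched into a rough classifier by comparing the number of lines that end in a CR with the total number of lines. This is a sketch, not a definitive method: classify_eol is a hypothetical helper name, and it is still a heuristic. In particular, a CR-only (old Mac OS) file that ends in a CR looks like a single CR-terminated line to grep, so it is reported as "crlf".

```shell
# classify_eol FILE
# Prints one of: empty, lf, crlf, mixed.
# (classify_eol is a hypothetical name used for illustration.)
classify_eol() {
    local file cr total crlf
    file=$1
    [ -r "$file" ] || return 2

    # A literal CR; printf is used so this also works in shells without $'...'.
    cr=$(printf '\r')

    # grep -c '' counts every line, including a final line with no newline.
    total=$(grep -c '' -- "$file" || true)
    # Count the lines whose last character is a CR.
    crlf=$(grep -c "${cr}\$" -- "$file" || true)

    if [ "$total" -eq 0 ]; then
        echo empty
    elif [ "$crlf" -eq 0 ]; then
        echo lf
    elif [ "$crlf" -eq "$total" ]; then
        echo crlf
    else
        echo mixed
    fi
}
```

For example, classify_eol yourscript prints "mixed" when only some of the lines end in CR-LF, which usually means the file was edited on more than one system.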

Converting files

ex is a good standard way to convert CRLF to LF, and probably one of the few reasonable methods for doing it in-place from a script:

# works with vim's ex but not vi's ex
ex -sc $'%s/\r$//e|x' file

# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file

# Using ed.
ed -s file <<< $'%s/\r$//g\nwq'

Of course, more powerful dynamic languages can do this with relative ease.

perl -pi -e 's/\r\n/\n/' filename

Some systems have special conversion tools available to do this automatically. dos2unix, recode, and fromdos are some examples.

It can also be done manually with an editor like nano:

nano -w yourscript

Type Ctrl-O and, before confirming the file name, type Alt-D (DOS) or Alt-M (Mac) to change the format.

Or in Vim, use :set fileformat=unix and save with :w. Ensure the value of fenc is correct (probably "utf-8").

To simply strip all CRs from some input stream, you can use tr -d '\r' <infile >outfile. Of course, you must ensure that infile and outfile are not the same file.
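If you do want the result to end up in the original file, one way to keep the two files separate is to write to a temporary file first and copy it back only if tr succeeded. The following is a minimal sketch under that assumption; strip_cr_in_place is a hypothetical helper name, and copying back with cat (rather than mv) is a design choice that keeps the original file's permissions and inode at the cost of not being atomic.

```shell
# strip_cr_in_place FILE
# Removes every CR from FILE by going through a temporary file.
# (strip_cr_in_place is a hypothetical name used for illustration.)
strip_cr_in_place() {
    local file tmp status
    file=$1
    tmp=$(mktemp) || return 1

    # Only overwrite the original if tr finished successfully.
    tr -d '\r' < "$file" > "$tmp" && cat -- "$tmp" > "$file"
    status=$?

    rm -f -- "$tmp"
    return "$status"
}
```

Like the tr command itself, this strips all CRs, not just those that come right before an LF.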

BashFAQ/052 (last edited 2022-01-30 01:59:53 by larryv)