Differences between revisions 14 and 17 (spanning 3 versions)
Revision 14 as of 2012-12-03 07:49:27
Size: 2331
Editor: 37
Comment:
Revision 17 as of 2013-07-11 16:31:27
Size: 3039
Editor: herngaard
Comment:

How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?

Carriage return characters (CRs) are used in line ending markers on some systems. There are three different kinds of line endings in common use:

  • Unix systems use Line Feeds (LFs) only.
  • MS-DOS and Windows systems use CR-LF pairs.
  • Old Macintosh systems use CRs only.
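To see the three conventions at the byte level, here is a minimal sketch that writes the same two lines in each style. The function names are made up for illustration; piping any of them through od -c shows the terminators explicitly.

```shell
# Hypothetical helpers: emit the same two lines with each kind of terminator.
sample_unix() { printf 'one\ntwo\n'; }       # LF only
sample_dos()  { printf 'one\r\ntwo\r\n'; }   # CR-LF pairs
sample_mac()  { printf 'one\rtwo\r'; }       # CR only

# Inspect the raw bytes of one style, for example:
#   sample_dos | od -c
```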

If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems.

Testing for line terminator type

A simple check is to look at the output of cat -e:

cat -e yourscript

If you see something like this, then you're dealing with CRLF style newlines:

command^M$
^M$
another command^M$

Another method is to use the file utility to guess at the file type:

file yourscript

If the file contains CRs, the output will say so. Note: this is only true on GNU/Linux. On other operating systems, the output of file is unpredictable, except that it should contain the word "text" somewhere if the file "kind of looks like a text file of some sort, maybe".

imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo:            commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators
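Because the wording of file's output varies between systems, a script may prefer to look for the CR bytes directly. The following is a minimal sketch, not a definitive test: has_crlf is a hypothetical name, and it only reports whether at least one line ends in CR-LF.

```shell
# has_crlf FILE: succeed if at least one line of FILE ends in CR LF.
# Hypothetical helper; the CR is produced with printf so that this
# also runs in shells without $'...' quoting.
has_crlf() {
    cr=$(printf '\r')
    # A CR immediately before the end of a line means a CR-LF terminator.
    grep -q -e "${cr}\$" -- "$1"
}
```

It could be used as: if has_crlf yourscript; then echo 'CRLF line endings found'; fi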

In a script, it's more difficult to say what the most reliable method should be. Anything you do is going to be a heuristic. In theory a non-corrupt file created by a non-broken UNIX utility should only contain LFs, and in a file created by a DOS utility, there should be no bare LFs not preceded by a CR. You can check this with a pattern match: a PCRE in ksh93, or a plain glob in Bash.

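# Bash / Ksh

# [[ $(</dev/fd/0) == ~(P)(?<!$'\r')$'\n' ]] # ksh93 PCRE
# The glob matches when some LF follows a character other than CR.
# (An LF at the very start of the input is not detected by the glob.)
pattern=*[^$'\r']$'\n'*
if [[ $(</dev/fd/0) == $pattern ]] <<<$'foo\r\nbar\r\nbaz\r'; then
    printf '%s\n' 'File contains at least one newline not preceded by a CR'
else
    printf '%s\n' 'File contains only CRLFs'
fi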
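The same heuristic can be extended to tell apart pure LF, pure CR-LF, and mixed files. This is only a sketch under stated assumptions: line_ending_kind is a hypothetical helper, it reads the file line by line (slow for large files), and a final line without a terminating LF is not counted.

```shell
# line_ending_kind FILE: print "lf", "crlf", "mixed" or "none".
# Hypothetical helper. The body runs in a subshell so its variables
# do not leak into the caller.
line_ending_kind() (
    [ -r "$1" ] || exit 1          # leaves only the subshell
    cr=$(printf '\r')
    total=0 crlf=0
    # read fails on a final unterminated line, so only LF-terminated
    # lines are counted.
    while IFS= read -r line; do
        total=$((total + 1))
        case $line in
            *"$cr") crlf=$((crlf + 1)) ;;
        esac
    done < "$1"
    if [ "$total" -eq 0 ]; then echo none      # no LF at all (e.g. old Mac CR-only)
    elif [ "$crlf" -eq 0 ]; then echo lf
    elif [ "$crlf" -eq "$total" ]; then echo crlf
    else echo mixed
    fi
)
```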

Converting files

ex is a good standard way to convert CRLF to LF, and probably one of the few reasonable methods for doing it in-place from a script:

# works with vim's ex but not vi's ex
ex -sc $'%s/\r$//e|x' file

# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file

Of course, any of the more powerful dynamic languages can do this with relative ease.

perl -pi -e 's/\r\n/\n/' filename

Some systems have special conversion tools available to do this automatically. dos2unix, recode, and fromdos are some examples.

It can also be done manually with an editor like nano:

nano -w yourscript

Type Ctrl-O and before confirming, type Alt-D (DOS) or Alt-M (Mac) to change the format.

Or in Vim, use :set fileformat=unix and save with :w. Ensure the value of fenc is correct (probably utf-8).

To simply strip all CRs from some input stream, you can use tr -d '\r' <infile >outfile. Of course, you must ensure these are not the same file.
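When the result does have to end up in the same file, one option is to go through a temporary file. A minimal sketch, assuming mktemp is available (it is common but not specified by POSIX); strip_cr_inplace is a hypothetical name. Note that tr -d '\r' removes every CR, including any that are not followed by an LF.

```shell
# strip_cr_inplace FILE: remove every CR from FILE via a temporary copy.
# Hypothetical helper. The body runs in a subshell so the trap and the
# tmp variable stay local to it.
strip_cr_inplace() (
    tmp=$(mktemp) || exit 1
    trap 'rm -f "$tmp"' EXIT
    # Write the cleaned data to the temporary file first, then copy it
    # back with cat rather than mv, so the original file keeps its
    # permissions and ownership.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
)
```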

BashFAQ/052 (last edited 2022-01-30 01:59:53 by larryv)