How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?
Carriage return characters (CRs) are used in line ending markers on some systems. There are three different kinds of line endings in common use:
- Unix systems use Line Feeds (LFs) only.
- MS-DOS and Windows systems use CR-LF pairs.
- Old Macintosh systems use CRs only.
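If you're running a script on a Unix system, its line endings need to be Unix ones (LFs only), or you will have problems. The shell treats each CR as an ordinary character, so it becomes part of the last word on every line, and a #! line ending in a CR names an interpreter that doesn't exist (typically reported as "bad interpreter").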
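To see the problem for yourself, the sketch below writes a two-line script with CR-LF endings to a temporary file, runs it with bash, and pipes the result through sed -n l so the stray CR becomes visible. The temporary file and the variable name greeting are only for illustration.

```shell
#!/bin/bash
# Demonstrates that a CR at the end of a line becomes part of the last word.
script=$(mktemp) || exit 1
trap 'rm -f -- "$script"' EXIT

# A two-line script saved with DOS (CR-LF) line endings.
printf 'greeting=hello\r\necho "${#greeting}"\r\n' > "$script"

# The variable's value is "hello" plus a CR, so its length is 6, not 5.
# echo's argument also ends in a CR; sed -n l shows it as \r.
bash "$script" | sed -n l
```

This prints 6\r$: the assignment kept the CR in the value, and the CR after the closing quote was glued onto echo's argument.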
Testing for line terminator type
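A simple check is to look at the output of sed -n l, which prints non-printing characters such as CR as escape sequences (\r) and marks the end of each line with $: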
sed -n l yourscript
This should show the script in one of the forms in the table below:
+--------------------+---------------------+-------------------------------+
| LF (unix)          | CR-LF (dos/windows) | CR (old mac)                  |
+--------------------+---------------------+-------------------------------+
|command$            |command\r$           |command\r\ranother command\r$  |
|$                   |\r$                  |                               |
|another command$    |another command\r$   |                               |
+--------------------+---------------------+-------------------------------+
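To reproduce the table, the sketch below writes the same three lines ("command", an empty line, "another command") with each kind of terminator into temporary files (the file names are arbitrary) and runs sed -n l on each. The CR-only file contains no LF at all, so sed sees it as a single line; this assumes a sed that, like GNU and BSD sed, still processes a final line that lacks an LF.

```shell
#!/bin/bash
# Write the same three lines with each style of terminator,
# then show each file with sed -n l.
dir=$(mktemp -d) || exit 1
trap 'rm -rf -- "$dir"' EXIT

printf 'command\n\nanother command\n'       > "$dir/unix.txt"
printf 'command\r\n\r\nanother command\r\n' > "$dir/dos.txt"
printf 'command\r\ranother command\r'       > "$dir/mac.txt"   # no LF at all

for style in unix dos mac; do
    echo "== $style"
    sed -n l "$dir/$style.txt"
done
```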
Another method, if it is available, is to use the file utility to guess at the file type:
file yourscript
On GNU/Linux, the output tells you whether the text has CR line terminators. On other operating systems, the output of file is unpredictable, except that it should contain the word "text" somewhere if the file "kind of looks like a text file of some sort, maybe".
imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo: commands text

arc3:~$ file foo
foo: ASCII text, with CRLF line terminators
In a script, it's harder to say which method is most reliable; anything you do is going to be a heuristic. In theory, a non-corrupt file created by a non-broken Unix utility should contain only LFs, and one created by a DOS utility should contain no bare LFs (every LF is preceded by a CR). The following check looks for any line that doesn't end in a CR:
# Bash / Ksh / Zsh
if grep -qv $'\r$' < File; then
    echo 'File contains at least one newline not preceded by a CR'
else
    echo 'File contains only CRLFs (or is empty)'
fi
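If a script needs to tell all the cases apart rather than just detect a bare LF, a slightly more detailed heuristic is to count CRs, LFs and CR-LF pairs. The function below is a hypothetical helper (line_ending_type is not a standard command) written for bash. It keeps only the CR and LF bytes in memory, which is fine for ordinary scripts but slow for very large files.

```shell
#!/bin/bash
# line_ending_type FILE
# Hypothetical helper: prints LF, CRLF, CR, MIXED or NONE for FILE.
# Like any check of this kind, it is only a heuristic.
line_ending_type() {
    local s only_crs no_pairs crs lfs pairs
    [[ -r $1 ]] || return 2

    # Keep only the CR and LF bytes. The trailing x stops the command
    # substitution from stripping trailing LFs; it is removed again below.
    s=$(tr -cd '\r\n' < "$1"; printf x)
    s=${s%x}

    only_crs=${s//$'\n'/}            # delete every LF, leaving the CRs
    crs=${#only_crs}
    lfs=$(( ${#s} - crs ))
    no_pairs=${s//$'\r\n'/}          # delete every CR-LF pair
    pairs=$(( (${#s} - ${#no_pairs}) / 2 ))

    if (( crs == 0 && lfs == 0 )); then
        echo NONE
    elif (( crs == 0 )); then
        echo LF
    elif (( lfs == 0 )); then
        echo CR
    elif (( crs == pairs && lfs == pairs )); then
        echo CRLF
    else
        echo MIXED
    fi
}

# Example: classify each file named on the command line.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(line_ending_type "$f")"
done
```

A file that has both LF-only and CR-LF lines, often the result of editing it on two different systems, is reported as MIXED.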
Converting files
ex is a good standard way to convert CRLF to LF, and probably one of the few reasonable methods for doing it in-place from a script:
# works with vim's ex but not vi's ex
# (the e flag, a Vim extension, suppresses the error when no line ends in a CR)
ex -sc $'%s/\r$//e|x' file

# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file
# Using ed.
ed -s file <<< $'%s/\r$//g\nwq'
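When the file doesn't have to be edited in place, a plain sed filter can do the same substitution on a stream. In the sketch below (crlf_to_lf is a hypothetical name), bash's $'...' quoting turns \r into a literal CR before sed sees it, so the command doesn't depend on sed understanding the \r escape. Only a CR at the very end of a line is removed; a CR in the middle of a line is kept.

```shell
#!/bin/bash
# crlf_to_lf: hypothetical helper that copies stdin to stdout, removing a CR
# only when it is the last character before the end of a line.
crlf_to_lf() {
    sed $'s/\r$//'
}

# Example: the first line loses its CR; the CR inside "two\rhalf" is kept.
printf 'one\r\ntwo\rhalf\r\n' | crlf_to_lf | sed -n l
```

This prints one$ and two\rhalf$. To convert a file, redirect to a different file, e.g. crlf_to_lf < infile > outfile.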
Of course, any of the more powerful dynamic languages can do this with relative ease. For example, with Perl:
perl -pi -e 's/\r\n/\n/' filename
Some systems have special conversion tools available to do this automatically. dos2unix, recode, and fromdos are some examples.
It can also be done manually with an editor like nano:
nano -w yourscript
Press Ctrl-O and, before confirming, press Alt-D (DOS) or Alt-M (Mac) to toggle that format off, so the file is saved with Unix line endings.
Or in Vim, use :set fileformat=unix and save with :w. Ensure the value of fenc is correct (probably utf-8).
To simply strip all CRs from some input stream, you can use tr -d '\r' <infile >outfile. Of course, you must ensure these are not the same file.
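Since tr can't edit a file in place, a common pattern is to write to a temporary file and copy the result back. The sketch below assumes a mktemp command, as found on GNU and BSD systems; strip_crs is a hypothetical name. Copying back with cat, rather than moving the temporary file over the original with mv, keeps the original's permissions, ownership and hard links (mktemp creates its file with mode 0600).

```shell
#!/bin/bash
# strip_crs FILE...: hypothetical helper that removes every CR from each file.
strip_crs() {
    local f tmp
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # tr reads the original and writes the temp file; redirecting tr's
        # output to the file it is reading would truncate it first.
        if ! tr -d '\r' < "$f" > "$tmp"; then
            rm -f -- "$tmp"
            return 1
        fi
        # Write the result back through the original file, so it keeps
        # its permissions, owner and hard links.
        if ! cat -- "$tmp" > "$f"; then
            echo "strip_crs: could not rewrite $f; converted copy left in $tmp" >&2
            return 1
        fi
        rm -f -- "$tmp"
    done
}

# Example usage on a throwaway file.
demo=$(mktemp) || exit 1
printf 'first\r\nsecond\r\n' > "$demo"
strip_crs "$demo"
sed -n l "$demo"
rm -f -- "$demo"
```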