How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?
Carriage return characters (CRs) are used in line ending markers on some systems. There are three different kinds of line endings in common use:
- Unix systems use Line Feeds (LFs) only.
- MS-DOS and Windows systems use CR-LF pairs.
- Old Macintosh systems use CRs only.
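If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems. The shell treats a CR as an ordinary character, so it becomes part of the last word on the line. For example, a shebang line #!/bin/bash followed by a CR makes the system look for an interpreter named "/bin/bash\r", which typically fails with a "bad interpreter" error.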
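The sketch below shows the effect in bash. The value simulates a line read from a DOS-format file. Command substitution strips trailing newlines (LFs) but not the CR, so the CR stays attached to the text.

```shell
#!/bin/bash
# Simulate a line read from a DOS-format file: "abc" followed by CR-LF.
value=$(printf 'abc\r\n')

# $(...) removed the LF but kept the CR, so the value is 4 characters long.
printf 'length: %d\n' "${#value}"

# A comparison against the expected text therefore fails.
if [[ $value == abc ]]; then
  echo 'matches'
else
  echo 'does not match: the CR is still attached'
fi
```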
Testing for line terminator type
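A simple check is to look at the output of sed -n l, which shows each CR as \r and marks the end of every line with $:
sed -n l yourscript
For a script consisting of the line "command", an empty line and the line "another command", the output looks like one of these. With CR-only line endings there is no LF at all, so sed sees the whole file as a single line: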
LF (Unix)         | CR-LF (DOS/Windows) | CR (Old Mac OS)
------------------|---------------------|------------------------------
command$          | command\r$          | command\r\ranother command\r$
$                 | \r$                 |
another command$  | another command\r$  |
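You can try this without a real DOS file by generating the sample with printf. The following is a minimal sketch. It assumes a sed whose l command prints a CR as \r (GNU sed does) and that mktemp is available for the scratch directory.

```shell
#!/bin/bash
# Scratch directory for the sample files; removed when the script exits.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

# The same content ("command", an empty line, "another command")
# written with each kind of line ending.
printf 'command\n\nanother command\n'       > "$dir/unix.txt"
printf 'command\r\n\r\nanother command\r\n' > "$dir/dos.txt"
printf 'command\r\ranother command\r'       > "$dir/mac.txt"

# Show each file the way sed -n l sees it.
for f in unix dos mac; do
  echo "== $f"
  sed -n l "$dir/$f.txt"
done
```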
Another method is to use the file utility if available to guess at the file type:
file yourscript
On GNU/Linux, the output says whether the text contains CRs, for example "ASCII text, with CRLF line terminators". This is not portable. On other operating systems the output of file is unpredictable, except that it should contain the word "text" somewhere if the file "kind of looks like a text file of some sort, maybe". Here is the same file checked on two different systems:
imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo: commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators
In a script, it's harder to say which method is most reliable; anything you do is going to be a heuristic. In theory, a non-corrupt file created by a working Unix utility contains only LFs. In a file created by a DOS utility, every LF should be preceded by a CR.
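One such heuristic is to ask grep whether any line does not end in a CR. It works in Bash, Ksh or Zsh, because it relies on $'...' quoting. The function name check_line_endings is hypothetical, and the sample files are only for demonstration.

```shell
#!/bin/bash
# check_line_endings: hypothetical helper (Bash/Ksh/Zsh, for $'\r$').
check_line_endings() {
  # grep -v selects lines that do NOT end in a CR; -q only sets the status.
  if grep -qv $'\r$' "$1"; then
    echo 'File contains at least one newline not preceded by a CR'
  else
    echo 'File contains only CRLFs (or is empty)'
  fi
}

dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf 'one\r\ntwo\r\n' > "$dir/dos.txt"
printf 'one\r\ntwo\n'   > "$dir/mixed.txt"

check_line_endings "$dir/dos.txt"
check_line_endings "$dir/mixed.txt"
```

Note that an empty file lands in the "only CRLFs" branch, so treat the result as a hint rather than proof.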
Converting files
ex is a good, standard way to convert CRLF to LF, and probably one of the few reasonable ways to do it in place from a script. The exact command depends on which ex you have, and the $'...' quoting requires Bash, Ksh or Zsh:
# works with vim's ex but not vi's ex
ex -sc $'%s/\r$//e|x' file
# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file
# Using ed.
ed -s file <<< $'%s/\r$//g\nwq'
Of course, any of the more powerful dynamic languages can do this with relative ease. For example, with Perl:
perl -pi -e 's/\r\n/\n/' filename
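If neither Perl nor a conversion tool is available, bash alone can remove a trailing CR from each line using parameter expansion. This is a sketch rather than a recommendation, since a read loop is much slower than the external tools on large files. The name strip_trailing_cr is hypothetical.

```shell
#!/bin/bash
# strip_trailing_cr: hypothetical filter that reads stdin, writes stdout,
# and removes at most one CR from the end of each line.
strip_trailing_cr() {
  local line
  # IFS= keeps leading/trailing blanks, -r keeps backslashes literal, and
  # the [[ -n $line ]] test still handles a last line with no newline.
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "${line%$'\r'}"
  done
}

printf 'first\r\nsecond\r\n' | strip_trailing_cr | sed -n l
# first$
# second$
```

A CR in the middle of a line is left alone, unlike tr -d '\r'.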
Some systems have special conversion tools available to do this automatically. dos2unix, recode, and fromdos are some examples.
It can also be done manually with an editor such as nano (-w stops nano from hard-wrapping long lines):
nano -w yourscript
Press Ctrl-O and, before confirming the file name, press Alt-D (DOS) or Alt-M (Mac) to toggle that format. With both formats switched off, nano writes Unix line endings.
Or, in Vim, use :set fileformat=unix and save with :w. Also check that fenc (fileencoding) is correct, probably utf-8, since Vim uses it when writing the file.
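To strip all CRs from an input stream, wherever they appear and not just at line ends, use tr -d '\r' <infile >outfile. tr does not accept file name arguments, so the input must come through redirection. Make sure infile and outfile are different files. The shell truncates outfile before tr starts reading, so using the same name would leave you with an empty file.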
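To convert a file in place with tr, you therefore need a temporary file. This is a minimal sketch that assumes mktemp is available; the function name strip_crs_in_place is hypothetical. The result is copied back with cat instead of being moved with mv. That overwrites the original file's contents, so it keeps its permissions and owner, and a script stays executable.

```shell
#!/bin/bash
# strip_crs_in_place: hypothetical helper; removes every CR from file "$1".
strip_crs_in_place() {
  local file=$1 tmp status
  tmp=$(mktemp) || return 1
  # Convert into the temporary file, then copy the result back over the
  # original's contents (the inode, mode and owner stay the same).
  tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
printf 'echo hi\r\n' > "$dir/demo.sh"
chmod +x "$dir/demo.sh"

strip_crs_in_place "$dir/demo.sh"
sed -n l "$dir/demo.sh"   # echo hi$
```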