Revision 29 as of 2022-01-30 01:59:53
How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?
Carriage return (CR) characters are used in line ending markers on some systems. There are three different kinds of line endings in common use:
- Unix systems use line feed (LF) characters only.
- MS-DOS and Windows systems use CR-LF pairs.
- Old Macintosh systems use CRs only.
If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems.
Testing for line terminator type
A simple check is to look at the output of sed -n l:
sed -n l yourscript
This should output the script in one of these formats:
LF (Unix) | CR-LF (DOS/Windows) | CR (Old Mac OS)
command$ | command\r$ | command\r\ranother command\r$
$ | \r$ |
another command$ | another command\r$ |
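To see these formats on your own system, the following sketch writes one small sample file per convention and runs sed -n l on each. The temporary directory and the file names are hypothetical, and mktemp -d is assumed to be available (it is common but not required by POSIX).

```shell
# Sample files, one per line-ending convention (names are hypothetical).
dir=$(mktemp -d)
printf 'command\n\nanother command\n'       > "$dir/unix.txt"
printf 'command\r\n\r\nanother command\r\n' > "$dir/dos.txt"
printf 'command\r\ranother command\r'       > "$dir/mac.txt"

# Show each file with non-printing characters made visible.
for f in unix dos mac; do
    printf '== %s ==\n' "$f"
    sed -n l "$dir/$f.txt"
done
```

Each section of the output should correspond to one column of the table above. The old Mac sample has no LF at all, so sed treats it as a single line.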
Another method is to guess at the file type using the file utility, if available:
file yourscript
On GNU/Linux, the output of file tells you whether the text contains CR line terminators. On other operating systems, the output is unpredictable, except that it should contain the word "text" somewhere if the input "kind of looks like a text file of some sort, maybe".
imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo: commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators
In a script, it's more difficult to say what the most reliable method should be. Anything you do is going to be a heuristic. In theory, a non-corrupt file created by a non-broken UNIX utility should only contain LFs, and one created by a DOS utility should not contain any LFs that are not preceded by a CR.
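One way to apply that heuristic is to ask grep whether any line fails to end with a CR. This is a minimal sketch, not a definitive test: the function name check_line_endings is hypothetical, and the CR is produced with printf so that the snippet does not depend on the $'...' quoting of Bash, Ksh, and Zsh.

```shell
# check_line_endings is a hypothetical helper name.
# Prints which case the file falls into, based on one grep test.
check_line_endings() {
    cr=$(printf '\r')    # a literal carriage return
    # grep -v selects lines that do NOT end with CR; -q only sets the exit status.
    if grep -qv "${cr}\$" "$1"; then
        echo 'at least one line does not end with CR'
    else
        echo 'every line ends with CR (CRLF file, or empty)'
    fi
}
```

Call it as check_line_endings yourscript. A file that uses bare CRs with no LF at all is seen by grep as a single line ending in CR, so this check cannot tell it apart from a CRLF file.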
Converting files
ex is a good standard way to convert CRLF to LF, and probably one of the few reasonable methods for doing it in-place from a script:
# works with vim's ex but not vi's ex
ex -sc $'%s/\r$//e|x' file
# works with vi's ex but not vim's ex
ex -sc $'%s/\r$//|x' file
# Using ed.
ed -s file <<< $'%s/\r$//g\nwq'
Of course, more powerful dynamic languages can do this with relative ease.
perl -pi -e 's/\r\n/\n/' filename
Some systems have special conversion tools available to do this automatically. dos2unix, recode, and fromdos are some examples.
It can also be done manually with an editor like nano:
nano -w yourscript
Type Ctrl-O and, before confirming, type Alt-D (DOS) or Alt-M (Mac) to toggle the format.
Or in Vim, use :set fileformat=unix and save with :w. Ensure the value of fenc is correct (probably "utf-8").
To simply strip all CRs from some input stream, you can use tr -d '\r' <infile >outfile. Of course, you must ensure that infile and outfile are not the same file.
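If the result has to end up in the original file, the tr output can go through a temporary file first. The sketch below assumes mktemp is available; strip_cr is a hypothetical helper name, not a standard command.

```shell
# strip_cr is a hypothetical helper: remove every CR from the named file.
strip_cr() {
    file=$1
    tmp=$(mktemp) || return 1
    # Write to a temporary file first: redirecting tr's output straight
    # onto its own input would truncate the file before tr reads it.
    if tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"; then
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Call it as strip_cr yourscript. Copying the contents back with cat, rather than moving the temporary file over the original, keeps the original file's permissions and ownership, at the cost of the replacement not being atomic.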