= Arguments =

This topic describes what is probably the most important and most misunderstood topic about shell programming. It is absolutely vital that you '''understand''' everything that is explained here thoroughly before you do any important work in the shell. Misunderstanding what arguments are and how word splitting works will lead to unexpected bugs, even in code that you have tested and that appears to work well; in worse cases, it leads to severe corruption and data loss.

== Executing commands ==

A shell is an interface between you (or your script) and the kernel. It allows you to execute commands using simplified syntax as compared to invoking direct system calls. What your shell really does for you is translate its syntax into system calls. Because of this, it is important that we at least understand the basics of what happens when the shell is done reading your orders and begins doing your bidding.

Executing commands happens through the execve(2) system call. This call needs three pieces of information:

 * The file to execute: This can be a binary program or a script.
 * An array of arguments: A list of strings that tell the program what to do.
 * An array of environment variables.

To give context to a program and tell it what to do, we provide it with an array of arguments. That means we give it a series of strings. Each of these strings can contain '''any''' character (byte) except for a `NUL` byte. That means each argument can be a word, a sentence, or more.

If we, for example, wish to delete a certain ebook, we might invoke the `rm` program, providing it with the filenames of the ebook chapters we wish to remove:

{{{
Execute File:   [ "/bin/rm" ]
With arguments: [ "James, P.D. - Children of Men - Chapter 1.pdf" ]
                [ "James, P.D. - Children of Men - Chapter 2.pdf" ]
                ...
}}}

The `rm` command will use these two arguments to determine what to delete. In its case, it will unlink(2) each argument.

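
To see the argument array a program actually receives, a small helper can print each argument on its own line. This is only a sketch: `show_args` is a hypothetical function defined here for illustration, not a standard command, and the double quotes in the call are the shell quoting explained in the ''Quoting'' section.

```shell
# show_args: hypothetical helper that prints how many arguments it
# received, followed by each argument wrapped in brackets.
show_args() {
    printf '%d argument(s):\n' "$#"
    for arg in "$@"; do
        printf '  [%s]\n' "$arg"
    done
}

# Two arguments, each a complete filename with spaces and punctuation.
show_args "James, P.D. - Children of Men - Chapter 1.pdf" \
          "James, P.D. - Children of Men - Chapter 2.pdf"
```

With the call above, the helper reports two arguments, each holding one full filename, which mirrors the array that `rm` would be handed.
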
So remember: Arguments are '''strings''' of characters. Each of them can contain '''any''' character (other than a `NUL` byte), and we can pass several arguments when we execute a file.

== Shell Syntax ==

To make it easy for you to express yourself when asking the system to perform an operation, shells exist that translate their simplified syntax into system calls. It is imperative that we understand this syntax correctly if we are to avoid mistakes and bugs.

Shell syntax has been built to provide an intuitive way for us to communicate with the system. It uses techniques such as word splitting and English keywords to allow us to express our wishes in a language that closely resembles the way we would write to each other. Don't be fooled though: This syntax is very exact and shells are not humans; they can't guess at what you might mean if you don't express yourself clearly and unambiguously. '''Do not guess at shell syntax''' based on intuition. Understand it, and then write exactly what you mean.

To execute a simple `rm` command from the shell, we would use a statement as plain as:

{{{
rm myfile myotherfile
}}}

This would instruct the shell to delete (`r`e`m`ove) two files: `myfile` and `myotherfile`. How does the shell know this? How does it convert a sentence into system calls? The key to this is ''Word Splitting''.

=== Word Splitting ===

To a shell, whitespace is incredibly important. So don't be fooled into thinking a space or tab more or less won't make much of a difference. And don't assume that because whitespace isn't very relevant in C or Java, the same goes for your shell: Whitespace is ''vital'' to allowing your shell to understand you.

The shell takes your line of code and cuts it up into bits ''wherever there are sequences of syntactical whitespace''. The command above would be split up into the following:

{{{
rm myfile myotherfile
  ^      ^
[rm] [myfile] [myotherfile]
}}}

As you can see, ''all syntactical whitespace has been removed''.
There is no more whitespace left after word splitting is done with your line. We simply have three completely separate chunks of characters: One says `rm`, the next says `myfile`, and the last reads `myotherfile`.

The shell now uses these chunks to build its execve(2) system call. First, it uses `PATH` to find a directory that contains a file named `rm` (the first chunk). Then it builds an array of arguments to pass to this file from the other chunks. An execve(2) call is invoked, passing `/bin/rm` the last two chunks as context arguments. `rm` then unlink(2)s those files.

=== Quoting ===

Now, let's come back to our first example: We wanted to delete the chapter files of our ebook. Doing this from a shell seems problematic, because the chapter filenames contain whitespace. This is not a problem whatsoever for the system call, but it is a big problem for the shell. The shell already uses whitespace for something very important: Determining what chunks of our statement to pass as separate arguments.

If we were to naively tell the shell to delete the first chapter, without any thought or consideration for its syntax, this would happen:

{{{
rm James, P.D. - Children of Men - Chapter 1.pdf
  ^      ^    ^ ^        ^  ^   ^ ^       ^
[rm] [James,] [P.D.] [-] [Children] [of] [Men] [-] [Chapter] [1.pdf]
}}}

Your shell would be passing the `rm` program 9 filenames for deletion, none of which is the intended filename. `rm` would try to delete each filename. If you were unlucky, `rm` might by accident delete some of your files that you never intended to delete.

From the ''Word Splitting'' section above, you now know why the shell does this. But how do we help the shell to understand what we really wish to accomplish?

The problem is that whitespace is ''syntax'' to the shell. That means '''it has a meaning'''. We don't want the whitespace in our filename to ''mean'' anything to the shell; we just want it to be part of the chunk, just like any of the other characters.
Just like a normal, plain, happy byte. We want our whitespace to be '''literal whitespace'''.

Changing something from ''syntax'' into ''literal data'' involves one of two processes: Quoting or Escaping. Quoting our bytes is done by wrapping syntactical quotation marks around them. Escaping is done by preceding each syntactical byte with a syntactical backslash. Pay special attention to the word ''syntactical'': It means that these quotation marks and backslashes '''must not''' be literal; just like our whitespace above, they must themselves be ''unquoted'' and ''unescaped'' to remain syntactical.

{{{
# Escaped:
rm James,\ P.D.\ -\ Children\ of\ Men\ -\ Chapter\ 1.pdf

# Quoted:
rm James," "P.D." "-" "Children" "of" "Men" "-" "Chapter" "1.pdf

# But also valid is the cleaner:
rm "James, P.D. - Children of Men - Chapter 1.pdf"
  ^
[rm] [James, P.D. - Children of Men - Chapter 1.pdf]
}}}

Every byte that is embedded in syntactical quotes is no longer considered syntactical (with some quote-specific exceptions I won't go into now). What that means is that if we quote the string `foo bar`, each character in that string will lose any special purpose or meaning to the shell. The shell will see them as ordinary bytes and pass them along to the chunk it's working on.

Since the quotes (or backslashes) are syntactical, they are (just like the one syntactical space left) not included in any chunks. All literal bytes are included in chunks, however, which means we now get only two chunks: One with the command name, and another with the correct filename.

=== Parameter Expansions ===

You should understand arguments and quotes well now. Let's introduce another concept that is very popular in shell scripts, yet almost as often misunderstood.

Parameters are containers in memory that hold strings for us. We can later use these strings in shell commands without having to repeat the data: We "unload" the data from the memory containers into the statement.
This "unloading" is called ''expansion''; hence the term ''Parameter Expansion''.

A common type of parameter is the variable. Variables are parameters with a distinct name and are easy to assign data to. The name of a variable contains only alphanumeric characters and underscores, and does not begin with a digit. '''It does not contain a dollar sign'''.

Expanding a parameter occurs by prefixing it with a dollar sign. The act of expansion causes the data in this parameter to be ''injected'' into the current statement, almost as though you had typed the data in place of the parameter expansion yourself.

{{{
$ place=lawn
$ echo Welcome to my $place.
Welcome to my lawn.
}}}

It is '''vital''' to understand, however, that ''Quoting'' and ''Escaping'' are considered ''before'' parameter expansion happens, while ''Word Splitting'' is performed ''after''. That means that it remains absolutely vital that we ''quote our parameter expansions'', in case they expand to values that contain whitespace, which would then be word-split in the next step.

It is almost ''never'' desirable to put ''syntactical'' whitespace in parameters. Perhaps you want to include multiple chunks in one parameter; when this is necessary, it is important that you do '''NOT''' use a string parameter, but expand an ''array'' instead.

Here's what would happen if we expanded a parameter whose data contains whitespace into an unquoted statement:

{{{
book="Children of Men.pdf"
rm $book

# After parameter expansion:
rm Children of Men.pdf
  ^        ^  ^
[rm] [Children] [of] [Men.pdf]
}}}

==== Wordsplitting Happens After PE ====

Quoting the parameter expansion causes its data to expand inside of a quoted context, meaning its whitespace will lose its syntactical value and will become literal:

{{{
book="Children of Men.pdf"
rm "$book"

# After parameter expansion.
# The [ and ] are pseudo-code; they are not really there, but symbolize
# that these bytes were marked as literal by the quotes above:
rm [Children of Men.pdf]
  ^
[rm] [Children of Men.pdf]
}}}

==== Quoting Happens Before PE ====

Another common mistake many people make when they see word-splitting errors is to try to include quotes inside their parameter data. This doesn't work, for the simple reason that quotes inside parameters are literal quotes: by the time bash expands parameter values, it has already determined which bytes in the statement need to be literalized later on:

{{{
book='"Children of Men.pdf"'
rm $book

# After parameter expansion.
# The quotes here are LITERAL quotes, NOT syntactical. There are no [ and ]
# because there were no quotes in the rm command above to tell bash to
# literalize any bytes:
rm "Children of Men.pdf"
  ^         ^  ^
[rm] ["Children] [of] [Men.pdf"]
}}}

Note that since you've expanded literal quotes, these quotes are now also part of the chunks, just like any other literal bytes. The whitespace, however, is not literal: It was not quoted or escaped by any syntactical quotes, so the word-splitting step that occurs after parameter expansion has its way with it.

= Conclusion =

This may be a bit much for you to grasp all at once, and grasp it well. Please bookmark this page if you think it will help you to come back and re-read it later.

To keep things simple, consistent and safe, you should follow these guidelines:

 * "Quote" any arguments that contain data which also happens to be shell syntax.
 * "$Quote" all parameter expansions in arguments. You never '''really''' know what a parameter might expand into; and even if you think it won't expand to bytes that happen to be shell syntax, quoting will future-proof your code and make it safer and ''more consistent''.
 * Don't try to put syntactical quotes inside parameters: It doesn't work.

And some additional related tips:

 * If you need to store multiple "items", use an array: `files=( 1.pdf 2.pdf "1 and a half.pdf" ); rm "${files[@]}"`
 * Do '''NOT''' try to put commands inside parameters; you cannot properly quote their arguments.
   Use a function instead: `search() { cd /foo && find . -name "$1"; }; search '*.pdf'; search '*.jpg'`
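
The guidelines above can be checked with a small experiment that counts how many arguments a command receives. This is a minimal sketch, assuming a POSIX-style shell (such as bash) with the default `IFS`; `count_args` is a hypothetical helper defined here for illustration, not a standard command.

```shell
# count_args: hypothetical helper that prints the number of
# arguments it was called with.
count_args() {
    echo "$#"
}

book="Children of Men.pdf"

# Unquoted expansion: word splitting cuts the value into three chunks.
count_args $book        # prints 3

# Quoted expansion: the whitespace is literal, so one chunk remains.
count_args "$book"      # prints 1

# Quotes stored inside the parameter are literal data, not syntax,
# so the unquoted expansion is still split into three chunks.
book='"Children of Men.pdf"'
count_args $book        # prints 3
```

Only the quoted expansion hands the filename over as a single argument, which is why quoting every parameter expansion is the safe default.
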