There are certain things Bash is not very good at, and certain tasks you shouldn't do in Bash unless you really, truly have to. For most of these tasks, it's better to switch to a different language.

  1. Speed. Do we really have to say it? Bash is slow. If speed is an important consideration, then Bash may not be the best choice.

  2. Floating point math. Bash has only integer math. Use bc(1) or awk(1) if you need to do floating point math.

  3. Data structures. Bash does not have Pascal-style records (C-style structs); nor does it have pointers. Any attempt to create advanced data structures (stacks, queues, linked lists, binary trees...) will have to be done with extremely primitive hacks.

  4. Fancy process management. Bash has nothing analogous to select(2) or poll(2). There's no way to enter an event loop. Use other programming languages if you need an event-driven model. Most "object oriented" languages will do better at these tasks.

  5. XML and HTML parsing. These are tag-based languages and cannot be parsed by regular expressions. You need dedicated tools or libraries to do this correctly. Use xslt, tidy, xmlstarlet, perl, or some other suitable tool.

  6. JSON parsing. Not quite as bad as XML, but you still shouldn't use the wrong tool for the job. Try jq instead. (In 2014 the ksh93 developers also announced JSON parsing for an upcoming ksh93 version, then in beta.)

  7. Binary data. Bash has no way to store the NUL byte in a variable, so binary data either has to be encoded (and decoded), or kept in a file. You also can't pass the NUL byte as an argument to a program, because the kernel passes arguments as NUL-terminated C strings. Parsing binary data from a file is also a nontrivial problem. Try perl or C instead.

  8. Database queries. When retrieving a tuple from a relational database, there is no way for Bash to understand where one element of the tuple ends and the next begins. In general, Bash is not suited to any sort of data retrieval that extracts multiple data values in a single operation, unless there is a clearly defined delimiter between fields. For database queries (SQL or otherwise), switch to a language that supports the database's query API.

  9. Variable typing. Like most scripting languages, Bash does not really support strong variable types. Variables are loosely categorized as scalar or array (plus associative arrays in bash 4), with partial support for an integer type. But really, everything is a string.

  10. Dropping permissions. It can be tough to make a bash script safe to execute as root. In languages like C, perl, and python, you can easily drop privileges at a certain point. With bash, this is tricky, because while you can run su or sudo (or better dedicated programs), these are external commands -- you lose your entire execution environment.

  11. Try/catch. Some programming languages let you wrap a command in a try ... catch block. This will interpret the command in a sort of "sandbox", where errors that would normally cause an abort are "caught", and trigger some sort of error-handling code. Bash does not have anything analogous to this. Any bash code you run is real code.

  12. Exception handling. Many programming languages have the concept of an "exception", essentially an event that the runtime environment creates when certain kinds of errors occur. Bash doesn't have these. Bash uses the C model for error handling: it makes you do it. You need to check the result of every critical command in your script. (And no, set -e isn't the right answer either.)

  13. Functions. Bash's "functions" have several issues:

    • Return values: Bash functions can't return data, only an exit status; everything else they produce is output streams and other side effects. The common way people try to transfer information from the function to its caller is to capture the function's stdout stream with a CommandSubstitution. This creates a SubShell, which is slow, and breaks assignments to outer scopes and other potentially desirable side effects. See BashFAQ/084 for other ways to retrieve results from a function, but realize that they are all tricks, and they have varying limitations.

    • Reusability: You can't pass arguments "by reference" either, at least not until Bash 4.3 (and even there the declare -n mechanism has serious security flaws). There's no safe way to tell a function the name of a variable where you want it to put its output. Working with arrays is even worse -- you can't pass the name of an array to a function and let the function use it. The best you can do, typically, is to pass each array element as a separate argument. This means libraries of nontrivial reusable functions are not feasible, except by performing eval backflips.

    • Scope: Bash has a simple system of local scope which roughly resembles "dynamic scope" (as in traditional Emacs Lisp, or Perl's local). Functions see the locals of their callers (somewhat like Python's "nonlocal" keyword, except that the lookup follows the call stack rather than the source code), but can't access a caller's positional parameters (except through BASH_ARGV if extdebug is enabled). Reusable functions can't be guaranteed free of namespace collisions unless you resort to weird naming rules to make conflicts sufficiently unlikely. This is particularly a problem when implementing functions that expect to act upon variable names from frame n-3, which may have been overwritten by your reusable function at n-2. In ksh93, functions declared with the "function name { ... }" syntax use the more common lexical (static) scope rules; Bash accepts this syntax too, but its functions remain dynamically scoped.

    • Closures: In Bash, functions themselves are always global (have "file scope"), so no closures. Function definitions may be nested, but these are not closures, though they look very much the same. Functions are not "passable" (first-class), and there are no anonymous functions (lambdas). In fact, nothing is "passable", especially not arrays. Bash uses strictly call-by-value semantics (magic alias hack excepted).

    • There are many more complications involving subshells, exported functions, "function collapsing" (functions that define or redefine other functions or themselves), traps (and their inheritance), and the way functions interact with stdio. Don't bite the newbie for not understanding all this. Shell functions are totally f***ed.

  14. Sorting. Bash can't sort data sets. If you need to sort an array, you can either write your own sorting algorithm in pure bash, or you can serialize the data set, pipe it to sort, and then parse it back in. Either way is painful, particularly if your sort doesn't have -z.

  15. Syntax nightmare. Bash can do many different things, and is even good at some of them, but just about every single one of them has its own special, unique syntax. There is no consistency, almost as if each syntax feature had been developed separately. Nearly every single punctuation character on a US keyboard has some special meaning somewhere, and many of them have more than one.

    • The * character, when unquoted, will expand to a list of filenames in some places, but not other places.

    • $(( may be the start of an ArithmeticExpression, or the start of a CommandSubstitution, depending on whether bash can find a matching )) somewhere further on.

    • { may be the start of a command group, or the start of a brace expansion.

    • ${foo: -1} is a substring parameter expansion, but ${foo:-1} is a parameter expansion with a default value.

    • += may append data to a string variable, or add new elements to an array, or perform integer addition.

    • And so on, and so on.

  16. Hidden landmines. Bash has lots of seductively elegant features, almost all of which are broken by default. Working around the wrongness takes considerable work, usually involving ugly boilerplate code, or in some cases a complete rewrite. See BashPitfalls for a comprehensive overview. For specific cases, see CodeInjection and BashFAQ/050.
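
A small illustration of item 2 (floating point math): $(( )) truncates integer division, so fractional results have to come from an external tool. This minimal sketch delegates the division to awk (bc would work too). The helper name fdiv is hypothetical.

```shell
#!/usr/bin/env bash
# Integer-only arithmetic: the fractional part is silently discarded.
echo "bash: $(( 10 / 3 ))"           # prints "bash: 3"

# Hypothetical helper: delegate floating point division to awk.
# The operands are passed with -v instead of being spliced into the
# awk program text, so the program itself never changes.
fdiv() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", a / b }'
}

echo "awk:  $(fdiv 10 3)"            # prints "awk:  3.3333"

# Writing $(( 1.5 * 2 )) is a syntax error in bash, not a float operation.
```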
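
Item 3 (data structures) describes "primitive hacks". Here is one in miniature: a stack emulated with a global indexed array. Treat it as a sketch, not a library. The names stack, push, pop and popped are made up for the example, and the helpers only work on that one hard-coded array.

```shell
#!/usr/bin/env bash
# A stack emulated with a plain indexed array. The helpers are hard-wired
# to the global name "stack": there is no way to hand them an arbitrary
# stack without eval or (bash 4.3+) namerefs.
stack=()

push() {
    stack+=("$1")
}

# pop stores the removed element in the global "popped", because a
# function cannot return a string, and $(pop) would run in a subshell
# where the removal would be lost.
pop() {
    local n=${#stack[@]}
    (( n > 0 )) || return 1
    popped=${stack[n-1]}
    unset 'stack[n-1]'
}

push "first"
push "second item"
pop && echo "popped: $popped"     # popped: second item
echo "remaining: ${#stack[@]}"    # remaining: 1
```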
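
For item 4 (process management), the closest thing Bash offers is probably wait -n (Bash 4.3 and later), which blocks until any one background job exits. The sketch below uses sleep jobs as stand-ins for real work. Note what it cannot do:

- It only waits on the shell's own children.
- It cannot watch file descriptors the way select(2) or poll(2) can.
- Before Bash 5.1's wait -p, it does not even say which job finished.

Fractional sleep durations assume GNU coreutils or a similar sleep.

```shell
#!/usr/bin/env bash
# Start three stand-in "workers" that finish in a different order than
# they were started. The durations are arbitrary.
for secs in 0.3 0.1 0.2; do
    sleep "$secs" &
done

# Reap them one at a time. wait -n returns as soon as any single child
# exits, with that child's exit status; it does not report which child.
finished=0
while (( finished < 3 )); do
    wait -n
    finished=$(( finished + 1 ))
    echo "a job finished ($finished of 3)"
done
```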
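
Item 5 (XML and HTML parsing) is easy to demonstrate. This sketch tries to extract the text of a <b> element with a regular expression. Because .* matches as much as it can, the capture spans from the first opening tag to the last closing tag. The input string is invented for the example.

```shell
#!/usr/bin/env bash
html='<p><b>one</b> and <b>two</b></p>'

# Naive approach: capture whatever sits between <b> and </b>.
re='<b>(.*)</b>'
if [[ $html =~ $re ]]; then
    # Prints "one</b> and <b>two": .* matched as much as it could.
    echo "captured: ${BASH_REMATCH[1]}"
fi

# "Fixing" the pattern with [^<]* happens to work on this input, but it
# fails as soon as the element contains markup, such as <b>a <i>b</i></b>.
re='<b>([^<]*)</b>'
[[ $html =~ $re ]] && echo "first only: ${BASH_REMATCH[1]}"
```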
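
The same failure mode applies to item 6 (JSON parsing). In this sketch, a sed expression that stops at the next double quote is defeated by an escaped quote inside the value. The jq call at the end runs only if jq happens to be installed.

```shell
#!/usr/bin/env bash
json='{"name": "a \"quoted\" word", "id": 7}'

# Naive extraction with sed: take everything up to the next double quote.
# The escaped quote inside the value ends the match early, so this
# yields 'a \' instead of the real value.
naive=$(printf '%s\n' "$json" | sed 's/.*"name": "\([^"]*\)".*/\1/')
echo "naive: $naive"

# A JSON-aware tool decodes the escapes correctly (only if jq exists).
if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$json" | jq -r '.name'    # a "quoted" word
fi
```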
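
Item 7 (binary data) in a few lines: command substitution silently drops a NUL byte, while the same bytes stored in a file stay intact. Encoding to hex with od is shown as one possible workaround. The variable names are illustrative, and the warning mentioned in the comment applies to Bash 4.4 and later.

```shell
#!/usr/bin/env bash
# Command substitution cannot keep a NUL byte. Bash drops it (4.4+ also
# prints a warning on stderr, silenced here by the group redirection).
{ var=$(printf 'a\0b'); } 2>/dev/null
echo "length in a variable: ${#var}"      # 2, not 3

# The same bytes survive intact when they stay in a file.
tmp=$(mktemp) || exit 1
printf 'a\0b' > "$tmp"
bytes=$(( $(wc -c < "$tmp") ))            # arithmetic strips wc's padding
echo "length in a file: $bytes"           # 3
rm -f "$tmp"

# One way to keep binary data in a variable: encode it first, e.g. as hex.
hex=$(printf 'a\0b' | od -An -tx1 | tr -d ' \n')
echo "hex-encoded: $hex"                  # 610062
```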
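
Item 8's caveat about "a clearly defined delimiter" comes down to how read splits on IFS. The string below is a hypothetical row of comma-separated query output. Because one field contains a comma, the fields shift, and the last variable silently soaks up the rest of the line.

```shell
#!/usr/bin/env bash
# A hypothetical row as some database client might print it: id,name,city.
row='42,Smith, John,London'

IFS=, read -r id name city <<< "$row"
printf 'id=[%s] name=[%s] city=[%s]\n' "$id" "$name" "$city"
# id=[42] name=[Smith] city=[ John,London]

# Nothing in the text tells bash that "Smith, John" was one value.
# Unless the client can emit a delimiter that never occurs in the data,
# splitting the row is guesswork.
```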
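
Item 9 (variable typing): declare -i is the "partial support for an integer type", and this sketch shows how partial it is. The attribute makes assignments evaluate as arithmetic, but nothing checks that the value is actually a number.

```shell
#!/usr/bin/env bash
# Plain variables are strings; "+" is just a character.
x=3
y=$x+4
echo "$y"              # 3+4

# declare -i makes assignments evaluate as arithmetic...
declare -i n
n=$x+4
echo "$n"              # 7

# ...but there is no type checking. A non-numeric word is treated as
# a variable name, and an unset variable counts as 0, with no error.
n=not_a_number
echo "$n"              # 0

# The integer attribute belongs to the name; the value is still a string.
s="$n apples"
echo "$s"              # 0 apples
```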
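
Item 10's point about losing the execution environment applies to any external command, so it can be shown without root privileges. In this sketch, bash -c stands in for something like sudo -u someuser bash -c '...'. That sudo invocation is only an illustration, as are the variable and function names. Serializing definitions with declare -p and declare -f is one workaround. It is exactly the kind of plumbing that in-process privilege dropping avoids.

```shell
#!/usr/bin/env bash
config_dir=/tmp/example             # hypothetical, not exported
greet() { echo "hello from $1"; }

# The child shell sees neither the variable nor the function.
bash -c 'echo "config_dir=[$config_dir]"; type greet >/dev/null 2>&1 || echo "greet: not found"'

# Workaround: serialize the definitions into the child's script text.
bash -c "$(declare -p config_dir); $(declare -f greet); greet \"\$config_dir\""
```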
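
A tempting way to fake item 11's try/catch is to run the "try" part in a subshell with set -e and attach the "catch" with ||. It does not work. Bash ignores set -e for a command on the left-hand side of || or &&, including everything inside that subshell. As a result, the failure goes unnoticed and the "try" block keeps running.

```shell
#!/usr/bin/env bash
# Looks like try/catch, but is not.
(
    set -e
    false                            # should abort the "try" block...
    echo "still running after false" # ...yet this line executes
) || echo "caught an error"          # never printed: the subshell returns 0

# What actually works is checking each command explicitly.
(
    false || exit 1
    echo "not reached"
) || echo "caught an error"
```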
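
Item 12 in practice: with no exceptions, every critical command gets its own status check, and every caller has to check the function's status in turn. Here is a minimal sketch of that style, using a hypothetical backup_file function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a file into a directory, checking every step.
backup_file() {
    local src=$1 dest_dir=$2

    if [[ ! -f $src ]]; then
        echo "backup_file: no such file: $src" >&2
        return 1
    fi
    mkdir -p -- "$dest_dir" || return     # propagate mkdir's own status
    cp -- "$src" "$dest_dir/" || return   # propagate cp's own status
    echo "backed up $src"
}

# The caller has to check the result too; nothing propagates by itself.
if ! backup_file /nonexistent/file /tmp/backups; then
    echo "backup failed, continuing without it" >&2
fi
```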
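
Two of item 13's function problems, sketched side by side with invented names:

- Capturing output through a CommandSubstitution runs the function in a SubShell, so its side effects on globals are lost.
- Dynamic scope lets a callee overwrite its caller's locals.

The printf -v variant is one of the tricks BashFAQ/084 refers to, shown here only as an example. It has its own limitations: for instance, it writes to the wrong variable if the caller's variable name collides with a local inside the function.

```shell
#!/usr/bin/env bash
count=0
next_id() {
    count=$(( count + 1 ))     # side effect meant for the caller
    echo "id-$count"
}

id=$(next_id)                  # runs in a subshell
echo "$id, count=$count"       # id-1, count=0: the increment was lost

# printf -v writes into a caller-chosen variable without a subshell.
next_id_into() {
    count=$(( count + 1 ))
    printf -v "$1" 'id-%d' "$count"
}
next_id_into id
echo "$id, count=$count"       # id-1, count=1

# Dynamic scope: inner was never given "setting", but sees the caller's
# local and overwrites it.
inner() { setting="changed by inner"; }
outer() {
    local setting="outer's value"
    inner
    echo "in outer: $setting"  # in outer: changed by inner
}
outer
echo "global setting: [${setting-unset}]"   # the global was never touched
```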
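
Item 14 (sorting) via the serialize/sort/parse-back route. This sketch NUL-delimits the elements so that any content, even embedded newlines, survives the round trip. It makes these assumptions:

- sort supports -z (GNU and BSD have it; POSIX does not require it).
- Bash is 4.4 or later, for mapfile -d.

LC_ALL=C makes the order byte-wise and predictable.

```shell
#!/usr/bin/env bash
fruits=("banana" "apple pie" "cherry" $'line\nbreak')

# Serialize with NUL separators, sort, and read back. -t strips the
# trailing NUL delimiter from each element.
mapfile -t -d '' sorted < <(printf '%s\0' "${fruits[@]}" | LC_ALL=C sort -z)

printf '[%s]\n' "${sorted[@]}"
```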
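
Two of item 15's syntax examples, made concrete in a short sketch (the variable names are arbitrary):

```shell
#!/usr/bin/env bash
foo=hello

# One space changes the meaning completely.
echo "${foo: -1}"    # o      (substring: last character)
echo "${foo:-1}"     # hello  (default value, used only if foo is unset or empty)
unset bar
echo "${bar:-1}"     # 1

# The same += operator, three different operations.
s=ab;           s+=c;     echo "$s"          # abc  (string append)
a=(x);          a+=(y);   echo "${#a[@]}"    # 2    (array append)
declare -i n=5; n+=2;     echo "$n"          # 7    (integer addition)
```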
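
Item 16's "broken by default" is easiest to see with the most common landmine, unquoted expansion, which BashPitfalls covers at length. The filename here is made up.

```shell
#!/usr/bin/env bash
file='my report *.txt'

# Unquoted expansion is split on whitespace and then glob-expanded.
set -- $file
echo "unquoted: $# words"      # 3 (more if *.txt matches files here)

# Quoting suppresses both, which is almost always what you meant.
set -- "$file"
echo "quoted: $# word"         # 1
```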

On top of these, Bash is not ideal for large programs. If your program is going to be responsible for a lot of tasks, especially interactively, then you may want to consider another interpreter or switch to a compiled language altogether. Large Bash scripts quickly get into trouble because Bash is slow at many things other interpreters are fast at. Large chunks of Bash code soon become opaque, since functions are about the only tool for giving your code structure. Bash scripts are nearly untestable. Even the most purist Bash programmers (and there aren't many!) write code that, as it grows, becomes difficult to maintain. Bash has almost no concept of code safety, which lets sneaky little bugs crawl in easily without warning or notice. And when things go wrong (and things will go wrong), really large scripts are very difficult to debug.

If you do plan to write large Bash scripts, pay even more attention than usual to every good-practice rule, and keep a consistent style throughout the entire code base to avoid too many headaches later on.


CategoryShell

BashWeaknesses (last edited 2022-09-01 18:14:07 by 188)