Pages:

  1. Basic concepts

  2. Tool selection

  3. Working with files

Working with files

On the previous page, we looked at some input file formats, and considered the choice of various tools that can read them. But many scripts deal with the files themselves, rather than what's inside them.

Filenames

On Unix systems, filenames may contain whitespace. This includes the space character, obviously. It also includes tabs, carriage returns, newlines, and more. Unix filenames may contain any character except / and NUL, and / is of course allowed in pathnames (which are filenames preceded by zero or more directory components, either relative like ./myscript or absolute like /etc/aliases).
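To make this concrete, here is a small sketch that creates three legal but awkward filenames in a scratch directory and then counts them two ways. The directory comes from mktemp, and the variable names (tmpdir, files, glob_count, line_count) are made up for the illustration.

```bash
#!/usr/bin/env bash
# Work in a scratch directory so the odd names don't pollute anything real.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf -- "$tmpdir"' EXIT

# Each of these is a single, legal filename.
touch -- "$tmpdir/two words" \
         "$tmpdir/tab"$'\t'"here" \
         "$tmpdir/line one"$'\n'"line two"

# The glob expands to one word per file.
files=("$tmpdir"/*)
glob_count=${#files[@]}

# A newline-delimited listing of the same names has one line too many,
# because one of the names contains a newline.
# (The arithmetic expansion strips any padding some wc implementations add.)
line_count=$(( $(printf '%s\n' "${files[@]}" | wc -l) ))

printf 'files: %d, lines: %d\n' "$glob_count" "$line_count"
```

The two counts disagree, which is the whole problem: anything that treats a line of output as a filename is already wrong for this directory.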

It is a tragic mistake to write software that assumes filenames may be separated by spaces, or even newlines. Poorly written bash scripts are especially likely to be vulnerable to malicious or accidentally created unusual filenames. It's your job as the programmer to write scripts that don't fall over and die (or worse) when the user has a weird filename.
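As a rough illustration of how this goes wrong in practice, the sketch below passes the same filename to a command twice, once unquoted and once quoted. count_args is a hypothetical helper that only reports how many arguments it received, and the filename is invented.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

f='my summer vacation.txt'   # one filename containing two spaces

# Unquoted: the shell splits the value on whitespace before running the command.
unquoted=$(count_args $f)

# Quoted: the filename arrives as the single argument it is.
quoted=$(count_args "$f")

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```

With the default IFS, the unquoted expansion hands the command three words instead of one. Replace count_args with rm and the consequences are easy to imagine.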

Iteration over filenames should be done by letting bash expand a glob, never by parsing the output of ls (see ParsingLs). If you need to iterate recursively, you can use the globstar option and a glob containing **, or you can use find. I won't duplicate the UsingFind page here; you are expected to have read it.
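Here is a minimal sketch of the three approaches side by side, run against a scratch directory built for the purpose. It assumes bash 4.0 or later for globstar and a find that supports -print0 (GNU and BSD find both do). The directory layout and the counter names are invented for the example.

```bash
#!/usr/bin/env bash
# nullglob: an unmatched glob expands to nothing instead of to itself.
# globstar: enables ** (bash 4.0 and later).
shopt -s nullglob globstar

tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf -- "$tmpdir"' EXIT
mkdir -p -- "$tmpdir/sub dir"
touch -- "$tmpdir/a b.txt" "$tmpdir/sub dir/c"$'\n'"d.txt"

# Non-recursive: let the shell expand the glob.
flat=0
for f in "$tmpdir"/*.txt; do
  flat=$((flat + 1))
done

# Recursive with globstar: ** matches zero or more directory levels.
deep=0
for f in "$tmpdir"/**/*.txt; do
  deep=$((deep + 1))
done

# Recursive with find: NUL-delimited output, read one name at a time.
found=0
while IFS= read -r -d '' f; do
  found=$((found + 1))
done < <(find "$tmpdir" -type f -name '*.txt' -print0)

printf 'flat=%d deep=%d found=%d\n' "$flat" "$deep" "$found"
```

In all three loops "$f" holds exactly one filename per iteration, including the one with a newline in it. NUL is the only safe delimiter for a stream of filenames because it is the one byte a filename can never contain.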

A single filename may be safely stored in a bash string variable. If you need to store multiple filenames for some reason, use an array variable. Never attempt to store multiple filenames in a string variable with whitespace between them. In most cases, you shouldn't need to store multiple filenames anyway. Usually you can just iterate over the files once, and don't need to store more than one filename at a time. Of course, this depends on what the script is doing.
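A short sketch of both cases follows: one filename in a string, several in an array. The filenames and variable names (single, logs, n) are made up for the illustration.

```bash
#!/usr/bin/env bash
shopt -s nullglob

tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf -- "$tmpdir"' EXIT
touch -- "$tmpdir/one file.log" "$tmpdir/another file.log"

# One filename: a plain string variable is fine, as long as it is quoted on use.
single="$tmpdir/one file.log"

# Several filenames: one array element per name, filled straight from a glob.
logs=("$tmpdir"/*.log)

# Appending keeps the new name intact as a single element.
logs+=("$tmpdir/not yet created.log")

# "${logs[@]}" expands to exactly one word per element.
for f in "${logs[@]}"; do
  printf 'entry: %s\n' "$f"
done

n=${#logs[@]}
printf 'stored %d names\n' "$n"
```

The quotes around "${logs[@]}" matter. Without them each element is split on whitespace again, and the array buys you nothing.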

