What's the difference between "cmd < file" and "cat file | cmd"? What is a UUOC?

Most of the time, these commands do the same thing, but the second one is less efficient, and it also breaks in certain rare circumstances.

Assuming that cmd is not a function, builtin or special shell keyword, the first command cmd < file works like this:

  1. The shell forks a subshell.

  2. Inside the subshell, standard input is closed and reopened using file in read-only mode.

  3. The subshell execs cmd, replacing itself with the new command, which presumably reads the file.

  4. The parent waits for the subshell to terminate, and then continues normally.
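
The steps above can be spelled out by hand. This is only a sketch: the scratch file and the choice of head as cmd are assumptions made for the demonstration, and a real shell does this internally rather than through exec commands.

```shell
# Hypothetical scratch file, used only for this demonstration.
file=$(mktemp)
printf 'one\ntwo\n' > "$file"

# The usual form: the shell sets up standard input, then runs the command.
direct=$(head -n 1 < "$file")

# Roughly the same steps, written out explicitly:
#   ( ... )          fork a subshell
#   exec < "$file"   reopen the subshell's standard input on the file
#   exec head -n 1   replace the subshell with the command
# The parent waits for the subshell before continuing.
manual=$( (exec < "$file"; exec head -n 1) )

echo "direct=$direct manual=$manual"
rm -f "$file"
```

Both variables should end up holding the first line of the file, since the two forms set up the same standard input.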

The second command cat file | cmd works like this:

  1. The shell creates an anonymous pipe and forks two subshells.

  2. The write end of the pipe becomes the first subshell's standard output, and the read end becomes the second subshell's standard input.

  3. The first subshell execs cat with file as its argument. cat opens the file and copies its content to the pipe.

  4. The second subshell execs cmd, which presumably reads from the pipe.

  5. When both subshells have terminated, the parent continues normally.
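
To make the two processes and the pipe between them visible, the pipeline can be imitated with a named pipe. This is a rough sketch, not what the shell does internally: a real pipeline uses an anonymous pipe, and the directory, file names and the choice of sort as cmd are assumptions made for the demonstration.

```shell
# Hypothetical scratch directory holding a small file and a named pipe.
dir=$(mktemp -d)
printf 'b\na\n' > "$dir/file"
mkfifo "$dir/pipe"

# First process: cat opens the file and copies its content into the pipe.
cat "$dir/file" > "$dir/pipe" &

# Second process: the command reads from the pipe, not from the file.
via_fifo=$(sort < "$dir/pipe")

# The parent continues only after both processes have finished.
wait

# The ordinary pipeline, for comparison.
via_pipeline=$(cat "$dir/file" | sort)

rm -r "$dir"
```

In both cases sort never sees the file itself, only whatever bytes cat pushes through the pipe.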

So, we can already see why the second command is less efficient: it introduces an extra process, as well as the anonymous pipe.

Another difference is that in the first command, the standard input of cmd is a file descriptor pointing to an actual file. The command can rewind or otherwise seek to a different location in the file. However, in the second command, the standard input of cmd is an anonymous pipe, not a real file. The command can only read the input sequentially as a stream of bytes; it can't rewind it, or skip ahead or backward.

Commands that expect a seekable standard input will therefore break when run in the second form. This is rare, but it does happen sometimes.
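
The difference in what cmd receives can be checked from inside the command. In this sketch, stdin_kind is a hypothetical helper standing in for cmd, and the scratch file is likewise an assumption. It relies on /dev/stdin referring to file descriptor 0, which bash's test builtin handles itself and which Linux provides as a real path.

```shell
# Hypothetical helper: report what kind of object standard input is.
stdin_kind() {
  if [ -f /dev/stdin ]; then
    echo "regular file"      # seekable: the command may rewind or skip
  elif [ -p /dev/stdin ]; then
    echo "pipe"              # sequential only: no seeking possible
  else
    echo "other"
  fi
}

# Hypothetical scratch file, used only for this demonstration.
file=$(mktemp)
printf 'some data\n' > "$file"

redirected=$(stdin_kind < "$file")
piped=$(cat "$file" | stdin_kind)

echo "cmd < file:     $redirected"
echo "cat file | cmd: $piped"
rm -f "$file"
```

A command that calls lseek on its standard input would succeed in the first case and fail in the second.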

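In some circumstances, the second command is also slower. If cmd happens to be a while read loop implemented in shell, reading from a pipe forces the shell to use single-byte reads rather than buffered reads: read must not consume anything past the end of the line, and on a pipe the shell has no way to seek back and return the excess.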
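
For such a loop, the redirection is attached to the loop as a whole. A minimal sketch, assuming a hypothetical scratch file and a loop body that only counts lines:

```shell
# Hypothetical scratch file, used only for this demonstration.
file=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$file"

count=0
# The redirection after "done" applies to the entire loop, so every
# read draws from the same open file instead of from a pipe.
while IFS= read -r line; do
  count=$((count + 1))
done < "$file"

echo "read $count lines"
rm -f "$file"
```

Here standard input is the file itself, so the shell is free to read in larger chunks and seek back as needed.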

What is a UUOC?

UUOC is an acronym (or initialism, for you Brits) which stands for Useless Use Of Cat. It's an informal term used to describe the improper cat file | cmd construct.

Among circles of competent programmers, when a newcomer does something inefficient or wrong, the others typically try to help that person by pointing out the mistake and showing the correct way. On some Usenet groups, this became a tradition: the newcomer would be "presented" with a satirical UUOC award -- a sort of badge of achievement.

Sadly, some people don't appreciate the help they are being given, and see constructive criticism as merely criticism. See also: tact filter.

BashFAQ/119 (last edited 2020-06-10 14:18:07 by GreyCat)