Diff for "BashFAQ/003"

Differences between revisions 7 and 8

How can I find the latest (newest, earliest, oldest) file in a directory?

The obvious way to do this is to use ls to sort the filenames by timestamp, and take the first one. This works in all the trivial cases, but breaks when filenames contain newlines or other unusual characters, because ls prints one name per line and its output cannot then be parsed unambiguously. The ls approach cannot be made robust unless someone adds a -0 (NUL-delimited output) extension to ls. I don't know of any version of ls with such an extension.
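To make the failure concrete, here is a small sketch. The scratch directory and both filenames are invented for the demonstration, and it assumes ls writes names unmodified when its output is a pipe (true of the common implementations).

```bash
#!/usr/bin/env bash
# Illustration only: the directory and filenames are made up.
dir=$(mktemp -d)
touch -t 202001010000 "$dir/old.txt"     # mtime set well in the past
touch "$dir"/$'new\nfile.txt'            # newest file; its name contains a newline

# Naive approach: newest name first, keep the first line of output.
naive=$(ls -t "$dir" | head -n 1)

# Only the part of the name before the newline survives.
printf 'ls | head gave: %q\n' "$naive"

rm -rf -- "$dir"
```

The result is not the name of any file in the directory, so whatever is done with it next will fail or act on the wrong file.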

Therefore we need some way to compare the timestamps on the files. The most common requirement is to sort them based on modification time (mtime); Bash and ksh have -nt and -ot operators which can do this:

# Bash/ksh
latest=
for file in "$dir"/*; do
  [[ $file -nt $latest ]] && latest=$file
done

To find the oldest:

# Bash/ksh
oldest=
for file in "$dir"/*; do
  [[ -z $oldest || $file -ot $oldest ]] && oldest=$file
done
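As a sketch, the two loops can be wrapped in functions that also cope with an empty directory, where the unmatched glob would otherwise be left as a literal pattern. The names newest_in and oldest_in, and the result variables newest and oldest, are hypothetical and chosen only for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; each stores its result in a global variable and
# returns non-zero when the directory has no (non-hidden) entries.
newest_in() {
  local file
  newest=
  for file in "$1"/*; do
    [[ -e $file ]] || continue        # skip the literal pattern of an unmatched glob
    [[ -z $newest || $file -nt $newest ]] && newest=$file
  done
  [[ -n $newest ]]
}

oldest_in() {
  local file
  oldest=
  for file in "$1"/*; do
    [[ -e $file ]] || continue
    [[ -z $oldest || $file -ot $oldest ]] && oldest=$file
  done
  [[ -n $oldest ]]
}

# Usage:
#   newest_in "$dir" && printf 'newest: %s\n' "$newest"
```

The result is returned in a variable rather than through command substitution, because command substitution would strip trailing newlines from the filename.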

Neither shell has an analogous operator which compares by atime or ctime, so one would need external utilities for that; however, it is nearly impossible to handle the output of said utilities properly in standard Bourne or POSIX shells (or even Korn shell...).
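When only a single number per file is needed, one possible workaround in Bash is to ask an external tool for just the timestamp, so that no filename ever has to be parsed from its output. The sketch below assumes GNU stat (the -c %X syntax is not portable to BSD stat), and latest_by_atime is a made-up name.

```bash
#!/usr/bin/env bash
# Bash + GNU stat (assumption: "stat -c %X" prints atime in epoch seconds)
# Hypothetical helper: sets "latest" to the most recently accessed entry in $1.
latest_by_atime() {
  local file atime best=-1
  latest=
  for file in "$1"/*; do
    [[ -e $file ]] || continue                  # unmatched glob: nothing to do
    atime=$(stat -c %X -- "$file") || continue  # skip entries stat cannot read
    if (( atime > best )); then
      latest=$file best=$atime
    fi
  done
  [[ -n $latest ]]
}

# Usage:
#   latest_by_atime "$dir" && printf '%s\n' "$latest"
```

This runs one stat process per file, so it is slow on large directories. Replacing %X with %Z would compare by ctime instead.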

If the sorting criteria are different from "oldest or newest file by mtime", then GNU find and GNU sort may be used together to produce a sorted list of timestamps and filenames, delimited by NUL characters. This will of course operate recursively, unless GNU find's -maxdepth option is used to prevent that. Bash's read command can then be used to extract the filenames from this stream:

# Bash + GNU find + GNU sort
IFS= read -r -d '' latest \
  < <(find "$dir" -type f -printf '%T@ %p\0' | sort -znr)
latest=${latest#* }   # remove timestamp + space

The find and sort options may be adjusted as needed, for example to sort by atime instead of mtime, or to sort in ascending order to get the oldest file. A group of multiple read commands may be used if more than the single latest file is needed.
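One way to collect several files is to put read in a loop rather than writing it out several times. This is a sketch only; newest_n and the array name newest are hypothetical.

```bash
#!/usr/bin/env bash
# Bash + GNU find + GNU sort
# Hypothetical helper: fills the array "newest" with up to $2 files under $1,
# most recently modified first.
newest_n() {
  local dir=$1 n=$2 rec
  newest=()
  while (( ${#newest[@]} < n )) && IFS= read -r -d '' rec; do
    newest+=("${rec#* }")             # drop the "timestamp " prefix
  done < <(find "$dir" -type f -printf '%T@ %p\0' | sort -znr)
}

# Usage:
#   newest_n "$dir" 3
#   printf '%s\n' "${newest[@]}"
```

To sort by atime, %T@ can be replaced with %A@ in the -printf format; dropping -r from sort gives ascending order, so the oldest files come first.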

We read whole NUL-delimited records and use parameter expansion, instead of reading two fields with IFS set to a space, because whitespace in IFS would cause all leading and trailing spaces to be stripped from the filename. We don't want that.
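The difference can be seen with a hand-written record. The timestamp and the filename (which ends in two spaces) are invented for this illustration.

```bash
#!/usr/bin/env bash
# An illustrative record in the same "timestamp path" shape that find -printf emits.
record='1321599293.0000000000 ./report  '

# Two-field read with IFS set to a space: the trailing spaces are stripped.
IFS=' ' read -r -d '' ts name < <(printf '%s\0' "$record")
split_name=$name

# Whole-record read plus parameter expansion: the name is kept intact.
IFS= read -r -d '' line < <(printf '%s\0' "$record")
whole_name=${line#* }

printf '<%s>\n<%s>\n' "$split_name" "$whole_name"
```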

Of course, one disadvantage of this approach is that the entire list is sorted, which is an O(n log n) operation, whereas simply iterating through the list to find the minimum or maximum timestamp (assuming we want just one file) would be O(n). Here's an alternative which does that:

# Bash + GNU find
time=0 latest=
while IFS= read -r -d '' line; do
  t=${line%% *} t=${t%.*}   # truncate fractional seconds
  ((t > time)) && { latest=${line#* } time=$t; }
done < <(find "$dir" -type f -printf '%T@ %p\0')

In practice, reading lines in Bash is much slower than doing so in most other utilities, and so the algorithmic improvement may be less important than the inefficiency of the shell (in other words, the iterative loop may be slower than calling sort). Benchmarking is left as an exercise for the reader.
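As a starting point for such a benchmark, both approaches can be wrapped in functions and run under the time keyword. The function names via_sort and via_loop are hypothetical, and no timings are claimed here; results will depend on the directory and the system.

```bash
#!/usr/bin/env bash
# Bash + GNU find + GNU sort
# Both hypothetical helpers set "latest" to the newest regular file under $1.
via_sort() {
  local rec=
  IFS= read -r -d '' rec < <(find "$1" -type f -printf '%T@ %p\0' | sort -znr)
  latest=${rec#* }
  [[ -n $latest ]]
}

via_loop() {
  local line t time=0
  latest=
  while IFS= read -r -d '' line; do
    t=${line%% *} t=${t%.*}   # truncate fractional seconds
    ((t > time)) && { latest=${line#* } time=$t; }
  done < <(find "$1" -type f -printf '%T@ %p\0')
  [[ -n $latest ]]
}

# Usage, on a directory of your own:
#   time via_sort /some/dir
#   time via_loop /some/dir
```

Because via_loop truncates fractional seconds, the two functions may disagree when several files were modified within the same second.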

Readers who are asking this question in order to rotate their log files may wish to look into logrotate(1) instead, if their operating system provides it.

This question number has been recycled. The previous version of this question was:

How can I insert a blank character after each character?

    sed 's/./& /g'

Example:

    $ echo "testing" | sed 's/./& /g'
    t e s t i n g

For more examples of sed 1-liners, see sed 1-liners or the sed FAQ.

This question almost never came up in several years of IRC, so there's no point keeping it, especially at such an early (prominent) point in the FAQ. I'll leave this Q&A here for a while before deleting it.

CategoryShell

- Revision 7 as of 2011-10-15 21:01:24
  Size: 3978
  Editor: geirha
  Comment: Erm, shortening that a bit.
+ Revision 8 as of 2011-11-18 06:54:53
  Size: 3967
  Editor: host-74-211-79-188
  Comment: Changed comment. Removed GNU sort from snippet that did not use sort
 Line 42:
-# Bash + GNU find + GNU sort
+# Bash + GNU find