Differences between revisions 7 and 9 (spanning 2 versions)
Revision 7 as of 2011-10-15 21:01:24
Size: 3978
Editor: geirha
Comment: Erm, shortening that a bit.
Revision 9 as of 2012-04-29 23:02:08
Size: 4317
Editor: ormaaj
Comment: half-rewrite
Line 3: Line 3:
The obvious way to do this is to use `ls` to sort the filenames, and take the first one. This works well in all the trivial cases, but breaks when the filenames are unfriendly; the `ls` approach [[ParsingLs|cannot be made robust]] unless someone adds a `-0` extension to `ls`. I don't know of any version of `ls` with such an extension.
The tempting solution is to use `ls` to output sorted filenames and take the first result. As usual, the `ls` approach [[ParsingLs|cannot be made robust]] and should never be used in scripts due in part to the possibility of arbitrary characters (including newlines) present in filenames. Therefore, we need some other way to compare file metadata.
Line 5: Line 5:
Therefore we need some way to compare the timestamps on the files. The most common requirement is to sort them based on ''modification time'' (mtime); Bash and ksh have `-nt` and `-ot` operators which can do this:
The most common requirement is to get the most or least recently modified files in a directory. Bash and all ksh variants can compare ''modification times'' (mtime) using the `-nt` and `-ot` operators of the `conditional expression` compound command:
Line 9: Line 8:
latest=
unset -v latest
Line 15: Line 14:
To find the oldest:
Or to find the oldest:
Line 18: Line 17:
oldest=
unset -v oldest
Line 24: Line 23:
Neither shell has an analogous operator which compares by atime or ctime, so one would need external utilities for that; however, it is nearly impossible to ''handle'' the output of said utilities properly in standard Bourne or POSIX shells (or even Korn shell...).
Keep in mind that ''mtime'' on directories is that of the most recently added, removed, or renamed file in that directory. Also note that `-nt` and `-ot` are not specified by [[http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html#tag_20_128|POSIX test]], however many shells such as [[Bashism|dash]] include them anyway. No bourne-like shell has analogous operators for comparing by ''atime'' or ''ctime'', so one would need external utilities for that; however, it's nearly impossible to either produce output which can be safely parsed, or ''handle'' said output in a shell without using nonstandard features on both ends.
Line 26: Line 25:
If the sorting criteria are different from "oldest or newest file by mtime", then GNU `find` and GNU `sort` may be used together to produce a sorted list of filenames + timestamps, delimited by NUL characters. This will of course operate recursively ([[UsingFind|GNU find's]] `-maxdepth` operator can prevent that); and Bash's `read` command can be used to extract the filenames from this stream:
If the sorting criteria are different from "oldest or newest file by mtime", then GNU `find` and GNU `sort` may be used together to produce a sorted list of filenames + timestamps, delimited by NUL characters. This will of course operate recursively ([[UsingFind|GNU find's]] `-maxdepth` operator can prevent that); Here are a few possibilities, which can be modified as necessary to use ''atime'' or ''ctime'', or to sort in reverse order:
Line 29: Line 28:
# Bash + GNU find + GNU sort
# Bash + GNU find + GNU sort (To the precision possible on the given OS, but returns only one result)
Line 35: Line 34:
The `find` and `sort` options may be adjusted as needed, for example to sort by atime instead of mtime, or to sort in ascending order to get the oldest file. A group of multiple `read` commands may be used if more than the single latest file are needed.

We read whole lines and use [[BashFAQ/073|parameter expansion]], instead of reading two fields with [[IFS]] set to a space, because whitespace in `IFS` would cause all leading and trailing spaces to be stripped from the filename. We don't want that.

Of course, one disadvantage of this approach is that the entire list is sorted, which is an ''O(n log n)'' operation, whereas simply iterating through the list to find the minimum or maximum timestamp (assuming we want just one file) would be ''O(n)''. Here's an alternative which does that:
{{{
# GNU find + Bash w/ arrays (To the nearest 1s, using an undocumented "find -printf" format (%Ts).)
while IFS= read -rd '' 'latest[$(read -rd "" y; echo $y)]'
    do :
done < <(find "$dir" -type f -printf '%p\0%Ts\0')
latest=${latest[-1]}
}}}
Line 42: Line 43:
# Bash + GNU find + GNU sort
time=0 latest=
# GNU stat + Bash /w arrays (non-recursive w/o globstar, to the nearest 1s)
while IFS= read -rd '' 'latest[$(read -rd "" y; echo $y)]'
    do :
done < <(stat '--printf=%n\0%Y\0' "$dir"/*)
latest=${latest[-1]}
}}}

One disadvantage to these approaches is that the entire list is sorted, whereas simply iterating through the list to find the minimum or maximum timestamp (assuming we want just one file) would be faster, however, depending on the size of the job the algorithmic disadvantage of sorting may be negligible in comparison to the overhead of using a shell.

{{{
# Bash + GNU find
unset -v latest time
Line 50: Line 61:
In practice, reading lines in Bash is much slower than doing so in most other utilities, and so the algorithmic improvement may be less important than the inefficiency of the shell (in other words, the iterative loop may be ''slower'' than calling `sort`). Benchmarking is left as an exercise for the reader.
Lastly, here's a more verbose variant for use in a library or .bashrc which can either return a result or assign directly to a variable:
{{{
latest() {
    if [[ $FUNCNAME == ${FUNCNAME[1]} ]]; then
        unset -v x latest
        printf ${2:+'-v' "$2"} '%s' "$1"
        return
    fi

    if (($# > 2)); then
        echo $'Usage: latest <glob> <varname>\nError: Takes at most 2 arguments. Glob defaults to *'
        return 1
    fi >&2

    if ! shopt -q nullglob; then
        trap 'shopt -u nullglob; trap - RETURN' RETURN
        shopt -s nullglob
    fi

    local x latest

    for x in ${1-*}; do
        [[ -f $x && $x -nt $latest ]] && latest=$x
    done

    latest "$latest" ${2+"$2"}
}
}}}
Line 54: Line 92:

'''This question number has been recycled. The previous version of this question was:'''
== How can I insert a blank character after each character? ==
{{{
    sed 's/./& /g'
}}}

Example:

{{{
    $ echo "testing" | sed 's/./& /g'
    t e s t i n g
}}}

For more examples of sed 1-liners, see [[http://www.student.northpark.edu/pemente/sed/sed1line.txt|sed 1-liners]] or [[http://sed.sourceforge.net/sedfaq.html|the sed FAQ]].

'''This question almost never came up in several years of IRC, so there's no point keeping it, especially at such an early (prominent) point in the FAQ. I'll leave this Q&A here for a while before deleting it.'''

How can I find the latest (newest, earliest, oldest) file in a directory?

The tempting solution is to have ls output the filenames sorted by time and take the first result. As usual, the ls approach cannot be made robust and should never be used in scripts, in part because filenames may contain arbitrary characters (including newlines). Therefore, we need some other way to compare file metadata.

The most common requirement is to get the most or least recently modified file in a directory. Bash and all ksh variants can compare modification times (mtime) using the -nt and -ot operators of the [[ ]] conditional expression compound command:

# Bash/ksh
unset -v latest
for file in "$dir"/*; do
  # With $latest unset, -nt is true for any existing file, so the first match seeds it
  [[ $file -nt $latest ]] && latest=$file
done

Or to find the oldest:

# Bash/ksh
unset -v oldest
for file in "$dir"/*; do
  # -ot is false while $oldest is empty, hence the explicit -z test to seed it
  [[ -z $oldest || $file -ot $oldest ]] && oldest=$file
done
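
The two loops above can be wrapped into functions that also cope with empty directories and skip anything that is not a regular file. This is only a minimal sketch, assuming Bash; the function names newest_file and oldest_file are made up for illustration and are not part of the original answer.

```bash
# Hypothetical helpers: the names newest_file and oldest_file are assumptions.
# Each prints one regular file from directory $1, or returns 1 if there is none.
newest_file() {
  local dir=$1 file latest=
  for file in "$dir"/*; do
    # -f skips directories and the literal pattern left by an unmatched glob.
    # With $latest empty, -nt is true for any existing file.
    [[ -f $file && $file -nt $latest ]] && latest=$file
  done
  [[ -n $latest ]] && printf '%s\n' "$latest"
}

oldest_file() {
  local dir=$1 file oldest=
  for file in "$dir"/*; do
    [[ -f $file ]] || continue
    # -ot is false while $oldest is empty, so the first file seeds it.
    [[ -z $oldest || $file -ot $oldest ]] && oldest=$file
  done
  [[ -n $oldest ]] && printf '%s\n' "$oldest"
}
```

Note that capturing the output with a command substitution strips trailing newlines from the name, and that the glob does not match dotfiles unless dotglob is set.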

Keep in mind that the mtime of a directory is the time at which a file was most recently added to, removed from, or renamed within that directory. Also note that -nt and -ot are not specified by POSIX test; however, many shells, such as dash, include them anyway. No Bourne-like shell has analogous operators for comparing by atime or ctime, so one would need external utilities for that; however, it's nearly impossible either to produce output which can be safely parsed or to handle said output in a shell without using nonstandard features on both ends.
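
As a rough illustration of the external-utility route for atime, the sketch below asks stat for a single number per file, so no filename ever has to be parsed out of a command's output. It assumes Bash and GNU stat (the -c option and the %X format are nonstandard), works only to the nearest second, and forks once per file; the function name newest_by_atime is hypothetical.

```bash
# Hypothetical helper; assumes GNU stat, where "-c %X" prints the access time
# as whole seconds since the epoch. Use %Z instead of %X to compare by ctime.
newest_by_atime() {
  local dir=$1 file t best= best_t=
  for file in "$dir"/*; do
    [[ -f $file ]] || continue
    t=$(stat -c %X -- "$file") || continue
    # The first regular file seeds the result; later ones must be strictly newer.
    if [[ -z $best || $t -gt $best_t ]]; then
      best=$file best_t=$t
    fi
  done
  [[ -n $best ]] && printf '%s\n' "$best"
}
```

Flipping -gt to -lt would yield the least recently accessed file instead.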

If the sorting criteria are different from "oldest or newest file by mtime", then GNU find and GNU sort may be used together to produce a sorted list of filenames + timestamps, delimited by NUL characters. This will of course operate recursively (GNU find's -maxdepth option can prevent that). Here are a few possibilities, which can be modified as necessary to use atime or ctime, or to sort in reverse order:

# Bash + GNU find + GNU sort (To the precision possible on the given OS, but returns only one result)
IFS= read -r -d '' latest \
  < <(find "$dir" -type f -printf '%T@ %p\0' | sort -znr)
latest=${latest#* }   # remove timestamp + space
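
If more than the single latest file is needed, the same NUL-delimited stream can be read in a loop. The following is a sketch under the same assumptions as the snippet above (Bash, GNU find with -printf, GNU sort with -z); the function name newest_n and the global array name newest are invented for this example.

```bash
# Hypothetical helper: collect the $2 most recently modified regular files
# under $1 into the global array "newest", newest first.
newest_n() {
  local dir=$1 n=$2 line
  newest=()
  while ((${#newest[@]} < n)) && IFS= read -r -d '' line; do
    newest+=("${line#* }")   # strip the "timestamp " prefix, keep the path intact
  done < <(find "$dir" -type f -printf '%T@ %p\0' | sort -zrn)
}
```

After calling newest_n "$dir" 3, the results are in "${newest[@]}". Reading whole records and stripping the prefix with a parameter expansion, rather than splitting on IFS, preserves leading and trailing whitespace in filenames.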

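# GNU find + Bash w/ arrays (To the nearest 1s, using an undocumented "find -printf" format (%Ts).)
# The subscript is expanded when read assigns its variable, so the nested read
# consumes the timestamp that follows each filename and uses it as the array index.
while IFS= read -rd '' 'latest[$(read -rd "" y; echo $y)]'
    do :
done < <(find "$dir" -type f -printf '%p\0%Ts\0')
latest=${latest[-1]}   # highest index = newest timestamp (negative subscripts need Bash 4.2+)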

# GNU stat + Bash w/ arrays (non-recursive w/o globstar, to the nearest 1s)
while IFS= read -rd '' 'latest[$(read -rd "" y; echo $y)]'
    do :
done < <(stat '--printf=%n\0%Y\0' "$dir"/*)
latest=${latest[-1]}

One disadvantage of these approaches is that the entire list is sorted, which is an O(n log n) operation, whereas simply iterating through the list to find the minimum or maximum timestamp (assuming we want just one file) would be O(n). However, depending on the size of the job, the algorithmic disadvantage of sorting may be negligible in comparison to the overhead of using a shell.

# Bash + GNU find
unset -v latest time
while IFS= read -r -d '' line; do
  t=${line%% *} t=${t%.*}   # truncate fractional seconds
  ((t > time)) && { latest=${line#* } time=$t; }
done < <(find "$dir" -type f -printf '%T@ %p\0')
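
The loop above truncates the fractional seconds. If sub-second precision matters, one possible variation compares the whole seconds first and the fraction second, using only integer arithmetic. This is a sketch, assuming Bash and GNU find, and assuming that %T@ always prints the fraction with the same number of digits; the function name newest_precise is hypothetical, and the result is left in the global variable latest as in the snippet above.

```bash
# Hypothetical helper: sets the global "latest" to the most recently modified
# regular file under $1; returns nonzero if none was found.
# Assumes every %T@ value has a fixed-width fractional part.
newest_precise() {
  local dir=$1 line stamp sec frac best_sec= best_frac=
  latest=
  while IFS= read -r -d '' line; do
    stamp=${line%% *}                    # "seconds.fraction"
    sec=${stamp%.*} frac=${stamp#*.}
    # 10# forces base 10, so a fraction with leading zeros is not read as octal.
    if [[ -z $latest ]] ||
       ((sec > best_sec || (sec == best_sec && 10#$frac > 10#$best_frac))); then
      latest=${line#* } best_sec=$sec best_frac=$frac
    fi
  done < <(find "$dir" -type f -printf '%T@ %p\0')
  [[ -n $latest ]]
}
```

Whether the extra precision is meaningful depends on the timestamp granularity of the filesystem.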

Lastly, here's a more verbose variant for use in a library or .bashrc. It can either print its result (latest '*.log') or assign it directly to a variable (latest '*.log' varname):

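latest() {
    # Inner call (the function invoking itself at the end): unset the caller's
    # locals so that printf -v can reach an outer variable named x or latest.
    if [[ $FUNCNAME == ${FUNCNAME[1]} ]]; then
        unset -v x latest
        printf ${2:+'-v' "$2"} '%s' "$1"
        return
    fi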

    if (($# > 2)); then
        echo $'Usage: latest <glob> <varname>\nError: Takes at most 2 arguments. Glob defaults to *'
        return 1
    fi >&2

    if ! shopt -q nullglob; then
        trap 'shopt -u nullglob; trap - RETURN' RETURN
        shopt -s nullglob
    fi

    local x latest

    for x in ${1-*}; do
        [[ -f $x && $x -nt $latest ]] && latest=$x
    done

    latest "$latest" ${2+"$2"}
}

Readers who are asking this question in order to rotate their log files may wish to look into logrotate(1) instead, if their operating system provides it.


CategoryShell

BashFAQ/003 (last edited 2018-01-19 22:00:52 by GreyCat)