Differences between revisions 1 and 29 (spanning 28 versions)
Revision 1 as of 2007-05-02 23:36:06 (size: 4984; editor: redondos)
Revision 29 as of 2024-10-06 10:58:28 (size: 6078; editor: emanuele6)
Comment: "exec 9>/path/to/lock/file; if ! flock -n 9; then" is not POSIX, because POSIX leaves it unspecified whether fds >= 3 opened with exec are close-on-exec; they are close-on-exec in ksh93, so this doesn't work there.

How can I ensure that only one instance of a script is running at a time (mutual exclusion, locking)?

We need some means of mutual exclusion. One way is to use a "lock": any number of processes can try to acquire the lock simultaneously, but only one of them will succeed.

How can we implement this using shell scripts? Some people suggest creating a lock file and checking for its presence:

    # locking example -- WRONG
    lockfile=/tmp/myscript.lock
    if [ -f "$lockfile" ]
    then                      # lock is already held
        printf >&2 'cannot acquire lock, giving up: %s\n' "$lockfile"
        exit 0
    else                      # nobody owns the lock
        > "$lockfile"         # create the file
        #...continue script
    fi

This example does not work because there is a RaceCondition: a time window between checking for the file and creating it, during which other processes may act. Assume two processes run this code at the same time. Both check whether the lock file exists, and both find that it does not. Now both processes assume they have acquired the lock -- a disaster waiting to happen. We need an atomic check-and-create operation, and fortunately there is one: mkdir, the command to create a directory:

    # locking example -- CORRECT
    # Bourne
    lockdir=/tmp/myscript.lock
    if mkdir -- "$lockdir"
    then    # directory did not exist, but was created successfully
        printf >&2 'successfully acquired lock: %s\n' "$lockdir"
        # continue script
    else
        printf >&2 'cannot acquire lock, giving up on %s\n' "$lockdir"
        exit 0
    fi

Here, even when two processes call mkdir at the same time, at most one of them can succeed. This atomicity of check-and-create is guaranteed at the operating system kernel level.
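To see this atomicity in action, the following sketch starts ten background processes that all try to create the same lock directory at once, then counts how many succeeded. It assumes mktemp(1) is available, and the scratch paths are purely illustrative; on a local filesystem the count should be 1.

```shell
#!/bin/sh
# Sketch: race ten processes for one lock directory and count the winners.
workdir=$(mktemp -d)             # scratch area (assumes mktemp is available)
lockdir=$workdir/lock
: > "$workdir/winners"           # each successful contender appends one line

i=0
while [ "$i" -lt 10 ]; do
    # Background contender: record a win only if its mkdir succeeded.
    ( mkdir -- "$lockdir" 2>/dev/null && echo won >> "$workdir/winners" ) &
    i=$((i + 1))
done
wait                             # wait for every contender to finish

winners=$(grep -c won "$workdir/winners")
printf 'processes that acquired the lock: %s\n' "$winners"
rm -rf -- "$workdir"
```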

Instead of mkdir, we could also have used ln -s, which creates a symbolic link and likewise fails if the link name already exists. A third possibility is to have the program delete a preexisting lock file with rm: only one process's rm can succeed, and the lock is released by recreating the file on exit.
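A minimal sketch of the ln -s variant, written as a function so it can be reused. The lock path and the function name are hypothetical, and using /dev/null as the link target is our own convention rather than part of the technique. As with mkdir, the existence check and the creation happen in a single system call.

```shell
#!/bin/sh
# Sketch: symlink-based lock (hypothetical path and function name).
lockfile=/tmp/myscript.lock

acquire_symlink_lock() {
    # ln -s without -f fails if $lockfile already exists. The target is
    # /dev/null so that an existing lock link can never resolve to a
    # directory; if it did, ln would create a new link inside that
    # directory and succeed instead of failing.
    if ln -s /dev/null "$lockfile" 2>/dev/null; then
        trap 'rm -f -- "$lockfile"' 0   # rm removes the link, not /dev/null
        return 0
    fi
    return 1
}
```

Typical use mirrors the mkdir example: call acquire_symlink_lock in an if, and give up with a message when it fails.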

Note that we cannot use mkdir -p to automatically create missing path components: mkdir -p does not return an error if the directory already exists, and that error is exactly what we rely upon to ensure mutual exclusion.
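The difference is easy to check directly. This sketch (assuming mktemp(1) is available; the directory name is arbitrary) records the exit status of plain mkdir and of mkdir -p when the directory already exists:

```shell
#!/bin/sh
# Sketch: compare mkdir and mkdir -p on a directory that already exists.
dir=$(mktemp -d)/lock

mkdir -- "$dir";               first=$?    # creates it: 0
mkdir -- "$dir" 2>/dev/null;   second=$?   # already exists: non-zero
mkdir -p -- "$dir";            third=$?    # already exists, yet 0

printf 'mkdir: %s, mkdir again: %s, mkdir -p again: %s\n' \
    "$first" "$second" "$third"
rm -rf -- "${dir%/lock}"                   # clean up the scratch directory
```

Only the second status is non-zero, which is why mkdir -p cannot serve as a lock.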

Now let's spice up this example by automatically removing the lock when the script finishes:

    # POSIX (maybe Bourne?)
    lockdir=/tmp/myscript.lock
    if mkdir -- "$lockdir"
    then
        printf >&2 'successfully acquired lock\n'

        # Remove lockdir when the script finishes, or when it receives a signal
        trap 'rm -rf -- "$lockdir"' 0    # remove directory when script finishes
        trap 'exit 2' 1 2 3 15           # on HUP, INT, QUIT or TERM, exit (which runs the trap above)

        # Optionally create temporary files in this directory, because
        # they will be removed automatically:
        tmpfile=$lockdir/filelist

    else
        printf >&2 'cannot acquire lock, giving up on %s\n' "$lockdir"
        exit 0
    fi

This example is much better. There is still the problem that a stale lock could remain if the script is killed by a signal it does not catch (including signal 9, SIGKILL, which cannot be caught at all), and that a lock could be created by a user, either accidentally or maliciously. Still, it's a good step towards reliable mutual exclusion. Charles Duffy has contributed an example (see BashFAQ/045/contrib) that may remedy the "stale lock" problem.
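One common way to make stale locks easier to diagnose is to record the holder's PID inside the lock directory. The sketch below is not Charles Duffy's contributed code; it is a hypothetical illustration whose function names and pid file name are assumptions. It only reports whether the recorded process still seems to exist and deliberately never removes the lock, because naively breaking a lock reintroduces a race between the processes doing the breaking. PIDs are also reused, so "alive" is a hint, not proof.

```shell
#!/bin/sh
# Sketch: record the holder's PID in the lock directory, and report
# (but never break) a lock whose holder seems to be gone.
lockdir=/tmp/myscript.lock    # hypothetical path

acquire_lock() {
    mkdir -- "$lockdir" 2>/dev/null || return 1
    printf '%s\n' "$$" > "$lockdir/pid"     # hypothetical file name
    trap 'rm -rf -- "$lockdir"' 0
}

lock_holder_status() {
    pid=$(cat -- "$lockdir/pid" 2>/dev/null)
    if [ -z "$pid" ]; then
        echo unknown    # no PID recorded (yet): holder may be between mkdir and printf
    elif kill -0 "$pid" 2>/dev/null; then
        echo alive      # kill -0 sends no signal; it only checks the PID
    else
        echo stale      # caveat: also printed if the PID belongs to another user
    fi
}
```

A script would call acquire_lock, and on failure include "$(lock_holder_status)" in its error message so that a human can decide whether to remove the lock.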

If you're using a GNU/Linux distribution, you can also use flock(1) from util-linux, which places an advisory lock on a file through an open FileDescriptor. There are multiple ways to use it; one way to solve the multiple-instance problem is:

    # Bash. Not POSIX: POSIX leaves it unspecified whether fds >= 3 opened with exec
    # stay open in other utilities (such as flock); ksh93 marks them close-on-exec,
    # so this doesn't work there.
    exec 9>/path/to/lock/file
    if ! flock -n 9; then
        printf >&2 'another instance is running\n'
        exit 1
    fi
    # this now runs under the lock until fd 9 is closed (it will be closed automatically when the script ends)

flock can also be used to protect only part of a script; see the man page (https://www.man7.org/linux/man-pages/man1/flock.1.html) for more information.
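As an illustration, the following sketch follows the subshell pattern shown in the flock(1) man page: only the commands inside the parentheses run under the lock, and the lock is released when the subshell exits. It assumes a Linux system with util-linux flock; the lock path and the critical_section function name are hypothetical.

```shell
#!/bin/sh
# Sketch: lock only part of a script (assumes util-linux flock).
lockfile=/tmp/myscript-partial.lock    # hypothetical path

critical_section() {
    # The redirection 9>"$lockfile" belongs to the subshell, so fd 9 is
    # opened there, inherited by flock, and closed when the subshell ends.
    (
        flock -n 9 || { printf >&2 'lock is busy, skipping\n'; exit 1; }
        printf 'doing protected work\n'
    ) 9>"$lockfile"
}

critical_section
printf 'unprotected work continues here\n'
```

The exit 1 inside the parentheses leaves only the subshell, so the caller can test critical_section's status and decide whether to retry or carry on.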

Discussion

Alternative Solution

I believe using if (set -C; : >"$lockfile"); then ... is equally safe, if not safer. The Bash source uses open(filename, flags|O_EXCL, mode); which should be atomic on almost all platforms (with the exception of some versions of NFS, where mkdir may not be atomic either). I haven't traced the path of the flags variable, which must contain O_CREAT, nor have I looked at any other shells. I wouldn't suggest using this until someone else can back up my claims. --Andy753421

  • Using set -C does not work with ksh88: ksh88 does not use O_EXCL when you set noclobber (-C). --jrw32982

    Are you sure mkdir has problems with being atomic on NFS? I thought that affected only open, but I'm not really sure. -- BeJonas 2008-07-24 01:22:59
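For concreteness, here is a minimal sketch of the noclobber idea from this thread. It assumes a shell whose noclobber really uses O_EXCL (per the comments above, bash does and ksh88 does not) and a local, non-NFS filesystem; the lock path and function name are hypothetical.

```shell
#!/bin/sh
# Sketch: noclobber-based lock. Assumes noclobber is implemented with
# O_EXCL (true of bash per the discussion, not of ksh88) and a local
# filesystem.
lockfile=/tmp/myscript-noclobber.lock    # hypothetical path

acquire_noclobber_lock() {
    # set -C applies only inside the subshell, so the rest of the script
    # keeps normal ">" behaviour. The redirection fails if the file exists.
    if ( set -C; printf '%s\n' "$$" > "$lockfile" ) 2>/dev/null; then
        trap 'rm -f -- "$lockfile"' 0    # release the lock on exit
        return 0
    fi
    return 1
}
```

Writing $$ into the file costs nothing and records which process holds the lock.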

Removal of locking mechanism

Shouldn't the example code blocks above include a rm "$lockfile" or rmdir "$lockdir" directly after the #...continue script line? - AnthonyGeoghegan

  • The lock can't be safely removed while the script is still doing its work -- that would allow another instance to run. The longer example includes a trap that removes the lock when the script exits.

flock file descriptor uniqueness

The example uses file descriptor 9 with flock, i.e.

    exec 9>/path/to/lock/file
    if ! flock -n 9...

Note that file descriptors are unique per process. FDs 0, 1, and 2 are used for stdin, stdout, and stderr, so picking a reasonably high value is sufficient. (source: http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/fdescript.htm )

However, what if this file descriptor is already in use by a completely different process? Are we then locking on the file descriptor and not the lock file? How can we ensure we use something that is not already being used?
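A short sketch that speaks to these questions, assuming bash 4.1 or later and util-linux flock (the lock path is hypothetical). The lock belongs to the open file, not to the descriptor number, so another process using its own fd 9 does not interfere; and bash's {varname} redirection lets the shell pick an unused descriptor instead of hard-coding one.

```shell
#!/bin/bash
# Sketch: assumes bash >= 4.1 (for {varname} redirections) and flock(1).
lockfile=/tmp/myscript-fd.lock    # hypothetical path

# Let bash allocate an unused descriptor (10 or higher) and store its
# number in $lockfd, instead of hard-coding 9.
exec {lockfd}>"$lockfile"
if flock -n "$lockfd"; then held_by_us=yes; else held_by_us=no; fi

# A child process that opens the same file on its own fd 9 is still
# refused: the lock is attached to the file, not to the number 9.
if ( exec 9>"$lockfile"; flock -n 9 ); then
    other_got_lock=yes
else
    other_got_lock=no
fi

# Closing our descriptor releases the lock, so a new attempt succeeds.
exec {lockfd}>&-
if ( exec 9>"$lockfile"; flock -n 9 ); then
    after_close=yes
else
    after_close=no
fi

printf 'we hold it: %s; other fd 9: %s; after close: %s\n' \
    "$held_by_us" "$other_got_lock" "$after_close"
```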

For more discussion on these issues, see ProcessManagement.


CategoryShell

BashFAQ/045 (last edited 2024-10-06 10:58:28 by emanuele6)