I'm getting "Argument list too long". How can I process a large list in chunks?

First, let's review some background material. When a process wants to run another process, it fork()s a child, and the child calls one of the exec* family of system calls (e.g. execve()), giving the name or path of the new process's program file; the name of the new process; the list of arguments for the new process; and, in some cases, a set of environment variables. Thus:

    /* C */
    execlp("ls", "ls", "-l", "dir1", "dir2", (char *) NULL);

There is (generally) no limit to the number of arguments that can be passed this way, but on most systems, there is a limit to the total size of the list. For more details, see http://www.in-ulm.de/~mascheck/various/argmax/ .
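
On most systems you can query that limit with the standard getconf utility:

    # Show the system's limit (in bytes) on the combined size of the new
    # process's argument list and environment variables.
    getconf ARG_MAX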

If you try to pass too many filenames (for instance) in a single program invocation, you'll get something like:

    $ grep foo /usr/include/sys/*.h
    bash: /usr/bin/grep: Arg list too long

There are various tricks you could use to work around this in an ad hoc manner (change directory to /usr/include/sys first, and use grep foo *.h to shorten the length of each filename...), but what if you need something absolutely robust?
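
For instance, the ad hoc workaround mentioned above might look like this; it merely shortens each argument, it does not remove the underlying limit:

    # Shorten each argument by dropping the common directory prefix.
    cd /usr/include/sys && grep foo *.h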

Some people like to use xargs here, but it has some serious issues. It treats whitespace and quote characters in its input as word delimiters, making it incapable of handling filenames properly. (See UsingFind for a discussion of this.)
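
As a quick illustration (using a hypothetical filename containing a space), xargs splits its input on that space and hands grep two separate arguments:

    # "sys types.h" is split into the two arguments "sys" and "types.h",
    # so grep looks for two files that don't exist.
    printf '%s\n' 'sys types.h' | xargs grep foo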

The most robust alternative is to use a Bash array and a loop to process the array in chunks:

    # Bash
    files=(/usr/include/*.h /usr/include/sys/*.h)
    for ((i=0; i<${#files[*]}; i+=100)); do
      # /dev/null ensures grep always sees at least two files,
      # so every match is prefixed with its filename.
      grep foo "${files[@]:i:100}" /dev/null
    done

Here, we've chosen to process 100 elements at a time; this is arbitrary, of course, and you could set it higher or lower depending on the anticipated size of each element vs. the target system's getconf ARG_MAX value. If you want to get fancy, you could do arithmetic using ARG_MAX and the size of the largest element, but you still have to introduce "fudge factors" for the size of the environment, etc. It's easier just to choose a conservative value and hope for the best.
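
If you do want to attempt the "fancy" arithmetic, a rough sketch might look like the following; the fudge factor and variable names are illustrative only, and it assumes the files array built in the loop above:

    # Rough chunk-size calculation based on ARG_MAX and the longest filename.
    max=0
    for f in "${files[@]}"; do
      (( ${#f} > max )) && max=${#f}   # length of the longest filename
    done
    fudge=4096                         # reserve room for the environment, etc.
    chunk=$(( ($(getconf ARG_MAX) - fudge) / (max + 1) ))  # +1 for each argument's terminating NUL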
