Category: Parallel

Writing to the Same File from Multiple Streams with the GNU C Library

Types of Channels

(1) You can have multiple file descriptors and streams (let’s call both streams and descriptors “channels” for short) connected to the same file. [1]

(2) There are two cases to consider: linked channels that share a single file position value, and independent channels that have their own file positions. [1]

(3) It’s impossible for two channels to have separate file pointers for a file that doesn’t support random access. Thus, channels for reading or writing such files are always linked, never independent. [2]

(4) Append-type channels are also always linked. For these channels, follow the rules for linked channels. [2]

Independent Channels

(1) You should clean an output stream after use, before doing anything else that might read or write from the same part of the file. [2]

This implies that multiple independent channels may each write to different parts of the same file at the same time, as long as each one is cleaned after use. For an output stream, cleaning in most cases means calling fflush on it.

(2) You should clean an input stream before reading data that may have been modified using an independent channel. Otherwise, you might read obsolete data that had been in the stream’s buffer. [2]

Reference:

[1] http://www.gnu.org/software/libc/manual/html_node/Stream_002fDescriptor-Precautions.html#Stream_002fDescriptor-Precautions

[2] http://www.gnu.org/software/libc/manual/html_node/Independent-Channels.html#Independent-Channels

Use xargs with functions

Example:

—————————-
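worker () {
    echo "$1"
}

export -f worker # Required: makes the function visible to the child bash started by xargs

# -I {} already runs one command per input line, so -L 1 is not needed
# (GNU xargs treats -L and -I as mutually exclusive).
cat inputList.txt | xargs -I {} -P 4 bash -c 'worker "$1"' _ {}
# Use bash -c 'CommandToExecute'. Passing {} as a positional argument ($1)
# keeps input lines with spaces or quotes intact.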

—————————-
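As a self-contained sketch of the pattern above, the script below can be run as-is. The worker body, the three sample items, and the temporary input file are assumptions made up for illustration. The item is passed to the child bash as a positional argument rather than spliced into the command string.

```shell
#!/usr/bin/env bash
# Illustrative worker: the real one would do the actual per-item job.
worker() {
    printf 'processed %s\n' "$1"
}
export -f worker   # child bash processes import the function from the environment

# Hypothetical input list, one item per line.
input=$(mktemp)
printf '%s\n' alpha beta gamma > "$input"

# Run up to 4 workers at once. With -P the completion order is not
# guaranteed, so the output is sorted to make it deterministic.
result=$(xargs -I {} -P 4 bash -c 'worker "$1"' _ {} < "$input" | sort)
echo "$result"

rm -f "$input"
```

Because the workers run concurrently, anything that depends on output order should sort or otherwise reorder the results afterwards, as done here.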

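This trick also works with the find command:
find . -name '*.txt' -exec bash -c 'worker "$1"' _ {} \;
# Do not forget the terminating \; for the -exec option of find.
# Keep {} as a separate argument: whether find expands {} inside a longer string is implementation-defined.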
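
The -exec form runs one worker at a time. To run them in parallel, the output of find can be piped to xargs instead. This is a minimal sketch, assuming a find and xargs that support -print0 and -0 (the GNU and BSD versions do). The temporary directory, the file names, and the worker body are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative worker: prints the file name without its directory part.
worker() {
    printf 'found %s\n' "${1##*/}"
}
export -f worker

# Hypothetical directory with two matching files and one non-matching file.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/notes.log"

# -print0 / -0 separate file names with NUL bytes, so names containing
# spaces or newlines are handled correctly. Output is sorted because
# parallel workers may finish in any order.
found=$(find "$dir" -name '*.txt' -print0 |
    xargs -0 -P 4 -I {} bash -c 'worker "$1"' _ {} | sort)
echo "$found"

rm -rf "$dir"
```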