8.0 Collective Communication

Collective communication involves the transfer of data among all processes in a given communicator; the default communicator, which contains all available processes, is called MPI_COMM_WORLD. When a collective call is made, it must be called by every process in the communicator. Collective communication will not interfere with point-to-point communication, and point-to-point communication will not interfere with collective communication. Collective calls also do not use tags. The send and receive buffers used in a collective call must match in order for the call to work, and there is no guarantee that a collective routine is synchronizing (except for barrier). Also, all of the collective communication operations described here are blocking. Keep these points in mind while using collective communication operations.

8.1 Types of collective communication

Collective communication operations are made of the following types:

  1. Barrier Synchronization – Blocks until all processes have reached a synchronization point

  2. Data Movement (or Global Communication) – Broadcast, Scatter, Gather, and All-to-All transmission of data across the communicator.

  3. Collective Operations (or Global Reduction) – One process from the communicator collects data from each process and performs an operation on that data to compute a result.

8.2 Barrier Synchronization

8.2.1 MPI_Barrier

MPI_Barrier( MPI_Comm comm );

Parameter Meaning of Parameter
comm communicator (handle)

MPI_Barrier blocks until all processes in the communicator have reached this routine.
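
Here is a minimal sketch of how a barrier might be used to separate two phases of a program; the phase names and printed messages are made up purely for illustration:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Rank %d finished phase 1\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);   /* no rank starts phase 2 until all ranks reach here */
    printf("Rank %d starting phase 2\n", rank);

    MPI_Finalize();
    return 0;
}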

8.3 Data Movement (or Global Communication)

8.3.1 MPI_Bcast

MPI_Bcast( void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm );

Parameter Meaning of Parameter
buffer starting address of buffer (choice)
count number of entries in buffer (integer)
datatype datatype of buffer (handle)
root rank of broadcast root (integer)
comm communicator (handle)

MPI_Bcast broadcasts a message from the process with rank "root" to all other processes of the group.
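
A minimal sketch of MPI_Bcast, assuming rank 0 is the root; the value 42 and the choice of broadcasting a single int are arbitrary for illustration:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42;                      /* only the root has the data initially */

    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d now has value %d\n", rank, value);   /* every rank prints 42 */

    MPI_Finalize();
    return 0;
}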

8.3.2 MPI_Scatter

MPI_Scatter( void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm );

Parameter Meaning of Parameter
sendbuf address of send buffer (choice, significant only at root)
sendcnt number of elements sent to each process (integer, significant only at root)
sendtype data type of send buffer elements (significant only at root) (handle)
recvbuf address of receive buffer (choice)
recvcnt number of elements in receive buffer (integer)
recvtype data type of receive buffer elements (handle)
root rank of sending process (integer)
comm communicator (handle)

MPI_Scatter distributes distinct blocks of data from the root task to every task in the group (including the root).
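
A minimal sketch of MPI_Scatter with rank 0 as the root; the data values (10 * i) are arbitrary for illustration. Note that sendbuf only needs to be allocated on the root:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, mine;
    int *sendbuf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                        /* sendbuf is significant only at root */
        sendbuf = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++)
            sendbuf[i] = 10 * i;            /* block i goes to rank i */
    }

    MPI_Scatter(sendbuf, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d received %d\n", rank, mine);

    if (rank == 0)
        free(sendbuf);
    MPI_Finalize();
    return 0;
}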

8.3.3 MPI_Gather

MPI_Gather( void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm );

Parameter Meaning of Parameter
sendbuf starting address of send buffer (choice)
sendcount number of elements in send buffer (integer)
sendtype data type of send buffer elements (handle)
recvbuf address of receive buffer (choice, significant only at root)
recvcount number of elements for any single receive (integer, significant only at root)
recvtype data type of receive buffer elements (significant only at root) (handle)
root rank of receiving process (integer)
comm communicator (handle)

MPI_Gather gathers together values from a group of processes.
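
A minimal sketch of MPI_Gather, the inverse of the scatter example above: each rank contributes its rank number and rank 0 (the assumed root) collects them into one array ordered by rank:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size;
    int *recvbuf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)                          /* recvbuf is significant only at root */
        recvbuf = malloc(size * sizeof(int));

    MPI_Gather(&rank, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("recvbuf[%d] = %d\n", i, recvbuf[i]);
        free(recvbuf);
    }

    MPI_Finalize();
    return 0;
}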

8.3.4 MPI_Allgather

MPI_Allgather( void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm );

Parameter Meaning of Parameter
sendbuf starting address of send buffer (choice)
sendcount number of elements in send buffer (integer)
sendtype data type of send buffer elements (handle)
recvbuf address of receive buffer (choice)
recvcount number of elements received from any process (integer)
recvtype data type of receive buffer elements (handle)
comm communicator (handle)

MPI_Allgather gathers data from all tasks and distributes the combined data to all tasks.
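
A minimal sketch of MPI_Allgather, mirroring the MPI_Gather example: every rank ends up with the full array, so there is no root argument and the receive buffer must be allocated on every rank:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *recvbuf = malloc(size * sizeof(int));   /* needed on every rank */

    MPI_Allgather(&rank, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    printf("Rank %d sees:", rank);
    for (int i = 0; i < size; i++)
        printf(" %d", recvbuf[i]);
    printf("\n");

    free(recvbuf);
    MPI_Finalize();
    return 0;
}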

8.4 Collective Operations (or Global Reduction)

8.4.1 MPI_Reduce

MPI_Reduce( void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm );

Parameter Meaning of Parameter
sendbuf address of send buffer (choice)
recvbuf address of receive buffer (choice, significant only at root)
count number of elements in send buffer (integer)
datatype data type of elements in send buffer (handle)
op reduction operation (handle)
root rank of root process (integer)
comm communicator (handle)

MPI_Reduce reduces values on all processes to a single value on the root process. For example, with the operation MPI_SUM, the result is the element-wise sum of the send buffers of all processes.

The op handle can be one of many predefined operations provided by MPI, or it can be a user-defined operation (see section 8.5). The following table lists the predefined reduction operations available for you to use:

MPI Reduction Operation   Meaning                        C Data Types
MPI_MAX                   Maximum                        integer, float
MPI_MIN                   Minimum                        integer, float
MPI_SUM                   Sum                            integer, float
MPI_PROD                  Product                        integer, float
MPI_LAND                  Logical AND                    integer
MPI_BAND                  Bitwise AND                    integer, MPI_BYTE
MPI_LOR                   Logical OR                     integer
MPI_BOR                   Bitwise OR                     integer, MPI_BYTE
MPI_LXOR                  Logical XOR                    integer
MPI_BXOR                  Bitwise XOR                    integer, MPI_BYTE
MPI_MAXLOC                Maximum value and location     float, double and long double
MPI_MINLOC                Minimum value and location     float, double and long double
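
A minimal sketch of MPI_Reduce using MPI_SUM: each rank contributes its rank number, and rank 0 (the assumed root) receives the sum 0 + 1 + ... + (size - 1):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)                      /* recvbuf is significant only at root */
        printf("Sum of all ranks = %d\n", sum);

    MPI_Finalize();
    return 0;
}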

8.4.2 MPI_Allreduce

MPI_Allreduce( void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm );

Parameter Meaning of Parameter
sendbuf address of send buffer (choice)
recvbuf starting address of receive buffer (choice)
count number of elements in send buffer (integer)
datatype data type of elements in send buffer (handle)
op operation (handle)
comm communicator (handle)

MPI_Allreduce combines values from all processes and distributes the result back to all processes. As with MPI_Reduce, a common choice of operation is MPI_SUM.
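
A minimal sketch of MPI_Allreduce performing the same MPI_SUM reduction as the MPI_Reduce example, except that every rank receives the result and no root argument is needed:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("Rank %d: sum of all ranks = %d\n", rank, sum);

    MPI_Finalize();
    return 0;
}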

8.4.3 MPI_Reduce_scatter

MPI_Reduce_scatter( void *sendbuf, void *recvbuf, int *recvcounts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm );

Parameter Meaning of Parameter
sendbuf address of send buffer (choice)
recvbuf starting address of receive buffer (choice)
recvcounts integer array specifying the number of elements in result distributed to each process. Array must be identical on all calling processes.
datatype data type of elements of input buffer (handle)
op operation (handle)
comm communicator (handle)

MPI_Reduce_scatter combines values and scatters the results: an element-wise reduction is performed across all processes, and the result is then split among the processes according to recvcounts. A common choice of operation is MPI_SUM.
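
A minimal sketch of MPI_Reduce_scatter with MPI_SUM: each rank supplies size values, the element-wise sums are computed, and element i of the summed result is delivered to rank i (recvcounts[i] = 1 for every rank here). The per-rank input values are arbitrary:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, myresult;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendbuf    = malloc(size * sizeof(int));
    int *recvcounts = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) {
        sendbuf[i]    = rank + i;   /* arbitrary per-rank data */
        recvcounts[i] = 1;          /* one element of the result goes to each rank */
    }

    MPI_Reduce_scatter(sendbuf, &myresult, recvcounts, MPI_INT, MPI_SUM,
                       MPI_COMM_WORLD);

    printf("Rank %d received reduced element %d\n", rank, myresult);

    free(sendbuf);
    free(recvcounts);
    MPI_Finalize();
    return 0;
}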

8.4.4 MPI_Scan

MPI_Scan( void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm );

Parameter Meaning of Parameter
sendbuf starting address of send buffer (choice)
recvbuf starting address of receive buffer (choice)
count number of elements in input buffer (integer)
datatype data type of elements of input buffer (handle)
op operation (handle)
comm communicator (handle)

MPI_Scan computes the scan (partial reductions) of data on a collection of processes. With the operation MPI_SUM, for example, each process receives the sum of the values contributed by itself and all lower-ranked processes.
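
A minimal sketch of MPI_Scan with MPI_SUM: rank i receives the inclusive prefix sum 0 + 1 + ... + i of the rank numbers:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, prefix = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Scan(&rank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("Rank %d: partial sum 0+...+%d = %d\n", rank, rank, prefix);

    MPI_Finalize();
    return 0;
}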

8.5 User defined operations for MPI_Reduce and MPI_Scan

MPI gives us the ability to create our own operations that can be used with the MPI_Reduce or MPI_Scan calls. To do this, you must declare a C function using this prototype:

typedef void MPI_User_function( void *invec, void *inoutvec, int *len, MPI_Datatype *datatype );

The function must combine the len elements of invec with the corresponding elements of inoutvec and store the result back into inoutvec. Then you need to create a handle for the function using MPI_Op_create; here are the details of the call:

8.5.1 MPI_Op_create

Make sure you declare a variable for the handle before calling MPI_Op_create, for example:

MPI_Op myOp;

MPI_Op_create( MPI_User_function *function, int commute, MPI_Op *myOp );

Parameter Meaning of Parameter
function user defined function (function)
commute true if commutative; false otherwise.
myOp operation (handle)

MPI_Op_create creates a user-defined combination function handle.

This new handle (the op variable) can now be used in calls to MPI_Reduce and MPI_Scan.

8.5.2 MPI_Op_free

MPI_Op_free( MPI_Op *op );

Parameter Meaning of Parameter
op operation (handle)

MPI_Op_free frees a user-defined combination function handle.
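
Putting section 8.5 together, here is a minimal sketch of a user-defined operation: an element-wise integer maximum (re-implementing what MPI_MAX already provides, purely as an illustration), registered with MPI_Op_create, used in MPI_Reduce, and released with MPI_Op_free. The per-rank values are arbitrary:

#include <mpi.h>
#include <stdio.h>

/* Must match the MPI_User_function prototype; the result accumulates in inoutvec. */
void my_max(void *invec, void *inoutvec, int *len, MPI_Datatype *datatype)
{
    int *in    = (int *)invec;
    int *inout = (int *)inoutvec;
    (void)datatype;                     /* all data in this example is MPI_INT */
    for (int i = 0; i < *len; i++)
        if (in[i] > inout[i])
            inout[i] = in[i];
}

int main(int argc, char *argv[])
{
    int rank, result;
    MPI_Op myOp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Op_create(my_max, 1, &myOp);    /* commute = 1: the operation is commutative */

    int value = (rank * 7) % 5;         /* arbitrary per-rank value */
    MPI_Reduce(&value, &result, 1, MPI_INT, myOp, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Maximum value = %d\n", result);

    MPI_Op_free(&myOp);                 /* free the user-defined operation handle */

    MPI_Finalize();
    return 0;
}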