This example shows a common approach for having all processors act on all
data when each processor holds only part of the data. Each processor passes
a piece of data to the next processor in line, starting with the piece that
it owns, until every piece has visited every processor. If this is done
using MPI_Isend and MPI_Irecv, with the MPI_Wait calls placed after the
computation, the communication can overlap with the computation. However,
this can cause problems for MPI implementations that use a rendezvous
protocol with sender push and that poll for MPI activity: such an
implementation makes progress only inside MPI calls, so the transfer may not
actually start until MPI_Wait is reached, and the overlap is lost.
This example is easier to describe as code; feel free to look at the solution
and the log files. To simplify interpretation, the code prints the time taken
by the communication alone, by the computation alone, and by both together.
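
The circulation pattern itself can be sketched without MPI. The following is
a minimal serial simulation in Python, not the exercise's solution: the
function name ring_circulate and the use of a plain list to stand in for the
processors are assumptions made for illustration. It shows only the order in
which pieces arrive at each rank. In the real code, each shift would be an
MPI_Isend/MPI_Irecv pair posted before the computation and completed with
MPI_Wait after it.

```python
def ring_circulate(local_data):
    """Serially simulate passing data around a ring of processors.

    local_data[r] is the piece initially held by rank r (hypothetical
    stand-in for each processor's local buffer). Returns a list where
    entry r is the sequence of pieces that rank r worked on, in order.
    """
    nprocs = len(local_data)
    current = list(local_data)            # piece each rank holds right now
    seen = [[] for _ in range(nprocs)]    # pieces each rank has acted on

    for _ in range(nprocs):
        # "Computation": every rank acts on the piece it currently holds.
        for rank in range(nprocs):
            seen[rank].append(current[rank])

        # "Communication": rank r sends to (r + 1) % nprocs, so rank r
        # receives the piece that rank (r - 1) % nprocs was holding.
        # The shift after the last step is redundant and simply returns
        # every piece to its owner.
        current = [current[(rank - 1) % nprocs] for rank in range(nprocs)]

    return seen


if __name__ == "__main__":
    for rank, pieces in enumerate(ring_circulate(["a", "b", "c", "d"])):
        print(rank, pieces)
```

With four ranks, rank 0 works on a, d, c, b in that order: its own piece
first, then the pieces arriving from its predecessor in the ring. After as
many steps as there are ranks, every rank has acted on every piece.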