The performance of this example depends on both the MPI implementation and the
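size of the problem. Because blocking sends and receives are used, for a
large enough problem (one whose messages the MPI implementation does not
buffer), every send except the one addressed to MPI_PROC_NULL (at the top or
bottom edge) will block until the neighboring send closer to that edge
completes and its process posts the matching receive. The sends therefore
complete one after another, starting at the edge; this produces a ripple
pattern in the sends and receives.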