Use MPI_Wtime to benchmark the performance of the system memcpy routine on
your system. Generate a table for 1, 2, 4, 8, ..., 524288 integers showing
the number of bytes, time to copy, and the rate in megabytes per second.
Use unaligned data items; that is, make sure that the low-order bits of the
source and destination addresses are different. Also, ensure that the
source and destination are "well separated" in memory.
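
One way to meet both requirements is to carve the source and destination out of a single allocation. The sketch below is only an illustration: the helper name make_unaligned_pair, the 1 MiB gap, and the byte offsets 1 and 3 are assumptions, not part of the exercise or its sample solution. It relies on the size in bytes being a multiple of sizeof(int), so the two byte offsets leave the low-order two address bits different.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical constants, not taken from the exercise: a 1 MiB gap keeps
   the two regions well apart, and the odd byte offsets give the source and
   destination different low-order address bits. */
#define GAP_BYTES  ((size_t)1 << 20)
#define SRC_OFFSET 1
#define DST_OFFSET 3

/* Allocate one block and place a source and a destination region of
   nbytes each inside it. Returns the block (pass it to free when done),
   or NULL if the allocation fails. */
static char *make_unaligned_pair(size_t nbytes, char **src, char **dst)
{
    /* The offsets only differ modulo 4 if nbytes is a multiple of 4. */
    assert(nbytes % sizeof(int) == 0);

    char *block = malloc(2 * nbytes + GAP_BYTES + 16);
    if (block == NULL)
        return NULL;

    *src = block + SRC_OFFSET;
    *dst = block + nbytes + GAP_BYTES + DST_OFFSET;
    return block;
}
```

Because memcpy works on bytes, the misaligned char pointers can be passed to it directly; they should not be dereferenced as int pointers.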
You should perform enough memcpy operations to take a good fraction of a
second; the sample solution does 100000/size iterations for size integers.
It also repeats the test 10 times and reports the best time.