Use MPI_Wtime to benchmark the performance of the system memcpy routine on your machine. Generate a table for 1, 2, 4, 8, ..., 524288 integers showing the number of bytes, the time to copy them, and the rate in megabytes per second. Use unaligned data items; that is, make sure that the low-order bits of the source and destination addresses are different. Also ensure that the source and destination are "well separated" in memory.

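Perform enough memcpy operations at each size that the test takes a good fraction of a second. The sample solution does 100000/size iterations for size integers (with integer division this is zero once size exceeds 100000, so at least one iteration must be done). It also repeats each test 10 times and reports the best time.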