
User's Guide for mpich, a Portable Implementation of MPI
Version 1.2.1

William Gropp and Ewing Lusk

This User's Guide corresponds to Version 1.2.1 of mpich. It was processed by LaTeX on Tue Sep 5 14:50:55 2000.

MPI (Message-Passing Interface) is a standard specification for message-passing libraries. mpich is a portable implementation of the full MPI specification for a wide variety of parallel and distributed computing environments. This paper describes how to build and run MPI programs using the mpich implementation of MPI.

Version 1.2.1 of mpich is primarily a bug-fix release that also improves portability, particularly for LINUX-based clusters.

New in 1.2.1:

* Improved support for assorted Fortran and Fortran 90 compilers. In particular, a single version of MPICH can now be built to use several different Fortran compilers; see the installation manual (in doc/install.ps.gz) for details.
* It is now also easier to compile MPI programs with a C compiler different from the one used to build MPICH; see the installation manual.
* Significant upgrades have been made to the MPD system of daemons that provide fast startup of MPICH jobs, management of stdio, and a crude parallel debugger based on gdb. See the README file in the mpich/mpid/mpd directory and the mpich User's Guide for information on how to use the MPD system with mpich.
* The NT version of MPICH has been further enhanced and is available separately; see the MPICH download page http://www.mcs.anl.gov/mpi/mpich/download.html.
* The MPE library for logging and program visualization has been much improved. See the file mpe/README for more details.
* A new version of ROMIO, 1.0.3, is included. See romio/README for details.
* A new version of the C++ interface from the University of Notre Dame is also included.
* Known problems and bugs with this release are documented in the file mpich/KnownBugs.
* There is an FAQ at http://www.mcs.anl.gov/mpi/mpich/faq.html. See this if you get "permission denied", "connection reset by peer", or "poll: protocol failure in circuit setup" when trying to run MPICH.
* There is a paper on Jumpshot available at ftp://ftp.mcs.anl.gov/pub/mpi/jumpshot.ps.gz. A paper on MPD is available at ftp://ftp.mcs.anl.gov/pub/mpd.ps.gz.

Features that were new in 1.2.0 are:
* Full MPI 1.2 compliance, including cancellation of sends.
* IMPI (Interoperable MPI [2]) style flow control.
* A Windows NT version is now available as open source. Installation and use of this version are different; this manual covers only the Unix version of mpich.
* Support for SMP-clusters in mpirun.
* A Fortran 90 MPI module (actually two; see Section Fortran 90 and the MPI module).
* Support for MPI_INIT_THREAD (but only for MPI_THREAD_SINGLE).
* Support for VPATH-style installations, along with an installation process and choice of directory names that are closer to the GNU-recommended approach.
* A new, scalable log file format, SLOG, for use with the MPE logging tools. SLOG files can be read by a new version of Jumpshot which is included with this release.
* Updated ROMIO
* A new device for networked clusters, similar to the p4 device but based on daemons and thus supporting a number of new convenience features, including fast startup. See Section Fast Startup with the Multipurpose Daemon and the ch_p4mpd Device for details.

Features that were new in 1.1.1 are:
* The ROMIO subsystem implements a large part of the MPI-2 standard for parallel I/O. For details on what types of file systems ROMIO runs on and its current limitations, see the ROMIO documentation in romio/doc.
* The MPI-2 standard C++ bindings are available for the MPI-1 functions.
* A new Globus device, globus2, is available. It replaces the previous globus device. See Section Computational Grids: the globus2 device and Appendix mpirun and Globus.
* A new program visualization tool, Jumpshot, is available as an alternative to the upshot and nupshot programs.


Contents

  • Introduction
  • Linking and running programs
  • Scripts to Compile and Link Applications
  • Fortran 90 and the MPI module
  • Compiling and Linking without the Scripts
  • Running with mpirun
  • SMP Clusters
  • Multiple Architectures
  • More detailed control
  • Special features of different systems
  • Workstation clusters
  • Checking your machines list
  • Using the Secure Shell
  • Using the Secure Server
  • Heterogeneous networks and the ch_p4 device
  • Environment Variables used by P4
  • Using special interconnects
  • Using Shared Libraries
  • Fast Startup with the Multipurpose Daemon and the ch_p4mpd Device
  • Goals
  • Introduction
  • Examples
  • How the Daemons Work
  • Debugging
  • Computational Grids: the globus2 device
  • MPPs
  • IBM SP
  • Intel Paragon
  • Symmetric Multiprocessors (SMPs)
  • Sample MPI programs
  • The MPE library of useful extensions
  • Logfile Creation
  • Logfile Format
  • Parallel X Graphics
  • Other MPE Routines
  • Profiling Libraries
  • Accumulation of Time Spent in MPI routines
  • Automatic Logging
  • Customized Logging
  • Real-Time Animation
  • Logfile Viewers
  • Upshot and Nupshot
  • Jumpshot-2 and Jumpshot-3
  • Accessing the profiling libraries
  • Automatic generation of profiling libraries
  • Tools for Profiling Library Management
  • Debugging MPI programs with built-in tools
  • Error handlers
  • Command-line arguments for mpirun
  • MPI arguments for the application program
  • p4 Arguments for the ch_p4 Device
  • p4 Debugging
  • Setting the Working Directory for the p4 Device
  • Command-line arguments for the application program
  • Starting jobs with a debugger
  • Starting the debugger when an error occurs
  • Attaching a debugger to a running program
  • Signals
  • Related tools
  • Contents of the library files
  • Debugging MPI programs with TotalView
  • Preparing mpich for TotalView debugging
  • Starting an mpich program under TotalView control
  • Attaching to a running program
  • Debugging with TotalView
  • Summary
  • Other MPI Documentation
  • In Case of Trouble
  • Problems compiling or linking Fortran programs
  • General
  • Problems Linking C Programs
  • General
  • Sun Solaris
  • HPUX
  • LINUX
  • Problems starting programs
  • General
  • Workstation Networks
  • Intel Paragon
  • IBM RS6000
  • IBM SP
  • Programs fail at startup
  • General
  • Workstation Networks
  • Programs fail after starting
  • General
  • HPUX
  • ch_shmem device
  • LINUX
  • Workstation Networks
  • Trouble with Input and Output
  • General
  • IBM SP
  • Workstation Networks
  • Upshot and Nupshot
  • General
  • HP-UX
  • Appendices
  • Automatic generation of profiling libraries
  • Writing wrapper definitions
  • Options for mpirun
  • mpirun and Globus
  • Using mpirun To Construct An RSL Script For You
  • Using mpirun By Supplying Your Own RSL Script
  • Acknowledgments
  • Bibliography