Starting TotalView Debugging on an MPICH2 MPD Job
The mpd daemon must be running before you start an MPICH2 job that uses the MPD process manager.
NOTE: As of MPICH2 1.4.1, the default job type is Hydra, rather than MPD, so if you are using the default, there is no need to start the daemon. See Starting TotalView Debugging on an MPICH2 Hydra Job.
Starting the MPI MPD Job with MPD Process Manager
To start the mpd daemon, use the mpdboot command. For example:
mpdboot -n 4 -f hostfile
-n 4
The number of hosts on which you wish to run the daemon. In this example, the daemon runs on four hosts.
-f hostfile
Lists the hosts on which the application will run. In this example, a file named hostfile contains this list.
You are now ready to start debugging your application.
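The mpdboot step above can be sketched as a short POSIX shell script that writes the host list and derives a matching command line. This is only an illustration: the host names node01 through node04 are hypothetical, and the sketch assumes the local host is one of the listed hosts so that the line count can be passed to -n. It prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build a host list and derive the matching mpdboot command line.
# node01..node04 are hypothetical host names; substitute your own.
HOSTFILE=hostfile
printf '%s\n' node01 node02 node03 node04 > "$HOSTFILE"

# Count non-empty lines so the value given to -n agrees with the file.
# Assumption: the local host is one of the listed hosts.
NHOSTS=$(grep -c . "$HOSTFILE")

# Compose the command shown in the text and print it for review.
BOOT_CMD="mpdboot -n $NHOSTS -f $HOSTFILE"
echo "$BOOT_CMD"
```

After running the printed command, the MPD utility mpdtrace lists the hosts in the ring, and mpdallexit shuts the daemons down when you are finished.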
Starting an MPICH2 MPD Job
NOTE: In many cases, you can bypass the procedure described in this section. For more information, see Debugging MPI Programs.
Start an MPICH2 MPD job in one of the following ways:
mpiexec mpi-args -tv program -a program-args
This command tells MPI to start TotalView. Before you start a program with mpiexec, set the TOTALVIEW environment variable to the path of the TotalView executable. For example:
setenv TOTALVIEW \
With this method of starting TotalView, you cannot restart your program without exiting TotalView, and you cannot attach to a running MPI job.
totalview python -a `which mpiexec` \
-tvsu mpiexec-args program program-args
This command lets you restart your MPICH2 job. It also lets you attach to a running MPICH2 job by using the Attach to a Running Program dialog box. Take care to attach to the correct instance of python, because several instances are likely to be running. The one you want has no attached children; child processes are indented, with a line showing the connection to the parent.
You may not see your program's source at first. If you do see it, you can set breakpoints. In either case, press the Go button to start your process. When your program goes parallel, TotalView displays a dialog box that lets you stop execution. (This is the default behavior. You can change it on the Parallel page of File > Preferences.)
As with the previous method, you must set the TOTALVIEW environment variable before using this command.
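Because both methods depend on the TOTALVIEW environment variable, a small pre-flight check can catch a missing or incorrect setting before launch. The following POSIX shell sketch is one way to do this; the function name check_totalview, the program name ./my_mpi_app, its argument input.dat, and the choice of "-n 4" as the MPI arguments are all hypothetical. The script prints both launch forms instead of executing them.

```shell
#!/bin/sh
# Sketch: verify TOTALVIEW, then print the two launch forms from the text.
# check_totalview, MPI_ARGS, PROGRAM, and PROGRAM_ARGS are illustrative names.

# Return 0 only when TOTALVIEW names an executable file.
check_totalview() {
    if [ -z "${TOTALVIEW:-}" ]; then
        echo "TOTALVIEW is not set" >&2
        return 1
    fi
    if [ ! -f "$TOTALVIEW" ] || [ ! -x "$TOTALVIEW" ]; then
        echo "TOTALVIEW is not an executable file: $TOTALVIEW" >&2
        return 2
    fi
    return 0
}

MPI_ARGS="-n 4"            # hypothetical mpiexec arguments
PROGRAM=./my_mpi_app       # hypothetical program name
PROGRAM_ARGS=input.dat     # hypothetical program argument

# Method 1: mpiexec starts TotalView (no restart, no attach).
CMD_TV="mpiexec $MPI_ARGS -tv $PROGRAM -a $PROGRAM_ARGS"

# Method 2: TotalView starts mpiexec through python (restart and attach).
# The backticks stay literal here; the shell expands them when the
# printed line is actually run.
CMD_TVSU="totalview python -a \`which mpiexec\` -tvsu $MPI_ARGS $PROGRAM $PROGRAM_ARGS"

if check_totalview; then
    echo "$CMD_TV"
    echo "$CMD_TVSU"
fi
```

The sketch uses POSIX shell syntax; under csh, the variable is set with setenv as shown in the text.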