SGI MPI Applications
In many cases, you can bypass the procedure described in this section. For more information, see “Debugging MPI Programs”.
TotalView can acquire processes started by SGI MPI applications. This MPI is part of the Message Passing Toolkit (MPT) 1.3 and 1.4 packages. TotalView can display the Message Queue Graph Window for these releases. See “Displaying the Message Queue Graph Window” for information on displaying message queues.
Starting TotalView on an SGI MPI Job
You normally start SGI MPI programs by using the mpirun command. You use a similar command to start an MPI program under debugger control, as follows:
{ totalview | totalviewcli } mpirun -a mpirun-command-line
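For example, to start a hypothetical MPI executable named a.out on four processes under GUI control, you might enter:
totalview mpirun -a -np 4 ./a.out
The -np 4 option and the program name are illustrative; everything after -a is simply the mpirun command line you would normally use.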
This invokes TotalView and tells it to show you the machine code for mpirun. Since you’re not usually interested in debugging this code, use the Process > Go command to let the program run.
The SGI MPI mpirun command runs and starts all MPI processes. After TotalView acquires them, it asks if you want to stop them at startup. If you answer Yes, TotalView halts them before they enter the main program. You can then create breakpoints.
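For example, if you are using the CLI and answered Yes, you could set a breakpoint and then continue the job with commands such as the following (the function name main is only an assumption about your program):
dbreak main
dgo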
If you set a verbosity level that allows informational messages, TotalView also prints a message that shows the name of the array and the value of the array services handle (ash) to which it is attaching.
Attaching to an SGI MPI Job
To attach to a running SGI MPI program, attach to the SGI MPI mpirun process that started the program. The procedure for attaching to an mpirun process is the same as that for attaching to any other process.
After you attach to the mpirun process, TotalView asks if you also want to attach to the slave MPI processes. If you do, press Return or choose Yes. If you do not, choose No.
If you choose Yes, TotalView starts the server processes and acquires all of the MPI processes.
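For example, from the CLI you could attach by naming the mpirun executable and its process ID (the PID shown here is hypothetical):
dattach mpirun 12345
TotalView then prompts you about attaching to the parallel processes, as described above.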
As an alternative, you can use the Group > Attach Subset command to predefine what to do.
Using ReplayEngine with SGI MPI
SGI MPI uses the xpmem module to map memory from one MPI process to another during job startup. Memory mapping is enabled by default. Because the amount of mapped memory can be quite large, it can significantly degrade ReplayEngine performance, so TotalView turns memory mapping off by default when Replay is enabled. It does this by setting the environment variable MPI_MEMMAP_OFF to 1 in the replay_env: specification of the TotalView file parallel_support.tvd, as follows:
replay_env: MPI_MEMMAP_OFF=1
If full memory mapping is required, override this setting by adding the following to the environment variables in the Arguments tab of the Startup Parameters dialog: MPI_MEMMAP_OFF=0.
Be aware that the default mapped memory size may be too large for ReplayEngine to handle, and replay could become quite slow. You can limit the size of the mapped heap area by using the MPI_MAPPED_HEAP_SIZE environment variable, which is described in the SGI documentation. After re-enabling memory mapping (MPI_MEMMAP_OFF=0) as described above, you can set the size (in bytes) in the TotalView startup parameters.
For example:
MPI_MAPPED_HEAP_SIZE=1048576
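As a sketch, to re-enable memory mapping while limiting the mapped heap to 1 MB, you might add both variables to the startup environment (the size shown is only an example):
MPI_MEMMAP_OFF=0
MPI_MAPPED_HEAP_SIZE=1048576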
SGI has a patch for an MPT/XPMEM issue. Without this patch, XPMEM can crash the system if ReplayEngine is turned on. To get the XPMEM fix for the munmap problem, either upgrade to ProPack 6 SP 4 or install SGI patch 10570 on top of ProPack 6 SP 3.