Troubleshooting MPI Startup
If you can’t successfully start TotalView on MPI programs, check the following:
*Can you successfully start MPICH programs without TotalView?
The MPICH code includes scripts that verify whether you can start remote processes on every computer listed in your computers file. (See tstmachines in mpich/util.)
*You won’t get a message queue display if you get the following warning:
The symbols and types in the MPICH library used by TotalView to extract the message queues are not as expected in the image <your image name>. This is probably an MPICH version or configuration problem.
Check that you are using MPICH Version 1.1.0 or later and that you have configured it with the -debug option. (You can check this by looking in the config.status file at the root of the MPICH directory tree.)
*Does the TotalView Server (tvdsvr) fail to start?
tvdsvr must be in your PATH when you log in. TotalView uses ssh to start the server, and ssh doesn’t pass your current environment to remotely started processes, so a PATH that you set only in your current session is not visible to the server.
*Make sure you have the correct MPI version and have applied all required patches. See the TotalView Release Notes at https://help.totalview.io/ for up-to-date information.
*Under some circumstances, MPICH kills TotalView with the SIGINT signal. You can see this behavior when you use the Group > Kill command as the first step in restarting an MPICH job.
If TotalView terminates abnormally with a Killed message, try setting the TV::ignore_control_c variable to true.
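Two of the checks above, the -debug option in config.status and the presence of tvdsvr in PATH, can be scripted. The following Python sketch is an illustration only. The helper names (configured_with_debug, find_tvdsvr, remote_check_command, check_remote) are hypothetical. It assumes that config.status records the configure arguments verbatim and that the remote login shell is POSIX-compatible, so that `command -v` is available.

```python
import re
import shutil
import subprocess


def configured_with_debug(config_status_text):
    """Return True if a standalone -debug token appears in the text.

    Assumption: config.status records the configure arguments verbatim.
    Options that merely contain the word, such as --enable-debugger,
    are not counted.
    """
    return re.search(r"(?<![\w-])-debug(?![\w-])", config_status_text) is not None


def find_tvdsvr(search_path):
    """Return the full path of tvdsvr found in search_path, or None."""
    return shutil.which("tvdsvr", path=search_path)


def remote_check_command(host):
    """Build an ssh command that asks the remote shell to locate tvdsvr.

    The command runs without your local environment, which is the
    situation tvdsvr is started in.
    """
    return ["ssh", host, "command -v tvdsvr"]


def check_remote(host, timeout=30):
    """Run the remote check; return True if the remote shell finds tvdsvr."""
    result = subprocess.run(
        remote_check_command(host),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

To use it, pass the contents of the config.status file at the root of your MPICH directory tree to configured_with_debug, and call check_remote with each host name from your computers file. A False result from check_remote means that the shell ssh starts on that host doesn't find tvdsvr in its PATH.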
 
For the Group > Kill command, see “Individual Execution Commands” and dkill in the TotalView Reference Guide.
For tips for debugging MPI applications, see “MPI Debugging Tips and Tools” in the chapter “Debugging Strategies for Parallel Applications” in the Classic TotalView User Guide.
For the TotalView server, tvdsvr, see “The tvdsvr Command and its Options” in the TotalView Reference Guide.
For MPI version information, see the TotalView Release Notes on the TotalView documentation page.