Troubleshooting MPI Startup
If you can’t successfully start TotalView on MPI programs, check the following:
Can you successfully start MPICH programs without TotalView?
The MPICH code contains some useful scripts that verify if you can start remote processes on all of the computers in your computers file. (See tstmachines in mpich/util.)
You won’t get a message queue display if you get the following warning:
The symbols and types in the MPICH library used by TotalView to extract the message queues are not as expected in the image <your image name>. This is probably an MPICH version or configuration problem.
Check that you are using MPICH Version 1.1.0 or later and that you have configured it with the -debug option. (You can check this by looking in the config.status file at the root of the MPICH directory tree.)
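The config.status check can also be scripted. The following is a minimal Python sketch, assuming the option is recorded in config.status as a literal -debug token; the function names are hypothetical, and the exact layout of config.status may differ between MPICH versions.

```python
import re
from pathlib import Path


def configured_with_debug(config_status_text: str) -> bool:
    """Return True if a standalone -debug token appears in the text."""
    # Assumption: the option is stored literally as "-debug".
    # The lookarounds reject longer options such as "-debugger" or "--debug".
    return re.search(r"(?<![\w-])-debug(?![\w-])", config_status_text) is not None


def check_mpich_tree(mpich_root: str) -> bool:
    """Read config.status at the root of an MPICH tree and look for -debug."""
    status_file = Path(mpich_root) / "config.status"
    return configured_with_debug(status_file.read_text(errors="replace"))
```

Calling check_mpich_tree with the root of your MPICH directory tree returns False when the token is absent, which suggests that MPICH may need to be reconfigured with -debug.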
Does the TotalView Server (tvdsvr) fail to start?
tvdsvr must be in your PATH when you log in. Remember that TotalView uses ssh to start the server, and that this command doesn’t pass your current environment to remotely started processes.
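Because ssh does not pass on your current environment, a quick test is to ask the remote shell itself to resolve tvdsvr. The following is a minimal Python sketch, assuming key-based (non-interactive) ssh access and a POSIX shell on the remote host; the function names and any host names you pass in are hypothetical.

```python
import subprocess
from typing import List


def remote_lookup_command(host: str, program: str = "tvdsvr") -> List[str]:
    """Build an ssh command that asks the remote shell to resolve `program`."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, f"command -v {program}"]


def found_on_remote_path(host: str, program: str = "tvdsvr") -> bool:
    """Return True if the remote, ssh-started shell can find `program`."""
    try:
        result = subprocess.run(
            remote_lookup_command(host, program),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Treat an unreachable host or a missing ssh client as "not found".
        return False
    return result.returncode == 0 and result.stdout.strip() != ""
```

Running found_on_remote_path for each host in your computers file shows which hosts cannot find tvdsvr when the shell is started through ssh, even if it is found in an interactive login there.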
Make sure you have the correct MPI version and have applied all required patches. See the TotalView Release Notes at https://help.totalview.io/ for up-to-date information.
Under some circumstances, MPICH kills TotalView with the SIGINT signal. You can see this behavior when you use the Group > Kill command as the first step in restarting an MPICH job.
If TotalView terminates abnormally with a Killed message, try setting the TV::ignore_control_c variable to true.