IBM MPI Parallel Environment (PE) Applications
NOTE: In many cases, you can bypass the procedure described in this section. For more information, see Debugging MPI Programs.
You can debug IBM MPI Parallel Environment (PE) applications on the IBM RS/6000 and SP platforms.
To take advantage of TotalView’s ability to automatically acquire processes, you must be using release 3.1 or later of the Parallel Environment for AIX.
Preparing to Debug a PE Application
The following sections describe what you must do before TotalView can debug a PE application.
Using Switch-Based Communications
If you’re using switch-based communications (either IP over the switch or user space) on an SP computer, configure your PE debugging session so that TotalView can use IP over the switch for communicating with the TotalView Server (tvdsvr). Do this by setting the -adapter_use option to shared and the -cpu_use option to multiple, as follows:
If you’re using a PE host file, add shared multiple after all host names or pool IDs in the host file.
Always use the following arguments on the poe command line:
-adapter_use shared -cpu_use multiple
If you don’t want to set these arguments on the poe command line, set the following environment variables before starting poe:
setenv MP_ADAPTER_USE shared
setenv MP_CPU_USE multiple
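The setenv lines above use csh syntax. As a minimal sketch for Bourne-style shells (sh, ksh, bash), the same settings might look like the following. The host file path and the node names are hypothetical placeholders chosen for illustration; substitute the host names or pool IDs of your own system.

```sh
# Bourne-style equivalents of the csh setenv lines.
export MP_ADAPTER_USE=shared
export MP_CPU_USE=multiple

# Hypothetical host file: the path and node names are placeholders.
# Each host name is followed by "shared multiple".
hostfile="${TMPDIR:-/tmp}/host.list.example"
cat > "$hostfile" <<'EOF'
node01.example.com shared multiple
node02.example.com shared multiple
EOF
```

Setting both the environment variables and the host file entries is redundant but harmless; either technique alone makes the settings explicit.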
Although shared adapter use and multiple CPU use are usually the defaults when using IP over the switch, we recommend that you set them explicitly using one of these techniques. You must run TotalView on an SP or SP2 node. Because TotalView uses IP over the switch in this case, you cannot run TotalView on an RS/6000 workstation.
Performing a Remote Login
You must be able to perform a remote login using the ssh command. You also need to enable remote logins by adding the host name of the remote node to the /etc/hosts.equiv file or to your .rhosts file.
When the program is using switch-based communications, TotalView tries to start the TotalView Server by using the ssh command with the switch host name of the node.
Setting Timeouts
If you receive communications timeouts, you can increase the timeout by setting the MP_TIMEOUT environment variable to a larger value, in seconds; for example:
setenv MP_TIMEOUT 1200
If this variable isn’t set, TotalView uses a timeout value of 600 seconds.
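For Bourne-style shells (sh, ksh, bash), a minimal sketch of the same setting follows. It keeps any MP_TIMEOUT value already present in the environment and otherwise falls back to 1200 seconds; the fallback value is simply the one from the example above, not a recommendation.

```sh
# Keep an existing MP_TIMEOUT if one is set; otherwise use 1200 seconds.
: "${MP_TIMEOUT:=1200}"
export MP_TIMEOUT
echo "MP_TIMEOUT=$MP_TIMEOUT"
```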