Attaching to Processes Tips
In a typical multi-process job, you’re interested in some processes and not as much in others. By default, TotalView tries to attach to all of the processes that your program starts. If there are a lot of processes, there can be considerable overhead involved in opening and communicating with the jobs.
You can minimize this overhead by using the Attach Subset dialog box, shown in Figure 225.
Figure 225, Group > Attach Subset Dialog Box
NOTE: You can start MPI jobs in two ways. One requires that the starter program be under TotalView control and have special instrumentation for TotalView, while the other does not. In the first case, you enter the name of the starter program on the command line. In the second, you enter information into the File > Debug New Program or File > Debug New Parallel Program dialog boxes. The Attach Subset command is available only if you directly name a starter program on the command line.
The Attach Subset dialog box can be launched in multiple ways. It opens automatically when you launch your job with the parallel preference set to “Ask what to do” (see Figure 227). It is also available through other menu options after the job has been started, as discussed later in this section.
Selecting check boxes in the Attach column defines the processes to attach to. Although your program will launch all these processes, TotalView attaches only to those you have selected.
The Attach All and Detach All buttons select or deselect all the processes at once. You can then use the check boxes to select and deselect individual processes. For example, to attach to only a few processes in a lengthy list, use Detach All and then select those to which TotalView should attach.
The Filter controls restrict which processes are displayed; filtering is unrelated to attaching or detaching.
*The Communicator control specifies that the processes displayed must be involved with the communicators that you select. For example, if something goes wrong that involves a communicator, selecting it from the list displays only the processes that use that communicator. You can then use Attach All to attach to only those processes.
*The Talking to Rank control limits the processes displayed to those that receive messages from the indicated ranks. In addition to your rank numbers, you can also select All or MPI_ANY_SOURCE.
*The Array of Ranks option is automatically selected, and the array name displayed, if you have invoked Tools > Attach Subset (Array of Ranks) from the Variable Window. In this case, the dialog box displays only the processes whose ranks match the array elements.
*The List of Ranks control allows you to enter rank numbers to filter on. Use a dash to indicate a range of ranks, and commas to indicate individual ranks. For example: 3, 10-16, 24.
*The three checkboxes in the Message Type area add yet another qualifier. Checking a box displays only communicators that are involved with a Send, Receive, or Unexpected message.
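The List of Ranks syntax described above (commas between individual ranks, a dash for a range) can be illustrated with a short Python sketch. This is not TotalView's own parser; the function name parse_rank_list and its handling of stray commas and descending ranges are assumptions made for illustration only.

```python
def parse_rank_list(text):
    """Expand a string such as "3, 10-16, 24" into a sorted list of ranks.

    Hypothetical helper: it mirrors the syntax described for the
    List of Ranks control, not TotalView's actual implementation.
    """
    ranks = set()
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # assumption: silently skip empty entries
        if "-" in item:
            # A dash marks an inclusive range of ranks.
            low, high = (int(part) for part in item.split("-", 1))
            if low > high:
                raise ValueError(f"descending range: {item!r}")
            ranks.update(range(low, high + 1))
        else:
            # A bare number is a single rank.
            ranks.add(int(item))
    return sorted(ranks)


print(parse_rank_list("3, 10-16, 24"))
# prints [3, 10, 11, 12, 13, 14, 15, 16, 24]
```

Under these assumptions, the example entry 3, 10-16, 24 selects nine ranks: rank 3, ranks 10 through 16 inclusive, and rank 24.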
The Halt Control Group button is not active if the dialog box is launched after the job is already started. It is active only at the initial startup of a parallel job. You typically want to halt processes to allow the setting of breakpoints.
Many applications place values that indicate ranks in an array variable so that the program can refer to them as needed. You can display the variable in a Variable Window and then select the Tools > Attach Subset (Array of Ranks) command to display this dialog box. (See the Array of Ranks explanation above.)
You can use the Group > Attach Subset command at any time, but you would probably use it immediately before TotalView launches processes. Unless you have set preferences otherwise, TotalView stops and asks if you want it to stop your processes. When selected, the Halt Control Group check box also stops a process just before it begins executing.
Figure 226, Stop Before Going Parallel Question Box
If you click Yes, the starter process should be at a “magic breakpoint” when the job stops. TotalView sets these breakpoints behind the scenes, and they are usually not visible. The other processes may or may not be at a “magic breakpoint.”
The commands on the Parallel Page in the File > Preferences Dialog Box control what TotalView does when your program goes parallel.
Figure 227, File > Preferences: Parallel Page
NOTE: TotalView displays the preceding question box only when you directly name a starter program on the command line.
The radio buttons in the When a job goes parallel or calls exec() area:
*Stop the group: Stops the control group immediately after the processes are created.
*Run the group: Allows all newly created processes in the control group to run freely.
*Ask what to do: Asks whether TotalView should start the created processes.
CLI: dset TV::parallel_stop
The radio buttons in the When a job goes parallel area:
*Attach to all: Automatically attaches to all created processes at execution.
*Attach to none: Does not attach to any created process at execution.
*Ask what to do: Asks what processes to attach to. For this option, the same dialog box opens as that displayed for Group > Attach Subset. TotalView then attaches to the processes that you have selected. Note that this dialog box isn’t displayed when you set the preference; rather, it controls behavior when your program actually creates parallel processes.
CLI: dset TV::parallel_attach