Because Cray system configurations vary from site to site, this section provides only general guidelines for starting TotalView on your application. Consult your site's documentation for the specific steps needed to debug a program using TotalView on your Cray system.
File System Considerations
Place the application you want to debug on a file system that is shared across all Cray node types, such as the service, elogin, login/MOM, and compute nodes. This allows you to compile and debug your application across node types.
Also make sure that your $HOME/.totalview directory is on a file system shared across all Cray node types, so that you can use the tvconnect feature in batch scripts. For more information, see Chapter 19, "Reverse Connections".
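One rough way to check the file system placement described above is to print which file system backs a directory on each node type and compare the results. The following is a minimal sketch, not a site-approved procedure: fs_of is a hypothetical helper name, and it assumes a POSIX df and awk are available on every node type.

```shell
# Hypothetical helper: print the source and mount point of the file system
# that holds a directory, so the output can be compared across node types.
fs_of() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "fs_of: $dir is not a directory" >&2
        return 1
    fi
    # df -P prints a header line followed by one POSIX-format line for the
    # file system; field 1 is the source (device or server export) and
    # field 6 is the mount point. Assumes neither contains spaces.
    df -P "$dir" | awk 'NR == 2 { print $1, $6 }'
}

# Example: run this on a login node and again inside a compute-node job.
#   fs_of "$HOME/.totalview"
```

If the two runs report different sources, the directory is probably local to one node type. Matching output is a good sign but not proof, so creating a file on one node type and listing it from another remains the more direct test.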
Starting TotalView
TotalView typically runs interactively. If your site has not designated any compute nodes for interactive processing, you can allocate compute nodes for interactive use. Use PBS Pro's qsub -I or SLURM's salloc command to allocate an interactive job. Be sure that your X11 DISPLAY environment variable is propagated or set properly. See "man qsub" or "man salloc" for more information on interactive jobs.
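As an illustration of the DISPLAY requirement, the sketch below shows a small check you might run inside the interactive job before starting the debugger. The function name check_display is hypothetical, and the allocation commands in the comments are shown without resource options because those are site-specific.

```shell
# Allocate an interactive job first (typed at the login-node prompt):
#   qsub -I      # PBS Pro
#   salloc       # SLURM

# Hypothetical helper: confirm that DISPLAY reached the interactive job.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; the TotalView GUI has nowhere to open" >&2
        return 1
    fi
    echo "DISPLAY=$DISPLAY"
}

# Example, inside the interactive job:
#   check_display || echo "fix X11 forwarding before starting TotalView"
```

A non-empty DISPLAY only shows that the variable was propagated. Whether the X server it names is reachable from the compute node still depends on how your site forwards X11.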
If TotalView is installed on your system, load it into your user environment:
module load totalview
Use the following command to start TotalView, where mpi_starter is the MPI starter program for your system, such as aprun or srun:
totalview -args mpi_starter [mpi_options] application_name [application_arguments]
When using ALPS, TotalView cannot stop your program before it calls MPI_Init(). Although this call is typically at the beginning of main(), its actual location depends on how you have written the program. Consequently, if you set a breakpoint before the MPI_Init() call, your program will not hit it, because the statement on which you set the breakpoint will already have executed. SLURM, on the other hand, stops your program before it enters main(), which allows you to debug the statements that precede the MPI_Init() call.
Example 1: Interactive Jobs Using qsub and aprun
This example shows how you can start TotalView on a program named a.out running in an interactive job using qsub and aprun.
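A minimal sketch of such a session follows, assuming TotalView's -args form (everything after -args is the program to run, followed by its arguments). The function name debug_with_aprun, the default process count, and the change to PBS_O_WORKDIR are illustrative assumptions, not site requirements.

```shell
# Step 1 (login-node prompt): request an interactive PBS Pro job.
# Resource options are site-specific and omitted here.
#   qsub -I

# Step 2 (inside the interactive job): load TotalView and debug a.out.
# debug_with_aprun is a hypothetical wrapper name.
debug_with_aprun() {
    nprocs="${1:-4}"    # number of MPI processes; 4 is an arbitrary default
    # Interactive PBS jobs usually start in $HOME, so return to the
    # directory the job was submitted from (falls back to the current one).
    cd "${PBS_O_WORKDIR:-.}" || return 1
    module load totalview
    # TotalView debugs aprun, which in turn launches the MPI processes.
    totalview -args aprun -n "$nprocs" ./a.out
}

# Example:
#   debug_with_aprun 16
```

Because this path uses ALPS, breakpoints placed before the MPI_Init() call will not be hit.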
Similarly, you can debug an interactive job using salloc and srun if your Cray system uses SLURM.
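The SLURM variant can be sketched the same way, again assuming TotalView's -args form. The function name debug_with_srun and the default process count are illustrative assumptions.

```shell
# Step 1 (login-node prompt): request an interactive SLURM allocation.
# Resource options are site-specific and omitted here.
#   salloc

# Step 2 (inside the allocation): load TotalView and debug a.out.
# debug_with_srun is a hypothetical wrapper name.
debug_with_srun() {
    nprocs="${1:-4}"    # number of MPI tasks; 4 is an arbitrary default
    module load totalview
    # TotalView debugs srun, which in turn launches the MPI tasks.
    totalview -args srun -n "$nprocs" ./a.out
}

# Example:
#   debug_with_srun 16
```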
Example 2: SLURM Batch Jobs Using tvconnect
This example shows how to submit a SLURM batch job that uses tvconnect in the batch script. Once the batch job is running, start TotalView so that it can accept the reverse-connect request.
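A minimal sketch of such a batch job follows. It assumes that tvconnect is placed in front of the command you would otherwise run, and that module is available in batch shells at your site. The helper name write_tvconnect_job, the script file name, and all #SBATCH values are illustrative assumptions.

```shell
# Hypothetical helper: write a SLURM batch script that launches a.out
# under tvconnect. The resource values are placeholders to adjust.
write_tvconnect_job() {
    script="${1:-tvconnect_job.sh}"
    cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=tv-reverse
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

module load totalview
# tvconnect posts a reverse-connect request and waits for a TotalView
# session to accept it before the program runs under the debugger.
tvconnect srun -n 4 ./a.out
EOF
}

# Example:
#   write_tvconnect_job tvconnect_job.sh
#   sbatch tvconnect_job.sh      # submit the batch job
#   totalview                    # later, on a login node, accept the request
```

The request is exchanged through $HOME/.totalview, which is why that directory must be visible from both the node running the batch script and the node where you start TotalView.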