MPI Startup Customizations

This section explains how to create startup profiles for environments that TotalView doesn't define. Customizations you make to your MPI environment become available for later selection in the Session Editor, where they appear in the File > Debug a Parallel Program dialog.

In general, TotalView supports various Message Passing Interface (MPI) implementations with no special configuration on your part. However, subtle differences in your environment or an implementation can cause difficulties that prevent TotalView from automatically starting your program. In these cases, you’ll need to define how TotalView behaves.

Customizing Your Parallel Configuration

Select a parallel configuration in the File > Debug a Parallel Program dialog box. If the provided default configurations do not meet your needs, you can either overwrite these configurations or create new ones.

The default definitions for parallel configurations reside in the parallel_support.tvd file, located in your totalview/lib installation directory. Use the variable TV::parallel_configs to customize parallel configurations.

TotalView Customizations

Set the TV::parallel_configs variable either globally, for everyone who uses a TotalView installation, or locally, for yourself only:

Globally, in your system's .tvdrc file. If you set this variable here, everyone using this TotalView version will see the definition.

Locally, in your .totalview/tvdrc file. You will be the only person to see this definition when you start TotalView.

You can also edit the parallel_support.tvd file in the totalview/lib installation directory directly. This is not recommended, because reinstalling TotalView overwrites this file.

If you are using a locally-installed MPI implementation, add it to your PATH variable. By default, TotalView uses the information in PATH to find the parallel launcher (for example, mpirun, mpiexec, poe, srun, prun, dmpirun, and so on). Generally, if you can run your parallel job from a command line, TotalView can also run it.

If you have multiple MPI systems installed (for example, multiple versions of MPICH on a common file server), only one can be in your path. In this case, specify an absolute path to the launcher. To do so, customize the TV::parallel_configs list variable or the parallel_support.tvd file in your installation directory so that the configuration does not rely on your PATH variable.

The easiest way to create your own startup configuration for TotalView is to copy a similar configuration from the TV::private::parallel_configs_base variable (found in the parallel_support.tvd file, located in your installation directory at totalview/lib) to the TV::parallel_configs variable, and then edit it. Save the TV::parallel_configs variable in the tvdrc file located in the .totalview subdirectory in your home directory.

When you add configurations, they are simply added to a list. This means that if TotalView supplies a definition named foo and you create a definition also named foo, both exist and TotalView chooses the first one in the list. Because both are displayed, give each new definition a unique name.
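The copy-and-edit approach described above can be sketched as follows. This hypothetical definition is modeled on the Custom-MPICH example later in this section, with only the name, description, starter path, and pretest changed. The installation path /opt/mpich-4.1 and the name MPICH-4.1-opt are assumptions chosen for illustration; the other field values are copied from that example and may need adjusting for your MPI.

```tcl
# Hypothetical entry for the tvdrc file in your .totalview directory.
# /opt/mpich-4.1 is an assumed install location; substitute your own.
# The unique name keeps this entry distinct from TotalView-supplied ones,
# and the absolute starter path means PATH is not consulted.
dset TV::parallel_configs {
  # MPICH launched by absolute path
  name: MPICH-4.1-opt;
  description: MPICH 4.1 in /opt, launched by absolute path;
  starter: /opt/mpich-4.1/bin/mpiexec -tvsu %s %p %a;
  style: any;
  tasks_option: -n;
  env_option: -env;
  env_style: assign_space_repeat;
  comm_world: 0x44000000;
  pretest: test -x /opt/mpich-4.1/bin/mpiexec;
}
```

Because the starter no longer depends on PATH, a second definition with a different absolute path and a different name could sit alongside this one for another MPICH version.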

Example Parallel Configuration Definitions

This section provides three examples of customized parallel configurations. See MPI Startup Customizations for information on where to place these definitions.

Any customizations made to your MPI environment will be available for later selection in the Session Editor, where they will appear in the Parallel Session dialog's Parallel Environment list.

Here are three examples:

dset TV::parallel_configs {
  # Open MPI
  name: Open MPI 5;
  description: Open MPI 5;
  starter: mpiexec %s %p %a;
  style: bootstrap;
  tasks_option: -np;
  nodes_option: ;
  env_option: -x;
  env_style: assign_nocomma;
  env: ;
  comm_world: (void *) &ompi_mpi_comm_world;
  pretest: prte_info;
}

dset TV::parallel_configs {
  # CustomMPICH
  name: Custom-MPICH;
  description: Custom MPICH;
  starter: $mpiexec -tvsu %s %p %a;
  style: any;
  tasks_option: -n;
  env_option: -env;
  env_style: assign_space_repeat;
  comm_world: 0x44000000;
  pretest: mpich2version;
}

dset TV::parallel_configs {
  # AIX POE
  name: poe - AIX;
  description: IBM PE - AIX;
  tasks_option: -procs;
  tasks_env: MP_PROCS;
  nodes_option: -nodes;
  starter: /bin/poe %p %a %s;
  style: manager_process;
  env: NLSPATH=/usr/lib/nls/msg/%L/%N/: \
  /usr/lib/nls/msg/%L/%N.cat;
  service_tids: 2 3 4;
  comm_world: 0;
  pretest: test -x /bin/poe;
  msq_lib: /usr/lpp/ppe.poe/lib/%m;
}

All lines except comments end with a semicolon (;). You can add spaces freely to improve the readability of these definitions, because TotalView ignores them.

Notice that the Custom-MPICH definition contains the $mpiexec variable. This variable is defined elsewhere in the parallel_support.tvd file as follows:

set mpiexec mpiexec;

There is no limit to how many definitions you can place within the parallel_support.tvd file or within a variable. The definitions you create will appear in the Parallel Environment list in the Session Editor and can be used as an argument to the -mpi option of the CLI's dload command.
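As a rough illustration, a definition named Custom-MPICH could be selected from the CLI as shown below. The executable path is a placeholder, and only the -mpi option named above is used; consult the dload reference for the options that set task and node counts.

```tcl
# Select the Custom-MPICH definition by name when loading a program.
# /home/user/my_mpi_app is a hypothetical executable path.
dload -mpi "Custom-MPICH" /home/user/my_mpi_app
```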

The fields that you can set are as follows:

comm_world

Use this option only when style is set to bootstrap. This variable is the definition of MPI_COMM_WORLD in C and C++. MPI_COMM_WORLD is usually a #define or enum to a special number or a pointer value. If you do not include this field, TotalView and MemoryScape cannot acquire the rank for each MPI process.

description

(optional) A string describing what the configuration is used for. There is no length limit.

env

(optional) Defines environment variables that are placed in the starter program's environment. (Depending on how the starter works, these variables may not make their way into the actual ranked processes.) If you are defining more than one environment variable, define each in its own env clause.

The format to use is:

variable_name=value
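For instance, a definition that places two variables in the starter's environment would carry two env clauses. The variable names and values below are placeholders, not settings that TotalView requires.

```tcl
# Fragment of a configuration definition; names and values are illustrative.
env: OMP_NUM_THREADS=4;
env: MY_APP_LOG_LEVEL=debug;
```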

env_option

(optional) Names the command-line option that exports environment variables to the tasks started by the launcher program. Use this option along with the env_style field.

env_style

(optional) Contains a list of environment variables that are passed to tasks.

assign: The argument inserted after the command-line option named in env_option is a comma-separated list of environment variable name=value pairs; that is,

NAME1=VALUE1,NAME2=VALUE2,NAME3=VALUE3

This option is ignored if you do not use an env_option clause.

assign_space_repeat: The argument after env_option is a space-separated name/value pair that is assigned to an environment variable. The option named in env_option is repeated for each environment variable; for example:

-env NAME1 VALUE1 -env NAME2 VALUE2 -env NAME3 VALUE3

This mode is primarily used for the mpiexec.py MPICH starter program.

excenv

One of the following three strings:

export: The argument inserted after the option named in env_option is a comma-separated list of environment variable names; that is,

NAME1,NAME2,NAME3

This option is ignored if you do not use the env_option clause.

force: Environment variables are forced into the ranked processes using a shell script. TotalView or MemoryScape generates a script that launches the target program and tells the starter to run that script. This clause requires that your home directory be visible on all remote nodes. In most cases, you will use this option when you need to dynamically link memory debugging into the target. While this option does not work with all MPI implementations, it is the most reliable method for MPICH1.

none: No argument is inserted after env_option.

msq_lib

(optional) Names the dynamically loaded library that TotalView uses when it needs to locate message queue information. You can name this file using either a relative or full pathname.

name

A short name describing the configuration. This name shows up in such places as the File > Debug a Parallel Program dialog box and in the Process > Modify Arguments dialog box. TotalView remembers which configuration you use when starting a program so that it can automatically reapply the configuration when you restart the program.

Because the configuration is associated with a program's name, renaming or moving the program destroys this association.

nodes_option

Names the command-line option (usually -nodes) that sets the number of nodes upon which your program runs. This statement does not define the value that is the argument to this command-line option.

Only omit this statement if your system doesn't allow you to control the number of nodes from the command line. If you set this value to zero (“0”), this statement is omitted.

pretest

(optional) Names a shell command that is run before the parallel job is launched. This command must run quickly, produce a timely response, and have no side-effects. This is a test, not a setup hook.

TotalView may kill the test if it takes too long, and it may run the test more than once to confirm that everything is OK. If the shell command's exit code is not as expected, TotalView requires your permission before continuing.

pretest_exit

The expected error code of the pretest command. The default is zero.

service_tids

(optional) The list of thread IDs that TotalView marks as service threads.

A service thread differs from a system manager thread in that it is created by the parallel runtime and is not created by your program. POE, for example, often creates three service threads.

starter

Defines a template that TotalView uses to create the command line that starts your program. In most cases, this template describes the relative position of the arguments. However, you can also use it to add extra parameters, commands, or environment variables. Here are the three substitution parameters:

%a: Replaced with the command-line arguments passed to rank processes.

%p: Replaced with the absolute pathname of the target program.

%s: Replaced with additional startup arguments. These are parameters to the starter process, not the rank processes.

For example:

starter: mpirun -tv -all-local %s %p %a;

When the user selects a value for the options named by nodes_option and tasks_option, each option and its value are placed within the %s parameter. If you enter a value of 0 for either of these, TotalView omits that option.
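To make the substitution concrete, the sketch below pairs the starter template from the example above with a tasks_option of -np. The program path, its arguments, and the task count are made-up values, and the resulting command line is an approximation inferred from the parameter descriptions, not captured TotalView output.

```tcl
# Fragment of a configuration definition
starter: mpirun -tv -all-local %s %p %a;
tasks_option: -np;

# With 4 tasks selected, the program /home/user/a.out, and the
# program arguments "-v input.dat", the parameters would expand to:
#   %s  ->  -np 4
#   %p  ->  /home/user/a.out
#   %a  ->  -v input.dat
# giving a command line along the lines of:
#   mpirun -tv -all-local -np 4 /home/user/a.out -v input.dat
```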

style

MPI programs can be launched in multiple ways, either by a manager process or by a script. Use this option to name the predefined method, as follows:

manager_process (Native): The parallel system uses a binary manager process to oversee process creation and process lifetime. TotalView attaches to this process and communicates with it using the MPIR debug interface. For example, MPICH uses this style.

style: manager_process;

bootstrap: The parallel system attempts to launch an uninstrumented MPI by interposing TotalView inside the parallel launch sequence in place of the target program. This should work on all MPIs.

setup_script: The parallel system uses a script—which is often mpirun—to set up the arguments, environment, and temporary files. However, the script does not run as part of the parallel job. This script must understand the -tv command-line option and the TOTALVIEW environment variable.

tasks_env

The name of an environment variable whose value is the expected number of parallel tasks. This is consulted when the user does not explicitly specify a task count.

tasks_option

(sometimes required) Names the command-line option (usually -np or -procs) that controls the total number of tasks or processes.

Only omit this statement if your system doesn't allow you to control the number of tasks from the command line. If you set this to 0, this statement is omitted.