Example Parallel Configuration Definitions
This section provides three examples of customized parallel configurations. See
Customizing Your Parallel Configuration for information on where to place these definitions.
NOTE: Any customizations made to your MPI environment will be available for later selection in the Sessions Manager where they will appear in the File > Debug a Parallel Program dialog's Parallel System list.
The following dset command defines all three:
dset TV::parallel_configs {
#Argonne MPICH
name: MPICH;
description: Argonne MPICH;
starter: mpirun -tv -ksq %s %p %a;
style: setup_script;
tasks_option: -np;
nodes_option: -nodes;
env_style: force;
pretest: mpichversion;
#Argonne MPICH2
name: MPICH2;
description: Argonne MPICH2;
starter: $mpiexec -tvsu %s %p %a;
style: manager_process;
tasks_option: -n;
env_option: -env;
env_style: assign_space_repeat;
comm_world: 0x44000000;
pretest: mpich2version;
# AIX POE
name: poe - AIX;
description: IBM PE - AIX;
tasks_option: -procs;
tasks_env: MP_PROCS;
nodes_option: -nodes;
starter: /bin/poe %p %a %s;
style: bootstrap;
env: NLSPATH=/usr/lib/nls/msg/%L/%N/: \
/usr/lib/nls/msg/%L/%N.cat;
service_tids: 2 3 4;
comm_world: 0;
pretest: test -x /bin/poe;
msq_lib: /usr/lpp/ppe.poe/lib/%m;
}
Every line except a comment ends with a semicolon (;). TotalView ignores extra spaces, so add them freely to improve the readability of these definitions.
Notice that the MPICH2 definition contains the $mpiexec variable. This variable is defined elsewhere in the parallel_support.tvd file as follows:
set mpiexec mpiexec;
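The syntax rules above (semicolon-terminated fields, # comments, ignorable spaces) can be illustrated with a small Python sketch. This is not TotalView's parser; the function name parse_parallel_configs is hypothetical, and the sketch assumes that each definition begins with a name field and that a trailing backslash continues a value onto the next line, as in the examples. Variable references such as $mpiexec are left unexpanded.

```python
import re

def parse_parallel_configs(text):
    """Split the body of a parallel_configs block into one dict per definition.

    Assumption: every definition starts with a 'name' field.
    Each field maps to a list because 'env' may appear more than once.
    """
    # Join values continued with a trailing backslash.
    text = re.sub(r"\\\s*\n\s*", "", text)
    # Drop comment lines (first non-blank character is '#').
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    configs = []
    for clause in " ".join(lines).split(";"):
        clause = clause.strip()
        if not clause:
            continue
        # Split on the first colon only; values may contain colons.
        key, _, value = clause.partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":
            configs.append({})
        if not configs:
            raise ValueError(f"field {key!r} appears before any name field")
        configs[-1].setdefault(key, []).append(value)
    return configs

sample = """
# Argonne MPICH
name: MPICH;
starter: mpirun -tv -ksq %s %p %a;
tasks_option: -np;
# Argonne MPICH2
name: MPICH2;
env_option: -env;
"""
configs = parse_parallel_configs(sample)
print([c["name"][0] for c in configs])
```

Storing every field as a list keeps the sketch uniform; a stricter reader might reject repeated fields other than env.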
There is no limit to how many definitions you can place within the parallel_support.tvd file or within a variable. The definitions you create will appear in the Parallel system pulldown list in the File > Debug a Parallel Program dialog box and can be used as an argument to the --mpi option of the CLI's dload command.
The fields that you can set are as follows:
comm_world
Use this option only when style is set to bootstrap. This variable is the definition of MPI_COMM_WORLD in C and C++. MPI_COMM_WORLD is usually a #define or enum to a special number or a pointer value. If you do not include this field, TotalView cannot acquire the rank for each MPI process.
description
(optional) A string describing what the configuration is used for. There is no length limit.
env
(optional) Defines environment variables that are placed in the starter program's environment. (Depending on how the starter works, these variables may not make their way into the actual ranked processes.) If you are defining more than one environment variable, define each in its own env clause.
The format to use is:
variable_name=value
env_option
(optional) Names the command-line option that exports environment variables to the tasks started by the launcher program. Use this option along with the env_style field.
env_style
(optional) Contains a list of environment variables that are passed to tasks.
assign: The argument to be inserted to the command-line option named in env_option is a comma-separated list of environment variable name=value pairs; that is,
NAME1=VALUE1,NAME2=VALUE2,NAME3=VALUE3
This option is ignored if you do not use an env_option clause.
assign_space_repeat: The argument after env_option is a space-separated name/value pair that is assigned to an environment variable. The option named in env_option is repeated for each environment variable; for example:
-env NAME1 VALUE1 -env NAME2 VALUE2
-env NAME3 VALUE3
This mode is primarily used for the mpiexec.py MPICH2 starter program.
excenv
One of the following three strings:
export: The argument to be inserted after the command named in env_option. This is a comma-separated list of environment variable names; that is,
NAME1,NAME2,NAME3
This option is ignored if you do not use the env_option clause.
force: Environment variables are forced into the ranked processes using a shell script. TotalView or MemoryScape will generate a script that launches the target program. The script also tells the starter to run that script. This clause requires that your home directory be visible on all remote nodes. In most cases, you will use this option when you need to dynamically link memory debugging into the target. While this option does not work with all MPI implementations, it is the most reliable method for MPICH1.
none: No argument is inserted after env_option.
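As a rough illustration of how these styles differ, the following Python sketch builds the extra starter arguments for a set of environment variables. The function env_arguments is hypothetical and only mirrors the descriptions above. It assumes that force and none contribute nothing to the command line (force relies on a generated script instead), which is one reading of the text rather than documented behavior.

```python
def env_arguments(env_style, env_option, variables):
    """Return the starter arguments that pass `variables` to the tasks.

    `variables` is a dict of environment variable names to values.
    """
    # assign and export are ignored when no env_option clause is given;
    # this sketch treats every style the same way.
    if not env_option or not variables:
        return []
    if env_style == "assign":
        pairs = ",".join(f"{name}={value}" for name, value in variables.items())
        return [env_option, pairs]
    if env_style == "assign_space_repeat":
        args = []
        for name, value in variables.items():
            args += [env_option, name, value]
        return args
    if env_style == "export":
        return [env_option, ",".join(variables)]
    if env_style in ("force", "none"):
        # Assumption: nothing is added to the command line for these styles.
        return []
    raise ValueError(f"unknown env_style: {env_style!r}")

variables = {"NAME1": "VALUE1", "NAME2": "VALUE2"}
print(env_arguments("assign", "-env", variables))
print(env_arguments("assign_space_repeat", "-env", variables))
print(env_arguments("export", "-env", variables))
```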
msq_lib
(optional) Names the dynamically loaded library that TotalView uses when it needs to locate message queue information. You can name this file using either a relative or full pathname.
name
A short name describing the configuration. This name shows up in such places as the File > Debug a Parallel Program dialog box and in the Process > Modify Arguments dialog box. TotalView remembers which configuration you use when starting a program so that it can automatically reapply the configuration when you restart the program.
Because the configuration is associated with a program's name, renaming or moving the program destroys this association.
nodes_option
Names the command-line option (usually -nodes) that sets the number of nodes upon which your program runs. This statement does not define the value that is the argument to this command-line option.
Only omit this statement if your system doesn't allow you to control the number of nodes from the command line. If you set this value to zero (“0”), this statement is omitted.
pretest
(optional) Names a shell command that is run before the parallel job is launched. This command must run quickly, produce a timely response, and have no side-effects. This is a test, not a setup hook.
TotalView may kill the test if it takes too long, and it may run the test more than once to confirm that everything is OK. If the shell command's exit code is not as expected, TotalView asks for permission before continuing.
pretest_exit
The expected error code of the pretest command. The default is zero.
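The pretest behavior described above can be sketched in Python as a command that is run under a time limit and whose exit code is compared with the expected value. The function run_pretest and the five-second limit are assumptions for illustration; the text does not state how long TotalView waits.

```python
import subprocess

def run_pretest(command, expected_exit=0, timeout_seconds=5.0):
    """Run a pretest shell command and report whether it exited as expected.

    The timeout value is an assumption; a command that exceeds it is killed
    and counted as a failed test.
    """
    try:
        result = subprocess.run(
            command,
            shell=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == expected_exit

# A pretest must be quick and free of side effects, for example:
ok = run_pretest("test -x /bin/sh")
print("pretest passed" if ok else "pretest failed; ask before continuing")
```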
service_tids
(optional) The list of thread IDs that TotalView marks as service threads.
A service thread differs from a system manager thread in that it is created by the parallel runtime and not by your program. POE, for example, often creates three service threads.
starter
Defines a template that TotalView uses to create the command line that starts your program. In most cases, this template describes the relative position of the arguments. However, you can also use it to add extra parameters, commands, or environment variables. Here are the three substitution parameters:
%a: Replaced with the command-line arguments passed to rank processes.
%p: Replaced with the absolute pathname of the target program.
%s: Replaced with additional startup arguments. These are parameters to the starter process, not the rank processes.
For example:
starter: mpirun -tv -all-local %s %p %a;
When the user selects a value for the options named by nodes_option and tasks_option, each option and its value are placed within the %s parameter. If you enter a value of 0 for either of these, TotalView omits that option.
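To make the substitution concrete, here is a minimal Python sketch of how a starter template could be expanded. The function build_command is hypothetical and is not TotalView's implementation. It assumes that the template is split on whitespace and that the task option precedes the node option inside %s; neither detail is stated in the text.

```python
def build_command(starter, program, program_args=(), tasks=0, nodes=0,
                  tasks_option=None, nodes_option=None):
    """Expand %s, %p, and %a in a starter template into an argument list."""
    # Startup arguments: an option is omitted when its value is 0.
    startup = []
    if tasks_option and tasks:
        startup += [tasks_option, str(tasks)]
    if nodes_option and nodes:
        startup += [nodes_option, str(nodes)]

    command = []
    for word in starter.split():
        if word == "%s":
            command += startup             # arguments for the starter itself
        elif word == "%p":
            command.append(program)        # absolute path of the target program
        elif word == "%a":
            command += list(program_args)  # arguments for the rank processes
        else:
            command.append(word)
    return command

command = build_command(
    "mpirun -tv -all-local %s %p %a",
    "/home/user/a.out",          # hypothetical program path
    program_args=["-x", "1"],
    tasks=4,
    nodes=0,                     # 0 means -nodes is left out
    tasks_option="-np",
    nodes_option="-nodes",
)
print(" ".join(command))
```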
style
MPI programs are launched in two ways: either by a manager process or by a script. Use this option to name the method, as follows:
manager_process: The parallel system uses a binary manager process to oversee process creation and process lifetime. Our products attach to this process and communicate with it using its debug interface. For example, IBM's poe uses this style.
style: manager_process;
setup_script: The parallel system uses a script, which is often mpirun, to set up the arguments, environment, and temporary files. However, the script does not run as part of the parallel job. This script must understand the -tv command-line option and the TOTALVIEW environment variable.
bootstrap: The parallel system attempts to launch an uninstrumented MPI by interposing TotalView inside the parallel launch sequence in place of the target program. This does not work for MPICH and SGI MPT.
tasks_env
The name of an environment variable whose value is the expected number of parallel tasks. This is consulted when the user does not explicitly specify a task count.
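The fallback described here might look like the following Python sketch. The function resolve_task_count is hypothetical; it assumes that an explicit count always wins and that a missing or non-numeric environment value yields no count.

```python
import os

def resolve_task_count(explicit, tasks_env, environ=None):
    """Return the task count: the explicit value if given, else the value of
    the environment variable named by tasks_env, else None."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        return explicit
    value = environ.get(tasks_env, "") if tasks_env else ""
    return int(value) if value.isdigit() else None

# With "tasks_env: MP_PROCS;" and MP_PROCS=4 in the environment:
print(resolve_task_count(None, "MP_PROCS", {"MP_PROCS": "4"}))
```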
tasks_option
(sometimes required) Lets you define the option (usually -np or -procs) that controls the total number of tasks or processes.
Only omit this statement if your system doesn't allow you to control the number of tasks from the command line. If you set this to 0, this statement is omitted.