Using MRNet on Blue Gene
The following sections describe the options and state variables that were added to TotalView to control the configuration and use of MRNet on Blue Gene. Please refer to the TotalView documentation for a general description of how options and state variables can be used with TotalView.
Blue Gene Server Command String
State variable: TV::bluegene_server_command_string string
Default string: %B/tvdsvr%K
This is a read-only state variable. It is expanded to the path of the TotalView Blue Gene debugger server command. TotalView expands the launch string using the normal launch string expansion rules.
Blue Gene/MRNet Server Launch String
Option: -bluegene_mrnet_server_launch_string string
State variable: TV::bluegene_mrnet_server_launch_string string
Default value: -set_pw %P -verbosity %V %F
Analogous to the standard Blue Gene server launch string, the Blue Gene MRNet server launch string is used when MRNet is used to launch the TotalView debugger servers on Blue Gene. TotalView expands the launch string using the normal launch string expansion rules.
The expanded string is written into the MPIR Process Acquisition Interface MPIR_server_arguments variable in the MPI starter process, such as mpirun, runjob, or srun. The arguments are passed to the server command that is executed on the Blue Gene IO nodes. The MRNet launch string differs from the standard launch string in that the MRNet launch string does not contain the -callback option.
TotalView always appends the following string to the expanded Blue Gene MRNet launch string:
-mrnet_commnode temp-db-file
where temp-db-file is the path to the IO-to-FE node assignments database temporary file.
Blue Gene/MRNet Front-End Topology String
Option: -bluegene_mrnet_fe_topology string
State variable: TV::bluegene_mrnet_fe_topology string
Default value: "" (the empty string)
If set to a non-empty string, this string is used as the MRNet topology string for instantiating the Blue Gene communications tree on the front-end nodes. This string must be a well-formed MRNet topology string usable directly by MRNet.
For example, if TotalView is running on the front-end host dawndev4, this option can be used to instantiate two additional mrnet_commnode processes on that front-end host:
totalview -bluegene_mrnet_fe_topology \
  "dawndev4:0 => dawndev4-io:1 dawndev4-io:2 ;" \
  -mrnet -stdin /dev/null -args mpirun ./ALLc
The servers on the IO node will connect back to the two mrnet_commnode processes on the front-end using the dawndev4-io network interface, and the two mrnet_commnode processes on the front-end will connect to the TotalView process (the root of the MRNet tree) using the dawndev4 network interface.
Blue Gene/MRNet Front-End Host List
Option: -bluegene_mrnet_fe_host_list string
State variable: TV::bluegene_mrnet_fe_host_list string
Default value: "" (the empty string)
This list specifies Blue Gene front-end nodes on which to run MRNet mrnet_commnode processes. If set to a non-empty string, this variable is used as the list of front-end nodes for instantiating the Blue Gene communications tree. The list must contain a unique set of host names or IP addresses.
This string is used only if TV::bluegene_mrnet_fe_topology is set to an empty string; a non-empty topology string takes precedence over the host list.
For example, if TotalView is running on the front-end host dawndev4, this option can be used to instantiate two additional mrnet_commnode processes on two other front-end nodes, dawndev1 and dawndev2:
totalview \
  -xplat_rsh ssh \
  -bluegene_mrnet_fe_host_list "dawndev1-io dawndev2-io" \
  -mrnet -stdin /dev/null -args mpirun ./ALLc
The servers on the IO node will connect back to the two mrnet_commnode processes on the front-end nodes using the dawndev1-io and dawndev2-io network interfaces, and the two mrnet_commnode processes on the front-end will connect to the TotalView process (the root of the MRNet tree) using the dawndev4-io network interface.
Blue Gene/MRNet Back-End Servers per IO Node
Blue Gene/P
Option: -bluegene_p_mrnet_back_end_servers_per_io_node integer
State variable: TV::bluegene_p_mrnet_back_end_servers_per_io_node integer
Default value: 1
Blue Gene/Q
Option: -bluegene_q_mrnet_back_end_servers_per_io_node integer
State variable: TV::bluegene_q_mrnet_back_end_servers_per_io_node integer
Default value: 16
These state variables control the number of back-end server processes created on the IO nodes.
On Blue Gene/P, the value must be set to 1 due to constraints in the IBM debugging interface.
On Blue Gene/Q the value must be greater than or equal to 1 and less than or equal to 64.
If the value is 1, no mrnet_commnode processes are created on the IO nodes. If the value is greater than 1, a single mrnet_commnode process is created on each IO node and the back-end server processes connect to that communications process. On Blue Gene/Q, the default is to create 16 back-end server processes per IO node, which evenly share the workload of debugging the target processes running on the compute nodes.
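The limits and the resulting process layout described above can be summarized in a short sketch. The function name plan_io_node_processes and the dictionary it returns are hypothetical and exist only for illustration; this mirrors the documented rules and is not TotalView code.

```python
def plan_io_node_processes(platform: str, servers_per_io_node: int) -> dict:
    """Hypothetical helper: check the documented limits for one IO node."""
    # Documented limits: Blue Gene/P must be exactly 1; Blue Gene/Q allows 1..64.
    limits = {"P": (1, 1), "Q": (1, 64)}
    if platform not in limits:
        raise ValueError(f"unknown platform {platform!r}; expected 'P' or 'Q'")
    low, high = limits[platform]
    if not low <= servers_per_io_node <= high:
        raise ValueError(
            f"Blue Gene/{platform} requires a value from {low} to {high}, "
            f"got {servers_per_io_node}"
        )
    # A value of 1 means no mrnet_commnode on the IO node; a larger value
    # means one mrnet_commnode that the back-end servers connect to.
    return {
        "back_end_servers": servers_per_io_node,
        "mrnet_commnodes": 0 if servers_per_io_node == 1 else 1,
    }
```

For instance, calling plan_io_node_processes("Q", 16) reports 16 back-end servers and one mrnet_commnode, which matches the Blue Gene/Q default.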
MRNet IO to FE Database Temporary File Directory
Option: -mrnet_io_to_fe_db_temp_dir string
State variable: TV::mrnet_io_to_fe_db_temp_dir string
Default value: . (dot)
This state variable specifies the directory in which to create the MRNet IO-to-FE node association database temporary file. The directory must be on a file system that is shared between the front-end and IO nodes. By default, the current working directory of the TotalView client process is used. TotalView creates the temporary file using the POSIX.1-2001 mkstemp() function with the pattern tvdsvr_cp_db.XXXXXX, where XXXXXX is replaced with a string that makes the filename unique. After the MRNet tree is fully instantiated, TotalView deletes this temporary file.
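To illustrate the naming scheme, the following sketch creates a similarly named file with Python's tempfile.mkstemp. It is only an approximation: the function name create_db_temp_file is hypothetical, and Python chooses its own random suffix, so the suffix length may differ from the six characters implied by the XXXXXX pattern.

```python
import os
import tempfile


def create_db_temp_file(temp_dir: str = ".") -> str:
    """Create a uniquely named, empty file in temp_dir and return its path."""
    # mkstemp creates the file atomically, so two concurrent callers
    # never receive the same name.
    fd, path = tempfile.mkstemp(prefix="tvdsvr_cp_db.", dir=temp_dir)
    os.close(fd)  # this sketch only needs the path, not the open descriptor
    return path
```

A caller would remove the file with os.remove(path) once it is no longer needed, in the same way that TotalView deletes its temporary file after the tree is instantiated.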
Blue Gene Front-End Tree Considerations
As described above, TotalView has several options and state variables that allow for alternate front-end trees. TotalView decides which front-end tree to create as follows:
If TV::bluegene_mrnet_fe_topology is a non-empty string, it is used as-is and fed directly into MRNet to instantiate the front-end tree; otherwise,
If TV::bluegene_mrnet_fe_host_list is a non-empty string, it must be a unique list of front-end host names or IP addresses that is used to calculate the MRNet front-end tree; otherwise,
A single root node on the front-end host is used.
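The order of precedence above can be expressed as a small sketch. The function name choose_front_end_tree and its return values are hypothetical labels chosen for illustration; the sketch also assumes the host list is separated by whitespace, as in the earlier example.

```python
def choose_front_end_tree(fe_topology: str = "", fe_host_list: str = "") -> tuple:
    """Return (kind, value) following the documented order of precedence."""
    # 1. A non-empty topology string wins and is passed through unchanged.
    if fe_topology:
        return ("explicit_topology", fe_topology)
    # 2. Otherwise a non-empty host list is used to calculate the tree.
    if fe_host_list:
        hosts = fe_host_list.split()
        if len(set(hosts)) != len(hosts):
            raise ValueError("front-end host list must not contain duplicates")
        return ("calculated_from_host_list", hosts)
    # 3. Otherwise fall back to a single root node on the front-end host.
    return ("single_root", None)
```

Note that when both values are set, the host list is ignored entirely rather than merged with the topology string.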
A Single Root Node on the Front-End
By default, TotalView creates a single root node on the front-end host, and the server or mrnet_commnode processes running on the IO nodes connect back to the root using the IO network interface on the front-end host.
Consider an example where the IO network interface on a front-end node has the string -io appended to the primary host name. For example, dawndev4-io is the name for the IO network interface on dawndev4.
 
dawndev4> hostname
dawndev4
dawndev4> grep dawndev4 /etc/hosts
192.168.10.14 edawndev4 dawndev4-eth2 e4
134.9.39.52 dawndev4.llnl.gov dawndev4 dawndev4-eth3
134.9.8.5 dawndev4-nfs.llnl.gov dawndev4-nfs dawndev4-eth0
172.16.126.164 dawndev4-io dawndev4-eth1
dawndev4> /sbin/ifconfig | grep addr
eth0 Link encap:Ethernet HWaddr 00:1A:64:DD:B8:A6
inet addr:134.9.8.5 Bcast:134.9.8.255 Mask:255.255.255.0
eth1 Link encap:Ethernet HWaddr 00:1A:64:DD:B8:A7
inet addr:172.16.126.164 Bcast:172.16.126.255 Mask:255.255.255.0
eth2 Link encap:Ethernet HWaddr 00:1A:64:47:E3:5A
inet addr:192.168.10.14 Bcast:192.168.10.255 Mask:255.255.255.0
eth3 Link encap:Ethernet HWaddr 00:1A:64:47:E3:5B
inet addr:134.9.39.52 Bcast:134.9.39.255 Mask:255.255.255.0
Above, we can see that dawndev4 has four network interfaces (not including the 127.0.0.1 loopback address), and eth1 is the network interface connected to the IO node network.
On systems that do not use this naming convention, the TotalView state variable TV::bluegene_io_interface must be set to the interface name, host name or IP address of an IO network connected interface on the front-end node. For more information, see "IBM Blue Gene Applications".
Explicit Front-End Topology String
If an explicit front-end topology string is specified with TV::bluegene_mrnet_fe_topology, it is used as-is and fed directly into MRNet to instantiate the front-end tree. The topology string must be a valid MRNet topology string. MRNet attempts to instantiate the front-end tree using its built-in launch support. If this fails, no tree is built and TotalView does not gain control of the job.
When specifying a front-end topology string, the following must be considered:
The root of the tree must be a network interface on the same node where TotalView is running. For example, on dawndev4, the root must be dawndev4 or dawndev4-io, or even localhost if there are additional sub-nodes.
The network interface specified at a leaf must be accessible via the IO network. For example, on dawndev4, specifying dawndev4-io as a leaf will work, specifying localhost as a leaf will not work, and specifying dawndev4 may or may not work depending on the connectivity between an IO node and the network interface associated with dawndev4.
If a single front-end host is specified to run multiple processes (e.g., fe4:0=>fe4:1 fe4:2;), MRNet uses fork() and exec() to spawn the mrnet_commnode processes.
If multiple front-end hosts are specified to run multiple processes (e.g., fe4:0=>fe1:0 fe2:0;), rsh/ssh is used to spawn the mrnet_commnode processes. In this case, the user must be able to rsh or ssh to the front-end nodes, and may need to set XPLAT_RSH using the TV::xplat_rsh state variable or -xplat_rsh command option described above.
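The considerations above can be checked mechanically for simple cases. The sketch below handles only a one-level topology string of the form shown in the examples, and the function name check_flat_topology and the local_names parameter (the set of names that refer to the node where TotalView runs) are hypothetical. Names are compared as plain text, so this approximates the checks rather than reproducing MRNet's own behavior.

```python
def check_flat_topology(topology: str, local_names: set) -> dict:
    """Inspect a one-level 'root:N => leaf:N leaf:N ;' topology string."""
    body = topology.strip().rstrip(";")
    parent, arrow, children = body.partition("=>")
    if not arrow:
        raise ValueError("expected 'parent => child ... ;'")
    # Drop the ':N' process index to get the host part of each entry.
    root = parent.strip().rsplit(":", 1)[0]
    leaves = [child.rsplit(":", 1)[0] for child in children.split()]
    if not leaves:
        raise ValueError("topology has no child processes")

    problems = []
    if root not in local_names:
        problems.append(f"root {root!r} is not an interface on the local node")
    if "localhost" in leaves:
        problems.append("localhost as a leaf is not reachable from the IO nodes")

    # All leaves on the local node: fork()/exec(); otherwise rsh/ssh is needed.
    all_local = all(leaf in local_names for leaf in leaves)
    return {
        "root": root,
        "leaves": leaves,
        "spawn": "fork/exec" if all_local else "rsh/ssh",
        "problems": problems,
    }
```

With local_names set to {"dawndev4", "dawndev4-io", "localhost"}, the string "dawndev4:0 => dawndev4-io:1 dawndev4-io:2 ;" yields no problems and a fork/exec spawn method. Whether a leaf such as dawndev4 is actually reachable from the IO nodes depends on the network and cannot be decided from the string alone.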
Explicit List of Front-End Nodes
If an explicit list of front-end nodes is specified with TV::bluegene_mrnet_fe_host_list, it is used to calculate a topology string, which is then fed into MRNet to instantiate the front-end tree. The MRNet tree calculation controls for tree fan-out, tree depth, and an extra root node are used in the calculation. MRNet attempts to instantiate the front-end tree using its built-in launch support.
When specifying a list of front-end nodes, the following must be considered:
The root of the tree used by TotalView is the same as the host name or IP address that is used in the single root node case (see above). For example, on dawndev4, dawndev4-io is used.
The list of hosts specifies the leaves of the front-end tree. Therefore, the list of hosts must specify a network interface accessible via the IO network. For example, on dawndev4, specifying “dawndev1-io dawndev2-io” as the list will work, but specifying “dawndev1 dawndev2” may or may not work depending on the connectivity between an IO node and the network interface associated with dawndev1 and dawndev2.
Depending on the setting of the MRNet tree calculation controls for tree fan-out, tree depth, and an extra root node, additional interior mrnet_commnode processes may be run on the front-end nodes specified by the list.
The hosts on the list must be able to connect back to the root of the tree and, depending on the calculated topology, may need to be able to connect back to each other.
The list of hosts can include the root host, but a host name must not appear more than once on the list.
If a single front-end host is used to run multiple processes, MRNet uses fork() and exec() to spawn the mrnet_commnode processes.
If multiple front-end hosts are specified to run multiple processes, rsh/ssh is used to spawn the mrnet_commnode processes. In this case, the user must be able to rsh or ssh to the front-end nodes, and may need to set XPLAT_RSH.
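As a rough illustration of how a host list maps to a topology string, the sketch below builds the simplest possible tree: the root with every listed host as a direct leaf. It deliberately ignores the tree fan-out, tree depth, and extra root node controls, so it is not the calculation TotalView performs. The function name flat_topology_from_hosts is hypothetical, and the process indices follow the pattern of the examples in this section.

```python
def flat_topology_from_hosts(root: str, host_list: str) -> str:
    """Build a one-level topology string with each listed host as a leaf."""
    hosts = host_list.split()
    if not hosts:
        raise ValueError("host list is empty")
    if len(set(hosts)) != len(hosts):
        raise ValueError("a host name must not appear more than once")
    # Index 0 on the root name is taken by the root process, so a leaf
    # that reuses the root name gets index 1; other hosts start at 0.
    children = [f"{host}:{1 if host == root else 0}" for host in hosts]
    return f"{root}:0 => {' '.join(children)} ;"
```

For the earlier example, flat_topology_from_hosts("dawndev4-io", "dawndev1-io dawndev2-io") produces a string with dawndev4-io as the root and the two listed interfaces as leaves.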