How TotalView Creates Groups
TotalView places processes and threads into groups as your program creates them, except for the lockstep groups, which are created or changed whenever a process or thread hits an action point or is stopped for any reason. Here is a rundown on how these groups are created.
As soon as a program starts, TotalView creates two process groups: a control group and a share group. It also creates a workers group containing the thread running in the main() routine. There is no lockstep group yet, because lockstep groups contain only stopped threads. As new threads are spawned and begin to run, they are also added to these three groups.
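To make the startup rules concrete, here is a toy Python model. It is only a sketch. The class `StartupGroups`, its methods, and the use of thread ID 1 for the thread running main() are hypothetical. The lockstep check applies only the "stopped threads only" rule stated above.

```python
# Toy model of the groups that exist right after a program starts.
# Class and method names are hypothetical; this is not a TotalView API.
from dataclasses import dataclass, field


@dataclass
class StartupGroups:
    control: set = field(default_factory=set)  # process IDs in the control group
    share: set = field(default_factory=set)    # process IDs running the same program
    workers: set = field(default_factory=set)  # (pid, thread_id) pairs
    stopped: set = field(default_factory=set)  # (pid, thread_id) pairs that are stopped

    def start_program(self, pid: int) -> None:
        # The control and share groups exist as soon as the program starts,
        # and the workers group holds the thread running main() (thread 1 here).
        self.control.add(pid)
        self.share.add(pid)
        self.workers.add((pid, 1))

    def spawn_thread(self, pid: int, tid: int) -> None:
        # The new thread's process is already in the control and share
        # groups; the thread itself joins the workers group.
        assert pid in self.control and pid in self.share
        self.workers.add((pid, tid))

    def lockstep_candidates(self) -> set:
        # Simplification: apply only the "stopped threads only" rule, so
        # nothing qualifies until some thread stops.
        return self.workers & self.stopped


groups = StartupGroups()
groups.start_program(pid=100)
print(groups.lockstep_candidates())  # set(): no lockstep group yet
groups.spawn_thread(pid=100, tid=2)
print(sorted(groups.workers))        # [(100, 1), (100, 2)]
```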
Let’s consider a few scenarios common to parallel program debugging.
Groups Created When a Program Calls fork()/exec()
TotalView can automatically attach to child processes created when a process being debugged calls fork() or vfork(). Here’s a typical flow:
1. TotalView is started on a program named "a.out". TotalView names the process "a.out" and automatically places it in Control Group 1 and Share Group 2.
2. During execution, process "a.out" makes a fork() system call to create a child process. If TotalView is configured to attach to child processes automatically, it names the child process "a.out.1". The child process is placed in Control Group 1 because of its parent/child relationship with "a.out", and in Share Group 2 because it is executing the program named "a.out".
3. During execution, process "a.out.1" (the child process) makes an execve() system call on a program named "b.out". TotalView renames the child process to "a.out<b.out>.1". The child process remains in Control Group 1, but is placed in Share Group 3 because it is now executing a program named "b.out".
4. All the threads in processes "a.out" and "a.out<b.out>.1" are placed in Worker Group 4, because neither program creates manager threads, and both processes are members of the same Control Group.
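The fork()/execve() walkthrough can be replayed with a small Python model. The names `Proc`, `start()`, `fork()`, `execve()`, `control_group()`, and `share_groups()` are hypothetical. They mirror the system calls but do not call them. Keying share groups by control group plus executable is an assumption consistent with the example, not a statement about TotalView internals. Group numbers are left out because the model makes no claim about how TotalView assigns them.

```python
# Hypothetical model of the fork()/execve() walkthrough; no TotalView API involved.
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Proc:
    program: str                 # program the process was started as, e.g. "a.out"
    executable: str              # program the process is running now
    control_root: str            # process whose control group this one belongs to
    index: Optional[int] = None  # child index, e.g. 1 for "a.out.1"

    @property
    def name(self) -> str:
        # Produces "a.out", "a.out.1", or "a.out<b.out>.1" after an exec.
        label = self.program
        if self.executable != self.program:
            label += f"<{self.executable}>"
        return label if self.index is None else f"{label}.{self.index}"


def start(program: str) -> Proc:
    # A newly started program gets its own control group.
    return Proc(program, program, control_root=program)


def fork(parent: Proc, index: int) -> Proc:
    # The child runs the parent's program and stays in the parent's control group.
    return replace(parent, index=index)


def execve(proc: Proc, executable: str) -> Proc:
    # Same process and control group, but a different program.
    return replace(proc, executable=executable)


def control_group(procs, root):
    return [p.name for p in procs if p.control_root == root]


def share_groups(procs):
    # Assumption: one share group per (control group, executable) pair.
    groups = {}
    for p in procs:
        groups.setdefault((p.control_root, p.executable), []).append(p.name)
    return groups


parent = start("a.out")
child = execve(fork(parent, 1), "b.out")
print(child.name)                               # a.out<b.out>.1
print(control_group([parent, child], "a.out"))  # ['a.out', 'a.out<b.out>.1']
print(share_groups([parent, child]))
# {('a.out', 'a.out'): ['a.out'], ('a.out', 'b.out'): ['a.out<b.out>.1']}
```

Because `fork()` copies the parent's `executable`, the child lands in the parent's share group. Only `execve()` moves it to a new one, which matches steps 2 and 3.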
Related topics:
Setting breakpoints when acquiring processes using fork or exec: see "Setting Breakpoints When Using the fork()/execve() Functions"
Using the TotalView TV::exec_handling and -exec_handling options to control whether new processes are stopped or allowed to continue running
Choosing to either attach to or detach from the new child process
Groups Created for MPI Programs
TotalView can automatically attach to MPI processes created by an MPIR starter program (such as mpirun), for example when you launch the MPI starter program under TotalView. Here's a typical flow:
1. TotalView is started on the MPI starter program named "mpirun". TotalView names the process "mpirun" and automatically places it in Control Group 1 and Share Group 2.
2. During execution, the "mpirun" process launches the MPI processes and waits for the debugger to attach to them before allowing them to execute the application code. In this example, the MPI application is MPMD-style (multiple program, multiple data): some of the MPI processes run the executable "a.out", and others run the executable "b.out".
3. When TotalView detects that the "mpirun" process has launched the MPI processes, it automatically attaches to them. The MPI processes are placed in Control Group 1 with the MPI starter process, because the starter is effectively their parent. The MPI starter process remains in Share Group 2, and two new share groups are created for the MPI processes: Share Group 3 for those executing "a.out" and Share Group 4 for those executing "b.out".
4. All the threads in the MPI starter process and MPI processes are placed in Worker Group 5, because none of the processes create manager threads and all processes are members of the same Control Group.
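The MPMD launch above can be sketched the same way. In this hypothetical Python function, `mpi_groups()`, the "rank N" labels, and the alternating assignment of "a.out" and "b.out" to ranks are illustrative assumptions. The workers group is modeled at process granularity rather than per thread.

```python
# Hypothetical sketch of how an MPMD launch maps onto groups; not a TotalView API.
from collections import defaultdict


def mpi_groups(starter: str, rank_executables: list) -> tuple:
    # Each process is (label, executable); the "rank N" labels are illustrative.
    procs = [(starter, starter)] + [
        (f"rank {r}", exe) for r, exe in enumerate(rank_executables)
    ]
    # All processes join the starter's control group, because the starter
    # is effectively the parent of the MPI processes.
    control = [label for label, _ in procs]
    # One share group per executable: the starter keeps its own, and the
    # ranks split according to the program they run.
    share = defaultdict(list)
    for label, exe in procs:
        share[exe].append(label)
    # With no manager threads, all threads in the control group end up in a
    # single workers group (modeled here per process, not per thread).
    workers = list(control)
    return control, dict(share), workers


control, share, workers = mpi_groups("mpirun", ["a.out", "b.out", "a.out", "b.out"])
print(share)
# {'mpirun': ['mpirun'], 'a.out': ['rank 0', 'rank 2'], 'b.out': ['rank 1', 'rank 3']}
```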
Groups Created for CUDA Programs
The key point for group creation when debugging CUDA programs is that CUDA threads are placed in the same share group as their host Linux process. Because the CUDA threads and the host process share a group, you can create pending or sliding breakpoints on source lines and functions in the GPU code before that code is loaded onto the GPU. This organization also supports a unified Source view display, in which the breakpoint and source-line information for the code running on the GPU is combined with that for the code running on the host CPU.
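The benefit of sharing one share group can be illustrated with a hypothetical Python model of a pending breakpoint. `ShareGroup`, `set_breakpoint()`, `load_code()`, and the kernel name `my_kernel` are assumptions made for illustration, and sliding breakpoints are not modeled. The point is only that a breakpoint recorded on the share group can wait until the GPU code it names is loaded.

```python
# Hypothetical model: CUDA threads share their host process's share group, so a
# breakpoint set there before GPU code loads stays pending until the code appears.
# This is an illustration, not TotalView's implementation.
from dataclasses import dataclass, field


@dataclass
class ShareGroup:
    executable: str
    members: list = field(default_factory=list)         # host processes and CUDA threads
    loaded_functions: set = field(default_factory=set)  # code currently loaded
    pending: set = field(default_factory=set)           # breakpoints not yet bound
    bound: set = field(default_factory=set)             # breakpoints bound to loaded code

    def set_breakpoint(self, function: str) -> None:
        # Bind now if the code is loaded; otherwise keep the breakpoint pending.
        if function in self.loaded_functions:
            self.bound.add(function)
        else:
            self.pending.add(function)

    def load_code(self, functions: set) -> None:
        # When code (for example, a GPU kernel) loads, bind matching pending breakpoints.
        self.loaded_functions |= functions
        now_bound = self.pending & self.loaded_functions
        self.bound |= now_bound
        self.pending -= now_bound


group = ShareGroup("app")
group.members += ["host process", "CUDA thread (block 0, thread 0)"]
group.set_breakpoint("my_kernel")  # GPU code not loaded yet: stays pending
group.load_code({"my_kernel"})     # kernel code loaded onto the GPU
print(group.pending, group.bound)  # set() {'my_kernel'}
```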
Related topics:
How groups are created when debugging CUDA programs
More on the unified Source view display: see "Unified Source View and Breakpoint Display" and "Unified Source View Display"