dcuda
Manages GPU threads
Format 
dcuda block [(Bx,By,Bz)]
dcuda thread [(Tx,Ty,Tz)]
dcuda kernel
dcuda device [<n>]
dcuda sm [<n>]
dcuda warp [<n>]
dcuda lane [<n>]
dcuda info-system
dcuda info-device
dcuda info-sm
dcuda info-warp
dcuda info-lane
dcuda focus (Bx,By,Bz),(Tx,Ty,Tz)
dcuda hwfocus <D/S/W/L>
Arguments 
Bx, By, Bz
The x, y, and z block indices
Tx, Ty, Tz
The x, y, and z thread indices
D/S/W/L
The coordinates defining the physical space of the hardware:
D: device number
S: streaming multiprocessor (SM)
W: warp (WP) number on the SM
L: lane (LN) number on the warp
Description 
The dcuda commands allow you to manage and view GPU threads in either of two coordinate spaces: the logical space of block and thread indices, written <<<(Bx,By,Bz),(Tx,Ty,Tz)>>>, or the physical space that defines the hardware (the device number, the streaming multiprocessor number on the device, the warp number on the SM, and the lane number on the warp).
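As a rough illustration of how the two coordinate spaces relate, the sketch below flattens a logical thread index using the standard CUDA convention (x varies fastest) and splits it into a warp index within the block and a lane within that warp. The function names are hypothetical, and the 2x2x1 block shape is an assumption inferred from the four threads in the Examples section, not something this page states. The hardware warp number that dcuda reports is a slot on the SM chosen by the hardware, so only the lane is directly comparable.

```python
WARP_SIZE = 32  # lanes per warp, as in the "SM/WP/LN: 30/32/32" example output


def linear_thread_id(thread, block_dim):
    """Flatten (Tx,Ty,Tz) within a block; x varies fastest (CUDA convention)."""
    tx, ty, tz = thread
    dx, dy, _dz = block_dim
    return tx + ty * dx + tz * dx * dy


def warp_and_lane_in_block(thread, block_dim):
    """Return (warp index within the block, lane within that warp)."""
    return divmod(linear_thread_id(thread, block_dim), WARP_SIZE)


# Assumed 2x2x1 block (hypothetical; inferred from the example output).
print(warp_and_lane_in_block((1, 1, 0), (2, 2, 1)))  # (0, 3)
```

Under that assumed block shape, thread (1,1,0) lands on lane 3, which agrees with the lane listing shown for dcuda info-warp in the Examples section.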
dcuda block [(Bx,By,Bz)]
With no arguments, shows the current CUDA block.
With a block argument of the form (Bx,By,Bz), changes the CUDA focus to that block. Trailing indices (By and Bz, or just Bz) may be omitted; omitted indices keep their current values.
dcuda thread [(Tx,Ty,Tz)]
With no arguments, shows the current CUDA thread.
With a thread argument of the form (Tx,Ty,Tz), changes the CUDA focus to that thread. Trailing indices (Ty and Tz, or just Tz) may be omitted; omitted indices keep their current values.
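The rule for omitted trailing indices can be modeled as follows. This is only a sketch of the documented behavior, not TotalView code, and apply_partial is a hypothetical name.

```python
def apply_partial(current, given):
    """Replace the leading indices of `current` with `given`; keep the rest."""
    if not 1 <= len(given) <= 3:
        raise ValueError("expected one to three indices")
    return tuple(given) + tuple(current[len(given):])


print(apply_partial((0, 0, 0), (1,)))    # (1, 0, 0)
print(apply_partial((1, 1, 0), (0,)))    # (0, 1, 0)
print(apply_partial((1, 1, 0), (0, 0)))  # (0, 0, 0)
```

The first call corresponds to the dcuda thread 1 example in the Examples section, where the focus moves from thread (0,0,0) to thread (1,0,0).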
dcuda kernel
Displays the logical and hardware coordinates of the current CUDA context.
dcuda device [<n>]
With no arguments, shows the current CUDA device.
With a numeric argument, changes the CUDA device focus to that device.
dcuda sm [<n>]
With no arguments, shows the current CUDA SM (streaming multiprocessor).
With a numeric argument, changes the CUDA SM focus to that SM.
dcuda warp [<n>]
With no arguments, shows the current CUDA warp.
With a numeric argument, changes the CUDA warp focus to that warp.
dcuda lane [<n>]
With no arguments, shows the current CUDA lane.
With a numeric argument, changes the CUDA lane focus to that lane.
dcuda info-system
Displays the CUDA devices in the system.
dcuda info-device
Displays currently running SMs in the current device.
dcuda info-sm
Displays valid warps in the current SM.
dcuda info-warp
Displays valid lanes in the current warp.
dcuda info-lane
Displays the current lane.
dcuda focus (Bx,By,Bz),(Tx,Ty,Tz)
Changes the focus via CUDA logical coordinates of the form <<<(Bx,By,Bz),(Tx,Ty,Tz)>>>.
The following abbreviations are also accepted:
<<<Tx>>>
<<<(Tx)>>>
<<<(Tx,Ty)>>>
<<<(Tx,Ty,Tz)>>>
<<<(Bx),(Tx)>>>
<<<(Bx),(Tx,Ty)>>>
<<<(Bx),(Tx,Ty,Tz)>>>
<<<(Bx,By),(Tx)>>>
<<<(Bx,By),(Tx,Ty)>>>
<<<(Bx,By),(Tx,Ty,Tz)>>>
<<<(Bx,By,Bz),(Tx)>>>
<<<(Bx,By,Bz),(Tx,Ty)>>>
<<<(Bx,By,Bz),(Tx,Ty,Tz)>>>
Angle brackets are optional, but must be balanced.
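For scripts that build or validate focus strings before handing them to dcuda focus, the accepted forms listed above can be recognized with a small parser. This is a minimal sketch, not TotalView's own parser: parse_logical_focus is a hypothetical name, and it assumes "balanced" means the <<< and >>> delimiters appear together or not at all. Omitted indices are simply left out of the returned tuples.

```python
import re

# One parenthesized group holding one to three non-negative integers.
_GROUP = r"\(\s*\d+(?:\s*,\s*\d+){0,2}\s*\)"
# Either a bare Tx, or an optional block group followed by a thread group.
_SPEC = re.compile(
    rf"^(?:(?P<bare>\d+)|(?:(?P<block>{_GROUP})\s*,\s*)?(?P<thread>{_GROUP}))$"
)


def _indices(group_text):
    """Extract the integers from a matched group, or None if it was absent."""
    if group_text is None:
        return None
    return tuple(int(n) for n in re.findall(r"\d+", group_text))


def parse_logical_focus(text):
    """Return (block, thread) tuples; block is None when it is not given."""
    spec = text.strip()
    opens, closes = spec.startswith("<<<"), spec.endswith(">>>")
    if opens != closes:
        raise ValueError("angle brackets must be balanced")
    if opens:
        spec = spec[3:-3].strip()
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized focus: {text!r}")
    thread = match.group("bare") or match.group("thread")
    return _indices(match.group("block")), _indices(thread)


print(parse_logical_focus("<<<(0,0),(1,1,0)>>>"))  # ((0, 0), (1, 1, 0))
print(parse_logical_focus("3"))                    # (None, (3,))
```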
dcuda hwfocus <D/S/W/L>
Changes the focus via CUDA hardware coordinates of the form D/S/W/L, S/W/L, W/L, or L.
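The shortened forms drop coordinates from the left, so a shorter string names the innermost hardware levels. A sketch of that right-aligned reading, using the hypothetical helper parse_hw_focus (not part of TotalView):

```python
FIELDS = ("device", "sm", "warp", "lane")


def parse_hw_focus(text):
    """Map D/S/W/L, S/W/L, W/L, or L onto named hardware coordinates."""
    parts = text.strip().split("/")
    if not 1 <= len(parts) <= 4 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"unrecognized hardware focus: {text!r}")
    # Right-align: the last component is always the lane.
    names = FIELDS[len(FIELDS) - len(parts):]
    return dict(zip(names, map(int, parts)))


print(parse_hw_focus("0/0/0/3"))  # {'device': 0, 'sm': 0, 'warp': 0, 'lane': 3}
print(parse_hw_focus("0/3"))      # {'warp': 0, 'lane': 3}
```

Coordinates missing from the returned dictionary are the ones the string did not specify.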
Command alias 
Alias    Definition    Description
cuda     dcuda         Writes out the focus thread, as in dcuda kernel.
Examples 
Displaying device information
dcuda info-device
Output:
DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128
SM: 0/30 valid warps: 0x0000000000000001
 
dcuda info-sm
Output:
DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128
SM: 0/30 valid warps: 0x0000000000000001
WP: 0/32 valid/active/divergent lanes: 0x0000000f/0x0000000f/0x00000000 block: (0,0,0)
 
dcuda info-warp
Output:
DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128
SM: 0/30 valid warps: 0x0000000000000001
WP: 0/32 valid/active/divergent lanes: 0x0000000f/0x0000000f/0x00000000 block: (0,0,0)
LN: 0/32 pc=0x000000001ef2efa8 thread: (0,0,0)
LN: 1/32 pc=0x000000001ef2efa8 thread: (1,0,0)
LN: 2/32 pc=0x000000001ef2efa8 thread: (0,1,0)
LN: 3/32 pc=0x000000001ef2efa8 thread: (1,1,0)
 
dcuda info-lane
Output:
DEV: 0/1 Device Type: gt200 SM Type: sm_13 SM/WP/LN: 30/32/32 Regs/LN: 128
SM: 0/30 valid warps: 0x0000000000000001
WP: 0/32 valid/active/divergent lanes: 0x0000000f/0x0000000f/0x00000000 block: (0,0,0)
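The hexadecimal masks in this output can be expanded into lists of indices when post-processing saved output. The sketch below assumes that bit i of a mask corresponds to lane (or warp) i; that reading is consistent with the four lanes listed by dcuda info-warp above, but it is an assumption rather than something this page states. Both function names are hypothetical.

```python
def set_bits(mask_text):
    """Return the positions of the 1 bits in a hex mask such as 0x0000000f."""
    mask = int(mask_text, 16)
    return [i for i in range(mask.bit_length()) if mask >> i & 1]


def decode_warp_line(line):
    """Split the valid/active/divergent lane masks of a 'WP:' line."""
    masks = line.split("lanes:")[1].split()[0].split("/")
    return dict(zip(("valid", "active", "divergent"), map(set_bits, masks)))


line = ("WP: 0/32 valid/active/divergent lanes: "
        "0x0000000f/0x0000000f/0x00000000 block: (0,0,0)")
print(decode_warp_line(line))
# {'valid': [0, 1, 2, 3], 'active': [0, 1, 2, 3], 'divergent': []}
```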
Displaying the focus
dcuda warp sm
Output:
sm 0 warp 0
 
dcuda lane device
Output:
device 0 lane 3
 
dcuda thread
Output:
thread (1,1,0)
 
dcuda kernel
Output:
device 0, sm 0, warp 0, lane 3, block (0,0,0), thread (1,1,0)
Changing the focus
Note that TotalView assigns CUDA threads a negative thread ID. In the examples here, the CUDA thread is labeled "1.-1".
 
dcuda thread (1,1,0)
Changes the CUDA focus to the thread with logical coordinates (1,1,0).
New CUDA focus (1.-1): device 0, sm 0, warp 0, lane 3, block (0,0,0), thread (1,1,0)
 
dcuda lane 2
Changes the CUDA focus to lane 2.
New CUDA focus (1.-1): device 0, sm 0, warp 0, lane 2, block (0,0,0), thread (0,1,0)
 
dcuda lane 1 sm 0
Changes the CUDA focus to lane 1 and to SM 0.
New CUDA focus (1.-1): device 0, sm 0, warp 0, lane 1, block (0,0,0), thread (1,0,0)
 
dcuda thread 0,0,0
Changes the CUDA focus to thread (0,0,0).
New CUDA focus (1.-1): device 0, sm 0, warp 0, lane 0, block (0,0,0), thread (0,0,0)
 
dcuda thread 1
Changes the CUDA focus to thread (1,0,0); the omitted y and z indices keep their current values.
New CUDA focus (1.-1): device 0, sm 0, warp 0, lane 1, block (0,0,0), thread (1,0,0)
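The focus lines printed by dcuda kernel and by the focus-changing commands above share one shape, so captured output can be turned into structured data with a couple of regular expressions. This is a sketch based only on the sample lines in this section; parse_focus_line is a hypothetical helper, and other TotalView versions may format the line differently.

```python
import re


def parse_focus_line(line):
    """Extract hardware and logical coordinates from a focus line."""
    hardware = {
        name: int(value)
        for name, value in re.findall(r"\b(device|sm|warp|lane) (\d+)", line)
    }
    logical = {
        name: tuple(int(n) for n in value.split(","))
        for name, value in re.findall(r"\b(block|thread) \((\d+(?:,\d+)*)\)", line)
    }
    return {**hardware, **logical}


line = ("New CUDA focus (1.-1): device 0, sm 0, warp 0, lane 3, "
        "block (0,0,0), thread (1,1,0)")
focus = parse_focus_line(line)
print(focus["lane"], focus["thread"])  # 3 (1, 1, 0)
```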
 
Related Topics
Using the CUDA Debugger in the TotalView for HPC User Guide