Single-Stepping GPU Code

TotalView lets you single-step GPU code just as you would normal host code. Note, however, that a single-step operation steps the entire warp: when you are focused on a CUDA thread, the step advances all of the GPU hardware threads in the same warp as the GPU focus thread.
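To make the warp granularity concrete, the following is a minimal sketch of a kernel in which each thread records the warp it belongs to. The names `warpIndexInBlock` and `recordWarp` are hypothetical and chosen only for illustration; the sketch assumes a one-dimensional block, so `threadIdx.x` is the thread's linear index within the block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: index of the warp, within its block, that holds the
// thread with the given linear index.
__host__ __device__ inline int warpIndexInBlock(int linearTid, int warpSz) {
    return linearTid / warpSz;
}

// Hypothetical kernel: each thread stores its warp index.
__global__ void recordWarp(int *warpOf) {
    int tid = threadIdx.x;                    // 1-D block: linear index in the block
    int w = warpIndexInBlock(tid, warpSize);  // warpSize is a built-in device variable
    warpOf[tid] = w;                          // a step here advances every thread sharing w
}

int main() {
    const int n = 64;  // one block of 64 threads: two warps when warpSize is 32
    int h[n];
    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return 1;
    recordWarp<<<1, n>>>(d);
    if (cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d);
    printf("thread 0 -> warp %d, thread 31 -> warp %d, thread 32 -> warp %d\n",
           h[0], h[31], h[32]);
    return 0;
}
```

Assuming the program is built with device debug information (for example, `nvcc -g -G`), stopping inside `recordWarp` and single-stepping should move only the threads that share a warp with the focus thread. Threads in the block's other warp stay where they are until you continue the process or run to a later line.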

To advance the execution of more than one warp, you can either:

• set a breakpoint and continue the process, or

• select a line number in the source pane and select "Run To".

More than one warp also advances when you single-step over a __syncthreads() thread barrier call. Any source-level single-stepping operation on the barrier runs all of the GPU hardware threads to the location following the thread barrier call.
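As an illustration of the barrier case, here is a minimal sketch of a kernel that reverses a block of integers through shared memory. The names `mirrorIndex`, `reverseBlock`, and `BLOCK` are hypothetical; the launch configuration (one block of 64 threads) is an assumption made to keep the example small.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 64  // threads per block (assumed for this sketch)

// Hypothetical helper: position that element t is read from after reversal.
__host__ __device__ inline int mirrorIndex(int t, int n) {
    return n - 1 - t;
}

// Hypothetical kernel: reverse BLOCK integers in place using shared memory.
__global__ void reverseBlock(int *data) {
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;
    tile[t] = data[t];                     // each thread stages one element
    __syncthreads();                       // barrier: stepping over this line is the case described above
    data[t] = tile[mirrorIndex(t, BLOCK)]; // safe only after every thread has written tile[]
}

int main() {
    int h[BLOCK];
    for (int i = 0; i < BLOCK; ++i) h[i] = i;
    int *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return 1;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    reverseBlock<<<1, BLOCK>>>(d);
    if (cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d);
    // Check that the block was reversed.
    for (int i = 0; i < BLOCK; ++i) {
        if (h[i] != BLOCK - 1 - i) { printf("mismatch at %d\n", i); return 1; }
    }
    printf("reversed\n");
    return 0;
}
```

With the focus thread stopped on the `__syncthreads()` line, a source-level step lands on the following assignment. Because no thread can pass the barrier until the others reach it, the step cannot be confined to the focus thread's warp.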

Single-stepping an inlined function (nested or not) in GPU code behaves the same as single-stepping a non-inlined function. You can:

• step into an inlined function,

• step over an inlined function,

• run to a location inside an inlined function,

• single-step within an inlined function, and

• return out of an inlined function.
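The operations above can be tried on a small kernel that calls an inlined device function. The following is a minimal sketch; `scaleAndShift` and `applyAffine` are hypothetical names, and whether the compiler actually inlines the helper in a given build depends on the compiler and its flags, so treat `__forceinline__` here as a request for illustration rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical inlined helper: computes a * x + b.
__host__ __device__ __forceinline__ float scaleAndShift(float x, float a, float b) {
    float scaled = a * x;   // a line to single-step within the inlined function
    return scaled + b;      // returning out of the function resumes in the caller
}

// Hypothetical kernel: applies the helper to each element.
__global__ void applyAffine(float *v, int n, float a, float b) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        v[i] = scaleAndShift(v[i], a, b);  // step into or step over from this line
    }
}

int main() {
    const int n = 64;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);
    float *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return 1;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    applyAffine<<<1, n>>>(d, n, 3.0f, 1.0f);
    if (cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d);
    printf("v[2] = %f\n", h[2]);
    return 0;
}
```

From the call site in `applyAffine`, a step-into operation should enter `scaleAndShift`, a step-over operation should treat the call as a single line, and "Run To" on a line inside the helper should stop there, just as for a non-inlined function.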