CUDA Debugging Model and Unified Display
Setting action points in CUDA programs is harder than in host-only programs. When the host process starts, the CUDA threads don’t yet exist, so the debugger cannot see them and has no GPU code in which to place a breakpoint. (The same is true of any library that is loaded dynamically using dlopen and that the program was not originally linked against.)
To address this issue, TotalView allows setting a breakpoint on any line in the Source View, whether or not it can identify executable code for that line. The breakpoint becomes either a pending breakpoint or a sliding breakpoint until the CUDA code is loaded at runtime.
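To make the situation concrete, the following is a minimal sketch of a CUDA program, not taken from the TotalView documentation. The kernel name scale and the helper run_scale are hypothetical names chosen for illustration. The comments mark the line where, as described above, a breakpoint set before the kernel's code is loaded would be held as a pending or sliding breakpoint.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code. It is compiled into a CUDA ELF image that is loaded at
// runtime, so no CUDA thread executing it exists when the host process starts.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // A breakpoint set on this line before the CUDA code is loaded
        // cannot yet be tied to executable GPU code.
        data[i] *= factor;
    }
}

// Hypothetical helper (illustration only): copies n floats to the GPU,
// scales them in place, and copies them back. Returns 0 on success.
int run_scale(float *host, int n, float factor)
{
    float *dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        return 1;
    }
    if (cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return 1;
    }

    int threads = 128;
    int blocks = (n + threads - 1) / threads;

    // The CUDA threads come into existence only with this launch.
    scale<<<blocks, threads>>>(dev, factor, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // This copy also waits for the kernel to finish.
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    float values[4] = {1.0f, 2.0f, 3.0f, 4.0f};

    // Host code. A breakpoint here can be resolved as soon as the process starts.
    if (run_scale(values, 4, 2.0f) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("%.1f %.1f %.1f %.1f\n", values[0], values[1], values[2], values[3]);
    return 0;
}
```

When experimenting with a program like this, building with nvcc -g -G produces debug information for both host and device code, which a debugger typically needs in order to map GPU instructions back to source lines.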
The Source Pane provides a unified display that includes line number symbols and breakpoints that span the host executable, host shared libraries, and the CUDA ELF images loaded into the CUDA threads. This design allows you to easily set breakpoints and view line number information for the host and GPU code at the same time. This is made possible by the way CUDA threads are grouped, discussed in the section "The TotalView CUDA Debugging Model."