The TotalView CUDA debugger is an integrated debugging tool that can simultaneously debug CUDA code running on the host system and on the NVIDIA® GPU. CUDA support is an extension to the standard version of TotalView and supports debugging 64-bit CUDA programs. Debugging 32-bit CUDA programs is not currently supported.
Supported major features:
• Debug a CUDA application running directly on GPU hardware
• Set breakpoints, pause execution, and single step in GPU code
• View GPU variables in PTX registers, and in local, parameter, global, or shared memory
• Access runtime variables, such as threadIdx, blockIdx, blockDim, etc.
• Debug multiple GPU devices per process
• Use the CUDA MemoryChecker
• Debug on remote, distributed, and clustered systems
• Use all host debugging features except ReplayEngine. For a list of supported hosts, see the TotalView for HPC Supported Platforms guide.
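
To make the list above concrete, here is a minimal sketch of a CUDA program of the kind these features apply to. It is an illustration only and is not taken from the TotalView documentation: the kernel name fill_indices, the array size, and the launch configuration are all hypothetical. The kernel reads the built-in runtime variables threadIdx, blockIdx, and blockDim, and the marked assignment is a natural place for a GPU breakpoint.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its own global index in out[].
__global__ void fill_indices(int *out, int n)
{
    // Built-in runtime variables, distinct for every GPU thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;  // candidate line for a breakpoint in GPU code
    }
}

int main()
{
    const int n = 256;             // assumed problem size
    const int threadsPerBlock = 64;
    int h_out[n];
    int *d_out = nullptr;

    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Round up so that every element is covered by a thread.
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    fill_indices<<<blocks, threadsPerBlock>>>(d_out, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("h_out[%d] = %d\n", n - 1, h_out[n - 1]);
    return 0;
}
```

When building such a program for debugging, nvcc accepts -g to generate host debug information and -G to generate device debug information. Stopped at the marked line, each GPU thread would be expected to show a different value of i, derived from its own threadIdx and blockIdx.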