About the TotalView CUDA Debugger
The TotalView CUDA debugger is an integrated debugging tool that can simultaneously debug CUDA code running on the host system and on the NVIDIA® GPU. CUDA support is an extension to the standard version of TotalView and can debug 64-bit CUDA programs. Debugging 32-bit CUDA programs is not currently supported.
Supported major features:
- Debug CUDA applications running directly on GPU hardware
- Set breakpoints, pause execution, and single-step in GPU code
- View GPU variables in PTX registers and in local, parameter, global, or shared memory
- Access runtime variables such as threadIdx, blockIdx, and blockDim
- Debug multiple GPU devices per process
- Support for the CUDA MemoryChecker
- Debug remote, distributed, and clustered systems
- Support for directive-based programming languages
- Support for host debugging features
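
To make the memory spaces and runtime variables listed above concrete, here is a minimal sketch of a CUDA program that touches each of them. The kernel name `reverse_tiles`, the `TILE` constant, and the data sizes are illustrative assumptions and do not come from the TotalView documentation. Device debug information is typically generated by compiling with `nvcc -g -G`; check the TotalView documentation for the exact options it expects.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 64  // threads per block (illustrative value)

// Each block reverses its own TILE-element slice of `in` into `out`.
// `in`, `out`, and `n` are kernel parameters (parameter memory);
// the buffers they point to live in global memory.
__global__ void reverse_tiles(const int *in, int *out, int n)
{
    __shared__ int tile[TILE];                        // shared memory, one copy per block

    // Local variables computed from the built-in runtime variables.
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    int src = blockDim.x - 1 - threadIdx.x;           // mirrored index within the block

    if (gid < n)
        tile[threadIdx.x] = in[gid];                  // global -> shared
    __syncthreads();                                  // reached by every thread in the block

    // A natural place for a breakpoint: tile[], gid, and src are all populated here.
    if (gid < n && blockIdx.x * blockDim.x + src < n)
        out[gid] = tile[src];                         // shared -> global
}

int main()
{
    const int n = 4 * TILE;                           // four full blocks
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    reverse_tiles<<<n / TILE, TILE>>>(d_in, d_out, n);

    // Wait for the kernel and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("out[0] = %d\n", h_out[0]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Stopped at a breakpoint inside the kernel, a debugger with the features above would be expected to show `threadIdx`, `blockIdx`, and `blockDim` for the focused GPU thread, along with the shared `tile` array and the local `gid` and `src` values.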
Requirements:
The CUDA SDK and a host distribution supported by NVIDIA. For SDK versions and supported NVIDIA GPUs, please see the TotalView Supported Platforms guide.