Chapter 27 CUDA Problems and Limitations
CUDA TotalView sits directly on top of the CUDA debugging environment provided by NVIDIA, which is still evolving and maturing. This environment contains certain problems and limitations, discussed in this chapter.
System Limitations
TotalView inherits some limitations from the CUDA debugging environment. These depend on the SDK driver version, as follows:
SDK 4.0, 4.1 and 4.2 Limitations
• Kernel launches: The CUDA debugging environment enforces blocking kernel launches.
• Device memory: Device memory allocated via cudaMalloc() is not visible outside the kernel function.
• Illegal program behavior: The debugger does not catch all illegal program behavior; examples include out-of-bounds memory accesses and divide-by-zero. For information on detecting addressing violations and errors in general, see "Enabling CUDA MemoryChecker Feature" and "GPU Error Reporting".
• Device allocations: Device allocations larger than 100 MB on Tesla GPUs, and larger than 32 MB on Fermi GPUs, may not be accessible in the debugger.
• Breakpoints: Breakpoints in divergent code may not behave as expected.
• Textures: Debugging applications using textures is not supported on GPUs with sm_type less than sm_20.
• Multiple CUDA contexts: For SDK driver 4.0, debugging applications with multiple CUDA contexts running on the same GPU is not supported on any GPU. For SDK 4.1, this limitation applies only to GPUs with sm_type less than sm_20.
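The device-dependent items in this list can be summarized as a small lookup. The following C++ sketch is illustrative only: DebugTarget and debug_warnings are hypothetical names, not part of TotalView or the CUDA SDK, and the sketch assumes that 1 MB means 1024 x 1024 bytes, that Tesla corresponds to compute capability 1.x, and that Fermi corresponds to 2.x. In a real program, the compute capability would typically come from the major field that cudaGetDeviceProperties() fills in. Because the list above does not state how the multiple-context limitation applies to SDK 4.2, the sketch makes no claim for that version.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a debugging session; not a TotalView or CUDA type.
struct DebugTarget {
    int sdk_major;                    // SDK driver major version, e.g. 4
    int sdk_minor;                    // SDK driver minor version: 0, 1 or 2
    int cc_major;                     // compute capability major (assumed: 1 = Tesla, 2 = Fermi)
    std::size_t largest_alloc_bytes;  // largest single device allocation in the program
    bool uses_textures;               // program uses textures
    bool multiple_contexts_per_gpu;   // more than one CUDA context on the same GPU
};

// Returns one warning per limitation from the list that appears to apply.
std::vector<std::string> debug_warnings(const DebugTarget& t) {
    const std::size_t MB = 1024u * 1024u;  // assumption: binary megabytes
    std::vector<std::string> out;

    // Device allocations: larger than 100 MB on Tesla, larger than 32 MB on Fermi.
    if (t.cc_major == 1 && t.largest_alloc_bytes > 100 * MB)
        out.push_back("allocation larger than 100 MB may not be accessible (Tesla)");
    if (t.cc_major == 2 && t.largest_alloc_bytes > 32 * MB)
        out.push_back("allocation larger than 32 MB may not be accessible (Fermi)");

    // Textures: not supported below sm_20.
    if (t.uses_textures && t.cc_major < 2)
        out.push_back("debugging textures is not supported below sm_20");

    // Multiple contexts: SDK 4.0 on any GPU; SDK 4.1 only below sm_20.
    // SDK 4.2 is not covered by the text, so no warning is produced for it.
    if (t.multiple_contexts_per_gpu && t.sdk_major == 4) {
        if (t.sdk_minor == 0 || (t.sdk_minor == 1 && t.cc_major < 2))
            out.push_back("multiple CUDA contexts on one GPU are not supported");
    }
    return out;
}
```

For example, a Fermi GPU under SDK 4.0 running a program that makes a 64 MB allocation and uses two contexts on one GPU would yield two warnings, while the same program under SDK 4.1 with only small allocations would yield none. The remaining items in the list (blocking kernel launches, cudaMalloc() visibility, uncaught illegal behavior, and breakpoints in divergent code) do not depend on the device and are therefore not modeled here.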