You can detect global memory addressing violations and misaligned global memory accesses by enabling the CUDA Memory Checker feature.
To enable the feature, use one of the following:
• Pass the -cuda_memcheck option to the totalview command, for example:
totalview -cuda_memcheck
• Set the TV::cuda_memcheck CLI state variable to true. For example:
dset TV::cuda_memcheck true
Note that global memory addressing violations and misaligned global memory accesses are detected only while the CUDA thread is running. They are not detected while you single-step the CUDA thread.
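To try the feature, it can help to have a small program that deliberately commits both kinds of error. The following is a minimal sketch, not taken from the TotalView documentation: the kernel names out_of_bounds_write and misaligned_write and the "misaligned" command-line argument are hypothetical, and whether a particular access is reported may depend on the device and driver in use.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical kernel, intentionally buggy: each thread writes to a global
// address far beyond the end of the allocation (an addressing violation).
__global__ void out_of_bounds_write(int *data)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i + (size_t(1) << 30)] = 1;   // 2^30 ints past the start of the buffer
}

// Hypothetical kernel, intentionally buggy: stores a 4-byte int through a
// pointer that is offset by one byte (a misaligned global memory access).
__global__ void misaligned_write(char *bytes)
{
    int *p = reinterpret_cast<int *>(bytes + 1);
    *p = 42;
}

int main(int argc, char **argv)
{
    const unsigned int n = 256;        // number of ints actually allocated
    int *data = nullptr;
    if (cudaMalloc(&data, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Run only one faulty kernel per process, because a memory fault
    // leaves the CUDA context unusable for later launches.
    bool misaligned = (argc > 1 && strcmp(argv[1], "misaligned") == 0);
    if (misaligned)
        misaligned_write<<<1, 1>>>(reinterpret_cast<char *>(data));
    else
        out_of_bounds_write<<<1, n>>>(data);

    // Wait for the kernel and report whatever error the runtime recorded.
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel finished: %s\n", cudaGetErrorString(err));

    cudaFree(data);
    return err == cudaSuccess ? 0 : 1;
}
```

Assuming the program is built with device debug information (for example, nvcc -g -G), load it with the memory checker enabled and let it run freely instead of single-stepping through the kernel, since detection happens only while the CUDA thread is running.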