CUDA and MRNet
The CUDA API allows a debugger process (such as tvdsvr) to debug only one target process that uses a GPU. If you are using the MRNet (Multicast/Reduction Network) infrastructure model and are running multiple CUDA processes on a node, you’ll need to set the TV::mrnet_super_bushy variable to true, for example with the CLI command dset TV::mrnet_super_bushy true.
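This setting creates a "super bushy" MRNet tree by launching one MRNet tvdsvr process per target MPI process, instead of the default behavior, in which TotalView launches one tvdsvr process per node. Because each tvdsvr then debugs a single target process, no tvdsvr has to debug more than one CUDA process.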
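The difference between the two layouts can be illustrated with a small toy model. This is not TotalView code: the function names, the rank-to-node mapping, and the set of GPU-using ranks are hypothetical inputs chosen for illustration, and the model counts only tvdsvr processes, ignoring MRNet's internal communication processes. It shows why the default one-tvdsvr-per-node layout breaks the one-CUDA-process-per-debugger limit when a node hosts several CUDA ranks, while the super bushy layout does not.

```python
def plan_servers(rank_to_node, super_bushy):
    """Return a list of (node, [ranks]) entries, one entry per tvdsvr process.

    rank_to_node: hypothetical dict mapping MPI rank -> node name.
    """
    if super_bushy:
        # Super bushy: one tvdsvr per target MPI process.
        return [(node, [rank]) for rank, node in sorted(rank_to_node.items())]
    # Default: one tvdsvr per node, debugging every rank placed on that node.
    by_node = {}
    for rank, node in sorted(rank_to_node.items()):
        by_node.setdefault(node, []).append(rank)
    return sorted(by_node.items())


def violates_cuda_limit(servers, cuda_ranks):
    """True if any tvdsvr would have to debug more than one CUDA process."""
    return any(sum(1 for r in ranks if r in cuda_ranks) > 1 for _, ranks in servers)


# Hypothetical job: four CUDA ranks spread over two nodes.
placement = {0: "node0", 1: "node0", 2: "node1", 3: "node1"}
cuda_ranks = {0, 1, 2, 3}

default_plan = plan_servers(placement, super_bushy=False)
bushy_plan = plan_servers(placement, super_bushy=True)

print(len(default_plan), violates_cuda_limit(default_plan, cuda_ranks))  # 2 True
print(len(bushy_plan), violates_cuda_limit(bushy_plan, cuda_ranks))      # 4 False
```

The trade-off the model makes visible is that the super bushy layout launches more tvdsvr processes (one per rank rather than one per node) in exchange for keeping each debugger server within the CUDA limit.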
See TV::mrnet_super_bushy in the Classic TotalView Reference Guide.