The CUDA API has a limitation: a debugger process (such as tvdsvr) can
debug only one target process that uses a GPU. If you are using the
MRNet (Multicast/Reduction Network) infrastructure model and are running
multiple CUDA processes on a node, you'll need to set the
TV::mrnet_super_bushy variable to true.
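
As a minimal sketch, the variable could be set from the debugger's CLI with the dset command. This assumes the setting is made before the CUDA job is launched. Placing the same line in a startup file such as tvdrc is also an assumption here; check how your installation reads startup files.

```tcl
# Set the MRNet "super bushy" variable named in the text above.
# Assumption: run at the CLI prompt (or from a startup file) before starting the job.
dset TV::mrnet_super_bushy true

# Display the current value to confirm the setting took effect.
dset TV::mrnet_super_bushy
```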