Hangs or Initialization Failures
When you start an AMD GPU debugging session, the debugger or the target application may hang, initialization may fail, or a kernel may fail to launch. Use the following checklist to diagnose the problem:
Serialized Access
AMD GPU models older than the MI200 series support at most one active AMD GPU debugging session per node. On these models, a node cannot be shared for debugging ROCm code by multiple sessions at the same time, whether the sessions belong to different users or to the same user. Use ps or other system utilities to determine whether your session conflicts with another debugging session.
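As a rough sketch of the ps check described above, the following shell function lists processes on the node whose command name matches a debugger pattern. The function name list_debug_sessions and the default pattern (rocgdb or gdb) are assumptions made for illustration; substitute the command name of the debugger actually used at your site.

```shell
# List processes on this node whose command name matches a pattern.
# Output columns: PID, owning user, command name.
# The default pattern is an assumption; pass your debugger's name instead.
list_debug_sessions() {
    pattern="${1:-rocgdb|gdb}"
    ps -eo pid=,user=,comm= | awk -v pat="$pattern" '$3 ~ pat { print }'
}

# Example: show candidate debugger processes for all users.
list_debug_sessions
```

If the output shows a debugger process other than your own, it may be holding the node's single debugging session.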
Orphaned Processes
Occasionally, a debugging session might accidentally orphan a process. An orphaned process might become compute-bound, or it might prevent you or other users from starting a debugging session. You may need to kill orphaned ROCm processes manually before you can start your AMD GPU debugging session, or to stop a compute-bound process. Use system tools such as ps or top to find the processes, and kill them with the shell kill command. If the processes were orphaned by another user, that user owns them and you may not have permission to kill them. In this case, ask that user or the system administrator to kill them for you.
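The find-and-kill procedure above could be scripted along the following lines. This is a minimal sketch, not a prescribed method: the function names list_my_processes_by_cpu and terminate_process are hypothetical, and the choice to try SIGTERM before SIGKILL is an assumption about what is appropriate for your processes. On many Linux systems an orphaned process shows a parent PID (PPID) of 1, but that is not guaranteed.

```shell
# Show the current user's processes, highest CPU usage first, to help
# spot a compute-bound orphan. Columns: PID, PPID, %CPU, elapsed time, command.
list_my_processes_by_cpu() {
    ps -u "$(id -un)" -o pid=,ppid=,pcpu=,etime=,comm= | sort -k3,3nr
}

# Ask a process to exit with SIGTERM, wait up to $2 seconds (default 5),
# then send SIGKILL if it is still present.
terminate_process() {
    pid="$1"
    grace="${2:-5}"
    kill -TERM "$pid" 2>/dev/null || return 1
    while [ "$grace" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        grace=$((grace - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"
    fi
}

# Example: inspect the ten busiest processes, then terminate one by PID.
list_my_processes_by_cpu | head -n 10
# terminate_process 12345    # 12345 is a placeholder PID
```

Sending SIGTERM first gives the process a chance to release its resources cleanly; SIGKILL is the fallback when the process does not respond.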