The TotalView AMD ROCm Debugging Model

Each GPU agent, or device, is represented in the debugger as a single TotalView thread, called the “AMD GPU agent TotalView thread,” or “GPU agent thread” for short in this documentation. (This is not to be confused with a single thread of execution, sometimes called a “work-item” or “lane.”) In TotalView, CPU threads have positive IDs, while GPU agent threads have negative IDs.

Breakpoints apply to both the CPU and GPU code

The address spaces of the Linux CPU process and the address spaces of the ROCm threads are placed into the same share group. Breakpoints are created and evaluated within the share group, and apply to all of the image files (executable, shared libraries, and ROCm ELF images) in the share group.

This means that a source-level breakpoint can apply to both CPU and GPU code. You can set a breakpoint on a source line in the CPU code, and TotalView then plants it at the same source location in the GPU code once the GPU kernel starts.
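As a rough illustration of this behavior, the following Python sketch models a share group as a list of images and a source-level breakpoint as a file-and-line pair. It is a toy model under stated assumptions, not TotalView's implementation: the classes `Image` and `ShareGroup`, the file name `saxpy.cpp`, the image names, and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """An image file in the share group (hypothetical model)."""
    name: str
    # Maps (source file, line) to a list of code addresses in this image.
    line_table: dict


@dataclass
class ShareGroup:
    """Toy share group: breakpoints are evaluated against every image."""
    images: list = field(default_factory=list)
    breakpoints: list = field(default_factory=list)  # (file, line) pairs
    planted: list = field(default_factory=list)      # (image name, address)

    def _plant(self, location, image):
        # Plant the breakpoint wherever this image has code for the line.
        for address in image.line_table.get(location, []):
            self.planted.append((image.name, address))

    def set_breakpoint(self, file, line):
        location = (file, line)
        self.breakpoints.append(location)
        for image in self.images:
            self._plant(location, image)

    def load_image(self, image):
        # A newly loaded image (for example, a ROCm ELF image that appears
        # when a kernel starts) is checked against existing breakpoints.
        self.images.append(image)
        for location in self.breakpoints:
            self._plant(location, image)


group = ShareGroup()
group.load_image(Image("a.out", {("saxpy.cpp", 12): [0x401000]}))
group.set_breakpoint("saxpy.cpp", 12)

# Later, the GPU code is loaded and contains the same source line.
group.load_image(Image("rocm-elf-image", {("saxpy.cpp", 12): [0x1000]}))

for image_name, address in group.planted:
    print(f"{image_name}: {address:#x}")
```

In this sketch the breakpoint is defined once, by source location, and ends up planted in both the CPU image and the GPU image, even though the GPU image was loaded after the breakpoint was set.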

GPU and Linux address spaces overlap and each agent is mapped to all other agents

Unlike CUDA GPUs, the AMD GPU address spaces and Linux process address spaces overlap within a process. Each device in a process gets its own range of virtual addresses. In addition, the global memory of every GPU agent in a process is mapped into every other agent; for example, on a system with eight GPUs, the global memory of each GPU agent is mapped into the other seven GPU agents, as well as into the Linux process. However, TotalView models the global address spaces of the Linux process and the GPU agents as discrete.

Consider a Linux process consisting of two Linux pthreads and two AMD GPU agents. Figure 149 illustrates how TotalView would group the Linux threads and AMD GPU agents.

NOTE: An AMD GPU agent is represented in TotalView as a thread with a negative thread ID, called the GPU agent thread.

Figure 149, TotalView AMD GPU debugging model

The Linux host ROCm process

A Linux host AMD GPU process consists of:

- A Linux process address space, containing a Linux executable and a list of Linux shared libraries.

- A collection of Linux threads, where a Linux thread:

  - Is assigned a positive debugger thread ID.

  - Shares the Linux process address space with other Linux threads.

- A collection of AMD GPU agents, where a GPU agent:

  - Is assigned a negative TotalView thread ID.

  - Has its own global address space, separate from the Linux process address space, and separate from the global address spaces of other GPU agents.

  - Has a "GPU focus thread," which is focused on a specific hardware work-item (also known as a lane).
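The structure described above can be sketched as a small data model. The Python below is only an illustration, not TotalView code: the class names, the ID numbering (1, 2, ... for Linux threads and -1, -2, ... for GPU agents), and the `focus_lane` field are assumptions. The source states only that Linux thread IDs are positive, that GPU agent thread IDs are negative, and that each agent has its own global address space and a focus on one work-item.

```python
from dataclasses import dataclass, field


@dataclass
class AddressSpace:
    name: str


@dataclass
class LinuxThread:
    tid: int                      # positive debugger thread ID
    address_space: AddressSpace   # the shared Linux process address space


@dataclass
class GpuAgent:
    tid: int                      # negative debugger thread ID
    address_space: AddressSpace   # this agent's own global address space
    focus_lane: int = 0           # hypothetical index of the focused work-item


@dataclass
class HostProcess:
    address_space: AddressSpace = field(
        default_factory=lambda: AddressSpace("linux-process"))
    linux_threads: list = field(default_factory=list)
    gpu_agents: list = field(default_factory=list)

    def add_linux_thread(self):
        # Assumed numbering: 1, 2, 3, ...
        tid = len(self.linux_threads) + 1
        self.linux_threads.append(LinuxThread(tid, self.address_space))

    def add_gpu_agent(self):
        # Assumed numbering: -1, -2, -3, ...
        index = len(self.gpu_agents) + 1
        self.gpu_agents.append(
            GpuAgent(-index, AddressSpace(f"gpu-agent-{index}")))


# Two Linux pthreads and two AMD GPU agents in one process.
process = HostProcess()
for _ in range(2):
    process.add_linux_thread()
    process.add_gpu_agent()

for thread in process.linux_threads + process.gpu_agents:
    kind = "GPU agent" if thread.tid < 0 else "Linux thread"
    print(f"{thread.tid:>3}  {kind:<12}  {thread.address_space.name}")
```

All Linux threads in the sketch reference the same `AddressSpace` object, while each GPU agent holds a distinct one, mirroring the discrete address spaces in the model.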

The above TotalView AMD ROCm debugging model is reflected in the TotalView user interface and command line interface. In addition, ROCm-specific CLI commands allow you to inspect ROCm work-items, change the focus, and display their status.