Using Watchpoints on Different Architectures
Watchpoints are built on hardware-specific features, so their availability varies by platform. For example, watchpoints are not available on NVIDIA GPUs (CUDA).
The number of watchpoints, and their size and alignment restrictions, differ from platform to platform. This is because TotalView relies on the operating system and its hardware to implement watchpoints.
Watchpoint support depends on the target platform where your application is running, not on the host platform where TotalView is running.
For example, if you are running TotalView on host platform "H" (where watchpoints are not supported), and debugging a program on target platform "T" (where watchpoints are supported), you can create a watchpoint in a process running on "T", but not in a process running on "H".
NOTE: Watchpoints are not available on the Mac OS X platform.
The following list describes constraints that exist on each platform:
Computer
Constraints
AMD GPUs (ROCm)
Watchpoints on AMD GPUs are built on a hardware-specific feature, and the hardware may not support them. Unlike on x86-64, where watchpoints are a feature of the processor and are specific to a thread, ROCm watchpoints are a resource shared by the entire node. A node provides only four watchpoints, regardless of how many GPUs it has. If another user creates watchpoints while debugging on the same node, fewer watchpoints are available to you.
Linux x86-64 (AMD and Intel)
TotalView uses the four hardware debug registers in the x86 processor, which it manipulates through the ptrace system call. You can create up to four watchpoints. Each must be 1, 2, 4, or 8 bytes in length, and the watched memory address must be aligned to the byte length. For example, you must align a 4-byte watchpoint on a 4-byte address boundary.
Linux-PowerLE
On Linux-PowerLE platforms (but not Linux-Power big-endian platforms), TotalView uses the Linux kernel's ptrace() PowerPC hardware debug extension to plant watchpoints. The ptrace() interface implements a “hardware breakpoint” abstraction that reflects the capabilities of PowerPC BookE and server processors. If watchpoints are supported at all, the number available varies by processor type. Typically, the PowerPC supports at least 1 watchpoint up to 8 bytes long. Systems with the DAWR feature support a watchpoint up to 512 bytes long. The watchpoint triggers if the referenced data address is greater than or equal to the watched address and less than the watched address plus length. Alignment constraints may apply. For example, the watched length may be required to be a power of 2, and the watched address may need to be aligned to that power of 2; that is, (address % length) == 0.
Linux ARM64
TotalView supports watchpoints for ARMv8 processors using the hardware’s debug watchpoint registers. You can typically create up to four watchpoints (although some processors may have different limits, allowing from 2 to 16 watchpoints, or none at all). Each must be 1, 2, 4, or 8 bytes in length, and the watched memory address must be aligned for the byte length. Watchpoints cannot overlap.
Mac OS X
Watchpoints are not supported.
Typically, a debugging session doesn’t use many watchpoints. In most cases, you are only monitoring one memory location at a time. Consequently, restrictions on the number of values you can watch seldom cause problems.
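The size, alignment, and range rules listed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not TotalView code: the function names and the FIXED_SIZES constant are hypothetical, and the power-of-2 rule is the example constraint given for Linux-PowerLE, which may not apply on every processor.

```python
# Hypothetical helpers illustrating the constraints described above.
# None of these names come from TotalView; they exist only for this sketch.

# Lengths accepted on Linux x86-64 and Linux ARM64, per the list above.
FIXED_SIZES = (1, 2, 4, 8)


def is_valid_fixed_size_watchpoint(address: int, length: int) -> bool:
    """Check the 1/2/4/8-byte length rule and the matching alignment rule."""
    return length in FIXED_SIZES and address % length == 0


def is_power_of_two_aligned(address: int, length: int) -> bool:
    """Check the example Linux-PowerLE constraint: the length is a power
    of 2 and the address is aligned to it, that is, address % length == 0."""
    is_power_of_two = length > 0 and (length & (length - 1)) == 0
    return is_power_of_two and address % length == 0


def triggers(watched_address: int, length: int, referenced_address: int) -> bool:
    """Range test described for Linux-PowerLE: the watchpoint triggers when
    watched_address <= referenced_address < watched_address + length."""
    return watched_address <= referenced_address < watched_address + length


def overlaps(addr_a: int, len_a: int, addr_b: int, len_b: int) -> bool:
    """Return True if two watched ranges share at least one byte
    (not allowed on Linux ARM64, per the list above)."""
    return addr_a < addr_b + len_b and addr_b < addr_a + len_a
```

For instance, a 4-byte watchpoint at address 0x1002 fails the first check because 0x1002 is not a multiple of 4, while a 2-byte watchpoint at the same address passes.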