High TLB miss rates with certain multi-threaded target programs
When reverse debugging an application in which many threads make frequent system calls on a multi-processor platform, binding the application process to a single processor can improve performance. This is because such applications put stress on ReplayEngine's heap management, which in turn stresses the processor's TLB (translation lookaside buffer). If the application is bound to a single processor, it is less likely to suffer TLB misses caused by process migration. Since user threads are automatically serialized during reverse debugging, there is no loss of concurrency due to binding.
If the application is to be launched under NextGen TotalView for HPC, one way to bind it is to preface the totalview command with a taskset(1) command that specifies a single processor. For example:
taskset --cpu-list 3 totalview myapp
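Binding works in the launch case because a CPU affinity mask is inherited across fork and kept across exec, so a target started by totalview runs on the same processor. This assumes the target is started as a descendant of the totalview process (a target launched on a remote host would not inherit the mask). The shell sketch below demonstrates the inheritance. It assumes util-linux taskset is installed, uses nested sh processes as stand-ins for totalview and the target, and uses the first CPU the current shell may run on instead of CPU 3 so that it works on any machine. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: a CPU binding set with taskset(1) is inherited by child processes.
# Assumes util-linux taskset; function names here are hypothetical helpers.

# first_allowed_cpu: print the lowest CPU this shell may run on, so the
# sketch works on any machine (the text uses CPU 3 as its example).
first_allowed_cpu() {
  taskset --cpu-list --pid $$ | sed 's/.*: //; s/[-,].*//'
}

# show_inherited_affinity CPU: the outer "sh" stands in for the totalview
# command and the nested "sh" for the program it launches; each one prints
# its own current affinity list.
show_inherited_affinity() {
  taskset --cpu-list "$1" sh -c '
    taskset --cpu-list --pid $$
    sh -c "taskset --cpu-list --pid \$\$"
  '
}

show_inherited_affinity "${CPU:-$(first_allowed_cpu)}"
```

Both output lines should report the same single CPU, showing that the program started by the bound process is bound as well.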
To bind an application that is already running before NextGen TotalView for HPC attaches to it, find the PID (process identifier) of the application process, use taskset to bind that process to a single processor, and then attach. Because the application is multi-threaded, include the --all-tasks option so that every existing thread is bound, not only the thread whose ID equals the PID. For example:
taskset --all-tasks --pid --cpu-list 3 <PID of myapp>
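When attaching, the binding is applied to a process that may already have created several threads. The sketch below wraps the taskset command in a hypothetical helper, bind_all_threads, that binds every thread of a given PID and then prints each thread's affinity list from /proc so the result can be checked before attaching. It assumes Linux and util-linux taskset; it is an illustration, not part of NextGen TotalView for HPC.

```shell
#!/bin/sh
# Sketch: bind every thread of an already-running process to one CPU
# before attaching the debugger. bind_all_threads is a hypothetical helper.
# Assumes Linux (/proc) and util-linux taskset.

# bind_all_threads PID CPU
# --all-tasks applies the CPU list to each existing thread of PID, not only
# to its main thread; threads created later inherit the binding from the
# thread that creates them.
bind_all_threads() {
  taskset --all-tasks --pid --cpu-list "$2" "$1" >/dev/null || return 1
  # Print the current affinity list of every thread so it can be checked.
  for task in /proc/"$1"/task/*; do
    taskset --cpu-list --pid "${task##*/}"
  done
}
```

For example, using myapp as the placeholder program name from the text and pgrep(1) to pick the oldest process with that exact name:
bind_all_threads "$(pgrep -o -x myapp)" 3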
We have noticed the need for such binding when debugging MySQL applications with ReplayEngine.