Support for Cray Abnormal Termination Processing (ATP)
Cray's ATP module stops a running job at the moment it crashes, so that you can attach TotalView to the held job and begin debugging it. To hold a job as it is crashing, you must set the ATP_HOLD_TIME environment variable before launching your job with aprun or srun.
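As a rough illustration of that ordering (the variable is set first, then the job is launched), the following Python sketch prepares an environment containing ATP_HOLD_TIME and passes it to the launcher. The helper name build_atp_launch, the launch line aprun -n 4 ./my_app, and the hold-time value 5 are all hypothetical and chosen only for this example; check the intro_atp man page for the values and units your system accepts.

```python
import os
import shutil
import subprocess


def build_atp_launch(launcher_cmd, hold_time, base_env=None):
    """Return (command, environment) with ATP_HOLD_TIME set for the launcher.

    Hypothetical helper for illustration; it is not part of ATP or TotalView.
    """
    # Copy the environment so the caller's environment is left unchanged.
    env = dict(os.environ if base_env is None else base_env)
    # The variable must already be set when the launcher starts the job.
    env["ATP_HOLD_TIME"] = str(hold_time)
    return list(launcher_cmd), env


if __name__ == "__main__":
    # Assumed launch line and hold time; substitute your own job and value.
    cmd, env = build_atp_launch(["aprun", "-n", "4", "./my_app"], hold_time=5)
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH; would have run: {' '.join(cmd)}")
    else:
        subprocess.run(cmd, env=env, check=False)
```

Setting the variable in a copied environment, rather than in os.environ, keeps the change local to the launched job. Exporting ATP_HOLD_TIME in the shell before running aprun or srun should have the same effect.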
When your job crashes, the MPI starter process outputs a message stating that your job has crashed and that ATP is holding it. You can then attach TotalView to the aprun or srun process using the normal attach procedure (see Attaching to a Running Program).
For more information on ATP, see the Cray intro_atp man page.