Tools > Restart Checkpoint
Use this dialog box to restore and restart all of the checkpointed processes. TotalView initially attaches to the base process, and if there are parallel processes related to this base process, TotalView then attaches to them.
If an error occurs while attempting to restart the program from the checkpoint, information is displayed in the Error area.
The CLI drestart command performs the same operations as this command.
This dialog box has the following fields:
Name
The name of the previously saved checkpoint file. The name used for the checkpoint file is set using an environment variable, so no value is needed in the Name field. See your POE documentation for more information.
Remote Host
Enter the name of a remote host on which to restart the process. Optional.
Group ID
Enter the name of a control group into which TotalView should place all created processes. Optional.
Under Options, the following options apply:
Attach parallel
This option is automatically selected and cannot be unselected. TotalView attaches to parallel processes as they are being created.
Use Same Hosts
If selected, the restart operation tries to use the same hosts that were used when the checkpoint was created. If TotalView cannot use the same hosts, the restart operation fails. If not selected, TotalView uses any available hosts.
Under After Restart, the following options apply:
Halt
Parallel processes are held immediately after the location where the checkpoint occurred. TotalView attaches to these created parallel processes.
Restarting using LoadLeveler
If you checkpointed a LoadLeveler POE job, you cannot restart it with this command. You must resubmit the program as a LoadLeveler job to restart the checkpoint. You also need to do the following:
Set the MP_POE_RESTART_SLEEP environment variable to an appropriate number of seconds.
Set the Attach to none option on the Parallel tab of the Process Window File > Preferences dialog box. This is necessary because the parallel tasks have not yet been created when TotalView attaches to POE, so TotalView must not try to attach to them.
After you restart POE, start TotalView and attach to POE. POE tells TotalView when it is time to attach to the parallel tasks so that it can complete the restart operation. When you restart a job this way, you cannot use this command to restart the checkpoint.
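As a minimal sketch of the environment-variable step above, the following shell fragment sets MP_POE_RESTART_SLEEP before the job is resubmitted. The value of 30 seconds and the helper variable restart_sleep_ok are assumptions made for illustration only; choose a delay that suits your system. How the variable reaches the resubmitted LoadLeveler job also depends on your site's setup, which this sketch does not cover.

```shell
# Assumed delay in seconds; the appropriate value depends on your system.
MP_POE_RESTART_SLEEP=30
export MP_POE_RESTART_SLEEP

# Sanity check (illustrative helper, not part of TotalView or POE):
# confirm the value is a whole number of seconds before resubmitting.
case "$MP_POE_RESTART_SLEEP" in
  ''|*[!0-9]*) restart_sleep_ok=no ;;
  *)           restart_sleep_ok=yes ;;
esac

echo "MP_POE_RESTART_SLEEP=$MP_POE_RESTART_SLEEP (valid: $restart_sleep_ok)"
```

With the variable exported, resubmit the program as a LoadLeveler job and then attach TotalView to POE as described above.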