Restarting using LoadLeveler
If you checkpointed a LoadLeveler POE job, you cannot restart it with this command. You must resubmit the program as a LoadLeveler job to restart the checkpoint. You also need to do the following:
Set the MP_POE_RESTART_SLEEP environment variable to an appropriate number of seconds.
Set the Attach to none option on the Parallel tab of the Process Window File > Preferences dialog box. This is necessary because when TotalView attaches to POE, the parallel tasks have not been created yet, so TotalView must not try to attach to them.
After you restart POE, start TotalView and attach to POE. POE tells TotalView when it is time to attach to the parallel task so that it can complete the restart operation.
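The environment setup for the resubmission can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: the 30-second value and the job command file name restart_job.cmd are hypothetical, and the sketch assumes the job command file is written so that the submitting environment is passed through to the job. Only the llsubmit command and the MP_POE_RESTART_SLEEP variable name come from LoadLeveler and POE themselves.

```python
import os
import subprocess

# Hypothetical values chosen for illustration; pick a sleep time that
# gives you enough time to start TotalView and attach to POE.
RESTART_SLEEP_SECONDS = 30
JOB_COMMAND_FILE = "restart_job.cmd"


def build_restart_env(sleep_seconds, base_env=None):
    """Return a copy of the environment with MP_POE_RESTART_SLEEP set."""
    if sleep_seconds < 0:
        raise ValueError("sleep_seconds must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable values must be strings.
    env["MP_POE_RESTART_SLEEP"] = str(int(sleep_seconds))
    return env


def resubmit(job_command_file, sleep_seconds):
    """Resubmit the job with llsubmit, using the modified environment.

    Assumes the job command file propagates the submitting environment
    to the job; otherwise the variable must be set in that file instead.
    """
    env = build_restart_env(sleep_seconds)
    return subprocess.run(["llsubmit", job_command_file], env=env, check=True)


if __name__ == "__main__":
    resubmit(JOB_COMMAND_FILE, RESTART_SLEEP_SECONDS)
```

Keeping the environment construction in its own function makes it possible to check the value that will be handed to the job without actually submitting anything.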