Manpage of echkpnt

echkpnt

Section: Maintenance Commands (8)
Updated: SkyForm AIP Version 10.25.0 - April 2025

 

NAME

echkpnt - checkpoints a process or process group of AIP batch jobs

 

SYNOPSIS

echkpnt [-c] [-f] [-k | -s] [-d checkpoint_dir] [-x] process_group_ID

echkpnt [-h | -V]

 

DESCRIPTION

echkpnt is the interface through which jobs are checkpointed.

By default, jobs continue to execute after they have been checkpointed.

Some options in the synopsis are supported only on certain operating systems.

 

echkpnt

echkpnt is located in CB_SERVERDIR.

echkpnt sends checkpoint instructions provided through job submission or through queue parameters to custom echkpnt methods that you define. Communication is achieved through the common syntax indicated in the synopsis.

 

echkpnt.method_name

echkpnt.method_name is a custom echkpnt that attempts to run a custom checkpoint program or script.

The name of the custom echkpnt method must be echkpnt.method_name, where method_name is a name you assign to your custom echkpnt. Specify method_name with the CB_ECHKPNT_METHOD environment variable, with the csub -k option when the job is submitted (for example, csub -k "mydir 120 method=myapp" job1), or both.

echkpnt.method_name must support the syntax described in the synopsis.

echkpnt.method_name is placed in CB_SERVERDIR.
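
As an illustration of what supporting the synopsis syntax can look like, the following is a minimal sketch of the argument handling in a custom method written as a POSIX shell script. The method name myapp, the function name parse_echkpnt_args, and the variable names are assumptions made for this example and are not part of AIP; how the parsed values are then used depends on your checkpoint program.

```shell
#!/bin/sh
# Hypothetical skeleton for CB_SERVERDIR/echkpnt.myapp ("myapp" is an
# illustrative method name). It only parses the synopsis:
#   [-c] [-f] [-k | -s] [-d checkpoint_dir] [-x] process_group_ID

parse_echkpnt_args() {
    copy_files=0 force=0 after=continue chkpnt_dir= pgid=
    OPTIND=1
    while getopts "cfksd:x" opt; do
        case "$opt" in
            c) copy_files=1 ;;
            f) force=1 ;;
            # -k and -s are mutually exclusive
            k) [ "$after" = stop ] && return 2; after=kill ;;
            s) [ "$after" = kill ] && return 2; after=stop ;;
            d) chkpnt_dir=$OPTARG ;;
            x) ;;               # compatibility only: accepted and ignored
            *) return 2 ;;      # unknown option
        esac
    done
    shift $((OPTIND - 1))
    # exactly one operand is expected: the process or process group ID
    [ $# -eq 1 ] || return 2
    pgid=$1
}
```

A real method would call parse_echkpnt_args "$@" near the top of the script, exit with a non-zero status if it fails, and then pass chkpnt_dir and pgid to its own checkpoint program.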

 

OPTIONS

-c

This option is supported on only some operating systems such as Convex.

Copies to the checkpoint directory all files in use by the checkpointed process.

-f

This option is supported on only some operating systems such as Convex.

Forces a job to be checkpointed even if non-checkpointable conditions exist (these conditions are specific to the type of checkpoint facility being used). This option may create checkpoint files that will not restart properly.

-k | -s

Options -k and -s are mutually exclusive.

-k

Kills a job after it has been successfully checkpointed. If this option is specified and the checkpoint fails for any reason, the job continues to execute.

-s

This option is supported on only some operating systems such as SGI systems running IRIX, and Convex.

Stops a job after it has been successfully checkpointed. If this option is specified and the checkpoint fails for any reason, the job continues to execute.

-d checkpoint_dir

Specifies the checkpoint directory. Specify a relative or absolute path name.

When a job is checkpointed, the checkpoint information is stored in checkpoint_dir/job_ID/file_name. Multiple jobs can checkpoint into the same directory. The system can create multiple files.

The checkpoint directory is used for restarting the job (see crestart(1)).

-x

This option is for compatibility only. It should be ignored.

process_group_ID

ID of the process or process group to be checkpointed.

-h

Prints command usage to stderr and exits.

-V

Prints AIP release version to stderr and exits.

 

SEE ALSO

csub(1), cchkpnt(1), crestart(1), erestart(8)

 

DIAGNOSTICS

 

echkpnt

Exits with 0 if the checkpoint operation succeeds. Otherwise, exits with the exit value of echkpnt.default or echkpnt.method_name.

 

echkpnt.default

echkpnt.default should be a link to a specific echkpnt.method_name, which is then used as the default checkpointing method.
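
One way to set up such a link is sketched below. It assumes a symbolic link is acceptable (this page says only "linked"), that CB_SERVERDIR is set in the environment, and that a method named myapp exists; link_default_method is a hypothetical helper, not an AIP command.

```shell
# Sketch: point echkpnt.default at an existing custom method.
# Assumes CB_SERVERDIR is set; "myapp" below is an illustrative method name.
link_default_method() {
    method=$1
    target="$CB_SERVERDIR/echkpnt.$method"
    # refuse to create a dangling link
    [ -f "$target" ] || { echo "not found: $target" >&2; return 1; }
    # relative link, so it stays valid if CB_SERVERDIR is moved as a whole
    ln -sf "echkpnt.$method" "$CB_SERVERDIR/echkpnt.default"
}
```

For example, link_default_method myapp makes echkpnt.myapp the method used when no other method is specified.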

 

echkpnt.method_name

At least one custom checkpoint method should exist. echkpnt.method_name exits with 0 if it succeeds in checkpointing the job; a non-zero value indicates that the checkpoint failed. By default, all messages written to stdout and stderr are directed to /dev/null and ignored by AIP.
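
The exit-status convention can be sketched as a small wrapper for use inside a custom method. Because stdout and stderr are discarded by default, this sketch appends the checkpoint program's output to a log file chosen by the method itself. The function name checkpoint_and_report and the log file are assumptions for illustration; the command passed to it stands in for whatever checkpoint program you use.

```shell
# Sketch: run a checkpoint command, keep its output in a private log,
# and hand its exit status back unchanged (0 = success, non-zero = failure).
checkpoint_and_report() {
    log=$1
    shift
    rc=0
    # "$@" is the checkpoint command and its arguments
    "$@" >>"$log" 2>&1 || rc=$?
    echo "checkpoint command exited with status $rc" >>"$log"
    return "$rc"
}
```

A method script would end with something like checkpoint_and_report "$logfile" your_checkpoint_program args, followed by exit $?, so that echkpnt sees the same status.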

To save the standard output and standard error messages of echkpnt.method_name, set CB_ECHKPNT_KEEP_OUTPUT=y before submitting the job. The stdout and stderr output generated by echkpnt.method_name is then redirected to:

- checkpoint_dir/$CB_JOBID/echkpnt.out

- checkpoint_dir/$CB_JOBID/echkpnt.err
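
Once a job submitted with CB_ECHKPNT_KEEP_OUTPUT=y has been checkpointed, the saved files can be inspected with a helper along these lines. This is a sketch only: show_echkpnt_output is a hypothetical function, and the checkpoint directory and job ID are supplied by the caller.

```shell
# Sketch: print the saved output of a custom method for one job.
# $1 = checkpoint directory, $2 = job ID (both supplied by the caller).
show_echkpnt_output() {
    dir=$1/$2
    for f in echkpnt.out echkpnt.err; do
        # skip files that were not created
        [ -f "$dir/$f" ] || continue
        echo "== $dir/$f =="
        cat "$dir/$f"
    done
}
```

For example, show_echkpnt_output mydir 1234 prints mydir/1234/echkpnt.out and mydir/1234/echkpnt.err if they exist.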

 

LIMITATIONS

If you use echkpnt.method_name, then once the job has been submitted it is checkpointed with the method that was specified at job submission or with the parameter CB_ECHKPNT_METHOD. You cannot change the method with the bmod command.

If you submit a job and do not specify a custom method, and CB_ECHKPNT_METHOD is not defined, echkpnt.default will be used. You will not be able to change this with bmod.

It is the cluster administrator's responsibility to ensure that method name and method directory combinations are unique in the cluster.


 


This document was created by man2html, using the manual pages.
Time: 18:57:47 GMT, April 23, 2025