Manpage of echkpnt
echkpnt
Section: Maintenance Commands (8)
Updated: SkyForm AIP Version 10.25.0 - April 2025
NAME
echkpnt - checkpoints a process or process group of AIP batch jobs
SYNOPSIS
echkpnt [-c] [-f] [-k | -s] [-d checkpoint_dir] [-x] process_group_ID
echkpnt [-h | -V]
DESCRIPTION
echkpnt is the interface through which jobs are checkpointed.
By default, jobs continue to execute after they have been checkpointed.
Some options in the SYNOPSIS are supported only on certain operating systems.
echkpnt
echkpnt is located in CB_SERVERDIR.
echkpnt passes checkpoint instructions, supplied at job submission or through queue parameters, to the custom echkpnt methods that you define. Communication uses the common syntax shown in the SYNOPSIS.
echkpnt.method_name
echkpnt.method_name is a custom echkpnt that attempts to run a custom checkpoint program or script.
The name of the custom echkpnt method must be echkpnt.method_name, where method_name is a name you assign to your custom echkpnt. Specify method_name with the CB_ECHKPNT_METHOD environment variable, with the method= keyword of the csub -k option at job submission (for example, csub -k "mydir 120 method=myapp" job1), or both.
echkpnt.method_name must support the syntax described in the SYNOPSIS.
Place echkpnt.method_name in CB_SERVERDIR.
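A custom method has to accept the same command line as echkpnt. The Python skeleton below is a minimal sketch, not an AIP-supplied template. It only shows one way to parse the options in the SYNOPSIS, including the mutually exclusive -k and -s. The following parts are hypothetical:
- the script name echkpnt.myapp, chosen to match the method=myapp example above;
- the functions build_parser, run_checkpoint, and main;
- the version string.
The actual checkpoint step is left as a placeholder for your site's own checkpoint tool.

```python
# Hypothetical skeleton for a custom method such as CB_SERVERDIR/echkpnt.myapp.
# Only the command-line syntax follows the echkpnt SYNOPSIS; the checkpoint
# step itself is a placeholder that you must replace.
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # add_help=False because -h is handled explicitly in main().
    p = argparse.ArgumentParser(prog="echkpnt.myapp", add_help=False)
    p.add_argument("-c", action="store_true", help="copy files in use to the checkpoint directory")
    p.add_argument("-f", action="store_true", help="force the checkpoint")
    mode = p.add_mutually_exclusive_group()  # -k and -s are mutually exclusive
    mode.add_argument("-k", action="store_true", help="kill the job after a successful checkpoint")
    mode.add_argument("-s", action="store_true", help="stop the job after a successful checkpoint")
    p.add_argument("-d", dest="checkpoint_dir", help="checkpoint directory")
    p.add_argument("-x", action="store_true", help="compatibility only; ignored")
    p.add_argument("-h", action="store_true", help="print usage to stderr and exit")
    p.add_argument("-V", action="store_true", help="print version to stderr and exit")
    # Optional at parse time only so that -h or -V can be given on their own.
    p.add_argument("process_group_ID", nargs="?", type=int)
    return p


def run_checkpoint(args: argparse.Namespace) -> int:
    """Placeholder: call your real checkpoint tool and return 0 on success."""
    raise NotImplementedError("site-specific checkpoint logic goes here")


def main(argv=None) -> int:
    p = build_parser()
    args = p.parse_args(argv)  # argparse exits with status 2 on a usage error
    if args.h:
        p.print_usage(sys.stderr)
        return 0
    if args.V:
        print("echkpnt.myapp 0.1 (hypothetical)", file=sys.stderr)
        return 0
    if args.process_group_ID is None:
        p.error("process_group_ID is required")
    return run_checkpoint(args)  # non-zero means the checkpoint failed
```

To use this as an executable method, add an interpreter line and a final call such as sys.exit(main()). Then install the file as CB_SERVERDIR/echkpnt.myapp. Its exit status is what echkpnt reports, as described under DIAGNOSTICS.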
OPTIONS
-c
This option is supported only on some operating systems, such as Convex.
Copies all files in use by the checkpointed process to the checkpoint directory.
-f
This option is supported only on some operating systems, such as Convex.
Forces a job to be checkpointed even if non-checkpointable conditions exist (these conditions are specific to the type of checkpoint facility being used). This option may create checkpoint files that do not restart properly.
-k | -s
Options -k and -s are mutually exclusive.
-k
Kills a job after it has been successfully checkpointed. If this option is specified and the checkpoint fails for any reason, the job continues to execute.
-s
This option is supported only on some operating systems, such as SGI systems running IRIX and Convex.
Stops a job after it has been successfully checkpointed. If this option is specified and the checkpoint fails for any reason, the job continues to execute.
-d checkpoint_dir
Specifies the checkpoint directory as a relative or absolute path name.
When a job is checkpointed, the checkpoint information is stored in checkpoint_dir/job_ID/file_name. Multiple jobs can checkpoint into the same directory, and the system can create multiple files.
The checkpoint directory is used to restart the job (see crestart(1)).
-x
This option exists for compatibility only and should be ignored.
process_group_ID
ID of the process or process group to be checkpointed.
-h
Prints command usage to stderr and exits.
-V
Prints the AIP release version to stderr and exits.
SEE ALSO
csub(1), cchkpnt(1), crestart(1), erestart(8)
DIAGNOSTICS
echkpnt
Exits with 0 if the checkpoint operation succeeds. Otherwise, exits with the exit value of echkpnt.default or echkpnt.method_name.
echkpnt.default
echkpnt.default should be a link to a specific echkpnt.method_name; the linked method then serves as the default checkpoint method.
echkpnt.method_name
At least one custom checkpoint method should exist. echkpnt.method_name exits with 0 if it succeeds in checkpointing the job; a non-zero value indicates that the checkpoint failed. By default, all messages it writes to stdout and stderr are directed to /dev/null and ignored by AIP.
To save the standard output and standard error of echkpnt.method_name, set CB_ECHKPNT_KEEP_OUTPUT=y before submitting the job. The stdout and stderr output generated by echkpnt.method_name is then redirected to:
- checkpoint_dir/$CB_JOBID/echkpnt.out
- checkpoint_dir/$CB_JOBID/echkpnt.err
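The redirection rule above can be summarized as a small Python function. This illustrates the documented behavior; it is not AIP's implementation. The function name method_output_paths is hypothetical. The sketch assumes that only the exact value y enables CB_ECHKPNT_KEEP_OUTPUT. It also assumes that CB_JOBID is set whenever output is kept.

```python
import os
from typing import Mapping, Tuple


def method_output_paths(checkpoint_dir: str, env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (stdout_path, stderr_path) for echkpnt.method_name output.

    Hypothetical helper. Assumption: only the exact value "y" enables
    CB_ECHKPNT_KEEP_OUTPUT, and CB_JOBID is set in that case.
    """
    if env.get("CB_ECHKPNT_KEEP_OUTPUT") == "y":
        job_dir = os.path.join(checkpoint_dir, env["CB_JOBID"])
        return (os.path.join(job_dir, "echkpnt.out"),
                os.path.join(job_dir, "echkpnt.err"))
    # Default: both streams are discarded.
    return (os.devnull, os.devnull)
```

For example, a job with CB_JOBID=42 and checkpoint directory /ckpt would keep its method output in /ckpt/42/echkpnt.out and /ckpt/42/echkpnt.err.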
LIMITATIONS
If you use echkpnt.method_name, the job is checkpointed with the method specified at job submission or with the parameter CB_ECHKPNT_METHOD. Once the job has been submitted, you cannot change the method with the bmod command.
If you submit a job without specifying a custom method and CB_ECHKPNT_METHOD is not defined, echkpnt.default is used. You cannot change this with bmod either.
It is the cluster administrator's responsibility to ensure that method name and method
directory combinations are unique in the cluster.
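The method selection described on this page can be sketched in Python. This is an illustration under stated assumptions, not how AIP resolves methods internally. The helper names parse_method_from_k and resolve_echkpnt are hypothetical. The sketch finds method= by splitting the csub -k string on whitespace, which is an assumption about that string's format. This page also does not specify which setting wins when a submission-time method and CB_ECHKPNT_METHOD are both set. The sketch assumes the csub -k value takes precedence.

```python
import os
from typing import Mapping, Optional


def parse_method_from_k(k_value: str) -> Optional[str]:
    """Extract NAME from a csub -k string such as "mydir 120 method=myapp".

    Assumption: the -k string is whitespace-separated.
    """
    for token in k_value.split():
        if token.startswith("method="):
            return token[len("method="):] or None
    return None


def resolve_echkpnt(serverdir: str, k_value: str, env: Mapping[str, str]) -> str:
    """Return the path of the echkpnt method a job would use (hypothetical)."""
    # Assumption: csub -k method= overrides CB_ECHKPNT_METHOD; with neither,
    # echkpnt.default is used, as stated under LIMITATIONS.
    method = parse_method_from_k(k_value) or env.get("CB_ECHKPNT_METHOD") or "default"
    return os.path.join(serverdir, f"echkpnt.{method}")
```

With the submission example from DESCRIPTION, resolve_echkpnt(CB_SERVERDIR, "mydir 120 method=myapp", env) yields CB_SERVERDIR/echkpnt.myapp.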
This document was created by man2html, using the manual pages.
Time: 18:57:47 GMT, April 23, 2025