Manpage of RESS

RESS

Section: Maintenance Commands (8)
Updated: SkyForm AIP Version 10.25.0 - April 2025
 

NAME

ress - Resource Sensor (RESS) for the SkyForm AIP system  

SYNOPSIS

CB_SERVERDIR/ress.[master|host|hostname]  

DESCRIPTION

RESS is a customized resource plugin for the AIP Load Server (cbls). It is a custom-written program that CBLS calls to obtain site-specific resource data, which it then provides to the AIP system for jobs to consume.

RESS must reside in the CB_SERVERDIR directory and must be executable.

RESS must have one of the following file names:

- ress.master: executed by CBLS on the master host.

- ress.host: executed by CBLS on every AIP host.

- ress.hostname: executed by CBLS only on the host whose host name is hostname.

CBLS periodically looks for executables named ress.master, ress.host, and ress.hostname in the CB_SERVERDIR directory and automatically executes any of them that exist.
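To check which of these programs CBLS could find on a given host, the three file names can be tested in the server directory. The following is a minimal sketch: ress_candidates is a hypothetical helper, not part of AIP, and it assumes the host name you pass matches the name used in the ress.hostname file. It checks only file names and execute permission, so it lists ress.master whenever that file is present, even though CBLS runs it only on the master host.

```shell
#!/bin/bash
# Hypothetical helper (not part of AIP): list the RESS programs that CBLS
# could pick up in a directory, following the naming rules above.
# Usage: ress_candidates DIR HOSTNAME
ress_candidates() {
  local dir="$1" host="$2" name
  for name in ress.master ress.host "ress.$host"; do
    # A candidate must be a regular file with the execute bit set.
    if [[ -f "$dir/$name" && -x "$dir/$name" ]]; then
      echo "$name"
    fi
  done
}
```

For example, ress_candidates "$CB_SERVERDIR" "$(hostname)" prints one file name per line for each candidate found.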

RESS CODE LOGIC

The RESS program should be a loop that periodically updates resource data. In each iteration, it writes the resource data in YAML format to its stdout and then sleeps for a number of seconds before updating the data again.

For a resource that changes infrequently, the sleep time between iterations can be long, for example several hours.

The shortest data update interval is 5 seconds.
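The loop described above can be factored into a small runner. This is a sketch under stated assumptions: ress_loop is a hypothetical helper, not part of AIP; the interval is assumed to be a whole number of seconds; and the 5-second floor is applied on the RESS side only to mirror the minimum update interval stated above. How CBLS itself treats shorter intervals is not described here.

```shell
#!/bin/bash
# Hypothetical helper (not part of AIP): run a RESS reporting command in a
# loop, sleeping between iterations. A requested interval below 5 seconds
# is raised to 5, the shortest data update interval.
# Usage: ress_loop INTERVAL COUNT COMMAND [ARGS...]
#   INTERVAL is in whole seconds; COUNT of 0 means loop forever.
ress_loop() {
  local interval="$1" count="$2" i=0
  shift 2
  # Enforce the 5-second minimum update interval.
  if (( interval < 5 )); then
    interval=5
  fi
  while (( count == 0 || i < count )); do
    "$@"    # the command must print one complete YAML document
    i=$(( i + 1 ))
    # Skip the final sleep so a bounded run returns promptly.
    if (( count == 0 || i < count )); then
      sleep "$interval"
    fi
  done
}
```

For example, ress_loop 3600 0 report_licenses would call a site-defined (hypothetical) report_licenses function once an hour, forever, while ress_loop 10 1 report_licenses runs it once, which is handy for testing.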

The resource data it provides must be in YAML format. The detailed format is explained in the next section.

Example RESS program, ress.master:

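#!/bin/bash
# Report the seconds field of the current time as a "number" resource.
while true; do
  echo "- resource: clksnd"
  echo "  description: The second value of the current clock"
  echo "  type: number"
  echo "  direction: increase"
  # 10# forces base 10, so the value is printed without a leading zero (8, not 08).
  echo "  value: $((10#$(date +%S)))"
  # locale must be the last attribute of the resource.
  echo "  locale: master node01"
  # End of this YAML document; CBLS waits for the next one.
  echo "---"
  sleep 10
done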

 

OUTPUT FORMAT

On each iteration, RESS must write a complete YAML document to stdout. The document must start with a sequence and end with a line containing only "---", which tells CBLS to stop reading and wait for the next iteration.

Each sequence item describes one resource, and a document can describe multiple resources.
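Before installing a RESS program, it can help to check one captured document against these framing rules. The sketch below is a hypothetical checker, not part of AIP. It tests only the two rules stated here (the first non-empty line opens a sequence item with "- ", and the last line is "---") and does not parse the YAML itself.

```shell
#!/bin/bash
# Hypothetical checker (not part of AIP): verify that one RESS document,
# read from stdin, starts with a sequence item and ends with a "---" line.
ress_check_document() {
  local first="" last="" line
  # "|| [[ -n $line ]]" also keeps a final line that lacks a newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    [[ -z "$first" ]] && first="$line"   # remember the first non-empty line
    last="$line"
  done
  # The first non-empty line must open a sequence item.
  [[ "$first" == "- "* ]] || { echo "document must start with a sequence item" >&2; return 1; }
  # The last line must be the "---" terminator on its own.
  [[ "$last" == "---" ]] || { echo "document must end with a '---' line" >&2; return 1; }
}
```

A document saved to a file can be checked with ress_check_document < doc.yaml; a non-zero status and a message on stderr indicate which rule failed.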

Attributes of each resource are:

resource:
Defines the name of the resource. The name should be no longer than 32 characters, must start with a letter, and cannot contain any of the following characters: .!-=+*/[]@:&|{}'`\"
description:
The description of the resource. It can contain any character. The total length must be 255 characters or shorter.
type:
The resource type. Valid values are: number, meaning the resource value is numeric (integer or floating point); text, meaning the resource value is free text shorter than 32 characters; and tag, meaning the resource has no value and the resource name serves as a tag for the host. Jobs can then select hosts by the tag, that is, by the resource name.
direction:
The direction in which the resource value moves from best to worst. It takes the value "increase" or "decrease". The value "increase" means that the smaller the number, the more resource the system has; CPU utilization, for example, is an "increase" resource. The value "decrease" means that the larger the number, the more resource the system has; free memory, for example, is a "decrease" resource. This attribute applies only to resources of type "number".
release:
It takes the value "yes" or "no". When release is set to "yes", the resource is released when the job that requests it is suspended. The default is "no", that is, the resource is not released when the job is suspended.
assign:
For "number" resources only. It takes the value "yes" or "no". When assign is set to "yes", the resource units are enumerated, and the scheduler assigns specific unit(s) to the job that requests the resource. For example, if the gpu resource has assign set to "yes" and a job requests 2 gpu on a host, the scheduler assigns specific GPU numbers, for example GPU 2 and GPU 3, to the job. The job then knows which GPUs to use and avoids conflicts with other jobs that request GPUs on the same host. The default is "no", that is, resource units are not assigned to each task (job slot).
slotresource:
For "number" resources only. If set to "yes", the total amount of resource reserved for the job is the value requested in the job's resource requirement multiplied by the number of slots scheduled for the job. The default is "no", that is, the total amount reserved for the job is exactly what is specified in the resource requirement.
value:
The current value of the resource: a number for a "number" resource, or a text string for a "text" resource. The value has no meaning for a "tag" resource.
locale:
Indicates the location of the reported resource, as a list of hosts that share it. If this attribute is omitted, the resource value applies to the host that runs RESS. For a host-based resource, set it to the name of the host where the value is reported, or omit it. For a cluster-wide resource, use the reserved word "all". For a resource shared by a number of hosts, list the names of those hosts separated by spaces.
WARNING: locale must be the last field of each resource in the YAML output.
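The naming rules for resource and the requirement that locale come last can both be enforced by a small emitter. This is a sketch using a hypothetical helper, ress_item, which is not part of AIP. Attributes are passed as key=value arguments and written in the order given, except that locale is held back and written last. Only the name rules are checked; attribute values are written as given.

```shell
#!/bin/bash
# Hypothetical helper (not part of AIP): print one resource as a YAML
# sequence item, checking the name rules and writing locale last.
# Usage: ress_item NAME key=value [key=value ...]
ress_item() {
  local name="$1" kv key val locale="" i
  shift
  # Characters that a resource name cannot contain.
  local forbidden='.!-=+*/[]@:&|{}'"'"'`\"'
  # Name must be 1-32 characters long and start with a letter.
  if (( ${#name} < 1 || ${#name} > 32 )) || [[ ! "$name" =~ ^[A-Za-z] ]]; then
    echo "invalid resource name: $name" >&2
    return 1
  fi
  for (( i = 0; i < ${#name}; i++ )); do
    # Quoted expansion: each character is matched literally.
    if [[ "$forbidden" == *"${name:i:1}"* ]]; then
      echo "invalid character in resource name: $name" >&2
      return 1
    fi
  done
  echo "- resource: $name"
  for kv in "$@"; do
    key="${kv%%=*}"   # text before the first "="
    val="${kv#*=}"    # text after the first "="
    if [[ "$key" == "locale" ]]; then
      locale="$val"   # hold locale back so it can be written last
    else
      echo "  $key: $val"
    fi
  done
  # locale must be the last field of the resource.
  [[ -n "$locale" ]] && echo "  locale: $locale"
  return 0
}
```

For example, ress_item clksnd type=number "locale=master node01" value=7 followed by echo "---" writes a complete document in which locale appears after value, even though it was passed first.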
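The slotresource rule is simple arithmetic: with slotresource set to "yes", a job that requests 2 units of a resource and is scheduled on 4 slots reserves 2 × 4 = 8 units; with the default "no", it reserves 2. The hypothetical helper below (not part of AIP) only restates that rule; awk is used so that floating-point requests also work.

```shell
#!/bin/bash
# Hypothetical helper (not part of AIP): total amount of a "number"
# resource reserved for a job under the slotresource rule.
# Usage: ress_reserved REQUESTED SLOTS [yes|no]   (default: no)
ress_reserved() {
  local requested="$1" slots="$2" slotresource="${3:-no}"
  if [[ "$slotresource" == "yes" ]]; then
    # slotresource: yes -> requested amount times the number of slots.
    awk -v r="$requested" -v s="$slots" 'BEGIN { print r * s }'
  else
    # slotresource: no -> exactly the amount in the resource requirement.
    echo "$requested"
  fi
}
```

For example, ress_reserved 2 4 yes prints 8, and ress_reserved 2 4 prints 2.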

 

EXAMPLE

The following example reports two resources: a "number" resource and a "text" resource. Because neither resource sets "locale", both apply to the host that runs ress.hostname.

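#!/bin/bash
# Report a "number" resource and a "text" resource. Neither sets locale,
# so both apply to the host that runs this program.
while true; do
  echo "- resource: gpu1"
  echo "  description: GPU utilization"
  echo "  type: number"
  echo "  direction: increase"
  echo "  value: 0.8"
  echo "  release: yes"
  echo "- resource: network"
  echo "  description: my network"
  echo "  type: text"
  echo "  value: IB"
  # End of this YAML document; CBLS waits for the next one.
  echo "---"
  sleep 10
done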
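To inspect what a RESS program reports without going through CBLS, you can run it by hand and keep only its first document. This is a hypothetical test harness, not part of AIP; it only imitates the reading behavior described under OUTPUT FORMAT and is not how CBLS is implemented. awk prints lines up to the first "---" and then exits; the RESS program typically terminates on SIGPIPE at its next write, so the call returns after at most one sleep interval.

```shell
#!/bin/bash
# Hypothetical test harness (not part of AIP): run a RESS program and
# print only its first YAML document, i.e. every line up to and
# including the first "---" line.
# Usage: ress_first_document COMMAND [ARGS...]
ress_first_document() {
  # Print each line; stop reading after the "---" terminator.
  "$@" | awk '{ print } /^---$/ { exit }'
}
```

For example, ress_first_document "$CB_SERVERDIR/ress.master" prints one iteration of the ress.master example above.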


 


This document was created by man2html, using the manual pages.
Time: 18:57:47 GMT, April 23, 2025