GlideinWMS The Glidein-based Workflow Management System

WMS Factory

Custom Scripts

Description

This document describes how to write custom scripts to run in a glidein. Glidein Factory administrators may want to write them to implement features specific to their clients. Two examples are worker node validation, and discovery and setup of VO-specific software.
Note: the "scripts" can be any executable, including compiled binaries.

Script inclusion

A script is a file that was listed in the Glidein Factory or Frontend configuration file as being executable (see the attributes description):

executable="True"

By default the files listed are non-executable, so an administrator needs to explicitly list the executable ones.

All scripts are executed once before the HTCondor glidein starts. Periodic scripts are also invoked later, repeatedly, according to the period specified (in seconds):

period="3600"

A periodic script can tell whether it is running at setup or later by looking at the GLIDEIN_PERIODIC_SCRIPT environment variable, which is set only on the later, periodic invocations.
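
As a minimal sketch of this check, a periodic script can branch on whether the variable is set. The function name run_phase is hypothetical, and the sketch assumes only that GLIDEIN_PERIODIC_SCRIPT is unset at setup and set on later invocations; its actual value is not relied upon.

```shell
# Report which phase the script is running in.
# Assumption: GLIDEIN_PERIODIC_SCRIPT is unset during the initial setup run
# and set (to any value) on the later periodic invocations.
run_phase() {
    if [ -n "${GLIDEIN_PERIODIC_SCRIPT+x}" ]; then
        echo "periodic"
    else
        echo "setup"
    fi
}

if [ "$(run_phase)" = "setup" ]; then
    echo "first invocation: doing one-time initialization"
else
    echo "periodic invocation: refreshing values only"
fi
```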

Script API

A script is provided with exactly 2 arguments:

  1. The name of the glidein configuration file
  2. An entry id; this can be either main or the name of the entry point

All other input comes from the glidein configuration file that is used as a dashboard between different scripts.

If the script provides any output to be used by other scripts, it should write it in the glidein configuration file. If the values need to be published by the condor_startd or be visible to the user jobs, the condor vars file should also be modified.

NOTE that periodic scripts provide an additional output mechanism: since they run via HTCondor startd_cron and the output is not masked, anything sent to standard output is added by HTCondor to the machine ClassAd after adding a prefix (see the periodic scripts section below). If you want to be more compatible, though, we recommend using the glidein configuration file and the condor vars file to specify if the variable needs to be published to the startd.

The script must return with exit code 0 if successful; a non-zero return value on the first invocation will stop the execution of the glidein with a validation error. A non-zero return value on following invocations of a periodic script will notify the startd by setting GLIDEIN_PS_OK to False (see below).
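
The calling convention above can be sketched as a small skeleton. The function name custom_script_main and the messages are hypothetical; the sketch only shows the two arguments being checked and a matching return code, not any real validation.

```shell
# Hypothetical entry point of a custom script; a real script would end with:
#   custom_script_main "$@"; exit $?
custom_script_main() {
    # The glidein passes exactly two arguments.
    if [ "$#" -ne 2 ]; then
        echo "expected 2 arguments, got $#" >&2
        return 1
    fi
    glidein_config=$1   # name of the glidein configuration file
    entry_id=$2         # "main" or the name of the entry point

    if [ ! -r "$glidein_config" ]; then
        echo "cannot read glidein configuration file: $glidein_config" >&2
        return 1
    fi

    echo "validating for entry id: $entry_id"
    return 0            # 0 = success; non-zero stops the glidein on the first run
}

# Demonstration with a throwaway file standing in for the real configuration.
demo_config=$(mktemp)
custom_script_main "$demo_config" main
rm -f "$demo_config"
```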

The glidein configuration file

The glidein configuration file acts as a dashboard between different scripts.

It is a simple ASCII file, with one value per line; the first column represents the attribute name, while the rest is the attribute value.
If the value does not contain any spaces, the easiest way to extract a value in bash is:

attr_val=`grep "^$attr_name " $glidein_config | awk '{print $2}'`
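
When a value may contain spaces, printing only the second column truncates it. Below is a minimal sketch of an alternative that returns everything after the attribute name. The function name get_config_value and the sample attributes are hypothetical, and the sketch assumes the name and the value are separated by a single space.

```shell
# Print the full value of an attribute, including any embedded spaces.
# $1 = attribute name, $2 = glidein configuration file
get_config_value() {
    awk -v name="$1" '
        $1 == name { sub(/^[^ ]+ /, ""); print; exit }
    ' "$2"
}

# Demonstration on a throwaway file (not a real glidein configuration).
demo_config=$(mktemp)
printf '%s\n' 'TMP_DIR /tmp/glide_demo' \
              'MY_DESCRIPTION worker node in rack 12' > "$demo_config"
get_config_value MY_DESCRIPTION "$demo_config"   # prints: worker node in rack 12
get_config_value TMP_DIR "$demo_config"          # prints: /tmp/glide_demo
```

Matching on the whole first column also avoids picking up attributes that merely share a prefix with the requested name.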

Several attributes are added by the default glidein scripts, the most interesting being:

  • ADD_CONFIG_LINE_SOURCE – Script that can be used to add new attributes to the glidein configuration file (see below).
  • GLIDEIN_Name – Name of the glidein branch
  • GLIDEIN_Entry_Name – name of the glidein entry point
  • TMP_DIR – The path to the temporary dir
  • PROXY_URL – The URL of the Web proxy

All attributes of the glidein Factory (both the common and the entry specific) are also loaded into this file.

To write into the glidein configuration file, the best approach in bash is to use the gconfig_add function; to read from it, use the gconfig_get function. Both are defined in the same support script: source it, then call the functions. Here is an example:

# get the glidein configuration file name
# must use glidein_config, it is used as global variable
glidein_config=$1
# import glidein_config functions
add_config_line_source=`grep '^ADD_CONFIG_LINE_SOURCE ' $glidein_config | awk '{print $2}'`
source $add_config_line_source
# add an attribute
gconfig_add myattribute myvalue
# read an attribute (set by you or some other script)
myvar=$(gconfig_get myattribute)

HTCondor vars file

GlideinWMS uses a so-called condor vars file to decide which attributes are going to be inserted into the condor configuration file, which are going to be published by the glidein condor_startd to the collector, and which attributes are going to be put into the job environment.

The condor vars file can be found from the glidein configuration file as

CONDOR_VARS_FILE

It is an ASCII file, with one entry per row. Each non-comment line must have 7 columns. Each column has a specific meaning:

  1. Attribute name (will be extracted from the glidein configuration file)
  2. Attribute type
    • I – integer
    • S – quoted string
    • C – unquoted string (i.e. HTCondor keyword or expression)
  3. Default value, use - if no default
  4. HTCondor name, i.e. under which name should this attribute be known in the condor configuration
  5. Is a value required for this attribute?
    Must be Y or N. If Y and the attribute is not defined, the glidein will fail.
  6. Will condor_startd publish this attribute to the collector?
    Must be Y or N.
  7. Will the attribute be exported to the user job environment?
    • - - Do not export
    • + - Export using the original attribute name
    • @ - Export using the HTCondor name
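
The column rules above can be checked mechanically before a line is appended. Below is a minimal sketch, assuming whitespace-separated columns with no spaces inside any value; the function name check_condor_vars_line is hypothetical and is not part of GlideinWMS.

```shell
# Return 0 if the given line follows the 7-column layout, non-zero otherwise.
# $1 = one non-comment line of a condor vars file
check_condor_vars_line() {
    printf '%s\n' "$1" | awk '
        NF != 7          { exit 1 }   # exactly 7 columns
        $2 !~ /^[ISC]$/  { exit 1 }   # type: I, S or C
        $5 !~ /^[YN]$/   { exit 1 }   # value required: Y or N
        $6 !~ /^[YN]$/   { exit 1 }   # published by condor_startd: Y or N
        $7 !~ /^[-+@]$/  { exit 1 }   # job environment export: -, + or @
    '
}

check_condor_vars_line 'GLIDEIN_SW_LIST S - + Y Y +' && echo "line is well formed"
```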

GlideinWMS defines several attributes in the default condor vars files:

glideinWMS/creation/web_base/condor_vars.lst
glideinWMS/creation/web_base/condor_vars.lst.entry

Below is a short extract. For all the options, see the dedicated configuration variables page.

# VarName               Type    Default         CondorName                      Req.    Export  UserJobEnvName
#                       S=Quote - = No Default  + = VarName                             HTCondor  - = Do not export
#                                                                                               + = Use VarName
#                                                                                               @ = Use CondorName
#################################################################################################################
X509_USER_PROXY         C       -               GSI_DAEMON_PROXY                Y       N       -
USE_MATCH_AUTH          C       -     SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION  N       N       -
GLIDEIN_Factory         S       -               +                               Y       Y       @
GLIDEIN_Name            S       -               +                               Y       Y       @
GLIDEIN_Collector       C       -               HEAD_NODE                       Y       N       -
GLIDEIN_Expose_Grid_Env C       False     JOB_INHERITS_STARTER_ENVIRONMENT      N       Y       +
TMP_DIR                 S       -               GLIDEIN_Tmp_Dir                 Y       Y       @
CONDORG_CLUSTER         I       -               GLIDEIN_ClusterId               Y       Y       @
CONDORG_SUBCLUSTER      I       -               GLIDEIN_ProcId                  Y       Y       @
CONDORG_SCHEDD          S       -               GLIDEIN_Schedd                  Y       Y       @
SEC_DEFAULT_ENCRYPTION  C       OPTIONAL        +                               N       N       -
SEC_DEFAULT_INTEGRITY   C       REQUIRED        +                               N       N       -
MAX_MASTER_LOG          I       1000000         +                               N       N       -
MAX_STARTD_LOG          I       10000000        +                               N       N       -

If you need to add anything to a condor vars file, the best approach in bash is to use the add_condor_vars_line function. Just source the provided support script and use it. Here is an example:

# get the condor vars file name
# must use condor_vars_file, it is used as global variable
condor_vars_file=`grep -i "^CONDOR_VARS_FILE " $glidein_config | awk '{print $2}'`
# import add_condor_vars_line function
add_config_line_source=`grep '^ADD_CONFIG_LINE_SOURCE ' $glidein_config | awk '{print $2}'`
source $add_config_line_source
# add an attribute
add_condor_vars_line myattribute type def condor_name req publish jobid

Reporting script exit status

GlideinWMS Factory can receive and interpret a detailed exit status report, if provided by the validation script.

The script should write the exit status report in the following file:

otrb_output.xml

The Factory provides a helper script to properly generate such a file. A detailed description of the format can be found in the dedicated description page.

To use the helper script, first discover its location with:

# find error reporting helper script
error_gen=`grep '^ERROR_GEN_PATH ' $glidein_config | awk '{print $2}'`

If the validation script succeeded, report the success by using:

# Everything worked out fine
"$error_gen" -ok <script name> [<key> <value>]*

You can specify any number of (key,value) pairs, representing any metrics you verified during your validation run, if any.

If the validation script instead failed, report the failure by using:

# Uh oh, we hit an error
"$error_gen" -error <script name> <error type> "<detailed description>" [<key> <value>]*

The script should use one of the standard error types.
It should also provide a human readable detailed description. It is perfectly fine if it extends over multiple lines; just make sure you properly pass it to the script.
You can also specify any number of (key,value) pairs, representing any metrics that failed during the test. Providing at least one metric is recommended, but not strictly necessary.

Note: The reported status MUST match the script exit code. E.g. if you claim the script succeeded, you must also exit with a 0 exit code.
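
One way to keep the report and the exit code in step is to wrap both in small helpers. This is only a sketch: report_ok, report_error, check_dir and the script name dircheck.sh are hypothetical, and the helper found via ERROR_GEN_PATH is replaced here by a stand-in that just records its arguments, so that the example runs on its own.

```shell
# Stand-in for the real helper (normally located via ERROR_GEN_PATH).
demo_dir=$(mktemp -d)
error_gen="$demo_dir/error_gen_stub.sh"
cat > "$error_gen" <<'EOF'
#!/bin/sh
printf '%s\n' "$*" >> "$(dirname "$0")/report.log"
EOF
chmod +x "$error_gen"

# Hypothetical helpers: write the report and return the matching code.
report_ok() {
    "$error_gen" -ok "$@"
    return 0
}
report_error() {
    "$error_gen" -error "$@"
    return 1
}

# Example check; a real script would finish with: check_dir <path>; exit $?
check_dir() {
    if [ -d "$1" ]; then
        report_ok "dircheck.sh" "dir" "$1"
    else
        report_error "dircheck.sh" "WN_Resource" "Directory $1 not found" "dir" "$1"
    fi
}

check_dir "$demo_dir"
status=$?
echo "would exit with status $status"
```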

Logging

The standard output and standard error of all custom scripts (except the periodic ones) are captured in the Glidein stdout and stderr and transferred back to the Factory by HTCondor at the end of the Glidein. This process may be insufficient, however: if the Glidein is killed the transfer may not happen; with multi-Glideins all their stdout/stderr are intermixed in the same files; and a user may want this output back earlier or in a different place.

For all these needs there is also a logging utility. It is defined in logging_utils.source and can be used in any custom script. It requires a web server to receive the logging messages, and GLIDEIN_LOG_RECIPIENTS_FACTORY must be set as an attr in the Factory configuration. The web servers at the URLs in GLIDEIN_LOG_RECIPIENTS_FACTORY must be able to receive JWT-authenticated PUT requests, HS256-encoded with the secret set in the Factory secret file (/var/lib/gwms-factory/server-credentials/jwt_secret.key). This secret file must be HMAC 256 compatible, e.g. a 32-byte string. The Factory will create the file at startup if it is missing or empty.

Scripts can use glog_setup, glog_write, and glog_send to set up, write, and checkpoint/upload log files. There is an example of how to use logging in logging_test.sh.

Periodic scripts

Scripts by default have period=0 and are invoked only once. The Factory/Frontend administrator can specify an integer number of seconds to make a script periodic. Periodic scripts are first invoked at the beginning, according to their order, like all other scripts; afterwards they are invoked using the HTCondor daemon ClassAd hook mechanism (aka startd_cron) and a wrapper script that allows them to maintain the same API.

The periodic scripts wrapper defines some additional variables in glidein_config and in the startd ClassAd:

  • GLIDEIN_PS_FAILED_LIST - List of scripts that failed at least once
  • GLIDEIN_PS_FAILING_LIST - List of scripts that failed the last execution
  • GLIDEIN_PS_OK - True if no script failed its last execution (GLIDEIN_PS_FAILING_LIST is empty). At the beginning it is published to the startd, then directly
  • GLIDEIN_PS_FAILED_LAST - Name of the last script that failed execution
  • GLIDEIN_PS_FAILED_LAST_REASON - String describing the last failure
  • GLIDEIN_PS_FAILED_LAST_END - End time (seconds from Epoch) of the last failure
  • GLIDEIN_PS_LAST - File path of the last script
  • GLIDEIN_PS_LAST_END - End time of the last script execution (0 for script_wrapper.sh invoked at startup)

All these attributes can be used in the startd (e.g. start or shutdown expressions: start_expr="GLIDEIN_PS_OK =!= FALSE") or in other scripts.
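
As a sketch of how another script might consume these attributes, the function below reads GLIDEIN_PS_OK from the glidein configuration file and treats an undefined value as healthy, mirroring the =!= FALSE expression above. The function name periodic_scripts_ok is hypothetical, and the sketch assumes that the last matching line in the file holds the current value.

```shell
# Return 0 unless GLIDEIN_PS_OK is explicitly False.
# $1 = glidein configuration file
periodic_scripts_ok() {
    # Assumption: if the attribute appears more than once, the last line wins.
    ps_ok=$(awk '$1 == "GLIDEIN_PS_OK" { v = $2 } END { print v }' "$1")
    case "$ps_ok" in
        [Ff][Aa][Ll][Ss][Ee]) return 1 ;;
        *) return 0 ;;   # True, or not defined yet
    esac
}

# Demonstration on a throwaway file (not a real glidein configuration).
demo_config=$(mktemp)
echo "GLIDEIN_PS_OK True" > "$demo_config"
if periodic_scripts_ok "$demo_config"; then
    echo "no periodic script is currently failing"
fi
rm -f "$demo_config"
```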

Periodic scripts run via HTCondor startd_cron and their output is not masked, so anything sent to standard output is added by HTCondor to the machine ClassAd. To protect from clashes a prefix is added (via STARTD_CRON_<JobName>_PREFIX in HTCondor). By default the prefix is GLIDEIN_PS_, but you can change it by setting "prefix" in the file section of the Frontend or Factory configuration. The special value "NOPREFIX" unsets the automatic prefix, so the variables are set exactly as you output them. Be aware that you may then overwrite system variables, with unpleasant effects. If you want to be more compatible, we recommend using the glidein configuration file and the condor vars file to specify if the variable needs to be published to the startd. See below for more.
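
A minimal sketch of the standard output mechanism follows. HTCondor cron jobs report attributes as "Name = Value" lines; the helper emit_classad_int and the attribute FREE_SCRATCH_KB are hypothetical names chosen for illustration. With the default prefix described above, such an attribute would be expected to appear as GLIDEIN_PS_FREE_SCRATCH_KB.

```shell
# Print one integer attribute in "Name = Value" form on standard output.
# $1 = attribute name, $2 = integer value
emit_classad_int() {
    case "$2" in
        ''|*[!0-9]*) echo "not an integer: $2" >&2; return 1 ;;
    esac
    echo "$1 = $2"
}

# Free space (in KB) of the current directory, from POSIX-format df output.
avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
emit_classad_int FREE_SCRATCH_KB "$avail_kb" || echo "could not determine free space" >&2
```

Validating the value before printing keeps malformed lines out of the machine ClassAd.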

Loading order

During the first invocation, scripts are loaded and executed one at a time, in order. Later, periodic scripts are re-executed one at a time according to their period. System scripts come with the framework; user scripts are the ones listed in the Factory and Frontend configuration files. First all files are downloaded, then the scripts are executed. Both the download and the first invocation follow the same order, in six distinct stages:

  1. Global attributes are loaded and global system scripts executed.

  2. The user provided global files are loaded and user scripts are executed. First the Factory ones, then the Frontend ones (i.e. all the ones that have after_entry="False". False is the default for the Factory scripts, True for the Frontend ones). The (pre-entry) Frontend scripts are executed in the following order: pre-group scripts (in global config with after_group="False", which is the default), group scripts and after-group (i.e. after_group="True").

  3. The entry specific attributes are loaded and entry specific system scripts executed.

  4. The user provided entry specific files are loaded and entry specific user scripts are executed.

  5. The after_entry user provided global files are loaded and after_entry user scripts are executed. First the Frontend ones, then the Factory ones (i.e. all the ones that have set after_entry="True"). The (after-entry) Frontend scripts are executed in the following order: pre-group scripts (in global config with after_group="False", which is the default), group scripts and after-group (i.e. after_group="True").

  6. Final global system scripts are executed and the HTCondor daemons are launched.

The Glidein Factory configuration allows an administrator to specify the files/scripts mentioned in points 2, 4 and 5.
The after_entry and after_group attributes determine which group a file or script belongs to.
Within a group the files/scripts are loaded/executed in the order in which they are specified in the configuration file.

All scripts, periodic and not, are executed a first time according to the order above. Note that the wrapper scripts are not mentioned, because those are executed only right before the job.

Examples

The documentation above should provide enough information to write scripts that customize the glideins to your needs. Below are a few examples you can use as templates.

Test that a certain library exists

#!/bin/sh

glidein_config="$1"

# find error reporting helper script
error_gen=$(grep -m1 '^ERROR_GEN_PATH ' "$glidein_config" | awk '{print $2}')

# fail if the library file does not exist
if [ ! -e "/usr/lib/libcrypto.so.0.9.8" ]; then
  "$error_gen" -error "libtest.sh" "WN_Resource" "Crypto library not found." "file" "/usr/lib/libcrypto.so.0.9.8"
  exit 1
fi
echo "Crypto library found"
"$error_gen" -ok  "libtest.sh" "file" "/usr/lib/libcrypto.so.0.9.8"
exit 0

Find, test and advertise a software distribution

#!/bin/sh

glidein_config="$1"

###############
# Get the data

# find error reporting helper script
error_gen=$(grep -m1 '^ERROR_GEN_PATH ' "$glidein_config" | awk '{print $2}')

if [ -f "$VO_SW_DIR/setup.sh" ]; then
  # use "." rather than "source": this script runs under /bin/sh
  . "$VO_SW_DIR/setup.sh"
else
  "$error_gen" -error "swfind.sh" "WN_Resource" "Could not find $VO_SW_DIR/setup.sh" \
               "file" "$VO_SW_DIR/setup.sh" "base_dir_attr" "VO_SW_DIR"
  exit 1
fi

tmpname="$PWD/installed_software_tmp_$$.tmp"
# software_list is assumed to be made available by the setup script sourced above
software_list > "$tmpname"


###########################################################
# Import add_config_line and add_condor_vars_line functions

add_config_line_source=$(grep -m1 '^ADD_CONFIG_LINE_SOURCE ' "$glidein_config" | awk '{print $2}')
# shellcheck source=./add_config_line.source
. "$add_config_line_source"
condor_vars_file=$(gconfig_get CONDOR_VARS_FILE "$glidein_config")


##################
# Format the data

# join the lines into a comma-separated list, then remove the temporary file
sw_list=$(awk '{if (length(a)!=0) {a=a "," $0} else {a=$0}}END{print a}' "$tmpname")
rm -f "$tmpname"

if [ -z "$sw_list" ]; then
  ERRSTR="No SW found.
But the setup script was present at $VO_SW_DIR/setup.sh."
  "$error_gen" -error "swfind.sh" "WN_Resource" "$ERRSTR" \
               "source_file" "$VO_SW_DIR/setup.sh"

  exit 1
fi

#################
# Export the data

gconfig_add GLIDEIN_SW_LIST "$sw_list"
add_condor_vars_line GLIDEIN_SW_LIST "S" "-" "+" "Y" "Y" "+"

"$error_gen" -ok  "swfind.sh" "sw_list" "$sw_list"
exit 0

Change an existing value based on conditions found

#!/bin/bash

glidein_config=$1
entry_dir=$2

# import add_config_line function, will use glidein_config
add_config_line_source=$(grep -m1 '^ADD_CONFIG_LINE_SOURCE ' "$glidein_config" | awk '{print $2}')
# shellcheck source=./add_config_line.source
. "$add_config_line_source"

# find the error reporting helper script
error_gen=$(gconfig_get ERROR_GEN_PATH "$glidein_config")

vo_scalability=$(gconfig_get VO_SCALABILITY "$glidein_config")

if [ -z "$vo_scalability" ]; then
  # set a reasonable default
  vo_scalability=5000
fi

tot_mem=`grep MemTotal /proc/meminfo |awk '{print $2}'`
if [ "$tot_mem" -lt 500000 ]; then
  if [ "$entry_dir" == "main" ]; then
    # all glideins need to scale down if low on memory
    let vo_scalability=vo_scalability/2
  elif [ "$entry_dir" == "florida23" ]; then
    # but florida23 can use a little more
    let vo_scalability=vo_scalability*5/4
  fi

  # write it back
  gconfig_add VO_SCALABILITY $vo_scalability
  "$error_gen" -ok  "memset.sh" "vo_scalability" "$vo_scalability"
  exit 0
fi
"$error_gen" -ok  "memset.sh"
exit 0