Description
This document describes how to write custom scripts to run in a
glidein. Glidein Factory administrators may want to write them to
implement features specific to their clients. Two examples are worker
node validation, and discovery and setup of VO-specific software.
Note: the "scripts" can be any executable, including compiled binaries.
Script inclusion
A script is a file that was listed in the Glidein Factory or Frontend configuration file as being executable (see the attributes description):
executable="True"
By default the files listed are not executable, so an administrator needs to explicitly list the executable ones.
All scripts are executed once before the HTCondor glidein starts. Periodic scripts are also invoked again later, repeatedly, according to the specified period (in seconds):
period="3600"
Script API
A script is provided with exactly 2 arguments:
- The name of the glidein configuration file
- An entry id; this can be either main or the name of the entry point
All other input comes from the glidein configuration file that is used as a dashboard between different scripts.
If the script provides any output to be used by other scripts, it should write it to the glidein configuration file. If the values need to be published by the condor_startd or be visible to the user jobs, the condor vars file should also be modified.
Note that periodic scripts provide an additional output mechanism: since they run via HTCondor startd_cron and the output is not masked, anything sent to standard output is added by HTCondor to the machine ClassAd after adding a prefix (see the periodic scripts section below). For better compatibility, though, we recommend using the glidein configuration file and the condor vars file to specify whether the variable needs to be published to the startd.
The script must return with exit code 0 if successful; a non-zero return value on the first invocation will stop the execution of the glidein with a validation error. A non-zero return value on subsequent invocations of periodic scripts will notify the startd by setting GLIDEIN_PS_OK to False (see below).
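As a minimal sketch of the calling convention described above, a script could be organized as follows. The function name run_custom_script and the demo file content are hypothetical; the temporary file only stands in for the real glidein configuration file so that the sketch can be run outside a glidein.

```shell
#!/bin/bash
# Sketch of the two-argument script API. The body is wrapped in a function
# (hypothetical name) so it can be exercised outside a glidein.
run_custom_script() {
    # Exactly two arguments are expected
    if [ "$#" -ne 2 ]; then
        echo "usage: script <glidein_config> <entry_id>" >&2
        return 1
    fi
    local glidein_config="$1"   # name of the glidein configuration file
    local entry_id="$2"         # "main" or the name of the entry point

    if [ ! -r "$glidein_config" ]; then
        echo "cannot read $glidein_config" >&2
        return 1
    fi
    echo "running for entry id: $entry_id"
    return 0                    # 0 means success; anything else is a failure
}

# Demo with a stand-in configuration file (hypothetical content)
demo_config=$(mktemp)
echo "GLIDEIN_Entry_Name florida23" > "$demo_config"
run_custom_script "$demo_config" main
rc_ok=$?
run_custom_script "$demo_config" 2>/dev/null   # one argument missing
rc_bad=$?
rm -f "$demo_config"
```

In a real script the function would be called with "$@" and its return value passed to exit, so that the glidein sees the same exit code.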
The glidein configuration file
The glidein configuration file acts as a dashboard between different scripts.
It is a simple ASCII file, with one value per line; the first column
represents the attribute name, while the rest is the attribute value.
If the value does not contain any spaces, the easiest way to extract a
value in bash is:
attr_val=`grep "^$attr_name " $glidein_config | awk '{print $2}'`
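The one-liner above returns only the first word of the value. If a value may contain spaces, a sketch like the following can be used instead. The helper name get_config_value and the attribute MY_DESCRIPTION are hypothetical, and the sketch assumes that the name and the value are separated by a single space.

```shell
# get_config_value FILE ATTR: print the value of ATTR from a glidein-style
# configuration file. Hypothetical helper, not part of GlideinWMS.
# Assumes "name<single space>value" lines.
get_config_value() {
    # If the attribute appears more than once, use the last occurrence
    # (an assumption of this sketch). cut keeps everything after the name.
    grep "^$2 " "$1" | tail -n 1 | cut -d ' ' -f 2-
}

# Demo with a stand-in configuration file (hypothetical content)
demo_config=$(mktemp)
printf '%s\n' 'TMP_DIR /tmp/glide_abc' \
              'MY_DESCRIPTION a value with spaces' > "$demo_config"
tmp_dir=$(get_config_value "$demo_config" TMP_DIR)
descr=$(get_config_value "$demo_config" MY_DESCRIPTION)
missing=$(get_config_value "$demo_config" NOT_THERE)   # empty if not defined
echo "TMP_DIR=$tmp_dir"
echo "MY_DESCRIPTION=$descr"
rm -f "$demo_config"
```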
Several attributes are added by the default glidein scripts, the most interesting being:
- ADD_CONFIG_LINE_SOURCE – Script that can be used to add new attributes to the glidein configuration file (see below).
- GLIDEIN_Name – Name of the glidein branch
- GLIDEIN_Entry_Name – name of the glidein entry point
- TMP_DIR – The path to the temporary dir
- PROXY_URL – The URL of the Web proxy
All attributes of the glidein Factory (both the common and the entry specific) are also loaded into this file.
To write into the glidein configuration file, the best approach in bash is to use the add_config_line support script. Just source the provided script and use it. Here is an example:
# get the glidein configuration file name
# must use glidein_config, it is used as global variable
glidein_config=$1
# import add_config_line function
add_config_line_source=`grep '^ADD_CONFIG_LINE_SOURCE ' $glidein_config | awk '{print $2}'`
source $add_config_line_source
# add an attribute
add_config_line myattribute myvalue
HTCondor vars file
GlideinWMS uses a so-called condor vars file to decide which attributes are inserted into the condor configuration file, which are published by the glidein condor_startd to the collector, and which are put into the job environment.
The condor vars file can be found from the glidein configuration file as
CONDOR_VARS_FILE
It is an ASCII file, with one entry per row. Each non-comment line must have 7 columns. Each column has a specific meaning:
- Attribute name (will be extracted from the glidein configuration file)
- Attribute type
- I – integer
- S – quoted string
- C – unquoted string (i.e. HTCondor keyword or expression)
- Default value, use - if no default
- HTCondor name, i.e. under which name should this attribute be known in the condor configuration
- Is a value required for this attribute? Must be Y or N. If Y and the attribute is not defined, the glidein will fail.
- Will condor_startd publish this attribute to the collector? Must be Y or N.
- Will the attribute be exported to the user job environment?
- - – Do not export
- + – Export using the original attribute name
- @ – Export using the HTCondor name
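The column rules listed above can be checked mechanically. The following is a minimal sketch, assuming that no column contains whitespace; the helper name check_condor_vars_line is hypothetical and not part of GlideinWMS.

```shell
# check_condor_vars_line LINE: return 0 if LINE has the seven columns
# described above with legal values, non-zero otherwise.
# Hypothetical helper for illustration only.
check_condor_vars_line() {
    echo "$1" | awk '
        /^#/            { exit 0 }   # comment lines are not checked
        NF != 7         { exit 1 }   # must have exactly 7 columns
        $2 !~ /^[ISC]$/ { exit 1 }   # type: I, S or C
        $5 !~ /^[YN]$/  { exit 1 }   # value required: Y or N
        $6 !~ /^[YN]$/  { exit 1 }   # published by condor_startd: Y or N
        $7 !~ /^[-+@]$/ { exit 1 }   # job environment export: -, + or @
    '
}

# A valid line, a line with an unknown type, and a line with a missing column
check_condor_vars_line 'TMP_DIR S - GLIDEIN_Tmp_Dir Y Y @' && good=yes || good=no
check_condor_vars_line 'TMP_DIR X - GLIDEIN_Tmp_Dir Y Y @' && bad_type=yes || bad_type=no
check_condor_vars_line 'TMP_DIR S - GLIDEIN_Tmp_Dir Y Y'   && short=yes || short=no
echo "valid=$good unknown_type=$bad_type missing_column=$short"
```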
GlideinWMS defines several attributes in the default condor vars files:
glideinWMS/creation/web_base/condor_vars.lst
glideinWMS/creation/web_base/condor_vars.lst.entry
Below is a short extract. For all the options, see the dedicated configuration variables page.
# VarName               Type    Default         CondorName                                Req.  Export    UserJobEnvName
#                       S=Quote - = No Default  + = VarName                                     HTCondor  - = Do not export
#                                                                                                         + = Use VarName
#                                                                                                         @ = Use CondorName
#################################################################################################################
X509_USER_PROXY         C       -               GSI_DAEMON_PROXY                          Y     N         -
USE_MATCH_AUTH          C       -               SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION  N     N         -
GLIDEIN_Factory         S       -               +                                         Y     Y         @
GLIDEIN_Name            S       -               +                                         Y     Y         @
GLIDEIN_Collector       C       -               HEAD_NODE                                 Y     N         -
GLIDEIN_Expose_Grid_Env C       False           JOB_INHERITS_STARTER_ENVIRONMENT          N     Y         +
TMP_DIR                 S       -               GLIDEIN_Tmp_Dir                           Y     Y         @
CONDORG_CLUSTER         I       -               GLIDEIN_ClusterId                         Y     Y         @
CONDORG_SUBCLUSTER      I       -               GLIDEIN_ProcId                            Y     Y         @
CONDORG_SCHEDD          S       -               GLIDEIN_Schedd                            Y     Y         @
SEC_DEFAULT_ENCRYPTION  C       OPTIONAL        +                                         N     N         -
SEC_DEFAULT_INTEGRITY   C       REQUIRED        +                                         N     N         -
MAX_MASTER_LOG          I       1000000         +                                         N     N         -
MAX_STARTD_LOG          I       10000000        +                                         N     N         -
If you need to add anything to a condor vars file, the best approach in bash is to use the add_condor_vars_line support script. Just source the provided script and use it. Here is an example:
# get the condor vars file name
# must use condor_vars_file, it is used as global variable
condor_vars_file=`grep -i "^CONDOR_VARS_FILE " $glidein_config | awk '{print $2}'`
# import add_condor_vars_line function
add_config_line_source=`grep '^ADD_CONFIG_LINE_SOURCE ' $glidein_config | awk '{print $2}'`
source $add_config_line_source
# add an attribute
add_condor_vars_line myattribute type def condor_name req publish jobid
Reporting script exit status
The GlideinWMS Factory can receive and interpret a detailed exit status report, if the validation script provides one.
The script should write the exit status report in the following file:
otrb_output.xml
The Factory provides a helper script to properly generate such a file. A detailed description of the format can be found in the dedicated description page.
To use the helper script, first discover its location with:
# find error reporting helper script
error_gen=`grep '^ERROR_GEN_PATH ' $glidein_config | awk '{print $2}'`
If the validation script succeeded, report the success by using:
# Everything worked out fine
"$error_gen" -ok <script name> [<key> <value>]*
You can specify any number of (key,value) pairs, representing any metrics you verified during your validation run.
If the validation script instead failed, report the failure by using:
# Uh oh, we hit an error
"$error_gen" -error <script name> <error type> "<detailed description>" [<key> <value>]*
The script should use one of the
standard error types.
It should also provide a human readable detailed description.
It is perfectly fine if it extends over multiple lines;
just make sure you properly pass it to the script.
You can also specify any number of (key,value) pairs,
representing any metrics that failed during the test.
Providing at least one metric is recommended, but not strictly necessary.
Note: The reported status MUST match the script exit code. E.g. if you claim the script succeeded, you must also exit with a 0 exit code.
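One way to keep the reported status and the exit code in step is to pair them in small helper functions. This is only a sketch: report_ok, report_error, check_dir and the script name dircheck.sh are hypothetical, and error_gen_stub is a stand-in that merely records its arguments, so that the sketch can run without the real helper found via ERROR_GEN_PATH.

```shell
# report_ok / report_error: pair each status report with the matching
# return code so the two cannot diverge. Hypothetical helpers; in a real
# script "$error_gen" holds the path found via ERROR_GEN_PATH.
report_ok() {
    "$error_gen" -ok "$@"
    return 0
}
report_error() {
    "$error_gen" -error "$@"
    return 1
}

# Stand-in for the real helper script: it only records how it was called
calls=""
error_gen_stub() { calls="$calls$*;"; }
error_gen=error_gen_stub

# Example validation: the directory given as $1 must exist
check_dir() {
    if [ -d "$1" ]; then
        report_ok "dircheck.sh" "dir" "$1"
    else
        report_error "dircheck.sh" "WN_Resource" "Directory $1 not found." "dir" "$1"
    fi
}

demo_dir=$(mktemp -d)
check_dir "$demo_dir";             rc_found=$?
check_dir "$demo_dir/no_such_dir"; rc_missing=$?
rm -rf "$demo_dir"
echo "found=$rc_found missing=$rc_missing"
```

A real script would end with something like check_dir "$some_dir"; exit $? so that the exit code always follows the report.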
Periodic scripts
Scripts by default have period=0 and are invoked only once. The Factory/Frontend administrator can specify an integer number of seconds to make a script periodic. Periodic scripts are invoked a first time at the beginning, according to their order, like all other scripts; after that they are invoked using the HTCondor daemon ClassAd hook mechanism (aka startd_cron) through a wrapper script that lets them keep the same API.
The periodic scripts wrapper defines some additional variables in glidein_config and in the startd ClassAd:
- GLIDEIN_PS_FAILED_LIST - List of scripts that failed at least once
- GLIDEIN_PS_FAILING_LIST - List of scripts that failed the last execution
- GLIDEIN_PS_OK - True if no script failed its last execution (GLIDEIN_PS_FAILING_LIST is empty). At the beginning is published to schedd then directly
- GLIDEIN_PS_FAILED_LAST - Name of the last script that failed execution
- GLIDEIN_PS_FAILED_LAST_REASON - String describing the last failure
- GLIDEIN_PS_FAILED_LAST_END - End time (seconds from Epoch) of the last failure
- GLIDEIN_PS_LAST - File path of the last script
- GLIDEIN_PS_LAST_END - end time of the last script execution (0 for script_wrapper.sh invoked at startup)
All these attributes can be used in the startd (e.g. start or shutdown expressions: start_expr="GLIDEIN_PS_OK =!= FALSE") or in other scripts.
Periodic scripts run via HTCondor startd_cron and their output is not masked, so anything sent to standard output is added by HTCondor to the machine ClassAd. To protect from
clashes, a prefix is added (via STARTD_CRON_
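As a minimal sketch of this standard-output channel, a periodic script could print one "Name = Value" line per attribute, in ClassAd syntax. The function name print_scratch_attr and the attribute name FREE_SCRATCH_MB are hypothetical; the prefix mentioned above is added by HTCondor, not by the script.

```shell
# Sketch of the standard-output channel of a periodic script.
# The attribute name FREE_SCRATCH_MB is hypothetical.
print_scratch_attr() {
    local dir="$1"
    local free_kb
    # df -Pk prints one data line per file system, in 1024-byte units;
    # the fourth column is the available space
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    # One "Name = Value" line per attribute
    echo "FREE_SCRATCH_MB = $((free_kb / 1024))"
}

out=$(print_scratch_attr .)
echo "$out"
```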
Loading order
During the first invocation, scripts are loaded and executed one at a time, in order. Later, periodic scripts are re-executed one at a time according to their period. System scripts come with the framework; user scripts are the ones listed in the Factory and Frontend configuration files. First all files are downloaded, then the scripts are executed. Both the download and the first invocation follow the same order, in six distinct stages:
1. Global attributes are loaded and global system scripts executed.
2. The user provided global files are loaded and user scripts are executed. First the Factory ones, then the Frontend ones (i.e. all the ones that have after_entry="False"; False is the default for the Factory scripts, True for the Frontend ones). The (pre-entry) Frontend scripts are executed in the following order: pre-group scripts (in global config with after_group="False", which is the default), group scripts and after-group scripts (i.e. after_group="True").
3. The entry specific attributes are loaded and entry specific system scripts executed.
4. The user provided entry specific files are loaded and entry specific user scripts are executed.
5. The after_entry user provided global files are loaded and after_entry user scripts are executed. First the Frontend ones, then the Factory ones (i.e. all the ones that have set after_entry="True"). The (after-entry) Frontend scripts are executed in the following order: pre-group scripts (in global config with after_group="False", which is the default), group scripts and after-group scripts (i.e. after_group="True").
6. Final global system scripts are executed and the HTCondor daemons are launched.
The Glidein Factory configuration allows an administrator to
specify the files/scripts mentioned in points 2, 4 and 5.
The after_entry and after_group attributes determine which of these groups a file or script belongs to.
Within a group the files/scripts are loaded/executed in the order in which they are
specified in the configuration file.
All scripts, periodic and not, are executed a first time according to the order above. Note that the wrapper scripts are not mentioned, because those are executed only right before the job.
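The ordering of the user scripts in the stages described above can be illustrated with a small sketch. The input format (origin, scope, after_entry, name) and the function name order_user_scripts are hypothetical, chosen only for this illustration; the after_group sub-ordering of the Frontend scripts is left out for brevity.

```shell
# order_user_scripts: read "origin scope after_entry name" lines on standard
# input and print "stage:name" in execution order. Hypothetical helper.
order_user_scripts() {
    awk '
        {
            origin = $1; scope = $2; after_entry = $3; name = $4
            if (scope == "entry") {
                stage = 4; rank = 0                       # entry specific user scripts
            } else if (after_entry == "False") {
                stage = 2                                 # pre-entry: Factory first
                rank = (origin == "factory") ? 0 : 1
            } else {
                stage = 5                                 # after_entry: Frontend first
                rank = (origin == "frontend") ? 0 : 1
            }
            # NR keeps the configuration file order within a group
            print stage, rank, NR, name
        }
    ' | sort -k1,1n -k2,2n -k3,3n | awk '{print $1 ":" $4}'
}

ordered=$(order_user_scripts <<'EOF'
frontend global True fe_late.sh
factory global True fa_late.sh
factory entry False entry_check.sh
frontend global False fe_early.sh
factory global False fa_early.sh
EOF
)
echo "$ordered"
```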
Examples
The documentation above should provide enough information to write scripts that customize the glideins to your needs. Below are a few examples you can use as templates.
Test that a certain library exists
#!/bin/sh
glidein_config="$1"

# find error reporting helper script
error_gen=`grep '^ERROR_GEN_PATH ' "$glidein_config" | awk '{print $2}'`

# -e tests that the file exists (-z would only test for an empty string)
if [ ! -e "/usr/lib/libcrypto.so.0.9.8" ]; then
  "$error_gen" -error "libtest.sh" "WN_Resource" "Crypto library not found." "file" "/usr/lib/libcrypto.so.0.9.8"
  exit 1
fi
echo "Crypto library found"
"$error_gen" -ok "libtest.sh" "file" "/usr/lib/libcrypto.so.0.9.8"
exit 0
Find, test and advertise a software distribution
#!/bin/bash
# bash is used because the script relies on "source"
glidein_config="$1"

###############
# Get the data

# find error reporting helper script
error_gen=`grep '^ERROR_GEN_PATH ' "$glidein_config" | awk '{print $2}'`

if [ -f "$VO_SW_DIR/setup.sh" ]; then
  source "$VO_SW_DIR/setup.sh"
else
  "$error_gen" -error "swfind.sh" "WN_Resource" "Could not find $VO_SW_DIR/setup.sh" \
    "file" "$VO_SW_DIR/setup.sh" "base_dir_attr" "VO_SW_DIR"
  exit 1
fi

tmpname=$PWD/installed_software_tmp_$$.tmp
software_list > "$tmpname"

###########################################################
# import add_config_line and add_condor_vars_line functions

add_config_line_source=`grep '^ADD_CONFIG_LINE_SOURCE ' "$glidein_config" | awk '{print $2}'`
source "$add_config_line_source"

condor_vars_file=`grep -i "^CONDOR_VARS_FILE " "$glidein_config" | awk '{print $2}'`

##################
# Format the data

# join all lines of the software list with commas
sw_list=`cat "$tmpname" | awk '{if (length(a)!=0) {a=a "," $0} else {a=$0}}END{print a}'`

if [ -z "$sw_list" ]; then
  ERRSTR="No SW found. But the setup script was present at $VO_SW_DIR/setup.sh."
  "$error_gen" -error "swfind.sh" "WN_Resource" "$ERRSTR" \
    "source_file" "$VO_SW_DIR/setup.sh"
  exit 1
fi

#################
# Export the data

add_config_line GLIDEIN_SW_LIST "$sw_list"
add_condor_vars_line GLIDEIN_SW_LIST "S" "-" "+" "Y" "Y" "+"

"$error_gen" -ok "swfind.sh" "sw_list" "$sw_list"
exit 0
Change an existing value based on conditions found
#!/bin/bash

glidein_config=$1
entry_dir=$2

# find error reporting helper script
error_gen=`grep '^ERROR_GEN_PATH ' $glidein_config | awk '{print $2}'`

# import add_config_line function, will use glidein_config
add_config_line_source=`grep '^ADD_CONFIG_LINE_SOURCE ' $glidein_config | awk '{print $2}'`
source $add_config_line_source

vo_scalability=`grep '^VO_SCALABILITY ' $glidein_config | awk '{print $2}'`

if [ -z "$vo_scalability" ]; then
  # set a reasonable default
  vo_scalability=5000
fi

tot_mem=`grep MemTotal /proc/meminfo |awk '{print $2}'`
if [ "$tot_mem" -lt 500000 ]; then
  if [ "$entry_dir" == "main" ]; then
    # all glideins need to scale down if low on memory
    let vo_scalability=vo_scalability/2
  elif [ "$entry_dir" == "florida23" ]; then
    # but florida23 can use a little more
    let vo_scalability=vo_scalability*5/4
  fi

  # write it back
  add_config_line VO_SCALABILITY $vo_scalability
  "$error_gen" -ok "memset.sh" "vo_scalability" "$vo_scalability"
  exit 0
fi

"$error_gen" -ok "memset.sh"
exit 0