GlideinWMS The Glidein-based Workflow Management System


Glidein Frontend

Configuration

Example Configuration

Below is an example Frontend configuration XML file.
<frontend downtimes_file="frontenddowntime" advertise_delay="5" frontend_name="vofrontend-v2_4" loop_delay="60" enable_attribute_expansion="False">
<log_retention >
<process_logs >
<process_log extension="info" max_days="7.0" max_mbytes="100.0" min_days="3.0" msg_types="INFO" backup_count="5" compression="gz" />
<process_log extension="debug" max_days="7.0" max_mbytes="100.0" min_days="3.0" msg_types="DEBUG,ERR,WARN" backup_count="5" />
</process_logs >
</log_retention >
<match match_expr="True" start_expr="$(JOBSTR_ATTR)<$$(JOBSTR_VAL)" policy_file="/path/to/python-policy-file">
<factory query_expr="True">
<match_attrs />
<collectors>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=factory-server.fnal.gov" comment="" factory_identity="factoryuser@factory-server.fnal.gov" my_identity="frontenduser@frontend-server.fnal.gov" node="factory-server.fnal.gov:8618" />
</collectors>
</factory>
<job comment="" query_expr="(JobUniverse==5)&&($(JOBSTR_ATTR)<$$(JOBSTR_VAL))" >
<match_attrs />
<schedds>
<schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="userpool.fnal.gov" />
<schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="schedd_jobs1@userpool.fnal.gov" />
<schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="schedd_jobs2@userpool.fnal.gov" />
</schedds>
</job>
</match>

<monitor base_dir="/var/www/html/vofrontend/monitor" flot_dir="/opt/javascriptrrd-0.6.3/flot" javascriptRRD_dir="/opt/javascriptrrd-0.6.3/src/lib" jquery_dir="/opt/javascriptrrd-0.6.3/flot" />
<monitor_footer display_txt="Legal Disclaimer" href_link="/site/disclaimer.html" />

<security classad_proxy="/etc/grid-security/hostcert.pem" comment="use hostcert here if not doing GSI authentication through GWMS versions 3.6.5" proxy_DN="/DC=org/DC=doegrids/OU=Services/CN=frontend-server.fnal.gov" proxy_selection_plugin="ProxyAll" security_name="frontenduser" sym_key="aes_256_cbc">
<credentials>
<credential absfname="/etc/osg/tokens/my_token.scitoken" security_class="frontend" trust_domain="OSG" type="scitoken" comment="generated by osg-token-renewer" />
<credential Comment="deprecated, use scitoken if possible" absfname="/tmp/x509up_u" security_class="frontend" trust_domain="OSG" type="grid_proxy" vm_id="123" vm_type="type1" pool_idx_len="5" pool_idx_list="2,4-6,10" />
</credentials>
</security>
<stage base_dir="/var/www/html/vofrontend/stage" use_symlink="True" web_base_url="http://frontend-server.fnal.gov:9000/vofrontend/stage" />
<work base_dir="/opt/vofrontend" base_log_dir="/opt/vofrontend/logs" />
<attrs>
<attr name="GLIDECLIENT_Rank" glidein_publish="False" job_publish="False" parameter="True" type="string" value="1" />
<attr name="GLIDECLIENT_Start" glidein_publish="False" job_publish="False" parameter="True" type="string" value="True" />
<attr name="GLIDEIN_Expose_Grid_Env" glidein_publish="True" job_publish="True" parameter="False" type="string" value="True" />
<attr name="USE_MATCH_AUTH" glidein_publish="False" job_publish="False" parameter="True" type="string" value="True" />
<attr name="JOBSTR_ATTR" glidein_publish="False" job_publish="False" parameter="True" type="string" value="NumJobStarts"/>
</attrs>
<groups>
<group name="main" enabled="True">
<config ignore_down_entries="True">
<idle_glideins_per_entry max="100" reserve="5" />
<idle_vms_per_entry curb="20" max="100" />
<idle_vms_total curb="500" max="1000" />
<running_glideins_per_entry max="2000" relative_to_queue="1.15" min="0" />
<running_glideins_total curb="30000" max="40000" />
<glideins_removal margin="0" requests_tracking="False" type="ALL" wait="0"/>
</config>
<match match_expr="True" start_expr="True" policy_file="/path/to/python-policy-file">
<factory query_expr="True">
<match_attrs />
<collectors />
</factory>
<job query_expr="True">
<match_attrs />
<schedds />
</job>
</match>
<security>
<credentials />
</security>
<attrs >
<attr name="JOBSTR_VAL" glidein_publish="False" job_publish="False" parameter="True" type="int" value="5"/>
</attrs>
<files />
</group>
</groups>
<files>
<file absfname="/opt/script/testSW.sh" after_entry="True" after_group="False" const="True" executable="True" untar="False" wrapper="False" />
<file absfname="/opt/script/testP.sh" after_entry="True" after_group="False" const="True" executable="True" period="1800" untar="False" wrapper="False" />
</files>
<ccbs>
<ccb DN="/DC=org/DC=doegrids/OU=Services/CN=ccb.fnal.gov" node="ccb.fnal.gov" />
<ccb DN="/DC=org/DC=doegrids/OU=Services/CN=ccb2.fnal.gov" node="ccb2.fnal.gov:9620-9640" group="group2" />
</ccbs>
<collectors>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector.fnal.gov" node="usercollector.fnal.gov" secondary="False" group="default" />
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector.fnal.gov" node="usercollector.fnal.gov:9620-9819" secondary="True" group="default" />
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector2.fnal.gov" node="usercollector2.fnal.gov" secondary="False" group="ha" />
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector2.fnal.gov" node="usercollector2.fnal.gov:9620-9919" secondary="True" group="ha" />
</collectors>
<config>
<idle_vms_total curb="800" max="1200" />
<idle_vms_total_global curb="1000" max="1500" />
<running_glideins_total curb="35000" max="45000" />
<running_glideins_total_global curb="50000" max="60000" />
</config>
<high_availability check_interval="300" enabled="False">
<ha_frontends>
<ha_frontend frontend_name="vofrontend-v2_4"/>
</ha_frontends>
</high_availability>
</frontend>
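
A configuration of this shape can be inspected with Python's standard xml.etree.ElementTree module, for instance as a quick sanity check before a reconfig. The sketch below is not part of GlideinWMS: the summarize_frontend helper and the trimmed-down sample document are illustrative assumptions. Keep in mind that in a real XML file the characters & and < inside attribute values must be escaped as &amp; and &lt;.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical sample with the same layout as the example above.
SAMPLE = """
<frontend frontend_name="vofrontend-v2_4" loop_delay="60">
  <groups>
    <group name="main" enabled="True">
      <config>
        <running_glideins_total curb="30000" max="40000" />
      </config>
    </group>
  </groups>
  <collectors>
    <collector node="usercollector.fnal.gov" secondary="False" group="default" />
    <collector node="usercollector.fnal.gov:9620-9819" secondary="True" group="default" />
  </collectors>
</frontend>
"""


def summarize_frontend(xml_text):
    """Return the Frontend name, its groups and its top-level user collectors."""
    root = ET.fromstring(xml_text)
    # Map each group name to a boolean built from its "enabled" attribute.
    groups = {
        group.get("name"): group.get("enabled") == "True"
        for group in root.findall("./groups/group")
    }
    # Only the top-level <collectors> section is read, not the Factory collectors.
    collectors = [
        (collector.get("node"), collector.get("secondary") == "True")
        for collector in root.findall("./collectors/collector")
    ]
    return {"name": root.get("frontend_name"), "groups": groups, "collectors": collectors}


summary = summarize_frontend(SAMPLE)
print(summary["name"])
print(summary["groups"])
print(summary["collectors"])
```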

Frontend Configuration

The Glidein Frontend configuration involves creating the configuration directory and files and then creating the daemons. As in the Glidein Factory setup, an XML file is converted into a configuration tree by a configuration tool.

For the installer to create the Glidein Frontend instance from the configuration directory and grid mapfile, the following objects can be defined:

Other attributes can be specified as well. They are used by the VO Frontend matchmaking and job matchmaking. The format is similar to the attributes on the Factory config file. The table below describes the <attrs ... > tag in more detail.

Refer to the Custom HTCondor Variables section for the list of available variables.

Attribute Name | Attribute Description
name | Name of the attribute.
value | Value of the attribute.
parameter | Set to True if the attribute should be passed as a parameter. If set to False, the attribute will be put in the staging area to be accessed by the glidein startup scripts. Always set this to True unless you know what you are doing. See the Note below for more details.
glidein_publish | If set to True, the attribute will be available in the condor_startd's ClassAd. Used only if parameter is True.
job_publish | If set to True, the attribute will be available in the user job's environment. Used only if parameter is True.
comment | You can specify a description of the attribute here.
type | Type of the attribute. Supported types are 'int', 'string' and 'expr'. Type expr is equivalent to condor constant/expression in condor_vars.lst.

An example attribute would be:

<attrs><attr name="GLIDEIN_Collector" value="mymachine.mydomain" job_publish="True" parameter="True" const="True" glidein_publish="True" comment="Just a test attribute"/></attrs>

Note: The "publish" attribute is currently not available in the Frontend. It is available in the Factory, as indicated in its attributes section. To pass Frontend-controlled values to the Factory and make them available in the ClassAd and in the glidein submission, set parameter="True". These attributes are distinguished in the glidein submit environment by the prefix "GLIDEIN_PARAM_".

The following group parameters are used to configure multiple Frontends. If only one group is specified, they apply to all Frontends. The objects specified are used for creating and monitoring glideins. Groups are used to group users with similar requirements, such as proxies, criteria for matching job requirements with sites, and configuration of glideins.

Adding Custom Code/Scripts to Glidein Frontend Glideins

You can add custom scripts to glideins created for this Glidein Frontend by adding scripts and files to the configuration in the files section:
<frontend>
 [<groups><group>]
 <files>
  <file absfname="script name" executable="True" after_group="True" comment="comment"/>

The script/file will be copied to the Web-accessible area and added to one of the glidein's file lists. When a glidein starts, the glidein startup script will pull it, validate it and execute any action requested (execute, untar, just keep it, ...). after_entry and after_group can be used to affect the execution order (see writing custom scripts). If any parameters are needed, they can be specified using <attr />.

For more detailed information, see the page dedicated to writing custom scripts.

You can also create wrapper scripts or tar-balls of files; see the Factory configuration page for the syntax (use groups/group tags instead of the Factory's entry tag). The variable name specified by absdir_outattr will be prefixed with either GLIDECLIENT_ or GLIDECLIENT_GROUP_, depending on scope.

Match Expressions and Match Attributes

Several sections in the configuration allow a match expression. Each of these sections allows an expression to be evaluated to determine where glideins and jobs should be matched.
For example, expressions implementing a whitelist in the Frontend can be created in order to control where the glideins are submitted. You can also give an HTCondor expression to specify where jobs can run or to specify which glidein_sites can run jobs.
There are two ways to restrict matching in most cases. Clauses such as <match match_expr> use Python-based expressions, as explained below. Others, such as <factory query_expr> and <job query_expr>, use HTCondor ClassAd expressions. For these, only valid HTCondor expressions can be used; Python expressions cannot be evaluated in these contexts.
Note that, for some tags (like factory query_expr), you can specify expressions both in the default global section and in individual group sections. Take special care before doing this to make sure the expressions are correct, as the expressions are typically "AND"-ed together.
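
To see why care is needed, the effect of AND-ing a global and a group query_expr can be sketched in a few lines of Python. This is only an illustration: the combine_query_exprs helper is hypothetical, and it assumes a plain logical AND of the two ClassAd expressions, which may differ in detail from what the Frontend does internally.

```python
def combine_query_exprs(global_expr, group_expr):
    """Join two HTCondor ClassAd expressions with a logical AND (assumed behavior)."""
    return "(%s) && (%s)" % (global_expr, group_expr)


# Hypothetical global and group-level Factory query expressions.
combined = combine_query_exprs('GLIDEIN_Site=!=UNDEFINED', 'GLIDEIN_Site=?="FNAL"')
print(combined)
```

An overly restrictive global expression therefore narrows every group, since a group expression can only restrict the selection further.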

Each match expression is a Python expression that will be evaluated. Matches can be scoped to either global scope (<frontend><match>) or to a group specific scope.

Each Python expression will typically be a series of boolean tests, surrounded by parentheses and connected by the boolean operators "and", "or", and "not". You can use several dictionaries in these match expressions. The "job" dictionary contains the ClassAd of the job being matched, and the "glidein" dictionary contains information about the Factory (entry point) ClassAd. There is also an "attr_dict" dictionary that can reference attributes defined in the frontend.xml. While an extensive list of everything you can use in these expressions is out of scope, some examples are below:

  • (job.has_key("ImageSize")): Returns true if the job ClassAd has the attribute "ImageSize".
  • (job["NumJobStarts"]>5): Returns true if the job ClassAd attribute "NumJobStarts" is greater than 5.
  • (glidein["attrs"].has_key("GLIDEIN_Retire_Time")): Returns true if the Factory entry ClassAd has the attribute "GLIDEIN_Retire_Time".
  • (glidein["attrs"]["GLIDEIN_Retire_Time"]>21600): Returns true if the Factory entry ClassAd's "GLIDEIN_Retire_Time" is greater than 21600.
  • (int(attr_dict["NUM_USERS_ALLOWED"])>0): Returns true if there is an attribute named NUM_USERS_ALLOWED in the frontend.xml and its value is greater than zero.
  • (job["Owner"] in attr_dict["ITB_Users"].split(",")): Returns true if the job ClassAd attribute "Owner" is in the comma-delimited string attribute ITB_Users (which would be defined in frontend.xml).
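
Expressions like these can be tried out in a plain Python session before putting them in the configuration. The sketch below uses hand-made dictionaries; the sample values and the evaluate helper are assumptions for illustration and do not reproduce how the Frontend evaluates match_expr internally. Plain Python 3 dictionaries have no has_key method, so the sketch uses the "in" operator for membership tests.

```python
# Hypothetical sample data; real job and Factory entry ClassAds carry many more attributes.
job = {"Owner": "alice", "NumJobStarts": 7, "ImageSize": 2048}
glidein = {"attrs": {"GLIDEIN_Site": "FNAL", "GLIDEIN_Retire_Time": 43200}}
attr_dict = {"NUM_USERS_ALLOWED": "3", "ITB_Users": "alice,bob"}

# A match expression built from the kinds of tests listed above.
match_expr = (
    '(job["NumJobStarts"]>5) and '
    '(glidein["attrs"]["GLIDEIN_Retire_Time"]>21600) and '
    '(int(attr_dict["NUM_USERS_ALLOWED"])>0) and '
    '(job["Owner"] in attr_dict["ITB_Users"].split(","))'
)


def evaluate(expr, job, glidein, attr_dict):
    """Evaluate a match expression string against the three dictionaries."""
    names = {"job": job, "glidein": glidein, "attr_dict": attr_dict}
    return bool(eval(expr, names))


print(evaluate(match_expr, job, glidein, attr_dict))
print("ImageSize" in job)  # membership test without has_key
```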

Each attribute used in a match expression should be declared in a subsequent match_attrs section. This makes ClassAd variables available to the match expression. Attributes can be made available from the:

  1. Factory ClassAd: (<match><factory><match_attrs><match_attr>)
  2. Job ClassAd: (<match><job><match_attrs><match_attr>)

Each match_attr tag must contain a name. This is the name of the attribute in the appropriate ClassAd.
It must also contain a type which can be one of the following:

  1. string: A constant string of letters, numbers, or characters.
  2. int: An integer: a positive or negative number, or zero.
  3. real: A real number that could have decimal places
  4. bool: It can be "True" or "False"
  5. Expr: A ClassAd expression

Example

<match match_expr='glidein["attrs"].get("GLIDEIN_Site") in ((job.get("DESIRED_Sites") != None) and job.get("DESIRED_Sites").split(","))'>
<factory query_expr="(GLIDEIN_Site=!=UNDEFINED)">
<match_attrs> <match_attr name="GLIDEIN_Site" type="string"/> </match_attrs>
<collectors> </collectors>
</factory>
<job query_expr="(DESIRED_Sites=!=UNDEFINED)">
<match_attrs> <match_attr name="DESIRED_Sites" type="string"/> </match_attrs>
<schedds> </schedds>
</job>
</match>

Example

Glideins can also use "start_expr" to make sure the correct jobs start on pilots. This is an HTCondor expression run on the pilot startd daemon. Here is an example:
<match match_expr='glidein["attrs"].get("GLIDEIN_Site") in ((job.get("DESIRED_Sites") != None) and job.get("DESIRED_Sites").split(","))' start_expr='(stringListMember(GLIDEIN_Site,DESIRED_Sites,",")=?=True)' >
...
</match >

Example

Here is an example of a policy.py that is equivalent to the example above:
<match match_expr='True' policy_file='/path/to/policy.py' start_expr='(stringListMember(GLIDEIN_Site,DESIRED_Sites,",")=?=True)' >
<factory query_expr="True"> <match_attrs /> </factory>
<job query_expr="True"> <match_attrs /> </job>
</match >
# Filename: /path/to/policy.py
# match(job, glidein) is equivalent to match_expr in frontend.xml
def match(job, glidein):
    return (glidein['attrs'].get('GLIDEIN_Site') in job["DESIRED_Sites"].split(","))

# factory/job query_expr is a string of HTCondor expression
factory_query_expr = '(GLIDEIN_Site=!=UNDEFINED)'
job_query_expr = '(DESIRED_Sites=!=UNDEFINED)'

# factory/job match_attrs is a dict with following structure
factory_match_attrs = {
    'GLIDEIN_Site': {'type': 'string', 'comment': 'From policy'}
}
job_match_attrs = {
    'DESIRED_Sites': {'type': 'string', 'comment': 'From policy'}
}
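
To try a policy file outside the Frontend, its match function can be called directly with hand-made dictionaries. In the sketch below the function body is copied from the policy above, while the sample job and glidein dictionaries are hypothetical.

```python
def match(job, glidein):
    # Same test as in the policy file above.
    return glidein['attrs'].get('GLIDEIN_Site') in job["DESIRED_Sites"].split(",")


# Hypothetical Factory entry and jobs.
glidein = {"attrs": {"GLIDEIN_Site": "FNAL"}}

print(match({"DESIRED_Sites": "FNAL,UCSD"}, glidein))
print(match({"DESIRED_Sites": "UCSD"}, glidein))

# A job without DESIRED_Sites makes this match function raise KeyError.
try:
    match({}, glidein)
    missing_attribute_failed = False
except KeyError:
    missing_attribute_failed = True
print(missing_attribute_failed)
```

The last call shows that this match function assumes DESIRED_Sites is always present, which is presumably why the policy also restricts the jobs with job_query_expr.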
        

Attribute substitution

Attribute substitution can be used to parametrize the Frontend configuration.

Whenever one of the following two patterns is found inside the configuration file

  • $$(attr_name)
  • $(attr_name)

the value of the named attribute is inserted in that place.

The $$ variant will interpret the type of the attribute, quoting its value as appropriate, while the $ requires a string type and will use it as-is.
They can appear in any string of the configuration file.

If an attribute is defined both in the main and the group section, the group one will prevail.
This can be used to e.g. define a generic expression in the global section, and customize it on a group by group basis by only re-defining the relevant bits.

The attribute expansion also works recursively, as long as there are no loops.
$(DOLLAR) can be used to represent the $ sign.

For example, with

<attr name="JOBSTR_ATTR" glidein_publish="False" job_publish="False" parameter="True" type="string" value="NumJobStarts"/>
<attr name="JOBSTR_VAL" glidein_publish="False" job_publish="False" parameter="True" type="int" value="5"/>
<attr name="PREFIX_VAL" glidein_publish="False" job_publish="False" parameter="True" type="string" value="DESIRED"/>
<attr name="SITE_ATTR" glidein_publish="False" job_publish="False" parameter="True" type="string" value="$(PREFIX_VAL)_Site"/>
<attr name="SITE_VAL" glidein_publish="False" job_publish="False" parameter="True" type="string" value="FNAL"/>
... start_expr="($(JOBSTR_ATTR)<$$(JOBSTR_VAL))&&($(SITE_ATTR)=?=$$(SITE_VAL))" ...

will be expanded into

... start_expr='(NumJobStarts<5)&&(DESIRED_Site=?="FNAL")' ...

Please note that the Frontend will throw an error on reconfig if a referenced attribute is not defined.
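
The expansion rules described in this section can be mimicked with a short Python function. This is a minimal sketch written for illustration, not the Frontend's implementation: the expand function, the (type, value) tuple layout and the exception types are assumptions.

```python
import re

# Matches $(name) and $$(name); group 1 is "$" for the typed ($$) variant.
_SUBST = re.compile(r"\$(\$?)\((\w+)\)")


def expand(text, attrs, _active=()):
    """Expand $(name) and $$(name) in text; attrs maps name -> (type, value)."""

    def replace(found):
        typed = found.group(1) == "$"
        name = found.group(2)
        if name == "DOLLAR":
            return "$"
        if name not in attrs:
            raise KeyError("attribute %s is not defined" % name)
        if name in _active:
            raise ValueError("substitution loop through %s" % name)
        attr_type, raw = attrs[name]
        # Values may themselves reference other attributes.
        value = expand(raw, attrs, _active + (name,))
        if typed:
            # $$ quotes string values and leaves other types as they are.
            return '"%s"' % value if attr_type == "string" else value
        if attr_type != "string":
            raise TypeError("$(%s) requires a string attribute" % name)
        return value

    return _SUBST.sub(replace, text)


main_attrs = {
    "JOBSTR_ATTR": ("string", "NumJobStarts"),
    "PREFIX_VAL": ("string", "DESIRED"),
    "SITE_ATTR": ("string", "$(PREFIX_VAL)_Site"),
    "SITE_VAL": ("string", "FNAL"),
}
group_attrs = {"JOBSTR_VAL": ("int", "5")}
# Group definitions take precedence over the main ones.
attrs = {**main_attrs, **group_attrs}

expanded = expand("($(JOBSTR_ATTR)<$$(JOBSTR_VAL))&&($(SITE_ATTR)=?=$$(SITE_VAL))", attrs)
print(expanded)
```

With the attributes of the example above, the sketch produces the same start_expr string as shown in the expanded form.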

[GSI proxies are deprecated] Using Multiple Proxies

Why would you want to use a pool of pilot proxies instead of a single one?

If your VO maps to a single group account at the remote grid sites, you wouldn't. A pool of pilot proxies (try saying that 5 times fast) does not gain you anything. If your VO maps to a pool of accounts at remote grid sites, you should consider using a pool of proxies equivalent to the number of users you have. Why?

Consider the following scenario: Alice, Bob, and Charlie are all in the FUNGUS experiment and form a VO. They are using GlideinWMS. Alice sends 1000 jobs to FNAL via their GlideinWMS using a single pilot proxy. The pilots map to their userid fungus01 at FNAL, and in accordance with the batch system's fair-share policies, the job priority for user fungus01 is decreased significantly.

Bob comes along and submits 1000 jobs via GlideinWMS, while Charlie submits 1000 jobs under his own proxy and not using GlideinWMS. The GlideinWMS pilots launch for Bob, and map to fungus01. Charlie launches his own jobs that get mapped to fungus02. Relative to fungus02, fungus01 priority is terrible, and Bob's jobs sit around waiting for Charlie -- even though Bob didn't occupy the FNAL resources, Alice did!

The solution: have a pool of pilot proxies. We then spread the fair-share penalty amongst fungus01, fungus02, and fungus03, and Bob now can compete on a more equal footing with Charlie and Alice.

Using multiple proxies

Proxies can be specified in the <security><credentials><credential> tags. Multiple credential tags can be entered, one for each proxy file. These can be found in the security section at the top of the XML, in which case the proxies are shared by all security groups. They can also be found within <group> tags, in which case they are used only by that security group.

One example follows:

<security>
<credentials>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot05_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot06_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot07_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot08_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot09_cms_prio.proxy" security_class="cmsprio"/>
</credentials>
</security>
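
When the pool of proxies is large, the credential entries can be generated instead of typed by hand. The following sketch builds a security section of this shape with Python's xml.etree.ElementTree; the build_security helper is hypothetical and the file names simply follow the pattern of the example above.

```python
import xml.etree.ElementTree as ET


def build_security(proxy_files, security_class, trust_domain="OSG"):
    """Build a <security> element with one grid_proxy credential per proxy file."""
    security = ET.Element("security")
    credentials = ET.SubElement(security, "credentials")
    for path in proxy_files:
        ET.SubElement(credentials, "credential", {
            "type": "grid_proxy",
            "trust_domain": trust_domain,
            "absfname": path,
            "security_class": security_class,
        })
    return security


# Proxy file names following the pattern used in the example above.
proxy_files = [
    "/home/frontend/.globus/x509_pilot%02d_cms_prio.proxy" % number
    for number in range(5, 10)
]
security = build_security(proxy_files, "cmsprio")
for credential in security.iter("credential"):
    print(ET.tostring(credential, encoding="unicode"))
```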

Run jobs under Singularity

As mentioned above, GLIDEIN_Singularity_Use determines whether or not you want to mandate the use of Singularity (via GlideinWMS) and is used both for provisioning and to match and run once the Glidein is on the site. To use Singularity you have to set it to one of REQUIRED, PREFERRED or OPTIONAL, e.g.:

<attr name="GLIDEIN_Singularity_Use" glidein_publish="True" job_publish="True" parameter="False" type="string" value="PREFERRED"/>
The Factory setting and the actual availability of Singularity and an image will also affect the actual use of Singularity. See the Factory configuration document for a table of how Singularity is negotiated with the entries using GLIDEIN_Singularity_Use and GLIDEIN_SINGULARITY_REQUIRE (the entry variable) to decide whether the Glidein can run there and should use Singularity or not.

To use Singularity you must also (in the general or group configuration):

  1. Use the default wrapper script that we provide (or a similar one). In other words a line like:
    <file absfname="/var/lib/gwms-frontend/web-base/frontend/default_singularity_wrapper.sh" wrapper="True"/>
    must be added in the <files> section of the general or group parts of the Frontend configuration. NOTE: Remember to remove your previous VO wrapper if you had one, e.g. if you were using Singularity via OSG
  2. Specify an image. You can specify one single image or a dictionary of images, SINGULARITY_IMAGES_DICT, and pick one with the variable REQUIRED_OS (set in the job or in the Frontend group). If you use SINGULARITY_IMAGES_DICT and REQUIRED_OS, the value of REQUIRED_OS must be 'any' or one of the keys used in the dictionary. To specify the image dictionary you have a couple of options:
    • 2a. Run a pre-singularity script like generic_pre_singularity_setup.sh from /var/lib/gwms-frontend/web-base/frontend/ (or a similar one you wrote for your VO). To do this, add a line such as:
      <file absfname="/var/lib/gwms-frontend/web-base/frontend/generic_pre_singularity_setup.sh" after_entry="False" executable="True" period="0" untar="False" wrapper="False"/>
      in the <files> section as before. This script sets a SINGULARITY_IMAGES_DICT with the OSG images in CVMFS for "rhel7" and "rhel6". You can select one of them by setting REQUIRED_OS to one of the keys. "rhel7" is the default.
    • 2b. You can set SINGULARITY_IMAGES_DICT directly in your group attributes by adding a line like:
      <attr name="SINGULARITY_IMAGES_DICT" glidein_publish="True" job_publish="True" parameter="False" type="string" value="rhel7:/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest,rhel6:/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el6:latest,rhel8:/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el8:latest"/>
      Then you can use REQUIRED_OS to select one of the images.
    • 2c. You can ask the Factory to set SINGULARITY_IMAGES_DICT for you and use REQUIRED_OS to select the image.
    • 2d. You can set SingularityImage in the job submit file, e.g.:
      +SingularityImage = "/cvmfs/singularity.opensciencegrid.org/PATH_NAME_TO_THE_IMAGE"
    NOTE: For all the image paths specified with the methods above, if CVMFS is mounted in a different place, CVMFS_MOUNT_DIR will be set in the environment and the Glidein will replace /cvmfs/ at the beginning of the path with the correct mount point. That is, operators and users do not have to worry about CVMFS being mounted on an unusual mount point.
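
The image selection described above can be sketched in Python as follows. This only illustrates the documented rules and is not the Glidein's actual code: the function names are hypothetical, and the handling of 'any' (prefer "rhel7", otherwise take the first image) is an assumption.

```python
def parse_images_dict(value):
    """Parse 'key:path,key:path' into a dict; paths may contain ':' (e.g. ':latest')."""
    images = {}
    for item in value.split(","):
        # Split only on the first ':' so that image tags are preserved.
        key, _, path = item.strip().partition(":")
        images[key] = path
    return images


def select_image(images, required_os, default_key="rhel7"):
    """Pick an image for REQUIRED_OS, which must be 'any' or one of the keys."""
    if required_os == "any":
        # Assumption: prefer the default key, otherwise take the first image listed.
        if default_key in images:
            return images[default_key]
        return next(iter(images.values()))
    return images[required_os]  # raises KeyError for an unknown key


def relocate_cvmfs(path, cvmfs_mount_dir=None):
    """Replace a leading /cvmfs/ with the actual mount point, if one is given."""
    if cvmfs_mount_dir and path.startswith("/cvmfs/"):
        return cvmfs_mount_dir.rstrip("/") + "/" + path[len("/cvmfs/"):]
    return path


images = parse_images_dict(
    "rhel7:/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest,"
    "rhel6:/cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el6:latest"
)
image = select_image(images, "rhel6")
print(image)
# /mnt/cvmfs is a hypothetical value of CVMFS_MOUNT_DIR.
print(relocate_cvmfs(image, "/mnt/cvmfs"))
```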

Optionally, in the general or group configuration you can also:
  • Specify bind-mounts for Singularity images using GLIDEIN_SINGULARITY_BINDPATH. These bind-mounts override the ones listed by the Factory in GLIDEIN_SINGULARITY_BINDPATH_DEFAULT. If one or more of the specified paths does not exist on the node, it will be removed from the list. Here is an example:
    <attr name="GLIDEIN_SINGULARITY_BINDPATH" const="False" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="/vo_files,/src_path:/dst_path"/>
    See the custom variables file for more information about the bind mounts.
  • Specify additional options for the singularity command, using the attribute GLIDEIN_SINGULARITY_OPTS.
The custom variables file contains a reference of all the Singularity attributes used in the Frontend, Factory or Glidein.
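
The removal of non-existing bind paths can be pictured with the following sketch. It is an illustration only: the filter_bindpaths helper is hypothetical, and it assumes that the path checked on the node is the source part of each src:dst entry.

```python
import os


def filter_bindpaths(bindpath, exists=os.path.exists):
    """Drop the entries of a comma-separated bind path list whose source is missing."""
    kept = []
    for entry in bindpath.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # For "src:dst" entries only the source path is checked (assumption).
        source = entry.split(":", 1)[0]
        if exists(source):
            kept.append(entry)
    return ",".join(kept)


# On a real node the default os.path.exists check is used.
print(filter_bindpaths("/vo_files,/src_path:/dst_path"))
```

The exists parameter is there only to make the function easy to try with a fake file system check.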

Starting a Glidein Frontend Daemon

Once you have the desired configuration file, reconfigure the frontend and start it with the commands:

gwms-frontend reconfig; systemctl start gwms-frontend

With the configuration above, all the activity messages will go into

group_*/log/frontend_info.<date>.log

while the debug, warning and error messages go into

group_*/log/frontend_debug.<date>.log

The Frontend logs are deleted after a week.

XSLT Plugins to extend configuration

This is explained in the Factory configuration documentation.

Running pre/post reconfigure hooks

You can put executable scripts in the /etc/gwms-frontend/hooks.reconfig.pre/ or the /etc/gwms-frontend/hooks.reconfig.post/ directories. These scripts will be executed every time you reconfigure the Frontend. The .pre scripts will be executed before the reconfiguration process begins. The .post scripts will be executed after the reconfiguration has been done. Scripts will be executed in /var/lib/gwms-frontend/work-dir as user frontend. Only executable scripts will be executed.
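
The hook selection can be approximated with the sketch below, which may help to check which scripts in a hooks directory would qualify. It is not the Frontend's code: the function names are hypothetical, running the scripts in sorted order is an assumption, and switching to the frontend user is not covered.

```python
import os
import subprocess


def executable_hooks(hooks_dir):
    """Return the names of the executable regular files in hooks_dir, sorted."""
    if not os.path.isdir(hooks_dir):
        return []
    return [
        name
        for name in sorted(os.listdir(hooks_dir))  # sorted order is an assumption
        if os.path.isfile(os.path.join(hooks_dir, name))
        and os.access(os.path.join(hooks_dir, name), os.X_OK)
    ]


def run_hooks(hooks_dir, work_dir):
    """Run each executable hook from work_dir, stopping at the first failure."""
    for name in executable_hooks(hooks_dir):
        subprocess.run([os.path.join(hooks_dir, name)], cwd=work_dir, check=True)


# Lists the pre-reconfig hooks that would run; empty if the directory does not exist.
print(executable_hooks("/etc/gwms-frontend/hooks.reconfig.pre"))
```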