GlideinWMS The Glidein-based Workflow Management System


Glidein Frontend

Configuration

Example Configuration

Below is an example frontend configuration XML file.
<frontend advertise_delay="5" frontend_name="vofrontend-v2_4" loop_delay="60">
<log_retention>
<process_logs>
<process_log extension="info" max_days="7.0" max_mbytes="100.0" min_days="3.0" msg_types="INFO" backup_count="5" compression="gz" />
<process_log extension="debug" max_days="7.0" max_mbytes="100.0" min_days="3.0" msg_types="DEBUG,ERR,WARN" backup_count="5" />
</process_logs>
</log_retention>
<match match_expr="True" start_expr="True">
<factory query_expr="True">
<match_attrs />
<collectors>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=factory-server.fnal.gov" comment="" factory_identity="factoryuser@factory-server.fnal.gov" my_identity="frontenduser@frontend-server.fnal.gov" node="factory-server.fnal.gov:8618" />
</collectors>
</factory>
<job comment="" query_expr="(JobUniverse==5)&amp;&amp;(GLIDEIN_Is_Monitor =!= TRUE)&amp;&amp;(JOB_Is_Monitor =!= TRUE)">
<match_attrs />
<schedds>
<schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="userpool.fnal.gov" />
<schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="schedd_jobs1@userpool.fnal.gov" />
<schedd DN="/DC=org/DC=doegrids/OU=Services/CN=userpool.fnal.gov" fullname="schedd_jobs2@userpool.fnal.gov" />
</schedds>
</job>
</match>

<monitor base_dir="/var/www/html/vofrontend/monitor" flot_dir="/opt/javascriptrrd-0.6.3/flot" javascriptRRD_dir="/opt/javascriptrrd-0.6.3/src/lib" jquery_dir="/opt/javascriptrrd-0.6.3/flot" />
<monitor_footer display_txt="Legal Disclaimer" href_link="/site/disclaimer.html" />

<security classad_proxy="/etc/grid-security/vocert.pem" proxy_DN="/DC=org/DC=doegrids/OU=Services/CN=frontend-server.fnal.gov" proxy_selection_plugin="ProxyAll" security_name="frontenduser" sym_key="aes_256_cbc">
<credentials>
<credential absfname="/tmp/x509up_u" security_class="frontend" trust_domain="OSG" auth_method="grid_proxy" vm_id="123" vm_type="type1" pool_idx_len="5" pool_idx_list="2,4-6,10" />
</credentials>
</security>
<stage base_dir="/var/www/html/vofrontend/stage" use_symlink="True" web_base_url="http://frontend-server.fnal.gov:9000/vofrontend/stage" />
<work base_dir="/opt/vofrontend" base_log_dir="/opt/vofrontend/logs" />
<attrs>
<attr name="GLIDECLIENT_Rank" glidein_publish="False" job_publish="False" parameter="True" type="string" value="1" />
<attr name="GLIDECLIENT_Start" glidein_publish="False" job_publish="False" parameter="True" type="string" value="True" />
<attr name="GLIDEIN_Expose_Grid_Env" glidein_publish="True" job_publish="True" parameter="False" type="string" value="True" />
<attr name="GLIDEIN_Glexec_Use" glidein_publish="True" job_publish="True" parameter="False" type="string" value="OPTIONAL" />
<attr name="USE_MATCH_AUTH" glidein_publish="False" job_publish="False" parameter="True" type="string" value="True" />
</attrs>
<groups>
<group name="main" enabled="True">
<config>
<idle_glideins_per_entry max="100" reserve="5" />
<idle_vms_per_entry curb="20" max="100" />
<idle_vms_total curb="500" max="1000" />
<running_glideins_per_entry max="2000" relative_to_queue="1.15" />
<running_glideins_total curb="30000" max="40000" />
</config>
<match match_expr="True" start_expr="True">
<factory query_expr="True">
<match_attrs />
<collectors />
</factory>
<job query_expr="True">
<match_attrs />
<schedds />
</job>
</match>
<security>
<credentials />
</security>
<attrs />
<files />
</group>
</groups>
<files>
<file absfname="/opt/script/testSW.sh" after_entry="True" after_group="False" const="True" executable="True" untar="False" wrapper="False" />
</files>
<collectors>
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector.fnal.gov" node="usercollector.fnal.gov" secondary="False" group="default" />
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector.fnal.gov" node="usercollector.fnal.gov:9620-9819" secondary="True" group="default" />
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector2.fnal.gov" node="usercollector2.fnal.gov" secondary="False" group="ha" />
<collector DN="/DC=org/DC=doegrids/OU=Services/CN=usercollector2.fnal.gov" node="usercollector2.fnal.gov:9620-9919" secondary="True" group="ha" />
</collectors>
<config>
<idle_vms_total curb="800" max="1200" />
<idle_vms_total_global curb="1000" max="1500" />
<running_glideins_total curb="35000" max="45000" />
<running_glideins_total_global curb="50000" max="60000" />
</config>
</frontend>
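The nesting of global and group sections is easier to see programmatically. The sketch below is an illustration only, not a GlideinWMS tool: it parses a trimmed copy of the example above (embedded as a string) with Python's standard xml.etree.ElementTree and collects each enabled group's limits from its <config> section. The helper name `summarize` is hypothetical. It also shows that a standard XML parser requires && inside an attribute value to be written as &amp;&amp;.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example configuration above (illustration only).
FRONTEND_XML = """<frontend frontend_name="vofrontend-v2_4" loop_delay="60">
  <match match_expr="True" start_expr="True">
    <job query_expr="(JobUniverse==5)&amp;&amp;(JOB_Is_Monitor =!= TRUE)"/>
  </match>
  <groups>
    <group name="main" enabled="True">
      <config>
        <idle_glideins_per_entry max="100" reserve="5"/>
        <running_glideins_per_entry max="2000" relative_to_queue="1.15"/>
      </config>
    </group>
  </groups>
  <config>
    <running_glideins_total curb="35000" max="45000"/>
  </config>
</frontend>"""


def summarize(xml_text):
    """Map each enabled group name to {limit_tag: {attribute: value}}."""
    root = ET.fromstring(xml_text)
    summary = {}
    for group in root.findall("./groups/group"):
        if group.get("enabled") != "True":
            continue
        config = group.find("config")
        limits = {} if config is None else {c.tag: dict(c.attrib) for c in config}
        summary[group.get("name")] = limits
    return summary


if __name__ == "__main__":
    root = ET.fromstring(FRONTEND_XML)
    # The parser decodes &amp;&amp; back into the && that HTCondor expects.
    print(root.find("./match/job").get("query_expr"))
    print(summarize(FRONTEND_XML))
```

Note that the global <config> section at the end is not picked up by `./groups/group`; global and group limits live in different places in the tree.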

The Glidein Frontend Configuration

The Glidein Frontend configuration involves creating the configuration directory and files and then creating the daemons. As in the Glidein Factory setup, a configuration tool converts an XML file into a configuration tree.

For the installer to create the Glidein Frontend instance from the configuration directory and grid mapfile, the following objects can be defined:

Other attributes can be specified as well. They are used in VO Frontend matchmaking and job matchmaking. The format is similar to that of the attributes in the Factory config file. The attributes of the <attrs ... > tag are described below.

Refer to the Custom Condor Variables section for the list of available variables.

  • name: Name of the attribute.
  • value: Value of the attribute.
  • parameter: Set to True if the attribute should be passed as a parameter. If set to False, the attribute will be put in the staging area to be accessed by the glidein startup scripts. Always set this to True unless you know what you are doing.
  • glidein_publish: If set to True, the attribute will be available in the condor_startd's classad. Used only if parameter is True.
  • job_publish: If set to True, the attribute will be available in the user job's environment. Used only if parameter is True.
  • comment: A description of the attribute.
  • type: Type of the attribute. Supported types are 'int', 'string' and 'expr'. Type expr is equivalent to a Condor constant/expression in condor_vars.lst.
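As a hedged illustration of the rules above, here is a hypothetical checker (`check_attr` is not part of GlideinWMS) that flags an unsupported type, a non-integer value for type int, boolean flags that are not exactly True or False (a stray trailing space is easy to miss), and publish flags set while parameter is not True. The sample XML is made up for this example.

```python
import xml.etree.ElementTree as ET

SUPPORTED_TYPES = {"int", "string", "expr"}  # types listed in the attribute description
BOOLEAN_FLAGS = ("parameter", "glidein_publish", "job_publish")


def check_attr(attr):
    """Return a list of warnings for one <attr> element (hypothetical checker)."""
    name = attr.get("name", "<unnamed>")
    warnings = []
    attr_type = attr.get("type")
    if attr_type not in SUPPORTED_TYPES:
        warnings.append(f"{name}: unsupported type {attr_type!r}")
    elif attr_type == "int":
        try:
            int(attr.get("value", ""))
        except ValueError:
            warnings.append(f"{name}: value is not an integer")
    # Boolean flags must be exactly "True" or "False" (no stray whitespace).
    for flag in BOOLEAN_FLAGS:
        value = attr.get(flag)
        if value is not None and value not in ("True", "False"):
            warnings.append(f"{name}: {flag}={value!r} is not 'True' or 'False'")
    # The publish flags are only used when parameter is True.
    if attr.get("parameter") != "True":
        for flag in ("glidein_publish", "job_publish"):
            if attr.get(flag) == "True":
                warnings.append(f"{name}: {flag} is ignored because parameter is not True")
    return warnings


# Hypothetical sample with two deliberately faulty entries.
SAMPLE = ET.fromstring(
    "<attrs>"
    '<attr name="GLIDECLIENT_Rank" parameter="True" glidein_publish="False"'
    ' job_publish="False " type="string" value="1"/>'
    '<attr name="MY_LIMIT" parameter="False" job_publish="True" type="int" value="abc"/>'
    "</attrs>"
)

if __name__ == "__main__":
    for attr in SAMPLE.findall("attr"):
        for warning in check_attr(attr):
            print(warning)
```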

An example attribute would be:

<attrs><attr name="GLIDEIN_Collector" value="mymachine.mydomain" type="string" job_publish="True" parameter="True" const="True" glidein_publish="True" comment="Just a test attribute"/></attrs>

The following group parameters are used to configure multiple frontend groups. If only one group is specified, its settings apply to the whole frontend. The objects specified are used for creating and monitoring glideins. Groups collect users with similar requirements, such as proxies, criteria for matching job requirements with sites, and glidein configuration.

Adding Custom Code/Scripts to Glidein Frontend Glideins

You can add custom scripts to glideins created for this Glidein Frontend by adding scripts and files to the configuration in the files section:
<frontend>
 [<groups><group>]
  <files>
   <file absfname="script name" executable="True" comment="comment"/>
  </files>
 [</group></groups>]
</frontend>

The script will be copied to the Web-accessible area and added to the glidein's file_list, and when a glidein starts, the glidein startup script will pull it and execute it. If any parameters are needed, they can be specified using <attr />.

For more detailed information, see the page dedicated to writing custom scripts.

You can also create wrapper scripts or tarballs of files; see the factory configuration page for the syntax, using the groups/group tags instead of the factory's entry tag. The variable name specified by absdir_outattr will be prefixed with either GLIDECLIENT_ or GLIDECLIENT_GROUP_, depending on whether the file is defined globally or within a group.

Match Expressions and Match Attributes

Several sections in the configuration allow a match expression. Each of these sections allows an expression to be evaluated to determine where glideins and jobs should be matched.
For example, the frontend can use expressions as a whitelist to control where glideins are submitted. Expressions can also give a Condor expression that specifies where jobs can run, or which glidein sites can run jobs.
In most cases there are two ways to restrict matching. The match_expr clauses, such as <match match_expr>, use Python expressions, as explained below. Others, such as <factory query_expr> and <job query_expr>, use Condor ClassAd expressions; only valid Condor expressions can be used there, and Python expressions cannot be evaluated in these contexts.
Note that for some tags (like factory query_expr) you can specify expressions both in the default global section and in individual group sections. Take special care when doing this to make sure the expressions are correct, as the expressions are typically "AND"-ed together.
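To make "AND-ed together" concrete, the hypothetical helper below (not GlideinWMS's code) joins a global and a group query_expr the way one would by hand: each operand is parenthesized and joined with the ClassAd && operator. The second usage shows why care is needed: a global and a group expression that each select a different site combine into an expression no entry can satisfy.

```python
def and_expressions(*exprs):
    """AND together ClassAd expressions, skipping empty ones.

    Parenthesizing each operand keeps an || inside one expression
    from changing how the combined expression groups.
    """
    parts = [f"({e.strip()})" for e in exprs if e and e.strip()]
    return " && ".join(parts) if parts else "True"


if __name__ == "__main__":
    print(and_expressions('(GLIDEIN_Site=!=UNDEFINED)', 'GLIDEIN_Site=="FNAL"'))
    # Conflicting global and group selections: nothing can match both.
    print(and_expressions('GLIDEIN_Site=="FNAL"', 'GLIDEIN_Site=="UCSD"'))
```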

Each match expression is a Python expression that will be evaluated. Matches can be scoped either globally (<frontend><match>) or to a specific group (<frontend><groups><group><match>).

Each Python expression will typically be a series of boolean tests, surrounded by parentheses and connected by the boolean operators "and", "or", and "not". You can use several dictionaries in these match expressions. The "job" dictionary contains the classad of the job being matched, and the "glidein" dictionary contains information about the factory (entry point) classad. There is also an "attr_dict" dictionary that can reference attributes defined in frontend.xml. While an extensive list of everything you can use in these expressions is out of scope, some examples are below:

  • ("ImageSize" in job): Returns true if the job classad has the attribute "ImageSize".
  • (job["NumJobStarts"]>5): Returns true if the job classad attribute "NumJobStarts" is greater than 5.
  • ("GLIDEIN_Retire_Time" in glidein["attrs"]): Returns true if the factory entry classad has the attribute "GLIDEIN_Retire_Time".
  • (glidein["attrs"]["GLIDEIN_Retire_Time"]>21600): Returns true if the factory entry classad's "GLIDEIN_Retire_Time" is greater than 21600.
  • (attr_dict["NUM_USERS_ALLOWED"]>0): Returns true if frontend.xml defines an attribute NUM_USERS_ALLOWED whose value is greater than zero.
  • (job["Owner"] in attr_dict["ITB_Users"].split(",")): Returns true if the job classad attribute "Owner" is in the comma-delimited string attribute ITB_Users (which would be defined in frontend.xml).
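The sketch below evaluates these example expressions with Python's built-in eval against plain dictionaries, to show how the job, glidein and attr_dict names are used. The dictionary contents are made-up sample values, and restricting __builtins__ is a choice made here for illustration; the frontend's actual evaluation code may differ. Because dict.has_key exists only in Python 2, the membership tests are written with the in operator, which works in both Python 2 and 3.

```python
# Hypothetical sample data standing in for a job classad, a factory entry
# classad, and attributes defined in frontend.xml.
job = {"ImageSize": 250000, "NumJobStarts": 7, "Owner": "alice"}
glidein = {"attrs": {"GLIDEIN_Retire_Time": 28800}}
attr_dict = {"NUM_USERS_ALLOWED": 3, "ITB_Users": "alice,bob"}

EXAMPLES = [
    '("ImageSize" in job)',
    '(job["NumJobStarts"]>5)',
    '("GLIDEIN_Retire_Time" in glidein["attrs"])',
    '(glidein["attrs"]["GLIDEIN_Retire_Time"]>21600)',
    '(attr_dict["NUM_USERS_ALLOWED"]>0)',
    '(job["Owner"] in attr_dict["ITB_Users"].split(","))',
]


def evaluate(expr, job, glidein, attr_dict):
    """Evaluate one match expression with only the three dictionaries in scope."""
    names = {"job": job, "glidein": glidein, "attr_dict": attr_dict}
    return bool(eval(expr, {"__builtins__": {}}, names))


if __name__ == "__main__":
    for expr in EXAMPLES:
        print(expr, "->", evaluate(expr, job, glidein, attr_dict))
```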

Each attribute used in a match expression should be declared in the corresponding match_attrs section; this makes the classad attribute available to the match expression. Attributes can be made available from:

  1. The factory classad: (<match><factory><match_attrs><match_attr>)
  2. The job classad: (<match><job><match_attrs><match_attr>)

Each match_attr tag must contain a name. This is the name of the attribute in the appropriate classad.
It must also contain a type which can be one of the following:

  1. string: A constant string of letters, numbers, or characters.
  2. int: An integer: a positive or negative number, or zero.
  3. real: A real number, which may have decimal places.
  4. bool: Either "True" or "False".
  5. expr: A ClassAd expression.
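Forgetting a match_attrs declaration is an easy mistake. The hypothetical helper below (not part of GlideinWMS) uses regular expressions to find attribute names accessed as job["X"], job.get("X"), glidein["attrs"]["X"] or glidein["attrs"].get("X") in a match_expr and reports those that are not declared. It is a rough approximation: other access forms, such as "X" in job, are not detected.

```python
import re

# Only the literal access forms used in the examples on this page are recognized.
JOB_REF = re.compile(r"""\bjob(?:\.get\(|\[)\s*["'](\w+)["']""")
GLIDEIN_REF = re.compile(r"""\bglidein\[["']attrs["']\](?:\.get\(|\[)\s*["'](\w+)["']""")


def undeclared(match_expr, factory_attrs, job_attrs):
    """Return (factory, job) attribute names used in match_expr but not declared."""
    used_glidein = set(GLIDEIN_REF.findall(match_expr))
    used_job = set(JOB_REF.findall(match_expr))
    return used_glidein - set(factory_attrs), used_job - set(job_attrs)


# The match_expr from the example that follows.
EXPR = ('glidein["attrs"].get("GLIDEIN_Site") in ((job.get("DESIRED_Sites") != None)'
        ' and job.get("DESIRED_Sites").split(","))')

if __name__ == "__main__":
    print(undeclared(EXPR, ["GLIDEIN_Site"], ["DESIRED_Sites"]))  # nothing missing
    print(undeclared(EXPR, [], []))  # both attributes missing
```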

Example

<match match_expr='glidein["attrs"].get("GLIDEIN_Site") in ((job.get("DESIRED_Sites") != None) and job.get("DESIRED_Sites").split(","))'>
<factory query_expr="(GLIDEIN_Site=!=UNDEFINED)">
<match_attrs> <match_attr name="GLIDEIN_Site" type="string"/> </match_attrs>
<collectors> </collectors>
</factory>
<job query_expr="(DESIRED_Sites=!=UNDEFINED)">
<match_attrs> <match_attr name="DESIRED_Sites" type="string"/> </match_attrs>
<schedds> </schedds>
</job>
</match>
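The match_expr in this example can be tried out in plain Python. The sketch below, using made-up entry and job dictionaries, evaluates the exact expression with eval. It also shows why the job query_expr guard (DESIRED_Sites=!=UNDEFINED) matters: for a job without DESIRED_Sites, the parenthesized part evaluates to False, and testing membership in False raises a TypeError in Python. Note also that split(",") does not strip spaces, so a value such as "UCSD, FNAL" does not match FNAL.

```python
# The match_expr from the example above, verbatim.
MATCH_EXPR = ('glidein["attrs"].get("GLIDEIN_Site") in '
              '((job.get("DESIRED_Sites") != None) and job.get("DESIRED_Sites").split(","))')


def matches(job, glidein):
    """Evaluate MATCH_EXPR for one job/entry pair (illustration only)."""
    return eval(MATCH_EXPR, {"__builtins__": {}}, {"job": job, "glidein": glidein})


# Hypothetical factory entry published for the FNAL site.
fnal_entry = {"attrs": {"GLIDEIN_Site": "FNAL"}}

if __name__ == "__main__":
    print(matches({"DESIRED_Sites": "UCSD,FNAL"}, fnal_entry))   # True
    print(matches({"DESIRED_Sites": "UCSD"}, fnal_entry))        # False
    print(matches({"DESIRED_Sites": "UCSD, FNAL"}, fnal_entry))  # False: " FNAL" != "FNAL"
    try:
        matches({}, fnal_entry)  # a job without DESIRED_Sites
    except TypeError as err:
        print("TypeError:", err)
```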

Example

Glideins can also use "start_expr" to make sure the correct jobs start on pilots. This is a Condor expression evaluated by the condor_startd running in the pilot. Here is an example:
<match match_expr='glidein["attrs"].get("GLIDEIN_Site") in ((job.get("DESIRED_Sites") != None) and job.get("DESIRED_Sites").split(","))' start_expr='(stringListMember(GLIDEIN_Site,DESIRED_Sites,",")=?=True)'>
...
</match >
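To reason about which jobs this start_expr accepts, the sketch below approximates HTCondor's stringListMember in Python. It is an approximation written for this page, not HTCondor's implementation: it assumes each character of the delimiter argument separates list items, that surrounding whitespace is ignored, and that comparison is case-sensitive. A missing attribute is modeled as None (standing in for UNDEFINED); because =?= True is satisfied only by an actual True, a job without DESIRED_Sites does not start.

```python
import re


def string_list_member(item, string_list, delimiters=", "):
    """Approximate ClassAd stringListMember; None stands in for UNDEFINED."""
    if item is None or string_list is None:
        return None
    # Treat every character in `delimiters` as a separator (an assumption).
    tokens = re.split("[" + re.escape(delimiters) + "]", string_list)
    return item in [t.strip() for t in tokens if t.strip()]


def start_expr(glidein_site, desired_sites):
    """Mimic (stringListMember(GLIDEIN_Site,DESIRED_Sites,",")=?=True)."""
    # =?= True holds only for a real True, never for UNDEFINED (None here).
    return string_list_member(glidein_site, desired_sites, ",") is True


if __name__ == "__main__":
    print(start_expr("FNAL", "UCSD,FNAL"))   # True
    print(start_expr("FNAL", "UCSD, FNAL"))  # True: this sketch strips whitespace
    print(start_expr("FNAL", "UCSD"))        # False
    print(start_expr("FNAL", None))          # False: DESIRED_Sites undefined
```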

Using Multiple Proxies

Why would you want to use a pool of pilot proxies instead of a single one?

If your VO maps to a single group account at the remote grid sites, you wouldn't: a pool of pilot proxies (try saying that 5 times fast) does not gain you anything. If your VO maps to a pool of accounts at remote grid sites, you should consider using a pool of proxies equal in size to the number of users you have. Why?

Consider the following scenario: Alice, Bob, and Charlie are all in the FUNGUS experiment and form a VO. They are using GlideinWMS. Alice sends 1000 jobs to FNAL via their GlideinWMS using a single pilot proxy. The pilots map to their userid fungus01 at FNAL, and in accordance with the batch system's fair-share policies, the job priority for user fungus01 is decreased significantly.

Bob comes along and submits 1000 jobs via GlideinWMS, while Charlie submits 1000 jobs under his own proxy and not using GlideinWMS. The GlideinWMS pilots launch for Bob, and map to fungus01. Charlie launches his own jobs that get mapped to fungus02. Relative to fungus02, fungus01 priority is terrible, and Bob's jobs sit around waiting for Charlie -- even though Bob didn't occupy the FNAL resources, Alice did!

The solution: have a pool of pilot proxies. We then spread the fair-share penalty amongst fungus01, fungus02, and fungus03, and Bob now can compete on a more equal footing with Charlie and Alice.

Configuring multiple proxies

Proxies can be specified in the <security><credentials><credential> tags. Multiple credential tags can be entered, one for each proxy file. They can be placed in the security section at the top of the XML, in which case the proxies are shared by all groups. They can also be placed within <group> tags, in which case they are used only by that group.

One example follows:

<security>
<credentials>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot05_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot06_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot07_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot08_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot09_cms_prio.proxy" security_class="cmsprio"/>
</credentials>
</security>
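To make the Alice, Bob and Charlie scenario concrete, the sketch below reads the proxy paths from a shortened credentials block like the one above and pins each user to one proxy, cycling through the pool. This is a hypothetical policy written for illustration, assuming each proxy maps to a different account at the site; it is not how GlideinWMS selects proxies, which is governed by the plugin named in proxy_selection_plugin in the example configuration. Pinning each user to one proxy means that user's usage, and so the fair-share penalty, lands on a single remote account.

```python
import xml.etree.ElementTree as ET
from itertools import cycle

# Shortened copy of the credentials block above.
CREDENTIALS_XML = """<security><credentials>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot05_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot06_cms_prio.proxy" security_class="cmsprio"/>
<credential type="grid_proxy" trust_domain="OSG" absfname="/home/frontend/.globus/x509_pilot07_cms_prio.proxy" security_class="cmsprio"/>
</credentials></security>"""


def assign_proxies(users, xml_text):
    """Pin each user to one proxy file, cycling through the pool (hypothetical policy)."""
    proxies = [c.get("absfname") for c in ET.fromstring(xml_text).iter("credential")]
    if not proxies:
        raise ValueError("no <credential> entries found")
    pool = cycle(proxies)
    # Sorting gives the same assignment for the same set of users.
    return {user: next(pool) for user in sorted(users)}


if __name__ == "__main__":
    for user, proxy in assign_proxies({"alice", "bob", "charlie"}, CREDENTIALS_XML).items():
        print(user, "->", proxy)
```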

Starting a Glidein Frontend Daemon

Once you have the desired configuration file, change to the VO Frontend directory and launch the command:

./frontend_startup start

With the configuration above, all the activity messages will go into

group_*/log/frontend_info.<date>.log

while the debug, warning and error messages go into

group_*/log/frontend_debug.<date>.log

The frontend logs are deleted after a week.

XSLT Plugins to extend configuration

This is explained in the factory configuration documentation.