GlideinWMS The Glidein-based Workflow Management System

Search Results

WMS Factory

WMS Collector and Factory Installation

1. Description

The glidein Factory node will be the Condor Central Manager for the WMS, i.e. it will run the Condor Collector and Negotiator daemons, but it will also act as a Condor Submit node for the glidein factory, running Condor schedds used for Grid submission.

On top of that, this node also hosts the glidein factory daemons. The glidein Factory is also responsible for the base configuration of the glideins (although part of the configuration comes from the Glidein frontend).

Note: The WMS collector and factory must be installed on the same node.

2. Hardware requirements

Installation Size CPUs Memory Disk
Small 1 1GB ~10GB
Large 4 - 8 2GB - 4GB 100+GB

A major installation, serving tens of sites and several thousand glideins will require several CPUs (recommended 4-8: 1 for the Condor damons, 1-2 for the glidein factory daemons and 2 or more for Condor-G schedds) and a reasonable amount of memory (at least 2GB, 4GB for a large installation to provide some disk caching).

The disk needed is for binaries, config files, log files and Web monitoring data (For just a few sites, 10GB could be enough, larger installations will need 100+GB to maintain a reasonable history). Monitoring can be pretty I/O intensive when serving many sites, so get the fastest disk you can afford. An SSD disk with sufficient space is recommended for good performance of a large production factory. A good estimate is that about 100GB will be needed for a factory with multiple frontends. If an SSD is not possible, a RAMDISK could possibly be used, but this can cause difficulties and hassle in configuration and synchronization.

It must must be on the public internet, with at least one port open to the world; all worker nodes will load data from this node trough HTTP. Note that worker nodes will also need outbound access in order to access this HTTP port.

3. Needed software

3.1 Software List

Software Notes Install Before glideinWMS
Linux OS A reasonably recent Condor-supported OS Linux OS (RH/SL4 and RH/SL5 tested at press time). X
Python interpreter v2.3.4 or above X
The perl-Time-HiRes rpm. This rpm may already be included in perl, depending on the perl version X
The OSG client software. This can be installed prior to glideinWMS, but the installer can install it inline with the glideinWMS install  
A HTTP server, like Apache or TUX. This should be installed prior to glideinWMS (see below). This port will need open access to the internet, so worker nodes can download data from this service. X
The Condor distribution as a tarball. The installer will use the tarball to install and configure Condor inline  
The RRDTool package v1.2.18 or later (see below) X
The M2Crypto python library v0.17 or later (see below) X
The javascriptRRD package v0.6.3 or later with flot (see below) X
The glideinWMS software.    

NOTE:

  • Condor version v7.3.1 has a known issue with incorrect return/exit codes of condor_status and condor_q
  • If you are using Condor version v7.3.2 disable VOMS checking in condor_config file used by Condor daemons other than that used by user schedd. VOMS checking adds unrequired overhead. To do so, set
    USE_VOMS_ATTRIBUTES = False
    or for individual condor daemons like collector
    COLLECTOR.USE_VOMS_ATTRIBUTES = False

3.2 HTTP Server

The glidein Factory needs a HTTP server, like Apache or TUX. The server should be installed on the same node, but a different node can be used as long as the web area is writable from this one. Servers often come pre-installed with HTTP server software, so if you have one running, just reuse it. Otherwise, the installer can help you install one (as root). This will need to be installed on a port with access to the internet, so worker nodes can download data from this service. (See GlideinWMS Component Install)

3.3 RRDTool

You will also need the python module for RRDTool (v1.2.18 or later). Many systems come with packages for it; if possible use that. Otherwise see additional install notes for alternative installs.

3.4 M2Crypto

You will need the M2Crypto python library. A few systems include it in the software distribution; if possible install the system one. Otherwise see additional install notes for alternative installs.

3.5 javascriptRRD

You will need the javascriptRRD package. It contains the javascript libraries needed by the monitoring. Just download the tarball (with flot), and untar it. You will need to point the installer to this directory.

4. Before you begin...

4.1 Required Users

The installer will ask you for several non-privileged users during the install process. These should be created prior to running the glideinWMS installer.

User Notes
WMS Collector - Condor User If privilege separation is not used, then install as the same user as the Factory
Factory The Factory will always be installed as a non-prvileged user, whether or not privilege separation is being used.
One user per VO Fontent (See notes) If you are using privilege separation, you will need a user for each VO frontend that will be communicating with the Factory. Otherwise, no new users need to be created for the frontends.

4.2 Required Certificates/Proxies

Each service in the GlideinWMS will use a x509 certificate in order to identify itself using GSI authentication (see the Quick Reference Guide" for an overview. The installer will ask for several DNs for GSI authentication. You have the option of using a service certificate or a proxy. These should be created and put in place before running the installer. The following is a list of DNs the installer will ask for:

  • WMS Collector cert/proxy DN
  • Glidein Frontend proxy DN (cannot use a cert here)
Note: In some places the installer will also ask for nicknames to go with the DNs. For the most part the name given doesn't really matter. There is one case where is does matter. If you are using privilege separation, then, on the WMS Collector, the nickname for each Glidein Frontend must be the username that you created for the frontend.
Note 2: The installer will ask if these are trusted Condor Daemons. Answer 'y'.

4.3 Required Directories

When installing the Factory you will be presented with a question asking for the directory location for various items. The example below puts many of them in /var. All the directories in /var have to be created as root. Therefore, if you intend on using /var, you will have to create the directories ahead of time.

Due to new restrictions on the directory permissions, it is no longer recommended that you install glideinWMS into the /home directory of the user.

Note: The web data must be stored in a directory served by the HTTP Server.

Example:

Where will you host your config files?: [/var/gfactory/glideinsubmit] /home/gfactory/glideinsubmit
Where will you host your log files?: [/var/gfactory/glideinlogs] /var/gfactory/glideinlogs
Where will you host the client log files?: [/var/gfactory/clientlogs] /var/gfactory/clientlogs
Where will you host the client proxies files?: [/var/gfactory/clientproxies]/var/gfactory/clientproxies
Where will the web data be hosted?: [/var/www/html/glidefactory] /var/www/html/glidefactory

4.4 Miscellaneous Notes

At some point the installer will prompt you for the OSG VDT Client location or if you want to install it. The installer will install the client for you. (See GlideinWMS Component Install)

When asked if you want OSG_VDT_BASE defined globally? Answer 'y' unless you want to force your users to find and hard code the location.

By default, match authentication will be used. If you have a reason not to use it, be sure to set to False the USE_MATCH_AUTH attribute in both in both the Factory and Frontend configuration files.

5. glideinWMS Collector installation guide

If you are installing privilage separation, you need to install glideinWMS Schedds and Collector as root. Otherwise, they can be installed as a non-privileged user. If installing as a non-priveleged user, it is recommended that both the WMS Collector and the Factory share the same credentials. In either case, the Collector needs access to GSI credentials.

Move into

glideinWMS/install

and execute

./glideinWMS_install

You will be presented with the service selection screen. Choose the glideinWMSSchedds and Collector and follow the instructions. Additional information as well as a sample install walk-through is below.

Field Installation Text Description
User Which user should Condor run under? If not using privilege separation, this user should be the same user as the factory (see Required Users). Otherwise, the collector should be run as root.
Condor tarball Where do you have the Condor tarball? As part of the required software, you will need to download the condor distribution.
WMS Collector location Where do you want to install it?: [/opt/glidecondor] This is the directory where the installer will put the Condor Collector (And Schedd) used to submit glideins to the grid gatekeepers. It will create a condor.sh (or .csh) in this directory that will set environment variables. It is important to always source the correct condor.sh before starting/stopping/querying condor. There are two different condor pools in a glideinWMS installation, so it is very important to make sure your shell environment has the correct variables and paths before issuing condor commands.
It is recommended to avoid putting this in a user home directory, as permission requirements (especially with privilege separation) can cause problems.
Privilege separation Privilege separation is needed to securely support multiple frontends. Do you want to install it?: (y/n) [y]
What is the factory username:
List the usernames the factroy will use to separate frontends from one another.
The glideinWMS currently only supports the factory and the glideinWMS collector installed on the same machine. You must specify the user for the factory here to be a valid caller uid. You must specify each username needed for VO frontends to be a valid target uid. This will also populate the /etc/condor/privsep_config file for Condor privilege separation with the following values:
  • valid-caller-uids and valid-caller-gids. These lists specify users and groups that will be allowed to request action from the PrivSep Kernel. The list typically will contain the UID and primary GID that the Condor daemons will run as.
  • valid-target-uids and valid-target-gids. These lists specify the users and groups that Condor will be allowed to act on behalf of. The list will need to include IDs of all users and groups that Condor jobs may use on the given execute machine.
Condor's PrivSep kernel prevents users than the above from calling or executing condor jobs, so the factory and VO users need to be explicitly authorized.
Factory locations Where will the factory store its config files?
Where will the factory store its log files?
Where will the factory store the client log files?
Where will the factory store the client proxies?
The factory directories will be where you install the factory (in the next step). The client log and proxy locations will be authorized by the PrivSep to be written to by condor daemons. If using privilege separations, they need to be owned by root.
As these directories need to be world-readable, and the client log and proxy location have permission requirements if using privilege separation, these should generally not be located in a user home directory.
GSI Security Where can I find the directory with the trusted CAs?
... Please insert all such DNs, together with a user nickname.
GSI security is based on x509 certificates. First, you will need a list of trusted certificates. VDT comes with a list of certificates, so, if you install that now (or have installed it previously), you can install that now. Note that you may have to update your certificates if you have an old VDT installation.
You will next need a certificate or proxy for the WMS collector. See the previous section for more information on required certificates and proxies.
If you are using privilege separation, then, on the WMS Collector, the nickname for each Glidein Frontend must be the username that you created for the frontend.
The installer will then configure the condor_mapfile (located in the certs directory for each condor. install). See the Quick reference for more information.
Collector Configuration What name would you like to use for this pool?
What port should the collector be running?
How many secondary schedds do you want?
Here is where you give an identifier and port for the collector process. The collector typically runs on port 9618. Schedd and other daemons do not have a set port and just use ports in a range determined by the collector.
You can also choose how many schedd daemons the master process will start by default. The proper value of this value depends on many factors, including the memory and CPU of the server running it as well as the number of jobs submitted and the number of entry points. This should be increased enough so that each schedd will not have to handle multiple glidein requests to different entry points from factories simultaneously. The default install of 10 schedds should be enough to handle a site with around 10000 jobs. If you are only running hundreds of jobs, you may want to tune this down. Conversely, with higher amounts of jobs, this may need to be increased. This value depends on your installation and can later be tuned based on load and average number of jobs.

Here a possible set of answers is presented; your setup will probably be slightly different:

Welcome to the glideinWMS Installation Helper

What do you want to install?
(May select several options at one, using a , separated list)
[1] glideinWMS Schedds and Collector
[2] Glidein Factory
[3] GCB
[4] User Pool Collector
[5] User Schedd
[6] Condor for Glidein Frontend
[7] Glidein Frontend
[8] Components
Please select: 1

The following profiles will be installed:
[1] glideinWMS Schedds and Collector

Installing WMS Schedds and Collector

Installing condor

Which user should Condor run under?: [condor] gfactory

You will now need the Condor tarball
You can find it on http://www.cs.wisc.edu/condor/
Versions v7.2.2 and 7.3.1 have been tested, but you
should always use the latest one

Where do you have the Condor tarball? /home/gfactory/downloads/condor-7.4.2-linux-x86_64-rhel5-dynamic.tar.gz
Checking...
Seems condor version 7.4.2

Where do you want to install it?: [/opt/glidecondor] /home/gfactory/glidecondor
Directory '/home/gfactory/glidecondor' does not exist, should I create it?: (y/n) y

Installing condor in '/home/gfactory/glidecondor'

If something goes wrong with Condor, who should get email about it?: admin@my.org
Extracting from tarball
Running condor_configure
Installing Condor from /home/gfactory/glidecondor/tar/condor-7.4.2 to /home/gfactory/glidecondor

Condor has been installed into:
    /home/gfactory/glidecondor

Configured condor using these configuration files:
  global: /home/gfactory/glidecondor/etc/condor_config
  local:  /home/gfactory/glidecondor/condor_local/condor_config.local
You should look inside the installation log for some details about how
Condor was installed.
Created scripts which can be sourced by users to setup their
Condor environment variables.  These are:
   sh: /home/gfactory/glidecondor/condor.sh
  csh: /home/gfactory/glidecondor/condor.csh

Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
Condor installed

Privilege separation is needed to securely support multiple frontends.
Do you want to install it?: (y/n) [y]y
A privsep config (/etc/condor/privsep_config) is already in place. Do you want to recreate it?: (y/n) y
What is the factory username: gfactory

List the usernames the factroy will use
to separate frontends from one another.
An empty username entry means you are done.
Username: frontenduser1

Username: frontenduser2
Username:

Directories needed by the factory will be given special
treatment to ease administration.
Where will the factory store its config files?[/var/gfactory/glideinsubmit] /home/gfactory/glideinsubmit
Directory '/home/gfactory/glideinsubmit' does not exist, should I create it?: (y/n) y
Where will the factory store its log files?[/var/gfactory/glideinlogs] /var/gfactory/glideinlogs
Where will the factory store the client log files?[/var/gfactory/clientlogs] /var/gfactory/clientlogs
Directory '/var/gfactory/clientlogs' does not exist, should I create it?: (y/n) y

Where will the factory store the client proxies?[/var/gfactory/clientproxies] /var/gfactory/clientproxies
Directory '/var/gfactory/clientproxies' does not exist, should I create it?: (y/n) y

Privilege separation setup completed

Configuring GSI security

GSI security relies on a list of trusted CAs
Where can I find the directory with the trusted CAs?

Do you want to get it from VDT?: (y/n) y
Do you have already a VDT installation?: (y/n) y
Where is the VDT installed?: /home/gfactory/vdt
Using VDT installation in /home/gfactory/vdt

To use the GSI security for WMS Collector, you either need
a valid GSI proxy or a valid x509 certificate and relative key.
Its subject (i.e. DN) will be added as the trusted daemon
in the condor configuration.

Will you be using a proxy or a cert? (proxy/cert) proxy
Where is your proxy located?: /home/condor/security/grid_proxy.wmspool
My DN = '/DC=org/DC=doegrids/OU=Service/CN=gfactory/gfactory1.my.org'


You will most probably need other DNs in the condor grid mapfile.
The Glidein Frontend(s) will be contacting the WMS Collector
and will interact as daemons. Their subjects (i.e. DNs)
will most likely be needed.

Please insert all such DNs, together with a user nickname.
An empty DN entry means you are done.
DN: /DC=org/DC=doegrids/OU=Services/CN=frontend1.my.org
nickname: [condor001] vofrontend1
Is this a trusted Condor daemon?: (y/n) y
DN: /DC=org/DC=doegrids/OU=Services/CN=frontend2.my.org

nickname: [condor002] vofrontend2
Is this a trusted Condor daemon?: (y/n) y
DN:
What name would you like to use for this pool?: [My glideinWMS pool] GlideinWMSPool
What port should the collector be running?: [9618] 9618
How many secondary schedds do you want?: [9] 9
[...]

The installer will also start the Condor daemons. To stop the Condor daemons, issue

killall condor_master

To start them again:

cd <install dir>; ./start_condor.sh

6. Glidein Factory installation guide

The glidein Factory needs a x509 proxy to communicate with the rest of the world. You have the option of giving the Factory its own proxy, or having the Glidein Frontend serve up the proxy. Standard configuration has the Glidein Frontend serving the proxy. If the Factory will have its own proxy, you need to create such proxy before instantiating a glidein Factory and then keep it valid for the life of the factory. If used for job submission, this proxy must at any point in time have a validity of at least the longest expected job being run by the glideinWMS (and not less than 12 hours).
How you keep this proxy valid (via MyProxy, kx509, voms-proxy-init from a local certificate, scp from other nodes, or other methods), is beyond the scope of this document.

Note that the OSG client is may be installed on this node. Be careful to have not source the VDT setup script prior to the installation. This can causel problems with conflicting python versions. VDT sometimes installs its own version of python which will not have rrdtool or M2Crypto available. You can determine if this will be a problem by sourcing the VDT setup script and then doing a 'type python' or 'which python' to see which python is being used.

The glidein factory itself should be installed as a non privileged user. The provided installer can be used to create the configuration file, although some manual tunning will probably be needed.

Move into

glideinWMS/install

and execute

./glideinWMS_install

You will be presented with the service selection screen. Follow the instructions and install all the software components. Additional information about the options is below:

Field Installation Text Description
JavascriptRRD Where is javascriptRRD installed? As part of the required software, you will need to javascriptRRD..
GSI Proxy Do you want to use such a proxy? The Glidein Factory can be configured to use a default GSI proxy for submission. However, this operation mode is not recommended.
Factory Locations Where will you host your config files?
Where will you host your log files?
Where will you host the client log files?
Where will you host the client proxies?
Where will the web data be hosted?
What Web URL will you use?
These directories will match the configuration specified in the previous WMS collector install. For privilege separation, these will be need to be world-readable. In addition, the client log directory and proxy directories will need to be owned by root, or Condor will report a security error.
Web data will be where the glidein Factory reports usage statistics. This should be a directory served by the web server and owned by the factory user. You will also be prompted for a Web http URL to access these pages. If the web server is not running on the default port (80), please specify the port as well.
Due to the permission requirements, it is not recommended that these be stored in a user home directory.
Condor Configuration What is the Condor base directory? This will be the directory that you installed the WMS collector in the previous step. Do not use the user pool collector!
This step will detect the Schedd that were created in the previous step. Typically, all the schedd daemons can be used to submit glideins to. However, you can specify a subset to submit to if desired.
CCB Do you want to use CCB (requires Condor 7.3.0 or better)? You should use CCB unless you:
a) Have an old version of Condor that does not support it.
b) Do not have an out-bound network connection (one-directional connectivity)
See the Condor manual for more information.
glExec Do you want to use gLExec? This question generally depends on the policy of the sites that you will be submitting glideins to. The glideins will be submitted to the site, then will verify the environment, start Condor, and then run the user job. glExec provides the ability to segregate these responsibilities to different users (much like a sudo command would). It provides additional accounting mechanisms and better priority management. Use glExec if you desire these features, the sites you are submitting to require it, or if you do not trust user jobs.
a) ReSS Which RESS server should I use?
Select Condor RESS constraint: []
Define a python filter:
There are three ways to retrieve entry points that glideins can submit to. The first, RESS, is an information gathering service that provides a GLUE-based information about sites. In order to use RESS, you will need to have the address of a RESS server. You will also have to provide a condor rule to distinguish which sites to submit to. This will be a condor constraint that will query a classad about the GLUE schema.
For example, StringlistMember("VO:MyVO",GlueCEAccessControlBaseRule) will return all results that support that VO. After these results are returned, a python expression will be evaluated that will take these parameters and can perform calculations to determine if the site is sufficient.
Refer to the Condor manual for condor constraint syntax, and to GLUE schema pages for more information about valid parameters that can be used.
Note that you may need to verify entry points after automatically adding from ReSS or BDII, as many OSG sites will crash if you try to run using the default work_dir="OSG" and require "condor".
b) BDII Do you want to fetch entries from BDII?
There are three ways to retrieve entry points that glideins can submit to. The second, BDII, is an information gathering service that provides site information. Note: If you use BOTH ReSS and BDII for entry points, you will probably get duplicates that will need to be cleaned up manually
c) Manual entry Please list all additional glidein entry points,
Entry name (leave empty when finished):
Gatekeeper for 'XXXX':
RSL for 'XXXX': ((queue=default)(jobtype=single))
Work dir for 'XXXX': [.]
Site name for 'XXXX':
There are three ways to retrieve entry points that glideins can submit to. The third, manual entry, is a more tedious way of specifying entry points, but may be necessary if the RESS or BDII services do not contain the entry points you need.
For each entry point, you will need the address of the gatekeeper. You will need an RSL expression that specifies the queue and jobtype. You will also be prompted for a work directory and a name for this site. Contact your site administrator or refer to Globus documentation for more information.
Frontend Configuration Frontend security name (leave empty when finished):
Frontend identity (like vo1@gfactory1.my.org):
Frontend proxy security class: [frontend]
Username:
For each frontend you will be supporting, you will need to specify the identity so that it is authorized by the factory to submit glideins on its bequest. If this information is not correct, the factory will drop all glidein requests from the frontend.
For a visual guide to the configuration options that need to match in the frontend and factory, see this color coded chart.
the GSI security; you will need to provide the DN of the glidein factory (if different from the collector one) and the DN of the VO frontend.

The part that is not completely automatic is the list of GCBs and the configuration of the GSI security; you will need to provide the DNs of all the submit nodes. It is strongly recommended to use CCB over GCB if possible.

Here a possible install is presented; your setup will probably be slightly different:

Welcome to the glideinWMS Installation Helper

What do you want to install?
(May select several options at one, using a , separated list)
[1] glideinWMS Schedds and Collector
[2] Glidein Factory
[3] GCB
[4] User Pool Collector
[5] User Schedd
[6] Condor for Glidein Frontend
[7] Glidein Frontend
[8] Components
Please select: 2

The following profiles will be installed:
[2] Glidein Factory


Installing Glidein Factory

Do you have already a javascriptRRD installation?: (y/n) y
Where is javascriptRRD installed?: /home/gfactory/javascriptrrd-0.6.3
The Glidein Factory can be configured to use a default GSI proxy for submission.

However, this operation mode is not recommended.

Do you want to use such a proxy?: (y/n) [n] n

As you probably know, privilege separation
is needed to securely support multiple frontends.
If you are using privilege separation, the factory directories
must be world readable (except for the proxies dirs)

Hosting the config and log files in the factory home directory
is thus not recommended anymore.

Where will you host your config files?: [/var/gfactory/glideinsubmit] /home/gfactory/glideinsubmit
Where will you host your log files?: [/var/gfactory/glideinlogs] /var/gfactory/glideinlogs
Where will you host the client log files?: [/var/gfactory/clientlogs] /var/gfactory/clientlogs
Where will you host the client proxies files?: [/var/gfactory/clientproxies]/var/gfactory/clientproxies
Where will the web data be hosted?: [/var/www/html/glidefactory] /var/www/html/glidefactory
Directory '/var/www/html/glidefactory' not empty.
Should I use it anyhow?: (y/n) y

What Web URL will you use?: [http://gfactory1.my.org/glidefactory/] http://gfactory1.my.org/glidefactory
Give a name to this Glidein Factory?: [mySites-gfactory1] GlideinFactory-gfactory1
Give a name to this Glidein instance?: [v1_0] v1_0
What is the Condor base directory?: [/home/gfactory/glidecondor]/home/gfactory/glidecondor
The following glidein schedds have been found:
 [1] schedd_glideins1@gfactory1.my.org
 [2] schedd_glideins2@gfactory1.my.org
 [3] schedd_glideins3@gfactory1.my.org
 [4] schedd_glideins4@gfactory1.my.org
 [5] schedd_glideins5@gfactory1.my.org
 [5] schedd_glideins6@gfactory1.my.org
 [5] schedd_glideins7@gfactory1.my.org
 [5] schedd_glideins8@gfactory1.my.org
 [5] schedd_glideins9@gfactory1.my.org
Do you want to use all of them?: (y/n) y
Using ['schedd_glideins1@gfactory1.my.org', 'schedd_glideins2@gfactory1.my.org', 'schedd_glideins3@gfactory1.my.org',
'schedd_glideins4@gfactory1.my.org', 'schedd_glideins5@gfactory1.my.org', 'schedd_glideins6@gfactory1.my.org', 
'schedd_glideins7@gfactory1.my.org', 'schedd_glideins8@gfactory1.my.org', 'schedd_glideins9@gfactory1.my.org']

Do you want to use CCB (requires Condor 7.3.0 or better)?: (y/n) y

Do you want to use gLExec?: (y/n) y
Do you want to fetch entries from RESS?: (y/n) [n] y
Which RESS server should I use?: [osg-ress-1.fnal.gov] osg-ress-1.fnal.gov
Select Condor RESS constraint: [] StringlistMember("VO:MyVO",GlueCEAccessControlBaseRule)
Define a python filter: [(int(GlueCEPolicyMaxCPUTime)==0) or (int(GlueCEPolicyMaxCPUTime)>(72*60))] (int(GlueCEPolicyMaxCPUTime)>(1))
Found 28 additional entries
Do you want to use them all?: (y/n) y

Do you want to fetch entries from BDII?: (y/n) [n]n
Please list all additional glidein entry points,

Entry name (leave empty when finished):


For security reasons, we want to whitelist all the frontends that we will be serving.
Each frontend should be segregated to its own (set of) username(s).
If you do not want privilege separation, you can still just use the factory user.

Please list the frontends you will be serving:

Frontend security name (leave empty when finished): vofrontend1
Frontend identity (like vo1@gfactory1.my.org): frontenduser1@gfactory1.my.org
Frontend proxy security class: [frontend]
Username: frontenduser1

Frontend security name (leave empty when finished): vofrontend2
Frontend identity (like vo1@gfactory1.my.org): frontenduser2@gfactory1.my.org
Frontend proxy security class: [frontend]
Username: frontenduser2

Frontend security name (leave empty when finished):
Do you want to create the glidein (as opposed to just the config file)?: (y/n) [n]n
To create the glidein, you need to run
/home/gfactory/glideinWMS/creation/create_glidein /home/gfactory/glideinsubmit/glidein_v1_0.cfg/glideinWMS.xml

Configuration files are located in /home/gfactory/glideinsubmit/glidein_v1_0.cfg

If you followed the example above, you ended up with a configuration file in /home/gfactory/glideinsubmit/glidein_v1_0.cfg/glideinWMS.xml. Edit this file to suit your needs and than create the factory instance as described in the Manual configuration section below.

At this point, your factory installation is complete and you can start the factory. See the following section for how to do this.

7. Factory Management Commands

The following gives some introduction to managing a factory. For more information, see the Factory Configuration page.

7.1 Starting and Stopping

The glidein factory comes with a init.d style startup script.

Note: you will need to export an X509_CERT_DIR variable pointing to the directory of your installed CA certificates when starting the factory.

Glidein factory starting and stopping is handled by

<export X509_CERT_DIR=[CA_CERTIFICATE_DIRECTORY]
<glidein directory>/factory_startup start|stop|restart

You can check that the factory is actually running with

<glidein directory>/factory_startup status

Getting configuration information

To get the list of entries defined in the factory, use

<glidein directory>/factory_startup info -entries -CE

To see which entries are currently active, use

<glidein directory>/factory_startup statusdown entries

PS: Make sure the whole factory is not disabled, by using

<glidein directory>/factory_startup statusdown factory

See the monitoring section on how to check other info.

7.2 Manual configuration

The glidein factory can also be configured manually.

To create an entry point you need:

  • the name of the Gatekeeper(s),
  • the configuration details of the Grid pool(s),
  • the location of the submit point, and
  • the loction of the Web server data directory.
  • the proper work directory to use (ie. work_dir="OSG" or work_dir="Condor")
Once a configuration file is ready, you can create the glidein Factory by executing

cd glideinWMS/creation
./create_glidein <config file>

The startup procedure is the same as described above.

Warning: Never change the files in an entry point by hand after it has been created!.
Use the reconfig tools described below instead.

7.3 Changing the factory configuration

The files in an entry point must never be changed by hand, after the directory structure has been created.

The reason why manual changes are so problematic are two fold:

  • The first problem are signatures. Any change requires the change of the signature file, that in turn gets a new signature. Since the signature file signature is one of the parameters of the startup script, all glideins already in the queue will fail.
  • The second problem is caching. For performance reasons, most Web caches don't check too often if the original document has been changed; a glidein could thus get an old copy of a file and fail the signature check.

There is only one file that is neither signed nor cached and can be thus modified; the blacklisting file called nodes.blacklist. This one can be used to temporarily blacklist malfunctioning nodes that would pass regular sanity checks (for example: memory corruption or I/O errors), while waiting for the Grid site admin to take action.

The proper procedure to update an entry point is to make a copy of the official configuration file (i.e. glideinWMS.xml) and run

<glidein directory>/factory_startup reconfig config_copy_fname

This will update the directory tree and restart the factory and entry dameons.

Please notice that if you make any errors in the new configuration file, the reconfig script will throw an error and do nothing. So you should never need to worry about corrupting the installation tree using this tool.

NOTE: The reconfig tool does not kill the factory in case of errors. Hence is recommended that you disable any entry points that will not be used.

Once you are done editing the work config file, reconfigure the factory with

<install dir>/factory_startup reconfig <work config>

If the factory was running, the procedure will stop the factory before reconfiguring it, and restart it afterwards.

Starting with v2.5.6, the factory_startup script contains a default location for the factory configuration file which is set to the location used for the initial install. This allows you to not have to specify the config location when doing a reconfig. To change the default location in the file, run the command:

<glidein working directory>/factory_startup reconfig config_copy_fname update_default_cfg

7.4 Downtime handling

Starting with v1_3, the glidein factory supports the dynamic handling of downtimes.
Downtimes can be handled both at the factory and at the entry level.

Downtimes are useful when one or more Grid sites are known to have issues (can be anything from scheduled maintenance to a storage element corrupting user files).
In this case the factory administrator can temporarily stop submitting glideins to the affected sites, without stopping the factory as a whole. The list of current downtimes are listed in the factory file in glideinWMS.downtimes

Downtimes are handled with

<glidein directory>/factory_startup up|down -entry 'factory'|<entry name> [-delay <delay>]

Caution: An admin can handle downtimes both at the entry and at the factory level.
Please be aware that both will be used.

More advanced configuration can be done with the following script:

glideinWMS/factory/manageFactoryDowntimes.py -dir factory_dir -entry ['all'|'factory'|'entries'|entry_name] -cmd [command] [options]
You must specify the above options for the factory directory, the entry you wish to disable/enable, and the command to run. The valid commands are:
  • add - Add a scheduled downtime period
  • down - Put the factory down now(+delay)
  • up - Get the factory back up now(+delay)
  • check - Report if the factory is in downtime now(+delay)
  • vacuum - Remove all expired downtime info
Additional options that can be given based on the command above are:
  • -start [[[YYYY-]MM-]DD-]HH:MM[:SS] (start time for adding a downtime)
  • -end [[[YYYY-]MM-]DD-]HH:MM[:SS] (end time for adding a downtime)
  • -delay [HHh][MMm][SS[s]] (delay a downtime for down, up, and check cmds)
  • -frontend SECURITY_NAME (limits a downtime to one frontend)
    (If security class is not specified, the downtime is for all users for this frontend.)
  • -security SECURITY_CLASS (restricts a downtime to users of that security class)
    (If frontend is not specified, the downtime is for all frontends that have this security class.)
  • -comment "Comment here" (user comment for the downtime. Not used by WMS.)

This script can allow you to have more control over managing downtimes, by allowing you to make downtimes specific to security classes, and adding comments to the downtimes file.

Please note that the date format is currently very specific. You need to specify dates in the format "YYYY-MM-DD-HH:MM:SS", such as "2011-11-28:23:01:00."

7.5 Testing with a local glidein

In case of problems, you may want to test a glidein by hand.

Move to the glidein directory and run

./local_start.sh entry_name fast -- GLIDEIN_Collector yourhost.dot,your.dot,domain

. This will start a glidein on the local machine and pointing to the yourhost.your.domain collector.

Please make sure you have a valid Grid environment set up, including a valid proxy, as the glidein needs it in order to work.

8. Verification

One can save troubleshooting and verification until the installation is complete. At this point, however, you should be able be to query the WMS collector.

Verification of Condor Daemons

Verify processes are running by:

ps -ef | grep condor
You should see several condor_master and condor_procd processes. You should also be able to see one schedd process for each secondary schedd you specified in the install.

You can query the WMS collector by (use .csh if using c shell):

source /condor.sh
condor_q
condor_q -global
condor_status -any
The condor_q commands query any jobs in the WMS pool (-global is needed to show grid jobs). The condor_status will show all daemons and jobs in the condor pool. Eventually, the factory and VO frontend should show up in a listing like this:
MyType               TargetType           Name                          

glidefactory         None                 FNAL_FERMIGRID_ITB@v1_0@mySite
glidefactoryclient   None                 FNAL_FERMIGRID_ITB@v1_0@mySite
glideclient          None                 FNAL_FERMIGRID_ITB@v1_0@mySite
Scheduler            None                 xxxx.fnal.gov            
DaemonMaster         None                 xxxx.fnal.gov            
Negotiator           None                 xxxx.fnal.gov            
Scheduler            None                 schedd_glideins1@xxxx.fna
DaemonMaster         None                 schedd_glideins1@xxxx.fna
Scheduler            None                 schedd_glideins2@xxxx.fna
DaemonMaster         None                 schedd_glideins2@xxxx.fna