WMS Collector and Collocated Glidein Factory
The glidein Factory node will be the Condor Central Manager for the WMS, i.e. it will run the Condor Collector and Negotiator daemons, but it will also act as a Condor Submit node for the glidein factory, running Condor schedds used for Grid submission.
On top of that, this node also hosts the glidein factory daemons. The glidein Factory is also responsible for the base configuration of the glideins (although part of the configuration comes from the VO frontend).
|Installation size||CPUs||Memory||Disk|
|Large||4 - 8||2GB - 4GB||100+GB|
A major installation, serving tens of sites and several thousand glideins, will require several CPUs (recommended 4-8: 1 for the Condor daemons, 1-2 for the glidein factory daemons, and 2 or more for the Condor-G schedds) and a reasonable amount of memory (at least 2GB; 4GB for a large installation, to provide some disk caching).
The disk is needed for binaries, config files, log files and Web monitoring data. For just a few sites, 10GB could be enough; larger installations will need 100+GB to maintain a reasonable history. Monitoring can be quite I/O intensive when serving many sites, so get the fastest disk you can afford, or consider setting up a RAMDISK.
The node must be on the public internet, with at least one port open to the world; all worker nodes will load data from this node through HTTP.
|Software||Notes||Install Before glideinWMS|
|Linux OS||A reasonably recent Condor-supported Linux OS (RH/SL4 and RH/SL5 tested at press time).||X|
|Python interpreter||v2.3.4 or above||X|
|The perl-Time-HiRes rpm.||This rpm may already be included in perl, depending on the perl version||X|
|The OSG client software.||This can be installed prior to glideinWMS, but the installer can install it inline with the glideinWMS install|
|An HTTP server, like Apache or TUX.||This should be installed prior to glideinWMS (see below)||X|
|The Condor distribution as a tarball.||The installer will use the tarball to install and configure Condor inline|
|The RRDTool package||v1.2.18 or later (see below)||X|
|The M2Crypto python library||v0.17 or later (see below)||X|
|The glideinWMS software.|
- Condor version v7.3.1 has a known issue with incorrect return/exit codes of condor_status and condor_q
If you are using Condor version v7.3.2, disable VOMS checking in the condor_config file used by Condor daemons other than the user schedd. VOMS checking adds unnecessary overhead. To do so, set
USE_VOMS_ATTRIBUTES = False
or, for individual Condor daemons such as the collector,
COLLECTOR.USE_VOMS_ATTRIBUTES = False
The glidein Factory needs an HTTP server, like Apache or TUX. The server should be installed on the same node, but a different node can be used as long as the web area is writable from this one. Servers often come pre-installed with HTTP server software, so if you have one running, just reuse it. Otherwise, the installer can help you install one (as root). (See GlideinWMS Component Install)
The installer will ask you for several non-privileged users during the install process. These should be created prior to running the glideinWMS installer.
|WMS Collector - Condor User||If privilege separation is not used, then install as the same user as the Factory|
|Factory||The Factory will always be installed as a non-privileged user, whether or not privilege separation is being used.|
|One user per VO Frontend (See notes)||If you are using privilege separation, you will need a user for each VO frontend that will be communicating with the Factory. Otherwise, no new users need to be created for the frontends.|
The installer will ask for several DNs for GSI authentication. You have the option of using a service certificate or a proxy. These should be created and put in place before running the installer. The following is a list of DNs the installer will ask for:
- WMS Collector cert/proxy DN
- VO Frontend proxy DN (cannot use a cert here)
Note 2: The installer will ask if these are trusted Condor Daemons. Answer 'y'.
When installing the Factory you will be presented with a question asking for the directory location for various items. The example below puts many of them in /var. All the directories in /var have to be created as root. Therefore, if you intend on using /var, you will have to create the directories ahead of time.
Due to new restrictions on the directory permissions, it is no longer recommended that you install glideinWMS into the /home directory of the user.
Note: The web data must be stored in a directory served by the HTTP Server.
Where will you host your config files?: [/var/gfactory/glideinsubmit] /home/gfactory/glideinsubmit
Where will you host your log files?: [/var/gfactory/glideinlogs] /var/gfactory/glideinlogs
Where will you host the client log files?: [/var/gfactory/clientlogs] /var/gfactory/clientlogs
Where will you host the client proxies files?: [/var/gfactory/clientproxies] /var/gfactory/clientproxies
Where will the web data be hosted?: [/var/www/html/glidefactory] /var/www/html/glidefactory
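Since the /var directories have to exist before the installer runs, creating them can be scripted. The following is a minimal sketch and not part of glideinWMS: the helper name `prepare_directories` and its `root` parameter are assumptions made for illustration, and the paths are simply the defaults offered in the prompts above. It only creates directories; ownership and permissions are left to the administrator, since they depend on whether privilege separation is used.

```python
import os

# Default locations offered by the installer prompts above.
DEFAULT_DIRS = (
    "/var/gfactory/glideinsubmit",
    "/var/gfactory/glideinlogs",
    "/var/gfactory/clientlogs",
    "/var/gfactory/clientproxies",
    "/var/www/html/glidefactory",
)


def prepare_directories(dirs=DEFAULT_DIRS, root="/"):
    """Create each directory under `root` and return the resulting paths.

    `root` is a hypothetical convenience: point it at a scratch area to
    rehearse, or leave it as "/" (and run as root) to create the real ones.
    """
    created = []
    for d in dirs:
        path = os.path.join(root, d.lstrip("/"))
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

For example, `prepare_directories(root="/tmp/rehearsal")` builds the same tree under a scratch directory without needing root.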
At some point the installer will prompt you for the OSG VDT Client location, or ask whether you want to install it. If you choose the latter, the installer will install the client for you. (See GlideinWMS Component Install)
When asked whether you want OSG_VDT_BASE defined globally, answer 'y' unless you want to force your users to find and hard code the location.
When asked whether you want to enable Match authentication, answer 'y' if you are using Condor 7.1.3 or later, unless you have a reason not to.
If you are installing privilege separation, you need to install the glideinWMS Schedds and Collector as root. Otherwise, they can be installed as a non-privileged user. If installing as a non-privileged user, it is recommended that both the WMS Collector and the Factory share the same credentials. In either case, the Collector needs access to GSI credentials.
You will be presented with the service selection screen. Choose the glideinWMS Schedds and Collector and follow the instructions. Additional information as well as a sample install walk-through is below.
|User||Which user should Condor run under?||If not using privilege separation, this user should be the same user as the factory (see Required Users). Otherwise, the collector should be run as root.|
|Condor tarball||Where do you have the Condor tarball?||As part of the required software, you will need to download the condor distribution.|
|WMS Collector location||Where do you want to install it?: [/opt/glidecondor]||This is the directory where the installer will put the Condor Collector (And Schedd)
used to submit glideins to the grid gatekeepers. It will create a condor.sh (or .csh) in this
directory that will set environment variables. It is important to always source the correct
condor.sh before starting/stopping/querying condor. There are two different condor pools
in a glideinWMS installation, so it is very important to make sure your shell environment has
the correct variables and paths before issuing condor commands.
It is recommended to avoid putting this in a user home directory, as permission requirements (especially with privilege separation) can cause problems.
|Privilege separation||Privilege separation is needed to securely support multiple frontends.
Do you want to install it?: (y/n) [y]
What is the factory username:
List the usernames the factroy will use to separate frontends from one another.
The glideinWMS currently only supports the factory and the glideinWMS collector installed on the same machine. You must specify the user for the factory here to be a valid caller uid. You must specify each username needed for VO frontends to be a valid target uid. This will also populate the /etc/condor/privsep_config file for Condor privilege separation with the following values:
|Factory locations||Where will the factory store its config files?
Where will the factory store its log files?
Where will the factory store the client log files?
Where will the factory store the client proxies?
The factory directories will be where you install the factory (in the next step). The client log and proxy locations will be authorized by the PrivSep to be written to by Condor daemons. If using privilege separation, they need to be owned by root.
As these directories need to be world-readable, and the client log and proxy location have permission requirements if using privilege separation, these should generally not be located in a user home directory.
|GSI Security||Where can I find the directory with the trusted CAs?
... Please insert all such DNs, together with a user nickname.
GSI security is based on x509 certificates.
First, you will need a list of trusted certificates. VDT comes with a list of certificates, so if you install VDT now (or have installed it previously), you can use the list it provides. Note that you may have to update your certificates if you have an old VDT installation.
You will next need a certificate or proxy for the WMS collector. See the previous section for more information on required certificates and proxies.
If you are using privilege separation, then, on the WMS Collector, the nickname for each VO Frontend must be the username that you created for the frontend.
The installer will then configure the condor_mapfile (located in the certs directory for each Condor install). See the Quick reference for more information.
|Collector Configuration||What name would you like to use for this pool?
What port should the collector be running?
How many secondary schedds do you want?
Here is where you give an identifier and port for the collector process. The collector typically runs on port 9618. Schedd and other daemons do not have a set port and just use ports in a range determined by the collector.
You can also choose how many schedd daemons the master process will start by default. The proper value depends on many factors, including the memory and CPU of the server running it as well as the number of jobs submitted and the number of entry points. It should be high enough that each schedd does not have to handle multiple glidein requests to different entry points from factories simultaneously. The default install of 10 schedds should be enough to handle a site with around 10000 jobs. If you are only running hundreds of jobs, you may want to tune this down. Conversely, with larger numbers of jobs, it may need to be increased. This value depends on your installation and can later be tuned based on load and average number of jobs.
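As a starting point for the schedd question, the one data point given above (10 schedds for around 10000 jobs) can be turned into a rough rule of thumb. The sketch below assumes linear scaling at about 1000 jobs per schedd, which is an extrapolation and not a documented glideinWMS formula. It also assumes that the "secondary schedds" prompt counts every schedd except the primary one, as suggested by the sample answer of 9. The function names are hypothetical.

```python
import math

# Assumption: extrapolated from "10 schedds ... around 10000 jobs".
JOBS_PER_SCHEDD = 1000


def suggested_schedds(expected_jobs, jobs_per_schedd=JOBS_PER_SCHEDD):
    """Return a first guess for the total number of schedds (at least 1)."""
    if expected_jobs < 0:
        raise ValueError("expected_jobs must be non-negative")
    return max(1, math.ceil(expected_jobs / jobs_per_schedd))


def secondary_schedds(expected_jobs):
    """Answer for the installer prompt, assuming total = 1 primary + secondaries."""
    return suggested_schedds(expected_jobs) - 1
```

Treat the result only as an initial value; memory, CPU and the number of entry points also matter, so tune it later based on observed load.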
Here a possible set of answers is presented; your setup will probably be slightly different:
Welcome to the glideinWMS Installation Helper
What do you want to install?
(May select several options at one, using a , separated list)
 glideinWMS Schedds and Collector
 Glidein Factory
 GCB
 User Pool Collector
 User Schedd
 Condor for VO Frontend
 VO Frontend
 Components
Please select: 1
The following profiles will be installed:
 glideinWMS Schedds and Collector
Installing WMS Schedds and Collector
Installing condor
Which user should Condor run under?: [condor] gfactory
You will now need the Condor tarball
You can find it on http://www.cs.wisc.edu/condor/
Versions v7.2.2 and 7.3.1 have been tested, but you should always use the latest one
Where do you have the Condor tarball? /home/gfactory/downloads/condor-7.4.2-linux-x86_64-rhel5-dynamic.tar.gz
Checking... Seems condor version 7.4.2
Where do you want to install it?: [/opt/glidecondor] /home/gfactory/glidecondor
Directory '/home/gfactory/glidecondor' does not exist, should I create it?: (y/n) y
Installing condor in '/home/gfactory/glidecondor'
If something goes wrong with Condor, who should get email about it?: email@example.com
Extracting from tarball
Running condor_configure
Installing Condor from /home/gfactory/glidecondor/tar/condor-7.4.2 to /home/gfactory/glidecondor
Condor has been installed into: /home/gfactory/glidecondor
Configured condor using these configuration files:
 global: /home/gfactory/glidecondor/etc/condor_config
 local: /home/gfactory/glidecondor/condor_local/condor_config.local
You should look inside the installation log for some details about how Condor was installed.
Created scripts which can be sourced by users to setup their Condor environment variables. These are:
 sh: /home/gfactory/glidecondor/condor.sh
 csh: /home/gfactory/glidecondor/condor.csh
Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
Condor installed
Privilege separation is needed to securely support multiple frontends.
Do you want to install it?: (y/n) [y] y
A privsep config (/etc/condor/privsep_config) is already in place.
Do you want to recreate it?: (y/n) y
What is the factory username: gfactory
List the usernames the factroy will use to separate frontends from one another.
An empty username entry means you are done.
Username: frontenduser1
Username: frontenduser2
Username:
Directories needed by the factory will be given special treatment to ease administration.
Where will the factory store its config files?[/var/gfactory/glideinsubmit] /home/gfactory/glideinsubmit
Directory '/home/gfactory/glideinsubmit' does not exist, should I create it?: (y/n) y
Where will the factory store its log files?[/var/gfactory/glideinlogs] /var/gfactory/glideinlogs
Where will the factory store the client log files?[/var/gfactory/clientlogs] /var/gfactory/clientlogs
Directory '/var/gfactory/clientlogs' does not exist, should I create it?: (y/n) y
Where will the factory store the client proxies?[/var/gfactory/clientproxies] /var/gfactory/clientproxies
Directory '/var/gfactory/clientproxies' does not exist, should I create it?: (y/n) y
Privilege separation setup completed
Configuring GSI security
GSI security relies on a list of trusted CAs
Where can I find the directory with the trusted CAs?
Do you want to get it from VDT?: (y/n) y
Do you have already a VDT installation?: (y/n) y
Where is the VDT installed?: /home/gfactory/vdt
Using VDT installation in /home/gfactory/vdt
To use the GSI security for WMS Collector, you either need a valid GSI proxy or a valid x509 certificate and relative key.
Its subject (i.e. DN) will be added as the trusted daemon in the condor configuration.
Will you be using a proxy or a cert? (proxy/cert) proxy
Where is your proxy located?: /home/condor/security/grid_proxy.wmspool
My DN = '/DC=org/DC=doegrids/OU=Service/CN=gfactory/gfactory1.my.org'
You will most probably need other DNs in the condor grid mapfile.
The VO Frontend(s) will be contacting the WMS Collector and will interact as daemons.
Their subjects (i.e. DNs) will most likely be needed.
Please insert all such DNs, together with a user nickname.
An empty DN entry means you are done.
DN: /DC=org/DC=doegrids/OU=Services/CN=frontend1.my.org
nickname: [condor001] vofrontend1
Is this a trusted Condor daemon?: (y/n) y
DN: /DC=org/DC=doegrids/OU=Services/CN=frontend2.my.org
nickname: [condor002] vofrontend2
Is this a trusted Condor daemon?: (y/n) y
DN:
What name would you like to use for this pool?: [My glideinWMS pool] GlideinWMSPool
What port should the collector be running?:  9618
How many secondary schedds do you want?:  9
[...]
The installer will also start the Condor daemons. To stop the Condor daemons, issue
To start them again:
cd <install dir>; ./start_condor.sh
The glidein Factory needs an x509 proxy to communicate with the rest of the world. You have the option of giving the Factory its own proxy,
or having the VO Frontend serve up the proxy. The standard configuration has the VO Frontend serving the proxy. If the Factory will have its
own proxy, you need to create such a proxy before instantiating a glidein Factory and then keep it valid for the life of the factory. If used
for job submission, this proxy must at any point in time have a remaining validity at least as long as the longest expected job being run by the glideinWMS
(and not less than 12 hours).
How you keep this proxy valid (via MyProxy, kx509, voms-proxy-init from a local certificate, scp from other nodes, or other methods) is beyond the scope of this document.
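The validity rule above (at least the longest expected job, and never less than 12 hours) is easy to get wrong in a renewal script, so here is a minimal sketch of the check. The function names are hypothetical and not part of glideinWMS; how you obtain the remaining lifetime of the proxy depends on your proxy tooling and is left out.

```python
# Floor stated in the text above: never less than 12 hours.
MIN_VALIDITY_HOURS = 12


def required_validity_hours(longest_job_hours):
    """Minimum remaining proxy lifetime needed at any point in time."""
    return max(MIN_VALIDITY_HOURS, longest_job_hours)


def proxy_needs_renewal(remaining_hours, longest_job_hours):
    """True if the proxy's remaining lifetime is below the required minimum."""
    return remaining_hours < required_validity_hours(longest_job_hours)
```

With 4-hour jobs the 12-hour floor applies, so a proxy with 10 hours left should already be renewed; with 48-hour jobs even a proxy with 24 hours left is too short.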
The glidein factory itself should be installed as a non-privileged user. The provided installer can be used to create the configuration file, although some manual tuning will probably be needed.
You will be presented with the service selection screen. Follow the instructions and install all the software components. Additional information about the options is below:
|GSI Proxy||Do you want to use such a proxy?||The Glidein Factory can be configured to use a default GSI proxy for submission. However, this operation mode is not recommended.|
|Factory Locations||Where will you host your config files?
Where will you host your log files?
Where will you host the client log files?
Where will you host the client proxies?
Where will the web data be hosted?
What Web URL will you use?
These directories will match the configuration specified in the previous WMS collector install.
For privilege separation, these will need to be world-readable. In addition, the client log
directory and proxy directories will need to be owned by root,
or Condor will report a security error.
Web data will be where the glidein Factory reports usage statistics. This should be a directory served by the web server and owned by the factory user. You will also be prompted for a Web http URL to access these pages. If the web server is not running on the default port (80), please specify the port as well.
Due to the permission requirements, it is not recommended that these be stored in a user home directory.
|Condor Configuration||What is the Condor base directory?||
This will be the directory in which you installed the WMS collector in the previous step.
Do not use the user pool collector!
This step will detect the schedds that were created in the previous step. Typically, all the schedd daemons can be used to submit glideins. However, you can specify a subset to submit to if desired.
|CCB||Do you want to use CCB (requires Condor 7.3.0 or better)?||
You should use CCB unless you:
a) Have an old version of Condor that does not support it.
b) Do not have an out-bound network connection (one-directional connectivity)
See the Condor manual for more information.
|gLExec||Do you want to use gLExec?||This question generally depends on the policy of the sites that you will be submitting glideins to. The glideins will be submitted to the site, then will verify the environment, start Condor, and then run the user job. gLExec provides the ability to segregate these responsibilities to different users (much like a sudo command would). It provides additional accounting mechanisms and better priority management. Use gLExec if you desire these features, if the sites you are submitting to require it, or if you do not trust user jobs.|
|a) ReSS||Which RESS server should I use?
Select Condor RESS constraint: 
Define a python filter:
There are three ways to retrieve entry points that glideins can submit to.
The first, RESS, is an information gathering service that provides GLUE-based
information about sites. In order to use RESS, you will need to have the address
of a RESS server. You will also have to provide a condor rule to distinguish
which sites to submit to. This will be a condor constraint that will query a classad
about the GLUE schema.
For example, StringlistMember("VO:MyVO",GlueCEAccessControlBaseRule) will return all results that support that VO. After these results are returned, a python expression will be evaluated that will take these parameters and can perform calculations to determine if the site is sufficient.
Refer to the Condor manual for condor constraint syntax, and to GLUE schema pages for more information about valid parameters that can be used.
|b) BDII||Do you want to fetch entries from BDII?
||There are three ways to retrieve entry points that glideins can submit to. The second, BDII, is an information gathering service that provides site information. Note: If you use BOTH ReSS and BDII for entry points, you will probably get duplicates that will need to be cleaned up manually|
|c) Manual entry||Please list all additional glidein entry points,
Entry name (leave empty when finished):
Gatekeeper for 'XXXX':
RSL for 'XXXX': ((queue=default)(jobtype=single))
Work dir for 'XXXX': [.]
Site name for 'XXXX':
There are three ways to retrieve entry points that glideins can submit to.
The third, manual entry, is a more tedious way of specifying entry points,
but may be necessary if the RESS or BDII services do not contain the entry points.
For each entry point, you will need the address of the gatekeeper. You will need an RSL expression that specifies the queue and jobtype. You will also be prompted for a work directory and a name for this site. Contact your site administrator or refer to Globus documentation for more information.
Frontend security name (leave empty when finished):
Frontend identity (like firstname.lastname@example.org):
Frontend proxy security class: [frontend]
|For each frontend you will be supporting, you will need to specify the identity so that it is authorized by the factory to submit glideins on its behalf. If this information is not correct, the factory will drop all glidein requests from the frontend. ***|
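The entry point options above involve two small strings that are easy to mistype: the Condor constraint used with ReSS and the RSL given for a manual entry. The sketch below builds both in the forms quoted in this section. The function names are hypothetical and not part of glideinWMS; the ClassAd function `StringlistMember` and the attribute `GlueCEAccessControlBaseRule` are taken from the example above, and whether the RSL needs any extra wrapping at the installer prompt is not covered here.

```python
def ress_vo_constraint(vo_name):
    """Build the Condor constraint from the ReSS example for one VO."""
    if '"' in vo_name:
        # A quote would break out of the string literal in the constraint.
        raise ValueError("VO name must not contain double quotes")
    return 'StringlistMember("VO:%s",GlueCEAccessControlBaseRule)' % vo_name


def rsl_string(queue="default", jobtype="single"):
    """Build the queue/jobtype attribute pairs shown in the RSL prompt."""
    return "(queue=%s)(jobtype=%s)" % (queue, jobtype)
```

For example, `ress_vo_constraint("MyVO")` reproduces the constraint quoted in the ReSS description, and `rsl_string(queue="short")` selects a queue named `short`, which is an invented queue name used only for illustration.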
The part that is not completely automatic is the list of GCBs and the configuration of the GSI security; you will need to provide the DNs of all the submit nodes. It is strongly recommended to use CCB over GCB if possible.
Here a possible install is presented; your setup will probably be slightly different:
If you followed the example above, you ended up with a configuration file in /home/gfactory/glideinsubmit/glidein_v1_0.cfg/glideinWMS.xml. Edit this file to suit your needs and then create the factory instance as described in the Manual configuration section below.
At this point you can start the factory with
<install dir>/factory_startup start
where the <install dir> is the one written out by the installation script.
To stop the factory, use
<install dir>/factory_startup stop
Once a configuration file is ready, you can create the glidein Factory by executing
./create_glidein <config file>
The startup procedure is the same as described above.
Occasionally, you may need to change the glidein Factory configuration. This is done by updating the configuration file and then reconfiguring the factory.
Warning: Never update the <install dir>/glideinWMS.xml file! Either use the configuration file you used to install it, or make a copy of the glideinWMS.xml file and modify that one.
Once you are done editing the work config file, reconfigure the factory with
<install dir>/factory_startup reconfig <work config>
If the factory was running, the procedure will stop the factory before reconfiguring it, and restart it afterwards.
There are several ways to monitor the entry points of the glidein factory:
You can either monitor the factory as a whole, or just a single entry point.
The factory monitoring is located at a URL like the one below
Moreover, each entry point has its own history on the Web.
Assuming you have a SanDiego entry, it can be monitored at
You can get the equivalent of the Web page snapshot by using
The glidein factory writes two log files per entry point: factory_info.YYYYMMDD.log and factory_err.YYYYMMDD.log.
Assuming you have a SanDiego entry, the log files are in
All errors are reported in the factory_err.YYYYMMDD.log file, while factory_info.YYYYMMDD.log contains entries about what the factory is doing.
Each glidein creates 2 files on exit; job.ID.out and job.ID.err.
Assuming you have a SanDiego entry, the log files are in
Problems are usually reasonably easy to spot.
The glidein factory also advertises summary information in the WMS collector.
and look for glidefactory and glidefactoryclient ads.
One can save troubleshooting and verification until the installation is complete. At this point, however, you should be able to query the WMS collector.
Verification of Condor Daemons
Verify processes are running by:
ps -ef | grep condor
You should see several condor_master and condor_procd processes. You should also be able to see one schedd process for each secondary schedd you specified in the install.
You can query the WMS collector by (use .csh if using c shell):
source
The condor_q commands query any jobs in the WMS pool (-global is needed to show grid jobs). The condor_status will show all daemons and jobs in the condor pool. Eventually, the factory and VO frontend should show up in a listing like this:
MyType              TargetType  Name
glidefactory        None        FNAL_FERMIGRID_ITB@v1_0@mySite
glidefactoryclient  None        FNAL_FERMIGRID_ITB@v1_0@mySite
glideclient         None        FNAL_FERMIGRID_ITB@v1_0@mySite
Scheduler           None        xxxx.fnal.gov
DaemonMaster        None        xxxx.fnal.gov
Negotiator          None        xxxx.fnal.gov
Scheduler           None        email@example.com
DaemonMaster        None        firstname.lastname@example.org
Scheduler           None        email@example.com
DaemonMaster        None        firstname.lastname@example.org
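When checking a listing like the one above by script rather than by eye, it is enough to count the ads by their MyType column. The sketch below is a hypothetical helper, not part of glideinWMS or Condor: it assumes the listing has already been captured as text in the three-column MyType / TargetType / Name layout shown here, and it treats the presence of both glidefactory and glidefactoryclient ads as the sign that the factory is advertising.

```python
def count_ad_types(listing):
    """Count ads per MyType in a MyType / TargetType / Name listing."""
    counts = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "MyType":
            continue  # skip blank or short lines and the header row
        counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts


def factory_is_advertising(listing):
    """True if both glidefactory and glidefactoryclient ads are present."""
    counts = count_ad_types(listing)
    return (counts.get("glidefactory", 0) > 0
            and counts.get("glidefactoryclient", 0) > 0)
```

The Scheduler count returned by `count_ad_types` can also be compared with the number of schedds you configured during the install.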