GlideinWMS The Glidein-based Workflow Management System

User Pool

User Collector Installation

1. Description

The glidein pool Collector node will be the Condor Central Manager for the glidein pool, i.e. it will run the Condor Collector and Negotiator daemons. These daemons define the glidein pool; if this node dies, the pool dies with it.

2. Hardware requirements

CPUs     Memory   Disk
1 - 2    1+GB     5+GB

This machine needs one or two fast CPUs and a moderate amount of memory (1GB should be enough for most tasks; really big pools may need more).

It must have reliable network connectivity and must be on the public internet, with no firewalls; all worker nodes will be continuously sending UDP packets to the Collector.

The machine must be very stable; if the Collector dies, the glidein pool dies with it. (There are Condor techniques to minimize this damage, but you should still choose the most stable machine you can afford.)

Disk space is needed only for the Condor binaries and log files (5GB should be enough).
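
The requirements above can be checked on a candidate machine before installing. The following is a minimal sketch, assuming a Linux host that provides /proc/meminfo, getconf and a POSIX df. The check_min helper, the rounded thresholds and the default directory are illustrative choices and are not part of glideinWMS.

```shell
#!/bin/sh
# Preflight sketch: compare this host against the minimums listed above.
# Thresholds are rounded from the hardware table; adjust them for large pools.

MIN_CPUS=1
MIN_MEM_KB=1000000         # roughly 1GB, in the kB units used by /proc/meminfo
MIN_DISK_KB=5000000        # roughly 5GB
INSTALL_PARENT="${1:-.}"   # hypothetical: directory that will hold the install

# Print "ok" or "LOW" for a measured value against a required minimum.
# $1 = label, $2 = measured value, $3 = required minimum
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: ok ($2 >= $3)"
    else
        echo "$1: LOW ($2 < $3)"
    fi
}

cpus=$(getconf _NPROCESSORS_ONLN)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
disk_kb=$(df -Pk "$INSTALL_PARENT" | awk 'NR==2 {print $4}')

check_min "CPUs" "$cpus" "$MIN_CPUS"
check_min "Memory (kB)" "$mem_kb" "$MIN_MEM_KB"
check_min "Free disk (kB)" "$disk_kb" "$MIN_DISK_KB"
```

Pass the planned install location as the first argument to check free space on the right filesystem. The script does not test network reachability or firewall rules, which still need to be verified separately.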

3. Needed software

Software | Notes | Install Before glideinWMS
Linux OS | A reasonably recent Linux OS (RH/SL4 and RH/SL5 tested at press time). | X
Python interpreter | v2.3.4 or above | X
The OSG client software. | This can be installed prior to glideinWMS, but the installer can install it inline with the glideinWMS install. |
The Condor distribution as a tarball. | The installer will use the tarball to install and configure Condor inline. |
The glideinWMS software. | |

NOTE:

  • Condor version v7.3.1 has a known issue with incorrect return/exit codes of condor_status and condor_q
  • If you are using Condor version v7.3.2, disable VOMS checking in the condor_config file used by the Condor daemons, except for the one used by the user schedd. VOMS checking adds unnecessary overhead. To do so, set
    USE_VOMS_ATTRIBUTES = False
    or, for individual Condor daemons such as the collector,
    COLLECTOR.USE_VOMS_ATTRIBUTES = False

4. Before you begin...

Each service in GlideinWMS uses an x509 certificate to identify itself using GSI authentication (see the Quick Reference Guide for an overview). The installer will ask for several DNs for GSI authentication. You have the option of using a service certificate or a proxy. These should be created and put in place before running the installer. The following is a list of DNs the installer will ask for:

  • Pool Collector cert/proxy DN
  • User Submitter cert/proxy DN
  • Glidein Factory cert/proxy DN
  • Glidein Frontend proxy DN (cannot use a cert here)
Note: In some places the installer will also ask for nicknames to go with the DNs. For the most part, the name given doesn't really matter. There is one case where it does matter: if you are using privilege separation, then, on the WMS Collector, the nickname for each Glidein Frontend must be the username that you created for the frontend.
Note 2: The installer will ask if these are trusted Condor Daemons. Answer 'y'.
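
Since the installer asks for DNs, it helps to print them from the certificate files beforehand. The following is a minimal sketch for service certificates only (proxies are not covered), assuming the openssl command line tool is installed. The cert_dn helper is illustrative, and the default path is the one used in the walk-through later on this page.

```shell
#!/bin/sh
# Print the subject (DN) of a PEM certificate in the slash-separated form
# used in this guide. "-nameopt compat" asks newer OpenSSL releases for the
# older one-line layout; the sed call strips the leading "subject=" label.
cert_dn() {
    openssl x509 -in "$1" -noout -subject -nameopt compat | sed 's/^subject= *//'
}

# Certificate to inspect; override by passing a path as the first argument.
CERT="${1:-/home/collector/grid-security/servicecert.pem}"

if [ -r "$CERT" ]; then
    cert_dn "$CERT"
else
    echo "cannot read certificate: $CERT" >&2
fi
```

Run it once per service certificate and keep the printed DNs at hand for the installer prompts.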

5. Installation instructions

The pool Collector can be installed either as root or as a non-privileged user. In either case, make sure that the user has access to the needed GSI credentials. There is no real advantage to installing as root, so a non-privileged installation is recommended if the Collector is installed separately.

The whole process is managed by an install script described below.

Move into

glideinWMS/install

and execute

./glideinWMS_install

You will be presented with the service selection screen. Choose [4] for the user pool collector, then follow the instructions and install all the software components. Further details and a walk-through are presented below:
Field: Condor
Installation Text: Where do you have the Condor tarball? Where do you want to install it?
Description: The user pool collector is part of the Condor pool that will actually run the user's jobs. This will be the server that you will submit jobs to. This piece of the install will configure the collector to work with the submitted glideins. For this, you will need a Condor distribution and a location to install to. The installer will also prompt for an administrator email. Installing into a user home directory is not recommended.

Field: GSI Security
Installation Text: Where can I find the directory with the trusted CAs?
Description: GSI security is based on x509 certificates. First, you will need a list of trusted certificates. VDT comes with a list of certificates, so if you install VDT now (or have installed it previously), you can use that list. Note that you may have to update your certificates if you have an old VDT installation. You will next need a certificate or proxy for the user pool collector. See the previous section for more information on required certificates and proxies.

Field: PrivSep
Installation Text: Please insert all such DNs, together with a user nickname.
Description: You will need to provide the DN(s) of the glideins, the DNs of all the submit machines and the DN of the VO frontend. The installer will then configure the condor_mapfile (located in the certs directory for each Condor install). See GSI Reference for more information.

Field: Condor configuration
Installation Text: What name would you like to use for this pool? How many slave collectors do you want?
Description: You will need to provide a name for your pool and determine how many slave collectors you will need. The number of slave collectors will vary based on the number of jobs and other factors, and can be tuned later.

Here a possible set of answers is presented; your setup will probably be slightly different:

Welcome to the glideinWMS Installation Helper

What do you want to install?
(May select several options at one, using a , separated list)
[1] glideinWMS Schedds and Collector
[2] Glidein Factory
[3] GCB
[4] User Pool Collector
[5] User Schedd
[6] Condor for Glidein Frontend
[7] Glidein Frontend
[8] Components
Please select: 4

The following profiles will be installed:
[4] User Pool Collector

Installing pool collector

Installing condor


You will now need the Condor tarball
You can find it on http://www.cs.wisc.edu/condor/
Versions v7.2.2 and 7.3.1 have been tested, but you
should always use the latest one

Where do you have the Condor tarball? /home/collector/downloads/condor-7.4.2-linux-x86_64-rhel5-dynamic.tar.gz
Checking...
Seems condor version 7.4.2

Where do you want to install it?: [/home/collector/glidecondor] /home/collector/glidecondor

Directory '/home/collector/glidecondor' does not exist, should I create it?: (y/n) y
Installing condor in '/home/collector/glidecondor'

If something goes wrong with Condor, who should get email about it?: admin@my.org
Extracting from tarball
Running condor_configure
Installing Condor from /home/collector/glidecondor/tar/condor-7.4.2 to /home/collector/glidecondor

Condor has been installed into:
    /home/collector/glidecondor

Configured condor using these configuration files:
  global: /home/collector/glidecondor/etc/condor_config
  local:  /home/collector/glidecondor/condor_local/condor_config.local
You should look inside the installation log for some details about how
Condor was installed.
Created scripts which can be sourced by users to setup their
Condor environment variables.  These are:
   sh: /home/collector/glidecondor/condor.sh
  csh: /home/collector/glidecondor/condor.csh

Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y

The Condor config has been put in your login files
Please remember to exit and reenter the terminal after the install

Condor installed

Configuring GSI security

GSI security relies on a list of trusted CAs
Where can I find the directory with the trusted CAs?

Do you want to get it from VDT?: (y/n) y
Do you have already a VDT installation?: (y/n) y
Where is the VDT installed?: /home/collector/vdt

Using VDT installation in /home/collector/vdt

To use the GSI security for Pool Collector, you either need
a valid GSI proxy or a valid x509 certificate and relative key.
Its subject (i.e. DN) will be added as the trusted daemon
in the condor configuration.

Will you be using a proxy or a cert? (proxy/cert) cert
Where is your certificate located?: /home/collector/grid-security/servicecert.pem
Where is your certificate key located?: /home/collector/grid-security/servicekey.pem
My DN = '/DC=org/DC=doegrids/OU=Services/CN=collector/master1.my.org'


You will most probably need other DNs in the condor grid mapfile.
The User Schedd(s) and Glidein startds will connect to
and act as daemons to the Pool Collector. Any other node or process
that needs to talk securely with the Collector (like the
Glidein Frontend) also needs to be authenticated, but not as
a daemon. Finally, if you expect any processes on this node
to use condor security toward other nodes (e.g. the Glidein Frontend
talking to the WMS Collector), the remote services will also
need to be authenticated. The subjects (i.e. DNs)
for these services will thus most likely be needed.

Please insert all such DNs, together with a user nickname.
An empty DN entry means you are done.
DN: /DC=org/DC=doegrids/OU=Services/CN=schedd1.my.org
nickname: [condor001] submit
Is this a trusted Condor daemon?: (y/n) y

DN: /DC=org/DC=doegrids/OU=Services/CN=gfactory/gfactory1.my.org
nickname: [condor002] pilot
Is this a trusted Condor daemon?: (y/n) y
DN: /DC=org/DC=doegrids/OU=Services/CN=frontend/frontend1.my.org
nickname: [condor002] frontend
Is this a trusted Condor daemon?: (y/n) n

DN: enter
What name would you like to use for this pool?: [My pool] TestPool
How many slave collectors do you want?: [10] 10
    

6. To Start/Stop Pool Collector

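The paths below assume the install location used in the walk-through above (/home/collector/glidecondor); adjust them to your own installation.

Set up the environment:

source /home/collector/glidecondor/condor.sh

To start Condor run:

/home/collector/glidecondor/sbin/condor_master

You should see three processes running as the user that started Condor: condor_master, condor_collector and condor_negotiator.

The log files can be found in

/home/collector/glidecondor/condor_local/log

To stop Condor run:

/home/collector/glidecondor/sbin/condor_off -master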
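
The check for the three expected processes can be scripted. The following is a minimal sketch, assuming pgrep is available on the node; the check_daemons helper is illustrative and is not shipped with Condor or glideinWMS.

```shell
#!/bin/sh
# Report whether each of the three expected daemons has a running process.
# "pgrep -f" matches against the full command line, because the kernel
# truncates process names to 15 characters and condor_negotiator is longer.
# Caveat: any other process whose command line mentions a daemon name
# (an editor, a tail on a log file, ...) will also match.
check_daemons() {
    for daemon in condor_master condor_collector condor_negotiator; do
        if pgrep -f "$daemon" >/dev/null 2>&1; then
            echo "$daemon: running"
        else
            echo "$daemon: not found"
        fi
    done
}

check_daemons
```

If a daemon is reported as not found shortly after startup, the log directory mentioned above is the first place to look.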

7. Fine Tuning for Large Scale Installations

7.1 Increase the number of available file descriptors

The default number of file descriptors per process is 1024 on most systems. Increase this limit for large scale installations if you are running into problems. The following is one example of how to do this:

ulimit -n 16384

Note that this may increase the amount of memory used by the condor shadow processes. If that happens, you may need to configure the shadows to use a smaller table size, e.g.
SHADOW_MAX_FILE_DESCRIPTORS = 100
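
The file descriptor limit described in this section applies to the current shell and to the processes started from it, so it has to be raised in the same shell before condor_master is started. The following is a minimal sketch, assuming a shell whose ulimit builtin supports -n (bash and dash do); the raise_nofile helper is illustrative.

```shell
#!/bin/sh
TARGET=16384    # value taken from the ulimit example in this section

# Raise the soft limit on open files to $1 unless it is already high enough.
# A non-root user cannot go above the hard limit, so failure is reported
# rather than treated as fatal.
raise_nofile() {
    current=$(ulimit -n)
    if [ "$current" = "unlimited" ] || [ "$current" -ge "$1" ]; then
        echo "open files limit already $current"
    elif ulimit -n "$1" 2>/dev/null; then
        echo "open files limit raised from $current to $1"
    else
        echo "could not raise limit to $1 (hard limit: $(ulimit -Hn))"
    fi
}

raise_nofile "$TARGET"
```

If the hard limit is too low, it has to be raised by an administrator before this has any effect.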

For more information on advanced condor installations and multiple schedds, see the Advanced Condor page.