Installation of a scheduler node in the glideinWMS


[Description | Hardware requirements | Needed software | Installation instructions | Submitting user jobs | Fine tuning for large scale installations]

1. Description

This node will be a Condor Submit node for the user jobs.
You can install many such nodes to improve scalability of your system.

2. Hardware requirements

This machine needs a reasonably recent CPU and a large amount of memory (min 2GB, recommended 16GB, ~1.5MB per running job).
The amount of disk needed depends on the user jobs; Condor itself uses very little (5 GB should be enough for Condor alone).
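
As a rough illustration of the per-job figure above, the following sketch estimates the memory needed for a given number of running jobs. The function name is hypothetical, and the result is only the ~1.5MB-per-job estimate from this section, not a measured value; the 2GB minimum still applies for small job counts.

```shell
# Rough sizing sketch: ~1.5 MB of RAM per running job (figure quoted above).
# estimate_schedd_memory_mb is a hypothetical helper, not part of glideinWMS.
estimate_schedd_memory_mb() {
    jobs=$1
    # 1.5 MB per job in integer arithmetic: jobs * 3 / 2
    echo $(( jobs * 3 / 2 ))
}

# Example: memory estimate in MB for 10000 running jobs
estimate_schedd_memory_mb 10000
```

For 10000 running jobs this gives 15000 MB, which is in line with the recommended 16GB.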

3. Needed software

Any Condor-supported OS.
The OSG client software.
The Condor distribution.

4. Installation instructions

The scheduler node software should be installed as root. While it is possible to run the schedds as a non-privileged user, it has some serious security implications; see the Condor manual for details.
The whole process is managed by an install script described below. You will need to provide a valid Condor tarball, so you may as well download it before starting the installer.

Move into

glideinWMS/install

and execute

./glideinWMS_install

You will be presented with this screen:

What do you want to install?
(May select several options at one, using a , separated list)
[1] glideinWMS Collector
[2] Glidein Factory
[3] GCB
[4] pool Collector
[5] Schedd node
[6] Condor for VO Frontend
[7] VO Frontend
[8] Components

Select 5.

Now follow the instructions and install all the software components. Most of the questions should be fairly straightforward. The part that is not completely automatic is the configuration of the GSI security; you will need to provide the DN of the pool collector and the DN of the glidein factory.
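
If you do not have the DNs at hand, one way to obtain them is to print the subject of the relevant certificate on the node that holds it. This is a minimal sketch using the standard openssl command line; the certificate path is an assumption (the usual host certificate location), and depending on the OpenSSL version the subject may be printed in a different layout than the slash-separated form used in this document.

```shell
# Print the subject (DN) of a certificate.
# CERT defaults to the usual host certificate path; adjust as needed (assumption).
CERT=${CERT:-/etc/grid-security/hostcert.pem}

if [ -r "$CERT" ]; then
    openssl x509 -in "$CERT" -noout -subject
else
    echo "certificate not readable: $CERT" >&2
fi
```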

A possible set of answers is shown below; your setup will probably be slightly different:

Do you have already a VDT installation?: (y/n) n
Do you want to install the full OSG VDT client?: (y/n) n
Do you want to install a minimal Grid VDT client?: (y/n) y
Where do you want the VDT installed?: [/opt/vdt] /opt/vdt
Directory '/opt/vdt' does not exist, should I create it?: (y/n) y
What pacman version should I use?: [pacman-3.26] PACMAN-3.26

What VDT cache should I use?: [http://vdt.cs.wisc.edu/vdt_1101_cache] http://vdt.cs.wisc.edu/vdt_1101_cache
VDT client installation tends to be very picky about the platforms it installs under
Most of the time, one needs to pretent to be one of the tested platforms
The platforms known to work are: linux-rhel-3,SL-3,linux-rhel-4,SL-4,linux-fedora-4,linux-rhel-5,SL-5
Which platform do you want to use (leave empty for autodetect): enter

Do you agree to the licenses? [y/n] y

Do you want to update the CA certification revocation lists (CRLs) automatically? [y/n] y

Would you like to setup daily rotation of VDT log files?

Possible answers:
    y: Yes, I want the service to run automatically (once enabled)
    n: No, I do NOT want the service to run automatically

y

Where would you like to install CA files?

Choices:
        r (root)  - install into /etc/grid-security/certificates
                   (existing CA files will be preserved)
        l (local) - install into $VDT_LOCATION/globus/share/certificates
        n (no)    - do not install
l

Do you want to automatically update your CA Certificates? [y/n] y

Where should I fetch the CAs from?: [http://software.grid.iu.edu/pacman/cadist/ca-certs-version] http://software.grid.iu.edu/pacman/cadist/ca-certs-version

VDT client installed

Installing condor

Which user should Condor run under?: [condor] condor

You will now need the Condor tarball
You can find it on http://www.cs.wisc.edu/condor/
Versions v7.0.5 and 7.1.3 have been tested, but you
should always use the latest one

Where do you have the Condor tarball? /root/condor-7.1.3-linux-x86-rhel3-dynamic.tar.gz

Where do you want to install it?: [/opt/glidecondor] /opt/glidecondor
Directory '/opt/glidecondor' does not exist, should I create it?: (y/n) y
Installing condor in '/opt/glidecondor'

If something goes wrong with Condor, who should get email about it?: troubles@my.org

Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
Condor installed

Configuring GSI security

To use the GSI security, you either need a valid GSI proxy or
a valid x509 certificate and relative key
The cert/proxy subject of User Schedd, Glidein Pool Collector and VO Frontend
(used by glidein startd) will be added as the trusted daemon in GSI_DAEMON_NAME 
in the condor configuration.
Will you be using a proxy or a cert? (proxy/cert) cert
Where is your certificate located?: /etc/grid-security/hostcert.pem
Where is your certificate key located?: /etc/grid-security/hostkey.pem
My DN = '/DC=org/DC=doegrids/OU=Services/CN=submit1.my.org'

Glidein Pool Collector and VO Frontend and glidein startd will connect to and
act as client to User Schedd. Subjects for these services should be in the 
gridmap file of the User Schedd.
Please insert all such DNs, together with a user nickname.
An empty DN entry means you are done.
DN: /DC=org/DC=doegrids/OU=Services/CN=collector1.my.org
nickname: [condor001] collector
Is this a trusted Condor daemon?: (y/n) y
DN: /DC=org/DC=doegrids/OU=Service/CN=gfactory/gfactory1.my.org
nickname: [condor002] pilot
Is this a trusted Condor daemon?: (y/n) y
DN: enter
Do you want to Use Quill (works for 6.8.X only)?: (y/n) [n] n
What node is the collector running (i.e. CONDOR_HOST)?: collector1.my.org
Please list all the GCB servers you will be using
Leave an empty line when finished
GCB node: gcb1.my.org
GCB node:enter
How many secondary schedds do you want?: [9] 4

The installer will also start the Condor daemons.

The installer also creates an init.d script at

/etc/init.d/condor

Use it to stop and restart the schedd(s).
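
A typical stop and restart might look like the sketch below. The stop and start arguments are an assumption based on the usual init script conventions; check the generated script for the actions it actually supports.

```shell
# Stop and restart Condor through the init script created by the installer.
# The "stop" and "start" arguments are assumed, following common init.d conventions.
CONDOR_INIT=/etc/init.d/condor

if [ -x "$CONDOR_INIT" ]; then
    "$CONDOR_INIT" stop
    "$CONDOR_INIT" start
else
    echo "init script not found: $CONDOR_INIT" >&2
fi
```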

NOTE: Since the schedd node is installed as root, the Condor installer creates condor.sh and condor.csh in /etc/profile.d so that users get the correct environment. If this machine hosts another Condor installation, or you prefer that the user PATH and CONDOR_CONFIG not point to this Condor setup automatically, you need to manually move these files out of /etc/profile.d.
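
Moving the files aside can be sketched as follows. The backup directory is a hypothetical location chosen for illustration; any directory outside /etc/profile.d will do.

```shell
# Move the Condor profile scripts out of /etc/profile.d so that login shells
# no longer pick up this Condor setup automatically.
# BACKUP_DIR is a hypothetical location; choose any directory you like.
BACKUP_DIR=${BACKUP_DIR:-/root/condor-profile.d-disabled}

for f in /etc/profile.d/condor.sh /etc/profile.d/condor.csh; do
    if [ -e "$f" ]; then
        mkdir -p "$BACKUP_DIR"
        mv "$f" "$BACKUP_DIR/"
    fi
done
```

Users who still want this Condor setup can source the moved condor.sh (or condor.csh) by hand.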

5. Submitting user jobs

From the user's point of view, this is just a regular Condor pool.

However, since the resources potentially come from all over the world, users need to create more complex Requirements, in order to prevent the jobs from landing on sites that cannot run their jobs.
Users will need to know which attributes the glideins publish and use them accordingly.


One useful attribute that all glideins publish is GLIDEIN_Site. If users want to restrict their jobs to a list of sites, they can do so by using:

+DESIRED_Sites = "Site1,Site4,Site7,Site22"
Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)

Some glideins may also need to properly identify the final user, using GSI authentication. A user should thus add the following line:

x509userproxy = <path to X509 proxy>

to their Condor submission file.
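
Putting the pieces above together, a complete submission might look like the following sketch, which writes a submit file from the shell. The file name job.submit, the executable my_job.sh, and the output file names are hypothetical placeholders, and the default proxy path is an assumption (the conventional /tmp/x509up_u<uid> location); the site list is the example one used above.

```shell
# Write a minimal Condor submit file that restricts the job to a list of sites
# and attaches the user's X509 proxy.
# job.submit, my_job.sh and the proxy path are placeholders (assumptions).
SUBMIT_FILE=${SUBMIT_FILE:-job.submit}
X509_PROXY=${X509_PROXY:-/tmp/x509up_u$(id -u)}

cat > "$SUBMIT_FILE" <<EOF
universe       = vanilla
executable     = my_job.sh
output         = job.\$(Cluster).\$(Process).out
error          = job.\$(Cluster).\$(Process).err
log            = job.log

# Restrict the job to glideins running at the listed sites
+DESIRED_Sites = "Site1,Site4,Site7,Site22"
Requirements   = stringListMember(GLIDEIN_Site,DESIRED_Sites)

# Proxy used to identify the final user via GSI
x509userproxy  = $X509_PROXY

queue
EOF

echo "wrote $SUBMIT_FILE"
```

The resulting file can then be submitted with condor_submit job.submit.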

6. Fine Tuning for Large Scale Installations

6.1 Increase the number of available file descriptors

The number of ports used by the condor_schedd process increases as the number of jobs running or queued in the schedd increases. The default number of file descriptors per process is 1024 on most systems. Increase this limit to ~16k, or to a value higher than the number of jobs that might be in the queue at any given time. This is particularly important for large scale installations.

In most default installations, the user schedd is configured to start as root and is started through the script /etc/init.d/condor. This is a good place to set a higher file descriptor limit for the schedd process.
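
A minimal sketch of such a change is shown below, assuming the startup script is run by a bash-compatible shell and that these lines are placed before the point where the Condor daemons are launched. The value 16384 is simply the ~16k figure suggested above; raising the limit beyond the current hard limit only works when running as root.

```shell
# Raise the per-process file descriptor limit before starting Condor.
# TARGET_NOFILE reflects the ~16k suggestion above; tune it to your queue size.
TARGET_NOFILE=16384

current=$(ulimit -n)
if [ "$current" != "unlimited" ] && [ "$current" -lt "$TARGET_NOFILE" ]; then
    # May fail for non-root users if the hard limit is lower than the target
    ulimit -n "$TARGET_NOFILE" 2>/dev/null || \
        echo "could not raise limit; hard limit is $(ulimit -H -n)" >&2
fi

echo "file descriptor limit: $(ulimit -n)"
```

Processes started from the script after this point inherit the new limit.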



glideinWMS support: glideinwms-support@fnal.gov