Installation of a scheduler node in the glideinWMS

1. Description

This node will be a Condor submit node for the user jobs.
You can install many such nodes to improve scalability of your system.

2. Hardware requirements

This machine needs a reasonably recent CPU and a large amount of memory (2+ GB recommended, ~1 MB per running job).
The amount of disk needed depends on the user jobs; Condor itself uses very little (5 GB should be enough).

3. Needed software

Any Condor-supported OS.
The OSG client software.
The Condor distribution.

4. Installation instructions

These instructions assume you are installing Condor v6.9.2 on a Linux machine, from tarballs, as root.
The install directory is /opt/glidecondor, the working directory is /opt/glidecondor/condor_local, the machine name is mymachine.fnal.gov, and the glidein pool collector is located at mypool.fnal.gov. The GCB nodes are located at IP addresses 131.225.70.222 and 131.225.70.224.
If you want to use a different setup, make the necessary changes.

4.1 Install Condor

Follow instructions in the Condor v6.9.2 installation document.

4.2 Make it part of the glidein pool

You will most probably want to install the scheduler on a separate node, so disable the collector, negotiator and startd on this node by adding the following line to /opt/glidecondor/condor_local/condor_config.local:
DAEMON_LIST   = MASTER, SCHEDD

and point the schedd to the glidein pool Collector, by adding the following line to /opt/glidecondor/etc/condor_config:
CONDOR_HOST    = mypool.fnal.gov


Note: If you instead prefer to colocate the schedulers with the glidein pool collector, follow the collector node installation instructions, and set in /opt/glidecondor/condor_local/condor_config.local:

DAEMON_LIST   = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD


4.3 Configure schedd settings

The default values are unusable for large pools.

Please add to /opt/glidecondor/etc/condor_config the following lines:
###############################
# Schedd settings
###############################
# Allow up to 2k concurrent running jobs
MAX_JOBS_RUNNING        = 2000
# Start 8 jobs every 2 seconds
JOB_START_DELAY = 2
JOB_START_COUNT = 8
# Prevent checking on ImageSize
APPEND_REQ_VANILLA = (Memory>=1)
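
As a rough sanity check of these settings, the arithmetic below estimates how long the schedd needs to ramp up to MAX_JOBS_RUNNING, and how much memory that many running jobs implies under the ~1 MB per running job rule of thumb from section 2. This is only a back-of-the-envelope sketch; the shell variables bursts, ramp_seconds and mem_mb are names introduced here for illustration and are not Condor settings.

```shell
#!/bin/bash
# Values copied from the schedd settings above.
MAX_JOBS_RUNNING=2000
JOB_START_DELAY=2   # seconds between bursts of job starts
JOB_START_COUNT=8   # jobs started in each burst

# Number of bursts needed to reach the limit (rounded up),
# then the approximate time spent starting jobs.
bursts=$(( (MAX_JOBS_RUNNING + JOB_START_COUNT - 1) / JOB_START_COUNT ))
ramp_seconds=$(( bursts * JOB_START_DELAY ))

# Rule of thumb from section 2: ~1 MB of memory per running job.
mem_mb=$(( MAX_JOBS_RUNNING * 1 ))

echo "ramp-up: about ${ramp_seconds}s, memory: about ${mem_mb} MB"
```

With the values above, this gives roughly 500 seconds of ramp-up and about 2000 MB of memory, which is consistent with the 2+ GB recommendation.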

4.4 Configure Condor GSI security

Follow the instructions in the "Configuring GSI security in Condor" document.

The /opt/glidecondor/certs/grid-mapfile must be populated with your own DN, the glidein pool service proxy DN and the DNs of the glideins.
Assuming the local service proxy DN is "/DC=org/DC=doegrids/OU=Service/CN=schedd2", the glidein pool Collector's DN is "/DC=org/DC=doegrids/OU=Service/CN=collector5", and you have a single glidein DN "/DC=org/DC=doegrids/OU=Service/CN=factory3", the grid-mapfile would contain something like:
"/DC=org/DC=doegrids/OU=Service/CN=schedd2" scondor
"/DC=org/DC=doegrids/OU=Service/CN=factory3" fcondor
"/DC=org/DC=doegrids/OU=Service/CN=collector5" ccondor
The names you specify after the DNs are not important, but they must differ from one another.
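
Since the mapped names must all be different, a quick check for duplicates can catch copy-and-paste mistakes. The sketch below is one possible way to do it, not part of glideinWMS: the helper find_duplicate_names is a hypothetical name, it assumes the mapped name is the last whitespace-separated field on each line, and it skips lines starting with # on the assumption that they are comments. It runs against a scratch copy of the example entries; on a real node you would pass /opt/glidecondor/certs/grid-mapfile instead.

```shell
#!/bin/bash
# Print every mapped name that appears on more than one line.
# Assumption: the mapped name is the last field of each non-comment line.
find_duplicate_names() {
    awk 'NF > 0 && $1 !~ /^#/ { print $NF }' "$1" | sort | uniq -d
}

# Scratch copy of the example grid-mapfile, for demonstration only.
mapfile_path=$(mktemp)
cat > "$mapfile_path" <<'EOF'
"/DC=org/DC=doegrids/OU=Service/CN=schedd2" scondor
"/DC=org/DC=doegrids/OU=Service/CN=factory3" fcondor
"/DC=org/DC=doegrids/OU=Service/CN=collector5" ccondor
EOF

duplicates=$(find_duplicate_names "$mapfile_path")
if [ -z "$duplicates" ]; then
    echo "OK: all mapped names are unique"
else
    echo "Duplicate mapped names: $duplicates"
fi
rm -f "$mapfile_path"
```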

Please note that users will be authenticated using their Unix UID. This implies that all submissions need to be local.
If you prefer to force GSI authentication for users, too (allowing remote submission), set
SEC_DEFAULT_AUTHENTICATION_METHODS = GSI
and insert all users' DNs in /opt/glidecondor/certs/grid-mapfile. In this case, if you start Condor as root, the mapped user names must correspond to actual Unix user names.

4.5 Configure Quill

Follow the instructions in the Condor Quill setup document.

4.6 Setup GCB routing tables

Create a file named /opt/glidecondor/etc/gcb-routing-table containing the addresses of the GCBs:
131.225.70.222/32 GCB
131.225.70.224/32 GCB
Then add the following lines to /opt/glidecondor/etc/condor_config:
#####################################
# Tell schedd daemons where is GCB
#####################################
SCHEDD.NET_REMAP_ENABLE=TRUE
SCHEDD.NET_REMAP_SERVICE=GCB
SCHEDD.NET_REMAP_ROUTE=/opt/glidecondor/etc/gcb-routing-table
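
If you have more than a couple of GCB nodes, the routing table described above can be generated from a list of addresses instead of being typed by hand. The following is a minimal sketch under that assumption; the variables GCB_IPS and ROUTING_TABLE are hypothetical names used only here, and the script writes to a scratch file by default so it can be tried safely. On a real node, set ROUTING_TABLE to /opt/glidecondor/etc/gcb-routing-table.

```shell
#!/bin/bash
# Space-separated list of GCB addresses (the two used in this document).
GCB_IPS="131.225.70.222 131.225.70.224"

# Target file; defaults to a scratch file so nothing real is overwritten.
ROUTING_TABLE=${ROUTING_TABLE:-$(mktemp)}

# One "address/32 GCB" line per GCB node, as in the example above.
for ip in $GCB_IPS; do
    echo "$ip/32 GCB"
done > "$ROUTING_TABLE"

cat "$ROUTING_TABLE"
```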

4.7 Setup multiple schedds

You will most probably need multiple schedds if you want to scale.
Configure the schedds schedd_jobs1, schedd_jobs2, schedd_jobs3 and schedd_jobs4 by following the instructions in the "Configuring multiple Schedds in Condor" document.

4.8 Example config files

You can find the complete Condor config files in
example-config/glide-schedd/mymachine/condor_config
and
example-config/glide-schedd/mymachine/condor_config.local.

5. Start Condor

If not already running, start PostgreSQL with:
/etc/init.d/postgresql start

To start all the schedulers, you need to run several scripts. To simplify this, create /opt/glidecondor/start_condor.sh, containing:
#!/bin/bash
/opt/glidecondor/sbin/condor_master
sleep 1
/opt/glidecondor/start_master_schedd.sh jobs1
/opt/glidecondor/start_master_schedd.sh jobs2
/opt/glidecondor/start_master_schedd.sh jobs3
/opt/glidecondor/start_master_schedd.sh jobs4
The same file can be downloaded from example-config/glide-schedd/start_condor.sh.

After you make it executable:
chmod a+x /opt/glidecondor/start_condor.sh
just run the new script:
/opt/glidecondor/start_condor.sh
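
If the number of schedds changes often, a loop-based variant of the start script may be easier to maintain. The sketch below is not the script shipped in example-config; it is a hypothetical alternative that assumes the same paths and schedd names as above. The CONDOR_DIR and DRY_RUN variables and the run and start_all helpers are introduced here for illustration. By default it only prints the commands it would run; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/bash
# Hypothetical loop-based variant of start_condor.sh.
CONDOR_DIR=${CONDOR_DIR:-/opt/glidecondor}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them

# Run a command, or just print it when in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

start_all() {
    # Start the main master first, then one secondary schedd per name.
    run "$CONDOR_DIR/sbin/condor_master"
    [ "$DRY_RUN" = "1" ] || sleep 1
    for name in jobs1 jobs2 jobs3 jobs4; do
        run "$CONDOR_DIR/start_master_schedd.sh" "$name"
    done
}

start_all
```

Adding or removing a schedd then only requires editing the list of names in the for loop.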

6. Submitting user jobs

From the user's point of view, this is just a regular Condor pool.

However, since the resources potentially come from all over the world, users need to create more complex Requirements, in order to prevent the jobs from landing on sites that cannot run their jobs.
Users will need to know which attributes the glideins publish and use them accordingly.

One useful attribute that all glideins publish is GLIDEIN_Site. If a user wants to restrict a job to a list of sites, this can be done using:
+DESIRED_Sites = "Site1,Site4,Site7,Site22"
Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)
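
To show where these two lines fit, the sketch below writes a small but complete submit file and prints the command that would submit it. The file names (myjob.sub, myjob.sh and the output, error and log files) are hypothetical placeholders, and the site names are the ones from the example above; replace them with values that are valid for your pool. The file is written to a scratch directory so the sketch can be tried without touching anything else.

```shell
#!/bin/bash
# Scratch directory for the demonstration; use your own job directory in practice.
job_dir=$(mktemp -d)
submit_file="$job_dir/myjob.sub"

# Write a minimal vanilla-universe submit file restricted to a list of sites.
cat > "$submit_file" <<'EOF'
universe = vanilla
executable = myjob.sh
output = myjob.out
error = myjob.err
log = myjob.log
+DESIRED_Sites = "Site1,Site4,Site7,Site22"
Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)
queue
EOF

# Show the command a user would run from a shell on the submit node.
echo "condor_submit $submit_file"
```

The job will then only match glideins whose GLIDEIN_Site value appears in the DESIRED_Sites list.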

