GlideinWMS The Glidein-based Workflow Management System


Advanced Condor Configuration

The installation will assume you have installed HTCondor v7.0.5+.
For the purposes of the examples shown here, the HTCondor install location is /opt/glidecondor.
The working directory is /opt/glidecondor/condor_local and the machine name is mymachine.fnal.gov.
If you want to use a different setup, make the necessary changes. If you installed HTCondor via RPMs, the configuration files live in a different location (typically /etc/condor/config.d/): see the OSG guide or the OSG pages about the Frontend and Factory.

Multiple Schedds

Note: If you specified any of these options using the GlideinWMS configuration-based installer, these files and initialization steps will already have been performed. These instructions are relevant to any post-installation changes you desire to make.

Unless explicitly mentioned, all operations are to be done by the user you installed HTCondor as.

Increase the number of available file descriptors

When using multiple schedds, you may want to consider increasing the number of available file descriptors. This can be done by issuing a "ulimit -n" command as well as by changing the values in the /etc/security/limits.conf file.
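For example, a minimal sketch (the condor user name and the limit values are illustrative; adjust them for your site):

# check the current per-process limit in the shell that starts HTCondor
ulimit -n
# raise it persistently by adding lines like these to /etc/security/limits.conf
condor  soft  nofile  65536
condor  hard  nofile  65536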

Using the condor_shared_port feature

The HTCondor shared_port_daemon is available in Condor 7.5.3+.

GlideinWMS V2.5.2+

Additional information on this daemon can be found in the HTCondor manual.

Your /opt/glidecondor/condor_config.d/02_gwms_schedds.config will need to contain the following attributes. Port 9615 is the default port for the schedds.

#-- Enable shared_port_daemon
SHADOW.USE_SHARED_PORT = True
SCHEDD.USE_SHARED_PORT = True
SHARED_PORT_MAX_WORKERS = 1000
SCHEDD.SHARED_PORT_ARGS = -p 9615
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
Note: Both the SCHEDD and SHADOW processes need to specify that the shared port option is in effect.

GlideinWMS V2.5.1 and earlier

Additional information on this daemon can be found in the HTCondor manual.

If you are using this feature, there are 3 additional variables that must be added to the schedd setup script described in the create setup files section:

_CONDOR_USE_SHARED_PORT
_CONDOR_SHARED_PORT_DAEMON_AD_FILE
_CONDOR_DAEMON_SOCKET_DIR
In addition, your /opt/glidecondor/condor_local/condor_config.local will need to contain the following attributes. Port 9615 is the default port for the schedds.
#-- Enable shared_port_daemon
SHADOW.USE_SHARED_PORT = True
SCHEDD.USE_SHARED_PORT = True
SHARED_PORT_MAX_WORKERS = 1000
SCHEDD.SHARED_PORT_ARGS = -p 9615
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
Note: Both the SCHEDD and SHADOW processes need to specify that the shared port option is in effect.

Multiple Schedds in GlideinWMS V2.5.2+

The following needs to be added to your Condor config file for each additional schedd desired. Note the numeric suffix used to distinguish each schedd.

If the multiple schedds are being used on your WMS Collector, Condor-G is used to submit the glidein pilot jobs, and the SCHEDD(GLIDEINS/JOBS)2_ENVIRONMENT attribute shown below is required. Otherwise, it should be omitted.

Effective with Condor 7.7.5+, the JOB_QUEUE_LOG attribute is required.

For the WMS Collector:
SCHEDDGLIDEINS2 = $(SCHEDD)
SCHEDDGLIDEINS2_ARGS = -local-name scheddglideins2
SCHEDD.SCHEDDGLIDEINS2.SCHEDD_NAME = schedd_glideins2
SCHEDD.SCHEDDGLIDEINS2.SCHEDD_LOG = $(LOG)/SchedLog.$(SCHEDD.SCHEDDGLIDEINS2.SCHEDD_NAME)
SCHEDD.SCHEDDGLIDEINS2.LOCAL_DIR = $(LOCAL_DIR)/$(SCHEDD.SCHEDDGLIDEINS2.SCHEDD_NAME)
SCHEDD.SCHEDDGLIDEINS2.EXECUTE = $(SCHEDD.SCHEDDGLIDEINS2.LOCAL_DIR)/execute
SCHEDD.SCHEDDGLIDEINS2.LOCK = $(SCHEDD.SCHEDDGLIDEINS2.LOCAL_DIR)/lock
SCHEDD.SCHEDDGLIDEINS2.PROCD_ADDRESS = $(SCHEDD.SCHEDDGLIDEINS2.LOCAL_DIR)/procd_pipe
SCHEDD.SCHEDDGLIDEINS2.SPOOL = $(SCHEDD.SCHEDDGLIDEINS2.LOCAL_DIR)/spool
SCHEDD.SCHEDDGLIDEINS2.JOB_QUEUE_LOG = $(SCHEDD.SCHEDDGLIDEINS2.SPOOL)/job_queue.log ## Note: Required with Condor 7.7.5+
SCHEDD.SCHEDDGLIDEINS2.SCHEDD_ADDRESS_FILE = $(SCHEDD.SCHEDDGLIDEINS2.SPOOL)/.schedd_address
SCHEDD.SCHEDDGLIDEINS2.SCHEDD_DAEMON_AD_FILE = $(SCHEDD.SCHEDDGLIDEINS2.SPOOL)/.schedd_classad
SCHEDDGLIDEINS2_SPOOL_DIR_STRING = "$(SCHEDD.SCHEDDGLIDEINS2.SPOOL)"
SCHEDD.SCHEDDGLIDEINS2.SCHEDD_EXPRS = SPOOL_DIR_STRING
SCHEDDGLIDEINS2_ENVIRONMENT = "_CONDOR_GRIDMANAGER_LOG=$(LOG)/GridManagerLog.$(SCHEDD.SCHEDDGLIDEINS2.SCHEDD_NAME).$(USERNAME)"
DAEMON_LIST = $(DAEMON_LIST), SCHEDDGLIDEINS2
DC_DAEMON_LIST = + SCHEDDGLIDEINS2

For the User Submit host:
SCHEDDJOBS2 = $(SCHEDD)
SCHEDDJOBS2_ARGS = -local-name scheddjobs2
SCHEDD.SCHEDDJOBS2.SCHEDD_NAME = schedd_jobs2
SCHEDD.SCHEDDJOBS2.SCHEDD_LOG = $(LOG)/SchedLog.$(SCHEDD.SCHEDDJOBS2.SCHEDD_NAME)
SCHEDD.SCHEDDJOBS2.LOCAL_DIR = $(LOCAL_DIR)/$(SCHEDD.SCHEDDJOBS2.SCHEDD_NAME)
SCHEDD.SCHEDDJOBS2.EXECUTE = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/execute
SCHEDD.SCHEDDJOBS2.LOCK = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/lock
SCHEDD.SCHEDDJOBS2.PROCD_ADDRESS = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/procd_pipe
SCHEDD.SCHEDDJOBS2.SPOOL = $(SCHEDD.SCHEDDJOBS2.LOCAL_DIR)/spool
SCHEDD.SCHEDDJOBS2.JOB_QUEUE_LOG = $(SCHEDD.SCHEDDJOBS2.SPOOL)/job_queue.log ## Note: Required with Condor 7.7.5+
SCHEDD.SCHEDDJOBS2.SCHEDD_ADDRESS_FILE = $(SCHEDD.SCHEDDJOBS2.SPOOL)/.schedd_address
SCHEDD.SCHEDDJOBS2.SCHEDD_DAEMON_AD_FILE = $(SCHEDD.SCHEDDJOBS2.SPOOL)/.schedd_classad
SCHEDDJOBS2_SPOOL_DIR_STRING = "$(SCHEDD.SCHEDDJOBS2.SPOOL)"
SCHEDD.SCHEDDJOBS2.SCHEDD_EXPRS = SPOOL_DIR_STRING
DAEMON_LIST = $(DAEMON_LIST), SCHEDDJOBS2
DC_DAEMON_LIST = + SCHEDDJOBS2

The directories and files referenced by the attributes defined above will need to be created:

LOCAL_DIR
EXECUTE
SPOOL
LOCK

A script is available to do this for you, given the attributes are defined with the naming convention shown. If the directories already exist, it will verify their existence and ownership. If they do not exist, they will be created.

source /opt/glidecondor/condor.sh
GLIDEINWMS_LOCATION/install/services/init_schedd.sh
(sample output)
Validating schedd: SCHEDDJOBS2
Processing schedd: SCHEDDJOBS2
SCHEDD.SCHEDDJOBS2.LOCAL_DIR: /opt/glidecondor/condor_local/schedd_jobs2
... created
SCHEDD.SCHEDDJOBS2.EXECUTE: /opt/glidecondor/condor_local/schedd_jobs2/execute
... created
SCHEDD.SCHEDDJOBS2.SPOOL: /opt/glidecondor/condor_local/schedd_jobs2/spool
... created
SCHEDD.SCHEDDJOBS2.LOCK: /opt/glidecondor/condor_local/schedd_jobs2/lock
... created

Multiple Schedds in GlideinWMS V2.5.1 and earlier

Create setup files

If not already created during installation, you will need to create a set of files to support multiple schedds. This section describes the necessary steps.

/opt/glidecondor/new_schedd_setup.sh (example new_schedd_setup.sh)
1. This script adds the necessary attributes to the environment when the schedds are initialized and started.

if [ $# -ne 1 ]
then
 echo "ERROR: arg1 should be schedd name"
 return 1
fi
LD=`condor_config_val LOCAL_DIR`
export _CONDOR_SCHEDD_NAME=schedd_$1
export _CONDOR_MASTER_NAME=${_CONDOR_SCHEDD_NAME}
# SCHEDD and MASTER names MUST be the same (Condor requirement)
export _CONDOR_DAEMON_LIST="MASTER, SCHEDD, QUILL"
export _CONDOR_LOCAL_DIR=$LD/$_CONDOR_SCHEDD_NAME
export _CONDOR_LOCK=$_CONDOR_LOCAL_DIR/lock
#-- condor_shared_port attributes ---
export _CONDOR_USE_SHARED_PORT=True
export _CONDOR_SHARED_PORT_DAEMON_AD_FILE=$LD/log/shared_port_ad
export _CONDOR_DAEMON_SOCKET_DIR=$LD/log/daemon_sock
#------------------------------------
unset LD

The same file can be downloaded from example-config/multi_schedd/new_schedd_setup.sh.
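Note that this file is not run directly; the init and start scripts below source it with the schedd name suffix as the argument, e.g.:

source /opt/glidecondor/new_schedd_setup.sh jobs1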

/opt/glidecondor/init_schedd.sh (example init_schedd.sh)
1. This script creates the necessary directories and files for the additional schedds. It will only be used to initialize a new secondary schedd (see the initialize schedds section).

#!/bin/sh
CONDOR_LOCATION=/opt/glidecondor
script=$CONDOR_LOCATION/new_schedd_setup.sh
source $script $1
if [ "$?" != "0" ];then
 echo "ERROR in $script"
 exit 1
fi
# add whatever other config you need
# create needed directories
$CONDOR_LOCATION/sbin/condor_init
2. This needs to be made executable by the user that installed Condor:
chmod u+x /opt/glidecondor/init_schedd.sh
The same file can be downloaded from example-config/multi_schedd/init_schedd.sh.

/opt/glidecondor/start_master_schedd.sh (example start_master_schedd.sh)
1. This script is used to start the secondary schedds (see the starting up schedds section).

#!/bin/sh
CONDOR_LOCATION=/opt/glidecondor
export CONDOR_CONFIG=$CONDOR_LOCATION/etc/condor_config
source $CONDOR_LOCATION/new_schedd_setup.sh $1
# add whatever other config you need
$CONDOR_LOCATION/sbin/condor_master
2. This needs to be made executable by the user that installed Condor:
chmod u+x /opt/glidecondor/start_master_schedd.sh
The same file can be downloaded from example-config/multi_schedd/start_master_schedd.sh.

Initialize schedds

To initialize the secondary schedds, use /opt/glidecondor/init_schedd.sh created above.

If you came here from another document, make sure you configure the schedds specified there.

For example, supposing you want to create schedds named schedd_jobs1, schedd_jobs2 and schedd_glideins1, you would run:

/opt/glidecondor/init_schedd.sh jobs1
/opt/glidecondor/init_schedd.sh jobs2
/opt/glidecondor/init_schedd.sh glideins1

Starting up schedds

If you came to this document as part of another installation, go back and follow those instructions.

Otherwise, when you are ready, you can start the schedds by running the /opt/glidecondor/start_master_schedd.sh script created above.

For example, supposing you want to start schedds named schedd_jobs1, schedd_jobs2 and schedd_glideins1, you would run:

/opt/glidecondor/start_master_schedd.sh jobs1
/opt/glidecondor/start_master_schedd.sh jobs2
/opt/glidecondor/start_master_schedd.sh glideins1
Note: Always start them after you have started the Collector.

Submission and monitoring

The secondary schedds can be seen by issuing

condor_status -schedd
To submit to or query a secondary schedd, you need to use the -name option, like:
condor_submit -name schedd_jobs1@ job.jdl
condor_q -name schedd_jobs1@


Multiple Collectors for Scalability

For scalability purposes, this section describes the configuration steps necessary to add additional (secondary) HTCondor collectors for the WMS and/or User Collectors.

Note: If you specified any of these options using the GlideinWMS configuration based installer, these files and initialization steps will already have been performed. These instructions are relevant to any post-installation changes you desire to make.

Important: When secondary (additional) collectors are added to either the WMS Collector or User Collector, changes must also be made to the Frontend configurations so they are made aware of them.

HTCondor configuration changes

For each secondary collector, the following Condor attributes are required:

COLLECTORnn = $(COLLECTOR)
COLLECTORnn_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/CollectornnLog"
COLLECTORnn_ARGS = -f -p port_number
DAEMON_LIST = $(DAEMON_LIST), COLLECTORnn

In the above example, nn is an arbitrary value used to uniquely identify each secondary collector. Each secondary collector must also have a unique port_number.
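For instance, a sketch for a single secondary collector (the suffix 01 and port 9619 are illustrative):

COLLECTOR01 = $(COLLECTOR)
COLLECTOR01_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector01Log"
COLLECTOR01_ARGS = -f -p 9619
DAEMON_LIST = $(DAEMON_LIST), COLLECTOR01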

After these changes have been made in your Condor configuration file, restart HTCondor for the change to take effect. You will see these collector processes running (this example has 5 secondary collectors):

user 17732 1 0 13:34 ? 00:00:00 /usr/local/glideins/separate-no-privsep-7-6/condor-userpool/sbin/condor_master
user 17735 17732 0 13:34 ? 00:00:00 condor_collector -f primary
user 17736 17732 0 13:34 ? 00:00:00 condor_negotiator -f
user 17737 17732 0 13:34 ? 00:00:00 condor_collector -f -p 9619 secondary
user 17738 17732 0 13:34 ? 00:00:00 condor_collector -f -p 9620 secondary
user 17739 17732 0 13:34 ? 00:00:00 condor_collector -f -p 9621 secondary
user 17740 17732 0 13:34 ? 00:00:00 condor_collector -f -p 9622 secondary
user 17741 17732 0 13:34 ? 00:00:00 condor_collector -f -p 9623 secondary


Multiple Collectors for High Availability (HA)

For reliability purposes, you may want to utilize Condor's High Availability (HA) feature for collectors.
Note: This is only supported in GlideinWMS v2.6+, and only for the User pool collector and Frontend.

The Condor configuration of additional (secondary) collectors is the same as in the previous section, Multiple Collectors for Scalability. Refer to the HTCondor manual section on High Availability of the Central Manager for additional configuration requirements.
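As a rough illustration only (the host names and ports here are placeholders; the HTCondor manual section referenced above is authoritative for the full set of required attributes), an HA pair of central managers looks roughly like:

CENTRAL_MANAGER1 = cm1.mydomain.org
CENTRAL_MANAGER2 = cm2.mydomain.org
CONDOR_HOST = $(CENTRAL_MANAGER1), $(CENTRAL_MANAGER2)
#-- run the HAD and REPLICATION daemons on both central managers
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD, REPLICATION
HAD_LIST = $(CENTRAL_MANAGER1):51450, $(CENTRAL_MANAGER2):51450
REPLICATION_LIST = $(CENTRAL_MANAGER1):51451, $(CENTRAL_MANAGER2):51451
HAD_USE_PRIMARY = True
HAD_USE_REPLICATION = True
#-- let HAD decide which machine runs the negotiator
MASTER_NEGOTIATOR_CONTROLLER = HAD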

Important: When the Condor High Availability feature is used in the User Collector, changes must also be made to the Frontend configurations so it is made aware of them.

Installing Quill

The HTCondor manual section about Quill may have more up-to-date instructions than this section.

Required software

Installation instructions

The installation will assume you have installed HTCondor v7.0.5 or newer.

The install directory is /opt/glidecondor, the working directory is /opt/glidecondor/condor_local, the machine name is mymachine.fnal.gov, and its IP is 131.225.70.222.

If you want to use a different setup, make the necessary changes.

Unless explicitly mentioned, all operations are to be done as root.

Obtain and install PostgreSQL RPMs

Most Linux distributions come with very old versions of PostgreSQL, so you will want to download the latest version.

The RPMs can be found on http://www.postgresql.org/ftp/binary/

At the time of writing, the latest version is v8.2.4, and the RPM files to install are

postgresql-8.2.4-1PGDG.i686.rpm
postgresql-libs-8.2.4-1PGDG.i686.rpm
postgresql-server-8.2.4-1PGDG.i686.rpm
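
Then install them as root, for example:

rpm -ivh postgresql-libs-8.2.4-1PGDG.i686.rpm \
    postgresql-8.2.4-1PGDG.i686.rpm \
    postgresql-server-8.2.4-1PGDG.i686.rpm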

Initialize PostgreSQL

Switch to user postgres:

su - postgres
And initialize the database with:
initdb -A "ident sameuser" -D /var/lib/pgsql/data

Configure PostgreSQL

PostgreSQL by default only accepts local connections, so you need to configure it in order for Quill to use it.

Please do it as user postgres.

To enable TCP/IP traffic, you need to change listen_addresses in /var/lib/pgsql/data/postgresql.conf to:

# Make it listen to TCP ports
listen_addresses = '*'

Moreover, you need to specify which machines will be able to access it.
Unless you have strict security policies forbidding this, I recommend enabling read access to the whole world by adding the following line to /var/lib/pgsql/data/pg_hba.conf:

host    all     quillreader     0.0.0.0/0        md5
On the other hand, we want only the local machine to be able to write to the database. So, we will add to /var/lib/pgsql/data/pg_hba.conf:
host    all     quillwriter     131.225.70.222/32   md5

Start PostgreSQL

To start PostgreSQL, just run:
/etc/init.d/postgresql start
There should be no error messages.
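
Optionally, to have PostgreSQL start automatically at boot (a sketch for init-script based systems of this vintage; the service name may differ on your distribution):

chkconfig postgresql on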

Initialize Quill users

Switch to user postgres:
su - postgres
And initialize the Quill users with:
createuser quillreader --no-createdb --no-adduser --no-createrole --pwprompt
# when prompted, enter the password: reader
createuser quillwriter --createdb --no-adduser --no-createrole --pwprompt
# when prompted, enter the password: <writer passwd>
psql -c "REVOKE CREATE ON SCHEMA public FROM PUBLIC;"
psql -d template1 -c "REVOKE CREATE ON SCHEMA public FROM PUBLIC;"
psql -d template1 -c "GRANT CREATE ON SCHEMA public TO quillwriter; GRANT USAGE ON SCHEMA public TO quillwriter;"

Configure Condor

Append the following lines to /opt/glidecondor/etc/condor_config:
#############################
# Quill settings
#############################
QUILL_ENABLED = TRUE
QUILL_NAME = quill@$(FULL_HOSTNAME)
QUILL_DB_NAME = $(HOSTNAME)
QUILL_DB_QUERY_PASSWORD = reader
QUILL_DB_IP_ADDR = $(HOSTNAME):5432
QUILL_MANAGE_VACUUM = TRUE
In /opt/glidecondor/condor_local/condor_config.local, add QUILL to DAEMON_LIST, getting something like:
DAEMON_LIST                     = MASTER, QUILL, SCHEDD
Finally, put the writer password into /opt/glidecondor/condor_local/spool/.quillwritepassword:
echo "<writer passwd>" > /opt/glidecondor/condor_local/spool/.quillwritepassword
chown condor /opt/glidecondor/condor_local/spool/.quillwritepassword
chmod go-rwx /opt/glidecondor/condor_local/spool/.quillwritepassword
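
After these changes, restart HTCondor so the master picks up the new configuration and starts the QUILL daemon (a minimal sketch, assuming the install location used above):

source /opt/glidecondor/condor.sh
condor_restart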