GlideinWMS The Glidein-based Workflow Management System

User Schedd

User Schedd Install

1. Description

This node will be an HTCondor submit node for the user jobs. You can install several such nodes to improve the scalability of your system.

2. Hardware requirements

CPUs   Memory                       Disk
1      min 2GB (16GB recommended)   ~5GB

This machine needs a reasonably recent CPU and a large amount of memory (minimum 2GB, recommended 16GB, about 1.5MB per running job). The amount of disk needed depends on the user jobs; HTCondor itself uses very little (5GB should be enough for HTCondor alone).
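
As a rough illustration of the sizing rule above, the following sketch estimates the memory needed from the expected number of running jobs. It uses the ~1.5MB per running job figure and the 2GB minimum quoted in this section; the helper name and the choice to treat the 2GB minimum as a floor are assumptions made for illustration, not part of GlideinWMS.

```python
MB_PER_RUNNING_JOB = 1.5    # figure quoted in this section
MIN_MEMORY_MB = 2 * 1024    # 2GB minimum quoted in this section


def estimate_memory_mb(running_jobs):
    """Return a rough memory estimate in MB (hypothetical helper).

    Assumption: the 2GB minimum acts as a floor, and memory above it
    grows linearly with the number of running jobs.
    """
    if running_jobs < 0:
        raise ValueError("running_jobs must be non-negative")
    return max(MIN_MEMORY_MB, running_jobs * MB_PER_RUNNING_JOB)


if __name__ == "__main__":
    for jobs in (1000, 10000):
        print(jobs, "running jobs ->", estimate_memory_mb(jobs), "MB")
```

Under these assumptions, 10,000 running jobs come to about 15,000MB, which is in line with the 16GB recommendation.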

3. Needed software

See the prerequisites page for a list of software requirements.

4. Before you begin...

4.1 Required Certificates/Proxies

The installer will ask for several DNs for GSI authentication. You have the option of using a service certificate or a proxy, which should be created and put in place before running the installer. The following is a list of DNs the installer will ask for:

  • Glidein Submitter cert/proxy DN
  • User Pool (Collector) cert/proxy DN
  • Glidein (VO) Frontend cert/proxy DN

Note: In some places the installer will also ask for nicknames to go with the DNs. These nicknames are the HTCondor UIDs used in the HTCondor configuration and mapfile. They do not matter when installing the user schedds. For more details, see the OSG documentation.

4.2 Miscellaneous Notes

By default, match authentication will be used.

5. Installation instructions

The scheduler node software should be installed as root. While it is possible to run the schedds as a non-privileged user, doing so has serious security implications; see the HTCondor manual for details. The whole process is managed by the install script manage-glideins, described below.

Move into

glideinWMS/install

and execute

./manage-glideins --install submit --ini /path/to/glideinWMS.ini

Attribute: install_type
Example: tarball or rpm
Description: If this is a VOFrontend RPM installation and you are doing a '--configure', then rpm should be specified. If this is a stand-alone Submit install, only tarball installations are supported. Valid values: tarball, rpm.

Attribute: hostname
Example: submitnode.domain.name
Description: Hostname of the Submit node.

Attribute: username
Example: condor (non-root account)
Description: UNIX user account that this service will run under. DO NOT use "root". For security purposes, this value should always be a non-root user.

Attribute: service_name
Example: submit
Description: Used as the 'nickname' for the GSI DN in the condor_mapfile of other services.

Attribute: condor_location
Example: /path/to/condor-submit
Description: Directory in which the condor software will be installed. IMPORTANT: The Submit can share the same instance of HTCondor as the Frontend. The condor_location must not be a subdirectory of the Frontend's install_location or logs_dir. They may share the same parent, however.

Attribute: x509_cert_dir
Example: /path/to/certificates-location
Description: The directory where the CA certificates are maintained. The installer will check for the presence of *.0 and *.r0 files. If the CAs are installed from the VDT distribution, this will be the VDT_LOCATION/globus/TRUSTED_CA directory.

Attribute: x509_cert
Example: /path-to-cert-location/cert.pem
Description: The location of the certificate file being used. This file must be owned by the user installing (starting/stopping) this service. Permissions should be 644 or 600.

Attribute: x509_key
Example: /path-to-cert-location/key.pem
Description: The location of the certificate key file associated with the certificate defined by the x509_cert option above. This file must be owned by the user installing (starting/stopping) this service. Permissions should be 600 or 400.

Attribute: x509_gsi_dn
Example: dn-subject-of-x509_cert-using-openssl
Description: This is the identity of the certificate used by this service to contact the other HTCondor-based GlideinWMS services. It is the subject of the certificate (x509_cert option), which can be obtained with:
    openssl x509 -subject -noout -in [x509_cert]
It is used to populate the GSI_DAEMON_NAME entry in the condor_config file and the condor_mapfile entries of this and the other GlideinWMS services as needed.

Attribute: condor_tarball
Example: /path/to/condor/tarballs/condor-8.7.6-x86_64_RedHat6-stripped.tar.gz
Description: Location of the condor tarball. The installation script will install condor from this tarball. It must be a gzipped tarball with a *.tar.gz name.

Attribute: condor_admin_email
Example: whomever@email.com
Description: The email address that receives HTCondor notifications in the event of a problem. Used in the condor_config.local only.

Attribute: number_of_schedds
Example: 5
Description: The desired number of schedds to be used. There must be at least 1 schedd.

Attribute: schedd_shared_port
Example: 9618
Description: Specifies the port number to be used by the shared port daemon for schedds. This can drastically reduce the number of ports used and thus improve scalability. The default port is 9618. If you install the user schedd on a separate host, this incoming TCP port needs to be open (it was 9615 for versions prior to GlideinWMS 3.4.1). For more detailed information, refer to "Advanced Condor Configuration - Multiple Schedds using condor_shared_port feature".

Attribute: install_vdt_client
Example: n
Description: Indicates whether an OSG/VDT client should be installed if it is not already present in the location given by the vdt_location option. This should be set to n so that the installer will not attempt to install the OSG Client. You must pre-install the OSG Client.

Attribute: vdt_location
Example: /path/to/glidein/vdt
Description: The location of the OSG/VDT client software. Used only if the install_vdt_client option is 'y'. Leave this blank (since the install_vdt_client option should always be 'n').

Attribute: glideinwms_location
Example: /path/to/glideinWMS
Description: Directory of the GlideinWMS software. Since this is an HTCondor service only, this software is used only during the installation process.
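
The ownership and permission requirements listed for x509_cert and x509_key can be checked before running the installer. The sketch below is one possible way to do that with the Python standard library; the helper names are hypothetical and the script is not part of the GlideinWMS installer.

```python
import os
import stat

CERT_MODES = {0o644, 0o600}   # permissions listed for x509_cert
KEY_MODES = {0o600, 0o400}    # permissions listed for x509_key


def check_mode(path, allowed_modes):
    """Return True if the permission bits of path are in allowed_modes."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode in allowed_modes


def check_owner(path, uid=None):
    """Return True if path is owned by uid (default: the effective user)."""
    if uid is None:
        uid = os.geteuid()
    return os.stat(path).st_uid == uid
```

For example, `check_mode("/path-to-cert-location/key.pem", KEY_MODES)` would return False for a key file left at mode 644.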

For example configuration files, see here.

The installer allows you to automatically start the HTCondor daemons. To start them on your own, source the condor env script and execute:

/path/to/condor/location/condor start

To stop the HTCondor daemons, source the condor env script and execute:

/path/to/condor/location/condor stop

6. Submitting user jobs

From the user point of view, this is just a regular HTCondor pool.

However, since the resources potentially come from all over the world, users need to write a more complex Requirements line in their JDL to prevent their jobs from landing on sites that cannot run them.

Users will need to know which attributes the glideins publish and use them accordingly.

One useful attribute that all glideins publish is GLIDEIN_Site. Users who want to restrict their jobs to a list of sites can do so with:

+DESIRED_Sites = "Site1,Site4,Site7,Site22"
Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)
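
To make the matching behavior of this expression concrete, the following sketch emulates it in Python. It assumes that stringListMember compares whole list items case-sensitively and splits the list on commas and spaces by default; the function names are hypothetical, and this is an illustration rather than HTCondor's implementation.

```python
import re


def string_list_member(item, string_list, delimiters=" ,"):
    """Emulate a whole-item, case-sensitive membership test.

    Assumption: any character in delimiters separates list items.
    """
    pattern = "[" + "".join(re.escape(d) for d in delimiters) + "]+"
    members = [m for m in re.split(pattern, string_list) if m]
    return item in members


def job_can_run_on(glidein_site, desired_sites):
    """Mirror Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)."""
    return string_list_member(glidein_site, desired_sites)
```

Under these assumptions a glidein publishing GLIDEIN_Site = "Site2" would not match the list above, even though "Site2" is a prefix of "Site22", because whole items are compared.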

Some glideins may also need to properly identify the final user, using GSI authentication. A user should thus add the following line:

x509userproxy = <path to X509 proxy>

to their HTCondor submission file.
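
Putting the pieces of this section together, the sketch below generates a minimal submission file containing the site restriction and the proxy line. The helper name, the vanilla universe, and the paths in the usage example are illustrative assumptions; adapt them to your own jobs.

```python
def build_submit_description(executable, desired_sites, proxy_path):
    """Return the text of a minimal HTCondor submission file.

    desired_sites is a list of site names as published in GLIDEIN_Site.
    """
    lines = [
        "universe = vanilla",
        "executable = " + executable,
        '+DESIRED_Sites = "' + ",".join(desired_sites) + '"',
        "Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)",
        "x509userproxy = " + proxy_path,
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

Calling `build_submit_description("/path/to/job.sh", ["Site1", "Site4"], "/path/to/proxy")` returns text that can be written to a file and passed to condor_submit.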

7. Fine Tuning for Large Scale Installations

7.1 Increase the number of available file descriptors

The number of ports used by the condor_schedd process increases as the number of jobs running or queued in the schedd increases. The default number of file descriptors per process is 1024 on most systems. Increase this limit to ~16k, or to a value higher than the number of jobs that might be in the queue at any given time. This is particularly important for large scale installations.

In most default installations, the user schedd is configured to start as root and is started through the script in /etc/init.d/condor. This is a good place to set a higher file descriptor limit for the schedd process.
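
To verify that a new limit actually reached a running schedd, one option on Linux is to read the "Max open files" line of /proc/<pid>/limits. The sketch below assumes that Linux-specific file layout; the helper names are hypothetical, and the schedd pid has to be looked up separately.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A limit reported as "unlimited" is returned as None.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' line found")


def soft_limit_sufficient(soft_limit, expected_queued_jobs):
    """True if the soft limit exceeds the expected number of queued jobs."""
    return soft_limit is None or soft_limit > expected_queued_jobs


def schedd_limit_ok(pid, expected_queued_jobs):
    """Check the limit of a running process by pid (Linux only)."""
    with open("/proc/%d/limits" % pid) as limits_file:
        soft, _hard = parse_max_open_files(limits_file.read())
    return soft_limit_sufficient(soft, expected_queued_jobs)
```

With the default soft limit of 1024, `soft_limit_sufficient(1024, 16000)` returns False, which signals that the limit should be raised before the queue grows to that size.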