GlideinWMS The Glidein-based Workflow Management System

User Schedd Install

1. Description

This node is an HTCondor submit node for user jobs. You can install several such nodes to improve the scalability of your system.

2. Hardware requirements

  • CPUs: 1
  • Memory: 2 GB minimum (16 GB recommended)
  • Disk: ~5 GB

This machine needs a reasonably recent CPU and a large amount of memory (2 GB minimum, 16 GB recommended; budget roughly 1.5 MB per running job). The amount of disk needed depends on the user jobs; Condor itself uses very little (5 GB should be enough for Condor alone).
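As a rough sizing aid, the ~1.5 MB-per-running-job figure can be turned into a quick estimate. This is a back-of-the-envelope sketch, not a GlideinWMS tool: the function name is hypothetical, it works in integer megabytes, and it applies the 2 GB minimum as a floor.

```sh
#!/bin/sh
# Hypothetical memory estimate for a user schedd node: ~1.5 MB per running
# job, never below the 2 GB (2048 MB) minimum. Integer MB arithmetic.
estimate_schedd_memory_mb() {
    running_jobs=$1
    mb=$(( running_jobs * 3 / 2 ))   # 1.5 MB per running job
    if [ "$mb" -lt 2048 ]; then      # apply the 2 GB minimum
        mb=2048
    fi
    echo "$mb"
}

# Example: estimate_schedd_memory_mb 10000   prints 15000
```

For example, 10,000 running jobs come to about 15,000 MB, which is in line with the 16 GB recommendation.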

3. Needed software

See the prerequisites page for a list of software requirements.

4. Before you begin...

4.1 Required Certificates/Proxies

The installer will ask for several DNs used for GSI authentication. For each one you can use either a service certificate or a proxy; create them and put them in place before running the installer. The installer will ask for the following DNs:

  • Glidein Submitter cert/proxy DN
  • User Pool (Collector) cert/proxy DN
  • Glidein (VO) Frontend cert/proxy DN
Note: In some places the installer will also ask for a nickname to go with each DN. The nickname is the HTCondor UID used in the HTCondor configuration and in the condor_mapfile. Nicknames do not matter when installing the user schedds.
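Before starting the installer it can help to collect the DNs listed above in one place. The sketch below assumes the openssl command-line tool is installed; the function names and the certificate paths in the example comment are hypothetical. The -nameopt compat option asks OpenSSL for the traditional slash-separated form (/DC=.../CN=...), and the sed call strips the leading "subject=" label. For a proxy, OpenSSL reads the first certificate in the file, whose subject carries extra proxy CN components appended to the user's DN.

```sh
#!/bin/sh
# Hypothetical helpers for collecting the DNs the installer asks for.
# Assumes the openssl CLI is on PATH.

# Remove the "subject=" label (with or without a following space) from a
# line printed by "openssl x509 -subject".
strip_subject_label() {
    sed -e 's/^subject= *//'
}

# Print the subject DN of a PEM certificate (or of the first certificate
# in a proxy file) in /DC=.../CN=... form.
print_dn() {
    openssl x509 -in "$1" -noout -subject -nameopt compat | strip_subject_label
}

# Example (paths are placeholders):
#   for f in /path/to/submitter-cert.pem /path/to/collector-cert.pem; do
#       printf '%s: %s\n' "$f" "$(print_dn "$f")"
#   done
```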

4.2 Miscellaneous Notes

By default, match authentication will be used.

5. Installation instructions

The scheduler node software should be installed as root. It is possible to run the schedds as a non-privileged user, but doing so has serious security implications; see the HTCondor manual for details. The whole process is managed by the install script manage_glideins, described below.

Move into the glideinWMS/install directory and execute:

./manage_glideins --install submit --ini /path/to/glideinWMS.ini
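Because the installer is meant to run as root, it can be convenient to fail early when that is not the case. The sketch below is a hypothetical pre-flight check, not part of manage_glideins; the function name and its return codes (1 when not running as root, 2 when the ini file cannot be read) are illustrative assumptions.

```sh
#!/bin/sh
# Hypothetical pre-flight check to run before manage_glideins.
# Returns 1 if not running as root, 2 if the ini file is not a readable file.
preflight_submit_install() {
    ini=$1
    if [ "$(id -u)" -ne 0 ]; then
        echo "error: the Submit node software should be installed as root" >&2
        return 1
    fi
    if [ ! -f "$ini" ] || [ ! -r "$ini" ]; then
        echo "error: cannot read ini file: $ini" >&2
        return 2
    fi
    return 0
}

# Example use, from the glideinWMS/install directory:
#   preflight_submit_install /path/to/glideinWMS.ini &&
#       ./manage_glideins --install submit --ini /path/to/glideinWMS.ini
```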

The following attributes in the ini file control the Submit installation:

  • install_type (example: tarball or rpm)
    If this is a VOFrontend RPM installation and you are doing a '--configure', then rpm should be specified. If this is a stand-alone Submit install, only tarball installations are supported. Valid values: tarball, rpm.
  • hostname (example: submitnode.domain.name)
    Hostname of the Submit node.
  • username (example: condor, a non-root account)
    UNIX user account that this service will run under. DO NOT use "root": for security purposes, this value should always be a non-root user.
  • service_name (example: submit)
    Used as the 'nickname' for the GSI DN in the condor_mapfile of the other services.
  • condor_location (example: /path/to/condor-submit)
    Directory in which the Condor software will be installed. IMPORTANT: the Submit node can share the same instance of Condor as the Frontend. The condor_location must not be a subdirectory of the Frontend's install_location or logs_dir; they may, however, share the same parent directory.
  • x509_cert_dir (example: /path/to/certificates-location)
    The directory where the CA certificates are maintained. The installer will check for the presence of *.0 and *.r0 files. If the CAs are installed from the VDT distribution, this will be the VDT_LOCATION/globus/TRUSTED_CA directory.
  • x509_cert (example: /path-to-cert-location/cert.pem)
    Location of the certificate file being used. This file must be owned by the user installing (starting/stopping) this service. Permissions should be 644 or 600.
  • x509_key (example: /path-to-cert-location/key.pem)
    Location of the key file associated with the certificate defined by the x509_cert option above. This file must be owned by the user installing (starting/stopping) this service. Permissions should be 600 or 400.
  • x509_gsi_dn (example: dn-subject-of-x509_cert-using-openssl)
    The identity of the certificate used by this service to contact the other Condor-based glideinWMS services. This is the subject of the certificate given in the x509_cert option, which you can print with:
        openssl x509 -subject -noout -in [x509_cert]
    It is used to populate the GSI_DAEMON_NAME setting in condor_config and the condor_mapfile entries of this and the other glideinWMS services as needed.
  • condor_tarball (example: /path/to/condor/tarballs/condor-7.5.0-linux-x86-rhel3-dynamic.tar.gz)
    Location of the Condor tarball. The installation script installs Condor from this tarball. It must be a gzipped tarball with a *.tar.gz name.
  • condor_admin_email (example: whomever@email.com)
    Email address that receives Condor notifications in the event of a problem. Used only in condor_config.local.
  • number_of_schedds (example: 5)
    The desired number of schedds. There must be at least 1 schedd.
  • schedd_shared_port (example: 9615)
    Port number used by the shared port daemon for the schedds. This is only available in HTCondor 7.5.3+. It can drastically reduce the number of ports used and thus improves scalability. The default port is 9615. Leave this option blank if you do not wish to use this feature or if it is not supported by the version of Condor being used. For more information on the shared_port_daemon, see the GlideinWMS - Advanced Condor Configuration manual.
  • install_vdt_client (example: n)
    Indicates whether an OSG/VDT client should be installed if one is not already present in the vdt_location directory. Set this to n so that the installer does not attempt to install the OSG Client; you must pre-install the OSG Client yourself.
  • vdt_location (example: /path/to/glidein/vdt)
    Location of the OSG/VDT client software. Used only if the install_vdt_client option is 'y'. Leave this blank, since install_vdt_client should always be 'n'.
  • glideinwms_location (example: /path/to/glideinWMS)
    Directory of the glideinWMS software. Since this is an HTCondor-only service, this software is only used during the installation process.
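Several of the attributes described above (x509_cert, x509_key, x509_cert_dir and condor_tarball) come with concrete requirements that can be checked before running the installer. The sketch below is a hypothetical set of checks, not the installer's own validation; it assumes GNU coreutils (for stat -c) and a tar that understands gzip, and the function names are illustrative.

```sh
#!/bin/sh
# Hypothetical pre-install checks for some of the ini values.
# Assumes GNU coreutils (stat -c) and tar with gzip support.

# check_owner_and_mode FILE MODE...
# Succeeds if FILE is a regular file owned by the current user and its
# octal permission bits equal one of the listed MODEs.
check_owner_and_mode() {
    file=$1; shift
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2; return 1
    fi
    if [ "$(stat -c %u "$file")" != "$(id -u)" ]; then
        echo "not owned by the installing user: $file" >&2; return 1
    fi
    mode=$(stat -c %a "$file")
    for allowed in "$@"; do
        [ "$mode" = "$allowed" ] && return 0
    done
    echo "permissions $mode on $file, expected one of: $*" >&2
    return 1
}

check_x509_cert() { check_owner_and_mode "$1" 644 600; }
check_x509_key()  { check_owner_and_mode "$1" 600 400; }

# Succeeds if the directory holds at least one *.0 (CA certificate) and
# one *.r0 (CRL) file, similar to the check the installer performs.
check_x509_cert_dir() {
    cadir=$1
    set -- "$cadir"/*.0
    [ -e "$1" ] || { echo "no *.0 files in $cadir" >&2; return 1; }
    set -- "$cadir"/*.r0
    [ -e "$1" ] || { echo "no *.r0 files in $cadir" >&2; return 1; }
}

# Succeeds if the file has a .tar.gz name and can be listed as a gzipped tarball.
check_condor_tarball() {
    case $1 in
        *.tar.gz) ;;
        *) echo "expected a *.tar.gz file: $1" >&2; return 1 ;;
    esac
    tar -tzf "$1" >/dev/null 2>&1 || { echo "not a readable gzipped tarball: $1" >&2; return 1; }
}
```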

For example configuration files, see here.

The installer allows you to automatically start the HTCondor daemons. To start them on your own, source the condor env script and execute:

/path/to/condor/location/condor start

To stop the Condor daemons, source the condor env script and execute:

/path/to/condor/location/condor stop

6. Submitting user jobs

From the user point of view, this is just a regular HTCondor pool.

However, since the resources may come from sites all over the world, users need to write more specific Requirements expressions in their submit files (JDL) to keep their jobs from landing on sites that cannot run them.

Users will need to know which attributes the glideins publish and use them accordingly.

One useful attribute that all glideins publish is GLIDEIN_Site. To restrict a job to a list of sites, a user can add:

+DESIRED_Sites = "Site1,Site4,Site7,Site22"
Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)

Some glideins may also need to identify the final user through GSI authentication. Users should therefore add the following line to their HTCondor submit file:

x509userproxy = <path to X509 proxy>
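The two snippets above can be combined into one complete submit description. The sketch below writes such a file from a shell function; the executable myjob.sh, the output, error and log file names, and the proxy path used in the example are hypothetical placeholders, and the function names are illustrative only.

```sh
#!/bin/sh
# Hypothetical helpers that write an HTCondor submit file restricting a job
# to a list of glidein sites and passing the user's X509 proxy.

# Join the arguments with commas, e.g. "Site1,Site4".
join_sites() {
    sites=
    for s in "$@"; do
        sites=${sites:+$sites,}$s
    done
    printf '%s\n' "$sites"
}

# write_submit_file OUTFILE PROXY SITE...
write_submit_file() {
    out=$1; proxy=$2; shift 2
    # \$(Cluster) and \$(Process) are escaped so HTCondor, not the shell, expands them.
    cat > "$out" <<EOF
universe = vanilla
executable = myjob.sh
output = myjob.\$(Cluster).\$(Process).out
error = myjob.\$(Cluster).\$(Process).err
log = myjob.log
+DESIRED_Sites = "$(join_sites "$@")"
Requirements = stringListMember(GLIDEIN_Site,DESIRED_Sites)
x509userproxy = $proxy
queue
EOF
}

# Example:
#   write_submit_file myjob.sub /tmp/x509up_u1000 Site1 Site4 Site7 Site22
```

The resulting file is then submitted with condor_submit as usual.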

7. Fine-Tuning for Large-Scale Installations

7.1 Increase the number of available file descriptors

The number of ports (and therefore file descriptors) used by the condor_schedd process grows with the number of jobs running or queued in the schedd. The default limit on file descriptors per process is 1024 on most systems. Raise this limit to about 16k, or to a value higher than the number of jobs that might be in the queue at any given time. This is particularly important for large-scale installations.

In a default installation, the user schedd is usually configured to start as root through the script /etc/init.d/condor. This script is a good place to set a higher file descriptor limit for the schedd process.
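One way to set the limit is a helper called near the top of the Condor startup script, before the daemons are launched. Resource limits are inherited across fork and exec, so condor_master and the schedds it spawns pick up the new value. This is a minimal sketch assuming a shell whose ulimit builtin accepts -H, -S and -n (bash and dash do); the function name is hypothetical and the default of 16384 mirrors the ~16k suggestion above.

```sh
#!/bin/sh
# Hypothetical helper: raise the open-file (nofile) soft limit of the
# current shell to TARGET (default 16384). Processes started afterwards
# from this shell inherit the new limit.
set_nofile_limit() {
    target=${1:-16384}
    hard=$(ulimit -H -n)
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$target" ]; then
        if [ "$(id -u)" -eq 0 ]; then
            # Only root may raise the hard limit.
            ulimit -H -n "$target" || return 1
        else
            echo "hard nofile limit is $hard (< $target); run as root" >&2
            return 1
        fi
    fi
    ulimit -S -n "$target"
}
```

In the startup script, a call such as set_nofile_limit 16384 would go before the line that starts the Condor daemons.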