GlideinWMS - Factory

GlideinWMS: The Glidein-based Workflow Management System

WMS Factory

Design Overview

Glidein Factory Overview

A Glidein Factory is composed of a Factory Daemon and several Factory Entry Daemons. Each Entry Daemon is autonomous; it advertises itself and processes the incoming requests. The main Factory Daemon just launches and monitors the Factory Entry Daemons.

Each component is described in the sections below.

Factory Components

The Factory is made of several parts, each of which has its own section describing its design.

Implementation details

The Glidein Factory Daemon

As mentioned above, the Glidein Factory is composed of several entry points. The Factory Daemon is just a small process whose task is to start and monitor the Factory Entry Daemons. See the picture in the next section for a logical overview.
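To illustrate how little the Factory Daemon itself has to do, the bash sketch below starts one process per entry and relaunches any that has exited. It is not the GlideinWMS code: `run_entry_daemon`, `start_entry` and `monitor_once` are hypothetical names, `sleep 3600` stands in for a real entry daemon, and restarting a dead entry is an assumed policy (the text only says the Factory Daemon monitors the entries).

```shell
#!/bin/bash
# Hypothetical sketch of a supervisor in the spirit of the Factory Daemon.
# Requires bash 4+ for associative arrays.

declare -A ENTRY_PIDS   # entry name -> PID of its entry daemon

# Placeholder: replace with the command that really runs one entry daemon.
# exec makes the background PID ($!) belong to the daemon itself.
run_entry_daemon() {
  local entry=$1
  exec sleep 3600
}

# Launch the daemon for one entry in the background and record its PID.
start_entry() {
  local entry=$1
  run_entry_daemon "$entry" &
  ENTRY_PIDS[$entry]=$!
}

# One monitoring pass: relaunch any entry daemon that is no longer alive.
monitor_once() {
  local entry
  for entry in "${!ENTRY_PIDS[@]}"; do
    if ! kill -0 "${ENTRY_PIDS[$entry]}" 2>/dev/null; then
      echo "entry $entry is not running, restarting it"
      start_entry "$entry"
    fi
  done
}
```

A driver would call `start_entry` once per configured entry and then invoke `monitor_once` periodically, for example in a loop with a `sleep` between passes. Because each entry runs as its own process, a crash in one entry leaves the others untouched.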

All daemons of a Glidein Factory share the same directory tree, whose root contains the main configuration files used by the Factory Daemon.

More details about the Glidein Factory Daemon internals can be found here.

The Factory Entry Daemons

The Glidein Factory is composed of several Factory Entry Daemons, each advertising itself and processing the incoming requests. See the picture below for a logical overview.

As explained previously, the root of the tree contains the common startup and configuration files, while each entry point has a few additional configuration files of its own. Each entry point is completely described by these files on disk; the Factory Entry Daemons only extract the entry point attributes and supported parameters needed for advertising. When glidein jobs are submitted, only the Frontend-provided parameters need to be passed to the glidein startup script, as the script itself gathers all other information autonomously.
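Because each entry point is fully described by files on disk, the advertising step of an entry daemon amounts to reading a few values from its entry directory. The sketch below assumes a hypothetical layout `$factory_root/entry_<name>/attrs.cfg` with one `NAME VALUE` pair per line and emits ClassAd-style `Name = "value"` lines; the file name, its format and the `entry_ad_lines` function are illustrative, not the actual GlideinWMS configuration format.

```shell
#!/bin/bash
# Hypothetical sketch: turn a per-entry attribute file into ClassAd-style
# lines that an entry daemon could advertise.
# Assumed layout: $factory_root/entry_<name>/attrs.cfg, one "NAME VALUE" per line.

entry_ad_lines() {
  local factory_root=$1 entry=$2
  local cfg="$factory_root/entry_$entry/attrs.cfg"
  local name value
  [ -r "$cfg" ] || { echo "missing $cfg" >&2; return 1; }
  echo "EntryName = \"$entry\""
  # The extra test also handles a last line without a trailing newline.
  while read -r name value || [ -n "$name" ]; do
    # Skip blank lines and comments.
    case "$name" in ''|'#'*) continue ;; esac
    # Values are quoted as-is; embedded quotes are not escaped in this sketch.
    echo "$name = \"$value\""
  done < "$cfg"
}
```

Keeping the description on disk means the daemon holds no state of its own: restarting it simply re-reads the same files.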

More details about the Factory Entry Daemon internals can be found here.

The glidein startup script

As said in the overview, a glidein is essentially a properly configured HTCondor startd. However, someone has to perform that configuration, so the glidein job needs a startup script that does the work.

A startup script needs to perform several tasks:

  • check that the working environment on the worker node is reasonable (else user jobs will fail)
  • obtain the HTCondor binaries
  • configure HTCondor
  • prepare the environment for HTCondor
  • start HTCondor
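The tasks above form a sequence in which every step must succeed before the next one makes sense. A minimal bash sketch of such a driver follows; every step function is a hypothetical empty placeholder, and only the ordering and the stop-at-first-failure logic are the point.

```shell
#!/bin/bash
# Hypothetical startup driver: run each task in order, stop at the first failure.
# The step functions are placeholders that always succeed.

check_environment()   { :; }   # e.g. disk space, required commands
obtain_binaries()     { :; }   # fetch and unpack the HTCondor binaries
configure_condor()    { :; }   # write the HTCondor configuration
prepare_environment() { :; }   # set variables such as CONDOR_CONFIG
start_condor()        { :; }   # start the HTCondor daemons

run_startup() {
  local step
  for step in check_environment obtain_binaries configure_condor \
              prepare_environment start_condor; do
    if ! "$step"; then
      echo "glidein startup failed at step: $step" >&2
      return 1
    fi
  done
  echo "all startup steps completed"
}
```

Stopping at the first failing step is what protects user jobs: a worker node that fails the environment check never gets an HTCondor daemon that could accept work.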

Given the complexity of the task, and for the sake of flexibility, it makes sense to split the script into several pieces. The glidein job is thus made up of several files, including the startup script pieces, the HTCondor binaries, and a base configuration file.

However, shipping data files with a Grid job can be a challenge, since each Grid flavor handles data in a different way.

To make the system as general as possible, the Glidein Factory requires a Web server to distribute its data. This version of the Glidein Factory was tested with Apache and TUX, but any other Web server should work just as well, since only static file delivery is required.

A general overview of how a glidein starts up is given in the picture below.

The task of the basic startup script (called glidein_startup.sh) is thus reduced to loading the other files, including the support scripts, the base configuration files, and the HTCondor binaries. The list of files to load is obtained from the Web server as one of the first steps, which makes the startup script completely generic.
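A sketch of such a generic loading loop follows. It assumes a hypothetical list format (`<file> <type>` per line, where type `exec` marks a support script to run and any other type is only downloaded), uses `http://head.fnal.gov/glidein_g1` as an example base URL, and deliberately leaves out the signature checks. `fetch` and `load_glidein_files` are illustrative names, not functions of the real glidein_startup.sh.

```shell
#!/bin/bash
# Hypothetical sketch of a generic loader: the script only knows where the
# file list lives; everything else is driven by that list.
# Signature checking is omitted here.

BASE_URL=${BASE_URL:-http://head.fnal.gov/glidein_g1}

# Download one file from the Web server into the current directory.
fetch() {
  wget -q -O "$1" "$BASE_URL/$1"
}

load_glidein_files() {
  local list=$1 file type
  fetch "$list" || return 1
  while read -r file type || [ -n "$file" ]; do
    [ -n "$file" ] || continue
    fetch "$file" || { echo "download failed: $file" >&2; return 1; }
    if [ "$type" = "exec" ]; then
      # Support scripts run in list order; stdin is detached so a script
      # cannot consume the rest of the file list. A failure aborts loading.
      chmod +x "$file"
      "./$file" < /dev/null || { echo "script failed: $file" >&2; return 1; }
    fi
  done < "$list"
}
```

Adding a new support script then only requires publishing it and listing it; the startup script itself does not change.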

Please notice two things. First, all the files transferred over the Web are signed using sha1sum. This prevents an attacker from tampering with the files while in transit. This is especially important for executables and scripts (to prevent arbitrary code from being executed), but it is useful for configuration files too.

The signature checking is implemented in two steps:

  1. The signatures of all the files to be transferred are saved in a file called signature.sha1 and stored on the Web server. The signature of the signature file is then passed as one of the parameters to the startup script.
  2. The startup script loads the signature file from the Web server and verifies its signature. All other downloads, including the file containing the list of other files, are checked against the values in the signature file. See the pseudo-code below.

    # known_sha1 holds the signature of signature.sha1, passed as a startup parameter
    wget http://head.fnal.gov/glidein_g1/signature.sha1
    echo "$known_sha1  signature.sha1" | sha1sum -c -
    if [ $? -ne 0 ]; then
     exit 1
    fi
    grep files_list signature.sha1 > filelist.sha1
    wget http://head.fnal.gov/glidein_g1/files_list.lst
    sha1sum -c filelist.sha1
    if [ $? -ne 0 ]; then
     exit 2
    fi
    while read -r file; do
     wget "http://head.fnal.gov/glidein_g1/$file"
    done < files_list.lst
    sha1sum -c signature.sha1
    if [ $? -ne 0 ]; then
     exit 3
    fi

    # launch scripts
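Step 1 happens on the Factory side. Assuming all files to be published sit directly in one staging directory (an assumption of this sketch, as is the function name `sign_stage_dir`), the signature file and the value to pass to the startup script could be produced like this:

```shell
#!/bin/bash
# Hypothetical sketch of the Factory-side half of the scheme (step 1).
# Assumes every file to publish sits directly in one staging directory.

sign_stage_dir() {
  local dir=$1
  (
    cd "$dir" || exit 1
    : > signature.sha1
    for f in *; do
      # Skip directories and the signature file itself.
      [ -f "$f" ] && [ "$f" != signature.sha1 ] || continue
      # "<hash>  <name>" lines, the format that "sha1sum -c" expects.
      sha1sum "$f" >> signature.sha1 || exit 1
    done
    # Print the hash of signature.sha1: the value handed to the startup script.
    sha1sum signature.sha1 | cut -d ' ' -f 1
  )
}
```

The printed hash plays the role of `known_sha1` in the pseudo-code above. Its protection comes from travelling with the job submission rather than over the Web, so an attacker who can alter files on the wire cannot alter it as well.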

The second point I would like to stress is the advantage that comes from using standard Web technologies. Web technologies are widely used nowadays, and there is a plethora of tools that can be readily used. In our case, we can reduce the network load and speed up startup by using a Web cache near the worker nodes, if one is available. The Glidein Factory was tested with Squid, but other products should work as well. It is also worth mentioning that both OSG and gLite have expressed interest in deploying a Squid server on every Grid site.
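As an example of reusing standard Web tooling, pointing `wget` at a nearby Squid only requires setting the standard `http_proxy` environment variable. In the sketch below, `SITE_SQUID` is a hypothetical variable holding the cache's `host:port`, and `use_site_squid` is an illustrative name; how a site actually advertises its Squid is site-specific.

```shell
#!/bin/bash
# Hypothetical sketch: route the glidein's HTTP downloads through a
# site-local Squid when one is configured.
# SITE_SQUID is an assumed variable of the form "host:port".

use_site_squid() {
  if [ -n "${SITE_SQUID:-}" ]; then
    # wget honours the http_proxy environment variable.
    export http_proxy="http://$SITE_SQUID"
    echo "using Squid at $SITE_SQUID"
  else
    echo "no Squid configured, downloading directly"
  fi
}
```

Routing downloads through a shared cache does not weaken the security model: every file is still checked against signature.sha1, so a tampered cached copy would be rejected.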

More details about the startup script internals and support scripts provided by the current implementation can be found here.