Glidein Factory Overview
A Glidein Factory is composed of a Factory Daemon and several Factory Entry Daemons. Each Entry Daemon is autonomous; it advertises itself and processes the incoming requests. The main Factory Daemon just launches and monitors the Factory Entry Daemons.
In the sections below, each component is described.
Factory Components
The factory is made of several parts, each of which has its own section describing its design and implementation details.
The Glidein Factory Daemon
The Glidein Factory is composed of several entry points. The Factory Daemon itself is just a small process tasked with starting and monitoring the Factory Entry Daemons. See the picture in the next section for a logical overview.
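The launch-and-monitor role of the Factory Daemon can be illustrated with a minimal Python sketch. This is not the factory's actual code: the function names `launch_entries`, `poll_entries` and `wait_all` are hypothetical, and the commands passed in merely stand for whatever starts a Factory Entry Daemon.

```python
import subprocess
import sys


def launch_entries(commands):
    """Start one child process per entry; `commands` maps entry name -> argv list."""
    return {name: subprocess.Popen(argv) for name, argv in commands.items()}


def poll_entries(children):
    """Return {entry name: exit code} for every child that has already terminated."""
    finished = {}
    for name, proc in children.items():
        code = proc.poll()  # None while the child is still running
        if code is not None:
            finished[name] = code
    return finished


def wait_all(children):
    """Block until every child exits and return {entry name: exit code}."""
    return {name: proc.wait() for name, proc in children.items()}


# Stand-in commands (assumption): each one just exits with a fixed code.
DEMO_COMMANDS = {
    "entry_a": [sys.executable, "-c", "raise SystemExit(0)"],
    "entry_b": [sys.executable, "-c", "raise SystemExit(3)"],
}
```

A monitoring loop would call `poll_entries` periodically and decide what to do with any entry that has exited; that policy is left out here because the text does not specify it.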
All daemons of a Glidein Factory share the same directory tree, whose root contains the main configuration files used by the Factory Daemon.
More details about the Glidein Factory Daemon internals can be found here.
The Factory Entry Daemons
The Glidein Factory is composed of several Factory Entry Daemons, each advertising itself and processing the incoming requests. See the picture below for a logical overview.
As explained previously, the root of the tree contains the common startup and configuration files, while each entry point has a few additional configuration files of its own. Each entry point is completely described by these files on disk; the Factory Entry Daemons only extract the entry point attributes and supported parameters needed for advertising. When glidein jobs are submitted, only the Frontend-provided parameters need to be given to the glidein startup script, as the script itself will autonomously gather all the other information.
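As an illustration of an entry point being described entirely by files on disk, the sketch below merges a factory-wide attribute file with an entry-specific one to build the information to advertise. The file name `attributes.cfg`, the `entry_<name>` subdirectory layout, the "key value" line format and the `EntryName` key are all assumptions made for this example, not the factory's real on-disk format.

```python
from pathlib import Path


def read_key_values(path):
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        values[key] = value.strip()
    return values


def entry_advertisement(factory_root, entry_name, filename="attributes.cfg"):
    """Merge factory-wide attributes with the entry's own; the entry's values win."""
    root = Path(factory_root)
    merged = read_key_values(root / filename)  # common files at the tree root
    merged.update(read_key_values(root / f"entry_{entry_name}" / filename))
    merged["EntryName"] = entry_name  # hypothetical key identifying the entry
    return merged
```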
More details about the Factory Entry Daemon internals can be found here.
The glidein startup script
As said in the overview, a glidein is essentially a properly configured HTCondor startd. However, somebody needs to do that configuration. So we need a job startup script that will do the work.
A startup script needs to perform several tasks:
- check that the working environment on the worker node is reasonable (else user jobs will fail)
- obtain the HTCondor binaries
- configure HTCondor
- prepare the environment for HTCondor
- start HTCondor
Given the complexity of the task and for the sake of flexibility, it makes sense to split the script into several pieces. Thus, the glidein job is composed of several pieces, including the startup script pieces, the HTCondor binaries, and a base configuration file.
However, having a Grid job with data files can represent a challenge; each Grid flavor treats data in a different way!
To make the system as general as possible, the Glidein Factory requires the use of a Web server to distribute its data. This version of the Glidein-based Factory was tested with Apache and TUX, but any other Web server should work just as well, as only static file delivery is required.
A general overview of how a glidein starts up is given in the picture below.
The task of the basic startup script (called glidein_startup.sh) is thus reduced to loading the other files, including the support scripts, the base config files and the HTCondor binaries. The list of files to load is obtained from the Web server as one of the first steps, making the startup script completely generic.
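To show why fetching the file list first keeps the startup script generic, here is a small Python sketch that turns a downloaded list into the URLs to fetch. It assumes the list holds one relative file name per line and that lines starting with `#` are comments; both are assumptions, as is the function name `download_urls`.

```python
from urllib.parse import urljoin


def download_urls(base_url, file_list_text):
    """Turn a file list (one relative name per line) into absolute URLs."""
    if not base_url.endswith("/"):
        base_url += "/"  # urljoin would otherwise replace the last path segment
    names = [line.strip() for line in file_list_text.splitlines()]
    return [urljoin(base_url, name)
            for name in names
            if name and not name.startswith("#")]
```

Nothing in the function depends on which files a particular factory publishes; changing the set of files only requires changing the list on the Web server.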
Please notice two things. First, all the files transferred over the Web are signed using sha1sum, that is, their SHA-1 checksums are recorded and verified after download. This lets the startup script detect any file that was tampered with while in transit. This is especially important for executables and scripts (to prevent arbitrary code from being executed), but is useful for configuration files too.
The signature checking is implemented in two steps:
- The signatures of all the files to be transferred are saved in a file called signature.sha1 and stored on the Web server. The signature of the signature file is then passed as one of the parameters to the startup script.
- The startup script loads the signature file from the Web server and verifies its signature. All other downloads, including the file containing the list of other files, are checked against the values in the signature file. See the pseudo-code below.
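# known_sha1 is the signature of signature.sha1, passed in as a startup parameter
wget http://head.fnal.gov/glidein_g1/signature.sha1
echo "$known_sha1  signature.sha1" | sha1sum -c - || exit 1

# extract the entry for the file list, then download and verify the list
grep files_list signature.sha1 > filelist.sha1
wget http://head.fnal.gov/glidein_g1/files_list.lst
sha1sum -c filelist.sha1 || exit 2

# download every file named in the list
while read -r file; do
    wget "http://head.fnal.gov/glidein_g1/$file"
done < files_list.lst

# verify all downloaded files against the signature file
sha1sum -c signature.sha1 || exit 3

# launch scripts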
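The same two-step check can be sketched in Python for readers who prefer to see the logic without the download commands. This is an illustration only, operating on bytes already in memory; the function names are hypothetical, and the signature file is assumed to use the line format written by sha1sum (a hex digest, whitespace, then the file name).

```python
import hashlib


def sha1_hex(data):
    """Return the hex SHA-1 digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


def parse_signature_file(text):
    """Parse sha1sum-style lines ('<digest>  <name>') into {name: digest}."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            digest, name = line.split(None, 1)
            # sha1sum marks binary-mode entries with a leading '*'
            table[name.lstrip("*").strip()] = digest.lower()
    return table


def verify_downloads(known_sha1, signature_bytes, files):
    """Two-step check: first the signature file, then every other file.

    `files` maps file name -> downloaded bytes. Raises ValueError on any mismatch.
    """
    # Step 1: the signature file must match the digest given as a parameter.
    if sha1_hex(signature_bytes) != known_sha1.lower():
        raise ValueError("signature file does not match the expected digest")
    # Step 2: every other file must match its entry in the signature file.
    expected = parse_signature_file(signature_bytes.decode("utf-8"))
    for name, data in files.items():
        if name not in expected:
            raise ValueError(f"{name} is not listed in the signature file")
        if sha1_hex(data) != expected[name]:
            raise ValueError(f"{name} does not match its recorded digest")
    return True
```

The only value that has to reach the worker node by another route is `known_sha1`; everything else is validated against it, directly or through the signature file.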
The second point worth stressing is the advantage that comes from using standard Web technologies. Web technologies are widely used nowadays and there is a plethora of tools that can be readily used. In our case, we can reduce the network load and shorten startup times by using a Web cache near the worker nodes, if available. The Glidein Factory was tested with Squid, but other products should work as well. It is also worth mentioning that both OSG and gLite have expressed interest in deploying a Squid server on every Grid site.
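Because the files are served over plain HTTP, routing downloads through a nearby cache is a client-side setting. The Python sketch below shows the idea with the standard library; the function name `opener_via_cache` is hypothetical and the proxy address in the usage note is a placeholder, not a real server.

```python
import urllib.request


def opener_via_cache(proxy_url=None):
    """Build a URL opener that routes HTTP requests through a Web cache.

    With proxy_url=None the opener is given an empty proxy table, which
    disables proxy auto-detection so that requests go out directly.
    """
    proxies = {"http": proxy_url} if proxy_url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

A caller would then use something like `opener_via_cache("http://squid.example.org:3128").open(url)`; 3128 is Squid's default port, while the host name is made up for the example.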
More details about the startup script internals and support scripts provided by the current implementation can be found here.