Description
This recipe gives an example of how to configure a Factory and a Frontend to submit user jobs to a batch cluster via BOSCO.
Requirement | Description |
A functioning GlideinWMS Factory | The Factory should be completely configured and working for Grid submissions. This way you can be sure that the Factory runs and works before you do any configuration for BOSCO. |
A functioning GlideinWMS Frontend | The Frontend should be completely configured and working for Grid submissions. The same reasoning as for the Factory applies here. |
A valid, current, enabled account on a submit host of the cluster | The account must be able to log in to the submit host and submit jobs to the cluster. Specifically, you need the private and public ssh keys for submission. You can then add the resource by invoking the "bosco_cluster --add" command. This can be run from any host, but we suggest running it from the Frontend so that you don't need to transfer the ssh keys. See the BOSCO manual for more information on adding a BOSCO resource. |
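As a minimal sketch, adding and then testing the resource from the Frontend could be wrapped as below. The helper name bosco_add_resource is hypothetical, and the bosco_cluster arguments (user@host followed by the batch system, plus the --test option) are assumptions that should be checked against the BOSCO manual for your version.

```shell
#!/bin/sh
# Sketch (hypothetical helper): add a BOSCO resource and test it.
# Run on the Frontend as the user who will own the BOSCO ssh key.
bosco_add_resource() {
    target="$1"   # user@host, e.g. cmsuser@carvergrid.nersc.gov
    batch="$2"    # batch system on the cluster, e.g. pbs
    if ! command -v bosco_cluster >/dev/null 2>&1; then
        echo "bosco_cluster not found in PATH" >&2
        return 2
    fi
    # Installs BOSCO on the remote host and sets up ~/.ssh/bosco_key.rsa
    bosco_cluster --add "$target" "$batch" || return 1
    # Submits a test job through the newly added resource
    bosco_cluster --test "$target"
}

# Example:
# bosco_add_resource cmsuser@carvergrid.nersc.gov pbs
```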
The BOSCO submission from the Factory uses SSH. Before ssh can be used in batch mode for glidein submission, the Factory host must trust the host key of the BOSCO resource, i.e. the one added with "bosco_cluster --add" on the Frontend (in our example, carvergrid.nersc.gov). You have a couple of options, depending on how you want to configure ssh:
- Build up a global fingerprint list. Collect the keys in /etc/ssh/ssh_known_hosts (or ~/.ssh/known_hosts of the users submitting glideins). Note that you'll have to update the fingerprint list whenever the BOSCO resource key changes, or all glidein submission attempts will fail.
  ssh-keyscan -t rsa,dsa carvergrid.nersc.gov >> /etc/ssh/ssh_known_hosts
- Alternatively, disable strict host key checking in ssh for the BOSCO resource by adding these lines to /etc/ssh/ssh_config (or ~/.ssh/config of the users submitting glideins). The Host line also accepts IP addresses or wildcards (to include more hosts).
  Host carvergrid.nersc.gov
      StrictHostKeyChecking no
  The setting above adds the key the first time and gives a warning if the key changes later. To skip host key verification entirely and not use a known_hosts file, you can do the following (not recommended unless on a local network). In this example the BOSCO resources have IPs 192.168.0.XXX.
  Host 192.168.0.*
      StrictHostKeyChecking no
      UserKnownHostsFile /dev/null
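The first option can be made safe to rerun by appending keys only when the host is not listed yet. The sketch below assumes OpenSSH's ssh-keygen and ssh-keyscan are installed; the helper name bosco_trust_host is hypothetical. It lets ssh-keyscan use its default key types rather than the rsa,dsa list above, since DSA keys are disabled by default in current OpenSSH releases.

```shell
#!/bin/sh
# Sketch (hypothetical helper): append the host keys of a BOSCO resource to a
# known_hosts file, but only if the host is not listed there yet.
bosco_trust_host() {
    host="$1"                                      # e.g. carvergrid.nersc.gov
    known_hosts="${2:-/etc/ssh/ssh_known_hosts}"   # or ~/.ssh/known_hosts
    # ssh-keygen -F prints matching lines (hashed entries included), if any
    if [ -f "$known_hosts" ] &&
       [ -n "$(ssh-keygen -F "$host" -f "$known_hosts" 2>/dev/null)" ]; then
        echo "$host is already in $known_hosts"
        return 0
    fi
    # Scan into a temporary file so that a failed scan appends nothing;
    # the exit status of ssh-keyscan is not relied on, the file size is
    tmp=$(mktemp) || return 1
    ssh-keyscan "$host" > "$tmp" 2>/dev/null || :
    if [ -s "$tmp" ]; then
        cat "$tmp" >> "$known_hosts"
        echo "added host keys for $host to $known_hosts"
        rc=0
    else
        echo "could not fetch host keys for $host" >&2
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Example (as root, for the global file):
# bosco_trust_host carvergrid.nersc.gov
```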
If you have access to the ssh key on the Factory, it is also recommended to ssh to the host manually to check that the ssh connection works correctly.
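A manual check is most useful if it mimics the non-interactive way glideins are submitted. A minimal sketch, assuming the OpenSSH client; bosco_ssh_test is a hypothetical helper name, and the key path and account are the placeholders used in this recipe.

```shell
#!/bin/sh
# Sketch (hypothetical helper): test ssh access non-interactively.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# a key passphrase, or confirmation of an unknown host key.
bosco_ssh_test() {
    key="$1"       # private key, e.g. /path/to/bosco_key.rsa
    target="$2"    # user@host, e.g. cmsuser@carvergrid.nersc.gov
    # -n: read stdin from /dev/null; IdentitiesOnly: use only the given key
    if ssh -n -i "$key" -o IdentitiesOnly=yes -o BatchMode=yes \
           -o ConnectTimeout=10 "$target" true; then
        echo "ssh to $target works"
    else
        echo "ssh to $target failed" >&2
        return 1
    fi
}

# Example:
# bosco_ssh_test /path/to/bosco_key.rsa cmsuser@carvergrid.nersc.gov
```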
The configuration fragments below highlight the parts that differ most for a BOSCO entry: the entry configuration on the Factory and the credentials configuration on the Frontend.
Example BOSCO Factory Entry
<entry name="BOSCO_TEST_carver" auth_method="key_pair" enabled="True" gatekeeper="cmsuser@carvergrid.nersc.gov" gridtype="batch pbs" schedd_name="fermicloud199.fnal.gov" bosco_dir="altbosco" trust_domain="bosco" verbosity="std" work_dir="AUTO">
   <config>
      <max_jobs glideins="3" held="2" idle="1">
         <max_job_frontends></max_job_frontends>
      </max_jobs>
      <release max_per_cycle="20" sleep="0.2"/>
      <remove max_per_cycle="5" sleep="0.2"/>
      <restrictions require_voms_proxy="False"/>
      <submit cluster_size="10" max_per_cycle="100" sleep="0.2">
         <submit_attrs>
            <submit_attr name="+remote_queue" value='"serial"'/>
            <submit_attr name="request_memory" value="2048"/>
         </submit_attrs>
      </submit>
   </config>
   <allow_frontends></allow_frontends>
   <attrs>
      <attr name="CONDOR_ARCH" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="False" type="string" value="default"/>
      <attr name="CONDOR_OS" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="False" type="string" value="default"/>
      <attr name="GLIDEIN_Site" const="True" glidein_publish="True" job_publish="True" parameter="True" publish="True" type="string" value="BOSCO_PBS"/>
      <attr name="USE_CCB" const="False" glidein_publish="True" job_publish="False" parameter="True" publish="True" type="string" value="True"/>
      <attr name="X509_CERT_DIR" const="True" glidein_publish="False" job_publish="True" parameter="True" publish="True" type="string" value="/osg/certificates"/>
   </attrs>
   <files></files>
   <infosys_refs></infosys_refs>
   <monitorgroups></monitorgroups>
</entry>
The most important attributes of the entry stanza above are described below:
Name | Value | Description |
auth_method | "key_pair" | The key pair in this case refers to the ssh key pair installed to access the BOSCO resource (the remote cluster submit host). See Factory Configuration for a complete description. |
gatekeeper | "cmsuser@carvergrid.nersc.gov" | In the BOSCO case, the gatekeeper attribute is the username and hostname that the user logs in with to access the cluster and submit jobs. See Factory Configuration for a complete description. |
gridtype | "batch pbs" | This must be the keyword "batch" followed by the batch system used in the cluster, which must be one supported by HTCondor/BOSCO, e.g. pbs, condor, lsf, sge. See Factory Configuration for a complete description. |
bosco_dir | "altbosco" | This is optional; the default is "bosco". It is the BOSCO directory on the BOSCO resource (containing the BLAHP and other HTCondor software), installed by "bosco_cluster --add". The value is relative to $HOME. If you move the directory manually, note the hardcoded paths in ~/bosco/glite/etc/condor_config.ft-gahp. See Factory Configuration for a complete description. |
trust_domain | "bosco" | The trust domain can be any arbitrary value, but the Factory and the Frontend must be configured with the same trust_domain value. In this example, "bosco" is the arbitrary value. See Factory Configuration for a complete description. |
work_dir | "AUTO" | The working directory that the pilot starts in can be any one supported by the remote cluster or batch system. See Factory Configuration for a complete description. |
glideins | "3" | This is a hard limit on the number of glideins that the Factory will submit to the remote batch system. For testing purposes, this example was restricted to 3 running glideins. See Factory Configuration for a complete description. |
held | "2" | This is a limit on the number of glidein requests that can be in the held state. If the number of held requests reaches this number, the Factory will stop asking for more. For testing purposes, this number was set extremely low. See Factory Configuration for a complete description. |
idle | "1" | This is a limit on the number of glidein requests that can be in the idle state. Ordinarily, this attribute is used to determine "pressure" at a grid site. See Factory Configuration for a complete description. |
submit_attr | - | This element is used to specify RSL or equivalent. The name and value of each configured submit attribute are put in the glidein's JDL before submission. For example, the configuration above directs glidein submission to a specific remote queue and results in the following line in the glidein's JDL: +remote_queue = "serial". See Factory Configuration for a complete description. |
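To make the submit_attr row concrete, the mapping from configuration element to JDL line can be imitated with sed. This is only an illustration, not how the Factory actually builds the JDL; the function name submit_attrs_to_jdl is hypothetical, and the sketch handles only simple elements written one per line with name before value.

```shell
#!/bin/sh
# Rough sketch, not the Factory's implementation: print the JDL line that each
# <submit_attr name="..." value="..."/> element produces ("name = value").
# Assumes one element per line, name before value, and value quoted with
# either single or double quotes.
submit_attrs_to_jdl() {
    sed -n \
        -e "s/.*<submit_attr name=\"\([^\"]*\)\" value='\([^']*\)'.*/\1 = \2/p" \
        -e "s/.*<submit_attr name=\"\([^\"]*\)\" value=\"\([^\"]*\)\".*/\1 = \2/p"
}

# Example, using the two submit attributes of the entry above:
submit_attrs_to_jdl <<'EOF'
<submit_attr name="+remote_queue" value='"serial"'/>
<submit_attr name="request_memory" value="2048"/>
EOF
```

For these two elements the sketch prints +remote_queue = "serial" and request_memory = 2048. Single-quoted values are matched first so that embedded double quotes, as in the remote_queue value, are kept verbatim.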
Example BOSCO Frontend Configuration
The Frontend configuration fragment in this example is only for the credential setup. The credential setup can be included in the group credential definition or in the global credential definition.
<credential absfname="/path/to/grid_proxy" security_class="frontend" trust_domain="OSG" type="grid_proxy"/>
<credential absfname="/path/to/bosco_key.rsa.pub" keyabsfname="/path/to/bosco_key.rsa" pilotabsfname="/path/to/grid_proxy" security_class="frontend" trust_domain="bosco" type="key_pair"/>
Note that the ssh key pair in the configuration (/path/to/bosco_key.rsa.pub, /path/to/bosco_key.rsa) must give access to the BOSCO resource. It can be the one generated by "bosco_cluster --add" (after removing the passphrase), or a different key pair that you set up. pilotabsfname is the proxy needed by the glidein to authenticate back with the User pool.
If you decide to use the "bosco_cluster --add" key pair, you must first remove the passphrase, as mentioned above. To do so, as the user who ran the BOSCO command, first check that the required files exist ($HOME/.ssh/bosco_key.rsa, $HOME/.ssh/bosco_key.rsa.pub, $HOME/.bosco/.pass), and then run:
openssl rsa -in $HOME/.ssh/bosco_key.rsa -out $HOME/.ssh/bosco_key.rsa_new -passin file:$HOME/.bosco/.pass
chmod 600 $HOME/.ssh/bosco_key.rsa_new
cp $HOME/.ssh/bosco_key.rsa_new /path/to/bosco_key.rsa
cp $HOME/.ssh/bosco_key.rsa.pub /path/to/bosco_key.rsa.pub
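Before pointing the Frontend at the copied files, you may want to confirm that the passphrase is really gone and that the two files belong together. A minimal sketch using ssh-keygen -y, which derives the public key from a private key; check_bosco_key is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (hypothetical helper): check that a private key has no passphrase
# and that it matches its public key file.
check_bosco_key() {
    key="$1"                 # e.g. /path/to/bosco_key.rsa
    pub="${2:-$1.pub}"       # e.g. /path/to/bosco_key.rsa.pub
    if [ ! -r "$key" ] || [ ! -r "$pub" ]; then
        echo "cannot read $key or $pub" >&2
        return 1
    fi
    # -y derives the public key; -P "" makes ssh-keygen fail instead of
    # prompting if the private key is still passphrase protected
    if ! derived=$(ssh-keygen -y -P "" -f "$key" 2>/dev/null); then
        echo "$key is invalid or still has a passphrase" >&2
        return 1
    fi
    # Compare only key type and key data; the trailing comment may differ
    if [ "$(echo "$derived" | cut -d' ' -f1,2)" = "$(cut -d' ' -f1,2 "$pub")" ]; then
        echo "$key has no passphrase and matches $pub"
    else
        echo "$key does not match $pub" >&2
        return 1
    fi
}

# Example:
# check_bosco_key /path/to/bosco_key.rsa /path/to/bosco_key.rsa.pub
```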
The most important attributes of the credential stanza above are described below:
Name | Value | Description |
absfname | "/path/to/grid_proxy" | This is the full path to the file containing the grid proxy used to identify the glidein with the Frontend. See Frontend Configuration for a complete description. |
absfname | "/path/to/bosco_key.rsa.pub" | This is the full path to the file containing the public key installed on the BOSCO resource to allow ssh access. See Frontend Configuration for a complete description. |
keyabsfname | "/path/to/bosco_key.rsa" | This is the full path to the file containing the secret (private) key used to access the BOSCO resource via ssh. See Frontend Configuration for a complete description. |
security_class | "frontend" | This is the security class defined for the other credentials on this Frontend. See Frontend Configuration for a complete description. |
trust_domain | "bosco" | The trust domain can be any arbitrary value, but the Factory and the Frontend must be configured with the same trust_domain value. In this example, "bosco" is the arbitrary value. See Frontend Configuration for a complete description. |
type | "key_pair" | The key pair in this case refers to the public and secret keys used to ssh to the BOSCO resource submit host. This must match the auth_method value specified in the Factory entry for the credentials to be matched properly. See Frontend Configuration for a complete description. |
pilotabsfname | "/path/to/grid_proxy" | This is necessary for all BOSCO entries. A proxy for the pilot is required in all cases, even if proxies are not used to authenticate on the gatekeeper, because the proxy is used to establish secure communication between the pilot and the user collector. See Frontend Configuration for a complete description. |
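Since the Factory entry and the Frontend key_pair credential must use the same trust_domain, a quick consistency check between the two configuration files can save some debugging. The helpers below are hypothetical and rely on simple text matching rather than an XML parser; the file paths in the example are placeholders.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): check that a Factory entry and the Frontend
# key_pair credentials use the same trust_domain. Plain text matching, not XML
# parsing: assumes the <entry ...> start tag and each <credential .../> element
# are each written on a single line.

# Print the trust_domain of entry $1 in Factory config file $2
entry_trust_domain() {
    sed -n "s/.*<entry name=\"$1\".* trust_domain=\"\([^\"]*\)\".*/\1/p" "$2"
}

# Print the trust_domain of every key_pair credential in Frontend config file $1
frontend_keypair_domains() {
    grep 'type="key_pair"' "$1" | sed -n 's/.* trust_domain="\([^"]*\)".*/\1/p'
}

# $1: entry name, $2: Factory config file, $3: Frontend config file
check_trust_domain() {
    want=$(entry_trust_domain "$1" "$2")
    if [ -z "$want" ]; then
        echo "no trust_domain found for entry $1" >&2
        return 1
    fi
    if frontend_keypair_domains "$3" | grep -qxF -- "$want"; then
        echo "trust_domain \"$want\" matches"
    else
        echo "no key_pair credential with trust_domain \"$want\"" >&2
        return 1
    fi
}

# Example:
# check_trust_domain BOSCO_TEST_carver /path/to/factory.xml /path/to/frontend.xml
```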