GlideinWMS The Glidein-based Workflow Management System

Glidein Recipes

Overview


Description

This recipe gives an example of how to configure a Factory and a Frontend to submit user jobs that run on cloud resources such as Amazon EC2, OpenStack, OpenNebula, or Google cloud.

Requirement: Description
A functioning GlideinWMS Factory: The Factory should be fully configured and working for Grid submissions, so that you know it runs correctly before adding any cloud configuration.
A functioning GlideinWMS Frontend: The Frontend should be fully configured and working for Grid submissions. The same reasoning applies as for the Factory.
Valid, current, enabled Amazon EC2 credentials: Specifically, the AccessKey and SecretKey are needed for submission.

Example Factory Entry: Amazon EC2

<entry name="Amazon_Vandy"
    auth_method="key_pair+vm_id+vm_type"
    enabled="True"
    gatekeeper="https://us-east-1.ec2.amazonaws.com"
    gridtype="ec2"
    schedd_name="cms-xen6.fnal.gov"
    trust_domain="cloud"
    verbosity="std"
    work_dir=".">

    <config>
        <max_jobs glideins="3" held="2" idle="1">
            <max_job_frontends></max_job_frontends>
        </max_jobs>
        <release max_per_cycle="20" sleep="0.2"/>
        <remove max_per_cycle="5" sleep="0.2"/>
        <restrictions require_voms_proxy="False"/>
        <submit cluster_size="10" max_per_cycle="100" sleep="0.2"/>
        <submit_attrs>
            <submit_attr name="ec2_availability_zone" value="us-east-1a"/>
            <submit_attr name="ec2_spot_price" value="0.03"/>
        </submit_attrs>

    </config>
    <allow_frontends></allow_frontends>

    <attrs>
        <attr name="CONDOR_ARCH" const="True" glidein_publish="False" job_publish="False"
              parameter="True" publish="False" type="string" value="default"/>
        <attr name="CONDOR_OS" const="True" glidein_publish="False"  job_publish="False"
              parameter="True" publish="False" type="string" value="default"/>
        <attr name="GLIDEIN_Site" const="True" glidein_publish="True" job_publish="True"
              parameter="True" publish="True" type="string" value="Amazon_EC2"/>
        <attr name="USE_CCB" const="True" glidein_publish="True" job_publish="False"
              parameter="True" publish="True" type="string" value="True"/>
    </attrs>

    <files></files>
    <infosys_refs></infosys_refs>
    <monitorgroups></monitorgroups>
</entry>
            

Example Factory Entry: Google cloud

<entry name="Google_Entry"
    auth_method="auth_file+vm_id+vm_type"
    enabled="True"
    gatekeeper="https://www.googleapis.com/compute/v1"
    gridtype="gce"
    schedd_name="cms-xen6.fnal.gov"
    trust_domain="Google_cloud"
    verbosity="std"
    work_dir=".">

    <config>
        <max_jobs glideins="3" held="2" idle="1">
            <max_job_frontends></max_job_frontends>
        </max_jobs>
        <release max_per_cycle="20" sleep="0.2"/>
        <remove max_per_cycle="5" sleep="0.2"/>
        <restrictions require_voms_proxy="False"/>
        <submit cluster_size="10" max_per_cycle="100" sleep="0.2"/>
    </config>
    <allow_frontends></allow_frontends>

    <attrs>
        <attr name="CONDOR_ARCH" const="True" glidein_publish="False" job_publish="False"
              parameter="True" publish="False" type="string" value="default"/>
        <attr name="CONDOR_OS" const="True" glidein_publish="False"  job_publish="False"
              parameter="True" publish="False" type="string" value="default"/>
        <attr name="GLIDEIN_Site" const="True" glidein_publish="True" job_publish="True"
              parameter="True" publish="True" type="string" value="Amazon_EC2"/>
        <attr name="USE_CCB" const="True" glidein_publish="True" job_publish="False"
              parameter="True" publish="True" type="string" value="True"/>
    </attrs>

    <files></files>
    <infosys_refs></infosys_refs>
    <monitorgroups></monitorgroups>
</entry>
            

The important attributes of the entry stanzas above are described below:

Name Type Value Description
auth_method Element attribute for <entry> "key_pair+vm_id+vm_type"

The key pair in this case refers to the AccessKey and SecretKey that EC2-like cloud providers issue for their REST interface. The vm_id and vm_type correspond to EC2's AMI_ID and AMI_TYPE descriptors. Each cloud implementation has its own definitions for what these descriptors mean. In this example, the actual values are configured by the Frontend.

See Factory Configuration for a complete description.

gatekeeper Element attribute for <entry> "https://us-east-1.ec2.amazonaws.com"

In the cloud case, the gatekeeper attribute is similar enough to a grid gatekeeper that there is no functional difference as far as the GlideinWMS Factory admin is concerned. EC2 has regional gatekeepers, so choose the gatekeeper for the region in which you would like to run. In this example, the US-EAST region has been selected.

See Factory Configuration for a complete description.
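
Because the gatekeeper is regional, any availability zone passed through submit_attrs has to belong to the same region. The following is a minimal sketch of such a check, not part of GlideinWMS. It assumes the gatekeeper host has the form <region>.ec2.amazonaws.com, as in this recipe, and that zone names begin with the region name. The trimmed-down entry and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical, trimmed-down entry used only for this illustration.
ENTRY_XML = """
<entry name="Example_EC2" gatekeeper="https://us-east-1.ec2.amazonaws.com" gridtype="ec2">
    <config>
        <submit_attrs>
            <submit_attr name="ec2_availability_zone" value="us-east-1a"/>
        </submit_attrs>
    </config>
</entry>
"""


def gatekeeper_region(entry):
    """Return the region label from a '<region>.ec2.amazonaws.com' gatekeeper host."""
    host = urlparse(entry.get("gatekeeper")).hostname
    return host.split(".")[0]


def zone_matches_region(entry):
    """True if no availability zone is set, or if the zone name starts with the region."""
    region = gatekeeper_region(entry)
    for attr in entry.iter("submit_attr"):
        if attr.get("name") == "ec2_availability_zone":
            return attr.get("value").startswith(region)
    return True


entry = ET.fromstring(ENTRY_XML)
print(gatekeeper_region(entry), zone_matches_region(entry))  # us-east-1 True
```

A check like this could be run against the Factory configuration before reconfiguring, so that a zone from a different region is caught before any VM request is submitted.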

gridtype Element attribute for <entry> "ec2"

To submit to EC2-like clouds, this attribute must be set to "ec2". The Google cloud entry above uses "gce" instead.

See Factory Configuration for a complete description.

trust_domain Element attribute for <entry> "cloud"

The trust domain can be any arbitrary value. The only caveat is that both the Factory and the Frontend must be configured to use the same value. In this example, "cloud" is the arbitrary value.

See Factory Configuration for a complete description.

work_dir Element attribute for <entry> "."

The working directory that the pilot starts up in must be "." for this example. The reason is that the VM the example points to makes specific use of the scratch space Amazon provides, which is in a non-standard location. For all practical purposes, it is the VO's responsibility to define the working directory on the VM and to have the contextualization scripts set up where the pilot starts.

See Factory Configuration for a complete description.

glideins Element attribute for <max_jobs> "3"

This attribute is very important for cloud use, even more so when real money is paying for the computing cycles. It is a hard limit on the number of VMs that the Factory will start. For testing purposes, this example was restricted to 3 running VMs.

See Factory Configuration for a complete description.
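
To see why this cap matters when real money is involved, the worst-case hourly spot bid can be estimated from the values in the Amazon EC2 example entry. This is only a rough sketch. It assumes that ec2_spot_price is the maximum hourly bid per VM and it ignores storage, network, and any other charges. The function name is hypothetical.

```python
from decimal import Decimal


def max_hourly_bid(max_glideins, spot_price):
    """Upper bound on the combined hourly spot bid when every allowed VM is running."""
    # Decimal avoids binary floating-point rounding on currency values.
    return max_glideins * Decimal(spot_price)


# Values from the example entry: glideins="3", ec2_spot_price="0.03".
print(max_hourly_bid(3, "0.03"))  # 0.09
```

Raising the glideins limit raises this bound proportionally, which is a reason to keep the limit small while testing.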

held Element attribute for <max_jobs> "2"

This is a limit on the number of VM requests that can be in the held state. If the number of held requests reaches this number, the Factory will stop asking for more. For testing purposes, this number was set extremely low.

See Factory Configuration for a complete description.

idle Element attribute for <max_jobs> "1"

This is a limit on the number of VM requests that can be in the idle state. Ordinarily, this attribute is used to determine "pressure" at a grid site. However, the cloud use case is different: most cloud implementations do not operate on "allocations" or something similar, but on a "pay-as-you-go" principle, so real money is exchanged for actual usage. By setting this value to "1", we basically turn off the "pressure" and ask for as many VMs as there are jobs, up to the maximum set by the glideins attribute.

See Factory Configuration for a complete description.

Example Frontend Configuration: Amazon EC2

The only configuration needed on the Frontend in this example is the credential setup. The credential can be included in either the group credential definition or the global credential definition.

<credential absfname="/path/to/cloud_AccessKey"
            keyabsfname="/path/to/cloud_SecretKey"
            security_class="Security Class"
            trust_domain="cloud"
            type="key_pair+vm_id+vm_type"
            vm_id="ami-7bf43812"
            vm_type="m1.large"
            vm_id_fname="/path/to/file_with_vm_id"
            vm_type_fname="/path/to/file_with_vm_type"
            pilotabsfname="/path/to/pilot_proxy"/>
            

Example Frontend Configuration: Google cloud

<credential absfname="/path/to/Auth_File"
            security_class="Security Class"
            trust_domain="Google_cloud"
            type="auth_file+vm_id+vm_type"
            vm_id="projects/centos-cloud/global/images/centos-6-v20160803"
            vm_type="projects/fermilab-poc/zones/us-central1-a/machineTypes/n1-standard-1"
            pilotabsfname="/path/to/pilot_proxy"/>
            

The important attributes of the credential stanzas above are described below:

Name Type Value Description
absfname Element attribute for <credential> "/path/to/cloud_AccessKey"

This is the full path to the file containing the AccessKey for the account that will be used to submit the VM request.

See Frontend Configuration for a complete description.

keyabsfname Element attribute for <credential> "/path/to/cloud_SecretKey"

This is the full path to the file containing the SecretKey for the account that will be used to submit the VM request.

See Frontend Configuration for a complete description.

security_class Element attribute for <credential> "Security Class"

This is the security class that is defined for the other credentials on this Frontend.

See Frontend Configuration for a complete description.

trust_domain Element attribute for <credential> "cloud"

The trust domain can be any arbitrary value. The only caveat is that both the Factory and the Frontend must be configured to use the same value. In this example, "cloud" is the arbitrary value.

See Frontend Configuration for a complete description.

type Element attribute for <credential> "key_pair+vm_id+vm_type"

The key pair in this case refers to the AccessKey and SecretKey that EC2-like cloud providers issue for their REST interface. The vm_id and vm_type correspond to EC2's AMI_ID and AMI_TYPE descriptors. Each cloud implementation has its own definitions for what these descriptors mean. In this example, the actual values are configured by the Frontend.

This must match the auth_method value specified in the Factory entry for the credentials to be matched properly.

See Frontend Configuration for a complete description.
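
The two matching rules in this recipe (the same trust_domain on both sides, and a credential type equal to the entry's auth_method) can be checked with a short script before reconfiguring. This is a minimal sketch, not GlideinWMS code. It assumes plain string equality is the right comparison, and the trimmed-down XML fragments and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down fragments; real files carry many more attributes.
FACTORY_ENTRY = '<entry name="Amazon_Vandy" auth_method="key_pair+vm_id+vm_type" trust_domain="cloud"/>'
FRONTEND_CREDENTIAL = '<credential type="key_pair+vm_id+vm_type" trust_domain="cloud"/>'


def mismatches(entry, credential):
    """Return a list of human-readable problems; empty when the two sides agree."""
    problems = []
    if entry.get("trust_domain") != credential.get("trust_domain"):
        problems.append("trust_domain differs")
    if entry.get("auth_method") != credential.get("type"):
        problems.append("auth_method does not match credential type")
    return problems


print(mismatches(ET.fromstring(FACTORY_ENTRY), ET.fromstring(FRONTEND_CREDENTIAL)))  # []
```

An empty list means the Factory entry and the Frontend credential agree on both values.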

vm_id Element attribute for <credential> "ami-7bf43812"

Since the type attribute contains vm_id, it must be specified here. See the specific cloud implementation for the correct vm_id value. In this example, a generic VM image has been uploaded to Amazon EC2 and is ready for use.

See Frontend Configuration for a complete description.

vm_type Element attribute for <credential> "m1.large"

Since the type attribute contains vm_type, it must be specified here. See the specific cloud implementation for the correct vm_type value. In this example, the EC2 instance type m1.large is used.

See Frontend Configuration for a complete description.

vm_id_fname Element attribute for <credential> "/path/to/file_with_vm_id"

An alternate means of providing the vm_id. If vm_id_fname is configured, the current vm_id is read from the file, so it can be updated without reconfiguring the Frontend service. If both vm_id and vm_id_fname are found in the configuration, vm_id_fname is used. The file named by vm_id_fname should contain a line in the following format:

VM_ID=ami-7bf43812
Note: both vm_id_fname and vm_type_fname can use the same text file.

vm_type_fname Element attribute for <credential> "/path/to/file_with_vm_type"

An alternate means of providing the vm_type. If vm_type_fname is configured, the current vm_type is read from the file, so it can be updated without reconfiguring the Frontend service. If both vm_type and vm_type_fname are found in the configuration, vm_type_fname is used. The file named by vm_type_fname should contain a line in the following format:

VM_TYPE=c3.large
Note: both vm_id_fname and vm_type_fname can use the same text file.
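
As an illustration of the file format, the sketch below reads both values from one shared file. It is not the Frontend's actual parser. It assumes one KEY=VALUE pair per line and ignores anything else, and the function name is hypothetical.

```python
def parse_vm_file(text):
    """Collect the VM_ID and VM_TYPE values from KEY=VALUE lines."""
    wanted = {"VM_ID", "VM_TYPE"}
    found = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Skip lines without '=' and keys other than the two of interest.
        if sep and key in wanted:
            found[key] = value.strip()
    return found


# Contents of a single file referenced by both vm_id_fname and vm_type_fname.
shared_file = "VM_ID=ami-7bf43812\nVM_TYPE=c3.large\n"
print(parse_vm_file(shared_file))  # {'VM_ID': 'ami-7bf43812', 'VM_TYPE': 'c3.large'}
```

Editing such a file changes the values used for new requests without touching the Frontend configuration itself.
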
pilotabsfname Element attribute for <credential> "/path/to/pilot_proxy"

A proxy for the pilot is required in all cases, even if proxies are not used to authenticate on the gatekeeper. This is because the proxy is used to establish secure communication between the pilot and the user collector.

See Frontend Configuration for a complete description.