glideinwms.factory package

Subpackages

Submodules

glideinwms.factory.checkFactory module

glideinwms.factory.glideFactory module

This is the main module of the glideinFactory.

param $1 = glidein submit_dir:

exception glideinwms.factory.glideFactory.HUPException[source]

Bases: Exception

Used to catch SIGHUP and trigger a reconfig

glideinwms.factory.glideFactory.aggregate_stats(in_downtime)[source]

Aggregate all the monitoring stats

@type in_downtime: boolean @param in_downtime: Entry downtime information @return: stats dictionary

glideinwms.factory.glideFactory.clean_exit(childs)[source]
glideinwms.factory.glideFactory.entry_grouper(size, entries)[source]

Group the entries into n smaller groups.

KNOWN ISSUE: Needs improvement to do better grouping in certain cases.

TODO: Migrate to itertools when only supporting Python 2.6 and higher.

@type size: long @param size: Size of each subgroup @type entries: list @param entries: List of entries

@rtype: list @return: List of grouped entries. Each group is a list
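A minimal sketch of the grouping, using itertools.islice in the spirit of the TODO above. It assumes consecutive chunks of `size` entries, with the last group possibly shorter, and returns an empty list for a non-positive size; the factory's own implementation may differ.

```python
from itertools import islice


def entry_grouper(size, entries):
    """Split entries into consecutive groups of at most `size` items."""
    if size <= 0:
        return []
    it = iter(entries)
    groups = []
    while True:
        # islice pulls the next `size` items; an empty chunk means we are done
        chunk = list(islice(it, size))
        if not chunk:
            break
        groups.append(chunk)
    return groups


print(entry_grouper(2, ["e1", "e2", "e3", "e4", "e5"]))
```

Each group is a list, matching the documented return type.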

glideinwms.factory.glideFactory.generate_log_tokens(startup_dir, glideinDescript)[source]

Generate the JSON Web Tokens used to authenticate with the remote HTTP log server. Note: tokens are generated for disabled entries too

Parameters:
  • startup_dir (str|Path) – Path to the glideinsubmit directory

  • glideinDescript – Factory config’s glidein description object

Returns:

None

Raises:

IOError – If a file (key or token) cannot be opened, read, or written

glideinwms.factory.glideFactory.hupsignal(signr, frame)[source]

Signal handler. Raise HUPException when receiving SIGHUP. Used to trigger a reconfig and restart.

glideinwms.factory.glideFactory.increase_process_limit(new_limit=10000)[source]

Raise RLIMIT_NPROC to new_limit
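A sketch of raising the soft RLIMIT_NPROC limit with the standard `resource` module (Unix only). Capping the new value at the hard limit, which an unprivileged process cannot exceed, and returning the resulting limits are choices made for this illustration, not documented behavior of the factory function.

```python
import resource


def increase_process_limit(new_limit=10000):
    """Raise the soft RLIMIT_NPROC toward new_limit without exceeding the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    # An unprivileged process may only raise its soft limit up to the hard limit
    target = new_limit if hard == resource.RLIM_INFINITY else min(new_limit, hard)
    # Only ever raise the limit, never lower it
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NPROC, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)
```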

glideinwms.factory.glideFactory.is_crashing_often(startup_time, restart_interval, restart_attempts)[source]

Check if the entry is crashing/dying often

@type startup_time: long @param startup_time: Startup time of the entry process in seconds @type restart_interval: long @param restart_interval: Allowed restart interval in seconds @type restart_attempts: long @param restart_attempts: Number of allowed restart attempts in the interval

@rtype: bool @return: True if entry process is crashing/dying often

glideinwms.factory.glideFactory.is_file_old(filename, allowed_time)[source]

Check if the file is older than given time

@type filename: String @param filename: Full path to the file @type allowed_time: long @param allowed_time: Time in seconds

@rtype: bool @return: True if file is older than the given time, else False
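A minimal sketch, assuming "older" refers to the file's modification time (mtime); the actual function may use another timestamp or treat a missing file differently (here os.path.getmtime raises OSError).

```python
import os
import time


def is_file_old(filename, allowed_time):
    """Return True if filename was last modified more than allowed_time seconds ago."""
    # os.path.getmtime raises OSError if the file does not exist
    age = time.time() - os.path.getmtime(filename)
    return age > allowed_time
```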

glideinwms.factory.glideFactory.main(startup_dir)[source]

Reads in the configuration file and starts up the factory

@type startup_dir: String @param startup_dir: Path to glideinsubmit directory

glideinwms.factory.glideFactory.save_stats(stats, fname)[source]

Serialize and save aggregated statistics so that each component (Factory and Entries) can retrieve and use it to log and advertise

stats is a dictionary pickled in binary format; stats[‘LogSummary’] holds the aggregated log summary info.

Parameters:
  • stats – aggregated Factory statistics

  • fname – name of the file with the serialized data

Returns:

glideinwms.factory.glideFactory.spawn(sleep_time, advertize_rate, startup_dir, glideinDescript, frontendDescript, entries, restart_attempts, restart_interval)[source]

Spawn and keep track of the entry processes. Restart them if required. Advertise the glidefactoryglobal classad every iteration.

@type sleep_time: long @param sleep_time: Delay between every iteration @type advertize_rate: long @param advertize_rate: Rate at which entries advertise their classads @type startup_dir: String @param startup_dir: Path to glideinsubmit directory @type glideinDescript: glideFactoryConfig.GlideinDescript @param glideinDescript: Factory config’s glidein description object @type frontendDescript: glideFactoryConfig.FrontendDescript @param frontendDescript: Factory config’s frontend description object @type entries: list @param entries: Sorted list of entry names @type restart_interval: long @param restart_interval: Allowed restart interval in seconds @type restart_attempts: long @param restart_attempts: Number of allowed restart attempts in the interval

glideinwms.factory.glideFactory.termsignal(signr, frame)[source]

Signal handler. Raise KeyboardInterrupt when receiving SIGTERM or SIGQUIT.
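A sketch of how handlers like hupsignal and termsignal are typically registered with the standard `signal` module. The HUPException class mirrors the one documented above; `install_handlers` is a hypothetical helper and the exception messages are illustrative.

```python
import signal


class HUPException(Exception):
    """Raised on SIGHUP to trigger a reconfig."""


def hupsignal(signr, frame):
    # Turn SIGHUP into an exception that the main loop can catch
    raise HUPException(f"Received signal {signr}")


def termsignal(signr, frame):
    # Reuse the KeyboardInterrupt shutdown path for SIGTERM and SIGQUIT
    raise KeyboardInterrupt(f"Received signal {signr}")


def install_handlers():
    # Hypothetical helper: must be called from the main thread
    signal.signal(signal.SIGHUP, hupsignal)
    signal.signal(signal.SIGTERM, termsignal)
    signal.signal(signal.SIGQUIT, termsignal)
```

With these handlers installed, a SIGHUP surfaces in the main loop as HUPException, while SIGTERM or SIGQUIT surface as KeyboardInterrupt.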

glideinwms.factory.glideFactory.update_classads()[source]

Loads the aggregate job summary pickle files, and then uses qedit on the finished jobs to add a new classad attribute called MONITOR_INFO with the monitor information.

Returns:

glideinwms.factory.glideFactory.write_descript(glideinDescript, frontendDescript, monitor_dir)[source]

Write the descript.xml to the monitoring directory

@type glideinDescript: glideFactoryConfig.GlideinDescript @param glideinDescript: Factory config’s glidein description object @type frontendDescript: glideFactoryConfig.FrontendDescript @param frontendDescript: Factory config’s frontend description object @type monitor_dir: String @param monitor_dir: Path to monitoring directory

glideinwms.factory.glideFactoryConfig module

class glideinwms.factory.glideFactoryConfig.ConfigFile(config_file, convert_function=<built-in function repr>)[source]

Bases: object

In-memory, dictionary-like representation of key-value config files. Loads a file composed of lines

NAME VAL

and creates

self.data[NAME]=convert_function(VAL) # repr is the default conversion

It also defines:

self.config_file=“name of file”

This is used only to load into memory and access the dictionary, not to update the on-disk persistent values

has_key(key_name)[source]
load(fname, convert_function)[source]
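A minimal sketch of the loading logic described for ConfigFile, using a hypothetical SimpleConfigFile class. It assumes that blank lines and lines starting with # are skipped and that NAME is separated from VAL by the first run of whitespace. Unlike the real class, whose default conversion is repr, this sketch defaults to str so values are stored as plain strings.

```python
class SimpleConfigFile:
    """Hypothetical read-only loader for files made of "NAME VAL" lines."""

    def __init__(self, config_file, convert_function=str):
        self.config_file = config_file
        self.data = {}
        with open(config_file) as fd:
            for line in fd:
                line = line.rstrip("\n")
                # Skip blank lines and comments
                if not line.strip() or line.startswith("#"):
                    continue
                # NAME is the first token; VAL is the rest of the line (may be empty)
                parts = line.split(None, 1)
                value = parts[1] if len(parts) > 1 else ""
                self.data[parts[0]] = convert_function(value)

    def has_key(self, key_name):
        return key_name in self.data
```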
class glideinwms.factory.glideFactoryConfig.EntryConfigFile(entry_name, config_file, convert_function=<built-in function repr>)[source]

Bases: ConfigFile

Load from the entry subdir

It also defines:

self.config_file=“name of file with entry directory” (from parent ConfigFile)
self.entry_name=“Entry name”
self.config_file_short=“name of file” (just the file name, since the other includes the directory)

class glideinwms.factory.glideFactoryConfig.FactoryConfig[source]

Bases: object

class glideinwms.factory.glideFactoryConfig.FrontendDescript[source]

Bases: ConfigFile

Contains the security identity and username mappings for the Frontends that are authorized to use this factory.

Contains a dictionary of dictionaries:

obj.data[frontend][‘ident’]=identity
obj.data[frontend][‘usermap’][sec_class]=username

get_all_frontend_sec_classes()[source]

Get a list of all frontend:sec_class

Returns:

Frontend security classes

Return type:

list

get_all_usernames()[source]

Get all the usernames assigned to all the frontends.

Returns:

list of usernames

Return type:

list

get_frontend_name(identity)[source]

Get the frontend:sec_class mapping for the given identity

Parameters:

identity (str) – identity

Returns:

Frontend name

Return type:

str

get_identity(frontend)[source]

Get the identity for the given frontend. If the Frontend is unknown, returns None.

Parameters:

frontend (str) – frontend name

Returns:

identity

Return type:

str|None

get_username(frontend, sec_class)[source]

Get the security name mapping for the given frontend and security class. If not found or not authorized, returns None.

Parameters:
  • frontend (str) – frontend name

  • sec_class (str) – security class name

Returns:

security name

Return type:

str|None
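A sketch of lookups over the obj.data layout described for FrontendDescript. FrontendDescriptSketch is hypothetical, takes the already-loaded dictionary directly, and does not deduplicate usernames; the real class builds data from the factory configuration.

```python
class FrontendDescriptSketch:
    """Hypothetical holder of data[frontend]['ident'] and data[frontend]['usermap']."""

    def __init__(self, data):
        self.data = data

    def get_identity(self, frontend):
        # None for an unknown frontend
        if frontend not in self.data:
            return None
        return self.data[frontend].get("ident")

    def get_username(self, frontend, sec_class):
        # None if the frontend or its security class is not mapped
        return self.data.get(frontend, {}).get("usermap", {}).get(sec_class)

    def get_all_usernames(self):
        return [user for fe in self.data.values() for user in fe.get("usermap", {}).values()]

    def get_all_frontend_sec_classes(self):
        # "frontend:sec_class" strings, as in the method description
        return [f"{fe}:{sc}" for fe, info in self.data.items() for sc in info.get("usermap", {})]
```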

class glideinwms.factory.glideFactoryConfig.GlideinDescript[source]

Bases: ConfigFile

backup_and_load_old_key()[source]

Backup existing key and load the key object

backup_rsa_key()[source]

Backup existing rsa key.

load_old_rsa_key()[source]

Load the old key object.

load_pub_key(recreate=False)[source]

Load the key object. Create the key if required

Parameters:

recreate (bool) – Create a new key overwriting the old one. Defaults to False

remove_old_key()[source]
class glideinwms.factory.glideFactoryConfig.GlideinKey(pub_key_type, key_fname=None, recreate=False)[source]

Bases: object

extract_sym_key(enc_sym_key)[source]

Extracts the symkey from the encrypted frontend attribute

Parameters:

enc_sym_key (AnyStrASCII) – encrypted symmetric key

Returns:

symmetric key (SymKey child object)

Return type:

SymKey

Raises:

RuntimeError – if the key type is not RSA

get_pub_key_id()[source]
get_pub_key_type()[source]
get_pub_key_value()[source]
load(key_fname=None, recreate=False)[source]

Create the key if required and initialize it

Parameters:
  • key_fname (str) – Filename of the key

  • recreate (bool) – Create a new key if True else load existing key. Defaults to False.

Raises:

RuntimeError – if asking for a key type different from RSA

class glideinwms.factory.glideFactoryConfig.JobAttributes(entry_name)[source]

Bases: JoinConfigFile

class glideinwms.factory.glideFactoryConfig.JobDescript(entry_name)[source]

Bases: EntryConfigFile

class glideinwms.factory.glideFactoryConfig.JobParams(entry_name)[source]

Bases: JoinConfigFile

class glideinwms.factory.glideFactoryConfig.JobSubmitAttrs(entry_name)[source]

Bases: JoinConfigFile

class glideinwms.factory.glideFactoryConfig.JoinConfigFile(entry_name, config_file, convert_function=<built-in function repr>)[source]

Bases: ConfigFile

Load both the main and entry subdir config file and join the results

Data is only read, not saved. self.data contains the joined items (initially the common ones, then updated with the content of entry_obj.data).

It also defines:

self.config_file=“name of both files, with and without entry directory” (it is not an actual file)
self.entry_name=“Entry name”
self.config_file_short=“name of file” (just the file name without the directory)
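The join amounts to a dictionary update: start from the common file and let the entry file override matching keys. A minimal sketch with plain dictionaries; join_config is a hypothetical helper, not part of the module.

```python
def join_config(common_data, entry_data):
    """Return the common values, overridden by the entry-specific ones."""
    joined = dict(common_data)  # copy, so the common data is not modified
    joined.update(entry_data)   # entry values win on conflicting keys
    return joined
```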

class glideinwms.factory.glideFactoryConfig.SignatureFile[source]

Bases: ConfigFile

Signatures File dictionary

load(fname, convert_function)[source]

Load the signatures.sha1 file into the class as a dictionary. The convert_function is completely ignored here. The line format differs from all the other classes in that each line has three values, with the key being the last one:

line[0] is the signature for the line
line[1] is the descript file for the line
line[2] is the key for the line

For each line, the internal dictionary contains:

line[2]_sign = line[0]
line[2]_descript = line[1]
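A sketch of the line-to-dictionary mapping above, assuming whitespace-separated fields and skipping comment and malformed lines. parse_signatures is a hypothetical helper working on already-read lines.

```python
def parse_signatures(lines):
    """Map 'signature descript key' lines to {key_sign: signature, key_descript: descript}."""
    data = {}
    for line in lines:
        fields = line.split()
        # Skip blank and comment lines (an assumption of this sketch)
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != 3:
            continue  # malformed line
        sign, descript, key = fields
        data[f"{key}_sign"] = sign
        data[f"{key}_descript"] = descript
    return data
```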

glideinwms.factory.glideFactoryCredentials module

exception glideinwms.factory.glideFactoryCredentials.CredentialError[source]

Bases: Exception

Dedicated exception, so that only the credential errors are caught here and the “real” errors propagate up.

class glideinwms.factory.glideFactoryCredentials.SubmitCredentials(username, security_class)[source]

Bases: object

Data class containing all information needed to submit a glidein.

add_factory_credential(cred_type, absfname)[source]

Adds a factory provided security credential.

add_identity_credential(cred_type, cred_str)[source]

Adds an identity credential.

add_security_credential(cred_type, filename)[source]

Adds a security credential.

glideinwms.factory.glideFactoryCredentials.check_security_credentials(auth_method, params, client_int_name, entry_name, scitoken_passthru=False)[source]

Verify that only credentials for the given auth method are in the params

Parameters:
  • auth_method (str) – authentication method of an entry, defined in the config

  • params (dict) – decrypted params passed in a frontend (client) request

  • client_int_name (str) – internal client name

  • entry_name (str) – name of the entry

  • scitoken_passthru (bool) – if True, a scitoken is present in the credential; override the checks for ‘auth_method’ and proceed with the glidein request

Raises:

CredentialError – if the credentials in params don’t match what is defined for the auth method

glideinwms.factory.glideFactoryCredentials.compress_credential(credential_data)[source]
glideinwms.factory.glideFactoryCredentials.get_globals_classads(factory_collector='default')[source]
glideinwms.factory.glideFactoryCredentials.get_key_obj(pub_key_obj, classad)[source]

Gets the symmetric key object from the request classad

@type pub_key_obj: object @param pub_key_obj: The factory public key object. This contains all the encryption and decryption methods @type classad: dictionary @param classad: a dictionary representation of the classad

glideinwms.factory.glideFactoryCredentials.process_global(classad, glidein_descript, frontend_descript)[source]
glideinwms.factory.glideFactoryCredentials.safe_update(fname, credential_data)[source]
glideinwms.factory.glideFactoryCredentials.update_credential_file(username, client_id, credential_data, request_clientname)[source]

Updates the credential file

Parameters:
  • username – credentials’ username

  • client_id – id used for tracking the submit credentials

  • credential_data – the credentials to be advertised

  • request_clientname – client name passed by frontend

Returns:

the updated credential file

glideinwms.factory.glideFactoryCredentials.validate_frontend(classad, frontend_descript, pub_key_obj)[source]

Validates that the frontend advertising the classad is allowed and that it claims to have the same identity that Condor thinks it has.

@type classad: dictionary @param classad: a dictionary representation of the classad @type frontend_descript: class object @param frontend_descript: class object containing all the frontend information @type pub_key_obj: object @param pub_key_obj: The factory public key object. This contains all the encryption and decryption methods

@return: sym_key_obj - the object containing the symmetric key used for decryption @return: frontend_sec_name - the frontend security name, used for determining the username to use.

glideinwms.factory.glideFactoryDowntimeLib module

class glideinwms.factory.glideFactoryDowntimeLib.DowntimeFile(fname)[source]

Bases: object

Handle a downtime file

Space-separated file with downtime information. Each line has space-separated values. The first line is a comment (starts with #) and header line:

"#%-29s %-30s %-20s %-30s %-20s # %s\n" % ("Start", "End", "Entry", "Frontend", "Sec_Class", "Comment")

Each non-comment line in the file has at least two entries:

start_time end_time (expressed in utime)

If end_time is None, the downtime does not have a set expiration (i.e. it runs forever). Additional entries are used to limit the scope (Entry, Frontend, Sec_Class) and to add a comment.
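A sketch of writing and reading one line in the layout above. It follows the description literally: times are written as integer utimes, "None" marks a missing end time, and the comment follows a # as in the header. The helpers format_period and parse_period are hypothetical, and the real file may store timestamps in a different form.

```python
HEADER = "#%-29s %-30s %-20s %-30s %-20s # %s\n" % (
    "Start", "End", "Entry", "Frontend", "Sec_Class", "Comment")


def format_period(start, end=None, entry="All", frontend="All", sec_class="All", comment=""):
    # "None" marks a downtime without a set expiration
    end_str = "None" if end is None else str(end)
    return "%-30s %-30s %-20s %-30s %-20s # %s\n" % (start, end_str, entry, frontend, sec_class, comment)


def parse_period(line):
    """Return (start, end, entry, frontend, sec_class, comment), or None for comment/blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    values, _, comment = line.partition("#")
    fields = values.split()
    start = int(fields[0])
    end = None if fields[1] == "None" else int(fields[1])
    # Scope fields are optional and default to "All"
    entry, frontend, sec_class = (fields[2:5] + ["All", "All", "All"])[:3]
    return start, end, entry, frontend, sec_class, comment.strip()
```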

addPeriod(start_time, end_time, entry='All', frontend='All', security_class='All', comment='', create_if_empty=True)[source]

Add a scheduled downtime. Maintain a lock (fcntl.LOCK_EX) on the downtime file while writing. entry, frontend, and security_class default to “All”.

Parameters:
  • start_time (int) – start time in seconds from Epoch

  • end_time (int) – end time in seconds from Epoch

  • entry (str) – entry name or “All”

  • frontend (str) – frontend name or “All”

  • security_class (str) – security class name or “All”

  • comment (str) – comment to add

  • create_if_empty (bool) – if False, raise FileNotFoundError if there is not already a downtime file

Returns:

0

Return type:

int

checkDowntime(entry='Any', frontend='Any', security_class='Any', check_time=None)[source]
endDowntime(end_time=None, entry='All', frontend='All', security_class='All', comment='')[source]

End a downtime (not a scheduled one). If end_time==None, use the current time. entry, frontend, and security_class default to “All”.

Parameters:
  • end_time (int|None) – end time in seconds from Epoch. If end_time==None, default, use current time

  • entry (str) – entry name or “All”

  • frontend (str) – frontend name or “All”

  • security_class (str) – security class name or “All”

  • comment (str) – comment to add

Returns:

number of records closed

Return type:

int

printDowntime(entry='Any', check_time=None)[source]
purgeOldPeriods(cut_time=None, raise_on_error=False)[source]

Purge old downtime periods. If cut_time<0, use current_time-abs(cut_time).

Parameters:
  • cut_time (int) – cut time in seconds from epoch, if cut_time==None or 0, use current time, if cut time<0, use current_time-abs(cut_time)

  • raise_on_error (bool) – if not True, mask all exceptions

Returns:

number of records purged

Return type:

int

read(raise_on_error=False)[source]

Return a list of downtime periods (utimes). A value of None indicates “forever”, for example: [(1215339200,1215439170),(1215439271,None)]

Parameters:

raise_on_error (bool) – if not True mask all the exceptions

Returns:

list of downtime periods [(start, end), …]

A value of None indicates “forever”, i.e. no start time or no end time. Timestamps are in seconds from epoch (utime). [] is returned when raise_on_error is False (default) and there is no downtime file.

Return type:

list

startDowntime(start_time=None, end_time=None, entry='All', frontend='All', security_class='All', comment='', create_if_empty=True)[source]

Start a downtime whose end time is not known yet. If start_time==None, use the current time. entry, frontend, and security_class default to “All”.

Parameters:
  • start_time (int|None) – start time in seconds from Epoch

  • end_time (int|None) – end time in seconds from Epoch

  • entry (str) – entry name or “All”

  • frontend (str) – frontend name or “All”

  • security_class (str) – security class name or “All”

  • comment (str) – comment to add

  • create_if_empty (bool) – if False, raise FileNotFoundError if there is not already a downtime file

Returns:

glideinwms.factory.glideFactoryDowntimeLib.addPeriod(fname, start_time, end_time, entry='All', frontend='All', security_class='All', comment='', create_if_empty=True)[source]

Add a downtime period. Maintain a lock (fcntl.LOCK_EX) on the downtime file while writing.

Parameters:
  • fname (str|Path) – downtime file

  • start_time (int) – start time in seconds from Epoch

  • end_time (int) – end time in seconds from Epoch

  • entry (str) – entry name or “All”

  • frontend (str) – frontend name or “All”

  • security_class (str) – security class name or “All”

  • comment (str) – comment to add

  • create_if_empty (bool) – if False, raise FileNotFoundError if there is not already a downtime file

Returns:

0

Return type:

int

glideinwms.factory.glideFactoryDowntimeLib.checkDowntime(fname, entry='Any', frontend='Any', security_class='Any', check_time=None)[source]

Check if there is a downtime at check_time. If check_time==None, use the current time. “All” (default) is a wildcard for entry, frontend and security_class.

Parameters:
  • fname (str|Path) – Downtime file

  • entry (str) – entry name or “All”

  • frontend (str) – frontend name or “All”

  • security_class (str) – security class name or “All”

  • check_time – time to check in seconds from epoch, if check_time==None, use current time

Returns:

tuple with the comment string and True if in downtime,

or (“”, False) if not in downtime

Return type:

(str, bool)
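Leaving aside the entry, frontend and security class scoping, the time check reduces to an interval test over (start, end) pairs such as those returned by read(), where None means unbounded. in_downtime is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```python
import time


def in_downtime(periods, check_time=None):
    """Return True if check_time falls inside any (start, end) period."""
    if check_time is None:
        check_time = time.time()
    for start, end in periods:
        # None start: no start time; None end: the downtime never expires
        if (start is None or start <= check_time) and (end is None or check_time <= end):
            return True
    return False
```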

glideinwms.factory.glideFactoryDowntimeLib.endDowntime(fname, end_time=None, entry='All', frontend='All', security_class='All', comment='')[source]

End a downtime (not a scheduled one). If end_time==None, use the current time. “All” (default) is a wildcard for entry, frontend and security_class.

Parameters:
  • fname (str|Path) – Downtime file

  • end_time (int) – end time in seconds from epoch, if end_time==None, use current time

  • entry (str) – entry name or “All”

  • frontend (str) – frontend name or “All”

  • security_class (str) – security class name or “All”

  • comment (str) – comment to add

Returns:

Number of downtime records closed

Return type:

int

glideinwms.factory.glideFactoryDowntimeLib.printDowntime(fname, entry='Any', check_time=None)[source]
glideinwms.factory.glideFactoryDowntimeLib.purgeOldPeriods(fname, cut_time=None, raise_on_error=False)[source]

Purge old rules using cut_time. If cut_time==None or 0, use the current time; if cut_time<0, use current_time-abs(cut_time).

Parameters:
  • fname (str|Path) – downtime file

  • cut_time (int) – cut time in seconds from epoch, if cut_time==None or 0, use current time, if cut time<0, use current_time-abs(cut_time)

  • raise_on_error (bool) – if not True, mask all exceptions

Returns:

number of records purged

Return type:

int
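A sketch of the cut_time rules above, plus a purge over (start, end) pairs. It assumes a period is "old" when it ended before the cut time and that open-ended periods (end None) are never purged; resolve_cut_time and purge are hypothetical helpers.

```python
import time


def resolve_cut_time(cut_time=None, now=None):
    """None or 0 -> now; negative -> now - abs(cut_time); positive -> as given."""
    if now is None:
        now = time.time()
    if not cut_time:
        return now
    if cut_time < 0:
        return now - abs(cut_time)
    return cut_time


def purge(periods, cut_time=None, now=None):
    """Drop periods that ended before the cut time; return (kept, number_purged)."""
    cut = resolve_cut_time(cut_time, now)
    kept = [(start, end) for start, end in periods if end is None or end >= cut]
    return kept, len(periods) - len(kept)
```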

glideinwms.factory.glideFactoryDowntimeLib.read(fname, raise_on_error=False)[source]

Return a list of downtime periods (utimes). A value of None indicates “forever”, for example: [(1215339200,1215439170),(1215439271,None)]

Parameters:
  • fname (str|Path) – downtimes file

  • raise_on_error (bool) – if not True mask all the exceptions

Returns:

list of downtime periods [(start, end), …]

A value of None indicates “forever”, i.e. no start time or no end time. Timestamps are in seconds from epoch (utime). [] is returned when raise_on_error is False (default) and there is no file.

Return type:

list

glideinwms.factory.glideFactoryEntry module

Entry class: model and behavior of a Factory Entry (an element describing a resource).

class glideinwms.factory.glideFactoryEntry.Entry(name, startup_dir, glidein_descript, frontend_descript)[source]

Bases: object

advertise(downtime_flag)[source]

Advertises the glidefactory and the glidefactoryclient classads.

@type downtime_flag: boolean @param downtime_flag: Downtime flag

getGlideinConfiguredLimits()[source]

Extract the required info to write to classads

getGlideinExpectedCores()[source]
Return the number of cores expected for each glidein.

This is the GLIDEIN_CPU attribute when > 0, GLIDEIN_ESTIMATED_CPUS when GLIDEIN_CPU <= 0 or auto/node/slot, or 1 if not set. The actual cores received will depend on the RSL or HTCondor attributes and the Entry, and could also vary over time.
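A sketch of the selection rule above over a plain dictionary of attributes. It interprets "1 if not set" as the fallback when GLIDEIN_CPU is absent, or when GLIDEIN_ESTIMATED_CPUS is needed but missing or not an integer. expected_cores is hypothetical, and any other non-numeric GLIDEIN_CPU value raises ValueError.

```python
def expected_cores(attrs):
    """Hypothetical version of the GLIDEIN_CPU / GLIDEIN_ESTIMATED_CPUS rule."""
    # An unset GLIDEIN_CPU is treated as 1
    value = str(attrs.get("GLIDEIN_CPU", "1")).strip().lower()
    if value in ("auto", "node", "slot"):
        cpus = 0
    else:
        cpus = int(value)
    if cpus > 0:
        return cpus
    # GLIDEIN_CPU <= 0 or auto/node/slot: use the estimate, else 1
    try:
        return int(attrs["GLIDEIN_ESTIMATED_CPUS"])
    except (KeyError, ValueError):
        return 1
```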

getLogStatsCurrentStatsData()[source]

Returns the gflFactoryConfig.log_stats.current_stats_data that can be pickled

@rtype: glideFactoryMonitoring.condorLogSummary @return: condorLogSummary from current iteration

getLogStatsData(stats_data)[source]

Returns the stats_data(stats_data[frontend][user].data) that can be pickled

@rtype: dict @return: Relevant stats data to pickle

getLogStatsOldStatsData()[source]

Returns the gflFactoryConfig.log_stats.old_stats_data that can be pickled

@rtype: glideFactoryMonitoring.condorLogSummary @return: condorLogSummary from previous iteration

getState()[source]

Compile a dictionary containing useful state information

@rtype: dict @return: Useful state information that can be pickled and restored

glideinsWithinLimits(condorQ)[source]

Check the condorQ info to see whether we are within limits, and initialize the entry limits

@rtype: boolean @return: True if glideins are in limits and we can submit more

initIteration(factory_in_downtime)[source]

Perform the resetting of stats as required before every iteration

@type factory_in_downtime: boolean @param factory_in_downtime: Downtime flag for the factory

isClientBlacklisted(client_sec_name)[source]

Check if the frontend whitelist is enabled and client is not in whitelist

@rtype: boolean @return: True if the client’s security name is blacklisted

isClientInWhitelist(client_sec_name)[source]

Check if the client’s security name is in the whitelist of this entry

@rtype: boolean @return: True if the client’s security name is in the whitelist

isClientWhitelisted(client_sec_name)[source]

Check if the client’s security name is in the whitelist of this entry and the frontend whitelist is enabled

@rtype: boolean @return: True if the client’s security name is whitelisted
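The three whitelist predicates above combine as follows. WhitelistSketch is a hypothetical stand-in that holds only an enabled flag and a set of allowed client security names; the real Entry loads this information through loadWhitelist().

```python
class WhitelistSketch:
    """Hypothetical holder for an entry's frontend whitelist settings."""

    def __init__(self, whitelist_enabled, allowed_clients):
        self.whitelist_enabled = whitelist_enabled
        self.allowed_clients = set(allowed_clients)

    def isClientInWhitelist(self, client_sec_name):
        # Membership only, regardless of whether the whitelist is enabled
        return client_sec_name in self.allowed_clients

    def isClientWhitelisted(self, client_sec_name):
        # In the whitelist AND the whitelist is enabled
        return self.whitelist_enabled and self.isClientInWhitelist(client_sec_name)

    def isClientBlacklisted(self, client_sec_name):
        # Whitelist enabled AND the client is not in it
        return self.whitelist_enabled and not self.isClientInWhitelist(client_sec_name)
```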

isInDowntime()[source]

Check the downtime file to find out if entry is in downtime

@rtype: boolean @return: True if the entry is in downtime

isSecurityClassAllowed(client_sec_name, proxy_sec_class)[source]

Check if the security class is allowed

@rtype: boolean @return: True if the security class is allowed

isSecurityClassInDowntime(client_security_name, security_class)[source]

Check if the security class is in downtime in the Factory or in this Entry

@rtype: boolean @return: True if the security class is in downtime

loadContext()[source]

Load context for this entry object so monitoring and logs are written correctly. This should be called in every method for now.

loadDowntimes()[source]

Load the downtime info for this entry

loadWhitelist()[source]

Load the whitelist info for this entry

logLogStats(marker='')[source]
queryQueuedGlideins()[source]

Query the WMS schedd (on the Factory) and get the glideins info. Re-raise in case of failures. Return a loaded condorMonitor.CondorQ object using the entry attributes (name, schedd, …). It consists of a fetched dictionary with the jobs (keyed by job cluster, ID) in .stored_data, some query attributes, and the ability to reload (load/fetch).

@rtype: condorMonitor.CondorQ already loaded @return: Information about the jobs in condor_schedd

setDowntime(downtime_flag)[source]

Check if we are in downtime and set info accordingly

@type downtime_flag: boolean @param downtime_flag: Downtime flag

setLogStatsCurrentStatsData(new_data)[source]

Set gflFactoryConfig.log_stats.current_stats_data from pickled info

@type new_data: glideFactoryMonitoring.condorLogSummary @param new_data: Data from pickled object to load

setLogStatsData(stats_data, new_data)[source]

Sets the stats_data(stats_data[frontend][user].data) from pickled info

@type stats_data: dict @param stats_data: Stats data

@type new_data: dict @param new_data: Stats data from pickled info

setLogStatsOldStatsData(new_data)[source]

Set old_stats_data or current_stats_data from pickled info

@type new_data: glideFactoryMonitoring.condorLogSummary @param new_data: Data from pickled object to load

setState(state)[source]

Load the post work state from the pickled info

Parameters:

state (dict) – Pickled state after doing work

setState_old(state)[source]

Load the post work state from the pickled info

Parameters:

state (dict) – Pickled state after doing work

unsetInDowntime()[source]

Clear the downtime status of this entry

writeClassadsToFile(downtime_flag, gf_filename, gfc_filename, append=True)[source]

Create the glidefactory and glidefactoryclient classads to advertise, without advertising them

@type downtime_flag: boolean @param downtime_flag: downtime flag

@type gf_filename: string @param gf_filename: Filename to write glidefactory classads

@type gfc_filename: string @param gfc_filename: Filename to write glidefactoryclient classads

@type append: boolean @param append: True to append new classads, i.e. a multi-classad file

writeStats()[source]

Calls the statistics functions to record and write stats for this iteration.

There are several main types of statistics:

log stats: These come from parsing the condor_activity and job logs. They are computed every iteration (in perform_work()) and diffed to detect newly changed job statuses (i.e. newly completed jobs).

qc stats: From condor_q data.

rrd stats: Used in monitoring statistics for javascript rrd graphs.

glideinwms.factory.glideFactoryEntry.check_and_perform_work(factory_in_downtime, entry, work)[source]

Check if we need to do the work and then do the work. Called by the child process of each entry.

@param factory_in_downtime: Flag if factory is in downtime

@type entry: glideFactoryEntry.Entry @param entry: Entry object

@param work: all the work requests for the Entry

Returns:

glideinwms.factory.glideFactoryEntry.perform_work_v3(entry, condorQ, client_name, client_int_name, client_security_name, submit_credentials, remove_excess, idle_glideins, max_glideins, idle_lifetime, credential_username, glidein_totals, frontend_name, client_web, params)[source]

Perform the work (Submit or remove glideins)

@type entry: glideFactoryEntry.Entry @param entry: Entry object

@type condorQ: condorMonitor.CondorQ @param condorQ: Information about the jobs in condor_schedd (entry values sub-query from glideFactoryLib.getQCredentials())

@type client_int_name: string @param client_int_name: Internal name of the client

@type client_security_name: string @param client_security_name: Security name of the client

@type submit_credentials: @param submit_credentials: credentials used

@type remove_excess: tuple @param remove_excess: remove_excess_str, remove_excess_margin; if frontend wants us to remove excess glideins

@type idle_glideins: int @param idle_glideins: Number of idle glideins

@type max_glideins: int @param max_glideins: Maximum number of running glideins

@type idle_lifetime: @param idle_lifetime:

@type credential_username: string @param credential_username: Credential username

@type glidein_totals: object @param glidein_totals: glidein_totals object

@type frontend_name: string @param frontend_name: Name of the frontend

@type client_web: string @param client_web: Client’s web location

@type params: object @param params: Params object

@return: 1 if something was submitted, 0 otherwise

glideinwms.factory.glideFactoryEntry.termsignal(signr, frame)[source]
glideinwms.factory.glideFactoryEntry.unit_work_v3(entry, work, client_name, client_int_name, client_int_req, client_expected_identity, decrypted_params, params, in_downtime, condorQ)[source]

Perform a single work unit using the v3 protocol.

Parameters:
  • entry – Entry

  • work – work requests

  • client_name – work_key (key used in the work request)

  • client_int_name – client name declared in the request

  • client_int_req – name of the request (declared in the request)

  • client_expected_identity

  • decrypted_params

  • params

  • in_downtime

  • condorQ – list of HTCondor jobs for this entry as returned by entry.queryQueuedGlideins()

Returns:

Dictionary with success, security_names, and work_done

glideinwms.factory.glideFactoryEntry.update_entries_stats(factory_in_downtime, entry_list)[source]

Update client_stats for the entries in the list. Used for entries with no job requests.

TODO: #22163, skip update when in downtime?

NOTE: qc_stats cannot be updated because the frontend certificate information is missing.

@param factory_in_downtime: True if the Factory is in downtime, here for future needs (not used now)

@param entry_list: list of entry names for the entries to update

@return: list of names of the entries that have been updated (subset of entry_list)

glideinwms.factory.glideFactoryEntry.write_descript(entry_name, entryDescript, entryAttributes, entryParams, monitor_dir)[source]

glideinwms.factory.glideFactoryEntryGroup module

This is the glideinFactoryEntryGroup. Common tasks like querying the collector and advertising the work done by the group are done here.

param $1 = parent_pid:

The pid for the Factory daemon

type $1 = parent_pid:

int

param $2 = sleep_time:

The number of seconds to sleep between iterations

type $2 = sleep_time:

int

param $3 = advertize_rate:

The rate at which advertising should occur (every $3 loops)

type $3 = advertize_rate:

int

param $4 = startup_dir:

The “home” directory for the entry.

type $4 = startup_dir:

str|Path

param $5 = entry_names:

Colon separated list with the names of the entries this process should work on

type $5 = entry_names:

str

param $6 = group_id:

Group id, normally a number (with the “group_” prefix it forms the group name). It can change between Factory reconfigurations.

type $6 = group_id:

str

class glideinwms.factory.glideFactoryEntryGroup.EntryGroup[source]

Bases: object

glideinwms.factory.glideFactoryEntryGroup.check_parent(parent_pid, glideinDescript, my_entries)[source]

Check to make sure that we aren’t an orphaned process. If Factory daemon has died, then clean up after ourselves and kill ourselves off.

@type parent_pid: int @param parent_pid: pid for the Factory daemon process

@type glideinDescript: glideFactoryConfig.GlideinDescript @param glideinDescript: Object that encapsulates glidein.descript in the Factory root directory

@type my_entries: dict @param my_entries: Dictionary of entry objects keyed on entry name

@raise KeyboardInterrupt: Raised when the Factory daemon cannot be found
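A sketch of the orphan check, assuming the parent's liveness is probed with os.kill(pid, 0), which sends no signal but fails with ProcessLookupError if the process is gone. parent_alive and check_parent_sketch are hypothetical, and the cleanup of the entries that the real function performs is omitted.

```python
import os


def parent_alive(parent_pid):
    """Return True if a process with parent_pid exists."""
    try:
        # Signal 0 performs error checking only; nothing is delivered
        os.kill(parent_pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user
        return True
    return True


def check_parent_sketch(parent_pid):
    if not parent_alive(parent_pid):
        # Mirror the documented behavior: raise KeyboardInterrupt to trigger shutdown
        raise KeyboardInterrupt(f"Parent process {parent_pid} not found")
```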

glideinwms.factory.glideFactoryEntryGroup.compile_pickle_data(entry, work_done)[source]

Extract the state of the entry after doing work

Parameters:
  • entry (Entry) – Entry object

  • work_done (int) – Work done info

Returns:

pickle-friendly version of the Entry (state of the Entry)

Return type:

dict

glideinwms.factory.glideFactoryEntryGroup.find_and_perform_work(do_advertize, factory_in_downtime, glideinDescript, frontendDescript, group_name, my_entries)[source]

For all entries in this group, find work requests from the WMS collector, validate credentials, and request Glideins. If an entry is in downtime, the number of requested Glideins is zero.

Parameters:
  • do_advertize (bool) – Advertise (publish the gfc ClassAd) even if no work is performed

  • factory_in_downtime (bool) – True if factory is in downtime

  • glideinDescript (dict) – Factory glidein config values

  • frontendDescript (dict) – Security mappings for frontend identities, security classes, and usernames

  • group_name (str) – Name of the group

  • my_entries (dict) – Dictionary of entry objects (glideFactoryEntry.Entry) keyed on entry name

Returns:

Dictionary of work to do keyed using entry name

Return type:

dict

glideinwms.factory.glideFactoryEntryGroup.find_work(factory_in_downtime, glideinDescript, frontendDescript, group_name, my_entries)[source]

Find work for all the entries in the group

@type factory_in_downtime: boolean @param factory_in_downtime: True if factory is in downtime

@type glideinDescript: dict @param glideinDescript: Factory glidein config values

@type frontendDescript: dict @param frontendDescript: Security mappings for frontend identities, security classes, and usernames

@type group_name: string @param group_name: Name of the group

@type my_entries: dict @param my_entries: Dictionary of entry objects keyed on entry name

@return: Dictionary of work to do keyed on entry name @rtype: dict

glideinwms.factory.glideFactoryEntryGroup.forked_check_and_perform_work(factory_in_downtime, entry, work)[source]

Do the work assigned to an entry (glidein requests)

@param factory_in_downtime: flag, True if the Factory is in downtime

@param entry: entry object (glideFactoryEntry.Entry)

@param work: work requests for the entry

@return: dictionary with entry state + work_done

glideinwms.factory.glideFactoryEntryGroup.forked_update_entries_stats(factory_in_downtime, entries_list)[source]

Update statistics for entries that have no work to do

Parameters:
  • factory_in_downtime

  • entries_list

Returns:

glideinwms.factory.glideFactoryEntryGroup.get_work_count(work)[source]

Get total work to do i.e. sum of work to do for every entry

@type work: dict @param work: Dictionary of work to do keyed on entry name

@rtype: int @return: Total work to do.
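A minimal sketch of this sum, assuming the per-entry values follow the `work[entry_name][frontend]` layout described for `findGroupWork` and that each (entry, frontend) request counts as one unit of work. Both points are assumptions; the real function may weigh requests differently.

```python
from typing import Any, Dict


def get_work_count_sketch(work: Dict[str, Dict[str, Any]]) -> int:
    """Total number of work requests across all entries.

    Assumes work[entry_name][frontend] = {"params": ..., "requests": ...},
    i.e. one unit of work per (entry, frontend) request.
    """
    return sum(len(requests) for requests in work.values())
```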

glideinwms.factory.glideFactoryEntryGroup.iterate(parent_pid, sleep_time, advertize_rate, glideinDescript, frontendDescript, group_name, my_entries)[source]

Iterate over set of tasks until it is time to quit or die. The main “worker” function for the Factory Entry Group.

Parameters:
  • parent_pid (int) – The pid for the Factory daemon

  • sleep_time (int) – The number of seconds to sleep between iterations

  • advertize_rate (int) – The rate at which advertising should occur

  • glideinDescript (glideFactoryConfig.GlideinDescript) – glidein.descript object in the Factory root dir

  • frontendDescript (glideFactoryConfig.FrontendDescript) – frontend.descript object in the Factory root dir

  • group_name (str) – Name of the group

  • my_entries (dict) – Dictionary of entry objects keyed on entry name
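The module description states that advertising happens every `advertize_rate` loops. A minimal sketch of that scheduling follows. The real loop runs until it is time to quit or die; here the loop count is bounded and `iterate_one` and `sleep` are injected callables so the behavior can be inspected. Advertising on the very first loop is an assumption.

```python
import time
from typing import Callable, List


def run_iterations(n_loops: int, sleep_time: float, advertize_rate: int,
                   iterate_one: Callable[[bool], int],
                   sleep: Callable[[float], None] = time.sleep) -> List[bool]:
    """Call iterate_one n_loops times, advertising every advertize_rate loops.

    Returns the do_advertize flag used in each loop, for inspection.
    """
    flags = []
    for loop_count in range(n_loops):
        # Advertise on loop 0 and then on every advertize_rate-th loop
        do_advertize = loop_count % advertize_rate == 0
        flags.append(do_advertize)
        iterate_one(do_advertize)
        sleep(sleep_time)
    return flags
```

Injecting `sleep` keeps the scheduling logic testable without waiting `sleep_time` seconds per loop.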

glideinwms.factory.glideFactoryEntryGroup.iterate_one(do_advertize, factory_in_downtime, glideinDescript, frontendDescript, group_name, my_entries)[source]

One iteration of the entry group

Parameters:
  • do_advertize (bool) – True if glidefactory classads should be advertised

  • factory_in_downtime (bool) – True if factory is in downtime

  • glideinDescript (dict) – Factory glidein config values

  • frontendDescript (dict) – Security mappings for frontend identities, security classes, and usernames

  • group_name (str) – Name of the group

  • my_entries (dict) – Dictionary of entry objects (glideFactoryEntry.Entry) keyed on entry name

Returns:

Units of work performed (0 if no Glidein was submitted)

Return type:

int

glideinwms.factory.glideFactoryEntryGroup.log_work_info(work, key='')[source]
glideinwms.factory.glideFactoryEntryGroup.main(parent_pid, sleep_time, advertize_rate, startup_dir, entry_names, group_id)[source]

GlideinFactoryEntryGroup main function

Setup logging, monitoring, and configuration information. Starts the Entry group main loop and handles cleanup at shutdown.

Parameters:
  • parent_pid (int) – The pid for the Factory daemon

  • sleep_time (int) – The number of seconds to sleep between iterations

  • advertize_rate (int) – The rate at which advertising should occur

  • startup_dir (str|Path) – The “home” directory for the entry.

  • entry_names (str) – Colon separated list with the names of the entries this process should work on

  • group_id (str) – Group id, normally a number (with the “group_” prefix it forms the group name); it can change between Factory reconfigurations

glideinwms.factory.glideFactoryInterface module

This module implements the functions needed to advertize and get commands from the Collector

class glideinwms.factory.glideFactoryInterface.EntryClassad(factory_name, glidein_name, entry_name, trust_domain, auth_method, supported_signtypes, pub_key_obj=None, glidein_submit={}, glidein_attrs={}, glidein_params={}, glidein_monitors={}, glidein_stats={}, glidein_web_attrs={}, glidein_config_limits={})[source]

Bases: Classad

This class describes the glidefactory classad. The Factory advertises the glidefactory classad to the user pool as an UPDATE_AD_GENERIC type classad.

class glideinwms.factory.glideFactoryInterface.FactoryConfig[source]

Bases: object

class glideinwms.factory.glideFactoryInterface.FactoryGlobalClassad(factory_name, glidein_name, supported_signtypes, pub_key_obj)[source]

Bases: Classad

This class describes the glidefactoryglobal classad. The Factory advertises the glidefactoryglobal classad to the user pool as an UPDATE_AD_GENERIC type classad.

The glidefactory and glidefactoryglobal classads must be of the same type because they may be invalidated together (with a single command).

class glideinwms.factory.glideFactoryInterface.MultiAdvertizeGlideinClientMonitoring(factory_name, glidein_name, entry_name, glidein_attrs, factory_collector='default')[source]

Bases: object

add(client_name, client_int_name, client_int_req, client_params={}, client_monitors={}, limits_triggered={})[source]
do_advertize()[source]
do_advertize_iterate()[source]
do_advertize_multi()[source]
writeToMultiClassadFile(filename=None, append=True)[source]
exception glideinwms.factory.glideFactoryInterface.MultiExeError(arr)[source]

Bases: ExeError

glideinwms.factory.glideFactoryInterface._remove_if_there(fname)[source]

Remove the file and ignore errors (e.g. file not there)

glideinwms.factory.glideFactoryInterface.advertizeGlideinClientMonitoring(factory_name, glidein_name, entry_name, client_name, client_int_name, client_int_req, glidein_attrs={}, client_params={}, client_monitors={}, factory_collector='default')[source]
glideinwms.factory.glideFactoryInterface.advertizeGlideinClientMonitoringFromFile(fname, remove_file=True, is_multi=False, factory_collector='default')[source]
glideinwms.factory.glideFactoryInterface.advertizeGlideinFromFile(fname, remove_file=True, is_multi=False, factory_collector='default')[source]
glideinwms.factory.glideFactoryInterface.advertizeGlobal(factory_name, glidein_name, supported_signtypes, pub_key_obj, stats_dict={}, factory_collector='default')[source]

Creates the glidefactoryglobal classad and advertises.

@type factory_name: string @param factory_name: the name of the factory

@type glidein_name: string @param glidein_name: name of the glidein

@type supported_signtypes: string @param supported_signtypes: supported sign types, i.e. sha1

@type pub_key_obj: GlideinKey @param pub_key_obj: for the frontend to use in encryption

@type stats_dict: dict @param stats_dict: completed jobs statistics

@type factory_collector: string or None @param factory_collector: the collector to query, special value ‘default’ will get it from the global config

@todo add factory downtime?

glideinwms.factory.glideFactoryInterface.createGlideinClientMonitoringFile(fname, factory_name, glidein_name, entry_name, client_name, client_int_name, client_int_req, glidein_attrs={}, client_params={}, client_monitors={}, limits_triggered={}, do_append=False)[source]
glideinwms.factory.glideFactoryInterface.deadvertizeAllGlideinClientMonitoring(factory_name, glidein_name, entry_name, factory_collector='default')[source]

Deadvertize monitoring classads for the given entry.

glideinwms.factory.glideFactoryInterface.deadvertizeFactory(factory_name, glidein_name, factory_collector='default')[source]

Deadvertize all entry and global classads for this factory.

glideinwms.factory.glideFactoryInterface.deadvertizeFactoryClientMonitoring(factory_name, glidein_name, factory_collector='default')[source]

Deadvertize all monitoring classads for this factory.

glideinwms.factory.glideFactoryInterface.deadvertizeGlidein(factory_name, glidein_name, entry_name, factory_collector='default')[source]

Removes the glidefactory classad advertising the entry from the WMS Collector.

glideinwms.factory.glideFactoryInterface.deadvertizeGlobal(factory_name, glidein_name, factory_collector='default')[source]

Removes the glidefactoryglobal classad advertising the factory globals from the WMS Collector.

glideinwms.factory.glideFactoryInterface.exe_condor_advertise(fname, command, is_multi=False, factory_collector=None)[source]
glideinwms.factory.glideFactoryInterface.findGroupWork(factory_name, glidein_name, entry_names, supported_signtypes, pub_key_obj=None, additional_constraints=None, factory_collector='default')[source]

Find request classAds that have my (factory, glidein name, entries) and create the dictionary of dictionary of work request information. Example: work[entry_name][frontend] = {'params': 'value', 'requests': 'value'}

@type factory_name: string @param factory_name: name of the factory

@type glidein_name: string @param glidein_name: name of the glidein instance

@type entry_names: list @param entry_names: list of factory entry names

@type supported_signtypes: list @param supported_signtypes: only one kind of signtype is supported, ‘sha1’; default is None

@type pub_key_obj: string @param pub_key_obj: only ‘RSA’ is supported; defaults to None

@type additional_constraints: string @param additional_constraints: any additional constraints to include for querying the WMS collector, default is None

@type factory_collector: string or None @param factory_collector: the collector to query, special value ‘default’ will get it from the global config

@rtype: dict @return: Dictionary of work to perform. Return format is work[entry_name][frontend] = {'params': 'value', 'requests': 'value'}

glideinwms.factory.glideFactoryInterface.findWork(factory_name, glidein_name, entry_name, supported_signtypes, pub_key_obj=None, additional_constraints=None, factory_collector='default')[source]

Find request classAds that have my (factory, glidein name, entry name) and create the dictionary of work request information.

@type factory_name: string @param factory_name: name of the factory

@type glidein_name: string @param glidein_name: name of the glidein instance

@type entry_name: string @param entry_name: name of the factory entry

@type supported_signtypes: list @param supported_signtypes: only one kind of signtype is supported, ‘sha1’; default is None

@type pub_key_obj: string @param pub_key_obj: only ‘RSA’ is supported

@type additional_constraints: string @param additional_constraints: any additional constraints to include for querying the WMS collector, default is None

@type factory_collector: string or None @param factory_collector: the collector to query, special value ‘default’ will get it from the global config

@return: dictionary, each key is the name of a frontend. Each value has a ‘requests’ and a ‘params’ key. Both refer to classAd dictionaries.

glideinwms.factory.glideFactoryInterface.workGroupByEntries(work)[source]

Given the dictionary of work items, group the work based on the entry. Example: grouped_work[entry][w]
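The regrouping `grouped_work[entry][w] = work[w]` can be sketched as follows. How the real function derives the entry name from a work item is not documented here, so the `entry_of` callable is a hypothetical stand-in for that lookup.

```python
from collections import defaultdict
from typing import Any, Callable, Dict


def group_work_by_entry(work: Dict[str, Any],
                        entry_of: Callable[[Any], str]) -> Dict[str, Dict[str, Any]]:
    """Return grouped_work[entry][w] = work[w] (entry_of is a hypothetical extractor)."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for w, item in work.items():
        grouped[entry_of(item)][w] = item
    return dict(grouped)
```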

glideinwms.factory.glideFactoryLib module

This module implements the functions needed to keep the required number of idle glideins. It also has support for glidein sanitizing.

class glideinwms.factory.glideFactoryLib.ClientWeb(client_web_url, client_signtype, client_descript, client_sign, client_group, client_group_web_url, client_group_descript, client_group_sign, factoryConfig=None)[source]

Bases: object

get_glidein_args()[source]
class glideinwms.factory.glideFactoryLib.FactoryConfig[source]

Bases: object

config_dirs(submit_dir, log_base_dir, client_log_base_dir, client_proxies_base_dir)[source]
config_remove_freq(sleepBetweenRemoves, maxRemovesXCycle)[source]
config_submit_freq(sleepBetweenSubmits, maxSubmitsXCycle)[source]
config_whoamI(factory_name, glidein_name)[source]
get_client_log_dir(entry_name, username)[source]
get_client_proxies_dir(username)[source]
class glideinwms.factory.glideFactoryLib.GlideinTotals(entry_name, frontendDescript, jobDescript, entry_condorQ, log=None)[source]

Bases: object

Keeps track of all glidein totals.

add_idle_glideins(nr_glideins, frontend_name)[source]

Updates the totals with the additional glideins.

can_add_idle_glideins(nr_glideins, frontend_name, log=None, factoryConfig=<glideinwms.factory.glideFactoryLib.FactoryConfig object>)[source]

Determines how many more glideins can be added. Does not compare against request max_glideins. Does not update totals.

get_max_held(frontend_name)[source]

Returns max held for the given frontend:sec_class.

has_entry_exceeded_max_glideins()[source]
has_entry_exceeded_max_held()[source]
has_entry_exceeded_max_idle()[source]
has_sec_class_exceeded_max_held(frontend_name)[source]

Compares the current held for a security class to the security class limit.

glideinwms.factory.glideFactoryLib.clean_glidein_queue(remove_excess_tp, glidein_totals, condorQ, req_min_idle, req_max_glideins, frontend_name, log=None, factoryConfig=None)[source]

Cleans up the glideins queue (removes any excesses) per the frontend request.

We do not adjust the glidein totals to account for what has been removed from the queue. It may take a cycle (or more) for the removals to show up in these totals, so it would be difficult to reflect the true state of the system.

TODO: req_min_idle=0 when remove_excess_tp.frontend_req_min_idle is not 0 means that a limit was reached in the Factory or some component (Factory/Entry) is in downtime. Check if the removal behavior should change.

Parameters:
  • remove_excess_tp (tuple) – remove_excess_str (NO, WAIT, IDLE, ALL), remove_excess_margin, frontend_req_min_idle. remove_excess_str and remove_excess_margin are the removal request from the Frontend. The frontend_req_min_idle item of the tuple indicates the original Frontend pressure. We use this instead of req_min_idle for the IDLE pilot removal because the Factory could set req_min_idle to 0 if an entry is in downtime or the Factory limits are reached. We do not want to remove idle pilots in these cases!

  • glidein_totals (dict) – Number of Glideins in different states for each Frontend

  • condorQ (dict) – Results of condor_q, classified

  • req_min_idle – min_idle requested by the Frontend (NOT USED; frontend_req_min_idle in remove_excess_tp is used instead to avoid the effects of Factory limits)

  • req_max_glideins – max_glideins requested by the Frontend

  • frontend_name (str) – Name of the Frontend, to use as key

  • log (logging.Logger) – logger

  • factoryConfig (FactoryConfig) – configuration object

Returns:

1 if some glideins were removed, 0 otherwise

Return type:

int

TODO:V could return the number of glideins removed

glideinwms.factory.glideFactoryLib.days2sec(days)[source]
glideinwms.factory.glideFactoryLib.diffList(base_list, subtract_list)[source]
glideinwms.factory.glideFactoryLib.env_list2dict(env, sep='=')[source]
glideinwms.factory.glideFactoryLib.escapeParam(param_str)[source]
glideinwms.factory.glideFactoryLib.executeSubmit(log, factoryConfig, username, schedd, exe_env, submitFile)[source]

Submit Glideins using the condor_submit command in a custom environment

Parameters:
  • log – logger to use

  • factoryConfig

  • username

  • schedd (str) – HTCSS schedd name

  • exe_env (list) – environment list

  • submitFile (str) – path of the submit file

Returns:

list of submitted Glideins

Return type:

list

glideinwms.factory.glideFactoryLib.extractHeldSimple(q, factoryConfig=None)[source]

All Held Glideins: JobStatus == 5

Parameters:
  • q – dictionary of Glideins from condor_q

  • factoryConfig (FactoryConfig) – Factory configuration (NOT USED, for interface)

Returns:

dictionary of Held Glideins from condor_q

Return type:

dict

glideinwms.factory.glideFactoryLib.extractIdleQueued(q, factoryConfig=None)[source]

All Idle Glideins already submitted: with hash_status 1xxx except 1001

hash_status 1xxx implies JobStatus 1

Parameters:
  • q – dictionary of Glideins from condor_q

  • factoryConfig (FactoryConfig) – Factory configuration (NOT USED, for interface)

Returns:

dictionary of Idle and Submitted Glideins from condor_q

Return type:

dict

glideinwms.factory.glideFactoryLib.extractIdleSimple(q, factoryConfig=None)[source]

All Idle Glideins: JobStatus == 1

Parameters:
  • q – dictionary of Glideins from condor_q

  • factoryConfig (FactoryConfig) – Factory configuration (NOT USED, for interface)

Returns:

dictionary of Idle Glideins from condor_q

Return type:

dict

glideinwms.factory.glideFactoryLib.extractIdleUnsubmitted(q, factoryConfig=None)[source]

All Idle Glideins not yet submitted (Unsubmitted, aka Waiting): with hash_status 1001

hash_status 1xxx implies JobStatus 1

Parameters:
  • q – dictionary of Glideins from condor_q

  • factoryConfig (FactoryConfig) – Factory configuration (NOT USED, for interface)

Returns:

dictionary of Idle not Submitted Glideins from condor_q

Return type:

dict
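The split between `extractIdleQueued` and `extractIdleUnsubmitted` follows the status codes described above: any 1xxx code implies JobStatus 1 (Idle), and 1001 marks Glideins not yet submitted. A minimal sketch of that split, where `status_of` is a hypothetical stand-in for `hash_status` and the 1000 to 1999 range is this sketch's reading of “1xxx”:

```python
from typing import Any, Callable, Dict, Tuple


def split_idle(q: Dict[Any, dict],
               status_of: Callable[[dict], int]) -> Tuple[Dict[Any, dict], Dict[Any, dict]]:
    """Split idle Glideins into (queued, unsubmitted) by detailed status code."""
    queued, unsubmitted = {}, {}
    for job_id, el in q.items():
        code = status_of(el)
        if not 1000 <= code <= 1999:
            continue  # not a 1xxx code, so not an Idle Glidein
        if code == 1001:
            unsubmitted[job_id] = el  # Idle, not yet submitted (Waiting)
        else:
            queued[job_id] = el  # Idle and already submitted
    return queued, unsubmitted
```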

glideinwms.factory.glideFactoryLib.extractJobId(submit_out)[source]

Extracts the number of jobs and cluster id from a condor output.

Parameters:

submit_out (list) – Condor output. Expects a list of str.

Raises:

condorExe.ExeError – When it fails to apply a regular expression to a line of the output.

Returns:

Number of jobs and cluster id.

Return type:

tuple
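`condor_submit` reports its result with a summary line such as `3 job(s) submitted to cluster 1234.`, which a regular expression can parse. The sketch below assumes that output format. `SubmitParseError` stands in for `condorExe.ExeError`. The tuple order follows the wording above (number of jobs, then cluster id) and is an assumption about the real function. Unlike the documented behavior, this sketch raises only when no line matches.

```python
import re
from typing import List, Tuple

# Matches the condor_submit summary line, e.g. "3 job(s) submitted to cluster 1234."
_SUBMIT_RE = re.compile(r"^(?P<count>\d+) job\(s\) submitted to cluster (?P<cluster>\d+)\.?$")


class SubmitParseError(Exception):
    """Stand-in for condorExe.ExeError in this sketch."""


def extract_job_id_sketch(submit_out: List[str]) -> Tuple[int, int]:
    """Return (number_of_jobs, cluster_id) parsed from condor_submit output lines."""
    for line in submit_out:
        match = _SUBMIT_RE.match(line.strip())
        if match:
            return int(match.group("count")), int(match.group("cluster"))
    raise SubmitParseError("Could not find the cluster information in condor_submit output")
```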

glideinwms.factory.glideFactoryLib.extractNonRunSimple(q, factoryConfig=None)[source]

All NOT Running Glideins: JobStatus != 2

Parameters:
  • q – dictionary of Glideins from condor_q

  • factoryConfig (FactoryConfig) – Factory configuration (NOT USED, for interface)

Returns:

dictionary of Not Running Glideins from condor_q

Return type:

dict

glideinwms.factory.glideFactoryLib.extractRecoverableHeldSimple(q, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.extractRecoverableHeldSimpleWithinLimits(q, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.extractRunSimple(q, factoryConfig=None)[source]

All Running Glideins: JobStatus == 2

Parameters:
  • q – dictionary of Glideins from condor_q

  • factoryConfig (FactoryConfig) – Factory configuration (NOT USED, for interface)

Returns:

dictionary of Running Glideins from condor_q

Return type:

dict
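The `extract*Simple` functions above are filters on the HTCondor `JobStatus` attribute (1 Idle, 2 Running, 5 Held). A minimal sketch of that pattern, assuming `q` maps a job id to a classad dictionary that contains a `JobStatus` key. The helper names are hypothetical, not the module's API.

```python
from typing import Any, Callable, Dict

# HTCondor JobStatus values used by the extract*Simple functions
IDLE, RUNNING, HELD = 1, 2, 5


def filter_glideins(q: Dict[Any, dict], predicate: Callable[[int], bool]) -> Dict[Any, dict]:
    """Return the subset of q whose JobStatus satisfies predicate."""
    return {jid: el for jid, el in q.items() if predicate(el["JobStatus"])}


def extract_idle(q: Dict[Any, dict]) -> Dict[Any, dict]:
    return filter_glideins(q, lambda s: s == IDLE)


def extract_running(q: Dict[Any, dict]) -> Dict[Any, dict]:
    return filter_glideins(q, lambda s: s == RUNNING)


def extract_held(q: Dict[Any, dict]) -> Dict[Any, dict]:
    return filter_glideins(q, lambda s: s == HELD)


def extract_non_running(q: Dict[Any, dict]) -> Dict[Any, dict]:
    return filter_glideins(q, lambda s: s != RUNNING)
```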

glideinwms.factory.glideFactoryLib.extractRunStale(q, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.extractStaleSimple(q, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.extractUnrecoverableHeldForceX(q, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.extractUnrecoverableHeldSimple(q, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.getCondorQCredentialList(factoryConfig=None)[source]

Returns a list of all currently used proxies based on the glideins in the queue.

glideinwms.factory.glideFactoryLib.getCondorQData(entry_name, client_name, schedd_name, factoryConfig=None)[source]

Get Condor data, given the glidein name. To be passed to the main functions. If client_name=None, return all clients.

glideinwms.factory.glideFactoryLib.getCondorStatusData(entry_name, client_name, pool_name=None, factory_startd_attribute=None, glidein_startd_attribute=None, entry_startd_attribute=None, client_startd_attribute=None, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.getQClientNames(condorq, factoryConfig=None)[source]

Return a dictionary grouping the condorQ by client_names (factoryConfig.client_schedd_attribute)

Parameters:

condorq – condor_q query object from condorMonitor.CondorQ

Returns:

list of client names (factoryConfig.client_schedd_attribute)

glideinwms.factory.glideFactoryLib.getQCredentials(condorq, client_name, creds, client_sa, cred_secclass_sa, cred_id_sa)[source]

Get the current queue status for a given client and credential (condorQ sub-query). v3 equivalent for getQProxySecClass

Parameters:
  • condorq – condor_q query to select within

  • client_name (param) – client name (e.g. the Frontend requesting the jobs)

  • creds – credential object (to extract sec class and id)

  • client_sa – schedd attribute to compare the client name

  • cred_secclass_sa – schedd attribute to compare the security class

  • cred_id_sa – schedd attribute to compare the credential ID

Returns:

sub-query with only the desired jobs

glideinwms.factory.glideFactoryLib.getQProxSecClass(condorq, client_name, proxy_security_class, client_schedd_attribute=None, credential_secclass_schedd_attribute=None, factoryConfig=None)[source]

Get the current queue status for client and security class.

glideinwms.factory.glideFactoryLib.getQStatus(condorq)[source]

Return a dictionary with detailed_jobStatus/numJobs, where detailed_jobStatus is returned by hash_status. Idle jobs may be 1001, 1002, or 1010 depending on the GridJobStatus. Running jobs may be 2, or 4010 if in stage-out. 1100 is used for unknown GridJobStatus.
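Building a detailed_jobStatus/numJobs dictionary amounts to counting jobs per status code. A minimal sketch with `collections.Counter`, where `status_of` is a hypothetical stand-in for `hash_status` and `q` is assumed to be a plain dictionary of job classads rather than a `CondorQ` object:

```python
from collections import Counter
from typing import Any, Callable, Dict


def count_by_status(q: Dict[Any, dict], status_of: Callable[[dict], int]) -> Dict[int, int]:
    """Return {detailed_status: number_of_jobs} for the Glideins in q."""
    return dict(Counter(status_of(el) for el in q.values()))
```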

glideinwms.factory.glideFactoryLib.getQStatusSF(condorq)[source]

Return a dictionary where keys are GlideinEntrySubmitFile(s) and values are jobStatus/numJobs dicts. NOTE: this does not have the same level of detail as getQStatus, e.g. Idle jobs are not split depending on GridJobStatus.

glideinwms.factory.glideFactoryLib.getQStatusStale(condorq)[source]

Return a dictionary with jobStatus, stale_info/numJobs, where stale_info is 1 if the status information is old

glideinwms.factory.glideFactoryLib.get_status_glideidx(el)[source]
glideinwms.factory.glideFactoryLib.get_submit_environment(entry_name, client_name, submit_credentials, client_web, params, idle_lifetime, log=None, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.group_unclaimed(el_list)[source]
glideinwms.factory.glideFactoryLib.hash_status(el)[source]
glideinwms.factory.glideFactoryLib.hash_statusStale(el)[source]
glideinwms.factory.glideFactoryLib.hrs2sec(hrs)[source]
glideinwms.factory.glideFactoryLib.in_submit_environment(entry_name, exe_env)[source]
glideinwms.factory.glideFactoryLib.isGlideinHeldNTimes(jobInfo, factoryConfig=None, n=20)[source]

This function looks at the glidein job’s information and returns whether the CondorG job has been held for more than N (default 20) iterations.

This is useful to remove unrecoverable Glideins (CondorG jobs) with the forcex option.

Parameters:

jobInfo (dict) – Dictionary containing glidein job’s classad information

Returns:

True if the job is held more than N (default 20) iterations, False otherwise.

Return type:

bool
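A minimal sketch of such a check. It assumes that the “held N times” count comes from the HTCondor `NumSystemHolds` job attribute, which counts how many times HTCondor put the job on hold. The docstring does not confirm this mapping, and a missing attribute is treated here as zero holds.

```python
def is_held_n_times_sketch(job_info: dict, n: int = 20) -> bool:
    """True if the job has been put on hold more than n times.

    Assumes the count is the HTCondor NumSystemHolds job attribute.
    """
    return job_info.get("NumSystemHolds", 0) > n
```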

glideinwms.factory.glideFactoryLib.isGlideinUnrecoverable(jobInfo, factoryConfig=None, glideinDescript=None)[source]

This function looks at the glidein job’s information and returns whether the CondorG job is unrecoverable. Condor hold codes are available at: https://htcondor.readthedocs.io/en/v8_9_4/classad-attributes/job-classad-attributes.html

This is useful to change the status of a glidein (CondorG job) from held to idle.

In 3.6.2 the behavior of the function changed. Instead of keeping a list of unrecoverable codes in the function (which became outdated once gt was deprecated), every code is considered unrecoverable, and operators can specify a list of recoverable codes in the config.

Parameters:

jobInfo (dict) – Dictionary containing glidein job’s classad information

Returns:

True if job is unrecoverable, False if recoverable

Return type:

bool
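The post-3.6.2 rule (every hold code is unrecoverable unless the operator lists it as recoverable) reduces to a membership test on the job's `HoldReasonCode`. This is a minimal sketch. `recoverable_codes` is a hypothetical parameter standing in for the configured list. The real function may also consider other attributes, such as hold subcodes. Treating a job without a hold code as not unrecoverable is this sketch's choice.

```python
from typing import Container


def is_unrecoverable_sketch(job_info: dict, recoverable_codes: Container[int]) -> bool:
    """Every hold code is unrecoverable unless the operator listed it as recoverable."""
    code = job_info.get("HoldReasonCode")
    if code is None:
        return False  # no hold code: nothing to judge
    return code not in recoverable_codes
```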

glideinwms.factory.glideFactoryLib.isGlideinWithinHeldLimits(jobInfo, factoryConfig=None)[source]

This function looks at the glidein job’s information and returns whether the CondorG job can be released.

This is useful to limit how often held jobs are released.

Parameters:

jobInfo (dict) – Dictionary containing glidein job’s classad information

Returns:

True if job is within limits, False if it is not

Return type:

bool

glideinwms.factory.glideFactoryLib.is_str_safe(s)[source]
glideinwms.factory.glideFactoryLib.keepIdleGlideins(client_condorq, client_int_name, req_min_idle, req_max_glideins, idle_lifetime, remove_excess, submit_credentials, glidein_totals, frontend_name, client_web, params, log=None, factoryConfig=None)[source]

Looks at the status of the queue and determines how many glideins to submit. Returns the number of newly submitted glideins.

If the system is unable to submit glideins because it has reached one of the limits (request, entry, frontend:security_class), and the frontend asks for removal (RemoveExcess) in the request, it will try to remove excess glideins.

Parameters:
  • client_condorq (CondorQ) – Condor queue filtered by security class

  • client_int_name (str) – internal representation of the client name

  • req_min_idle (int) – min number of idle glideins needed from the frontend request

  • req_max_glideins (int) – max number of running glideins allowed in the frontend request

  • idle_lifetime (int) – how long to wait before removing glideins that are idle

  • remove_excess (tuple) – remove_excess_str (NO, WAIT, IDLE, ALL), remove_excess_margin

  • submit_credentials (SubmitCredentials) – all the information needed to submit the glideins

  • glidein_totals (GlideinTotals) – entry and frontend glidein counts

  • frontend_name (str) – frontend name, used to map frontend totals in glidein_totals (“frontend:sec_class”)

  • client_web (glideFactoryLib.ClientWeb) – client web values

  • params (dict) – params from the entry configuration or frontend to be passed to the glideins

  • log (logger) – factory logger

  • factoryConfig – factory configuration

Raises:

condorExe.ExeError – in case of issues executing condor commands

glideinwms.factory.glideFactoryLib.logStats(condorq, client_int_name, client_security_name, proxy_security_class, log=None, factoryConfig=None)[source]

Add the current schedd statistics of this entry (from condor_q on the Factory) to the values already stored in factoryConfig.client_stats and factoryConfig.qc_stats

Parameters:
  • condorq – condorQ object, containing a list of all jobs of the schedd (.data) for the entry invoking this

  • client_int_name – client name (from the requestor/Frontend)

  • client_security_name – security name used by the client

  • proxy_security_class – credential security class used by the client

  • log – to log errors/info/…

  • factoryConfig – common data block for the entry to get schedd statistics (client_stats, qc_stats)

glideinwms.factory.glideFactoryLib.logStatsAll(condorq, log=None, factoryConfig=None)[source]

Add the current schedd statistics of this entry (from condor_q on the Factory) to the values already stored in factoryConfig.client_stats and factoryConfig.qc_stats. Do that for all the clients that have jobs on this entry.

Parameters:
  • condorq – condorQ object, containing a list of all jobs of the schedd (.data) for the entry invoking this

  • log – to log errors/info/…

  • factoryConfig – common data block for the entry to get schedd statistics (client_stats, qc_stats)

glideinwms.factory.glideFactoryLib.logWorkRequest(client_int_name, client_security_name, proxy_security_class, req_idle, req_max_run, remove_excess, work_el, fraction=1.0, log=None, factoryConfig=None)[source]

Logs work requests

Parameters:
  • client_int_name – client internal name

  • client_security_name – client security name

  • proxy_security_class – security ID

  • req_idle – requested idle glideins

  • req_max_run – max running glideins

  • remove_excess – tuple, remove_excess_str, remove_excess_margin

  • work_el – Work requests, temporary workaround; the requests should always be processed at the caller level

  • fraction – fraction for this entry

  • log

  • factoryConfig

glideinwms.factory.glideFactoryLib.releaseGlideins(schedd_name, jid_list, log=None, factoryConfig=None)[source]

Release the glideins in the list

We assume that gfactory is a condor superuser or the only user owning jobs (Glideins), and thus does not need identity switching to release jobs.

Parameters:
  • schedd_name

  • jid_list

  • log

  • factoryConfig

Returns:

glideinwms.factory.glideFactoryLib.removeGlideins(schedd_name, jid_list, force=False, log=None, factoryConfig=None)[source]

Remove the Glideins in the list

We assume that gfactory is a condor superuser or the only user owning jobs (Glideins), and thus does not need identity switching to remove jobs.

Parameters:
  • schedd_name (str) – HTCSS schedd name

  • jid_list

  • force

  • log

  • factoryConfig

Returns:

None

glideinwms.factory.glideFactoryLib.sanitizeGlideins(condorq, log=None, factoryConfig=None)[source]
glideinwms.factory.glideFactoryLib.schedd_name2str(schedd_name)[source]
glideinwms.factory.glideFactoryLib.secClass2Name(client_security_name, proxy_security_class)[source]
glideinwms.factory.glideFactoryLib.set_condor_integrity_checks()[source]
glideinwms.factory.glideFactoryLib.submitGlideins(entry_name, client_name, nr_glideins, idle_lifetime, frontend_name, submit_credentials, client_web, params, status_sf, log=None, factoryConfig=None)[source]

Submit the Glideins. Calls executeSubmit to run the HTCSS command.

Parameters:
  • entry_name (str)

  • client_name (str) – client (e.g. Frontend group) name

  • nr_glideins (int)

  • idle_lifetime (int)

  • frontend_name (str)

  • submit_credentials (dict)

  • client_web (str) – None means client did not pass one, backwards compatibility

  • params

  • status_sf (dict) – keys are GlideinEntrySubmitFile(s) and values are jobStatus/numJobs dicts

  • log

  • factoryConfig

glideinwms.factory.glideFactoryLib.sum_idle_count(qc_status)[source]

Add the summary of idle jobs to the statistics passed as input

Parameters:

qc_status – Query count summary with job_status/number_of_jobs

Returns:

Adds qc_status[1] to qc_status
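Since a detailed status code of the form 1xxx implies JobStatus 1 (Idle), the summary `qc_status[1]` can be computed as the sum of all 1xxx counts. This is a minimal sketch under that reading. Overwriting any pre-existing key 1 and treating 1000 to 1999 as the “1xxx” range are assumptions, not confirmed behavior of the real function.

```python
def sum_idle_count_sketch(qc_status: dict) -> None:
    """Set qc_status[1] to the total of all detailed Idle codes (1xxx), in place."""
    # sum() consumes the generator before the assignment, so the dict is not
    # modified while it is being iterated
    qc_status[1] = sum(n for code, n in qc_status.items() if 1000 <= code <= 1999)
```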

glideinwms.factory.glideFactoryLib.update_x509_proxy_file(entry_name, username, client_id, proxy_data, factoryConfig=None)[source]

Create/update the proxy file

It is simply a safe update of the file w/ the new proxy data if different from the current data. The data is written in a binary proxy file. The path of the proxy file is: f”{factoryConfig.client_proxies_base_dir}/user_{username}/glidein_{factoryConfig.glidein_name}/entry_{entry_name}” with name “x509_%s.proxy” % escapeParam(client_id)

Proxy DN and VOMS extension are extracted but never used (?!)

Parameters:
  • entry_name (str) – entry name

  • username (str) – user name (identity of the Frontend)

  • client_id (str) – client ID

  • proxy_data (bytes) – Proxy data in PEM format

  • factoryConfig – Factory configuration

Returns:

Proxy file full path

Return type:

str
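A “safe update” that rewrites the file only when the content differs is commonly implemented by writing to a temporary file in the same directory and renaming it over the target, so readers never see a partially written proxy. The sketch below shows that technique. It is not the Factory's actual code. The helper name `safe_update_file` and the `0o600` permissions are assumptions.

```python
import os
import tempfile
from pathlib import Path


def safe_update_file(path: Path, data: bytes, mode: int = 0o600) -> bool:
    """Atomically replace path with data if the content differs.

    Returns True if the file was (re)written, False if it was already up to date.
    """
    path = Path(path)
    if path.exists() and path.read_bytes() == data:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename over the target
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_name, mode)
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True
```

Following the layout described above, the target would be something like `Path(client_proxies_base_dir) / f"user_{username}" / f"glidein_{glidein_name}" / f"entry_{entry_name}" / f"x509_{escaped_id}.proxy"`, where `escaped_id` is the result of `escapeParam(client_id)`.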

glideinwms.factory.glideFactoryLib.which(program)[source]

Implementation of the which command in Python.

Returns:

Path to the binary

Return type:

str
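A `which` lookup scans the directories in `PATH` and returns the first executable match. The sketch below shows that approach for illustration. Returning `None` when nothing is found is an assumption about the edge case, and in new code the standard library's `shutil.which` provides the same lookup.

```python
import os
from typing import Optional


def which_sketch(program: str) -> Optional[str]:
    """Return the path of an executable, searching PATH like the which command."""
    def is_exe(path: str) -> bool:
        return os.path.isfile(path) and os.access(path, os.X_OK)

    # A name with a directory part is checked directly, without searching PATH
    if os.path.dirname(program):
        return program if is_exe(program) else None
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if is_exe(candidate):
            return candidate
    return None
```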

glideinwms.factory.glideFactoryLogParser module

This module implements classes to track changes in glidein status logs

class glideinwms.factory.glideFactoryLogParser.dirSummarySimple(obj)[source]

Bases: object

Simple dirSummary.

For now it is just a constructor wrapper. Further on, it will need to implement glidein exit code checks.

diff(other)[source]
merge(other)[source]
mkTempLogObj()[source]
class glideinwms.factory.glideFactoryLogParser.dirSummaryTimingsOut(dirname, cache_dir, client_name, user_name, inactive_files=None, inactive_timeout=86400)[source]

Bases: cacheDirClass

This class uses a lambda function to initialize an instance of cacheDirClass. The function chooses all condor_activity files in a directory that correspond to a particular client.

get_simple()[source]
class glideinwms.factory.glideFactoryLogParser.dirSummaryTimingsOutFull(dirname, cache_dir, inactive_files=None, inactive_timeout=86400)[source]

Bases: cacheDirClass

This class uses a lambda function to initialize an instance of cacheDirClass. The function chooses all condor_activity files in a directory regardless of client name.

get_simple()[source]
glideinwms.factory.glideFactoryLogParser.extractLogData(fname)[source]

Given a filename of a job file “path/job.NUMBER.out” extract the statistics of the job duration, etc.

@param fname: Filename to extract
@return: a dictionary with keys:

  • glidein_duration - integer, how long the glidein ran

  • validation_duration - integer, how long before starting Condor

  • condor_started - Boolean, whether Condor started at all (if false, there are no further entries)

  • condor_duration - integer, how long Condor ran

  • stats - dictionary of stats (as in KNOWN_SLOT_STATS), each having:

      ◦ jobsnr - integer, number of jobs started

      ◦ secs - integer, total number of seconds used

For example: {'glidein_duration': 20305, 'validation_duration': 6, 'condor_started': 1, 'condor_duration': 20298, 'stats': {'badSignal': {'secs': 0, 'jobsnr': 0}, 'goodZ': {'secs': 19481, 'jobsnr': 1}, 'Total': {'secs': 19481, 'jobsnr': 1}, 'goodNZ': {'secs': 0, 'jobsnr': 0}, 'badOther': {'secs': 0, 'jobsnr': 0}}}
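The returned dictionary can be post-processed directly. A minimal sketch, using the example values above, that computes the fraction of the glidein lifetime spent running jobs (the 'Total' slot seconds over glidein_duration); the helper name and this "job time fraction" metric are illustrative, not part of glideinwms.

```python
def job_time_fraction(log_data):
    """Fraction of the glidein lifetime spent running jobs (hypothetical metric)."""
    if not log_data.get("condor_started"):
        return 0.0  # Condor never started, so there are no stats entries
    duration = log_data["glidein_duration"]
    if duration <= 0:
        return 0.0
    return log_data["stats"]["Total"]["secs"] / duration


example = {
    "glidein_duration": 20305,
    "validation_duration": 6,
    "condor_started": 1,
    "condor_duration": 20298,
    "stats": {
        "badSignal": {"secs": 0, "jobsnr": 0},
        "goodZ": {"secs": 19481, "jobsnr": 1},
        "Total": {"secs": 19481, "jobsnr": 1},
        "goodNZ": {"secs": 0, "jobsnr": 0},
        "badOther": {"secs": 0, "jobsnr": 0},
    },
}

print(f"{job_time_fraction(example):.3f}")  # prints 0.959
```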

class glideinwms.factory.glideFactoryLogParser.logSummaryTimingsOut(logname, cache_dir, username)[source]

Bases: logSummaryTimings

Class logSummaryTimingsOut logs the timing and status of a job. It declares a job complete only after the output file has been received. The format is slightly different from the one of logSummaryTimings: the dirname is added to the job ID. When an output file is found, a 4th parameter is added to the completed jobs. See extractLogData for more details.

diff(other)[source]

Diff self.data with other for use in comparing current iteration data with previous iteration.

Uses diff_raw to perform a symmetric difference of self.data and other, and puts the result into data[status]['Entered'|'Exited']. Completed jobs are augmented with data from the log.

@return: data[status]['Entered'|'Exited'] - list of jobs

diff_raw(other)[source]

Diff self.data with other info and add glidein log data to Entered/Exited. Used to compare the current data with the previous iteration.

Uses the symmetric difference of sets to compare the two dictionaries.

@type other: dictionary of statuses -> jobs
@return: data[status]['Entered'|'Exited'] - list of jobs
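A minimal sketch of the Entered/Exited comparison described above, assuming each iteration is represented as a dictionary mapping a status name to a list of job IDs. Jobs present now but not before have "Entered" the status, jobs present before but not now have "Exited"; together the two sets form the symmetric difference. The function name is hypothetical, and the log-data augmentation is not shown.

```python
def diff_statuses(current, previous):
    """Return {status: {"Entered": [...], "Exited": [...]}} comparing two iterations."""
    result = {}
    for status in set(current) | set(previous):
        now = set(current.get(status, ()))
        before = set(previous.get(status, ()))
        result[status] = {
            "Entered": sorted(now - before),  # newly in this status
            "Exited": sorted(before - now),  # no longer in this status
        }
    return result


previous = {"Idle": ["10.0", "11.0"], "Running": ["9.0"]}
current = {"Idle": ["11.0"], "Running": ["10.0"], "Completed": ["9.0"]}
d = diff_statuses(current, previous)
print(d["Completed"])  # {'Entered': ['9.0'], 'Exited': []}
```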

loadFromLog()[source]

logSummaryTimingsOut inherits from cachedLogClass, so load() first checks the cached files and calls this function only if they changed. It uses the condorLogParser to load the log, then does some post-processing: it checks the job.NUMBER.out files to see whether the job has finished and to extract some data.

class glideinwms.factory.glideFactoryLogParser.logSummaryTimingsOutWrapper[source]

Bases: object

getObj(logname=None, cache_dir=None, username='all')[source]

glideinwms.factory.glideFactoryMonitorAggregator module

This module implements the functions needed to aggregate the monitoring of the glidein factory.

class glideinwms.factory.glideFactoryMonitorAggregator.MonitorAggregatorConfig[source]

Bases: object

config_factory(monitor_dir, entries, log)[source]
glideinwms.factory.glideFactoryMonitorAggregator.aggregateJobsSummary()[source]

Loads the job summary pickle files for each entry, aggregates them per schedd/collector pair, and returns them.

:return: A dictionary containing the needed information that looks like:

{
    ('schedd_name', 'collector_name'): {
        '2994.000': {'condor_duration': 1328, 'glidein_duration': 1334, 'condor_started': 1, 'numjobs': 0},
        '2997.000': {'condor_duration': 1328, 'glidein_duration': 1334, 'condor_started': 1, 'numjobs': 0},
        …
    },
    ('schedd_name', 'collector_name'): {
        '2003.000': {'condor_duration': 1328, 'glidein_duration': 1334, 'condor_started': 1, 'numjobs': 0},
        '206.000': {'condor_duration': 1328, 'glidein_duration': 1334, 'condor_started': 1, 'numjobs': 0},
        …
    }
}
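A minimal sketch of the per-pair aggregation, assuming each per-entry summary is shaped like the dictionary documented for condorLogSummary.write_job_info (keys schedd_name, collector_name, joblist). Loading the pickle files is omitted, and the function name is hypothetical.

```python
def aggregate_job_summaries(entry_summaries):
    """Merge per-entry job lists into {(schedd_name, collector_name): {job_id: info}}."""
    aggregated = {}
    for summary in entry_summaries:
        key = (summary["schedd_name"], summary["collector_name"])
        # Entries sharing a schedd/collector pair contribute to the same job dictionary
        aggregated.setdefault(key, {}).update(summary["joblist"])
    return aggregated


info = {"condor_duration": 1328, "glidein_duration": 1334, "condor_started": 1, "numjobs": 0}
entries = [
    {"schedd_name": "schedd_a", "collector_name": "coll_1", "joblist": {"2994.000": info}},
    {"schedd_name": "schedd_a", "collector_name": "coll_1", "joblist": {"2997.000": info}},
    {"schedd_name": "schedd_b", "collector_name": "coll_1", "joblist": {"2003.000": info}},
]
result = aggregate_job_summaries(entries)
print(sorted(result))  # [('schedd_a', 'coll_1'), ('schedd_b', 'coll_1')]
```

Keying by the pair keeps job IDs from different schedds from colliding.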

glideinwms.factory.glideFactoryMonitorAggregator.aggregateLogSummary()[source]

Create an aggregate of log summary files, write it in an aggregate log summary file and in the end return the values

glideinwms.factory.glideFactoryMonitorAggregator.aggregateRRDStats(log=None)[source]

Create an aggregate of RRD stats and write it to files

Parameters:

log (logging.Logger) – logger to use

glideinwms.factory.glideFactoryMonitorAggregator.aggregateStatus(in_downtime)[source]

Create an aggregate of status files, write it in an aggregate status file and in the end return the values

@type in_downtime: boolean @param in_downtime: Entry downtime information

@rtype: dict @return: Dictionary of status information

glideinwms.factory.glideFactoryMonitorAggregator.rrd_site(name)[source]
glideinwms.factory.glideFactoryMonitorAggregator.sumDictInt(indict, outdict)[source]
glideinwms.factory.glideFactoryMonitorAggregator.verifyRRD(fix_rrd=False, backup=True)[source]

Go through all known monitoring RRDs and verify that they match the existing schema (they could differ if an upgrade happened). If fix_rrd is True, also attempt to add any missing attributes.

Parameters:
  • fix_rrd (bool) – if True, will attempt to add missing attrs

  • backup (bool) – if True, backup the old RRD before fixing

Returns:

True if all OK, False if there is a problem w/ RRD files

Return type:

bool

glideinwms.factory.glideFactoryMonitorAggregator.writeLogSummaryRRDs(fe_dir, status_el)[source]

glideinwms.factory.glideFactoryMonitoring module

This module implements the functions needed to monitor the glidein factory

class glideinwms.factory.glideFactoryMonitoring.Descript2XML(log=None)[source]

Bases: object

Create an XML file out of glidein.descript, frontend.descript, entry.descript, attributes.cfg, and params.cfg. The file created is descript.xml, with glideFactoryDescript and glideFactoryEntryDescript elements. TODO: The XML is used by … "the monitoring page"?

entryDescript(e_dict)[source]
frontendDescript(fe_dict)[source]
getUpdated()[source]

Return the time of the last update.

glideinDescript(g_dict)[source]
writeFile(path, xml_str, singleEntry=False)[source]
class glideinwms.factory.glideFactoryMonitoring.FactoryStatusData(log=None, base_dir=None)[source]

Bases: object

this class handles the data obtained from the rrd files

average(input_list)[source]
fetchData(rrd_file, pathway, res, start, end)[source]

Uses rrdtool to fetch data from the clients. Returns a dictionary of lists of data, with one list for each element.

rrdtool fetch returns a tuple of three elements: a[0], a[1], and a[2]. a[0] holds the start time, end time, and resolution (step), which can be specified as arguments of fetchData. a[1] holds the names of the datasets; these names are used as the keys. a[2] is a list of tuples, each containing the data from every dataset; there is one tuple for each time the data was collected.
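A minimal sketch of turning an rrdtool fetch result into one list per dataset, as described above. It runs on a hand-built tuple with the same shape, ((start, end, step), ds_names, rows), rather than on a real RRD file; the function name and the sample values are hypothetical.

```python
def fetch_result_to_dict(fetch_result):
    """Convert an rrdtool fetch result into {ds_name: [values in time order]}."""
    _time_info, ds_names, rows = fetch_result  # (start, end, step) is not needed here
    data = {name: [] for name in ds_names}
    for row in rows:  # one tuple per collection time
        for name, value in zip(ds_names, row):
            data[name].append(value)  # unknown values stay None
    return data


sample = ((1000, 1900, 300), ("Idle", "Running"), [(4.0, 10.0), (3.0, 11.0), (None, 12.0)])
print(fetch_result_to_dict(sample))
# {'Idle': [4.0, 3.0, None], 'Running': [10.0, 11.0, 12.0]}
```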

getData(input_val, monitoringConfig=None)[source]

Return the data fetched by rrdtool as a dictionary

This also modifies the RRD data dictionary for the client (input_val) in all RRD files and appends the client to the list of frontends.

Where this side effect is used:

  • totals are updated in Entry.writeStats (writing the XML)

  • frontend data in check_and_perform_work

getUpdated()[source]

Return the time of the last update.

getXMLData(rrd)[source]

Return an XML-formatted string for the specific RRD file, with the data fetched from a given site (all clients + total).

This also has side effects through the getData(self.total) invocation:

  • it modifies the RRD data dictionary (all RRDs) for the total of this entry

  • it appends the total (self.total aka 'total/') to the list of clients (frontends)

@param rrd: @return: XML formatted string with stats data

writeFiles(monitoringConfig=None)[source]

Write an XML file for the data fetched from a given site, and write the RRD files.

NOTE: writeFiles triggers the side effect of updating the rrd for totals (via getXMLData/getData)

@param monitoringConfig: @return: None

class glideinwms.factory.glideFactoryMonitoring.MonitoringConfig(log=None)[source]

Bases: object

config_log(log_dir, max_days, min_days, max_mbs)[source]
establish_dir(relative_dname)[source]
logCompleted(client_name, entered_dict)[source]

This function takes all newly completed glideins and logs them in logs/entry_Name/completed_jobs_date.log in an XML-like format.

It counts the jobs completed on a glidein but does not keep track of the cores received or used by the jobs.

@type client_name: String
@param client_name: the name of the frontend client
@type entered_dict: Dictionary of dictionaries
@param entered_dict: This is the dictionary of all jobs that have "Entered" the "Completed" state. It is indexed by job_id. Each value is an info dictionary containing the keys: username, jobs_duration (subkeys: total, goodput, terminated), wastemill (subkeys: validation, idle, nosuccess, badput), duration, condor_started, condor_duration, jobsnr

rrd_obj

The name of the attribute that identifies the glidein

write_completed_json(relative_fname, time, val_dict)[source]

Write val_dict to a JSON file, creating it if needed

Parameters:
  • relative_fname – location of the JSON file relative to self.monitor_dir

  • time – typically self.updated

  • val_dict – dictionary object to be dumped to the file

write_file(relative_fname, output_str)[source]

Write out a string or bytes to a file

Parameters:
  • relative_fname (AnyStr) – The relative path name to write out

  • output_str (AnyStr) – the string (unicode str or bytes) to write to the file
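A minimal sketch of writing either text or bytes, assuming the file mode is chosen from the type of output_str and that missing parent directories are created. The standalone function and its base_dir argument are hypothetical stand-ins for the method and the monitoring directory.

```python
import os


def write_file_sketch(base_dir, relative_fname, output_str):
    """Write str (text mode) or bytes (binary mode) to base_dir/relative_fname."""
    fname = os.path.join(base_dir, relative_fname)
    dirname = os.path.dirname(fname)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    if isinstance(output_str, (bytes, bytearray)):
        with open(fname, "wb") as fd:  # bytes are written unchanged
            fd.write(output_str)
    else:
        with open(fname, "w", encoding="utf-8") as fd:  # str is encoded as UTF-8
            fd.write(output_str)
    return fname
```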

write_rrd_multi(relative_fname, ds_type, time, val_dict, min_val=None, max_val=None)[source]

Create an RRD file using rrdtool.

write_rrd_multi_hetero(relative_fname, ds_desc_dict, time, val_dict)[source]

Create an RRD file using rrdtool. Like write_rrd_multi, but each ds has its own specified type. Each element of ds_desc_dict is a dictionary with any of ds_type, min, max. If ds_desc_dict[name] is not present, the defaults are {'ds_type': 'GAUGE', 'min': 'U', 'max': 'U'}.
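rrdtool create takes one DS:name:DST:heartbeat:min:max argument per data source. A minimal sketch of building those arguments from ds_desc_dict with the defaults listed above, assuming that a partial description falls back to the defaults for its missing fields. The heartbeat value is an assumption (it is not documented here), and the function name is hypothetical.

```python
DEFAULT_DS_DESC = {"ds_type": "GAUGE", "min": "U", "max": "U"}


def build_ds_definitions(ds_names, ds_desc_dict, heartbeat=1800):
    """Return rrdtool 'DS:...' strings, filling unspecified fields with the defaults."""
    definitions = []
    for name in ds_names:
        # A description may set any of ds_type, min, max; the rest use the defaults
        desc = {**DEFAULT_DS_DESC, **ds_desc_dict.get(name, {})}
        definitions.append(f"DS:{name}:{desc['ds_type']}:{heartbeat}:{desc['min']}:{desc['max']}")
    return definitions


print(build_ds_definitions(["Idle", "Completed"], {"Completed": {"ds_type": "ABSOLUTE", "min": 0}}))
# ['DS:Idle:GAUGE:1800:U:U', 'DS:Completed:ABSOLUTE:1800:0:U']
```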

class glideinwms.factory.glideFactoryMonitoring.condorLogSummary(log=None)[source]

Bases: object

This class handles the data obtained from parsing the glidein log files

aggregate_frontend_data(updated, diff_summary)[source]

This goes into each frontend in the current entry and aggregates the completed/stats/wastetime data into completed_data.json at the entry level

computeDiff()[source]

This function takes the current_stats_data from the current iteration and the old_stats_data from the last iteration (see reset() function) to create a diff of the data in the stats_diff dictionary.

This stats_diff will be a dictionary with two entries for each status, "Entered" and "Exited", denoting which job IDs have recently changed status, i.e. stats_diff[frontend][username:client_int_name]["Completed"]["Entered"]

diffTimes(end_time, start_time)[source]
get_completed_stats(entered_list)[source]
get_data_summary()[source]

Summarizes stats_diff data (computeDiff should have already been called). Sums over username in the dictionary stats_diff[frontend][username][entered/exited][status] to make stats_data[client_name][entered/exited][status]=count

@return: dictionary[client_name][entered/exited][status]=count

get_diff_summary()[source]

Flattens stats_diff differential data.

@return: Dictionary of client_name with sub_keys Wait,Idle,Running,Held,Completed,Removed

get_diff_total()[source]
get_stats_data_summary()[source]

Summarizes current_stats_data: Adds up current_stats_data[frontend][user:client][status] across all username keys.

@return: returns dictionary stats_data[frontend][status]=count
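A minimal sketch of that summation, assuming current_stats_data maps each frontend to "username:client_int_name" keys whose values map a status to an integer count. The function name and the sample data are hypothetical.

```python
def sum_over_users(stats_data):
    """Collapse {frontend: {"user:client": {status: count}}} into {frontend: {status: count}}."""
    summary = {}
    for frontend, per_user in stats_data.items():
        totals = summary.setdefault(frontend, {})
        for status_counts in per_user.values():  # drop the username:client level
            for status, count in status_counts.items():
                totals[status] = totals.get(status, 0) + count
    return summary


data = {
    "fe1": {
        "alice:group1": {"Idle": 2, "Running": 5},
        "bob:group1": {"Idle": 1, "Held": 1},
    }
}
print(sum_over_users(data))  # {'fe1': {'Idle': 3, 'Running': 5, 'Held': 1}}
```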

get_stats_total()[source]

@return: Dictionary with keys (wait,idle,running,held)

get_stats_total_summary()[source]
get_total_summary()[source]
get_xml_data(indent_tab='   ', leading_tab='')[source]
get_xml_stats_data(indent_tab='   ', leading_tab='')[source]
get_xml_stats_total(indent_tab='   ', leading_tab='')[source]
get_xml_total(indent_tab='   ', leading_tab='')[source]
get_xml_updated(indent_tab='   ', leading_tab='')[source]
logSummary(client_name, stats)[source]

log_stats taken during an iteration of perform_work are added/merged into the condorLogSummary class here.

@type stats: dictionary of glideFactoryLogParser.dirSummaryTimingsOut
@param stats: Dictionary keyed by "username:client_int_name". client_int_name is needed for frontends with multiple groups.

reset()[source]

Replaces old_stats_data with current_stats_data and sets current_stats_data to empty. This is called every iteration, so that the previous and current iterations can later be compared to find any newly changed jobs (i.e. newly completed jobs).

summarize_completed_stats(entered_list)[source]
write_file(monitoringConfig=None)[source]
write_job_info(scheddName, collectorName)[source]

The method iterates over the stats_diff dictionary looking for completed jobs and then fills out a dictionary that contains the monitoring information needed for each job. This information looks like:

{
    'schedd_name': 'name',
    'collector_name': 'name',
    'joblist': {
        '2994.000': {'condor_duration': 1328, 'glidein_duration': 1334, 'condor_started': 1, 'numjobs': 0},
        '2997.000': {'condor_duration': 1328, 'glidein_duration': 1334, 'condor_started': 1, 'numjobs': 0},
        …
    }
}

Parameters:
  • scheddName – The schedd name to update the job

  • collectorName – The collector name to update the job

class glideinwms.factory.glideFactoryMonitoring.condorQStats(log=None, cores=1)[source]

Bases: object

aggregateStates(qc_status, el)[source]

For each status in the condor_q count status dictionary (qc_status) add the count to the el dictionary (whose keys are state like ‘Idle’ instead of its number: 1)
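A minimal sketch of translating numeric HTCondor JobStatus codes into state names while summing counts. The code-to-name table follows HTCondor's JobStatus values; which states glideinwms actually tracks, and how it handles any extended codes, is not documented here, so this sketch simply skips unknown codes. The names are hypothetical.

```python
# HTCondor JobStatus codes (HTCondor spells 6 as "Transferring Output")
JOB_STATUS_NAMES = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "TransferringOutput",
    7: "Suspended",
}


def aggregate_states(qc_status, el):
    """Add each numeric-status count in qc_status to el under its state name."""
    for status_code, count in qc_status.items():
        state = JOB_STATUS_NAMES.get(status_code)
        if state is None:
            continue  # code not handled by this sketch
        el[state] = el.get(state, 0) + count
    return el


el = {"Idle": 1}
aggregate_states({1: 4, 2: 7, 5: 1}, el)
print(el)  # {'Idle': 5, 'Running': 7, 'Held': 1}
```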

finalizeClientMonitor()[source]
static getEntryFromSubmitFile(submitFile)[source]

Extract the entry name from submit files that look like: ‘entry_T2_CH_CERN/job.CMSHTPC_T2_CH_CERN_ce301.condor’

get_data()[source]
get_total(history={'set_to_zero': False})[source]
static get_xml_data(data, indent_tab='   ', leading_tab='')[source]

Return a string with the XML formatted statistic data
@param data: self.get_data()
@param indent_tab: indentation space
@param leading_tab: leading space
@return: XML string

get_xml_downtime(leading_tab='   ')[source]
static get_xml_total(total, indent_tab='   ', leading_tab='')[source]

Return formatted XML for the total statistics
@param total: self.get_total()
@param indent_tab: indentation space
@param leading_tab: leading space
@return: XML string

get_xml_updated(indent_tab='   ', leading_tab='')[source]
get_zero_data_element()[source]

Return a dictionary with the keys defined in self.attributes, and all values to 0

Returns:

data element w/ all 0 values

logClientMonitor(client_name, client_monitor, client_internals, fraction=1.0)[source]

client_monitor is a dictionary of monitoring info (GlideinMonitor… from the glideclient ClassAd). client_internals is a dictionary of internals (from the glideclient ClassAd). If fraction is specified, it will be used to extract partial info.

At the moment, it looks only for

'Idle', 'Running', 'RunningHere', 'GlideinsIdle', 'GlideinsIdleCores', 'GlideinsRunning', 'GlideinsRunningCores', 'GlideinsTotal', 'GlideinsTotalCores', 'LastHeardFrom'

updates go in self.data (self.data[client_name][‘ClientMonitor’])

logRequest(client_name, requests)[source]

requests is a dictionary of requests; params is a dictionary of parameters.

At the moment, it looks only for

'IdleGlideins', 'MaxGlideins'

The request contains only these values (no real cores information); it is evaluated using GLIDEIN_CPUS.

logSchedd(client_name, qc_status, qc_status_sf)[source]

Create or update a dictionary with aggregated HTCondor stats

client_name is the client requesting the glideins
qc_status is a dictionary of condor_status:nr_jobs
qc_status_sf is a dictionary of submit_file:qc_status

OUTPUT: self.data[client_name]['Status'] is the status for all Glideins

self.data[client_name][‘StatusEntries’] is the Glidein status by Entry

set_downtime(in_downtime)[source]
write_file(monitoringConfig=None, alt_stats=None)[source]

Calculate a summary for the entry and write statistics to files
@param monitoringConfig: used to pass information from the Entry
@param alt_stats: an alternative condorQStats object to use if self has no data
@return:

glideinwms.factory.glideFactoryMonitoring.getAllJobRanges()[source]
glideinwms.factory.glideFactoryMonitoring.getAllJobTypes()[source]
glideinwms.factory.glideFactoryMonitoring.getAllMillRanges()[source]
glideinwms.factory.glideFactoryMonitoring.getAllTimeRanges()[source]
glideinwms.factory.glideFactoryMonitoring.getJobRange(absval)[source]
glideinwms.factory.glideFactoryMonitoring.getLogCompletedDefaults()[source]
glideinwms.factory.glideFactoryMonitoring.getMillRange(absval)[source]
glideinwms.factory.glideFactoryMonitoring.getTimeRange(absval)[source]
glideinwms.factory.glideFactoryMonitoring.get_completed_stats_xml_desc()[source]

glideinwms.factory.glideFactoryPidLib module

class glideinwms.factory.glideFactoryPidLib.EntryGroupPidSupport(startup_dir, group_name)[source]

Bases: PidWParentSupport

class glideinwms.factory.glideFactoryPidLib.EntryPidSupport(startup_dir, entry_name)[source]

Bases: PidWParentSupport

class glideinwms.factory.glideFactoryPidLib.FactoryPidSupport(startup_dir)[source]

Bases: PidSupport

glideinwms.factory.glideFactoryPidLib.get_entry_pid(startup_dir, entry_name)[source]
glideinwms.factory.glideFactoryPidLib.get_entrygroup_pid(startup_dir, group_name)[source]
glideinwms.factory.glideFactoryPidLib.get_factory_pid(startup_dir)[source]

glideinwms.factory.glideFactorySelectionAlgorithms module

glideinwms.factory.glideFactorySelectionAlgorithms.selectionAlgoDefault(submit_files, status_sf, jobDescript, nr_glideins, log)[source]

Given the list of sub entries (aka submit files) and the status of each sub entry (how many idle + running, etc.), figures out how many glideins to submit for each sub entry:

1) Shuffle the submit_files list
2) Try to "depth-wise" fill all the sub entries until limits are reached

@type submit_files: list
@param submit_files: list of strings containing the name of the submit files for this entry set
@type status_sf: dict
@param status_sf: dictionary where the keys are the submit files and the values are condor state dicts
@type jobDescript: object
@param jobDescript: where the maximum number of idle/total glideins for each sub entry is read
@type nr_glideins: int
@param nr_glideins: total number of glideins to submit to all the entries
@type log: object
@param log: logging object

Return a dictionary where keys are the submit files, and values are int indicating how many glideins to submit
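A minimal sketch of the two steps above, under stated assumptions: "depth-wise" is read as filling one sub entry up to its limits before moving to the next, each sub entry's status is reduced to its current idle and total glidein counts, and the per-sub-entry limits are passed as plain integers instead of being read from jobDescript. The names select_submissions, max_idle, and max_total are hypothetical.

```python
import random


def select_submissions(submit_files, status_sf, nr_glideins, max_idle, max_total, rng=None):
    """Return {submit_file: number_of_glideins_to_submit}.

    status_sf maps each submit file to {"idle": int, "total": int} (a simplification).
    """
    rng = rng or random.Random()
    order = list(submit_files)
    rng.shuffle(order)  # 1) randomize which sub entry is filled first

    remaining = nr_glideins
    to_submit = {sf: 0 for sf in submit_files}
    for sf in order:  # 2) fill each sub entry up to its limits before moving on
        if remaining <= 0:
            break
        status = status_sf.get(sf, {"idle": 0, "total": 0})
        room = min(max_idle - status["idle"], max_total - status["total"])
        n = max(0, min(room, remaining))
        to_submit[sf] = n
        remaining -= n
    return to_submit


status = {"job.a.condor": {"idle": 8, "total": 20}, "job.b.condor": {"idle": 0, "total": 5}}
plan = select_submissions(list(status), status, nr_glideins=6, max_idle=10, max_total=100,
                          rng=random.Random(1))
print(sum(plan.values()))  # 6
```

Shuffling first spreads load across sub entries over successive cycles, while the depth-wise fill keeps each cycle's submissions concentrated on as few sub entries as possible.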

glideinwms.factory.manageFactoryDowntimes module

glideinwms.factory.manageFactoryDowntimes.add(entry_name, opt_dict)[source]
glideinwms.factory.manageFactoryDowntimes.check(entry_or_id, opt_dict)[source]
glideinwms.factory.manageFactoryDowntimes.delay2time(delayStr)[source]
glideinwms.factory.manageFactoryDowntimes.down(entry_name, opt_dict)[source]
glideinwms.factory.manageFactoryDowntimes.get_args(argv)[source]
glideinwms.factory.manageFactoryDowntimes.get_downtime_fd(entry_name, cmdname)[source]
glideinwms.factory.manageFactoryDowntimes.get_downtime_fd_dict(entry_or_id, cmdname, opt_dict)[source]
glideinwms.factory.manageFactoryDowntimes.get_entries(factory_dir)[source]
glideinwms.factory.manageFactoryDowntimes.get_frontends(factory_dir)[source]
glideinwms.factory.manageFactoryDowntimes.get_production_ress_entries(server, ref_dict_list)[source]
glideinwms.factory.manageFactoryDowntimes.get_security_classes(factory_dir)[source]
glideinwms.factory.manageFactoryDowntimes.infosys_based(entry_name, opt_dict, infosys_types)[source]
glideinwms.factory.manageFactoryDowntimes.main(argv)[source]
glideinwms.factory.manageFactoryDowntimes.printtimes(entry_or_id, opt_dict)[source]
glideinwms.factory.manageFactoryDowntimes.str2time(timeStr)[source]
glideinwms.factory.manageFactoryDowntimes.strtxt2time(timeStr)[source]
glideinwms.factory.manageFactoryDowntimes.up(entry_name, opt_dict)[source]
glideinwms.factory.manageFactoryDowntimes.usage()[source]
glideinwms.factory.manageFactoryDowntimes.vacuum(entry_or_id, opt_dict)[source]

glideinwms.factory.stopFactory module

glideinwms.factory.stopFactory.all_pids_in_pgid_dead(pgid)[source]
glideinwms.factory.stopFactory.kill_and_check_pgid(pgid, signr=Signals.SIGTERM, retries=100, retry_interval=0.5)[source]
glideinwms.factory.stopFactory.main(startup_dir, force=True)[source]

Module contents