Submission utility
To ease the transition to the new cluster and the general use of HTCondor, we implemented a solution based on environment modules. The traditional interaction methods, i.e. specifying all the command line options by hand, remain valid, but are less handy and more verbose.
The "htc" modules will set all the environment variables needed to correctly submit to both the old and the new HTCondor clusters.
Once logged into any Tier 1 user interface, this utility will be available. You can list all the available modules using:
```
apascolinit1@ui-tier1 ~ $ module avail
-------------------- /opt/exp_software/opssw/modules/modulefiles --------------------
htc/auth  htc/ce  htc/local  use.own

Key:
modulepath  default-version
```
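As a quick sanity check, you can diff the environment before and after loading a module to see exactly what it exports. This is a minimal sketch; the precise variable names set by the htc modules (e.g. _CONDOR_SCHEDD_HOST) are an assumption here, and "module show htc" prints the authoritative list:

```bash
# Sketch: see what "module load htc" adds to the environment.
# The exact variable names (e.g. _CONDOR_SCHEDD_HOST) are an assumption;
# "module show htc" prints what the module file actually sets.
env | sort > /tmp/env.before
module load htc
env | sort > /tmp/env.after
diff /tmp/env.before /tmp/env.after    # ">" lines were added by the module
```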
These htc/* modules have different roles:
- htc/local - to be used when you want to submit jobs to, or query, the local schedds sn-02 or sn01-htc, the access points of the HTCondor 9.0 and 23 clusters respectively. This is the default module loaded when loading the "htc" family.
| variable | values | description |
|----------|--------|-------------|
| ver | 9 | selects the old HTCondor cluster and local schedd (sn-02) |
| | 23 | selects the new HTCondor cluster and local schedd (sn01-htc) |

Usage of the htc/local module:

```
apascolinit1@ui-tier1 ~ $ module switch htc ver=9
apascolinit1@ui-tier1 ~ $ condor_q

-- Schedd: sn-02.cr.cnaf.infn.it : <131.154.192.42:9618?... @ 04/17/24 14:58:44
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for apascolinit1: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 50164 jobs; 30960 completed, 1 removed, 12716 idle, 4514 running, 1973 held, 0 suspended

apascolinit1@ui-tier1 ~ $ module switch htc ver=23
apascolinit1@ui-tier1 ~ $ condor_q

-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.242:9618?... @ 04/17/24 14:58:52
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for apascolinit1: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
```
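With the ver variant it is easy to check both clusters from the same shell, for instance to see where your jobs are. A sketch built only on the commands shown above:

```bash
# Query the old (ver=9) and new (ver=23) cluster schedds in turn.
for v in 9 23; do
    module switch htc ver=$v
    echo "=== cluster ver=$v ==="
    condor_q -submitter "$USER"    # list only your own jobs
done
```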
- htc/ce - eases the use of the condor_q and condor_submit commands by setting up all the variables needed to contact our Grid compute entrypoints.
| variable | values | description |
|----------|--------|-------------|
| num | 1,2,3,4 | connects to ce{num}-htc (new cluster) |
| | 5,6,7 | connects to ce{num}-htc (old cluster) |
| auth | GSI,SSL,SCITOKENS | calls htc/auth with the selected auth method |

Usage of the htc/ce module:

```
apascolinit1@ui-tier1 ~ $ condor_q
Error: ......

apascolinit1@ui-tier1 ~ $ module switch htc/ce auth=SCITOKENS num=2
Don't forget to "export BEARER_TOKEN=$(oidc-token <client-name>)"!
Switching from htc/ce{auth=SCITOKENS:num=2} to htc/ce{auth=SCITOKENS:num=2}
Loading requirement: htc/auth{auth=SCITOKENS}

apascolinit1@ui-tier1 ~ $ export BEARER_TOKEN=$(oidc-token htc23)
apascolinit1@ui-tier1 ~ $ condor_q

-- Schedd: ce02-htc.cr.cnaf.infn.it : <131.154.192.41:9619?... @ 04/17/24 15:48:24
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS
..........
..........
..........
```
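Since forgetting the BEARER_TOKEN export is a common slip, the two steps can be wrapped in a small helper function. This is a hypothetical convenience, assuming an oidc-agent client named htc23 as registered in the token section below:

```bash
# Hypothetical helper: select a CE and refresh the bearer token in one go.
use_ce() {
    module switch htc/ce auth=SCITOKENS num="$1"
    export BEARER_TOKEN=$(oidc-token htc23)   # client name is an assumption
}

use_ce 2     # now condor_q/condor_submit talk to ce02-htc
```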
All modules in the "htc" family provide on-line help via the "module help <module name>" command, e.g.:
```
budda@ui-tier1:~ $ module help htc
-------------------------------------------------------------------
Module Specific Help for /opt/exp_software/opssw/modules/modulefiles/htc/local:

Defines environment variables and aliases to ease the interaction
with the INFN-T1 HTCondor local job submission system
-------------------------------------------------------------------
```
Local Submission
Submitting local jobs works the same way as on HTCondor 9 from the Jobs UI.
- Submitting a job to the cluster. Executable and submit file:
```
apascolinit1@ui-tier1 ~ $ cat sleep.sh
#!/bin/env bash
sleep $1

apascolinit1@ui-tier1 ~ $ cat submit.sub
# Unix submit description file
# submit.sub -- simple sleep job
batch_name              = Local-Sleep
executable              = sleep.sh
arguments               = 3600
log                     = $(batch_name).log.$(Process)
output                  = $(batch_name).out.$(Process)
error                   = $(batch_name).err.$(Process)
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue
```
Submission and control of job status:

```
apascolinit1@ui-tier1 ~ $ module switch htc ver=23
apascolinit1@ui-tier1 ~ $ condor_submit submit.sub
Submitting job(s).
1 job(s) submitted to cluster 15.

apascolinit1@ui-tier1 ~ $ condor_q

-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.242:9618?... @ 03/18/24 17:15:44
OWNER        BATCH_NAME  SUBMITTED   DONE RUN IDLE TOTAL JOB_IDS
apascolinit1 Local-Sleep 3/18 17:15  _    1   _    1     15.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for apascolinit1: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
```
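Once the job is queued, the standard HTCondor tools apply: for example, condor_wait blocks on the job's log file until it completes, and condor_rm removes it from the queue (cluster id 15 as in the transcript above):

```bash
# Wait up to one hour for job 15.0, following the log file named after the
# submit file's $(batch_name).log.$(Process) pattern.
condor_wait -wait 3600 Local-Sleep.log.0 15.0

# Remove the job if it is no longer needed.
condor_rm 15.0
```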
Grid Submission
Grid submission on ce01-htc is nearly the same as submission to the old cluster. You can use two authentication methods: tokens (SCITOKENS) and SSL (x509 VOMS proxy).
Token submission
The steps are identical to those for the HTCondor 9 cluster:
- Register a client (or load an already registered one). Register a new client:
```
apascolinit1@ui-tier1 ~ $ eval `oidc-agent-service use`
23025
apascolinit1@ui-tier1 ~ $ oidc-gen -w device
Enter short name for the account to configure: htc23
[1] https://iam-t1-computing.cloud.cnaf.infn.it/
...
...
Issuer [https://iam-t1-computing.cloud.cnaf.infn.it/]: <enter>
The following scopes are supported: openid profile email address phone offline_access eduperson_scoped_affiliation eduperson_entitlement eduperson_assurance entitlements
Scopes or 'max' (space separated) [openid profile offline_access]: profile wlcg.groups wlcg compute.create compute.modify compute.read compute.cancel
Registering Client ...
Generating account configuration ...
accepted
Using a browser on any device, visit:
https://iam-t1-computing.cloud.cnaf.infn.it/device
And enter the code: REDACTED
...
...
...
Enter encryption password for account configuration 'htc23': <passwd>
Confirm encryption Password: <passwd>
Everything setup correctly!
```
- Get a token for submission
```
apascolinit1@ui-tier1 ~ $ oidc-add htc23
Enter decryption password for account config 'htc23': <passwd>
success
apascolinit1@ui-tier1 ~ $ umask 0077 ; oidc-token htc23 > ${HOME}/token
```
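Access tokens are short-lived, so it is good practice to refresh the token file right before each submission. A minimal sketch:

```bash
# Refresh the token file with owner-only permissions; oidc-token returns a
# fresh access token as long as the "htc23" account is loaded in oidc-agent.
umask 0077
oidc-token htc23 > "${HOME}/token" || echo "token refresh failed" >&2
```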
- Submit a test job. Submit file:
```
apascolinit1@ui-tier1 ~ $ cat submit_token.sub
# Unix submit description file
# submit_token.sub -- simple sleep job
scitokens_file          = $ENV(HOME)/token
+owner                  = undefined
batch_name              = Grid-Token-Sleep
executable              = sleep.sh
arguments               = 3600
log                     = $(batch_name).log.$(Process)
output                  = $(batch_name).out.$(Process)
error                   = $(batch_name).err.$(Process)
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue
```
Job submission with a token:

```
apascolinit1@ui-tier1 ~ $ module switch htc/ce auth=SCITOKENS num=1
Don't forget to "export BEARER_TOKEN=$(oidc-token <client-name>)"!
apascolinit1@ui-tier1 ~ $ export BEARER_TOKEN=$(oidc-token htc23)
apascolinit1@ui-tier1 ~ $ condor_submit submit_token.sub
Submitting job(s).
1 job(s) submitted to cluster 35.

apascolinit1@ui-tier1 ~ $ condor_q

-- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:35:43
OWNER        BATCH_NAME       SUBMITTED   DONE RUN IDLE TOTAL JOB_IDS
apascolinius Grid-Token-Sleep 3/19 10:35  _    _   1    1     35.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for apascolinius: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
```
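For reference, the module is only a convenience: the same submission can be done with plain HTCondor remote-submission options, pointing condor_submit at the CE explicitly. A sketch, under the assumption that the module sets equivalent variables for you; -remote implies spooling of the job sandbox to the CE:

```bash
# Manual equivalent of "module switch htc/ce auth=SCITOKENS num=1".
# The _condor_* environment variable overrides the client security config.
export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKENS
export BEARER_TOKEN=$(oidc-token htc23)
condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 \
              -remote ce01-htc.cr.cnaf.infn.it submit_token.sub
```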
SSL submission
SSL submission uses an x509 VOMS proxy in place of a token; apart from that, the process is almost identical.
CAVEAT
To be able to submit jobs using SSL authentication, your x509 user proxy FQAN must be mapped in the CE configuration.
You will need to send the output of "voms-proxy-info --all --chain" for a valid VOMS proxy to the support team via the user-support@lists.cnaf.infn.it mailing list:
```
budda@ui-tier1:~ $ voms-proxy-info --all --chain
=== Proxy Chain Information ===
X.509 v3 certificate
Subject: CN=1569994718,CN=Carmelo Pellegrino cpellegr@infn.it,O=Istituto Nazionale di Fisica Nucleare,C=IT,DC=tcs,DC=terena,DC=org
Issuer: CN=Carmelo Pellegrino cpellegr@infn.it,O=Istituto Nazionale di Fisica Nucleare,C=IT,DC=tcs,DC=terena,DC=org
Valid from: Tue Apr 09 16:18:41 CEST 2024
Valid to: Wed Apr 10 04:18:41 CEST 2024
CA: false
Signature alg: SHA384WITHRSA
Public key type: RSA 2048bit
Allowed usage: digitalSignature keyEncipherment
Serial number: 1569994718
VOMS extensions: yes.

X.509 v3 certificate
Subject: CN=Carmelo Pellegrino cpellegr@infn.it,O=Istituto Nazionale di Fisica Nucleare,C=IT,DC=tcs,DC=terena,DC=org
Issuer: CN=GEANT TCS Authentication RSA CA 4B,O=GEANT Vereniging,C=NL
Valid from: Mon Oct 16 12:57:40 CEST 2023
Valid to: Thu Nov 14 11:57:40 CET 2024
Subject alternative names:
  email: carmelo.pellegrino@cnaf.infn.it
CA: false
Signature alg: SHA384WITHRSA
Public key type: RSA 8192bit
Allowed usage: digitalSignature keyEncipherment
Allowed extended usage: clientAuth emailProtection
Serial number: 73237961961532056736463686571865333148

=== Proxy Information ===
subject   : /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Carmelo Pellegrino cpellegr@infn.it/CN=1569994718
issuer    : /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Carmelo Pellegrino cpellegr@infn.it
identity  : /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Carmelo Pellegrino cpellegr@infn.it
type      : RFC3820 compliant impersonation proxy
strength  : 2048
path      : /tmp/x509up_u23069
timeleft  : 00:00:00
key usage : Digital Signature, Key Encipherment
=== VO km3net.org extension information ===
VO        : km3net.org
subject   : /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Carmelo Pellegrino cpellegr@infn.it
issuer    : /DC=org/DC=terena/DC=tcs/C=IT/ST=Napoli/O=Universita degli Studi di Napoli FEDERICO II/CN=voms02.scope.unina.it
attribute : /km3net.org/Role=NULL/Capability=NULL
timeleft  : 00:00:00
uri       : voms02.scope.unina.it:15005
```
- Get a proxy with voms-proxy-init
```
apascolinit1@ui-tier1 ~ $ voms-proxy-init --voms cms
Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.

Created proxy in /tmp/x509up_u23077.

Your proxy is valid until Tue Mar 19 22:39:41 CET 2024
```
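Since the proxy above is valid for only about 12 hours, a pre-submission check avoids failures from an expired proxy. A sketch; replace cms with your own VO:

```bash
# Renew the proxy when less than one hour of lifetime remains.
# voms-proxy-info --timeleft prints the remaining seconds.
if [ "$(voms-proxy-info --timeleft 2>/dev/null || echo 0)" -lt 3600 ]; then
    voms-proxy-init --voms cms    # use your own VO here
fi
```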
- Submit a job to the CE. Submit file:
```
apascolinit1@ui-tier1 ~ $ cat submit_ssl.sub
# Unix submit description file
# submit_ssl.sub -- simple sleep job
use_x509userproxy       = true
+owner                  = undefined
batch_name              = Grid-SSL-Sleep
executable              = sleep.sh
arguments               = 3600
log                     = $(batch_name).log.$(Process)
output                  = $(batch_name).out.$(Process)
error                   = $(batch_name).err.$(Process)
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue
```
Submit a job with SSL:

```
apascolinit1@ui-tier1 ~ $ module switch htc/ce auth=SSL num=1
Don't forget to voms-proxy-init!
apascolinit1@ui-tier1 ~ $ condor_submit submit_ssl.sub
Submitting job(s).
1 job(s) submitted to cluster 36.

apascolinit1@ui-tier1 ~ $ condor_q

-- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:45:18
OWNER      BATCH_NAME     SUBMITTED   DONE RUN IDLE TOTAL JOB_IDS
apascolini Grid-SSL-Sleep 3/19 10:44  _    1   _    1     36.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for apascolini: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 2 jobs; 1 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
```
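When the input/output sandbox is spooled to the CE (as with the -remote/-spool submission shown earlier), the output, error, and log files can be fetched back with condor_transfer_data once the job has finished. A sketch, with cluster id 36 as in the transcript above:

```bash
# Fetch the spooled output of job 36.0 from the CE schedd selected by the
# module, then remove the completed job from the remote queue.
condor_transfer_data 36.0
condor_rm 36.0
```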