Submission utility
To ease the transition to the new cluster and the general use of HTCondor, we implemented a solution based on environment modules. The traditional interaction methods, i.e. specifying all command line options, remain valid but are less convenient and more verbose.
...
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Showing available modules |
---|
|
apascolinit1@ui-tier1 ~
$ module avail
-------------------------------------------------------- /opt/exp_software/opssw/modules/modulefiles ---------------------------------------------------------
htc/auth htc/ce htc/local use.own
Key:
modulepath default-version |
...
- htc/local - to be used when you want to submit jobs to, or query, the local schedds sn-02 and sn01-htc, the access points of the HTCondor 9.0 and HTCondor 23 clusters respectively. This is the default module loaded when loading the "htc" family.
variable | values | description |
---|
ver | 9 | selects the old HTCondor cluster and local schedd (sn-02) |
23 | selects the new HTCondor cluster and local schedd (sn01-htc) |
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Usage of htc/local module |
---|
|
apascolinit1@ui-tier1 ~
$ module switch htc ver=9
apascolinit1@ui-tier1 ~
$ condor_q
-- Schedd: sn-02.cr.cnaf.infn.it : <131.154.192.42:9618?... @ 04/17/24 14:58:44
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS
Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for apascolinit1: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 50164 jobs; 30960 completed, 1 removed, 12716 idle, 4514 running, 1973 held, 0 suspended
apascolinit1@ui-tier1 ~
$ module switch htc ver=23
apascolinit1@ui-tier1 ~
$ condor_q
-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.242:9618?... @ 04/17/24 14:58:52
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS
Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for apascolinit1: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended |
- htc/ce - eases the use of the condor_q and condor_submit commands by setting up all the variables needed to contact our Grid compute entrypoints.
variable | values | description |
---|
num | 1,2,3,4,5,6 | connects to ce{num}-htc (new cluster) |
5,6,7 | connects to ce{num}-htc (old cluster) |
auth | GSIVOMS,SSL,SCITOKENS | calls htc/auth with the selected auth method |
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Usage of htc/ce module |
---|
|
apascolinit1@ui-tier1 ~
$ condor_q
Error:
......
apascolinit1@ui-tier1 ~
$ module switch htc/ce auth=SCITOKENS num=2
Don't forget to "export BEARER_TOKEN=$(oidc-token <client-name>)"!
Switching from htc/ce{auth=SCITOKENS:num=2} to htc/ce{auth=SCITOKENS:num=2}
Loading requirement: htc/auth{auth=SCITOKENS}
apascolinit1@ui-tier1 ~
$ export BEARER_TOKEN=$(oidc-token htc23)
apascolinit1@ui-tier1 ~
$ condor_q
-- Schedd: ce02-htc.cr.cnaf.infn.it : <131.154.192.41:9619?... @ 04/17/24 15:48:24
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS
..........
..........
..........
|
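The module variables above map onto host names. The following is a minimal sketch of that mapping, useful when the explicit command line options are needed instead of the modules. The function names schedd_for_ver and ce_for_num are hypothetical, and the zero-padded CE name is inferred from the ce01-htc and ce02-htc examples only; it is an assumption for the other values of num.

```bash
# Hypothetical helpers (not part of the htc modules): translate the module
# variables "ver" and "num" into the host names shown in the examples.

# ver=9 -> old cluster schedd, ver=23 -> new cluster schedd.
schedd_for_ver() {
    case "$1" in
        9)  echo "sn-02.cr.cnaf.infn.it" ;;
        23) echo "sn01-htc.cr.cnaf.infn.it" ;;
        *)  echo "unknown ver: $1 (expected 9 or 23)" >&2; return 1 ;;
    esac
}

# num=N -> ceNN-htc; the two-digit padding is an assumption based on
# the ce01-htc and ce02-htc host names seen in the condor_q output.
ce_for_num() {
    case "$1" in
        [1-7]) printf 'ce%02d-htc.cr.cnaf.infn.it\n' "$1" ;;
        *)     echo "unknown num: $1 (expected 1..7)" >&2; return 1 ;;
    esac
}

# Usage (explicit options instead of "module switch htc/ce num=2"):
#   ce=$(ce_for_num 2)
#   condor_q -pool "$ce:9619" -n "$ce"
```

Keeping the mapping in one place avoids repeating host names across scripts; the modules remain the more convenient route for interactive use.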
...
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Module help |
---|
|
budda@ui-tier1:~
$ module help htc
-------------------------------------------------------------------
Module Specific Help for /opt/exp_software/opssw/modules/modulefiles/htc/local:
Defines environment variables and aliases to ease the interaction with the INFN-T1 HTCondor local job submission system
-------------------------------------------------------------------
|
Local Submission
Local job submission works the same way as on the HTCondor 9 cluster, using the Jobs UI.
- Submitting a job to the cluster.
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Executable and Submit file |
---|
|
apascolinit1@ui-tier1 ~
$ cat sleep.sh
#!/usr/bin/env bash
sleep "$1"
apascolinit1@ui-tier1 ~
$ cat submit.sub
# Unix submit description file
# submit.sub -- simple sleep job
batch_name = Local-Sleep
executable = sleep.sh
arguments = 3600
log = $(batch_name).log.$(Process)
output = $(batch_name).out.$(Process)
error = $(batch_name).err.$(Process)
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
|
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Submission and control of job status |
---|
|
apascolinit1@ui-tier1 ~
$ module switch htc ver=23
apascolinit1@ui-tier1 ~
$ condor_submit submit.sub
Submitting job(s).
1 job(s) submitted to cluster 15.
apascolinit1@ui-tier1 ~
$ condor_q
-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.242:9618?... @ 03/18/24 17:15:44
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
apascolinit1 Local-Sleep 3/18 17:15 _ 1 _ 1 15.0
Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for apascolinit1: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
|
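The two files above can also be generated from a script, for instance to vary the sleep duration between tests. This is a minimal sketch under stated assumptions: write_sleep_job is a hypothetical helper, and the target directory and duration are illustrative. The actual submission is left to condor_submit as shown above.

```bash
# Hypothetical helper: write sleep.sh and submit.sub into a directory,
# with the sleep duration (in seconds) passed as second argument.
write_sleep_job() {
    dir=$1
    seconds=$2
    mkdir -p "$dir" || return 1

    # Executable: sleeps for the number of seconds given as first argument.
    cat > "$dir/sleep.sh" <<'EOF'
#!/usr/bin/env bash
sleep "$1"
EOF
    chmod +x "$dir/sleep.sh"

    # Submit description file with the same keys as the example above.
    # The \$ keeps the HTCondor macros literal instead of expanding them
    # in the shell.
    cat > "$dir/submit.sub" <<EOF
batch_name = Local-Sleep
executable = sleep.sh
arguments = $seconds
log = \$(batch_name).log.\$(Process)
output = \$(batch_name).out.\$(Process)
error = \$(batch_name).err.\$(Process)
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
EOF
}

# Usage (on the user interface, after "module switch htc ver=23"):
#   write_sleep_job "$HOME/sleep-test" 60
#   cd "$HOME/sleep-test" && condor_submit submit.sub
```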
Grid Submission
Grid submission to ce01-htc is nearly the same as on the old cluster. Two authentication methods are available:
...
Submission | | Grid Submission/Token Submission |
---|
|
Token submission
The steps are identical to those for the HTCondor 9 cluster:
- Register a client (or load one that is already registered)
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Register a new Client |
---|
|
apascolinit1@ui-tier1 ~
$ eval `oidc-agent-service use`
23025
apascolinit1@ui-tier1 ~
$ oidc-gen -w device
Enter short name for the account to configure: htc23
[1] https://iam-t1-computing.cloud.cnaf.infn.it/
...
...
Issuer [https://iam-t1-computing.cloud.cnaf.infn.it/]: <enter>
The following scopes are supported: openid profile email address phone offline_access eduperson_scoped_affiliation eduperson_entitlement eduperson_assurance entitlements
Scopes or 'max' (space separated) [openid profile offline_access]: profile wlcg.groups wlcg compute.create compute.modify compute.read compute.cancel
Registering Client ...
Generating account configuration ...
accepted
Using a browser on any device, visit:
https://iam-t1-computing.cloud.cnaf.infn.it/device
And enter the code: REDACTED
...
...
...
Enter encryption password for account configuration 'htc23': <passwd>
Confirm encryption Password: <passwd>
Everything setup correctly! |
- Get a token for submission
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
|
apascolinit1@ui-tier1 ~
$ oidc-add htc23
Enter decryption password for account config 'htc23': <passwd>
success
apascolinit1@ui-tier1 ~
$ umask 0077 ; oidc-token htc23 > ${HOME}/token
|
- Submit a test job
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Submit file |
---|
|
apascolinit1@ui-tier1 ~
$ cat submit_token.sub
# Unix submit description file
# submit_token.sub -- simple sleep job
scitokens_file = $ENV(HOME)/token
+owner = undefined
batch_name = Grid-Token-Sleep
executable = sleep.sh
arguments = 3600
log = $(batch_name).log.$(Process)
output = $(batch_name).out.$(Process)
error = $(batch_name).err.$(Process)
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue |
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Job submission with Token |
---|
|
apascolinit1@ui-tier1 ~
$ module switch htc/ce auth=SCITOKENS num=1
Don't forget to "export BEARER_TOKEN=$(oidc-token <client-name>)"!
apascolinit1@ui-tier1 ~
$ export BEARER_TOKEN=$(oidc-token htc23)
apascolinit1@ui-tier1 ~
$ condor_submit submit_token.sub
Submitting job(s).
1 job(s) submitted to cluster 35.
apascolinit1@ui-tier1 ~
$ condor_q
-- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:35:43
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
apascolinius Grid-Token-Sleep 3/19 10:35 _ _ 1 1 35.0
Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for apascolinius: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended |
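Since the submit file reads the token from ${HOME}/token, it can be worth checking that file before submitting. The sketch below assumes only what the steps above show (the token is written under "umask 0077"); token_file_private is a hypothetical helper name, not a command provided by oidc-agent or HTCondor.

```bash
# Hypothetical helper: succeed only if the token file exists, is not empty
# and grants no permissions to group or others (as with "umask 0077").
token_file_private() {
    [ -s "$1" ] || { echo "missing or empty token file: $1" >&2; return 1; }
    # Characters 5-10 of the "ls -l" mode string are the group/other bits.
    perms=$(ls -ld "$1" | cut -c5-10)
    [ "$perms" = "------" ] || {
        echo "token file is accessible by group or others: $1" >&2
        return 1
    }
}

# Usage:
#   token_file_private "$HOME/token" && condor_submit submit_token.sub
```

The check does not validate the token itself; an expired token still has to be renewed with oidc-token.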
SSL submission
SSL submission replaces the former proxy-based (GSI) submission, and the process is almost identical.
Warning |
---|
|
To be able to submit jobs using SSL authentication, your x509UserProxyFQAN must be mapped in the CE configuration. Send the support team, via the user-support@lists.cnaf.infn.it mailing list, either your x509UserProxyFQAN or the output of voms-proxy-info --all --chain for a valid VOMS proxy. The attribute can be recovered in different ways. Once you have a valid proxy, you can retrieve it with:
apascolinit1@ui :~
$ voms-proxy-info --all
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini/CN=1239012205
issuer : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini
type : RFC3820 compliant impersonation proxy
strength : 2048
path : /tmp/x509up_u23077
timeleft : 11:59:53
key usage : Digital Signature, Key Encipherment
=== VO cms extension information ===
VO : cms
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini
issuer : /DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch
attribute : /cms/Role=production/Capability=NULL
attribute : /cms/Role=NULL/Capability=NULL
timeleft : 11:59:52
uri : lcg-voms2.cern.ch:15002
The x509UserProxyFQAN is composed as "<subject>,<attribute1>,<attribute2>..."; in this case:
Code Block |
---|
|
x509UserProxyFQAN = "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini,/cms/Role=production/Capability=NULL,/cms/Role=NULL/Capability=NULL" |
If you already have running jobs submitted with GSI authentication, you can get the x509UserProxyFQAN attribute with:
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
|
apascolinit1@ui-tier1 ~
$ condor_q -pool ce02-htc.cr.cnaf.infn.it:9619 -n ce02-htc.cr.cnaf.infn.it <job_id> -af x509UserProxyFQAN
/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini,/cms/Role=NULL/Capability=NULL |
If your x509UserProxyFQAN has not been mapped in the CE configuration, you will see the following error:
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
|
apascolinit1@ui-tier1 ~
$ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_ssl.sub
ERROR: Can't find address of schedd ce01-htc.cr.cnaf.infn.it |
A sample of the voms-proxy-info --all --chain output follows:
$ voms-proxy-info --all --chain
=== Proxy Chain Information ===
X.509 v3 certificate
Subject: CN=1569994718,CN=Carmelo Pellegrino cpellegr@infn.it,O=Istituto Nazionale di Fisica Nucleare,C=IT,DC=tcs,DC=terena,DC=org
Issuer: CN=Carmelo Pellegrino cpellegr@infn.it,O=Istituto Nazionale di Fisica Nucleare,C=IT,DC=tcs,DC=terena,DC=org
Valid from: Tue Apr 09 16:18:41 CEST 2024
Valid to: Wed Apr 10 04:18:41 CEST 2024
CA: false
Signature alg: SHA384WITHRSA
Public key type: RSA 2048bit
Allowed usage: digitalSignature keyEncipherment
Serial number: 1569994718
VOMS extensions: yes.
X.509 v3 certificate
Subject: CN=Carmelo Pellegrino cpellegr@infn.it,O=Istituto Nazionale di Fisica Nucleare,C=IT,DC=tcs,DC=terena,DC=org
Issuer: CN=GEANT TCS Authentication RSA CA 4B,O=GEANT Vereniging,C=NL
Valid from: Mon Oct 16 12:57:40 CEST 2023
Valid to: Thu Nov 14 11:57:40 CET 2024
Subject alternative names:
email: carmelo.pellegrino@cnaf.infn.it
CA: false
Signature alg: SHA384WITHRSA
Public key type: RSA 8192bit
Allowed usage: digitalSignature keyEncipherment
Allowed extended usage: clientAuth emailProtection
Serial number: 73237961961532056736463686571865333148
=== Proxy Information ===
subject : /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Carmelo Pellegrino cpellegr@infn.it/CN=1569994718
issuer : /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Carmelo Pellegrino cpellegr@infn.it
identity : /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Carmelo Pellegrino cpellegr@infn.it
type : RFC3820 compliant impersonation proxy
strength : 2048
path : /tmp/x509up_u23069
timeleft : 00:00:00
key usage : Digital Signature, Key Encipherment
=== VO km3net.org extension information ===
VO : km3net.org
subject : /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Carmelo Pellegrino cpellegr@infn.it
issuer : /DC=org/DC=terena/DC=tcs/C=IT/ST=Napoli/O=Universita degli Studi di Napoli FEDERICO II/CN=voms02.scope.unina.it
attribute : /km3net.org/Role=NULL/Capability=NULL
timeleft : 00:00:00
uri : voms02.scope.unina.it:15005 |
|
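The x509UserProxyFQAN composition described in the warning above can be sketched as a small filter over the voms-proxy-info --all output. This is an illustration, not an official tool: fqan_from_proxy_info is a hypothetical name, and it assumes that the first component is the value of the "identity" line, which matches the example above.

```bash
# Hypothetical helper: read "voms-proxy-info --all" output on stdin and
# print "<identity>,<attribute1>,<attribute2>...".
# Assumption: the "identity" line provides the first component.
fqan_from_proxy_info() {
    awk '
        # Keep the text after the first colon of the identity line.
        /^identity[ \t]*:/  { sub(/^[^:]*:[ \t]*/, ""); id = $0 }
        # Append every VOMS attribute, in the order printed.
        /^attribute[ \t]*:/ { sub(/^[^:]*:[ \t]*/, ""); attrs = attrs "," $0 }
        END {
            if (id == "") exit 1    # no identity found: not a proxy listing
            print id attrs
        }
    '
}

# Usage:
#   voms-proxy-info --all | fqan_from_proxy_info
```

The resulting string is what should be compared with the value reported by condor_q -af x509UserProxyFQAN for an existing job.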
- Get a proxy with voms-proxy-init
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
|
apascolinit1@ui-tier1 ~
$ voms-proxy-init --voms cms
Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.
Created proxy in /tmp/x509up_u23077.
Your proxy is valid until Tue Mar 19 22:39:41 CET 2024 |
- Submit a job to the CE
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Submit file |
---|
|
apascolinit1@ui-tier1 ~
$ cat submit_ssl.sub
# Unix submit description file
# submit_ssl.sub -- simple sleep job
use_x509userproxy = true
+owner = undefined
batch_name = Grid-SSL-Sleep
executable = sleep.sh
arguments = 3600
log = $(batch_name).log.$(Process)
output = $(batch_name).out.$(Process)
error = $(batch_name).err.$(Process)
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue |
Code Block |
---|
language | bash |
---|
theme | Midnight |
---|
title | Submit a job with SSL |
---|
|
apascolinit1@ui-tier1 ~
$ module switch htc/ce auth=SSLVOMS num=1
Don't forget to voms-proxy-init!
apascolinit1@ui-tier1 ~
$ condor_submit submit_ssl.sub
Submitting job(s).
1 job(s) submitted to cluster 36.
apascolinit1@ui-tier1 ~
$ condor_q
-- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:45:18
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
apascolini Grid-SSL-Sleep 3/19 10:44 _ 1 _ 1 36.0
Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for apascolini: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 2 jobs; 1 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended |
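A proxy whose timeleft is 00:00:00 can no longer authenticate, so a check before submission may save a failed attempt. The sketch below parses the "timeleft" line of the voms-proxy-info --all output shown earlier; timeleft_seconds and proxy_valid_for are hypothetical helper names, and the one-hour threshold in the usage line is only an example.

```bash
# Hypothetical helper: read "voms-proxy-info --all" output on stdin and
# print the first "timeleft" value (the proxy one) converted to seconds.
timeleft_seconds() {
    awk '/^timeleft[ \t]*:/ {
        sub(/^[^:]*:[ \t]*/, "")          # keep only HH:MM:SS
        split($0, t, ":")
        print t[1] * 3600 + t[2] * 60 + t[3]
        exit
    }'
}

# Hypothetical helper: succeed if the proxy described on stdin is valid
# for at least $1 more seconds.
proxy_valid_for() {
    left=$(timeleft_seconds)
    [ -n "$left" ] && [ "$left" -ge "$1" ]
}

# Usage (require at least one hour of validity before submitting):
#   voms-proxy-info --all | proxy_valid_for 3600 && condor_submit submit_ssl.sub
```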
...