The new HTCondor 23 cluster currently consists of:
Node(s) | Description | OS |
---|---|---|
cm01-htc and cm02-htc | Central Managers (CM) of the new cluster | AlmaLinux9 |
sn01-htc | Local submit node, acting as the Access Point (AP) | AlmaLinux9 |
ce01-htc | Computing Element (CE) for grid submission (Token + SSL) | AlmaLinux9 |
wn-204-11-* | Worker Nodes | CentOS |
On the nodes, /opt/exp_software is mounted from the production GPFS cluster.
Warning |
---|
The main difference in the submission workflow is that commands must refer to the Central Manager of the new cluster, i.e. add -pool cm01-htc to the submission and query commands. |
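Since every command needs the same -pool option (and, for queue and submit commands, the same Access Point), one possibility is to wrap the commands in small shell functions. The following is only a sketch, not part of the official setup: the function names condor_status23, condor_q23 and condor_submit23 and the variables HTC23_CM and HTC23_AP are hypothetical, while the host names are the ones listed in the table above.

```shell
#!/usr/bin/env bash
# Hypothetical convenience wrappers; the *23 names are NOT part of HTCondor.
# Host names come from the node table; override the variables if they change.
HTC23_CM=${HTC23_CM:-cm01-htc}   # Central Manager of the new cluster
HTC23_AP=${HTC23_AP:-sn01-htc}   # Access Point (local submit node)

# Query the pool status through the new Central Manager.
condor_status23() { condor_status -pool "$HTC23_CM" "$@"; }

# Query the queue of the Access Point.
condor_q23() { condor_q -pool "$HTC23_CM" -name "$HTC23_AP" "$@"; }

# Submit a job through the Access Point.
condor_submit23() { condor_submit -pool "$HTC23_CM" -remote "$HTC23_AP" "$@"; }
```

Any extra argument is passed through unchanged, so condor_submit23 submit.sub runs condor_submit -pool cm01-htc -remote sn01-htc submit.sub.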
Local Submission
Local jobs are submitted in the same way as on the HTCondor 9 cluster, using the Jobs UI.
Before submitting, make sure that an HTCondor IDTOKEN exists to communicate with the Central Manager (CM) of the new cluster. The token for the HTC23 cluster is saved in ${HOME}/.condor/tokens.d/${USER}@t1htc_23. For example:
User token:
apascolinit1@ui-tier1 ~
$ condor_token_list
Header: {"alg":"HS256","kid":"users_password_23"} Payload: {"exp":1711366320,"iat":1710761520,"iss":"t1htc_23","jti":"82e6c20e274f5678aaf6c10c6a95f889","sub":"apascolinit1@t1htc_23"} File: /home/TIER1/apascolinit1/.condor/tokens.d/apascolinit1@t1htc_23
Header: {"alg":"HS256","kid":"users"} Payload: {"exp":1711366320,"iat":1710761520,"iss":"htc-1.cr.cnaf.infn.it","jti":"3419320701ff6284cd49ac5b613b6727","sub":"apascolinit1@t1htc_90"} File: /home/TIER1/apascolinit1/.condor/tokens.d/apascolinit1@t1htc_90
The token must also be valid, i.e. not expired. You can check this by converting its "exp" claim (a Unix timestamp) to a date and comparing it with the current date:
To verify a token:
apascolinit1@ui-tier1 ~
$ date
Mon Mar 18 17:04:37 CET 2024 (--> current date)
apascolinit1@ui-tier1 ~
$ date -d @1711366320
Mon Mar 25 12:32:00 CET 2024 (--> expiration date)
In this case the token is valid, so you can continue with the next steps.
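The same comparison can be scripted instead of being read by eye. This is a minimal sketch: the function names seconds_left and token_is_valid are hypothetical, and the "exp" value is assumed to have been copied by hand from the condor_token_list output.

```shell
#!/usr/bin/env bash
# seconds_left EXP [NOW]
# Print how many seconds remain before EXP (negative if already expired).
# NOW defaults to the current time; both values are Unix timestamps.
seconds_left() {
    local exp=$1
    local now=${2:-$(date +%s)}
    echo $(( exp - now ))
}

# token_is_valid EXP [NOW]
# Succeed (exit status 0) only if EXP is still in the future.
token_is_valid() {
    [ "$(seconds_left "$1" "${2:-}")" -gt 0 ]
}

# Example with the "exp" value shown above:
#   token_is_valid 1711366320 && echo "token still valid"
```

The optional second argument exists mainly so that the check can be tried against a fixed reference time.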
This step is not strictly required for a production account, but for now it is best to check that everything is correct.
Test the authentication with condor_status:
apascolinit1@ui-tier1 ~
$ condor_status -pool cm01-htc -compact
Machine                           Platform    Slots Cpus Gpus TotalGb FreCpu FreeGb CpuLoad ST Jobs/Min MaxSlotGb
wn-204-11-01-01-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-01-04-a.cr.cnaf.infn.it x64/CentOS7     0   40       132.42     40 119.18    0.01 Ui     0.00 *
wn-204-11-01-05-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-01-06-a.cr.cnaf.infn.it x64/CentOS7     0   40       113.67     40 102.30    0.00 Ui     0.00 *
wn-204-11-01-07-a.cr.cnaf.infn.it x64/CentOS7     0   40        93.75     40  84.38    0.00 Ui     0.00 *
wn-204-11-05-01-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-05-02-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-05-03-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-05-04-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-05-05-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-05-06-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-05-07-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *
wn-204-11-05-08-a.cr.cnaf.infn.it x64/CentOS7     0   40       151.17     40 136.05    0.00 Ui     0.00 *

             Total Owner Claimed Unclaimed Matched Preempting Backfill Drain
x64/CentOS7     13     0       0        13       0          0        0     0
      Total     13     0       0        13       0          0        0     0
Submit the job to the cluster and check its status:
apascolinit1@ui-tier1 ~
$ condor_submit -pool cm01-htc -remote sn01-htc submit.sub
Submitting job(s).
1 job(s) submitted to cluster 15.
apascolinit1@ui-tier1 ~
$ condor_q -pool cm01-htc -n sn01-htc
-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.242:9618?... @ 03/18/24 17:15:44
OWNER        BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
apascolinit1 Local-Sleep  3/18 17:15      _      1      _      1 15.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for apascolinit1: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
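The summary lines printed by condor_q are meant for people. In a script it is usually easier to read the numeric JobStatus attribute of the job and translate it. The helper below is a sketch: job_status_name is a hypothetical name, while the codes are the standard HTCondor JobStatus values.

```shell
#!/usr/bin/env bash
# job_status_name CODE
# Translate a numeric HTCondor JobStatus value into a readable label.
job_status_name() {
    case "$1" in
        1) echo "Idle" ;;
        2) echo "Running" ;;
        3) echo "Removed" ;;
        4) echo "Completed" ;;
        5) echo "Held" ;;
        6) echo "Transferring Output" ;;
        7) echo "Suspended" ;;
        *) echo "Unknown ($1)" ;;
    esac
}

# Example (needs access to the pool; 15.0 is the job id from the output above):
#   job_status_name "$(condor_q -pool cm01-htc -name sn01-htc 15.0 -autoformat JobStatus)"
```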
The executable and the submit file used for the job above are:
apascolinit1@ui-tier1 ~
$ cat sleep.sh
#!/bin/env bash
sleep $1
apascolinit1@ui-tier1 ~
$ cat submit.sub
# Unix submit description file
# submit.sub -- simple sleep job
batch_name = Local-Sleep
executable = sleep.sh
arguments = 3600
log = $(batch_name).log.$(Process)
output = $(batch_name).out.$(Process)
error = $(batch_name).err.$(Process)
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
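To try the example quickly, both files can be generated in a scratch directory. This is a sketch only: make_sleep_job is a hypothetical helper, and the file contents mirror the listing above with the sleep time made configurable.

```shell
#!/usr/bin/env bash
# make_sleep_job DIR [SECONDS]
# Write sleep.sh and submit.sub into DIR (SECONDS defaults to 3600).
make_sleep_job() {
    local dir=$1
    local seconds=${2:-3600}
    mkdir -p "$dir" || return 1

    # Quoted delimiter: the shell expands nothing, so $1 stays literal.
    cat > "$dir/sleep.sh" <<'EOF'
#!/bin/env bash
sleep $1
EOF
    chmod +x "$dir/sleep.sh"

    # Unquoted delimiter: $seconds is expanded by the shell, while \$ keeps
    # the HTCondor $(...) macros literal in the generated file.
    cat > "$dir/submit.sub" <<EOF
# Unix submit description file
# submit.sub -- simple sleep job
batch_name = Local-Sleep
executable = sleep.sh
arguments = $seconds
log = \$(batch_name).log.\$(Process)
output = \$(batch_name).out.\$(Process)
error = \$(batch_name).err.\$(Process)
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
EOF
}

# Example: make_sleep_job "$HOME/htc23-test" 60
```

The directory name in the example is a placeholder; the job is then submitted from inside that directory.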
Grid Submission
Grid submission through ce01-htc is nearly the same as on the old cluster. Two authentication methods are available: token and SSL.
Token submission
The steps are identical to those on the HTCondor 9 cluster:
- Register a client (or load an already registered one)
Register a new client:
apascolinit1@ui-tier1 ~
$ eval `oidc-agent-service use`
23025
apascolinit1@ui-tier1 ~
$ oidc-gen -w device
Enter short name for the account to configure: htc23
[1] https://iam-t1-computing.cloud.cnaf.infn.it/
...
...
Issuer [https://iam-t1-computing.cloud.cnaf.infn.it/]: <enter>
The following scopes are supported: openid profile email address phone offline_access eduperson_scoped_affiliation eduperson_entitlement eduperson_assurance entitlements
Scopes or 'max' (space separated) [openid profile offline_access]: profile wlcg.groups wlcg compute.create compute.modify compute.read compute.cancel
Registering Client ...
Generating account configuration ...
accepted
Using a browser on any device, visit:
https://iam-t1-computing.cloud.cnaf.infn.it/device
And enter the code: HQ2WYL
...
...
...
Enter encryption password for account configuration 'htc23': <passwd>
Confirm encryption Password: <passwd>
Everything setup correctly!
- Get a token for submission
apascolinit1@ui-tier1 ~
$ oidc-add htc23
Enter decryption password for account config 'htc23': <passwd>
success
apascolinit1@ui-tier1 ~
$ umask 0077 ; oidc-token htc23 > ${HOME}/token
- Submit a test job
Submit file:
apascolinit1@ui-tier1 ~
$ cat submit_token.sub
# Unix submit description file
# submit_token.sub -- simple sleep job
scitokens_file = $ENV(HOME)/token
+owner = undefined
batch_name = Grid-Token-Sleep
executable = sleep.sh
arguments = 3600
log = $(batch_name).log.$(Process)
output = $(batch_name).out.$(Process)
error = $(batch_name).err.$(Process)
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
Job submission with token:
apascolinit1@ui-tier1 ~
$ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKEN
apascolinit1@ui-tier1 ~
$ export BEARER_TOKEN=$(cat ${HOME}/token)
apascolinit1@ui-tier1 ~
$ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_token.sub
Submitting job(s).
1 job(s) submitted to cluster 35.
apascolinit1@ui-tier1 ~
$ condor_q -pool ce01-htc.cr.cnaf.infn.it:9619 -n ce01-htc.cr.cnaf.infn.it
-- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:35:43
OWNER        BATCH_NAME         SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
apascolinius Grid-Token-Sleep  3/19 10:35      _      _      1      1 35.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for apascolinius: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
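If a token submission is rejected, one thing worth checking is the access token itself, for example its scopes and expiration time. The sketch below prints the claims of a token; it assumes that the token is a standard three-part JWT and that a base64 command accepting -d (as in GNU coreutils) is available. The name jwt_payload is hypothetical.

```shell
#!/usr/bin/env bash
# jwt_payload TOKEN
# Print the decoded payload (the JSON claims) of a JWT.
# The signature is NOT verified: this is for inspection only.
jwt_payload() {
    local payload
    # Take the second dot-separated field and map base64url to plain base64.
    payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # Restore the '=' padding that base64url encoding strips.
    case $(( ${#payload} % 4 )) in
        2) payload="${payload}==" ;;
        3) payload="${payload}=" ;;
    esac
    printf '%s' "$payload" | base64 -d
}

# Example with the token file created earlier:
#   jwt_payload "$(cat "${HOME}/token")"
```

The "exp" claim in the printed JSON is a Unix timestamp, so it can be converted with date -d @<value> as for the local token.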
SSL submission
SSL submission replaces submission with a proxy through GSI; the process is almost identical.
Warning |
---|
To be able to submit jobs using SSL authentication, your x509UserProxyFQAN must be mapped in the CE configuration. This attribute can be recovered, for example, from a job submitted with the same proxy through GSI to the production cluster. |
Once the x509UserProxyFQAN string is available, the HTCondor administrator has to change the CE configuration, associating the string with the condor username that the user prefers. |
If your x509UserProxyFQAN has not been mapped in the CE configuration, the submission fails with an error. |
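One way to read the attribute from a job that is still in a queue is condor_q with its -autoformat option. This is a sketch under assumptions: job_fqan is a hypothetical helper, 1234.0 is a placeholder job id, and the job is assumed to have been submitted with the same proxy.

```shell
#!/usr/bin/env bash
# job_fqan JOBID [extra condor_q options...]
# Print the x509UserProxyFQAN attribute of a queued job.
# Extra arguments (e.g. -pool or -name) are passed to condor_q unchanged.
job_fqan() {
    condor_q "$@" -autoformat x509UserProxyFQAN
}

# Example (placeholder job id):
#   job_fqan 1234.0
```

The printed string is what has to be communicated to the HTCondor administrator for the mapping.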
- Get a proxy with voms-proxy-init
apascolinit1@ui-tier1 ~
$ voms-proxy-init --voms cms
Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.

Created proxy in /tmp/x509up_u23077.

Your proxy is valid until Tue Mar 19 22:39:41 CET 2024
- Submit a job to the CE
Submit file:
apascolinit1@ui-tier1 ~
$ cat submit_ssl.sub
# Unix submit description file
# submit_ssl.sub -- simple sleep job
use_x509userproxy = true
+owner = undefined
batch_name = Grid-SSL-Sleep
executable = sleep.sh
arguments = 3600
log = $(batch_name).log.$(Process)
output = $(batch_name).out.$(Process)
error = $(batch_name).err.$(Process)
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
Submit a job with SSL:
apascolinit1@ui-tier1 ~
$ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SSL
apascolinit1@ui-tier1 ~
$ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_ssl.sub
Submitting job(s).
1 job(s) submitted to cluster 36.
apascolinit1@ui-tier1 ~
$ condor_q -pool ce01-htc.cr.cnaf.infn.it:9619 -n ce01-htc.cr.cnaf.infn.it
-- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:45:18
OWNER      BATCH_NAME       SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
apascolini Grid-SSL-Sleep  3/19 10:44      _      1      _      1 36.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for apascolini: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 2 jobs; 1 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
...