The new HTCondor 23 cluster currently consists of:
Node(s) | Description | OS |
---|---|---|
cm01-htc and cm02-htc | Central Manager (CM) nodes of the new cluster | AlmaLinux9 |
sn01-htc | Local submit node (acting as the Access Point (AP)) | AlmaLinux9 |
ce01-htc | CE for Grid submission (Token + SSL) | AlmaLinux9 |
wn-204-11-* | Worker Nodes | CentOS7 |
The /opt/exp_software area of the production GPFS cluster is mounted on these nodes.
Local Submission
Local job submission works the same way as on the HTCondor 9 cluster, using the Jobs UI.
- Before submitting, make sure an HTCondor IDTOKEN exists to communicate with the CM of the new cluster. The token for the HTC23 cluster is saved in `${HOME}/.condor/tokens.d/${USER}@t1htc_23`. For example:
User token:

```
apascolinit1@ui-tier1 ~ $ condor_token_list
Header: {"alg":"HS256","kid":"users_password_23"} Payload: {"exp":1711366320,"iat":1710761520,"iss":"t1htc_23","jti":"82e6c20e274f5678aaf6c10c6a95f889","sub":"apascolinit1@t1htc_23"} File: /home/TIER1/apascolinit1/.condor/tokens.d/apascolinit1@t1htc_23
Header: {"alg":"HS256","kid":"users"} Payload: {"exp":1711366320,"iat":1710761520,"iss":"htc-1.cr.cnaf.infn.it","jti":"3419320701ff6284cd49ac5b613b6727","sub":"apascolinit1@t1htc_90"} File: /home/TIER1/apascolinit1/.condor/tokens.d/apascolinit1@t1htc_90
```
The token must also still be valid: check its "exp" field against the current date. To verify a token:

```
apascolinit1@ui-tier1 ~ $ date
Mon Mar 18 17:04:37 CET 2024   (--> current date)
apascolinit1@ui-tier1 ~ $ date -d @1711366320
Mon Mar 25 12:32:00 CET 2024   (--> expiry date)
```

In this case the token is VALID, so you can continue with the next steps.
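The same expiry check can be scripted. The sketch below is a minimal example, assuming a standard JWT and GNU coreutils; the `TOKEN` string here is a hypothetical token carrying only the "exp" claim, so substitute your real token:

```shell
# Decode the payload (second dot-separated field) of a JWT and compare its
# "exp" claim with the current time. TOKEN is a hypothetical example.
TOKEN='hdr.eyJleHAiOjE3MTEzNjYzMjB9.sig'

payload=$(echo "$TOKEN" | cut -d. -f2)
# base64url -> base64, re-adding the padding that JWTs strip
payload=$(echo "$payload" | tr '_-' '/+')
case $(( ${#payload} % 4 )) in
  2) payload="$payload==" ;;
  3) payload="$payload=" ;;
esac

exp=$(echo "$payload" | base64 -d | sed -n 's/.*"exp":\([0-9]*\).*/\1/p')
if [ "$(date +%s)" -lt "$exp" ]; then
  echo "token valid until $(date -d @"$exp")"
else
  echo "token EXPIRED on $(date -d @"$exp")"
fi
```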
This step is not strictly necessary on a production account, but for now it is best to check that everything is in order.
- Test authentication in the cluster with `condor_status`:
```
apascolinit1@ui-tier1 ~ $ condor_status -pool cm01-htc -comp
Machine                           Platform    Slots Cpus Gpus TotalGb FreCpu FreeGb CpuLoad ST Jobs/Min MaxSlotGb
wn-204-11-01-01-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-01-04-a.cr.cnaf.infn.it x64/CentOS7     0   40      132.42      40 119.18    0.01 Ui     0.00         *
wn-204-11-01-05-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-01-06-a.cr.cnaf.infn.it x64/CentOS7     0   40      113.67      40 102.30    0.00 Ui     0.00         *
wn-204-11-01-07-a.cr.cnaf.infn.it x64/CentOS7     0   40       93.75      40  84.38    0.00 Ui     0.00         *
wn-204-11-05-01-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-05-02-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-05-03-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-05-04-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-05-05-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-05-06-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-05-07-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *
wn-204-11-05-08-a.cr.cnaf.infn.it x64/CentOS7     0   40      151.17      40 136.05    0.00 Ui     0.00         *

              Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain
 x64/CentOS7     13     0       0        13       0          0        0      0
       Total     13     0       0        13       0          0        0      0
```
- Submit a job to the cluster. Executable and submit file:
```
apascolinit1@ui-tier1 ~ $ cat sleep.sh
#!/bin/env bash
sleep $1

apascolinit1@ui-tier1 ~ $ cat submit.sub
# Unix submit description file
# submit.sub -- simple sleep job
batch_name              = Local-Sleep
executable              = sleep.sh
arguments               = 3600
log                     = $(batch_name).log.$(Process)
output                  = $(batch_name).out.$(Process)
error                   = $(batch_name).err.$(Process)
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue
```
Submission and control of the job status:

```
apascolinit1@ui-tier1 ~ $ condor_submit -pool cm01-htc -remote sn01-htc submit.sub
Submitting job(s).
1 job(s) submitted to cluster 15.

apascolinit1@ui-tier1 ~ $ condor_q -pool cm01-htc -n sn01-htc

-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.242:9618?... @ 03/18/24 17:15:44
OWNER        BATCH_NAME   SUBMITTED   DONE   RUN   IDLE  TOTAL JOB_IDS
apascolinit1 Local-Sleep  3/18 17:15     _     1      _      1 15.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for apascolinit1: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
```
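If you need several identical jobs, one submit description can queue them all; `$(Process)` keeps the per-job files separate. A minimal sketch, reusing the same hypothetical `sleep.sh` executable as above:

```
# submit description sketch: 5 identical sleep jobs, distinguished by $(Process)
batch_name              = Local-Sleep-Many
executable              = sleep.sh
arguments               = 600
log                     = $(batch_name).log.$(Process)
output                  = $(batch_name).out.$(Process)
error                   = $(batch_name).err.$(Process)
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue 5
```

`queue 5` creates processes 0 through 4 within the same cluster, so the log, output, and error names do not collide.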
Grid Submission
Grid submission via ce01-htc is substantially the same as already described for the HTCondor 9 cluster. Two authentication methods are available:
Token submission
The steps are identical to those for the HTCondor 9 cluster:
- Register a new client (or load one that has already been registered):
```
apascolinit1@ui-tier1 ~ $ eval `oidc-agent-service use`
23025
apascolinit1@ui-tier1 ~ $ oidc-gen -w device
Enter short name for the account to configure: htc23
[1] https://iam-t1-computing.cloud.cnaf.infn.it/
...
Issuer [https://iam-t1-computing.cloud.cnaf.infn.it/]: <enter>
The following scopes are supported: openid profile email address phone offline_access eduperson_scoped_affiliation eduperson_entitlement eduperson_assurance entitlements
Scopes or 'max' (space separated) [openid profile offline_access]: profile wlcg.groups wlcg compute.create compute.modify compute.read compute.cancel
Registering Client ...
Generating account configuration ...
accepted

Using a browser on any device, visit:
https://iam-t1-computing.cloud.cnaf.infn.it/device

And enter the code: HQ2WYL
...
Enter encryption password for account configuration 'htc23': <passwd>
Confirm encryption Password: <passwd>
Everything setup correctly!
```
- Obtain a token for submission:
```
apascolinit1@ui-tier1 ~ $ oidc-add htc23
Enter decryption password for account config 'htc23': <passwd>
success
apascolinit1@ui-tier1 ~ $ umask 0077 ; oidc-token htc23 > ${HOME}/token
```
- Submit a test job. Submit file:
```
apascolinit1@ui-tier1 ~ $ cat submit_token.sub
# Unix submit description file
# submit_token.sub -- simple sleep job
scitokens_file          = $ENV(HOME)/token
+owner                  = undefined
batch_name              = Grid-Token-Sleep
executable              = sleep.sh
arguments               = 3600
log                     = $(batch_name).log.$(Process)
output                  = $(batch_name).out.$(Process)
error                   = $(batch_name).err.$(Process)
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue
```
Job submission with Token:

```
apascolinit1@ui-tier1 ~ $ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKEN
apascolinit1@ui-tier1 ~ $ export BEARER_TOKEN=$(cat ${HOME}/token)
apascolinit1@ui-tier1 ~ $ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_token.sub
Submitting job(s).
1 job(s) submitted to cluster 35.

apascolinit1@ui-tier1 ~ $ condor_q -pool ce01-htc.cr.cnaf.infn.it:9619 -n ce01-htc.cr.cnaf.infn.it

-- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:35:43
OWNER        BATCH_NAME       SUBMITTED   DONE   RUN   IDLE  TOTAL JOB_IDS
apascolinius Grid-Token-Sleep 3/19 10:35     _     _      1      1 35.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for apascolinius: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
```
SSL submission
For SSL submission the token is replaced by an x509 proxy; otherwise the process is almost identical.
CAVEAT
Before testing SSL submission you need to provide your x509UserProxyFQAN. This attribute can be retrieved from a job previously submitted, with the same proxy, through GSI to the production cluster:
```
apascolinit1@ui-tier1 ~ $ condor_q -pool ce02-htc.cr.cnaf.infn.it:9619 -n ce02-htc.cr.cnaf.infn.it <job_id> -af x509UserProxyFQAN
/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini,/cms/Role=NULL/Capability=NULL
```
Once the x509UserProxyFQAN string has been obtained, the HTCondor administrators must add it to the configuration, associating it with the condor username the user prefers.
If the x509UserProxyFQAN check fails (for example, because the FQAN has not yet been registered in the configuration), the submission produces the following error:
```
apascolinit1@ui-tier1 ~ $ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_ssl.sub
ERROR: Can't find address of schedd ce01-htc.cr.cnaf.infn.it
```
- Obtain a proxy with `voms-proxy-init`:
```
apascolinit1@ui-tier1 ~ $ voms-proxy-init --voms cms
Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.

Created proxy in /tmp/x509up_u23077.

Your proxy is valid until Tue Mar 19 22:39:41 CET 2024
```
- Submit a job to the CE. Submit file:
```
apascolinit1@ui-tier1 ~ $ cat submit_ssl.sub
# Unix submit description file
# submit_ssl.sub -- simple sleep job
use_x509userproxy       = true
+owner                  = undefined
batch_name              = Grid-SSL-Sleep
executable              = sleep.sh
arguments               = 3600
log                     = $(batch_name).log.$(Process)
output                  = $(batch_name).out.$(Process)
error                   = $(batch_name).err.$(Process)
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue
```
Submit a job with SSL:

```
apascolinit1@ui-tier1 ~ $ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SSL
apascolinit1@ui-tier1 ~ $ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_ssl.sub
Submitting job(s).
1 job(s) submitted to cluster 36.

apascolinit1@ui-tier1 ~ $ condor_q -pool ce01-htc.cr.cnaf.infn.it:9619 -n ce01-htc.cr.cnaf.infn.it

-- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:45:18
OWNER      BATCH_NAME     SUBMITTED   DONE   RUN   IDLE  TOTAL JOB_IDS
apascolini Grid-SSL-Sleep 3/19 10:44     _     1      _      1 36.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for apascolini: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 2 jobs; 1 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
```
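To summarize the two Grid methods: on the client side the difference is the value exported in `_condor_SEC_CLIENT_AUTHENTICATION_METHODS`, plus the corresponding credential. A minimal sketch, assuming the token file location used above (`${HOME}/token`):

```shell
# Select the Grid authentication method for condor_submit by exporting the
# corresponding knob; SCITOKEN needs the bearer token, SSL needs the proxy.
AUTH=${AUTH:-SCITOKEN}
export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=$AUTH
if [ "$AUTH" = "SCITOKEN" ] && [ -f "${HOME}/token" ]; then
  # bearer token previously saved with: oidc-token htc23 > ${HOME}/token
  export BEARER_TOKEN=$(cat "${HOME}/token")
fi
echo "grid submission configured for $AUTH"
```

After this, the `condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it <submit file>` invocation is the same in both cases.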