The new cluster HTCondor 23 currently consists of:

Node(s)                Description                                     OS
cm01-htc and cm02-htc  Central Management cluster of the new cluster   AlmaLinux9
sn01-htc               Local submit node (Access Point, AP)            AlmaLinux9
ce01-htc               CE for grid submission (Token + SSL)            AlmaLinux9
wn-204-11-*            Worker Nodes                                    CentOS


The /opt/exp_software area of the production GPFS cluster is mounted on these nodes.

Warning
titleSubmission to the new cluster

The main difference in the submission workflow is that you must refer to the Central Manager of the new cluster, i.e. add -pool cm01-htc to the submission and query commands.
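
Because every command aimed at the new cluster takes the same option, a small shell wrapper can add it automatically. The following is only a minimal sketch and not part of the site tooling: the function name htc23 and the variables HTC23_POOL and HTC23_RUN are hypothetical.

```bash
# Hypothetical helper: run an HTCondor tool against the new Central Manager.
HTC23_POOL="cm01-htc"   # Central Manager of the HTCondor 23 cluster

htc23() {
    # usage: htc23 <condor tool> [arguments...]
    tool="$1"
    shift
    # With HTC23_RUN=echo the command is printed instead of executed (dry run).
    ${HTC23_RUN:-command} "$tool" -pool "$HTC23_POOL" "$@"
}
```

With this sketch, `htc23 condor_q -n sn01-htc` would run `condor_q -pool cm01-htc -n sn01-htc`.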


Local Submission   

To submit local jobs, the behavior is the same as for HTCondor 9 using the Jobs UI. 

  1. Before submitting, make sure that an HTCondor ID token exists for communicating with the CM of the new cluster. For example:
    Code Block
    languagebash
    themeMidnight
    titleUser token
    apascolinit1@ui-tier1 ~
    $ condor_token_list
    Header: {"alg":"HS256","kid":"users_password_23"} Payload: {"exp":1711366320,"iat":1710761520,"iss":"t1htc_23","jti":"82e6c20e274f5678aaf6c10c6a95f889","sub":"apascolinit1@t1htc_23"} File: /home/TIER1/apascolinit1/.condor/tokens.d/apascolinit1@t1htc_23
    Header: {"alg":"HS256","kid":"users"} Payload: {"exp":1711366320,"iat":1710761520,"iss":"htc-1.cr.cnaf.infn.it","jti":"3419320701ff6284cd49ac5b613b6727","sub":"apascolinit1@t1htc_90"} File: /home/TIER1/apascolinit1/.condor/tokens.d/apascolinit1@t1htc_90
    The token for the HTC23 cluster is saved in ${HOME}/.condor/tokens.d/${USER}@t1htc_23.
    The token must also still be valid. You can check this by converting its "exp" field to a date and comparing it with the current date:
    Code Block
    languagebash
    themeMidnight
    titleTo verify a token
    apascolinit1@ui-tier1 ~
    $ date
    Mon Mar 18 17:04:37 CET 2024 (--> current date)
    apascolinit1@ui-tier1 ~
    $ date -d @1711366320
    Mon Mar 25 12:32:00 CET 2024 (--> expiration date)
    In this case the token is VALID, so you can continue with the next steps.

    This step is not strictly necessary for your production account, but for now it is best to check that everything is correct.
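
The expiry check described above (read the "exp" field, compare it with the current time) can also be scripted. This is a minimal sketch under the assumption that the output of condor_token_list has the format shown earlier; the function names are hypothetical.

```bash
# Hypothetical helpers for checking the expiry of an HTCondor ID token.

# Read `condor_token_list` output on stdin and print each "exp" value.
token_exp_from_list() {
    sed -n 's/.*"exp":\([0-9][0-9]*\).*/\1/p'
}

# Print how many seconds remain before EXP (epoch seconds).
# The optional second argument is the current time; it defaults to now.
token_seconds_left() {
    exp="$1"
    now="${2:-$(date +%s)}"
    echo $(( exp - now ))
}

# Succeed only if the token has not expired yet.
token_is_valid() {
    [ "$(token_seconds_left "$@")" -gt 0 ]
}
```

For the token listed above, `token_seconds_left 1711366320 1710761520` prints 604800, i.e. the 7 days between the "iat" and "exp" fields. In practice one would pipe `condor_token_list` into `token_exp_from_list` and pass each value to `token_is_valid`.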

    Test authentication with condor_status:
    Code Block
    languagebash
    themeMidnight
    titlecondor_status to test the authentication in the cluster
    apascolinit1@ui-tier1 ~
    $ condor_status -pool cm01-htc -comp
    Machine                           Platform    Slots Cpus Gpus  TotalGb FreCpu  FreeGb  CpuLoad ST Jobs/Min MaxSlotGb

    wn-204-11-01-01-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-01-04-a.cr.cnaf.infn.it x64/CentOS7     0   40        132.42     40   119.18    0.01 Ui     0.00 *
    wn-204-11-01-05-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-01-06-a.cr.cnaf.infn.it x64/CentOS7     0   40        113.67     40   102.30    0.00 Ui     0.00 *
    wn-204-11-01-07-a.cr.cnaf.infn.it x64/CentOS7     0   40         93.75     40    84.38    0.00 Ui     0.00 *
    wn-204-11-05-01-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-05-02-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-05-03-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-05-04-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-05-05-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-05-06-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-05-07-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *
    wn-204-11-05-08-a.cr.cnaf.infn.it x64/CentOS7     0   40        151.17     40   136.05    0.00 Ui     0.00 *

                   Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain

       x64/CentOS7    13     0       0        13       0          0        0      0

             Total    13     0       0        13       0          0        0      0
    Submitting a job to the cluster.
    Code Block
    languagebash
    themeMidnight
    titleExecutable and Submit file
    apascolinit1@ui-tier1 ~
    $ cat sleep.sh
    #!/bin/env bash
    sleep $1


    apascolinit1@ui-tier1 ~
    $ cat submit.sub
    # Unix submit description file
    # submit.sub -- simple sleep job

    batch_name              = Local-Sleep
    executable              = sleep.sh
    arguments               = 3600
    log                     = $(batch_name).log.$(Process)
    output                  = $(batch_name).out.$(Process)
    error                   = $(batch_name).err.$(Process)
    should_transfer_files   = Yes
    when_to_transfer_output = ON_EXIT

    queue
    Code Block
    languagebash
    themeMidnight
    titleSubmission and control of job status
    apascolinit1@ui-tier1 ~
    $ condor_submit -pool cm01-htc -remote sn01-htc submit.sub
    Submitting job(s).
    1 job(s) submitted to cluster 15.


    apascolinit1@ui-tier1 ~
    $ condor_q -pool cm01-htc -n sn01-htc


    -- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.242:9618?... @ 03/18/24 17:15:44
    OWNER        BATCH_NAME     SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
    apascolinit1 Local-Sleep   3/18 17:15      _      1      _      1 15.0

    Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
    Total for apascolinit1: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
    Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
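
When submission and status checks are scripted, it helps to capture the cluster ID that condor_submit reports. The following is a minimal sketch that assumes the "N job(s) submitted to cluster M." message format shown in this section; the function name is hypothetical.

```bash
# Hypothetical helper: read condor_submit output on stdin and print the
# cluster ID from the "N job(s) submitted to cluster M." line.
cluster_id_from_submit() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\).*/\1/p'
}
```

For instance, the output of `condor_submit -pool cm01-htc -remote sn01-htc submit.sub` could be piped into `cluster_id_from_submit`, and the resulting ID passed as an argument to `condor_q -pool cm01-htc -n sn01-htc` to show only that cluster.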
    
    

Grid Submission

The GRID submission part on the ce01-htc is nearly the same as the one used to submit on the old cluster. You can use 2 types of authentication methods:

Token submission

The steps are identical to those in the HTCondor 9 cluster:

  1. Register a Client (or load an already registered one)
    Code Block
    languagebash
    themeMidnight
    titleRegister a new Client
    apascolinit1@ui-tier1 ~
    $ eval `oidc-agent-service use`
    23025

    apascolinit1@ui-tier1 ~
    $ oidc-gen -w device
    Enter short name for the account to configure: htc23
    [1] https://iam-t1-computing.cloud.cnaf.infn.it/
    ...
    ...
    Issuer [https://iam-t1-computing.cloud.cnaf.infn.it/]: <enter>
    The following scopes are supported: openid profile email address phone offline_access eduperson_scoped_affiliation eduperson_entitlement eduperson_assurance entitlements
    Scopes or 'max' (space separated) [openid profile offline_access]: profile wlcg.groups wlcg compute.create compute.modify compute.read compute.cancel
    Registering Client ...
    Generating account configuration ...
    accepted

    Using a browser on any device, visit:
    https://iam-t1-computing.cloud.cnaf.infn.it/device

    And enter the code: HQ2WYL
    ...
    ...
    ...
    Enter encryption password for account configuration 'htc23': <passwd>
    Confirm encryption Password: <passwd>
    Everything setup correctly!
  2. Get a token for submission
    Code Block
    languagebash
    themeMidnight
    apascolinit1@ui-tier1 ~
    $ oidc-add htc23
    Enter decryption password for account config 'htc23': <passwd>
    success

    apascolinit1@ui-tier1 ~
    $ umask 0077 ; oidc-token htc23 > ${HOME}/token

  3. Submit a test job
    Code Block
    languagebash
    themeMidnight
    titleSubmit file
    apascolinit1@ui-tier1 ~
    $ cat submit_token.sub
    # Unix submit description file
    # submit_token.sub -- simple sleep job

    scitokens_file          = $ENV(HOME)/token
    +owner                  = undefined

    batch_name              = Grid-Token-Sleep
    executable              = sleep.sh
    arguments               = 3600
    log                     = $(batch_name).log.$(Process)
    output                  = $(batch_name).out.$(Process)
    error                   = $(batch_name).err.$(Process)
    should_transfer_files   = Yes
    when_to_transfer_output = ON_EXIT

    queue
    Code Block
    languagebash
    themeMidnight
    titleJob submission with Token
    apascolinit1@ui-tier1 ~
    $ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKEN

    apascolinit1@ui-tier1 ~
    $ export BEARER_TOKEN=$(cat ${HOME}/token)

    apascolinit1@ui-tier1 ~
    $ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_token.sub
    Submitting job(s).
    1 job(s) submitted to cluster 35.

    apascolinit1@ui-tier1 ~
    $ condor_q  -pool ce01-htc.cr.cnaf.infn.it:9619 -n ce01-htc.cr.cnaf.infn.it


    -- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:35:43
    OWNER        BATCH_NAME          SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
    apascolinius Grid-Token-Sleep   3/19 10:35      _      _      1      1 35.0

    Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
    Total for apascolinius: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
    Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
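
The two export commands used for token submission can be wrapped in a function that first checks that the token file exists. This is a minimal sketch, assuming the token was saved to ${HOME}/token as in the step above; the function name use_scitoken is hypothetical.

```bash
# Hypothetical helper: prepare the environment for SCITOKEN submission.
use_scitoken() {
    token_file="${1:-$HOME/token}"
    # Refuse to continue when the token file is missing or empty.
    if [ ! -s "$token_file" ]; then
        echo "token file '$token_file' is missing or empty" >&2
        return 1
    fi
    export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKEN
    BEARER_TOKEN="$(cat "$token_file")"
    export BEARER_TOKEN
}
```

After `use_scitoken` succeeds, the condor_submit and condor_q commands against ce01-htc can be run in the same shell. Note that the sketch does not check whether the token has expired.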

SSL submission

SSL submission replaces proxy (GSI) submission; the process is almost identical.

Warning
titleCAVEAT

To be able to submit jobs using SSL authentication, your x509UserProxyFQAN must be mapped in the CE configuration.
You will need to send your x509UserProxyFQAN to the support team via user-support@lists.cnaf.infn.it.

The attribute can be recovered in different ways:

  • after you have a valid proxy you can retrieve it with:
    Code Block
    themeMidnight
    apascolinit1@ui-tier1 ~
    $ voms-proxy-info --all
    subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini/CN=1239012205
    issuer    : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini
    identity  : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini
    type      : RFC3820 compliant impersonation proxy
    strength  : 2048
    path      : /tmp/x509up_u23077
    timeleft  : 11:59:53
    key usage : Digital Signature, Key Encipherment
    === VO cms extension information ===
    VO        : cms
    subject   : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini
    issuer    : /DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch
    attribute : /cms/Role=production/Capability=NULL
    attribute : /cms/Role=NULL/Capability=NULL
    timeleft  : 11:59:52
    uri       : lcg-voms2.cern.ch:15002
    The x509UserProxyFQAN is composed as "<subject>,<attribute1>,<attribute2>...", in this case:
    Code Block
    themeMidnight
    x509UserProxyFQAN = "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini,/cms/Role=production/Capability=NULL,/cms/Role=NULL/Capability=NULL"
  • if you already have running jobs that were submitted with GSI authentication, you can get the x509UserProxyFQAN attribute with:
    Code Block
    languagebash
    themeMidnight
    apascolinit1@ui-tier1 ~
    $ condor_q -pool ce02-htc.cr.cnaf.infn.it:9619 -n ce02-htc.cr.cnaf.infn.it <job_id> -af x509UserProxyFQAN
    /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=apascoli/CN=842035/CN=Alessandro Pascolini,/cms/Role=NULL/Capability=NULL
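
The composition rule given above ("<subject>,<attribute1>,<attribute2>...") can be applied mechanically to the output of voms-proxy-info --all. This is a minimal sketch with a hypothetical function name. It assumes that the subject to use is the one on the "identity" line, which matches the example above, where the proxy's own subject carries an extra CN.

```bash
# Hypothetical helper: read `voms-proxy-info --all` output on stdin and
# print "<identity>,<attribute1>,<attribute2>...".
fqan_from_proxy_info() {
    awk '
        { sub(/^[ \t]+/, "") }    # tolerate indented input
        /^identity[ \t]*:/  { sub(/^identity[ \t]*:[ \t]*/, "");  fqan = $0 }
        /^attribute[ \t]*:/ { sub(/^attribute[ \t]*:[ \t]*/, ""); fqan = fqan "," $0 }
        END { print fqan }
    '
}
```

It would be used as `voms-proxy-info --all | fqan_from_proxy_info`; the printed string is what has to be sent to the support team.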

Once the x509UserProxyFQAN string is available, the HTCondor administrator must make a configuration change that associates it with the condor username the user prefers.


If your x509UserProxyFQAN has not been mapped in the CE configuration, you will be shown the following error (the message is rather ambiguous):

Code Block
languagebash
themeMidnight
apascolinit1@ui-tier1 ~
$ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_ssl.sub

ERROR: Can't find address of schedd ce01-htc.cr.cnaf.infn.it



  1. Get a proxy with voms-proxy-init
    Code Block
    languagebash
    themeMidnight
    apascolinit1@ui-tier1 ~
    $ voms-proxy-init --voms cms
    Enter GRID pass phrase for this identity:
    Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
    Remote VOMS server contacted succesfully.
    
    
    Created proxy in /tmp/x509up_u23077.
    
    Your proxy is valid until Tue Mar 19 22:39:41 CET 2024
  2. Submit a job to the CE
    Code Block
    languagebash
    themeMidnight
    titleSubmit file
    apascolinit1@ui-tier1 ~
    $ cat submit_ssl.sub
    # Unix submit description file
    # submit_ssl.sub -- simple sleep job
    
    use_x509userproxy       = true
    +owner                  = undefined
    
    batch_name              = Grid-SSL-Sleep
    executable              = sleep.sh
    arguments               = 3600
    log                     = $(batch_name).log.$(Process)
    output                  = $(batch_name).out.$(Process)
    error                   = $(batch_name).err.$(Process)
    should_transfer_files   = Yes
    when_to_transfer_output = ON_EXIT
    
    queue

    Code Block
    languagebash
    themeMidnight
    titleSubmit a job with SSL
    apascolinit1@ui-tier1 ~
    $ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SSL
    
    apascolinit1@ui-tier1 ~
    $ condor_submit -pool ce01-htc.cr.cnaf.infn.it:9619 -remote ce01-htc.cr.cnaf.infn.it submit_ssl.sub
    Submitting job(s).
    1 job(s) submitted to cluster 36.
    
    apascolinit1@ui-tier1 ~
    $ condor_q -pool ce01-htc.cr.cnaf.infn.it:9619 -n ce01-htc.cr.cnaf.infn.it
    
    
    -- Schedd: ce01-htc.cr.cnaf.infn.it : <131.154.193.64:9619?... @ 03/19/24 10:45:18
    OWNER      BATCH_NAME        SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
    apascolini Grid-SSL-Sleep   3/19 10:44      _      1      _      1 36.0
    
    Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
    Total for apascolini: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
    Total for all users: 2 jobs; 1 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
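
Before submitting with SSL it can be useful to check that the proxy will outlive the job. The following is a minimal sketch that converts the HH:MM:SS "timeleft" value printed by voms-proxy-info --all into seconds; the function names and the 3600-second default (the sleep time of the test job) are assumptions made for illustration.

```bash
# Hypothetical helpers for checking the remaining lifetime of a proxy.

# Convert an HH:MM:SS "timeleft" value to seconds.
timeleft_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if the proxy lasts at least MIN seconds (default: 3600).
proxy_lasts() {
    [ "$(timeleft_to_seconds "$1")" -ge "${2:-3600}" ]
}
```

For the proxy shown earlier, `timeleft_to_seconds 11:59:53` prints 43193, so `proxy_lasts 11:59:53` succeeds.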

...