All the options of the condor_submit command and all the submit description file commands are documented in the Command Reference Manual [26]. For a short guide to the submit description file and its commands, see Appendix A.
Some helpful examples follow.
Multiple job submission
HTCondor allows multiple job submission by using the queue command.
For jobs that do not depend on parameters, the same job can be submitted many times by specifying queue <N> in the submission file, where <N> is an integer.
Here is an example .sub file that submits a simple job three times:
-bash-4.2$ cat sleep.sub
# submit description file
# sleep.sub -- simple sleep job
executable = sleep.sh
log = sleep.log
output = outfile$(Process).txt
error = errors$(Process).txt
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue 3
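As a quick check of how the $(Process) macro in the file above expands, the following sketch prints the file names that queue 3 would produce. It does not call HTCondor at all; the function name expected_outputs is hypothetical, and the sketch assumes that $(Process) counts from 0 to N-1, which matches the job IDs 4588631.0-2 reported by condor_q for this example.

```shell
#!/bin/bash
# Sketch only: mimics the expansion of $(Process) for "queue N".
# expected_outputs is a hypothetical helper, not an HTCondor command.
expected_outputs() {
    local n="$1" i
    for ((i = 0; i < n; i++)); do
        # One output/error pair per job, numbered from 0.
        printf 'outfile%d.txt errors%d.txt\n' "$i" "$i"
    done
}

expected_outputs 3
```

Because each job writes to its own pair of files, the three jobs do not overwrite each other's output.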
And then run the usual commands:
-bash-4.2$ condor_submit -name sn01-htc.cr.cnaf.infn.it -spool sleep.sub
Submitting job(s)...
3 job(s) submitted to cluster 4588631.
-bash-4.2$
-bash-4.2$ condor_q -name sn01-htc.cr.cnaf.infn.it
-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.42:9618?... @ 11/04/22 11:16:11
OWNER           BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
dlattanziobelle ID: 4588631  11/4  11:12      _      _      _      3 4588631.0-2

Total for query: 3 jobs; 3 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for dlattanziobelle: 3 jobs; 3 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 38425 jobs; 30994 completed, 0 removed, 636 idle, 6613 running, 182 held, 0 suspended
-bash-4.2$ condor_transfer_data -name sn01-htc.cr.cnaf.infn.it 4588631
Fetching data files...
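The submit, query and fetch steps above can be chained in a small wrapper. This is only a sketch: the function names and the SCHEDD variable are illustrative, and it assumes that condor_submit prints a line of the form "N job(s) submitted to cluster <id>." as in the transcript above. Only the commands and options already shown in this section are used.

```shell
#!/bin/bash
# Sketch only: wraps the submit / fetch commands shown above.
# SCHEDD and the function names are illustrative, not part of HTCondor.
SCHEDD="sn01-htc.cr.cnaf.infn.it"

# Extract the cluster id from condor_submit output such as
# "3 job(s) submitted to cluster 4588631."
parse_cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Submit a description file and print the resulting cluster id.
submit_and_get_cluster() {
    condor_submit -name "$SCHEDD" -spool "$1" | parse_cluster_id
}

# Retrieve the spooled output of a finished cluster.
fetch_output() {
    condor_transfer_data -name "$SCHEDD" "$1"
}

# Example (requires access to the schedd):
#   cluster="$(submit_and_get_cluster sleep.sub)"
#   condor_q -name "$SCHEDD" "$cluster"
#   fetch_output "$cluster"
```

Keeping the cluster id in a variable avoids copying it by hand between condor_q and condor_transfer_data.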
On the other hand, if the jobs depend on a parameter, the queue command can be given a list of items, for instance the list of the files that the jobs depend on.
In this case, the submission file defines a variable (e.g. file), which can be referenced, for instance, in arguments = $(file), and which is then expanded over a list of values (it may also be a list of lists). The items can be written either as a comma- and/or space-separated list, or one per line with the whole list enclosed in parentheses. The keyword from is required in the queue command. For example:
executable = ...
arguments = $(file)
...
queue file from (
/storage/gpfs_data/.../file1
/storage/gpfs_data/.../file2
/storage/gpfs_data/.../fileN
)
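When the list is long, it can be convenient to generate it rather than type it. The sketch below writes one path per line into a list file; the queue ... from statement also accepts the name of such a file in place of the parenthesized list (for example, queue file from files.txt). The helper name make_file_list, the file name files.txt and the *.root pattern are assumptions made for illustration, and the demo runs on a temporary directory rather than on the storage paths above.

```shell
#!/bin/bash
# Sketch only: write one path per line into a list file that a
# "queue file from <list file>" statement could read.
# make_file_list and all paths used here are hypothetical.
make_file_list() {
    local data_dir="$1" list_file="$2" pattern="${3:-*.root}"
    find "$data_dir" -maxdepth 1 -type f -name "$pattern" | sort > "$list_file"
}

# Demo on a throwaway directory, so nothing on shared storage is touched.
demo_dir="$(mktemp -d)"
touch "$demo_dir/file1.root" "$demo_dir/file2.root" "$demo_dir/notes.txt"
make_file_list "$demo_dir" "$demo_dir/files.txt"
cat "$demo_dir/files.txt"
```

Sorting the list keeps the mapping between job number and input file stable across runs.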
Another way is to build the list from a rule, using the keyword matching to select the items that match a given expression. In the following example, given a set of .root files, HTCondor submits one job for each file matching the specified rule:
executable = ...
arguments = $(file)
...
queue file matching files /storage/gpfs_data/.../*.root
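Before submitting with a matching rule, it may help to check how many jobs the rule would create. The sketch below counts the regular files that a pattern matches, assuming the pattern behaves like an ordinary shell glob and that the paths contain no whitespace. The helper name count_matches is hypothetical, and the demo uses a temporary directory.

```shell
#!/bin/bash
# Sketch only: count the files a glob pattern matches, i.e. the number
# of jobs a "queue file matching files <pattern>" line would be expected
# to create. count_matches is a hypothetical helper, not an HTCondor command.
count_matches() {
    local pattern="$1" count=0 f
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for f in $pattern; do
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Demo on a throwaway directory with two matching files.
demo_dir="$(mktemp -d)"
touch "$demo_dir/run1.root" "$demo_dir/run2.root" "$demo_dir/readme.txt"
count_matches "$demo_dir/*.root"
```

A count of zero usually means the pattern or the directory is wrong, which is cheaper to discover before submission than after.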
Further details are available in the official HTCondor Manual [34].
CPUs, GPUs and RAM requests
It is often useful to specify the number of CPUs or the amount of RAM that a job requires. This is done with the following commands in the job submit file:
request_cpus = <number of CPUs>
request_memory = <RAM amount in MB>
For example, a submit description file with specific CPU and RAM requests looks like this:
-bash-4.2$ cat sleep.sub
# submit description file
# sleep.sub -- simple sleep job
request_cpus = 2
request_memory = 1000
executable = sleep.sh
log = sleep.log
output = outfile.txt
error = errors.txt
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
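If the same job is submitted with different resource requests, the submit file can be generated from a template. The following sketch reproduces the file above with the two requests passed as arguments; write_submit_file is a hypothetical helper, and the simple validation it performs is an illustrative choice, not an HTCondor requirement.

```shell
#!/bin/bash
# Sketch only: generate a submit description file with explicit
# CPU and memory requests. write_submit_file is a hypothetical helper.
write_submit_file() {
    local cpus="$1" memory_mb="$2" out_file="$3" value
    # Reject empty or non-numeric values, and a plain "0".
    for value in "$cpus" "$memory_mb"; do
        case "$value" in
            ''|*[!0-9]*|0) echo "invalid request: '$value'" >&2; return 1 ;;
        esac
    done
    cat > "$out_file" <<EOF
request_cpus   = $cpus
request_memory = $memory_mb
executable     = sleep.sh
log            = sleep.log
output         = outfile.txt
error          = errors.txt
should_transfer_files   = Yes
when_to_transfer_output = ON_EXIT
queue
EOF
}

# Demo: 2 CPUs and 1000 MB, written to a temporary file.
sub_file="$(mktemp)"
write_submit_file 2 1000 "$sub_file" && cat "$sub_file"
```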
On the other hand, if your job needs GPUs to run, you have to add the appropriate requirements:
+WantGPU = true
request_GPUs = 1
requirements = (TARGET.CUDACapability >= 1.2) && (TARGET.CUDADeviceName =?= "Tesla K40m") && $(requirements:True)
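Inside the job script it can be useful to verify that a GPU was actually assigned before starting the real workload. The sketch below assumes that the batch system exports the CUDA_VISIBLE_DEVICES environment variable for GPU jobs; this should be verified on the actual worker nodes. The function name report_gpus is hypothetical.

```shell
#!/bin/bash
# Sketch only: report which GPUs the job has been given.
# Assumption: CUDA_VISIBLE_DEVICES is exported for GPU jobs.
report_gpus() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "assigned GPU(s): ${CUDA_VISIBLE_DEVICES}"
    else
        echo "no GPU assigned"
        return 1
    fi
}

# Usage at the top of a job script:
#   report_gpus || exit 1
```

Failing early leaves a clear message in the job's output file instead of an obscure CUDA error later on.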
Jobs with a ROOT program as executable
First of all, you have to set up ROOT before using it. Your collaboration may have installed a ROOT distribution in /opt/exp_software (a location shared between the user interface and the worker nodes).
In this case you should find one or more ROOT installation directories there:
[fornaricta@ui-tier1]$ ls /opt/exp_software/cta/local_software/root/
5.34.26  5.34.36  5.34.38  root  root-6.10.08  root-6.16.00  root_build_5.34.38
so you can choose your preferred version with a submit file like the following one:
[fornaricta@ui-tier1]$ cat test.sub
universe = vanilla
executable = test.sh
arguments = 5.34.26
output = job.out
error = job.err
log = job.log
WhenToTransferOutput = ON_EXIT
ShouldTransferFiles = YES
queue 1
where the executable file has this content:
[fornaricta@ui-tier1]$ cat test.sh
#!/bin/bash
source /storage/gpfs_data/ctalocal/fornaricta/root_config.sh $1
/opt/exp_software/cta/local_software/root/$1/bin/root -b -q
and the configuration script (located on a gpfs path, shared between the user-interface and the worker nodes) is:
[fornaricta@ui-tier1]$ cat /storage/gpfs_data/ctalocal/fornaricta/root_config.sh
#!/bin/bash
export LD_LIBRARY_PATH=/opt/exp_software/cta/local_software/root/$1/lib/root:$LD_LIBRARY_PATH
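Since the ROOT version is passed as a job argument, a typo in the submit file only shows up when the job runs. A more defensive variant of the job script could check that the requested version exists first. This is a sketch under assumptions: the helper root_binary_for and the ROOT_BASE variable are hypothetical names, and the default path is the one used in this example.

```shell
#!/bin/bash
# Sketch only: a defensive lookup of the ROOT executable for a version.
# ROOT_BASE defaults to the path used in this guide; override it to test.
ROOT_BASE="${ROOT_BASE:-/opt/exp_software/cta/local_software/root}"

# Print the ROOT executable path for a version, or fail if it is missing.
root_binary_for() {
    local version="$1"
    local binary="$ROOT_BASE/$version/bin/root"
    if [ -z "$version" ] || [ ! -x "$binary" ]; then
        echo "ROOT version '$version' not found under $ROOT_BASE" >&2
        return 1
    fi
    echo "$binary"
}

# Usage inside the job script:
#   root_bin="$(root_binary_for "$1")" || exit 1
#   "$root_bin" -b -q
```

With this check, a wrong version number produces a readable message in job.err and a non-zero exit code.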
Submitting:
[fornaricta@ui-tier1]$ condor_submit -spool -name sn01-htc.cr.cnaf.infn.it test.sub
Submitting job(s).
1 job(s) submitted to cluster 5824045.
[fornaricta@ui-tier1]$ condor_q -name sn01-htc.cr.cnaf.infn.it 5824045.0
-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.58:9618?... @ 07/08/20 18:54:17
OWNER      BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
fornaricta ID: 5824045   7/8  18:54      _      _      1      1 5824045.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 45425 jobs; 25623 completed, 3 removed, 13181 idle, 5297 running, 1321 held, 0 suspended
[fornaricta@ui-tier1]$ condor_q -name sn01-htc.cr.cnaf.infn.it 5824045.0
-- Schedd: sn-01.cr.cnaf.infn.it : <131.154.192.58:9618?... @ 07/08/20 18:55:03
OWNER      BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
fornaricta ID: 5824045   7/8  18:54      _      1      _      1 5824045.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 45474 jobs; 25644 completed, 3 removed, 13222 idle, 5286 running, 1319 held, 0 suspended
[fornaricta@ui-tier1]$ condor_q -name sn01-htc.cr.cnaf.infn.it 5824045.0
-- Schedd: sn-01.cr.cnaf.infn.it : <131.154.192.58:9618?... @ 07/08/20 18:55:04
OWNER      BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
fornaricta ID: 5824045   7/8  18:54      _      _      _      1 5824045.0

Total for query: 1 jobs; 1 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 45476 jobs; 25646 completed, 3 removed, 13223 idle, 5284 running, 1320 held, 0 suspended
[fornaricta@ui-tier1]$ condor_transfer_data -name sn01-htc.cr.cnaf.infn.it 5824045.0
Fetching data files...
[fornaricta@ui-tier1]$ cat job.err
[fornaricta@ui-tier1]$ cat job.out
  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   5.34/26   20 February 2015  *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

ROOT 5.34/26 (v5-34-26@v5-34-26, Jun 16 2015, 18:41:55 on linuxx8664gcc)

CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }
If no ROOT installation is available in /opt/exp_software, you can source one of the multiple distributions available from CVMFS:
[fornarivirgo@ui01-virgo root_test]$ ls /cvmfs/sft.cern.ch/lcg/releases/ROOT/
5.34.24-64287  6.06.06-71859  6.10.00-8b404  6.12.04-4473c  6.12.06-76fef  6.14.00-66c89
6.14.04-2a3e5  6.14.04-dedca  6.16.00-23725  6.16.00-5be98  6.16.00-b4729  6.18.00-d0330
...
For instance:
[fornarivirgo@ui01-virgo root_test]$ cat test.sub
universe = vanilla
Executable = test.sh
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
Log = log.log
Output = log.out
Error = log.err
queue 1
[fornarivirgo@ui01-virgo root_test]$ cat test.sh
#!/bin/bash
. /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_96python3 x86_64-centos7-gcc8-opt
root -b -q
[fornarivirgo@ui01-virgo root_test]$ condor_submit -spool -name sn01-htc.cr.cnaf.infn.it test.sub
Submitting job(s).
1 job(s) submitted to cluster 8445482.
[fornarivirgo@ui01-virgo root_test]$ condor_q -name sn01-htc.cr.cnaf.infn.it 8445482
-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.58:9618?... @ 09/10/20 17:21:42
OWNER        BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
fornarivirgo ID: 8445482  9/10  17:21      _      _      1      1 8445482.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 39956 jobs; 21240 completed, 2 removed, 13371 idle, 4218 running, 1125 held, 0 suspended
[fornarivirgo@ui01-virgo root_test]$ condor_q -name sn01-htc.cr.cnaf.infn.it 8445482
-- Schedd: sn01-htc.cr.cnaf.infn.it : <131.154.192.58:9618?... @ 09/10/20 17:23:58
OWNER        BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
fornarivirgo ID: 8445482  9/10  17:21      _      _      _      1 8445482.0

Total for query: 1 jobs; 1 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 39959 jobs; 21288 completed, 2 removed, 13341 idle, 4202 running, 1126 held, 0 suspended
[fornarivirgo@ui01-virgo root_test]$ condor_transfer_data -name sn01-htc.cr.cnaf.infn.it 8445482
Fetching data files...
[fornarivirgo@ui01-virgo root_test]$ cat log.err
[fornarivirgo@ui01-virgo root_test]$ cat log.out
   ------------------------------------------------------------
  | Welcome to ROOT 6.18/00                  https://root.cern |
  |                               (c) 1995-2019, The ROOT Team |
  | Built for linuxx8664gcc on Jun 25 2019, 09:22:23           |
  | From tags/v6-18-00@v6-18-00                                |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------