Access to CERN resources is managed via subscription to the e-groups:
Granted to every member of the Muon Collider collaboration:
muoncollider-batch (batch queue)
muoncollider-readers (read from disk)
Granted only to developers for the moment:
muoncollider-writers (write to disk)
To check whether you are able to submit, you can list your Accounting Groups with this command on lxplus:
$) haggis rights
+----------------------------+
| gianelle |
+----------------------------+
| group_u_MUONCOLLIDER.users |
| group_u_LHCB.u_z5 |
+----------------------------+
If you are a member of multiple accounting groups, you should specify which group you wish to use for a given activity; otherwise the system will assign you to one (normally the first one is the default):
The syntax for the submit file is:
+AccountingGroup = "group_u_MUONCOLLIDER.users"
BTW: submission to the HTCondor schedds at CERN normally makes use of a shared filesystem, i.e. AFS. This is convenient, but shared filesystems also introduce instability.
To submit a Muon Collider job to HTCondor at CERN you first need a submit file:
# Unix submit description file
+JobFlavour = "testmatch"
IDX is just an example of a parameter that you can use as an argument for your job. JobFlavour is needed to choose the maximum job wall time.
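For reference, a complete submit file along these lines could look like the following sketch; the executable name and the output/error/log file names are only illustrative assumptions:

# job.sub -- minimal sketch, file names are assumptions
universe         = vanilla
executable       = job.sh
arguments        = $(IDX)
output           = job_$(IDX).out
error            = job_$(IDX).err
log              = job_$(IDX).log
+JobFlavour      = "testmatch"
+AccountingGroup = "group_u_MUONCOLLIDER.users"
queue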
Then you need to define your executable:
#!/bin/bash
## Job's argument
#### Define some paths
## Experiment space where for example BIB files are stored
### Docker file to use (i.e. which release version)
## Define a unique directory for the job using the argument
## cd in the working directory
## a simple function to quit script
### function to copy file to eos space
# create the unique working dir and cd in it
## copy or link the auxiliary files of the job inside the job directory
# back to $BASE directory
# copy outfile on user EOS space
quit "All is done" 0
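As a reference, a possible implementation of this wrapper is sketched below, following the comments above; the container image path, the EOS locations and the file names (run_sim.sh, sim_steer.py, output_${IDX}.slcio) are assumptions to be adapted to your setup:

#!/bin/bash
# Sketch of the wrapper; image path, EOS locations and file names are assumptions.

IDX=$1                                          # job's argument
BASE=$PWD                                       # directory the job starts from
EXPDIR=/eos/experiment/muoncollider             # experiment space (e.g. BIB files)
USERDIR=/eos/user/${USER:0:1}/${USER}/mucoll    # user EOS space (assumption)
IMAGE=/path/to/mucoll-release-image             # container image, i.e. release version

# a simple function to quit the script
quit () {
    echo "$1"
    exit "$2"
}

# function to copy a file to the user EOS space
copy_to_eos () {
    xrdcp -f "$1" "root://eosuser.cern.ch/${USERDIR}/$2" || quit "Copy of $1 failed" 1
}

# create the unique working dir and cd into it
WORKDIR=${BASE}/job_${IDX}
mkdir -p "${WORKDIR}" || quit "Cannot create ${WORKDIR}" 1
cd "${WORKDIR}"

# copy or link the auxiliary files of the job inside the job directory
ln -sf "${BASE}/sim_steer.py" .

# back to the $BASE directory and run the inner script inside the container
cd "${BASE}"
singularity exec "${IMAGE}" ./run_sim.sh "${IDX}" || quit "Simulation failed" 1

# copy the output file to the user EOS space
copy_to_eos "${WORKDIR}/output_${IDX}.slcio" "output_${IDX}.slcio"

quit "All is done" 0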
The last script is the executable that you run inside the container (for example a bash script). Remember that when you execute the singularity command, inside the container you land in the same directory from which you ran the command: usually your AFS home directory or, as in the previous script, the $BASE directory.
#!/bin/bash
# Job's argument
# define as in the previous script the _same_ unique job directory
# set muoncollider environment
# cd in the working directory
# define the arguments for the ddsim script
echo "Start simulation at `date`"
echo "End simulation at `date`"
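A possible sketch of this script; the steering file, the number of events and the output file name are assumptions matching the wrapper sketch above:

#!/bin/bash
# Sketch of the in-container script; steering and file names are assumptions.

IDX=$1                                        # job's argument
WORKDIR=${PWD}/job_${IDX}                     # the _same_ unique job directory

# set the muoncollider environment
source /opt/ilcsoft/muonc/init_ilcsoft.sh

# cd in the working directory
cd "${WORKDIR}"

# define the arguments for the ddsim script
NEVT=100                                      # number of events to simulate
STEER=sim_steer.py                            # steering file linked by the wrapper

echo "Start simulation at `date`"
ddsim --steeringFile ${STEER} \
      --outputFile output_${IDX}.slcio \
      --numberOfEvents ${NEVT} &> sim.out
echo "End simulation at `date`"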
To submit the file, setting the argument, use the usual condor command:
condor_submit IDX=01 job.sub
Here is the job submission documentation from CERN.
There are some schedds that do not allow shared filesystems on the worker node, which should make them more suitable for users who have longer jobs and are willing to have slightly more constraints.
To select the schedds:
module load lxbatch/spool
Some modifications to the scripts are needed:
# Unix submit description file
queue
You need to define the files that need to be transferred, both for input and output.
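As a sketch, with the file names as assumptions, the submit file for these schedds could contain:

# job.sub for the spool schedds -- minimal sketch, file names are assumptions
executable              = job.sh
arguments               = $(IDX)
output                  = job_$(IDX).out
error                   = job_$(IDX).err
log                     = job_$(IDX).log
+JobFlavour             = "testmatch"
+AccountingGroup        = "group_u_MUONCOLLIDER.users"
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = run_sim.sh, sim_steer.py
transfer_output_files   = sim.out
queue

The wrapper script changes accordingly: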
#!/bin/bash
## Job's argument
#### Define some paths
### Docker file to use (i.e. which release version)
## a simple function to quit script
### function to copy output files to eos space
### first copy the input file from the eos path
# exec the singularity container
# copy outfile on user EOS space
quit "All is done" 0
The major difference is that we use the xrdcp command to transfer input and output files from and to the EOS space. Shared filesystems (i.e. AFS) are still available on the worker nodes, but it is not safe to rely on them.
In the singularity command we mount the condor spool directory as the user's HOME.
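Putting these points together, a possible sketch of the wrapper for the spool schedds; the image path, the EOS locations and the file names are assumptions:

#!/bin/bash
# Sketch of the spool wrapper; image path, EOS locations and file names are assumptions.

IDX=$1                                          # job's argument
IMAGE=/path/to/mucoll-release-image             # container image, i.e. release version
EOSEXP=root://eosexperiment.cern.ch//eos/experiment/muoncollider
EOSUSER=root://eosuser.cern.ch//eos/user/${USER:0:1}/${USER}/mucoll   # assumption

# a simple function to quit the script
quit () {
    echo "$1"
    exit "$2"
}

# function to copy output files to the user EOS space
copy_to_eos () {
    xrdcp -f "$1" "${EOSUSER}/$2" || quit "Copy of $1 failed" 1
}

# first copy the input file from the EOS path into the spool directory
xrdcp -f "${EOSEXP}/samples/input_${IDX}.hepmc" . || quit "Input copy failed" 1

# exec the singularity container, mounting the condor spool directory as HOME
singularity exec -H "$PWD" "${IMAGE}" ./run_sim.sh "${IDX}" || quit "Simulation failed" 1

# copy outfile on user EOS space
copy_to_eos output_${IDX}.slcio output_${IDX}.slcio

quit "All is done" 0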
The last script is the executable that you run inside the container (for example a bash script). Remember that when you execute the singularity command, inside the container you land in the same directory from which you ran the command: usually your AFS home directory or, as in the previous script, the condor spool directory.
The only difference from the local approach is that we don't need to create a unique directory for the job.
#!/bin/bash
echo "Start job at `date`"
# Job's argument
# number of events to process
# set muoncollider environment
# define the arguments for the ddsim script
echo "Start simulation at `date`"
echo "End simulation at `date`"
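A compact sketch of this script, with the steering and input file names as assumptions; the input file copied by the wrapper is already in the spool directory, which is also the current directory, so no unique job directory is created:

#!/bin/bash
echo "Start job at `date`"
# Sketch of the in-container script for the spool case; file names are assumptions.

IDX=$1                                    # job's argument
NEVT=100                                  # number of events to process

# set the muoncollider environment
source /opt/ilcsoft/muonc/init_ilcsoft.sh

# define the arguments for the ddsim script
echo "Start simulation at `date`"
ddsim --steeringFile sim_steer.py \
      --inputFiles input_${IDX}.hepmc \
      --outputFile output_${IDX}.slcio \
      --numberOfEvents ${NEVT} &> sim.out
echo "End simulation at `date`"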
To submit the file, setting the argument, use the usual condor command with the -spool option:
condor_submit -spool IDX=01 job.sub
If you need to reconstruct events using your own customized code (i.e. a custom processor), you first of all need to commit your code to git in an ad hoc branch (or you can zip your code and ship it as an input file).
In the following example we will use as configuration file one of the official ones committed to the ProductionConfig package. We will also see how to manage BIB files.
We use the "spool" method, so the submission script is like the previous one (note that for reconstruction we need more memory):
# Unix submit description file
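A sketch of such a submit file, with names and values as assumptions; request_memory is the relevant addition for the reconstruction step:

# reco job.sub -- minimal sketch, file names and memory value are assumptions
executable              = reco_job.sh
arguments               = $(IDX)
output                  = reco_$(IDX).out
error                   = reco_$(IDX).err
log                     = reco_$(IDX).log
+JobFlavour             = "testmatch"
+AccountingGroup        = "group_u_MUONCOLLIDER.users"
request_memory          = 4 GB
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = run_reco.sh
queue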
The executable is also similar to the previous one. Note that we cannot copy all 1000 BIB files, so we choose NBIB files randomly; these will be overlaid with our signal:
#!/bin/bash
## ARG1: an index to identify the job
## Software docker image
## Eos
# Signal input file
# How many BIBs ?
### Exit function:
### function to copy file to eos space
## Retrieve input Signal file
## Retrieve input BKG files
# exec the singularity container
## save output files
quit "Well Done" 0
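Following the comments above, a possible sketch of this wrapper; the EOS paths, the BIB file name pattern and the output file names are assumptions, while the random choice over the 1000 available BIB files follows the text:

#!/bin/bash
# Sketch of the reconstruction wrapper; EOS paths and file name patterns are assumptions.

JOB=$1                                          # ARG1: an index to identify the job
IMAGE=/path/to/mucoll-release-image             # software docker image
EOS=root://eosexperiment.cern.ch//eos/experiment/muoncollider    # EOS experiment space
INFILE=signal_${JOB}.slcio                      # signal input file (name is an assumption)
NBIB=10                                         # how many BIBs ?

# exit function
quit () {
    echo "$1"
    exit "$2"
}

# function to copy file to the user EOS space (destination is an assumption)
copy_to_eos () {
    xrdcp -f "$1" "root://eosuser.cern.ch//eos/user/${USER:0:1}/${USER}/reco/$2" \
        || quit "Copy of $1 failed" 1
}

## Retrieve input Signal file
xrdcp -f "${EOS}/samples/${INFILE}" . || quit "Signal copy failed" 1

## Retrieve input BKG files: pick NBIB of the 1000 BIB files at random
for i in $(shuf -i 0-999 -n ${NBIB}); do
    xrdcp -f "${EOS}/BIB/BIB_${i}.slcio" . || quit "BIB copy failed" 1
done

# exec the singularity container, spool directory mounted as HOME
singularity exec -H "$PWD" "${IMAGE}" ./run_reco.sh "${JOB}" || quit "Reconstruction failed" 1

## save output files
copy_to_eos Output_REC.slcio Output_REC_${JOB}.slcio

quit "Well Done" 0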
More interesting is the job that needs to be executed in the container:
#!/bin/bash
## ARG1: the job index
JOB=$1
# define how many events per job
source /opt/ilcsoft/muonc/init_ilcsoft.sh
## Function to compile processor, it also redefine MARLIN_DLL
echo "Start job at `date`"
Marlin --global.MaxRecordNumber=${NEVT} --global.SkipNEvents=${SKIP} \
       --OverlayTrimmed.BackgroundFileNames="${BIBs[*]}" allProcess.xml &> reco.out
echo "Everything end at `date`"
To submit more jobs you can use a "for" cycle from the bash command line, changing the IDX parameter, or you can modify your submit file in this way:
start = $(Process) + 20
IDX = $INT(start,%d)
Arguments = $(IDX)
...
queue 40
With these lines, for example, condor submits 40 jobs starting from Arguments=20 (see the first line). In other words, this is equivalent to the bash command line:
for index in {20..59}; do condor_submit -spool IDX=${index} job.sub; done
For other options see, for example, here.
This is a list of the most used commands; see the documentation for more info:
## submit job
condor_submit -spool job.sub
## query all my jobs
condor_q
condor_q -l <jobid>
condor_q -nobatch
## query all jobs
condor_q -all
## remove single job
condor_rm <jobid>
## remove all _terminated_ jobs
condor_rm -constraint 'JobStatus == 4'
## ssh to job's worker node
condor_ssh_to_job <jobid>
## transfer condor stdout/err
condor_transfer_data <jobid>
/eos/experiment/muoncollider/ is the experiment's available space
eosexperiment.cern.ch is the host of the EOS (xroot) server
export EOS_MGM_URL=root://eosexperiment.cern.ch
Users can access (list and read) files in CERN EOS from non-lxplus nodes without being authenticated.
In order to get full permissions (write/delete), one should use a CERN account to get Kerberos authentication:
[gianelle@muonc ~]$ kinit gianelle@CERN.CH
Password for gianelle@CERN.CH:
[gianelle@muonc ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_3807
Default principal: gianelle@CERN.CH

Valid starting       Expires              Service principal
06/29/23 11:42:42 06/30/23 12:42:39 krbtgt/CERN.CH@CERN.CH
renew until 07/04/23 11:42:39
Create directory:
eos mkdir /eos/experiment/muoncollider/test
Copy file:
xrdcp test.txt ${EOS_MGM_URL}//eos/experiment/muoncollider/
List files:
eos ls -l /eos/experiment/muoncollider/
See here for a tutorial on accessing EOS from lxplus.cern.ch.
Here is a talk on EOS for Users.