If you are a member of multiple accounting groups, you should specify which group you wish to use for a given activity; otherwise the system will assign you to one (normally the first is the default):
The syntax for the submit file is:
+AccountingGroup = "group_u_MUONCOLLIDER.users"
Condor local submission
Note: submission to the HTCondor schedds at CERN normally makes use of a shared filesystem, i.e. AFS. This is convenient, but shared filesystems also introduce instability.
To submit a Muon Collider job to condor at CERN you first need a submit file:
IDX is just an example of a parameter that you can pass as an argument to your job. JobFlavour is needed to choose the maximum job wall time.
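A minimal sketch of such a submit file; the executable name, log file names, and flavour below are placeholder assumptions to adapt to your setup:

```
# job.sub -- minimal sketch (executable name, log paths and flavour are assumptions)
executable       = job.sh
arguments        = $(IDX)
output           = job_$(IDX).out
error            = job_$(IDX).err
log              = job_$(IDX).log
+JobFlavour      = "workday"
+AccountingGroup = "group_u_MUONCOLLIDER.users"
queue
```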
Then you need to define your executable:
#!/bin/bash
## Job's argument
IDX=$1
#### Define some paths
### Eos
## User space where to store the output files (you can also leave the files in the local directory)
EOS_USER_URL=root://eosuser.cern.ch
EOS_PATH=/eos/user/g/gianelle/MuonC/muontest
## Experiment space where for example BIB files are stored
EOS_EXP_URL=root://eosexperiment.cern.ch
EOS_EXP="/eos/experiment/muoncollider/"
## Define a unique directory for the job using the argument
## in the local mode all the jobs submitted to condor are executed in the local directory,
## so pay attention to possible conflicts
WORKHOME="JOB_$IDX"
## cd into the base directory
cd $BASE
## a simple function to quit the script
# ARGs: <error message> <error code>
quit () {
  echo $1
  # comment out the line below if you plan to leave the outputs in the local directory
  rm -rf $WORKHOME
  exit $2
}
### function to copy a file to the eos space
# ARG1: input file name
# ARG2: output file name
copy() {
  IN=$1
  OUT=$2
  if [ -f $IN ]; then
    if ! xrdcp -f $IN ${OUT} ; then
      quit "ERROR! Failed to transfer $IN" 2
    fi
  else
    quit "ERROR! File $IN does not exist" 2
  fi
}
# create the unique working dir and cd into it
mkdir $WORKHOME
cd $WORKHOME
## copy or link the auxiliary files of the job inside the job directory
# you can use a generic name for the input file so you don't need to
# customize the steering file at each execution
cp $BASE/k10_out3.hepmc input.hepmc
ln -s $BASE/sim_steer.py sim_steer.py
# back to the $BASE directory
cd $BASE
# exec the singularity container
echo "Start singularity job at `date`"
# NB it mounts the eos experiment directory so the BIB files are accessible
singularity exec -B $EOS_EXP $DOCKER /bin/bash sim.sh ${IDX} &> job_${IDX}.log
echo "End job at `date`"
# copy the output files to the user EOS space
ext=$(( 10#${IDX} ))
# use ${ext} in printf: a zero-padded $IDX (e.g. 08) would be parsed as octal
postfix=$(printf "%03d" ${ext} )
copy ${WORKHOME}/OUTPUT_${ext}.slcio ${EOS_USER_URL}//${EOS_PATH}/z2jmu_k10_10evt_${postfix}.slcio
copy ${WORKHOME}/simout.log ${EOS_USER_URL}//${EOS_PATH}/z2jmu_k10_10evt_${postfix}.log
quit "All is done" 0
The last script is the executable that you run inside the container (for example a bash script). Remember that when you execute the singularity command you land (inside the container) in the same directory from which you ran the command, usually your AFS home directory or, as in the previous script, the $BASE directory.
#!/bin/bash
# Job's argument
IDX=$1
# define, as in the previous script, the _same_ unique job directory
WORKHOME=JOB_$IDX
# number of events to process
NEVT=10
# set the muoncollider environment
source /opt/ilcsoft/muonc/init_ilcsoft.sh
# cd into the working directory
cd $WORKHOME
# define the arguments for the ddsim script
INPUT=input.hepmc
# NB the output file name must be the same as the one used in the previous script
ext=$(( 10#${IDX} ))
OUTPUT=OUTPUT_${ext}.slcio
skipEvt=$(( ( 10#${IDX} ) * $NEVT ))
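The script as shown stops at the variable definitions. A sketch of what the actual ddsim invocation could look like, using the variables defined above and the sim_steer.py steering file linked earlier; treat the exact flags as an assumption to check against your ddsim version:

```
# hedged sketch of the ddsim call (flag names to be checked against your ddsim version)
ddsim --steeringFile sim_steer.py \
      --inputFiles ${INPUT} \
      --outputFile ${OUTPUT} \
      --numberOfEvents ${NEVT} \
      --skipNEvents ${skipEvt} &> simout.log
```

The redirection to simout.log matches the log file that the submitting script copies back to EOS.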
Condor spool submission
Some schedds do not allow shared filesystems on the worker nodes; these should be more suitable for users who have longer jobs and are willing to accept slightly more constraints.
You need to define the files that need to be transferred, both for input and output:
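In the submit file this is done with HTCondor's file-transfer commands; a sketch, where the file names are assumptions based on the scripts below:

```
# sketch of the file-transfer commands (file names are assumptions)
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = sim.sh, sim_steer.py
transfer_output_files   = job.log
```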
#!/bin/bash
## Job's argument
IDX=$1
#### Define some paths
### Eos
## User space where to store the output files (you can also leave the files in the local directory)
EOS_USER_URL=root://eosuser.cern.ch
EOS_PATH=/eos/user/g/gianelle/MuonC/muontest
## a simple function to quit the script
# ARGs: <error message> <error code>
quit () {
  echo $1
  exit $2
}
### function to copy the output files to the eos space
# ARG1: input file name
# ARG2: output file name
copyout() {
  IN=$1
  OUT=$2
  if [ -f $IN ]; then
    if ! xrdcp -f $IN ${OUT} ; then
      quit "ERROR! Failed to transfer $IN" 2
    fi
  else
    quit "ERROR! File $IN does not exist" 2
  fi
}
### first copy the input file from the eos path
xrdcp ${EOS_USER_URL}//${EOS_PATH}/k10_out3.hepmc input.hepmc
# exec the singularity container
echo "Start singularity job at `date`"
singularity exec -B $PWD $DOCKER /bin/bash sim.sh ${IDX} &> job.log
echo "End job at `date`"
# copy the output files to the user EOS space
ext=$(( 10#${IDX} ))
# use ${ext} in printf: a zero-padded $IDX (e.g. 08) would be parsed as octal
postfix=$(printf "%03d" ${ext} )
copyout OUTPUT_${ext}.slcio ${EOS_USER_URL}//${EOS_PATH}/z2jmu_k10_10evt_${postfix}.slcio
copyout simout.log ${EOS_USER_URL}//${EOS_PATH}/z2jmu_k10_10evt_${postfix}.log
quit "All is done" 0
The major difference is that we use the xrdcp command to transfer the input and output files from and to the EOS space. Shared filesystems (i.e. AFS) are still available on the worker nodes, but it is not safe to rely on them.
In the singularity command we mount the condor spool directory as the user "HOME".
The last script is the executable that you run inside the container (for example a bash script). Remember that when you execute the singularity command you land (inside the container) in the same directory from which you ran the command, usually your AFS home directory or, as in the previous script, the condor spool directory.
The only difference from the local approach is that we don't need to create a unique directory for the job.
#!/bin/bash
echo "Start job at `date`"
# Job's argument
IDX=$1
# number of events to process
NEVT=10
# set the muoncollider environment
source /opt/ilcsoft/muonc/init_ilcsoft.sh
# define the arguments for the ddsim script
INPUT=input.hepmc
# NB the output file name must be the same as the one used in the previous script
ext=$(( 10#${IDX} ))
OUTPUT=OUTPUT_${ext}.slcio
skipEvt=$(( ( 10#${IDX} ) * $NEVT ))
To submit the job, setting its argument, use the usual condor command:
condor_submit -spool IDX=01 job.sub
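With -spool the job's output files are held in the schedd's spool area after completion; they can then be retrieved with the standard condor_transfer_data tool (the cluster id below is just an example):

```
# fetch the spooled output files of cluster 1234 into the current directory
condor_transfer_data 1234
```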
A more complex example
If you need to reconstruct events using your own customized code (e.g. a custom processor), you first need to commit your code to git in an ad hoc branch (or you can zip your code as an input file).
In the following example we will use as configuration file one of the official ones committed to the ProductionConfig package. We will also see how to manage the BIB files.
We use the "spool" method, so the submission script is like the previous one (note that reconstruction needs more memory):
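The extra memory is requested in the submit file; a one-line sketch, where the value is an assumption to tune for your reconstruction:

```
# sketch: ask for more memory for reconstruction (value is an assumption)
request_memory = 4 GB
```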
The executable is also similar to the previous one. Note that we cannot copy all of the 1000 BIB files, so we choose NBIBs files randomly that will be overlaid with our signal:
### function to copy a file to the eos space
# ARG1: input file name
# ARG2: output file name
copy() {
  IN=$1
  OUT=$2
  if ! xrdcp -f $IN ${OUT} ; then
    quit "ERROR! Failed to transfer $IN" 2
  fi
}
## Retrieve the input Signal file
copy ${EOS_USER_URL}//${EOS_PATH}/${INPUT} input.slcio
## Retrieve the input BKG files
BKGPre=${EOS_EXP_URL}//${EOS_BIB}/sim_mumu-1e3x500-26m-lowth-excl_seed
BKGPost="_allHits.slcio"
# total number of available BIB files
BKGTot=1000
# array with BIB file names
BIBs=()
for (( i=1; i<=${NBIBs}; i++ )); do
  RNDBKG=$(printf "%04d" $(($RANDOM % $BKGTot)) )
  BKGFILE=${BKGPre}${RNDBKG}${BKGPost}
  BIBFILE=BKG_seed${RNDBKG}.slcio
  copy $BKGFILE $BIBFILE
  BIBs+=( $BIBFILE )
done
# exec the singularity container
echo "Start singularity job at `date`"
singularity exec -B $PWD $DOCKER ./job.sh ${IDX} "${BIBs[@]}" &> job.log
echo "End job at `date`"
More interesting is the job that needs to be executed in the container:
#!/bin/bash
## ARG1: the job index
## ARG2...ARGn: BIB file names
JOB=$1
shift
# the other args are BIB files
BIBs=("$@")
# define how many events per job
EVTxJOB=500
if [ ${JOB} -eq 0 ]; then
  # the first job needs one event more...
  NEVT=$(( ${EVTxJOB} + 1 ))
  SKIP=0
else
  NEVT=${EVTxJOB}
  SKIP=$(( ${EVTxJOB} * ${JOB} ))
fi
## Function to compile a processor, it also redefines MARLIN_DLL
# ARG1: processor name
compile () {
  PROCNAME=$1
  WD=`pwd`
  mkdir -p BUILD/${PROCNAME}
  cd BUILD/${PROCNAME}
  cmake -C $ILCSOFT/ILCSoft.cmake -DBOOST_INCLUDEDIR=/usr/include/boost173 -DBOOST_LIBRARYDIR=/usr/lib64/boost173 ../../${PROCNAME}
  make
  cd $WD
  # if the stock library is already in MARLIN_DLL replace it, otherwise append ours
  echo $MARLIN_DLL | grep ${PROCNAME} &> /dev/null
  if [ $? -eq 0 ]; then
    NEWDLL=$(echo $MARLIN_DLL | sed -e "s#/opt/ilcsoft/muonc/${PROCNAME}/.*/lib/lib${PROCNAME}.so#${BASE}/BUILD/${PROCNAME}/lib/lib${PROCNAME}.so#g")
  else
    NEWDLL=$MARLIN_DLL:${BASE}/BUILD/${PROCNAME}/lib/lib${PROCNAME}.so
  fi
  export MARLIN_DLL=$NEWDLL
}
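The events-per-job bookkeeping above is easy to check in isolation; this small loop (a standalone sketch, not part of the job script) prints the events processed and skipped for the first few job indices:

```shell
#!/bin/bash
# standalone check of the NEVT/SKIP arithmetic used in the job script
EVTxJOB=500
for JOB in 0 1 2; do
  if [ ${JOB} -eq 0 ]; then
    # the first job processes one extra event
    NEVT=$(( EVTxJOB + 1 ))
    SKIP=0
  else
    NEVT=${EVTxJOB}
    SKIP=$(( EVTxJOB * JOB ))
  fi
  echo "job ${JOB}: process ${NEVT} events, skip ${SKIP}"
done
```

Job 0 processes 501 events from the start of the file; every later job processes 500 events after skipping all the events consumed by the jobs before it.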
With these lines, for example, condor submits 40 jobs starting from Arguments=20 (see the first line). In other words this is equivalent to the bash command line:
for index in {20..59}; do condor_submit -spool IDX=${index} job.sub; done