How to use Python libraries

Problem description: A user who is part of an experiment needed Python modules that are not installed by default on the user interface or on the worker nodes, so he asked user support for help.

Solution:

Such modules can be installed in virtual environments with conda, and the user's experiment (Juno) already had a conda installation in its cvmfs area.

A virtual environment is a working, isolated copy of Python that maintains its own files and directories so that you can work with specific versions of libraries without affecting other Python projects. Virtual environments simplify the clean separation of different projects and avoid problems with different dependencies and version requirements between components.
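
As a minimal sketch of what this isolation looks like in practice, the following Python snippet reports which interpreter and which environment are in use. It assumes that conda sets the CONDA_DEFAULT_ENV and CONDA_PREFIX variables when an environment is activated; the function name describe_environment is only illustrative.

```python
import os
import sys


def describe_environment(environ=None):
    """Return the interpreter path, its prefix and the active conda environment, if any."""
    environ = os.environ if environ is None else environ
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Assumed to be set by "conda activate"; None when no environment is active.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```

Running it before and after activating an environment should show a different prefix in the two cases.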

The conda command is the interface for managing virtual installations and environments with the Anaconda Python distribution.

Conda is already installed on ui-tier1, so we can use it.

If a particular version of conda is needed, as for the user of the Juno experiment, conda environments with Python modules can be created from an installation on cvmfs:

$ source /cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/etc/profile.d/conda.sh

We can then check which versions of Python are available with:

$ conda search "^python$"

We choose a name for our virtual environment, for example <yourcondaenv>, and the latest available version of Python 2 (Python 2.7.18), then type:

$ conda create -n <yourcondaenv> python=2.7.18 anaconda

A list of the new packages to be installed is then shown.

Unless a different path is specified, the Python packages are installed in a subfolder of the user's home directory.

When the installation finishes, signalled by three "done" messages, we can activate our virtual environment by typing:

$ conda activate <yourcondaenv>

The command prompt now shows which conda environment is active by prefixing '(<yourcondaenv>)'.

To install an additional package, here called "package", only in the "<yourcondaenv>" virtual environment, type:

$ conda install -n <yourcondaenv> [package]
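
After installing, it can help to verify from Python that the modules are actually importable in the active environment. The sketch below is one way to do that; missing_modules is a hypothetical helper name, and the module list is only an example.

```python
import importlib


def missing_modules(names):
    """Return the names in `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example list: replace it with the modules your work needs.
    wanted = ["h5py", "matplotlib"]
    print("missing: {}".format(missing_modules(wanted)))
```

An empty list means every requested module was found by the interpreter that ran the script.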

To end a session in the current environment, we type:

$ conda deactivate

It is not necessary to specify the name of the virtual environment: the one that is currently active will be deactivated.

To delete a conda environment that is no longer needed, we pass its name to the command:

$ conda remove -n <yourcondaenv> --all
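
Before removing an environment it can be useful to check which environments exist. The sketch below assumes that "conda env list --json" prints a JSON object with an "envs" list of environment paths, and that the CONDA_EXE variable exported by conda.sh points to the conda executable; both function names are illustrative.

```python
import json
import os
import subprocess


def parse_env_list(json_text):
    """Map each environment name (last path component) to its full path."""
    prefixes = json.loads(json_text).get("envs", [])
    # The base installation appears under the name of its install directory.
    return {os.path.basename(p.rstrip("/")): p for p in prefixes}


def list_conda_envs():
    """Ask conda for its environments and return them as a dict."""
    # Assumption: CONDA_EXE is set after sourcing conda.sh; otherwise try PATH.
    conda = os.environ.get("CONDA_EXE", "conda")
    output = subprocess.check_output([conda, "env", "list", "--json"])
    return parse_env_list(output.decode("utf-8"))


if __name__ == "__main__":
    for name, path in sorted(list_conda_envs().items()):
        print("{}  {}".format(name, path))
```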

Case 2: let's see how to configure the conda virtual environment when Python modules are needed in an HTCondor job.

A possible solution is to create the conda environment in a path on shared storage, so that it can be accessed by the worker nodes.

A working example, which references the mini-conda installation that is already on /opt/exp_software/cta, is the following conda.sh file. We can see its content by typing:
```
$ cat /storage/gpfs_data/juno/test_conda/conda.sh
```
export CONDA_EXE='/cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/bin/conda'
export _CE_M=''
export _CE_CONDA=''
export CONDA_PYTHON_EXE='/cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/bin/python'
[...]
This file is also available at /cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/etc/profile.d/conda.sh
With the command:
```
$ source /storage/gpfs_data/juno/test_conda/conda.sh 
```
the 'conda.sh' file is executed in the current shell; source is used so that the conda settings take effect in that shell.

Through the command:
```
$ conda create -p /storage/gpfs_data/juno/test_conda/ python=3.8.0 anaconda
```
a conda environment with Python 3.8.0 is created in that storage path.

The screen will show "Continue creating environment (y/[n])?" and we must confirm by typing 'y'.

We move into the test folder:
```
$ cd /storage/gpfs_data/juno/test_conda/  
```
where we will create two files: test.sh and test.sub.

The test.sub file is the submit description: it specifies the executable and the output, error and log files. The test.sh file is the script that the job runs.
We write their content with:
```
$ nano test.sub
```
(we put the following content:)
#+owner = undefined
universe = vanilla
executable = test.sh
output = /storage/gpfs_data/juno/test_conda/job.out
error = /storage/gpfs_data/juno/test_conda/job.err
log = /storage/gpfs_data/juno/test_conda/job.log
WhenToTransferOutput = ON_EXIT
ShouldTransferFiles = YES
queue 1

and

$ nano test.sh

#!/bin/bash
/storage/gpfs_data/juno/test_conda/bin/python --version
/storage/gpfs_data/juno/test_conda/bin/python -c 'import h5py'
/storage/gpfs_data/juno/test_conda/bin/python -c 'import matplotlib'

The last two lines of the script import the packages we need (here h5py and matplotlib) to check that they are available.
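
When several similar jobs are needed, the submit description can also be generated rather than typed by hand. The following is only a sketch under that assumption: render_submit is a hypothetical helper that reproduces the keys used in the example above for a given working directory.

```python
import posixpath
import sys


def render_submit(workdir, executable="test.sh", n_jobs=1):
    """Return the text of a vanilla-universe submit description."""
    lines = [
        "universe = vanilla",
        "executable = {}".format(executable),
        "output = {}".format(posixpath.join(workdir, "job.out")),
        "error = {}".format(posixpath.join(workdir, "job.err")),
        "log = {}".format(posixpath.join(workdir, "job.log")),
        "WhenToTransferOutput = ON_EXIT",
        "ShouldTransferFiles = YES",
        "queue {}".format(n_jobs),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the description for the folder used in this example.
    sys.stdout.write(render_submit("/storage/gpfs_data/juno/test_conda"))
```

The output can be redirected to a .sub file and passed to condor_submit as usual.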

We type the command:

$ condor_submit -name sn-01 /storage/gpfs_data/juno/test_conda/test.sub

to submit the job through the sn-01 schedd (submit node), and we will get:

Submitting job(s).
1 job(s) submitted to cluster 14243269

Subsequently, with:

$ condor_q -name sn-01 $(whoami)

we can follow the progress of the job.

At the end, if everything went well, we will read:

Total for query: 1 jobs; 1 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
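
If this summary has to be read by a script rather than by eye, the line can be split into counts. The sketch below assumes the summary keeps the "<number> <state>" layout shown above; parse_totals is an illustrative name.

```python
import re


def parse_totals(line):
    """Parse a 'Total for query' summary line into a dict of counts."""
    counts = {}
    # Each field looks like "<number> <state>", for example "1 completed".
    for number, state in re.findall(r"(\d+)\s+([a-z]+)", line):
        counts[state] = int(number)
    return counts


if __name__ == "__main__":
    summary = ("Total for query: 1 jobs; 1 completed, 0 removed, "
               "0 idle, 0 running, 0 held, 0 suspended")
    print(parse_totals(summary))
```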

Finally, we check that there are no errors with:

$ cat /storage/gpfs_data/juno/test_conda/job.err

and view the output with:

$ cat /storage/gpfs_data/juno/test_conda/job.out

Python 3.8.0

On the worker node we were able to use Python installed with conda and load a module that is not present on the user interface.
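
The same check on job.err and job.out can be scripted, for instance when many jobs are submitted. This is a minimal sketch that treats an empty or absent error file as success, which is an assumption that may not hold for programs that write warnings to standard error; check_job is a hypothetical helper.

```python
import os


def check_job(err_path, out_path):
    """Return (ok, output); ok is True when the error file is empty or absent."""
    has_errors = os.path.exists(err_path) and os.path.getsize(err_path) > 0
    output = ""
    if os.path.exists(out_path):
        with open(out_path) as handle:
            output = handle.read()
    return (not has_errors), output


if __name__ == "__main__":
    workdir = "/storage/gpfs_data/juno/test_conda"
    ok, output = check_job(os.path.join(workdir, "job.err"),
                           os.path.join(workdir, "job.out"))
    print("ok: {}".format(ok))
    print(output)
```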

Some users have scripts to automate job submission. These scripts periodically check the queue status. We recommend checking no more often than once every 5 minutes. Example:
```
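import time

while True:
    job_pend = calc_pending()  # user-defined: count the pending jobs
    if job_pend < max_pend:
        submit_jobs()          # user-defined: submit more jobs
    time.sleep(300)            # wait 5 minutes before the next check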
```
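
A variant of this loop that can be tried out without touching the real queue is sketched below. The names throttled_submit, interval_s and max_cycles are assumptions introduced for illustration; calc_pending and submit_jobs remain callables that the user has to provide.

```python
import time


def throttled_submit(calc_pending, submit_jobs, max_pend,
                     interval_s=300, max_cycles=None, sleep=time.sleep):
    """Submit only while fewer than max_pend jobs are pending.

    calc_pending and submit_jobs are callables supplied by the user.
    Returns the number of times submit_jobs was called.
    """
    submissions = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        if calc_pending() < max_pend:
            submit_jobs()
            submissions += 1
        cycle += 1
        sleep(interval_s)  # 300 s = 5 minutes between queue checks
    return submissions
```

Passing a fake sleep function and a small max_cycles allows the logic to be checked quickly; with the defaults the loop runs indefinitely and waits 5 minutes between checks.
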
If a large number of files has to be recalled from tape, it is strongly suggested to do so in agreement with the Experiment Support group, which can help and facilitate the procedure.

If you need to build a user interface from scratch, you can run the following commands on your CentOS Linux machine.
As root:

rm /etc/yum.repos.d/epel* /etc/yum.repos.d/UMD*
yum remove epel-release-7-11.noarch
yum install epel-release
yum install yum-priorities
yum install http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/umd-release-4.1.3-1.el7.centos.noarch.rpm
yum clean all
yum update
yum install ui
cat /etc/cron.d/fetch-crl; cat /usr/sbin/fetch-crl (these check that the cron entry for fetch-crl and the executable it calls both exist)
ls /etc/grid-security/certificates/ (this checks that the directory contains a number of files in .0 and .r0 format)
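
The last check can also be done by counting the files instead of reading the listing. The sketch below assumes that only the file name endings matter; count_ca_files is a hypothetical helper that takes a list of file names.

```python
import os


def count_ca_files(names):
    """Count how many of the given file names end in .0 and in .r0."""
    return {
        ".0": sum(1 for n in names if n.endswith(".0")),
        ".r0": sum(1 for n in names if n.endswith(".r0")),
    }


if __name__ == "__main__":
    directory = "/etc/grid-security/certificates/"
    print(count_ca_files(os.listdir(directory)))
```

Both counts being greater than zero suggests that the directory has been populated.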
