How to use Python libraries in a conda virtual environment

A virtual environment is a working, isolated copy of Python that maintains its own files and directories, so that a user can work with specific versions of libraries without affecting other Python projects. Virtual environments keep different projects cleanly separated and avoid conflicts between the dependencies and version requirements of their components.

The conda command is the interface for managing installations and virtual environments with the Anaconda Python distribution.
In this way, a user can work with Python modules that are not already installed on the user interface.

On a user interface

Conda is already installed on the ui-tier1 user interface at CNAF. However, if a particular version of conda is needed, a conda environment with the desired Python modules can be created starting from the installation available on CVMFS:

$ source /cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/etc/profile.d/conda.sh
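Once sourced, the script defines conda in the current shell; a quick sanity check (not part of the original recipe) is to verify that conda resolves and reports its version:

$ type conda
$ conda --version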

The versions of Python available for conda can be checked by launching the command:

$ conda search "^python$"

After this, the user has to choose a name for the virtual environment, for example <yourcondaenv>, and a Python version, here the latest Python 2 release available (2.7.18):

$ conda create -n <yourcondaenv> python=2.7.18 anaconda

This installs several new packages.
Unless a different path is specified, the environment and its Python packages are placed in a subfolder of the user's home directory.
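To check where the environment actually ended up, the list of known environments and their locations can be printed:

$ conda env list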

At the end of the installation, the virtual environment can be activated by typing:

$ conda activate <yourcondaenv>

The command prompt now changes to indicate the conda environment currently in use.
To install an additional package, denoted here as [package], only in the <yourcondaenv> virtual environment, it is enough to type:

$ conda install -n <yourcondaenv> [package]
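As a concrete example (the numpy package is used here purely for illustration), one can install a package and verify that it imports, without activating the environment, via conda run:

$ conda install -n <yourcondaenv> numpy
$ conda run -n <yourcondaenv> python -c 'import numpy; print(numpy.__version__)'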

Finally, to end the session of the current environment:

$ conda deactivate


NB: It is not necessary to specify the name of the virtual environment: the one that is currently active will be deactivated.

If a user wants to delete a conda environment that is no longer necessary, the name of the conda environment must be indicated:

$ conda remove -n <yourcondaenv> --all

In an HTCondor job

Let's see how to configure a conda virtual environment when Python modules are needed inside an HTCondor job.

A possible solution is to create the conda environment in a path on shared storage that the worker nodes can access.
The shell script below can be used to set up such a conda virtual environment: it points to the Anaconda installation on CVMFS, while the environment itself lives under /opt/exp_software/<experiment>.

$ cat /opt/exp_software/juno/test_conda/conda.sh
export CONDA_EXE='/cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/bin/conda'
export _CE_M=''
export _CE_CONDA=''
export CONDA_PYTHON_EXE='/cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/bin/python'
[...]

The script can be executed by the command:

$ source /opt/exp_software/juno/test_conda/conda.sh
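Incidentally, a script equivalent to conda.sh can be generated by conda itself through its shell.bash hook subcommand; this is only a sketch, assuming the same CVMFS installation and write access to the target path:

$ /cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/bin/conda shell.bash hook > /opt/exp_software/juno/test_conda/conda.sh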

Then, by launching

$ conda create -p /opt/exp_software/juno/test_conda/ python=3.8.0 anaconda

the conda virtual environment is created in /opt/exp_software/juno/test_conda/ with Python 3.8.0.
During the creation, a confirmation is requested:

"Continue creating environment (y/[n])?"
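Before submitting any job, the new environment can be checked directly from the user interface; the h5py query below is just an example:

$ /opt/exp_software/juno/test_conda/bin/python --version
$ conda list -p /opt/exp_software/juno/test_conda h5py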
In this case, the submit description file and the executable file have to be in the folder /opt/exp_software/juno/test_conda/.
The content of the submit description file is:

$ cat test.sub
executable = test.sh
output = /storage/gpfs_data/juno/test_conda/job.out
error = /storage/gpfs_data/juno/test_conda/job.err
log = /storage/gpfs_data/juno/test_conda/job.log
WhenToTransferOutput = ON_EXIT
ShouldTransferFiles = YES
queue 1

and for the script:

$ cat test.sh
#!/bin/bash
# Print the Python version of the shared conda environment
/opt/exp_software/juno/test_conda/bin/python --version
# Check that the required packages can be imported
/opt/exp_software/juno/test_conda/bin/python -c 'import h5py'
/opt/exp_software/juno/test_conda/bin/python -c 'import matplotlib'

In this instance, the h5py and the matplotlib packages are imported.

To send the job to the submit node sn-02:

$ condor_submit -name sn-02.cr.cnaf.infn.it -spool test.sub
Submitting job(s).
1 job(s) submitted to cluster 14243269
[...]

and to check that all is going well so far:

$ condor_q -name sn-02.cr.cnaf.infn.it 14243269
[...]
Total for query: 1 jobs; 1 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

Finally, when the job completes, the output and the errors of the job can be retrieved via the condor_transfer_data command (spelled out after the list below) and found at:

  • /storage/gpfs_data/juno/test_conda/job.err
  • /storage/gpfs_data/juno/test_conda/job.out
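Spelled out with the submit node indicated, as for the other HTCondor commands above, the retrieval command reads:

$ condor_transfer_data -name sn-02.cr.cnaf.infn.it 14243269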

Other tips

  • In case a large number of files has to be recalled from tape, it is strongly suggested to do so in agreement with the Experiment Support group, which can help and facilitate the procedure.

  • If you need to build a user interface from scratch, you can run the following commands on your CentOS Linux machine, as root:

    rm /etc/yum.repos.d/epel* /etc/yum.repos.d/UMD*
    yum remove epel-release-7-11.noarch
    yum install epel-release
    yum install yum-priorities
    yum install http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/umd-release-4.1.3-1.el7.centos.noarch.rpm
    yum clean all
    yum update
    yum install ca-policy-egi-core -y
    yum install fetch-crl 
    yum install ui
    systemctl enable --now fetch-crl-cron
    
    cat /etc/cron.d/fetch-crl; cat /usr/sbin/fetch-crl   # check that the cron entry and the executable it calls for fetch-crl exist
    ls /etc/grid-security/certificates/                  # check that this directory contains a set of files in .0 and .r0 format