How to use Python libraries in a conda virtual environment
A virtual environment is a working, isolated copy of Python that maintains its own files and directories so that a user can work with specific versions of libraries without affecting other Python projects. Virtual environments simplify the clean separation of different projects and avoid problems with different dependencies and version requirements between components.
The conda command is the interface for managing virtual installations and environments with the Anaconda Python distribution.
In this way, a user can use Python modules that are not installed on the user interface.
On a user interface
Conda is installed on the ui-tier1 user interface at CNAF. If a particular version of conda is needed, it can be loaded from CVMFS and then used to create a conda environment with the required Python modules:
$ source /cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/etc/profile.d/conda.sh
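Since the setup script lives on CVMFS, it can help to check that it is readable before sourcing it. The following is a minimal sketch; the helper name load_conda is hypothetical and not part of conda.

```shell
# Source a conda.sh setup script if it is readable; return 1 otherwise.
# $1: full path of the conda.sh script
load_conda() {
    if [ -r "$1" ]; then
        . "$1"
    else
        echo "Cannot read $1 (is CVMFS mounted?)" >&2
        return 1
    fi
}
```

A possible call is load_conda /cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/etc/profile.d/conda.sh && conda --version, which prints the conda version only if the script was loaded.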
The versions of Python available for conda can be checked by launching the command:
$ conda search "^python$"
After this, the user has to choose the name of the virtual environment, for example <yourcondaenv>, and the latest version of Python 2 available (Python 2.7.18):
$ conda create -n <yourcondaenv> python=2.7.18 anaconda
Because the anaconda metapackage is requested, several additional packages will be installed together with Python.
If a different path is not specified, the Python packages will be installed in a subfolder of the user's home.
At the end of the installation, the virtual environment can be activated by typing:
$ conda activate <yourcondaenv>
The command prompt now changes to show the conda environment that is currently active.
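Inside a script there is no prompt to look at, so one option is to read the variables exported by conda activate. This is a minimal sketch, assuming CONDA_DEFAULT_ENV and CONDA_PREFIX are set by the activation; the function name active_conda_env is hypothetical.

```shell
# Print the name of the active conda environment, or "none" if no
# environment is active. CONDA_DEFAULT_ENV is exported by "conda activate".
active_conda_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

echo "Active environment: $(active_conda_env)"
echo "Environment prefix: ${CONDA_PREFIX:-not set}"
```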
To install an additional package, denoted here as <package>, only in the <yourcondaenv> virtual environment, it is enough to type the command:
$ conda install -n <yourcondaenv> <package>
Finally, to end the session of the current environment:
$ conda deactivate
NB: It is not necessary to specify the name of the virtual environment: the one that is currently active will be deactivated.
If a user wants to delete a conda environment that is no longer necessary, the name of the conda environment must be indicated:
$ conda remove -n <yourcondaenv> --all
In an HTCondor job
This section shows how to configure the conda virtual environment when Python modules are needed in an HTCondor job.
A possible solution is to create the conda environment in a path that exists on a shared storage that can be accessed by the worker nodes.
The shell script below can be used to set up a conda virtual environment when the Miniconda installation is already available on /opt/exp_software/<experiment>.
$ cat /opt/exp_software/juno/test_conda/conda.sh
export CONDA_EXE='/cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/bin/conda'
export _CE_M=''
export _CE_CONDA=''
export CONDA_PYTHON_EXE='/cvmfs/juno.ihep.ac.cn/sw/anaconda/Anaconda3-2020.11-Linux-x86_64/bin/python'
[...]
The script can be loaded into the current shell with the command:
$ source /opt/exp_software/juno/test_conda/conda.sh
Then, launching
$ conda create -p /opt/exp_software/juno/test_conda/ python=3.8.0 anaconda
creates the conda virtual environment in /opt/exp_software/juno/test_conda/ with Python 3.8.
Before the environment is created, the screen will show
"Continue creating environment (y / [n])?"
so that a confirmation is requested. Answer y to continue.
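For non-interactive use, for example in a setup script, the confirmation can be given in advance. This is a minimal sketch, assuming the -y (--yes) option of conda create also answers this prompt on the installed conda version; the function name create_env_noninteractive is hypothetical.

```shell
# Create a conda environment at a given prefix without interactive prompts.
# $1: prefix directory of the environment
# $2: Python version to install
create_env_noninteractive() {
    conda create -y -p "$1" "python=$2" anaconda
}
```

A possible call is create_env_noninteractive /opt/exp_software/juno/test_conda/ 3.8.0, which corresponds to the command shown above with -y added.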
In this case, the submit description file and the executable file have to be in the folder /opt/exp_software/juno/test_conda/.
The content of the submit description file is:
$ cat test.sub
executable = test.sh
output = /storage/gpfs_data/juno/test_conda/job.out
error = /storage/gpfs_data/juno/test_conda/job.err
log = /storage/gpfs_data/juno/test_conda/job.log
WhenToTransferOutput = ON_EXIT
ShouldTransferFiles = YES
queue 1
The content of the script is:
$ cat test.sh
#!/bin/bash
/opt/exp_software/juno/test_conda/bin/python --version
/opt/exp_software/juno/test_conda/bin/python -c 'import h5py'
/opt/exp_software/juno/test_conda/bin/python -c 'import matplotlib'
In this instance, the h5py and the matplotlib packages are imported.
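The script above runs every import even if an earlier one fails, and its exit code reflects only the last command. A possible variant that reports each module and returns a non-zero code when something is missing is sketched below; the function name check_imports and the PYTHON_BIN variable are hypothetical, and the default interpreter path is the one used in test.sh.

```shell
#!/bin/bash
# Interpreter of the conda environment; override PYTHON_BIN to test elsewhere.
PYTHON_BIN="${PYTHON_BIN:-/opt/exp_software/juno/test_conda/bin/python}"

# Try to import each module given as argument.
# Returns the number of modules that could not be imported (0 = all fine).
check_imports() {
    failed=0
    for module in "$@"; do
        if "$PYTHON_BIN" -c "import $module" 2>/dev/null; then
            echo "OK: $module"
        else
            echo "MISSING: $module" >&2
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

In a job script, ending with the line check_imports h5py matplotlib makes the exit code of the job equal to the number of missing modules.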
To send the job to the submit-node sn-02:
$ condor_submit -name sn-02.cr.cnaf.infn.it -spool test.sub
Submitting job(s).
1 job(s) submitted to cluster 14243269
[...]
and to check that all is going well so far:
$ condor_q -name sn-02.cr.cnaf.infn.it 14243269
[...]
Total for query: 1 jobs; 1 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
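To check the state from a script instead of reading the summary line, one option is the numeric JobStatus attribute of the job. The sketch below only translates the code into a label; the function name job_status_name is hypothetical, and the condor_q call in the comment assumes that the -af (autoformat) option is available on the installed HTCondor version.

```shell
# Translate the numeric JobStatus attribute of an HTCondor job into a label.
# $1: value of JobStatus
job_status_name() {
    case "$1" in
        1) echo "idle" ;;
        2) echo "running" ;;
        3) echo "removed" ;;
        4) echo "completed" ;;
        5) echo "held" ;;
        6) echo "transferring output" ;;
        7) echo "suspended" ;;
        *) echo "unknown" ;;
    esac
}

# Example call, commented out because it needs access to the submit node:
# status=$(condor_q -name sn-02.cr.cnaf.infn.it 14243269 -af JobStatus)
# job_status_name "$status"
```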
Finally, when the job completes, its error and output files can be retrieved with the command
$ condor_transfer_data 14243269
and are then found at:
/storage/gpfs_data/juno/test_conda/job.err
/storage/gpfs_data/juno/test_conda/job.out
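A quick way to see whether the imports worked is to search the error file for Python import errors. This is a minimal sketch, assuming that a failed import leaves a traceback mentioning ModuleNotFoundError or ImportError in job.err; the function name has_import_error and the ERR_FILE variable are hypothetical.

```shell
# Return 0 if the given stderr file reports a failed Python import.
# $1: path of the error file written by the job
has_import_error() {
    grep -Eq 'ModuleNotFoundError|ImportError' "$1"
}

# Path of the error file from the submit description file; override if needed.
ERR_FILE="${ERR_FILE:-/storage/gpfs_data/juno/test_conda/job.err}"

if [ ! -f "$ERR_FILE" ]; then
    echo "No error file at $ERR_FILE yet"
elif has_import_error "$ERR_FILE"; then
    echo "At least one import failed:"
    cat "$ERR_FILE"
else
    echo "No import errors reported"
fi
```

Other messages in job.err, such as warnings printed by a library, do not count as failures in this check.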
Other tips
- If a large number of files has to be recalled from tape, it is strongly suggested to do so in agreement with the Experiment Support group, which can help and facilitate the procedure.
If you need to build a user interface from scratch, you can run the following commands on your AlmaLinux 9 machine. This guide requires sudo privileges.
$ sudo curl -L https://linuxsoft.cern.ch/wlcg/wlcg-el9.repo -o /etc/yum.repos.d/wlcg-el9.repo
$ sudo curl -L https://linuxsoft.cern.ch/wlcg/RPM-GPG-KEY-wlcg -o /etc/pki/rpm-gpg/RPM-GPG-KEY-wlcg
$ sudo curl -L https://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo -o /etc/yum.repos.d/egi-trustanchors.repo
$ sudo dnf install -y epel-release
$ sudo dnf config-manager --set-enabled crb
$ sudo dnf install -y ui
$ sudo systemctl enable --now fetch-crl.timer