Author(s)

Name | Institution | Mail Address | Social Contacts
Luca Giommi | INFN-CNAF | luca.giommi@cnaf.infn.it | N/A
Mattia Paladino | University of Bologna | mattia.paladino2@unibo.it | N/A

How to Obtain Support

Mail | luca.giommi@cnaf.infn.it
Social | N/A
Jira | N/A

General Information

ML/DL Technologies | classification algorithms
Science Fields | High Energy Physics
Difficulty | low
Language | English
Type | runnable, fully annotated

Software and Tools

Programming Language | Python
ML Toolset | Keras, Tensorflow, sklearn, PyTorch, XGBoost
Additional libraries | uproot, matplotlib
Suggested Environments | Google Colab, Docker, own PC, INFN-Cloud VM

Needed datasets

Data Creator | ATLAS experiment
Data Type | simulation
Data Size | 57 MB compressed
Data Source | Kaggle, CERN opendata


Short Description of the Use Case

In this exercise, we use the MLaaS4HEP machinery to tackle the Higgs boson ML challenge, a competition held in 2014, organized by a group of ATLAS physicists and data scientists and hosted on the Kaggle platform.

...

  • files.txt stores the paths of the input ROOT files;
  • labels.txt stores the labels of the input ROOT files in case of classification problems;
  • model.py stores the definition of the custom ML model to use in the training phase, in the user’s favorite ML framework;

  • params.json stores the parameters on which MLaaS4HEP is based, e.g. number of events to use, chunk size, batch size, and redirector path for files located in remote storage;

  • preproc.json stores the definition of preprocessing operations to be applied to data.
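The plain-text and JSON inputs listed above can be prepared with a few lines of Python. The following is a minimal sketch, not the framework's own tooling: it assumes one entry per line in files.txt and labels.txt, and the params.json keys shown (nevts, chunk_size, batch_size, redirector) are illustrative names for the parameters mentioned above and should be checked against the MLaaS4HEP documentation. The helper write_inputs, the file paths, and the redirector URL are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_inputs(workdir, root_files, labels, params):
    """Write files.txt, labels.txt and params.json into workdir (hypothetical helper)."""
    if len(root_files) != len(labels):
        raise ValueError("need exactly one label per input ROOT file")
    workdir = Path(workdir)
    # Assumption: one ROOT file path per line, one label per line, same order.
    (workdir / "files.txt").write_text("\n".join(root_files) + "\n")
    (workdir / "labels.txt").write_text("\n".join(str(label) for label in labels) + "\n")
    (workdir / "params.json").write_text(json.dumps(params, indent=2) + "\n")
    return workdir


# Illustrative values only; the key names are assumptions, not a confirmed schema.
example_params = {
    "nevts": 30000,                        # number of events to use
    "chunk_size": 10000,                   # events read per chunk
    "batch_size": 100,                     # training batch size
    "redirector": "root://example.org",    # placeholder for remote storage
}

workdir = write_inputs(
    tempfile.mkdtemp(),
    ["/data/signal.root", "/data/background.root"],  # hypothetical paths
    [1, 0],                                           # signal = 1, background = 0
    example_params,
)
print(sorted(p.name for p in workdir.iterdir()))
```

Keeping the file list and the label list in the same order is what ties each ROOT file to its class in this sketch, so the helper rejects lists of different lengths.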

How to execute it

Way #1: Use Google Colab

You can run this Jupyter notebook on Google Colab by clicking here. It covers several steps: inspecting the data, running the MLaaS4HEP framework to obtain the trained ML models, and uploading the submission file to the Kaggle website.

Way #2: Use the MLaaS4HEP Docker image

If you prefer to run MLaaS4HEP on your own resources rather than in the Google Colab notebook, you can use the MLaaS4HEP Docker image, felixfelicislp/mlaas:xrootd_pip, instead of installing all the dependencies yourself. An example command is:

docker run --name={name} --memory={memory} --cpus={cpus} felixfelicislp/mlaas:xrootd_pip --files={files} --labels={labels} --model={model} --params={params} --fout={fout}
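To launch the same command from a script, the placeholders can be filled in programmatically. The sketch below only assembles the argument list, using the image name and the flags from the command above. The function build_docker_command and all example values are hypothetical, and the sketch adds no volume mounts, so it assumes the input files are already reachable from inside the container.

```python
import shlex


def build_docker_command(name, memory, cpus, files, labels, model, params, fout,
                         image="felixfelicislp/mlaas:xrootd_pip"):
    """Return the `docker run` argument list for one MLaaS4HEP job (hypothetical helper)."""
    return [
        "docker", "run",
        # Options before the image name are consumed by Docker itself.
        f"--name={name}",
        f"--memory={memory}",
        f"--cpus={cpus}",
        image,
        # Options after the image name are passed to the MLaaS4HEP entry point.
        f"--files={files}",
        f"--labels={labels}",
        f"--model={model}",
        f"--params={params}",
        f"--fout={fout}",
    ]


cmd = build_docker_command(
    name="higgs_training",   # hypothetical container name
    memory="4g",             # illustrative resource limits
    cpus="2",
    files="files.txt",
    labels="labels.txt",
    model="model.py",
    params="params.json",
    fout="model_out",        # hypothetical output name
)
print(shlex.join(cmd))
# To actually start the container: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when it is later handed to subprocess.run.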

Way #3: Use the MLaaS4HEP server

Another way to use the MLaaS4HEP framework is to interact with the APIs of the MLaaS4HEP server. We implemented a working prototype that connects an OAuth2-Proxy server, an MLaaS4HEP_server, an xrootd proxy-cache server, an X509 proxy renewer, and TFaaS, all hosted on a VM of INFN Cloud.

...

The former command trains an ML model, whereas the latter uses this model to get the prediction for a given event (stored in the predict_bkg.json file). All the instructions on how to use these services can be found here. A demo version of the services can be found here. A pictorial representation of the services is the following:


References

V. Kuznetsov, L. Giommi, D. Bonacorsi, MLaaS4HEP: Machine Learning as a Service for HEP. Comput Softw Big Sci 5, 17 (2021). DOI: 10.1007/s41781-021-00061-3

...