Author(s)

Name: Matteo Migliorini
Institution: INFN Sezione di Padova
Mail Address: matteo.migliorini@pd.infn.it

General Information

ML/DL Technologies: Feedforward neural network
Science Fields: High Energy Physics
Difficulty: Low
Language: English
Type: Runnable

Software and Tools

Programming Language: Python
ML Toolset: BigDL
Additional libraries: BigDL, Spark
Suggested Environments: INFN-Cloud VM, Spark Cluster

Short Description of the Use Case

The effective utilization at scale of complex machine learning techniques for HEP use cases poses several technological challenges, most importantly the actual implementation of dedicated end-to-end data pipelines. In the original paper we presented a possible solution to these challenges built using industry-standard big data tools, such as Apache Spark. In the presented pipeline we exploited Spark in all the steps, from ingesting and processing ROOT files containing the dataset to the distributed training of the model.

...

In this simple example, we will train a deep neural network with the goal of classifying three different kinds of events. A more detailed description of the HEP use case and of the model used is provided in the original paper.

How to execute it

A Spark cluster is required in order to perform the training. This can be created on INFN Cloud by performing the following steps:

...

In principle, they do not need to be modified and should work out of the box. If one wishes to use more workers for the training and slaves are available, the number of executors can be increased in the Spark session through the options "spark.executor.instances" and "spark.cores.max" (which should be equal to executor.cores*executor.instances).
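The relation between these options can be sketched as follows. This is only a minimal illustration, not the configuration used in the notebook: the helper names executor_settings and build_session, the application name, the master URL placeholder, and the values 4 and 2 are all assumptions made for the example. Only the option names and the rule spark.cores.max = executor.cores*executor.instances come from the text above.

```python
def executor_settings(instances, cores_per_executor):
    """Return Spark options for the requested number of executors.

    Hypothetical helper: it only encodes the rule that spark.cores.max
    should equal executor.cores * executor.instances.
    """
    if instances < 1 or cores_per_executor < 1:
        raise ValueError("instances and cores_per_executor must be >= 1")
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores_per_executor),
        "spark.cores.max": str(instances * cores_per_executor),
    }


def build_session(app_name, master_url, settings):
    """Create a Spark session with the given options (hypothetical helper).

    pyspark is imported here so that the settings can be inspected
    on a machine where Spark is not installed.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master(master_url)
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Example values (assumptions): 4 executors with 2 cores each.
settings = executor_settings(instances=4, cores_per_executor=2)
print(settings["spark.cores.max"])  # 8

# On the cluster, the session could then be created with something like:
# spark = build_session("hlf-classifier", "spark://<master-host>:7077", settings)
```

Computing "spark.cores.max" from the other two values, rather than typing it by hand, keeps the three options consistent when the number of executors is changed.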

References


Attachments

HLF classifier.ipynb

...