Author(s)
Name | Institution | Mail Address |
---|---|---|
Matteo Migliorini | INFN Sezione di Padova | matteo.migliorini@pd.infn.it |
General Information
ML/DL Technologies | Feedforward neural network |
---|---|
Science Fields | High Energy Physics |
Difficulty | Low |
Language | English |
Type | Runnable |
Software and Tools
Programming Language | Python |
---|---|
ML Toolset | BigDL |
Additional libraries | BigDL, Spark |
Suggested Environments | INFN-Cloud VM, Spark Cluster |
Short Description of the Use Case
The effective use at scale of complex machine learning techniques for High Energy Physics (HEP) use cases poses several technological challenges, most importantly the actual implementation of dedicated end-to-end data pipelines. In the original paper we presented a possible solution to these challenges, built using industry-standard big data tools such as Apache Spark. The presented pipeline exploits Spark in all of its steps, from ingesting and processing the ROOT files containing the dataset to the distributed training of the model.
...
In this simple example, we will train a deep neural network to classify three different kinds of events. A more detailed description of the HEP use case and of the model used is provided in the original paper.
How to execute it
A Spark cluster is required in order to perform the training. This can be created on INFN Cloud by performing the following steps:
...
In principle they should work out of the box, without modification. If you wish to use more workers for the training and additional worker (slave) nodes are available, you can increase the number of executors in the Spark session through the options "spark.executor.instances" and "spark.cores.max" (the latter should be equal to executor.cores * executor.instances).
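The relation between these options can be sketched as follows. This is a minimal illustration, not the configuration shipped with the example: the helper names `executor_conf` and `apply_conf` and the values 4 and 2 are hypothetical, and only the three Spark option names are real settings. The helper computes "spark.cores.max" from the other two values, so the options cannot drift out of sync when the number of executors is changed.

```python
def executor_conf(executor_instances, executor_cores):
    """Build Spark options with spark.cores.max = executor.cores * executor.instances.

    Hypothetical helper for illustration; not part of the original notebooks.
    """
    if executor_instances < 1 or executor_cores < 1:
        raise ValueError("executor_instances and executor_cores must be >= 1")
    return {
        "spark.executor.instances": str(executor_instances),
        "spark.executor.cores": str(executor_cores),
        # Total cores requested from the cluster: cores per executor times executors.
        "spark.cores.max": str(executor_instances * executor_cores),
    }


def apply_conf(builder, conf):
    """Apply each option to a SparkSession builder via its config(key, value) method."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Example values only (assumption): 4 executors with 2 cores each.
conf = executor_conf(executor_instances=4, executor_cores=2)
for key, value in conf.items():
    print(f"{key} = {value}")
```

With PySpark available, the options could then be passed to the session with `apply_conf(SparkSession.builder, conf).getOrCreate()`, where `SparkSession` is imported from `pyspark.sql`.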
References
- A detailed description of the work around this example can be found in this paper:
Migliorini, M., Castellotti, R., Canali, L. et al. Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics. Comput Softw Big Sci 4, 8 (2020). https://doi.org/10.1007/s41781-020-00040-0
- Data and code used for this work can be found in this repository.
- Related blog entries:
- Presentations:
...