Short Description of the Use Case

In this exercise, we perform a binary classification task using 2018 CMS Monte Carlo (MC) simulated samples of the signal, Vector Boson Fusion (VBF) Higgs boson production in the four-lepton final state, and of its main background processes. Two Machine Learning (ML) algorithms will be implemented: an Artificial Neural Network (ANN) and a Random Forest (RF).

Learning Goals of the exercise

  • You will learn how a Multivariate Analysis algorithm works and how a Machine Learning model is implemented;

  • you will acquire basic knowledge of Higgs boson physics as described by the Standard Model. During the exercise, you will be invited to plot some physical quantities in order to understand the underlying Particle Physics problem;

  • you will be invited to change the hyperparameters of the ANN and RF algorithms in order to better understand their effect on model performance;

  • you will see that the choice of input variables is key to the quality of the algorithm, since an optimal choice gives the best achievable performance;

  • moreover, you will be able to change the background datasets and the decay channels and see how the performance of the ML algorithms changes.

Multivariate Analysis and Machine learning algorithms: basic concepts

Multivariate Analysis algorithms receive as input a set of discriminating variables. No single variable, taken alone, provides optimal discrimination between the two categories (signal and background). The algorithms therefore compute an output that combines the input variables.

This is what every Multivariate Analysis (MVA) discriminator does. The discriminant output, also called the discriminator, score, or classifier, is used as a test statistic to perform the signal selection: it can be treated as a variable on which a cut is applied in a particular hypothesis test.
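The idea of combining variables and then cutting on the result can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the exercise: the toy Gaussian samples, the helper names make_events, score and efficiency, and the choice of a plain sum as the discriminant. It compares a cut on one input variable with the same cut on the combined score.

```python
import random


def make_events(rng, n, mean):
    """Generate n toy events, each with two unit-width Gaussian variables."""
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n)]


def score(event):
    """Illustrative discriminant: the plain sum of the two input variables."""
    x1, x2 = event
    return x1 + x2


def efficiency(values, cut):
    """Fraction of values that pass the selection value > cut."""
    return sum(v > cut for v in values) / len(values)


rng = random.Random(42)  # fixed seed so the toy samples are reproducible
signal = make_events(rng, 5000, mean=0.5)       # assumed toy signal
background = make_events(rng, 5000, mean=-0.5)  # assumed toy background

cut = 0.0

# Selection applied to a single input variable.
sig_eff_x1 = efficiency([e[0] for e in signal], cut)
bkg_eff_x1 = efficiency([e[0] for e in background], cut)

# Same selection applied to the combined score.
sig_eff_score = efficiency([score(e) for e in signal], cut)
bkg_eff_score = efficiency([score(e) for e in background], cut)

print(f"x1 only: signal eff = {sig_eff_x1:.3f}, background eff = {bkg_eff_x1:.3f}")
print(f"score  : signal eff = {sig_eff_score:.3f}, background eff = {bkg_eff_score:.3f}")
```

In this toy setup the cut on the score keeps more signal and less background than the cut on a single variable. A real MVA replaces the hand-written sum with a combination learned from the training samples.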

In particular, Machine Learning tools are models with enough capacity to define their own internal representation of the data in order to accomplish two main tasks: learning from data and making predictions without being explicitly programmed to do so.
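To make "learning from data and making predictions" concrete, here is a minimal sketch of a single artificial neuron (logistic regression) trained by gradient descent on toy data. It is not the ANN or the RF used in the exercise. The Gaussian samples, the function names, the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_sample(rng, n_per_class):
    """Toy labelled sample: label 1 = signal, label 0 = background."""
    events, labels = [], []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            events.append((rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)))
            labels.append(label)
    return events, labels


def train(events, labels, epochs=200, lr=0.1):
    """Fit two weights and a bias by full-batch gradient descent
    on the binary cross-entropy loss."""
    w1 = w2 = b = 0.0
    n = len(events)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(events, labels):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y  # dLoss/dz for this event
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(params, event):
    """Return the model score (estimated signal probability) for one event."""
    w1, w2, b = params
    return sigmoid(w1 * event[0] + w2 * event[1] + b)


rng = random.Random(0)  # fixed seed for reproducibility
train_events, train_labels = make_sample(rng, 500)
test_events, test_labels = make_sample(rng, 500)  # independent sample for evaluation

params = train(train_events, train_labels)
n_correct = sum(
    (predict(params, e) > 0.5) == (y == 1)
    for e, y in zip(test_events, test_labels)
)
accuracy = n_correct / len(test_events)
print(f"weights and bias: {params}")
print(f"accuracy on the independent test sample: {accuracy:.3f}")
```

The weights are never written by hand: they are adjusted from the labelled training events, and the resulting model is then applied to events it has not seen before.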

How to execute it

Use Google Colab

...