...

  • You will learn how a Multivariate Analysis algorithm works and how a Machine Learning model must be implemented;

  • you will acquire basic knowledge of Higgs boson physics as described by the Standard Model. During the exercise, you will be invited to plot some physical quantities in order to understand the underlying particle physics problem;

  • you will be invited to change the hyperparameters of the ANN and RF algorithms in order to better understand how they affect the performance of the model;

  • you will understand that the choice of the input variables is key to the quality of the algorithm, since an optimal choice yields the best achievable performance;

  • moreover, you will be able to change the background datasets and the decay channels, and see how the performance of the ML algorithms changes.

Multivariate Analysis and Machine Learning algorithms: basic concepts

Multivariate Analysis algorithms receive a set of discriminating variables as input. No single variable, taken alone, provides optimal discrimination power between the two categories (signal and background). The algorithms therefore compute an output that combines the input variables.
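
The idea of combining weak variables can be sketched with a toy example. The snippet below is only an illustration under stated assumptions, not the method used in this exercise: the two Gaussian variables, the helper names (roc_auc, make_events) and the equal-weight linear sum are all hypothetical choices. It compares the ROC area (AUC) obtained from each variable alone with the one obtained from their combination.

```python
import random
from bisect import bisect_left, bisect_right


def roc_auc(signal_scores, background_scores):
    """Area under the ROC curve: probability that a random signal event
    scores higher than a random background event (ties count as 0.5)."""
    bkg = sorted(background_scores)
    total = 0.0
    for s in signal_scores:
        lo = bisect_left(bkg, s)   # background events strictly below s
        hi = bisect_right(bkg, s)  # background events below or equal to s
        total += lo + 0.5 * (hi - lo)
    return total / (len(signal_scores) * len(bkg))


def make_events(n, mean, rng):
    """Toy events (assumption): two independent unit-width Gaussian variables."""
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n)]


rng = random.Random(42)                   # fixed seed for reproducibility
signal = make_events(2000, 1.0, rng)      # toy signal, shifted by +1 in both variables
background = make_events(2000, 0.0, rng)  # toy background, centred at 0

# Discrimination power of each input variable taken alone.
auc_x1 = roc_auc([e[0] for e in signal], [e[0] for e in background])
auc_x2 = roc_auc([e[1] for e in signal], [e[1] for e in background])

# Simplest possible combination (assumption): equal-weight sum of the inputs.
auc_combined = roc_auc([e[0] + e[1] for e in signal],
                       [e[0] + e[1] for e in background])

print(f"AUC using x1 alone:  {auc_x1:.3f}")
print(f"AUC using x2 alone:  {auc_x2:.3f}")
print(f"AUC using x1 + x2:   {auc_combined:.3f}")
```

In this toy setup the combined score should separate the two categories better than either variable alone. A real Multivariate Analysis algorithm learns the combination from the data instead of fixing it by hand.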

...

  • Training (learning): a discriminator is built by using all the input variables. Its parameters are then iteratively modified by comparing the discriminant output to the true label of the dataset (this is what supervised machine learning algorithms do; we will use two of them). This phase is crucial: one should tune both the input variables and the parameters of the algorithm!

    • As an alternative, algorithms that group the data and find patterns according to the observed distribution of the input data are called unsupervised learning algorithms.
    • A good habit is to train multiple models with various hyperparameters on a “reduced” training set (i.e. the full training set minus the so-called validation set), and then select the model that performs best on the validation set.
    • Once the validation process is over, you can re-train the best model on the full training set (including the validation set), and this gives you the final model.
  • Test: once the training has been performed, the discriminator score is computed on a separate, independent dataset for both categories (signal and background).

  • The classifier outputs on the test and training datasets are compared, and the performance on each (in terms of ROC curves) is evaluated.
    • If the performance on the test dataset differs significantly from that on the training dataset, this may be a symptom of overtraining, and the model should not be considered reliable!
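
The training, validation and test steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow used later in the exercise: a k-nearest-neighbours score stands in for the ANN and RF models, k plays the role of the hyperparameter, and the toy Gaussian dataset and all helper names (make_dataset, knn_score, evaluate) are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right


def roc_auc(signal_scores, background_scores):
    """Area under the ROC curve (ties between scores count as 0.5)."""
    bkg = sorted(background_scores)
    total = 0.0
    for s in signal_scores:
        lo, hi = bisect_left(bkg, s), bisect_right(bkg, s)
        total += lo + 0.5 * (hi - lo)
    return total / (len(signal_scores) * len(bkg))


def make_dataset(n_per_class, rng):
    """Toy labelled events (assumption): label 1 = signal, label 0 = background."""
    events = [((rng.gauss(label, 1.0), rng.gauss(label, 1.0)), label)
              for label in (1, 0) for _ in range(n_per_class)]
    rng.shuffle(events)
    return events


def knn_score(x, training_events, k):
    """Fraction of signal events among the k nearest training events."""
    nearest = sorted(
        training_events,
        key=lambda ev: (ev[0][0] - x[0]) ** 2 + (ev[0][1] - x[1]) ** 2)[:k]
    return sum(label for _, label in nearest) / k


def evaluate(events, training_events, k):
    """ROC AUC of the k-NN score computed on `events`."""
    sig = [knn_score(x, training_events, k) for x, y in events if y == 1]
    bkg = [knn_score(x, training_events, k) for x, y in events if y == 0]
    return roc_auc(sig, bkg)


rng = random.Random(7)
full_train = make_dataset(150, rng)  # full training set (300 events)
test = make_dataset(150, rng)        # separate, independent test set
reduced_train, validation = full_train[:200], full_train[200:]

# 1) Try several hyperparameter values on the reduced training set and
#    keep the one that performs best on the validation set.
val_auc = {k: evaluate(validation, reduced_train, k) for k in (1, 5, 25)}
best_k = max(val_auc, key=val_auc.get)

# 2) "Re-train" on the full training set (for k-NN this just means using
#    all training events) and compare training and test performance.
train_auc = evaluate(full_train, full_train, best_k)
test_auc = evaluate(test, full_train, best_k)

# 3) Reference case: k = 1 memorises the training events, so it looks
#    perfect on the training set but not on the independent test set.
overtrained_train_auc = evaluate(full_train, full_train, 1)
overtrained_test_auc = evaluate(test, full_train, 1)

print(f"validation AUC per k: {val_auc}")
print(f"best k = {best_k}: train AUC = {train_auc:.3f}, test AUC = {test_auc:.3f}")
print(f"k = 1: train AUC = {overtrained_train_auc:.3f}, "
      f"test AUC = {overtrained_test_auc:.3f}")
```

The k = 1 case shows the overtraining symptom described above: the score reproduces the training labels exactly, while the performance on the independent test set is visibly worse.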

Introduction to the physics problem

In this section you will find the following subsections:

...

  • VBF_HToZZTo4mu.root
  • GluGlueHtoZZTo4mu.root
  • ZZto4mu.root.

The VBF ROOT file contains Higgs boson (mass of 125 GeV) production events via the Vector Boson Fusion (VBF) mechanism. These are our signal events, which we want to discriminate from the so-called Gluon Gluon Fusion Higgs production events and from the QCD process, both of which are irreducible backgrounds. The picture below shows an example leading-order (LO) Feynman diagram of an irreducible background, together with the expected cross-sections for the Higgs boson production processes and the branching ratios for its decay channels.
[Image: LO Feynman diagram of an irreducible background; Higgs boson production cross-sections and decay branching ratios]
The processes are characterized by the same final-state particles, but we can use the values of multiple variables, such as the kinematic properties of the particles, to classify the data into two categories: signal and background.

The first one, VBF, is the statistically less probable of the two Higgs boson production processes at the Large Hadron Collider (LHC) experiments, and it is still under study by the CMS collaboration.

...