Short Description of the Use Case

Introduction to the statistical analysis problem

In this exercise, we perform a binary classification task using 2018 CMS Monte Carlo (MC) simulated samples of the signal, Vector Boson Fusion (VBF) Higgs boson production in the four-lepton final state, and of its main background processes. Two Machine Learning (ML) algorithms will be implemented: an Artificial Neural Network (ANN) and a Random Forest (RF).

Learning Goals of the exercise

  • You will learn how a Multivariate Analysis algorithm works and how a Machine Learning model is implemented;
  • you will acquire basic knowledge of Higgs boson physics as described by the Standard Model. During the exercise, you will be invited to plot some physical quantities in order to understand the underlying Particle Physics problem;
  • you will be invited to change the hyperparameters of the ANN and RF algorithms in order to better understand how they affect the model performance;
  • you will see that the choice of the input variables is key to the quality of the algorithm, since an optimal choice gives the best achievable performance;
  • moreover, you will be able to change the background datasets and the decay channels and see how the performance of the ML algorithms changes.

Particle Physics basic concepts: the Standard Model and the Higgs boson

Experimental signature of the Higgs boson in a particle detector

...


Data exploration

In this exercise, we are mainly interested in the following ROOT files (see the ROOT file web page if you would like to learn more about the kinds of objects that can be stored in them):

  • VBF_HToZZTo4mu.root;
  • GluGlueHtoZZTo4mu.root;
  • ZZto4mu.root.

The VBF ROOT file contains the signal events: Higgs boson (mass of 125 GeV) production via the Vector Boson Fusion (VBF) mechanism (qqH). We want to discriminate them from the so-called Gluon Gluon Fusion (ggH) Higgs production events and from the QCD process ZZ → 4mu, which are both irreducible backgrounds (see the Feynman diagrams in the pictures and the cross-sections/branching ratios expected for the Higgs boson production processes and decay channels).
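
As a minimal sketch of how this signal/background assignment could become binary targets for training, the snippet below maps each file to a class label. The file names are the ones listed above. The label convention (1 for signal, 0 for background), the helper name make_labels and the event counts are assumptions made only for illustration.

```python
# Assumed convention: 1 = signal (VBF), 0 = background (ggH and ZZ).
PROCESS_LABELS = {
    "VBF_HToZZTo4mu.root": 1,     # qqH signal
    "GluGlueHtoZZTo4mu.root": 0,  # ggH background
    "ZZto4mu.root": 0,            # ZZ -> 4mu background
}


def make_labels(events_per_file):
    """Build one class label per event, file by file (hypothetical helper)."""
    labels = []
    for filename, n_events in events_per_file.items():
        labels.extend([PROCESS_LABELS[filename]] * n_events)
    return labels


# Made-up event counts, only to show the shape of the output.
example_counts = {
    "VBF_HToZZTo4mu.root": 3,
    "GluGlueHtoZZTo4mu.root": 2,
    "ZZto4mu.root": 1,
}
example_labels = make_labels(example_counts)
```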

The processes are characterized by the same final-state particles, but we can use the values of multiple variables, such as the kinematic properties of the particles, to classify the data into two categories: signal and background. The signal is the statistically less probable process producing the Higgs boson at the Large Hadron Collider (LHC) experiments, and it is still under study by the CMS collaboration.
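
To give a feeling for how a variable can separate the two categories, here is a minimal sketch of the elementary operation that a decision tree (the building block of a Random Forest) repeats many times: choosing the cut on one variable that minimises the Gini impurity. This is not the RF implementation used in the exercise. The function names and the toy numbers are assumptions for illustration and do not come from the CMS samples.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means a pure sample)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best single cut."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut can be placed between identical values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [y for x, y in pairs if x < threshold]
        right = [y for x, y in pairs if x >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy values of one hypothetical variable; 1 = signal, 0 = background.
toy_values = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, impurity = best_split(toy_values, toy_labels)
```

A single cut leaves some background on the signal side in this toy sample. Combining many such cuts on several variables is what lets a multivariate algorithm do better than any one-variable selection.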

In order to train our Machine Learning algorithms, we will look at the decay products of our physics problem. In our case, we are going to deal with:

  •  electrically-charged leptons (electrons or muons, denoted l)
  •  particle jets (collimated streams of particles originating from quarks or gluons, denoted j).

For each object, several kinematic variables are measured:

  •  the momentum transverse to the beam direction, pt
  •  two angles, θ (polar) and φ (azimuthal) - see picture below for the CMS reference frame used
  •  for convenience, at hadron colliders the pseudorapidity η, defined as η = -ln(tan(θ/2)), is used instead of the polar angle θ.
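
The relations between these variables can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function names are assumptions, and the Cartesian convention (z along the beam axis, φ measured in the transverse plane) is the usual collider one, assumed here rather than taken from the exercise notebook.

```python
import math


def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


def cartesian_momentum(pt, eta, phi):
    """Convert (pt, eta, phi) to (px, py, pz), z being the beam axis."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return px, py, pz


# A particle emitted perpendicular to the beam has theta = pi/2, so eta is ~0.
eta_central = pseudorapidity(math.pi / 2.0)
```

Particles close to the beam line have small θ and therefore large |η|, while particles perpendicular to the beam have η close to zero.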

We will use some of them for training our Machine Learning algorithms.

How to execute it

Use Google Colab

...