Introduction to the statistical analysis problem

In this exercise, we perform a binary classification task using 2018 CMS Monte Carlo (MC) simulated samples representing the Vector Boson Fusion (VBF) Higgs boson production in the four-lepton final state signal and its main background processes

...

. Two Machine Learning (ML) algorithms will be implemented: an Artificial Neural Network (ANN)

...

and a Random Forest (RF).

Learning Goals of the exercise

  • You will learn how a Multivariate Analysis algorithm works (see the introduction below) and, more specifically, how a Machine Learning model must be implemented;
  • you will acquire basic knowledge about Higgs boson physics as described by the Standard Model. During the exercise, you will be invited to plot some physical quantities in order to understand the underlying Particle Physics problem;
  • you will be invited to change the hyperparameters of the ANN and RF algorithms in order to better understand their effect on the models' performance;
  • you will understand that the choice of the input variables is key to the quality of the algorithm, since an optimal choice allows the best possible performance to be achieved;
  • moreover, you will have the possibility of changing the background datasets and the decay channels of the final-state particles, and of seeing how the ML algorithms' performance changes.

Multivariate Analysis and Machine Learning algorithms: basic concepts

Multivariate Analysis algorithms receive as input a set of discriminating variables. No single variable provides optimal discrimination between the two categories (signal and background) on its own. Therefore, the algorithms compute an output that combines the input variables.
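The benefit of combining variables can be illustrated with a toy example. The sketch below is not part of the exercise notebook: the Gaussian shapes, the means of ±0.5, the cut at zero and all function names are assumptions chosen only for illustration. It generates two weakly discriminating variables and compares a cut on each one alone with a cut on their sum.

```python
import random

random.seed(42)  # fixed seed so the toy result is reproducible

N_EVENTS = 5000  # events per class (illustrative choice)


def make_events(mean, n):
    """Generate n toy events, each with two independent Gaussian variables."""
    return [(random.gauss(mean, 1.0), random.gauss(mean, 1.0)) for _ in range(n)]


def accuracy(signal_values, background_values, cut=0.0):
    """Fraction of events classified correctly by 'value > cut means signal'."""
    correct = sum(v > cut for v in signal_values)
    correct += sum(v <= cut for v in background_values)
    return correct / (len(signal_values) + len(background_values))


# Toy samples: signal is shifted to +0.5, background to -0.5, in both variables.
signal = make_events(+0.5, N_EVENTS)
background = make_events(-0.5, N_EVENTS)

# Cut on each variable alone.
acc_x1 = accuracy([e[0] for e in signal], [e[0] for e in background])
acc_x2 = accuracy([e[1] for e in signal], [e[1] for e in background])

# Cut on a simple combination of the two variables (their sum).
acc_combined = accuracy(
    [e[0] + e[1] for e in signal],
    [e[0] + e[1] for e in background],
)

print(f"x1 alone:    {acc_x1:.3f}")
print(f"x2 alone:    {acc_x2:.3f}")
print(f"x1 + x2:     {acc_combined:.3f}")
```

A plain sum is the simplest possible combination; the ANN and the RF used in the exercise learn far more flexible combinations from the training data.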

...

  • one that contains events distributed according to the null hypothesis H0 (in our case the signal; a different convention exists in actual physics analyses);
  • another one that contains events distributed according to the alternative hypothesis H1 (in our case the background).

Then the algorithm must learn how to classify new datasets (the test dataset in our case).

...

  • If the test fails, i.e. the performance on the test dataset differs markedly from the performance on the training dataset, this could be a symptom of overtraining, and the model is not reliable!
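The comparison between training and test performance can be sketched with a deliberately overtrained toy model. A one-nearest-neighbour classifier is used here only because it memorises its training sample; it is not one of the algorithms of the exercise, and the Gaussian toy data, sample sizes and function names are assumptions made for illustration.

```python
import random

random.seed(7)  # fixed seed so the toy result is reproducible


def make_sample(n_per_class):
    """Toy labelled sample with one variable: label 1 = signal, 0 = background."""
    sample = [(random.gauss(+0.5, 1.0), 1) for _ in range(n_per_class)]
    sample += [(random.gauss(-0.5, 1.0), 0) for _ in range(n_per_class)]
    return sample


def predict_1nn(x, training_sample):
    """Return the label of the training event closest to x (pure memorisation)."""
    nearest = min(training_sample, key=lambda event: abs(event[0] - x))
    return nearest[1]


def accuracy(sample, training_sample):
    """Fraction of events in `sample` whose label is predicted correctly."""
    correct = sum(predict_1nn(x, training_sample) == label for x, label in sample)
    return correct / len(sample)


train = make_sample(200)  # used to "train" (memorise)
test = make_sample(200)   # statistically independent sample

train_acc = accuracy(train, train)
test_acc = accuracy(test, train)
gap = train_acc - test_acc

print(f"training accuracy: {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
print(f"gap:               {gap:.3f}")
```

A large gap between the two numbers is the symptom described above: the model reproduces its training sample but does not generalise to new events.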

A description of the Artificial Neural Network and Random Forest algorithms is included in the notebook itself.

Particle Physics basic concepts: the Standard Model and the Higgs boson

...



The ideal instrument for measuring the Higgs boson properties is a particle collider. The Large Hadron Collider (LHC), located near Geneva on the border between France and Switzerland, is the largest proton-proton collider ever built. It consists of a ring 27 km in circumference, in which proton beams collide at a centre-of-mass energy of 13 TeV (the protons travel at 99.999999% of the speed of light). At the LHC, 40 million collisions occur every second, providing an enormous amount of data. Thanks to these data, the ATLAS and CMS experiments discovered the missing piece of the Standard Model, the Higgs boson, in 2012.
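The quoted proton speed can be checked with a short calculation. This is a minimal sketch assuming two symmetric beams, so that each proton carries half of the 13 TeV centre-of-mass energy; the variable names are illustrative.

```python
import math

PROTON_MASS_GEV = 0.938272  # proton rest energy in GeV
BEAM_ENERGY_GEV = 6500.0    # 13 TeV centre-of-mass energy shared by two beams

# Lorentz factor: total energy divided by rest energy.
gamma = BEAM_ENERGY_GEV / PROTON_MASS_GEV

# Speed as a fraction of the speed of light.
beta = math.sqrt(1.0 - 1.0 / gamma**2)

print(f"Lorentz factor gamma = {gamma:.1f}")
print(f"v/c = {beta:.10f}")
print(f"v/c = {100.0 * beta:.6f} %")
```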

During a collision, the energy is so high that protons are "broken" into their fundamental components, i.e. quarks and gluons, which can interact with each other, producing particles that we do not observe in everyday life, such as the top quark. The production of a top quark is, however, a relatively "rare" phenomenon, since other physical processes occur more often, such as those initiated by the strong interaction that produce lighter quarks (such as up, down and strange quarks). In high-energy physics, the probability of a process is expressed by its cross-section: we say that top quark production has a smaller cross-section than the production of light quarks.

The experimental consequence is that distinguishing the decay products of a top quark from those of a light quark can be extremely difficult, because the latter process occurs with a much larger probability.

Experimental signature of the Higgs boson in a particle detector

...

Our physics problem consists of detecting the so-called golden channel H → ZZ* → l+ l- l'+ l'-, which is one of the possible decays of the Higgs boson: its name is due to the fact that it has the clearest and cleanest signature of all the possible Higgs boson decay modes. The decay chain is sketched here: the Higgs boson decays into a pair of Z bosons, each of which in turn decays into a lepton pair (in the picture, muon-antimuon or electron-positron pairs). In this exercise, we will use only the datasets for the 4mu decay channel; the datasets for the 4e channel are provided for analysis as an optional exercise. At the LHC experiments, the 2e2mu decay channel is also widely analyzed.
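One physical quantity that characterises this final state is the invariant mass of the four leptons, which peaks near the Higgs boson mass for signal events. The sketch below shows how it could be computed from the lepton kinematics. The (pT, eta, phi) input format, the function names and the example event are assumptions made for illustration and are not taken from the exercise datasets.

```python
import math

MUON_MASS_GEV = 0.105658  # muon rest mass in GeV


def four_vector(pt, eta, phi, mass=MUON_MASS_GEV):
    """Convert (pT, eta, phi, mass) to Cartesian (E, px, py, pz), all in GeV."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(leptons):
    """Invariant mass in GeV of a list of (pT, eta, phi) leptons."""
    vectors = [four_vector(*lepton) for lepton in leptons]
    # Sum the four-vectors component by component.
    e, px, py, pz = (sum(component) for component in zip(*vectors))
    m_squared = e**2 - px**2 - py**2 - pz**2
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(m_squared, 0.0))


# Hypothetical event: four muons of equal pT (31.25 GeV) emitted at right
# angles to each other in the transverse plane, so the total momentum vanishes.
muons = [(31.25, 0.0, k * math.pi / 2) for k in range(4)]
m4l = invariant_mass(muons)

print(f"m(4mu) = {m4l:.2f} GeV")
```

In this idealised event the momenta cancel exactly, so the invariant mass equals the sum of the four lepton energies.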





Data exploration

In this exercise we are mainly interested in the following ROOT files (see the ROOT file web page if you would like to learn more about the kinds of objects that can be stored in them):

...