...

  • Training (learning): a discriminator is built using all the input variables. Its parameters are then iteratively adjusted by comparing the discriminator output with the true labels of the dataset (this is supervised machine learning; we will use two such algorithms). This phase is crucial: both the choice of input variables and the parameters of the algorithm should be tuned.
    • By contrast, algorithms that group the data and find patterns in it based only on the observed distribution of the input data, without using labels, are called unsupervised learning algorithms.
    • A good habit is to train multiple models with various hyperparameters on a “reduced” training set (i.e. the full training set minus the so-called validation set) and then select the model that performs best on the validation set. If you can set aside more than one validation set, you can perform a so-called cross-validation check (we will do this for the RF algorithm).
    • Once the validation process is over, you re-train the best model on the full training set (including the validation set); this gives you the final model.
  • Test: once the training has been performed, the discriminator score is computed on a separate, independent dataset for both H0 and H1.
  • An overfitting check is performed by comparing the classifier output on the test and training datasets, and the corresponding performances are computed (e.g. in terms of ROC curves).
    • If the check fails, i.e. the performances on the test and training datasets differ significantly, this is a symptom of overtraining and the model is not good!
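
The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the notebook: it assumes scikit-learn is available, replaces the real H0/H1 samples with a toy dataset from `make_classification`, and uses a hypothetical list of Random Forest hyperparameter candidates. The ROC AUC is used here as a one-number summary of the ROC curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Toy stand-in for the labelled samples (assumption: 1 = H1, 0 = H0).
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           random_state=0)

# Independent test set, never used for training or model selection.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# "Reduced" training set = full training set minus the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, stratify=y_train_full,
    random_state=0)

# Hypothetical hyperparameter candidates, chosen only for illustration.
candidates = [
    {"n_estimators": 50, "max_depth": 3},
    {"n_estimators": 100, "max_depth": 6},
    {"n_estimators": 100, "max_depth": None},
]

# Train each candidate on the reduced set and score it on the validation set.
val_auc = []
for params in candidates:
    model = RandomForestClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    val_auc.append(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
best_params = candidates[int(np.argmax(val_auc))]

# Cross-validation: 5 different validation folds taken from the full training set.
cv_auc = cross_val_score(RandomForestClassifier(random_state=0, **best_params),
                         X_train_full, y_train_full, cv=5, scoring="roc_auc")

# Final model: best hyperparameters, re-trained on the full training set.
final_model = RandomForestClassifier(random_state=0, **best_params)
final_model.fit(X_train_full, y_train_full)

# Overfitting check: compare the performance on training and test datasets.
train_auc = roc_auc_score(y_train_full,
                          final_model.predict_proba(X_train_full)[:, 1])
test_auc = roc_auc_score(y_test, final_model.predict_proba(X_test)[:, 1])
print(f"best hyperparameters: {best_params}")
print(f"cross-validation AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")
print(f"train AUC: {train_auc:.3f}, test AUC: {test_auc:.3f}")
```

A training AUC that is much larger than the test AUC would point to overtraining. How large a gap is acceptable depends on the analysis, so no threshold is hard-coded in this sketch.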

A description of the Artificial Neural Network and Random Forest algorithms is included in the notebook itself.

Particle Physics basic concepts: the Standard Model and the Higgs boson

...

Input files 

The dataset files are stored on the Recas Bari ownCloud and are automatically loaded by the notebook. If needed, they are also available here (4-muon decay channel) for the main exercise and here (4-electron decay channel) for the optional exercise.

In the following, the most important excerpts are described.

...