...

  • You will learn how a Multivariate Analysis algorithm works (see the introduction below) and, more specifically, how a Machine Learning model is implemented;
  • you will acquire basic knowledge of Higgs boson physics as described by the Standard Model. During the exercise you will be invited to plot some physical quantities in order to understand the underlying Particle Physics problem;
  • you will be invited to change the hyperparameters of the ANN and RF algorithms to better understand their consequences for the models' performance;
  • you will understand that the choice of input variables is key to the quality of the algorithm, since an optimal choice allows it to achieve the best possible performance;
  • moreover, you will be able to change the background datasets and the decay channels of the final-state particles, and see how the ML algorithms' performance changes.

...

  • Training (learning): a discriminator is built using all the input variables. Its parameters are then iteratively adjusted by comparing the discriminator output to the true labels of the dataset (this is supervised machine learning; we will use two such algorithms). This phase is crucial: one should tune the input variables and the parameters of the algorithm!
    • By contrast, algorithms that group the data and find patterns according to the observed distribution of the inputs, without using labels, are called unsupervised learning.
    • A good habit is training multiple models with various hyperparameters on a “reduced” training set (i.e. the full training set minus the so-called validation set), and then selecting the model that performs best on the validation set.
    • Once the validation process is over, you can re-train the best model on the full training set (including the validation set); this gives you the final model.
  • Test: once the training has been performed, the discriminator score is computed on a separate, independent dataset for both H0 and H1.
  • An overfitting check is performed by comparing the classifier's response on the test and training samples, and their performances are computed (e.g. in terms of ROC curves).
    • If the check fails, i.e. the test and training performances differ significantly, this could be a symptom of overtraining and the model is not reliable!
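The training/validation/test workflow above can be sketched end to end. The following is a minimal toy example, not the exercise's actual ANN or RF code: it uses only NumPy, a simple k-nearest-neighbours discriminator standing in for the real models, synthetic Gaussian "signal" (H1) and "background" (H0) samples, and a rank-based ROC AUC. All dataset shapes, means, and hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy datasets: H1 ("signal") and H0 ("background") as 2-D Gaussians.
def make_data(n):
    x1 = rng.normal(loc=1.0, scale=1.0, size=(n, 2))   # H1, label 1
    x0 = rng.normal(loc=-1.0, scale=1.0, size=(n, 2))  # H0, label 0
    X = np.vstack([x1, x0])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    idx = rng.permutation(len(y))
    return X[idx], y[idx]

X_train_full, y_train_full = make_data(300)
X_test, y_test = make_data(300)          # separate, independent test sample

# Hold out a validation set from the training data ("reduced" training set).
n_val = 150
X_val, y_val = X_train_full[:n_val], y_train_full[:n_val]
X_red, y_red = X_train_full[n_val:], y_train_full[n_val:]

def knn_score(X_fit, y_fit, X_eval, k):
    """Discriminator score: fraction of signal among the k nearest fit points."""
    d = np.linalg.norm(X_eval[:, None, :] - X_fit[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_fit[nearest].mean(axis=1)

def roc_auc(y_true, score):
    """ROC AUC via the rank-sum (Mann-Whitney) statistic."""
    order = np.argsort(score)
    ranks = np.empty(len(score), dtype=float)
    ranks[order] = np.arange(1, len(score) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# 1) Validation: pick the hyperparameter k that maximises validation AUC.
best_k = max([1, 5, 15],
             key=lambda k: roc_auc(y_val, knn_score(X_red, y_red, X_val, k)))

# 2) Re-train the chosen model on the full training set, then evaluate it.
auc_train = roc_auc(y_train_full,
                    knn_score(X_train_full, y_train_full, X_train_full, best_k))
auc_test = roc_auc(y_test,
                   knn_score(X_train_full, y_train_full, X_test, best_k))

# 3) Overfitting check: train and test AUC should be close.
print(f"k={best_k}  train AUC={auc_train:.3f}  test AUC={auc_test:.3f}")
```

A large gap between `auc_train` and `auc_test` would be the symptom of overtraining described above; in the real exercise the same comparison is made with the ANN and RF score distributions and ROC curves.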

...