Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

import pickle

# Save to file in the current working directory
pkl_filename = "rf_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(rfc, file)

Performance evaluation

In this section you will find the following subsections:

  • ROC curve and Rates definitions
  • Overfitting and test evaluation of a an MVA model
    If you have the knowledge about these theoretical concepts you may skip it.
  • Artificial Neural Network performance
  • Exercise 1 - Random Forest performance
    Here you will re-do the procedure followed for the ANN in order to evaluate the Random Forest performance.
    Finally, you will compare the discriminating performance of the two trained ML models.

...

The recall/sensitivity/TPR/signal efficiency is the ratio where TP is the number of true positives and FN the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The accuracy Accuracy is defined as the number of good matches between the predictions and the true labels.

...

#Let's import all the metrics that we need later on!
from sklearn.metrics import ConfusionMatrixDisplay,confusion_matrix,accuracy_score , precision_score , recall_score , precision_recall_curve , roc_curve, auc , roc_auc_score

Overfitting and test evaluation of

...

an MVA model


The loss function and the accuracy metrics give us a measure of the overtraining (overfitting) of the ML algorithm. Over-fitting happens when a an ML algorithm learns to recognize a pattern that is primarily based on the training (validation) sample and that is nonexistent when looking at the testing (training) set (see the plot on the right side to understand what we would expect when overfitting happens).

...