...

Layers are the basic building blocks of neural networks in Keras. A layer consists of a tensor-in tensor-out computation function (the layer's call method) and some state, held in TensorFlow variables (the layer's weights).
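
As an illustration (a minimal sketch, not part of the original notebook), a built-in Dense layer already shows this tensor-in tensor-out behaviour and its weight state:

import tensorflow as tf
from tensorflow import keras

# A Dense layer: the computation lives in its call method, the state in its weights
layer = keras.layers.Dense(units=4, activation='relu')
x = tf.ones((2, 3))                      # a batch of 2 inputs with 3 features
y = layer(x)                             # tensor in, tensor out
print(y.shape)                           # (2, 4)
print([w.shape for w in layer.weights])  # kernel (3, 4) and bias (4,)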

Callbacks API


A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc).
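
For example, here is a minimal sketch (not from the original notebook) of attaching the built-in EarlyStopping callback to model.fit; model, X_train_val and Y_train_val are placeholders for objects defined elsewhere in the notebook:

from tensorflow import keras

# Stop training when the validation loss has not improved for 5 consecutive epochs
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

# 'model', 'X_train_val' and 'Y_train_val' are assumed to be defined in other cells
history = model.fit(X_train_val, Y_train_val,
                    validation_split=0.2,
                    epochs=100,
                    callbacks=[early_stop])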

...

Regularization layers: the Dropout layer


The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overtraining. Inputs not set to 0 are scaled up by 1/(1-rate) such that the sum over all inputs is unchanged.
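
A minimal, illustrative sketch (not from the original notebook) of the behaviour described above:

import tensorflow as tf

drop = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 10))
# training=True activates dropout: kept units are scaled up by 1/(1-rate) = 2
print(drop(x, training=True))
# training=False (inference): the layer leaves the inputs unchanged
print(drop(x, training=False))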

...

Let's implement the grid search algorithm for our Random Forest discriminator!

The Grid Search algorithm tries all possible combinations of parameter values and returns the combination with the highest accuracy. It can be very slow, owing to the potentially huge number of combinations to test, and the cross-validation increases the execution time even further!
For these reasons, the grid search is commented out in the following code cells and images of its outputs are provided for you!
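
As a reference, here is a minimal sketch of how such a grid search could be set up with scikit-learn's GridSearchCV; the parameter grid below is only illustrative and may differ from the one actually used in the notebook. Following the note above, the slow part is left commented out:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative parameter grid (not necessarily the grid used in the notebook)
param_grid = {
    'n_estimators': [100, 300],
    'criterion': ['gini', 'entropy'],
    'max_depth': [5, 10],
    'max_features': [4, 8],
    'min_samples_leaf': [500],
    'min_samples_split': [200],
    'bootstrap': [False, True],
}

# grid_search = GridSearchCV(RandomForestClassifier(random_state=7), param_grid,
#                            scoring='accuracy', cv=3, n_jobs=-1)
# grid_search.fit(X_train_val, np.ravel(Y_train_val))
# print(grid_search.best_params_)
# print(grid_search.best_score_)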

To read more about cross-validation on Scikit-learn:

...

Output of the previous code cell:


# Best parameters:
# {'bootstrap': False, 'criterion': 'gini', 'max_depth': 5, 'max_features': 4,
#  'min_samples_leaf': 500, 'min_samples_split': 200, 'n_estimators': 300}
# Best metrics score (accuracy):
# 0.9199343250564441

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators=300, criterion='gini',
                             verbose=0, min_samples_split=200,
                             max_depth=5, min_samples_leaf=500,
                             max_features=4, bootstrap=False, random_state=7)

# Use the same sets X_train_val, X_test, Y_train_val, Y_test, W_train_val,
# W_test used for the ANN in order to train our Random Forest algorithm
randomforest = rfc.fit(X_train_val, np.ravel(Y_train_val),
                       sample_weight=np.ravel(W_train_val))

In the following few lines of code, the random forest model created in the previous step is saved as a .pkl file, so that you can load it as a new object called pickled_model in another notebook!

import pickle

# Save to file in the current working directory
pkl_filename = "rf_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(rfc, file)
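
To load the saved model back in another notebook as the pickled_model object mentioned above, you could do something like:

import pickle

# Load the model from the .pkl file into a new object
with open("rf_model.pkl", 'rb') as file:
    pickled_model = pickle.load(file)

# The loaded model behaves exactly like the original one, e.g.
# pickled_model.predict(X_test)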

Performance evaluation

In this section you will find the following subsections:

  • ROC curve and Rates definitions
  • Overfitting and test evaluation of a MVA model
    If you are already familiar with these theoretical concepts, you may skip them.
  • Artificial Neural Network performance
  • Exercise 1 - Random Forest performance
    Here you will repeat the procedure followed for the ANN in order to evaluate the Random Forest performance.
    Finally, you will compare the discriminating performance of the two trained ML models.

ROC curve and rates definitions

There are many ways to evaluate the quality of a model's predictions. In the ANN implementation, we evaluated the accuracy metric and the loss on the training and validation samples.

A widely used evaluation metric for binary classification tasks is the Receiver Operating Characteristic (ROC) curve.

First, we introduce the terms positive and negative, referring to the classifier's prediction, and the terms true and false, referring to whether the prediction corresponds to the observation (the "truth" level). In our Higgs boson binary classification exercise, we can think of the negative outcome as the one labeling background (in the last sigmoid layer of our network, a number close to 0; in the Random Forest, a score equal to 0), and the positive outcome as the one labeling signal (in the last sigmoid layer of our network, a number close to 1; in the Random Forest, a score equal to 1).

  • TP (true positive): the event is signal, the prediction is signal (correct result)
  • FP (false positive): the event is background, but the prediction is signal (unexpected result)
  • TN (true negative): the event is background, the prediction is background (correct absence of signal)
  • FN (false negative): the event is signal, the prediction is background (missing a true signal event)

Some additional definitions:

  • TPR (true positive rate): how often the network predicts a positive outcome (signal) when the input is positive (signal): TPR = TP / (TP + FN)
  • FPR (false positive rate): how often the network predicts a positive outcome (signal) when the input is negative (background): FPR = FP / (FP + TN)

A good classifier should give a high TPR and a small FPR.

Quoting Wikipedia:

"A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, probability of detection, or signal efficiency in high energy physics. The false-positive rate is also known as the probability of false alarm or fake rate in high energy physics."

The ROC curve requires the true binary value (0 or 1, background or signal) and the probability estimates of the positive (signal) class.

The roc_auc_score function computes the area under the receiver operating characteristic (ROC) curve, which is also denoted by AUC. By computing the area under the ROC curve, the curve information is summarized in a single number.

For more information see: https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve.

The AUC is the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. The higher the AUC, the better the performance of the classifier. If the AUC is 0.5, the classifier is uninformative, i.e., it will rank equally a positive or a negative observation.
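
As a minimal sketch (the variables Y_test, holding the true labels, and y_score, holding the predicted signal probabilities, are placeholders for objects defined elsewhere in the notebook):

from sklearn.metrics import roc_curve, auc, roc_auc_score

fpr, tpr, thresholds = roc_curve(Y_test, y_score)
roc_auc = auc(fpr, tpr)                          # area under the ROC curve
print(roc_auc, roc_auc_score(Y_test, y_score))   # the two numbers agree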

Other metrics

The precision/purity is the ratio TP / (TP + FP), where TP is the number of true positives and FP the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The recall/sensitivity/TPR/signal efficiency is the ratio TP / (TP + FN), where TP is the number of true positives and FN the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The accuracy is the fraction of predictions that match the true labels: (TP + TN) / (TP + TN + FP + FN).

You can always achieve high accuracy on skewed/unbalanced datasets by predicting the same output (the most common one) for every input. Thus another metric, the F1 score, can be used when the positive and negative examples are unbalanced. It is defined in terms of the precision and recall as F1 = 2 · precision · recall / (precision + recall). In our case we will use a simplification of this metric: the product of purity and signal efficiency (precision × recall).


# Let's import all the metrics that we need later on!
from sklearn.metrics import (ConfusionMatrixDisplay, confusion_matrix,
                             accuracy_score, precision_score, recall_score,
                             precision_recall_curve, roc_curve, auc,
                             roc_auc_score)
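
As a sketch of how these functions could be used (y_pred, holding hard 0/1 predictions on the test set, is a placeholder for an object defined elsewhere in the notebook):

print("accuracy :", accuracy_score(Y_test, y_pred))
print("precision:", precision_score(Y_test, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(Y_test, y_pred))      # TP / (TP + FN)
print(confusion_matrix(Y_test, y_pred))                 # [[TN, FP], [FN, TP]]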

Overfitting and test evaluation of an MVA model



The loss function and the accuracy metric give us a measure of the overtraining (overfitting) of the ML algorithm. Overfitting happens when an ML algorithm learns to recognize a pattern that is specific to the training sample and that does not generalize to the validation or test set (see the plot on the right side to understand what we would expect when overfitting happens).
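
For instance, a minimal sketch (assuming a Keras History object called history, as returned by model.fit) of comparing the training and validation loss:

import matplotlib.pyplot as plt

# 'history' is assumed to be the object returned by model.fit(...)
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
# A validation loss that rises while the training loss keeps decreasing
# is the typical signature of overfitting.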



References
