Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Training (learning): a discriminator is built by using all the input variables. Then, the parameters are iteratively modified by comparing the discriminant output to the true label of the dataset (supervised machine learning algorithms, we will use two of them). This phase is crucial: one should tune the input variables and the parameters of the algorithm!

    • As an alternative, algorithms that group and find patterns in the data according to the observed distribution of the input data are called unsupervised learning.
    • A good habit is training multiple models with various hyperparameters on a “reduced” training set ( i.e. the full training set minus subtracting the so-called validation set), and then select the model that performs best on the validation set.
    • Once, the validation process is over, you can re-train the best model on the full training set (including the validation set), and this gives you the final model.
  • Test: once the training has been performed, the discriminator score is computed in a separated, independent dataset for both H0 and H1 .

  • A comparison is made between test and training classifier and their performances (in terms of ROC curves) are evaluated.
    • If the test fails and the performance of the test and training are different, this could be a symptom of overtraining and our model can be considered not good!

...