...

model = keras.models.load_model('ANN_model.h5')

Description of the Random Forest (RF) and Scikit-learn library

In this section you will find the following subsections:

...

The gain due to the splitting of a node A into the nodes B1 and B2, which depends on the chosen cut, is given by ΔI = I(A) − I(B1) − I(B2), where I denotes the adopted metric (G or E, in the case of the Gini index or cross-entropy introduced above). By varying the cut, the optimal gain may be achieved.
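The gain formula above can be sketched in a few lines of Python. This is a minimal illustration with a toy node, following the unweighted form ΔI = I(A) − I(B1) − I(B2) exactly as written above; the function names and the example labels are only for illustration.

import numpy as np

def gini(labels):
    """Gini index G = 1 - sum_k p_k^2 for the class fractions in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gain(parent, left, right):
    """Gain dI = I(A) - I(B1) - I(B2) for a candidate split of A into B1, B2."""
    return gini(parent) - gini(left) - gini(right)

# Toy example: a perfectly mixed node split into two pure children.
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = np.array([0, 0, 0]), np.array([1, 1, 1])
print(split_gain(parent, left, right))  # 0.5: both children are pure, full gain

A pure node has G = 0, so the best possible cut is the one that drives both children towards purity, maximizing ΔI.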

Pruning Tree


A solution to overtraining is pruning, i.e. eliminating subtrees (branches) that appear too specific to the training sample:

...
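As a concrete illustration of pruning, scikit-learn exposes cost-complexity pruning through the `ccp_alpha` parameter of its tree estimators. The sketch below uses a synthetic dataset (purely a placeholder, not the tutorial's sample) to show how larger values of `ccp_alpha` remove more branches, trading training-sample fit for generality.

# A minimal sketch of cost-complexity pruning with scikit-learn;
# the synthetic dataset below is only for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pruning path lists the effective alphas at which subtrees get removed.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Larger ccp_alpha prunes more aggressively: fewer leaves, more general trees.
for alpha in path.ccp_alphas[::10]:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test accuracy={tree.score(X_test, y_test):.3f}")

Typically the test accuracy first improves (pruning removes branches fitted to noise) and then degrades once pruning starts removing genuinely useful structure.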

Grid Search for Parameter estimation


A machine learning model has two types of parameters. The first type are the parameters learned from the data during training, while the second type are the hyperparameters, whose values we pass to the model to control the learning process.

Hyperparameters can be thought of as model settings. These settings need to be tuned for each problem because the best model hyperparameters for one particular dataset will not be the best across all datasets.

The process of hyperparameter tuning (also called hyperparameter optimization) means finding the combination of hyperparameter values for a machine learning model that performs the best - as measured on a validation dataset - for a problem.

Normally we set the values of these hyperparameters by hand, as we did for our ANN, and see which choices give the best performance. However, manually trying out parameter values quickly becomes exhausting.

It is also hard to compare different algorithms fairly when the hyperparameters are set by hand, because one algorithm may outperform another only for a particular choice of parameters: change the parameters, and the ranking may reverse.

Therefore, instead of picking parameter values by hand, a better approach is to use an algorithm that automatically finds the best parameters for a particular model. Grid Search is one such algorithm.

Hyperparameter optimization algorithms usually find a tuple of hyperparameters that yields an optimal model, i.e. one that maximizes a predefined metric on independent data. The metric takes a tuple of hyperparameters and returns the associated value. Cross-validation is often used to estimate this generalization performance.
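A quick sketch of how cross-validation estimates generalization performance, using `cross_val_score`; the synthetic dataset and the forest settings here are placeholders, not the tutorial's actual discriminator.

# Illustrative only: synthetic data and arbitrary forest settings.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold CV: train on 4/5 of the data, score on the held-out 1/5, five times.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean(), scores.std())

The mean over folds estimates the metric on unseen data; the spread gives a rough idea of its uncertainty.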


from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import plot_roc_curve
from sklearn.model_selection import GridSearchCV

Let's implement the grid search algorithm for our Random Forest discriminator!

The Grid Search algorithm basically tries all possible combinations of parameter values and returns the combination with the highest accuracy. It can be very slow, owing to the potentially huge number of combinations to test. Furthermore, performing cross-validation considerably increases the execution time of the process!
For these reasons, the algorithm is commented out in the following code cells, and images of the outputs are left to you!
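Since the notebook's own cells are commented out, here is a small self-contained sketch of what `GridSearchCV` does on a Random Forest. The parameter grid and the synthetic dataset are placeholders, not the tutorial's actual discriminator setup.

# Illustrative grid search on placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Every combination in the grid (2 x 3 = 6 here) is fitted cv=3 times:
# 18 fits in total, which is why grid search scales badly with grid size.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}

grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

print(grid.best_params_)   # the winning combination
print(grid.best_score_)    # its mean cross-validated score

After fitting, `grid.best_estimator_` holds a model refitted on the full data with the winning hyperparameters, ready to use as the final classifier.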

To read more about cross-validation on Scikit-learn:

...