...


A solution to overfitting (overtraining) is pruning, that is, eliminating subtrees (branches) that seem too specific to the training sample:

  • post-pruning: a node and all its descendants are turned into a leaf
  • pre-pruning: tree growth is stopped early during the building phase

Be careful: early stopping conditions may prevent the discovery of further useful splits. Therefore, grow the full tree and, when the results from a subtree are not significantly different from the results of its parent node, prune it!
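As a minimal sketch of both strategies, scikit-learn exposes pre-pruning through stopping conditions such as max_depth and min_samples_leaf, and post-pruning through minimal cost-complexity pruning (ccp_alpha). The dataset and the choice of alpha below are for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop tree growth early via conditions such as max_depth or min_samples_leaf.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow the full tree, then remove subtrees whose contribution is not
# significant (minimal cost-complexity pruning, controlled by ccp_alpha).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # a moderate candidate alpha, for illustration
post_pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
post_pruned.fit(X_train, y_train)

print("pre-pruned  test accuracy:", pre_pruned.score(X_test, y_test))
print("post-pruned test accuracy:", post_pruned.score(X_test, y_test))
```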

...

  • They are computed on statistics derived from the training dataset and therefore do not necessarily inform us on which features are most important to make good predictions on the held-out dataset.
  • They favor high-cardinality features, that is, features with many unique values.

Permutation feature importance is an alternative to impurity-based feature importance that does not suffer from these flaws.
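As a sketch of the comparison, assuming a scikit-learn decision tree on the breast-cancer dataset (both chosen only for illustration), impurity-based importances come from the fitted tree while permutation importances are measured on held-out data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Impurity-based importances: derived from training-set statistics only.
impurity_imp = tree.feature_importances_

# Permutation importances: measured on held-out data, so they reflect which
# features actually help predictions on unseen samples.
perm = permutation_importance(tree, X_test, y_test, n_repeats=10, random_state=0)

# Show the five features ranked highest by permutation importance.
ranked = sorted(zip(data.feature_names, impurity_imp, perm.importances_mean),
                key=lambda t: -t[2])[:5]
for name, i_imp, p_imp in ranked:
    print(f"{name:25s} impurity={i_imp:.3f}  permutation={p_imp:.3f}")
```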

...