...
Introduction to the Random Forest algorithm
Decision Trees, and their ensemble extension Random Forests, are robust and easy-to-interpret machine learning algorithms for classification and regression tasks.
...
One of the biggest advantages of Decision Trees and Random Forests is the ease with which we can see which features or variables contribute to the classification or regression, and their relative importance, based on the depth at which they appear in the tree.
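As a minimal sketch of this idea (on synthetic data, not the analysis dataset used later), scikit-learn's `RandomForestClassifier` exposes exactly this information through its `feature_importances_` attribute:

```python
# Sketch: reading feature importances from a Random Forest (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Two informative features and one pure-noise feature.
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; features used close to the root score higher.
for name, imp in zip(["feat_0", "feat_1", "noise"], forest.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

The noise feature receives a much smaller importance than the two informative ones, mirroring its absence from the upper levels of the trees.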
Decision Tree
A decision tree is a sequence of selection cuts applied in a specified order to a given set of variables.
...
The gain due to the splitting of a node A into the nodes B1 and B2, which depends on the chosen cut, is given by $\Delta I = I(A) - I(B_1) - I(B_2)$, where I denotes the adopted metric (G or E, in the case of the Gini index or the cross entropy introduced above). By varying the cut, the optimal gain can be found.
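The gain computation can be sketched in a few lines. Note that in this sketch each child's impurity is explicitly weighted by its fraction of events, one common convention for the node impurities entering ΔI:

```python
# Sketch: splitting gain Delta I = I(A) - I(B1) - I(B2) with the Gini index,
# weighting each child node by its fraction of events.
import numpy as np

def gini(y):
    """Gini index G = 1 - sum_k p_k^2 for an array of class labels y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gain(y_parent, y_left, y_right):
    """Impurity decrease obtained by splitting the parent node in two."""
    n = len(y_parent)
    return (gini(y_parent)
            - len(y_left) / n * gini(y_left)
            - len(y_right) / n * gini(y_right))

# A perfect split of a balanced node removes all of its impurity:
parent = np.array([0, 0, 1, 1])
print(gain(parent, parent[:2], parent[2:]))  # 0.5
```

Scanning `gain` over all candidate cut values and picking the maximum is exactly how the optimal cut for a node is chosen.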
Pruning Tree
A solution to overtraining is pruning, that is, eliminating subtrees (branches) that appear too specific to the training sample. There are two strategies:
- post-pruning: a node and all its descendants are turned into a leaf
- pre-pruning: tree growth is stopped during the building phase
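Both strategies are available in scikit-learn; the sketch below (on synthetic, deliberately noisy data) compares a fully grown tree with a pre-pruned one (`max_depth` stops growth early) and a post-pruned one (`ccp_alpha` collapses overly specific subtrees into leaves via minimal cost-complexity pruning):

```python
# Sketch: pre-pruning vs post-pruning a decision tree on noisy synthetic data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = (X[:, 0] > 0).astype(int)
y[rng.random(400) < 0.1] ^= 1   # 10% label noise invites overtraining

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pre = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

# The pruned trees have far fewer leaves than the fully grown one.
print(full.get_n_leaves(), pre.get_n_leaves(), post.get_n_leaves())
```

The fully grown tree carves out a leaf for nearly every noisy event, while both pruned trees stay close to the simple true boundary.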
...
Here is an example of how you can build a decision tree by yourself! Try to imagine how the decision tree's growth could proceed in our analysis case and complete it. We give you some hints!
import matplotlib.pyplot as plt
%matplotlib inline

fig = plt.figure(figsize=(10, 4))
ax = fig.add_axes([0, 0, 0.8, 1], frameon=False, xticks=[], yticks=[])
ax.set_title('Decision Tree: Higgs Boson events Classification', size=24, color='red')

def text(ax, x, y, t, size=20, **kwargs):
    ax.text(x, y, t, ha='center', va='center', size=size,
            bbox=dict(boxstyle='round', ec='blue', fc='w'), **kwargs)

# Here are the variables we can use for the training phase:
# --------------------------------------------------------------------------
# High-level features:
# ['f_massjj', 'f_deltajj', 'f_mass4l', 'f_Z1mass', 'f_Z2mass']
# --------------------------------------------------------------------------
# Low-level features:
# ['f_lept1_pt', 'f_lept1_eta', 'f_lept1_phi',
#  'f_lept2_pt', 'f_lept2_eta', 'f_lept2_phi',
#  'f_lept3_pt', 'f_lept3_eta', 'f_lept3_phi',
#  'f_lept4_pt', 'f_lept4_eta', 'f_lept4_phi',
#  'f_jet1_pt', 'f_jet1_eta', 'f_jet1_phi',
#  'f_jet2_pt', 'f_jet2_eta', 'f_jet2_phi']
# --------------------------------------------------------------------------

# Nodes of the example tree
text(ax, 0.5, 0.9, "How large is\n\"f_lepton1_pt\"?", 20, color='red')
text(ax, 0.3, 0.6, "How large is\n\"f_lepton2_pt\"?", 18, color='blue')
text(ax, 0.7, 0.6, "How large is\n\"f_lepton3_pt\"?", 18)
text(ax, 0.12, 0.3, "How large is\n\"f_lepton4_pt\"?", 14, color='magenta')
text(ax, 0.38, 0.3, "How large is\n\"f_jet1_eta\"?", 14, color='violet')
text(ax, 0.62, 0.3, "How large is\n\"f_jet2_eta\"?", 14, color='orange')
text(ax, 0.88, 0.3, "How large is\n\"f_jet1_phi\"?", 14, color='green')

# Cut labels on the branches
text(ax, 0.4, 0.75, ">= 1 GeV", 12, alpha=0.4, color='red')
text(ax, 0.6, 0.75, "< 1 GeV", 12, alpha=0.4, color='red')
text(ax, 0.21, 0.45, ">= 3 GeV", 12, alpha=0.4, color='blue')
text(ax, 0.34, 0.45, "< 3 GeV", 12, alpha=0.4, color='blue')
text(ax, 0.66, 0.45, ">= 2 GeV", 12, alpha=0.4, color='black')
text(ax, 0.79, 0.45, "< 2 GeV", 12, alpha=0.4, color='black')

# Branches (solid: drawn levels; dashed: levels left for you to complete)
ax.plot([0.3, 0.5, 0.7], [0.6, 0.9, 0.6], '-k', color='red')
ax.plot([0.12, 0.3, 0.38], [0.3, 0.6, 0.3], '-k', color='blue')
ax.plot([0.62, 0.7, 0.88], [0.3, 0.6, 0.3], '-k')
ax.plot([0.0, 0.12, 0.20], [0.0, 0.3, 0.0], '--k')
ax.plot([0.28, 0.38, 0.48], [0.0, 0.3, 0.0], '--k')
ax.plot([0.52, 0.62, 0.72], [0.0, 0.3, 0.0], '--k')
ax.plot([0.8, 0.88, 1.0], [0.0, 0.3, 0.0], '--k')

ax.axis([0, 1, 0, 1])
fig.savefig('05.08-decision-tree.png')