...

We can now start to define the first architecture. The simplest approach is to use fully connected layers (Dense layers in Keras/TensorFlow) with a SELU activation function and a sigmoid final layer, since we are dealing with a binary classification problem.
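A minimal sketch of such a model is shown below. The number of input features and the hidden-layer sizes are illustrative placeholders, not values taken from the text.

```python
# Minimal sketch of a fully connected binary classifier with SELU hidden
# layers and a sigmoid output, as described above. Layer widths and the
# number of input features (n_features) are assumptions for illustration.
import tensorflow as tf
from tensorflow import keras

n_features = 20  # placeholder: number of input variables

model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="selu"),    # fully connected hidden layer
    keras.layers.Dense(32, activation="selu"),
    keras.layers.Dense(1, activation="sigmoid"),  # binary classification output
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```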

...

Scikit-learn is a simple and efficient tool for predictive data analysis, accessible to everybody and reusable in various contexts. It is built on the NumPy, SciPy, and matplotlib scientific libraries.

...

Decision Trees and their extension, Random Forests, are robust and easy-to-interpret machine learning algorithms for classification tasks.

Decision Trees provide a simple and fast way of learning a function that maps data x to outputs y, where x can be a mix of categorical and numeric variables and y can be categorical for classification, or numeric for regression.
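A minimal scikit-learn sketch of fitting such a tree is shown below; the synthetic dataset and the hyperparameters (max_depth, criterion) are illustrative assumptions, not taken from the text.

```python
# Minimal sketch: train a decision tree classifier on synthetic tabular data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Generate a toy binary classification dataset (illustrative only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=4, criterion="gini", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```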

...

Selection cuts can be tuned in order to achieve the best split at each node according to some metric (Gini index, cross-entropy, ...). Most of these metrics are related to the purity of a node, that is, the fraction of signal events over all events in a given node, P = S/(S+B).

The gain due to the splitting of a node A into the nodes B1 and B2, which depends on the chosen cut, is given by ΔI = I(A) - I(B1) - I(B2), where I denotes the adopted metric (G or E, in case of the Gini index or cross-entropy introduced above). By varying the cut, the optimal gain may be achieved.
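The sketch below illustrates how this gain could be evaluated for one candidate cut, using the Gini index as the metric I. The Gini definition used here (2·P·(1-P) for a binary node) and the example signal/background counts are assumptions for illustration; the exact convention, and whether the child terms are weighted by their event fractions, may differ in the definitions referenced above.

```python
# Sketch of the node-splitting gain ΔI = I(A) - I(B1) - I(B2) with the Gini
# index as metric. Counts and the Gini convention are illustrative assumptions.
def purity(s, b):
    """Node purity P = S / (S + B)."""
    return s / (s + b)

def gini(s, b):
    """Gini index of a node with s signal and b background events (2*P*(1-P))."""
    p = purity(s, b)
    return 2.0 * p * (1.0 - p)

def split_gain(parent, child1, child2):
    """Gain ΔI = I(A) - I(B1) - I(B2) for a candidate cut."""
    return gini(*parent) - gini(*child1) - gini(*child2)

# Example: a mixed node (50 signal, 50 background) split into two nearly pure children.
print(split_gain(parent=(50, 50), child1=(49, 1), child2=(1, 49)))
```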

...