...

The last output layer usually consists of a single node that provides the discriminant output. Intermediate layers between the input and the output layers are called hidden layers. Usually, if a Neural Network has more than one hidden layer it is called a Deep Neural Network, and in principle it is able to perform the feature extraction by itself (it becomes a Deep Learning algorithm).
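
To make the layer structure concrete, here is a minimal sketch (not taken from the original text) of such a network in Keras; the number of input features and the layer sizes are illustrative assumptions.

```python
# Minimal sketch of a network with two hidden layers and a single output
# node acting as the discriminant. The 10 input features and the layer
# sizes are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(10,)),               # input layer (10 features, assumed)
    layers.Dense(32, activation="relu"),    # first hidden layer
    layers.Dense(16, activation="relu"),    # second hidden layer -> "deep" network
    layers.Dense(1, activation="sigmoid"),  # single output node: discriminant in [0, 1]
])
model.summary()
```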

...

Hyperparameters are the variables that determine the network structure and how the network is trained. Hyperparameters are set before training. The main hyperparameters are listed below; a short code sketch showing where each one enters a model follows the list.

  • Number of hidden layers and units: the hidden layers are the layers between the input layer and the output layer. More hidden units within a layer can increase accuracy; a smaller number of units may cause underfitting.
  • Network weight initialization: ideally, it may be better to use different weight initialization schemes depending on the activation function used on each layer. Mostly, a uniform distribution is used.
  • Activation functions: they are used to introduce nonlinearity to models, which allows deep learning models to learn nonlinear prediction boundaries.
  • Learning rate: it defines how quickly the network updates its parameters. A low learning rate slows down the learning process but converges smoothly; a larger learning rate speeds up the learning but may not converge. Usually, a decaying learning rate is preferred.
  • Number of epochs: the number of times the whole training dataset is shown to the network during training. Increase the number of epochs until the validation accuracy starts decreasing, even while the training accuracy keeps increasing (overfitting).
  • Batch size: the number of subsamples (events) given to the network after which the parameters are updated. A good default for the batch size is 32; values of 64, 128, 256, and so on are also worth trying.
  • Dropout: a regularization technique to avoid overfitting and thus increase the generalization power. Generally, use a small dropout value of 10%-50% of the neurons: a value that is too low has minimal effect, while a value that is too high results in under-learning by the network.
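
As a rough illustration only (all values and the dummy dataset are assumptions, not prescriptions from the text), the following sketch shows where each of the hyperparameters above enters a Keras model.

```python
# Illustrative sketch: number/size of hidden layers, weight initialization,
# activation functions, dropout, a decaying learning rate, batch size and
# number of epochs. All values and the dummy dataset are assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(10,)),                            # 10 input features (assumed)
    layers.Dense(64, activation="relu",
                 kernel_initializer="glorot_uniform"),   # hidden units + uniform-type init
    layers.Dropout(0.2),                                 # drop 20% of the neurons
    layers.Dense(64, activation="relu",
                 kernel_initializer="glorot_uniform"),
    layers.Dense(1, activation="sigmoid"),               # single output node
])

# Decaying learning rate, as suggested in the list above
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr_schedule),
              loss="binary_crossentropy", metrics=["accuracy"])

# Dummy data so the example runs; batch size and epochs are set at fit time
x_train = np.random.rand(256, 10).astype("float32")
y_train = np.random.randint(0, 2, size=(256, 1))
model.fit(x_train, y_train, batch_size=32, epochs=5, validation_split=0.2)
```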

...

A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc.).

You can use callbacks in order to:

  • Write TensorBoard logs after every batch of training to monitor your metrics
  • Periodically save your model to disk
  • Do early stopping
  • Get a view on internal states and statistics of a model during training
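
As a minimal sketch of the use cases above with standard Keras callbacks (the paths, patience, and monitored metric are illustrative assumptions):

```python
# Standard Keras callbacks covering the use cases listed above:
# TensorBoard logging, periodic saving to disk, and early stopping.
# Paths, patience, and the monitored metric are illustrative assumptions.
from tensorflow import keras

callbacks = [
    keras.callbacks.TensorBoard(log_dir="./logs", update_freq="batch"),  # log metrics per batch
    keras.callbacks.ModelCheckpoint(filepath="model.keras",
                                    save_best_only=True),                # periodically save the model
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                  restore_best_weights=True),            # stop when val_loss stalls
]

# Passed to fit(), the callbacks are invoked at the corresponding stages of training:
# model.fit(x_train, y_train, validation_split=0.2, epochs=50, callbacks=callbacks)
```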

...

The gain due to the splitting of a node A into the nodes B1 and B2, which depends on the chosen cut, is given by ΔI = I(A) − I(B1) − I(B2), where I denotes the adopted metric (G or E, in the case of the Gini index or the cross-entropy introduced above). By varying the cut, the optimal gain may be achieved.
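
As a concrete illustration (pure NumPy, not taken from the original text), the sketch below computes this gain for candidate cuts, assuming the node impurity I is the Gini index weighted by the number of events in the node, so that the formula above applies directly.

```python
# Illustrative sketch of the gain computation. Here I(node) is assumed to be
# the Gini index weighted by the number of events in the node, so that
# DeltaI = I(A) - I(B1) - I(B2) matches the formula in the text.
import numpy as np

def gini(labels):
    """Gini index of a set of 0/1 class labels."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)          # purity (fraction of class 1)
    return 2.0 * p * (1.0 - p)

def weighted_impurity(labels):
    """I(node): Gini index weighted by the node size (assumption)."""
    return len(labels) * gini(labels)

def split_gain(labels, feature, cut):
    """Gain DeltaI = I(A) - I(B1) - I(B2) for splitting node A at `cut`."""
    left = labels[feature <= cut]    # node B1
    right = labels[feature > cut]    # node B2
    return weighted_impurity(labels) - weighted_impurity(left) - weighted_impurity(right)

# Toy example: scan candidate cuts and keep the one with the largest gain
feature = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])
labels  = np.array([0,   0,   0,   1,   1,   1  ])
cuts = np.linspace(0.0, 1.0, 21)
best_cut = max(cuts, key=lambda c: split_gain(labels, feature, c))
print("best cut:", best_cut)
```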

...