...
Neural Networks (NNs) can be classified according to the type of neuron interconnections and the flow of information.
Feed Forward Networks
A feedforward NN is a neural network in which connections between the nodes do not form a cycle. In a feedforward network, information always moves in one direction, from input to output; it never goes backwards. Feedforward NNs can be viewed as mathematical models of a function.
Recurrent Neural Network
A Recurrent Neural Network (RNN) is a neural network that allows connections between nodes in the same layer, with themselves or with previous layers.
Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequential input data.
...
Such a structure is also called a Feedforward Multilayer Perceptron (MLP, see the picture).
The output of each node is computed by applying an activation function to a weighted sum of its inputs, with weights that are subject to optimization via training.
...
A popular approach to optimizing the weights consists in iteratively modifying them, after each training observation or after a batch of training observations, by minimizing the loss function.
The minimization usually proceeds via the so-called Stochastic Gradient Descent (SGD), which modifies the weights at each iteration according to the following formula:

$$w_{t+1} = w_t - \eta \, \nabla L(w_t)$$

where $\eta$ is the learning rate and $\nabla L(w_t)$ is the gradient of the loss function with respect to the weights.
Other, more sophisticated optimization algorithms are available in the Keras API.
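The SGD update rule above can be sketched on a toy problem. This is a minimal illustration with a hand-picked quadratic loss and learning rate, not Keras's actual optimizer implementation:

```python
# Toy example: minimize L(w) = (w - 3)^2 with the SGD update rule
# w_{t+1} = w_t - eta * dL/dw, where here dL/dw = 2 * (w - 3)
eta = 0.1          # learning rate
w = 0.0            # initial weight
for _ in range(100):
    grad = 2.0 * (w - 3.0)   # gradient of the loss at the current weight
    w = w - eta * grad       # SGD update step

print(round(w, 3))  # converges toward the minimum at w = 3
```

In a real network the same update is applied simultaneously to every weight, with the gradient computed by backpropagation over a batch of training observations.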
...
Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. Note that you may use any loss function as a metric.
Other parameters of a Neural Network
Hyperparameters are the variables which determine the network structure and how the network is trained. Hyperparameters are set before training. The main ones are listed below:
Number of Hidden Layers and Units
: The hidden layers are the layers between the input layer and the output layer. More hidden units within a layer can increase accuracy, while too few units may cause underfitting.

Network Weight Initialization
: Ideally, it may be better to use different weight initialization schemes according to the activation function used on each layer. Mostly, a uniform distribution is used.

Activation Functions
: They are used to introduce nonlinearity into models, which allows deep learning models to learn nonlinear prediction boundaries.

Learning Rate
: It defines how quickly a network updates its parameters. A low learning rate slows down the learning process but converges smoothly; a larger learning rate speeds up learning but may not converge. Usually a decaying learning rate is preferred.

Number of Epochs
: The number of times the whole training dataset is shown to the network during training. Increase the number of epochs until the validation accuracy starts decreasing, even while the training accuracy keeps increasing (overfitting).

Batch Size
: The number of sub-samples (events) processed by the network before each update of the parameters. A good default for batch size is 32; also try 64, 128, 256, and so on.

Dropout
: A regularization technique to avoid overfitting and thus increase generalization power. Generally, use a small dropout value of 10%-50% of neurons; too low a value has minimal effect, while too high a value results in under-learning by the network.
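The hyperparameters above can be wired together in a small Keras model. All concrete values here (layer counts, units, rates, input dimension) are illustrative placeholders, not recommendations from the text:

```python
import tensorflow as tf

# Hypothetical hyperparameter choices, for illustration only
n_hidden_layers = 2
n_units = 64
dropout_rate = 0.2        # within the 10%-50% range suggested above
learning_rate = 1e-3
batch_size = 32
epochs = 20

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(10,)))
for _ in range(n_hidden_layers):
    # Weight initialization chosen to match the activation (He init for ReLU)
    model.add(tf.keras.layers.Dense(
        n_units, activation="relu", kernel_initializer="he_uniform"))
    model.add(tf.keras.layers.Dropout(dropout_rate))
model.add(tf.keras.layers.Dense(1, activation="sigmoid"))

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# Epochs and batch size are passed at training time:
# model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
```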
Applications in High Energy Physics
Nowadays, ANNs are used for a variety of tasks: image and speech recognition, translation, filtering, playing games, medical diagnosis, autonomous vehicles. There are also many applications in High Energy Physics: classification of signal and background events, particle tagging, simulation of event reconstruction...
Usage of Keras API: basic concepts
Keras layers API
Layers are the basic building blocks of neural networks in Keras. A layer consists of a tensor-in tensor-out computation function (the layer's call method) and some state, held in TensorFlow variables (the layer's weights).
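A minimal sketch of this: calling a `Dense` layer on a tensor builds its weights on first use and applies the tensor-in/tensor-out computation. The sizes are arbitrary placeholders:

```python
import tensorflow as tf

# A Dense layer holds its state (kernel and bias) as TensorFlow variables
# and defines a tensor-in/tensor-out computation in its call method.
layer = tf.keras.layers.Dense(units=3, activation="relu")
x = tf.ones((2, 4))          # batch of 2 inputs with 4 features each
y = layer(x)                 # first call builds the weights, then computes

print(y.shape)               # (2, 3)
print(len(layer.weights))    # 2: the kernel (4x3) and the bias (3,)
```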
Callbacks API
A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc.).
You can use callbacks to:
- Write TensorBoard logs after every batch of training to monitor your metrics
- Periodically save your model to disk
- Do early stopping
- Get a view on internal states and statistics of a model during training
More info and examples are available for the most commonly used callbacks: EarlyStopping, LearningRateScheduler, ReduceLROnPlateau.
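The three callbacks just mentioned can be configured as sketched below. All thresholds, patiences, and the schedule function are hypothetical values chosen for illustration:

```python
import tensorflow as tf

# Stop training when the validation loss has not improved for 5 epochs,
# restoring the best weights seen so far
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

def schedule(epoch, lr):
    # Hypothetical schedule: keep the rate for 10 epochs, then decay it
    return lr if epoch < 10 else lr * 0.9

lr_scheduler = tf.keras.callbacks.LearningRateScheduler(schedule)

# Halve the learning rate when the validation loss plateaus
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6)

# Callbacks are passed to model.fit, e.g.:
# model.fit(x, y, validation_split=0.2,
#           callbacks=[early_stop, lr_scheduler, reduce_lr])
```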
Regularization layers: the Dropout layer
The Dropout layer randomly sets input units to 0 with a frequency of `rate` at each step during training time, which helps prevent overtraining. Inputs not set to 0 are scaled up by `1/(1 - rate)` such that the sum over all inputs is unchanged.

Note that the Dropout layer only applies when `training` is set to `True`, such that no values are dropped during inference. When using `model.fit`, `training` will be appropriately set to `True` automatically; in other contexts, you can set the kwarg explicitly to `True` when calling the layer.
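A small sketch of this behavior, with an arbitrary input of ones and `rate=0.5`:

```python
import tensorflow as tf

tf.random.set_seed(0)
dropout = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 10))

# Inference mode (training=False): inputs pass through unchanged
y_infer = dropout(x, training=False)

# Training mode: dropped inputs become 0.0, survivors are scaled
# by 1/(1 - rate) = 2.0
y_train = dropout(x, training=True)

print(y_infer.numpy())   # all ones
print(y_train.numpy())   # each entry is either 0.0 or 2.0
```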
Artificial Neural Network implementation
We can now start to define a first architecture. The simplest approach is to use fully connected layers (`Dense` layers in Keras/TensorFlow), with the `selu` activation function and a `sigmoid` final layer, since we are facing a binary classification problem. We use the `binary_crossentropy` loss function during training, a standard loss function for binary classification problems. We will optimize the model with the RMSprop algorithm and collect the `accuracy` metric while the model is trained.
To avoid overfitting, we also use Dropout layers and some callback functions.
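The architecture described above can be sketched as follows. The layer widths, dropout rates, input dimension, and callback settings are placeholders to be tuned for the actual dataset:

```python
import tensorflow as tf

# Fully connected (Dense) layers with selu activations, Dropout for
# regularization, and a sigmoid output for binary classification.
# The input dimension (20) is an illustrative placeholder.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="selu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation="selu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="rmsprop",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

callbacks = [tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)]
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=50, batch_size=32, callbacks=callbacks)
```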