...

Neural Networks (NN) can be classified according to the type of neuron interconnections and the flow of information.

Feed Forward Networks

A feedforward NN is a neural network in which connections between the nodes do not form a cycle. In a feedforward network, information always moves in one direction, from input to output, and it never goes backwards. A feedforward NN can be viewed as a mathematical model of a function $y = f(x)$ mapping an input vector to an output vector.

Recurrent Neural Network

A Recurrent Neural Network (RNN) is a neural network that allows connections between nodes in the same layer, with themselves, or with nodes in previous layers.

Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequential input data.
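As a minimal illustration, a recurrent layer in Keras (here a SimpleRNN, with purely illustrative shapes and sizes) keeps an internal state while scanning a sequence:

```python
from tensorflow import keras
import numpy as np

# Minimal sketch: a SimpleRNN layer processes sequences and carries an
# internal state from one timestep to the next. Shapes are illustrative.
layer = keras.layers.SimpleRNN(units=8)

x = np.random.rand(4, 10, 3).astype("float32")  # 4 sequences, 10 timesteps, 3 features
y = layer(x)

print(y.shape)  # (4, 8): one output vector per input sequence
```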

...

Such a structure is also called a feedforward Multilayer Perceptron (MLP).


The output of each node in a layer is computed as a weighted sum of its input variables (plus a bias), passed through an activation function, with weights that are subject to optimization via training.
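The computation performed by a single node can be sketched as follows; the weights, bias and activation shown are illustrative placeholders, not trained values:

```python
import numpy as np

# Minimal sketch of a single node: weighted sum of the inputs plus a bias,
# followed by an activation function (here a sigmoid).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input variables
w = np.array([0.1, 0.4, -0.2])   # weights (optimized during training)
b = 0.05                         # bias term

node_output = sigmoid(np.dot(w, x) + b)
print(node_output)
```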

...

A popular approach to optimizing the weights consists in iteratively modifying them after each training observation, or after a batch of training observations, by minimizing the loss function.

The minimization usually proceeds via the so-called Stochastic Gradient Descent (SGD), which modifies the weights at each iteration according to the following update rule:

$$ w \leftarrow w - \eta \, \nabla_{w} L(w) $$

where $\eta$ is the learning rate and $\nabla_{w} L(w)$ is the gradient of the loss function with respect to the weights.


Other, more complex optimization algorithms are available in the Keras API.
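For illustration, a minimal sketch of how an optimizer is selected in Keras (the learning rates are placeholder values):

```python
from tensorflow import keras

# SGD implements the update rule above; Adam is one example of a more
# elaborate optimizer also available in the Keras API.
sgd_opt  = keras.optimizers.SGD(learning_rate=0.01)
adam_opt = keras.optimizers.Adam(learning_rate=0.001)

# The chosen optimizer is passed to the model at compile time, e.g.:
# model.compile(optimizer=sgd_opt, loss="binary_crossentropy")
```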

...

Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. Note that you may use any loss function as a metric.
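A minimal sketch of how metrics are declared in Keras (the model, input shape and metric choices are placeholders):

```python
from tensorflow import keras

# Metrics are declared at compile time and are only monitored during training;
# they do not influence the weight updates. Note that a loss function
# (here binary_crossentropy) can also be listed as a metric.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="rmsprop",
    loss="binary_crossentropy",
    metrics=["accuracy", "binary_crossentropy"],
)
```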

Other parameters of a Neural Network

Hyperparameters are the variables which determine the network structure and how the network is trained. Hyperparameters are set before training. The main ones are listed below; the sketch after the list shows where each of them enters a typical Keras workflow.

  • Number of Hidden Layers and units: the hidden layers are the layers between the input layer and the output layer. More hidden units within a layer can increase accuracy, while too few units may cause underfitting.
  • Network Weight Initialization: ideally, it may be better to use different weight initialization schemes according to the activation function used on each layer. Most commonly, a uniform distribution is used.
  • Activation functions: they are used to introduce nonlinearity into models, which allows deep learning models to learn nonlinear prediction boundaries.
  • Learning Rate: it defines how quickly a network updates its parameters. A low learning rate slows down the learning process but converges smoothly; a larger learning rate speeds up learning but may not converge. Usually a decaying learning rate is preferred.
  • Number of epochs: it is the number of times the whole training dataset is shown to the network during training. Increase the number of epochs until the validation accuracy starts decreasing even while the training accuracy keeps increasing (overfitting).
  • Batch size: the number of sub-samples (events) given to the network before the parameters are updated. A good default for the batch size is 32; other common choices are 64, 128, 256, and so on.
  • Dropout: a regularization technique used to avoid overfitting, thus increasing the generalization power. Generally, a small dropout value of 10%-50% of neurons is used. A value that is too low has minimal effect, while a value that is too high results in under-learning by the network.
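The sketch below uses hypothetical values to show where each of these hyperparameters appears in a Keras model; the layer sizes, learning rate, dropout rate, epochs and batch size are placeholders, not recommendations:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),                            # number of input variables
    keras.layers.Dense(64, activation="relu",            # hidden units + activation
                       kernel_initializer="he_uniform"), # weight initialization
    keras.layers.Dropout(0.2),                           # dropout rate
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3), # learning rate
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Number of epochs and batch size are passed to fit(), e.g.:
# model.fit(x_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
```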

Applications in High Energy Physics

Nowadays ANNs are used for a variety of tasks: image and speech recognition, translation, filtering, playing games, medical diagnosis, autonomous vehicles. There are also many applications in High Energy Physics: classification of signal and background events, particle tagging, simulation of event reconstruction, and more.

Usage of Keras API: basic concepts

Keras layers API

Layers are the basic building blocks of neural networks in Keras. A layer consists of a tensor-in tensor-out computation function (the layer's call method) and some state, held in TensorFlow variables (the layer's weights).
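As a minimal example, a Dense layer holds its weights as TensorFlow variables and is applied by calling it on a tensor (the shapes here are illustrative):

```python
from tensorflow import keras
import numpy as np

# A Dense layer: a tensor-in tensor-out computation with state (weights)
# stored in TensorFlow variables, created on first call.
layer = keras.layers.Dense(units=4, activation="relu")

x = np.random.rand(2, 3).astype("float32")  # batch of 2 samples, 3 features each
y = layer(x)                                # calling the layer builds and applies it

print(y.shape)              # (2, 4)
print(len(layer.weights))   # 2: kernel and bias variables
```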

Callbacks API


A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, before or after a single batch, etc).

You can use callbacks to:

  • Write TensorBoard logs after every batch of training to monitor your metrics
  • Periodically save your model to disk
  • Do early stopping
  • Get a view on internal states and statistics of a model during training

More information and examples are available for the most commonly used callbacks: EarlyStopping, LearningRateScheduler, ReduceLROnPlateau.
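A minimal sketch of how such callbacks are typically set up; the monitored quantities, patience values and file name are illustrative choices:

```python
from tensorflow import keras

callbacks = [
    # Stop training when the validation loss has stopped improving.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                  restore_best_weights=True),
    # Reduce the learning rate when the validation loss reaches a plateau.
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
    # Periodically save the model to disk.
    keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True),
]

# The callbacks are passed to fit(), e.g.:
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=callbacks)
```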

Regularization layers: the Dropout layer


The Dropout layer randomly sets input units to 0 with a frequency given by its rate argument at each step during training, which helps prevent overtraining. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.

Note that the Dropout layer only applies when training is set to True, so that no values are dropped during inference. When using model.fit, training will be set to True automatically; in other contexts, you can set the kwarg explicitly to True when calling the layer.
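A minimal sketch of this behaviour (the rate and input values are illustrative):

```python
from tensorflow import keras
import numpy as np

# Dropout is only active when training=True.
layer = keras.layers.Dropout(rate=0.5)
x = np.ones((1, 4), dtype="float32")

print(layer(x, training=True))   # some entries set to 0, the rest scaled by 1/(1-0.5)
print(layer(x, training=False))  # inference mode: input passes through unchanged
```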

Artificial Neural Network implementation

We can now define a first architecture. The simplest approach is to use fully connected layers (Dense layers in Keras/TensorFlow) with a selu activation function and a sigmoid final layer, since we are dealing with a binary classification problem.

We are using the binary_crossentropy loss function during training, a standard loss function for binary classification problems. We will optimize the model with the RMSprop algorithm and we will collect accuracy metrics while the model is trained.

To avoid overfitting we also use Dropout layers and some callback functions.
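A minimal sketch of such an architecture, assuming a hypothetical number of input variables; the layer widths and dropout rates are illustrative choices, not the tutorial's exact configuration:

```python
from tensorflow import keras

n_features = 20  # hypothetical number of input variables

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="selu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(32, activation="selu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation="sigmoid"),  # binary classification output
])

model.compile(
    optimizer="rmsprop",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Training with an early-stopping callback, e.g.:
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, batch_size=32,
#           callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)])
```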

