
Neural Networks Fully Explained - Theory



                        ALL YOU WANT TO KNOW ABOUT NEURAL NETWORKS


A neural network consists of many layers with different names. The first layer is the input layer, the last layer is the output layer, and the layers in between are called hidden layers. Each layer consists of one or more neurons.



Like a neuron in the human body, which fires an electric signal as output after receiving a sufficiently strong input, a neuron in a neural network produces an output after receiving inputs from the previous layer.




The diagram above shows a single neuron. Here x1, x2, x3, ..., xn are the input data and w1, w2, w3, ..., wn are the weights.

Each input is multiplied by its corresponding weight and the products are added together. Then we add a bias, and the result is a variable z:

z = (x1 * w1) + (x2 * w2) + ... + (xn * wn) + bias

A = f(z)

where A is the output of the neuron and f(z) is the activation function.
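As a minimal sketch of this computation (using NumPy, with made-up input values), a single neuron's weighted sum looks like this:

import numpy as np

# Illustrative inputs and weights for one neuron
x = np.array([0.5, -1.2, 3.0])   # inputs x1..xn
w = np.array([0.8, 0.1, -0.4])   # weights w1..wn
bias = 0.2

z = np.dot(x, w) + bias          # weighted sum plus bias
print(z)                         # about -0.72; A = f(z) would then apply the activation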



Activation Function -

f(z) is the activation function in the equation above. We can't just use z as the output and pass it to the next layer, because z can range from -infinity to +infinity. So we apply a function f to z, usually nonlinear, that maps it into a bounded or thresholded range f(z).


Some important Activation Functions -


Sigmoid Activation Function -

Its mathematical representation is f(z) = 1 / (1 + e^(-z)).

This function gives an output between 0 and 1.

It also improves the learning process because its output is nonlinear, and it follows the principle of lower influence, lower output; higher influence, higher output.
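Here is a minimal sketch of the sigmoid in Python (NumPy assumed; the test values are illustrative):

import numpy as np

def sigmoid(z):
    # squashes any real-valued z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0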



ReLU Activation Function -

ReLU uses the function f(z) = max(0, z). This means that if z is positive, the neuron outputs z unchanged; otherwise it outputs 0.
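The same idea as a minimal sketch in NumPy:

import numpy as np

def relu(z):
    # returns z when z is positive, 0 otherwise
    return np.maximum(0, z)

print(relu(3.5))   # 3.5
print(relu(-2.0))  # 0.0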



Model -

The overall structure of a neural network is built using the model object in Keras. It provides a simple way to create a stack of layers by adding new layers one after the other.

The easiest way to define a model is with the Sequential model, which allows easy creation of a linear stack of layers. Here is how to create a simple Sequential model with one layer of 10 neurons and an input dimension of 15:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(10, input_dim=15, activation="relu"))



Core Layers in the Model -


Dense Layer - 

A dense layer connects every neuron in the defined layer to every neuron in the previous layer.

For example, if layer 1 has 5 neurons and layer 2 has 4 neurons, then the total number of connections between them is 5 * 4 = 20.

Syntax-

keras.layers.Dense(5, input_dim=3)
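To verify the connection count above, here is a small sketch stacking the two layers from the example (note that Keras also adds one bias parameter per neuron):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(5, input_dim=3))  # 3*5 = 15 weights + 5 biases = 20 parameters
model.add(Dense(4))               # 5*4 = 20 weights + 4 biases = 24 parameters
model.summary()                   # prints the parameter counts per layer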



Dropout Layer-

It helps reduce overfitting of the model. It randomly drops out a fraction of the neurons (sets their outputs to 0), which also reduces computation during training. It is added after a regular layer. Syntax example:

model.add(Dense(5, input_dim=10, activation="relu"))
model.add(Dropout(rate=0.1, seed=100))



Loss Function - 

A loss function helps the model improve its learning process. It measures how far the model's predictions fall from the target values.

Important Loss Functions are -


Mean Squared Error -

It is the average squared difference between the actual and predicted values. Because the difference is squared, a difference of 3 contributes 9 to the loss, while a difference of 9 contributes 81:

MSE = Σ((actual - predicted)^2) / K, where K is the number of samples
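A direct translation of this formula into NumPy, with illustrative values that reproduce the differences of 3 and 9 mentioned above:

import numpy as np

actual = np.array([10.0, 7.0, 4.0])
predicted = np.array([7.0, 7.0, 13.0])

mse = np.mean((actual - predicted) ** 2)
print(mse)  # (9 + 0 + 81) / 3 = 30.0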



Binary Cross Entropy -

It defines the loss when the categorical outcome is a binary variable, that is, there are only 2 possible outcomes, like (Yes/No) or (Pass/Fail).
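For reference, the underlying formula (not stated above) is -(y * log(p) + (1 - y) * log(1 - p)), averaged over the samples; here is a minimal NumPy sketch with made-up labels and probabilities:

import numpy as np

y = np.array([1, 0, 1])         # true labels (e.g., Pass = 1, Fail = 0)
p = np.array([0.9, 0.2, 0.6])   # predicted probability of the positive class

bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(bce)  # about 0.28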


Categorical Cross Entropy -

Used when the categorical outcome is non-binary, that is, there are more than 2 possible outcomes, like (Yes/No/Maybe).
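Its formula is -Σ y_i * log(p_i) over the classes, where y is a one-hot target vector; a small sketch with made-up values:

import numpy as np

y = np.array([0, 1, 0])         # one-hot target: the true class is "No"
p = np.array([0.2, 0.7, 0.1])   # predicted probabilities for (Yes/No/Maybe)

cce = -np.sum(y * np.log(p))
print(cce)  # -log(0.7), about 0.357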



Optimizer -

After the loss function calculates the loss, the error is backpropagated and the weights and biases are adjusted by the optimizer to reduce the loss. The most important optimizers are:


Stochastic Gradient Descent (SGD) -

SGD performs an iteration with each training sample (i.e., after every training sample passes through, it calculates the loss and updates the weights). Since the weights are updated so frequently, the overall loss curve is very noisy. However, the optimization is relatively fast compared to other optimizers.

The weight update can be expressed in a simplified way as follows:

                        Weights = Weights - learning rate * gradient of the loss
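In code, one SGD update step might look like the following sketch (the gradient is assumed to be already computed by backpropagation; all values are illustrative):

import numpy as np

weights = np.array([0.8, -0.3, 0.5])    # current weights
gradient = np.array([0.2, -0.1, 0.4])   # dLoss/dWeights for one training sample (assumed given)
learning_rate = 0.01

weights = weights - learning_rate * gradient  # the SGD update rule
print(weights)  # [0.798, -0.299, 0.496]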



Adam -

Adam stands for Adaptive Moment Estimation and is by far the most popular and widely used optimizer in deep learning. In most cases you can simply choose the Adam optimizer and not worry about the alternatives. This optimization technique computes an adaptive learning rate for each parameter.

It tracks the momentum and variance of the gradient of the loss and leverages their combined effect to update the weight parameters. Together, the momentum and variance help smooth the learning curve and effectively improve the learning process.

The math representation can be simplified in the following way:

        Weights = Weights – (Momentum and Variance combined)
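A simplified sketch of a single Adam update step, following the standard textbook form with its default beta values (this is not Keras's internal code):

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad           # momentum: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # variance: moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias corrections for the early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # combined update of the weights
    return w, m, v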



How the whole Neural Network works -


Imagine we have built a model to classify whether a student will pass or fail. The structure, created by defining the sequence of layers with their numbers of neurons, activation functions, and input and output shapes, is initialized with random weights at the beginning.

The weights, which determine the influence of a neuron on the next neuron or on the final output, are updated by the network during the learning process. A network with randomized weights and a defined structure is the starting point for a model. The model can make a prediction at this point, but it would be of almost no value.

The model takes one training sample and uses its values as inputs to the neurons in the first layer, which produce outputs with the defined activation function. These outputs become the inputs for the next layer, and so on. The output of the final layer is the prediction for the training sample. Now the loss function does its part and helps the network understand how well or poorly the current set of weights has performed on the training sample. The next step is to reduce the loss, and the optimizer function helps in this step.

The optimizer is a mathematical algorithm that uses derivatives, partial derivatives, and the chain rule to determine how much the loss will change if the weights of the neurons are changed by a small amount. The computation of one training sample from the input layer to the output layer is called a pass. A batch is a collection of training samples from the entire input.

The network updates its weights after processing all samples in a batch; this is called an iteration. Processing all the training samples provided in the input data, with batch-by-batch weight updates, is called an epoch. After processing all the epochs specified for the network, we get the final result: the predicted data.
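Putting all the pieces together, here is a minimal end-to-end sketch for the pass/fail example (random data stands in for real student features, and the layer sizes are illustrative):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Hypothetical data: 100 students, 15 features each, binary pass/fail labels
X = np.random.rand(100, 15)
y = np.random.randint(0, 2, size=(100, 1))

model = Sequential()
model.add(Dense(10, input_dim=15, activation="relu"))
model.add(Dropout(rate=0.1))
model.add(Dense(1, activation="sigmoid"))   # one output neuron for pass/fail

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, batch_size=10, epochs=5)    # 10 iterations per epoch, for 5 epochs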

