Recurrent Neural Networks
• Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequences of data.
  They work especially well on sequential tasks such as time series, speech, and natural language.
• An RNN works on the principle of saving the output of a particular layer and feeding it back to the input, so that
  the prediction at each step can depend on what the network has already seen.
• Below is how you can convert a Feed-Forward Neural Network into a Recurrent Neural Network:
                                 Fig: Simple Recurrent Neural Network
   Recurrent Neural Networks (Cont…)
The nodes in the different layers of the neural network are compressed to form a single recurrent layer. A, B, and C
are the parameters of the network.
   Recurrent Neural Networks (Cont…)
Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output layer. A, B, and C are the network
parameters used to improve the output of the model. At any given time t, the hidden state combines the current
input x(t) with the state carried over from the previous step, and the output at each step is fed back into the
network to improve the next output.
                            Fig: Fully connected Recurrent Neural Network
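As a concrete sketch of this recurrence (a minimal NumPy illustration; the names Wxh, Whh, Why and the layer sizes are
assumptions made here, standing in for the parameters A, B, and C), one step of a simple RNN can be written as:

    import numpy as np

    # Illustrative sizes; these are assumptions, not values from the slides.
    input_size, hidden_size, output_size = 4, 8, 3

    rng = np.random.default_rng(0)
    Wxh = 0.1 * rng.standard_normal((hidden_size, input_size))   # input-to-hidden weights (role of "A")
    Whh = 0.1 * rng.standard_normal((hidden_size, hidden_size))  # hidden-to-hidden weights (role of "B")
    Why = 0.1 * rng.standard_normal((output_size, hidden_size))  # hidden-to-output weights (role of "C")
    bh = np.zeros(hidden_size)
    by = np.zeros(output_size)

    def rnn_step(x_t, h_prev):
        # The new hidden state mixes the current input with the previous hidden state.
        h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
        y_t = Why @ h_t + by
        return h_t, y_t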
How Recurrent Neural Networks Work
• In Recurrent Neural Networks, the information cycles through a loop to the middle hidden layer.
• The input layer ‘x’ takes in the input to the neural network, processes it, and passes it on to the middle layer.
• The middle layer ‘h’ can consist of multiple hidden layers, each with its own activation function, weights, and
  biases. In an ordinary feed-forward network these hidden layers act independently of one another, i.e., the
  network has no memory; a recurrent neural network changes this.
• The Recurrent Neural Network standardizes the different activation functions, weights, and biases so that each
  hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it creates one and loops
  over it as many times as required (see the sketch after the figure below).
                               Fig: Working of Recurrent Neural Network
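A minimal sketch of that idea, reusing the rnn_step function and parameters from the earlier sketch: one set of shared
weights is looped over every time step of the sequence.

    def rnn_forward(xs, h0):
        # Run the same cell, with the same weights, over every time step of the sequence.
        h = h0
        outputs = []
        for x_t in xs:
            h, y_t = rnn_step(x_t, h)   # identical parameters reused at each step
            outputs.append(y_t)
        return outputs, h

    xs = rng.standard_normal((5, input_size))                 # a toy sequence of 5 time steps
    outputs, h_final = rnn_forward(xs, np.zeros(hidden_size))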
Types of Recurrent Neural Networks
There are four types of Recurrent Neural Networks:
• One to One
• One to Many
• Many to One
• Many to Many
One to One RNN
This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning
problems that have a single input and a single output.
Types of Recurrent Neural Networks (Cont…)
One to Many RNN
This type of neural network has a single input and multiple outputs. An example of this is image captioning, where a
single image is used to generate a sequence of words.
Many to One RNN
This RNN takes a sequence of inputs and generates a single output.
Sentiment analysis is a good example of this kind of network, where a given
sentence can be classified as expressing positive or negative sentiment.
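As a hedged illustration of the many-to-one pattern (PyTorch is used here only for convenience, and the vocabulary and
layer sizes are made-up values), the whole token sequence is read but only the final hidden state feeds the classifier:

    import torch.nn as nn

    class SentimentRNN(nn.Module):
        def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128, num_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
            self.classify = nn.Linear(hidden_dim, num_classes)

        def forward(self, token_ids):                  # token_ids: (batch, seq_len) of word indices
            embedded = self.embed(token_ids)           # (batch, seq_len, embed_dim)
            _, h_last = self.rnn(embedded)             # h_last: (1, batch, hidden_dim), the final hidden state
            return self.classify(h_last.squeeze(0))    # one prediction per input sequence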
Types of Recurrent Neural Networks (Cont…)
Many to Many RNN
This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example.
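For contrast, a many-to-many sketch (same made-up sizes as above) keeps the output at every time step, producing one
prediction per input position as in sequence tagging; real machine translation usually adds an encoder-decoder
structure on top of this idea.

    class SequenceTagger(nn.Module):
        # Many-to-many: emit one prediction per time step.
        def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128, num_tags=20):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
            self.tag = nn.Linear(hidden_dim, num_tags)

        def forward(self, token_ids):                       # (batch, seq_len)
            outputs, _ = self.rnn(self.embed(token_ids))    # (batch, seq_len, hidden_dim): one vector per step
            return self.tag(outputs)                        # (batch, seq_len, num_tags)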
Two Issues of Standard RNNs
1. Vanishing Gradient Problem
2. Exploding Gradient Problem
Vanishing Gradient Problem
Recurrent Neural Networks enable you to model time-dependent and sequential data problems, such as stock market
prediction, machine translation, and text generation. You will find, however, that RNNs are hard to train because of
gradient problems.
RNNs suffer from the problem of vanishing gradients. The gradients carry the information used to update the RNN's
parameters, and when the gradient becomes too small, the parameter updates become insignificant. This makes learning
over long data sequences difficult.
1. As the RNN trains, the gradients (used to adjust the model’s weights) become very small.
2. This makes it hard for the model to learn or remember information from earlier parts of the sequence.
3. The model repeatedly multiplies small numbers during backpropagation, making the gradients shrink to almost
   zero (see the numeric example below).
Impact:
The RNN forgets long-term context and only focuses on recent inputs.
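A tiny numeric illustration of point 3 above; the factor 0.9 is just an assumed per-step gradient factor smaller than 1:

    # Illustrative only: treat 0.9 as the gradient factor contributed by each time step.
    factor, steps = 0.9, 50
    print(factor ** steps)   # ~0.005: the gradient signal from 50 steps back has almost vanished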
Two Issues of Standard RNNs (Cont…)
1. Vanishing Gradient Problem
2. Exploding Gradient Problem
Exploding Gradient Problem
While training a neural network, if the gradient tends to grow exponentially instead of decaying, this is called an
exploding gradient. This problem arises when large error gradients accumulate, resulting in very large updates to the
neural network's weights during training.
Long training times, poor performance, and low accuracy are the major consequences of gradient problems.
• Sometimes, the gradients become extremely large during training.
• This makes the model's weight updates spiral out of control, causing erratic outputs.
• The model repeatedly multiplies large numbers, making the gradients grow larger and larger (see the example below).
Impact:
Training becomes unstable, and the model fails to learn properly.
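The mirror-image calculation, followed by the most common remedy, gradient clipping; the clipping call shown is
PyTorch's standard torch.nn.utils.clip_grad_norm_, and the threshold of 1.0 is simply a typical choice:

    # Illustrative only: a per-step gradient factor slightly above 1 blows up over a long sequence.
    factor, steps = 1.1, 50
    print(factor ** steps)   # ~117: the same signal explodes after 50 steps

    # Common remedy in a PyTorch training loop: rescale gradients before the optimizer step.
    # loss.backward()
    # torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # max_norm=1.0 is a typical choice
    # optimizer.step()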
Issue                 Cause                                  Impact                                Solution
Vanishing Gradient    Weights < 1 (during backpropagation)   Cannot learn long-term dependencies   LSTMs, GRUs, ReLU, Clipping
Exploding Gradient    Weights > 1 (during backpropagation)   Training instability                  Gradient Clipping, Regularization
Long Short-Term Memory (LSTM)
• Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) designed to better handle
  the vanishing gradient problem and learn long-term dependencies in sequential data. LSTMs are particularly useful
  for tasks like language modeling, text generation, machine translation, and time-series forecasting.
Why LSTMs?
Standard RNNs struggle to learn long-term dependencies because their gradients can either vanish (become too small)
or explode (become too large) during backpropagation. This makes them ineffective for tasks where context over long
sequences is important. LSTMs overcome this limitation through their unique architecture that allows them to
remember information for longer periods.
Long Short-Term Memory (LSTM)
Structure of LSTM
Cell State (Ct):
      The cell state acts as the memory of the LSTM. It carries information across time steps and can be modified by
      different gates. This is what allows LSTMs to maintain long-term dependencies.
Hidden State (ht):
      The hidden state is used for the output at each time step and is influenced by the cell state.
Gates:
      Gates are neural network layers that control the flow of information through the cell state.
      They use the sigmoid activation, σ(x) = 1 / (1 + e^(-x)), or the tanh activation, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
      The gates include:
           Forget Gate: Decides what information from the cell state should be discarded.
           Input Gate: Decides what new information should be added to the cell state.
           Output Gate: Decides what part of the cell state should be output as the hidden state.
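A minimal NumPy sketch of one LSTM step under these definitions (the weight and bias names are illustrative; each
matrix acts on the concatenation of the previous hidden state and the current input):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
        # Each gate looks at the previous hidden state and the current input.
        z = np.concatenate([h_prev, x_t])
        f_t = sigmoid(Wf @ z + bf)          # forget gate: what to discard from the old cell state
        i_t = sigmoid(Wi @ z + bi)          # input gate: what new information to store
        c_hat = np.tanh(Wc @ z + bc)        # candidate values for the cell state
        c_t = f_t * c_prev + i_t * c_hat    # updated cell state Ct (the long-term memory)
        o_t = sigmoid(Wo @ z + bo)          # output gate: what part of the cell state to expose
        h_t = o_t * np.tanh(c_t)            # new hidden state ht
        return h_t, c_t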
Gated Recurrent Unit (GRU) Networks
• GRU is another type of RNN that is designed to address the vanishing gradient problem.
• It has two gates: the reset gate and the update gate.
• The reset gate determines how much of the previous state should be forgotten, while the update gate determines
  how much of the new state should be remembered.
• This allows the GRU network to selectively update its internal state based on the input sequence.
How GRUs Work in Simple Terms
Think of GRUs as having a mechanism to decide what to remember and what to forget at each step:
     • Update Gate: Controls how much of the past should be kept and how much should be replaced with new
        information.
     • Reset Gate: Helps decide how much of the past should be ignored when generating the new hidden state.
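A matching sketch of one GRU step, reusing the sigmoid helper from the LSTM sketch above (again with illustrative
weight names):

    def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
        # The update gate blends old and new state; the reset gate limits the old state's influence.
        z_in = np.concatenate([h_prev, x_t])
        z_t = sigmoid(Wz @ z_in + bz)                                     # update gate
        r_t = sigmoid(Wr @ z_in + br)                                     # reset gate
        h_hat = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]) + bh)    # candidate hidden state
        h_t = (1.0 - z_t) * h_prev + z_t * h_hat                          # interpolate old state and candidate
        return h_t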
Compare GRU vs LSTM
Here is a comparison of Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks
Structure
    GRU: Simpler structure with two gates (update and reset gate).
    LSTM: More complex structure with three gates (input, forget, and output gate).
Parameters
    GRU: Fewer parameters (3 weight matrices: update gate, reset gate, and candidate hidden state).
    LSTM: More parameters (4 weight matrices: input, forget, and output gates, and candidate cell state).
Training
    GRU: Faster to train.
    LSTM: Slower to train.
Space Complexity
    GRU: In most cases, GRUs use less memory due to their simpler structure and fewer parameters, making them better suited to large datasets or long sequences.
    LSTM: The more complex structure and larger number of parameters can require more memory and be less efficient for large datasets or sequences.
Performance
    GRU: Generally performs similarly to LSTM on many tasks; in some cases GRU has been shown to outperform LSTM and vice versa. It is best to try both and see which works better for your dataset and task.
    LSTM: Generally performs well on many tasks but is more computationally expensive and requires more memory. LSTM has advantages over GRU in natural language understanding and machine translation tasks.
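In practice the two are close to drop-in replacements for each other. A hedged PyTorch sketch (made-up sizes) shows
that only the recurrent layer changes, which makes the "try both and compare" advice cheap to follow:

    import torch.nn as nn

    def make_model(cell="gru", vocab_size=10_000, embed_dim=64, hidden_dim=128, num_classes=2):
        # Identical surrounding architecture; only the recurrent layer differs.
        rnn_cls = nn.GRU if cell == "gru" else nn.LSTM
        return nn.ModuleDict({
            "embed": nn.Embedding(vocab_size, embed_dim),
            "rnn": rnn_cls(embed_dim, hidden_dim, batch_first=True),
            "head": nn.Linear(hidden_dim, num_classes),
        })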
Thank You