DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
Anna University Regulation: 2021
AD3501 - DEEP LEARNING
III Year/V Semester
UNIT III - RECURRENT NEURAL NETWORKS
UNIT III RECURRENT NEURAL NETWORKS
Unfolding Graphs -- RNN Design Patterns: Acceptor -- Encoder -- Transducer; Gradient
Computation -- Sequence Modeling Conditioned on Contexts -- Bidirectional RNN -- Sequence to
Sequence RNN -- Deep Recurrent Networks -- Recursive Neural Networks -- Long Term
Dependencies; Leaky Units: Skip connections and dropouts; Gated Architecture: LSTM.
1. What is the purpose of unfolding in RNNs?
A:
Unfolding (unrolling) converts an RNN's recurrent structure into an equivalent feedforward graph in
which the same cell is replicated once per time step. This makes temporal dependencies explicit, lets
the network process a sequence step by step, and enables gradient computation via backpropagation
through time (BPTT).
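A minimal NumPy sketch of this idea (the tanh cell and all dimensions below are illustrative assumptions, not part of the syllabus): the loop body applies the same cell once per time step, which is exactly the unrolled feedforward chain that BPTT differentiates.

import numpy as np

# Illustrative dimensions (assumed for this sketch).
T, input_dim, hidden_dim = 5, 3, 4
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b_h = np.zeros(hidden_dim)

x = rng.normal(size=(T, input_dim))   # one input sequence of length T
h = np.zeros(hidden_dim)              # initial hidden state h_0

# Unfolding: the same weights are reused at every time step, producing the
# feedforward chain h_1, ..., h_T through which gradients are propagated.
hidden_states = []
for t in range(T):
    h = np.tanh(W_xh @ x[t] + W_hh @ h + b_h)
    hidden_states.append(h)

print(np.stack(hidden_states).shape)  # (T, hidden_dim)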
2. What is an RNN acceptor?
A:
An RNN acceptor is a design pattern where the network takes an input sequence and produces a single
output, such as a binary classification (e.g., sentiment analysis). It processes the entire sequence and
uses the final hidden state to make predictions.
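A minimal PyTorch sketch of the acceptor pattern (layer sizes and the two-class sentiment-style head are assumptions for illustration): only the final hidden state feeds the classifier.

import torch
import torch.nn as nn

class RNNAcceptor(nn.Module):
    """Reads a whole sequence and emits one prediction from the final hidden state."""
    def __init__(self, input_dim=8, hidden_dim=16, num_classes=2):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                 # x: (batch, seq_len, input_dim)
        _, h_n = self.rnn(x)              # h_n: final hidden state, (1, batch, hidden_dim)
        return self.classifier(h_n[-1])   # one score vector per sequence

model = RNNAcceptor()
scores = model(torch.randn(4, 10, 8))     # 4 sequences of length 10
print(scores.shape)                        # torch.Size([4, 2])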
3. What is an RNN encoder?
A:
An RNN encoder processes an input sequence and compresses it into a fixed-length vector
representation, called the context vector. This vector encodes information about the entire sequence
and is often used as input for another network, such as in encoder-decoder architectures for machine
translation.
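A minimal sketch of the encoder pattern (sizes are assumptions): the final hidden state is returned as the fixed-length context vector; the full encoder-decoder pairing appears under Question 8.

import torch
import torch.nn as nn

encoder = nn.GRU(input_size=8, hidden_size=32, batch_first=True)

x = torch.randn(4, 15, 8)        # 4 input sequences of length 15
_, h_n = encoder(x)              # h_n: (1, batch, hidden_size)
context = h_n[-1]                # fixed-length context vector per sequence
print(context.shape)             # torch.Size([4, 32])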
4. What is a transducer in RNNs?
A:
An RNN transducer maps an input sequence to an output sequence of the same or different length. It
processes sequences step-by-step, generating corresponding outputs. Examples include time-series
forecasting and speech-to-text systems.
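A minimal sketch of the transducer pattern (a same-length tagging setup is assumed): an output is produced at every time step from that step's hidden state.

import torch
import torch.nn as nn

class RNNTransducer(nn.Module):
    """Maps an input sequence to an output sequence of the same length."""
    def __init__(self, input_dim=8, hidden_dim=16, output_dim=5):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):              # x: (batch, seq_len, input_dim)
        outputs, _ = self.rnn(x)       # hidden state at every time step
        return self.out(outputs)       # one output vector per time step

y = RNNTransducer()(torch.randn(4, 10, 8))
print(y.shape)                          # torch.Size([4, 10, 5])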
5. What is backpropagation through time (BPTT)?
A:
BPTT is an extension of standard backpropagation for training RNNs. It computes gradients by unrolling
the RNN across all time steps and applying the chain rule to propagate errors backward through time.
This enables the network to learn temporal dependencies.
6. What are vanishing and exploding gradients in RNNs?
A:
● Vanishing gradients: gradients become too small during backpropagation, preventing the
network from learning long-term dependencies.
● Exploding gradients: gradients grow uncontrollably large, destabilizing training.
Both issues arise from the repeated multiplication of gradients across time steps during BPTT.
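A common remedy for exploding gradients is gradient clipping; the sketch below (the model, placeholder loss, and clipping threshold are assumptions for illustration) rescales the gradient norm before each optimizer step.

import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 50, 8)                       # long sequences stress BPTT
outputs, _ = model(x)
loss = outputs.pow(2).mean()                    # placeholder loss for the sketch

optimizer.zero_grad()
loss.backward()                                 # BPTT: gradients flow through all 50 steps
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # tame exploding gradients
optimizer.step()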
7. What are bidirectional RNNs?
A:
Bidirectional RNNs process sequences in both forward and backward directions by maintaining two
hidden layers for each time step. This allows the network to use both past and future context,
improving performance on tasks like speech recognition and named entity recognition.
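A minimal sketch (sizes assumed): setting bidirectional=True runs a forward and a backward pass over the sequence and concatenates the two hidden states at every time step.

import torch
import torch.nn as nn

birnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)

x = torch.randn(4, 10, 8)
outputs, _ = birnn(x)
# Each time step carries forward and backward context, concatenated:
print(outputs.shape)    # torch.Size([4, 10, 32]) = 2 * hidden_size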
8. What is sequence-to-sequence (Seq2Seq) modeling?
A:
Seq2Seq is a framework where an encoder RNN converts an input sequence into a context vector, and
a decoder RNN generates an output sequence. It is widely used in tasks like neural machine translation
and text summarization.
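A minimal encoder-decoder sketch (vocabulary-free, with assumed sizes and a fixed output length): the encoder's final hidden state is the context vector that initializes the decoder.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, input_dim=8, hidden_dim=32, output_dim=8):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(output_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, src, target_len):
        _, context = self.encoder(src)                 # context vector from the encoder
        batch = src.size(0)
        dec_input = torch.zeros(batch, 1, self.out.out_features)  # stand-in "start" token
        hidden, outputs = context, []
        for _ in range(target_len):                    # generate one step at a time
            dec_out, hidden = self.decoder(dec_input, hidden)
            step = self.out(dec_out)
            outputs.append(step)
            dec_input = step                           # feed the prediction back in
        return torch.cat(outputs, dim=1)

y = Seq2Seq()(torch.randn(4, 12, 8), target_len=6)
print(y.shape)   # torch.Size([4, 6, 8])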
9. What are deep recurrent networks?
A:
Deep recurrent networks are RNNs with multiple stacked recurrent layers. The additional layers
enhance the network's ability to learn complex patterns and hierarchical representations from
sequential data, but they also require more computational resources.
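A minimal sketch (sizes assumed): the num_layers argument stacks recurrent layers so each layer's hidden sequence becomes the next layer's input.

import torch
import torch.nn as nn

deep_rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=3, batch_first=True)

x = torch.randn(4, 10, 8)
outputs, (h_n, c_n) = deep_rnn(x)
print(outputs.shape)   # torch.Size([4, 10, 16])  top-layer hidden states
print(h_n.shape)       # torch.Size([3, 4, 16])   one final hidden state per layer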
10. What are recursive neural networks?
A:
Recursive neural networks process hierarchical structures such as parse trees in natural language
processing (NLP). Unlike RNNs, which operate on linear chains, they operate on tree-structured inputs,
making them suitable for tasks like sentence parsing and scene graph generation.
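A minimal sketch of recursive composition over a binary parse tree (the tree format, the tanh composition, and all sizes are assumptions): the same shared weights combine child representations bottom-up into a single sentence vector.

import numpy as np

rng = np.random.default_rng(0)
dim = 4
W = rng.normal(scale=0.1, size=(dim, 2 * dim))   # shared composition weights
b = np.zeros(dim)

def compose(node):
    """Node is either a leaf vector or a (left, right) pair; combine children bottom-up."""
    if isinstance(node, np.ndarray):
        return node                               # leaf: word embedding
    left, right = node
    children = np.concatenate([compose(left), compose(right)])
    return np.tanh(W @ children + b)              # parent representation

# Parse tree for "(the cat) sat" with random stand-in embeddings.
the, cat, sat = (rng.normal(size=dim) for _ in range(3))
sentence_vec = compose(((the, cat), sat))
print(sentence_vec.shape)    # (4,)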
11. Why do RNNs struggle with long-term dependencies?
A:
Standard RNNs have difficulty learning long-term dependencies due to the vanishing gradient problem,
where gradients diminish exponentially as they are propagated back through many time steps, leading
to poor learning of distant dependencies.
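A small NumPy demonstration of why this happens (the random recurrent matrix, its scale, and the 50-step horizon are assumptions): the gradient involves a product of one per-step Jacobian per time step, and the norm of that product shrinks or grows exponentially with the number of steps.

import numpy as np

rng = np.random.default_rng(0)
hidden_dim, steps = 8, 50

W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # recurrent weight matrix

# BPTT multiplies one Jacobian per time step; here we track the norm of that product
# (ignoring the activation's diagonal factor for simplicity).
product = np.eye(hidden_dim)
for t in range(1, steps + 1):
    product = product @ W_hh
    if t % 10 == 0:
        print(f"after {t:2d} steps, gradient-product norm = {np.linalg.norm(product):.2e}")
# With scale 0.1 the norm collapses toward zero (vanishing); a larger scale would make it explode.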
12. What are leaky units in RNNs?
A:
Leaky units are hidden units with a linear self-connection whose weight is close to one, so their state
behaves like a running average of past activations. Because this near-linear path lets gradients persist
over many time steps, leaky units mitigate the vanishing gradient problem and improve the network's
ability to capture long-term dependencies.
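A minimal sketch of a leaky unit's state update (the value of alpha and the scalar setup are assumptions): the self-connection with weight alpha near one keeps a running average whose gradient path decays slowly.

import numpy as np

rng = np.random.default_rng(0)
T, alpha = 100, 0.95            # alpha close to 1 => long memory

signal = rng.normal(size=T)     # per-step "new information" (stand-in for a hidden update)
state = 0.0
for t in range(T):
    # Leaky integration: mostly keep the old state, mix in a little of the new value.
    state = alpha * state + (1.0 - alpha) * signal[t]

# The gradient of the final state w.r.t. an input k steps back scales like alpha**k,
# which decays far more slowly than the Jacobian products of a standard tanh RNN.
print(state)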
13. What is dropout in RNNs?
A:
Dropout is a regularization technique where neurons are randomly deactivated during training to
prevent overfitting. In RNNs, dropout is typically applied to non-recurrent connections to
preserve temporal dependencies.
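A minimal sketch (sizes and rate assumed): in PyTorch, the dropout argument of a stacked LSTM is applied to the outputs passed between layers, i.e. the non-recurrent connections, not to the hidden-to-hidden loop.

import torch
import torch.nn as nn

# Dropout here acts between stacked layers (non-recurrent connections only).
rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, dropout=0.3, batch_first=True)

x = torch.randn(4, 10, 8)
rnn.train()                      # dropout is active only in training mode
train_out, _ = rnn(x)
rnn.eval()                       # dropout is disabled at evaluation time
eval_out, _ = rnn(x)
print(train_out.shape, eval_out.shape)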
14. What are skip connections in RNNs?
A:
Skip connections add direct links between non-adjacent layers, or between time steps that are more
than one step apart, allowing information to bypass intermediate transformations. This improves
gradient flow, reduces vanishing gradient issues, and helps the network learn long-term dependencies.
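A minimal sketch of a skip (residual) connection around a recurrent layer (sizes assumed; the input and hidden widths are kept equal so the addition is valid): the identity path gives gradients a direct route past the layer.

import torch
import torch.nn as nn

class ResidualRNNLayer(nn.Module):
    """An RNN layer whose input is added back to its output (a skip connection)."""
    def __init__(self, dim=16):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):              # x: (batch, seq_len, dim)
        out, _ = self.rnn(x)
        return x + out                 # skip connection: gradients can bypass the GRU

y = ResidualRNNLayer()(torch.randn(4, 10, 16))
print(y.shape)    # torch.Size([4, 10, 16])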
15. What is the architecture of an LSTM?
A:
LSTM (Long Short-Term Memory) networks consist of memory cells and three gates (input, forget, and
output gates). These gates regulate the flow of information, allowing the network to retain or forget
data and enabling learning of long-term dependencies.
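A minimal NumPy sketch of a single LSTM cell step (the sigmoid/tanh choices follow the standard formulation; all weights are random stand-ins): the three gates decide what to forget, what to write into the cell, and what to expose as the hidden state.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
concat_dim = input_dim + hidden_dim

# One weight matrix per gate plus the candidate update (random stand-ins).
W_f, W_i, W_o, W_c = (rng.normal(scale=0.1, size=(hidden_dim, concat_dim)) for _ in range(4))
b_f = b_i = b_o = b_c = np.zeros(hidden_dim)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W_f @ z + b_f)            # forget gate: what to discard from the cell
    i = sigmoid(W_i @ z + b_i)            # input gate: what new information to store
    o = sigmoid(W_o @ z + b_o)            # output gate: what to expose as the hidden state
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell update
    c_t = f * c_prev + i * c_tilde        # new cell state
    h_t = o * np.tanh(c_t)                # new hidden state
    return h_t, c_t

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_step(rng.normal(size=input_dim), h, c)
print(h.shape, c.shape)    # (4,) (4,)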
16. How does a GRU differ from an LSTM?
A:
Gated Recurrent Units (GRUs) simplify LSTMs by combining the input and forget gates into a single
update gate and removing the separate memory cell. GRUs are computationally less expensive while
achieving similar performance.
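A quick way to see the difference in cost (sizes assumed): for the same dimensions, a GRU layer holds roughly three gate/update blocks of parameters versus the LSTM's four.

import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
gru = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

count = lambda m: sum(p.numel() for p in m.parameters())
print("LSTM parameters:", count(lstm))   # 4 gate/update blocks
print("GRU parameters: ", count(gru))    # 3 gate/update blocks (~25% fewer)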
17. What is the purpose of the forget gate in LSTMs?
A:
The forget gate in LSTMs decides which information from the memory cell to discard. It ensures that
irrelevant information does not clutter the memory, improving the network's ability to focus on
important features.
18. What is the role of the context vector in Seq2Seq models?
A:
The context vector in Seq2Seq models summarizes the entire input sequence into a fixed-length
representation, which is passed to the decoder to generate the output sequence. It encodes the
information the decoder needs to produce the target sequence, for example in machine translation.
19. What are the advantages of bidirectional RNNs?
A:
Bidirectional RNNs leverage both past and future context, improving the network's ability to
understand sequences where the meaning of a token depends on both preceding and succeeding
elements, such as in speech and language tasks.
20. What are the applications of recursive neural networks?
A:
Recursive neural networks are used for:
● Sentence parsing: Building syntax trees for NLP tasks.
● Scene graph generation: Understanding hierarchical relationships in images.
● Hierarchical sentiment analysis: Analyzing sentiment across structured text data.
PART-B
1. Explain the concept of unfolding graphs in recurrent neural networks (RNNs). Discuss how
unfolding transforms RNNs into feedforward networks and aids in gradient computation during
training.
2. Describe the different design patterns of RNNs: acceptor, encoder, and transducer. Compare
their architectures, use cases, and applications in sequence modeling.
3. Explain how gradients are computed in RNNs using backpropagation through time (BPTT). Discuss
the challenges of vanishing and exploding gradients and their impact on training.
4. What is sequence modeling conditioned on contexts? Explain how contextual information
influences RNN outputs and provide examples of its applications in tasks like machine translation and
speech recognition.
5. Describe the architecture of bidirectional RNNs. Discuss their advantages over standard RNNs
and provide examples of tasks where bidirectional RNNs excel.
6. Explain the sequence-to-sequence RNN architecture. Discuss the role of encoder-decoder models
in sequence-to-sequence tasks like neural machine translation.
7. What are deep recurrent networks? Explain how stacking multiple RNN layers
improves representational power and discuss the challenges associated with training
deep RNNs.
8. Describe the architecture and working of recursive neural networks. Compare recursive networks
with recurrent networks and discuss their applications, such as parsing natural language.
9. What are long-term dependencies in sequence modeling? Discuss why standard RNNs struggle
with learning them and the role of advanced architectures like LSTMs in addressing this challenge.
10. What are leaky units in RNNs? Explain how skip connections and dropouts are incorporated
to address issues like vanishing gradients and improve learning efficiency.