
UNIT – IV
RNN-LSTM

TY AI-DS Subject: Principles of Deep Learning


UNIT IV

Recurrent Neural Networks:
• Sequences of Unequal Length
• Learning with Recurrent Neural Networks
• Adding Feedback Loops and Unfolding
• Building a Recurrent Neural Network
• Case Study: Long Short-Term Memory, Gated Recurrent Unit
Recurrent Neural Networks (RNN)

Why Recurrent Neural Networks?

RNNs were created because feed-forward neural networks have a few limitations:
• They cannot handle sequential data
• They consider only the current input
• They cannot memorize previous inputs

The RNN solves these issues. An RNN can handle sequential data, accepting the current input together with previously received inputs. RNNs can memorize previous inputs thanks to their internal memory.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
NN Working

(Figure slide; source: https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/)

RNN Requirement

A feed-forward NN requires a fixed set of features at the input layer. How can a variable number of features be handled? With an RNN.
In the figure, the blue stock has data for days 1 to 9 (features) to predict the day-10 stock value, whereas the other stock has data only for days 5 to 9.
RNN vs NN

Also, if the data is sequential (the next prediction depends on prior ones), we need a special model that remembers previous predictions (state) and uses them for the next prediction (feedback): the RNN.

(Figure slides; source: https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/)
RNN Working

(Figure slides; source: https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/)
• Regardless of how many times we unroll the network, the weights and biases are shared across all time steps.

● In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.
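To make the shared-weights point concrete, here is a minimal NumPy sketch of an unrolled vanilla RNN forward pass (not taken from the slides; the variable names Wxh, Whh, Why and the toy sizes are illustrative assumptions):

import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, bh, by):
    """Run a vanilla RNN over a sequence.
    The same weights (Wxh, Whh, Why) and biases are reused at every
    time step -- unrolling the network does not create new parameters.
    """
    h = h0
    outputs = []
    for x in xs:                                  # one iteration per time step
        h = np.tanh(Wxh @ x + Whh @ h + bh)       # single tanh layer (standard RNN cell)
        outputs.append(Why @ h + by)              # per-step output
    return outputs, h

# Toy usage: 3 input features, 4 hidden units, 1 output, sequence length 5.
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(5)]
Wxh = 0.1 * rng.standard_normal((4, 3))
Whh = 0.1 * rng.standard_normal((4, 4))
Why = 0.1 * rng.standard_normal((1, 4))
outputs, h_last = rnn_forward(xs, np.zeros(4), Wxh, Whh, Why, np.zeros(4), np.zeros(1))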
Recurrent Neural Networks (RNN)


Humans don’t start their thinking from scratch every second. As you read any essay, you understand each word based on your
understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have
persistence.
Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to
classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its
reasoning about previous events in the film to inform later ones.
Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks (RNN)

• An RNN works on the principle of saving the output of a particular layer and feeding it back to the input in order to help predict the output of the layer.
• Below is how a Feed-Forward Neural Network can be converted into a Recurrent Neural Network:
• The nodes in the different layers of the neural network are compressed to form a single layer of recurrent neural networks. A, B, and C are the parameters of the network.
Recurrent Neural Networks (RNN)
• Here, "x" is the input layer, "h" is the hidden layer, and "y" is the output layer. A, B, and C are the network parameters used to improve the output of the model. At any given time t, the hidden state is computed from the current input x(t) and the previous hidden state h(t-1). The output at each time step is fed back into the network to improve subsequent outputs.
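One common way to write this recurrence, reusing the slide's A, B, and C as weight matrices (which letter corresponds to which matrix is an assumption here, since the figure is not reproduced):

h(t) = tanh( A · h(t-1) + B · x(t) + b_h )
y(t) = C · h(t) + b_y

Here h(t) is the hidden state at time t, x(t) is the current input, and y(t) is the output; the same A, B, and C are used at every time step.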
How Do Recurrent Neural Networks Work?

• In a recurrent neural network, the information cycles through a loop to the middle hidden layer.
How Do Recurrent Neural Networks Work?

• The input layer 'x' takes in the input to the neural network, processes it, and passes it on to the middle layer.
• The middle layer 'h' can consist of multiple hidden layers, each with its own activation functions, weights, and biases. If the parameters of these hidden layers are independent of the previous layers, i.e. the network has no memory, a recurrent neural network can be used instead.
• The recurrent neural network standardizes the activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it creates one and loops over it as many times as required (see the sketch below).
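As a hedged illustration of "one recurrent layer looped over the time steps", here is a minimal tf.keras sketch; the data shapes, layer sizes, and training settings are made-up placeholders, not values from the slides:

import numpy as np
import tensorflow as tf

# 32 sequences, each with 10 time steps of 8 features (illustrative shapes).
x = np.random.rand(32, 10, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 8)),
    # One recurrent layer: the same 16-unit cell (same weights) is applied at every time step.
    tf.keras.layers.SimpleRNN(16),
    tf.keras.layers.Dense(1),        # read a prediction out of the final hidden state
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, verbose=0)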
Feed-Forward Neural Networks vs Recurrent Neural Networks

• A feed-forward neural network allows information to flow only in the forward direction: from the input nodes, through the hidden layers, to the output nodes. There are no cycles or loops in the network.
• In a feed-forward neural network, decisions are based only on the current input. It does not memorize past data, and there is no scope for considering future inputs. Feed-forward neural networks are used for general regression and classification problems.

(Figure: RNN vs. NN)
Recurrent Neural Networks (RNN)
● This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They are the natural neural network architecture to use for such data.
● And they certainly are used! In the last few years, there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning, and more.
● Essential to these successes is the use of "LSTMs", a very special kind of recurrent neural network which works, for many tasks, much better than the standard version.
● Almost all exciting results based on recurrent neural networks are achieved with them. It is these LSTMs that this unit will explore.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Types of Recurrent Neural Networks (RNN)

• There are four types of Recurrent Neural Networks:


• One to One
• One to Many
• Many to One
• Many to Many
Types of Recurrent Neural Networks (RNN)
• One to One: This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning problems that have a single input and a single output.
• One to Many: This type of neural network has a single input and multiple outputs. An example of this is image captioning.
• Many to One: This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a good example of this kind of network, where a given sentence can be classified as expressing positive or negative sentiment.
• Many to Many: This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is a typical example. (A shape sketch for the last two cases is shown below.)
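To make the many-to-one vs many-to-many distinction concrete, here is a minimal tf.keras sketch; the sequence length, feature count, and layer sizes are illustrative assumptions:

import tensorflow as tf

timesteps, features = 10, 8   # illustrative values

# Many to One: the LSTM returns only its final hidden state, so the whole
# sequence is mapped to a single output (e.g. a sentiment score).
many_to_one = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Many to Many: return_sequences=True emits a hidden state at every time step,
# so every input step gets its own output (e.g. a label per token).
many_to_many = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(5, activation="softmax")),
])

dummy = tf.zeros((2, timesteps, features))   # a dummy batch of 2 sequences
print(many_to_one(dummy).shape)              # (2, 1)     -> one output per sequence
print(many_to_many(dummy).shape)             # (2, 10, 5) -> one output per time step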
Long Short Term Memory Networks (LSTM)

● Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning
long-term dependencies.
● LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long
periods of time is practically their default behavior, not something they struggle to learn!
● All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs,
this repeating module will have a very simple structure, such as a single tanh layer.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)

LSTMs also have this chain like structure, but the repeating module has a different structure. Instead of having a single
neural network layer, there are four, interacting in a very special way.

Fig. The repeating module in a standard RNN contains a single layer. Fig. The repeating module in an LSTM contains four interacting layers.
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)

The memory cell is controlled by three gates: the input gate, the forget gate, and the output gate. These gates decide what information to add to, remove from, and output from the memory cell. The input gate controls what information is added to the memory cell. The forget gate controls what information is removed from the memory cell. And the output gate controls what information is output from the memory cell. This allows LSTM networks to selectively retain or discard information as it flows through the network, which allows them to learn long-term dependencies.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
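Written out explicitly (standard notation from the colah post cited above; σ is the sigmoid function and ⊙ is element-wise multiplication):

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)   (forget gate)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)   (input gate)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)   (output gate)

The cell state is then updated as C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(W_C · [h_{t-1}, x_t] + b_C), and the output is h_t = o_t ⊙ tanh(C_t).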
Long Short Term Memory Networks (LSTM)
Green line: long-term memory (also called the cell state), with no weights on its path.
Pink line: short-term memory (the hidden state), with weights.
This first stage of the LSTM, called the forget gate, determines what percentage of the long-term memory is kept (i.e. how much is forgotten). Because of the sigmoid activation, a positive input produces a factor below 1, so the long-term memory value is reduced; a negative input produces a factor near 0, so the long-term memory is effectively erased.

Fig. The repeating module in an LSTM contains four interacting layers.


https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)
The second unit of the LSTM, called the input gate (two boxes), determines the potential long-term memory (right box) and the percentage of that potential memory to remember (left box).
The right box uses a tanh activation; the left box uses a sigmoid.

Fig. The repeating module in an LSTM contains four interacting layers.


https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)
The final unit of the LSTM, called the output gate (two boxes), updates the short-term memory (hidden state).
The right box uses a tanh activation; the left box uses a sigmoid.

Fig. The repeating module in an LSTM contains four interacting layers.


https://colah.github.io/posts/2015-08-Understanding-LSTMs/
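A tiny worked example of the resulting cell-state and hidden-state update (all numbers below are invented purely for illustration; they are not from the slides):

import math

f = 0.9          # forget gate output: keep 90% of the old long-term memory
i = 0.8          # input gate output: admit 80% of the candidate memory
c_tilde = 0.5    # candidate long-term memory (a tanh output)
o = 0.7          # output gate output

c_prev = 2.0                         # previous cell state (long-term memory)
c_new = f * c_prev + i * c_tilde     # 0.9*2.0 + 0.8*0.5 = 2.2
h_new = o * math.tanh(c_new)         # 0.7 * tanh(2.2) ≈ 0.68

print(round(c_new, 2), round(h_new, 2))   # 2.2 0.68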
Long Short Term Memory Networks (LSTM)

(Figure: the repeating module in an LSTM contains four interacting layers; source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

Worked example (figure slides), feeding one day of input at a time:
• For Company A, the Day 1 to Day 4 values are fed into the unrolled LSTM in order; the output after the Day 4 input is the final prediction for Day 5.
• The same LSTM (same weights) is then run on Company B's data: the output after the Day 1 input is the prediction for Day 2, and the output after the Day 4 input is the prediction for Day 5.
Architecture for an LSTM (long-term / short-term memory model)

Figure annotations:
• "Bits of memory" (the cell state)
• Decide what to forget
• Decide what to insert
• Combine with the transformed x_t
• σ: output in [0, 1]; tanh: output in [-1, +1]
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Walkthrough
• What part of memory to "forget" – zero means forget this bit
Walkthrough
• What bits to insert into the next states
• What content to store into the next state
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Walkthrough
• Next memory cell content – a mixture of the not-forgotten part of the previous cell and the insertion

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Walkthrough
• What part of the cell to output
• tanh maps bits to the [-1, +1] range

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Architecture for an LSTM

(Figure: the LSTM cell, showing the previous cell state C_{t-1}, the forget gate f_t, the input gate i_t, and the output gate o_t; the labels (1), (2), (3) correspond to the steps on the next slide.)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Implementing an LSTM

For t = 1, …, T (the standard LSTM equations, as in the colah post linked below):

(1) Gates, computed from the previous hidden state and the current input:
    f_t = σ(W_f · [h_{t-1}, x_t] + b_f),  i_t = σ(W_i · [h_{t-1}, x_t] + b_i),  o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
(2) Cell-state update:
    C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(W_C · [h_{t-1}, x_t] + b_C)
(3) Hidden state / output:
    h_t = o_t ⊙ tanh(C_t)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
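A minimal NumPy sketch of one LSTM step implementing equations (1)–(3) above; stacking the four gate blocks into a single weight matrix and the toy sizes are choices of this sketch, not something specified in the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.
    W has shape (4*H, H+D) and b has shape (4*H,): the forget, input,
    output, and candidate blocks are stacked for compactness.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b    # all four pre-activations at once
    f = sigmoid(z[0*H:1*H])                      # forget gate        (eq. 1)
    i = sigmoid(z[1*H:2*H])                      # input gate         (eq. 1)
    o = sigmoid(z[2*H:3*H])                      # output gate        (eq. 1)
    c_tilde = np.tanh(z[3*H:4*H])                # candidate memory   (eq. 2)
    c_t = f * c_prev + i * c_tilde               # new cell state     (eq. 2)
    h_t = o * np.tanh(c_t)                       # new hidden state   (eq. 3)
    return h_t, c_t

# Toy usage: input size D = 3, hidden size H = 4, a sequence of 6 steps.
rng = np.random.default_rng(0)
D, H = 3, 4
W = 0.1 * rng.standard_normal((4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((6, D)):
    h, c = lstm_step(x, h, c, W, b)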
LSTMs can be used for other sequence tasks

• image captioning
• sequence classification
• named entity recognition
• translation
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Character-level language model

Test time:
• pick a seed character sequence
• generate the next character
• then the next
• then the next …

http://karpathy.github.io/2015/05/21/rnn-effectiveness/
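A hedged sketch of that test-time sampling loop; model.step(char_id, state) is a hypothetical interface assumed to return a probability distribution over the vocabulary together with the updated hidden state (it is not part of any specific library):

import numpy as np

def sample_text(model, seed, vocab, n_chars=200, state=None):
    """Generate text character by character from a trained char-level model.
    `model.step(char_id, state)` is a hypothetical API assumed to return
    (probs, new_state). `seed` must be a non-empty string over `vocab`.
    """
    char_to_id = {c: i for i, c in enumerate(vocab)}
    out = list(seed)
    # Feed the seed sequence first so the hidden state reflects it.
    for ch in seed:
        probs, state = model.step(char_to_id[ch], state)
    # Then repeatedly sample the next character and feed it back in.
    for _ in range(n_chars):
        next_id = np.random.choice(len(vocab), p=probs)
        out.append(vocab[next_id])
        probs, state = model.step(next_id, state)
    return "".join(out)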
Long Short Term Memory Networks (LSTM)

Applications of LSTM:
• Speech Recognition (input is audio, output is text) – as done by Google Assistant, Microsoft Cortana, Apple Siri
• Machine Translation (input is text, output is also text) – as done by Google Translate
• Image Captioning (input is an image, output is text)
• Sentiment Analysis (input is text, output is a rating)
• Music Generation/Synthesis (input is music notes, output is music)
• Video Activity Recognition (input is video, output is the type of activity)
• Time Series Prediction (forecasting)
RNN and LSTM

● Advantages and Disadvantages


CNN vs RNN

https://searchenterpriseai.techtarget.com/feature/CNN-vs-RNN-How-they-differ-and-where-they-overlap
GAN-[content beyond syllabus]

• Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning.
• GANs were first introduced in 2014 by Ian Goodfellow et al., and since then the topic has opened up a new area of research.
• A GAN is an approach to generative modeling using deep learning methods, such as convolutional neural networks.
• Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate new examples that plausibly could have been drawn from the original dataset.
• GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model, which we train to generate new examples, and the discriminator model, which tries to classify examples as either real (from the domain) or fake (generated). The two models are trained together in an adversarial, zero-sum game until the discriminator model is fooled about half the time, meaning the generator model is generating plausible examples. A minimal training-loop sketch is shown below.
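A minimal sketch of that adversarial training loop using tf.keras; the tiny fully connected networks, the random stand-in "real" data, and the optimizer settings are placeholders for illustration, not a recommended GAN setup:

import tensorflow as tf

latent_dim, data_dim = 32, 64   # illustrative sizes

# Generator: noise -> fake sample.  Discriminator: sample -> "real" logit.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(data_dim),
])
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(data_dim,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_batch):
    noise = tf.random.normal([tf.shape(real_batch)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_batch = generator(noise, training=True)
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake_batch, training=True)
        # Discriminator: label real samples 1 and generated samples 0.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        # Generator: try to make the discriminator label its samples as real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss

real_data = tf.random.normal([256, data_dim])   # stand-in for a real dataset
for step in range(100):
    train_step(real_data[:64])

Each step pushes the discriminator to separate real from generated samples and the generator to fool it, which is the zero-sum game described above.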
GAN
Generative Adversarial Networks (GANs) can be broken down into three parts:
• Generative: to learn a generative model, which describes how data is generated in terms of a probabilistic model.
• Adversarial: the training of the model is done in an adversarial setting.
• Networks: deep neural networks are used as the artificial intelligence (AI) algorithms for training.
GAN

• GANs are an exciting and rapidly changing field, delivering on the promise of
generative models in their ability to generate realistic examples across a range of
problem domains, most notably in image-to-image translation tasks such as
translating photos of summer to winter or day to night, and in generating photorealistic
photos of objects, scenes, and people that even humans cannot tell are fake.

• With the invention of GANs, generative models started showing promising results in generating realistic images. GANs have shown tremendous success in computer vision, and in recent times they have started showing promising results in audio and text as well.
• Some of the most popular GAN formulations are:
• Transforming an image from one domain to another (CycleGAN),
• Generating an image from a textual description (text-to-image),
• Generating very high-resolution images (ProgressiveGAN) and many more.
GAN-Types

Basic
• Generative Adversarial Network (GAN)
• Deep Convolutional Generative Adversarial Network (DCGAN)
Extensions
• Conditional Generative Adversarial Network (cGAN)
• Information Maximizing Generative Adversarial Network (InfoGAN)
• Auxiliary Classifier Generative Adversarial Network (AC-GAN)
• Stacked Generative Adversarial Network (StackGAN)
• Context Encoders
• Pix2Pix
Advanced
• Wasserstein Generative Adversarial Network (WGAN)
• Cycle-Consistent Generative Adversarial Network (CycleGAN)
• Progressive Growing Generative Adversarial Network (Progressive GAN)
• Style-Based Generative Adversarial Network (StyleGAN)
• Big Generative Adversarial Network (BigGAN)
