
UNIT – IV
RNN-LSTM

TY AI-DS Subject: Principles of Deep Learning


UNIT IV

Recurrent Neural Networks:
• Sequences of Unequal Length
• Learning with Recurrent Neural Networks
• Adding Feedback Loops and Unfolding
• Building a Recurrent Neural Network
• Case Study: Long Short-Term Memory, Gated Recurrent Unit
Recurrent Neural Networks (RNN)

Why Recurrent Neural Networks?

RNNs were created because feed-forward neural networks have a few limitations:
• They cannot handle sequential data
• They consider only the current input
• They cannot memorize previous inputs

The RNN solves these issues. An RNN can handle sequential data, accepting the current input together with previously received inputs. RNNs can memorize previous inputs thanks to their internal memory.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
NN Working

(Figure slide; source: https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/)

RNN Requirement

A feed-forward NN requires a fixed set of features at the input layer. How can a variable number of features be handled? With an RNN.
In the figure, the blue stock has data for days 1 to 9 (features) to predict the day-10 stock value, whereas the other stock has data only for days 5 to 9.
RNN vs NN

Also, if the data is sequential (the next prediction depends on prior ones), we need a special model that remembers previous predictions (state) and uses them for the next prediction (feedback): the RNN.

(Figure slides; source: https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/)
RNN Working

(Figure slides; source: https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/)
• Regardless of how many times we unroll the network, the weights and biases are shared across all time steps.

● In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.
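To make the shared-weights point concrete, here is a minimal NumPy sketch of an unrolled vanilla RNN forward pass (not taken from the slides; the variable names Wxh, Whh, Why and the toy sizes are illustrative assumptions):

import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, bh, by):
    """Run a vanilla RNN over a sequence.
    The same weights (Wxh, Whh, Why) and biases are reused at every
    time step -- unrolling the network does not create new parameters.
    """
    h = h0
    outputs = []
    for x in xs:                                  # one iteration per time step
        h = np.tanh(Wxh @ x + Whh @ h + bh)       # single tanh layer (standard RNN cell)
        outputs.append(Why @ h + by)              # per-step output
    return outputs, h

# Toy usage: 3 input features, 4 hidden units, 1 output, sequence length 5.
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(5)]
Wxh = 0.1 * rng.standard_normal((4, 3))
Whh = 0.1 * rng.standard_normal((4, 4))
Why = 0.1 * rng.standard_normal((1, 4))
outputs, h_last = rnn_forward(xs, np.zeros(4), Wxh, Whh, Why, np.zeros(4), np.zeros(1))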
Recurrent Neural Networks (RNN)


Humans don’t start their thinking from scratch every second. As you read any essay, you understand each word based on your
understanding of previous words. You don’t throw everything away and start thinking from scratch again. Your thoughts have
persistence.
Traditional neural networks can’t do this, and it seems like a major shortcoming. For example, imagine you want to
classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its
reasoning about previous events in the film to inform later ones.
Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks (RNN)

• An RNN works on the principle of saving the output of a particular layer and feeding it back to the input in order to help predict the output of the layer.
• Below is how a Feed-Forward Neural Network can be converted into a Recurrent Neural Network:
• The nodes in the different layers of the neural network are compressed to form a single layer of recurrent neural networks. A, B, and C are the parameters of the network.
Recurrent Neural Networks (RNN)
• Here, "x" is the input layer, "h" is the hidden layer, and "y" is the output layer. A, B, and C are the network parameters used to improve the output of the model. At any given time t, the hidden state is computed from the current input x(t) and the previous hidden state h(t-1). The output at each time step is fed back into the network to improve subsequent outputs.
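One common way to write this recurrence, reusing the slide's A, B, and C as weight matrices (which letter corresponds to which matrix is an assumption here, since the figure is not reproduced):

h(t) = tanh( A · h(t-1) + B · x(t) + b_h )
y(t) = C · h(t) + b_y

Here h(t) is the hidden state at time t, x(t) is the current input, and y(t) is the output; the same A, B, and C are used at every time step.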
How Do Recurrent Neural Networks Work?

• In a recurrent neural network, the information cycles through a loop to the middle hidden layer.
How Do Recurrent Neural Networks Work?

• The input layer 'x' takes in the input to the neural network, processes it, and passes it on to the middle layer.
• The middle layer 'h' can consist of multiple hidden layers, each with its own activation functions, weights, and biases. If the parameters of these hidden layers are independent of the previous layers, i.e. the network has no memory, a recurrent neural network can be used instead.
• The recurrent neural network standardizes the activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it creates one and loops over it as many times as required (see the sketch below).
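As a hedged illustration of "one recurrent layer looped over the time steps", here is a minimal tf.keras sketch; the data shapes, layer sizes, and training settings are made-up placeholders, not values from the slides:

import numpy as np
import tensorflow as tf

# 32 sequences, each with 10 time steps of 8 features (illustrative shapes).
x = np.random.rand(32, 10, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 8)),
    # One recurrent layer: the same 16-unit cell (same weights) is applied at every time step.
    tf.keras.layers.SimpleRNN(16),
    tf.keras.layers.Dense(1),        # read a prediction out of the final hidden state
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, verbose=0)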
Feed-Forward Neural Networks vs Recurrent Neural Networks

• A feed-forward neural network allows information to flow only in the forward direction: from the input nodes, through the hidden layers, to the output nodes. There are no cycles or loops in the network.
• In a feed-forward neural network, decisions are based only on the current input. It does not memorize past data, and there is no scope for considering future inputs. Feed-forward neural networks are used for general regression and classification problems.

(Figure: RNN vs. NN)
Recurrent Neural Networks (RNN)
● This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They are the natural neural network architecture to use for such data.
● And they certainly are used! In the last few years, there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning, and more.
● Essential to these successes is the use of "LSTMs", a very special kind of recurrent neural network which works, for many tasks, much better than the standard version.
● Almost all exciting results based on recurrent neural networks are achieved with them. It is these LSTMs that this unit will explore.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Types of Recurrent Neural Networks (RNN)

• There are four types of Recurrent Neural Networks:


• One to One
• One to Many
• Many to One
• Many to Many
Types of Recurrent Neural Networks (RNN)
• One to One: This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning problems that have a single input and a single output.
• One to Many: This type of neural network has a single input and multiple outputs. An example of this is image captioning.
• Many to One: This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a good example of this kind of network, where a given sentence can be classified as expressing positive or negative sentiment.
• Many to Many: This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is a typical example. (A shape sketch for the last two cases is shown below.)
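To make the many-to-one vs many-to-many distinction concrete, here is a minimal tf.keras sketch; the sequence length, feature count, and layer sizes are illustrative assumptions:

import tensorflow as tf

timesteps, features = 10, 8   # illustrative values

# Many to One: the LSTM returns only its final hidden state, so the whole
# sequence is mapped to a single output (e.g. a sentiment score).
many_to_one = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Many to Many: return_sequences=True emits a hidden state at every time step,
# so every input step gets its own output (e.g. a label per token).
many_to_many = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(5, activation="softmax")),
])

dummy = tf.zeros((2, timesteps, features))   # a dummy batch of 2 sequences
print(many_to_one(dummy).shape)              # (2, 1)     -> one output per sequence
print(many_to_many(dummy).shape)             # (2, 10, 5) -> one output per time step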
Long Short Term Memory Networks (LSTM)

● Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning
long-term dependencies.
● LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long
periods of time is practically their default behavior, not something they struggle to learn!
● All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs,
this repeating module will have a very simple structure, such as a single tanh layer.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)

LSTMs also have this chain like structure, but the repeating module has a different structure. Instead of having a single
neural network layer, there are four, interacting in a very special way.

Fig. The repeating module in a standard RNN contains a single layer. Fig. The repeating module in an LSTM contains four interacting layers.
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)

The memory cell is controlled by three gates: the input gate, the forget gate, and the output gate. These gates decide what information to add to, remove from, and output from the memory cell. The input gate controls what information is added to the memory cell. The forget gate controls what information is removed from the memory cell. And the output gate controls what information is output from the memory cell. This allows LSTM networks to selectively retain or discard information as it flows through the network, which allows them to learn long-term dependencies.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
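Written out explicitly (standard notation from the colah post cited above; σ is the sigmoid function and ⊙ is element-wise multiplication):

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)   (forget gate)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)   (input gate)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)   (output gate)

The cell state is then updated as C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(W_C · [h_{t-1}, x_t] + b_C), and the output is h_t = o_t ⊙ tanh(C_t).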
Long Short Term Memory Networks (LSTM)
Green line: long-term memory (also called the cell state), with no weights on its path.
Pink line: short-term memory (the hidden state), with weights.
This first stage of the LSTM, called the forget gate, determines what percentage of the long-term memory is kept (i.e. how much is forgotten). Because of the sigmoid activation, a positive input produces a factor below 1, so the long-term memory value is reduced; a negative input produces a factor near 0, so the long-term memory is effectively erased.

Fig. The repeating module in an LSTM contains four interacting layers.


https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)
The second unit of the LSTM, called the input gate (two boxes), determines the potential long-term memory (right box) and the percentage of that potential memory to remember (left box).
The right box uses a tanh activation; the left box uses a sigmoid.

Fig. The repeating module in an LSTM contains four interacting layers.


https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)
The final unit of the LSTM, called the output gate (two boxes), updates the short-term memory (hidden state).
The right box uses a tanh activation; the left box uses a sigmoid.

Fig. The repeating module in an LSTM contains four interacting layers.


https://colah.github.io/posts/2015-08-Understanding-LSTMs/
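A tiny worked example of the resulting cell-state and hidden-state update (all numbers below are invented purely for illustration; they are not from the slides):

import math

f = 0.9          # forget gate output: keep 90% of the old long-term memory
i = 0.8          # input gate output: admit 80% of the candidate memory
c_tilde = 0.5    # candidate long-term memory (a tanh output)
o = 0.7          # output gate output

c_prev = 2.0                         # previous cell state (long-term memory)
c_new = f * c_prev + i * c_tilde     # 0.9*2.0 + 0.8*0.5 = 2.2
h_new = o * math.tanh(c_new)         # 0.7 * tanh(2.2) ≈ 0.68

print(round(c_new, 2), round(h_new, 2))   # 2.2 0.68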
Long Short Term Memory Networks (LSTM)

(Figure: the repeating module in an LSTM contains four interacting layers; source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

Worked example (figure slides), feeding one day of input at a time:
• For Company A, the Day 1 to Day 4 values are fed into the unrolled LSTM in order; the output after the Day 4 input is the final prediction for Day 5.
• The same LSTM (same weights) is then run on Company B's data: the output after the Day 1 input is the prediction for Day 2, and the output after the Day 4 input is the prediction for Day 5.
Architecture for an LSTM (long-term / short-term memory model)

Figure annotations:
• "Bits of memory" (the cell state)
• Decide what to forget
• Decide what to insert
• Combine with the transformed x_t
• σ: output in [0, 1]; tanh: output in [-1, +1]
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Walkthrough
• What part of memory to "forget" – zero means forget this bit
Walkthrough
• What bits to insert into the next states
• What content to store into the next state
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Walkthrough
• Next memory cell content – a mixture of the not-forgotten part of the previous cell and the insertion

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Walkthrough
• What part of the cell to output
• tanh maps bits to the [-1, +1] range

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Architecture for an LSTM

(Figure: the LSTM cell, showing the previous cell state C_{t-1}, the forget gate f_t, the input gate i_t, and the output gate o_t; the labels (1), (2), (3) correspond to the steps on the next slide.)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Implementing an LSTM

For t = 1, …, T (the standard LSTM equations, as in the colah post linked below):

(1) Gates, computed from the previous hidden state and the current input:
    f_t = σ(W_f · [h_{t-1}, x_t] + b_f),  i_t = σ(W_i · [h_{t-1}, x_t] + b_i),  o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
(2) Cell-state update:
    C_t = f_t ⊙ C_{t-1} + i_t ⊙ tanh(W_C · [h_{t-1}, x_t] + b_C)
(3) Hidden state / output:
    h_t = o_t ⊙ tanh(C_t)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
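A minimal NumPy sketch of one LSTM step implementing equations (1)–(3) above; stacking the four gate blocks into a single weight matrix and the toy sizes are choices of this sketch, not something specified in the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.
    W has shape (4*H, H+D) and b has shape (4*H,): the forget, input,
    output, and candidate blocks are stacked for compactness.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b    # all four pre-activations at once
    f = sigmoid(z[0*H:1*H])                      # forget gate        (eq. 1)
    i = sigmoid(z[1*H:2*H])                      # input gate         (eq. 1)
    o = sigmoid(z[2*H:3*H])                      # output gate        (eq. 1)
    c_tilde = np.tanh(z[3*H:4*H])                # candidate memory   (eq. 2)
    c_t = f * c_prev + i * c_tilde               # new cell state     (eq. 2)
    h_t = o * np.tanh(c_t)                       # new hidden state   (eq. 3)
    return h_t, c_t

# Toy usage: input size D = 3, hidden size H = 4, a sequence of 6 steps.
rng = np.random.default_rng(0)
D, H = 3, 4
W = 0.1 * rng.standard_normal((4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((6, D)):
    h, c = lstm_step(x, h, c, W, b)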
LSTMs can be used for other sequence tasks

• image captioning
• sequence classification
• named entity recognition
• translation
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Character-level language model

Test time:
• pick a seed character sequence
• generate the next character
• then the next
• then the next …

http://karpathy.github.io/2015/05/21/rnn-effectiveness/
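A hedged sketch of that test-time sampling loop; model.step(char_id, state) is a hypothetical interface assumed to return a probability distribution over the vocabulary together with the updated hidden state (it is not part of any specific library):

import numpy as np

def sample_text(model, seed, vocab, n_chars=200, state=None):
    """Generate text character by character from a trained char-level model.
    `model.step(char_id, state)` is a hypothetical API assumed to return
    (probs, new_state). `seed` must be a non-empty string over `vocab`.
    """
    char_to_id = {c: i for i, c in enumerate(vocab)}
    out = list(seed)
    # Feed the seed sequence first so the hidden state reflects it.
    for ch in seed:
        probs, state = model.step(char_to_id[ch], state)
    # Then repeatedly sample the next character and feed it back in.
    for _ in range(n_chars):
        next_id = np.random.choice(len(vocab), p=probs)
        out.append(vocab[next_id])
        probs, state = model.step(next_id, state)
    return "".join(out)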
Long Short Term Memory Networks (LSTM)

Applications of LSTM:
• Speech Recognition (input is audio, output is text) – as done by Google Assistant, Microsoft Cortana, Apple Siri
• Machine Translation (input is text, output is also text) – as done by Google Translate
• Image Captioning (input is an image, output is text)
• Sentiment Analysis (input is text, output is a rating)
• Music Generation/Synthesis (input is music notes, output is music)
• Video Activity Recognition (input is video, output is the type of activity)
• Time Series Prediction (forecasting)
RNN and LSTM

● Advantages and Disadvantages


CNN vs RNN

https://searchenterpriseai.techtarget.com/feature/CNN-vs-RNN-How-they-differ-and-where-they-overlap
GAN-[content beyond syllabus]

• Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning.
• GANs were first introduced in 2014 by Ian Goodfellow et al., and since then the topic has opened up a new area of research.
• A GAN is an approach to generative modeling using deep learning methods, such as convolutional neural networks.
• Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate new examples that plausibly could have been drawn from the original dataset.
• GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model, which we train to generate new examples, and the discriminator model, which tries to classify examples as either real (from the domain) or fake (generated). The two models are trained together in an adversarial, zero-sum game until the discriminator model is fooled about half the time, meaning the generator model is generating plausible examples. A minimal training-loop sketch is shown below.
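A minimal sketch of that adversarial training loop using tf.keras; the tiny fully connected networks, the random stand-in "real" data, and the optimizer settings are placeholders for illustration, not a recommended GAN setup:

import tensorflow as tf

latent_dim, data_dim = 32, 64   # illustrative sizes

# Generator: noise -> fake sample.  Discriminator: sample -> "real" logit.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(data_dim),
])
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(data_dim,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_batch):
    noise = tf.random.normal([tf.shape(real_batch)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_batch = generator(noise, training=True)
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake_batch, training=True)
        # Discriminator: label real samples 1 and generated samples 0.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        # Generator: try to make the discriminator label its samples as real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss

real_data = tf.random.normal([256, data_dim])   # stand-in for a real dataset
for step in range(100):
    train_step(real_data[:64])

Each step pushes the discriminator to separate real from generated samples and the generator to fool it, which is the zero-sum game described above.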
GAN
Generative Adversarial Networks (GANs) can be broken down into three parts:
• Generative: to learn a generative model, which describes how data is generated in terms of a probabilistic model.
• Adversarial: the training of the model is done in an adversarial setting.
• Networks: deep neural networks are used as the artificial intelligence (AI) algorithms for training.
GAN

• GANs are an exciting and rapidly changing field, delivering on the promise of
generative models in their ability to generate realistic examples across a range of
problem domains, most notably in image-to-image translation tasks such as
translating photos of summer to winter or day to night, and in generating photorealistic
photos of objects, scenes, and people that even humans cannot tell are fake.

• With the invention of GANs, generative models started showing promising results in generating realistic images. GANs have shown tremendous success in computer vision, and in recent times they have started showing promising results in audio and text as well.
• Some of the most popular GAN formulations are:
• Transforming an image from one domain to another (CycleGAN),
• Generating an image from a textual description (text-to-image),
• Generating very high-resolution images (ProgressiveGAN) and many more.
GAN-Types

Basic
• Generative Adversarial Network (GAN)
• Deep Convolutional Generative Adversarial Network (DCGAN)
Extensions
• Conditional Generative Adversarial Network (cGAN)
• Information Maximizing Generative Adversarial Network (InfoGAN)
• Auxiliary Classifier Generative Adversarial Network (AC-GAN)
• Stacked Generative Adversarial Network (StackGAN)
• Context Encoders
• Pix2Pix
Advanced
• Wasserstein Generative Adversarial Network (WGAN)
• Cycle-Consistent Generative Adversarial Network (CycleGAN)
• Progressive Growing Generative Adversarial Network (Progressive GAN)
• Style-Based Generative Adversarial Network (StyleGAN)
• Big Generative Adversarial Network (BigGAN)
