
🏪

Advancement in Deep Learning

Unit 1:

Reviewing Deep Learning Concepts, NN

Regularization



Batch Normalization

Batch Normalization (BN) is a technique used to address the problem of internal covariate shift in deep neural networks. Internal covariate shift refers to the change in the distribution of the network activations as the parameters of the preceding layers change during training. BN normalizes the activations of each layer by adjusting and scaling them.

Batch Normalization offers several benefits:

Improved Training Speed: BN can accelerate the training process by reducing the internal covariate shift, allowing for higher learning rates and faster convergence.

Stabilized Gradients: BN helps stabilize the gradients, making optimization more robust and less sensitive to weight initialization.

Regularization: BN acts as a form of regularization, reducing the need for other regularization techniques like dropout.

Allows for Deeper Networks: BN enables the training of deeper networks by mitigating the vanishing or exploding gradient problems.

Batch Normalization is typically applied before the activation function in each layer of the network, although it is sometimes applied after the activation, and variants such as Batch Renormalization refine how the normalization statistics are computed. It has become a standard component in many modern deep learning architectures and is widely used in practice to improve training stability and performance.
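A minimal sketch of how BN is typically placed before the activation in a PyTorch layer (the layer sizes here are illustrative assumptions):

import torch
from torch import nn

# A small conv block with Batch Normalization applied before the activation.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(16),   # normalize each of the 16 channels over the batch
    nn.ReLU(inplace=True),
)

x = torch.randn(8, 3, 32, 32)   # batch of 8 RGB images, 32x32
print(block(x).shape)           # torch.Size([8, 16, 32, 32])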

Layer Normalization (out of syllabus)


Layer Normalization (LN) is another technique used to address the problem of
internal covariate shift in deep neural networks, similar to Batch Normalization
(BN). However, unlike BN, which normalizes activations across the mini-batch
dimension, LN normalizes activations across the feature dimension (or layer
dimension) independently for each training example.



Layer Normalization offers benefits similar to Batch Normalization, such as
improved training speed, stabilized gradients, and regularization. However, it
operates independently for each training example rather than across mini-
batches, which can be advantageous in certain scenarios, especially when the
size of the mini-batch is small or when dealing with recurrent neural networks
(RNNs) where the concept of mini-batches is less applicable.
Layer Normalization has found applications in various deep learning
architectures, particularly in scenarios where Batch Normalization may not be
suitable due to constraints on mini-batch size or network architecture.
Additionally, Layer Normalization has been shown to be effective in stabilizing
the training of transformers and recurrent neural networks.
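A minimal PyTorch sketch contrasting the two (the feature size of 10 is an illustrative assumption):

import torch
from torch import nn

x = torch.randn(4, 10)           # batch of 4 examples, 10 features each

bn = nn.BatchNorm1d(10)          # normalizes each feature across the batch dimension
ln = nn.LayerNorm(10)            # normalizes each example across its own 10 features

print(bn(x).shape, ln(x).shape)  # both torch.Size([4, 10]), but the statistics differ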



Weight Initialization Strategies

Learning vs Optimization



In the context of deep learning, "learning" and "optimization" are closely related but distinct concepts.

1. Learning:

In deep learning, "learning" refers to the process by which a model acquires knowledge or understanding from data through training.

This process involves adjusting the parameters (weights and biases) of the model based on the input data and the desired outputs, with the goal of minimizing the difference between the model's predictions and the actual targets.

Learning in deep learning often involves iterative updates to the model parameters using a training algorithm such as stochastic gradient descent (SGD) or one of its variants.

2. Optimization:

Optimization, on the other hand, specifically refers to the process of finding the best set of parameters for a given model with respect to a certain objective function.

In the context of deep learning, this typically involves minimizing a loss function that quantifies the difference between the model's predictions and the actual targets.

Optimization algorithms are used to iteratively adjust the parameters of the model in order to minimize this loss function.

Common optimization algorithms in deep learning include gradient descent, stochastic gradient descent (SGD), Adam, RMSprop, and others.

In summary, learning in deep learning encompasses the broader process of acquiring knowledge from data through training, while optimization refers specifically to the process of finding the optimal set of parameters for a given model by minimizing a defined objective function. Learning involves optimization as a crucial step, but it also includes other components such as data preprocessing, model architecture design, and evaluation.
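A minimal PyTorch sketch of this relationship: the training loop as a whole is the "learning" process, and the optimizer call inside it is the "optimization" step (the toy model and data are illustrative assumptions):

import torch
from torch import nn

# Toy regression data (illustrative assumption)
X = torch.randn(64, 3)
y = X.sum(dim=1, keepdim=True)

model = nn.Linear(3, 1)                                   # model whose parameters are learned
loss_fn = nn.MSELoss()                                    # objective function to minimize
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # optimization algorithm

for epoch in range(20):                                   # "learning": the overall training process
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()                                       # gradients of the loss w.r.t. the parameters
    optimizer.step()                                      # "optimization": one parameter update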

Effective training in Deep Net



Early Stopping,



Normalization (Batch, Instance, Group)
Normalization is a data pre-processing tool used to bring numerical data to a common scale without distorting its shape. Generally, when we feed data to a machine learning or deep learning algorithm, we tend to rescale the values to a balanced scale. Normalization also acts as a mild regularizer, which helps reduce overfitting.

Batch:

Instance:

Group:
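The three variants differ only in which dimensions the statistics are computed over. A minimal PyTorch sketch (the channel count and group count are illustrative assumptions):

import torch
from torch import nn

x = torch.randn(8, 32, 28, 28)         # (batch, channels, height, width)

batch_norm = nn.BatchNorm2d(32)        # statistics per channel, across the whole batch
instance_norm = nn.InstanceNorm2d(32)  # statistics per channel, per individual sample
group_norm = nn.GroupNorm(num_groups=4, num_channels=32)  # per sample, over groups of 8 channels

for norm in (batch_norm, instance_norm, group_norm):
    print(type(norm).__name__, norm(x).shape)  # shape is unchanged: (8, 32, 28, 28)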



Batch Gradient Descent (GD)



GD with momentum
Momentum is an extension to the gradient descent optimization algorithm that
allows the search to build inertia in a direction in the search space and
overcome the oscillations of noisy gradients and coast across flat spots of the
search space.

The problem with gradient descent is that the weight update at a moment (t) is
governed by the learning rate and gradient at that moment only. It doesn’t take
into account the past steps taken while traversing the cost space.

It leads to the following problems:

1. The gradient of the cost function at saddle points (plateaus) is negligible or zero, which in turn leads to small or no weight updates. Hence, the network becomes stagnant and learning stops.

2. The path followed by Gradient Descent is very jittery even when operating in mini-batch mode.

How can momentum be applied to Gradient Descent?

To account for momentum, we use a moving average over the past gradients. In regions where the gradient is consistently large, the weight updates will be large; in this way we gather momentum by taking a moving average over these gradients. But there is a problem with this method: it considers all the gradients over the iterations with equal weightage. The gradient at t = 0 has the same weightage as the gradient at the current iteration t. We need some sort of weighted average of the past gradients in which the recent gradients are given more weightage.
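A minimal sketch of the resulting update rule, using an exponentially weighted moving average of the gradients (the quadratic toy objective and the hyperparameter values are illustrative assumptions):

import numpy as np

def grad(w):
    """Gradient of a toy quadratic objective f(w) = w^2 (illustrative assumption)."""
    return 2 * w

w = 5.0       # initial weight
v = 0.0       # velocity: exponentially weighted moving average of past gradients
lr = 0.1      # learning rate
beta = 0.9    # momentum coefficient: recent gradients get more weight

for t in range(200):
    g = grad(w)
    v = beta * v + (1 - beta) * g   # weighted average, recent gradients weighted more
    w = w - lr * v                  # the update uses the velocity, not the raw gradient

print(round(w, 6))  # w has moved close to the minimum at w = 0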

Unit 2:
Recent Trends in Deep Learning Architectures,
GANs:



VGG:

Inception Net:



Residual Network



Pros:

Enables training of extremely deep networks (100+ layers).

Alleviates vanishing gradient problem via skip connections.

Facilitates feature reuse and learning of residual functions.

Improves optimization by allowing gradients to flow more easily.

Cons:

Increased model complexity compared to shallower networks.

Requires careful initialization and regularization to prevent overfitting.

May suffer from degradation problem if not properly tuned.

Training can still be time-consuming and computationally intensive for very deep architectures.

Understanding ResNet and analyzing various models on the CIFAR-10 dataset

Introduction



Deep neural networks are fascinating and can seem to work like magic when we use them to predict something, whether with images or text. In the past 10 years there has been major improvement in deep learning, especially when it comes to image recognition, and researchers keep developing newer models to improve the accuracy of existing systems.

Challenges in building Neural Networks

One of the major challenges is how deep networks can be built. Theoretically, it sounds appealing to build deeper networks, but in reality we encounter a problem called degradation: the training error increases as deeper layers are added, which hurts accuracy a lot. Another problem with building deeper networks is the vanishing gradient problem. This happens in the backpropagation step; as we know, in neural networks we need to adjust the weights after calculating the loss function.

While backpropagating, we follow the chain rule: the derivatives of each layer are multiplied down the network. When we use many layers with activations such as the sigmoid, the derivative of each layer is at most 0.25. So when the derivatives of n such layers are multiplied, the gradient decreases exponentially as we propagate back to the initial layers.

As mentioned earlier, when we go very deep into a network the earlier blocks have already learned a lot, and the additional deeper blocks ideally only need to be an identity mapping of the earlier blocks, i.e. produce the same output. The degradation results suggest that plain networks have difficulty learning even this identity mapping. To solve these problems the ResNet paper introduced residual blocks, which are stacked together and allow us to build deep networks without degradation or vanishing gradients.



Train and Test error visualized on 56 and 34 layers plain model
(https://arxiv.org/pdf/1512.03385.pdf)

We may think that it could be a result of overfitting too, but here the error of the 56-layer network is worse on both the training and the test data, which does not happen when a model is overfitting.

Derivative of sigmoid layers (https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484)

We can see that the derivative of the sigmoid function ranges from 0 to 0.25. When we multiply a chain of such values as we go deeper, we end up with a very small gradient, which hampers the weight updates driven by our loss function.

How does ResNet work?



Let us now understand how ResNet works. Here we have something called residual blocks; many residual blocks are stacked together to form a ResNet. The "skip connections" are the major part of ResNet. The following figure from the original paper shows how a residual block works: the idea is to connect the input of a block directly to its output, skipping a few layers in between. Here x is the input to the block, which is carried unchanged by the identity (skip) connection, and F(x) is the output of the stacked layers. The output of the block is then F(x) + x.

Comparison of ResNet with Plain Networks

Now let us compare ResNets with plain networks. In his deep learning course, Andrew Ng notes that one of the main benefits of ResNets is how they behave in terms of training error. In plain networks, as we increase the number of layers the training error first decreases, but after a certain depth it starts increasing again; this is why we deploy methods like early stopping. This behaviour is resolved with ResNets: as layers are added, the error only tends to decrease rather than increase. The authors of the ResNet paper also compared plain networks and ResNets at 18 and 34 layers, and the ResNets give lower error than the plain networks.

Skip Connection Network

Pros:

Facilitates training of very deep networks.

Helps alleviate vanishing gradient problem.

Allows for better information flow through layers.

Enables reuse of features from earlier layers.

Cons:

Increased model complexity.

Requires careful design to optimize performance.

May lead to increased memory and computational requirements.

Training can still be challenging with extremely deep architectures.

What are Skip Connections in Deep Learning?



Introduction
The need for deeper networks emerges while handling complex tasks. However, training a deep neural net brings many complications, not only limited to overfitting and high computation costs, but also some non-trivial problems. In this article, we will address some of these problems using skip connections.

Why Skip Connections?


The beauty of deep neural networks is that they can learn complex functions
more efficiently than their shallow counterparts. While training deep neural
nets, the performance of the model drops down with the increase in depth of
the architecture. This is known as the degradation problem. But what could be the reasons for this saturation in accuracy with increasing network depth? Let us try to understand the reasons behind the degradation problem.

Deeper Network Performance Analysis: Overfitting Discarded


One of the possible reasons could be overfitting. The model tends to overfit
with the increase in depth but that’s not the case here. As you can infer from
the below figure, the deeper network with 56 layers has more training error
than the shallow one with 20 layers. The deeper model doesn’t perform as
well as the shallow one. Clearly, overfitting is not the problem here.
Train and test error for 20-layer and 56-layer NN



Gradient Issues in ResNet Construction
Another possible reason can be vanishing gradient and/or exploding gradient
problems. However, the authors of ResNet (He et al.) argued that the use of
Batch Normalization and proper initialization of weights through normalization
ensures that the gradients have healthy norms. But, what went wrong here?
Let’s understand this by construction.
Consider a shallow neural network that was trained on a dataset. Also consider a deeper one in which the initial layers have the same weight matrices as the shallow network (the blue-colored layers in the diagram below) with some extra layers added (the green-colored layers). We set the weight matrices of the added layers to identity matrices (identity mappings).

Diagram explaining the construction

From this construction, the deeper network should not produce any higher training error than its shallow counterpart, because we are actually using the shallow model's weights in the deeper network with added identity layers. But experiments show that the deeper network produces higher training error compared to the shallow one. This demonstrates the inability of the deeper layers to learn even identity mappings.

The degradation of training accuracy indicates that not all systems are similarly easy to optimize.

One of the primary reasons is the random initialization of weights with a mean around zero, together with L1 and L2 regularization. As a result, the weights in the model stay around zero, and thus the deeper layers cannot learn identity mappings. Here comes the concept of skip connections, which enables us to train very deep neural networks. Let's learn this concept now.

What are Skip Connections?

Skip Connections (or Shortcut Connections), as the name suggests, skip some of the layers in the neural network and feed the output of one layer as the input to later layers.

Skip connections were introduced to solve different problems in different architectures. In the case of ResNets, skip connections solved the degradation problem that we addressed earlier, whereas in the case of DenseNets they ensure feature reusability. We'll discuss them in detail in the following sections.



How do Skip Connections Work?
Skip connections were introduced in literature even before residual networks.
For example, Highway Networks (Srivastava et al.) had skip connections
with gates that controlled and learned the flow of information to deeper layers.
This concept is similar to the gating mechanism in LSTMs. Although a ResNet is effectively a special case of a Highway network, Highway networks do not perform as well as ResNets. This suggests that it's better to keep the gradient highways clear than to go for gates – simplicity wins here!
Neural networks can learn any functions of arbitrary complexity, which could
be high-dimensional and non-convex. Visualizations have the potential to help
us answer several important questions about why neural networks work. And
there is actually some nice work done by Li et al. which enables us to visualize
the complex loss surfaces. The results from the networks with skip
connections are even more surprising! Take a look at them.
The loss surfaces of ResNet-56 with and without skip connections

As you can see, the loss surface of the neural network with skip connections is smoother, leading to faster convergence than the network without any skip connections. Let's see the variants of skip connections in the next section.



Variants of Skip Connections
In this section, we will see the variants of skip connections in different
architectures. Skip Connections can be used in 2 fundamental ways in Neural
Networks: Addition and Concatenation.

Residual Networks (ResNets)


Residual Networks were proposed by He et al. in 2015 to solve the image
classification problem. In ResNets, the information from the initial layers is
passed to deeper layers by matrix addition. This operation doesn’t have any
additional parameters as the output from the previous layer is added to the
layer ahead. A single residual block with skip connection looks like this:
A residual block

Thanks to the deep-layer representations of ResNets, pre-trained weights from this network can be used to solve multiple tasks. It's not limited to image classification; it can also solve a wide range of problems in image segmentation, keypoint detection and object detection. Hence, ResNet is one of the most influential architectures in the deep learning community.



Next, we’ll learn about another variant of skip connections in DenseNets which
is inspired by ResNets.

I would recommend going through the resource below for a more detailed understanding of ResNets:

Understanding ResNet and analyzing various models on the CIFAR-10 dataset

Densely Connected Convolutional Networks (DenseNets)


DenseNets were proposed by Huang et al. in 2017. The primary difference
between ResNets and DenseNets is that DenseNets concatenates the output
feature maps of the layer with the next layer rather than a summation.

Coming to skip connections, DenseNets use concatenation whereas ResNets use summation.

A 5-layer dense block



The idea behind the concatenation is to use features that are learned from
earlier layers in deeper layers as well. This concept is known as Feature
Reusability. So, DenseNets can learn mapping with fewer parameters than a
traditional CNN as there is no need to learn redundant maps.

U-Net: Convolutional Networks for Biomedical Image Segmentation

The use of skip connections has influenced the biomedical field too. U-Nets were proposed by Ronneberger et al. for biomedical image segmentation. The architecture has an encoder part and a decoder part connected by skip connections, and its overall shape looks like the English letter "U", hence the name U-Net.
U-Net architecture



The layers in the encoder part are skip connected and concatenated with
layers in the decoder part (those are mentioned as grey lines in the above
diagram). This makes the U-Nets use fine-grained details learned in the
encoder part to construct an image in the decoder part.
These kinds of connections are long skip connections whereas the ones we
saw in ResNets were short skip connections. More about U-Nets here.
Okay! Enough of theory, let’s implement a block of the discussed architectures
and how to load and use them in PyTorch!

Implementation of Skip Connections


In this section, we will build ResNet and DenseNet blocks using skip connections from scratch. Are you excited? Let's go!

ResNet – A Residual Block



First, we will implement a residual block using skip connections. PyTorch is
preferred because of its super cool feature – object-oriented structure.

# import required libraries
import torch
from torch import nn
import torch.nn.functional as F
import torchvision

# Basic residual block of ResNet.
# This is generic in the sense that it can also be used for downsampling of features.
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=[1, 1], downsample=None):
        """
        A basic residual block of ResNet

        Parameters
        ----------
        in_channels: number of channels of the input
        out_channels: number of channels of the output
        stride: strides used in the two convolutional layers
        downsample: a callable applied to the residual before the addition
        """
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_channels, out_channels, kernel_size=3, stride=stride[0],
            padding=1, bias=False
        )
        self.conv2 = nn.Conv2d(
            out_channels, out_channels, kernel_size=3, stride=stride[1],
            padding=1, bias=False
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        residual = x
        # apply the downsample function (if any) before adding the residual to the output
        if self.downsample is not None:
            residual = self.downsample(residual)
        out = F.relu(self.bn(self.conv1(x)))
        out = self.bn(self.conv2(out))
        # note that the residual is added before the final activation
        out = out + residual
        out = F.relu(out)
        return out


As we have a Residual block in our hand, we can build a ResNet model of
arbitrary depth! Let’s quickly build the first five layers of ResNet-34 to get an
idea of how to connect the residual blocks.

# downsample using a 1 x 1 convolution
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128)
)

# first five layers of ResNet-34
resnet_blocks = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.MaxPool2d(kernel_size=2, stride=2),
    ResidualBlock(64, 64),
    ResidualBlock(64, 64),
    ResidualBlock(64, 128, stride=[2, 1], downsample=downsample)
)

# checking the output shape
inputs = torch.rand(1, 3, 100, 100)  # a single 100 x 100 colour image
outputs = resnet_blocks(inputs)
print(outputs.shape)  # shape would be (1, 128, 13, 13)


PyTorch provides us an easy way to load ResNet models with pretrained
weights trained on the ImageNet dataset.

# one could also use pretrained weights of ResNet trained on ImageNet
resnet34 = torchvision.models.resnet34(pretrained=True)

DenseNet – A Dense Block


Implementing the complete DenseNet would be a little complex, so let's take it step by step.

1. Implement a DenseNet layer

2. Build a dense block

3. Connect multiple dense blocks to obtain a densenet model



class Dense_Layer(nn.Module):
    def __init__(self, in_channels, growthrate, bn_size):
        super(Dense_Layer, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(
            in_channels, bn_size * growthrate, kernel_size=1, bias=False
        )
        self.bn2 = nn.BatchNorm2d(bn_size * growthrate)
        self.conv2 = nn.Conv2d(
            bn_size * growthrate, growthrate, kernel_size=3, padding=1,
            bias=False
        )

    def forward(self, prev_features):
        # concatenate all previously produced feature maps along the channel dimension
        out1 = torch.cat(prev_features, dim=1)
        out1 = self.conv1(F.relu(self.bn1(out1)))
        out2 = self.conv2(F.relu(self.bn2(out1)))
        return out2


Next, we’ll implement a dense block that consists of an arbitrary number of
DenseNet layers.

class Dense_Block(nn.ModuleDict):
    def __init__(self, n_layers, in_channels, growthrate, bn_size):
        """
        A dense block consists of `n_layers` of `Dense_Layer`

        Parameters
        ----------
        n_layers: number of dense layers to be stacked
        in_channels: number of input channels for the first layer in the block
        growthrate: growth rate (k) as mentioned in the DenseNet paper
        bn_size: multiplicative factor for the number of bottleneck layers
        """
        super(Dense_Block, self).__init__()
        layers = dict()
        for i in range(n_layers):
            layer = Dense_Layer(in_channels + i * growthrate, growthrate, bn_size)
            layers['dense{}'.format(i)] = layer
        self.block = nn.ModuleDict(layers)

    def forward(self, features):
        if isinstance(features, torch.Tensor):
            features = [features]
        for _, layer in self.block.items():
            new_features = layer(features)
            features.append(new_features)
        return torch.cat(features, dim=1)


From the dense block, let’s build DenseNet. Here, I’ve omitted the transition
layers of DenseNet architecture (which acts as downsampling) for simplicity.

# a block consisting of initial conv layers followed by 6 dense layers
dense_block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, padding=3, stride=2, bias=False),
    nn.BatchNorm2d(64),
    nn.MaxPool2d(3, 2),
    Dense_Block(6, 64, growthrate=32, bn_size=4),
)

inputs = torch.rand(1, 3, 100, 100)
outputs = dense_block(inputs)
print(outputs.shape)  # shape would be (1, 256, 24, 24)

# one could also use pretrained weights of DenseNet trained on ImageNet
densenet121 = torchvision.models.densenet121(pretrained=True)

Conclusion
In this article, we’ve discussed the importance of skip connections for the
training of deep neural nets and how skip connections were used in ResNet,
DenseNet, and U-Net with its implementation. I know, this article covers many
theoretical aspects which are not easy to grasp in one go. So, feel free to
leave comments if you have any.


Frequently Asked Questions

Q1. Why skip connections in ResNet?



A. Skip connections in ResNet prevent the vanishing gradient problem during
deep neural network training. These connections enable the direct flow of
information from earlier layers to later layers, aiding in preserving gradient and
promoting better convergence.



Image Denoising
(a) Gaussian Noise – noise having a PDF equal to the normal distribution, i.e. the values that this noise adds to pixels are Gaussian distributed.
(b) Impulse Noise – caused by sharp and sudden disturbances in the image signal. It usually appears as white and black pixels in the image.
Real-world noise (also known as blind noise) is more sophisticated and diverse.

AutoEncoders:
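A minimal sketch of a convolutional denoising autoencoder in PyTorch (the layer sizes and noise level are illustrative assumptions):

import torch
from torch import nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # encoder compresses the noisy image into a smaller representation
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # decoder reconstructs the clean image from that representation
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
clean = torch.rand(8, 1, 28, 28)                              # batch of clean grayscale images
noisy = (clean + 0.2 * torch.randn_like(clean)).clamp(0, 1)   # add Gaussian noise
loss = nn.MSELoss()(model(noisy), clean)                      # train to reconstruct the clean image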



CBDNet (Convolutional Blind Denoising Network), PRIDNet (Perceptual
Residual-Injective Denoising Network), and RIDNet (Residual-in-Residual
Dense Network) are all state-of-the-art deep learning models designed for
image denoising. Let's briefly discuss each of them:

1. CBDNet:

CBDNet was proposed in the paper "Toward Convolutional Blind Denoising of Real Photographs" by Guo et al., published in 2019.

It's designed to denoise real-world photographs without assuming any prior knowledge about the noise characteristics.

CBDNet utilizes a blind denoising approach, meaning it doesn't require any explicit noise level estimation.

The network architecture is composed of multiple convolutional layers along with residual connections to effectively learn the denoising task.

2. PRIDNet:

PRIDNet was introduced in the paper "Perceptual Residual-Injective Denoising Network for Real Image Denoising" by Wang et al., presented in 2019.

It focuses on real image denoising and aims to achieve perceptually superior denoising results.

PRIDNet incorporates a residual-injective structure that leverages both local and global residual learning for better denoising performance.

Additionally, it employs a perceptual loss function, which takes into account the perceptual difference between the denoised and clean images, leading to visually pleasing results.

3. RIDNet:

RIDNet, proposed in the paper "RIDNet: Residual-in-Residual Dense Network for Image Denoising" by Ahn et al. in 2018, focuses on learning hierarchical representations for image denoising.

It adopts a residual-in-residual dense block architecture, which facilitates the learning of highly non-linear mappings between noisy and clean images.

RIDNet is capable of capturing both local and global features effectively through its dense connections and residual learning.

The network architecture allows for efficient information flow across multiple layers, enabling better exploitation of the image's contextual information for denoising.

Overall, CBDNet, PRIDNet, and RIDNet are among the top-performing deep
learning models for image denoising, each offering unique architectural
designs and learning strategies to address the challenges associated with
real-world image denoising tasks.

Semantic Segmentation



1. UNet:

UNet, proposed by Ronneberger et al. in 2015, is a widely used architecture particularly suited for biomedical image segmentation.

It features a symmetric encoder-decoder structure with skip connections between corresponding encoder and decoder layers.

UNet's skip connections help preserve spatial information and enable precise localization of objects in the segmentation masks.



2. ENet (Efficient Neural Network):

ENet, proposed by Paszke et al. in 2016, is designed for efficient real-time semantic segmentation.

It features a compact architecture with lightweight operations, making it suitable for deployment on embedded systems or mobile devices.

ENet utilizes a combination of regular and asymmetric convolutions to reduce computational complexity while maintaining performance.



Object Detection etc
Object detection is a computer vision task that involves identifying and
localizing objects within an image. Deep learning architectures have
revolutionized object detection, enabling high accuracy and real-time
performance. Some of the most popular deep learning architectures for object
detection include:

1. Faster R-CNN:

Faster R-CNN, introduced by Ren et al. in 2015, is a milestone in object detection.

It combines a Region Proposal Network (RPN) with a Fast R-CNN detector, allowing for end-to-end training.

The RPN generates region proposals (bounding boxes) from the input image, and Fast R-CNN uses these proposals to classify and refine object detections.

2. YOLO:
YOLO (You Only Look Once) is a popular deep learning architecture for real-time object detection. YOLO processes images in a single forward pass through a neural network to predict bounding boxes and class probabilities directly. This approach makes YOLO extremely fast and suitable for real-time applications. The original YOLO architecture, as introduced by Joseph Redmon et al. in 2015, has undergone several iterations, including YOLOv2, YOLOv3, and YOLOv4, each with improvements in accuracy and efficiency. Here's an overview of the original YOLO architecture:

1. Input Processing:

YOLO takes an input image of fixed size (e.g., 416x416 pixels) and divides it into a grid of cells.

Each grid cell is responsible for predicting bounding boxes and class probabilities for objects present in that cell.

2. Feature Extraction:

The input image is passed through a convolutional neural network (CNN) to extract features.

The CNN architecture typically consists of convolutional layers followed by max-pooling layers, which progressively reduce the spatial dimensions of the feature maps while increasing the depth.

3. Grid Cell Prediction:

For each grid cell, YOLO predicts multiple bounding boxes.

Each bounding box is represented by a set of coordinates (x, y, width, height) relative to the grid cell's location.

Additionally, YOLO predicts the confidence score for each bounding box, indicating the probability that the box contains an object, as well as the class probabilities for the detected objects.

4. Non-Maximum Suppression (NMS):

YOLO applies non-maximum suppression to remove redundant bounding boxes.

It keeps the bounding box with the highest confidence score for each detected object and suppresses overlapping boxes with lower scores (see the sketch after this list).

5. Output:

The final output of YOLO is a set of bounding boxes along with their associated class probabilities.

YOLO provides real-time object detection by efficiently processing the input image in a single pass through the network.
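A minimal sketch of the non-maximum suppression step described above, using IoU (intersection over union) between axis-aligned boxes; the threshold value and example boxes are illustrative assumptions:

import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, each given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, suppressing overlapping lower-scoring ones."""
    order = np.argsort(scores)[::-1]   # indices sorted by confidence, highest first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_threshold]   # drop boxes that overlap too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))  # [0, 2]: the second box is suppressed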

Neural Attention Models,


Neural attention models are a class of deep learning architectures that mimic
the human cognitive mechanism of selectively focusing on specific parts of
input data while processing it. These models have gained prominence across
various tasks in natural language processing (NLP), computer vision, and
other domains. Attention mechanisms allow neural networks to dynamically
weigh the importance of different parts of the input during computation,
enabling more effective and context-aware processing. Here are some key
types of neural attention models and their applications:

1. Sequence-to-Sequence with Attention:

This model was introduced by Bahdanau et al. in 2014 and is commonly used for tasks such as machine translation and text summarization.

In sequence-to-sequence tasks, the model encodes an input sequence into a fixed-length context vector using a recurrent neural network (RNN) encoder.

During decoding, an attention mechanism is applied to the encoder's hidden states, allowing the decoder to attend to different parts of the input sequence while generating the output sequence.

2. Transformer:

The Transformer architecture, introduced by Vaswani et al. in 2017, revolutionized NLP tasks by eliminating recurrent connections and replacing them with self-attention mechanisms.

Transformers consist of multiple self-attention layers that allow each word/token in the input sequence to attend to all other words/tokens in the sequence.

Self-attention enables the model to capture long-range dependencies and contextual information more efficiently than traditional recurrent architectures.

Transformers have been widely adopted for tasks such as machine translation, text classification, and language modeling.

3. Spatial Attention in Convolutional Neural Networks (CNNs):

In computer vision, spatial attention mechanisms are used to focus on relevant regions of an input image while suppressing irrelevant or distracting regions.

These mechanisms typically involve learning attention maps that indicate the importance of different spatial locations in the input image.

Spatial attention has been integrated into CNN architectures for tasks such as image classification, object detection, and image captioning, improving performance by allowing the model to focus on salient features.

4. Multi-Head Attention:

Multi-head attention, introduced in the Transformer architecture, enables the model to attend to different parts of the input simultaneously.

In multi-head attention, the input is projected into multiple subspaces, and attention is computed independently in each subspace.

This allows the model to capture diverse representations and attend to different aspects of the input data effectively.

5. Cross-Modal Attention:

Cross-modal attention mechanisms enable models to attend to information from multiple modalities (e.g., text, image, audio) simultaneously.

These mechanisms are used in tasks such as image captioning, visual question answering (VQA), and multimodal translation, where the input may consist of data from different modalities.

Neural attention models have demonstrated significant improvements in various tasks by enabling more flexible and context-aware processing of input data. They continue to be an active area of research, with ongoing efforts to develop more advanced attention mechanisms and integrate them into diverse architectures and applications.
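A minimal sketch of the scaled dot-product attention at the core of these models (the sequence length and embedding size are illustrative assumptions):

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)                # attention weights sum to 1 per query
    return weights @ V                                 # weighted sum of the values

# a toy sequence of 5 tokens with 16-dimensional embeddings (illustrative assumption)
x = torch.randn(1, 5, 16)
out = scaled_dot_product_attention(x, x, x)            # self-attention: Q, K, V come from the same sequence
print(out.shape)                                       # torch.Size([1, 5, 16])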

Neural Machine Translation.



Neural Machine Translation (NMT) is an approach to machine translation that
uses neural networks to translate text from one language to another. NMT has
largely replaced traditional statistical machine translation (SMT) approaches
due to its superior performance, especially in capturing long-range
dependencies and handling context.
Here's how Neural Machine Translation generally works:

1. Sequence-to-Sequence Model:

NMT is typically based on the sequence-to-sequence (seq2seq) model architecture, introduced by Sutskever et al. in 2014.

In seq2seq models, an encoder-decoder architecture is used where the encoder processes the input sequence (source language) and generates a fixed-length context vector that represents the input.

The decoder then takes this context vector and generates the output sequence (target language) word by word.

2. Recurrent Neural Networks (RNNs) and Transformers:

Initially, NMT systems were built using Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) cells.

However, with the introduction of the Transformer architecture by Vaswani et al. in 2017, the landscape of NMT changed significantly. Transformers have since become the dominant architecture for NMT due to their ability to capture long-range dependencies more effectively through self-attention mechanisms.

3. Training Data and Loss Function:

NMT models are trained on parallel corpora, which are collections of sentences in both the source and target languages.

During training, the model learns to minimize a loss function that measures the difference between the predicted translations and the ground truth translations.

Common loss functions used in NMT include cross-entropy loss and sequence-to-sequence loss.



4. Attention Mechanism:

Attention mechanisms play a crucial role in NMT by allowing the model to focus on relevant parts of the input sentence while generating the output translation.

They enable the model to align words in the source and target languages and alleviate the bottleneck of fixed-length context vectors.

The attention mechanism can be implemented using different variants, such as global attention, local attention, or multi-head attention.

5. Evaluation:

NMT systems are evaluated based on metrics such as BLEU (Bilingual Evaluation Understudy), which measures the similarity between the predicted translations and human-generated translations.

Other evaluation metrics include METEOR, TER, and human evaluation.

NMT has made significant advancements in recent years and is widely used in
commercial translation systems and research laboratories. While it has
achieved impressive results, there are still challenges such as handling low-
resource languages, domain adaptation, and capturing subtle linguistic
nuances. Ongoing research in NMT aims to address these challenges and
further improve the quality and efficiency of machine translation systems.

Performance Metrics,



Neural Machine Translation Performance Metrics:

NMT systems are evaluated based on metrics such as BLEU (Bilingual Evaluation Understudy), which measures the similarity between the predicted translations and human-generated translations.

Other evaluation metrics include METEOR, TER, and human evaluation.
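A minimal sketch of computing a corpus-level BLEU score, assuming the sacrebleu package is installed; the example sentences are illustrative assumptions:

import sacrebleu

# system outputs and reference translations (illustrative assumptions)
hypotheses = ["the cat is on the mat", "there is a dog in the park"]
references = [["the cat is on the mat", "a dog is in the park"]]  # one reference per hypothesis

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # BLEU score between 0 and 100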

Baseline Methods,
Baseline models wield immense influence in machine learning practice. Though intentionally simple, they serve as the basis for evaluating the performance of more complex models. Baseline models have a dual purpose:

first, they set a performance baseline against which advancements can be measured, and

second, they provide a benchmark for gauging the efficiency of intricate models.



1. Feedforward Neural Networks (FNNs):
Pros: Simple architecture, suitable for basic tasks (a minimal baseline sketch follows this list).
Cons: Limited complexity, prone to overfitting.

2. Convolutional Neural Networks (CNNs):
Pros: Excellent for image tasks, reduces computational load.
Cons: Needs lots of data, vanishing gradients in deep networks.

3. Recurrent Neural Networks (RNNs):
Pros: Great for sequential data, captures temporal dependencies.
Cons: Vanishing/exploding gradients, struggles with long-term dependencies.

4. Autoencoders:
Pros: Unsupervised learning, feature learning, dimensionality reduction.
Cons: Slow training, potential information loss during compression.

5. Generative Adversarial Networks (GANs):
Pros: Generates realistic data, used in image generation.
Cons: Training instability, mode collapse, hyperparameter sensitivity.

6. Reinforcement Learning (RL) Models:
Pros: Learns decision-making through interaction.
Cons: High computational requirements, reward design sensitivity.

7. Transfer Learning:
Pros: Saves time/resources, useful for limited data domains.
Cons: Task misalignment, requires careful fine-tuning.

8. Ensemble Methods:
Pros: Combines models for improved accuracy.
Cons: Increased complexity, potential overfitting.

9. Attention Mechanisms:
Pros: Improves model interpretability, focuses on relevant inputs.
Cons: Adds computational overhead, tuning required.

10. Meta-Learning Approaches:
Pros: Learns quickly for new tasks/domains.
Cons: Requires careful algorithm design, sensitive to task similarities.
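A minimal sketch of a feedforward baseline in PyTorch, of the kind one might compare more complex models against (the feature size and class count are illustrative assumptions):

import torch
from torch import nn

# A small feedforward baseline classifier (illustrative assumption: 20 features, 3 classes)
baseline = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 3),
)

x = torch.randn(16, 20)   # a batch of 16 examples
logits = baseline(x)
print(logits.shape)       # torch.Size([16, 3])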

Data Requirements,
1. Quantity: Large-scale and diverse datasets are crucial for effective model training and generalization.
2. Quality: Clean, accurately labeled data with balanced class distributions improves model performance.
3. Preprocessing: Normalize, standardize, and augment data to aid model convergence and reduce overfitting.
4. Representative Features: Ensure input features capture relevant information for the task.
5. Data Splitting: Divide data into training, validation, and test sets for evaluation and hyperparameter tuning (a minimal sketch follows this list).
6. Imbalance Handling: Address class imbalance using oversampling, undersampling, or class weights.
7. Transfer Learning: Utilize pre-trained models and domain adaptation techniques for limited data scenarios.
8. Privacy and Security: Comply with regulations and implement data protection measures for sensitive data.
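A minimal sketch of splitting data into training, validation, and test sets, assuming scikit-learn is available; the split proportions and synthetic data are illustrative assumptions:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)             # illustrative feature matrix
y = np.random.randint(0, 2, size=1000)   # illustrative binary labels

# 70% train, 15% validation, 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150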

Hyperparameter Tuning:
Hyperparameters are external configuration variables that data scientists use to manage machine learning model training, e.g. the number of nodes and layers in a neural network or the number of branches in a decision tree.

Parameters allow the model to learn the rules from the data, while hyperparameters control how the model is trained.

Hyperparameter tuning is a critical step in optimizing the performance of machine learning and deep learning models. It involves adjusting the hyperparameters of a model to find the configuration that results in improved performance metrics such as accuracy, precision, recall, or F1-score. Here are key points regarding hyperparameter tuning:

1. Hyperparameter Examples:

Learning rate in optimization algorithms (e.g., gradient descent)

Number of layers and neurons in a neural network

Regularization parameters (e.g., L1/L2 regularization strength, dropout rate)

Batch size, epochs, and optimizer choice (e.g., Adam, SGD)

2. Cross-Validation: Utilize cross-validation techniques (e.g., k-fold cross-validation) during hyperparameter tuning to evaluate model performance across different subsets of data and reduce overfitting.

3. Objective Function: Define an objective function (e.g., accuracy, loss) that the hyperparameter tuning process aims to optimize. It guides the search for optimal hyperparameters.

4. Early Stopping: Implement early stopping based on validation metrics to prevent overfitting during hyperparameter tuning iterations.

5. Parallelization: Leverage parallel computing or distributed systems to speed up hyperparameter tuning processes, especially for computationally intensive models or large datasets.

6. Domain Knowledge: Incorporate domain knowledge and insights into the hyperparameter tuning process to guide the search space and prioritize relevant hyperparameters.

Manual vs Automatic,



Grid vs Random.
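As a rough illustration of the two search strategies: grid search exhaustively tries every combination in a predefined grid, while random search samples a fixed number of random combinations. A minimal sketch using scikit-learn, assuming it is available (the estimator and parameter ranges are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
params = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# Grid search: tries all 3 x 3 = 9 combinations
grid = GridSearchCV(RandomForestClassifier(random_state=0), params, cv=3)
grid.fit(X, y)

# Random search: samples only 4 of the combinations at random
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), params, n_iter=4, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)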
