Al3502 - DLV Unit 2

The document provides an overview of deep learning, focusing on deep feed-forward neural networks, gradient descent, back-propagation, and challenges such as the vanishing gradient problem. It discusses the structure and functioning of neural networks, optimization techniques, and mitigation strategies to enhance model performance and fairness. Additionally, it highlights heuristics for avoiding local minima and accelerating training processes in deep learning applications.

Uploaded by

swethakarthi1619
Copyright
© All Rights Reserved

Al3502 - DEEP LEARNING FOR VISION

UNIT II INTRODUCTION TO DEEP LEARNING

Deep Feed-Forward Neural Networks – Gradient Descent – Back-Propagation and
Other Differentiation Algorithms – Vanishing Gradient Problem – Mitigation –
Rectified Linear Unit (ReLU) – Heuristics for Avoiding Bad Local Minima – Heuristics
for Faster Training – Nesterov's Accelerated Gradient Descent – Regularization for Deep
Learning – Dropout – Adversarial Training – Optimization for Training Deep Models.

DEEP LEARNING

 Deep learning is a subfield of machine learning that uses artificial neural networks
with multiple layers to learn complex patterns and representations from data.
 It is particularly effective for processing unstructured data such as images, audio, and
natural language.

NEURAL NETWORKS

A Neural Network is a computational model inspired by the structure and functioning of the
human brain, designed to recognize patterns and solve problems by learning from data
through interconnected layers of artificial neurons.
 Neurons: The basic units that receive inputs. Each neuron is governed by a threshold
and an activation function.
 Connections: Links between neurons that carry information, regulated by weights
and biases.
 Weights and Biases: These parameters determine the strength and influence of
connections.
 Propagation Functions: Mechanisms that help process and transfer data across layers
of neurons.
 Learning Rule: The method that adjusts weights and biases over time to improve
accuracy.

1. Input Layer: This is where the network receives its input data. Each input neuron in
the layer corresponds to a feature in the input data.
2. Hidden Layers: These layers perform most of the computational heavy lifting. A
neural network can have one or multiple hidden layers. Each layer consists of units
(neurons) that transform the inputs into something that the output layer can use.
3. Output Layer: The final layer produces the output of the model. The format of the
output depends on the task, for example classification or regression.
DEEP FEED-FORWARD NEURAL NETWORKS

A Feedforward Neural Network (FNN) is a type of artificial neural network in which
information flows in a single direction, from the input layer through the hidden layers to
the output layer, without loops or feedback. It is mainly used for pattern recognition tasks
like image and speech classification.
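
As a rough illustration of this one-way flow, the sketch below pushes one input through a tiny network with a single hidden layer. The layer sizes, the hand-picked weights, and the function names (`dense`, `forward`) are assumptions made for this example and do not come from the notes.

```python
import math

def relu(values):
    # Element-wise ReLU on a list of numbers.
    return [max(0.0, v) for v in values]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def dense(inputs, weights, biases):
    # weights[j][i] connects input i to neuron j of this layer.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(features, hidden_w, hidden_b, out_w, out_b):
    # Input layer -> hidden layer -> output layer, with no loops or feedback.
    hidden = relu(dense(features, hidden_w, hidden_b))
    logit = dense(hidden, out_w, out_b)[0]
    return sigmoid(logit)

# Hypothetical weights: 2 input features -> 3 hidden neurons -> 1 output.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[0.7, -0.5, 0.2]]
out_b = [0.05]

probability = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(probability)
```

The output is a single number between 0 and 1, which could be read as the probability of one class in a two-class problem.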

Advantages of Deep Feed-Forward Neural Networks (DFNN)

 Can model complex non-linear relationships
 Suitable for image, text, sound, and tabular data
 Easily scalable with hardware (e.g., GPUs)

Limitations

 Needs large data for effective training
 May overfit without regularization
 Computationally expensive

Real-Life Applications

 Image classification
 Disease prediction
 Game-playing AI
 Stock price forecasting
GRADIENT DESCENT
Gradient Descent is an optimization algorithm used to minimize the loss function in machine
learning and deep learning models by adjusting model parameters (like weights and biases) in
the direction of the steepest descent (i.e., negative gradient).

Simply put, it helps us find the best possible parameters (weights and biases) for our
models so that they can make accurate predictions.

Why is it Important?

In the context of deep learning, especially in computer vision tasks (like identifying objects in
images), we need our models to learn from data effectively. Gradient Descent helps us adjust
the model's parameters in such a way that the difference between the predicted values and
actual values (the error) is minimized.

Step-by-Step Explanation

1. Initialization:

Start with random values for the model parameters (weights). Think of this as starting at a
random point on a hilly landscape.

2. Calculate the Loss:

Use a loss function to measure how far off your model's predictions are from the
actual results. This is like calculating the height of the hill at your starting point.

3. Compute the Gradient:

The gradient is a vector that points in the direction of the steepest ascent. It indicates
how quickly the loss increases as each parameter changes. Since we want to go
downhill, we move in the opposite direction, along the negative gradient.

4. Update the Parameters:

Adjust the parameters in the opposite direction of the gradient. This is like taking a step down
the hill. The size of the step is determined by a value called the learning rate.

5. Repeat:

Continue calculating the loss, computing the gradient, and updating the parameters
until the model’s performance stops improving significantly or until a set number of
iterations is reached.
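
The five steps above can be sketched on a one-parameter problem: fitting a weight w so that w * x matches y. The data, the mean squared error loss, the learning rate, and the stopping tolerance are assumptions chosen for this illustration.

```python
import random

def loss(w, xs, ys):
    # Mean squared error between predictions w * x and targets y.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, xs, ys):
    # Derivative of the mean squared error with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, max_steps=200,
                     tolerance=1e-10, seed=0):
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)              # 1. initialization
    previous = loss(w, xs, ys)              # 2. calculate the loss
    for _ in range(max_steps):              # 5. repeat
        g = gradient(w, xs, ys)             # 3. compute the gradient
        w -= learning_rate * g              # 4. step against the gradient
        current = loss(w, xs, ys)
        if abs(previous - current) < tolerance:
            break                           # loss has stopped improving
        previous = current
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]                   # generated by y = 2 * x
w = gradient_descent(xs, ys)
print(w)
```

Because the data follow y = 2x exactly, the learned weight should end up very close to 2.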

VISUALIZING GRADIENT DESCENT


Imagine a ball at the top of a hill (representing high loss). The ball rolls down to find the
lowest point (the minimum loss). Each time the ball rolls, it takes the steepest path
downward, which is analogous to following the gradient in Gradient Descent.

Real-Life Application

Example: Image Classification

Task: Suppose we want to teach a computer to recognize cats in photos.

Process:

 We initialize our model with random weights.
 We feed it a series of images (some with cats, some without) and calculate how well it
identifies the cats using a loss function.
 The model calculates the gradients and updates its weights using Gradient Descent.
 After several iterations, the model learns to recognize cats more accurately.

BACK-PROPAGATION
 Backpropagation is a crucial concept in deep learning, particularly when training
neural networks to recognize patterns in data, such as images.
 Backpropagation is an algorithm that computes how much each weight in the network
contributed to the error in its predictions. An optimizer such as gradient descent then
uses these gradients to adjust the weights and reduce the error during training.

Why is Back-Propagation Important?


 Learning from Mistakes: Just like how we learn from our mistakes, back-
propagation helps neural networks learn by correcting their errors.
 Improving Accuracy: By repeatedly adjusting weights, the model becomes better at
making accurate predictions over time.
Step-by-Step Explanation of Back-Propagation

1. Forward Pass:
 The input data (like an image) is passed through the neural network.
 Each neuron processes the input and passes its output to the next layer.
 At the end, the network produces an output (like a prediction of what is in the image).
2. Calculate Error:
 The output of the network is compared to the actual answer (the ground truth).
 The difference between the predicted output and the actual answer is calculated using
a loss function (a measure of how wrong the prediction is).

3. Backward Pass:
 The error is then propagated backward through the network.
 This involves calculating the gradient (slope) of the error with respect to each weight
in the network, layer by layer, using the chain rule. The gradient tells us in which
direction and how strongly to change each weight to reduce the error.

4. Update Weights:
 Using the gradients calculated, the weights are updated to minimize the error. This is
typically done using an optimization algorithm like Stochastic Gradient Descent
(SGD).
 The weights are adjusted slightly in the direction that reduces the error, based on the
calculated gradients.

5. Repeat:
 Steps 1 to 4 are repeated over many batches of training data, and over many passes
through the full dataset (epochs), until the model's performance stabilizes or reaches
an acceptable level.
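
The forward pass, error calculation, backward pass, and weight update can be traced by hand on a very small network. The sketch below assumes one input, one sigmoid hidden neuron, one linear output, no biases, and a squared-error loss. These choices, and the names `forward`, `backward`, and `train`, are illustrative and not taken from the notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    # 1. Forward pass: input -> hidden activation -> output.
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    return h, y_hat

def squared_error(x, y, w1, w2):
    # 2. Calculate error with loss = (y_hat - y)^2.
    return (forward(x, w1, w2)[1] - y) ** 2

def backward(x, y, w1, w2):
    # 3. Backward pass: apply the chain rule from the output back to w1.
    h, y_hat = forward(x, w1, w2)
    d_loss_d_yhat = 2.0 * (y_hat - y)
    grad_w2 = d_loss_d_yhat * h
    d_loss_d_h = d_loss_d_yhat * w2
    grad_w1 = d_loss_d_h * h * (1.0 - h) * x   # sigmoid'(z) = h * (1 - h)
    return grad_w1, grad_w2

def train(x, y, w1=0.5, w2=-0.3, learning_rate=0.1, steps=500):
    # 4. Update weights, 5. repeat.
    for _ in range(steps):
        g1, g2 = backward(x, y, w1, w2)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2

trained_w1, trained_w2 = train(x=1.0, y=0.8)
print(squared_error(1.0, 0.8, trained_w1, trained_w2))
```

A common sanity check is to compare the gradients from `backward` with finite-difference estimates of the loss. If the two disagree, the chain-rule derivation has a mistake.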

VANISHING GRADIENT PROBLEM


In deep learning, particularly when we talk about training neural networks, one of the
challenges we face is the Vanishing Gradient Problem. This issue can significantly hinder our
ability to train deep networks effectively. Let's break it down step by step.

Step 1: Understanding Neural Networks

 Neural Networks are computational models inspired by the human brain. They
consist of layers of nodes (neurons) that process inputs to produce outputs.
 Each connection between nodes has a weight, which is adjusted during training to
improve the network's performance.
Step 2: The Role of Gradients

 Gradients are numerical values that indicate how much the loss changes when a
weight changes. They are essential for training neural networks because they guide the
adjustments made to the weights.
 During training, we use a method called backpropagation to compute these gradients.
This process helps us update the weights effectively.

Step 3: What Happens During Backpropagation?

 When backpropagation is executed, gradients are calculated from the output layer
back to the input layer.
 In deep networks (networks with many layers), the chain rule multiplies one derivative
factor per layer. If these factors are small (for example, the derivative of the sigmoid
is at most 0.25), their product shrinks roughly exponentially as it moves back through
the layers.

Step 4: The Vanishing Gradient Phenomenon

 When the gradients become very small, the updates to the weights of the earlier
layers (closer to the input) are negligible. This means that those layers learn very
slowly, if at all.
 As a result, the network might struggle to learn important features in the data, leading
to poor performance.
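
A quick calculation shows how fast this shrinkage happens. The sketch below multiplies one activation derivative per layer, assuming every layer sees the same pre-activation value and a weight of 1. This is a deliberate simplification for illustration, and the function name `gradient_scale` is made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # largest value is 0.25, reached at z = 0

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0

def gradient_scale(depth, local_derivative, pre_activation=0.0, weight=1.0):
    # Product of one (weight * activation derivative) factor per layer.
    scale = 1.0
    for _ in range(depth):
        scale *= weight * local_derivative(pre_activation)
    return scale

for depth in (1, 5, 10, 20):
    sigmoid_scale = gradient_scale(depth, sigmoid_derivative)
    relu_scale = gradient_scale(depth, relu_derivative, pre_activation=1.0)
    print(depth, sigmoid_scale, relu_scale)
```

Even in the best case for the sigmoid (derivative 0.25 at every layer), ten layers scale the gradient by 0.25 to the power 10, which is below one millionth. The ReLU factor stays at 1 for active neurons.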

Solutions to the Vanishing Gradient Problem

1. Use of Activation Functions:

Certain functions, like ReLU (Rectified Linear Unit), help maintain gradients better
than traditional activation functions like sigmoid or tanh.

2. Batch Normalization:

This technique normalizes the inputs to each layer, helping to stabilize and accelerate
training.

3. Skip Connections:

These connections allow gradients to bypass certain layers, making it easier for the
network to learn.

4. Proper Initialization:

Initializing weights correctly can prevent gradients from becoming too small in the
first place.
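
Two widely used initialization schemes are Xavier (Glorot) and He initialization. The sketch below uses their normal-distribution variants, with standard deviation sqrt(2 / (fan_in + fan_out)) for Xavier and sqrt(2 / fan_in) for He. The helper names and the layer sizes are assumptions for illustration.

```python
import math
import random

def xavier_std(fan_in, fan_out):
    # Xavier/Glorot normal initialization, often paired with sigmoid or tanh.
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in):
    # He normal initialization, often paired with ReLU.
    return math.sqrt(2.0 / fan_in)

def init_weights(fan_in, fan_out, std, seed=0):
    # One row of fan_in weights for each of the fan_out neurons.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

# Hypothetical layer with 8 inputs and 4 neurons.
weights = init_weights(fan_in=8, fan_out=4, std=he_std(8))
print(he_std(8), xavier_std(8, 4))
```

Scaling the spread of the weights by the layer size keeps activations and gradients from shrinking or growing too much as they pass through many layers.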
MITIGATION
Mitigation, in the context of deep learning for vision, refers to strategies and techniques used
to reduce negative outcomes or risks associated with using deep learning models. These
negative outcomes could include biases in predictions, inaccuracies in image recognition, or
failures in model performance. The goal of mitigation is to ensure that the models are fair,
reliable, and effective.

Why is Mitigation Important?

1. Accuracy: Models need to make correct predictions based on visual data. Mitigation
helps improve accuracy by addressing potential errors.

2. Fairness: Deep learning models can unintentionally learn biases from the data they
are trained on. Mitigation techniques aim to identify and reduce these biases, leading to fairer
outcomes for all users.

3. Robustness: Models should perform well even when presented with unfamiliar data.
Mitigation strategies help enhance the robustness of models against unexpected inputs.

Steps to Mitigation in Deep Learning for Vision

1. Identify Potential Risks

 Analyze the data and the model to identify what could go wrong.
 Look for biases in the dataset, such as underrepresentation of certain groups or
classes.

2. Data Augmentation

 What It Is: This technique involves modifying the training images to create new
variations.
 Example: If you have a picture of a cat, you can rotate, zoom, or change its
brightness to make new versions.
 Benefit: This helps the model learn better by providing diverse examples.

3. Bias Detection

 Use statistical methods to check for biases in the model's predictions.
 Example: If a facial recognition model performs poorly on certain ethnic groups, it
indicates bias.
 Mitigation: Adjust the training data to include more diverse examples.

4. Model Evaluation

 Continuously assess the model with new data to ensure it performs well across
different scenarios.
 Use metrics like precision, recall, and F1-score to evaluate performance.
5. Feedback Loop

 Incorporate feedback from users and stakeholders to improve the model iteratively.
 Example: If users report inaccuracies, take that feedback to adjust the model.
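
The evaluation metrics named in step 4 can be computed directly from true and predicted labels. The sketch below assumes a binary task where label 1 is the positive class. The example labels are made up.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical labels: 1 = cat, 0 = not a cat.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Computing these metrics separately for each group or class in the data is one simple way to spot the kind of uneven performance described under bias detection.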

RECTIFIED LINEAR UNIT (RELU)


 Definition: ReLU is an activation function. An activation function determines whether,
and how strongly, a neuron in a neural network is activated. It helps the model learn
complex patterns in the data.
 Purpose: It introduces non-linearity into the model, allowing it to learn relationships
that a purely linear model cannot represent.

ReLU(x) = max(0, x)

• If the input x is greater than 0, ReLU outputs x.

• If x is less than or equal to 0, it outputs 0.

Steps

1. Input to the Neuron: When data is fed into the neuron, it produces a numerical
output (let’s call this output x).

2. Applying ReLU:

• If x > 0, the output remains x.

• If x ≤ 0, the output becomes 0.

3. Output: The result is then passed onto the next layer in the neural network.

Why Use ReLU?

 Simplicity: It is computationally efficient since it only requires a simple thresholding
at zero.
 Performance: It helps mitigate the vanishing gradient problem, allowing models to
learn faster and perform better.
 Sparsity: ReLU creates sparse representations, meaning that it activates only a
portion of the neurons, making the model more efficient.
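
ReLU and its derivative take only a few lines. In the sketch below, the derivative at exactly x = 0 is set to 0, which is a common convention rather than something stated in the notes. The sample pre-activation values are made up.

```python
def relu(x):
    # Outputs x for positive inputs and 0 otherwise.
    return max(0.0, x)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at x = 0).
    return 1.0 if x > 0 else 0.0

pre_activations = [-2.0, -0.5, 0.0, 0.5, 3.0]
outputs = [relu(x) for x in pre_activations]
gradients = [relu_grad(x) for x in pre_activations]

# Fraction of neurons that output exactly zero (the "sparsity" mentioned above).
sparsity = outputs.count(0.0) / len(outputs)
print(outputs, gradients, sparsity)
```

The gradient is exactly 1 wherever the neuron is active, which is why stacking many ReLU layers does not shrink gradients the way stacked sigmoid layers do.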
HEURISTICS FOR AVOIDING BAD LOCAL MINIMA

Introduction to Local Minima


 Local Minima: If we picture the loss as a hilly landscape over the model's parameters,
local minima are small dips or valleys where we might get stuck. They are not the
lowest point (global minimum) but are lower than their immediate surroundings.
 Global Minimum: This is the deepest point in the landscape, representing the best
possible outcome for our model.

The Challenge
If our optimization process gets stuck in a local minimum, it won't find the best solution. This
can lead to a model that doesn’t perform well, which is why we need strategies (heuristics) to
avoid this problem.

Heuristics to avoid bad local minima

1. Initialization Strategies
 Random Initialization: Start with random values for weights instead of setting them
all to zero. This increases the chances of exploring different paths in the optimization
landscape.
 Heuristic Initialization: Use techniques like Xavier or He initialization to set
weights based on the number of input and output neurons, helping in better
convergence.

2. Learning Rate Adjustment


 Adaptive Learning Rates: Use optimizers like Adam or RMSprop that adjust the
learning rate during training. A learning rate that changes can help navigate out of
local minima.
 Learning Rate Schedules: Gradually decrease the learning rate as training
progresses, which can help in fine-tuning the model once it’s near a minimum.

3. Momentum
 Incorporate Momentum: This technique helps the optimization algorithm maintain
its direction and speed, allowing it to "roll over" small local minima instead of
getting stuck.
 How it Works: Like pushing a heavy ball down a hill, momentum helps carry the
optimization past small dips in the landscape.
4. Adding Noise
 Stochastic Gradient Descent (SGD): Instead of using the entire dataset to calculate
gradients, use a small random subset. This introduces noise and variability, which
can help escape local minima.
 Dropout: In neural networks, randomly dropping out nodes during training can
prevent the model from becoming overly reliant on specific features, leading to better
generalization.

5. Batch Normalization
Normalizing Activations: This technique helps maintain healthy distributions of layer
inputs, allowing for faster training and reducing the chance of getting stuck in poor local
minima.

6. Ensemble Methods
Combine Multiple Models: Train different models and combine their predictions. This
can help mitigate the effects of any one model getting stuck in a local minimum.
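
The momentum heuristic from item 3 can be sketched as a velocity that accumulates past gradients. The example below only shows the update rule and how it speeds up progress on a very flat loss surface. It does not try to demonstrate escaping a local minimum. The test function, the learning rate, and the momentum value are assumptions for illustration.

```python
def minimize(gradient, start, learning_rate, momentum, steps):
    position = start
    velocity = 0.0
    for _ in range(steps):
        # Keep a fraction of the previous velocity, then add the new gradient step.
        velocity = momentum * velocity - learning_rate * gradient(position)
        position += velocity
    return position

def shallow_bowl_gradient(x):
    # Gradient of f(x) = 0.005 * x^2, a very flat bowl with its minimum at 0.
    return 0.01 * x

plain = minimize(shallow_bowl_gradient, start=1.0,
                 learning_rate=1.0, momentum=0.0, steps=100)
with_momentum = minimize(shallow_bowl_gradient, start=1.0,
                         learning_rate=1.0, momentum=0.9, steps=100)
print(plain, with_momentum)
```

With momentum set to 0 the function reduces to plain gradient descent, which is still far from the minimum after 100 steps on this surface. The run with momentum ends much closer to 0.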

HEURISTICS FOR FASTER TRAINING


Training deep learning models can be time-consuming and resource-intensive. Heuristics
help us optimize this process, allowing models to learn faster and more effectively without
sacrificing performance.

Steps to Implement Heuristics for Faster Training

1. Data Augmentation
 What is it?: This technique involves creating variations of the training data. For
example, if you have a picture of a dog, you can flip it, rotate it, or change its
brightness to get new images.
 How it helps: By providing more diverse data, it helps the model learn better and
generalize well without needing more raw data.

2. Learning Rate Scheduling


 What is it?: The learning rate is a parameter that controls how much to change the
model in response to the estimated error each time the model weights are updated.
Scheduling means changing this rate during training.
 How it helps: Starting with a higher learning rate allows the model to learn quickly,
and then reducing it helps fine-tune the model for better accuracy.
3. Transfer Learning
 What is it?: Instead of training a model from scratch, transfer learning takes a pre-
trained model (one that has already learned from a large dataset) and fine-tunes it for
a specific task.
 How it helps: This approach saves time and resources, as the model already has a
good understanding of features relevant to many tasks.

4. Batch Normalization
 What is it?: This technique normalizes the inputs of each layer over a mini-batch so
that they have a mean of zero and a standard deviation of one, and then applies a
learnable scale and shift.
 How it helps: It stabilizes the learning process and allows for faster convergence,
meaning the model can learn more quickly.

5. Early Stopping
 What is it?: This heuristic involves monitoring the model's performance on a
validation set and stopping training when performance starts to degrade.
 How it helps: It prevents overfitting (where the model learns the training data too
well and performs poorly on new data) and saves time by not training longer than
necessary.
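
Learning rate scheduling and early stopping are both simple to express in code. The sketch below assumes a step-decay schedule (halve the rate every 10 epochs) and an early-stopping rule with a "patience" of 2 epochs. Both settings, the function names, and the validation losses are made up for illustration.

```python
def step_decay(initial_rate, epoch, drop=0.5, epochs_per_drop=10):
    # Multiply the learning rate by `drop` once every `epochs_per_drop` epochs.
    return initial_rate * drop ** (epoch // epochs_per_drop)

def early_stopping_epoch(validation_losses, patience=2):
    # Return the epoch index at which training would stop.
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_losses) - 1   # never triggered: train to the end

# Hypothetical validation losses: improving at first, then degrading.
validation_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch = early_stopping_epoch(validation_losses)
print(step_decay(0.1, 0), step_decay(0.1, 10), step_decay(0.1, 25), stop_epoch)
```

Waiting a few epochs before stopping avoids reacting to a single noisy validation result. In practice the weights from the best epoch are usually the ones kept.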

NESTEROV'S ACCELERATED GRADIENT DESCENT


Nesterov's Accelerated Gradient Descent (NAG) is an optimization technique used in
machine learning, particularly in deep learning, to improve the speed and stability of
training models. Let's break this concept down step by step.

Understanding Gradient Descent


Before diving into Nesterov's method, it’s essential to understand the basic concept of
Gradient Descent:

1. Objective: The main goal of gradient descent is to minimize a function, usually the
loss function in machine learning, which measures how well the model is performing.

2. How It Works:

 Step 1: Start with an initial point (the model's parameters).


 Step 2: Calculate the gradient (the slope of the function) at that point.
 Step 3: Move in the opposite direction of the gradient (downhill) to reach a lower
point. The step size is determined by a parameter called the learning rate.
 Step 4: Repeat this process until you reach a point where the function is minimized
(or until you have sufficiently reduced the loss).
What Makes Nesterov’s Method Special?
Nesterov's method builds on gradient descent with momentum and adds a lookahead step.
Here’s how it works:

1. Momentum: Instead of just using the current gradient to update the parameters,
momentum helps the optimization process to keep moving in the right direction. Think of it
like a ball rolling down a hill: it gains speed as it goes downhill, which helps it overcome
small bumps along the way.

2. Nesterov’s Approach:

 In Nesterov's method, we first make a “lookahead” step by predicting where we will
be after applying momentum.
 Step 1: Calculate the gradient at this predicted position (not just the current position).
 Step 2: Use this gradient to update the parameters. This way, the update step
considers where we are headed, resulting in more informed updates.

Step-by-Step Explanation of Nesterov's Accelerated Gradient Descent


1. Initialize Parameters: Start with initial values for the model parameters, set the
velocity (the accumulated momentum term) to zero, and choose the learning rate and the
momentum coefficient.
2. Lookahead Position: Predict the next position by moving the current parameters
along the previous velocity, scaled by the momentum coefficient.
3. Compute Gradient: Calculate the gradient of the loss at the lookahead position.
4. Update Velocity: Combine the scaled previous velocity with a step against this
gradient, sized by the learning rate.
5. Update Parameters: Add the new velocity to the model parameters.
6. Repeat: Continue this process until the loss function converges or reaches a
satisfactory level.
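
A minimal sketch of this loop on a one-parameter problem is shown below. It uses one common formulation of Nesterov momentum. The quadratic loss f(theta) = (theta - 3)^2, the learning rate of 0.1, and the momentum coefficient of 0.9 are assumptions chosen for illustration.

```python
def nesterov_minimize(grad, theta, learning_rate=0.1, momentum=0.9, steps=100):
    velocity = 0.0
    for _ in range(steps):
        # Lookahead: where the previous velocity alone would carry us.
        lookahead = theta + momentum * velocity
        # Gradient evaluated at the lookahead position, not at theta.
        g = grad(lookahead)
        # New velocity: decayed old velocity plus a step against the gradient.
        velocity = momentum * velocity - learning_rate * g
        theta = theta + velocity
    return theta

def quadratic_grad(theta):
    # Gradient of f(theta) = (theta - 3)^2, whose minimum is at theta = 3.
    return 2.0 * (theta - 3.0)

theta = nesterov_minimize(quadratic_grad, theta=0.0)
print(theta)
```

The only difference from classical momentum is the point where the gradient is evaluated. Replacing `grad(lookahead)` with `grad(theta)` gives the ordinary momentum update.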
REGULARIZATION FOR DEEP LEARNING
Regularization is a technique used to prevent a machine learning model from becoming too
complex. When a model is too complex, it can perform very well on the training data but
poorly on new, unseen data. This phenomenon is known as overfitting.

Key Terms:

 Overfitting: When a model learns the noise and details of the training data too well,
leading to poor performance on new data.
 Underfitting: When a model is too simple and fails to capture the underlying patterns
in the data.

Why Do We Need Regularization?


Think of regularization like a coach for athletes. Just as a coach helps athletes avoid
overtraining and maintain a balanced approach to their sport, regularization helps models
maintain a balance between learning from the data and not memorizing it.

Types of Regularization Techniques


There are several methods of regularization, but we will focus on two common techniques:

1. L1 Regularization
2. L2 Regularization.

1. L1 Regularization (Lasso)

Concept: L1 regularization adds a penalty proportional to the sum of the absolute values
of the coefficients (weights) to the loss function. This pushes many weights to exactly
zero, so the model uses fewer features, effectively performing feature selection.

Mathematical Representation:

Loss = Original Loss + λ * ∑ |weights|

Example: Imagine you’re teaching a student to identify different types of fruit. If they focus
too much on minor details (like the exact shade of yellow in a banana), they may confuse it
with a lemon. L1 regularization helps the model focus on the most important features (like
shape and size) instead of the minor details.

2. L2 Regularization (Ridge)

Concept: L2 regularization adds a penalty proportional to the sum of the squared
coefficients to the loss function. This helps to keep the weights small and spread out across
features, reducing the chances of overfitting.
Mathematical Representation:

Loss = Original Loss + λ * ∑ (weights^2)

Example: Think of a student learning to play piano. If they try to memorize every single note
of a song instead of understanding the chords and structure, they may struggle to play
anything new. L2 regularization encourages the model to learn the general structure rather
than memorizing specific examples.
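
Both penalty formulas translate directly into code. In the sketch below, the weights, the original loss value, and λ = 0.1 are made-up numbers used only to show the arithmetic.

```python
def l1_penalty(weights, lam):
    # lambda * sum of absolute values of the weights
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # lambda * sum of squared weights
    return lam * sum(w * w for w in weights)

def regularized_loss(original_loss, weights, lam, kind="l2"):
    if kind == "l1":
        return original_loss + l1_penalty(weights, lam)
    if kind == "l2":
        return original_loss + l2_penalty(weights, lam)
    raise ValueError("kind must be 'l1' or 'l2'")

weights = [0.5, -1.5, 0.0, 2.0]
original_loss = 1.0
lam = 0.1

l1_total = regularized_loss(original_loss, weights, lam, kind="l1")  # 1.0 + 0.1 * 4.0
l2_total = regularized_loss(original_loss, weights, lam, kind="l2")  # 1.0 + 0.1 * 6.5
print(l1_total, l2_total)
```

The L2 penalty grows much faster for large weights (2.0 contributes 4.0) than the L1 penalty does (2.0 contributes 2.0), which is why L2 discourages any single weight from dominating.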

Real-Life Applications of Regularization


1. Image Classification: When training models to classify images (e.g., dogs vs. cats),
regularization helps ensure that the model doesn't get distracted by irrelevant details (like
background elements), allowing it to focus on essential features like fur patterns and shapes.

2. Natural Language Processing: In tasks like sentiment analysis, regularization can
prevent models from overfitting to specific phrases or words, helping them to generalize
better to different expressions of sentiment.

ADVERSARIAL TRAINING
Adversarial training is a technique used in deep learning to improve the robustness of
machine learning models, particularly those used for image recognition and processing. It
helps models learn not just from regular images but also from challenging examples designed
to confuse them.

Why is it Important?
In the real world, models can encounter unexpected or tricky inputs that might make them
perform poorly. Adversarial training prepares models to handle these difficult cases by
exposing them to examples that are intentionally designed to mislead them.

Step-by-Step Explanation of Adversarial Training

1. Understanding Adversarial Examples:

Adversarial Examples are inputs to a model that have been slightly altered in a
way that is usually imperceptible to humans but causes the model to make an
incorrect prediction.

For example, if a model is trained to recognize cats, an image of a cat can be
slightly modified (like changing a few pixels) to trick the model into thinking
it’s a dog.
2. Creating Adversarial Examples:

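Techniques like the Fast Gradient Sign Method (FGSM) can be used to create
these adversarial examples. FGSM computes the gradient of the loss with respect to
the input image and shifts every pixel by a small amount in the direction given by
the sign of that gradient, so that a small change to the image produces a large
change in the model's output.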

3. Training with Adversarial Examples:

During training, the model is shown both normal images and adversarial
examples. By learning from both, it becomes better at identifying real objects as
well as recognizing when an image is trying to trick it.

For instance, if the model learns that a specific modification of a cat image
leads to it thinking it’s a dog, it can adjust its parameters to improve its
performance on similar tricks in the future.

4. Evaluating Model Robustness:

After training, the model is tested with a mix of normal and adversarial
examples to see how well it performs. A robust model will maintain its accuracy
even when faced with adversarial inputs.
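
The FGSM step can be sketched on a model small enough to differentiate by hand. The example below assumes a logistic regression classifier with fixed, made-up weights and a three-value "image". For this model and a cross-entropy loss, the gradient of the loss with respect to the input is (p - y) * w. The value epsilon = 0.3 is deliberately large so the effect is visible in a toy example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    # Probability of class 1 under a logistic regression model.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def input_gradient(weights, bias, x, y):
    # Gradient of the cross-entropy loss with respect to the input x.
    p = predict(weights, bias, x)
    return [(p - y) * w for w in weights]

def sign(value):
    return (value > 0) - (value < 0)

def fgsm(weights, bias, x, y, epsilon):
    # Move every input value by epsilon in the direction that increases the loss.
    grad = input_gradient(weights, bias, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and input; the true label is 1.
weights, bias = [2.0, -1.0, 0.5], 0.0
x, y = [0.2, 0.1, 0.4], 1

x_adv = fgsm(weights, bias, x, y, epsilon=0.3)
print(predict(weights, bias, x), predict(weights, bias, x_adv))
```

The original input is classified correctly (probability above 0.5) while the perturbed one is not. Adversarial training would add pairs like (x_adv, y) to the training batches so that the model learns to classify them correctly as well.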

Real-Life Applications of Adversarial Training

 Self-Driving Cars: They must recognize road signs and pedestrians
accurately. Adversarial training helps ensure they can handle unusual
situations, like a sign that has been altered to confuse the model.
 Medical Image Diagnosis: In healthcare, models analyze X-rays or
MRIs. Adversarial training can help improve their accuracy, even when
images are slightly altered due to noise or other factors.
 Security Systems: Facial recognition systems can be tricked by small
changes in facial images. Adversarial training helps these systems
become more secure against attempts to bypass them.
OPTIMIZATION FOR TRAINING DEEP MODELS
Optimization in this context refers to the process of adjusting the parameters of a neural
network to improve its performance. Think of it as finding the best possible solution to a
problem: in our case, making the model as accurate as possible when predicting or
classifying data.

Why is Optimization Important?


1. Accuracy Improvement: A well-optimized model can significantly increase
accuracy in tasks like image recognition or natural language processing.
2. Efficiency: Optimization helps in reducing the time it takes for the model to learn
from the data, making the training process faster.
3. Resource Utilization: It ensures that computational resources (like GPU usage)
are used efficiently, which is crucial given the large datasets often involved in
deep learning.

Key Concepts in Optimization


1. Loss Function

The loss function measures how well the model's predictions match the actual outcomes. It
quantifies the difference between predicted values and actual values. Our goal in optimization
is to minimize this loss function.

Example: In a model that recognizes cats and dogs, if the model predicts a dog but the actual
image is of a cat, the loss function will give a higher value. We want to minimize these
errors.

2. Gradient Descent

Gradient Descent is one of the most common optimization algorithms used for training deep
models. It works by calculating the gradient (or slope) of the loss function and updating the
model's parameters in the opposite direction of that gradient.

Steps:

 Calculate the Gradient: Determine how much the loss function changes with respect
to each parameter.
 Update Parameters: Adjust the parameters slightly in the direction that reduces the
loss.
 Repeat: Continue this process until the loss is minimized and the model performs
satisfactorily.

Visual Cue: Imagine a ball rolling down a hill. The ball will naturally roll to the lowest point
(minimum loss). Gradient descent helps the model find its way to this "lowest point."
3. Learning Rate

The learning rate is a crucial hyperparameter in gradient descent. It determines how large a
step we take in the parameter space each time the model is updated.

 Too High: The model might overshoot the minimum and fail to converge.
 Too Low: The model may take too long to converge, making the training inefficient.

Example: If you're climbing down a mountain, taking huge steps could lead you to fall off a
cliff (overshooting), while tiny steps will take forever to reach the bottom (too slow).
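
The effect of the learning rate can be checked on the simple loss f(x) = x^2, whose gradient is 2x and whose minimum is at x = 0. The three learning rates below are arbitrary values chosen to show the three behaviours.

```python
def run_gradient_descent(learning_rate, start=1.0, steps=20):
    x = start
    for _ in range(steps):
        x -= learning_rate * 2.0 * x    # gradient of f(x) = x^2 is 2x
    return x

too_high = run_gradient_descent(1.1)      # each step overshoots: |x| grows
too_low = run_gradient_descent(0.001)     # barely moves in 20 steps
reasonable = run_gradient_descent(0.3)    # ends very close to the minimum
print(too_high, too_low, reasonable)
```

With a rate of 1.1, every update multiplies x by -1.2, so the iterate jumps across the minimum and moves further away each time. With 0.001, x is multiplied by 0.998 per step and is still near its starting value after 20 steps.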

4. Regularization

Regularization techniques are used during optimization to prevent the model from becoming
too complex and overfitting the training data. Overfitting happens when the model learns the
training data too well, including its noise and outliers.

• L1 and L2 Regularization: These add a penalty to the loss function based on the size
of the parameters, encouraging the model to keep the parameters small and simple.

Example: Think of it like a student studying for a test. If the student memorizes every detail
from the textbook without understanding the concepts, they might do well on that test but
struggle with real-world applications.
