
Unit-1 and 2 and 3

The document provides an introduction to deep learning, covering topics such as biological and artificial neurons, activation functions, and various gradient descent optimization techniques. It discusses the architecture of neural networks, including multilayer perceptrons and the importance of regularization and data augmentation in training models. Additionally, it highlights the evolution of deep learning methods and their applications in fields like gaming, language processing, and vision.

Uploaded by Dnyanesh Radke


Introduction to Deep Learning
Biological Neurons
AI Development
The Deep Revival
From CAT to CNN
DL: Faster, Higher, Stronger
DL: Sequence Models
Gaming
The Rise of Transformers
From Language to Vision
Discrimination to Generalization
Questions??
Artificial Neuron
Guess the personality??
Questions??
Decision
Boolean Function
McCulloch-Pitts Model
OR Using MP Model
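A McCulloch-Pitts (MP) neuron sums its binary inputs and fires when the sum reaches a threshold θ. The minimal sketch below considers excitatory inputs only (inhibitory inputs are left out) and uses an illustrative function name; it implements OR with θ = 1, because OR must fire as soon as any single input is 1.

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron (excitatory inputs only): output 1 iff sum(inputs) >= threshold."""
    return int(sum(inputs) >= threshold)

# OR fires when at least one input is 1, so the threshold is 1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron([x1, x2], threshold=1))
```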
Non-Boolean Function
Example
OR Using Perceptron Model
Errors
Questions?
Perceptron Learning Algorithm
Questions??
Linearly separable functions
OR/XOR Using Perceptron Model
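A minimal sketch of the perceptron learning algorithm, assuming labels in {0, 1} and the bias folded into the weights through a constant input of 1 (function names are illustrative). It converges on OR, which is linearly separable; for XOR no weight vector classifies all four points, so the loop simply runs out of epochs.

```python
import numpy as np

def perceptron_learn(X, y, max_epochs=100):
    """Perceptron learning algorithm. X: (n, d) inputs, y: labels in {0, 1}.
    A constant 1 is prepended to every input so w[0] plays the role of the bias."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(Xb, y):
            pred = int(w @ x >= 0)
            if pred == 0 and target == 1:
                w += x          # output too low: move w towards x
                errors += 1
            elif pred == 1 and target == 0:
                w -= x          # output too high: move w away from x
                errors += 1
        if errors == 0:         # every point classified correctly: stop
            break
    return w

def perceptron_predict(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (Xb @ w >= 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w_or = perceptron_learn(X, np.array([0, 1, 1, 1]))
w_xor = perceptron_learn(X, np.array([0, 1, 1, 0]))
print("OR :", perceptron_predict(w_or, X))
print("XOR:", perceptron_predict(w_xor, X))
```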
What is the solution for points which are linearly inseparable?
Network of Perceptrons (MLP)
Multilayer Network of Perceptrons (MLP)
XOR Using MLP
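One way to build XOR from a network of perceptrons, using hand-picked weights (an illustrative choice, not necessarily the one on the slide): a hidden layer computing OR and NAND, followed by an output unit computing their AND, since XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)).

```python
import numpy as np

def step(z):
    """Perceptron threshold unit: 1 if z >= 0, else 0."""
    return (z >= 0).astype(int)

# Hidden layer: column 0 holds the OR unit's weights, column 1 the NAND unit's
W1 = np.array([[1.0, -1.0],
               [1.0, -1.0]])
b1 = np.array([-1.0, 1.5])
# Output unit: AND of the two hidden units
w2 = np.array([1.0, 1.0])
b2 = -2.0

def xor_mlp(X):
    H = step(X @ W1 + b1)      # hidden activations [OR, NAND]
    return step(H @ w2 + b2)   # AND of the hidden activations

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_mlp(X))
```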
Three Input MLP
What if you have more than 3 inputs?
MLP
Sigmoid Neuron
Supervised Learning
Machine Learning SL Setup
• Data?
• Model?
• Parameter?
• Learning Algorithm?
• Objective Function / Loss function?
Learning Parameter
Learning Algorithm
Example:
Calculation
Questions??
Feedforward Network
Multilayer Network of Neurons
Feedforward Neural Network
Questions??
Learning Parameters: Gradient Descent
Calculate Grad(θ): Grad(W) and Grad(b)
Example:
Calculation
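For a single sigmoid neuron f(x) = σ(wx + b) with squared-error loss L = ½(f(x) - y)², the chain rule gives Grad(w) = (f(x) - y) f(x)(1 - f(x)) x and Grad(b) = (f(x) - y) f(x)(1 - f(x)). The sketch below computes these, checks them against finite differences, and takes one update θ ← θ - η Grad(θ). The data point (x, y), the starting parameters and η are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Squared-error loss 0.5 * (f(x) - y)^2 for one sigmoid neuron."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

def grads(w, b, x, y):
    """Chain rule: dL/df * df/dz * dz/dw (dz/dw = x, dz/db = 1)."""
    f = sigmoid(w * x + b)
    common = (f - y) * f * (1 - f)
    return common * x, common          # (Grad(w), Grad(b))

# Illustrative training point and current parameters (assumptions)
w, b, x, y = 0.5, -0.2, 2.5, 0.9
gw, gb = grads(w, b, x, y)

# Central finite differences as an independent check
h = 1e-6
gw_num = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
gb_num = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2 * h)
print("Grad(w):", gw, "numerical:", gw_num)
print("Grad(b):", gb, "numerical:", gb_num)

# One gradient-descent update with learning rate eta
eta = 1.0
w_new, b_new = w - eta * gw, b - eta * gb
print("loss before:", loss(w, b, x, y), "after:", loss(w_new, b_new, x, y))
```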
Problem Type-1
Problem Type-2
Problem Type : Regression / Classification
Questions??
Activation Function
Activation Function (continued)
• Nonlinear: When the activation function is nonlinear, a two-layer neural network can be proven to be a universal function approximator. The identity activation function does not satisfy this property: when multiple layers use the identity activation function, the entire network is equivalent to a single-layer model.
• Range: When the range of the activation function is finite, gradient-based training methods tend to be more stable, because pattern presentations significantly affect only limited weights. When the range is infinite, training is generally more efficient because pattern presentations significantly affect most of the weights; in this case, smaller learning rates are typically necessary.
• Continuously differentiable: This property is desirable because it enables gradient-based optimization methods. ReLU is not continuously differentiable (its derivative jumps at 0), which causes some issues with gradient-based optimization, but training with it is still possible. The binary step activation function is not differentiable at 0 and its derivative is 0 for all other values, so gradient-based methods can make no progress with it.
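The first point can be checked numerically: with the identity activation, two stacked layers W2(W1x + b1) + b2 collapse to a single layer with W = W2W1 and b = W2b1 + b2. The layer sizes and random weights below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 inputs -> 4 units
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 units -> 2 outputs

def two_layer_identity(x):
    """Two layers, both using the identity activation g(z) = z."""
    return W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer
W, b = W2 @ W1, W2 @ b1 + b2

x = rng.normal(size=3)
print(np.allclose(two_layer_identity(x), W @ x + b))
```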
1. The Sigmoid Function

• Sigmoid functions are used in machine learning for logistic regression and basic neural network implementations, and they are usually the first activation units introduced. In deeper neural networks, however, sigmoid functions are generally avoided because of drawbacks such as the vanishing gradient problem: the derivative is at most 0.25 and approaches 0 when the unit saturates.
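The vanishing gradient problem can be seen directly from the derivative σ'(x) = σ(x)(1 - σ(x)), which peaks at 0.25 at x = 0 and is close to 0 for a saturated unit. The final line is a simplified illustration that ignores the weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)        # sigma'(x) = sigma(x) * (1 - sigma(x))

print(sigmoid_grad(0.0))        # 0.25, the largest possible value
print(sigmoid_grad(10.0))       # saturated unit: gradient is almost zero
# Backpropagating through n sigmoid layers multiplies n such factors, so even
# in the best case (ignoring weights) the gradient shrinks like 0.25 ** n.
print(0.25 ** 10)
```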
Tanh Function

• The tanh function partly addresses the drawbacks of the sigmoid function. It has the same S-shape, but the curve is symmetric about the origin (zero-centred), with values ranging from -1 to 1. It still saturates for large |x|, so the vanishing gradient problem is reduced but not eliminated.
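A quick numerical check of how the two functions relate: tanh is a shifted and rescaled sigmoid, tanh(x) = 2σ(2x) - 1; it is odd (zero-centred); and its peak gradient is 1 rather than the sigmoid's 0.25. The sampling grid is an arbitrary choice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 101)
t = np.tanh(x)

# tanh is a shifted and rescaled sigmoid
print(np.allclose(t, 2 * sigmoid(2 * x) - 1))
# Zero-centred: tanh(-x) = -tanh(x)
print(np.allclose(np.tanh(-x), -t))
# Peak gradients at 0: tanh'(0) = 1 - tanh(0)^2 versus sigmoid'(0) = sigmoid(0)(1 - sigmoid(0))
print(1 - np.tanh(0.0) ** 2, sigmoid(0.0) * (1 - sigmoid(0.0)))
```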
ReLU
• A Rectified Linear Unit (ReLU), i.e. a unit employing the rectifier f(x) = max(0, x), outputs 0 if the input is less than 0 and the raw input otherwise: if the input is greater than 0, the output is equal to the input. The operation of ReLU is often described as closer to the way biological neurons work.
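A sketch of ReLU and the derivative used during backpropagation. The derivative is undefined at exactly 0 (the kink that makes ReLU not continuously differentiable); using 0 there, as below, is a common convention rather than the only option.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for x > 0, 0 for x < 0; the value 0 at x == 0 is a convention
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(relu_grad(x))
```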
Softmax Function
• Softmax is a useful activation function for multi-class outputs because it not only maps each output to the [0, 1] range but also scales the outputs so that they sum to 1: softmax(z)_i = exp(z_i) / Σ_j exp(z_j). The output of softmax can therefore be interpreted as a probability distribution over the classes.
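A minimal, numerically stable sketch of softmax in NumPy. Subtracting the largest logit before exponentiating does not change the result (the factor cancels in the ratio) but avoids overflow for large inputs; the example logits are arbitrary.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)
print(p, p.sum())
print(softmax([1000.0, 1000.0]))   # no overflow despite huge logits
```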
Forms of GD
Training of Feedforward Neural Network with Gradient Descent
• Training feedforward neural networks (FNNs) involves adjusting their weights to minimize the loss function, which measures the difference between the network's predictions and the actual targets. Gradient Descent (GD) is a fundamental method used for this optimization.
Step-1
import numpy as np

# Simple example: training a single sigmoid neuron (no hidden layer) to learn the AND function

# Inputs and corresponding targets for AND (output is 1 only when both inputs are 1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [0], [0], [1]])

# Sigmoid activation function and its derivative

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is the sigmoid OUTPUT, so sigma'(z) = s * (1 - s)
    return s * (1 - s)

Step-2
# Initialize weights randomly
weights = np.random.uniform(size=(2, 1))
bias = np.random.uniform(size=(1,))
learning_rate = 0.1

# Training loop
for epoch in range(10000):
    inputs = X
    # Forward propagation
    z = np.dot(inputs, weights) + bias
    output = sigmoid(z)

    # Calculate the error
    error = y - output

Step-3
    # Backpropagation (still inside the training loop): gradient step on the squared error
    adjustment = error * sigmoid_derivative(output)
    weights += np.dot(inputs.T, adjustment) * learning_rate
    bias += np.sum(adjustment, axis=0) * learning_rate

# Predictions after training: one more forward pass with the final weights
output = sigmoid(np.dot(X, weights) + bias)
print("Output after training")
print(output)
Momentum-Based Gradient Descent
• Momentum accelerates gradient descent along directions where successive gradients agree and dampens oscillations by adding a fraction of the previous update to the current one.

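momentum = 0.9
v = 0  # initialize velocity (broadcasts to the shape of weights on the first update)

for epoch in range(num_epochs):
    gradients = compute_gradients(data, weights)  # placeholder: gradient of the loss w.r.t. weights
    v = momentum * v + learning_rate * gradients  # decaying accumulation of past gradients
    weights = weights - v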
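To see the effect, the sketch below compares plain gradient descent with the momentum update on a hypothetical toy problem: an elongated quadratic bowl f(w) = ½ wᵀAw with curvatures 1 and 10, whose gradient is Aw. The learning rate, momentum and step count are illustrative. With the same small learning rate, momentum makes far more progress along the flat direction in the same number of steps.

```python
import numpy as np

# Hypothetical toy objective: f(w) = 0.5 * w^T A w, gradient A w
A = np.diag([1.0, 10.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def gradient_descent(w, learning_rate=0.01, steps=100):
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

def momentum_gd(w, learning_rate=0.01, momentum=0.9, steps=100):
    v = np.zeros_like(w)          # velocity
    for _ in range(steps):
        v = momentum * v + learning_rate * grad(w)
        w = w - v
    return w

w0 = np.array([1.0, 1.0])
print("plain GD loss:", loss(gradient_descent(w0)))
print("momentum loss:", loss(momentum_gd(w0)))
```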
Nesterov Accelerated Gradient Descent
• Nesterov Accelerated Gradient (NAG) is a slight variation on the momentum idea in which the gradient is evaluated at a look-ahead point (the position the momentum step is about to reach) rather than at the current position.

momentum = 0.9
v = 0  # initialize velocity

for epoch in range(num_epochs):
    temp_weights = weights - momentum * v  # look-ahead point
    gradients = compute_gradients(data, temp_weights)  # placeholder: gradient at the look-ahead point
    v = momentum * v + learning_rate * gradients
    weights = weights - v
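A self-contained sketch of NAG on a hypothetical toy quadratic f(w) = ½ wᵀAw (curvatures 1 and 10, gradient Aw), compared with plain gradient descent at the same learning rate. The only change from classical momentum is that the gradient is taken at the look-ahead point w - momentum·v; the hyperparameters are illustrative.

```python
import numpy as np

# Hypothetical toy objective: f(w) = 0.5 * w^T A w, gradient A w
A = np.diag([1.0, 10.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def gradient_descent(w, learning_rate=0.01, steps=100):
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

def nesterov_gd(w, learning_rate=0.01, momentum=0.9, steps=100):
    v = np.zeros_like(w)
    for _ in range(steps):
        lookahead = w - momentum * v      # where the momentum step alone would land
        v = momentum * v + learning_rate * grad(lookahead)
        w = w - v
    return w

w0 = np.array([1.0, 1.0])
print("plain GD loss:", loss(gradient_descent(w0)))
print("NAG loss:     ", loss(nesterov_gd(w0)))
```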
Stochastic Gradient Descent (SGD)
• SGD updates the weights using the gradient computed on a single example or a small subset (mini-batch) of the data instead of the full dataset, so each update is much cheaper and training is usually faster.

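for epoch in range(num_epochs):
    for batch in batches:  # mini-batches of the (reshuffled) training data
        gradients = compute_gradients(batch, weights)  # placeholder: gradient on this batch only
        weights = weights - learning_rate * gradients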
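A runnable sketch of mini-batch SGD on a hypothetical linear-regression problem (synthetic data y = 3x + 2 plus small noise; the learning rate, batch size and epoch count are illustrative). Each update uses only the gradient of one shuffled batch, yet the parameters still approach the true values.

```python
import numpy as np

def minibatch_sgd(X, y, learning_rate=0.1, batch_size=20, epochs=200, seed=0):
    """Fit y ~ X @ w + b by mini-batch SGD on the mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for epoch in range(epochs):
        order = rng.permutation(n)                  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            err = Xb @ w + b - yb                   # prediction error on this batch only
            w -= learning_rate * 2 * Xb.T @ err / len(idx)
            b -= learning_rate * 2 * err.mean()
    return w, b

# Synthetic data (assumption): y = 3x + 2 plus a little noise
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 2 + 0.05 * rng.normal(size=200)

w, b = minibatch_sgd(X, y)
print("w =", w, "b =", b)
```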
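AdaGrad, RMSProp, and Adam
• These are adaptive learning rate optimization algorithms. AdaGrad adapts the learning rate of each parameter by dividing it by the square root of the accumulated sum of that parameter's past squared gradients. RMSProp replaces this ever-growing sum with an exponentially decaying average, so the learning rate does not shrink toward zero in long runs. Adam combines momentum (a decaying average of gradients) with RMSProp-style scaling for efficient and effective optimization. The update rule below is Adam's.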
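import numpy as np

beta1 = 0.9
beta2 = 0.999
epsilon = 1e-8
m = 0  # first moment: decaying average of gradients (momentum)
v = 0  # second moment: decaying average of squared gradients (RMSProp)

for t in range(1, num_epochs + 1):  # t starts at 1; with t = 0 the bias correction divides by zero
    gradients = compute_gradients(data, weights)  # placeholder: gradient of the loss
    m = beta1 * m + (1 - beta1) * gradients
    v = beta2 * v + (1 - beta2) * (gradients ** 2)
    m_hat = m / (1 - beta1 ** t)  # correct the bias toward 0 caused by m = 0 at the start
    v_hat = v / (1 - beta2 ** t)
    weights = weights - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)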
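The slide's code shows only Adam. Below is a self-contained sketch of all three adaptive methods on a hypothetical toy quadratic (gradient Aw), written so that the key difference is one line each: AdaGrad accumulates a sum of squared gradients, RMSProp a decaying average, and Adam adds a momentum term plus bias correction. The learning rates are illustrative, not tuned.

```python
import numpy as np

# Hypothetical toy objective: f(w) = 0.5 * w^T A w, gradient A w
A = np.diag([1.0, 10.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def adagrad(w, lr=0.5, eps=1e-8, steps=200):
    G = np.zeros_like(w)                  # running SUM of squared gradients (only grows)
    for _ in range(steps):
        g = grad(w)
        G += g ** 2
        w = w - lr * g / (np.sqrt(G) + eps)
    return w

def rmsprop(w, lr=0.01, rho=0.9, eps=1e-8, steps=200):
    s = np.zeros_like(w)                  # decaying AVERAGE of squared gradients
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g ** 2
        w = w - lr * g / (np.sqrt(s) + eps)
    return w

def adam(w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    m = np.zeros_like(w)                  # momentum (first moment)
    v = np.zeros_like(w)                  # RMSProp-style second moment
    for t in range(1, steps + 1):         # t starts at 1 for the bias correction
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([1.0, 1.0])
for name, opt in [("AdaGrad", adagrad), ("RMSProp", rmsprop), ("Adam", adam)]:
    print(name, "final loss:", loss(opt(w0)))
```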
Forms of GD
Gradient Descent Variant | Update Strategy | Key Feature | Computational Cost | Convergence Speed | Application
Batch Gradient Descent (BGD) | Full Dataset | Global Minimum (Convex Functions) | High | Slow | Simple regression, small datasets
Stochastic Gradient Descent (SGD) | Single Example | Escapes Local Minima | Low | Fast | Online learning, real-time applications
Mini-Batch Gradient Descent | Small Batch | Balances Efficiency & Stability | Medium | Medium | Deep learning, large-scale classification problems
Momentum-Based Gradient Descent | Full/Batch | Faster Convergence | Medium | Fast | Image recognition, deep learning frameworks
Nesterov Accelerated Gradient (NAG) | Full/Batch | Smooth Convergence | Medium | Faster than Momentum | Speech recognition, NLP tasks
Adagrad | Adaptive | Good for Sparse Data | Low | Slows Over Time | Text processing, NLP applications
RMSprop | Adaptive | Prevents Learning Rate Decay | Medium | Fast | Recurrent Neural Networks (RNNs), speech analysis
Adam | Adaptive | Combines Momentum & RMSprop | Medium | Very Fast | General deep learning, CNNs, NLP, reinforcement learning
Nadam | Adaptive | Adds Nesterov Momentum | Medium | Faster than Adam | Computer vision, sequence modeling
AdaMax | Adaptive | Stable Updates | Medium | Fast | Training GANs, complex neural networks
AMSGrad | Adaptive | Non-Increasing Effective Learning Rate (fixes Adam's convergence issue) | Medium | Stable | Financial modeling, advanced AI applications
Bias and Variance
Train error vs Test error
Regularization
L2 Regularization
• L2 regularization, known as weight decay in the context of neural
networks, is commonly applied to the weights of the neural network
layers.
• It helps prevent overfitting by shrinking the weights, making the
network less sensitive to small changes in input data.
• L2 regularization encourages smaller, more evenly distributed weights by adding a penalty proportional to the sum of the squared weights, λ‖w‖², to the loss.
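The penalty can be sketched with plain gradient descent on a linear model (an illustrative stand-in for a network layer; the synthetic data, λ and learning rate are assumptions). Adding λ‖w‖² to the mean squared error adds 2λw to the gradient, so every step first shrinks the weights by a factor (1 - 2ηλ), which is why L2 regularization is called weight decay.

```python
import numpy as np

def fit(X, y, lam=0.0, lr=0.1, steps=500):
    """Gradient descent on mean squared error + lam * ||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad_data = 2 * X.T @ (X @ w - y) / n   # gradient of the data loss
        grad_penalty = 2 * lam * w              # gradient of lam * ||w||^2
        # Same as w <- (1 - 2*lr*lam) * w - lr * grad_data: the weights are
        # multiplicatively decayed towards zero at every step.
        w = w - lr * (grad_data + grad_penalty)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

w_plain = fit(X, y, lam=0.0)
w_l2 = fit(X, y, lam=0.5)
print("||w|| without L2:", np.linalg.norm(w_plain))
print("||w|| with L2:   ", np.linalg.norm(w_l2))
```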
Data Augmentation
• Typically, more data = better learning
• Works well for image classification / object recognition tasks
• Also shown to work well for speech
• For some tasks it may not be clear how to generate such data
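A minimal sketch of data augmentation for image classification, assuming grey-scale images stored as (H, W) NumPy arrays with values in [0, 1]. Each call produces a new training example (random horizontal flip, small shift, slight noise) that keeps the original label. The transforms and magnitudes are illustrative choices; np.roll wraps pixels around the border, whereas padding and cropping is a common alternative.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly transformed copy of a (H, W) image in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                            # horizontal flip
    dy, dx = rng.integers(-2, 3, size=2)              # shift by up to 2 pixels
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    out = out + rng.normal(0.0, 0.05, size=out.shape) # small Gaussian noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((28, 28))                          # stand-in for one training image
augmented = [augment(image, rng) for _ in range(8)]   # 8 extra examples, same label
print(len(augmented), augmented[0].shape)
```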
Questions??
