Deep Learning book, by Ian Goodfellow,
Yoshua Bengio and Aaron Courville
Chapter 6: Deep Feedforward Networks
Benoit Massé, Dionyssos Kounades-Bastian
Linear regression (and classification)
Input vector x
Output vector y
Parameters: weight W and bias b
Prediction: y = Wᵀx + b
[Diagram: inputs x1, x2, x3 connected to outputs y1, y2 through the weights W11, …, W23 and the biases b1, b2.]
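As an illustration (a sketch, not part of the original slides), a minimal NumPy version of this prediction, with made-up shapes x ∈ R³ and y ∈ R² matching the diagram:

```python
import numpy as np

# Linear model y = W^T x + b, with illustrative shapes:
# x in R^3, y in R^2, so W is 3x2 and b is in R^2.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
b = rng.normal(size=2)

x = np.array([1.0, 2.0, 3.0])
y = W.T @ x + b        # prediction of the linear regressor
print(y.shape)         # (2,)
```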
Linear regression (and classification)
Advantages
Easy to use
Easy to train, low risk of overfitting
Drawbacks
Some problems are inherently non-linear
Solving XOR
Linear regressor
[Diagram: inputs x1, x2 connected directly to the output y through weights W and bias b.]
There is no value of W and b
such that ∀(x1, x2) ∈ {0, 1}²,
    Wᵀ(x1, x2)ᵀ + b = xor(x1, x2)
Solving XOR
What about... ?
[Diagram: x1, x2 → u1, u2 (weights W, bias b) → y (weights V, bias c), with no non-linearity.]
Strictly equivalent:
The composition of two linear operations is still a linear
operation
Solving XOR
And about... ?
[Diagram: x1, x2 → u1, u2 (weights W, bias b) → h1, h2 (non-linearity φ) → y (weights V, bias c).]
In which φ(x) = max{0, x}
It is possible!
With W = [[1, 1], [1, 1]], b = (0, −1)ᵀ, V = (1, −2) and c = 0,
    Vφ(Wx + b) = xor(x1, x2)
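As a quick check of these values (a sketch, not from the slides; the array names mirror the W, b, V, c above):

```python
import numpy as np

# Parameters from the slide: a single hidden layer with ReLU solves XOR.
W = np.array([[1., 1.],
              [1., 1.]])
b = np.array([0., -1.])
V = np.array([1., -2.])
c = 0.0

def relu(x):
    return np.maximum(0.0, x)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([x1, x2], dtype=float)
    y = V @ relu(W @ x + b) + c
    print((x1, x2), "->", y)   # prints 0, 1, 1, 0
```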
Neural network with one hidden layer
Compact representation
[Diagram: x → h (weights W, bias b, activation φ) → y (weights V, bias c).]
Neural network
Hidden layer with non-linearity
→ can represent a broader class of functions
Universal approximation theorem
Theorem
A neural network with one hidden layer can approximate any
continuous function
More formally, given a continuous function f : Cn → R^m where
Cn is a compact subset of R^n,

    ∀ε > 0, ∃ f_NN^ε : x ↦ Σ_{i=1}^{K} v_i φ(w_iᵀ x + b_i) + c

such that
    ∀x ∈ Cn, ‖f(x) − f_NN^ε(x)‖ < ε
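To make the sum form above concrete, here is a minimal sketch (illustrative only; the theorem itself says nothing about how to obtain the weights) that fits a shallow ReLU network to a 1-D target function by least squares on randomly drawn hidden units:

```python
import numpy as np

# f_NN(x) = sum_i v_i * relu(w_i * x + b_i) + c, fitted to a target function.
rng = np.random.default_rng(0)
K = 200                                    # number of hidden units
w = rng.normal(size=K)                     # hidden weights (1-D input)
b = rng.uniform(-3.0, 3.0, size=K)         # hidden biases

X = np.linspace(-3.0, 3.0, 500)            # compact subset of R
target = np.sin(2.0 * X)                   # continuous function to approximate

H = np.maximum(0.0, np.outer(X, w) + b)    # hidden activations, shape (500, K)
A = np.column_stack([H, np.ones(len(X))])  # extra column for the bias c
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
v, c = coef[:-1], coef[-1]

approx = H @ v + c
print("max error:", np.max(np.abs(approx - target)))  # small for large K
```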
Problems
Obtaining the network
The universal approximation theorem gives no information about HOW to
obtain such a network
Size of the hidden layer h
Values of W and b
Using the network
Even if we find a way to obtain the network, the size of the
hidden layer may be prohibitively large.
Deep neural network
Why Deep?
Let's stack l hidden layers one after the other; l is called the
depth of the network.
[Diagram: x → h1 (W1, b1, φ) → h2 (W2, b2, φ) → … → hl (Wl, bl, φ) → y (V, c).]
Properties of DNN
The universal approximation theorem also applies
Some functions can be approximated by a DNN with N
hidden units, but would require O(e^N) hidden units to be
represented by a shallow network.
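As an illustration of this stacking (a sketch with made-up layer sizes, not from the slides):

```python
import numpy as np

def forward(x, hidden_params, V, c, phi=lambda z: np.maximum(0.0, z)):
    """Forward pass of a deep feedforward network.

    hidden_params is a list of (W_k, b_k) pairs, one per hidden layer;
    phi is the activation (ReLU by default); (V, c) is the output layer.
    """
    h = x
    for W, b in hidden_params:
        h = phi(W @ h + b)
    return V @ h + c

# Illustrative network: 3 inputs, two hidden layers of width 4, 2 outputs.
rng = np.random.default_rng(0)
hidden_params = [(rng.normal(size=(4, 3)), rng.normal(size=4)),
                 (rng.normal(size=(4, 4)), rng.normal(size=4))]
V, c = rng.normal(size=(2, 4)), rng.normal(size=2)

print(forward(np.array([1.0, 0.5, -0.2]), hidden_params, V, c))
```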
Summary
Comparison
Linear classifier
− Limited representational power
+ Simple
Shallow Neural network (exactly one hidden layer)
+ Unlimited representational power
− Sometimes prohibitively wide
Deep Neural network
+ Unlimited representational power
+ Relatively small number of hidden units needed
Remaining problem
How to get this DNN?
On the path of getting my own DNN
Hyperparameters
First, we need to define the architecture of the DNN
The depth l
The size of the hidden layers n1, …, nl
The activation function φ
The output unit
Parameters
When the architecture is defined, we need to train the DNN
W1, b1, …, Wl, bl
Hyperparameters
The depth l
The size of the hidden layers n1, …, nl
Strongly depend on the problem to solve
The activation function φ
ReLU: g : x ↦ max{0, x}
Sigmoid: σ : x ↦ 1 / (1 + e^(−x))
Many others: tanh, RBF, softplus...
The output unit
Linear output E[y] = Vᵀhl + c
For regression with Gaussian distribution y ∼ N(E[y], I)
Sigmoid output ŷ = σ(wᵀhl + b)
For classification with Bernoulli distribution P(y = 1|x) = ŷ
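As a small illustration (a sketch with made-up values, not from the slides), the two activations and the two output units above in NumPy:

```python
import numpy as np

def relu(x):
    # g(x) = max{0, x}
    return np.maximum(0.0, x)

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative last hidden layer h_l and output-layer parameters.
rng = np.random.default_rng(0)
h_l = relu(rng.normal(size=4))

V, c = rng.normal(size=(2, 4)), rng.normal(size=2)
mean_y = V @ h_l + c           # linear output: E[y], for Gaussian regression

w, b = rng.normal(size=4), 0.1
p_y1 = sigmoid(w @ h_l + b)    # sigmoid output: P(y = 1 | x), for Bernoulli classification
print(mean_y, p_y1)
```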
Parameters Training
Objective
Let's define θ = (W1, b1, …, Wl, bl).
We suppose we have a set of inputs X = (x1, …, xN) and a
set of expected outputs Y = (y1, …, yN). The goal is to find
a neural network fNN such that
∀i, fNN(xi, θ) ≈ yi.
Parameters Training
Cost function
To evaluate the error that our current network makes, let's
define a cost function L(X, Y, θ). The goal becomes to find
    argmin_θ L(X, Y, θ)
Loss function
Should represent a combination of the distances between every
yi and the corresponding fNN(xi, θ)
Mean square error (rare)
Cross-entropy
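For concreteness, a minimal sketch (illustrative data and names, not from the slides) of the two losses mentioned above, averaged over a batch:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average squared distance between targets and predictions.
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, p_pred, eps=1e-12):
    # Binary cross-entropy, with p_pred = P(y = 1 | x) predicted by the network.
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_true = np.array([0.0, 1.0, 1.0, 0.0])
p_pred = np.array([0.1, 0.8, 0.7, 0.2])
print(mean_squared_error(y_true, p_pred), cross_entropy(y_true, p_pred))
```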
Parameters Training
Find the minimum
The basic idea consists in computing θ̂ such that
∇θ L(X, Y, θ̂) = 0.
This is difficult to solve analytically, e.g. when θ has millions
of degrees of freedom.
Parameters Training
Gradient descent
Let's use a numerical way to optimize θ, called gradient
descent (section 4.3). The idea is that
    f(θ − εu) ≈ f(θ) − εuᵀ∇f(θ)
So if we take u = ∇f(θ), we have uᵀu > 0 and then
    f(θ − εu) ≈ f(θ) − εuᵀu < f(θ).
If f is a function to minimize, we have an update rule that
improves our estimate.
Parameters Training
Gradient descent algorithm
1. Have an estimate θ̂ of the parameters
2. Compute ∇θ L(X, Y, θ̂)
3. Update θ̂ ← θ̂ − ε∇θ L
4. Repeat steps 2-3 until ‖∇θ L‖ < threshold
Problem
How to estimate ∇θ L(X, Y, θ̂) efficiently?
Back-propagation algorithm
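A minimal sketch of this loop on a toy quadratic objective (the objective, step size ε and threshold are illustrative, not from the slides):

```python
import numpy as np

def loss(theta):
    # Toy objective with a known minimum at (1, -2).
    return np.sum((theta - np.array([1.0, -2.0])) ** 2)

def grad_loss(theta):
    # Gradient of the toy objective.
    return 2.0 * (theta - np.array([1.0, -2.0]))

theta = np.zeros(2)     # step 1: initial estimate
epsilon = 0.1           # learning rate
threshold = 1e-6

while True:
    g = grad_loss(theta)                  # step 2: compute the gradient
    theta = theta - epsilon * g           # step 3: update the estimate
    if np.linalg.norm(g) < threshold:     # step 4: stop when the gradient is small
        break

print(theta)  # close to (1, -2)
```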
Back-propagation for Parameter Learning
Consider the architecture:
[Diagram: x → φ(w1 x) → φ(w2 φ(w1 x)) = y, i.e. two scalar weights w1, w2 with the activation φ applied after each.]
with function:
    y = φ(w2 φ(w1 x)),
some training pairs T = {(x̂n, ŷn)}, n = 1, …, N, and
an activation function φ().
Learn w1, w2 so that feeding x̂n results in ŷn.
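A minimal sketch of this two-weight network (scalar x, ReLU assumed for φ, names mirroring the slide):

```python
import numpy as np

def phi(x):
    # Activation function; ReLU, as on the earlier slides.
    return np.maximum(0.0, x)

def net(x, w1, w2):
    # y = phi(w2 * phi(w1 * x)), the two-weight architecture above.
    return phi(w2 * phi(w1 * x))

print(net(1.5, w1=0.8, w2=0.3))   # 0.36
```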
Prerequisite: differentiable activation function
For learning to be possible, φ() has to be differentiable.
Let φ′(x) = ∂φ(x)/∂x denote the derivative of φ(x).
For example, when φ(x) = ReLU(x), we have φ′(x) = 1 for x > 0
and φ′(x) = 0 for x < 0.
Gradient-based Learning
Minimize the loss function L(w1, w2, T).
We will learn the weights by iterating:

    (w1, w2)^updated = (w1, w2) − γ (∂L/∂w1, ∂L/∂w2),    (1)

L is the loss function (must be differentiable): in detail it is
L(w1, w2, T) and we want to compute the gradient(s) at
w1, w2.
γ is the learning rate (a scalar, typically known).
Back-propagation
Calculate intermediate values on all units:
1. a = w1 x̂n
2. b = φ(w1 x̂n)
3. c = w2 φ(w1 x̂n)
4. d = φ(w2 φ(w1 x̂n))
5. L(d) = L(φ(w2 φ(w1 x̂n)))
The partial derivatives are:
6. ∂L(d)/∂d = L′(d)
7. ∂d/∂c = φ′(w2 φ(w1 x̂n))
8. ∂c/∂b = w2
9. ∂b/∂a = φ′(w1 x̂n)
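A minimal sketch (illustrative; ReLU for φ and a squared-error loss are assumptions, since the slide leaves both generic) of caching these intermediate values and local derivatives for one training pair:

```python
import numpy as np

def phi(x):                      # ReLU activation
    return np.maximum(0.0, x)

def phi_prime(x):                # its derivative (taken as 0 at the kink)
    return np.where(x > 0.0, 1.0, 0.0)

def forward_with_cache(x_n, y_n, w1, w2):
    # Intermediate values, numbered as on the slide.
    a = w1 * x_n                         # 1
    b = phi(a)                           # 2
    c = w2 * b                           # 3
    d = phi(c)                           # 4
    loss = 0.5 * (d - y_n) ** 2          # 5, assuming a squared-error loss L
    # Local partial derivatives.
    dL_dd = d - y_n                      # 6: L'(d) for the squared-error loss
    dd_dc = phi_prime(c)                 # 7
    dc_db = w2                           # 8
    db_da = phi_prime(a)                 # 9
    return loss, (dL_dd, dd_dc, dc_db, db_da)

print(forward_with_cache(x_n=1.0, y_n=0.5, w1=0.8, w2=1.2))
```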
Calculating the Gradients I
Apply chain rule:

    ∂L/∂w1 = (∂L/∂d)(∂d/∂c)(∂c/∂b)(∂b/∂a)(∂a/∂w1),

    ∂L(d)/∂w2 = (∂L(d)/∂d)(∂d/∂c)(∂c/∂w2).

Start the calculation from left to right.
We propagate the gradients (partial products) from the last
layer towards the input.
Calculating the Gradients
And because we have N training pairs:

    ∂L/∂w1 = Σ_{n=1}^{N} (∂L(dn)/∂dn)(∂dn/∂cn)(∂cn/∂bn)(∂bn/∂an)(∂an/∂w1),

    ∂L/∂w2 = Σ_{n=1}^{N} (∂L(dn)/∂dn)(∂dn/∂cn)(∂cn/∂w2).
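Putting the pieces together, a minimal sketch (same assumptions as before: ReLU φ, squared-error loss, made-up data) of one full gradient computation over the N training pairs and a single update of w1, w2:

```python
import numpy as np

def phi(x):                      # ReLU activation
    return np.maximum(0.0, x)

def phi_prime(x):                # its derivative (taken as 0 at the kink)
    return np.where(x > 0.0, 1.0, 0.0)

def gradients(w1, w2, xs, ys):
    # Accumulate dL/dw1 and dL/dw2 over all N training pairs.
    g1 = g2 = 0.0
    for x_n, y_n in zip(xs, ys):
        a = w1 * x_n; b = phi(a); c = w2 * b; d = phi(c)
        dL_dd = d - y_n                      # squared-error loss 0.5 * (d - y)^2
        dd_dc = phi_prime(c)
        g2 += dL_dd * dd_dc * b              # dc/dw2 = b
        g1 += dL_dd * dd_dc * w2 * phi_prime(a) * x_n   # chain through c, b, a
    return g1, g2

# Illustrative data and a single update step, as in equation (1).
xs = np.array([0.5, 1.0, 1.5, 2.0])
ys = np.array([0.2, 0.4, 0.6, 0.8])
w1, w2, gamma = 0.3, 0.7, 0.05
g1, g2 = gradients(w1, w2, xs, ys)
w1, w2 = w1 - gamma * g1, w2 - gamma * g2
print(w1, w2)
```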
Thank you!