Lec14 CNNRNNModels

The document provides an introduction to neural networks, focusing on training methodologies such as feedforward networks, backpropagation, and various gradient descent techniques. It discusses the architecture of convolutional neural networks (CNNs) and their applications in image processing, as well as the use of recurrent neural networks (RNNs) for sequential prediction tasks. Additionally, it covers autoencoders for unsupervised learning and dimensionality reduction.

Uploaded by

eduardogiamballuci1967

INTRODUCTION TO MACHINE LEARNING

Neural Networks II

Giovanni Iacca

(credits: Elisa Ricci)

Training a Neural Network
Feedforward networks
The function f is a composition of multiple functions, e.g., for three layers: $f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$
Feedforward networks
● Goal: Approximate some unknown ideal function
● Feedforward Network:
○ Define a parametric mapping f(xi; Θ)
○ Learn the parameters Θ to get a good approximation of the ideal function from the available samples
● The computation can be described by a Directed Acyclic Graph (DAG)
○ Information flow in function evaluation begins at input, and then flows through
intermediate computations to produce the output
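As a rough illustration of the composition described above, the following NumPy sketch evaluates f(x; Θ) layer by layer, following the graph from input to output. The layer sizes, the ReLU activation, and the names `forward` and `params` are assumptions made for this example; the slides do not specify them.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, params):
    """Evaluate f(x; Theta) as a composition of layers, from input to output."""
    h = x
    for W, b in params[:-1]:
        h = relu(W @ h + b)        # hidden layer: affine map followed by a nonlinearity
    W_out, b_out = params[-1]
    return W_out @ h + b_out       # linear output layer

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]               # hypothetical layer widths
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(rng.normal(size=4), params)
```

Here Θ is simply the list of `(W, b)` pairs, and each loop iteration is one node of the computation graph.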
Modeling Choices
● Need to choose:
○ Cost function
○ Form of output
○ Activation functions
○ Architecture (number of layers etc)
○ Optimizer (for training)
Training a Neural Network
● Learning = Optimization
● Main idea: given training samples T = {(x1, y1), (x2, y2), …, (xN, yN)}, adjust all
the weights of the network Θ such that a cost function is minimized:

$$\min_{\Theta} \sum_{i=1}^{N} L(y_i, f(x_i; \Theta))$$

● Choose your loss function (e.g., square loss, cross-entropy loss, etc.)
● Update the weights of each layer with gradient descent
● Use backpropagation to compute the gradient efficiently
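The cost being minimized could be evaluated as in the sketch below, which implements the two example losses named above and sums them over samples. The function names, the 1/2 factor in the square loss, and the softmax parameterization of the cross-entropy are conventions assumed for this example.

```python
import numpy as np

def square_loss(y, y_hat):
    """Square loss for one sample (regression)."""
    return 0.5 * np.sum((y - y_hat) ** 2)

def cross_entropy_loss(y_onehot, logits):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    z = logits - np.max(logits)                 # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))   # log-softmax
    return -np.sum(y_onehot * log_probs)

def total_cost(loss, ys, preds):
    """Sum of per-sample losses: sum_i L(y_i, f(x_i; Theta))."""
    return sum(loss(y, p) for y, p in zip(ys, preds))

ys = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
logits = [np.zeros(3), np.zeros(3)]             # uniform predictions over 3 classes
cost = total_cost(cross_entropy_loss, ys, logits)
```

With uniform predictions over three classes, each sample contributes log 3 to the cost.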
So Far: Backpropagation
1. Forward propagation: sum inputs, produce activations, feed-forward
2. Error estimation
3. Backpropagate the error signal and use it to update the weights

f (xi ; Θ)
Gradient Descent
Feedforward neural networks can be trained with Vanilla Gradient Descent

Gradient descent update rule:

$$w \leftarrow w - \eta \nabla_w L(w)$$

where η is the learning rate.
Gradient
● The gradient is the vector of partial derivatives of the loss
with respect to all the coordinates of the weights:

$$\nabla_w L = \left( \frac{\partial L}{\partial w_1}, \ldots, \frac{\partial L}{\partial w_m} \right)$$

● Each partial derivative measures how fast the loss changes in one direction.
● When the gradient is zero, i.e., all the partial derivatives are zero, the loss is not
changing (to first order) in any direction.
● Issues: local minima, saddle points
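To make the definition concrete, the sketch below approximates each partial derivative with a central finite difference and checks that the gradient vanishes at a minimum. The toy loss and the helper name `numerical_gradient` are hypothetical; in practice backpropagation computes the same quantity far more efficiently.

```python
import numpy as np

def loss(w):
    # Toy loss with its minimum at w = (0, 0)
    return w[0] ** 2 + 3.0 * w[1] ** 2

def numerical_gradient(f, w, eps=1e-6):
    """Approximate each partial derivative by a central difference."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

g = numerical_gradient(loss, np.array([1.0, -2.0]))     # analytic gradient: (2, -12)
g_min = numerical_gradient(loss, np.array([0.0, 0.0]))  # at the minimum: zero
```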
Gradient Descent
● Gradient Descent searches for the set of parameters that makes the loss as small as
possible
● The change of parameters depends on the gradients of the loss with respect to the
network weights
● Backpropagation is a method for computing gradients
● What we will see now: Stochastic Gradient Descent (SGD) and other
optimization methods
Batch Gradient Descent (BGD)
Input: learning rate η, initial parameters w

while stopping criterion not met do

    Compute gradient estimate over all N examples: g ← (1/N) ∇w Σi L(yi, f(xi; w))

    Apply update: w ← w − η g
end while

The learning rate can change at each step; typically it is decayed linearly.
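A minimal sketch of the loop above, assuming a linear model with square loss on synthetic noiseless data and, for simplicity, a constant learning rate rather than a decayed one. The data, the function name, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                   # noiseless targets

def batch_gradient_descent(X, y, lr=0.1, steps=500):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)        # gradient averaged over all N examples
        w -= lr * grad                           # one update per full pass over the data
    return w

w = batch_gradient_descent(X, y)
```

Every update touches all 200 rows of `X`, which is the cost noted in the "Cons" bullet.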

Batch Gradient Descent

● Pros: Gradient estimates are stable

● Cons: Need to compute gradients over the entire training dataset for one update
Stochastic gradient descent (SGD)
Input: learning rate η, initial parameters w

while stopping criterion not met do

    Sample one datapoint (xi, yi) from training set
    Compute gradient estimate: g ← ∇w L(yi, f(xi; w))

    Apply update: w ← w − η g
end while

The learning rate can change at each step; typically it is decayed linearly.
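The same toy problem can illustrate the stochastic variant: one randomly sampled datapoint per update, with a linearly decayed learning rate as mentioned above. The linear model, the synthetic data, and the specific decay schedule are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                   # noiseless targets

def sgd(X, y, lr0=0.05, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for k in range(steps):
        lr = lr0 * (1.0 - k / steps)             # linear learning-rate decay
        i = rng.integers(len(y))                 # sample one datapoint
        grad = (X[i] @ w - y[i]) * X[i]          # gradient of the square loss on that sample
        w -= lr * grad
    return w

w = sgd(X, y)
```

Each update costs the same regardless of how many training examples exist, at the price of a noisier gradient estimate.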

BGD vs. SGD
[Figure: two panels, BGD and SGD]
MiniBatches
● Problem: gradient estimates from a single sample can be very noisy
● One obvious solution is to use mini-batches (small sets of samples)
● Advantages:
○ Computation time per update does not depend on the number of training examples N
○ It permits computation on extremely large datasets
○ Can often be implemented in parallel
○ Using GPUs, it is common to use power-of-2 batch sizes to offer better runtime (some kinds of
hardware achieve better runtime with specific sizes of arrays)
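A mini-batch version of the update might look like the sketch below. It assumes a linear model with square loss on synthetic data, a power-of-2 batch size of 32, and reshuffling once per epoch; none of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                   # noiseless targets

def minibatch_sgd(X, y, batch_size=32, lr=0.1, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(y))          # reshuffle the data every epoch
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            # Gradient averaged over the mini-batch only: cost is independent of N
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
    return w

w = minibatch_sgd(X, y)
```

The batched matrix products are what parallel hardware such as a GPU can exploit.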
Momentum
Problem with SGD: on some error surfaces, progress is very slow along flat directions,
with jitter along steep ones!
Momentum
Introduce a new variable v: the velocity
The velocity is an exponentially decaying moving average of the negative gradient

Input: learning rate η, initial parameters w,
initial velocity v, momentum parameter α

while stopping criterion not met do

    Sample one datapoint (xi, yi) from training set
    Compute gradient estimate: g ← ∇w L(yi, f(xi; w))

    Compute velocity update: v ← α v − η g

    Apply update: w ← w + v
end while
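The effect of the velocity term can be seen on a toy error surface that is flat in one direction and steep in the other. This sketch uses the exact gradient of a hypothetical quadratic instead of a sampled one, and the values of the learning rate and momentum parameter are arbitrary choices for the demonstration.

```python
import numpy as np

def grad(w):
    # Gradient of L(w) = 0.5 * (w[0]**2 + 50 * w[1]**2): flat along w[0], steep along w[1]
    return np.array([1.0, 50.0]) * w

def descend(grad_fn, w0, lr=0.01, alpha=0.9, steps=500):
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)                  # initial velocity
    for _ in range(steps):
        g = grad_fn(w)
        v = alpha * v - lr * g            # decaying average of negative gradients
        w = w + v
    return w

w_plain = descend(grad, [10.0, 1.0], alpha=0.0)     # alpha = 0 recovers plain gradient descent
w_momentum = descend(grad, [10.0, 1.0], alpha=0.9)
```

After the same number of steps, the momentum run ends much closer to the minimum along the flat coordinate than the plain run does.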
Adaptive Learning Rate Methods
● So far we have assigned the same learning rate to all features
● If the features vary in importance and frequency, is this a good idea?
● The learning rate is one of the most difficult hyperparameters to set in neural networks

[Figure: easier case (all the features important) vs. harder case]
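The slides do not say here which adaptive methods they cover, so as one well-known example the sketch below implements AdaGrad, which divides each parameter's step by the square root of its accumulated squared gradients. The toy quadratic, the function name, and the hyperparameter values are assumptions of this example.

```python
import numpy as np

def grad(w):
    # Same curvature mismatch as before: gradients differ by a factor of 50 per coordinate
    return np.array([1.0, 50.0]) * w

def adagrad(grad_fn, w0, lr=0.5, steps=100, eps=1e-8):
    w = np.array(w0, dtype=float)
    r = np.zeros_like(w)                      # per-parameter sum of squared gradients
    for _ in range(steps):
        g = grad_fn(w)
        r += g * g
        w -= lr * g / (np.sqrt(r) + eps)      # each coordinate gets its own effective rate
    return w

w = adagrad(grad, [1.0, 1.0])
```

Because the step is normalized per coordinate, both coordinates follow almost the same trajectory despite the 50x difference in gradient scale.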

Different Methods

http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
Convolutional Neural Networks
Structured Data
● Some applications naturally deal with an input space which is locally structured,
i.e., spatial or temporal.
● Images, language, etc. vs. arbitrary input features.
● Neural networks are extremely powerful in this case.
From Pixels to Labels
● Learn a hierarchy of features
● Each layer of hierarchy extracts features from output of previous layer
● Train all layers jointly

[Figure: Layer1 → Layer2 → Layer3 → Simple Classifier → "Female"]
In Convolutional Neural Networks…

[Figure: Layer1 → Layer2 → Layer3 → "Female"]

Convolutional Neural Networks
Convolutional networks are simply neural networks that use convolution in place of
general matrix multiplication in at least one of their layers.
What is a convolution?
Recap: Convolution
● Convolution is a general purpose filter operation for images.
● A kernel matrix is applied to an image.
● It works by determining the value of a central pixel by adding the weighted
values of all its neighbors together.
● The output is a new modified filtered image.

● Can be used to smooth, sharpen, enhance…

● It is a commutative operation.
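The recap above can be sketched directly in NumPy: the kernel is flipped, slid over the image, and each output pixel is the weighted sum of the neighborhood under it. This version computes only the "valid" region (no padding), which is an assumption of the example, as are the function name and the 3x3 mean filter.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution: flip the kernel, then slide it over the image."""
    k = np.flip(kernel)                           # flip both axes (true convolution)
    kh, kw = k.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the neighborhood around the central pixel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
box = np.full((3, 3), 1.0 / 9.0)                  # smoothing (mean) filter
smoothed = conv2d(image, box)
```

Many deep learning libraries skip the kernel flip (cross-correlation); since the filter weights are learned, the distinction does not matter for training.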
Convolutional Neural Networks
Inspired by the mammalian visual cortex.

https://neurdiness.wordpress.com/2018/05/17/deep-convolutional-neural-networks-as-models-of-the-visual-system-qa/
Visual Cortex
● The visual cortex contains a complex arrangement of cells, each sensitive to a
small sub-region of the visual field, called a receptive field.
● These cells act as local filters over the input space and are well-suited to exploit
the strong spatially local correlation present in natural images.
● Two basic cell types:
○ Simple cells respond maximally to specific edge-like patterns within their receptive field.
○ Complex cells have larger receptive fields and are locally invariant to the exact position of the
pattern.
CNN: Architecture
● Feedforward neural network with specialized connectivity structure
● Typically CNN layers transform the input matrix into an output class
prediction.
● There are a few distinct types of operations:
○ Convolution
○ Non-linearity
○ Pooling
CNN Motif

[Figure: Input → Convolution (learned) → Nonlinearity → Spatial pooling → Feature/Activation map]

Convolution
● Convolutional layer : core layer of CNNs.
● Consists of a set of learned filters.
● Each filter covers a spatially small portion of the input data (receptive field).
● Each filter is convolved across the dimensions of the input data, producing a
multi-dimensional feature map.
● Intuition: the network will learn filters that activate when they see some specific
type of feature at some spatial position in the input.
CNN: Architecture

Nonlinearity: applied elementwise.

Increases the nonlinearity of the entire architecture without affecting the receptive fields
of the convolution layer.
CNN: Architecture

Pooling: provides invariance to translations

Pooling
By progressively reducing the spatial size of the representation we reduce the number
of parameters and the amount of computation in the network, and also control overfitting.
Example: max pooling
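A small sketch of max pooling, assuming non-overlapping square windows (stride equal to the window size) on a single 2D feature map; the function name and the example values are made up for illustration.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Max pooling over non-overlapping size x size windows."""
    h, w = x.shape
    h, w = h - h % size, w - w % size             # drop rows/cols that do not fill a window
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))                # maximum within each window

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [1, 2, 7, 8]], dtype=float)
pooled = max_pool2d(x)                            # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, and moving a maximum within its window leaves the output unchanged, which is the source of the local translation invariance.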
Convolutional Neural Networks Architectures
LeNet - 1998

[LeCun, Bottou, Bengio, Haffner 1998]

AlexNet - 2012
● Similar framework to LeNet but…
● Bigger model (7 hidden layers, 650K units, 60M params)
● More data (10^6 vs. 10^3 images)
● GPU implementation (50x speedup over CPU) - Trained on two GPUs for a week

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
Going Deeper
Classification: ImageNet Challenge top-5 error
VGG - 2014
Similar motif to AlexNet
GoogLeNet
● Has 12x fewer parameters than AlexNet
● Gets rid of fully connected layers
● Inception Module
ResNet
● Residual Block: improved performance of very
deep nets
● Addresses the degradation problem by letting
information from the shallow layers propagate
directly to the deeper layers through identity
(skip) mappings.
● Uses batch normalization to improve
training
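The identity mapping in a residual block can be sketched as follows. For brevity this uses two dense layers in place of convolutions and leaves out batch normalization, so it is a simplification of the actual ResNet block; all names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """y = relu(x + F(x)), with F(x) = W2 @ relu(W1 @ x)."""
    residual = W2 @ relu(W1 @ x)      # F(x): the part the block has to learn
    return relu(x + residual)         # identity shortcut adds the input back

x = np.array([1.0, 2.0, 3.0])
W_zero = np.zeros((3, 3))
y_identity = residual_block(x, W_zero, W_zero)   # with F = 0 the block passes x through
```

Because a block with zero weights already computes the identity (for non-negative inputs), stacking more blocks need not make the network worse than its shallower counterpart.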
Beyond Classification
Detection
● First approach: R-CNN (Regions with CNN features)
● Trained on ImageNet classification
● Fine-tune CNN on PASCAL-VOC
● Nowadays more sophisticated methods exist

[Girshick et al. CVPR 2014]

Beyond Classification
● Semantic Segmentation

[Long et al. CVPR 2015]

Beyond Classification
● Structured Regression

[Toshev and Szegedy CVPR 2014]

CNN: SUMMARY
● In a feedforward neural network, units are organized into layers and the units at
a given layer only get input from units in the layer below.
● CNNs are feedforward networks. However, unlike standard vanilla feedforward
networks, units in a CNN have a spatial arrangement.
● At each layer, units are organized into 2D grids, the feature maps.
● Each feature map is the result of a convolution. The same convolutional filter is
applied at each location. The weights are different across feature maps.
● A unit at a particular location on the 2D grid can only receive input from units at a
similar location at the layer below.
● Need (a lot of) labeled data: supervised learning model!
● Flexible to many applications.
Other Neural Networks
Many Models for different needs

https://www.asimovinstitute.org/neural-network-zoo/
Sequential Prediction Tasks
● So far, we focused mainly on prediction problems with fixed-size inputs and outputs.
● We discussed the flexibility of CNNs to address a wide range of tasks.
● But what if the input and/or output is a variable-length sequence?
● There are many applications where we need this, e.g., document classification,
sentiment analysis, image captioning.

Example: Video Frame Prediction
What is new?

● Single2Single: Feedforward Network
● Multiple2Multiple: Recurrent Network (e.g., Video Frame Prediction)
Recurrent Neural Network (RNN)
RNN can address a wide range of tasks

● Multiple2Single: Sentiment Analysis
● Single2Multiple: Image Captioning
● Multiple2Multiple: Machine Translation
Recurrent Neural Network (RNN)
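● Introduces cycles, recurrences

$$h_t = f_W(x_t, h_{t-1})$$

where $h_t$ is the new state (the hidden representation at time t), $f_W$ is a function with weights W, $x_t$ is the input at time t, and $h_{t-1}$ is the old state.

[Figure: input at time t (xt) → hidden layer → hidden representation at time t (ht) → classifier → output at time t (yt)]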
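The recurrence on this slide, where the new hidden state is a function (with weights W) of the input at time t and the old state, could be sketched as below. The tanh nonlinearity, the linear read-out standing in for the classifier, and all sizes and names are assumptions of this example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence: new state from the input at time t and the old state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def rnn_forward(xs, h0, params):
    """Apply the same weights at every time step; emit one output per step."""
    W_xh, W_hh, b_h, W_hy, b_y = params
    h, ys = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        ys.append(W_hy @ h + b_y)         # read-out ("classifier") on the hidden state
    return np.stack(ys), h

rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 4, 2                # hypothetical sizes
params = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h),
          rng.normal(size=(d_out, d_h)), np.zeros(d_out))
ys_short, _ = rnn_forward(rng.normal(size=(3, d_in)), np.zeros(d_h), params)
ys_long, h_last = rnn_forward(rng.normal(size=(7, d_in)), np.zeros(d_h), params)
```

The same `params` handle sequences of length 3 and 7, which is what lets a recurrent network accept variable-length inputs.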
Recurrent Neural Network (RNN)
RNN can be trained with backpropagation

[Figure: RNN unrolled over time steps t=1, t=2, t=3. Starting from h0, at each step the hidden layer takes xt and the previous state and produces ht; a classifier maps ht to yt]
Unsupervised Learning: Autoencoders

https://www.asimovinstitute.org/neural-network-zoo/
Autoencoders: Dimensionality Reduction
● Unsupervised approach for learning a lower-dimensional feature representation
from unlabeled training data
● Features should capture meaningful factors of variation in data: the feature vector z
usually has lower dimension than the input x

[Figure: input data → encoder → features]
Autoencoders

Encoder:
● Originally: linear + nonlinearity (sigmoid)
● Later: deep, fully-connected
● Later: CNN with ReLU

[Figure: input data → encoder → features]
Autoencoders
How to learn this feature representation?
● Train such that features can be used to reconstruct original data
● “Autoencoding” - encoding itself

[Figure: input data → encoder → features → decoder → reconstructed input data]
Autoencoders
Decoder:
● Originally: linear + nonlinearity (sigmoid)
● Later: deep, fully-connected
● Later: CNN with ReLU (upconv)

[Figure: input data → encoder → features → decoder → reconstructed input data]
Autoencoders
Doesn’t use labels!

L2 loss function: $\|x - \hat{x}\|^2$, where $x$ is the input data and $\hat{x}$ is the reconstructed data

Encoder: 4-layer conv
Decoder: 4-layer upconv

[Figure: input data → encoder → features → decoder → reconstructed input data]
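A much smaller stand-in for the setup on this slide: a linear encoder and decoder trained by gradient descent on the L2 reconstruction loss, with no labels anywhere. The slides mention a 4-layer conv encoder and upconv decoder; the single linear layers, the synthetic data, and the hyperparameters here are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 points in 5-D lying on a 2-D subspace (assumption for the demo)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
X /= X.std()

W_enc = 0.1 * rng.normal(size=(5, 2))     # encoder: input -> features z
W_dec = 0.1 * rng.normal(size=(2, 5))     # decoder: features -> reconstruction

def l2_loss(X, X_hat):
    """Mean over samples of ||x - x_hat||^2; no labels involved."""
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

lr, losses = 0.01, []
for _ in range(1000):
    Z = X @ W_enc                         # features
    X_hat = Z @ W_dec                     # reconstructed input
    losses.append(l2_loss(X, X_hat))
    R = 2.0 * (X_hat - X) / len(X)        # derivative of the loss w.r.t. X_hat
    grad_dec = Z.T @ R
    grad_enc = X.T @ (R @ W_dec.T)
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec
```

After training, `X @ W_enc` gives the 2-D features; the decoder weights are only needed to define the training signal.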
Autoencoders
● After training, we throw away the decoder
● The encoder can be used to initialize a supervised model

[Figure: input data → features → reconstructed input data]
Autoencoders
● After training, we throw away the decoder
● The encoder can be used to initialize a supervised model
● Fine-tune the encoder jointly with the classifier

[Figure: input data → encoder → features → classifier → predicted label → loss function]
QUESTIONS?
