Unit Iv - NNDL

The document provides an overview of deep learning in computer vision, focusing on convolutional neural networks (CNNs) and their applications in tasks like object detection and image classification. It discusses the architecture of CNNs, the importance of data augmentation to prevent overfitting, and the benefits of using pre-trained models for improved accuracy and efficiency. Additionally, it introduces the YOLOv8 algorithm for real-time object detection, highlighting its performance advantages and ease of deployment.

Uploaded by

senba12

Module IV

Overview
What is Deep Learning in Computer Vision?
• Definition: Deep learning for computer vision involves
using neural networks, especially convolutional neural
networks (CNNs), to interpret and process images and
videos.
• Applications: Object detection, image classification, facial
recognition, medical imaging, autonomous driving, etc.
• Why Important: Automates tasks that traditionally
required human vision, leading to breakthroughs in AI
systems.
The Role of Convolutional Neural Networks (Convnets)

• What are Convnets?
– A ConvNet (Convolutional Neural Network, or CNN) is a specialized
type of neural network designed primarily for processing
structured grid data such as images.
– It is particularly effective for image classification, object
detection, and pattern recognition because it captures spatial
and hierarchical patterns in data.
– It uses convolutional layers to automatically detect important
features like edges, textures, and patterns.

• Key Features:
– Local receptive fields: The regions of the input that a
convolutional filter looks at to detect patterns.
– Parameter sharing: The same weights (filters) are applied across
different parts of the input, which reduces the number of
parameters and captures the same feature regardless of location.
– Spatial hierarchy of patterns: CNNs build complex features (such
as shapes and objects) from simpler ones (such as edges and
textures) as layers go deeper into the network.
• Real-world use: Google’s image search, self-driving cars, facial
recognition.
Basic Architecture of a Convnet

• Layers:
– Convolutional Layer: Detects local patterns using filters.
– Pooling Layer: Reduces dimensionality, keeps important
features.
– Fully Connected Layer: Combines features to make final
predictions.
– Activation Function (ReLU): Introduces non-linearity.
• Example Architecture: Simple convnet with alternating
convolutional and pooling layers followed by dense
layers.
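To make the layer roles concrete, here is a minimal pure-Python sketch of one convolution, ReLU and max-pooling step on a tiny image. It is an illustration only, not how Keras implements these layers: the function names, the 6 × 6 image and the hand-written edge filter are assumptions chosen for the example (a real convnet learns its filters).

```python
def conv2d(image, kernel):
    """'Valid' (no padding), stride-1 sliding-window product on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero (the non-linearity)."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 6x6 image: bright, dark, bright vertical bands.
image = [[1, 1, 0, 0, 1, 1] for _ in range(6)]

# Hand-written filter that responds where brightness rises from left to right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = conv2d(image, kernel)   # 4x4 map, negative on the falling edge
activated = relu(features)         # falling-edge responses clipped to 0
pooled = max_pool2d(activated)     # 2x2 map, keeps the strongest responses
print(pooled)
```

The convolution turns the 6 × 6 image into a 4 × 4 feature map, and pooling halves each spatial dimension while keeping the strongest edge response.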
Training a Convnet from Scratch
• Challenge:
– Requires a large dataset and significant computational power.

• Steps:
– Define the network architecture.

– Compile the model (choose loss function, optimizer).

– Preprocess the data (image augmentation, normalization).

– Train on a labeled dataset.

– Evaluate and tune the model.

• Dataset Example:
– Using a small dataset like cats vs dogs (binary classification).
Overfitting and Data Augmentation
• Problem of Overfitting:
– Happens when the model learns noise or detail in the
training data, leading to poor performance on new
data.

• Solution: Data augmentation
– Introduces variation in the training data to improve
generalization.

• Techniques:
– Random rotations, shifts, flips, zooms.
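As a rough sketch of the idea, the snippet below applies a random horizontal flip and a small random shift to an image stored as a nested list. The helper names (`horizontal_flip`, `shift_right`, `augment`) and the tiny 2 × 3 image are hypothetical; in practice a library's augmentation layers or generators would be used instead.

```python
import random


def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift columns right by `pixels`, padding the left edge with `fill`."""
    width = len(image[0])
    pixels = max(0, min(pixels, width))
    return [[fill] * pixels + row[:width - pixels] for row in image]


def augment(image, rng, max_shift=1):
    """Return a randomly flipped and shifted copy; the input is not modified."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return shift_right(out, rng.randint(0, max_shift))


image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)  # seeded so that runs are repeatable

# Each epoch the model would see a slightly different version of the image.
variants = [augment(image, rng) for _ in range(4)]
for variant in variants:
    print(variant)
```

Every variant keeps the original label and shape, so the model sees more variety without any new labeled data being collected.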
Leveraging a Pre-trained Model
• Definition: Using a model that has been trained on a large
dataset (e.g., ImageNet) and fine-tuning it for a specific task.
• Advantages:
– Requires less data.
– Faster training.
– Higher accuracy for small datasets.

• Common Approaches:
– Feature Extraction: Use the pre-trained model as a fixed feature
extractor.
– Fine-Tuning: Adjust some layers to fit the new task.
Feature Extraction with Pretrained Models
• What is Feature Extraction?
– Freezing the convolutional base of the pretrained model and
training only the top-level classifier (dense layers).

• Example:
– Using a VGG16 model (named after the Visual Geometry Group)
pretrained on ImageNet and applying it to a medical image
classification task.

• Benefits:
– Saves time, reduces overfitting on small datasets.
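The freeze-then-train idea can be sketched without any deep learning library. In this toy example, a fixed "base" of two ReLU feature detectors stands in for the pretrained convolutional base, and only a small logistic "head" is trained. All names, weights and data points here are made up for illustration; this is not VGG16 or the Keras API.

```python
import math

# Hypothetical stand-in for a pretrained base: two fixed ReLU feature detectors.
BASE_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]


def base_features(x, base):
    """Frozen feature extractor: ReLU of a linear map."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in base]


def predict(x, base, head):
    """Sigmoid classifier (the trainable head) on top of the base features."""
    feats = base_features(x, base)
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], feats))
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(data, base, head):
    """Average binary cross-entropy over the dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(x, base, head)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train_head(data, base, head, lr=0.5, epochs=200):
    """Gradient descent on the head only; `base` is read but never updated."""
    for _ in range(epochs):
        for x, y in data:
            feats = base_features(x, base)
            error = predict(x, base, head) - y  # gradient of the loss w.r.t. z
            head["weights"] = [w - lr * error * f
                               for w, f in zip(head["weights"], feats)]
            head["bias"] -= lr * error
    return head


# Tiny made-up dataset: label 1 when the first input dominates.
data = [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([0.0, 2.0], 0), ([0.5, 1.5], 0)]
base = [row[:] for row in BASE_WEIGHTS]
head = {"weights": [0.0, 0.0], "bias": 0.0}

loss_before = mean_loss(data, base, head)
train_head(data, base, head)
loss_after = mean_loss(data, base, head)
print(loss_before, loss_after)
```

Because only the head's few parameters are updated, training is cheap and there is less capacity to overfit a small dataset, which is the benefit listed above.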
Fine-Tuning a Pretrained Model
• What is Fine-Tuning?
– Unfreezing some layers in the pretrained model and retraining them on the
new dataset.

• Steps:
– Load the pretrained model.
– Freeze the base layers.
– Add new layers for the target task and train them while the base
stays frozen.
– Unfreeze some of the top layers of the base and retrain with a
lower learning rate.

• Example: Fine-tuning the top convolutional layers of MobileNetV2
for object detection on a custom dataset.
Case Study: Dogs vs Cats Classification
• Dataset: 2,000 images of cats and dogs (training, validation, test sets).

• Model: Small convnet trained from scratch vs. pretrained VGG16 model.

• Outcome:
– Training from scratch led to overfitting with limited data.
– Pretrained VGG16 achieved higher accuracy and better generalization.

• Note:
– Convnets: Key to processing visual data in deep learning.
– Pretrained Models: Powerful for saving time and improving accuracy on smaller
datasets.
– Importance of Augmentation: Reduces overfitting by introducing variety into
training data.
How to Implement VGG16 in Keras?
8 Steps for Implementing VGG16 in Keras:
1. Import the libraries for VGG16.

2. Create an object for training and testing data.

3. Initialize the model.

4. Pass the data to the dense layer.

5. Compile the model.

6. Import libraries to monitor and control training.

7. Visualize the training/validation data.

8. Test your model.
Dataset: "Dogs vs. Cats" dataset.
Implementation: Colab link
• https://colab.research.google.com/drive/1eJJnwB4eBasUMADlp55rFy7DawlApmGs#
Deep learning for computer vision
• Introduction to convnets
– Instantiating a small convnet
– Displaying the model’s summary
– Training the convnet on MNIST images
– Evaluating the convnet
• Formula for calculating the number of parameters in
Input, Conv2D and MaxPooling2D layers:
– 1. Input Layer
• The input layer does not have trainable parameters, but its output
shape affects the following layers.
• Input shape: (H_in, W_in, C_in), where:
• H_in = height of the input image
• W_in = width of the input image
• C_in = number of channels (e.g., 3 for RGB)
• There are no parameters to calculate here since the input layer
just passes the data to the next layer.
– 2. Conv2D Layer
• For a Conv2D layer, the number of parameters depends on the size of the
filters, the number of input channels, and the number of output filters
(kernels).
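The slide names the inputs to the formula but does not state it. The standard count for a 2D convolution with a bias term is (kernel height × kernel width × input channels + 1) × number of filters, where the +1 is one bias per filter. The sketch below computes it; the function name `conv2d_params` and the example layer sizes are assumptions chosen to match a 3 × 3 convnet whose depth goes from 32 to 64.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters in a 2D convolution layer."""
    weights_per_filter = kernel_h * kernel_w * in_channels
    bias_per_filter = 1 if use_bias else 0
    return (weights_per_filter + bias_per_filter) * filters


# 3x3 filters on a single-channel (grayscale) input, 32 filters:
first_layer = conv2d_params(3, 3, in_channels=1, filters=32)
print(first_layer)   # (3*3*1 + 1) * 32 = 320

# 3x3 filters on the 32-channel output of the previous layer, 64 filters:
second_layer = conv2d_params(3, 3, in_channels=32, filters=64)
print(second_layer)  # (3*3*32 + 1) * 64 = 18496
```

Note that the count does not depend on the image height or width, because the same filters are reused at every position.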
– 3. MaxPooling2D Layer
• The MaxPooling2D layer is a downsampling layer with no trainable
parameters: it reduces the spatial dimensions of the input but has
no weights or biases.
• Number of parameters: 0
• Max pooling only computes the maximum value in each pooling
window, so it adds nothing to the model's parameter count.
The convolution operation
• Convolution layers learn local patterns found in small 2D windows of
the input, whereas dense layers learn global patterns. This key
characteristic gives convnets two interesting properties:
– The patterns they learn are translation-invariant.
• The patterns learned by convolutional layers are independent of
their exact location in the input image. If a feature (like an edge
or a shape) appears in different locations, the network can still
recognize it because the filter weights are shared across the
entire input.
– They can learn spatial hierarchies of patterns.
• Convnets learn complex patterns by stacking multiple convolutional
layers. Lower layers capture basic features (like edges or
textures), while higher layers combine these simple features into
more abstract representations (like objects or faces), creating a
hierarchy of patterns.
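A small one-dimensional sketch can illustrate the first property. The same filter slides over two signals that contain an identical pattern at different positions; the peak response is the same, only its location moves. The names `correlate1d` and `place` and the pattern values are hypothetical. Strictly speaking, the response map shifts along with the input, and it is a later max over positions (as pooling does) that makes the final result independent of location.

```python
def correlate1d(signal, kernel):
    """Slide one shared kernel over the signal ('valid' positions only)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def place(pattern, position, length=8):
    """Return an all-zero signal with `pattern` written at `position`."""
    signal = [0] * length
    signal[position:position + len(pattern)] = pattern
    return signal


pattern = [1, 2, 1]
kernel = [1, 2, 1]  # a filter tuned to this pattern

left = correlate1d(place(pattern, 0), kernel)
right = correlate1d(place(pattern, 4), kernel)

print(left, max(left))
print(right, max(right))
```

Both signals produce the same maximum response, so a network does not need to relearn the pattern for each position in the image.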
Convolutions are defined by two key parameters:
• Size of the patches extracted from the inputs: These are typically 3 × 3 or 5 × 5.
In the example, they were 3 × 3, which is a common choice.
• Depth of the output feature map: This is the number of filters computed by the
convolution. The example started with a depth of 32 and ended with a depth of 64.
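The effect of these two parameters on output shapes can be traced by hand. The sketch below assumes a 28 × 28 × 1 MNIST-style input, 3 × 3 convolutions without padding and with stride 1, 2 × 2 max pooling, and a conv/pool stack whose depth goes 32, 64, 64. The exact layer stack is an assumption, since the slides only give the first and last depths.

```python
def conv_output(height, width, filters, kernel=3):
    """Output shape of a convolution with no padding and stride 1."""
    return height - kernel + 1, width - kernel + 1, filters


def pool_output(height, width, channels, pool=2):
    """Output shape of non-overlapping max pooling."""
    return height // pool, width // pool, channels


# Assumed stack: conv(32) -> pool -> conv(64) -> pool -> conv(64)
layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 64)]

shape = (28, 28, 1)
trace = [shape]
for kind, filters in layers:
    h, w, c = shape
    shape = conv_output(h, w, filters) if kind == "conv" else pool_output(h, w, c)
    trace.append(shape)

for step in trace:
    print(step)
```

The spatial size shrinks from 28 × 28 to 3 × 3 while the depth grows from 1 to 64: each 3 × 3 convolution trims one pixel from every border, and each pooling step halves the height and width.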
Real-Time Object Detection Algorithm:
YOLOv8
• YOLO (You Only Look Once) is a real-time object detection
algorithm that detects objects in a single forward pass of
the neural network.
• Unlike R-CNN and other two-stage detectors, YOLO is a
one-stage detector, achieving higher speed with good
accuracy.
• YOLOv8 is developed and maintained by Ultralytics. Released
in 2023, it remains a widely used version of the YOLO series,
although newer releases (for example YOLO11) have followed it.
Why YOLOv8?
• Real-time performance on CPU and GPU
• High accuracy and speed (FPS)
• Supports object detection, segmentation,
classification, and tracking
• Easy deployment with CoreML, TensorRT, TFLite
exports
# Sample code
# Install the package first (run in a shell): pip install ultralytics
from ultralytics import YOLO

# Load the pretrained model (nano version for speed)
model = YOLO("yolov8n.pt")

# Predict on an image
results = model("image.jpg", show=True)

# Real-time detection from the webcam (device 0)
model.predict(source=0, show=True)
Thank you
