
Unit Iv - NNDL

The document provides an overview of deep learning in computer vision, focusing on convolutional neural networks (CNNs) and their applications in tasks like object detection and image classification. It discusses the architecture of CNNs, the importance of data augmentation to prevent overfitting, and the benefits of using pre-trained models for improved accuracy and efficiency. Additionally, it introduces the YOLOv8 algorithm for real-time object detection, highlighting its performance advantages and ease of deployment.

Uploaded by

senba12

Module IV

Overview
What is Deep Learning in Computer Vision?
• Definition: Deep learning for computer vision uses neural networks, especially convolutional neural networks (CNNs), to interpret and process images and videos.

• Applications: Object detection, image classification, facial recognition, medical imaging, autonomous driving, etc.

• Why Important: Automates tasks that traditionally required human vision, leading to breakthroughs in AI systems.
The Role of Convolutional Neural Networks (Convnets)

• What are Convnets?

– A convnet (convolutional neural network, or CNN) is a specialized type of neural network designed primarily for grid-structured data such as images.

– It is particularly effective for image classification, object detection, and pattern recognition because it captures spatial and hierarchical patterns in data.

– It uses convolutional layers to automatically detect important features such as edges, textures, and patterns.

• Key Features:

– Local receptive fields: The small regions of the input that a convolutional filter looks at to detect patterns.

– Parameter sharing: The same weights (filters) are applied across all parts of the input, which reduces the number of parameters and detects the same feature regardless of its location.

– Spatial hierarchy of patterns: CNNs build complex features (shapes, then whole objects) out of simpler ones (edges, textures) as layers go deeper into the network.

• Real-world use: Google’s image search, self-driving cars, facial recognition.
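The two ideas of a local receptive field and parameter sharing can be sketched in plain Python, without any deep learning library. The image and kernel values below are hypothetical and chosen only for illustration. As in most deep learning libraries, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over a 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local receptive field: only the kh x kw patch at (i, j) is read.
            # Parameter sharing: the same kernel weights are reused at every (i, j).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 2, 0, 0]: the response is nonzero only at the edge
```

Only four weights are needed here no matter how large the image is, because the same kernel is reused at every position.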
Basic Architecture of a Convnet

• Layers:
– Convolutional Layer: Detects local patterns using filters.
– Pooling Layer: Reduces spatial dimensions while keeping the most important features.
– Fully Connected Layer: Combines features to make final predictions.
– Activation Function (ReLU): Introduces non-linearity.

• Example Architecture: A simple convnet with alternating convolutional and pooling layers followed by dense layers.
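To see how such a stack changes the data, the sketch below traces output shapes through alternating convolution and pooling layers. It assumes a 28 x 28 grayscale input (as in MNIST), 3 x 3 unpadded convolutions with stride 1, and 2 x 2 max pooling. The exact layer order and filter counts are illustrative assumptions, not a prescribed design.

```python
def conv_output(shape, filters, kernel=3):
    """Output shape of an unpadded, stride-1 convolution."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, filters)


def pool_output(shape, pool=2):
    """Output shape of non-overlapping pooling (channels are unchanged)."""
    h, w, c = shape
    return (h // pool, w // pool, c)


# Assumed layer stack: ("conv", number_of_filters) or ("pool", window_size).
layers = [("conv", 32), ("pool", 2), ("conv", 64), ("pool", 2), ("conv", 64)]

shape = (28, 28, 1)  # height, width, channels
trace = [shape]
for kind, arg in layers:
    if kind == "conv":
        shape = conv_output(shape, filters=arg)
    else:
        shape = pool_output(shape, pool=arg)
    trace.append(shape)
    print(kind, arg, "->", shape)

# The dense layers receive the final feature map flattened into a vector.
flattened = shape[0] * shape[1] * shape[2]
print("flattened features:", flattened)
```

Height and width shrink while the channel depth grows, which is the usual pattern in a small convnet.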
Training a Convnet from Scratch
• Challenge:
– Requires a large dataset and significant computational power.

• Steps:
– Define the network architecture.

– Compile the model (choose loss function, optimizer).

– Preprocess the data (image augmentation, normalization).

– Train on a labeled dataset.

– Evaluate and tune the model.

• Dataset Example:
– Using a small dataset like cats vs dogs (binary classification).
Overfitting and Data Augmentation
• Problem of Overfitting:
– Happens when the model learns noise or incidental detail in the training data, leading to poor performance on new data.

• Solution: Data augmentation
– Introduces variation in the training data to improve generalization.

• Techniques:
– Random rotations, shifts, flips, zooms.
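A minimal, library-free sketch of two of these techniques (random horizontal flip and a small random shift) is shown below. The tiny 2 x 3 "image", the 50% flip probability, and the one-pixel shift range are assumptions made for illustration. A real pipeline would normally use the augmentation utilities of the training framework.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2D image left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]


def augment(image, rng):
    """Return a randomly flipped and shifted copy; the input is not modified."""
    out = image
    if rng.random() < 0.5:  # assumed flip probability
        out = horizontal_flip(out)
    return shift_right(out, rng.randint(0, 1))  # assumed shift range: 0 or 1 pixel


image = [
    [1, 2, 3],
    [4, 5, 6],
]
rng = random.Random(0)  # seeded only to make the sketch repeatable
variants = [augment(image, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

Each epoch can then see a slightly different version of the same labeled image, which is what discourages the model from memorizing exact pixel layouts.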
Leveraging a Pre-trained Model
• Definition: Using a model that has already been trained on a large dataset (e.g., ImageNet) and adapting it to a specific task.
• Advantages:
– Requires less data.
– Faster training.
– Higher accuracy for small datasets.

• Common Approaches:
– Feature Extraction: Use the pre-trained model as a fixed feature extractor.
– Fine-Tuning: Retrain some of the pre-trained layers to fit the new task.
Feature Extraction with Pretrained Models
• What is Feature Extraction?
– Freezing the convolutional base of the pretrained model and training only the top-level classifier (dense layers).

• Example:
– Using a VGG16 model (from the Visual Geometry Group) pretrained on ImageNet and applying it to a medical image classification task.

• Benefits:
– Saves time and reduces overfitting on small datasets.
Fine-Tuning a Pretrained Model
• What is Fine-Tuning?
– Unfreezing some layers in the pretrained model and retraining them on the new dataset.

• Steps:
– Load the pretrained model.
– Freeze the base layers.
– Add new layers for the target task.
– Train the new layers while the base stays frozen.
– Unfreeze some of the top base layers and retrain with a lower learning rate.

• Example: Fine-tuning the top convolutional layers of MobileNetV2 for object detection on a custom dataset.
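The effect of freezing and unfreezing can be made concrete by counting weights. The sketch below uses the VGG16 convolutional base (the model named elsewhere in these notes) because its layout is simple: five blocks of 3 x 3 convolutions. Choosing to unfreeze only the last block is an assumption for illustration. The same reasoning would apply to MobileNetV2 or any other base.

```python
# VGG16 convolutional base: (number of 3x3 conv layers, filters) for each block.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter for a single convolution layer."""
    return (kernel * kernel * in_channels + 1) * filters


def params_per_block(blocks, in_channels=3):
    """Total convolution parameters in each block, for an RGB input."""
    totals = []
    for n_layers, filters in blocks:
        block_total = 0
        for _ in range(n_layers):
            block_total += conv_params(in_channels, filters)
            in_channels = filters  # the next layer reads this layer's output
        totals.append(block_total)
    return totals


block_params = params_per_block(VGG16_BLOCKS)
total = sum(block_params)

# Feature extraction: the whole base is frozen, so none of it is trained.
trainable_feature_extraction = 0
# Fine-tuning (assumed choice): only the last block is unfrozen.
trainable_fine_tuning = block_params[-1]

print("conv base parameters:", total)
print("trainable when fine-tuning last block:", trainable_fine_tuning)
print("fraction retrained:", round(trainable_fine_tuning / total, 2))
```

Under these assumptions, fine-tuning the last block retrains roughly 48% of the base's weights, which is one reason a lower learning rate is used: large updates could destroy what those layers already learned.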
Case Study: Dogs vs Cats Classification
• Dataset: 2,000 images of cats and dogs (training, validation, test sets).

• Model: Small convnet trained from scratch vs. pretrained VGG16 model.

• Outcome:
– Training from scratch led to overfitting with limited data.
– Pretrained VGG16 achieved higher accuracy and better generalization.

• Note:
– Convnets: Key to processing visual data in deep learning.
– Pretrained Models: Powerful for saving time and improving accuracy on smaller datasets.
– Importance of Augmentation: Reduces overfitting by introducing variety into training data.
How to Implement VGG16 in Keras?
8 Steps for Implementing VGG16 in Keras:
1. Import the libraries for VGG16.
2. Create an object for training and testing data.
3. Initialize the model.
4. Pass the data to the dense layer.
5. Compile the model.
6. Import libraries to monitor and control training.
7. Visualize the training/validation data.
8. Test your model.
Dataset: “Dogs vs. Cats” data set.
Implementation (Colab link):
• https://colab.research.google.com/drive/1eJJnwB4eBasUMADlp55rFy7DawlApmGs#
Deep learning for computer vision
• Introduction to convnets
– Instantiating a small convnet
– Displaying the model’s summary
– Training the convnet on MNIST images
– Evaluating the convnet
• Formula for calculating the number of parameters in Input, Conv2D, and MaxPooling2D layers:

1. Input Layer
– The input layer has no trainable parameters: it just passes the data to the next layer. Its output shape still affects the layers that follow.
– Input shape: (H_in, W_in, C_in), where:
• H_in = height of the input image
• W_in = width of the input image
• C_in = number of channels (e.g., 3 for RGB)
• Number of parameters: 0
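2. Conv2D Layer
– For a Conv2D layer, the number of parameters depends on the size of the filters, the number of input channels, and the number of output filters (kernels).
• Number of parameters: (K_h × K_w × C_in + 1) × F, where K_h × K_w is the filter size, C_in is the number of input channels, F is the number of filters, and the +1 is each filter's bias.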
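As a worked example, the standard count for a Conv2D layer is (filter height × filter width × input channels + 1) × number of filters, where the +1 is the bias of each filter. The sketch below applies it to three 3 x 3 layers with 32, 64, and 64 filters on a single-channel input. These layer sizes are assumed for illustration, chosen to match the depths of 32 and 64 mentioned in these notes.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters of one Conv2D layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# (kernel_h, kernel_w, in_channels, filters) for each assumed layer.
conv_layers = [
    (3, 3, 1, 32),   # first layer reads the 1-channel input
    (3, 3, 32, 64),  # reads the 32 feature maps produced above
    (3, 3, 64, 64),
]

counts = [conv2d_params(*layer) for layer in conv_layers]
for layer, count in zip(conv_layers, counts):
    print(layer, "->", count)  # 320, then 18496, then 36928
print("total:", sum(counts))   # 55744
```

Note that the input image size does not appear in the formula: the parameter count depends only on the filter size and the channel counts.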
3. MaxPooling2D Layer
– The MaxPooling2D layer is a downsampling layer. It reduces the spatial dimensions of its input by keeping only the maximum value in each pooling window.
– Because it has no weights or biases, it contributes nothing to the model's parameter count.
• Number of parameters: 0
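A minimal sketch of 2 x 2 max pooling in plain Python shows why no parameters are needed: the operation is just a maximum over each window. The 4 x 4 feature map values are hypothetical.

```python
def max_pool2d(feature_map, pool=2):
    """Non-overlapping max pooling over a 2D feature map."""
    pooled = []
    for i in range(0, len(feature_map) - pool + 1, pool):
        row = []
        for j in range(0, len(feature_map[0]) - pool + 1, pool):
            # Collect the values in the pool x pool window starting at (i, j).
            window = [
                feature_map[i + di][j + dj]
                for di in range(pool)
                for dj in range(pool)
            ]
            row.append(max(window))  # nothing is learned here
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4 x 4 input becomes 2 x 2: height and width are halved while the strongest response in each window is kept.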
The convolution operation
• Convolution layers learn local patterns (small 2D windows of the input) rather than patterns that span the whole input. This key characteristic gives convnets two interesting properties:

– The patterns they learn are translation-invariant.
• The patterns learned by convolutional layers are independent of their exact location in the input image. If a feature (like an edge or a shape) appears in different locations, the network can still recognize it because the same filter weights are shared across the entire input.

– They can learn spatial hierarchies of patterns.
• Convnets learn complex patterns by stacking multiple convolutional layers. Lower layers capture basic features (like edges or textures), while higher layers combine these simple features into more abstract representations (like objects or faces), creating a hierarchy of patterns.
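The first property can be checked on a one-dimensional toy signal. In the sketch below, the filter weights and the two signals are hypothetical. The same filter is slid over a spike placed at two different positions. Strictly speaking, the response map moves along with the pattern, while the peak response value stays the same wherever the pattern appears.

```python
def correlate1d(signal, kernel):
    """Slide one shared kernel along a 1D signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + d] * kernel[d] for d in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [-1, 2, -1]  # hypothetical filter that responds to a single-sample spike

spike_left = [0, 1, 0, 0, 0, 0, 0]   # spike at index 1
spike_right = [0, 0, 0, 0, 1, 0, 0]  # same spike moved 3 samples to the right

resp_left = correlate1d(spike_left, kernel)
resp_right = correlate1d(spike_right, kernel)
print(resp_left)   # [2, -1, 0, 0, 0]
print(resp_right)  # [0, 0, -1, 2, -1]

# The peak value is identical; only its position has moved with the spike.
peak_shift = resp_right.index(max(resp_right)) - resp_left.index(max(resp_left))
print(max(resp_left), max(resp_right), peak_shift)  # 2 2 3
```

A later pooling step that keeps only the maximum would therefore report the same detection for both inputs.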
Convolutions are defined by two key parameters:
– Size of the patches extracted from the inputs: These are typically 3 × 3 or 5 × 5. In the example, they were 3 × 3, which is a common choice.
– Depth of the output feature map: This is the number of filters computed by the convolution. The example started with a depth of 32 and ended with a depth of 64.
Real-Time Object Detection Algorithm: YOLOv8
• YOLO (You Only Look Once) is a real-time object detection algorithm that detects objects in a single forward pass of the neural network.

• Unlike R-CNN and other two-stage detectors, YOLO is a one-stage detector, achieving higher speed with good accuracy.

• YOLOv8 is developed and maintained by Ultralytics, which released it in January 2023. Ultralytics has since released newer models in the series (such as YOLO11), so YOLOv8 is a recent and efficient version rather than the latest one.
Why YOLOv8?
• Real-time performance on CPU and GPU

• High accuracy and speed (FPS)

• Supports object detection, segmentation, classification, and tracking

• Easy deployment with CoreML, TensorRT, TFLite exports
# Sample code
# Install the package first (run in a shell): pip install ultralytics
from ultralytics import YOLO

# Load the pretrained model (nano version for speed)
model = YOLO("yolov8n.pt")

# Predict on an image and display the annotated result
results = model("image.jpg", show=True)

# Real-time detection from the default webcam (source=0)
model.predict(source=0, show=True)
Thank you
