Understanding and
Implementing Faster R-CNN
Many of the current SOTA (state-of-the-art) detectors are built on top of the
groundwork laid by Faster R-CNN. Faster R-CNN is an object detection model
that identifies objects in an image and draws bounding boxes around them,
while also classifying what those objects are. It's a two-stage detector:
1. Stage 1: Proposes potential regions in the image that
might contain objects. This is handled by the Region
Proposal Network (RPN).
2. Stage 2: Uses these proposed regions to predict the
class of the object and refines the bounding box to
better match the object.
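Both stages are visible directly in torchvision's implementation of the model, the same one we fine-tune later in this article. A quick sketch, just to show the structure:

import torchvision

# The detector is composed of a backbone, an RPN (stage 1), and ROI heads (stage 2)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
print(type(model.backbone).__name__)   # feature extractor (ResNet-50 + FPN)
print(type(model.rpn).__name__)        # RegionProposalNetwork -- stage 1
print(type(model.roi_heads).__name__)  # RoIHeads -- stage 2 (classification + box refinement)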
The Architecture of Faster R-CNN
[Figure: Faster R-CNN architecture]
Stage 1: Region Proposal Network (RPN):
Backbone Network:
● The image passes through a convolutional network (like
ResNet or VGG16).
● This extracts important features from the image and
creates a feature map.
Anchors:
● Anchors are boxes of different sizes and shapes placed
over points on the feature map.
● Each anchor box represents a possible object location.
● At every point on the feature map, anchor boxes are
generated with different sizes (scales) and aspect ratios, as
sketched below.
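A minimal sketch of anchor generation at a single feature-map location (the scales and aspect ratios here are illustrative, not the exact configuration of any particular implementation):

import torch

scales = torch.tensor([128.0, 256.0, 512.0])  # anchor sizes in pixels
ratios = torch.tensor([0.5, 1.0, 2.0])        # height/width aspect ratios

anchors = []
for s in scales:
    for r in ratios:
        # Keep the anchor's area close to s**2 while varying its shape
        w = s / r.sqrt()
        h = s * r.sqrt()
        # (x1, y1, x2, y2) centered at the origin; in practice these are
        # shifted to every location on the feature map
        anchors.append(torch.tensor([-w / 2, -h / 2, w / 2, h / 2]))

anchors = torch.stack(anchors)
print(anchors.shape)  # torch.Size([9, 4]) -- 3 scales x 3 ratios per location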
Classification of Anchors:
● The RPN predicts whether each anchor box is
background (no object) or foreground (contains an
object).
● Positive (foreground) anchors: Boxes with high
overlap (IoU) with a ground-truth object.
● Negative (background) anchors: Boxes with little
or no overlap with any object. A sketch of this labeling
follows below.
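Here is a small sketch of that labeling using torchvision.ops.box_iou, with the 0.7/0.3 IoU thresholds from the Faster R-CNN paper (the boxes themselves are made up for illustration):

import torch
from torchvision.ops import box_iou

# Illustrative anchors and one ground-truth box, in (x1, y1, x2, y2) format
anchors = torch.tensor([[0., 0., 100., 100.],
                        [90., 90., 200., 200.],
                        [10., 10., 110., 110.]])
gt_boxes = torch.tensor([[5., 5., 105., 105.]])

iou = box_iou(anchors, gt_boxes).squeeze(1)  # IoU of each anchor with the GT box

# IoU > 0.7 -> foreground, IoU < 0.3 -> background, in between -> ignored
labels = torch.full_like(iou, -1)  # -1 = ignored during training
labels[iou > 0.7] = 1              # positive (foreground)
labels[iou < 0.3] = 0              # negative (background)
print(labels)  # tensor([1., 0., 1.])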
Bounding Box Refinement:
● The RPN also refines the anchor boxes to better align
them with the actual objects by predicting offsets
(adjustments), which are decoded as in the sketch below.
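The offsets follow the standard box parameterization from the R-CNN family of papers; a minimal decode step might look like this:

import torch

def decode_offsets(anchors, deltas):
    # anchors: (N, 4) boxes as (x1, y1, x2, y2); deltas: (N, 4) as (dx, dy, dw, dh)
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa
    ya = anchors[:, 1] + 0.5 * ha

    # dx/dy shift the center relative to the anchor size;
    # dw/dh scale the width/height in log space
    x = xa + deltas[:, 0] * wa
    y = ya + deltas[:, 1] * ha
    w = wa * torch.exp(deltas[:, 2])
    h = ha * torch.exp(deltas[:, 3])

    return torch.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2], dim=1)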
Loss functions:
I) Classification loss: Helps the model decide if the anchor is
background or foreground.
II) Regression loss: Helps adjust the anchor boxes to fit the
objects more precisely. A sketch combining both follows below.
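A hedged sketch of how the two losses might be combined (torchvision's internals differ in the details, e.g. how anchors are sampled):

import torch.nn.functional as F

def rpn_loss(objectness, labels, pred_deltas, target_deltas):
    # Classification: is each sampled anchor foreground (1) or background (0)?
    # Anchors labeled -1 (ignored) are assumed to be filtered out beforehand.
    cls_loss = F.binary_cross_entropy_with_logits(objectness, labels.float())

    # Regression: only positive (foreground) anchors contribute to the box loss
    pos = labels == 1
    reg_loss = F.smooth_l1_loss(pred_deltas[pos], target_deltas[pos])

    return cls_loss + reg_loss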
Stage 2: Object Classification and Box
Refinement:
Region Proposals:
● After RPN, we get region proposals (refined boxes
that likely contain objects).
ROI Pooling:
● The region proposals have different sizes, but the
classification head needs fixed-size inputs.
● ROI Pooling resizes all region proposals to a fixed size
by dividing each one into a grid of smaller regions and
applying pooling within each cell, making them uniform
(see the sketch below).
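torchvision exposes this operation directly as torchvision.ops.roi_pool (modern implementations usually use the closely related roi_align). A small sketch with a fake feature map:

import torch
from torchvision.ops import roi_pool

# Fake feature map: batch of 1, 256 channels, 50x50 spatial grid
feature_map = torch.randn(1, 256, 50, 50)

# Proposals as (batch_index, x1, y1, x2, y2), in image coordinates
proposals = torch.tensor([[0., 10., 10., 200., 150.],
                          [0., 50., 60., 300., 360.]])

# spatial_scale maps image coordinates to feature-map coordinates
# (1/8 here, assuming the backbone downsamples the image by a factor of 8)
pooled = roi_pool(feature_map, proposals, output_size=(7, 7), spatial_scale=1.0 / 8)
print(pooled.shape)  # torch.Size([2, 256, 7, 7]) -- every proposal is now uniform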
Object Classification:
● Each region proposal is passed through a small network
to predict the category (e.g., dog, car, etc.) of the object
inside it.
● Cross-entropy loss is used to classify the objects into
categories.
Bounding Box Refinement (Again):
● The region proposals are refined again to better match
the actual objects, using offsets.
● This uses regression loss to adjust the proposals.
Multi-task Learning:
● The network in stage 2 learns to predict object
categories and refine bounding boxes at the same time,
combining both losses as sketched below.
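A sketch of the stage-2 multi-task loss, assuming per-proposal class logits, box deltas, and a foreground mask are already available:

import torch.nn.functional as F

def roi_head_loss(class_logits, class_targets, pred_deltas, target_deltas, fg_mask):
    # Multi-class classification over (dataset classes + background)
    cls_loss = F.cross_entropy(class_logits, class_targets)

    # Box refinement, computed only on foreground proposals
    box_loss = F.smooth_l1_loss(pred_deltas[fg_mask], target_deltas[fg_mask])

    # Both objectives are optimized jointly (multi-task learning)
    return cls_loss + box_loss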
Inference (Testing/Prediction Time):
● Top Region Proposals: During testing, the model
generates a large number of region proposals, but only
the top proposals (those with the highest objectness
scores) are passed to the second stage.
● Final Predictions: The second stage predicts the final
categories and bounding boxes.
● Non-Max Suppression: A technique called
Non-Max Suppression (NMS) is applied to remove
duplicate or overlapping boxes, keeping only the best
ones (see the sketch below).
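NMS is available out of the box as torchvision.ops.nms; a tiny example with made-up detections:

import torch
from torchvision.ops import nms

# Two overlapping detections of the same object, plus one separate detection
boxes = torch.tensor([[10., 10., 110., 110.],
                      [12., 12., 112., 112.],
                      [300., 300., 400., 400.]])
scores = torch.tensor([0.95, 0.80, 0.90])

# Keep only the highest-scoring box among boxes that overlap with IoU > 0.5
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]) -- the second box is suppressed by the first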
Training:
Two ways to train:
1. Train in stages: First, train the region proposal
network (RPN) and then the classifier and regressor.
2. Train together: Train both stages at the same time
(faster and more efficient; the implementation below trains
this way, since the summed loss covers both stages).
Implement and Fine-Tune Faster R-CNN in
PyTorch
Step 1: Install Required Libraries
pip install torch torchvision
Step 2: Import Required Modules
import torch
from torch.utils.data import DataLoader
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.datasets import ImageFolder
from torchvision import transforms
import torchvision.transforms as T
from torchvision.models.detection.faster_rcnn import
FastRCNNPredictor
Step 3: Load Pre-trained Faster R-CNN Model
PyTorch’s torchvision provides a Faster R-CNN model
pre-trained on COCO. You can modify this for your own dataset
by changing the number of classes in the final layer.
# Load the pre-trained Faster R-CNN model with a ResNet-50 backbone
# (newer torchvision versions use weights="DEFAULT" instead of pretrained=True)
model = fasterrcnn_resnet50_fpn(pretrained=True)

# Number of classes (your dataset classes + 1 for background)
num_classes = 3  # For example, 2 classes + background

# Get the number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features

# Replace the head of the model with a new one
# (sized for the number of classes in your dataset)
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
Step 4: Prepare the Dataset
● Faster R-CNN requires images and corresponding
annotations (bounding boxes and labels).
● Your dataset should return images and target
dictionaries that include bounding boxes (boxes) and
labels (labels).
Create a custom Dataset class if necessary. You can use
torchvision.datasets.ImageFolder with bounding boxes provided in
annotation files, or write a custom Dataset class like the one below.
# Define transformations (e.g., resizing, normalization)
transform = T.Compose([
    T.ToTensor(),
])

# Custom Dataset class or using an existing one
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, transforms=None):
        # Initialize dataset paths and annotations here
        self.transforms = transforms
        # Your dataset logic (image paths, annotations, etc.)

    def __getitem__(self, idx):
        # Load image
        img = ...  # Load your image here

        # Load corresponding bounding boxes and labels
        boxes = ...   # Load or define bounding boxes
        labels = ...  # Load or define labels

        # Create a target dictionary
        target = {}
        target["boxes"] = torch.tensor(boxes, dtype=torch.float32)
        target["labels"] = torch.tensor(labels, dtype=torch.int64)

        # Apply transforms
        if self.transforms is not None:
            img = self.transforms(img)

        return img, target

    def __len__(self):
        # Return the length of your dataset
        return len(self.data)
Step 5: Set Up Data Loader
# Load dataset
dataset = CustomDataset(transforms=transform)

# Split into train and validation sets
indices = torch.randperm(len(dataset)).tolist()
train_dataset = torch.utils.data.Subset(dataset, indices[:-50])
valid_dataset = torch.utils.data.Subset(dataset, indices[-50:])

# Detection batches hold images and targets of varying sizes, so the collate
# function packs each batch as tuples of lists instead of stacking tensors
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                          collate_fn=lambda x: tuple(zip(*x)))
valid_loader = DataLoader(valid_dataset, batch_size=4, shuffle=False,
                          collate_fn=lambda x: tuple(zip(*x)))
Step 6: Set Up Training Loop
Now set up the optimizer and training loop. For Faster R-CNN,
it’s common to use SGD or Adam as the optimizer.
# Move model to GPU if available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

# Set up the optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9,
                            weight_decay=0.0005)

# Learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3,
                                               gamma=0.1)
# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0

    # Training loop
    for images, targets in train_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass: in training mode, the model returns a dict of losses
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        # Backward pass
        losses.backward()
        optimizer.step()

        train_loss += losses.item()

    # Update the learning rate
    lr_scheduler.step()
    print(f'Epoch: {epoch + 1}, Loss: {train_loss / len(train_loader)}')

print("Training complete!")
Step 7: Evaluate the Model
After training, you can evaluate the model on the validation set
or use it for inference on new images.
# Set the model to evaluation mode
model.eval()

# Run the model on the validation set
with torch.no_grad():
    for images, targets in valid_loader:
        images = list(img.to(device) for img in images)
        predictions = model(images)

        # Example: print the bounding boxes and labels for the first image
        print(predictions[0]['boxes'])
        print(predictions[0]['labels'])
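For a quantitative evaluation you would typically compute COCO-style mAP. One hedged option is the third-party torchmetrics package (pip install torchmetrics; not used elsewhere in this tutorial):

from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision()

model.eval()
with torch.no_grad():
    for images, targets in valid_loader:
        images = list(img.to(device) for img in images)
        preds = model(images)

        # torchmetrics expects predictions and targets on the same device (CPU here)
        preds = [{k: v.cpu() for k, v in p.items()} for p in preds]
        metric.update(preds, list(targets))

print(metric.compute())  # dict with 'map', 'map_50', and related metrics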
Step 8: Inference
To run inference on a new image:
from PIL import Image

# Load image (convert to RGB in case the file has an alpha channel)
img = Image.open("path/to/your/image.jpg").convert("RGB")

# Apply the same transformation as for training
img = transform(img).to(device)

# Model prediction: the model expects a list of 3D (C, H, W) tensors,
# so the image is passed inside a list rather than batched with unsqueeze
model.eval()
with torch.no_grad():
    prediction = model([img])

# Print the predicted bounding boxes and labels
print(prediction[0]['boxes'])
print(prediction[0]['labels'])
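To visualize the result, torchvision.utils.draw_bounding_boxes can render the predicted boxes; the 0.5 score threshold below is an arbitrary choice:

import torch
import torchvision.transforms as T
from torchvision.utils import draw_bounding_boxes

# Keep only confident detections
keep = prediction[0]['scores'] > 0.5
boxes = prediction[0]['boxes'][keep].cpu()

# draw_bounding_boxes expects a uint8 (C, H, W) image
img_uint8 = (img.cpu() * 255).to(torch.uint8)
drawn = draw_bounding_boxes(img_uint8, boxes, colors="red", width=3)

# Convert back to PIL to save or display
T.ToPILImage()(drawn).save("prediction.jpg")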