
A Project Report on

DETECTION OF LOGO IN AN IMAGE BY USING


DEEP LEARNING ALGORITHMS CNN AND ZSD

Submitted in partial fulfillment for the award of the

Bachelor of Technology
Degree
in
Computer Science and Engineering
By
P. Chandini (Y21ACS540) Sk. Shameem (Y21ACS572)
Sk. Sabiha Anjum (Y21ACS571) S. Gopi (Y21ACS573)
M. Rohit Bhaskar (Y20ACS498)

Under the guidance of


Mr. P. Nanda Kishore, M.Tech
Assistant Professor

Department of Computer Science and Engineering


Bapatla Engineering College
(Autonomous)
(Affiliated to Acharya Nagarjuna University)
BAPATLA – 522 102, Andhra Pradesh, INDIA
2024-2025
Department of Computer Science and Engineering

CERTIFICATE

This is to certify that the project report entitled “Detection of Logo in an Image by using Deep Learning Algorithms CNN and ZSD”, being submitted by P. Chandini (Y21ACS540), Sk. Shameem (Y21ACS572), Sk. Sabiha Anjum (Y21ACS571), S. Gopi (Y21ACS573), and M. Rohit Bhaskar (Y20ACS498) in partial fulfillment for the award of the Degree of Bachelor of Technology in Computer Science & Engineering to Acharya Nagarjuna University, is a record of bona fide work carried out by them under our guidance and supervision.

Date:

Signature of Guide
Mr. P. Nanda Kishore
Assistant Professor

Signature of HOD
Dr. M. Rajesh Babu
Assoc. Prof. & Head
DECLARATION

We declare that this project work is composed by ourselves, that the work contained herein is our own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

P. Chandini (Y21ACS540)

Sk. Shameem (Y21ACS572)

Sk. Sabiha Anjum (Y21ACS571)

S. Gopi (Y21ACS573)

M. Rohit Bhaskar (Y20ACS498)

Acknowledgement
We sincerely thank the following distinguished personalities, who have given their advice and support for the successful completion of this work.

We are deeply indebted to our most respected guide, Mr. P. Nanda Kishore, Asst. Prof., Department of CSE, for his valuable and inspiring guidance, comments, suggestions, and encouragement.

We extend our sincere thanks to Dr. M. Rajesh Babu, Assoc. Prof. & Head of the Dept., for extending his cooperation and providing the required resources.

We would like to thank our beloved Principal, Dr. N. Rama Devi, for providing the online resources and other facilities to carry out this work.

We would like to express our sincere thanks to our project coordinator, Dr. P. Pardhasaradhi, Prof., Dept. of CSE, for his helpful suggestions in presenting this document.

We extend our sincere thanks to all other teaching faculty and non-teaching staff of the department who helped us directly or indirectly, for their cooperation and encouragement.

P. Chandini (Y21ACS540)

Sk. Shameem (Y21ACS572)

Sk. Sabiha Anjum (Y21ACS571)

S. Gopi (Y21ACS573)

M. Rohit Bhaskar (Y20ACS498)

Table of Contents
List of Figures
Abstract
1 Introduction
1.1 Problem Statement and Objective
1.2 Technology Background
1.2.1 Deep Learning Using Python
1.2.2 Libraries
1.2.3 Algorithms/Models
1.3 Runtime Environment
2 Literature Review
2.1.1 Traditional Approaches
2.1.2 Deep Learning-Based Approaches
2.1.3 Logo Datasets
2.1.4 Key Challenges
2.1.5 Relevance to Our Work
3 Proposed System
3.1 Various modules in project
3.1.1 Input Module
3.1.2 Zero-Shot Object Detection
3.1.3 Logo Extraction using OpenCV
3.1.4 Logo Identification using MobileNet
3.1.5 Brand Mapping and Description
3.1.6 Output Module
3.2 Dataset
3.3 Logo Detection
3.4 Image Processing
3.5 Logo Identification
3.5.1 Why MobileNet
3.5.2 Model Training
3.5.3 Output
3.6 Proposed Architecture
3.7 Advantages of Proposed System
4 Design
4.1 Use Case Diagram
4.2 Class Diagram
4.3 Activity Diagram
4.4 Sequence Diagram
4.5 Flow Chart
5 Implementation
5.1 Credibility Assessment
5.2 Data Preprocessing
5.3 Model Training
5.3.1 Zero-Shot Model
5.3.2 MobileNet Model
5.4 Testing
6 Conclusion and Future Work
6.1 Future Enhancements
7 References

List of Figures
Figure 3-1: Logo Detection Process

Figure 3-2: Proposed System Architecture

Figure 4-1: Use Case Diagram

Figure 4-2: Class Diagram

Figure 4-3: Activity Diagram

Figure 4-4: Sequence Diagram

Figure 4-5: Flow Diagram

Figure 5-1: Process of Pre-processing an Image

Figure 5-2: Training of the Zero-Shot Algorithm

Figure 5-3: Training of the MobileNetV2 Model

Figure 5-4: Detecting the Logo in the Image

Figure 5-5: Identifying the Logo and Displaying Its Name and Description

Figure 5-6: Accuracy

Abstract
This project focuses on developing a deep learning-based system for brand logo recognition. The system takes an image as input and applies RGB layering to enhance key features for logo detection. It then uses a Zero-Shot Object Detection (ZSD) model to detect logos within the image, even if a logo has not been seen during training. The detected logos are matched against the FlickrLogos-32 dataset, provided by the University of Augsburg, using a combination of the MobileNet model and OpenCV. This makes logo recognition fast and efficient, even for images with varying conditions such as different backgrounds or lighting.

The system outputs the logo name along with a description of the brand, offering practical value for users looking to identify brands. The approach leverages state-of-the-art deep learning and computer vision techniques to address logo recognition in real-world scenarios. By integrating Zero-Shot Object Detection with MobileNet, the system can detect previously unseen logos, making it adaptable to applications such as mobile apps, marketing tools, and consumer-facing platforms.

1 Introduction
In today’s world, brand recognition plays a crucial role in consumer decision-making and business success. However, many people struggle to identify logos because of varying designs, colors, and image conditions. This project addresses the problem by developing a deep learning-based logo recognition system that allows users to identify brands easily from images.

The system uses RGB layering to enhance image features and applies Zero-Shot Object Detection (ZSD) to detect logos even when they have not been seen during training. ZSD lets the model recognize logos through semantic relationships between visual features and text, which makes it adaptable to previously unseen logos.

After detecting a logo, the system uses MobileNet, a lightweight CNN, to match the detected logo against entries from the FlickrLogos-32 dataset. OpenCV handles tasks such as resizing and feature extraction so that matching stays efficient under varied conditions, such as different image qualities or backgrounds.

The final output provides users with both the name of the logo and a brief description of the brand. This not only helps consumers identify logos but also improves their understanding of the brand's identity. Practical applications include integration into mobile applications, marketing tools, and consumer-facing platforms, where quick brand recognition is valuable. By combining modern deep learning techniques with a user-friendly interface, this project aims to contribute to the field of logo recognition and improve the consumer experience.

1.1 Problem Statement and Objective

In today's market, consumers often struggle to identify product logos because of varying designs, image quality, and backgrounds. Logos appear under different conditions, such as changes in lighting, distortions, or complex backgrounds, which makes them difficult to recognize consistently. Traditional methods for logo detection and recognition typically require large, well-curated datasets and struggle with logos that are new or unseen by the model. As brands evolve and new logos emerge, a system that can identify logos reliably, including logos it has not been explicitly trained on, is essential. There is a need for an efficient, real-time logo recognition system that can handle a wide range of logo variations and accurately identify logos in images taken in dynamic environments.

1.2 Technology Background

Machine learning (ML) is a subfield of artificial intelligence (AI) that enables computers to learn from data without being explicitly programmed, identifying patterns and making predictions or decisions.

1.2.1 Deep Learning Using Python

Deep learning is a subset of machine learning that uses neural networks with many layers to analyze and learn from large amounts of data. It is particularly effective for complex problems such as image recognition, natural language processing, and speech recognition. In our project, deep learning techniques are used to recognize and identify brand logos in images.

Python is one of the most popular programming languages for implementing deep learning models because of its simplicity, flexibility, and the wide range of powerful libraries it supports. The key libraries used in our project are TensorFlow and Keras for deep learning and OpenCV for image processing.

1.2.2 Libraries

A. OS

The os library in Python is used to interact with the operating system, for example to access or manage files and directories. In this project, it can be used for loading or saving datasets, managing paths, and handling other file system tasks.
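
As an illustration of the file system tasks mentioned above, the sketch below collects (image path, label) pairs from a dataset folder using only the os module. It assumes a hypothetical layout with one subdirectory per brand (for example logos/adidas/*.jpg); the function name list_labeled_images and the accepted extensions are our own choices, not part of any library or of the dataset's official tooling.

```python
import os

# Accepted image extensions; an assumption, adjust to the actual dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return sorted (path, label) pairs, using each subdirectory name of
    `root` as the label of the images inside it (hypothetical layout)."""
    pairs = []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(class_dir)):
            # Compare extensions case-insensitively (".PNG" counts as ".png").
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                pairs.append((os.path.join(class_dir, name), label))
    return pairs


if __name__ == "__main__":
    for path, label in list_labeled_images("logos"):  # hypothetical folder
        print(label, path)
```

Deriving labels from folder names keeps the label list and the files in one place, so adding a brand only requires adding a folder.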

B. OpenCV

OpenCV (Open Source Computer Vision Library) is a powerful library for real-time computer vision and image processing. It provides tools for tasks such as image reading, manipulation, feature extraction, resizing, and filtering.

In our project, OpenCV handles image preprocessing and manipulation tasks such as resizing images, applying filters, and extracting features, and possibly logo matching.

C. JSON

The json library is used to work with JSON (JavaScript Object Notation) data. It parses and writes structured data, such as configuration files or dataset metadata, in a human-readable format.

In our project, it can be used for loading logo dataset information, handling metadata, or reading configuration files.

D. NumPy

NumPy is a widely used library for numerical computing in Python. It provides support for large multi-dimensional arrays and matrices, as well as a collection of mathematical functions that operate on these arrays.

In our project, NumPy can be used for image data manipulation, such as handling pixel data, transforming image arrays, or performing the matrix operations required for image processing.

E. TensorFlow

TensorFlow is an open-source deep learning framework developed by Google. It is used for building and training machine learning models, including neural networks.

In our project, TensorFlow is used to load pre-trained models (such as MobileNet or Zero-Shot Object Detection models) for logo detection and recognition. It handles the deep learning aspects of the project, including model inference.

F. Matplotlib

Matplotlib is a popular Python plotting library for data visualization. It is used to create static, animated, and interactive visualizations.

In our project, Matplotlib can be used to display images or visual results (such as the detected logo and its predicted name) or to plot graphs of performance metrics for model evaluation.

G. PIL (Pillow)

Pillow is a fork of the Python Imaging Library (PIL) that adds support for opening, manipulating, and saving many different image file formats.

In our project, Pillow can be used to load and manipulate image data before it is fed into the deep learning model for logo detection.

H. ipywidgets

ipywidgets is a library for creating interactive widgets in Jupyter Notebooks and other IPython environments. It provides GUI-like elements, such as buttons, sliders, and text boxes, that interact with the notebook.

In our project, ipywidgets can be used to create interactive controls for displaying results or controlling the flow of the system within the notebook.

I. IPython.display

IPython.display provides tools for displaying rich media (such as images, videos, and widgets) in Jupyter notebooks and other IPython environments.

In our project, it is used to display the results of logo recognition, such as showing the detected logo, or to clear the output to refresh the displayed information.

1.2.3 Algorithms/Models

A. Zero-Shot Object Detection:

Zero-Shot Object Detection (ZSD) is an advanced deep learning technique that allows a model to detect objects (logos, in our case) even if they were not seen during training. This is typically achieved by associating the visual features of objects with semantic information (e.g., class labels, attributes, or textual descriptions).

This method helps our system identify logos that were not explicitly part of the training dataset, improving its robustness and adaptability.

B. MobileNet:

MobileNet is a lightweight convolutional neural network (CNN) architecture optimized for mobile and embedded devices. It is designed to be both fast and accurate, making it suitable for real-time applications.

In our project, MobileNet is used for logo matching, comparing the detected logo with the logos in the FlickrLogos-32 dataset. The pre-trained MobileNet model is fine-tuned for logo recognition, enabling efficient and accurate identification.
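
To make the efficiency claim concrete: MobileNet's main building block is the depthwise separable convolution, which factors a standard convolution into a per-channel (depthwise) k x k convolution followed by a 1 x 1 (pointwise) convolution that mixes channels. The sketch below counts multiply-adds for both versions of one layer; the layer shape is an arbitrary illustrative choice, not taken from our trained model. The cost ratio works out to 1/n + 1/k², where n is the number of output channels and k the kernel size.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a standard k x k convolution with m input channels,
    n output channels and an f x f output feature map."""
    return k * k * m * n * f * f


def depthwise_separable_cost(k, m, n, f):
    """Multiply-adds of the same layer factored MobileNet-style."""
    depthwise = k * k * m * f * f  # one k x k filter per input channel
    pointwise = m * n * f * f      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer: 3x3 kernel, 32 -> 64 channels, 112x112 output.
    std = standard_conv_cost(3, 32, 64, 112)
    sep = depthwise_separable_cost(3, 32, 64, 112)
    print(std, sep, round(sep / std, 4))  # prints: 231211008 29302784 0.1267
```

For this layer the factored version needs roughly an eighth of the computation, which is why MobileNet runs comfortably on modest hardware.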

C. Convolutional Neural Networks (CNN):

CNNs are a class of deep learning models used primarily for image processing tasks such as image classification and object detection. They work by applying convolutional layers that extract features from images.

MobileNet is built on the principles of CNNs, and CLIP-based zero-shot detectors can use CNN image encoders, although detectors such as OWLv2 use Vision Transformer encoders instead. CNNs learn spatial hierarchies of features, which makes them highly effective for logo recognition and image classification.
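
To show what a convolutional layer computes, the sketch below slides one hand-picked 2 x 2 kernel over a tiny grayscale image in plain Python and applies a ReLU. In a trained CNN such as MobileNet the kernel values are learned and there are many filters and channels; the kernel here is only an illustrative vertical-edge detector. As in deep learning frameworks, the kernel is not flipped, so strictly this is cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both 2D lists) without padding, as a
    convolutional layer does for one channel and one filter. The kernel is
    not flipped, matching deep learning frameworks (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU activation applied after the convolution."""
    return [[max(0, v) for v in row] for row in feature_map]


if __name__ == "__main__":
    # A dark-to-bright vertical edge, like the border of a logo.
    image = [[0, 0, 9, 9] for _ in range(4)]
    vertical_edge = [[-1, 1], [-1, 1]]
    print(relu(conv2d_valid(image, vertical_edge)))
    # prints: [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The feature map responds only where the brightness changes, which is the kind of low-level feature that deeper layers combine into shapes such as logo outlines.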

1.3 Runtime Environment

The project is developed and run in Google Colaboratory (Google Colab), a cloud-based Python runtime environment for writing and executing code in Jupyter notebooks. Google Colab provides a powerful, flexible, and easily accessible environment that is well suited to deep learning and computer vision projects.

2 Literature Review
Logo detection and recognition have gained significant traction in recent years, especially because of their relevance to areas such as brand monitoring, product analysis, and augmented reality. Traditional computer vision methods initially tackled this problem using handcrafted feature extractors, while modern approaches leverage deep learning to achieve high accuracy and robustness in real-world scenarios.

2.1.1 Traditional Approaches

Earlier techniques for logo detection relied on feature-matching algorithms such as SIFT, SURF, and ORB, which extract keypoints from images and match them across different instances. These methods were often paired with Bag-of-Visual-Words (BoVW) models to classify or detect logos. While efficient, they were prone to failure under occlusion, deformation, varying scales, or cluttered backgrounds [1].
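
The matching step of these traditional methods can be illustrated briefly. ORB produces binary descriptors, which are usually compared by Hamming distance, and ambiguous matches are commonly filtered with Lowe's ratio test (introduced for SIFT). The plain-Python sketch below shows both ideas on toy 8-bit descriptors stored as integers; real ORB descriptors are 256 bits long, and the 0.8 ratio is a commonly used but adjustable value.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train, ratio=0.8):
    """Match each query descriptor to its nearest train descriptor, keeping
    the match only if it is clearly better than the second-nearest one
    (ratio test). Returns (query_index, train_index, distance) tuples."""
    matches = []
    if len(train) < 2:
        return matches  # the ratio test needs two candidates
    for qi, qd in enumerate(query):
        ranked = sorted((hamming(qd, td), ti) for ti, td in enumerate(train))
        (best_dist, best_idx), (second_dist, _) = ranked[0], ranked[1]
        if best_dist < ratio * second_dist:
            matches.append((qi, best_idx, best_dist))
    return matches


if __name__ == "__main__":
    query = [0b10110000, 0b00001111]
    train = [0b10110001, 0b01001111, 0b00001110]
    # The second query is equally close to two train descriptors, so the
    # ratio test rejects it as ambiguous.
    print(match_descriptors(query, train))  # prints: [(0, 0, 1)]
```

Because every decision rests on local pixel patterns, a logo that is blurred, deformed, or partly hidden yields descriptors that no longer match, which explains the failure modes listed above.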

2.1.2 Deep Learning-Based Approaches

The rise of convolutional neural networks (CNNs) significantly advanced logo detection. Architectures such as R-CNN, Fast R-CNN, Faster R-CNN, YOLO, and SSD enabled object detection with bounding boxes and higher accuracy, even in challenging settings [2]. These models process full images and output the spatial location and class of objects, making them suitable for logo detection in unconstrained environments.

When training data is limited or logos are unseen, few-shot and zero-shot learning methods have proven useful. Zero-shot approaches, in particular, allow detection of objects not seen during training by leveraging visual-semantic models such as CLIP (Contrastive Language–Image Pre-training) or OWLv2 (the successor to OWL-ViT, the Vision Transformer for Open-World Localization). CLIP scores whole images against textual descriptions, and detectors such as OWLv2 extend this idea to individual image regions, so logos can be identified without custom bounding box annotations for each new brand.

2.1.3 Logo Datasets

Several benchmark datasets are available for training and evaluation, among them:

FlickrLogos-32: Contains 32 logo classes with annotated images and is widely used in research. This is the primary dataset used in our project.

Logos in the Wild (LITW) and WebLogo-2M: Provide a larger and more diverse set of logos captured in real-world environments, aiding in generalization and scalability.

2.1.4 Key Challenges

Despite advancements, logo detection remains challenging due to:

1. Variation in logo size, shape, and orientation

2. Low resolution or partially occluded logos

3. Presence of multiple or overlapping logos in an image

4. Highly cluttered or noisy backgrounds

2.1.5 Relevance to Our Work

Our project builds upon these developments by integrating a zero-shot object detector to locate logos in user-uploaded images. We then use OpenCV for logo extraction and a lightweight CNN model, MobileNet, trained on the FlickrLogos-32 dataset, to classify the cropped logos. Finally, the system maps the classified logo to its brand name and provides a short description of the brand. This hybrid approach, combining zero-shot detection with deep learning-based logo classification, yields a robust and flexible logo recognition pipeline.

3 Proposed System
The proposed system is designed to identify and classify brand logos in user-uploaded images using a combination of zero-shot object detection, image processing techniques, and deep learning-based logo identification. It helps users recognize brand logos even when they are unfamiliar with the brand, by providing both the brand name and a brief description as output.

3.1 Various modules in project

The proposed system consists of four main modules that work together to identify and classify logos in uploaded images. The Input Module allows users to upload an image containing a potential brand logo. The Detection Module employs a zero-shot object detection model to locate the logo region within the image, even if the logo was not part of the training data. Once a logo is detected, the Logo Identification Module extracts it using OpenCV and classifies it using a MobileNet model trained on the FlickrLogos-32 dataset. Finally, the Output Module presents the user with the identified brand name and a brief description of the brand, providing an intuitive and informative result.

3.1.1 Input Module

The user uploads an image that contains a logo.

The image can be in any resolution or background context.

3.1.2 Zero-Shot Object Detection

A zero-shot object detection model (e.g., OWLv2 or a CLIP-based detector) is used to detect the region of the image that contains the logo.

Zero-shot models are well suited to detecting logos that may not have been explicitly labeled during training.

These models match image regions with text queries (e.g., “logo”) to detect relevant areas.
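
The region-to-query matching described above can be sketched as follows. Real zero-shot detectors such as OWLv2 obtain region and text embeddings from trained image and text encoders and apply learned calibration before thresholding; here the embeddings are made-up vectors, and plain cosine similarity stands in for the model's scoring, so the code only illustrates the principle. The names score_regions and threshold are our own.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_regions(regions, query_embedding, threshold=0.5):
    """Score each (box, region_embedding) pair against the text-query
    embedding and return (box, score) pairs at or above `threshold`,
    best first."""
    scored = [(box, cosine_similarity(emb, query_embedding)) for box, emb in regions]
    kept = [(box, s) for box, s in scored if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 3-d embeddings; real models use hundreds of dimensions.
    logo_query = [1.0, 0.0, 0.0]  # stands in for the encoded text "logo"
    regions = [
        ((10, 20, 60, 80), [0.9, 0.1, 0.0]),
        ((0, 0, 30, 30), [0.0, 1.0, 0.0]),
        ((50, 50, 90, 90), [0.6, 0.8, 0.0]),
    ]
    for box, score in score_regions(regions, logo_query):
        print(box, round(score, 3))
```

Because the query is just text, changing it (for example to a specific brand name) changes what is detected without retraining, which is what makes the approach zero-shot.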

3.1.3 Logo Extraction using OpenCV

Once the logo region is identified, OpenCV is used to crop the detected bounding box from the image.

Preprocessing steps such as resizing, normalization, and noise removal are applied to prepare the cropped logo for identification.

3.1.4 Logo Identification using MobileNet

The extracted logo is passed into a trained MobileNet classifier. MobileNet is chosen for its efficiency and accuracy in real-time applications.

The model is trained using the FlickrLogos-32 dataset, which contains 32 different logo classes.

3.1.5 Brand Mapping and Description

After identification, the predicted logo class is mapped to the corresponding brand name.

A short description of the brand is retrieved from a predefined database or JSON file linked to the FlickrLogos-32 dataset.
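
A minimal sketch of the JSON-based lookup, assuming a hypothetical file layout in which each key is a classifier label and each value holds a display name and a description. The two entries, the function names, and the fallback text are illustrative only; the actual project file may be structured differently.

```python
import json

# Hypothetical brand file contents: keys are classifier labels.
BRANDS_JSON = """
{
  "adidas": {"name": "Adidas", "description": "German sportswear manufacturer."},
  "starbucks": {"name": "Starbucks", "description": "American coffeehouse chain."}
}
"""


def load_brand_info(text):
    """Parse the brand JSON into a dict keyed by class label."""
    return json.loads(text)


def describe_brand(label, brand_info):
    """Return the entry for a predicted label, falling back to a generic
    entry when the label has no description in the file."""
    return brand_info.get(
        label, {"name": label, "description": "No description available."}
    )


if __name__ == "__main__":
    info = load_brand_info(BRANDS_JSON)
    entry = describe_brand("adidas", info)
    print(f"{entry['name']}: {entry['description']}")
```

When the descriptions live in a file, json.load(f) on an open file object replaces json.loads(text). The fallback keeps the output module working even if a class is missing from the file.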

3.1.6 Output Module

The Output Module presents the final results to the user in a clear, user-friendly manner. After the logo is detected and classified, this module displays the brand name associated with the logo, along with a brief description of the brand. The output may also include the cropped image of the detected logo for visual confirmation. This gives users helpful information about the brand they are curious about, making the system both informative and easy to use.

3.2 Dataset

The dataset used in this project is the FlickrLogos-32 dataset, provided by the University of Augsburg. It is a well-known benchmark dataset designed specifically for logo detection and recognition tasks. It covers 32 popular brand logos, with over 8,000 images in total, including images of logos in natural scenes as well as images that contain no logo. Each logo image comes with bounding box annotations indicating the exact location of the logo, which is useful for training and evaluation. The dataset is divided into training, validation, and test sets, enabling structured model development and testing. The diversity of scenes and logo placements makes it well suited to training models that generalize to real-world conditions, and it plays a crucial role in the performance of the MobileNet logo classifier in our system.

3.3 Logo Detection

Logo detection plays a central role in the proposed system by identifying the exact region of a logo within a user-uploaded image. In this project, detection is performed by a Zero-Shot Object Detection model, supported by an RGB layering technique intended to improve accuracy and localization. The zero-shot model can detect logos without being explicitly trained on each individual brand. It achieves this by leveraging pre-trained visual-semantic models (such as CLIP or OWLv2) that understand both visual features and textual queries such as “logo”.

Before the image is fed to the zero-shot detector, an RGB layering process is applied. This involves separating the image into its red, green, and blue channels, which can help highlight the contrast and shapes often present in logos. By analyzing these separate color layers, the system can enhance edges, patterns, and contours, making the logo stand out more distinctly against the background. This pre-processing step helps the detection model focus on the features most likely associated with logos.
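
The report does not pin down the exact operation behind RGB layering, so the following is only one plausible interpretation, written in plain Python: split the image into R, G, and B channels, compute a simple horizontal gradient in each, and keep the strongest response per pixel. The example shows why separate channels can help: a red logo on a blue background of equal average brightness produces no edge in a plain grayscale average but a clear edge in the red and blue channels. All function names are our own.

```python
def split_channels(image):
    """Split an image given as rows of (r, g, b) tuples into three 2D
    channel arrays: red, green and blue."""
    return tuple([[px[c] for px in row] for row in image] for c in range(3))


def horizontal_gradient(channel):
    """Absolute difference between horizontally adjacent pixels."""
    return [[abs(row[j + 1] - row[j]) for j in range(len(row) - 1)] for row in channel]


def channel_edge_map(image):
    """Per-pixel maximum of the three per-channel gradients, so an edge that
    is strong in any one channel is kept."""
    grads = [horizontal_gradient(ch) for ch in split_channels(image)]
    return [
        [max(g[i][j] for g in grads) for j in range(len(grads[0][0]))]
        for i in range(len(grads[0]))
    ]


if __name__ == "__main__":
    # Red logo pixels next to blue background: equal average brightness.
    row = [(200, 50, 50), (200, 50, 50), (50, 50, 200)]
    gray = [[sum(px) // 3 for px in row]]
    print(horizontal_gradient(gray))   # prints: [[0, 0]]
    print(channel_edge_map([row]))     # prints: [[0, 150]]
```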

Once the layered image is processed, the zero-shot model identifies the logo region and generates bounding box coordinates around it. This region is then extracted using OpenCV, preparing it for the logo identification stage. Combining RGB layering with zero-shot detection helps the system detect logos of varying sizes, shapes, and colors, even in complex and noisy real-world environments.

The detection process is illustrated in the following figure.

Figure 3-1: Logo Detection Process

3.4 Image Processing

Image processing is a crucial stage in the logo detection pipeline, serving as a bridge between the raw input image and the logo identification step. In this project, image processing is handled primarily with OpenCV, a powerful open-source computer vision library.

After the logo is detected by the zero-shot object detector, the system retrieves the bounding box coordinates of the detected logo. These coordinates are used to crop the logo region from the original image with OpenCV. The cropped section isolates the logo from unnecessary background elements, which improves the accuracy of the logo identification model.
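
The cropping step can be sketched in plain Python as below, assuming boxes in (x_min, y_min, x_max, y_max) pixel format with exclusive maximums; the function name crop_box is our own. Detector boxes can be fractional or extend slightly past the border, so the sketch rounds and clamps them first. With OpenCV the image is a NumPy array indexed as image[row, column], so the equivalent crop after clamping is the slice image[y0:y1, x0:x1].

```python
def crop_box(image, box):
    """Crop a bounding box from an image stored as a 2D list of pixels.

    `box` is (x_min, y_min, x_max, y_max) in pixel coordinates with the max
    values exclusive. Coordinates are rounded and clamped to the image.
    """
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    if x0 >= x1 or y0 >= y1:
        raise ValueError(f"box {box} does not overlap the image")
    # Rows are indexed by y and columns by x, as in OpenCV/NumPy images.
    return [row[x0:x1] for row in image[y0:y1]]


if __name__ == "__main__":
    image = [[10 * r + c for c in range(5)] for r in range(4)]  # 4 rows x 5 columns
    print(crop_box(image, (1, 1, 3, 3)))       # prints: [[11, 12], [21, 22]]
    print(crop_box(image, (-2.4, 2, 10, 10)))  # clamped to the image border
```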

To further enhance the quality of the cropped logo, several preprocessing techniques are applied:

Resizing: The image is resized to match the input size expected by the MobileNet model (usually 224x224 pixels).

RGB Channel Separation: In some cases, the image is split into R, G, and B layers to analyze contrast and highlight features that are distinctive to logos.

Normalization: Pixel values are scaled to a standard range (typically between 0 and 1) to ensure consistency across different inputs.

Noise Removal (optional): Filters can be applied to reduce background noise and sharpen the logo edges.

These image processing steps make the logo features more distinguishable and consistent, allowing the classifier to perform more reliably. Overall, image processing plays a key role in preparing and enhancing the input for accurate brand identification.
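
The resizing and normalization steps can be sketched in plain Python as follows. In practice OpenCV's cv2.resize (which takes the target size as (width, height)) would do the resizing with better interpolation; nearest-neighbour sampling is used here only to keep the example dependency-free. The low/high parameters are our own addition: the text above scales to [0, 1], whereas the preprocessing function shipped with Keras' pre-trained MobileNetV2 (tf.keras.applications.mobilenet_v2.preprocess_input) maps pixels to [-1, 1], so the range should match whatever the chosen weights expect.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of pixels to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def normalize(image, low=0.0, high=1.0):
    """Map 8-bit channel values (0-255) linearly to the range [low, high]."""
    return [[[low + (c / 255.0) * (high - low) for c in px] for px in row] for row in image]


if __name__ == "__main__":
    crop = [[(0, 0, 0), (255, 255, 255)],
            [(255, 0, 0), (0, 0, 255)]]
    model_input = normalize(resize_nearest(crop, 224, 224))
    print(len(model_input), len(model_input[0]), model_input[0][-1])
    # prints: 224 224 [1.0, 1.0, 1.0]
```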

3.5 Logo Identification

The Logo Identification phase identifies the specific brand logo in the cropped image obtained during the detection and image processing stages. For this purpose, we use MobileNet, a lightweight and efficient convolutional neural network (CNN) architecture that is well suited to real-time and resource-constrained environments such as web-based or mobile applications.

Once the logo is detected and cropped using OpenCV, it is resized and preprocessed to match the input requirements of the MobileNet model (typically 224x224 pixels with normalized pixel values). This preprocessed logo image is then passed to the trained MobileNet classifier [3].

3.5.1 Why MobileNet

Lightweight and Fast: MobileNet is designed for speed and low computational cost, making it well suited to cloud-based environments such as Google Colab and even to on-device inference.

Accurate: Despite its small size, MobileNet achieves competitive accuracy on image classification tasks, including logo identification.

Transfer Learning Support: Versions of MobileNet pretrained on large datasets such as ImageNet allow faster training and better accuracy when fine-tuned on custom datasets such as FlickrLogos-32.
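As a sketch of this transfer-learning setup with Keras (assuming TensorFlow 2.x; the project's exact framework calls and hyperparameters are not given, so the dropout rate, learning rate, and augmentation layers below are illustrative), an ImageNet-pretrained MobileNetV2 backbone can be frozen and topped with a new 32-way softmax head. Inputs are assumed to be RGB images scaled to [0, 1] as in Section 3.4; a Rescaling layer maps them to the [-1, 1] range the pretrained weights expect. RandomContrast only approximates the color jittering mentioned in Section 5.3.

```python
import tensorflow as tf

NUM_CLASSES = 32  # FlickrLogos-32 brands


def build_logo_classifier(num_classes=NUM_CLASSES, weights="imagenet"):
    """MobileNetV2 backbone with a new softmax head (illustrative sketch)."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights
    )
    base.trainable = False  # freeze pretrained features for the first stage

    # Illustrative augmentation; these layers are only active during training.
    augment = tf.keras.Sequential(
        [
            tf.keras.layers.RandomFlip("horizontal"),
            tf.keras.layers.RandomRotation(0.05),
            tf.keras.layers.RandomContrast(0.2),
        ],
        name="augmentation",
    )

    inputs = tf.keras.Input(shape=(224, 224, 3))  # RGB scaled to [0, 1]
    x = augment(inputs)
    # Map [0, 1] -> [-1, 1], the range MobileNetV2's pretrained weights expect.
    x = tf.keras.layers.Rescaling(2.0, offset=-1.0)(x)
    x = base(x, training=False)  # keep BatchNorm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model
```

Freezing the backbone first and later unfreezing its top layers with a smaller learning rate is a common two-stage fine-tuning recipe; the report does not state which schedule the project used.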

3.5.2 Model Training

The MobileNet model is fine-tuned on the FlickrLogos-32 dataset, which contains images of 32 popular brand logos. During training:

The dataset is split into training, validation, and testing sets.

The model learns to associate visual features with specific logo classes.
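One way to perform the split is per class, so that every brand appears in the training, validation, and test sets. The sketch below assumes the data is a list of (path, label) pairs; the 15%/15% fractions and the seed are illustrative assumptions. FlickrLogos-32 also ships with predefined training, validation, and test partitions, which can be used instead of a custom split.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Split (path, label) pairs into train/val/test lists per class.

    Splitting within each class keeps every brand represented in all
    three sets, which matters when some logos have few images.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_frac)
        n_val = round(len(items) * val_frac)
        test.extend(items[:n_test])
        val.extend(items[n_test:n_test + n_val])
        train.extend(items[n_test + n_val:])
    return train, val, test
```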

3.5.3 Output

After logo identification, the model predicts the logo class (e.g., Adidas, Starbucks, or Nike) from the features extracted from the image. The predicted class is then mapped to a brand name and a short description, which is displayed to the user in the final output.
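The mapping from the classifier's output to the displayed result can be sketched as an argmax followed by a dictionary lookup. The lookup table, the class order, and the 0.5 confidence cut-off below are hypothetical; the project's actual brand table is not shown in the report.

```python
# Hypothetical lookup table; the project's real entries are not shown.
BRAND_INFO = {
    "adidas": ("Adidas", "German sportswear and footwear company."),
    "pepsi": ("Pepsi", "Carbonated soft drink brand."),
    "starbucks": ("Starbucks", "American coffeehouse chain."),
}
CLASS_NAMES = ["adidas", "pepsi", "starbucks"]  # model output index -> label


def describe_prediction(probabilities, class_names=CLASS_NAMES, min_conf=0.5):
    """Map a softmax vector to (brand name, description, confidence)."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[best]
    if confidence < min_conf:
        # Hypothetical threshold: avoid naming a brand on a weak prediction.
        return ("Unknown", "No brand matched with enough confidence.", confidence)
    name, description = BRAND_INFO.get(
        class_names[best], (class_names[best], "No description available.")
    )
    return (name, description, confidence)
```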

3.6 Proposed Architecture

The architecture of the proposed logo detection and recognition system is designed as a modular pipeline that combines zero-shot object detection, image processing, and deep learning-based logo identification to identify brand logos in uploaded images. Figure 3-2 shows the components of this architecture.

Figure 3-2: Proposed System Architecture
3.7 Advantages of the Proposed System

1. No Need for Extensive Training on All Logos: Because it uses a zero-shot object detector, the system can detect logos even if they were not explicitly present during training. This makes detection scalable and adaptable to new logos without retraining [4].

2. Lightweight and Fast Logo Identification: MobileNet, a lightweight deep learning model, enables fast and efficient logo identification, making the system suitable for real-time applications and low-resource environments.

3. Effective Preprocessing with RGB Layering: Incorporating RGB layering before detection enhances the clarity and contrast of logos, especially against complex backgrounds, improving detection accuracy [5].

4. User-Friendly and Accessible: Users simply upload an image and receive brand information immediately, even if they do not know the brand name, which makes the system intuitive and helpful in real-life scenarios [5].

5. Cloud-Based Execution with Google Colab: Running the system on Google Colab provides access to GPUs and TPUs, removing the need for high-end local hardware and enabling easy sharing and collaboration [6].

6. Robust Dataset Training: The system is trained on the FlickrLogos-32 dataset, which covers diverse real-world scenes and backgrounds, helping the model generalize to unseen images [6].

7. Flexible Integration: The modular design allows easy integration with other platforms or applications, such as e-commerce tools, camera-based apps, or browser extensions [6].
4 Design
The design of the proposed logo detection and recognition system is expressed in UML (Unified Modeling Language), which visually represents its architecture and internal processes. UML helps in planning and organizing the system by modeling how the modules interact, making the system easier to understand, develop, and maintain. The design follows a modular approach in which each component (image preprocessing, logo detection, logo identification, and output generation) is treated as an independent unit with a specific role.

To represent user interaction and system behavior, a Use Case Diagram shows how the user uploads an image and receives the detected brand name and description. An Activity Diagram illustrates the step-by-step flow of operations, from image upload through RGB preprocessing and detection to final logo identification and output. A Class Diagram outlines the major classes in the system, such as ImageProcessor, LogoDetector, and Classifier, along with their attributes, methods, and relationships. Finally, a Sequence Diagram shows the order in which tasks are executed, capturing the interaction between system components during the logo recognition process.

This structured UML-based design ensures clarity, improves maintainability, and supports scalability, making it easier to upgrade or expand the system in the future.
4.1 Use Case Diagram

The Use Case Diagram provides a high-level visual representation of how the user interacts with the proposed logo detection system. It helps in understanding the functional requirements of the system by identifying the actors (users or external systems) and the operations they can perform.

In our project, the primary actor is the User, who interacts with the system through a simple interface.

Figure 4-1: Use Case Diagram

4.2 Class Diagram

The Class Diagram is a structural UML diagram that describes the internal design of the proposed logo detection system by illustrating its main classes, their attributes and methods, and the relationships between them. It helps developers understand how the system is organized and how the components interact during execution.

Each class has specific responsibilities, promoting modularity, reusability, and separation of concerns. Associations between classes (e.g., the UserInterface depends on the ImageProcessor, which interacts with the LogoDetector) are also captured to show how data and control flow between components.

Figure 4-2: Class Diagram
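The modular design can be sketched in code. The class names LogoDetector, ImageProcessor, and Classifier come from the design description, but their method names and signatures below, and the coordinating LogoPipeline class that a UserInterface would call, are hypothetical, since the diagram's attributes and methods are not reproduced in the text. typing.Protocol lets any object with matching methods (for example, an OWL-V2 wrapper or a MobileNet wrapper) be plugged in.

```python
from typing import List, Optional, Protocol, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class LogoDetector(Protocol):
    def detect(self, image) -> List[Tuple[Box, float]]: ...


class ImageProcessor(Protocol):
    def crop(self, image, box: Box): ...
    def preprocess(self, crop): ...


class Classifier(Protocol):
    def classify(self, logo) -> Tuple[str, str]: ...  # (brand, description)


class LogoPipeline:
    """Hypothetical coordinator: detect -> crop -> preprocess -> classify."""

    def __init__(self, detector: LogoDetector, processor: ImageProcessor,
                 classifier: Classifier):
        self.detector = detector
        self.processor = processor
        self.classifier = classifier

    def recognize(self, image) -> Optional[Tuple[str, str]]:
        detections = self.detector.detect(image)
        if not detections:
            return None  # no logo found in the image
        # Keep only the highest-scoring detection.
        box, _score = max(detections, key=lambda d: d[1])
        logo = self.processor.preprocess(self.processor.crop(image, box))
        return self.classifier.classify(logo)
```

Because each stage is injected, stubs can replace the detector or classifier when testing the other components in isolation.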

4.3 Activity Diagram

The Activity Diagram is a type of UML diagram that represents the workflow of the system by showing the sequence of activities and the flow of control from one step to the next. It is especially useful for visualizing the dynamic aspects of the system and how different processes interact during execution.

This diagram provides a clear overview of how data flows through the system.

Figure 4-3: Activity Diagram

4.4 Sequence Diagram

The Sequence Diagram is a type of UML diagram that represents the interaction between components (objects or classes) in the system over time. It shows how data and messages pass between modules, emphasizing the order and timing of operations.

In the proposed logo detection system, the sequence diagram illustrates how the core components work together to detect and recognize a logo in an uploaded image.

This diagram clarifies how the system components collaborate, the order in which tasks are executed, and how data flows through the system over time. It is especially useful during implementation for keeping the interaction flow between modules correct.

Figure 4-4: Sequence Diagram

4.5 Flow Chart

The flow chart represents the step-by-step working of the Logo Detection and Classification System. It outlines how a user interacts with the system and how the system processes the input to generate meaningful output. The steps are shown in Figure 4-5.

Figure 4-5: Flow Diagram

This flow ensures a smooth pipeline from input to informative output, making the logo detection system intuitive and effective for end users.

5 Implementation
The implementation phase brings together the various components (image processing, deep learning models, and logo identification techniques) to create a complete, functional logo detection system. The project was developed and executed on Google Colab, which provides GPU support for faster processing and eliminates the need for local hardware setup.

5.1 Credibility Assessment

The credibility of this project is assessed on several key factors: the reliability of the methodology, the quality of the data sources, the transparency of the processes used, and the technical soundness of the tools and models applied.

1. Methodology Reliability: The project employs a zero-shot object detection approach, a modern technique capable of detecting unseen logos without retraining on every possible class. This method reflects current advances in computer vision and provides adaptability in real-world scenarios.

2. Use of Trusted Tools and Models: The system uses OpenCV for image processing and MobileNet for feature extraction, both of which are well established and widely used in the AI and computer vision community. These tools contribute to the technical reliability and consistent performance of the project.

3. Data Source Quality: The FlickrLogos dataset from the University of Augsburg is publicly available and academically recognized. Using it supports data credibility and reproducibility, since other researchers can access and evaluate the same data.

4. Accuracy and Validation: The system's output (logo name and brand description) is produced by a well-structured pipeline of detection, extraction, and brand identification. Although zero-shot models are general-purpose, their accuracy can be evaluated and improved through testing on benchmark datasets or by adding feedback loops for refinement.

5. Transparency and Reproducibility: The system follows a modular, transparent design, making it easy to understand, debug, and reproduce. All components, from detection to brand mapping, are based on documented algorithms and open-source tools.

6. Ethical and Responsible Use: The project is designed for educational or commercial use cases that respect intellectual property and branding guidelines. No personal or sensitive data is processed, which supports ethical compliance.
5.2 Data Preprocessing

To ensure consistency and improve the performance of the deep learning model, all logo images used in the project undergo a data preprocessing step. Each image is resized to a fixed size of 224x224 pixels, the standard input dimension for the MobileNet architecture used for feature extraction. The steps are:

1. Directory Traversal: The system recursively traverses the input folder and all its subdirectories to locate image files. Supported formats are .jpg, .jpeg, .png, .bmp, .tiff, .tif, and .webp.

2. Image Validation: Only files with a supported image extension are considered; unsupported or non-image files are skipped.

3. Image Reading and Resizing: Each valid image is read with the OpenCV library. If the image loads successfully, it is resized to 224x224 pixels with cv2.resize() so that input dimensions are uniform across the entire dataset.

4. Preservation of Folder Structure: The relative subdirectory structure of the original dataset is reproduced in the output directory, so the class-wise (brand-wise) folder organization remains intact.

5. Saving the Processed Images: The resized images are saved in a separate output directory. Any missing directory is created automatically to avoid write errors.
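The five steps above can be sketched as a small script. The function names plan_outputs and resize_dataset are hypothetical, and the case-insensitive extension check and INTER_AREA interpolation are our assumptions rather than details stated in the report. OpenCV is imported inside resize_dataset only so that the path-planning helper can be used without it.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp"}
TARGET_SIZE = (224, 224)  # (width, height) for cv2.resize


def plan_outputs(input_dir, output_dir):
    """Yield (source, destination) paths for every supported image,
    mirroring the input folder structure under output_dir."""
    for root, _dirs, files in os.walk(input_dir):
        rel = os.path.relpath(root, input_dir)
        for name in sorted(files):
            # Extension check is case-insensitive (".JPG" counts too).
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                dst = os.path.normpath(os.path.join(output_dir, rel, name))
                yield os.path.join(root, name), dst


def resize_dataset(input_dir, output_dir):
    """Resize every image to TARGET_SIZE; returns the number written."""
    import cv2  # imported here so plan_outputs works without OpenCV

    written = 0
    for src, dst in plan_outputs(input_dir, output_dir):
        image = cv2.imread(src)
        if image is None:  # unreadable or corrupt file: skip it
            continue
        resized = cv2.resize(image, TARGET_SIZE, interpolation=cv2.INTER_AREA)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if cv2.imwrite(dst, resized):
            written += 1
    return written
```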

Figure 5-1: Image Preprocessing Process

5.3 Model Training

The logo detection model was trained to identify and localize brand logos within natural images. The training process involved the following key steps:

1. Preprocessing

All images were resized and normalized to ensure consistency across the dataset. Data augmentation techniques such as random cropping, flipping, rotation, and color jittering were applied to increase robustness to variations in scale, orientation, and lighting conditions.

2. Model Architecture

A zero-shot object detection approach was adopted, using a pretrained backbone (e.g., CLIP or MobileNet) combined with a region proposal network or similar architecture to detect and classify logos without task-specific retraining. This enables the system to generalize to logo classes not seen during training.

3. Training Configuration

The model was trained with a cross-entropy loss for classification and a smooth L1 loss for bounding box regression. An Adam optimizer was used with a learning rate scheduler to adjust the learning rate during training. The model was trained for N epochs with early stopping to prevent overfitting.

4. Evaluation Metrics

Model performance was evaluated with standard object detection metrics, including mAP (mean Average Precision) at different IoU (Intersection over Union) thresholds, precision, recall, and F1-score. Evaluation was conducted on a held-out validation set to assess generalization.

5. Transfer Learning

To improve accuracy and training efficiency, transfer learning was used by fine-tuning a pretrained model. This approach retains general visual features while adapting the model to the specific task of logo detection.
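The quantities named in items 3 and 4 can be made concrete with a small, framework-free sketch: the smooth L1 box loss, IoU, and precision/recall/F1 at a single IoU threshold with greedy matching. This is an illustrative simplification, not the project's evaluation code. Full mAP integrates precision over recall for each class and averages across classes (and, in the COCO protocol, across IoU thresholds), which libraries such as pycocotools compute.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss averaged over box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def iou(a, b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(predicted, ground_truth, iou_threshold=0.5):
    """predicted: list of (box, score); ground_truth: list of boxes.

    Predictions are matched greedily in descending score order; a match
    needs IoU >= iou_threshold with a not-yet-matched ground-truth box.
    """
    matched, tp = set(), 0
    for box, _score in sorted(predicted, key=lambda p: p[1], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```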

5.3.1 Zero-Shot Model

In this project, we use a zero-shot object detection model, OWL-V2 (google/owlv2-base-patch16-ensemble) from Hugging Face Transformers, to identify brand logos in images without task-specific training. Zero-shot detection lets the model detect and classify objects from textual prompts alone, enabling flexible and scalable logo detection by simply specifying brand names as labels.
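A minimal sketch of querying OWL-V2 through the Hugging Face Transformers API follows. The prompt template "{brand} logo", the 0.1 score threshold, and the helper names are our assumptions, not taken from the project. The post-processing call shown is post_process_object_detection; depending on the installed Transformers version, post_process_grounded_object_detection may be the recommended alternative. Because OWL-V2 pads each input to a square, the boxes are scaled by the longer image side, which yields pixel coordinates in the original image.

```python
MODEL_ID = "google/owlv2-base-patch16-ensemble"


def make_queries(brands):
    """One text query per brand; OWL-V2 expects one query list per image."""
    return [[f"{brand} logo" for brand in brands]]


def best_detection(detections):
    """Return the highest-scoring (label, score, box) tuple, or None."""
    return max(detections, key=lambda d: d[1], default=None)


def detect_logos(image_path, brands, threshold=0.1):
    """Run OWL-V2 on one image and return (label, score, box) tuples."""
    # Heavy dependencies are imported lazily so the helpers above can be
    # used (and tested) without them.
    import torch
    from PIL import Image
    from transformers import Owlv2ForObjectDetection, Owlv2Processor

    processor = Owlv2Processor.from_pretrained(MODEL_ID)
    model = Owlv2ForObjectDetection.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    queries = make_queries(brands)
    inputs = processor(text=queries, images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Inputs are padded to a square, so scale boxes by the longer side.
    side = max(image.size)
    result = processor.post_process_object_detection(
        outputs=outputs, threshold=threshold,
        target_sizes=torch.tensor([[side, side]]),
    )[0]
    return [
        (queries[0][label], score, box)
        for label, score, box in zip(
            result["labels"].tolist(),
            result["scores"].tolist(),
            result["boxes"].tolist(),
        )
    ]
```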

Figure 5-2: Training of the Zero-Shot Algorithm

5.3.2 MobileNet Model

To build an efficient and lightweight logo identification model, MobileNet was employed as the backbone for feature extraction. MobileNet is a deep convolutional neural network architecture designed for mobile and embedded vision applications, offering a strong balance between accuracy and computational efficiency.

Figure 5-3: Training of the MobileNetV2 Model

5.4 Testing

A separate set of logo images was used for testing. These images varied in resolution, background complexity, and lighting conditions to reflect realistic use cases. Each image was preprocessed to match the input format required by the model: resized to 224x224 pixels and normalized to the [0, 1] range.

Figure 5-4: Detecting the Logo in the Image

During testing, each image was passed through the trained model, which output a probability distribution over all known logo classes. The class with the highest probability was selected as the prediction, and the predicted label was compared with the ground truth to evaluate accuracy.
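A small sketch of how the top-1 accuracy and the confusion matrix discussed in Section 5.5 can be computed from predicted and ground-truth labels follows (plain Python; the function names are hypothetical). Per-class accuracy is worth reporting alongside the overall figure, because an overall accuracy can hide weak performance on rarely occurring or visually similar logos.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return top-1 accuracy and a confusion table {(true, pred): count}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    confusion = Counter(zip(y_true, y_pred))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(y_true), confusion


def per_class_accuracy(confusion):
    """Fraction of each true class predicted correctly (per-class recall)."""
    totals, hits = Counter(), Counter()
    for (t, p), n in confusion.items():
        totals[t] += n
        if t == p:
            hits[t] += n
    return {c: hits[c] / totals[c] for c in totals}
```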

Figure 5-5: Identifying the Logo and Displaying Its Name and Description

5.5 Accuracy

The logo recognition system, built on the MobileNetV2 model, achieves an overall classification accuracy of approximately 62% on the test set. The confusion matrix indicates that the model performs well on several frequently occurring logos but struggles with underrepresented or visually similar logos. Despite the challenging nature of logo classification (logos vary in size, color, and background), the model correctly classifies a majority of the test logos. Fine-tuning on cropped logos from a diverse dataset contributed to this level of performance. The system integrates logo detection, cropping, and classification into a single pipeline. Further improvements, such as additional data augmentation or model ensembling, may raise accuracy above 70%.

Figure 5-6: Accuracy

6 Conclusion and Future Work

The proposed system addresses the challenge of identifying brand logos in user-uploaded images by combining zero-shot object detection, image processing, and deep learning techniques. RGB layering and OpenCV help isolate logo regions, while MobileNet provides efficient and reliable identification of the detected logos. With a user-friendly interface and a robust backend pipeline, the system offers an accessible way for users to identify brands simply by uploading images. Training on the FlickrLogos-32 dataset yields reasonable accuracy (approximately 62% on the test set) and generalization across commonly encountered logos.

6.1 Future Enhancements

1. Support for More Logos: Extend logo identification to more brands, including lesser-known and regional ones, by integrating additional datasets.

2. Logo Recognition in Video Streams: Extend the model to video input by extracting frames and detecting logos continuously over time. This would enable applications such as brand monitoring in sports broadcasts, advertisements, or surveillance footage.

3. Multilingual Brand Descriptions: Provide brand descriptions in multiple languages to make the system usable worldwide.

4. Real-Time Detection: Adapt the system for real-time use by processing images captured directly from webcams or mobile cameras.

5. Robustness in Complex Backgrounds: Improve logo detection in noisy, low-light, or cluttered scenes through advanced preprocessing or deep segmentation models.

6. Mobile Application Integration: Develop a mobile app version of the system for convenient, on-the-go logo detection.

7 References
[1] Marisa Bernabeu, Antonio Javier Gallego, and A. Pertusa. 2022. Multi-label logo recognition and retrieval based on weighted fusion of neural features.

[2] Pedro Carvalho, Américo Pereira, and Paula Viana. 2021. Automatic TV logo identification for advertisement detection without prior data. Appl. Sci. 11, 16 (2021), 74–94.

[3] Umesh D. Dixit, M. S. Shirdhonkar, and G. R. Sinha. 2022. Automatic logo detection from document image using HOG features. Multimedia Tools Appl. 82 (2022), 863–878.

[4] Alexey Bochkovskiy, Chien-Yao Wang, and H. Liao. 2020. YOLOv4: Optimal speed and accuracy of object detection.

[5] Pedro Carvalho, Américo Pereira, and Paula Viana. 2021. Automatic TV logo identification for advertisement detection without prior data. Appl. Sci. 11, 16 (2021), 74–94.

[6] Hang Chen, Xiao Li, Zefan Wang, and Xiaolin Hu. 2021. Robust logo detection in e-commerce images by data augmentation. In Proceedings of the 29th ACM International Conference on Multimedia. 4789–4793.
