Report
                     Bachelor of Technology
                              Degree
                                 in
            Computer Science and Engineering
                                 By
P. Chandini (Y21ACS540)                Sk. Shameem (Y21ACS572)
Sk. Sabiha Anjum (Y21ACS571)                 S. Gopi (Y21ACS573)
                M. Rohit Bhaskar (Y20ACS498)
CERTIFICATE
Date:
We hereby declare that the work contained herein is our own except where explicitly
stated otherwise in the text, and that this work has not been submitted for any other degree.
P. Chandini (Y21ACS540)
Sk. Shameem (Y21ACS572)
S. Gopi (Y21ACS573)
                               Acknowledgement
       We sincerely thank the following distinguished personalities who have given
us their support and guidance in completing this project.
We are deeply indebted to our most respected guide Mr. P. Nanda Kishore,
Asst. Prof., Department of CSE, for his valuable and inspiring guidance, comments,
and suggestions.
We extend our sincere thanks to Dr. M. Rajesh Babu, Assoc. Prof. & Head of
the Dept., for extending his cooperation and providing the required resources.
We would like to thank our beloved Principal Dr. N. Rama Devi for providing
the online resources and other facilities to carry out this work.
We would like to express our sincere thanks to our project coordinator Dr.
P. Pardhasaradhi, Prof., Dept. of CSE, for his helpful suggestions in presenting this
document.
We extend our sincere thanks to all other teaching faculty and non-teaching
staff of the department who helped us directly or indirectly, for their cooperation and
encouragement.
P. Chandini (Y21ACS540)
Sk. Shameem (Y21ACS572)
S. Gopi (Y21ACS573)
                                                Table of Contents
List of Figures
Abstract
1   Introduction
    1.1   Problem Statement and Objective
    1.2   Technology Background
       1.2.1   Deep Learning Using Python
       1.2.2   Libraries
       1.2.3   Algorithms/Models
    1.3   Runtime Environment
2   Literature Review
       2.1.1   Traditional Approaches
       2.1.2   Deep Learning-Based Approaches
       2.1.3   Logo Datasets
       2.1.4   Key Challenges
       2.1.5   Relevance to Our Work
3   Proposed System
    3.1   Various modules in project
       3.1.1   Input Module
       3.1.2   Zero-Shot Object Detection
       3.1.3   Logo Extraction using OpenCV
       3.1.4   Logo Identification using MobileNet
       3.1.5   Brand Mapping and Description
       3.1.6   Output Module
    3.2   Dataset
    3.3   Logo Detection
    3.4   Image Processing
    3.5   Logo Identification
       3.5.1   Why MobileNet
       3.5.2   Model Training
       3.5.3   Output
    3.6   Proposed Architecture
    3.7   Advantages of Proposed System
4   Design
    4.1   Use Case Diagram
    4.2   Class Diagram
    4.3   Activity Diagram
    4.4   Sequence Diagram
    4.5   Flow chart
5   Implementation
    5.1   Credibility Assessment
    5.2   Data Preprocessing
    5.3   Model Training
       5.3.1   Zero-Shot Model
       5.3.2   MobileNet Model
    5.4   Testing
    5.5   Accuracy
6   Conclusion and Future Work
    6.1   Future Enhancements
7   References
                                          List of Figures
Figure 3-1 - Logo Detection Process
Figure 5-5 - Identifying the Logo and Displaying Name and Description
                                      Abstract
       This project focuses on developing a deep learning-based system for Brand
Logo Recognition. The system takes an image as input and applies RGB layering to
enhance key features for logo detection. It then uses a Zero-Shot Object Detection
(ZSD) model to detect logos within the image, even if a logo was not seen during
training. The detected logos are matched against the Flickr Logos dataset using a
MobileNet model and OpenCV, which keeps the recognition process fast and efficient.
The system outputs the logo name along with a description of the brand or logo,
offering practical value for users looking to identify brands. By combining Zero-Shot
Object Detection and MobileNet, the system can locate previously unseen logos,
making it adaptable to various applications like mobile apps, marketing tools, or
consumer-facing platforms.
                               1 Introduction
    In today’s world, brand recognition plays a crucial role in consumer decision-making
and business success. However, many people struggle to identify logos because of
varying designs, colors, and image conditions. This project aims to solve this issue by
developing a deep learning-based system for logo recognition, allowing users to easily
identify brands from images.
The system uses RGB layering to enhance image features and applies Zero-Shot
Object Detection (ZSD) for detecting logos even when they haven't been seen during
training. ZSD allows the model to recognize logos based on semantic relationships
between image regions and textual descriptions.
After detecting the logo, the system uses MobileNet, a lightweight CNN, to match
the detected logo with entries from the Flickr Logos dataset. OpenCV handles tasks like
resizing and feature extraction to ensure efficient logo matching, even under varied
image conditions.
The final output of the system provides users with both the name of the logo and a
brief description of the brand. This feature not only helps consumers identify logos but
also enhances their understanding of the brand's identity. The practical applications of
this system are broad, including integration into mobile applications, marketing tools,
and consumer-facing platforms. Overall, this project aims to contribute to the field of
logo recognition and improve the consumer experience.
1.1 Problem Statement and Objective
Recognizing brand logos in images is challenging because of varying designs, image
quality, and backgrounds. Logos appear under different conditions, which makes it
difficult to recognize them consistently. Traditional methods for logo detection and
recognition typically require large, well-curated datasets and struggle with logos that
are new or unseen by the model. As brands evolve and new logos emerge, a system that
can effectively and reliably identify logos, including logos it was not explicitly trained
on, is essential. There is a need for an efficient, real-time logo recognition system that
can handle a wide range of logo variations and accurately identify logos from images
in dynamic environments.
1.2 Technology Background
1.2.1 Deep Learning Using Python
Machine learning is a branch of artificial intelligence that enables computers to learn
from data without being explicitly programmed, identifying patterns and making
decisions. Deep learning is a subset of machine learning that uses neural networks with
many layers to analyze and learn from large amounts of data. It is widely used in areas
such as image recognition, natural language processing, and speech recognition. In the
context of our project, deep learning techniques are used to recognize and identify
brand logos from images.
Python is one of the most popular programming languages for implementing deep
learning models, due to its simplicity, flexibility, and the wide range of powerful
libraries it supports. The key libraries used in our project are described below.
1.2.2 Libraries
A. OS
The os library in Python is used to interact with the operating system, such as accessing
or managing files and directories. In this project, it is used to traverse the dataset
folders and to load and save image files.
B. OpenCV
OpenCV (Open Source Computer Vision Library) is a powerful library for real-time
computer vision and image processing. It provides a wide range of tools for tasks such
as image reading, resizing, filtering, and feature extraction.
In our project, OpenCV is used to handle image preprocessing and manipulation tasks
such as resizing images, applying filters, extracting features, and cropping the detected
logo region before it is matched.
C. JSON
The json library is used to work with JSON (JavaScript Object Notation) data. It
parses and serializes structured data, such as configuration files or datasets, stored in
a human-readable format.
In our project, it is used to load logo dataset information and metadata, including the
brand names and descriptions shown to the user.
D. NumPy
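NumPy is the fundamental library for numerical computing in Python, providing
efficient multi-dimensional arrays and operations on them. In our project, NumPy is
used for image data manipulation, such as handling pixel data, transforming image
arrays, and performing the matrix operations required for image processing.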
E. TensorFlow
TensorFlow is an open-source framework developed by Google for building and
training machine learning models, including neural networks.
In our project, TensorFlow is used to load pre-trained models (like MobileNet or
Zero-Shot Object Detection models) for logo detection and recognition tasks, and to
run training and inference.
F. Matplotlib
Matplotlib is a popular plotting library in Python used for data visualization. It is used
to create static, animated, and interactive plots.
In our project, Matplotlib is used to display images and visual results (such as
the output logo and its detected name) and to plot graphs of performance metrics such
as accuracy.
G. PIL (Pillow):
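Pillow is a fork of the Python Imaging Library (PIL) that adds support for opening,
manipulating, and saving many different image file formats.
In our project, Pillow is used to handle the loading and manipulation of image data
before it is passed to the models.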
H. Ipywidgets
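ipywidgets is a library of interactive HTML widgets for Jupyter notebooks, providing
controls such as buttons, sliders, and text boxes that can interact with the notebook code.
In our project, ipywidgets can be used to create interactive controls for uploading
images and displaying the recognition results.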
I. IPython.display
IPython.display provides tools for displaying rich media (such as images, videos, and
HTML) in Jupyter notebooks.
It is used in our project for displaying the results of logo recognition, such as showing
the detected logo or clearing the output to refresh the displayed information.
1.2.3 Algorithms/Models
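A. Zero-Shot Object Detection (ZSD):
Zero-Shot Object Detection is a technique that allows the model to detect objects
(logos, in our case) even if they were not seen during the training phase. This is
typically achieved by associating visual features of objects with semantic
information, such as textual descriptions.
This method helps our system identify logos that were not explicitly part of the
training data.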
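The matching step behind zero-shot detection can be illustrated with a short NumPy sketch. It assumes that an image encoder has already produced one embedding per candidate region and that a text encoder has produced an embedding for the query "logo". The toy vectors, the `score_regions` helper, and the 0.3 threshold are hypothetical stand-ins, not values from the project or from a specific model.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `a` and the single vector `b`."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b)
    return a_norm @ b_norm


def score_regions(region_embeddings: np.ndarray, query_embedding: np.ndarray, threshold: float = 0.3):
    """Return indices of regions whose similarity to the text query exceeds `threshold`,
    together with the similarity score of every region."""
    scores = cosine_similarity(region_embeddings, query_embedding)
    return np.flatnonzero(scores > threshold), scores


# Toy 4-dimensional embeddings standing in for real encoder outputs (hypothetical values).
regions = np.array([
    [0.9, 0.1, 0.0, 0.0],   # region that resembles the query
    [0.0, 0.0, 1.0, 0.0],   # unrelated background region
    [0.7, 0.2, 0.1, 0.0],   # another logo-like region
])
logo_query = np.array([1.0, 0.0, 0.0, 0.0])  # stand-in for the text embedding of "logo"

hits, scores = score_regions(regions, logo_query)
print(hits)
```

Regions 0 and 2 point in nearly the same direction as the query embedding, so they are kept, while region 1 is orthogonal to it and is discarded. Real detectors such as OWL-V2 also predict box coordinates for each region, which this sketch leaves out.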
B. MobileNet:
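MobileNet is a lightweight convolutional neural network (CNN) architecture optimized
for performance on mobile and embedded devices. It is designed to be both fast and
accurate while keeping computational cost low.
In our project, MobileNet is used for logo matching, comparing the detected logo with
the logos in the Flickr Logos dataset. The pre-trained MobileNet model is fine-tuned
on the FlickrLogos-32 dataset.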
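Much of MobileNet's efficiency comes from depthwise separable convolutions, which split a standard convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) convolution that mixes channels. The arithmetic below compares the weight counts of the two designs for a single layer; the layer size (3×3 kernel, 32 input channels, 64 output channels) is an illustrative choice, not a layer taken from the project.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution mapping c_in channels to c_out channels (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


std = standard_conv_params(3, 32, 64)
sep = depthwise_separable_params(3, 32, 64)
print(std, sep, round(sep / std, 3))  # 18432 2336 0.127
```

In this example the separable layer needs roughly one eighth of the weights of the standard layer, which is the kind of saving that makes MobileNet practical on mobile devices.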
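C. Convolutional Neural Networks (CNNs):
CNNs are a class of deep learning algorithms primarily used for image processing tasks
such as image classification (including logo identification) and object detection. They
work by sliding convolutional filters over the input image to extract features such as
edges, textures, and shapes.
MobileNet is built upon the principles of CNNs, while zero-shot detectors such as
OWL-V2 use Vision Transformer backbones that learn comparable visual features.
CNNs help in learning spatial hierarchies of features, making them highly effective for
visual recognition tasks.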
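The convolution operation described above can be sketched in a few lines of NumPy. This is a simplified single-channel illustration with a hand-picked edge filter; in the actual models the filters are learned during training and applied across many channels by the framework.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over a single-channel `image` (no padding, stride 1).

    Like CNN frameworks, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 5x5 image with a vertical edge between dark (0) and bright (1) columns.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# A simple vertical-edge filter; a CNN would learn filters like this from data.
vertical_edge = np.array([[-1, 0, 1]] * 3, dtype=float)

response = conv2d_valid(image, vertical_edge)
print(response)
```

The first two output columns respond strongly because their windows straddle the dark-to-bright transition, while the window over the uniform bright region gives zero.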
1.3 Runtime Environment
The development and execution of this project are carried out using Google Colab, a
cloud-based service for writing and executing code in Jupyter notebooks. Google Colab
provides a powerful, flexible, and easily accessible environment ideal for deep learning
and machine learning experiments, with access to GPU and TPU hardware.
                           2 Literature Review
    Logo detection and recognition have gained significant traction in recent years,
especially due to their relevance in areas such as brand monitoring, product analysis,
and augmented reality. Traditional computer vision methods initially tackled this
problem using handcrafted feature extractors, while modern approaches leverage the
power of deep learning to achieve high accuracy and robustness in real-world scenarios.
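2.1.1 Traditional Approaches
Early logo recognition systems relied on handcrafted feature extractors such as
SIFT, SURF, and ORB, which extract key points from images and match them across
images. These features were often combined with Bag of Visual Words
(BoVW) models to classify or detect logos. While efficient, they were prone to failure
under changes in lighting, viewpoint, and occlusion, and against cluttered
backgrounds [1].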
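To make the Bag of Visual Words idea concrete, the sketch below turns a set of local feature descriptors into a fixed-length histogram that a classifier can compare across images. It assumes the descriptors (for example from SIFT) and the visual vocabulary have already been computed; the toy 2-D values are illustrative only.

```python
import numpy as np


def bovw_histogram(descriptors: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Quantize local descriptors to their nearest visual word and return a normalized histogram."""
    # Squared Euclidean distance from every descriptor to every visual word.
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()


# Toy 2-D "descriptors" and a 3-word vocabulary. Real SIFT descriptors are 128-D,
# and the vocabulary is usually learned by clustering (e.g. k-means).
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [1.2, 0.8], [4.8, 5.1]])

hist = bovw_histogram(descriptors, vocabulary)
print(hist)
```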
2.1.2 Deep Learning-Based Approaches
The advent of deep learning, particularly Convolutional Neural Networks (CNNs),
advanced the state of logo detection. Architectures such as R-CNN, Fast R-CNN,
Faster R-CNN, YOLO, and SSD enabled object detection with bounding boxes and
higher accuracy, even in challenging settings [2]. These models process full images and
output the spatial location and class of objects, making them suitable for logo detection
in unconstrained environments.
In the case of limited training data or unseen logos, few-shot and zero-shot
detection methods have gained attention. Zero-shot methods enable the detection of
objects not seen during training by leveraging visual-semantic models like
CLIP (Contrastive Language–Image Pre-training) or OWL-V2 (the second version of
OWL-ViT, a Vision Transformer for Open-World Localization). These models match
images or image regions with textual descriptions to identify logos without the need for
custom bounding box annotations.
2.1.3 Logo Datasets
Several benchmark datasets are available for training and evaluation. Among
them:
FlickrLogos-32: Contains 32 logo classes with annotated images and is widely used in
logo recognition research.
Logos in the Wild (LITW) and WebLogo-2M: Provide a larger and more diverse set of
logo images.
2.1.5 Relevance to Our Work
In our project, we use a zero-shot object detector to locate logos in user-uploaded
images. We then use OpenCV for logo extraction and a MobileNet model trained on the
FlickrLogos-32 dataset to classify the cropped logos. The system ultimately maps the
detected logo to its brand name and provides a short description of the brand. This
hybrid approach, combining zero-shot detection with deep learning-based Logo
Identification, allows for flexible and efficient logo recognition.
                            3 Proposed System
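     The proposed system is designed to identify and classify brand logos from
user-uploaded images by combining zero-shot object detection, image processing
techniques, and deep learning-based Logo Identification. The system helps users
recognize brand logos even when they are unfamiliar with the brand name, by providing
the brand name and a short description of the brand.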
3.1 Various modules in project
The proposed system consists of four main modules that work together to identify
and classify logos from uploaded images. The Input Module allows users to upload an
image containing a potential brand logo. The Detection Module employs a zero-shot
object detection model to locate the logo region within the image, even if the logo was
not part of the training data. Once detected, the Logo Identification Module extracts the
logo using OpenCV and classifies it using a MobileNet model trained on the
FlickrLogos-32 dataset. Finally, the Output Module presents the user with the identified
brand name and a brief description of the brand, providing an intuitive and informative
result.
3.1.2 Zero-Shot Object Detection
A zero-shot object detection model (e.g., OWL-V2 or CLIP-based) is used to detect the
logo region in the uploaded image.
Zero-shot models are ideal for detecting logos that may not have been explicitly labeled
during training.
These models match image regions with text queries (e.g., “logo”) to detect relevant
areas.
3.1.3 Logo Extraction using OpenCV
Once the logo region is identified, OpenCV is used to crop the detected bounding box
from the image.
Preprocessing steps such as resizing, normalization, and noise removal are applied to
the cropped logo.
3.1.4 Logo Identification using MobileNet
The extracted logo is passed into a trained MobileNet classifier. MobileNet is chosen
for its lightweight and efficient architecture.
The model is trained using the FlickrLogos-32 dataset, which contains 32 different logo
classes.
3.1.5 Brand Mapping and Description
After Logo Identification, the logo class is mapped to the corresponding brand name.
A short description of the brand is retrieved from a predefined database or JSON file.
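A minimal sketch of this mapping step, assuming the descriptions are stored in a JSON object keyed by class name. The key names, the `describe_brand` helper, and the description strings are hypothetical placeholders, not the project's actual file; in practice the text would be read from disk with `json.load()`.

```python
import json

# Hypothetical JSON content standing in for the project's brand-description file.
BRAND_INFO_JSON = """
{
  "adidas": {"name": "Adidas", "description": "Sportswear and footwear brand."},
  "starbucks": {"name": "Starbucks", "description": "Coffeehouse chain."}
}
"""


def describe_brand(predicted_class: str, brand_info: dict) -> str:
    """Map a predicted logo class to a display string with the brand name and description."""
    entry = brand_info.get(predicted_class.lower())
    if entry is None:
        # Fall back gracefully when the classifier predicts a class without a stored entry.
        return f"{predicted_class}: no description available"
    return f"{entry['name']}: {entry['description']}"


brand_info = json.loads(BRAND_INFO_JSON)
print(describe_brand("adidas", brand_info))
```

Keeping the descriptions in a separate JSON file means brand text can be edited without retraining or changing the classifier.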
3.1.6 Output Module
The Output Module is responsible for presenting the final results to the user in a clear
and user-friendly manner. After the logo is successfully detected and classified, this
module displays the brand name associated with the logo, along with a brief description
of the brand. The output may also include the cropped image of the detected logo for
visual confirmation. This ensures that users receive accurate and helpful information
about the brand they are curious about, making the system both informative and easy
to use.
3.2 Dataset
The FlickrLogos-32 dataset is a widely used benchmark for logo detection and
recognition tasks. It contains images for 32 popular brand logos, with over 8,000
images in total. These include photos of the 32 logos in natural scenes as well as
images that contain no logo. Each logo image comes with bounding box
annotations indicating the exact location of the logo within the image, which is useful
for training and evaluation. The dataset is divided into training, validation, and test sets,
enabling structured model development and testing. The diversity of scenes and logo
placements in the dataset makes it well suited to training models that generalize to
real-world conditions, and it plays a crucial role in the performance of the MobileNet
classifier.
3.3 Logo Detection
Logo detection plays a central role in the proposed system by identifying the exact
region of a logo within a user-uploaded image. In this project, the detection process is
powered by a Zero-Shot Object Detection model, which is enhanced through the use of
an RGB layering technique to improve accuracy and localization. The zero-shot model
can detect logos without being trained on each specific brand. It achieves this by
leveraging pre-trained visual-semantic models (such as CLIP or OWL-V2) that
understand both visual features and textual queries like “logo.”
Before feeding the image to the zero-shot detector, an RGB layering process is applied.
This involves separating the image into its Red, Green, and Blue channels, which can
help in highlighting the contrast and shapes that are often present in logos. By analyzing
these separate color layers, the system can enhance edges, patterns, and contours,
making the logo stand out more distinctly against the background. This pre-processing
step is intended to help the detection model focus on the logo region.
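The report does not specify the exact layering operation, so the sketch below shows one plausible interpretation, stated as an assumption: split the image into R, G, and B layers, compute an edge map on each layer, and keep the strongest response at every pixel. OpenCV offers `cv2.split` for the channel split (note that `cv2.imread` returns channels in BGR order); plain NumPy indexing is used here so the example runs without OpenCV, and `layered_edge_map` is a hypothetical name.

```python
import numpy as np


def split_channels(rgb: np.ndarray):
    """Return the red, green and blue layers of an H x W x 3 RGB image."""
    return rgb[..., 0], rgb[..., 1], rgb[..., 2]


def layered_edge_map(rgb: np.ndarray) -> np.ndarray:
    """Compute a gradient-magnitude edge map on each colour layer and keep the
    strongest response at every pixel.

    A logo that differs from its background mainly in one colour channel still
    produces a strong response in that channel's layer.
    """
    edge_layers = []
    for layer in split_channels(rgb.astype(float)):
        grad_rows, grad_cols = np.gradient(layer)   # finite differences along each axis
        edge_layers.append(np.hypot(grad_rows, grad_cols))
    return np.max(np.stack(edge_layers), axis=0)


# 4 x 4 test image: left half pure red, right half pure blue.
image = np.zeros((4, 4, 3))
image[:, :2, 0] = 255
image[:, 2:, 2] = 255
edges = layered_edge_map(image)
print(edges)
```

The edge map is non-zero only in the two columns next to the red/blue boundary, which is the kind of contour information the layering step is meant to emphasize.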
Once the layered image is processed, the zero-shot model is used to identify and
generate bounding box coordinates around the detected logo region. This localized
region is then extracted using OpenCV, preparing it for the next stage of Logo
Identification. The combination of RGB layering and zero-shot detection helps the
system detect logos of varying sizes, shapes, and colors, even in challenging images.
The logo detection process is illustrated in Figure 3-1.
3.4 Image Processing
Image processing acts as a bridge between the raw input image and the Logo
Identification step.
After the logo is detected using the zero-shot object detector, the system retrieves
the bounding box coordinates of the detected logo. These coordinates are then used to
crop the logo region from the original image using OpenCV functions. This cropped
section isolates the logo from unnecessary background elements, improving the
accuracy of the subsequent classification.
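Because OpenCV images are NumPy arrays, cropping is plain array slicing. The helper below is a sketch assuming the detector returns boxes as (x_min, y_min, x_max, y_max) in pixel coordinates; `crop_box` and its `pad` argument are hypothetical names, not part of OpenCV.

```python
import numpy as np


def crop_box(image: np.ndarray, box, pad: int = 0) -> np.ndarray:
    """Crop an (x_min, y_min, x_max, y_max) pixel box from an H x W x C image.

    Coordinates are clipped to the image bounds so that a box extending past the
    border (common with detector outputs) does not raise an error.
    `pad` optionally enlarges the box by a fixed number of pixels on each side.
    """
    h, w = image.shape[:2]
    x_min, y_min, x_max, y_max = (int(round(v)) for v in box)
    x_min, y_min = max(0, x_min - pad), max(0, y_min - pad)
    x_max, y_max = min(w, x_max + pad), min(h, y_max + pad)
    if x_max <= x_min or y_max <= y_min:
        raise ValueError(f"Box {box} does not overlap the image")
    # NumPy (and therefore OpenCV) images are indexed as [row, column] = [y, x].
    return image[y_min:y_max, x_min:x_max]


image = np.zeros((100, 200, 3), dtype=np.uint8)
logo = crop_box(image, (150.4, 20.0, 230.0, 60.0))   # box extends past the right edge
print(logo.shape)
```

The clipped crop keeps the part of the box that lies inside the image, so a slightly oversized detection still yields a usable logo patch.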
      To further enhance the quality of the cropped logo, several preprocessing
steps are applied:
Resizing: The image is resized to match the input size expected by the MobileNet model
(224×224 pixels).
RGB Channel Separation: In some cases, the image is split into R, G, and B layers to
highlight contrast and shape features.
Normalization: Pixel values are scaled to a standard range (typically between 0 and 1).
Noise Removal (optional): Filters can be applied to reduce background noise.
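The resizing and normalization steps above can be sketched as follows. In the project, resizing would normally be done with `cv2.resize(image, (224, 224))`; the nearest-neighbour version here is a dependency-free stand-in, and the function names are hypothetical.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Nearest-neighbour resize to (height, width).

    A dependency-free stand-in for cv2.resize(image, (width, height));
    note that OpenCV expects the target size as (width, height).
    """
    out_h, out_w = size
    in_h, in_w = image.shape[:2]
    rows = (np.arange(out_h) * in_h / out_h).astype(int)   # source row for each output row
    cols = (np.arange(out_w) * in_w / out_w).astype(int)   # source column for each output column
    return image[rows[:, None], cols[None, :]]


def normalize(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to [0, 1] as float32."""
    return image.astype(np.float32) / 255.0


# A synthetic 37 x 91 RGB crop standing in for a detected logo.
crop = (np.arange(37 * 91 * 3) % 256).astype(np.uint8).reshape(37, 91, 3)
x = normalize(resize_nearest(crop))
batch = x[np.newaxis, ...]   # the classifier expects a batch axis: (1, 224, 224, 3)
print(batch.shape, x.dtype)
```

If the classifier is a Keras MobileNetV2, note that its `preprocess_input` function scales pixels to [-1, 1] rather than [0, 1]; whichever scaling is used at inference must match the one used during training.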
These image processing steps help in making the logo features more distinguishable
and consistent, allowing the classifier to perform more reliably. Overall, image
processing plays a key role in preparing and enhancing the input for accurate brand
identification.
3.5 Logo Identification
The Logo Identification phase in this project is responsible for identifying the
specific brand logo from the cropped image obtained during the detection and image
processing stages. For this purpose, we use MobileNet, a lightweight and efficient
convolutional neural network.
Once the logo is detected and cropped using OpenCV, it is resized and
preprocessed to match MobileNet's input requirements (e.g., resized to
224×224 pixels with normalized pixel values). This preprocessed logo image is then
passed to the MobileNet model for classification.
3.5.1 Why MobileNet
Lightweight and Fast: MobileNet is designed for speed and low computational cost,
making it well suited to cloud-based environments like Google Colab or even on-device
inference.
Accurate: Despite its small size, MobileNet provides competitive performance in image
classification tasks.
Pretrained Weights: Weights pretrained on ImageNet allow for faster training and better
accuracy when fine-tuned on custom datasets.
3.5.2 Model Training
The MobileNet model is fine-tuned on the FlickrLogos-32 dataset. The model learns to
associate visual features with specific logo classes.
3.5.3 Output
After Logo Identification, the model predicts the logo class (e.g., Adidas,
Starbucks, Nike, etc.) based on the features extracted from the image. The predicted
class is then mapped to a brand name and a short description, which is displayed to the
user.
3.6 Proposed Architecture
The proposed architecture combines zero-shot object detection, image
processing, and deep learning-based Logo Identification to identify brand logos from
user-uploaded images.
3.7 Advantages of Proposed System
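1. Zero-Shot Detection: By using zero-shot object detection, the system can detect
logos even if they were not explicitly present during training. This makes the
detection stage scalable and adaptable to new logos without retraining [4].
2. Lightweight Model: MobileNet, a lightweight deep learning model, allows for fast
and efficient Logo Identification, making the system suitable for quick responses.
Ease of Use: Users can simply upload an image and get brand information instantly,
even if they don’t know the brand name.
Cloud-Based Execution: Google Colab provides access to powerful GPUs and TPUs,
removing the need for high-end local hardware and enabling easy sharing and
collaboration [6].
7. Flexible Integration: The modular design allows easy integration with other
applications, such as mobile apps and marketing tools.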
                                    4 Design
    The design of the proposed logo detection and recognition system is structured
using UML (Unified Modeling Language) to visually represent the architecture and
internal processes. UML helps in effectively planning and organizing the system by
modeling how each module interacts, making it easier to understand, develop, and
maintain.
To represent user interaction and system behavior, a Use Case Diagram is used,
showing how the user uploads an image and receives the detected brand name and
description. An Activity Diagram captures the workflow of the system,
beginning from image upload, through RGB preprocessing and detection, to final Logo
Identification and output. A Class Diagram outlines the major components or classes
of the system, their attributes and methods, and how they relate to each other.
Additionally, a Sequence Diagram demonstrates the order in which tasks are executed,
capturing the interactions between components over time.
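4.1 Use Case Diagram
The Use Case Diagram provides a high-level visual representation of how the
user interacts with the proposed logo detection system. It helps in understanding the
system's functionality from the user's perspective.
In our project, the primary actor is the User, who interacts with the system
by uploading an image and viewing the identified brand name and description.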
4.2 Class Diagram
The Class Diagram is a structural UML diagram that describes the internal design
of the proposed logo detection system by illustrating its main classes, their attributes,
methods, and the relationships between them. It helps developers understand how the
system is organized and how different components interact with each other during
execution.
Relationships between classes (e.g., the UserInterface depends on the ImageProcessor,
which interacts with the LogoDetector) are also captured to show how data and control
flow between components.
4.3 Activity Diagram
The Activity Diagram is a type of UML diagram that represents the workflow of
the system by showing the sequence of activities and the flow of control from one step
to the next. It is especially useful for visualizing the dynamic aspects of the system and
its overall workflow.
This diagram provides a clear overview of how data flows through the system.
4.4 Sequence Diagram
The Sequence Diagram is a type of UML diagram that represents the interaction
between different components (objects or classes) in the system over time. It visually
describes how data and messages are passed between modules, emphasizing the
order in which these interactions occur.
In the proposed logo detection system, the sequence diagram illustrates the
interaction between the system’s core components as they work together to detect and
identify logos.
This diagram helps in understanding how the system components collaborate, the
order in which tasks are executed, and how data flows through the system over time.
It is especially useful for developers during the implementation phase.
Figure 4-4 - Sequence Diagram
4.5 Flow chart
The flow chart represents the step-by-step working of the Logo Detection and
Classification System. It outlines how a user interacts with the system and how the
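system processes the input to generate meaningful output. The steps in the process are
described below:
1. The user uploads an image.
2. RGB layering is applied to the image.
3. The zero-shot detector locates the logo and returns its bounding box.
4. OpenCV crops the logo region from the original image, and the crop is resized to
224×224 pixels and normalized.
5. The MobileNet classifier predicts the logo class.
6. The predicted class is mapped to a brand name and description.
7. The brand name and description are displayed to the user.
This flow ensures a smooth pipeline from input to informative output, making the logo
recognition process simple for the user.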
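The flow above can be summarized as a small orchestration function. This is a hypothetical sketch: each stage is passed in as a function so the control flow is visible without committing to particular model APIs. In the project these stages would be backed by RGB layering, the zero-shot detector, OpenCV, MobileNet, and the JSON brand lookup; the names `recognize_logos` and `Recognition` are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed (x_min, y_min, x_max, y_max) pixel format


@dataclass
class Recognition:
    """One recognized logo: where it is, what it is, and what to show the user."""
    box: Box
    brand: str
    description: str


def recognize_logos(
    image: Any,
    enhance: Callable[[Any], Any],              # RGB layering
    detect: Callable[[Any], Sequence[Box]],     # zero-shot detector
    crop: Callable[[Any, Box], Any],            # OpenCV crop
    preprocess: Callable[[Any], Any],           # resize to 224x224 and normalize
    classify: Callable[[Any], str],             # MobileNet classifier
    describe: Callable[[str], str],             # JSON brand lookup
) -> List[Recognition]:
    """Run the flow-chart steps in order for every logo found in `image`."""
    enhanced = enhance(image)
    results = []
    for box in detect(enhanced):
        # Crop from the original (not enhanced) image, matching the described pipeline.
        logo = preprocess(crop(image, box))
        brand = classify(logo)
        results.append(Recognition(box, brand, describe(brand)))
    return results


# Usage with trivial stand-in stages (all hypothetical) to show the control flow.
results = recognize_logos(
    image=[[0] * 4] * 4,
    enhance=lambda img: img,
    detect=lambda img: [(0, 0, 2, 2)],
    crop=lambda img, b: [row[b[0]:b[2]] for row in img[b[1]:b[3]]],
    preprocess=lambda logo: logo,
    classify=lambda logo: "starbucks",
    describe=lambda brand: f"{brand}: description from the JSON file",
)
print(results[0].brand, results[0].box)
```

Passing the stages in as functions keeps each module independently replaceable, which matches the modular design described earlier.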
                            5 Implementation
    The implementation phase involves bringing together various components, namely
image processing, zero-shot object detection, and MobileNet-based Logo Identification,
into a complete and functional logo detection system. The project was developed and
executed using Google Colab, which provides GPU support for faster processing and
easy collaboration.
5.1 Credibility Assessment
The credibility of this project is assessed based on several key factors including the
reliability of the methodology, the quality of the data sources, the transparency of the
processes used, and the technical soundness of the tools and models applied.
1. Reliable Methodology: Zero-shot object detection allows the system to detect
logos without the need for retraining on every possible class. This method
suits the variety of logos encountered in real-world scenarios.
2. Use of Trusted Tools and Models: The system leverages OpenCV for image
processing and MobileNet for Logo Identification, both of which are well
established and widely used in the AI and computer vision community. These
choices add to the technical reliability of the project.
3. Data Source Quality: The FlickrLogos-32 dataset from the University of Augsburg is
a publicly available and academically recognized dataset. Its usage not only
ensures data credibility but also supports reproducibility, as other researchers
can evaluate their methods on the same data.
4. Responsible Use: The system is intended for commercial use cases that respect
intellectual property and branding guidelines.
5.2 Data Preprocessing
To ensure consistency and improve the performance of the deep learning model, all
logo images used in the project undergo a data preprocessing step. This involves
resizing each image to a fixed size of 224×224 pixels, which is the standard input
size for MobileNet. The preprocessing pipeline consists of the following steps:
1. Directory Traversal: The system recursively traverses the input folder
and all its subdirectories to locate image files.
2. Image Validation: Only files with valid image extensions are considered. Any
other files are skipped.
3. Image Reading and Resizing: Each valid image is read using the OpenCV
library and resized to 224×224 pixels.
4. Preserving Structure: The original folder structure of the dataset is kept
intact.
5. Saving the Processed Images: The resized images are saved in a separate
output directory.
                      Figure 5-1 - Process of Image Pre-processing
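The traversal and validation steps can be sketched with the standard library alone. The extension whitelist below is an assumption (the report does not list the exact formats), and `iter_image_jobs` is a hypothetical helper that only plans the work; it yields source and destination paths, mirroring the folder layout under the output directory.

```python
import os
import tempfile
from typing import Iterator, Tuple

# Assumed extension whitelist; the report does not list the exact formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def iter_image_jobs(input_dir: str, output_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (source_path, destination_path) for every image below input_dir.

    Walks all subdirectories, skips non-image files, and mirrors the relative
    folder layout (e.g. one folder per logo class) under output_dir.
    """
    for root, _dirs, files in os.walk(input_dir):
        relative = os.path.relpath(root, input_dir)
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                destination = os.path.normpath(os.path.join(output_dir, relative, name))
                yield os.path.join(root, name), destination


# Small demo on a temporary folder (hypothetical class folder "adidas").
with tempfile.TemporaryDirectory() as tmp:
    source_root = os.path.join(tmp, "raw")
    os.makedirs(os.path.join(source_root, "adidas"))
    for filename in ("img1.JPG", "notes.txt"):
        open(os.path.join(source_root, "adidas", filename), "w").close()
    jobs = list(iter_image_jobs(source_root, os.path.join(tmp, "resized")))
    expected_destination = os.path.normpath(os.path.join(tmp, "resized", "adidas", "img1.JPG"))

print(len(jobs))  # only the .JPG file is kept
```

For each pair, the reading, resizing, and saving steps would then use OpenCV, for example `cv2.imread(src)`, `cv2.resize(img, (224, 224))`, `os.makedirs(os.path.dirname(dst), exist_ok=True)`, and `cv2.imwrite(dst, resized)`; the exact code used in the project is not shown in the report.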
5.3 Model Training
The logo detection model was trained to accurately identify and localize brand logos
within natural images. The training process involved the following key steps:
1. Preprocessing
All images were resized and normalized to ensure consistency across the
dataset.
2. Model Architecture
3. Training Configuration
The model was trained using a cross-entropy loss for Logo Identification and a
smooth L1 loss for bounding box regression. An Adam optimizer was used with
a learning rate scheduler to adjust learning rates during training. The model was
trained for a fixed number of epochs with early stopping to prevent
overfitting.
4. Evaluation Metrics
The model was evaluated on a held-out test set, using accuracy and a confusion
matrix, to assess generalization.
5. Transfer Learning
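The two loss terms named in the training configuration can be written out directly. This NumPy sketch is for illustration only; a TensorFlow implementation would normally use the built-in losses, for example `tf.keras.losses.SparseCategoricalCrossentropy` and `tf.keras.losses.Huber` (which equals smooth L1 when its `delta` is 1). The example logits, boxes, and the `BOX_LOSS_WEIGHT` balance between the terms are hypothetical.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Softmax cross-entropy for a single example, computed in a numerically stable way."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def smooth_l1(pred: np.ndarray, target: np.ndarray, beta: float = 1.0) -> float:
    """Smooth L1 loss summed over box coordinates: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    per_coord = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(per_coord.sum())


BOX_LOSS_WEIGHT = 1.0  # hypothetical balance between classification and box terms

cls_loss = cross_entropy(np.array([2.0, 0.5, -1.0]), label=0)
box_loss = smooth_l1(np.array([10.0, 20.0, 50.0, 60.0]), np.array([10.5, 20.0, 52.0, 60.0]))
total_loss = cls_loss + BOX_LOSS_WEIGHT * box_loss
print(round(cls_loss, 4), box_loss, round(total_loss, 4))
```

The smooth L1 term grows only linearly for the 2-pixel error, so a single badly placed box edge does not dominate the total loss the way a squared error would.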
5.3.1 Zero-Shot Model
The zero-shot object detection model allows the system to detect and classify objects
based solely on textual prompts, enabling flexible and scalable detection of logos by
simply specifying brand names as labels.
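A prompt-based detector returns a score for every (box, prompt) pair. The sketch below assumes such a score matrix is already available; it builds one prompt per brand and keeps each box's best-matching brand above a confidence threshold. The prompt template, brand list, score values, and threshold are hypothetical.

```python
import numpy as np

BRANDS = ["adidas", "starbucks", "pepsi"]                    # hypothetical label set
PROMPTS = [f"a photo of the {b} logo" for b in BRANDS]      # text queries that would go to the detector


def assign_labels(similarity: np.ndarray, labels, threshold: float = 0.2):
    """Pick the best-matching label for each detected box.

    similarity: array of shape (num_boxes, num_labels) holding box-to-prompt
    scores from the zero-shot model. Boxes whose best score is below
    `threshold` are discarded.
    """
    best = similarity.argmax(axis=1)
    best_scores = similarity.max(axis=1)
    return [
        (box_index, labels[label_index], float(score))
        for box_index, (label_index, score) in enumerate(zip(best, best_scores))
        if score >= threshold
    ]


scores = np.array([[0.05, 0.61, 0.10],   # box 0 matches the "starbucks" prompt
                   [0.08, 0.04, 0.12]])  # box 1 matches nothing strongly
detections = assign_labels(scores, BRANDS)
print(PROMPTS[1], detections)
```

Adding a new brand only requires adding a prompt, which is what makes this stage flexible; the downstream MobileNet classifier, by contrast, still needs training data for each class.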
5.3.2 MobileNet Model
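To build an efficient and lightweight logo classification model, MobileNet was used.
MobileNet is a lightweight convolutional neural network architecture designed for
mobile and embedded vision applications, offering a good trade-off between accuracy
and computational cost.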
5.4 Testing
A separate set of logo images was used for testing. These images varied in resolution,
background complexity, and lighting conditions to reflect realistic use cases. Each
image was preprocessed to match the input format required by the model, that is,
resized to 224×224 pixels and normalized.
During testing, each image was passed through the trained model, which output a
probability distribution across all known logo classes. The class with the highest
probability was selected as the predicted output. The predicted label was then compared
with the ground-truth label to measure accuracy.
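The prediction and comparison steps can be sketched as below. It assumes the model outputs one probability row per test image (as a Keras model with a softmax output would from `model.predict`); the class order, probabilities, and labels are made up for illustration and are not the project's test data.

```python
import numpy as np

CLASS_NAMES = ["adidas", "pepsi", "starbucks"]   # hypothetical class order


def predict_labels(probabilities: np.ndarray):
    """Return the class name with the highest probability for each test image."""
    return [CLASS_NAMES[i] for i in probabilities.argmax(axis=1)]


def accuracy(predicted, actual) -> float:
    """Fraction of test images whose predicted label equals the ground-truth label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1]])
truth = ["adidas", "starbucks", "pepsi"]
preds = predict_labels(probs)
print(preds, accuracy(preds, truth))
```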
     Figure 5-5 - Identifying the Logo and Displaying Name and Description
5.5 Accuracy
The logo recognition system, built using the MobileNetV2 model, demonstrates an
overall classification accuracy of approximately 62% on the test set. The confusion
matrix indicates that the model performs well on several frequently occurring logos but
struggles with less represented or visually similar logos. Despite the challenging nature
of the task, the model is able to correctly classify a majority of the logos. Fine-tuning
using cropped logos and a diverse dataset has contributed to this level of performance.
The system effectively integrates logo detection, cropping, and classification into a
single pipeline.
Figure 5-6 - Accuracy
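The overall accuracy and the per-class behaviour described above can both be read from a confusion matrix. The sketch below builds one from integer class indices; the toy labels are made up for illustration and are not the project's test results.

```python
import numpy as np


def confusion_matrix(actual, predicted, num_classes: int) -> np.ndarray:
    """Count predictions: rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm


actual    = [0, 0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(actual, predicted, num_classes=3)
overall_accuracy = np.trace(cm) / cm.sum()          # correct predictions lie on the diagonal
per_class_recall = np.diag(cm) / cm.sum(axis=1)     # share of each true class recovered
print(cm)
print(overall_accuracy, per_class_recall)
```

Off-diagonal cells show which classes are confused with each other, which is how visually similar or under-represented logos can be identified.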
                 6 Conclusion and Future Work
This project presents a brand logo recognition system that combines zero-shot object
detection, image processing, and deep learning techniques. The integration of RGB
layering and OpenCV enhances the system’s ability to isolate logo regions, while the use
of MobileNet provides efficient Logo Identification of the detected logos.
With a user-friendly interface and a robust backend pipeline, the system offers an
accessible solution for users who wish to identify brands simply by uploading images.
Trained on the FlickrLogos-32 dataset, the classifier reaches an overall test accuracy
of approximately 62%.
6.1 Future Enhancements
1. Support for More Logos: Expanding Logo Identification to cover more logo classes
by incorporating additional datasets.
2. Logo Recognition in Video Streams: Extending the model to work with video
input by extracting frames and continuously detecting logos across time.
5. Robustness in Complex Backgrounds: Improving logo detection in noisy or
cluttered images, for example with the help of segmentation models.
                                7 References
[1] Marisa Bernabeu, Antonio Javier Gallego, and A. Pertusa. 2022. Multi-label logo
[2] Pedro Carvalho, Américo Pereira, and Paula Viana. 2021. Automatic TV logo
identification for advertisement detection without prior data. Appl. Sci. 11, 16 (2021),
7494.
detection from document image using HOG features. Multimedia Tools Appl. 82
(2022), 863–878
[4] Alexey Bochkovskiy, Chien Yao Wang, and H. Liao. 2020. YOLOv4: Optimal
[5] Pedro Carvalho, Américo Pereira, and Paula Viana. 2021. Automatic TV logo
identification for advertisement detection without prior data. Appl. Sci. 11, 16 (2021),
7494.
[6] Hang Chen, Xiao Li, Zefan Wang, and Xiaolin Hu. 2021. Robust logo detection in