Real-Time Customer Behavior and Satisfaction Insight System in Shopping Malls
A PROJECT REPORT
Submitted in partial fulfillment of the requirements for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
CSE (Artificial Intelligence and Machine Learning)
Submitted by
YAMALA SASI REKHA 21A51A4257
GEDELA CHANDINI 21A51A4239
BODIGI SAI KARTHIK 21A51A4243
ANDHAVARAPU CHARAN 21A51A4223
“We hereby declare that the project entitled “Real-Time Customer Behavior and
Satisfaction Insight System in Shopping Malls”, submitted for the award of the
degree of Bachelor of Technology in CSE (Artificial Intelligence and Machine
Learning), is our own work and that, to the best of our knowledge and belief, it
contains no material previously published or written by another person, nor
material which has been accepted for the award of any other degree,
associateship, fellowship, or any other similar title.”
PLACE: Tekkali
DATE:
YAMALA SASI REKHA 21A51A4257
GEDELA CHANDINI 21A51A4239
BODIGI SAI KARTHIK 21A51A4243
ANDHAVARAPU CHARAN 21A51A4223
ADITYA INSTITUTE OF TECHNOLOGY AND MANAGEMENT
(An Autonomous Institution)
CERTIFICATE
This is to certify that the project report entitled “Real-Time Customer Behavior and
Satisfaction Insight System in Shopping Malls”, being submitted by YAMALA SASI
REKHA (21A51A4257), GEDELA CHANDINI (21A51A4239), BODIGI SAI KARTHIK
(21A51A4243), and ANDHAVARAPU CHARAN (21A51A4223) in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in CSE (Artificial
Intelligence and Machine Learning) during the year 2024-2025 to JNTUGV, Vizianagaram,
is a record of bonafide work carried out by them under my guidance and supervision.
ACKNOWLEDGEMENT
We wish to thank Mr. K. Srinivasa Rao for his kind support; his valuable suggestions and
encouragement helped us greatly in carrying out this project work and in bringing this
project to its present form.
We take this opportunity to express our sincere gratitude to our Director, Prof. V. V.
Nageswara Rao, for his encouragement in all respects.
We take the privilege to thank our Principal, Dr. A. S. Srinivasa Rao, for his encouragement
and support.
We are also very thankful to Dr. M. V. B. Chandrasekhar, Head of the Department of CSE
(Artificial Intelligence and Machine Learning), for his help and valuable support in
completing the project.
We are also thankful to all staff members of the Department of CSE (Artificial Intelligence
and Machine Learning) for their feedback in the reviews and their kind help throughout the
project.
Last but not least, we thank all our classmates for their encouragement and help in making
this project a success.
It is their help and support that enabled us to complete the design and this technical report.
Program Outcomes (PO)
9. INDIVIDUAL AND TEAM WORK: Function effectively as an individual,
and as a member or leader in diverse teams, and in multidisciplinary settings.
10. COMMUNICATION: Communicate effectively on complex engineering
activities with the engineering community and with society at large, such as,
being able to comprehend and write effective reports and design documentation,
make effective presentations, and give and receive clear instructions.
11. PROJECT MANAGEMENT AND FINANCE: Demonstrate knowledge and
understanding of the engineering and management principles and apply these to
one’s own work, as a member and leader in a team, to manage projects and in
multidisciplinary environments.
12. LIFE-LONG LEARNING: Recognize the need for, and have the preparation and
ability to engage in independent and life-long learning in the broadest context of
technological change.
PSO1: Apply the fundamental knowledge for problem analysis and conduct
investigations in CSE (AIML) for sustainable development.
PSO2: Design and development of solutions by using modern software for the
purpose of execution of the projects in specialized areas.
PSO3: Inculcate effective communication and ethics for lifelong learning with
social awareness.
PO-PSO Mapping
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
3 3 3 2 3 3 2 3 3 3 2 2 3 2 2
Project Details
Batch No : 12
Project Title : Real-Time Customer Behavior and Satisfaction Insight
System in Shopping Malls
Methodology : Deep Learning
Application : Personalized Marketing
Description of the Project : The “Real-Time Customer Behaviour and Satisfaction Insight
System in Shopping Malls” project leverages AI and data
analytics to track shopper movements, purchase patterns, and
engagement, enabling retailers to gain a deeper understanding
of customer behavior. By analyzing foot traffic, dwell times,
and product interactions, the system helps optimize store
layouts for better navigation and improved sales. The project
enhances personalized marketing by identifying customer
preferences and delivering targeted promotions, leading to
increased engagement and conversion rates. Additionally, it
supports operational efficiency by improving inventory
management, reducing checkout bottlenecks, and streamlining
staff allocation.
Batch Details :
GEDELA CHANDINI 21A51A4239
YAMALA SASI REKHA 21A51A4257
BODIGI SAI KARTHIK 21A51A4243
ANDHAVARAPU CHARAN 21A51A4223
Objectives : The project “Real-Time Customer Behaviour and
Satisfaction Insight System in Shopping Malls” aims to
analyze shopper movements and purchase patterns using AI
and data analytics to optimize store layouts, enhance customer
engagement, and improve inventory and queue management.
The project aims to provide predictive insights for boosting
sales, security, and the overall shopping experience.
Additionally, it leverages real-time analytics and machine
learning to enable data-driven decision-making for efficient
store operations and targeted marketing strategies.
ABSTRACT
In this study, we analyzed existing customer behavior analysis systems in shopping malls, such as
Pose Network and MoveNet, which primarily rely on movement tracking and pose estimation.
However, these methods fall short in capturing deeper insights into customer interests, such as
gaze direction, dwell time, and product interactions. To overcome these limitations, we propose an
advanced AI-driven solution integrating YOLO for pose estimation with deep learning
algorithms to classify customer interests more accurately. Our system leverages computer vision
and machine learning to generate real-time heatmaps, sentiment analysis, and predictive
insights, assisting mall operators in optimizing store layouts, enhancing personalized marketing,
and improving customer satisfaction. Additionally, our approach enables real-time crowd density
monitoring, improving queue management and staff allocation for a seamless shopping
experience.
The system differentiates between casual browsers and serious buyers, allowing businesses to
tailor marketing strategies and optimize sales efforts. Furthermore, inventory management
benefits from demand trend analysis based on customer movement patterns near product displays.
By incorporating predictive analytics, mall operators can forecast shopping trends, adjust
promotional campaigns, and refine engagement strategies. To ensure continuous improvement,
customers can provide direct feedback via a QR code, linking to a form that gathers valuable
insights for further system refinement and data-driven decision-making.
Keywords:
YOLO, CNN, DeepSort, Object Detection, Image Recognition, Feature Extraction, Customer
Behavior Analysis, Deep Learning, Computer Vision, Retail Analytics, Predictive Analytics,
Sentiment Analysis, Machine Learning.
TABLE OF CONTENTS
Contents
Page. No
Candidate’s Declaration ii
Supervisor’s Certificate iii
Acknowledgements iv
PO – PSO mapping v-vi
Project Details vii-viii
Abstract ix
Table of Contents x-xii
LIST OF TABLES xiii
LIST OF FIGURES xiv
LIST OF ABBREVIATIONS xv
3.2 Challenges Faced by Shopping Malls in Customer Behavior Analysis
Chapter 6 Methodology 33-36
6.1 Model Selection and Loading
6.2 Real-Time Object Detection
6.2.1 Video Frame Processing
6.2.2 Object Detection and positioning
6.3 Real-Time Customer Identification
6.4 Estimating Age and Gender of Customers
6.5 Movement Tracking with Deep SORT
6.6 Analyzing Customer Trends
6.6.1 Time Spent in AOI
Chapter 7 Results And Discussions 37-46
7.1 Object Detection Performance
7.2 Detecting and Analyzing Customers
7.3 Analysis of Product Categories
7.4 Collecting Customer Feedback via QR Codes
7.5 User Interface and Usability
7.6 System Limitations and Improvements
7.7 Conclusion
7.8 Appendix (Source Code)
References 49
Publications 50
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
EEG Electroencephalogram
AR Augmented Reality
MVC Model-View-Controller
CHAPTER 1
INTRODUCTION
Understanding customer behavior in shopping malls is crucial for enhancing retail experiences and
optimizing business strategies. Traditional methods of customer analysis, such as surveys and
manual observation, are time-consuming and often lack accuracy. In recent years, computer vision-
based approaches, including pose estimation models like Pose Network and MoveNet, have been
explored for tracking customer movements and engagement. However, these systems have
limitations in accurately detecting and interpreting nuanced customer behaviors, such as intent and
interest in specific products or areas.
To overcome these limitations, we propose an advanced approach that integrates YOLO (You Only
Look Once), a cutting-edge object detection framework, with deep learning techniques for precise
pose estimation and behavior classification. Unlike conventional pose estimation models, our
system not only detects customer movements but also classifies their interests by analyzing
movement patterns, gaze direction, and dwell time.
By leveraging machine learning and real-time analytics, our solution provides actionable insights
into customer preferences, enabling mall operators to optimize store layouts, improve product
placements, and personalize marketing strategies. The proposed system enhances decision-making
by offering a more accurate and adaptive method for customer behavior analysis in retail
environments.
Customer behavior analysis plays a crucial role in modern retail environments, helping businesses
optimize store layouts, improve marketing strategies, and enhance customer experiences.
Traditional methods such as surveys and manual observations are often inefficient and fail to
provide real-time insights. Recent advancements in computer vision and machine learning have
enabled automated solutions for tracking and analyzing customer behavior in shopping malls.
Pose estimation struggles with detecting intent and interaction. Our approach integrates YOLO and deep
learning for better accuracy.
1.2 Problem Statement
Despite the growing adoption of AI-driven analytics in retail, existing pose estimation models have
the following challenges:
Limited Accuracy: Current systems struggle to distinguish between browsing, product selection,
and purchasing intent.
Lack of Context Awareness: Existing approaches do not consider behavioral cues such as gaze
direction, hesitation, or dwell time near specific products.
These limitations reduce the effectiveness of customer behavior analysis in shopping malls,
leading to missed opportunities for personalized marketing, customer engagement, and optimized
store layouts.
The "Real-Time Customer Behaviour and Satisfaction Insight System in Shopping Malls" is
designed to provide mall operators with a sophisticated tool for monitoring and analyzing customer
behavior and satisfaction in real time, leveraging advanced artificial intelligence (AI) and data
collection technologies. The project integrates the YOLO (You Only Look Once) object detection
framework, Deep SORT (Simple Online and Realtime Tracking) algorithm, Natural Language
Processing (NLP), and QR code-based feedback mechanisms to deliver actionable insights. This
system bridges the gap between raw observational data and strategic decision-making to enhance
customer experiences and optimize operations in shopping malls. It targets multiple dimensions,
including technical capabilities, user needs, and operational contexts. The project ensures a
scalable and adaptable solution for modern retail environments.
The technical foundation of the system is built on a combination of cutting-edge computer vision,
multi-object tracking, and text analysis technologies, designed to operate seamlessly in real-time
retail settings. The key technical components include:
The system employs YOLO, a state-of-the-art deep learning model, to detect customers
within live video feeds captured by strategically positioned CCTV cameras across the mall.
YOLO’s single-pass processing ensures high-speed detection, identifying individuals with
bounding boxes and confidence scores even in busy environments. This is complemented
by Deep SORT, which enhances tracking accuracy by associating detections across frames,
using Kalman filters for motion prediction and appearance-based re-identification to handle
occlusions and identity switches. For example, the system can track a customer moving
from a clothing store to a food court, maintaining continuity despite temporary obstructions
like other shoppers.
Behavioral Analysis:
Beyond mere detection, the system analyzes customer movement patterns, such as walking
paths, dwell times, and interactions with products or digital touchpoints (e.g., kiosks or
mobile apps). It calculates metrics like the average time spent in specific zones (e.g., 15
minutes in the saree section) or the frequency of interactions with promotional displays,
providing quantitative insights into engagement levels. Heatmaps and flow analysis plots
are generated to visualize high-traffic areas and popular routes, aiding in store layout
optimization.
QR codes placed at key locations (e.g., exits, service desks) enable customers to submit
anonymous feedback via mobile devices, capturing preferences, complaints, and
satisfaction levels in real time. The collected textual data is processed using NLP
techniques, including sentiment analysis (e.g., classifying “great deals” as positive) and
topic modeling (e.g., grouping feedback into themes like “pricing” or “service”). Combining
video tracking and text insights provides a holistic view of customer experiences.
The system is designed to handle varying data volumes, from a single-store setup with one
camera to a multi-level mall with dozens of feeds. It uses modular architecture, allowing
components (e.g., YOLO detection, NLP processing) to be updated independently as
technology evolves, ensuring long-term relevance.
The project targets a diverse range of stakeholders within the retail ecosystem, each benefiting
from its insights in unique ways:
The primary users, who leverage the system to monitor customer behavior, optimize store
layouts, and enhance operational efficiency. For instance, managers can use dwell time data
to reposition underperforming stores or adjust staffing during peak hours based on crowd
density insights.
Individual shop owners within the mall can access tailored reports (e.g., time spent in their
store, product category preferences) to refine inventory, promotions, and customer service.
For example, a clothing store might stock more sarees if feedback and dwell times indicate
high interest.
Marketing Teams:
Marketing professionals use demographic data (e.g., 59.7% male, 40.3% female customers)
and sentiment analysis to design targeted campaigns, such as promotions for adolescents
(48.4% of visitors) or addressing negative feedback about pricing.
Customers:
Indirect beneficiaries, as improved layouts, faster service, and personalized experiences
enhance their shopping satisfaction. The anonymous QR code feedback mechanism
empowers them to voice opinions without pressure, fostering a participatory role.
Academics and developers can use the system as a case study for advancing AI-driven retail
analytics, potentially integrating it with emerging tools like augmented reality (AR) or
blockchain.
The system is engineered to function effectively across various operational contexts within
shopping malls, ensuring versatility and practical applicability:
The primary focus is on indoor malls, where CCTV cameras capture customer movements
across stores, corridors, and common areas like food courts. It excels in tracking
interactions with physical products (e.g., lifting a shirt) and digital touchpoints (e.g., app
usage at a kiosk), providing insights into both shopping and leisure behaviors.
Designed to handle high-traffic scenarios, such as weekend sales or holiday seasons, the
system uses Deep SORT to maintain tracking accuracy in dense crowds. It identifies busy
zones (e.g., near escalators) and suggests crowd management strategies, like temporary
signage or staff deployment.
While optimized for well-lit indoor environments, the system includes preprocessing
techniques (e.g., noise reduction, contrast adjustment) to adapt to moderate lighting
variations, such as dimmer evening conditions or shadowed areas. Future iterations could
incorporate infrared cameras for low-light performance.
Real-Time Decision Support:
The system’s modular design and forward-looking approach allow for significant enhancements
and broader applications, ensuring its evolution alongside retail trends:
Future versions could connect with customer wearables (e.g., smartwatches) or mobile apps
to track behavior beyond CCTV range, offering personalized notifications like “20% off at
the store you lingered at.” This would enhance engagement while maintaining privacy
through opt-in mechanisms.
QR codes can evolve into AR triggers, enabling interactive experiences like virtual try-ons
and store maps. They also provide behavioral data, such as time spent on AR content. This
transformation enhances customer engagement and system interactivity.
AI-Driven Personalization:
Advanced machine learning models could analyze historical data to predict preferences,
recommending promotions tailored to individual or demographic trends (e.g., targeting
adolescents with gaming deals). This shifts the system from reactive to proactive insights.
Blockchain for Data Security:
Incorporating blockchain could secure feedback and tracking data, ensuring transparency
and trust. Customers could verify their anonymized contributions, while operators benefit
from tamper-proof records for audits or compliance.
The system could adapt to outdoor markets, airports, or supermarkets, adjusting detection
for open spaces or perishable goods. For example, in an airport, it might track passenger
flow to optimize gate assignments, demonstrating cross-industry potential.
Adding multi-language NLP support (e.g., Hindi, Telugu) and cultural context analysis
could make the system globally deployable, catering to diverse mall demographics in
regions like India or international hubs.
Future iterations might integrate additional sensors (e.g., audio for crowd noise levels,
thermal for occupancy) to improve performance in challenging conditions, such as noisy
food courts or extreme weather affecting outdoor sections.
The scope of this project is deliberately broad yet focused, balancing immediate applicability with
long-term potential. It targets the core needs of shopping mall management—real-time behavior
tracking, engagement analysis, and satisfaction insights—while laying a foundation for innovative
expansions that could redefine retail analytics, balancing technical precision with user focus.
CHAPTER 2
LITERATURE SURVEY
This survey examines key research on real-time customer behavior analysis in retail, highlighting
methodologies, findings, and gaps. It provides context for the proposed system, integrating YOLO,
Deep SORT, NLP, and QR-based feedback to enhance object detection, tracking, and customer
insights. By addressing limitations in existing studies, the system aims to deliver comprehensive,
real-time behavioral analysis. Additionally, it emphasizes the role of AI-driven analytics in
improving customer engagement. The study underscores the need for seamless integration of
technology to refine shopping experiences.
DOI: https://doi.org/10.1016/j.patrec.2016.04.011
Author: Djamal Merad, Kheir-Eddine Aziz, Rabah Iguernaissi, Bernard Fertil, Pierre Drap
What they have done? Enhanced multiple-object tracking for behavioral marketing by reducing identity switches using a re-identification strategy.
How they have done? Used a re-identification strategy with pose classification, integrated with particle filter-based tracking in a mono-camera setup.
Which method/approach they followed? Applied a particle filter-based tracking approach combined with re-identification to segment individuals and classify poses.
What they found? Integrating re-identification with particle filter tracking reduces identity switches and improves trajectory recovery in crowded spaces.
What they concluded? Re-identification combined with particle filter tracking enhances multiple-object tracking, improving customer behavior analysis in dense environments.
Table 2.1 Multi-Person Tracking Under Occlusions for Customer Behavior Analysis
2. Title: Performing Customer Behavior Analysis using Big Data Analytics
DOI: https://doi.org/10.1016/j.procs.2016.03.125
Author: Anindita A. Khade
What they have done? Used OpenPose for customer tracking in malls, integrated with MapReduce for decision tree analysis, and visualized insights using D3.js.
How they have done? Applied OpenPose for movement tracking, used MapReduce for analysis, and generated interactive visualizations with D3.js.
Which method/approach they followed? Employed OpenPose for pose estimation, deep learning models for behavior classification, and D3.js for data visualization.
What they found? Combining OpenPose with deep learning improves customer tracking, enabling precise behavior classification and insightful visualizations.
What they concluded? Integrating pose estimation, deep learning, and visualization enhances customer behavior analysis, offering valuable business insights.
DOI: https://doi.org/10.1016/j.jbusres.2022.02.074
Author: Scott D. Murray, Hyun Seung Jin, Brett A.S. Martin
What they have done? Explored how shopping orientation influences variety-seeking behavior in consumer choices.
How they have done? Conducted three studies examining the link between shopping enjoyment and variety-seeking in decision-making.
DOI: https://doi.org/10.1016/j.heliyon.2024.e36027
Author: Bilal Khalid
What they have done? Studied the impact of omnichannel experiences on customer satisfaction in Thai fashion retail using the UTAUT model and SEM.
How they have done? Surveyed 509 omnichannel shoppers and analyzed the data using SEM with Amos software.
Which method/approach they followed? Used a quantitative survey design with simple random sampling and SEM analysis.
What they found? Ease of use, enjoyment, promotions, service, and transactions enhance omnichannel satisfaction.
What they concluded? Better coordination across service channels improves customer satisfaction in fashion retail.
6. Title: Customer perception, integration behavior, and loyalty of internet of things enterprises
DOI: https://doi.org/10.1016/j.techsoc.2024.102600
Author: Gaofei Ren, Yaoyao Chen, Maobao Yang
What they have done? Identified key factors influencing customer loyalty in IoT, including price perception, service perception, and integration behavior.
How they have done? Collected data from 211 IoT users via an anonymous survey and analyzed it using structural equation modeling (SEM).
Which method/approach they followed? Used a quantitative research approach with SEM to examine relationships between key factors.
What they found? Price perception, service perception, and integration behavior positively impact customer satisfaction and loyalty.
What they concluded? Improving these factors enhances customer satisfaction and loyalty, providing strategic insights for IoT companies.
Table 2.6 Customer Perception, Integration Behavior, and Loyalty in IoT Enterprises
DOI: https://doi.org/10.1016/j.enbuild.2021.111691
Author: M.S. Mayhoub, Emad H. Rabboh
What they have done? Explored the emotional and functional impacts of daylighting in shopping malls from customers' perspectives.
How they have done? Conducted a field survey with 552 customers of Carian shopping malls to analyze preferences and perceptions.
Which method/approach they followed? Used quantitative analysis to assess illumination, sunlight presence, and connection to outdoor views.
What they found? Daylighting enhances mood more than energy savings, with illumination quality being the most valued aspect.
What they concluded? Designers should prioritize daylighting’s emotional impact, focusing on quality over source for better customer experiences.
DOI: https://doi.org/10.1016/j.procs.2024.10.024
Author: Yujie Wu
What they have done? Designed an online shopping mall to enhance user experience and functionality.
How they have done? Used Spring Boot, Vue, and a multi-layer Java architecture with collaborative filtering and sensitive word filtering.
Which method/approach they followed? Applied collaborative filtering for recommendations and a dynamic sensitive word filter for comment moderation.
What they found? The system improved functionality, personalization, and user trust.
What they concluded? It serves as a reference model for future e-commerce platform development.
Table 2.8 Design and Implementation of an Online Shopping Mall Using Collaborative Filtering
DOI: https://doi.org/10.1016/j.jretai.2022.11.002
Author: Khadija Ali Vakeel, Morana Fudurić, Vijay Viswanathan, Mototaka Sakashita
What they have done? Investigated the impact of real-time mobile promotions (RTMs) on shopping momentum and spending in retail malls.
How they have done? Used a quasi-experimental design targeting loyalty program members with RTMs and analyzed buyer responses.
Which method/approach they followed? Categorized buyers by spending levels to assess RTMs' influence on shopping momentum and spending.
What they found? RTMs increased spending for moderate and heavy buyers but had no effect or reduced spending for light buyers.
What they concluded? Retailers should focus RTMs on moderate and heavy buyers for maximum effectiveness in driving sales.
10. Title: Lost in a mall: the effects of gender, familiarity with the shopping mall and the shopping values on shoppers' wayfinding processes
DOI: https://doi.org/10.1016/j.jbusres.2004.02.006
Author: Jean-Charles Chebat, Claire Gélinas-Chebat, Karina Therrien
What they have done? Explored how shopper characteristics—gender, mall familiarity, and shopping values—affect wayfinding processes and information sources used in a shopping mall.
How they have done? Collected data from 156 real shoppers who recorded their thoughts and actions during wayfinding, which were then content-analyzed.
Which method/approach they followed? Analyzed variations in wayfinding behavior based on shopper characteristics and examined mediating effects of hedonistic shopping values.
What they found? Wayfinding processes and information source preferences differed significantly by gender, mall familiarity, and shopping values, with notable mediation by hedonistic values.
What they concluded? Understanding these factors can help mall managers improve wayfinding systems and enhance shopper experiences, aligning with their diverse needs and values.
Table 2.10 Lost in a Mall: Effects of Gender, Mall Familiarity, and Shopping Values on Wayfinding
CHAPTER 3
PROBLEM STATEMENT
3.1 Introduction
Shopping malls serve as bustling hubs of commerce and leisure, where understanding customer
behavior and satisfaction is critical to maintaining competitiveness, optimizing operations, and
enhancing the overall shopping experience. However, the dynamic and multifaceted nature of
customer interactions within these environments poses significant challenges to traditional retail
analytics methods. Operators need real-time insights into how customers move, what engages
them, and how they feel about their experiences to make informed decisions about store layouts,
staffing, marketing strategies, and service improvements. Existing systems, such as pose-based
detection frameworks (e.g., Pose Network, Move-Net) and static feedback mechanisms (e.g., paper
surveys), fall short in providing a comprehensive, immediate, and actionable understanding of
customer dynamics. This chapter delves into the specific challenges faced by shopping malls in
this context, highlighting the limitations of current approaches and establishing the urgent need for
an advanced, integrated solution like the proposed "Real-Time Customer Behaviour and
Satisfaction Insight System in Shopping Malls," which leverages YOLO (You Only Look Once),
Deep SORT (Simple Online and Real-time Tracking), Natural Language Processing (NLP), and
QR code-based feedback collection.
3.2 Challenges Faced by Shopping Malls in Customer Behavior Analysis
The inability to accurately and promptly analyze customer behavior and satisfaction in shopping malls
stems from several interconnected challenges. These issues not only hinder operational efficiency but
also impact customer retention and revenue potential.
Limited real-time insights delay strategic decisions, affecting personalized marketing efforts.
Fragmented and unstructured data sources make it difficult to create a unified customer profile.
3.2.1 Limited Behavioral Insights from Pose-Based Systems
Traditional systems like Pose Network and Move-Net rely heavily on pose detection to infer
customer behavior, focusing on static postures such as standing, reaching, or bending. While these
methods can identify basic actions, they often misinterpret customer intent and fail to capture the
broader spectrum of behaviors that influence shopping decisions. For example, a customer standing
still near a clothing display might be waiting for a friend rather than showing interest, yet pose-
based systems might classify this as engagement. Similarly, these systems overlook critical
patterns such as movement paths, dwell times, or interactions with multiple products, which are
essential for understanding preferences and engagement levels. This narrow focus limits mall
operators’ ability to discern whether a customer is casually browsing, actively shopping, or simply
passing through, resulting in incomplete or misleading insights that undermine effective decision-
making.
Shopping malls are inherently crowded and dynamic, especially during peak hours, weekends, or
sales events, where customer density can obscure visibility and complicate tracking. Existing
multi-object tracking systems often struggle with occlusions—when one customer blocks another
from the camera’s view—or identity switches, where a customer’s tracking ID is erroneously
reassigned to someone else as they cross paths. For instance, in a busy food court, a system might
lose track of a customer moving behind a group, skewing data on popular areas or dwell times.
This inaccuracy hampers the ability to map customer flow accurately, identify high-traffic zones,
or assess congestion-related dissatisfaction, leaving operators with unreliable data for layout
planning or crowd management. The lack of robust tracking in such scenarios is a significant
barrier to real-time behavioral analysis.
Customer feedback is a cornerstone of satisfaction analysis, yet traditional methods like paper surveys,
suggestion boxes, or post-visit online questionnaires are slow, cumbersome, and often non-
anonymous, deterring participation. For example, a customer frustrated by a long checkout line might
leave without completing a survey, or one satisfied with a sale might forget to provide feedback later.
These delays mean operators miss the chance to
address issues in real time (e.g., adding staff to a busy counter) or capitalize on positive experiences
with instant follow-ups, like targeted promotions. Moreover, the lack of anonymity in some systems
(e.g., requiring email addresses) can discourage honest responses, especially about negative
experiences, leading to biased or incomplete data. This gap in timely, candid feedback limits the
ability to gauge true satisfaction levels and respond proactively.
Understanding how customers interact with mall services—such as digital kiosks, mobile apps,
promotional displays, or in-store products—is crucial for assessing engagement and tailoring
experiences. However, current systems lack standardized metrics to measure these interactions
effectively. For instance, there’s no automated way to determine how long a customer lingers at a
kiosk, how often they use an app for navigation, or whether they pick up and then return an item.
Manual observation is impractical in large malls, and pose-based systems don’t track these subtle
actions across time and space. Without quantifiable engagement data, operators cannot evaluate
the effectiveness of digital touchpoints, optimize product placement, or identify underperforming
services, resulting in missed opportunities to enhance customer experiences and drive sales.
Mall operators often rely on fragmented data sources—CCTV footage reviewed manually, sporadic
surveys, or sales reports—requiring significant human effort to synthesize into actionable insights. This
manual process is time-consuming, prone to error, and incapable of delivering real- time results. For
example, analyzing hours of video to identify busy zones might take days, by which time customer
patterns have shifted. Similarly, correlating survey feedback with observed behavior is challenging
without integrated tools, leaving operators with disconnected datasets that fail to provide a unified
view. This dependence on labor-intensive, disjointed methods restricts the ability to respond swiftly to
trends, address dissatisfaction, or capitalize on emerging opportunities, ultimately impacting
operational agility and customer satisfaction.
3.3 Need for an AI-Powered Real-Time Solution
The challenges outlined above underscore the urgent need for a technologically advanced, real-
time system that overcomes the limitations of existing approaches and delivers comprehensive,
immediate, and actionable insights. Traditional tools—whether pose-based detection, basic
tracking, or static feedback mechanisms—are insufficient for the complex, fast-paced environment
of shopping malls, where customer behaviors and preferences evolve rapidly. The ideal solution
must address the following requirements:
Scalability and Offline Capability:
Operate efficiently across malls of varying sizes, leveraging existing CCTV infrastructure and
functioning offline to ensure reliability in environments with limited internet connectivity, while
remaining adaptable to future technological integrations.
The system integrates YOLO for detection, Deep SORT for tracking, and NLP for real-time
sentiment analysis of QR feedback. It bridges the gap between fragmented data and immediate
insights, overcoming pose-based misinterpretations, tracking issues in crowds, and delayed
feedback. For example, it cross-references dwell time with feedback to clarify intent.
CHAPTER 4
DATA COLLECTION AND PROCUREMENT
The success of the "Real-Time Customer Behaviour and Satisfaction Insight System in Shopping
Malls" hinges on the availability of high-quality, diverse, and well-structured data to train, validate,
and operate its AI-driven components, including YOLO (You Only Look Once) for object detection,
Deep SORT (Simple Online and Realtime Tracking) for movement analysis, and Natural Language
Processing (NLP) for sentiment extraction. Data serves as the foundation for detecting customers,
tracking their behaviors, quantifying engagement, and analyzing satisfaction in real time, enabling
mall operators to derive actionable insights. This chapter outlines the systematic process of data
collection and procurement, detailing the types of data required, their sources, preprocessing
techniques, ethical considerations, and challenges encountered. By ensuring a robust data pipeline,
the system can accurately interpret customer dynamics in complex mall environments, delivering
reliable and scalable analytics to enhance operational efficiency and customer experiences.
To achieve its objectives, the system relies on multiple data categories, each tailored to specific
analytical needs. These types are carefully selected to capture both the physical and perceptual
aspects of customer behavior within shopping malls.
High-resolution video footage is essential for detecting customers and analyzing their movements.
This includes real-time feeds from CCTV cameras capturing actions such as walking, standing,
browsing, or interacting with products. For example, a 1080p video at 30 frames per second (FPS)
provides sufficient detail to identify individuals and track their paths through crowded aisles or
open spaces like food courts. Annotated datasets with bounding boxes around customers and labels
(e.g., “person,” “group”) are required to train the YOLO model, ensuring accurate detection across
diverse mall settings—indoor stores, corridors, and escalator zones.
Textual Data for Feedback and Sentiment Analysis:
Textual inputs from customer feedback are critical for understanding satisfaction and preferences.
This includes short responses from QR code surveys (e.g., “great variety”), longer comments (e.g.,
“queues too long, but good deals”), and potentially social media posts or online reviews
mentioning the mall. These texts vary in length and tone, requiring NLP to process positive,
negative, or neutral sentiments and extract themes like “service quality” or “pricing.” The data
must be timestamped to align with video observations, enabling correlation between behavior (e.g.,
lingering in a store) and feedback (e.g., “loved the shirts”).
Quantitative metrics derived from video and interaction logs measure customer engagement with
mall services. This includes dwell times (e.g., seconds spent near a display), interaction frequencies
(e.g., number of kiosk uses), and movement patterns (e.g., total distance traveled). These metrics
provide concrete data to assess interest—for instance, a 10-minute dwell time in the saree section
might indicate strong engagement—supporting decisions on product placement or promotional
efforts.
The system draws from a mix of primary and secondary sources to build a comprehensive dataset,
balancing real-world applicability with scalability and cost-efficiency. Primary sources include CCTV
footage, in-mall sensors, and customer feedback forms, which provide direct insights into customer
movement, engagement, and satisfaction. Secondary sources such as social media activity, e-
commerce trends, and industry reports offer broader contextual understanding and trend analysis.
Together, these sources enable a holistic view of customer behavior, supporting data-driven decision-
making for improved mall operations and customer experience.
4.3.1 Primary Data Collection
Real-time video is captured from existing CCTV cameras strategically placed throughout the
mall—entrances, store fronts, food courts, and escalators. For instance, a mall with 50 cameras
might provide 24/7 coverage of key areas, generating terabytes of footage weekly. This data is
collected in collaboration with mall management, ensuring alignment with operational needs (e.g.,
monitoring peak hours from 12 PM to 6 PM). Custom recordings in varied conditions—bright
daylight, dim evening lighting, or crowded sales events—enhance robustness.
QR Code Surveys:
Customers provide feedback by scanning QR codes displayed at high-traffic points like exits,
restrooms, or service desks. These codes link to mobile-friendly forms collecting ratings (e.g., 1-
5 stars) and free-text comments, designed for quick input (under 30 seconds) to maximize
participation. During a pilot, 500 daily responses might be gathered in a medium-sized mall,
offering a rich dataset of immediate reactions—e.g., “fast service” after a purchase or “no seating”
near the food court.
Open-Source Datasets:
Public datasets like COCO (Common Objects in Context) and Open Images provide pre-annotated
images of people in various settings, ideal for initial YOLO training. COCO, with over 330,000
images, includes diverse human poses (e.g., walking, standing), while Open Images offers 9
million annotated instances, ensuring the model generalizes across demographics and
environments. These datasets supplement custom mall footage, reducing the annotation burden.
Retail-Specific Benchmarks:
Datasets like the KITTI Vision Benchmark Suite, though designed for autonomous driving, include
pedestrian tracking examples adaptable to mall corridors.
Hypothetical retail datasets (e.g., “MallCrowd 2023”) could provide annotated videos of shopping
behaviors, filling gaps in public resources tailored to indoor retail.
To enhance model robustness and prevent overfitting, augmentation techniques are applied:
Video Augmentation: Rotations, flips, and brightness adjustments simulate camera angles and lighting
changes (e.g., evening shadows). Noise addition mimics real-world imperfections like lens glare. Frame
skipping and motion blur replicate rapid movements, while color shifts simulate varying environmental
conditions.
Synthetic Data: Tools like Unity or Blender generate simulated mall scenes with virtual customers,
adding controlled variations (e.g., occlusion by groups) to training data.
Raw data must be refined and structured to support AI model training and real-time analysis,
involving several key steps:
Video Annotation:
Tools like LabelImg or CVAT are used to manually draw bounding boxes around customers in
sample footage, labeling them as “person” with attributes like position (e.g., “left aisle”). For a 10-
minute clip, 300 frames might be annotated, creating a dataset of 1,000+ labeled instances.
Automated pre-labeling with pre-trained YOLO speeds this process, followed by human
verification for accuracy.
Text Preprocessing:
Feedback text is cleaned by removing typos, emojis, or irrelevant punctuation (e.g., “great!!!” to
“great”), tokenized into words (e.g., “slow service” → [“slow”, “service”]), and labeled for
sentiment (positive, negative, neutral) using tools like VADER. Stop words (e.g., “the”) are filtered
to focus on meaningful terms.
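The cleaning, tokenization, stop-word filtering, and VADER labeling described above can be outlined as follows; this is an illustrative sketch using NLTK, and the helper name preprocess_feedback and the sentiment thresholds are assumptions rather than the system's exact code.

import re
import nltk
from nltk.corpus import stopwords
from nltk.sentiment import SentimentIntensityAnalyzer

# One-time downloads (assumption: run once before preprocessing).
nltk.download("stopwords", quiet=True)
nltk.download("vader_lexicon", quiet=True)

_stop_words = set(stopwords.words("english"))
_analyzer = SentimentIntensityAnalyzer()

def preprocess_feedback(text: str) -> dict:
    """Clean, tokenize, and sentiment-label one feedback comment (hypothetical helper)."""
    # Strip punctuation/emojis so "great!!!" becomes "great", then lowercase and tokenize.
    cleaned = re.sub(r"[^a-zA-Z\s]", " ", text.lower())
    tokens = [t for t in cleaned.split() if t not in _stop_words]

    # VADER compound score lies in [-1, 1]; the 0.05 thresholds follow common practice.
    score = _analyzer.polarity_scores(text)["compound"]
    label = "positive" if score >= 0.05 else "negative" if score <= -0.05 else "neutral"

    return {"tokens": tokens, "sentiment": label, "score": score}

# Example: preprocess_feedback("Queues too long, but great deals!!!")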
Data Normalization:
Video frames are resized to 416x416 pixels (YOLO’s standard input) and normalized to a 0-1
range for faster processing. Engagement metrics are standardized (e.g., dwell times in seconds) for
consistent analysis across datasets.
Dataset Splitting:
The data is divided into training (70%), validation (15%), and test (15%) sets. For example, 7,000
video frames and 3,500 feedback entries might train the models, with 1,500 each for validation
and testing, ensuring balanced evaluation.
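A simple way to realize the 70/15/15 split is shown below, assuming scikit-learn is available; the helper name split_dataset is hypothetical.

from sklearn.model_selection import train_test_split

def split_dataset(samples, seed: int = 42):
    """Split annotated samples into 70% training, 15% validation, and 15% test subsets."""
    train, temp = train_test_split(samples, test_size=0.30, random_state=seed)
    val, test = train_test_split(temp, test_size=0.50, random_state=seed)
    return train, val, test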
Responsible data handling is paramount to protect customer privacy and ensure fairness, guided
by ethical principles and legal standards:
Informed Consent:
Signage near CCTV cameras (e.g., “Your movements may be recorded for service improvement”)
informs customers of data collection, while QR code participation is voluntary, with clear opt-in
prompts (e.g., “Scan to share your thoughts, no personal data required”).
Faces in video footage are blurred using OpenCV’s face detection algorithms to prevent
identification, and feedback responses exclude personal identifiers (e.g., names, phone numbers).
Data is stored encrypted on secure servers, adhering to regulations like GDPR or India’s Personal
Data Protection Bill.
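The face-blurring step can be approximated with OpenCV's bundled Haar cascade detector, as sketched below; the detector choice and blur strength are illustrative assumptions, not the system's exact configuration.

import cv2

# Haar cascade shipped with OpenCV (assumption: the default frontal-face model suffices here).
_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def blur_faces(frame):
    """Detect faces in a BGR frame and blur them in place before the frame is stored."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = _face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = frame[y:y + h, x:x + w]
        # A heavy Gaussian blur makes the face unrecognizable while keeping scene context.
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 30)
    return frame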
Bias Mitigation:
Datasets include diverse demographics (age, gender, attire) and mall conditions (busy vs. quiet) to
avoid skewed models—e.g., ensuring YOLO detects both children and adults accurately.
Transparency:
Mall operators disclose data usage policies to customers via signage or apps, fostering trust.
Anonymized aggregate insights (e.g., “60% of visitors liked the layout”) are shared, not individual
records.
Despite a structured approach, several obstacles complicate data acquisition and preparation:
Customers vary widely in clothing, size, and posture (e.g., carrying bags, pushing strollers),
requiring extensive training data to ensure YOLO’s detection accuracy.
High-density areas like food courts or sales events introduce occlusions, making it hard to capture
clean footage or track individuals consistently. This necessitates robust preprocessing and
augmentation to simulate such conditions.
QR code surveys may suffer from low engagement if not promoted effectively—e.g., only 10% of
visitors might scan during a busy day. Incentives (e.g., discount coupons) or strategic placement
(e.g., near exits) are needed to boost responses.
Environmental Factors:
Lighting variations (e.g., dim corners, bright storefronts) and camera angles affect video quality,
potentially reducing detection reliability. Weather or seasonal events (e.g., monsoon crowds)
further complicate data consistency.
Resource Intensity:
Annotating thousands of frames and processing terabytes of video is time- and computationally
expensive. A small team might take weeks to label a week’s footage, requiring efficient tools or
outsourcing to balance cost and quality.
4.7 Strategies to Overcome Challenges
Automated Tools: Pre-trained models assist annotation, reducing manual effort by 50%.
Diverse Sampling: Footage from multiple malls and times ensures variety, while synthetic data
fills gaps.
User Incentives: QR codes offer small rewards (e.g., 5% off purchase) to increase feedback rates.
Preprocessing Enhancements: Stabilization and noise filters improve video usability, tested
across lighting conditions.
CHAPTER 5
THEORETICAL BACKGROUND
5.1 Introduction
The "Real-Time Customer Behaviour and Satisfaction Insight System in Shopping Malls"
leverages a synergistic combination of advanced technologies to monitor customer behavior,
quantify engagement, and analyze satisfaction in real time. This system integrates computer vision,
deep learning-based object detection and tracking, natural language processing (NLP), and QR
code-based data collection to transform raw mall data into actionable insights. Understanding the
theoretical underpinnings of these components is essential for appreciating how the system detects
customers, tracks their movements, processes feedback, and delivers real-time analytics to mall
operators. This chapter explores the core principles and algorithms—namely, computer vision,
YOLO (You Only Look Once), Deep SORT (Simple Online and Real-time Tracking), NLP, and
QR code technology—providing a detailed foundation for their application in the retail context.
By grounding the system in these theories, we ensure its technical robustness, scalability, and
effectiveness in addressing the complexities of shopping mall environments.
In addition, by utilizing QR codes for voluntary feedback collection, the system ensures customer
participation while maintaining privacy and compliance with data protection standards. This low-
cost, high-engagement method provides direct sentiment analysis opportunities, complementing
the observational data captured by AI tools. The blend of passive observation and active input
allows for a more nuanced understanding of customer journeys, preferences, and pain points. As
shopping environments grow more competitive and data-driven, such intelligent systems play a
crucial role in redefining customer engagement strategies, enabling malls to stay relevant and
responsive in an evolving retail landscape. The real-time nature of the system ensures that mall
administrators can respond instantly to customer needs, optimize operations dynamically, and
personalize experiences at scale, ultimately driving customer satisfaction and loyalty.
5.2 Computer Vision and Image Processing
Computer vision is a field of artificial intelligence that enables machines to interpret and analyze
visual data, mimicking human perception. It forms the backbone of the system’s ability to detect
and track customers within video feeds from CCTV cameras.
Core Concepts:
Images and videos are represented as matrices of pixel values—grayscale (intensity) or RGB (red,
green, blue)—where resolution (e.g., 1080p) determines detail. Feature extraction identifies
patterns like edges, shapes, or textures using techniques such as Sobel filters (for edge detection)
or Histogram of Oriented Gradients (HOG) (for object outlines). In a mall, this might mean
detecting a customer’s silhouette against a cluttered background of shelves and signage.
Image Preprocessing:
Preprocessing enhances video quality for analysis. Techniques include noise reduction (e.g.,
Gaussian blur to smooth out graininess from low-light feeds), contrast adjustment (to distinguish
customers in dim corridors), and frame resizing (to 416x416 pixels for YOLO compatibility).
Stabilization via optical flow corrects shaky footage, ensuring consistent tracking across frames.
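An illustrative OpenCV sketch of these preprocessing steps is given below; the use of CLAHE for contrast enhancement and the specific kernel and clip-limit values are assumptions rather than the system's exact settings, and optical-flow stabilization is omitted for brevity.

import cv2

def preprocess_frame(frame, size: int = 416):
    """Denoise, boost contrast, and resize one CCTV frame (illustrative parameters)."""
    # Light Gaussian blur suppresses sensor noise from low-light feeds.
    denoised = cv2.GaussianBlur(frame, (3, 3), 0)

    # CLAHE on the luminance channel lifts contrast in dim corridors.
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.merge((clahe.apply(l), a, b))
    contrast = cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)

    # Resize to the detector's expected input dimensions.
    return cv2.resize(contrast, (size, size))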
Computer vision enables the system to process live mall footage, identifying customers as distinct
objects and extracting spatial-temporal data (e.g., position over time). This lays the groundwork
for subsequent detection and tracking, critical for mapping movement patterns and engagement
zones.
Deep learning, a subset of machine learning, uses neural networks to model complex patterns in
data, making it ideal for real-time customer detection in malls.
Convolutional Neural Networks (CNNs):
CNNs are specialized neural networks for image analysis, featuring layers that extract hierarchical
features—edges in early layers, shapes in middle layers, and full objects (e.g., people) in deeper
layers. Architectures like VGGNet or ResNet underpin modern detection models, with
convolutional filters scanning images to identify key visual elements. In our system, CNNs process
video frames to recognize customers amidst diverse backgrounds.
YOLO is a state-of-the-art object detection algorithm designed for speed and accuracy, making it
perfect for real-time applications. Unlike two-stage detectors (e.g., R-CNN), YOLO processes an
entire image in a single pass, dividing it into a grid (e.g., 13x13) and predicting bounding boxes,
class probabilities (e.g., “person”), and confidence scores simultaneously. Its backbone, often
Darknet-53, balances computational efficiency with precision. In a mall, YOLO might detect
multiple customers in a crowded aisle, outputting boxes with 92% confidence, enabling rapid
identification for tracking.
YOLO serves as the detection engine, identifying customers in each frame with bounding boxes
(e.g., coordinates [x, y, width, height]). This real-time capability (~30 FPS) ensures the system
keeps pace with dynamic mall activity, providing the raw data for behavioral analysis.
Tracking customers across video frames requires associating detections over time, a task handled
by Deep SORT.
Tracking involves linking detected objects (e.g., customers) across frames to maintain their
identities despite movement, occlusions, or camera switches. Basic methods use motion prediction
(e.g., Kalman filters), but modern systems add appearance features for robustness.
Deep SORT Mechanics:
Deep SORT extends the SORT (Simple Online and Realtime Tracking) algorithm by integrating:
Appearance Model: A pre-trained CNN extracts features (e.g., clothing color, shape) from
bounding boxes, calculating similarity scores (e.g., cosine distance) to re-identify customers post-
occlusion.
Hungarian Algorithm: Matches predicted tracks to new detections, resolving identity switches
in crowds.
Understanding where customers are and how they move enhances behavioral analysis beyond
mere detection.
Each detected customer is enclosed in a bounding box with coordinates (x, y, width, height). The
system calculates the center point (x_center, y_center) to determine spatial position relative to the
frame—e.g., left (<40% frame width), right (>60%), or center (40-60%). This maps customer
locations within stores or corridors.
Engagement Metrics:
Temporal analysis tracks dwell times (e.g., seconds spent near a display) and interaction counts
(e.g., kiosk touches), derived from Deep SORT’s continuous tracking. For instance, a customer
lingering 15 seconds at a shirt rack suggests interest, informing product placement.
Role in the System:
Spatial analysis produces actionable data—e.g., “three customers on the left near dresses”—and
visual outputs like heatmaps, showing high-traffic zones for layout adjustments.
NLP processes textual feedback from QR code surveys, extracting sentiments and preferences to
complement video data.
NLP involves tokenization (splitting “great service” into “great” and “service”), stop-word
removal (e.g., “the”), and stemming (e.g., “running” to “run”) to prepare text for analysis.
Sentiment analysis tools like VADER score phrases—e.g., “love the deals” (+0.8, positive)—
while topic modeling (e.g., Latent Dirichlet Allocation) groups feedback into themes like “service”
or “pricing.”
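A minimal topic-modeling sketch with scikit-learn's LDA implementation is shown below; it assumes a reasonably large batch of feedback comments, and the helper name extract_feedback_themes and the topic count are illustrative choices rather than the system's exact configuration.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def extract_feedback_themes(comments, n_topics: int = 3, n_words: int = 5):
    """Group free-text feedback into rough themes with LDA (illustrative settings)."""
    vectorizer = CountVectorizer(stop_words="english", max_features=500)
    counts = vectorizer.fit_transform(comments)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)

    vocab = vectorizer.get_feature_names_out()
    themes = []
    for topic in lda.components_:
        # The highest-weighted words characterize each theme, e.g. ["queue", "checkout", "slow"].
        top_words = [vocab[i] for i in topic.argsort()[-n_words:][::-1]]
        themes.append(top_words)
    return themes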
The system uses offline NLP tools such as NLTK’s VADER to process feedback instantly, classifying
sentiments (e.g., “queues too long” as negative) and identifying key issues (e.g., “slow checkout”).
This ensures timely insights without internet dependency.
NLP correlates feedback with behavior—e.g., negative comments about queues align with long
dwell times at checkouts—offering a dual perspective that enhances satisfaction analysis and
guides operational responses.
QR codes enable fast, anonymous feedback collection, bridging physical and digital interactions.
Operational Principles:
QR codes are matrix barcodes encoding URLs, generated with Python libraries like qrcode.
Customers scan them with smartphones, accessing forms to rate experiences (e.g., 1-5 stars) or
write comments (e.g., “fast staff”). Data is timestamped and stored locally or on a server.
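Generating such a code with the qrcode library can be sketched as follows; the base URL and file naming are placeholders, not the deployed feedback endpoint.

import qrcode

def make_feedback_qr(location: str, base_url: str = "https://example.com/feedback"):
    """Generate a printable QR code linking to a location-tagged feedback form."""
    # Tag the URL with the placement so responses can be tied to a specific zone.
    url = f"{base_url}?zone={location}"
    img = qrcode.make(url)            # returns a PIL image
    img.save(f"qr_{location}.png")    # printed and placed at exits, service desks, etc.
    return url

# Example: make_feedback_qr("food_court_exit")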
Advantages:
QR codes offer immediacy (feedback in seconds), anonymity (no login required), and scalability
(deployable mall-wide). A single scan at an exit might yield “great variety,” instantly processed
by NLP.
QR codes collect real-time customer feedback, enriching video-based insights with subjective
opinions. They provide a non-intrusive way to gauge satisfaction, ensuring customer privacy. This
dual-layered approach enhances accuracy in understanding shopping behaviors. It bridges the gap
between observed actions and customer perceptions. The system thus delivers a more
comprehensive and actionable analysis.
Real-Time Constraints: Achieving <100 ms latency for YOLO and Deep SORT requires optimized hardware (e.g., GPUs).
Occlusion Handling: Deep SORT may falter in extreme crowds, needing parameter tuning.
NLP Accuracy: Informal feedback (e.g., slang) challenges sentiment models, requiring robust
training.
The system combines these theories into a cohesive pipeline: computer vision and YOLO detect
customers, Deep SORT tracks them, spatial analysis quantifies behavior, NLP interprets feedback,
and QR codes collect it—all unified in real-time analytics. This integration leverages each
component’s strengths, overcoming individual limitations to deliver a comprehensive retail
solution.
CHAPTER 6
METHODOLOGY
This chapter presents the methodology for analyzing customer behavior in shopping malls using
state-of-the-art AI techniques. The proposed system enhances existing methods by integrating
YOLO for person detection, Deep SORT for multi-object tracking, and QR code-based feedback
collection. The framework ensures high accuracy in detecting customer movements, identifying
interactions, and providing real-time insights for data-driven decision-making.
6.1 Model Selection and Loading
To analyze customer behavior in malls, the system employs YOLO for real-time object detection.
YOLO is chosen for its speed and accuracy in detecting multiple objects simultaneously. The
model is initialized using the YOLO class from the ultralytics library. The load_model function
loads the pre-trained model for video frame processing. This ensures efficient and accurate
customer tracking in dynamic retail environments.
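A minimal sketch of the load_model step with the ultralytics package is shown below; the weight file yolov8n.pt is an assumed example, as the report does not specify the exact YOLO variant used.

from ultralytics import YOLO

def load_model(weights: str = "yolov8n.pt") -> YOLO:
    """Load a pre-trained YOLO model for person detection (weight file is an assumption)."""
    return YOLO(weights)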
6.2 Real-Time Object Detection:
Real-time object detection leverages YOLO to instantly recognize and classify customers within a
shopping environment. Deep SORT enhances this by continuously tracking movement patterns,
distinguishing between first-time and returning visitors. This seamless integration ensures accurate
identification and reduces false positives, improving analytical precision.
6.2.1 Video Frame Processing
The system captures real-time video frames using a webcam or any connected camera, feeding
them into the YOLO model via OpenCV. Each frame is resized and pre-processed to meet the
input requirements of the YOLO architecture, ensuring accurate object detection. The model then
identifies customers and other relevant objects within the frame, outputting bounding boxes and
confidence scores. These detections are passed on to the tracking algorithm for consistent
identification across frames. This continuous frame-by-frame processing enables real-time
monitoring of customer movement and interaction within the shopping mall environment.
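A minimal sketch of this capture-and-detect loop is shown below, assuming OpenCV for frame grabbing and the ultralytics API for inference; camera index 0 and the preview window are illustrative choices.

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture(0)              # webcam or any connected camera

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)             # YOLO resizes/pre-processes internally
    annotated = results[0].plot()      # draw bounding boxes and confidence scores
    cv2.imshow("Customer detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()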
The detect_objects function identifies and classifies objects in each frame, with a primary focus on
detecting people (class 0 in YOLO). Once detected, the system extracts the bounding box coordinates
and calculates the center position of each object. Based on these coordinates, the system determines
the relative positioning of individuals—such as whether they are to the left, right, or directly in front
of the camera. This spatial information is then used to interpret customer behavior and trigger relevant
feedback mechanisms or alerts. The positioning data also aids in mapping customer density across
different zones in the mall. It enables the system to detect crowd formations and unusual movement
patterns. These insights can be used to enhance mall navigation, prevent congestion, and improve
overall safety. Furthermore, the collected positioning data contributes to heatmap generation and
layout optimization over time.
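The sketch below shows what a detect_objects-style routine could look like under these assumptions: only person detections (class 0) are kept, box centres are computed, and the frame is split into thirds to label a person as left, front, or right of the camera (the one-third split is an illustrative choice, not the project's exact rule).

def detect_objects(model, frame):
    results = model(frame)[0]
    frame_width = frame.shape[1]
    people = []
    for box in results.boxes:
        if int(box.cls[0]) != 0:                    # class 0 = person
            continue
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2     # centre of the bounding box
        if cx < frame_width / 3:
            position = "left"
        elif cx > 2 * frame_width / 3:
            position = "right"
        else:
            position = "front"
        people.append({"bbox": (x1, y1, x2, y2), "center": (cx, cy),
                       "position": position, "conf": float(box.conf[0])})
    return people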
6.3 Real-Time Customer Identification
The YOLO model identifies all individuals in the video feed in real time. A predefined Area of
Interest (AOI) is marked (e.g., checkout counter space). Individuals inside the AOI are classified
as customers, while those outside (e.g., sellers behind the counter) are excluded. A pre-trained
Caffe deep learning model analyzes facial features using a CNN to predict age ranges (e.g., 18-25,
26-35) and gender (male, female, or non-binary). Customers are grouped into demographic
categories, enabling personalized marketing strategies and trend analysis.
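A minimal sketch of the AOI test and the Caffe model loading is given below; the AOI coordinates and the prototxt/caffemodel file names are placeholders for whichever zone and weight files the deployment actually uses.

import cv2

AOI = (200, 100, 900, 600)   # (x1, y1, x2, y2) of the checkout-counter zone (assumed)

def is_customer(center):
    # A detection counts as a customer when its box centre lies inside the AOI.
    x, y = center
    x1, y1, x2, y2 = AOI
    return x1 <= x <= x2 and y1 <= y <= y2

# Pre-trained Caffe networks for age and gender, loaded via OpenCV's dnn module.
age_net = cv2.dnn.readNetFromCaffe("age_deploy.prototxt", "age_net.caffemodel")
gender_net = cv2.dnn.readNetFromCaffe("gender_deploy.prototxt", "gender_net.caffemodel")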
Deep SORT is an enhanced version of the SORT (Simple Online and Realtime Tracking) algorithm, incorporating a deep learning-based appearance embedding for robust object tracking in video streams. Deep SORT assigns a unique ID to each customer detected by YOLO. The system uses Kalman filtering for motion prediction and appearance features to maintain consistent tracking, even with occlusions or overlapping paths. It then tracks customer movement from the AOI to other store areas, providing insights into shopping behavior and navigation patterns.
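The project's appendix imports a local deep_sort module; as one illustration of the same idea, the sketch below uses the deep-sort-realtime package, which exposes a similar tracker interface (the detection format and parameters here are assumptions).

from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=30)   # keep a track alive for up to 30 missed frames

def track_people(people, frame):
    # Convert detections to ([left, top, width, height], confidence, class label).
    detections = []
    for p in people:
        x1, y1, x2, y2 = p["bbox"]
        detections.append(([x1, y1, x2 - x1, y2 - y1], p["conf"], "person"))
    tracks = tracker.update_tracks(detections, frame=frame)
    # Return stable IDs with their current box corners for confirmed tracks only.
    return [(t.track_id, t.to_ltrb()) for t in tracks if t.is_confirmed()]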
The system records how long each customer remains in the AOI (e.g., near a product display or checkout counter). Customers staying beyond a threshold (e.g., 30 seconds) are classified as "interested." Pose estimation and object detection track specific actions (e.g., picking up items or interacting with sellers), providing richer behavioral context. Aggregated data reveals peak interest periods, frequently visited areas, and high-demand products, helping optimize store layouts and marketing strategies.
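A small sketch of this dwell-time rule is shown below, with the 30-second threshold mirrored from the example; the function and variable names are illustrative.

import time

DWELL_THRESHOLD = 30.0     # seconds; matches the example threshold above
entry_times = {}           # track_id -> time the track first entered the AOI

def update_dwell(track_id, inside_aoi):
    # Returns True once a customer has stayed in the AOI beyond the threshold,
    # i.e. when they would be classified as "interested".
    now = time.time()
    if inside_aoi:
        entry_times.setdefault(track_id, now)
        return (now - entry_times[track_id]) >= DWELL_THRESHOLD
    entry_times.pop(track_id, None)   # reset when the customer leaves the AOI
    return False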
Unique QR codes are placed at checkout counters, store exits, and key locations. These strategically
placed codes are easily accessible to customers as they move through various areas of the mall. Scanning
these QR codes allows customers to provide instant feedback on their shopping experience, satisfaction,
and service quality.
Fig 6.6 Sentiment Analysis and Feedback Processing Workflow
Customers scan the QR code to access a short digital survey for feedback on satisfaction levels,
product preferences, and store experience. The collected feedback is processed using Natural
Language Processing (NLP) to classify sentiment as positive, negative, or neutral. Sentiment trends
help identify improvement areas, enhance service quality, and refine store operations.
Additionally, the survey data is analyzed over time to track changes in customer satisfaction and
identify recurring issues. By correlating this feedback with other data sources, such as foot traffic
and purchase behavior, the system can provide actionable recommendations for targeted marketing
strategies and personalized services. Ultimately, this integration of digital surveys and sentiment
analysis empowers mall operators to continuously adapt and enhance the shopping experience.
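The report does not tie this sentiment step to a specific library; as one hedged illustration, the sketch below classifies each comment with TextBlob, using assumed polarity cut-offs for the positive, negative, and neutral labels.

from textblob import TextBlob

def classify_sentiment(text):
    # polarity ranges from -1.0 (most negative) to 1.0 (most positive)
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0.1:
        return "positive"
    if polarity < -0.1:
        return "negative"
    return "neutral"

print(classify_sentiment("Fast staff and great variety"))   # -> positive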
CHAPTER 7
customer profiling. These real-time insights empower store managers to optimize layouts, allocate
resources effectively, and deliver personalized customer experiences, thereby accelerating
decision-making processes.
Fig 7.2: Real-Time Detection and Analysis of Customers
This system uses YOLO for real-time customer detection in a retail clothing store. An Area of
Interest (AOI) helps differentiate customers (blue bounding boxes) from sellers (red bounding
boxes). This approach ensures accurate tracking of shopping behavior within defined store zones.
This figure illustrates product arrangement across key clothing categories: sarees, dresses, western
wear, children’s clothing, shirts, pants, and t-shirts. These categories serve as a foundation for
understanding customer behavior, including shopping durations, purchased items, verbal
interactions, and satisfaction levels.
For example, customers who spend more time in a particular category like sarees may have specific
preferences related to fabric, design, or occasion. Understanding these preferences can help retailers
tailor promotions, product displays, and staff training to enhance customer experience. Similarly,
customers purchasing western wear or children's clothing may exhibit different behaviors, such as
prioritizing comfort or style, which may influence their satisfaction levels.
Fig 7.3: Customer Preferences on Sarees
The system generates reports on consumer interest in saree styles, browsing time, and feedback on
quality and pricing. It analyzes clothing preferences, revealing trends in fit, fabric fading, and
interactivity levels. By processing feedback, businesses can personalize recommendations to
enhance customer experience. Retailers refine inventory by identifying demand patterns and
popular product categories. Browsing duration analysis highlights which items attract the most
attention. QR-based sentiment analysis enables real-time adaptation to customer preferences.
Heatmaps visualize high-traffic zones, optimizing store layouts for better engagement.
Data-driven insights help businesses streamline marketing, pricing, and inventory strategies.
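As an illustration of the heatmap step, the sketch below accumulates tracked centre points into a coarse grid and renders it as a colour overlay; the grid size and colormap are assumptions, not the project's exact settings.

import cv2
import numpy as np

def build_heatmap(centers, frame_shape, grid=(48, 64)):
    # Count how many tracked centre points fall into each grid cell.
    heat = np.zeros(grid, dtype=np.float32)
    h, w = frame_shape[:2]
    for cx, cy in centers:
        gy = min(int(cy / h * grid[0]), grid[0] - 1)
        gx = min(int(cx / w * grid[1]), grid[1] - 1)
        heat[gy, gx] += 1
    # Normalize to 0-255, upscale to the frame size, and apply a colour map.
    heat = cv2.normalize(heat, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    heat = cv2.resize(heat, (w, h), interpolation=cv2.INTER_LINEAR)
    return cv2.applyColorMap(heat, cv2.COLORMAP_JET)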
Collecting customer feedback via QR codes has become an innovative and effective way for
retailers to gather real-time insights. By placing QR codes at strategic points in stores—such as
near product displays, checkout counters, or customer service areas—retailers can make it easy
for customers to scan the code with their smartphones and quickly access an online survey or
feedback form.
Fig 7.4: Customer Feedback Form Interface
To facilitate seamless feedback collection, the system implements QR codes placed at strategic
locations within the retail environment, such as checkout counters or store exits. Customers can
scan these QR codes using their smartphones, which redirect them to a user-friendly digital
feedback form, as shown in Fig 7.4. The form, titled "Customer Feedback Form," prompts users
to input their name, age, product type (e.g., electronics, with a dropdown menu for selection), and
detailed feedback in a text box. Additional features include an "Add Another Product" button for
multiple entries and a "Submit" button to finalize the response.
This QR code-based approach ensures easy access for customers, encouraging higher participation
rates in feedback collection. The form’s simple design minimizes user effort, while the collected
data—such as product preferences and satisfaction levels—provides valuable insights for
businesses.
The project dashboard, developed using the Django framework and Python programming
language, provides an intuitive interface for retail managers to monitor real-time customer analysis
data.
Django’s MVT (Model-View-Template) architecture ensures scalability, while Python enables seamless integration with
system components like YOLO, Deep SORT, NLP, and the Caffe model. The dashboard features
interactive visualizations of customer metrics—such as time spent, movement patterns,
demographics, and feedback sentiment—along with filters and export options for easy data
management. Integrated feedback forms from QR codes allow real-time monitoring of customer
responses. With a responsive design optimized for desktop and mobile, and Django’s security
features like user authentication, the dashboard ensures a user-friendly and secure experience,
empowering retailers to make data-driven decisions efficiently.
The dashboard provides real-time insights on customer behavior, including visit trends, gender distribution,
sentiment analysis, and time spent per category. These insights help retailers optimize inventory, store
layout, and marketing strategies for better customer engagement.
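A minimal sketch of how the QR-linked feedback could be stored and surfaced in a Django dashboard view is given below; the model fields, view name, and template are illustrative assumptions, not the project's actual code.

from django.db import models
from django.shortcuts import render

class Feedback(models.Model):
    name = models.CharField(max_length=100)
    age = models.PositiveIntegerField()
    product_type = models.CharField(max_length=50)
    comments = models.TextField()
    sentiment = models.CharField(max_length=10, blank=True)   # filled by the NLP step
    created_at = models.DateTimeField(auto_now_add=True)

def dashboard(request):
    # Aggregate recent feedback and sentiment counts for the dashboard template.
    context = {
        "recent_feedback": Feedback.objects.order_by("-created_at")[:20],
        "positive_count": Feedback.objects.filter(sentiment="positive").count(),
        "negative_count": Feedback.objects.filter(sentiment="negative").count(),
    }
    return render(request, "dashboard.html", context)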
While the system performed well under typical conditions, there are several areas for potential
improvement:
● Detection Accuracy: The system’s performance may be impacted by low-light conditions or occlusions, where objects are partially hidden or obscured.
● Model Optimization: For a smoother experience, the YOLO model could be optimized
further, especially for devices with limited processing power, to maintain high frame rates
without compromising detection accuracy.
● Person Tracking: Deep SORT occasionally faces challenges in maintaining consistent IDs under complex scenarios, such as when customers move quickly, change direction abruptly, or temporarily exit and re-enter the camera’s field of view. These situations can lead to ID switching (where two individuals’ IDs are swapped) or tracking loss, impacting the accuracy of time-spent analysis and movement patterns.
● NLP in Feedback Processing: The system encounters limitations in accurately interpreting nuanced or ambiguous responses. For instance, colloquial language, sarcasm, or mixed sentiments (e.g., "The fabric is great, but the price is too high") can lead to misclassification of sentiment as purely positive or negative. Additionally, the system may struggle with feedback in multiple languages or with grammatical errors, reducing the accuracy of sentiment analysis.
● Caffe Model for Age and Gender Estimation: The model faces challenges in achieving high accuracy under varying conditions. Factors such as poor image quality, non-frontal facial views, or diverse lighting conditions can lead to incorrect predictions, such as misclassifying a young adult as an older individual or failing to determine gender accurately. Additionally, the model may struggle with underrepresented demographic groups in its training data, leading to biased estimations.
7.7 Conclusion
In conclusion, the integration of advanced detection and tracking algorithms with Natural
Language Processing and QR code-based data collection creates a powerful system for real-time
consumer behavior analysis. By accurately monitoring customer movements, dwell time, and
engagement, the system offers valuable insights into in-store dynamics. The addition of NLP to
interpret customer feedback further enhances the depth of understanding, allowing businesses to
uncover sentiments and preferences that may not be explicitly stated.
This holistic approach empowers store managers with actionable intelligence, enabling them to
make swift, informed decisions on layout adjustments, staff deployment, and personalized
marketing strategies. As a result, customer satisfaction and operational efficiency are significantly
improved. The seamless fusion of technology and human behavior analysis positions this system
as a vital tool for modern retail environments aiming to stay competitive and customer-focused.
7.8 APPENDIX (SOURCE CODE)
import cv2
from ultralytics import YOLO
import wget
import numpy as np
from deep_sort.deep_sort import DeepSort
import time
import csv
import datetime
import torch
# (Excerpt: the appendix spans several pages; the function wrapping the label-drawing
# snippet below is reconstructed, and its name and parameters are illustrative.)
def draw_label(image, text, top_left, font=cv2.FONT_HERSHEY_SIMPLEX,
               font_scale=0.5, font_color=(0, 255, 0), font_thickness=1):
    text_position = (top_left[0] + 18, top_left[1] - 5)
    cv2.putText(image, text, text_position, font, font_scale, font_color, font_thickness)

def get_box_details(boxes):
    # Unpack an ultralytics Boxes object into its components.
    cls = boxes.cls.tolist()   # convert tensor to a plain list of class IDs
    xyxy = boxes.xyxy          # boxes as (x1, y1, x2, y2)
    conf = boxes.conf          # detection confidence scores
    xywh = boxes.xywh          # boxes as (centre x, centre y, width, height)
    return cls, xyxy, conf, xywh
CHAPTER 8
The proposed AI-based customer behavior analysis system offers a substantial improvement over
conventional methods like PoseNet and MoveNet by going beyond simple movement
tracking. Our approach incorporates YOLO for pose estimation and deep learning algorithms to
gain deeper insights into customer interests, such as gaze direction, product interactions, and dwell
time. These capabilities allow for a more comprehensive understanding of customer intent and
behavior, helping mall operators optimize store layouts and improve personalized marketing
strategies.
By generating real-time heatmaps and performing sentiment analysis, the system enables mall
operators to identify high-traffic zones, customer preferences, and emotional responses to products
or environments. It further supports operational efficiency through real-time crowd density
monitoring, which plays a crucial role in effective queue management and staff allocation. The
ability to distinguish between casual browsers and serious buyers enhances targeted sales efforts,
boosting conversion rates and overall customer satisfaction.
Another significant contribution of the system is its role in improving inventory management. By
analyzing movement patterns near specific product displays, the system can predict demand trends
and guide stocking decisions. Predictive analytics empowers mall management to stay ahead of
consumer behavior, adjusting promotional campaigns in real time and refining marketing strategies
based on data-driven insights.
Future enhancements could focus on integrating more advanced gaze estimation techniques,
including eye-tracking for better engagement measurement. Combining facial expression
recognition with voice analysis could provide even more accurate sentiment detection.
Implementing multi-camera setups and 3D environment mapping would allow for precise spatial
tracking and further improve the effectiveness of store layout optimization and customer journey
mapping.
Scalability will also be a key focus, with potential extensions to multi-level malls and integration
across various retail locations. Adding features like AI-based virtual assistants and augmented
reality product recommendations could enrich the customer experience. Finally, incorporating
secure and privacy-conscious technologies such as blockchain and federated learning can help
ensure responsible data use while expanding the system’s reach and effectiveness in the retail
industry.
To ensure ongoing improvement and adaptability, the system includes a feedback mechanism that
enables customers to share their experiences through a QR code-linked form. This allows mall
operators to gather direct input and continuously refine the system based on real user perspectives.
Such a feedback loop not only empowers customers but also helps businesses stay aligned with
evolving expectations and trends, creating a more responsive and customer-centric retail
environment.
Looking ahead, the integration of this system with broader smart city infrastructure presents
exciting possibilities. By connecting mall behavior analytics with urban transportation data or
external retail ecosystems, stakeholders can plan better logistics, offer seamless shopping journeys,
and create unified experiences across locations. This convergence of AI, IoT, and data science
opens new avenues for revolutionizing the retail landscape and delivering next-level service
personalization and operational excellence.
PUBLICATIONS
Srinivasa Rao Konni , Chandini Gedela , Andhavarapu Charan , Sasi Rekha Yamala , Sai Karthik
Bodigi , Dasaradha Arangi “Real-Time Customer Behaviour and Satisfaction Insight System In
Shopping Malls”, Communicated with GIET College , 2025 International Conference on Next
Generation of Green Information and Emerging Technologies.