Data Analytics For Product Segmentation and Demand Forecasting of A Local Retail Store Using Python
    Abstract—In today's competitive business environment, understanding customers' expectations and choices is a necessity for the successful operation of a retail store. Forecasting demand also plays an important role in maintaining inventory at an optimum level. This work applies data analytics to product segmentation and demand forecasting in a local retail store, using Python as the programming language. Historical sales data from a local store was used to categorise products into segments. Statistical techniques and the k-means clustering algorithm were used to characterise the product segments. Machine learning algorithms and time series models were used to forecast future sales trends. The resulting business insights allow the retail store to meet customers' expectations, manage inventory at an optimum level and enhance supply chain efficiency. The present work seeks to illustrate how data-driven tactics can enhance operational decision-making in retail.

    Keywords—Data analytics; product segmentation; demand forecasting; multicriteria ABC classification; seasonality

                       I.    INTRODUCTION

    To ensure business growth and maintain operational efficiency, understanding customer choices and forecasting demand for various products have become important in the current competitive retail environment. The problems a local retail store faces include, but are not limited to, inventory management, optimisation of sales strategy and fulfilling customers' expectations amid changing market trends. In this scenario, product segmentation and demand forecasting help overcome these hurdles to run a successful business.

    Product segmentation is the process of classifying products into groups based on common characteristics such as sales performance, revenue generation, demand trends and consumer preferences. It helps retailers customise marketing strategies, optimise inventory, and enhance overall resource allocation. Demand forecasting uses previous sales data to predict future demand trends. It helps retailers make proactive decisions in procurement, inventory management, and supply chain operations.

    Python, an object-oriented programming language, provides a rich set of libraries and tools for data analytics. It offers extensive solutions for addressing intricate retail problems, encompassing data preprocessing, visualisation, machine learning, and time series modelling. Pandas, NumPy, Scikit-learn, and Prophet are particularly well suited to product clustering, trend analysis, and predictive modelling.

    This article examines the use of Python-based data analytics methods for efficient product segmentation and demand forecasting in a small retail establishment. The study analyses previous sales data to determine specific product segments for focused marketing and inventory approaches, and constructs predictive models to anticipate future demand, reducing stockouts and excess inventory.

    This study's findings emphasise that data-driven techniques can enhance decision-making processes in retail, resulting in greater efficiency, customer satisfaction, and profitability.

                       II.   LITERATURE REVIEW

    Generally, uniform control measures for all inventory products are inadvisable. High-value items may be essential to the viability of the firm. The study in [1] discussed ABC and multicriteria ABC analysis. In ABC analysis, products are classified into three classes: A, B and C. Class A items entail significant stock-out expenses and necessitate stringent control measures. The study emphasised that multicriteria ABC analysis is crucial for understanding different product categories: volume drivers, margin drivers, regular movers, and slow movers. In multicriteria analysis, categories A_B and A_C denote volume driver items, B_A and C_A indicate margin driver items, categories B_B, B_C, and C_B denote regular items, category C_C denotes slow-moving items, and A_A covers items that are both margin and volume drivers. The research performed multicriteria ABC analysis on the online retail dataset using data analytics methodologies. The study in [2] proposed a three-phased Multi-Criteria Inventory Classification (MCIC) integrating the Analytical Hierarchy Process (AHP), the Fuzzy C-Means (FCM) algorithm, and a newly proposed Revised-Veto (RVeto) phase to adhere to the ABC classification principles and enhance its application and adaptability. Classification based on several criteria is essential to meeting management's needs in the current context. The study in [3] presented a semi-supervised explainable methodology that integrated semi-supervised clustering with explainable artificial intelligence. The semi-supervised method integrated intelligent initialisation with a constrained clustering process that directed the classification procedure towards Pareto-distributed items. At the
                                                                                                                           226 | P a g e
                                                         www.ijacsa.thesai.org
                                                            (IJACSA) International Journal of Advanced Computer Science and Applications,
                                                                                                                     Vol. 16, No. 2, 2025
same time, explainable artificial intelligence was employed to generate comprehensive micro and macro explanations of inventory categories at both the item and class levels. Implementing the suggested method for the automatic classification of chemical items within a distribution organisation demonstrated its efficacy in delivering precise, transparent, and thoroughly explained ABC classifications. The study in [4] presented an optimal multi-criteria ABC inventory classification for supermarkets to manage commodities based on unit price, lead time, and annual usage. Of the 442 items, 30 were categorised as group "A," 31 as group "B," and 27 as group "C" under the new ABC classification; however, all these items were categorised in group "A" in the conventional ABC classification. The study in [5] indicated that AI-based methods were more accurate than multiple discriminant analysis (MDA). The statistical analysis showed that SVM achieved superior classification accuracy compared to the alternative AI methods. This finding indicated the potential for employing AI-driven methods for multi-criteria ABC analysis within enterprise resource planning (ERP) systems. The study in [6] presented a case-based multiple-criteria ABC analysis that enhances the traditional method by incorporating other factors, such as lead time and SKU criticality. It offered greater managerial flexibility. Decisions from cases served as input, with preferences for alternatives represented naturally using weighted Euclidean distances, which made the method easy for the decision-maker to understand. The study in [7] examined current portfolio models in procurement that categorise purchases into several product classifications. Case studies from two European automotive OEMs and two vehicle industry suppliers, together with benchmarking interviews at Toyota, Japan, were used to establish a connection between these product categories and various supplier types. Further, it tried to correlate the product categories and supplier types with the specification process, specifically by associating the specification types with their respective generators. The study in [8] conducted supplier segmentation within the automobile sector and proposed four techniques for supplier relationships. Additionally, a four-phase approach for analysing, selecting, and managing decisions on a dynamic relationship strategy with suppliers in the automotive sector was delineated. The study in [9] reviewed the available literature, focusing on market conditions, supplier characteristics, buyer characteristics, and the connections between buyers and suppliers. The study in [10] formulated an innovative methodology for supplier segmentation. Fuzzy logic was used to segment suppliers in a broiler firm.

    The study in [11] performed a comparative analysis of machine learning algorithms for demand forecasting under uncertainty. The research used a synthetic dataset. The machine learning algorithms compared were Linear Regression, Decision Tree Regression, Random Forest Regression, Support Vector Machine Regression (SVR), and XGBoost Regression, evaluated on Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The study in [12] proposed a model that integrated time series analysis, boosting and deep learning for demand forecasting. It achieved a significant improvement in accuracy relative to state-of-the-art studies. The tests used real data from Turkey's SOK Market. The article compared the Decision Tree Classifier, Gaussian Naive Bayes, and K-Nearest Neighbours (KNN). The Gaussian Naive Bayes technique exhibited the greatest accuracy in demand estimation. The study in [13] focused on demand forecasting and consumer satisfaction within the retail sector. It emphasised the importance of precise demand estimation for merchants. The discussion covered machine learning methods for forecasting product demand. The paper considered variables for prediction, including time, location, and historical data.

                       III.   PRESENT WORK

    In this paper, product segmentation was performed using ABC and multicriteria ABC analysis. First, data was collected and prepared for the segmentation exercise. Then, segmentation was performed using Python’s inventorize package. After that, classification algorithms were applied to the segmented data, and their performance was evaluated. The flow of work is shown in Fig. 1.
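
    The ABC and multicriteria ABC logic described in the literature review can be sketched in plain pandas. This is a minimal illustration and not the inventorize implementation used in the study: the 80%/95% cumulative cut-offs, the item data, the column names and the helper abc_class are all assumptions made for the example, as is the reading of a label such as A_B as "class A by units sold, class B by revenue".

```python
import pandas as pd

# Hypothetical item-level totals (not the store's data).
df = pd.DataFrame({
    "item": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "units": [520, 250, 120, 50, 40, 20],
    "revenue": [1500, 5000, 500, 2800, 100, 100],
})

def abc_class(values: pd.Series, a_cut: float = 0.80, b_cut: float = 0.95) -> pd.Series:
    """Label items A/B/C by cumulative share of the total (assumed cut-offs)."""
    ordered = values.sort_values(ascending=False)
    cum_share = ordered.cumsum() / ordered.sum()
    labels = pd.Series("C", index=ordered.index)
    labels[cum_share <= b_cut] = "B"   # within the top 95% of the total
    labels[cum_share <= a_cut] = "A"   # within the top 80% of the total
    return labels.reindex(values.index)

# One ABC pass per criterion, then combine the two letters.
df["volume_class"] = abc_class(df["units"])
df["revenue_class"] = abc_class(df["revenue"])
df["mix"] = df["volume_class"] + "_" + df["revenue_class"]

# Segment names follow the category mapping given in the literature review.
SEGMENT = {
    "A_A": "margin and volume driver",
    "A_B": "volume driver", "A_C": "volume driver",
    "B_A": "margin driver", "C_A": "margin driver",
    "B_B": "regular", "B_C": "regular", "C_B": "regular",
    "C_C": "slow moving",
}
df["segment"] = df["mix"].map(SEGMENT)
print(df[["item", "mix", "segment"]])
```

    With these made-up figures, P1 sells in high volume but earns moderate revenue and is labelled a volume driver, while P4 sells moderately but earns high revenue and is labelled a margin driver.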
Fig. 7. Multicriteria ABC analysis.

Fig. 8. Confusion matrix for (a) KNN, (b) Decision Tree, (c) Random Forest and (d) Naïve Bayes classification algorithms.
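
    Confusion matrices of the kind captioned in Fig. 8 can be produced with scikit-learn. The sketch below is illustrative only: it uses a synthetic three-class dataset in place of the store's segmented products, default hyperparameters, and GaussianNB as an assumed Naïve Bayes variant, so the numbers it prints say nothing about the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for item features and their A/B/C class (0, 1, 2).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

matrices = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    # Rows are true classes, columns are predicted classes.
    matrices[name] = confusion_matrix(y_test, pred, labels=[0, 1, 2])
    print(name, "accuracy:", round(accuracy_score(y_test, pred), 3))
    print(matrices[name])
```

    The diagonal of each matrix counts correctly classified items, so accuracy equals the trace divided by the number of test items.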
Fig. 11. Demand pattern for (a) Printed Matter 18% (b) Printed Matter 12% (c) Printing Job Work (d) PEN (e) Envelope (f) Paper (g) Envelopes (h) Letter Head (i) Hard Bord Kut.
    Demand for these products was forecast using machine learning algorithms, and the MAE, MSE, and RMSE were calculated. To visualise the results in a single frame, grouped bar graphs were plotted for MAE, MSE, and RMSE, as shown in Fig. 12, 13, and 14, respectively.

Fig. 14. Comparative performance of machine learning algorithms based on RMSE for all class ‘A’ items.

    Fig. 15 shows the comparative performance of the machine learning algorithms compared in this study.
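
    The three error measures can be computed for several regressors and collected in one table, which is the kind of data a grouped bar graph is drawn from. The following is a minimal sketch under stated assumptions: the monthly demand series is synthetic, the lag features (1, 2 and 12 months) are an arbitrary choice, and Linear Regression and Decision Tree Regression are stand-ins chosen for illustration rather than the set of algorithms evaluated in the study.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic monthly demand: trend + yearly seasonality + noise (hypothetical).
rng = np.random.default_rng(0)
months = np.arange(48)
demand = pd.Series(
    100 + 2 * months + 15 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, 48)
)

# Turn the series into a supervised problem using lagged demand as features.
frame = pd.DataFrame({
    "lag1": demand.shift(1),
    "lag2": demand.shift(2),
    "lag12": demand.shift(12),
    "y": demand,
}).dropna()
X, y = frame[["lag1", "lag2", "lag12"]], frame["y"]

# Chronological split: the last six months are held out for testing.
X_train, X_test = X.iloc[:-6], X.iloc[-6:]
y_train, y_test = y.iloc[:-6], y.iloc[-6:]

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}

rows = []
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    mse = mean_squared_error(y_test, pred)
    rows.append({
        "model": name,
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # RMSE is the square root of MSE
    })

scores = pd.DataFrame(rows).set_index("model")
print(scores.round(2))
```

    Repeating the loop for each class 'A' item and stacking the resulting tables would give one row per item and model, which can then be plotted as grouped bars, one chart per metric.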
                    © 2025. This work is licensed under
http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding
the ProQuest Terms and Conditions, you may use this content in accordance
                       with the terms of the License.