Kavikulguru Institute of Technology and Science, Ramtek
             Project Preliminary Investigation Report
Name of Department:
Computer Technology
Name of Project Guide:
Mr. Bhushan A. Deshpande
Name of Project Co-Guide (if any):
Student Details:
Roll No.   Name of Student       Email ID                         Mobile No.
01         Saurabh Ghute         saurabhghute123@gmail.com        7350963234
02         Shreyash Kumbhare     shreyashkumbhare483@gmail.com    7387116166
03         Yeshwari Bhure        yeshwaribhure49@gmail.com        9665591938
04         Ravikant Chandewar    ravichandewar05@gmail.com        8530259527
Title of the Project:
Predictive Model for Retail Sales using Machine Learning
Area of Project Work:
Machine Learning
Problem Statement:
To design a sales prediction model, using linear regression and random forest
models together with machine learning techniques and algorithms, that forecasts
future sales based on historical data, trends, and various influencing factors.
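As a minimal sketch of the linear regression part of this problem statement, the example below fits a straight-line trend to a short sales history by ordinary least squares and extrapolates it. The function names and the sales figures are hypothetical and invented for illustration; the random forest part would normally rely on a machine learning library and is not shown here.

```python
from statistics import mean


def fit_linear_trend(sales):
    """Fit sales ~ intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t = list(range(n))
    t_mean, y_mean = mean(t), mean(sales)
    # slope = covariance(t, sales) / variance(t)
    cov = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, sales))
    var = sum((ti - t_mean) ** 2 for ti in t)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(intercept, slope, n_observed, horizon):
    """Extrapolate the fitted line for `horizon` periods after the observed history."""
    return [intercept + slope * (n_observed + h) for h in range(horizon)]


# Hypothetical monthly sales, invented for illustration only.
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]
intercept, slope = fit_linear_trend(monthly_sales)
next_two = forecast(intercept, slope, len(monthly_sales), 2)
print(intercept, slope, next_two)
```

A single time index is the simplest possible feature; the influencing factors named in the problem statement (promotions, seasons, and so on) would enter as additional features in a multiple regression.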
Prior Art (Patent Search):
Patent              Title of Patent            Existing Solutions
Application No.                                (Abstract of Patent)

US 9,202,227 B2     SALES PREDICTION SYSTEMS   This study explores the development and
                    AND METHODS                implementation of various sales prediction
                                               systems and methods, focusing on their
                                               accuracy, efficiency, and practical
                                               application in real-world scenarios. This
                                               study investigates a range of machine
                                               learning models and statistical techniques,
                                               including Linear Regression, Decision Trees,
                                               Random Forest, Gradient Boosting, Support
                                               Vector Machines, ARIMA, SARIMA, and Long
                                               Short-Term Memory (LSTM) networks. These
                                               models are applied to a comprehensive
                                               dataset comprising historical sales data,
                                               promotional activities, seasonal effects,
                                               and economic indicators from a retail
                                               company.

US 2014/0067470 A1  PREDICTIVE AND PROFILE     This study explores the development and
                    LEARNING SALES AUTOMATION  implementation of advanced analytics systems
                    ANALYTICS SYSTEM AND       that combine predictive modeling and profile
                    METHOD                     learning to automate and enhance sales
                                               processes. It utilizes clustering techniques
                                               like K-Means and DBSCAN (Density-Based
                                               Spatial Clustering of Applications with
                                               Noise) for customer segmentation and profile
                                               learning, enabling more personalized
                                               marketing strategies. These methods are
                                               applied to a rich dataset from a retail
                                               company, which includes historical sales
                                               data, customer demographics, purchase
                                               behaviors, promotional activities, seasonal
                                               effects, and economic indicators.

US 7,689,456 B2     SYSTEM FOR PREDICTING      Accurately predicting sales lift and profit
                    SALES LIFT AND PROFIT OF   is critical for strategic decision-making in
                    A PRODUCT BASED ON         product management and marketing. This study
                    HISTORICAL SALES           introduces a novel system designed to predict
                    INFORMATION                the sales lift and profit of a product using
                                               historical sales information. Leveraging
                                               advanced machine learning algorithms and
                                               cloud computing infrastructure, the system
                                               aims to provide precise forecasts that can
                                               guide pricing strategies, promotional
                                               activities, and inventory management.

WO 2023/215538 A1   FORECASTING SYSTEM USING   The forecasting system's performance is
                    MACHINE LEARNING AND       evaluated using metrics such as Mean Absolute
                    ENSEMBLE METHOD            Error (MAE), Mean Squared Error (MSE), and
                                               Root Mean Squared Error (RMSE). The study
                                               also examines the computational efficiency
                                               and scalability of the proposed methods to
                                               ensure their practicality in real-world
                                               applications. The dataset used for this study
                                               comprises historical sales data from a retail
                                               company, along with features such as
                                               promotional activities, seasonal effects, and
                                               economic indicators.
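The three evaluation metrics named in the last entry above (MAE, MSE, RMSE) can be sketched in a few lines. This is an illustrative implementation of the standard definitions, not code taken from any of the cited patents, and the sales values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error: average squared error, penalising large misses more."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of MSE, in the same units as sales."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical actual and predicted sales, invented for illustration only.
actual_sales = [100.0, 120.0, 130.0, 150.0]
predicted_sales = [110.0, 115.0, 130.0, 140.0]

print(mae(actual_sales, predicted_sales))   # 6.25
print(mse(actual_sales, predicted_sales))   # 56.25
print(rmse(actual_sales, predicted_sales))  # 7.5
```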
Literature Review:
Title of Paper              Details of          Literature Identified for Project
                            Publication with
                            Date and Year

Predictive Analysis of      February 2023       This paper performs predictive analysis of
Retail Sales Forecasting                        retail sales on the Citadel POS dataset using
using Machine Learning                          different machine learning techniques. It
Techniques                                      implements different regression models
                                                (Linear Regression, Random Forest Regression,
                                                Gradient Boosting Regression) and time series
                                                models (ARIMA, LSTM) for sales forecasting,
                                                and provides detailed predictive analysis and
                                                evaluation. The dataset used in this research
                                                was obtained from Citadel POS (Point of Sale)
                                                and covers 2013 to 2018. Citadel POS is a
                                                cloud-based application that lets retail
                                                stores carry out transactions; manage
                                                inventories, customers, and vendors; view
                                                reports; and manage sales and tender data
                                                locally.

Benchmarking of             January 2019        In this research, we employ a diverse set of
Regression and Time                             machine learning models and time series
Series Analysis                                 analysis techniques, including Linear
Techniques for Sales                            Regression, Decision Trees, Random Forest,
Forecasting                                     Support Vector Regression (SVR), Gradient
                                                Boosting, SARIMA (Seasonal ARIMA),
                                                networks. The dataset used for this study
                                                comprises historical sales data from a retail
                                                company, including features such as past
                                                sales, promotional activities, seasonal
                                                effects, and economic indicators.

Forecasting seasonals       June 2004           Forecasting seasonal patterns and trends is
and trends by                                   crucial for effective business planning and
exponentially weighted                          resource allocation. This study focuses on
moving averages                                 using Exponentially Weighted Moving Averages
                                                (EWMA) to forecast seasonals and trends in
                                                time series data. EWMA is a powerful tool for
                                                smoothing time series data and making
                                                short-term predictions by assigning
                                                exponentially decreasing weights to past
                                                observations. By analyzing historical data,
                                                we aim to provide accurate and reliable
                                                forecasts of future trends.

Drought forecasting in      October 2016        This study advances drought modelling using
eastern Australia using                         multivariate adaptive regression splines
multivariate adaptive                           (MARS), least square support vector machine
regression spline, least                        (LSSVM), and M5Tree models by forecasting
square support vector                           SPI. The MARS model incorporated rainfall as
machine and M5Tree                              a mandatory predictor, with month
model                                           (periodicity), Southern Oscillation Index,
                                                Pacific Decadal Oscillation Index and Indian
                                                Ocean Dipole, ENSO Modoki and Nino 3.0, 3.4
                                                and 4.0 data added gradually.

Market Basket Analysis:     July 2016           Market Basket Analysis (MBA), also known as
Identify the changing                           association rule learning or affinity
trends of market data                           analysis, is a data mining technique that can
using association rule                          be used in various fields, such as marketing,
mining                                          bioinformatics, education, and nuclear
                                                science. The main aim of MBA in marketing is
                                                to give the retailer information about the
                                                purchase behavior of buyers, which can help
                                                the retailer make correct decisions.
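The exponentially weighted moving average described in the literature review above can be sketched as follows. This is only the simplest level-only form, in which each smoothed value mixes the newest observation with the previous smoothed value; the trend and seasonal components discussed in the cited paper are not modelled here. The function name, the smoothing factor, and the data are assumptions made for illustration.

```python
def ewma(series, alpha):
    """Simple exponential smoothing.

    s[0] = x[0];  s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    so the weight on an observation k steps back decays as (1 - alpha) ** k.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly sales, invented for illustration only.
weekly_sales = [10.0, 20.0, 30.0]
smoothed = ewma(weekly_sales, alpha=0.5)

# The last smoothed value serves as the one-step-ahead forecast.
next_week_forecast = smoothed[-1]
print(smoothed, next_week_forecast)  # [10.0, 15.0, 22.5] 22.5
```

A larger alpha reacts faster to recent changes in sales, while a smaller alpha gives a smoother series that is less sensitive to noise.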
Current Limitations
  •   Incomplete Data: Inconsistent or missing data can significantly impact model accuracy.
      Retail data often have gaps due to incomplete transactions or system errors.
  •   Market Competition: Competitors' actions, such as pricing strategies or marketing
      campaigns, can affect sales in ways that are difficult to predict.
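One common way to reduce the impact of the incomplete-data limitation is to fill short gaps before training. The sketch below fills missing entries by linear interpolation between the nearest recorded values. The function name, the use of None to mark a gap, and the data are assumptions made for illustration, and other imputation strategies may suit a given dataset better.

```python
def interpolate_gaps(series):
    """Fill None entries by linear interpolation between the nearest known values.

    Gaps at the start or end of the series are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]      # leading gap: copy first known value
        elif right is None:
            filled[i] = series[left]       # trailing gap: copy last known value
        else:
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


# Hypothetical daily sales with missing records, invented for illustration only.
daily_sales = [None, 100.0, None, None, 130.0, None]
filled = interpolate_gaps(daily_sales)
print(filled)
```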
Proposed Solution
      •   Developing an accurate sales prediction model, using machine learning techniques and
          algorithms, to avoid over-forecasting and under-forecasting.
      •   Applying linear regression, LSTM, ARIMA, and random forest models to sales data to
          increase prediction accuracy.
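Whether a model tends to over-forecast or under-forecast can be checked with the mean signed error (forecast bias). The sketch below is one simple way to do this; the function names, the tolerance parameter, and the figures are hypothetical and not part of the proposal itself.

```python
def forecast_bias(actual, predicted):
    """Mean of (predicted - actual): positive means over-forecasting on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def classify_bias(bias, tolerance=0.0):
    """Label the direction of the bias, treating |bias| <= tolerance as balanced."""
    if bias > tolerance:
        return "over-forecasting"
    if bias < -tolerance:
        return "under-forecasting"
    return "balanced"


# Hypothetical actual and predicted sales, invented for illustration only.
actual_sales = [100.0, 120.0, 130.0]
predicted_sales = [110.0, 125.0, 135.0]

bias = forecast_bias(actual_sales, predicted_sales)
print(bias, classify_bias(bias))
```

Unlike MAE or RMSE, the signed error lets positive and negative misses cancel, so it measures direction rather than overall accuracy and is best read alongside an accuracy metric.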
Objectives and Scope of Work
   •   To develop a sales prediction model using machine learning that can accurately predict
       retail sales while avoiding over-forecasting and under-forecasting.
   •   To develop an accurate and robust forecasting model by implementing the random forest
       and linear regression methods for future sales prediction.
Feasibility Assessment:
  I.   Expected Outcomes of the Project
         - Accurate Sales Forecasting
         - Enhanced Decision-Making
         - Increased Revenue
 II.   Innovation Potential
         - Convolutional Neural Networks (CNNs): For image-based data (e.g., product
         images) to understand visual attributes influencing sales.
         - Reinforcement Learning: To optimize pricing, promotions, and inventory levels
         through trial-and-error learning.
III.   Tasks Involved
         - Project Planning and Management
         - Data Collection and Integration
IV.   Expertise Required
         1. Inhouse Expertise
             - Guidance of project coordinator
         2. External Expertise
             - Not required
V.    Facilities Required
         1. Inhouse Facilities
            - Computer with necessary software and hardware.
            - Access to industrial equipment dataset.
         2. External Facilities
            - Not required
 Milestones and Time Plan
           Task                      JUL 2024   AUG 2024   SEP 2024
Design     Conceptual Design            ✓
           Detailed Design              ✓
           Design Modifications         ✓
           Final Design                 ✓
Develop    Procurement (If any)                    ✓
           Prototyping                             ✓
           Modifications                           ✓
           Testing and Validation                             ✓
           Final Modifications                                ✓
Deliver    IPR / patent draft                                 ✓
           Thesis and Poster                                  ✓
  Name and Signature of Project Guide                              Signature of HOD