Sajawal et al.
LGURJCSIT 2022
LGU Research Journal of Computer Science & IT
ISSN: 2521-0122 (Online), ISSN: 2519-7991 (Print)
doi: 10.54692/lgurjcsit.2022.0604399
Vol. 6, Issue 4, October – December 2022
        A Predictive Analysis of Retail Sales Forecasting using Machine
                             Learning Techniques
Muhammad Sajawal1, Sardar Usman2, Hamed Sanad Alshaikh3, Asad Hayat4 and M. Usman Ashraf5*
1 Department of Computer Science & IT, Lahore Leads University, Lahore, Pakistan
2 Department of Computer Science, Grand Asian University, Sialkot, Pakistan
3 College of Telecommunications and Electronics, Jeddah, Saudi Arabia
4 Department of Computer Science, Lahore Leads University, Lahore, Pakistan
5 Department of Computer Science, GC Women University, Sialkot, Pakistan
Email: m.usmanashraf@yahoo.com
ABSTRACT:
        Sales forecasting is vital to supply chain management and operations between retailers and manufacturers in the retail industry. The abundant growth of digital data has reduced the role of traditional systems and approaches for such tasks. Sales forecasting is the most challenging task for the retail industry's inventory management, marketing, customer service, and business financial planning. In this paper, we performed a predictive analysis of retail sales on the Citadel POS dataset using different machine learning techniques. We implemented regression models (Linear Regression, Random Forest Regression, Gradient Boosting Regression) and time series models (ARIMA, LSTM) for sales forecasting and provide a detailed predictive analysis and evaluation. The dataset used in this research was obtained from Citadel POS (Point of Sale), a cloud-based application that enables retail stores to carry out transactions, manage inventories, customers and vendors, view and manage reports, and handle sales and tender data locally, and it covers the period from 2013 to 2018. The results show that Xgboost outperformed the time series and other regression models and achieved the best performance, with an MAE of 0.516 and an RMSE of 0.63.
KEYWORDS: Machine Learning, Time Series, Sales Forecasting, Regression, Gradient Boosting, LSTM, ARIMA, Random Forest
1.      INTRODUCTION
Sales forecasting is the most challenging task for inventory management, marketing, customer service and business financial planning for the information technology chain store. Developing an accurate sales forecasting model is challenging for multiple reasons: an over-forecasting model increases operating costs and generates unnecessary products, while an under-forecasting model loses customer satisfaction and sales opportunities [15]. Accurate and robust sales forecasting results can lead to customer satisfaction, enhanced channel relationships, and significant monetary savings.
There are different Back Propagation Neural Network (BPN) techniques for sales forecasting, owing to their ability to capture functional relations among the empirical data. Still, it is difficult to control their large number of parameters, and they carry the risk of model over-fitting. The support vector regression (SVR) algorithm has been used for solving the non-linear regression estimation problem. The prediction result of SVR is better than that of BPN due to its capability to obtain a unique solution
among the empirical data. SVR has been mostly used for time series prediction, such as traffic flow prediction, financial time series forecasting and wind speed prediction. However, SVR cannot produce accurate results when many potential independent variables are considered. To overcome this problem, Multivariate Adaptive Regression Splines (MARS) is a suitable methodology for modelling complex nonlinear and non-parametric regression problems. MARS is powerful for building models on huge datasets, for example in electricity price forecasting, credit scoring and network intrusion detection [5].
Sales forecasting is important for enterprises to make business plans and gain competitive advantages. Different time series methods contribute to sales forecasting, but they only deal with traditional linear data and ignore nonlinear data [1]. To overcome this limitation of traditional methods, many researchers use soft computing techniques such as fuzzy neural networks, fuzzy logic, neural networks and evolutionary algorithms for robust sales forecasting. Different sales forecasting algorithms and statistical models have been developed to solve these problems, such as the ARIMA model, which can produce a forecast within a few seconds based on hundreds of historical data points [18]. However, these models cannot cope with complex data patterns. Although ANN-based algorithms can solve this problem, they take a long time to complete even simple sales forecasting when improved prediction accuracy is required. The ELM model greatly reduces the learning time of an ANN. ELM can learn much faster and with higher performance than traditional gradient-based learning algorithms, and it also avoids many of the difficulties faced by gradient-based learning methods, such as the learning rate, stopping criteria, over-tuning, learning epochs, and local minima. ELM is being used in real-time applications such as real-time control systems [12].
Sales forecasting is essential to supply chain management and operations between the retailer and manufacturers. The manufacturer needs to predict the actual future demand to inform production planning. Similarly, retailers need to predict sales to make purchasing decisions and minimize capital costs. So, it depends upon the end users: depending on the nature of the business, sales forecasting can be done through human planning, statistical models, or a combination of both methods. This paper used the Partial Recurrent Neural Networks (PRNN) statistical model for sales forecasting. The proposed methodology can extract patterns from past sales and facilitate future sales forecasting.

The aim of this research is to investigate the various sales forecasting methods used in the financial area and to evaluate the performance of the chosen machine learning algorithms in order to find the most suitable and efficient model for the chosen dataset. We have used machine learning-based regression models (Linear Regression, Random Forest and Extreme Gradient Boosting) and time series models (LSTM, ARIMA) for sales forecasting using the Citadel POS dataset. The results showed that Extreme Gradient Boosting outperformed the time series models and the other regression techniques.

2.       LITERATURE REVIEW
2.1.     Background
The supply chain contains different business parties that share physical goods, customer services related to the goods, and money. The supply chain is developed in two areas: supply chain execution and supply chain planning.

2.1.1.   Forecasting Concept
Forecasts are nothing but predictions. Forecasts of sunrise and sunset can be made without any mistakes, but that is not the scenario in business. Business conditions change as time goes on, and hence a prediction may contain errors. [13] describes a sales forecast as a projection of expected future demand, given a set of environmental conditions. We should not confuse the planning process with the forecasting process. Planning is the managerial action that should be taken to meet or exceed the sales forecast. The right forecast aims to predict demand perfectly. Forecasting is used in all kinds of companies, service sectors and government organizations as an input to planning a project or a set of activities. [8] summarizes the characteristics of sales forecasts as follows:
• Forecasts are always wrong; hence, one should always expect and evaluate errors in them.
• A long-term forecast is usually less accurate than a short-term forecast, because it has a more significant standard deviation of error relative to the mean than a short-term forecast.
• Aggregate forecasts are usually more accurate than disaggregate forecasts, since an aggregate forecast has a smaller standard deviation of error than disaggregate forecasts.
• The greater the distortion of information in the supply chain, the higher the errors in the sales forecast.

2.1.2.   Sales Forecasting Need in Planning
Manufacturing industries work on the principle of satisfying customer demand with an appropriate supply. According to [13], companies consider sales forecasting an integral part of this process. End customers create demand, and activities like promotions can increase it. Hence, marketing focuses on end customers to create demand. The sales department supports this through different strategies, such as servicing other parties in this stream, like wholesalers and retailers. Supply should be enough to meet demand, and different management functions like manufacturing, purchasing and logistics work together to maintain the supply.

2.1.3.   Forecasting Methods and Techniques
Several standardized methods for forecasting are available. They differ in their relative forecasting performance, the level of quantitative sophistication used, and the logic base (historical data, expert opinion, or surveys) from which the forecast is derived. These methods can be categorized into three groups: historical projection, qualitative, and causal [2].
[2] states that "when a reasonable amount of historical data is available and the trend and seasonal variations in the time series are stable and well defined, projecting these data into the future can be an effective way of forecasting for the short term". He also mentions that the quantitative nature of the time series supports the use of mathematical and statistical models as the primary forecasting tools. By using such tools, good accuracy can be reached for the forecasted periods. These methods are most appropriate when the environmental situation is stable and the primary demand pattern does not vary significantly from year to year.
According to [13], it is impossible to forecast every product with the same time series technique, which is why we need different time series techniques for each product. He also points out that many techniques are available in the general category of time series analysis. Time series techniques share a common characteristic: they are endogenous techniques, meaning that a time series technique looks only at the patterns of the actual sales history. These patterns can be identified and projected to derive a forecast. Time series techniques examine four basic time series patterns: level, trend, seasonality, and noise.

2.1.4.   Machine Learning Techniques
There are three main types of machine learning algorithms, i.e., supervised, unsupervised, and reinforcement learning.
In supervised learning, we are given a labelled dataset (labelled training data), and the desired outcome is already known; every pair of training data has a relationship. Supervised learning is where you have input variables (x) and an output variable (y), and you use an algorithm to learn the mapping function from the input to the output. Random forest, linear regression, and long short-term memory are supervised machine learning techniques [17].
In the unsupervised machine learning approach, the model is trained using unlabelled or non-classified data objects. The unsupervised learning approach is more complex than supervised learning because no labelled training dataset is used in this technique. Two main types of unsupervised machine learning are association rule mining and clustering [10].

2.1.5.   Association Rule Mining
Association rule mining is an unsupervised technique to identify underlying relations between different items. Take the example of a supermarket where customers can buy various items. For instance, mothers with babies buy baby products such as milk and diapers. In short, transactions involve a pattern [11].

2.1.6.   Clustering
Clustering is the task of dividing the population or data points into several groups such that data points in the same group are more similar to each other than to those in other groups. Simply put, clustering is to segregate groups with similar traits [19].
2.2.     Related Work
[3] predicted actual sales by using different machine learning algorithms, like linear regression and Random Forest Regression, and time series techniques, like ARIMA, seasonal ARIMA, non-seasonal ARIMA, and seasonal ETS. They used Walmart's public online sales data to predict sales using different regression algorithms in Azure Machine Learning (ML) Studio, while several time series analysis methods were implemented manually using R packages through the R programming language. They selected the best model to predict the sales, made a web service of that model, and deployed it on the Azure Cloud platform, which returned the output in JSON format. From the experimental results, the authors found that the regression techniques provide better performance compared to the time series analysis approaches. To overcome such problems, Multivariate Adaptive Regression Splines (MARS) is a suitable methodology for modelling complex nonlinear and non-parametric regression problems; it is powerful for building models on huge datasets such as electricity price forecasting, credit scoring and network intrusion detection.
[12] proposed a hybrid two-stage model using MARS and SVR, focusing on the drawbacks mentioned above, to predict sales accurately. To evaluate the performance of the proposed hybrid sales forecasting procedure, sales data for three IT products, i.e., notebook (NB), LCD monitor and motherboard (MB), collected from an IT chain store in Taiwan, were used as illustrative examples.
[16] proposed a model based on a Back Propagation Neural Network (BPNN) to improve sales forecasting by using popularity information of magazines obtained through the Google search engine. In the authors' view, popular content in a magazine can boost sales. In the proposed model, they used popular celebrity words as keywords to interact with the user for sales forecasting, and they used tools such as Digg, which allows users to submit links to news, to estimate the popularity of words. They used nonlinear historical data from Chinese magazine publications to evaluate the forecasting performance and showed that the proposed model can improve sales forecasting.
Recently, [6] proposed a new learning algorithm called Extreme Learning Machine (ELM) for single-hidden-layer feed-forward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of the SLFN. The authors predicted book sales for a famous e-commerce company in China by combining the ELM with a statistical method.
[14] proposed methodologies to extract patterns from past sales and to facilitate future sales forecasting. They used the Partial Recurrent Neural Network (PRNN) statistical model for sales forecasting as a tool for business planning, and then performed an empirical benchmark against the prevailing approaches in forecasting. Real-world sales series show non-linear patterns for different reasons, like trend, seasonality, or the introduction of a new product model; that is why PRNN, which can handle nonlinearity, is well suited to solving sales forecasting problems.
[9] used the Exponentially Weighted Moving Averages (EWMA) model to measure the seasonal impact on sales trends. They combined feature-cluster-related query algorithms and seasonal time series sales behaviour. They compared four models: query feature only, seasonal feature without EWMA, seasonal feature with EWMA, and the proposed model (seasonal feature with EWMA combined with the query feature), and the proposed model showed the best performance.

3.       METHODOLOGY
First, we reviewed the current relevant research, and the results of this literature review were used as input to our analysis of retail sales using machine learning techniques. Our main goal in this research is to evaluate the performance of machine learning models like linear regression, Random Forest regression, and Extreme Gradient Boosting regression on the sales data from the point of sale. Figure 4.1 shows the complete methodology of the proposed solution.

Figure 4.1: Methodology
This research work was performed using the Python programming language and multiple libraries, including pandas, NumPy, matplotlib, seaborn, and scikit-learn.

3.1.     Dataset Description
This paper presents a methodology implemented for a retail Point of Sale system with a set of |S| = 32 locations, starting in early 2007. The Citadel point-of-sale system holds all the records related to sales, and we collected the data from its different tables using SQL queries. Multiple stores contain different items for sale. Each store has five stations, and we took the history data of one customer, which has 228 invoices containing an average of five items each. We collected data from 2013 to 2018 and performed testing on the 2020 data. The training data contains the item ID, store number, total items sold, and total sales of each item, and comprises a total of 87847 rows.
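The paper does not list the exact SQL queries or table names used for this extraction. The snippet below is only a minimal sketch of how such a pull into a pandas DataFrame could look, assuming hypothetical invoice_lines, items and stores tables and a SQLite-style connection named citadel_pos.db.

```python
import pandas as pd
import sqlite3  # stand-in for the actual POS database driver

# Hypothetical connection and table/column names, for illustration only.
conn = sqlite3.connect("citadel_pos.db")

query = """
SELECT l.invoice_date,
       i.item_id,
       s.store_number,
       SUM(l.quantity)   AS total_items_sold,
       SUM(l.line_total) AS total_sales
FROM invoice_lines AS l
JOIN items  AS i ON i.item_id = l.item_id
JOIN stores AS s ON s.store_id = l.store_id
WHERE l.invoice_date BETWEEN '2013-01-01' AND '2018-12-31'
GROUP BY l.invoice_date, i.item_id, s.store_number
"""

# Pull the aggregated sales records into a DataFrame (~87,847 rows in the paper).
sales = pd.read_sql(query, conn)
print(sales.head())
```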
3.2.     Data Pre-Processing
We converted the data into daily, weekly and yearly series; we then checked for outliers and for missing or null values and removed all of them. In this way we refined the dataset before performing testing.
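The cleaning code itself is not shown in the paper. The following is a small sketch of how the resampling and outlier/missing-value removal described above could be done with pandas, under the assumption that the sales DataFrame from the earlier sketch has an invoice_date column and a numeric total_sales column; the IQR rule is one common outlier criterion, not necessarily the one the authors used.

```python
import pandas as pd

# Assume `sales` has an 'invoice_date' column and a numeric 'total_sales' column.
sales["invoice_date"] = pd.to_datetime(sales["invoice_date"])
sales = sales.dropna()  # drop rows with missing/null values

# Aggregate to daily, weekly and yearly totals.
daily = sales.set_index("invoice_date")["total_sales"].resample("D").sum()
weekly = daily.resample("W").sum()
yearly = daily.resample("Y").sum()

# Remove outliers with a simple IQR rule (an assumption; the paper does not
# state which outlier criterion was used).
q1, q3 = daily.quantile([0.25, 0.75])
iqr = q3 - q1
daily = daily[(daily >= q1 - 1.5 * iqr) & (daily <= q3 + 1.5 * iqr)]
```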
3.2.1.   Augmented Dickey-Fuller Test
This is the statistical test used to test the null hypothesis that a unit root is present in an autoregressive model. The test's null hypothesis is that the time series has a unit root and is not stationary (it has some time-dependent structure). The alternative hypothesis is that the time series is stationary [7].

3.2.2.   Null Hypothesis (H0)
If the null hypothesis fails to be rejected, it suggests the time series has a unit root, meaning it is non-stationary: it has some time-dependent structure.

3.2.3.   Alternative Hypothesis (H1)
If the null hypothesis is rejected, it suggests the time series does not have a unit root, meaning it is stationary: it does not have a time-dependent structure. The result can be interpreted as follows:
• p-value > 0.05: fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
• p-value <= 0.05: reject the null hypothesis (H0); the data does not have a unit root and is stationary.
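As a concrete illustration of this check (a sketch only; the paper does not include its test code), the ADF test is available in statsmodels, and the rolling mean and standard deviation used later for Figure 5.3 can be computed with pandas. The daily series name is carried over from the pre-processing sketch above.

```python
from statsmodels.tsa.stattools import adfuller

# `daily` is the daily total-sales series from the pre-processing sketch.
result = adfuller(daily.dropna())
adf_stat, p_value = result[0], result[1]
print(f"ADF statistic: {adf_stat:.4f}, p-value: {p_value:.5f}")

if p_value <= 0.05:
    print("Reject H0: the series has no unit root and is stationary.")
else:
    print("Fail to reject H0: the series is non-stationary; difference it before modelling.")

# Rolling statistics for a visual stationarity check (cf. Figure 5.3).
rolling_mean = daily.rolling(window=30).mean()
rolling_std = daily.rolling(window=30).std()
```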
3.3.     Feature Selection
Many factors play an important role in machine learning success, and feature selection is a significant factor that strongly influences machine learning model performance. It helps prevent overfitting by removing data redundancy, reduces the training time and improves the model's accuracy. We used the correlation method to address these problems: feature sets having negative correlations with the target variable were removed during the feature selection process.
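The exact selection code is not given in the paper. A minimal sketch of the correlation filter described above, assuming a feature DataFrame and a target column named total_sales (both hypothetical names), could look as follows.

```python
import pandas as pd

def select_by_correlation(df: pd.DataFrame, target: str = "total_sales") -> list[str]:
    """Keep only features whose correlation with the target is non-negative,
    mirroring the rule described in Section 3.3 (negatively correlated features removed)."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr[corr >= 0].index.tolist()

# Example usage with assumed column names:
# selected = select_by_correlation(sales[["total_items_sold", "store_number", "total_sales"]])
# X, y = sales[selected], sales["total_sales"]
```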
3.4.     Implementation
We implemented the following models and compared their performance:
• Linear Regression
• ARIMA
• Random Forest Regression
• LSTM
• Gradient Boosting Regression
Two metrics, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), are evaluated for each machine learning regression model. MAE is a standard measure of forecast error in time series analysis and one of the many metrics for evaluating a machine learning model's performance. It is a quantity used to measure how close forecasts or predictions are to the eventual outcomes; as the name suggests, it is the average of the absolute errors [4]. A lower error implies greater accuracy of the model.

MAE = (1/n) Σ |yi − ŷi|                (1)

where yi represents the actual values and ŷi represents the forecasted values.
Root Mean Square Error (RMSE) is the square root of the mean square error, i.e., the square root of the average of the squared differences between predictions and observations. A lower error implies greater accuracy of the model [4].

RMSE = √((1/n) Σ (yi − ŷi)²)           (2)

where yi represents the actual values and ŷi represents the forecasted values.
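For completeness, the small helper below shows how these two metrics can be computed with scikit-learn (one of the libraries listed above); the function and variable names are illustrative and are reused in the later sketches.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def report_errors(y_true, y_pred, label):
    """Print MAE (Eq. 1) and RMSE (Eq. 2) for one model's validation predictions."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(f"{label}: MAE={mae:.5f}, RMSE={rmse:.5f}")
    return mae, rmse
```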
4.       RESULTS AND DISCUSSION
To obtain reliable results, we first have to distinguish stationary from non-stationary time series. Non-stationary data is not predictable and cannot be modelled or forecasted reliably because its mean, variance, and correlation change over time. So, we must convert it into a stationary time series for reliable results.

4.1.     Citadel POS Dataset
The Citadel POS is a point-of-sale system operating in the US. There are 32 locations with different items for sale, and each item has a different price. Two types of customers visit these stores, i.e., loyalty and non-loyalty customers. Loyalty customers are regular customers who shop frequently, whereas non-loyalty customers are not regular and visit only occasionally. Fig 5.1 shows abstract store-wise data containing the total invoices, sales tax and grand total, including sales with and without tax.

Figure 5.1: Store Wise Sales

Fig 5.1 represents the overall 2019 store-wise sales data. The blue lines show the total invoices of each store, the red line represents the sales tax on each invoice, the green line shows the total sales of each store including tax, and the dark blue line represents the sales without tax. The figure shows that the Bell store has the highest sales for 2019.

Figure 5.2: Item-Wise Sales

Figure 5.2 represents item-wise sales of different stores. We took sales data from 2013 to 2018 covering the Ladies, Men, Shoes, Misc., Kids, Jewelry, Furniture, Electrical and Bins categories. The dataset contains 27343 rows with the attributes item id, item name, the total number of items sold, and the total sales of each item.

Figure 5.3: Stationary Data Checkpoint Using Rolling Mean and Rolling STD

Fig 5.3 shows the rolling mean and standard deviation of the sales data. The Dickey-Fuller test is also used to check for stationarity, and by visualizing the data we can check whether the data is stationary or not. Stationary data is data whose statistical properties, such as the mean, do not change with time. If the p-value is less than the significance level of 5%, or if the test statistic is more negative than the critical value, then our data is stationary. Here, our p-value is 5.70503.

4.2      Predictive Analysis
4.2.1    Linear Regression
Linear regression is a machine learning algorithm based on supervised learning. It is used to perform regression tasks and to predict the dependent variable (y) from the independent variable (x):

y = m*x + c                (3)

We trained our model with the sales data. First, we took the retail sales dataset from 2013 to 2018 to perform the prediction task and applied some pre-processing steps to it. After training on a small dataset, we made sure the approach was working fine and then performed the task on the large dataset.
Table 5.1 shows the Root Mean Squared Error and Mean Absolute Error obtained by Linear Regression on the validation set. In this table, RMSE is the standard deviation of the prediction error, a measure of how far the data points lie from the regression line, and its value is 0.96849,
while MAE measures the average magnitude of the error without considering the direction between the actual and predicted observations, and its value is 0.82136.

Table 5.1: Model Performance Using Linear Regression

            Index             Score
            RMSE              0.96849
            MAE               0.82136

Figure 5.4: Linear Regression Sales Forecasting

Figure 5.4 represents the actual and forecasted sales of the target variant obtained using the linear regression model, where the blue line represents the actual sales values and the red line represents the forecasted sales of the targeted variant.
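The paper does not list its training code. The following is a minimal scikit-learn sketch of a linear regression fit and evaluation, assuming hypothetical feature columns from the earlier sketches and the report_errors helper defined in Section 3.4; it is not the authors' exact pipeline.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumed feature/target columns (hypothetical names from the earlier sketches).
X = sales[["store_number", "total_items_sold"]]
y = sales["total_sales"]

# shuffle=False keeps the temporal order of the sales records for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)

lin_reg = LinearRegression().fit(X_train, y_train)
report_errors(y_val, lin_reg.predict(X_val), "Linear Regression")
```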
                                                    dataset LSTM model has been applied to predict
4.3      ARIMA Model                                the sales. It is used to predict the sales based on
This model is used to forecast sales; it’s the       previous history dataset of retail sales. For
statistical method for time series sales. There are   example, they predict sales to find patterns in the
the following ARIMA model parameters:               stock market’s data.
P: Trend autoregression order. D: Trend
difference order.                                    Table 5.3: LSTM Model Performance Results
Q: Trend moving average order
Four other differential seasonal elements are not
part of the ARIMA model. It can be handled                      Index              Score
using the SARIMA model like SARIMA (p, d, q)
(P, D, Q) m.                                                    RMSE             0.99964
Table 5.2 represents the Root Mean Squared
Error and Mean Absolute Error acquired by the
ARIMA model on the validation test. In this                      MAE              0.81910
table, RMSE is the standard deviation of the
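A minimal sketch of an ARIMA fit with statsmodels on the daily sales series from the earlier sketches; the (p, d, q) order and the 90-day hold-out shown here are illustrative assumptions, not the configuration behind the reported scores.

```python
from statsmodels.tsa.arima.model import ARIMA

# Train/validation split on the daily series (last 90 days held out, as an example).
train, valid = daily[:-90], daily[-90:]

# Illustrative order (p, d, q) = (5, 1, 0); the paper does not report its exact order.
arima = ARIMA(train, order=(5, 1, 0)).fit()
forecast = arima.forecast(steps=len(valid))

report_errors(valid, forecast, "ARIMA")
```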
4.4      LSTM Model
Sequence prediction problems have existed for a long time, and these kinds of problems make time series forecasting very difficult to solve. To handle the sequential structure of the dataset, the LSTM model has been applied to predict sales based on the previous history of retail sales; LSTM models are, for example, used to predict sales and find patterns in stock market data.

Table 5.3: LSTM Model Performance Results

            Index             Score
            RMSE              0.99964
            MAE               0.81910

Table 5.3 shows the Root Mean Squared Error and Mean Absolute Error obtained by Long Short-Term Memory (LSTM) regression on the validation set. In this table, RMSE is the standard deviation of the prediction error, a measure of how far the data points lie from the regression line, and its value is 0.99964, while MAE measures the average magnitude of the error without considering the direction between the actual and predicted observations, and its value is 0.81910.

Figure 5.6: LSTM Forecasting Performance

Figure 5.6 represents the actual and forecasted sales of the target variant obtained using the LSTM model, where the blue line represents the actual sales values and the red line represents the forecasted sales of the targeted variant.
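The paper does not specify its LSTM architecture or framework. The sketch below shows one plausible Keras setup on fixed-length windows of the daily sales series (scaling omitted for brevity); the window length, layer size and training settings are assumptions, not the authors' configuration.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series: np.ndarray, lookback: int = 30):
    """Turn a 1-D series into (samples, lookback, 1) windows and next-step targets."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., np.newaxis], np.array(y)

values = daily.values.astype("float32")          # daily sales from the earlier sketches
X_seq, y_seq = make_windows(values, lookback=30)

model = Sequential([
    LSTM(64, input_shape=(30, 1)),               # illustrative layer size
    Dense(1),
])
model.compile(optimizer="adam", loss="mae")
model.fit(X_seq, y_seq, epochs=20, batch_size=32, validation_split=0.2, verbose=0)
```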
4.5.     Random Forest Regression
We used the Random Forest regression model to improve our results. Random forest is a supervised machine learning technique that uses a decision-tree mechanism when training the model, and it is used to improve the computational power. It builds multiple models (decision trees) using random training samples drawn with replacement, computes the accuracy of each model, and increases the weight of the models that have the highest accuracy.

Table 5.4: Random Forest Performance Results

            Index             Score
            RMSE              0.69460
            MAE               0.59121

Table 5.4 shows the Root Mean Squared Error and Mean Absolute Error obtained by Random Forest Regression on the validation set. In this table, RMSE is the standard deviation of the prediction error, a measure of how far the data points lie from the regression line, and its value is 0.69460, while MAE measures the average magnitude of the error without considering the direction between the actual and predicted observations, and its value is 0.59121.

Figure 5.7: Random Forest Regression Sales Forecasting

Figure 5.7 represents the actual and forecasted sales of the target variant obtained using the Random Forest regression model, where the blue line represents the actual sales values and the red line represents the forecasted sales of the targeted variant.
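A minimal scikit-learn sketch of the Random Forest fit, reusing the hypothetical train/validation split and report_errors helper from the earlier sketches; the regressor is left at its defaults, in line with the paper's note that models were run with their basic configuration and default parameters.

```python
from sklearn.ensemble import RandomForestRegressor

# Defaults kept, matching the paper's "default parameters" note; random_state fixes reproducibility.
rf = RandomForestRegressor(random_state=42)
rf.fit(X_train, y_train)

report_errors(y_val, rf.predict(X_val), "Random Forest")
```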
4.6.     Extreme Gradient Boosting Regression
The Xgboost algorithm involves three concepts: extreme, gradient, and boosting. Starting from the basics, boosting is a systematic ensemble method that aims to convert weak learners (regression trees in this case, since a tree-based Xgboost model is used; there is also a linear type) into stronger learners to obtain more accurate predictions. The SMAPE error score is 10.14%.
Table 5.5 shows the Root Mean Squared Error and Mean Absolute Error obtained by Gradient Boosting Regression on the validation set. In this table, RMSE is the standard deviation of the prediction error, a measure of how far the data points lie from the regression line, and its value is 0.63010, while MAE measures the average magnitude of the error without considering the direction between the actual and predicted observations, and its value is 0.51599.

Table 5.5: Xgboost Model Performance Results

            Index             Score
            RMSE              0.63010
            MAE               0.51599

Figure 5.8: Xgboost Model Sales Forecasting

Figure 5.8 represents the target variant's actual and forecasted sales obtained using the Gradient Boosting regression model, where the blue line represents the actual sales values and the red line represents the forecasted sales of the targeted variant.
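A minimal sketch using the xgboost library's scikit-learn interface on the same hypothetical split; as with the other sketches, this shows the general approach rather than the exact setup behind the reported scores.

```python
from xgboost import XGBRegressor

# The paper reports using the models' basic configuration and default parameters,
# so the regressor is left at its defaults here as well.
xgb = XGBRegressor(objective="reg:squarederror", random_state=42)
xgb.fit(X_train, y_train)

report_errors(y_val, xgb.predict(X_val), "Xgboost")
```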
4.7.     Performance Evaluation and Comparison Results
We implemented different machine learning algorithms on the retail sales dataset and used two evaluation metrics, Root Mean Squared Error and Mean Absolute Error, to check the performance of the different machine learning models. When we compared all the models, we concluded that Xgboost is the most suitable model for our retail sales dataset based on the performance evaluation of all the models.

Figure 5.9: Model Prediction Comparison

Figure 5.9 represents the comparison of the different models' predictions. We implemented the different models on the sales dataset with more than 87746 rows. Here, the blue line shows the original values, the red line represents the linear regression results, the green line represents the Random Forest regression, and the orange line represents the Xgboost results. Xgboost is the most suitable model to predict future sales and shows the closest prediction values compared to the other models, like LSTM, Linear Regression, and Random Forest Regression.
Table 5.6 shows the results and performance of the models with their basic configuration and default parameters. From this table, it is clear that Gradient Boosting regression and Random Forest performed well on both metrics, RMSE and MAE; Xgboost has the smallest error in sales forecasting compared to linear regression, ARIMA, and LSTM, while the ARIMA model showed the worst performance, with the highest error on both metrics.

Table 5.6: Regression Model Error Comparison

 Sr.#   Index               RMSE       MAE
 0      Random Forest       0.69460    0.59121
 1      Linear Regression   0.96849    0.82136
 2      ARIMA               1.04959    1.01265
 3      LSTM                0.99964    0.81910
 4      Xgboost             0.63010    0.51599

Figure 5.10: Comparison of Machine Learning Model Error Results

Figure 5.10 represents the Mean Absolute Error and Root Mean Squared Error from the results of the linear regression, ARIMA, LSTM, Random Forest and Xgboost models. From the figure, Xgboost has the lowest RMSE and MAE and is the best-performing algorithm for the point-of-sale retail sales data.
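As a small illustration of how such a comparison can be assembled and plotted with pandas and matplotlib (the original figures are not reproduced here), using the error values from Table 5.6:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Error values taken from Table 5.6.
comparison = pd.DataFrame(
    {
        "RMSE": [0.69460, 0.96849, 1.04959, 0.99964, 0.63010],
        "MAE":  [0.59121, 0.82136, 1.01265, 0.81910, 0.51599],
    },
    index=["Random Forest", "Linear Regression", "ARIMA", "LSTM", "Xgboost"],
)

# Bar chart comparable to Figure 5.10: lower bars indicate better models.
comparison.plot(kind="bar", rot=45, title="Model error comparison (RMSE / MAE)")
plt.tight_layout()
plt.show()
```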
5.       CONCLUSION
In this paper, we concluded that sales forecasting is the most challenging task for inventory management, marketing, customer service and business financial planning for the information technology chain store. Sales forecasting is an important part of supply chain management and operations between retailers and manufacturers. The manufacturer needs to predict the actual future demand to inform production planning; similarly, retailers need to predict sales for purchasing decisions and to minimize capital costs. Therefore, depending on the nature of the business, sales forecasting can be done through human planning, statistical models, or a combination of both methods. Developing an accurate sales forecasting model is challenging for reasons like over- and under-forecasting. Accurate and robust sales forecasting results can lead to customer satisfaction, enhanced channel relationships, and significant monetary savings. We applied time series models, like the LSTM and ARIMA models, and machine learning algorithms, like the Linear Regression, Random Forest and Xgboost models, to predict sales, and we found that Xgboost is the most suitable model for the Citadel POS dataset. In future, a deep learning approach can be used for sales forecasting by increasing the dataset size; similarly, deep learning models can increase accuracy on large retail sales datasets.
REFERENCES
[1] Álvarez-Díaz, Marcos, Manuel González-Gómez, and María Soledad Otero-Giráldez. "Forecasting international tourism demand using a non-linear autoregressive neural network and genetic programming". Forecasting, Vol. 1, Issue 1, p. 7, 2018.
[2] Ballon, R. Business Logistics/Supply Chain Management: Planning, Organizing and Controlling the Supply Chain, 2004.
[3] Catal, C., Kaan, E., Arslan, B. and Akbulut, A. "Benchmarking of regression algorithms and time series analysis techniques for sales forecasting". Balkan Journal of Electrical and Computer Engineering, Vol. 7, pp. 20-26, 2019.
[4] Chai, Tianfeng, and Roland R. Draxler. "Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature". Geoscientific Model Development, Vol. 7, Issue 3, pp. 1247-1250, 2014.
[5] Deo, Ravinesh C., Ozgur Kisi, and Vijay P. Singh. "Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model". Atmospheric Research, Vol. 184, pp. 149-175, 2017.
[6] Feng, Guorui, Guang-Bin Huang, Qingping Lin, and Robert Ga. "Error minimized extreme learning machine with growth of hidden nodes and incremental learning". IEEE Transactions on Neural Networks, Vol. 20, Issue 8, pp. 1352-1357, 2009.
[7] Glynn, J., Perera, N. and Verma. "Unit root tests and structural breaks: A survey with applications", 2007.
[8] Hofmann, E. "Supply Chain Management: Strategy, Planning and Operation, S. Chopra, P. Meindl. Elsevier Science", 2013.
[9] Holt, Charles C. "Forecasting seasonals and trends by exponentially weighted moving averages". International Journal of Forecasting, Vol. 20, Issue 1, pp. 5-10, 2004.
[10] Hussain, Sadiq, Rasha Atallah, Amirrudin Kamsin, and Jiten Hazarika. "Classification, clustering and association rule mining in educational datasets using data mining tools: A case study". Computer Science On-line Conference, Vol. 3, Issue 7, pp. 196-211, Springer, 2018.
[11] Kaur, Manpreet and Kang, Shivani. "Market Basket Analysis: Identify the changing trends of market data using association rule mining". Procedia Computer Science, Vol. 85, pp. 78-85, 2016.
[12] Lu, Chi-Jie. "Sales forecasting of computer products based on variable selection scheme and support vector regression". Neurocomputing, Vol. 128, pp. 491-499, 2014.
[13] Mentzer, J. T. and Moon, M. A. Sales Forecasting Management: A Demand Management Approach, Sage Publications, 2004.
[14] Müller-Navarra, M., Lessmann, S. and Voß, S. "Sales forecasting with partial recurrent neural networks: Empirical insights and benchmarking results". 48th Hawaii International Conference on System Sciences, IEEE, pp. 1108-1116, 2015.
[15] Ofoegbu, Kenneth. A Comparison of Four Machine Learning Algorithms to Predict Product Sales in a Retail Store. Dublin Business School, 2021.
[16] Omar, Hani A., and Duen-Ren Liu. "Enhancing sales forecasting by using neuro networks and the popularity of magazine article titles". Sixth International Conference on Genetic and Evolutionary Computing, IEEE, pp. 577-580, 2012.
[17] Pavlyshenko, Bohdan M. "Machine-learning models for sales time series forecasting". Data, Vol. 4, p. 15, 2019.
[18] Shumway, Robert H., and David S. Stoffer. "State space models". In Time Series Analysis and Its Applications: With R Examples, pp. 89-384, 2017.
[19] Sinaga, Kristina P., and Miin-Shen Yang. "Unsupervised K-means clustering algorithm". IEEE Access, Vol. 8, pp. 80716-80727, 2020.