
Sajawal et al.
LGURJCSIT 2022
ISSN: 2521-0122 (Online)
ISSN: 2519-7991 (Print)
doi: 10.54692/lgurjcsit.2022.0604399
LGU Research Journal of Computer Science & IT, Vol. 6 Issue 4, October – December 2022

A Predictive Analysis of Retail Sales Forecasting using Machine Learning Techniques

Muhammad Sajawal 1, Sardar Usman 2, Hamed Sanad Alshaikh 3, Asad Hayat 4 and M. Usman Ashraf 5*
1 Department of Computer Science & IT, Lahore Leads University, Lahore, Pakistan
2 Department of Computer Science, Grand Asian University, Sialkot, Pakistan
3 College of Telecommunications and Electronics, Jeddah, Saudi Arabia
4 Department of Computer Science, Leads University, Lahore, Pakistan
5 Department of Computer Science, GC Women University, Sialkot, Pakistan

Email: m.usmanashraf@yahoo.com

ABSTRACT:
Sales forecasting is vital to supply chain management and operations between retailers and manufacturers in the retail industry. The abundant growth of digital data has reduced the role of traditional systems and approaches for such tasks. Sales forecasting is the most challenging task for the retail industry's inventory management, marketing, customer service, and business financial planning. In this paper, we performed a predictive analysis of retail sales of the Citadel POS dataset using different machine-learning techniques. We implemented different regression models (Linear Regression, Random Forest Regression, Gradient Boosting Regression) and time series models (ARIMA, LSTM) for sales forecasting, and provide a detailed predictive analysis and evaluation. The dataset used in this research was obtained from Citadel POS (Point of Sale), a cloud-based application that facilitates retail stores in carrying out transactions and managing inventories, customers, vendors, reports, sales, and tender data locally, and covers the years 2013 to 2018. The results show that Xgboost outperformed the time series and other regression models and achieved the best performance, with an MAE of 0.516 and an RMSE of 0.63.

KEYWORDS: Machine Learning, Time Series, Sales Forecasting, Regression, Gradient Boosting, LSTM, ARIMA, Random Forest

1. INTRODUCTION

Sales forecasting is the most challenging task for inventory management, marketing, customer service and business financial planning for the information technology chain store. Developing an accurate sales forecasting model is challenging for multiple reasons: an over-forecasting model increases operating costs and generates unnecessary products, while an under-forecasting model loses customer satisfaction and sales opportunities [15]. Accurate and robust sales forecasting results can lead to customer satisfaction, enhanced channel relationships, and significant monetary savings.
Back Propagation Neural Network (BPN) techniques have been used for sales forecasting because of their ability to capture functional relations among empirical data. Still, it is difficult to control their large number of parameters, and they carry the risk of model over-fitting. The support vector regression (SVR) algorithm has been used for solving the non-linear regression estimation problem, and the prediction result of SVR is better than that of the BPN due to its capability to obtain a unique solution

among the empirical data. SVR has been mostly used for time series prediction, such as traffic flow prediction, financial time series forecasting and wind speed prediction. But SVR cannot produce accurate results when many potential independent variables are considered.
To overcome this problem, Multivariate Adaptive Regression Splines (MARS) are a suitable methodology for modelling complex nonlinear and non-parametric regression problems. MARS is powerful for building models from huge datasets, as in electricity price forecasting, credit scoring and network intrusion detection [5].
Sales forecasting is important for enterprises to make business plans and gain competitive advantages. Different time series methods contribute to sales forecasting, but they only deal with traditional linear data and ignore nonlinear data [1]. So, to overcome this limitation of traditional methods, many researchers use soft computing techniques such as fuzzy neural networks, fuzzy logic, neural networks and evolutionary algorithms to solve non-linear data problems for robust sales forecasting. Different sales forecasting algorithms and statistical models have been developed, such as the ARIMA model, which forecasts within a few seconds based on hundreds of historical data points [18]. But these models cannot cope when complex data patterns are given to them for sales forecasting. Although ANN-based algorithms can solve this problem, when improvement in prediction accuracy is considered, these models take a long time to complete even simple sales forecasting. The ELM model minimizes the learning time of an ANN. ELM can learn much faster and with higher performance than traditional gradient-based learning algorithms, and it also avoids many difficulties faced by gradient-based learning methods, such as learning rate, stopping criteria, over-tuning, learning epochs, and local minima. ELM is being used in real-time applications such as real-time controlling systems [12].
Sales forecasting is essential to supply chain management and operations between the retailer and manufacturers. The manufacturer needs to predict the actual future demand to inform production planning. Similarly, retailers need to predict sales for purchasing decisions and to minimize capital costs. So, it depends upon the end users. Therefore, depending on the nature of the business, sales forecasting can be done through human planning, statistical models, or by combining both methods. This paper used the Partial Recurrent Neural Networks (PRNN) statistical model for sales forecasting. The proposed methodology can extract the pattern from past sales and facilitates future sales forecasting.

The aim of this research is to investigate the various sales forecasting methods applied in the financial area and evaluate the performance of the chosen machine learning algorithms to find the most suitable and efficient model for the chosen data set. We have used machine learning-based regression models (Linear Regression, Random Forest and Extreme Gradient Boosting) and time series models (LSTM, ARIMA) for sales forecasting using the Citadel POS data set. Results showed that Extreme Gradient Boosting outperformed the time series models and other regression techniques.

2. LITERATURE REVIEW
2.1. Background
The supply chain contains different business parties that share physical goods and customer services related to goods and money. The supply chain is developed in two areas: supply chain execution and planning.

2.1.1. Forecasting Concept
Forecasts are nothing but predictions. Forecasts of sunrise and sunset may be made without any mistake, but that is not the scenario in business. Business equations change as time goes by, and hence predictions may contain errors. [13] describes a sales forecast as a projection of future expected demand, given a set of environmental conditions. We should not confuse the planning process and the forecasting process. Planning is only a managerial action that should be taken to meet or exceed the sales forecast. The right forecast aims to predict demand perfectly. Forecasting has been used in all kinds of companies, service sectors, and government organizations, and as input to a planning project or set of activities. [8] summarizes the characteristics of sales forecasts as follows:

• Forecasts are always wrong; hence, one should always expect an evaluation of the errors in them.
• A long-term forecast is usually less accurate than a short-term forecast, because it has a more significant standard deviation of error relative to the mean than a short-term forecast.
• Aggregate forecasts are usually more accurate than disaggregate forecasts; an aggregate forecast contains a smaller standard deviation of error than a disaggregate forecast.
• The greater the distortion of information in the supply chain, the higher the errors in the sales forecast.

2.1.2. Sales forecasting need in Planning
Manufacturing industries work on the principle of satisfying customer demand by appropriate supply. According to [13], companies consider sales forecasting an integral part of this process. End customers create demand, and activities like promotions can increase it. Hence, marketing focuses on end customers to create demand. The sales department supports this through different strategies, such as servicing the other parties in this chain, like wholesalers and retailers. Supply should be enough to meet demand. Different management functions, like manufacturing, purchasing and logistics, work together to maintain the supply.

2.1.3. Forecasting Methods and Techniques
Several standardized methods for forecasting are available. They differ in their relative forecasting performance, the level of quantitative sophistication used, and the logic base (historical data, expert opinion, or surveys) from which the forecast is derived. Those methods can be categorized into three groups: historical projection, qualitative, and causal [2].
[2] states that "when a reasonable amount of historical data is available and the trend and seasonal variations in the time series are stable and well defined, projecting these data into the future can be an effective way of forecasting for the short term". He also mentions that the quantitative nature of the time series supports the use of mathematical and statistical models as a primary forecasting tool. By using such tools, accuracy can be reached for the forecasted periods. These methods are most appropriate when the environmental situation is stable and the primary demand pattern does not vary significantly from year to year.
According to [13], it is impossible to forecast every product with the same time series technique, which is why we need different time series techniques for each product. He also points out that many techniques are available in the general category of time series analysis. Time series techniques share common characteristics and are endogenous techniques, meaning that they look at the patterns of the actual sales history. These patterns can be identified and projected to derive a forecast. Time series techniques look only at patterns that are part of the actual history, and they examine four basic time series patterns: level, trend, seasonality, and noise.

2.1.4. Machine Learning Techniques
There are three main types of machine learning algorithms, i.e., supervised, unsupervised, and reinforcement learning.
In supervised learning, we are given a labelled data set (labelled training data) and the desired outcome is already known, where every pair of training data has a relationship. Supervised learning is where you have input variables (x) and an output variable (y), and you use an algorithm to learn the mapping function from the input to the output. Random forest, linear regression, and long short-term memory are supervised machine-learning techniques [17].
In the unsupervised machine learning approach, the model is trained using unlabelled or non-classified data objects. The unsupervised learning approach is more complex than supervised learning because neither the trained model nor the machine uses a labelled training dataset. The two main types of unsupervised machine learning are association rule mining and clustering [10].

2.1.5. Association Rule Mining
Association rule mining is an unsupervised technique to identify underlying relations between different items. Take the example of a supermarket where customers can buy various items. For instance, mothers with babies buy baby products such as milk and diapers. In short, transactions involve a pattern [11].

2.1.6. Clustering
Clustering is the task of dividing the population or data points into several groups such that data points in the same group are more similar to each other than to those in other groups. Simply, clustering is to

segregate groups with similar traits [19].

2.2. Related Work
[3] predicted actual sales accurately by using different machine learning algorithms like linear regression and Random Forest Regression, and time series techniques like ARIMA, Seasonal ARIMA, Non-Seasonal ARIMA, and Seasonal ETS. They used Walmart's public online sales data to predict sales using different regression algorithms in Azure Machine Learning (ML) Studio. Several time series analysis methods were implemented manually using R packages through the R programming language. They selected the best model to predict the sales, made a web service of that model, and deployed it on the Azure Cloud platform. Azure returned the output in JSON format. From the experimental results, the authors identified that the regression techniques provide better performance compared to the time series analysis approaches. To overcome this problem, Multivariate Adaptive Regression Splines (MARS) are a suitable methodology for modelling complex nonlinear and non-parametric regression problems. MARS is powerful for building models from huge datasets, as in electricity price forecasting, credit scoring and network intrusion detection.
[12] proposed a hybrid two-stage model using MARS and SVR, focusing on the drawbacks mentioned above and on accurate sales prediction. To evaluate the performance of the proposed hybrid sales forecasting procedure, sales data for three IT products, i.e., notebook (NB), LCD monitor and motherboard (MB), collected from an IT chain store in Taiwan, are used as illustrative examples.
[16] proposed a model based on a Back Propagation Neural Network (BPNN) to improve sales forecasting by using popularity information about magazines obtained through the Google search engine. In the authors' view, popular content in a magazine can boost sales. In the proposed model, they used popular celebrity words as keywords to interact with the user for sales forecasting. They used tools to estimate the popularity of words, such as Digg, which allows users to submit links to news. They used the nonlinear historical data to evaluate the forecasting performance and showed that the proposed model can improve sales forecasting. They used data from Chinese publication magazines.
Recently, [6] proposed a new learning algorithm called the Extreme Learning Machine (ELM) for single-hidden-layer feed-forward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of the SLFN. The authors predicted book sales using the ELM combined with a statistical method for a famous e-commerce company in China.
[14] proposed methodologies to extract the pattern from past sales and to facilitate future sales forecasting. They used the statistical model of Partial Recurrent Neural Networks (PRNN) for sales forecasting. They used it as a tool for business planning, and after that, they performed an empirical benchmark against the prevailing approach in forecasting. Real-world sales series show non-linear patterns for different reasons, like trend, seasonality, or the introduction of a new product model. That is why PRNN can handle nonlinear patterns and is well suited to solving sales forecasting problems.
[9] used the Exponentially Weighted Moving Averages (EWMA) model to measure the seasonal impact on sales trends. They combined two feature cluster-related query algorithms and seasonal time series sales behavior. They compared four models: query feature only, a seasonal feature without the EWMA model, a seasonal feature with the EWMA model, and the proposed model with the seasonal feature and EWMA combined with the query feature, and showed that the proposed model performed best.

3. METHODOLOGY
First, we studied the current relevant research to find results. These literature review results are used as input to our analysis of retail sales using machine learning techniques. Our main goal in this research is to evaluate the performance of machine learning models like linear regression, Random Forest regression, and Extreme Gradient Boosting Regression on the sales data from the point of sale. Figure 4.1 shows the complete methodology of the proposed solution.

Figure 4.1: Methodology

This research work is performed using the Python programming language and multiple libraries like pandas, NumPy, matplotlib, seaborn, and sklearn.

3.1. Dataset description
This paper presents a methodology implemented for a retail Point of Sale system in a test set of |S| = 32 locations in early 2007. We have a Citadel point of sale system that holds all the records related to sales. We collected the data from the different tables using SQL queries. Multiple stores contain different items for sale. Each store has five stations, and we took one customer's history data, which has 228 invoices containing an average of five items each. We collected the data from 2013 to 2018 and performed the testing on the 2020 data. The training data contains item id, store number, total sales items, and the total sales of each item. The training data set contains a total of 87847 rows.

3.2. Data Pre-Processing
Several standardized methods for forecasting are available. They differ in terms of their relative forecasting performance, the level of quantitative sophistication used, and the logic base from which the forecast is derived. Those methods can be categorized into three different groups: historical projection, qualitative, and causal [2]. We converted the data into day, week and year fields, then checked for outliers and all missing or null values, and removed all the outliers and missing values. We refined the dataset to perform testing.
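As a rough sketch of these pre-processing steps, and assuming hypothetical column names such as invoice_date and total_sales (the paper does not list the exact schema or the outlier rule it applied), the day/week/year fields, null handling and a simple IQR-based outlier filter could look like this in pandas:

```python
import pandas as pd

# Hypothetical file and column names; the paper does not specify the exact schema.
sales = pd.read_csv("citadel_pos_sales.csv", parse_dates=["invoice_date"])

# Derive day, week and year fields from the invoice date.
sales["day"] = sales["invoice_date"].dt.day
sales["week"] = sales["invoice_date"].dt.isocalendar().week
sales["year"] = sales["invoice_date"].dt.year

# Drop rows with missing or null values.
sales = sales.dropna()

# Remove outliers on total_sales with a 1.5*IQR rule (one possible choice;
# the paper does not state which outlier criterion was used).
q1, q3 = sales["total_sales"].quantile([0.25, 0.75])
iqr = q3 - q1
sales = sales[sales["total_sales"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```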
3.2.1. Augmented Dickey-Fuller Test
This is the statistical test used to test the null hypothesis that a unit root is present in an autoregressive model. The test's null hypothesis is that the time series can be represented by a unit root and is not stationary (it has some time-dependent structure). The alternate hypothesis is that the time series is stationary [7].

3.2.2. Null Hypothesis (H0)
If the null hypothesis fails to be rejected, it suggests the time series has a unit root, meaning it is non-stationary; it has some time-dependent structure.

3.2.3. Alternate Hypothesis (H1)
If the null hypothesis is rejected, it suggests the time series does not have a unit root, meaning it is stationary; it does not have a time-dependent structure. The result can be explained as follows:
• P-value > 0.05: Fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
• P-value <= 0.05: Reject the null hypothesis (H0); the data does not have a unit root and is stationary.
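A minimal sketch of this check, assuming a date-indexed pandas Series of sales and using statsmodels' adfuller (statsmodels is not named among the paper's libraries, so this is one possible implementation rather than the authors' code):

```python
from statsmodels.tsa.stattools import adfuller

def check_stationarity(series, alpha=0.05):
    """Run the Augmented Dickey-Fuller test and interpret the p-value."""
    stat, p_value, used_lags, n_obs, crit_values, _ = adfuller(series.dropna())
    print(f"ADF statistic: {stat:.4f}, p-value: {p_value:.4f}")
    print(f"Critical values: {crit_values}")
    if p_value <= alpha:
        print("Reject H0: no unit root, the series is stationary.")
    else:
        print("Fail to reject H0: unit root present, the series is non-stationary.")

# Usage (hypothetical series of daily total sales):
# check_stationarity(daily_sales)
```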


3.3. Feature Selection
Many factors play an important role in machine learning success, and feature selection is a significant factor that hugely influences machine learning model performance. It helps prevent overfitting by removing data redundancy, reduces the training time, and improves the model's accuracy. We used different approaches to address these problems, such as the correlation method. Feature sets having negative correlations with the target variables were removed during the feature selection process.
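As an illustrative sketch of this correlation filter (column names hypothetical), the paper's criterion of dropping features that correlate negatively with the target could be expressed with pandas as:

```python
import pandas as pd

def drop_negatively_correlated(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Keep only numeric features whose Pearson correlation with the target is non-negative."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    keep = corr[corr >= 0].index.tolist()
    return df[keep + [target]]

# Usage (hypothetical training frame with a 'total_sales' target):
# train = drop_negatively_correlated(train, target="total_sales")
```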
3.4. Implementation
We implemented the following models to obtain our results and compared their performance:
• Linear Regression Model
• ARIMA
• Random Forest Regression
• LSTM model
• Gradient Boosting Regression
Two metrics, Mean Absolute Error and Root Mean Square Error, are evaluated for each machine learning regression model. Mean Absolute Error is a standard measure of forecast error in time series analysis and one of the many metrics for evaluating a machine learning model's performance. The mean absolute error is a quantity used to measure how close forecasts or predictions are to the eventual outcomes; as the name suggests, it is an average of the absolute errors [4]. A lower error implies greater accuracy of the model.

MAE = (1/n) Σ |yi − ŷi|        (1)

where yi represents the actual values and ŷi represents the forecasted values.
Root Mean Square Error (RMSE) is the square root of the mean square error. It is the root of the average of squared differences between prediction and observation. A lower error implies greater accuracy of the model [4].

RMSE = sqrt((1/n) Σ (yi − ŷi)²)        (2)

where yi represents the actual values and ŷi represents the forecasted values.
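For reference, equations (1) and (2) can be computed directly with NumPy (or with sklearn's mean_absolute_error and mean_squared_error); this is a generic sketch, not the authors' exact evaluation code:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error, equation (1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Square Error, equation (2)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Example: mae([3, 5, 2], [2.5, 5.0, 4.0]) ≈ 0.83 and rmse([3, 5, 2], [2.5, 5.0, 4.0]) ≈ 1.19
```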
4. RESULTS AND DISCUSSION
To find the result, we must check for stationary and non-stationary time series. Non-stationary data is not predictable and cannot be modeled or forecasted due to changes in mean, variance, and correlation. So, we must convert it into a stationary time series for reliable results.

4.1. Citadel POS Dataset
The Citadel POS is a point of sale system operating in the US. There are 32 locations with different items for sale. Each item has a different price. Two types of customers visit these stores, i.e. loyalty and non-loyalty customers. Loyalty customers are regular customers who shop frequently, while non-loyalty customers are not regular and make occasional visits. Fig 5.1 shows abstract store-wise data containing the total invoices, sales tax and grand total, including sales with and without tax.

Figure 5.1: Store Wise Sales

Fig 5.1 represents the overall 2019 store-wise sales data. Blue lines show the total invoices of each store, the red line represents the sales tax on each invoice, the green line shows the total sales of each store including tax, and the dark blue line represents the sales without tax. The figure shows that the Bell store has the highest sales for 2019.

Figure 5.2: Item-Wise Sales

Figure 5.2 represents the item-wise sales of the different stores. We took sales data from 2013 to 2018 containing Ladies, Men, Shoes, Misc., Kids, Jewelry, Furniture, Electrical and Bins. The dataset contains 27343 rows with the attributes item id, item name, the total number of items sold, and the total sales from each item.

Figure 5.3: Stationary Data Checkpoint Using Rolling Mean and Rolling STD

Fig 5.3 shows the rolling mean and rolling standard deviation of the sales data. The Dickey-Fuller test is also used to check whether the data is static. By visualizing the data, we can check whether the data is stationary or not. Stationary data is data whose mean, variance and correlation do not change with time. If the p-value is less than the significance level of 5%, or if the test statistic is smaller (more negative) than the critical value, then our data would be stationary. Here, our p-value is 5.70503.

4.2 Predictive Analysis

4.2.1 Linear Regression
Linear regression is a machine learning algorithm based on supervised learning. It is used to perform regression tasks and to predict the dependent variable (y) based on the independent variable (x).

y = m*x + c        (3)

We trained our model with the sales data. First, we got the retail sales dataset from 2013 to 2018 to perform the prediction task and performed some pre-processing tasks on it. After training, we made sure that the model worked fine on a small data set, then we performed the task on the large dataset.
Table 5.1 represents the Root Mean Squared Error and Mean Absolute Error acquired by Linear Regression on the validation test. In this table, RMSE is the standard deviation of the prediction error, which is a measure of how far data points are from the regression line; its value is 0.96849,
and MAE measures the average magnitude of the error without considering the directions between the actual and predicted observations; its value is 0.82136.

Table 5.1: Model Performance Using Linear Regression
Index    Score
RMSE     0.96849
MAE      0.82136

Figure 5.4: Linear Regression Sales Forecasting

Figure 5.4 represents the actual and forecasted sales of the target variant obtained using the linear regression model, where the blue line represents the actual sales value. The red line represents the forecasted sales of the targeted variant.
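A minimal sklearn sketch of this linear regression step (the feature columns and the train/validation split below are assumptions, since the paper does not list its exact feature matrix):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical features from the pre-processed frame and the total sales target.
X = sales[["store_number", "item_id", "day", "week", "year"]]
y = sales["total_sales"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_val)

print("MAE:", mean_absolute_error(y_val, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_val, pred)))
```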
4.3 ARIMA Model
This model is used to forecast sales; it is a statistical method for time series data. The ARIMA model has the following parameters:
P: Trend autoregression order
D: Trend difference order
Q: Trend moving average order
Four further seasonal elements are not part of the ARIMA model; they can be handled using the SARIMA model, written as SARIMA (p, d, q)(P, D, Q)m.
Table 5.2 represents the Root Mean Squared Error and Mean Absolute Error acquired by the ARIMA model on the validation test. In this table, RMSE is the standard deviation of the prediction error, which is a measure of how far data points are from the regression line; its value is 1.04959. MAE measures the average magnitude of the error without considering the directions between the actual and predicted observations; its value is 1.01265.

Table 5.2: ARIMA Model Result
Index    Score
RMSE     1.04959
MAE      1.01265

Figure 5.5: Sales Forecasting Using ARIMA model

Figure 5.5 represents the actual and forecasted sales of the target variant obtained using the ARIMA model, where the blue line represents the actual sales value. The red line represents the forecasted sales of the targeted variant.
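A possible statsmodels sketch of this step on an aggregated daily sales series (the paper does not state the (p, d, q) order it used, so (1, 1, 1) below is only a placeholder, and statsmodels itself is an assumed choice of library):

```python
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical date-indexed daily sales series built from the pre-processed frame.
daily_sales = sales.groupby("invoice_date")["total_sales"].sum()
train, test = daily_sales[:-30], daily_sales[-30:]

# Placeholder order: p = AR order, d = differencing order, q = MA order.
model = ARIMA(train, order=(1, 1, 1))
fitted = model.fit()

forecast = fitted.forecast(steps=len(test))
print(fitted.summary())
```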

4.4. LSTM Model
Multiple sequence prediction problems have existed for a long time; because of these types of problems, it is very difficult to solve the time series problem. To handle the sequence problem in the dataset, the LSTM model has been applied to predict the sales. It is used to predict sales based on the previous history of the retail sales dataset. For example, LSTMs are used to predict sales and to find patterns in stock market data.

Table 5.3: LSTM Model Performance Results
Index    Score
RMSE     0.99964
MAE      0.81910

Table 5.3 represents the Root Mean Squared Error and Mean Absolute Error acquired by Long Short-Term Memory (LSTM) Regression on the validation test. In this table, RMSE is the standard deviation of the prediction error, which is a measure of how far data points are from the regression line; its value is 0.99964. MAE measures the average magnitude of the error without considering the directions between the actual and predicted observations; its value is 0.81910.

Figure 5.6: LSTM Forecasting Performance

Figure 5.6 represents the actual and forecasted sales of the target variant obtained using the LSTM model, where the blue line represents the actual sales value. The red line represents the forecasted sales of the targeted variant.
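The paper does not name its deep learning framework, so the sketch below shows one common way to frame the problem with Keras: a sliding window of past daily sales predicts the next day (window length, layer sizes and epochs are illustrative only):

```python
import numpy as np
from tensorflow import keras

def make_windows(values, window=30):
    """Turn a 1-D sales array into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(values) - window):
        X.append(values[i:i + window])
        y.append(values[i + window])
    return np.array(X)[..., None], np.array(y)

# 'daily_sales' is the hypothetical date-indexed series used in the ARIMA sketch above.
values = daily_sales.to_numpy(dtype="float32")
X, y = make_windows(values, window=30)

model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(30, 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2)
```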
4.5. Random Forest Regression
We used the Random Forest regression model to improve our results. Random forest is a supervised machine-learning technique that uses a decision-tree mechanism when training the model, and it is used to improve the computational power. It builds multiple models (decision trees) on random training samples drawn with replacement, computes the accuracy of each model, and increases the weight of the model that has the maximum accuracy.

Table 5.4: Random Forest Performance Results
Index    Score
RMSE     0.69460
MAE      0.59121

Table 5.4 represents the Root Mean Squared Error and Mean Absolute Error acquired by Random Forest Regression on the validation test. In this table, RMSE is the standard deviation of the prediction error, which is a measure of how far data points are from the regression line; its value is 0.69460. MAE measures the average magnitude of the error without considering the directions between the actual and predicted observations; its value is 0.59121.

Figure 5.7: Random Forest Regression Sales Forecasting

Figure 5.7 represents the actual and forecasted sales of the target variant obtained using the Random Forest regression model, where the blue line represents the actual sales value. The red line represents the forecasted sales of the targeted variant.
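A minimal sklearn sketch of this Random Forest step, reusing the hypothetical split from the linear regression sketch (Table 5.6 suggests default configurations were used, so defaults are kept here):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
pred = rf.predict(X_val)

print("MAE:", mean_absolute_error(y_val, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_val, pred)))
```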

4.6. Extreme Gradient Boosting Regression
The Xgboost algorithm involves three concepts: extreme, gradient, and boosting. Starting from basics, boosting is one of the systematic ensemble methods that aims to convert weak learners (regression trees in this case, as a tree-based Xgboost model is used; there is also a linear type) into stronger learners to obtain more accurate predictions. The SMAPE error score is 10.14%.
Table 5.5 represents the Root Mean Squared Error and Mean Absolute Error acquired by Gradient Boosting Regression on the validation test. In this table, RMSE is the standard deviation of the prediction error, which is a measure of how far data points are from the regression line; its value is 0.63010. MAE measures the average magnitude of the error without considering the directions between the actual and predicted observations; its value is 0.51599.

Table 5.5: Xgboost Model Performance Results
Index    Score
RMSE     0.63010
MAE      0.51599

Figure 5.8: Xgboost Model Sales Forecasting

Figure 5.8 represents the target variant's actual and forecasted sales, obtained using the Gradient Boosting regression model, where the blue line represents the actual sales value. The red line represents the forecasted sales of the targeted variant.
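A sketch of the Xgboost step, assuming the xgboost library's sklearn-style XGBRegressor with default settings (the paper does not list the hyperparameters it used), again reusing the hypothetical split from the earlier sketches:

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

xgb = XGBRegressor(objective="reg:squarederror", n_estimators=100, random_state=42)
xgb.fit(X_train, y_train)
pred = xgb.predict(X_val)

print("MAE:", mean_absolute_error(y_val, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_val, pred)))
```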
4.7. Performance evaluation and comparison results
We implemented the different machine learning algorithms on the retail sales dataset and used two evaluation metrics, Root Mean Squared Error and Mean Absolute Error, to check the performance of the different machine learning models. When we compared all the models, we concluded that Xgboost is the most suitable model for our retail sales dataset based on the performance evaluation of all the models.

Figure 5.9: Model Prediction Comparison

Figure 5.9 represents the comparison of the different model predictions. We implemented the different models on the sales dataset with more than 87746 rows. Here, the blue line shows the original value, the red line represents the linear regression result, the green line represents the Random Forest regression, and the orange line represents the Xgboost results. Xgboost is the most suitable model to predict future sales and shows the nearest prediction values compared to the other models like LSTM, Linear Regression, and Random Forest Regression.
Table 5.6 shows the results and performance of the models with their basic configurations and default parameters. From this table, it is clear that Gradient Boosting regression and Random Forest performed well on both metrics, RMSE and MAE; Xgboost has a smaller error in sales forecasting compared to linear regression, ARIMA, and LSTM. The ARIMA model showed the worst performance, with the highest error on both metrics.

Table 5.6: Regression Model Error Comparison
Sr.#    Index                RMSE       MAE
0       Random Forest        0.69460    0.59121
1       Linear Regression    0.96849    0.82136
2       ARIMA                1.04959    1.01265
3       LSTM                 0.99964    0.81910
4       Xgboost              0.63010    0.51599

Figure 5.10: Comparison Machine Learning Model Error Results

Figure 5.10 represents the Mean Absolute Error and Root Mean Squared Error from the results of the Random Forest regression, linear regression, ARIMA, LSTM and Xgboost models. From the figure, Xgboost has the least RMSE and MAE and is the best-performing algorithm for the point-of-sale retail sales data.

5. CONCLUSION
In this paper, we concluded that sales forecasting is the most challenging task for inventory management, marketing, customer service and business financial planning for the information technology chain store. Sales forecasting is an important part of supply chain management and operations between retailers and manufacturers. The manufacturer needs to predict the actual future demand to inform production planning. Similarly, retailers need to predict sales for purchasing decisions and to minimize capital costs. Therefore, depending on the nature of the business, sales forecasting can be done through human planning, a statistical model, or by combining both methods. Developing an accurate sales forecasting model is challenging for

reasons like over- and under-forecasting. Therefore, accurate and robust sales forecasting results can lead to customer satisfaction, enhanced channel relationships, and significant monetary savings. We applied time series models like LSTM and ARIMA to predict sales, and machine learning algorithms like the Linear Regression model, the Random Forest model and the Xgboost model. We found that Xgboost is the most suitable model for the Citadel POS dataset. In the future, a deep learning approach can be used for sales forecasting by increasing the dataset size. Similarly, deep learning models can increase accuracy on large retail sales datasets.

REFERENCES
[1] Álvarez-Díaz, Marcos, Manuel González-Gómez, and María Soledad Otero-Giráldez. "Forecasting international tourism demand using a non-linear autoregressive neural network and genetic programming". Forecasting, Vol 1, Issue 1, Pg 7, 2018.
[2] Ballon, R. Business logistics/supply chain management: Planning, organizing and controlling the supply chain, 2004.
[3] Catal, C., Kaan, E., Arslan, B. & Akbulut, A. "Benchmarking of regression algorithms and time series analysis techniques for sales forecasting". Balkan Journal of Electrical and Computer Engineering, Vol 7, Pg 20-26, 2019.
[4] Chai, Tianfeng, and Roland R. Draxler. "Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature". Geoscientific Model Development, Vol 7, Issue 3, Pg 1247-1250, 2014.
[5] Deo, Ravinesh C., Ozgur Kisi, and Vijay P. Singh. "Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model". Atmospheric Research, Vol 184, Pg 149-175, 2017.
[6] Feng, Guorui, Guang-Bin Huang, Qingping Lin, and Robert Gay. "Error minimized extreme learning machine with growth of hidden nodes and incremental learning". IEEE Transactions on Neural Networks, Vol 20, Issue 8, Pg 1352-1357, 2009.
[7] Glynn, J., Perera, N. and Verma. "Unit root tests and structural breaks: A survey with applications", 2007.
[8] Hofmann, E. "Supply Chain Management: Strategy, Planning and Operation, S. Chopra, P. Meindl. Elsevier Science", 2013.
[9] Holt, Charles C. "Forecasting seasonals and trends by exponentially weighted moving averages". International Journal of Forecasting, Vol 20, Issue 1, Pg 5-10, 2004.
[10] Hussain, Sadiq, Rasha Atallah, Amirrudin Kamsin, and Jiten Hazarika. "Classification, clustering and association rule mining in educational datasets using data mining tools: A case study". Computer Science On-line Conference, Vol 3, Issue 7, Pg 196-211, Springer, 2018.
[11] Kaur, Manpreet & Kang, Shivani. "Market Basket Analysis: Identify the changing trends of market data using association rule mining". Procedia Computer Science, Vol 85, Pg 78-85, 2016.
[12] Lu, Chi-Jie. "Sales forecasting of computer products based on variable selection scheme and support vector regression". Neurocomputing, Vol 128, Pg 491-499, 2014.
[13] Mentzer, J. T. & Moon, M. A. Sales forecasting management: a demand management approach, Sage Publications, 2004.
[14] Müller-Navarra, M., Lessmann, S. & Voß, S. "Sales forecasting with partial recurrent neural networks: Empirical insights and benchmarking results". 48th Hawaii International Conference on System Sciences, IEEE, Pg 1108-1116, 2015.
[15] Ofoegbu, Kenneth. A comparison of four machine learning algorithms to predict product sales in a retail store. Dublin Business School, 2021.

[16] Omar, Hani A., and Duen-Ren Liu. "Enhancing sales forecasting by using neuro networks and the popularity of magazine article titles". Sixth International Conference on Genetic and Evolutionary Computing, IEEE, Pg 577-580, 2012.
[17] Pavlyshenko, Bohdan M. "Machine-learning models for sales time series forecasting". Data, Vol 4, Pg 15, 2019.
[18] Shumway, Robert H., and David S. Stoffer. "State space models". In: Time series analysis and its applications: with R examples, Pg 89-384, 2017.
[19] Sinaga, Kristina P., and Miin-Shen Yang. "Unsupervised K-means clustering algorithm". IEEE Access, Volume 8, Pg 80716-80727, 2020.

