Sajawal et al.
LGURJCSIT 2022
LGU Research Journal of Computer Science & IT
ISSN: 2521-0122 (Online), ISSN: 2519-7991 (Print)
doi: 10.54692/lgurjcsit.2022.0604399
Vol. 6, Issue 4, October – December 2022
        A Predictive Analysis of Retail Sales Forecasting using Machine
                             Learning Techniques
Muhammad Sajawal1, Sardar Usman2, Hamed Sanad Alshaikh3, Asad Hayat4 and M. Usman Ashraf5*
1 Department of Computer Science & IT, Lahore Leads University, Lahore, Pakistan
2 Department of Computer Science, Grand Asian University, Sialkot, Pakistan
3 College of Telecommunications and Electronics, Jeddah, Saudi Arabia
4 Department of Computer Science, Lahore Leads University, Lahore, Pakistan
5 Department of Computer Science, GC Women University, Sialkot, Pakistan
Email: m.usmanashraf@yahoo.com
ABSTRACT:
        Sales forecasting is vital to supply chain management and operations between retailers and manufacturers in the retail industry. The abundant growth of digital data has reduced the role of traditional systems and approaches for such tasks. Sales forecasting is the most challenging task for the retail industry's inventory management, marketing, customer service, and business financial planning. In this paper, we performed a predictive analysis of retail sales on the Citadel POS dataset using different machine learning techniques. We implemented regression models (Linear Regression, Random Forest Regression, Gradient Boosting Regression) and time series models (ARIMA, LSTM) for sales forecasting and provide a detailed predictive analysis and evaluation. The dataset used in this research was obtained from Citadel POS (Point of Sale), a cloud-based application that enables retail stores to carry out transactions, manage inventories, customers and vendors, view and manage reports, and handle sales and tender data locally, and it covers the period from 2013 to 2018. The results show that Xgboost outperformed the time series and other regression models and achieved the best performance, with an MAE of 0.516 and an RMSE of 0.63.
KEYWORDS: Machine Learning, Time Series, Sales Forecasting, Regression, Gradient Boosting, LSTM, ARIMA, Random Forest
1.      INTRODUCTION
Sales forecasting is the most challenging task for inventory management, marketing, customer service and business financial planning for the information technology chain store. Developing an accurate sales forecasting model is challenging for multiple reasons: an over-forecasting model increases operating costs and generates unnecessary products, while an under-forecasting model loses customer satisfaction and sales opportunities [15]. Accurate and robust sales forecasting results can lead to customer satisfaction, enhanced channel relationships, and significant monetary savings.
There are different Back Propagation Neural Network (BPN) techniques for sales forecasting, owing to their ability to capture functional relations among the empirical data. Still, it is difficult to control their large number of parameters, and they carry the risk of model over-fitting. The support vector regression (SVR) algorithm has been used for solving the non-linear regression estimation problem. The prediction result of SVR is better than that of BPN due to its capability to obtain a unique solution
among the empirical data. SVR has been mostly used for time series prediction, such as traffic flow prediction, financial time series forecasting and wind speed prediction. However, SVR cannot produce accurate results when many potential independent variables are considered. To overcome this problem, Multivariate Adaptive Regression Splines (MARS) is a suitable methodology for modelling complex nonlinear and non-parametric regression problems. MARS is powerful for building models on huge datasets, for example in electricity price forecasting, credit scoring and network intrusion detection [5].
Sales forecasting is important for enterprises to make business plans and gain competitive advantages. Different time series methods contribute to sales forecasting, but they only deal with traditional linear data and ignore nonlinear data [1]. To overcome this limitation of traditional methods, many researchers use soft computing techniques such as fuzzy neural networks, fuzzy logic, neural networks and evolutionary algorithms for robust sales forecasting. Different sales forecasting algorithms and statistical models have been developed to solve these problems, such as the ARIMA model, which can produce a forecast within a few seconds based on hundreds of historical data points [18]. However, these models cannot cope with complex data patterns. Although ANN-based algorithms can solve this problem, they take a long time to complete even simple sales forecasting when improved prediction accuracy is required. The ELM model greatly reduces the learning time of an ANN. ELM can learn much faster and with higher performance than traditional gradient-based learning algorithms, and it also avoids many of the difficulties faced by gradient-based learning methods, such as the learning rate, stopping criteria, over-tuning, learning epochs, and local minima. ELM is being used in real-time applications such as real-time control systems [12].
Sales forecasting is essential to supply chain management and operations between the retailer and manufacturers. The manufacturer needs to predict the actual future demand to inform production planning. Similarly, retailers need to predict sales to make purchasing decisions and minimize capital costs. So, it depends upon the end users: depending on the nature of the business, sales forecasting can be done through human planning, statistical models, or a combination of both methods. This paper used the Partial Recurrent Neural Networks (PRNN) statistical model for sales forecasting. The proposed methodology can extract patterns from past sales and facilitate future sales forecasting.

The aim of this research is to investigate the various sales forecasting methods used in the financial area and to evaluate the performance of the chosen machine learning algorithms in order to find the most suitable and efficient model for the chosen dataset. We have used machine learning-based regression models (Linear Regression, Random Forest and Extreme Gradient Boosting) and time series models (LSTM, ARIMA) for sales forecasting using the Citadel POS dataset. The results showed that Extreme Gradient Boosting outperformed the time series models and the other regression techniques.

2.       LITERATURE REVIEW
2.1.     Background
The supply chain contains different business parties that share physical goods, customer services related to the goods, and money. The supply chain is developed in two areas: supply chain execution and supply chain planning.

2.1.1.   Forecasting Concept
Forecasts are nothing but predictions. Forecasts of sunrise and sunset can be made without any mistakes, but that is not the scenario in business. Business conditions change as time goes on, and hence a prediction may contain errors. [13] describes a sales forecast as a projection of expected future demand, given a set of environmental conditions. We should not confuse the planning process with the forecasting process. Planning is the managerial action that should be taken to meet or exceed the sales forecast. The right forecast aims to predict demand perfectly. Forecasting is used in all kinds of companies, service sectors and government organizations as an input to planning a project or a set of activities. [8] summarizes the characteristics of sales forecasts as follows:
• Forecasts are always wrong; hence, one should always expect and evaluate errors in them.
• A long-term forecast is usually less accurate than a short-term forecast, because it has a more significant standard deviation of error relative to the mean than a short-term forecast.
• Aggregate forecasts are usually more accurate than disaggregate forecasts, since an aggregate forecast has a smaller standard deviation of error than disaggregate forecasts.
• The greater the distortion of information in the supply chain, the higher the errors in the sales forecast.

2.1.2.   Sales Forecasting Need in Planning
Manufacturing industries work on the principle of satisfying customer demand with an appropriate supply. According to [13], companies consider sales forecasting an integral part of this process. End customers create demand, and activities like promotions can increase it. Hence, marketing focuses on end customers to create demand. The sales department supports this through different strategies, such as servicing other parties in this stream, like wholesalers and retailers. Supply should be enough to meet demand, and different management functions like manufacturing, purchasing and logistics work together to maintain the supply.

2.1.3.   Forecasting Methods and Techniques
Several standardized methods for forecasting are available. They differ in their relative forecasting performance, the level of quantitative sophistication used, and the logic base (historical data, expert opinion, or surveys) from which the forecast is derived. These methods can be categorized into three groups: historical projection, qualitative, and causal [2].
[2] states that "when a reasonable amount of historical data is available and the trend and seasonal variations in the time series are stable and well defined, projecting these data into the future can be an effective way of forecasting for the short term". He also mentions that the quantitative nature of the time series supports the use of mathematical and statistical models as the primary forecasting tools. By using such tools, good accuracy can be reached for the forecasted periods. These methods are most appropriate when the environmental situation is stable and the primary demand pattern does not vary significantly from year to year.
According to [13], it is impossible to forecast every product with the same time series technique, which is why we need different time series techniques for each product. He also points out that many techniques are available in the general category of time series analysis. Time series techniques share a common characteristic: they are endogenous techniques, meaning that a time series technique looks only at the patterns of the actual sales history. These patterns can be identified and projected to derive a forecast. Time series techniques examine four basic time series patterns: level, trend, seasonality, and noise.

2.1.4.   Machine Learning Techniques
There are three main types of machine learning algorithms, i.e., supervised, unsupervised, and reinforcement learning.
In supervised learning, we are given a labelled dataset (labelled training data), and the desired outcome is already known; every pair of training data has a relationship. Supervised learning is where you have input variables (x) and an output variable (y), and you use an algorithm to learn the mapping function from the input to the output. Random forest, linear regression, and long short-term memory are supervised machine learning techniques [17].
In the unsupervised machine learning approach, the model is trained using unlabelled or non-classified data objects. The unsupervised learning approach is more complex than supervised learning because no labelled training dataset is used in this technique. Two main types of unsupervised machine learning are association rule mining and clustering [10].

2.1.5.   Association Rule Mining
Association rule mining is an unsupervised technique to identify underlying relations between different items. Take the example of a supermarket where customers can buy various items. For instance, mothers with babies buy baby products such as milk and diapers. In short, transactions involve a pattern [11].

2.1.6.   Clustering
Clustering is the task of dividing the population or data points into several groups such that data points in the same group are more similar to each other than to those in other groups. Simply put, clustering is to segregate groups with similar traits [19].
2.2.     Related Work
[3] predicted actual sales by using different machine learning algorithms, like linear regression and Random Forest Regression, and time series techniques, like ARIMA, seasonal ARIMA, non-seasonal ARIMA, and seasonal ETS. They used Walmart's public online sales data to predict sales using different regression algorithms in Azure Machine Learning (ML) Studio, while several time series analysis methods were implemented manually using R packages through the R programming language. They selected the best model to predict the sales, made a web service of that model, and deployed it on the Azure Cloud platform, which returned the output in JSON format. From the experimental results, the authors found that the regression techniques provide better performance compared to the time series analysis approaches. To overcome such problems, Multivariate Adaptive Regression Splines (MARS) is a suitable methodology for modelling complex nonlinear and non-parametric regression problems; it is powerful for building models on huge datasets such as electricity price forecasting, credit scoring and network intrusion detection.
[12] proposed a hybrid two-stage model using MARS and SVR, focusing on the drawbacks mentioned above, to predict sales accurately. To evaluate the performance of the proposed hybrid sales forecasting procedure, sales data for three IT products, i.e., notebook (NB), LCD monitor and motherboard (MB), collected from an IT chain store in Taiwan, were used as illustrative examples.
[16] proposed a model based on a Back Propagation Neural Network (BPNN) to improve sales forecasting by using popularity information of magazines obtained through the Google search engine. In the authors' view, popular content in a magazine can boost sales. In the proposed model, they used popular celebrity words as keywords to interact with the user for sales forecasting, and they used tools such as Digg, which allows users to submit links to news, to estimate the popularity of words. They used nonlinear historical data from Chinese magazine publications to evaluate the forecasting performance and showed that the proposed model can improve sales forecasting.
Recently, [6] proposed a new learning algorithm called Extreme Learning Machine (ELM) for single-hidden-layer feed-forward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of the SLFN. The authors predicted book sales for a famous e-commerce company in China by combining the ELM with a statistical method.
[14] proposed methodologies to extract patterns from past sales and to facilitate future sales forecasting. They used the Partial Recurrent Neural Network (PRNN) statistical model for sales forecasting as a tool for business planning, and then performed an empirical benchmark against the prevailing approaches in forecasting. Real-world sales series show non-linear patterns for different reasons, like trend, seasonality, or the introduction of a new product model; that is why PRNN, which can handle nonlinearity, is well suited to solving sales forecasting problems.
[9] used the Exponentially Weighted Moving Averages (EWMA) model to measure the seasonal impact on sales trends. They combined feature-cluster-related query algorithms and seasonal time series sales behaviour. They compared four models: query feature only, seasonal feature without EWMA, seasonal feature with EWMA, and the proposed model (seasonal feature with EWMA combined with the query feature), and the proposed model showed the best performance.

3.       METHODOLOGY
First, we reviewed the current relevant research, and the results of this literature review were used as input to our analysis of retail sales using machine learning techniques. Our main goal in this research is to evaluate the performance of machine learning models like linear regression, Random Forest regression, and Extreme Gradient Boosting regression on the sales data from the point of sale. Figure 4.1 shows the complete methodology of the proposed solution.

Figure 4.1: Methodology
This research work was performed using the Python programming language and multiple libraries, including pandas, NumPy, matplotlib, seaborn, and scikit-learn.

3.1.     Dataset Description
This paper presents a methodology implemented for a retail Point of Sale system with a set of |S| = 32 locations, starting in early 2007. The Citadel point-of-sale system holds all the records related to sales, and we collected the data from its different tables using SQL queries. Multiple stores contain different items for sale. Each store has five stations, and we took the history data of one customer, which has 228 invoices containing an average of five items each. We collected data from 2013 to 2018 and performed testing on the 2020 data. The training data contains the item ID, store number, total items sold, and total sales of each item, and comprises a total of 87847 rows.
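The paper does not list the exact SQL queries or table names used for this extraction. The snippet below is only a minimal sketch of how such a pull into a pandas DataFrame could look, assuming hypothetical invoice_lines, items and stores tables and a SQLite-style connection named citadel_pos.db.

```python
import pandas as pd
import sqlite3  # stand-in for the actual POS database driver

# Hypothetical connection and table/column names, for illustration only.
conn = sqlite3.connect("citadel_pos.db")

query = """
SELECT l.invoice_date,
       i.item_id,
       s.store_number,
       SUM(l.quantity)   AS total_items_sold,
       SUM(l.line_total) AS total_sales
FROM invoice_lines AS l
JOIN items  AS i ON i.item_id = l.item_id
JOIN stores AS s ON s.store_id = l.store_id
WHERE l.invoice_date BETWEEN '2013-01-01' AND '2018-12-31'
GROUP BY l.invoice_date, i.item_id, s.store_number
"""

# Pull the aggregated sales records into a DataFrame (~87,847 rows in the paper).
sales = pd.read_sql(query, conn)
print(sales.head())
```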
3.2.     Data Pre-Processing
We converted the data into daily, weekly and yearly series; we then checked for outliers and for missing or null values and removed all of them. In this way we refined the dataset before performing testing.
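The cleaning code itself is not shown in the paper. The following is a small sketch of how the resampling and outlier/missing-value removal described above could be done with pandas, under the assumption that the sales DataFrame from the earlier sketch has an invoice_date column and a numeric total_sales column; the IQR rule is one common outlier criterion, not necessarily the one the authors used.

```python
import pandas as pd

# Assume `sales` has an 'invoice_date' column and a numeric 'total_sales' column.
sales["invoice_date"] = pd.to_datetime(sales["invoice_date"])
sales = sales.dropna()  # drop rows with missing/null values

# Aggregate to daily, weekly and yearly totals.
daily = sales.set_index("invoice_date")["total_sales"].resample("D").sum()
weekly = daily.resample("W").sum()
yearly = daily.resample("Y").sum()

# Remove outliers with a simple IQR rule (an assumption; the paper does not
# state which outlier criterion was used).
q1, q3 = daily.quantile([0.25, 0.75])
iqr = q3 - q1
daily = daily[(daily >= q1 - 1.5 * iqr) & (daily <= q3 + 1.5 * iqr)]
```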
3.2.1.   Augmented Dickey-Fuller Test
This is the statistical test used to test the null hypothesis that a unit root is present in an autoregressive model. The test's null hypothesis is that the time series has a unit root and is not stationary (it has some time-dependent structure). The alternative hypothesis is that the time series is stationary [7].

3.2.2.   Null Hypothesis (H0)
If the null hypothesis fails to be rejected, it suggests the time series has a unit root, meaning it is non-stationary: it has some time-dependent structure.

3.2.3.   Alternative Hypothesis (H1)
If the null hypothesis is rejected, it suggests the time series does not have a unit root, meaning it is stationary: it does not have a time-dependent structure. The result can be interpreted as follows:
• p-value > 0.05: fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
• p-value <= 0.05: reject the null hypothesis (H0); the data does not have a unit root and is stationary.
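As a concrete illustration of this check (a sketch only; the paper does not include its test code), the ADF test is available in statsmodels, and the rolling mean and standard deviation used later for Figure 5.3 can be computed with pandas. The daily series name is carried over from the pre-processing sketch above.

```python
from statsmodels.tsa.stattools import adfuller

# `daily` is the daily total-sales series from the pre-processing sketch.
result = adfuller(daily.dropna())
adf_stat, p_value = result[0], result[1]
print(f"ADF statistic: {adf_stat:.4f}, p-value: {p_value:.5f}")

if p_value <= 0.05:
    print("Reject H0: the series has no unit root and is stationary.")
else:
    print("Fail to reject H0: the series is non-stationary; difference it before modelling.")

# Rolling statistics for a visual stationarity check (cf. Figure 5.3).
rolling_mean = daily.rolling(window=30).mean()
rolling_std = daily.rolling(window=30).std()
```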
3.3.     Feature Selection
Many factors play an important role in machine learning success, and feature selection is a significant factor that strongly influences machine learning model performance. It helps prevent overfitting by removing data redundancy, reduces the training time and improves the model's accuracy. We used the correlation method to address these problems: feature sets having negative correlations with the target variable were removed during the feature selection process.
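The exact selection code is not given in the paper. A minimal sketch of the correlation filter described above, assuming a feature DataFrame and a target column named total_sales (both hypothetical names), could look as follows.

```python
import pandas as pd

def select_by_correlation(df: pd.DataFrame, target: str = "total_sales") -> list[str]:
    """Keep only features whose correlation with the target is non-negative,
    mirroring the rule described in Section 3.3 (negatively correlated features removed)."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr[corr >= 0].index.tolist()

# Example usage with assumed column names:
# selected = select_by_correlation(sales[["total_items_sold", "store_number", "total_sales"]])
# X, y = sales[selected], sales["total_sales"]
```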
3.4.     Implementation
We implemented the following models and compared their performance:
• Linear Regression
• ARIMA
• Random Forest Regression
• LSTM
• Gradient Boosting Regression
Two metrics, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), are evaluated for each machine learning regression model. MAE is a standard measure of forecast error in time series analysis and one of the many metrics for evaluating a machine learning model's performance. It is a quantity used to measure how close forecasts or predictions are to the eventual outcomes; as the name suggests, it is the average of the absolute errors [4]. A lower error implies greater accuracy of the model.

MAE = (1/n) Σ |yi − ŷi|                (1)

where yi represents the actual values and ŷi represents the forecasted values.
Root Mean Square Error (RMSE) is the square root of the mean square error, i.e., the square root of the average of the squared differences between predictions and observations. A lower error implies greater accuracy of the model [4].

RMSE = √((1/n) Σ (yi − ŷi)²)           (2)

where yi represents the actual values and ŷi represents the forecasted values.
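For completeness, the small helper below shows how these two metrics can be computed with scikit-learn (one of the libraries listed above); the function and variable names are illustrative and are reused in the later sketches.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def report_errors(y_true, y_pred, label):
    """Print MAE (Eq. 1) and RMSE (Eq. 2) for one model's validation predictions."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(f"{label}: MAE={mae:.5f}, RMSE={rmse:.5f}")
    return mae, rmse
```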
4.       RESULTS AND DISCUSSION
To obtain reliable results, we first have to distinguish stationary from non-stationary time series. Non-stationary data is not predictable and cannot be modelled or forecasted reliably because its mean, variance, and correlation change over time. So, we must convert it into a stationary time series for reliable results.

4.1.     Citadel POS Dataset
The Citadel POS is a point-of-sale system operating in the US. There are 32 locations with different items for sale, and each item has a different price. Two types of customers visit these stores, i.e., loyalty and non-loyalty customers. Loyalty customers are regular customers who shop frequently, whereas non-loyalty customers are not regular and visit only occasionally. Fig 5.1 shows abstract store-wise data containing the total invoices, sales tax and grand total, including sales with and without tax.

Figure 5.1: Store Wise Sales

Fig 5.1 represents the overall 2019 store-wise sales data. The blue lines show the total invoices of each store, the red line represents the sales tax on each invoice, the green line shows the total sales of each store including tax, and the dark blue line represents the sales without tax. The figure shows that the Bell store has the highest sales for 2019.

Figure 5.2: Item-Wise Sales

Figure 5.2 represents item-wise sales of different stores. We took sales data from 2013 to 2018 covering the Ladies, Men, Shoes, Misc., Kids, Jewelry, Furniture, Electrical and Bins categories. The dataset contains 27343 rows with the attributes item id, item name, the total number of items sold, and the total sales of each item.

Figure 5.3: Stationary Data Checkpoint Using Rolling Mean and Rolling STD

Fig 5.3 shows the rolling mean and standard deviation of the sales data. The Dickey-Fuller test is also used to check for stationarity, and by visualizing the data we can check whether the data is stationary or not. Stationary data is data whose statistical properties, such as the mean, do not change with time. If the p-value is less than the significance level of 5%, or if the test statistic is more negative than the critical value, then our data is stationary. Here, our p-value is 5.70503.

4.2      Predictive Analysis
4.2.1    Linear Regression
Linear regression is a machine learning algorithm based on supervised learning. It is used to perform regression tasks and to predict the dependent variable (y) from the independent variable (x):

y = m*x + c                (3)

We trained our model with the sales data. First, we took the retail sales dataset from 2013 to 2018 to perform the prediction task and applied some pre-processing steps to it. After training on a small dataset, we made sure the approach was working fine and then performed the task on the large dataset.
Table 5.1 shows the Root Mean Squared Error and Mean Absolute Error obtained by Linear Regression on the validation set. In this table, RMSE is the standard deviation of the prediction error, a measure of how far the data points lie from the regression line, and its value is 0.96849,
while MAE measures the average magnitude of the error without considering the direction between the actual and predicted observations, and its value is 0.82136.

Table 5.1: Model Performance Using Linear Regression

            Index             Score
            RMSE              0.96849
            MAE               0.82136

Figure 5.4: Linear Regression Sales Forecasting

Figure 5.4 represents the actual and forecasted sales of the target variant obtained using the linear regression model, where the blue line represents the actual sales values and the red line represents the forecasted sales of the targeted variant.
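The paper does not list its training code. The following is a minimal scikit-learn sketch of a linear regression fit and evaluation, assuming hypothetical feature columns from the earlier sketches and the report_errors helper defined in Section 3.4; it is not the authors' exact pipeline.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumed feature/target columns (hypothetical names from the earlier sketches).
X = sales[["store_number", "total_items_sold"]]
y = sales["total_sales"]

# shuffle=False keeps the temporal order of the sales records for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)

lin_reg = LinearRegression().fit(X_train, y_train)
report_errors(y_val, lin_reg.predict(X_val), "Linear Regression")
```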
                                                    dataset LSTM model has been applied to predict
4.3      ARIMA Model                                the sales. It is used to predict the sales based on
This model is used to forecast sales; it’s the       previous history dataset of retail sales. For
statistical method for time series sales. There are   example, they predict sales to find patterns in the
the following ARIMA model parameters:               stock market’s data.
P: Trend autoregression order. D: Trend
difference order.                                    Table 5.3: LSTM Model Performance Results
Q: Trend moving average order
Four other differential seasonal elements are not
part of the ARIMA model. It can be handled                      Index              Score
using the SARIMA model like SARIMA (p, d, q)
(P, D, Q) m.                                                    RMSE             0.99964
Table 5.2 represents the Root Mean Squared
Error and Mean Absolute Error acquired by the
ARIMA model on the validation test. In this                      MAE              0.81910
table, RMSE is the standard deviation of the
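A minimal sketch of an ARIMA fit with statsmodels on the daily sales series from the earlier sketches; the (p, d, q) order and the 90-day hold-out shown here are illustrative assumptions, not the configuration behind the reported scores.

```python
from statsmodels.tsa.arima.model import ARIMA

# Train/validation split on the daily series (last 90 days held out, as an example).
train, valid = daily[:-90], daily[-90:]

# Illustrative order (p, d, q) = (5, 1, 0); the paper does not report its exact order.
arima = ARIMA(train, order=(5, 1, 0)).fit()
forecast = arima.forecast(steps=len(valid))

report_errors(valid, forecast, "ARIMA")
```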
4.4      LSTM Model
Sequence prediction problems have existed for a long time, and these kinds of problems make time series forecasting very difficult to solve. To handle the sequential structure of the dataset, the LSTM model has been applied to predict sales based on the previous history of retail sales; LSTM models are, for example, used to predict sales and find patterns in stock market data.

Table 5.3: LSTM Model Performance Results

            Index             Score
            RMSE              0.99964
            MAE               0.81910

Table 5.3 shows the Root Mean Squared Error and Mean Absolute Error obtained by Long Short-Term Memory (LSTM) regression on the validation set. In this table, RMSE is the standard deviation of the prediction error, a measure of how far the data points lie from the regression line, and its value is 0.99964, while MAE measures the average magnitude of the error without considering the direction between the actual and predicted observations, and its value is 0.81910.

Figure 5.6: LSTM Forecasting Performance

Figure 5.6 represents the actual and forecasted sales of the target variant obtained using the LSTM model, where the blue line represents the actual sales values and the red line represents the forecasted sales of the targeted variant.
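The paper does not specify its LSTM architecture or framework. The sketch below shows one plausible Keras setup on fixed-length windows of the daily sales series (scaling omitted for brevity); the window length, layer size and training settings are assumptions, not the authors' configuration.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series: np.ndarray, lookback: int = 30):
    """Turn a 1-D series into (samples, lookback, 1) windows and next-step targets."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., np.newaxis], np.array(y)

values = daily.values.astype("float32")          # daily sales from the earlier sketches
X_seq, y_seq = make_windows(values, lookback=30)

model = Sequential([
    LSTM(64, input_shape=(30, 1)),               # illustrative layer size
    Dense(1),
])
model.compile(optimizer="adam", loss="mae")
model.fit(X_seq, y_seq, epochs=20, batch_size=32, validation_split=0.2, verbose=0)
```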
4.5.     Random Forest Regression
We used the Random Forest regression model to improve our results. Random forest is a supervised machine learning technique that uses a decision-tree mechanism when training the model, and it is used to improve the computational power. It builds multiple models (decision trees) using random training samples drawn with replacement, computes the accuracy of each model, and increases the weight of the models that have the highest accuracy.

Table 5.4: Random Forest Performance Results

            Index             Score
            RMSE              0.69460
            MAE               0.59121

Table 5.4 shows the Root Mean Squared Error and Mean Absolute Error obtained by Random Forest Regression on the validation set. In this table, RMSE is the standard deviation of the prediction error, a measure of how far the data points lie from the regression line, and its value is 0.69460, while MAE measures the average magnitude of the error without considering the direction between the actual and predicted observations, and its value is 0.59121.

Figure 5.7: Random Forest Regression Sales Forecasting

Figure 5.7 represents the actual and forecasted sales of the target variant obtained using the Random Forest regression model, where the blue line represents the actual sales values and the red line represents the forecasted sales of the targeted variant.
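A minimal scikit-learn sketch of the Random Forest fit, reusing the hypothetical train/validation split and report_errors helper from the earlier sketches; the regressor is left at its defaults, in line with the paper's note that models were run with their basic configuration and default parameters.

```python
from sklearn.ensemble import RandomForestRegressor

# Defaults kept, matching the paper's "default parameters" note; random_state fixes reproducibility.
rf = RandomForestRegressor(random_state=42)
rf.fit(X_train, y_train)

report_errors(y_val, rf.predict(X_val), "Random Forest")
```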
4.6.     Extreme Gradient Boosting Regression
The Xgboost algorithm involves three concepts: extreme, gradient, and boosting. Starting from the basics, boosting is a systematic ensemble method that aims to convert weak learners (regression trees in this case, since a tree-based Xgboost model is used; there is also a linear type) into stronger learners to obtain more accurate predictions. The SMAPE error score is 10.14%.
Table 5.5 shows the Root Mean Squared Error and Mean Absolute Error obtained by Gradient Boosting Regression on the validation set. In this table, RMSE is the standard deviation of the prediction error, a measure of how far the data points lie from the regression line, and its value is 0.63010, while MAE measures the average magnitude of the error without considering the direction between the actual and predicted observations, and its value is 0.51599.

Table 5.5: Xgboost Model Performance Results

            Index             Score
            RMSE              0.63010
            MAE               0.51599

Figure 5.8: Xgboost Model Sales Forecasting

Figure 5.8 represents the target variant's actual and forecasted sales obtained using the Gradient Boosting regression model, where the blue line represents the actual sales values and the red line represents the forecasted sales of the targeted variant.
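A minimal sketch using the xgboost library's scikit-learn interface on the same hypothetical split; as with the other sketches, this shows the general approach rather than the exact setup behind the reported scores.

```python
from xgboost import XGBRegressor

# The paper reports using the models' basic configuration and default parameters,
# so the regressor is left at its defaults here as well.
xgb = XGBRegressor(objective="reg:squarederror", random_state=42)
xgb.fit(X_train, y_train)

report_errors(y_val, xgb.predict(X_val), "Xgboost")
```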
4.7.     Performance Evaluation and Comparison Results
We implemented different machine learning algorithms on the retail sales dataset and used two evaluation metrics, Root Mean Squared Error and Mean Absolute Error, to check the performance of the different machine learning models. When we compared all the models, we concluded that Xgboost is the most suitable model for our retail sales dataset based on the performance evaluation of all the models.

Figure 5.9: Model Prediction Comparison

Figure 5.9 represents the comparison of the different models' predictions. We implemented the different models on the sales dataset with more than 87746 rows. Here, the blue line shows the original values, the red line represents the linear regression results, the green line represents the Random Forest regression, and the orange line represents the Xgboost results. Xgboost is the most suitable model to predict future sales and shows the closest prediction values compared to the other models, like LSTM, Linear Regression, and Random Forest Regression.
Table 5.6 shows the results and performance of the models with their basic configuration and default parameters. From this table, it is clear that Gradient Boosting regression and Random Forest performed well on both metrics, RMSE and MAE; Xgboost has the smallest error in sales forecasting compared to linear regression, ARIMA, and LSTM, while the ARIMA model showed the worst performance, with the highest error on both metrics.

Table 5.6: Regression Model Error Comparison

 Sr.#   Index               RMSE       MAE
 0      Random Forest       0.69460    0.59121
 1      Linear Regression   0.96849    0.82136
 2      ARIMA               1.04959    1.01265
 3      LSTM                0.99964    0.81910
 4      Xgboost             0.63010    0.51599

Figure 5.10: Comparison of Machine Learning Model Error Results

Figure 5.10 represents the Mean Absolute Error and Root Mean Squared Error from the results of the linear regression, ARIMA, LSTM, Random Forest and Xgboost models. From the figure, Xgboost has the lowest RMSE and MAE and is the best-performing algorithm for the point-of-sale retail sales data.
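As a small illustration of how such a comparison can be assembled and plotted with pandas and matplotlib (the original figures are not reproduced here), using the error values from Table 5.6:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Error values taken from Table 5.6.
comparison = pd.DataFrame(
    {
        "RMSE": [0.69460, 0.96849, 1.04959, 0.99964, 0.63010],
        "MAE":  [0.59121, 0.82136, 1.01265, 0.81910, 0.51599],
    },
    index=["Random Forest", "Linear Regression", "ARIMA", "LSTM", "Xgboost"],
)

# Bar chart comparable to Figure 5.10: lower bars indicate better models.
comparison.plot(kind="bar", rot=45, title="Model error comparison (RMSE / MAE)")
plt.tight_layout()
plt.show()
```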
5.       CONCLUSION
In this paper, we concluded that sales forecasting is the most challenging task for inventory management, marketing, customer service and business financial planning for the information technology chain store. Sales forecasting is an important part of supply chain management and operations between retailers and manufacturers. The manufacturer needs to predict the actual future demand to inform production planning; similarly, retailers need to predict sales for purchasing decisions and to minimize capital costs. Therefore, depending on the nature of the business, sales forecasting can be done through human planning, statistical models, or a combination of both methods. Developing an accurate sales forecasting model is challenging for reasons like over- and under-forecasting. Accurate and robust sales forecasting results can lead to customer satisfaction, enhanced channel relationships, and significant monetary savings. We applied time series models, like the LSTM and ARIMA models, and machine learning algorithms, like the Linear Regression, Random Forest and Xgboost models, to predict sales, and we found that Xgboost is the most suitable model for the Citadel POS dataset. In future, a deep learning approach can be used for sales forecasting by increasing the dataset size; similarly, deep learning models can increase accuracy on large retail sales datasets.
REFERENCES
[1] Álvarez-Díaz, Marcos, Manuel González-Gómez, and María Soledad Otero-Giráldez. "Forecasting international tourism demand using a non-linear autoregressive neural network and genetic programming". Forecasting, Vol. 1, Issue 1, p. 7, 2018.
[2] Ballon, R. Business Logistics/Supply Chain Management: Planning, Organizing and Controlling the Supply Chain, 2004.
[3] Catal, C., Kaan, E., Arslan, B. and Akbulut, A. "Benchmarking of regression algorithms and time series analysis techniques for sales forecasting". Balkan Journal of Electrical and Computer Engineering, Vol. 7, pp. 20-26, 2019.
[4] Chai, Tianfeng, and Roland R. Draxler. "Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature". Geoscientific Model Development, Vol. 7, Issue 3, pp. 1247-1250, 2014.
[5] Deo, Ravinesh C., Ozgur Kisi, and Vijay P. Singh. "Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model". Atmospheric Research, Vol. 184, pp. 149-175, 2017.
[6] Feng, Guorui, Guang-Bin Huang, Qingping Lin, and Robert Ga. "Error minimized extreme learning machine with growth of hidden nodes and incremental learning". IEEE Transactions on Neural Networks, Vol. 20, Issue 8, pp. 1352-1357, 2009.
[7] Glynn, J., Perera, N. and Verma. "Unit root tests and structural breaks: A survey with applications", 2007.
[8] Hofmann, E. "Supply Chain Management: Strategy, Planning and Operation, S. Chopra, P. Meindl. Elsevier Science", 2013.
[9] Holt, Charles C. "Forecasting seasonals and trends by exponentially weighted moving averages". International Journal of Forecasting, Vol. 20, Issue 1, pp. 5-10, 2004.
[10] Hussain, Sadiq, Rasha Atallah, Amirrudin Kamsin, and Jiten Hazarika. "Classification, clustering and association rule mining in educational datasets using data mining tools: A case study". Computer Science On-line Conference, Vol. 3, Issue 7, pp. 196-211, Springer, 2018.
[11] Kaur, Manpreet and Kang, Shivani. "Market Basket Analysis: Identify the changing trends of market data using association rule mining". Procedia Computer Science, Vol. 85, pp. 78-85, 2016.
[12] Lu, Chi-Jie. "Sales forecasting of computer products based on variable selection scheme and support vector regression". Neurocomputing, Vol. 128, pp. 491-499, 2014.
[13] Mentzer, J. T. and Moon, M. A. Sales Forecasting Management: A Demand Management Approach, Sage Publications, 2004.
[14] Müller-Navarra, M., Lessmann, S. and Voß, S. "Sales forecasting with partial recurrent neural networks: Empirical insights and benchmarking results". 48th Hawaii International Conference on System Sciences, IEEE, pp. 1108-1116, 2015.
[15] Ofoegbu, Kenneth. A Comparison of Four Machine Learning Algorithms to Predict Product Sales in a Retail Store. Dublin Business School, 2021.
[16] Omar, Hani A., and Duen-Ren Liu. "Enhancing sales forecasting by using neuro networks and the popularity of magazine article titles". Sixth International Conference on Genetic and Evolutionary Computing, IEEE, pp. 577-580, 2012.
[17] Pavlyshenko, Bohdan M. "Machine-learning models for sales time series forecasting". Data, Vol. 4, p. 15, 2019.
[18] Shumway, Robert H., and David S. Stoffer. "State space models". In Time Series Analysis and Its Applications: With R Examples, pp. 89-384, 2017.
[19] Sinaga, Kristina P., and Miin-Shen Yang. "Unsupervised K-means clustering algorithm". IEEE Access, Vol. 8, pp. 80716-80727, 2020.