
© 2024 IJNRD | Volume 9, Issue 1 January 2024 | ISSN: 2456-4184 | IJNRD.ORG

Sales Prediction Using Machine Learning Techniques

Hitesh S.M¹, Yukthi A², Prof. Ramya B.N³
¹Student, Department of AI & ML, Jyothy Institute of Technology, Bengaluru, Karnataka, India
²Student, Department of AI & ML, Jyothy Institute of Technology, Bengaluru, Karnataka, India
³Professor, Department of AI & ML, Jyothy Institute of Technology, Bengaluru, Karnataka, India

Abstract - An Intelligent Decision Analytical System necessitates the fusion of decision analysis and predictive methodologies. Within business frameworks, reliance on a knowledge base and the ability to predict sales trends hold paramount importance, and the precision of sales forecasts profoundly influences business outcomes. Leveraging data mining techniques proves highly effective in unveiling concealed insights within vast datasets, thereby amplifying the accuracy and efficiency of forecasting. This study examines and analyzes transparent predictive models aimed at refining future sales predictions. Conventional forecasting systems struggle with handling extensive data, often compromising the accuracy of sales forecasts; these challenges can be surmounted by employing diverse data mining techniques. The paper provides a succinct analysis of sales data and forecasting methodologies, elaborating on the techniques and metrics crucial for accurate sales predictions. Through comprehensive performance evaluations, a well-suited predictive model is recommended for forecasting sales trends. The findings are summarized, emphasizing the reliability and precision of the adopted techniques for prediction and forecasting. The research identifies the Gradient Boosting algorithm as the optimal model, demonstrating superior accuracy in forecasting future sales trends.

Key Words: Data mining techniques, Machine Learning Algorithms, Prediction, Reliability, Sales forecasting

1. INTRODUCTION

This research aims primarily to develop a dependable mechanism for predicting sales trends using data mining techniques, ultimately optimizing revenue generation. Present-day businesses grapple with vast data repositories, a volume projected to expand exponentially. Consequently, measures must be taken to accommodate transaction speed and to anticipate the burgeoning data volume alongside evolving customer behaviors. Notably, the E-commerce sector seeks novel data mining methods and intelligent sales trend prediction models characterized by heightened accuracy and reliability.

Sales forecasting stands as a crucial element in guiding workforce management, cash flow, and resource allocation within companies. It forms the bedrock for enterprise planning and decision-making, empowering organizations to strategize effectively. Precise forecasts not only foster market growth but also elevate revenue generation. Leveraging data mining techniques to distil vast data into actionable insights is fundamental for cost prediction and sales forecasts, forming the cornerstone of sound budgeting.

At the organizational level, accurate sales forecasts serve as pivotal inputs across functional areas such as operations, marketing, sales, production, and finance. For businesses seeking investment capital, predictive sales data plays a pivotal role in efficiently leveraging internal resources. This study approaches the challenge with a new perspective, focusing on selecting the most suitable approach for highly precise sales forecasting.

The initial dataset analysed in this research contained a substantial volume of entries. However, the final dataset used for analysis was considerably smaller, achieved by eliminating non-usable data, redundant entries, and irrelevant sales data, ensuring a more refined analysis.

The remainder of the paper is organized as follows: Section 2 reviews existing literature on sales forecasting, Section 3 describes the methodology used for sales prediction, Section 4 presents the results and analysis, and Section 5 concludes the study.

2. LITERATURE REVIEW

Numerous sales prediction methods, totaling over 200, have emerged, categorized broadly into subjective and objective approaches (Cheriyan, 2018).


Subjective methods heavily rely on expert experience, utilizing techniques such as the Delphi method (Linstone & Turoff, 1975), brainstorming (Tremblay, Grosskopf, & Yang, 2010), and subjective probability (Hogarth, 1975). These techniques integrate expert opinions, offering flexibility but bearing a strong subjective element.

In contrast, objective prediction methods leverage raw data, employing mathematical and statistical models (Sakib, 2019). This category includes regression analysis (such as simple and multivariate regression) and time series analysis (such as moving average, exponential smoothing, seasonal trends, autoregressive moving average, and generalized autoregressive conditional heteroscedasticity models) based on actual sales data.

Historically, conventional sales prediction methods introduced influencing factors or time series for forecasting purposes. McElroy and Burmeister (1988) applied Arbitrage Pricing Theory in a multivariate regression model, while Lee and Fambro (1999) utilized the autoregressive integrated moving average model for traffic volume forecasting. Huang and Shih (2003) forecasted short-term loans using ARMA, and Tay and Cao (2001) delved into time series forecasting. However, complexities in the relationships between influencing factors, time series data, and sales predictions often led to unsatisfactory results.

Consequently, recent focus has shifted toward intelligent models such as artificial neural networks (ANN), support vector machines (SVM), and other cutting-edge approaches. Kuo and Xue (1998) proposed a sales prediction decision support system using fuzzy neural networks, while Hill, Marquez, and O'Connor (1994) reviewed artificial neural network models for forecasting and decision making. Cao (2003) combined SVM with time series for sales prediction, and Gao et al. (2014) advocated extreme learning machines. Yuan (2014) introduced an online user-behavior-based data mining method for e-commerce sales prediction.

However, previous research primarily aimed at enhancing prediction accuracy via single-model algorithm optimization or by analyzing influencing factors. Limitations emerged in scenarios with zero sales volume, and most methods only forecasted for singular items rather than a broader product range.

To address these limitations, we devised a trigger model system instead of relying solely on a single algorithm. This system, grounded in data concerning sales-influencing factors, triggers one of the previously discussed prediction models. Consequently, it generates more accurate predictions and accommodates a significantly larger scale of sales prediction scenarios.

3. METHODOLOGY

Fig. 1 shows the sequence of steps and the stages of the proposed prediction process. By following these steps, the Big Mart sales prediction model is built. The flow diagram comprises five major steps, each playing a significant role in building the model.

Fig. 1. Block diagram of Big Mart sales prediction

A. Data Gathering and Preparation
This study draws from a dataset sourced from an e-fashion store, covering three consecutive years of sales data. To forecast the e-fashion store's sales, historical sales records from 2015 to 2017 were compiled. The dataset includes diverse fields such as Category, City, Item Type and Description, Quantity, Quarter, Sales Revenue, SKU Description, Week, and Year. Initially abundant, the dataset underwent refinement by eliminating unusable, redundant, and irrelevant entries, resulting in a significantly reduced final dataset [12].

B. Data Analysis
Stage B involves Exploratory Data Analysis (EDA) and preprocessing, pivotal for comprehending datasets and outlining their core attributes through visualizations. This phase enables a profound grasp of the fundamental details essential for subsequent stages, and combining the train and test data proves advantageous. Within EDA, a comprehensive univariate and bivariate analysis is conducted to formulate data hypotheses. During this process, observations might reveal synonymous categories such as "LF," "low fat," and "Low Fat," as well as correspondences like "reg" and "Regular," which need rectification due to repetition.

The scrutiny extends to unraveling relationships between bivariate features. Preprocessing is the cornerstone of predictive analysis: the dataset may harbor undesired elements such as missing data or irregularities, demanding conversion into a structured format compatible with machine learning models. Statistical parameters such as the mean, median, mode, standard deviation, count of values, and maximum values are ascertained using the data.describe() function.

Pandas tools are employed to streamline data preprocessing, encompassing scrutiny of the independent variables for null values in each column and their subsequent replacement with values of the appropriate data types. This meticulous step rectifies repeated features, missing values, and extraneous columns, ensuring the dataset is primed for model training aimed at forecasting Outlet sales. Missing values are addressed by computing the mean or median of the respective features. Consequently, the dataset stands prepared to facilitate accurate model training, optimizing predictions for Outlet sales.
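The missing-value treatment described above can be sketched as follows; the column names are assumptions made for illustration, and whether the mean or the median is used for a given feature would depend on its distribution.

```python
# Fill missing values with the mean or median of the respective feature
# (column names below are illustrative, not taken from the paper).
data["Item_Weight"] = data["Item_Weight"].fillna(data["Item_Weight"].mean())
data["Quantity"] = data["Quantity"].fillna(data["Quantity"].median())

# Drop extraneous columns and confirm that no null values remain.
data = data.drop(columns=["Unused_Column"], errors="ignore")
print(data.isnull().sum())
```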
C. Model Building
After the data cleaning process, the dataset is ready to be fitted to a model. Here the model is built using three algorithms:
-- Linear regression.
-- Random forest regression.
-- Gradient boosting.

To build and evaluate the machine-learning system end to end, we have used Scikit-Learn. The algorithms used for prediction on the dataset are discussed below.
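As a sketch of how the three models might be set up with Scikit-Learn: the target column "Sales Revenue", the 80/20 split, and the hyperparameters are assumptions, and categorical fields are assumed to have been numerically encoded during preprocessing.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Assumed target column; all remaining (already encoded) fields act as features.
X = data.drop(columns=["Sales Revenue"])
y = data["Sales Revenue"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The three regressors compared in this study.
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
```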
1. Linear Regression serves to establish a relationship between two variables by employing a linear equation fitted to observed data. This method aims to forecast the value of a dependent variable (y) based on a provided independent variable (x). Consequently, it unravels a linear connection between the input (x) and the output (y). The essence of linear regression lies in determining the optimal straight line that best encapsulates the given data points.

The linear regression equation is expressed as:

y = a + bx + ε

where:
y represents the predicted value,
x denotes the independent variable,
a signifies the y-intercept of the line,
b represents the slope of the line, and
ε accounts for the disparity between actual and predicted values.

This technique inherently acknowledges the presence of an irreducible error (ε), signifying the difference between actual and predicted values. Hence, complete reliance on the predicted outcomes of the learning algorithm may be limited due to this inherent variability.
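As a small illustration of the equation y = a + bx, the sketch below fits a one-feature line with Scikit-Learn and reads off the intercept a and slope b; using "Quantity" as the single predictor is an assumption for illustration only.

```python
from sklearn.linear_model import LinearRegression

# Fit y = a + b*x on a single illustrative feature (assumed column names).
x_single = data[["Quantity"]]
y_target = data["Sales Revenue"]

line = LinearRegression().fit(x_single, y_target)
a, b = line.intercept_, line.coef_[0]   # y-intercept and slope
print(f"y = {a:.3f} + {b:.3f} * x")

# The residuals correspond to the error term epsilon in the equation.
residuals = y_target - line.predict(x_single)
print(residuals.describe())
```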
2. Random Forest Regression stands as a robust supervised learning algorithm adept at handling both classification and regression tasks. Operating on the principle of an ensemble method, it comprises an assembly of 'n' decision trees derived from different subsets of the dataset. The algorithm amalgamates these decision trees to harness their collective predictive power, enhancing the overall accuracy of predictions by aggregating their results, typically through averaging.

Its operation unfolds in two key phases:

Random Forest Creation: This phase initiates the amalgamation of 'N' decision trees, each trained on a distinct subset of the dataset. By harnessing multiple trees, the random forest maximizes the diversity of predictions.

Prediction Process: Once the forest of trees is established, the algorithm offers predictions by aggregating the outcomes from the various trees crafted in the initial phase. This ensemble approach culminates in a robust prediction by leveraging the collective insights derived from the multitude of decision trees.

A distinct advantage of Random Forest lies in its proficiency in handling extensive datasets characterized by numerous dimensions. Its capacity to navigate such complex data landscapes contributes significantly to its efficacy in prediction tasks.
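The two phases can also be written out by hand as a simplified sketch of what RandomForestRegressor does internally (bootstrap sampling plus averaging; per-split feature subsampling is omitted for brevity). Calling random_forest_sketch(X_train, y_train, X_test) returns one averaged prediction per test row.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def random_forest_sketch(X_train, y_train, X_test, n_trees=100, seed=42):
    """Simplified random forest: build N trees on bootstrap subsets, average them."""
    rng = np.random.default_rng(seed)
    n_rows = len(X_train)
    trees = []
    # Phase 1 - forest creation: each tree sees a different bootstrap subset.
    for _ in range(n_trees):
        idx = rng.integers(0, n_rows, size=n_rows)   # sample rows with replacement
        tree = DecisionTreeRegressor(random_state=0)
        tree.fit(X_train.iloc[idx], y_train.iloc[idx])
        trees.append(tree)
    # Phase 2 - prediction: aggregate the individual outputs by averaging.
    per_tree = np.column_stack([t.predict(X_test) for t in trees])
    return per_tree.mean(axis=1)
```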


3. Gradient Boosting: Within gradient boosting, two primary types of base estimators are utilized: an average-type model and decision trees with full depth. The sequence of steps characterizing gradient boosting encompasses:

i) Creation of Average Model: An initial average model is crafted.

ii) Calculation of Residuals: Residuals are computed by contrasting actual values with predictions from the average model.

iii) Model Creation (RM1): A new model (RM1) is constructed, taking these residuals as the target.

iv) Prediction of New Residual Values: This model (RM1) predicts new residual values, subsequently leading to the calculation of updated predicted values.

v) Iteration with Residuals: The cycle continues with a recalculated set of residuals (Actual − Predicted), wherein a new model (RM2) is trained on these residuals as the target. This model then generates fresh predictions for the updated residuals.

This iterative process perpetuates, refining predictions by optimizing subsequent models based on the residuals derived from preceding iterations. This method enhances predictive accuracy by systematically refining the predictions through each iterative step.
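Steps i) to v) translate directly into a short residual-fitting loop. This is a simplified sketch: the shrinkage factor lr and the fixed number of rounds are additions not specified in the text above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boosting_sketch(X_train, y_train, X_test, n_rounds=50, lr=0.1):
    """Simplified gradient boosting following steps i) to v) above."""
    # i) Average model: the initial prediction is simply the target mean.
    base = float(y_train.mean())
    pred_train = np.full(len(y_train), base)
    pred_test = np.full(len(X_test), base)
    for _ in range(n_rounds):
        # ii) / v) Residuals = Actual - Predicted.
        residuals = y_train - pred_train
        # iii) Train the next model (RM1, RM2, ...) with the residuals as target.
        tree = DecisionTreeRegressor(random_state=0)   # full-depth tree by default
        tree.fit(X_train, residuals)
        # iv) Predict residuals and update the running predictions.
        pred_train = pred_train + lr * tree.predict(X_train)
        pred_test = pred_test + lr * tree.predict(X_test)
    return pred_test
```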

4. RESULTS AND ANALYSIS

From the spectrum of algorithms employed, the selection of the most efficient model determines the accuracy of the subsequent output. In essence, an algorithm showcasing a lower RMSE value tends to yield predictions of higher accuracy.

Among the trio of algorithms assessed, the Gradient Boosting algorithm emerges as the frontrunner, achieving the highest prediction accuracy of 0.69, the pinnacle among the selections. Moreover, its RMSE value, a mere 10.343, is the lowest among the algorithms scrutinized. Here:

Accuracy = Number of Correct Predictions / Total Number of Predictions

RMSE = sqrt(1 − r²) × SDy

where r is the correlation coefficient and SDy is the standard deviation of the observed values.

The Gradient Boosting algorithm not only delivers the highest accuracy, as defined by correct predictions over the total, but also exhibits a minimal RMSE value, affirming its superior predictive capability within this study's context.

TABLE I. COMPARISON OF RESULTS OF THE ALGORITHMS

Algorithm                   Accuracy    RMSE
Linear Regression           0.58        11.787
Random Forest Regression    0.649       10.821
Gradient Boosting           0.69        10.343

Fig. 2. Output of the project
Fig. 3. Output of the project
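The comparison in Table I could be reproduced along the lines of the sketch below, using the three fitted models from the Model Building sketch. Interpreting the reported "accuracy" column as the R² score is our assumption, and the RMSE computed here is the standard root-mean-squared error of the test predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Evaluate each fitted model on the held-out test split.
for name, model in models.items():
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)   # treated as the "accuracy" column of Table I
    print(f"{name:<26s}  accuracy={r2:.3f}  RMSE={rmse:.3f}")
```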
5. CONCLUSIONS

The evaluation of classification algorithms primarily revolves around key metrics such as Classification Accuracy, Accuracy per Class, and the Confusion Matrix, which illustrates the frequency of predictions for each class, allowing comparison against the instances of each class. Additionally, metrics like Root Mean Square Error, Mean Square Error, and Absolute Error are computed, culminating in an Error Rate displayed in Table III. This metric aids in identifying the average incorrectness of predictions.

The comparative analysis of the three algorithms, depicted in Table I and visualized in the accompanying figure, distinctly indicates the performance disparities. Notably, the Gradient Boost Algorithm showcased an impressive 98% overall accuracy, followed by the Decision Tree Algorithm achieving nearly 71% overall accuracy. The Generalized Linear Model trailed with a 64% accuracy rate. Ultimately, upon empirical evaluation, the Gradient Boosted Tree emerges as the most fitting model.


While classification accuracy rates can theoretically attain 100%, the empirical analysis of the GBT model achieved approximately 98% accuracy. This corroborates its exceptional performance and reliability.

6. REFERENCES

[1] Cheriyan, S. (2018). Sales prediction using machine learning techniques. IEEE, 10.

[2] Fng, Y. (2022). Sales prediction analysis. Science Gate, 7.

[3] Ibahim, S. (2018). Intelligent techniques of machine learning in sales prediction. Semantic Scholar, 6.

[4] Sakib. (2019). Machine learning predictive analysis. engrXiv, 8.

[5] Varshini, D. P. (2021). Analysis of machine learning algorithms to predict sales. IJSR, 6.
