A REPORT ON
“Identifying Data Mining Tasks and Performing them on
the given dataset”
SUBMITTED BY
Karan Parge SeatNo:B1908004222
Niraj Mahanta SeatNo:B1908004235
Rajat Potgantiwar SeatNo:B1908004248
Mayuresh Sontankke SeatNo:B1908004260
DEPARTMENT OF COMPUTER ENGINEERING
TSSM’s BHIVARABAI SAWANT COLLEGE OF
ENGINEERING AND RESEARCH, NARHE, PUNE-41
SAVITRIBAI PHULE PUNE UNIVERSITY
2024-2025
CERTIFICATE
This is to certify that the project report entitled
“Identifying Data Mining Tasks and Performing them on the given
dataset”
Submitted by
Karan Parge SeatNo:B1908004222
Niraj Mahanta SeatNo:B1908004235
Rajat Potgantiwar SeatNo:B1908004248
Mayuresh Sontankke SeatNo:B1908004260
This report has been approved as it satisfies the academic requirements in respect of the mini-project work prescribed for the course Laboratory Practice IV (Computer Engineering).
Prof. S. S. Bhagat (Guide)
Dr. A. D. Gujar (Head of Department)
ACKNOWLEDGEMENT
It gives us great pleasure to present this preliminary mini-project report on “Identifying Data Mining Tasks and Performing them on the given dataset”.
We would like to express our sincere gratitude to our guide, Prof. S. S. Bhagat, whose invaluable guidance, support, and contributions have been instrumental in the successful completion of this project.
We are particularly grateful to Dr. A. D. Gujar, Head of the Department of Computer Engineering, for providing data, expertise, and resources. Their encouragement and mentorship have been invaluable throughout this journey, and we are truly indebted to them for their support.
We would also like to acknowledge everyone who, although not directly involved in the project, provided indirect support or inspiration.
ABSTRACT
Data is one of the most essential commodities for any organization in the 21st century.
Harnessing data and utilizing it to create effective marketing strategies and make informed
decisions is crucial for organizations. For a conglomerate as large as Walmart, organizing and
analyzing the vast volumes of data generated is necessary to understand existing performance
and identify growth potential. The primary objective of this project is to analyze how various
factors influence sales for Walmart and leverage these insights to develop more efficient plans
and strategies aimed at increasing revenue.
This paper investigates the performance of a subset of Walmart stores and forecasts future
weekly sales using several models, including linear regression, lasso regression, random forest,
and gradient boosting. An exploratory data analysis was conducted on the dataset to assess the
impact of factors such as holidays, fuel prices, and temperature on Walmart's weekly sales.
Additionally, a Power BI dashboard was created to visualize predicted sales information for
each store and department, providing an overview of overall predicted sales trends.
The analysis revealed that the gradient boosting model yielded the most accurate sales
predictions, and notable relationships were observed between factors such as store size,
holidays, unemployment rate, and weekly sales. Implementation of interaction effects within
linear models highlighted relationships between combinations of variables like temperature,
Consumer Price Index (CPI), and unemployment, directly impacting sales for Walmart stores.
TABLE OF CONTENTS
1. INTRODUCTION
1.1 TOOLS AND TECHNOLOGIES APPLIED
2. PROBLEM STATEMENT
3. METHODOLOGY
4. ABOUT THE DATASET
4.1 EXPLORATORY DATA ANALYSIS
4.2 CORRELATION MATRIX
5. DATA CLEANING AND PREPROCESSING
6. MODEL SELECTION AND IMPLEMENTATION
7. BUILDING BI DASHBOARD
8. CONCLUSION
9. REFERENCES
CHAPTER 1
INTRODUCTION
The 21st century has witnessed an explosion of data generated from the widespread adoption
of advancing technologies. Retail giants like Walmart view this data as their most valuable
asset, enabling them to predict future sales, understand customer behavior, and formulate
strategies to drive profits and maintain competitiveness. Walmart, an American multinational
retail corporation with nearly 11,000 stores across 27 countries and over 2.2 million associates,
relies on extensive data analytics to support its "everyday low prices" promise and annual
revenue of nearly $500 billion.
Walmart's diverse product range spans groceries, home furnishings, personal care items,
electronics, clothing, and more, generating substantial consumer data that fuels predictive
analytics for customer buying patterns, sales forecasts, promotional planning, and innovative
in-store technologies. Embracing modern technological approaches is vital for Walmart's
success in today's dynamic global market, enabling the company to develop distinctive
products and services that set it apart from competitors.
This research focuses on predicting Walmart's sales based on historical data and investigating
whether factors such as temperature, unemployment, fuel prices, and holidays impact the
weekly sales of specific stores under study. By understanding variations in sales during
holidays like Christmas and Thanksgiving compared to regular days, Walmart can tailor
promotional offers to drive sales and increase revenue.
Walmart strategically schedules promotional markdown sales after major U.S. holidays,
underscoring the importance of assessing their impact on weekly sales to guide resource
allocation toward key initiatives. Understanding user preferences and buying patterns is critical
for enhancing customer retention and demand, ultimately driving profitability. Insights from
this study will inform Walmart's resource allocation based on regional demand and profitability
throughout the year.
Furthermore, leveraging big data analytics enables efficient analysis of historical data to
identify at-risk stores, predict future sales, assess organizational performance, and ensure
strategic alignment.
This study utilizes SQL, R, Python, and Power BI to analyze the dataset provided by Walmart
Recruiting on Kaggle ("Walmart Recruiting - Store Sales Forecasting," 2014). The research
involves modeling and exploratory data analysis in R and Python, aggregation and querying
using SQL, and the creation of a final dashboard in Power BI.
1.1 TOOLS AND TECHNOLOGIES APPLIED
The analysis for this study was conducted using key tools such as R, Python, and Power BI, with specific tasks performed in development environments like RStudio and PyCharm. Various packages were utilized to facilitate the initial Exploratory Data Analysis (EDA) and finalize the outcomes. For the initial EDA, a combination of R and Python libraries was employed, including inspectdf, ggplot2, plotly, caret, matplotlib, and seaborn. Packages like numpy, pandas, and the tidyverse were used for data wrangling and manipulation.
For model creation, several packages such as scikit-learn, xgboost, and others were applied to
develop and evaluate predictive models based on the analyzed data.
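As an illustrative sketch only, the kind of first-pass data inspection these packages support can be done in a few lines of pandas; the miniature DataFrame below is a hypothetical stand-in mirroring the Kaggle train.csv schema, not the real file.

```python
import pandas as pd

# Stand-in for a few rows of the Kaggle train.csv (Store, Dept, Date,
# Weekly_Sales, IsHoliday); the real file has 421,570 rows.
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05"],
    "Weekly_Sales": [24924.50, 46039.49, 35034.06],
    "IsHoliday": [False, True, False],
})
train["Date"] = pd.to_datetime(train["Date"])

# The same structural checks that R's glimpse/inspectdf calls perform:
# dimensions, column types, and missing-value counts.
print(train.shape)
print(train.dtypes)
print(train.isna().sum())
```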
CHAPTER 2
PROBLEM STATEMENT
The objective of this study is to forecast the weekly sales for Walmart based on historical data
collected between 2010 and 2013 from 45 stores situated across various regions in the country.
Each store encompasses multiple departments, and the primary deliverable is to predict the
weekly sales for all departments.
The dataset, sourced from Kaggle, includes weekly sales data for 45 Walmart stores, store size
and type information, departmental details, weekly sales figures, and holiday indicators.
Additional data on various influencing factors such as Consumer Price Index (CPI),
temperature, fuel prices, promotional markdowns, and unemployment rates for each week were
also collected to investigate potential correlations with weekly sales.
This study incorporates correlation testing to assess relationships between individual factors
and weekly sales, aiming to identify impactful variables on Walmart's sales performance.
Extensive exploratory data analysis has been conducted on the Walmart dataset, focusing on:
• Identifying store and department-wide sales trends.
• Analyzing sales variations based on store size and type.
• Assessing sales patterns during holiday periods.
• Examining correlations among different factors affecting sales.
• Calculating average yearly sales.
• Analyzing weekly sales in relation to regional temperature, CPI, fuel prices, and
unemployment rates.
A Linear Regression model is employed to explore whether specific combinations of factors
directly influence Walmart's weekly sales. Various algorithms are used for predicting future
sales and analyzing correlations within the retail store dataset.
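A minimal sketch of the correlation testing described above, using pandas; the weekly observations below are invented for illustration and are not drawn from the real dataset.

```python
import pandas as pd

# Hypothetical weekly observations for one store; the column names
# mirror the merged Kaggle features.
df = pd.DataFrame({
    "Weekly_Sales": [15000, 16200, 14800, 17500, 16900, 15400],
    "Temperature":  [42.3, 38.5, 45.1, 30.2, 33.8, 40.0],
    "Fuel_Price":   [2.57, 2.55, 2.60, 2.62, 2.67, 2.72],
    "Unemployment": [8.1, 8.1, 8.0, 8.0, 7.9, 7.9],
})

# Pairwise Pearson correlations of each factor with the target,
# as used in the correlation testing described above.
print(df.corr(method="pearson")["Weekly_Sales"].drop("Weekly_Sales"))
```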
4. ABOUT THE DATASET
The dataset used in this study was obtained from a previous Kaggle competition hosted by
Walmart, accessible at https://www.kaggle.com/c/walmart-recruiting-store-sales-
forecasting/data. It includes historical weekly sales information for 45 Walmart stores across
different regions, along with department-level details.
The 'test.csv' file from this dataset is utilized solely for predicting values using the model with
the lowest Weighted Mean Absolute Error (WMAE) score. Since this dataset lacks the target
variable 'Weekly Sales', it cannot be used for testing purposes in this analysis. Instead, the
training dataset ('train.csv') is split into training and validation datasets for model development.
The primary objective of this study is to predict department-level weekly sales for each store
using the provided dataset.
The training dataset covers weekly sales data from February 5, 2010, to November 1, 2012,
and includes information about stores, departments, and holiday dates. The testing dataset is
identical to the training dataset except for the absence of weekly sales information. The training
dataset comprises 421,570 rows, while the testing dataset contains 115,064 rows (Figure 1).
Fig. 1: A summary of the training dataset
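The split of train.csv into training and validation sets might be sketched as follows; the tiny DataFrame is a stand-in for the real 421,570-row file, and the 25% holdout fraction is an illustrative choice, not necessarily the one used in the study.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for train.csv; the real file spans 2010-02-05 to 2012-11-01.
train = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    "Dept": [1] * 10,
    "Weekly_Sales": [100.0, 110, 95, 200, 210, 190, 150, 160, 155, 300],
    "IsHoliday": [False, True] * 5,
})

features = train.drop(columns="Weekly_Sales")
target = train["Weekly_Sales"]

# Hold out 25% of the labelled rows for validation, since test.csv
# has no Weekly_Sales column to score against.
X_train, X_val, y_train, y_val = train_test_split(
    features, target, test_size=0.25, random_state=42
)
print(len(X_train), len(X_val))
```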
There is another dataset called ‘stores.csv’ that contains more detailed information about the type and size of the 45 stores used in this study. Another major aspect of this study is to determine whether weekly store sales increase because of changes in temperature, fuel prices, holidays, markdowns, unemployment rates, and fluctuations in the Consumer Price Index. The file ‘features.csv’ contains all the necessary information about these factors and is used in the analysis to study their impact on sales performance.
The holiday weeks flagged in the study are the Super Bowl, Labor Day, Thanksgiving, and Christmas.
A summary of the features dataset is displayed in the image below (Figure 2).
Fig. 2: A summary of the features dataset
The final file called ‘sampleSubmission.csv’ contains two main columns: dates for each of the
weeks in the study as well as a blank column that should be utilized to record predicted sales
for that week based on the different models and techniques applied.
The results of the most accurate and efficient model have been recorded in this file, and the final Power BI dashboard has been created based on these predicted values, combined with the ‘stores’ and ‘features’ datasets.
4.1 EXPLORATORY DATA ANALYSIS
It is essential to thoroughly understand the dataset used in this analysis to identify the most
accurate prediction models. Often, underlying patterns or trends in the data are not readily
apparent, highlighting the necessity of conducting comprehensive exploratory data analysis
(EDA). This in-depth examination is critical for grasping the dataset's underlying structure and
drawing meaningful insights to validate our analysis.
The study commences with a preliminary analysis of the dataset to grasp its main characteristics
and relevant components for the research. EDA plays a pivotal role, given the dataset's
numerous attributes essential for drawing insights and making predictions. As part of EDA,
various visualizations have been crafted to clarify the study objectives and highlight attributes
contributing to improved results.
EDA serves as an initial investigation, focusing on exploring relationships and understanding column characteristics. Utilizing tools like the 'inspectdf' package (Ellis, 2019) and the 'glimpse' function (Sullivan, 2019) in R aids in answering questions about dataset dimensions, missing values, variable distributions, correlation coefficients, and more.
Several other packages such as 'ggplot2', 'matplotlib', 'seaborn', and 'plotly' have been employed
to generate visualizations depicting weekly sales by store and department, sales comparisons
on holidays versus normal days, regional and store-specific sales trends based on store type and
size, annual sales averages, and sales variations due to factors like CPI, fuel prices, temperature,
and unemployment. These visualizations, including heatmaps, correlation matrices (Kedia et
al., 2013), histograms, scatterplots, among others, are accompanied by concise descriptions to
elucidate findings and outline potential modeling avenues for the project's subsequent stages.
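As one concrete example of the EDA questions above, whether holiday weeks sell more on average can be checked with a simple grouped aggregate; the sales figures below are invented for illustration.

```python
import pandas as pd

# Tiny stand-in sample; the real EDA runs over the full 421,570-row
# training set.
sales = pd.DataFrame({
    "Weekly_Sales": [15000, 22000, 14500, 25000, 16000, 15500],
    "IsHoliday":    [False, True, False, True, False, False],
})

# Mean weekly sales for holiday weeks versus normal weeks.
by_holiday = sales.groupby("IsHoliday")["Weekly_Sales"].mean()
print(by_holiday)
```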
4.2 CORRELATION MATRIX
A correlation matrix describes the correlation between the various variables of a dataset. Each variable is correlated with each of the other variables, which helps in understanding which variables are most closely related to each other (Glen, 2016).
With the numerous variables available in this dataset, it became imperative to study correlations between some of them. By default, the matrix calculates correlation using Pearson’s correlation coefficient (Johnson, 2021), which measures the linear relationship between two variables on a scale from −1 to +1. The closer the correlation is to |1|, the stronger the linear relationship between the variables, and vice versa.
The heatmap/correlation matrix in Figure 22, created using the seaborn library in Python
(Szabo, 2020) gives the following information:
• There is a slight correlation between weekly sales and store size, type, and department.
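A sketch of how such a correlation matrix is computed; seaborn's heatmap (as in Figure 22) is then drawn directly over a matrix of exactly this form. The column values below are hypothetical.

```python
import pandas as pd

# Hypothetical merged columns standing in for the real dataset.
df = pd.DataFrame({
    "Weekly_Sales": [12000, 18000, 9000, 30000, 25000, 11000],
    "Size":         [80000, 120000, 60000, 200000, 180000, 70000],
    "Temperature":  [45.0, 55.2, 60.1, 38.4, 42.7, 58.3],
    "CPI":          [211.1, 211.2, 211.4, 211.5, 211.7, 211.8],
})

# Pairwise Pearson correlations; every entry lies in [-1, 1] and the
# diagonal is 1 by definition.
corr = df.corr(method="pearson")
print(corr.round(2))
```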
5. DATA CLEANING AND PREPROCESSING
The data contains 421,570 rows, with some store-specific departments missing sales data for many weeks. As observed in Figure 4, some columns in the features dataset contain missing values; however, after the features dataset is merged with the training dataset, the only remaining missing values are in the markdown columns (as shown in Figure 23).
After the extensive EDA, it was determined that these five markdown columns, with their missing values, have barely any correlation with the weekly sales for Walmart; hence, these five columns have been eliminated from the subsequent training and testing datasets.
Since the source already provides training and testing datasets, there is no need to create them for this study. Because the main focus of this study is to accurately predict weekly sales for different Walmart stores, the previously derived ‘Date’, ‘Month’, ‘Quarter’, and ‘Day’ columns have been dropped, and only the ‘Week of Year’ column has been used in the subsequent models.
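The merge-then-drop step described above might look like this in pandas; the miniature DataFrames are stand-ins for train.csv and features.csv, with invented values.

```python
import pandas as pd

# Miniature stand-ins for train.csv and features.csv.
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05"],
    "Weekly_Sales": [24924.50, 46039.49, 35034.06],
})
features = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05"],
    "Fuel_Price": [2.572, 2.548, 2.572],
    "MarkDown1": [None, None, 3000.0],
})

# Attach the external factors to each (Store, Date) sales row.
merged = train.merge(features, on=["Store", "Date"], how="left")

# Drop the sparsely populated markdown columns, which the EDA found
# to be weakly correlated with weekly sales.
markdown_cols = [c for c in merged.columns if c.startswith("MarkDown")]
merged = merged.drop(columns=markdown_cols)
print(merged.columns.tolist())
```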
The data has been checked for inaccuracies and for missing or out-of-range values using the ‘inspectdf’ package in R as part of the initial EDA. Columns with missing values have been dropped. The dataset contains information about weekly sales, which was initially broken down to obtain monthly and quarterly sales figures for our analysis; however, that information is not utilized during the modeling process. The boolean ‘IsHoliday’ column in the dataset indicates whether a given week was a holiday week. As observed in the EDA above, sales have been higher during the holiday season than during the non-holiday season, hence the ‘IsHoliday’ column has been used for further analysis.
Furthermore, as part of this data preprocessing step, we have also created input and target data frames along with the training and validation datasets that help accurately measure the performance of the applied models. In addition, feature scaling (Vashisht, 2021) has been applied to normalize the different data attributes. This has primarily been done to standardize the independent variables in the training and testing datasets so that these variables are centered around the same range (0, 1), which improves accuracy.
Also referred to as normalization, this method uses a simple min-max scaling technique, implemented in Python using the Scikit-learn (sklearn) library. The Weighted Mean Absolute Error (WMAE) is one of the most common metrics used to measure accuracy for continuous variables (JJ, 2016).
A WMAE function has been created that provides a measure of success for the different models applied. It is the weighted average of the absolute errors between predictions and actual observations. In short, the smaller the WMAE, the more accurate the model.
6. MODEL SELECTION AND IMPLEMENTATION
Finding and implementing the most effective model is the biggest challenge of this study. Model selection depends on the kind of data available and the analysis to be performed on it (UNSW, 2020).
Several models, selected based on different aspects of the dataset, have been evaluated in this study. Since the main purpose is to predict the weekly sales for different Walmart stores and departments, the following four machine learning models have been used:
i) Linear Regression
ii) Lasso Regression
iii) Gradient Boosting Machine
iv) Random Forest
Each of these methods is discussed briefly in the remainder of this report. For each model, the reasons it was chosen, its implementation, and its success rate (measured through WMAE) are included.
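A hedged sketch of fitting the four listed models and comparing them on a validation set; the synthetic data and hyperparameters below are illustrative rather than those used in the study, and plain MAE stands in for the WMAE scoring described earlier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic regression data standing in for the preprocessed features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.05, 200)

X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

# The four model families used in the study (hyperparameters illustrative).
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gbm": GradientBoostingRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    err = mean_absolute_error(y_val, model.predict(X_val))
    print(f"{name}: MAE = {err:.4f}")
```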
7. BUILDING BI DASHBOARD
This Power BI dashboard serves as the final product of this research. The dashboard contains detailed information about the original data for the 45 Walmart stores and displays their respective predicted weekly sales. Most of the explorations performed as part of the EDA are included in this dashboard in the form of a story, and users can filter the data based on their requirements. After the final predicted weekly sales are exported to the ‘sampleSubmissionFinal’ file, the Id column is split to separate the store, department, and date information into different columns through Power BI data transformations (as shown in the figures below).
This file is then merged with the ‘stores’ file, which contains information about the type and size of each store as well as holiday information. All these columns are used to create several visualizations that track weekly predicted sales for various stores and departments, sales based on store size and type, and so on. The dashboard also provides detailed information about the stores and departments that generate the highest revenue and their respective store types. The accompanying PDF file contains brief information about all the visualizations created in the dashboard.
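The same Id split can be reproduced in pandas, assuming the Kaggle sample-submission Id format of Store_Dept_Date (the rows below are invented); this mirrors the Power BI "Split Column" transformation described above.

```python
import pandas as pd

# Invented submission rows; the Id concatenates store, department, and
# week date, e.g. "1_1_2012-11-02".
sub = pd.DataFrame({
    "Id": ["1_1_2012-11-02", "1_2_2012-11-02", "2_1_2012-11-09"],
    "Weekly_Sales": [36000.0, 18000.0, 22000.0],
})

# Split on "_" into three new columns; the date keeps its hyphens.
sub[["Store", "Dept", "Date"]] = sub["Id"].str.split("_", expand=True)
print(sub[["Store", "Dept", "Date"]])
```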
The dashboard can be found in the final submitted folder. If a user does not have access to Power BI, a PDF export of the entire dashboard is included along with the .pbix file that contains all of the created visualizations and reports. Some views of the dashboard are included below.
8. CONCLUSION
The main purpose of this study was to predict Walmart’s sales based on the available historical data and to identify whether factors like temperature, unemployment, and fuel prices affect the weekly sales of the particular stores under study. This study also aimed to understand whether sales are relatively higher during holidays like Christmas and Thanksgiving than on normal days, so that stores can create promotional offers that increase sales and generate higher revenue.
As observed through the exploratory data analysis, store size and holidays have a direct relationship with high Walmart sales. It was also observed that, of all the store types, Type A stores gathered the most sales for Walmart. Additionally, departments 92, 95, 38, and 72 accumulate the most sales across all three store types; for all 45 stores, the presence of these departments ensures higher sales. Pertaining to the specific factors provided in the study (temperature, unemployment, CPI, and fuel price), it was observed that sales do tend to rise slightly during favorable climate conditions as well as when fuel prices are moderate. However, it is difficult to make a strong claim about this considering the limited scope of the training dataset provided for this study. From the observations in the exploratory data analysis, sales also tend to be relatively higher when the unemployment level is lower. Additionally, with the dataset provided for this study, there does not appear to be a relationship between sales and the CPI. Again, it is hard to make a substantial claim about these findings without a larger training dataset with additional information.
Interaction effects were studied as part of the linear regression model to identify whether a combination of different factors could influence the weekly sales for Walmart. This was necessary because of the high number of predictor variables in the dataset. While interaction effects were tested on several combinations of significant variables, a statistically significant relationship was only observed between the independent variables of temperature, CPI, and unemployment, and weekly sales (the response variable). However, this is not definitive because of the limitations of the training data.
9. REFERENCES
1. Bakshi, C. (2020). Random forest regression. https://levelup.gitconnected.com/random-forest-regression-209c0f354c84
2. Bari, A., Chaouchi, M., & Jung, T. (n.d.). How to utilize linear regressions in predictive analytics. https://www.dummies.com/programming/big-data/data-science/how-to-utilize-linear-regressions-in-predictive-analytics/
3. Baum, D. (2011). How higher gas prices affect consumer behavior. https://www.sciencedaily.com/releases/2011/05/110512132426.htm
4. Brownlee, J. (2016). Feature importance and feature selection with XGBoost in Python. https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/
5. Chouksey, P., & Chauhan, A. S. (2017). A review of weather data analytics using big data. International Journal of Advanced Research in Computer and Communication Engineering. https://ijarcce.com/upload/2017/january17/IJARCCE%2072.pdf
6. Crown, M. (2016). Weekly sales forecasts using non-seasonal ARIMA models. http://mxcrown.com/walmart-sales-forecasting/
7. Editor, M. B. (2013). Regression analysis: How do I interpret R-squared and assess the goodness-of-fit? https://blog.minitab.com/en/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit
8. Ellis, L. (2019). Simple EDA in R with inspectdf. https://www.r-bloggers.com/2019/05/part-2-simple-eda-in-r-with-inspectdf/
9. Frost, J. (2021). Regression coefficients. Statistics By Jim. https://statisticsbyjim.com/glossary/regression-coefficient/
10. Glen, S. (2016). Elementary statistics for the rest of us. https://www.statisticshowto.com/correlation-matrix/