Report

The document outlines a project aimed at developing a sales prediction system using historical sales data and machine learning models, specifically linear regression and random forest regression. The project includes data preprocessing, model training, evaluation, and visualization of predictions, with the random forest model demonstrating superior performance. Contributions from team members and challenges faced during the project are also discussed.

Uploaded by

faizanpervaz74


1. Objectives and Introduction

• Objective:
The objective of this project is to develop a sales prediction system
that uses historical sales data to forecast future sales trends. By
applying machine learning models like linear regression and random
forest regression, the goal is to generate predictions that can aid in
business decision-making.

• Introduction:
This project analyzes daily sales data to predict sales for a
user-specified future year. The workflow covers data preprocessing,
feature scaling, model training, and evaluation. The project aims to
implement an automated sales forecasting system using Python
libraries, with a simple console prompt in which the end user enters
the year to forecast.

2. Analytical Solution

• Step-by-Step Solution:

o Data Preprocessing: The first step involves loading the sales data, handling missing values, and grouping the data by date to get the total sales per day.

o Feature Scaling: StandardScaler is used to standardize the date-derived features (zero mean, unit variance) before they are passed to the models.

o Model Training: We use linear regression and random forest regression models for sales prediction. The models are trained on a time series dataset that includes the sales and date information.

o Evaluation Metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared are used to evaluate the performance of the models on a held-out test set.

o Prediction: The models are used to predict sales for future years by building the date-related features for each day of the requested year, scaling them with the fitted scaler, and generating forecasts.
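The feature-scaling step can be illustrated in isolation. The sketch below uses made-up values for a single time-index feature (not the project's data) and shows that StandardScaler subtracts the training mean and divides by the training standard deviation, and that the test split is transformed with the statistics learned from the training split.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single feature column (e.g. a time index); values are illustrative only.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[5.0], [6.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from the training data
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

# Manual equivalent: z = (x - mean) / std, using the population std (ddof=0).
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print(np.allclose(X_train_scaled, manual))
print(X_test_scaled.ravel())
```

Fitting the scaler on the training split only keeps information from the test split out of the preprocessing step.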

• Assumptions and Values:

o The dataset consists of daily sales data over a specific period.

o The regression models assume that the relationship between the date (or time) and sales follows a linear or non-linear trend that can be learned from the historical data.

o The random forest model uses 100 trees (n_estimators=100) for better generalization.

3. Explanation of Commands, Functions, and Toolboxes Used (15 Marks)

• Libraries and Functions:

o pandas: Used for data manipulation, loading CSV files, and grouping sales data.

o numpy: Used for numerical operations and generating features.

o matplotlib: Used for data visualization, including time series plotting and sales distribution.

o sklearn.model_selection.train_test_split: Used to split the data into training and testing sets.

o sklearn.linear_model.LinearRegression: Used to train the linear regression model.

o sklearn.preprocessing.StandardScaler: Used to standardize the features before training the models.

o sklearn.metrics: Used to compute performance metrics such as MSE, MAE, and R-squared.

o sklearn.ensemble.RandomForestRegressor: Used for training a random forest regression model.

o sklearn.pipeline.make_pipeline: Used to chain the scaler and the random forest regressor into a single estimator.
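To make the three metrics concrete, the sketch below computes MSE, MAE, and R-squared by hand on four made-up values and checks them against the sklearn.metrics functions. The numbers are illustrative only and are not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Illustrative actual and predicted sales values (not project data).
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 230.0])

errors = y_true - y_pred

# MSE: mean of squared errors; MAE: mean of absolute errors.
mse_manual = np.mean(errors ** 2)
mae_manual = np.mean(np.abs(errors))

# R-squared: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_pred))
print(mae_manual, mean_absolute_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))
```

MSE penalizes large errors more heavily than MAE because the errors are squared, while R-squared expresses how much of the variance in the target the predictions account for.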

• Commands:

o The train_model() function prepares the data, splits it into training and testing sets, scales it, and trains the model.

o The visualize_predictions() function plots the predicted sales for a specified year.

o The load_data() function reads the sales data from a CSV file and prepares it for analysis.
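The way a requested year is turned into model inputs can be sketched as follows. The helper name build_date_features is hypothetical and used only for illustration; the feature layout (year, month, day, ordinal day number) follows the random forest code in Section 8.

```python
import pandas as pd

def build_date_features(year):
    """Return a daily DatetimeIndex for `year` and one [year, month, day, ordinal] row per day."""
    date_range = pd.date_range(f"{year}-01-01", f"{year}-12-31")
    features = [[d.year, d.month, d.day, d.toordinal()] for d in date_range]
    return date_range, features

dates, X_future = build_date_features(2024)
print(len(dates))        # 366, since 2024 is a leap year
print(X_future[0][:3])   # [2024, 1, 1]
```

The resulting list can be passed directly to the predict() method of a fitted pipeline that was trained on the same four features.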

4. Results and Discussion (20 Marks)

• Results:
The models generated predictions with varying accuracy. The linear
regression model provided a basic understanding of the sales trend,
but the random forest regressor offered better performance in terms
of lower MSE and higher R-squared scores.

o Linear Regression Results:

o Random Forest Regression Results:
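A like-for-like comparison of the two model types can be outlined as below. This is a minimal sketch on synthetic data (a trend plus a yearly cycle plus noise, all assumed for illustration); it does not reproduce the project's datasets or its reported numbers, and the two input features are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic "daily sales": upward trend + yearly seasonality + random noise.
rng = np.random.default_rng(42)
t = np.arange(730)
y = 200 + 0.1 * t + 50 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 5, size=t.size)

# Two assumed features: the time index and the position within the year.
X = np.column_stack([t, t % 365])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit both models on the same split and score them with the same metrics.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(f"{name}: MSE={results[name]['mse']:.2f}, R2={results[name]['r2']:.3f}")
```

Because the synthetic series is deliberately seasonal, the random forest scores better here; on data with a purely linear trend the ranking could differ.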

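• Discussion:
The random forest model outperformed the linear regression model
in terms of accuracy. This suggests that a more flexible model such
as a random forest can capture non-linear patterns in the sales data
that a straight-line fit misses. However, the linear regression model
is simpler and cheaper to train, so it can still be useful when quick
predictions with less computational overhead are needed. Note that
in the code in Section 8 the two models are applied to different
datasets (the Kaggle sample sales data for the random forest, and a
date and temperature dataset for linear regression), so their metrics
are not directly comparable.

5. Flowchart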

6. Conclusions

• Conclusion:
This project successfully developed a sales prediction system using
machine learning models. The random forest regressor provided the
best results for predicting future sales based on historical data. The
system can be further enhanced by incorporating more features and
exploring different model architectures.

o Key Points:

▪ Linear regression and random forest were tested for sales prediction.

▪ Random forest showed better performance.

▪ The system can predict sales for any future year based on past trends.

7. Contribution (5 Marks)

• Team Contributions:

o M Abdullah (24I-3050): Responsible for data preprocessing, feature engineering, and model evaluation.

o M Jibran (24I-3134): Worked on training the machine learning models and performance analysis.

o M Umer (24I-3132): Implemented the prediction functionality and visualized the results.

• Difficulties and Solutions:

o Difficulty: Handling missing values and ensuring that the data was in a clean format.

o Solution: Used dropna() and ensured proper date conversion and grouping of data.
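The cleaning steps named in the solution can be sketched on a small in-memory CSV. The column names ORDERDATE and SALES follow the code in Section 8; the rows themselves are made up, and errors="coerce" is an assumption added here so that unparseable dates become missing values that dropna() can remove.

```python
import io
import pandas as pd

# Illustrative raw data with a missing sales value and an unparseable date.
raw_csv = io.StringIO(
    "ORDERDATE,SALES\n"
    "2003-01-06,100.0\n"
    "2003-01-06,50.0\n"
    "2003-01-09,\n"
    "not a date,75.0\n"
    "2003-01-10,200.0\n"
)

data = pd.read_csv(raw_csv)

# Convert to datetime; values that cannot be parsed become NaT (assumed behaviour choice).
data["ORDERDATE"] = pd.to_datetime(data["ORDERDATE"], errors="coerce")

# Drop rows where either the date or the sales value is missing.
data = data.dropna(subset=["ORDERDATE", "SALES"])

# Total sales per day.
daily_sales = data.groupby("ORDERDATE")["SALES"].sum()
print(daily_sales)
```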

8. Python Code (Ensure that the code is well-commented as requested in the report requirements)

Trained Random Forest Regressor Model on OrderDate and Sales dataset

https://www.kaggle.com/datasets/kyanyoga/sample-sales-data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

def load_data(file_path):
    try:
        # Read with robust encoding
        data = pd.read_csv(file_path, encoding='latin-1')
        # Convert ORDERDATE to datetime
        data['ORDERDATE'] = pd.to_datetime(data['ORDERDATE'])
        # Group by date and sum SALES
        daily_sales = data.groupby('ORDERDATE')['SALES'].sum().reset_index()
        daily_sales.set_index('ORDERDATE', inplace=True)
        return daily_sales
    except Exception as e:
        print(f"Error loading data: {e}")
        return None

def train_model(data):
    # Prepare features (year, month, day, ordinal day number) and target
    X = data.index.map(lambda date: [date.year, date.month, date.day,
                                     date.toordinal()]).tolist()
    y = data['SALES'].values
    # Split data (80% training, 20% testing)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    # Create pipeline with scaling and Random Forest
    pipeline = make_pipeline(
        StandardScaler(),
        RandomForestRegressor(n_estimators=100, random_state=42)
    )
    # Train model
    pipeline.fit(X_train, y_train)
    # Predictions and evaluation
    predictions = pipeline.predict(X_test)
    # Detailed model performance metrics
    mse = mean_squared_error(y_test, predictions)
    mae = mean_absolute_error(y_test, predictions)
    r2 = r2_score(y_test, predictions)
    print(f"Mean Squared Error: {mse}")
    print(f"Mean Absolute Error: {mae}")
    print(f"R-squared Score: {r2}")
    return pipeline

def visualize_predictions(model, data, year):
    # Create date range for prediction
    start_date = pd.to_datetime(f"{year}-01-01")
    end_date = pd.to_datetime(f"{year}-12-31")
    date_range = pd.date_range(start_date, end_date)
    # Prepare features for prediction
    X_future = date_range.map(lambda date: [date.year, date.month,
                                            date.day, date.toordinal()]).tolist()
    # Predict
    predictions = model.predict(X_future)
    # Plotting
    plt.figure(figsize=(12, 6))
    plt.plot(date_range, predictions, label=f"Predicted Sales ({year})",
             color='red')
    plt.title(f"Predicted Sales for the Year {year}")
    plt.xlabel('Date')
    plt.ylabel('Sales')
    plt.xticks(rotation=45)
    plt.legend()
    plt.tight_layout()
    plt.show()

def main():
    # Load dataset
    data = load_data('sales_data_sample.csv')
    if data is None:
        return
    # Train model
    model = train_model(data)
    # Predict future sales
    while True:
        try:
            year = int(input("Enter a year for prediction (e.g., 2025): "))
            visualize_predictions(model, data, year)
            cont = input("Do you want to predict sales for another year? (yes/no): ")
            if cont.lower() != 'yes':
                break
        except ValueError:
            print("Invalid year. Please enter a valid year.")

if __name__ == "__main__":
    main()
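train_model() above uses train_test_split, which shuffles rows by default, so test dates are interleaved with training dates. One alternative sometimes used for time-ordered data is to hold out the most recent rows instead. The sketch below shows that idea; the helper chronological_split is hypothetical and is not part of the project's code.

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.2):
    """Hold out the last `test_fraction` of rows; assumes rows are already sorted by date."""
    n_test = max(1, int(round(len(X) * test_fraction)))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Illustrative data: ten consecutive time steps.
X = np.arange(10).reshape(-1, 1)
y = np.arange(10) * 2.0

X_train, X_test, y_train, y_test = chronological_split(X, y)
print(X_train.ravel())  # earliest 8 steps
print(X_test.ravel())   # latest 2 steps
```

Evaluating on the latest rows is closer to the forecasting use case, where the model is asked about dates after everything it was trained on.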

Linear Regression on Date and Temperature dataset sourced from GitHub:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.preprocessing import StandardScaler
import os

def display_team_info():
    print("Welcome to the Sales Prediction System\n")
    print("Software House: Predictify Solutions")
    print("Team Members: [ M Abdullah, 24I-3050]")
    print("Team Members: [ M Jibran, 24I-3134]")
    print("Team Members: [ M Umer, 24I-3132]")
    print("--------------------------------------------\n")

def load_data(file_url):
    try:
        data = pd.read_csv(file_url)
        data.columns = ['Date', 'Sales']  # Rename columns for context
        data['Date'] = pd.to_datetime(data['Date'])
        data.set_index('Date', inplace=True)
        return data
    except Exception as e:
        print(f"Unexpected error loading data: {e}")
        return None

def visualize_data(data):
    print("Visualizing sales data...\n")
    plt.figure(figsize=(12, 6))
    plt.plot(data.index, data['Sales'], label='Sales', color='blue')
    plt.title('Sales Over Time')
    plt.xlabel('Date')
    plt.ylabel('Sales')
    plt.xticks(rotation=45)
    plt.legend()
    plt.tight_layout()
    plt.show()

def train_model(data):
    print("Training the linear regression model...\n")
    data['Time'] = np.arange(len(data))  # Create a time index (0, 1, 2, ...)
    X = data[['Time']]
    y = data['Sales']
    # Split data (80% training, 20% testing)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)
    # Fit the scaler on the training split only, then apply it to both splits
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    # Train the model and evaluate it on the test split
    model = LinearRegression()
    model.fit(X_train_scaled, y_train)
    predictions = model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, predictions)
    mae = mean_absolute_error(y_test, predictions)
    r2 = r2_score(y_test, predictions)
    print(f"Mean Squared Error: {mse:.2f}")
    print(f"Mean Absolute Error: {mae:.2f}")
    print(f"R-squared Value: {r2:.2f}\n")
    return model, scaler

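def derivative_analysis(model):
    print("Performing derivative analysis...\n")
    # The model was fitted on the standardized Time feature, so the slope is
    # the change in sales per one standard deviation of Time, not per raw day.
    rate_of_change = model.coef_[0]
    print(f"Rate of Change (Derivative): {rate_of_change:.2f} sales per scaled day\n")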

def predict_future_sales(model, scaler, data, year):
    print(f"Predicting sales for the year {year}...\n")
    start_date = pd.to_datetime(f"{year}-01-01")
    end_date = pd.to_datetime(f"{year}-12-31")
    date_range = pd.date_range(start_date, end_date)
    # Time index of each future date, counted in days from the first date in
    # the data so that the requested year actually shifts the inputs.
    # Assumes the training rows are consecutive days (Time == days since start).
    time_index = (date_range - data.index[0]).days.values
    X_future = pd.DataFrame({'Time': time_index})
    X_future_scaled = scaler.transform(X_future)
    predictions = model.predict(X_future_scaled)
    # Plot the forecast for the requested year
    plt.figure(figsize=(12, 6))
    plt.plot(date_range, predictions, label=f"Predicted Sales ({year})",
             color='red')
    plt.title(f"Predicted Sales for {year}")
    plt.xlabel('Date')
    plt.ylabel('Sales')
    plt.xticks(rotation=45)
    plt.legend()
    plt.tight_layout()
    plt.show()

def main():
    display_team_info()
    file_url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"
    data = load_data(file_url)
    if data is None:
        return
    visualize_data(data)
    model, scaler = train_model(data)
    derivative_analysis(model)
    while True:
        try:
            year = int(input("Enter a year for prediction (e.g., 2025): "))
            predict_future_sales(model, scaler, data, year)
            cont = input("Do you want to predict sales for another year? (yes/no): ")
            if cont.lower() != 'yes':
                print("Thank you for using the Sales Prediction System!")
                break
        except ValueError:
            print("Invalid year. Please enter a valid year.")

if __name__ == "__main__":
    main()
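derivative_analysis() reports model.coef_[0], which is measured per unit of the standardized Time feature because the model is fitted on scaled inputs. Dividing by the scaler's scale_ attribute converts it to a rate per raw time step (per day when rows are consecutive days). The sketch below checks this on made-up, exactly linear data with a known slope of 3; it is an illustration, not part of the project's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Illustrative data: sales grow by exactly 3 units per time step.
time_index = np.arange(100).reshape(-1, 1)
sales = 3.0 * time_index.ravel() + 20.0

scaler = StandardScaler()
model = LinearRegression().fit(scaler.fit_transform(time_index), sales)

# Slope per one standard deviation of the time index (what coef_ holds).
slope_per_scaled_unit = model.coef_[0]

# Slope per raw time step: undo the division by the standard deviation.
slope_per_day = slope_per_scaled_unit / scaler.scale_[0]

print(f"Per scaled unit: {slope_per_scaled_unit:.2f}")
print(f"Per day: {slope_per_day:.2f}")
```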
