
Wine Quality Prediction

This notebook imports the machine learning libraries, loads the red wine quality dataset, and explores it with summary statistics and plots. It then separates the features from a binarized quality label, trains a random forest classifier on 80% of the data, and evaluates it on the remaining 20%, reaching 92.8% accuracy. Finally, it builds a small predictive system that classifies new wine samples.

Uploaded by

Alisha Anjum


Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import warnings
warnings.simplefilter('ignore')

Data Collection

# Loading the dataset to a Pandas DataFrame

wine_data = pd.read_csv(r'C:\Users\ADMIN\Desktop\Projects\Data sets\winequality-red.csv')

wine_data

      fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0               7.4             0.700         0.00             1.9      0.076                 11.0                  34.0  0.99780  3.51       0.56      9.4        5
1               7.8             0.880         0.00             2.6      0.098                 25.0                  67.0  0.99680  3.20       0.68      9.8        5
2               7.8             0.760         0.04             2.3      0.092                 15.0                  54.0  0.99700  3.26       0.65      9.8        5
3              11.2             0.280         0.56             1.9      0.075                 17.0                  60.0  0.99800  3.16       0.58      9.8        6
4               7.4             0.700         0.00             1.9      0.076                 11.0                  34.0  0.99780  3.51       0.56      9.4        5
...             ...               ...          ...             ...        ...                  ...                   ...      ...   ...        ...      ...      ...
1594            6.2             0.600         0.08             2.0      0.090                 32.0                  44.0  0.99490  3.45       0.58     10.5        5
1595            5.9             0.550         0.10             2.2      0.062                 39.0                  51.0  0.99512  3.52       0.76     11.2        6
1596            6.3             0.510         0.13             2.3      0.076                 29.0                  40.0  0.99574  3.42       0.75     11.0        6
1597            5.9             0.645         0.12             2.0      0.075                 32.0                  44.0  0.99547  3.57       0.71     10.2        5
1598            6.0             0.310         0.47             3.6      0.067                 18.0                  42.0  0.99549  3.39       0.66     11.0        6

1599 rows × 12 columns

wine_data.shape

(1599, 12)

wine_data.head()

      fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0               7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
1               7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5
2               7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5
3              11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6
4               7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5

Checking Missing Values

wine_data.isnull().sum()

fixed acidity 0
volatile acidity 0
citric acid 0
residual sugar 0
chlorides 0
free sulfur dioxide 0
total sulfur dioxide 0
density 0
pH 0
sulphates 0
alcohol 0
quality 0
dtype: int64
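
A missing-value check is often paired with a duplicate-row check, since the two rows indexed 0 and 4 in the `head()` output above are identical. The sketch below re-enters those five displayed rows by hand (the name `sample` is an assumption introduced here, not part of the notebook) and counts both missing cells and repeated rows. Whether duplicates should be dropped from the full dataset is a judgment call this notebook does not make.

```python
import pandas as pd

# The five rows shown by wine_data.head(), re-entered by hand for illustration.
columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]
rows = [
    (7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5),
    (7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8, 5),
    (7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.9970, 3.26, 0.65, 9.8, 5),
    (11.2, 0.28, 0.56, 1.9, 0.075, 17.0, 60.0, 0.9980, 3.16, 0.58, 9.8, 6),
    (7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5),
]
sample = pd.DataFrame(rows, columns=columns)

# Total number of missing cells across all columns.
missing_total = int(sample.isnull().sum().sum())

# Rows that exactly repeat an earlier row (the first occurrence is not counted).
duplicate_rows = int(sample.duplicated().sum())

print("missing values:", missing_total)
print("duplicate rows:", duplicate_rows)
```

On the full DataFrame the same two calls would be `wine_data.isnull().sum()` and `wine_data.duplicated().sum()`.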

Statistical measures

wine_data.describe()

       fixed acidity  volatile acidity  citric acid  residual sugar    chlorides  free sulfur dioxide  total sulfur dioxide      density           pH    sulphates    alcohol
count    1599.000000       1599.000000  1599.000000     1599.000000  1599.000000          1599.000000           1599.000000  1599.000000  1599.000000  1599.000000  1599.0000
mean        8.319637          0.527821     0.270976        2.538806     0.087467            15.874922             46.467792     0.996747     3.311113     0.658149    10.4229
std         1.741096          0.179060     0.194801        1.409928     0.047065            10.460157             32.895324     0.001887     0.154386     0.169507     1.0656
min         4.600000          0.120000     0.000000        0.900000     0.012000             1.000000              6.000000     0.990070     2.740000     0.330000     8.4000
25%         7.100000          0.390000     0.090000        1.900000     0.070000             7.000000             22.000000     0.995600     3.210000     0.550000     9.5000
50%         7.900000          0.520000     0.260000        2.200000     0.079000            14.000000             38.000000     0.996750     3.310000     0.620000    10.2000
75%         9.200000          0.640000     0.420000        2.600000     0.090000            21.000000             62.000000     0.997835     3.400000     0.730000    11.1000
max        15.900000          1.580000     1.000000       15.500000     0.611000            72.000000            289.000000     1.003690     4.010000     2.000000    14.9000

sns.catplot(x='quality',data = wine_data, kind = 'count')

<seaborn.axisgrid.FacetGrid at 0x219e25c3af0>

# volatile acidity vs quality

plot = plt.figure(figsize=(5,5))
sns.barplot(x='quality',y='volatile acidity',data=wine_data)

<AxesSubplot:xlabel='quality', ylabel='volatile acidity'>

# citric acid vs quality
plot = plt.figure(figsize=(5,5))
sns.barplot(x='quality',y='citric acid',data=wine_data)

<AxesSubplot:xlabel='quality', ylabel='citric acid'>

Correlation
1. Positive Correlation
2. Negative Correlation

correlation = wine_data.corr()

# constructing a heatmap to understand the correlation between the columns

plt.figure(figsize=(10,10))
sns.heatmap(correlation, cbar=True, square=True, fmt='.1f', annot=True, annot_kws={'size':8}, cmap='Blues')

<AxesSubplot:>
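
A heatmap shows every pairwise correlation at once. When the question is only which features move with or against `quality`, a sorted column of the correlation matrix can be easier to read. The sketch below uses a small synthetic frame (the `toy` data and its coefficients are assumptions made up for illustration, not values from the wine dataset) in which one feature is built to correlate positively with the target and another negatively.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: quality rises with alcohol and falls with volatile acidity.
rng = np.random.default_rng(0)
n = 200
alcohol = rng.uniform(8.0, 15.0, n)
volatile_acidity = rng.uniform(0.1, 1.6, n)
quality = 0.5 * alcohol - 2.0 * volatile_acidity + rng.normal(0.0, 0.5, n)

toy = pd.DataFrame({
    "alcohol": alcohol,
    "volatile acidity": volatile_acidity,
    "quality": quality,
})

# Take the 'quality' column of the correlation matrix, drop the trivial
# self-correlation, and sort from most negative to most positive.
corr_with_quality = toy.corr()["quality"].drop("quality").sort_values()
print(corr_with_quality)
```

With the real data the equivalent line would be `wine_data.corr()['quality'].drop('quality').sort_values()`.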

Data Preprocessing

X = wine_data.drop('quality',axis=1)

print(X)

fixed acidity volatile acidity citric acid residual sugar chlorides \

0 7.4 0.700 0.00 1.9 0.076
1 7.8 0.880 0.00 2.6 0.098
2 7.8 0.760 0.04 2.3 0.092
3 11.2 0.280 0.56 1.9 0.075
4 7.4 0.700 0.00 1.9 0.076
... ... ... ... ... ...
1594 6.2 0.600 0.08 2.0 0.090
1595 5.9 0.550 0.10 2.2 0.062
1596 6.3 0.510 0.13 2.3 0.076
1597 5.9 0.645 0.12 2.0 0.075
1598 6.0 0.310 0.47 3.6 0.067

free sulfur dioxide total sulfur dioxide density pH sulphates \

0 11.0 34.0 0.99780 3.51 0.56
1 25.0 67.0 0.99680 3.20 0.68
2 15.0 54.0 0.99700 3.26 0.65
3 17.0 60.0 0.99800 3.16 0.58
4 11.0 34.0 0.99780 3.51 0.56
... ... ... ... ... ...
1594 32.0 44.0 0.99490 3.45 0.58
1595 39.0 51.0 0.99512 3.52 0.76
1596 29.0 40.0 0.99574 3.42 0.75
1597 32.0 44.0 0.99547 3.57 0.71
1598 18.0 42.0 0.99549 3.39 0.66

alcohol
0 9.4
1 9.8
2 9.8
3 9.8
4 9.4
... ...
1594 10.5
1595 11.2
1596 11.0
1597 10.2
1598 11.0

[1599 rows x 11 columns]

Label Binarization

Y = wine_data['quality'].apply(lambda y_value: 1 if y_value >= 7 else 0)

print(Y)

0 0
1 0
2 0
3 0
4 0
..
1594 0
1595 0
1596 0
1597 0
1598 0
Name: quality, Length: 1599, dtype: int64
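
Binarizing at `quality >= 7` usually leaves far fewer "good" wines than "bad" ones, so it can help to look at the class shares before training. The sketch below uses a short, made-up list of quality scores (an assumption for illustration only) to show a vectorized equivalent of the `apply` call and a quick class-balance check.

```python
import pandas as pd

# Illustrative quality scores, not taken from the dataset.
quality = pd.Series([5, 5, 5, 6, 5, 7, 8, 6, 3, 4], name="quality")

# Same rule as the notebook: 1 for quality >= 7, else 0.
labels_apply = quality.apply(lambda q: 1 if q >= 7 else 0)

# Vectorized equivalent: compare the whole column, then cast booleans to integers.
labels_vectorized = (quality >= 7).astype(int)

# Share of each class; a heavily skewed split affects how accuracy should be read.
class_share = labels_vectorized.value_counts(normalize=True)
print(class_share)
```

On the real labels, `Y.value_counts(normalize=True)` gives the same summary.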

Train and Test Split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=3)

print(Y.shape, Y_train.shape, Y_test.shape)

(1599,) (1279,) (320,)
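
The split above is purely random. With an imbalanced label, one option (not used in this notebook) is to pass `stratify` so that the train and test sets keep the same class proportions. The sketch below demonstrates this on a toy array with 20% positives; `X_toy` and `y_toy` are assumptions introduced for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 20 of them in the positive class.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([1] * 20 + [0] * 80)

# stratify=y_toy keeps the 20% positive share in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=3, stratify=y_toy
)

print("train positives:", y_tr.mean())
print("test positives: ", y_te.mean())
```

Applied to the notebook's variables, this would mean adding `stratify=Y` to the existing `train_test_split` call.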

Model Training:
Random Forest Classifier

model = RandomForestClassifier()

model.fit(X_train, Y_train)

RandomForestClassifier()
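
Because `RandomForestClassifier()` is created without a `random_state`, the fitted trees, and therefore the reported accuracy, can differ slightly from run to run. A minimal sketch of one way to make results repeatable and less dependent on a single split is shown below: it fixes the seed and uses 5-fold cross-validation. The synthetic data from `make_classification` and the name `model_cv` are assumptions standing in for the wine features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced stand-in for the 11 wine features.
X_toy, y_toy = make_classification(
    n_samples=300, n_features=11, weights=[0.85, 0.15], random_state=0
)

# A fixed random_state makes the forest reproducible across runs.
model_cv = RandomForestClassifier(n_estimators=50, random_state=0)

# Five accuracy scores, one per held-out fold.
scores = cross_val_score(model_cv, X_toy, y_toy, cv=5, scoring="accuracy")
print("mean accuracy:", scores.mean(), "std:", scores.std())
```

The spread of the fold scores gives a rough sense of how much a single train/test split could over- or understate performance.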

Model Evaluation
Accuracy score

# Accuracy on test data

X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)  # argument order is (y_true, y_pred)

print('Accuracy:', test_data_accuracy)

Accuracy: 0.928125
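
Accuracy alone can look strong on an imbalanced label, because always predicting the majority class already scores well. The sketch below uses ten hand-written labels (assumed values for illustration, not the notebook's predictions) to compare a model against an always-0 baseline and to compute the confusion matrix, precision, and recall for the positive class.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hand-written example: 8 'bad' wines (0) and 2 'good' wines (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # one false positive, one false negative
y_baseline = [0] * len(y_true)            # always predict the majority class

acc_model = accuracy_score(y_true, y_pred)
acc_baseline = accuracy_score(y_true, y_baseline)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Precision and recall for the positive class (label 1).
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print("model accuracy:   ", acc_model)
print("baseline accuracy:", acc_baseline)
print(cm)
print("precision:", precision, "recall:", recall)
```

Here the model and the trivial baseline have the same accuracy, while precision and recall show that only half of the good wines are found. Running `confusion_matrix(Y_test, X_test_prediction)` on the notebook's own predictions would give the corresponding view for the wine model.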

Building a Predictive System

input_data1 = (7.5,0.5,0.36,6.1,0.071,17.0,102.0,0.9978,3.35,0.8,10.5)

# Convert the input data to a NumPy array

input_data_as_numpy_array = np.asarray(input_data1)

# Reshape the array because we are predicting the label for a single instance
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = model.predict(input_data_reshaped)
print(prediction)

if prediction[0]==1:
    print('Good Quality Wine')
else:
    print('Bad Quality Wine')

[0]
Bad Quality Wine

input_data = (7.3,0.65,0.0,1.2,0.065,15.0,21.0,0.9946,3.39,0.47,10.0)
# Convert the input data to a NumPy array
input_data_as_numpy_array = np.asarray(input_data)

# Reshape the array because we are predicting the label for a single instance
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = model.predict(input_data_reshaped)
print(prediction)

if prediction[0]==1:
    print('Good Quality Wine')
else:
    print('Bad Quality Wine')

[1]
Good Quality Wine
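
The two prediction cells repeat the same steps, so they could be folded into one helper. The sketch below is one possible shape for it, not the notebook's own code: `classify_wine`, `FEATURES`, and `toy_model` are hypothetical names, and the stand-in model is trained on random synthetic data, so its answer says nothing about real wine. Passing a one-row DataFrame with the training column names, rather than a bare NumPy array, also avoids scikit-learn's feature-name warning for models fitted on a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Column order used for the feature matrix X in the notebook.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def classify_wine(model, sample):
    """Return a readable label for one wine given its 11 feature values."""
    if len(sample) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(sample)}")
    # A one-row DataFrame keeps the feature names the model was trained with.
    frame = pd.DataFrame([sample], columns=FEATURES)
    prediction = model.predict(frame)
    return "Good Quality Wine" if prediction[0] == 1 else "Bad Quality Wine"


# Stand-in model on synthetic data (assumption: replaces the trained wine model).
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(
    rng.uniform(0.0, 15.0, size=(200, len(FEATURES))), columns=FEATURES
)
y_toy = (X_toy["alcohol"] > 11.5).astype(int)
toy_model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_toy, y_toy)

result = classify_wine(
    toy_model, (7.5, 0.5, 0.36, 6.1, 0.071, 17.0, 102.0, 0.9978, 3.35, 0.8, 10.5)
)
print(result)
```

With the notebook's fitted `model`, the call would be `classify_wine(model, input_data)`.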
