Wine Quality Prediction: Implementation

This document summarizes a study that used machine learning models to predict the quality of red wine based on physicochemical properties. It tested four regression models: model tree, regression tree and linear regression performed almost identically, with normalized absolute errors (NAE) just below 0.75, while the neural net was noticeably worse. The data came from a dataset containing physicochemical and sensory data for 1599 red wine samples.

Wine quality prediction

The goal of this work is to predict the quality of red wine from parameters such as
the pH value or the density of the wine. This should make it possible to presort the
wines, so that expensive expert sommeliers do not have to taste every substandard
wine.

Implementation
We used RapidMiner. The following methods were tested (NAE = normalized absolute error):

1. Linear Regression: NAE = 0.746 +/- 0.038
2. Regression Tree: NAE = 0.745 +/- 0.041
3. Neural Net: NAE = 0.862 +/- 0.149
4. Model Tree: NAE = 0.743 +/- 0.035
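
The source reports only the resulting NAE values and does not define the measure. As a rough sketch, assuming NAE is the sum of absolute errors divided by the sum of absolute errors of a baseline that always predicts the mean quality, it could be computed as follows. The function name and the sample numbers are illustrative and not taken from the study.

```python
def normalized_absolute_error(y_true, y_pred):
    """Sum of |error| divided by the sum of |error| of a predict-the-mean baseline.

    Assumed definition; the source only reports the resulting NAE values.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    model_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    baseline_error = sum(abs(t - mean_y) for t in y_true)
    if baseline_error == 0:
        raise ValueError("all targets are identical; NAE is undefined")
    return model_error / baseline_error


# Made-up quality scores and predictions, for illustration only.
actual = [5, 6, 5, 7, 4]
predicted = [5.2, 5.8, 5.4, 6.1, 5.0]
nae = normalized_absolute_error(actual, predicted)
```

Under this assumed definition, a value below 1 means the model beats the mean baseline, so an NAE of about 0.745 would correspond to roughly a quarter less absolute error than always predicting the average quality.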

Data source
We used the following data set: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties. In Decision
Support Systems, Elsevier, 47(4):547-553, 2009. It can be accessed
here: https://archive.ics.uci.edu/ml/datasets/Wine+Quality

2. Sources

Created by: Paulo Cortez (Univ. Minho), António Cerdeira, Fernando Almeida, Telmo
Matos and José Reis (CVRVV) @ 2009

3. Past Usage:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

In the above reference, two datasets were created, using red and white wine samples.
The inputs include objective tests (e.g. pH values) and the output is based on sensory
data (median of at least 3 evaluations made by wine experts). Each expert graded the
wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods
were applied to model these datasets under a regression approach. The support vector
machine model achieved the best results. Several metrics were computed: MAD, confusion
matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of
the input variables (as measured by a sensitivity analysis procedure).
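
To make the output label and two of the metrics concrete, here is a small sketch. It assumes that MAD denotes the mean absolute deviation between predicted and actual quality, and that a prediction counts as correct when it lies within the tolerance T of the actual grade. Both readings are assumptions, since this description does not define them; the function names and all numbers are made up for illustration.

```python
from statistics import median


def mean_absolute_deviation(y_true, y_pred):
    """Average of |actual - predicted| (assumed meaning of MAD)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_within_tolerance(y_true, y_pred, tolerance):
    """Share of predictions whose absolute error is at most `tolerance`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Output label: the median of (at least three) expert grades per wine.
expert_grades = [[5, 6, 6], [7, 7, 8], [4, 5, 5]]  # made-up grades
actual = [median(grades) for grades in expert_grades]
predicted = [5.6, 6.2, 5.3]  # made-up model outputs

mad = mean_absolute_deviation(actual, predicted)
acc_half = accuracy_within_tolerance(actual, predicted, 0.5)
acc_one = accuracy_within_tolerance(actual, predicted, 1.0)
```

A larger tolerance can only keep or raise the accuracy, which is why such results are usually reported for several values of T.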

4. Relevant Information:

These datasets can be viewed as classification or regression tasks.
The classes are ordered and not balanced (e.g. there are many more normal wines than
excellent or poor ones). Outlier detection algorithms could be used to detect the few
excellent or poor wines. Also, we are not sure if all input variables are relevant, so
it could be interesting to test feature selection methods.
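
The class imbalance can be checked by counting how often each quality score occurs. The sketch below uses a made-up list of scores and a hypothetical threshold for calling a class rare; neither comes from the dataset description.

```python
from collections import Counter


def class_shares(scores):
    """Return {score: fraction of samples}, ordered by score."""
    counts = Counter(scores)
    total = len(scores)
    return {score: counts[score] / total for score in sorted(counts)}


def rare_classes(scores, min_share=0.05):
    """Scores whose share falls below `min_share` (hypothetical threshold)."""
    return [s for s, share in class_shares(scores).items() if share < min_share]


# Made-up scores: mostly 5 and 6, with a few poor and excellent wines.
scores = [5] * 40 + [6] * 38 + [7] * 15 + [4] * 4 + [3] * 1 + [8] * 2
shares = class_shares(scores)
rare = rare_classes(scores)
```

With these illustrative numbers, the scores at both ends of the scale come out as rare, which is the situation in which outlier detection becomes an option.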

5. Number of Instances: red wine - 1599; white wine - 4898.

6. Number of Attributes: 11 + output attribute


Note: several of the attributes may be correlated, thus it makes sense to apply some
sort of feature selection.
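
One simple way to look for correlated attributes is to compute the Pearson correlation for every pair of columns and list the pairs above some threshold. This is only a sketch of a correlation filter, not a method named in the source; the threshold of 0.8 is an arbitrary choice and the column values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Does not guard against constant columns (zero variance).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Attribute pairs with |r| >= threshold (threshold is an arbitrary choice)."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Made-up values for three attributes; not taken from the dataset.
columns = {
    "free sulfur dioxide": [11.0, 25.0, 15.0, 17.0, 9.0],
    "total sulfur dioxide": [34.0, 67.0, 54.0, 60.0, 18.0],
    "pH": [3.51, 3.20, 3.26, 3.16, 3.36],
}
pairs = correlated_pairs(columns)
```

For a pair that is flagged, one of the two attributes is a candidate for removal; which one to keep is a modelling decision that this sketch does not make.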

7. Attribute information:

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):

1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol

Output variable (based on sensory data):

12 - quality (score between 0 and 10)
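
The attribute list above maps directly onto a row layout of 11 numeric inputs plus the quality score. The sketch below parses such rows from delimited text. It assumes a semicolon-separated file with a header row that uses the attribute names listed above, which should be checked against the actual data file; the function name is hypothetical and the two rows are made-up illustrations.

```python
import csv
import io

INPUT_ATTRIBUTES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def load_samples(text, delimiter=";"):
    """Parse delimited text into (inputs, quality) pairs.

    Assumes a header row using the attribute names listed above.
    """
    samples = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        inputs = [float(row[name]) for name in INPUT_ATTRIBUTES]
        samples.append((inputs, int(row[TARGET])))
    return samples


# Two made-up rows for illustration; real data would be read from the file.
sample_text = (
    ";".join(INPUT_ATTRIBUTES + [TARGET]) + "\n"
    "7.4;0.70;0.00;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0.00;2.6;0.098;25;67;0.9968;3.20;0.68;9.8;5\n"
)
samples = load_samples(sample_text)
```

Keeping the inputs in the listed order makes it easy to refer to an attribute by its number minus one, for example index 8 for pH.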

8. Missing Attribute Values: None
