Wine Recog Data

This document provides information about a wine recognition data set containing measurements of 13 constituents found in wine samples from 3 different cultivars grown in the same Italian region. The data set has previously been used to compare classification methods, with the RDA method achieving 100% correct classification. The document asks a series of questions about assessing the normality of the data, comparing variance-covariance matrices between cultivars, performing linear and quadratic discriminant analysis in R, examining discriminant loadings and group separation, performing MANOVA and discriminant analysis in JMP, conducting forward selection discriminant analysis to find a discriminating variable subset, and cross-validating a discriminant analysis using only the selected subset.

Uploaded by gheorghe gardu
Copyright © All Rights Reserved

Wine Recognition Data

Data Source:
Forina, M. et al., PARVUS - An Extendible Package for Data Exploration, Classification
and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via
Brigata Salerno, 16147 Genoa, Italy.
Past Usage:
(1) S. Aeberhard, D. Coomans and O. de Vel,
Comparison of Classifiers in High Dimensional Settings,
Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of
Mathematics and Statistics, James Cook University of North Queensland.
(Also submitted to Technometrics).
The data were used, along with many other data sets, to compare various classifiers. The classes
are separable, though only RDA has achieved 100% correct classification (RDA: 100%,
QDA: 99.4%, LDA: 98.9%, 1NN: 96.1% on z-transformed data; all results use the
leave-one-out technique).
(2) S. Aeberhard, D. Coomans and O. de Vel,
The Classification Performance of RDA,
Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of
Mathematics and Statistics, James Cook University of North Queensland.
(Also submitted to Journal of Chemometrics).
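The 1NN figure quoted in reference (1) combines two steps: z-transforming every variable and scoring the 1-nearest-neighbour rule by leave-one-out. A minimal NumPy sketch of that procedure follows, assuming a numeric matrix `X` (one row per wine) and a label vector `y`; the helper names `zscore` and `loo_1nn_accuracy` are hypothetical, and the demo uses synthetic, well-separated groups rather than the wine data, so it says nothing about the 96.1% result itself.

```python
import numpy as np


def zscore(X):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of the 1-nearest-neighbour rule on z-scored X."""
    Z = zscore(X)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all observations.
    diff = Z[:, None, :] - Z[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Leaving observation i out is equivalent to forbidding it as its own neighbour.
    np.fill_diagonal(d2, np.inf)
    nearest = d2.argmin(axis=1)
    return float(np.mean(y[nearest] == y))


# Synthetic stand-in data (NOT the wine data): three well-separated groups.
rng = np.random.default_rng(0)
X_a = np.vstack([rng.normal(m, 1.0, size=(20, 4)) for m in (0.0, 10.0, 20.0)])
y_a = np.repeat([1, 2, 3], 20)
print(loo_1nn_accuracy(X_a, y_a))
```

This sketch standardizes the whole data set once before the leave-one-out loop; recomputing the means and standard deviations without the held-out row would be the stricter variant.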
Relevant Information:
These data are the results of a chemical analysis of wines grown in the same region in
Italy but derived from three different cultivars. The analysis determined the quantities of
13 constituents found in each of the three types of wines. The units are not important for
the purposes of this problem, as I recommend using the scaled data in your analysis
anyway.
The 13 attributes are:
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline

14) Cultivar/Wine Type (1, 2, or 3): grouping variable


QUESTIONS
a) Assess the normality of these data. You can use JMP (Wine.JMP), R (Wine), and Arc
(Wine.lsp). (3 pts.)
b) Compare the variance-covariance matrices of the three cultivars. Is it appropriate to
assume $\Sigma_1 = \Sigma_2 = \Sigma_3$ for these data? You can use the TestVar(X,grp) command to
test equality of the variance-covariance matrices for two cultivars at a time. (2 pts.)
c) Using the data frame Wine in R, perform both linear and quadratic discriminant
analysis to classify cultivar. Reproduce the holdout error rates for LDA and QDA mentioned
above. Also perform cross-validation to examine the classification
performance. Which method, lda or qda, would you recommend, and why? (6 pts.)
d) Use the Discrim function in R to examine the loadings of the linear discriminant
functions and the group separation achieved. Which variables appear to have the best
discriminatory ability? Keep in mind that the variables are on very
different scales. (4 pts.)
e) Use JMP to perform MANOVA and a discriminant analysis for these data. Discuss
the canonical centroid plot obtained from the discriminant analysis. Does it agree with
your conclusions from part (d)? (3 pts.)
f) Perform forward selection (stepwise) discriminant analysis to find a subset of the 13
variables that significantly discriminate the cultivars. Which variables are not included?
(4 pts.)

g) Using only the variables from your forward stepwise selection, use R to conduct and
cross-validate an appropriate discriminant analysis for this subset. Does the
discrimination model with fewer variables cross-validate better? Compare your results to
part (c) above. (4 pts.)
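For part (a), one common numerical check of multivariate normality is Mardia's skewness and kurtosis, computed separately within each cultivar because the pooled sample is a mixture of three groups. The sketch below is a NumPy-only illustration under that assumption; `mardia` is a hypothetical helper, not a function from JMP, R, or Arc, and the p-value step (for example `scipy.stats.chi2.sf` for the skewness statistic) is left out to keep it dependency-free. The returned squared distances `d2` can also be sorted and plotted against chi-square quantiles with p degrees of freedom for a Q-Q check.

```python
import math

import numpy as np


def mardia(X):
    """Mardia's multivariate skewness and kurtosis for one group of observations.

    Uses the maximum-likelihood covariance (divisor n), as in Mardia's definitions.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    D = X - X.mean(axis=0)
    S = D.T @ D / n
    # G[i, j] = (x_i - xbar)' S^{-1} (x_j - xbar)
    G = D @ np.linalg.solve(S, D.T)
    d2 = np.diag(G).copy()              # squared Mahalanobis distances
    b1 = float((G ** 3).sum()) / n**2   # multivariate skewness b_{1,p}
    b2 = float(np.mean(d2 ** 2))        # multivariate kurtosis b_{2,p}
    skew_stat = n * b1 / 6.0            # approx. chi-square under normality
    skew_df = p * (p + 1) * (p + 2) // 6
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8.0 * p * (p + 2) / n)  # approx. N(0, 1)
    return {"d2": d2, "b1": b1, "b2": b2, "skew_stat": skew_stat,
            "skew_df": skew_df, "kurt_z": kurt_z}


# Synthetic check on normal data (NOT the wine data).
rng = np.random.default_rng(1)
res_b = mardia(rng.normal(size=(100, 3)))
print(res_b["skew_stat"], res_b["skew_df"], res_b["kurt_z"])
```

On the wine data, each cultivar's rows would be passed separately, e.g. `mardia(X[y == 1])`.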
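For part (b), the course's `TestVar(X,grp)` function is not reproduced here, and the handout does not say which test it uses. As an assumption, the sketch below implements Box's M test, a standard test of equal covariance matrices, with its usual chi-square approximation; `box_m` is a hypothetical helper name. It accepts any number of groups, so it can be run on two cultivars at a time or on all three. Box's M is known to be sensitive to non-normality, which ties its interpretation back to part (a).

```python
import numpy as np


def box_m(X, groups):
    """Box's M test for equality of group covariance matrices.

    Returns (M, chi-square statistic, degrees of freedom); the p-value is the
    upper tail of a chi-square distribution with df degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    g = len(labels)
    N, p = X.shape
    covs, dofs = [], []
    for lab in labels:
        Xi = X[groups == lab]
        covs.append(np.cov(Xi, rowvar=False))   # unbiased, divisor n_i - 1
        dofs.append(Xi.shape[0] - 1)
    dofs = np.asarray(dofs, dtype=float)
    S_pooled = sum(v * S for v, S in zip(dofs, covs)) / (N - g)
    # slogdet avoids overflow/underflow of det() for many variables on mixed scales.
    M = (N - g) * np.linalg.slogdet(S_pooled)[1] - sum(
        v * np.linalg.slogdet(S)[1] for v, S in zip(dofs, covs))
    c = ((2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (g - 1))) * (
        np.sum(1.0 / dofs) - 1.0 / (N - g))
    df = p * (p + 1) * (g - 1) // 2
    return float(M), float((1.0 - c) * M), df


# Sanity check: two groups with identical data have identical covariances, so M = 0.
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 2))
M_same, stat_same, df_same = box_m(np.vstack([A, A]), np.repeat([1, 2], 30))
print(M_same, stat_same, df_same)
```

With SciPy available, `scipy.stats.chi2.sf(stat, df)` gives the p-value.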
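For part (c), the usual R route is `lda()` and `qda()` from the MASS package, whose `CV = TRUE` argument returns leave-one-out predictions. Purely to make the mechanics explicit, the sketch below writes both classifiers out in NumPy: LDA uses one pooled covariance matrix, QDA one covariance matrix per cultivar, and both use class proportions as priors (an assumption chosen to match MASS's default). The names `fit_da`, `predict_da`, `apparent_error` and `loo_error` are hypothetical, and the demo runs on synthetic groups.

```python
import numpy as np


def fit_da(X, y, kind="lda"):
    """Estimate class means, covariances and priors for LDA or QDA."""
    labels = np.unique(y)
    means = [X[y == k].mean(axis=0) for k in labels]
    priors = [np.mean(y == k) for k in labels]
    if kind == "lda":
        # Pooled within-class covariance shared by every class.
        resid = np.vstack([X[y == k] - m for k, m in zip(labels, means)])
        S = resid.T @ resid / (len(y) - len(labels))
        covs = [S] * len(labels)
    elif kind == "qda":
        covs = [np.cov(X[y == k], rowvar=False) for k in labels]
    else:
        raise ValueError("kind must be 'lda' or 'qda'")
    return labels, means, covs, priors


def predict_da(model, X):
    """Assign each row to the class with the largest Gaussian log-posterior score.

    With a shared covariance (LDA) the quadratic and log-determinant terms are the
    same for every class, so the rule reduces to linear discriminant functions.
    """
    labels, means, covs, priors = model
    scores = []
    for m, S, prior in zip(means, covs, priors):
        D = X - m
        maha = np.einsum("ij,ij->i", D, np.linalg.solve(S, D.T).T)
        logdet = np.linalg.slogdet(S)[1]
        scores.append(-0.5 * maha - 0.5 * logdet + np.log(prior))
    return labels[np.argmax(np.column_stack(scores), axis=1)]


def apparent_error(X, y, kind):
    """Resubstitution error: train and test on the same observations."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    return float(np.mean(predict_da(fit_da(X, y, kind), X) != y))


def loo_error(X, y, kind):
    """Leave-one-out error: refit without observation i, then classify it."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        pred = predict_da(fit_da(X[keep], y[keep], kind), X[i:i + 1])
        wrong += int(pred[0] != y[i])
    return wrong / n


# Synthetic stand-in data (NOT the wine data).
rng = np.random.default_rng(3)
X_c = np.vstack([rng.normal(m, 1.0, size=(15, 3)) for m in (0.0, 10.0, 20.0)])
y_c = np.repeat([1, 2, 3], 15)
for kind in ("lda", "qda"):
    print(kind, apparent_error(X_c, y_c, kind), loo_error(X_c, y_c, kind))
```

Comparing `apparent_error` with `loo_error` shows how optimistic resubstitution is; QDA estimates a separate 13 x 13 covariance matrix per cultivar, which is the usual argument for checking whether its extra flexibility survives cross-validation.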
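For part (d), the course's `Discrim` function is not shown, so its exact output format is unknown. The sketch below computes canonical (Fisher) linear discriminants from the within-group (W) and between-group (B) SSCP matrices, then standardizes the coefficients by the pooled within-group standard deviations so that variables on very different scales, such as Proline and Hue, can be compared directly. `canonical_discriminants` is a hypothetical name; the coefficients are scaled so each discriminant has unit pooled within-group variance, which may differ from `Discrim`'s convention, and their signs are arbitrary.

```python
import numpy as np


def canonical_discriminants(X, y):
    """Fisher/canonical linear discriminants with raw and standardized coefficients."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    labels = np.unique(y)
    n, p = X.shape
    g = len(labels)
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))   # within-group SSCP
    B = np.zeros((p, p))   # between-group SSCP
    for k in labels:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        W += (Xk - mk).T @ (Xk - mk)
        diff = (mk - grand_mean)[:, None]
        B += len(Xk) * (diff @ diff.T)
    # Discriminant directions are eigenvectors of W^{-1} B; keep the min(p, g-1) largest.
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(evals.real)[::-1][: min(p, g - 1)]
    evals, raw = evals.real[order], evecs.real[:, order]
    # Scale each direction to unit pooled within-group variance.
    Sw = W / (n - g)
    raw = raw / np.sqrt(np.einsum("ij,ik,kj->j", raw, Sw, raw))
    # Standardized coefficients remove the effect of each variable's measurement scale.
    standardized = raw * np.sqrt(np.diag(Sw))[:, None]
    share = evals / evals.sum()   # share of between-group separation per discriminant
    return evals, share, raw, standardized


# Synthetic data (NOT the wine data): only variable 0 separates the groups.
rng = np.random.default_rng(4)
y_d = np.repeat([1, 2, 3], 30)
X_d = rng.normal(size=(90, 4))
X_d[:, 0] += 10.0 * (y_d - 1)
evals_d, share_d, raw_d, std_d = canonical_discriminants(X_d, y_d)
print(share_d)
print(std_d[:, 0])   # standardized loadings on the first discriminant
```

Ranking variables by the absolute size of their standardized coefficients, rather than the raw ones, is what keeps a large-unit variable like Proline from looking unimportant merely because of its scale.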
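For part (e), the JMP MANOVA report and canonical centroid plot can be cross-checked by hand. The sketch below, again NumPy-only with the hypothetical name `manova_centroids`, computes Wilks' Lambda, $\Lambda = |W| / |W + B|$, and the cultivar means projected onto the first two canonical axes, which are the points a canonical centroid plot displays. JMP's scaling and centering of the canonical axes may differ, so only the relative positions of the centroids should be compared.

```python
import numpy as np


def manova_centroids(X, y, n_axes=2):
    """Wilks' Lambda and group centroids on the leading canonical axes."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    labels = np.unique(y)
    n, p = X.shape
    g = len(labels)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered   # total SSCP, T = W + B
    W = np.zeros((p, p))
    for k in labels:
        Dk = X[y == k] - X[y == k].mean(axis=0)
        W += Dk.T @ Dk
    # Wilks' Lambda = |W| / |W + B|, computed on the log scale for stability.
    wilks = float(np.exp(np.linalg.slogdet(W)[1] - np.linalg.slogdet(T)[1]))
    # Canonical axes: eigenvectors of W^{-1} B, with B = T - W.
    evals, evecs = np.linalg.eig(np.linalg.solve(W, T - W))
    order = np.argsort(evals.real)[::-1][: min(n_axes, p, g - 1)]
    A = evecs.real[:, order]
    Sw = W / (n - g)
    A = A / np.sqrt(np.einsum("ij,ik,kj->j", A, Sw, A))   # unit within-group variance
    centroids = np.array([X[y == k].mean(axis=0) @ A for k in labels])
    return wilks, labels, centroids


# Synthetic data (NOT the wine data).
rng = np.random.default_rng(5)
X_e = np.vstack([rng.normal(m, 1.0, size=(20, 4)) for m in (0.0, 5.0, 10.0)])
y_e = np.repeat([1, 2, 3], 20)
wilks_e, labels_e, cent_e = manova_centroids(X_e, y_e)
print(wilks_e)
print(cent_e)   # one row per group: (canonical 1, canonical 2)
```

A Wilks' Lambda near 0 indicates strong separation of the group mean vectors; centroids that are far apart on canonical 1 should correspond to the variables with large standardized loadings found in part (d).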
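For part (f), JMP's stepwise discriminant option and SAS's PROC STEPDISC each report their own statistics. As an assumption, the sketch below uses the classical rule of entering, at each step, the variable with the largest partial F computed from Wilks' Lambda, and stopping when no candidate reaches a threshold. With q variables already selected, the partial F for a candidate is

$$F = \frac{1 - \Lambda_{\text{partial}}}{\Lambda_{\text{partial}}} \cdot \frac{n - g - q}{g - 1},$$

where $\Lambda_{\text{partial}}$ is Wilks' Lambda with the candidate added divided by Wilks' Lambda without it, referred to an F distribution with $g - 1$ and $n - g - q$ degrees of freedom. The helper names and the default threshold `f_to_enter=4.0` are hypothetical choices for illustration.

```python
import numpy as np


def wilks_lambda(X, y):
    """Wilks' Lambda |W| / |T| for the columns of X."""
    centered = X - X.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for k in np.unique(y):
        Dk = X[y == k] - X[y == k].mean(axis=0)
        W += Dk.T @ Dk
    return float(np.exp(np.linalg.slogdet(W)[1] - np.linalg.slogdet(T)[1]))


def forward_select(X, y, f_to_enter=4.0):
    """Greedy forward selection of discriminating variables by partial F-to-enter."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n, p = X.shape
    g = len(np.unique(y))
    selected, history, lam_current = [], [], 1.0
    while len(selected) < p:
        q = len(selected)
        best_j, best_f, best_lam = None, -np.inf, None
        for j in range(p):
            if j in selected:
                continue
            lam_new = wilks_lambda(X[:, selected + [j]], y)
            partial = lam_new / lam_current
            f_stat = (1.0 - partial) / partial * (n - g - q) / (g - 1)
            if f_stat > best_f:
                best_j, best_f, best_lam = j, f_stat, lam_new
        if best_f < f_to_enter:
            break   # no remaining variable adds enough separation
        selected.append(best_j)
        history.append((best_j, best_f))
        lam_current = best_lam
    return selected, history


# Synthetic data (NOT the wine data): variables 0 and 1 carry the group differences.
rng = np.random.default_rng(6)
y_f = np.repeat([1, 2, 3], 25)
X_f = rng.normal(size=(75, 5))
X_f[:, 0] += 8.0 * (y_f - 1)                # separates all three groups
X_f[:, 1] += np.where(y_f == 2, 6.0, 0.0)   # separates group 2 from the others
sel_f, hist_f = forward_select(X_f, y_f)
print(sel_f, hist_f)
```

The variables never entered by such a procedure are the answer to "which variables are not included"; a p-value threshold instead of a fixed F would change only the stopping test.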
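For part (g), the comparison with part (c) amounts to cross-validating the same kind of classifier on two column sets. The sketch below wraps a compact NumPy LDA in stratified k-fold cross-validation; `lda_fit_predict`, `kfold_error` and the column indices are hypothetical. One caution applies to the wine analysis as well: if the subset was chosen on the full data set, its cross-validated error is somewhat optimistic, because the selection step itself was not repeated inside each fold.

```python
import numpy as np


def lda_fit_predict(X_train, y_train, X_test):
    """Linear discriminant rule with pooled covariance and class-proportion priors."""
    labels = np.unique(y_train)
    means = np.array([X_train[y_train == k].mean(axis=0) for k in labels])
    resid = X_train - means[np.searchsorted(labels, y_train)]
    S = resid.T @ resid / (len(y_train) - len(labels))
    priors = np.array([np.mean(y_train == k) for k in labels])
    coef = np.linalg.solve(S, means.T)   # S^{-1} mu_k, one column per class
    const = -0.5 * np.sum(means.T * coef, axis=0) + np.log(priors)
    return labels[np.argmax(X_test @ coef + const, axis=1)]


def kfold_error(X, y, columns, k=10, seed=0):
    """Stratified k-fold cross-validated LDA error using only the given columns."""
    X = np.asarray(X, dtype=float)[:, columns]
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for lab in np.unique(y):
        idx = np.flatnonzero(y == lab)
        rng.shuffle(idx)
        fold[idx] = np.arange(len(idx)) % k   # spread each class over the folds
    wrong = 0
    for f in range(k):
        test = fold == f
        pred = lda_fit_predict(X[~test], y[~test], X[test])
        wrong += int(np.sum(pred != y[test]))
    return wrong / len(y)


# Synthetic data (NOT the wine data): columns 0 and 1 carry the signal.
rng = np.random.default_rng(7)
y_g = np.repeat([1, 2, 3], 20)
X_g = rng.normal(size=(60, 6))
X_g[:, 0] += 8.0 * (y_g - 1)
X_g[:, 1] += np.where(y_g == 2, 6.0, 0.0)
err_subset = kfold_error(X_g, y_g, columns=[0, 1], k=5)
err_full = kfold_error(X_g, y_g, columns=list(range(6)), k=5)
print(err_subset, err_full)
```

Using the same folds (same `seed`) for both column sets keeps the comparison paired, so a difference in error reflects the variables rather than a different random split.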
