MVDA Assignment
Thiru Kathir ch22b113
May 2025
1 Introduction
Question 1: Linear Discriminant Analysis (LDA)
MATLAB Code
clear all; clc;
%% Data import
data = readtable('DA_Examplev2.xlsx');
% Divide data into 2 sets / classes / populations using logical indexing
isG1 = data.Group == 1;
NonCarrier = [data.logAHFActivity(isG1)  data.logAHFAntigen(isG1)];
Carrier    = [data.logAHFActivity(~isG1) data.logAHFAntigen(~isG1)];

%% Section a - LDA Parameters
div1 = length(NonCarrier);
div2 = length(Carrier);
x1bar = mean(NonCarrier)';
x2bar = mean(Carrier)';
S1 = cov(NonCarrier);
S2 = cov(Carrier);
n1m1 = div1 - 1;
n2m1 = div2 - 1;
Spooled = (n1m1/(n1m1 + n2m1)) * S1 + (n2m1/(n1m1 + n2m1)) * S2;
Spooledinv = inv(Spooled);
diffp = (x1bar - x2bar)';
ahatp = diffp * Spooledinv;
y1bar = ahatp * x1bar;
y2bar = ahatp * x2bar;
mhat = 0.5 * (y1bar + y2bar);

%% Section b - Classify new observations
Test_Data = [
    -0.112 -0.279
    -0.059 -0.068
     0.064  0.012
    -0.043 -0.052
    -0.050 -0.098
    -0.094 -0.113
    -0.123 -0.143
    -0.011 -0.037
    -0.210 -0.090
    -0.126 -0.019];
Result = ahatp * Test_Data';
Score = Result - mhat;
for j = 1:length(Test_Data)
    if Score(j) > 0
        Class(j) = 1;
    else
        Class(j) = 2;
    end
end

%% Section c - Adjusted mhat for unequal priors
prior1 = 0.75;
prior2 = 0.25;
mhat_unequal = 0.5 * (y1bar + y2bar) + log(prior1 / prior2) / 2;
Score_unequal = Result - mhat_unequal;
for j = 1:length(Test_Data)
    if Score_unequal(j) > 0
        Class_unequal(j) = 1;
    else
        Class_unequal(j) = 2;
    end
end
Results
Part (a): LDA Parameters
• Discriminant vector â = [19.319, −17.124]
• Threshold m̂ = −3.559
• Discriminant function: y = 19.319 · x1 − 17.124 · x2, where x1 = logAHFActivity and x2 = logAHFAntigen
• Classification Rule: If y > −3.559, classify as Group 1 (Non-Carrier); otherwise, Group 2 (Carrier).
Part (b): Classification of New Observations (Equal Priors)
Obs.  logAHFActivity  logAHFAntigen  Score (y − m̂)  Classification
1     -0.112          -0.279         6.173          1
2     -0.059          -0.068         3.584          1
3      0.064           0.012         4.590          1
4     -0.043          -0.052         3.619          1
5     -0.050          -0.098         4.272          1
6     -0.094          -0.113         3.679          1
7     -0.123          -0.143         3.632          1
8     -0.011          -0.037         3.981          1
9     -0.210          -0.090         1.044          1
10    -0.126          -0.019         1.451          1
Table 1: Classification of 10 new observations under equal priors
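As a quick check on how a row of Table 1 arises, the sketch below recomputes the score for observation 1 from the rounded values reported in Part (a). The variable names `ahat`, `x0`, `y0`, and `score` are chosen here for illustration and are not part of the original script.

```matlab
% Rounded LDA parameters reported in Part (a)
ahat = [19.319, -17.124];   % discriminant vector (row)
mhat = -3.559;              % threshold

% Observation 1 from the test data: [logAHFActivity; logAHFAntigen]
x0 = [-0.112; -0.279];

% Discriminant value and score relative to the threshold
y0 = ahat * x0;             % 19.319*(-0.112) - 17.124*(-0.279)
score = y0 - mhat;

% Same rule as the script: positive score -> Group 1, otherwise Group 2
if score > 0
    group = 1;
else
    group = 2;
end
fprintf('y = %.3f, score = %.3f, group = %d\n', y0, score, group);
```

With these rounded inputs the printed score agrees with the 6.173 listed for observation 1.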
Part (c): Classification with Unequal Priors (0.75/0.25)
• Adjusted threshold m̂ = −3.010
• Classification Rule: If y > −3.010, classify as Group 1 (Non-Carrier); otherwise, Group 2 (Carrier).
Obs.  logAHFActivity  logAHFAntigen  Score (y − adjusted m̂)  Classification
1     -0.112          -0.279         5.624                   1
2     -0.059          -0.068         3.035                   1
3      0.064           0.012         4.041                   1
4     -0.043          -0.052         3.070                   1
5     -0.050          -0.098         3.722                   1
6     -0.094          -0.113         3.129                   1
7     -0.123          -0.143         3.083                   1
8     -0.011          -0.037         3.431                   1
9     -0.210          -0.090         0.494                   1
10    -0.126          -0.019         0.901                   1
Table 2: Classification of 10 new observations under unequal priors
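For comparison, the standard minimum expected-cost rule for two normal populations with a common covariance matrix can be written out as below. This assumes equal misclassification costs and takes p1 = 0.75 for Group 1 and p2 = 0.25 for Group 2, as in the script. Under this rule the cutoff shifts by ln(p2/p1), which is not the same as the ln(p1/p2)/2 shift used in the Section c code, so the numbers here are a cross-check sketch rather than a restatement of Table 2.

```latex
\text{Allocate } x_0 \text{ to Group 1 if}\quad
\hat{a}^{\top} x_0 - \hat{m} \;\ge\; \ln\!\left(\frac{p_2}{p_1}\right)
= \ln\!\left(\frac{0.25}{0.75}\right) \approx -1.099,
\qquad\text{i.e.}\quad
\hat{a}^{\top} x_0 \;\ge\; -3.559 - 1.099 = -4.658 .
```

Every equal-prior score in Table 1 is positive and therefore above −1.099, so all ten observations would still be assigned to Group 1 under this cutoff.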
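Comparison: Equal vs Unequal Priors
Observation  Equal Priors  Unequal Priors  Change?
1            1             1               No
2            1             1               No
3            1             1               No
4            1             1               No
5            1             1               No
6            1             1               No
7            1             1               No
8            1             1               No
9            1             1               No
10           1             1               No
Table 3: Comparison of classification results under different prior assumptions
All ten observations are assigned to Group 1 (Non-Carrier) under both prior assumptions, so changing the priors does not alter any classification.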
Question 2
MATLAB Code
clc; clear all;
%% Import data from spreadsheet
% Script for importing data from the following spreadsheet:
%
% Workbook: C:\Users\thiru\acads\MVDA\Assignment3\Assignment3_Data\Assignment3_ChocolateData.xlsx
% Worksheet: Sheet1
%
% Auto-generated by MATLAB on 14-May-2025 18:54:25
%% Set up the Import Options and import the data
opts = spreadsheetImportOptions("NumVariables", 5);
% Specify sheet and range
opts.Sheet = "Sheet1";
opts.DataRange = "A2:E25";
% Specify column names and types
opts.VariableNames = ["group", "respondent", "constant", "priceLevel", "delicious"];
opts.VariableTypes = ["double", "double", "double", "double", "double"];
% Import the data
Assignment3ChocolateData = readtable("C:\Users\thiru\acads\MVDA\Assignment3\Assignment3_Data\Assignm
%% Clear temporary variables
clear opts;
y = Assignment3ChocolateData.group;
% Split the observations by group using logical indexing
isG1 = Assignment3ChocolateData.group == 1;
G1 = [Assignment3ChocolateData.priceLevel(isG1)  Assignment3ChocolateData.delicious(isG1)];
G2 = [Assignment3ChocolateData.priceLevel(~isG1) Assignment3ChocolateData.delicious(~isG1)];
%% Section a
% Mean values of observation vectors
div1 = length(G1);
div2 = length(G2);
x1bar = mean(G1)';
x2bar = mean(G2)';
% Covariance of observation vectors
S1 = cov(G1);
S2 = cov(G2);
n1m1 = div1 - 1;
n2m1 = div2 - 1;
% Pooled Covariance of observation vectors
Spooled = (n1m1 / (n1m1 + n2m1)) * S1 + (n2m1 / (n1m1 + n2m1)) * S2;
Spooledinv = Spooled^-1;
% To find ahat and mhat
diffp = (x1bar - x2bar)';
ahatp = diffp * Spooledinv;
y1bar = ahatp * x1bar;
y2bar = ahatp * x2bar;
mhat = 0.5 * (y1bar + y2bar);
%% Verify with built-in code
X = [Assignment3ChocolateData.priceLevel, Assignment3ChocolateData.delicious];
MdlLinear = fitcdiscr(X, Assignment3ChocolateData.group);
MdlLinear.ClassNames([1 2]);
K = MdlLinear.Coeffs(1,2).Const;
L = MdlLinear.Coeffs(1,2).Linear;
%% Confusion Matrix Between Custom Code and MATLAB Built-in
% Custom classification using ahatp and mhat
Xdata = [Assignment3ChocolateData.priceLevel Assignment3ChocolateData.delicious]';
y_custom = ahatp * Xdata;
% y >= mhat lies on the Group 1 side of the boundary, so map it to label 1
y_pred_custom = 2 - (y_custom >= mhat); % 1 if G1, 2 if G2
% Built-in classification
y_builtin = predict(MdlLinear, X);
y_builtin = y_builtin(:); % Ensure column form
% Confusion Matrix
confMat = confusionmat(y_builtin, y_pred_custom');
disp('Confusion Matrix (Rows: Built-in, Columns: Custom):');
disp(confMat);
Results
From the code and calculations:
âᵀ = [−1.8857, 1.0327]
m̂ = −3.6255
Question 3
MATLAB Code
% Clear workspace and command window
clc; clear all;

%% Import data from spreadsheet
opts = spreadsheetImportOptions("NumVariables", 4);
opts.Sheet = "T11-2";
opts.DataRange = "A2:D101";
opts.VariableNames = ["Class", "Gender", "Freshwater", "Marine"];
opts.VariableTypes = ["double", "double", "double", "double"];
fishData = readtable("C:\Users\thiru\acads\MVDA\Assignment3\Assignment3_Data\Assignment3_Fishy.xlsx", opts, "UseExcel", false);
clear opts;

%% Section for Gender 1 (Male)
maleData = fishData(fishData.Gender == 1, :);
y_male = maleData.Class;

% Split the male observations by class using logical indexing
isG1_male = maleData.Class == 1;
G1_male = [maleData.Freshwater(isG1_male)  maleData.Marine(isG1_male)];
G2_male = [maleData.Freshwater(~isG1_male) maleData.Marine(~isG1_male)];

div1_male = length(G1_male);
div2_male = length(G2_male);
x1bar_male = mean(G1_male)';
x2bar_male = mean(G2_male)';
S1_male = cov(G1_male);
S2_male = cov(G2_male);
n1m1_male = div1_male - 1;
n2m1_male = div2_male - 1;

Spooled_male = (n1m1_male/(n1m1_male + n2m1_male)) * S1_male + ...
    (n2m1_male/(n1m1_male + n2m1_male)) * S2_male;
Spooledinv_male = inv(Spooled_male);

diffp_male = (x1bar_male - x2bar_male)';
ahatp_male = diffp_male * Spooledinv_male;
y1bar_male = ahatp_male * x1bar_male;
y2bar_male = ahatp_male * x2bar_male;
mhat_male = 0.5 * (y1bar_male + y2bar_male);

disp('ahat for Male data:');
disp(ahatp_male);
disp('mhat for Male data:');
disp(mhat_male);

X_male = [maleData.Freshwater, maleData.Marine];
MdlLinear_male = fitcdiscr(X_male, maleData.Class);
MdlLinear_male.ClassNames([1 2]);
K_male = MdlLinear_male.Coeffs(1,2).Const;
L_male = MdlLinear_male.Coeffs(1,2).Linear;

X_data_male = [maleData.Freshwater maleData.Marine]';
y_custom_male = ahatp_male * X_data_male;
y_pred_custom_male = y_custom_male >= mhat_male;
y_pred_custom_male = double(y_pred_custom_male) + 1; % y >= mhat -> label 2, y < mhat -> label 1

y_builtin_male = predict(MdlLinear_male, X_male);
conf_mat_male = confusionmat(y_builtin_male, y_pred_custom_male');

disp('confusion matrix for male data:');
disp(conf_mat_male);

%% Section for Gender 2 (Female)
femaleData = fishData(fishData.Gender == 2, :);
y_female = femaleData.Class;

% Split the female observations by class using logical indexing
isG1_female = femaleData.Class == 1;
G1_female = [femaleData.Freshwater(isG1_female)  femaleData.Marine(isG1_female)];
G2_female = [femaleData.Freshwater(~isG1_female) femaleData.Marine(~isG1_female)];

div1_female = length(G1_female);
div2_female = length(G2_female);
x1bar_female = mean(G1_female)';
x2bar_female = mean(G2_female)';
S1_female = cov(G1_female);
S2_female = cov(G2_female);
n1m1_female = div1_female - 1;
n2m1_female = div2_female - 1;

Spooled_female = (n1m1_female/(n1m1_female + n2m1_female)) * S1_female + ...
    (n2m1_female/(n1m1_female + n2m1_female)) * S2_female;
Spooledinv_female = inv(Spooled_female);

diffp_female = (x1bar_female - x2bar_female)';
ahatp_female = diffp_female * Spooledinv_female;
y1bar_female = ahatp_female * x1bar_female;
y2bar_female = ahatp_female * x2bar_female;
mhat_female = 0.5 * (y1bar_female + y2bar_female);

X_female = [femaleData.Freshwater, femaleData.Marine];
MdlLinear_female = fitcdiscr(X_female, femaleData.Class);
MdlLinear_female.ClassNames([1 2]);
K_female = MdlLinear_female.Coeffs(1,2).Const;
L_female = MdlLinear_female.Coeffs(1,2).Linear;

X_data_female = [femaleData.Freshwater femaleData.Marine]';
y_custom_female = ahatp_female * X_data_female;
y_pred_custom_female = y_custom_female >= mhat_female;
y_pred_custom_female = double(y_pred_custom_female) + 1; % y >= mhat -> label 2, y < mhat -> label 1

y_builtin_female = predict(MdlLinear_female, X_female);

disp('ahat for female data:');
disp(ahatp_female);
disp('mhat for female data:');
disp(mhat_female);

conf_mat_female = confusionmat(y_builtin_female, y_pred_custom_female');
disp('confusion matrix for female data:');
disp(conf_mat_female);
Output
ahat for Male data:
-0.1308 0.0457
mhat for Male data:
2.6580
confusion matrix for male data:
0 23
29 0
ahat for female data:
-0.1207 0.0558
mhat for female data:
8.1242
confusion matrix for female data:
0 22
26 0
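Both confusion matrices above are purely anti-diagonal (23/29 for males, 22/26 for females). That pattern is consistent with the custom rule and `fitcdiscr` splitting the observations identically but with the two labels swapped. Because ȳ1 > m̂ > ȳ2 by construction, `y >= mhat` marks the Group 1 side, yet `double(y >= mhat) + 1` turns it into label 2. The toy sketch below uses made-up scores (not the fish data) to show the effect and one possible relabelling; `labels_old` and `labels_new` are illustrative names.

```matlab
% Hypothetical discriminant scores and threshold (not from the fish data)
y_custom = [3.1 2.9 0.4 0.2];
mhat = 1.5;

% Stand-in for the labels returned by predict(...)
y_builtin = [1; 1; 2; 2];

% Mapping used in the script: Group 1 side (y >= mhat) becomes label 2
labels_old = double(y_custom >= mhat) + 1;

% Alternative mapping: Group 1 side (y >= mhat) becomes label 1
labels_new = 2 - double(y_custom >= mhat);

disp(confusionmat(y_builtin, labels_old'));  % counts land off the diagonal
disp(confusionmat(y_builtin, labels_new'));  % counts land on the diagonal
```

With the alternative mapping, agreement between the two classifiers shows up on the diagonal, which is the conventional reading of a confusion matrix.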
Question 4
a) All Principal Components (Scores)
The following table presents the principal component scores obtained after standardizing the data
and projecting onto the eigenvector basis:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
0.9415 -0.9637 0.4216 -1.4281 -2.0147 0.7193 0.5646
0.0488 -0.4240 0.2891 -2.0818 0.7737 -0.1600 -0.1562
... ... ... ... ... ... ...
-1.8004 1.1646 1.7067 0.5718 -0.9579 -0.5571 -0.4185
(Note: Full table truncated for brevity.)
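The standardize-then-project step described above can be sketched as follows. This is a minimal illustration on a small made-up matrix `X` (hypothetical values, not the assignment data), using base MATLAB only and assuming R2016b or later for implicit expansion; all variable names are chosen for illustration.

```matlab
% Hypothetical 5-by-3 data matrix (rows = observations, columns = variables)
X = [8 98 7; 7 107 4; 7 103 4; 10 88 5; 6 91 4];

% Standardize each column to zero mean and unit sample variance
Z = (X - mean(X)) ./ std(X);

% Eigen-decomposition of the correlation matrix (the covariance of Z)
R = cov(Z);
[V, D] = eig(R);

% Order the components by decreasing eigenvalue
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);

% Principal component scores: standardized data projected onto the eigenvectors
scores = Z * V;

% The sample variance of each score column equals the matching eigenvalue
disp([var(scores)' lambda]);
```

Because the data are standardized, the eigenvalues sum to the number of variables, which is why the seven eigenvalues in part (b) add up to 7.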
b) Eigenvalues and Eigenvectors
Eigenvalues and Explained Variance:
• Component 1: 2.3368 (33.38% variance, Cumulative: 33.38%)
• Component 2: 1.3860 (19.80% variance, Cumulative: 53.18%)
• Component 3: 1.2041 (17.20% variance, Cumulative: 70.38%)
• Component 4: 0.7271 (10.39% variance, Cumulative: 80.77%)
• Component 5: 0.6535 (9.34% variance, Cumulative: 90.11%)
• Component 6: 0.5367 (7.67% variance, Cumulative: 97.77%)
• Component 7: 0.1559 (2.23% variance, Cumulative: 100.00%)
Eigenvectors:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Wind -0.2368 0.2784 -0.6435 -0.1727 -0.5605 -0.2236 -0.2415
Solar 0.2056 -0.5266 -0.2245 -0.7781 0.1561 -0.0057 -0.0113
CO 0.5511 -0.0068 0.1136 -0.0053 -0.5734 -0.1095 0.5853
NO 0.3776 0.4347 0.4071 -0.2905 0.0567 -0.4502 -0.4609
NO2 0.4980 0.1998 -0.1966 0.0424 -0.0502 0.7450 -0.3378
O3 0.3246 -0.5670 -0.1598 0.5079 -0.0802 -0.3306 -0.4171
HC 0.3194 0.3079 -0.5411 0.1431 0.5661 -0.2665 0.3139
c) Loading Matrix
The loading matrix, computed as Loadings = Eigenvectors × √Eigenvalues (each eigenvector column scaled by the square root of its eigenvalue), is shown below:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Wind -0.3620 0.3278 -0.7061 -0.1473 -0.4531 -0.1638 -0.0953
Solar 0.3142 -0.6200 -0.2463 -0.6635 0.1262 -0.0042 -0.0044
CO 0.8424 -0.0080 0.1247 -0.0045 -0.4635 -0.0802 0.2311
NO 0.5772 0.5117 0.4467 -0.2477 0.0458 -0.3298 -0.1820
NO2 0.7613 0.2352 -0.2157 0.0362 -0.0406 0.5458 -0.1334
O3 0.4961 -0.6675 -0.1754 0.4331 -0.0649 -0.2422 -0.1647
HC 0.4883 0.3625 -0.5937 0.1220 0.4576 -0.1952 0.1240
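The scaling can be checked on the first component using the rounded values reported in part (b). The sketch below is only a consistency check; `v1`, `lambda1`, and `reported` are illustrative names holding numbers copied from the tables above.

```matlab
% PC1 eigenvector and eigenvalue as reported in part (b), rounded to 4 decimals
v1 = [-0.2368; 0.2056; 0.5511; 0.3776; 0.4980; 0.3246; 0.3194];
lambda1 = 2.3368;

% Loading = eigenvector entry scaled by the square root of the eigenvalue
loading1 = v1 * sqrt(lambda1);

% PC1 column of the loading table above
reported = [-0.3620; 0.3142; 0.8424; 0.5772; 0.7613; 0.4961; 0.4883];

% Columns: recomputed loading, reported loading, difference
disp([loading1 reported loading1 - reported]);
```

The differences should be on the order of the rounding error in the four-decimal inputs.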
d) JMP
Figure 1: Enter Caption
Figure 2: Enter Caption
Question 5
MATLAB Code
clc; clear all;
covMatrix = [5 2; 2 2];
[eigVectors, eigValueMatrix] = eig(covMatrix);
eigValues = diag(eigValueMatrix);
[sortedEigValues, indices] = sort(eigValues, 'descend');
sortedEigVectors = eigVectors(:, indices);
totalVariance = sum(sortedEigValues);
proportionFirstPC = sortedEigValues(1) / totalVariance;
fprintf('Population Principal Components:\n');
fprintf('Y1 = %.4f X1 + %.4f X2\n', sortedEigVectors(1,1), sortedEigVectors(2,1));
fprintf('Y2 = %.4f X1 + %.4f X2\n', sortedEigVectors(1,2), sortedEigVectors(2,2));
fprintf('\nProportion of total population variance explained by first principal component: %.2f%%\n'
Output
Population Principal Components:
Y1 = -0.8944 X1 + -0.4472 X2
Y2 = 0.4472 X1 + -0.8944 X2
Proportion of total population variance explained by first principal component: 85.71%
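The numerical result can be verified by hand for this 2×2 covariance matrix. The derivation below is a verification sketch. Eigenvector signs are arbitrary, so the first eigenvector here is the negative of the one MATLAB printed.

```latex
\det(\Sigma - \lambda I) = (5-\lambda)(2-\lambda) - 4
  = \lambda^{2} - 7\lambda + 6 = (\lambda - 6)(\lambda - 1) = 0
  \;\Rightarrow\; \lambda_1 = 6,\ \lambda_2 = 1

e_1 = \tfrac{1}{\sqrt{5}}\,(2,\ 1)^{\top} \approx (0.8944,\ 0.4472)^{\top},
\qquad
e_2 = \tfrac{1}{\sqrt{5}}\,(1,\ -2)^{\top} \approx (0.4472,\ -0.8944)^{\top}

\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{6}{7} \approx 0.8571
```

The last line matches the 85.71% reported in the output.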
Question 6
MATLAB Code
clc; clear all;
stock_data = readtable('Stock Market Data.xlsx');
X = stock_data{:, 2:end};
stock_names = stock_data.Properties.VariableNames(2:end);
[n, p] = size(X);
%% Part a: Covariance Matrix and Principal Components
S = cov(X);
disp('Part a: Sample Covariance Matrix');
disp(S);
[eigenvectors, eigenvalues_diag] = eig(S);
eigenvalues = diag(eigenvalues_diag);
[eigenvalues_sorted, idx] = sort(eigenvalues, 'descend');
eigenvectors_sorted = eigenvectors(:, idx);
disp('Principal Components (Eigenvectors):');
pc_table = array2table(eigenvectors_sorted, 'VariableNames', ...
    strcat('PC', cellstr(num2str((1:p)'))));
pc_table.Properties.RowNames = stock_names;
disp(pc_table);
%% Part b: Proportion of Variance Explained
variance_explained = eigenvalues_sorted / sum(eigenvalues_sorted);
cumulative_variance = cumsum(variance_explained);
disp('Part b: Proportion of Variance Explained');
for i = 1:p
    fprintf('PC%d: Eigenvalue = %.4f, Variance Explained = %.2f%%, Cumulative = %.2f%%\n', ...
        i, eigenvalues_sorted(i), variance_explained(i)*100, cumulative_variance(i)*100);
end
first_three_proportion = sum(variance_explained(1:3));
fprintf('Proportion of total variance explained by first three components: %.2f%%\n', ...
    first_three_proportion*100);
Output
Part a: Sample Covariance Matrix
1.0e-03 *
0.4333 0.2757 0.1590 0.0641 0.0890
0.2757 0.4387 0.1800 0.1815 0.1233
0.1590 0.1800 0.2240 0.0734 0.0605
0.0641 0.1815 0.0734 0.7225 0.5083
0.0890 0.1233 0.0605 0.5083 0.7657
Principal Components (Eigenvectors):
PC1 PC2 PC3 PC4 PC5
_______ ________ ________ ________ ________
JPMorgan 0.22282 0.62523 -0.32611 0.66276 -0.11766
Citibank 0.30729 0.57039 0.24959 -0.41409 0.58861
WellsFargo 0.15481 0.3445 0.037639 -0.49705 -0.7803
RoyalDutchShell 0.63897 -0.24795 0.6425 0.30887 -0.14846
ExxonMobil 0.6509 -0.32185 -0.64586 -0.21638 0.093718
Part b: Proportion of Variance Explained
PC1: Eigenvalue = 0.0014, Variance Explained = 52.93%, Cumulative = 52.93%
PC2: Eigenvalue = 0.0007, Variance Explained = 27.13%, Cumulative = 80.06%
PC3: Eigenvalue = 0.0003, Variance Explained = 9.82%, Cumulative = 89.88%
PC4: Eigenvalue = 0.0001, Variance Explained = 5.52%, Cumulative = 95.40%
PC5: Eigenvalue = 0.0001, Variance Explained = 4.60%, Cumulative = 100.00%
Proportion of total variance explained by first three components: 89.88%
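The proportions in Part b can be re-derived without the spreadsheet by starting from the covariance matrix printed in Part a. The sketch below does that. Because the printed matrix is rounded to four significant digits, the result should only come out close to the reported figures, not identical to them. Variable names are illustrative.

```matlab
% Sample covariance matrix as printed in the output above (rounded)
S = 1.0e-03 * [0.4333 0.2757 0.1590 0.0641 0.0890;
               0.2757 0.4387 0.1800 0.1815 0.1233;
               0.1590 0.1800 0.2240 0.0734 0.0605;
               0.0641 0.1815 0.0734 0.7225 0.5083;
               0.0890 0.1233 0.0605 0.5083 0.7657];

% Eigenvalues in decreasing order
lambda = sort(eig(S), 'descend');

% Proportion and cumulative proportion of total variance
prop = lambda / sum(lambda);
cumprop = cumsum(prop);

% Share of total variance carried by the first three components
first_three = sum(prop(1:3));
fprintf('First three components: %.2f%% of total variance\n', 100 * first_three);
```

The total variance equals the trace of S, so the eigenvalues can also be sanity-checked against the sum of the diagonal entries.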