g A simple procedure would be to
  n Divide the feature space into uniform bins
  n Compute the ratio of examples for each class at each bin and,
  n For a new example, find its bin and choose the predominant class in that bin (see the code sketch below)
g We decide to start with one feature and divide the real line into 3 bins
g Notice that there exists a lot of overlap between classes ⇒ to improve discrimination, we decide to incorporate a second feature

[Figure: the real line x1 divided into 3 bins, and the (x1, x2) feature space divided into uniform bins]
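A minimal NumPy sketch of this bin-based classifier; the function name, the 3-bin default, and the use of the observed feature ranges for the bin edges are illustrative assumptions, not from the original slides:

```python
import numpy as np

def fit_bin_classifier(X, y, n_bins=3):
    """Divide each feature axis into uniform bins and record the
    predominant class of the training examples that fall in each bin."""
    # Uniform bin edges per feature, spanning the observed range
    edges = [np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
             for j in range(X.shape[1])]

    def bin_index(x):
        # Map an example to a tuple of per-feature bin indices
        return tuple(int(np.clip(np.searchsorted(e, v, side='right') - 1,
                                 0, n_bins - 1))
                     for v, e in zip(x, edges))

    votes = {}
    for x, label in zip(X, y):
        votes.setdefault(bin_index(x), []).append(label)
    majority = {b: max(set(v), key=v.count) for b, v in votes.items()}
    # Returns None for a bin that received no training examples
    return lambda x: majority.get(bin_index(x))

# Usage: clf = fit_bin_classifier(X_train, y_train); clf(x_new)
```

Note that the total number of bins grows exponentially with the number of features, so the number of examples needed to keep every bin populated grows exponentially as well; this is precisely the curse of dimensionality.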
g In many cases the information that was lost by discarding some features is compensated by a more accurate mapping in lower-dimensional space

[Figure: classifier performance vs. dimensionality]
g How do we beat the curse of dimensionality?
n By incorporating prior knowledge
n By providing increasing smoothness of the target function
n By reducing the dimensionality
Linear feature extraction: the extracted features y are a linear transformation of the inputs x, y = Wx

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}
=
\begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1N} \\
w_{21} & w_{22} & \cdots & w_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
w_{M1} & w_{M2} & \cdots & w_{MN}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}
$$
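As a minimal illustration, the mapping is a single matrix-vector product; the particular W below is an arbitrary 2×3 example, not taken from the slides:

```python
import numpy as np

# Extract M=2 features from N=3 inputs: y = W x
W = np.array([[0.5, 0.5, 0.0],   # y1 mixes x1 and x2
              [0.0, 0.3, 0.7]])  # y2 mixes x2 and x3
x = np.array([1.0, 2.0, 3.0])    # one example with N=3 features
y = W @ x                        # extracted feature vector, shape (2,)
```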
g Within the realm of linear feature extraction, two techniques are commonly used
  n Principal Components (PCA)
    g Based on signal representation
  n Fisher's Linear Discriminant (LDA)
    g Based on classification

[Figure: two-class data in the (Feature 1, Feature 2) plane showing the signal-representation (PCA) and classification (LDA) projection directions]
The optimal* approximation of a random vector x ∈ ℜ^N by a linear combination of M (M < N) independent vectors is obtained by projecting the random vector x onto the eigenvectors v_i corresponding to the largest eigenvalues λ_i of the covariance matrix of x (Σ_x).

*Optimality defined as the minimum of the mean-squared reconstruction error:

$$
v = \arg\min E\left[ \left\| x' - x \right\|^2 \right]
$$
$$
x' = y_1 v_1 + y_2 v_2 + \cdots + y_M v_M
$$

where

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}
=
\begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_M^T \end{bmatrix} x
=
\begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1N} \\
v_{21} & v_{22} & \cdots & v_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
v_{M1} & v_{M2} & \cdots & v_{MN}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}
$$
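A minimal NumPy sketch of the recipe above (center the data, eigendecompose the covariance matrix, keep the top-M eigenvectors, project); the function and variable names are illustrative:

```python
import numpy as np

def pca(X, M):
    """Project the rows of X (one example per row) onto the M
    eigenvectors of the covariance matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # estimate Sigma_x
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :M]              # top-M eigenvectors v_1..v_M
    Y = Xc @ V                               # y_i = v_i^T x for each example
    return Y, V

# Reconstruction x' = y_1 v_1 + ... + y_M v_M (adding back the mean):
# X_prime = Y @ V.T + X.mean(axis=0)
```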
n Of all the possible lines we would like to select the one that maximizes the separability of the classes

[Figure: two-class data in the (x1, x2) plane and candidate projection lines]
[Figure: two classes with means µ1, µ2 in the (x1, x2) plane; after projection the means and standard deviations are µ1′, µ2′ and σ1′, σ2′]

The projection is chosen to

$$
\text{maximize} \quad \frac{(\mu_1' - \mu_2')^2}{{\sigma_1'}^2 + {\sigma_2'}^2}
$$
For C classes, the generalization seeks the projection matrix

$$
V = \left[\, v_1 \,\middle|\, v_2 \,\middle|\, \cdots \,\middle|\, v_{C-1} \,\right]
= \arg\max \frac{\left| V^T S_B V \right|}{\left| V^T S_W V \right|}
\;\Rightarrow\; \left( S_B - \lambda_i S_W \right) v_i = 0
$$

with the between-class scatter matrix

$$
S_B = \sum_{i=1}^{C} P_i \,(\mu_i - \mu)(\mu_i - \mu)^T
$$

where

$$
\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x
\qquad \text{and} \qquad
\mu = \frac{1}{N} \sum_{\forall x} x
$$

and S_W is the within-class scatter matrix.
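A sketch of this multi-class solution via the generalized eigenproblem (S_B − λ_i S_W)v_i = 0, assuming SciPy is available and S_W is nonsingular; the function name and defaults are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def lda(X, y, n_components=None):
    """Discriminant directions from the generalized eigenproblem
    S_B v = lambda S_W v."""
    classes = np.unique(y)
    N, dim = X.shape
    mu = X.mean(axis=0)                    # overall mean
    S_B = np.zeros((dim, dim))             # between-class scatter
    S_W = np.zeros((dim, dim))             # within-class scatter
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        d = (mu_c - mu).reshape(-1, 1)
        S_B += (len(Xc) / N) * (d @ d.T)   # weighted by class prior P_i
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
    # Generalized symmetric eigenproblem; eigh returns ascending eigenvalues
    eigvals, V = eigh(S_B, S_W)
    M = n_components if n_components else len(classes) - 1
    return V[:, ::-1][:, :M]               # eigenvectors of largest lambda_i
```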
[Figure: PCA projection of a ten-class dataset onto the first three principal axes (axis 1, axis 2, axis 3)]
[Figure: LDA projection of the same ten-class dataset onto the first three discriminant axes (axis 1, axis 2, axis 3)]
[Figure: two-class example (ω1, ω2) comparing the PCA and LDA projection directions]
Limitations of LDA
g LDA has a tendency to overfit training data
  n To illustrate this problem, we generate an artificial dataset (see the sketch below)
    g Three classes, 50 examples per class, with the exact same likelihood: a multivariate Gaussian with zero mean and identity covariance
    g As we arbitrarily increase the number of dimensions, classes appear to separate better, even though they come from the same distribution
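A sketch of the experiment, assuming scikit-learn (not used in the original slides) for the LDA fit; since all three classes are identically distributed, training accuracy should be near chance (1/3), yet it rises with dimensionality:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
for dim in (10, 50):
    # Three classes, 50 examples each, all drawn from the SAME N(0, I):
    # there is nothing real to discriminate
    X = rng.standard_normal((150, dim))
    y = np.repeat([0, 1, 2], 50)
    clf = LinearDiscriminantAnalysis().fit(X, y)
    # Chance level is 1/3; training accuracy climbs with dimensionality
    # because LDA fits the sampling noise
    print(f"{dim} dimensions: training accuracy {clf.score(X, y):.2f}")
```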
[Figure: LDA projections of the artificial dataset in 10 dimensions (left) and 50 dimensions (right); the three classes appear better separated in 50 dimensions despite identical distributions]