
Chapter 7.

Orthogonality and least squares

C.O.S. Sorzano

Biomedical Engineering

December 3, 2013

7. Orthogonality and least squares December 3, 2013 1 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 2 / 119


References

D. Lay. Linear algebra and its applications (3rd ed). Pearson (2006). Chapter 6.

7. Orthogonality and least squares December 3, 2013 3 / 119


A little bit of history

Least squares was first used to solve problems in geodesy (Adrien-Marie
Legendre, 1805) and astronomy (Carl Friedrich Gauss, 1809). Gauss made the
connection of this method to the distribution of measurement errors. Currently it
is one of the best understood and most widely used methods.

7. Orthogonality and least squares December 3, 2013 4 / 119


Applications

In this example, least squares is used to plan radiation therapy.

Bedford, J. L. Sinogram analysis of aperture optimization by iterative least-squares in volumetric modulated arc therapy. Physics in Medicine and Biology,

2013, 58, 1235-1250

7. Orthogonality and least squares December 3, 2013 5 / 119


Applications

Traditionally, control applications were formulated in a least-squares setup.

Currently, they use more sophisticated goal functions that can be regarded as
evolved versions of least squares.

7. Orthogonality and least squares December 3, 2013 6 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 7 / 119


Inner product

Definition 1.1 (Inner product or dot product)


Let u, v ∈ Rn be two vectors. The inner product or dot product between these
two vectors is defined as

u · v = ⟨u, v⟩ ≜ ∑_{i=1}^{n} ui vi = u1 v1 + u2 v2 + ... + un vn

Theorem 1.1
If we consider u and v to be column vectors (∈ Mn×1 ), then
u · v = uᵀv

Example
Let u = (2, −5, −1) and v = (3, 2, −3).
u · v = 2 · 3 + (−5) · 2 + (−1) · (−3) = 6 − 10 + 3 = −1
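A quick numerical check of this example in MATLAB (the same environment used for the regression examples later in these slides):

u = [2; -5; -1];
v = [3; 2; -3];
dot(u, v)   % built-in dot product: returns -1
u' * v      % the same value computed as u'v (Theorem 1.1)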

7. Orthogonality and least squares December 3, 2013 8 / 119


Inner product

Theorem 1.2
For any three vectors u, v, w ∈ Rn and any scalar r ∈ R it is verified that
1 u·v=v·u
2 (u + v) · w = u · w + v · w
3 (r u) · v = r (u · v) = u · (r v)
4 u·u≥0
5 u·u=0⇔u=0

Corollary
(r1 u1 + r2 u2 + ... + rp up ) · v = r1 (u1 · v) + r2 (u2 · v) + ... + rp (up · v)

7. Orthogonality and least squares December 3, 2013 9 / 119


Length

Definition 1.2 (Length of a vector)


Given any vector v, its length is defined as

‖v‖ ≜ √(v · v)

Theorem 1.3
Given any vector v ∈ Rn

‖v‖ = √(v1² + v2² + ... + vn²)

Example
The length of v = (1, −2, 2, 0) is

‖v‖ = √(1² + (−2)² + 2² + 0²) = 3

7. Orthogonality and least squares December 3, 2013 10 / 119


Length

Theorem 1.4
For any vector v and any scalar r it is verified that
‖r v‖ = |r| ‖v‖

Proof
It will be given only for v ∈ Rn:

‖r v‖ = √((r v1)² + (r v2)² + ... + (r vn)²) = √(r²(v1² + v2² + ... + vn²))
      = |r| √(v1² + v2² + ... + vn²) = |r| ‖v‖ (q.e.d.)

Example (continued)
Find a vector of unit length that has the same direction as v = (1, −2, 2, 0).
Solution

uv = v/‖v‖ = (1/3, −2/3, 2/3, 0) ⇒ ‖uv‖ = √(1/9 + 4/9 + 4/9 + 0) = 1
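The length and normalization computations above can be reproduced numerically:

v = [1; -2; 2; 0];
norm(v)            % sqrt(1 + 4 + 4 + 0) = 3
uv = v / norm(v)   % unit vector with the same direction as v
norm(uv)           % 1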

7. Orthogonality and least squares December 3, 2013 11 / 119


Distance

Definition 1.3 (Distance in R)


The distance between any two numbers a, b ∈ R can be defined as
d(a, b) = |a − b|

Example
Calculate the distance between 2 and 8 as well as between −3 and 4.
Solution
d(2, 8) = |2 − 8| = 6 and d(−3, 4) = |−3 − 4| = 7

7. Orthogonality and least squares December 3, 2013 12 / 119


Distance

Definition 1.4 (Distance in Rn )


The distance between any two vectors u, v ∈ Rn can be defined as
d(u, v) = ku − vk

Example
Calculate the distance between u = (7, 1) and v = (3, 2)
d(u, v) = ‖(7, 1) − (3, 2)‖ = ‖(4, −1)‖ = √(4² + (−1)²) = √17

7. Orthogonality and least squares December 3, 2013 13 / 119


Distance

Example
For any two vectors u and v in R³, the distance can be calculated through

d(u, v) = ‖u − v‖ = ‖(u1 − v1 , u2 − v2 , u3 − v3 )‖ = √((u1 − v1)² + (u2 − v2)² + (u3 − v3)²)

7. Orthogonality and least squares December 3, 2013 14 / 119


Orthogonality

Example
Any two vectors in R², u and v, are orthogonal if d(u, v) = d(u, −v)

d²(u, v) = ‖u − v‖² = (u − v) · (u − v) = u · u + v · v − 2u · v = ‖u‖² + ‖v‖² − 2u · v

d²(u, −v) = ‖u + v‖² = (u + v) · (u + v) = u · u + v · v + 2u · v = ‖u‖² + ‖v‖² + 2u · v

d²(u, v) = d²(u, −v) ⇒ −2u · v = 2u · v ⇒ u · v = 0

7. Orthogonality and least squares December 3, 2013 15 / 119


Orthogonality

Definition 1.5 (Orthogonality between two vectors)


Any two vectors, u and v, in a vector space V are orthogonal iff
u·v=0

Corollary
0 is orthogonal to any other vector.

Theorem 1.5 (Pythagorean theorem)


Any two vectors, u and v, in a vector space V are orthogonal iff
‖u + v‖² = ‖u‖² + ‖v‖²

7. Orthogonality and least squares December 3, 2013 16 / 119


Orthogonality

Definition 1.6 (Orthogonality between vector and vector space)


Let u be a vector in a vector space V and W a vector subspace of V . u is
orthogonal to W if u is orthogonal to all vectors in W . The set of all vectors
orthogonal to W is denoted as W ⊥ (the orthogonal complement of W ).

Example
Let W be a plane in R3 passing through the origin and L be a line, passing
through the origin and perpendicular to W . For any vector w ∈ W and any vector
z ∈ L we have

w·z=0

Therefore,
L = W ⊥ ⇔ W = L⊥

7. Orthogonality and least squares December 3, 2013 17 / 119


Orthogonality
Theorem 1.6
Let W be a vector subspace of a vector space V .
1 x ∈ W ⊥ iff x is orthogonal to every vector in a set that spans W .
2 W ⊥ is a vector subspace of V .

Theorem 1.7

Let A ∈ Mm×n , then


1 (Row{A})⊥ = Nul{A}
2 (Col{A})⊥ = Nul{AT }

7. Orthogonality and least squares December 3, 2013 18 / 119


Orthogonality

Proof Nul{A} ⊆ (Row{A})⊥

Consider the rows of A, ai (i = 1, 2, ..., m), as column vectors. Then, for any vector
x ∈ Nul{A} we know

Ax = 0 ⇒ [a1ᵀ; a2ᵀ; ...; amᵀ] x = [a1ᵀx; a2ᵀx; ...; amᵀx] = [a1 · x; a2 · x; ...; am · x] = [0; 0; ...; 0]

Consequently, x is orthogonal to all the rows of A, which span Row{A}, and by
the previous theorem, x ∈ (Row{A})⊥.

Proof Nul{A} ⊇ (Row{A})⊥
Conversely, let x ∈ (Row{A})⊥. Then, by the previous theorem, we know that

ai · x = 0 for i = 1, 2, ..., m ⇒ Ax = 0

So, x ∈ Nul{A}
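A small numerical illustration of (Row{A})⊥ = Nul{A}; the matrix A below is an arbitrary example chosen to have a non-trivial null space:

A = [1 2 3; 2 4 6];   % rank-1 matrix, so Nul{A} has dimension 2
N = null(A);          % columns of N form an orthonormal basis of Nul{A}
A * N                 % numerically zero: every row of A is orthogonal to Nul{A}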

7. Orthogonality and least squares December 3, 2013 19 / 119


Orthogonality

Proof (Col{A})⊥ = Nul{AT }


Let’s define B = AT . By the first part of this theorem, we know
(Row{B})⊥ = Nul{B} ⇒ (Row{AT })⊥ = Nul{AT } ⇒ (Col{A})⊥ = Nul{AT }

Theorem 1.8
For any two vectors u and v in a vector space V , the angle between the two can
be measured through the dot product:

u · v = ‖u‖ ‖v‖ cos θ

7. Orthogonality and least squares December 3, 2013 20 / 119


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 1:
6.1.15
6.1.22
6.1.24
6.1.26
6.1.28
6.1.30
6.1.32 (computer)

7. Orthogonality and least squares December 3, 2013 21 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 22 / 119


Orthogonal sets

Definition 2.1 (Orthogonal set)


Let S = {u1 , u2 , ..., up } be a set of vectors. S is an orthogonal set iff

ui · uj = 0 ∀i, j ∈ {1, 2, ..., p}, i ≠ j

Example
Let u1 = (3, 1, 1), u2 = (−1, 2, 1), u3 = (−1/2, −2, 7/2). Check whether the set
S = {u1 , u2 , u3 } is orthogonal.
Solution

u1 · u2 = 3 · (−1) + 1 · 2 + 1 · 1 = 0
u1 · u3 = 3 · (−1/2) + 1 · (−2) + 1 · (7/2) = 0
u2 · u3 = (−1) · (−1/2) + 2 · (−2) + 1 · (7/2) = 0

7. Orthogonality and least squares December 3, 2013 23 / 119


Orthogonal sets

Theorem 2.1
If S is an orthogonal set of non-null vectors, then S is linearly independent and,
consequently, it is a basis of the subspace spanned by S.
Proof
Let ui (i = 1, 2, ..., p) be the elements of S. Let us assume that S is linearly
dependent. Then, there exist coefficients c1 , c2 , ..., cp , not all of them null, such
that
0 = c1 u1 + c2 u2 + ... + cp up
Now, we compute the inner product of both sides with u1
0 = (c1 u1 + c2 u2 + ... + cp up ) · u1
0 = c1 (u1 · u1 ) + c2 (u2 · u1 ) + ... + cp (up · u1 ) = c1 ‖u1‖² ⇒ c1 = 0
Taking the inner product with ui (i = 2, 3, ..., p) we can show that all ci ’s are 0, which
contradicts the assumption. Therefore, the set S is linearly independent.

7. Orthogonality and least squares December 3, 2013 24 / 119


Orthogonal basis
Definition 2.2 (Orthogonal basis)
A set of vectors B is an orthogonal basis of a vector space V if it is an
orthogonal set and it is a basis of V .

Theorem 2.2
Let {u1 , u2 , ..., up } be an orthogonal basis for a vector space V . For each x ∈ V
we have

x = (x · u1 / ‖u1‖²) u1 + (x · u2 / ‖u2‖²) u2 + ... + (x · up / ‖up‖²) up

Proof
If x is in V , then it can be expressed as a linear combination of the vectors in a
basis of V
x = c1 u1 + c2 u2 + ... + cp up

Now, we calculate the dot product with u1

x · u1 = (c1 u1 + c2 u2 + ... + cp up ) · u1 = c1 ‖u1‖² ⇒ c1 = x · u1 / ‖u1‖²
7. Orthogonality and least squares December 3, 2013 25 / 119
Orthogonal basis

Example
Let u1 = (3, 1, 1), u2 = (−1, 2, 1), u3 = (−1/2, −2, 7/2), and B = {u1 , u2 , u3 } be an
orthogonal basis of R3 . Let x = (6, 1, −8). The coordinates of x in B are given by

x · u1 = 11      x · u2 = −12     x · u3 = −33
‖u1‖² = 11       ‖u2‖² = 6        ‖u3‖² = 33/2

x = (11/11) u1 + (−12/6) u2 + (−33/(33/2)) u3 = u1 − 2u2 − 2u3

The coordinates of x in the basis B are
[x]B = (1, −2, −2)
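The coordinate computation of this example can be checked in MATLAB:

u1 = [3; 1; 1]; u2 = [-1; 2; 1]; u3 = [-1/2; -2; 7/2];
x  = [6; 1; -8];
c1 = dot(x, u1) / norm(u1)^2;   %  11 / 11     = 1
c2 = dot(x, u2) / norm(u2)^2;   % -12 / 6      = -2
c3 = dot(x, u3) / norm(u3)^2;   % -33 / (33/2) = -2
[c1 c2 c3]                      % coordinates [x]_B
c1*u1 + c2*u2 + c3*u3           % recovers x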

7. Orthogonality and least squares December 3, 2013 26 / 119


Orthogonal projections

Orthogonal projection onto a vector


Consider a vector y and another vector u. Let us assume we want to decompose y as
the sum of two orthogonal vectors ŷ (along the direction of u) and another vector
z (orthogonal to u):

y = ŷ + z = αu + z ⇒ z = y − ŷ

We need to find the α that makes u and z orthogonal.

0 = z · u = (y − αu) · u = y · u − α‖u‖² ⇒ α = y · u / ‖u‖²

ŷ is the orthogonal projection of y onto u.

7. Orthogonality and least squares December 3, 2013 27 / 119


Orthogonal projections

Example
Let y = (7, 6) and u = (4, 2). Then,

y · u = 40, ‖u‖² = 20 ⇒ ŷ = (y · u / ‖u‖²) u = (40/20) u = 2u = (8, 4)

z = y − ŷ = (7, 6) − (8, 4) = (−1, 2)

d(y, ŷ) = ‖y − ŷ‖ = ‖z‖ = √((−1)² + 2²) = √5

7. Orthogonality and least squares December 3, 2013 28 / 119
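The projection in the previous example, computed numerically:

y = [7; 6]; u = [4; 2];
alpha = dot(y, u) / norm(u)^2;   % 40/20 = 2
yhat  = alpha * u;               % orthogonal projection of y onto u: (8, 4)
z     = y - yhat;                % component orthogonal to u: (-1, 2)
dot(z, u)                        % 0, as expected
norm(z)                          % sqrt(5): distance from y to the line spanned by u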


Orthonormal set

Definition 2.3 (Orthonormal set)


{u1 , u2 , ..., up } is an orthonormal set if it is an orthogonal set and all ui vectors
have unit length.

Example
Show that the set {u1 , u2 , u3 } is orthonormal, with

u1 = (1/√11)(3, 1, 1)    u2 = (1/√6)(−1, 2, 1)    u3 = (1/√66)(−1, −4, 7)

Solution
Let’s check that they are orthogonal:
u1 · u2 = (1/√11)(1/√6)(3 · (−1) + 1 · 2 + 1 · 1) = 0
u1 · u3 = (1/√11)(1/√66)(3 · (−1) + 1 · (−4) + 1 · 7) = 0
u2 · u3 = (1/√6)(1/√66)((−1) · (−1) + 2 · (−4) + 1 · 7) = 0

7. Orthogonality and least squares December 3, 2013 29 / 119


Orthonormal set

Example (continued)
Now, let’s check that they have unit length:

‖u1‖ = √((1/11)(3² + 1² + 1²)) = √((9 + 1 + 1)/11) = 1
‖u2‖ = √((1/6)((−1)² + 2² + 1²)) = √((1 + 4 + 1)/6) = 1
‖u3‖ = √((1/66)((−1)² + (−4)² + 7²)) = √((1 + 16 + 49)/66) = 1

Theorem 2.3
If S = {u1 , u2 , ..., un } is an orthonormal set, then it is an orthonormal basis of
Span{S}.

Example
{e1 , e2 , ..., en } is an orthonormal basis of Rn .
7. Orthogonality and least squares December 3, 2013 30 / 119
Orthonormal basis

Theorem 2.4
Let S = {u1 , u2 , ..., un } be an orthogonal set of vectors. Then the set
S′ = {u′1 , u′2 , ..., u′n } where

u′i = ui / ‖ui‖

is an orthonormal set (this operation is called vector normalization).

Proof
Let’s check that the u′i vectors are orthogonal:

u′i · u′j = (ui / ‖ui‖) · (uj / ‖uj‖) = (1 / (‖ui‖ ‖uj‖)) ui · uj

But this product is obviously 0 because the ui vectors are orthogonal. Let’s check
now that the u′i vectors have unit length:

‖u′i‖ = ‖ui /‖ui‖‖ = ‖ui‖/‖ui‖ = 1

7. Orthogonality and least squares December 3, 2013 31 / 119


Orthonormal matrix
Theorem 2.5
Let U ∈ Mm×n . The columns of U form an orthonormal set iff
Uᵀ U = In
It is said that U is an orthonormal matrix.
Proof
Let’s consider the columns of U

U = (u1 u2 ... un )

Let’s calculate now Uᵀ U

Uᵀ U = [ u1ᵀ ]                  [ u1ᵀu1  u1ᵀu2  ...  u1ᵀun ]
       [ u2ᵀ ] (u1 u2 ... un) = [ u2ᵀu1  u2ᵀu2  ...  u2ᵀun ]
       [ ... ]                  [  ...    ...   ...   ...  ]
       [ unᵀ ]                  [ unᵀu1  unᵀu2  ...  unᵀun ]

The condition Uᵀ U = In simply states that uiᵀuj = 0 if i ≠ j and uiᵀuj = 1 if i = j, which is the
definition of an orthonormal set.
7. Orthogonality and least squares December 3, 2013 32 / 119
Orthonormal matrix

Theorem 2.6
Let U ∈ Mm×n be an orthonormal matrix (orthonormal columns) and let x, y ∈ Rn . Then
1 ‖Ux‖ = ‖x‖
2 (Ux) · (Uy) = x · y
3 (Ux) · (Uy) = 0 ⇔ x · y = 0

Example
Let

U = [ 1/√2   2/3 ]        x = ( √2 )
    [ 1/√2  −2/3 ]            (  3 )
    [   0    1/3 ]

U is an orthonormal matrix because

Uᵀ U = [ 1/√2  1/√2   0  ] [ 1/√2   2/3 ]   [ 1 0 ]
       [ 2/3  −2/3   1/3 ] [ 1/√2  −2/3 ] = [ 0 1 ]
                           [   0    1/3 ]

7. Orthogonality and least squares December 3, 2013 33 / 119


Orthonormal matrix
Example (continued)
Let’s calculate now Ux

Ux = [ 1/√2   2/3 ] ( √2 )   (  3 )
     [ 1/√2  −2/3 ] (  3 ) = ( −1 )
     [   0    1/3 ]          (  1 )

Let’s check now that ‖Ux‖ = ‖x‖

‖Ux‖ = ‖(3, −1, 1)‖ = √(3² + (−1)² + 1²) = √11
‖x‖ = ‖(√2, 3)‖ = √((√2)² + 3²) = √11

Theorem 2.7
Let U be an orthonormal and square matrix. Then,
1 U⁻¹ = Uᵀ
2 Uᵀ is also an orthonormal matrix (i.e., the rows of U also form an
orthonormal set of vectors).
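These properties are easy to verify numerically; the sketch below uses a 2 × 2 rotation matrix as an example of a square orthonormal matrix:

theta = pi/6;
U = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % rotation matrix
U' * U                   % identity: orthonormal columns
norm(inv(U) - U')        % numerically zero: U^(-1) = U^T
x = [sqrt(2); 3];
norm(U*x) - norm(x)      % numerically zero: lengths are preserved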
7. Orthogonality and least squares December 3, 2013 34 / 119
Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 2:
6.2.1
6.2.10
6.2.15
6.2.25
6.2.26
6.2.29
6.2.35 (computer)

7. Orthogonality and least squares December 3, 2013 35 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 36 / 119


Orthogonal projections

Definition 3.1 (Orthogonal projection)


The orthogonal projection of a point y onto a vector subspace W is a point
ŷ ∈ W such that

z = y − ŷ
z⊥W

7. Orthogonality and least squares December 3, 2013 37 / 119


Orthogonal projections

Example
Let {u1 , u2 , ..., u5 } be an orthogonal basis of R5 . Consider the subspace
W = Span{u1 , u2 }. Given any vector y ∈ R5 , we can decompose it as the sum of
a vector in W and a vector perpendicular to W
y = ŷ + z
Solution
If {u1 , u2 , ..., u5 } is a basis of R5 , then any vector y ∈ R5 can be written as

y = c1 u1 + c2 u2 + ... + c5 u5
We may decompose this sum as
ŷ = c1 u1 + c2 u2
z = c3 u3 + c4 u4 + c5 u5

7. Orthogonality and least squares December 3, 2013 38 / 119


Orthogonal projections

Example (continued)
It is obvious that ŷ ∈ W . Now we need to show that z ∈ W ⊥ . For doing so, we
will show that
z · u1 = 0
z · u2 = 0
To show the first equation we note that
z · u1 = (c3 u3 + c4 u4 + c5 u5 ) · u1
= c3 (u3 · u1 ) + c4 (u4 · u1 ) + c5 (u5 · u1 )
= c3 · 0 + c4 · 0 + c5 · 0
= 0
We would proceed analogously for z · u2 = 0.

7. Orthogonality and least squares December 3, 2013 39 / 119


Orthogonal projections

Theorem 3.1 (Orthogonal Decomposition Theorem)


Let W be a vector subspace of a vector space V . Then, any vector y ∈ V can be
written uniquely as
y = ŷ + z
with ŷ ∈ W and z ∈ W ⊥ . In fact, if {u1 , u2 , ..., up } is an orthogonal basis of W ,
then

ŷ = (y · u1 / ‖u1‖²) u1 + (y · u2 / ‖u2‖²) u2 + ... + (y · up / ‖up‖²) up

7. Orthogonality and least squares December 3, 2013 40 / 119


Orthogonal projections

Proof
ŷ is obviously in W since it has been written as a linear combination of vectors in
a basis of W . z is perpendicular to W because

z · u1 = (y − ((y · u1 / ‖u1‖²) u1 + (y · u2 / ‖u2‖²) u2 + ... + (y · up / ‖up‖²) up)) · u1
       = y · u1 − (y · u1 / ‖u1‖²)(u1 · u1 ) − (y · u2 / ‖u2‖²)(u2 · u1 ) − ... − (y · up / ‖up‖²)(up · u1 )
         [{ui } is an orthogonal set]
       = y · u1 − (y · u1 / ‖u1‖²)(u1 · u1 )
       = y · u1 − (y · u1 / ‖u1‖²) ‖u1‖²
       = y · u1 − y · u1
       = 0
We could proceed analogously for all elements in the basis of W .

7. Orthogonality and least squares December 3, 2013 41 / 119


Orthogonal projections
We need to show now that the decomposition is unique. Let us assume that it is
not unique. Consequently, there exist different vectors such that
y = ŷ + z
y = ŷ′ + z′

We subtract both equations

0 = (ŷ − ŷ′) + (z − z′) ⇒ ŷ − ŷ′ = z′ − z

Let v = ŷ − ŷ′. It is obvious that v ∈ W because it is written as a linear
combination of vectors in W . On the other hand, v = z′ − z, i.e., it is a linear
combination of vectors in W ⊥ , so v ∈ W ⊥ . The only vector that belongs to W
and W ⊥ at the same time is 0, so

v = 0 ⇒ ŷ = ŷ′ and z = z′

and consequently, the orthogonal decomposition is unique. Additionally, this shows
that the decomposition depends only on W and not on the particular basis chosen
for W .
7. Orthogonality and least squares December 3, 2013 42 / 119
Orthogonal projections

Example
Let u1 = (2, 5, −1) and u2 = (−2, 1, 1). Let W be the subspace spanned by u1
and u2 . Let y = (1, 2, 3) ∈ R3 . The orthogonal projection of y onto W is

ŷ = (y · u1 / ‖u1‖²) u1 + (y · u2 / ‖u2‖²) u2
  = ((1 · 2 + 2 · 5 + 3 · (−1)) / (2² + 5² + (−1)²)) (2, 5, −1) + ((1 · (−2) + 2 · 1 + 3 · 1) / ((−2)² + 1² + 1²)) (−2, 1, 1)
  = (9/30) (2, 5, −1) + (15/30) (−2, 1, 1) = (−2/5, 2, 1/5)

z = y − ŷ = (1, 2, 3) − (−2/5, 2, 1/5) = (7/5, 0, 14/5)

7. Orthogonality and least squares December 3, 2013 43 / 119
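The projection of the previous example, computed numerically:

u1 = [2; 5; -1]; u2 = [-2; 1; 1];
y  = [1; 2; 3];
yhat = dot(y, u1)/norm(u1)^2 * u1 + dot(y, u2)/norm(u2)^2 * u2   % (-2/5, 2, 1/5)
z = y - yhat                                                     % (7/5, 0, 14/5)
[dot(z, u1) dot(z, u2)]                                          % both 0: z is orthogonal to W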


Orthogonal projections

Geometrical interpretation
ŷ can be understood as the sum of the orthogonal projection of y onto each one
of the elements of the basis of W .

Theorem 3.2
If y belongs to W , then the orthogonal projection of y onto W is itself:

ŷ = y
7. Orthogonality and least squares December 3, 2013 44 / 119
Properties of orthogonal projections
Theorem 3.3 (Best approximation theorem)
The orthogonal projection of y onto W is the point in W with minimum distance
to y, i.e.,
‖y − ŷ‖ ≤ ‖y − v‖

for all v ∈ W , v ≠ ŷ.


Proof
We know that y − ŷ is orthogonal to W . For any vector v ∈ W , v ≠ ŷ, we have
that ŷ − v is in W . Now consider the orthogonal decomposition of the vector y − v

y − v = (y − ŷ) + (ŷ − v)

7. Orthogonality and least squares December 3, 2013 45 / 119


Properties of orthogonal projections

Due to the orthogonal decomposition theorem (Theorem 3.1), this decomposition


is unique and due to the Pythagorean theorem (Theorem 1.5) we have
‖y − v‖² = ‖y − ŷ‖² + ‖ŷ − v‖²
Since v ≠ ŷ we have ‖ŷ − v‖² > 0 and consequently

‖y − v‖² > ‖y − ŷ‖²

7. Orthogonality and least squares December 3, 2013 46 / 119


Properties of orthogonal projections

Theorem 3.4
If {u1 , u2 , ..., up } is an orthonormal basis of W , then the orthogonal projection of
y onto W is

ŷ = ⟨y, u1⟩ u1 + ⟨y, u2⟩ u2 + ... + ⟨y, up⟩ up

If we construct the orthonormal matrix U = (u1 u2 ... up ), then
ŷ = U Uᵀ y
Proof
By Theorem 3.1 we know that for all orthogonal bases it is verified that

ŷ = (y · u1 / ‖u1‖²) u1 + (y · u2 / ‖u2‖²) u2 + ... + (y · up / ‖up‖²) up

Since the basis is in this case orthonormal, ‖ui‖ = 1 and consequently
ŷ = ⟨y, u1⟩ u1 + ⟨y, u2⟩ u2 + ... + ⟨y, up⟩ up

7. Orthogonality and least squares December 3, 2013 47 / 119


Properties of orthogonal projections

On the other hand we have

Uᵀ y = [ u1ᵀ ]     [ u1ᵀy ]   [ ⟨u1 , y⟩ ]
       [ u2ᵀ ] y = [ u2ᵀy ] = [ ⟨u2 , y⟩ ]
       [ ... ]     [  ... ]   [    ...   ]
       [ upᵀ ]     [ upᵀy ]   [ ⟨up , y⟩ ]

Then,

U Uᵀ y = (u1 u2 ... up ) [ ⟨u1 , y⟩ ; ⟨u2 , y⟩ ; ... ; ⟨up , y⟩ ] = ⟨y, u1⟩ u1 + ⟨y, u2⟩ u2 + ... + ⟨y, up⟩ up

(q.e.d.)

7. Orthogonality and least squares December 3, 2013 48 / 119


Properties of orthogonal projections

Corollary

Let U = (u1 u2 ... up ) be an n × p matrix with orthonormal columns and
W = Col{U} its column space. Then,
∀x ∈ Rp : Uᵀ U x = x (no effect)
∀y ∈ Rn : U Uᵀ y = ŷ (orthogonal projection of y onto W )
If U is n × n, then W = Rn and the projection has no effect:
∀y ∈ Rn : U Uᵀ y = ŷ = y

7. Orthogonality and least squares December 3, 2013 49 / 119


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 3:
6.3.1
6.3.7
6.3.15
6.3.23
6.3.24
6.3.25 (computer)

7. Orthogonality and least squares December 3, 2013 50 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 51 / 119


Gram-Schmidt orthogonalization
Gram-Schmidt orthogonalization is a procedure aimed at producing an orthogonal
basis of any subspace W .
Example
Let W = Span{x1 , x2 } with x1 = (3, 6, 0) and x2 = (1, 2, 2). Let’s look for an
orthogonal basis of W .
Solution
We may keep the first vector for the basis

v1 = x1 = (3, 6, 0)
For the second vector in the basis, we need to keep the component of x2 that is
orthogonal to x1 . For doing so we calculate the projection of x2 onto x1 ,
p = (x2 · x1 / ‖x1‖²) x1 = (15/45)(3, 6, 0) = (1, 2, 0), and we decompose x2 as

x2 = p + (x2 − p) = (1, 2, 0) + (0, 0, 2)

We then keep the orthogonal part of x2
v2 = x2 − p = (0, 0, 2)

7. Orthogonality and least squares December 3, 2013 52 / 119


Gram-Schmidt orthogonalization

Example (continued)
The set {v1 , v2 } is an orthogonal basis of W .

7. Orthogonality and least squares December 3, 2013 53 / 119


Gram-Schmidt orthogonalization

Example
Let W = Span{x1 , x2 , x3 } with x1 = (1, 1, 1, 1), x2 = (0, 1, 1, 1) and
x3 = (0, 0, 1, 1). Let’s look for an orthogonal basis of W .
Solution
We may keep the first vector for the basis. Then we construct a subspace (W1 )
with a single element in its basis
v1 = x1 = (1, 1, 1, 1)    W1 = Span{v1 }

For the second vector in the basis, we need to keep the component of x2 that is
orthogonal to W1 . With the already computed basis vectors, we construct a new
subspace (W2 ) with two elements in its basis
v2 = x2 − ProjW1 (x2 ) = (−3/4, 1/4, 1/4, 1/4)    W2 = Span{v1 , v2 }

For the third vector in the basis, we repeat the same procedure
v3 = x3 − ProjW2 (x3 ) = (0, −2/3, 1/3, 1/3)    W3 = Span{v1 , v2 , v3 }

7. Orthogonality and least squares December 3, 2013 54 / 119


Gram-Schmidt orthogonalization

Theorem 4.1 (Gram-Schmidt orthogonalization)


Given a basis {x1 , x2 , ..., xp } for a vector subspace W , define

v1 = x1 W1 = Span{v1 }
v2 = x2 − ProjW1 (x2 ) W2 = Span{v1 , v2 }
...
vp = xp − ProjWp−1 (xp ) Wp = Span{v1 , v2 , ..., vp } = W

Then {v1 , v2 , ..., vp } is an orthogonal basis of W .
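A minimal MATLAB sketch of this procedure (the function name is ours; the columns of X are assumed to be linearly independent):

function V = gram_schmidt(X)
    % Orthogonalize the columns of X following Theorem 4.1.
    [n, p] = size(X);
    V = zeros(n, p);
    for k = 1:p
        v = X(:, k);
        for j = 1:k-1
            % subtract the projection of x_k onto each previous v_j
            % (their sum is the projection onto W_{k-1}, since the v_j are orthogonal)
            v = v - (dot(X(:, k), V(:, j)) / norm(V(:, j))^2) * V(:, j);
        end
        V(:, k) = v;
    end
end

For instance, gram_schmidt([1 0 0; 1 1 0; 1 1 1; 1 1 1]) reproduces the orthogonal basis of Slide 54.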

7. Orthogonality and least squares December 3, 2013 55 / 119


Gram-Schmidt orthogonalization

Proof
Consider Wk = Span{v1 , v2 , ..., vk } and let us assume that {v1 , v2 , ..., vk } is a
basis of Wk . Now we construct
vk+1 = xk+1 − ProjWk (xk+1 ) Wk+1 = Span{v1 , v2 , ..., vk+1 }

By the orthogonal decomposition theorem (Theorem 3.1), we know that vk+1 is


orthogonal to Wk . Because xk+1 is an element of a basis, we know that
xk+1 ∉ Wk . Therefore, vk+1 is not null and xk+1 ∈ Wk+1 . Finally, the set
{v1 , v2 , ..., vk+1 } is a set of orthogonal, non-null vectors in the
(k + 1)-dimensional space Wk+1 . Consequently, by Theorem 9.4 in Chapter 5, it
must be a basis of Wk+1 . This process can be iterated till k = p.

7. Orthogonality and least squares December 3, 2013 56 / 119


Gram-Schmidt orthogonalization

Orthonormal basis
Once we have an orthogonal basis, we simply have to normalize each vector to
have an orthonormal basis.

Example
Let W = Span{x1 , x2 } with x1 = (3, 6, 0) and x2 = (1, 2, 2). Let’s look for an
orthonormal basis of W .
Solution
In Slide 52 we learned that an orthogonal basis was given by
v1 = (3, 6, 0)
v2 = (0, 0, 2)

Now, we normalize these two vectors to obtain an orthonormal basis

v′1 = v1 /‖v1‖ = (1/√45)(3, 6, 0) = (1/√5, 2/√5, 0)
v′2 = v2 /‖v2‖ = (1/2)(0, 0, 2) = (0, 0, 1)

7. Orthogonality and least squares December 3, 2013 57 / 119


QR factorization of matrices
If we apply Gram-Schmidt orthogonalization to the columns of a matrix, we obtain
the following factorization scheme. This factorization is used in practice to find
eigenvalues and eigenvectors as well as to solve linear equation systems.

Theorem 4.2 (QR factorization)


Let A ∈ Mm×n with linearly independent columns. Then, A can be factored as
A = QR
where Q ∈ Mm×n is a matrix whose columns form an orthonormal basis of
Col{A} and R ∈ Mn×n is an upper triangular invertible matrix with positive
entries on its diagonal.
Proof
Let’s orthogonalize the columns of A following the Gram-Schmidt procedure and
construct the orthonormal basis of Col{A}. Let {u1 , u2 , ..., un } be such a basis.
Let us construct the matrix

Q = u1 u2 ... un

7. Orthogonality and least squares December 3, 2013 58 / 119


QR factorization of matrices

Let ai (i = 1, 2, ..., n) denote the columns of A. By the Gram-Schmidt


orthogonalization, we know that for any k between 1 and n we have

Span{a1 , a2 , ..., ak } = Span{u1 , u2 , ..., uk }


Consequently, we can express each column of A in the orthonormal basis:
ak = r1k u1 + r2k u2 + ... + rkk uk + 0 · uk+1 + ... + 0 · un
If rkk is negative, we can multiply both rkk and uk by -1. We now collect all these
coefficients in a vector rk = (r1k , r2k , ..., rkk , 0, 0, ..., 0) to have
ak = Qrk
By gathering all these vectors in a matrix, we have the triangular matrix R

R = r1 r2 ... rn

R is invertible because the columns of A are linearly independent.

7. Orthogonality and least squares December 3, 2013 59 / 119


QR factorization of matrices
Example
Let’s calculate the QR factorization of

A = [ 1 0 0 ]
    [ 1 1 0 ]
    [ 1 1 1 ]
    [ 1 1 1 ]

From Slide 54 we know that the vectors
v1 = (1, 1, 1, 1)
v2 = (−3/4, 1/4, 1/4, 1/4)
v3 = (0, −2/3, 1/3, 1/3)

form an orthogonal basis of the column space of A. We now normalize these vectors
to obtain the orthonormal basis in Q

Q = [ 1/2  −3/√12    0    ]
    [ 1/2   1/√12  −2/√6  ]
    [ 1/2   1/√12   1/√6  ]
    [ 1/2   1/√12   1/√6  ]

7. Orthogonality and least squares December 3, 2013 60 / 119


QR factorization of matrices

Example (continued)
To find R we multiply on both sides of the factorization by Qᵀ

A = QR ⇒ Qᵀ A = Qᵀ QR = R

R = Qᵀ A = [   1/2     1/2    1/2    1/2 ] [ 1 0 0 ]   [ 2  3/2     1    ]
           [ −3/√12  1/√12  1/√12  1/√12 ] [ 1 1 0 ] = [ 0  3/√12  2/√12 ]
           [    0    −2/√6   1/√6   1/√6 ] [ 1 1 1 ]   [ 0   0     2/√6  ]
                                           [ 1 1 1 ]

7. Orthogonality and least squares December 3, 2013 61 / 119
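MATLAB's built-in qr can be used to cross-check this factorization; the economy-size call returns a Q whose columns form an orthonormal basis of Col{A} (possibly with different signs than above):

A = [1 0 0; 1 1 0; 1 1 1; 1 1 1];
[Q, R] = qr(A, 0);   % economy-size QR factorization
Q' * Q               % identity: orthonormal columns
norm(Q*R - A)        % numerically zero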


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 4:
6.4.7
6.4.13
6.4.19
6.4.22
6.4.24

7. Orthogonality and least squares December 3, 2013 62 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 63 / 119


Least squares
Let’s assume we want to solve the equation system Ax = b but, due to noise,
there is no solution. We may still look for a solution such that Ax ≈ b. In fact, the
goal will be to minimize d(Ax, b).

Definition 5.1 (Least squares solution)

Let A be an m × n matrix and b ∈ Rm . x̂ ∈ Rn is a least-squares solution of the
equation system Ax = b iff

∀x ∈ Rn : ‖b − Ax̂‖ ≤ ‖b − Ax‖

7. Orthogonality and least squares December 3, 2013 64 / 119


Least squares

Solution of the general least squares problem

Applying the Best Approximation Theorem (Theorem 3.3), we may project b onto
the column space of A
b̂ = ProjCol{A} {b}

Then, we solve the equation system

Ax = b̂

which has at least one solution.

7. Orthogonality and least squares December 3, 2013 65 / 119


Least squares

Theorem 5.1
The set of least-squares solutions of Ax = b is the same as the set of solutions of
the normal equations
AT Ax = AT b

Proof: least-squares solutions ⊂ normal equations solutions


Let us assume that x̂ is a least-squares solution. Then, b − Ax̂ is orthogonal to
Col{A}, and in particular, to each one of the columns of A (ai , i = 1, 2, ..., n):
ai · (b − Ax̂) = 0 ∀i ∈ {1, 2, ..., n} ⇒
aiᵀ(b − Ax̂) = 0 ∀i ∈ {1, 2, ..., n} ⇒
Aᵀ(b − Ax̂) = 0 ⇒
Aᵀb = AᵀAx̂
That is, every least-squares solution is also a solution of the normal equations.

7. Orthogonality and least squares December 3, 2013 66 / 119


Least squares

Proof: least-squares solutions ⊃ normal equations solutions


Let us assume that x̂ is solution of the normal equations. Then,
Aᵀb = AᵀAx̂ ⇒
Aᵀ(b − Ax̂) = 0 ⇒
aiᵀ(b − Ax̂) = 0 ∀i ∈ {1, 2, ..., n}
That is, b − Ax̂ is orthogonal to the columns of A and, consequently, to Col{A}.
Hence, the equation
b = Ax̂ + (b − Ax̂)

is the orthogonal decomposition of b as a vector in Col{A} and a vector


orthogonal to Col{A}. By the uniqueness of the orthogonal decomposition, Ax̂
must be the orthogonal projection of b onto Col{A} so that
Ax̂ = b̂

and, therefore, x̂ is a least-squares solution.

7. Orthogonality and least squares December 3, 2013 67 / 119


Least squares

Example
Find a least-squares solution to Ax = b with

A = [ 4 0 ]        b = [  2 ]
    [ 0 2 ]            [  0 ]
    [ 1 1 ]            [ 11 ]

Solution
Let’s solve the normal equations AᵀAx̂ = Aᵀb

AᵀA = [ 17 1 ]      Aᵀb = [ 19 ]
      [  1 5 ]            [ 11 ]

[ 17 1 ] x̂ = [ 19 ]  ⇒  x̂ = [ 17 1 ]⁻¹ [ 19 ] = [ 1 ]
[  1 5 ]     [ 11 ]         [  1 5 ]   [ 11 ]   [ 2 ]

Let’s check that x̂ is not a solution of the original equation system but a
least-squares solution

Ax̂ = [ 4 0 ] [ 1 ]   [ 4 ]               [  2 ]
     [ 0 2 ] [ 2 ] = [ 4 ] = b̂  ≠  b  = [  0 ]
     [ 1 1 ]         [ 3 ]               [ 11 ]

7. Orthogonality and least squares December 3, 2013 68 / 119
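The same least-squares solution can be obtained numerically; the backslash operator solves the least-squares problem directly:

A = [4 0; 0 2; 1 1];
b = [2; 0; 11];
xhat = (A'*A) \ (A'*b)   % normal equations: (1, 2)
A \ b                    % MATLAB's least-squares solver gives the same result
A * xhat                 % bhat = (4, 4, 3), the projection of b onto Col{A}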


Least squares

Definition 5.2 (Least-squares error)

The least-squares error is defined as
σ² ≜ ‖Ax̂ − b‖² = ‖b̂ − b‖²

Example (continued)
In this case:
σ = ‖(4, 4, 3) − (2, 0, 11)‖ = ‖(2, 4, −8)‖ = √84 ≈ 9.165

7. Orthogonality and least squares December 3, 2013 69 / 119


Least squares

Example
Unfortunately, the least-squares solution may not be unique, as shown in the next
example (arising in ANOVA). Find a least-squares solution to Ax = b with

A = [ 1 1 0 0 ]        b = [ −3 ]
    [ 1 1 0 0 ]            [ −1 ]
    [ 1 0 1 0 ]            [  0 ]
    [ 1 0 1 0 ]            [  2 ]
    [ 1 0 0 1 ]            [  5 ]
    [ 1 0 0 1 ]            [  1 ]

Solution

AᵀA = [ 6 2 2 2 ]        Aᵀb = [  4 ]
      [ 2 2 0 0 ]              [ −4 ]
      [ 2 0 2 0 ]              [  2 ]
      [ 2 0 0 2 ]              [  6 ]

7. Orthogonality and least squares December 3, 2013 70 / 119


Least squares

Example (continued)
The augmented matrix is

[ 6 2 2 2 |  4 ]     [ 1 0 0  1 |  3 ]
[ 2 2 0 0 | −4 ]  ∼  [ 0 1 0 −1 | −5 ]
[ 2 0 2 0 |  2 ]     [ 0 0 1 −1 | −2 ]
[ 2 0 0 2 |  6 ]     [ 0 0 0  0 |  0 ]

Any point of the form

x̂ = (3, −5, −2, 0) + x4 (−1, 1, 1, 1)    ∀x4 ∈ R

is a least-squares solution of the problem.

7. Orthogonality and least squares December 3, 2013 71 / 119


Least squares

Theorem 5.2
The matrix AᵀA is invertible iff the columns of A are linearly independent. In this
case, the equation system Ax = b has a unique least-squares solution given by
x̂ = A⁺b

where A⁺ is the Moore-Penrose pseudoinverse

A⁺ = (AᵀA)⁻¹Aᵀ

7. Orthogonality and least squares December 3, 2013 72 / 119


Least squares and QR decomposition
Sometimes AᵀA is ill-conditioned, which means that small perturbations in b
translate into large perturbations in x̂. The QR decomposition offers a numerically
more stable way of finding the least-squares solution.
Theorem 5.3
Let A ∈ Mm×n have linearly independent columns. Consider its QR
decomposition (A = QR). Then, for each b ∈ Rm there is a unique least-squares
solution of Ax = b given by
x̂ = R⁻¹Qᵀb
Proof
If we substitute x̂ = R⁻¹Qᵀb into Ax we have

Ax̂ = AR⁻¹Qᵀb = QRR⁻¹Qᵀb = QQᵀb

But the columns of Q form an orthonormal basis of Col{A} (Theorem 4.2 and the Corollary in Slide 49)
and consequently QQᵀb is the orthogonal projection of b onto Col{A}, that is, b̂.
So, x̂ = R⁻¹Qᵀb is a least-squares solution of Ax = b. Additionally, since the
columns of A are linearly independent, by Theorem 5.2, this solution is unique.
7. Orthogonality and least squares December 3, 2013 73 / 119
Least squares and QR decomposition

Recall that numerically it is easier to solve Rx̂ = Qᵀb than to compute x̂ = R⁻¹Qᵀb.

Let

A = [ 1 3 5 ]        b = [  3 ]
    [ 1 1 0 ]            [  5 ]
    [ 1 1 2 ]            [  7 ]
    [ 1 3 3 ]            [ −3 ]

Its QR decomposition is

A = QR = [ 1/2  1/2   1/2 ] [ 2 4 5 ]
         [ 1/2 −1/2  −1/2 ] [ 0 2 3 ]
         [ 1/2 −1/2   1/2 ] [ 0 0 2 ]
         [ 1/2  1/2  −1/2 ]

Qᵀb = [  6 ]  ⇒  [ 2 4 5 ]      [  6 ]        [ 10 ]
      [ −6 ]     [ 0 2 3 ] x̂ =  [ −6 ]  ⇒ x̂ = [ −6 ]
      [  4 ]     [ 0 0 2 ]      [  4 ]        [  2 ]

7. Orthogonality and least squares December 3, 2013 74 / 119
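A MATLAB sketch of the QR-based solution of this example, solving the triangular system instead of inverting R:

A = [1 3 5; 1 1 0; 1 1 2; 1 3 3];
b = [3; 5; 7; -3];
[Q, R] = qr(A, 0);      % economy-size QR factorization
xhat = R \ (Q' * b)     % back-substitution on R*xhat = Q'*b: (10, -6, 2)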


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 5:
6.5.1
6.5.19
6.5.20
6.5.21
6.5.24

7. Orthogonality and least squares December 3, 2013 75 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 76 / 119


Least-squares linear regression
Example
In many scientific and engineering problems, we need to explain some
observations y as a linear function of an independent variable x. For instance, we
may try to explain the weight of a person as a linear function of his or her height

Weight = β0 + β1 Height

A. Schneider, G. Hommel, M. Blettner. Linear Regression Analysis. Dtsch Arztebl Int. 2010 November; 107(44): 776–782.

7. Orthogonality and least squares December 3, 2013 77 / 119


Least-squares linear regression

Example (continued)
For each observation we have an equation

Height (m)    Weight (kg)
1.70          57
1.53          43
1.90          94
...           ...

57 = β0 + 1.70 β1
43 = β0 + 1.53 β1
94 = β0 + 1.90 β1
...

[ 1 1.70 ] [ β0 ]   [ 57 ]
[ 1 1.53 ] [ β1 ] = [ 43 ]
[ 1 1.90 ]          [ 94 ]
[ ...    ]          [ ...]

which is of the form

Xβ = y

7. Orthogonality and least squares December 3, 2013 78 / 119


Least-squares linear regression

Least-squares regression
Each one of the observed data points (xj , yj ) gives an equation. All together
provide an equation system
Xβ = y
that is an overdetermined, linear equation system of the form Ax = b. The matrix
X is called the system matrix and it is related to the independent (predictor)
variables (the height in this case). The vector y is called the observation vector
and collects the values of the dependent (predicted) variable (the weight in this
case). The model

y = β0 + β1 x + ε

is called the linear regression of y on x. β0 and β1 are called the regression
coefficients. The difference between the predicted value and the observed value
for a particular observation (ε) is called the residual of that observation.

7. Orthogonality and least squares December 3, 2013 79 / 119


Least-squares linear regression

The residual of the j-th observation is defined as

εj = yj − (β0 + β1 xj )

7. Orthogonality and least squares December 3, 2013 80 / 119


Least-squares linear regression

The goal of least-squares regression is to minimize

∑_{j=1}^{n} εj² = ‖y − Xβ‖²

Let’s analyze this term

Xβ = [ 1 x1 ] [ β0 ]   [ β0 + β1 x1 ]   [ ŷ1 ]
     [ 1 x2 ] [ β1 ] = [ β0 + β1 x2 ] = [ ŷ2 ]
     [ ...  ]          [     ...    ]   [ ...]
     [ 1 xn ]          [ β0 + β1 xn ]   [ ŷn ]

Then

‖y − Xβ‖² = ‖(y1 − ŷ1 , y2 − ŷ2 , ..., yn − ŷn )‖² = ∑_{j=1}^{n} (yj − ŷj )² = ∑_{j=1}^{n} εj²

7. Orthogonality and least squares December 3, 2013 81 / 119


Least-squares linear regression

Example
Suppose we have observed the following values of height and weight: (1.70, 57),
(1.53, 43), (1.90, 94). We construct the system matrix and the observation vector

X = [ 1 1.70 ]        y = [ 57 ]
    [ 1 1.53 ]            [ 43 ]
    [ 1 1.90 ]            [ 94 ]

Now we solve the normal equations

Xβ = y ⇒ XᵀXβ = Xᵀy

XᵀX = [ 3.00 5.13 ]      Xᵀy = [ 194.00 ]      β̂ = (XᵀX)⁻¹Xᵀy = [ −173.39 ]
      [ 5.13 8.84 ]            [ 341.29 ]                        [  139.21 ]

Weight = −173.39 + 139.21 Height

7. Orthogonality and least squares December 3, 2013 82 / 119


Least-squares linear regression

Example
MATLAB:

X = [1 1.70; 1 1.53; 1 1.90];
y = [57; 43; 94];
beta = inv(X'*X)*X'*y

x = 1.5:0.01:2.00;
yp = beta(1) + beta(2)*x;
plot(x, yp, X(:,2), y, 'o')   % fitted line and the observed points
xlabel('Height (m)')
ylabel('Weight (kg)')

(Plot of the fitted regression line, Weight (kg) versus Height (m), together with
the three observations.)

7. Orthogonality and least squares December 3, 2013 83 / 119


Least-squares linear regression

The general linear model

The linear model is not restricted to straight lines. We can use it to fit any kind of
curve:
y = β0 f0 (x) + β1 f1 (x) + β2 f2 (x) + ...

Fitting a parabola
f0 (x) = 1, f1 (x) = x, f2 (x) = x² ⇒

y1 = β0 f0 (x1 ) + β1 f1 (x1 ) + β2 f2 (x1 ) + ε1
y2 = β0 f0 (x2 ) + β1 f1 (x2 ) + β2 f2 (x2 ) + ε2
...
yn = β0 f0 (xn ) + β1 f1 (xn ) + β2 f2 (xn ) + εn

[ y1 ]   [ 1 x1 x1² ] [ β0 ]   [ ε1 ]
[ y2 ] = [ 1 x2 x2² ] [ β1 ] + [ ε2 ]   ⇒  y = Xβ + ε
[ ...]   [ ...      ] [ β2 ]   [ ...]
[ yn ]   [ 1 xn xn² ]          [ εn ]

7. Orthogonality and least squares December 3, 2013 84 / 119
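A minimal sketch of fitting a parabola with this model; the data points are invented for illustration:

x = [0.0; 0.5; 1.0; 1.5; 2.0];     % hypothetical predictor values
y = [1.1; 1.8; 3.2; 5.9; 9.1];     % hypothetical observations
X = [ones(size(x)) x x.^2];        % system matrix: f0 = 1, f1 = x, f2 = x^2
beta = X \ y                       % least-squares estimates of beta0, beta1, beta2
residuals = y - X*beta             % one residual per observation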


Least-squares linear regression

Fitting a parabola
In this example they model the deformation of the wall of the zebrafish embryo as
a function of strain.

Z. Lua, P. C.Y. Chen, H. Luo, J. Nam, R. Ge, W. Lin. Models of maximum stress and strain of zebrafish embryos under indentation. J. Biomechanics 42

(5): 620–625 (2009)

7. Orthogonality and least squares December 3, 2013 85 / 119


Least-squares linear regression

Multivariate linear regression

The linear model is not restricted to one variable. By fitting several variables we
may fit surfaces and hypersurfaces
y = β0 f0 (x1 , x2 ) + β1 f1 (x1 , x2 ) + β2 f2 (x1 , x2 ) + ...

Fitting a parabolic surface

f0 (x1 , x2 ) = 1
f1 (x1 , x2 ) = x1
f2 (x1 , x2 ) = x2
f3 (x1 , x2 ) = x1²
f4 (x1 , x2 ) = x2²
f5 (x1 , x2 ) = x1 x2

⇒ X = [ 1 x11 x12 x11² x12² x11 x12 ]
      [ 1 x21 x22 x21² x22² x21 x22 ]
      [ ...                         ]
      [ 1 xn1 xn2 xn1² xn2² xn1 xn2 ]

7. Orthogonality and least squares December 3, 2013 86 / 119


Least-squares linear regression
Fitting a parabolic surface
In this example they model the shape of the cornea using videokeratoscopic images.

Signal Processing Group, Technische Universitat Darmstadt, http://www.fhp.tu-darmstadt.de/nt/index.php?id=531&L=1

7. Orthogonality and least squares December 3, 2013 87 / 119
Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 6:
6.6.1
6.6.5
6.6.9
6.6.12 (computer)

7. Orthogonality and least squares December 3, 2013 88 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 89 / 119


Inner product spaces

7. Orthogonality and least squares December 3, 2013 90 / 119


Inner product spaces
Definition 7.1 (Inner product)
An inner product in a vector space V is a function that assigns a real number
⟨u, v⟩ to every pair of vectors u and v, and that satisfies the following axioms for all
u, v, w ∈ V and all scalars c:
1 ⟨u, v⟩ = ⟨v, u⟩
2 ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩
3 ⟨cu, v⟩ = c ⟨u, v⟩
4 ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 iff u = 0.

Example
For instance in Weighted Least Squares (WLS) we may use an inner product in
R2 defined as:

⟨u, v⟩ = 4u1 v1 + 5u2 v2


In this way we give less weight to distances in the first component with respect to
distances in the second component.
7. Orthogonality and least squares December 3, 2013 91 / 119
Inner product spaces

Now we have to prove that this function is effectively an inner product:


1 hu, vi = hv, ui
hu, vi = 4u1 v1 + 5u2 v2 [by definition]
= 4v1 u1 + 5v2 u2 [commutativity of scalar multiplication]
= hv, ui [by definition]
2 hu + v, wi = hu, wi + hv, wi

hu + v, wi = 4(u1 + v1 )w1 + 5(u2 + v2 )w2 [by definition]


= 4u1 w1 + 4v1 w1 + 5u2 w2 + 5v2 w2 [distributivity of scalar]
[multiplication/addition]
= 4u1 w1 + 5u2 w2 + 4v1 w1 + 5v2 w2 [commutativity]
[of scalar addition]
= hu, wi + hv, wi [by definition]

7. Orthogonality and least squares December 3, 2013 92 / 119


Inner product spaces

3 hcu, vi = c hu, vi
hcu, vi = 4cu1 v1 + 5cu2 v2 [by definition]
= c4v1 u1 + c5v2 u2 [commutativity of scalar multiplication]
= c(4v1 u1 + 5v2 u2 ) [distributivity of scalar multiplication]
= c hu, vi [by definition]
4 hu, ui ≥ 0 and hu, ui = 0 iff u = 0.
1 hu, ui ≥ 0
hu, ui = 4u12 + 5u22 [by definition]
which is obviously nonnegative.
2 hu, ui = 0 iff u = 0.
hu, ui = 0 ⇔ 4u12 + 5u22 = 0 ⇔ u1 = u2 = 0

7. Orthogonality and least squares December 3, 2013 93 / 119


Inner product spaces

Example
Consider two vectors p and q in the vector space of polynomials of degree n (Pn ).
Let t0 , t1 , ..., tn be n + 1 distinct real numbers and K any positive scalar. The inner
product between p and q is defined as
⟨p, q⟩ = K (p(t0 )q(t0 ) + p(t1 )q(t1 ) + ... + p(tn )q(tn ))

Axioms 1-3 are easy to check. Let’s prove Axiom 4

4 ⟨p, p⟩ ≥ 0 and ⟨p, p⟩ = 0 iff p = 0.
1 ⟨p, p⟩ ≥ 0
⟨p, p⟩ = K (p²(t0 ) + p²(t1 ) + ... + p²(tn )) [by definition]
which is obviously nonnegative.
2 ⟨p, p⟩ = 0 iff p = 0.
⟨p, p⟩ = 0 ⇔ K (p²(t0 ) + p²(t1 ) + ... + p²(tn )) = 0 ⇔
p(t0 ) = p(t1 ) = ... = p(tn ) = 0
But a non-null polynomial of degree n can have, at most, n zeros. However,
the previous condition requires the polynomial to vanish at n + 1 points. This
is impossible unless p = 0.

7. Orthogonality and least squares December 3, 2013 94 / 119


Inner product spaces

Example
Consider two vectors p and q in the vector space of polynomials of degree n (Pn ).
Assume that we regularly space the n + 1 points ti in the interval [−1, 1], with spacing ∆T,
and set K = ∆T . Then the inner product between the two polynomials becomes

⟨p, q⟩ = (p(t0 )q(t0 ) + p(t1 )q(t1 ) + ... + p(tn )q(tn )) ∆T = ∑_{i=0}^{n} p(ti )q(ti )∆T

Making ∆T tend to 0, the inner product becomes

⟨p, q⟩ = ∫_{−1}^{1} p(t)q(t) dt

7. Orthogonality and least squares December 3, 2013 95 / 119


Inner product spaces

Legendre polynomials are orthogonal polynomials in the interval [−1, 1]

Legendre polynomials are very useful for regression with high-order polynomials, as
shown in the next slide.

7. Orthogonality and least squares December 3, 2013 96 / 119


Inner product spaces

7. Orthogonality and least squares December 3, 2013 97 / 119


Length, distance and orthogonality

Length, distance and orthogonality

The length of a vector u in an inner product space is defined in the standard way

‖u‖ = √⟨u, u⟩

Similarly, the distance between two vectors u and v is defined as

d(u, v) = ‖u − v‖

Finally, two vectors u and v are said to be orthogonal iff
⟨u, v⟩ = 0

7. Orthogonality and least squares December 3, 2013 98 / 119


Length, distance and orthogonality

Example
In the vector space of polynomials in the interval [0, 1], P[0, 1], let’s define the
inner product
⟨p, q⟩ = ∫_{0}^{1} p(t)q(t) dt
What is the length of the vector p(t) = 3t²?
Solution

‖p‖ = √⟨p, p⟩ = √(∫_{0}^{1} p²(t) dt) = √(∫_{0}^{1} (3t²)² dt) = √(∫_{0}^{1} 9t⁴ dt)
    = √(9 [t⁵/5]_{0}^{1}) = √(9 (1/5 − 0)) = 3/√5

7. Orthogonality and least squares December 3, 2013 99 / 119


Gram-Schmidt orthogonalization
Example
Gram-Schmidt is applied in the standard way. For instance, find an orthogonal
basis of P2 [−1, 1]. A basis that spans this space is

{1, t, t²}

Let’s orthogonalize it
p0 (t) = 1
p1 (t) = t − (⟨t, p0⟩/‖p0‖²) p0 (t) = t − (∫_{−1}^{1} t dt / ∫_{−1}^{1} dt) · 1 = t − 0/2 · 1 = t
p2 (t) = t² − (⟨t², p0⟩/‖p0‖²) p0 (t) − (⟨t², p1⟩/‖p1‖²) p1 (t)
       = t² − (∫_{−1}^{1} t² dt / ∫_{−1}^{1} dt) − (∫_{−1}^{1} t² t dt / ∫_{−1}^{1} t² dt) t = t² − (2/3)/2 − 0 · t = t² − 1/3

In Slide 97 we proposed the Legendre polynomial of degree 2 to be

P2 (t) = (1/2)(3t² − 1); we can easily show that P2 (t) = (3/2) p2 (t). Consequently, if
p2 (t) is orthogonal to p0 (t) and p1 (t), so is P2 (t).
7. Orthogonality and least squares December 3, 2013 100 / 119
Best approximation

Example
What is the best approximation in P2 [−1, 1] of p(t) = t³?
Solution
We know the answer is the orthogonal projection of p(t) onto P2 [−1, 1]. An
orthogonal basis of P2 [−1, 1] is {1, t, t² − 1/3}. Therefore, this projection can be
calculated as

p̂(t) = ProjP2 [−1,1] {p(t)} = (⟨p, p0⟩/‖p0‖²) p0 (t) + (⟨p, p1⟩/‖p1‖²) p1 (t) + (⟨p, p2⟩/‖p2‖²) p2 (t)

Let’s perform these calculations:

⟨p, p0⟩ = ∫_{−1}^{1} t³ dt = 0                ‖p0‖² = ∫_{−1}^{1} dt = 2
⟨p, p1⟩ = ∫_{−1}^{1} t³ t dt = 2/5            ‖p1‖² = ∫_{−1}^{1} t² dt = 2/3
⟨p, p2⟩ = ∫_{−1}^{1} t³ (t² − 1/3) dt = 0     ‖p2‖² = ∫_{−1}^{1} (t² − 1/3)² dt = 8/45

p̂(t) = (0/2) + ((2/5)/(2/3)) t + (0/(8/45)) (t² − 1/3) = (3/5) t

7. Orthogonality and least squares December 3, 2013 101 / 119
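A numerical cross-check of these inner products, using MATLAB's integral on [−1, 1]:

ip = @(f, g) integral(@(t) f(t).*g(t), -1, 1);    % inner product on [-1, 1]
p  = @(t) t.^3;
p0 = @(t) ones(size(t)); p1 = @(t) t; p2 = @(t) t.^2 - 1/3;
c0 = ip(p, p0) / ip(p0, p0);   % 0
c1 = ip(p, p1) / ip(p1, p1);   % (2/5)/(2/3) = 3/5
c2 = ip(p, p2) / ip(p2, p2);   % 0
[c0 c1 c2]                     % the best approximation is (3/5) t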


Best approximation

(Plot of p(t) = t³ and its best approximation (3/5)t on the interval [−1, 1].)

7. Orthogonality and least squares December 3, 2013 102 / 119


Best approximation

Example
In this example we exploited the best approximation property of orthogonal
wavelets to speed-up and make more robust angular alignments of projections in
3D Electron Microscopy.

C.O.S.Sorzano, S. Jonic, C. El-Bez, J.M. Carazo, S. De Carlo, P. Thévenaz, M. Unser. A multiresolution approach to orientation assignment in 3-D

electron microscopy of single particles. Journal of Structural Biology 146(3): 381-392 (2004, cover article)

7. Orthogonality and least squares December 3, 2013 103 / 119


Pythagorean theorem

Theorem 7.1 (Pythagorean theorem)


Given any vector v in an inner product space V and a subspace of it W ⊆ V we
have
‖v‖² = ‖ProjW {v}‖² + ‖v − ProjW {v}‖²

7. Orthogonality and least squares December 3, 2013 104 / 119


The Cauchy-Schwarz inequality

Theorem 7.2 (The Cauchy-Schwarz inequality)

For all u, v ∈ V it is verified that
|⟨u, v⟩| ≤ ‖u‖ ‖v‖
Proof
If u = 0, then

|⟨0, v⟩| = 0 and ‖0‖ ‖v‖ = 0 ‖v‖ = 0

So the inequality becomes an equality.
If u ≠ 0, then consider W = Span{u} and

‖ProjW {v}‖ = ‖(⟨v, u⟩/‖u‖²) u‖ = (|⟨v, u⟩|/‖u‖²) ‖u‖ = |⟨v, u⟩|/‖u‖

But by the Pythagorean Theorem (Theorem 7.1) we have ‖ProjW {v}‖ ≤ ‖v‖.
Consequently,

|⟨v, u⟩|/‖u‖ ≤ ‖v‖ ⇒ |⟨v, u⟩| ≤ ‖u‖ ‖v‖ (q.e.d.)

7. Orthogonality and least squares December 3, 2013 105 / 119


The Triangle inequality

Theorem 7.3 (The Triangle inequality)

For all u, v ∈ V it is verified that
‖u + v‖ ≤ ‖u‖ + ‖v‖
Proof

‖u + v‖² = ⟨u + v, u + v⟩ [by definition]
         = ⟨u, u⟩ + ⟨v, v⟩ + 2⟨u, v⟩ [properties of the inner product]
         ≤ ‖u‖² + ‖v‖² + 2|⟨u, v⟩| [⟨u, v⟩ ≤ |⟨u, v⟩|]
         ≤ ‖u‖² + ‖v‖² + 2‖u‖ ‖v‖ [Cauchy-Schwarz]
         = (‖u‖ + ‖v‖)²

‖u + v‖ ≤ ‖u‖ + ‖v‖ [taking the square root]

(q.e.d.)

7. Orthogonality and least squares December 3, 2013 106 / 119


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 7:
6.7.1
6.7.13
6.7.16
6.7.18

7. Orthogonality and least squares December 3, 2013 107 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 108 / 119


Weighted Least Squares

Weighted Least Squares


Let us assume we have a table of collected data and we want to fit a least squares
model. However, we want to give more importance to some observations because
we are more confident about them or they are more important. We encode the
importance as a weight value (the larger the weight, the more importance the
observation has)
X Y W
x1 y1 w1
x2 y2 w2
x3 y3 w3
... ... ...
Let us call ŷj the prediction of the model for the j-th observation and εj the
committed error
yj = ŷj + εj

7. Orthogonality and least squares December 3, 2013 109 / 119


Weighted Least Squares

The goal is now to minimize the weighted sum of squared errors

∑_{j=1}^{n} (wj εj )² = ∑_{j=1}^{n} (wj (yj − ŷj ))² = ∑_{j=1}^{n} (wj yj − wj ŷj )²

Let us collect all observed values into a vector y and do analogously with the
predictions ŷ. Let us define the diagonal matrix

W = [ w1  0   0  ...  0  ]
    [ 0   w2  0  ...  0  ]
    [ 0   0   w3 ...  0  ]
    [ ... ... ... ... ...]
    [ 0   0   0  ...  wn ]

Then, the previous objective function becomes

∑_{j=1}^{n} (wj yj − wj ŷj )² = ‖W y − W ŷ‖²

7. Orthogonality and least squares December 3, 2013 110 / 119


Weighted Least Squares

Now, suppose that ŷ is calculated from the columns of a matrix A, that is,
ŷ = Ax. The objective function becomes

∑_{j=1}^{n} (wj yj − wj ŷj )² = ‖W y − W Ax‖²

The minimum of this objective function is attained at the x̂ that is the least-squares
solution of the equation system
W Ax = W y

The normal equations of the problem are

(W A)ᵀ W Ax = (W A)ᵀ W y
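A minimal MATLAB sketch of weighted least squares; the data and weights are invented for illustration:

x = [1.70; 1.53; 1.90];
y = [57; 43; 94];
w = [1; 2; 1];              % hypothetical weights (more confidence in the 2nd point)
A = [ones(size(x)) x];      % same system matrix as in ordinary linear regression
W = diag(w);
xhat = (W*A) \ (W*y)        % least-squares solution of W*A*x = W*y
% equivalently: ((W*A)'*(W*A)) \ ((W*A)'*(W*y))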

7. Orthogonality and least squares December 3, 2013 111 / 119


Weighted Least Squares

Example
In this work they used Weighted Least Squares to calibrate a digital system to
measure maximum respiratory pressures.

J.L. Ferreira, F.H. Vasconcelos, C.J. Tierra-Criollo. A Case Study of Applying Weighted Least Squares to Calibrate a Digital Maximum Respiratory

Pressures Measuring System. Applied Biomedical Engineering, Chapter 18 (2011)

7. Orthogonality and least squares December 3, 2013 112 / 119


Fourier Series

Example

Fourier tools are perhaps the most widespread tools for analyzing signals and their
frequency components. Fourier decomposition states that any signal can be
obtained by summing sine waves of different amplitude, phase and frequency.

7. Orthogonality and least squares December 3, 2013 113 / 119


Fourier Series

Theorem 8.1
Consider the vector space of continuous functions in the interval [0, 2π], C [0, 2π].
The set
S = {1, cos(t), sin(t), cos(2t), sin(2t), ..., cos(Nt), sin(Nt)}

is orthogonal with respect to the inner product defined as

⟨f (t), g(t)⟩ = ∫_{0}^{2π} f (t)g(t) dt
Proof
For n ≠ m,
⟨cos(nt), cos(mt)⟩ = ∫_{0}^{2π} cos(nt) cos(mt) dt
                   = ∫_{0}^{2π} (1/2)(cos((n + m)t) + cos((n − m)t)) dt
                   = (1/2) [sin((n + m)t)/(n + m) + sin((n − m)t)/(n − m)]_{0}^{2π}
                   = 0

where we have used cos(A) cos(B) = (1/2)(cos(A + B) + cos(A − B)).

7. Orthogonality and least squares December 3, 2013 114 / 119


Fourier Series

Analogously we could prove that

⟨cos(nt), sin(mt)⟩ = 0
⟨cos(nt), 1⟩ = 0
⟨sin(nt), 1⟩ = 0
‖cos(nt)‖² = π
‖sin(nt)‖² = π
‖1‖² = 2π

7. Orthogonality and least squares December 3, 2013 115 / 119


Fourier Series

Theorem 8.2 (Fourier series)

Given any function f (t) ∈ C [0, 2π], f (t) can be approximated as closely as desired
by a sum of this form, obtained simply by orthogonally projecting it onto W = Span{S}

f (t) ≈ ProjW {f (t)} = ⟨f (t), 1⟩/‖1‖² + ∑_{n=1}^{N} [ (⟨f (t), cos(nt)⟩/‖cos(nt)‖²) cos(nt) + (⟨f (t), sin(nt)⟩/‖sin(nt)‖²) sin(nt) ]

7. Orthogonality and least squares December 3, 2013 116 / 119
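A minimal sketch of this projection for a concrete signal, evaluating the inner products numerically (the function f below is just an example):

f  = @(t) t .* (2*pi - t);          % example signal on [0, 2*pi]
N  = 5;
t  = linspace(0, 2*pi, 400);
fhat = integral(f, 0, 2*pi) / (2*pi) * ones(size(t));       % <f,1>/||1||^2 term
for n = 1:N
    an = integral(@(t) f(t).*cos(n*t), 0, 2*pi) / pi;       % <f,cos(nt)>/||cos(nt)||^2
    bn = integral(@(t) f(t).*sin(n*t), 0, 2*pi) / pi;       % <f,sin(nt)>/||sin(nt)||^2
    fhat = fhat + an*cos(n*t) + bn*sin(n*t);
end
plot(t, f(t), t, fhat)   % the truncated Fourier series closely follows f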


Fourier Series

Example
In this work we used Fourier space to simulate and to align electron microscopy
images

S. Jonic, C.O.S.Sorzano, P. Thévenaz, C. El-Bez, S. De Carlo, M. Unser. Spline-Based image-to-volume registration for three-dimensional electron

microscopy. Ultramicroscopy, 103: 303-317 (2005)

7. Orthogonality and least squares December 3, 2013 117 / 119


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 8:
6.8.1
6.8.6
6.8.8
6.8.11

7. Orthogonality and least squares December 3, 2013 118 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 119 / 119
