
Chapter 7.

Orthogonality and least squares

C.O.S. Sorzano

Biomedical Engineering

December 3, 2013

7. Orthogonality and least squares December 3, 2013 1 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 2 / 119


References

D. Lay. Linear algebra and its applications (3rd ed). Pearson (2006). Chapter 6.

7. Orthogonality and least squares December 3, 2013 3 / 119


A little bit of history

Least squares was first used to solve problems in geodesy (Adrien-Marie
Legendre, 1805) and astronomy (Carl Friedrich Gauss, 1809). Gauss made the
connection of this method to the distribution of measurement errors. Currently it
is one of the best understood and most widely used methods.

7. Orthogonality and least squares December 3, 2013 4 / 119


Applications

In this example, least squares is used to plan radiation therapy.

Bedford, J. L. Sinogram analysis of aperture optimization by iterative least-squares in volumetric modulated arc therapy. Physics in Medicine and Biology,

2013, 58, 1235-1250

7. Orthogonality and least squares December 3, 2013 5 / 119


Applications

Traditionally, control applications were formulated in a least-squares setup.

Currently, they use more sophisticated goal functions that can be regarded as
evolved versions of least squares.

7. Orthogonality and least squares December 3, 2013 6 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 7 / 119


Inner product

Definition 1.1 (Inner product or dot product)


Let u, v ∈ Rn be two vectors. The inner product or dot product between these
two vectors is defined as

u · v = ⟨u, v⟩ ≜ ∑_{i=1}^{n} ui vi = u1 v1 + u2 v2 + ... + un vn

Theorem 1.1
If we consider u and v to be column vectors (∈ Mn×1 ), then
u · v = uᵀv

Example
Let u = (2, −5, −1) and v = (3, 2, −3).
u · v = 2 · 3 + (−5) · 2 + (−1) · (−3) = 6 − 10 + 3 = −1
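A quick numerical check of this example in MATLAB (the same environment used for the regression examples later in these slides):

u = [2; -5; -1];
v = [3; 2; -3];
dot(u, v)   % built-in dot product: returns -1
u' * v      % the same value computed as u'v (Theorem 1.1)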

7. Orthogonality and least squares December 3, 2013 8 / 119


Inner product

Theorem 1.2
For any three vectors u, v, w ∈ Rn and any scalar r ∈ R it is verified that
1 u·v=v·u
2 (u + v) · w = u · w + v · w
3 (r u) · v = r (u · v) = u · (r v)
4 u·u≥0
5 u·u=0⇔u=0

Corollary
(r1 u1 + r2 u2 + ... + rp up ) · v = r1 (u1 · v) + r2 (u2 · v) + ... + rp (up · v)

7. Orthogonality and least squares December 3, 2013 9 / 119


Length

Definition 1.2 (Length of a vector)


Given any vector v, its length is defined as

‖v‖ ≜ √(v · v)

Theorem 1.3
Given any vector v ∈ Rn

‖v‖ = √(v1² + v2² + ... + vn²)

Example
The length of v = (1, −2, 2, 0) is

‖v‖ = √(1² + (−2)² + 2² + 0²) = 3

7. Orthogonality and least squares December 3, 2013 10 / 119


Length

Theorem 1.4
For any vector v and any scalar r it is verified that
‖r v‖ = |r| ‖v‖

Proof
It will be given only for v ∈ Rn:

‖r v‖ = √((r v1)² + (r v2)² + ... + (r vn)²) = √(r²(v1² + v2² + ... + vn²))
      = |r| √(v1² + v2² + ... + vn²) = |r| ‖v‖ (q.e.d.)

Example (continued)
Find a vector of unit length that has the same direction as v = (1, −2, 2, 0).
Solution

uv = v/‖v‖ = (1/3, −2/3, 2/3, 0) ⇒ ‖uv‖ = √(1/9 + 4/9 + 4/9 + 0) = 1
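The length and normalization computations above can be reproduced numerically:

v = [1; -2; 2; 0];
norm(v)            % sqrt(1 + 4 + 4 + 0) = 3
uv = v / norm(v)   % unit vector with the same direction as v
norm(uv)           % 1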

7. Orthogonality and least squares December 3, 2013 11 / 119


Distance

Definition 1.3 (Distance in R)


The distance between any two numbers a, b ∈ R can be defined as
d(a, b) = |a − b|

Example
Calculate the distance between 2 and 8 as well as between −3 and 4.
Solution
d(2, 8) = |2 − 8| = 6 and d(−3, 4) = |−3 − 4| = 7

7. Orthogonality and least squares December 3, 2013 12 / 119


Distance

Definition 1.4 (Distance in Rn )


The distance between any two vectors u, v ∈ Rn can be defined as
d(u, v) = ku − vk

Example
Calculate the distance between u = (7, 1) and v = (3, 2)
d(u, v) = ‖(7, 1) − (3, 2)‖ = ‖(4, −1)‖ = √(4² + (−1)²) = √17

7. Orthogonality and least squares December 3, 2013 13 / 119


Distance

Example
For any two vectors u and v in R³, the distance can be calculated through

d(u, v) = ‖u − v‖ = ‖(u1 − v1 , u2 − v2 , u3 − v3 )‖ = √((u1 − v1)² + (u2 − v2)² + (u3 − v3)²)

7. Orthogonality and least squares December 3, 2013 14 / 119


Orthogonality

Example
Any two vectors in R², u and v, are orthogonal if d(u, v) = d(u, −v)

d²(u, v) = ‖u − v‖² = (u − v) · (u − v) = u · u + v · v − 2u · v = ‖u‖² + ‖v‖² − 2u · v

d²(u, −v) = ‖u + v‖² = (u + v) · (u + v) = u · u + v · v + 2u · v = ‖u‖² + ‖v‖² + 2u · v

d²(u, v) = d²(u, −v) ⇒ −2u · v = 2u · v ⇒ u · v = 0

7. Orthogonality and least squares December 3, 2013 15 / 119


Orthogonality

Definition 1.5 (Orthogonality between two vectors)


Any two vectors, u and v, in a vector space V are orthogonal iff
u·v=0

Corollary
0 is orthogonal to any other vector.

Theorem 1.5 (Pythagorean theorem)


Any two vectors, u and v, in a vector space V are orthogonal iff
‖u + v‖² = ‖u‖² + ‖v‖²

7. Orthogonality and least squares December 3, 2013 16 / 119


Orthogonality

Definition 1.6 (Orthogonality between vector and vector space)


Let u be a vector in a vector space V and W a vector subspace of V . u is
orthogonal to W if u is orthogonal to all vectors in W . The set of all vectors
orthogonal to W is denoted as W ⊥ (the orthogonal complement of W ).

Example
Let W be a plane in R3 passing through the origin and L be a line, passing
through the origin and perpendicular to W . For any vector w ∈ W and any vector
z ∈ L we have

w·z=0

Therefore,
L = W ⊥ ⇔ W = L⊥

7. Orthogonality and least squares December 3, 2013 17 / 119


Orthogonality
Theorem 1.6
Let W be a vector subspace of a vector space V .
1 x ∈ W ⊥ iff x is orthogonal to every vector in a set that spans W .
2 W ⊥ is a vector subspace of V .

Theorem 1.7

Let A ∈ Mm×n , then


1 (Row{A})⊥ = Nul{A}
2 (Col{A})⊥ = Nul{AT }

7. Orthogonality and least squares December 3, 2013 18 / 119


Orthogonality

Proof Nul{A} ⊆ (Row{A})⊥

Consider the rows of A, ai (i = 1, 2, ..., m), as column vectors. Then, for any vector
x ∈ Nul{A} we know

Ax = 0 ⇒ [a1ᵀ; a2ᵀ; ...; amᵀ] x = [a1ᵀx; a2ᵀx; ...; amᵀx] = [a1 · x; a2 · x; ...; am · x] = [0; 0; ...; 0]

Consequently, x is orthogonal to all the rows of A, which span Row{A}, and by
the previous theorem, x ∈ (Row{A})⊥.

Proof Nul{A} ⊇ (Row{A})⊥
Conversely, let x ∈ (Row{A})⊥. Then, by the previous theorem, we know that

ai · x = 0 for i = 1, 2, ..., m ⇒ Ax = 0

So, x ∈ Nul{A}
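A small numerical illustration of (Row{A})⊥ = Nul{A}; the matrix A below is an arbitrary example chosen to have a non-trivial null space:

A = [1 2 3; 2 4 6];   % rank-1 matrix, so Nul{A} has dimension 2
N = null(A);          % columns of N form an orthonormal basis of Nul{A}
A * N                 % numerically zero: every row of A is orthogonal to Nul{A}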

7. Orthogonality and least squares December 3, 2013 19 / 119


Orthogonality

Proof (Col{A})⊥ = Nul{AT }


Let’s define B = AT . By the first part of this theorem, we know
(Row{B})⊥ = Nul{B} ⇒ (Row{AT })⊥ = Nul{AT } ⇒ (Col{A})⊥ = Nul{AT }

Theorem 1.8
For any two vectors u and v in a vector space V , the angle between the two can
be measured through the dot product:

u · v = ‖u‖ ‖v‖ cos θ

7. Orthogonality and least squares December 3, 2013 20 / 119


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 1:
6.1.15
6.1.22
6.1.24
6.1.26
6.1.28
6.1.30
6.1.32 (computer)

7. Orthogonality and least squares December 3, 2013 21 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 22 / 119


Orthogonal sets

Definition 2.1 (Orthogonal set)


Let S = {u1 , u2 , ..., up } be a set of vectors. S is an orthogonal set iff

ui · uj = 0 ∀i, j ∈ {1, 2, ..., p}, i ≠ j

Example
Let u1 = (3, 1, 1), u2 = (−1, 2, 1), u3 = (−1/2, −2, 7/2). Check whether the set
S = {u1 , u2 , u3 } is orthogonal.
Solution

u1 · u2 = 3 · (−1) + 1 · 2 + 1 · 1 = 0
u1 · u3 = 3 · (−1/2) + 1 · (−2) + 1 · (7/2) = 0
u2 · u3 = (−1) · (−1/2) + 2 · (−2) + 1 · (7/2) = 0

7. Orthogonality and least squares December 3, 2013 23 / 119


Orthogonal sets

Theorem 2.1
If S is an orthogonal set of non-null vectors, then S is linearly independent and,
consequently, it is a basis of the subspace spanned by S.
Proof
Let ui (i = 1, 2, ..., p) be the elements of S. Let us assume that S is linearly
dependent. Then, there exist coefficients c1 , c2 , ..., cp , not all of them null, such
that
0 = c1 u1 + c2 u2 + ... + cp up
Now, we compute the inner product of both sides with u1
0 = (c1 u1 + c2 u2 + ... + cp up ) · u1
0 = c1 (u1 · u1 ) + c2 (u2 · u1 ) + ... + cp (up · u1 ) = c1 ‖u1‖² ⇒ c1 = 0
Taking the inner product with ui (i = 2, 3, ..., p) we can show that all ci ’s are 0, which
contradicts the assumption. Therefore, the set S is linearly independent.

7. Orthogonality and least squares December 3, 2013 24 / 119


Orthogonal basis
Definition 2.2 (Orthogonal basis)
A set of vectors B is an orthogonal basis of a vector space V if it is an
orthogonal set and it is a basis of V .

Theorem 2.2
Let {u1 , u2 , ..., up } be an orthogonal basis for a vector space V . For each x ∈ V
we have

x = (x · u1 / ‖u1‖²) u1 + (x · u2 / ‖u2‖²) u2 + ... + (x · up / ‖up‖²) up

Proof
If x is in V , then it can be expressed as a linear combination of the vectors in a
basis of V
x = c1 u1 + c2 u2 + ... + cp up

Now, we calculate the dot product with u1

x · u1 = (c1 u1 + c2 u2 + ... + cp up ) · u1 = c1 ‖u1‖² ⇒ c1 = x · u1 / ‖u1‖²
7. Orthogonality and least squares December 3, 2013 25 / 119
Orthogonal basis

Example
Let u1 = (3, 1, 1), u2 = (−1, 2, 1), u3 = (−1/2, −2, 7/2), and B = {u1 , u2 , u3 } be an
orthogonal basis of R3 . Let x = (6, 1, −8). The coordinates of x in B are given by

x · u1 = 11      x · u2 = −12     x · u3 = −33
‖u1‖² = 11       ‖u2‖² = 6        ‖u3‖² = 33/2

x = (11/11) u1 + (−12/6) u2 + (−33/(33/2)) u3 = u1 − 2u2 − 2u3

The coordinates of x in the basis B are
[x]B = (1, −2, −2)
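The coordinate computation of this example can be checked in MATLAB:

u1 = [3; 1; 1]; u2 = [-1; 2; 1]; u3 = [-1/2; -2; 7/2];
x  = [6; 1; -8];
c1 = dot(x, u1) / norm(u1)^2;   %  11 / 11     = 1
c2 = dot(x, u2) / norm(u2)^2;   % -12 / 6      = -2
c3 = dot(x, u3) / norm(u3)^2;   % -33 / (33/2) = -2
[c1 c2 c3]                      % coordinates [x]_B
c1*u1 + c2*u2 + c3*u3           % recovers x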

7. Orthogonality and least squares December 3, 2013 26 / 119


Orthogonal projections

Orthogonal projection onto a vector


Consider a vector y and another vector u. Let us assume we want to decompose y as
the sum of two orthogonal vectors ŷ (along the direction of u) and another vector
z (orthogonal to u):

y = ŷ + z = αu + z ⇒ z = y − ŷ

We need to find the α that makes u and z orthogonal.

0 = z · u = (y − αu) · u = y · u − α‖u‖² ⇒ α = y · u / ‖u‖²

ŷ is the orthogonal projection of y onto u.

7. Orthogonality and least squares December 3, 2013 27 / 119


Orthogonal projections

Example
Let y = (7, 6) and u = (4, 2). Then,

y · u = 40, ‖u‖² = 20 ⇒ ŷ = (y · u / ‖u‖²) u = (40/20) u = 2u = (8, 4)

z = y − ŷ = (7, 6) − (8, 4) = (−1, 2)

d(y, ŷ) = ‖y − ŷ‖ = ‖z‖ = √((−1)² + 2²) = √5

7. Orthogonality and least squares December 3, 2013 28 / 119
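The projection in the previous example, computed numerically:

y = [7; 6]; u = [4; 2];
alpha = dot(y, u) / norm(u)^2;   % 40/20 = 2
yhat  = alpha * u;               % orthogonal projection of y onto u: (8, 4)
z     = y - yhat;                % component orthogonal to u: (-1, 2)
dot(z, u)                        % 0, as expected
norm(z)                          % sqrt(5): distance from y to the line spanned by u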


Orthonormal set

Definition 2.3 (Orthonormal set)


{u1 , u2 , ..., up } is an orthonormal set if it is an orthogonal set and all ui vectors
have unit length.

Example
Show that the set {u1 , u2 , u3 } is orthonormal, with

u1 = (1/√11)(3, 1, 1)    u2 = (1/√6)(−1, 2, 1)    u3 = (1/√66)(−1, −4, 7)

Solution
Let’s check that they are orthogonal:
u1 · u2 = (1/√11)(1/√6)(3 · (−1) + 1 · 2 + 1 · 1) = 0
u1 · u3 = (1/√11)(1/√66)(3 · (−1) + 1 · (−4) + 1 · 7) = 0
u2 · u3 = (1/√6)(1/√66)((−1) · (−1) + 2 · (−4) + 1 · 7) = 0

7. Orthogonality and least squares December 3, 2013 29 / 119


Orthonormal set

Example (continued)
Now, let’s check that they have unit length:

‖u1‖ = √((1/11)(3² + 1² + 1²)) = √((9 + 1 + 1)/11) = 1
‖u2‖ = √((1/6)((−1)² + 2² + 1²)) = √((1 + 4 + 1)/6) = 1
‖u3‖ = √((1/66)((−1)² + (−4)² + 7²)) = √((1 + 16 + 49)/66) = 1

Theorem 2.3
If S = {u1 , u2 , ..., un } is an orthonormal set, then it is an orthonormal basis of
Span{S}.

Example
{e1 , e2 , ..., en } is an orthonormal basis of Rn .
7. Orthogonality and least squares December 3, 2013 30 / 119
Orthonormal basis

Theorem 2.4
Let S = {u1 , u2 , ..., un } be an orthogonal set of vectors. Then the set
S′ = {u′1 , u′2 , ..., u′n } where

u′i = ui / ‖ui‖

is an orthonormal set (this operation is called vector normalization).

Proof
Let’s check that the u′i vectors are orthogonal:

u′i · u′j = (ui / ‖ui‖) · (uj / ‖uj‖) = (1 / (‖ui‖ ‖uj‖)) ui · uj

But this product is obviously 0 because the ui vectors are orthogonal. Let’s check
now that the u′i vectors have unit length:

‖u′i‖ = ‖ui /‖ui‖‖ = ‖ui‖/‖ui‖ = 1

7. Orthogonality and least squares December 3, 2013 31 / 119


Orthonormal matrix
Theorem 2.5
Let U ∈ Mm×n . The columns of U form an orthonormal set iff
Uᵀ U = In
It is said that U is an orthonormal matrix.
Proof
Let’s consider the columns of U

U = (u1 u2 ... un )

Let’s calculate now Uᵀ U

Uᵀ U = [ u1ᵀ ]                  [ u1ᵀu1  u1ᵀu2  ...  u1ᵀun ]
       [ u2ᵀ ] (u1 u2 ... un) = [ u2ᵀu1  u2ᵀu2  ...  u2ᵀun ]
       [ ... ]                  [  ...    ...   ...   ...  ]
       [ unᵀ ]                  [ unᵀu1  unᵀu2  ...  unᵀun ]

The condition Uᵀ U = In simply states that uiᵀuj = 0 if i ≠ j and uiᵀuj = 1 if i = j, which is the
definition of an orthonormal set.
7. Orthogonality and least squares December 3, 2013 32 / 119
Orthonormal matrix

Theorem 2.6
Let U ∈ Mm×n be an orthonormal matrix (orthonormal columns) and let x, y ∈ Rn . Then
1 ‖Ux‖ = ‖x‖
2 (Ux) · (Uy) = x · y
3 (Ux) · (Uy) = 0 ⇔ x · y = 0

Example
Let

U = [ 1/√2   2/3 ]        x = ( √2 )
    [ 1/√2  −2/3 ]            (  3 )
    [   0    1/3 ]

U is an orthonormal matrix because

Uᵀ U = [ 1/√2  1/√2   0  ] [ 1/√2   2/3 ]   [ 1 0 ]
       [ 2/3  −2/3   1/3 ] [ 1/√2  −2/3 ] = [ 0 1 ]
                           [   0    1/3 ]

7. Orthogonality and least squares December 3, 2013 33 / 119


Orthonormal matrix
Example (continued)
Let’s calculate now Ux

Ux = [ 1/√2   2/3 ] ( √2 )   (  3 )
     [ 1/√2  −2/3 ] (  3 ) = ( −1 )
     [   0    1/3 ]          (  1 )

Let’s check now that ‖Ux‖ = ‖x‖

‖Ux‖ = ‖(3, −1, 1)‖ = √(3² + (−1)² + 1²) = √11
‖x‖ = ‖(√2, 3)‖ = √((√2)² + 3²) = √11

Theorem 2.7
Let U be an orthonormal and square matrix. Then,
1 U⁻¹ = Uᵀ
2 Uᵀ is also an orthonormal matrix (i.e., the rows of U also form an
orthonormal set of vectors).
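These properties are easy to verify numerically; the sketch below uses a 2 × 2 rotation matrix as an example of a square orthonormal matrix:

theta = pi/6;
U = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % rotation matrix
U' * U                   % identity: orthonormal columns
norm(inv(U) - U')        % numerically zero: U^(-1) = U^T
x = [sqrt(2); 3];
norm(U*x) - norm(x)      % numerically zero: lengths are preserved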
7. Orthogonality and least squares December 3, 2013 34 / 119
Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 2:
6.2.1
6.2.10
6.2.15
6.2.25
6.2.26
6.2.29
6.2.35 (computer)

7. Orthogonality and least squares December 3, 2013 35 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 36 / 119


Orthogonal projections

Definition 3.1 (Orthogonal projection)


The orthogonal projection of a point y onto a vector subspace W is a point
ŷ ∈ W such that

z = y − ŷ
z⊥W

7. Orthogonality and least squares December 3, 2013 37 / 119


Orthogonal projections

Example
Let {u1 , u2 , ..., u5 } be an orthogonal basis of R5 . Consider the subspace
W = Span{u1 , u2 }. Given any vector y ∈ R5 , we can decompose it as the sum of
a vector in W and a vector perpendicular to W
y = ŷ + z
Solution
If {u1 , u2 , ..., u5 } is a basis of R5 , then any vector y ∈ R5 can be written as

y = c1 u1 + c2 u2 + ... + c5 u5
We may decompose this sum as
ŷ = c1 u1 + c2 u2
z = c3 u3 + c4 u4 + c5 u5

7. Orthogonality and least squares December 3, 2013 38 / 119


Orthogonal projections

Example (continued)
It is obvious that ŷ ∈ W . Now we need to show that z ∈ W ⊥ . For doing so, we
will show that
z · u1 = 0
z · u2 = 0
To show the first equation we note that
z · u1 = (c3 u3 + c4 u4 + c5 u5 ) · u1
= c3 (u3 · u1 ) + c4 (u4 · u1 ) + c5 (u5 · u1 )
= c3 · 0 + c4 · 0 + c5 · 0
= 0
We would proceed analogously for z · u2 = 0.

7. Orthogonality and least squares December 3, 2013 39 / 119


Orthogonal projections

Theorem 3.1 (Orthogonal Decomposition Theorem)


Let W be a vector subspace of a vector space V . Then, any vector y ∈ V can be
written uniquely as
y = ŷ + z
with ŷ ∈ W and z ∈ W ⊥ . In fact, if {u1 , u2 , ..., up } is an orthogonal basis of W ,
then

ŷ = (y · u1 / ‖u1‖²) u1 + (y · u2 / ‖u2‖²) u2 + ... + (y · up / ‖up‖²) up

7. Orthogonality and least squares December 3, 2013 40 / 119


Orthogonal projections

Proof
ŷ is obviously in W since it has been written as a linear combination of vectors in
a basis of W . z is perpendicular to W because

z · u1 = (y − ((y · u1 / ‖u1‖²) u1 + (y · u2 / ‖u2‖²) u2 + ... + (y · up / ‖up‖²) up)) · u1
       = y · u1 − (y · u1 / ‖u1‖²)(u1 · u1 ) − (y · u2 / ‖u2‖²)(u2 · u1 ) − ... − (y · up / ‖up‖²)(up · u1 )
         [{ui } is an orthogonal set]
       = y · u1 − (y · u1 / ‖u1‖²)(u1 · u1 )
       = y · u1 − (y · u1 / ‖u1‖²) ‖u1‖²
       = y · u1 − y · u1
       = 0
We could proceed analogously for all elements in the basis of W .

7. Orthogonality and least squares December 3, 2013 41 / 119


Orthogonal projections
We need to show now that the decomposition is unique. Let us assume that it is
not unique. Consequently, there exist different vectors such that
y = ŷ + z
y = ŷ′ + z′

We subtract both equations

0 = (ŷ − ŷ′) + (z − z′) ⇒ ŷ − ŷ′ = z′ − z

Let v = ŷ − ŷ′. It is obvious that v ∈ W because it is written as a linear
combination of vectors in W . On the other hand, v = z′ − z, i.e., it is a linear
combination of vectors in W ⊥ , so v ∈ W ⊥ . The only vector that belongs to W
and W ⊥ at the same time is 0, so

v = 0 ⇒ ŷ = ŷ′ and z = z′

and consequently, the orthogonal decomposition is unique. Additionally, this shows
that the decomposition depends only on W and not on the particular basis chosen
for W .
7. Orthogonality and least squares December 3, 2013 42 / 119
Orthogonal projections

Example
Let u1 = (2, 5, −1) and u2 = (−2, 1, 1). Let W be the subspace spanned by u1
and u2 . Let y = (1, 2, 3) ∈ R3 . The orthogonal projection of y onto W is

ŷ = (y · u1 / ‖u1‖²) u1 + (y · u2 / ‖u2‖²) u2
  = ((1 · 2 + 2 · 5 + 3 · (−1)) / (2² + 5² + (−1)²)) (2, 5, −1) + ((1 · (−2) + 2 · 1 + 3 · 1) / ((−2)² + 1² + 1²)) (−2, 1, 1)
  = (9/30) (2, 5, −1) + (15/30) (−2, 1, 1) = (−2/5, 2, 1/5)

z = y − ŷ = (1, 2, 3) − (−2/5, 2, 1/5) = (7/5, 0, 14/5)

7. Orthogonality and least squares December 3, 2013 43 / 119
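The projection of the previous example, computed numerically:

u1 = [2; 5; -1]; u2 = [-2; 1; 1];
y  = [1; 2; 3];
yhat = dot(y, u1)/norm(u1)^2 * u1 + dot(y, u2)/norm(u2)^2 * u2   % (-2/5, 2, 1/5)
z = y - yhat                                                     % (7/5, 0, 14/5)
[dot(z, u1) dot(z, u2)]                                          % both 0: z is orthogonal to W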


Orthogonal projections

Geometrical interpretation
ŷ can be understood as the sum of the orthogonal projection of y onto each one
of the elements of the basis of W .

Theorem 3.2
If y belongs to W , then the orthogonal projection of y onto W is itself:

ŷ = y
7. Orthogonality and least squares December 3, 2013 44 / 119
Properties of orthogonal projections
Theorem 3.3 (Best approximation theorem)
The orthogonal projection of y onto W is the point in W with minimum distance
to y, i.e.,
‖y − ŷ‖ ≤ ‖y − v‖

for all v ∈ W , v ≠ ŷ.


Proof
We know that y − ŷ is orthogonal to W . For any vector v ∈ W , v ≠ ŷ, we have
that ŷ − v is in W . Now consider the orthogonal decomposition of the vector y − v

y − v = (y − ŷ) + (ŷ − v)

7. Orthogonality and least squares December 3, 2013 45 / 119


Properties of orthogonal projections

Due to the orthogonal decomposition theorem (Theorem 3.1), this decomposition


is unique and due to the Pythagorean theorem (Theorem 1.5) we have
‖y − v‖² = ‖y − ŷ‖² + ‖ŷ − v‖²
Since v ≠ ŷ we have ‖ŷ − v‖² > 0 and consequently

‖y − v‖² > ‖y − ŷ‖²

7. Orthogonality and least squares December 3, 2013 46 / 119


Properties of orthogonal projections

Theorem 3.4
If {u1 , u2 , ..., up } is an orthonormal basis of W , then the orthogonal projection of
y onto W is

ŷ = ⟨y, u1⟩ u1 + ⟨y, u2⟩ u2 + ... + ⟨y, up⟩ up

If we construct the orthonormal matrix U = (u1 u2 ... up ), then
ŷ = U Uᵀ y
Proof
By Theorem 3.1 we know that for all orthogonal bases it is verified that

ŷ = (y · u1 / ‖u1‖²) u1 + (y · u2 / ‖u2‖²) u2 + ... + (y · up / ‖up‖²) up

Since the basis is in this case orthonormal, ‖ui‖ = 1 and consequently
ŷ = ⟨y, u1⟩ u1 + ⟨y, u2⟩ u2 + ... + ⟨y, up⟩ up

7. Orthogonality and least squares December 3, 2013 47 / 119


Properties of orthogonal projections

On the other hand we have

Uᵀ y = [ u1ᵀ ]     [ u1ᵀy ]   [ ⟨u1 , y⟩ ]
       [ u2ᵀ ] y = [ u2ᵀy ] = [ ⟨u2 , y⟩ ]
       [ ... ]     [  ... ]   [    ...   ]
       [ upᵀ ]     [ upᵀy ]   [ ⟨up , y⟩ ]

Then,

U Uᵀ y = (u1 u2 ... up ) [ ⟨u1 , y⟩ ; ⟨u2 , y⟩ ; ... ; ⟨up , y⟩ ] = ⟨y, u1⟩ u1 + ⟨y, u2⟩ u2 + ... + ⟨y, up⟩ up

(q.e.d.)

7. Orthogonality and least squares December 3, 2013 48 / 119


Properties of orthogonal projections

Corollary

Let U = (u1 u2 ... up ) be an n × p matrix with orthonormal columns and
W = Col{U} its column space. Then,
∀x ∈ Rp : Uᵀ U x = x (no effect)
∀y ∈ Rn : U Uᵀ y = ŷ (orthogonal projection of y onto W )
If U is n × n, then W = Rn and the projection has no effect:
∀y ∈ Rn : U Uᵀ y = ŷ = y

7. Orthogonality and least squares December 3, 2013 49 / 119


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 3:
6.3.1
6.3.7
6.3.15
6.3.23
6.3.24
6.3.25 (computer)

7. Orthogonality and least squares December 3, 2013 50 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 51 / 119


Gram-Schmidt orthogonalization
Gram-Schmidt orthogonalization is a procedure aimed at producing an orthogonal
basis of any subspace W .
Example
Let W = Span{x1 , x2 } with x1 = (3, 6, 0) and x2 = (1, 2, 2). Let’s look for an
orthogonal basis of W .
Solution
We may keep the first vector for the basis

v1 = x1 = (3, 6, 0)
For the second vector in the basis, we need to keep the component of x2 that is
orthogonal to x1 . For doing so we calculate the projection of x2 onto x1 ,
p = (x2 · x1 / ‖x1‖²) x1 = (15/45)(3, 6, 0) = (1, 2, 0), and we decompose x2 as

x2 = p + (x2 − p) = (1, 2, 0) + (0, 0, 2)

We then keep the orthogonal part of x2
v2 = x2 − p = (0, 0, 2)

7. Orthogonality and least squares December 3, 2013 52 / 119


Gram-Schmidt orthogonalization

Example (continued)
The set {v1 , v2 } is an orthogonal basis of W .

7. Orthogonality and least squares December 3, 2013 53 / 119


Gram-Schmidt orthogonalization

Example
Let W = Span{x1 , x2 , x3 } with x1 = (1, 1, 1, 1), x2 = (0, 1, 1, 1) and
x3 = (0, 0, 1, 1). Let’s look for an orthogonal basis of W .
Solution
We may keep the first vector for the basis. Then we construct a subspace (W1 )
with a single element in its basis
v1 = x1 = (1, 1, 1, 1)    W1 = Span{v1 }

For the second vector in the basis, we need to keep the component of x2 that is
orthogonal to W1 . With the already computed basis vectors, we construct a new
subspace (W2 ) with two elements in its basis
v2 = x2 − ProjW1 (x2 ) = (−3/4, 1/4, 1/4, 1/4)    W2 = Span{v1 , v2 }

For the third vector in the basis, we repeat the same procedure
v3 = x3 − ProjW2 (x3 ) = (0, −2/3, 1/3, 1/3)    W3 = Span{v1 , v2 , v3 }

7. Orthogonality and least squares December 3, 2013 54 / 119


Gram-Schmidt orthogonalization

Theorem 4.1 (Gram-Schmidt orthogonalization)


Given a basis {x1 , x2 , ..., xp } for a vector subspace W , define

v1 = x1 W1 = Span{v1 }
v2 = x2 − ProjW1 (x2 ) W2 = Span{v1 , v2 }
...
vp = xp − ProjWp−1 (xp ) Wp = Span{v1 , v2 , ..., vp } = W

Then {v1 , v2 , ..., vp } is an orthogonal basis of W .
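A minimal MATLAB sketch of this procedure (the function name is ours; the columns of X are assumed to be linearly independent):

function V = gram_schmidt(X)
    % Orthogonalize the columns of X following Theorem 4.1.
    [n, p] = size(X);
    V = zeros(n, p);
    for k = 1:p
        v = X(:, k);
        for j = 1:k-1
            % subtract the projection of x_k onto each previous v_j
            % (their sum is the projection onto W_{k-1}, since the v_j are orthogonal)
            v = v - (dot(X(:, k), V(:, j)) / norm(V(:, j))^2) * V(:, j);
        end
        V(:, k) = v;
    end
end

For instance, gram_schmidt([1 0 0; 1 1 0; 1 1 1; 1 1 1]) reproduces the orthogonal basis of Slide 54.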

7. Orthogonality and least squares December 3, 2013 55 / 119


Gram-Schmidt orthogonalization

Proof
Consider Wk = Span{v1 , v2 , ..., vk } and let us assume that {v1 , v2 , ..., vk } is a
basis of Wk . Now we construct
vk+1 = xk+1 − ProjWk (xk+1 ) Wk+1 = Span{v1 , v2 , ..., vk+1 }

By the orthogonal decomposition theorem (Theorem 3.1), we know that vk+1 is


orthogonal to Wk . Because xk+1 is an element of a basis, we know that
xk+1 ∉ Wk . Therefore, vk+1 is not null and xk+1 ∈ Wk+1 . Finally, the set
{v1 , v2 , ..., vk+1 } is a set of orthogonal, non-null vectors in the
(k + 1)-dimensional space Wk+1 . Consequently, by Theorem 9.4 in Chapter 5, it
must be a basis of Wk+1 . This process can be iterated till k = p.

7. Orthogonality and least squares December 3, 2013 56 / 119


Gram-Schmidt orthogonalization

Orthonormal basis
Once we have an orthogonal basis, we simply have to normalize each vector to
have an orthonormal basis.

Example
Let W = Span{x1 , x2 } with x1 = (3, 6, 0) and x2 = (1, 2, 2). Let’s look for an
orthonormal basis of W .
Solution
In Slide 52 we learned that an orthogonal basis was given by
v1 = (3, 6, 0)
v2 = (0, 0, 2)

Now, we normalize these two vectors to obtain an orthonormal basis

v′1 = v1 /‖v1‖ = (1/√45)(3, 6, 0) = (1/√5, 2/√5, 0)
v′2 = v2 /‖v2‖ = (1/2)(0, 0, 2) = (0, 0, 1)

7. Orthogonality and least squares December 3, 2013 57 / 119


QR factorization of matrices
If we apply Gram-Schmidt orthogonalization to the columns of a matrix, we obtain
the following factorization scheme. This factorization is used in practice to find
eigenvalues and eigenvectors as well as to solve linear equation systems.

Theorem 4.2 (QR factorization)


Let A ∈ Mm×n with linearly independent columns. Then, A can be factored as
A = QR
where Q ∈ Mm×n is a matrix whose columns form an orthonormal basis of
Col{A} and R ∈ Mn×n is an upper triangular invertible matrix with positive
entries on its diagonal.
Proof
Let’s orthogonalize the columns of A following the Gram-Schmidt procedure and
construct the orthonormal basis of Col{A}. Let {u1 , u2 , ..., un } be such a basis.
Let us construct the matrix

Q = u1 u2 ... un

7. Orthogonality and least squares December 3, 2013 58 / 119


QR factorization of matrices

Let ai (i = 1, 2, ..., n) denote the columns of A. By the Gram-Schmidt


orthogonalization, we know that for any k between 1 and n we have

Span{a1 , a2 , ..., ak } = Span{u1 , u2 , ..., uk }


Consequently, we can express each column of A in the orthonormal basis:
ak = r1k u1 + r2k u2 + ... + rkk uk + 0 · uk+1 + ... + 0 · un
If rkk is negative, we can multiply both rkk and uk by -1. We now collect all these
coefficients in a vector rk = (r1k , r2k , ..., rkk , 0, 0, ..., 0) to have
ak = Qrk
By gathering all these vectors in a matrix, we have the triangular matrix R

R = r1 r2 ... rn

R is invertible because the columns of A are linearly independent.

7. Orthogonality and least squares December 3, 2013 59 / 119


QR factorization of matrices
Example
Let’s calculate the QR factorization of

A = [ 1 0 0 ]
    [ 1 1 0 ]
    [ 1 1 1 ]
    [ 1 1 1 ]

From Slide 54 we know that the vectors
v1 = (1, 1, 1, 1)
v2 = (−3/4, 1/4, 1/4, 1/4)
v3 = (0, −2/3, 1/3, 1/3)

form an orthogonal basis of the column space of A. We now normalize these vectors
to obtain the orthonormal basis in Q

Q = [ 1/2  −3/√12    0    ]
    [ 1/2   1/√12  −2/√6  ]
    [ 1/2   1/√12   1/√6  ]
    [ 1/2   1/√12   1/√6  ]

7. Orthogonality and least squares December 3, 2013 60 / 119


QR factorization of matrices

Example (continued)
To find R we multiply on both sides of the factorization by Qᵀ

A = QR ⇒ Qᵀ A = Qᵀ QR = R

R = Qᵀ A = [   1/2     1/2    1/2    1/2 ] [ 1 0 0 ]   [ 2  3/2     1    ]
           [ −3/√12  1/√12  1/√12  1/√12 ] [ 1 1 0 ] = [ 0  3/√12  2/√12 ]
           [    0    −2/√6   1/√6   1/√6 ] [ 1 1 1 ]   [ 0   0     2/√6  ]
                                           [ 1 1 1 ]

7. Orthogonality and least squares December 3, 2013 61 / 119
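MATLAB's built-in qr can be used to cross-check this factorization; the economy-size call returns a Q whose columns form an orthonormal basis of Col{A} (possibly with different signs than above):

A = [1 0 0; 1 1 0; 1 1 1; 1 1 1];
[Q, R] = qr(A, 0);   % economy-size QR factorization
Q' * Q               % identity: orthonormal columns
norm(Q*R - A)        % numerically zero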


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 4:
6.4.7
6.4.13
6.4.19
6.4.22
6.4.24

7. Orthogonality and least squares December 3, 2013 62 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 63 / 119


Least squares
Let’s assume we want to solve the equation system Ax = b but, due to noise,
there is no solution. We may still look for a solution such that Ax ≈ b. In fact, the
goal will be to minimize d(Ax, b).

Definition 5.1 (Least squares solution)

Let A be an m × n matrix and b ∈ Rm . x̂ ∈ Rn is a least-squares solution of the
equation system Ax = b iff

∀x ∈ Rn : ‖b − Ax̂‖ ≤ ‖b − Ax‖

7. Orthogonality and least squares December 3, 2013 64 / 119


Least squares

Solution of the general least squares problem

Applying the Best Approximation Theorem (Theorem 3.3), we may project b onto
the column space of A
b̂ = ProjCol{A} {b}

Then, we solve the equation system

Ax = b̂

which has at least one solution.

7. Orthogonality and least squares December 3, 2013 65 / 119


Least squares

Theorem 5.1
The set of least-squares solutions of Ax = b is the same as the set of solutions of
the normal equations
AT Ax = AT b

Proof: least-squares solutions ⊂ normal equations solutions


Let us assume that x̂ is a least-squares solution. Then, b − Ax̂ is orthogonal to
Col{A}, and in particular, to each one of the columns of A (ai , i = 1, 2, ..., n):
ai · (b − Ax̂) = 0 ∀i ∈ {1, 2, ..., n} ⇒
aiᵀ(b − Ax̂) = 0 ∀i ∈ {1, 2, ..., n} ⇒
Aᵀ(b − Ax̂) = 0 ⇒
Aᵀb = AᵀAx̂
That is, every least-squares solution is also a solution of the normal equations.

7. Orthogonality and least squares December 3, 2013 66 / 119


Least squares

Proof: least-squares solutions ⊃ normal equations solutions


Let us assume that x̂ is solution of the normal equations. Then,
Aᵀb = AᵀAx̂ ⇒
Aᵀ(b − Ax̂) = 0 ⇒
aiᵀ(b − Ax̂) = 0 ∀i ∈ {1, 2, ..., n}
That is, b − Ax̂ is orthogonal to the columns of A and, consequently, to Col{A}.
Hence, the equation
b = Ax̂ + (b − Ax̂)

is the orthogonal decomposition of b as a vector in Col{A} and a vector


orthogonal to Col{A}. By the uniqueness of the orthogonal decomposition, Ax̂
must be the orthogonal projection of b onto Col{A} so that
Ax̂ = b̂

and, therefore, x̂ is a least-squares solution.

7. Orthogonality and least squares December 3, 2013 67 / 119


Least squares

Example
Find a least-squares solution to Ax = b with

A = [ 4 0 ]        b = [  2 ]
    [ 0 2 ]            [  0 ]
    [ 1 1 ]            [ 11 ]

Solution
Let’s solve the normal equations AᵀAx̂ = Aᵀb

AᵀA = [ 17 1 ]      Aᵀb = [ 19 ]
      [  1 5 ]            [ 11 ]

[ 17 1 ] x̂ = [ 19 ]  ⇒  x̂ = [ 17 1 ]⁻¹ [ 19 ] = [ 1 ]
[  1 5 ]     [ 11 ]         [  1 5 ]   [ 11 ]   [ 2 ]

Let’s check that x̂ is not a solution of the original equation system but a
least-squares solution

Ax̂ = [ 4 0 ] [ 1 ]   [ 4 ]               [  2 ]
     [ 0 2 ] [ 2 ] = [ 4 ] = b̂  ≠  b  = [  0 ]
     [ 1 1 ]         [ 3 ]               [ 11 ]

7. Orthogonality and least squares December 3, 2013 68 / 119
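The same least-squares solution can be obtained numerically; the backslash operator solves the least-squares problem directly:

A = [4 0; 0 2; 1 1];
b = [2; 0; 11];
xhat = (A'*A) \ (A'*b)   % normal equations: (1, 2)
A \ b                    % MATLAB's least-squares solver gives the same result
A * xhat                 % bhat = (4, 4, 3), the projection of b onto Col{A}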


Least squares

Definition 5.2 (Least-squares error)

The least-squares error is defined as
σ² ≜ ‖Ax̂ − b‖² = ‖b̂ − b‖²

Example (continued)
In this case:
σ = ‖(4, 4, 3) − (2, 0, 11)‖ = ‖(2, 4, −8)‖ = √84 ≈ 9.165

7. Orthogonality and least squares December 3, 2013 69 / 119


Least squares

Example
Unfortunately, the least-squares solution may not be unique, as shown in the next
example (arising in ANOVA). Find a least-squares solution to Ax = b with

A = [ 1 1 0 0 ]        b = [ −3 ]
    [ 1 1 0 0 ]            [ −1 ]
    [ 1 0 1 0 ]            [  0 ]
    [ 1 0 1 0 ]            [  2 ]
    [ 1 0 0 1 ]            [  5 ]
    [ 1 0 0 1 ]            [  1 ]

Solution

AᵀA = [ 6 2 2 2 ]        Aᵀb = [  4 ]
      [ 2 2 0 0 ]              [ −4 ]
      [ 2 0 2 0 ]              [  2 ]
      [ 2 0 0 2 ]              [  6 ]

7. Orthogonality and least squares December 3, 2013 70 / 119


Least squares

Example (continued)
The augmented matrix is

[ 6 2 2 2 |  4 ]     [ 1 0 0  1 |  3 ]
[ 2 2 0 0 | −4 ]  ∼  [ 0 1 0 −1 | −5 ]
[ 2 0 2 0 |  2 ]     [ 0 0 1 −1 | −2 ]
[ 2 0 0 2 |  6 ]     [ 0 0 0  0 |  0 ]

Any point of the form

x̂ = (3, −5, −2, 0) + x4 (−1, 1, 1, 1)    ∀x4 ∈ R

is a least-squares solution of the problem.

7. Orthogonality and least squares December 3, 2013 71 / 119


Least squares

Theorem 5.2
The matrix AᵀA is invertible iff the columns of A are linearly independent. In this
case, the equation system Ax = b has a unique least-squares solution given by
x̂ = A⁺b

where A⁺ is the Moore-Penrose pseudoinverse

A⁺ = (AᵀA)⁻¹Aᵀ

7. Orthogonality and least squares December 3, 2013 72 / 119


Least squares and QR decomposition
Sometimes AᵀA is ill-conditioned, which means that small perturbations in b
translate into large perturbations in x̂. The QR decomposition offers a numerically
more stable way of finding the least-squares solution.
Theorem 5.3
Let A ∈ Mm×n have linearly independent columns. Consider its QR
decomposition (A = QR). Then, for each b ∈ Rm there is a unique least-squares
solution of Ax = b given by
x̂ = R⁻¹Qᵀb
Proof
If we substitute x̂ = R⁻¹Qᵀb into Ax we have

Ax̂ = AR⁻¹Qᵀb = QRR⁻¹Qᵀb = QQᵀb

But the columns of Q form an orthonormal basis of Col{A} (Theorem 4.2 and the Corollary in Slide 49)
and consequently QQᵀb is the orthogonal projection of b onto Col{A}, that is, b̂.
So, x̂ = R⁻¹Qᵀb is a least-squares solution of Ax = b. Additionally, since the
columns of A are linearly independent, by Theorem 5.2, this solution is unique.
7. Orthogonality and least squares December 3, 2013 73 / 119
Least squares and QR decomposition

Recall that numerically it is easier to solve Rx̂ = Qᵀb than to compute x̂ = R⁻¹Qᵀb.

Let

A = [ 1 3 5 ]        b = [  3 ]
    [ 1 1 0 ]            [  5 ]
    [ 1 1 2 ]            [  7 ]
    [ 1 3 3 ]            [ −3 ]

Its QR decomposition is

A = QR = [ 1/2  1/2   1/2 ] [ 2 4 5 ]
         [ 1/2 −1/2  −1/2 ] [ 0 2 3 ]
         [ 1/2 −1/2   1/2 ] [ 0 0 2 ]
         [ 1/2  1/2  −1/2 ]

Qᵀb = [  6 ]  ⇒  [ 2 4 5 ]      [  6 ]        [ 10 ]
      [ −6 ]     [ 0 2 3 ] x̂ =  [ −6 ]  ⇒ x̂ = [ −6 ]
      [  4 ]     [ 0 0 2 ]      [  4 ]        [  2 ]

7. Orthogonality and least squares December 3, 2013 74 / 119
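A MATLAB sketch of the QR-based solution of this example, solving the triangular system instead of inverting R:

A = [1 3 5; 1 1 0; 1 1 2; 1 3 3];
b = [3; 5; 7; -3];
[Q, R] = qr(A, 0);      % economy-size QR factorization
xhat = R \ (Q' * b)     % back-substitution on R*xhat = Q'*b: (10, -6, 2)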


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 5:
6.5.1
6.5.19
6.5.20
6.5.21
6.5.24

7. Orthogonality and least squares December 3, 2013 75 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 76 / 119


Least-squares linear regression
Example
In many scientific and engineering problems, we need to explain some
observations y as a linear function of an independent variable x. For instance, we
may try to explain the weight of a person as a linear function of his or her height

Weight = β0 + β1 Height

A. Schneider, G. Hommel, M. Blettner. Linear Regression Analysis. Dtsch Arztebl Int. 2010 November; 107(44): 776–782.

7. Orthogonality and least squares December 3, 2013 77 / 119


Least-squares linear regression

Example (continued)
For each observation we have an equation

Height (m)    Weight (kg)
1.70          57
1.53          43
1.90          94
...           ...

57 = β0 + 1.70 β1
43 = β0 + 1.53 β1
94 = β0 + 1.90 β1
...

[ 1 1.70 ] [ β0 ]   [ 57 ]
[ 1 1.53 ] [ β1 ] = [ 43 ]
[ 1 1.90 ]          [ 94 ]
[ ...    ]          [ ...]

which is of the form

Xβ = y

7. Orthogonality and least squares December 3, 2013 78 / 119


Least-squares linear regression

Least-squares regression
Each one of the observed data points (xj , yj ) gives an equation. All together
provide an equation system
Xβ = y
that is an overdetermined, linear equation system of the form Ax = b. The matrix
X is called the system matrix and it is related to the independent (predictor)
variables (the height in this case). The vector y is called the observation vector
and collects the values of the dependent (predicted) variable (the weight in this
case). The model

y = β0 + β1 x + ε

is called the linear regression of y on x. β0 and β1 are called the regression
coefficients. The difference between the predicted value and the observed value
for a particular observation (ε) is called the residual of that observation.

7. Orthogonality and least squares December 3, 2013 79 / 119


Least-squares linear regression

The residual of the j-th observation is defined as

εj = yj − (β0 + β1 xj )

7. Orthogonality and least squares December 3, 2013 80 / 119


Least-squares linear regression

The goal of least-squares regression is to minimize

∑_{j=1}^{n} εj² = ‖y − Xβ‖²

Let’s analyze this term

Xβ = [ 1 x1 ] [ β0 ]   [ β0 + β1 x1 ]   [ ŷ1 ]
     [ 1 x2 ] [ β1 ] = [ β0 + β1 x2 ] = [ ŷ2 ]
     [ ...  ]          [     ...    ]   [ ...]
     [ 1 xn ]          [ β0 + β1 xn ]   [ ŷn ]

Then

‖y − Xβ‖² = ‖(y1 − ŷ1 , y2 − ŷ2 , ..., yn − ŷn )‖² = ∑_{j=1}^{n} (yj − ŷj )² = ∑_{j=1}^{n} εj²

7. Orthogonality and least squares December 3, 2013 81 / 119


Least-squares linear regression

Example
Suppose we have observed the following values of height and weight: (1.70, 57),
(1.53, 43), (1.90, 94). We construct the system matrix and the observation vector

X = [ 1 1.70 ]        y = [ 57 ]
    [ 1 1.53 ]            [ 43 ]
    [ 1 1.90 ]            [ 94 ]

Now we solve the normal equations

Xβ = y ⇒ XᵀXβ = Xᵀy

XᵀX = [ 3.00 5.13 ]      Xᵀy = [ 194.00 ]      β̂ = (XᵀX)⁻¹Xᵀy = [ −173.39 ]
      [ 5.13 8.84 ]            [ 341.29 ]                        [  139.21 ]

Weight = −173.39 + 139.21 Height

7. Orthogonality and least squares December 3, 2013 82 / 119


Least-squares linear regression

Example
MATLAB:

X = [1 1.70; 1 1.53; 1 1.90];
y = [57; 43; 94];
beta = inv(X'*X)*X'*y

x = 1.5:0.01:2.00;
yp = beta(1) + beta(2)*x;
plot(x, yp, X(:,2), y, 'o')   % fitted line and the observed points
xlabel('Height (m)')
ylabel('Weight (kg)')

(Plot of the fitted regression line, Weight (kg) versus Height (m), together with
the three observations.)

7. Orthogonality and least squares December 3, 2013 83 / 119


Least-squares linear regression

The general linear model

The linear model is not restricted to straight lines. We can use it to fit any kind of
curve:
y = β0 f0 (x) + β1 f1 (x) + β2 f2 (x) + ...

Fitting a parabola
f0 (x) = 1, f1 (x) = x, f2 (x) = x² ⇒

y1 = β0 f0 (x1 ) + β1 f1 (x1 ) + β2 f2 (x1 ) + ε1
y2 = β0 f0 (x2 ) + β1 f1 (x2 ) + β2 f2 (x2 ) + ε2
...
yn = β0 f0 (xn ) + β1 f1 (xn ) + β2 f2 (xn ) + εn

[ y1 ]   [ 1 x1 x1² ] [ β0 ]   [ ε1 ]
[ y2 ] = [ 1 x2 x2² ] [ β1 ] + [ ε2 ]   ⇒  y = Xβ + ε
[ ...]   [ ...      ] [ β2 ]   [ ...]
[ yn ]   [ 1 xn xn² ]          [ εn ]

7. Orthogonality and least squares December 3, 2013 84 / 119
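A minimal sketch of fitting a parabola with this model; the data points are invented for illustration:

x = [0.0; 0.5; 1.0; 1.5; 2.0];     % hypothetical predictor values
y = [1.1; 1.8; 3.2; 5.9; 9.1];     % hypothetical observations
X = [ones(size(x)) x x.^2];        % system matrix: f0 = 1, f1 = x, f2 = x^2
beta = X \ y                       % least-squares estimates of beta0, beta1, beta2
residuals = y - X*beta             % one residual per observation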


Least-squares linear regression

Fitting a parabola
In this example they model the deformation of the wall of the zebrafish embryo as
a function of strain.

Z. Lua, P. C.Y. Chen, H. Luo, J. Nam, R. Ge, W. Lin. Models of maximum stress and strain of zebrafish embryos under indentation. J. Biomechanics 42

(5): 620–625 (2009)

7. Orthogonality and least squares December 3, 2013 85 / 119


Least-squares linear regression

Multivariate linear regression

The linear model is not restricted to one variable. By fitting several variables we
may fit surfaces and hypersurfaces
y = β0 f0 (x1 , x2 ) + β1 f1 (x1 , x2 ) + β2 f2 (x1 , x2 ) + ...

Fitting a parabolic surface

f0 (x1 , x2 ) = 1
f1 (x1 , x2 ) = x1
f2 (x1 , x2 ) = x2
f3 (x1 , x2 ) = x1²
f4 (x1 , x2 ) = x2²
f5 (x1 , x2 ) = x1 x2

⇒ X = [ 1 x11 x12 x11² x12² x11 x12 ]
      [ 1 x21 x22 x21² x22² x21 x22 ]
      [ ...                         ]
      [ 1 xn1 xn2 xn1² xn2² xn1 xn2 ]

7. Orthogonality and least squares December 3, 2013 86 / 119


Least-squares linear regression
Fitting a parabolic surface
In this example they model the shape of the cornea using videokeratoscopic images.

Signal Processing Group, Technische Universitat Darmstadt, http://www.fhp.tu-darmstadt.de/nt/index.php?id=531&L=1

7. Orthogonality and least squares December 3, 2013 87 / 119
Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 6:
6.6.1
6.6.5
6.6.9
6.6.12 (computer)

7. Orthogonality and least squares December 3, 2013 88 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 89 / 119


Inner product spaces

7. Orthogonality and least squares December 3, 2013 90 / 119


Inner product spaces
Definition 7.1 (Inner product)
An inner product in a vector space V is a function that assigns a real number
⟨u, v⟩ to every pair of vectors u and v, and that satisfies the following axioms for all
u, v, w ∈ V and all scalars c:
1 ⟨u, v⟩ = ⟨v, u⟩
2 ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩
3 ⟨cu, v⟩ = c ⟨u, v⟩
4 ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 iff u = 0.

Example
For instance in Weighted Least Squares (WLS) we may use an inner product in
R2 defined as:

⟨u, v⟩ = 4u1 v1 + 5u2 v2


In this way we give less weight to distances in the first component with respect to
distances in the second component.
7. Orthogonality and least squares December 3, 2013 91 / 119
Inner product spaces

Now we have to prove that this function is effectively an inner product:


1 hu, vi = hv, ui
hu, vi = 4u1 v1 + 5u2 v2 [by definition]
= 4v1 u1 + 5v2 u2 [commutativity of scalar multiplication]
= hv, ui [by definition]
2 hu + v, wi = hu, wi + hv, wi

hu + v, wi = 4(u1 + v1 )w1 + 5(u2 + v2 )w2 [by definition]


= 4u1 w1 + 4v1 w1 + 5u2 w2 + 5v2 w2 [distributivity of scalar]
[multiplication/addition]
= 4u1 w1 + 5u2 w2 + 4v1 w1 + 5v2 w2 [commutativity]
[of scalar addition]
= hu, wi + hv, wi [by definition]

7. Orthogonality and least squares December 3, 2013 92 / 119


Inner product spaces

3 hcu, vi = c hu, vi
hcu, vi = 4cu1 v1 + 5cu2 v2 [by definition]
= c4v1 u1 + c5v2 u2 [commutativity of scalar multiplication]
= c(4v1 u1 + 5v2 u2 ) [distributivity of scalar multiplication]
= c hu, vi [by definition]
4 hu, ui ≥ 0 and hu, ui = 0 iff u = 0.
1 hu, ui ≥ 0
hu, ui = 4u12 + 5u22 [by definition]
which is obviously nonnegative.
2 hu, ui = 0 iff u = 0.
hu, ui = 0 ⇔ 4u12 + 5u22 = 0 ⇔ u1 = u2 = 0

7. Orthogonality and least squares December 3, 2013 93 / 119


Inner product spaces

Example
Consider two vectors p and q in the vector space of polynomials of degree n (Pn ).
Let t0 , t1 , ..., tn be n + 1 distinct real numbers and K any positive scalar. The inner
product between p and q is defined as
⟨p, q⟩ = K (p(t0 )q(t0 ) + p(t1 )q(t1 ) + ... + p(tn )q(tn ))

Axioms 1-3 are easy to check. Let’s prove Axiom 4

4 ⟨p, p⟩ ≥ 0 and ⟨p, p⟩ = 0 iff p = 0.
1 ⟨p, p⟩ ≥ 0
⟨p, p⟩ = K (p²(t0 ) + p²(t1 ) + ... + p²(tn )) [by definition]
which is obviously nonnegative.
2 ⟨p, p⟩ = 0 iff p = 0.
⟨p, p⟩ = 0 ⇔ K (p²(t0 ) + p²(t1 ) + ... + p²(tn )) = 0 ⇔
p(t0 ) = p(t1 ) = ... = p(tn ) = 0
But a non-null polynomial of degree n can have, at most, n zeros. However,
the previous condition requires the polynomial to vanish at n + 1 points. This
is impossible unless p = 0.

7. Orthogonality and least squares December 3, 2013 94 / 119


Inner product spaces

Example
Consider two vectors p and q in the vector space of polynomials of degree n (Pn ).
Assume that we regularly space the n + 1 points ti in the interval [−1, 1], with spacing ∆T,
and set K = ∆T . Then the inner product between the two polynomials becomes

⟨p, q⟩ = (p(t0 )q(t0 ) + p(t1 )q(t1 ) + ... + p(tn )q(tn )) ∆T = ∑_{i=0}^{n} p(ti )q(ti )∆T

Making ∆T tend to 0, the inner product becomes

⟨p, q⟩ = ∫_{−1}^{1} p(t)q(t) dt

7. Orthogonality and least squares December 3, 2013 95 / 119


Inner product spaces

Legendre polynomials are orthogonal polynomials in the interval [−1, 1]

Legendre polynomials are very useful for regression with high-order polynomials, as
shown in the next slide.

7. Orthogonality and least squares December 3, 2013 96 / 119


Inner product spaces

7. Orthogonality and least squares December 3, 2013 97 / 119


Length, distance and orthogonality

Length, distance and orthogonality

The length of a vector u in an inner product space is defined in the standard way

‖u‖ = √⟨u, u⟩

Similarly, the distance between two vectors u and v is defined as

d(u, v) = ‖u − v‖

Finally, two vectors u and v are said to be orthogonal iff
⟨u, v⟩ = 0

7. Orthogonality and least squares December 3, 2013 98 / 119


Length, distance and orthogonality

Example
In the vector space of polynomials in the interval [0, 1], P[0, 1], let’s define the
inner product
⟨p, q⟩ = ∫_{0}^{1} p(t)q(t) dt
What is the length of the vector p(t) = 3t²?
Solution

‖p‖ = √⟨p, p⟩ = √(∫_{0}^{1} p²(t) dt) = √(∫_{0}^{1} (3t²)² dt) = √(∫_{0}^{1} 9t⁴ dt)
    = √(9 [t⁵/5]_{0}^{1}) = √(9 (1/5 − 0)) = 3/√5

7. Orthogonality and least squares December 3, 2013 99 / 119


Gram-Schmidt orthogonalization
Example
Gram-Schmidt is applied in the standard way. For instance, find an orthogonal
basis of P2 [−1, 1]. A basis that spans this space is

{1, t, t²}

Let’s orthogonalize it
p0 (t) = 1
p1 (t) = t − (⟨t, p0⟩/‖p0‖²) p0 (t) = t − (∫_{−1}^{1} t dt / ∫_{−1}^{1} dt) · 1 = t − 0/2 · 1 = t
p2 (t) = t² − (⟨t², p0⟩/‖p0‖²) p0 (t) − (⟨t², p1⟩/‖p1‖²) p1 (t)
       = t² − (∫_{−1}^{1} t² dt / ∫_{−1}^{1} dt) − (∫_{−1}^{1} t² t dt / ∫_{−1}^{1} t² dt) t = t² − (2/3)/2 − 0 · t = t² − 1/3

In Slide 97 we proposed the Legendre polynomial of degree 2 to be

P2 (t) = (1/2)(3t² − 1); we can easily show that P2 (t) = (3/2) p2 (t). Consequently, if
p2 (t) is orthogonal to p0 (t) and p1 (t), so is P2 (t).
7. Orthogonality and least squares December 3, 2013 100 / 119
Best approximation

Example
What is the best approximation in P2 [−1, 1] of p(t) = t³?
Solution
We know the answer is the orthogonal projection of p(t) onto P2 [−1, 1]. An
orthogonal basis of P2 [−1, 1] is {1, t, t² − 1/3}. Therefore, this projection can be
calculated as

p̂(t) = ProjP2 [−1,1] {p(t)} = (⟨p, p0⟩/‖p0‖²) p0 (t) + (⟨p, p1⟩/‖p1‖²) p1 (t) + (⟨p, p2⟩/‖p2‖²) p2 (t)

Let’s perform these calculations:

⟨p, p0⟩ = ∫_{−1}^{1} t³ dt = 0                ‖p0‖² = ∫_{−1}^{1} dt = 2
⟨p, p1⟩ = ∫_{−1}^{1} t³ t dt = 2/5            ‖p1‖² = ∫_{−1}^{1} t² dt = 2/3
⟨p, p2⟩ = ∫_{−1}^{1} t³ (t² − 1/3) dt = 0     ‖p2‖² = ∫_{−1}^{1} (t² − 1/3)² dt = 8/45

p̂(t) = (0/2) + ((2/5)/(2/3)) t + (0/(8/45)) (t² − 1/3) = (3/5) t

7. Orthogonality and least squares December 3, 2013 101 / 119
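A numerical cross-check of these inner products, using MATLAB's integral on [−1, 1]:

ip = @(f, g) integral(@(t) f(t).*g(t), -1, 1);    % inner product on [-1, 1]
p  = @(t) t.^3;
p0 = @(t) ones(size(t)); p1 = @(t) t; p2 = @(t) t.^2 - 1/3;
c0 = ip(p, p0) / ip(p0, p0);   % 0
c1 = ip(p, p1) / ip(p1, p1);   % (2/5)/(2/3) = 3/5
c2 = ip(p, p2) / ip(p2, p2);   % 0
[c0 c1 c2]                     % the best approximation is (3/5) t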


Best approximation

(Plot of p(t) = t³ and its best approximation (3/5)t on the interval [−1, 1].)

7. Orthogonality and least squares December 3, 2013 102 / 119


Best approximation

Example
In this example we exploited the best approximation property of orthogonal
wavelets to speed-up and make more robust angular alignments of projections in
3D Electron Microscopy.

C.O.S.Sorzano, S. Jonic, C. El-Bez, J.M. Carazo, S. De Carlo, P. Thévenaz, M. Unser. A multiresolution approach to orientation assignment in 3-D

electron microscopy of single particles. Journal of Structural Biology 146(3): 381-392 (2004, cover article)

7. Orthogonality and least squares December 3, 2013 103 / 119


Pythagorean theorem

Theorem 7.1 (Pythagorean theorem)


Given any vector v in an inner product space V and a subspace of it W ⊆ V we
have
‖v‖² = ‖ProjW {v}‖² + ‖v − ProjW {v}‖²

7. Orthogonality and least squares December 3, 2013 104 / 119


The Cauchy-Schwarz inequality

Theorem 7.2 (The Cauchy-Schwarz inequality)

For all u, v ∈ V it is verified that
|⟨u, v⟩| ≤ ‖u‖ ‖v‖
Proof
If u = 0, then

|⟨0, v⟩| = 0 and ‖0‖ ‖v‖ = 0 ‖v‖ = 0

So the inequality becomes an equality.
If u ≠ 0, then consider W = Span{u} and

‖ProjW {v}‖ = ‖(⟨v, u⟩/‖u‖²) u‖ = (|⟨v, u⟩|/‖u‖²) ‖u‖ = |⟨v, u⟩|/‖u‖

But by the Pythagorean Theorem (Theorem 7.1) we have ‖ProjW {v}‖ ≤ ‖v‖.
Consequently,

|⟨v, u⟩|/‖u‖ ≤ ‖v‖ ⇒ |⟨v, u⟩| ≤ ‖u‖ ‖v‖ (q.e.d.)

7. Orthogonality and least squares December 3, 2013 105 / 119


The Triangle inequality

Theorem 7.3 (The Triangle inequality)

For all u, v ∈ V it is verified that
‖u + v‖ ≤ ‖u‖ + ‖v‖
Proof

‖u + v‖² = ⟨u + v, u + v⟩ [by definition]
         = ⟨u, u⟩ + ⟨v, v⟩ + 2⟨u, v⟩ [properties of the inner product]
         ≤ ‖u‖² + ‖v‖² + 2|⟨u, v⟩| [⟨u, v⟩ ≤ |⟨u, v⟩|]
         ≤ ‖u‖² + ‖v‖² + 2‖u‖ ‖v‖ [Cauchy-Schwarz]
         = (‖u‖ + ‖v‖)²

‖u + v‖ ≤ ‖u‖ + ‖v‖ [taking the square root]

(q.e.d.)

7. Orthogonality and least squares December 3, 2013 106 / 119


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 7:
6.7.1
6.7.13
6.7.16
6.7.18

7. Orthogonality and least squares December 3, 2013 107 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 108 / 119


Weighted Least Squares

Weighted Least Squares


Let us assume we have a table of collected data and we want to fit a least squares
model. However, we want to give more importance to some observations because
we are more confident about them or they are more important. We encode the
importance as a weight value (the larger the weight, the more importance the
observation has)
X Y W
x1 y1 w1
x2 y2 w2
x3 y3 w3
... ... ...
Let us call ŷj the prediction of the model for the j-th observation and εj the
committed error
yj = ŷj + εj

7. Orthogonality and least squares December 3, 2013 109 / 119


Weighted Least Squares

The goal is now to minimize the weighted sum of squared errors

∑_{j=1}^{n} (wj εj )² = ∑_{j=1}^{n} (wj (yj − ŷj ))² = ∑_{j=1}^{n} (wj yj − wj ŷj )²

Let us collect all observed values into a vector y and do analogously with the
predictions ŷ. Let us define the diagonal matrix

W = [ w1  0   0  ...  0  ]
    [ 0   w2  0  ...  0  ]
    [ 0   0   w3 ...  0  ]
    [ ... ... ... ... ...]
    [ 0   0   0  ...  wn ]

Then, the previous objective function becomes

∑_{j=1}^{n} (wj yj − wj ŷj )² = ‖W y − W ŷ‖²

7. Orthogonality and least squares December 3, 2013 110 / 119


Weighted Least Squares

Now, suppose that ŷ is calculated from the columns of a matrix A, that is,
ŷ = Ax. The objective function becomes

∑_{j=1}^{n} (wj yj − wj ŷj )² = ‖W y − W Ax‖²

The minimum of this objective function is attained at the x̂ that is the least-squares
solution of the equation system
W Ax = W y

The normal equations of the problem are

(W A)ᵀ W Ax = (W A)ᵀ W y
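A minimal MATLAB sketch of weighted least squares; the data and weights are invented for illustration:

x = [1.70; 1.53; 1.90];
y = [57; 43; 94];
w = [1; 2; 1];              % hypothetical weights (more confidence in the 2nd point)
A = [ones(size(x)) x];      % same system matrix as in ordinary linear regression
W = diag(w);
xhat = (W*A) \ (W*y)        % least-squares solution of W*A*x = W*y
% equivalently: ((W*A)'*(W*A)) \ ((W*A)'*(W*y))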

7. Orthogonality and least squares December 3, 2013 111 / 119


Weighted Least Squares

Example
In this work they used Weighted Least Squares to calibrate a digital system to
measure maximum respiratory pressures.

J.L. Ferreira, F.H. Vasconcelos, C.J. Tierra-Criollo. A Case Study of Applying Weighted Least Squares to Calibrate a Digital Maximum Respiratory

Pressures Measuring System. Applied Biomedical Engineering, Chapter 18 (2011)

7. Orthogonality and least squares December 3, 2013 112 / 119


Fourier Series

Example

Fourier tools are perhaps the most widespread tools for analyzing signals and their
frequency components. Fourier decomposition states that any signal can be
obtained by summing sine waves of different amplitude, phase and frequency.

7. Orthogonality and least squares December 3, 2013 113 / 119


Fourier Series

Theorem 8.1
Consider the vector space of continuous functions in the interval [0, 2π], C [0, 2π].
The set
S = {1, cos(t), sin(t), cos(2t), sin(2t), ..., cos(Nt), sin(Nt)}

is orthogonal with respect to the inner product defined as

⟨f (t), g(t)⟩ = ∫_{0}^{2π} f (t)g(t) dt
Proof
For n ≠ m,
⟨cos(nt), cos(mt)⟩ = ∫_{0}^{2π} cos(nt) cos(mt) dt
                   = ∫_{0}^{2π} (1/2)(cos((n + m)t) + cos((n − m)t)) dt
                   = (1/2) [sin((n + m)t)/(n + m) + sin((n − m)t)/(n − m)]_{0}^{2π}
                   = 0

where we have used cos(A) cos(B) = (1/2)(cos(A + B) + cos(A − B)).

7. Orthogonality and least squares December 3, 2013 114 / 119


Fourier Series

Analogously we could prove that

⟨cos(nt), sin(mt)⟩ = 0
⟨cos(nt), 1⟩ = 0
⟨sin(nt), 1⟩ = 0
‖cos(nt)‖² = π
‖sin(nt)‖² = π
‖1‖² = 2π

7. Orthogonality and least squares December 3, 2013 115 / 119


Fourier Series

Theorem 8.2 (Fourier series)

Given any function f (t) ∈ C [0, 2π], f (t) can be approximated as closely as desired
by a sum of this form, obtained simply by orthogonally projecting it onto W = Span{S}

f (t) ≈ ProjW {f (t)} = ⟨f (t), 1⟩/‖1‖² + ∑_{n=1}^{N} [ (⟨f (t), cos(nt)⟩/‖cos(nt)‖²) cos(nt) + (⟨f (t), sin(nt)⟩/‖sin(nt)‖²) sin(nt) ]

7. Orthogonality and least squares December 3, 2013 116 / 119
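A minimal sketch of this projection for a concrete signal, evaluating the inner products numerically (the function f below is just an example):

f  = @(t) t .* (2*pi - t);          % example signal on [0, 2*pi]
N  = 5;
t  = linspace(0, 2*pi, 400);
fhat = integral(f, 0, 2*pi) / (2*pi) * ones(size(t));       % <f,1>/||1||^2 term
for n = 1:N
    an = integral(@(t) f(t).*cos(n*t), 0, 2*pi) / pi;       % <f,cos(nt)>/||cos(nt)||^2
    bn = integral(@(t) f(t).*sin(n*t), 0, 2*pi) / pi;       % <f,sin(nt)>/||sin(nt)||^2
    fhat = fhat + an*cos(n*t) + bn*sin(n*t);
end
plot(t, f(t), t, fhat)   % the truncated Fourier series closely follows f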


Fourier Series

Example
In this work we used Fourier space to simulate and to align electron microscopy
images

S. Jonic, C.O.S.Sorzano, P. Thévenaz, C. El-Bez, S. De Carlo, M. Unser. Spline-Based image-to-volume registration for three-dimensional electron

microscopy. Ultramicroscopy, 103: 303-317 (2005)

7. Orthogonality and least squares December 3, 2013 117 / 119


Exercises

Exercises
From Lay (3rd ed.), Chapter 6, Section 8:
6.8.1
6.8.6
6.8.8
6.8.11

7. Orthogonality and least squares December 3, 2013 118 / 119


Outline

7 Orthogonality and least squares


Inner product, length and orthogonality (a)
Orthogonal sets, bases and matrices (a)
Orthogonal projections (b)
Gram-Schmidt orthogonalization (b)
Least squares (c)
Least-squares linear regression (c)
Inner product spaces (d)
Applications of inner product spaces (d)

7. Orthogonality and least squares December 3, 2013 119 / 119
