Lecture 9
Least Squares, QR and SVD
T. Gambill
Department of Computer Science
University of Illinois at Urbana-Champaign
March 15, 2011
T. Gambill (UIUC) CS 357 March 15, 2011 1 / 22
Example 1: Finding a curve that best fits the data
Suppose we are given the data {(t1, y1), ..., (t21, y21)} and we want to
find a parabolic curve that best fits the data.
T. Gambill (UIUC) CS 357 March 15, 2011 2 / 22
We are looking for a curve of the form
$$f(t, x) = x_1 + x_2 t + x_3 t^2$$
so that
$$A x = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_{21} & t_{21}^2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{21} \end{bmatrix} = b$$
The matrix A has the form of a Vandermonde matrix.
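As an illustration (not part of the original slides), one can form this Vandermonde system and solve it with NumPy; the data t, y below are made-up synthetic values:

```python
import numpy as np

# Synthetic data (made up for illustration): y is a noisy parabola.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 21)
y = 2.0 - 3.0 * t + 5.0 * t**2 + 0.01 * rng.standard_normal(21)

# Vandermonde-style matrix with columns [1, t, t^2]
A = np.column_stack([np.ones_like(t), t, t**2])

# Least squares solution of A x ~ y
x, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x)  # close to the true coefficients [2, -3, 5]
```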
Example 2: Reducing Measurement Error
Suppose a surveyor determined the heights of three hills above some
reference point as x1 = 1237 ft., x2 = 1941 ft., and x3 = 2417 ft. To confirm
these measurements, the surveyor climbs to the top of the first hill and
measures the height of the second hill above the first to be x2 = x1 + 711, and
the third above the first to be x3 = x1 + 1177. Similarly, the surveyor climbs the
second hill and measures the height of the third above the second to be
x3 = x2 + 475. (M. Heath) These equations can be written in matrix form as
$$A x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \approx \begin{bmatrix} 1237 \\ 1941 \\ 2417 \\ 711 \\ 1177 \\ 475 \end{bmatrix} = b$$
Systems with more equations than unknowns are called overdetermined.
The system above, Ax = b, is an overdetermined linear system. What values
should the surveyor give for the heights of the hills?
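A quick numerical check (a sketch, not from the slides; NumPy's `lstsq` stands in for the methods developed below):

```python
import numpy as np

# The surveyor's six measurements of three hill heights.
A = np.array([[ 1.0, 0.0, 0.0],
              [ 0.0, 1.0, 0.0],
              [ 0.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0],
              [-1.0, 0.0, 1.0],
              [ 0.0,-1.0, 1.0]])
b = np.array([1237.0, 1941.0, 2417.0, 711.0, 1177.0, 475.0])

# Least squares reconciles the direct and relative measurements.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # [1236. 1943. 2416.]
```

Note the adjusted heights differ slightly from each direct measurement, spreading the inconsistency across all six equations.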
Overdetermined Systems
If A is an m × n matrix, then in general, an m × 1 vector b may not lie in the
column space of A. Hence Ax = b may not have an exact solution.
Definition
The residual vector is
r = b − Ax.
The least squares solution is given by minimizing the square of the residual
in the 2-norm.
Normal equations
Writing r = (b − Ax) and substituting, we want to find an x that minimizes the
following function:
$$\phi(x) = \|r\|_2^2 = r^T r = (b - Ax)^T (b - Ax) = b^T b - 2x^T A^T b + x^T A^T A x$$
From calculus we know that the minimizer occurs where ∇φ(x) = 0.
The derivative is given by
$$\nabla \phi(x) = -2A^T b + 2A^T A x = 0$$
Definition
The system of normal equations is given by
AT Ax = AT b.
The normal equations have a unique solution if rank(A) = n.
Normal equations, a Geometric View
If the vector b is not in the span (the set of all linear combinations of vectors)
of the columns of A then in order to find the minimum distance from b to the
span of the columns of A we need to find x∗ such that r = (b − Ax∗ ) is
orthogonal to Ax for any x.
$$\langle r, Ax \rangle = \langle b - Ax^*, Ax \rangle = 0$$
$$\langle r, Ax \rangle = \langle b - Ax^*, Ax \rangle = 0$$
or
$$\langle A^T (b - Ax^*), x \rangle = 0$$
so that, with x = e_i (the column vectors of the identity matrix), we have
$$(A^T (b - Ax^*))^T e_i = 0 \quad \text{for } i = 1, \ldots, n$$
and thus
$$A^T b = A^T A x^*$$
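This orthogonality condition is easy to check numerically; a sketch with made-up random data:

```python
import numpy as np

# The residual of a least squares solution is orthogonal to the
# column space of A, i.e. A^T (b - A x*) = 0.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_star
print(A.T @ r)  # essentially the zero vector
```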
Solving normal equations
Since the normal equations form a symmetric positive definite system
(assuming rank(A) = n), we can solve them by computing the Cholesky
factorization
$$A^T A = LL^T$$
and solving $Ly = A^T b$ and $L^T x = y$.
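A sketch of this two-triangular-solve procedure in NumPy (the random A and b are made up; `np.linalg.solve` stands in for dedicated triangular solvers):

```python
import numpy as np

# Normal-equations solve: A^T A = L L^T, then two triangular systems.
rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)

L = np.linalg.cholesky(A.T @ A)   # A^T A = L L^T, L lower triangular
y = np.linalg.solve(L, A.T @ b)   # forward solve: L y = A^T b
x = np.linalg.solve(L.T, y)       # back solve:    L^T x = y

# Agrees with the reference least squares solution.
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```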
Consider
$$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}$$
where $0 < \epsilon < \sqrt{\epsilon_{mach}}$. The normal equations for this system are given by
$$A^T A = \begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
in floating-point arithmetic, since $1 + \epsilon^2$ rounds to 1. The computed $A^T A$ is singular.
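This failure is easy to reproduce in floating point; a sketch with eps chosen below the square root of double-precision machine epsilon (about 1.49e-8):

```python
import numpy as np

# With eps < sqrt(eps_mach), 1 + eps**2 rounds to exactly 1 in float64,
# so the computed A^T A is exactly singular.
eps = 1e-9
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
AtA = A.T @ A
print(AtA)                         # [[1. 1.] [1. 1.]]
print(np.linalg.matrix_rank(AtA))  # 1
```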
Normal equations: conditioning
Forming the normal equations tends to worsen the conditioning of the problem.
Since we defined the condition number for square matrices only, we will have
to extend this definition to an m × n matrix A.
Definition
Let Am×n have rank(A) = n. Then we define the pseudo-inverse A+ of A as
$$A^+ = (A^T A)^{-1} A^T$$
and we define the condition number of A as
$$\text{cond}(A) = \|A\|_2 \, \|A^+\|_2$$
Theorem
cond(AT A) = (cond(A))2
Normal equations: Matlab conditioning example
>> A = rand(10,10);
>> cond(A)
ans =
   43.4237
>> cond(A'*A)
ans =
   1.8856e+03
How can we solve the least squares problem without squaring the condition of
the matrix?
Other approaches
QR factorization
  - For A ∈ R^{m×n}, factor A = QR where
    - Q is an m × m orthogonal matrix
    - R is an m × n upper triangular matrix (since R is m × n upper triangular,
      we can write $R = \begin{bmatrix} R' \\ 0 \end{bmatrix}$ where R' is n × n upper triangular
      and 0 is the (m − n) × n matrix of zeros)
SVD - singular value decomposition
  - For A ∈ R^{m×n}, factor A = USV^T where
    - U is an m × m orthogonal matrix
    - V is an n × n orthogonal matrix
    - S is an m × n diagonal matrix whose elements are the singular values.
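The factor shapes above can be inspected with NumPy (an illustrative sketch; the 5 × 3 matrix is made up):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((5, 3))

# Full ("complete") QR: Q is m x m, R is m x n with zeros below row n.
Q, R = np.linalg.qr(A, mode='complete')
print(Q.shape, R.shape)            # (5, 5) (5, 3)

# Full SVD: U is m x m, Vt is n x n, s holds the singular values.
U, s, Vt = np.linalg.svd(A)
print(U.shape, s.shape, Vt.shape)  # (5, 5) (3,) (3, 3)
```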
Orthogonal matrices
Definition
A matrix Q is orthogonal if
QT Q = QQT = I
Orthogonal matrices preserve the Euclidean norm of any vector v:
$$\|Qv\|_2^2 = (Qv)^T (Qv) = v^T Q^T Q v = v^T v = \|v\|_2^2.$$
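A quick numerical confirmation (a sketch; here Q is obtained from the QR factorization of a made-up random matrix):

```python
import numpy as np

# Any orthogonal Q satisfies Q^T Q = I and preserves Euclidean norms.
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
v = rng.standard_normal(6)

print(np.allclose(Q.T @ Q, np.eye(6)))                        # True
print(np.allclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True
```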
Using QR factorization for least squares
Now that we know orthogonal matrices preserve the Euclidean norm, we can
apply orthogonal matrices to the residual vector without changing its norm.
Note that A is m × n, Q is m × m, R' is n × n, x is n × 1, and b is m × 1.
$$\|r\|_2^2 = \|b - Ax\|_2^2 = \left\| b - Q \begin{bmatrix} R' \\ 0 \end{bmatrix} x \right\|_2^2 = \left\| Q^T b - Q^T Q \begin{bmatrix} R' \\ 0 \end{bmatrix} x \right\|_2^2 = \left\| Q^T b - \begin{bmatrix} R' \\ 0 \end{bmatrix} x \right\|_2^2$$
If $Q^T b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$ where $c_1$ is an n × 1 vector, then
$$\left\| Q^T b - \begin{bmatrix} R' \\ 0 \end{bmatrix} x \right\|_2^2 = \left\| \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} - \begin{bmatrix} R' x \\ 0 \end{bmatrix} \right\|_2^2 = \left\| \begin{bmatrix} c_1 - R' x \\ c_2 \end{bmatrix} \right\|_2^2 = \|c_1 - R' x\|_2^2 + \|c_2\|_2^2$$
Hence the least squares solution is given by solving $R' x = c_1$. We can solve
$R' x = c_1$ using back substitution, and the residual is $\|r\|_2 = \|c_2\|_2$.
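The whole procedure can be sketched in NumPy (made-up data; `np.linalg.solve` stands in for a dedicated back-substitution routine):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Q, R = np.linalg.qr(A, mode='complete')  # A = Q [R'; 0], Q is m x m
c = Q.T @ b
c1, c2 = c[:n], c[n:]                    # split Q^T b = [c1; c2]

x = np.linalg.solve(R[:n, :], c1)        # triangular solve: R' x = c1

# Matches the least squares solution; ||c2|| is the residual norm.
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))       # True
print(np.isclose(np.linalg.norm(b - A @ x), np.linalg.norm(c2)))  # True
```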
Gram-Schmidt orthogonalization
One way to obtain the QR factorization of a matrix Am×n (rank(A) = n) is by
Gram-Schmidt orthogonalization.
For each column of A, subtract out its components along the previously
orthogonalized columns.
For the simple case of 2 vectors {a1, a2}, first normalize a1 and obtain
$$q_1 = \frac{a_1}{\|a_1\|}.$$
Now subtract from a2 its component along q1. This is given by
$$a_2' = a_2 - \langle q_1, a_2 \rangle q_1 = a_2 - (q_1^T a_2) q_1.$$
Normalizing $a_2'$ we have
$$q_2 = \frac{a_2'}{\|a_2'\|}$$
Repeating this idea for n columns gives us Gram-Schmidt orthogonalization.
Since R is upper triangular and A = QR where Q = [q1 | q2 | ... | qm], we have
$$a_1 = r_{11} q_1$$
$$a_2 = r_{12} q_1 + r_{22} q_2$$
$$\vdots$$
$$a_n = r_{1n} q_1 + r_{2n} q_2 + \cdots + r_{nn} q_n$$
From this we see that $r_{ij} = q_i^T a_j$ for $j \geq i$.
function [Q,R] = gs_qr(A)
% GS_QR  QR factorization by (modified) Gram-Schmidt orthogonalization.
%   Assumes A has full column rank.

    m = size(A,1);
    n = size(A,2);

    for i = 1:n
        R(i,i) = norm(A(:,i),2);              % r_ii = ||a_i||
        Q(:,i) = A(:,i)./R(i,i);              % normalize to get q_i
        for j = i+1:n
            R(i,j) = Q(:,i)' * A(:,j);        % r_ij = q_i^T a_j
            A(:,j) = A(:,j) - R(i,j)*Q(:,i);  % remove the q_i component
        end
    end

end
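For comparison, a NumPy port of the same loop (a sketch, not part of the lecture; assumes A has full column rank):

```python
import numpy as np

def gs_qr(A):
    """QR factorization by modified Gram-Schmidt (reduced form)."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(A[:, i])  # r_ii = ||a_i||
        Q[:, i] = A[:, i] / R[i, i]        # normalize to get q_i
        for j in range(i + 1, n):
            R[i, j] = Q[:, i] @ A[:, j]    # r_ij = q_i^T a_j
            A[:, j] -= R[i, j] * Q[:, i]   # remove the q_i component
    return Q, R

A = np.random.default_rng(6).standard_normal((5, 3))
Q, R = gs_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```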
Using SVD for least squares
Recall that a singular value decomposition is given by
$$A = USV^T = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}$$
where σi are the singular values.
Assume that A has rank k (and hence k nonzero singular values σi) and recall
that we want to minimize
$$\|r\|_2^2 = \|b - Ax\|_2^2.$$
Substituting the SVD for A we find that
$$\|r\|_2^2 = \|b - Ax\|_2^2 = \|b - USV^T x\|_2^2$$
where U and V are orthogonal and S is diagonal with k nonzero singular
values. Since orthogonal matrices preserve the norm,
$$\|b - USV^T x\|_2^2 = \|U^T b - U^T USV^T x\|_2^2 = \|U^T b - SV^T x\|_2^2$$
Let $c = U^T b$ and $y = V^T x$ (and hence $x = Vy$) in $\|U^T b - SV^T x\|_2^2$. We now have
$$\|r\|_2^2 = \|c - Sy\|_2^2$$
Since S has only k nonzero diagonal elements, we have
$$\|r\|_2^2 = \sum_{i=1}^{k} (c_i - \sigma_i y_i)^2 + \sum_{i=k+1}^{m} c_i^2$$
which is minimized when $y_i = \frac{c_i}{\sigma_i}$ for $1 \leq i \leq k$.
Theorem
Let A be an m × n matrix of rank r and let $A = USV^T$ be its singular value
decomposition. The least squares solution of the system Ax = b is
$$x = \sum_{i=1}^{r} (\sigma_i^{-1} c_i) v_i$$
where $c_i = u_i^T b$.
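A sketch of the theorem in NumPy (made-up full-rank data, so r = n; the reduced SVD keeps exactly those r singular values):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((7, 3))
b = rng.standard_normal(7)

# Reduced SVD: U is 7x3, s has the 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
c = U.T @ b          # c_i = u_i^T b
x = Vt.T @ (c / s)   # x = sum_i (c_i / sigma_i) v_i

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```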