Linear Algebra: Lecture Notes
Rostyslav Hryniv
        1st term
      Autumn 2019
       go to socrative.com
       press student login
       enter the room LAUCU2019
       answer 10 questions on eigenvalues and eigenvectors
Outline
1    Singular Value Decomposition
        Spectral Theorem
        Low rank approximations
        Definition
        Explanation and proof
        Applications of SVD
2    LU decomposition
        Linear systems
        Elementary matrices
        LU factorization
3    Cholesky decomposition
        Motivation
        Applications of Cholesky decomposition
        Algorithm
4    QR and around
        Applications of QR
Spectral Theorem: a symmetric n × n matrix A with eigenvalues λ1, . . . , λn
and an ONB of eigenvectors u1, . . . , un can be written as

        A = λ1 u1 u1^T + · · · + λn un un^T

Example: let

        A = [ 15  10
              10   0 ]
Why low-rank?
Cost comparison:
       a full m × n matrix requires mn numbers to store;
       a rank-one matrix σuv^T requires only m + n + 1 numbers;
       important e.g. for image compression
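For instance, a 1000 × 1000 image takes 10^6 numbers at full storage, while a
rank-one approximation takes only 1000 + 1000 + 1 = 2001.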
       If A is non-symmetric, then
              its eigenvectors need not be orthogonal, or
              there may be too few eigenvectors (one has to use generalized EVc’s)
       If A is non-square, there are no eigenvalues and eigenvectors at all!
The matrix uv^T has rows u1 v^T, . . . , um v^T; if the rows of A are
a1^T, . . . , am^T, then

        ‖A − uv^T‖_F^2 = Σ_{j=1}^m ‖a_j^T − u_j v^T‖^2 = Σ_{j=1}^m ‖a_j − u_j v‖^2

This is minimal if u_j v is the projection P∥ a_j of a_j onto ls(v)
(for a unit vector v):

        Σ_{j=1}^m ‖a_j − P∥ a_j‖^2 = Σ_{j=1}^m ‖P⊥ a_j‖^2
                                   = Σ_{j=1}^m ‖a_j‖^2 − Σ_{j=1}^m ‖P∥ a_j‖^2

Thus we need to maximize

        Σ_{j=1}^m ‖P∥ a_j‖^2 = Σ_{j=1}^m |a_j^T v|^2 = Σ_{j=1}^m |v^T a_j|^2 = ‖Av‖^2
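This maximum is attained at the top right singular vector v1 and equals σ1^2.
A minimal numpy sketch of this fact (variable names are illustrative; the
matrix is the example above):

import numpy as np

A = np.array([[15.0, 10.0],
              [10.0,  0.0]])

# Top singular triple: v1 maximizes ||A v|| over unit vectors v
U, s, Vt = np.linalg.svd(A)
rank1 = s[0] * np.outer(U[:, 0], Vt[0])   # sigma_1 u_1 v_1^T

# No random unit vector beats v1
rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.standard_normal(2)
    v /= np.linalg.norm(v)
    assert np.linalg.norm(A @ v) <= s[0] + 1e-12

print(np.linalg.norm(A - rank1, "fro"))   # equals sigma_2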
Motivation
                                            A = UΣV^T
Singular values
Definition: the singular values of an m × n matrix A are σ_j := √λ_j, where
λ_1 ≥ · · · ≥ λ_n ≥ 0 are the eigenvalues of A^T A.
Example

A = [ 1  1
      0  1
      1  0 ];    B = A^T A = [ 2  1
                               1  2 ]   has EV’s λ1 = 3 and λ2 = 1;

thus σ1 = √3, σ2 = 1
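A quick numerical check of this example (a sketch; numpy's standard routines
are the only assumption):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

B = A.T @ A                                # [[2, 1], [1, 2]]
eigvals = np.linalg.eigvalsh(B)            # ascending: [1, 3]
print(np.sqrt(eigvals[::-1]))              # [sqrt(3), 1]
print(np.linalg.svd(A, compute_uv=False))  # same values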
SVD theorem
Theorem (SVD)
Every m × n matrix A can be written as

        A = UΣV^T

with an orthogonal m × m matrix U, an orthogonal n × n matrix V, and an
m × n matrix Σ that is zero off its main diagonal.
Remark
This is an analogue of the diagonalization A = UDU^T of a symmetric
matrix A
SVD theorem
Theorem (SVD — expanded form)
Every m × n matrix A of rank r can be written as A = UΣV^T, where
       U = (u1 . . . ur | ur+1 . . . um),
       V = (v1 . . . vr | vr+1 . . . vn),
       Σ has σ_j on its main diagonal and zeros otherwise
       v_j are eigenvectors of A^T A with EV’s σ_j^2:    A^T A v_j = σ_j^2 v_j
       u_j := Av_j /‖Av_j‖ = Av_j /σ_j for j = 1, . . . , r form an ONB for the
       range of A
       u1, . . . , um is an ONB for R^m

        A = σ1 u1 v1^T + · · · + σr ur vr^T

The vectors u1, . . . , ur are the left singular vectors of A;
            v1, . . . , vr are the right singular vectors of A
Remark:    Av_j = σ_j u_j,  A^T u_j = σ_j v_j,  AA^T u_j = σ_j^2 u_j
        UΣ = (σ1 u1 . . . σr ur | 0 . . . 0)
           = (Av1 . . . Avr | 0 . . . 0) = A(v1 . . . vn) = AV

(with n − r zero columns)
Example

For
A = [ 1  1
      0  1
      1  0 ],
we find that
       σ1 = √3 and σ2 = 1
       v1 = (1/√2, 1/√2)^T and v2 = (1/√2, −1/√2)^T
       u1 = (1/√3)(√2, 1/√2, 1/√2)^T,
       u2 = (0, −1/√2, 1/√2)^T,
       u3 = (1/√3)(−1, 1, 1)^T

        σ1 u1 v1^T + σ2 u2 v2^T = [ 1    1          [ 0      0
                                    1/2  1/2    +     −1/2   1/2    = A
                                    1/2  1/2 ]        1/2   −1/2 ]

       σ1 u1 v1^T is the best rank-one approximation of A in the Frobenius
       norm ‖A − B‖_F = (Σ_{i,j} (a_ij − b_ij)^2)^{1/2}
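A numpy sketch verifying this worked example (illustrative variable names
only):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

U, s, Vt = np.linalg.svd(A)        # s = [sqrt(3), 1]

# Defining relations A v_j = sigma_j u_j and A^T u_j = sigma_j v_j
for j in range(2):
    assert np.allclose(A @ Vt[j], s[j] * U[:, j])
    assert np.allclose(A.T @ U[:, j], s[j] * Vt[j])

# sigma_1 u_1 v_1^T is the best rank-one approximation; its error is sigma_2
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(A - A1, "fro"))   # 1.0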
Interpretation of SVD

        x ↦ y := V^T x,    y ↦ z := Σy,    z ↦ Ax = Uz
Reduced SVD
       In the SVD representation, some part is uninformative:
              vr+1, . . . , vn are chosen arbitrarily in the nullspace of A
              ur+1, . . . , um are chosen arbitrarily in the nullspace of A^T
              Σ has zero rows or columns
       The reduced SVD removes that uninformative part:

        A = (u1 · · · ur) diag(σ1, . . . , σr) (v1 · · · vr)^T

       with factors of sizes m × r, r × r, and r × n
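In numpy, the economy form is obtained with full_matrices=False; for a
full-rank A it coincides with the reduced SVD above (a sketch):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Reduced ("economy") SVD: U is m x r and Vt is r x n when rank A = min(m, n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)            # (3, 2) (2,) (2, 2)
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True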
To summarize:
       The SVD for arbitrary rectangular matrices is an analogue of the
       spectral decomposition for square (especially symmetric) matrices
       the factorization
                                    A = UΣV^T
       means:
              rotation V^T: change of basis to v1, . . . , vn
              stretch Σ: multiplication by singular values along the v_j
              rotation U: change of basis to u1, . . . , um
       in particular, Ax = b is equivalent to Σc = d, where
              c := V^T x is the coordinate vector of x in the ONB v1, . . . , vn
              d := U^T b is the coordinate vector of b in the ONB u1, . . . , um;
              thus x = Σ_k c_k v_k and b = Σ_k d_k u_k with d_k = σ_k c_k for
              k = 1, . . . , r
       Geometrically this means that A maps the unit ball B_n of R^n into the
       “degenerate” ellipsoid E_m of R^m:
                                  Σ_k (d_k /σ_k)^2 ≤ 1
                               v1 ↦ e1 ↦ σ1 e1 ↦ σ1 u1
                               v2 ↦ e2 ↦ σ2 e2 ↦ σ2 u2
Polar decomposition
Theorem: every square matrix A can be written as A = QS with an orthogonal
Q and a symmetric positive semidefinite S.
Why polar?
z = re^{iθ} =⇒ z̄z = |z|^2 = r^2
A = QS =⇒ A^T A = S(Q^T Q)S = S^2
Proof.
Write A = UΣV^T = (UV^T)(V ΣV^T) =: QS
      Q := UV^T is orthogonal
      S := V ΣV^T is symmetric and positive semidefinite
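A minimal numpy sketch of this construction (the helper name polar is
illustrative; scipy.linalg.polar provides the same factorization):

import numpy as np

def polar(A):
    """Polar decomposition A = Q S built from the SVD, as in the proof above."""
    U, s, Vt = np.linalg.svd(A)
    Q = U @ Vt                    # orthogonal factor
    S = Vt.T @ np.diag(s) @ Vt    # symmetric positive semidefinite factor
    return Q, S

A = np.array([[15.0, 10.0],
              [10.0,  0.0]])
Q, S = polar(A)
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q is orthogonal
print(np.allclose(Q @ S, A))             # True: A = Q S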
Image compression
Instead of storing all m × n numerical entries, one can store the best rank-r
approximation of A; this needs only

                            r (1 + m + n)

numbers
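A compression sketch on synthetic data (illustrative only; a real image would
compress far better than random noise, but the storage count is the point):

import numpy as np

rng = np.random.default_rng(1)
img = rng.standard_normal((200, 300))   # stand-in for an image
m, n = img.shape
r = 20

U, s, Vt = np.linalg.svd(img, full_matrices=False)
approx = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # best rank-r approximation

print(m * n)                 # 60000 numbers at full storage
print(r * (1 + m + n))       # 10020 numbers for the rank-r factors
print(np.linalg.norm(img - approx, "fro") / np.linalg.norm(img, "fro"))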
Pseudo-inverse
A rectangular A cannot be inverted!
However, a pseudo-inverse A^+ can be defined s.t.

                        A^+ A ≈ I_n and AA^+ ≈ I_m

for Σ, the pseudo-inverse Σ^+ should satisfy

              Σ^+ Σ = I_r ⊕ 0_{n−r},    Σ Σ^+ = I_r ⊕ 0_{m−r}

       i.e. Σ^+ is the transpose of Σ with each σ_j replaced by 1/σ_j
if A = UΣV^T, then its pseudo-inverse is A^+ := V Σ^+ U^T: indeed,
A^+ A = V(Σ^+ Σ)V^T and AA^+ = U(Σ Σ^+)U^T are the orthogonal projectors
onto the row space and the column space of A, respectively
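A numpy sketch of this formula next to the built-in pseudo-inverse
(illustrative names; the example A has full column rank, so A^+ A = I_2
exactly):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Pseudo-inverse via the SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_plus = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(A_plus, np.linalg.pinv(A)))  # True
print(np.round(A_plus @ A, 10))                # I_2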
Best solution to Ax = b:
       Recall that Ax = b is solvable ⇐⇒ b belongs to the range (i.e., the
       column space) of A
       if n > rank A, then there are many solutions (or none)
       (the homogeneous equation Ax = 0 has nontrivial solutions)
       any solution has the form x = x0 + x1 with x0 a particular solution
       and x1 any solution of Ax = 0
       if b is not in the range, solve the normal equation A^T Ax = A^T b to
       get the least-squares solution
       if rank A = n, then A^T A is invertible
       otherwise, the least-squares solution is not unique
       look for the shortest solution x̂
       Claim: if A = UΣV^T, then x̂ = V Σ^+ U^T b
       indeed, ‖Ax − b‖ = ‖UΣV^T x − b‖ = ‖ΣV^T x − U^T b‖ = ‖Σy − U^T b‖
       with y := V^T x (orthogonal matrices preserve norms);
       the shortest solution: y = Σ^+ (U^T b) and x = V y = (V Σ^+ U^T) b
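A sketch comparing this pseudo-inverse solution with numpy's least-squares
solver (the vector b is an arbitrary illustrative choice, generally not in the
range of A):

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

x_hat = np.linalg.pinv(A) @ b       # least-squares solution via A^+
x_lsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_hat, x_lsq))    # True

# x_hat satisfies the normal equation A^T A x = A^T b
print(np.allclose(A.T @ A @ x_hat, A.T @ b))  # True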
SVD vs PCA
       Observe that the largest value of ‖Ax‖ with ‖x‖ ≤ 1 is attained at
       x = v1 and is equal to σ1;
       this is the first principal axis for A^T A:
              indeed, A^T A = V Σ^T U^T UΣV^T = V Σ^T ΣV^T = VDV^T is the
              spectral decomposition of the symmetric matrix B := A^T A
              B has eigenvalues σ_k^2 with eigenvectors v_k
              the quadratic form Q(x) := x^T Bx is equal to ‖Ax‖^2
              by the minimax properties of the eigenvalues,
              σ_2^2 = max{‖Ax‖^2 : ‖x‖ = 1, x ⊥ v1},  σ_3^2 = . . .
Linear systems

                                    Ax = b
              A an m × n matrix
              x ∈ R^n, b ∈ R^m
       Solution: x = A^{−1} b for invertible A (m = n)
       Computation of A^{−1} is costly, ∼ O(n^3), and not always necessary!
       Alternatively, use the Gaussian elimination method
       In matrix form, it amounts to an LU representation of A
       L stands for “lower”- and U for “upper”-triangular
Definition
The matrices that perform the elementary row operations (scaling a row,
interchanging two rows, adding a multiple of one row to another) are
called elementary matrices
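For instance, for 3 × 3 matrices, adding (−2) × (row 1) to row 3 is performed
by left multiplication with the elementary matrix

        E = [  1  0  0
               0  1  0
              −2  0  1 ],

and E^{−1} is the same matrix with −2 replaced by 2.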
Lemma
The product of two lower-triangular (upper-triangular) matrices is
lower-triangular (upper-triangular)
Proof.
Use the row or column form of the matrix-matrix product
LU factorization
Theorem
Assume that an m × n matrix A can be reduced to row echelon form U
using only row replacement operations (adding a multiple of one row to a
row below it). Then A = LU with a lower-triangular m × m matrix L.
Proof.
E_k · · · E_1 A = U =⇒ L = (E_k · · · E_1)^{−1} = E_1^{−1} · · · E_k^{−1}
Definition
The above representation A = LU, with an m × m lower-triangular
matrix L and an upper-triangular∗ m × n matrix U, is the LU-factorization
of A.
Remark (∗ )
       L is unique if all its diagonal entries are 1
       If row interchanges are needed, use PA = LU, with P encoding all
       row interchanges
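A sketch of an LU factorization in scipy (the example matrix reappears in the
Cholesky section below; scipy's convention absorbs the permutation as
A = P L U):

import numpy as np
from scipy.linalg import lu

A = np.array([[4.0,  0.0,  2.0],
              [0.0,  1.0, -1.0],
              [2.0, -1.0,  6.0]])

P, L, U = lu(A)                    # A = P @ L @ U, P encodes row interchanges
print(np.allclose(A, P @ L @ U))   # True
print(np.diag(L))                  # unit diagonal of L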
With A = LU, the system splits into two triangular systems:

                               Ax = b ⇐⇒ { Ux = y
                                          { Ly = b
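A sketch of this two-step solve with scipy's triangular solver (illustrative
matrix and names; solve Ly = b by forward substitution, then Ux = y by back
substitution):

import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[4.0,  0.0,  2.0],
              [0.0,  1.0, -1.0],
              [2.0, -1.0,  6.0]])
b = np.array([1.0, 2.0, 3.0])

P, L, U = lu(A)                                # A = P L U
y = solve_triangular(L, P.T @ b, lower=True)   # forward substitution
x = solve_triangular(U, y, lower=False)        # back substitution
print(np.allclose(A @ x, b))                   # True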
Uniqueness: if A = L1 U1 = L2 U2 are two such factorizations (with unit
diagonals in L1, L2 and invertible U1, U2), then
              L2^{−1} L1 is lower-triangular with 1 on the diagonal
              U2 U1^{−1} is upper-triangular
              thus L2^{−1} L1 = U2 U1^{−1} = I
further:
       if A is nonsingular, U has a nonzero diagonal
       one can “factor it out” as D to get A = LDU with U having 1’s on the
       main diagonal
       for symmetric matrices, U = L^T and A = LDL^T
       reason: A^T = U^T DL^T = LDU = A and use uniqueness
If A = LL^T, then x^T Ax = x^T LL^T x = (L^T x)^T (L^T x) = ‖L^T x‖^2 ≥ 0,
so only positive semidefinite matrices can admit such a factorization
Applications
The algorithm
Idea:
As in Gaussian elimination, make entries below diagonal zero
Recursive algorithm:
   1   start with i := 1 and A^(1) := A
   2   At step i, the matrix A^(i) has the following form:

              A^(i) = [ I_{i−1}   0       0
                        0         a_ii    b_i^T
                        0         b_i     B^(i) ],

       with the identity matrix I_{i−1} of size i − 1
   3   Set

              L_i := [ I_{i−1}   0             0
                       0         √a_ii         0
                       0         b_i /√a_ii    I_{n−i} ]
   4   Then A^(i) = L_i A^(i+1) L_i^T with

              A^(i+1) = [ I_i   0
                          0     B^(i) − (1/a_ii) b_i b_i^T ]

   5   Finally, A^(n+1) = I_n and so we get A = LL^T
       with L := L1 L2 . . . Ln
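A minimal Python sketch of this recursion (the helper name cholesky is
illustrative; assumes A is symmetric positive definite, no pivoting or error
checks):

import numpy as np

def cholesky(A):
    """Return L with A = L L^T, following the recursion above."""
    A = np.array(A, dtype=float)   # work on a copy; A plays the role of A^(i)
    n = A.shape[0]
    L = np.eye(n)
    for i in range(n):
        Li = np.eye(n)
        Li[i, i] = np.sqrt(A[i, i])                # sqrt(a_ii)
        Li[i + 1:, i] = A[i + 1:, i] / Li[i, i]    # b_i / sqrt(a_ii)
        # Pass to A^(i+1): update the trailing block, clear row/column i
        A[i + 1:, i + 1:] -= np.outer(Li[i + 1:, i], Li[i + 1:, i])
        A[i, i] = 1.0
        A[i + 1:, i] = 0.0
        A[i, i + 1:] = 0.0
        L = L @ Li                 # accumulate L = L_1 L_2 ... L_n
    return L

A = np.array([[4.0,  0.0,  2.0],
              [0.0,  1.0, -1.0],
              [2.0, -1.0,  6.0]])
print(cholesky(A))   # matches the worked example that follows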
Example

A = A^(1) = [ 4   0   2
              0   1  −1
              2  −1   6 ]
  =⇒  L1 = [ 2  0  0
             0  1  0
             1  0  1 ]

A^(1) = L1 A^(2) L1^T
  =⇒  A^(2) = [ 1   0   0
                0   1  −1
                0  −1   5 ]
  =⇒  L̃2 = [  1  0
              −1  1 ]

Ã^(2) = L̃2 Ã^(3) L̃2^T
  =⇒  Ã^(3) = [ 1  0
                0  4 ]
  =⇒  L̃3 = [ 1  0
             0  2 ]

L = L1 L2 L3 = [ 2  0  0     [ 1   0  0     [ 1  0  0     [ 2   0  0
                 0  1  0  ·    0   1  0  ·    0  1  0  =    0   1  0
                 1  0  1 ]     0  −1  1 ]     0  0  2 ]     1  −1  2 ]

A = LL^T = [ 2   0  0     [ 2  0   1
             0   1  0  ·    0  1  −1
             1  −1  2 ]     0  0   2 ]
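The same factor comes out of numpy's built-in routine (a quick check of the
example above):

import numpy as np

A = np.array([[4.0,  0.0,  2.0],
              [0.0,  1.0, -1.0],
              [2.0, -1.0,  6.0]])

L = np.linalg.cholesky(A)         # lower-triangular factor with A = L L^T
print(L)                          # [[2, 0, 0], [0, 1, 0], [1, -1, 2]]
print(np.allclose(L @ L.T, A))    # True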
Advantages of QR decomposition:
The orthogonal columns of Q make the algorithm numerically stable
(multiplication by an orthogonal matrix does not increase or decrease norms)
Applications of QR
       QR eigenvalue algorithm (see the sketch below)
              On each step, factorize A_k = Q_k R_k and set A_{k+1} := R_k Q_k
              as R_k = Q_k^{−1} A_k, one gets A_{k+1} = Q_k^{−1} A_k Q_k
              thus A_k and A_{k+1} have the same eigenvalues
              typically, A_k converges to an upper-triangular matrix R
              (the Schur form of A)
              eigenvalues of A = diagonal entries of R
       Fourier transform and Fast Fourier transform
       . . .
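A minimal sketch of the unshifted QR iteration (the helper name and the
symmetric test matrix are illustrative; practical implementations add shifts
and deflation):

import numpy as np

def qr_eigenvalues(A, steps=200):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k keeps the spectrum."""
    Ak = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return np.diag(Ak)   # approximate eigenvalues once Ak is nearly triangular

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(qr_eigenvalues(A))       # approx [3, 1]
print(np.linalg.eigvalsh(A))   # exact: [1, 3]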