Computing Eigenvalues and/or Eigenvectors;
Part 2: The Power Method and QR-Algorithm

Tom Lyche
Centre of Mathematics for Applications,
Department of Informatics,
University of Oslo

November 16, 2008
Today

  - The power method to find the dominant eigenvector
  - The shifted power method to speed up convergence
  - The inverse power method
  - The Rayleigh quotient iteration
  - The QR-algorithm
The Power Method

  - Find the eigenvector corresponding to the dominant
    (largest in absolute value) eigenvalue.
  - With a simple modification we can also find the
    corresponding eigenvalue.
Assumptions

  - Let A ∈ C^{n,n} have eigenpairs (λj, vj), j = 1, . . . , n.
  - Given z0 ∈ C^n we assume that
        (i)   |λ1| > |λ2| ≥ |λ3| ≥ · · · ≥ |λn|,
       (ii)   A has linearly independent eigenvectors,
      (iii)   z0 = c1 v1 + · · · + cn vn with c1 ≠ 0.
  - The first assumption means that A has a dominant
    eigenvalue λ1 of algebraic multiplicity one.
  - The second assumption is not necessary; it is included to
    simplify the analysis.
  - The third assumption says that z0 has a component in the
    direction v1.
Powers

  - Given A ∈ C^{n,n} and a vector z0 ∈ C^n, assume that (i), (ii),
    (iii) hold.
  - Define a sequence {zk} of vectors in C^n by
        zk := A^k z0 = A zk−1,  k = 1, 2, . . . .
  - Writing z0 = c1 v1 + c2 v2 + · · · + cn vn, we get
        zk = c1 λ1^k v1 + c2 λ2^k v2 + · · · + cn λn^k vn,  k = 0, 1, 2, . . . .
  - Dividing by λ1^k:
        zk/λ1^k = c1 v1 + c2 (λ2/λ1)^k v2 + · · · + cn (λn/λ1)^k vn.
  - Since |λj/λ1| < 1 for j ≥ 2, it follows that zk/λ1^k → c1 v1 as k → ∞.
The Power Method

  Need to normalize the vectors zk.

  - Choose a norm on C^n, set x0 = z0/‖z0‖, and generate for
    k = 1, 2, . . . unit vectors {xk} as follows:

        (i)   yk = A xk−1,
        (ii)  xk = yk/‖yk‖.                                  (1)
Example

  - A = [1 2; 3 4],  z0 = [1.0, 1.0]^T,  x0 = [0.707, 0.707]^T
  - x1 = [0.39, 0.92]^T, x2 = [0.4175, 0.9087]^T,
    x3 = [0.4159, 0.9094]^T, . . .
  - The sequence converges to an eigenvector of A.
Convergence

  - The way {xk} converges to an eigenvector depends on the
    sign of λ1. Suppose xk = K v1 for some K ≠ 0.
  - Then A xk = λ1 xk and xk+1 = A xk/‖A xk‖ = (λ1/|λ1|) xk.
  - Thus xk+1 = −xk if λ1 < 0.
Eigenvalue

  - Suppose we know an approximate eigenvector of
    A ∈ C^{n,n}. How should we estimate the corresponding
    eigenvalue?
  - If (λ, u) is an exact eigenpair then Au − λu = 0.
  - If u is an approximate eigenvector we can minimize the
    function ρ : C → R given by

        ρ(µ) := ‖Au − µu‖2.

  Theorem
  ρ is minimized when µ = ν := u^H A u/(u^H u), the Rayleigh
  quotient of u.
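  Proof sketch, completing the square in the complex variable µ (all
  norms below are 2-norms; this calculation is not spelled out on the
  slide):

      ρ(µ)² = ‖Au‖² − 2 Re(µ̄ u^H A u) + |µ|² ‖u‖²
            = ‖u‖² |µ − u^H A u/(u^H u)|² + ‖Au‖² − |u^H A u|²/‖u‖²,

  so the minimum over µ ∈ C is attained at µ = ν.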
Power with Rayleigh

  function [l,x,it]=powerit(A,z,K,tol)
  % POWERIT  Power method with Rayleigh quotient eigenvalue estimate.
  % Returns the eigenvalue estimate l, the unit eigenvector estimate x,
  % and the iteration count it (it = K+1 signals no convergence).
  af=norm(A,'fro');             % scale for the relative residual test
  x=z/norm(z);                  % normalized starting vector
  for k=1:K
      y=A*x;
      l=x'*y;                   % Rayleigh quotient x'*A*x (x is a unit vector)
      if norm(y-l*x)/af < tol   % relative residual small enough?
        it=k; x=y/norm(y); return
      end
      x=y/norm(y);
  end
  it=K+1;                       % no convergence in K iterations
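  A hypothetical call on the 2 × 2 example from before; the iteration
  cap and tolerance are illustrative choices.

  A = [1 2; 3 4];
  [l, x, it] = powerit(A, [1; 1], 100, 1e-10);
  % l approximates the dominant eigenvalue (5 + sqrt(33))/2 = 5.3723...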
The shifted power method

  - A variant of the power method is the shifted power
    method.
  - In this method we choose a number s and apply the
    power method to the matrix A − sI.
  - The number s is called a shift since it shifts an eigenvalue
    λ of A to λ − s of A − sI.
  - Sometimes the convergence can be faster if the shift is
    chosen intelligently.
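  In MATLAB the shifted method can simply reuse powerit on A − sI; the
  shift must then be added back to the returned eigenvalue estimate.
  A sketch, with an arbitrary illustrative shift:

  n = size(A,1); s = -1;        % s = -1 is just an example shift
  [ls, x, it] = powerit(A - s*eye(n), randn(n,1), 1000, 1e-6);
  l = ls + s;                   % A - s*I has eigenvalues lambda - s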
Example

  - A1 := [1 2; 3 4],  A2 := [1.7 −0.4; 0.15 2.2],  and A3 := [1 2; −3 4].
  - Start with a random vector and tol = 10^−6.
  - We get convergence in 7 iterations for A1, 174 iterations for
    A2, and no convergence for A3.
  - A3 has two complex eigenvalues, so assumption (i) is not
    satisfied.
  - The rate of convergence depends on r = |λ2/λ1|; the smaller
    r, the faster the convergence.
  - We have r ≈ 0.07 for A1 and r = 0.95 for A2.
The inverse power method

  - We apply the power method to the inverse matrix
    (A − sI)^−1, where s is a shift.
  - If A has eigenvalues λ1, . . . , λn in no particular order, then
    (A − sI)^−1 has eigenvalues
        µ1(s) = (λ1 − s)^−1, µ2(s) = (λ2 − s)^−1, . . . , µn(s) = (λn − s)^−1.
  - Suppose λ1 is a simple eigenvalue of A.
  - Then lim_{s→λ1} |µ1(s)| = ∞, while
    lim_{s→λ1} µj(s) = (λj − λ1)^−1 is finite for j = 2, . . . , n.
  - Hence, by choosing s sufficiently close to λ1, the inverse
    power method converges to an eigenvector corresponding to λ1.
For the inverse power method (1) is replaced by

        (i)   (A − sI) yk = xk−1,
        (ii)  xk = yk/‖yk‖.                                  (2)

Note that we solve the linear system rather than computing
the inverse matrix. Normally the PLU-factorization of A − sI
is precomputed in order to speed up the iteration.
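  A sketch of iteration (2) with the PLU-factorization computed once;
  the function name and the stopping test are illustrative choices,
  not from the slides.

  function [l, x, it] = invpowerit(A, z, s, K, tol)
  % Inverse power method (2): factor A - s*I once, so each iteration
  % costs only a permutation and two triangular solves.
  [L, U, P] = lu(A - s*eye(size(A,1)));   % PLU-factorization, precomputed
  af = norm(A,'fro'); x = z/norm(z);
  for k = 1:K
      y = U \ (L \ (P*x));                % step (i): solve (A - s*I)*y = x
      x = y/norm(y);                      % step (ii)
      l = x'*(A*x);                       % Rayleigh quotient for A itself
      if norm(A*x - l*x)/af < tol
          it = k; return
      end
  end
  it = K+1;                               % no convergence in K iterations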
Rayleigh quotient iteration

  We can combine the inverse power method with Rayleigh quotient
  calculation:

        (i)    (A − sk−1 I) yk = xk−1,
        (ii)   xk = yk/‖yk‖,
        (iii)  sk = xk^H A xk,
        (iv)   rk = A xk − sk xk.

  - We can avoid computing A xk explicitly in (iii) and (iv): from (i)
    we get A yk = xk−1 + sk−1 yk, and hence A xk = (xk−1 + sk−1 yk)/‖yk‖.
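  A sketch of the iteration that uses this identity, so A is never
  multiplied by a vector inside the loop; names, the starting shift,
  and the stopping rule are illustrative.

  function [s, x, it] = rayleighit(A, z, s, K, tol)
  % Rayleigh quotient iteration: step (i) gives A*y = x + s*y for free,
  % so A*x is obtained without a product with A.
  n = size(A,1); x = z/norm(z);
  for k = 1:K
      y  = (A - s*eye(n)) \ x;   % step (i); a new factorization each step
      Ay = x + s*y;              % A*y, recovered from step (i)
      x  = y/norm(y);            % step (ii)
      Ax = Ay/norm(y);           % = A*x
      s  = x'*Ax;                % step (iii): Rayleigh quotient
      r  = Ax - s*x;             % step (iv): residual
      if norm(r) < tol, it = k; return, end
  end
  it = K+1;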
Example

  - A1 := [1 2; 3 4].
  - λ = (5 − √33)/2 ≈ −0.37
  - Start with x = [1, 1]^T.

    k        1          2          3          4          5
    ‖rk‖2    1.0e+00    7.7e-02    1.6e-04    8.2e-10    2.0e-20
    sk − λ   3.7e-01   -1.2e-02   -2.9e-05   -1.4e-10   -2.2e-16

  Table: Quadratic convergence of Rayleigh quotient iteration
Problem with singularity?

  - The linear system in (i) becomes closer and closer to
    singular as sk converges to the eigenvalue.
  - Thus the system becomes more and more ill-conditioned
    and we can expect large errors in the computed yk.
  - This is indeed true, but we are lucky.
  - Most of the error occurs in the direction of the
    eigenvector, and this error disappears when we normalize
    yk in (ii).
  - Miraculously, the normalized eigenvector will be quite
    accurate.
Discussion

  - Since the shift changes from iteration to iteration, the
    computation of yk requires O(n^3) flops per iteration for a
    full matrix.
  - For such a matrix it might pay to reduce it to upper
    Hessenberg or tridiagonal form before starting the
    iteration.
  - However, if we have a good approximation to an eigenpair,
    then only a few iterations are necessary to obtain close to
    machine accuracy.
  - If Rayleigh quotient iteration converges, the convergence
    will be quadratic and sometimes even cubic.
The QR-algorithm

  - An iterative method to compute all eigenvalues and
    eigenvectors of a matrix A ∈ C^{n,n}.
  - The matrix is reduced to triangular form by a sequence of
    unitary similarity transformations computed from the
    QR-factorization of A.
  - Recall that for a square matrix the QR-factorization and
    the QR-decomposition are the same.
  - If A = QR is a QR-factorization, then Q ∈ C^{n,n} is unitary,
    Q^H Q = I, and R ∈ C^{n,n} is upper triangular.
Basic QR

    A1 = A
    for k = 1, 2, . . .
        Qk Rk = Ak      (QR-factorization of Ak)
        Ak+1 = Rk Qk                                         (3)
    end
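  A direct MATLAB transcription of (3), run on the example matrix from
  the next slide; eight steps reproduce A9.

  A = [2 1; 1 2];
  Ak = A;                % A1 = A
  for k = 1:8
      [Q, R] = qr(Ak);   % QR-factorization of Ak
      Ak = R*Q;          % Ak+1 = Rk*Qk
  end
  % Ak is now nearly diag(3,1), as in the example below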
Example

  - A1 = A = [2 1; 1 2]
  - A1 = Q1 R1 = (1/√5)[−2 −1; −1 2] · (1/√5)[−5 −4; 0 3]
  - A2 = R1 Q1 = (1/5)[−5 −4; 0 3] · [−2 −1; −1 2] = [2.8 −0.6; −0.6 1.2]
  - A3 = [2.997 −0.074; −0.074 1.0027],
    A9 = [3.0000 −0.0001; −0.0001 1.0000]
  - A9 is almost diagonal and contains the eigenvalues λ1 = 3
    and λ2 = 1 on the diagonal.
Example 2

    A1 = A = [0.9501  0.8913  0.8214  0.9218
              0.2311  0.7621  0.4447  0.7382
              0.6068  0.4565  0.6154  0.1763
              0.4860  0.0185  0.7919  0.4057]

  we obtain

    A14 = [ 2.323      0.047223   −0.39232   −0.65056
           −2.1e−10    0.13029     0.36125    0.15946
           −4.1e−10   −0.58622     0.052576  −0.25774
            1.2e−14    3.3e−05    −1.1e−05    0.22746 ]

  A14 is close to quasi-triangular.
Example 2 (continued)
  - The 1 × 1 blocks give us two real eigenvalues λ1 ≈ 2.323
    and λ4 ≈ 0.2275.
  - The middle 2 × 2 block has complex eigenvalues, resulting
    in λ2 ≈ 0.0914 + 0.4586i and λ3 ≈ 0.0914 − 0.4586i.
  - From Gerschgorin's circle theorem it follows that the
    approximations to the real eigenvalues are quite accurate.
  - We would also expect the complex eigenvalues to have
    small absolute errors.
Why QR works

  - Since Qk^H Ak = Rk we obtain

        Ak+1 = Rk Qk = Qk^H Ak Qk = Q̃k^H A Q̃k,  where Q̃k = Q1 · · · Qk.   (4)

  - Thus Ak+1 is similar to Ak and hence to A.
  - The QR-algorithm combines the power method and the
    Rayleigh quotient iteration.
  - If A ∈ R^{n,n} has real eigenvalues, then under fairly general
    conditions the sequence {Ak} converges to an upper
    triangular matrix, the Schur form.
  - If A is real, but with some complex eigenvalues, then the
    convergence is to the quasi-triangular Schur form.
The shifted QR-algorithm

  As in the inverse power method, it is possible to speed up the
  convergence by introducing shifts. The explicitly shifted
  QR-algorithm works as follows:

    A1 = A
    for k = 1, 2, . . .
        Choose a shift sk
        Qk Rk = Ak − sk I    (QR-factorization of Ak − sk I)
        Ak+1 = Rk Qk + sk I                                  (5)
    end
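  A MATLAB transcription of (5) using the Rayleigh quotient shift of
  remark 7 below, assuming a square matrix A and an iteration cap K
  are given; deflation (remark 6) is omitted for brevity.

  Ak = hess(A);                     % reduce A to upper Hessenberg form first
  n  = size(Ak,1);
  for k = 1:K
      s = Ak(n,n);                  % Rayleigh quotient shift
      [Q, R] = qr(Ak - s*eye(n));   % QR-factorization of Ak - sk*I
      Ak = R*Q + s*eye(n);          % Ak+1 = Rk*Qk + sk*I
  end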
Remarks

   1. Ak+1 = Qk^H Ak Qk is unitarily similar to Ak.
   2. Before applying this algorithm we reduce A to upper
      Hessenberg form.
   3. If A is upper Hessenberg, then all matrices {Ak}k≥1 will
      be upper Hessenberg.
   4. Why? Because Ak+1 = Rk Ak Rk^−1, and a product of two
      upper triangular matrices and an upper Hessenberg
      matrix is upper Hessenberg.
   5. Givens rotations are used to compute the QR-factorization
      of Ak − sk I.
More remarks

   6. If ai+1,i = 0 we can split the eigenvalue
      problem into two smaller problems. We do this
      splitting if some ai+1,i becomes sufficiently small.
   7. sk := en^T Ak en, the (n, n) entry of Ak, is called
      the Rayleigh quotient shift.
   8. The Wilkinson shift is the eigenvalue of the lower
      right 2 × 2 corner of Ak closest to the (n, n) entry
      (see the sketch after this list).
   9. Convergence is at least quadratic for both kinds
      of shifts.
  10. In the implicitly shifted QR-algorithm we combine
      two shifts at a time. This finds both real and
      complex eigenvalues without using complex
      arithmetic.
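  A small sketch of the Wilkinson shift from remark 8; Ak and n are
  assumed to be in scope as in the shifted QR sketch above.

  B  = Ak(n-1:n, n-1:n);            % lower right 2 x 2 corner of Ak
  mu = eig(B);                      % its two eigenvalues
  [~, i] = min(abs(mu - Ak(n,n)));  % pick the one closest to the (n,n) entry
  s  = mu(i);                       % the Wilkinson shift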
More remarks

  11. To find the eigenvectors we first find the
      eigenvectors of the triangular or quasi-triangular
      matrix. We then transform these back to eigenvectors
      of the upper Hessenberg matrix, and finally to
      eigenvectors of A.
  12. Normally we find all eigenvalues in O(n^3) flops.
      This is because we need O(n^3) flops to find the
      upper Hessenberg form, each iteration requires
      O(n^2) flops if Ak is upper Hessenberg and O(n)
      flops if Ak is tridiagonal, and we normally need
      O(n) iterations.