MATH 5330: Computational Methods of Linear Algebra
Lecture Note 10: Householder Transformation
                                         Xianyi Zeng
                          Department of Mathematical Sciences, UTEP
1    The QR Decomposition
Using Givens rotations allows us to write A = QE where Q is orthogonal and E is of the row
echelon form. Note that the lower-triangular part of E is always zero, i.e. eij = 0 if i > j; thus this
is decomposition is also known as the QR decomposition, where “R” stands for right-triangular or
upper triangular. Henceforth we will write A = QR instead of A = QE. In the special case when
A is square and non-singular, we immediately see that R = Q−1 A is also non-singular; hence all its
diagonal elements are non-zero. Furthermore, we can compute the inverse of A easily by:
                                            A−1 = R−1 Qt .
The advantage of using the QR decomposition over the LU decomposition to invert a non-singular
matrix is that all its components have an a priori estimate. However, the QR decomposition requires
much more computational cost (∼ 2n3 ) than the Gaussian elimination (∼ 2n3 /3). Robustness does
not come free!
   Givens rotations belong to one of three widely used methods to compute A = QR:
    • Gram Schmidt: Since every column of A is a linear combination of the columns of Q, we have
      col(A) ⊆ col(Q); thus in the end the QR decomposition can be reduced to orthogonalization
      of the column vectors of A. We have already seen in the Arnoldi’s method that, this can be
      achieved by the Gram Schmidt process. The major problem of the Gram Schmidt process is
      similar to that of Gaussian elimination: The algorithm involves dividing by a number that
      cannot be estimated a priori, hence it can breakdown unexpected.
    • Givens rotations: This method is more robust than the Gram Schmidt, and in each rotation
      only two adjacent rows are involved so it is more bandwidth efficient and reduce cache misses.
      A major objection for using the Givens rotation is its complexity in implementation; partic-
      ularly people found out that the ordering of the rotations actually matter in practice [1], and
      determining the optimal order is a non-trivial problem.
    • Householder transformation: This method is robust like the one using Givens rotations, easier
      to implement, and requires less computation. However, it is less bandwidth efficient and more
      difficult to parallelize than the latter.
    At the end of this section, we prove that if A is square and non-singular, the QR decomposition
is unique, if we require in addition that all diagonal entries of R are positive. In fact, if A = Q1 R1 =
Q2 R2 where both Q1 and Q2 are orthogonal and both R1 and R2 are upper-triangular, we then
have:
                                            Qt2 Q1 = R2 R1−1 .
                                                   1
The left hand side is orthogonal whereas the right hand side is upper triangular. It is not difficult to
see that the only upper triangular matrix that is also orthogonal is diagonal matrices with diagonal
elements being ±1. By the assumption that the diagonal elements of both R1 and R2 are positive,
the only possibility is R2 R1−1 = I, or equivalently Q1 = Q2 and R1 = R2 .
2    The Householder Transformation
A major motivation for using Givens transform to construct the QR decomposition is that rotations
preserve the L2 -norm of vectors. These are, however, not the only operations that have this
property. For example, the reflection about any plane also preserve the L2 -norm of vectors in Rn .
The method utilizing this latter property is built on the Householder transformation.
   There are at least two ways to describe a Householder matrix. Algebraically, a Householder
matrix differs from the identity matrix by a rank one matrix as follows:
                                             Hv = I − 2vv t ,                                      (2.1)
where v is a unit vector. Furthermore, Hv is symmetric and orthogonal; and we’ll check the latter:
                        Hvt Hv = (I − 2vv t )(I − 2vv t ) = I − 4vv t + 4v(v t v)v t
                               = I − 4vv t + 4vv t = I .
    Now let u be any vector, and we compute Hv u:
                                 Hv u = (I − 2vv t )u = u − 2(u · v)v .                            (2.2)
The second term looks familiar: (u · v)v is the projection of u onto the straight line spanned by
the vector v. Indeed, we see that:
                                       u = (u · v)v + [u − (u · v)v]
is the orthogonal decomposition of u w.r.t. V = span(v):
                               v · [u − (u · v)v] = v · u − (u · v)||v||2 = 0 .
Thus (2.2) gives the reflection of u w.r.t. V ⊥ , in the sense that the midpoint of the line connecting
u and Hv u is the orthogonal projection of u onto V ⊥ . This is the geometrical description of a
Householder transform, and a 2D example is depicted in Figure 1.
                                                                 V⊥
                                          Hv u
                                      O
                                             v
                           Figure 1: Reflection of u by the line V ⊥ in R2 .
                                                      2
  In 2D, V ⊥ itself is a straight line; in Rn , V ⊥ has dimension n − 1 and it has normal vector v.
We call V ⊥ a hyperplane with unit normal vector v.
     As an immediate application of the Householder transform, we can essentially map u to any
other direction by choosing v carefully. In particular, we would like to find v such that Hv u=||u||e1 ,
i.e., all the components except the first one are zero. Let α = u · v for now, then we need:
                                        u1 − 2αv1 = ||u|| ;
                                        ui − 2αvi = 0 , i = 2,···,n .
Hence v is given by:
                                   u1 − ||u||        ui
                                 v1 =         , vi =    , i = 2,···,n .
                                      2α             2α
To proceed, note that v needs to be a unit vector, we thusly have:
                                                       n
                                                       X
                             4α2 = (u1 − ||u||)2 +            u2i = 2||u||2 − 2u1 ||u|| .
                                                       i=2
             q
Choosing 2α = 2||u||2 − 2u1 ||u|| we obtain the desired unit vector v:
                                                                         
                                                        u1 −||u||
                                                  √
                                           2||u||2 −2u1 ||u||            
                                          √      u2                      
                                           2||u||2 −2u
                                                       1 ||u||
                                                                          
                                        v=         .
                                                                          .                       (2.3)
                                          
                                                   ..                    
                                                                          
                                                  u
                                                                         
                                            √     2
                                                     n
                                                      2||u|| −2u1 ||u||
Finally, it is not difficult to check that:
                                    n
                 u2 − u1 ||u||      X         u2i           ||u||2 − u1 ||u||    2α2
        v ·u = q 1                +   q                   =q                   =     =α.
                      2                      2                     2             2α
                2||u|| − 2u1 ||u|| i=2 2||u|| − 2u1 ||u||   2||u|| − 2u1 ||u||
For convenience, we shall denote this Householder matrix by H(u), i.e., I − 2vv t where v is given
by (2.3).
3    The QR Decomposition using Householder Transformations
The idea is to use Householder transformation to construct a sequence of matrices H1 , H2 , ···, Hn
such that Hn Hn−1 ···H1 A is of the row echelon form. Again, the purpose of each Hi is to put the
i-th column of A in the desired form without touching the previous i − 1 columns.
    The purpose of H1 is to have the first column of H1 A possess at most one non-zero, which
is the first component. This is clearly achieved by defining H1 as the Householder matrix that
                                                                         (0)
corresponds to (2.3) where u is replaced by the first column of A. Let a1 be the first column of
                                                          3
     def                                                                         (0)
A(0) == A, following our previous notation we have H1 = H(a1 ):
                                                    p 2
                                                           a11 + a221 + ··· + a2m1
                                                                                                
                                               a11
                                              a21                0                            
                                                                                              
                                              a31                0
                [A]1 7→ [H1 A]1 is given by        7→                                         .
                                                                                                 
                                              ..                 ..                           
                                              .                   .                           
                                               am1                  0
                                                                                                            (1)
Let A(1) = H1 A; now we look at the second column. Again, there are two situations: if a11 =
p                                          (1)         (1)
  a211 + ··· + a2m1 = 0, we define H2 = H(a2 ), where a2 is the entire second column of A(1) :
                                                (1)   q                               
                                                 a12          (1) 2     (1) 2      (1) 2
                                                (1)   (a12 ) + (a21 ) + ··· + (am1 ) 
                                                a22                    0              
             (1)          (1)
                                                (1)                                   
         [A ]2 7→ [H2 A ]2 is given by  a32  7→ 
                                                                       0              .
                                                ..  
                                                                                         
                                                                          ..             
                                                .                       .             
                                                    (1)
                                                  am2                                  0
              (1)
If, however, a11 is not zero, we do not want to touch the first row of A(1) and we only need to put
the last (m−2) elements of the second column of A(1) into zero. To this end, we abuse the notation
and denote:                                       (1) 
                                                   a
                                                  22
                                                  a(1)
                                                        
                                            (1)      32 
                                          a2 =  . 
                                                 
                                                  .. 
                                                        
                                                             (1)
                                                            am2
and define:                                     "                      #
                                                    1
                                         H2 =                   (1)        .
                                                        H(a2 )
It is not difficult to check that H2 A(1) keeps   the first column of A(1) unchanged; furthermore, the
first two columns of A transform as:
                                                                          (1)             (1)              
                                                                   a11 q            a12
                                                           
                                            a11     a12
                                           a21     a22                 (1)       (1)             (1)     
                                                                 0   (a22 )2 + (a32 )2 + ··· + (am2 )2   
   [A]1:2 7→ [H2 H1 A]1:2 is given by  a31         a32
                                                                                                           
                                                             7→  0                                        .
                                                            
                                                                                      0
                                           ..       ..                                                     
                                                                    .                 ..
                                                             
                                           .         .       .                                           
                                                                    .                  .                    
                                           am1 am2                         0               0
Now suppose we have already computed H1 , ··· , Hk . In order to constructHk+1 , we first identify
            (k)
the vector ak+1 , which is composed of the last m−p elements of the (k +1)-th column of A(k) that
we need to transform, and then define:
                                             "              #
                                                Ip
                                     Hk+1 =           (k)     .
                                                   H(ak+1 )
                                                        4
Here Ip is the identity matrix in Rp×p .
    To summarize, the Householder version of QR decomposition is given by Algorithm 3.1.
Algorithm 3.1 Orthogonal Reduction by Householder Transformation
 1: Set p = 1 and Q = Im
 2: for i = 1,2,···,n do
 3:    Compute s = a2p+1,i + ··· + a2m,i
 4:    if s 6= 0 then      q
 5:        Compute β = s + a2pi
                           p
 6:        Compute γ = 2(β 2 − api β)
 7:        Compute v = [(api − β)/γ , ap+1,i /γ , ··· , ami /γ]t
 8:        A[p : m,i : n] ← Hv A[p : m,i : n]
 9:        Q[p : m,p : m] ← Q[p : m,p : m]Hv
10:    end if
11:    if api ! = 0 then
12:        Set p ← p + 1
13:    end if
14: end for
    Here we use a notation M [I,J ] to denote a sub-matrix of M : I is a subset of all the row indices
of M and J is a subset of all column indices of J ; and M [I,J ] is the |I| × |J | matrix composed
of elements at the intersections of rows denoted by I and columns designated by J .
4    Analysis and Other Aspects
First of all, we note that updating the matrices A[p : m,i : n] and Q[p : m,p : m] in Algorithm 3.1
does not require constructing Hv explicitly. Let us assume for simplicity Hv ∈ Rm×m and we want
to compute Hv A. Note that:
                                Hv A = (I − 2vv t )A = A − 2v(v t A) ,
can be accomplished column by column. That is, for each column ai of A, we replace ai by:
                                           ai ← ai − 2(ai · v)v ,
which requires 4m flops and essentially no additional storage. Similarly, computing QHv leads to:
                                QHv = Q(I − 2vv t ) = Q − 2(Qv)v t ,
and this can be done row by row, with 4m flops for updating each row of Q.
    Finally, we look at the complexity of Algorithm 3.1. Note that the line 3 will be executed every
outer loop, unless we smartly terminate the program when p reaches r =min(m,n). But in any case,
the computational cost associated with line 3 is bounded by (2m − 1)n ∼ 2mn flops. The execution
of the block from line 4 to line 10, however, will leads to an increase in the value of p; hence this
                                                     5
block can be run at most r times, with the complexity bounded by 2r square root operations and
the number of flops:
                         X r
                             [2 + 3 + m − p + 2 + 4(m − p + 1)(n − ip + 1)]
                              p=1
where ip is the value i such that the if block with p is executed. Clearly ip ≥ p, and we obtain the
estimate:
      r
      X                                                      r
                                                             X
            [2 + 3 + m − p + 2 + 4(m − p + 1)(n − ip + 1)] ≤   [m − p + 7 + 4(m − p + 1)(n − p + 1)]
      p=1                                                  p=1
                                                                                    2
                                                         ∼ 2mr(n − r) + 2nr(m − r) + r3 .
                                                                                    3
Comparing Householder transformation and Givens rotation, the former requires only nearly two
thirds of the computational cost of the latter; however, because each Householder transformation
work on (almost) the entire column of A simultaneously, it is less friendly to parallelization.
References
[1] M. I. Gillespie and D. O. Olesky. Ordering givens rotations for sparse QR factorization. SIAM
    J. Matrix Anal. A., 16(3):1024–1041, 1995.