Block LU Factorization
Block LU Factorization
php
                                                                                                                                      Chapter 13
                                                                                                                                      Block LU Factorization
                                                                                                                                                                            245
                                                                                                                                      246                                                   Block LU Factorization
                                                                                                                                      prises three nested loops that can be ordered in six ways, each yielding a different
                                                                                                                                      algorithmic variant of the method. These variants involve different computational
                                                                                                                                      kernels: inner product and saxpy operations (level-1 BLAS), or outer product and
                                                                                                                                      gaxpy operations (level-2 BLAS). To introduce matrix–matrix operations (level-3
                                                                                                                                      BLAS), which are beneficial for high-performance computing, further manipula-
                                                                                                                                      tion beyond loop reordering is needed. We will use the following terminology,
                                                                                                                                      which emphasises an important distinction.
                                                                                                                                          A partitioned algorithm is a scalar (or point) algorithm in which the operations
                                                                                                                                      have been grouped and reordered into matrix operations.
                                                                                                                                          A block algorithm is a generalization of a scalar algorithm in which the basic
                                                                                                                                      scalar operations become matrix operations (α + β, αβ, and α/β become A + B,
                                                                                                                                      AB, and AB −1 ), and a matrix property based on the nonzero structure becomes
                                                                                                                                      the corresponding property blockwise (in particular, the scalars 0 and 1 become
                                                                                                                                      the zero matrix and the identity matrix, respectively). A block factorization is
                                                                                                                                      defined in a similar way and is usually what a block algorithm computes.
                                                                                                                                          A partitioned version of the outer product form of LU factorization may be
                                                                                                                                      developed as follows. For A ∈ Rn×n and a given block size r, write
                                                                                                                                                                                                   
                                                                                                                                                       A11 A12         L11     0      Ir 0      U11 U12
                                                                                                                                                                  =                                         ,       (13.1)
                                                                                                                                                       A21 A22         L21 In−r       0 S        0   In−r
                                                                                                                                      where A11 is r × r.
                                                                                                                                      In general, the blocks can be of different dimensions. Note that this factorization is
                                                                                                                                      not the same as a standard LU factorization, because U is not triangular. However,
                                                                                                                                      the standard and block LU factorizations are related as follows: if A = LU is a
Downloaded 07/01/14 to 129.174.21.5. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
                                                                                                                                      block LU factorization and each Uii has an LU factorization Uii = Lii U ii , then
                                                                                                                                      A = L diag(Lii ) · diag(U ii )U is an LU factorization. Conditions for the existence
                                                                                                                                      of a block LU factorization are easy to state.
                                                                                                                                      Theorem 13.5 (Demmel and Higham). Under the assumptions (13.4)–(13.6), the
                                                                                                                                      LU factors of A ∈ Rn×n computed using the partitioned outer product form of LU
                                                                                                                                                                              bU
                                                                                                                                      factorization with block size r satisfy L b = A + ∆A, where
                                                                                                                                                                                              
                                                                                                                                                      k∆Ak ≤ u δ(n, r)kAk + θ(n, r)kLk b kUb k + O(u2 ),       (13.7)
and where
                                                                                                                                         Proof. The proof is essentially inductive. To save clutter we will omit “+O(u2 )”
                                                                                                                                      from each bound. For n = r, the result holds trivially. Consider the first block
                                                                                                                                      stage of the factorization, with the partitioning (13.1). The assumptions imply
                                                                                                                                      that
                                                                                                                                               b 11 U
                                                                                                                                               L    b12 = A12 + ∆A12 ,                                 b 11 k kU
                                                                                                                                                                              k∆A12 k ≤ c2 (r, n − r)ukL       b12 k,       (13.8)
                                                                                                                                               b 21 U
                                                                                                                                               L    b11 = A21 + ∆A21 ,                                 b 21 k kU
                                                                                                                                                                              k∆A21 k ≤ c2 (r, n − r)ukL       b11 k.       (13.9)
                                                                                                                                                                                      b 21 U
                                                                                                                                      To obtain S = A22 − L21 U12 we first compute C = L    b12 , obtaining
                                                                                                                                               b=L
                                                                                                                                               C b 21 U
                                                                                                                                                      b12 + ∆C,                                        b 21 k kU
                                                                                                                                                                          k∆Ck ≤ c1 (n − r, r, n − r)ukL       b12 k,
                                                                                                                                                          Sb = A22 − C
                                                                                                                                                                     b + F,                          b
                                                                                                                                                                                  kF k ≤ u(kA22 k + kCk).                  (13.10)
It follows that
                                                                                                                                              Sb = A22 − L
                                                                                                                                                         b 21 U
                                                                                                                                                              b12 + ∆S,                                          (13.11a)
                                                                                                                                                                                                              
                                                                                                                                                                b    b                            b      b
                                                                                                                                           k∆Sk ≤ u kA22 k + kL21 k kU12 k + c1 (n − r, r, n − r)kL21 k kU12 k . (13.11b)
                                                                                                                                      We also assume that when a system Uii xi = di of order r is solved, the computed
                                                                                                                                      solution x
                                                                                                                                               bi satisfies
                                                                                                                                                 (Uii + ∆Uii )b
                                                                                                                                                              xi = di ,      k∆Uii k ≤ c5 (r)ukUii k + O(u2 ).          (13.15)
                                                                                                                                      The assumptions (13.14) and (13.15) are satisfied for Implementation 1 of Algo-
                                                                                                                                      rithm 13.3 and are sufficient to prove the following result.
                                                                                                                                          Proof. We omit the proof (see Demmel, Higham, and Schreiber [326, ]
                                                                                                                                      for details). It is similar to the proof of Theorem 13.5.
Downloaded 07/01/14 to 129.174.21.5. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
                                                                                                                                          The bounds in Theorem 13.6 are valid also for other versions of block LU
                                                                                                                                      factorization obtained by “block loop reordering”, such as a block gaxpy based
                                                                                                                                      algorithm.
                                                                                                                                          Theorem 13.6 shows that the stability of block LU factorization is determined
                                                                                                                                      by the ratio kLkb kU b k/kAk (numerical experiments show that the bounds are, in
                                                                                                                                      fact, reasonably sharp). If this ratio is bounded by a modest function of n, then
                                                                                                                                      b and U
                                                                                                                                      L        b are the true factors of a matrix close to A, and x       b solves a slightly
                                                                                                                                      perturbed system. However, kLk   b kU b k can exceed kAk by an arbitrary factor, even
                                                                                                                                      if A is symmetric positive definite or diagonally dominant by rows. Indeed, kLk ≥
                                                                                                                                      kL21 k = kA21 A−111 k, using the partitioning (13.2), and this lower bound for kLk can
                                                                                                                                      be arbitrarily large. In the following two subsections we investigate this instability
                                                                                                                                      more closely and show that kLk kU k can be bounded in a useful way for particular
                                                                                                                                      classes of A. Without further comment we make the reasonable assumption that
                                                                                                                                      kLk kU k ≈ kLkb kU b k, so that these bounds may be used in Theorem 13.6.
                                                                                                                                          What can be said for Implementation 2? Suppose, for simplicity, that the
                                                                                                                                      inverses A−1
                                                                                                                                                 11 (which are used in step 2 of Algorithm 13.3 and in the block back
                                                                                                                                      substitution) are computed exactly. Then the best bounds of the forms (13.14)
                                                                                                                                      and (13.15) are
                                                                                                                                      Working from these results, we find that Theorem 13.6 still holds provided the
                                                                                                                                      first-order terms in the bounds in (13.16) are multiplied by maxi κ(Ubii ). This
                                                                                                                                      suggests that Implementation 2 of Algorithm 13.3 can be much less stable than
                                                                                                                                      Implementation 1 when the diagonal blocks of U are ill conditioned, and this is
                                                                                                                                      confirmed by numerical experiments.
                                                                                                                                      This definition implicitly requires that the diagonal blocks Ajj are all nonsingu-
                                                                                                                                      lar. A is block diagonally dominant by rows if AT is block diagonally dominant
                                                                                                                                      by columns. For the block size 1, the usual property of point diagonal domi-
                                                                                                                                      nance is obtained. Note that for the 1- and ∞-norms diagonal dominance does
                                                                                                                                      not imply block diagonal dominance, nor does the reverse implication hold (see
                                                                                                                                      Problem 13.2). Throughout our analysis of block diagonal dominance we take the
                                                                                                                                      norm to be an arbitrary subordinate matrix norm.
                                                                                                                                      252                                                                        Block LU Factorization
                                                                                                                                          First, we show that for block diagonally dominant matrices a block LU factor-
                                                                                                                                      ization exists, using the key property that block diagonal dominance is inherited by
                                                                                                                                      the Schur complements obtained in the course of the factorization. In the analysis
Downloaded 07/01/14 to 129.174.21.5. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
                                                                                                                                      For j = 2: m we have
                                                                                                                                         m
                                                                                                                                         X                      m
                                                                                                                                                                X
                                                                                                                                                     (2)
                                                                                                                                                   kAij k   =          kAij − Ai1 A−1
                                                                                                                                                                                   11 A1j k
                                                                                                                                            i=2                 i=2
                                                                                                                                            i6=j                i6=j
                                                                                                                                                                m
                                                                                                                                                                X                               m
                                                                                                                                                                                                X
                                                                                                                                                            ≤          kAij k + kA1j k kA−1
                                                                                                                                                                                         11 k          kAi1 k
                                                                                                                                                                i=2                             i=2
                                                                                                                                                                i6=j                            i6=j
                                                                                                                                                                m
                                                                                                                                                                X                                             
                                                                                                                                                            ≤          kAij k + kA1j k kA−1     −1 −1
                                                                                                                                                                                         11 k kA11 k  − kAj1 k ,             using (13.17),
                                                                                                                                                                i=2
                                                                                                                                                                i6=j
                                                                                                                                                                m
                                                                                                                                                                X
                                                                                                                                                            =          kAij k + kA1j k − kA1j k kA−1
                                                                                                                                                                                                  11 k kAj1 k
                                                                                                                                                                i=2
                                                                                                                                                                i6=j
                                                                                                                                                            ≤ kA−1
                                                                                                                                                                jj k
                                                                                                                                                                    −1
                                                                                                                                                                       − kA1j k kA−1
                                                                                                                                                                                  11 k kAj1 k,              using (13.17),
                                                                                                                                                            = min kAjj xk −        kA1j k kA−1
                                                                                                                                                                                            11 k kAj1 k
                                                                                                                                                                kxk=1
                                                                                                                                                     (2)                                      Pm                (2)
                                                                                                                                      Now if Ajj is singular it follows that                    i=2,i6=j   kAij k = 0; therefore A(2) , and
                                                                                                                                                                                                                      (2)
                                                                                                                                      hence also A, is singular, which is a contradiction. Thus Ajj is nonsingular, and
                                                                                                                                      13.3 Error Analysis of Block LU Factorization                                                                 253
                                                                                                                                                                                kAij k ≤ kAjj             k    ,
                                                                                                                                                                         i=2
                                                                                                                                                                         i6=j
                                                                                                                                      showing that A(2) is block diagonally dominant by columns. The result follows by
                                                                                                                                      induction.
                                                                                                                                         The next result allows us to bound kU k for a block diagonally dominant matrix.
                                                                                                                                      Theorem 13.8 (Demmel, Higham, and Schreiber). Let A satisfy the conditions
                                                                                                                                      of Theorem 13.7. If A(k) denotes the matrix obtained after k − 1 steps of Algo-
                                                                                                                                      rithm 13.3, then
                                                                                                                                                                       (k)
                                                                                                                                                              max kAij k ≤ 2 max kAij k.
                                                                                                                                                                  k≤i,j≤m                        1≤i,j≤m
                                                                                                                                         Proof. Let A be block diagonally dominant by columns (the proof for row
                                                                                                                                      diagonal dominance is similar). Then
                                                                                                                                                       m
                                                                                                                                                       X                     m
                                                                                                                                                                             X
                                                                                                                                                                 (2)
                                                                                                                                                               kAij k =            kAij − Ai1 A−1
                                                                                                                                                                                               11 A1j k
                                                                                                                                                        i=2                  i=2
                                                                                                                                                                             m
                                                                                                                                                                             X                                     m
                                                                                                                                                                                                                   X
                                                                                                                                                                         ≤         kAij k + kA1j k kA−1
                                                                                                                                                                                                     11 k                kAi1 k
                                                                                                                                                                             i=2                                   i=2
                                                                                                                                                                             Xm
                                                                                                                                                                         ≤         kAij k,
                                                                                                                                                                             i=1
                                                                                                                                                                                                                                  Pm          (k)
                                                                                                                                      using (13.17). By induction, using Theorem 13.7, it follows that                                i=k   kAij k ≤
                                                                                                                                      Pm
                                                                                                                                        i=1 kAij k. This yields
                                                                                                                                                                                          m
                                                                                                                                                                                          X                           m
                                                                                                                                                                                                                      X
                                                                                                                                                                   (k)                            (k)
                                                                                                                                                      max kAij k ≤ max                          kAij k ≤ max                kAij k.
                                                                                                                                                    k≤i,j≤m                     k≤j≤m                         k≤j≤m
                                                                                                                                                                                          i=k                         i=1
                                                                                                                                                      P
                                                                                                                                      From (13.17),     i6=j   kAij k ≤ kA−1
                                                                                                                                                                          jj k
                                                                                                                                                                              −1
                                                                                                                                                                                 ≤ kAjj k, so
                                                                                                                                                      (k)
                                                                                                                                            max kAij k ≤ 2 max kAjj k ≤ 2 max kAjj k = 2 max kAij k.
                                                                                                                                          k≤i,j≤m                 k≤j≤m                      1≤j≤m                  1≤i,j≤m
                                                                                                                                         The implications of Theorems 13.7 and 13.8 for stability are as follows. Suppose
                                                                                                                                      A is block diagonally dominant by columns. Also, assume for the moment that
                                                                                                                                      the (subordinate) norm has the property that
                                                                                                                                                                                   X
                                                                                                                                                               max kAij k ≤ kAk ≤      kAij k,                     (13.19)
                                                                                                                                                                       i,j
                                                                                                                                                                                                    i,j
                                                                                                                                      which holds for any p-norm, for example. The subdiagonal blocks in the first
                                                                                                                                      block column of L are given by Li1 = Ai1 A−1             T           T T
                                                                                                                                                                                11 and so k[L21 , . . . , Lm1 ] k ≤ 1, by
                                                                                                                                      (13.17) and (13.19). From Theorem 13.7 it follows that k[Lj+1,j , . . . , LTmj ]T k ≤ 1
                                                                                                                                                                                                 T
                                                                                                                                      254                                                  Block LU Factorization
                                                                                                                                                                  (i)
                                                                                                                                      for j = 2: m. Since Uij = Aij for j ≥ i, Theorem 13.8 shows that kUij k ≤ 2kAk
                                                                                                                                      for each block of U (and kUii k ≤ kAk). Therefore kLk ≤ m and kU k ≤ m2 kAk,
                                                                                                                                      and so kLk kU k ≤ m3 kAk. For particular norms the bounds on the blocks of L
Downloaded 07/01/14 to 129.174.21.5. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
                                                                                                                                      and U yield a smaller bound for kLk and kU k. For example, for the 1-norm we
                                                                                                                                      have kLk1 kU k1 ≤ 2mkAk1 and for the ∞-norm kLk∞ kU k∞ ≤ 2m2 kAk∞ . We
                                                                                                                                      conclude that block LU factorization is stable if A is block diagonally dominant
                                                                                                                                      by columns with respect to any subordinate matrix norm satisfying (13.19).
                                                                                                                                          Unfortunately, block LU factorization can be unstable when A is block diago-
                                                                                                                                      nally dominant by rows, for although Theorem 13.8 guarantees that kUij k ≤ 2kAk,
                                                                                                                                      kLk can be arbitrarily large. This can be seen from the example
                                                                                                                                                                                          
                                                                                                                                                              A     0        I     0    A11 0
                                                                                                                                                       A = 111         = 1 −1                    = LU,
                                                                                                                                                              2I    I      2 A11  I      0   I
                                                                                                                                      where A is block diagonally dominant by rows in any subordinate norm for any
                                                                                                                                      nonsingular matrix A11 . It is easy to confirm numerically that block LU factor-
                                                                                                                                      ization can be unstable on matrices of this form.
                                                                                                                                          Next, we bound kLk kU k for a general matrix and then specialize to point
                                                                                                                                      diagonal dominance. From this point on we use the norm kAk := maxi,j |aij |. We
                                                                                                                                      partition A according to
                                                                                                                                                                            
                                                                                                                                                                   A11 A12
                                                                                                                                                             A=                ,     A11 ∈ Rr×r ,            (13.20)
                                                                                                                                                                   A21 A22
                                                                                                                                      and denote by ρn the growth factor for GE without pivoting. We assume that GE
                                                                                                                                      applied to A succeeds.
                                                                                                                                         To bound kLk, we note that, under the partitioning (13.20), for the first
                                                                                                                                      block stage of Algorithm 13.3 we have kL21 k = kA21 A−111 k ≤ nρn κ(A) (see Prob-
                                                                                                                                      lem 13.4). Since the algorithm works recursively with the Schur complement S,
                                                                                                                                      and since every Schur complement satisfies κ(S) ≤ ρn κ(A) (see Problem 13.4),
                                                                                                                                      each subsequently computed subdiagonal block of L has norm at most nρ2n κ(A).
                                                                                                                                      Since U is composed of elements of A together with elements of Schur complements
                                                                                                                                      of A,
                                                                                                                                                                       kU k ≤ ρn kAk.                           (13.21)
                                                                                                                                      Overall, then, for a general matrix A ∈ Rn×n ,
kLk kU k ≤ 2kAk.
                                                                                                                                      Thus block LU factorization is perfectly stable for a matrix point diagonally dom-
                                                                                                                                      inant by columns.
                                                                                                                                      13.3 Error Analysis of Block LU Factorization                                  255
                                                                                                                                           kA21 A−1                −1
                                                                                                                                                 11 k2 ≤ kR12 k2 kR11 k2 ≤ kRk2 kR
                                                                                                                                                                                   −1
                                                                                                                                                                                      k2 = κ2 (R) = κ2 (A)1/2 .
                                                                                                                                      The following lemma is proved in a way similar to the second inequality in Prob-
                                                                                                                                      lem 13.4.
                                                                                                                                         Using the same reasoning as in the last subsection, we deduce from these
                                                                                                                                      two lemmas that each subdiagonal block of L is bounded in 2-norm by κ2 (A)1/2 .
                                                                                                                                      Therefore kLk2 ≤ 1 + mκ2 (A)1/2 , where
                                                                                                                                                                         √ there are m block stages in the algorithm.
                                                                                                                                      Also, it can be shown that kU k2 ≤ mkAk2 . Hence
                                                                                                                                                                        √
                                                                                                                                                           kLk2 kU k2 ≤ m(1 + mκ2 (A)1/2 )kAk2 .              (13.24)
                                                                                                                                      256                                                      Block LU Factorization
                                                                                                                                      Table 13.1. Stability of block and point LU factorization. ρn is the growth factor for GE
                                                                                                                                      without pivoting.
Downloaded 07/01/14 to 129.174.21.5. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
                                                                                                                                      It follows from Theorem 13.6 that when Algorithm 13.3 is applied to a symmetric
                                                                                                                                      positive definite matrix A, the backward errors for the LU factorization and the
                                                                                                                                      subsequent solution of a linear system are both bounded by
                                                                                                                                                              √
                                                                                                                                                           cn mukAk2 (2 + mκ2 (A)1/2 ) + O(u2 ).               (13.25)
                                                                                                                                      Any resulting bound for kx − x   bk2 /kxk2 will be proportional to κ2 (A)3/2 , rather
                                                                                                                                      than κ2 (A) as for a stable method. This suggests that block LU factorization
                                                                                                                                      can lose up to 50% more digits of accuracy in x than a stable method for solving
                                                                                                                                      symmetric positive definite linear systems. The positive conclusion to be drawn,
                                                                                                                                      however, is that block LU factorization is guaranteed to be stable for a symmetric
                                                                                                                                      positive definite matrix that is well conditioned.
                                                                                                                                          The stability results for block LU factorization are summarized in Table 13.1,
                                                                                                                                      which tabulates a bound for kA− L  bU b k/(cn ukAk) for block and point LU factoriza-
                                                                                                                                      tion for the matrix properties considered in this chapter. The constant cn incor-
                                                                                                                                      porates any constants in the bound that depend polynomially on the dimension,
                                                                                                                                      so a value of 1 in the table indicates unconditional stability.
                                                                                                                                          Block LU factorization appears to have first been proposed for block tridi-
                                                                                                                                      agonal matrices, which frequently arise in the discretization of partial differential
                                                                                                                                      equations. References relevant to this application include Isaacson and Keller [667,
Downloaded 07/01/14 to 129.174.21.5. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
                                                                                                                                      , p. 59], Varah [1187, ], Bank and Rose [62, ], Mattheij [827, ],
                                                                                                                                      [828, ], and Concus, Golub, and Meurant [262, ].
                                                                                                                                          For an application of block LU factorization to linear programming, see Elder-
                                                                                                                                      sveld and Saunders [388, ].
                                                                                                                                          Theorem 13.5 is from Demmel and Higham [324, ]. The results in §13.3 are
                                                                                                                                      from Demmel, Higham, and Schreiber [326, ], which extends earlier analysis
                                                                                                                                      of block LU factorization by Demmel and Higham [324, ].
                                                                                                                                          Block diagonal dominance was introduced by Feingold and Varga [406, ],
                                                                                                                                      and has been used mainly in generalizations of the Gershgorin circle theorem.
                                                                                                                                      Varah [1187, ] obtained bounds on kLk and kU k for block diagonally dominant
                                                                                                                                      block tridiagonal matrices; see Problem 13.1.
                                                                                                                                          Theorem 13.7 is obtained in the case of block diagonal dominance by rows
                                                                                                                                      with minj γj > 0 by Polman [946, ]; the proof in [946, ] makes use of the
                                                                                                                                      corresponding result for point diagonal dominance and thus differs from the proof
                                                                                                                                      we have given.
                                                                                                                                          At the cost of a much more difficult proof, Lemma 13.9 can be strengthened
                                                                                                                                      to the attainable bound kA21 A−1 11 k2 ≤ (κ2 (A)
                                                                                                                                                                                       1/2
                                                                                                                                                                                           − κ2 (A)−1/2 )/2, as shown by
                                                                                                                                      Demmel [307, , Thm. 4], but the weaker bound is sufficient for our purposes.
                                                                                                                                      13.4.1. LAPACK
                                                                                                                                      LAPACK does not implement block LU factorization, but its LU factorization
                                                                                                                                      (and related) routines for full matrices employ partitioned LU factorization in
                                                                                                                                      order to exploit the level-3 BLAS and thereby to be efficient on high-performance
                                                                                                                                      machines.
                                                                                                                                      Problems
                                                                                                                                      13.1. (Varah [1187, ]) Suppose A is block tridiagonal and has the block LU
                                                                                                                                      factorization A = LU (so that L and U are block bidiagonal and Ui,i+1 = Ai,i+1 ).
                                                                                                                                      Show that if A is block diagonally dominant by columns then
                                                                                                                                      What can be deduced about the stability of the factorization for these two classes
                                                                                                                                      of matrices?
                                                                                                                                      13.2. Show that for the 1- and ∞-norms diagonal dominance does not imply block
                                                                                                                                      diagonal dominance, and vice versa.
                                                                                                                                      13.3. If A ∈ Rn×n is symmetric, has positive diagonal elements, and is block
                                                                                                                                      diagonally dominant by rows, must it be positive definite?
                                                                                                                                      258                                                      Block LU Factorization
A21 A22
                                                                                                                                      with A11 nonsingular. Let kAk := maxij |aij |. Show that kA21 A−1
                                                                                                                                                                                                     11 k ≤ nρn κ(A),
                                                                                                                                      where ρn is the growth factor for GE without pivoting on A. Show that the Schur
                                                                                                                                      complement S = A22 − A21 A−1 11 A12 satisfies κ(S) ≤ ρn κ(A).
                                                                                                                                      and the computed solution to the multiple right-hand side system AX = B (where
                                                                                                                                      (13.5) is assumed to hold for the multiple right-hand side triangular solves) satisfies
                                                                                                                                                                                        
                                                                                                                                                             b − Bk ≤ cn u kAk + kLk
                                                                                                                                                           kAX                    b kU
                                                                                                                                                                                     b k kXk
                                                                                                                                                                                          b + O(u2 ).
(T − U W −1 V T )−1 = T −1 + T −1 U (W − V T T −1 U )−1 V T T −1 ,