Least Squares and Kalman Filtering
Questions: Email me, namrata@ece.gatech.edu
                                        Recall: Weighted Least Squares
            • y = Hx + e
            • Minimize
                                   J(x) = (y − Hx)^T W (y − Hx) ≜ ||y − Hx||^2_W        (1)
              Solution:
                                   x̂ = (H^T W H)^{-1} H^T W y                           (2)
            • Given that E[e] = 0 and E[ee^T] = V ,
              Min. variance unbiased linear estimator of x: choose W = V^{-1} in (2)
              (Min. variance for a vector: the variance in any direction is minimized)
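              A minimal NumPy sketch of (2) on synthetic data; all names and sizes here
              are illustrative, not from the slides:

                  import numpy as np

                  rng = np.random.default_rng(0)
                  n, p = 50, 3
                  H = rng.standard_normal((n, p))
                  x_true = np.array([1.0, -2.0, 0.5])
                  V = np.diag(rng.uniform(0.5, 2.0, n))      # noise covariance E[e e^T]
                  e = rng.multivariate_normal(np.zeros(n), V)
                  y = H @ x_true + e

                  # Weighted LS (2) with the BLUE choice W = V^{-1}
                  W = np.linalg.inv(V)
                  x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)
                  print(x_hat)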
                                               Recall: Proof
    • Given x̂ = Ly, find L s.t. E[Ly] = E[LHx] = E[x] for all x, so LH = I
    • Let L_0 = (H^T V^{-1} H)^{-1} H^T V^{-1}
    • Error variance E[(x − x̂)(x − x̂)^T ]:

             E[(x − x̂)(x − x̂)^T ]  =  E[(x − LHx − Le)(x − LHx − Le)^T ]
                                    =  E[Lee^T L^T ] = LV L^T

      Write L = (L − L_0) + L_0 . Here LH = I and L_0 H = I, so (L − L_0)H = 0

         LV L^T  =  L_0 V L_0^T + (L − L_0)V (L − L_0)^T + L_0 V (L − L_0)^T + (L − L_0)V L_0^T
                 =  L_0 V L_0^T + (L − L_0)V (L − L_0)^T  ≥  L_0 V L_0^T

      since L_0 V (L − L_0)^T = (H^T V^{-1} H)^{-1} H^T (L − L_0)^T = (H^T V^{-1} H)^{-1} ((L − L_0)H)^T = 0
      Thus L_0 is the optimal estimator (≥ is in the positive semidefinite sense)
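      A quick numerical check of this conclusion, on an assumed random H and V:
      ordinary least squares (L = (H^T H)^{-1} H^T) also satisfies LH = I, hence is
      unbiased, so its error covariance must dominate that of L_0:

          import numpy as np

          rng = np.random.default_rng(1)
          n, p = 20, 2
          H = rng.standard_normal((n, p))
          V = np.diag(rng.uniform(0.2, 3.0, n))         # E[e e^T]

          Vi = np.linalg.inv(V)
          L0 = np.linalg.solve(H.T @ Vi @ H, H.T @ Vi)  # BLUE
          L = np.linalg.solve(H.T @ H, H.T)             # ordinary LS, also LH = I

          # L V L^T - L0 V L0^T should be positive semidefinite
          diff = L @ V @ L.T - L0 @ V @ L0.T
          print(np.linalg.eigvalsh(diff))               # all >= 0 up to roundoff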
                                     Regularized Least Squares
            • Minimize
                        J(x) = (x − x_0)^T Π_0^{-1} (x − x_0) + (y − Hx)^T W (y − Hx)    (3)
              Define x' ≜ x − x_0 and y' ≜ y − Hx_0 . Then
                        J(x) = x'^T Π_0^{-1} x' + (y' − Hx')^T W (y' − Hx') = z^T M z
              where z ≜ [0; y'] − [I; H] x' (stacked blocks) and M ≜ blkdiag(Π_0^{-1}, W)
                                                                                        
            • Solution: use the weighted least squares formula (2) with ỹ = [0; y'],
              H̃ = [I; H], W̃ = M
              Get:
                        x̂ = x_0 + (Π_0^{-1} + H^T W H)^{-1} H^T W (y − Hx_0)
            • Advantage: improves the condition number of H^T H and incorporates prior
              knowledge about the distance from x_0
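              A small NumPy sketch of the closed form above on assumed synthetic data;
              it also checks that the augmented problem (ỹ, H̃, W̃) gives the same answer:

                  import numpy as np

                  rng = np.random.default_rng(2)
                  n, p = 30, 4
                  H = rng.standard_normal((n, p))
                  y = rng.standard_normal(n)
                  x0 = np.zeros(p)
                  Pi0 = 10.0 * np.eye(p)                      # prior covariance Pi_0
                  Pi0_inv = np.linalg.inv(Pi0)
                  W = np.eye(n)

                  # Closed form: x0 + (Pi_0^{-1} + H^T W H)^{-1} H^T W (y - H x0)
                  x_hat = x0 + np.linalg.solve(Pi0_inv + H.T @ W @ H,
                                               H.T @ W @ (y - H @ x0))

                  # Augmented weighted LS with y~ = [0; y'], H~ = [I; H], W~ = M
                  y_t = np.concatenate([np.zeros(p), y - H @ x0])
                  H_t = np.vstack([np.eye(p), H])
                  W_t = np.block([[Pi0_inv, np.zeros((p, n))],
                                  [np.zeros((n, p)), W]])
                  x_aug = x0 + np.linalg.solve(H_t.T @ W_t @ H_t, H_t.T @ W_t @ y_t)
                  assert np.allclose(x_hat, x_aug)
                  print(x_hat)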
                                     Recursive Least Squares
            • When the number of equations is much larger than the number of variables:
                  – storage becomes an issue
                  – large matrices must be inverted
                  – data may arrive sequentially
            • Use a recursive algorithm
              At step i − 1, we have x̂_{i−1}: the minimizer of
                  (x − x_0)^T Π_0^{-1} (x − x_0) + ||H_{i−1} x − Y_{i−1}||^2_{W_{i−1}} ,   Y_{i−1} = [y_1 , . . . , y_{i−1}]^T
              Find x̂_i: the minimizer of (x − x_0)^T Π_0^{-1} (x − x_0) + ||H_i x − Y_i||^2_{W_i} , where
                  H_i = [H_{i−1}; h_i]  (h_i is a row vector),   Y_i = [y_1 , . . . , y_i]^T  (column vector)
              For simplicity of notation, assume x_0 = 0 and W_i = I.
                        H_i^T H_i  =  H_{i−1}^T H_{i−1} + h_i^T h_i
                             x̂_i  =  (Π_0^{-1} + H_i^T H_i)^{-1} H_i^T Y_i
                                   =  (Π_0^{-1} + H_{i−1}^T H_{i−1} + h_i^T h_i)^{-1} (H_{i−1}^T Y_{i−1} + h_i^T y_i)
              Define
                        P_i  ≜  (Π_0^{-1} + H_i^T H_i)^{-1} ,   P_0 = Π_0
              so that P_i^{-1} = P_{i−1}^{-1} + h_i^T h_i .
              Use the matrix inversion identity
                        (A + BCD)^{-1} = A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}
              to get
                        P_i = P_{i−1} − P_{i−1} h_i^T h_i P_{i−1} / (1 + h_i P_{i−1} h_i^T)
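              A quick check of this rank-one update, with an assumed random P_{i−1} and h_i:

                  import numpy as np

                  rng = np.random.default_rng(3)
                  p = 4
                  A = rng.standard_normal((p, p))
                  P_prev = A @ A.T + np.eye(p)        # a valid P_{i-1} (positive definite)
                  h = rng.standard_normal((1, p))     # row vector h_i

                  # Direct inversion of P_i^{-1} = P_{i-1}^{-1} + h^T h
                  P_direct = np.linalg.inv(np.linalg.inv(P_prev) + h.T @ h)
                  # Rank-one update from the matrix inversion identity
                  P_update = P_prev - (P_prev @ h.T @ h @ P_prev) / (1.0 + h @ P_prev @ h.T)
                  assert np.allclose(P_direct, P_update)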
                  x̂_0  =  0
                  x̂_i  =  P_i H_i^T Y_i
                        =  [P_{i−1} − P_{i−1} h_i^T h_i P_{i−1} / (1 + h_i P_{i−1} h_i^T)] [H_{i−1}^T Y_{i−1} + h_i^T y_i]
                        =  P_{i−1} H_{i−1}^T Y_{i−1} − (P_{i−1} h_i^T / (1 + h_i P_{i−1} h_i^T)) h_i P_{i−1} H_{i−1}^T Y_{i−1}
                           + P_{i−1} h_i^T (1 − h_i P_{i−1} h_i^T / (1 + h_i P_{i−1} h_i^T)) y_i
                        =  x̂_{i−1} + (P_{i−1} h_i^T / (1 + h_i P_{i−1} h_i^T)) (y_i − h_i x̂_{i−1})

              If W_i ≠ I, this modifies to (replace y_i by w_i^{1/2} y_i and h_i by w_i^{1/2} h_i):
                  x̂_i  =  x̂_{i−1} + P_{i−1} h_i^T (w_i^{-1} + h_i P_{i−1} h_i^T)^{-1} (y_i − h_i x̂_{i−1})
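              A compact NumPy sketch of this recursion (x_0 = 0, W_i = I, one scalar
              measurement per step), checked against the batch regularized LS solution;
              the data is synthetic and illustrative:

                  import numpy as np

                  rng = np.random.default_rng(4)
                  p, n = 3, 200
                  Pi0 = np.eye(p)
                  H = rng.standard_normal((n, p))          # row i is h_i
                  x_true = rng.standard_normal(p)
                  y = H @ x_true + 0.1 * rng.standard_normal(n)

                  x_hat = np.zeros(p)                      # x_hat_0 = 0
                  P = Pi0.copy()                           # P_0 = Pi_0
                  for h, yi in zip(H, y):
                      denom = 1.0 + h @ P @ h
                      gain = P @ h / denom                 # P_{i-1} h_i^T / (1 + h_i P_{i-1} h_i^T)
                      x_hat = x_hat + gain * (yi - h @ x_hat)
                      P = P - np.outer(gain, h @ P)        # rank-one update of P

                  # Batch check: (Pi_0^{-1} + H^T H)^{-1} H^T Y
                  x_batch = np.linalg.solve(np.linalg.inv(Pi0) + H.T @ H, H.T @ y)
                  assert np.allclose(x_hat, x_batch)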
                  Here we considered y_i to be a scalar and h_i to be a row vector.
                  In general, y_i can be a k-dimensional vector and h_i a matrix with k rows.
                  RLS with Forgetting Factor
                  Weight older data less: J(x) = Σ_{j=1}^{i} β(i, j) (y_j − h_j x)^2
                  Exponential forgetting: β(i, j) = λ^{i−j} ,   λ < 1
                  Moving average: β(i, j) = 0 if |i − j| > ∆ and β(i, j) = 1 otherwise
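                  A sketch of one exponential-forgetting update, in the standard
                  adaptive-filtering form (this exact recursion is not derived on these
                  slides): the denominator uses λ and P is rescaled by 1/λ each step,
                  which keeps down-weighting old data:

                      import numpy as np

                      def rls_forgetting_step(x_hat, P, h, y, lam=0.98):
                          """One exponential-forgetting RLS update (scalar measurement)."""
                          denom = lam + h @ P @ h
                          gain = P @ h / denom
                          x_hat = x_hat + gain * (y - h @ x_hat)
                          P = (P - np.outer(gain, h @ P)) / lam   # forgetting inflates P
                          return x_hat, P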
                                      Connection with Kalman Filtering
        The above is also the Kalman filter estimate of the state for the following
        system model:
                                 x_i  =  x_{i−1}
                                 y_i  =  h_i x_i + v_i ,   v_i ∼ N(0, R_i ),   R_i = w_i^{-1}      (4)
                                                   Kalman Filter
         RLS was for static data: estimate the signal x better and better as more and
         more data comes in, e.g. estimating the mean intensity of an object from a
         video sequence.
         RLS with a forgetting factor assumes a slowly time-varying x.
         Kalman filter: for a time-varying signal whose dynamical model we know
         (statistically), e.g. tracking a moving object:
                                 x_0  ∼  N(0, Π_0 )
                                 x_i  =  F_i x_{i−1} + v_{x,i} ,   v_{x,i} ∼ N(0, Q_i )
         The observation model is as before:
                                 y_i  =  h_i x_i + v_i ,   v_i ∼ N(0, R_i )
         Goal: get the best (minimum mean square error) estimate of x_i from Y_i
         Cost: J(x̂_i) = E[ ||x_i − x̂_i||^2 | Y_i ]
         Minimizer: the conditional mean x̂_i = E[x_i | Y_i ]
         This is also the MAP estimate, i.e. x̂_i also maximizes p(x_i | Y_i )
                                           Kalman filtering algorithm
         At i = 0, x̂_0 = 0, P_0 = Π_0 .
         For any i, assume we know x̂_{i−1} = E[x_{i−1} | Y_{i−1}] and
         P_{i−1} = Var(x_{i−1} | Y_{i−1}). Then
                        E[x_i | Y_{i−1}]    =  F_i x̂_{i−1}              ≜  x̂_{i|i−1}
                        Var(x_i | Y_{i−1})  =  F_i P_{i−1} F_i^T + Q_i   ≜  P_{i|i−1}      (5)
        This is the prediction step
         Filtering or correction step: given Y_{i−1} , x_i and y_i are jointly Gaussian, with
                        x_i | Y_{i−1}   ∼  N(x̂_{i|i−1} , P_{i|i−1} )
                        y_i | x_i , Y_{i−1}  =  y_i | x_i   ∼  N(h_i x_i , R_i )
         Using the formula for the conditional distribution of Z_1 | Z_2 when Z_1 and Z_2 are
         jointly Gaussian,
                 E[x_i | Y_i ]   =  x̂_{i|i−1} + P_{i|i−1} h_i^T (R_i + h_i P_{i|i−1} h_i^T)^{-1} (y_i − h_i x̂_{i|i−1})
                 Var(x_i | Y_i ) =  P_{i|i−1} − P_{i|i−1} h_i^T (R_i + h_i P_{i|i−1} h_i^T)^{-1} h_i P_{i|i−1}
         x̂_i = E[x_i | Y_i ] and P_i = Var(x_i | Y_i )
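         For reference, the standard conditional-Gaussian identity invoked above, with
         Z_1 = x_i and Z_2 = y_i (both conditioned on Y_{i−1}), so that
         Σ_{12} = P_{i|i−1} h_i^T and Σ_{22} = h_i P_{i|i−1} h_i^T + R_i :

             \begin{align*}
               \mathbb{E}[Z_1 \mid Z_2]         &= \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(Z_2 - \mu_2) \\
               \operatorname{Var}(Z_1 \mid Z_2) &= \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
             \end{align*}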
                                        Summarizing the algorithm
                 x̂_{i|i−1}  =  F_i x̂_{i−1}
                 P_{i|i−1}  =  F_i P_{i−1} F_i^T + Q_i
                 x̂_i        =  x̂_{i|i−1} + P_{i|i−1} h_i^T (R_i + h_i P_{i|i−1} h_i^T)^{-1} (y_i − h_i x̂_{i|i−1})
                 P_i        =  P_{i|i−1} − P_{i|i−1} h_i^T (R_i + h_i P_{i|i−1} h_i^T)^{-1} h_i P_{i|i−1}
         For F_i = I and Q_i = 0, we recover the RLS algorithm.
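         A minimal NumPy sketch of these four equations for a scalar measurement per
         step; the constant-velocity model, noise levels, and all names are assumed
         for illustration:

             import numpy as np

             def kalman_step(x_hat, P, F, Q, h, R, y):
                 # Prediction
                 x_pred = F @ x_hat
                 P_pred = F @ P @ F.T + Q
                 # Correction (h is a row vector, R a scalar)
                 S = R + h @ P_pred @ h
                 gain = P_pred @ h / S
                 x_new = x_pred + gain * (y - h @ x_pred)
                 P_new = P_pred - np.outer(gain, h @ P_pred)
                 return x_new, P_new

             # Example: 1-D constant-velocity tracking, position observed
             dt = 1.0
             F = np.array([[1.0, dt], [0.0, 1.0]])    # state: [position, velocity]
             Q = 0.01 * np.eye(2)
             h = np.array([1.0, 0.0])
             R = 0.5

             rng = np.random.default_rng(5)
             x_hat, P = np.zeros(2), np.eye(2)        # x_0 ~ N(0, Pi_0)
             x_true = np.array([0.0, 1.0])
             for _ in range(50):
                 x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
                 y = h @ x_true + np.sqrt(R) * rng.standard_normal()
                 x_hat, P = kalman_step(x_hat, P, F, Q, h, R, y)
             print(x_hat, x_true)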
                                     Example Applications
            • RLS:
                   – adaptive noise cancellation: given a noisy signal d_n assumed to be
                     generated as d_n = u_n^T w + v_n , get the best estimate of the weight w.
                     Here y_n = d_n , h_n = u_n^T , x = w (see the sketch after this list)
                   – channel equalization using a training sequence
                   – object intensity estimation: x = intensity, y_i = vector of intensities of
                     the object region in frame i, h_i = 1_m (column vector of m ones)
            • Kalman filter: Track a moving object (estimate its location and velocity
              at each time)
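              As an illustration of the first mapping, with assumed toy data: identify w
              from d_n = u_n^T w + v_n using the RLS recursion derived earlier:

                  import numpy as np

                  rng = np.random.default_rng(6)
                  p = 4
                  w_true = rng.standard_normal(p)
                  x_hat, P = np.zeros(p), np.eye(p)
                  for n in range(500):
                      u = rng.standard_normal(p)                    # h_n = u_n^T
                      d = u @ w_true + 0.1 * rng.standard_normal()  # y_n = d_n
                      gain = P @ u / (1.0 + u @ P @ u)
                      x_hat = x_hat + gain * (d - u @ x_hat)
                      P = P - np.outer(gain, u @ P)
                  print(x_hat, w_true)                              # x_hat approximates w_true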
                                     Suggested Reading
            • Chapters 2, 3 & 9 of Linear Estimation, by Kailath, Sayed, Hassibi
            • Chapters 4 & 5 of An Introduction to Signal Detection and Estimation,
              by H. Vincent Poor