4.
SHORT-TERM LOAD FORECASTING -
    “WEIGHTED LEAST SQUARE METHOD (WLSM)”
4.1) INTRODUCTION
                The Weighted Least Square Method uses a technique called
 discounted multiple regression (DMR) to fit a time polynomial to historical
 hourly demands. Since a simple time polynomial cannot fit every hourly
 demand value exactly, the hourly demand must be described as a deterministic
 time polynomial plus a random variable. Fitting a time polynomial with the DMR
 technique amounts to finding the coefficients of a set of so-called "fitting
 functions" that best fit the historical data. Fitting functions are functions of time
 that describe the underlying variation of the historical component. The
 variation is best described by
          ξ(t) = a1 + a2 t + a3 t² = a1 f1(t) + a2 f2(t) + a3 f3(t)
  In this case, three fitting functions are used to describe the load variations:
          f1(t) = 1.0
          f2(t) = t
          f3(t) = t²
                The DMR technique enables us to calculate the values of a1, a2 and
 a3. Knowing these coefficients, we can use ξ(t) to forecast the load at some
 future hour.
4.2) DISCOUNTED MULTIPLE REGRESSION
      We may describe the historical hourly demand data l(t) as

          l(t) = ξ(t) + η(t)
               = a' f(t) + η(t)                                              (4.1)

 where a'   = 1 × n coefficient vector
       f(t) = n × 1 fitting function vector
 and η(t) is a random variable that describes the random variation of l(t) about ξ(t).
 It is necessary to assume that η(t) is a zero-mean, Gaussian, white-noise random
 variable, implying that

     E(η(t)) = η̄(t) = 0                                 property 1      (4.2)

     f(η(t)) = [1 / (√(2π) ση)] exp[ −½ (η(t)/ση)² ]     property 2      (4.3)

     E(η(t) η(t+τ)) = 0,    τ ≠ 0
                    = ση²,  τ = 0                        property 3      (4.4)
 Property 1 indicates that if we sum η(t) over all time, its average will be zero.
 Property 2 tells us that the noise component is Gaussian distributed (bell-shaped)
 about its mean, as shown in Fig-4.1.
 [Fig-4.1: Gaussian probability density of η(t); 99% of the area lies within
 ±2.57ση of the mean.]
               This probability density describes the probability that η(t) will lie in
a range η(t) to η(t) + dη(t). Property 3 says that knowing the noise at time 't' does
not tell us what it will be at time t + τ; it is serially uncorrelated. However, for
τ = 0 we would interpret property 3 as the degree of spread of the probability
density function, where ση² is the variance. Large variances indicate that the
noise varies widely about its mean. If the variance is zero, we are certain that
η(t) = 0. The standard deviation, ση, defines the 99% confidence interval; that is, we
can be 99% confident that a Gaussian-distributed random variable lies in a range
±2.57ση about its mean.
               There are many ways of defining a best fit, but we define the best fit
to correspond to that value of a' that minimizes the squared difference between
the measured hourly demand l(t) and the trend ξ(t). We use the squared difference
to give negative errors, which result when ξ(t) is greater than l(t), the same
weight as positive errors. Further, we associate with every l(t), ξ(t) pair a
weighting factor w(t) that allows us to place more weight on newer data. The
function to be minimized thus becomes:
       J = Σ_{t=1}^{N} ½ W²(t) [l(t) − ξ(t)]²                            (4.5)

       W(t) = β^(N−t)
where 't' is a discrete variable starting with load 1 and continuing to the last load
of the available historical data. We find those values of a1, a2 and a3 that
minimize Eq. 4.5. In carrying out the minimization procedure, it is
advantageous to rely on the compactness of matrix notation and to express
Eq. 4.5 in matrix form.
From Eq. 4.1 we see that, for a given 't',

    ξ(t) = [a1  a2  a3] [f1(t)  f2(t)  f3(t)]' = a' f(t)

while l(t) and W(t) are scalars.
From Eq. 4.5,

    J = ½ W²(1)[l(1) − a'f(1)]² + ½ W²(2)[l(2) − a'f(2)]² + ⋯

      = ½ [l(1) − a'f(1)] W²(1) [l(1) − a'f(1)] + ½ [l(2) − a'f(2)] W²(2) [l(2) − a'f(2)] + ⋯

Stacking the loads, fitting function vectors and weights into L(N), R(N) and
W(N) respectively, this sum becomes

    J = ½ [L(N) − a'R(N)] W(N) W'(N) [L(N) − a'R(N)]'                    (4.6)
where
    L(N) = 1 × N vector whose elements are l(t)
    a'   = 1 × n fitting function coefficient vector
    R(N) = n × N fitting function matrix whose columns are f(t)
    W(N) = N × N diagonal weighting factor matrix whose diagonal elements are w(t)
    W(N) = diag[ β^(N−1), β^(N−2), … , β, 1 ],    β = 0.95
The minimization process involves determining the value of a' that results in the
first partial derivative of Eq. 4.6 being equal to zero. Taking the first partial,

    ∂J/∂a' = ∂/∂a' ½ [L(N) − a'R(N)] W(N)W'(N) [L(N) − a'R(N)]'

           = ∂/∂a' ½ [L(N)W(N)W'(N)L'(N) − a'R(N)W(N)W'(N)L'(N)
                      − L(N)W(N)W'(N)R'(N)a + a'R(N)W(N)W'(N)R'(N)a]     (4.7)
In the above equation, a'R(N)W(N)W'(N)L'(N) is a 1 × 1 scalar, and the transpose
of a scalar is the scalar itself. Therefore
 a' R(N)W(N)W'(N)L'(N) = L(N)W(N)W'(N)R'(N) a
and
    ∂J/∂a' = − L(N)W(N)W'(N)R'(N) + a'R(N)W(N)W'(N)R'(N) = 0
which gives the best estimate:
    â' = [L(N)W(N)W'(N)R'(N)] [R(N)W(N)W'(N)R'(N)]⁻¹                    (4.8)
Letting
   F(N) = R(N)W(N)W'(N)R'(N)                                                (4.9)
   G(N) = L(N)W(N)W'(N)R'(N)                                                (4.10)
We have
    â'(N) = G(N) F(N)⁻¹                                                  (4.11)

 To find â'(N) computationally, we must process the historical load data
 sequentially to determine G(N) and F(N)⁻¹ as indicated by Eq. 4.11. From
 Eq. 4.9, with W(N) equal to an N × N diagonal matrix whose (t, t) diagonal
 element equals β^(N−t) (β is called the "discount factor" and is less than
 unity), we obtain from Eq. 4.11 the fitting function coefficient vector.
4.3) SETTING UP THE TREND USING WEIGHTED LEAST
      SQUARE METHOD
               The implementation of the project commences with setting up the
 trend using the Weighted Least Square Method. Here, to forecast the load for a
 future hour, the input loads are taken as the load of the previous hour, the same
 hour of the previous day, the same hour of the same day in the previous week,
 and so on. Let N represent the current week of the year; then the loads of the
 same day and same hour of weeks N−1, N−2, N−3, N−4, N−5 and N−6 are taken.
 The temperature effect is already included in the available data. The day effect
 is given consideration by taking the loads of the same day of prior weeks, and
 the seasonality effect is included by taking a number of prior weeks. The
 program for setting up the trend using the Weighted Least Square Method is
 developed in the 'C' language. A step-by-step explanation of the program
 designed to set up the trend is given with the help of a flowchart.
4.3.1 FLOWCHART

    START
      ↓
    Time( )
      ↓
    ReadData( )
      ↓
    Initialization( )
      ↓
    GMatrix( )      — calculate G(N) = L(N)W(N)W'(N)R'(N)
      ↓
    FINVMatrix( )   — calculate F(N)⁻¹ = [R(N)W(N)W'(N)R'(N)]⁻¹
      ↓
    Forecast( )
      ↓
    Adjustments( )
      ↓
    Display forecasted value for next hour
      ↓
    STOP
 4.3.2 BASIC FUNCTIONS OF THE PROGRAM
          Basically there are eight functions in the program, each of which
is explained below with the help of the flowchart.
   DataRead            – to read data from the database
   Time                – to find the system date and time, so that the load of the
                         future hour is forecast in accordance with the current hour
   Initialization      – to initialize L(N), R(N), R'(N), W(N) and W'(N)
   GMatrix             – to compute the G(N) vector
   FInverse            – to compute the F(N)⁻¹ matrix
   Coeffmatrix         – to find the fitting function coefficient vector from
                         G(N) and F(N)⁻¹
   Forecast            – to forecast the load using the fitting function
                         coefficient vector
   Adjustments         – to adjust the forecast value from the Forecast function
                         so as to reduce its deviation from the actual load
 4.3.3 COMPUTATION OF FITTING FUNCTION COEFFICIENT
        VECTOR
          To find the fitting function coefficient vector â'(N), the historical
load data has been processed to determine G(N) and F(N)⁻¹ according to the
following equations:
     F(N) = R(N)W(N)W'(N)R'(N)
     G(N) = L(N)W(N)W'(N)R'(N)
Here
    L(N) = [l(1)  l(2)  ⋯  l(N)]                   1 × N vector of past loads

           | 1    1    ⋯   1   |
    R(N) = | 1    2    ⋯   N   |                   fitting function matrix
           | 1²   2²   ⋯   N²  |

    W(N) = diag[ β^(N−1), β^(N−2), … , β, 1 ]      weighting factor matrix

    β = discount factor
               Here 'N' has been tested with various values, and N = 8 gives
 better forecasting results; so 'N' has been taken as 8. 'β' is chosen arbitrarily
 as 0.95. The final output of the method, i.e. the forecasted load, is adjusted by
 the following equations to reduce its variation from the actual load.
  4.3.4 ADJUSTMENTS
       Cafac    = Σ_{i=1}^{n} [NWFAC(i) × ActMW(i) / OrigMW(i)]

       UpMW(j)  = OrigMW(j) × Cafac

 where
       Cafac     = composite adjustment factor (%)
       UpMW(j)   = updated (adjusted) load forecast for future hour 'j'
       NWFAC(i)  = normalized weighting factor for past hour 'i' (%)
       ActMW(i)  = actual system MW load for past hour 'i'
       OrigMW(·) = original system MW load forecast for past hour 'i' or
                   future hour 'j'