Variance
The variance is a single-valued metric that reflects the amount of spread in the values
that a random variable takes on. More specifically, the variance is the expected squared
difference between the random variable's value and its mean:
    Var(X) := E[(X − E(X))^2]
(Definition 1). Variance can also be expressed as

    Var(X) = E(X^2) − E(X)^2

as proven in Theorem 1.
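To make the two formulas concrete, here is a minimal Python sketch (the distribution below is a made-up example, not one from these notes) that computes the variance of a discrete random variable both from Definition 1 and from the identity in Theorem 1:

    # Made-up discrete distribution: the values X takes and their probabilities
    outcomes = [0.0, 1.0, 3.0]
    probs = [0.2, 0.5, 0.3]

    # E(X): probability-weighted average of the outcomes
    mean = sum(p * x for x, p in zip(outcomes, probs))

    # Definition 1: Var(X) = E[(X - E(X))^2]
    var_def = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))

    # Theorem 1: Var(X) = E(X^2) - E(X)^2
    var_alt = sum(p * x ** 2 for x, p in zip(outcomes, probs)) - mean ** 2

    print(mean, var_def, var_alt)  # 1.4 1.24 1.24 (up to floating point)

Both computations yield the same value, as Theorem 1 guarantees.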
Intuition
The variance of a random variable is a single number that tells us about the amount of
spread that we would expect to see if we were able to repeatedly sample from the random
variable's distribution. Note that the expectation of a random variable only tells us the
average value of the random variable over many observations; it doesn't tell us anything
about how spread out we expect these values to be. For example, let us define a random
variable X where,
let us define a random variable X where,
                                       P(X = 0) = 0.5
                                     P(X = 100) = 0.5
then,
                                         E(X) = 50
If, for another random variable Y,
                                       P(Y = 50) = 1
then,
                                         E(Y) = 50
Despite the fact that the two random variables behave very differently, they have the
same expected value. The expected value does not at all capture the fact that the values
of X are much more spread out than those of Y.
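To see this numerically, the following simulation sketch (sample size chosen arbitrarily) draws repeatedly from the two distributions above; the sample means nearly coincide while the sample variances differ drastically:

    import random

    random.seed(0)
    n = 100_000

    # X is 0 or 100, each with probability 0.5; Y is 50 with probability 1
    x_samples = [random.choice([0, 100]) for _ in range(n)]
    y_samples = [50] * n

    def mean(s):
        return sum(s) / len(s)

    def variance(s):
        m = mean(s)
        return sum((v - m) ** 2 for v in s) / len(s)

    print(mean(x_samples), variance(x_samples))  # roughly 50 and 2500
    print(mean(y_samples), variance(y_samples))  # exactly 50 and 0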
Properties
  1. Variance of a scaled random variable:

         Var(cX) = c^2 Var(X)

     where c is a constant (Theorem 2). Unlike expectation, variance is not a linear
     function.
  2. Variance of a shifted random variable: Given a random variable X and constant
     c, the variance of X + c is simply the variance of X:
                                    Var(X + c) = Var(X)
     (Theorem 3). This result makes intuitive sense; since the variance measures the
     amount of spread of the distribution, shifting the distribution left or right by a
     constant doesn’t affect that spread and therefore shouldn’t affect the variance.
  3. Variance of a point mass random variable: Given a random variable X for which
     X = c with probability 1, the variance of X is zero. Furthermore, if X is not
     constant, then its variance is greater than zero (Theorem 4). This makes intuitive
     sense: if a random variable always takes the same value, then there is zero spread
     in the outcomes. On the other hand, if the random variable can take on more than
     one value (even with small probability), the average squared deviation from the
     mean will be non-zero.
  4. Variance of a sum of random variables: Given two random variables X
     and Y, the variance of their sum is

         Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

     (Theorem 5). If, however, X and Y are independent, then we have

         Var(X + Y) = Var(X) + Var(Y)

     since the covariance of independent random variables is always zero. (These
     identities, along with Properties 1 and 2, are checked numerically in the sketch
     following this list.)
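These properties are easy to sanity-check by simulation. A minimal Python sketch, using a made-up pair of correlated random variables and an arbitrary constant c, estimates both sides of each identity:

    import random

    random.seed(1)
    n = 200_000
    c = 7.0  # arbitrary constant

    # Made-up, deliberately correlated random variables X and Y
    x = [random.gauss(0, 2) for _ in range(n)]
    y = [xi + random.gauss(0, 1) for xi in x]  # Y depends on X, so Cov(X, Y) > 0

    def mean(s):
        return sum(s) / len(s)

    def var(s):
        m = mean(s)
        return sum((v - m) ** 2 for v in s) / len(s)

    def cov(s, t):
        ms, mt = mean(s), mean(t)
        return sum((a - ms) * (b - mt) for a, b in zip(s, t)) / len(s)

    # Property 1: Var(cX) = c^2 Var(X)
    print(var([c * v for v in x]), c ** 2 * var(x))

    # Property 2: Var(X + c) = Var(X)
    print(var([v + c for v in x]), var(x))

    # Property 4: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
    xy = [a + b for a, b in zip(x, y)]
    print(var(xy), var(x) + var(y) + 2 * cov(x, y))

Each printed pair matches, since these identities hold exactly for the sample moments as well.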
Definition 1 Given a random variable X with a defined expected value, its variance is
given by

    Var(X) := E[(X − E(X))^2]
Theorem 1

    Var(X) = E(X^2) − E(X)^2
Proof:

    Var(X) = E[(X − E(X))^2]
           = E[(X − E(X))(X − E(X))]
           = E[X^2 − 2XE(X) + E(X)^2]
           = E(X^2) − E[2XE(X)] + E[E(X)^2]
           = E(X^2) − 2E(X)E(X) + E(X)^2
           = E(X^2) − 2E(X)^2 + E(X)^2
           = E(X^2) − E(X)^2
Theorem 2

    Var(cX) = c^2 Var(X)
Proof:

    Var(cX) = E[(cX)^2] − E(cX)^2
            = c^2 E(X^2) − c^2 E(X)^2
            = c^2 [E(X^2) − E(X)^2]
            = c^2 Var(X)
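Note that the same result also follows directly from Definition 1: since E(cX) = cE(X),

    Var(cX) = E[(cX − cE(X))^2]
            = E[c^2 (X − E(X))^2]
            = c^2 Var(X)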
 Theorem 3
                                 Var(X + c) = Var(X)
Proof:

    Var(X + c) = E[(X + c)^2] − E(X + c)^2
               = E(X^2 + 2cX + c^2) − E(X + c)E(X + c)
Expanding the first term,

    E(X^2 + 2cX + c^2) = E(X^2) + 2cE(X) + c^2
Expanding the second term,

    E(X + c)E(X + c) = E(X)E(X + c) + E(c)E(X + c)
                     = E[XE(X) + cE(X)] + cE(X) + cE(c)
                     = E(X)^2 + cE(X) + cE(X) + c^2
                     = E(X)^2 + 2cE(X) + c^2
Putting it all together,

    Var(X + c) = E(X^2) + 2cE(X) + c^2 − E(X)^2 − 2cE(X) − c^2
               = E(X^2) − E(X)^2
               = Var(X)
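A shorter route to the same result is worth noting: since E(X + c) = E(X) + c, applying Definition 1 directly gives

    Var(X + c) = E[((X + c) − (E(X) + c))^2]
               = E[(X − E(X))^2]
               = Var(X)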
Theorem 4 If a random variable X is equal to a constant c with probability 1, then

    Var(X) = 0

Otherwise,

    Var(X) > 0
Proof:
The proof of this property lies in the fact that the variance is equal to E[(X − E(X))^2].
Note that if X = c with probability 1, then E(X) = c, so the quantity inside the
expectation is always zero. Otherwise, the squared quantity (X − E(X))^2 is non-negative,
and it is strictly positive with positive probability, so its expectation is strictly
positive.
 Theorem 5
                      Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
 and
                      Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y)
Proof:
We only prove the result for X + Y; the result for X − Y follows from an identical
calculation, substituting −Y for Y.

    Var(X + Y) = E[(X + Y)^2] − E(X + Y)^2
We see that the first term can be expressed as,

    E[(X + Y)^2] = E(X^2 + 2XY + Y^2)
                 = E(X^2) + 2E(XY) + E(Y^2)
The second term can be expressed as,

    E(X + Y)^2 = E(X + Y)E(X + Y)
               = E(X)E(X + Y) + E(Y)E(X + Y)
               = E[E(X)(X + Y)] + E[E(Y)(X + Y)]
               = E[XE(X) + YE(X)] + E[XE(Y) + YE(Y)]
               = E[XE(X)] + E[YE(X)] + E[XE(Y)] + E[YE(Y)]
               = E(X)^2 + E(X)E(Y) + E(X)E(Y) + E(Y)^2
               = E(X)^2 + 2E(X)E(Y) + E(Y)^2
Now putting it all together,

    Var(X + Y) = E(X^2) + 2E(XY) + E(Y^2) − E(X)^2 − 2E(X)E(Y) − E(Y)^2
               = E(X^2) − E(X)^2 + E(Y^2) − E(Y)^2 + 2[E(XY) − E(X)E(Y)]
               = Var(X) + Var(Y) + 2Cov(X, Y)
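As with the earlier properties, both versions of this identity can be sanity-checked by simulation. A minimal sketch, using made-up correlated and independent pairs of random variables:

    import random

    random.seed(2)
    n = 200_000

    def mean(s):
        return sum(s) / len(s)

    def var(s):
        m = mean(s)
        return sum((v - m) ** 2 for v in s) / len(s)

    def cov(s, t):
        ms, mt = mean(s), mean(t)
        return sum((a - ms) * (b - mt) for a, b in zip(s, t)) / len(s)

    # Correlated pair: Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
    x = [random.gauss(0, 2) for _ in range(n)]
    y = [xi + random.gauss(0, 1) for xi in x]
    d = [a - b for a, b in zip(x, y)]
    print(var(d), var(x) + var(y) - 2 * cov(x, y))

    # Independent pair: the sample covariance is near zero, so
    # Var(X + Y) is close to Var(X) + Var(Y)
    u = [random.gauss(0, 1) for _ in range(n)]
    w = [random.gauss(0, 3) for _ in range(n)]
    s = [a + b for a, b in zip(u, w)]
    print(var(s), var(u) + var(w), cov(u, w))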
© Matthew Bernstein 2016