Assignment Solutions
Authors: Manas Sharma (MB2412)
Advisor/Instructor: Dr. Arijit Chakrabarti, Associate Professor, Applied Statistics Unit
Course: Large Sample Statistical Methods — M. Stat. — Indian Statistical Institute, Kolkata
Question 1 (Supplementary Sufficient Conditions for Fréchet Domain). Let F be a distribution function
with density f , and assume that f (x) > 0 for all sufficiently large x. Suppose further that
\[
\lim_{x\to\infty} \frac{x f(x)}{1 - F(x)} = \nu > 0.
\]
Show that under this condition, F lies in the domain of attraction of the Fréchet distribution H1,ν , i.e.,
\[
\lim_{t\to\infty} \frac{1 - F(tx)}{1 - F(t)} = x^{-\nu}, \qquad \text{for all } x > 0.
\]
Proof. Define the survival function S(x) = 1 − F (x). The given condition can then be rewritten as
\[
\lim_{x\to\infty} \frac{x f(x)}{S(x)} = \lim_{x\to\infty} \frac{-x S'(x)}{S(x)} = \nu > 0.
\]
We aim to establish that for all x > 0,
\[
\lim_{t\to\infty} \frac{S(tx)}{S(t)} = x^{-\nu}.
\]
Define h(u) = log S(e^u), so that h′(u) = e^u S′(e^u)/S(e^u) → −ν as u → ∞ by the hypothesis. Let c = log x. By the Mean Value Theorem, for each u, there exists some ξu between u and u + c such that
                                             h(u + c) − h(u) = c · h′ (ξu ).
As u → ∞, ξu → ∞, and since limy→∞ h′ (y) = −ν, we have
\[
\lim_{u\to\infty} h'(\xi_u) = -\nu.
\]
Consequently,
\[
\lim_{u\to\infty} \bigl( h(u + c) - h(u) \bigr) = c \cdot (-\nu) = -\nu \log x.
\]
Exponentiating both sides, we obtain:
\[
\lim_{u\to\infty} \frac{S(e^{u + \log x})}{S(e^u)} = e^{-\nu \log x} = x^{-\nu}.
\]
Substituting back t = eu , it follows that
\[
\lim_{t\to\infty} \frac{S(tx)}{S(t)} = x^{-\nu}.
\]
Hence,
\[
\lim_{t\to\infty} \frac{1 - F(tx)}{1 - F(t)} = x^{-\nu}, \qquad \text{for all } x > 0,
\]
which confirms that F is in the domain of attraction of the Fréchet distribution H1,ν .                         ■
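Numerical check (our own illustration, not part of the original solution). For the Pareto tail S(x) = x^{−ν}, the hypothesis holds with x f(x)/S(x) = ν exactly, so the tail ratio S(tx)/S(t) must approach x^{−ν}; the value ν = 2.5 below is an arbitrary choice.

import numpy as np

# Pareto survival function S(x) = x^(-nu) on [1, oo) and its density.
nu = 2.5
S = lambda x: x ** (-nu)
f = lambda x: nu * x ** (-nu - 1)

x = 3.0
for t in [1e2, 1e4, 1e6]:
    print(t * f(t) / S(t),     # the von Mises ratio: equals nu = 2.5
          S(t * x) / S(t),     # regular-variation ratio S(tx)/S(t)
          x ** (-nu))          # the Frechet limit x^(-nu)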
Question 2. Consider a multinomial distribution with k cells. Let the cell probabilities π1 , . . . , πk be specified
functions π1 (θ), . . . , πk (θ) of m(< k) unknown parameters θ = (θ1 , . . . , θm )⊤ . Assume the following conditions
hold:
(a) Each πi (θ) admits continuous partial derivatives with respect to θj for all j = 1, . . . , m.
(b) Define the k × m matrix M with (r, s)-th entry
\[
M_{rs} = \frac{1}{\sqrt{\pi_r(\theta)}} \frac{\partial \pi_r(\theta)}{\partial \theta_s}.
\]
Let I(θ) be the Fisher Information matrix for a single observation from this multinomial distribution. Show
that
                                              I(θ) = M ⊤ M.
Proof. Let X = (X1 , . . . , Xk )⊤ represent the outcome of a single trial from the multinomial distribution with
probabilities π(θ) = (π1 (θ), . . . , πk (θ))⊤ . Here, Xi = 1 if the outcome falls into cell i, and Xi = 0 otherwise.
The log-likelihood function for a single observation is:
\[
\ell(\theta; X) = \sum_{i=1}^{k} X_i \log \pi_i(\theta).
\]
The score vector S(θ) = ∂ℓ(θ; X)/∂θ has components
\[
S_j(\theta) = \sum_{i=1}^{k} \frac{X_i}{\pi_i(\theta)} \frac{\partial \pi_i(\theta)}{\partial \theta_j},
\]
and the Fisher information matrix is I(θ) = Eθ[S(θ)S(θ)⊤].
Taking expectations and using the fact that exactly one indicator equals 1 in a single trial (so that E[Xi] = πi(θ) and Xi Xl = 0 for i ≠ l), we get:
\[
E[X_i X_l] = \begin{cases} \pi_i(\theta) & \text{if } i = l, \\ 0 & \text{if } i \neq l. \end{cases}
\]
Thus, the expectation simplifies to:
\[
I(\theta)_{js} = \sum_{i=1}^{k} \pi_i(\theta)\, \frac{1}{\pi_i(\theta)^2} \frac{\partial \pi_i(\theta)}{\partial \theta_j} \frac{\partial \pi_i(\theta)}{\partial \theta_s}.
\]
Simplifying further:
\[
I(\theta)_{js} = \sum_{i=1}^{k} \frac{1}{\pi_i(\theta)} \frac{\partial \pi_i(\theta)}{\partial \theta_j} \frac{\partial \pi_i(\theta)}{\partial \theta_s}.
\]
Recalling the definition
\[
M_{rs} = \frac{1}{\sqrt{\pi_r(\theta)}} \frac{\partial \pi_r(\theta)}{\partial \theta_s},
\]
each summand above equals M_{ij} M_{is}, so
\[
I(\theta)_{js} = \sum_{i=1}^{k} M_{ij} M_{is} = (M^\top M)_{js}.
\]
Hence I(θ) = M⊤M. ■
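Numerical check (ours; the Hardy-Weinberg parametrization below is an illustrative choice, not part of the assignment). For π(θ) = (θ², 2θ(1 − θ), (1 − θ)²) with m = 1, the Monte Carlo average of S(θ)² should match M⊤M.

import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
pi = np.array([theta**2, 2*theta*(1 - theta), (1 - theta)**2])
dpi = np.array([2*theta, 2 - 4*theta, -2*(1 - theta)])  # d pi_r / d theta

M = dpi / np.sqrt(pi)                      # k-vector, since m = 1
print("M^T M             :", M @ M)

X = rng.multinomial(1, pi, size=200_000)   # single-trial multinomial outcomes
scores = X @ (dpi / pi)                    # score S(theta) = sum_i X_i pi_i'/pi_i
print("Monte Carlo E[S^2]:", (scores**2).mean())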
Solution.
Step 1: Asymptotic Distribution of Sample Moments. Let Y_i = (X_i, X_i^2, ..., X_i^k)^⊤. Then Y_1, ..., Y_n are i.i.d. random vectors. The population mean vector is E[Y_i] = (µ_1(θ_0), ..., µ_k(θ_0))^⊤ = e(θ_0). The sample mean vector is
\[
\bar{Y}_n = \Bigl( \tfrac{1}{n} \sum_i X_i, \ldots, \tfrac{1}{n} \sum_i X_i^k \Bigr)^{\top} = m.
\]
The covariance matrix of Y_i is Σ, a k × k matrix with entries
\[
\Sigma_{rs} = \mathrm{Cov}(X^r, X^s) = E_{\theta_0}(X^{r+s}) - E_{\theta_0}(X^r)\,E_{\theta_0}(X^s) = \mu_{r+s}(\theta_0) - \mu_r(\theta_0)\mu_s(\theta_0).
\]
The existence of the 2k-th moment ensures that all entries of Σ are finite. By the Multivariate Central Limit
Theorem (CLT, see Question 13):
\[
\sqrt{n}\,(m - e(\theta_0)) \xrightarrow{d} N_k(0, \Sigma).
\]
Step 2: Applying the Multivariate Delta Method. The MOM estimator is θ̂_n = g(m), where g = e^{−1} is the inverse function mapping the vector of moments back to the parameter vector. The true parameter is θ_0 = g(e(θ_0)). The function g maps from a subset of R^k to R^p. Let G be the Jacobian matrix of the transformation g evaluated at the point e(θ_0); G is a p × k matrix:
\[
G = \left. \frac{\partial g(y)}{\partial y^\top} \right|_{y = e(\theta_0)}.
\]
By the Multivariate Delta Method (see Question 13), the asymptotic distribution of √n(θ̂_n − θ_0) is given by:
\[
\sqrt{n}\,\bigl( g(m) - g(e(\theta_0)) \bigr) \xrightarrow{d} N_p(0, G \Sigma G^\top).
\]
Step 3: Relating G to the Jacobian of e. Let J be the k × p Jacobian matrix of e(θ) evaluated at θ_0:
\[
J = \left. \frac{\partial e(\theta)}{\partial \theta^\top} \right|_{\theta = \theta_0}.
\]
Since g = e−1 and we have the conditions that e is continuously differentiable with J having full column rank,
the inverse function theorem ensures that g is continuously differentiable in a neighborhood of e(θ 0 ).
By the chain rule, differentiating g(e(θ)) = θ with respect to θ⊤ gives:
\[
\left. \frac{\partial g(y)}{\partial y^\top} \right|_{y = e(\theta_0)} \cdot \left. \frac{\partial e(\theta)}{\partial \theta^\top} \right|_{\theta = \theta_0} = I_p \quad \Longrightarrow \quad GJ = I_p.
\]
When p = k, differentiating e(g(y)) = y at y = e(θ_0) similarly gives JG = I_k.
• If p = k (equal number of parameters and moments), then J is square and non-singular, and G = J^{−1}.
• If p < k (more moments than parameters), then J is k × p with full column rank p, and G is a left inverse of J. Specifically, G = (J^⊤J)^{−1}J^⊤, which is the Moore-Penrose pseudo-inverse when J has full column rank.
Step 4: Final Expression for V_{θ_0}. Substituting the expression for G into V_{θ_0} = GΣG^⊤:
• For p = k:
\[
V_{\theta_0} = J^{-1} \Sigma (J^{-1})^\top = J^{-1} \Sigma (J^\top)^{-1}.
\]
• For p < k:
\[
V_{\theta_0} = (J^\top J)^{-1} J^\top \Sigma J (J^\top J)^{-1}.
\]
The entries of J are J_{rs} = ∂µ_r(θ)/∂θ_s evaluated at θ_0, and the entries of Σ are Σ_{rs} = µ_{r+s}(θ_0) − µ_r(θ_0)µ_s(θ_0).
Therefore, the asymptotic covariance matrix of the MOM estimator is explicitly given by:
\[
V_{\theta_0} = \begin{cases} J^{-1} \Sigma (J^\top)^{-1}, & \text{if } p = k, \\ (J^\top J)^{-1} J^\top \Sigma J (J^\top J)^{-1}, & \text{if } p < k. \end{cases}
\]
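Numerical check (a minimal sketch with our own example, p = k = 1). For X ∼ Exponential(θ) with µ₁(θ) = 1/θ, the MOM estimator is θ̂ = 1/X̄, and the formula predicts V = J^{−1}Σ(J^{−1})⊤ = θ².

import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 2000, 5000

J = -1 / theta**2        # d mu_1 / d theta
Sigma = 1 / theta**2     # Var(X) = mu_2 - mu_1^2 for the exponential
V = Sigma / J**2         # J^{-1} Sigma J^{-T} = theta^2

X = rng.exponential(1 / theta, size=(reps, n))
theta_hat = 1 / X.mean(axis=1)             # method-of-moments estimator
print("predicted:", V, " empirical:", n * theta_hat.var())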
Question 4. Let (n_1, ..., n_k) be the observed counts from n independent trials, where each trial results in one of k categories with probabilities p = (p_1, ..., p_k)^⊤, such that Σ_{j=1}^k p_j = 1. Let p̂ = (n_1/n, ..., n_k/n)^⊤ be the vector of sample proportions. Show that
\[
\sqrt{n}\,(\hat{p} - p) \xrightarrow{d} N_k(0, \Sigma),
\]
where Σ = diag(p) − pp⊤ .
Solution. Let Xi = (Xi1 , . . . , Xik )⊤ for i = 1, . . . , n be the outcome vector for the i-th independent trial. The
component Xij is an indicator variable defined as:
\[
X_{ij} = \begin{cases} 1 & \text{if the } i\text{-th trial results in category } j, \\ 0 & \text{otherwise.} \end{cases}
\]
Since each trial must result in exactly one category, we have Σ_{j=1}^k X_{ij} = 1 for each i. The vectors X_1, ..., X_n
are independent and identically distributed (i.i.d.).
Distribution of a single trial vector Xi Each Xi follows a Multinomial distribution with 1 trial and
category probabilities (p1 , . . . , pk ), denoted as Multinomial(1, p).
Covariance Matrix of X_i. Let Σ be the k × k covariance matrix of X_i. The (j, l)-th entry is Σ_{jl} = Cov(X_{ij}, X_{il}) = E[X_{ij}X_{il}] − E[X_{ij}]E[X_{il}].
• Off-diagonal entries (j ≠ l): The product X_{ij}X_{il} is 1 only if the i-th trial results in both category j and category l. Since the categories are mutually exclusive, this is impossible, so X_{ij}X_{il} = 0 always and E[X_{ij}X_{il}] = 0. The off-diagonal entries are therefore
\[
\Sigma_{jl} = 0 - p_j p_l = -p_j p_l.
\]
• Diagonal entries (j = l): Since X_{ij} is Bernoulli(p_j), X_{ij}² = X_{ij} and
\[
\Sigma_{jj} = E[X_{ij}] - (E[X_{ij}])^2 = p_j(1 - p_j).
\]
Together these give Σ = diag(p) − pp^⊤.
Sample Proportion Vector p̂. The total count for category j after n trials is n_j = Σ_{i=1}^n X_{ij}. The vector of observed counts is (n_1, ..., n_k)^⊤ = Σ_{i=1}^n X_i. The vector of sample proportions is p̂ = (n_1/n, ..., n_k/n)^⊤. We can write this as:
\[
\hat{p} = \frac{1}{n}(n_1, \ldots, n_k)^\top = \frac{1}{n} \sum_{i=1}^{n} X_i.
\]
Applying the Multivariate Central Limit Theorem (CLT) The Multivariate CLT states that if
Y1 , . . . , Yn are i.i.d. k-dimensional random vectors with mean vector µ and finite covariance matrix Σ, then
as n → ∞:
\[
\sqrt{n}\,(\bar{Y}_n - \mu) \xrightarrow{d} N_k(0, \Sigma),
\]
where Ȳ_n = (1/n) Σ_{i=1}^n Y_i.
In our case, Yi = Xi , Yn = p   b , µ = E[Xi ] = p, and the covariance matrix is Σ = diag(p) − pp⊤ . The
covariance matrix Σ is finite since all pj ∈ [0, 1].
Applying the Multivariate CLT directly gives:
\[
\sqrt{n}\,(\hat{p} - p) \xrightarrow{d} N_k(0, \Sigma),
\]
with Σ = diag(p) − pp^⊤, as required. ■
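Numerical check (our own sketch; p and n are arbitrary choices). The empirical covariance of √n(p̂ − p) should be close to diag(p) − pp⊤.

import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.2, 0.3, 0.5])
n, reps = 500, 20_000

counts = rng.multinomial(n, p, size=reps)  # reps independent count vectors
Z = np.sqrt(n) * (counts / n - p)          # sqrt(n) (p_hat - p), one row per replicate
print(np.cov(Z.T))                         # empirical covariance matrix
print(np.diag(p) - np.outer(p, p))         # theoretical Sigma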
Question 5. Consider a k × l contingency table with observed cell counts n_{ij} (i = 1, ..., k; j = 1, ..., l) and total sample size n = Σ_{i,j} n_{ij}. Let π_{ij} be the probability of an observation falling into cell (i, j). We want to test the hypothesis of independence H_0 : π_{ij} = π_{i.}π_{.j} for all i, j, where π_{i.} = Σ_j π_{ij} and π_{.j} = Σ_i π_{ij} are the marginal probabilities. Assume the marginal probabilities π_{i.} and π_{.j} are known.
Define the following chi-squared statistics:
\[
T = \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{(n_{ij} - n\pi_{i.}\pi_{.j})^2}{n\pi_{i.}\pi_{.j}},
\]
\[
T_1 = \sum_{i=1}^{k} \frac{(n_{i.} - n\pi_{i.})^2}{n\pi_{i.}}, \qquad \text{where } n_{i.} = \sum_j n_{ij},
\]
\[
T_2 = \sum_{j=1}^{l} \frac{(n_{.j} - n\pi_{.j})^2}{n\pi_{.j}}, \qquad \text{where } n_{.j} = \sum_i n_{ij}.
\]
The test statistic often used for testing independence when marginals are known is T3 = T − T1 − T2 . Show
the algebraic simplification of T3 to the form
\[
T_3 = \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{(n_{ij} - n_{i.}\pi_{.j} - n_{.j}\pi_{i.} + n\pi_{i.}\pi_{.j})^2}{n\pi_{i.}\pi_{.j}}.
\]
Solution. Write e_{ij} = nπ_{i.}π_{.j}, e_{i.} = nπ_{i.}, and e_{.j} = nπ_{.j} for the expected counts under H_0. We must show that
\[
\sum_{i,j} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} - \sum_{i} \frac{(n_{i.} - e_{i.})^2}{e_{i.}} - \sum_{j} \frac{(n_{.j} - e_{.j})^2}{e_{.j}} = \sum_{i,j} \frac{(n_{ij} - n_{i.}\pi_{.j} - n_{.j}\pi_{i.} + n\pi_{i.}\pi_{.j})^2}{n\pi_{i.}\pi_{.j}}.
\]
We start by expanding the term (n_{ij} − e_{ij})².
Let d_{ij} = n_{ij} − n_{i.}π_{.j} − n_{.j}π_{i.} + e_{ij}, d_{i.} = n_{i.} − e_{i.}, and d_{.j} = n_{.j} − e_{.j}. Then n_{ij} − e_{ij} = d_{ij} + π_{.j}d_{i.} + π_{i.}d_{.j}, so
\[
(n_{ij} - e_{ij})^2 = d_{ij}^2 + (\pi_{.j} d_{i.})^2 + (\pi_{i.} d_{.j})^2 + 2 d_{ij}\pi_{.j}d_{i.} + 2 d_{ij}\pi_{i.}d_{.j} + 2 \pi_{.j}d_{i.}\pi_{i.}d_{.j}.
\]
We need to show that the cross-product terms sum to zero when summed over i and j.
\[
\sum_{i,j} \frac{2 d_{ij} (\pi_{.j} d_{i.})}{n\pi_{i.}\pi_{.j}} = \sum_i \frac{2 d_{i.}}{n\pi_{i.}} \sum_j d_{ij}, \qquad
\sum_{i,j} \frac{2 d_{ij} (\pi_{i.} d_{.j})}{n\pi_{i.}\pi_{.j}} = \sum_j \frac{2 d_{.j}}{n\pi_{.j}} \sum_i d_{ij},
\]
\[
\sum_{i,j} \frac{2 (\pi_{.j} d_{i.})(\pi_{i.} d_{.j})}{n\pi_{i.}\pi_{.j}} = \sum_{i,j} \frac{2 d_{i.} d_{.j}}{n} = \frac{2}{n} \Bigl( \sum_i d_{i.} \Bigr) \Bigl( \sum_j d_{.j} \Bigr).
\]
Now
\[
\sum_j d_{ij} = n_{i.} - n_{i.} \sum_j \pi_{.j} - \pi_{i.} \sum_j n_{.j} + n\pi_{i.} \sum_j \pi_{.j} = n_{i.} - n_{i.} - n\pi_{i.} + n\pi_{i.} = 0.
\]
Similarly, Σ_i d_{ij} = 0. Also, Σ_i d_{i.} = Σ_i (n_{i.} − nπ_{i.}) = (Σ_i n_{i.}) − n(Σ_i π_{i.}) = n − n(1) = 0, and similarly Σ_j d_{.j} = 0.
Since Σ_j d_{ij} = 0 and Σ_i d_{ij} = 0, the first two cross-product sums are zero. Since Σ_i d_{i.} = 0 and Σ_j d_{.j} = 0, the third cross-product sum is also zero.
Therefore,
\[
\sum_{i,j} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}
= \sum_{i,j} \frac{d_{ij}^2 + (\pi_{.j} d_{i.})^2 + (\pi_{i.} d_{.j})^2}{n\pi_{i.}\pi_{.j}}
= \sum_{i,j} \frac{d_{ij}^2}{e_{ij}} + \sum_{i,j} \frac{\pi_{.j}^2 d_{i.}^2}{n\pi_{i.}\pi_{.j}} + \sum_{i,j} \frac{\pi_{i.}^2 d_{.j}^2}{n\pi_{i.}\pi_{.j}}
\]
\[
= \sum_{i,j} \frac{d_{ij}^2}{e_{ij}} + \sum_i \frac{d_{i.}^2}{n\pi_{i.}} \sum_j \pi_{.j} + \sum_j \frac{d_{.j}^2}{n\pi_{.j}} \sum_i \pi_{i.}
= \sum_{i,j} \frac{d_{ij}^2}{e_{ij}} + \sum_i \frac{d_{i.}^2}{n\pi_{i.}}(1) + \sum_j \frac{d_{.j}^2}{n\pi_{.j}}(1)
\]
\[
= \sum_{i,j} \frac{(n_{ij} - n_{i.}\pi_{.j} - n_{.j}\pi_{i.} + e_{ij})^2}{e_{ij}} + \sum_i \frac{(n_{i.} - e_{i.})^2}{e_{i.}} + \sum_j \frac{(n_{.j} - e_{.j})^2}{e_{.j}}.
\]
This confirms the algebraic simplification provided in the question prompt. This result is a specific instance
of Lancaster-Irwin partitioning for contingency tables.
Asymptotic Distribution Under the null hypothesis H0 : πij = πi. π.j with known marginal probabilities
πi. and π.j :
• T →d χ²_{kl−1} (since the cell counts follow a multinomial distribution with kl cells, and probabilities sum to 1).
• T_1 →d χ²_{k−1} (since the row sums n_{i.} follow a multinomial distribution with k cells and known probabilities π_{i.}).
• T_2 →d χ²_{l−1} (since the column sums n_{.j} follow a multinomial distribution with l cells and known probabilities π_{.j}).
By Cochran's theorem, or by the properties of partitioned chi-squared statistics for multinomial data, the statistic T_3 = T − T_1 − T_2 measures the interaction (departure from independence) after accounting for the marginal discrepancies. Its asymptotic distribution under H_0 is chi-squared with degrees of freedom equal to the difference of the component degrees of freedom:
\[
(kl - 1) - (k - 1) - (l - 1) = kl - k - l + 1 = (k - 1)(l - 1).
\]
Thus, under H_0, as n → ∞,
\[
T_3 \xrightarrow{d} \chi^2_{(k-1)(l-1)}.
\]
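Numerical check (our own setup: k = 3, l = 2, known marginals, independence true). Under H₀ the simulated T₃ should have mean ≈ 2 and variance ≈ 4, the moments of χ²_{(k−1)(l−1)} = χ²₂.

import numpy as np

rng = np.random.default_rng(3)
pi_row = np.array([0.2, 0.3, 0.5])          # known row marginals (k = 3)
pi_col = np.array([0.4, 0.6])               # known column marginals (l = 2)
pi = np.outer(pi_row, pi_col)               # cell probabilities under independence
n, reps = 2000, 5000
E = n * pi                                  # expected cell counts e_ij
T3s = np.empty(reps)

for b in range(reps):
    nij = rng.multinomial(n, pi.ravel()).reshape(3, 2)
    T = ((nij - E)**2 / E).sum()
    T1 = ((nij.sum(axis=1) - n * pi_row)**2 / (n * pi_row)).sum()
    T2 = ((nij.sum(axis=0) - n * pi_col)**2 / (n * pi_col)).sum()
    T3s[b] = T - T1 - T2

print(T3s.mean(), T3s.var())   # chi^2_2 has mean 2 and variance 4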
Therefore,
\[
\frac{1}{s_n^2} \sum_{k=1}^{n} E\bigl[ (X_k - \mu_k)^2 \, \mathbf{1}(A_{nk}) \bigr] \to 0 \qquad \text{as } n \to \infty.
\]
Since this holds for all ϵ > 0, the Lindeberg-Feller condition is satisfied. ■
\[
P\left( X_n = \frac{(n+1)^{i/2}}{n} \right) = \frac{n}{(n+1)^i}, \qquad i, n \in \mathbb{N}.
\]
Solution. First, we verify that the probability mass function (PMF) is valid for each Xn :
\[
\sum_{i=1}^{\infty} \frac{n}{(n+1)^i} = n \sum_{i=1}^{\infty} \frac{1}{(n+1)^i} = n \cdot \frac{1}{n+1} \cdot \frac{1}{1 - \frac{1}{n+1}} = n \cdot \frac{1}{n} = 1.
\]
Let µn = E[Xn ]. To apply the weak law of large numbers via Markov’s criterion, we examine whether
\[
\frac{1}{n} \sum_{k=1}^{n} E\bigl[ |X_k - \mu_k| \bigr] \to 0 \qquad \text{as } n \to \infty.
\]
Simplifying: the mean is
\[
\mu_n = E[X_n] = \sum_{i=1}^{\infty} \frac{(n+1)^{i/2}}{n} \cdot \frac{n}{(n+1)^i} = \sum_{i=1}^{\infty} \frac{1}{(n+1)^{i/2}} = \frac{1}{\sqrt{n+1} - 1}.
\]
Write x_i = (n+1)^{i/2}/n and p_i = n/(n+1)^i, so that Σ_i x_i p_i = µ_n, Σ_i p_i = 1, and x_1 p_1 = 1/√(n+1). For n ≥ 2 one checks that x_1 < µ_n < x_i for all i ≥ 2, hence
\[
E|X_n - \mu_n| = (\mu_n - x_1) p_1 + \sum_{i=2}^{\infty} (x_i - \mu_n) p_i = 2\,(\mu_n p_1 - x_1 p_1) = 2\left( \frac{\sqrt{n+1} + 1}{n+1} - \frac{1}{\sqrt{n+1}} \right) = \frac{2}{n+1}.
\]
Now consider:
\[
\frac{1}{n} \sum_{k=1}^{n} E|X_k - \mu_k| = \frac{1}{n} \sum_{k=1}^{n} \frac{2}{k+1} = \frac{2}{n} \sum_{k=1}^{n} \frac{1}{k+1}.
\]
Since Σ_{k=1}^n 1/(k+1) ∼ ln n, we have
\[
\frac{1}{n} \sum_{k=1}^{n} E|X_k - \mu_k| \sim \frac{2 \ln n}{n} \to 0.
\]
Hence (1/n) Σ_{k=1}^n (X_k − µ_k) → 0 in L¹ and therefore in probability, so the weak law of large numbers holds for this sequence. ■
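Numerical check (ours). One draw of each X_k suffices to illustrate the weak law; the index i is geometric with success probability k/(k+1), since P(i) = k/(k+1)^i. Note Var(X_k) = ∞ here, so only convergence in probability is at work.

import numpy as np

rng = np.random.default_rng(4)
N = 20_000
ks = np.arange(1, N + 1)

i = rng.geometric(p=ks / (ks + 1.0))       # P(i) = k/(k+1)^i on i = 1, 2, ...
X = (ks + 1.0) ** (i / 2) / ks             # one realization of each X_k
mu = 1.0 / (np.sqrt(ks + 1.0) - 1.0)       # exact means mu_k
print(np.mean(X - mu))                     # typically of order 2 ln N / N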
Since F is a distribution function, we can write F (x) = P(X1 ≤ x1 , . . . , Xk ≤ xk ) for a random vector
X = (X1 , . . . , Xk ).
For any b = (b1 , . . . , bk ) ∈ Rk , let us define
\[
F(b^-) = \lim_{\substack{x \to b \\ x_i < b_i \,\forall i}} F(x).
\]
This is because S precisely captures the set of points that contribute to the jump of F at a. Since F (a) =
F (a− ) by continuity, we have P(S) = 0.
(If direction):      Assume P(S) = 0. We need to show F is continuous at a, i.e., F (a) = F (a− ).
Let S_i = {x ∈ R^k : x_i = a_i, x_j ≤ a_j ∀ j ≠ i}, so that S = ∪_{i=1}^k S_i. Then
\[
F(a) - F(a^-) = P(S) \le \sum_{i=1}^{k} P(S_i) = 0,
\]
where the inequality follows from the sub-additivity of probability and the equality to zero follows from our assumption that P(S) = 0 (each S_i ⊆ S). Since F(a) − F(a^−) = 0, we have F(a) = F(a^−), implying that F is continuous at a.
Therefore, F is continuous at a if and only if P(S) = 0.                                                              ■
Question 9. Show that if X_n →p a and f is differentiable at a, then
\[
f(X_n) = f(a) + (X_n - a) f'(a) + (X_n - a) Z_n \qquad \text{for some } Z_n \xrightarrow{p} 0.
\]
Proof. Since f is differentiable at a, by definition, there exists a function r(x) such that for all x near a, we
have
                                   f (x) = f (a) + (x − a)f ′ (a) + (x − a)r(x),
where limx→a r(x) = 0.
Define a function g : R → R as follows:
\[
g(x) = \begin{cases} \dfrac{f(x) - f(a) - (x - a) f'(a)}{x - a}, & \text{if } x \neq a, \\ 0, & \text{if } x = a. \end{cases}
\]
Define Z_n = g(X_n). Since X_n →p a and g is continuous at a, it follows from the Continuous Mapping Theorem that Z_n = g(X_n) →p g(a) = 0.
Hence, we have
                                           f (Xn ) = f (a) + (Xn − a)f ′ (a) + (Xn − a)Zn ,
where Z_n →p 0, completing the proof. ■
Question 10 (Compulsory). Let Tn = (T1n , . . . , Tkn )⊤ be a sequence of k-dimensional statistics for n ≥ 1,
with the asymptotic behavior:
\[
\sqrt{n}\,(T_n - \theta) \xrightarrow{d} N_k(0, \Sigma),
\]
where θ ∈ Rk and Σ is a positive semi-definite covariance matrix. Let gi : Rk → R, i = 1, . . . , ℓ, be functions
that are differentiable. Determine the asymptotic distribution of
\[
\sqrt{n}\,\bigl( g(T_n) - g(\theta) \bigr), \qquad g = (g_1, \ldots, g_\ell)^\top.
\]
Lemma 10.2. Let T_n ∈ R^k be a sequence such that √n(T_n − θ) is O_p(1). If ∥R_n∥ = o(∥T_n − θ∥), then √n R_n →p 0.
Proof. Using Lemma 10.1, we can expand g(T_n) via a first-order Taylor series around θ:
\[
g(T_n) = g(\theta) + G\,(T_n - \theta) + R_n, \qquad \|R_n\| = o(\|T_n - \theta\|),
\]
where G is the ℓ × k Jacobian matrix of g at θ.
Given that √n(T_n − θ) →d N_k(0, Σ), by the continuous mapping theorem, we get:
\[
G\,\sqrt{n}\,(T_n - \theta) \xrightarrow{d} G \cdot N_k(0, \Sigma) = N_\ell(0, G \Sigma G^\top).
\]
Moreover, since √n(T_n − θ) is convergent in distribution and hence O_p(1), Lemma 10.2 gives √n R_n →p 0.
Finally, by Slutsky’s theorem, which allows us to combine converging sequences, we conclude:
\[
\sqrt{n}\,\bigl( g(T_n) - g(\theta) \bigr) = G\,\sqrt{n}\,(T_n - \theta) + \sqrt{n}\,R_n \xrightarrow{d} N_\ell(0, G \Sigma G^\top). \qquad \blacksquare
\]
Question 11. Suppose Xi are i.i.d. random variables with a finite fourth moment. Let
\[
\gamma = \frac{\sigma}{\mu}
\]
represent the population coefficient of variation (assume µ ≠ 0). Determine the asymptotic distribution of
\[
\sqrt{n} \left( \frac{s_n}{\bar{X}_n} - \gamma \right),
\]
where X̄_n = (1/n) Σ_{i=1}^n X_i and s_n² = (1/n) Σ_{i=1}^n (X_i − X̄_n)².
Solution. Let µ = E(X_i) ≠ 0 and σ² = Var(X_i). The coefficient of variation is γ = σ/µ. We aim to find the asymptotic distribution of √n(s_n/X̄_n − γ). We will also denote the third and fourth central moments as µ_3 = E(X_i − µ)³ and µ_4 = E(X_i − µ)⁴.
Step 1: Asymptotic Distribution of X̄_n. By the Central Limit Theorem (CLT), for i.i.d. random variables with finite variance:
\[
\sqrt{n}\,(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2).
\]
Step 2: Asymptotic Distribution of s_n². We now analyze the asymptotic behavior of s_n². Expanding:
\[
s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 = \frac{1}{n} \sum_{i=1}^{n} \bigl[ (X_i - \mu) - (\bar{X}_n - \mu) \bigr]^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 - (\bar{X}_n - \mu)^2.
\]
(i) By the CLT applied to the i.i.d. variables (X_i − µ)², which have mean σ² and variance µ_4 − σ⁴,
\[
\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 - \sigma^2 \right) \xrightarrow{d} N(0, \tau^2), \qquad \text{where } \tau^2 = \mu_4 - \sigma^4.
\]
(ii) The second term, (X̄_n − µ)², is of order O_p(n^{−1}), as √n(X̄_n − µ) = O_p(1). Therefore, √n(X̄_n − µ)² = O_p(n^{−1/2}) = o_p(1), which means its contribution becomes negligible in the limit. Hence √n(s_n² − σ²) →d N(0, τ²).
Step 3: Joint Asymptotic Distribution. To use the multivariate delta method for the ratio s_n/X̄_n, we need the joint asymptotic distribution of √n(s_n² − σ²) and √n(X̄_n − µ). The asymptotic covariance between these two quantities is
\[
\mathrm{Cov}\bigl( \sqrt{n}(\bar{X}_n - \mu),\, \sqrt{n}(s_n^2 - \sigma^2) \bigr) \to \mu_3,
\]
so that
\[
\sqrt{n} \begin{pmatrix} s_n^2 - \sigma^2 \\ \bar{X}_n - \mu \end{pmatrix} \xrightarrow{d} N_2 \left( 0, \begin{pmatrix} \mu_4 - \sigma^4 & \mu_3 \\ \mu_3 & \sigma^2 \end{pmatrix} \right).
\]
Step 4: Asymptotic Distribution of s_n/X̄_n. Applying the multivariate delta method to the function g(x, y) = √x/y at (σ², µ), we compute the partial derivatives:
\[
\frac{\partial g}{\partial x} = \frac{1}{2\sigma\mu}, \qquad \frac{\partial g}{\partial y} = -\frac{\sigma}{\mu^2}.
\]
Using the delta method, we find:
\[
\sqrt{n} \left( \frac{s_n}{\bar{X}_n} - \frac{\sigma}{\mu} \right) \xrightarrow{d} N(0, V),
\]
where
\[
V = \frac{1}{4\sigma^2\mu^2}(\mu_4 - \sigma^4) - \frac{\mu_3}{\mu^3} + \frac{\sigma^4}{\mu^4}.
\]
(The cross term is 2 · (1/(2σµ)) · (−σ/µ²) · µ_3 = −µ_3/µ³.)
Step 5: Final Result. Since γ = σ/µ, the asymptotic distribution of √n(s_n/X̄_n − γ) is:
\[
\sqrt{n} \left( \frac{s_n}{\bar{X}_n} - \gamma \right) \xrightarrow{d} N\left( 0,\; \frac{\mu_4 - \sigma^4}{4\sigma^2\mu^2} + \gamma^4 - \frac{\mu_3}{\mu^3} \right).
\]
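Numerical check (our own sketch; Gamma data chosen because its central moments are available in closed form). For X ∼ Gamma(a, 1): µ = σ² = a, µ₃ = 2a, µ₄ = 3a² + 6a.

import numpy as np

rng = np.random.default_rng(5)
a, n, reps = 4.0, 2000, 2000               # Gamma(shape a, scale 1)
mu, sig2 = a, a
mu3, mu4 = 2 * a, 3 * a**2 + 6 * a

V = (mu4 - sig2**2) / (4 * sig2 * mu**2) - mu3 / mu**3 + sig2**2 / mu**4

X = rng.gamma(a, 1.0, size=(reps, n))
cv = X.std(axis=1) / X.mean(axis=1)        # s_n / Xbar_n (std uses the 1/n form)
print("predicted:", V, " empirical:", n * cv.var())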
Question 12. Suppose (Xi , Yi ), for i = 1, . . . , n, is an i.i.d. sample from the bivariate normal distribution:
\[
BVN(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho),
\]
where ρ is the population correlation coefficient. Define the sample correlation coefficient as:
\[
r_n = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}},
\]
with sample means X̄ = (1/n) Σ_{i=1}^n X_i and Ȳ = (1/n) Σ_{i=1}^n Y_i. Show that:
\[
\sqrt{n}\,(r_n - \rho) \xrightarrow{d} N\bigl( 0, (1 - \rho^2)^2 \bigr).
\]
Proof. Let (X_i, Y_i), i = 1, ..., n, be an i.i.d. sample from BVN(µ_X, µ_Y, σ_X², σ_Y², ρ). Define:
\[
S_{XY} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}), \quad S_{XX} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2, \quad S_{YY} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2.
\]
Then:
\[
r_n = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}.
\]
Step 2: Joint asymptotics of S_XY, S_XX, S_YY. Applying the multivariate Central Limit Theorem and delta method:
\[
\sqrt{n} \begin{pmatrix} S_{XY} - \rho\sigma_X\sigma_Y \\ S_{XX} - \sigma_X^2 \\ S_{YY} - \sigma_Y^2 \end{pmatrix} \xrightarrow{d} N_3(0, \Sigma),
\]
where Σ is given by:
\[
\Sigma = \begin{pmatrix}
\sigma_X^2\sigma_Y^2(1 + \rho^2) & 2\rho\sigma_X^3\sigma_Y & 2\rho\sigma_X\sigma_Y^3 \\
2\rho\sigma_X^3\sigma_Y & 2\sigma_X^4 & 2\rho^2\sigma_X^2\sigma_Y^2 \\
2\rho\sigma_X\sigma_Y^3 & 2\rho^2\sigma_X^2\sigma_Y^2 & 2\sigma_Y^4
\end{pmatrix}.
\]
(These entries follow from Isserlis' formula for moments of the bivariate normal.)
Step 3: Delta method. Define g(a, b, c) = a/√(bc), so that r_n = g(S_{XY}, S_{XX}, S_{YY}) and ρ = g(ρσ_Xσ_Y, σ_X², σ_Y²). The gradient at this point is
\[
\nabla g = \left( \frac{1}{\sigma_X\sigma_Y},\; -\frac{\rho}{2\sigma_X^2},\; -\frac{\rho}{2\sigma_Y^2} \right)^{\top}.
\]
Then:
\[
V = \nabla g^\top \Sigma \nabla g = (1 + \rho^2) + \frac{\rho^2}{2} + \frac{\rho^2}{2} - 2\rho^2 - 2\rho^2 + \rho^4 = (1 - \rho^2)^2.
\]
Hence √n(r_n − ρ) →d N(0, (1 − ρ²)²). ■
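Numerical check (ours; ρ and n are arbitrary choices). The scaled variance n·Var(r_n) should approach (1 − ρ²)².

import numpy as np

rng = np.random.default_rng(6)
rho, n, reps = 0.6, 2000, 3000
C = np.array([[1.0, rho], [rho, 1.0]])
rs = np.empty(reps)

for b in range(reps):
    xy = rng.multivariate_normal([0.0, 0.0], C, size=n)
    rs[b] = np.corrcoef(xy.T)[0, 1]        # sample correlation r_n

print("empirical:", n * rs.var(), " theory:", (1 - rho**2)**2)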
Determine the joint asymptotic distribution of the first r sample raw moments (M1,n , M2,n , . . . , Mr,n ).
Solution. Each Mk,n is the sample mean of the i.i.d. sequence X1k , . . . , Xnk . Let µk = E[Xik ], the population’s
k-th raw moment.
Because E[X_i^{2r}] < ∞, all lower-order moments (up to order 2r) also exist. In particular, for each 1 ≤ k ≤ r, Var(X_i^k) = µ_{2k} − µ_k² is finite.
Step 1: Define the vector of sample moments.                     Let Mn = (M1,n , M2,n , . . . , Mr,n )⊤ . Then:
\[
M_n = \frac{1}{n} \sum_{i=1}^{n} Y_i, \qquad \text{where } Y_i = (X_i, X_i^2, \ldots, X_i^r)^\top.
\]
Step 2: Compute mean vector and covariance matrix. Define µ = (µ_1, µ_2, ..., µ_r)^⊤ and let the covariance matrix Σ = (σ_{jk})_{1≤j,k≤r} be given by
\[
\sigma_{jk} = \mathrm{Cov}(X_i^j, X_i^k) = \mu_{j+k} - \mu_j \mu_k.
\]
Step 3: Apply the multivariate CLT.                  By the multivariate central limit theorem:
\[
\sqrt{n}\,(M_n - \mu) \xrightarrow{d} N_r(0, \Sigma).
\]
Conclusion: The joint asymptotic distribution of (M1,n , M2,n , . . . , Mr,n ) is multivariate normal with mean
vector µ and covariance matrix Σ/n.
Question 14. Let X_1, ..., X_n ~ iid N(θ, θ), where θ > 0. Find the maximum likelihood estimator of θ. The density is
\[
f(x_i; \theta) = \frac{1}{\sqrt{2\pi\theta}} \exp\left( -\frac{(x_i - \theta)^2}{2\theta} \right), \qquad \theta > 0.
\]
The likelihood is
\[
L(\theta; x) = (2\pi\theta)^{-n/2} \exp\left( -\frac{1}{2\theta} \sum_{i=1}^{n} (x_i - \theta)^2 \right).
\]
Taking the log-likelihood:
\[
\ell(\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\theta - \frac{1}{2\theta}\sum x_i^2 + n\bar{X}_n - \frac{n\theta}{2}.
\]
Differentiating:
\[
\frac{\partial \ell}{\partial \theta} = -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum x_i^2 - \frac{n}{2}.
\]
Setting this to zero and solving:
\[
\theta^2 + \theta - M'_{2,n} = 0, \qquad M'_{2,n} = \frac{1}{n}\sum x_i^2.
\]
The positive solution gives the MLE:
\[
\hat{\theta}_n = \frac{-1 + \sqrt{1 + 4M'_{2,n}}}{2}.
\]
A second-derivative check confirms it is a maximum. The Fisher information for the sample is
\[
I(\theta) = \frac{n(1 + 2\theta)}{2\theta^2}.
\]
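Numerical check (ours). The closed-form MLE can be evaluated directly on simulated N(θ, θ) data; θ = 3 is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(7)
theta, n = 3.0, 10_000
x = rng.normal(theta, np.sqrt(theta), size=n)   # N(theta, theta): sd = sqrt(theta)

m2 = np.mean(x**2)                              # M'_{2,n}
theta_hat = (-1 + np.sqrt(1 + 4 * m2)) / 2      # positive root of theta^2 + theta - m2
print(theta_hat)                                # should be close to theta = 3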
Question 15. Suppose we have k independent samples of sizes n1 , . . . , nk drawn from bivariate normal
distributions:
\[
N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_i \\ \rho_i & 1 \end{pmatrix} \right)
\]
for each i = 1, . . . , k. Let ri denote the sample correlation coefficient for the i-th sample. We want to test the
null hypothesis:
                                                H0 : ρ1 = · · · = ρk = ρ.
Define the Fisher z-transform Z_i = tanh^{−1}(r_i) and the weighted average:
\[
\bar{Z} = \frac{\sum_{i=1}^{k} n_i Z_i}{\sum_{i=1}^{k} n_i}.
\]
Show that, under H_0, the test statistic T = Σ_{i=1}^k n_i (Z_i − Z̄)² is asymptotically χ²_{k−1}.
Proof. From previous results, we know that √n_i (r_i − ρ_i) →d N(0, (1 − ρ_i²)²). Applying the Delta method to the function g(r) = tanh^{−1}(r), whose derivative is g′(r) = 1/(1 − r²), we define ζ_i = tanh^{−1}(ρ_i). Then:
\[
\sqrt{n_i}\,(Z_i - \zeta_i) \xrightarrow{d} N(0, 1).
\]
Therefore, for large ni , Zi ∼ N (ζi , 1/ni ) approximately and the Zi are independent.
Under H0 , all ζi = ζ = tanh−1 (ρ), so:
                                              Zi ∼ N (ζ, 1/ni )         for all i.
Define Y_i = √n_i (Z_i − ζ), so Y_i ∼ N(0, 1). We write:
\[
Z_i = \zeta + \frac{Y_i}{\sqrt{n_i}}, \qquad \bar{Z} = \zeta + \frac{\sum_{j=1}^{k} \sqrt{n_j}\, Y_j}{\sum_{j=1}^{k} n_j}.
\]
Then:
\[
Z_i - \bar{Z} = \frac{Y_i}{\sqrt{n_i}} - \frac{\sum_{j=1}^{k} \sqrt{n_j}\, Y_j}{\sum_{j=1}^{k} n_j}.
\]
The test statistic becomes:
\[
T = \sum_{i=1}^{k} Y_i^2 - \frac{\left( \sum_{i=1}^{k} \sqrt{n_i}\, Y_i \right)^2}{\sum_{i=1}^{k} n_i}.
\]
Let Y = (Y_1, ..., Y_k)^⊤ ∼ N_k(0, I_k), and define a = (1/√N)(√n_1, ..., √n_k)^⊤, where N = Σ n_i. Note ∥a∥² = 1. Then:
\[
T = Y^\top (I_k - a a^\top) Y.
\]
Define Q = I_k − aa^⊤. Then Q is symmetric and idempotent: Q² = Q.
Hence, T = Y^⊤QY ∼ χ²_{rank(Q)}. Since tr(Q) = k − tr(aa^⊤) = k − 1,
\[
T \xrightarrow{d} \chi^2_{k-1}.
\]
                                                                                                                      ■
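Numerical check (our own sketch; sample sizes and ρ are arbitrary, and Var(Z_i) ≈ 1/n_i is used as in the proof). Under H₀ the simulated T should look like χ²_{k−1} = χ²₂.

import numpy as np

rng = np.random.default_rng(8)
rho, sizes, reps = 0.5, [50, 80, 120], 5000
C = np.array([[1.0, rho], [rho, 1.0]])
Ts = np.empty(reps)

for b in range(reps):
    Z = np.array([np.arctanh(np.corrcoef(
            rng.multivariate_normal([0.0, 0.0], C, size=ni).T)[0, 1])
        for ni in sizes])                      # Fisher z-transforms Z_i
    n = np.asarray(sizes, dtype=float)
    Zbar = (n * Z).sum() / n.sum()             # weighted average
    Ts[b] = (n * (Z - Zbar)**2).sum()          # T = sum n_i (Z_i - Zbar)^2

print(Ts.mean(), Ts.var())                     # chi^2_2: mean 2, variance 4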
Question 16. Let X1 , . . . , Xn be an i.i.d. sample from N (0, 1) distribution with sample size n = 5. Consider
the function g(x) = ex . We are interested in the distribution of the statistic
\[
Y_n = \frac{\sqrt{n}\,(g(\bar{X}_n) - g(0))}{g'(0)} = \sqrt{n}\,(e^{\bar{X}_n} - 1).
\]
The standard delta method approximates the cumulative distribution function (CDF) of Yn by F1 (z) = Φ(z),
where Φ is the standard normal CDF. A two-term delta√method can√provide a refined  √ approximation, let’s
call it F2 (z). The exact CDF is given by FExact (z) = Φ( n ln(1 + z/ n)) for z > − n.
Compare the accuracy of the standard delta method and the two-term delta method approximation (derived
in the solution) against the exact distribution for this specific case (n = 5, g(x) = ex , underlying N (0, 1)),
using the table provided in the solution.
Solution.
Derivation of the Two-Term Delta Method Approximation. Let T_n = X̄_n. Since X_i ∼ N(0, 1), we have µ = E[T_n] = 0 and σ_n² = Var(T_n) = 1/n. Let Z_n = √n T_n = √n X̄_n, so Z_n ∼ N(0, 1). We expand g(T_n) = e^{T_n} around µ = 0:
\[
g(T_n) = g(0) + g'(0) T_n + \tfrac{1}{2} g''(0) T_n^2 + O_p(n^{-3/2}).
\]
Since g(x) = e^x, g′(x) = e^x, g″(x) = e^x, we have g(0) = 1, g′(0) = 1, g″(0) = 1. Thus
\[
e^{\bar{X}_n} = 1 + \bar{X}_n + \tfrac{1}{2}\bar{X}_n^2 + O_p(n^{-3/2}) = 1 + \frac{Z_n}{\sqrt{n}} + \frac{Z_n^2}{2n} + O_p(n^{-3/2}).
\]
The two-term approximation therefore considers Y_n ≈ Z + Z²/(2√n), where Z ∼ N(0, 1). Let h(Z) = Z + Z²/(2√n). We approximate P(Y_n ≤ z) by P(h(Z) ≤ z). The quadratic equation h(Z) = z, i.e. (1/(2√n))Z² + Z − z = 0, has solutions
\[
Z_{1,2} = \frac{-1 \pm \sqrt{1 + 2z/\sqrt{n}}}{1/\sqrt{n}} = \sqrt{n}\left( -1 \pm \sqrt{1 + \frac{2z}{\sqrt{n}}} \right).
\]
These solutions are real if 1 + 2z/√n ≥ 0, i.e., z ≥ −√n/2. The function h(Z) has a minimum value of −√n/2 at Z = −√n. For z > −√n/2, the event h(Z) ≤ z is equivalent to Z_1 ≤ Z ≤ Z_2, where
\[
Z_1 = \sqrt{n}\left( -1 - \sqrt{1 + 2z/\sqrt{n}} \right), \qquad Z_2 = \sqrt{n}\left( -1 + \sqrt{1 + 2z/\sqrt{n}} \right).
\]
Hence the two-term approximation is F_2(z) = Φ(Z_2) − Φ(Z_1) for z > −√n/2, and F_2(z) = 0 for z ≤ −√n/2.
Since X̄_n ∼ N(0, 1/n), √n X̄_n ∼ N(0, 1). Hence
\[
F_{\mathrm{Exact}}(z) = P\bigl( \sqrt{n}\,\bar{X}_n \le \sqrt{n}\,\ln(1 + z/\sqrt{n}) \bigr) = \Phi\bigl( \sqrt{n}\,\ln(1 + z/\sqrt{n}) \bigr)
\]
for 1 + z/√n > 0, i.e., z > −√n.
Comparison Table. The following table compares the CDF values F_1(z), F_2(z), and F_Exact(z) for n = 5 at representative values of z.

z        F_1(z) = Φ(z)    F_2(z)    F_Exact(z)
−1.6     0.0548           0.0000    0.0025
−0.2     0.4207           0.4169    0.4170
 1.6     0.9452           0.8944    0.8863
Analysis of Accuracy. The table provides values for the CDF of Y_5 = √5(e^{X̄_5} − 1) using the three methods.
   • Standard Delta Method (F1 (z) = Φ(z)): This normal approximation performs adequately near
     z = 0 but deteriorates in the tails. It systematically overestimates the true CDF at both extremes,
     with substantial errors at z = −1.6 (F1 ≈ 0.0548 vs Exact 0.0025) and z = 1.6 (F1 ≈ 0.9452 vs Exact
     0.8863).
• Two-Term Delta Method (F_2(z)): This refined approximation, based on Y_n ≈ Z + Z²/(2√n), significantly improves accuracy throughout. For z < −√5/2 ≈ −1.118, it correctly gives F_2(z) = 0, and near z = 0 its accuracy is remarkable (at z = −0.2, F_2 = 0.4169 vs Exact 0.4170). In both tails, F_2(z) consistently outperforms F_1(z) in approximating the exact CDF.
• Exact CDF (F_Exact(z)): Y_n is a shifted and scaled log-normal variable and hence right-skewed, explaining why the symmetric normal approximation performs poorly in the tails.
Conclusion For this small sample case (n = 5), the standard delta method provides only a rough approx-
imation, particularly in the tails where the exponential transformation introduces skewness. The two-term
delta method, by incorporating the quadratic term from the Taylor expansion, captures this asymmetry better
and delivers a substantially more accurate approximation across all quantiles. This demonstrates the value
of higher-order approximations for non-linear transformations with small samples.
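Numerical check (ours). The three CDFs can be computed directly from the formulas derived above; this reproduces the table rows.

import numpy as np
from scipy.stats import norm

n = 5

def F1(z):                                     # standard delta method
    return norm.cdf(z)

def F2(z):                                     # two-term delta method
    disc = 1 + 2 * z / np.sqrt(n)
    if disc < 0:                               # z below the minimum of h(Z)
        return 0.0
    z1 = np.sqrt(n) * (-1 - np.sqrt(disc))
    z2 = np.sqrt(n) * (-1 + np.sqrt(disc))
    return norm.cdf(z2) - norm.cdf(z1)

def F_exact(z):                                # exact CDF, valid for z > -sqrt(n)
    return norm.cdf(np.sqrt(n) * np.log1p(z / np.sqrt(n)))

for z in [-1.6, -0.2, 1.6]:
    print(f"z={z:+.1f}  F1={F1(z):.4f}  F2={F2(z):.4f}  Exact={F_exact(z):.4f}")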
Question 17. Let X1 , . . . , Xn be i.i.d. random variables with distribution function F , possessing a density
f . Define the population p-th quantile as ξp = F −1 (p), where 0 < p < 1. Suppose that f (ξp ) > 0 and that f
is continuous near ξp . Let Yp,n denote the sample p-th quantile based on the sample X1 , . . . , Xn .
Provide Bahadur's representation of Y_{p,n} and use it to determine the asymptotic distribution of √n(Y_{p,n} − ξ_p).
Solution.
Bahadur’s Representation Bahadur’s representation offers a linear approximation for the sample quantile
Yp,n under regularity conditions. Specifically:
\[
Y_{p,n} = \xi_p + \frac{p - F_n(\xi_p)}{f(\xi_p)} + R_n,
\]
where F_n denotes the empirical distribution function of the sample and the remainder satisfies √n R_n →p 0.
Let Z_i = 1(X_i ≤ ξ_p). Then Z_1, ..., Z_n are i.i.d. Bernoulli with E[Z_i] = p and Var(Z_i) = p(1 − p). Hence,
\[
F_n(\xi_p) = \frac{1}{n}\sum_{i=1}^{n} Z_i \quad \Longrightarrow \quad \sqrt{n}\,(F_n(\xi_p) - p) \xrightarrow{d} N\bigl( 0, p(1-p) \bigr).
\]
Combining this with Bahadur's representation and Slutsky's theorem (the remainder √n R_n vanishes in probability),
\[
\sqrt{n}\,(Y_{p,n} - \xi_p) \xrightarrow{d} N\left( 0, \frac{p(1-p)}{f(\xi_p)^2} \right).
\]
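Numerical check (ours; p = 1/2 and N(0, 1) data are illustrative choices). The sample median should satisfy n·Var(Y_{p,n}) → p(1 − p)/f(ξ_p)² = π/2.

import numpy as np

rng = np.random.default_rng(9)
n, reps = 2001, 10_000
med = np.median(rng.normal(size=(reps, n)), axis=1)   # sample medians
print("empirical:", n * med.var(), " theory:", np.pi / 2)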
\[
\max_{1 \le k \le n} \frac{\sigma_k^2}{s_n^2} \to 0 \qquad \text{as } n \to \infty,
\]
Proof that the condition implies UAN. The Lindeberg-Feller condition states that for all ϵ > 0,
\[
\frac{1}{s_n^2} \sum_{k=1}^{n} E\left[ (X_k - \mu_k)^2\, \mathbf{1}\left( \frac{|X_k - \mu_k|}{s_n} > \epsilon \right) \right] \to 0 \qquad \text{as } n \to \infty.
\]
Fix ϵ > 0. For each k,
\[
\sigma_k^2 = E\bigl[ (X_k - \mu_k)^2 \mathbf{1}(|X_k - \mu_k| \le \epsilon s_n) \bigr] + E\bigl[ (X_k - \mu_k)^2 \mathbf{1}(|X_k - \mu_k| > \epsilon s_n) \bigr] \le \epsilon^2 s_n^2 + \sum_{j=1}^{n} E\left[ (X_j - \mu_j)^2 \mathbf{1}\left( \frac{|X_j - \mu_j|}{s_n} > \epsilon \right) \right].
\]
Dividing by s_n² and taking the maximum over k, the Lindeberg condition gives lim sup_n max_k σ_k²/s_n² ≤ ϵ². Since ϵ > 0 is arbitrary,
\[
\max_{1 \le k \le n} \frac{\sigma_k^2}{s_n^2} \to 0 \qquad \text{as } n \to \infty. \qquad \blacksquare
\]
Proof that the condition implies asymptotic normality. To show asymptotic normality, we use the characteristic function of Z_n = Σ_{k=1}^n (X_k − µ_k)/s_n. By independence,
\[
\phi_{Z_n}(t) = E\bigl[ \exp(itZ_n) \bigr] = \prod_{k=1}^{n} E\left[ \exp\left( it\, \frac{X_k - \mu_k}{s_n} \right) \right].
\]
A second-order expansion of each factor gives
\[
E\left[ \exp\left( it\, \frac{X_k - \mu_k}{s_n} \right) \right] = 1 - \frac{t^2 \sigma_k^2}{2 s_n^2} + o\left( \frac{\sigma_k^2}{s_n^2} \right),
\]
where the Lindeberg condition (via the resulting UAN property) makes the error term uniform in k. Taking logarithms, summing over k, and exponentiating, we get
\[
\phi_{Z_n}(t) \to \exp\left( -\frac{t^2}{2} \right),
\]
which is the characteristic function of the standard normal distribution. By the Lévy continuity theorem,
Z_n →d N(0, 1), which implies
\[
P(Z_n \le z) \to \Phi(z) \quad \text{for all } z \in \mathbb{R}. \qquad \blacksquare
\]
Question 19. Show that under the uniform asymptotic negligibility (UAN) condition:
\[
\max_{1 \le k \le n} \frac{\sigma_k^2}{s_n^2} \to 0 \qquad \text{as } n \to \infty,
\]
we have
                                  max P (Ynk > ϵ) → 0                   for any ϵ > 0,
                                  1≤k≤n
where
                                                         n
                                 Xk − µk                 X
                         Ynk =           ,       s2n =         σk2 ,     and σk2 = Var(Xk ).
                                   sn
                                                         k=1
Proof. Let {X_k}_{k=1}^n be independent random variables with E(X_k) = μ_k and Var(X_k) = σ_k². Define the standardized variables

        Y_{nk} = (X_k − μ_k)/s_n,   where s_n² = Σ_{k=1}^n σ_k².

By Chebyshev's inequality,

        P(|Y_{nk}| > ϵ) = P(|X_k − μ_k| > ϵ s_n) ≤ Var(X_k)/(ϵ² s_n²) = σ_k²/(ϵ² s_n²).

Taking the maximum over k,

        max_{1≤k≤n} P(|Y_{nk}| > ϵ) ≤ (1/ϵ²) max_{1≤k≤n} σ_k²/s_n².
Under the UAN condition, max_{1≤k≤n} σ_k²/s_n² → 0 as n → ∞. Therefore,

        max_{1≤k≤n} P(|Y_{nk}| > ϵ) → 0   for any ϵ > 0,

which is the required conclusion.                                                             ■
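A sketch comparing the exact tails with this Chebyshev bound, under the assumed model X_k ~ N(0, k):

import numpy as np
from math import erfc, sqrt

# Sketch: X_k ~ N(0, k), so Y_nk ~ N(0, k/s_n^2) and
# P(|Y_nk| > eps) = erfc(eps * s_n / (sigma_k * sqrt(2))).
eps = 0.5
for n in [10, 100, 1000]:
    sigma2 = np.arange(1, n + 1, dtype=float)
    s2n = sigma2.sum()
    max_tail = max(erfc(eps * sqrt(s2n / s2) / sqrt(2)) for s2 in sigma2)
    bound = sigma2.max() / (eps**2 * s2n)  # (1/eps^2) max_k sigma_k^2/s_n^2
    print(n, max_tail, "<=", bound)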
Question 20. Show that X_n →p X for some random variable X if and only if {X_n} is Cauchy in probability, i.e.,

        ∀ ϵ > 0,   lim_{n,m→∞} P(|X_n − X_m| > ϵ) = 0.
Lemma 20.1. If a sequence of random variables {X_n} is Cauchy in probability, then there exists a subsequence {X_{n_k}} and a random variable X such that X_{n_k} →a.s. X as k → ∞.
Proof of Lemma. Since {Xn } is Cauchy in probability, for each integer k ≥ 1, there exists an integer nk such
that for all n, m ≥ nk ,
        P(|X_n − X_m| > 2^{−k}) < 2^{−k}.
We can choose the sequence {n_k} to be strictly increasing, i.e., n_1 < n_2 < · · · . Consider the subsequence {X_{n_k}}. We have

        P(|X_{n_{k+1}} − X_{n_k}| > 2^{−k}) < 2^{−k}.

Let E_k = {|X_{n_{k+1}} − X_{n_k}| > 2^{−k}}. Then Σ_{k=1}^∞ P(E_k) < Σ_{k=1}^∞ 2^{−k} = 1 < ∞. By the first Borel-Cantelli lemma, P(lim sup_{k→∞} E_k) = 0. This means that for almost every ω, there exists K(ω) such that for all k ≥ K(ω), ω ∉ E_k, i.e., |X_{n_{k+1}}(ω) − X_{n_k}(ω)| ≤ 2^{−k}. For k ≥ K(ω) and any p ≥ 1,

        |X_{n_{k+p}}(ω) − X_{n_k}(ω)| ≤ Σ_{j=k}^{k+p−1} |X_{n_{j+1}}(ω) − X_{n_j}(ω)| ≤ Σ_{j=k}^{k+p−1} 2^{−j} < Σ_{j=k}^∞ 2^{−j} = 2^{−k+1}.
This shows that for almost every ω, the sequence {Xnk (ω)} is a Cauchy sequence of real numbers. Therefore,
it converges to a limit, say X(ω). Define X(ω) = limk→∞ Xnk (ω) for ω in the set of convergence (which
has probability 1), and X(ω) = 0 otherwise. X is a measurable function, hence a random variable. This
establishes almost sure convergence: X_{n_k} →a.s. X.                                         ■
Proof of ( =⇒ ). Assume X_n →p X. By definition, for any ϵ > 0,

        P(|X_n − X| > ϵ) → 0   as n → ∞.
Fix ϵ > 0. Since |X_n − X_m| ≤ |X_n − X| + |X_m − X|, for all n and m we have

        P(|X_n − X_m| > ϵ) ≤ P(|X_n − X| > ϵ/2) + P(|X_m − X| > ϵ/2).

Since X_n →p X, both terms on the right-hand side tend to 0 as n, m → ∞. Hence,

        lim_{n,m→∞} P(|X_n − X_m| > ϵ) = 0.                                                   ■
Proof of ( ⇐= ). Fix ϵ > 0. Since {X_n} is Cauchy in probability, there exists N such that for all n, m ≥ N,

        P(|X_n − X_m| > ϵ/2) < ϵ/2.
By Lemma 20.1, there exists a subsequence {X_{n_k}} and a random variable X such that X_{n_k} →a.s. X. Since almost sure convergence implies convergence in probability, we have X_{n_k} →p X. For any k,

        P(|X_n − X| > ϵ) ≤ P(|X_n − X_{n_k}| > ϵ/2) + P(|X_{n_k} − X| > ϵ/2).

Since X_{n_k} →p X, we can choose k large enough that n_k ≥ N and P(|X_{n_k} − X| > ϵ/2) < ϵ/2; then, for all n ≥ N, the Cauchy property gives P(|X_n − X_{n_k}| > ϵ/2) < ϵ/2.

Combining these bounds, we get P(|X_n − X| > ϵ) < ϵ for all n ≥ N. Thus, X_n →p X.            ■
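A rough empirical illustration (assumed setup: X_n is the running mean of a single i.i.d. Uniform(0, 1) stream; by the law of large numbers X_n →p 1/2, so the sequence should be Cauchy in probability):

import numpy as np

# Sketch: estimate P(|X_n - X_m| > eps) for running means X_n of one
# Uniform(0, 1) stream; the estimates should shrink as n, m grow.
rng = np.random.default_rng(2)
eps, reps = 0.01, 2000
for n, m in [(100, 200), (1000, 2000), (10000, 20000)]:
    count = 0
    for _ in range(reps):
        u = rng.uniform(size=m)
        count += abs(u[:n].mean() - u.mean()) > eps
    print(n, m, count / reps)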
Question 21. (a) Let F̂_n →d F, where F̂_n and F are the distribution functions of X_n and X, respectively. If x_n → x with x ∈ C_F, then

        F̂_n(x_n) → F(x)   and   F̂_n(x_n−) → F(x−).

(b) If x_n → x, then

        lim sup_{n→∞} (F̂_n(x_n) − F̂_n(x_n−)) ≤ F(x) − F(x−).
Proof of (a). Since F̂_n →d F, we know that F̂_n(y) → F(y) for all y ∈ C_F, where C_F is the set of continuity points of F; recall that C_F is dense, since F has at most countably many discontinuities.

Pick u, v ∈ C_F with u < x < v. Since x_n → x, for all sufficiently large n we have u < x_n < v, and since each F̂_n is non-decreasing,

        F̂_n(u) ≤ F̂_n(x_n−) ≤ F̂_n(x_n) ≤ F̂_n(v).

Letting n → ∞ and using F̂_n(u) → F(u) and F̂_n(v) → F(v),

        F(u) ≤ lim inf_{n→∞} F̂_n(x_n−) ≤ lim sup_{n→∞} F̂_n(x_n) ≤ F(v).

Now let u ↑ x and v ↓ x through C_F. Since F is continuous at x, both F(u) and F(v) converge to F(x) = F(x−). Therefore F̂_n(x_n) → F(x) and F̂_n(x_n−) → F(x−).                        ■
Proof of (b). Denote Δ_n = F̂_n(x_n) − F̂_n(x_n−), the probability mass assigned to the point x_n by the distribution F̂_n.

Pick y, z ∈ C_F with y < x < z. Since x_n → x, for all sufficiently large n we have y < x_n < z, and since F̂_n is non-decreasing,

        F̂_n(y) ≤ F̂_n(x_n−) ≤ F̂_n(x_n) ≤ F̂_n(z),

so Δ_n ≤ F̂_n(z) − F̂_n(y). Taking limits and using F̂_n(y) → F(y), F̂_n(z) → F(z),

        lim sup_{n→∞} Δ_n ≤ F(z) − F(y).

Letting y ↑ x and z ↓ x through C_F gives F(z) − F(y) → F(x) − F(x−), hence

        lim sup_{n→∞} (F̂_n(x_n) − F̂_n(x_n−)) ≤ F(x) − F(x−).                                ■
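A quick sketch of part (a), taking F̂_n to be the empirical CDF of n standard normal draws and x_n = 1 + 1/n → 1 (a continuity point of Φ); all of these choices are illustrative assumptions:

import numpy as np
from math import erf, sqrt

# Sketch: empirical CDF F_n of N(0,1) draws evaluated at x_n = 1 + 1/n
# should approach Phi(1) ~ 0.8413 as n grows.
rng = np.random.default_rng(3)
Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
for n in [100, 10000, 1000000]:
    sample = rng.normal(size=n)
    print(n, np.mean(sample <= 1 + 1/n), "target:", Phi(1.0))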
Next, suppose X_n →d X, a_n → 0, and b_n → 1, and set

        Y_n = X_n b_n + a_n b_n = b_n(X_n + a_n).

By Slutsky's theorem, X_n b_n →d X because b_n → 1. Also, a_n b_n → 0 because a_n → 0 and b_n → 1. Therefore, the sum

        Y_n = X_n b_n + a_n b_n →d X.                                                         ■
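A simulation sketch of this conclusion (assumed choices: X_n exactly N(0, 1), a_n = 1/n, b_n = 1 + 1/√n):

import numpy as np

# Sketch: Y_n = b_n * (X_n + a_n) with a_n -> 0, b_n -> 1 should be
# approximately N(0, 1) for large n.
rng = np.random.default_rng(4)
n, reps = 10000, 100000
x = rng.normal(size=reps)
y = (1 + 1/np.sqrt(n)) * (x + 1/n)
print("mean ~ 0:", y.mean(), " var ~ 1:", y.var())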
Question 22. Let X_n − Y_n = o_p(1) and X_n = O_p(1). Show that for a continuous function g : R → R, g(X_n) − g(Y_n) = o_p(1), i.e., P(|g(X_n) − g(Y_n)| > ϵ) → 0 as n → ∞ for every ϵ > 0.
From Xn − Yn = op (1) and Xn = Op (1), we can deduce that Yn = Op (1) as well. This is because Yn =
Xn − (Xn − Yn ), and both terms on the right are stochastically bounded.
Now, let g : R → R be a continuous function. We want to show that g(Xn ) − g(Yn ) = op (1).
Given any ϵ > 0, we need to show that P (|g(Xn ) − g(Yn )| > ϵ) → 0 as n → ∞.
Since X_n = O_p(1) and Y_n = O_p(1), for any η > 0, there exists M > 0 such that

        P(|X_n| > M) < η/2   and   P(|Y_n| > M) < η/2

for all n.
Consider the compact interval [−M, M ]. Since g is continuous on R, it is uniformly continuous on [−M, M ].
This means for any ϵ > 0, there exists δ > 0 such that if x, y ∈ [−M, M ] and |x−y| < δ, then |g(x)−g(y)| < ϵ.
Now consider the probability P(|g(X_n) − g(Y_n)| > ϵ). By uniform continuity of g on [−M, M], if |X_n − Y_n| < δ and both X_n and Y_n lie in [−M, M], then |g(X_n) − g(Y_n)| < ϵ. Hence the event {|g(X_n) − g(Y_n)| > ϵ} can occur only if |X_n| > M, |Y_n| > M, or |X_n − Y_n| ≥ δ. Therefore,

        P(|g(X_n) − g(Y_n)| > ϵ) ≤ P(|X_n| > M) + P(|Y_n| > M) + P(|X_n − Y_n| ≥ δ) < η + P(|X_n − Y_n| ≥ δ).

Since X_n − Y_n = o_p(1), the last term tends to 0 as n → ∞, so lim sup_{n→∞} P(|g(X_n) − g(Y_n)| > ϵ) ≤ η. As η > 0 was arbitrary, g(X_n) − g(Y_n) = o_p(1).                              ■
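An illustrative sketch (assumed choices: g = exp, X_n ~ N(0, 1), and Y_n = X_n + N(0, 1/n) noise, so that X_n − Y_n = o_p(1) and X_n = O_p(1)):

import numpy as np

# Sketch: estimate P(|g(X_n) - g(Y_n)| > eps) for g = exp; the estimates
# should shrink to 0 as n grows.
rng = np.random.default_rng(5)
eps, reps = 0.1, 100000
for n in [10, 1000, 100000]:
    x = rng.normal(size=reps)
    y = x + rng.normal(size=reps) / np.sqrt(n)
    print(n, np.mean(np.abs(np.exp(x) - np.exp(y)) > eps))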
Question 23. Prove that X_n →p X if and only if

        E[ |X_n − X| / (1 + |X_n − X|) ] → 0   as n → ∞.
Proof of ( =⇒ ). Assume X_n →p X. By definition, for any ϵ > 0, P(|X_n − X| > ϵ) → 0 as n → ∞. Let Y_n = |X_n − X|. We want to show E[Y_n/(1 + Y_n)] → 0. Let f(y) = y/(1 + y) for y ≥ 0. This function is continuous, increasing, satisfies 0 ≤ f(y) < 1, and f(y) → 0 as y → 0. Since X_n →p X, we have Y_n →p 0. Because f is continuous, the Continuous Mapping Theorem gives f(Y_n) →p f(0) = 0. That is,

        |X_n − X| / (1 + |X_n − X|) →p 0.
Let Z_n = |X_n − X|/(1 + |X_n − X|). We have 0 ≤ Z_n < 1. Since Z_n is bounded by the constant 1 and Z_n →p 0, the bounded convergence theorem (which holds under convergence in probability) yields

        E[Z_n] = E[ |X_n − X| / (1 + |X_n − X|) ] → 0.                                        ■
Proof of ( ⇐= ). Assume

        E[ |X_n − X| / (1 + |X_n − X|) ] → 0   as n → ∞.
Let Y_n = |X_n − X| and Z_n = Y_n/(1 + Y_n). We are given E[Z_n] → 0. We want to show that for any ϵ > 0, P(Y_n > ϵ) → 0. Consider the event {Y_n > ϵ}. On this event, since the function f(y) = y/(1 + y) is increasing for y ≥ 0, we have

        Z_n = Y_n/(1 + Y_n) > ϵ/(1 + ϵ).
Let δ = ϵ/(1 + ϵ) > 0. Then the event {Y_n > ϵ} is a subset of the event {Z_n > δ}, and by Markov's inequality,

        P(Z_n > δ) ≤ E[Z_n]/δ.

Combining these, we get

        P(|X_n − X| > ϵ) = P(Y_n > ϵ) ≤ P(Z_n > δ) ≤ E[Z_n]/δ = ((1 + ϵ)/ϵ) · E[ |X_n − X| / (1 + |X_n − X|) ] → 0.

This holds for any ϵ > 0, which is precisely the definition of X_n →p X.                      ■
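A numerical sketch of the equivalence (assumed model: X ~ Uniform(0, 1) and X_n = X + N(0, 1/n) noise, so that X_n →p X):

import numpy as np

# Sketch: Monte Carlo estimate of E[|X_n - X| / (1 + |X_n - X|)],
# which should tend to 0 as n grows.
rng = np.random.default_rng(6)
reps = 100000
for n in [1, 10, 100, 10000]:
    d = np.abs(rng.normal(size=reps) / np.sqrt(n))   # |X_n - X|
    print(n, np.mean(d / (1 + d)))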
Question 24. Show that

 (a) X_n →p X may not imply X_n →a.s. X.

 (b) X_n →p X may not imply X_n →Lr X for r ≥ 1.
Solution of (a). Consider the probability space (Ω, F, P) = ([0, 1], B([0, 1]), λ), where λ is the Lebesgue measure on the Borel σ-algebra B([0, 1]).

Define the triangular array of intervals I_{nk} = ((k − 1)/n, k/n] for n = 1, 2, . . . and k = 1, . . . , n. Define a single sequence of random variables X_m by ordering the indicator functions 1_{I_{nk}} first by n, then by k; that is, the sequence of indicators is 1_{I_{11}}, 1_{I_{21}}, 1_{I_{22}}, 1_{I_{31}}, 1_{I_{32}}, 1_{I_{33}}, . . . (the "typewriter" sequence). Let m be the index in this sequence, corresponding to a pair (n, k). Define X_m(ω) = 1_{I_{nk}}(ω), and let X = 0 (the constant zero random variable).
We show X_m →p 0. For any ϵ ∈ (0, 1), if X_m = 1_{I_{nk}}, then

        P(|X_m − 0| > ϵ) = P(X_m = 1) = P(ω ∈ I_{nk}) = λ(I_{nk}) = k/n − (k − 1)/n = 1/n.

As m → ∞, the corresponding n must also go to ∞. Thus, P(|X_m| > ϵ) = 1/n → 0. So, X_m →p 0.
However, X_m does not converge almost surely to 0. For any ω ∈ (0, 1], the sequence X_m(ω) contains infinitely many 1s: for each n, there is exactly one k ∈ {1, . . . , n} such that ω ∈ I_{nk}, and for the index m corresponding to this (n, k), X_m(ω) = 1. Since this occurs for every n, X_m(ω) = 1 infinitely often.

Therefore, the sequence X_m(ω) does not converge to 0 for any ω ∈ (0, 1]. The set {ω : lim_{m→∞} X_m(ω) = 0} is at most {0}, so its probability is 0 ≠ 1. Hence, X_m does not converge almost surely to 0.                                                                              ■
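The construction is easy to inspect numerically; a sketch (the point ω = 0.37 and the truncation at n = 50 are arbitrary choices):

# Sketch: the "typewriter" sequence X_m = 1_{I_nk} with
# I_nk = ((k-1)/n, k/n], enumerated first by n, then by k.
def typewriter(omega, n_max):
    vals = []
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            vals.append(1 if (k - 1) / n < omega <= k / n else 0)
    return vals

path = typewriter(0.37, 50)
# Within each block n the path hits 1 exactly once, so the 1s never stop:
print("number of 1s among the first", len(path), "terms:", sum(path))  # 50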
Solution of (b). Consider the same probability space (Ω, F, P) = ([0, 1], B([0, 1]), λ).

Define the sequence of random variables X_n for n = 1, 2, . . . by

        X_n(ω) = n if ω ∈ [0, 1/n],   and   X_n(ω) = 0 if ω ∈ (1/n, 1].
Let X = 0.
We show X_n →p 0. For any ϵ > 0, choose an integer N with N > ϵ. Then for all n ≥ N,

        P(|X_n − 0| > ϵ) = P(X_n = n) = P(ω ∈ [0, 1/n]) = λ([0, 1/n]) = 1/n.

As n → ∞, P(|X_n| > ϵ) = 1/n → 0. So, X_n →p 0.
Now consider convergence in L^r for r ≥ 1. We compute the r-th moment:

        E[|X_n − 0|^r] = E[|X_n|^r] = ∫_0^1 |X_n(ω)|^r dλ(ω) = ∫_0^{1/n} n^r dω + ∫_{1/n}^1 0^r dω = n^r · λ([0, 1/n]) = n^r · (1/n) = n^{r−1}.

Since r ≥ 1, we have r − 1 ≥ 0. If r = 1, then E[|X_n|] = n^0 = 1, and the limit as n → ∞ is 1 ≠ 0. If r > 1, then E[|X_n|^r] = n^{r−1} → ∞ ≠ 0, since r − 1 > 0.

In both cases, lim_{n→∞} E[|X_n|^r] ≠ 0. Therefore, X_n does not converge to 0 in L^r for any r ≥ 1.                                                                                         ■
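A Monte Carlo sketch of the moment computation (sample size and seed are arbitrary; the estimates should track the exact value n^{r−1}):

import numpy as np

# Sketch: estimate E[|X_n|^r] for X_n = n * 1_{[0, 1/n]}, which equals
# n^(r-1) exactly; for r = 1 it stays at 1, for r = 2 it diverges.
rng = np.random.default_rng(7)
omega = rng.uniform(size=2_000_000)
for n in [10, 100, 1000]:
    xn = np.where(omega <= 1 / n, n, 0)
    for r in [1, 2]:
        print(n, r, np.mean(xn.astype(float) ** r), "exact:", n ** (r - 1))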