University of Geneva                                                                 GSEM
Statistics I                                                                       Fall 2017
Prof. Eva Cantoni                                                                Practical 5
                      Point Estimation - Sampling distributions
Goals: This practical has two objectives. First, we shall review some notions that are useful
in point estimation theory. Second, we shall use R to simulate the sampling distribution of
the median.
1     Theoretical Exercises
1.1    Exercise 1
Which of the following are random variables?
    1. Population mean.
    2. Population size.
    3. Sample size.
    4. Sample mean.
    5. Variance of the sample mean.
    6. Maximum of the sample.
    7. Variance of the population.
    8. Estimated variance of the sample mean.
1.2    Exercise 2
Suppose we are interested in estimating the proportion of households living below the poverty
line in a given Swiss canton. For this purpose, a random sample of households is drawn from
the population.
    1. What computations should we carry out on the sample in order to answer our
       question? Formalize these steps by proposing an estimator and specify its distribution.
    2. If the proportion of households living below the poverty line is equal to 0.15 at the
       population level, what should the sample size be so that the standard deviation of the
       estimator defined in point 1 is at most equal to 0.02?
1.3    Exercise 3
The sample mean X̄ of a sample of size n is used to estimate the population mean µ. We
would like to find n such that the absolute estimation error |X̄ − µ| is at most equal to a
fixed value d with (large) probability 1 − α, where α is given. Assume that the random
variables X1, ..., Xn are independent and identically distributed (i.i.d.) and drawn from a
N(µ, 4) distribution.
    1. Does such a value of n exist for every value of α?
    2. Find n if α = 0.05 and d = 1.
2     R simulation
2.1    Distributions in R
R has built-in functions to evaluate quantities associated with many common probability
distributions. You can compute values of the cumulative distribution function (cdf) using
functions with the prefix "p" and quantiles using the prefix "q". Moreover, we can evaluate
probability density functions (pdf, for continuous distributions) or probability mass functions
(pmf, for discrete distributions) with the prefix "d", and randomly generate observations
drawn from a given distribution using the prefix "r".
The following table summarizes the available functions for some common probability
distributions.
                               F(q) = P(X ≤ q)    F⁻¹(p) = Q(p)     f(x) or P(X = x)   Simulation
 Uniform         U(a, b)       punif(q,a,b)       qunif(p,a,b)      dunif(x,a,b)       runif(n,a,b)
 Normal          N(µ, σ²)      pnorm(q,µ,σ)       qnorm(p,µ,σ)      dnorm(x,µ,σ)       rnorm(n,µ,σ)
 Binomial        B(m, p)       pbinom(q,m,p)      qbinom(p,m,p)     dbinom(x,m,p)      rbinom(n,m,p)
 Poisson         P(λ)          ppois(q,λ)         qpois(p,λ)        dpois(x,λ)         rpois(n,λ)
 Exponential     E(λ)          pexp(q,λ)          qexp(p,λ)         dexp(x,λ)          rexp(n,λ)
 Chi-square      χ²(df)        pchisq(q,df)       qchisq(p,df)      dchisq(x,df)       rchisq(n,df)
 Student         St(df)        pt(q,df)           qt(p,df)          dt(x,df)           rt(n,df)
 Fisher          F(df1, df2)   pf(q,df1,df2)      qf(p,df1,df2)     df(x,df1,df2)      rf(n,df1,df2)
where:
    • p: the probability at which to evaluate a quantile;
    • q: the quantile at which to evaluate a lower-tail probability;
    • x: the value at which to evaluate the density or probability;
    • n: the size of the sample to simulate.
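As an illustration of the four prefixes, here is a minimal sketch using the standard normal distribution. The evaluation points, the seed and the variable names p, q, d and x are arbitrary choices made for this example.

```r
# Standard normal distribution N(0, 1), using R's defaults mean = 0 and sd = 1
p = pnorm(1.96)     # cdf: P(X <= 1.96)
q = qnorm(0.975)    # quantile: the value q such that P(X <= q) = 0.975
d = dnorm(0)        # density at 0, equal to 1 / sqrt(2 * pi)

set.seed(1)         # arbitrary seed so that the draw is reproducible
x = rnorm(5)        # five simulated observations

# The "p" and "q" functions are inverses of each other
all.equal(qnorm(pnorm(1.96)), 1.96)
```

The same pattern applies to every row of the table: only the distribution suffix (unif, binom, pois, ...) and its parameters change.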
The R implementations of these probability distributions use the following parameters
(listed with their default values, where one exists):
   • a and b (min = 0 and max = 1): the parameters of the Uniform distribution (beginning
     and end of the interval);
   • µ and σ (mean = 0 and sd = 1): mean and standard deviation of the Normal distribution.
     Note that R expects the standard deviation σ, not the variance σ²;
   • m and p (size and prob, no default values): number of trials and probability of success
     for the Binomial distribution;
   • λ (lambda, no default value, for the Poisson distribution; rate = 1 for the Exponential
     distribution): rate parameter;
   • df (df, no default value): degrees of freedom for the Student and Chi-square
     distributions. The Fisher distribution takes two such parameters, df1 and df2.
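The point most likely to cause mistakes is that the notation N(µ, σ²) carries the variance while R takes the standard deviation. The sketch below illustrates this with a N(0, 4) distribution; the numbers and the names p.correct and p.wrong are chosen only for this example.

```r
# N(0, 4) has variance 4, so the value passed to R is sd = sqrt(4) = 2
p.correct = pnorm(1, mean = 0, sd = 2)   # P(X <= 1) for X ~ N(0, 4)
p.wrong   = pnorm(1, mean = 0, sd = 4)   # this is N(0, 16), not N(0, 4)

# Omitting the parameters uses the defaults (mean = 0, sd = 1; min = 0, max = 1)
pnorm(0)     # 0.5
punif(0.3)   # 0.3
```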
Consider the following set of applications of the above:
  1. For X ∼ N (10, 4) compute P (X < 12) and P (10 < X < 12).
  2. Compute the 90% quantile of a E(1/2) distribution.
  3. For X ∼ P(4), compute P(X = 2) and P(X ≤ 2).
  4. Generate a random sample of size 200 from a N(1.5, 4) distribution with:
        norm.sample = rnorm(n = 200, mean = 1.5, sd = 2)
      Draw a kernel density plot of the sample (see Practical 3). Add the probability density
      function of a N(1.5, 4) distribution to the same plot with:
        sorted.norm.sample = sort(norm.sample)
        lines(sorted.norm.sample, dnorm(sorted.norm.sample,
                                        mean = 1.5, sd = 2), col = "red")
        # Try lines(norm.sample, dnorm(norm.sample, mean = 1.5, sd = 2),
        # col = "red") instead of the above. What happens?
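One possible way to carry out the whole of step 4 is sketched below. It assumes the base R functions density() and plot() for the kernel density estimate (Practical 3 is not reproduced here, so the approach used there may differ), and it evaluates the theoretical density on a regular grid rather than on the sorted sample. The seed, the plot title and the name x.grid are illustrative choices.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
norm.sample = rnorm(n = 200, mean = 1.5, sd = 2)

# Kernel density estimate of the sample (default Gaussian kernel and bandwidth)
plot(density(norm.sample), main = "Kernel density vs. N(1.5, 4) density")

# Overlay the theoretical density, evaluated on an increasing grid of x values
x.grid = seq(min(norm.sample), max(norm.sample), length.out = 200)
lines(x.grid, dnorm(x.grid, mean = 1.5, sd = 2), col = "red")
```

Since lines() joins the points in the order in which they are given, the x values need to be in increasing order, which is what sort() or a grid built with seq() provides.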
2.2   Sampling distribution of the median
As an example, we shall study the behavior of the sampling distribution of the median
of a sample, median(X1, ..., Xn), when the observations are drawn from an exponential
distribution, i.e. X1, ..., Xn are i.i.d. E(λ). Remember that, with the parametrization used by R,
the cumulative distribution function of such a random variable is:
                                   F (x) = 1 − exp(−λx)
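As a quick check that this formula matches R's parametrization, the sketch below compares it with pexp() at a few points. The rate, the evaluation points and the names F.formula and F.builtin are arbitrary choices for this example.

```r
lambda = 1/2                        # illustrative rate
x = c(0.5, 1, 2, 5)                 # arbitrary non-negative evaluation points

F.formula = 1 - exp(-lambda * x)    # cdf computed from the formula
F.builtin = pexp(x, rate = lambda)  # cdf computed by R

all.equal(F.formula, F.builtin)     # TRUE
```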
  1. Give the population mean E(X) and population median m(λ) of a random variable
     X ∼ E(λ).
  2. To look at the sampling distribution of the median when λ = 1/2 and the sample size is
     100, we simulate 500 samples:
       # prepare a vector of size 500 to store the results
       Exp.median = numeric(500)
       for (i in 1:500) {
         # store the median of a new sample at each iteration
         Exp.median[i] = median(rexp(100, rate = 1/2))
       }
     Based on the above simulated object Exp.median:
      (a) Explore graphically the sampling distribution of the median (with histograms,
          boxplots, etc.).
      (b) Can the sampling distribution be considered normal? Check graphically with the
          appropriate tool.
      (c) Around which value is the sampling distribution of median(X1, ..., Xn)
          concentrated?
     Redo the steps above for the 90% quantile.
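To repeat the simulation for other statistics without copying the loop, one option is to wrap it in a function. The sketch below uses replicate() in place of the explicit for loop; the helper simulate_statistic, its argument names and the seed are hypothetical choices made for this illustration, and the sample mean is used as the first example statistic.

```r
# Hypothetical helper: simulate the sampling distribution of a statistic
# computed on i.i.d. exponential samples of size n
simulate_statistic = function(statistic, n.sim = 500, n = 100, rate = 1/2) {
  replicate(n.sim, statistic(rexp(n, rate = rate)))
}

set.seed(2017)  # arbitrary seed for reproducibility

# Sampling distribution of the sample mean
Exp.mean = simulate_statistic(mean)

# Sampling distribution of the empirical 90% quantile
Exp.q90 = simulate_statistic(function(x) quantile(x, probs = 0.9, names = FALSE))

length(Exp.mean)   # 500
```

The resulting vectors can then be explored with the same graphical tools as Exp.median.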