Statistical Estimation
Sampling Distribution
Distribution of all possible values of a statistic computed from samples of
the same size randomly selected from the same population.
Because of random variation, different samples from the same population will
have different sample means.
If we repeatedly take samples of the same size n from a population, the
means of those samples form the sampling distribution of the mean for samples of size n.
The sampling distribution serves to answer probability questions about sample statistics.
         A. Sampling distribution of sample mean
• Suppose we have a population of size N=4, constituting the ages
  of four outpatients.
    x, Age (years): 18, 20, 22, 24
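    Population mean:
      $\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
    Population standard deviation:
      $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{9 + 1 + 1 + 9}{4}} = 2.236$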
Now consider all possible samples of size n = 2 (with replacement).

• 16 possible samples (1st observation in rows, 2nd observation in columns):

    1st Obs |   18      20      22      24
    --------+-------------------------------
      18    | 18,18   18,20   18,22   18,24
      20    | 20,18   20,20   20,22   20,24
      22    | 22,18   22,20   22,22   22,24
      24    | 24,18   24,20   24,22   24,24

• 16 sample means (same layout):

    1st Obs |  18   20   22   24
    --------+--------------------
      18    |  18   19   20   21
      20    |  19   20   21   22
      22    |  20   21   22   23
      24    |  21   22   23   24
Sample means   Freq   P(x̄)
18             1      0.0625
19             2      0.1250
20             3      0.1875
21             4      0.2500
22             3      0.1875
23             2      0.1250
24             1      0.0625
[Figure: Sampling distribution of all sample means. Bar chart of P(x̄) (vertical axis, 0 to 0.3) against the sample mean x̄ (horizontal axis, 18 to 24), built from the 16 sample means.]
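The enumeration of samples and the frequency table of sample means can be reproduced with a few lines of code. This is a minimal Python sketch; the variable names are illustrative and not part of the original notes.

```python
from collections import Counter
from itertools import product

ages = [18, 20, 22, 24]  # the population of four outpatient ages
n = 2                    # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
samples = list(product(ages, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Frequency and probability of each distinct sample mean.
freq = Counter(sample_means)
for value in sorted(freq):
    print(f"{value:>4.0f}  {freq[value]}  {freq[value] / len(samples):.4f}")
```

Each printed row gives a sample mean, its frequency, and its probability, which can be compared line by line with the frequency table.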
         Summary measures of this sampling distribution:
            Add the 16 sample means & divide by 16.
            Also calculate the SD of the sample means.
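$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21$

$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$

(Here N = 16 is the number of possible samples, not the population size.)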
Properties
1. The mean of the sampling distribution of means is the same as the
   population mean, μ .
2. The SD of the sampling distribution of means is σ / √n .
3. The shape of the sampling distribution of means is approximately a
   normal curve, regardless of the shape of the population distribution,
   provided n is large enough (central limit theorem).
   In practice, the approximation is workable if n is 30 or more.
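Properties 1 and 2 can be checked numerically on the four-age example. The sketch below is an illustrative check, assuming sampling with replacement as in the worked example; it uses only the Python standard library.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

ages = [18, 20, 22, 24]  # population from the worked example
n = 2                    # sample size

# Means of all 16 ordered samples drawn with replacement.
sample_means = [mean(s) for s in product(ages, repeat=n)]

mu = mean(ages)                    # population mean
sigma = pstdev(ages)               # population SD (divides by N)
mu_xbar = mean(sample_means)       # mean of the sampling distribution
sigma_xbar = pstdev(sample_means)  # SD of the sampling distribution

print(mu, mu_xbar)                   # property 1: the two means agree
print(sigma / sqrt(n), sigma_xbar)   # property 2: SD equals sigma / sqrt(n)
```

Property 3 is not visible with n = 2; this check covers only the mean and the standard deviation.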
              Sampling Distribution of proportion
The sampling distribution of the sample proportion p possesses the
  following properties.
• The mean of the sampling distribution of p is equal to the
  population proportion P.
• The standard deviation of the sampling distribution of p is $\sqrt{P(1-P)/n}$
  (called the standard error of the proportion).
• Provided n is large enough, the shape of the sampling distribution
  of p is approximately normal.
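The first two properties can be verified exactly if the number of successes is assumed to follow a binomial distribution (independent draws with a constant success probability P). That assumption, the function name, and the values P = 0.3 and n = 50 are illustrative choices, not taken from the notes.

```python
from math import comb, sqrt

def proportion_sampling_moments(P, n):
    """Exact mean and SD of p = X/n when X ~ Binomial(n, P) (assumed model)."""
    pmf = [comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(n + 1)]
    mean_p = sum((k / n) * w for k, w in enumerate(pmf))
    var_p = sum((k / n - mean_p) ** 2 * w for k, w in enumerate(pmf))
    return mean_p, sqrt(var_p)

P, n = 0.3, 50  # hypothetical population proportion and sample size
mean_p, sd_p = proportion_sampling_moments(P, n)
se_formula = sqrt(P * (1 - P) / n)  # standard error of the proportion

print(mean_p, P)         # the mean of p matches P
print(sd_p, se_formula)  # the SD of p matches sqrt(P(1-P)/n)
```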
                      Statistical Inference
• Statistical inference comprises two methods of making inference:
     1. Estimation
     2. Hypothesis testing
Sample statistic → Population parameter
                       Statistical Estimation
Estimation is the process of determining a likely value for a
population parameter based on information collected from the
sample, that is, the use of sample statistics to estimate population parameters.
E.g.
   • The proportion of smokers among all people aged 15 to 24
     in the population.
   • The mean level of a certain enzyme among healthy men.
                       Point Estimation
A single numerical value is used to estimate the corresponding
population parameter.
• x̄ is an estimator of the population mean μ
• S is an estimator of the population standard deviation σ
• p is an estimator of the population proportion π
                        Point estimation…
• From a single sample we can calculate a sample statistic to estimate a
  single parameter (a point estimate).
• The point estimate for the population mean µ is
      $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• The point estimate for the population proportion is
      $p = \frac{x}{n}$
  where x is the total number of successes (events).
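As a small illustration of these point estimates, the sketch below computes x̄, S and p in Python. All data values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, stdev

# Hypothetical sample of enzyme levels (illustrative values only).
enzyme_levels = [4.1, 3.8, 5.0, 4.6, 4.5]
x_bar = mean(enzyme_levels)  # point estimate of the population mean
s = stdev(enzyme_levels)     # sample SD (n - 1 denominator), estimates sigma

# Hypothetical survey: 36 smokers among 120 respondents.
smokers, n = 36, 120
p = smokers / n              # point estimate of the population proportion

print(x_bar, s, p)
```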
                          Interval estimation
• Interval estimation is a statement that a population parameter has a value
  lying between two specified limits.
• The value of a sample statistic varies from sample to sample, so a
  single-value estimate of the parameter is generally not sufficient on its own.
• We need to take into account the sample-to-sample variation of the statistic.
• A confidence interval defines an interval within which the true population
  parameter is likely to fall (an interval estimate).
                        Confidence interval ……
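A (1-α)100% confidence interval for an unknown population mean
and for a population proportion is given as follows:
• For estimating a mean:
      $\left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$
  If σ is unknown, it can be estimated by s.
• For estimating a proportion:
      $\left[p - z_{\alpha/2}\sqrt{p(1-p)/n},\ p + z_{\alpha/2}\sqrt{p(1-p)/n}\right]$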
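The interval for a mean can be sketched in Python as follows. The function name and the input values (x̄ = 120, SD = 15, n = 100) are hypothetical; the critical value z is taken from the standard normal quantile function in the standard library rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sd, n, confidence=0.95):
    """Normal-based C.I. for a mean: x_bar +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs, for illustration only.
low, high = mean_confidence_interval(x_bar=120.0, sd=15.0, n=100)
print(f"95% C.I.: ({low:.2f}, {high:.2f})")
```

Passing s in place of σ for `sd` corresponds to the large-sample case in which σ is estimated from the data.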
• The 95% confidence interval is interpreted as follows:
  under the conditions assumed for the underlying distribution, you are
  95% confident that the interval contains the true parameter.
• A 90% CI is narrower than a 95% CI, since we are only 90% certain
  that the interval includes the population parameter.
• A 99% CI is wider than a 95% CI; the extra width means that
  we can be more certain that the interval contains the
  population parameter.
• To obtain higher confidence from the same sample, we must be
  willing to accept a larger margin of error (a wider interval).
• For a given confidence level (e.g. 90%, 95%, 99%), the width of the
  confidence interval depends on the standard error of the estimate, which
  in turn depends on:
  1. Sample size: the larger the sample size, the narrower the confidence
     interval and the more precise our estimate.
     • Lack of precision means that, in repeated sampling, the values of the
       sample statistic are spread out or scattered.
     • The result of sampling is then not repeatable.
     • You can make the precision as high as you want by taking a large
       enough sample.
     • The margin of error decreases as √n increases.
  2. Standard deviation: the more variation among the individual
     values, the wider the confidence interval and the less precise the estimate.
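The dependence of the margin of error on sample size and standard deviation can be illustrated with a short Python sketch. The function name and the values used (SD of 10 or 20, n of 25, 100, 400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sd, n, confidence=0.95):
    """Half-width of a normal-based C.I. for a mean: z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)

# Effect of sample size at a fixed SD: quadrupling n halves the margin.
for n in (25, 100, 400):
    print(n, round(margin_of_error(sd=10, n=n), 2))

# Effect of SD at a fixed sample size: doubling the SD doubles the margin.
for sd in (10, 20):
    print(sd, round(margin_of_error(sd=sd, n=100), 2))
```

Because the margin scales with 1/√n, halving the width of an interval requires roughly four times as many observations.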
       3. C.I. for a population proportion (large
                       sample size)
• A 100(1-α)% C.I. for π is
      $p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
Example:
A study on dental health practice. Of 300 adults interviewed, 123
  said that they regularly had a dental check-up twice a year.
  What is the 95% C.I. for π?
• p = 123/300 = 0.41 is a point estimate of π.
• α = 0.05 ⇒ z_{0.025} = 1.96
• 95% C.I.: $0.41 \pm 1.96\sqrt{0.41 \times 0.59 / 300} = 0.41 \pm 0.056 = (0.354,\ 0.466)$
Example 2: An epidemiologist is worried about the ever-increasing
  trend of malaria in a certain locality and wants to estimate the
  proportion of persons infected during the peak malaria transmission
  period.
• He takes a random sample of 150 persons in that locality
  during the peak transmission period and finds that 60 of them are
  positive for malaria.
Find the a) 95%, b) 90% and c) 99% confidence intervals
for the proportion of all people in that locality who are infected
  during the peak malaria transmission period.
Solution:
Sample proportion p = 60/150 = 0.4
Standard error = $\sqrt{0.4 \times 0.6 / 150} = 0.04$
a) 95% C.I. for the population proportion (the proportion of all
  infected people in that locality): 0.4 ± 1.96(0.04) = 0.4 ± 0.078
  = (0.322, 0.478).
b) 90% C.I. for the population proportion: 0.4 ± 1.64(0.04) = 0.4 ± 0.066
  = (0.334, 0.466).
c) 99% C.I. for the population proportion: 0.4 ± 2.58(0.04) = 0.4 ± 0.103
  = (0.297, 0.503).
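The three intervals in this solution can be recomputed with the sketch below. It is an illustrative check rather than part of the original solution; it uses unrounded normal quantiles from the Python standard library instead of the rounded table values 1.64, 1.96 and 2.58, so the results can differ slightly beyond the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

positives, n = 60, 150      # malaria example: 60 of 150 persons positive
p = positives / n           # sample proportion
se = sqrt(p * (1 - p) / n)  # standard error of the proportion

intervals = {}
for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    intervals[confidence] = (p - z * se, p + z * se)

for confidence, (low, high) in intervals.items():
    print(f"{confidence:.0%} C.I.: ({low:.3f}, {high:.3f})")
```

The printed intervals widen as the confidence level rises from 90% to 99%, in line with the interpretation of confidence levels given earlier.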
4. C.I. for the difference between two population
  proportions (large sample size)
A 100(1-α)% C.I. for π1 - π2 is
      $(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Example
• Two hundred patients suffering from a certain disease were randomly
  divided into two equal groups. Of the first group, who received the
  standard treatment, 78 recovered within three days. Of the other
  100, who were treated by a new method, 90 recovered within three
  days. The physician wished to estimate the true difference in the
  proportions who would recover within three days.
Solution:
The estimate of the difference in the population proportions is
p1 - p2 = 0.78 - 0.90 = -0.12
• The 95% C.I. is
      $-0.12 \pm 1.96\sqrt{\frac{0.78 \times 0.22}{100} + \frac{0.90 \times 0.10}{100}} = -0.12 \pm 0.10 = (-0.22,\ -0.02)$
• We are 95% confident that the difference is between -0.22 and -0.02.
• Note that the negative signs merely reflect the fact that better
  results were obtained with the new treatment.
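The interval for the difference in recovery proportions can be reproduced with the following Python sketch. The function name is hypothetical, and the standard error is assumed to be the usual large-sample (unpooled) form for two independent proportions.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample C.I. for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Standard treatment: 78 of 100 recovered; new method: 90 of 100 recovered.
low, high = diff_proportion_ci(78, 100, 90, 100)
print(f"95% C.I. for p1 - p2: ({low:.2f}, {high:.2f})")
```

Because the whole interval lies below zero, the data are consistent with a higher recovery proportion under the new method.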