HCMC University of Technology
Probability and Statistics
Confidence Intervals
Dung Nguyen
    Outline
1   Point estimation and Interval estimation
2   Confidence Intervals for Parameters of Normal Distribution
3   Confidence Intervals for Other Distributions
4   Summary
1    Point estimation and Interval estimation
       Point estimation
       Interval estimation
 Population vs. Sample
   A population is a collection of objects,
   items, humans/animals about which
   information is sought.
   A sample is a part of the population
   that is observed.
   A parameter is a numerical
   characteristic of a population.
   A statistic is a numerical function of
   the sampled data, used to estimate an
   unknown parameter.
 Some characteristics of samples
   Sample mean
   Sample variance/standard deviation
   Sample median
   Sample interquartile range
   Sample proportion
  The Sample Proportion
If k successes are observed in n trials, the relative frequency estimate of p is p̂ = k/n.
The estimate always lies in [0, 1].
Example (1)
5023 Heads are observed on 10000 tosses.
The relative frequency estimate of p is
0.5023. Is it possible that actually p = 0.5
instead? Is it possible that actually
p = 0.51?
  Interval Estimates
An interval estimate estimates the value
of p as being in an interval (a, b) or [a, b].
Example (2)
5023 Heads are observed on 10000 tosses.
An interval estimate is of the form
    0.4973 < p < 0.5073    or    0.5013 ≤ p ≤ 0.5033.
The length of the interval is a crucial
property of the estimate.
  Confidence Interval
How sure are we that the unknown value of
p actually is in the interval specified?
   [0, 1]: 100% confident.
   Narrower intervals: lower degree of
   confidence.
   “0.4973 < p < 0.5073” (half-width 0.005) vs.
   “0.5013 ≤ p ≤ 0.5033” (half-width 0.001).
 Confidence Interval and Level
   (X1, . . . , Xn) is a random sample from a
   distribution depending on a parameter θ
   A confidence interval for θ:
                                    S1 ≤ θ ≤ S2,
   where S1 and S2 are
          computed from the sample data;
          called the lower and upper confidence
          limits.
   The confidence level: γ = Pθ (S1 ≤ θ ≤ S2).
   Wider interval ⇐⇒ higher confidence level (for the same sample and method).
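The meaning of the confidence level γ can be checked by simulation. The sketch below is only an illustration under assumed settings (normal data, known σ, the two-sided interval sample mean ± z·σ/√n that appears later in these slides); the function name `coverage` and all default values are hypothetical choices, not part of the lecture.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=0.0, sigma=1.0, n=25, gamma=0.95, trials=2000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)  # two-sided critical value
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        centre = mean(sample)
        # S1 = centre - half_width, S2 = centre + half_width
        if centre - half_width <= mu <= centre + half_width:
            hits += 1
    return hits / trials


print("gamma = 0.95:", coverage(gamma=0.95))
print("gamma = 0.99:", coverage(gamma=0.99))
```

The observed fractions should land close to the nominal levels, and the 99% interval is wider than the 95% one for the same data.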
 Confidence level and Significance level
   A confidence level (γ) is a measure of
   the degree of reliability of the
   interval.
   A significance level (α) is the
   probability we allow ourselves to be
   wrong when we are estimating a parameter
   with a confidence interval.
                                       γ+α=1
  One-Sided Confidence Intervals
Definition
   Let S1 be a statistic such that, for all values
   of θ, P(S1 < θ) = γ.
   (S1, ∞) is called a left-sided 100γ percent CI for θ.
   S1 is called a 100γ percent lower confidence limit for θ.
  One-Sided Confidence Intervals
Definition
   Let S2 be a statistic such that, for all values
   of θ, P(θ < S2) = γ.
   (−∞, S2) is called a right-sided 100γ percent CI for θ.
   S2 is called a 100γ percent upper confidence limit for θ.
 X1, . . . , Xn ∼ N(µ, σ²) are iid
     Known σ
     Unknown σ
 X1, . . . , Xn are iid & n ≫ 1
     Arbitrary distribution
     Bernoulli distribution → Proportion
2   Confidence Intervals for Parameters of Normal Distribution
      Normal Population + Known σ
      Normal Population + Unknown σ
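   Normal Population + Known σ
Theorem
If X1, . . . , Xn are iid ∼ N(µ, σ²), then, with µ̂ denoting the sample mean,
$$\frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
CI of population mean
If X1, . . . , Xn are iid ∼ N(µ, σ²), then
$$\mu = \hat\mu \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}.$$
Sample size
Let $\mathrm{MOE} = \frac{\sigma}{\sqrt{n}} \cdot z_{\alpha/2}$. Then
$$\mathrm{MOE} \le \epsilon_0 \iff n \ge \left(\frac{\sigma \cdot z_{\alpha/2}}{\epsilon_0}\right)^2.$$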
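The sample-size condition follows by solving the margin-of-error inequality for n. A short derivation, using only the symbols already defined on this slide (all quantities are positive, so the direction of each inequality is preserved):

```latex
\begin{align*}
\mathrm{MOE} \le \epsilon_0
  &\iff z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \epsilon_0 \\
  &\iff \sqrt{n} \ge \frac{\sigma\, z_{\alpha/2}}{\epsilon_0} \\
  &\iff n \ge \left(\frac{\sigma\, z_{\alpha/2}}{\epsilon_0}\right)^{2}.
\end{align*}
```

Since n must be an integer, the bound is rounded up in practice.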
   Example 3 - Pit Stop
In auto racing, a pit stop is where a
racing vehicle stops for new tires, fuel,
repairs, and other mechanical adjustments.
The efficiency of a pit crew that makes
these adjustments can affect the outcome
of a race. A random sample of 32 pit stop
times has a sample mean of 12.9 seconds.
Assume that the population distribution is
normal and the population standard
deviation is 0.19 second.
a     Construct a 99% confidence interval for
      the mean pit stop time.
b     How many observations must be collected
      to ensure that the radius of the 99% CI
      is at most 0.01?
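Solution
a     $$12.9 \pm 2.58 \cdot \frac{0.19}{\sqrt{32}} = 12.9 \pm 0.087.$$
b     Using z = 2.58 gives (0.19 · 2.58 / 0.01)² ≈ 2402.96, so n ≥ 2403; using the more precise z ≈ 2.5758 gives 2395.198, so n ≥ 2396.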
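The two parts of Example 3 can be reproduced numerically. This is a minimal sketch using only the Python standard library; the helper names `z_interval` and `sample_size` are hypothetical, and the critical value is computed rather than read from a table, so it uses z ≈ 2.5758 instead of the rounded 2.58.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, gamma):
    """Two-sided CI for the mean of a normal population with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)  # z_{alpha/2}
    moe = z * sigma / sqrt(n)                      # margin of error
    return mean - moe, mean + moe, moe


def sample_size(sigma, gamma, eps):
    """Smallest integer n with margin of error at most eps."""
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)
    return ceil((sigma * z / eps) ** 2)


# Example 3: n = 32, sample mean 12.9 s, sigma = 0.19 s, 99% confidence
lo, hi, moe = z_interval(12.9, 0.19, 32, 0.99)
n_required = sample_size(0.19, 0.99, 0.01)
print(f"99% CI: ({lo:.4f}, {hi:.4f}), margin of error {moe:.4f}")
print("observations needed for margin 0.01:", n_required)
```

With the unrounded critical value the required sample size comes out as 2396; the rounded table value 2.58 is slightly more conservative.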
   One-Sided Confidence Interval (Normal Population + Known σ)
      A γ upper-confidence bound (aka
      right-sided confidence interval) for µ
      is
      $$\mu \le \hat\mu + z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}.$$
      A γ lower-confidence bound (aka
      left-sided confidence interval) for µ is
      $$\mu \ge \hat\mu - z_{\alpha} \cdot \frac{\sigma}{\sqrt{n}}.$$
   Example 4 - Pit Stop
In auto racing, a pit stop is where a
racing vehicle stops for new tires, fuel,
repairs, and other mechanical adjustments.
A random sample of 32 pit stop times has a
sample mean of 12.9 seconds. Assume that
the population distribution is normal and
the population standard deviation is 0.19
second. Construct a right-sided 95% CI
for the population mean.
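One way to work Example 4 is to apply the upper-confidence-bound formula for a normal population with known σ directly. The sketch below assumes that formula; the function name `upper_bound` is a hypothetical helper, and zα is computed from the standard normal quantile function instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def upper_bound(mean, sigma, n, gamma):
    """Right-sided CI (-inf, bound] for the mean, known sigma."""
    z = NormalDist().inv_cdf(gamma)  # z_alpha with alpha = 1 - gamma
    return mean + z * sigma / sqrt(n)


# Example 4: n = 32, sample mean 12.9 s, sigma = 0.19 s, 95% confidence
ub = upper_bound(12.9, 0.19, 32, 0.95)
print(f"95% upper confidence bound: mu <= {ub:.3f}")
```

Note that the one-sided bound uses zα (about 1.645 at 95%), not zα/2.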
   One-sided vs Two-sided
[Figure: a two-sided CI, cut off at −zα/2 and zα/2 around µ̂, compared with a one-sided CI, cut off at zα.]
[Figure: Pdf of N(0, 1) and t(df), drawn for N(0, 1), t(15) and t(2) on −3 ≤ x ≤ 3.]
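The curves in the figure can be tabulated from the closed-form densities. This is a small sketch using only the standard library; the function names `normal_pdf` and `t_pdf` are hypothetical, and the Student t density is written out from its standard formula.

```python
from math import exp, gamma, pi, sqrt


def normal_pdf(x):
    """Density of N(0, 1)."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


# Compare the three curves of the figure at a few points
for x in (0, 1, 2, 3):
    print(f"x={x}: N(0,1)={normal_pdf(x):.4f}  "
          f"t(15)={t_pdf(x, 15):.4f}  t(2)={t_pdf(x, 2):.4f}")
```

The t densities are lower at the centre and heavier in the tails, and t(15) is already close to the normal curve.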
   Normal Population + Unknown σ
Theorem
If X1, . . . , Xn are i.i.d. ∼ N(µ, σ²), then
$$\frac{\hat\mu - \mu}{s/\sqrt{n}} \sim t_{n-1} \quad\text{and}\quad \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}.$$
CI of the population mean
If X1, . . . , Xn are i.i.d. ∼ N(µ, σ²), then
$$\mu = \hat\mu \pm t_{n-1,\alpha/2} \cdot \frac{s}{\sqrt{n}}.$$
   Example 5 - Tread Depth
11 randomly selected automobiles were
stopped, and the tread depth of the right
front tire was measured. The sample mean
was 0.32 inch, and the sample standard
deviation was 0.08 inch. Find the 95%
confidence interval of the mean depth.
Assume that the variable is approximately
normally distributed.
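   Solution
With n = 11 there are 10 degrees of freedom and t10,0.025 = 2.228, so
$$\mu = 0.32 \pm 2.228 \cdot \frac{0.08}{\sqrt{11}} \implies \mu = 0.32 \pm 0.05.$$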
   Example 6 - Point of inflammation of Diesel oil
Five independent measurements of the point
of inflammation of Diesel oil gave the
values (in °F)
            144 147 146 144 142
Assuming normality, determine a 99%
confidence interval for the mean.
   Solution
Required values: µ̂ = 144.6, s = 1.949. Thus
$$\mu = 144.6 \pm 4.604 \cdot \frac{1.949}{\sqrt{5}} = 144.6 \pm 4.014.$$
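The required values in this solution can be recomputed from the five raw measurements. The sketch below uses only the standard library, which has no t quantile function, so the critical value t4,0.005 = 4.604 is taken from the slide as a table value (an assumption of this example, stored in the hypothetical constant `T_CRIT`).

```python
from math import sqrt
from statistics import mean, stdev

data = [144, 147, 146, 144, 142]  # measurements from Example 6
T_CRIT = 4.604                    # t_{4, 0.005}, table value quoted in the solution

n = len(data)
centre = mean(data)               # sample mean
s = stdev(data)                   # sample standard deviation (divisor n - 1)
moe = T_CRIT * s / sqrt(n)        # margin of error of the 99% CI

print(f"mean = {centre:.1f}, s = {s:.3f}")
print(f"99% CI: {centre:.1f} ± {moe:.3f}")
```

`statistics.stdev` uses the n − 1 divisor, which matches the s appearing in the t-based interval.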
   CI of the population variance
      Choose c1 < c2 so that the area in
      each tail of the χ²(n−1) distribution is α/2.
      Then the γ-confidence interval for the
      unknown variance σ² is
      $$\frac{(n-1)s^2}{c_2} \le \sigma^2 \le \frac{(n-1)s^2}{c_1}.$$
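The interval comes from inverting the chi-square pivot of the earlier theorem. A brief derivation, assuming c1 < c2 are the lower and upper cut-offs with area α/2 in each tail:

```latex
\begin{align*}
\gamma
  &= P\!\left(c_1 \le \frac{(n-1)s^2}{\sigma^2} \le c_2\right) \\
  &= P\!\left(\frac{1}{c_2} \le \frac{\sigma^2}{(n-1)s^2} \le \frac{1}{c_1}\right) \\
  &= P\!\left(\frac{(n-1)s^2}{c_2} \le \sigma^2 \le \frac{(n-1)s^2}{c_1}\right).
\end{align*}
```

Taking reciprocals reverses the inequalities, which is why the larger cut-off c2 appears in the lower limit.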
   CI of the population variance
      Choose c1 and c2 so that the area in
      each tail of the χ²(n−1) distribution is α.
      The γ lower and upper confidence bounds
      on σ² are
      $$\sigma^2 \ge \frac{(n-1)s^2}{c_2},$$
      and
      $$\sigma^2 \le \frac{(n-1)s^2}{c_1}.$$
   Example 7
An automatic filling machine is used to
fill bottles with liquid detergent. A
random sample of 20 bottles results in a
sample variance of fill volume of
s² = 0.0153. Assume that the fill volume
is approximately normal. Compute a 95%
upper confidence bound on the variance.
   Solution
$$\sigma^2 \le \frac{(20-1) \cdot 0.0153}{10.117} = 0.0287,$$
and
$$\sigma \le 0.17.$$
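The arithmetic of this solution can be checked in a few lines. The standard library has no chi-square quantile function, so the cut-off 10.117 (the lower 5% point of the chi-square distribution with 19 degrees of freedom) is taken from the slide as a table value; the constant name `CHI2_LOWER` is a hypothetical label.

```python
from math import sqrt

CHI2_LOWER = 10.117  # c1: lower 5% point of chi-square(19), table value from the slide
n = 20               # number of bottles
s2 = 0.0153          # sample variance of fill volume

var_upper = (n - 1) * s2 / CHI2_LOWER  # 95% upper bound on sigma^2
sd_upper = sqrt(var_upper)             # corresponding bound on sigma

print(f"sigma^2 <= {var_upper:.4f}")
print(f"sigma   <= {sd_upper:.2f}")
```

Because the square root is increasing, the bound on σ is simply the square root of the bound on σ².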
3   Confidence Intervals for Other Distributions
      Large-Sample CIs for Population Means
      Large-Sample CIs for Population Proportions
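   Large Sample Size
Theorem
If X1, . . . , Xn are i.i.d. and n is large, then
$$\frac{\hat\mu - \mu}{s/\sqrt{n}} \approx \frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \simeq N(0, 1).$$
CI of population mean - Large sample size
If X1, . . . , Xn are i.i.d. and n is large, then
$$\mu \approx \hat\mu \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}.$$
Moreover, if σ is unknown, then we estimate σ ≈ s and
$$\mu \approx \hat\mu \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}.$$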
   Example 8
A random sample of 110 lightning flashes in
a region resulted in a sample average
radar echo duration of 0.81 s and a sample
standard deviation of 0.34 s. Calculate a
99% (two-sided) CI for the true average
echo duration.
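A possible way to work Example 8 is the large-sample interval with σ estimated by s, since n = 110 is large. The sketch below assumes that approach; the function name `large_sample_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_interval(mean, s, n, gamma):
    """Approximate two-sided CI for the mean when n is large (sigma ~ s)."""
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)  # z_{alpha/2}
    moe = z * s / sqrt(n)
    return mean - moe, mean + moe


# Example 8: n = 110, sample mean 0.81 s, sample sd 0.34 s, 99% confidence
lo, hi = large_sample_interval(0.81, 0.34, 110, 0.99)
print(f"approximate 99% CI: ({lo:.4f}, {hi:.4f})")
```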
   Example 9
A sample of fish was selected from Florida
lakes, and mercury concentration in the
muscle tissue was measured (ppm).
    1.230 1.330 0.040 0.044 0.490 0.190
    0.830 0.810 0.490 1.160 0.050 0.150
    1.080 0.980 0.630 0.560 0.590 0.340
    0.340 0.840 0.280 0.340 0.750 0.870
    0.180 0.190 0.040 0.490 0.100 0.210
    0.860 0.520 0.940 0.400 0.430 0.250
Find an approximate 95% CI on µ.
   Solution
n = 36, µ̂ = 0.5284, s² = 0.1361, s = 0.3690, z0.025 = 1.96. Then the CI is
$$0.5284 \pm 1.96 \cdot \frac{0.3690}{\sqrt{36}} = 0.5284 \pm 0.1205 = [0.4079, 0.6490].$$
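The summary statistics in this solution can be recomputed from the 36 mercury concentrations of Example 9. This is a sketch with the standard library only; variable names are illustrative, and the critical value is computed instead of using the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Mercury concentrations (ppm) from Example 9
data = [
    1.230, 1.330, 0.040, 0.044, 0.490, 0.190,
    0.830, 0.810, 0.490, 1.160, 0.050, 0.150,
    1.080, 0.980, 0.630, 0.560, 0.590, 0.340,
    0.340, 0.840, 0.280, 0.340, 0.750, 0.870,
    0.180, 0.190, 0.040, 0.490, 0.100, 0.210,
    0.860, 0.520, 0.940, 0.400, 0.430, 0.250,
]

n = len(data)
centre = mean(data)
s = stdev(data)                 # sample sd, divisor n - 1
z = NormalDist().inv_cdf(0.975)  # z_{0.025}
moe = z * s / sqrt(n)
lo, hi = centre - moe, centre + moe

print(f"n = {n}, mean = {centre:.4f}, s^2 = {s * s:.4f}, s = {s:.4f}")
print(f"approximate 95% CI: [{lo:.4f}, {hi:.4f}]")
```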
   Population Proportion
Corollary
Let X ∼ B(n, p) and assume np ≥ 10, nq ≥ 10, where q = 1 − p and p̂ = X/n.
Then
$$\frac{\hat p - p}{\sqrt{pq/n}} \simeq N(0, 1).$$
   Population Proportion
An approximate 100γ% CI for p is
$$p \approx \hat p \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat p \hat q}{n}}.$$
The approximate 100γ% lower and upper
confidence bounds are
$$p \gtrsim \hat p - z_{\alpha} \cdot \sqrt{\frac{\hat p \hat q}{n}}$$
and
$$p \lesssim \hat p + z_{\alpha} \cdot \sqrt{\frac{\hat p \hat q}{n}},$$
respectively.
   Example 10 - Population Proportion
An article reported that in n = 45 trials
in a particular laboratory, 16 resulted in
ignition of a particular type of substrate
by a lighted cigarette. Let p denote the
long-run proportion of all such trials
that would result in ignition. Find a
confidence interval for p with a
confidence level of about 95%.
   Solution
A point estimate for p is p̂ = 16/45 ≈ 0.36.
A confidence interval for p is
$$0.36 \pm 1.96 \sqrt{0.36 \cdot 0.64/45} = 0.36 \pm 0.14.$$
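The same interval can be computed without rounding p̂ first. A minimal sketch, assuming the large-sample formula p̂ ± zα/2·√(p̂q̂/n); the function name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(k, n, gamma=0.95):
    """Approximate two-sided CI for a proportion from k successes in n trials."""
    p_hat = k / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)  # z_{alpha/2}
    moe = z * sqrt(p_hat * q_hat / n)
    return p_hat, moe


# Example 10: 16 ignitions in 45 trials
p_hat, moe = proportion_interval(16, 45)
lo, hi = p_hat - moe, p_hat + moe
print(f"p_hat = {p_hat:.4f}, margin = {moe:.4f}")
print(f"approximate 95% CI: ({lo:.4f}, {hi:.4f})")
```

Here the observed counts are 16 and 29, both at least 10, in line with the condition of the corollary when p and q are replaced by their estimates.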
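   Find the sample size
Let $\mathrm{MOE} = z_{\alpha/2} \cdot \sqrt{\hat p \hat q / n}$. Since p̂q̂ ≤ 0.25 for every p̂,
$$\mathrm{MOE} \le \epsilon_0 \impliedby n \ge 0.25 \left(\frac{z_{\alpha/2}}{\epsilon_0}\right)^2.$$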
Example
How many people do you need to survey so
that the margin of error (at the 95% level) is plus or
minus 3 percentage points? This means that 95% of
the time, the survey estimate should be
within 3 percentage points of the true answer.
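The survey question can be answered with the conservative bound n ≥ 0.25·(zα/2/ε0)². A short sketch; the function name `survey_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def survey_size(eps, gamma=0.95):
    """Conservative sample size using the worst case p_hat * q_hat = 0.25."""
    z = NormalDist().inv_cdf(1 - (1 - gamma) / 2)  # z_{alpha/2}
    return ceil(0.25 * (z / eps) ** 2)


n_needed = survey_size(0.03)  # margin of error of 3 percentage points
print("people to survey:", n_needed)
```

The bound is sufficient for any value of p, at the cost of being larger than necessary when p is far from 0.5.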
4   Summary
    Example 11 - z vs t
A random sample of 32 pit stop times has a
sample mean of 12.9 seconds and a sample
standard deviation of 0.20 seconds.
Assume that the population distribution is
normal and the population standard
deviation is 0.19 second. Construct a CI.
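1   $\mu = \hat\mu \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$ (exact CI)
2   $\mu = \hat\mu \pm t_{n-1,\alpha/2} \cdot \frac{s}{\sqrt{n}}$ (exact CI)
3   $\mu = \hat\mu \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$ (approximate CI)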
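The three candidate intervals can be compared side by side. The example does not state a confidence level, so the sketch below assumes 95% purely for illustration; the t critical value t31,0.025 ≈ 2.040 is a standard table value (the standard library has no t quantile function), and all variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n, centre = 32, 12.9
sigma, s = 0.19, 0.20               # known population sd and sample sd
z = NormalDist().inv_cdf(0.975)     # z_{0.025}; the 95% level is an assumption
T_CRIT = 2.040                      # t_{31, 0.025}, table value (assumption)

moe_z_sigma = z * sigma / sqrt(n)   # case 1: exact, known sigma
moe_t_s = T_CRIT * s / sqrt(n)      # case 2: exact, sigma replaced by s
moe_z_s = z * s / sqrt(n)           # case 3: large-sample approximation

for label, moe in [("z with sigma", moe_z_sigma),
                   ("t with s", moe_t_s),
                   ("z with s", moe_z_s)]:
    print(f"{label:13s}: {centre} ± {moe:.4f}")
```

With the same s, the t-based interval is always wider than the z-based one, because tn−1,α/2 exceeds zα/2.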
zα/2 vs tα/2
[Figure: pdfs of N(0, 1) and t(df) with the critical values zα/2 and tα/2 marked; zα/2 lies to the left of tα/2.]
    Example 12 - Which case?
    x   9.62   4.09   1.70   10.62   4.73   2.40   4.05   8.41   6.77   4.16
    y   9.18   4.70   2.57   0.22    1.82   0.82   3.98   6.06   0.24   0.21
1   $\mu = \hat\mu \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$ (exact CI)
2   $\mu = \hat\mu \pm t_{n-1,\alpha/2} \cdot \frac{s}{\sqrt{n}}$ (exact CI)
3   $\mu = \hat\mu \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$ (approximate CI)
[Figure: two side-by-side plots of the samples x and y from Example 12.]
Summary
X1, . . . , Xn ∼ N(µ, σ²) are iid
    Known σ: $\mu = \hat\mu \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
    Unknown σ: $\mu = \hat\mu \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}$
X1, . . . , Xn are iid & n ≫ 1
    Arbitrary distribution: $\mu \approx \hat\mu \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \approx \hat\mu \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$
    Bernoulli distribution: $p \approx \hat p \pm z_{\alpha/2} \sqrt{\hat p \hat q / n}$