Sampling Distributions: IPS Chapter 5
     Weibull distributions
Reminder: the two types of data
   Quantitative
        Something that can be counted or measured and then averaged across
         individuals in the population (e.g., your height, your age, your IQ score)
   Categorical
        Something that falls into one of several categories. What can be
         counted is the proportion of individuals in each category (e.g., your
         gender, your hair color, your blood type—A, B, AB, O).
Some sample means will be above the population mean µ and some
will be below, making up the sampling distribution.
[Figure: sampling distribution of x̄, shown alongside a histogram of some sample averages.]
For any population with mean µ and standard deviation σ:
   Mean of the sampling distribution of x̄: $\mu_{\bar{x}} = \mu$
   Standard deviation of the sampling distribution of x̄: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
      There is no tendency for a sample mean to fall systematically above or
      below µ, even if the distribution of the raw data is skewed. Thus, the sample
      mean x̄ is an unbiased estimate of the population mean µ: it will be
      “correct on average” in many samples.
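A quick simulation can make the “correct on average” claim concrete. This is a minimal Python sketch, assuming a hypothetical right-skewed population (exponential with mean 2.0, chosen only for illustration). It averages many sample means for several sample sizes, and each average should land close to µ = 2.0 even though the raw data are skewed.

```python
import random
import statistics

# Hypothetical right-skewed population: exponential with mean 2.0.
# This is an illustrative assumption, not data from the text.
POP_MEAN = 2.0

def sample_mean(n, rng):
    """Mean of one SRS of size n drawn from the assumed population."""
    return statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(n))

def mean_of_sample_means(n, reps=20_000, seed=1):
    """Average of many sample means: estimates the center of the sampling distribution."""
    rng = random.Random(seed)
    return statistics.fmean(sample_mean(n, rng) for _ in range(reps))

if __name__ == "__main__":
    for n in (2, 10, 50):
        print(f"n = {n:>2}: mean of sample means = {mean_of_sample_means(n):.3f}")
```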
Sampling distribution
$z = \frac{x - \mu}{\sigma} = \frac{3.5 - 3.8}{0.2} = -1.5, \qquad P(z < -1.5) = 0.0668 \approx 7\%$
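The same standardization and normal-table lookup can be sketched in Python with the standard library's error function. The helper names below are illustrative, not from any statistics package.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z < z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_score(x, mu, sigma):
    """Standardize a value x from a normal distribution with mean mu and sd sigma."""
    return (x - mu) / sigma

z = z_score(3.5, 3.8, 0.2)
p = normal_cdf(z)
print(f"z = {z:.2f}, P(z < {z:.2f}) = {p:.4f}")
```

To standardize a sample mean x̄ from n observations instead of a single value, pass `sigma / math.sqrt(n)` as the standard deviation, since the sampling distribution of x̄ has standard deviation σ/√n.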
[Figure: sampling distributions of x̄ for n = 10 and n = 25 observations.]
Practical note
         Opinion polls have a limited sample size due to time and cost of
          operation. During election times, though, sample sizes are increased
          for better accuracy.
       We take 1000 SRSs of 100 incomes, calculate the sample mean for
        each, and make a histogram of these 1000 means.
       We also take 1000 SRSs of 25 incomes, calculate the sample mean for
        each, and make a histogram of these 1000 means.
        Which histogram corresponds to samples of size 100? To samples of size 25?
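A sketch of this experiment in Python, assuming a hypothetical lognormal income population because the actual incomes are not given. Since σ/√n shrinks by a factor of √(100/25) = 2 when n goes from 25 to 100, the spread of the n = 100 means should be roughly half that of the n = 25 means, so the narrower histogram belongs to n = 100.

```python
import random
import statistics

def draw_income(rng):
    """One income from a hypothetical right-skewed (lognormal) population."""
    return rng.lognormvariate(10.5, 0.8)

def sd_of_sample_means(n, reps=1000, seed=0):
    """Take `reps` SRSs of size n, average each one, and return the sd of those means."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw_income(rng) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

sd25 = sd_of_sample_means(25, seed=1)
sd100 = sd_of_sample_means(100, seed=2)
print(f"sd of means, n = 25: {sd25:,.0f}; n = 100: {sd100:,.0f}; ratio: {sd25 / sd100:.2f}")
```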
How large a sample size?
[Figure: histogram of acorn sizes; frequency versus size, with bins from 1.5 to 10.5 plus “More”.]
 Describe the histogram.
 What do you assume for the population distribution?
   Quality control (breaking strength of products and parts, food shelf life)
   Maintenance planning (scheduled car revision, airplane maintenance)
   Cost analysis and control (number of returns under warranty, delivery time)
   Research (materials properties, microbial resistance to treatment)
 Density curves of three members of the Weibull family, each describing the
 time to failure of a different type of product in manufacturing:
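The Weibull density with shape k and scale λ is f(x) = (k/λ)(x/λ)^(k−1) e^(−(x/λ)^k) for x ≥ 0. The sketch below evaluates it in Python; the shape values are hypothetical, since the parameters of the three curves in the figure are not given. With k = 1 the Weibull reduces to the exponential distribution.

```python
import math

def weibull_pdf(x, shape, scale=1.0):
    """Weibull density f(x) = (k/lam) * (x/lam)**(k-1) * exp(-(x/lam)**k), zero for x < 0."""
    k, lam = shape, scale
    if x < 0:
        return 0.0
    if x == 0:
        # Value at 0 depends on the shape: infinite for k < 1, 1/lam for k == 1, 0 for k > 1.
        return math.inf if k < 1 else (1 / lam if k == 1 else 0.0)
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

for k in (0.5, 1.0, 3.0):  # hypothetical shape parameters, not read from the figure
    row = [round(weibull_pdf(x, k), 3) for x in (0.5, 1.0, 1.5, 2.0)]
    print(f"shape {k}: {row}")
```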
 Sample proportions
 Normal approximation
     Binomial formulas
Binomial distributions for sample counts
Binomial distributions are models for some categorical variables,
typically representing the number of successes in a series of n trials.
However, if you don’t put the coin back in the pile, the probability of picking up
another coin and having it be heads up is now less than 0.5. The successive
observations are not independent.
Likewise, choosing a simple random sample (SRS) from any population is not
quite a binomial setting. However, when the population is large, removing a
few items has a very small effect on the composition of the remaining
population: successive observations are very nearly independent.
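To see how little sampling without replacement matters in a large population, this Python sketch compares the exact without-replacement (hypergeometric) probability of 5 successes in 25 draws with the binomial probability. The population sizes are hypothetical; each is assumed to contain 8% successes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) when the n draws are independent (as if sampling with replacement)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, n, successes, population):
    """P(X = k) when n items are drawn without replacement from a finite population."""
    return comb(successes, k) * comb(population - successes, n - k) / comb(population, n)

n, k, p = 25, 5, 0.08
for population in (200, 1_000, 10_000):  # hypothetical population sizes
    successes = round(p * population)
    without = hypergeometric_pmf(k, n, successes, population)
    print(f"N = {population:>6}: without replacement {without:.5f}, binomial {binomial_pmf(k, n, p):.5f}")
```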
Binomial distribution in statistical sampling
In Minitab: Menu/Calc/Probability Distributions/Binomial
In Excel: =BINOMDIST(number_s, trials, probability_s, cumulative), where
Number_s:
 number of successes in trials.
Trials:
 number of independent trials.
Probability_s:
 probability of success on each trial.
Cumulative:
 a logical value that determines the form of the function:
    TRUE, or 1, for the cumulative distribution function
     P(X ≤ Number_s)
    FALSE, or 0, for the probability
     function P(X = Number_s).
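Outside Minitab and Excel, the same calculation takes a few lines of Python. This is a sketch that only mirrors the argument order of Excel's BINOMDIST; the function name `binomdist` is hypothetical and not part of any library.

```python
from math import comb

def binomdist(number_s, trials, probability_s, cumulative):
    """Sketch mirroring Excel's BINOMDIST argument order.

    cumulative=True  -> P(X <= number_s)
    cumulative=False -> P(X == number_s)
    """
    def pmf(k):
        return comb(trials, k) * probability_s**k * (1 - probability_s) ** (trials - k)

    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)

print(round(binomdist(5, 25, 0.08, True), 4))   # P(X <= 5)
print(round(binomdist(5, 25, 0.08, False), 5))  # P(X = 5)
```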
Binomial mean and standard deviation
The center and spread of the binomial distribution for a count X are defined by
the mean µ and standard deviation σ:
$\mu = np \qquad \sigma = \sqrt{npq} = \sqrt{np(1-p)}$
We often write q as 1 − p.
Effect of changing p when n is fixed:
a) n = 10, p = 0.25
b) n = 10, p = 0.5
c) n = 10, p = 0.75
[Figure: P(X = x) versus number of successes (0 to 10) for each of the three cases.]
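One way to check µ = np and σ = √(np(1 − p)) is to compute the mean and standard deviation directly from the binomial probabilities. A minimal Python sketch for the three cases above (n = 10 and p = 0.25, 0.5, 0.75):

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def moments_from_pmf(n, p):
    """Mean and sd computed directly from the distribution, to compare with the formulas."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, sqrt(var)

n = 10
for p in (0.25, 0.5, 0.75):
    mean, sd = moments_from_pmf(n, p)
    print(f"p = {p}: mean {mean:.3f} (np = {n * p}), sd {sd:.3f} (formula {sqrt(n * p * (1 - p)):.3f})")
```

For n = 10, p = 0.25 and p = 0.75 give the same σ ≈ 1.37 because their distributions are mirror images, while p = 0.5 gives the largest spread, σ ≈ 1.58.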
 What is the probability that five individuals or fewer in the sample are color
blind?
      Use Excel’s “=BINOMDIST(number_s,trials,probability_s,cumulative)”
      P(x ≤ 5) = BINOMDIST(5, 25, .08, 1) = 0.9877
   What is the probability that more than five will be color blind?
      P(x > 5) = 1 − P(x ≤ 5) =1 − 0.9877 = 0.0123
 x    P(X = x)   P(X ≤ x)
 8     0.04%      99.99%
 9     0.01%     100.00%
10     0.00%     100.00%
11     0.00%     100.00%
12     0.00%     100.00%
13     0.00%     100.00%
14     0.00%     100.00%
15     0.00%     100.00%
16     0.00%     100.00%
17     0.00%     100.00%
18     0.00%     100.00%
19     0.00%     100.00%
20     0.00%     100.00%
21     0.00%     100.00%
22     0.00%     100.00%
23     0.00%     100.00%
24     0.00%     100.00%
25     0.00%     100.00%
[Histogram: P(X = x) versus number of color blind individuals (x).]
Probability distribution and histogram for the number of color blind individuals
among 25 Caucasian males.
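The full table for n = 25 and p = 0.08, every x from 0 to 25, can be regenerated with a short Python sketch (the function name is illustrative):

```python
from math import comb

def binomial_table(n, p):
    """Rows of (x, P(X = x), P(X <= x)) for a binomial count X."""
    rows, running = [], 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p**x * (1 - p) ** (n - x)
        running += prob
        rows.append((x, prob, running))
    return rows

for x, prob, cum in binomial_table(25, 0.08):
    print(f"{x:>2}  {prob:7.2%}  {cum:8.2%}")
```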
What are the mean and standard deviation of the count
of color blind individuals in the SRS of 25 Caucasian
American males?
µ = np = 25*0.08 = 2
σ = √(np(1 − p)) = √(25*0.08*0.92) = √1.84 ≈ 1.36
[Figure: P(X = x) versus number of successes for p = 0.08, with n = 10 (left) and n = 75 (right).]
Sample proportions
The proportion of “successes” can be more informative than the count.
In statistical sampling the sample proportion of successes, p̂, is used to
estimate the proportion p of successes in a population.
 The 30 subjects in an SRS are asked to taste an unmarked brand of coffee and rate it
“would buy” or “would not buy.” Eighteen subjects rated the coffee “would buy.”
       p̂ = (18)/(30) = 0.6 (proportion of “would buy”)
If the sample size is much smaller than the size of a population with
proportion p of successes, then the mean and standard deviation of
p̂ are:
$\mu_{\hat{p}} = p \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
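A simulation sketch in Python checks both formulas. It assumes p = 0.08 and n = 125 (the values from the color blindness example) and treats each draw as independent, which matches the condition that the sample is much smaller than the population.

```python
import math
import random
import statistics

def simulate_phat(p, n, reps=10_000, seed=42):
    """Draw `reps` samples of size n with success probability p; return the sample proportions."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.08, 125
phats = simulate_phat(p, n)
print(f"mean of p-hat: {statistics.fmean(phats):.4f} (p = {p})")
print(f"sd of p-hat:   {statistics.stdev(phats):.4f} (formula {math.sqrt(p * (1 - p) / n):.4f})")
```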
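If X is the count of successes in the sample and p̂ = X/n is the sample proportion
of successes, then for large n their sampling distributions are approximately:
X approximately $N\left(np, \sqrt{np(1-p)}\right)$
p̂ approximately $N\left(p, \sqrt{p(1-p)/n}\right)$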
The sampling distribution of p̂ is never exactly normal. But as the sample size
increases, the sampling distribution of p̂ becomes approximately normal.
The normal approximation is most accurate for any fixed n when p is close to
0.5, and least accurate when p is near 0 or near 1.
Color blindness
The frequency of color blindness (dyschromatopsia) in the
Caucasian American male population is about 8%.
We take a random sample of size 125 from this population. What is the
probability that six individuals or fewer in the sample are color blind?
The normal approximation is reasonable, though not perfect: p = 0.08 is not
close to 0.5, where the normal approximation is at its best.
A sample size of 125 is the smallest sample size that can allow use of the normal
approximation (np = 10 and n(1 − p) = 115).
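A Python sketch of both steps: finding the smallest n that meets the rule of thumb np ≥ 10 and n(1 − p) ≥ 10, and comparing the exact binomial P(X ≤ 6) with the normal approximation N(np, √(np(1 − p))) without a continuity correction. The helper names are illustrative; p is stored as an exact fraction for the sample-size search so the boundary case np = 10 is not disturbed by floating-point rounding.

```python
import math
from fractions import Fraction

def normal_ok(n, p, threshold=10):
    """Rule of thumb: np and n(1 - p) both at least 10."""
    return n * p >= threshold and n * (1 - p) >= threshold

def smallest_n(p, threshold=10):
    n = 1
    while not normal_ok(n, p, threshold):
        n += 1
    return n

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial count."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print("smallest n:", smallest_n(Fraction(8, 100)))  # exact 8%

n, p = 125, 0.08
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
print(f"exact binomial P(X <= 6): {binomial_cdf(6, n, p):.4f}")
print(f"normal approx  P(X <= 6): {normal_cdf(6, mu, sigma):.4f}")
```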
Sampling distributions for the color blindness example.
[Figure: P(X = x) versus count of successes, comparing the binomial distribution with its normal approximation; panels for n = 50 and for larger sample sizes.]
The larger the sample size, the better the normal approximation fits the
binomial distribution. Avoid sample sizes too small for np or n(1 − p) to reach
at least 10 (e.g., n = 50).
P(X ≤ 10) for a binomial variable is P(X ≤ 10.5) using a normal approximation.
P(X < 10) for a binomial variable excludes the outcome X = 10, so we exclude
the entire interval from 9.5 to 10.5 and calculate P(X ≤ 9.5) when using a
normal approximation.
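The continuity correction is easy to try numerically. This Python sketch assumes n = 125 and p = 0.08 from the color blindness example and compares the exact binomial values with the corrected and uncorrected normal approximations.

```python
import math

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial count."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 125, 0.08
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact_le_10 = binomial_cdf(10, n, p)        # P(X <= 10)
approx_le_10 = normal_cdf(10.5, mu, sigma)  # corrected: P(X <= 10.5)
naive_le_10 = normal_cdf(10, mu, sigma)     # uncorrected, for comparison

exact_lt_10 = binomial_cdf(9, n, p)         # P(X < 10) = P(X <= 9)
approx_lt_10 = normal_cdf(9.5, mu, sigma)   # corrected: P(X <= 9.5)

print(f"P(X <= 10): exact {exact_le_10:.4f}, corrected {approx_le_10:.4f}, uncorrected {naive_le_10:.4f}")
print(f"P(X < 10):  exact {exact_lt_10:.4f}, corrected {approx_lt_10:.4f}")
```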
Binomial formulas
   The binomial coefficient “n_choose_k” uses the factorial notation “!”:
              $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, where k = 0, 1, 2, ..., or n.
   The factorial n! for any strictly positive whole number n is:
              n! = n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1
   Note that 0! = 1.
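A direct Python translation of these definitions, checked against the standard library's `math.comb` (available in Python 3.8 and later):

```python
import math

def factorial(n):
    """n! = n x (n - 1) x ... x 2 x 1, with 0! = 1."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def n_choose_k(n, k):
    """Binomial coefficient n! / (k! (n - k)!) for k = 0, 1, ..., n."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return factorial(n) // (factorial(k) * factorial(n - k))

print(n_choose_k(25, 5), math.comb(25, 5))
```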
Calculations for binomial probabilities
        $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

        X    P(X)
        0    $_nC_0\, p^0 q^n = q^n$
        1    $_nC_1\, p^1 q^{n-1}$
        2    $_nC_2\, p^2 q^{n-2}$
What is the probability that exactly five individuals in the sample are color blind?
        $P(x = 5) = \frac{n!}{k!\,(n-k)!} p^k (1-p)^{n-k} = \frac{25!}{5!\,20!} (0.08)^5 (0.92)^{20}$
        $P(x = 5) = \frac{21 \times 22 \times 23 \times 24 \times 25}{1 \times 2 \times 3 \times 4 \times 5} (0.08)^5 (0.92)^{20}$
        $P(x = 5) = 53{,}130 \times 0.0000032768 \times 0.1887 \approx 0.03285$
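The same arithmetic in Python, keeping each factor unrounded until the end:

```python
from math import comb

n, k, p = 25, 5, 0.08
coefficient = comb(n, k)     # 25! / (5! 20!) = (21*22*23*24*25) / (1*2*3*4*5)
p_part = p**k                # 0.08^5
q_part = (1 - p) ** (n - k)  # 0.92^20
prob = coefficient * p_part * q_part
print(f"{coefficient} x {p_part:.10f} x {q_part:.4f} = {prob:.5f}")
```

Multiplying the rounded factors 53,130 × 0.0000033 × 0.1887 gives about 0.0331 instead, so it pays to keep full precision in 0.08^5 = 0.0000032768.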
           Alternate Slides
In JMP,
highlight a column, click the column triangle, and
click Formula to open the formula box on the
right.
Select Discrete Probability.
What is the probability that exactly five individuals in the sample are color
blind?
 What is the probability that five individuals or fewer in the sample are
color blind?
       Use JMP's Probability → Binomial Distribution(0.08, 20, 5)
       P(x ≤ 5) = Binomial Distribution(0.08, 20, 5) = 0.9962
   What is the probability that more than five will be color blind?
       P(x > 5) = 1 − P(x ≤ 5) = 1 − 0.9962 = 0.0038
µ = np = 20*0.08 = 1.6
[Figure: P(X = x) versus number of successes for p = 0.08, with n = 10 (left) and n = 75 (right).]
Color blindness
The frequency of color blindness (dyschromatopsia) in the
Caucasian American male population is estimated to be
about 8%. We take a random sample of size 20 from this population.
What is the probability that exactly five individuals in the sample are color blind?