Sampling Distribution Solutions

This document contains solutions to 5 tutorial questions about sampling distributions and properties of sample means and proportions. For question 1, the document finds the mean and standard deviation of the sampling distribution of the proportion of correct answers on a 50 question exam. For question 2, it calculates the probability of rejecting a shipment of chips if more than 5% are found defective in a sample. The remaining questions deal with additional examples calculating properties of sampling distributions and determining if sample means or proportions are unusually high based on the expected sampling distribution.

Uploaded by

Jingyi Li
Copyright
© All Rights Reserved

ST1131

Tutorial 6 Solution

1. An exam consists of 50 multiple choice questions. Based on how much you studied, for any given
question you think you have a probability of p = 0.7 of getting the correct answer. Consider the
sampling distribution of the sample proportion of the 50 questions on which you get the correct answer.

(a) Find the mean and standard deviation of the sampling distribution of this proportion.
(b) What do you expect for the shape of the sampling distribution? Why?
(c) If truly p = 0.7, what’s the probability that you can get the correct answers for not more than
60% of the questions?

Solution:

(a) Each question is answered correctly with probability p = 0.7, and the sample size is n = 50.
The sampling distribution of the sample proportion p̂ has mean 0.7 and standard deviation
$$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.3)(0.7)}{50}} = 0.0648.$$

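(b) Since the sample size n = 50 is large enough (np = 35 expected correct answers and
n(1 − p) = 15 expected incorrect answers), the distribution of p̂ is roughly bell shaped and
can be approximated by a normal distribution.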
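(c) If the true p = 0.7, then, approximately,
$$\hat{p} \sim N\left(0.7,\ \frac{(0.3)(0.7)}{50}\right),$$
where the second argument is the variance. The probability of getting at most 60% of the answers correct is

$$\Pr(\hat{p} \le 0.6) = 0.0614.$$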


> pnorm(0.6, mean = 0.7, sd = sqrt(0.3*0.7/50))
[1] 0.06141132
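
The calculation in parts (a) and (c) can be reproduced step by step, as in the sketch below. The variable names are illustrative, and the exact binomial probability on the last line is not part of the original solution: it is added only as a cross-check of the normal approximation, under the assumption that the 50 answers are independent. The two values need not agree closely, because the approximation above uses no continuity correction.

```r
# Question 1: sampling distribution of the sample proportion
p <- 0.7   # probability of a correct answer (from the question)
n <- 50    # number of questions

mean.phat <- p                       # mean of p-hat
sd.phat   <- sqrt(p * (1 - p) / n)   # standard deviation of p-hat

# Normal approximation used in the solution: Pr(p-hat <= 0.6)
approx <- pnorm(0.6, mean = mean.phat, sd = sd.phat)

# Cross-check (assumption, not in the original solution):
# exact binomial probability of at most 30 correct answers out of 50
exact <- pbinom(30, size = n, prob = p)

c(mean = mean.phat, sd = sd.phat, approx = approx, exact = exact)
```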

2. A supplier of electronic chips for tablets claims that only 4% of his chips are defective. A manufacturer
tests 500 randomly selected chips from a large shipment from the supplier for potential defects.

(a) Find the mean and standard deviation for the distribution of the sample proportion of defective
chips in the sample of 500.
(b) Is it reasonable to assume a normal shape for the sampling distribution? Explain.
(c) The manufacturer will return the entire shipment if he finds more than 5% of the 500 sampled
chips to be defective. Find the probability that the shipment will be returned?

Solution:

(a) Each chip is either defective, with probability p = 0.04, or not defective. The sample size is n = 500.
The proportion of defective chips in a sample is p̂, whose distribution has mean 0.04 and standard
deviation
$$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.04)(0.96)}{500}} \approx 0.0088.$$


(b) It is reasonable to assume a normal shape for the sampling distribution of p̂, since the sample size
is large enough: np(1 − p) = 500(0.04)(0.96) = 19.2.
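(c) If the true p = 0.04, then, approximately,
$$\hat{p} \sim N\left(0.04,\ \frac{(0.04)(0.96)}{500}\right).$$
The shipment will be returned if the sample gives a p̂ larger than 5%. Under the normal
approximation, Pr(p̂ > 0.05) = Pr(p̂ ≥ 0.05), so the probability that the shipment is returned is

$$\Pr(\hat{p} \ge 0.05) = 0.1269.$$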

> pnorm(0.05, mean = 0.04, sd = sqrt(0.04*0.96/500), lower.tail = FALSE)


[1] 0.1269165
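
As a sketch of how parts (a) to (c) fit together, the quantities can be computed in sequence. The variable names below are illustrative and not from the original solution. The expected counts are printed only to support the "large enough" judgement in part (b).

```r
# Question 2: proportion of defective chips in a sample of 500
p <- 0.04   # claimed defect rate (from the question)
n <- 500    # number of chips tested

# Part (b): quantities used to judge whether n is large enough
expected.defective <- n * p             # expected number of defective chips
expected.good      <- n * (1 - p)       # expected number of non-defective chips
npq                <- n * p * (1 - p)   # the quantity quoted in the solution

# Part (a): standard deviation of p-hat
sd.phat <- sqrt(p * (1 - p) / n)

# Part (c): z-score of the 5% cutoff and the upper-tail probability
z.cutoff    <- (0.05 - p) / sd.phat
prob.return <- pnorm(z.cutoff, lower.tail = FALSE)

c(npq = npq, sd = sd.phat, z = z.cutoff, prob.return = prob.return)
```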

3. According to a recent Current Population Report, the population distribution of numbers of years of
education for self-employed individuals in the US has a mean of 13.6 and a standard deviation of 3.

(a) Identify the random variable X whose population distribution is described here.
(b) Find the mean and standard deviation of the sampling distribution of X̄ for a random sample of
size n = 100. Interpret the results.
(c) Repeat part 3b for n = 400. Describe the effect of increasing n.

Solution:

(a) The random variable X is the number of years of education of a self-employed individual in the US.


(b) The mean is 13.6 and the standard deviation is $\sigma/\sqrt{n} = 3/\sqrt{100} = 0.3$.
The mean of the sampling distribution is 13.6, the same as the mean of the population.
The standard deviation of the sampling distribution is smaller than the standard deviation of the
population. It reflects the variability of sample mean from samples of size 100 taken from this
population.
(c) The mean of the sampling distribution is still 13.6 and the standard deviation is
$\sigma/\sqrt{n} = 3/\sqrt{400} = 0.15$. As n increases, the mean of the sampling distribution
stays the same, but the standard deviation gets smaller.
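
The effect of n in parts (b) and (c) can be seen by evaluating the standard deviation of X̄ for both sample sizes at once. This is a minimal sketch and the variable names are illustrative.

```r
# Question 3: standard deviation of the sample mean for two sample sizes
mu    <- 13.6          # population mean (from the question)
sigma <- 3             # population standard deviation (from the question)
n     <- c(100, 400)   # the two sample sizes considered

sd.xbar <- sigma / sqrt(n)   # vectorised: one value per sample size
sd.xbar                      # 0.30 0.15

# Quadrupling n halves the standard deviation of the sample mean
sd.xbar[2] / sd.xbar[1]      # 0.5
```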

4. Consider duration of mobile phone calls of a P6 student. It is assumed to follow a distribution with
mean µ = 2.8 minutes and standard deviation σ = 2.1 minutes.
From a random sample of 45 calls, the parents of this student computed a sample mean of X̄ = 3.3
minutes and a sample standard deviation of s = 2.9 minutes.

(a) What are the mean and standard deviation of the data distribution?
(b) Find the mean and standard deviation of the sampling distribution of the sample mean.
(c) Is the mean of 3.3 minutes from the collected sample unusually high? Find its z-score and
comment.
(d) The parents have told this student that they will re-plan (to tighten phone use) when they find a
sample mean larger than 3.5 minutes. How likely is it to happen?

Solution:

(a) Data (which is the sample of 45 calls) has a mean of 3.3 and a standard deviation of 2.9.


(b) The sampling distribution has mean µ = 2.8 minutes and standard deviation
$\sigma/\sqrt{n} = 2.1/\sqrt{45} = 0.313$ minutes.
(c) For a sample with mean 3.3 minutes, the z-score is
$$z = \frac{3.3 - 2.8}{0.313} = 1.6.$$
Since the sampling distribution of the sample mean is bell shaped and 3.3 is only 1.6 standard
deviations above the mean, this sample mean is not unusually high.
(d) For a sample with mean 3.5 minutes, the z-score is

$$z = \frac{3.5 - 2.8}{0.313} = 2.24,$$
which corresponds to a cumulative probability of about 0.9873. The probability of a sample mean being
larger than 3.5 is 1 − 0.9873 = 0.0127, or 1.27%.
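
The z-scores and the tail probability in parts (b) to (d) can be checked in R. This is a sketch under the same normal approximation for X̄ used in the solution. The variable names are illustrative.

```r
# Question 4: z-scores for the sample mean of 45 call durations
mu    <- 2.8   # population mean in minutes (from the question)
sigma <- 2.1   # population standard deviation in minutes (from the question)
n     <- 45    # number of calls in the sample

se <- sigma / sqrt(n)   # standard deviation of the sample mean

z.observed <- (3.3 - mu) / se   # part (c): z-score of the observed mean
z.replan   <- (3.5 - mu) / se   # part (d): z-score of the re-plan threshold

# Part (d): probability that the sample mean exceeds 3.5 minutes
p.replan <- pnorm(3.5, mean = mu, sd = se, lower.tail = FALSE)

round(c(se = se, z.observed = z.observed, z.replan = z.replan, p.replan = p.replan), 4)
```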

5. (Question for Group Work) Consider duration of mobile phone calls of a P6 student. It is assumed to
follow a distribution with mean µ = 2.8 minutes and standard deviation σ = 2.1 minutes.

(a) Assume that the duration of mobile phone call of this student follows a normal distribution.
Generate a random sample of duration of 45 calls. Derive the sample mean and sample standard
deviation.
(b) Repeat part 5a N = 1000 times, keeping all the sample mean values in a vector called X.bar.
Then plot a histogram of X.bar. What is the shape of this histogram?
(c) What are (roughly) the center and the range of the histogram above?
(d) Among all the values in X.bar, what is the proportion of the values that are larger than 3.5?

Solution:

(a)
> mu = 2.8
> sigma = 2.1
> #(a) generate a sample of 45
> sample = rnorm(45, mu, sigma)
> mean(sample)
[1] 3.314398
> sd(sample)
[1] 2.028001
(b) Theoretically, the histogram is bell-shaped. Because the call durations are assumed normal here,
X̄ is exactly normally distributed. Even without that assumption, each sample has size n = 45, which is
large enough for the CLT to apply, so X̄ would still be approximately normal. The histogram
of N = 1000 observations of X̄ should resemble a normal curve.
(c) Theoretically, the centre of the histogram should be around 2.8, which is the mean of the population.
The histogram should extend about 3 standard deviations of X̄ on either side of the mean 2.8.
The standard deviation of X̄ is $\sigma/\sqrt{n} = 2.1/\sqrt{45} = 0.3130495$. Hence the histogram
ranges from about 1.86 to 3.74.
(d) Theoretically, the proportion is the area to the right of 3.5 under the curve of $N(2.8,\ 0.313^2)$,
which is about 0.0127.

Note: The R code below is for practising parts (a)-(d) in R.


> mu = 2.8
> sigma = 2.1
> #(a) generate a sample of 45
> sample = rnorm(45, mu, sigma)
> mean(sample)

[1] 2.462294

> sd(sample)

[1] 1.977686

> #(b) Repeat part a for N = 1000 times.


> N = 1000
> N.samples = matrix(rnorm(45*N, mu, sigma), N, 45)
> # "N.samples" is a matrix of N rows, each row has 45 values from N(mu, sigma^2).
>
> X.bar = rowMeans(N.samples) # 1000 values of sample mean from N samples
> # X.bar
>
> # histogram of vector "X.bar" is:
> hist(X.bar, prob = TRUE, col = "green4",
+ xlab = "Histogram of sample means")
> #(c)
> # In R:
> range(X.bar) # this should be close to 1.86 to 3.74.

[1] 1.668789 3.715907

> mean(X.bar) #this should be close to mu = 2.8

[1] 2.803892

> sd(X.bar) #this should be close to sigma/sqrt(45) = 0.3130495

[1] 0.3060744

> #(d)
> # In R:
> length(which(X.bar>3.5)) # count how many values in X.bar > 3.5

[1] 15

> #proportion
> length(which(X.bar>3.5))/N # this value should be close to 0.0127

[1] 0.015
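
The CLT remark in part (b) can be illustrated by repeating the simulation with a skewed population. The sketch below is not part of the original solution. The gamma distribution is an assumption chosen only because it is non-negative and right-skewed, with its shape and scale set to match µ = 2.8 and σ = 2.1. The seed and the names `shape.g`, `scale.g` and `X.bar.gamma` are illustrative.

```r
# CLT sketch with a skewed population (assumption: gamma, not in the original)
mu    <- 2.8
sigma <- 2.1
n     <- 45     # size of each sample
N     <- 1000   # number of repeated samples

# Gamma parameters matching the stated mean and standard deviation:
# mean = shape * scale, variance = shape * scale^2
scale.g <- sigma^2 / mu
shape.g <- mu / scale.g

set.seed(1131)  # arbitrary seed so the run is reproducible
gamma.samples <- matrix(rgamma(n * N, shape = shape.g, scale = scale.g), N, n)
X.bar.gamma   <- rowMeans(gamma.samples)   # N sample means, one per row

mean(X.bar.gamma)         # should be close to mu = 2.8
sd(X.bar.gamma)           # should be close to sigma/sqrt(n)
mean(X.bar.gamma > 3.5)   # proportion of sample means above 3.5
```

Calling `hist(X.bar.gamma, prob = TRUE)` on the result should give a roughly bell-shaped histogram even though the individual call durations are skewed. The proportion above 3.5 need not match 0.0127 exactly, since the sampling distribution is only approximately normal in this case.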
