Chapter 1: Sampling and Sampling Distribution
Introduction
All items in any field of inquiry constitute a ‘universe’ or a ‘population’. A complete
enumeration of all items in the population is known as a census inquiry. In such an inquiry, when
all items are covered, no element of chance is left and highest accuracy is obtained.
This type of inquiry, however, involves a great deal of time, money and energy. Therefore, when
the field of inquiry is large, this method becomes difficult to adopt because of the resources
involved. At times, this method is beyond the reach of ordinary researchers. Apart from very
rare cases, the government is the only institution which can get a complete enumeration carried out.
Undertaking a census survey is often not possible. However, it is frequently possible to obtain
sufficiently accurate results by studying only a part of the total population, technically called
a sample. The process of selecting samples is called sampling. The samples selected should be as
representative of the total population as possible, so that they form a miniature cross-section
of it.
A subset of the population selected for the study is known as a sample. The group from which the
samples are selected is called the universe or population. A sample survey is a procedure which
enables one to draw inferences about the population by observing or measuring only a few items.
A census survey is a method of inquiry which enables one to draw inferences by observing every
item constituting the population.
Generalizing the sample results to the population, which is one of the characteristic
features of research, requires a scientific approach to searching for facts. Therefore, sampling
must be scientific: the adopted sample design must be good and the sample size must be
adequate.
The following points summarize the benefits of studying samples.
1. There could be resource (time, finance, manpower, etc.) limitations which would make it
difficult to study the whole population.
2. In some cases, tests may be destructive. For example, when we test the breaking strength
of materials, we must destroy them. A census would mean complete destruction of
materials. In such a case, we must sample.
3. Sampling provides much quicker results than a census does. When the time between the
recognition of the need for information and the point at which that information is required
is short, sampling makes it possible to obtain the information in time.
4. Sampling is the only possible process if the population is infinite.
5. There is also an argument that the quality of a study is often better with sampling than
with a census. The basis of the argument is that sampling allows better interviewing, more
thorough investigation of missing, wrong, or suspicious information, better supervision,
and better processing than complete coverage does.
The following are some of the advantages of sampling over a census:
• Sampling saves time and money.
• It is much more convenient, as it involves fewer personnel.
• It is useful when the population is infinitely large.
• It can be supervised more accurately, and the data can be selected carefully.
BIAS AND ERRORS IN SAMPLING
One must remember that two costs are involved in a sampling analysis: the cost of collecting the
data and the cost of an incorrect inference resulting from the data. There are two causes of
incorrect inferences, namely systematic bias and sampling error.
Systematic bias: Systematic bias results from errors in the sampling procedures, and it
cannot be reduced or eliminated by increasing the sample size. However, the causes
responsible for these errors can be detected and corrected. Bias enters in when a sample fails
to represent the population it was intended to represent.
Sampling errors: Sampling errors are the random variations in the sample estimates around
the true population parameters. Since they occur randomly and are equally likely to be in
either direction, their nature happens to be of compensatory type and the expected value of
such errors happens to be equal to zero. Sampling error decreases with the increase in the
size of the sample, and it happens to be of a smaller magnitude in case of homogeneous
population.
Sampling Techniques
Sampling techniques are basically of two types: probability sampling and non-probability
sampling. Probability sampling is based on the concept of random selection, whereas
non-probability sampling is based on non-random selection.
1) PROBABILITY SAMPLING TECHNIQUES
Probability sampling is also known as random sampling or chance sampling.
I. Simple Random Sampling
Individuals are randomly drawn from the population at large. Under this sampling design, every
unit in the universe has the same chance of being included in the sample; it is blind chance
alone that determines whether one item or another is selected. Random sampling needs a
‘sampling frame’, i.e. a complete and up-to-date list of all members (sampling units) of the
population. For a homogeneous type of population, simple random sampling is reliable. Random
sampling is considered the best technique for selecting a representative sample. The problem
with this method is that it is time consuming.
Simple random sampling gives:
a) each element in the population an equal probability of getting into the sample, with all
choices independent of one another; and
b) each possible sample combination an equal probability of being chosen.
We can define a simple random sample from a finite population as a sample which is chosen in
such a way that each of the $^{N}C_{n}$ possible samples has the same probability, $1/\,^{N}C_{n}$, of being
selected. For example, suppose a finite population consists of 4 elements (say a, b, c, d), i.e. $N = 4$,
and that we want to take a sample of $n = 2$ from it. Then there are $^{4}C_{2} = 6$ possible distinct
samples of the required size, and they consist of the elements ab, ac, ad, bc, bd, and cd. If we
choose one of these samples in such a way that each has a probability 1/6 of being chosen, we
will then call this a random sample.
With regard to how to take a random sample, we can use a lottery method. To do this, we first
write the name of each element of a finite population on a slip of paper, put the slips of
paper into a box, mix them thoroughly, and then draw the required number of slips for the
sample one after the other without replacement. Using the above example where we want to
draw two from 4 elements, the probability that the first slip drawn is one of the two elements
of a given sample is 2/4, and the probability that the second slip is the remaining element,
given that the first slip is not replaced, is 1/3. By the multiplication rule, the probability of
drawing that particular sample is the product of these two probabilities, which
works out to 2/4 × 1/3 = 1/6. This verifies our earlier calculation.
Another method of selecting random samples is to use random number tables. Random numbers
are tabulated in many standard books. Sampling is then done by picking numbers at random
from the table until the required sample size is obtained.
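The four-element example can be checked with a short Python sketch. This is only an illustration under stated assumptions: the variable names and the fixed seed are not from the text, and `random.sample` stands in for drawing slips from a box without replacement.

```python
import random
from fractions import Fraction
from itertools import combinations

population = ["a", "b", "c", "d"]  # N = 4, as in the example
n = 2                              # required sample size

# Every distinct sample of size n, ignoring order: ab, ac, ad, bc, bd, cd.
all_samples = list(combinations(population, n))

# Under simple random sampling each of these samples is equally likely.
prob_each = Fraction(1, len(all_samples))

# Lottery method: draw n slips without replacement.
rng = random.Random(42)            # fixed seed (an arbitrary choice) for repeatability
sample = rng.sample(population, n)

print(len(all_samples), prob_each, sample)
```

Counting the combinations directly gives the same 1/6 that the multiplication rule produces.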
(II) Systematic Sampling: In some instances, the most practical way of sampling is to select
every I-th item on a list. An element of randomness is introduced into this kind of sampling by
using random numbers to pick the unit with which to start. The following steps will help:
- Assign a sequence number to each member of the population.
- Determine the skip interval by dividing the number of units in the population by the sample
size: I = P/S, where I is the skip interval, P is the population size, and S is the sample size.
- Select a starting point using a random digit table (it must be between 1 and I).
- Include that item in the sample and select every I-th item thereafter until the total sample
has been selected.
For example, if we want to take a sample of 20 from a population of 100 members, our skip interval
is 5 (i.e. 100/20). Our starting point must be selected randomly from the interval 1 to 5. Then
every fifth item will be in our sample. If our starting point is 2, then our sample must include
members with sequence numbers 2, 7, 12, 17, 22, 27, …, 97.
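The steps above can be sketched in Python as follows. The function name `systematic_sample` and its parameters are hypothetical, and the sketch assumes the population size is an exact multiple of the sample size.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Return the sequence numbers chosen by systematic sampling.

    Assumes population_size is an exact multiple of sample_size.
    """
    interval = population_size // sample_size      # skip interval I = P / S
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, interval)           # random start between 1 and I
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the skip interval")
    # Take the starting unit and every I-th unit after it.
    return list(range(start, population_size + 1, interval))[:sample_size]

picked = systematic_sample(100, 20, start=2)
print(picked[:6], "...", picked[-1])
```

With a population of 100, a sample of 20 and a starting point of 2, this reproduces the sequence 2, 7, 12, ..., 97.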
The advantage of this sampling technique is the samples will spread evenly over the entire
population. It is also an easier and less costly method of sampling and can be conveniently used
even in case of large populations.
However, if there is a hidden periodicity in the population, systematic sampling will prove to be
an inefficient method of sampling.
(III) Stratified Sampling: If a population from which a sample is to be drawn does not
constitute a homogeneous group, stratified sampling is generally applied. Under stratified
sampling, the population is divided into several subpopulations (strata) that are individually more
homogeneous than the total population and then we select items from each stratum to constitute a
sample. Stratified sampling results in more reliable and detailed information. The basic steps
for stratified sampling are:
a) Divide the population to be surveyed into strata of similar study units, or into areas within
which similar social, environmental, or economic conditions exist.
b) Make a separate and complete list of the units in each stratum, and from each stratum draw a
separate random sample of study units using these lists.
c) Carry out the same survey on the sample of study units in each of the strata, i.e. use the
same questionnaire.
The main advantages of stratified sampling are: (i) for the same sample size, more reliable
information is obtained if the population is stratified than if it is sampled as a whole; (ii)
comparisons between strata are easy, because a separate but similar survey is done in
each stratum.
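A minimal Python sketch of steps (a) and (b) follows. It assumes proportional allocation (the same sampling fraction in every stratum), which the text does not prescribe; the strata names, unit labels and the function `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a separate simple random sample from each stratum.

    strata:   dict mapping stratum name -> list of study units
    fraction: sampling fraction applied to every stratum (an assumption)
    """
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))   # at least one unit per stratum
        sample[name] = rng.sample(units, k)        # random draw within the stratum
    return sample

rng = random.Random(0)  # fixed seed for repeatability
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
chosen = stratified_sample(strata, 0.10, rng)
print({name: len(units) for name, units in chosen.items()})
```

Because each stratum is sampled separately, results can be reported per stratum as well as for the whole population.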
(IV) Cluster Sampling: If the total area of interest happens to be a big one, a convenient
way in which a sample can be taken is to divide the area into a number of smaller
non-overlapping areas and then to randomly select a number of these smaller areas (clusters),
with the ultimate sample consisting of all units in the selected clusters.
In cluster sampling, the total population is divided into a number of relatively small subdivisions
which are themselves clusters of still smaller units, and then some of these clusters are randomly
selected for inclusion in the overall sample. Cluster sampling reduces cost by concentrating
surveys in selected clusters, but it is generally less precise than simple random sampling.
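Cluster sampling as described here can be sketched in Python. The village and household labels and the function `cluster_sample` are hypothetical, chosen only to show that whole clusters are selected at random and every unit inside a selected cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every unit inside them."""
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    units = [unit for cid in chosen_ids for unit in clusters[cid]]
    return chosen_ids, units

rng = random.Random(1)  # fixed seed for repeatability
# Eight non-overlapping clusters of five households each (illustrative data).
clusters = {f"village_{k}": [f"v{k}_hh{j}" for j in range(5)] for k in range(8)}
ids, units = cluster_sample(clusters, 3, rng)
print(ids, len(units))
```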
2) NON-PROBABILITY SAMPLING TECHNIQUES
While useful for many studies, non-probability sampling procedures provide only a weak basis
for generalization. In reality, the conclusions drawn from a study of a non-probability sample are
limited to that sample and cannot be used for further generalization. In this type of sampling,
items for the sample are selected deliberately by the researcher; his choice concerning the items
remains supreme. For instance, if economic conditions of people living in a region are to be
studied, a few towns and villages may be purposively selected for intensive study on the
principle that they can be representative of the entire state. Thus, the judgment of the organizers
of the study plays an important part in this sampling design.
In such a design, personal element has a great chance of entering into the selection of the sample.
Sampling error in this type of sampling cannot be estimated and the element of bias, great or
small, is always there. Some of the major non-probability sampling techniques are the following:
a) Judgment (purposive) sampling - The researcher uses his or her judgment to select people who
are felt to be representative of the population or to have particular expertise or knowledge
which makes them suitable.
b) Convenience (accidental) sampling - The most convenient group is chosen, which
may be the researcher’s friends, work colleagues, or students from a nearby college. This method
is often used to save time and resources.
c) Quota sampling - The researcher selects a predetermined number of individuals from
different groups (e.g. based on age, gender, etc.). This is perhaps the most popular non-probability
sampling method. Quota sampling is useful when a particular group or characteristic is
relatively rare in the population. By setting a quota and selecting people until the quota is filled,
the investigator ensures that the group or characteristic he/she wants in the sample is represented.
d) Referral sampling - This is a non-probability sampling technique which utilizes some form of
referral, wherein respondents who are initially contacted are asked to supply the names and
addresses of other members of the target population.
Sampling Distribution
A sampling distribution is a probability distribution for the possible values of a sample statistic,
such as a sample mean.
NOTE: The normal probability distribution is used to determine probabilities for normally
distributed individual measurements, given the mean and the standard deviation. Symbolically,
the variable is the measurement X, with population mean µ and population standard
deviation σ. In contrast to such distributions of individual measurements, a sampling distribution
is a probability distribution for the possible values of a sample statistic.
The sampling distribution of the mean is the probability distribution of the means, $\bar{X}$, of all
simple random samples of a given sample size n that can be drawn from the population.
NB: the sampling distribution of the mean is not the sample distribution, which is the distribution
of the measured values of X in one random sample. Rather, the sampling distribution of the mean
is the distribution of the sample means $\bar{X}$ across all possible samples of size n.
For any given sample size n taken from a population with mean µ and standard deviation σ, the
value of the sample mean $\bar{X}$ would vary from sample to sample if several random samples were
obtained from the population. This variability serves as the basis for the sampling distribution.
The sampling distribution of the mean is described by two parameters: the expected value
$E(\bar{X}) = \mu_{\bar{X}}$, or mean of the sampling distribution of the mean, and the standard deviation
of the mean, $\sigma_{\bar{X}}$, called the standard error of the mean.
Properties of the Sampling Distribution of Means
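1. The mean of the sampling distribution of the means is equal to the population mean:
$\mu = \mu_{\bar{X}} = E(\bar{X})$.
2. The standard deviation of the sampling distribution of the means (the standard error) is equal
to the population standard deviation divided by the square root of the sample size:
$\sigma_{\bar{X}} = \sigma/\sqrt{n}$. This holds true when n < 0.05N, i.e. when N is very large relative to n.
If N is finite and n ≥ 0.05N, the finite population correction factor is applied:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
When σ is unknown, the standard error of the mean, $\sigma_{\bar{X}}$, can be estimated by using the
sample standard error of the mean, $S_{\bar{X}}$, which is calculated as $S_{\bar{X}} = s/\sqrt{n}$, where s is
the sample standard deviation.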
3. The sampling distribution of means is approximately normal for sufficiently large sample
sizes (n≥ 30).
Example:
A population consists of the following ages: 10, 20, 30, 40, and 50. A random sample of three is
to be selected from this population and mean computed. Develop the sampling distribution of the
mean.
Solution:
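The number of simple random samples of size n that can be drawn without replacement from a
population of size N is $^{N}C_{n}$. With N = 5 and n = 3, $^{5}C_{3} = 10$ samples can be drawn from the
population:

Sample          Sample mean
10, 20, 30      20.00
10, 20, 40      23.33
10, 20, 50      26.67
10, 30, 40      26.67
10, 30, 50      30.00
10, 40, 50      33.33
20, 30, 40      30.00
20, 30, 50      33.33
20, 40, 50      36.67
30, 40, 50      40.00

The population mean is µ = 30, and the population standard deviation is
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{1000}{5}} = 14.142$$
Using the formula with the finite population correction factor, the standard error is
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{14.142}{\sqrt{3}} \sqrt{\frac{5-3}{5-1}} = 5.774$$
Computing it directly from the ten sample means gives the same value:
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{10}} = \sqrt{\frac{333.4}{10}} = 5.774$$
Since averaging reduces variability, $\sigma_{\bar{X}} < \sigma$ except in the cases where σ = 0 or n = 1.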
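The ages example can be verified by enumerating every possible sample in Python. This is a sketch for checking the arithmetic; the variable names are assumptions, and the population is the five ages given in the example.

```python
from itertools import combinations
from math import sqrt

population = [10, 20, 30, 40, 50]
N, n = len(population), 3

# Population mean and standard deviation (divide by N, not N - 1).
mu = sum(population) / N
sigma = sqrt(sum((x - mu) ** 2 for x in population) / N)

# Mean of every possible sample of size n drawn without replacement.
sample_means = [sum(s) / n for s in combinations(population, n)]
mean_of_means = sum(sample_means) / len(sample_means)

# Standard error computed directly from the sampling distribution ...
se_direct = sqrt(sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means))
# ... and from the formula with the finite population correction factor.
se_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(len(sample_means), round(mean_of_means, 2), round(se_direct, 3), round(se_formula, 3))
```

Both routes give a standard error of about 5.774, and the mean of the sample means equals the population mean of 30.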
Central Limit Theorem and the Sampling Distribution of the Mean
The Central Limit Theorem (CLT) states that:
1. If the population is normally distributed, the distribution of sample means is normal
regardless of the sample size.
2. If the population from which samples are taken is not normal, the distribution of sample
means will be approximately normal if the sample size (n) is sufficiently large (n ≥ 30).
The larger the sample size is used, the closer the sampling distribution is to the normal
curve.
The relationship between the shape of the population distribution and the shape of the sampling
distribution of the mean is called the Central Limit Theorem.
The significance of the Central Limit Theorem is that it permits us to use sample statistics to
make inferences about population parameters without knowing anything about the shape of the
frequency distribution of that population other than what we can get from the sample. It also
permits us to use the normal distribution (curve) for analyzing distributions whose shape is
unknown. It creates the potential for applying the normal distribution to many problems when
the sample is sufficiently large.
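A small simulation can illustrate the theorem. The sketch below is an assumption-laden illustration, not part of the original notes: it uses an exponential population (mean 1, standard deviation 1, strongly skewed), samples of size 30, 2,000 repetitions and a fixed seed, all chosen arbitrarily.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(2024)   # fixed seed so the run is repeatable
n, reps = 30, 2000          # sample size and number of repeated samples

# Each entry is the mean of one random sample of size n from the skewed population.
sample_means = [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

centre = mean(sample_means)      # should be close to the population mean, 1
spread = pstdev(sample_means)    # should be close to sigma / sqrt(n)

print(round(centre, 3), round(spread, 3), round(1 / sqrt(n), 3))
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to the standard error predicted by the formula.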
Example:
1. The distribution of annual earnings of all bank tellers with five years of experience is skewed
negatively. This distribution has a mean of Birr 15,000 and a standard deviation of Birr 2000. If
we draw a random sample of 30 tellers, what is the probability that their earnings will average
more than Birr 15,750 annually?
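Solution:
Steps:
1. Calculate µ and $\sigma_{\bar{X}}$:
µ = Birr 15,000
$\sigma_{\bar{X}} = \sigma/\sqrt{n} = 2000/\sqrt{30}$ = Birr 365.15
2. Calculate Z for $\bar{X}$ = 15,750:
$$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{15{,}750 - 15{,}000}{365.15} = 2.05$$
3. Find the probability:
$P(\bar{X} > 15{,}750) = P(Z > 2.05)$
= 0.5 – P (0 to +2.05)
= 0.5 – 0.47982
= 0.02018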
2. Suppose that during any hour in a large department store, the average number of shoppers is
448, with a standard deviation of 21 shoppers. What is the probability of randomly selecting 49
different shopping hours, counting the shoppers, and having the sample mean fall between 441
and 446 shoppers, inclusive?
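Solution:
1. Calculate µ and $\sigma_{\bar{X}}$:
µ = 448 shoppers
$\sigma_{\bar{X}} = \sigma/\sqrt{n} = 21/\sqrt{49} = 3$
2. Calculate Z for $\bar{X}$ = 441 and $\bar{X}$ = 446:
$$Z_{\bar{X}} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}: \quad \frac{441 - 448}{3} = -2.33, \qquad \frac{446 - 448}{3} = -0.67$$
3. Find the probability:
$P(441 \le \bar{X} \le 446) = P(-2.33 \le Z \le -0.67)$
= P (0 to -2.33) – P (0 to -0.67)
= 0.49010 – 0.24857
= 0.24153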
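Both examples can be checked in Python using the standard library's normal distribution. This is a sketch: the helper name `se_mean` is hypothetical, and because Z is not rounded to two decimals here, the results can differ in the third decimal place from answers read off a printed Z table.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def se_mean(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Example 1: bank tellers, P(sample mean > 15,750)
se1 = se_mean(2000, 30)
z1 = (15750 - 15000) / se1
p1 = 1 - Z.cdf(z1)

# Example 2: shoppers, P(441 <= sample mean <= 446)
se2 = se_mean(21, 49)
p2 = Z.cdf((446 - 448) / se2) - Z.cdf((441 - 448) / se2)

print(round(se1, 2), round(z1, 2), round(p1, 4), round(p2, 4))
```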
Sampling Distribution of the Proportion
Whereas the mean is computed by averaging a set of values, the sample proportion is computed
by dividing the frequency with which a given characteristic occurs in a sample by the number of
items in the sample:
$$\bar{p} = \frac{X}{n}$$
Like other probability distributions, the sampling distribution of the proportion is described by two
parameters: the mean of the sample proportions, $E(\bar{p})$, and the standard deviation of the
proportions, $\sigma_{\bar{p}}$, which is called the standard error of the proportion.
1. As with the sampling distribution of the mean, the population proportion, P, is always
equal to the mean of the sample proportions, i.e. $P = E(\bar{p})$.
2. The standard error of the proportion is equal to:
$$\sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}}$$
where P = population proportion
q = 1 – P
n = sample size.
Or, for a finite population,
$$\sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}} \sqrt{\frac{N-n}{N-1}}$$
where $\sqrt{\frac{N-n}{N-1}}$ = finite population correction factor.
Problems involving sample proportions are solved by applying the Central Limit Theorem. The CLT
states that the normal distribution approximates the shape of the distribution of sample
proportions if np and nq are greater than 5. Consequently, we solve problems involving sample
proportions by using a normal distribution whose mean and standard deviation are:
$$\mu_{\bar{p}} = P, \qquad \sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}}, \qquad \text{and} \qquad Z_{\bar{p}} = \frac{\bar{p} - P}{\sigma_{\bar{p}}}$$
NB: The sampling distribution of $\bar{p}$ can be approximated by a normal distribution whenever the
sample size is large, i.e. np and nq > 5.
Example:
1. Suppose that 60% of the electrical contractors in a region use a particular brand of wire.
What is the probability of taking a random sample of size 120 from these electrical
contractors and finding that a proportion of 0.5 or less use that brand of wire?
Solution:
n = 120, P = 0.6, q = 0.4, $P(\bar{p} \le 0.5)$ = ?
Steps:
1. Check that np and nq > 5:
120 × 0.6 = 72, and 120 × 0.4 = 48. Both are greater than 5.
2. Calculate $\sigma_{\bar{p}}$:
$$\sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{120}} = 0.0447$$
3. Calculate Z for $\bar{p}$:
$$Z_{\bar{p}} = \frac{\bar{p} - P}{\sigma_{\bar{p}}} = \frac{0.5 - 0.6}{0.0447} = -2.24$$
$P(\bar{p} \le 0.5) = P(Z \le -2.24)$
= 0.5 - P (0 to -2.24)
= 0.5 – 0.48745
= 0.01255
2. Given:
P = 0.1, q = 0.9, n = 80
X = 12
$\bar{p}$ = X/n = 12/80 = 0.15
$P(\bar{p} \ge 0.15)$ = ?
$$\sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}} = \sqrt{\frac{0.1 \times 0.9}{80}} = 0.0335$$
$$Z = \frac{0.15 - 0.10}{0.0335} = 1.49$$
$P(\bar{p} \ge 0.15) = P(Z \ge +1.49)$
= 0.5 – P (0 to +1.49)
= 0.5 – 0.43189 = 0.06811
About 6.81% of the time, twelve or more defective parts would appear in a random sample of
eighty parts when the population proportion is 0.10.
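The two proportion examples can be recomputed in Python as a check. The helper `se_proportion` is a hypothetical name, and since Z is kept unrounded, the probabilities may differ slightly from values read off a printed Z table.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def se_proportion(P, n):
    """Standard error of the sample proportion, sqrt(P * q / n) with q = 1 - P."""
    return sqrt(P * (1 - P) / n)

# Wire brand example: P = 0.6, n = 120, P(sample proportion <= 0.5)
np_check, nq_check = 120 * 0.6, 120 * 0.4      # both should exceed 5
se_a = se_proportion(0.6, 120)
p_a = Z.cdf((0.5 - 0.6) / se_a)

# Defective parts example: P = 0.1, n = 80, P(sample proportion >= 0.15)
se_b = se_proportion(0.1, 80)
p_b = 1 - Z.cdf((0.15 - 0.10) / se_b)

print(round(np_check), round(nq_check), round(se_a, 4), round(p_a, 4), round(se_b, 4), round(p_b, 4))
```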
Sampling Distribution of the Difference Between Two Means ($\bar{X}_1 - \bar{X}_2$)
This distribution is concerned with the difference between sample means drawn from two
populations; it is used to determine whether the mean of one population is equal to the mean of
another. For example, we may want to know:
- Whether the mean life expectancy of females is equal to the mean life expectancy for
males
- Whether the mean productivity of women and men are equal or not
- Whether the mean CGPA for business students is equal to the mean CGPA for social
science students
- Whether the mean number of white blood cells in a droplet of blood is equal to the
mean number of red blood cells - etc.
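In each case we have two different populations (population 1 and population 2). Population 1 has
mean $\mu_1$ and variance $\sigma_1^2$, and population 2 has mean $\mu_2$ and variance $\sigma_2^2$.
A random sample is drawn from each population, and the two sample
means are denoted by $\bar{x}_1$ and $\bar{x}_2$.
N.B.: In the sampling distribution of the difference between two means, $\bar{X}_1 - \bar{X}_2$, we are
actually concerned with five different probability distributions:
1st, the two distributions of the two populations, which have means $\mu_1$ and $\mu_2$
and variances $\sigma_1^2$ and $\sigma_2^2$ respectively.
2nd, the two sampling distributions of $\bar{x}_1$ and $\bar{x}_2$, with means $\mu_{\bar{x}_1} = \mu_1$ and
$\mu_{\bar{x}_2} = \mu_2$ and standard errors $\sigma_{\bar{x}_1}$ and $\sigma_{\bar{x}_2}$.
3rd, the sampling distribution of the difference $\bar{X}_1 - \bar{X}_2$ itself.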
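The sampling distribution of the difference between two sample means, $\bar{X}_1 - \bar{X}_2$, is described
by two parameters.
1. Mean of the difference between two sample means: $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$
2. Variance of the difference between two sample means:
$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
[If X and Y are independent random variables: var (X - Y) = var (X) + var (Y).]
Where:
$\sigma_1^2$ = variance of population one
$\sigma_2^2$ = variance of population two
n1 = sample size drawn from population one
n2 = sample size drawn from population two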
The Central Limit Theorem and the sampling distribution of $\bar{X}_1 - \bar{X}_2$
If n1 and n2 are greater than 30, the distribution of $\bar{X}_1 - \bar{X}_2$ will be approximately
normal no matter how the original populations are distributed.
If the original populations are normally distributed, then the distribution of $\bar{X}_1 - \bar{X}_2$ is
exactly normal for any values of n1 and n2. This is because the sum or difference
of independent normal variables is normally distributed.
To standardize a difference between two sample means, $\bar{X}_1 - \bar{X}_2$, we use the following
formula:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$
Example:
A financial loan officer claims that the mean monthly payment for credit cards is Br 80 with a
variance of 1400 for single females and Br 80 with a variance of 1320 for single males. You take
a random sample of 100 females (population 1) and an independent random sample of 120
males (population 2). What is the probability that the sample mean for females will be at least
Br 5 higher than the sample mean for males?
Solution:
$P(\bar{x}_1 - \bar{x}_2 \ge 5)$ = ?
$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$ = 80 – 80 = 0
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1400}{100} + \frac{1320}{120}} = 5$$
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{5 - 0}{5} = 1.0$$
$P(\bar{x}_1 - \bar{x}_2 \ge 5) = P(Z \ge 1.0)$
= 0.5 – P (0 to +1.00)
= 0.5 - 0.34134
= 0.15866
There is a 15.9% chance that the mean monthly payment for credit cards for single females will
be higher than that of single males by at least Birr 5.
Example: MOHA soft drinks factory produces two soft drinks: 7 up and Pepsi-cola. The daily
production of 7 up averages $\mu_1 = 15{,}000$ bottles and is normally distributed with a standard
deviation $\sigma_1 = 2{,}000$ bottles. The daily production of Pepsi-cola averages $\mu_2 = 12{,}500$
bottles with a standard deviation $\sigma_2 = 2{,}500$ bottles. A sample of five randomly
selected daily production figures is taken from each of the plants. What is the probability that
the sample mean production for 7 up will be less than or equal to the sample mean production for
Pepsi-cola?
Solution:
$\mu_1 = 15{,}000$ bottles, $\mu_2 = 12{,}500$ bottles, $\sigma_1 = 2{,}000$ bottles, $\sigma_2 = 2{,}500$ bottles, n1 = 5, n2 = 5
$P(\bar{x}_1 - \bar{x}_2 \le 0)$ = ?
$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$ = 15,000 – 12,500 = 2,500
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{2{,}000^2}{5} + \frac{2{,}500^2}{5}} = 1{,}431.78$$
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{0 - 2{,}500}{1{,}431.78} = -1.75$$
$P(\bar{x}_1 - \bar{x}_2 \le 0) = P(Z \le -1.75)$
= 0.5 – P (0 to -1.75)
= 0.5 - 0.45994
= 0.04006
Thus, there is only a 4% chance that the mean production for 7 up will be smaller than the mean
production for Pepsi-cola. So, if the owner of the two plants found the first sample mean to be
the smaller one (with, say, $\bar{x}_2 = 13{,}500$ bottles) in independent random samples of five
randomly selected days from each plant, he would suspect that either the sampling was faulty or
that the difference in the plants’ mean daily outputs had changed.
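The credit card and soft drink examples follow the same pattern, which the Python sketch below reproduces. The helper name `se_diff_means` is hypothetical, and the probabilities are computed with unrounded Z values, so they may differ slightly from table-based answers.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def se_diff_means(var1, n1, var2, n2):
    """Standard error of the difference between two independent sample means."""
    return sqrt(var1 / n1 + var2 / n2)

# Credit cards: P(x1_bar - x2_bar >= 5) when mu1 - mu2 = 0
se_cards = se_diff_means(1400, 100, 1320, 120)
p_cards = 1 - Z.cdf((5 - (80 - 80)) / se_cards)

# Soft drinks: P(x1_bar - x2_bar <= 0) when mu1 - mu2 = 2,500
se_moha = se_diff_means(2000 ** 2, 5, 2500 ** 2, 5)
p_moha = Z.cdf((0 - (15000 - 12500)) / se_moha)

print(se_cards, round(p_cards, 5), round(se_moha, 2), round(p_moha, 4))
```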
SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO PROPORTIONS ($\bar{p}_1 - \bar{p}_2$)
Frequently we are interested in determining if the proportion of items in one population that
possesses a certain characteristic is the same as the proportion possessing the characteristic in
another population. For example, a doctor who gives one type of medicine to some patients and
another medicine to others may want to determine if the percentage of people cured by the first
medicine is the same as the percentage of people cured by the second. For this and other similar
cases the sampling distribution of the difference between sample proportions is used. It is also
used to measure the proportion of market share and proportion of vote.
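Suppose we take independent samples of size n1 and n2 from two populations. Let P1 and P2 be
the proportions of items in each population that possess a certain characteristic, and let
$\bar{p}_1$ and $\bar{p}_2$ be the corresponding sample proportions.
If n1P1, n1q1 are greater than 5 and n2P2, n2q2 are greater than 5, then the random variable
$\bar{p}_1 - \bar{p}_2$ is approximately normally distributed with
Mean: $E(\bar{p}_1 - \bar{p}_2) = P_1 - P_2$; and
Variance: $Var(\bar{p}_1 - \bar{p}_2) = \frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}$; if the sampling fractions $n_1/N_1$ and $n_2/N_2$
exceed 0.05, the finite population correction factor is used.
To standardize a difference between two sample proportions, $\bar{p}_1 - \bar{p}_2$, we use the following
formula:
$$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (P_1 - P_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}, \qquad \sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}}$$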
Example:
At Addis Ababa University there is a movement to re-establish the Students’ Union.
Approximately 90% of all students favor the reinstatement. A pro-union student takes a
random sample of 100 students. An anti-union student takes an independent random sample of
100 students. Let $\bar{p}_1$ denote the proportion of students who favor the union in the sample taken
by the pro-union student and $\bar{p}_2$ denote the proportion of students who favor the union in the
sample taken by the anti-union student. Calculate the probability that $\bar{p}_1$ exceeds $\bar{p}_2$ by 0.1
or more.
Solution:
Pro-union       Anti-union
P1 = 0.9        P2 = 0.9
q1 = 0.1        q2 = 0.1
n1 = 100        n2 = 100
$P(\bar{p}_1 - \bar{p}_2 \ge 0.10)$ = ?
$E(\bar{p}_1 - \bar{p}_2) = P_1 - P_2$ = 0.9 – 0.9 = 0
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{P_1 q_1}{n_1} + \frac{P_2 q_2}{n_2}} = \sqrt{\frac{0.9 \times 0.1}{100} + \frac{0.9 \times 0.1}{100}} = 0.04243$$
$$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (P_1 - P_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}} = \frac{0.10 - 0}{0.04243} = 2.36$$
$P(\bar{p}_1 - \bar{p}_2 \ge 0.10) = P(Z \ge 2.36)$
= 0.5 - P (0 to +2.36)
= 0.5 – 0.49086
= 0.00914
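The Students’ Union example can be recomputed in Python. The function name `se_diff_proportions` is hypothetical; the probability is computed from the unrounded Z value, so it may differ in the fourth decimal place from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist

def se_diff_proportions(P1, n1, P2, n2):
    """Standard error of the difference between two independent sample proportions."""
    return sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)

P1, P2, n1, n2 = 0.9, 0.9, 100, 100

se = se_diff_proportions(P1, n1, P2, n2)
z = (0.10 - (P1 - P2)) / se            # standardize a difference of 0.10
p = 1 - NormalDist().cdf(z)            # P(p1_bar - p2_bar >= 0.10)

print(round(se, 5), round(z, 2), round(p, 4))
```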
Chapter 2
STATISTICAL ESTIMATION
Introduction
The sampling distribution of the mean shows how far sample means could be from a
known population mean. Similarly, the sampling distribution of the proportion shows
how far sample proportions could be from a known population proportion. In
estimation, our aim is to determine how far an unknown population mean could be
from the mean of a simple random sample selected from that population; or how far an
unknown population proportion could be from a sample proportion. Those are the
concerns of statistical inference, in which a statement about an unknown population
parameter is derived from information contained in a random sample selected from the
population.
Basic concepts:
Types of Estimates:
We can make two types of estimates about a population: a point estimate and an
interval estimate.
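The interval estimate (confidence interval) for the population mean is:
$$\mu = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Where:
$\bar{X}$ = sample mean
$Z_{\alpha/2}$ = value from the standard normal table reflecting the confidence level
σ = population standard deviation
n = sample size
α = the proportion of incorrect statements (α = 1 – C)
µ = unknown population mean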
From the above formula we can learn that an interval estimate is constructed by adding
and subtracting the error term to and from the point estimate. That is, the point estimate
is found at the center of the confidence interval.
To find the interval estimate of population mean, we have the following steps.
Example:
1. The vice president of operations for Ethiopian Tele Communication Corporation (ETC)
is in the process of developing a strategic management plan. He believes that the ability
to estimate the length of the average phone call on the system is important. He takes a
random sample of 60 calls from the company records and finds that the mean sample
length for a call is 4.26 minutes. Past history for these types of calls has shown that the
population standard deviation for call length is about 1.1 minutes. Assuming that the
population is normally distributed and he wants to have a 95% confidence, help him in
estimating the population mean.
Solution:
n = 60 calls, $\bar{X}$ = 4.26 minutes, σ = 1.1 minutes, C = 0.95
Footnote 1: This formula also works for problems which involve a large sample size (n > 30) even
though the population is not normally distributed. If n > 0.05N, the finite population correction
factor may be used.
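The interval for the phone call example can be computed with a short Python sketch. The function name `ci_mean_known_sigma` is hypothetical, and `NormalDist().inv_cdf` is used in place of a printed Z table, so the critical value is 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def ci_mean_known_sigma(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # Z value for alpha / 2 in each tail
    margin = z * sigma / sqrt(n)               # error term added to and subtracted from xbar
    return xbar - margin, xbar + margin

low, high = ci_mean_known_sigma(xbar=4.26, sigma=1.1, n=60, confidence=0.95)
print(round(low, 2), round(high, 2))
```

Under these inputs the mean call length is estimated to lie between roughly 3.98 and 4.54 minutes, with the point estimate 4.26 at the center of the interval.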
c. Part b has a better precision because the sample size is larger as compared with part
a.
If the sample size is small (n < 30), we can develop an interval estimate of a population
mean only if the population has a normal probability distribution. If the sample
standard deviation s is used as an estimator of the population standard deviation σ and
if the population has a normal distribution, interval estimation of the population mean
can be based upon a probability distribution known as the t-distribution.
Characteristics of t-distribution
1. The t-distribution is symmetric about its mean (0) and ranges from - ∞ to ∞.
2. The t-distribution is bell-shaped (unimodal) and has approximately the same
appearance as the standard normal distribution (Z- distribution).
3. The t-distribution depends on a parameter ν (Greek nu), called the degrees of
freedom of the distribution (see footnote 2). ν = n - 1, where n is the sample size. The degrees
of freedom, ν, refer to the number of values we can choose freely.
The confidence interval to estimate µ becomes:
$$\mu = \bar{X} \pm t_{\alpha/2,\nu} \frac{s}{\sqrt{n}}$$
Where:
ν = n – 1 (degrees of freedom)
s = sample standard deviation
n = sample size
µ = unknown population mean
Steps:
1. Calculate the degrees of freedom (ν = n - 1) and the sample standard error of the mean.
2. Compute α/2.
3. Look up $t_{\alpha/2,\nu}$.
4. Construct the confidence interval.
5. Interpret results.
Example:
1. A random sample of 27 items produces $\bar{x}$ = 128.4 and s = 20.6. What is the 98%
confidence interval for µ? Assume that x is normally distributed for the population.
What is the point estimate?
Solution:
The point estimate of the population mean is the sample mean; in this case 128.4 is the
point estimate.
118.56 ≤ µ ≤ 138.22
We state with 98% confidence that the population mean lies between 118.56 and 138.23.
Footnote 2: What are degrees of freedom? We can define them as the number of values we can
choose freely. In general, the degrees of freedom for a t statistic are the degrees of freedom
associated with the sum of squares used to obtain an estimate of the variance. The variance
estimate depends not only on the sample size but also on how many parameters must be
estimated with the sample:
Degrees of freedom = Number of observations – Number of parameters that must be estimated beforehand
Here we calculate the sample variance by using n observations and estimating one parameter (the mean).
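For large samples, the sampling distribution of the sample proportion $\bar{p}$ is approximately
normal with $Z = \frac{\bar{p} - P}{\sigma_{\bar{p}}}$, where $\sigma_{\bar{p}} = \sqrt{\frac{Pq}{n}}$.
Because P is unknown, $\sigma_{\bar{p}}$ is estimated by $S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$, which gives
$$P = \bar{p} \pm Z \sqrt{\frac{\bar{p}\bar{q}}{n}}$$
Since Z represents the confidence level, we write it as $Z_{\alpha/2}$:
$$P = \bar{p} \pm Z_{\alpha/2} S_{\bar{p}}$$
Where: $\bar{p}$ = sample proportion
$\bar{q}$ = 1 - $\bar{p}$
α = 1 – C
n = sample size
P = unknown population proportion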
Example:
1. Recently, a study of 87 randomly selected companies with telemarketing operation was
completed. The study revealed that 39% of the sampled companies had used
telemarketing to assist them in order processing. Using this information estimate the
population proportion of telemarketing companies who use their telemarketing
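operation to assist them in order processing, taking a 95% confidence level.
Solution:
n = 87, $\bar{p}$ = 0.39, $\bar{q}$ = 0.61, C = 0.95
i. $S_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}} = \sqrt{\frac{0.39 \times 0.61}{87}} = 0.0523$
ii. α/2 = (1 - 0.95)/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $P = \bar{p} \pm Z_{\alpha/2} S_{\bar{p}}$
= 0.39 ± 1.96(0.0523)
= 0.39 ± 0.1025
0.2875 ≤ P ≤ 0.4925
We state with 95% confidence that the proportion of companies which use telemarketing to
assist order processing lies between 0.2875 and 0.4925.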
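The telemarketing interval can be recomputed in Python as a check. The function name `ci_proportion` is hypothetical, and the critical value comes from `NormalDist().inv_cdf` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def ci_proportion(p_bar, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    s_p = sqrt(p_bar * (1 - p_bar) / n)                 # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z value for alpha / 2
    margin = z * s_p
    return s_p, p_bar - margin, p_bar + margin

s_p, low, high = ci_proportion(p_bar=0.39, n=87, confidence=0.95)
print(round(s_p, 4), round(low, 4), round(high, 4))
```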
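If the two parent populations are normal, then the sampling distribution of the difference
between two means will be normally distributed regardless of the sample sizes. We can then
estimate $\mu_1 - \mu_2$ (regardless of n1 and n2) using the following formula, given that $\sigma_1$ and
$\sigma_2$ are known (see footnote 3):
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\, \sigma_{\bar{X}_1 - \bar{X}_2}, \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
When $\sigma_1$ and $\sigma_2$ are not known, the standard error of the difference between two sample
means, $\sigma_{\bar{x}_1 - \bar{x}_2}$, is estimated by the sample standard error of the difference between two
sample means,
$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{S^2_{\bar{X}_1} + S^2_{\bar{X}_2}} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
and the interval estimate takes the following form:
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\, S_{\bar{X}_1 - \bar{X}_2}$$
given that the sample sizes are large.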
Example:
1. In a sex discrimination case, an employee alleged that a large corporation paid men more
than women for comparable work. Let population 1 represent all male employees
performing certain jobs and population 2 represent all female employees performing
comparable jobs at the corporation. Independent samples are taken of $n_1 = 100$ males
and $n_2 = 100$ females; the sample means are $\bar{x}_1$ = Birr 20,600 and $\bar{x}_2$ = Birr 19,700, and the
sample standard deviations are $s_1$ = Birr 3,000 and $s_2$ = Birr 2,500. Construct a 95%
confidence interval for $\mu_1 - \mu_2$. What do you conclude from this?
Solution:
Male employees: $n_1 = 100$, $\bar{x}_1$ = Birr 20,600, $s_1$ = Birr 3,000
Female employees: $n_2 = 100$, $\bar{x}_2$ = Birr 19,700, $s_2$ = Birr 2,500
C = 0.95
Footnote 3: The interval formula for $\mu_1 - \mu_2$ also works for problems which involve large sample sizes
($n_1$ and $n_2 \geq 30$), even though the parent populations may not be normally distributed.
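i. Calculate the sample standard error of the difference between two means
$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}} = \sqrt{\dfrac{(3{,}000)^2}{100} + \dfrac{(2{,}500)^2}{100}} = \sqrt{152{,}500} = 390.51$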
ii. Compute α/2
α = 1-C = 1- 0.95 = 0.05
α/2 = 0.05/2 = 0.025
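iii. Look up $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. Construct the confidence interval
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\,S_{\bar{X}_1 - \bar{X}_2}$
= (20,600 − 19,700) ± 1.96(390.51)
= 900 ± 765.40
134.60 ≤ $\mu_1 - \mu_2$ ≤ 1,665.40
Because this interval contains only positive values, we can be quite confident that
$\mu_1 - \mu_2 > 0$. Thus, it is reasonable to assume that the mean salary for males exceeds the
mean salary for females.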
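The salary example can be reproduced with the following Python sketch of the large-sample interval for the difference between two means. The function name `mean_difference_ci` is hypothetical, and the critical value comes from `statistics.NormalDist` instead of a Z table, so the limits may differ from a hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def mean_difference_ci(x1, s1, n1, x2, s2, n2, confidence):
    """Large-sample CI for mu1 - mu2 using sample standard deviations."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # Z_{alpha/2}
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)  # S of (X1bar - X2bar)
    diff = x1 - x2
    return diff - z * std_error, diff + z * std_error, std_error

# Values from the salary example (amounts in Birr), C = 0.95
lower, upper, std_error = mean_difference_ci(20600, 3000, 100, 19700, 2500, 100, 0.95)
print(round(std_error, 2))               # 390.51
print(round(lower, 1), round(upper, 1))
```

Both limits are positive, which matches the conclusion that the mean salary for males exceeds the mean salary for females.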
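We know that the unbiased estimator of the difference between the proportions of two
populations, $P_1 - P_2$, is the difference between two sample proportions, $p_1 - p_2$, where
each sample is a random sample taken from the respective target population. Moreover,
based on the CLT, if $n_1P_1$, $n_1Q_1$ and $n_2P_2$, $n_2Q_2$ are greater than 5 (with $Q_i = 1 - P_i$), the sampling
distribution of $p_1 - p_2$ is approximately normal with standard error
$\sigma_{p_1 - p_2} = \sqrt{\dfrac{P_1Q_1}{n_1} + \dfrac{P_2Q_2}{n_2}}$
However, here $P_1$ and $P_2$ are unknown, and we want to estimate $P_1$ and $P_2$ by $p_1$ and $p_2$, so the standard error is estimated by
$S_{p_1 - p_2} = \sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}}$
and the interval estimate takes the following form:
$P_1 - P_2 = (p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}}$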
Where:
$p_1$ = the sample proportion of success in the first sample
$p_2$ = the sample proportion of success in the second sample
$q_1 = 1 - p_1$
$q_2 = 1 - p_2$
$n_1$ = sample size drawn from the first population
$n_2$ = sample size drawn from the second population
α = 1 − C
This formula holds true provided that $n_1p_1$, $n_1q_1$, $n_2p_2$ and $n_2q_2$ are all greater than 5.
Example:
1. A TV executive is interested in determining if the proportion of people who watch a
late-night talk show is higher with the regular host or a guest host. In a random sample
of 400 people, 175 watch the show when the regular host is on. In an independent
random sample of 500 people, 185 watch the show when a guest host is on. Calculate a 95%
confidence interval for $P_1 - P_2$. What do you conclude?
Solution:
Regular host: $n_1 = 400$, $X_1 = 175$, $p_1 = 0.4375$, $q_1 = 0.5625$
Guest host: $n_2 = 500$, $X_2 = 185$, $p_2 = 0.37$, $q_2 = 0.63$
C = 0.95
i. Calculate the sample standard error of the difference between two proportions
$S_{p_1 - p_2} = \sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}} = \sqrt{\dfrac{(0.4375)(0.5625)}{400} + \dfrac{(0.37)(0.63)}{500}} = 0.033$
ii. Compute α/2
α = 1 − C = 1 − 0.95 = 0.05
α/2 = 0.05/2 = 0.025
iii. Look up $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. Construct the confidence interval
$P_1 - P_2 = (p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}}$
= (0.4375 − 0.37) ± 1.96(0.033)
= 0.0675 ± 0.065
0.0025 ≤ $P_1 - P_2$ ≤ 0.1325
We state with 95% confidence that the true difference $P_1 - P_2$ lies between 0.0025
and 0.1325. Since this interval contains only positive values, it is reasonable to say that
the proportion of people who watch the show when the regular host is on is greater than
when the guest host is on.
2. A city planner claims that home owners tend to have closer ties to their community than
do renters. Thus, home owners are more willing to pay for good schools and
recreational facilities than are renters. In a random sample of 120 home owners, 51
stated that the local tax rates were too high and 69 stated that tax rates were “about
right”. In an independent random sample of 200 renters, 70 stated that the tax rates were
too high and 130 thought they were “about right”.
a. Find a 99% confidence interval for the difference in the proportions who think that taxes
are too high.
b. Do the data support the city planner's claim?
Solution:
Home owners: $n_1 = 120$, $X_1 = 51$, $p_1 = 0.425$, $q_1 = 0.575$
Renters: $n_2 = 200$, $X_2 = 70$, $p_2 = 0.35$, $q_2 = 0.65$
C = 0.99
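i. Calculate the sample standard error of the difference between two proportions
$S_{p_1 - p_2} = \sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}} = \sqrt{\dfrac{(0.425)(0.575)}{120} + \dfrac{(0.35)(0.65)}{200}} = 0.056$
ii. Compute α/2
α = 1 − C = 1 − 0.99 = 0.01
α/2 = 0.01/2 = 0.005
iii. Look up $Z_{\alpha/2} = Z_{0.005} = 2.57$
iv. Construct the confidence interval
$P_1 - P_2 = (p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}}$
= (0.425 − 0.35) ± 2.57(0.056)
= 0.075 ± 0.144
−0.069 ≤ $P_1 - P_2$ ≤ 0.219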
We state with 99% confidence that the difference between the proportions of home owners
and renters who said that the tax rates are too high lies between −0.069 and 0.219.
Since the confidence interval contains negative, zero, and positive values, we cannot
state with certainty that home owners are more willing to pay for good schools and
recreational facilities than are renters. Hence, the data do not necessarily support the
city planner’s claim.
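Both two-proportion examples above can be checked with one small Python function. This is a sketch under stated assumptions: the function name `proportion_difference_ci` is hypothetical, and it uses the exact normal quantile from `statistics.NormalDist` and an unrounded standard error, so its limits differ slightly from the hand calculations, which round Z to 1.96 or 2.57 and the standard error to three decimals.

```python
from math import sqrt
from statistics import NormalDist

def proportion_difference_ci(x1, n1, x2, n2, confidence):
    """Large-sample CI for P1 - P2 from success counts x1, x2 and sample sizes n1, n2."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)          # Z_{alpha/2}
    std_error = sqrt(p1 * q1 / n1 + p2 * q2 / n2)    # S of (p1 - p2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error

# Example 1: regular host (175 of 400) vs guest host (185 of 500), C = 0.95
tv_low, tv_high = proportion_difference_ci(175, 400, 185, 500, 0.95)

# Example 2: home owners (51 of 120) vs renters (70 of 200), C = 0.99
tax_low, tax_high = proportion_difference_ci(51, 120, 70, 200, 0.99)

print(round(tv_low, 4), round(tv_high, 4))
print(round(tax_low, 4), round(tax_high, 4))
```

The first interval lies entirely above zero and the second contains zero, so the conclusions of both examples are unchanged by the rounding differences.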
The reason for taking a sample from a population is that it would be too costly to gather
data for the whole population. But collecting sample data also costs money; and the
larger the sample, the higher the cost. To hold cost down, we want to use as small a
sample as possible. On the other hand, we want a sample to be large enough to provide
“good” approximation/estimates of population parameters. Consequently, the question
is “How large should the sample be?”
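From the interval estimate $\mu = \bar{X} \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, the term $Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ is called the error of estimation (e). That is,
$e = Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Squaring both sides results in $e^2 = \dfrac{Z^2_{\alpha/2}\,\sigma^2}{n}$. Solving for n results in
$n = \dfrac{Z^2_{\alpha/2}\,\sigma^2}{e^2} = \left(\dfrac{Z_{\alpha/2}\,\sigma}{e}\right)^2$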
Example:
1. A gasoline service station shows a standard deviation of Birr 6.25 for the charges made
by the credit card customers. Assume that the station’s management would like to
estimate the population mean gasoline bill for its credit card customers to within ±
Birr 1.00. For a 95% confidence level, how large a sample would be necessary?
Solution:
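e = Birr 1.00, σ = Birr 6.25, C = 0.95, $Z_{\alpha/2} = Z_{0.025} = 1.96$
$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\dfrac{1.96 \times 6.25}{1}\right)^2 = 150.06 \approx 151$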
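The sample size formula can be sketched in Python as follows. The function name `sample_size_for_mean` is hypothetical. The result is rounded up with `math.ceil` because a fractional observation is not possible and rounding down would give an error larger than e.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, e, confidence):
    """Smallest n for which Z_{alpha/2} * sigma / sqrt(n) does not exceed e."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    n_exact = (z * sigma / e) ** 2           # about 150.06 for the example below
    return ceil(n_exact)                     # always round up

# Values from the gasoline station example: sigma = 6.25, e = 1.00, C = 0.95
n_required = sample_size_for_mean(6.25, 1.00, 0.95)
print(n_required)  # 151
```

Halving the error of estimation to Birr 0.50 would roughly quadruple the required sample size, since n is proportional to 1/e².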