1-1
Statistics for economists
Dr. Ahmed El-Kotory
1-2
Course content
Sampling Methods and the Central Limit Theorem
Estimation and Confidence Intervals
One-Sample Tests of Hypothesis
Two-Sample Tests of Hypothesis
Analysis of Variance
Linear Regression
Index Numbers
Using the Minitab 18 Package
1-3
Descriptive Statistics: the branch of statistics that introduces methods of collecting, organizing, summarizing, presenting, and interpreting a data set of a certain variable in an informative way.
Inferential Statistics: the branch of statistics that introduces methods to assist in making more effective decisions (estimation, prediction, testing, or generalization) about a population, based on a sample.
A Population is a collection of all possible individuals, objects, or measurements of interest.
A Sample is a portion, or part, of the population of interest.
Quick review
1-4
Which would you prefer to rely on to calculate descriptive statistics: the population or the sample?
Any measure calculated from the population is called a "Parameter", which is always fixed (for example, $\mu$). When it is unknown, we need to conduct statistical inference about it (Estimation and Testing Hypotheses).
Any measure calculated from a random sample is called a "Statistic" or "Estimator", which is a random variable (for example, $\bar{x}$) with a probability distribution called its "Sampling Distribution".
1-5
Used Notations
                          For Population              For Sample
                          ("Parameter", always        ("Statistic" or "Estimator",
                          fixed)                      a random variable)
Number of observations    $N$                         $n$
Mean                      $\mu$                       $\bar{x}$
Standard deviation        $\sigma$                    $S$
Variance                  $\sigma^2$                  $S^2$
Proportion
1-6
Why sample?
1- The physical impossibility of checking all items in the
population.
2- The cost of studying all the items in a population may be
prohibitive.
3- The destructive nature of certain tests.
4- The time-consuming aspect of contacting the whole
population.
5- The adequacy of sample results in most cases.
1-7
Sampling Method
Probability sample:
  Simple Random Sample
  Systematic Random Sampling
  Stratified Random Sampling
  Cluster Sampling (Multistage Sample)
Nonprobability sample:
  Sampling based on judgment
  Snowball sample
1-8
A probability sample is a sample selected such that each
item or person in the population being studied has a known
likelihood of being included in the sample.
1- Simple Random Sample: A sample formulated so that each item or person in the population has the same chance of being included. It suits a homogeneous population, where there are no fundamental differences between individuals.
2- Systematic Random Sampling: The items or individuals of the population are arranged in some order. A random starting point is selected, and then every kth member of the population is selected for the sample.
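A minimal sketch of the first two methods, assuming the population is held in a Python list. The function names, the numbered population of 100 items, and the seeds are illustrative assumptions, not part of the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items so that every item has the same chance of inclusion."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement

def systematic_sample(population, k, seed=None):
    """Pick a random starting point among the first k items, then every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start index in 0 .. k-1
    return population[start::k]

# Hypothetical population: items numbered 1 to 100, already arranged in order.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=1)
systematic = systematic_sample(population, 10, seed=1)
print(srs)
print(systematic)
```

With 100 items and k = 10, the systematic sample always contains exactly 10 items spaced 10 apart; only the starting point is random.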
Probability Sampling/Methods
1-9
3- Stratified Random Sampling: A population is first
divided into subgroups, called strata, and a Simple Random
Sample is selected from each stratum.
Methods of Probability
Sampling
1-10
4- Cluster Sampling (Multistage Sample): A population is first divided into primary units, then samples are selected from the primary units.
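The contrast between the two methods can be sketched as follows: stratified sampling draws from every subgroup, while cluster sampling first selects some primary units and then samples only inside them. The group labels, group sizes, and function names below are hypothetical and chosen only for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample from EVERY stratum."""
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick some primary units at random. Stage 2: sample inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return chosen, sample

# Hypothetical population of 60 numbered items split into three groups.
groups = {
    "A": list(range(0, 20)),
    "B": list(range(20, 40)),
    "C": list(range(40, 60)),
}
strat = stratified_sample(groups, 2, seed=3)
chosen, clus = cluster_sample(groups, 2, 3, seed=3)
print(strat)
print(chosen, clus)
```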
Difference between Stratified Random Sampling & Cluster Sampling
1-11
The sampling error is the difference between a sample statistic and its corresponding population parameter.
The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.
Notice that the mean of the sample means is exactly equal to the population mean $\mu$.
1-12
The mean of the sampling distribution of $\bar{x}$ is called the mean of the distribution of sample means, $\mu_{\bar{x}}$.
The standard deviation of the sampling distribution of $\bar{x}$ is called the "Standard Error of $\bar{x}$", $S.E(\bar{x})$.
What does it mean?
1-13
Example: just for understanding
In the law firm F.Eldib and Associates, there are five partners. Listed next is the number of cases each associate actually tried in court last month.
Associate    Number of Cases
Ahmed        3
Ali          4
Norhan       1
Ibrahim      2
Farid        0
a. How many different samples of 3 are possible?
b. List all possible samples of size 3, and compute the mean
number of cases in each sample.
c. Compare the mean of the distribution of sample means to the
population mean.
1-14
5 objects taken 3 at a time: $_5C_3 = 10$, a total of 10 different samples.
Sample no.   Partners                   Values     Total   Mean $\bar{x} = \sum x / n$
1            Ahmed, Ali, Norhan         3, 4, 1    8       2.667
2            Ahmed, Ali, Ibrahim        3, 4, 2    9       3
3            Ahmed, Ali, Farid          3, 4, 0    7       2.333
4            Ali, Norhan, Ibrahim       4, 1, 2    7       2.333
5            Ali, Norhan, Farid         4, 1, 0    5       1.667
6            Norhan, Ibrahim, Farid     1, 2, 0    3       1
7            Ahmed, Norhan, Ibrahim     3, 1, 2    6       2
8            Ahmed, Norhan, Farid       3, 1, 0    4       1.333
9            Ahmed, Ibrahim, Farid      3, 2, 0    5       1.667
10           Ali, Ibrahim, Farid        4, 2, 0    6       2
Total                                                      20
Note that the sample mean changes from one sample to another.
The population mean: $\mu = \dfrac{\sum x}{N} = \dfrac{10}{5} = 2$
The mean of the sampling distribution of $\bar{x}$ is called the mean of the distribution of sample means: $\mu_{\bar{x}} = \dfrac{20}{10} = 2$
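The enumeration in this example can be reproduced with a short script. This is only a sketch using the five values from the example; exact fractions are used so that the comparison with the population mean is not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Number of cases tried by each of the five partners (from the example).
cases = {"Ahmed": 3, "Ali": 4, "Norhan": 1, "Ibrahim": 2, "Farid": 0}

# Every possible sample of 3 partners out of 5.
samples = list(combinations(cases, 3))

# Mean number of cases in each sample, kept as an exact fraction.
sample_means = [Fraction(sum(cases[p] for p in s), 3) for s in samples]

population_mean = Fraction(sum(cases.values()), len(cases))
mean_of_sample_means = sum(sample_means) / len(sample_means)

# Frequency of each distinct sample mean (the sampling distribution).
frequency = Counter(sample_means)

print(len(samples), population_mean, mean_of_sample_means)
for value in sorted(frequency):
    print(float(value), frequency[value])
```

The script finds 10 samples and shows that the mean of the sample means equals the population mean, both being 2.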
1-15
Sample mean   1   1.333   1.667   2   2.333   2.667   3   Total
Frequency     1   1       2       2   2       1       1   10
[Figure: bar chart of frequency against sample mean. The shape of the sampling distribution of $\bar{x}$ looks like the Normal distribution.]
Does the shape remind you of something?
1-16
The shape of the sampling distribution of $\bar{x}$ is:
Normal: if the population is normal, whatever the sample size $n$.
Approximately Normal: if the population is not normal (or unknown), but $n \geq 30$ (Central Limit Theorem "CLT").
The sampling distribution of $\bar{x}$ is normal or approximately normal with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$, i.e. $\bar{x} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
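A small simulation can illustrate the CLT statement. The sketch below assumes a hypothetical, strongly skewed population (exponential with mean 1 and standard deviation 1) and repeatedly draws samples of size 30; the population, the seed, and the number of repetitions are all illustrative choices.

```python
import math
import random
import statistics

# Hypothetical non-normal population: exponential with mean 1 and
# standard deviation 1.
rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # sample size, matching the rule n >= 30
reps = 5000              # number of repeated samples

# One sample mean per repetition.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu = 1
observed_se = statistics.pstdev(sample_means)    # spread of the sample means
theoretical_se = 1.0 / math.sqrt(n)              # sigma / sqrt(n)

print(mean_of_means, observed_se, theoretical_se)
```

The observed spread of the sample means should land close to $\sigma/\sqrt{n}$, and a histogram of `sample_means` would look roughly bell-shaped even though the population is skewed.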
1-17
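The standardized units required to compute probabilities concerning $\bar{x}$ are given by:
$Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Remember that (no need to go through these again, already studied last year):
The table of "$Z$" gives the probability $P(Z < z) = \Phi(z)$, for +ve "$z$".
1) $P(Z \leq z) = \Phi(z)$
2) $P(Z \leq -z) = 1 - P(Z \leq z) = 1 - \Phi(z)$
3) $P(Z \geq z) = 1 - P(Z \leq z) = 1 - \Phi(z)$
4) $P(Z \geq -z) = 1 - \Phi(-z) = \Phi(z)$
5) $P(a \leq Z \leq b) = \Phi(b) - \Phi(a)$
6) $P(Z = z) = 0$
7) $\Phi(z) \approx 1$ for $z > 3.49$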
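The table rules can be checked numerically with the standard normal distribution from Python's standard library instead of a printed table. This is a sketch; the helper name `phi` and the values of z, a, and b are illustrative assumptions.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def phi(z):
    """Cumulative probability P(Z <= z), the quantity read from the Z table."""
    return standard_normal.cdf(z)

z = 1.96
p_at_most_z = phi(z)              # rule 1: P(Z <= z)
p_at_most_minus_z = 1 - phi(z)    # rule 2: P(Z <= -z)
p_at_least_z = 1 - phi(z)         # rule 3: P(Z >= z)
p_at_least_minus_z = phi(z)       # rule 4: P(Z >= -z)

a, b = -1.0, 2.0
p_between = phi(b) - phi(a)       # rule 5: P(a <= Z <= b)

print(p_at_most_z, p_at_most_minus_z, p_between, phi(3.49))
```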
1-18
Don't forget the main target
Population -> census (almost impossible) -> population parameters
Population -> sampling method -> sample -> sample statistics
Sampling error: the gap between the sample statistics and the population parameters
Sampling method:
  Non-probability sample
  Probability sample:
    Simple Random Sample
    Systematic Random Sampling
    Stratified Random Sampling
    Cluster Sampling
1-19
Point and Interval Estimation for the population mean
A point estimate is a single value (statistic) used to estimate a population value (parameter).
An Interval Estimate states the range within which a population parameter probably lies.
A confidence interval is a range of values within which the population parameter is expected to occur.
1-20
Estimation: the assignment of value(s) to a population parameter based on the value(s) of the corresponding sample statistic.
Point Estimate: the value of the sample statistic (or estimator).
Point Estimator: the sample statistic (the rule).
Example (1): A random sample of size 3 is selected from a population as: 9, 2, 4.
a) What is the point estimator of the population mean?
b) What is the point estimate of the population mean?
Solution:
a) The point estimator is: $\bar{x} = \dfrac{\sum x}{n}$
b) The point estimate is: $\bar{x} = \dfrac{9 + 2 + 4}{3} = 5$
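The distinction between the estimator (the rule) and the estimate (the value) maps naturally onto a function and its return value. A minimal sketch, with an illustrative function name:

```python
def point_estimator(sample):
    """The estimator is the rule: add the observations and divide by n."""
    return sum(sample) / len(sample)

# The estimate is the value the rule gives for one particular sample.
sample = [9, 2, 4]
point_estimate = point_estimator(sample)
print(point_estimate)
```

Applying the same function to a different sample would give a different estimate, which is why the estimator is a random variable while each estimate is a single number.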
1-21
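The Confidence Interval (C.I) for the population mean $\mu$
                    $n \geq 30$                                               $n < 30$
$\sigma$ known      $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{\sigma}{\sqrt{n}}$      $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{\sigma}{\sqrt{n}}$
$\sigma$ unknown    $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{S}{\sqrt{n}}$           $\bar{x} \pm t_{(n-1,\ \alpha/2)} \times \dfrac{S}{\sqrt{n}}$
Condition           The population is normal, or not normal.                  The population must be normal.
Notes:
• The margin of error (the maximum error of estimation): $E = Z_{1-\alpha/2} \times \dfrac{\sigma}{\sqrt{n}}$ or $E = Z_{1-\alpha/2} \times \dfrac{S}{\sqrt{n}}$ or $E = t_{(n-1,\ \alpha/2)} \times \dfrac{S}{\sqrt{n}}$
• The confidence level (C.L) is denoted by $(1 - \alpha)100\%$
• Lower Confidence Limit: $LL = \bar{x} - E$
• Upper Confidence Limit: $UL = \bar{x} + E$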
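The interval is always "sample mean plus or minus a critical value times the standard error", so one small helper covers every case. The sketch below is an assumption-laden illustration: the function name and its parameters are hypothetical, the Z critical value is computed from the standard normal distribution, and a t critical value must be supplied by the caller from a t table because the standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, confidence, critical=None):
    """Return (LL, UL, E) for a population mean.

    sd is sigma when it is known, otherwise the sample standard deviation S.
    critical is the table value; when omitted, the Z value for 1 - alpha/2
    is used. Pass a t-table value for small samples with sigma unknown.
    """
    alpha = 1 - confidence
    if critical is None:
        critical = NormalDist().inv_cdf(1 - alpha / 2)
    margin = critical * sd / sqrt(n)   # E = critical value x standard error
    return mean - margin, mean + margin, margin

# Hypothetical small-sample case: n = 16, mean 20, S = 4, 95% confidence,
# with 2.131 taken from a t table for 15 degrees of freedom.
ll, ul, e = confidence_interval(20, 4, 16, 0.95, critical=2.131)
print(ll, ul, e)
```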
1-22
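• Some values of $Z$:
  $Z_{0.95}$    $Z_{0.975}$    $Z_{0.99}$    $Z_{0.995}$
  1.645         1.96           2.326         2.576
• The sample standard deviation: $S = \sqrt{\dfrac{\sum x^2 - \left(\sum x\right)^2 / n}{n - 1}}$
• $t_{(n-1,\ \alpha/2)}$ is taken from the "$t$" table.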
Example (2):
A random sample of size 100 is drawn from a population with mean $\mu$ and variance $\sigma^2 = 64$. The sample mean is 50.
a. What is the point estimate of $\mu$?
b. What is the point estimator of the population mean?
c. What is the sampling (probability) distribution of that estimator?
d. Construct a 95% C.I for $\mu$.
e. What is the margin of error of the estimate for part d?
1-23
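Solution:
a. The point estimate is: $\bar{x} = 50$
b. The point estimator is the sample mean: $\bar{x} = \dfrac{\sum x}{n}$
c. Because the distribution of the population is unknown but the sample size is large ($n \geq 30$), the CLT applies: the sampling distribution of $\bar{x}$ is approximately normal with mean $\mu$ and variance $\dfrac{\sigma^2}{n} = \dfrac{64}{100} = 0.64$, i.e. $\bar{x} \sim N(\mu, 0.64)$.
d. $n > 30$ and $\sigma$ is known. Therefore, we will use the $Z$ distribution to make a C.I for $\mu$. From the given information,
$n = 100$, $\bar{x} = 50$, $\sigma^2 = 64$, $\sigma = \sqrt{64} = 8$,
$1 - \alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$,
$1 - \alpha/2 = 0.975$, $Z_{1-\alpha/2} = Z_{0.975} = 1.96$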
1-24
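The C.I for $\mu$ is: $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{\sigma}{\sqrt{n}}$
$50 \pm 1.96 \times \dfrac{8}{\sqrt{100}}$
$50 \pm 1.57$
LL = 50 - 1.57 = 48.43    UL = 50 + 1.57 = 51.57
The C.I is: (48.43 ; 51.57)
Thus, we are 95% confident that the interval between 48.43 and 51.57 contains the population mean $\mu$.
e. The margin of error: $E = Z_{1-\alpha/2} \times \dfrac{\sigma}{\sqrt{n}} = 1.96 \times 0.8 = 1.57$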
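The arithmetic of Example (2) can be checked step by step. This sketch uses only the numbers given in the example and the table value 1.96; the variable names are illustrative.

```python
from math import sqrt

n = 100          # sample size
x_bar = 50       # sample mean (the point estimate)
variance = 64    # population variance

sigma = sqrt(variance)               # population standard deviation
standard_error = sigma / sqrt(n)     # sigma / sqrt(n)
sampling_variance = variance / n     # variance of the sampling distribution

z = 1.96                             # table value for 95% confidence
margin = z * standard_error          # margin of error E
lower = x_bar - margin               # LL
upper = x_bar + margin               # UL

print(sampling_variance, round(margin, 2), round(lower, 2), round(upper, 2))
```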
1-25
1-26
[Figure: annotated computer output. Labels: sample size ($n = 100$), sample mean ($\bar{x} = 50$), population standard deviation, standard error, confidence level, LL, UL.]
Example (3):
Given the following information: the sample size $n$ = , the sample mean $\bar{x} = 31.98$, the sample standard deviation $S$ = ,
make a 99% C.I for the population mean $\mu$.
Solution:
$n > 30$ and $\sigma$ is unknown. Therefore, we will use the $Z$ distribution to make a C.I for $\mu$. From the given information,
$n$ = , $\bar{x} = 31.98$, $S$ =
$1 - \alpha = 0.99$, $\alpha = 0.01$, $\alpha/2 = 0.005$, $1 - \alpha/2 = 0.995$
1-28
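The C.I for $\mu$ is: $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{S}{\sqrt{n}}$
$31.98 \pm 2.576 \times 0.043$
$31.98 \pm 0.11$
LL = 31.98 - 0.11 = 31.87    UL = 31.98 + 0.11 = 32.09
The C.I is: (31.87 ; 32.09)
Thus we can state with 99% confidence that the population mean lies between 31.87 and 32.09.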
1-29
1-30
Note: The upper and lower bounds computed manually using the $Z$ distribution (a theoretical approximation justified by the large sample size) are not very different from the values calculated by computer using the original $t$ distribution.
1-31
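Example (4):
Given that $n = 40$, $\bar{x} = 23$, $S = 3.75$, construct a 90% C.I for the population mean $\mu$.
Solution:
$n > 30$ and $\sigma$ is unknown. Therefore, we will use the $Z$ distribution to make a C.I for $\mu$.
From the given information,
$n = 40$, $\bar{x} = 23$, $S = 3.75$
$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$, $1 - \alpha/2 = 0.95$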
1-32
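The C.I for $\mu$ is: $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{S}{\sqrt{n}}$
$23 \pm 1.645 \times \dfrac{3.75}{\sqrt{40}}$
$23 \pm 1.645 \times 0.593$
$23 \pm 0.98$
LL = 23 - 0.98 = 22.02    UL = 23 + 0.98 = 23.98
The C.I is: (22.02 ; 23.98)
Thus we can state with 90% confidence that the population mean lies between 22.02 and 23.98.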
1-33
1-34
1-35
Example (5):
A random sample of 36 mid-sized cars tested for fuel consumption gave a mean of 26.4 miles per gallon with a standard deviation of 2.3 miles per gallon. Find a 98% C.I for the population mean.
Solution:
$n > 30$ and $\sigma$ is unknown. Therefore, we will use the $Z$ distribution to make a C.I for $\mu$. From the given information,
$n = 36$, $\bar{x} = 26.4$, $S = 2.3$
$1 - \alpha = 0.98$, $\alpha = 0.02$, $\alpha/2 = 0.01$, $1 - \alpha/2 = 0.99$
1-36
The C.I for $\mu$ is: $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{S}{\sqrt{n}}$
$26.4 \pm 2.326 \times \dfrac{2.3}{\sqrt{36}}$
$26.4 \pm 2.326 \times 0.383$
$26.4 \pm 0.89$
LL = 26.4 - 0.89 = 25.51    UL = 26.4 + 0.89 = 27.29
The C.I is: (25.51 ; 27.29)
Thus we can state with 98% confidence that the population mean lies between 25.51 and 27.29.
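As a cross-check on Example (5), the critical value can be computed from the confidence level rather than read from the table. This sketch assumes the standard library's normal distribution as a stand-in for the Z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 36            # number of cars in the sample
x_bar = 26.4      # sample mean, miles per gallon
s = 2.3           # sample standard deviation, miles per gallon
confidence = 0.98

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # Z value for cumulative area 0.99
standard_error = s / sqrt(n)              # S / sqrt(n)
margin = z * standard_error               # margin of error E

lower = x_bar - margin                    # LL
upper = x_bar + margin                    # UL
print(round(z, 3), round(standard_error, 3), round(lower, 2), round(upper, 2))
```

The computed critical value agrees with the table value 2.326 to three decimals, and the limits round to the same interval as the hand calculation.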
1-37
Example (6):
For a data set obtained from a sample, $n = 20$ and $\bar{x} = 24.5$. It is known that $\sigma = 3.1$, and the population is normally distributed.
a. What is the point estimator of the population mean?
b. What is the point estimate of $\mu$?
c. What is the sampling (probability) distribution of that estimator?
d. Make a 99% C.I for $\mu$.
e. What is the margin of error of the estimate for part d?
Solution:
a. The point estimator of the population mean is: $\bar{x} = \dfrac{\sum x}{n}$
b. The point estimate of $\mu$ is: $\bar{x} = 24.5$
c. The sampling (probability) distribution of that estimator: because the distribution of the population is normal, the sampling distribution of $\bar{x}$ is also normal with mean $\mu$ and variance $\dfrac{\sigma^2}{n} = \dfrac{(3.1)^2}{20} \approx 0.48$, i.e. $\bar{x} \sim N(\mu, 0.48)$
1-38
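d. 99% C.I for $\mu$:
$n < 30$, $\sigma$ is known, and the population is normally distributed. Therefore, we will use the $Z$ distribution to make a C.I for $\mu$. From the given information,
$n = 20$, $\bar{x} = 24.5$, $\sigma = 3.1$
$1 - \alpha = 0.99$, $\alpha = 0.01$, $\alpha/2 = 0.005$, $1 - \alpha/2 = 0.995$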
1-39
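The C.I for $\mu$ is: $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{\sigma}{\sqrt{n}}$
$24.5 \pm 2.576 \times \dfrac{3.1}{\sqrt{20}}$
$24.5 \pm 2.576 \times 0.693$
$24.5 \pm 1.79$
LL = 24.5 - 1.79 = 22.71    UL = 24.5 + 1.79 = 26.29
The C.I is: (22.71 ; 26.29)
Thus we can state with 99% confidence that the population mean lies between 22.71 and 26.29.
e. The margin of error of the estimate: $E = Z_{1-\alpha/2} \times \dfrac{\sigma}{\sqrt{n}} = 2.576 \times \dfrac{3.1}{\sqrt{20}} = 1.79$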
1-40
Example (7):
A certain machine is adjusted to produce units with average weight $\mu$ grams and standard deviation 3 grams. A sample of 4 units is selected at random from the production of this machine. The weights were: 150, 200, 200, 250. Construct a 99% C.I for the average weight, assuming that the weights of all units are normal.
Solution:
$n < 30$, $\sigma$ is known, and the population is normally distributed. Therefore, we will use the $Z$ distribution to make a C.I for $\mu$. From the given information,
$n = 4$, $\sigma = 3$, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{800}{4} = 200$
$1 - \alpha = 0.99$, $1 - \alpha/2 = 0.995$
1-41
The C.I for $\mu$ is: $\bar{x} \pm Z_{1-\alpha/2} \times \dfrac{\sigma}{\sqrt{n}}$
$200 \pm 2.576 \times 1.5$
$200 \pm 3.86$
LL = 200 - 3.86 = 196.14    UL = 200 + 3.86 = 203.86
The C.I is: (196.14 ; 203.86)
Thus we can state with 99% confidence that the population mean lies between 196.14 and 203.86.
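Example (7) starts from raw data, so the sample mean has to be computed first. The sketch below follows the solution and uses the known population standard deviation for the standard error; the sample standard deviation is computed only for comparison, to show why the two must not be mixed up. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

weights = [150, 200, 200, 250]   # the four sampled units, in grams
sigma = 3                        # known population standard deviation
z = 2.576                        # table value for 99% confidence

n = len(weights)
x_bar = mean(weights)            # point estimate of the average weight
standard_error = sigma / sqrt(n) # uses sigma, because sigma is known
margin = z * standard_error      # margin of error E

lower = x_bar - margin           # LL
upper = x_bar + margin           # UL

s = stdev(weights)               # sample standard deviation, not used in the C.I
print(x_bar, standard_error, round(lower, 2), round(upper, 2), round(s, 2))
```

The sample standard deviation of these four weights is far larger than the stated population value of 3, so an interval built from it would be much wider than the one in the solution.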
1-42
1-43
Note: The standard error was calculated using the population standard deviation.
[Figure: annotated computer output. Labels: confidence level, sample standard deviation $S$, sample size ($n = 100$), sample mean ($\bar{x} = 200$), population standard deviation, standard error, LL, UL.]
1-44