Econ 316: Applied Statistics for Economists
10/14/21 2
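WHAT IS A CONFIDENCE INTERVAL (CI)?
• An interval of numbers used to estimate the true value of a population parameter.
• Large samples: $\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• Small samples: $\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$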
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS (Paired Data)
• Suppose we have two independent random samples with means $\bar{x}_1$ and $\bar{x}_2$ and respective sample sizes $n_1$ and $n_2$, drawn from normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$.
CI FOR THE DIFFERENCE BETWEEN TWO
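MEANS contd
• CASE 2: $\sigma_1^2$ and $\sigma_2^2$ are UNKNOWN but $n_1$ and $n_2$ are LARGE.
• Then a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by:
$(\bar{x}_1 - \bar{x}_2) - Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$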
CI FOR THE DIFFERENCE BETWEEN TWO
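MEANS contd
• CASE 3: $\sigma_1^2$ and $\sigma_2^2$ are UNKNOWN and $n_1$ and $n_2$ are SMALL.
A. If the variances are EQUAL, that is $\sigma_1^2 = \sigma_2^2$, then a $100(1-\alpha)\%$ confidence interval on $\mu_1 - \mu_2$ is given by:
$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$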
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS contd
• Where $S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ and $t_{\alpha/2}$ has $n_1 + n_2 - 2$ degrees of freedom.
B. If no assumption of equality of the variances is made, then a $100(1-\alpha)\%$ confidence interval on $\mu_1 - \mu_2$ is given by:
$(\bar{x}_1 - \bar{x}_2) - t^*_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t^*_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS
• where $t^*$ follows approximately a t-distribution with degrees of freedom f given by:
$f = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS- EXAMPLE 1
• Research at the University JSS shows that the first-year JSS class has 22 students whose mean height is 47.75 inches, while the second-year class has 25 students whose mean height is 50.40 inches. If the standard deviations of the heights of first- and second-year students are known to be 1.80 and 2.05 inches respectively, find the 95% confidence interval for the difference in mean heights $\mu_1 - \mu_2$.
• Interpret your results.
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS -Solution
• $n_1 = 22$, $n_2 = 25$, $\bar{x}_1 = 47.75$, $\bar{x}_2 = 50.40$, $s_1 = 1.80$, $s_2 = 2.05$
• Note: The sample sizes are small, the means and standard deviations are known, and no assumption is made about the equality of the variances. Therefore we use the formula for unequal variances, with degrees of freedom f:
CI FOR THE DIFFERENCE BETWEEN TWO
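MEANS -solution
• $f = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{1.80^2}{22} + \frac{2.05^2}{25}\right)^2}{\frac{(1.80^2/22)^2}{21} + \frac{(2.05^2/25)^2}{24}} = \frac{0.09946}{0.00221} \approx 45.0$; $t^*_{0.025,45} \approx t_{0.025,40} = 2.021$
• By inserting the relevant values into the specified equation, with $\bar{x}_1 - \bar{x}_2 = 47.75 - 50.40 = -2.65$ and $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 0.561$:
$-2.65 - 2.021(0.561) \le \mu_1 - \mu_2 \le -2.65 + 2.021(0.561)$
$-3.784 \le \mu_1 - \mu_2 \le -1.516$
• Equivalently, for the second year minus the first year, $1.516 \le \mu_2 - \mu_1 \le 3.784$.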
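The calculation in this solution can be checked with a short script. This is a minimal sketch, not part of the original slides: the function name `welch_interval` is hypothetical, and the critical value 2.021 is passed in from the t-table because the Python standard library has no t-distribution quantile function.

```python
import math

def welch_interval(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Interval for mu1 - mu2 without assuming equal variances.

    t_crit is the table value t*_{alpha/2, f}, supplied by the caller.
    """
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    # Approximate degrees of freedom f from the formula on the slide
    f = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    se = math.sqrt(v1 + v2)          # standard error of the difference
    diff = mean1 - mean2
    return f, diff - t_crit * se, diff + t_crit * se

# Example 1: first-year class (sample 1) and second-year class (sample 2)
f, lower, upper = welch_interval(47.75, 1.80, 22, 50.40, 2.05, 25, t_crit=2.021)
print(round(f, 1), round(lower, 3), round(upper, 3))
```

Small differences in the third decimal place relative to a hand calculation can arise because the script does not round the standard error to 0.561 before multiplying.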
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS-INTERPRETATION
• If we were to draw 100 different pairs of samples and construct the confidence interval each time, about 95 of the intervals would contain the true difference in means $\mu_1 - \mu_2$. The remaining 5 or so would not contain it.
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS-EXAMPLE 2
• QUESTION 2: A vending machine designed to dispense coffee into 8 milliliter cups was checked by a technician who sampled 4 cups before making an adjustment and 5 cups after making the adjustment. Assuming that the variances are equal, find a 90% confidence interval for the mean difference in the amount dispensed due to the adjustment, if the samples showed the following amounts of coffee:
Before adjustment:  6.92  7.34  7.26  6.88
After adjustment:   7.33  7.93  7.65  7.49  7.10
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS -SOLUTION
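$n_1 = 4$, $n_2 = 5$, $\bar{x}_1 = 7.10$, $\bar{x}_2 = 7.50$,
$s_1^2 = (0.2338)^2$, $s_2^2 = (0.3148)^2$,
$t_{0.05,7} = 1.895$;
$S_p = \sqrt{\frac{3(0.2338)^2 + 4(0.3148)^2}{4 + 5 - 2}} = 0.2830$
• By inserting the values above into the confidence interval formula specified:
$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$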
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS -SOLUTION contd
• $(7.10 - 7.50) \pm 1.895\,(0.2830)\sqrt{\frac{1}{4} + \frac{1}{5}}$
• $-0.40 \pm 0.36$
• Therefore $-0.76 \le \mu_1 - \mu_2 \le -0.04$
• INTERPRETATION: ???
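The pooled-variance interval of Example 2 can be reproduced from the raw measurements. The following is a minimal sketch under the equal-variance assumption stated in the question; `pooled_interval` is a hypothetical helper name and the critical value $t_{0.05,7} = 1.895$ is taken from the t-table as in the solution.

```python
import math
import statistics

before = [6.92, 7.34, 7.26, 6.88]
after = [7.33, 7.93, 7.65, 7.49, 7.10]

def pooled_interval(x, y, t_crit):
    """Interval for mu_x - mu_y assuming equal population variances."""
    n1, n2 = len(x), len(y)
    s1_sq = statistics.variance(x)   # sample variance, divisor n - 1
    s2_sq = statistics.variance(y)
    # Pooled standard deviation S_p
    sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = statistics.mean(x) - statistics.mean(y)
    return sp, diff - margin, diff + margin

sp, lower, upper = pooled_interval(before, after, t_crit=1.895)
print(round(sp, 4), round(lower, 2), round(upper, 2))
```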
DEGREE OF FREEDOM
• The degrees of freedom represent the number of observations in the sample that are free to vary around the mean of the sample.
• Example 1: Let n = 2 and let a and b be the two values. For any given mean $\bar{x}$, the value of b is determined by a and is not free to vary. That is, if $\bar{x} = 7$ and a = 10, then b must be 4.
CHI-SQUARE DISTRIBUTION
• Suppose we have a set of independent normal variables $X_1, X_2, \dots, X_n$ and we normalize them, $Z_1 = \frac{X_1 - \mu}{\sigma}$, $Z_2 = \frac{X_2 - \mu}{\sigma}$, ..., $Z_n = \frac{X_n - \mu}{\sigma}$. Then the sum of the squares of the normalized variables has a chi-squared ($\chi^2$) distribution.
• That is, $\chi^2 = \sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$ has a chi-square distribution. When $\mu$ is replaced by the sample mean $\bar{x}$, the degrees of freedom are $v = n - 1$.
CHARACTERISTICS OF CHI-SQUARE
DISTRIBUTION
• It is skewed to the right, with the right tail asymptotic to the horizontal axis, and its domain consists of the non-negative real numbers.
• The sampling distribution used for inferences about a population variance from its estimator is the chi-square distribution.
• As the sample size increases, the $\chi^2$ distribution becomes more symmetrical. Thus for $v = n - 1 > 30$ it can be approximated by the normal distribution.
[Figure: chi-square density $f(\chi^2)$ plotted against $\chi^2$ for df = 5 and df = 10]
• The parameter of the chi-square distribution is called the degrees of freedom, $v = n - 1$.
• Like $Z_\alpha$ for the standard normal distribution, $\chi^2_\alpha$ is defined as the value for which the area under the curve to its right is equal to $\alpha$.
• This value depends on the number of degrees of freedom and must be obtained from a chi-square table.
[Figure: chi-square density with critical values $\chi^2_{1-\alpha/2,\,v}$ and $\chi^2_{\alpha/2,\,v}$; area $\alpha/2$ in each tail and $1 - \alpha$ between them]
• Also, $\chi^2_{\alpha/2}$ is the value for which the area under the curve to its right, with v degrees of freedom, is $\alpha/2$, and $\chi^2_{1-\alpha/2}$ is the value for which the area under the curve to its left, with v degrees of freedom, is $\alpha/2$. Because the chi-square distribution is not symmetrical, the two values must be obtained separately from the table.
CI FOR A CHI-SQUARE DISTRIBUTION
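• From the definition of the chi-squared distribution (with $\mu$ estimated by $\bar{x}$), we know that:
• $\chi^2_v = \sum_{i=1}^{n}\left(\frac{X_i - \bar{x}}{\sigma}\right)^2 = \frac{n-1}{\sigma^2}\cdot\frac{\sum_{i=1}^{n}(X_i - \bar{x})^2}{n-1} = \frac{(n-1)s^2}{\sigma^2}$ ... (1)
• where $s^2$ is the sample variance and $v = n - 1 > 0$
• Let $P\left(\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}\right) = 1 - \alpha$ ... (2)
• Consider the inequality $\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}$ ... (3)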
• Recall: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
• By substituting equation 1 into equation 3, we obtain:
$\chi^2_{1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2}$ ... (4)
• By inverting equation 4:
• $\frac{1}{\chi^2_{1-\alpha/2}} \ge \frac{\sigma^2}{(n-1)s^2} \ge \frac{1}{\chi^2_{\alpha/2}}$
• Multiplying through by $(n-1)s^2$ we obtain:
$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \ge \sigma^2 \ge \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$
• Reversing the order of the inequality we obtain:
CONFIDENCE INTERVAL FOR A POPULATION VARIANCE
$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
CONFIDENCE INTERVAL FOR A POPULATION STANDARD DEVIATION
$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} \le \sigma \le \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$
• INTERPRETATION: Using the sample variance (standard deviation), we are $100(1-\alpha)\%$ confident that the population variance (standard deviation) lies between the end points of the confidence interval.
MEAN AND VARIANCE OF CHI-SQUARE
DISTRIBUTION
• If a variable x has a chi-squared distribution with v degrees of freedom, its expected value is $E(x) = v$ and its variance is $Var(x) = 2v$.
CI FOR A CHI-SQUARE DISTRIBUTION
• QUESTION 1: A machine was tested 9 times, giving a standard deviation of 0.15. Construct a 90% confidence interval for the variance.
• By using the CI for the population variance: $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
• By inserting the relevant values ($n = 9$, $s^2 = 0.0225$, $\chi^2_{0.05,8} = 15.51$, $\chi^2_{0.95,8} = 2.73$) into the equation:
$\frac{8(0.0225)}{15.51} \le \sigma^2 \le \frac{8(0.0225)}{2.73}$, that is, $0.012 \le \sigma^2 \le 0.066$
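The arithmetic of Question 1 can be sketched in a few lines. The function name `variance_interval` is an assumption made for illustration, and the two chi-square critical values are read from a table and passed in, since the standard library does not provide chi-square quantiles.

```python
def variance_interval(n, s, chi2_upper, chi2_lower):
    """Confidence interval for a population variance.

    chi2_upper is chi^2_{alpha/2, n-1} and chi2_lower is
    chi^2_{1-alpha/2, n-1}, both taken from a chi-square table.
    """
    ss = (n - 1) * s ** 2
    # The larger critical value gives the lower limit and vice versa
    return ss / chi2_upper, ss / chi2_lower

# Question 1: n = 9, s = 0.15, 90% confidence
low, high = variance_interval(9, 0.15, chi2_upper=15.51, chi2_lower=2.73)
print(round(low, 3), round(high, 3))
```

Taking square roots of the two limits would give the corresponding interval for the standard deviation.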
F-DISTRIBUTION
• The F distribution is sometimes called the variance ratio distribution.
• If two variables have independent chi-squared distributions $\chi_1^2$ and $\chi_2^2$ with $v_1$ and $v_2$ degrees of freedom respectively, the statistic formed by the ratio
$\frac{\chi_1^2 / v_1}{\chi_2^2 / v_2} = \frac{\frac{(n_1-1)s_1^2}{\sigma_1^2} \big/ v_1}{\frac{(n_2-1)s_2^2}{\sigma_2^2} \big/ v_2} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = F_{v_1, v_2}$
has an F distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom.
CHARACTERISTICS OF F-DISTRIBUTION
• The F distribution is skewed to the right and the range of values of F is $0 \le F < \infty$.
• The F-distribution has two sets of degrees of freedom, one for the numerator and the other for the denominator.
• The degrees of freedom depend on the way in which we obtain the estimates of the two variances appearing in the numerator and denominator of the F-ratio.
• The F table gives the probability of the right-hand tail. Since the F distribution is not symmetrical, the left-hand tail value cannot be read directly from the regular F table.
• By convention, for a two-tailed test the F ratio is always evaluated with the larger estimate of the variance as the numerator and the smaller estimate as the denominator. Then $F_{\alpha/2,\,v_1,\,v_2} = \frac{1}{F_{1-\alpha/2,\,v_2,\,v_1}}$
• $P\left(F_{1-\alpha/2,\,v_1,\,v_2} \le \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \le F_{\alpha/2,\,v_1,\,v_2}\right) = 1 - \alpha$
CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCES
• Consider the inequality: $F_{1-\alpha/2,\,v_1,\,v_2} \le \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \le F_{\alpha/2,\,v_1,\,v_2}$
• By rearranging we obtain: $F_{1-\alpha/2,\,v_1,\,v_2} \le \frac{s_1^2\,\sigma_2^2}{s_2^2\,\sigma_1^2} \le F_{\alpha/2,\,v_1,\,v_2}$
• However, $F_{1-\alpha/2,\,v_1,\,v_2} = \frac{1}{F_{\alpha/2,\,v_2,\,v_1}}$
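The slides that finish this derivation are not reproduced in these notes. A sketch of the remaining algebra, assuming only the inequality and the reciprocal relation stated above, would run as follows: multiply through by $s_2^2/s_1^2$, invert (which reverses the inequalities), and replace the reciprocal of the lower critical value.

```latex
\begin{align*}
F_{1-\alpha/2,\,v_1,\,v_2} \;\le\; \frac{s_1^2\,\sigma_2^2}{s_2^2\,\sigma_1^2} \;\le\; F_{\alpha/2,\,v_1,\,v_2}
&\;\Longrightarrow\;
\frac{s_2^2}{s_1^2}\,F_{1-\alpha/2,\,v_1,\,v_2} \;\le\; \frac{\sigma_2^2}{\sigma_1^2} \;\le\; \frac{s_2^2}{s_1^2}\,F_{\alpha/2,\,v_1,\,v_2} \\
&\;\Longrightarrow\;
\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\,v_1,\,v_2}} \;\le\; \frac{\sigma_1^2}{\sigma_2^2} \;\le\; \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{1-\alpha/2,\,v_1,\,v_2}} \\
&\;\Longrightarrow\;
\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\,v_1,\,v_2}} \;\le\; \frac{\sigma_1^2}{\sigma_2^2} \;\le\; \frac{s_1^2}{s_2^2}\,F_{\alpha/2,\,v_2,\,v_1}
\end{align*}
```

Only right-hand-tail values $F_{\alpha/2}$ appear in the last line, which is why the regular F table is sufficient for this interval.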
CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• QUESTION: In measuring the content of 6 boxes filled by one machine, a student determined the sample variance to be 0.1754. In measuring the content of 11 boxes filled by a second machine, he found a sample variance of 0.2704. Assuming that the amount dispensed follows a normal distribution for each machine, find a 95% confidence interval for the ratio of the variances.
CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• SOLUTION: $n_1 = 6$, $n_2 = 11$, $s_1^2 = 0.1754$, $s_2^2 = 0.2704$,
$\alpha = 0.05$, $\frac{\alpha}{2} = 0.025$, $F_{\alpha/2,\,v_1,\,v_2} = F_{0.025,\,5,\,10} = \,?????$
HYPOTHESIS TESTING
Instructor: Prof. D. K. Twerefou & Emmanuel Abbey
WHAT IS A STATISTICAL HYPOTHESIS?
• A statistical hypothesis is an assertion or conjecture about a population parameter or parameters, such as the mean, standard deviation or variance of a population.
• Hypotheses in statistics always come in pairs:
- Null hypothesis, $H_0$ (expresses no difference between a parameter and its hypothesized value)
- Alternative hypothesis, $H_a$ (accepted when the null hypothesis is rejected)
10/27/21 2
TYPES OF HYPOTHESIS
• Consider the population parameter, the mean $\mu$, and the hypothesized value of the population mean, $\mu_0$. Below are the three main types of hypothesis:
1. $H_0: \mu \ge \mu_0$ against $H_a: \mu < \mu_0$
2. $H_0: \mu \le \mu_0$ against $H_a: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$
HYPOTHESIS TESTING
• In hypothesis testing, a specific idea concerning a parameter is
available before the study and the purpose of the test is to
conduct an experiment by collecting data to confirm, or otherwise
reject this idea
• The null hypothesis is evaluated for reasonableness based on sample evidence.
- If it is not reasonable, it is rejected.
- If it is reasonable, or the observed difference is not significant, then we do not reject it.
TYPE I AND II ERRORS
• If the null hypothesis is true and rejected, or false and accepted, the decision is in either case an error.
• Type I error: the null hypothesis is true but is rejected. The probability of committing a Type I error is designated by α (alpha).
• If a hypothesis is rejected at the 5 percent (α) level of significance, the probability that we have rejected a true null hypothesis is at most 5%.
TYPE I AND II ERRORS contd
• Type II error: the null hypothesis is false but is accepted. The probability of committing a Type II error is designated by β (beta).
                   ACCEPT $H_0$         REJECT $H_0$
$H_0$ is true      Correct decision     Type I error (α)
$H_0$ is false     Type II error (β)    Correct decision
STEPS IN HYPOTHESIS TESTING
1. Formulate the null and alternative hypotheses
2. Set the level of significance (α)
3. Choose the appropriate test statistic (Z, t, $\chi^2$, F) and calculate its empirical value from the sample
4. Construct the decision rule and find the critical values of the chosen statistic
5. Compare the empirical value of the test statistic with the critical values and conclude
Two-sided or two-tailed test.
• If the decision rule accepts the null hypothesis for values within an interval and rejects it for values outside this interval, we refer to such a criterion as a two-sided test or a two-tailed test.
• For example, if we are required to test the hypothesis $H_0: \mu = 42$ and the decision rule says "accept $H_0$ if the sample mean lies within the interval 40.1 to 45.5" (i.e., $40.1 \le \bar{x} \le 45.5$), then the situation is a two-tailed test.
One-sided or one-tailed test
• If we reject the null hypothesis only for values of the sample mean greater than a certain value, or only for values less than a certain value, we refer to the criterion as a one-sided test or one-tailed test.
TEST FOR THE MEAN-TEST STATISTICS
TEST COMPARING TWO MEANS (PAIRED DATA)
• Sometimes we would like to know whether an observed difference between two means is due to chance or whether it is statistically significant.
• Suppose $x_1, x_2, \dots, x_n$ are the observations on n individuals before an experiment and $y_1, y_2, \dots, y_n$ are the corresponding observations after the experiment. Then $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ constitute paired data.
• To compare the means of these data, we can transform the data into a single sample by finding the differences between corresponding observations, which reduces the problem to a one-sample test.
TEST COMPARING TWO MEANS (PAIRED
DATA)contd
• Assume we have two independent samples with means $\bar{x}_1$ and $\bar{x}_2$ and respective sample sizes $n_1$ and $n_2$, drawn from normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$.
We can compare the means $\mu_1$ and $\mu_2$ by testing the hypotheses under various conditions on the population variances and sample sizes:
TEST COMPARING TWO MEANS –HYPOTHESIS
$H_0: \mu_1 \ge \mu_2$  or  $H_0: \mu_1 - \mu_2 \ge 0$
$H_a: \mu_1 < \mu_2$  or  $H_a: \mu_1 - \mu_2 < 0$

$H_0: \mu_1 \le \mu_2$  or  $H_0: \mu_1 - \mu_2 \le 0$
$H_a: \mu_1 > \mu_2$  or  $H_a: \mu_1 - \mu_2 > 0$

$H_0: \mu_1 = \mu_2$  or  $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 \ne \mu_2$  or  $H_a: \mu_1 - \mu_2 \ne 0$
CASE 1: POPULATION VARIANCES ARE KNOWN
• If the sample sizes are large, i.e. $n_1 \ge 30$ and $n_2 \ge 30$, and the variances are known, then the test statistic for the difference between two means is given by:
• $z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
• where $\delta = \mu_1 - \mu_2$ is the value on the right-hand side of the null hypothesis, which is usually zero
CASE 2: VARIANCES ARE UNKNOWN BUT n IS
LARGE
• If the population variances are unknown but the sample sizes are large, then the sample variances can be used as approximate substitutes for the population variances. That is:
• $z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
• where $\delta = \mu_1 - \mu_2$
CASE 3A: VARIANCES ARE UNKNOWN BUT n IS
SMALL(n less than 30)
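If the variances are equal, that is $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then the test statistic is:
• $t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
• Decision rule for the two-sided alternative $\mu_1 - \mu_2 \ne \delta$: reject $H_0$ if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$ (Z test), or if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ (t test). Accept $H_0$ or reserve judgment if $-Z_{\alpha/2} \le Z \le Z_{\alpha/2}$ or $-t_{\alpha/2} \le t \le t_{\alpha/2}$.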
EXAMPLE 1
• A study by the EPA compared the highway fuel economy of domestic and imported private cars. A test of 35 domestic cars resulted in an average of 35.2 miles/gallon, and a test of 40 imported cars resulted in an average of 34.2 miles/gallon. The population gas mileage is known to be normal in both cases, with variances of $\sigma_1^2 = 2.3$ (domestic) and $\sigma_2^2 = 1.8$ (imported). Test the hypothesis that the population means are equal at α = 0.05.
SOLUTION
• We solve the problem using the steps in testing for hypothesis
1. Formulate the null and alternative hypothesis:
𝐻! : 𝜇% − 𝜇& = 0
𝐻" : 𝜇% − 𝜇& ≠ 0
2. Set the level of significance: α = 0.05
3. Choose the appropriate test statistic and calculate its empirical value.
SOLUTION
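• This is a test of the difference between two means and both sample sizes are greater than 30. Also, the population variances are known. Therefore the test statistic is Z.
• $\bar{x}_1 = 35.2$, $\bar{x}_2 = 34.2$, $\sigma_1^2 = 2.3$, $\sigma_2^2 = 1.8$, $n_1 = 35$, $n_2 = 40$
• $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{35.2 - 34.2}{\sqrt{\frac{2.3}{35} + \frac{1.8}{40}}} \approx 3.01$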
SOLUTION
4. Construct the decision rule and find the critical values of the chosen statistic
If $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ reject $H_0$, otherwise accept $H_0$
$\pm Z_{0.025} = \pm 1.96$
5. Compare the empirical value of the test statistic with the critical values and conclude
Calculated $Z = 3.01 > z_{\alpha/2} = 1.96$.
By the decision rule, we reject the null hypothesis and conclude that the population means are not equal.
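The five steps of this example can be traced in code. The sketch below is illustrative rather than part of the course material: `two_sample_z` is a hypothetical helper, and the critical value is obtained from `statistics.NormalDist` in the Python standard library instead of a printed Z table.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, var1, n1, mean2, var2, n2, delta=0.0):
    """Z statistic for H0: mu1 - mu2 = delta with known population variances."""
    return (mean1 - mean2 - delta) / math.sqrt(var1 / n1 + var2 / n2)

alpha = 0.05
z = two_sample_z(35.2, 2.3, 35, 34.2, 1.8, 40)     # domestic vs imported cars
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # two-tailed critical value
reject_h0 = z < -z_crit or z > z_crit
print(round(z, 2), round(z_crit, 2), reject_h0)
```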
EXAMPLE 2
• In 1994 Duraplast claimed that its PVC pipes are more durable than those of Interplast. In order to test this claim, a student conducted an experiment testing the durability of 5 PVC pipes from each company. The experiment revealed that the mean duration of a Duraplast PVC pipe is 20,000 months and that of an Interplast pipe is 8,000 months. The standard deviations of the PVCs from Duraplast and Interplast were 3,000 and 2,000 months respectively. Test this claim using an α of 0.05, given that the population variances of the durations of the two PVCs are equal.
SOLUTION
1. Formulate the null and alternative hypothesis.
Thus, the null and alternative hypotheses to be tested are:
𝐻! : 𝜇% − 𝜇& ≤ 0
𝐻" : 𝜇% − 𝜇& > 0
SOLUTION
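3. Choose the appropriate test statistic and calculate its empirical value
The test statistic is the t-statistic (when the variances are equal):
$t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{20000 - 8000 - 0}{\sqrt{\frac{4(3000)^2 + 4(2000)^2}{8}\left(\frac{1}{5} + \frac{1}{5}\right)}} = 7.44$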
SOLUTION
4. Construct the decision rule and find the critical values of the chosen statistic
Reject $H_0$ if $t > t_{\alpha,\,n_1 + n_2 - 2}$, otherwise reserve judgment; $t_{0.05,8} = 1.86$
5. Compare the empirical value of the test statistic with the critical values and conclude
$t = 7.44 > t_{0.05,8} = 1.86$
Hence we reject the null hypothesis that the mean duration of Duraplast PVC is less than or equal to that of Interplast. The data support Duraplast's claim.
QUESTION
• An advert in the Ghanaian Times provides the following prices for renting a car from two different companies in 26 towns. Determine if the two sample means differ significantly, using $\alpha = 0.1$.
City        A   B   C   D   E   F   G   H   I   J   K   L   M
Company 1   46  61  59  55  54  52  34  50  49  46  41  54  68
Company 2   46  57  58  54  54  52  34  49  49  46  41  50  63

City        N   O   P   Q   R   S   T   U   V   W   X   Y   Z
Company 1   68  65  44  57  45  56  50  50  47  48  47  62  45
Company 2   63  61  44  53  45  52  50  50  47  48  47  59  45
SOLUTION
1. Formulate the null and alternative hypothesis.
𝐻! : 𝜇% − 𝜇& = 0 𝐻" : 𝜇% − 𝜇& ≠ 0
SOLUTION
• The sample sizes are small and the sample standard deviations can be computed from the data. Nothing is said about the equality of the population variances, but because $n_1 = n_2$ the pooled and unpooled standard errors coincide, so the t-statistic for the difference in means is:
$t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 0.68$
SOLUTION
4. Construct the decision rule and find the critical values of the
chosen statistics
If 𝑡 < −𝑡!⁄" or 𝑡 > 𝑡!⁄" reject 𝐻! or reserve judgment.
TESTING CONCERNING STANDARD DEVIATIONS
• The chi-square distribution is used to test whether the population standard deviation (or the population variance) is equal to a specific constant.
• $H_0: \sigma \ge \sigma_0$ against $H_a: \sigma < \sigma_0$
• $H_0: \sigma \le \sigma_0$ against $H_a: \sigma > \sigma_0$
• $H_0: \sigma = \sigma_0$ against $H_a: \sigma \ne \sigma_0$
TEST STATISTICS
• If our sample is random and comes from a normal population, then $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$, with (n − 1) degrees of freedom, is the test statistic.
• The critical values are $\chi^2_{1-\alpha}$ and $\chi^2_{\alpha}$ for the one-sided alternatives, and $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ for the two-sided alternative (reject $H_0$ if $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$).
EXAMPLE 1
• In a laboratory experiment, $s = 0.0086$ for 10 determinations of the specific heat of iron. Use the 0.05 level of significance to test the null hypothesis $\sigma \ge 0.01$ against the alternative hypothesis $\sigma < 0.01$.
SOLUTION
1. Formulate the null and alternative hypothesis.
𝐻( : 𝜎 ≥ 0.01 , 𝐻" : 𝜎 < 0.01
SOLUTION
• This is a test of whether the standard deviation equals a specific constant, and therefore the chi-squared test is used.
• $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{9(0.0086)^2}{(0.01)^2} = 6.66 \approx 6.7$
4. Construct the decision rule and find the critical values of the chosen statistic
Reject the null hypothesis if $\chi^2 < \chi^2_{0.95,9} = 3.325$, otherwise accept $H_0$ or reserve judgment.
5. Compare the empirical value of the test statistic with the critical values and conclude
Since $\chi^2 = 6.66 > 3.325$, we do not reject $H_0$.
QUESTION
A student, in an attempt to check a Professor’s claim that convicted embezzlers spend on average 12.3 months in jail, takes a random sample of 35 such cases from court files.
• His results are $\bar{x} = 11.5$ months and $s = 3.8$ months, and the level of significance is 0.05. Should the student accept the null hypothesis $\mu = 12.3$ months or the alternative hypothesis $\mu \ne 12.3$ months?
• Test at the 0.01 level of significance whether or not $\sigma = 4.2$ months for the amount of time that convicted embezzlers spend in jail if n = 41. That is, test whether $\sigma = 4.2$ even though s = 3.8.
SOLUTION (a)
1. Formulate the null and alternative hypothesis.
𝐻! : 𝜇 = 12.3 𝐻" : 𝜇 ≠ 12.3
SOLUTION
• The sample size is large (n > 30) and therefore the z test will be used: n = 35, s = 3.8,
• $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ or $z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
• $z = \frac{11.5 - 12.3}{3.8/\sqrt{35}} = -1.25$
SOLUTION
4. Construct the decision rule and find the critical values of the chosen statistic
If $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$ reject $H_0$
5. Compare the empirical value of the test statistic with the critical values and conclude
$Z_{\alpha/2} = Z_{0.025} = 1.96$; $-1.96 < Z = -1.25 < 1.96$
We do not reject $H_0$; the sample does not contradict the professor’s claim.
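As a check on part (a), the statistic and the two-tailed critical value can be computed directly. This is a small sketch, with `one_sample_z` as an assumed helper name and the critical value taken from `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, s, n):
    """Large-sample Z statistic for H0: mu = mu0, using s in place of sigma."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

z = one_sample_z(11.5, 12.3, 3.8, 35)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # Z_{0.025}
reject_h0 = z < -z_crit or z > z_crit
print(round(z, 2), round(z_crit, 2), reject_h0)
```

The statistic falls between the two critical values, so the null hypothesis is not rejected at the 0.05 level.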
SOLUTION (b)
• Formulate the null and alternative hypothesis.
𝐻! : 𝜎 = 4.2 𝐻" : 𝜎 ≠ 4.2
SOLUTION (b)
• This is a test of whether the standard deviation equals a specific constant, and therefore the $\chi^2$ test is used. n = 41, s = 3.8,
• $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{40(3.8)^2}{(4.2)^2} = 32.74$
4. Construct the decision rule and find the critical values of the chosen statistic
If $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$ reject $H_0$, otherwise accept $H_0$ or reserve judgment; $\chi^2_{0.005,40} = 66.7659$, $\chi^2_{0.995,40} = 20.7065$
5. Compare the empirical value of the test statistic with the critical values and conclude
• $20.7065 < 32.74 < 66.7659$, so we do not reject $H_0: \sigma = 4.2$.
TEST CRITERIA , F-TEST
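$H_a$                          Test statistic            Reject $H_0$                                      Accept $H_0$ or reserve judgment
$\sigma_1^2 < \sigma_2^2$      $F = s_1^2 / s_2^2$       $F < F_{1-\alpha}$                                $F \ge F_{1-\alpha}$
$\sigma_1^2 > \sigma_2^2$      $F = s_1^2 / s_2^2$       $F > F_{\alpha}$                                  $F \le F_{\alpha}$
$\sigma_1^2 \ne \sigma_2^2$    $F = s_1^2 / s_2^2$       $F < F_{1-\alpha/2}$ or $F > F_{\alpha/2}$        $F_{1-\alpha/2} \le F \le F_{\alpha/2}$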
EXAMPLE
• In a sample of 8 observations, the sum of squared deviations of the items from the mean was 94.5. In another sample of 10 observations, the value was found to be 101.7. Test whether the difference between the variances is significant at the 5% level.
SOLUTION
1. Formulate the null and alternative hypothesis
𝐻! ∶ 𝜎%& = 𝜎&& , 𝐻" ∶ 𝜎%& ≠ 𝜎&&
SOLUTION
• The test statistic is the F-statistic because we are testing the equality of two variances. But we need to calculate the sample variances first.
• $\sum(x_1 - \bar{x}_1)^2 = 94.5$; $\sum(x_2 - \bar{x}_2)^2 = 101.7$; $n_1 = 8$; $n_2 = 10$, so $s_1^2 = \frac{94.5}{7} = 13.5$ and $s_2^2 = \frac{101.7}{9} = 11.3$
• $F^* = \frac{s_1^2}{s_2^2} = \frac{13.5}{11.3} = 1.195$; $F_{0.025,7,9} = 4.2$
SOLUTION
4. Construct the decision rule and find the critical values of the chosen statistic
• If $F > F_{\alpha/2}$ or $F < F_{1-\alpha/2}$ reject $H_0$, otherwise accept $H_0$, where F is computed with the larger variance estimate in the numerator.
5. Compare the empirical value of the test statistic with the critical values and conclude
$F_{1-\alpha/2} = F_{0.975,7,9} = \frac{1}{F_{0.025,9,7}} = \frac{1}{4.82} = 0.207$
$0.207 \le 1.195 \le 4.2 \Rightarrow F_{1-\alpha/2} \le F \le F_{\alpha/2}$, so we do not reject $H_0$.
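The steps of this F-test example can be laid out as a short script. It is only a sketch: `variance_ratio` is a hypothetical helper, and both critical values (4.2 and 1/4.82) are copied from the F-table figures quoted in the solution rather than computed.

```python
def variance_ratio(ss1, n1, ss2, n2):
    """Sample variances from sums of squared deviations, and their ratio."""
    s1_sq = ss1 / (n1 - 1)
    s2_sq = ss2 / (n2 - 1)
    return s1_sq, s2_sq, s1_sq / s2_sq

s1_sq, s2_sq, f_stat = variance_ratio(94.5, 8, 101.7, 10)

f_upper = 4.2          # F_{0.025,7,9} from the F table
f_lower = 1 / 4.82     # F_{0.975,7,9} = 1 / F_{0.025,9,7}

# Two-tailed rule: reject H0 when F falls outside [f_lower, f_upper]
reject_h0 = f_stat > f_upper or f_stat < f_lower
print(round(s1_sq, 1), round(s2_sq, 1), round(f_stat, 3), reject_h0)
```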
QUESTION
• Two random samples were drawn from two normal populations and their values were:
A:  66  67  75  76  82  84  88  90  92
B:  64  66  74  78  82  85  87  92  93  95  97
ECON 316
Applied Statistics for Economists
Slide 1
Session Outline
The key topics to be covered in the session are as follows:
• Definition of Time Series Analysis
• Components of time series
• Models of Time Series
• Objectives of time series analysis
• Explanation of Trends
• Fitting the Trend Line
• Method of Semi-averages
• Method of Moving Averages
What is Time Series Analysis?
• A time series is a set of regular observations of some variable, collected and recorded sequentially over a period of time.
• The interval between observations may be daily, weekly, monthly, quarterly, annual or any other constant period.
Examples of Time Series
• Economics: share prices on successive days. Export
volumes in successive months, nominal GDP in
successive years.
• Marketing: sales volume in successive weeks,
production costs over successive years.
• Demographic: annual population data, child mortality
rate for successive years, enrolment in JHS in successive
years.
Representation of Time Series-Inflation
Tabular format
Year    Inflation (%)
2000    40.5
2001    21.3
2002    15.2
2003    23.6
2004    11.8
2005    13.5
2006    10.9
2007    10.1
Graphical representation
[Figure: graph of the inflation series, not reproduced]
Objectives of Time Series
• Systems or variables have influences that are associated with time.
• Example in agricultural production: output increases during the rainy season and decreases during the dry season.
• Example in sales: sales increase during Christmas and fall after that.
– Does the season affect agricultural production and sales?
Objectives of Time Series cont’d
• What factors affect time series value?
• How will these factors affect the values in future?
• How can we iron out these factors to obtain real
actual values?
Solution to Example 1
• The additive model is given by $Y_i = T_i + C_i + S_i + I_i$
• To find the trend value for 1985, we make $T_i$ the subject: $T_i = Y_i - C_i - S_i - I_i$
The Multiplicative Model
• Assumes a multiplicative relation amongst the four components.
• Mathematically, $Y_i = T_i \times C_i \times S_i \times I_i$
• The time series value for year i is the product of the trend, seasonal, cyclical and irregular components for year i.
• If we know any four of these five values, we can find the remaining one by making it the subject of the formula and inserting the known values.
Example 2
The table below provides some of the components and actual values of a time series from 1990 to 2000. Using the multiplicative model, complete the table (missing entries are marked ?).
Years   Trend   Cycle   Seasonal   Random   Actual Values
1990    1251    ?       1.04       1.2      1405.12
1991    1241    0.8     1.02       1.2      ?
1992    1231    0.7     ?          1.2      1127.10
1993    1222    0.7     0.97       1.1      912.71
1994    ?       0.8     0.95       1.1      1019.92
1995    1200    0.9     0.9        ?        972.00
1996    1195    ?       0.9        1        860.40
1997    1195    0.7     ?          1        752.85
1998    1194    0.8     ?          1.1      935.14
1999    1192    0.8     0.88       ?        923.08
2000    ?       0.9     0.9        0.9      ?
Solution to Example 2
• The multiplicative model is given by the formula
• or
Notice that and
• Inserting these values, we obtain:
Solution to Example 2 Cont’d
• Results of the calculations are presented in the table below.
Years   Trend   Cycle   Seasonal   Random   Actual Values
1990    1251    0.90    1.04       1.2      1405.12
1991    1241    0.8     1.02       1.2      1215.19
1992    1231    0.7     1.09       1.2      1127.10
1993    1222    0.7     0.97       1.1      912.71
1994    1220    0.8     0.95       1.1      1019.92
1995    1200    0.9     0.9        1        972.00
1996    1195    0.80    0.9        1        860.40
1997    1195    0.7     0.9        1        752.85
1998    1194    0.8     0.89       1.1      935.14
1999    1192    0.8     0.88       1.1      923.08
2000    1190    0.9     0.9        0.9      867.51
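Completing a row of the table amounts to solving the multiplicative model for whichever single value is missing. The sketch below assumes the model actual = trend × cycle × seasonal × random; the function `complete_row` and its argument names are hypothetical and chosen to match the column headings.

```python
def complete_row(trend=None, cycle=None, seasonal=None, random_=None, actual=None):
    """Return (name, value) of the one missing entry in a table row."""
    parts = {"trend": trend, "cycle": cycle, "seasonal": seasonal, "random": random_}
    missing = [name for name, value in parts.items() if value is None]
    known_product = 1.0
    for value in parts.values():
        if value is not None:
            known_product *= value
    if actual is None:
        if missing:
            raise ValueError("more than one value is missing")
        return "actual", known_product          # product of all four components
    if len(missing) != 1:
        raise ValueError("exactly one component must be missing")
    return missing[0], actual / known_product   # make the missing value the subject

# 1991: actual value missing; 1994: trend missing
name_1991, value_1991 = complete_row(trend=1241, cycle=0.8, seasonal=1.02, random_=1.2)
name_1994, value_1994 = complete_row(cycle=0.8, seasonal=0.95, random_=1.1, actual=1019.92)
print(name_1991, round(value_1991, 2), name_1994, round(value_1994, 2))
```

A row with two missing entries, such as the year 2000 in the question table, cannot be solved from the model alone, which is why the function raises an error in that case.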
Activity 1
The table below shows some of the values of the components and
actual time series from 1996 to 2005. Using the multiplicative model
complete the table.
Years Trend Cycle Seasonal Random Actual Values
1996 0.60 1.04 0.40 18335.37
1997 88150.80 0.80 0.50 35965.53
1998 96965.88 1.10 0.40 25598.99
1999 99874.86 0.70 0.97 0.50
2000 104868.60 0.70 0.95 41842.57
2001 111160.72 0.90 0.40 30013.39
2002 124500.00 0.65 0.90 0.50
2003 151890.00 0.70 0.40 40402.74
2004 162522.30 0.85 0.60 58020.46
2005 0.80 0.88 0.50 63615.13
2006 196990.03 0.80 0.90 0.50
Activity 2
The table below shows the values of some of the components and
actual values of a time series for 1996 to 2005. Using the additive
model complete the table.
Year Trend Cycle Seasonal Random Actual
Method of Semi-Averages contd
• Plot the trend line using the average values of
the two observations with the help of the
equation of a straight line:
Øb is the gradient of
Example 1
The table shows the production of cocoa by a farmer in Ghana from 2000 to 2009.
a. Fit the trend line using the method of semi-averages.
b. Find the equation of the straight line.
Years   Tons of Cocoa
2000    3
2001    4
2002    5
2003    1
2004    3
2005    4
2006    6
2007    2
2008    4
2009    5
Solution to Example 1
• We have an even number (10) of observations from 2000 to 2009.
• We split the time series into two equal parts, 2000-2004 and 2005-2009, each with 5 observations.
• The average for the first part is 3.2 and for the second part is 4.2. Each average is placed against the middle year of its part.
Years   Tons of Cocoa   Centered average
2000    3
2001    4
2002    5               3.2
2003    1
2004    3
2005    4
2006    6
2007    2               4.2
2008    4
2009    5
Solution to Example 1 Cont’d
Plotting the years on the x-axis and output on the y-axis we obtain the trend line through the two semi-averages.
[Figure: output (tons) plotted against year, 2000 to 2009]
• We now find the equation of the line through the two points (2002, 3.2) and (2007, 4.2). To simplify the arithmetic the years are coded: 2000 will have a code of 1, 2001 will have a code of 2, etc.
Solution to Example 1 Cont’d
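• Using the codes, the two points become (3, 3.2) and (8, 4.2).
• The equation of a line is given as: $\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$, so $\frac{y - 3.2}{x - 3} = \frac{4.2 - 3.2}{8 - 3} = 0.2$
• Or $y = 2.6 + 0.2x$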
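The method of semi-averages can be sketched as a small function. This is an illustration under the coding used in the example (first year has code 1); the name `semi_average_trend` is hypothetical, and for an odd number of observations the sketch simply leaves out the middle value, which is one common convention rather than something stated in these notes.

```python
def semi_average_trend(values):
    """Return (a, b) of the trend line y = a + b*x, with x coded 1, 2, ..., n."""
    n = len(values)
    half = n // 2
    first, second = values[:half], values[n - half:]
    avg1 = sum(first) / half
    avg2 = sum(second) / half
    x1 = (1 + half) / 2              # middle code of the first part
    x2 = (n - half + 1 + n) / 2      # middle code of the second part
    b = (avg2 - avg1) / (x2 - x1)    # gradient through the two semi-averages
    a = avg1 - b * x1                # intercept
    return a, b

tons = [3, 4, 5, 1, 3, 4, 6, 2, 4, 5]   # cocoa output, 2000 to 2009
a, b = semi_average_trend(tons)
print(round(a, 2), round(b, 2))
```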
Method of Moving Averages
• The average value of the time series over a fixed number of years is computed in successive order, moving forward one year at a time.
• These average values are taken as the trend values for the unit of time falling at the middle of each period.
• Reasoning: When you take these averages as the trend, you obtain a smoother curve and lessen the influence of fluctuations that pull the annual time series values away from the general trend.
• How do you choose the period of years? Choose it, whether even or odd, to coincide with a practical natural cycle in the data.
Odd Period of Moving Average
By an odd period of moving averages, we mean a period such as a 3-yearly moving average, a 5-weekly moving average, etc. The table shows the calculation of 3-yearly moving averages.
Year   Value   3-Yearly Moving Totals   3-Yearly Moving Averages / Trend
1990   a0
1991   a1      a0 + a1 + a2             (a0 + a1 + a2)/3
1992   a2      a1 + a2 + a3             (a1 + a2 + a3)/3
1993   a3      a2 + a3 + a4             (a2 + a3 + a4)/3
1994   a4      a3 + a4 + a5             (a3 + a4 + a5)/3
1995   a5      a4 + a5 + a6             (a4 + a5 + a6)/3
1996   a6
Odd Period of Moving Average contd
• Finding the equation of the trend line is done by using the first
and last points and the formula::
Odd Period of Moving Average-Example 1
Activity 1
The table below shows the nominal GDP of a country from 2000 to 2008.
Year               2000  2001  2002  2003  2004  2005  2006  2007  2008
GDP (trillion ¢)   235   242   255   258   285   291   292   297   300
a. Find the trend equation using 3-yearly moving averages.
b. Draw the trend line.
Even Period of Moving Averages
• If the moving average has an even period, for example a 4-yearly moving average, the moving averages that are computed do not fall on a particular period but rather between two periods.
• Implication: it is unclear which year corresponds to the average that you have calculated.
• Centering, that is finding the average of two successive moving averages and placing it against the middle period, solves the problem.
Calculation of 4-Yearly Moving Average
Year   Value   4-yearly moving average        4-yearly centered moving average (Trend)
1990   a0
1991   a1
               (a0 + a1 + a2 + a3)/4 = b1
1992   a2                                     (b1 + b2)/2
               (a1 + a2 + a3 + a4)/4 = b2
1993   a3                                     (b2 + b3)/2
               (a2 + a3 + a4 + a5)/4 = b3
1994   a4                                     (b3 + b4)/2
               (a3 + a4 + a5 + a6)/4 = b4
1995   a5
1996   a6
Slide 34
Example 3
i.The data below shows the output of a food processing
company. Assume a 4-yearly cycle and find the trend line
by the method of moving averages.
ii. Using the trend line, estimate the output for 1990
Year 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988
Output 464 515 518 467 502 540 557 571 586 612
(tons)
Slide 35
Calculation of 4-yearly moving average
Year   Output   4-yearly MT   4-yearly MA   4-yearly CMA / Trend
1979   464
1980   515
                1964          491
1981   518                                  495.75
                2002          500.5
1982   467                                  503.625
                2027          506.75
1983   502                                  511.625
                2066          516.5
1984   540                                  529.50
                2170          542.5
1985   557                                  553.0
                2254          563.5
1986   571                                  572.5
                2326          581.5
1987   586
1988   612
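As a cross-check on the table, the centered moving averages can be reproduced with a short Python sketch. The helper name `centered_moving_average` is hypothetical; the data are the output figures from Example 3.

```python
def centered_moving_average(values, period):
    """Centered moving average for an even `period`.

    Step 1: ordinary moving averages (these fall between two periods).
    Step 2: average each successive pair so the result falls on a period.
    """
    ma = [sum(values[i:i + period]) / period
          for i in range(len(values) - period + 1)]
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

years = list(range(1979, 1989))
output = [464, 515, 518, 467, 502, 540, 557, 571, 586, 612]

cma = centered_moving_average(output, 4)
# Half a window is lost at each end: the trend covers 1981 to 1986.
cma_years = years[2:-2]

for year, value in zip(cma_years, cma):
    print(year, value)
```

Centering costs two observations at each end of the series, which is why the trend column starts in 1981 and stops in 1986.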
Slide 36
Solution to Example 3 Cont’d
• By using codes for the first (1980) and last (1986) data
points, we obtain the two points as: and
• Equation of a straight line is given as:
• Inserting the values, we obtain:
•
• Thus,
• Forecasting for the year 1990 with a code of 10, we have:
• UNIVERSITY OF GHANA
• DEPARTMENT OF ECONOMICS
• ECON 316
Applied Statistics for Economists
Slide 1
Session Outline
The key topics to be covered in the session are as
follows:
• Fitting the Trend Line
-Method of Least Squares
• Explanation of seasonal variation
• Calculation of seasonal indices
• Deseasonalization of time series data
Slide 2
1
10/19/2021
Slide 3
Example
• The table below shows the unemployment data of graduates
from 2004-2009 in a certain country.
i. Determine the least squares trend equation
ii. Estimate unemployment for 2015.
Year   Unemployment
2004   1005
2005   1010
2006   975
2007   950
2008   955
2009   940
Slide 4
Solution to Example 4
• Table
        X    Y      XY      X²
        1    1005   1005    1
        2    1010   2020    4
        3    975    2925    9
        4    950    3800    16
        5    955    4775    25
        6    940    5640    36
Total   21   5835   20165   91
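The column totals feed directly into the usual least squares formulas for the slope and intercept. The sketch below is an illustration rather than the slide's own worked solution: it assumes the coding X = 1 for 2004 used in the table, so that 2015 corresponds to X = 12, and the function name `least_squares_line` is made up here.

```python
def least_squares_line(x, y):
    """Return (a, b) for the line y = a + b*x fitted by least squares."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

x = [1, 2, 3, 4, 5, 6]                      # coded years, 2004 = 1
y = [1005, 1010, 975, 950, 955, 940]        # unemployment

a, b = least_squares_line(x, y)
forecast_2015 = a + b * 12                  # assumes 2015 is coded as X = 12

print(round(a, 2), round(b, 3), round(forecast_2015, 2))
```

The negative slope says unemployment falls by roughly 15 graduates per year over the sample period.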
Slide 5
Slide 6
Slide 7
Activity 2
1. The following data shows the number of road accidents from 1998
to 2008.
Year               1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008
No. of accidents   220   224   235   242   255   258   285   291   292   297   300
Slide 8
Slide 10
Slide 11
Example 1
The table below shows labour turnover on a quarterly
basis for three years. Calculate the seasonal index
under the multiplicative model using the method of
moving averages.
Interpret the seasonal index for the first and fourth
quarters.
Year/Quarter   1    2    3    4
1985           68   62   61   63
1986           65   58   66   61
1987           68   63   63   67
Slide 13
Solution to Example 1
• Consider table on next slide
• Step 1: estimate the trend using the method of
moving averages.
• We have 4 quarters in a year and therefore we use 4-
quarterly moving average (4QMA) and center it
(Column 4 and 5)
• Step 2: Obtain the S×I components by dividing the
actual value by the T and C components (column 6)
• For example, the value 0.966 was obtained by dividing
61 by 63.125.
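Steps 1 and 2 can be sketched in Python using the labour turnover data from Example 1. The final averaging and scaling of the ratios into seasonal indices follows the common ratio-to-moving-average convention and is an assumption here, since the remaining steps are not reproduced in the text above.

```python
turnover = [68, 62, 61, 63,    # 1985, quarters 1-4
            65, 58, 66, 61,    # 1986
            68, 63, 63, 67]    # 1987
period = 4
offset = period // 2           # the first centered value lines up with 1985 Q3

# Step 1: 4-quarterly moving averages, then center them.
ma = [sum(turnover[i:i + period]) / period
      for i in range(len(turnover) - period + 1)]
cma = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

# Step 2: S x I = actual value divided by the centered trend (T x C).
ratios = [turnover[i + offset] / cma[i] for i in range(len(cma))]

# Assumed further step: average the ratios by quarter and scale them
# so that the four indices sum to 400.
by_quarter = {q: [] for q in range(1, 5)}
for i, ratio in enumerate(ratios):
    by_quarter[(i + offset) % 4 + 1].append(ratio)
means = {q: sum(v) / len(v) for q, v in by_quarter.items()}
scale = 4 / sum(means.values())
seasonal_index = {q: 100 * m * scale for q, m in means.items()}

print(round(ratios[0], 3))     # 61 / 63.125
```

An index above 100 for a quarter indicates turnover above the trend level in that quarter; an index below 100 indicates the opposite.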
Slide 14
Slide 15
Slide 16
Slide 19
Slide 20
Activity 1
1. The table below shows the number of people who attend morning,
afternoon and evening school for a week.
Monday Tuesday Wednesday Thursday Friday
Slide 22
Activity 2
1. The table below shows the sales of a supermarket in Ghana in
billion cedis.
Quarter
Year First Second Third Fourth
2003 18 80 30 522
2004 24 105 54 527
2005 33 141 48 538
2006 40 150 75 545
2007 35 180 55 550
2008 48 205 70 557
Slide 1
Session Overview
By the end of this section, you should be able to
• Explain cyclical variations
• Explain random variations
• Calculate cyclical relative
• Discuss the importance of time series
• Explain forecasting
• Forecast based on trend and seasonal components
• Forecast using simple exponential smoothing
Slide 2
Session Outline
The key topics to be covered in the session are as follows:
• Cyclical Variation
• Cyclical Relative
• Irregular Variation
• Forecasting
Slide 3
Topic 1
What is Cyclical Variation (CV)
• Cyclical variation is the recurrent up-and-down wavelike
variation from trend that has a duration of about 2-10
years and may repeat again and again
- for a specific time series, one cycle can take 3 years but
another cycle can take about 10 years.
• Example: Business cycle - prosperity, recession,
depression and recovery
• These cycles have the propensity to affect time series
values in sales, production, growth etc.
Slide 4
Example : Business Cycle
• Measuring CV is very
difficult. Why?
• Successive cycles may
vary widely in time and
pattern and are usually mixed
up with irregular factors.
Slide 5
Topic 2
Measuring Cyclical Variation: Cyclical Relative
approach
• Assumption: Annual time series reflects the effects of T
and C component because the S and I components are
defined as short-run influence which occurs within the
course of each year.
• Under this assumption the multiplicative model
reduces to Y = T × C,
or C = (Y / T) × 100, where T is the trend estimate and Y is the actual value.
Slide 7
Example 1
• The table below shows the actual (Y) and Trend (T)
estimate of the production of aluminium ingots. Calculate
the cyclical relative and interpret your result.
Year Y T
1994 700 728
1995 850 799
1996 900 972
1997 1100 1078
1998 1225 1225
1999 1324 1347
2000 1500 1485
2001 1675 1682
2002 1700 1564
Slide 8
Solution to Example 1
Year   Y      T      Cyclical relative (Y/T × 100)
1994   700    728    96.2
1995   850    799    106.4
1996   900    972    92.6
1997   1100   1078   102.0
1998   1225   1225   100.0
1999   1324   1347   98.3
2000   1500   1485   101.0
2001   1675   1682   99.6
2002   1700   1564   108.7
• CR for 1994 is 96.2%. This means that there is a cyclical
influence (recession), as what was obtained (actual value)
is less than what was estimated (trend).
• CR for 1995 is 106.4%, meaning there is a recovery.
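The cyclical relatives in the table are simply the actual values expressed as a percentage of trend. A minimal Python sketch using the Example 1 data (the variable names are chosen here for illustration):

```python
years = list(range(1994, 2003))
actual = [700, 850, 900, 1100, 1225, 1324, 1500, 1675, 1700]   # Y
trend = [728, 799, 972, 1078, 1225, 1347, 1485, 1682, 1564]    # T

# Cyclical relative = (Y / T) x 100, rounded to one decimal place.
cyclical_relative = [round(100 * y / t, 1) for y, t in zip(actual, trend)]

for year, cr in zip(years, cyclical_relative):
    position = "above trend" if cr > 100 else "below trend" if cr < 100 else "on trend"
    print(year, cr, position)
```

Values below 100 mark years in which actual production fell short of the trend estimate, and values above 100 mark years in which it exceeded the trend.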
Slide 9
Activity 1
• The table below shows the actual (Y), Trend (T) estimate and the
cyclical relative (C) of wheat production in a country. Calculate the
missing values. Interpret your results for 1998, 1999 and 2005.
Slide 13
Activity 1
• The data below shows quarterly export of a cash crop from a
country from 1995 – 2000 in million metric tons.
I II III IV Total
1995 5 8 9.5 26 48.5
1996 6.5 8.7 10.5 27 52.7
1997 6.8 9 12.1 26.5 54.4
1998 7.5 9.5 12 28 57
1999 7.8 10.5 12.5 29.5 60.3
2000 9.5 11.5 13.5 30 64.5
Question 1
a. Plot the data for all the years
b. Which component of time series is visibly present
c. Calculate the seasonal indices
d. Estimate the annual trend line using the method of least squares
Slide 14
Activity 1 Cont’d
e. Forecast the sales value for the year 2008
f. Forecast the sales value for the first and fourth quarters of
year 2008.
Question 2
i. Estimate the trend line using the method of moving
averages
ii. Forecast the sales value for the year 2008
iii. Forecast the sales value for the first and fourth quarters
of year 2008
iv. Compare the forecast annual and quarterly values for
2008 in question 1 to that of question 2.
Slide 15
Exponential Smoothing
• Forecasting method that applies unequal weights to
time series observations.
• Weighting is done by using a smoothing constant
which determines how much
weight is attached to each observation.
• Recent observations are given greater weights,
while more distant observations are given
successively smaller weights.
Slide 16
Exponential Smoothing Cont’d
• Given a smoothing constant, α, the forecast formula is:
Slide 18
Challenge:
• How do you choose α?.
• When α is close to zero, the previous forecast
error is given a low weight.
• If α is chosen close to 1, then the previous forecast error is
heavily weighted, so the forecast is dominated by the most recent result.
• Limitation: Used for only one period forecast
because you need the actual value before you
can calculate the next forecast.
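The effect of α can be seen in a small Python sketch. It assumes the usual form of simple exponential smoothing, F(t+1) = αY(t) + (1 − α)F(t), which is equivalent to F(t+1) = F(t) + α[Y(t) − F(t)]. The sales figures are hypothetical, and starting the recursion with the first observation as the first forecast is also an assumption.

```python
def exponential_smoothing(actuals, alpha, initial_forecast=None):
    """Return forecasts for periods 1 .. n+1 by simple exponential smoothing.

    Assumption: the first forecast equals the first actual value unless
    `initial_forecast` is supplied.
    """
    forecast = actuals[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]
    for y in actuals:
        # New forecast = old forecast + alpha * (forecast error).
        forecast = forecast + alpha * (y - forecast)
        forecasts.append(forecast)
    return forecasts

# Hypothetical monthly sales (thousands of units), for illustration only.
sales = [30, 28, 33, 31]

low_alpha = exponential_smoothing(sales, alpha=0.1)
high_alpha = exponential_smoothing(sales, alpha=0.9)

print("next-period forecast, alpha = 0.1:", round(low_alpha[-1], 3))
print("next-period forecast, alpha = 0.9:", round(high_alpha[-1], 3))
```

With a small α the forecast stays near its starting level, while with a large α it follows the latest observations closely. Only one period ahead can be forecast because each new forecast needs the latest actual value.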
Slide 19
Example 2
• A shopping mall experienced the following monthly sales
for the first four months of the year, in thousands of units.
Slide 20
Solution to Example 2
• Formula for simple exponential smoothing is given as:
• α=0.15
• The results also show that the forecast for May is 29,220.
Slide 22
Activity 2
• The table below shows the monthly export of cocoa for
the first five months of the year in thousand metric tons.
Slide 1
Outline
• Construction of the Paasche index
• Construction of the Fisher ideal index
Slide 2
What is the Paasche Price Index?
• An aggregated index that measures the percentage ratio of the
prices of a group of commodities comparing two time periods -
base year and the current year. However, the prices of the
commodities are weighted by the current year quantities.
Mathematically, the Paasche price index is
defined as:
and
Slide 3
Example 3
The table below shows the average annual prices (in cedis) and
the quantities (in hectoliters) of oils produced in a
country.
Slide 4
Example 3
2002 2003 2004 2005 2006
Items Price Quantity Price Quantity Price Quantity Price Quantity Price Quantity
Palm oil 25 11200 27.50 11424 27.78 11652 30.55 12002 35.14 13442
Groundnut oil 30 11300 36.00 11752 36.72 12222 44.06 12711 52.88 14363
Shea butter 15 11100 15.75 11433 16.54 11776 17.36 12129 21.71 12493
Coconut oil 11 14000 15.40 14280 16.02 14566 22.42 14857 33.63 15154
Palm kernel oil 9 12700 9.45 13081 9.92 13473 11.41 13878 15.40 14294
Animal fat 8 12300 10.40 12915 13.52 13561 17.58 14103 22.85 14808
Slide 5
Solution to Example 3
• A table for 2003 will look like ...
Items p2003 q 2003 p2003q2003 q2002 p2003 q2002
Slide 6
Solution to Example 3
• a. Recall:
Slide 7
Solution to Example 3 Cont’d
• For 2004
Items             p2004   q2004    p2004 q2004     q2002    p2004 q2002
Palm oil          27.78   11,652   323,692.56      11,200   311,136
Groundnut oil     36.72   12,222   448,791.84      11,300   414,936
Shea butter       16.54   11,776   194,775.04      11,100   183,594
Coconut oil       16.02   14,566   233,347.32      14,000   224,280
Palm kernel oil   9.92    13,473   133,652.16      12,700   125,984
Animal fat        13.52   13,561   183,344.72      12,300   166,296
Total                              1,517,603.64             1,426,226
• We calculate the index for 2004 as:
Slide 8
Solution to Example 3 Cont’d
• For the year 2005
Items             p2005   q2005    p2005 q2005     q2002    p2005 q2002
Palm oil          30.55   12,002   366,661.10      11,200   342,160
Groundnut oil     44.06   12,711   560,046.66      11,300   497,878
Shea butter       17.36   12,129   210,559.44      11,100   192,696
Coconut oil       22.42   14,857   333,093.94      14,000   313,880
Palm kernel oil   11.41   13,878   158,347.98      12,700   144,907
Animal fat        17.58   14,103   247,930.74      12,300   216,234
Total                              1,876,639.86             1,707,755
We compute the index for 2005 as
Slide 9
Solution to Example 3 Cont’d
• 2006
Items             p2006   q2006    p2006 q2006     q2002    p2006 q2002
Palm oil          35.14   13,442   472,351.88      11,200   393,568
Groundnut oil     52.88   14,363   759,515.44      11,300   597,544
Shea butter       21.71   12,493   271,223.03      11,100   240,981
Coconut oil       33.63   15,154   509,629.02      14,000   470,820
Palm kernel oil   15.40   14,294   220,127.60      12,700   195,580
Animal fat        22.85   14,808   338,362.80      12,300   281,055
Total                              2,571,209.77             2,179,548
Slide 10
Solution to Example 3 Cont’d
• The table summarizes the Paasche quantity index.
Slide 11
Solution to Example 3 Cont’d
The Paasche price index is computed as:
• 2004:
• 2005:
• 2006:
Slide 13
Solution to Example 3 Cont’d
• The table summarizes the results
Year                  2002   2003     2004     2005     2006
Paasche price index   100    117.46   123.93   148.41   187.93
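The summary figures can be reproduced from the Example 3 data with a short Python sketch. It assumes the standard definition of the Paasche price index, current-year prices and base-year prices both weighted by current-year quantities, with 2002 as the base year. The function name `paasche_price_index` is made up for illustration.

```python
# Order of items: palm oil, groundnut oil, shea butter, coconut oil,
# palm kernel oil, animal fat.
prices = {
    2002: [25, 30, 15, 11, 9, 8],
    2003: [27.50, 36.00, 15.75, 15.40, 9.45, 10.40],
    2004: [27.78, 36.72, 16.54, 16.02, 9.92, 13.52],
    2005: [30.55, 44.06, 17.36, 22.42, 11.41, 17.58],
    2006: [35.14, 52.88, 21.71, 33.63, 15.40, 22.85],
}
quantities = {
    2002: [11200, 11300, 11100, 14000, 12700, 12300],
    2003: [11424, 11752, 11433, 14280, 13081, 12915],
    2004: [11652, 12222, 11776, 14566, 13473, 13561],
    2005: [12002, 12711, 12129, 14857, 13878, 14103],
    2006: [13442, 14363, 12493, 15154, 14294, 14808],
}

def paasche_price_index(year, base=2002):
    """Sum(p_n * q_n) / Sum(p_0 * q_n) x 100, weighting by current quantities."""
    current_value = sum(p * q for p, q in zip(prices[year], quantities[year]))
    base_priced = sum(p * q for p, q in zip(prices[base], quantities[year]))
    return 100 * current_value / base_priced

for year in sorted(prices):
    print(year, round(paasche_price_index(year), 2))
```

Because the weights are current-year quantities, the denominator has to be recomputed for every year, unlike the Laspeyres index where the base-year basket is fixed.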
Slide 15
Solution to Example 3 Cont’d
• Putting the summations into the Paasche formula, we obtain:
2003:
2004:
2005:
2006:
Slide 16
Solution to Example 3 Cont’d
• The table summarizes the results
• For the year 2006, the Paasche value index was 223.16. This
shows a net increase in value of about 123.16% compared
to the base year.
Slide 17
Activity 3
• The data below shows the prices (in cedis) and imports (in million
barrels) of some fuels from 2003-2007.
2003 2004 2005 2006 2007
Items Price Quantity Price Quantity Price Quantity Price Quantity Price Quantity
Gasoline 20 2220 22.00 2264.4 22.22 2310 24.44 2379 28.11 2664
Diesel 19 2730 22.80 2839.2 23.26 2953 27.91 3071 33.49 3470
RFO 12 1210 12.60 1246.3 13.23 1284 13.89 1322 17.36 1362
LCO 8 1400 11.20 1428 11.65 1457 16.31 1486 24.46 1515
Kerosene 14 1210 14.70 1246.3 15.44 1284 17.75 1322 23.96 1362
Premix 15 900 21.00 918 21.84 936 30.58 955 45.86 974
Slide 19
Comparison of Laspeyres & Paasche Price
Indices
• The Laspeyres price index (LPI) measures change in a fixed
market basket of goods and services (same quantities used
in each period), while the Paasche index continually updates
the quantities to the levels of current consumption.
• LPI is generally expected to overestimate, that is, to have an upward
bias.
- All things being equal, when prices increase there is usually
a reduction in the consumption of those items for which the
increase has been the most pronounced. Hence, by using
base year quantities we give too much weight to the
items whose prices have increased the most.
Slide 20
Comparison of Laspeyres & Paasche
Price Indices
• The Paasche price index (PPI) tends to underestimate, that is, it
shows a downward bias.
Slide 21
Comparison of Laspeyres & Paasche Indices
• The
=
Or
=
• These formulas help us to derive the quantity indices from the value
indices. This process is known as deflating a series.
and
Slide 22
The Fisher Ideal Price Index
Slide 23
Example 1
The table below shows the prices and quantities of 4
commodities in 1988 and 1989.
Commodity 1988 1989
A 2 8 4 6
B 5 10 6 5
C 4 14 5 10
D 2 19 2 13
Slide 25
Activity 1
• The data below show the prices (in cedis) and imports (in million
barrels) of some fuels from 2003-2007.
2003 2004 2005 2006 2007
Items Price Quantity Price Quantity Price Quantity Price Quantity Price Quantity
Gasoline 20 2220 22.00 2264.4 22.22 2310 24.44 2379 28.11 2664
Diesel 19 2730 22.80 2839.2 23.26 2953 27.91 3071 33.49 3470
RFO 12 1210 12.60 1246.3 13.23 1284 13.89 1322 17.36 1362
LCO 8 1400 11.20 1428 11.65 1457 16.31 1486 24.46 1515
Kerosene 14 1210 14.70 1246.3 15.44 1284 17.75 1322 23.96 1362
Premix 15 900 21.00 918 21.84 936 30.58 955 45.86 974
1. Using 2003 as the base year calculate the
i. Laspeyres price Index
ii. Paasche price Index and hence
iii. Fisher’s Ideal Index
2. Interpret your results for 2006 in i, ii, and iii.
Slide 26
The Consumer Price Index
Slide 27
Formula for CPI
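$$ CPI = \frac{\sum_{i=1}^{r} p_{ni}\, q_{0i}}{\sum_{i=1}^{r} p_{0i}\, q_{0i}} \times 100 $$
• Let $w_{0i} = p_{0i} q_{0i} / \sum_{i=1}^{r} p_{0i} q_{0i}$ be the base-year expenditure shares of commodities,
• then
$$ CPI = \sum_{i=1}^{r} \frac{p_{ni}}{p_{0i}} \times w_{0i} \times 100 $$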
Slide 28
Example 2
• The table below shows the prices (in cedis) and
quantities (in kilograms) of the basket of goods used in
the calculation of CPI in a country.
Slide 29
Example 2
2003 2004 2005 2006 2007
Items Price Qty Price Price Price Price
Cereals 15 22 17 17 18 21
Tubers 6 27 7 7 9 11
Plantain 8 12 8 9 9 12
Meat 11 14 15 16 22 34
Fruits 21 12 22 23 27 36
Vegetables 9 9 13 13 18 28
pulses & nuts 21 22 23 23 26 30
Beverages 12 27 14 15 18 21
Eggs 5 14 7 7 10 15
Milk 9 12 9 10 11 15
Poultry 15 10 21 22 31 46
Slide 30
Solution to Example 2
• Given . The calculations are presented in the table
b. The CPI for 2007 is 188%. This means that, on average, the prices of the
basket of goods have gone up by 88% compared to the base year's prices.
Slide 32
Uses of the CPI- Calculating Inflation
• Inflation is a persistent increase in the general price level,
while deflation is a persistent decrease in the general
price level.
• Mathematically, inflation for year t is computed as:
, where
1986:
(35/100) × 145 = 50.75
(10/100) × 23 = 2.3
(20/100) × 65 = 13
(15/100) × 30 = 4.5
(20/100) × 45 = 9
Total = 79.55
Slide 35
Solution cont’d
$CPI_{1985} = \frac{82.5}{82.5} \times 100 = 100\%$
$CPI_{1986} = \frac{79.55}{82.5} \times 100 = 96.42\%$
Interpretation: Compared to 1985, the expenditure
of middle class families went down by 3.58
percent in 1986.
Slide 36
Uses of the CPI-inflation calculation cont’d
Slide 37
Solution to Example 3
• Inflation is computed for 2006 as:
Slide 39
Uses of CPI – Deflating -Example 4
The CPI of a country is 300 % in 2005 (2000=100). If an
employee earned ¢ 7000 in 2000 and ¢ 21,000 in 2005
a. Calculate his real income in 2005
b. Comment on the welfare of the employee
Slide 40
Solution to Example 4
• Recall :
• Thus
Slide 42
Uses of Index Numbers – purchasing
power : Example 5
• The CPI for 2005 is 200 (1987 =100). Calculate the Purchasing
Power of the cedi in 2005 and interpret your results.
Solution
Recall :
• Thus,
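The three uses of the CPI discussed in this session can be sketched in Python. The formulas below are the standard textbook definitions and are assumed here, since the slide formulas are not reproduced in the text; the function names are illustrative.

```python
def inflation_rate(cpi_current, cpi_previous):
    """Percentage change in the CPI between two consecutive periods."""
    return (cpi_current - cpi_previous) / cpi_previous * 100

def real_income(nominal_income, cpi):
    """Deflate a nominal income by the CPI (base period = 100)."""
    return nominal_income / cpi * 100

def purchasing_power(cpi):
    """Value of one currency unit relative to the base period."""
    return 100 / cpi

# Example 4: CPI of 300 in 2005 (2000 = 100), nominal income of 21,000.
print(real_income(21000, 300))       # 7000.0

# Example 5: CPI of 200 in 2005 (1987 = 100).
print(purchasing_power(200))         # 0.5

# CPI falling from 100 (1985) to 96.42 (1986), as in the solution above.
print(round(inflation_rate(96.42, 100), 2))
```

In Example 4 the real income of 7,000 equals what the employee earned in 2000, so under these definitions the pay rise only kept pace with prices. In Example 5 one cedi in 2005 buys half of what it bought in 1987.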
Slide 1
What is a simple linear regression model
• Regression analysis studies the nature of the relationship
between variables and thereby provides a mechanism for
prediction
• The simplest bivariate econometric model is the linear
model, which assumes that the relationship between two
variables can be summarized by a straight line of the
form:
(2)
Slide 7
Derivation of Least Squares
Parameters
• Simplifying and expanding Equations (1) & (2), we obtain:
• Or
• ……….. (3)
• Since
• ………(4)
Slide 8
Derivation of Least Squares
Parameters
Solve Equations (3) and (4) simultaneously using Cramer’s rule
.
Slide 9
Derivation of Least Squares
Parameters
• The special determinants of a and b are given as:
Slide 10
Deriving the Least Squares Formula
• We obtain the values of a and b as follows:
Slide 12
Deriving the Least Squares Formula
• Consider:
Slide 13
Deriving the Least Squares Formula
• Or
But and
• Now that b is known, we can put it in Eqn (i) and make a the
subject as follows:
Slide 15
Deriving the Least Squares Formula
• Summarizing the formulae, we have;
• and
Slide 16
Example 1
The table below shows the price and the quantity
supply of a commodity.
a. Find the regression of quantity supply (Y) on
price (X)
b. Plot the actual points and the regression line on
the same graph
Price(₵)   Qty dd (kg)
61         105
62         120
63         120
65         160
65         120
68         145
69         175
70         160
72         185
75         210
Slide 17
Solution to Example 1
• We need to find the values of a and b by using the least
squares formula:
• and
Slide 18
Solution to Example 1 Cont’d
Price (X) Quantity X2 YX
supply (Y)
• Thus, the regression equation of quantity supply (Y) on price (X) is:
Quantity supply = -316.86+6.97Price
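The estimates can be checked with a short Python sketch that applies the least squares formulas to the price and quantity data from Example 1. The function name `fit_line` is chosen here for illustration.

```python
price = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]               # X
quantity = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]  # Y

def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

a, b = fit_line(price, quantity)
print(f"Quantity supply = {a:.2f} + {b:.2f} Price")
```

The slope of about 6.97 means that each one-cedi rise in price is associated with roughly 7 kg more quantity supplied.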
Slide 20
Solution to Example 1 Cont’d
b. The regression line has been plotted on a graph below
Slide 21
Example 2
• The regression equation as calculated in Example 1 is
Slide 22
Solution to Example 2
We are required to find the quantity supplied at a given price.
We put the value X = 90 into the
equation and evaluate. That is:
Slide 23
Solution to Example 2 Cont’d
• We are required to find the price if quantity supplied is
300kg.
• We just have to put the value Y=300 into the equation and
evaluate
• This gives:
Sales (kg)   840   760  700   800  880  760   900   840  880  760
Price (₵)    8.25  9    9.75  9    7.5  9.75  6.75  7.5  7.5  9.75
a. Plot these data, and estimate the linear regression of sales on price.
b. Estimate the value of sales when price is 10.50.
Slide 25
Slide 1
Partitioning the Sum of Squares
• Recall the linear model is composed
of two parts:
1. A linear function of X -
2. An error component -
Let us illustrate this with our regression equation
Slide 6
• Graph: Shows plot of actual observation and the
regression line.
• Any actual observation of Y can be decomposed into
the linear or regression part and the error part.
• Consider the actual point (65, 120).
- Deviation of Y from the Mean (150) is equal to
120-150=-30.
- This value (-30) can be decomposed into the linear
and error parts.
Slide 7
Partitioning the Sum of Squares
• To do this, we have to find the estimated value of Y when
X=65.
Slide 8
Partitioning the Sum of Squares
The table below summarizes the relation between the actual
value and the deviations as:
Observed/actual   Mean   Deviation from mean due to regression   Error
Y                 Ȳ      (Y′ − Ȳ)                                (Y − Y′)
This means
or (1)
• Squaring and summing both sides of equation (1), we
obtain:
Slide 9
Partitioning the Sum of Squares
• Expanding, we obtain:
(2)
• Dividing through by :
Slide 11
Partitioning the Sum of Squares
This is equivalent to
Slide 12
Example 1
Consider our example
involving quantity supply
and price. Calculate the
regression and error sum
of squares from the data.
Slide 13
Slide 14
Solution to Example 1 Cont’d
• The total sum of squares for Y (10400) is equal, up to rounding, to the
regression sum of squares (9129.31) plus the error sum of squares
(1271.91).
• or
Slide 15
Activity 1
• The table below shows the weight (x) and height (y)
of a sample of students
x   y
2   2
6   4
4   4
3   2
7   5
7   6
0   2
i. Find the regression of y on x
ii. Find the total sum of squares (TSS), error sum of squares (ESS)
and the regression sum of squares (RSS)
iii. Show that the proportions of ESS and RSS to the TSS
sum to 1
Slide 16
Coefficient of Determination
• The method of least squares reduces the error but does not
eliminate it
• Variability in the dependent variable is measured by the
sum of the squared deviations of Y around the mean. i.e
TSS = RSS + ESS or
• .
Slide 17
Coefficient of Determination Cont’d
• The coefficient of determination is a number between
0 and 1.
Slide 18
Example 2
• Consider Example 1. Calculate the coefficient of
determination and interpret your results.
SOLUTION
• From Example 1, TSS is 10400, RSS is 9129.31 and ESS
is 1271.91.
Slide 19
Solution to Example 2 Cont’d
• Alternatively, given the ESS,
Or
(using the regression sum of squares)
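The partition of the sum of squares, the coefficient of determination and the F statistic can all be computed from the Example 1 data. The sketch below keeps this session's labels (RSS for the regression sum of squares, ESS for the error sum of squares). Because it uses unrounded coefficients, its RSS and ESS differ slightly from the slide figures of 9129.31 and 1271.91.

```python
price = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]               # X
quantity = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]  # Y

n = len(price)
mean_x = sum(price) / n
mean_y = sum(quantity) / n

# Least squares estimates in deviation form.
sxx = sum((x - mean_x) ** 2 for x in price)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, quantity))
b = sxy / sxx
a = mean_y - b * mean_x
fitted = [a + b * x for x in price]

tss = sum((y - mean_y) ** 2 for y in quantity)                 # total
rss = sum((f - mean_y) ** 2 for f in fitted)                   # regression
ess = sum((y - f) ** 2 for y, f in zip(quantity, fitted))      # error

r_squared = rss / tss
f_statistic = rss / (ess / (n - 2))     # F with 1 and n-2 degrees of freedom

print(round(tss, 2), round(rss, 2), round(ess, 2))
print(round(r_squared, 3), round(f_statistic, 2))
```

About 88% of the variation in quantity supplied is explained by price, and the F value agrees with the 57.42 quoted in the hypothesis test for this regression.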
Slide 23
Hypothesis Testing
Step 3: Define the decision rule. If the calculated F is
greater than the critical F, then we reject the null hypothesis.
Rejecting the null hypothesis will imply that the sample
slope is not zero. It also implies that the regression that was
done is significant and that it best explains the relationship
between weight and height.
Slide 25
Solution to Example 3 Cont’d
Step 2: Select a test statistic and calculate the value of the
test statistic
• In testing for the regression, the test statistic is the
F-statistic with 1 and n-2 degrees of freedom given as:
where n is the sample size
Slide 26
Solution to Example 3 Cont’d
• Step 3: Define the decision rule
If the calculated F is greater than the critical F then we
reject the null hypothesis.
• Step 4: Compare the calculated F with the critical values
from the statistical table and make your conclusion.
• The calculated F is 57.42 whilst the critical
Slide 29
Slide 1
Session Overview
By the end of this section, you should be able to:
• Define correlation analysis
• State the objectives and importance of correlation
analysis
• Explain the types of correlation
• Give examples of correlation between two variables
• Find the correlation between two variables using the
scatter diagram and the graphical methods
Slide 2
What is Correlation Analysis?
• .
Slide 4
Objectives of Studying Correlation
Slide 7
Positive Correlation and Negative
Correlation
• Positive Correlation
─ Both variables change in the same direction.
─ As one variable increases (decreases), the other
variable also increases (decreases).
• Negative Correlation
─ Both variables change in opposite directions.
─ As one variable increases (decreases), the other
variable decreases (increases).
Slide 8
Example 1.0
The data below show the volume (X) and value
(Y) of exports of cocoa in a country. Describe the type of
correlation between the variables.
X 5 7 9 11 13 15
Y 20 25 29 32 34.5 36
Slide 9
Solution to Example 1
• Let's plot them on the
same graph.
[Figure: scatter plot titled "Positive Correlation", Y against X]
• As the volume increases
the value increases and
vice versa.
• There is a positive
correlation between the
value and volume of
cocoa exports.
Slide 10
Example 2.0
The data below shows the demand (X) for a commodity and
the price (Y). Describe the type of correlation between the
variables.
X 22 30 33 36 40 42
(Demand)
Y (Price) 12 10 9.5 8 6 2
Slide 11
[Figure: scatter plot titled "Negative Correlation", Y against X]
• As demand increases the
price decreases and vice
versa.
• There is a negative
correlation between
demand and price of the
product.
Slide 12
Simple, Multiple and Partial Correlation
• Simple Correlation
─ Observed between only two variables.
─ Example observing the height of students alongside the
weight.
• Multiple Correlation
—Involve three or more variables.
—Example, when the quantity supplied of a commodity is
observed alongside the price, level of technology, tax structure
, etc..
—4 variables – quantity, price, technology and tax and therefore a
multiple correlation.
• Partial Correlation
─ More than two variables are observed but only two are
considered for correlation as the effects of the other variables
are held constant.
Slide 13
Linear and Non-Linear Correlation
• Linear Correlation
─ Situation whereby there is a constant change in the
variables.
─ A straight line can be drawn to pass through some or
all of the points on a graph sheet.
• Non-linear (Curvilinear) Correlation
—Situation where the level of change in one variable
does not show a constant level of change in the other
variable.
—In practice, most relationships between variables are
non-linear or curvilinear in nature.
Slide 14
Example 3.0
The table below shows the number of hours a student spends
in a week (X) on his book and his average score (Y) in a
weekly quiz. Establish the relationship between the number
of hours spent on books per week and the weekly score.
X 20 40 60 80 100
Slide 15
Solution to Example 3.0
• X values increase by 20
whilst the Y values
increase by 50.
• There is a uniform
change in the X values
[Figure: line chart titled "Linear Correlation", Y against X]
Slide 16
Methods of Studying Correlation-Scatter
Diagram
─ Plot the bivariate data on graph paper and determine
the degree of correlation with the help of the spread of
the points.
─ Perfectly positive correlation: All the points lie on a
straight line with a positive gradient.
─ Perfectly negative correlation: All the points lie on a
straight line with a negative gradient.
─ Depending on the degree of spread of the point, there
can be a high or low degree of positive or negative
correlation.
Slide 17
Methods of Studying Correlation-Scatter
Diagramme cont’d
Figure 1.0: Perfectly negative Figure 2.0: Perfectly Positive
correlation Correlation
Slide 18
Advantages and Disadvantages of the Scatter
Diagram
• Advantages
─ Simple and non-mathematical, thus making it easy
to understand.
─ The method is also not influenced by the size of
extreme values, whereas most mathematical
methods are influenced by extreme values.
• Disadvantages
─ Exact degree of relationship between the two
variables cannot be determined. What we obtain
is a rough description of correlation
Slide 19
Methods of Studying Correlation: Graphical
Year X Y
2000 10 15
2001 20 34
2002 27 48.6
2003 35 52.5
2004 40 64
Slide 21
Solution to Example 4
• Both production values are
plotted on the y-axis as
shown in the following
diagram.
• A positive relationship can
be observed between income
(X) and expenditure (Y).
• Why? The graphs for income and
expenditure rise in
the same direction.
Slide 22
Example 5
The table below shows the export (X) and import (Y)
values of a country from 1995 to 1999 in million US
dollars. Using the graphical method, describe the
correlation between exports and imports.
Year X Y
1995 100 300
1996 200 350
1997 270 300
1998 350 200
1999 400 100
Slide 23
Solution to Example 5
• Both values are
plotted on the y-axis as
shown in the following
diagram.
• Generally there is a negative
relationship between exports
(X) and imports (Y).
• Why? The graphs of exports and
imports move in
opposite directions: whereas
exports are increasing, imports
are decreasing.
Slide 24
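$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} $$
where $\bar{X}$ and $\bar{Y}$ are the means of the X and Y observations.
[Computation table columns: X, Y, XY, X², Y², with totals ΣX, ΣY, ΣXY, ΣX², ΣY²]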
Slide 4
Interpretation of Pearson Correlation Coefficient
Disadvantages
• It is applied mostly in situations where the relation is
linear.
• The mathematical computation and the technical
interpretation sometime pose difficulty.
• It is affected by extreme values.
Slide 6
Example 1
        X    Y    XY    X²    Y²
A       7    15   105   49    225
B       9    6    54    81    36
C       5    7    35    25    49
D       11   10   110   121   100
E       2    8    16    4     64
F       4    7    28    16    49
G       4    12   48    16    144
H       3    11   33    9     121
I       8    9    72    64    81
J       11   8    88    121   64
Total   64   93   589   506   933
ΣX = 64, ΣY = 93, ΣXY = 589, ΣX² = 506, ΣY² = 933, n = 10
Slide 9
Pearson Correlation (Product Moment) Coefficient
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}} = \frac{10(589) - 64(93)}{\sqrt{[10(506) - (64)^2][10(933) - (93)^2]}} = \frac{-62}{\sqrt{(964)(681)}} = -0.07652 $$
• C. There is a weak negative correlation between sales and
cost
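The computation above can be reproduced in Python from the X and Y columns of the Example 1 table. This is a sketch using only the standard library.

```python
import math

x = [7, 9, 5, 11, 2, 4, 4, 3, 8, 11]
y = [15, 6, 7, 10, 8, 7, 12, 11, 9, 8]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi ** 2 for xi in x)
sum_yy = sum(yi ** 2 for yi in y)

# Pearson product moment correlation coefficient (computational form).
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
r = numerator / denominator

print(round(r, 5))
```

A value this close to zero indicates almost no linear association between the two variables.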
Slide 10
Spearman’s Rank Correlation Coefficient
Slide 13
Example 2
• The following are two sets of ranks assigned by two
Professors of management to ten styles of business
leadership.
• The first, second and third rows shows the styles, rank by
the first Professor and rank by the second Professor
respectively.
• The next step is to find the difference between the ranks
which has been presented as row 4.
Style         A    B    C    D    E    F    G    H    I
r1            10   1    2    9    8    6    7    5    4
r2            6    9    3    10   2    7    5    1    8
d = r1 - r2   4    -8   -1   -1   6    -1   2    4    -4
d²            16   64   1    1    36   1    4    16   16     Σd² = 155
Slide 15
Solution to Example 2
$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(155)}{9(9^2 - 1)} = -0.29167 $$
b. Interpretation: There is a weak negative correlation
between the rankings by the two professors.
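The rank correlation can be checked with a few lines of Python using the nine pairs of ranks tabulated in Example 2:

```python
r1 = [10, 1, 2, 9, 8, 6, 7, 5, 4]    # ranks by the first professor
r2 = [6, 9, 3, 10, 2, 7, 5, 1, 8]    # ranks by the second professor

n = len(r1)
d_squared = [(a - b) ** 2 for a, b in zip(r1, r2)]

# Spearman's rank correlation coefficient.
rs = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))

print(sum(d_squared), round(rs, 5))
```

The sketch uses n = 9 because nine styles appear in the table, which matches the substitution made in the solution.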
Slide 16
Testing the Significance of the Correlation
Coefficient
• We usually do not have information on the population so
we always use the sample to make estimates of correlation
• Population correlation coefficient (ρ) is the correlation
coefficient calculated by examining all the possible
• Sample correlation coefficient (r) is obtained by examining
only a sample and is the best estimator of the population
correlation coefficient if the two variables are random
variables, linearly related and normally distributed.
• From the sample results can I make any inference about
the population correlation coefficient? Sample values can
vary based on the sample chosen
Slide 17
Testing the Significance of the Correlation
Coefficient cont’d
• To answer the preceding question, we need to use the
sample correlation coefficient to test the hypothesis that
there is no linear relationship between the two variables in
the population.
• Four steps are involved:
Step 1: State your null and alternative hypothesis.
H0: ρ = 0
H1: ρ ≠ 0
Slide 19
Testing the Significance of the Correlation
Coefficient cont’d
• Step 3: Construct your decision rule. Reject H0
when the absolute value of the computed t is greater than the critical
t value at a particular level of significance. This means
there is a significant difference between the value of
ρ and 0, hence there is correlation.
When we fail to reject H0 at the significance
level, it means the value of ρ is not different from
0 and the observed correlation is probably due to chance.
• Step 4: Compare the calculated values with the
critical values and conclude. The critical value of t is
read from the t-distribution table
Slide 20
Example 3
Slide 21
Solution to Example 3
• Step 1: State your null and alternative hypothesis.
H0: ρ = 0
H1: ρ ≠ 0
• Step 2: Select a test statistic and calculate its value:
$$ t = r\sqrt{\frac{n-2}{1-r^2}} = -0.871\sqrt{\frac{10-2}{1-(-0.871)^2}} = -0.871\sqrt{\frac{8}{0.2414}} = -5.014 $$
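The test statistic can be recomputed in Python. The 5% two-tailed significance level and its critical value of 2.306 for 8 degrees of freedom (from a standard t table) are assumptions for illustration, since the significance level for this example is not stated in the text above.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = -0.871, 10
t = correlation_t_statistic(r, n)

# Assumed: 5% two-tailed test, 8 degrees of freedom (standard t table).
critical_t = 2.306
reject_h0 = abs(t) > critical_t

print(round(t, 2), reject_h0)
```

The unrounded calculation gives a t value of about -5.01. The slide's -5.014 rounds 1 - r² to 0.2414 before taking the square root.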
Slide 1
Concept of ANOVA
• Analysis of Variance (ANOVA) is the technique of using
the F test to test the equality of three or more
population means.
• Assuming that we have 3 different brands of cars Nissan,
Toyota and Opel, and we would like to test the null
hypothesis that the mean kilometers covered by all the 3
brands of cars are equal against the alternative that they
are not the same. ANOVA technique is used in such a
case.
• ANOVA technique helps us to examine if a significant
difference exists between the kilometers covered by the 3
car brands or any difference observed is due to chance.
Slide 2
Advantages of Using ANOVA technique
Compared to the z or t test, the ANOVA technique is better suited to testing
the equality of 3 or more means for the following reasons:
1. Using the z or t test to compare several means implies
that only two means can be compared at a time
whilst the rest of the means are ignored. The F test used in
ANOVA compares several means simultaneously.
2. Making repeated pairwise comparisons with z or t tests increases the
probability of rejecting the null hypothesis when it is true (a Type I error).
3. The more means that have to be compared, the more z or t
tests are required. This is a laborious task that can be
done in one go using the ANOVA technique.
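Reasons 2 and 3 can be made concrete with a small Python sketch. It counts the pairwise tests needed for k means and estimates the chance of at least one false rejection across them. The formula 1 − (1 − α)^m assumes the m tests are independent, which pairwise tests sharing samples are not, so the figures are only illustrative. The function names are hypothetical.

```python
from math import comb

def pairwise_tests_needed(k: int) -> int:
    """Number of distinct pairs among k means: k choose 2."""
    return comb(k, 2)

def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type I error in m tests.

    Assumption: the m tests are independent, each run at level alpha.
    """
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    m = pairwise_tests_needed(k)
    print(f"k = {k}: {m} pairwise tests, "
          f"approximate overall Type I error = {familywise_error_rate(m):.3f}")
```

Under this assumption the overall error rate is well above the nominal 5% even for 3 or 4 means, whereas a single F test keeps it at α.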
Slide 3
Assumptions Underlying the ANOVA Technique
Slide 4
One-Way ANOVA
• Involves the comparison of three or more population
means involving only one variable.
• The F test is used in making such comparisons since variances
rather than means are compared.
- This explains why the technique is called Analysis of
Variance (ANOVA).
• Example: Suppose I want to test the hypothesis of
equality of the mean grades of 5 study groups in level 300.
- Variable: mean grade obtained.
- Three or more groups involved: 5 study groups.
Slide 5
Procedure for Hypothesis Testing
Suppose we want to compare the means of k groups,
usually called treatments. Independent samples have
to be taken from the populations.
STEP 1: Statement of Null and Alternative Hypothesis
• H0: μ1 = μ2 = … = μk
• H1: μi ≠ μj for some i ≠ j
• H0: The k population means are equal, given the independent
random samples from the populations.
• H1: For at least one pair, the population means are
different.
Slide 6
Procedure for Hypothesis Testing
STEP 2: Determine the Choice of Statistical Test to
Use
• We are comparing several population means involving one
variable, and therefore the F statistic will be
used.
Slide 9
Computation of F-Test Value
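• Consider the sum of squared deviations from the overall
mean, called the Total Sum of Squares (SST).
• Mathematically, with $x_{ij}$ the jth observation in the ith treatment,
$n_i$ the size of the ith sample and $\bar{x}$ the overall mean:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2$$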
Slide 10
Computation of F-Test Value cont’d
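$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[(x_{ij}-\bar{x}_i)+(\bar{x}_i-\bar{x})\right]^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i-\bar{x})^2 + 2\sum_{i=1}^{k}(\bar{x}_i-\bar{x})\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i) \quad (1)$$
since $x_{ij}-\bar{x} = (x_{ij}-\bar{x}_i)+(\bar{x}_i-\bar{x})$, where $\bar{x}_i$ is the mean of the ith treatment.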
Slide 11
Computation of F-Test Value cont’d
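• Consider the second additive term in equation (1).
• Expanding:
$$\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i-\bar{x})^2 = \sum_{i=1}^{k}(\bar{x}_i-\bar{x})^2\sum_{j=1}^{n_i}1 = \sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x})^2 \quad (2)$$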
Slide 12
Computation of F-Test Value Cont’d
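Inserting equation (2) into (1), and taking into consideration that the last
term of (1) is zero because $\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)=0$ for every i,
we obtain:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2 + \sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x})^2 \quad (3)$$
The first term is the error (within-treatment) sum of squares, SSE, and the
second is the treatment (between-treatment) sum of squares, written SSTR here:
$$SSTR = \sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x})^2$$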
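The partition of the total sum of squares can be verified numerically. The sketch below uses small made-up samples of unequal size (hypothetical data, not from the slides) and computes each term from its definition, including the cross-product term that should vanish. The name SSTR for the treatment sum of squares is a label chosen here.

```python
def sums_of_squares(groups):
    """Return (SST, SSE, SSTR, cross term) computed from their definitions."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    means = [sum(g) / len(g) for g in groups]

    # Total: squared deviations of every observation from the overall mean.
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    # Error: squared deviations of observations from their own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Treatment: n_i times the squared deviation of each group mean.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Cross-product term: 2 * sum_i (mean_i - mean) * sum_j (x_ij - mean_i).
    cross = 2 * sum((m - grand_mean) * sum(x - m for x in g)
                    for g, m in zip(groups, means))
    return sst, sse, sstr, cross

# Hypothetical samples with unequal sizes, for illustration only.
groups = [[4, 6, 8], [10, 12], [1, 3, 5, 7]]
sst, sse, sstr, cross = sums_of_squares(groups)
print(f"SST = {sst:.4f}, SSE + SSTR = {sse + sstr:.4f}, cross term = {cross:.4f}")
```

Because deviations from a group's own mean always sum to zero, the cross term is zero for any data set, which is why the identity holds for unequal sample sizes as well.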
Slide 13
Computation of F-Test Value Cont’d
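• The operational formulas used are:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij}^2 - \frac{T^2}{n}, \qquad SSTR = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{T^2}{n}, \qquad SSE = SST - SSTR$$
where SSTR is the treatment sum of squares, $T_i$ is the total of the ith
treatment, $T$ is the grand total and $n$ is the total number of observations.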
Slide 14
Computation of F-Test Value Cont’d
• Calculating the mean sums of squares: this is done to estimate a common
population variance.
Slide 15
One-way ANOVA Table
Source of     Sum of     Degrees of    Mean Squares           F ratio
variation     Squares    freedom
Error         SSE        n - k         MSE = SSE / (n - k)
Total         SST        n - 1
Slide 16
Hypothesis Testing
STEP 5: Make your decision in the light of the decision
rules and summarize your results
• We compare the calculated F test value with the critical F
value which is read from the F distribution table given
that all assumptions stated in the previous section hold.
• If the F calculated value is greater than the F critical
value, then we reject the null hypothesis at the chosen
significance level.
• If the F calculated value is less than the F critical value,
then we fail to reject the null hypothesis at the chosen
significance level. This result implies that there is no
significant evidence that the population means differ.
Slide 17
Example 1
The Research Department of a Transport Company wishes
to know whether the mean mileages of 4 different brands of
tyres are the same. If the means are the same, then the
company will always go for the brand of tyres with the
lowest price. The Department therefore took the mileage of
the different tyres for 5 months. The result of their
experiment is represented below. Test whether the mean
mileages are the same if α =0.05
Slide 18
Example 1
• Mileage of 4 different brands of tyres
Month    Brand 1    Brand 2    Brand 3    Brand 4
1        80         81         76         93
2        86         82         82         90
3        83         88         84         91
4        85         83         82         93
5        81         86         84         88
Slide 19
Solution to Example 1
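STEP 1: State your null and alternative hypothesis
• H0: μ1 = μ2 = μ3 = μ4
• H1: μi ≠ μj for some i ≠ j
STEP 2: Determine the choice of statistical test to use
• One independent variable (type of tyre) and one dependent
variable (mileage covered): One-Way ANOVA.
STEP 3: Determine the decision rules and find the
critical values
• n = 20 and k = 4, so the degrees of freedom are k − 1 = 3 and n − k = 16.
At α = 0.05 the tabulated critical value is F(3, 16) = 3.24; reject H0 if the
calculated F exceeds it.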
Slide 20
Solution to Example 1 Cont’d
Table for Computations of Total and Mean Mileages
Brands    1          2          3          4
          80         81         76         93
          86         82         82         90
          83         88         84         91
          85         83         82         93
          81         86         84         88
Means     x̄1 = 83    x̄2 = 84    x̄3 = 82    x̄4 = 91
Slide 21
Solution to Example 1 Cont’d
• STEP 4: Compute the F test value and tabulate
Slide 22
Solution to Example 1 Cont’d
• First additive term = the sum of the squares of every mileage
observation across all the brands.
Slide 23
Solution to Example 1 Cont’d
Slide 24
Solution to Example 1 Cont’d
• Computations using the One-Way ANOVA
Sources of    Sum of     Degrees of     Mean sum      F ratio
variation     Squares    freedom        of Squares
Total         350        20 - 1 = 19
Slide 25
Solution to Example 1 Cont’d
STEP 5: Make your decision in the light of the decision
rules and summarize your results
• The calculated F test value is greater than the F critical
value.
• That is,
• We reject the null hypothesis at the 5% level of
significance.
• This means that the mean mileages of the four brands of
tyres are not equal. There is a difference in the means of
at least one pair of the 4 different brands of tyres.
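The five steps can be sketched in Python for the tyre data. This is a minimal illustration, not the slides' own working: the function name `one_way_anova`, the label SSTR for the treatment sum of squares, and the computational forms (sum of squared observations minus T²/n) are assumptions, and the critical value 3.24 is the tabulated F for (3, 16) degrees of freedom at α = 0.05. Run on the table exactly as printed, the code gives a brand 3 mean of 81.6 and a total sum of squares of 383.8, which differ from the 82 and 350 shown on the slides. The decision to reject H0 is the same either way.

```python
def one_way_anova(groups):
    """One-way ANOVA quantities from a list of samples (one per treatment)."""
    k = len(groups)                               # number of treatments
    n = sum(len(g) for g in groups)               # total number of observations
    grand_total = sum(sum(g) for g in groups)     # T
    correction = grand_total ** 2 / n             # T^2 / n
    sst = sum(x * x for g in groups for x in g) - correction
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = sst - sstr
    mstr = sstr / (k - 1)                         # mean square for treatments
    mse = sse / (n - k)                           # mean square for error
    return {"SST": sst, "SSTR": sstr, "SSE": sse,
            "df": (k - 1, n - k), "MSTR": mstr, "MSE": mse, "F": mstr / mse}

# Mileages by brand, copied from the Example 1 table as printed.
tyres = {
    1: [80, 86, 83, 85, 81],
    2: [81, 82, 88, 83, 86],
    3: [76, 82, 84, 82, 84],
    4: [93, 90, 91, 93, 88],
}
result = one_way_anova(list(tyres.values()))

F_CRITICAL = 3.24  # assumption: tabulated F(0.05; 3, 16)
reject_h0 = result["F"] > F_CRITICAL

for name in ("SSTR", "SSE", "SST", "MSTR", "MSE", "F"):
    print(f"{name} = {result[name]:.3f}")
print("degrees of freedom:", result["df"], "| reject H0:", reject_h0)
```

The same function accepts samples of unequal size, since each treatment total is divided by its own sample size.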
Slide 26
Activity 1
A researcher wanted to investigate the impact of three forms of diet on
the weight of students in the University of Ghana. Students were
therefore randomly assigned to take the three different kinds of diet. They
were placed on the diet for 8 weeks. The table below shows the weight
gains (in kg) of the students in the 3 different diets.
Weight gains from 3 different diets
Diet I  Diet II  Diet III
8  10  7
5  9  6
5  14  8
4  8  5
3  11  15
3  2  9
7  13
4
12
• Test the hypothesis that there is no difference in the diets given to the
students if
Slide 27
ECON 316
Applied Statistics for Economists
Slide 1
Session Outline
The key topics to be covered in the session are as follows:
• Concept of Two-Way ANOVA
• Randomized Blocks Design
• Hypothesis Testing in Two-Way ANOVA
Slide 2
Concept of Two-Way ANOVA
• Two-Way ANOVA technique is used to study the influence of two
independent variables simultaneously on a dependent variable.
• Example: a student may be interested in the effects of three different
types of plant food and three different types of soil on the growth of
a certain seedling.
- two independent variables: plant food and soil type
- one dependent variable: growth of the plant.
• Assume there are:
- k treatment groups (the different levels of one
of the independent variables) and
- b blocks (the different levels of the other
independent variable).
Slide 3
Concept of Two-Way ANOVA- randomized
blocks design.
• Let xij represent the sample observation corresponding
to the response of ith treatment in the jth block.
          TREATMENT GROUPS
BLOCKS    1      2      3      …..    K
1         x11    x12    x13    …..    x1k
Slide 5
Procedure for Two-Way ANOVA
STEP 1: statement of null and alternative hypothesis
• In Two-Way ANOVA, there are two sets of hypotheses
that have to be tested.
• For treatments: H0: μ1 = μ2 = … = μk and H1: μi ≠ μj for some treatments
• For blocks: H0: μ1 = μ2 = … = μb and H1: μi ≠ μj for some blocks
Slide 8
Procedure for Two-Way ANOVA Cont’d
STEP 4: COMPUTATION OF THE F TEST VALUE
Slide 9
Procedure for Two-Way ANOVA Cont’d
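• In Two-Way ANOVA, SST, which measures the total
sample variability of the observations about the overall
mean, can be partitioned into three main components:
SST = SSTR + SSB + SSE
where
- SST is the Total Sum of Squares
- SSTR is the sum of squares for treatments
- SSB is the sum of squares for blocks
- SSE is the sum of squares for errors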
Slide 11
Procedure for Two-Way ANOVA Cont’d
• The F test value for comparing treatment means is found by
dividing the mean sum of squares for treatments by the
mean sum of squares for errors.
Slide 12
General Set-up of a Two-Way ANOVA
Source of    Sum of     Degrees of    Mean Squares           F ratio
Variation    Squares    freedom
Block        SSB        b - 1         MSB = SSB / (b - 1)    F = MSB / MSE
Total        SST        n - 1
Slide 13
Procedure for Two-Way ANOVA Cont’d
STEP 5: Make your decision in the light of the decision rules and
summarize your results
• First set of hypotheses: equality of the treatment means. If
the calculated value of F for the treatment groups is greater
than its respective critical value of F at a given level of
significance, then we reject the null hypothesis of equal
treatment means.
• Second set of hypotheses: equality of the block means. If the
calculated value of F for the blocks is greater than its
respective critical value of F at a given level of
significance, then we reject the null hypothesis of equal
block means.
Slide 14
Example 1
In an annual meeting of shareholders of pharmaceutical companies in Ghana,
four economists were asked to predict the percentage increase in share
prices for five pharmaceutical companies. Information obtained from the
experiment is presented in the table below:
Pharmaceutical    Economists
Companies         A      B      C      D
1                 7      10     8      15
2                 13     9      10     14
3                 11     8      6      9
4                 9      12     11     10
5                 9      8      13     12
i. Test the null hypothesis that the population mean percentage increases
are the same for all the four economists.
ii. Test the null hypothesis that the mean percentage increases are the same
for all the pharmaceutical companies. Take alpha to be 0.05.
Slide 15
Solution to Example 1
STEP 1: statement of null and alternative hypothesis
Slide 17
Solution to Example 1 Cont’d
STEP 4: COMPUTATION OF THE F TEST VALUES
• Table for treatment and block totals
Pharmaceutical    Economists                    Totals for
Companies         A      B      C      D        Blocks
1                 7      10     8      15       40
2                 13     9      10     14       46
3                 11     8      6      9        34
4                 9      12     11     10       42
5                 9      8      13     12       42
Totals for
treatments        49     47     48     60       T = 204
Slide 18
Solution to Example 1 Cont’d
Slide 19
Solution to Example 1 Cont’d
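• The 2nd term, T²/n = 204²/20, has already been computed to be 2080.80.
• Thus the treatment sum of squares is (49² + 47² + 48² + 60²)/5 − 2080.80 = 22.0
and the block sum of squares is (40² + 46² + 34² + 42² + 42²)/4 − 2080.80 = 19.2.
Thus, SSE = 109.2 − 22.0 − 19.2 = 68.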
Slide 20
ANOVA Table
Source of    Sum of     Degrees of              Mean Squares             F ratio
Variation    Squares    freedom
Error        68         (k - 1)(b - 1) = 12     MSE = 68 / 12 = 5.667
Total        109.2      n - 1 = 19
Slide 21
Solution to Example 1 Cont’d
STEP 5: Make your decision in the light of the decision rules
and summarize your results
• Test of equality of the treatment means: the F calculated value
(1.294) is less than the F critical value (3.49), so we fail to
reject the null hypothesis of equal treatment means at the 5%
level of significance. There is no significant evidence that the
mean predicted percentage increase in share prices differs among
the economists.
• Test of equality of the block means: the F test value (0.847) is
less than the F critical value (3.26), so we fail to reject the null
hypothesis of equal block means at the 5% level of
significance. There is no significant evidence that the mean
percentage increase in share prices differs among the
pharmaceutical companies.
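The figures in this solution can be reproduced with a short Python sketch of the randomized blocks calculation. The function name `two_way_anova`, the label SSTR for the treatment sum of squares, and the computational forms (totals squared over their counts, minus T²/n) are assumptions made for illustration. The data and the critical values 3.49 and 3.26 are taken from the example.

```python
def two_way_anova(table):
    """Randomized-blocks ANOVA; table[j][i] is block j (row), treatment i (column)."""
    b = len(table)                      # number of blocks
    k = len(table[0])                   # number of treatments
    n = b * k
    grand_total = sum(sum(row) for row in table)          # T
    correction = grand_total ** 2 / n                      # T^2 / n
    treatment_totals = [sum(row[i] for row in table) for i in range(k)]
    block_totals = [sum(row) for row in table]

    sst = sum(x * x for row in table for x in row) - correction
    sstr = sum(t * t for t in treatment_totals) / b - correction
    ssb = sum(t * t for t in block_totals) / k - correction
    sse = sst - sstr - ssb
    mse = sse / ((k - 1) * (b - 1))
    return {"SST": sst, "SSTR": sstr, "SSB": ssb, "SSE": sse, "MSE": mse,
            "F_treatments": (sstr / (k - 1)) / mse,
            "F_blocks": (ssb / (b - 1)) / mse}

# Rows: pharmaceutical companies 1-5 (blocks); columns: economists A-D (treatments).
predictions = [
    [7, 10, 8, 15],
    [13, 9, 10, 14],
    [11, 8, 6, 9],
    [9, 12, 11, 10],
    [9, 8, 13, 12],
]
res = two_way_anova(predictions)

# Critical values quoted in the example for alpha = 0.05.
reject_treatments = res["F_treatments"] > 3.49   # F(3, 12)
reject_blocks = res["F_blocks"] > 3.26           # F(4, 12)

for name, value in res.items():
    print(f"{name} = {value:.3f}")
print("reject equal treatment means:", reject_treatments)
print("reject equal block means:", reject_blocks)
```

Neither F ratio exceeds its critical value, so neither null hypothesis is rejected, in line with the conclusion stated above.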
Slide 22
Activity 1
In 4 years, Unilever tested its three different kinds of soap in
selected towns in Ghana. The table below shows the sales
recorded (in million Ghana cedis) for each of the three
soaps sold in the four-year period. You are required to use
Two-Way ANOVA to test the hypotheses that the mean sales
for the three kinds of soap are the same for all the 4 years.
         Kinds of Soap
Years    A      B      C
1        70     50     60
2        100    70     120
3        60     80     90
4        40     90     150
Slide 23