
Econ 316: Applied Statistics For Economist

The document outlines the concept of confidence intervals (CIs) in statistics, specifically focusing on CIs for the difference in means, population variance, and standard deviation. It provides formulas and examples for calculating CIs under various conditions, including known and unknown variances. Additionally, it discusses the chi-square distribution and its characteristics, including degrees of freedom.


ECON 316: APPLIED STATISTICS FOR ECONOMIST

Lecture 1: Confidence Interval


Instructor: Prof. D. K. Twerefou/Dr. Emmanuel Abbey
Outline
• Confidence interval for difference in means
• Confidence interval for the population variance
and standard deviation
• Confidence interval for the ratio of two variances

10/14/21 2
WHAT IS A CONFIDENCE INTERVAL (CI)?
• An interval of numbers used to estimate the true value of a population parameter.
• Associated with any CI is a number (the confidence level) that indicates the confidence we have that the population parameter lies between the lower and upper bounds.
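CONFIDENCE INTERVAL FOR THE MEAN
• For the population mean: $P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = 1 - \alpha$
• $(1-\alpha)100\%$ CI for the population mean:
Large samples: $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Small samples: $\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$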
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS (Paired Data)
• Suppose we have two independent random samples with means $\bar{x}_1$ and $\bar{x}_2$ and respective sample sizes $n_1$ and $n_2$, drawn from normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$.
• Then we can determine the CI on $\mu_1 - \mu_2$, the difference between the two population means, under various assumptions about the population variances.
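CI FOR THE DIFFERENCE BETWEEN TWO
MEANS contd
• CASE 1: $\sigma_1^2$ and $\sigma_2^2$ are KNOWN and $n_1$ and $n_2$ are LARGE.
• Then a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by:
$\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$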
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS contd
• CASE 2: $\sigma_1^2$ and $\sigma_2^2$ are UNKNOWN but $n_1$ and $n_2$ are LARGE.
• Then a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by:
$\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
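CI FOR THE DIFFERENCE BETWEEN TWO
MEANS contd
• CASE 3: $\sigma_1^2$ and $\sigma_2^2$ are UNKNOWN and $n_1$ and $n_2$ are SMALL.
A. If the variances are EQUAL, that is $\sigma_1^2 = \sigma_2^2$, then a $100(1-\alpha)\%$ confidence interval on $\mu_1 - \mu_2$ is given by:
$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
where $t_{\alpha/2}$ has $n_1 + n_2 - 2$ degrees of freedom.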
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS contd
• where $S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ is the pooled standard deviation.
B. If no assumption of equality of the variances is made, then a $100(1-\alpha)\%$ confidence interval on $\mu_1 - \mu_2$ is given by:
$\bar{x}_1 - \bar{x}_2 - t^*_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t^*_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS
• where $t^*$ approximately follows a t-distribution with degrees of freedom $f$ given by:
$f = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
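The degrees-of-freedom formula for the unequal-variance case is easy to get wrong by hand, so here is a minimal Python sketch of it. The function names `welch_df` and `welch_ci` are illustrative choices, not names from the lecture, and the critical value `t_crit` is assumed to be read from a t table by the user. The inputs shown are hypothetical.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom f for the unequal-variance t* interval."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def welch_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Interval for mu1 - mu2; t_crit is t*_{alpha/2} at f degrees of freedom (from a table)."""
    diff = xbar1 - xbar2
    half_width = t_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - half_width, diff + half_width

# Hypothetical inputs, chosen only to illustrate the calls
f = welch_df(s1=2.0, n1=10, s2=3.0, n2=12)           # about 19.2
low, high = welch_ci(20.0, 2.0, 10, 18.0, 3.0, 12, t_crit=2.093)  # t_{0.025,19}
print(round(f, 1), round(low, 3), round(high, 3))
```

Because f is rarely an integer, it is usually rounded down before the table lookup, which keeps the interval slightly conservative.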
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS - EXAMPLE 1
• Research at the University JSS shows that the first-year JSS class has 22 students whose mean height is 47.75 inches, while the second-year class has 25 students whose mean height is 50.40 inches. If the standard deviations of the heights of first- and second-year students are 1.80 and 2.05 inches respectively, find the 95% confidence interval for the difference in mean heights $\mu_1 - \mu_2$.
• Interpret your results.
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS - SOLUTION
• $n_1 = 22$, $n_2 = 25$, $\bar{x}_1 = 47.75$, $\bar{x}_2 = 50.40$, $s_1 = 1.80$, $s_2 = 2.05$
• Note: The sample sizes are small, the sample means and standard deviations are known, and no assumption is made about the equality of the variances. Therefore we use the formula:
• $\bar{x}_1 - \bar{x}_2 - t^*_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t^*_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS - SOLUTION
• $f = \frac{\left(\frac{1.80^2}{22} + \frac{2.05^2}{25}\right)^2}{\frac{\left(1.80^2/22\right)^2}{21} + \frac{\left(2.05^2/25\right)^2}{24}} = \frac{0.0995}{0.00221} \approx 45.0$; from the table, $t^*_{0.025,45} \approx t_{0.025,40} = 2.021$
• By inserting the relevant values into the specified equation, with $\bar{x}_1 - \bar{x}_2 = 47.75 - 50.40 = -2.65$ and $\sqrt{\frac{1.80^2}{22} + \frac{2.05^2}{25}} = 0.561$:
$-2.65 - 2.021(0.561) \le \mu_1 - \mu_2 \le -2.65 + 2.021(0.561)$
$-3.784 \le \mu_1 - \mu_2 \le -1.516$
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS - INTERPRETATION
• If we were to draw 100 different pairs of samples and construct a 95% confidence interval from each pair, we would expect about 95 of the intervals to contain the true difference in means and about 5 of them to miss it.
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS - EXAMPLE 2
• QUESTION 2: A vending machine designed to dispense coffee into 8 milliliter cups was checked by a technician who sampled 4 cups before making an adjustment and 5 cups after making the adjustment. Assuming that the variances are equal, find a 90% confidence interval for the mean difference in the amount dispensed due to the adjustment, if the samples showed the following amounts of coffee:
Before adjustment: 6.92 7.34 7.26 6.88
After adjustment: 7.33 7.93 7.65 7.49 7.10
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS - SOLUTION
$n_1 = 4$, $n_2 = 5$, $\bar{x}_1 = 7.10$, $\bar{x}_2 = 7.50$,
$s_1^2 = (0.2338)^2$, $s_2^2 = (0.3148)^2$,
$t_{0.05,7} = 1.895$;
$S_p = \sqrt{\frac{(4-1)(0.2338)^2 + (5-1)(0.3148)^2}{4 + 5 - 2}} = 0.2830$
• By inserting the values above into the confidence interval formula:
$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
CI FOR THE DIFFERENCE BETWEEN TWO
MEANS - SOLUTION
• $(7.10 - 7.50) \pm 1.895\,(0.2830)\sqrt{\frac{1}{4} + \frac{1}{5}}$
• $-0.40 \pm 0.36$
• Therefore $-0.76 \le \mu_1 - \mu_2 \le -0.04$

• INTERPRETATION: ???
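As a check on the arithmetic in this example, the pooled interval can be recomputed from the raw coffee measurements. This is a sketch only: `pooled_ci` is a hypothetical helper name, and the critical value 1.895 is taken from the t table as in the solution rather than computed.

```python
import math
from statistics import mean, variance

before = [6.92, 7.34, 7.26, 6.88]
after = [7.33, 7.93, 7.65, 7.49, 7.10]

def pooled_ci(x, y, t_crit):
    """Equal-variance CI for mu_x - mu_y; t_crit has len(x) + len(y) - 2 df."""
    n1, n2 = len(x), len(y)
    # variance() is the sample variance (divisor n - 1)
    sp = math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))
    diff = mean(x) - mean(y)
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return sp, diff - half_width, diff + half_width

sp, low, high = pooled_ci(before, after, t_crit=1.895)  # t_{0.05,7}
print(f"Sp = {sp:.4f}, CI = ({low:.2f}, {high:.2f})")
```

The whole interval lies below zero, which is the fact the interpretation step has to explain.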
DEGREE OF FREEDOM
• The degrees of freedom represent the number of observations in the sample that are free to vary around the mean of the sample.
• Example 1: Let n = 2 and let a and b be the values. For any given mean $\bar{x}$, the value of b depends on a and is not free to vary. That is, if $\bar{x} = 7$ and a = 10, then b must be 4.
• Example 2: If n = 1 then there are no degrees of freedom, because the single observation is itself the mean $\bar{x}$.
DEGREE OF FREEDOM
• Example 3: If n = 2, $\bar{x} = 10$ and a = 15, then b = 5.
• If n = 3, then any two values are free to vary, but once the two are selected, the third is fixed.
• In general, given any mean value $\bar{x}$ and $n$ sample observations, once $n - 1$ values are determined the final value is no longer free to vary.
DEGREE OF FREEDOM
• In other words, the degrees of freedom represent the number of observations in the sample that are free to vary around the mean of the sample. For example, let n = 2 and let a and b be the values. For any given mean $\bar{x}$, the value of b depends on a and is not free to vary. That is, if $\bar{x} = 7$ and a = 10, then b must be 4.
CHI-SQUARE DISTRIBUTION
• Suppose we have a set of independent normal variables $X_1, X_2, \ldots, X_n$ and we standardize them: $Z_1 = \frac{X_1 - \mu}{\sigma}$, $Z_2 = \frac{X_2 - \mu}{\sigma}$, ..., $Z_n = \frac{X_n - \mu}{\sigma}$. Then the sum of the squares of the standardized variables has a chi-squared ($\chi^2$) distribution.
• That is, $\chi^2_v = \sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$ follows a chi-square distribution; when $\mu$ is estimated by the sample mean $\bar{x}$, it has $v = n - 1$ degrees of freedom.
CHARACTERISTICS OF CHI-SQUARE
DISTRIBUTION
• It is skewed to the right, with the right tail asymptotic to the horizontal axis, and its domain consists of the non-negative real numbers.
• Inferences about a population variance based on its estimator, the sample variance, use the chi-square distribution as the sampling distribution.
• As the sample size increases, the $\chi^2$ distribution becomes more symmetrical. Thus, for $v = n - 1 > 30$, it can be approximated by the normal distribution.
CHARACTERISTICS OF CHI-SQUARE
DISTRIBUTION
• Graphically: [Figure: chi-square density $f(\chi^2)$ plotted against $\chi^2$ for df = 5 and df = 10]
CHARACTERISTICS OF CHI-SQUARE
DISTRIBUTION
• The parameter of the chi-square distribution is called the degrees of freedom, which is $v = n - 1$.
• Like $z_\alpha$ for the standard normal distribution, $\chi^2_\alpha$ is defined as the value for which the area under the chi-square curve to its right is equal to $\alpha$.
• This value depends on the number of degrees of freedom and must be obtained from a chi-square table.
CHARACTERISTICS OF CHI-SQUARE
DISTRIBUTION
• Graphically: [Figure: chi-square density with area $\alpha/2$ in each tail and area $1 - \alpha$ between the critical values $\chi^2_{1-\alpha/2,\,v}$ and $\chi^2_{\alpha/2,\,v}$]
CHARACTERISTICS OF CHI-SQUARE
DISTRIBUTION
• Also, $\chi^2_{\alpha/2}$ is the value such that the area under the curve to its right, with v degrees of freedom, is $\alpha/2$, and $\chi^2_{1-\alpha/2}$ is the value such that the area under the curve to its left, with v degrees of freedom, is $\alpha/2$. Because the chi-square distribution is not symmetrical, the two critical values must be looked up separately.
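CI FOR A CHI-SQUARE DISTRIBUTION
• From the definition of the chi-squared distribution, we know that:
• $\chi^2_v = \sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{n-1}{\sigma^2}\cdot\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{n-1} = \frac{(n-1)s^2}{\sigma^2}$ ... (1)
• where $s^2$ is the sample variance (computed with $\mu$ estimated by $\bar{x}$) and $v = n - 1 > 0$
• Let $P\left(\chi^2_{1-\alpha/2,\,v} \le \chi^2 \le \chi^2_{\alpha/2,\,v}\right) = 1 - \alpha$ ... (2)
• Consider the inequality $\chi^2_{1-\alpha/2,\,v} \le \chi^2 \le \chi^2_{\alpha/2,\,v}$ ... (3)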
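CI FOR A CHI-SQUARE DISTRIBUTION
• Recall: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
• By substituting equation 1 into equation 3, we obtain:
$\chi^2_{1-\alpha/2,\,v} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2,\,v}$ ... (4)
• By inverting equation 4:
• $\frac{1}{\chi^2_{1-\alpha/2,\,v}} \ge \frac{\sigma^2}{(n-1)s^2} \ge \frac{1}{\chi^2_{\alpha/2,\,v}}$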
CI FOR A CHI-SQUARE DISTRIBUTION
• Multiplying through by $(n-1)s^2$, we obtain:
$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}} \ge \sigma^2 \ge \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}}$
• Reversing the order of the inequality, we obtain:
CONFIDENCE INTERVAL FOR A POPULATION VARIANCE:
$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}}$
CONFIDENCE INTERVAL FOR A POPULATION STANDARD DEVIATION:
$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}}} \le \sigma \le \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}}}$
CI FOR A CHI-SQUARE DISTRIBUTION
• INTERPRETATION: Using the sample variance (standard deviation), we are $(1-\alpha)100\%$ confident that the population variance (standard deviation) lies between the end points of the confidence interval.
MEAN AND VARIANCE OF CHI-SQUARE
DISTRIBUTION
• If a variable X has a chi-squared distribution with v degrees of freedom, its expected value is $E(X) = v$ and its variance is $Var(X) = 2v$.
• As the sample size becomes larger, the chi-squared distribution is approximated by the normal distribution, and the standardized variable $Z = \frac{X - E(X)}{\sqrt{Var(X)}}$ is approximately $N(0,1)$.
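The mean and variance stated above can be checked by simulation. The sketch below builds chi-square draws directly from the definition (a sum of v squared independent standard normal variables) using only the Python standard library. The seed, the sample count and the choice v = 5 are arbitrary assumptions made for illustration.

```python
import random
from statistics import mean, variance

def chi_square_draw(v, rng):
    """One draw: the sum of v squared independent standard normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))

rng = random.Random(316)   # fixed, arbitrary seed so the run is repeatable
v = 5
draws = [chi_square_draw(v, rng) for _ in range(20000)]

sample_mean = mean(draws)
sample_var = variance(draws)
print(f"mean ~ {sample_mean:.2f} (theory {v}), variance ~ {sample_var:.2f} (theory {2 * v})")
```

The simulated values should land close to v and 2v, and every draw is non-negative, consistent with the domain of the distribution.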
CI FOR A CHI-SQUARE DISTRIBUTION
• QUESTION 1: A machine was tested 9 times with a standard deviation of 0.15. Construct a 90% confidence interval for the variance.
• SOLUTION: $n = 9$, $s = 0.15$, $CI = 0.90$, $df = n - 1 = 9 - 1 = 8$, $\alpha = 0.1$, $\alpha/2 = 0.05$, $1 - \alpha/2 = 1 - 0.05 = 0.95$, $\chi^2_{0.05,8} = 15.51$, $\chi^2_{0.95,8} = 2.733$
• By using the CI for the population variance: $\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}}$
CI FOR A CHI-SQUARE DISTRIBUTION
• By inserting the relevant values into the equation:
$\frac{8(0.0225)}{15.51} \le \sigma^2 \le \frac{8(0.0225)}{2.733}$, that is, $0.012 \le \sigma^2 \le 0.066$
• INTERPRETATION: If we were to repeat the set of 9 tests 100 times, we would obtain many different values of the sample variance and many different intervals. About $(1-\alpha)100\% = 90\%$ of the intervals constructed in this way would contain the population variance; the remaining 10% would not. The interval from this sample is $0.012 \le \sigma^2 \le 0.066$.
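The variance interval in Question 1 reduces to two divisions, sketched below. The function name `variance_ci` is a hypothetical helper, and both chi-square critical values are passed in as table lookups rather than computed.

```python
def variance_ci(n, s, chi2_upper, chi2_lower):
    """CI for sigma^2.

    chi2_upper = chi^2_{alpha/2, n-1} and chi2_lower = chi^2_{1-alpha/2, n-1},
    both read from a chi-square table.
    """
    numerator = (n - 1) * s ** 2
    return numerator / chi2_upper, numerator / chi2_lower

low, high = variance_ci(n=9, s=0.15, chi2_upper=15.51, chi2_lower=2.733)
print(f"{low:.3f} <= sigma^2 <= {high:.3f}")
```

Note that the larger critical value produces the lower limit, because it sits in the denominator. Taking square roots of both limits gives the corresponding interval for the standard deviation.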
CI FOR A CHI-SQUARE DISTRIBUTION
• QUESTION 2: The weights of 15 books randomly selected from a library have a sample standard deviation of 0.011. Construct a 95% confidence interval for the standard deviation of the population sampled.
• SOLUTION: $n = 15$, $s = 0.011$, $C.I. = 95\%$, $df = n - 1 = 15 - 1 = 14$, $\alpha = 0.05$, $\alpha/2 = 0.025$, $1 - \alpha/2 = 0.975$, $\chi^2_{0.025,14} = 26.12$, $\chi^2_{0.975,14} = 5.63$
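CI FOR A CHI-SQUARE DISTRIBUTION
• By using the CI for the population standard deviation: $\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}}} \le \sigma \le \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}}}$
• By inserting the relevant values into the equation:
$\sqrt{\frac{(14)(0.011)^2}{26.12}} \le \sigma \le \sqrt{\frac{(14)(0.011)^2}{5.63}}$
$\sqrt{0.000065} \le \sigma \le \sqrt{0.0003}$, that is, $0.0081 \le \sigma \le 0.0173$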
F-DISTRIBUTION
• The F distribution is sometimes called the variance ratio distribution.
• The F statistic usually involves the ratio of two independent estimates of variance and is used to test for the equality of two independent estimates of the variance or standard deviation.
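F-DISTRIBUTION
• Suppose two variables have independent chi-squared distributions $\chi^2_1$ and $\chi^2_2$ with $v_1$ and $v_2$ degrees of freedom respectively.
• The statistic formed by the ratio:
$\frac{\chi^2_1 / v_1}{\chi^2_2 / v_2} = \frac{\frac{(n_1 - 1)s_1^2}{\sigma_1^2} \Big/ v_1}{\frac{(n_2 - 1)s_2^2}{\sigma_2^2} \Big/ v_2} = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} = F_{v_1, v_2}$
has an F-distribution with $v_1$ and $v_2$ degrees of freedom.
• If $\sigma_1^2 = \sigma_2^2$, the ratio reduces to $s_1^2 / s_2^2$, which has an F-distribution with $v_1$ and $v_2$ degrees of freedom.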
CHARACTERISTICS OF F-DISTRIBUTION
• The F distribution is skewed to the right and the range of values of F is $0 \le F < \infty$
CHARACTERISTICS OF F-DISTRIBUTION
• The F-distribution has two sets of degrees of freedom, one for the numerator and the other for the denominator.
• The degrees of freedom depend on the way in which we obtain the estimates of the two variances appearing in the numerator and denominator of the F-ratio.
• The F table gives the probability of the right-hand tail. Since the F distribution is not symmetrical, the left-hand tail cannot be read directly from the regular F table.
CHARACTERISTICS OF F-DISTRIBUTION
• By convention, for a two-tail test the F ratio is always evaluated with the larger estimate of the variance as the numerator and the smaller estimate as the denominator. The left-hand critical value follows from the right-hand one with the degrees of freedom interchanged: $F_{1-\alpha/2,\,v_1,\,v_2} = \frac{1}{F_{\alpha/2,\,v_2,\,v_1}}$
• Rule of thumb method: When conducting a two-tail test we halve the value of our significance level and read from the F table. E.g., if we choose the 5% level of significance for a two-tail test, we take the value $F_{0.025}$ with the relevant degrees of freedom as our critical value.
CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• To find the confidence interval for the ratio of two variances, we start from the probability statement:
• $P\left(F_{1-\alpha/2,\,v_1,\,v_2} \le \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \le F_{\alpha/2,\,v_1,\,v_2}\right) = 1 - \alpha$
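CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• Consider the inequality: $F_{1-\alpha/2,\,v_1,\,v_2} \le \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \le F_{\alpha/2,\,v_1,\,v_2}$
• By rearranging we obtain: $F_{1-\alpha/2,\,v_1,\,v_2} \le \frac{s_1^2}{s_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \le F_{\alpha/2,\,v_1,\,v_2}$
• By simplifying further: $\frac{s_2^2}{s_1^2}\,F_{1-\alpha/2,\,v_1,\,v_2} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2}\,F_{\alpha/2,\,v_1,\,v_2}$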
CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• However, $F_{1-\alpha/2,\,v_1,\,v_2} = \frac{1}{F_{\alpha/2,\,v_2,\,v_1}}$
• By inserting the above equation we obtain the $100(1-\alpha)\%$ CONFIDENCE INTERVAL OF THE RATIO OF THE TWO VARIANCES:
$\frac{s_2^2}{s_1^2}\cdot\frac{1}{F_{\alpha/2,\,v_2,\,v_1}} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2}\,F_{\alpha/2,\,v_1,\,v_2}$
CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• QUESTION: A study was conducted to compare the nicotine content of two brands of cigarettes. Ten cigarettes of brand A had an average nicotine content of 3.1 mg and a standard deviation of 0.5, while 8 of brand B had an average of 2.7 mg and a standard deviation of 0.7.
a. Construct a 95% confidence interval for the ratio of the variances.
b. Construct a 95% confidence interval for the difference in means.
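CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• SOLUTION: $n_1 = 10$, $n_2 = 8$, $\bar{x}_1 = 3.10$, $\bar{x}_2 = 2.70$, $s_1 = 0.5$, $s_2 = 0.7$, $v_1 = 10 - 1 = 9$, $v_2 = 8 - 1 = 7$; $\alpha = 0.05$, $\alpha/2 = 0.025$
• By using the formula: $\frac{s_2^2}{s_1^2}\cdot\frac{1}{F_{\alpha/2,\,v_2,\,v_1}} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2}\,F_{\alpha/2,\,v_1,\,v_2}$
• By inserting the relevant values into the formula, with $F_{0.025,7,9} = 4.1970$ and $F_{0.025,9,7} = 4.8232$:
• $\frac{(0.7)^2}{(0.5)^2}\cdot\frac{1}{4.1970} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{(0.7)^2}{(0.5)^2}\cdot 4.8232$, that is, $0.467 \le \frac{\sigma_2^2}{\sigma_1^2} \le 9.453$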
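The order of the degrees of freedom is the easiest part of this interval to get wrong, so the sketch below makes it explicit in the argument names. It assumes the identity $F_{1-\alpha/2,\,v_1,\,v_2} = 1/F_{\alpha/2,\,v_2,\,v_1}$. `variance_ratio_ci` is a hypothetical helper, and the two critical values are table lookups supplied by the user.

```python
def variance_ratio_ci(s1, s2, f_v2_v1, f_v1_v2):
    """CI for sigma2^2 / sigma1^2.

    f_v2_v1 = F_{alpha/2} with (v2, v1) degrees of freedom (used in the lower limit)
    f_v1_v2 = F_{alpha/2} with (v1, v2) degrees of freedom (used in the upper limit)
    """
    ratio = s2 ** 2 / s1 ** 2
    return ratio / f_v2_v1, ratio * f_v1_v2

# Nicotine example: v1 = 9, v2 = 7, alpha/2 = 0.025
low, high = variance_ratio_ci(s1=0.5, s2=0.7, f_v2_v1=4.1970, f_v1_v2=4.8232)
print(f"{low:.3f} <= sigma2^2/sigma1^2 <= {high:.3f}")
```

Since the interval contains 1, these data do not rule out equal variances at the 95% level.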
CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• QUESTION: In measuring the contents of 6 boxes filled by one machine, a student determined the sample variance to be 0.1754. In measuring the contents of 11 boxes filled by a second machine, he found a sample variance of 0.2704. Assuming that the amount dispensed follows a normal distribution for each machine, find a 95% confidence interval for the ratio of the variances.
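CONFIDENCE INTERVAL ON THE RATIO OF
TWO VARIANCE
• SOLUTION: $n_1 = 6$, $n_2 = 11$, $s_1^2 = 0.1754$, $s_2^2 = 0.2704$, $\alpha = 0.05$, $\alpha/2 = 0.025$, $F_{\alpha/2,\,v_1,\,v_2} = F_{0.025,5,10} = $ ?????
• By using the formula: $\frac{s_2^2}{s_1^2}\cdot\frac{1}{F_{\alpha/2,\,v_2,\,v_1}} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2}\,F_{\alpha/2,\,v_1,\,v_2}$
• Inserting the values: $\frac{0.2704}{0.1754\,(???)} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{0.2704}{0.1754}\,(???)$
that is, ????? $\le \frac{\sigma_2^2}{\sigma_1^2} \le$ ?????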
APPLIED STATISTICS FOR
ECONOMIST

HYPOTHESIS TESTING
Instructor: Prof. D. K. Twerefou & Emmanuel Abbey
WHAT IS A STATISTICAL HYPOTHESIS?
• A statistical hypothesis is an assertion or conjecture about a population parameter or parameters, such as the mean, standard deviation or variance of a population.
• Hypotheses in statistics always come in pairs:
- Null hypothesis, $H_0$ (expresses no difference between a parameter and its hypothesized value)
- Alternative hypothesis, $H_a$ (accepted when the null hypothesis is rejected)
10/27/21 2
TYPES OF HYPOTHESIS
• Consider a population parameter, the mean $\mu$, and the hypothesized value of the population mean, $\mu_0$. Below are the three main types of hypothesis:

$H_0: \mu \ge \mu_0$      $H_0: \mu \le \mu_0$      $H_0: \mu = \mu_0$
$H_a: \mu < \mu_0$      $H_a: \mu > \mu_0$      $H_a: \mu \ne \mu_0$
HYPOTHESIS TESTING
• In hypothesis testing, a specific idea concerning a parameter is available before the study, and the purpose of the test is to collect data that either confirm or reject this idea.
• The null hypothesis is evaluated for reasonableness based on sample evidence.
- If it is not reasonable, it is rejected.
- If it is reasonable, or the observed difference is not significant, then we do not reject it.
TYPE I AND II ERRORS
• If the null hypothesis is true and rejected, or false and accepted, the decision is in either case an error.
• Type I error: the null hypothesis is true but is rejected. The probability of committing a Type I error is designated by α (alpha).
• If we test at the 5 percent (α) level of significance, then the probability of rejecting a null hypothesis that is actually true is at most 5%.
TYPE I AND II ERRORS contd
• Type II error: the null hypothesis is false but is accepted.
• The probability of committing a Type II error is designated by β (beta).
TYPE I AND II ERRORS contd

                  ACCEPT $H_0$         REJECT $H_0$
$H_0$ is true     Correct decision     Type I error
$H_0$ is false    Type II error        Correct decision
STEPS IN HYPOTHESIS TESTING
1. Formulate the null and alternative hypotheses
2. Set the level of significance, α
3. Choose the appropriate test statistic (Z, t, $\chi^2$, F) and calculate, from the sample, the empirical value of the chosen statistic
STEPS IN HYPOTHESIS TESTING contd
4. Construct the decision rule and find the critical values of the chosen statistic
5. Compare the empirical value of the test statistic from the sample with the critical values and decide whether or not to reject $H_0$
Two-sided or two-tailed test
• If the decision rule accepts the null hypothesis for values within an interval and rejects it for values outside this interval, we refer to such a criterion as a two-sided test or a two-tailed test.
• For example, if we are required to test the hypothesis $H_0: \mu = 42$ and the decision rule says "accept $H_0$ if the sample mean lies within the interval 40.1 to 45.5" (i.e., $40.1 \le \bar{x} \le 45.5$), then the situation is a two-tailed test.
One-sided or one-tailed test
• If we reject the null hypothesis only for values of the sample mean greater than a certain value, or only for values less than a certain value, we refer to the criterion as a one-sided test or one-tailed test.
• For example, if $H_0: \mu \ge 42$ and the decision rule states "accept $H_0$ if the sample mean is greater than or equal to 42" (i.e., $\bar{x} \ge 42$), so that we reject $H_0$ if the sample mean is less than 42 ($\bar{x} < 42$), then we have a one-sided test.
TEST FOR THE MEAN - TEST STATISTICS
• If the sample size is large, that is $n \ge 30$: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
• If the sample size is small, that is $n < 30$: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
TEST COMPARING TWO MEANS (PAIRED DATA)
• Sometimes we would like to know whether an observed difference between two means is due to chance or whether it is statistically significant.
• Suppose $x_1, x_2, \ldots, x_n$ are the observations on n individuals before an experiment and $y_1, y_2, \ldots, y_n$ are the corresponding observations after the experiment. Then $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ constitute paired data.
• To compare the means of these data, we can transform the data into a single sample by finding the differences between corresponding observations, which reduces the problem to a one-sample test.
TEST COMPARING TWO MEANS (PAIRED
DATA) contd
• Assume we have two independent samples with means $\bar{x}_1$ and $\bar{x}_2$ and respective sample sizes $n_1$ and $n_2$, drawn from normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$.
• We can compare the means $\mu_1$ and $\mu_2$ by testing hypotheses under various conditions on the population variances and sample sizes:
TEST COMPARING TWO MEANS - HYPOTHESIS
$H_0: \mu_1 \ge \mu_2$   or   $H_0: \mu_1 - \mu_2 \ge 0$
$H_a: \mu_1 < \mu_2$        $H_a: \mu_1 - \mu_2 < 0$

$H_0: \mu_1 \le \mu_2$   or   $H_0: \mu_1 - \mu_2 \le 0$
$H_a: \mu_1 > \mu_2$        $H_a: \mu_1 - \mu_2 > 0$

$H_0: \mu_1 = \mu_2$   or   $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 \ne \mu_2$        $H_a: \mu_1 - \mu_2 \ne 0$
CASE 1: POPULATION VARIANCES ARE KNOWN
• If the sample sizes are large, i.e. $n_1 \ge 30$ and $n_2 \ge 30$, and the variances are known, then the test statistic for the difference between two means is given by:
• $z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$   This is the z statistic.
• where $\delta = \mu_1 - \mu_2$ is the value on the right-hand side of the null hypothesis, which is usually zero
CASE 2: VARIANCES ARE UNKNOWN BUT n IS
LARGE
• If the population variances are unknown but the sample sizes are large, then the sample variances can be used as approximate substitutes for the population variances. That is:
• $z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
• where $\delta = \mu_1 - \mu_2$
CASE 3A: VARIANCES ARE UNKNOWN BUT n IS
SMALL (n less than 30)
If the variances are equal, that is $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then the test statistic is:
• $t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
• which follows the t distribution with $n_1 + n_2 - 2$ degrees of freedom
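CASE 3B: VARIANCES ARE UNKNOWN BUT n IS
SMALL (n less than 30)
If no assumption of equality of the variances is made, then the test statistic is:
• $t^* = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, with $\delta$ usually zero
• where $t^*$ approximately follows the t-distribution with $f$ degrees of freedom, where
$f = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$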
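DECISION RULE

$H_a$                        Reject $H_0$ if                                             Accept $H_0$ or reserve judgment if
$\mu_1 - \mu_2 < \delta$     $Z < -Z_\alpha$,  $t < -t_\alpha$                           $Z \ge -Z_\alpha$,  $t \ge -t_\alpha$
$\mu_1 - \mu_2 > \delta$     $Z > Z_\alpha$,  $t > t_\alpha$                             $Z \le Z_\alpha$,  $t \le t_\alpha$
$\mu_1 - \mu_2 \ne \delta$   $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$                   $-Z_{\alpha/2} \le Z \le Z_{\alpha/2}$
                             $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$                   $-t_{\alpha/2} \le t \le t_{\alpha/2}$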
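The decision rule for the three alternatives can be written as a small function, which makes the direction of each inequality explicit. This is an illustrative sketch: `decide` and its string labels are hypothetical, and `crit` is assumed to be the positive upper-tail critical value ($Z_\alpha$ or $t_\alpha$ for one-sided alternatives, $Z_{\alpha/2}$ or $t_{\alpha/2}$ for the two-sided one).

```python
def decide(stat, crit, alternative):
    """Apply the decision rule for a Z or t statistic.

    alternative: "less"      for Ha: mu1 - mu2 < delta
                 "greater"   for Ha: mu1 - mu2 > delta
                 "two-sided" for Ha: mu1 - mu2 != delta
    """
    if alternative == "less":
        reject = stat < -crit
    elif alternative == "greater":
        reject = stat > crit
    elif alternative == "two-sided":
        reject = stat < -crit or stat > crit
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return "reject H0" if reject else "accept H0 or reserve judgment"

# Hypothetical statistics, for illustration only
print(decide(2.40, 1.96, "two-sided"))
print(decide(1.20, 1.645, "greater"))
print(decide(-2.10, 1.645, "less"))
```

A statistic exactly equal to the critical value falls in the acceptance region, matching the non-strict inequalities in the right-hand column.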
EXAMPLE 1
• A study by the EPA compared the highway fuel economy of domestic and imported private cars. A test of 35 domestic cars resulted in an average of 35.2 miles per gallon, and a test of 40 imported cars resulted in an average of 34.2 miles per gallon. The population gas mileage is known to be normal in both cases, with variances of $\sigma_1^2 = 2.3$ (domestic) and $\sigma_2^2 = 1.8$ (imported). Test the hypothesis that the population means are equal at α = 0.05.
SOLUTION
• We solve the problem using the steps in hypothesis testing.
1. Formulate the null and alternative hypotheses:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$
2. Set the level of significance: α = 0.05
3. Choose the appropriate test statistic and calculate its empirical value.
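SOLUTION
• This is a test of the difference between two means and the sample sizes are greater than 30. Also, the population variances are known. Therefore the test statistic is Z.
• $\bar{x}_1 = 35.2$, $\bar{x}_2 = 34.2$, $\sigma_1^2 = 2.3$, $\sigma_2^2 = 1.8$, $n_1 = 35$, $n_2 = 40$
• $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{35.2 - 34.2}{\sqrt{\frac{2.3}{35} + \frac{1.8}{40}}} \approx 3.01$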
SOLUTION
4. Construct the decision rule and find the critical values of the chosen statistic.
If $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$, reject $H_0$; otherwise accept $H_0$.
$\pm Z_{0.025} = \pm 1.96$
5. Compare the empirical value of the test statistic with the critical values and conclude.
Calculated $Z = 3.01 > z_{\alpha/2} = 1.96$.
By the decision rule, we reject the null hypothesis and conclude that the population means are not equal.
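The fuel-economy test can be reproduced in a few lines. In this sketch `two_sample_z` is a hypothetical helper name, and the critical value is obtained from `statistics.NormalDist` in the Python standard library instead of a printed table.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, var1, n1, xbar2, var2, n2, delta=0.0):
    """z statistic for H0: mu1 - mu2 = delta with known population variances."""
    return (xbar1 - xbar2 - delta) / math.sqrt(var1 / n1 + var2 / n2)

alpha = 0.05
z = two_sample_z(35.2, 2.3, 35, 34.2, 1.8, 40)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point, about 1.96
reject = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Computing the statistic without intermediate rounding gives a value just above 3.00, which still lies well beyond the critical value.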
EXAMPLE 2
• In 1994 Duraplast claimed that its PVC pipes are more durable than those of Interplast. In order to test this claim, a student conducted an experiment testing the durability of 5 PVC pipes from each company. His experiment revealed that the mean duration of a Duraplast PVC pipe is 20,000 months and that of an Interplast pipe is 8,000 months. The standard deviations of the PVC pipes from Duraplast and Interplast were 3,000 and 2,000 months respectively. Test this claim using an α of 0.05, if the population variances of the duration of the two PVC pipes are equal.
SOLUTION
1. Formulate the null and alternative hypotheses.
The null and alternative hypotheses to be tested are:
$H_0: \mu_1 - \mu_2 \le 0$
$H_a: \mu_1 - \mu_2 > 0$

2. Set the level of significance: α = 0.05
SOLUTION
3. Choose the appropriate test statistic and calculate its empirical value.
The test statistic is the t-statistic (when the variances are equal):
$t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 7.44$
SOLUTION
4. Construct the decision rule and find the critical values of the chosen statistic.
If $t > t_{\alpha,\,n_1 + n_2 - 2}$, reject $H_0$; otherwise accept $H_0$ or reserve judgment. $t_{0.05,8} = 1.86$
5. Compare the empirical value of the test statistic with the critical value and conclude.
$t = 7.44 > t_{0.05,8} = 1.86$
Hence we reject the null hypothesis that the mean duration of Duraplast PVC pipes is less than or equal to that of Interplast pipes. The data support Duraplast's claim.
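The value t = 7.44 can be verified from the summary statistics given in the example. The helper name `pooled_t` below is a hypothetical choice, and the critical value 1.86 is the table value quoted in the solution.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, delta=0.0):
    """Equal-variance t statistic with n1 + n2 - 2 degrees of freedom."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
    return (xbar1 - xbar2 - delta) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = pooled_t(20000, 3000, 5, 8000, 2000, 5)
t_crit = 1.86   # t_{0.05,8}, read from a t table
print(f"t = {t:.2f}, reject H0: {t > t_crit}")
```

With equal sample sizes the pooled variance is simply the average of the two sample variances, here 6,500,000.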
QUESTION
• An advert in the Ghanaian Times provides the following prices for renting a car from two different companies in 26 towns. Determine if the two sample means differ significantly, using α = 0.1.

City        A   B   C   D   E   F   G   H   I   J   K   L   M
Company 1   46  61  59  55  54  52  34  50  49  46  41  54  68
Company 2   46  57  58  54  54  52  34  49  49  46  41  50  63

City        N   O   P   Q   R   S   T   U   V   W   X   Y   Z
Company 1   68  65  44  57  45  56  50  50  47  48  47  62  45
Company 2   63  61  44  53  45  52  50  50  47  48  47  59  45
SOLUTION
1. Formulate the null and alternative hypotheses.
$H_0: \mu_1 - \mu_2 = 0$     $H_a: \mu_1 - \mu_2 \ne 0$

2. Set the level of significance: α = 0.1

3. Choose the appropriate test statistic and calculate its empirical value.
SOLUTION
• The sample sizes are small and the population variances are unknown, so they must be estimated from the samples. Treating the two variances as equal, we use the pooled t-statistic for the difference in means, with $n_1 + n_2 - 2 = 50$ degrees of freedom:
$t = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 0.68$
SOLUTION
4. Construct the decision rule and find the critical values of the chosen statistic.
If $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, reject $H_0$; otherwise accept $H_0$ or reserve judgment.

5. Compare the empirical value of the test statistic with the critical values and conclude.
With α = 0.1, $\alpha/2 = 0.05$ and $t_{0.05,50} \approx 1.676 > t = 0.68$
Therefore accept $H_0$
QUESTION
• Zulu Obaapanyin - Ama Ansomaah believes that the average cost of housing in a certain suburb has gone up by more than ¢15,000 over the last ten years. She believes that the standard deviation of the cost of a single-family dwelling 10 years ago was ¢4,000, but that it is currently ¢8,000. In checking a random sample of old records she has found that 30 houses in this suburb sold for an average of ¢22,875 ten years ago. Furthermore, a random sample of 25 houses in this suburb sold this year for an average of ¢40,345. Is her belief about the increase in the cost of housing supported by the data? Assume the distribution is normal and use an alpha of 0.05.
SOLUTION
1. Formulate the null and alternative hypotheses.
$H_0: \mu_1 - \mu_2 \le 15000$     $H_a: \mu_1 - \mu_2 > 15000$

2. Set the level of significance: α = 0.05

3. Choose the appropriate test statistic and calculate its empirical value.
Taking sample 1 as this year's sales and sample 2 as the sales of ten years ago:
$\bar{x}_1 = 40{,}345$, $\bar{x}_2 = 22{,}875$, $s_1 = 8000$, $s_2 = 4000$, $n_1 = 25$, $n_2 = 30$
SOLUTION
• $z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{40{,}345 - 22{,}875 - 15{,}000}{\sqrt{\frac{(8000)^2}{25} + \frac{(4000)^2}{30}}} = 1.404$

4. Construct the decision rule and find the critical values of the chosen statistic.
Reject $H_0$ if $Z > Z_{0.05} = 1.645$
SOLUTION
5. Compare the empirical value of the test statistic with the critical value and conclude. $Z = 1.404 < Z_{0.05} = 1.645$.
We accept $H_0$: the data do not support the belief that the average cost has gone up by more than ¢15,000.
TESTS CONCERNING STANDARD DEVIATIONS
• The chi-square distribution is used to test whether the population standard deviation is equal to a specific constant, or whether the population variance is equal to a specific constant.

• $H_0: \sigma \ge \sigma_0$      $H_0: \sigma \le \sigma_0$      $H_0: \sigma = \sigma_0$
  $H_a: \sigma < \sigma_0$      $H_a: \sigma > \sigma_0$      $H_a: \sigma \ne \sigma_0$
TEST STATISTICS
• If our sample is random and comes from a normal population, then $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$, with $(n-1)$ degrees of freedom, is the test statistic.
• The critical values are $\chi^2_{1-\alpha}$ and $\chi^2_{\alpha}$ for one-sided alternatives, and $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ for the two-sided alternative.
• The test criteria for the null hypothesis $\sigma = \sigma_0$ are as follows:
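DECISION RULE

$H_a$                      Reject $H_0$ if                                                      Accept $H_0$ or reserve judgment if
$\sigma < \sigma_0$        $\chi^2 < \chi^2_{1-\alpha}$                                         $\chi^2 \ge \chi^2_{1-\alpha}$
$\sigma > \sigma_0$        $\chi^2 > \chi^2_{\alpha}$                                           $\chi^2 \le \chi^2_{\alpha}$
$\sigma \ne \sigma_0$      $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$       $\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}$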
EXAMPLE 1
• In a laboratory experiment, $s = 0.0086$ for 10 determinations of the specific heat of iron. Use the 0.05 level of significance to test the null hypothesis $\sigma \ge 0.01$ against the alternative hypothesis $\sigma < 0.01$.
SOLUTION
1. Formulate the null and alternative hypotheses.
$H_0: \sigma \ge 0.01$,  $H_a: \sigma < 0.01$

2. Set the level of significance: α = 0.05

3. Choose the appropriate test statistic and calculate its empirical value.
SOLUTION
• This is a test of the standard deviation against a specific constant, and therefore the chi-squared test is used.
• $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{9(0.0086)^2}{(0.01)^2} = 6.7$
4. Construct the decision rule and find the critical values of the chosen statistic.
Reject the null hypothesis if $\chi^2 < \chi^2_{0.95,9} = 3.325$; otherwise accept $H_0$ or reserve judgment.
SOLUTION
5. Compare the empirical value of the test statistic with the critical value and conclude.

• $\chi^2 = 6.7 > \chi^2_{0.95,9} = 3.325$, hence accept $H_0$ or reserve judgment.

• The data do not support the claim that the standard deviation is less than 0.0100.
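The chi-square statistic for this example is a one-line computation, sketched below. `chi_square_stat` is a hypothetical helper name, and the critical value 3.325 is the table value used in the solution.

```python
def chi_square_stat(n, s, sigma0):
    """Test statistic (n - 1) s^2 / sigma0^2 with n - 1 degrees of freedom."""
    return (n - 1) * s ** 2 / sigma0 ** 2

chi2 = chi_square_stat(n=10, s=0.0086, sigma0=0.01)
chi2_crit = 3.325            # chi^2_{0.95,9}, read from a chi-square table
reject = chi2 < chi2_crit    # lower-tail rule for Ha: sigma < sigma0

print(f"chi2 = {chi2:.2f}, reject H0: {reject}")
```

For the lower-tail alternative the rejection region lies below the critical value, so a statistic of about 6.7 is not small enough to reject.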
QUESTION
A student, in an attempt to check a Professor's claim that convicted embezzlers spend on average 12.3 months in jail, takes a random sample of 35 such cases from court files.
• His results are $\bar{x} = 11.5$ months and $s = 3.8$ months, and the level of significance is 0.05. Should the student accept the null hypothesis $\mu = 12.3$ months or the alternative hypothesis $\mu \ne 12.3$ months?
• Test at the 0.01 level of significance whether or not $\sigma = 4.2$ months for the amount of time that convicted embezzlers spend in jail, if n = 41. That is, test whether $\sigma = 4.2$ even though s = 3.8.
SOLUTION (a)
1. Formulate the null and alternative hypotheses.
$H_0: \mu = 12.3$     $H_a: \mu \ne 12.3$

2. Set the level of significance: α = 0.05

3. Choose the appropriate test statistic and calculate its empirical value.
SOLUTION
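• The sample size is large (n > 30), so the z-test is used with n = 35, $\bar{x} = 11.5$ and s = 3.8.
• $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, or $z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ when $\sigma$ is replaced by the sample value s

• $z = \frac{11.5 - 12.3}{3.8/\sqrt{35}} = -1.25$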

10/27/21 46
SOLUTION
4. Construct the decision rule and find the critical values of the chosen statistic.
If $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$, reject $H_0$.

5. Compare the empirical value of the test statistic with the critical values and conclude.
$Z_{\alpha/2} = Z_{0.025} = 1.96$; $-1.96 < Z = -1.25 < 1.96$
We cannot reject $H_0$; the sample does not contradict the professor's claim.
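A minimal sketch of the same z-test in Python. The function name is illustrative, and 1.96 is the standard normal critical value for a two-sided test at α = 0.05.

```python
import math

def z_statistic(xbar, mu0, s, n):
    """Large-sample z statistic for a test about a mean."""
    return (xbar - mu0) / (s / math.sqrt(n))

z = z_statistic(xbar=11.5, mu0=12.3, s=3.8, n=35)
z_crit = 1.96  # Z_{0.025} for a two-sided test at the 0.05 level

# Two-sided rule: reject H0 only if z falls outside [-z_crit, z_crit].
reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

Since |z| is about 1.25, which is below 1.96, the null hypothesis μ = 12.3 is not rejected.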
10/27/21 47
SOLUTION (b)
• Formulate the null and alternative hypotheses.
$H_0: \sigma = 4.2$, $H_1: \sigma \ne 4.2$

• Set the level of significance: α = 0.01

• Choose the appropriate test statistic and calculate its empirical value.

10/27/21 48
SOLUTION (b)
• This is a test about a standard deviation, so the χ² test is used with n = 41 and s = 3.8.
• $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{40(3.8)^2}{(4.2)^2} = 32.74$
4. Construct the decision rule and find the critical values of the chosen statistic.
If $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$, reject $H_0$; otherwise accept $H_0$ or reserve judgment. Here $\chi^2_{0.005,40} = 66.7659$ and $\chi^2_{0.995,40} = 20.7065$.
10/27/21 49
SOLUTION (b)
5. Compare the empirical value of the test statistic with the critical values and conclude.
• $32.74 > 20.7065 = \chi^2_{0.995,40}$ and $32.74 < 66.7659 = \chi^2_{0.005,40}$

• This implies that 20.7065 < 32.74 < 66.7659.

• Accept $H_0$ or reserve judgment. The data are consistent with a standard deviation of 4.2 months for the time embezzlers stay in jail.
10/27/21 50
TEST FOR THE EQUALITY OF TWO VARIANCES –
F-test
• The sampling distribution for inferences about the equality of two variances is the F distribution. The F-test is used to test the equality of the variances (or standard deviations) of two independent populations.
• The null hypothesis being tested is
• $H_0: \sigma_1 = \sigma_2$, or equivalently $H_0: \sigma_1^2 = \sigma_2^2$

10/27/21 51
TEST CRITERIA , F-TEST
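Ha    Test statistic    Reject H0 if    Accept H0 or reserve judgment if
$\sigma_1 < \sigma_2$ ($\sigma_1^2 < \sigma_2^2$)    $F = s_2^2 / s_1^2$    $F > F_{\alpha}$    $F \le F_{\alpha}$
$\sigma_1 > \sigma_2$ ($\sigma_1^2 > \sigma_2^2$)    $F = s_1^2 / s_2^2$    $F > F_{\alpha}$    $F \le F_{\alpha}$
$\sigma_1 \ne \sigma_2$ ($\sigma_1^2 \ne \sigma_2^2$)    The larger of the two ratios    $F > F_{\alpha/2}$ or $F < F_{1-\alpha/2}$    $F_{1-\alpha/2} \le F \le F_{\alpha/2}$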
10/27/21 52
EXAMPLE
• In a sample of 8 observations, the sum of squared deviations of the items from the mean was 94.5. In another sample of 10 observations, the value was found to be 101.7. Test whether the difference between the variances is significant at the 5% level.

10/27/21 53
SOLUTION
1. Formulate the null and alternative hypotheses.
$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$

2. Set the level of significance: α = 0.05

3. Choose the appropriate test statistic and calculate its empirical value.

10/27/21 54
SOLUTION
• The test statistic is the F statistic because we are testing the equality of two variances. We first need to calculate the sample variances.
• $\sum(x_1 - \bar{x}_1)^2 = 94.5$; $\sum(x_2 - \bar{x}_2)^2 = 101.7$; $n_1 = 8$; $n_2 = 10$

• $s_1^2 = \frac{\sum(x_1 - \bar{x}_1)^2}{n_1 - 1} = \frac{94.5}{7} = 13.5$, $s_2^2 = \frac{\sum(x_2 - \bar{x}_2)^2}{n_2 - 1} = \frac{101.7}{9} = 11.3$

• $F = \frac{s_1^2}{s_2^2} = \frac{13.5}{11.3} = 1.195$; $F_{0.025,7,9} = 4.2$
10/27/21 55
SOLUTION

4. Construct the decision rule and find the critical values of the
chosen statistics
• If $F > F_{\alpha/2}$ or $F < F_{1-\alpha/2}$, reject $H_0$; otherwise accept $H_0$. Here F is the larger of the two ratios.
5. Compare the empirical values of the test statistics with the
critical values and conclude

10/27/21 56
SOLUTION

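$F = 1.195 < F_{0.025,7,9} = 4.2$

$F_{1-\alpha/2} = F_{0.975,7,9} = \frac{1}{F_{0.025,9,7}} = \frac{1}{4.82} = 0.207$
$0.207 \le 1.195 \le 4.2 \Rightarrow F_{1-\alpha/2} \le F \le F_{\alpha/2}$
Hence accept $H_0$: the difference between the two variances is not significant at the 5% level.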
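The F-test above can be reproduced with a few lines of Python. This is a sketch only: the helper name is illustrative, and the critical values 4.2 and 4.82 are taken from the solution rather than computed.

```python
def variance_from_sum_of_squares(sum_sq_dev, n):
    """Sample variance from the sum of squared deviations about the mean."""
    return sum_sq_dev / (n - 1)

s1_sq = variance_from_sum_of_squares(94.5, 8)    # 13.5
s2_sq = variance_from_sum_of_squares(101.7, 10)  # 11.3

# For the two-sided test, F is the larger variance over the smaller one.
f_stat = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

f_upper = 4.2        # F(0.025; 7, 9) from the solution
f_lower = 1 / 4.82   # F(0.975; 7, 9) = 1 / F(0.025; 9, 7)

reject_h0 = f_stat > f_upper or f_stat < f_lower
print(f"F = {f_stat:.3f}, reject H0: {reject_h0}")
```

The statistic lies between the two critical values, so the hypothesis of equal variances is not rejected.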

10/27/21 57
QUESTION
• Two random samples were drawn from two normal populations
and their values were

A 66 67 75 76 82 84 88 90 92

B 64 66 74 78 82 85 87 92 93 95 97

• Test whether the two populations have the same variance at the 5% level of significance.
10/27/21 58
UNIVERSITY OF GHANA
DEPARTMENT OF ECONOMICS

ECON 316
Applied Statistics for Economists

Time Series Analysis 1

Lecturer: Prof. Daniel K. Twerefou & Dr. Emmanuel Abbey


• Contact Information: dktwerefou@ug.edu.gh

Slide 1
Session Outline
The key topics to be covered in the session are as follows:
• Definition of Time Series Analysis
• Components of time series
• Models of Time Series
• Objectives of time series analysis
• Explanation of Trends
• Fitting the Trend Line
• Method of Semi-averages
• Method of Moving Averages

Slide 2
What is Time Series Analysis?
• Time series is a set of regular observations of
some variable, collected and recorded
sequentially over a period.
• The period may be daily, weekly, monthly,
quarterly, annually or any other constant period.

Slide 3
Examples of Time Series
• Economics: share prices on successive days. Export
volumes in successive months, nominal GDP in
successive years.
• Marketing: sales volume in successive weeks,
production costs over successive years.
• Demographic: annual population data, child mortality
rate for successive years, enrolment in JHS in successive
years.

Slide 4
Representation of Time Series-Inflation
Tabular format
Graphical representation
Year Inflation (%)
2000 40.5
2001 21.3
2002 15.2
2003 23.6
2004 11.8
2005 13.5
2006 10.9
2007 10.1
Slide 5
Objectives of Time Series
• Systems or variables have influences that are
associated with time.
• Example in agricultural production: output increases during the rainy season and decreases during the dry season.
• Example in sales: sales increase during Christmas and fall afterwards.
– Does the season affect agricultural production and sales?

Slide 6
Objectives of Time Series cont’d
• What factors affect time series value?
• How will these factors affect the values in future?
• How can we iron out these factors to obtain the real underlying values?

• The main objective of studying time series analysis is to identify and measure the influences of the different time-related factors that affect time series values.
Slide 7
Components of Time Series
• Four factors/components influence time series values:
• Trend (T): Overall long-run upward or downward movement of the series.
• Seasonal variation (S): Periodic variation within a short time, usually a month or a quarter, which repeats itself year after year.
• Cyclical variation (C): Periodic variation over a long time, usually between two and ten years, which repeats itself approximately over the same period.
• Irregular or random variation (I): Variations that occur due to chance or unexplained events.
Models of Time Series
• How do the components combine to determine the time series values? The way they combine is referred to as the model of the time series.
• There are two models of time series: the additive and the multiplicative model.
The Additive Model
• Any actual time series value is obtained by adding up the contributions of the trend, seasonal, cyclical and irregular components. Mathematically,
$Y_i = T_i + S_i + C_i + I_i$
Slide 9
Example 1
• The table below provides the components of a time series from 1985 to 1997. Using the additive model, complete the table (? marks a missing value).
Year  Trend  Cycle  Seasonal  Random  Actual values (Yi = Ti + Si + Ci + Ii)
1985  ?    3  1   1  162
1986  159  4  2   1  ?
1987  160  5  ?   2  167
1989  162  6  ?   2  169
1990  ?    5  1   2  173
1991  166  6  -1  2  ?
1992  168  7  2   ?  180
1993  171  8  1   3  ?
1994  174  ?  2   3  185
1995  177  7  ?   1  184
1996  180  ?  2   2  192
1997  184  8  -1  ?  192

Slide 10
Solution to Example 1
• The additive model is given by $Y_i = T_i + S_i + C_i + I_i$
• To find the trend value for 1985, we make $T_i$ the subject:
$T_i = Y_i - S_i - C_i - I_i$

• Inserting the values and simplifying: $T_i = 162 - 1 - 3 - 1 = 157$

• Similarly, we obtain the other values and present them in a table. Please crosscheck all the answers.
Slide 11
Solution to Example 1 Cont’d
Year  Trend  Cycle  Seasonal  Random  Actual values
1985  157  3  1   1  162
1986  159  4  2   1  166
1987  160  5  0   2  167
1988  162  6  -1  2  169
1989  165  5  1   2  173
1990  166  6  -1  2  173
1991  168  7  2   3  180
1992  171  8  1   3  183
1993  174  6  2   3  185
1994  177  7  -1  1  184
1995  180  8  2   2  192
1996  184  8  -1  1  192
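Completing a row under the additive model is simple arithmetic, and a short script makes the answers easy to crosscheck. The sketch below assumes each row has exactly one missing entry, stored as None; the function and key names are illustrative.

```python
KEYS = ("trend", "cycle", "seasonal", "random", "actual")

def complete_additive(row):
    """Fill the single missing entry (None) using Y = T + C + S + I."""
    missing = [k for k in KEYS if row[k] is None]
    if len(missing) != 1:
        raise ValueError("exactly one entry must be missing")
    key = missing[0]
    filled = dict(row)
    # Sum of the known components (the first four keys, excluding the missing one).
    parts = sum(row[k] for k in KEYS[:4] if k != key)
    filled[key] = parts if key == "actual" else row["actual"] - parts
    return filled

first_row = {"trend": None, "cycle": 3, "seasonal": 1, "random": 1, "actual": 162}
second_row = {"trend": 159, "cycle": 4, "seasonal": 2, "random": 1, "actual": None}

print(complete_additive(first_row)["trend"])    # trend = 162 - (3 + 1 + 1)
print(complete_additive(second_row)["actual"])  # actual = 159 + 4 + 2 + 1
```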

Slide 12
The Multiplicative Model
• Assumes a multiplicative relationship among the four components.
• Mathematically, $Y_i = T_i \times S_i \times C_i \times I_i$
• The time series value for year i is the product of the trend, seasonal, cyclical and irregular components for year i.
• If we know any four of these five values, we can find the remaining one by making it the subject of the formula and inserting the known values.

Slide 13
Example 2
The table below provides some of the components and actual values of
a time series from 1990 to 2000. Using the multiplicative model
complete the table.
Years  Trend  Cycle  Seasonal  Random  Actual Values
1990  1251  ?    1.04  1.2  1405.12
1991  1241  0.8  1.02  1.2  ?
1992  1231  0.7  ?     1.2  1127.10
1993  1222  0.7  0.97  1.1  912.71
1994  ?     0.8  0.95  1.1  1019.92
1995  1200  0.9  0.9   ?    972.00
1996  1195  ?    0.9   1    860.40
1997  1195  0.7  ?     1    752.85
1998  1194  0.8  ?     1.1  935.14
1999  1192  0.8  0.88  ?    923.08
2000  ?     0.9  0.9   0.9  ?
Slide 14
Solution to Example 2
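• The multiplicative model is given by the formula $Y_i = T_i \times S_i \times C_i \times I_i$

• We want to find the trend value for 1994. Making $T_i$ the subject, we obtain:
$T_i = \frac{Y_i}{S_i \times C_i \times I_i}$
• Notice that $Y_i = 1019.92$, $C_i = 0.8$, $S_i = 0.95$ and $I_i = 1.1$.
• Inserting these values, we obtain: $T_i = \frac{1019.92}{0.95 \times 0.8 \times 1.1} = 1220$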

Slide 15
Solution to Example 2 Cont’d
• Results of the calculations are presented in the table below.
Years Trend Cycle Seasonal Random Actual Values
1990 1251 0.90 1.04 1.2 1405.12
1991 1241 0.8 1.02 1.2 1215.19
1992 1231 0.7 1.09 1.2 1127.10
1993 1222 0.7 0.97 1.1 912.71
1994 1220 0.8 0.95 1.1 1019.92
1995 1200 0.9 0.9 1 972.00
1996 1195 0.80 0.9 1 860.40
1997 1195 0.7 0.9 1 752.85
1998 1194 0.8 0.89 1.1 935.14
1999 1192 0.8 0.88 1.1 923.08
2000 1190 0.9 0.9 0.9 867.51
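The same crosscheck works for the multiplicative model. The sketch below again assumes exactly one missing entry per row (None); the names are illustrative.

```python
import math

KEYS = ("trend", "cycle", "seasonal", "random", "actual")

def complete_multiplicative(row):
    """Fill the single missing entry (None) using Y = T * C * S * I."""
    missing = [k for k in KEYS if row[k] is None]
    if len(missing) != 1:
        raise ValueError("exactly one entry must be missing")
    key = missing[0]
    filled = dict(row)
    # Product of the known components (the first four keys, excluding the missing one).
    product = math.prod(row[k] for k in KEYS[:4] if k != key)
    filled[key] = product if key == "actual" else row["actual"] / product
    return filled

row_1994 = {"trend": None, "cycle": 0.8, "seasonal": 0.95, "random": 1.1, "actual": 1019.92}
row_1991 = {"trend": 1241, "cycle": 0.8, "seasonal": 1.02, "random": 1.2, "actual": None}

trend_1994 = complete_multiplicative(row_1994)["trend"]
actual_1991 = complete_multiplicative(row_1991)["actual"]
print(round(trend_1994, 2), round(actual_1991, 2))
```

The two results agree with the table: a trend of 1220 for 1994 and an actual value of about 1215.19 for 1991.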
Slide 16
Activity 1
The table below shows some of the values of the components and
actual time series from 1996 to 2005. Using the multiplicative model
complete the table.
Years Trend Cycle Seasonal Random Actual Values
1996 0.60 1.04 0.40 18335.37
1997 88150.80 0.80 0.50 35965.53
1998 96965.88 1.10 0.40 25598.99
1999 99874.86 0.70 0.97 0.50
2000 104868.60 0.70 0.95 41842.57
2001 111160.72 0.90 0.40 30013.39
2002 124500.00 0.65 0.90 0.50
2003 151890.00 0.70 0.40 40402.74
2004 162522.30 0.85 0.60 58020.46
2005 0.80 0.88 0.50 63615.13
2006 196990.03 0.80 0.90 0.50
Slide 17
Activity 2
The table below shows the values of some of the components and
actual values of a time series for 1996 to 2005. Using the additive
model complete the table.
Year Trend Cycle Seasonal Random Actual

1995 91823.75 734.59 36.73 92962.36


1996 110188.50 440.75 44.08 111334.46
1997 121207.35 727.24 48.48 122467.91
1998 499.37 749.06 49.94 126141.94
1999 131085.75 524.34 52.43 132449.04
2000 138950.89 833.71 55.58 140395.98
2001 622.50 933.75 62.25 157243.50
2002 189862.50 7 1139.18 75.95 191837.07
2003 812.61 1218.92 81.26 205265.67
2004 225906.00 903.62 1355.44 228255.42
2005 246237.54 984.95 1477.43 98.50
2006 253624.66 1003.00 1521.75 256250.86
Slide 18
Explanation of Trends
• Trend component of a time series is the overall
movement in the time series value over a long
period of time.
• Usually, identification of trend component is
done using annual data.
- possible to establish the trend line using daily,
weekly, monthly and quarterly data.
• We will consider 3 main methods of fitting the trend line: the methods of semi-averages, moving averages and least squares.
Slide 19
Method of Semi-Averages
• The data is first divided into two parts with the same number of periods.
• If the number of observations is even, for example from 1973-1990 (18 observations), then the equal parts will be 1973-1981 and 1982-1990, with 9 observations each.
• If the number of observations is odd, the two equal parts are formed by omitting the middle year.

Slide 20
Method of Semi-Averages contd
• Plot the trend line using the average values of the two parts with the help of the equation of a straight line: $y = a + bx$

where a is the intercept and b is the gradient of the line

Slide 21
Example 1
The table shows the production of cocoa by a farmer in Ghana from 2000 to 2009.
a. Fit the trend line using the method of semi-averages.
b. Find the equation of the straight line.
Years  Tons of Cocoa
2000  3
2001  4
2002  5
2003  1
2004  3
2005  4
2006  6
2007  2
2008  4
2009  5
Slide 22
Solution to Example 1
• We have an even number (10) of observations from 2000 to 2009.
• We split the time series into two equal parts, 2000-2004 and 2005-2009, each with 5 observations.
• The average for the first part is 3.2 and for the second part is 4.2.
Years  Tons of Cocoa  Centered average
2000  3
2001  4
2002  5  3.2
2003  1
2004  3
2005  4
2006  6
2007  2  4.2
2008  4
2009  5
Slide 23
Solution to Example 1 Cont’d
• Plotting the years on the x-axis and output on the y-axis, we obtain the trend line through the two semi-averages. [Figure: output (tons) against year, 2000-2009]
• We now find the equation of the line through the two points (2002, 3.2) and (2007, 4.2).
• Since the intervals between the years are equal, we will use codes for the years. Year 2000 will have a code of 1, 2001 will have a code of 2, etc.
Slide 24
Solution to Example 1 Cont’d
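• Using the codes, the two points become (3, 3.2) and (8, 4.2).
• The equation of a line through two points $(x_1, y_1)$ and $(x_2, y_2)$ is given as:
$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$
• Inserting the figures, we obtain:
$\frac{y - 3.2}{x - 3} = \frac{4.2 - 3.2}{8 - 3} = 0.2$
• Or $y = 2.6 + 0.2x$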
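The method of semi-averages can be sketched in Python as follows. The function name is illustrative, and the year codes 1, 2, 3, ... are assumed, as in the example.

```python
def semi_average_trend(values):
    """Return (a, b) of the trend line y = a + b*x through the two semi-averages.

    Years are coded 1, 2, ..., n. With an odd number of observations the
    middle one is omitted, as described above.
    """
    n = len(values)
    half = n // 2
    first, second = values[:half], values[n - half:]
    y1, y2 = sum(first) / half, sum(second) / half
    # Each semi-average is placed at the middle code of its half.
    x1 = (1 + half) / 2
    x2 = (n - half + 1 + n) / 2
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b

cocoa = [3, 4, 5, 1, 3, 4, 6, 2, 4, 5]  # tons of cocoa, 2000-2009
a, b = semi_average_trend(cocoa)
print(f"y = {a:.1f} + {b:.1f}x")
```

For the cocoa data the semi-averages 3.2 and 4.2 sit at codes 3 and 8, giving a gradient of 0.2 and an intercept of 2.6.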

Slide 25
Method of Moving Averages
• The average value of the time series over a fixed number of years is computed successively, moving forward one year at a time.
• These averages are taken as the trend values for the unit of time falling at the middle of each period.
• Reasoning: When you take these averages as the trend, you obtain a smoother curve and lessen the influence of fluctuations that pull the annual time series values away from the general trend.
• How do you choose this period of years? You choose it, even or odd, to coincide with a practical natural cycle.
Slide 26
Odd Period of Moving Average
By an odd period of moving averages, we mean a period such as a 3-yearly moving average, a 5-weekly moving average, etc. The table shows the calculation of 3-yearly moving averages.
Year  Value  3-Yearly Moving Totals  3-Yearly Moving Averages / Trend
1990  a0
1991  a1  a0 + a1 + a2  (a0 + a1 + a2)/3
1992  a2  a1 + a2 + a3  (a1 + a2 + a3)/3
1993  a3  a2 + a3 + a4  (a2 + a3 + a4)/3
1994  a4  a3 + a4 + a5  (a3 + a4 + a5)/3
1995  a5  a4 + a5 + a6  (a4 + a5 + a6)/3
1996  a6
Slide 27
Odd Period of Moving Average contd

• The equation of the trend line is found by using the first and last points, $(x_1, y_1)$ and $(x_2, y_2)$, and the formula: $\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$

Slide 28
Odd Period of Moving Average-Example 2

The table shows the production of timber from 1973 to 1987. Find the trend equation using the 3-yearly moving averages.
Year  Production  Year  Production
1973  15  1980  56
1974  21  1981  63
1975  30  1982  70
1976  36  1983  74
1977  41  1984  82
1978  46  1985  93
1979  50  1986  95
                  1987  102
Slide 29
Solution to Example 2
Year  Production  3-yearly moving totals  3-yearly moving averages (Trend)
1973  15
1974  21  66   22.00
1975  30  87   29.00
1976  36  107  35.67
1977  41  123  41.00
1978  46  137  45.67
1979  50  152  50.67
1980  56  169  56.33
1981  63  189  63.00
1982  70  207  69.00
1983  74  226  75.33
1984  82  249  83.00
1985  93  270  90.00
1986  95  290  96.67
1987  102
Slide 30
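The 3-yearly moving averages can be recomputed with a short script, which is a convenient way to crosscheck the table. This is a sketch with an illustrative function name; it assumes an odd period.

```python
def moving_average(values, period):
    """Odd-period moving averages, each paired with the index of its middle year."""
    half = period // 2
    return [
        (i, sum(values[i - half:i + half + 1]) / period)
        for i in range(half, len(values) - half)
    ]

years = list(range(1973, 1988))
timber = [15, 21, 30, 36, 41, 46, 50, 56, 63, 70, 74, 82, 93, 95, 102]

trend = moving_average(timber, 3)
for i, avg in trend:
    print(years[i], round(avg, 2))
```

No trend value is produced for the first and last years, because a 3-yearly average needs one observation on each side of the middle year.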
Solution to Example 2 Cont’d
• To find the equation of the trend line, we apply codes to the years. Thus, 1974 will have a code of 1 and 1986 a code of 13.
• The two points are: (1, 22.00) and (13, 96.67)

Slide 31
Activity 1
The table below shows the nominal GDP of a country from 2000 to 2008.
Year              2000  2001  2002  2003  2004  2005  2006  2007  2008
GDP (trillion ¢)  235   242   255   258   285   291   292   297   300
a. Find the trend equation using 3-yearly moving averages.
b. Draw the trend line.
Slide 32
Even Period of Moving Averages
• If the moving average has an even period, for example a 4-yearly moving average, the moving averages that are computed do not fall in a particular period but rather between two periods.
• Implication: it is unclear which year corresponds to the average that you have calculated.
• Centering, that is, finding the average of two successive moving averages and placing it at the middle period, solves the problem.

Slide 33
Calculation of 4-Yearly Moving Average
Year  Value  4-yearly moving average        4-yearly centered moving average (Trend)
1990  a0
1991  a1
             b1 = (a0 + a1 + a2 + a3)/4
1992  a2                                    (b1 + b2)/2
             b2 = (a1 + a2 + a3 + a4)/4
1993  a3                                    (b2 + b3)/2
             b3 = (a2 + a3 + a4 + a5)/4
1994  a4                                    (b3 + b4)/2
             b4 = (a3 + a4 + a5 + a6)/4
1995  a5
1996  a6
Slide 34
Example 3
i. The data below shows the output of a food processing company. Assume a 4-yearly cycle and find the trend line by the method of moving averages.
ii. Using the trend line, estimate the output for 1990.

Year           1979  1980  1981  1982  1983  1984  1985  1986  1987  1988
Output (tons)  464   515   518   467   502   540   557   571   586   612

Slide 35
Calculation of 4-yearly moving average
Year Output 4-yearly MT 4-yearly 4-yearly CMA/
MA Trend
1979 464
1980 515
1964 491
1981 518 495.75
2002 500.5
1982 467 503.625
2027 506.75
1983 502 511.625
2066 516.5
1984 540 529.50
2170 542.5
1985 557 553.0
2254 563.5
1986 571 572.5
2326 581.5
1987 586

1988 612
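The centered 4-yearly moving averages in this table can be reproduced as follows. This is a sketch with an illustrative function name; it assumes an even period.

```python
def centered_moving_average(values, period):
    """Centered moving averages for an even period.

    The first result belongs to the observation at index period // 2.
    """
    ma = [sum(values[i:i + period]) / period
          for i in range(len(values) - period + 1)]
    # Averaging successive pairs moves each value from between two years onto a year.
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

output = [464, 515, 518, 467, 502, 540, 557, 571, 586, 612]  # 1979-1988
cma = centered_moving_average(output, 4)

first_year = 1979 + 4 // 2  # the first centered value falls on 1981
for offset, value in enumerate(cma):
    print(first_year + offset, value)
```

The six centered values run from 1981 to 1986, matching the last column of the table.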
Slide 36
Solution to Example 3 Cont’d
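• By using codes for the first (1981) and last (1986) trend values, we obtain the two points as: (1, 495.75) and (6, 572.5)
• Equation of a straight line through two points is given as: $\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$
• Inserting the values, we obtain: $\frac{y - 495.75}{x - 1} = \frac{572.5 - 495.75}{6 - 1} = 15.35$

• Thus, $y = 480.4 + 15.35x$
• Forecasting for the year 1990 with a code of 10, we have: $y = 480.4 + 15.35(10) = 633.9$

• Therefore, output for 1990 will be 633.9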


Slide 37
10/19/2021

UNIVERSITY OF GHANA
DEPARTMENT OF ECONOMICS

ECON 316
Applied Statistics for Economists

Time Series Analysis 2


Lecturer: Prof. Daniel K. Twerefou
• Contact Information: dktwerefou@ug.edu.gh

Slide 1

Session Outline
The key topics to be covered in the session are as
follows:
• Fitting the Trend Line
-Method of Least Squares
• Explanation of seasonal variation
• Calculation of seasonal indices
• Deseasonalization of time series data

Slide 2


Method of Least Squares


• Aim: fit a trend line such that the sum of the squared deviations of the actual observations from the trend values is the least.
• If the observations appear to follow a linear trend, then according to the method of least squares the trend equation is $Y = a + bX$,
where a is the intercept and b is the slope or gradient:
$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$ and $a = \bar{Y} - b\bar{X}$
Slide 3

Example 4
• The table below shows the unemployment data of graduates from 2004-2009 in a certain country.
i. Determine the least squares trend equation.
ii. Estimate unemployment for 2015.
Year  Unemployment
2004  1005
2005  1010
2006  975
2007  950
2008  955
2009  940
Slide 4


Solution to Example 4
• Table (X is the year code, with 2004 = 1; Y is unemployment)
X      Y     XY     X²
1      1005  1005   1
2      1010  2020   4
3      975   2925   9
4      950   3800   16
5      955   4775   25
6      940   5640   36
Total  21  5835  20165  91
Slide 5

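Solution to Example 4 Cont’d

n = 6
$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{6(20165) - 21(5835)}{6(91) - (21)^2} = \frac{-1545}{105} = -14.71$
$a = \bar{Y} - b\bar{X} = 972.5 - (-14.71)(3.5) = 1024$

Slide 6

Solution to Example 4 Cont’d

• Thus, the least squares equation is $Y = 1024 - 14.71X$
• Estimating unemployment for 2015, the code will be 12: $Y = 1024 - 14.71(12) \approx 847.4$

Slide 7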
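The least squares calculation can be checked with the sketch below. The function name is illustrative, and the years are assumed to be coded 1, 2, ..., n as in the example.

```python
def least_squares_trend(y):
    """Fit Y = a + b*X by least squares, with X coded 1, 2, ..., n."""
    n = len(y)
    x = list(range(1, n + 1))
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

unemployment = [1005, 1010, 975, 950, 955, 940]  # 2004-2009
a, b = least_squares_trend(unemployment)

code_2015 = 2015 - 2004 + 1  # 2004 has code 1, so 2015 has code 12
forecast_2015 = a + b * code_2015
print(f"Y = {a:.1f} + ({b:.2f})X, forecast for 2015: {forecast_2015:.1f}")
```

The downward slope reflects the fall in graduate unemployment over the period.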

Activity 2
1. The following data shows the number of road accidents from 1998 to 2008.
Year              1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008
No. of accidents  220   224   235   242   255   258   285   291   292   297   300

i. Obtain the trend equation using the method of least squares.
ii. Estimate the number of road accidents in 2016.

Slide 8


What is Seasonal Variation?


• Seasonal variation refers to the systematic intra-year movements of a time series which repeat themselves, in the same pattern, year after year.
• Example 1: Rise and fall in agricultural output caused by the weather.
• Example 2: Increase in sales during Christmas, etc.
• Importance: The ability of a manager to predict demand cycles can help him plan his finances, human resources and other needs.
Slide 9

Practical example of data with Seasonal Variation
• Consider the table below on the agricultural output of a certain farmer.
• Output fluctuates across the different quarters.
• With this, the farmer can plan the number of employees, tractors, etc. that he/she will need at any time.

Slide 10

Deseasonalization of Time Series


• Deseasonalization: The process of adjusting time series data to eliminate fluctuations due to seasonal variations so that trend and cycle can be studied.
• After deseasonalization, the resulting figures are said to be seasonally adjusted.
• Deseasonalising first involves the calculation of the seasonal index.

Slide 11

Calculating Seasonal indices using the multiplicative model
• We decompose the series value using the multiplicative model: $Y = T \times C \times S \times I$
• Four main steps are used in calculating the seasonal index:
1. Estimate the T and C components using moving averages.
2. Obtain the S×I component by dividing the actual value by the T and C component, i.e. $S \times I = \frac{Y}{T \times C}$
3. Average the S×I terms for each season to remove the I component.
4. Normalize the estimated S component.
Slide 12

Example 1
The table below shows labour turnover on a quarterly basis for three years. Calculate the seasonal index under the multiplicative model using the method of moving averages. Interpret the seasonal index for the first and fourth quarters.
Year/Quarter  1   2   3   4
1985          68  62  61  63
1986          65  58  66  61
1987          68  63  63  67
Slide 13

Solution to Example 1
• Consider the table on the next slide.
• Step 1: Estimate the trend using the method of moving averages.
• We have 4 quarters in a year and therefore we use a 4-quarterly moving average (4QMA) and center it (columns 4 and 5).
• Step 2: Obtain the S×I component by dividing the actual value by the T and C component (column 6).
• For example, the value 0.966 was obtained by dividing 61 by 63.125.

Slide 14

Solution to Example 1 Cont’d

Slide 15


Solution to Example 1 Cont’d


• Step 3: Average the S×I terms for each quarter.
• Consider the table on the next slide. We have rearranged the S×I terms by years and quarters.
• The averages are provided in the fifth row of the table.

Slide 16

Solution to Example 1 Cont’d

The 4th step, normalizing the estimated seasonal component, is necessary where the sum of the averages of the seasonal indices is not equal to the number of seasons.
Slide 17

Solution to Example 1 Cont’d


Step 4: normalizing S.
• 1. Find the sum of the seasonal averages (3.9914) and then the seasonal index correction factor (SICF): $SICF = \frac{\text{number of seasons}}{\text{sum of the seasonal averages}} = \frac{4}{3.9914} = 1.0022$

• 2. Multiply the SICF by the average seasonal indices to obtain the Adjusted Seasonal Index (ASI).
• The ASI are presented in row 6.
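The four steps can be put together in one short function. This is a sketch under stated assumptions: the function name is illustrative, the period is even, and the series starts in the first season. Because it carries full precision, the results can differ from the hand-computed figures in the third decimal place.

```python
def seasonal_indices(values, period):
    """Adjusted seasonal indices by the ratio-to-moving-average method."""
    # Step 1: centered moving average as the estimate of T x C.
    ma = [sum(values[i:i + period]) / period
          for i in range(len(values) - period + 1)]
    cma = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]
    offset = period // 2  # index of the first observation that has a centered value

    # Step 2: S x I ratios, grouped by season (0 = first season of the year).
    ratios = {season: [] for season in range(period)}
    for j, trend in enumerate(cma):
        t = j + offset
        ratios[t % period].append(values[t] / trend)

    # Step 3: average the ratios for each season to remove the I component.
    averages = [sum(ratios[s]) / len(ratios[s]) for s in range(period)]

    # Step 4: normalize so that the indices sum to the number of seasons.
    sicf = period / sum(averages)
    return [avg * sicf for avg in averages]

turnover = [68, 62, 61, 63, 65, 58, 66, 61, 68, 63, 63, 67]  # 1985-1987, quarterly
asi = seasonal_indices(turnover, 4)
print([round(index, 3) for index in asi])
```

After normalization the four indices sum to 4, the number of seasons.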
Slide 18


Solution to Example 1 Cont’d


• The seasonal index for the first quarter is 1.054. This means a typical first-quarter turnover is 105.4 percent of that of the average quarter.
• The seasonal index for the fourth quarter is 0.985. This means a typical fourth-quarter turnover is 98.5 percent of that of the average quarter.

Slide 19

Deseasonalising time series values


• The formula for deseasonalising time series values is given as: $\text{Deseasonalised value} = \frac{\text{Actual value}}{\text{Adjusted seasonal index}}$

• The adjusted seasonal index is season-specific.

• Let us deseasonalise the figures.

Slide 20

Deseasonalising time series values cont’d
• Consider the actual values in our recent example for the year 1985. The table below shows the actual values and the ASI. We can compute the deseasonalized figures using the formula (column 4).
• Interpretation: In the 1st quarter of 1985, the turnover would have been 64.5 without the seasonal effects. However, it is 68 due to seasonal influences.
1985  Actual Values  Adjusted Seasonal Index  Deseasonalized figures
I     68  1.05353  64.5
II    62  0.95257  65.09
III   61  1.00917  60.45
IV    63  0.98471  63.98
Slide 21
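Deseasonalising is a single division per observation, as the sketch below shows for the 1985 figures. The adjusted seasonal indices are taken from the table above; the variable names are illustrative.

```python
actual_1985 = [68, 62, 61, 63]                # quarters I-IV
asi = [1.05353, 0.95257, 1.00917, 0.98471]    # adjusted seasonal indices from the table

# Deseasonalised value = actual value / adjusted seasonal index for that quarter.
deseasonalised = [y / s for y, s in zip(actual_1985, asi)]

for quarter, value in zip(("I", "II", "III", "IV"), deseasonalised):
    print(quarter, round(value, 2))
```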

Activity 1
1. The table below shows the number of people who attend morning,
afternoon and evening school for a week.
Monday Tuesday Wednesday Thursday Friday

Morning 122 128 129 123 133


Afternoon 111 112 143 109 103
Evening 116 140 141 125 125

a. Using the method of moving averages and the multiplicative model, determine the seasonal indices for morning, afternoon and evening.
b. Deseasonalize the data.

Slide 22

Activity 2
1. The table below shows the sales of a supermarket in Ghana in
billion cedis.
Quarter
Year First Second Third Fourth
2003 18 80 30 522
2004 24 105 54 527
2005 33 141 48 538
2006 40 150 75 545
2007 35 180 55 550
2008 48 205 70 557

i. Calculate the trend using the method of moving averages.
ii. Calculate the average seasonal index under the multiplicative model.
iii. Deseasonalize the data.
Slide 23
UNIVERSITY OF GHANA
DEPARTMENT OF ECONOMICS

ECON 316
Applied Statistics for Economists

Time Series Analysis 3


Lecturer: Prof. Daniel K. Twerefou
• Contact Information: dktwerefou@ug.edu.gh

Slide 1
Session Overview
By the end of this section, you should be able to
• Explain cyclical variations
• Explain random variations
• Calculate cyclical relative
• Discuss the importance of time series
• Explain forecasting
• Forecast based on trend and seasonal components
• Forecast using simple exponential smoothing

Slide 2
Session Outline
The key topics to be covered in the session are as follows:
• Cyclical Variation
• Cyclical Relative
• Irregular Variation
• Forecasting

Slide 3
Topic 1
What is Cyclical Variation (CV)
• Cyclical variation is the recurrent up-and-down, wavelike variation from trend that lasts about 2-10 years and may repeat again and again.
- For a specific time series, one cycle can take 3 years while another cycle can take about 10 years.
• Example: Business cycle - prosperity, recession, depression and recovery.
• These cycles tend to affect time series values in sales, production, growth, etc.

Slide 4
Example : Business Cycle
• Measuring CV is very difficult. Why?
• Successive cycles may vary widely in timing and pattern and are usually mixed up with irregular factors.

Slide 5
Topic 2
Measuring Cyclical Variation: Cyclical Relative
approach
• Assumption: An annual time series reflects the effects of the T and C components only, because the S and I components are defined as short-run influences which occur within the course of each year.
• Under this assumption the multiplicative model reduces to $Y = T \times C$, or $C = \frac{Y}{T}$, where T is the trend estimate and Y is the actual value.

• The ratio $\frac{Y}{T} \times 100$ is called the cyclical relative and indicates the presence or absence of any cyclical influence on an annual time series value.
Slide 6
Interpretation of Cyclical Relative
• Cyclical relative= 100% implies total absence of any
cyclical influence on the time series.
• Cyclical relative<100% implies there is recession
because what was obtained (actual value – Y) is less
than what was estimated (Trend).
• Cyclical relative>100% implies there is recovery
because what was obtained (actual value or Y) is
greater than what was estimated (Trend).

Slide 7
Example 1
• The table below shows the actual (Y) and Trend (T)
estimate of the production of aluminium ingots. Calculate
the cyclical relative and interpret your result.
Year Y T
1994 700 728
1995 850 799
1996 900 972
1997 1100 1078
1998 1225 1225
1999 1324 1347
2000 1500 1485
2001 1675 1682
2002 1700 1564
Slide 8
Solution to Example 1
Year  Y     T     Cyclical relative (Y/T × 100)
1994  700   728   96.2
1995  850   799   106.4
1996  900   972   92.6
1997  1100  1078  102.0
1998  1225  1225  100.0
1999  1324  1347  98.3
2000  1500  1485  101.0
2001  1675  1682  99.6
2002  1700  1564  108.7
• CR for 1994 is 96.2%. This means that there is cyclical influence (recession), as what was obtained (actual value) is less than what was estimated (trend).
• CR for 1995 is 106.4%, meaning there is a recovery.
Slide 9
Activity 1
• The table below shows the actual (Y), Trend (T) estimate and the
cyclical relative (C) of wheat production in a country. Calculate the
missing values. Interpret your results for 1998, 1999 and 2005.

Year Actual Value(Y) Trend value (T) Cyclical relative (%)


1998 1938 88
1999 1850 100
2000 1900 2242
2001 11100 10978
2002 11225 83
2003 12844 103
2004 15001 86
2005 11208 104
2006 17001 17239
Slide 10
Irregular Variation
• This is the unpredictable or irregular fluctuation in a time series caused by unpredictable changes in weather, unforeseen political activities, among others, which does not repeat at definite intervals.
- Erratic variations or the residue from trend that cannot be ascribed to cyclical or seasonal influences.
- Cyclical variation is also considered erratic; it is mixed with irregular variation and the two cannot easily be separated.

• There is no specific way of measuring irregular variation because of its uncertainty.
Slide 11
Importance of Time Series
1. Helps in understanding past behaviour. Observing data over a long period can help in understanding changes which have taken place over time, and consequently in policy making.
2. Helps in planning for the future. Linked to policy making is the issue of planning, which requires forecasting. With past and present data one can forecast and plan for the future.
3. Helps in evaluating current accomplishments. Keeping data on expected and actual performance can help to evaluate accomplishments.
4. Facilitates comparison. With data on different time series one can easily make comparisons and draw important conclusions.
Slide 12
Topic 4
What is Forecasting?
• Process of systematically estimating future conditions
from past and present conditions.
• Trend equation normally based on annual data is
mostly used in forecasting.
• After that, the data is corrected for any seasonal
influence if it exists.
• After estimating the trend value, we then use the
seasonal Index to estimate the value for a season

Slide 13
Activity 1
• The data below shows quarterly export of a cash crop from a
country from 1995 – 2000 in million metric tons.
I II III IV Total
1995 5 8 9.5 26 48.5
1996 6.5 8.7 10.5 27 52.7
1997 6.8 9 12.1 26.5 54.4
1998 7.5 9.5 12 28 57
1999 7.8 10.5 12.5 29.5 60.3
2000 9.5 11.5 13.5 30 64.5
Question 1
a. Plot the data for all the years
b. Which component of time series is visibly present
c. Calculate the seasonal indices
d. Estimate the annual trend line using the method of least squares
Slide 14
Activity 1 Cont’d
e. Forecast the sales value for the year 2008
f. Forecast the sales value for the first and fourth quarters of
year 2008.
Question 2
i. Estimate the trend line using the method of moving
averages
ii. Forecast the sales value for the year 2008
iii. Forecast the sales value for the first and fourth quarters
of year 2008
iv. Compare the forecast annual and quarterly values for
2008 in question 1 to that of question 2.
Slide 15
Exponential Smoothing
• Forecasting method that applies unequal weights to
time series observations.
• Weighting is done by using a smoothing constant which determines how much weight is attached to each observation.
• Recent observations are given greater weights,
while more distant observations are given
successively smaller weights.

Slide 16
Exponential Smoothing Cont’d
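• Given a smoothing constant, α, the forecast formula is:
$\tilde{y}_{t+1} = \alpha y_t + \alpha(1-\alpha) y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \dots + \alpha(1-\alpha)^k y_{t-k} + \dots$
where
• $\tilde{y}_{t+1}$ - the forecast for the next period
• $\alpha$ - smoothing constant
• $y_t$ - actual value for the most recent period
• $y_{t-1}$ - actual value for the period preceding the most recent period
• $y_{t-k}$ - actual value for the kth period preceding the most recent period.
• Use of this formula is difficult.
Slide 17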
Exponential Smoothing Cont’d
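• Operational formula used: $\tilde{y}_{t+1} = \tilde{y}_t + \alpha (y_t - \tilde{y}_t) = \tilde{y}_t + \alpha e_t$

• $\tilde{y}_{t+1}$ - forecast for the next period
• $\tilde{y}_t$ - forecast for the most recent period
• $y_t$ - actual value for the most recent period
• $e_t = y_t - \tilde{y}_t$ is an error term.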

Slide 18
Challenge:
• How do you choose α?
• When α is close to zero, the most recent forecast error is given a low weight, so the forecast changes slowly.
• If α is chosen close to 1, the most recent forecast error is weighted heavily, so the forecast follows the most recent result closely.
• Limitation: It can be used for only a one-period-ahead forecast, because you need the actual value before you can calculate the next forecast.
Slide 19
Example 2
• A shopping mall experienced the following monthly sales
for the first four months of the year, in thousands of units.

January February March April

23.3 72.3 30.3 15.5

• If the forecast for January was 25,000 units (25 in thousands), determine the forecasts for February through May using exponential smoothing with a smoothing constant of α = 0.15.

Slide 20
Solution to Example 2
• Formula for simple exponential smoothing is given as: $\tilde{y}_{t+1} = \tilde{y}_t + \alpha (y_t - \tilde{y}_t)$
• α = 0.15

Month     Actual sales ($y_t$)  Recent forecast ($\tilde{y}_t$)  Error ($e_t = y_t - \tilde{y}_t$)  Next sales forecast ($\tilde{y}_{t+1}$)
January   23.3  25      -1.7     24.745
February  72.3  24.745  47.555   31.878
March     30.3  31.878  -1.578   31.642
April     15.5  31.642  -16.142  29.220
May       -     29.220  -        -
Slide 21
Solution to Example 2 Cont’d
• The 2nd column of the table is the actual sales.
• The actual and forecast (column 3) values for January are given, so we can
easily calculate the forecast error for January (column 4) as:
$e_{Jan} = 23.3 - 25 = -1.7$
• The February forecast follows from the exponential smoothing
formula as: $\tilde{y}_{Feb} = 25 + 0.15(-1.7) = 24.745$
• Also, using the formula for March, we have: $\tilde{y}_{Mar} = 24.745 + 0.15(47.555) = 31.878$

• The results also show that the forecast for May is 29.220 thousand units, that is, 29,220 units.
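The recursion used in Example 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `exponential_smoothing` and the variable names are assumptions made for this example, not part of the lecture.

```python
def exponential_smoothing(actuals, first_forecast, alpha):
    """Return the forecasts for every period, plus one period ahead.

    Applies: next forecast = forecast + alpha * (actual - forecast).
    """
    forecasts = [first_forecast]
    for actual in actuals:
        previous = forecasts[-1]
        forecasts.append(previous + alpha * (actual - previous))
    return forecasts


# Sales for January to April (thousands of units), January forecast of 25.
sales = [23.3, 72.3, 30.3, 15.5]
forecasts = exponential_smoothing(sales, first_forecast=25.0, alpha=0.15)

months = ["January", "February", "March", "April", "May"]
for month, forecast in zip(months, forecasts):
    print(f"{month}: {forecast:.3f}")
```

Because each forecast needs the previous period's actual value, the list stops one period after the last observation, which is the limitation noted earlier.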

Slide 22
Activity 2
• The table below shows the monthly export of cocoa for
the first five months of the year in thousand metric tons.

January February March April May

15.2 30.9 41 18.3 36.2

• If the forecast for January is 20,000 metric tons,


determine forecasts for February through June using
exponential smoothing with a smoothing constant of
α=0.17
Slide 23
ECON 316
Applied Statistics for Economists

Lecture 7 – Construction of Weighted (Aggregated)


Indices

Lecturer: Prof. Daniel K. Twerefou,


Department of Economics
• Contact Information: dktwerefou@ug.edu.gh

Slide 1
Outline
• Construction of the Paasche index
• Construction of the Fisher ideal index

Slide 2
What is the Paasche Price Index?
• An aggregated index that measures the percentage ratio of the
prices of a group of commodities comparing two time periods -
base year and the current year. However, the prices of the
commodities are weighted by the current year quantities.
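Mathematically, the Paasche price index is defined as:

$I_p(P) = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100$

where $p_0$, $q_0$ are the base year and $p_n$, $q_n$ the current year prices and quantities.

• The Paasche quantity and value indices can be expressed
mathematically as:

$I_q(P) = \frac{\sum p_n q_n}{\sum p_n q_0} \times 100$ and $I_v(P) = \frac{\sum p_n q_n}{\sum p_0 q_0} \times 100$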

Slide 3
Example 3
The table below shows the average annual prices (in cedis) and the
quantities (in hectoliters) of oils produced in a country.

Using 2002 as the base year


a. Calculate the Paasche quantity index, price index and
value index
b. Interpret your results for 2006.

Slide 4
Example 3
                  2002             2003             2004             2005             2006
Items             Price  Quantity  Price  Quantity  Price  Quantity  Price  Quantity  Price  Quantity
Palm oil          25     11200     27.50  11424     27.78  11652     30.55  12002     35.14  13442
Groundnut oil     30     11300     36.00  11752     36.72  12222     44.06  12711     52.88  14363
Shea butter       15     11100     15.75  11433     16.54  11776     17.36  12129     21.71  12493
Coconut oil       11     14000     15.40  14280     16.02  14566     22.42  14857     33.63  15154
Palm kernel oil   9      12700     9.45   13081     9.92   13473     11.41  13878     15.40  14294
Animal fat        8      12300     10.40  12915     13.52  13561     17.58  14103     22.85  14808
Slide 5
Solution to Example 3
• A table for 2003 will look like:
Items             p2003   q2003    p2003 q2003    q2002    p2003 q2002
Palm oil          27.50   11,424   314,160        11,200   308,000
Groundnut oil     36.00   11,752   423,072        11,300   406,800
Shea butter       15.75   11,433   180,069.75     11,100   174,825
Coconut oil       15.40   14,280   219,912        14,000   215,600
Palm kernel oil   9.45    13,081   123,615.45     12,700   120,015
Animal fat        10.40   12,915   134,316        12,300   127,920
Total                              1,395,145.2             1,353,160

Slide 6
Solution to Example 3
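• a. Recall: $I_q(P) = \frac{\sum p_n q_n}{\sum p_n q_0} \times 100$

We calculate the index for 2003 as follows:

$I_q(P)_{2003} = \frac{1,395,145.2}{1,353,160} \times 100 = 103.10$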

Slide 7
Solution to Example 3 Cont’d
• For 2004
Items             p2004   q2004    p2004 q2004    q2002    p2004 q2002
Palm oil          27.78   11,652   323,692.56     11,200   311,136
Groundnut oil     36.72   12,222   448,791.84     11,300   414,936
Shea butter       16.54   11,776   194,775.04     11,100   183,594
Coconut oil       16.02   14,566   233,347.32     14,000   224,280
Palm kernel oil   9.92    13,473   133,652.16     12,700   125,984
Animal fat        13.52   13,561   183,344.72     12,300   166,296
Total                              1,517,603.64            1,426,226
• We calculate the index for 2004 as: $I_q(P)_{2004} = \frac{1,517,603.64}{1,426,226} \times 100 = 106.41$

Slide 8
Solution to Example 3 Cont’d
• For the year 2005
Items             p2005   q2005    p2005 q2005    q2002    p2005 q2002
Palm oil          30.55   12,002   366,661.10     11,200   342,160
Groundnut oil     44.06   12,711   560,046.66     11,300   497,878
Shea butter       17.36   12,129   210,559.44     11,100   192,696
Coconut oil       22.42   14,857   333,093.94     14,000   313,880
Palm kernel oil   11.41   13,878   158,347.98     12,700   144,907
Animal fat        17.58   14,103   247,930.74     12,300   216,234
Total                              1,876,639.86            1,707,755
• We compute the index for 2005 as: $I_q(P)_{2005} = \frac{1,876,639.86}{1,707,755} \times 100 = 109.89$

Slide 9
Solution to Example 3 Cont’d
• 2006
Items             p2006   q2006    p2006 q2006    q2002    p2006 q2002
Palm oil          35.14   13,442   472,351.88     11,200   393,568
Groundnut oil     52.88   14,363   759,515.44     11,300   597,544
Shea butter       21.71   12,493   271,223.03     11,100   240,981
Coconut oil       33.63   15,154   509,629.02     14,000   470,820
Palm kernel oil   15.40   14,294   220,127.60     12,700   195,580
Animal fat        22.85   14,808   338,362.80     12,300   281,055
Total                              2,571,209.77            2,179,548

• We compute the index for 2006 as: $I_q(P)_{2006} = \frac{2,571,209.77}{2,179,548} \times 100 = 117.97$

Slide 10
Solution to Example 3 Cont’d
• The table summarizes the Paasche quantity index.

Year       2002   2003     2004     2005     2006
$I_q(P)$   100    103.10   106.41   109.89   117.97

• The Paasche quantity index for 2006 is 117.97. This shows a net
increase in quantities of 17.97% compared to that of the base
year.

Slide 11
Solution to Example 3 Cont’d
The Paasche price index is computed from the following base year values of current quantities:

                  2003          2004          2005          2006
Items             p2002 q2003   p2002 q2004   p2002 q2005   p2002 q2006
Palm oil          285,600       291,300       300,050       336,050
Groundnut oil     352,560       366,660       381,330       430,890
Shea butter       171,495       176,640       181,935       187,395
Coconut oil       157,080       160,226       163,427       166,694
Palm kernel oil   117,729       121,257       124,902       128,646
Animal fat        103,320       108,488       112,824       118,464
Total             1,187,784     1,224,571     1,264,468     1,368,139
Slide 12
Solution to Example 3 Cont’d
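• Putting the summations into the Paasche price formula $I_p(P) = \frac{\sum p_n q_n}{\sum p_{2002} q_n} \times 100$, we obtain:
• 2003: $\frac{1,395,145.2}{1,187,784} \times 100 = 117.46$

• 2004: $\frac{1,517,603.64}{1,224,571} \times 100 = 123.93$

• 2005: $\frac{1,876,639.86}{1,264,468} \times 100 = 148.41$

• 2006: $\frac{2,571,209.77}{1,368,139} \times 100 = 187.93$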
Slide 13
Solution to Example 3 Cont’d
• The table summarizes the results
Year       2002   2003     2004     2005     2006
$I_p(P)$   100    117.46   123.93   148.41   187.93

• The Paasche price index for 2006 shows an average increase in
prices of about 87.93% compared to the base year. This is obtained
by subtracting 100 from 187.93.
Slide 14
Solution to Example 3 Cont’d
• The Paasche value index is calculated from the following values:

                  2002          2003          2004           2005           2006
Items             p2002 q2002   p2003 q2003   p2004 q2004    p2005 q2005    p2006 q2006
Palm oil          280,000       314,160       323,692.56     366,661.10     472,351.88
Groundnut oil     339,000       423,072       448,791.84     560,046.66     759,515.44
Shea butter       166,500       180,069.75    194,775.04     210,559.44     271,223.03
Coconut oil       154,000       219,912       233,347.32     333,093.94     509,629.02
Palm kernel oil   114,300       123,615.45    133,652.16     158,347.98     220,127.60
Animal fat        98,400        134,316       183,344.72     247,930.74     338,362.80
Total             1,152,200     1,395,145.2   1,517,603.64   1,876,639.86   2,571,209.77

Slide 15
Solution to Example 3 Cont’d
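• Putting the summations into the Paasche value formula $I_v(P) = \frac{\sum p_n q_n}{\sum p_{2002} q_{2002}} \times 100$, we obtain:

2003: $\frac{1,395,145.2}{1,152,200} \times 100 = 121.09$

2004: $\frac{1,517,603.64}{1,152,200} \times 100 = 131.71$

2005: $\frac{1,876,639.86}{1,152,200} \times 100 = 162.87$

2006: $\frac{2,571,209.77}{1,152,200} \times 100 = 223.16$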

Slide 16
Solution to Example 3 Cont’d
• The table summarizes the results

Year       2002   2003     2004     2005     2006
$I_v(P)$   100    121.09   131.71   162.87   223.16

• For the year 2006, the Paasche value index was 223.16. This
shows a net increase in value of about 123.16% compared
to the base year.
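As a check on the 2003 figures, the three Paasche indices can be computed directly from the price and quantity columns. The sketch below is illustrative only; the helper names `weighted_sum` and `paasche_indices` are assumptions made for this example.

```python
# Oil data from Example 3: base year 2002 (p0, q0), current year 2003 (pn, qn).
# Order: palm oil, groundnut oil, shea butter, coconut oil, palm kernel oil, animal fat.
p0 = [25, 30, 15, 11, 9, 8]
q0 = [11200, 11300, 11100, 14000, 12700, 12300]
pn = [27.50, 36.00, 15.75, 15.40, 9.45, 10.40]
qn = [11424, 11752, 11433, 14280, 13081, 12915]


def weighted_sum(prices, quantities):
    """Sum of price times quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def paasche_indices(p0, q0, pn, qn):
    """Paasche price, quantity and value indices (base year = 100)."""
    current_value = weighted_sum(pn, qn)
    return {
        "price": current_value / weighted_sum(p0, qn) * 100,
        "quantity": current_value / weighted_sum(pn, q0) * 100,
        "value": current_value / weighted_sum(p0, q0) * 100,
    }


indices = paasche_indices(p0, q0, pn, qn)
for name, value in indices.items():
    print(f"Paasche {name} index for 2003: {value:.2f}")
```

Replacing `pn` and `qn` with the columns for a later year gives the index for that year against the same 2002 base.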

Slide 17
Activity 3
• The data below shows the prices (in cedis) and imports (in million
barrels) of some fuels from 2003-2007.
2003 2004 2005 2006 2007
Items Price Quantity Price Quantity Price Quantity Price Quantity Price Quantity
Gasoline 20 2220 22.00 2264.4 22.22 2310 24.44 2379 28.11 2664
Diesel 19 2730 22.80 2839.2 23.26 2953 27.91 3071 33.49 3470
RFO 12 1210 12.60 1246.3 13.23 1284 13.89 1322 17.36 1362
LCO 8 1400 11.20 1428 11.65 1457 16.31 1486 24.46 1515
Kerosene 14 1210 14.70 1246.3 15.44 1284 17.75 1322 23.96 1362
Premix 15 900 21.00 918 21.84 936 30.58 955 45.86 974

• Using 2003 as the base year, calculate the


i. Paasche price index
ii. Paasche quantity index
iii. Paasche value index
Slide 18
ECON 316
Applied Statistics for Economists

Session 3 – Construction of Weighted Indices
and Consumer Price Index

Lecturer: Dr. Daniel K. Twerefou, Department of Economics


• Contact Information: dktwerefou@ug.edu.gh

Slide 19
Comparison of Laspeyres & Paasche Price
Indices
• The Laspeyres price index (LPI) measures change in a fixed
market basket of goods and services (same quantities used
in each period) while the Paasche index continually updates
the quantities to the levels of current consumption.
• LPI is generally expected to overestimate or have an upward
bias.
- All things being equal, when prices increase there is usually
a reduction in the consumption of those items for which the
increase has been the most pronounced and, hence, by using
base year quantities we will be giving too much weight to the
prices of the items whose prices have increased.
Slide 20
Comparison of Laspeyres & Paasche
Price Indices
• Paasche price index (PPI) tends to underestimate or
show a downward bias.

- People tend to buy less of goods whose prices
are rising. Hence the use of current quantities as weights
produces an index which tends to underestimate the rise
in prices.

Slide 21
Comparison of Laspeyres & Paasche Indices
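• The value index is the product of a price index and a quantity index of the opposite type:

$I_v = \frac{I_p(L) \times I_q(P)}{100}$

Or

$I_v = \frac{I_p(P) \times I_q(L)}{100}$

• These formulae help us to derive the quantity indices from the value
indices. This process is known as deflating a series.

$I_q(P) = \frac{I_v}{I_p(L)} \times 100$ and $I_q(L) = \frac{I_v}{I_p(P)} \times 100$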
Slide 22
The Fisher Ideal Price Index

• Both the Laspeyres and Paasche indices have biases
which can mislead policy makers, especially in the area of
forecasting.
• The Fisher index attempts to correct the positive bias of
the Laspeyres and the negative bias of the Paasche index
by determining a composite index which is the geometric
mean of the two indices.
• Mathematically, the Fisher ideal index is expressed as:

$I_p(F) = \sqrt{I_p(L) \times I_p(P)}$

Slide 23
Example 1
The table below shows the prices and quantities of 4
commodities in 1988 and 1989.

            1988               1989
Commodity   Price   Quantity   Price   Quantity
A           2       8          4       6
B           5       10         6       5
C           4       14         5       10
D           2       19         2       13

Using 1988 as the base year, calculate the
i. Laspeyres price index
ii. Paasche price index
iii. Fisher’s Ideal index
Slide 24
Solution to Example 1

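• The Laspeyres price index: $I_p(L) = \frac{\sum p_{1989} q_{1988}}{\sum p_{1988} q_{1988}} \times 100 = \frac{32 + 60 + 70 + 38}{16 + 50 + 56 + 38} \times 100 = \frac{200}{160} \times 100 = 125$

• The Paasche price index: $I_p(P) = \frac{\sum p_{1989} q_{1989}}{\sum p_{1988} q_{1989}} \times 100 = \frac{24 + 30 + 50 + 26}{12 + 25 + 40 + 26} \times 100 = \frac{130}{103} \times 100 = 126.21$

• The Fisher’s Ideal index: $I_p(F) = \sqrt{125 \times 126.21} = 125.61$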
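The three indices asked for in Example 1 can be reproduced with a short script. This is a sketch for illustration; the helper name `weighted_sum` and the variable names are assumptions, not notation from the lecture.

```python
from math import sqrt

# Example 1 data: commodities A, B, C, D; base year 1988, current year 1989.
p0 = [2, 5, 4, 2]     # 1988 prices
q0 = [8, 10, 14, 19]  # 1988 quantities
pn = [4, 6, 5, 2]     # 1989 prices
qn = [6, 5, 10, 13]   # 1989 quantities


def weighted_sum(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


# Laspeyres weights prices by base year quantities.
laspeyres = weighted_sum(pn, q0) / weighted_sum(p0, q0) * 100
# Paasche weights prices by current year quantities.
paasche = weighted_sum(pn, qn) / weighted_sum(p0, qn) * 100
# Fisher is the geometric mean of the two.
fisher = sqrt(laspeyres * paasche)

print(f"Laspeyres: {laspeyres:.2f}")
print(f"Paasche:   {paasche:.2f}")
print(f"Fisher:    {fisher:.2f}")
```

The Fisher value always lies between the Laspeyres and Paasche values, which is what makes it a compromise between the two.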

Slide 25
Activity 1
• The data below show the prices (in cedis) and imports (in million
barrels) of some fuels from 2003-2007.
2003 2004 2005 2006 2007

Items Price Quantity Price Quantity Price Quantity Price Quantity Price Quantity
Gasoline 20 2220 22.00 2264.4 22.22 2310 24.44 2379 28.11 2664
Diesel 19 2730 22.80 2839.2 23.26 2953 27.91 3071 33.49 3470
RFO 12 1210 12.60 1246.3 13.23 1284 13.89 1322 17.36 1362
LCO 8 1400 11.20 1428 11.65 1457 16.31 1486 24.46 1515
Kerosene 14 1210 14.70 1246.3 15.44 1284 17.75 1322 23.96 1362
Premix 15 900 21.00 918 21.84 936 30.58 955 45.86 974
1. Using 2003 as the base year calculate the
i. Laspeyres price Index
ii. Paasche price Index and hence
iii. Fisher’s Ideal Index
2. Interpret your results for 2006 in i, ii, and iii.
Slide 26
The Consumer Price Index

• The Consumer Price Index (CPI) measures the change in


prices for a fixed market basket of goods and services
from one period to another.
• In constructing the consumer price index, the universe or
group of commodities to be included should be defined,
then a sample is chosen to represent the universe.
• The aggregate expenditure (Laspeyres) is used.
• Mathematically,

Slide 27
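Formula for CPI

• $CPI = \frac{\sum_{i=1}^{r} p_{ni} q_{0i}}{\sum_{i=1}^{r} p_{0i} q_{0i}} \times 100$

• Let $w_{0i} = \frac{p_{0i} q_{0i}}{\sum_{i=1}^{r} p_{0i} q_{0i}}$ be the weights (expenditure
shares) of the commodities.
• Then

$CPI = \sum_{i=1}^{r} \frac{p_{ni}}{p_{0i}} \times w_{0i} \times 100$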
Slide 28
Example 2
• The table below shows the prices (in cedis) and
quantities (in kilograms) of the basket of goods used in
the calculation of CPI in a country.

a. Construct the CPI for the years 2003 to 2007


b. Interpret your results in 2007

Slide 29
Example 2
Items | 2003 Price | 2003 Qty | 2004 Price | 2005 Price | 2006 Price | 2007 Price
Cereals 15 22 17 17 18 21
Tubers 6 27 7 7 9 11
Plantain 8 12 8 9 9 12
Meat 11 14 15 16 22 34
Fruits 21 12 22 23 27 36
Vegetables 9 9 13 13 18 28
pulses & nuts 21 22 23 23 26 30
Beverages 12 27 14 15 18 21
Eggs 5 14 7 7 10 15
Milk 9 12 9 10 11 15
Poultry 15 10 21 22 31 46
Slide 30
Solution to Example 2
• Given $CPI = \frac{\sum p_{ni} q_{0i}}{\sum p_{0i} q_{0i}} \times 100$, the calculations of $p_{ni} q_{0i}$ are presented in the table

Items                                     2003   2004   2005   2006   2007
Cereals                                   330    363    367    367    455
Tubers                                    162    194    198    198    286
Plantain                                  96     101    106    106    139
Meat                                      154    216    224    224    471
Fruits                                    252    265    278    278    431
Vegetables                                81     113    118    118    248
Pulses & nuts                             462    508    513    513    649
Beverages                                 324    389    397    397    571
Eggs                                      70     98     102    102    214
Milk                                      108    113    119    119    185
Poultry                                   150    210    218    218    459
Total ($\sum_{i=1}^{r} p_{ni} q_{0i}$)    2189   2570   2640   2640   4108
Slide 31
Solution to Example 2 Cont’d
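The CPI is computed by dividing each year's total by the base year (2003) total, for example:

$CPI_{2004} = \frac{2570}{2189} \times 100 \approx 117$ and $CPI_{2007} = \frac{4108}{2189} \times 100 \approx 188$

The table summarizes the results

Year   2003   2004   2005   2006   2007
CPI    100    117    121    121    188

b. The CPI for 2007 is 188%. This means that, on average, the prices of the
basket of goods have gone up by 88% compared to the base year's prices.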
Slide 32
Uses of the CPI- Calculating Inflation
• Inflation is a persistent increase in the general price level
while deflation is a persistent decrease in the general
price level.
• Mathematically, inflation for year t is computed as:
$Inflation_t = \frac{CPI_t - CPI_{t-1}}{CPI_{t-1}} \times 100$, where

$CPI_{t-1}$ is the CPI for year t-1

$CPI_t$ is the CPI for year t
• If the value is greater than zero, then there is inflation; if
less than zero, then there is deflation.
Slide 33
Example 3

An enquiry into the expenditure of middle class families in a
certain city gave the following information. Construct the CPI
using the weights provided and 1985 as the base and
interpret your results for 1986.

Items                  Food   Fuel   Clothing   Rent   Miscellaneous
Weight                 35%    10%    20%        15%    20%
Prices (1985) (GH¢)    150    25     75         30     40
Prices (1986) (GH¢)    145    23     65         30     45
Slide 34
Solution
Year   Food                      Fuel                   Clothing               Rent                    Miscellaneous         Total amount
1985   (35/100) × 150 = 52.5     (10/100) × 25 = 2.5    (20/100) × 75 = 15     (15/100) × 30 = 4.5     (20/100) × 40 = 8     82.5
1986   (35/100) × 145 = 50.75    (10/100) × 23 = 2.3    (20/100) × 65 = 13     (15/100) × 30 = 4.5     (20/100) × 45 = 9     79.55
Slide 35
Solution cont’d
CPI for the base year is always 100%. Thus

$CPI_{1985} = \frac{82.5}{82.5} \times 100 = 100\%$

$CPI_{1986} = \frac{79.55}{82.5} \times 100 = 96.42\%$

Interpretation: Compared to 1985, the expenditure
of middle class families went down by 3.58
percent in 1986.
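The weighted totals in this solution can be reproduced as follows. This is a minimal sketch; the dictionary layout and the function name `weighted_total` are assumptions made for this example.

```python
# Expenditure weights (percent) and prices (GH¢) from Example 3.
weights = {"Food": 35, "Fuel": 10, "Clothing": 20, "Rent": 15, "Miscellaneous": 20}
prices_1985 = {"Food": 150, "Fuel": 25, "Clothing": 75, "Rent": 30, "Miscellaneous": 40}
prices_1986 = {"Food": 145, "Fuel": 23, "Clothing": 65, "Rent": 30, "Miscellaneous": 45}


def weighted_total(weights, prices):
    """Sum of (weight / 100) * price over all expenditure items."""
    return sum(weights[item] / 100 * prices[item] for item in weights)


base_total = weighted_total(weights, prices_1985)
current_total = weighted_total(weights, prices_1986)
cpi_1986 = current_total / base_total * 100

print(f"1985 total: {base_total:.2f}")
print(f"1986 total: {current_total:.2f}")
print(f"CPI 1986 (1985 = 100): {cpi_1986:.2f}")
print(f"Change relative to 1985: {cpi_1986 - 100:.2f}%")
```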
Slide 36
Uses of the CPI-inflation calculation cont’d

The table below shows the CPI for a country from
1995 to 2002.
a. Calculate the level of inflation/deflation for each
year
b. Interpret your results for the year 2002.

Year   1995   1996   1997   1998   1999   2000   2001   2002
CPI    112    106    100    105    110    116    120    118

Slide 37
Solution to Example 3
• Inflation is computed for 2002 as: $\frac{118 - 120}{120} \times 100 = -1.7\%$

• The summary of the results is provided in the table below.

Year            1995   1996   1997   1998   1999   2000   2001   2002
Inflation (%)   -      -5.4   -5.7   5.0    4.8    5.5    3.4    -1.7

b. Prices of commodities in 2002 have declined by 1.7% compared to
those of 2001.
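The year-on-year rates in the summary table follow from applying the inflation formula to each pair of consecutive years. A minimal sketch, with variable names chosen only for this example:

```python
# CPI series from the example, keyed by year.
cpi = {1995: 112, 1996: 106, 1997: 100, 1998: 105,
       1999: 110, 2000: 116, 2001: 120, 2002: 118}

years = sorted(cpi)
# No rate can be computed for the first year because it has no predecessor.
inflation = {
    year: (cpi[year] - cpi[previous]) / cpi[previous] * 100
    for previous, year in zip(years, years[1:])
}

for year, rate in inflation.items():
    label = "inflation" if rate > 0 else "deflation"
    print(f"{year}: {rate:.1f}% ({label})")
```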
Slide 38
Uses of CPI - Deflating
• Real income: income of an individual, organization or a
country, after taking into consideration the effects of
inflation
• The process of calculating real income from current
(nominal) income is called deflating.
• Mathematically, if RI is real income and NI is nominal
income, then: $RI = \frac{NI}{CPI} \times 100$

Slide 39
Uses of CPI – Deflating -Example 4
The CPI of a country is 300 % in 2005 (2000=100). If an
employee earned ¢ 7000 in 2000 and ¢ 21,000 in 2005
a. Calculate his real income in 2005
b. Comment on the welfare of the employee

Slide 40
Solution to Example 4
• Recall: $RI = \frac{NI}{CPI} \times 100$

• Thus $RI_{2005} = \frac{21,000}{300} \times 100 = ¢ 7,000$

b. Although the employee’s nominal income tripled over
the period, his standard of living or purchasing power is
the same as it was in 2000.
Slide 41
Uses of Index Numbers – purchasing
power
• Related to the calculation of real income is the calculation of
purchasing power of a currency or income.
• Rise in price levels means a reduction in the purchasing power
of money or the amount of goods and services that you can
buy.
• Suppose the price of rice increases from ¢ 10 per bag in 2005
to ¢ 20 per bag in 2008. This means that in 2008 if you had ¢
10, you could buy only half bag of rice.
• Mathematically, if PP is the purchasing power of the cedi,
then: $PP = \frac{1}{CPI} \times 100$

Slide 42
Uses of Index Numbers – purchasing
power : Example 5
• The CPI for 2005 is 200 (1987 =100). Calculate the Purchasing
Power of the cedi in 2005 and interpret your results.
Solution
Recall: $PP = \frac{1}{CPI} \times 100$

• Thus, $PP_{2005} = \frac{1}{200} \times 100 = 0.5$

• The purchasing power of the cedi has reduced by half.
• That is, if you lost GH¢ 1 in 1987 and found it in 2005, the GH¢ 1
could buy in 2005 only half of what it could have bought in
1987.
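Both deflating calculations (Examples 4 and 5) reduce to one-line formulas. The sketch below is illustrative; the function names `real_income` and `purchasing_power` are assumptions made for this example.

```python
def real_income(nominal_income, cpi):
    """Deflate nominal income by a CPI expressed with base year = 100."""
    return nominal_income / cpi * 100


def purchasing_power(cpi):
    """Purchasing power of one unit of currency relative to the base year."""
    return 1 / cpi * 100


# Example 4: nominal income of 21,000 in 2005 with CPI = 300 (2000 = 100).
ri_2005 = real_income(21_000, 300)
# Example 5: CPI = 200 in 2005 (1987 = 100).
pp_2005 = purchasing_power(200)

print(f"Real income in 2005: {ri_2005:.0f}")
print(f"Purchasing power of the cedi in 2005: {pp_2005:.2f}")
```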
Slide 43
Activity 2
• Use the table below to answer the questions that
follow.
Year CPI (2000 = 100) Ama’s monthly income (GH¢)

2000 100.0 200


2005 145.0 1,200
2010 480.0 10,000

a. What is the purchasing power of the cedi for 2005


based on 2000?
b. Determine Ama’s real income for (i) 2005 and (ii)
2010.
Slide 44
Uses of Index Numbers
• They help in framing suitable policies. For instance, employers
and employees in the labour market use the CPI to make wage
adjustments.
• They reveal trends and tendencies. Index numbers measure
changes over a period of time. Thus, trends could be examined to
draw important conclusions as to how much change is taking place
and to identify the strength of its effects.
• They are important in forecasting future economic activity.
Index numbers make use of time series data and hence can aid
decision makers to formulate policies to keep pace with changes in
economic activity.
• They are used in deflating i.e. adjusting the original data for price
changes.
Slide 45
Limitations of Index Numbers
• Useful only when data is expressed quantitatively. This
means it is less important when qualitative data is involved.
However, some sociologists have a way of applying index
numbers using qualitative data.
• Index numbers based on sample may not necessarily be
the true representative of the actual state of affairs.
• It gives rough relative changes. Thus, accuracy cannot be
ascertained when samples of commodities are used (e.g.
CPI).
• Difficulty in choosing the method. There are several
methods and sometimes it is very difficult to know which
method should be used.
Slide 46
ECON 316
Applied Statistics for Economists

Session 8 – Linear Regression Model

Lecturer: Prof. Daniel K. Twerefou & Dr. Emmanuel Abbey


Department of Economics
Contact Information: dktwerefou@ug.edu.gh

Slide 1
What is a simple linear regression model
• Regression analysis studies the nature of the relationship
between variables and thereby provides a mechanism for
prediction.
• The simplest bivariate econometric model is the linear
model, which assumes that the relationship between two
variables can be summarized by a straight line of the
form:

$Y_i = a + bX_i + \varepsilon_i$

• Here, $X_i$ is the independent variable or the predictor for
the ith observation, $Y_i$ is the dependent variable and $\varepsilon_i$ is
the error term.
Slide 2
Simple Regression Model
• For example, we can talk about the regression of the
supply of a commodity (Y) on the price (X).
• The additive part $a + bX$ describes a linear function relating
X to Y, and $\varepsilon$ is the disturbance term which explains the
stochastic part.
• $\varepsilon$ is added because the specification of the relationship
between X and Y is considered imperfect.
• a is the value of Y when X is zero while b describes the
slope.
• The value of b denotes the number of units that Y
changes when X changes by 1 unit.
Slide 3
Simple Regression Model Cont’d
• The mean/expected value of the random error $\varepsilon$ is
zero, or $E(\varepsilon) = 0$.
- The factors other than X that have not been included, for
which $\varepsilon$ is introduced, can be negative or positive. But the
positive ones will cancel the negative ones and the net will
be zero.
• Hence, the expected value of the variable Y is equal to
the linear function, or $E(Y) = a + bX$, since $E(\varepsilon) = 0$.
• a and b are the parameters of the model.
• Let
Slide 4
Assumptions about the Population Model
• Consider the regression model: $Y = a + bX + \varepsilon$
• Assumption 1: The relationship between x and y is linear
and the expected value of the error term is zero. That is,
$E(\varepsilon) = 0$, so $E(Y) = a + bX$.
• Assumption 2: The variance of the error term is constant,
say $\sigma^2$, for all values of X.
• Assumption 3: The error term is normally distributed
with a mean of 0 and a constant variance $\sigma^2$ for
all values of X.
• Assumption 4: The values of the error term do not have
any relationship with X.
Slide 5
Derivation of Least Squares Parameters
• The estimated trend relationship between X and Y can be
expressed mathematically as: $Y' = a + bX$
• a and b can be estimated using the method of least
squares.
• Method of least squares: The trend line should be drawn
through the plotted points (actual values) in such a way
that the sum of the squared deviations of the actual values (Y) from
the estimated values ($Y'$) is the least.
• Implication: Minimize $\sum (Y - Y')^2$ with respect to
a and b,
where $Y' = a + bX$.
Slide 6
Derivation of Least Squares
Parameters
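• Our objective function is

$S = \sum (Y - a - bX)^2$

• Differentiating partially with respect to a and b and
equating to zero, we obtain:

$\frac{\partial S}{\partial a} = -2 \sum (Y - a - bX) = 0$   (1)

$\frac{\partial S}{\partial b} = -2 \sum X (Y - a - bX) = 0$   (2)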
Slide 7
Derivation of Least Squares
Parameters
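• Simplifying and expanding Equations (1) & (2), we obtain:
$\sum Y - \sum a - b \sum X = 0$ and $\sum XY - a \sum X - b \sum X^2 = 0$
• Or, since $\sum a = na$:

• $\sum Y = na + b \sum X$ ……….. (3)

• $\sum XY = a \sum X + b \sum X^2$ ………(4)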

Slide 8
Derivation of Least Squares
Parameters
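Solve Equations (3) and (4) simultaneously using Cramer’s rule. In matrix form:

$\begin{pmatrix} n & \sum X \\ \sum X & \sum X^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum Y \\ \sum XY \end{pmatrix}$, with determinant $|A| = n \sum X^2 - (\sum X)^2$

Slide 9
Derivation of Least Squares
Parameters
• The special determinants of a and b are given as:

$|A_a| = \begin{vmatrix} \sum Y & \sum X \\ \sum XY & \sum X^2 \end{vmatrix} = \sum Y \sum X^2 - \sum X \sum XY$

$|A_b| = \begin{vmatrix} n & \sum Y \\ \sum X & \sum XY \end{vmatrix} = n \sum XY - \sum X \sum Y$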

Slide 10
Deriving the Least Squares Formula
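• We obtain the values of a and b as follows:

$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n \sum X^2 - (\sum X)^2}$ and $b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}$

• These formulae are clumsy and the computational
formula is usually used.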
Slide 11
Deriving the Least Squares Formula
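• Consider:

$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}$

• Multiply and divide the 2nd term of both the numerator
and the denominator by $n^2$:

$b = \frac{n \sum XY - n^2 \frac{\sum X}{n} \frac{\sum Y}{n}}{n \sum X^2 - n^2 \left(\frac{\sum X}{n}\right)^2}$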

Slide 12
Deriving the Least Squares Formula
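• Or, dividing the numerator and the denominator by n:

$b = \frac{\sum XY - n \frac{\sum X}{n} \frac{\sum Y}{n}}{\sum X^2 - n \left(\frac{\sum X}{n}\right)^2}$

But $\bar{X} = \frac{\sum X}{n}$ and $\bar{Y} = \frac{\sum Y}{n}$

are the means of the observations.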


Slide 14
Deriving the Least Squares Formula
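• Substituting this into the value of b, we obtain:

$b = \frac{\sum XY - n \bar{X} \bar{Y}}{\sum X^2 - n \bar{X}^2}$

• Now that b is known, we can put it in Equation (3) and make a the
subject as follows:

$a = \frac{\sum Y}{n} - b \frac{\sum X}{n} = \bar{Y} - b \bar{X}$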

Slide 15
Deriving the Least Squares Formula
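• Summarizing the formulae, we have:

• $b = \frac{\sum XY - n \bar{X} \bar{Y}}{\sum X^2 - n \bar{X}^2}$ and

• $a = \bar{Y} - b \bar{X}$

• The next step is to obtain the line of best fit by inserting
the values of a and b into $Y' = a + bX$.
• We have been talking about the regression of Y on X.
We can also have a regression of X on Y, in which the roles of X and Y are interchanged.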

Slide 16
Example 1
The table below shows the price and the quantity supplied of a commodity.
a. Find the regression of quantity supplied (Y) on price (X)
b. Plot the actual points and the regression line on the same graph

Price (₵)   Qty supplied (kg)
61          105
62          120
63          120
65          160
65          120
68          145
69          175
70          160
72          185
75          210
Slide 17
Solution to Example 1
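• We need to find the values of a and b by using the least
squares formulae:

• $b = \frac{\sum XY - n \bar{X} \bar{Y}}{\sum X^2 - n \bar{X}^2}$ and $a = \bar{Y} - b \bar{X}$

• And later the least squares equation: $Y' = a + bX$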

Slide 18
Solution to Example 1 Cont’d
Price (X)        Quantity supply (Y)   X²                XY
61               105                   3721              6405
62               120                   3844              7440
63               120                   3969              7560
65               160                   4225              10400
65               120                   4225              7800
68               145                   4624              9860
69               175                   4761              12075
70               160                   4900              11200
72               185                   5184              13320
75               210                   5625              15750
$\sum X = 670$   $\sum Y = 1500$       $\sum X^2 = 45078$   $\sum XY = 101810$
Slide 19
Solution to Example 1 Cont’d
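• Firstly, we calculate the means: $\bar{X} = \frac{670}{10} = 67$ and $\bar{Y} = \frac{1500}{10} = 150$

• And consequently b and a as follows:

$b = \frac{101810 - 10(67)(150)}{45078 - 10(67)^2} = \frac{1310}{188} = 6.97$ and $a = 150 - 6.9681(67) = -316.86$

• Thus, the regression equation of quantity supply (Y) on price (X) is:
Quantity supply = -316.86 + 6.97 Price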
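The computational formulae can be checked against the Example 1 data with a few lines of Python. This is a sketch for illustration; the function name `least_squares` and the variable names are assumptions made for this example.

```python
# Example 1 data: price (X) and quantity supplied (Y).
prices = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
quantities = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]


def least_squares(x, y):
    """Return intercept a and slope b of the regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(prices, quantities)
print(f"Quantity supply = {a:.2f} + {b:.2f} Price")

# Fitted value at a price of 61 cedis, using the unrounded coefficients.
print(f"Fitted quantity at price 61: {a + b * 61:.2f}")
```

Working with unrounded coefficients is why the fitted value at X = 61 is 108.19, while plugging in the rounded 6.97 and -316.86 gives a slightly different figure.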
Slide 20
Solution to Example 1 Cont’d
b. The regression line has been plotted on a graph below

Slide 21
Example 2
• The regression equation as calculated in Example 1 is $Y' = -316.86 + 6.97X$

i. What will be the quantity supplied at a price of 90 cedis?
ii. What will be the price if the quantity supplied is 300 kg?

Slide 22
Solution to Example 2
We are required to find the quantity supplied if the price is 90 cedis.
We put the value X = 90 into the
equation and evaluate. That is:

$Y' = -316.86 + 6.97(90) = 310.44$

• This means that if the price increases to 90 cedis, about
310.44 kg of the commodity will be supplied.

Slide 23
Solution to Example 2 Cont’d
• We are required to find the price if quantity supplied is
300kg.
• We just have to put the value Y=300 into the equation and
evaluate
• This gives: $300 = -316.86 + 6.97X$, so $X = \frac{616.86}{6.97} = 88.50$

• This means that if quantity supplied is 300 kilograms then
the price will be 88.50 cedis.
Slide 24
Activity 1
• A company sets different prices for its product in all the ten
different regions of the country. The table below shows the numbers
of units sold and the corresponding prices (in cedis).

Sales (kg)   840    760   700    800   880   760    900    840   880   760
Price (₵)    8.25   9     9.75   9     7.5   9.75   6.75   7.5   7.5   9.75

a. Plot these data, and estimate the linear regression of sales on price.
b. Estimate the value of sales when price is 10.50.

Slide 25
ECON 316
Applied Statistics for Economists

Lecture 9 – Partitioning the Sum of Squares &


Coefficient of Determination

Lecturer: Prof. Daniel K. Twerefou,


Department of Economics
Contact Information: dktwerefou@ug.edu.gh

Slide 1
Partitioning the Sum of Squares
• Recall the linear model $Y = a + bX + \varepsilon$ is composed
of two parts:
1. A linear function of X - $a + bX$
2. An error component - $\varepsilon$
Let us illustrate this with our regression equation $Y' = -316.86 + 6.97X$

• This regression equation can be used to predict the value
of Y when X is given. Let us denote the estimated Y as
$Y'$ and the actual value as Y.
Slide 2
Partitioning the Sum of Squares
• For example, if X is 61, the corresponding value of Y’ is
108.19.
• The difference between the observed Y and the predicted
value Y’ (Y – Y’) is called a residual.
- If the value of X is 61, then the actual value of Y is
105 but the estimated value of Y using the regression
equation is 108.19, giving a residual of 105 - 108.19 = -3.19.
• The residual is the error part.
• The table on the next slides shows the linear and residual
parts of the regression equation.
Slide 3
Calculation of the Residual Part
N          Price (X)   Quantity supply (Y)   Y'        Residual (Y - Y')
1          61          105                   108.19    -3.19
2          62          120                   115.16    4.84
3          63          120                   122.13    -2.13
4          65          160                   136.06    23.94
5          65          120                   136.06    -16.06
6          68          145                   156.97    -11.97
7          69          175                   163.94    11.06
8          70          160                   170.91    -10.91
9          72          185                   184.84    0.16
10         75          210                   205.75    4.25
Mean       67          150                   150.00    0.00
Variance   20.89       1155.56               1014.37   141.32

Slide 4
Partitioning the Sum of Squares
• From the table, each value of Y could be thought of as a part
due to the linear regression (a + bX or Y') and a part due to
error (e or Y-Y'). Or $Y = Y' + (Y - Y')$

• Note: The variance of Y equals the variance of Y' (variance
due to regression) plus the variance of $(Y - Y')$ (variance due
to error). That is, 1155.56 ≈ 1014.37 + 141.32 (the small difference is due to rounding).
• We can further explain this by considering each observed
score as a deviation from the mean of Y due in part to
regression and in part to error.
Slide 5
Graphical Representation

Slide 6
• Graph: Shows plot of actual observation and the
regression line.
• Any actual observation of Y can be decomposed into
the linear or regression part and the error part.
• Consider the actual point (65, 120).
- Deviation of Y from the Mean (150) is equal to
120-150=-30.
- This value (-30) can be decomposed into the linear
and error parts.

Slide 7
Partitioning the Sum of Squares
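• To do this, we have to find the estimated value of Y when
X=65: $Y' = 136.06$ (see the table of residuals).

• The deviation due to regression is $Y' - \bar{Y} = 136.06 - 150 = -13.94$

With the help of the estimated value of Y, $Y' = 136.06$, we
can calculate the error part as: $Y - Y' = 120 - 136.06 = -16.06$, so that $-13.94 + (-16.06) = -30$.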

Slide 8
Partitioning the Sum of Squares
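The table below summarizes the relation between the actual
value and the deviations as:
Observed/actual   Mean        Deviation from mean due to regression   Error part
Y                 $\bar{Y}$   $(Y' - \bar{Y})$                        $(Y - Y')$

This means $Y = \bar{Y} + (Y' - \bar{Y}) + (Y - Y')$

or $(Y - \bar{Y}) = (Y' - \bar{Y}) + (Y - Y')$ (1)
• Squaring and summing both sides of equation (1), we
obtain: $\sum (Y - \bar{Y})^2 = \sum [(Y' - \bar{Y}) + (Y - Y')]^2$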

Slide 9
Partitioning the Sum of Squares
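• Expanding, we obtain:

$\sum (Y - \bar{Y})^2 = \sum (Y' - \bar{Y})^2 + 2 \sum (Y' - \bar{Y})(Y - Y') + \sum (Y - Y')^2$

• The middle term on the right hand side is equal to zero. Thus,

$\sum (Y - \bar{Y})^2 = \sum (Y' - \bar{Y})^2 + \sum (Y - Y')^2$ (2)

The sum of squared deviations of Y from the mean is equal to
the sum of squares of regression plus the sum of squares of
error (residual).
Slide 10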
Partitioning the Sum of Squares
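• Dividing through by n, the variance of Y becomes
equal to the variance of regression plus the variance
of the residual.

• Dividing through by $\sum (Y - \bar{Y})^2$: $1 = \frac{\sum (Y' - \bar{Y})^2}{\sum (Y - \bar{Y})^2} + \frac{\sum (Y - Y')^2}{\sum (Y - \bar{Y})^2}$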

Slide 11
Partitioning the Sum of Squares
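This is equivalent to $1 = \frac{RSS}{TSS} + \frac{ESS}{TSS}$, where TSS is the total, RSS the regression and ESS the error sum of squares.

• The sum of squares of Y can be divided into two proportions,
that due to regression and that due to error.
• The two proportions must add to 1.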

Slide 12
Example 1
Consider our example
involving quantity supply
and price. Calculate the
regression and error sum
of squares from the data.

Slide 13
Solution to Example 1 Cont’d

Y          $\bar{Y}$   $Y - \bar{Y}$   $(Y - \bar{Y})^2$   Y'       $Y' - \bar{Y}$   $(Y' - \bar{Y})^2$   $Y - Y'$   $(Y - Y')^2$
105        150         -45             2025                108.19   -41.81           1748.076             -3.19      10.1761
120        150         -30             900                 115.16   -34.84           1213.826             4.84       23.4256
120        150         -30             900                 122.13   -27.87           776.7369             -2.13      4.5369
160        150         10              100                 136.06   -13.94           194.3236             23.94      573.1236
120        150         -30             900                 136.06   -13.94           194.3236             -16.06     257.9236
145        150         -5              25                  156.97   6.97             48.5809              -11.97     143.2809
175        150         25              625                 163.94   13.94            194.3236             11.06      122.3236
160        150         10              100                 170.91   20.91            437.2281             -10.91     119.0281
185        150         35              1225                184.84   34.84            1213.826             0.16       0.0256
210        150         60              3600                205.75   55.75            3108.063             4.25       18.0625
Sum 1500   150         0               10400               1500.0   0.01             9129.307             -0.01      1271.907
Variance                               1155.6                                        1014.37                         141.32
Slide 14
Solution to Example 1 Cont’d
• The total sum of squares for Y (10400) is equal to the sum of
squares regression (9129.31) plus the sum of squares error
(1271.91), apart from a small rounding difference in the fitted values.
• or $TSS = RSS + ESS$

• The sum of squares of Y can be split into two components, the
regression sum of squares and the error sum of squares
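The partition can be verified numerically. The sketch below recomputes the fitted values from unrounded coefficients, so its regression and error sums of squares (roughly 9128.2 and 1271.8) differ slightly from the slide figures, which are based on fitted values rounded to two decimals. Variable names are assumptions made for this example.

```python
# Example 1 data: price (X) and quantity supplied (Y).
x = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
y = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept (computational formulae).
b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)
a = y_bar - b * x_bar
fitted = [a + b * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total
rss = sum((fi - y_bar) ** 2 for fi in fitted)            # regression
ess = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error

print(f"TSS = {tss:.2f}, RSS = {rss:.2f}, ESS = {ess:.2f}")
print(f"RSS/TSS + ESS/TSS = {rss / tss + ess / tss:.4f}")
```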

Slide 15
Activity 1
• The table below shows the weight (x) and height (y)
of a sample of students

x   y
2   2
6   4
4   4
3   2
7   5
7   6
0   2

i. Find the regression of y on x
ii. Find the total sum of squares (TSS), error sum of squares (ESS)
and the regression sum of squares (RSS)
iii. Show that the proportions of ESS and RSS to the TSS sum to 1
Coefficient of Determination
• The method of least squares reduces the error but does not
eliminate it.
• Variability in the dependent variable is measured by the
sum of the squared deviations of Y around the mean, i.e.
TSS = RSS + ESS or $\sum (Y - \bar{Y})^2 = \sum (Y' - \bar{Y})^2 + \sum (Y - Y')^2$

• The coefficient of determination (R-squared) measures
the percentage of the variability in the dependent variable
that is explained by the independent variable, or the
goodness of fit.

• $R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$
Slide 17
Coefficient of Determination Cont’d
• The coefficient of determination is a number between
0 and 1.

• $R^2$ of 0.80 means that 80% of the variability in the
dependent variable is explained by the independent
variable.

• $R^2$ of 0 means the independent variable does not explain
the dependent variable.

Slide 18
Example 2
• Consider Example 1. Calculate the coefficient of
determination and interpret your results.
SOLUTION
• From Example 1, TSS is 10400, RSS is 9129.31 and ESS
is 1271.91. Recall that $R^2 = \frac{RSS}{TSS}$.

Inserting the values and simplifying, we obtain: $R^2 = \frac{9129.31}{10400} = 0.8778 \approx 0.88$

Slide 19
Solution to Example 2 Cont’d
• Alternatively, given the ESS, $R^2 = 1 - \frac{ESS}{TSS} = 1 - \frac{1271.91}{10400} = 0.8777 \approx 0.88$

• Recall that our regression is:

Qs = -316.86 + 6.97 Price

• Interpretation: About 88% of the variability in the
quantity supplied is explained by variability in the price
of that commodity.
Slide 20
Relationship between Coefficient of
Determination & the Correlation Coefficient
• The correlation coefficient (rc) measures the nature and
degree of the relationship between two variables. It falls in the
range -1 ≤ rc ≤ 1.

• If the slope of the regression line, b, is positive, then the
sign of rc is positive. If the slope is negative, the sign of rc
is negative.
• Given that $b_{xy}$ and $b_{yx}$ are the slopes of the regression of x
on y and of y on x respectively, then $r_c = \pm \sqrt{b_{xy} \times b_{yx}}$, taking the sign of the slopes, and $r_c^2 = R^2$.

Slide 21
Testing the Significance of Regression
• Usually we do not have info about the whole population
and therefore we use the sample estimate of a population
parameter
• Can we make inference about the population using the
sample estimates ?
• To be able to do this we need to test for the significance of
the correlation coefficient
• Thus, given the population model $Y = a + bX + \varepsilon$,
the relationship between x and y is tested using the steps
in hypothesis testing.
Slide 22
Testing the Significance of Regression
Step 1: State the null and alternative hypotheses

Ho: x does not influence y (b = 0), H1: x influences y (b ≠ 0).

Step 2: Select a test statistic and calculate its value.
The test statistic is the F-statistic with 1 and
n-2 degrees of freedom given as:
$F = \frac{R^2}{(1 - R^2)/(n - 2)}$ (using r-square)

Or
$F = \frac{RSS}{ESS/(n - 2)}$ (using the regression sum of squares)
Slide 23
Hypothesis Testing
Step 3: Define the decision rule. If the calculated F is
greater than the critical F, then we reject the null hypothesis.
Rejecting the null hypothesis implies that the
slope is not zero. It also implies that the regression that was
done is significant and that it explains the relationship
between x and y.

Step 4: Compare the calculated F with the
critical value from the statistical table and make your
conclusion.
Slide 24
Example 3

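Consider the least squares equation from our earlier
example. Test the significance of the regression at the 0.05 level
of significance.
Solution
• Step 1: State your null and alternative hypotheses.
Ho: price does not influence quantity supplied (b = 0), H1: price influences quantity supplied (b ≠ 0).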

Slide 25
Solution to Example 3 Cont’d
Step 2: Select a test statistic and calculate its value
• In testing for the regression, the test statistic is the F-
statistic with 1 and n-2 degrees of freedom given as:
$F = \frac{RSS}{ESS/(n - 2)}$, where n is the sample size

• From our earlier example, RSS=9129.31, ESS=1271.91
and n=10. Thus: $F = \frac{9129.31}{1271.91/8} = 57.42$
• OR, using $R^2 = 0.8777$: $F = \frac{R^2}{(1 - R^2)/(n - 2)} = \frac{0.8777}{0.1223/8} \approx 57.4$

Slide 26
Solution to Example 3 Cont’d
• Step 3: Define the decision rule
If the calculated F is greater than the critical F then we reject the null hypothesis.
• Step 4: Compare the calculated F with the critical value from the statistical table and make your conclusion.
• The calculated F is 57.42 whilst the critical value is $F_{0.05,\,1,\,8} = 5.32$.
• Since the calculated F is greater than the critical F, we reject the null hypothesis and conclude that the regression is significant at the 0.05 significance level.
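The arithmetic in this example can be checked with a short sketch. The function name `regression_f_statistic` is only illustrative, and the critical value 5.32 is the tabulated F value for 1 and 8 degrees of freedom at the 0.05 level, entered by hand rather than computed.

```python
def regression_f_statistic(rss: float, ess: float, n: int) -> float:
    """F = (RSS / 1) / (ESS / (n - 2)) for a simple linear regression."""
    return (rss / 1) / (ess / (n - 2))


# Values taken from the worked example: RSS, ESS and the sample size.
f_value = regression_f_statistic(rss=9129.31, ess=1271.91, n=10)

# Tabulated F(0.05; 1, 8); typed in from an F table, not computed here.
f_critical = 5.32

reject_null = f_value > f_critical
print(round(f_value, 2), reject_null)
```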
Slide 27
Activity 2
• A study was conducted at the Economics Department to determine whether a student's final grade in an exam is linearly related to his or her performance in a mid-semester test. A sample of 10 students was taken and their mid-semester test scores and final exam scores are provided below.

Mid-semester score (x)    Final exam score (y)
39                        65
43                        78
21                        52
64                        82
57                        92
47                        89
28                        73
75                        98
34                        56
52                        75
Slide 28
Activity 2 Cont’d
a. Find the least squares line relating y to x and interpret it.
b. Predict a student’s final score if he or she scored 60 in
the mid-semester exam.
c. Find the TSS, RSS and ESS.
d. Calculate the value of and interpret it.
e. What form of correlation exists between the mid-
semester score and the final exam?
f. Can you conclude that the slope of the least squares line
is insignificant at the 0.05 level?

Slide 29
ECON 316
Applied Statistics for Economists

Lecture 10– Introduction to Correlation Analysis

Lecturer: Prof. Daniel K. Twerefou


Department of Economics
Contact Information: dktwerefou@ug.edu.gh

Slide 1
Session Overview
By the end of this section, you should be able to:
• Define correlation analysis
• State the objectives and importance of correlation
analysis
• Explain the types of correlation
• Give examples of correlation between two variables
• Find the correlation between two variables using the
scatter diagram and the graphical methods

Slide 2
What is Correlation Analysis?

• Correlation analysis attempts to determine the nature and degree of relationship between two variables.
- It comprises the techniques used in measuring the closeness of the relationship between the two variables.
• Correlated variables: changes in one variable are accompanied by changes in the other variable.
• For example, the income of a consumer and the quantity he demands, or an increase in the minimum wage rate and the level of savings.
• Correlation does not imply causality.
Slide 3
Objectives of Studying Correlation

• Measure the degree of relationship existing between any two variables.
─ For example the relationship between price and demand, income and expenditure, government expenditure and inflation, etc.

• Given the relationship between any two variables, one can estimate the value of one given the value of the other.

Slide 4
Objectives of Studying Correlation

• Contributes to the understanding of economic behaviour.
─ In business, correlation analysis enables the executive to estimate costs, sales, prices and other variables. However, correlation does not necessarily imply a cause-effect relationship.
• The development of disciplines such as Science and Philosophy has been characterized by an increase in the knowledge of relationships or correlations.
• The effect of correlation is to reduce uncertainty. The prediction based on correlation analysis is likely to be more accurate and close to reality.
Slide 5
Importance of Studying Correlation

• Helps measure the degree of relationship existing between variables.
• Knowledge of the correlation between two variables helps us to estimate the value of one of the variables given the value of the other.
• Helps to reduce the range of uncertainty. A prediction based on correlation analysis is likely to be more accurate and close to reality.
• In business, it helps in the estimation of costs, sales, prices and other variables on the basis of some other series with which these costs, sales or prices may be functionally related.
Slide 6
Types of Correlation Analysis

Some ways in which correlation can be classified are:


─ Positive Correlation and Negative Correlation:
Difference depends on the direction of change of the
variables.
─ Simple Correlation, Partial Correlation, and Multiple
Correlations: Difference depends on the number of
variables being studied for correlation at a particular
point in time.
─ Linear Correlation and Non-linear (Curvilinear)
Correlation

Slide 7
Positive Correlation and Negative
Correlation
• Positive Correlation
─ Both variables change in the same direction.
─ As one variable increases (decreases), the other variable also increases (decreases).
• Negative Correlation
─ The variables change in opposite directions.
─ As one variable increases (decreases), the other variable decreases (increases).
Slide 8
Example 1.0
The data below show the volume (X) and value (Y) of exports of cocoa in a country. Describe the type of correlation between the variables.

X    5     7     9     11    13     15
Y    20    25    29    32    34.5   36

Slide 9
Solution to Example 1
• Let's plot them on the same graph.
[Figure: scatter plot titled "Positive Correlation", with X on the horizontal axis and Y on the vertical axis]
• As the volume increases the value increases, and vice versa.
• There is a positive correlation between the value and volume of cocoa exports.

Slide 10
Example 2.0
The data below shows the demand (X) for a commodity and
the price (Y). Describe the type of correlation between the
variables.

X 22 30 33 36 40 42
(Demand)

Y (Price) 12 10 9.5 8 6 2

Slide 11
[Figure: scatter plot titled "Negative Correlation", with X on the horizontal axis and Y on the vertical axis]
• As demand increases the price decreases, and vice versa.
• There is a negative correlation between demand and price of the product.

Slide 12
Simple, Multiple and Partial Correlation
• Simple Correlation
─ Observed between only two variables.
─ Example: observing the height of students alongside their weight.
• Multiple Correlation
─ Involves three or more variables.
─ Example: when the quantity supplied of a commodity is observed alongside the price, level of technology, tax structure, etc.
─ There are 4 variables (quantity, price, technology and tax) and therefore this is a multiple correlation.
• Partial Correlation
─ More than two variables are observed but only two are considered for correlation, as the effects of the other variables are held constant.
Slide 13
Linear and Non-Linear Correlation

• Linear Correlation
─ Situation whereby a change in one variable is accompanied by a constant proportional change in the other variable.
─ A straight line can be drawn to pass through some or all of the points on a graph sheet.
• Non-linear (Curvilinear) Correlation
—Situation where the level of change in one variable
does not show a constant level of change in the other
variable.
—In practice, most relationships between variables are
non-linear or curvilinear in nature.
Slide 14
Example 3.0
The table below shows the number of hours a student spends
in a week (X) on his book and his average score (Y) in a
weekly quiz. Establish the relationship between the number
of hours spent on books per week and the weekly score.

X 20 40 60 80 100

Y 10 60 110 160 210

Slide 15
Solution to Example 3.0

• The X values increase by 20 whilst the Y values increase by 50.
• There is a uniform change in the X values and in the Y values as well.
• There is a linear relation between X and Y.
[Figure: scatter plot titled "Linear Correlation", with X on the horizontal axis and Y on the vertical axis]
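One quick way to check for a uniform change is to compute the ratio of the change in Y to the change in X between consecutive points; if every ratio is the same, the points lie on a straight line. The sketch below uses the data of Example 3.0, and the helper name `successive_slopes` is only illustrative.

```python
def successive_slopes(x, y):
    """Return (change in Y) / (change in X) between consecutive observations."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


hours = [20, 40, 60, 80, 100]     # X: hours spent on books per week
scores = [10, 60, 110, 160, 210]  # Y: average weekly quiz score

slopes = successive_slopes(hours, scores)

# The relation is exactly linear when all consecutive slopes coincide.
is_linear = all(abs(s - slopes[0]) < 1e-9 for s in slopes)
print(slopes, is_linear)
```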

Slide 16
Methods of Studying Correlation-Scatter
Diagramme
─ Plot the bivariate data on a graph paper and determine
the degree of correlation with the help of the spread of
the points.
─ Perfectly positive correlation: all the points lie on a straight line with a positive gradient.
─ Perfectly negative correlation: all the points lie on a straight line with a negative gradient.
─ Depending on the degree of spread of the points, there can be a high or low degree of positive or negative correlation.

Slide 17
Methods of Studying Correlation-Scatter
Diagramme cont’d
Figure 1.0: Perfectly negative correlation
Figure 2.0: Perfectly positive correlation
Figure 3.0: Low degree of negative correlation
Figure 4.0: Low degree of positive correlation
Figure 5.0: No correlation

Slide 18
Advantages and disadvantages of the Scatter Diagram

• Advantages
─ Simple and non-mathematical, thus making it easy to understand.
─ The method is not influenced by the size of extreme values, whereas most mathematical methods are.

• Disadvantages
─ The exact degree of relationship between the two variables cannot be determined. What we obtain is only a rough description of the correlation.
Slide 19
Methods of Studying Correlation: Graphical

─ Plot values of the two variables on a graph paper


individually.
─ Two curves, one for the variable X and another for Y.
─ By examining the direction and closeness of the two
curves drawn you can infer whether or not the
variables are related.
─ If both of the curves drawn are moving in the same
direction (either upward or downward) correlation is
said to be positive.
─ If the curves are moving in the opposite directions
correlation is said to be negative.
Slide 20
Example 4
The table below shows the expenditure (X) and income (Y)
of a consumer from 2000 to 2004. Using the graphical
method describe the correlation between income and
expenditure.

Year X Y
2000 10 15
2001 20 34
2002 27 48.6
2003 35 52.5
2004 40 64
Slide 21
Solution to Example 4
• Both series are plotted on the y-axis against the year, as shown in the following diagram.
• A positive relationship can be observed between expenditure (X) and income (Y).
• Why? The graphs for income and expenditure are rising in the same direction.

Slide 22
Example 5

The table below shows the export (X) and import (Y) values of a country from 1995 to 1999 in million US dollars. Using the graphical method, describe the correlation between exports and imports.

Year X Y
1995 100 300
1996 200 350
1997 270 300
1998 350 200
1999 400 100
Slide 23
Solution to Example 5
• Both series are plotted on the y-axis against the year, as shown in the following diagram.
• Generally there is a negative relationship between exports (X) and imports (Y).
• Why? The graphs of exports and imports are moving in opposite directions over most of the period. Whereas exports are increasing, imports are decreasing.
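The graphical reading can be backed by a simple count of how often the two series move in the same or in opposite directions from one year to the next. This is only a rough sketch of the graphical method, not a measure of the degree of correlation; the function name `direction_counts` is illustrative.

```python
def direction_counts(x, y):
    """Count year-to-year moves in the same and in opposite directions."""
    same = opposite = 0
    for i in range(len(x) - 1):
        product = (x[i + 1] - x[i]) * (y[i + 1] - y[i])
        if product > 0:
            same += 1
        elif product < 0:
            opposite += 1
    return same, opposite


# Example 4: expenditure (X) and income (Y), 2000 to 2004.
example4 = direction_counts([10, 20, 27, 35, 40], [15, 34, 48.6, 52.5, 64])

# Example 5: exports (X) and imports (Y), 1995 to 1999.
example5 = direction_counts([100, 200, 270, 350, 400], [300, 350, 300, 200, 100])

print(example4, example5)
```

In Example 5 the two series rise together only between 1995 and 1996, which is why the relationship is described as negative "generally" rather than throughout.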

Slide 24
ECON 316
Applied Statistics for Economists

Session 11 – Karl Pearson’s and Spearman’s Rank Correlation Coefficients

Lecturer: Prof. Daniel K. Twerefou,


Department of Economics
Contact Information: dktwerefou@ug.edu.gh
Slide 1
The Karl Pearson (Product Moment) Correlation
Coefficient
• Applies a computational formula to find a definite figure
that shows the nature and degree of correlation between
the two variables.

• Most widely used mathematical method of measuring correlation, denoted by the symbol r.

• The formula for computing Pearson’s correlation coefficient (r) is based on the assumption that the bivariate data involved are quantitative.
Slide 2
The Karl Pearson (Product Moment) Correlation
Coefficient
• If $X_i$ and $Y_i$ (i = 1, 2, …, n) are the observations of two variables X and Y, then:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

where $\bar{X}$ and $\bar{Y}$ are the means of the X and Y observations.

If we denote $(X - \bar{X}) = x$ and $(Y - \bar{Y}) = y$, then:

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

These formulas are clumsy to use. The computational formula most often used is:

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
Slide 3

The Karl Pearson Correlation Coefficient cont’d
Where:
n is the number of paired observations
$\sum X$ is the sum of the X variable
$\sum Y$ is the sum of the Y variable
$\sum X^2$ is the sum of the squared X variable
$\sum Y^2$ is the sum of the squared Y variable
$(\sum X)^2$ is the square of the sum of the X variable
$(\sum Y)^2$ is the square of the sum of the Y variable
$\sum XY$ is the sum of the product of the X and Y values

Slide 4
Interpretation of Pearson Correlation Coefficient

• The Pearson correlation coefficient is symmetric: if you interchange the values of X and Y in the equation you will obtain the same result.
• Pearson’s correlation coefficient takes values from negative one to positive one ($-1 \le r \le 1$).
• The table below summarizes the interpretation given to various values of r.

Value of r          Interpretation
r = -1              Perfectly negative correlation
r = 1               Perfectly positive correlation
r = 0               No correlation
-1 < r ≤ -0.5       Strong negative correlation
0.5 ≤ r < 1         Strong positive correlation
0 < r < 0.5         Weak positive correlation
-0.5 < r < 0        Weak negative correlation
Slide 5
Advantages and disadvantages of the Pearson
Correlation Coefficient
Advantage
• It is able to provide a precise quantitative figure that
explains the nature and degree of correlation between two
variables.

Disadvantages
• It measures only linear association, so it is applied mostly in situations where the relation is linear.
• The mathematical computation and the technical interpretation sometimes pose difficulty.
• It is affected by extreme values.
Slide 6
Example 1

The CEO of a factory is interested in the relationship between the costs and sales of their products. The costs and sales of the various products for last year are shown below:

Product   A    B    C    D    E    F    G    H    I    J
Costs     7    9    5    11   2    4    4    3    8    11
Sales     15   6    7    10   8    7    12   11   9    8

(a) What is the main purpose of investigating the relationship between sales and costs?
(b) Calculate the Pearson’s product-moment correlation coefficient between sales and costs.
(c) What can you say about the relationship between sales and costs?
Slide 7
Solution to Example 1

(a) The main reason for investigating the relationship is to help in prediction. This implies that, given the costs of production, the company can estimate the nature and direction of sales.

(b) We will use the formula

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

• Let the cost variable be X and the sales variable be Y.


Slide 8
Solution to Example 1 cont’d
Products Cost (X) Sales (Y) XY 𝑿𝟐 𝒀𝟐

A 7 15 105 49 225
B 9 6 54 81 36
C 5 7 35 25 49
D 11 10 110 121 100
E 2 8 16 4 64
F 4 7 28 16 49
G 4 12 48 16 144
H 3 11 33 9 121
I 8 9 72 64 81
J 11 8 88 121 64
Total 64 93 589 506 933

$\sum X = 64$, $\sum Y = 93$, $\sum XY = 589$, $\sum X^2 = 506$, $\sum Y^2 = 933$, n = 10

Slide 9
Pearson Correlation (Product Moment) Coefficient

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{10(589) - 64(93)}{\sqrt{\left[10(506) - (64)^2\right]\left[10(933) - (93)^2\right]}}$$

$$= \frac{-62}{\sqrt{(964)(681)}} = -0.07652$$
(c) There is a weak negative correlation between sales and costs.
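The computational formula can be verified with a few lines of code. This is a minimal sketch using the cost and sales data of Example 1; the function name `pearson_r` is illustrative and the routine assumes the two lists have equal length and non-zero variance.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the computational (raw-sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


costs = [7, 9, 5, 11, 2, 4, 4, 3, 8, 11]   # X
sales = [15, 6, 7, 10, 8, 7, 12, 11, 9, 8]  # Y

r = pearson_r(costs, sales)
print(round(r, 5))
```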

Slide 10
Spearman’s Rank Correlation Coefficient

• There are many situations in which the variables involved are qualitative and hence cannot be measured numerically, so Pearson’s correlation coefficient cannot be used.
─ Example: variables like status or position in a competition are qualitative and cannot be measured numerically but can be ranked.
• Rank correlation uses bivariate data of any one of the following forms:
─ Both are quantitative and appear non-linear.
─ Both are qualitative.
─ One is quantitative and the other is qualitative.
• The Spearman’s rank correlation ($r_s$) is used to calculate the rank correlation between two variables.
Slide 11
Steps in calculating the Spearman’s rank
correlation coefficients

a. Rank one of the variables (x) in either ascending or descending order to obtain $r_x$.
b. Rank the other variable (y) the same way as you ranked the x variable to obtain $r_y$. If two or more observations are repeated, they are assigned the mean rank.
c. Calculate the square of the difference for each ranked pair: $d_i^2 = (r_{x_i} - r_{y_i})^2$.
d. Calculate the sum of the squared differences, $\sum d_i^2$.
e. Compute the Spearman’s rank correlation coefficient using the formula:

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where n is the number of observations.
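Steps a to e can be sketched as follows. The helper names `mean_ranks` and `spearman_rs` are illustrative; ranking is done in ascending order (smallest value gets rank 1), and tied values share the mean of the ranks they occupy. Note that with many ties the simple formula in step e is only an approximation.

```python
def mean_ranks(values):
    """Ascending ranks (smallest = 1); tied values receive the mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first rank position held by this value
        count = ordered.count(v)       # number of observations tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = mean_ranks(x), mean_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


tied_ranks = mean_ranks([10, 20, 20, 30])
perfect_positive = spearman_rs([1, 2, 3, 4], [10, 20, 30, 40])
perfect_negative = spearman_rs([1, 2, 3, 4], [40, 30, 20, 10])
print(tied_ranks, perfect_positive, perfect_negative)
```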
Slide 12
Interpretation of Spearman’s Rank Correlation
Coefficient
• Like the Pearson’s correlation coefficient, the Spearman’s correlation coefficient takes values from negative one to positive one ($-1 \le r_s \le 1$).
• Interpretation is the same as that of the Pearson’s correlation coefficient.

Slide 13
Example 2
• The following are two sets of ranks assigned by two
Professors of management to ten styles of business
leadership.

a. Compute the rank correlation coefficient.


b. Interpret your result.
Slide 14
Solution to Example 2

• The first, second and third rows show the styles, the rank by the first Professor and the rank by the second Professor respectively.
• The next step is to find the difference between the ranks, which is presented in row 4.

Style        A    B    C    D    E    F    G    H    I
r1           10   1    2    9    8    6    7    5    4
r2           6    9    3    10   2    7    5    1    8
d = r1 - r2  4    -8   -1   -1   6    -1   2    4    -4
d²           16   64   1    1    36   1    4    16   16     $\sum d^2 = 155$

Slide 15
Solution to Example 2

• The last row presents the squares of the differences. In this example there are 9 observations and therefore n = 9.

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(155)}{9(9^2 - 1)} = 1 - \frac{930}{720} = -0.29167$$

b. Interpretation: There is a weak negative correlation between the rankings by the two professors.
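The figures in this solution can be reproduced directly from the two rows of ranks in the table. This is a minimal sketch; the variable names are illustrative.

```python
# Ranks given by the two professors, as listed in the solution table.
r1 = [10, 1, 2, 9, 8, 6, 7, 5, 4]
r2 = [6, 9, 3, 10, 2, 7, 5, 1, 8]

d_squared = [(a - b) ** 2 for a, b in zip(r1, r2)]
sum_d2 = sum(d_squared)
n = len(r1)

# Spearman's formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rs, 5))
```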

Slide 16
Testing the Significance of the Correlation
Coefficient
• We usually do not have information on the population, so we use the sample to make estimates of correlation.
• The population correlation coefficient ($\rho$) is the correlation coefficient calculated by examining all the possible pairs of values in the population.
• The sample correlation coefficient (r) is obtained by examining only a sample. It is the best estimator of the population correlation coefficient if the two variables are random variables, linearly related and normally distributed.
• From the sample results, can we make any inference about the population correlation coefficient? Sample values can vary based on the sample chosen.
Slide 17
Testing the Significance of the Correlation
Coefficient cont’d
• To answer the preceding question, we need to use the sample correlation coefficient to test the hypothesis that there is no linear relationship between the two variables in the population.
• Four steps are involved:
Step 1: State your null and alternative hypotheses.
$H_0: \rho = 0$
$H_1: \rho \ne 0$

• Null hypothesis: There is no correlation between the two variables in the population.
• Alternative hypothesis: There is a correlation between the two variables in the population.
Slide 18
Testing the Significance of the Correlation
Coefficient
Step 2: Select a test statistic and calculate its value. The test statistic is the Student’s t-statistic given by the formula:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

with (n-2) as the degrees of freedom.
• Here, r is the sample correlation coefficient and n is the number of observations.

Slide 19
Testing the Significance of the Correlation
Coefficient cont’d
• Step 3: Construct your decision rule. Reject Ho when the absolute value of the computed t is greater than the critical t value at the chosen level of significance. Rejection means there is a significant difference between the value of $\rho$ and 0, hence there is correlation. When we fail to reject Ho at the significance level, it means the value of $\rho$ is not significantly different from 0 and the observed sample correlation is probably due to chance.
• Step 4: Compare the calculated value with the critical values and conclude. The critical value of t is read from the t-distribution table.
Slide 20
Example 3

Test the significance of the correlation coefficient between two variables whose r is -0.871. There are 10 observations. Use the 0.05 level of significance.

Slide 21
Solution to Example 3
• Step 1: State your null and alternative hypotheses.
$H_0: \rho = 0$
$H_1: \rho \ne 0$
• Step 2: Select a test statistic and calculate its value:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = -0.871\sqrt{\frac{10 - 2}{1 - (-0.871)^2}} = -0.871\sqrt{\frac{8}{0.2414}} = -5.014$$

• Step 3: Construct your decision rule. Reject the null hypothesis if the absolute value of the computed t is greater than the critical t value.
Slide 22
Testing the Significance of the Correlation
Coefficient
Step 4: Compare the calculated value with the critical values and conclude. The critical value of t is read from the t-distribution table:
$\pm t_{\alpha/2,\,n-2} = \pm t_{0.05/2,\,10-2} = \pm t_{0.025,\,8} = \pm 2.306$
Conclusion: The calculated t = -5.014 is less than the critical $-t_{0.025,\,8} = -2.306$ (that is, |t| = 5.014 > 2.306), hence we reject the null hypothesis at the 0.05 level of significance.
This means that there is enough evidence to suggest that the value of $\rho$ is different from 0, hence there is a significant correlation between the two variables involved.
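The test can be sketched in code as a check on the arithmetic and on the direction of the decision. The function name `correlation_t` is illustrative, and the critical value 2.306 is the tabulated two-tailed t value for 8 degrees of freedom at the 0.05 level, entered by hand. Under the two-tailed rule |t| > 2.306, the null hypothesis of no correlation is rejected.

```python
from math import sqrt


def correlation_t(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


t_value = correlation_t(r=-0.871, n=10)

# Tabulated t(0.025; 8) for a two-tailed test at the 0.05 level.
t_critical = 2.306

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_null = abs(t_value) > t_critical
print(round(t_value, 3), reject_null)
```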
Slide 23
ECON 316
Applied Statistics for Economists
Session 12 – One-Way ANOVA

Lecturer: Prof. Daniel K. Twerefou


Department of Economics
Contact Information: dktwerefou@ug.edu.gh

Slide 1
Concept of ANOVA
• Analysis of Variance (ANOVA) is the technique of using
the F test to compare the equality in three or more
population means.
• Assuming that we have 3 different brands of cars Nissan,
Toyota and Opel, and we would like to test the null
hypothesis that the mean kilometers covered by all the 3
brands of cars are equal against the alternative that they
are not the same. ANOVA technique is used in such a
case.
• ANOVA technique helps us to examine if a significant
difference exists between the kilometers covered by the 3
car brands or any difference observed is due to chance.
Slide 2
Advantages of Using ANOVA technique
Compared to the z or t test, ANOVA technique is good in testing
for the equality of 3 or more means for the following reasons:
1. Using the z or t test in comparing several means will imply
that only two means can be compared at one particular time
whilst the rest of the means are ignored. F test used in
ANOVA helps to compare several means simultaneously.
2. In making pairwise comparisons, the z or t tests increase the probability of rejecting the null hypothesis when it is true (a Type I error).
3. The more means that have to be compared, the more z or t tests are required. This is a laborious task that can be done at a go using the ANOVA technique.
Slide 3
Assumptions Underlying the ANOVA Technique

1. Populations from which the samples were obtained must


be normal or approximately normally distributed.
2. Samples must be independent of each other.
3. Variances of the populations must be equal.

Slide 4
One-Way ANOVA
• Involves the comparison of three or more population
means involving only one variable.
• F -test is used in making such comparisons since variances
rather than means are compared.
- Explains why the technique is called Analysis of
Variance (ANOVA).
• Example: Assume we want to test the hypothesis of the equality of the mean grades of 5 study groups in level 300.
- Variable: mean grade obtained.
- Three or more groups are involved: 5 study groups.
Slide 5
Procedure for Hypothesis Testing
Assume we want to compare the means of k groups, usually called treatments. Independent samples would have to be taken from the populations.
STEP 1: Statement of Null and Alternative Hypothesis
• $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
• $H_1: \mu_i \ne \mu_j$ for some $i \ne j$
• Ho: The k population means are equal, given the independent random samples from the populations.
• H1: For at least one pair, the means of the populations are different.
Slide 6
Procedure for Hypothesis Testing
STEP 2: Determine the Choice of Statistical Test to
Use
• We are comparing several population means involving one variable, and therefore the F statistic will be used.

Step 3: Determine the Decision Rules and Find the Critical Values
• Reject Ho if the calculated F value is greater than the critical F value at a specific level of significance. Otherwise do not reject.
Slide 7
Procedure for Hypothesis Testing
Step 4: Computation of the F Test Value
Testing the equality of the population means involves two different estimates of the variance, calculated from the samples.
a. Treatment variance (treatment sum of squares): measures variability among the k group means, computed by finding the variance of the means.
b. Error variance (error sum of squares): measures variability of the individual observations within the k groups, computed by finding the variance of the individual observations in a group about the group mean.
Slide 8
ANOVA Framework
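• Treatment

            1          2          …     k
            x_11       x_21       …     x_k1
            x_12       x_22       …     x_k2
            …          …          …     …
            x_1n1      x_2n2      …     x_knk
Totals      T_1        T_2        …     T_k

$T_1 = \sum_{j=1}^{n_1} x_{1j}$, $T_2 = \sum_{j=1}^{n_2} x_{2j}$, and so on for each treatment; the grand total is $T = \sum_{i=1}^{k} T_i$.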

Slide 9
Computation of F-Test Value
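• Consider the sum of squared deviations from the overall mean, called the Total Sum of Squares (SST).
• Mathematically: $SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$

• Adding and subtracting the group mean $\bar{x}_i$: $SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left[(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})\right]^2$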

Slide 10
Computation of F-Test Value cont’d

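$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 + 2\sum_{i=1}^{k} (\bar{x}_i - \bar{x}) \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) \quad (1)$$

The last term vanishes, since $\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = 0$ for every group i.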

Slide 11
Computation of F-Test Value cont’d
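• Consider the additive $\sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2$.
• Expanding, and noting that the summand does not depend on j:

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 \quad (2)$$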

Slide 12
Computation of F-Test Value Cont’d
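Inserting equation (2) into (1), we obtain:

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 = SSE + SSTR \quad (3)$$

• SSTR: measures the variability among the k group means and is computed as: $SSTR = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$
• SSE: measures the variability of the observations about their own sample means within the k groups and is computed as: $SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$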

Slide 13
Computation of F-Test Value Cont’d
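• The operational formulas used are:

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{T^2}{n}, \qquad SSTR = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{T^2}{n}, \qquad SSE = SST - SSTR$$

where $T_i$ is the total for treatment i, T is the grand total and $n = \sum n_i$ is the total sample size.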
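The operational route (sum of squared observations, squared treatment totals over their sample sizes, and the correction term T²/n) can be sketched as a small function. The function name `one_way_anova` and the three-group data set are purely illustrative and are not taken from the lecture.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares and F ratio from raw observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n                               # T^2 / n
    sst = sum(x * x for g in groups for x in g) - correction        # total
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - correction   # treatments
    sse = sst - sstr                                                # error
    f_ratio = (sstr / (k - 1)) / (sse / (n - k))
    return {"SST": sst, "SSTR": sstr, "SSE": sse, "F": f_ratio}


# Hypothetical data: three treatments with three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```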

Slide 14
Computation of F-Test Value Cont’d
• Calculating the mean sums of squares: done to estimate a common population variance.

$MSE = \dfrac{SSE}{n - k}$
• (n – k) is the error degrees of freedom

$MST = \dfrac{SSTR}{k - 1}$
• (k – 1) is the treatment degrees of freedom

Slide 15
One-way ANOVA Table
Source of variation   Sum of Squares   Degrees of freedom   Mean Squares          F ratio
Treatments            SSTR             k - 1                MST = SSTR/(k - 1)    F = MST/MSE
Error                 SSE              n - k                MSE = SSE/(n - k)
Total                 SST              n - 1

Slide 16
Hypothesis Testing
STEP 5: Make your decision in the light of the decision
rules and summarize your results
• We compare the calculated F test value with the critical F
value which is read from the F distribution table given
that all assumptions stated in the previous section hold.
• If the F calculated value is greater than the F critical
value, then we reject the null hypothesis at the
significance level.
• If the calculated F value is less than the critical F value, then we fail to reject the null hypothesis at the significance level. This result implies that the samples give no evidence that the population means differ.
Slide 17
Example 1
The Research Department of a Transport Company wishes
to know whether the mean mileages of 4 different brands of
tyres are the same. If the means are the same, then the
company will always go for the brand of tyres with the
lowest price. The Department therefore took the mileage of
the different tyres for 5 months. The result of their
experiment is represented below. Test whether the mean
mileages are the same if α =0.05

Slide 18
Example 1
• Mileage of 4 different brands of tyres

Month    Brand 1   Brand 2   Brand 3   Brand 4
1        80        81        76        93
2        86        82        82        90
3        83        88        84        91
4        85        83        82        93
5        81        86        84        88

Slide 19
Solution to Example 1
STEP 1: State your null and alternative hypothesis
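• $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
• $H_1: \mu_i \ne \mu_j$ for some $i \ne j$
STEP 2: Determine the choice of statistical test to use
• There is one independent variable (type of tyre) and one dependent variable (mileage covered), so One-Way ANOVA is used.
STEP 3: Determine the decision rules and find the critical values
• n = 20 and k = 4, so the critical value is $F_{0.05,\,3,\,16} = 3.24$. Reject Ho if the calculated F exceeds this value.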
Slide 20
Solution to Example 1 Cont’d
Table for Computations of Total and Mean Mileages
Brands 1 2 3 4

80 81 76 93

86 82 82 90

83 88 84 91

85 83 82 93

81 86 84 88

Totals T1 = 415 T2 = 420 T3 = 410 T4 = 455

Means x1 = 83 x2 = 84 x3 = 82 x4 = 91

Slide 21
Solution to Example 1 Cont’d
• STEP 4: Compute the F test value and tabulate

• T is the sum of all the observations. Mathematically, $T = \sum_{i=1}^{k} T_i = 415 + 420 + 410 + 455 = 1700$

Slide 22
Solution to Example 1 Cont’d
• First additive = summation of the square of each mileage covered by all the brands, $\sum\sum x_{ij}^2$.

• Second additive = square of the total mileage covered divided by the total sample size: $T^2/n = 1700^2/20 = 144{,}500$.

• Inserting these values into the SST and simplifying: $SST = \sum\sum x_{ij}^2 - T^2/n = 350$

Slide 23
Solution to Example 1 Cont’d

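• First term = sum of the squared treatment totals, each divided by its sample size: $\sum T_i^2/n_i = (415^2 + 420^2 + 410^2 + 455^2)/5 = 144{,}750$.

• Second term = already computed: $T^2/n = 1700^2/20 = 144{,}500$. Thus $SSTR = 144{,}750 - 144{,}500 = 250$ and $SSE = SST - SSTR = 350 - 250 = 100$.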

Slide 24
Solution to Example 1 Cont’d
• Computations using the One-Way ANOVA
Sources of variation   Sum of Squares   Degrees of freedom   Mean sum of Squares   F ratio
Treatment              250              4 - 1 = 3            250/3 = 83.33         83.33/6.25 = 13.333
Error                  100              20 - 4 = 16          100/16 = 6.25
Total                  350              20 - 1 = 19

Slide 25
Solution to Example 1 Cont’d
STEP 5: Make your decision in the light of the decision
rules and summarize your results
• The calculated F test value is greater than the F critical value.
• That is, $F = 13.33 > F_{0.05,\,3,\,16} = 3.24$.
• We reject the null hypothesis at the 5% level of
significance.
• This means that the mean mileages of the four brands of
tyres are not equal. There is a difference in the means of
at least one pair of the 4 different brands of tyres.
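The ANOVA table for this example can be rebuilt from the treatment totals and the total sum of squares reported in the solution. This is a sketch only: it takes the totals (415, 420, 410, 455) and SST = 350 as given in the slides rather than recomputing them from the raw mileages.

```python
# Treatment totals and total sum of squares as reported in the solution.
totals = [415, 420, 410, 455]
n_per_brand = 5
sst = 350

k = len(totals)
n = k * n_per_brand
grand_total = sum(totals)

# SSTR = sum(T_i^2 / n_i) - T^2 / n
sstr = sum(t ** 2 / n_per_brand for t in totals) - grand_total ** 2 / n
sse = sst - sstr

mst = sstr / (k - 1)   # treatment mean square, k - 1 degrees of freedom
mse = sse / (n - k)    # error mean square, n - k degrees of freedom
f_ratio = mst / mse
print(sstr, sse, round(mst, 2), mse, round(f_ratio, 3))
```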
Slide 26
Activity 1
A researcher wanted to investigate the impact of three forms of diet on the weight of students in the University of Ghana. Students were therefore randomly assigned to take the three different kinds of diet. They were placed on the diet for 8 weeks. The table below shows the weight gains (in kg) of the students in the different diets.

Weight gains from 3 different diets
Diet I   Diet II   Diet III
8        10        7
5        9         6
5        14        8
4        8         5
3        11        15
3        2         9
7 13
4
12

• Test the hypothesis that there is no difference in the diets given to the students if
Slide 27
ECON 316
Applied Statistics for Economists

Session 13 – Two-Way ANOVA

Lecturer: Prof. Daniel K. Twerefou,


Department of Economics
Contact Information: dktwerefou@ug.edu.gh

Slide 1
Session Outline
The key topics to be covered in the session are as follows:
• Concept of Two-Way ANOVA
• Randomized Blocks Design
• Hypothesis Testing in Two-Way ANOVA

Slide 2
Concept of Two-Way ANOVA
• Two-Way ANOVA technique is used to study the influence of two
independent variables simultaneously on a dependent variable.
• Example: a student may be interested in the effects of three different
types of plant food and three different types of soil on the growth of
a certain seedling.
- two independent variables: plant food and soil type
- one dependent variable: growth of the plant.
• Assuming there are:
- k number of treatment groups (different set of observations of one
of the independent variable) and
- b number of blocks (different observations of the other
independent variable).
Slide 3
Concept of Two-Way ANOVA- randomized
blocks design.
• Let xij represent the sample observation corresponding to the response of the ith treatment in the jth block.
• Table (randomized blocks design): formed by randomly selecting elements from the blocks to combine with each of the elements in the treatment groups.

             TREATMENT GROUPS
BLOCKS       1      2      3      …..    k
1            x11    x12    x13    …..    x1k
2            .      .      .      …..    .
.            .      .      .             .
.            .      .      .             .
b            xb1    xb2    xb3    …..    xbk
Slide 4
Assumptions Underlying Two –Way
ANOVA
• Populations from which the samples were obtained must
be normally or approximately normally distributed
• Samples must be independent of each other
• Variances of the populations must be equal
• Treatment groups must all have equal sample size

Slide 5
Procedure for Two-Way ANOVA
STEP 1: statement of null and alternative hypothesis
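• In Two-Way ANOVA, there are two sets of hypotheses that have to be tested.
• Treatments: Ho: the k treatment means are equal; H1: the means differ for some treatments.
• Blocks: Ho: the b block means are equal; H1: the means differ for some blocks.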

STEP 2: Determine the choice of statistical test to use


• The F statistic is used in Two-Way ANOVA. We compute
two F test values- one for comparing the treatment means
and the other for comparing the block means.
Slide 6
Procedure for Two-Way ANOVA Cont’d
STEP 3: Determine the decision rules to use and find the critical values
• Decision rule: Reject the Ho if the F calculated value is greater than
the F critical value for the comparison of the treatment means or
block means.
Critical value for comparing treatment means
• Numerator degrees of freedom: k - 1
• Denominator degrees of freedom: (k - 1)(b - 1)
• Example: if we have 6 treatment groups and 4 blocks with a level of significance of 0.05, then the numerator degrees of freedom are 6 - 1 = 5 and the denominator degrees of freedom are (6 - 1)(4 - 1) = 15.

• Thus, from the table: $F_{0.05,\,5,\,15} = 2.90$

Slide 7
Procedure for Two-Way ANOVA Cont’d
Critical value for comparing block means
• Numerator degrees of freedom: b - 1
• Denominator degrees of freedom: (k - 1)(b - 1)
• For example, given 6 treatment groups and 4 blocks with a level of significance of 0.05, the numerator degrees of freedom are 4 - 1 = 3 and the denominator degrees of freedom are (6 - 1)(4 - 1) = 15.

• Thus, the F critical value is: $F_{0.05,\,3,\,15} = 3.29$

Slide 8
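When an F table is not at hand, the critical values can be approximated numerically. The sketch below uses only the Python standard library; the helper names `f_cdf` and `f_critical` are hypothetical, the degrees of freedom are assumed to be positive integers of moderate size, and the method (Simpson's rule on the incomplete beta integral followed by bisection) is one illustrative choice, not the method used in the lecture.

```python
import math


def f_cdf(f, d1, d2, steps=2000):
    """Approximate P(F <= f) for an F distribution with integer d1, d2.

    Uses F_cdf(f) = I_x(d1/2, d2/2) with x = d1*f / (d1*f + d2), and the
    substitution t = u**2 so the integrand is smooth at zero.
    """
    if f <= 0:
        return 0.0
    a, b = d1 / 2.0, d2 / 2.0
    x = d1 * f / (d1 * f + d2)
    upper = math.sqrt(x)
    h = upper / steps  # steps must be even for Simpson's rule

    def g(u):
        return 2.0 * u ** (d1 - 1) * (1.0 - u * u) ** (b - 1.0)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)


def f_critical(alpha, d1, d2):
    """Find the value c with P(F > c) = alpha by bisection on f_cdf."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, d1, d2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


k, b = 6, 4  # 6 treatment groups and 4 blocks, as in the slide example
print("treatments:", round(f_critical(0.05, k - 1, (k - 1) * (b - 1)), 2))
print("blocks:    ", round(f_critical(0.05, b - 1, (k - 1) * (b - 1)), 2))
```

The printed values should agree with a standard F table to about two decimal places; for graded work the tabulated values remain the reference.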
Procedure for Two-Way ANOVA Cont’d
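STEP 4: COMPUTATION OF THE F TEST VALUE

Let $T_{\cdot j} = \sum_{i=1}^{k} x_{ij}$ be the summation of all observations for block $j$, $j = 1, \dots, b$.

Let $T_{i\cdot} = \sum_{j=1}^{b} x_{ij}$ be the summation of all observations for treatment $i$, $i = 1, \dots, k$.

Let $T = \sum_{i=1}^{k} \sum_{j=1}^{b} x_{ij}$ be the summation of all observations in both treatments and blocks.

• $\bar{x} = T/n$ is the overall mean, where $n = kb$ is the total number of observations.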

Slide 9
Procedure for Two-Way ANOVA Cont’d
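• In Two-Way ANOVA, SST, which measures the total sample variability of the observations about the overall mean, can be partitioned into three main components:

$SST = SSTR + SSB + SSE$

where ($T_{i\cdot}$ is the total for treatment $i$, $T_{\cdot j}$ the total for block $j$, $T$ the grand total and $n = kb$)

$SST = \sum_{i=1}^{k} \sum_{j=1}^{b} x_{ij}^2 - \frac{T^2}{n}$ - Total Sum of Squares

$SSTR = \frac{1}{b} \sum_{i=1}^{k} T_{i\cdot}^2 - \frac{T^2}{n}$ - Treatment Sum of Squares

$SSB = \frac{1}{k} \sum_{j=1}^{b} T_{\cdot j}^2 - \frac{T^2}{n}$ - Block Sum of Squares

$SSE = SST - SSTR - SSB$ - Error Sum of Squares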
Slide 10
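The partition of the total sum of squares can be checked numerically from the definitions (squared deviations from means). The sketch below is illustrative only: the function name `sums_of_squares` and the small 2-block, 3-treatment data set are hypothetical and not taken from the lecture.

```python
def sums_of_squares(table):
    """Return (SST, SSTR, SSB, SSE) for a randomized blocks layout.

    table[j][i] is the observation for block j (row) and treatment i (column).
    """
    b = len(table)          # number of blocks
    k = len(table[0])       # number of treatments
    n = k * b
    grand_mean = sum(sum(row) for row in table) / n
    treatment_means = [sum(table[j][i] for j in range(b)) / b for i in range(k)]
    block_means = [sum(row) / k for row in table]

    sst = sum((x - grand_mean) ** 2 for row in table for x in row)
    sstr = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
    sse = sum(
        (table[j][i] - treatment_means[i] - block_means[j] + grand_mean) ** 2
        for j in range(b)
        for i in range(k)
    )
    return sst, sstr, ssb, sse


# Hypothetical data: 2 blocks (rows) by 3 treatments (columns).
sst, sstr, ssb, sse = sums_of_squares([[4, 6, 8], [5, 9, 10]])
print(sst, sstr, ssb, sse)
print("partition holds:", abs(sst - (sstr + ssb + sse)) < 1e-9)
```

Here SSE is computed directly from the residuals rather than by subtraction, so the final line is a genuine check that the three components add up to SST.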
Procedure for Two-Way ANOVA Cont’d
• To obtain the F-value, first obtain the mean sums of squares as follows:
• Mean square treatment (MSTR) = $\frac{SSTR}{k - 1}$

• Mean square blocks (MSB) = $\frac{SSB}{b - 1}$

• Mean square error (MSE) = $\frac{SSE}{(k - 1)(b - 1)}$

Slide 11
Procedure for Two-Way ANOVA Cont’d
• F test value for comparing treatment means is found by dividing the mean square for treatments by the mean square for errors: $F = \frac{MSTR}{MSE}$

• F test value for comparing block means is found by dividing the mean square for blocks by the mean square for errors: $F = \frac{MSB}{MSE}$

Slide 12
General Set-up of a Two-Way ANOVA
Source of     Sum of     Degrees of         Mean Squares                    F ratio
Variation     Squares    freedom

Treatment     SSTR       k - 1              MSTR = SSTR / (k - 1)           F = MSTR / MSE

Block         SSB        b - 1              MSB = SSB / (b - 1)             F = MSB / MSE

Error         SSE        (k - 1)(b - 1)     MSE = SSE / [(k - 1)(b - 1)]

Total         SST        n - 1
Slide 13
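The general set-up above translates directly into a small helper that fills in the degrees of freedom, mean squares and F ratios once the sums of squares are known. This is a sketch under assumptions: the function name `anova_table` and the input numbers are hypothetical, chosen only to show the arithmetic.

```python
def anova_table(sstr, ssb, sse, k, b):
    """Build the two-way ANOVA table from the sums of squares.

    k is the number of treatments, b the number of blocks (one observation
    per treatment-block cell, so n = k * b).
    """
    df_tr, df_b, df_e = k - 1, b - 1, (k - 1) * (b - 1)
    mstr = sstr / df_tr
    msb = ssb / df_b
    mse = sse / df_e
    return {
        "Treatment": {"SS": sstr, "df": df_tr, "MS": mstr, "F": mstr / mse},
        "Block": {"SS": ssb, "df": df_b, "MS": msb, "F": msb / mse},
        "Error": {"SS": sse, "df": df_e, "MS": mse},
        "Total": {"SS": sstr + ssb + sse, "df": k * b - 1},
    }


# Hypothetical sums of squares for k = 3 treatments and b = 2 blocks.
table = anova_table(sstr=21.0, ssb=6.0, sse=1.0, k=3, b=2)
for source, row in table.items():
    print(source, row)
```

Note that the treatment, block and error degrees of freedom add up to the total, $n - 1$, which is a quick sanity check on any hand-built table.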
Procedure for Two-Way ANOVA Cont’d
STEP 5: Make your decision in the light of the decision rules and summarize your results
• First set of hypotheses: equality of the treatment means. If the calculated value of F for the treatment groups is greater than its respective critical value of F at a given level of significance, then we reject the null hypothesis of equal treatment means.
• Second set of hypotheses: equality of the block means. If the calculated value of F for the blocks is greater than its respective critical value of F at a given level of significance, then we reject the null hypothesis of equal block means.
Slide 14
Example 1
In an annual meeting of shareholders of pharmaceutical companies in Ghana, four economists were asked to predict the percentage increase in share prices for five pharmaceutical companies. Information obtained from the experiment is presented in the table below:

Pharmaceutical          Economists
Companies          A      B      C      D
1                  7     10      8     15
2                 13      9     10     14
3                 11      8      6      9
4                  9     12     11     10
5                  9      8     13     12

i. Test the null hypothesis that the population mean percentage increases are the same for all four economists.
ii. Test the null hypothesis that the mean percentage increases are the same for all the pharmaceutical companies. Take alpha to be 0.05.
Slide 15
Solution to Example 1
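STEP 1: Statement of the null and alternative hypotheses

$H_0: \mu_A = \mu_B = \mu_C = \mu_D$ (equal mean percentage increases across economists); $H_1$: the means differ for some treatments

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ (equal mean percentage increases across companies); $H_1$: the means differ for some blocks

STEP 2: Determine the choice of statistical test to use
• Two-Way ANOVA, since there are two independent variables (the economist making the prediction and the company) and more than three groups are involved.
Slide 16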
Solution to Example 1 Cont’d
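STEP 3: Determine the decision rules to use and find the critical values
• To find the F critical value, we need to compute the degrees of freedom. We have four treatments and therefore k = 4. Also, we have 5 blocks and therefore b = 5. Thus, for the treatments, the numerator degrees of freedom are $k - 1 = 3$ and the denominator degrees of freedom are $(k - 1)(b - 1) = 12$.

• Alpha is equal to 0.05. Therefore, reading from the F distribution table, the F critical value for testing treatment means is $F_{0.05;\,3,\,12} = 3.49$.

• The numerator and denominator degrees of freedom for the blocks, $b - 1 = 4$ and $(k - 1)(b - 1) = 12$, give $F_{0.05;\,4,\,12} = 3.26$.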

Slide 17
Solution to Example 1 Cont’d
STEP 4: COMPUTATION OF THE F TEST VALUES
• Table for treatment and block totals

Pharmaceutical          Economists               Totals for
Companies          A      B      C      D        Blocks
1                  7     10      8     15        40
2                 13      9     10     14        46
3                 11      8      6      9        34
4                  9     12     11     10        42
5                  9      8     13     12        42
Totals for
treatments        49     47     48     60        T = 204

Slide 18
Solution to Example 1 Cont’d

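$SST = \sum_{i} \sum_{j} x_{ij}^2 - \frac{T^2}{n}$
The first term of the formula gives: $7^2 + 10^2 + 8^2 + \dots + 12^2 = 2190$
The second term gives: $\frac{T^2}{n} = \frac{204^2}{20} = 2080.80$
Thus, $SST = 2190 - 2080.80 = 109.2$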

Slide 19
Solution to Example 1 Cont’d

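• $SSTR = \frac{1}{b} \sum_{i} T_{i\cdot}^2 - \frac{T^2}{n}$, where the $T_{i\cdot}$ are the treatment (economist) totals. The first term gives $\frac{49^2 + 47^2 + 48^2 + 60^2}{5} = 2102.80$
The 2nd term has already been computed to be 2080.80
• Thus, $SSTR = 2102.80 - 2080.80 = 22$

• $SSB = \frac{1}{k} \sum_{j} T_{\cdot j}^2 - \frac{T^2}{n}$, where the $T_{\cdot j}$ are the block (company) totals. The first term gives $\frac{40^2 + 46^2 + 34^2 + 42^2 + 42^2}{4} = 2100$
The 2nd term has already been computed to be 2080.80
• Thus, $SSB = 2100 - 2080.80 = 19.2$

Thus, $SSE = SST - SSTR - SSB = 109.2 - 22 - 19.2 = 68$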
Slide 20
ANOVA Table
Source of     Sum of     Degrees of              Mean Squares              F ratio
Variation     Squares    freedom

Treatment     22         k - 1 = 3               MSTR = 22/3 = 7.333       F = 7.333/5.667 = 1.294

Block         19.2       b - 1 = 4               MSB = 19.2/4 = 4.8        F = 4.8/5.667 = 0.847

Error         68         (k - 1)(b - 1) = 12     MSE = 68/12 = 5.667

Total         109.2      n - 1 = 19

Slide 21
Solution to Example 1 Cont’d
STEP 5: Make your decision in the light of the decision rules and summarize your results
• Test of equality of the treatment means: The calculated F value (1.294) is less than the F critical value (3.49), so we fail to reject the null hypothesis of equal treatment means at the 5% level of significance. There is no significant evidence that the mean percentage increase in share prices differs among the economists.
• Test of equality of the block means: The calculated F value (0.847) is less than the F critical value (3.26), so we fail to reject the null hypothesis of equal block means at the 5% level of significance. There is no significant evidence that the mean percentage increase in share prices differs among the pharmaceutical companies.
Slide 22
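The whole of Example 1 can be re-traced in Python as a check on the hand computation. The sketch below uses the computational formulas (raw sums of squares minus the correction term $T^2/n$) with the data from the example; the variable names are illustrative, and the critical values 3.49 and 3.26 are taken from the solution rather than computed.

```python
# Rows are pharmaceutical companies (blocks); columns are economists A-D (treatments).
rows = [
    [7, 10, 8, 15],
    [13, 9, 10, 14],
    [11, 8, 6, 9],
    [9, 12, 11, 10],
    [9, 8, 13, 12],
]
b = len(rows)        # 5 blocks
k = len(rows[0])     # 4 treatments
n = k * b

grand_total = sum(sum(row) for row in rows)             # T
correction = grand_total ** 2 / n                        # T^2 / n
treatment_totals = [sum(col) for col in zip(*rows)]      # column totals
block_totals = [sum(row) for row in rows]                # row totals

sst = sum(x * x for row in rows for x in row) - correction
sstr = sum(t * t for t in treatment_totals) / b - correction
ssb = sum(t * t for t in block_totals) / k - correction
sse = sst - sstr - ssb

mse = sse / ((k - 1) * (b - 1))
f_treatment = (sstr / (k - 1)) / mse
f_block = (ssb / (b - 1)) / mse

# Critical values at alpha = 0.05, as read from the F table in the solution.
crit_treatment, crit_block = 3.49, 3.26
reject_treatment = f_treatment > crit_treatment
reject_block = f_block > crit_block

print("SST, SSTR, SSB, SSE:", round(sst, 2), round(sstr, 2), round(ssb, 2), round(sse, 2))
print("F (treatments):", round(f_treatment, 3), "reject H0:", reject_treatment)
print("F (blocks):    ", round(f_block, 3), "reject H0:", reject_block)
```

The same script can be pointed at another table by replacing `rows` and the two critical values, keeping blocks in rows and treatments in columns.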
Activity 1
In 4 years, Unilever tested its three different kinds of soap in selected towns in Ghana. The table below shows the sales recorded (in million Ghana cedis) for each of the three soaps sold in the four-year period. You are required to use Two-Way ANOVA to test the hypotheses that the mean sales are the same for the three kinds of soap and for all the 4 years.

              Kinds of Soap
Years       A       B       C
1          70      50      60
2         100      70     120
3          60      80      90
4          40      90     150
Slide 23
