Readings for Lecture 5

Topic                                          Chapter                     Pages
Theory of F Distribution                       Primarily from Instructor
F Test for Differences in Two Variances        11                          458-467
One-Way ANOVA, Completely Randomized Design,   Instructor, 12              476-488
Fixed Effects Model: Assumptions, Model and
Rationale; Calculations
Lecture 5 Notes
F Distribution: The F statistic is the test statistic
used in all ANOVA (Analysis of Variance)
techniques. It is also used in all multiple regression
techniques. An F statistic is the ratio of two independent chi-square
variables, each divided by its degrees of freedom.
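In the example below, I have begun with two chi-square variables, each
divided by its degrees of freedom. Notice that in the second step, especially
if the sample sizes were huge, one would
expect to get a value reasonably close to one.

$$F = \frac{\left[\dfrac{(n_1-1)\,s_1^2}{\sigma_1^2}\right] \Big/ (n_1-1)}{\left[\dfrac{(n_2-1)\,s_2^2}{\sigma_2^2}\right] \Big/ (n_2-1)} \qquad (1)$$

$$\phantom{F} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \qquad (2)$$

$$\phantom{F} = \frac{s_1^2}{s_2^2} \qquad (3)$$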
The third step is only true if $\sigma_1^2$ were equal to $\sigma_2^2$. If
it were true, then again, one would expect to get a
value close to one, especially if the sample sizes
were huge. If, however, $\sigma_1^2$ were not equal to $\sigma_2^2$,
then the value one would get by calculating $s_1^2/s_2^2$
would of course deviate from one. Thus a null
hypothesis is associated with this test statistic, and
it is
Ho: $\sigma_1^2 = \sigma_2^2$ versus
Ha: $\sigma_1^2 \neq \sigma_2^2$
This looks like a two-tailed test procedure, but,
because we always put the larger sample variance
on top (call it $s_1^2$), we only employ the upper tail of
the F distribution.
Example:

Men    $(X_i - \bar{X}_m)^2$      Women    $(X_i - \bar{X}_w)^2$
60     100                        80       0
70       0                        78       4
80     100                        82       4
                                  80       0
Sum    200                        Sum      8
       df = 2                              df = 3

$s_m^2 = \frac{200}{2} = 100$ and $s_w^2 = \frac{8}{3}$
Notice, students, the sample size is
very small and even though the differences in the
sample variances are huge, we may not be able to
reject the null hypothesis of equality (because of
the small sample size, and thus large variability).
For this specific two-tailed F test, the larger variance
is assigned to the numerator.

$$F(2,3) = \frac{100}{8/3} = \frac{300}{8} = 37.5, \qquad \alpha = 0.02$$
In this case, in order to find the critical F value, we
actually look up the F value under $\alpha/2$. This is due
to the fact that we always put the larger value of
the sample variance on the top. In this case we look
up in the F tables under 0.01, with 2 degrees of
freedom for the numerator and 3 degrees of
freedom for the denominator. The critical value
would be 30.816. Since 37.5 exceeds 30.816, we would reject the null
hypothesis of equality.
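[Figure: F(2, 3) distribution with the critical value $F_{0.01} = 30.82$ and the observed value 37.5 marked in the upper tail.]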
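The arithmetic of this example can be checked with a short script. This is only a sketch: the individual scores below are assumed values chosen to reproduce the sums of squares 200 (men) and 8 (women), the function names are illustrative, and the critical value is the table value quoted in the notes rather than something the code computes.

```python
def sample_variance(values):
    """Unbiased sample variance: sum of squared deviations over (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def two_tailed_variance_f(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a, var_b = sample_variance(sample_a), sample_variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


men = [60, 70, 80]        # assumed scores; squared deviations sum to 200
women = [80, 78, 82, 80]  # assumed scores; squared deviations sum to 8

f_stat, df_num, df_den = two_tailed_variance_f(men, women)
F_CRITICAL = 30.82  # table value from the notes: alpha/2 = 0.01, df = (2, 3)

print(f"F({df_num},{df_den}) = {f_stat:.1f}")
print("reject Ho" if f_stat > F_CRITICAL else "do not reject Ho")
```

Putting the larger variance on top inside the function mirrors the rule used in the notes, which is why the table is entered at alpha/2 rather than alpha.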
Now, dear students:
If you have a test of hypothesis as follows:
Ho: $\sigma_1^2 \le \sigma_2^2$
Ha: $\sigma_1^2 > \sigma_2^2$
You would again place the value of $s_1^2$ on top, and then
use the upper tail value of alpha, not alpha divided by two.
And again if you have a test of hypothesis as follows:
Ho: $\sigma_1^2 \ge \sigma_2^2$
Ha: $\sigma_1^2 < \sigma_2^2$
You would need to change it to
Ho: $\sigma_2^2 \le \sigma_1^2$
Ha: $\sigma_2^2 > \sigma_1^2$
and then proceed by placing $s_2^2$ on top and using the upper tail
value of alpha, not alpha/2.
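The three cases (two-tailed, upper-tailed, and relabeled lower-tailed) can be summarized in one small helper. This is a sketch only; the function name and the `alternative` labels are hypothetical conveniences, not notation from the lecture. Remember that the degrees of freedom travel with whichever variance ends up on top.

```python
def f_test_setup(s1_sq, s2_sq, alpha, alternative):
    """Return (F, tail_area) for a variance-ratio test.

    alternative: "two-sided" (Ha: var1 != var2),
                 "greater"   (Ha: var1 >  var2),
                 "less"      (Ha: var1 <  var2).
    """
    if alternative == "two-sided":
        # Larger sample variance on top; enter the table under alpha / 2.
        return max(s1_sq, s2_sq) / min(s1_sq, s2_sq), alpha / 2
    if alternative == "greater":
        # s1^2 on top; use the full alpha in the upper tail.
        return s1_sq / s2_sq, alpha
    if alternative == "less":
        # Relabel so that Ha reads var2 > var1; s2^2 goes on top.
        return s2_sq / s1_sq, alpha
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


# The men/women example: two-sided test with alpha = 0.02.
f_two_sided, tail_two_sided = f_test_setup(100, 8 / 3, 0.02, "two-sided")

# Hypothetical one-sided case with invented variances 4 and 10.
f_less, tail_less = f_test_setup(4.0, 10.0, 0.05, "less")

print(f_two_sided, tail_two_sided)
print(f_less, tail_less)
```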
ANOVA (Analysis of Variance): We will begin our
study of ANOVAs by examining a Fixed Effects
Model, completely randomized design.
The null hypothesis associated with this ANOVA
technique might appear as follows:
Ho: $\mu_1 = \mu_2 = \mu_3$
Ha: The null hypothesis is not true
The amazing thing is that we test the above
hypothesis by calculating the ratio of two
variances.

$$F_{test} = \frac{s_n^2}{s_d^2}$$
In a fixed effects model, we are only interested in
the specific treatments. For example, imagine that
we are interested in selling a product, and we are
thinking of packaging the product in one of three
manners. We are only thinking of these 3 packaging
designs. That is, these packaging designs are not a
sample of three packaging designs from a world of
many possible packaging designs.
When we use the phrase completely randomized
design, that means that the experimental units are assigned
to the treatments at random. Imagine, for example,
that we have 30 stores that are willing to
participate in the above study. We could accomplish
a completely randomized design by randomly
assigning ten of these stores to sell the product
using packaging design one, assigning the next ten randomly
picked stores to packaging design two,
and giving the last ten packaging design three.
In a one-way ANOVA we may have
anywhere from two treatments to r treatments.
Examples of hypotheses that could be tested.
Testing that:
1) Teaching techniques are equal.
2) Marketing techniques are equal.
3) Manufacturing techniques are equal.
In an ANOVA technique, Ho may
not be true, but in order to test that hypothesis
about equality of means, the following
needs to be true:

$$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_e^2 = \sigma_{common}^2$$

That is, in order for us to test the hypothesis of equality of means, there
must exist equal population variances within each treatment group.
The model is $X_{ij} = \mu + \tau_j + e_{ij}$ and $E(X_{ij}) = \mu + \tau_j$. That is to say, the
average value in each treatment group is equal to the grand mean plus
the treatment effect. Certainly there is variance around this mean. The
population variance (not necessarily the sample variance) around one
treatment mean is equal to the population variance around the other
treatment means. This variance term is labeled $\sigma_e^2$. An important
assumption of the ANOVA technique is that these error terms are normally
distributed with mean equal to zero and variance equal to $\sigma_e^2$. Notice,
there is no subscript j associated with the treatment group; that is, the
amount of the variance is not an artifact of which treatment group we are
dealing with. All treatment groups have the same population variance.
You should know, however, that this test procedure is very robust in terms
of this assumption of equality of variances. That is, even if this assumption
is moderately violated, the ANOVA procedure still yields appropriate
results.
Again, the F test that is used to test the hypothesis of equality of
population means is the following:

$$F_{test} = \frac{s_n^2}{s_d^2}$$
The estimate of the variance in the denominator is
a valid estimate of the common variance ($\sigma_e^2$)
regardless of whether or not the hypothesis of
equality of means is true. This estimate of the
common variance turns out to be nothing more than
the average of the sample variances if the sample
sizes are equal. If they are not equal, the estimate
of the common variance used in the denominator is
the weighted average of the sample variances
(where the sample variances are weighted by their
degrees of freedom).
But the variance in the numerator is an estimate of
the common variance only if Ho is true ($\mu_1 = \mu_2 = \mu_3$).
How do we get the value of $s^2$ in the numerator? Recall
that

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$

Therefore

$$\sigma^2 = n\,\sigma_{\bar{x}}^2$$

And thus:

$$s^2 = n\,s_{\bar{x}}^2$$

where n is the sample size used to
obtain the treatment means. This assumes that all
population treatment means are the same.
Let us refer to the example below. We have three
sample means; all obtained using a sample size of
five. We obtain the sample variance of these
means; we then multiply by five, and we have an
estimate of the common variance.
But it is only an estimate of the common variance,
if these means came from a population with the
same mean. Notice the argument below. In the first
picture the sample means (using a specific sample
size) came from one population, and the only
reason the means differ is due to the random
nature of the variable itself which causes the
sample means to vary.
But in the second picture, the reason that the
sample means vary is two-fold. The sample means
come from different worlds, and there is variability
in each of those worlds.
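[Figure: two panels. "Same Population": sample means 1, 2 and 3 drawn from a single distribution. "Different Populations": sample means such as $\bar{x}_1$ and $\bar{x}_2$ drawn from separate distributions.]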
If this is the reality, then the above estimate of the common
variance will not be valid because the sample means come
from different populations.
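The two pictures can be imitated with a small simulation. What follows is a minimal sketch, not the class spreadsheet: it assumes normally distributed observations, a hypothetical common standard deviation of 10 (so the common variance is 100), three treatments, n = 5, and an arbitrary random seed. It averages n times the sample variance of the treatment means over many repeated experiments, once when all means come from one population and once when the population means are 20, 30 and 40.

```python
import random


def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def n_times_variance_of_means(true_means, sigma, n, rng):
    """Draw n normal observations per treatment and return n * s_xbar^2."""
    sample_means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n for mu in true_means
    ]
    return n * variance(sample_means)


def average_estimate(true_means, sigma=10.0, n=5, reps=20000, seed=1):
    """Average n * s_xbar^2 over many simulated experiments."""
    rng = random.Random(seed)
    total = sum(
        n_times_variance_of_means(true_means, sigma, n, rng) for _ in range(reps)
    )
    return total / reps


same = average_estimate([30, 30, 30])       # one population: Ho is true
different = average_estimate([20, 30, 40])  # three populations: Ho is false
print(f"same population:       {same:.1f}")
print(f"different populations: {different:.1f}")
```

Under these assumptions the first average should land near the common variance of 100, while the second should land near 100 + 5(100 + 0 + 100)/2 = 600, which is the sense in which the estimate stops being valid when the means come from different populations.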
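Example 1

Treatment (1)   $(X-\bar{X}_1)^2$   Treatment (2)   $(X-\bar{X}_2)^2$   Treatment (3)   $(X-\bar{X}_3)^2$
20                0                 30                0                 40                0
10              100                 20              100                 30              100
30              100                 40              100                 50              100
15               25                 25               25                 35               25
25               25                 35               25                 45               25
Total           250                                 250                                 250

Treatment means: $\bar{X}_1 = 20$, $\bar{X}_2 = 30$, $\bar{X}_3 = 40$

$S_1^2 = \frac{250}{4} = 62.5$, $S_2^2 = \frac{250}{4} = 62.5$, $S_3^2 = \frac{250}{4} = 62.5$

$S_{\bar{X}}^2 = \frac{200}{2} = 100$ (the sample variance of the three treatment means)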
First: Find the estimate of the common variance
which is used in the denominator of the F ratio:

$$S_P^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2 + (n_3-1)S_3^2}{(n_1-1) + (n_2-1) + (n_3-1)} = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2 + (n_3-1)S_3^2}{n_1+n_2+n_3-3}$$

= (4*62.5 + 4*62.5 + 4*62.5)/12 = 62.5
= (62.5 + 62.5 + 62.5)/3 = 62.5, since the sample sizes are
equal.
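The pooled variance is easy to compute for any set of sample sizes. The sketch below uses an illustrative function name and, for the unequal-size case, invented numbers that are not from the lecture; it shows that weighting by degrees of freedom reduces to the plain average only when the sample sizes are equal.

```python
def pooled_variance(sample_variances, sample_sizes):
    """Average of the sample variances, weighted by degrees of freedom (n_j - 1)."""
    weights = [n - 1 for n in sample_sizes]
    weighted_sum = sum(w * s2 for w, s2 in zip(weights, sample_variances))
    return weighted_sum / sum(weights)


# Example 1 from the notes: three treatments, n = 5 each.
equal_n = pooled_variance([62.5, 62.5, 62.5], [5, 5, 5])

# Hypothetical unequal-size case (numbers invented for illustration only):
# (2 * 40 + 6 * 80) / 8 = 70, whereas the plain average would be 60.
unequal_n = pooled_variance([40.0, 80.0], [3, 7])

print(equal_n)
print(unequal_n)
```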
Note: when we turn to the F table we have to have the
degrees of freedom used to estimate the common
variance in the denominator. In general that will be N - r,
where N is the total sample size and r is the number of
treatments. In this case the degrees of freedom are
$n_1 + n_2 + n_3 - 3 = 12$.
Now we have to obtain the estimate of the common
variance that is used in the numerator. Remember, this
estimate of the common variance is only a valid estimate
of $\sigma_e^2$ if the hypothesis of equality of means is true.

$$s^2 = n\,s_{\bar{x}}^2 = 5\,\frac{(20-30)^2 + (30-30)^2 + (40-30)^2}{3-1} = 5 \cdot \frac{200}{2} = 500$$

$$F_{test} = \frac{S_n^2}{S_d^2} = \frac{500}{62.5} = 8$$
For a ratio that is supposed to be close to
one, this value of 8 is quite large. How large would too
large be? We need to look up the $F_{0.05}$ value associated
with two degrees of freedom for the numerator and 12
degrees of freedom for the denominator.
$F(\alpha = 0.05, 2, 12) = 3.89$
Since 8 > 3.89, the P-value < $\alpha$, so we reject Ho of equality of means.
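The numerator estimate and the F ratio for Example 1 can be reproduced in a few lines. This is a sketch with illustrative variable names; the pooled variance of 62.5 and the critical value 3.89 are taken from the notes as given constants rather than computed here.

```python
def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


treatment_means = [20, 30, 40]  # sample means from Example 1
n = 5                           # observations behind each mean
s2_denominator = 62.5           # pooled variance found earlier in the notes

s2_numerator = n * variance(treatment_means)  # n * s_xbar^2
f_test = s2_numerator / s2_denominator

F_CRITICAL_05 = 3.89  # table value from the notes: alpha = 0.05, df = (2, 12)
reject_ho = f_test > F_CRITICAL_05

print(s2_numerator, f_test, reject_ho)
```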
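More background
Population model: $X_{ij} = \mu + \tau_j + e_{ij}$
Sample version: $X_{ij} = \hat{\mu} + \hat{\tau}_j + \hat{e}_{ij}$
$X_{ij}$: Observation
$\hat{\mu}$: Estimate of the population mean
$\hat{\tau}_j$: Estimate of the treatment effect, $\hat{\tau}_j = \bar{X}_j - \bar{X}$;
the sum of the treatment effects equals 0
$\hat{e}_{ij}$: Estimate of an error term
$\hat{\mu} = \bar{X} = 30$
$\hat{\tau}_1 = -10$, $\hat{\tau}_2 = 0$, $\hat{\tau}_3 = 10$

$$X_{ij} = \hat{\mu} + \hat{\tau}_j + \hat{e}_{ij} = \bar{X} + (\bar{X}_j - \bar{X}) + (X_{ij} - \bar{X}_j)$$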
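The decomposition of each observation into grand mean, treatment effect and residual can be checked numerically. The sketch below assumes the Example 1 observations are grouped as (20, 10, 30, 15, 25), (30, 20, 40, 25, 35) and (40, 30, 50, 35, 45), which reproduces the treatment means 20, 30 and 40; the variable names are illustrative.

```python
groups = {
    1: [20, 10, 30, 15, 25],
    2: [30, 20, 40, 25, 35],
    3: [40, 30, 50, 35, 45],
}

all_obs = [x for obs in groups.values() for x in obs]
grand_mean = sum(all_obs) / len(all_obs)  # estimate of mu

# Estimated treatment effect: treatment mean minus grand mean.
effects = {j: sum(obs) / len(obs) - grand_mean for j, obs in groups.items()}

# Estimated error term: observation minus its own treatment mean.
residuals = {
    j: [x - (grand_mean + effects[j]) for x in obs] for j, obs in groups.items()
}

print("grand mean:", grand_mean)
print("treatment effects:", effects)
print("sum of effects:", sum(effects.values()))
print("residual sums by group:", {j: sum(r) for j, r in residuals.items()})
```

Every observation equals grand mean + effect + residual by construction, and with equal sample sizes both the estimated effects and the residuals within each group sum to zero.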
To calculate the grand sum of squares, we subtract the
grand mean from each observation and square, and then
sum the squares. Note the following.
(1)    $(X-\bar{X})^2$    (2)    $(X-\bar{X})^2$    (3)    $(X-\bar{X})^2$
20     100                30       0                40     100
10     400                20     100                30       0
30       0                40     100                50     400
15     225                25      25                35      25
25      25                35      25                45     225
Total  750                       250                       750

Treatment mean $\bar{X}_j$    $(\bar{X}_j-\bar{X})^2$
20                            100
30                              0
40                            100
Total                         200

The sum of squares total is equal to 1,750 = 750 +
250 + 750.
Let us break the total sum of squares SST, down
into its two parts, SSW (sum of squares within) and
SSA (sum of squares among means)
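$$SST = \sum_{j=1}^{r}\sum_{i=1}^{n_j}(X_{ij}-\bar{X})^2 = \sum_{j=1}^{r}\sum_{i=1}^{n_j}\left[(X_{ij}-\bar{X}_j) + (\bar{X}_j-\bar{X})\right]^2$$

$$= \underbrace{\sum_{j=1}^{r}\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j)^2}_{SSW} + \underbrace{\sum_{j=1}^{r}\sum_{i=1}^{n_j}(\bar{X}_j-\bar{X})^2}_{SSA}$$

Note, the cross product is equal to zero.
r = 3 (# of treatment groups)

$$SSA = \sum_{j=1}^{r}\sum_{i=1}^{n_j}(\bar{X}_j-\bar{X})^2 = \sum_{j=1}^{r} n_j(\bar{X}_j-\bar{X})^2 = n\sum_{j=1}^{r}(\bar{X}_j-\bar{X})^2$$

where the last equality holds when the sample sizes are equal.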
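The statement that the cross product is equal to zero can be verified in one step. A short worked derivation, using the same symbols as the decomposition of SST (treatment means $\bar{X}_j$, grand mean $\bar{X}$):

```latex
\sum_{j=1}^{r}\sum_{i=1}^{n_j} 2\,(X_{ij}-\bar{X}_j)(\bar{X}_j-\bar{X})
  = 2\sum_{j=1}^{r}(\bar{X}_j-\bar{X})\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j)
  = 2\sum_{j=1}^{r}(\bar{X}_j-\bar{X})\,\bigl(n_j\bar{X}_j - n_j\bar{X}_j\bigr)
  = 0
```

The factor $(\bar{X}_j-\bar{X})$ does not depend on i, so it comes out of the inner sum, and the deviations of the observations from their own treatment mean always sum to zero.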
Now remember SSA is equal to $n\sum_{j}(\bar{X}_j-\bar{X})^2$ given equal
sample sizes, and turns out to be $5[(20-30)^2 + (30-30)^2 + (40-30)^2]$,
which is 1,000, and
SSW is $\sum_{j}\sum_{i}(X_{ij}-\bar{X}_j)^2 = 250 + 250 + 250 = 750$,
which are displayed in the table below.
(1)    $(X-\bar{X}_1)^2$    (2)    $(X-\bar{X}_2)^2$    (3)    $(X-\bar{X}_3)^2$
20       0                  30       0                  40       0
10     100                  20     100                  30     100
30     100                  40     100                  50     100
15      25                  25      25                  35      25
25      25                  35      25                  45      25
Total  250                         250                         250

Treatment mean $\bar{X}_j$    $(\bar{X}_j-\bar{X})^2$
20                            100
30                              0
40                            100
Total                         200

SSW = 250 + 250 + 250 = 750
SSA = 5 * 200 = 1,000
SST (total sum of squares) = SSW + SSA
= 750 + 5 * 200 = 1,750
So, students, we have arrived at SST in two ways: first by
calculating it directly, and then by calculating the sum of
SSW plus SSA.
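Both routes to SST can be confirmed with a few lines of code. This sketch assumes the Example 1 observations are grouped as listed below (treatment means 20, 30 and 40); the names are illustrative.

```python
groups = [
    [20, 10, 30, 15, 25],  # treatment 1
    [30, 20, 40, 25, 35],  # treatment 2
    [40, 30, 50, 35, 45],  # treatment 3
]


def mean(values):
    return sum(values) / len(values)


grand_mean = mean([x for g in groups for x in g])

# Route 1: squared deviations of every observation from the grand mean.
sst_direct = sum((x - grand_mean) ** 2 for g in groups for x in g)

# Route 2: within-group part plus among-group part.
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

print(sst_direct, ssw, ssa, ssw + ssa)
```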
Now, we have two independent estimates of the common
variance: MSW and MSA.

$$MSW = \frac{SSW}{N-r} = S_P^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2 + (n_3-1)S_3^2}{(n_1-1) + (n_2-1) + (n_3-1)} = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2 + (n_3-1)S_3^2}{n_1+n_2+n_3-3}$$

where r = 3 and $N = n_1 + n_2 + n_3$.

$$MSA = \frac{SSA}{r-1} = \frac{n\sum_{j}(\bar{X}_j-\bar{X})^2}{r-1} = n\,S_{\bar{X}}^2$$

MSA estimates the common variance only if the sample means are
from one population. The expected value of MSA equals $\sigma^2$
only if there are no treatment effects.
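Source of       Sum of     Degrees of   Mean square                  EMS
variation       squares    freedom
Among groups    SSA        r - 1        $MSA = \frac{SSA}{r-1}$      $E(MSA) = \sigma_e^2 + \frac{1}{r-1}\sum_{j=1}^{r} n_j\tau_j^2$
Within groups   SSW        N - r        $MSW = \frac{SSW}{N-r}$      $E(MSW) = \sigma_e^2$
Total           SST        N - 1

$$F = \frac{MSA}{MSW}$$

MSA: Mean square due to treatment; it estimates $\sigma_e^2$ only if Ho of equal
means is true.
MSW: Mean square due to error; it estimates $\sigma_e^2$ regardless of whether Ho of
equal means is true.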
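The whole ANOVA table can be assembled by one function. This is a sketch under stated assumptions: the function name and the dictionary layout are illustrative choices, the data are the Example 1 observations grouped so that the treatment means are 20, 30 and 40, and no p-value is computed (the F ratio would still be compared with a table value).

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of treatment groups.

    Works for unequal sample sizes because SSA weights each squared
    deviation of a treatment mean by that treatment's sample size.
    """
    r = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]

    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msa = ssa / (r - 1)
    msw = ssw / (total_n - r)
    return {
        "among": {"ss": ssa, "df": r - 1, "ms": msa},
        "within": {"ss": ssw, "df": total_n - r, "ms": msw},
        "total": {"ss": ssa + ssw, "df": total_n - 1},
        "F": msa / msw,
    }


table = one_way_anova([
    [20, 10, 30, 15, 25],
    [30, 20, 40, 25, 35],
    [40, 30, 50, 35, 45],
])
for source in ("among", "within", "total"):
    print(source, table[source])
print("F =", table["F"])
```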
Homework: Lecture 5
A. Dear Students: This is going to be a challenge
that will sharpen both your Excel skills and
your knowledge of one-way ANOVA. I want you to
duplicate the simulation spreadsheet that I have up
on Blackboard associated with this one-way ANOVA
lecture. The spreadsheet that I want you to
duplicate is the spreadsheet labeled Fixed RN (RN
stands for Random Numbers). Note that the
formulas in the spreadsheet are not displayed.
They were displayed in class in the spreadsheet I
used.
Please form three teams of equal size (you have
done that) and construct this simulation. I want you
to be able to demonstrate its proper working in
class.
Construct the simulated ANOVA table as a team,
following carefully the procedure demonstrated in
class. Please email the Excel sheet to me once you
are done, with your names on it.
B. Prove that $\hat{\sigma}_e^2 = s_p^2 = MSW = \frac{SSW}{N-r}$. Note, it is
not necessary to assume that $n_1 = n_2 = \dots = n_r$. That
which you will obtain is a weighted average of the
sample variances, each sample variance weighted
by its degrees of freedom.
However, when one does assume that $n_1 = n_2 = \dots = n_r$,
then $s_p^2 = (s_1^2 + s_2^2 + \dots + s_r^2)/r$. Students, use
your notes.
1. P. 467 #19, 22, 23, 24
2. P. 468 #25, 26, 29
Students, for problems 3 through 6 do not use the
Tukey-Kramer procedure. Your work on each of
these problems is done once you reach the conclusion
concerning the F test.
3. P. 493 #1
4. P. 494 #2, 3, 4, 5
5. P. 495 #9
6. P. 496 #11
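7.
(1)    $(X-\bar{X}_1)^2$    (2)    $(X-\bar{X}_2)^2$    (3)    $(X-\bar{X}_3)^2$
20       0                  30       0                  40       0
10     100                  20     100                  30     100
30     100                  40     100                  50     100
15      25                  25      25                  35      25
25      25                  35      25                  45      25
Total  250                         250                         250

Treatment mean $\bar{X}_j$    $(\bar{X}_j-\bar{X})^2$
20                            100
30                              0
40                            100
Total                         200

$MSW = \frac{SSW}{N-r} = ?$    $MSA = \frac{SSA}{r-1} = ?$    F = ?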
8. Complete the ANOVA table using the following data
set.

(1)    (2)    (3)
20     30     40
10     20     30
30     40     50
15     25     35
25     35     45
18     33     44
22     27     36
9. Students: I now want you to study the ANOVA
simulation that I demonstrated in class. Study both
the spreadsheet labeled Fixed RN and the one labeled RNs vary.
a) I want you to increase the treatment effects,
determine what will happen to the F value, and its
corresponding p value associated with the overall
hypothesis, and be able to explain why.
b) I want you to increase the standard deviation of
the error term. Now again, tell me what will happen
to the F value and the corresponding p value
associated with the overall hypothesis, and be able
to explain why.
c) Now, tell me why the sum of the real error terms
is not zero, and the sum of the estimate of the error
terms is equal to zero. Tell me what is the
relationship between the estimate of the treatment
effect and the mean of the error terms.