
Lecture 2: Analysis of Variance (ANOVA) & T-test

Ephrem Mamo (MPHE, PhD Candidate)


Assistant Professor of Epidemiology

May, 2023
Addis Ababa, Ethiopia
Choosing a test for a continuous outcome, by exposure variable:

Exposure variable        Parametric (normal)             Non-parametric (skewed)
2 independent groups     Independent-samples t-test      Mann-Whitney test
Paired                   Paired t-test                   Wilcoxon test
>2 groups                One-way ANOVA                   Kruskal-Wallis test
Continuous               Pearson corr / linear reg       Spearman corr / linear reg
Examining differences between two sets of scores

 We will look at statistical tests which tell us whether there is a
significant difference between two sets of scores, or between "before and
after" scores.
 For independent samples – e.g., scores on a Biostatistics test comparing
a sample of males and females.
 For related samples – e.g., mental health scores before and after a
course of therapy, or levels of anxiety measured before and after
counselling.
Sample t-test and analysis of variance

 The t-test and one-way ANOVA are basic tools for assessing the
statistical significance of differences between the average values of a
continuous outcome across two or more samples.
 The t-test is used to test the association of a binary predictor with a
continuous outcome variable.
 ANOVA is used to test the association of a multi-level predictor with a
continuous outcome variable.
 Both are based on statistical theory for normally distributed outcomes,
but work well for many other types of data; and both turn out to be
special cases of linear regression models.
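The slides run everything in SPSS; as an illustration of that last point, here is a minimal Python sketch (not from the original slides; the data and group sizes are made up) showing that a two-sample t-test and a linear regression with a binary predictor give the same result for the group difference.

# Sketch only: t-test vs. regression with a binary predictor (hypothetical data).
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm

rng = np.random.default_rng(1)
males = rng.normal(70, 10, size=30)      # hypothetical Biostatistics test scores
females = rng.normal(75, 10, size=30)

# Equal-variance two-sample t-test
t, p = stats.ttest_ind(males, females, equal_var=True)

# The same comparison as a regression: outcome ~ intercept + group indicator
y = np.concatenate([males, females])
group = np.concatenate([np.zeros(30), np.ones(30)])   # 0 = male, 1 = female
X = sm.add_constant(group)
fit = sm.OLS(y, X).fit()

print(t, p)                              # t-test result
print(fit.tvalues[1], fit.pvalues[1])    # slope t matches in magnitude; p matches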
Parametric tests

Parametric tests should only be used when the following conditions apply:
1. The level of measurement is continuous (interval or ratio).
2. The scores approximate a normal distribution.
3. The variance (spread of scores) within the groups is relatively similar /
not significantly different (homogeneity of variance).
Independent Sample T-Test

 E.g., comparing satisfaction ratings for the two counsellors.
 E.g., checking the parametric assumptions for the satisfaction rating
(more on normality testing…). We can use:
 1. A histogram; but if not sure, formal tests such as…
 2. The Kolmogorov-Smirnov and Shapiro-Wilk tests.
How to check normality of data

1. Click Graphs > Chart Builder.
2. Select Histogram and drag the Simple Histogram into the preview area.
3. Drag the variable satisfaction to the X-axis.
4. Click Groups/Point ID and put a tick in the Rows panel ID checkbox.
5. This will produce a Panel box to the right of the preview area. Drag the
variable counsellor into the Panel box.
6. Finally, to the right of your screen in the Element Properties section,
place a tick in the box next to Display normal curve and then click Apply.
7. Click OK.
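The slides do this through the SPSS Chart Builder; outside SPSS, an equivalent check could be sketched in Python as follows (purely illustrative: the satisfaction values below are hypothetical, not the course dataset).

# Sketch: histogram of satisfaction ratings per counsellor with a fitted
# normal curve overlaid (hypothetical data and names).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

satisfaction = {                    # hypothetical ratings, one list per counsellor
    "John": [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 3, 5, 6, 5],
    "Jane": [3, 4, 2, 5, 3, 4, 6, 2, 3, 4, 5, 3, 2, 4],
}

fig, axes = plt.subplots(1, 2, sharex=True, sharey=True)
for ax, (name, scores) in zip(axes, satisfaction.items()):
    scores = np.asarray(scores, dtype=float)
    ax.hist(scores, bins=7, density=True, alpha=0.6)
    xs = np.linspace(scores.min() - 1, scores.max() + 1, 200)
    ax.plot(xs, norm.pdf(xs, scores.mean(), scores.std(ddof=1)))  # normal curve
    ax.set_title(name)
    ax.set_xlabel("satisfaction")
plt.show()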
How to check…

 Inspecting a histogram can be quite subjective and open to abuse
("Well, it looked normal to me").
 So, for a more objective measure of whether a distribution is normal, we
may refer to two further tests: Kolmogorov-Smirnov and Shapiro-Wilk.
 If the results of these tests are statistically significant, this suggests that
the distribution deviates significantly from a normal distribution.
Kolmogorov-Smirnov and Shapiro-Wilk

To check with the Tests of Normality:

1. Click Analyze > Descriptive > Explore.
2. Move satisfaction into the Dependent List and counsellor into the Factor
List.
3. Under Display, ensure that there is only a tick next to Plots.
4. Click on the Plots tab to open the Plots dialogue box:
     Under Boxplots, click None.
     Under Descriptive, remove any ticks.
     But tick Normality plots with tests.
     Under Spread vs Level, tick None.
5. Click Continue, then OK.
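The same checks can be run programmatically; a hedged Python sketch follows (hypothetical data; the Lilliefors-corrected K-S test is closer to the version SPSS reports than a plain K-S test with estimated parameters).

# Sketch: Shapiro-Wilk and Lilliefors (K-S) normality tests, run separately
# for each counsellor's ratings (hypothetical data).
from scipy.stats import shapiro
from statsmodels.stats.diagnostic import lilliefors

satisfaction = {
    "John": [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 3, 5, 6, 5],
    "Jane": [3, 4, 2, 5, 3, 4, 6, 2, 3, 4, 5, 3, 2, 4],
}

for name, scores in satisfaction.items():
    sw_stat, sw_p = shapiro(scores)
    ks_stat, ks_p = lilliefors(scores, dist="norm")
    # p > 0.05 on both tests -> no significant departure from normality
    print(f"{name}: Shapiro-Wilk p = {sw_p:.3f}, K-S (Lilliefors) p = {ks_p:.3f}")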


SPSS output

Since neither of these tests (K-S or S-W) gives a significant result (all
values in the Sig. columns are greater than 0.05), this confirms our
observation from the histograms: the data are sufficiently normally
distributed.
Test of normality…

 So, having checked that the data are normally distributed, we can
proceed to the (parametric) independent samples t-test.
Running an independent samples t-test

1. Click Analyze > Compare Means > Independent samples t-test.
2. Move satisfaction into Test Variable.
3. Move counsellor into Grouping Variable.
4. Click Define Groups; since your grouping variable is counsellor, you
need to specify a numerical value for each group (1 for John, 2 for Jane).
5. Enter the value 1 (for John) in Group 1 and the value 2 (for Jane) in
Group 2.
6. Click Continue > OK.
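For comparison, a minimal Python sketch of the equivalent analysis (hypothetical ratings; SciPy's Levene test and t-test stand in for the corresponding columns of the SPSS output):

# Sketch: Levene's test for equality of variances, then an independent
# samples t-test on hypothetical satisfaction ratings.
from scipy.stats import levene, ttest_ind

john = [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 3, 5, 6, 5]   # hypothetical ratings
jane = [3, 4, 2, 5, 3, 4, 6, 2, 3, 4, 5, 3, 2, 4]

lev_stat, lev_p = levene(john, jane)      # homogeneity of variance
equal_var = lev_p > 0.05                  # not significant -> pool the variances

t_stat, p_value = ttest_ind(john, jane, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")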


SPSS output

The Group Statistics table shows the number of cases in each group, the
means (John's mean score is higher than Jane's) and the standard
deviations (the spread of Jane's scores is greater than John's).
What does this mean?
Interpretation

So we can see that:
 The satisfaction ratings were higher for patients who went to see John,
but is this a statistically significant difference? Let us check.
 Levene's test for equality of variances (recall "homogeneity of
variance"): the value (0.281) shows it is not significant, i.e. the variance
within the two groups of scores (ratings for John and Jane) is relatively
similar.
Interpretation…

Next is the p-value:
 p-value = 0.015
 Since this is smaller than our conventional significance level of 0.05, we
may conclude that there was a significant difference in scores between the
two groups.
Interpretation

We might write this up in a report thus:
 "An independent samples t-test was used to examine differences in
satisfaction ratings for each of the counsellors. The results showed that
ratings for John were higher (M = 4.86, SD = 1.23) than those for Jane
(M = 3.5, SD = 1.59); t = 2.58, p = .015."

When we come to the CI (confidence interval):

 Although the mean difference for our data was 1.36, the CI suggests
that, if we repeated this data collection 100 times, we would expect the
difference in satisfaction ratings for John and Jane to lie between 0.28
and 2.43 on 95% of occasions.
 Hence, we are 95% confident that the difference lies between 0.28 and
2.43.
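A hedged sketch of how such an interval can be computed by hand (pooled-variance formula, equal variances assumed; the group data here are the hypothetical ratings used above, since the slides do not list the raw scores):

# Sketch: 95% CI for the difference in means using the pooled-variance
# standard error (equal variances assumed, as supported by Levene's test).
import numpy as np
from scipy.stats import t

def ci_mean_diff(x1, x2, conf=0.95):
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    s2_pooled = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    diff = x1.mean() - x2.mean()
    t_crit = t.ppf(1 - (1 - conf) / 2, df=n1 + n2 - 2)
    return diff - t_crit * se, diff + t_crit * se

john = [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 3, 5, 6, 5]   # hypothetical ratings
jane = [3, 4, 2, 5, 3, 4, 6, 2, 3, 4, 5, 3, 2, 4]
print(ci_mean_diff(john, jane))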


Comparing the number of sessions for each counsellor

E.g., checking the parametric assumptions for the number of sessions
(more on normality testing…). We can use:

1. A histogram; but if not sure, formal tests such as…
2. The Kolmogorov-Smirnov and Shapiro-Wilk tests.
Comparing the number…

 The next difference we decided to examine is the number of counselling
sessions conducted by each counsellor.
 Is one offering more sessions than the other? Perhaps this could help to
explain why one received higher satisfaction ratings.
Comparing the number…

To check with a histogram:

1. Click Graphs > Chart Builder.
2. Select Histogram and drag the Simple Histogram into the preview area.
3. Drag the variable number of sessions to the X-axis.
4. Click Groups/Point ID and put a tick in the Rows panel ID checkbox.
5. This will produce a Panel box to the right of the preview area. Drag the
variable counsellor into the Panel box.
6. Finally, to the right of your screen in the Element Properties section,
place a tick in the box next to Display normal curve and then click Apply.
7. Click OK.
SPSS output

What does the graph look like?
 We can see that the histogram for Jane approximates a normal
distribution, whereas the histogram for John's sessions suggests that the
data are skewed towards the lower end.
 Thus we have a prime candidate for confirmation using the tests of
normality:
Tests of normality

Kolmogorov-Smirnov and Shapiro-Wilk:

1. Click Analyze > Descriptive > Explore.
2. Move number of sessions into the Dependent List and counsellor into
the Factor List.
3. Under Display, ensure that there is only a tick next to Plots.
4. Click on the Plots tab to open the Plots dialogue box:
     Under Boxplots, click None.
     Under Descriptive, remove any ticks.
     But tick Normality plots with tests.
     Under Spread vs Level, tick None.


SPSS output

Indeed, when we produce the Tests of Normality, we find that the K-S
test is significant (p = 0.014), suggesting that the distribution deviates
significantly from a normal distribution.
It might therefore be more appropriate to use the non-parametric
alternative, the Mann-Whitney test.
Running the Mann-Whitney U test

1. Click Analyze > Nonparametric Tests > 2 Independent Samples.
2. Move sessions into Test Variable.
3. Move counsellor into Grouping Variable.
4. Click Define Groups, as we did for the t-test before.
5. Enter the value 1 (for John) in Group 1 and the value 2 (for Jane) in
Group 2. > Continue.
6. Click the Exact tab to open the Exact Tests dialogue box and click the
button next to Exact (the exact test is more appropriate for smaller
samples). > Continue.
7. Ensure the Mann-Whitney U test is selected (it should be). > OK.
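A minimal Python sketch of the same test (the session counts below are hypothetical; method="exact" requests the exact test, as in the SPSS steps, and needs a reasonably recent SciPy):

# Sketch: exact Mann-Whitney U test on hypothetical numbers of
# counselling sessions for each counsellor.
import numpy as np
from scipy.stats import mannwhitneyu

john_sessions = [9, 8, 10, 7, 12, 8, 9, 11, 8, 10, 9, 7]   # hypothetical counts
jane_sessions = [6, 5, 7, 6, 4, 8, 6, 5, 7, 6, 5, 6]

u_stat, p_value = mannwhitneyu(john_sessions, jane_sessions,
                               alternative="two-sided", method="exact")
print(f"U = {u_stat}, p = {p_value:.3f}")
print("medians:", np.median(john_sessions), np.median(jane_sessions))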


SPSS output

Interpretation

 The first table shows how the scores (numbers of sessions) were ranked.
The procedure for calculating the Mann-Whitney U statistic involves
ranking all the scores from both groups in order of magnitude and then
calculating the mean rank for each of the two groups.
 So, for John we can see that the mean ranking for his sessions is 20.61,
but for Jane it is much lower, at 11.03.
 In the second table, we are interested in the Mann-Whitney U value
(40.5) and the Exact Sig. (2-tailed) value, p = 0.002.
Interpretation…

 We would write up the results of this analysis, first, by referring to the
median number of counselling sessions (because the data are not
normally distributed).
 We would then cite the relevant statistics from the Mann-Whitney U
test.
 When the number of sessions conducted by the two counsellors was
examined, it was found that the median number of sessions conducted by
John was 8.5, compared to 6.0 for Jane.

Interpretation…

 The Mann-Whitney test found this difference to be statistically
significant: U = 40.5, p = .002.
 This result might then be related to the higher satisfaction ratings for
John; in other words, perhaps the higher satisfaction ratings are related
to the greater number of sessions he conducted with his patients?
Analysis of Variance (ANOVA)

 ANOVA is a method to determine whether three or more population
means are equal: the procedure considers means from k independent
groups, where k is more than 2.
 Depending upon the type of analysis, it may be important to determine:
1. which factors have a significant effect on the response, and/or
2. how much of the variability in the response variable is attributable to
each factor.
ANOVA…

 For example, in an experimental study, various treatments are applied
to test subjects and the response data are gathered for analysis.
 ANOVA enables a researcher to differentiate treatment results based on
easily computed statistical quantities from the treatment outcome.
Assumptions

 The observed data constitute independent random samples from the
respective populations.
 Each of the populations from which the samples come is normally
distributed.
 Each of the populations has similar variance / variances that are not
significantly different (homogeneity of variance).
How does ANOVA work?

 Instead of dealing with means as data points, ANOVA deals with
variation.
 There is variation (variance) within groups (the data).
 There is variation between group means.
 If the groups are equivalent, then the variance between groups and the
variance within groups will be roughly equal.
 Expected variation is used to calculate statistical significance, just as
expected differences in means are used in t-tests.
One-way ANOVA

 The one-way ANOVA compares the means of three or more groups
based on one independent variable (or factor).
 Between-groups variance measures how the group means vary about
the grand mean.
 Within-groups variance measures how the scores in each group vary
about their group mean.
 A one-way ANOVA test of the null hypothesis requires calculation of the
following quantities:
 Between-groups sum of squares (SSB)
 Within-groups sum of squares (SSW)
 Between-groups degrees of freedom = k − 1
 Within-groups degrees of freedom = n − k, where n = n1 + n2 + … + nk
One-way ANOVA…

 Between-groups mean square: MSB = SSB / (k − 1)
 Within-groups mean square: MSW = SSW / (n − k)
 The F-statistic, F = MSB / MSW, is used to test the null hypothesis.
One-way ANOVA…

 If the calculated F > the tabulated F, reject H0.
 The tabulated F depends on α, the degrees of freedom for the numerator
and the degrees of freedom for the denominator.
 Follow-up procedures: a "significant" F only tells us that there are
differences, not where the specific differences lie.
One-way ANOVA…

 As MSW gets larger, F gets smaller.
 As MSW gets smaller, F gets larger.
 So, as F gets smaller, the groups are less distinct; as F gets larger, the
groups are more distinct.
 Example: The following table shows the natural killer cell activity
measured for three groups of subjects: those who had low, medium, and
high scores on the social readjustment rating scale.

          Low score    Moderate score    High score
          22.2         15.1              10.2
          97.8         23.2              11.3
          29.1         10.5              11.4
          37.0         13.9               5.3
          35.8          9.7              14.5
          44.2         19.0              11.0
          88.0         19.8              13.6
          56.0          9.1              33.4
           9.3         30.1              25.0
          19.9         15.5              27.0
          39.5         10.3              36.3
          12.8         11.0              17.7
          37.4
Mean      40.23        15.60             18.06
SD        25.71         6.42              9.97
Example…

1. H0: μ1 = μ2 = μ3 (the mean natural killer cell activity is equal in the
three groups)
2. HA: μ1 ≠ μ2 or μ2 ≠ μ3 or μ1 ≠ μ3 (the means are not all equal)
3. The data are collected.
4. The test statistic is the F-statistic.
5. α = 0.01
6. The critical value with d.f. k − 1 = 2 and n − k = 34 is F(0.01; 2, 34) = 5.31.
Example…

 Between-groups d.f. = k − 1 = 3 − 1 = 2
 Within-groups d.f. = n − k = 37 − 3 = 34
 MSB = SSB / (k − 1) = 4653.57 / 2 = 2326.79
 MSW = SSW / (n − k) = 9478.84 / 34 = 278.79
 F = MSB / MSW = 2326.79 / 278.79 = 8.35
 Since Fcal = 8.35 > Ftab = 5.31, we reject H0: there is a difference in
mean natural killer cell activity among patients with low, moderate and
high scores on the social readjustment rating scale.
 Note that rejecting H0 does not tell us which group means differ.
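As a cross-check, a short Python sketch (not part of the original slides) that computes the same quantities from the table above; results should be close to the hand calculation, allowing for rounding in the transcribed data.

# Sketch: one-way ANOVA by hand (SSB, SSW, MSB, MSW, F) for the natural
# killer cell activity data, plus SciPy's f_oneway as a cross-check.
import numpy as np
from scipy.stats import f_oneway, f

low = [22.2, 97.8, 29.1, 37.0, 35.8, 44.2, 88.0, 56.0, 9.3, 19.9, 39.5, 12.8, 37.4]
moderate = [15.1, 23.2, 10.5, 13.9, 9.7, 19.0, 19.8, 9.1, 30.1, 15.5, 10.3, 11.0]
high = [10.2, 11.3, 11.4, 5.3, 14.5, 11.0, 13.6, 33.4, 25.0, 27.0, 36.3, 17.7]

groups = [np.array(g) for g in (low, moderate, high)]
k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within groups
msb, msw = ssb / (k - 1), ssw / (n - k)
F = msb / msw

print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {F:.2f}")
print("critical F(0.01; 2, 34):", f.ppf(0.99, k - 1, n - k))
print("scipy:", f_oneway(*groups))   # same F statistic, with a p-value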
SPSS

 E.g., we are interested in comparing the mean serum progesterone after
treatment for the four treatment types (0, 1, 2 and 3).
 To run this in SPSS:
 –Click Analyze > Compare Means > One-Way ANOVA.
 –Move serum progesterone into the Dependent List box and treatment
type into the Factor box.
 –Click Post Hoc; tick LSD and Tukey > Continue.
 –Click Options; tick Descriptive and Homogeneity of variance test >
Continue.
 –Click OK.
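A sketch of the equivalent analysis in Python (the dataset is not shown in the slides, so the column names and values below are hypothetical; pairwise_tukeyhsd plays the role of the Tukey post-hoc table).

# Sketch: one-way ANOVA followed by Tukey's post-hoc comparisons
# (hypothetical serum progesterone values for four treatment types).
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.DataFrame({
    "treatment": [0]*5 + [1]*5 + [2]*5 + [3]*5,
    "progesterone": [4.1, 3.8, 4.5, 4.0, 3.9,
                     6.2, 6.8, 5.9, 6.5, 6.1,
                     8.4, 8.1, 8.9, 8.6, 8.2,
                     10.3, 10.8, 9.9, 10.5, 10.1],
})

groups = [g["progesterone"].values for _, g in df.groupby("treatment")]
print(f_oneway(*groups))                      # overall F test

tukey = pairwise_tukeyhsd(df["progesterone"], df["treatment"], alpha=0.05)
print(tukey.summary())                        # which pairs of means differ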
SPSS output

Interpretation

 The first table gives the descriptives.
 We are interested in the third table (the ANOVA table):
 –The test is statistically significant, with F = 232.043, Sig. = 0.000 (i.e.
p < 0.001).
 –Therefore, we can conclude that there is a statistically significant
difference among the treatment means for serum progesterone
(F = 232.043, p < 0.001).
 We can then look at the post-hoc tests (e.g. Tukey and LSD) to see each
pair of mean differences.