RMB W4

The document discusses two-sample hypothesis testing to determine if the means of two groups differ, outlining independent and within-subjects designs. It explains the use of t-tests, including assumptions like homogeneity of variance, and introduces Levene’s Test for checking equal variances. The document also highlights the differences between independent samples t-tests and paired samples t-tests, along with their respective applications and considerations.

Uploaded by

laiba Ahmad

A two-sample hypothesis test asks whether two groups differ on some measure (called a metric).

We ask: Is the average (mean) for one group different from the other?

Examples:

• Bilinguals vs. monolinguals on working memory


• Alzheimer’s patients vs. healthy controls on spatial accuracy
• People judging smiling faces vs. neutral faces on trustworthiness

Group = The category or condition participants belong to


Metric = What we’re measuring (memory, accuracy, trust)
1. Between-subjects
• Different people in each group
• Each person gives 1 data point
• Example: Bilinguals vs Monolinguals on memory

2. Within-subjects
• Same people tested twice (once in each condition)
• Each person gives 2 data points
• Example: A person rates a smiling and a neutral face

• Why this matters: It tells us which t-test to use later (independent or paired).
• We’re now looking at two separate groups and their scores:

• Each group has a distribution (bell curve)
• If the distributions don’t overlap much, the groups are probably different
• If they do overlap, the groups might not be different enough
• t = (Mean of Group 1 - Mean of Group 2) / Pooled Standard Error

• What does this mean?

• You subtract the group means to get the difference
• Then you divide that by the uncertainty (standard error) to get a t-value
• A big t-value means the difference is large relative to the uncertainty in estimating it
• t-value = “difference / noise”
• The pooled standard error is not the difference between the two standard errors.

• Instead, it’s a combined estimate of variability, based on the standard deviations and sample sizes from both groups. We’re pooling the information from both groups to get a single, more stable estimate of how much uncertainty (or “noise”) there is in the difference between their means.

Here’s the full formula for the pooled standard error of the difference between two independent means:
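The standard textbook form of the pooled standard error, written with the symbols s_1^2, s_2^2, n_1 and n_2 used in these notes, is sketched here. The names s_p^2 and SE_pooled are labels assumed for this sketch, and the layout may differ from the original slide:

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
SE_{\text{pooled}} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
```

The first expression is a weighted average of the two variances (the pooled variance); the second turns it into the standard error of the difference between the two means.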

Where:
• s_1^2 = variance of Group 1 (i.e. standard deviation squared)
• s_2^2 = variance of Group 2
• n_1 = sample size of Group 1
• n_2 = sample size of Group 2

• We pool the variances (not standard errors) because we’re assuming that the
underlying population variability is the same in both groups. That’s the
assumption of homogeneity of variance. If this assumption is violated, we
use Welch’s t-test instead.
• Positive t → Group 1 > Group 2
• Negative t → Group 1 < Group 2
• t ≈ 0 → No real difference between the groups

• We compare this t-value to a critical value (based on sample size & α level) to see if it’s statistically significant.
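As a rough illustration of “difference / noise”, the sketch below computes Student’s t from two lists of scores using only the Python standard library. The function name `student_t` and the scores are hypothetical and are not taken from the lecture data; comparing t against a critical value is left out because that needs t-distribution tables or a statistics package.

```python
import math
from statistics import mean, variance


def student_t(group1, group2):
    """Return (t, df) for Student's independent samples t-test."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: average of the two sample variances, weighted by df
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Pooled standard error of the difference between the two means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se
    return t, n1 + n2 - 2


# Hypothetical working-memory scores (illustration only)
bilingual = [7, 8, 6, 9, 8]
monolingual = [5, 6, 7, 5, 6]

t_value, df = student_t(bilingual, monolingual)
print(f"t({df}) = {t_value:.2f}")
```

A positive result here means the first group’s mean is higher than the second’s, matching the sign rules above.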
When we use Student’s t-test (the regular independent t-test), we’re assuming something:

• That both groups (e.g., Group 1 and Group 2) have similar spread in their data (same variance or standard deviation).

• This is called the assumption of homogeneity of variance.

Why does it matter?

• Because if one group has much more variability than the other (like wider bars on a histogram), the math of Student’s t-test becomes unreliable.
• Levene’s Test is like the gatekeeper. It checks:
“Do these two groups have equal variance?”

How it works:
• If the result is not significant: We say variances
are equal → use Student’s t-test
• If the result is significant (p < 0.05): It means
variances are different → Student’s t-test is not
appropriate

What to do if variances are unequal?


• → Use Welch’s t-test
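To make the gatekeeper idea concrete, here is a minimal sketch of the Levene statistic in its mean-centred form, using only the standard library. It is an assumption that this matches what any particular package reports, since some software centres on the median instead. The function name and the two small samples are hypothetical. The p-value is omitted because the standard library has no F distribution; the statistic would be compared against an F distribution with (k - 1, N - k) degrees of freedom.

```python
from statistics import mean


def levene_statistic(*groups):
    """Return (W, df1, df2) for Levene's test, centring each group on its mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # One-way ANOVA on the absolute deviations
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical samples: one tightly packed, one widely spread
tight = [10, 11, 10, 11]
wide = [6, 14, 8, 12]

w, df1, df2 = levene_statistic(tight, wide)
print(f"F({df1}, {df2}) = {w:.2f}")
```

A large statistic indicates that the typical distance from the group mean differs between groups, which is the pattern that leads to Welch’s t-test.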
Welch’s t-test is the cool, flexible cousin of the Student’s t-test.

• It does not assume equal variance


• It uses separate standard deviations for each group (called unpooled
variance)
• More accurate and reliable when your groups have different spreads

Use Welch’s t-test when:

• Your two groups are independent


• AND Levene’s test shows that their variances are significantly different
Use a paired samples t-test when:
• The same people take part in both conditions
• E.g., test scores before and after training
• Or testing right vs left eye in the same person
• These aren’t two separate groups—they’re paired observations.

How it works: You don’t compare the groups directly. Instead, for each person:

• You subtract one condition from the other → get a difference score.
• You test if the average of those differences is significantly different from zero.
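The two steps above (difference scores, then a test of their mean against zero) can be sketched as follows. The function name `paired_t` and the before/after scores are hypothetical values chosen for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for a paired samples t-test on after - before."""
    # One difference score per person
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Standard error of the mean difference
    se = stdev(diffs) / math.sqrt(n)
    # One-sample t-test of the mean difference against zero
    t = (mean(diffs) - 0) / se
    return t, n - 1


# Hypothetical test scores for five people, before and after training
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]

t_paired, df_paired = paired_t(before, after)
print(f"t({df_paired}) = {t_paired:.2f}")
```

Note that the degrees of freedom are based on the number of pairs, not the total number of scores.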
• Two Independent Groups, Equal Variance? → Student’s t-test
• Two Independent Groups, Unequal Variance? → Welch’s t-test
• Same Group, Two Conditions? → Paired Samples t-test
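The three rules above can be written as a small helper. This is only a sketch of the decision logic in these notes: the function name, the argument names and the 0.05 default threshold are assumptions for illustration, and it does not cover normality checks.

```python
def choose_t_test(design, levene_p=None, alpha=0.05):
    """Pick a t-test from the study design and Levene's p-value."""
    if design == "within":
        # Same people in both conditions
        return "Paired samples t-test"
    if design == "between":
        if levene_p is None:
            raise ValueError("Levene's p-value is needed for a between-subjects design")
        # Significant Levene's test -> variances differ -> Welch
        return "Welch's t-test" if levene_p < alpha else "Student's t-test"
    raise ValueError("design must be 'between' or 'within'")


print(choose_t_test("between", levene_p=0.004))
print(choose_t_test("between", levene_p=0.40))
print(choose_t_test("within"))
```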
Independent Samples t-tests
Andrew Quinn – a.quinn@bham.ac.uk
Research Methods B
Outline & Objectives

1. Two sample hypotheses & study design
• We are comparing the means of two different groups
• Using data to answer questions

2. Independent Samples T-Tests
• How does a two-sample t-test work?
• What are its assumptions?
• How can we run this in Jamovi?

3. Paired Samples T-Test
• A test for within subjects designs (actually, a one-sample t-test in disguise)

1. Two sample hypotheses.
Research Design
Comparing two groups
Two-Sample Hypotheses
Study design is important for statistics as many tests assume that data observations are independent. That is, each data sample is a separate observation.
If some observations are related, for example because one participant did the experiment twice, then this changes how we can interpret the results.
Both are valid but we need to run a slightly different test for each.

• What is “study design” and why does it matter?
• In statistics, most tests assume that every data point is independent, meaning each value comes from a different participant who only did the test once. If that’s not the case, we need to choose a different test.

[Diagram labels: Participants, Group 1, Group 2; Independent vs. Not independent]

Between Subjects Design
Each person contributes to a single condition

• You start with a big pool of participants.
• Randomly split them into two groups (Condition 1 and Condition 2).
• Each group experiences only one condition.
• Example: Study: “Does music improve concentration?”
• Group 1: Studies with music
• Group 2: Studies in silence
You compare their test scores.
• Each person only contributes one data point.

Between Subjects Design
Each person contributes to a single condition

Challenges with between subjects

Recruitment – are your participants in each condition sufficiently similar?
Representation – both groups need to represent your target population
Environment – are your groups in sufficiently similar research environments/contexts? Do they get the same instructions?
Ethics – Is it ethical to treat the groups differently, e.g. in medical research, is it fair to withhold treatment from one group?

• Recruitment – Are the groups similar enough to compare? If Group 1 has all older people and Group 2 has young adults, differences might be due to age, not your condition.
• Representation – Do the groups match your target population? If you want results that apply to everyone, both groups need to reflect that population (not just uni students, for example).
• Environment – Are testing conditions fair? If Group 1 does the test at 9 a.m. and Group 2 at 11 p.m., you’re introducing a confounding variable (time of day).
• Ethics – Is it fair to give one group a treatment and deny it to the other? In medicine, this can be especially sensitive.

Between Subjects Design
Each person contributes to a single condition

Some solutions…

Rigorous & standardised experimental procedures – make sure things really are the same each time
Planning & recruitment – What aspects of your participants do you need to balance? How can you ensure both groups are representative and balanced?
Allocation of groups – how can you fairly distribute conditions amongst participants? Randomness is normally good here

• Standardize your procedures: Make sure every group has the same instructions, time limits, conditions.
• Balance participant characteristics: Make sure both groups have a fair mix of gender, age, experience.
• Random assignment: Let chance decide who goes in each group; it prevents unconscious bias.

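Random assignment can be as simple as shuffling the participant list and cutting it in half. The sketch below assumes hypothetical participant IDs and uses a fixed seed only so the allocation can be reproduced; the function name is made up for illustration.

```python
import random


def assign_groups(participants, seed=None):
    """Randomly split participants into two groups of (near) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs P01..P20
participants = [f"P{i:02d}" for i in range(1, 21)]
music, silence = assign_groups(participants, seed=1)
print("Music:  ", music)
print("Silence:", silence)
```

Each participant ends up in exactly one group, which is what makes the two sets of scores independent.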
Within Subjects Design
Everyone contributes to both conditions – repeated measures

[Diagram labels: Condition 1, Condition 2]

• This is where every participant experiences both conditions. You test everyone twice (or more).

• Example: You want to test if red light vs blue light affects reading speed.

• Each person reads a passage under red light and then under blue light.
• You compare their two scores.

• This design reduces noise from individual differences (e.g., if someone is just naturally faster at reading).

Within Subjects Design
Everyone contributes to both conditions – repeated measures

Challenges with repeated measures

Practice – Could people get better at your task with repeated exposure?
Fatigue – Could behaviour change as people get bored or tired?
Carryover – Could one condition impact the other?

1. Practice Effects – People might do better the second time just because they’ve done it before, not because of your manipulation.
2. Fatigue – People get tired or bored if the study is too long. This affects performance.
3. Carryover Effects – The first condition might affect the second. (If caffeine is given first, its effect might still be there during the second condition.)

To control for order effects, you can switch up the order of conditions:
• Half of participants do Condition A → Condition B
• The other half do Condition B → Condition A
With more than 2 conditions (e.g., A, B, C), you can use all possible orders: ABC, ACB, BAC, BCA, CAB, CBA. This balances out any effects from the order.

Within Subjects Design
Everyone contributes to both conditions – repeated measures

Counterbalancing
Systematically vary the order of conditions for different participants.

• Set 1: Condition 1 → Condition 2
• Set 2: Condition 2 → Condition 1

Easy for two conditions but gets tricky for complicated designs, e.g. with 3 conditions A, B and C there are already six counterbalancing orders to consider!
ABC, ACB, BAC, BCA, CAB, CBA

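The counterbalancing orders are just the permutations of the condition labels, so they can be generated rather than written out by hand. In this sketch the participant IDs and the round-robin allocation of orders are assumptions for illustration.

```python
from itertools import permutations


def counterbalanced_orders(conditions):
    """Return every possible order of the given condition labels."""
    return ["".join(order) for order in permutations(conditions)]


def assign_orders(participants, conditions):
    """Cycle through the orders so each is used about equally often."""
    orders = counterbalanced_orders(conditions)
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}


orders = counterbalanced_orders("ABC")
print(orders)

# Hypothetical participants P1..P12: each of the six orders is used twice
schedule = assign_orders([f"P{i}" for i in range(1, 13)], "ABC")
print(schedule["P1"], schedule["P7"])
```

With four conditions the same function returns 24 orders, which shows why full counterbalancing quickly becomes impractical.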
Within and Between subjects design

Between subjects
• Two independent groups of data points
• Each participant is in a single group and contributes a single data point
Pros
• Shorter participation per individual
• Lower practice/order effects
Cons
• Can be affected by individual differences in sampling
• Need to decide how to allocate participants to groups
• Needs more participants

Within subjects
• Two dependent groups of data points
• Each participant completes two conditions and contributes two data points
• Sometimes called ‘repeated measures’
Pros
• Removes individual differences
• Need fewer participants
Cons
• Practice or order effects
• Longer participation per individual

Hypotheses

A two-sample hypothesis is a statement about the means of two different groups.
Typically, we’re interested in learning whether the means of the groups are different.

A two-sample hypothesis compares two groups to see if their means are different.
Example:

• Research question: Do footballers run faster than rugby players?
• Hypothesis: The average 200m sprint time for footballers is less than that of rugby players.

Two-sample hypotheses

Independent Samples
Football players run 200m faster than rugby players
Reoffending rates are lower for prisons that focus on rehabilitation rather than punishment

Dependent Samples
Students' attention spans are longer on days with fewer teaching sessions.
A new therapy increases calorie intake in people with Anorexia Nervosa

Independent Samples: Comparing two different groups
Example:
• Football players vs rugby players
• Reoffending rates in two different types of prisons

Dependent Samples: Comparing the same people under two conditions
Example:
• Student attention span on long vs short school days
• Calorie intake before vs after a therapy session

Two-sample NULL hypotheses

Independent Samples
Football players run 200m in the same time as rugby players
Reoffending rates are the same for prisons that focus on rehabilitation as for prisons that focus on punishment

Dependent Samples
Students' attention spans are the same on days with fewer teaching sessions as on days with more.
A new therapy doesn’t change calorie intake in people with Anorexia Nervosa

For each of those research hypotheses, the null assumes there is no difference. This is what we try to reject using a t-test.

Hypotheses
• We need to think about our statistics from the start of the study design phase of a project
• The design of the experiment impacts which statistics we can use. Whether or not we have independent data samples is a key example.
• Both are valid and appropriate for different scenarios
• The distinction must be clear when writing hypotheses

• Your design affects which statistical test you can use.
• If you ignore whether your data is independent or repeated, you might get misleading results.

Quantitative Methods

Independent samples t-tests

When to use a t-test
• Comparisons of two group means, or a single mean to a reference value
• Data must be interval or ratio type
• Assumptions must be met

We must be sure that the data we’re looking at have both an interpretable mean and standard deviation to run a t-test.
Nominal data have neither (mode is most appropriate)
Ordinal data may have a mean, but the standard deviation is hard to interpret (we don’t know the ‘distance’ between steps on the scale)

Use a t-test when:
• You’re comparing means (either between two groups or against a known value)
• Your data is interval or ratio scale (e.g., time, weight, score; not categories)
• Certain assumptions are met

t-test Assumptions. You can use a t-test if:
1. You have the right type of data (interval or ratio)
2. Data is normally distributed
3. Each observation is independent (for between-subjects)
4. Both groups have equal variances
→ This is the homogeneity of variance assumption
→ If violated, use Welch’s t-test (which allows different variances)

Assumptions
• Appropriate data type
• Data are normally distributed (Normality)
• Data observations are independent (Independence)
• Groups have equal variance* (Equality of Variance)

* Welch’s test removes this assumption…


Comparing two independent groups
This histogram shows two groups with slightly different data distributions.
• The vertical axis = proportion of scores
• The horizontal axis = data values
• You’d use an independent samples t-test here (each person belongs to only one group)

Independent Samples T-Test

An independent samples t-test is the difference between the means of two groups of data, divided by the standard error of that difference.

It is a ratio between the size of the difference and the precision to which it is estimated.

t(df) = (\bar{X}_1 - \bar{X}_2) / (pooled standard error of the difference)

• \bar{X}_1, \bar{X}_2 = group means (Mean of Group 1, Mean of Group 2)
• S_p = pooled standard deviation
• N = number of participants in each group

The pooled standard error in the denominator is computed from S_p and N.

This formula gives you a t-value, which tells you how big the difference is compared to the noise (uncertainty).

Independent Samples T-Test

The standard error of the difference is computed using the pooled standard deviation of the two groups.

The pooled standard deviation is a single standard deviation to represent the variability in both groups, assuming that both groups have the same variability…

• Large positive t-value = Group 1’s mean is much higher
• Near 0 t-value = Groups are basically the same
• Large negative t-value = Group 1’s mean is much lower
• This helps you interpret the result: not just whether there’s a difference, but in which direction.

Independent Samples T-Test

A large positive t-value indicates that:
• the mean of Group 1 is above the mean of Group 2

A near zero t-value indicates that:
• the mean of Group 1 is indistinguishable from the mean of Group 2

A large negative t-value indicates that:
• the mean of Group 1 is below the mean of Group 2

Both changes in the difference between the means and changes in standard deviation can modulate t-statistics.

The t-value is affected by:
• How far apart the means are
• How much spread (variance) there is in the scores

→ If means are close but spread is wide, t-value might be small
→ If means are far and spread is tight, t-value will be big

Student’s t-test assumes Homogeneity of Variance
i.e. the distributions of the two groups have the same standard deviation – is that always fair?

• Student’s t-test assumes equal variability in both groups.
• If Group 1 has wide spread and Group 2 doesn’t, this violates the assumption
• In that case, use Welch’s t-test, which adjusts for this

Levene’s Test for Homogeneity of Variance

Levene’s test assesses the null hypothesis that different groups of samples are from populations with equal variances.

A significant value for Levene’s test indicates that the groups are likely to have different variances, suggesting that a pooled estimate of standard deviation is not appropriate.

• Levene’s test is a statistical test that checks whether two (or more) groups have equal variability (or spread) in their data. This spread is called variance.

• Why is this important? Some tests, like the Student’s t-test, assume that both groups you’re comparing have similar variability. If this assumption is broken, the results of your test can be unreliable.

Levene’s test checks this assumption.
• Null hypothesis (H₀): The variances of the groups are equal
• Alternative hypothesis (H₁): The variances of the groups are not equal

How do you interpret the result?
• If p > 0.05, the test is not significant → You can assume equal variances.
• If p < 0.05, the test is significant → You cannot assume equal variances. Use a different test (Welch’s t-test).

Example:
• You’re comparing grey matter volume in young vs. old adults.
• Levene’s test result: p = 0.004
• Since p < 0.05, this means the two groups don’t have equal variances.
• So we should not use Student’s t-test.
• Instead, we switch to Welch’s t-test which doesn’t require equal variance.

Welch’s t-test

Welch’s test uses an UNPOOLED measure of standard deviation which is valid when the groups have different variance.

The unpooled standard deviation is valid whether the groups have equal variances or not.

• Welch’s t-test is a modified version of the t-test that does not assume equal variance. It’s more flexible and safer when your groups are messy or have different spreads.
• What does it calculate? Just like Student’s t-test, it compares the means of two groups, but instead of “pooling” the standard deviations, it uses each group’s own standard deviation.
• When should you use Welch’s t-test? Whenever Levene’s test is significant (p < 0.05) and when your group sizes are unequal and standard deviations are different.

Real Example: If Levene’s test says variances are different (p = 0.004), then Welch’s t-test should be used:
• Young: M = 43.2, SD = 1.63
• Old: M = 40.2, SD = 2.05
• Welch’s t = 19.1, p < .001 → Significant difference between age groups

• Formula: t = (Mean of Group 1 – Mean of Group 2) / Unpooled Standard Error of the difference
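A minimal sketch of the unpooled calculation is shown here, using only the standard library. The degrees-of-freedom line uses the Welch–Satterthwaite approximation, which is the usual choice but is not stated in these notes, so treat it as an assumption. The function name and the two samples are hypothetical and are not the grey matter data.

```python
import math
from statistics import mean, variance


def welch_t(group1, group2):
    """Return (t, df) for Welch's t-test (no equal-variance assumption)."""
    n1, n2 = len(group1), len(group2)
    # Each group's own squared standard error (variance / n), kept separate
    v1 = variance(group1) / n1
    v2 = variance(group2) / n2
    # Unpooled standard error of the difference between the means
    se = math.sqrt(v1 + v2)
    t = (mean(group1) - mean(group2)) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples: one tightly packed, one widely spread
group_a = [43, 44, 42, 43]
group_b = [38, 42, 40, 36, 44]

t_welch, df_welch = welch_t(group_a, group_b)
print(f"t({df_welch:.1f}) = {t_welch:.2f}")
```

The degrees of freedom come out below n1 + n2 - 2 and are usually not a whole number, which is why a Welch result is reported with a smaller df than Student’s test on the same data.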
Paired Samples t-test

A paired samples t-test follows the same principle as the independent samples test.

It must be used when we’re comparing the means of two dependent distributions, that is, when the same participants have contributed to each condition.

In these cases, the assumption of independent samples in the standard t-test is violated.

• It’s used when you’re testing the same people twice (before/after a treatment, two different conditions, etc.), because the two sets of scores are linked (dependent), not from separate people. So we don’t compare the two means directly; we compare the differences between the pairs of scores.

• Formula: t = (Mean of Differences – 0) / Standard Error of the Differences

• Example: You test people’s memory before and after they take a supplement:

• For each person, you calculate: After – Before
• You test if the average difference is significantly different from zero

Use Paired t-test when: You have a within-subjects design and each person does both conditions
Paired Samples t-test

This is really simple!

We simply take the difference between each pair of samples and compute a one-sample t-test between the paired difference and zero.

Jamovi and R can do this for us by specifying that we’re running a paired samples test.

t = (Mean of paired differences - 0) / Standard error of the mean paired difference
Data Skills & Coding

Worked Example
Grey Matter Volume changes with age

We have a dataset of MRI scans from people aged between 18 and 88 years old.

We can compute the percentage of each person’s brain that is composed of grey matter, white matter and CSF.

How is that composition different between younger and older adults?

https://cam-can.mrc-cbu.cam.ac.uk/

We’re comparing Grey Matter Volume in young vs old adults using MRI scans.

What are we measuring?
• Grey Matter Volume is the outcome variable (continuous)
• Age group (Young or Old) is the grouping variable (categorical)
We want to see if age makes a difference in grey matter volume.
Analysis in Jamovi
We need 2 columns to do our analysis.
A categorical Grouping Variable
A continuous Outcome Variable

In this dataset…
We are comparing Grey Matter Volume between the
Old and Young groups specified in ‘AgeGroup’
Jamovi is a stats software where we:

• Choose the outcome variable (GM_Vol_Norm)
• Choose the grouping variable (AgeGroup)

We select Student’s t-test to start with, but we also check Levene’s Test to make sure we’re meeting assumptions.

Independent Samples T-Test
The results are in! The effect is massive.
A t-value of 17 suggests that the difference is enormous compared to the precision to which we can estimate it from this data. Strong evidence for a real effect.
Jamovi gives us:

• t-value = 17.7
• df = 569
• p < .001 → This is very significant, meaning there’s strong evidence of a real difference between age groups.

• BUT… Levene’s test is also significant (p = 0.004) → This tells us the variances are not equal. So even though the t-test showed a strong difference, we can’t trust Student’s t-test alone; we need to switch to Welch’s t-test.
Descriptive plots give an intuitive picture of the result.

Levene’s Test in detail

Our t-test hinted that we might have a violation of our assumptions. Let’s run Levene’s test ourselves for a closer look.

Visualising homogeneity of variance

Levene’s test suggested a violation of homogeneity of variance but it is always a good idea to check the data out yourself as well.

Descriptive statistics and plots are a useful tool for this.

Levene’s test said the variances are unequal, but we also want to see this.
In Jamovi:
• You check histograms or box plots for each group
• In this example, young people’s scores are more tightly packed (SD = 1.63)
• Older people’s scores are more spread out (SD = 2.05)

This visual confirms what Levene’s test told us.

Reporting Welch’s Test

“An independent samples t-test was used to compare the grey matter volumes between the young and old participant groups. Levene’s test on these variables suggested that the assumption of equal variance was violated; F(1, 569) = 8.14, p=0.004. Welch’s t-test showed that grey matter volume was higher in the young (M=43.2, SD=1.63) compared to the old (M=40.2, SD=2.05) groups; t(462) = 19.1, p<0.001.”

Based on everything, the proper report is:

“An independent samples t-test was used to compare grey matter volume between young and old participants. Levene’s test indicated unequal variances; F(1, 569) = 8.14, p = 0.004. Welch’s t-test showed that grey matter volume was higher in the young (M = 43.2, SD = 1.63) than in the old (M = 40.2, SD = 2.05), t(462) = 19.1, p < .001.”

This statement clearly explains:
• Why we didn’t use Student’s t-test
• That we checked assumptions
• The final result is statistically significant

Overview

Data to be analysed
• Compare one sample to reference → One-Sample t-test
• Compare two samples, Independent Samples → Student’s t-test or Welch’s t-test
• Compare two samples, Dependent Samples → Paired t-test

Assumptions

Normal Distribution (Shapiro-Wilk test)
Consider non-parametric alternative if assumption of normality is violated:
• One sample test -> Wilcoxon Rank Test
• Ind sample test -> Mann-Whitney U
• Paired test -> Wilcoxon Rank Test *

Homogeneity of variance (Levene’s test)
Consider Welch’s t-test if groups do not have comparable variance

* Checking normality of the paired-difference, not the data