Basic statistics: a survival guide
Tom Sensky
HOW TO USE THIS POWERPOINT PRESENTATION
The presentation covers the basic statistics you need to have some understanding of.
After the introductory slides, you'll find two slides listing topics.
When you view the presentation in Slide Show mode, clicking on any topic in these lists takes you to slides covering that topic.
Clicking on the return symbol (in the top right corner of each slide, still in Slide Show mode) takes you back to the list of topics.
HOW TO USE THIS POWERPOINT PRESENTATION
You can either go through the slide
show sequentially from the start
(some topics follow on from those
before) or review specific topics when
you encounter them in your reading.
A number of the examples in the
presentation are taken from PDQ
Statistics, which is one of three basic
books I would recommend (see next
page).
RECOMMENDED RESOURCES
The books below explain statistics simply,
without excessive mathematical or logical
language, and are available as inexpensive
paperbacks.
Geoffrey Norman and David Streiner. PDQ Statistics. 3rd Edition. BC Decker, 2003
David Bowers, Allan House, David Owens.
Understanding Clinical Papers (2nd Edition).
Wiley, 2006
Douglas Altman et al. Statistics with
Confidence. 2nd Edition. BMJ Books, 2000
PDQ stands for Pretty Darn Quick, a series of publications.
AIM OF THIS PRESENTATION
The main aim has been to present the information in such a way as to allow you to understand the statistics involved, rather than having to rely on rote learning.
Thus formulae have been kept to a minimum; they are included where they help to explain the statistical test, and (very occasionally) for convenience.
You may have to go through parts of the presentation several times in order to understand some of the points.
BASIC STATISTICS
Types of data
Normal distribution
Describing data
Boxplots
Standard deviations
Skewed distributions
Parametric vs non-parametric
Statistical errors
Sample size
Power calculations
Clinical vs statistical significance
Two-sample t test
Problem of multiple tests
Subgroup analyses
Paired t test
Chi-square test
ANOVA
Repeated measures ANOVA
Non-parametric tests
Mann-Whitney U test
Summary of common tests
Summaries of proportions
Odds and Odds Ratio
Absolute and Relative Risks
Number Needed to Treat (NNT)
Confidence intervals (CIs)
CI (difference between two proportions)
Correlation
Regression
Logistic regression
Mortality statistics
Survival analysis
TYPES OF DATA
VARIABLES are either QUANTITATIVE or QUALITATIVE.
QUANTITATIVE: ratio (eg pulse rate, height) or interval (eg temperature, 36-38°C).
QUALITATIVE: ordinal (eg social class) or nominal (eg gender, ethnicity).
NORMAL DISTRIBUTION
[Figure: bell curve] Cases are distributed symmetrically about the mean. The extent of the spread of data around the mean is measured by the standard deviation. The shaded area marks the region beyond two standard deviations above the mean.
DESCRIBING DATA
MEAN: average (arithmetic mean) of the data.
MEDIAN: the value which comes half way when the data are ranked in order.
MODE: most common value observed.
In a normal distribution, mean and median are the same.
If median and mean differ, this indicates that the data are not normally distributed.
The mode is of little if any practical use.
BOXPLOT (BOX AND WHISKER PLOT)
[Figure: boxplots of pain (VAS) for females (N=74) and males (N=27). The box spans the 25th to 75th centiles (the interquartile range), the heavy line marks the median (50th centile), and the whiskers extend to the 2.5th and 97.5th centiles.]
STANDARD DEVIATION
Measure of the spread of values of a sample around the mean.
SD = √[ Σ(value - mean)² / number of values ]
The square of the SD is known as the variance.
SD decreases as a function of: smaller spread of values about the mean; larger number of values.
In a normal distribution, 95% of the values will lie within 2 SDs of the mean.
STANDARD DEVIATION AND SAMPLE SIZE
[Figure: sampling distributions of the mean for n=10, n=50 and n=150] As the sample size increases, the standard error of the mean decreases, so the mean is estimated more precisely (the SD itself simply becomes a more stable estimate of the population SD).
SKEWED DISTRIBUTION
[Figure: skewed distribution with the mean and median marked] 50% of values will lie on either side of the median; the mean is pulled away from the median, toward the long tail.
DOES A VARIABLE FOLLOW A
NORMAL DISTRIBUTION?
Important because parametric statistics
assume normal distributions
Statistics packages can test normality
Distribution unlikely to be normal if:
Mean is very different from the median
Two SDs below the mean give an
impossible answer (eg height <0 cm)
DISTRIBUTIONS: EXAMPLES
Normal distribution: height, weight, haemoglobin.
Skewed distribution: bankers' bonuses, number of marriages.
DISTRIBUTIONS AND
STATISTICAL TESTS
Many common statistical tests rely on the
variables being tested having a normal distribution
These are known as parametric tests
Where parametric tests cannot be used, other,
non-parametric tests are applied which do not
require normally distributed variables
Sometimes, a skewed distribution can be made
sufficiently normal to apply parametric statistics
by transforming the variable (by taking its square
root, squaring it, taking its log, etc)
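As a rough illustration (not part of the original slides), here is a minimal Python sketch, assuming the numpy and scipy libraries and invented right-skewed data, showing a normality check before and after a log transformation:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Hypothetical right-skewed variable (eg length of hospital stay)
    skewed = rng.lognormal(mean=1.0, sigma=0.8, size=200)

    # Shapiro-Wilk test: a small p-value suggests the data are not normal
    print("raw:", stats.shapiro(skewed).pvalue)

    # Log-transform and re-test; the transformed variable is often
    # close enough to normal for parametric tests to be applied
    print("log:", stats.shapiro(np.log(skewed)).pvalue)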
EXAMPLE: IQ
Say that you have tested a sample of people on a validated IQ test.
The IQ test has been carefully standardized on a large sample to have a mean of 100 and an SD of 15.
SD = √[ Σ(individual value - mean value)² / number of values ]
[Figure: normal curve with mean 100 and markers at 94, 97, 100, 103 and 106]
EXAMPLE: IQ
Say you now administer the test to repeated samples of 25 people.
The expected random variation of these sample means equals the Standard Error:
SE = SD / √(sample size) = 15 / √25 = 3.0
[Figure: distribution of sample means, centred on 100]
STANDARD DEVIATION vs STANDARD ERROR
Standard Deviation is a measure
of variability of scores in a
particular sample
Standard Error of the Mean is an
estimate of the variability of
estimated population means taken
from repeated samples of that
population (in other words, it
gives an estimate of the precision
of the sample mean)
See Douglas G. Altman and J. Martin Bland. Standard
deviations and standard errors. BMJ 331 (7521):903,
2005.
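The distinction can be made concrete with a small simulation. This is a sketch (numpy assumed; the population values are invented to mimic the IQ example): the SD of scores stays near 15 whatever the sample size, while the spread of sample means (the SE) shrinks as 1/√n:

    import numpy as np

    rng = np.random.default_rng(1)
    population = rng.normal(100, 15, 1_000_000)  # IQ-like population

    for n in (10, 50, 150):
        samples = rng.choice(population, size=(2000, n))
        sds = samples.std(axis=1, ddof=1)    # spread of scores within each sample
        means = samples.mean(axis=1)         # one mean per sample
        # average SD stays ~15; the SD of the means approximates SE = 15/sqrt(n)
        print(n, round(sds.mean(), 1), round(means.std(ddof=1), 2),
              round(15 / np.sqrt(n), 2))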
EXAMPLE: IQ
One sample of 25 people yields a mean IQ score of 107.5.
[Figure: population distribution with mean 100; sample mean of 107.5 marked]
What are the chances of obtaining a mean IQ of 107.5 or more in a sample of 25 people from the same population as that on which the test was standardized?
EXAMPLE: IQ
How far out the sample mean lies in the population distribution is calculated as the area under the curve to the right of the sample mean:
(sample mean - population mean) / standard error = (107.5 - 100) / 3.0 = 2.5
This ratio tells us how far out on the standard distribution we are: the higher the number, the further we are from the population mean.
EXAMPLE: IQ
Look up this figure (2.5) in a table of values of the normal distribution.
From the table, the area in the tail to the right of our sample mean is 0.006 (approximately 1 in 160).
This means that there is a 1 in 160 chance that our sample mean came from the same population as the IQ test was standardized on.
[Figure: normal curve with the tail beyond the sample mean shaded]
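The table lookup can be reproduced directly; a minimal sketch, assuming scipy:

    from scipy.stats import norm

    z = (107.5 - 100) / 3.0      # 2.5 standard errors above the population mean
    p = norm.sf(z)               # area in the upper tail
    print(p, 1 / p)              # ~0.0062, ie roughly 1 in 160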
EXAMPLE: IQ
This is commonly referred to as p=0.006.
By convention, we accept as significantly different a sample mean which has a 1 in 20 chance (or less) of coming from the population in which the test was standardized (commonly referred to as p=0.05).
Thus our sample had a significantly greater IQ than the reference population (p<0.05).
EXAMPLE: IQ
If we move the sample mean (green) closer to the population mean (red), the area of the distribution to the right of the sample mean increases.
Even by inspection, the sample is more likely than our previous one to come from the original population.
COMPARING TWO SAMPLES
[Figure: two distributions, Sample A and Sample B, each with its mean marked]
In this case, there is very little overlap between the two distributions, so they are likely to be different.
COMPARING TWO SAMPLES
Returning to the IQ example, let's say that we know that the sample we tested (mean IQ=107.5) actually came from a population with a mean IQ of 110.
[Figure: two population curves, means 100 and 110, with the sample mean of 107.5 between them]
SAMPLES AND POPULATIONS
Repeatedly measuring small samples from the same population will give a normal distribution of means.
The spread of these small-sample means about the population mean is given by the Standard Error:
SE = SD / √(sample size)
COMPARING TWO SAMPLES
We start by assuming that our sample came from the original population.
Our null hypothesis (to be tested) is that the sample mean of IQ=107.5 is not significantly different from the population mean of IQ=100.
COMPARING TWO SAMPLES
The area under the standard population curve to the right of our sample IQ of 107.5 represents the likelihood of observing this sample mean of 107.5 by chance under the null hypothesis, ie that the sample is from the standard population.
This is known as the α level and is normally set at 0.05.
If the sample comes from the standard population, we expect to find a mean of 107.5 in 1 out of 20 estimates.
[Figure: curves for the two populations (means 100 and 110); the area to the right of 107.5 under the standard curve is shaded]
COMPARING TWO SAMPLES
It is perhaps easier to conceptualise by seeing what happens if we move the sample mean.
If the sample mean is closer to the red population mean (100), the area under the curve to the right of the sample mean (α) is bigger.
The larger α, the greater the chance that the sample comes from the red population.
COMPARING TWO SAMPLES
The α level represents the probability of finding a significant difference between the two means when none exists.
This is known as a Type I error.
COMPARING TWO SAMPLES
The area under the other population curve (blue) to the left of our sample IQ of 107.5 represents the likelihood of observing this sample mean of 107.5 by chance under the alternative hypothesis (that the sample is from the other population).
This is known as the β level and is normally set at 0.20.
COMPARING TWO SAMPLES
The β level represents the probability of not finding a significant difference between the two means when one exists.
This is known as a Type II error (usually due to inadequate sample size).
COMPARING TWO SAMPLES
Note that if the sample sizes are reduced, the standard error increases, and so does β (hence also the probability of failing to find a significant difference between the two means).
This increases the likelihood of a Type II error; inadequate sample size is the most common cause of Type II errors.
STATISTICAL ERRORS: SUMMARY
Type I (α): false positive. Find a significant difference even though one does not exist. Usually set at 0.05 (5%) or 0.01 (1%).
Type II (β): false negative. Fail to find a significant difference even though one exists. Usually set at 0.20 (20%).
Power = 1 - β (ie usually 80%).
Remember that power is related to sample size: a larger sample has a smaller SE, thus there is less overlap between the curves.
SAMPLE SIZE: POWER CALCULATIONS
Using the standard α=0.05 and β=0.20, and having estimates for the standard deviation and the difference in sample means, the smallest sample size needed to avoid a Type II error can be calculated with a formula.
POWER CALCULATIONS
Intended to estimate sample size
required to prevent Type II errors
For simplest study designs, can apply a
standard formula
Essential requirements:
A research hypothesis
A measure (or estimate) of variability
for the outcome measure
The difference (between intervention
and control groups) that would be
considered clinically important
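For the simplest design (two independent groups, comparison of means), the usual normal-approximation formula can be sketched in Python. This is an illustration under stated assumptions (scipy assumed; the SD and difference are invented), not a formula taken from these slides:

    from scipy.stats import norm

    def n_per_group(sd, difference, alpha=0.05, beta=0.20):
        """Approximate sample size per group for a two-sample comparison of means."""
        z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05, two-tailed
        z_beta = norm.ppf(1 - beta)         # 0.84 for beta = 0.20 (80% power)
        return 2 * ((z_alpha + z_beta) * sd / difference) ** 2

    # eg SD = 15, clinically important difference = 7.5 (half an SD)
    print(n_per_group(15, 7.5))   # ~63 per group, before rounding up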
STATISTICAL SIGNIFICANCE IS NOT NECESSARILY CLINICAL SIGNIFICANCE
Sample Size | Population Mean | Sample Mean | p
4 | 100.0 | 110.0 | 0.05
25 | 100.0 | 104.0 | 0.05
64 | 100.0 | 102.5 | 0.05
400 | 100.0 | 101.0 | 0.05
2,500 | 100.0 | 100.4 | 0.05
10,000 | 100.0 | 100.2 | 0.05
The larger the sample, the smaller the difference from the population mean that reaches p=0.05.
CLINICALLY SIGNIFICANT IMPROVEMENT
Large proportion of patients improving: Hugdahl & Ost (1981)
A change which is large in magnitude: Barlow (1981)
An improvement in patients' everyday functioning: Kazdin & Wilson (1978)
Reduction in symptoms by 50% or more: Jansson & Ost (1982)
Elimination of the presenting problem: Kazdin & Wilson (1978)
MEASURES OF CLINICALLY SIGNIFICANT IMPROVEMENT
[Figure: distribution of the dysfunctional (abnormal) sample]
First possible cut-off: outside the range of the dysfunctional population, ie beyond two standard deviations above its mean.
MEASURES OF CLINICALLY SIGNIFICANT IMPROVEMENT
[Figure: overlapping distributions of the dysfunctional (abnormal) and functional (normal) samples, with cut-offs a, b and c marked]
Second possible cut-off: within the range of the normal population.
Third possible cut-off: more within the normal than the abnormal range.
UNPAIRED OR INDEPENDENT-SAMPLE t-TEST: PRINCIPLE
[Figure: two pairs of distributions] Where the two distributions are widely separated, their means are clearly different. Where the distributions overlap, it is unclear whether the samples come from the same population.
t = (difference between means) / (SE of the difference)
In essence, the t-test gives a measure of the difference between the sample means in relation to the overall spread of the data.
UNPAIRED OR INDEPENDENT-SAMPLE t-TEST: PRINCIPLE
t = (difference between means) / (SE of the difference), where SE = SD / √(sample size)
With smaller sample sizes, SE increases, as does the overlap between the two curves, so the value of t decreases.
THE PREVIOUS IQ EXAMPLE
In the previous IQ example, we were assessing whether a particular sample was likely to have come from a particular population.
If we had two samples (rather than a sample plus a population), we would compare these two samples using an independent-sample t-test.
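A minimal sketch of how that comparison might be run, assuming scipy and invented data (the group means and SDs echo the IQ example but are not from the slides):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    sample_a = rng.normal(100, 15, 25)   # eg IQ scores from one setting
    sample_b = rng.normal(110, 15, 25)   # eg IQ scores from another

    t, p = stats.ttest_ind(sample_a, sample_b)   # unpaired / independent-sample t-test
    print(t, p)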
MULTIPLE TESTS AND TYPE I ERRORS
The risk of observing by chance a difference between two means (even if there isn't one) is α. This risk is termed a Type I error.
By convention, α is set at 0.05. For an individual test, this becomes the familiar p<0.05 (the probability of finding this difference by chance is <0.05, or less than 1 in 20).
However, as the number of tests rises, the actual probability of finding a difference by chance rises markedly:
Tests (N) | Probability of a chance difference
1 | 0.05
2 | 0.098
3 | 0.143
4 | 0.185
5 | 0.226
6 | 0.264
10 | 0.401
20 | 0.641
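The figures in the table follow directly from 1 - (1 - α)^N, the probability that at least one of N independent tests is "significant" by chance; this plain-Python sketch reproduces them (to rounding):

    alpha = 0.05
    for n in (1, 2, 3, 4, 5, 6, 10, 20):
        # chance of at least one spurious "significant" result in n tests
        print(n, round(1 - (1 - alpha) ** n, 3))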
SUBGROUP ANALYSIS
Papers sometimes report analyses of
subgroups of their total dataset
Criteria for subgroup analysis:
Must have large sample
Must have a priori hypothesis
Must adjust for baseline differences
between subgroups
Must retest analyses in an
independent sample
TORTURED DATA - SIGNS
Did the reported findings result from testing a
primary hypothesis of the study? If not, was
the secondary hypothesis generated before
the data were analyzed?
What was the rationale for excluding various
subjects from the analysis?
Were the following determined before looking
at the data: definition of exposure, definition
of an outcome, subgroups to be analyzed, and
cutoff points for a positive result?
Mills JL. Data torturing. NEJM 329:1196-1199,
1993.
TORTURED DATA - SIGNS
How many statistical tests were
performed, and was the effect of multiple
comparisons dealt with appropriately?
Are both P values and confidence
intervals reported?
And have the data been reported for all
subgroups and at all follow-up points?
Mills JL. Data torturing. NEJM 329:1196-1199,
1993.
COMPARING TWO MEANS FROM THE SAME SAMPLE: THE PAIRED t TEST
Assume that A and B represent measures on the same subject (eg at two time points):
Subject | A | B
1 | 10 | 11
2 | 60 | 65
3 | 27 | 31
Note that the variation between subjects is much wider than that within subjects, ie the variance in the columns swamps the variance in the rows.
Treating A and B as entirely separate samples, t=-0.17, p=0.89.
Treating the values as paired, t=3.81, p=0.03.
SUMMARY THUS FAR
TWO-SAMPLE (INDEPENDENT-SAMPLE) t-TEST: used to compare means of two independent samples.
PAIRED (MATCHED-PAIR) t-TEST: used to compare two (repeated) measures from the same subjects.
COMPARING PROPORTIONS: THE CHI-SQUARE TEST
Say that we are interested to know whether two interventions, A and B, lead to the same percentages of patients being discharged after one week:
| A | B
Number of patients | 100 | 50
Actual % discharged | 15 | 30
Actual number discharged | 15 | 15
COMPARING PROPORTIONS: THE CHI-SQUARE TEST
We can calculate the number of patients in each group expected to be discharged if there were no difference between the groups.
A total of 30 patients were discharged out of 150, ie 20%.
If there were no difference between the groups, 20% of patients should have been discharged from each group (ie 20 from A and 10 from B). These are the expected numbers:
| A | B
Number of patients | 100 | 50
Actual % discharged | 15 | 30
Actual number discharged | 15 | 15
Expected number discharged | 20 | 10
COMPARING PROPORTIONS: THE CHI-SQUARE TEST
χ² = Σ (observed - expected)² / expected
= (15 - 20)²/20 + (15 - 10)²/10 = 25/20 + 25/10 = 1.25 + 2.5 = 3.75
According to tables, the minimum value of chi square for p=0.05 (with 1 degree of freedom) is 3.84.
Therefore, there is no significant difference between our treatments.
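The slide's arithmetic, checked in Python (scipy is assumed, and used only for the critical value; the counts are those in the table above):

    from scipy.stats import chi2

    observed = {"A": 15, "B": 15}
    expected = {"A": 20, "B": 10}

    chi_sq = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
    print(chi_sq)                    # 1.25 + 2.5 = 3.75
    print(chi2.ppf(0.95, df=1))      # 3.84: the minimum value for p = 0.05 with 1 df

In practice one would build the full 2x2 table (discharged / not discharged by group) and use scipy.stats.chi2_contingency, which performs the same calculation over all cells.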
COMPARISONS BETWEEN THREE OR MORE SAMPLES
Cannot use the t-test (only for 2 samples); use analysis of variance (ANOVA).
Essentially, ANOVA involves dividing the variance in the results into between-groups variance and within-groups variance.
F = (measure of between-groups variance) / (measure of within-groups variance)
The greater F, the more significant the result (values of F are in standard tables).
ANOVA - AN EXAMPLE
[Figure: three well-separated group distributions] Here, the between-group variance is large relative to the within-group variance, so F will be large.
ANOVA - AN EXAMPLE
[Figure: three overlapping group distributions] Here, the within-group variance is larger, and the between-group variance smaller, so F will be smaller (reflecting the likelihood of no significant differences between these three sample means).
ANOVA - AN EXAMPLE
Data from SPSS sample data file dvdplayer.sav: a focus group where 68 participants were asked to rate DVD players.
The table shows scores for Total DVD assessment by age group; results come from running One-Way ANOVA (found under Compare Means).
Age group | N | Mean | SD
18-24 | 13 | 31.9 | 5.0
25-31 | 12 | 31.1 | 5.7
32-38 | 10 | 35.8 | 5.3
39-45 | 10 | 38.0 | 6.6
46-52 | 12 | 29.3 | 6.0
53-59 | 11 | 28.5 | 5.3
Total | 68 | 32.2 | 6.4
ANOVA - SPSS PRINT-OUT
Data from the SPSS print-out are shown below:
| Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 733.27 | 5 | 146.65 | 4.60 | 0.0012
Within Groups | 1976.42 | 62 | 31.88 | |
Total | 2709.69 | 67 | | |
Between Groups Sum of Squares concerns
the variance (or variability) between the
groups
Within Groups Sum of Squares concerns
the variance within the groups
ANOVA - MAKING SENSE OF THE SPSS PRINT-OUT
| Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 733.27 | 5 | 146.65 | 4.60 | 0.0012
Within Groups | 1976.42 | 62 | 31.88 | |
Total | 2709.69 | 67 | | |
The degrees of freedom (df) represent the number of
independent data points required to define each value
calculated.
If we know the overall mean, once we know the ratings of 67
respondents, we can work out the rating given by the 68th
(hence Total df = N-1 = 67).
Similarly, if we know the overall mean plus means of 5 of the
6 groups, we can calculate the mean of the 6th group (hence
Between Groups df = 5).
ANOVA - MAKING SENSE OF THE SPSS PRINT-OUT
| Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 733.27 | 5 | 146.65 | 4.60 | 0.0012
Within Groups | 1976.42 | 62 | 31.88 | |
Total | 2709.69 | 67 | | |
This would be reported as follows:
Mean scores of total DVD assessment varied significantly
between age groups (F(5,62)=4.60, p=0.0012)
Have to include the Between Groups and Within Groups
degrees of freedom because these determine the
significance of F
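The reported F and its significance can be recovered from the printout values alone; a minimal sketch, assuming scipy:

    from scipy.stats import f

    F = 146.65 / 31.88               # Mean Square between / Mean Square within
    p = f.sf(F, dfn=5, dfd=62)       # upper tail of the F(5, 62) distribution
    print(round(F, 2), round(p, 4))  # 4.60, ~0.0012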
SAMPLING SUBJECTS THREE OR MORE TIMES
Analogous to the paired t-test.
Usually interested in within-subject changes (eg measuring some biochemical parameter before treatment, after treatment and at follow-up).
ANOVA must be modified to take account of the same subjects being tested (ie the repeated measures are not independent).
Use repeated measures ANOVA.
NON-PARAMETRIC TESTS
If the variables being tested do
not follow a normal distribution,
cannot use standard t-test or
ANOVA
In essence, all the data points are
ranked, and the tests determine
whether the ranks within the
separate groups are the same, or
significantly different
MANN-WHITNEY U TEST
Say you have two groups, A and B, with ordinal data.
Pool all the data from A and B, then rank each score, and indicate which group each score comes from.
If scores in A were more highly ranked than those in B, all the A scores would be on the left, and B scores on the right.
If there were no difference between A and B, their respective scores would be evenly spread by rank.
Rank:  1 2 3 4 5 6 7 8 9 10 11 12
Group: A A A B A B A B A B  B  B
MANN-WHITNEY U TEST
Generate a total score (U) representing the number of times an A score precedes each B:
Rank:  1 2 3 4 5 6 7 8 9 10 11 12
Group: A A A B A B A B A B  B  B
The first B is preceded by 3 As, the second B by 4 As, and so on.
U = 3+4+5+6+6+6 = 30
Look up the significance of U from tables (generated automatically by SPSS).
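The same example in Python, assuming scipy. One caveat: recent scipy versions report U for the first group passed in; with 6 scores per group U_A + U_B = 6 x 6 = 36, so the slide's U = 30 (counted for B) corresponds to U = 6 for A:

    from scipy.stats import mannwhitneyu

    # The ranked data from the slide, by group
    a_ranks = [1, 2, 3, 5, 7, 9]
    b_ranks = [4, 6, 8, 10, 11, 12]

    res = mannwhitneyu(a_ranks, b_ranks)
    print(res.statistic, res.pvalue)   # U = 6 for A, ie 30 for B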
SUMMARY OF BASIC STATISTICAL TESTS
| 2 groups | >2 groups
Continuous variables | Independent t-test | ANOVA
Continuous variables, same sample | Matched-pairs t-test | Repeated measures ANOVA
Categorical variables | Chi-square test | (Chi-square test)
Ordinal variables (not normally distributed) | Mann-Whitney U test, Median test | Kruskal-Wallis ANOVA
KAPPA
(Non-parametric) measure of agreement.
Ratings at Time 1 (or by Observer 1) are tabulated against ratings at Time 2 (or by Observer 2):
                       TIME 1 (OR OBSERVER 1)
                       Positive | Negative | Total
TIME 2      Positive | A        | B        | A+B
(OR         Negative | C        | D        | C+D
OBSERVER 2)    Total | A+C      | B+D      | N
Agreements fall in cells A and D, disagreements in cells B and C.
Simple agreement: (A+D)/N
The above does not take account of agreement by chance; kappa takes account of chance agreement.
KAPPA - INTERPRETATION
Kappa | Agreement
<0.20 | Poor
0.21-0.40 | Slight
0.41-0.60 | Moderate
0.61-0.80 | Good
0.81-1.00 | Very good
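A sketch of the kappa calculation from the 2x2 table's cell counts (plain Python; the counts in the example call are invented):

    def cohens_kappa(a, b, c, d):
        """a..d are the cell counts of the 2x2 agreement table above."""
        n = a + b + c + d
        p_observed = (a + d) / n                                  # simple agreement
        p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2 # agreement expected by chance
        return (p_observed - p_chance) / (1 - p_chance)

    print(cohens_kappa(40, 10, 5, 45))   # 0.70, ie "Good" on the scale above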
DESCRIPTIVE STATISTICS INVOLVING PROPORTIONS
The data below are from a sample of people with early rheumatoid arthritis randomised to have either usual treatment alone or usual treatment plus cognitive therapy.
The table gives the number of patients in each group who showed >25% worsening in disability at 18-month follow-up:
| CBT | Usual Care (TAU)
Cases | 23 | 21
Deterioration | 3 (13%) | 11 (52%)
No deterioration | 20 (87%) | 10 (48%)
RATES, ODDS, AND ODDS RATIOS
| CBT | Usual Care (TAU)
Deterioration | 3 (13%) | 11 (52%)
No deterioration | 20 (87%) | 10 (48%)
Rate of deterioration (CBT) = 3/23 = 13%
Odds of deterioration (CBT) = 3/20 = 0.15
Rate of deterioration (TAU) = 11/21 = 52%
Odds of deterioration (TAU) = 11/10 = 1.1
One measure of the difference between the two groups is the extent to which the odds of deterioration differ between the groups.
This is the ODDS RATIO, and the test applied is whether this is different from 1.0.
ABSOLUTE AND RELATIVE RISKS
| CBT | Usual Care (TAU)
Deterioration | 3 (13%) | 11 (52%)
No deterioration | 20 (87%) | 10 (48%)
Absolute Risk Reduction (ARR) = deterioration rate (TAU) - deterioration rate (CBT) = 52% - 13% = 39%, or 0.39
Relative Risk Reduction (RRR) = [deterioration rate (TAU) - deterioration rate (CBT)] / deterioration rate (TAU) = (52 - 13)/52 = 75%, or 0.75
Note that this could also be expressed as a Benefit Increase rather than a Risk Reduction; the answer is the same.
NUMBER NEEDED TO TREAT
Absolute Risk Reduction (ARR) = 0.39
Number Needed to Treat (NNT) = 1/ARR = 1/0.39 = 2.56 (~3)
NNT is the number of patients who need to be treated with CBT, compared with treatment as usual, to prevent one patient deteriorating.
In this case, 3 patients have to be treated to prevent one patient deteriorating.
NNT is a very useful summary measure, but is commonly not given explicitly in published papers.
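The slide's arithmetic as a short Python sketch (the counts come from the CBT/TAU table; only rounding is added):

    rate_tau = 11 / 21     # deterioration rate, usual care (~52%)
    rate_cbt = 3 / 23      # deterioration rate, CBT (~13%)

    arr = rate_tau - rate_cbt     # absolute risk reduction, ~0.39
    rrr = arr / rate_tau          # relative risk reduction, ~0.75
    nnt = 1 / arr                 # number needed to treat, ~2.5 -> round up to 3
    print(round(arr, 2), round(rrr, 2), round(nnt, 1))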
ANOTHER APPROACH: CONFIDENCE INTERVALS
If a population is sampled 100 times, the means of the samples will lie within a normal distribution.
95 of these 100 sample means will lie between the shaded areas at the edges of the curve; this represents the 95% confidence interval (95% CI).
The 95% CI can be viewed as the range within which one can be 95% confident that the true value (of the mean, in this case) lies.
ANOTHER APPROACH: CONFIDENCE INTERVALS
95% CI = sample mean ± 1.96 × SE
Returning to the IQ example, mean = 107.5 and SE = 3.0:
95% CI = 107.5 ± 1.96 × 3.0 = 107.5 ± 5.88
Thus we can be 95% confident that the true mean lies between 101.6 and 113.4.
CONFIDENCE INTERVAL (CI)
Gives a measure of the precision (or
uncertainty) of the results from a particular
sample
The X% CI gives the range of values which we
can be X% confident includes the true value
CIs are useful because they quantify the size
of effects or differences
Probabilities (p values) only measure strength
of evidence against the null hypothesis
CONFIDENCE INTERVALS
There are simple formulae to calculate confidence intervals for proportions as well as means.
Statisticians (and journal editors!)
prefer CIs to p values because all
p values do is test significance,
while CIs give a better indication
of the spread or uncertainty of
any result
CONFIDENCE INTERVALS FOR THE DIFFERENCE BETWEEN TWO PROPORTIONS
| CBT | Usual Care (TAU)
Cases | 23 | 21
Deterioration | 3 (13%) | 11 (52%)
No deterioration | 20 (87%) | 10 (48%)
95% CI = Risk Reduction ± 1.96 × se, where se = standard error:
se(ARR) = √[ p1(1-p1)/n1 + p2(1-p2)/n2 ] = √[ 0.13(1-0.13)/23 + 0.52(1-0.52)/21 ]
NB This formula is given for convenience. You are not required to commit any of these formulae to memory; they can be obtained from numerous sources.
CONFIDENCE INTERVAL OF ABSOLUTE RISK REDUCTION
ARR = 0.39, se = 0.13
95% CI of ARR = ARR ± 1.96 × se = 0.39 ± 1.96 × 0.13 = 0.39 ± 0.25 = 0.14 to 0.64
The calculated value of ARR is 39%, and the 95% CI indicates that the true ARR could be as low as 14% or as high as 64%.
Key point: the result is statistically significant because the 95% CI does not include zero.
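The standard error and CI from the preceding two slides, computed directly (plain Python; numbers from the CBT/TAU table):

    import math

    p1, n1 = 3 / 23, 23      # CBT deterioration rate and group size
    p2, n2 = 11 / 21, 21     # TAU deterioration rate and group size

    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # ~0.13
    arr = p2 - p1                                             # ~0.39
    print(arr - 1.96 * se, arr + 1.96 * se)                   # ~0.14 to ~0.64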
INTERPRETATION OF CONFIDENCE INTERVALS
Remember that the mean estimated from a sample is only an estimate of the population mean.
The actual mean can lie anywhere within the 95% confidence interval estimated from your data.
For an Odds Ratio, if the 95% CI passes through 1.0, the Odds Ratio is unlikely to be statistically significant.
For an Absolute Risk Reduction or Absolute Benefit Increase, the result is unlikely to be significant if its 95% CI passes through zero.
CORRELATION
[Figure: scatterplot of HADS depression score against SIS, rheumatoid arthritis sample (N=24)]
Here, there are two variables (HADS depression score and SIS) plotted against each other.
The question is: do HADS scores correlate with SIS ratings?
CORRELATION
[Figure: the same scatterplot with a fitted line, r²=0.34; deviations of individual points from the line are marked x1 to x4]
In correlation, the aim is to draw a line through the data such that the deviations of the points from the line (xn) are minimised.
Because deviations can be negative or positive, each is first squared, then the squared deviations are added together, and the square root taken.
CORRELATION
[Figures: HADS depression score against SIS for two samples. Rheumatoid arthritis (N=24): r²=0.34. Coronary artery bypass (N=87): r²=0.06]
CORRELATION
Can express correlation as an equation:
y = A + Bx
If B=0, there is no correlation.
Thus we can test statistically whether B is significantly different from zero.
REGRESSION
Can extend correlation methods (see previous slides) to model a dependent variable on more than one independent variable:
y = A + B1x1 + B2x2 + B3x3
Again, the main statistical test is whether B1, B2, etc, are different from zero.
This method is known as multiple regression.
INTERPRETATION OF REGRESSION
DATA I
Regression models fit a general equation:
y=A + Bpxp + Bqxq + Brxr
y is the dependent variable, being predicted by
the equation
xp, xq and xr are the independent (or predictor)
variables
The basic statistical test is whether Bp, Bq and Br
(called the regression coefficients) differ from
zero
This result is either shown as a p value (p<0.05)
or as a 95% confidence interval (which does not
pass through zero)
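For the simple one-predictor case, the fit and the test of whether B differs from zero can be sketched in a few lines of Python (scipy assumed; the data are invented, loosely echoing the HADS/SIS example):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x = rng.uniform(10, 30, 24)                # eg SIS ratings
    y = 2 + 0.3 * x + rng.normal(0, 2, 24)     # eg HADS scores, with noise

    fit = stats.linregress(x, y)
    # fit.slope is B, fit.intercept is A; fit.pvalue tests whether B = 0
    print(fit.slope, fit.intercept, fit.pvalue, fit.rvalue ** 2)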
INTERPRETATION OF REGRESSION DATA
II
Note that B can be positive (where x is
positively correlated with y) or negative
(where as x increases, y decreases)
The actual value of B depends on the scale of
x if x is a variable measured on a 0-100
scale, B is likely to be greater than if x is
measured on a 0-5 scale
For this reason, to better compare the
coefficients, they are usually converted to
standardised form (then called beta
coefficients), which assumes that all the
independent variables have the same scaling
INTERPRETATION OF REGRESSION DATA
III
In regression models, values of the beta coefficients
are reported, along with their significance or
confidence intervals
In addition, results report the extent to which a
particular regression model correctly predicts the
dependent variable
This is usually reported as R2, which ranges from 0
(no predictive power) to 1.0 (perfect prediction)
Converted to a percentage, R2 represents the extent
to which the variance in the dependent variable is
predicted by the model eg R2 = 0.40 means that the
model predicts 40% of the variance in the dependent
variable (in medicine, models are seldom
comprehensive, so R2 = 0.40 is usually a very good
result!)
INTERPRETATION OF REGRESSION DATA IV: EXAMPLE
Subjects were outpatients (N=89) with RA attending a rheumatology outpatient clinic; the dependent variable was a measure of Suffering.
Büchi S et al. J Rheumatol 1998;25:869-75
LOGISTIC REGRESSION
In linear regression (see preceding
slides), values of a dependent variable
are modelled (predicted) by combinations
of independent variables
This requires the dependent variable to
be a continuous variable with a normal
distribution
If the dependent variable has only two
values (eg alive or dead), linear
regression is inappropriate, and logistic
regression is used
LOGISTIC REGRESSION II
The statistics of logistic regression are complex and
difficult to express in graphical or visual form (the
dichotomous dependent variable has to be converted
to a function with a normal distribution)
However, like linear regression, logistic regression
can be reported in terms of beta coefficients for the
predictor variables, along with their associated
statistics
Contributions of dichotomous predictor variables are sometimes reported as odds ratios (for example, if presence or absence of depression is the dependent variable, the effect of gender can be reported as an odds ratio); if 95% confidence intervals of these odds ratios are reported, the test is whether these include 1.0 (see odds ratios).
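A hedged sketch of a logistic regression in Python, assuming the statsmodels library (the slides mention only SPSS) and entirely invented data; it shows coefficients re-expressed as odds ratios:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 200
    age = rng.normal(50, 10, n)
    female = rng.integers(0, 2, n)                 # dichotomous predictor
    logit = -5 + 0.08 * age + 0.5 * female         # invented "true" model
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # eg depressed yes/no

    X = sm.add_constant(np.column_stack([age, female]))
    result = sm.Logit(y, X).fit(disp=0)
    print(np.exp(result.params))    # coefficients exponentiated into odds ratios
    print(result.conf_int())        # 95% CIs (on the log-odds scale)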
CRONBACH'S ALPHA
You will come across this as an indication of how rating scales perform.
It is essentially a measure of the extent to which a scale measures a single underlying variable.
Alpha goes up if:
there are more items in the scale;
each item shows good correlation with the total score.
Values of alpha range from 0 to 1; values of 0.8+ are satisfactory.
MORTALITY
Mortality Rate = number of deaths / total population
Proportional Mortality Rate = number of deaths (particular cause) / total deaths
Age-specific Mortality Rate = number of deaths (given cause and specified age range) / total deaths (same age range)
Standardized Mortality Rate = number of deaths from a particular cause, corrected for the age distribution (and possibly other factors) of the population.
SURVIVAL ANALYSIS
[Figure: ten cases followed over the years of the study; X = relapsed, W = withdrew]
Patients who have not relapsed at the end of the study are described as censored.
SURVIVAL ANALYSIS: ASSUME ALL CASES RECRUITED AT TIME=0
[Figure: the same ten cases re-plotted as if all were recruited at time zero; X = relapsed, C = censored, W = withdrew]
SURVIVAL ANALYSIS: EVENTS IN YEAR 1
10 people were at risk at the start of Year 1.
Case 6 withdrew within the first year (leaving 9 cases), so the average number of people at risk during the first year was (10+9)/2 = 9.5.
Of the 9.5 people at risk during Year 1, one relapsed.
Probability of surviving the first year = (9.5-1)/9.5 = 0.895
SURVIVAL ANALYSIS: EVENTS IN YEAR 2
8 people were at risk at the start of Year 2.
Case 7 withdrew in Year 2, thus 7.5 people (on average) were at risk during Year 2.
Of the 7.5 people at risk during Year 2, two relapsed.
Probability of surviving the second year = (7.5-2)/7.5 = 0.733
Chances of surviving for 2 years = 0.733 × 0.895 = 0.656
SURVIVAL ANALYSIS: EVENTS IN YEAR 3
5 people were at risk at the start of Year 3.
Cases 2 and 8 were censored (ie withdrew) in Year 3, thus the average number of people at risk during Year 3 = (5+3)/2 = 4.
Of the 4 people at risk during Year 3, one relapsed.
Probability of surviving the third year = (4-1)/4 = 0.75
Chances of surviving for 3 years = 0.75 × 0.656 = 0.492
SURVIVAL CURVE
[Figure: survival curve plotting relapse-free survival against year of study]
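The year-by-year arithmetic above can be written as a small life-table sketch in plain Python (the tuples encode the slide's numbers: people at risk at the start of each year, withdrawals/censored, relapses):

    years = [(10, 1, 1), (8, 1, 2), (5, 2, 1)]

    surviving = 1.0
    for at_start, withdrew, relapsed in years:
        avg_at_risk = at_start - withdrew / 2        # eg (10 + 9) / 2 = 9.5
        p_survive = (avg_at_risk - relapsed) / avg_at_risk
        surviving *= p_survive                       # cumulative survival
        print(round(p_survive, 3), round(surviving, 3))
    # prints 0.895/0.895, 0.733/0.656, 0.75/0.492, matching the slides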
KAPLAN-MEIER SURVIVAL ANALYSIS
Where outcome is measured at regular predefined time intervals (eg every 12 months), this is termed an actuarial survival analysis.
The Kaplan-Meier method follows the same principles, but the intervals of measurement are between successive outcome events, ie the intervals are usually irregular.
COX'S PROPORTIONAL HAZARDS METHOD
You do not need to know the details of this, but should be aware of its application.
This method essentially uses a form of regression (compare the ANOVA and regression methods described earlier) to correct survival data for baseline differences between subjects (for example, if mortality is the outcome being assessed, one might wish to correct for the age of the patient at the start of the study).