Basic statistics: a survival guide
Tom Sensky
HOW TO USE THIS POWERPOINT PRESENTATION
The presentation covers the basic statistics you need to have some understanding of.
After the introductory slides, you'll find two slides listing topics.
When you view the presentation in Slide Show mode, clicking on any topic in these lists takes you to slides covering that topic.
Clicking on the return symbol (in the top right corner of each slide, still in Slide Show mode) takes you back to the list of topics.
HOW TO USE THIS POWERPOINT PRESENTATION
You can either go through the slide
show sequentially from the start
(some topics follow on from those
before) or review specific topics when
you encounter them in your reading.
A number of the examples in the
presentation are taken from PDQ
Statistics, which is one of three basic
books I would recommend (see next
page).
RECOMMENDED RESOURCES
The books below explain statistics simply,
without excessive mathematical or logical
language, and are available as inexpensive
paperbacks.
Geoffrey Norman and David Streiner. PDQ Statistics. 3rd Edition. BC Decker, 2003
David Bowers, Allan House, David Owens.
Understanding Clinical Papers (2nd Edition).
Wiley, 2006
Douglas Altman et al. Statistics with
Confidence. 2nd Edition. BMJ Books, 2000
PDQ stands for Pretty Darn Quick, a series of publications.
AIM OF THIS PRESENTATION
The main aim has been to present the information in such a way as to allow you to understand the statistics involved, rather than having to rely on rote learning.
Thus formulae have been kept to a minimum; they are included where they help to explain the statistical test, and (very occasionally) for convenience.
You may have to go through parts of the presentation several times in order to understand some of the points.
BASIC STATISTICS
Types of data
Normal distribution
Describing data
Boxplots
Standard deviations
Skewed distributions
Parametric vs non-parametric
Statistical errors
Sample size
Power calculations
Clinical vs statistical significance
Two-sample t test
Problem of multiple tests
Subgroup analyses
Paired t test
Chi-square test
ANOVA
Repeated measures ANOVA
Non-parametric tests
Mann-Whitney U test
Summary of common tests
Summaries of proportions
Odds and Odds Ratio
Absolute and Relative Risks
Number Needed to Treat (NNT)
Confidence intervals (CIs)
CI (difference between two proportions)
Correlation
Regression
Logistic regression
Mortality statistics
Survival analysis
TYPES OF DATA
VARIABLES are either QUANTITATIVE or QUALITATIVE.
QUANTITATIVE: ratio (eg pulse rate, height) or interval (eg temperature, 36-38°C).
QUALITATIVE: ordinal (eg social class) or nominal (eg gender, ethnicity).
NORMAL DISTRIBUTION
[Figure: bell curve] Cases are distributed symmetrically about the mean. The extent of the spread of data around the mean is measured by the standard deviation. The shaded area marks the region beyond two standard deviations above the mean.
DESCRIBING DATA
MEAN: average (arithmetic mean) of the data.
MEDIAN: the value which comes half way when the data are ranked in order.
MODE: most common value observed.
In a normal distribution, mean and median are the same.
If median and mean differ, this indicates that the data are not normally distributed.
The mode is of little if any practical use.
BOXPLOT (BOX AND WHISKER PLOT)
[Figure: boxplots of pain (VAS) for females (N=74) and males (N=27). The box spans the 25th to 75th centiles (the interquartile range), the heavy line marks the median (50th centile), and the whiskers extend to the 2.5th and 97.5th centiles.]
STANDARD DEVIATION
Measure of the spread of values of a sample around the mean.
SD = √[ Σ(value - mean)² / number of values ]
The square of the SD is known as the variance.
SD decreases as a function of: smaller spread of values about the mean; larger number of values.
In a normal distribution, 95% of the values will lie within 2 SDs of the mean.
STANDARD DEVIATION AND SAMPLE SIZE
[Figure: sampling distributions of the mean for n=10, n=50 and n=150] As the sample size increases, the standard error of the mean decreases, so the mean is estimated more precisely (the SD itself simply becomes a more stable estimate of the population SD).
SKEWED DISTRIBUTION
[Figure: skewed distribution with the mean and median marked] 50% of values will lie on either side of the median; the mean is pulled away from the median, toward the long tail.
DOES A VARIABLE FOLLOW A
NORMAL DISTRIBUTION?
Important because parametric statistics
assume normal distributions
Statistics packages can test normality
Distribution unlikely to be normal if:
Mean is very different from the median
Two SDs below the mean give an
impossible answer (eg height <0 cm)
DISTRIBUTIONS: EXAMPLES
Normal distribution: height, weight, haemoglobin.
Skewed distribution: bankers' bonuses, number of marriages.
DISTRIBUTIONS AND
STATISTICAL TESTS
Many common statistical tests rely on the
variables being tested having a normal distribution
These are known as parametric tests
Where parametric tests cannot be used, other,
non-parametric tests are applied which do not
require normally distributed variables
Sometimes, a skewed distribution can be made
sufficiently normal to apply parametric statistics
by transforming the variable (by taking its square
root, squaring it, taking its log, etc)
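As a rough illustration (not part of the original slides), here is a minimal Python sketch, assuming the numpy and scipy libraries and invented right-skewed data, showing a normality check before and after a log transformation:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Hypothetical right-skewed variable (eg length of hospital stay)
    skewed = rng.lognormal(mean=1.0, sigma=0.8, size=200)

    # Shapiro-Wilk test: a small p-value suggests the data are not normal
    print("raw:", stats.shapiro(skewed).pvalue)

    # Log-transform and re-test; the transformed variable is often
    # close enough to normal for parametric tests to be applied
    print("log:", stats.shapiro(np.log(skewed)).pvalue)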
EXAMPLE: IQ
Say that you have tested a sample of people on a validated IQ test.
The IQ test has been carefully standardized on a large sample to have a mean of 100 and an SD of 15.
SD = √[ Σ(individual value - mean value)² / number of values ]
[Figure: normal curve with mean 100 and markers at 94, 97, 100, 103 and 106]
EXAMPLE: IQ
Say you now administer the test to repeated samples of 25 people.
The expected random variation of these sample means equals the Standard Error:
SE = SD / √(sample size) = 15 / √25 = 3.0
[Figure: distribution of sample means, centred on 100]
STANDARD DEVIATION vs STANDARD ERROR
Standard Deviation is a measure
of variability of scores in a
particular sample
Standard Error of the Mean is an
estimate of the variability of
estimated population means taken
from repeated samples of that
population (in other words, it
gives an estimate of the precision
of the sample mean)
See Douglas G. Altman and J. Martin Bland. Standard
deviations and standard errors. BMJ 331 (7521):903,
2005.
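The distinction can be made concrete with a small simulation. This is a sketch (numpy assumed; the population values are invented to mimic the IQ example): the SD of scores stays near 15 whatever the sample size, while the spread of sample means (the SE) shrinks as 1/√n:

    import numpy as np

    rng = np.random.default_rng(1)
    population = rng.normal(100, 15, 1_000_000)  # IQ-like population

    for n in (10, 50, 150):
        samples = rng.choice(population, size=(2000, n))
        sds = samples.std(axis=1, ddof=1)    # spread of scores within each sample
        means = samples.mean(axis=1)         # one mean per sample
        # average SD stays ~15; the SD of the means approximates SE = 15/sqrt(n)
        print(n, round(sds.mean(), 1), round(means.std(ddof=1), 2),
              round(15 / np.sqrt(n), 2))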
EXAMPLE: IQ
One sample of 25 people yields a mean IQ score of 107.5.
[Figure: population distribution with mean 100; sample mean of 107.5 marked]
What are the chances of obtaining a mean IQ of 107.5 or more in a sample of 25 people from the same population as that on which the test was standardized?
EXAMPLE: IQ
How far out the sample mean lies in the population distribution is calculated as the area under the curve to the right of the sample mean:
(sample mean - population mean) / standard error = (107.5 - 100) / 3.0 = 2.5
This ratio tells us how far out on the standard distribution we are: the higher the number, the further we are from the population mean.
EXAMPLE: IQ
Look up this figure (2.5) in a table of values of the normal distribution.
From the table, the area in the tail to the right of our sample mean is 0.006 (approximately 1 in 160).
This means that there is a 1 in 160 chance that our sample mean came from the same population as the IQ test was standardized on.
[Figure: normal curve with the tail beyond the sample mean shaded]
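The table lookup can be reproduced directly; a minimal sketch, assuming scipy:

    from scipy.stats import norm

    z = (107.5 - 100) / 3.0      # 2.5 standard errors above the population mean
    p = norm.sf(z)               # area in the upper tail
    print(p, 1 / p)              # ~0.0062, ie roughly 1 in 160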
EXAMPLE: IQ
This is commonly referred to as p=0.006.
By convention, we accept as significantly different a sample mean which has a 1 in 20 chance (or less) of coming from the population in which the test was standardized (commonly referred to as p=0.05).
Thus our sample had a significantly greater IQ than the reference population (p<0.05).
EXAMPLE: IQ
If we move the sample mean (green) closer to the population mean (red), the area of the distribution to the right of the sample mean increases.
Even by inspection, the sample is more likely than our previous one to come from the original population.
COMPARING TWO SAMPLES
[Figure: two distributions, Sample A and Sample B, each with its mean marked]
In this case, there is very little overlap between the two distributions, so they are likely to be different.
COMPARING TWO SAMPLES
Returning to the IQ example, let's say that we know that the sample we tested (mean IQ=107.5) actually came from a population with a mean IQ of 110.
[Figure: two population curves, means 100 and 110, with the sample mean of 107.5 between them]
SAMPLES AND POPULATIONS
Repeatedly measuring small samples from the same population will give a normal distribution of means.
The spread of these small-sample means about the population mean is given by the Standard Error:
SE = SD / √(sample size)
COMPARING TWO SAMPLES
We start by assuming that our sample came from the original population.
Our null hypothesis (to be tested) is that the sample mean of IQ=107.5 is not significantly different from the population mean of IQ=100.
COMPARING TWO SAMPLES
The area under the standard population curve to the right of our sample IQ of 107.5 represents the likelihood of observing this sample mean of 107.5 by chance under the null hypothesis, ie that the sample is from the standard population.
This is known as the α level and is normally set at 0.05.
If the sample comes from the standard population, we expect to find a mean of 107.5 in 1 out of 20 estimates.
[Figure: curves for the two populations (means 100 and 110); the area to the right of 107.5 under the standard curve is shaded]
COMPARING TWO SAMPLES
It is perhaps easier to conceptualise by seeing what happens if we move the sample mean.
If the sample mean is closer to the red population mean (100), the area under the curve to the right of the sample mean (α) is bigger.
The larger α, the greater the chance that the sample comes from the red population.
COMPARING TWO SAMPLES
The α level represents the probability of finding a significant difference between the two means when none exists.
This is known as a Type I error.
COMPARING TWO SAMPLES
The area under the other population curve (blue) to the left of our sample IQ of 107.5 represents the likelihood of observing this sample mean of 107.5 by chance under the alternative hypothesis (that the sample is from the other population).
This is known as the β level and is normally set at 0.20.
COMPARING TWO SAMPLES
The β level represents the probability of not finding a significant difference between the two means when one exists.
This is known as a Type II error (usually due to inadequate sample size).
COMPARING TWO SAMPLES
Note that if the sample sizes are reduced, the standard error increases, and so does β (hence also the probability of failing to find a significant difference between the two means).
This increases the likelihood of a Type II error; inadequate sample size is the most common cause of Type II errors.
STATISTICAL ERRORS: SUMMARY
Type I (α): false positive. Find a significant difference even though one does not exist. Usually set at 0.05 (5%) or 0.01 (1%).
Type II (β): false negative. Fail to find a significant difference even though one exists. Usually set at 0.20 (20%).
Power = 1 - β (ie usually 80%).
Remember that power is related to sample size: a larger sample has a smaller SE, thus there is less overlap between the curves.
SAMPLE SIZE: POWER CALCULATIONS
Using the standard α=0.05 and β=0.20, and having estimates for the standard deviation and the difference in sample means, the smallest sample size needed to avoid a Type II error can be calculated with a formula.
POWER CALCULATIONS
Intended to estimate sample size
required to prevent Type II errors
For simplest study designs, can apply a
standard formula
Essential requirements:
A research hypothesis
A measure (or estimate) of variability
for the outcome measure
The difference (between intervention
and control groups) that would be
considered clinically important
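For the simplest design (two independent groups, comparison of means), the usual normal-approximation formula can be sketched in Python. This is an illustration under stated assumptions (scipy assumed; the SD and difference are invented), not a formula taken from these slides:

    from scipy.stats import norm

    def n_per_group(sd, difference, alpha=0.05, beta=0.20):
        """Approximate sample size per group for a two-sample comparison of means."""
        z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05, two-tailed
        z_beta = norm.ppf(1 - beta)         # 0.84 for beta = 0.20 (80% power)
        return 2 * ((z_alpha + z_beta) * sd / difference) ** 2

    # eg SD = 15, clinically important difference = 7.5 (half an SD)
    print(n_per_group(15, 7.5))   # ~63 per group, before rounding up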
STATISTICAL SIGNIFICANCE IS NOT NECESSARILY CLINICAL SIGNIFICANCE
Sample Size | Population Mean | Sample Mean | p
4 | 100.0 | 110.0 | 0.05
25 | 100.0 | 104.0 | 0.05
64 | 100.0 | 102.5 | 0.05
400 | 100.0 | 101.0 | 0.05
2,500 | 100.0 | 100.4 | 0.05
10,000 | 100.0 | 100.2 | 0.05
The larger the sample, the smaller the difference from the population mean that reaches p=0.05.
CLINICALLY SIGNIFICANT IMPROVEMENT
Large proportion of patients improving: Hugdahl & Ost (1981)
A change which is large in magnitude: Barlow (1981)
An improvement in patients' everyday functioning: Kazdin & Wilson (1978)
Reduction in symptoms by 50% or more: Jansson & Ost (1982)
Elimination of the presenting problem: Kazdin & Wilson (1978)
MEASURES OF CLINICALLY SIGNIFICANT IMPROVEMENT
[Figure: distribution of the dysfunctional (abnormal) sample]
First possible cut-off: outside the range of the dysfunctional population, ie beyond two standard deviations above its mean.
MEASURES OF CLINICALLY SIGNIFICANT IMPROVEMENT
[Figure: overlapping distributions of the dysfunctional (abnormal) and functional (normal) samples, with cut-offs a, b and c marked]
Second possible cut-off: within the range of the normal population.
Third possible cut-off: more within the normal than the abnormal range.
UNPAIRED OR INDEPENDENT-SAMPLE t-TEST: PRINCIPLE
[Figure: two pairs of distributions] Where the two distributions are widely separated, their means are clearly different. Where the distributions overlap, it is unclear whether the samples come from the same population.
t = (difference between means) / (SE of the difference)
In essence, the t-test gives a measure of the difference between the sample means in relation to the overall spread of the data.
UNPAIRED OR INDEPENDENT-SAMPLE t-TEST: PRINCIPLE
t = (difference between means) / (SE of the difference), where SE = SD / √(sample size)
With smaller sample sizes, SE increases, as does the overlap between the two curves, so the value of t decreases.
THE PREVIOUS IQ EXAMPLE
In the previous IQ example, we were assessing whether a particular sample was likely to have come from a particular population.
If we had two samples (rather than a sample plus a population), we would compare these two samples using an independent-sample t-test.
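A minimal sketch of how that comparison might be run, assuming scipy and invented data (the group means and SDs echo the IQ example but are not from the slides):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    sample_a = rng.normal(100, 15, 25)   # eg IQ scores from one setting
    sample_b = rng.normal(110, 15, 25)   # eg IQ scores from another

    t, p = stats.ttest_ind(sample_a, sample_b)   # unpaired / independent-sample t-test
    print(t, p)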
MULTIPLE TESTS AND TYPE I ERRORS
The risk of observing by chance a difference between two means (even if there isn't one) is α. This risk is termed a Type I error.
By convention, α is set at 0.05. For an individual test, this becomes the familiar p<0.05 (the probability of finding this difference by chance is <0.05, or less than 1 in 20).
However, as the number of tests rises, the actual probability of finding a difference by chance rises markedly:
Tests (N) | Probability of a chance difference
1 | 0.05
2 | 0.098
3 | 0.143
4 | 0.185
5 | 0.226
6 | 0.264
10 | 0.401
20 | 0.641
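The figures in the table follow directly from 1 - (1 - α)^N, the probability that at least one of N independent tests is "significant" by chance; this plain-Python sketch reproduces them (to rounding):

    alpha = 0.05
    for n in (1, 2, 3, 4, 5, 6, 10, 20):
        # chance of at least one spurious "significant" result in n tests
        print(n, round(1 - (1 - alpha) ** n, 3))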
SUBGROUP ANALYSIS
Papers sometimes report analyses of
subgroups of their total dataset
Criteria for subgroup analysis:
Must have large sample
Must have a priori hypothesis
Must adjust for baseline differences
between subgroups
Must retest analyses in an
independent sample
TORTURED DATA - SIGNS
Did the reported findings result from testing a
primary hypothesis of the study? If not, was
the secondary hypothesis generated before
the data were analyzed?
What was the rationale for excluding various
subjects from the analysis?
Were the following determined before looking
at the data: definition of exposure, definition
of an outcome, subgroups to be analyzed, and
cutoff points for a positive result?
Mills JL. Data torturing. NEJM 329:1196-1199,
1993.
TORTURED DATA - SIGNS
How many statistical tests were
performed, and was the effect of multiple
comparisons dealt with appropriately?
Are both P values and confidence
intervals reported?
And have the data been reported for all
subgroups and at all follow-up points?
Mills JL. Data torturing. NEJM 329:1196-1199,
1993.
COMPARING TWO MEANS FROM THE SAME SAMPLE: THE PAIRED t TEST
Assume that A and B represent measures on the same subject (eg at two time points):
Subject | A | B
1 | 10 | 11
2 | 60 | 65
3 | 27 | 31
Note that the variation between subjects is much wider than that within subjects, ie the variance in the columns swamps the variance in the rows.
Treating A and B as entirely separate samples, t=-0.17, p=0.89.
Treating the values as paired, t=3.81, p=0.03.
SUMMARY THUS FAR
TWO-SAMPLE (INDEPENDENT-SAMPLE) t-TEST: used to compare means of two independent samples.
PAIRED (MATCHED-PAIR) t-TEST: used to compare two (repeated) measures from the same subjects.
COMPARING PROPORTIONS: THE CHI-SQUARE TEST
Say that we are interested to know whether two interventions, A and B, lead to the same percentages of patients being discharged after one week:
| A | B
Number of patients | 100 | 50
Actual % discharged | 15 | 30
Actual number discharged | 15 | 15
COMPARING PROPORTIONS: THE CHI-SQUARE TEST
We can calculate the number of patients in each group expected to be discharged if there were no difference between the groups.
A total of 30 patients were discharged out of 150, ie 20%.
If there were no difference between the groups, 20% of patients should have been discharged from each group (ie 20 from A and 10 from B). These are the expected numbers:
| A | B
Number of patients | 100 | 50
Actual % discharged | 15 | 30
Actual number discharged | 15 | 15
Expected number discharged | 20 | 10
COMPARING PROPORTIONS: THE CHI-SQUARE TEST
χ² = Σ (observed - expected)² / expected
= (15 - 20)²/20 + (15 - 10)²/10 = 25/20 + 25/10 = 1.25 + 2.5 = 3.75
According to tables, the minimum value of chi square for p=0.05 (with 1 degree of freedom) is 3.84.
Therefore, there is no significant difference between our treatments.
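The slide's arithmetic, checked in Python (scipy is assumed, and used only for the critical value; the counts are those in the table above):

    from scipy.stats import chi2

    observed = {"A": 15, "B": 15}
    expected = {"A": 20, "B": 10}

    chi_sq = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
    print(chi_sq)                    # 1.25 + 2.5 = 3.75
    print(chi2.ppf(0.95, df=1))      # 3.84: the minimum value for p = 0.05 with 1 df

In practice one would build the full 2x2 table (discharged / not discharged by group) and use scipy.stats.chi2_contingency, which performs the same calculation over all cells.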
COMPARISONS BETWEEN THREE OR MORE SAMPLES
Cannot use the t-test (only for 2 samples); use analysis of variance (ANOVA).
Essentially, ANOVA involves dividing the variance in the results into between-groups variance and within-groups variance.
F = (measure of between-groups variance) / (measure of within-groups variance)
The greater F, the more significant the result (values of F are in standard tables).
ANOVA - AN EXAMPLE
[Figure: three well-separated group distributions] Here, the between-group variance is large relative to the within-group variance, so F will be large.
ANOVA - AN EXAMPLE
[Figure: three overlapping group distributions] Here, the within-group variance is larger, and the between-group variance smaller, so F will be smaller (reflecting the likelihood of no significant differences between these three sample means).
ANOVA - AN EXAMPLE
Data from SPSS sample data file dvdplayer.sav: a focus group where 68 participants were asked to rate DVD players.
The table shows scores for Total DVD assessment by age group; results come from running One-Way ANOVA (found under Compare Means).
Age group | N | Mean | SD
18-24 | 13 | 31.9 | 5.0
25-31 | 12 | 31.1 | 5.7
32-38 | 10 | 35.8 | 5.3
39-45 | 10 | 38.0 | 6.6
46-52 | 12 | 29.3 | 6.0
53-59 | 11 | 28.5 | 5.3
Total | 68 | 32.2 | 6.4
ANOVA - SPSS PRINT-OUT
Data from the SPSS print-out are shown below:
| Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 733.27 | 5 | 146.65 | 4.60 | 0.0012
Within Groups | 1976.42 | 62 | 31.88 | |
Total | 2709.69 | 67 | | |
Between Groups Sum of Squares concerns
the variance (or variability) between the
groups
Within Groups Sum of Squares concerns
the variance within the groups
ANOVA - MAKING SENSE OF THE SPSS PRINT-OUT
| Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 733.27 | 5 | 146.65 | 4.60 | 0.0012
Within Groups | 1976.42 | 62 | 31.88 | |
Total | 2709.69 | 67 | | |
The degrees of freedom (df) represent the number of
independent data points required to define each value
calculated.
If we know the overall mean, once we know the ratings of 67
respondents, we can work out the rating given by the 68th
(hence Total df = N-1 = 67).
Similarly, if we know the overall mean plus means of 5 of the
6 groups, we can calculate the mean of the 6th group (hence
Between Groups df = 5).
ANOVA - MAKING SENSE OF THE SPSS PRINT-OUT
| Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 733.27 | 5 | 146.65 | 4.60 | 0.0012
Within Groups | 1976.42 | 62 | 31.88 | |
Total | 2709.69 | 67 | | |
This would be reported as follows:
Mean scores of total DVD assessment varied significantly
between age groups (F(5,62)=4.60, p=0.0012)
Have to include the Between Groups and Within Groups
degrees of freedom because these determine the
significance of F
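The reported F and its significance can be recovered from the printout values alone; a minimal sketch, assuming scipy:

    from scipy.stats import f

    F = 146.65 / 31.88               # Mean Square between / Mean Square within
    p = f.sf(F, dfn=5, dfd=62)       # upper tail of the F(5, 62) distribution
    print(round(F, 2), round(p, 4))  # 4.60, ~0.0012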
SAMPLING SUBJECTS THREE OR MORE TIMES
Analogous to the paired t-test.
Usually interested in within-subject changes (eg measuring some biochemical parameter before treatment, after treatment and at follow-up).
ANOVA must be modified to take account of the same subjects being tested (ie the repeated measures are not independent).
Use repeated measures ANOVA.
NON-PARAMETRIC TESTS
If the variables being tested do
not follow a normal distribution,
cannot use standard t-test or
ANOVA
In essence, all the data points are
ranked, and the tests determine
whether the ranks within the
separate groups are the same, or
significantly different
MANN-WHITNEY U TEST
Say you have two groups, A and B, with ordinal data.
Pool all the data from A and B, then rank each score, and indicate which group each score comes from.
If scores in A were more highly ranked than those in B, all the A scores would be on the left, and B scores on the right.
If there were no difference between A and B, their respective scores would be evenly spread by rank.
Rank:  1 2 3 4 5 6 7 8 9 10 11 12
Group: A A A B A B A B A B  B  B
MANN-WHITNEY U TEST
Generate a total score (U) representing the number of times an A score precedes each B:
Rank:  1 2 3 4 5 6 7 8 9 10 11 12
Group: A A A B A B A B A B  B  B
The first B is preceded by 3 As, the second B by 4 As, and so on.
U = 3+4+5+6+6+6 = 30
Look up the significance of U from tables (generated automatically by SPSS).
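The same example in Python, assuming scipy. One caveat: recent scipy versions report U for the first group passed in; with 6 scores per group U_A + U_B = 6 x 6 = 36, so the slide's U = 30 (counted for B) corresponds to U = 6 for A:

    from scipy.stats import mannwhitneyu

    # The ranked data from the slide, by group
    a_ranks = [1, 2, 3, 5, 7, 9]
    b_ranks = [4, 6, 8, 10, 11, 12]

    res = mannwhitneyu(a_ranks, b_ranks)
    print(res.statistic, res.pvalue)   # U = 6 for A, ie 30 for B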
SUMMARY OF BASIC STATISTICAL TESTS
| 2 groups | >2 groups
Continuous variables | Independent t-test | ANOVA
Continuous variables, same sample | Matched-pairs t-test | Repeated measures ANOVA
Categorical variables | Chi-square test | (Chi-square test)
Ordinal variables (not normally distributed) | Mann-Whitney U test, Median test | Kruskal-Wallis ANOVA
KAPPA
(Non-parametric) measure of agreement.
Ratings at Time 1 (or by Observer 1) are tabulated against ratings at Time 2 (or by Observer 2):
                       TIME 1 (OR OBSERVER 1)
                       Positive | Negative | Total
TIME 2      Positive | A        | B        | A+B
(OR         Negative | C        | D        | C+D
OBSERVER 2)    Total | A+C      | B+D      | N
Agreements fall in cells A and D, disagreements in cells B and C.
Simple agreement: (A+D)/N
The above does not take account of agreement by chance; kappa takes account of chance agreement.
KAPPA - INTERPRETATION
Kappa | Agreement
<0.20 | Poor
0.21-0.40 | Slight
0.41-0.60 | Moderate
0.61-0.80 | Good
0.81-1.00 | Very good
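A sketch of the kappa calculation from the 2x2 table's cell counts (plain Python; the counts in the example call are invented):

    def cohens_kappa(a, b, c, d):
        """a..d are the cell counts of the 2x2 agreement table above."""
        n = a + b + c + d
        p_observed = (a + d) / n                                  # simple agreement
        p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2 # agreement expected by chance
        return (p_observed - p_chance) / (1 - p_chance)

    print(cohens_kappa(40, 10, 5, 45))   # 0.70, ie "Good" on the scale above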
DESCRIPTIVE STATISTICS INVOLVING PROPORTIONS
The data below are from a sample of people with early rheumatoid arthritis randomised to have either usual treatment alone or usual treatment plus cognitive therapy.
The table gives the number of patients in each group who showed >25% worsening in disability at 18-month follow-up:
| CBT | Usual Care (TAU)
Cases | 23 | 21
Deterioration | 3 (13%) | 11 (52%)
No deterioration | 20 (87%) | 10 (48%)
RATES, ODDS, AND ODDS RATIOS
| CBT | Usual Care (TAU)
Deterioration | 3 (13%) | 11 (52%)
No deterioration | 20 (87%) | 10 (48%)
Rate of deterioration (CBT) = 3/23 = 13%
Odds of deterioration (CBT) = 3/20 = 0.15
Rate of deterioration (TAU) = 11/21 = 52%
Odds of deterioration (TAU) = 11/10 = 1.1
One measure of the difference between the two groups is the extent to which the odds of deterioration differ between the groups.
This is the ODDS RATIO, and the test applied is whether this is different from 1.0.
ABSOLUTE AND RELATIVE RISKS
| CBT | Usual Care (TAU)
Deterioration | 3 (13%) | 11 (52%)
No deterioration | 20 (87%) | 10 (48%)
Absolute Risk Reduction (ARR) = deterioration rate (TAU) - deterioration rate (CBT) = 52% - 13% = 39%, or 0.39
Relative Risk Reduction (RRR) = [deterioration rate (TAU) - deterioration rate (CBT)] / deterioration rate (TAU) = (52 - 13)/52 = 75%, or 0.75
Note that this could also be expressed as a Benefit Increase rather than a Risk Reduction; the answer is the same.
NUMBER NEEDED TO TREAT
Absolute Risk Reduction (ARR) = 0.39
Number Needed to Treat (NNT) = 1/ARR = 1/0.39 = 2.56 (~3)
NNT is the number of patients who need to be treated with CBT, compared with treatment as usual, to prevent one patient deteriorating.
In this case, 3 patients have to be treated to prevent one patient deteriorating.
NNT is a very useful summary measure, but is commonly not given explicitly in published papers.
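The slide's arithmetic as a short Python sketch (the counts come from the CBT/TAU table; only rounding is added):

    rate_tau = 11 / 21     # deterioration rate, usual care (~52%)
    rate_cbt = 3 / 23      # deterioration rate, CBT (~13%)

    arr = rate_tau - rate_cbt     # absolute risk reduction, ~0.39
    rrr = arr / rate_tau          # relative risk reduction, ~0.75
    nnt = 1 / arr                 # number needed to treat, ~2.5 -> round up to 3
    print(round(arr, 2), round(rrr, 2), round(nnt, 1))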
ANOTHER APPROACH: CONFIDENCE INTERVALS
If a population is sampled 100 times, the means of the samples will lie within a normal distribution.
95 of these 100 sample means will lie between the shaded areas at the edges of the curve; this represents the 95% confidence interval (95% CI).
The 95% CI can be viewed as the range within which one can be 95% confident that the true value (of the mean, in this case) lies.
ANOTHER APPROACH: CONFIDENCE INTERVALS
95% CI = sample mean ± 1.96 × SE
Returning to the IQ example, mean = 107.5 and SE = 3.0:
95% CI = 107.5 ± 1.96 × 3.0 = 107.5 ± 5.88
Thus we can be 95% confident that the true mean lies between 101.6 and 113.4.
CONFIDENCE INTERVAL (CI)
Gives a measure of the precision (or
uncertainty) of the results from a particular
sample
The X% CI gives the range of values which we
can be X% confident includes the true value
CIs are useful because they quantify the size
of effects or differences
Probabilities (p values) only measure strength
of evidence against the null hypothesis
CONFIDENCE INTERVALS
There are simple formulae to calculate confidence intervals for proportions as well as means.
Statisticians (and journal editors!)
prefer CIs to p values because all
p values do is test significance,
while CIs give a better indication
of the spread or uncertainty of
any result
CONFIDENCE INTERVALS FOR THE DIFFERENCE BETWEEN TWO PROPORTIONS
| CBT | Usual Care (TAU)
Cases | 23 | 21
Deterioration | 3 (13%) | 11 (52%)
No deterioration | 20 (87%) | 10 (48%)
95% CI = Risk Reduction ± 1.96 × se, where se = standard error:
se(ARR) = √[ p1(1-p1)/n1 + p2(1-p2)/n2 ] = √[ 0.13(1-0.13)/23 + 0.52(1-0.52)/21 ]
NB This formula is given for convenience. You are not required to commit any of these formulae to memory; they can be obtained from numerous sources.
CONFIDENCE INTERVAL OF ABSOLUTE RISK REDUCTION
ARR = 0.39, se = 0.13
95% CI of ARR = ARR ± 1.96 × se = 0.39 ± 1.96 × 0.13 = 0.39 ± 0.25 = 0.14 to 0.64
The calculated value of ARR is 39%, and the 95% CI indicates that the true ARR could be as low as 14% or as high as 64%.
Key point: the result is statistically significant because the 95% CI does not include zero.
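The standard error and CI from the preceding two slides, computed directly (plain Python; numbers from the CBT/TAU table):

    import math

    p1, n1 = 3 / 23, 23      # CBT deterioration rate and group size
    p2, n2 = 11 / 21, 21     # TAU deterioration rate and group size

    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # ~0.13
    arr = p2 - p1                                             # ~0.39
    print(arr - 1.96 * se, arr + 1.96 * se)                   # ~0.14 to ~0.64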
INTERPRETATION OF CONFIDENCE INTERVALS
Remember that the mean estimated from a sample is only an estimate of the population mean.
The actual mean can lie anywhere within the 95% confidence interval estimated from your data.
For an Odds Ratio, if the 95% CI passes through 1.0, the Odds Ratio is unlikely to be statistically significant.
For an Absolute Risk Reduction or Absolute Benefit Increase, the result is unlikely to be significant if its 95% CI passes through zero.
CORRELATION
[Figure: scatterplot of HADS depression score against SIS, rheumatoid arthritis sample (N=24)]
Here, there are two variables (HADS depression score and SIS) plotted against each other.
The question is: do HADS scores correlate with SIS ratings?
CORRELATION
[Figure: the same scatterplot with a fitted line, r²=0.34; deviations of individual points from the line are marked x1 to x4]
In correlation, the aim is to draw a line through the data such that the deviations of the points from the line (xn) are minimised.
Because deviations can be negative or positive, each is first squared, then the squared deviations are added together, and the square root taken.
CORRELATION
[Figures: HADS depression score against SIS for two samples. Rheumatoid arthritis (N=24): r²=0.34. Coronary artery bypass (N=87): r²=0.06]
CORRELATION
Can express correlation as an equation:
y = A + Bx
If B=0, there is no correlation.
Thus we can test statistically whether B is significantly different from zero.
REGRESSION
Can extend correlation methods (see previous slides) to model a dependent variable on more than one independent variable:
y = A + B1x1 + B2x2 + B3x3
Again, the main statistical test is whether B1, B2, etc, are different from zero.
This method is known as multiple regression.
INTERPRETATION OF REGRESSION
DATA I
Regression models fit a general equation:
y=A + Bpxp + Bqxq + Brxr
y is the dependent variable, being predicted by
the equation
xp, xq and xr are the independent (or predictor)
variables
The basic statistical test is whether Bp, Bq and Br
(called the regression coefficients) differ from
zero
This result is either shown as a p value (p<0.05)
or as a 95% confidence interval (which does not
pass through zero)
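For the simple one-predictor case, the fit and the test of whether B differs from zero can be sketched in a few lines of Python (scipy assumed; the data are invented, loosely echoing the HADS/SIS example):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x = rng.uniform(10, 30, 24)                # eg SIS ratings
    y = 2 + 0.3 * x + rng.normal(0, 2, 24)     # eg HADS scores, with noise

    fit = stats.linregress(x, y)
    # fit.slope is B, fit.intercept is A; fit.pvalue tests whether B = 0
    print(fit.slope, fit.intercept, fit.pvalue, fit.rvalue ** 2)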
INTERPRETATION OF REGRESSION DATA
II
Note that B can be positive (where x is
positively correlated with y) or negative
(where as x increases, y decreases)
The actual value of B depends on the scale of
x if x is a variable measured on a 0-100
scale, B is likely to be greater than if x is
measured on a 0-5 scale
For this reason, to better compare the
coefficients, they are usually converted to
standardised form (then called beta
coefficients), which assumes that all the
independent variables have the same scaling
INTERPRETATION OF REGRESSION DATA
III
In regression models, values of the beta coefficients
are reported, along with their significance or
confidence intervals
In addition, results report the extent to which a
particular regression model correctly predicts the
dependent variable
This is usually reported as R2, which ranges from 0
(no predictive power) to 1.0 (perfect prediction)
Converted to a percentage, R2 represents the extent
to which the variance in the dependent variable is
predicted by the model eg R2 = 0.40 means that the
model predicts 40% of the variance in the dependent
variable (in medicine, models are seldom
comprehensive, so R2 = 0.40 is usually a very good
result!)
INTERPRETATION OF REGRESSION DATA IV: EXAMPLE
Subjects were outpatients (N=89) with RA attending a rheumatology outpatient clinic; the dependent variable was a measure of Suffering.
Büchi S et al. J Rheumatol 1998;25:869-75
LOGISTIC REGRESSION
In linear regression (see preceding
slides), values of a dependent variable
are modelled (predicted) by combinations
of independent variables
This requires the dependent variable to
be a continuous variable with a normal
distribution
If the dependent variable has only two
values (eg alive or dead), linear
regression is inappropriate, and logistic
regression is used
LOGISTIC REGRESSION II
The statistics of logistic regression are complex and
difficult to express in graphical or visual form (the
dichotomous dependent variable has to be converted
to a function with a normal distribution)
However, like linear regression, logistic regression
can be reported in terms of beta coefficients for the
predictor variables, along with their associated
statistics
Contributions of dichotomous predictor variables are sometimes reported as odds ratios (for example, if presence or absence of depression is the dependent variable, the effect of gender can be reported as an odds ratio); if 95% confidence intervals of these odds ratios are reported, the test is whether these include 1.0 (see odds ratios).
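A hedged sketch of a logistic regression in Python, assuming the statsmodels library (the slides mention only SPSS) and entirely invented data; it shows coefficients re-expressed as odds ratios:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 200
    age = rng.normal(50, 10, n)
    female = rng.integers(0, 2, n)                 # dichotomous predictor
    logit = -5 + 0.08 * age + 0.5 * female         # invented "true" model
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # eg depressed yes/no

    X = sm.add_constant(np.column_stack([age, female]))
    result = sm.Logit(y, X).fit(disp=0)
    print(np.exp(result.params))    # coefficients exponentiated into odds ratios
    print(result.conf_int())        # 95% CIs (on the log-odds scale)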
CRONBACH'S ALPHA
You will come across this as an indication of how rating scales perform.
It is essentially a measure of the extent to which a scale measures a single underlying variable.
Alpha goes up if:
there are more items in the scale;
each item shows good correlation with the total score.
Values of alpha range from 0 to 1; values of 0.8+ are satisfactory.
MORTALITY
Mortality Rate = number of deaths / total population
Proportional Mortality Rate = number of deaths (particular cause) / total deaths
Age-specific Mortality Rate = number of deaths (given cause and specified age range) / total deaths (same age range)
Standardized Mortality Rate = number of deaths from a particular cause, corrected for the age distribution (and possibly other factors) of the population.
SURVIVAL ANALYSIS
[Figure: ten cases followed over the years of the study; X = relapsed, W = withdrew]
Patients who have not relapsed at the end of the study are described as censored.
SURVIVAL ANALYSIS: ASSUME ALL CASES RECRUITED AT TIME=0
[Figure: the same ten cases re-plotted as if all were recruited at time zero; X = relapsed, C = censored, W = withdrew]
SURVIVAL ANALYSIS: EVENTS IN YEAR 1
10 people were at risk at the start of Year 1.
Case 6 withdrew within the first year (leaving 9 cases), so the average number of people at risk during the first year was (10+9)/2 = 9.5.
Of the 9.5 people at risk during Year 1, one relapsed.
Probability of surviving the first year = (9.5-1)/9.5 = 0.895
SURVIVAL ANALYSIS: EVENTS IN YEAR 2
8 people were at risk at the start of Year 2.
Case 7 withdrew in Year 2, thus 7.5 people (on average) were at risk during Year 2.
Of the 7.5 people at risk during Year 2, two relapsed.
Probability of surviving the second year = (7.5-2)/7.5 = 0.733
Chances of surviving for 2 years = 0.733 × 0.895 = 0.656
SURVIVAL ANALYSIS: EVENTS IN YEAR 3
5 people were at risk at the start of Year 3.
Cases 2 and 8 were censored (ie withdrew) in Year 3, thus the average number of people at risk during Year 3 = (5+3)/2 = 4.
Of the 4 people at risk during Year 3, one relapsed.
Probability of surviving the third year = (4-1)/4 = 0.75
Chances of surviving for 3 years = 0.75 × 0.656 = 0.492
SURVIVAL CURVE
[Figure: survival curve plotting relapse-free survival against year of study]
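The year-by-year arithmetic above can be written as a small life-table sketch in plain Python (the tuples encode the slide's numbers: people at risk at the start of each year, withdrawals/censored, relapses):

    years = [(10, 1, 1), (8, 1, 2), (5, 2, 1)]

    surviving = 1.0
    for at_start, withdrew, relapsed in years:
        avg_at_risk = at_start - withdrew / 2        # eg (10 + 9) / 2 = 9.5
        p_survive = (avg_at_risk - relapsed) / avg_at_risk
        surviving *= p_survive                       # cumulative survival
        print(round(p_survive, 3), round(surviving, 3))
    # prints 0.895/0.895, 0.733/0.656, 0.75/0.492, matching the slides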
KAPLAN-MEIER SURVIVAL ANALYSIS
Where outcome is measured at regular predefined time intervals (eg every 12 months), this is termed an actuarial survival analysis.
The Kaplan-Meier method follows the same principles, but the intervals of measurement are between successive outcome events, ie the intervals are usually irregular.
COX'S PROPORTIONAL HAZARDS METHOD
You do not need to know the details of this, but should be aware of its application.
This method essentially uses a form of regression (compare the ANOVA and regression methods described earlier) to correct survival data for baseline differences between subjects (for example, if mortality is the outcome being assessed, one might wish to correct for the age of the patient at the start of the study).