Models for Poisson data
V. Vasdekis
Athens University of Economics and Business
April 8, 2021
V. Vasdekis (Athens University of Economics and Business)
Models for Poisson data April 8, 2021 1 / 10
Example 1
Let us assume that we possess the number of chronic medical
problems in a sample of areas which are approximately of the same
size. Areas are of urban or rural character. The total number of
observations is n = 49.
Scientific question: Do we expect that urban and rural areas present
the same mean number of chronic medical problems?
These are count data. Numbers from urban areas 0, 1, 1, 0, 2, 3....
Numbers from rural areas 2, 0, 3, 0, 0....
We assume that yi ∼ P(λi ), i = 1, . . . , 49 independent observations.
The model consists of the linear predictor which represents the
scientific question and the link function
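\[ \log(\lambda_i) = \beta_0 + \beta_1\,\mathrm{region}_i, \qquad \mathrm{region}_i = \begin{cases} 1 & \text{if observation } i \text{ comes from a rural area} \\ 0 & \text{otherwise} \end{cases} \]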
Parameter interpretation: exp(β0) = E(y|urban),
exp(β1) = E(y|rural)/E(y|urban).
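The interpretation above can be checked numerically. For a Poisson model with a single indicator covariate, the maximum likelihood estimates have a closed form: exp(β0) is the sample mean of the urban counts and exp(β1) is the ratio of the rural to the urban sample mean. The sketch below is only an illustration: the counts are made up (the example's data are not listed in full), and the names `urban`, `rural` and `fit_two_group_poisson` are hypothetical.

```python
import math

def fit_two_group_poisson(baseline_counts, other_counts):
    """Closed-form MLE for log(lambda_i) = b0 + b1 * region_i.

    With one indicator covariate, the score equations reduce to
    matching each group's fitted mean to its sample mean.
    """
    mean_baseline = sum(baseline_counts) / len(baseline_counts)
    mean_other = sum(other_counts) / len(other_counts)
    b0 = math.log(mean_baseline)               # exp(b0) = E(y | urban)
    b1 = math.log(mean_other / mean_baseline)  # exp(b1) = E(y | rural) / E(y | urban)
    return b0, b1

# Hypothetical counts, NOT the data from the example.
urban = [1, 0, 2, 3, 1, 1, 0, 2]
rural = [2, 0, 1, 0, 3, 0]

b0, b1 = fit_two_group_poisson(urban, rural)
print(f"exp(b0) = {math.exp(b0):.3f}  (urban sample mean)")
print(f"exp(b1) = {math.exp(b1):.3f}  (rural mean / urban mean)")
```

A value of exp(b1) close to 1 would suggest similar means in the two types of area; a formal answer to the scientific question would still require a test of β1 = 0.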
Example 2
Number of new leukemia cases in an area for 12 consecutive months.
Wrong assumption: data are independent.
Question: Does the expected number of new cases differ between 4
seasons?
Random component: yi ∼ P(λi), i = 1, ..., 12, independent
observations (an assumption that is wrong in practice).
Model is:
log(λi ) = β0 + β1 season1 + β2 season2 + β3 season3
where season1 , season2 , season3 are pseudovariables for spring,
summer and autumn respectively.
Parameter interpretation
With winter as the reference season,
exp(β0) = E(y|winter)
The remaining parameters are ratios of expected values, comparing
each of the other seasons with winter:
exp(β1 ) = E(y|spring)/E(y|winter)
exp(β2 ) = E(y|summer)/E(y|winter)
exp(β3 ) = E(y|autumn)/E(y|winter)
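The coding behind these ratios can be sketched in a few lines. The example below builds the three season indicators with winter as the reference level and confirms that exponentiating the difference of two linear predictors gives the ratio of expected values. The month-to-season assignment and the coefficient values are assumptions made purely for illustration, and the function names are hypothetical.

```python
import math

SEASONS = ["winter", "spring", "summer", "autumn"]  # winter is the reference level

def season_dummies(season):
    """Return (season1, season2, season3) for spring, summer, autumn."""
    return tuple(int(season == level) for level in SEASONS[1:])

def log_lambda(season, beta):
    """Linear predictor b0 + b1*season1 + b2*season2 + b3*season3."""
    b0, *slopes = beta
    return b0 + sum(b * d for b, d in zip(slopes, season_dummies(season)))

# Assumed layout: 12 consecutive months, three per season, starting in winter.
month_seasons = [s for s in SEASONS for _ in range(3)]
design = [(1, *season_dummies(s)) for s in month_seasons]  # rows of the design matrix

# Hypothetical coefficients, chosen only so that the ratios are easy to read.
beta = (math.log(4.0), math.log(1.5), math.log(0.5), math.log(1.0))
for s in SEASONS:
    ratio = math.exp(log_lambda(s, beta) - log_lambda("winter", beta))
    print(f"E(y|{s}) / E(y|winter) = {ratio:.2f}")
```

Every winter row of `design` is (1, 0, 0, 0), which is why exp(β0) alone gives the expected count in winter.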
Example 3
The expected number of cases often depends on an exposure at risk
variable, whose effect must be taken into account if we wish to make
different population groups comparable.
As an example, consider the number of epileptic seizures being
measured on a number of patients. One patient is measured for 2
weeks, another one is measured for 1.5 weeks. Can we compare the
expected number of epileptic seizures between the two patients?
Let us denote by λ the expected number of cases when the exposure
at risk variable is not taken into account and λ′ when this variable is
taken into account. Let us also denote by s the exposure at risk
variable.
Parameter λ expresses what we actually measure. Parameter λ′
expresses what we would have measured had all observations been
made under the same exposure conditions.
Assumption: λ′ = λ/s, so that λ = s × λ′. In other words, if we
double the exposure at risk variable, we expect the expected number
of cases to double as well.
This assumption is called the proportionality property of the effect of
the exposure at risk variable on the dependent variable.
An example
Suppose that in the same problem, we measure an explanatory
variable x. Then, we must model λ′ since this parameter is
comparable between subpopulations defined by explanatory variables.
Remember, however, that the data are generated according to λ, and
that λ′ = λ/s.
The random component is defined as
yi ∼ P(λi), i = 1, ..., n, or equivalently yi ∼ P(si × λ′i)
The linear predictor and log link give
log(λi) = log(si) + log(λ′i) = log(si) + β0 + β1 xi
Note that log(si) enters as an explanatory variable whose coefficient is
fixed at 1 rather than estimated. Such a variable is called an offset.
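To make the role of the offset concrete, here is a minimal sketch of fitting this model by Newton-Raphson in plain Python. It is not a reference implementation: the data are hypothetical, the function name `fit_poisson_offset` is made up, and in practice one would normally use a GLM routine from a statistics package. The point to notice is that log(si) is added to the linear predictor with no coefficient attached, so only β0 and β1 are updated.

```python
import math

def fit_poisson_offset(y, s, x, n_iter=50, tol=1e-12):
    """Newton-Raphson for log(lambda_i) = log(s_i) + b0 + b1 * x_i."""
    b0, b1 = math.log(sum(y) / sum(s)), 0.0  # start from the overall rate
    for _ in range(n_iter):
        # Fitted means: the exposure s_i multiplies exp(b0 + b1 * x_i).
        mu = [si * math.exp(b0 + b1 * xi) for si, xi in zip(s, x)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries (negative Hessian).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: counts y, exposure s (e.g. weeks observed), covariate x.
y = [3, 2, 5, 3, 11, 12]
s = [2.0, 1.5, 2.0, 1.0, 3.0, 2.5]
x = [0, 1, 2, 3, 4, 5]

b0, b1 = fit_poisson_offset(y, s, x)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"rate ratio per unit increase in x: exp(b1) = {math.exp(b1):.3f}")
```

Because the offset is held fixed, exp(b0 + b1 * x) is interpreted as an expected count per unit of exposure, which is the quantity λ′ that is comparable across observations.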
Example 4
Number of accidents in a specific time period on two Cambridge roads, at
three different times of day. An estimated traffic volume is also recorded as
an exposure at risk variable.
Road               Time of day    Accidents   Estimated traffic volume
Trumpington Road   07.00-09.30    11          2206
Trumpington Road   09.30-15.00    9           3276
Trumpington Road   15.00-18.30    4           1999
Mill Road          07.00-09.30    4           1399
Mill Road          09.30-15.00    20          2276
Mill Road          15.00-18.30    4           1417
Modeling
Let us denote by yi, i = 1, ..., 6, the number of accidents. We assume
these are independent observations. If si is the estimated traffic
volume of observation i, then a possible assumption about the effect
of si on λ′i is
\[ \lambda'_i = \frac{\lambda_i}{s_i} \]
We can also write yi ∼ P(si × λ′i).
Therefore, we have assumed proportionality of the estimated traffic
volume and expected number of accidents.
We now model λ′ using, say, the road effect, and the final model is
log(λi) = log(si) + log(λ′i) = log(si) + β0 + β1 roadi
where roadi is an indicator variable (pseudovariable) for Mill Road.
We can use other models and check their applicability.
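For this particular model the estimates can be worked out by hand, which makes a useful sanity check. With a single road indicator and the offset log(si), the score equations force each road's fitted total to equal its observed total, so exp(β0) is total accidents divided by total traffic volume on Trumpington Road, and exp(β1) is the ratio of the two roads' rates. The sketch below applies this to the six observations in the table; the helper name `rate` is illustrative.

```python
import math

# (road, time of day, accidents, estimated traffic volume) from the table above
data = [
    ("Trumpington Road", "07.00-09.30", 11, 2206),
    ("Trumpington Road", "09.30-15.00", 9, 3276),
    ("Trumpington Road", "15.00-18.30", 4, 1999),
    ("Mill Road", "07.00-09.30", 4, 1399),
    ("Mill Road", "09.30-15.00", 20, 2276),
    ("Mill Road", "15.00-18.30", 4, 1417),
]

def rate(road):
    """Accidents per unit of estimated traffic volume on one road."""
    accidents = sum(y for r, _, y, _ in data if r == road)
    volume = sum(s for r, _, _, s in data if r == road)
    return accidents / volume

b0 = math.log(rate("Trumpington Road"))                      # baseline road
b1 = math.log(rate("Mill Road") / rate("Trumpington Road"))  # Mill Road indicator

print(f"exp(b0) = {math.exp(b0):.5f} accidents per unit of traffic volume")
print(f"exp(b1) = {math.exp(b1):.3f} (Mill Road rate / Trumpington Road rate)")
```

The rate ratio comes out at roughly 1.7, so under this model Mill Road has the higher accident rate per unit of traffic. Whether that difference is statistically meaningful, and whether time of day should also enter the model, are separate questions that this calculation does not address.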
What if no proportionality?
Recall that λ′ = λ/s implies λ = s × λ′ and hence
log(λi) = log(si) + log(λ′i). If proportionality does not hold, a possible
model is
\[ \lambda = s^{\gamma} \times \lambda', \qquad \log(\lambda_i) = \gamma \log(s_i) + \log(\lambda'_i) \]
What are the consequences of such a model? If we double the value
of the exposure at risk variable, the new λ is
\[ \lambda_{\text{new}} = (2s)^{\gamma} \times \lambda' = 2^{\gamma} s^{\gamma} \lambda' = 2^{\gamma} \times \lambda \]
Therefore, the expected number of cases is multiplied not by 2 but by
$2^{\gamma}$.
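A short numerical check of this doubling argument, with hypothetical values for the exposure and for λ′ (the function name `expected_cases` is also illustrative):

```python
def expected_cases(s, lam_prime, gamma=1.0):
    """lambda = s**gamma * lambda'; gamma = 1 is the proportional case."""
    return s ** gamma * lam_prime

lam_prime, s = 0.8, 3.0  # hypothetical values, for illustration only
for gamma in (1.0, 0.5, 1.5):
    ratio = expected_cases(2 * s, lam_prime, gamma) / expected_cases(s, lam_prime, gamma)
    print(f"gamma = {gamma}: doubling s multiplies lambda by {ratio:.3f}")
```

With gamma below 1 the expected count grows more slowly than the exposure, and with gamma above 1 it grows faster. In a fitted model, gamma would be estimated as the coefficient of log(si) instead of being fixed at 1 as in the offset case.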