Poisson regression

In statistics, the Poisson regression model attributes to a response variable Y a Poisson distribution whose expected value depends on a predictor variable x (written in lower case because the model treats x as non-random) in the following way:

\log \operatorname {E} (Y)=a+bx\,

(where "log" means the natural logarithm). Poisson regression models are generalized linear models with the logarithm as the (canonical) link function and a Poisson-distributed response.

If Y_i are independent observations with corresponding values x_i of the predictor variable, then a and b can be estimated by maximum likelihood, which requires at least two observations with distinct values of x_i. The maximum-likelihood estimates lack a closed-form expression and must be found by numerical methods.
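
As an illustration of such a numerical method, the following is a minimal sketch of Newton-Raphson iteration on the Poisson log-likelihood, using NumPy. The function name fit_poisson, the simulated data and the "true" values 0.5 and 0.8 are assumptions made for the example; they are not part of the model definition, and statistical packages typically use more robust variants of this scheme.

```python
import numpy as np

def fit_poisson(x, y, iterations=25, tol=1e-10):
    """Estimate (a, b) in log E(Y) = a + b*x by Newton-Raphson (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # design matrix with columns [1, x]
    beta = np.array([np.log(y.mean()), 0.0])    # start at intercept log(mean(y)), slope 0
    for _ in range(iterations):
        mu = np.exp(X @ beta)                   # fitted means under the current estimate
        score = X.T @ (y - mu)                  # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])          # Fisher information (negative Hessian)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data; a = 0.5 and b = 0.8 are arbitrary values chosen for the example.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 200)
y = rng.poisson(np.exp(0.5 + 0.8 * x))

a_hat, b_hat = fit_poisson(x, y)
print(a_hat, b_hat)
```

At the maximum the gradient of the log-likelihood vanishes, so the sums of (y_i - mu_i) and of x_i (y_i - mu_i) over all observations are both zero at the estimates, where mu_i is the fitted mean.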

Poisson regression in practice

Poisson regression is appropriate when the dependent variable is a count, for instance of events such as the arrival of a telephone call at a call centre (see Poisson distribution). The events must be independent in the sense that the arrival of one call will not make another more or less likely, but the rate of events per unit time is understood to be related to covariates such as time of day.

A characteristic of the Poisson distribution is that its mean is equal to its variance. In certain circumstances, it will be found that the observed variance is greater than the mean; this is known as over-dispersion and indicates that the model is not appropriate. A common reason is the omission of relevant explanatory variables.
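
One simple way to look for over-dispersion is to compare the Pearson dispersion statistic with 1. The sketch below does this for the simplest case, a model with no predictors, where the fitted mean is just the sample mean. The helper name dispersion_ratio and the simulated data are assumptions for illustration; the gamma-distributed rates stand in for a hypothetical omitted explanatory variable.

```python
import numpy as np

def dispersion_ratio(y):
    """Pearson dispersion statistic for an intercept-only Poisson model."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()  # maximum-likelihood estimate of the mean with no predictors
    return np.sum((y - mean) ** 2 / mean) / (len(y) - 1)

rng = np.random.default_rng(1)
n = 5000

# Counts that really are Poisson with mean 4.
poisson_counts = rng.poisson(4.0, size=n)

# Counts whose rate varies between units (gamma-distributed with mean 4),
# mimicking an explanatory variable that was left out of the model.
rates = rng.gamma(shape=2.0, scale=2.0, size=n)
mixed_counts = rng.poisson(rates)

ratio_poisson = dispersion_ratio(poisson_counts)
ratio_mixed = dispersion_ratio(mixed_counts)
print(ratio_poisson, ratio_mixed)
```

A ratio close to 1 is consistent with the Poisson assumption, while a ratio well above 1, as expected for the mixed counts here, suggests over-dispersion.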

Another common problem with Poisson regression is excess zeros: if there are two processes at work, one determining whether there are zero events or any events, and a Poisson process determining how many events there are, there will be more zeros than a Poisson regression would predict. An example would be the distribution of cigarettes smoked in an hour by members of a group where some individuals are non-smokers.
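
The cigarette example can be simulated to show the effect. In this sketch the proportion of smokers (60%) and the smokers' mean of 2 cigarettes per hour are invented values, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000

# First process: whether each individual smokes at all (assumed 60% smokers).
is_smoker = rng.random(n) < 0.6

# Second process: Poisson counts for smokers (assumed mean 2 per hour); non-smokers always 0.
cigarettes = np.where(is_smoker, rng.poisson(2.0, size=n), 0)

# Fraction of zeros actually observed in the sample.
observed_zero_fraction = np.mean(cigarettes == 0)

# Fraction of zeros that a single Poisson distribution with the same mean would give.
poisson_zero_fraction = np.exp(-cigarettes.mean())

print(observed_zero_fraction, poisson_zero_fraction)
```

The observed fraction of zeros is noticeably larger than the fraction implied by a single Poisson distribution with the same mean, which is the signature of excess zeros.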

Other generalized linear models such as the negative binomial model may function better in these cases.