Geometric distribution

In probability theory and statistics, the geometric distribution is either of two discrete probability distributions:

the probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set { 1, 2, 3, ...}, or

the probability distribution of the number Y = X − 1 of failures before the first success, supported on the set { 0, 1, 2, 3, ... }.

Which of these one calls "the" geometric distribution is a matter of convention and convenience.

If the probability of success on each trial is p, then the probability that n trials are needed to get one success is

P(X=n)=(1-p)^{n-1}p

for n = 1, 2, 3, .... Equivalently, the probability that there are n failures before the first success is

P(Y=n)=(1-p)^{n}p

for n = 0, 1, 2, 3, ....

In either case, the sequence of probabilities is a geometric sequence.
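The two conventions can be compared side by side in a short Python sketch. The function names `pmf_trials` and `pmf_failures` and the value p = 0.25 are illustrative choices, not part of any standard library.

```python
def pmf_trials(n: int, p: float) -> float:
    """P(X = n): the first success occurs on trial n, for n = 1, 2, 3, ..."""
    if n < 1:
        return 0.0
    return (1.0 - p) ** (n - 1) * p


def pmf_failures(n: int, p: float) -> float:
    """P(Y = n): exactly n failures precede the first success, for n = 0, 1, 2, ..."""
    if n < 0:
        return 0.0
    return (1.0 - p) ** n * p


p = 0.25  # illustrative success probability
# Successive probabilities share the common ratio 1 - p.
ratios = [pmf_trials(n + 1, p) / pmf_trials(n, p) for n in range(1, 6)]
# The partial sum over n = 1..200 is already very close to 1.
total = sum(pmf_trials(n, p) for n in range(1, 201))
```

Each entry of `ratios` should equal 1 − p up to rounding, which is the geometric-sequence property, and `pmf_trials(n, p)` agrees with `pmf_failures(n - 1, p)` because Y = X − 1.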

For example, suppose an ordinary die is thrown repeatedly until the first time a "1" appears. The probability distribution of the number of times it is thrown is supported on the infinite set { 1, 2, 3, ... } and is a geometric distribution with p = 1/6.
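As a rough illustration of the die example, the following Python sketch evaluates the probabilities exactly with rational arithmetic. It assumes a fair six-sided die; the helper name `prob_first_one_on_throw` is hypothetical.

```python
from fractions import Fraction

p = Fraction(1, 6)  # chance of a "1" on a single throw of a fair die


def prob_first_one_on_throw(n: int) -> Fraction:
    """Probability that the first "1" appears on throw n (n = 1, 2, 3, ...)."""
    return (1 - p) ** (n - 1) * p


# Probabilities for the first "1" on throws 1, 2 and 3.
first_three = [prob_first_one_on_throw(n) for n in (1, 2, 3)]
# Probability that a "1" has appeared within the first six throws.
within_six = sum(prob_first_one_on_throw(n) for n in range(1, 7))
```

Here `first_three` holds 1/6, 5/36 and 25/216, and `within_six` equals 1 − (5/6)⁶, roughly 0.665, so a "1" within six throws is likely but far from certain.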

The expected value of a geometrically distributed random variable X is 1/p and its variance is (1 − p)/p²:

E(X)={\frac {1}{p}},\quad \operatorname {var} (X)={\frac {1-p}{p^{2}}}.

Similarly, since Y = X − 1, the expected value of the geometrically distributed random variable Y is (1 − p)/p, and its variance is unchanged at (1 − p)/p²:

E(Y)={\frac {1-p}{p}},\quad \operatorname {var} (Y)={\frac {1-p}{p^{2}}}.
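These closed forms can be checked numerically by summing a long but finite piece of the defining series. The sketch below is only a sanity check, not a proof; the function `truncated_moments`, the truncation length, and the value p = 0.2 are assumptions made for illustration.

```python
def truncated_moments(p: float, shift: int, terms: int = 5000):
    """Mean and variance computed from the first `terms` terms of the series.

    shift=1 gives X (number of trials); shift=0 gives Y (number of failures).
    """
    q = 1.0 - p
    values = range(shift, shift + terms)
    probs = [q ** (n - shift) * p for n in values]
    mean = sum(n * w for n, w in zip(values, probs))
    second = sum(n * n * w for n, w in zip(values, probs))
    return mean, second - mean ** 2


p = 0.2  # illustrative success probability
mean_x, var_x = truncated_moments(p, shift=1)
mean_y, var_y = truncated_moments(p, shift=0)
```

With p = 0.2 the formulas give E(X) = 5, E(Y) = 4 and a common variance of 20, and the truncated sums should agree with these values up to floating-point error.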

The probability-generating functions of X and Y are, respectively,

G_{X}(s)={\frac {sp}{1-s(1-p)}}\quad {\textrm {and}}\quad G_{Y}(s)={\frac {p}{1-s(1-p)}},\quad |s|<(1-p)^{-1}.
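Both expressions follow from summing a geometric series. A brief derivation, using only the probabilities P(X = n) defined above:

G_{X}(s)=\sum _{n=1}^{\infty }s^{n}(1-p)^{n-1}p=sp\sum _{k=0}^{\infty }[s(1-p)]^{k}={\frac {sp}{1-s(1-p)}},\quad |s|<(1-p)^{-1}.

Since Y = X − 1, it follows that G_Y(s) = G_X(s)/s. Differentiating recovers the mean:

G_{X}'(s)={\frac {p}{[1-s(1-p)]^{2}}},\quad G_{X}'(1)={\frac {p}{p^{2}}}={\frac {1}{p}}=E(X).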

Like its continuous analogue (the exponential distribution), the geometric distribution is memoryless. That means that if a die is rolled until the first "1" appears, the distribution of the number of additional trials still needed is not affected by the fact that a series of failures has just been observed; the die does not have a "memory" of these failures. The geometric distribution is in fact the only memoryless discrete distribution.
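The memoryless property can be verified directly from the tail probabilities. More than n trials are needed exactly when the first n trials all fail, so P(X > n) = (1 − p)^n, and for non-negative integers m and n,

P(X>m+n\mid X>m)={\frac {P(X>m+n)}{P(X>m)}}={\frac {(1-p)^{m+n}}{(1-p)^{m}}}=(1-p)^{n}=P(X>n).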

Among all discrete probability distributions supported on {1, 2, 3, ... } with given expected value μ, the geometric distribution X with parameter p = 1/μ is the one with the largest entropy.

The geometric distribution of the number Y of failures before the first success is infinitely divisible, i.e., for any positive integer n, there exist independent identically distributed random variables Y₁, ..., Y_n whose sum has the same distribution as Y. These summands are not geometrically distributed unless n = 1.

Related distributions

The geometric distribution Y is a special case of the negative binomial distribution, with r = 1. More generally, if Y₁,...,Y_r are independent geometrically distributed variables with parameter p, then $Z=\sum _{m=1}^{r}Y_{m}$ follows a negative binomial distribution with parameters r and p.
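The relationship with the negative binomial distribution can be checked numerically by convolving probability mass functions. This sketch assumes the parameterization in which the negative binomial variable counts failures before the r-th success, so that P(Z = k) = C(k + r − 1, k)(1 − p)^k p^r; the function names and the values of p, r and `size` are illustrative.

```python
from math import comb


def geometric_pmf(k: int, p: float) -> float:
    """P(Y = k): k failures before the first success."""
    return (1.0 - p) ** k * p


def convolve(a, b):
    """PMF of the sum of two independent variables, truncated to len(a) terms."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(Z = k): k failures before the r-th success (assumed parameterization)."""
    return comb(k + r - 1, k) * (1.0 - p) ** k * p ** r


p, r, size = 0.3, 4, 30  # illustrative values
base = [geometric_pmf(k, p) for k in range(size)]

# Convolve r copies of the geometric PMF to get the PMF of Y_1 + ... + Y_r.
pmf_sum = base
for _ in range(r - 1):
    pmf_sum = convolve(pmf_sum, base)

expected = [negative_binomial_pmf(k, r, p) for k in range(size)]
max_error = max(abs(a - b) for a, b in zip(pmf_sum, expected))
```

The truncation is harmless for this comparison because the probability of a sum equal to k depends only on terms with index at most k, so `max_error` should be at the level of floating-point rounding.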

If Y₁,...,Y_r are independent geometrically distributed variables (with possibly different success parameters p_m), then their minimum $W=\min _{m}Y_{m}$ is also geometrically distributed, with parameter $p=1-\prod _{m}(1-p_{m})$.
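The result for the minimum follows from the tail probabilities P(Y_m ≥ k) = (1 − p_m)^k: the minimum is at least k exactly when every Y_m is at least k. The sketch below checks this with exact rational arithmetic; the three parameter values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import prod

params = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]  # hypothetical p_m values

# Parameter of the minimum, as given in the text.
p_min = 1 - prod(1 - p for p in params)


def survival(k: int, p: Fraction) -> Fraction:
    """P(Y >= k) = (1 - p)^k for the number of failures Y."""
    return (1 - p) ** k


def min_survival(k: int) -> Fraction:
    """P(W >= k): by independence, every Y_m must be at least k."""
    return prod(survival(k, p) for p in params)


# The tail of the minimum matches the tail of a geometric variable with p_min.
checks = [min_survival(k) == survival(k, p_min) for k in range(10)]
```

For these parameters `p_min` is 11/15, and every entry of `checks` is true, which matches the geometric tail (1 − p)^k with p = 11/15.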
