### 1. Overview

In this article, we’ll discuss the geometric probability distribution and its properties. We’ll prove each property mathematically and explain its significance. The proofs assume basic knowledge of discrete mathematics, algebra, and differential calculus.

### 2. Basic Definitions

We begin with a few basic definitions that will set the stage for things to come.

**Sample space**: A **countable** set $\Omega$ of discrete points. A set $S$ is countable when there exists an injection $f: S \to \mathbb{N}$, that is, when its elements can be listed as a finite or infinite sequence. Each point in the sample space represents an outcome.

A sample space may also be made of a continuous interval of points, but that is outside the scope of this article.

**Conditional probability**: Given events $A$ and $B$ as subsets of $\Omega$ with $P(B) > 0$, the value $P(A | B)$ is the probability that $A$ occurs given that $B$ has occurred. It restricts the sample space to the outcomes in $B$ and rescales their probabilities by $\frac{1}{P(B)}$ so that they sum to $1$.

**Discrete random variable**: A discrete random variable $X$ is a function $X: \Omega \to A$, where $\Omega$ is the sample space and $A$ is a countable subset of $\mathbb{R}$. It assigns a numerical value to each outcome, and the probabilities of the outcomes determine how likely each value is. For example, suppose we have a set of students and want to measure their weights. The set of students is our sample space $\Omega$. A random variable $W$ could take any student in this set as input and output that student's weight. If students are picked uniformly at random, the more often a weight value appears, the more probable it is.

**Probability mass function (PMF)**: A function $f: A \to [0, 1]$, where $A$ is the set of values that $X$ takes on, as above, and the values of $f$ sum to $1$. We usually write it as $P(X = x)$ for $x \in A$.

**Cumulative distribution function (CDF)**: In our notation, the CDF is the function $F(x) = P(X \le x)$, the probability that $X$ takes a value of at most $x$.

A random variable $X$ is **geometric** if its PMF has the form given in Section 3.

### 3. The Geometric Probability Mass Function

A geometric random variable $X$ counts the number of trials needed for an event of interest (a "success") to occur for the first time. The trials are independent, and each one succeeds with the same probability $p$, where $0 < p \le 1$. $X$ takes on the values $1$, $2$, $3$, and so on.

For example, we might want to count the number of times a coin is tossed until a head appears. Say heads occurs with probability $p$. Then the only other outcome, tails, occurs with probability $1 - p$. The geometric PMF takes the form:

$P(X = x) = (1 - p)^{x - 1}p$

This says that the first $x - 1$ trials fail, each with probability $1 - p$, and trial $x$ succeeds with probability $p$; independence lets us multiply these probabilities. Plotted, the PMF is a downward-sloping set of discrete points: the values $x = 1, 2, 3, \ldots$ of $X$ lie on the horizontal axis and their probabilities on the vertical axis. The height of each point tells us how probable that value is, and these **heights decrease as a geometric progression** with common ratio $1 - p$.
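As a quick check of the PMF, the sketch below evaluates it with exact rational arithmetic (Python's `fractions.Fraction`) for a biased coin that lands heads with probability $p = 1/3$, and confirms that consecutive probabilities shrink by the constant factor $1 - p$. The function name `geometric_pmf` is our own illustrative choice.

```python
from fractions import Fraction


def geometric_pmf(x: int, p: Fraction) -> Fraction:
    """Return P(X = x) = (1 - p)^(x - 1) * p for a geometric X on {1, 2, 3, ...}."""
    if x < 1:
        return Fraction(0)  # X never takes values below 1
    # x - 1 failures, each with probability 1 - p, then one success with probability p
    return (1 - p) ** (x - 1) * p


p = Fraction(1, 3)  # biased coin: heads with probability 1/3
probs = [geometric_pmf(x, p) for x in range(1, 6)]
print([str(v) for v in probs])  # ['1/3', '2/9', '4/27', '8/81', '16/243']

# Ratio of each height to the previous one: constant, equal to 1 - p
ratios = [later / earlier for earlier, later in zip(probs, probs[1:])]
print([str(r) for r in ratios])  # ['2/3', '2/3', '2/3', '2/3']
```

Exact fractions are used so that the ratio check is an equality test rather than a floating-point approximation.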

A more abstract example is a game with several possible outcomes in each round. One of the outcomes ends the game; the others force another round. If the rounds are independent and the game-ending outcome has the same probability $p$ in every round, the number of rounds played, including the final one, is geometric.
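To see the game example in action, here is a small Monte Carlo sketch. It assumes a hypothetical game in which each round a fair six-sided die is rolled and a six ends the game, so $p = 1/6$; the function `play_game`, the seed, and the number of games are illustrative choices. It compares the observed frequency of each round count with the geometric PMF.

```python
import random
from collections import Counter


def play_game(rng: random.Random, p_end: float) -> int:
    """Play rounds until the game-ending outcome occurs; return the number of rounds played.

    Each round independently ends the game with probability p_end (hypothetical game).
    """
    rounds = 1
    while rng.random() >= p_end:  # this round did not end the game, so play another
        rounds += 1
    return rounds


rng = random.Random(42)  # fixed seed so the run is reproducible
p_end = 1 / 6            # e.g. rolling a fair die until a six appears
n_games = 100_000
counts = Counter(play_game(rng, p_end) for _ in range(n_games))

for x in range(1, 6):
    empirical = counts[x] / n_games
    theoretical = (1 - p_end) ** (x - 1) * p_end
    print(f"rounds = {x}: observed {empirical:.4f}, geometric PMF {theoretical:.4f}")
```

The observed frequencies should agree with the PMF to within sampling noise, which shrinks as `n_games` grows.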

### 4. The Geometric Cumulative Distribution Function

The CDF gives the probability that the first success occurs within the first $x$ trials. For the geometric distribution, it takes the form:

$P(X \le x) = 1 - (1 - p)^x$

To prove this, sum the PMF over the first $x$ values:

$P(X \le x) = \sum_{k = 1}^{x}(1 - p)^{k - 1}p$

$= p\sum_{k = 1}^{x}(1 - p)^{k - 1}$

$= p\frac{1 - (1 - p)^{x}}{1 - (1 - p)}$ [finite geometric series with $x$ terms and ratio $1 - p$]

$= 1 - (1 - p)^x$ [the denominator equals $p$, which cancels]
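The closed form can be checked against the direct summation. This sketch uses exact fractions, so the comparison is exact rather than approximate; the value $p = 1/3$ and the function names are illustrative.

```python
from fractions import Fraction


def geometric_cdf(x: int, p: Fraction) -> Fraction:
    """Closed form P(X <= x) = 1 - (1 - p)^x, for integers x >= 0."""
    return 1 - (1 - p) ** x


def cdf_by_summation(x: int, p: Fraction) -> Fraction:
    """Sum of the PMF (1 - p)^(k - 1) * p over k = 1, ..., x."""
    return sum(((1 - p) ** (k - 1) * p for k in range(1, x + 1)), Fraction(0))


p = Fraction(1, 3)
for x in range(0, 10):
    assert geometric_cdf(x, p) == cdf_by_summation(x, p)

print(geometric_cdf(3, p))  # 19/27
```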

### 5. Expected Value of a Geometric Random Variable

The expectation $E[X]$ of a random variable is a weighted average: each value that $X$ can take is weighted by the probability that it occurs, so values that carry more probability mass contribute more. The expectation is also called the **mean** of the distribution. For a discrete random variable, it is given by the general formula:

$E[X] = \sum_{k} k\,P(X = k)$

where the sum runs over all values $k$ that $X$ can take.

For a geometric random variable in particular, this becomes:

$E[X] = \sum_{k = 1}^{\infty}k\,(1 - p)^{k - 1}p$

This sum has a closed form, which we can extract with a little calculus. First, recall the sum of an infinite geometric series, valid for $|x| < 1$:

$\sum_{n = 0}^{\infty}x^n = \frac{1}{1 - x}$

Differentiating both sides with respect to $x$ (term-by-term differentiation is valid for $|x| < 1$) gives:

$\sum_{n = 1}^{\infty}nx^{n - 1} = \frac{1}{(1 - x)^2}$
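A numerical sanity check of the differentiated series: for a few values of $x$ with $|x| < 1$, the partial sum of $\sum n x^{n - 1}$ should approach $\frac{1}{(1 - x)^2}$. The cutoff of 2000 terms is an arbitrary choice that is more than enough for these values of $x$.

```python
def derivative_series_partial_sum(x: float, n_terms: int) -> float:
    """Partial sum of n * x^(n - 1) over n = 1, ..., n_terms."""
    return sum(n * x ** (n - 1) for n in range(1, n_terms + 1))


for x in (-0.5, 0.1, 0.5, 0.9):
    approx = derivative_series_partial_sum(x, 2000)
    exact = 1 / (1 - x) ** 2
    print(f"x = {x}: partial sum {approx:.10f}, 1/(1 - x)^2 = {exact:.10f}")
```

Convergence slows as $x$ approaches $1$, which is why a generous number of terms is used.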

Applying this result with $x = 1 - p$, which satisfies $|x| < 1$ when $0 < p \le 1$, we get:

$E[X] = p\sum_{k = 1}^{\infty}k\,(1 - p)^{k - 1}$

$= p\frac{1}{(1 - (1 - p))^2}$

$= p\frac{1}{p^2}$

$= \frac{1}{p}$
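To see $E[X] = \frac{1}{p}$ empirically, the sketch below averages many simulated geometric samples for a few values of $p$. It draws samples by inverse-transform sampling, a standard technique that is not part of the derivation above: since the CDF from Section 4 gives $P(X > k) = (1 - p)^k$, setting $X = \lceil \ln U / \ln(1 - p) \rceil$ for $U$ uniform on $(0, 1]$ yields a geometric sample. The sample size, seed, and function name are illustrative, and the code assumes $0 < p < 1$.

```python
import math
import random


def sample_geometric(rng: random.Random, p: float) -> int:
    """Draw one sample of a geometric X on {1, 2, 3, ...} by inverse-transform sampling.

    With U uniform on (0, 1], ceil(ln U / ln(1 - p)) exceeds k exactly when
    U < (1 - p)^k, which matches P(X > k) = (1 - p)^k. Assumes 0 < p < 1.
    """
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    # max(1, ...) guards the measure-zero case u == 1.0, where the ceiling would be 0
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


rng = random.Random(7)  # fixed seed for reproducibility
n_samples = 200_000
sample_means = {}
for p in (0.2, 0.5, 0.8):
    sample_means[p] = sum(sample_geometric(rng, p) for _ in range(n_samples)) / n_samples
    print(f"p = {p}: sample mean {sample_means[p]:.3f}, 1/p = {1 / p:.3f}")
```

The sample means land close to $\frac{1}{p}$; the remaining gap is Monte Carlo noise.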

### 6. Variance of a Geometric Random Variable

The variance $Var(X)$ of a random variable measures the spread of its distribution. It is the expected **squared distance** of $X$ from its mean $\mu = E[X]$. In general,

$Var(X) = E[(X - \mu)^2] = E[(X - E[X])^2]$

Expanding the square and using linearity of expectation yields an equivalent formula that is often easier to compute:

$Var(X) = E[X^2] - (E[X])^2$

For the geometric distribution, $E[X] = \frac{1}{p}$, so $(E[X])^2 = \frac{1}{p^2}$. It remains to calculate $E[X^2]$. We do this with a little algebra and **linearity of expectation**:

$E[X^2] = E[X^2 - X + X] = E[X(X - 1) + X] = E[X(X - 1)] + E[X]$.

Let $q = 1 - p$. Note that $X(X - 1)$ is itself a random variable, since it is a function of a random variable, so its expectation is well defined. We compute it by weighting each value $x(x - 1)$ by $P(X = x) = q^{x - 1}p$:

$E[X(X - 1)] = p\sum_{x = 1}^{\infty}x\,(x - 1)\,q^{x - 1}$

Since $\frac{d}{dq}q^x = xq^{x - 1}$, this is equivalent to writing:

$E[X(X - 1)] = p\frac{d}{dq}\left(\sum_{x = 1}^{\infty}(x - 1)\,q^x\right)$

$= p\frac{d}{dq}\left(q^2\sum_{x = 2}^{\infty}(x - 1)\,q^{x - 2}\right)$ [the $x = 1$ term is zero]

$= p\frac{d}{dq}\left(q^2\,\frac{d}{dq}\left(\sum_{x = 2}^{\infty}q^{x - 1}\right)\right)$

$= p\frac{d}{dq}\left(q^2\,\frac{d}{dq}\left(\sum_{x = 1}^{\infty}q^{x}\right)\right)$ [reindex]

$= p\frac{d}{dq}\left(q^2\,\frac{d}{dq}\left(\frac{1}{1 - q} - 1\right)\right)$

$= p\frac{d}{dq}\left(q^2\,\frac{1}{(1 - q)^2}\right)$

$= p\,\frac{2q}{(1 - q)^3}$ [differentiate and simplify]

$= p\,\frac{2(1 - p)}{(1 - (1 - p))^3}$

$= \frac{2 - 2p}{p^2}$
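Because the chain of derivatives above is easy to get wrong, a numerical cross-check is reassuring. This sketch truncates the series for $E[X(X - 1)]$ after a fixed number of terms (5000, an arbitrary cutoff that is ample for the values of $p$ used) and compares it with $\frac{2 - 2p}{p^2}$. The function name is illustrative.

```python
def expected_x_times_x_minus_1(p: float, n_terms: int = 5000) -> float:
    """Truncated series for E[X(X - 1)]: sum over x >= 1 of x (x - 1) q^(x - 1) p, with q = 1 - p."""
    q = 1.0 - p
    return sum(x * (x - 1) * q ** (x - 1) * p for x in range(1, n_terms + 1))


for p in (0.1, 0.25, 0.5, 0.9):
    series = expected_x_times_x_minus_1(p)
    closed_form = (2 - 2 * p) / p ** 2
    print(f"p = {p}: series {series:.8f}, (2 - 2p)/p^2 = {closed_form:.8f}")
```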

Plugging this result into the variance formula, we get:

$Var(X) = E[X(X - 1)] + E[X] - (E[X])^2$

$= \frac{2 - 2p}{p^2} + \frac{1}{p} - \frac{1}{p^2}$

$= \frac{2 - 2p + p - 1}{p^2}$

$= \frac{1 - p}{p^2}$
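As an independent check of the final result, this sketch computes the variance straight from the definition $E[(X - \mu)^2]$, with $\mu = \frac{1}{p}$ from Section 5, using a truncated sum over the PMF rather than the $E[X(X - 1)]$ route. The 5000-term cutoff and the function name are arbitrary, illustrative choices.

```python
def variance_by_definition(p: float, n_terms: int = 5000) -> float:
    """Truncated series for E[(X - mu)^2]: sum over x >= 1 of (x - mu)^2 (1 - p)^(x - 1) p."""
    q = 1.0 - p
    mu = 1.0 / p  # mean of the geometric distribution (Section 5)
    return sum((x - mu) ** 2 * q ** (x - 1) * p for x in range(1, n_terms + 1))


for p in (0.1, 0.25, 0.5, 0.9):
    print(f"p = {p}: E[(X - mu)^2] = {variance_by_definition(p):.8f}, "
          f"(1 - p)/p^2 = {(1 - p) / p ** 2:.8f}")
```

Agreement between the two columns confirms that both variance formulas lead to $\frac{1 - p}{p^2}$.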

### 7. Memorylessness of the Geometric Distribution

An interesting property of the geometric distribution is its memorylessness. Suppose we begin observing the experiment only after some number of trials have all failed. The number of additional trials until the first success then **follows the same geometric distribution** as $X$ itself. It is as if we had started over!

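Suppose the first $a$ trials have all failed, so $X \gt a$. We claim that the probability that the next $b$ trials also fail equals the probability that the first $b$ trials of a fresh experiment fail. In symbols, for nonnegative integers $a$ and $b$: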

$P(X \gt a + b | X \gt a) = P(X \gt b)$

To prove this, first recall the definition of conditional probability:

$P(A | B) = \frac{P(A\ \cap\ B)}{P(B)}$

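So we have:

$P(X \gt a + b | X \gt a) = \frac{P(\lbrace X \gt a + b \rbrace \cap \lbrace X \gt a \rbrace)}{P(X \gt a)}$

But every outcome with $X \gt a + b$ also has $X \gt a$, so $\lbrace X \gt a + b \rbrace \cap \lbrace X \gt a \rbrace = \lbrace X \gt a + b \rbrace$.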

Therefore:

$P(X \gt a + b | X \gt a) = \frac{P(X \gt a + b)}{P(X \gt a)}$

The CDF gives these tail probabilities directly:

$P(X \gt x) = 1 - P(X \le x) = (1 - p)^x$

Thus:

$P(X \gt a + b | X \gt a) = \frac{(1 - p)^{a + b}}{(1 - p)^a} = (1 - p)^b = P(X \gt b)$
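The identity can also be confirmed exactly for small cases. This sketch computes tail probabilities by summing the PMF, deliberately avoiding the closed form, and checks $P(X > a + b | X > a) = P(X > b)$ for every pair $0 \le a, b < 5$ with exact fractions. The choice $p = 2/7$ and the helper name `tail` are illustrative.

```python
from fractions import Fraction


def tail(n: int, p: Fraction) -> Fraction:
    """P(X > n) = 1 - sum of P(X = k) for k = 1, ..., n (closed form deliberately not used)."""
    return 1 - sum(((1 - p) ** (k - 1) * p for k in range(1, n + 1)), Fraction(0))


p = Fraction(2, 7)
for a in range(5):
    for b in range(5):
        conditional = tail(a + b, p) / tail(a, p)  # P(X > a + b | X > a) = P(X > a + b) / P(X > a)
        assert conditional == tail(b, p)           # ... equals P(X > b)

print("P(X > a + b | X > a) == P(X > b) for all 0 <= a, b < 5")
```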

### 8. Conclusion

In this article, we discussed the main properties of the geometric distribution. We saw which situations it models, derived its PMF, CDF, mean, and variance, and proved its memorylessness, explaining what each property means.