Probability Playground: The Pareto Distribution

The Pareto distribution is a power-law distribution that models many phenomena in which larger values become progressively rarer. The shape parameter α sets the exponent of the power law, while the scale parameter xₘ is the lower bound of the distribution's support.

Parameter	Range	Description
α	α > 0	Shape parameter
xₘ	xₘ > 0	Scale parameter

Probability Density Function

f(x; α, x_{m}) = \frac{α x_{m}^{α}}{x^{α + 1}}

Cumulative Distribution Function

F(x; α, x_{m}) = 1 - \left(\frac{x_{m}}{x}\right)^{α}

Survival Function

S(x; α, x_{m}) = \left(\frac{x_{m}}{x}\right)^{α}

Hazard Function

h(x; α, x_{m}) = \frac{α}{x}

Support

x_{m} \leq x < \infty
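The four functions above can be written down directly. The following is a minimal Python sketch, not a reference implementation; the function names are hypothetical, and treating values below xₘ as outside the support (density 0, cdf 0, survival 1, hazard 0) is an assumption based on the stated support.

```python
def pareto_pdf(x, alpha, x_m):
    """Density f(x) = alpha * x_m**alpha / x**(alpha + 1) for x >= x_m, else 0."""
    if x < x_m:
        return 0.0
    return alpha * x_m ** alpha / x ** (alpha + 1)


def pareto_cdf(x, alpha, x_m):
    """F(x) = 1 - (x_m / x)**alpha for x >= x_m, else 0."""
    if x < x_m:
        return 0.0
    return 1.0 - (x_m / x) ** alpha


def pareto_sf(x, alpha, x_m):
    """Survival function S(x) = (x_m / x)**alpha for x >= x_m, else 1."""
    if x < x_m:
        return 1.0
    return (x_m / x) ** alpha


def pareto_hazard(x, alpha, x_m):
    """h(x) = f(x) / S(x), which simplifies to alpha / x on the support."""
    if x < x_m:
        return 0.0
    return pareto_pdf(x, alpha, x_m) / pareto_sf(x, alpha, x_m)


# Parameters taken from the bestselling-books example: alpha = 3.51, x_m = 2.
alpha, x_m = 3.51, 2.0
for x in (2.0, 3.0, 5.0):
    print(x, pareto_pdf(x, alpha, x_m), pareto_cdf(x, alpha, x_m),
          pareto_sf(x, alpha, x_m), pareto_hazard(x, alpha, x_m))
```

Computing the hazard as the ratio f(x)/S(x) rather than hard-coding α/x makes it easy to confirm numerically that the two agree.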

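Mean

E(X) = \frac{α x_{m}}{α - 1}, \quad α > 1

Variance

Var(X) = \frac{α x_{m}^{2}}{(α - 1)^{2} (α - 2)}, \quad α > 2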

Example	α	xₘ
For the 633 bestselling books in the US that sold 2 million or more copies between 1895 and 1965, the number of books sold (in millions) follows an approximate Pareto distribution with α = 3.51.	3.510	2.000
The magnitude of earthquakes occurring in California that register above 3.8 on the Richter scale follows an approximate Pareto distribution with α = 3.04.	3.040	3.800
For AT&T customers in the US receiving 10 or more phone calls per day, the number of daily phone calls follows an approximate Pareto distribution with α = 2.22.	2.220	10.00

X ∼ Pareto(α, xₘ)

The Pareto distribution is scale-invariant. Take a threshold x₀ ≥ xₘ and a multiplier k ≥ 1. From the Pareto survival function, P(X > kx₀ | X > x₀) = S(kx₀)/S(x₀) = (1/k)^α, which does not depend on x₀. For example, if US household incomes follow a Pareto distribution with xₘ ≤ $25K, then the ratio of households with income over $100K to those over $50K equals the ratio of those over $50K to those over $25K.
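The scale-invariance property can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and α = 2 and xₘ = 10 are arbitrary values, not figures from the income example.

```python
def pareto_sf(x, alpha, x_m):
    """Survival function S(x) = (x_m / x)**alpha for x >= x_m, else 1."""
    return (x_m / x) ** alpha if x >= x_m else 1.0


def conditional_tail(k, x0, alpha, x_m):
    """P(X > k*x0 | X > x0) = S(k*x0) / S(x0), for x0 >= x_m and k >= 1."""
    return pareto_sf(k * x0, alpha, x_m) / pareto_sf(x0, alpha, x_m)


# Hypothetical parameters, chosen only for illustration.
alpha, x_m = 2.0, 10.0
k = 2.0

# The conditional tail probability should match (1/k)**alpha at every threshold.
for x0 in (25.0, 50.0, 400.0):
    print(x0, conditional_tail(k, x0, alpha, x_m), (1.0 / k) ** alpha)
```

Each row prints the same conditional probability whatever the threshold x₀, so doubling the threshold always removes the same fraction of the remaining tail.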

Note that the mean of the Pareto distribution is only defined for α > 1, while the variance is only defined for α > 2.
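As a rough illustration of these conditions, the sketch below evaluates the standard closed forms E(X) = αxₘ/(α − 1) and Var(X) = αxₘ²/((α − 1)²(α − 2)) for the three examples in the table. The function names are hypothetical, and returning infinity where the moment does not exist is one possible convention, assumed here for convenience.

```python
import math


def pareto_mean(alpha, x_m):
    """E(X) = alpha * x_m / (alpha - 1) for alpha > 1; infinite otherwise (assumed convention)."""
    if alpha <= 1:
        return math.inf
    return alpha * x_m / (alpha - 1)


def pareto_variance(alpha, x_m):
    """Var(X) = alpha * x_m**2 / ((alpha - 1)**2 * (alpha - 2)) for alpha > 2; infinite otherwise."""
    if alpha <= 2:
        return math.inf
    return alpha * x_m ** 2 / ((alpha - 1) ** 2 * (alpha - 2))


# (alpha, x_m) pairs from the examples table: books, earthquakes, phone calls.
examples = [(3.51, 2.0), (3.04, 3.8), (2.22, 10.0)]
for alpha, x_m in examples:
    print(alpha, x_m, pareto_mean(alpha, x_m), pareto_variance(alpha, x_m))
```

All three examples have α > 2, so both moments are finite. The phone-call example, with α = 2.22, sits closest to the boundary where the variance stops existing.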

The graph above displays the survival function S(x) = P(X > x) = 1 - F(x), where F(x) is the cumulative distribution function (cdf).

Survival functions are used in survival analysis, a branch of statistics concerned with the expected time until an event occurs, such as death or the failure of a mechanical system.

The graph above displays the hazard function h(x). This equals f(x)/S(x), where f(x) is the pdf and S(x) = P(X > x) is the survival function.

The illustration above shows a point U chosen from a standard uniform distribution. The random variable X = xₘ/U^{1/α} has a Pareto(α, xₘ) distribution.

The simulation above shows a point U chosen from a standard uniform distribution on the y-axis. The light blue circle shows the value of the random variable X = xₘ/U^{1/α} on the x-axis, which has a Pareto(α, xₘ) distribution. The histogram accumulates the results of each simulation.
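The sampling step described above can be sketched in a few lines of Python using only the standard library. The function name `sample_pareto`, the seed, and the parameter values are assumptions made for illustration; the page's own simulation may be implemented differently.

```python
import random


def sample_pareto(alpha, x_m, rng):
    """Draw one Pareto(alpha, x_m) value as x_m / U**(1/alpha), with U standard uniform."""
    # rng.random() lies in [0, 1); using 1 - rng.random() gives U in (0, 1]
    # and so avoids dividing by zero.
    u = 1.0 - rng.random()
    return x_m / u ** (1.0 / alpha)


rng = random.Random(0)   # fixed seed so the run is repeatable
alpha, x_m = 3.0, 1.0    # illustrative parameters, not taken from the page

samples = [sample_pareto(alpha, x_m, rng) for _ in range(100_000)]

# Compare the empirical tail fraction with the survival function S(2) = (x_m / 2)**alpha.
frac_above_2 = sum(s > 2.0 for s in samples) / len(samples)
print(frac_above_2, (x_m / 2.0) ** alpha)
```

Every draw is at least xₘ, and the empirical fraction of draws above 2 should land close to the theoretical survival probability.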

Y = log(X/xₘ) ∼ Exponential(1/α)

lim_{n→∞} max Xᵢ / n^{1/α} ∼ Fréchet(α, xₘ)

min Xᵢ ∼ Pareto(nα, xₘ)

(X/xₘ)^α ∼ Standard Pareto
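Three of these relationships can be spot-checked by simulation. The sketch below makes several assumptions: Exponential(1/α) is read as an exponential distribution with mean 1/α, the Xᵢ are independent, "Standard Pareto" means Pareto(1, 1), and the names and parameter values are hypothetical. It gives only a rough numerical check, not a proof.

```python
import math
import random


def sample_pareto(alpha, x_m, rng):
    """Draw one Pareto(alpha, x_m) value by inverse transform sampling."""
    u = 1.0 - rng.random()  # U in (0, 1]
    return x_m / u ** (1.0 / alpha)


rng = random.Random(1)
alpha, x_m, n, trials = 2.5, 2.0, 4, 50_000  # illustrative values only

# log(X / x_m): under the assumed reading, an exponential with mean 1 / alpha.
logs = [math.log(sample_pareto(alpha, x_m, rng) / x_m) for _ in range(trials)]
mean_log = sum(logs) / trials

# Minimum of n independent draws: Pareto(n * alpha, x_m),
# so P(min > x) = (x_m / x)**(n * alpha).
x = 2.2
mins = [min(sample_pareto(alpha, x_m, rng) for _ in range(n)) for _ in range(trials)]
frac_min = sum(m > x for m in mins) / trials
expected_min = (x_m / x) ** (n * alpha)

# (X / x_m)**alpha: Pareto(1, 1), so P((X / x_m)**alpha > t) = 1 / t.
t = 4.0
powers = [(sample_pareto(alpha, x_m, rng) / x_m) ** alpha for _ in range(trials)]
frac_power = sum(p > t for p in powers) / trials

print(mean_log, 1.0 / alpha)
print(frac_min, expected_min)
print(frac_power, 1.0 / t)
```

The Fréchet limit is left out because checking a limit in distribution needs large n and a more careful comparison than a single tail fraction.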
