The Normal Distribution — An Interactive Introduction

The normal distribution shows up constantly in statistics and probabilistic modeling. Bayesian models, bootstraps, measurement uncertainty — the Gaussian is involved in most of it. This post walks through the key ideas and gives you an interactive widget to build intuition.

The density function

A continuous random variable $X$ is said to follow a normal distribution with mean $\mu$ and standard deviation $\sigma > 0$, written $X \sim \mathcal{N}(\mu, \sigma^2)$, if its probability density function is

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$$

The formula has a clean structure once you read it from the inside out:

The exponent is $-\tfrac{1}{2}z^2$, where $z = (x-\mu)/\sigma$ is the standardized distance from the mean. Squaring $z$ makes the penalty symmetric about $\mu$; the factor of $-\tfrac{1}{2}$ sets the curvature so that $\sigma$ comes out as exactly the standard deviation.
The exponential maps that quadratic penalty to a positive value that decays rapidly in the tails, faster than any inverse power of $|x|$.
The normalizing constant $1/(\sigma\sqrt{2\pi})$ ensures the density integrates to 1 over the whole real line, which you can verify by substituting $t = (x-\mu)/(\sigma\sqrt{2})$ and using the Gaussian integral $\int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi}$.
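To make the inside-out reading concrete, here is a minimal sketch that builds the density in the same three steps and checks numerically that it integrates to 1. The function names, the trapezoid-rule integrator, and the values $\mu = 1.5$, $\sigma = 2$ are illustrative choices, not anything prescribed above; the comparison uses `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, assembled from the inside out."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma                                    # standardized distance
    penalty = -0.5 * z * z                                  # quadratic penalty
    norm_const = 1.0 / (sigma * math.sqrt(2.0 * math.pi))   # normalizing constant
    return norm_const * math.exp(penalty)


def integrate_pdf(mu: float, sigma: float,
                  half_width: float = 10.0, steps: int = 20_000) -> float:
    """Trapezoid-rule integral of the density over mu +/- half_width * sigma."""
    lo = mu - half_width * sigma
    hi = mu + half_width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h


# Illustrative parameters (assumed for this example only).
area = integrate_pdf(mu=1.5, sigma=2.0)
reference = NormalDist(mu=1.5, sigma=2.0).pdf(0.3)

print(f"area under the curve: {area:.6f}")
print(f"pdf(0.3): ours={normal_pdf(0.3, 1.5, 2.0):.6f}, stdlib={reference:.6f}")
```

The integration range is cut off at $\pm 10\sigma$, which is an approximation to the whole real line; the mass outside that range is far below floating-point noise for this purpose.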

The two parameters

Location $\mu$ shifts the entire curve left or right without changing its shape. It is simultaneously the mean, median, and mode of the distribution.

Scale $\sigma$ controls the spread. A smaller $\sigma$ produces a tall, narrow peak concentrated near $\mu$; a larger $\sigma$ spreads probability mass into the tails and flattens the peak. Specifically:

About 68% of the probability lies within $\pm 1\sigma$ of $\mu$.
About 95% lies within $\pm 2\sigma$.
About 99.7% lies within $\pm 3\sigma$.

This “68–95–99.7 rule” is one of the most useful rules of thumb in statistics and arises directly from definite integrals of $f$.
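The three percentages can be reproduced in a few lines. This sketch relies on the identity $P(|X - \mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$, which follows from the symmetry of the density; the helper name `prob_within` is made up for the example.

```python
import math


def prob_within(k: float) -> float:
    """P(|X - mu| <= k * sigma) for any normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {prob_within(k):.4f}")
# within ±1σ: 0.6827
# within ±2σ: 0.9545
# within ±3σ: 0.9973
```

Note that the result does not depend on $\mu$ or $\sigma$ at all, which is exactly why the rule works for every normal distribution.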

Why does it appear everywhere?

The Central Limit Theorem (CLT) is the primary reason. It says that the suitably standardized sum (or average) of a large number of independent, identically distributed random variables converges in distribution to a normal, whatever their individual distribution, as long as it has finite variance. Concretely, if $X_1, X_2, \ldots, X_n$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, then

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty$$

This explains why measurement errors, sample means, and aggregated signals in nature tend to look Gaussian: they are sums of many small, independent contributions.
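A quick simulation makes the theorem tangible. The sketch below standardizes sample means of a strongly skewed distribution and compares how often they land within $\pm 1$ and $\pm 2$ against the standard normal values. The choice of Exponential(1) draws, $n = 100$, 10,000 replications, and the fixed seed are all assumptions made for illustration.

```python
import math
import random


def standardized_mean(rng: random.Random, n: int) -> float:
    """Draw n Exponential(1) values and return (mean - mu) / (sigma / sqrt(n))."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (sample_mean - mu) / (sigma / math.sqrt(n))


rng = random.Random(0)  # fixed seed so the run is repeatable
n, replications = 100, 10_000
zs = [standardized_mean(rng, n) for _ in range(replications)]

# Share of standardized means inside +/-1 and +/-2.
frac_1 = sum(abs(z) <= 1.0 for z in zs) / replications
frac_2 = sum(abs(z) <= 2.0 for z in zs) / replications

print(f"share within ±1: {frac_1:.3f}  (standard normal: 0.683)")
print(f"share within ±2: {frac_2:.3f}  (standard normal: 0.954)")
```

Even though a single exponential draw looks nothing like a bell curve, the standardized means should land close to the normal benchmarks. Shrinking `n` to something like 3 is a good way to see the approximation degrade.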

Beyond the CLT, the normal distribution is also the maximum-entropy distribution for a given mean and variance. In Bayesian modeling it appears naturally as a prior when all you know about a parameter is its scale, and as the limiting form of many likelihoods. In the projects I’ve worked on — from multivariate probit models for ML headcount estimation to Bayesian optimization of plant photosynthesis — the Gaussian is almost always either the model itself or the approximate shape of a posterior.
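The maximum-entropy claim can be spot-checked against two other familiar distributions. This sketch uses the standard closed-form differential entropies (in nats) of the normal, Laplace, and uniform distributions, each parameterized to share the same variance; picking these two competitors and $\sigma = 1$ is an illustrative assumption, and a comparison like this is a sanity check rather than a proof.

```python
import math

sigma = 1.0  # common standard deviation for all three distributions

# Differential entropies in nats, each distribution scaled to variance sigma^2.
h_normal = 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)
h_laplace = 1.0 + math.log(2.0 * sigma / math.sqrt(2.0))  # Laplace scale b = sigma / sqrt(2)
h_uniform = math.log(sigma * math.sqrt(12.0))             # uniform of width sigma * sqrt(12)

print(f"normal : {h_normal:.4f}")
print(f"laplace: {h_laplace:.4f}")
print(f"uniform: {h_uniform:.4f}")
```

With the variance held fixed, the normal comes out on top, ahead of the Laplace and then the uniform.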

Explore it yourself

Use the sliders below to move $\mu$ and $\sigma$ and watch the density curve update in real time. Notice how the peak height always equals $1/(\sigma\sqrt{2\pi})$, so as $\sigma$ grows the curve must flatten to keep the total area equal to 1.

[Interactive widget: Gaussian PDF. Sliders set the mean μ and the standard deviation σ, and the y-axis rescales so the peak of the density matches the current σ. At the displayed settings, μ = 0.00 and σ = 1.00, the peak density is f(μ) = 0.3989.]
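If you are reading this somewhere the sliders do not render, the peak-height relationship is easy to tabulate by hand. This is a small non-interactive stand-in, with an arbitrary set of $\sigma$ values chosen for illustration.

```python
import math


def peak_height(sigma: float) -> float:
    """Value of the normal density at its mode: f(mu) = 1 / (sigma * sqrt(2 * pi))."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))


for sigma in (0.5, 1.0, 2.0, 4.0):
    print(f"sigma = {sigma}: peak = {peak_height(sigma):.4f}")
# sigma = 0.5: peak = 0.7979
# sigma = 1.0: peak = 0.3989
# sigma = 2.0: peak = 0.1995
# sigma = 4.0: peak = 0.0997
```

Doubling $\sigma$ halves the peak, and $\mu$ never enters the calculation: shifting the curve does not change its height.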

The standard normal and the CDF

The special case $\mu = 0, \sigma = 1$ is called the standard normal, often denoted $Z \sim \mathcal{N}(0,1)$, and its density is written $\phi(z)$. Any normal random variable can be standardized: if $X \sim \mathcal{N}(\mu, \sigma^2)$ then $Z = (X-\mu)/\sigma \sim \mathcal{N}(0,1)$.

The cumulative distribution function (CDF) is

$$\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$

where $\operatorname{erf}$ is the error function, which is itself defined by an integral. Unlike the density, the CDF has no closed form in terms of elementary functions, which is why we rely on numerical approximations (or tabulated values). In Python, scipy.stats.norm handles both $\phi$ and $\Phi$ efficiently (as norm.pdf and norm.cdf) and is what I reach for in practice.
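Standardization and the erf formula combine into a short recipe for interval probabilities. The sketch below sticks to the Python standard library (`math.erf`, with `statistics.NormalDist` as a cross-check) so it runs without extra dependencies; `scipy.stats.norm.cdf` should give the same numbers. The distribution $\mathcal{N}(100, 15^2)$ and the interval endpoints are hypothetical values picked for the example.

```python
import math
from statistics import NormalDist


def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X <= b) for X ~ N(mu, sigma^2), computed by standardizing both endpoints."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return std_normal_cdf(z_b) - std_normal_cdf(z_a)


# Hypothetical example: X ~ N(100, 15^2), probability that X falls in (85, 130].
p = normal_interval_prob(85.0, 130.0, mu=100.0, sigma=15.0)

# Cross-check against the standard library's own normal CDF.
dist = NormalDist(mu=100.0, sigma=15.0)
check = dist.cdf(130.0) - dist.cdf(85.0)

print(f"erf-based: {p:.6f}")
print(f"NormalDist: {check:.6f}")
```

Here the endpoints standardize to $z = -1$ and $z = 2$, so the answer is $\Phi(2) - \Phi(-1)$, roughly 0.8186.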

A note on tails

One thing worth emphasizing: the normal distribution has light tails. The probability of an outcome more than $5\sigma$ from the mean, in either direction, is about $5.7 \times 10^{-7}$. In many real systems (financial returns, extreme weather events, model errors from specification failures) the actual tails are much heavier. When the consequences of tail events are large, reaching for a Student-$t$, a Laplace, or a full Bayesian posterior predictive check is usually the right move rather than defaulting to Gaussian assumptions.
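To get a feel for how much the tail assumption matters, this sketch compares the two-sided probability of landing more than $k$ standard deviations from the mean under a normal and under a Laplace with the same standard deviation. The Laplace is used here only because its tail has a simple closed form; the function names are made up for the example.

```python
import math


def normal_two_sided_tail(k: float) -> float:
    """P(|X - mu| > k * sigma) for a normal X; equals erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))


def laplace_two_sided_tail(k: float) -> float:
    """P(|X - mu| > k * sigma) for a Laplace X with standard deviation sigma.

    A Laplace with scale b has variance 2 * b^2, so b = sigma / sqrt(2)
    and the tail probability exp(-k * sigma / b) reduces to exp(-k * sqrt(2)).
    """
    return math.exp(-k * math.sqrt(2.0))


for k in (3, 5):
    p_norm = normal_two_sided_tail(k)
    p_lap = laplace_two_sided_tail(k)
    print(f"{k}σ: normal={p_norm:.3e}  laplace={p_lap:.3e}  ratio={p_lap / p_norm:.0f}")
```

At $5\sigma$ the Laplace tail is more than a thousand times heavier than the Gaussian one, even though both distributions have identical mean and variance.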