
AI-generated Drafting and editing for this post included AI assistance.

math · statistics · data-science · interactive 12 Apr 2026 4 min read

The Normal Distribution — An Interactive Introduction


The normal distribution shows up constantly in statistics and probabilistic modeling. Bayesian models, bootstraps, measurement uncertainty — the Gaussian is involved in most of it. This post walks through the key ideas and gives you an interactive widget to build intuition.

The density function

A continuous random variable $X$ is said to follow a normal distribution with mean $\mu$ and standard deviation $\sigma > 0$, written $X \sim \mathcal{N}(\mu, \sigma^2)$, if its probability density function is

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$$

The formula has a clean structure once you read it from the inside out:

  1. The exponent $-\tfrac{1}{2}z^2$, where $z = (x-\mu)/\sigma$ is the standardized distance from the mean. Squaring $z$ makes the penalty symmetric; the factor of $-\tfrac{1}{2}$ sets the curvature.
  2. The exponential maps that quadratic penalty to a positive value that decays rapidly in the tails, faster than any inverse power of $|x|$.
  3. The normalizing constant $1/(\sigma\sqrt{2\pi})$ ensures the density integrates to 1 over the whole real line. You can verify this by substituting $t = (x-\mu)/(\sigma\sqrt{2})$ and using the Gaussian integral $\int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi}$.
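
The three steps above translate almost line for line into code. The following is a minimal sketch using only the Python standard library; the helper names `normal_pdf` and `integrate_pdf` are illustrative and not from any library. It evaluates the density from the formula and checks numerically that the area under the curve is close to 1.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, built inside-out like the formula."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma                 # step 1: standardized distance from the mean
    penalty = -0.5 * z * z               # step 1: symmetric quadratic penalty
    unnormalized = math.exp(penalty)     # step 2: map the penalty to a positive value
    return unnormalized / (sigma * math.sqrt(2.0 * math.pi))  # step 3: normalize


def integrate_pdf(mu: float, sigma: float,
                  half_width: float = 10.0, steps: int = 20_000) -> float:
    """Trapezoidal rule over [mu - half_width*sigma, mu + half_width*sigma]."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h


if __name__ == "__main__":
    print(normal_pdf(0.0))           # peak of the standard normal
    print(integrate_pdf(0.5, 2.0))   # should be very close to 1
```

The integration range of ten standard deviations on each side is an arbitrary choice; the mass outside it is far below floating-point precision.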

The two parameters

Location $\mu$ shifts the entire curve left or right without changing its shape. It is simultaneously the mean, median, and mode of the distribution.

Scale $\sigma$ controls the spread. A smaller $\sigma$ produces a tall, narrow peak concentrated near $\mu$; a larger $\sigma$ spreads probability mass into the tails and flattens the peak. Specifically, about 68% of the probability mass lies within $1\sigma$ of $\mu$, about 95% within $2\sigma$, and about 99.7% within $3\sigma$.

This “68–95–99.7 rule” is one of the most useful rules of thumb in statistics and arises directly from definite integrals of $f$.
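
The rule can be checked numerically in a few lines. This sketch uses `statistics.NormalDist` from the Python standard library; the helper `mass_within` is a hypothetical name introduced here for illustration. It also shows that the percentages do not depend on the particular mean and standard deviation.

```python
from statistics import NormalDist


def mass_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability that a N(mu, sigma^2) draw lands within k*sigma of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} sigma: {mass_within(k):.4f}")
    # within 1 sigma: 0.6827
    # within 2 sigma: 0.9545
    # within 3 sigma: 0.9973
```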

Why does it appear everywhere?

The Central Limit Theorem (CLT) is the primary reason. It says that the suitably standardized sum (or average) of a large number of independent, identically distributed random variables with finite variance converges in distribution to a normal, whatever the shape of the individual distribution. Concretely, if $X_1, X_2, \ldots, X_n$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, and $\bar{X}_n$ is their sample mean, then

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty$$

This explains why measurement errors, sample means, and aggregated signals in nature tend to look Gaussian: they are sums of many small, independent contributions.
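
A small simulation makes the theorem concrete. The sketch below is an illustration under stated assumptions, not a proof: it draws sample means from an Exponential(1) distribution (chosen here because it is clearly skewed and has mean 1 and standard deviation 1), standardizes them as in the formula above, and checks that the results look roughly standard normal. The function name, sample sizes, and seed are arbitrary choices.

```python
import math
import random
import statistics


def standardized_means(n: int, trials: int, seed: int = 0) -> list[float]:
    """Standardized sample means of n i.i.d. Exponential(1) draws, repeated `trials` times."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)
    results = []
    for _ in range(trials):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        results.append((xbar - mu) / (sigma / math.sqrt(n)))
    return results


z = standardized_means(n=100, trials=4_000)
z_mean = statistics.fmean(z)                          # should be near 0
z_std = statistics.pstdev(z)                          # should be near 1
z_inside = sum(abs(v) <= 1.0 for v in z) / len(z)     # should be near 0.68

if __name__ == "__main__":
    print(f"mean={z_mean:.3f}  std={z_std:.3f}  within 1 sigma={z_inside:.3f}")
```

Increasing `n` should bring the histogram of `z` closer to the standard normal; with small `n` the skew of the exponential is still visible.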

Beyond the CLT, the normal distribution is also the maximum-entropy distribution for a given mean and variance. In Bayesian modeling it appears naturally as a prior when all you know about a parameter is its scale, and as the limiting form of many likelihoods. In the projects I’ve worked on — from multivariate probit models for ML headcount estimation to Bayesian optimization of plant photosynthesis — the Gaussian is almost always either the model itself or the approximate shape of a posterior.

Explore it yourself

Use the sliders below to move $\mu$ and $\sigma$ and watch the density curve update in real time. Notice how the peak height always equals $1/(\sigma\sqrt{2\pi})$, so as $\sigma$ grows the curve must flatten to keep the total area equal to 1.

[Interactive widget: Gaussian PDF. Sliders set the mean (μ) and standard deviation (σ); the y-axis is scaled so the peak of the density matches the current σ. Defaults: Normal(μ = 0.00, σ = 1.00), peak density f(μ) = 0.3989.]
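
If the widget is not available, the relationship between the scale and the peak height is easy to tabulate. This is a minimal sketch; `peak_density` is a hypothetical helper name, and the three values of σ are arbitrary.

```python
import math


def peak_density(sigma: float) -> float:
    """Height of the normal density at its mean: 1 / (sigma * sqrt(2*pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        print(f"sigma={sigma}: peak={peak_density(sigma):.4f}")
    # sigma=0.5: peak=0.7979
    # sigma=1.0: peak=0.3989
    # sigma=2.0: peak=0.1995
```

Doubling σ halves the peak, which is exactly what keeps the total area at 1.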

The standard normal and the CDF

The special case $\mu = 0, \sigma = 1$ is called the standard normal, often denoted $Z \sim \mathcal{N}(0,1)$, and its density is written $\phi(z)$. Any normal random variable can be standardized: if $X \sim \mathcal{N}(\mu, \sigma^2)$ then $Z = (X-\mu)/\sigma \sim \mathcal{N}(0,1)$.

The cumulative distribution function (CDF) is

$$\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$

where $\operatorname{erf}$ is the error function. Unlike the density, the CDF has no closed form in terms of elementary functions, which is why we rely on numerical approximations (or tabulated values). In Python, scipy.stats.norm handles both $\phi$ and $\Phi$ efficiently and is what I reach for in practice.
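
The erf identity above can be used directly. The sketch below relies on `math.erf` from the standard library and standardizes first, so it works for any mean and standard deviation; `normal_cdf` is an illustrative name. With SciPy installed, `scipy.stats.norm.cdf` should give matching values.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of N(mu, sigma^2) at x via the error function."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma                       # standardize to N(0, 1)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    print(normal_cdf(0.0))                     # half the mass lies below the mean
    print(normal_cdf(1.96))                    # close to 0.975
    print(normal_cdf(13.0, mu=10.0, sigma=3.0))  # same as normal_cdf(1.0)
```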

A note on tails

One thing worth emphasizing: the normal distribution has light tails. The probability of an outcome more than $5\sigma$ from the mean (in either direction) is about $5.7 \times 10^{-7}$. In many real systems, such as financial returns, extreme weather events, and model errors from specification failures, the actual tails are much heavier. When the consequences of tail events are large, reaching for a Student-$t$ or Laplace model, or running a full Bayesian posterior predictive check, is usually a better move than defaulting to Gaussian assumptions.
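
To get a feel for how light the Gaussian tails are, the sketch below compares the two-sided tail probability of a normal with that of a Laplace distribution matched to the same standard deviation. The Laplace comparison and the function names are choices made here for illustration; the Laplace tail uses the closed form exp(-t/b) with scale b = σ/√2.

```python
import math


def normal_two_sided_tail(k: float) -> float:
    """P(|X - mu| > k*sigma) for a normal, via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


def laplace_two_sided_tail(k: float) -> float:
    """P(|X - mu| > k*sigma) for a Laplace with the same standard deviation sigma.

    A Laplace with scale b has variance 2*b**2, so b = sigma / sqrt(2) and the
    tail probability exp(-k*sigma / b) simplifies to exp(-k * sqrt(2)).
    """
    return math.exp(-k * math.sqrt(2.0))


if __name__ == "__main__":
    for k in (2, 3, 5):
        n_tail = normal_two_sided_tail(k)
        l_tail = laplace_two_sided_tail(k)
        print(f"k={k}: normal={n_tail:.3e}  laplace={l_tail:.3e}  ratio={l_tail / n_tail:.1f}")
```

At five standard deviations the Laplace tail is larger by roughly three orders of magnitude, which is the kind of gap that matters when rare events are costly.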