The normal distribution shows up constantly in statistics and probabilistic modeling. Bayesian models, bootstraps, measurement uncertainty — the Gaussian is involved in most of it. This post walks through the key ideas and gives you an interactive widget to build intuition.
The density function
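A continuous random variable $X$ is said to follow a normal distribution with mean $\mu$ and standard deviation $\sigma$, written $X \sim \mathcal{N}(\mu, \sigma^2)$, if its probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \qquad x \in \mathbb{R}.$$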
The formula has a clean structure once you read it from the inside out:
- The exponent $-\tfrac{1}{2}z^2$, where $z = (x - \mu)/\sigma$ is the standardized distance from the mean. Squaring makes the penalty symmetric; the factor of $\tfrac{1}{2}$ sets the curvature.
- The exponential maps that quadratic penalty to a positive value that decays rapidly in the tails — faster than any polynomial.
- The normalizing constant $1/(\sigma\sqrt{2\pi})$ ensures the density integrates to 1 over the whole real line, which you can verify using the Gaussian integral $\int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi}$.
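If you want a numerical sanity check of the formula and its normalizing constant, here is a minimal pure-Python sketch. The helper names `normal_pdf` and `integrate` are hypothetical, chosen for illustration. The integrals use a plain trapezoid rule over a finite window (±12 in the standardized variable), which assumes the mass beyond that window is negligible. For a Gaussian it is far below floating-point precision.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2), written out term by term."""
    z = (x - mu) / sigma                                 # standardized distance from the mean
    norm_const = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return norm_const * math.exp(-0.5 * z * z)           # exponent is -z^2 / 2


def integrate(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite trapezoid rule on [a, b] with n sub-intervals (hypothetical helper)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Gaussian integral: exp(-t^2) over the real line should give sqrt(pi).
gauss = integrate(lambda t: math.exp(-t * t), -12.0, 12.0)

# The density should integrate to 1 for any mu and sigma; truncate at mu +/- 12 sigma.
mu, sigma = 3.0, 0.5
area = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 12 * sigma, mu + 12 * sigma)

print(f"Gaussian integral ~ {gauss:.6f}  (sqrt(pi) = {math.sqrt(math.pi):.6f})")
print(f"area under N(3, 0.5^2) ~ {area:.6f}")
```

In practice you would call a library density rather than hand-rolling one. Writing it out once makes each piece of the formula visible.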
The two parameters
Location $\mu$ shifts the entire curve left or right without changing its shape. It is simultaneously the mean, median, and mode of the distribution.
Scale $\sigma$ controls the spread. A smaller $\sigma$ produces a tall, narrow peak concentrated near $\mu$; a larger $\sigma$ spreads probability mass into the tails and flattens the peak. Specifically:
- About 68% of the probability lies within $1\sigma$ of $\mu$.
- About 95% lies within $2\sigma$ of $\mu$.
- About 99.7% lies within $3\sigma$ of $\mu$.
This “68–95–99.7 rule” is one of the most useful rules of thumb in statistics and arises directly from definite integrals of $f$ over $[\mu - k\sigma, \mu + k\sigma]$ for $k = 1, 2, 3$.
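Those integrals have a compact closed form. After standardizing, $P(|X - \mu| < k\sigma) = 2\Phi(k) - 1 = \operatorname{erf}(k/\sqrt{2})$, using the error function from Python's standard library, so the rule can be reproduced in a few lines. The function name `prob_within` is a hypothetical choice for this sketch. The result does not depend on $\mu$ or $\sigma$, which is why the rule is stated in multiples of $\sigma$.

```python
import math


def prob_within(k: float) -> float:
    """P(|X - mu| < k * sigma) for any normal X.

    Standardizing gives P(-k < Z < k) = 2 * Phi(k) - 1 = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```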
Why does it appear everywhere?
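The Central Limit Theorem (CLT) is the primary reason. It says that the suitably centered and scaled sum (or average) of a large number of independent, identically distributed random variables converges in distribution to a normal, whatever their individual distribution, as long as its variance is finite. Concretely, if $X_1, X_2, \ldots, X_n$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, and $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, then
$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty.$$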
This explains why measurement errors, sample means, and aggregated signals in nature tend to look Gaussian: they are sums of many small, independent contributions.
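To watch the CLT at work, here is a small simulation sketch that standardizes sums of Uniform(0, 1) draws, a distribution that looks nothing like a bell curve. The uniform base distribution, $n = 30$, 20,000 trials, the fixed seed, and the helper name `standardized_sums` are all illustrative assumptions. Any base distribution with finite variance would do.

```python
import math
import random
import statistics


def standardized_sums(n: int, trials: int, seed: int = 0) -> list[float]:
    """Draw `trials` sums of n i.i.d. Uniform(0, 1) variables and standardize each.

    Uniform(0, 1) has mean 1/2 and variance 1/12, so the sum S_n has mean n/2 and
    variance n/12. The CLT says (S_n - n/2) / sqrt(n/12) is approximately N(0, 1).
    """
    rng = random.Random(seed)
    mean, sd = n / 2.0, math.sqrt(n / 12.0)
    return [(sum(rng.random() for _ in range(n)) - mean) / sd for _ in range(trials)]


zs = standardized_sums(n=30, trials=20_000)
frac_within_1 = sum(abs(z) < 1.0 for z in zs) / len(zs)

print(f"sample mean  ~ {statistics.fmean(zs):+.3f}   (target 0)")
print(f"sample stdev ~ {statistics.stdev(zs):.3f}    (target 1)")
print(f"P(|Z| < 1)   ~ {frac_within_1:.3f}    (normal: 0.683)")
```

Even with only 30 terms per sum, the fraction of standardized sums within one unit of zero lands close to the normal's 68%.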
Beyond the CLT, the normal distribution is also the maximum-entropy distribution on the real line for a given mean and variance: among all continuous densities with that mean and variance, it assumes the least additional structure. In Bayesian modeling it appears naturally as a prior when all you know about a parameter is its rough location and scale, and as the large-sample limiting form of many likelihoods. In the projects I’ve worked on, from multivariate probit models for ML headcount estimation to Bayesian optimization of plant photosynthesis, the Gaussian is almost always either the model itself or the approximate shape of a posterior.
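The maximum-entropy property can be illustrated (though not proved) by comparing closed-form differential entropies at a fixed variance. The sketch below fixes the variance at 1 and pits the normal against a Laplace and a uniform distribution with that same variance. The choice of these two competitors and the function names are assumptions made for illustration. The mean is irrelevant here, since shifting a distribution does not change its entropy.

```python
import math


def entropy_normal(var: float) -> float:
    # Normal: h = 0.5 * ln(2 * pi * e * sigma^2)
    return 0.5 * math.log(2 * math.pi * math.e * var)


def entropy_laplace(var: float) -> float:
    # Laplace with scale b: variance 2 * b^2, entropy 1 + ln(2 * b)
    b = math.sqrt(var / 2)
    return 1 + math.log(2 * b)


def entropy_uniform(var: float) -> float:
    # Uniform on an interval of width w: variance w^2 / 12, entropy ln(w)
    w = math.sqrt(12 * var)
    return math.log(w)


for name, fn in [("normal", entropy_normal), ("laplace", entropy_laplace), ("uniform", entropy_uniform)]:
    print(f"{name:8s} {fn(1.0):.4f} nats")
```

The normal comes out on top (about 1.419 nats, versus roughly 1.347 for the Laplace and 1.242 for the uniform), consistent with the general theorem.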
Explore it yourself
Use the sliders below to move $\mu$ and $\sigma$ and watch the density curve update in real time. Notice how the peak height always equals $1/(\sigma\sqrt{2\pi})$, so as $\sigma$ grows the curve must flatten to keep the total area equal to 1.
[Interactive widget: Gaussian PDF with sliders for the mean μ and standard deviation σ, plus a readout of the peak density f(μ) (0.3989 when σ = 1).]
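The peak-height relation described above reduces to one line of arithmetic. This sketch uses a hypothetical helper, `peak_density`, to evaluate $f(\mu) = 1/(\sigma\sqrt{2\pi})$ for a few values of $\sigma$. The product of peak height and $\sigma$ stays fixed at $1/\sqrt{2\pi} \approx 0.3989$.

```python
import math


def peak_density(sigma: float) -> float:
    """Height of the N(mu, sigma^2) density at x = mu; it does not depend on mu."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))


for sigma in (0.5, 1.0, 2.0):
    h = peak_density(sigma)
    # Doubling sigma halves the peak, so peak * sigma is constant at 1/sqrt(2*pi).
    print(f"sigma={sigma:<4} peak={h:.4f} peak*sigma={h * sigma:.4f}")
```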
The standard normal and the CDF
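The special case $\mu = 0$, $\sigma = 1$ is called the standard normal, often denoted $Z \sim \mathcal{N}(0, 1)$, and its density is written $\varphi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$. Any normal random variable can be standardized: if $X \sim \mathcal{N}(\mu, \sigma^2)$ then $Z = (X - \mu)/\sigma \sim \mathcal{N}(0, 1)$.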
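The cumulative distribution function (CDF) is
$$\Phi(z) = \int_{-\infty}^{z} \varphi(t)\,dt = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right],$$
where $\operatorname{erf}(u) = \frac{2}{\sqrt{\pi}} \int_0^u e^{-t^2}\,dt$ is the error function. Unlike the density, the CDF has no closed form in terms of elementary functions, which is why we rely on numerical approximations (or tabulated values). In Python, scipy.stats.norm handles both $\varphi$ and $\Phi$ efficiently and is what I reach for in practice.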
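Here is a sketch of the erf-based formula and the standardization trick working together. It sticks to the standard library so it runs without SciPy. It cross-checks against `statistics.NormalDist` (Python 3.8+). With SciPy installed, `scipy.stats.norm.cdf(x, loc=mu, scale=sigma)` plays the same role. The helper names `phi_cdf` and `normal_cdf` and the example parameters (mean 100, standard deviation 15) are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist


def phi_cdf(z: float) -> float:
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), by standardizing to Z = (X - mu) / sigma."""
    return phi_cdf((x - mu) / sigma)


# Cross-check against the standard library's NormalDist.
dist = NormalDist(mu=100.0, sigma=15.0)
for x in (70.0, 100.0, 130.0):
    print(f"x={x:5.1f}  erf-based={normal_cdf(x, 100.0, 15.0):.6f}  NormalDist={dist.cdf(x):.6f}")
```

Only $\Phi$ is ever needed: any normal probability is a difference of two standard-normal CDF values after standardizing.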
A note on tails
One thing worth emphasizing: the normal distribution has light tails. The probability of an outcome more than $k\sigma$ from the mean shrinks on the order of $e^{-k^2/2}$, faster than exponentially in $k$. In many real systems (financial returns, extreme weather events, errors from misspecified models) the actual tails are much heavier. When the consequences of tail events are large, reaching for a Student-$t$ or Laplace distribution, or running a full Bayesian posterior predictive check, is usually the right move rather than defaulting to Gaussian assumptions.
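To put numbers on "light tails," the sketch below compares two-sided tail probabilities $P(|X - \mu| > k\sigma)$ for a normal and for a Laplace distribution matched to the same standard deviation. The Laplace is used only because its tail has a closed form in the standard library. A Student-$t$ comparison would need something like SciPy's `scipy.stats.t`. The function names are hypothetical.

```python
import math


def normal_two_sided_tail(k: float) -> float:
    """P(|X - mu| > k * sigma) for a normal X, i.e. erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))


def laplace_two_sided_tail(k: float) -> float:
    """P(|X - mu| > k * sigma) for a Laplace X with the same standard deviation sigma.

    A Laplace with scale b has sd = sqrt(2) * b and P(|X - mu| > t) = exp(-t / b),
    so with t = k * sigma the tail probability is exp(-k * sqrt(2)).
    """
    return math.exp(-k * math.sqrt(2.0))


for k in (2, 3, 4, 5, 6):
    n_tail, l_tail = normal_two_sided_tail(k), laplace_two_sided_tail(k)
    print(f"k={k}: normal={n_tail:.2e}  laplace={l_tail:.2e}  ratio={l_tail / n_tail:,.0f}x")
```

At $5\sigma$ the normal assigns about $5.7 \times 10^{-7}$, while the variance-matched Laplace assigns over a thousand times more. If your data has Laplace-like tails, a Gaussian model will badly understate extreme events.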