Overview
When we summarise a dataset we typically want two complementary pieces of information: where values tend to cluster (a measure of location) and how they are spread around that centre (a measure of dispersion). Together these provide a compact but informative portrait of a distribution.
The choice of specific summaries is guided by the data's measurement scale, presence of outliers, and the goals of the analysis. For example, parametric modelling often uses mean and standard deviation because they fit naturally into likelihoods and least-squares; robust reporting in applied work often pairs median with IQR or MAD to protect against outliers. The sections that follow explain common measures, highlight their strengths and weaknesses, and show how they relate in practice.
Measures of Location
Measures of location aim to represent a "typical" value from the data. Below are the most frequently used estimators, ordered from the most common (mean) to alternatives used when particular data characteristics or interpretability concerns matter.
Arithmetic mean (sample)
The sample mean is the arithmetic centre of the data and has useful optimality properties in many inferential settings. Concretely, it minimizes the sum of squared deviations from the centre:
\[ \bar{x}=\arg\min_{\mu}\sum_{i=1}^{n}(x_i-\mu)^2=\frac{1}{n}\sum_{i=1}^{n}x_i. \]
Under classical iid sampling with finite variance the sample mean is consistent and asymptotically normal (Central Limit Theorem). This makes it easy to build confidence intervals and hypothesis tests. However, because the mean gives equal weight to every observation and squares deviations in variance computations, it is sensitive to outliers and heavy tails.
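The minimization claim is easy to verify numerically. A minimal sketch using Python's standard library (the sample below is hypothetical):

```python
import statistics

# Hypothetical sample with one extreme value.
data = [10, 12, 11, 13, 100]
xbar = statistics.mean(data)  # 29.2

def sse(c):
    """Sum of squared deviations about a candidate centre c."""
    return sum((x - c) ** 2 for x in data)

# The mean attains the smallest sum of squared deviations among these candidates.
assert all(sse(xbar) <= sse(c) for c in (statistics.median(data), min(data), max(data)))
```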
Median
The median is the central quantile and minimizes the sum of absolute deviations: \(\tilde{x}=\arg\min_\mu\sum_i |x_i-\mu|\). This gives the median a bounded influence — a single extreme value cannot drag it arbitrarily far — and a high breakdown point (~50%), which makes it a preferred summary for skewed or contaminated data.
The trade-off is efficiency: when the underlying distribution is Gaussian the mean has lower asymptotic variance than the median (the asymptotic relative efficiency of the median is ≈0.64 under normality). Practically, we often report both mean and median to make the presence of skewness or outliers visible.
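The analogous property of the median (minimizing absolute deviations) can be checked by a grid sweep; the data here are hypothetical:

```python
import statistics

data = [10, 12, 11, 13, 100]  # hypothetical sample
med = statistics.median(data)  # 12

def sad(c):
    """Sum of absolute deviations about a candidate centre c."""
    return sum(abs(x - c) for x in data)

# Sweep a grid of candidate centres: none improves on the median.
grid = [k / 10 for k in range(100, 1001)]  # 10.0 .. 100.0
best = min(grid, key=sad)
```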
Mode
The mode identifies the most frequent value (discrete data) or the point of highest density (continuous data). It is particularly useful for categorical outcomes or multimodal distributions where a single central tendency measure (mean/median) hides important structure. For continuous variables the mode is usually estimated via density methods (histogram peaks, kernel density estimates) and can be unstable in small samples.
Other location estimators
Several alternatives exist when your problem requires them:
- Geometric mean: appropriate for positive-valued, multiplicative processes (e.g., long-run growth rates). It collapses to zero if any value is zero and is undefined for negatives.
- Harmonic mean: useful when averaging rates over a fixed denominator (e.g., average speed over equal distances).
- RMS: the root-mean-square reflects magnitude and energy; common in engineering and signal processing.
- Trimmed / Winsorized means: reduce the effect of extremes by removing or capping tails; they provide a middle ground between full efficiency and robustness.
- M-estimators (e.g., Huber): solutions of an estimating equation with a bounded score function \(\psi\), which lets you tune the robustness/efficiency trade-off explicitly:
\[ \sum_{i=1}^n \psi\!\left(\frac{x_i-\hat\theta}{s}\right)=0. \]
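One common way to solve this equation is iteratively reweighted averaging. The sketch below is illustrative, not a standard API: the function name, the fixed scaled-MAD scale, and the tuning constant \(c=1.345\) are choices made here.

```python
import statistics

def huber_location(data, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means.
    Scale is held fixed at the Gaussian-consistent scaled MAD."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    s = 1.4826 * mad if mad > 0 else 1.0
    theta = med
    for _ in range(max_iter):
        # Huber weights: 1 inside the threshold c*s, c*s/|residual| outside.
        weights = [min(1.0, c * s / abs(x - theta)) if x != theta else 1.0
                   for x in data]
        new_theta = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_theta - theta) < tol:
            break
        theta = new_theta
    return theta

# The estimate stays near the median despite the outlier at 100.
est = huber_location([10, 12, 11, 13, 100])
```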
How to choose a location measure
Rather than a single rule, choose a location summary guided by the data and the inferential goal:
- Use mean ± standard deviation for roughly symmetric data and when you plan parametric inference (t-tests, linear models).
- Prefer median + IQR (or MAD) when distributions are skewed or contain outliers; these are robust alternatives that communicate centrality without being distorted by extremes.
- For multiplicative phenomena (returns, compound growth) report the geometric mean, and consider showing both arithmetic and geometric means to clarify interpretation.
- When in doubt, report multiple summaries (classical and robust) and use graphical displays (boxplots, density plots) to complement the numbers.
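The last recommendation, reporting several summaries side by side, takes only a few lines with Python's standard library (the data below are hypothetical):

```python
import statistics

data = [10, 12, 11, 13, 100]  # hypothetical positive-valued sample
report = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "geometric_mean": statistics.geometric_mean(data),
}
# A large gap between the mean and the median flags skewness or outliers.
```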
Measures of Dispersion
Measures of dispersion quantify how widely values scatter around the chosen centre. They answer complementary questions to location measures: are values tightly clustered, moderately spread, or strongly dispersed with heavy tails? Below are common choices and when they are informative.
Range
The range is the simplest measure and useful for bounding values (e.g., specification limits), but because it depends only on the two most extreme observations it is highly sensitive to outliers and typically not used as the sole measure of spread.
Variance and standard deviation
Population variance (when the entire population of size \(N\) is observed):
\[ \sigma^2=\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2. \]
Sample variance with Bessel's correction (dividing by \(n-1\) to remove bias):
\[ s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2. \]
Standard deviation (\(s=\sqrt{s^2}\)) is the most common measure of spread in inferential statistics. It plays a central role because many models assume normally distributed errors, for which the standard deviation fully characterises dispersion. Its downside is sensitivity to outliers: squaring deviations gives high weight to extreme values and can inflate variance estimates under heavy tails.
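The two denominators can be compared directly; this sketch on hypothetical data checks a hand computation against the standard library:

```python
import statistics

data = [10, 12, 11, 13, 100]  # hypothetical sample
n = len(data)
xbar = sum(data) / n

pop_var = sum((x - xbar) ** 2 for x in data) / n         # divide by N
samp_var = sum((x - xbar) ** 2 for x in data) / (n - 1)  # Bessel: divide by n-1

# The stdlib implements both conventions.
assert abs(pop_var - statistics.pvariance(data)) < 1e-9
assert abs(samp_var - statistics.variance(data)) < 1e-9
```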
Interquartile range (IQR)
The IQR measures the spread of the middle 50% of observations (\(\mathrm{IQR}=Q_3-Q_1\)) and is robust to extremes. It is often paired with the median to describe the typical range where most observations lie. For a normal distribution the relationship to \(\sigma\) is approximately
\[ \mathrm{IQR}\approx 1.349\,\sigma, \]
which is useful if you need an approximate conversion under normality assumptions.
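A quick simulation illustrates the conversion; the sample is synthetic Gaussian noise with known \(\sigma = 2\):

```python
import random
import statistics

random.seed(0)
sample = [random.gauss(0, 2) for _ in range(100_000)]  # true sigma = 2

q1, _q2, q3 = statistics.quantiles(sample, n=4)  # three quartile cut points
iqr = q3 - q1
sigma_hat = iqr / 1.349  # normal-theory conversion: IQR ≈ 1.349 * sigma
```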
Median absolute deviation (MAD)
The MAD is the median of the absolute deviations from the median, \(\mathrm{MAD}=\mathrm{median}_i\,|x_i-\tilde{x}|\). It has a high breakdown point (~50%) and, when scaled by about 1.4826, provides a consistent estimator of \(\sigma\) under Gaussian assumptions. Because the MAD is built from medians rather than squared deviations, it is far less sensitive to extreme observations than the standard deviation.
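The scaling can be checked on synthetic Gaussian data with known \(\sigma = 3\):

```python
import random
import statistics

random.seed(1)
sample = [random.gauss(5, 3) for _ in range(100_000)]  # true sigma = 3

med = statistics.median(sample)
mad = statistics.median([abs(x - med) for x in sample])
sigma_hat = 1.4826 * mad  # Gaussian-consistent rescaling
```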
Coefficient of variation (CV)
The CV is the standard deviation divided by the mean, \(\mathrm{CV}=s/\bar{x}\). Being unitless, it is useful for comparing relative dispersion across variables measured on different scales. Use it with caution when the mean is close to zero (it becomes unstable) or when outliers distort \(s\).
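A short sketch with hypothetical measurements on two different scales shows the cross-scale comparison the CV enables:

```python
import statistics

# Hypothetical measurements on incompatible scales.
heights_cm = [170, 165, 180, 175, 168]
weights_kg = [70, 62, 85, 78, 66]

def cv(data):
    """Coefficient of variation: s / mean (assumes a positive mean)."""
    return statistics.stdev(data) / statistics.mean(data)

# Despite the smaller raw spread in kg, weights vary more relative to their mean.
```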
Robust and tail-focused measures
When tail behaviour matters — for example in finance or risk management — classical variance may understate risk. Consider:
- Trimmed / Winsorized variance: trim or cap extremes before computing spread to reduce influence of outliers while retaining most data.
- Quantile spreads: reporting several quantiles (e.g., 10th, 25th, 50th, 75th, 90th) communicates tail shape directly.
- Tail risk measures: Value-at-Risk (VaR) and Expected Shortfall (ES) focus on extreme losses and are used where tail events dominate decision-making.
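Empirical versions of VaR and ES are simple order-statistic computations. A minimal sketch on a synthetic loss distribution (standard normal, positive values read as losses):

```python
import random
import statistics

random.seed(2)
losses = [random.gauss(0, 1) for _ in range(100_000)]  # hypothetical losses

alpha = 0.95
ordered = sorted(losses)
k = int(alpha * len(ordered))
var_95 = ordered[k]                   # empirical 95% Value-at-Risk
es_95 = statistics.mean(ordered[k:])  # Expected Shortfall: mean loss beyond VaR
```

For a standard normal the theoretical values are roughly 1.645 (VaR) and 2.063 (ES), and ES always exceeds VaR because it averages over the tail.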
How to choose a dispersion measure
- For parametric inference and symmetric errors: use standard deviation.
- For robustness and skewed data: prefer IQR or MAD, and consider reporting both classical and robust measures.
- To compare scale across groups with different means: consider CV, but check mean stability first.
Theoretical properties
A few theoretical concepts explain why different estimators behave as they do and how location and dispersion are connected in practice.
- Breakdown point: the largest fraction of contaminated data that can make an estimator arbitrarily bad. Median and MAD have high breakdown (~50%), whereas mean and variance have breakdown near 0% (a single extreme point can dominate).
- Influence function: captures local sensitivity to contamination. Bounded influence implies robustness — the median and MAD are bounded, the mean is not.
- Asymptotic variance / efficiency: under normality the mean is optimal in the class of unbiased location estimators. The median trades some efficiency for robustness (ARE ≈ 0.64 under normality).
- Consistency: classical estimators (mean, median, variance) converge to the true parameter under mild regularity conditions, enabling reliable large-sample inference when assumptions hold.
In practice these properties guide reporting choices: if you expect contamination or heavy tails, prefer robust summaries; if you rely on parametric models, the mean and SD integrate more naturally into estimation and hypothesis testing.
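A quick simulation makes the breakdown contrast concrete; the clean data are a hypothetical uniform grid:

```python
import statistics

clean = list(range(1, 100))      # 1..99; mean = median = 50
contaminated = clean + [10_000]  # a single wild observation

# One point out of 100 moves the mean by ~100 units but the median by 0.5.
mean_shift = statistics.mean(contaminated) - statistics.mean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)
```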
Worked numerical examples and interpretation
Example 1 — Outlier effect
Consider the sample \(\{10,\ 12,\ 11,\ 13,\ 100\}\), in which 100 is an extreme value. Compute the main summaries to see how a single outlier affects different statistics:
- \(\text{mean} = \dfrac{10+12+11+13+100}{5} = 29.2\).
- \(\text{median} = 12.\)
- \(\mathrm{IQR}: Q_1=11,\ Q_3=13 \Rightarrow \mathrm{IQR}=2\) (inclusive quartile convention).
- \(\mathrm{MAD}=\mathrm{median}(|x_i-12|)=1\) (scaled MAD \(=1.4826\times 1\approx 1.48\)).
Interpretation: the mean and variance are pulled upward by the 100, overstating a typical value and spread. Median, IQR and MAD remain stable, so reporting them in addition to the mean reveals the distortion and aids transparent interpretation.
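These figures can be reproduced with Python's standard library; the inclusive quartile convention matches the quartiles above:

```python
import statistics

data = [10, 12, 11, 13, 100]
mean = statistics.mean(data)          # 29.2
med = statistics.median(data)         # 12
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                         # 13 - 11 = 2
mad = statistics.median([abs(x - med) for x in data])  # 1
scaled_mad = 1.4826 * mad
```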
Example 2 — Averaging rates (harmonic mean)
Averaging speeds over equal distances requires the harmonic mean; the arithmetic mean would give the wrong overall average. This highlights how the underlying data-generating mechanism (additive vs multiplicative) dictates the appropriate summary.
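A concrete (hypothetical) trip makes this tangible: 60 km at 30 km/h, then 60 km at 60 km/h.

```python
import statistics

# 60 km at 30 km/h (2 h) plus 60 km at 60 km/h (1 h): 120 km in 3 h.
avg_speed = statistics.harmonic_mean([30, 60])  # 40 km/h
naive_avg = statistics.mean([30, 60])           # 45 km/h, wrong for this question
```

The harmonic mean agrees with first principles (total distance over total time, 120/3 = 40), while the arithmetic mean overstates the overall speed.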
Example 3 — Compounded returns (geometric mean)
For sequences of multiplicative growth (returns), the geometric mean captures the correct long-run average growth rate. When communicating investment performance, pairing arithmetic and geometric means clarifies short-term vs long-term perspectives.
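A classic (hypothetical) example: a +50% year followed by a -50% year leaves the arithmetic mean at 1.0 while the investment has actually lost a quarter of its value.

```python
import statistics

# Hypothetical yearly growth factors: +50% one year, -50% the next.
factors = [1.5, 0.5]
arith = statistics.mean(factors)           # 1.0, suggests "flat"
geom = statistics.geometric_mean(factors)  # sqrt(0.75) ≈ 0.866 per period
# Two periods at the geometric rate recover the true outcome: geom**2 ≈ 0.75.
```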
Conclusion
Measures of location and dispersion are complementary: one describes the centre, the other describes spread. There is no single "best" measure — the right choice depends on data scale, presence of outliers, and the question at hand. Good practice is to pair a location measure with a corresponding dispersion measure (e.g., mean ± SD, median with IQR), to show both classical and robust summaries when appropriate, and to accompany numbers with simple visualisations (histogram, boxplot, density) so readers can immediately see shape and tail behaviour.
References & further reading
- Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley.
- Huber, P. J., & Ronchetti, E. M. (2009). Robust Statistics (2nd ed.). Wiley.
- Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
- Wilcox, R. R. (2012). Modern Statistics for the Social and Behavioral Sciences. CRC Press.
- DeGroot, M. H., & Schervish, M. J. (2012). Probability and Statistics. Pearson.