Basic Statistics & Essential Tools

Essential Statistical Tools

Core ideas from Basic Statistics and Essential Statistical Tools — mean, spread, correlation, and regression — in a clear, playable way for trading and risk.

1 Basic statistics

To describe a set of numbers (e.g. daily prices or returns), we use a few key measures:

Mean: μ = (x₁ + x₂ + … + xₙ) / n
Std dev: σ = √( Σ(xᵢ − μ)² / n ) (population form; for a sample estimate, divide by n − 1 instead of n)
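The two formulas above can be sketched in a few lines of Python. The data are made-up illustrative numbers, not real market returns, and the function names are hypothetical.

```python
import math

def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def population_std(values):
    """Population standard deviation: divides by n, as in the formula above."""
    mu = mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

# Hypothetical daily returns in percent (illustrative numbers only).
returns = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = mean(returns)
sigma = population_std(returns)
print(mu, sigma)  # 5.0 2.0
```

Python's standard library offers the same calculations as statistics.mean and statistics.pstdev; statistics.stdev is the sample version that divides by n − 1.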

Summary of descriptive statistics

Statistic | Measures | Use in trading / risk
Mean | Central tendency | Expected return, fair value
Median | Central tendency (robust) | Less distorted by outliers than mean
Std dev | Spread / volatility | VaR, option pricing, risk
Skewness | Asymmetry | Tail risk direction (left vs right)
Kurtosis | Tail heaviness | Fat tails, extreme-event likelihood
Percentiles | Quantiles | VaR (e.g. 95th, 99th percentile of loss)
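As a rough sketch of the remaining rows in the table, the snippet below computes the median, skewness, kurtosis, and a percentile-based loss figure. The returns are invented and include one large loss on purpose. The moment-based definitions (skewness = m₃ / m₂^1.5, kurtosis = m₄ / m₂², where a normal distribution has kurtosis 3) are one common convention, and the "inclusive" interpolation method is one of several ways to define a percentile.

```python
import statistics

def central_moment(values, k):
    """k-th central moment, dividing by n (population convention)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: negative means a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis (not excess): a normal distribution gives 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

# Hypothetical daily returns in percent, with one large loss as an outlier.
returns = [-6.0, -1.0, -0.5, 0.0, 0.2, 0.5, 0.8, 1.0, 1.2, 1.5]

mean_ret = statistics.mean(returns)
median_ret = statistics.median(returns)
skew = skewness(returns)
kurt = kurtosis(returns)

# Historical-style VaR sketch: the 95th percentile of the loss distribution.
losses = [-r for r in returns]
var_95 = statistics.quantiles(losses, n=100, method="inclusive")[94]

print(f"mean={mean_ret:.2f} median={median_ret:.2f}")
print(f"skewness={skew:.2f} kurtosis={kurt:.2f} VaR95={var_95:.2f}")
```

On this sample the single large loss pulls the mean below the median, makes the skewness negative, and pushes the kurtosis above 3, which matches the roles listed in the table.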

2 Distributions

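A distribution describes how values are spread and how likely each range of values is. The normal (Gaussian) distribution is the usual starting point in finance: it is symmetric, and many returns and price changes cluster around the mean. Real market returns often show fatter tails than the normal, which is what kurtosis measures.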

For a normal distribution, about 68% of values fall within 1 standard deviation (σ) of the mean, about 95% within 2σ, and about 99.7% within 3σ. Many risk and option-pricing models build on this rule.
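The 68 / 95 / 99.7 figures can be checked directly from the normal cumulative distribution function. This is a small sketch using Python's statistics.NormalDist; the helper name prob_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sigma: {p:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973

# One-sided 95% quantile, the multiplier often used in normal-based VaR sketches.
z_95 = standard_normal.inv_cdf(0.95)
print(f"z_95 = {z_95:.3f}")  # z_95 = 1.645
```

These percentages hold only under the normal assumption; a fat-tailed return series will put more probability beyond 3σ than the 0.3% implied here.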

[Figure: normal distribution bell curve centered on μ, x-axis marked at −3σ, −2σ, −1σ, μ, +1σ, +2σ, +3σ]

3 Correlation

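Correlation (Pearson's r) measures how strongly two variables move together in a straight-line sense, from −1 (perfect opposite direction) to +1 (perfect same direction). Zero means no linear relationship, although a nonlinear relationship may still exist.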
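A minimal sketch of the correlation coefficient, assuming the standard Pearson definition (covariance divided by the product of the standard deviations). The three toy series show the two extremes and a case where r is zero even though y clearly depends on x.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]

r_same = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])      # y rises with x
r_opposite = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])  # y falls as x rises
r_curved = pearson_r(x, [4.0, 1.0, 0.0, 1.0, 4.0])     # y = (x - 3)^2, a U shape

print(r_same, r_opposite, r_curved)
```

The U-shaped case is worth noting: y is fully determined by x, yet r comes out at zero because the relationship is not linear.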

In trading, pairs such as gas and power prices, spot and forward prices, or two related commodities are often correlated. Correlation informs hedging and diversification decisions.

⚠️ Correlation does not imply causation. Two variables can be strongly correlated without one causing the other. A classic spurious example: ice cream sales and drownings are positively correlated (both rise in summer), but ice cream does not cause drownings — a third factor (warm weather) drives both. In trading, two prices may move together because of a common driver (e.g. oil) rather than one causing the other. Always ask: is there a genuine causal link, or just shared influences?
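The common-driver effect can be illustrated with a small simulation. Everything here is synthetic: a hypothetical shared driver (think of it as an oil-like factor) feeds two return series that have no direct link to each other. The noise levels and sample size are arbitrary choices for the sketch.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Shared driver plus independent noise for each series (all simulated).
driver = [rng.gauss(0.0, 1.0) for _ in range(n)]
returns_a = [d + rng.gauss(0.0, 0.5) for d in driver]
returns_b = [d + rng.gauss(0.0, 0.5) for d in driver]

# Strong correlation, even though neither series influences the other.
r_raw = pearson_r(returns_a, returns_b)

# Strip out the common driver: what remains is unrelated noise.
resid_a = [a - d for a, d in zip(returns_a, driver)]
resid_b = [b - d for b, d in zip(returns_b, driver)]
r_after = pearson_r(resid_a, resid_b)

print(f"raw r = {r_raw:.2f}, after removing driver r = {r_after:.2f}")
```

The raw correlation is high while the correlation of the residuals is close to zero, which is the pattern to expect when shared influences, rather than a direct causal link, explain the co-movement.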

4 Regression

Regression fits a line (or curve) to data. Simple linear regression finds the line that minimizes the sum of squared vertical distances (residuals) between the points and the line, which is why the method is called least squares.

Formula: y = a + b·x. Here b is the slope (the sensitivity of y to x) and a is the intercept (the value of y when x is zero). Regression is used for forecasting, estimating hedge ratios, and explaining one variable by another.
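For a single x variable, the least-squares line has a closed-form solution: b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and a = ȳ − b·x̄. A minimal sketch with made-up data follows; the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: sensitivity of y to x
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Illustrative data that lies close to y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")

# Forecast for a new x value using the fitted line.
y_hat = a + b * 6.0
```

Note that the slope equals the correlation multiplied by the ratio of the standard deviations of y and x, so regression and correlation describe the same linear relationship from two angles.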

Key assumptions:
(1) Linearity — the relationship between x and y is linear;
(2) Independence — residuals are not correlated (e.g. no autocorrelation in time series);
(3) Homoscedasticity — constant variance of residuals (no fan-shaped pattern).

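When these assumptions fail, the results can mislead: a nonlinear relationship biases the fitted coefficients, while autocorrelated or heteroscedastic residuals make the standard errors unreliable. Either problem can produce misleading hedge ratios or forecasts.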
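One simple way to probe the independence assumption is the Durbin-Watson statistic on the residuals, DW = Σ(eₜ − eₜ₋₁)² / Σeₜ². Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The sketch below is an illustration on synthetic data, not a full diagnostic suite: it fits a straight line to a curved relationship, so the linearity assumption is violated and the residuals come out in long runs of the same sign.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 when residuals are uncorrelated."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Synthetic example: the true relationship is quadratic, but we fit a line.
x = [float(t) for t in range(20)]
y = [t ** 2 for t in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
dw = durbin_watson(residuals)

print(f"Durbin-Watson = {dw:.2f}")  # far below 2: residuals are not independent
```

Plotting residuals against x or against time is a useful companion check: a U shape points to nonlinearity, and a fan shape points to non-constant variance.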

In practice (hedge ratios): regress changes in the spot price on changes in the forward price to estimate the minimum-variance hedge ratio. The slope b tells you how many units of forward to hold per unit of spot exposure. Regressing changes or returns rather than price levels helps avoid spurious fits between trending series. Similarly, regress one asset's returns on another's to measure beta or exposure. Example: regressing power spot returns on gas spot returns yields the sensitivity of power to gas, useful for spark spread hedging.
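A hedge-ratio estimate along these lines can be sketched as follows. The price series are invented for illustration, and the sketch assumes the regression is run on period-to-period price changes rather than price levels, which is a common choice but not the only one. The exposure size is also a hypothetical number.

```python
def price_changes(prices):
    """Period-to-period differences of a price series."""
    return [prices[t] - prices[t - 1] for t in range(1, len(prices))]

def hedge_ratio(spot_changes, forward_changes):
    """Regression slope of spot changes on forward changes:
    cov(dS, dF) / var(dF)."""
    n = len(spot_changes)
    s_bar = sum(spot_changes) / n
    f_bar = sum(forward_changes) / n
    cov = sum((s - s_bar) * (f - f_bar)
              for s, f in zip(spot_changes, forward_changes))
    var_f = sum((f - f_bar) ** 2 for f in forward_changes)
    return cov / var_f

# Hypothetical spot and forward prices (illustrative numbers only).
forward = [50.0, 51.0, 49.0, 52.0, 53.0, 51.0, 54.0]
spot = [48.0, 48.9, 47.4, 49.9, 50.6, 48.9, 51.2]

h = hedge_ratio(price_changes(spot), price_changes(forward))

# Size the hedge for an assumed long spot exposure of 1,000 units.
spot_exposure = 1000.0
forward_units_to_sell = h * spot_exposure

print(f"hedge ratio = {h:.2f}, sell {forward_units_to_sell:.0f} forward units")
```

With real data the estimate depends on the sample window and the return frequency, so it is worth re-estimating periodically and checking the residual assumptions before relying on the ratio.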

Why this matters in trading & ETRM