Box Cox Transform
The Box-Cox transform is a statistical technique used to stabilise variance (see Machine-Learning-&-Data-Science-Statistics-Time-Series-Analysis-Stationarity) and make data more closely resemble a Normal Distribution, which is often required by certain statistical models and tests. It applies a power transformation that pulls a non-normal dependent variable towards a more normal shape.
The Box-Cox transformation is defined as:
$$y(\lambda) = \begin{cases} \frac{y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0, \\ \log(y), & \text{if } \lambda = 0, \end{cases}$$

where:

- $y$ is the original data,
- $\lambda$ is the transformation parameter that determines the nature of the transformation.
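To make the piecewise definition concrete, here is a minimal NumPy sketch of the transform; the function name `box_cox` and its error handling are illustrative choices rather than a library API.

```python
import numpy as np

def box_cox(y, lam):
    """Apply the Box-Cox transform to strictly positive data for a given lambda."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data.")
    if lam == 0:
        return np.log(y)               # limiting case: lambda = 0
    return (y ** lam - 1.0) / lam      # general case: lambda != 0

# Example: a square-root-style transform (lambda = 0.5) on a small positive series
print(box_cox([1.0, 2.0, 4.0, 8.0], lam=0.5))
```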
- Parameter $\lambda$:
  - The parameter $\lambda$ controls the transformation. The value of $\lambda$ is typically chosen by maximising the likelihood of a model (e.g., maximising the log-likelihood in a regression setting).
  - Common values of $\lambda$ (see the sketch after this list):
    - $\lambda = 1$: No transformation (linear),
    - $\lambda = 0$: Logarithmic transformation,
    - $\lambda = 0.5$: Square root transformation,
    - $\lambda = -1$: Reciprocal transformation.
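The special cases above can be checked numerically. A hedged sketch using `scipy.special.boxcox` (SciPy's element-wise transform); note that for $\lambda = 0.5$ and $\lambda = -1$ the output is an affine rescaling of the square root and reciprocal, not the raw functions themselves:

```python
import numpy as np
from scipy.special import boxcox  # element-wise Box-Cox: (y**lam - 1)/lam, or log(y) when lam == 0

y = np.array([1.0, 2.0, 5.0, 10.0])

for lam in (1.0, 0.0, 0.5, -1.0):
    print(f"lambda = {lam:+.1f}:", np.round(boxcox(y, lam), 3))

# lambda =  1   ->  y - 1            (linear, just shifted)
# lambda =  0   ->  log(y)           (logarithmic)
# lambda =  0.5 ->  2*(sqrt(y) - 1)  (rescaled square root)
# lambda = -1   ->  1 - 1/y          (rescaled reciprocal)
```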
- Purpose:
  - The primary aim of the Box-Cox transformation is to make the data distribution more normal (Gaussian), which is often a requirement for many statistical tests and models (like linear regression).
  - It can also be used to stabilise variance, which can improve the performance of certain models (see the sketch below).
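As a rough illustration of both points, a hedged sketch: a strongly right-skewed lognormal sample becomes roughly symmetric after the transform (the seed, sample size, and use of skewness as the yardstick are arbitrary choices for this example).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # positive, heavily right-skewed sample

transformed, lam = stats.boxcox(raw)                 # lambda estimated by maximum likelihood

print("skewness before:", round(float(stats.skew(raw)), 3))          # large positive skew
print("skewness after: ", round(float(stats.skew(transformed)), 3))  # close to zero
print("estimated lambda:", round(float(lam), 3))                     # near 0 for lognormal data
```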
- Limitations:
  - It only works on positive data, since the log transformation (used when $\lambda = 0$) is undefined for zero or negative values (illustrated in the sketch below).
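A brief sketch of this constraint: `scipy.stats.boxcox` rejects non-positive input, and one common workaround (an illustrative convention, not something prescribed here) is to shift the series so every value is strictly positive before transforming.

```python
import numpy as np
from scipy import stats

series = np.array([-3.0, -1.0, 0.0, 2.0, 5.0, 9.0])  # contains zero and negative values

try:
    stats.boxcox(series)
except ValueError as err:
    print("Box-Cox rejected the data:", err)

# Illustrative workaround: shift so the minimum becomes 1, then transform.
# The choice of shift affects the fitted lambda, so it should be recorded
# and reused when inverting the transform.
shift = 1.0 - series.min()
shifted, lam = stats.boxcox(series + shift)
print("estimated lambda after shifting:", round(float(lam), 3))
```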
In practice, the optimal value of $\lambda$ is estimated from the data, typically by maximising the log-likelihood of the transformed values rather than being set by hand.
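For instance, `scipy.stats.boxcox` returns the maximum-likelihood estimate of $\lambda$ alongside the transformed data, and `scipy.stats.boxcox_llf` exposes the log-likelihood itself, so the estimate can be sanity-checked with a simple grid search (the data and grid below are purely illustrative).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)  # positive, right-skewed example data

# Maximum-likelihood estimate of lambda, returned alongside the transformed data
_, lam_mle = stats.boxcox(data)

# Cross-check: evaluate the Box-Cox log-likelihood over a coarse grid of lambda values
grid = np.linspace(-2, 2, 401)
llf = [stats.boxcox_llf(lam, data) for lam in grid]
lam_grid = grid[int(np.argmax(llf))]

print(f"MLE lambda:  {lam_mle:.3f}")
print(f"Grid lambda: {lam_grid:.3f}")  # agrees with the MLE up to the grid spacing
```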