Skip to content

Splines

github-actions[bot] edited this page Nov 16, 2025 · 1 revision

A spline is a special function defined piecewise by polynomials. In statistics and machine learning, splines are a powerful tool used in regression analysis to fit complex, non-linear patterns in data.

The core idea is to move away from fitting a single, high-degree polynomial to the entire dataset (which can be wildly inaccurate and prone to overfitting) and instead fit a series of lower-degree polynomials to different sections of the data.

Knots and Pieces

Imagine you have a scatter plot with a complex, wavy pattern.

  1. You first select a set of points along the x-axis called knots.
  2. These knots divide the data into separate regions.
  3. Within each region, you fit a simple polynomial (e.g., a cubic function).
  4. You then connect these polynomial pieces together at the knots, with the crucial condition that the resulting curve must be "smooth."

This "smoothness" constraint is what makes a spline a spline. For a cubic spline (the most common type), we require that the value of the function, its first derivative, and its second derivative are all continuous at each knot. This ensures that the final curve doesn't have any abrupt jumps, sharp corners, or sudden changes in curvature.

images/Pasted image 20251010183152.png Cubic spline (A) and monotonic cubic spline (B) with two knots.

Why Use Splines?

  • Flexibility: Splines can model very complex and highly non-linear relationships without having to resort to a single, high-degree polynomial.
  • Local Control: Changing the data in one region only affects the spline in that local area, unlike a global polynomial where a single point can change the entire curve.
  • Avoids Overfitting: By using low-degree polynomials (like cubic), splines have less of a tendency to overfit compared to a single high-degree polynomial (Runge's phenomenon).

Spline Regression

In practice, we don't manually place the knots. Instead, in spline regression, the model learns the optimal placement and coefficients of the polynomial pieces simultaneously. The number of knots (or the "degrees of freedom") is a hyperparameter that controls the flexibility of the spline.

  • Too few knots: The model will be too simple and will underfit the data.
  • Too many knots: The model will be too flexible and will overfit the data, wiggling excessively to capture noise.

Spline regression is a form of non-parametric regression and is a key component of Generalized Additive Models (GAMs), where splines are used to capture the non-linear effects of multiple predictor variables on an outcome.

Clone this wiki locally