Keywords: underfitting, overfitting, polynomial regression, bias-variance tradeoff, machine learning, regularization, cross-validation, data science, model complexity
This repository explains the fundamental concepts of:
- Underfitting
- Overfitting
- Polynomial Regression
- Bias–Variance Tradeoff
- Regularization (Ridge / Lasso)
- Model Complexity in Machine Learning
Underfitting occurs when a model is too simple to capture the true structure of the data.
Mathematically, we try to approximate the underlying relationship

$$y = f(x) + \varepsilon$$

but instead use an overly simple model, e.g. a straight line

$$\hat{y} = w_0 + w_1 x$$

when the true function $f$ is nonlinear.

Characteristics of an underfit model:
- High bias
- Low variance
- High training error
- High test error
Example. True function:

$$y = x^2$$

Fitted model:

$$\hat{y} = w_0 + w_1 x$$

A linear model cannot represent quadratic curvature → underfitting.
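A minimal sketch of this failure mode (the synthetic quadratic data and noise level are illustrative assumptions, not from the original text):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Quadratic ground truth with a little noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, size=50)

# A straight line cannot bend to follow x^2, so even the
# training error stays large: the signature of underfitting
linear = LinearRegression().fit(X, y)
train_mse = mean_squared_error(y, linear.predict(X))
print(f"Training MSE of linear fit on quadratic data: {train_mse:.2f}")
```

The large *training* error (not just test error) is what distinguishes underfitting from overfitting.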
Overfitting occurs when a model is too complex and learns noise instead of the true signal.
Instead of approximating the signal

$$f(x)$$

the model effectively fits

$$f(x) + \varepsilon$$

including the noise term $\varepsilon$.

Characteristics of an overfit model:
- Low training error
- High test error
- Low bias
- High variance
Polynomial regression with a very high degree, e.g.

$$\hat{y} = \sum_{j=0}^{d} w_j x^j, \qquad d \gg 1$$
The curve passes through all training points but generalizes poorly.
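A sketch of this behavior (the sine target, degree 14, and sample sizes are illustrative choices, not prescribed by the text): with as many parameters as training points, the polynomial can nearly interpolate the noisy samples while missing the true curve between them.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train = np.sort(rng.uniform(-1, 1, 15)).reshape(-1, 1)
y_train = np.sin(3 * X_train).ravel() + rng.normal(0, 0.2, 15)
X_test = np.linspace(-1, 1, 200).reshape(-1, 1)
y_test = np.sin(3 * X_test).ravel()

# Degree 14 on 15 points: enough parameters to chase the noise
wiggly = make_pipeline(PolynomialFeatures(degree=14), LinearRegression())
wiggly.fit(X_train, y_train)
train_mse = mean_squared_error(y_train, wiggly.predict(X_train))
test_mse = mean_squared_error(y_test, wiggly.predict(X_test))
print(f"train MSE: {train_mse:.4f}  test MSE: {test_mse:.4f}")
```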
Polynomial regression expands the feature space:

$$x \mapsto (1,\, x,\, x^2,\, \dots,\, x^d)$$

Model:

$$\hat{y} = w_0 + w_1 x + w_2 x^2 + \dots + w_d x^d$$

Matrix form:

$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{w}$$

where $\mathbf{X} \in \mathbb{R}^{n \times (d+1)}$ is the design (Vandermonde) matrix whose $i$-th row is $(1, x_i, x_i^2, \dots, x_i^d)$, and $\mathbf{w} = (w_0, \dots, w_d)^\top$.

Minimize the Mean Squared Error:

$$J(\mathbf{w}) = \frac{1}{n} \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2$$

Solution (normal equations):

$$\mathbf{w}^* = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$
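The closed-form least-squares solution can be checked numerically; a small sketch (the cubic-free ground-truth coefficients and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, 30)

degree = 2
X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2

# w* = (X^T X)^{-1} X^T y  (solve the normal equations;
# avoid forming an explicit matrix inverse)
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # recovered weights should be close to [1, 2, -3]
```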
Expected test error decomposes as:

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \text{Bias}^2\big[\hat{f}(x)\big] + \text{Var}\big[\hat{f}(x)\big] + \sigma^2$$

where:

- Bias increases when the model is too simple
- Variance increases when the model is too complex
- $$\sigma^2$$ is irreducible noise
- Low degree polynomial → high bias
- High degree polynomial → high variance
- Optimal degree → minimal test error
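The decomposition can be estimated empirically by refitting the model on many resampled training sets and measuring the spread of its predictions at one point. A sketch (the sine target, noise level, evaluation point, and the two degrees compared are all illustrative assumptions):

```python
import numpy as np

def f(x):
    return np.sin(2 * np.pi * x)  # assumed true function

rng = np.random.default_rng(0)
x0, sigma, n = 0.25, 0.3, 30      # evaluation point, noise, sample size

results = {}
for degree in (1, 9):
    preds = []
    for _ in range(300):                   # 300 independent training sets
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, sigma, n)
        coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x0)) ** 2    # squared bias at x0
    variance = preds.var()                 # variance of predictions at x0
    results[degree] = (bias2, variance)
    print(f"degree {degree}: bias^2={bias2:.4f}, variance={variance:.4f}")
```

The low-degree fit shows large bias and small variance; the high-degree fit shows the reverse.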
To prevent overfitting, we penalize large weights.

Ridge (L2) objective:

$$J(\mathbf{w}) = \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_2^2$$

Solution:

$$\mathbf{w}^* = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}$$

Lasso (L1) objective:

$$J(\mathbf{w}) = \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_1$$

Lasso has no closed-form solution; its L1 penalty promotes sparsity (it drives some weights exactly to zero).
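The closed-form ridge solution can be verified against scikit-learn; a sketch (the random design, true weights, and λ = 1 are illustrative, and `fit_intercept=False` is set so both solve the same problem):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 40)

lam = 1.0
# Closed form: w* = (X^T X + lambda * I)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Ridge(alpha=lam) minimizes ||y - Xw||^2 + lam * ||w||^2
w_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(np.allclose(w_closed, w_sklearn, atol=1e-6))
```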
As polynomial degree increases:
- Training error ↓
- Test error ↓ then ↑
This produces the classic U-shaped test error curve.
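A quick numerical sketch of the curve (synthetic sine data; the degree grid and sample sizes are arbitrary choices):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 40).reshape(-1, 1)
y = np.sin(3 * X).ravel() + rng.normal(0, 0.2, 40)
X_test = np.linspace(-1, 1, 300).reshape(-1, 1)
y_test = np.sin(3 * X_test).ravel()

train, test = {}, {}
for degree in (1, 3, 5, 10, 15):
    m = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    train[degree] = mean_squared_error(y, m.predict(X))          # always falls
    test[degree] = mean_squared_error(y_test, m.predict(X_test))  # falls, then rises
    print(f"degree {degree:2d}: train={train[degree]:.3f}  test={test[degree]:.3f}")
```

Training error decreases monotonically with degree, while test error bottoms out at an intermediate degree.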
To fix underfitting:

- Increase model complexity
- Add polynomial features
- Reduce regularization

To fix overfitting:

- Reduce polynomial degree
- Add regularization
- Use cross-validation
- Increase dataset size
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Example data: a noisy sine curve (any 2-D X and 1-D y work here)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 50).reshape(-1, 1)
y = np.sin(3 * X).ravel() + rng.normal(0, 0.2, 50)

# Degree-5 polynomial features with an L2 penalty on the weights
model = make_pipeline(
    PolynomialFeatures(degree=5),
    Ridge(alpha=1.0),
)
model.fit(X, y)
```
Related topics:

- Linear Regression
- Gradient Descent
- Ridge vs Lasso
- Cross-Validation
- Feature Engineering
- Regularization Paths
Who this is for:

- Data Science students
- Machine Learning engineers
- Researchers studying model generalization
- Anyone learning bias-variance tradeoff
If you are studying Data Science fundamentals, this repository provides a clear mathematical and intuitive explanation of underfitting and overfitting using polynomial regression.
⭐ If this repository helps you understand the bias–variance tradeoff, consider giving it a star.