Linear Regression
Linear regression is a statistical model used to understand the relationship between a dependent variable and one or more independent variables. A linear regression model outputs a continuous-valued prediction based on the input variables, as opposed to a categorical or discrete prediction.
The cost function for linear regression is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^{2}$$

where:

- $y$ is the vector of labels.
- $X$ is the $n \times m$ matrix of input features (sometimes called the design matrix), which includes the "intercept" feature, i.e., the constant feature.
- $\theta$ is the $m \times 1$ vector of parameters.
- $h_{\theta}(x) = \theta^{T}x$ is the hypothesis, i.e. the model's prediction.
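As an illustrative aside, here is a minimal NumPy sketch of this cost function; the function name `compute_cost` and the assumption that `X` already contains the column of ones for the intercept are my own, not taken from this document.

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum((h_theta(x) - y)^2), with h_theta(x) = X @ theta."""
    m = len(y)                    # number of training examples
    residuals = X @ theta - y     # prediction error for every example
    return (residuals @ residuals) / (2 * m)
```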
Our goal is to minimise this cost function with respect to the parameters $\theta$.
Gradient descent is an algorithm used to minimise a function by iteratively moving in the direction of steepest descent. To find the direction of steepest descent, we first need the derivative of our cost function.
Since our parameters can be multi-dimensional, we need the partial derivative with respect to each dimension:

$$\frac{\partial J(\theta)}{\partial \theta_{j}} = \frac{1}{m}\sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_{j}^{(i)}$$

Each iteration of gradient descent then updates every parameter simultaneously:

$$\theta_{j} := \theta_{j} - \alpha \frac{1}{m}\sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_{j}^{(i)}$$

where:

- $\alpha$ is the learning rate, a hyper-parameter that controls how much the parameters are adjusted at each iteration. Too low a value causes gradient descent to converge slowly, while too large a value can prevent it from converging and may cause it to diverge.
The vectorised form of this update is

$$\theta := \theta - \frac{\alpha}{m} X^{T}(X\theta - y)$$
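A sketch of gradient descent built directly on this vectorised update might look like the following; the function name `gradient_descent` and the default values for `alpha` and `num_iters` are illustrative choices, not prescriptions.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, num_iters=1000):
    """Repeatedly apply theta := theta - (alpha/m) * X^T (X theta - y)."""
    m = len(y)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m   # vectorised gradient of J(theta)
        theta = theta - alpha * gradient       # step in the direction of steepest descent
    return theta
```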
This algorithm is known as "batch" gradient descent, meaning each step of gradient descent makes use of the entire training set. This can be computationally expensive for large datasets, and more efficient variations such as stochastic gradient descent and mini-batch gradient descent have been developed to address this issue.
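As a rough sketch of the mini-batch variant (the batch size, shuffling scheme, and function name are assumptions on my part), each update estimates the gradient from a small random subset of the training set instead of all of it:

```python
import numpy as np

def minibatch_gradient_descent(X, y, theta, alpha=0.01, num_epochs=100, batch_size=32):
    """Like batch gradient descent, but each update uses only a small random batch."""
    m = len(y)
    for _ in range(num_epochs):
        order = np.random.permutation(m)            # shuffle the examples each epoch
        for start in range(0, m, batch_size):
            batch = order[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            gradient = Xb.T @ (Xb @ theta - yb) / len(yb)
            theta = theta - alpha * gradient
    return theta
```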
The Normal equation is a method to minimise the cost function analytically, in a single step rather than iteratively:

$$\theta = (X^{T}X)^{-1}X^{T}y$$
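A minimal sketch of solving the Normal equation with NumPy; using `np.linalg.solve` rather than forming the inverse explicitly is a choice on my part for numerical stability.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y, i.e. theta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```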
L2 (Ridge) Regularisation can be applied to the linear regression cost function as follows:
$$ J(\theta) = \frac{1}{2m}[\sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})^{2}+ \lambda \sum_{j=1}^{n} \theta_{j}^{2}]$$
The partial derivative of the cost function with regularisation would then be

$$\frac{\partial J(\theta)}{\partial \theta_{j}} = \frac{1}{m}\sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_{j}^{(i)} + \frac{\lambda}{m}\theta_{j} \qquad \text{for } j \geq 1,$$

while the intercept term $\theta_{0}$ is conventionally left unregularised.
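A sketch of a single L2-regularised gradient descent step under these equations; leaving the intercept $\theta_{0}$ out of the penalty follows the convention above, and the function name is my own.

```python
import numpy as np

def ridge_gradient_step(X, y, theta, alpha, lam):
    """One gradient descent step for the L2-regularised cost function."""
    m = len(y)
    gradient = X.T @ (X @ theta - y) / m   # unregularised part of the gradient
    reg = (lam / m) * theta                # L2 penalty gradient: (lambda/m) * theta_j
    reg[0] = 0.0                           # do not regularise the intercept theta_0
    return theta - alpha * (gradient + reg)
```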