Gradient Descent
The cost function is computed over the entire dataset, so every iteration of (batch) gradient descent requires a full pass over the whole dataset.
Stochastic gradient descent computes the cost on a single sample at a time. For each epoch, randomly shuffle the dataset, then loop through it, calculating the cost for each sample and updating the weights immediately.
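A minimal sketch of this per-sample update loop, assuming a linear-regression model with a squared-error cost; the function name, learning rate and epoch count are illustrative, not taken from the notes above.

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=10, seed=0):
    """Stochastic gradient descent: one weight update per sample,
    with the dataset reshuffled before each epoch."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):   # shuffle before each epoch
            err = (X[i] @ w + b) - y[i]        # gradient of the 0.5 * err**2 cost
            w -= lr * err * X[i]
            b -= lr * err
    return w, b
```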
Can result in the cost not decreasing on every iteration, but the general trend should still be downwards. The cost might never hit the true minimum, and may just bounce around that point. If plotting the learning curve, average the cost over the last several hundred or thousand samples to smooth out the per-sample noise.
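One way to do that smoothing, assuming the per-sample costs were recorded during training; the window size of 1000 is an illustrative choice.

```python
import numpy as np

def smoothed_costs(costs, window=1000):
    """Average each point of the learning curve over the previous
    `window` per-sample costs so the downward trend is visible
    through the sample-to-sample noise."""
    kernel = np.ones(window) / window
    # 'valid' keeps only positions where a full window of costs exists
    return np.convolve(np.asarray(costs, dtype=float), kernel, mode="valid")
```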
Uses
Online learning: the model can be updated continuously as new samples stream in, since stochastic gradient descent needs only one sample per update.
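A sketch of that streaming setup, reusing the linear-regression update assumed above; the stream is any iterable of (features, target) pairs and the names are illustrative.

```python
import numpy as np

def online_training(stream, n_features, lr=0.01):
    """Apply one stochastic-gradient step per incoming sample.
    No dataset is stored, so the model keeps adapting as data arrives."""
    w, b = np.zeros(n_features), 0.0
    for x, y in stream:                 # stream yields (features, target) pairs
        err = (x @ w + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b
```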
Map-reduce is a method to parallelise training. Split the dataset into as many subsets as there are machines; each machine computes the gradient sum over its own subset, and a central machine adds the partial sums together to perform the parameter update.
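A single-process sketch of that split-and-combine step for one batch-gradient iteration, assuming the same linear-regression gradients; in practice each chunk would be handled by a separate machine, and the names here are illustrative.

```python
import numpy as np

def mapreduce_gradient(X, y, w, b, n_machines=4):
    """Map: each machine computes the gradient sum over its chunk of the data.
    Reduce: a central machine adds the partial sums and divides by the dataset size."""
    partials = []
    for Xc, yc in zip(np.array_split(X, n_machines), np.array_split(y, n_machines)):
        err = Xc @ w + b - yc
        partials.append((Xc.T @ err, err.sum()))   # partial sums from one "machine"
    grad_w = sum(p[0] for p in partials) / len(y)
    grad_b = sum(p[1] for p in partials) / len(y)
    return grad_w, grad_b
```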