Optimisation algorithm used to find the minimum value of a function
- The gradient of the cost / loss cannot be computed directly when the loss function doesn't consume the parameters directly
- We use the chain rule to compute the gradient in such cases
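A minimal sketch of the basic update rule described above: step opposite to the gradient until the function value stops decreasing. The function, learning rate, and step count here are illustrative assumptions, not part of the notes.

```python
# Minimal gradient-descent sketch (hypothetical example):
# minimise f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move opposite to the gradient direction
    return x

minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # close to 3.0, the true minimiser
```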
Variants
Newton’s Method
- Uses second-order information (the Hessian) in addition to the gradient to choose each step
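A short sketch of a Newton step for one variable: the gradient is scaled by the inverse of the second derivative. The objective f(x) = x - ln(x), with minimum at x = 1, is an assumed example for illustration.

```python
# Newton's-method sketch (hypothetical example):
# minimise f(x) = x - ln(x), so f'(x) = 1 - 1/x and f''(x) = 1/x^2.
def newton(grad, hess, x0, steps=10):
    x = x0
    for _ in range(steps):
        x -= grad(x) / hess(x)  # gradient scaled by the inverse second derivative
    return x

x_min = newton(grad=lambda x: 1 - 1 / x, hess=lambda x: 1 / x ** 2, x0=0.5)
print(x_min)  # close to 1.0
```

Because it uses curvature, Newton's method typically converges in far fewer steps than plain gradient descent, at the cost of computing (and inverting) the Hessian.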
Stochastic Gradient Descent
- Estimates the gradient on a sub-sample (mini-batch) of the data points instead of the whole dataset
- Efficient when the dataset is very large
- Each update is much faster than a full Gradient Descent step
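The mini-batch idea above can be sketched as follows: each update uses only a small random batch to estimate the gradient. The linear-regression objective, batch size, and learning rate are assumptions chosen for illustration.

```python
import numpy as np

# Mini-batch SGD sketch (hypothetical example): fit least-squares
# linear regression y = X @ w using gradients from small batches.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(2)
lr, batch_size = 0.1, 32
for epoch in range(50):
    idx = rng.permutation(len(X))          # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        # gradient of mean squared error, estimated on the batch only
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad
print(w)  # close to [2.0, -1.0]
```

Each step touches only `batch_size` rows, so the per-update cost stays constant no matter how large the dataset grows.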