Optimisation algorithm used to find the minimum of a function (typically the cost / loss) by repeatedly stepping in the direction of the negative gradient
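A minimal sketch of the update rule (θ ← θ − η · ∇f(θ)) on a toy quadratic; the function, starting point, and learning rate are illustrative assumptions, not from these notes.

```python
# Plain gradient descent on f(x) = (x - 3)^2 (toy example, assumed for illustration)
def grad_f(x):
    return 2 * (x - 3)               # analytic gradient of f

x = 0.0                              # arbitrary starting point (assumed)
learning_rate = 0.1                  # step size eta (assumed hyperparameter)

for step in range(100):
    x -= learning_rate * grad_f(x)   # move against the gradient

print(x)                             # converges towards the minimiser x = 3
```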

  • The gradient of the Cost / Loss with respect to a parameter cannot be calculated directly if the Loss function doesn’t consume that parameter directly (it only sees it through intermediate functions)

    We utilise the chain rule to deal with that in such cases (see the sketch below)
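A small sketch of that chain-rule step for a single weight in a linear model with squared loss; the model, data values, and variable names are assumptions for illustration.

```python
# Chain rule for dL/dw when the loss L only sees w through the prediction y_hat
# (single data point, linear model; all values are illustrative assumptions)
x, y = 2.0, 5.0                  # one training example
w = 0.5                          # parameter we want to update

y_hat = w * x                    # prediction: y_hat = w * x
loss = (y_hat - y) ** 2          # squared loss: L = (y_hat - y)^2

dL_dyhat = 2 * (y_hat - y)       # outer derivative: dL/dy_hat
dyhat_dw = x                     # inner derivative: dy_hat/dw
dL_dw = dL_dyhat * dyhat_dw      # chain rule: dL/dw = dL/dy_hat * dy_hat/dw

w -= 0.1 * dL_dw                 # gradient-descent step on w
```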

Variants

Newton’s Method
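Newton's method uses second-derivative information as well as the gradient. A brief sketch of the update x ← x − f′(x) / f″(x) on the same toy quadratic as above; the function and starting point are assumptions for illustration.

```python
# Newton's method on f(x) = (x - 3)^2 (toy example, assumed for illustration):
# each step divides the gradient by the second derivative.
def grad_f(x):
    return 2 * (x - 3)               # f'(x)

def hess_f(x):
    return 2.0                       # f''(x), constant for a quadratic

x = 0.0
for step in range(10):
    x -= grad_f(x) / hess_f(x)       # Newton step: x <- x - f'(x) / f''(x)

print(x)                             # reaches the minimiser x = 3 in one step for a quadratic
```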

Stochastic Gradient Descent

  • Estimates the gradient on a sub-sample (mini-batch) of data points instead of the whole dataset
  • Efficient when the dataset is very large, since each update only touches a few points
  • Each step is much cheaper than a full Gradient Descent step, so it is usually faster in practice (see the sketch below)
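A minimal mini-batch SGD sketch for linear regression with squared loss; the synthetic data, batch size, and learning rate are assumptions for illustration.

```python
import numpy as np

# Mini-batch SGD for linear regression y ≈ X @ w (all settings are illustrative assumptions)
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))                     # synthetic features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=10_000)      # synthetic targets

w = np.zeros(3)
learning_rate = 0.1
batch_size = 32

for step in range(1_000):
    idx = rng.integers(0, len(X), size=batch_size)   # sub-sample instead of the whole dataset
    Xb, yb = X[idx], y[idx]
    err = Xb @ w - yb                                # residuals on the mini-batch
    grad = 2 * Xb.T @ err / batch_size               # gradient of the mean squared error
    w -= learning_rate * grad                        # one cheap update per step

print(w)                                             # should be close to true_w
```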