Optimisation algorithm used to find the minimum value of a function
- The gradient of the cost / loss cannot be computed directly when the loss function doesn't consume the parameters directly
- We use the chain rule to compute the gradient in such cases
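A minimal sketch of the basic update rule described above: step opposite to the gradient until the function value stops decreasing. The function, learning rate, and step count here are illustrative assumptions, not part of the notes.

```python
# Minimal gradient-descent sketch (hypothetical example):
# minimise f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move opposite to the gradient direction
    return x

minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # close to 3.0, the true minimiser
```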
Variants
Newton’s Method
- Uses second-order information (the Hessian) in addition to the gradient to choose each step
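A short sketch of a Newton step for one variable: the gradient is scaled by the inverse of the second derivative. The objective f(x) = x - ln(x), with minimum at x = 1, is an assumed example for illustration.

```python
# Newton's-method sketch (hypothetical example):
# minimise f(x) = x - ln(x), so f'(x) = 1 - 1/x and f''(x) = 1/x^2.
def newton(grad, hess, x0, steps=10):
    x = x0
    for _ in range(steps):
        x -= grad(x) / hess(x)  # gradient scaled by the inverse second derivative
    return x

x_min = newton(grad=lambda x: 1 - 1 / x, hess=lambda x: 1 / x ** 2, x0=0.5)
print(x_min)  # close to 1.0
```

Because it uses curvature, Newton's method typically converges in far fewer steps than plain gradient descent, at the cost of computing (and inverting) the Hessian.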
Stochastic Gradient Descent
- Estimates the gradient on a sub-sample (mini-batch) of the data points instead of the whole dataset
- Efficient when the dataset is very large
- Each update is much faster than a full Gradient Descent step
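The mini-batch idea above can be sketched as follows: each update uses only a small random batch to estimate the gradient. The linear-regression objective, batch size, and learning rate are assumptions chosen for illustration.

```python
import numpy as np

# Mini-batch SGD sketch (hypothetical example): fit least-squares
# linear regression y = X @ w using gradients from small batches.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(2)
lr, batch_size = 0.1, 32
for epoch in range(50):
    idx = rng.permutation(len(X))          # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        # gradient of mean squared error, estimated on the batch only
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad
print(w)  # close to [2.0, -1.0]
```

Each step touches only `batch_size` rows, so the per-update cost stays constant no matter how large the dataset grows.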