Slide 37 of 46
rgrao

Interestingly, I think the same sort of idea shows up in gradient-based optimization schemes, for example the "lookahead term" in Nesterov momentum: https://dominikschmidt.xyz/nesterov-momentum/

keenan

Could be. Gradient descent is one specific example of numerically integrating an ODE: given an objective $\phi(x)$, integrate the gradient flow

$$\tfrac{d}{dt} x(t) = -\nabla\phi(x(t)).$$

So, there are special-purpose techniques (like Nesterov methods) that can be applied to this particular ODE, but they are tailored to it and may not work as general-purpose ODE integrators.
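
To make the connection concrete, here is a rough sketch (not from the lecture) of both ideas on a toy problem. It assumes a hypothetical quadratic objective $\phi(x) = \tfrac{1}{2} x^T A x$ with a made-up matrix `A`; the names `grad_phi`, `h` (step size), and `mu` (momentum coefficient) are illustrative choices. Plain gradient descent is just a forward Euler step on the ODE above, and the Nesterov variant shown is one common "lookahead" formulation, which evaluates the gradient at a predicted point rather than the current one.

```python
import numpy as np

# Assumed toy objective: phi(x) = 0.5 * x^T A x, minimized at x = 0.
A = np.diag([1.0, 10.0])

def grad_phi(x):
    """Gradient of the assumed quadratic objective."""
    return A @ x

def gradient_descent(x0, h, steps):
    """Forward Euler on dx/dt = -grad phi(x), with time step h."""
    x = x0.copy()
    for _ in range(steps):
        x = x - h * grad_phi(x)
    return x

def nesterov(x0, h, mu, steps):
    """Momentum update with the gradient evaluated at a lookahead point."""
    x = x0.copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        lookahead = x + mu * v              # predicted next position
        v = mu * v - h * grad_phi(lookahead)
        x = x + v
    return x

x0 = np.array([1.0, 1.0])
x_gd = gradient_descent(x0, h=0.05, steps=200)
x_nag = nesterov(x0, h=0.05, mu=0.9, steps=200)
print(np.linalg.norm(x_gd), np.linalg.norm(x_nag))
```

On this particular problem both iterations head toward the minimizer at the origin. How they compare in general depends on the objective and on the choice of `h` and `mu`.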