Interestingly, I think the same sort of concept comes up when defining different numerical gradient-based optimization schemes, e.g., the "lookahead term" in Nesterov momentum. https://dominikschmidt.xyz/nesterov-momentum/
keenan
Could be. Gradient descent is one specific example of an ODE: given an objective $\phi(x)$, integrate the ODE
$$\tfrac{d}{dt} x(t) = -\nabla\phi(x(t)).$$
So, there are special-purpose techniques (like Nesterov methods, etc.) that can be applied to this ODE (but may not provide general-purpose solutions).
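As a minimal sketch (my own illustration, not from the thread): applying forward Euler with step size $h$ to the gradient-flow ODE above recovers plain gradient descent, $x_{k+1} = x_k - h\,\nabla\phi(x_k)$, while Nesterov momentum evaluates the gradient at a "lookahead" point. The objective here is a toy quadratic chosen only for the demo.

```python
import numpy as np

def grad_phi(x):
    # Toy objective phi(x) = 0.5 * ||x||^2, so grad phi(x) = x.
    return x

def gradient_descent(x0, h=0.1, steps=100):
    # Forward Euler on dx/dt = -grad phi(x):
    #   x_{k+1} = x_k - h * grad phi(x_k)  (plain gradient descent)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - h * grad_phi(x)
    return x

def nesterov(x0, h=0.1, mu=0.9, steps=100):
    # Nesterov momentum: evaluate the gradient at the
    # "lookahead" point x + mu*v rather than at x itself.
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        lookahead = x + mu * v
        v = mu * v - h * grad_phi(lookahead)
        x = x + v
    return x

x_gd = gradient_descent([1.0, -2.0])
x_nag = nesterov([1.0, -2.0])
```

Both iterations drive the toy objective toward its minimizer at the origin; the difference is only where the gradient is sampled.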