Interestingly, I think the same sort of concept comes up when defining different numerical gradient-based optimization schemes, e.g., the "lookahead term" in Nesterov momentum. https://dominikschmidt.xyz/nesterov-momentum/
keenan
Could be. Gradient descent is one specific example of an ODE: given an objective $\phi(x)$, integrate the ODE
$$\tfrac{d}{dt} x(t) = -\nabla\phi(x(t)).$$
So, there are special-purpose techniques (like Nesterov methods, etc.) that can be applied to this ODE (but may not provide general-purpose solutions).
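As a minimal sketch (my own illustration, not from the thread): applying forward Euler with step size $h$ to the gradient-flow ODE above recovers plain gradient descent, $x_{k+1} = x_k - h\,\nabla\phi(x_k)$, while Nesterov momentum evaluates the gradient at a "lookahead" point. The objective here is a toy quadratic chosen only for the demo.

```python
import numpy as np

def grad_phi(x):
    # Toy objective phi(x) = 0.5 * ||x||^2, so grad phi(x) = x.
    return x

def gradient_descent(x0, h=0.1, steps=100):
    # Forward Euler on dx/dt = -grad phi(x):
    #   x_{k+1} = x_k - h * grad phi(x_k)  (plain gradient descent)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - h * grad_phi(x)
    return x

def nesterov(x0, h=0.1, mu=0.9, steps=100):
    # Nesterov momentum: evaluate the gradient at the
    # "lookahead" point x + mu*v rather than at x itself.
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        lookahead = x + mu * v
        v = mu * v - h * grad_phi(lookahead)
        x = x + v
    return x

x_gd = gradient_descent([1.0, -2.0])
x_nag = nesterov([1.0, -2.0])
```

Both iterations drive the toy objective toward its minimizer at the origin; the difference is only where the gradient is sampled.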