How does applying a coordinate transformation make it so that the gradient points towards the minimizer?

supernova

I still don't quite understand why we multiply the gradient by the Hessian inverse.

rmvenkat

I'm also confused about the Hessian inverse. Some additional explanation would be good.

SnackMixer

The Hessian is the matrix of second-order partial derivatives, and second-order optimization methods make use of it. Newton's method in particular multiplies the gradient by the Hessian inverse to get its step direction.
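To make this concrete, here is a small numerical sketch (not from the lecture). It assumes a hypothetical quadratic objective f(x) = 0.5 x^T A x - b^T x, whose Hessian is the constant matrix A. The names `A`, `b`, `x0`, and `cosine` are made up for illustration. On this stretched bowl the negative gradient does not point at the minimizer, but the gradient multiplied by the Hessian inverse does.

```python
import numpy as np

# Hypothetical quadratic objective f(x) = 0.5 * x^T A x - b^T x (an assumption
# for illustration). A is symmetric positive definite, so it is also the Hessian.
A = np.array([[10.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])


def gradient(x):
    """Gradient of the quadratic: A x - b."""
    return A @ x - b


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


x_star = np.linalg.solve(A, b)       # exact minimizer, where the gradient is zero
x0 = np.array([2.0, 3.0])            # arbitrary starting point
g = gradient(x0)

to_minimizer = x_star - x0           # the direction we would like to move in
steepest_dir = -g                    # plain gradient descent direction
newton_dir = -np.linalg.solve(A, g)  # -(Hessian inverse) * gradient, via a linear solve

x_newton = x0 + newton_dir           # one full Newton step

print("cosine(steepest descent, to minimizer):", cosine(steepest_dir, to_minimizer))
print("cosine(Newton direction, to minimizer):", cosine(newton_dir, to_minimizer))
print("Newton step lands on the minimizer:", np.allclose(x_newton, x_star))
```

Note that the sketch solves a linear system with `np.linalg.solve` instead of forming the inverse explicitly, which is the usual choice in practice. For a non-quadratic function the Hessian changes from point to point, so a single step would generally not land exactly on the minimizer.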

wenere

A comment from last semester explains quite well where the Hessian inverse comes from:
http://15462.courses.cs.cmu.edu/fall2019/lecture/optimization/slide_027
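
For reference, the textbook derivation is sketched below. It may or may not match the linked comment. The notation is an assumption: x_k is the current point and d is the step. Approximate f near x_k by its second-order Taylor expansion, then pick the d that makes the gradient of that approximation zero.

```latex
\begin{align*}
f(x_k + d) &\approx f(x_k) + \nabla f(x_k)^{\top} d
             + \tfrac{1}{2}\, d^{\top}\, \nabla^2 f(x_k)\, d \\
0 &= \nabla f(x_k) + \nabla^2 f(x_k)\, d
   && \text{(set the gradient with respect to } d \text{ to zero)} \\
d &= -\left(\nabla^2 f(x_k)\right)^{-1} \nabla f(x_k)
   && \text{(assuming the Hessian is invertible)}
\end{align*}
```

If the Hessian is positive definite, this d is the minimizer of the quadratic approximation, which is why the step is the gradient multiplied by the Hessian inverse.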

