How does applying a coordinate transformation make it so that the gradient points towards the minimizer?
I still don't quite understand why we multiply the gradient by the Hessian inverse.
I'm also confused about the Hessian inverse. Some additional explanation would be good.
The Hessian is the matrix of second-order derivatives used in optimization, and the Hessian inverse is what appears in the second-order (Newton) update step.
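As a minimal sketch of why the Hessian inverse shows up (not from the notes, just an illustrative quadratic example): for f(x) = ½ xᵀAx − bᵀx, the gradient is Ax − b and the Hessian is A, so one Newton step x − H⁻¹∇f lands exactly at the minimizer A⁻¹b.

```python
import numpy as np

# Quadratic f(x) = 1/2 x^T A x - b^T x with symmetric
# positive definite Hessian A (values chosen arbitrarily).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

x = np.array([5.0, -4.0])             # arbitrary starting point
grad = A @ x - b                      # gradient of f at x
x_new = x - np.linalg.solve(A, grad)  # Newton step: subtract H^{-1} grad

minimizer = np.linalg.solve(A, b)     # true minimizer A^{-1} b
print(np.allclose(x_new, minimizer))  # True: one Newton step reaches it
```

The raw gradient only points in the locally steepest direction; multiplying by H⁻¹ rescales it by the curvature, which for a quadratic points straight at the minimizer.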
A comment from last semester largely answers the question of where the Hessian inverse comes from: