anonymous

Is there any intuition as to why the inverse of the Hessian provides this coordinate change we desire?

motoole2

@anonymous Newton's method comes from making a 2nd order (quadratic) approximation of our function f(x) using the Taylor series expansion.

f(x_0 + x) ≈ f(x_0) + x^T * df(x_0) + (1/2) x^T * H(x_0) * x

Here df(x_0) is the gradient and H(x_0) is the Hessian at point x_0, and x is the step away from x_0. We can now minimize this quadratic approximation by taking its derivative with respect to x and setting the result to zero.

df(x_0) + H(x_0) x = 0

And solving for x gives us

x = -inv(H(x_0)) * df(x_0)

So using this quadratic approximation of our function, the minimum is at x_0 - inv(H(x_0)) * df(x_0). This stationary point is a minimum only when H(x_0) is positive definite. (Of course, since our function is not usually a quadratic, we need to repeat this procedure multiple times to converge to the correct solution.)
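
To make the iteration concrete, here is a minimal sketch in plain Python of repeating the update x <- x - inv(H(x)) * df(x) in two dimensions. The helper names (solve_2x2, newton_minimize) and the test function f(x, y) = exp(x + y) - x - y + (x - y)^2 are my own illustrative choices, not from the lecture. That function is convex with its minimum at (0, 0), so the Hessian is positive definite everywhere and the plain update is safe here; in general you would want a line search or some other safeguard.

```python
import math


def solve_2x2(H, g):
    """Solve H * step = g for a 2x2 matrix H (Cramer's rule)."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("Hessian is singular; the Newton step is undefined")
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Repeat x <- x - inv(H(x)) * df(x) until the step is smaller than tol.

    Returns the final point and the number of iterations performed.
    """
    x = list(x0)
    for i in range(max_iter):
        # Solving H * step = df avoids forming inv(H) explicitly.
        step = solve_2x2(hess(x), grad(x))
        x = [x[0] - step[0], x[1] - step[1]]
        if max(abs(s) for s in step) < tol:
            return x, i + 1
    return x, max_iter


# Illustrative (hypothetical) test function:
#   f(x, y) = exp(x + y) - x - y + (x - y)^2
def grad_f(p):
    e = math.exp(p[0] + p[1])
    d = p[0] - p[1]
    return [e - 1 + 2 * d, e - 1 - 2 * d]


def hess_f(p):
    e = math.exp(p[0] + p[1])
    return [[e + 2, e - 2], [e - 2, e + 2]]


x_min, iterations = newton_minimize(grad_f, hess_f, [1.0, -0.5])
print(x_min, iterations)
```

If f is itself a quadratic, the Taylor expansion is exact, so a single step already lands on the minimizer. You can check this by passing a constant Hessian and a linear gradient with max_iter=1.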