Slide 26 of 32
Arthas007

I am still confused about the sign before $\tau$: sometimes it is negative and sometimes it is positive. How do we decide which one to use?

keenan

@Arthas007 The gradient is the direction along which the function will increase quickest. So if you want to minimize the function, use a minus; if you want to maximize the function, use a plus.
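A minimal NumPy sketch of this sign convention, using a made-up objective $f(x) = |x|^2$ (whose gradient is $2x$) purely for illustration:

```python
import numpy as np

# Hypothetical toy objective: f(x) = |x|^2, with gradient 2x.
def f(x):
    return np.dot(x, x)

def grad_f(x):
    return 2.0 * x

x = np.array([3.0, -4.0])
tau = 0.1

# Minimize: step *against* the gradient (minus sign before tau).
x_min = x - tau * grad_f(x)

# Maximize: step *along* the gradient (plus sign before tau).
x_max = x + tau * grad_f(x)

print(f(x_min) < f(x))  # True: the minus sign decreased f
print(f(x_max) > f(x))  # True: the plus sign increased f
```

So the sign is not intrinsic to the method; it just encodes whether you are descending or ascending.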

hubbahubba

Because of numerical imprecision, aren't we technically not necessarily going in the steepest descent direction? Are there cases where that would lead to failure? Also, is there an equivalent to GD for discrete problems?

keenan

@hubbahubba For most reasonable (i.e., well-conditioned) problems, floating point error will not result in significant change in the gradient direction. If your problem is this poorly conditioned, you're gonna have other problems anyway...

It's also important to realize that any direction $u$ that has a *negative* inner product with the gradient $\nabla f$ will be a "descent direction," i.e., if $\langle u, \nabla f \rangle < 0$, then moving in the direction $u$ will still decrease the function. To see why, consider a low-order Taylor series around the current point: $f(x + \tau u) \approx f(x) + \tau \langle u, \nabla f \rangle$, which is smaller than $f(x)$ for small enough $\tau > 0$. Hence, small floating point errors will not cause any trouble.
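A quick numerical check of this Taylor-series argument, again on the made-up objective $f(x) = |x|^2$ with gradient $2x$: several different directions, none of them the steepest-descent direction, all decrease $f$ as long as their inner product with the gradient is negative.

```python
import numpy as np

# Hypothetical toy objective: f(x) = |x|^2, with gradient 2x.
def f(x):
    return np.dot(x, x)

x = np.array([1.0, 2.0])
g = 2.0 * x            # gradient at x: (2, 4)
tau = 0.1

# Every direction u with <u, grad f> < 0 decreases f for a small
# enough step, even though only -g is the steepest descent direction.
for u in [np.array([-1.0, 0.0]),      # <u, g> = -2
          np.array([0.0, -1.0]),      # <u, g> = -4
          np.array([1.0, -1.0])]:     # <u, g> = -2
    assert np.dot(u, g) < 0
    print(f(x + tau * u) < f(x))      # True in every case
```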

Most importantly, there are plenty of methods that intentionally pick a different descent direction (other than the gradient) in order to make more rapid progress. There is nothing holy about the gradient. (In fact, the gradient itself doesn't have one canonical definition: it depends on the choice of inner product...)
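One classic example of intentionally picking a different descent direction is Newton's method, which solves against the Hessian instead of stepping along $-\nabla f$. A hedged sketch on a made-up, badly scaled quadratic $f(x) = \tfrac{1}{2} x^\top A x$ (all names here are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical quadratic: f(x) = x^T A x / 2, with A symmetric
# positive definite and deliberately poorly scaled.
A = np.array([[10.0, 0.0],
              [0.0,  1.0]])

def f(x):
    return 0.5 * x @ A @ x

def grad_f(x):
    return A @ x

x = np.array([1.0, 1.0])
g = grad_f(x)

# Newton direction: solve A d = -g rather than stepping along -g.
d_newton = np.linalg.solve(A, -g)

# For a quadratic, a single Newton step lands exactly on the
# minimizer; a small gradient step makes much slower progress.
x_newton = x + d_newton
x_grad = x - 0.1 * g

print(f(x_newton))               # 0.0 (up to round-off)
print(f(x_grad) > f(x_newton))   # True
```

The Newton direction $-A^{-1} \nabla f$ is exactly the gradient with respect to the inner product induced by $A$, which is one concrete sense in which "the gradient depends on the choice of inner product."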