Slide 24 of 32

With a suitable step size, gradient descent ends up at a point where the gradient vanishes. That is usually a minimum, but it may be a local minimum rather than the global minimum.
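
To make the local-versus-global point concrete, here is a minimal sketch in Python. The function f(x) = x^4 - 3x^2 + x, the fixed step size, and the two starting points are all my own assumptions for illustration and are not from the slide. Started on the right, plain gradient descent settles in the shallower local minimum; started on the left, it reaches the deeper one.

```python
def f(x):
    # Hypothetical 1D test function with two minima (not from the lecture).
    return x**4 - 3.0 * x**2 + x


def df(x):
    # Analytic derivative of f.
    return 4.0 * x**3 - 6.0 * x + 1.0


def descend(x0, step_size=0.01, iters=2000):
    """Plain gradient descent with a fixed step size (assumed small enough)."""
    x = x0
    for _ in range(iters):
        x = x - step_size * df(x)
    return x


x_right = descend(2.0)   # starts to the right, ends in the local minimum
x_left = descend(-2.0)   # starts to the left, ends in the global minimum

print(x_right, f(x_right))
print(x_left, f(x_left))
```

Both runs stop where the derivative is (nearly) zero, but f(x_left) is lower than f(x_right), so only one of them found the global minimum.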


Why do we choose gradient descent over the simplex method?


I think the simplex method is only for linear programming.


Lol you're right!


How do we apply gradient descent to non-linear systems?


@elenagong Well, if you can compute the gradient analytically, then (1) you compute the gradient d = \nabla f(x_0) at a point x_0, and (2) you search along the negative gradient direction for the step size \alpha that minimizes f(x_0 - \alpha d). (See the next slide.) Otherwise, you can estimate the gradient at x_0 with finite differences.
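
Here is a rough sketch of those two steps in Python, with a few assumptions of mine: the gradient is estimated by central finite differences when no analytic gradient is passed in, and the search for \alpha is a simple backtracking search (keep halving \alpha until f decreases enough) rather than an exact minimization. The function names, tolerances, and the test function are all hypothetical.

```python
def finite_difference_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


def backtracking_step(f, x, d, alpha=1.0, shrink=0.5, c=1e-4):
    """Shrink alpha until f(x - alpha * d) decreases enough (Armijo condition)."""
    fx = f(x)
    d_norm_sq = sum(di * di for di in d)
    while alpha > 1e-12:
        candidate = [xi - alpha * di for xi, di in zip(x, d)]
        if f(candidate) <= fx - c * alpha * d_norm_sq:
            return candidate
        alpha *= shrink
    return list(x)  # no acceptable step found; stay where we are


def gradient_descent(f, x0, grad=None, tol=1e-6, max_iters=1000):
    """Repeat: (1) get the gradient d at x, (2) search along -d for a better point."""
    x = list(x0)
    for _ in range(max_iters):
        d = grad(x) if grad is not None else finite_difference_gradient(f, x)
        if sum(di * di for di in d) ** 0.5 < tol:
            break  # gradient is (nearly) zero, so we are at a stationary point
        x = backtracking_step(f, x, d)
    return x


def bowl(x):
    # Hypothetical smooth test function with its minimum at (1, -0.5).
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


x_min = gradient_descent(bowl, [3.0, 2.0])
print(x_min)
```

If you do have an analytic gradient, pass it as `grad` and the finite-difference estimate is skipped; the rest of the loop is unchanged.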


Do we ever normalize the gradient?


I don't think we need to, because the step size already lets us tune the magnitude of the update. Correct me if I'm wrong, though.


@Tdog @SlimShday Typically no: the classical gradient descent algorithm does not normalize the magnitude of the gradient. However, there is such a thing as normalized gradient descent, which uses only the direction of the gradient, so there can be arguments for normalizing.
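
To show the difference, here is a small sketch in Python comparing one classical step with one normalized step. The helper names, the test function f(x) = x_0^2 + x_1^2, and the step size are my own choices for illustration. In the classical update the distance moved scales with the gradient magnitude; in the normalized update it always equals the step size.

```python
import math


def classical_step(x, grad, step_size):
    """Classical update: x - step_size * grad."""
    return [xi - step_size * gi for xi, gi in zip(x, grad)]


def normalized_step(x, grad, step_size, eps=1e-12):
    """Normalized update: x - step_size * grad / ||grad||."""
    norm = math.sqrt(sum(gi * gi for gi in grad))
    if norm < eps:
        return list(x)  # gradient is (nearly) zero; no direction to follow
    return [xi - step_size * gi / norm for xi, gi in zip(x, grad)]


def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


x = [3.0, 4.0]
grad = [2.0 * xi for xi in x]  # gradient of f(x) = x0^2 + x1^2, norm is 10

moved_classical = distance(x, classical_step(x, grad, 0.1))
moved_normalized = distance(x, normalized_step(x, grad, 0.1))

print(moved_classical)   # step_size * ||grad||
print(moved_normalized)  # step_size, regardless of ||grad||
```

One consequence is that a normalized step does not shrink on its own as the gradient gets small near a minimum, so the step size usually has to be decreased explicitly.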