With a suitable step size we end up at a stationary point, usually a minimum, but it may be a local minimum rather than a global minimum.
Why do we choose gradient descent over the simplex method?
I think the simplex method is only for linear programming.
Lol you're right!
How do we apply gradient descent to non-linear systems?
@elenagong Well, if you can compute the gradient analytically, then (1) compute the gradient d at a point x_0, and (2) search along the negative gradient direction for the step size \alpha that minimizes your function f(x_0 - \alpha d). (See the next slide.) Otherwise, you can estimate the gradient at x_0 with finite differences.
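To make the two steps above concrete, here is a minimal sketch in plain Python. It is only an illustration, not the course's reference implementation: the function names, the test function `example_f`, and the tolerances are all made up for this example. It also swaps the exact line search for a simple backtracking (Armijo) search, which only looks for a "good enough" \alpha rather than the true minimizer along the line, and it falls back to central finite differences when no analytic gradient is supplied.

```python
def finite_difference_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def backtracking_step(f, x, d, alpha=1.0, shrink=0.5, c=1e-4):
    """Shrink alpha until f(x - alpha * d) decreases enough (Armijo condition)."""
    fx = f(x)
    d_norm_sq = sum(di * di for di in d)
    while alpha > 1e-12:
        candidate = [xi - alpha * di for xi, di in zip(x, d)]
        if f(candidate) <= fx - c * alpha * d_norm_sq:
            return candidate
        alpha *= shrink
    return list(x)  # no acceptable step found; stay where we are


def gradient_descent(f, x0, grad=None, tol=1e-6, max_iters=1000):
    """Repeat: get gradient d at x, then step to x - alpha * d."""
    x = list(x0)
    for _ in range(max_iters):
        # Use the analytic gradient if given, otherwise estimate it.
        d = grad(x) if grad is not None else finite_difference_gradient(f, x)
        if sum(di * di for di in d) ** 0.5 < tol:
            break  # gradient is (nearly) zero: stationary point
        x = backtracking_step(f, x, d)
    return x


def example_f(x):
    """Hypothetical test function with its minimum at (1, -3)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2


x_min = gradient_descent(example_f, [0.0, 0.0])
```

Passing `grad=` with an analytic gradient skips the finite-difference estimate, which is usually both cheaper and more accurate when the derivative is available.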
Do we ever normalize the gradient?
I don't think we need to, because we have the step size to tune the magnitude. Correct me if I'm wrong, though.
@Tdog @SlimShday Typically no. The classical gradient descent algorithm does not normalize the magnitude of the gradient. However, there is such a thing as normalized gradient descent, so there may be arguments for normalizing.
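For what it's worth, the difference between the two updates can be sketched in a few lines of Python. The function names and the sample numbers are invented for illustration. In the classical update the distance moved scales with the gradient's magnitude, while in the normalized update the distance moved is (up to rounding) just the step size.

```python
import math


def classical_step(x, grad, step_size):
    """Classical update: x - step_size * grad."""
    return [xi - step_size * gi for xi, gi in zip(x, grad)]


def normalized_step(x, grad, step_size, eps=1e-12):
    """Normalized update: x - step_size * grad / ||grad||."""
    norm = math.sqrt(sum(gi * gi for gi in grad))
    if norm < eps:
        return list(x)  # gradient is essentially zero; do not divide by it
    return [xi - step_size * gi / norm for xi, gi in zip(x, grad)]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


x = [0.0, 0.0]
grad = [3.0, 4.0]  # made-up gradient with norm 5

classical = classical_step(x, grad, 0.1)
normalized = normalized_step(x, grad, 0.1)

classical_length = distance(x, classical)    # about 0.5 = step_size * ||grad||
normalized_length = distance(x, normalized)  # about 0.1 = step_size
```

So the step size alone does control the magnitude in the classical version, but only relative to the gradient norm at the current point; normalizing removes that dependence.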