lucida

There's a method called backtracking line search, used in implementations of gradient descent, that is based on the Armijo–Goldstein condition. Intuitively, the idea is to choose a step size that yields a decrease in the objective function close to what we should expect given the gradient of the function at our current location.

For example, if we look at the step from x0 to x1 in the graph above, backtracking line search would tell us that the step size is much too big: the gradient at x0 is quite steep, so an appropriately sized step should produce a much larger decrease in the objective function than the one we actually get by stepping from x0 to x1. With backtracking line search, we start off with a large step size and gradually shrink it (if necessary) until the decrease in the objective function is within some range of what we expect based on the gradient.
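
To make that concrete, here is a minimal sketch (not taken from any particular implementation) of backtracking line search along the steepest-descent direction; the names f, grad_f and the constants alpha0, c, rho are all placeholder choices:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, alpha0=1.0, c=1e-4, rho=0.5):
    """Shrink the step size until the Armijo (sufficient decrease) condition holds.

    f      : objective function
    grad_f : gradient of f
    x      : current iterate (numpy array)
    alpha0 : initial, deliberately large step size
    c      : fraction of the gradient-predicted decrease we insist on (0 < c < 1)
    rho    : factor by which the step shrinks when it is too big (0 < rho < 1)
    """
    g = grad_f(x)
    alpha = alpha0
    # Armijo condition for the steepest-descent direction -g:
    #   f(x - alpha * g) <= f(x) - c * alpha * ||g||^2
    # If the actual decrease falls short of this, the step is too big: shrink it.
    while f(x - alpha * g) > f(x) - c * alpha * np.dot(g, g):
        alpha *= rho
    return alpha
```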

kmcrane

If I remember correctly, backtracking on the Armijo–Goldstein condition isn't guaranteed to find a local minimum, is it? There are other schemes (e.g., interval search on the Armijo–Wolfe conditions) that guarantee you always converge at least to a local minimum, which is a useful guarantee to be able to make in a practical solver (i.e., that the solver eventually stops somewhere useful! :-))

lucida

@kmcrane Yes, agreed, backtracking line search is not guaranteed to find a local minimum; it just helps you choose productive steps towards one :)

When I used backtracking line search for a multinomial logistic regression, I had an error tolerance threshold epsilon. Once the loss function was close enough to zero (within this predefined epsilon), I terminated the gradient descent.
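
A minimal sketch of that kind of termination check, reusing the backtracking_line_search helper sketched above (eps and max_iters are placeholder values, and terminating on the loss like this only makes sense when the optimal loss is itself near zero):

```python
def gradient_descent(f, grad_f, x0, eps=1e-6, max_iters=1000):
    """Gradient descent with backtracking line search and a loss-based stopping rule.

    Stops once the loss f(x) is within eps of zero (assuming the optimal loss
    is near zero), or after max_iters iterations as a safeguard.
    """
    x = x0
    for _ in range(max_iters):
        if f(x) <= eps:                        # loss close enough to zero: done
            break
        alpha = backtracking_line_search(f, grad_f, x)
        x = x - alpha * grad_f(x)              # take the accepted step
    return x
```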