Correct me if I am wrong: MH (Metropolis-Hastings) is basically walking through samples, but only in the direction of increasing (value? brightness? probability?). What if we are at a local max?
That's exactly the problem ergodicity addresses. We can picture a "local maximum" as the maximal node of a connected component in a directed graph whose edges point toward increasing brightness. You're correct that if we only ever sampled in the direction of increasing brightness, we would never escape that connected component.
What we do to combat this (and ensure the ergodicity of our system) is to occasionally jump arbitrarily: we explicitly stop following the direction of increasing sample value and move to some new random node in the graph. If we do this often enough, over time we will have visited many different components, so we don't concentrate too heavily on any one local maximum.
MH doesn't only walk toward larger values; it merely tends to move toward larger values more often than toward smaller ones. There is still some chance it will go from a large value to a small one. Think about what happens in the "if" statement (the acceptance test): even if alpha, the acceptance probability, is very small, there is still a finite, nonzero chance that we'll move to the new point.
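To make that concrete, here is a minimal sketch of random-walk Metropolis, the special case of MH with a symmetric proposal, where the acceptance probability reduces to alpha = min(1, p(x_new) / p(x)). The bimodal target, the step size, and the function names are hypothetical choices for illustration, not anything from the original discussion. The chain starts exactly on one local maximum (x = -2), yet it still reaches the other mode, because downhill proposals are occasionally accepted.

```python
import math
import random


def target_density(x: float) -> float:
    """Unnormalized bimodal density: two Gaussian bumps centered at -2 and +2.

    Hypothetical example target, chosen only because it has two local maxima.
    """
    return math.exp(-0.5 * (x + 2.0) ** 2) + math.exp(-0.5 * (x - 2.0) ** 2)


def metropolis(n_steps: int, x0: float, step: float, seed: int = 0):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    Returns the list of samples and how many accepted moves went downhill
    (to a point of strictly lower density than the current one).
    """
    rng = random.Random(seed)
    x = x0
    p_x = target_density(x)
    samples = []
    downhill_accepted = 0
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)  # symmetric proposal around x
        p_new = target_density(x_new)
        alpha = min(1.0, p_new / p_x)     # acceptance probability
        # The "if" statement: uphill moves (alpha == 1) always pass, while
        # downhill moves pass with a small but nonzero probability alpha.
        if rng.random() < alpha:
            if p_new < p_x:
                downhill_accepted += 1
            x, p_x = x_new, p_new
        samples.append(x)  # on rejection, the current point is repeated
    return samples, downhill_accepted


if __name__ == "__main__":
    # Start exactly on the left-hand local maximum.
    samples, downhill = metropolis(n_steps=50_000, x0=-2.0, step=1.5)
    frac_right = sum(s > 0 for s in samples) / len(samples)
    print(f"downhill moves accepted: {downhill}")
    print(f"fraction of samples in the right-hand mode: {frac_right:.2f}")
```

Because the two bumps have equal weight, the fraction of samples in the right-hand mode should end up roughly around one half, even though the chain never starts there. If the "if" statement only accepted uphill moves, the chain would sit on the mode at -2 forever.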