Why is it the best possible approximation? Don't we have all the directions to go at any point?
Something can only be “best” with respect to some chosen measure of quality. Although the Taylor series is very natural, it does not traditionally arise directly from looking for the “best” function in an explicit sense. However, one can study the sense in which it is formally best; see for instance this discussion.
I may be missing something, but is this essentially just the first two terms of the Taylor series (since the gradient is made up of the partial derivatives)?
@intelligentDungBeetle Correct. (Though in general you can't express the gradient as a list of partial derivatives; see our discussion on the $L^2$ gradient, for instance.)
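For concreteness, here is a sketch of the first-order expansion being discussed. It assumes the finite-dimensional case, with $f:\mathbb{R}^n \to \mathbb{R}$ differentiable at $x$ and the standard inner product; the symbols $f$, $x$, $h$ are notation chosen here for illustration, not taken from the thread.

$$
f(x + h) = f(x) + \langle \nabla f(x),\, h \rangle + o(\|h\|) \quad \text{as } h \to 0,
\qquad
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right).
$$

One sense in which this is "best": $h \mapsto \langle \nabla f(x), h \rangle$ is the only linear map for which the remainder is $o(\|h\|)$. With a different inner product, or in a function space such as $L^2$, the coordinate formula for $\nabla f$ no longer applies as written, which is the caveat in the reply above.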