Gauss-Newton and conjugate gradient methods use derivatives. When Mathematica cannot compute symbolic derivatives, finite differences will be used. Computing derivatives with finite differences can impose a significant cost in some cases and certainly affects the reliability of derivatives, ultimately having an effect on how good an approximation to the minimum is achievable. For functions where symbolic derivatives are not available, an alternative is to use a derivative-free algorithm, where an approximate model is built up using only values from function evaluations.
Mathematica uses the principal axis method of Brent [Br02] as a derivative-free algorithm. For an -variable problem, take a set of search directions and a point . Take to be the point that minimizes along the direction from (i.e. do a line search from ), then replace with . At the end, replace with . Ideally, the new should be linearly independent, so that a new iteration could be undertaken, but in practice, they are not. Brent's algorithm involves using the singular value decomposition (SVD) on the matrix to realign them to the principal directions for the local quadratic model. (An eigen decomposition could be used, but Brent shows that the SVD is more efficient.) With the new set of obtained, another iteration can be done.
Two distinct starting conditions in each variable are required for this method because these are used to define the magnitudes of the vectors . In fact, whenever you specify two starting conditions in each variable, FindMinimum, FindMaximum, and FindFit will use the principal axis algorithm by default.
The basics of the search algorithm can be seen quite well from the plot since the derivative-free line search algorithm requires a substantial number of function evaluations. First a line search is done in the direction, then from that point, a line search is done in the direction, determining the step direction. Once the step is taken, the vectors are realigned appropriately to the principal directions of the local quadratic approximation and the next step is similarly computed.
The algorithm is efficient in terms of convergence rate; it has quadratic convergence in terms of steps. However, in terms of function evaluations, it is quite expensive because of the derivative-free line search required. Note that since the directions given to the line search (especially at the beginning) are not necessarily descent directions, the line search has to be able to search in both directions. For problems with many variables, the individual linear searches in all directions become very expensive, so this method is typically better suited to problems without too many variables.