Principal Axis Method
Gauss-Newton and conjugate gradient methods use derivatives. When
Mathematica cannot compute symbolic derivatives, finite differences are used instead. Computing derivatives with finite differences can impose a significant cost in some cases, and it affects the reliability of the derivatives, which in turn limits how good an approximation to the minimum is achievable. For functions where symbolic derivatives are not available, an alternative is to use a derivative-free algorithm, in which an approximate model is built up using only values from function evaluations.
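To make the cost and accuracy trade-off concrete, here is a minimal sketch (plain NumPy, not Mathematica's implementation) of a forward-difference gradient: each gradient costs n extra function evaluations, and the step size h limits accuracy to far less than machine precision. The test function is the one used in the example later in this section.

```python
import numpy as np

def fd_gradient(f, x, h=1e-7):
    """Forward-difference gradient: n extra evaluations of f per call,
    with accuracy limited to roughly sqrt(machine epsilon)."""
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(len(x)):
        xh = x.copy()
        xh[i] += h          # perturb one coordinate at a time
        g[i] = (f(xh) - fx) / h
    return g

# the example function from this section: cos(x^2 - 3y) + sin(x^2 + y^2)
f = lambda v: np.cos(v[0]**2 - 3*v[1]) + np.sin(v[0]**2 + v[1]**2)
print(fd_gradient(f, np.array([1.0, 1.0])))
```

The result agrees with the symbolic gradient only to about half the available digits, which is why finite differences affect how accurately a minimum can be located.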
Mathematica uses the principal axis method of Brent [Br02] as a derivative-free algorithm. For an n-variable problem, take a set of search directions u_{1}, u_{2}, ..., u_{n} and a point x_{0}. Take x_{i} to be the point that minimizes f along the direction u_{i} from x_{i-1} (i.e., do a "line search" from x_{i-1}), then replace u_{i} with u_{i+1}. At the end, replace u_{n} with x_{n} - x_{0}. Ideally, the new u_{i} would be linearly independent, so that a new iteration could be undertaken, but in practice they are not. Brent's algorithm applies the singular value decomposition (SVD) to the matrix U = (u_{1}, u_{2}, ..., u_{n}) to realign the directions to the principal directions of the local quadratic model. (An eigendecomposition could be used instead, but Brent shows that the SVD is more efficient.) With the new set of u_{i} obtained, another iteration can be done.
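The sweep-and-realign structure just described can be sketched in a few lines. This is not Mathematica's implementation; it is a hypothetical NumPy/SciPy sketch that uses `scipy.optimize.minimize_scalar` as the derivative-free line search and NumPy's SVD for the realignment step.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(v):
    x, y = v
    return np.cos(x**2 - 3*y) + np.sin(x**2 + y**2)

def principal_axis_step(f, x0, U):
    """One sweep: line-search along each column of U from the current
    point, replace the last direction with the net displacement, and
    realign the directions via the SVD (as in Brent's method)."""
    x = x0.copy()
    for i in range(U.shape[1]):
        u = U[:, i]
        t = minimize_scalar(lambda s: f(x + s * u)).x  # derivative-free 1-D search
        x = x + t * u
    U[:, -1] = x - x0                 # replace u_n with x_n - x_0
    W, sv, _ = np.linalg.svd(U)       # U = W diag(sv) V^T
    return x, W * sv                  # principal directions, scaled by singular values

x0 = np.array([1.0, 1.0])
x1, U1 = principal_axis_step(f, x0, np.eye(2))
print(x1, f(x1))
```

After the realignment the columns of the returned direction matrix are mutually orthogonal, so the next sweep searches along the principal axes of the local quadratic approximation rather than the original coordinate axes.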
Two distinct starting conditions in each variable are required for this method because these are used to define the magnitudes of the vectors
u_{i}. In fact, whenever you specify two starting conditions in each variable,
FindMinimum,
FindMaximum, and
FindFit will use the principal axis algorithm by default.
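As a rough illustration (the exact scaling rule is internal to FindMinimum, so this is an assumption), one can picture the two starting values per variable as fixing both the initial coordinate directions and their magnitudes:

```python
import numpy as np

# Hypothetical illustration: two starting values per variable, e.g.
# {x, 1, 1.2} and {y, 1, 0.8} in FindMinimum's starting-value notation.
a = np.array([1.0, 1.0])   # first starting values
b = np.array([1.2, 0.8])   # second starting values
U0 = np.diag(b - a)        # coordinate directions scaled by the differences
print(U0)
```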
This loads a package that contains some utility functions. 
This shows the search path and function evaluations for FindMinimum to find a local minimum of the function cos(x^{2} - 3y) + sin(x^{2} + y^{2}).
The basics of the search algorithm can be seen quite well from the plot, since the derivative-free line search algorithm requires a substantial number of function evaluations. First a line search is done in the
x direction, then from that point, a line search is done in the
y direction, determining the step direction. Once the step is taken, the vectors
u_{i} are realigned appropriately to the principal directions of the local quadratic approximation and the next step is similarly computed.
The algorithm is efficient in terms of convergence rate; it has quadratic convergence in terms of steps. However, in terms of function evaluations, it is quite expensive because of the derivative-free line searches required. Note that since the directions given to the line search (especially at the beginning) are not necessarily descent directions, the line search has to be able to search in both directions. For problems with many variables, the individual line searches in all directions become very expensive, so this method is typically better suited to problems without too many variables.
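To see why the derivative-free line search dominates the cost, consider golden-section search, a standard derivative-free 1-D method used here as a stand-in for the actual line search (which Mathematica does not document in detail): each interval reduction costs one function evaluation, so locating a minimizer to six digits takes a few dozen evaluations, and every step of the method needs n such searches.

```python
import numpy as np

def golden_section(g, a, b, tol=1e-6):
    """Derivative-free 1-D minimization on [a, b]; returns the
    approximate minimizer and the number of function evaluations."""
    invphi = (np.sqrt(5) - 1) / 2          # 1/phi ~ 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    gc, gd = g(c), g(d)
    n_eval = 2
    while b - a > tol:
        if gc < gd:                        # minimum lies in [a, d]
            b, d, gd = d, c, gc
            c = b - invphi * (b - a)
            gc = g(c)
        else:                              # minimum lies in [c, b]
            a, c, gc = c, d, gd
            d = a + invphi * (b - a)
            gd = g(d)
        n_eval += 1                        # one new evaluation per shrink
    return (a + b) / 2, n_eval

xmin, n = golden_section(lambda t: (t - 0.3)**2, -1.0, 1.0)
print(xmin, n)   # minimizer near 0.3 after a few dozen evaluations
```

Multiplying this per-search cost by n searches per step explains why the method is better suited to problems without too many variables.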