Principal Axis Method
and conjugate gradient
methods use derivatives. When Mathematica
cannot compute symbolic derivatives, finite differences will be used. Computing derivatives with finite differences can impose a significant cost in some cases and certainly affects the reliability of derivatives, ultimately having an effect on how good an approximation to the minimum is achievable. For functions where symbolic derivatives are not available, an alternative is to use a derivative-free algorithm, where an approximate model is built up using only values from function evaluations.
uses the principal axis method of Brent [Br02
] as a derivative-free algorithm. For an n
-variable problem, take a set of search directions u1, u2, ..., un
and a point x0
. Take xi
to be the point that minimizes f
along the direction ui
(i.e., do a "line search
" from xi-1
), then replace ui
. At the end, replace un
with xn- x0
. Ideally, the new ui
should be linearly independent, so that a new iteration could be undertaken, but in practice, they are not. Brent's algorithm involves using the singular value decomposition (SVD) on the matrix U= (u1, u2, ... un)
to realign them to the principal directions for the local quadratic model. (An eigen decomposition could be used, but Brent shows that the SVD is more efficient.) With the new set of ui
obtained, another iteration can be done.
Two distinct starting conditions in each variable are required for this method because these are used to define the magnitudes of the vectors ui
. In fact, whenever you specify two starting conditions in each variable, FindMinimum
, and FindFit
will use the principal axis algorithm by default.
This loads a package that contains some utility functions.
This shows the search path and function evaluations for FindMinimum
to find a local minimum of the function cos (x2- 3y) + sin (x2+y2)
The basics of the search algorithm can be seen quite well from the plot since the derivative-free line search algorithm requires a substantial number of function evaluations. First a line search is done in the x
direction, then from that point, a line search is done in the y
direction, determining the step direction. Once the step is taken, the vectors ui
are realigned appropriately to the principal directions of the local quadratic approximation and the next step is similarly computed.
The algorithm is efficient in terms of convergence rate; it has quadratic convergence in terms of steps. However, in terms of function evaluations, it is quite expensive because of the derivative-free line search required. Note that since the directions given to the line search (especially at the beginning) are not necessarily descent directions, the line search has to be able to search in both directions. For problems with many variables, the individual linear searches in all directions become very expensive, so this method is typically better suited to problems without too many variables.