Nonlinear Conjugate Gradient Methods
The basis for a nonlinear conjugate gradient method is to effectively apply the linear conjugate gradient method, with the residual replaced by the gradient. A model quadratic function is never explicitly formed, so the method is always combined with a line search.
The first nonlinear conjugate gradient method was proposed by Fletcher and Reeves as follows. Given a step direction p_k, use the line search to find α_k such that x_(k+1) = x_k + α_k p_k. Then compute

    β_(k+1) = (∇f(x_(k+1))·∇f(x_(k+1))) / (∇f(x_k)·∇f(x_k)),    p_(k+1) = -∇f(x_(k+1)) + β_(k+1) p_k.    (2)
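On a quadratic model with an exact line search, the Fletcher-Reeves recurrence reduces to the linear conjugate gradient method. The following sketch shows the update from equation (2) in that setting; it is in Python rather than the Wolfram Language, and the function name `fletcher_reeves_quadratic` is illustrative, not a Mathematica API:

```python
import numpy as np

def fletcher_reeves_quadratic(A, b, x0, tol=1e-10, max_iter=None):
    """Fletcher-Reeves CG on f(x) = 0.5 x^T A x - b^T x, with A symmetric
    positive definite.  An exact line search is available in closed form
    on a quadratic, so this reduces to the linear CG method."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                        # gradient of the quadratic model
    p = -g                               # first direction: steepest descent
    if max_iter is None:
        max_iter = 10 * len(b)
    k = 0
    while np.linalg.norm(g) > tol and k < max_iter:
        alpha = -(g @ p) / (p @ (A @ p))     # exact minimizer along p
        x = x + alpha * p
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves formula (2)
        p = -g_new + beta * p
        g = g_new
        k += 1
    return x, k
```

For a symmetric positive definite 3x3 matrix, the iteration terminates in at most 3 steps, matching the n-step convergence property of the linear conjugate gradient method.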
It is essential that the line search for choosing α_k satisfies the strong Wolfe conditions; this is necessary to ensure that the directions p_k are descent directions [NW99].
An alternate method, which generally (but not always) works better in practice, is that of Polak and Ribière, where equation (2) is replaced with

    β_(k+1) = (∇f(x_(k+1))·(∇f(x_(k+1)) - ∇f(x_k))) / (∇f(x_k)·∇f(x_k)).    (3)

In formula (3), it is possible for β_(k+1) to become negative, in which case Mathematica uses the modified update p_(k+1) = -∇f(x_(k+1)) + max(β_(k+1), 0) p_k. In Mathematica, the default conjugate gradient method is Polak-Ribière, but the Fletcher-Reeves method can be chosen by using the method option "Method" -> "FletcherReeves".
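A minimal sketch of the two β formulas, assuming Python with NumPy (the function names are illustrative, not Mathematica APIs), shows why the max(β, 0) safeguard matters: the Polak-Ribière β can become negative, while the Fletcher-Reeves β cannot:

```python
import numpy as np

def beta_fletcher_reeves(g_new, g_old):
    # equation (2): a ratio of squared gradient norms, always >= 0
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere_plus(g_new, g_old):
    # equation (3) with the max(beta, 0) modification: a negative beta
    # is clamped to 0, which makes the next step steepest descent
    beta = (g_new @ (g_new - g_old)) / (g_old @ g_old)
    return max(beta, 0.0)

# Example gradients for which the raw Polak-Ribiere beta is negative
# (here -0.2) while the Fletcher-Reeves beta is positive (0.1):
g_old = np.array([1.0, 0.0])
g_new = np.array([0.3, 0.1])
```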
The advantage of conjugate gradient methods is that they use relatively little memory for large-scale problems and require no numerical linear algebra, so each step is quite fast. The disadvantage is that they typically converge much more slowly than Newton methods. Also, steps are typically poorly scaled for length, so the line search algorithm may require more iterations each time to find an acceptable step.
This loads a package that contains some utility functions.
This shows a plot of the steps taken by the nonlinear conjugate gradient method. The path is much less direct than for Newton's method.
One issue that arises with nonlinear conjugate gradient methods is when to restart them. As the search moves, the nature of the local quadratic approximation to the function may change substantially. The local convergence of the method depends on that of the linear conjugate gradient method, where the quadratic function is constant. With a constant quadratic function of n variables and an exact line search, the linear algorithm will converge in n or fewer iterations. By restarting (taking a steepest descent step with β_(k+1) = 0) every so often, it is possible to eliminate information from previous points, which may not be relevant to the local quadratic model at the current search point. If you look carefully at the example, you can see where the method was restarted and a steepest descent step was taken. One option is simply to restart after every k iterations, where k <= n. You can specify this using the method option "RestartIterations" -> k.
An alternative is to restart when consecutive gradients are not sufficiently orthogonal according to the test

    |∇f(x_(k+1))·∇f(x_k)| / (∇f(x_(k+1))·∇f(x_(k+1))) < ν,

with a threshold ν between 0 and 1. You can specify this using the method option "RestartThreshold" -> ν.
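The orthogonality-based restart criterion can be sketched as a small Python helper. This is illustrative only: the name `should_restart` is hypothetical, and the default ν = 1/10 is an assumption mirroring the "RestartThreshold" default:

```python
import numpy as np

def should_restart(g_new, g_old, nu=0.1):
    """Restart with a steepest-descent step when consecutive gradients
    are not sufficiently orthogonal, i.e. when
    |g_new . g_old| / (g_new . g_new) >= nu."""
    return abs(g_new @ g_old) / (g_new @ g_new) >= nu
```

Exactly orthogonal consecutive gradients never trigger a restart; nearly parallel ones do.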
The table summarizes the options you can use with the conjugate gradient methods.
| option | default value | description |
| --- | --- | --- |
| "Method" | "PolakRibiere" | nonlinear conjugate gradient method: can be "PolakRibiere" or "FletcherReeves" |
| "RestartThreshold" | 1/10 | threshold ν for gradient orthogonality below which a restart will be done |
| "RestartIterations" | ∞ | number of iterations after which to restart |
| "StepControl" | "LineSearch" | must be "LineSearch", but you can use this to specify line search methods |
Method options for Method->"ConjugateGradient".
It should be noted that the default method for FindMinimum in version 4 was a conjugate gradient method with a near-exact line search. This has been maintained for legacy reasons and can be accessed by using the FindMinimum option Method -> "Gradient". Typically, this will use more function and gradient evaluations than the newer Method -> "ConjugateGradient", which itself often uses far more than the methods that Mathematica currently uses as defaults.