This is documentation for Mathematica 6, which was
based on an earlier version of the Wolfram Language.
View current documentation (Version 11.1)

Nonlinear Conjugate Gradient Methods

The basis for a nonlinear conjugate gradient method is to effectively apply the linear conjugate gradient method, where the residual is replaced by the gradient. A model quadratic function is never explicitly formed, so it is always combined with a "line search" method.
The first nonlinear conjugate gradient method was proposed by Fletcher and Reeves as follows. Given a step direction pk, use the line search to find k such that xk+1 = xk + kpk. Then compute
It is essential that the line search for choosing k satisfies the strong Wolfe conditions; this is necessary to ensure that the directions pk are descent directions [NW99]].
An alternate method, which generally (but not always) works better in practice, is that of Polak and Ribiere, where equation (2) is replaced with
In formula (3), it is possible that k+1 can become negative, in which case Mathematica uses the algorithm modified by using pk+1=max (k+1, 0) pk-f (xk+1). In Mathematica, the default conjugate gradient method is Polak-Ribiere, but the Fletcher-Reeves method can be chosen by using the method option
Method→{"ConjugateGradient", Method→"FletcherReeves"}
The advantage of conjugate gradient methods is that they use relatively little memory for large-scale problems and require no numerical linear algebra, so each step is quite fast. The disadvantage is that they typically converge much more slowly than Newton or quasi-Newton methods. Also, steps are typically poorly scaled for length, so the "line search" algorithm may require more iterations each time to find an acceptable step.
This loads a package that contains some utility functions.
Click for copyable input
This shows a plot of the steps taken by the nonlinear conjugate gradient method. The path is much less direct than for Newton's method.
Click for copyable input
One issue that arises with nonlinear conjugate gradient methods is when to restart them. As the search moves, the nature of the local quadratic approximation to the function may change substantially. The local convergence of the method depends on that of the linear conjugate gradient method, where the quadratic function is constant. With a constant quadratic function for n variables and an exact line search, the linear algorithm will converge in n or fewer iterations. By restarting (taking a steepest descent step with k+1 = 0) every so often, it is possible to eliminate information from previous points, which may not be relevant to the local quadratic model at the current search point. If you look carefully at the example, you can see where the method was restarted and a steepest descent step was taken. One option is to simply restart after every k iterations, where k <= n. You can specify this using the method option "RestartIterations"→k. An alternative is to restart when consecutive gradients are not sufficiently orthogonal according to the test
with a threshold between 0 and 1. You can specify this using the method option "RestartThreshold"→.
The table summarizes the options you can use with the conjugate gradient methods.
option name
default value
"Method""PolakRibiere"Nonlinear conjugate gradient method can be "PolakRibiere" or "FletcherReeves"
"RestartThreshold"1/10Threshold for gradient orthogonality below which a restart will be done
"RestartIterations"Number of iterations to restart after
"StepControl""LineSearch"Must be "LineSearch", but you can use this to specify line search methods

Method options for Method->"ConjugateGradient".

It should be noted that the default method for FindMinimum in Mathematica 4 was a conjugate gradient method with a near exact line search. This has been maintained for legacy reasons and can be accessed by using the FindMinimum option Method->"Gradient". Typically, this will use more function and gradient evaluations than the newer Method->"ConjugateGradient", which itself often uses far more than the methods that Mathematica currently uses as defaults.