Nonlinear Conjugate Gradient Methods

The basis for a nonlinear conjugate gradient method is to effectively apply the linear conjugate gradient method, where the residual is replaced by the gradient. A model quadratic function is never explicitly formed, so it is always combined with a line search method.

The first nonlinear conjugate gradient method was proposed by Fletcher and Reeves as follows. Given a step direction , use the line search to find such that . Then compute

It is essential that the line search for choosing satisfies the strong Wolfe conditions; this is necessary to ensure that the directions are descent directions [NW99].

An alternate method, which generally (but not always) works better in practice, is that of Polak and Ribiere, where equation (2) is replaced with

In formula (3), it is possible that can become negative, in which case the Wolfram Language uses the algorithm modified by using . In the Wolfram Language, the default conjugate gradient method is PolakRibiere, but the FletcherReeves method can be chosen by using the method option


The advantage of conjugate gradient methods is that they use relatively little memory for large-scale problems and require no numerical linear algebra, so each step is quite fast. The disadvantage is that they typically converge much more slowly than Newton or quasi-Newton methods. Also, steps are typically poorly scaled for length, so the line search algorithm may require more iterations each time to find an acceptable step.

This loads a package that contains some utility functions:
Click for copyable input
This shows a plot of the steps taken by the nonlinear conjugate gradient method. The path is much less direct than for Newton's method:
Click for copyable input

One issue that arises with nonlinear conjugate gradient methods is when to restart them. As the search moves, the nature of the local quadratic approximation to the function may change substantially. The local convergence of the method depends on that of the linear conjugate gradient method, where the quadratic function is constant. With a constant quadratic function for n variables and an exact line search, the linear algorithm will converge in n or fewer iterations. By restarting (taking a steepest descent step with ) every so often, it is possible to eliminate information from previous points, which may not be relevant to the local quadratic model at the current search point. If you look carefully at the example, you can see where the method was restarted and a steepest descent step was taken. One option is to simply restart after every k iterations, where k<=n. You can specify this using the method option "RestartIterations"->k. An alternative is to restart when consecutive gradients are not sufficiently orthogonal according to the test

with a threshold ν between 0 and 1. You can specify this using the method option "RestartThreshold"->ν.

The table summarizes the options you can use with the conjugate gradient methods.

option name
default value
"Method""PolakRibiere"nonlinear conjugate gradient method can be "PolakRibiere" or "FletcherReeves"
"RestartThreshold"1/10threshold ν for gradient orthogonality below which a restart will be done
"RestartIterations"number of iterations after which to restart
"StepControl""LineSearch"must be "LineSearch", but you can use this to specify line search methods

Method options for Method->"ConjugateGradient".

It should be noted that the default method for FindMinimum in version 4 was a conjugate gradient method with a near exact line search. This has been maintained for legacy reasons and can be accessed by using the FindMinimum option Method->"Gradient". Typically, this will use more function and gradient evaluations than the newer Method->"ConjugateGradient", which itself often uses far more than the methods that the Wolfram Language currently uses as defaults.