Quasi-Newton Methods

There are many variants of quasi-Newton methods. In all of them, the idea is to base the matrix in the quadratic model

on an approximation of the Hessian matrix built up from the function and gradient values from some or all steps previously taken.

This loads a package that contains some utility functions.
In[1]:=
Click for copyable input
This shows a plot of the steps taken by the quasi-Newton method. The path is much less direct than for Newtons method. The quasi-Newton method is used by default by FindMinimum for problems that are not sums of squares.
In[2]:=
Click for copyable input
Out[2]=

The first thing to notice about the path taken in this example is that it starts in the wrong direction. This direction is chosen because at the first step all the method has to go by is the gradient, and so it takes the direction of steepest descent. However, in subsequent steps, it incorporates information from the values of the function and gradient at the steps taken to build up an approximate model of the Hessian.

The methods used by the Wolfram Language are the BroydenFletcherGoldfarbShanno (BFGS) updates and, for large systems, the limited-memory BFGS (L-BFGS) methods, in which the model is not stored explicitly, but rather is calculated by gradients and step directions stored from past steps.

The BFGS method is implemented such that instead of forming the model Hessian at each step, Cholesky factors such that are computed so that only operations are needed to solve the system [DS96] for a problem with variables.

For large-scale sparse problems, the BFGS method can be problematic because, in general, the Cholesky factors (or the Hessian approximation or its inverse) are dense, so the memory and operations requirements become prohibitive compared to algorithms that take advantage of sparseness. The L-BFGS algorithm [NW99] forms an approximation to the inverse Hessian based on the last past steps, which are stored. The Hessian approximation may not be as complete, but the memory and order of operations are limited to for a problem with variables. In the Wolfram Language, for problems over 250 variables, the algorithm is switched automatically to L-BFGS. You can control this with the method option "StepMemory"->m. With , the full BFGS method will always be used. Choosing an appropriate value of is a tradeoff between speed of convergence and the work done per step. With , you are most likely better off using a "conjugate gradient" algorithm.

This shows the same example function with the minimum computed using L-BFGS with .
In[3]:=
Click for copyable input

Out[3]=

Quasi-Newton methods are chosen as the default in the Wolfram Language because they are typically quite fast and do not require computation of the Hessian matrix, which can be quite expensive both in terms of the symbolic computation and numerical evaluation. With an adequate line search, they can be shown to converge superlinearly [NW99] to a local minimum where the Hessian is positive definite. This means that

or, in other words, the steps keep getting smaller. However, for very high precision, this does not compare to the -quadratic convergence rate of Newton's method.

This shows the number of steps and function evaluations required to find the minimum to high precision for the problem shown.
In[4]:=
Click for copyable input
Out[4]=

Newton's method is able to find ten times as many digits with far fewer steps because of its quadratic convergence rate. However, the convergence with the quasi-Newton method is still superlinear since the ratio of the errors is clearly going to zero.

This makes a plot showing the ratios of the errors in the computation. The ratios of the errors are shown on a logarithmic scale so that the trend can clearly be seen over a large range of magnitudes.
In[5]:=
Click for copyable input
Out[5]=

The following table summarizes the options you can use with quasi-Newton methods.

option name
default value
"StepMemory"Automaticthe effective number of steps to "remember" in the Hessian approximation; can be a positive integer or Automatic
"StepControl""LineSearch"how to control steps; can be or None

Method options for Method->"QuasiNewton".