Gauss–Newton Methods

For minimization problems for which the objective function is a sum of squares,

    f(x) = (1/2) Σ_i r_i(x)^2 = (1/2) r(x).r(x),

it is often advantageous to use the special structure of the problem. Time and effort can be saved by computing the residual function r(x) and its derivative, the Jacobian J(x). The Gauss–Newton method is an elegant way to do this. Rather than using the complete second-order Hessian matrix for the quadratic model, the Gauss–Newton method uses B_k = J_k^T.J_k in (1), so that the step s_k is computed from the formula

    J_k^T.J_k.s_k = -J_k^T.r_k,

where r_k = r(x_k), J_k = J(x_k), and so on. Note that J^T.J is only an approximation to the full Hessian, which is J^T.J + Σ_i r_i(x) ∇²r_i(x). In the zero-residual case, where r(x) = 0 at the minimum, or when r(x) varies nearly as a linear function near the minimum point, the approximation to the Hessian is quite good and the quadratic convergence of Newton's method is commonly observed.

Objective functions that are sums of squares are quite common; in fact, this is the form of the objective function when FindFit is used with the default value of the NormFunction option. One way to view the Gauss–Newton method is in terms of least-squares problems. Solving for the Gauss–Newton step is the same as solving a linear least-squares problem, so applying a Gauss–Newton method is, in effect, applying a sequence of linear least-squares fits to a nonlinear function. With this view, it makes sense that this method is particularly appropriate for the sort of nonlinear fitting that FindFit does.
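To make the least-squares view concrete, the following sketch carries out a single Gauss–Newton step for Rosenbrock's function by hand. The residual r, Jacobian j, and starting point used here are illustrative choices for this sketch, not inputs taken from the examples below.

    (* residual whose half sum of squares is Rosenbrock's function *)
    r[{x_, y_}] := Sqrt[2] {10 (y - x^2), 1 - x};
    (* Jacobian of the residual *)
    j[{x_, y_}] := Sqrt[2] {{-20 x, 10}, {-1, 0}};
    x0 = {-1.2, 1.};
    (* the Gauss-Newton step is the solution of a linear least-squares problem *)
    x1 = x0 + LeastSquares[j[x0], -r[x0]]

Iterating this update, with suitable step control, is exactly what the Gauss–Newton method does.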

This loads a package that contains some utility functions.
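The original input is not reproduced here; assuming the standard Unconstrained Problems Package context name, loading it would look like this.

    (* load the utility functions used in the following examples *)
    << Optimization`UnconstrainedProblems`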
This uses the Unconstrained Problems Package to set up the classic Rosenbrock function, which has a narrow curved valley.
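A sketch of this step, assuming the package provides the GetFindMinimumProblem utility and a predefined Rosenbrock problem as described in the Unconstrained Problems Package documentation:

    (* set up Rosenbrock's function as a FindMinimum problem object *)
    p = GetFindMinimumProblem[Rosenbrock]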

When the Wolfram Language encounters a problem that is expressly a sum of squares, such as the Rosenbrock example, or a function that is the dot product of a vector with itself, the Gauss–Newton method will be used automatically.

This shows the steps taken by FindMinimum with the Gauss–Newton method for Rosenbrock's function using a trust region method for step control.
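A plausible reconstruction of this step, assuming the package's FindMinimumPlot utility and the problem object p set up above:

    (* plot the search path of the Gauss-Newton method on Rosenbrock's function *)
    FindMinimumPlot[p, Method -> "GaussNewton"]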

If you compare this with the same example done with Newton's method, you can see that it was done with fewer steps and evaluations because the Gauss–Newton method is taking advantage of the special structure of the problem. The convergence rate near the minimum is just as good as for Newton's method because the residual is zero at the minimum.

The Levenberg–Marquardt method is a Gauss–Newton method with trust region step control (though it was originally proposed before the general notion of trust regions had been developed). You can request this method specifically by using the FindMinimum option Method->"LevenbergMarquardt" or equivalently Method->"GaussNewton".
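For example, the method can be requested directly in a FindMinimum call; the starting point below is simply the conventional one for Rosenbrock's function.

    FindMinimum[(1 - x)^2 + 100 (y - x^2)^2, {{x, -1.2}, {y, 1.}},
      Method -> "LevenbergMarquardt"]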

Sometimes it is awkward to express a function so that it will explicitly be a sum of squares or a dot product of a vector with itself. In these cases, it is possible to use the "Residual" method option to specify the residual directly. Similarly, you can specify the derivative of the residual with the "Jacobian" method option. Note that when the residual is specified through the "Residual" method option, it is not checked for consistency with the first argument of FindMinimum; the values returned will depend on the value given through the option.

This finds the minimum of Rosenbrock's function using the specification of the residual.
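A sketch of such a call, with the residual chosen so that (1/2) r.r reproduces Rosenbrock's function (the particular form and starting point are illustrative):

    FindMinimum[(1 - x)^2 + 100 (y - x^2)^2, {{x, -1.2}, {y, 1.}},
      Method -> {"LevenbergMarquardt",
        "Residual" -> Sqrt[2] {10 (y - x^2), 1 - x}}]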
option name            default value
"Residual"             Automatic        allows you to directly specify the residual r(x) such that f(x) = (1/2) r(x).r(x)
"EvaluationMonitor"    Automatic        an expression that is evaluated each time the residual is evaluated
"Jacobian"             Automatic        allows you to specify the (matrix) derivative of the residual
"StepControl"          "TrustRegion"    must be "TrustRegion", but allows you to change control parameters through method options

Method options for Method->"LevenbergMarquardt".

Another natural way of setting up sums of squares problems in the Wolfram Language is with FindFit, which computes nonlinear fits to data. A simple example follows.

Here is a model function.
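The original model is not shown above; for the purposes of this walkthrough, assume a simple two-parameter exponential model.

    (* an illustrative model with parameters a and b and independent variable x *)
    model = a Exp[b x];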
Here is some data generated by the function with some random perturbations added.
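Continuing the sketch, the data can be generated by evaluating the model at chosen parameter values and adding small random perturbations; the parameter values and sampling range are illustrative.

    (* sample the model at a = 2, b = -1 and perturb each value slightly *)
    data = Table[{x, (model /. {a -> 2, b -> -1}) + RandomReal[{-0.1, 0.1}]},
       {x, 0., 3., 0.2}];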
This finds a nonlinear least-squares fit to the model function.
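The fit itself is a single call to FindFit; the starting values given here are illustrative.

    (* nonlinear least-squares fit of the model to the data *)
    fit = FindFit[data, model, {{a, 1}, {b, -1}}, x]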
This shows the fit model with the data.
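One way to display the fitted model together with the data, continuing the sketch above:

    Show[ListPlot[data], Plot[model /. fit, {x, 0, 3}]]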

In the example, FindFit internally constructs a residual function and Jacobian, which are in turn used by the Gauss–Newton method to find the minimum of the sum of squares, that is, the nonlinear least-squares fit. Of course, FindFit can be used with other methods, but because a residual function that evaluates rapidly can be constructed, the Gauss–Newton method is often faster than the other methods.