# Gauss–Newton Methods

For minimization problems for which the objective function is a sum of squares,

it is often advantageous to use the special structure of the problem. Time and effort can be saved by computing the residual function , and its derivative, the Jacobian . The Gauss–Newton method is an elegant way to do this. Rather than using the complete second-order Hessian matrix for the quadratic model, the Gauss–Newton method uses in (1) such that the step is computed from the formula

where , and so on. Note that this is an approximation to the full Hessian, which is . In the zero residual case, where is the minimum, or when varies nearly as a linear function near the minimum point, the approximation to the Hessian is quite good and the quadratic convergence of Newton's method is commonly observed.

Objective functions, which are sums of squares, are quite common, and, in fact, this is the form of the objective function when FindFit is used with the default value of the NormFunction option. One way to view the Gauss–Newton method is in terms of least-squares problems. Solving the Gauss–Newton step is the same as solving a linear least-squares problem, so applying a Gauss–Newton method is in effect applying a sequence of linear least-squares fits to a nonlinear function. With this view, it makes sense that this method is particularly appropriate for the sort of nonlinear fitting that FindFit does.

In[1]:= |

In[2]:= |

Out[2]= |

When the Wolfram Language encounters a problem that is expressly a sum of squares, such as the Rosenbrock example, or a function that is the dot product of a vector with itself, the Gauss–Newton method will be used automatically.

In[3]:= |

Out[3]= |

If you compare this with the same example done with Newton's method, you can see that it was done with fewer steps and evaluations because the Gauss–Newton method is taking advantage of the special structure of the problem. The convergence rate near the minimum is just as good as for Newton's method because the residual is zero at the minimum.

The Levenberg–Marquardt method is a Gauss–Newton method with trust region step control (though it was originally proposed before the general notion of trust regions had been developed). You can request this method specifically by using the FindMinimum option Method->"LevenbergMarquardt" or equivalently Method->"GaussNewton".

Sometimes it is awkward to express a function so that it will explicitly be a sum of squares or a dot product of a vector with itself. In these cases, it is possible to use the method option to specify the residual directly. Similarly, you can specify the derivative of the residual with the method option. Note that when the residual is specified through the method option, it is not checked for consistency with the first argument of FindMinimum. The values returned will depend on the value given through the option.

In[4]:= |

Out[4]= |

option name | default value | |

"Residual" | Automatic | allows you to directly specify the residual such that |

"EvaluationMonitor" | Automatic | an expression that is evaluated each time the residual is evaluated |

"Jacobian" | Automatic | allows you to specify the (matrix) derivative of the residual |

"StepControl" | "TrustRegion" | must be , but allows you to change control parameters through method options |

Method options for Method->"LevenbergMarquardt".

Another natural way of setting up sums of squares problems in the Wolfram Language is with FindFit, which computes nonlinear fits to data. A simple example follows.

In[5]:= |

In[6]:= |

In[7]:= |

Out[7]= |

In[8]:= |

Out[8]= |

In the example, FindFit internally constructs a residual function and Jacobian, which are in turn used by the Gauss–Newton method to find the minimum of the sum of squares, or the nonlinear least-squares fit. Of course, FindFit can be used with other methods, but because a residual function that evaluates rapidly can be constructed, it is often faster than the other methods.