There are some close connections between finding a local minimum and solving a set of nonlinear equations. Given a set of n equations in n unknowns, r(x) = 0, seeking a solution is equivalent to minimizing the sum of squares ||r(x)||^2 when the residual is zero at the minimum, so there is a particularly close connection to the Gauss-Newton methods. In fact, the Gauss-Newton step for local minimization and the Newton step for nonlinear equations are exactly the same. Also, for a smooth function f, Newton's method for local minimization is the same as Newton's method applied to the nonlinear equations ∇f(x) = 0. Not surprisingly, then, many aspects of the algorithms are similar; nonetheless, there are also important differences.

Another feature in common with minimization algorithms is the need for some kind of step control. Typically, step control is based on the same methods as in minimization, except that they are applied to a merit function, usually the 2-norm squared, (1/2) ||r(x)||^2.

  "Newton"   use Newton's method, with the Jacobian computed symbolically or by finite differences; this is the default
  "Secant"   work without derivatives, building a secant approximation to the Jacobian; requires two starting values for each variable
  "Brent"    in one dimension, use Brent's bracketing method; requires two starting values that bracket the root

Basic method choices for FindRoot.

Newton's method for nonlinear equations is based on a linear approximation

r(x) ≈ Mk(p) = r(xk) + J(xk) p,   p = x - xk,

so the Newton step pk is found simply by setting Mk(pk) = 0,

J(xk) pk = -r(xk).
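To make the iteration concrete, here is a minimal Python sketch of this Newton iteration, applied to the Rosenbrock residuals used as the example below (illustrative only; FindRoot adds step control and other safeguards on top of the bare iteration):

```python
def solve2(A, b):
    """Solve a 2x2 linear system A p = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def newton_system(r, J, x, tol=1e-10, max_iter=50):
    """Newton's method for r(x) = 0: solve J(xk) pk = -r(xk),
    then step to x(k+1) = xk + pk."""
    for _ in range(max_iter):
        rv = r(x)
        p = solve2(J(x), [-rv[0], -rv[1]])   # Newton step
        x = [x[0] + p[0], x[1] + p[1]]
        if max(abs(p[0]), abs(p[1])) < tol:
            break
    return x

# Rosenbrock residuals r = (10 (y - x^2), 1 - x), with root (1, 1)
r = lambda x: [10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]]
J = lambda x: [[-20.0 * x[0], 10.0], [-1.0, 0.0]]
print(newton_system(r, J, [-1.2, 1.0]))
```

Near the root, the number of correct digits roughly doubles with each step, reflecting the q-quadratic convergence discussed below.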

Near a root of the equations, Newton's method has q-quadratic convergence, similar to Newton's method for minimization. Newton's method is used as the default method for FindRoot.

Newton's method can be used with either line search or trust region step control. When it works, the line search control is typically faster, but the trust region approach is usually more robust.

Here is Rosenbrock's problem as a FindRoot problem.

In[1]:=

Out[1]=

This finds the solution of the nonlinear system using the default line search approach. (Newton's method is the default method for FindRoot.)

In[2]:=

Out[2]=

Note that each of the line searches started along the line x == 1. This is a property of the Newton step for this particular problem.

This computes the Jacobian and the Newton step symbolically for the Rosenbrock problem.

In[3]:=

Out[4]=

When this step is added to the point {x, y}, it is easy to see why the steps go to the line x == 1. This is a particular feature of this problem, which is not typical for most functions. Because the trust region approach does not try the Newton step unless it lies within the region bound, this feature does not show up so strongly when the trust region step control is used.
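For this problem the Newton step can be written in closed form: solving J p = -r for the residuals r = (10 (y - x^2), 1 - x), with Jacobian J = {{-20 x, 10}, {-1, 0}}, gives p = (1 - x, 2 x - x^2 - y), so the first coordinate of the new point is always 1. A quick numerical check (a plain-Python sketch, not the symbolic computation above):

```python
def newton_step(x, y):
    """Closed-form Newton step for r = (10 (y - x^2), 1 - x):
    solving J p = -r gives p = (1 - x, 2 x - x^2 - y)."""
    return (1.0 - x, 2.0 * x - x * x - y)

# From any starting point, the updated point lands on the line x == 1
for x, y in [(-1.2, 1.0), (3.0, -2.0), (0.5, 0.5)]:
    p1, p2 = newton_step(x, y)
    print(x + p1, y + p2)   # first coordinate is 1 (up to rounding)
```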

This finds the solution of the nonlinear system using the trust region approach. The search is almost identical to the search with the Gauss-Newton method for the Rosenbrock objective function in FindMinimum.

In[5]:=

Out[5]=

When the structure of the Jacobian matrix is sparse, Mathematica will use SparseArray objects both to compute the Jacobian and to handle the necessary numerical linear algebra.

When solving nonlinear equations is used as part of a more general numerical procedure, such as solving differential equations with implicit methods, the starting values are often quite good, and complete convergence is not absolutely necessary. The most expensive part of computing a Newton step is often evaluating the Jacobian and computing its factorization, and near a root the Jacobian changes relatively little from step to step. Thus, when close enough to a root, it is possible to leave the Jacobian frozen for a few steps (though this does certainly affect the convergence rate). You can do this in Mathematica using the method option "UpdateJacobian", which gives the number of steps to take before updating the Jacobian. The default is "UpdateJacobian" -> 1, so the Jacobian is updated every step.

This shows the number of steps, function evaluations, and Jacobian evaluations required to find a simple square root when the Jacobian is only updated every three steps.

In[6]:=

Out[6]=

This shows the number of steps, function evaluations, and Jacobian evaluations required to find a simple square root when the Jacobian is updated every step.

In[7]:=

Out[7]=

Of course, for a simple one-dimensional root, updating the Jacobian is trivial in cost, so freezing it is of use here only to demonstrate the idea.
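The effect of freezing the Jacobian can be sketched in a few lines of Python for the square-root problem x^2 - c == 0 (illustrative only; here the scalar derivative 2 x plays the role of the Jacobian):

```python
def newton_sqrt(c, x, update_every=1, tol=1e-12, max_iter=100):
    """Solve x^2 - c == 0 by Newton's method, re-evaluating the
    derivative (the 1x1 "Jacobian", 2 x) only every
    update_every steps, in the spirit of "UpdateJacobian" -> n."""
    steps = jac_evals = 0
    deriv = None
    while steps < max_iter:
        f = x * x - c
        if abs(f) < tol:
            break
        if steps % update_every == 0:
            deriv = 2.0 * x          # (re)evaluate the Jacobian
            jac_evals += 1
        x -= f / deriv               # step with possibly frozen derivative
        steps += 1
    return x, steps, jac_evals

print(newton_sqrt(2.0, 1.0, update_every=3))   # (root, steps, jac evals)
print(newton_sqrt(2.0, 1.0))
```

Updating only every third step typically takes a few more iterations but noticeably fewer derivative evaluations, which is exactly the trade-off that "UpdateJacobian" controls.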

  "UpdateJacobian"   1   number of steps to take before updating the Jacobian

Method options for Method -> "Newton" in FindRoot.

When derivatives cannot be computed symbolically, Newton's method will still be used, but with a finite difference approximation to the Jacobian. This can have a cost in terms of both time and reliability. Just as for minimization, an alternative is to use an algorithm specifically designed to work without derivatives.
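A forward-difference Jacobian approximation of the kind referred to here can be sketched as follows (a generic Python sketch, not Mathematica's actual implementation; the step size h = 10^-7 is an arbitrary illustrative choice):

```python
def fd_jacobian(r, x, h=1e-7):
    """Forward-difference approximation to the Jacobian of r at x:
    column j is (r(x + h e_j) - r(x)) / h, costing one extra
    residual evaluation per variable."""
    r0 = r(x)
    n, m = len(x), len(r0)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xh = list(x)
        xh[j] += h
        rh = r(xh)
        for i in range(m):
            J[i][j] = (rh[i] - r0[i]) / h
    return J

# Rosenbrock residuals again; the exact Jacobian at (-1.2, 1)
# is {{24, 10}, {-1, 0}}
r = lambda x: [10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]]
print(fd_jacobian(r, [-1.2, 1.0]))
```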

In one dimension, the idea of the secant method is to use the slope of the line between two consecutive search points to compute the step, instead of the derivative at the latest point. Similarly, in n dimensions, differences between the residuals at n points are used to construct an approximation of sorts to the Jacobian. Note that this is similar to finite differences, but rather than trying to make the difference interval small in order to get as good a Jacobian approximation as possible, it effectively uses an average derivative, just like the one-dimensional secant method. Initially, the n points are constructed from two starting points that are distinct in all n dimensions. Subsequently, as steps are taken, only the n points with the smallest merit function values are kept. It is rare, but possible, that the steps become collinear and the secant approximation to the Jacobian becomes singular. In this case, the algorithm is restarted with distinct points.
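In one dimension the secant method takes only a few lines; here is a Python sketch (the test function cos(x) - x is an arbitrary example, not from the text):

```python
import math

def secant(f, x0, x1, tol=1e-12, max_iter=60):
    """Secant method: replace f'(xk) in the Newton step with the
    slope of the line through the last two iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            break
        slope = (f1 - f0) / (x1 - x0)   # average derivative over the step
        x0, f0 = x1, f1
        x1 = x1 - f1 / slope
        f1 = f(x1)
    return x1

# Root of cos(x) - x near 0.739
print(secant(lambda x: math.cos(x) - x, 0.0, 1.0))
```

Note that, as the text says, no derivative of f is ever evaluated; the price is a somewhat slower (superlinear rather than quadratic) convergence rate.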

The method requires two starting points in each dimension. In fact, if two starting points are given in each dimension, the secant method is the default method except in one dimension, where Brent's method may be chosen as described later.

This shows the solution of the Rosenbrock problem with the secant method.

In[8]:=

Out[8]=

Note that, as compared to Newton's method, many more residual function evaluations are required. However, the method is able to follow the relatively narrow valley without directly using derivative information.

This shows the solution of the Rosenbrock problem with Newton's method using finite differences to compute the Jacobian.

In[35]:=

Out[35]=

When compared to Newton's method with finite differences, however, the number of residual function evaluations is comparable. For larger problems with sparse Jacobian matrices, the finite difference Newton method will usually be more efficient, since the secant method does not take advantage of sparsity in any way.

When searching for a real simple root of a real-valued function, it is possible to take advantage of the special geometry of the problem, where the function crosses the axis from negative to positive or vice versa. Brent's method [Br02] is effectively a safeguarded secant method that always keeps a point where the function is positive and one where it is negative, so that the root is always bracketed. At any given step, a choice is made between an interpolated (secant) step and a bisection, in such a way that eventual convergence is guaranteed.
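The safeguarding idea can be illustrated with a simplified bracketing iteration in Python. This is only a sketch of the secant-plus-bisection idea, not the full algorithm of [Br02], which also uses inverse quadratic interpolation and more careful tolerance handling:

```python
def bracketed_secant(f, a, b, tol=1e-12, max_iter=100):
    """Keep a sign-changing bracket [a, b]: try a secant step from
    the bracket endpoints, and fall back to bisection whenever the
    secant candidate leaves the bracket."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        x = b - fb * (b - a) / (fb - fa)   # secant (interpolated) step
        if not (min(a, b) < x < max(a, b)):
            x = 0.5 * (a + b)              # safeguard: bisection
        fx = f(x)
        if fa * fx < 0:                    # keep the sign change
            b, fb = x, fx
        else:
            a, fa = x, fx
        if abs(fx) < tol or abs(b - a) < tol:
            break
    return x

# Root of x^3 - 2 x - 5 in [2, 3] (about 2.0945515)
print(bracketed_secant(lambda x: x ** 3 - 2.0 * x - 5.0, 2.0, 3.0))
```

Because every iterate preserves the sign change, the method can never leave the initial bracket, which is the source of its robustness.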

If FindRoot is given two real starting conditions that bracket a root of a real function, then Brent's method will be used. Thus, if you are working in one dimension and can determine initial conditions that will bracket a root, it is often a good idea to do so since Brent's method is the most robust algorithm available for FindRoot.

Even though essentially all of the theory for solving nonlinear equations and local minimization is based on smooth functions, Brent's method is sufficiently robust that you can even get a good estimate for a zero crossing for discontinuous functions.

This shows the steps and function evaluations used in an attempt to find the root of a discontinuous function.

In[10]:=

Out[10]=

The method gives up and issues a message when the root is bracketed very closely, but it is not able to find a point at which the function value is zero. This robustness carries over very well to continuous functions that are very steep.

This shows the steps and function evaluations used to find the root of a function that varies rapidly near its root.

In[11]:=

Out[11]=