Unconstrained Optimization: Methods for Solving Nonlinear Equations
There are some close connections between finding a local minimum and solving a set of nonlinear equations. Given a set of equations in unknowns, seeking a solution is equivalent to minimizing the sum of squares when the residual is zero at the minimum, so there is a particularly close connection to the Gauss–Newton methods. In fact, the Gauss–Newton step for local minimization and the Newton step for nonlinear equations are exactly the same. Also, for a smooth function, Newton's method for local minimization is the same as Newton's method for the nonlinear equations . Not surprisingly, many aspects of the algorithms are similar; however, there are also important differences.
Another thing in common with minimization algorithms is the need for some kind of step control. Typically, step control is based on the same methods as minimization except that it is applied to a merit function, usually the smooth 2-norm squared, .
|"Newton"||use the exact Jacobian or a finite difference approximation to solve for the step based on a locally linear model|
|"Secant"||work without derivatives by constructing a secant approximation to the Jacobian using past steps; requires two starting conditions in each dimension|
|"Brent"||method in one dimension that maintains bracketing of roots; requires two starting conditions that bracket a root|
|"AffineCovariantNewton"||optimized to use minimal number of Jacobian evaluations|
Basic method choices for FindRoot.
When this step is added to the point , it is easy to see why the steps go to the line . This is a particular feature of this problem, which is not typical for most functions. Because the trust region approach does not try the Newton step unless it lies within the region bound, this feature does not show up so strongly when the trust region step control is used.
When the structure of the Jacobian matrix is sparse, the Wolfram Language will use SparseArray objects both to compute the Jacobian and to handle the necessary numerical linear algebra.
When solving nonlinear equations is used as a part of a more general numerical procedure, such as solving differential equations with implicit methods, often starting values are quite good, and complete convergence is not absolutely necessary. Often the most expensive part of computing a Newton step is finding the Jacobian and computing a matrix factorization. However, when close enough to a root, it is possible to leave the Jacobian frozen for a few steps (though this does certainly affect the convergence rate). You can do this in the Wolfram Language using the method option "UpdateJacobian", which gives the number of steps to go before updating the Jacobian. The default is "UpdateJacobian"->1, so the Jacobian is updated every step.
|"UpdateJacobian"||1||number of steps to take before updating the Jacobian|
|"StepControl"||"LineSearch"||method for step control, can be "LineSearch", "TrustRegion", or None (which is not recommended)|
When derivatives cannot be computed symbolically, Newton's method will be used, but with a finite difference approximation to the Jacobian. This can have costs in terms of both time and reliability. Just as for minimization, an alternative is to use an algorithm specifically designed to work without derivatives.
In one dimension, the idea of the secant method is to use the slope of the line between two consecutive search points to compute the step instead of the derivative at the latest point. Similarly in dimensions, differences between the residuals at points are used to construct an approximation of sorts to the Jacobian. Note that this is similar to finite differences, but rather than trying to make the difference interval small in order to get as good a Jacobian approximation as possible, it effectively uses an average derivative just like the one-dimensional secant method. Initially, the points are constructed from two starting points that are distinct in all dimensions. Subsequently, as steps are taken, only the points with the smallest merit function value are kept. It is rare, but possible, that steps are collinear and the secant approximation to the Jacobian becomes singular. In this case, the algorithm is restarted with distinct points.
The method requires two starting points in each dimension. In fact, if two starting points are given in each dimension, the secant method is the default method except in one dimension, where Brent's method may be chosen.
Note that, as compared to Newton's method, many more residual function evaluations are required. However, the method is able to follow the relatively narrow valley without directly using derivative information.
However, when compared to Newton's method with finite differences, the number of residual function evaluations is comparable. For sparse Jacobian matrices with larger problems, the finite difference Newton method will usually be more efficient since the secant method does not take advantage of sparsity in any way.
When searching for a real simple root of a real valued function, it is possible to take advantage of the special geometry of the problem, where the function crosses the axis from negative to positive or vice versa. Brent's method [Br02] is effectively a safeguarded secant method that always keeps a point where the function is positive and one where it is negative so that the root is always bracketed. At any given step, a choice is made between an interpolated (secant) step and a bisection in such a way that eventual convergence is guaranteed.
If FindRoot is given two real starting conditions that bracket a root of a real function, then Brent's method will be used. Thus, if you are working in one dimension and can determine initial conditions that will bracket a root, it is often a good idea to do so since Brent's method is the most robust algorithm available for FindRoot.
Even though essentially all the theory for solving nonlinear equations and local minimization is based on smooth functions, Brent's method is sufficiently robust that you can even get a good estimate for a zero crossing for discontinuous functions.
The method gives up and issues a message when the root is bracketed very closely, but it is not able to find a value of the function, which is zero. This robustness carries over very well to continuous functions that are very steep.
The affine covariant Newton method presented here is an implementation of the NLEQ_ERR algorithm [Deu06], and is available as a method option to FindRoot. To illustrate how the algorithm works and its advantages, some simple examples of the affine covariant Newton solver will be shown. Its main application area is for solving discretized nonlinear partial differential equations. These systems of equations can become large, and it is important to pay attention to evaluation costs both in the computational speed and the amount of memory needed to solve these equations.
Given the variables and starting values, FindRoot will find the roots for this system of equations.
Here is a helper function that makes calling FindRoot easier for the type of root finding this tutorial is presenting.
Newton methods, such as the default method of FindRoot, will need to evaluate the function f and compute and evaluate the Jacobian of the function f. The evaluation of the Jacobian can be computationally expensive. The fact that the Jacobian needs to be evaluated is not something that can be avoided in the types of Newton methods discussed here; however, there exist algorithms that minimize the amount of the Jacobian evaluations. To investigate the evaluations needed by FindRoot, monitors such as EvaluationMonitor and StepMonitor can be made use of.
Seven steps and 10 function evaluations were needed to find a solution. Sometimes a trial step is made, which increases the number of function evaluations, but that trial step is then rejected. A rejected step does not increase the step count but increases the function evaluation count. You can also see that seven Jacobian evaluations were needed to complete the solution process.
The affine covariant Newton method needed more steps and function evaluations but fewer Jacobian evaluations. In a case such as solving nonlinear partial differential equations, where evaluating the Jacobian is computationally expensive, not having to evaluate the Jacobian as often can make a big difference in being able to find a solution to the system of equations efficiently.
Note that the precision of the affine covariant solution is less than what the default FindRoot method found, but it is still within the default settings for PrecisionGoal. Before you look at that, however, you can inspect the "AffineCovariantNewton" method option "BroydenUpdates" that can be set to True or False. Broyden updates are a relatively inexpensive way of estimating updated factors of the Jacobian without having to refactor the Jacobian. The affine covariant Newton solver estimates when to use Broyden updates and computes them efficiently. If during the solution process the nonlinearity increases and Broyden updates can no longer be made useful, the algorithm switches back to compute the Jacobians. The Broyden update algorithm implemented is the QNERR algorithm [Deu06].
The affine covariant Newton method is an error norm–controlled algorithm, and as such, an AccuracyGoal is not monitored. In other words, the algorithm only monitors the error and not the residual. The error is associated with the PrecisionGoal, while the residual is associated with the accuracy goal.
The affine covariant Newton method takes a number of options in addition to the FindRoot options. These method options will be discussed. First, a list of available options is presented. Then these options are illustrated with examples, and typical scenarios where the options may be beneficial are presented.
|"InitialDampingFactor"||Automatic||the damping factor in the first step|
|"InverseJacobian"||Automatic||specify the inverse of the Jacobian if known|
|"LinearSolver"||Automatic||specify a linear solver function or options|
|"MinimalDampingFactor"||Automatic||how small the damping factor can become|
Broyden updates are a way to reuse a previously computed Jacobian. Reusing a previously computed Jacobian is much more efficient than recomputing a new Jacobian. The downside is that the Broyden update may not be as precise as a newly computed Jacobian. The affine covariant Newton method algorithm has a mechanism implemented that switches to Broyden updates when it is safe to do so and may also switch back to compute the Jacobians when Broyden updates are no longer usable. Nevertheless, Broyden updates can be disabled.
During step of the solution process, this equation is solved: . The Jacobian is inverted to obtain a new increment . This increment is then added to the previous solution to obtain an updated solution . is a damping factor between that restricts the step size taken. For extreme nonlinearities, the default minimal damping factor can be reduced. The default value is . Often, this should not be necessary.
When it is known that the nonlinearity is not too extreme, a larger initial damping factor can be chosen. The default initial damping factor is 1/100. A chosen value should be larger than the "MinimalDampingFactor" but smaller than or equal to 1: .
The Newton algorithm actually needs the inverse of the Jacobian to do its job. This inverse is computed by creating a LinearSolveFunction. In some cases, the inverse Jacobian may be known or easy to compute. In such a scenario, the inverse Jacobian can be passed to the "AffineCovariantNewton" method via a method option.
Specifying the inverse of the Jacobian does not have any effect on the number of Jacobian evaluations. The fact that the count of Jacobian evaluations is shown to be 0 is an unfortunate shortcoming of the way the inverse Jacobian interacts with FindRoot.
It is possible to change the linear solver or the options used by LinearSolve to construct the inverse of the Jacobian. This is done with the "LinearSolver" method option.
When an iterative method like "Krylov" is specified as a method to LinearSolve, then the factorization of the Jacobian is not stored when possible, but computed anew every time it is needed.