Mathematically, sufficient conditions for a local minimum of a smooth function are quite straightforward: x* is a local minimum if ∇f(x*) = 0 and the Hessian ∇²f(x*) is positive definite. (It is a necessary condition that the Hessian be positive semidefinite.) The conditions for a root are even simpler. However, when the function f is being evaluated on a computer, its value is known, at best, only to a certain precision, and in practice only a limited number of function evaluations are possible. It is therefore necessary to use error estimates to decide when a search has come close enough to a minimum or a root, and to compute the solution only to a finite tolerance. For the most part, these estimates suffice quite well, but in some cases they can be in error, usually due to unresolved fine-scale behavior of the function.
Tolerances affect how close a search will try to get to a root or local minimum before terminating. If the computed function values themselves carry some error (as is typical when they are computed with floating-point numbers), it is usually not possible to locate the position of a minimum to much better than half of the precision of the numbers being worked with. This is because of the quadratic nature of local minima: near the bottom of a parabola, the height varies quite slowly as you move away from the minimum. Thus, if there is any error (noise) in the function values, it typically masks the actual rise of the parabola over a width roughly equal to the square root of the noise. This is best seen with an example.
This loads a package that contains some utility functions.
The following command displays a sequence of plots showing the minimum of the function
over successively smaller ranges. The curve computed with machine numbers is shown in black; the actual curve (computed with 100 digits of precision) is shown in blue.
From the sequence of plots, it is clear that for changes of order 10^-8, which is about half of machine precision, and smaller, errors in the function mask the actual shape of the curve near the minimum. By sampling the function at that precision alone, there is no way to be sure whether a given point gives the smallest local value of the function to any closer tolerance.
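The masking effect is easy to reproduce in ordinary double-precision arithmetic. Here is a small Python sketch (an illustration of the general phenomenon, not of the computation in the plots), using cos(x) near its minimum at x = π as a stand-in function: a displacement of order 10^-9 from the minimum is invisible, while one of order 10^-7, roughly the square root of machine epsilon, is not.

```python
import math

# Near a quadratic minimum, f(x* + h) - f(x*) ~ h^2/2, so once h^2/2
# falls below machine epsilon (~2.2e-16) relative to |f(x*)|, the rise
# is lost to rounding.  Here f = cos, with a minimum value of -1 at pi.
f_min = math.cos(math.pi)              # -1.0 to machine precision

tiny = math.cos(math.pi + 1e-9)        # h^2/2 ~ 5e-19: below epsilon
visible = math.cos(math.pi + 1e-7)     # h^2/2 ~ 5e-15: above epsilon

print(tiny == f_min)      # True: the rise is completely masked
print(visible == f_min)   # False: ~sqrt(epsilon) away, the rise shows
```

Any point within about 10^-8 of the true minimum produces a function value indistinguishable from the minimum itself, which is exactly why the position cannot be resolved more finely than that.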
The value of the derivative, if it is computed symbolically, is much more reliable, but in the general case it is not sufficient to rely on the derivative alone; to satisfy the tolerances, the search needs to find a locally minimal value of the function at which the derivative is also small. Note also that if symbolic derivatives of your function cannot be computed and finite differences or a derivative-free method is used instead, the accuracy of the solution may degrade further.
Root finding can suffer from the same inaccuracies in the function. While the effect is typically not as severe, some of the error estimates are based on a merit function, which does have a quadratic shape.
Because of this limitation, the default tolerances for the Find functions are all set to half of the final working precision. Depending on how much error the function has, this may or may not be achievable, but in most cases it is a reasonable goal. You can adjust the tolerances using the AccuracyGoal and PrecisionGoal options. When AccuracyGoal->ag and PrecisionGoal->pg, these define the tolerances tola = 10^-ag and tolr = 10^-pg. FindMinimum tries to find a value xk such that |xk - x*| ≤ max(tola, |xk| tolr). Of course, since the exact position of the minimum, x*, is not known, the quantity |xk - x*| is estimated; this is usually done based on past steps and derivative values. To match the derivative condition at a minimum, the additional requirement ‖∇f(xk)‖ ≤ tola is imposed. For FindRoot, the corresponding condition is just that the residual be small at the root: ‖f(xk)‖ ≤ tola.
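Put together, the acceptance test described above can be sketched as follows. This is a minimal Python illustration, not the actual implementation; the function and parameter names are hypothetical, and err_estimate stands in for the solver's internal estimate of |xk - x*|.

```python
def minimum_accepted(xk, err_estimate, grad_norm, ag, pg):
    """Sketch of the termination test with AccuracyGoal->ag and
    PrecisionGoal->pg.  err_estimate plays the role of the estimate of
    |xk - x*| that a real solver builds from past steps and derivatives.
    """
    tola = 10.0 ** (-ag)   # absolute tolerance
    tolr = 10.0 ** (-pg)   # relative tolerance
    position_ok = err_estimate <= max(tola, abs(xk) * tolr)
    derivative_ok = grad_norm <= tola   # derivative condition at a minimum
    return position_ok and derivative_ok

# With goals of 8 digits, a point known to ~1e-9 whose gradient norm is
# ~1e-10 is accepted:
print(minimum_accepted(1.2345, 1e-9, 1e-10, 8, 8))   # True
# ...but a gradient norm of only 1e-6 fails the derivative condition:
print(minimum_accepted(1.2345, 1e-9, 1e-6, 8, 8))    # False
```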
This finds the root to at least 12 digits of accuracy, or within a tolerance of 10^-12. The precision goal of ∞ means that tolr = 10^-∞ = 0, so it does not have any effect in the formula. (Note that you cannot similarly set the accuracy goal to ∞, since tola is always used for the size of the residual.)
This shows that the result satisfied the requested error tolerances.
This tries to find the minimum of the function sin(x) - cos(x) to 8 digits of accuracy. FindMinimum gives a warning message because of the error in the function, as seen in the plots.
This shows that though the value at the minimum was found essentially to machine epsilon, the position was only found to the order of 10^-8.
In multiple dimensions, the situation is even more complicated, since there can be more error in some directions than in others, such as when a minimum is found along a relatively narrow valley, as in the Freudenstein-Roth problem. For searches such as this, the search parameters are often scaled, which in turn affects the error estimates. Nonetheless, it is still typical that the quadratic shape of the minimum limits the realistically achievable tolerances.
When you need to find a root or minimum beyond the default tolerances, it may be necessary to increase the final working precision. You can do this with the WorkingPrecision
option. When you use WorkingPrecision->prec
, the search starts at the precision of the starting values, and the working precision is adaptively increased up to prec
as the search converges. By default, WorkingPrecision->MachinePrecision
, so machine numbers are used, which are usually much faster. Going to higher precision can take significantly more time, but can get you much more accurate results if your function is defined in an appropriate way. For very high-precision solutions, Newton's
method is recommended because its quadratic convergence rate significantly reduces the number of steps ultimately required.
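The payoff of quadratic convergence at high precision can be sketched with Python's standard decimal module (purely an illustration of the convergence rate, not of how the Find functions work internally): each Newton step roughly doubles the number of correct digits, so even a 50-digit answer needs only a handful of steps.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60          # work with some guard digits

# Newton's method for f(x) = x^2 - 2, i.e. x -> (x + 2/x)/2.
# Correct digits roughly double each step: ~1, 3, 6, 12, 24, 48, ...
x = Decimal("1.5")
for step in range(6):
    x = (x + Decimal(2) / x) / 2

# The first 50 digits of sqrt(2), for comparison:
sqrt2 = Decimal("1.41421356237309504880168872420969807856967187537694")
print(abs(x - sqrt2) < Decimal("1e-48"))   # True: 6 steps suffice
```

A linearly convergent method gaining, say, one digit per step would need dozens of high-precision iterations for the same result, which is why Newton's method is preferred here.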
It is important to note that increasing the setting of the WorkingPrecision
option does no good if the function is defined with lower-precision numbers. In general, for WorkingPrecision->prec
to be effective, the numbers used to define the function should be exact or at least of precision prec
. When possible, the precision of numbers in the function is artificially raised to prec
so that convergence still works, but this is not always possible. In any case, when the functions and derivatives are evaluated numerically, the precision of the results is raised to prec
if necessary so that the internal arithmetic can be done with prec
digit precision. Even so, the actual precision or accuracy of the root or minimum and its position is limited by the accuracy in the function. This is especially important to keep in mind when using FindFit
where data is usually only known up to a certain precision.
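This pitfall is easy to reproduce in any system that mixes number precisions. The following Python sketch (using the standard decimal and fractions modules as an illustration) solves x² = 1/3 by Newton's method at 40-digit working precision, once with the constant 1/3 exact and once with it first rounded to a machine double; the two answers agree only to about the 17 digits the double carries, no matter how high the working precision is.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 40

def newton_sqrt(c, steps=12):
    """Newton iteration x -> (x + c/x)/2 for the positive root of x^2 - c."""
    x = Decimal("0.6")
    for _ in range(steps):
        x = (x + c / x) / 2
    return x

exact = Fraction(1, 3)                       # exact rational constant
c_exact = Decimal(exact.numerator) / Decimal(exact.denominator)
c_double = Decimal(1.0 / 3.0)                # 1/3 rounded to a binary double

root_exact = newton_sqrt(c_exact)
root_double = newton_sqrt(c_double)

# Both iterations converge to 40 digits, but to roots of *different*
# constants; the discrepancy sits at the double's precision, ~1e-17.
diff = abs(root_exact - root_double)
print(Decimal("1e-18") < diff < Decimal("1e-16"))   # True
```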
Here is a function defined using machine numbers.
Even with higher working precision, the minimum cannot be resolved better because the actual function still has the same errors as shown in the plots. The derivatives were specified to keep other things consistent with the computation at machine precision shown previously.
Here is the computation done with 20-digit precision when the function does not have machine numbers.
If you specify WorkingPrecision->prec but do not explicitly specify the AccuracyGoal and PrecisionGoal options, then their default settings of Automatic are taken to be AccuracyGoal->prec/2 and PrecisionGoal->prec/2. This leads to the smallest tolerances that can realistically be expected in general, as discussed earlier.
Here is the computation done with 50-digit precision without an explicitly specified setting for the AccuracyGoal option.
This shows that the value at the minimum was actually found to even better than the default 25-digit tolerances.
The following table shows a summary of the options affecting precision and tolerance.

    WorkingPrecision    MachinePrecision    the final working precision, prec, to use for the computation
    AccuracyGoal        Automatic           setting ag defines the absolute tolerance tola = 10^-ag; Automatic means ag = prec/2
    PrecisionGoal       Automatic           setting pg defines the relative tolerance tolr = 10^-pg; Automatic means pg = prec/2

Precision and tolerance options in the "Find" functions.
A search will sometimes converge slowly. To prevent slow searches from going on indefinitely, the Find commands all limit the number of iterations (steps) allowed before terminating. This limit is controlled with the MaxIterations option, which has the default value MaxIterations->100. When a search terminates because of this condition, the command issues the cvmit message.
This attempts to solve the problem with the default method, which is the Levenberg-Marquardt method, since the function is a sum of squares.
The Levenberg-Marquardt method converges slowly on this problem because the residual is nonzero near the minimum, so the second-order part of the Hessian is needed. While the method does eventually converge, in just under 400 steps, a better option may be to use a method that converges faster.
In a larger calculation, one possibility when hitting the iteration limit is to use the final search point, which is returned, as a starting condition for continuing the search, ideally with another method.
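Generically, the restart strategy looks like the following Python sketch (with hypothetical helper names): cap a slowly, linearly converging iteration at a maximum number of steps, then hand its final point to Newton's method, which finishes in a few quadratically convergent steps.

```python
import math

def fixed_point(f, x, max_iterations=25):
    """Slowly (linearly) converging iteration x -> f(x), capped at
    max_iterations -- the analogue of hitting MaxIterations."""
    for _ in range(max_iterations):
        x = f(x)
    return x   # the final search point, even if not yet converged

def newton(g, dg, x, steps=5):
    """Newton's method for g(x) = 0, quadratically convergent."""
    for _ in range(steps):
        x = x - g(x) / dg(x)
    return x

# Solve cos(x) = x.  The capped fixed-point search stops well short of
# full precision...
x_partial = fixed_point(math.cos, 1.0)
# ...so restart from its final point with a faster method:
x_final = newton(lambda x: math.cos(x) - x,
                 lambda x: -math.sin(x) - 1.0,
                 x_partial)

print(abs(math.cos(x_partial) - x_partial) > 1e-8)   # True: not converged
print(abs(math.cos(x_final) - x_final) < 1e-12)      # True: Newton finished
```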