The function FindRoot has a option; the functions FindMinimum, FindMaximum, and FindFit have a Gradient option; and the Newton method has a method option . All these derivatives are specified with the same basic structure. Here is a summary of ways to specify derivative computation methods.
|Automatic||find a symbolic derivative for the function and use finite difference approximations if a symbolic derivative cannot be found|
|Symbolic||same as Automatic, but gives a warning message if finite differences are to be used|
|FiniteDifference||use finite differences to approximate the derivative|
|expression||use the given expression with local numerical values of the variables to evaluate the derivative|
The basic specification for a derivative is just the method for computing it. However, all of the derivatives take options as well. These can be specified by using a list . Here is a summary of the options for the derivatives.
|"EvaluationMonitor"||None||expression to evaluate with local values of the variables every time the derivative is evaluated, usually specified with instead of to prevent symbolic evaluation|
|"Sparse"||Automatic||sparse structure for the derivative; can be Automatic, True, False, or a pattern SparseArray giving the nonzero structure|
|"DifferenceOrder"||1||difference order to use when finite differences are used to compute the derivative|
Symbolic derivatives are not always available. If you need extra accuracy from finite differences, you can increase the difference order from the default of 1 at the cost of extra function evaluations.
Note that the number of function evaluations is much higher because function evaluations are used to compute the gradient, which is used to approximate the Hessian in turn. (The Hessian is computed with finite differences since no symbolic expression for it can be computed from the information given.)
The information given from about the number of function, gradient, and Hessian evaluations is quite useful. The EvaluationMonitor options are what make this possible. Here is an example that simply counts the number of each type of evaluation. (The plot is made using Reap and Sow to collect the values at which the evaluations are done.)
When the Wolfram Language can access the symbolic structure of the function, it automatically does a structural analysis of the function and its derivatives and uses SparseArray objects to represent the derivatives when appropriate. Since subsequent numerical linear algebra can then use the sparse structures, this can have a profound effect on the overall efficiency of the search. When the Wolfram Language cannot do a structural analysis, it has to assume, in general, that the structure is dense. However, if you know what the sparse structure of the derivative is, you can specify this with the method option and gain huge efficiency advantages, both in computing derivatives (with finite differences, the number of evaluations can be reduced significantly) and in subsequent linear algebra. This issue is particularly important when working with vector-valued variables. A good example for illustrating this aspect is the extended Rosenbrock problem, which has a very simple sparse structure.
The solution with the function, which is faster to evaluate, winds up being slower overall because the Jacobian has to be computed with finite differences since the pattern makes it opaque to symbolic analysis. It is not so much the finite differences that are slow as the fact that it needs to do 100 function evaluations to get all the columns of the Jacobian. With knowledge of the structure, this can be reduced to two evaluations to get the Jacobian. For this function, the structure of the Jacobian is quite simple.
When a sparse structure is given, it is also possible to have the value computed by a symbolic expression that evaluates to the values corresponding to the positions given in the sparse structure template. Note that the values must correspond directly to the positions as ordered in the SparseArray (the ordering can be seen using ArrayRules). One way to get a consistent ordering of indices is to transpose the matrix twice, which results in a SparseArray with indices in lexicographic order.
In this case, using the sparse Jacobian is not significantly faster because the Jacobian is so sparse that a finite difference approximation can be found for it in only two function evaluations and because the problem is well enough defined near the minimum that the extra accuracy in the Jacobian does not make any significant difference.