Specifying Derivatives
The function FindRoot has a
option; the functions FindMinimum, FindMaximum, and FindFit have a Gradient option; and the Newton method has a method option
. All these derivatives are specified with the same basic structure. Here is a summary of ways to specify derivative computation methods.
| Automatic | find a symbolic derivative for the function and use finite difference approximations if a symbolic derivative cannot be found |
| Symbolic | same as Automatic, but gives a warning message if finite differences are to be used |
| FiniteDifference | use finite differences to approximate the derivative |
| expression | use the given expression with local numerical values of the variables to evaluate the derivative |
Methods for computing gradient, Jacobian, and Hessian derivatives.
The basic specification for a derivative is just the method for computing it. However, all of the derivatives take options as well. These can be specified by using a list
. Here is a summary of the options for the derivatives.
| | |
| "EvaluationMonitor" | None | expression to evaluate with local values of the variables every time the derivative is evaluated, usually specified with instead of to prevent symbolic evaluation |
| "Sparse" | Automatic | sparse structure for the derivative; can be Automatic, True, False, or a pattern SparseArray giving the nonzero structure |
| "DifferenceOrder" | 1 | difference order to use when finite differences are used to compute the derivative |
Options for computing gradient, Jacobian, and Hessian derivatives.
A few examples will help illustrate how these fit together.
This loads a package that contains some utility functions.
This defines a function that is only intended to evaluate for numerical values of the variables.
With just Method->"Newton", FindMinimum issues an lstol message because it was not able to resolve the minimum well enough due to lack of good derivative information.
This shows the steps taken by
FindMinimum when it has to use finite differences to compute the gradient and Hessian.
The following describes how you can use the gradient option to specify the derivative.
This computes the minimum of

using a symbolic expression for its gradient.
| Out[4]= |  |
Symbolic derivatives are not always available. If you need extra accuracy from finite differences, you can increase the difference order from the default of 1 at the cost of extra function evaluations.
This computes the minimum of

using a second-order finite difference to compute the gradient.
| Out[5]= |  |
Note that the number of function evaluations is much higher because function evaluations are used to compute the gradient, which is used to approximate the Hessian in turn. (The Hessian is computed with finite differences since no symbolic expression for it can be computed from the information given.)
The information given from
about the number of function, gradient, and Hessian evaluations is quite useful. The EvaluationMonitor options are what make this possible. Here is an example that simply counts the number of each type of evaluation. (The plot is made using Reap and Sow to collect the values at which the evaluations are done.)
This computes the minimum with counters to keep track of the number of steps and the number of function, gradient, and Hessian evaluations.
| Out[6]= |  |
Using such diagnostics can be quite useful for determining what methods and/or method parameters may be most successful for a class of problems with similar characteristics.
When Mathematica can access the symbolic structure of the function, it automatically does a structural analysis of the function and its derivatives and uses SparseArray objects to represent the derivatives when appropriate. Since subsequent numerical linear algebra can then use the sparse structures, this can have a profound effect on the overall efficiency of the search. When Mathematica cannot do a structural analysis, it has to assume, in general, that the structure is dense. However, if you know what the sparse structure of the derivative is, you can specify this with the
method option and gain huge efficiency advantages, both in computing derivatives (with finite differences, the number of evaluations can be reduced significantly) and in subsequent linear algebra. This issue is particularly important when working with vector-valued variables. A good example for illustrating this aspect is the extended Rosenbrock problem, which has a very simple sparse structure.
This gets the extended Rosenbrock function with 1000 variables in symbolic form ready to be solved with
FindRoot using the

package.
Out[7]//Short= |
| |  |
This solves the problem using the symbolic form of the function.
| Out[8]= |  |
For a function with simple form like this, it is easy to write a vector form of the function, which can be evaluated much more quickly than the symbolic form can, even with automatic compilation.
This defines a vector form of the extended Rosenbrock function, which evaluates very efficiently.
This extracts the starting point as a vector from the problem structure.
Out[10]//Short= |
| |  |
This solves the problem using a vector variable and the vector function for evaluation.
| Out[11]= |  |
The solution with the function, which is faster to evaluate, winds up being slower overall because the Jacobian has to be computed with finite differences since the
pattern makes it opaque to symbolic analysis. It is not so much the finite differences that are slow as the fact that it needs to do 100 function evaluations to get all the columns of the Jacobian. With knowledge of the structure, this can be reduced to two evaluations to get the Jacobian. For this function, the structure of the Jacobian is quite simple.
This defines a pattern
SparseArray, which has the structure of nonzeros for the Jacobian of the extended Rosenbrock function. (By specifying

for the values in the rules, the
SparseArray is taken to be a template of the
Pattern type as indicated in the output form.)
| Out[12]= |  |
This solves the problem with the knowledge of the actual Jacobian structure, showing a significant cost savings.
| Out[13]= |  |
When a sparse structure is given, it is also possible to have the value computed by a symbolic expression that evaluates to the values corresponding to the positions given in the sparse structure template. Note that the values must correspond directly to the positions as ordered in the SparseArray (the ordering can be seen using ArrayRules). One way to get a consistent ordering of indices is to transpose the matrix twice, which results in a SparseArray with indices in lexicographic order.
This transposes the nonzero structure matrix twice to get the indices sorted.
| Out[14]= |  |
This defines a function that will return the nonzero values in the Jacobian corresponding to the index positions in the nonzero structure matrix.
This solves the problem with the resulting sparse symbolic Jacobian.
| Out[16]= |  |
In this case, using the sparse Jacobian is not significantly faster because the Jacobian is so sparse that a finite difference approximation can be found for it in only two function evaluations and because the problem is well enough defined near the minimum that the extra accuracy in the Jacobian does not make any significant difference.