Specifying Derivatives

The function FindRoot has a Jacobian option; the functions FindMinimum, FindMaximum, and FindFit have a Gradient option; and the Newton method has a method option Hessian. All these derivatives are specified with the same basic structure. Here is a summary of ways to specify derivative computation methods.

Automatic          find a symbolic derivative for the function and use finite difference approximations if a symbolic derivative cannot be found
Symbolic           same as Automatic, but gives a warning message if finite differences are to be used
FiniteDifference   use finite differences to approximate the derivative
expression         use the given expression with local numerical values of the variables to evaluate the derivative

Methods for computing gradient, Jacobian, and Hessian derivatives.

The basic specification for a derivative is just the method for computing it. However, all of the derivatives take options as well. These can be specified by using a list {method,opts}. Here is a summary of the options for the derivatives.

option name           default value
"EvaluationMonitor"   None            expression to evaluate with local values of the variables every time the derivative is evaluated, usually specified with :> instead of -> to prevent symbolic evaluation
"Sparse"              Automatic       sparse structure for the derivative; can be Automatic, True, False, or a pattern SparseArray giving the nonzero structure
"DifferenceOrder"     1               difference order to use when finite differences are used to compute the derivative

Options for computing gradient, Jacobian, and Hessian derivatives.

A few examples will help illustrate how these fit together.

This loads a package that contains some utility functions:
This defines a function that is only intended to evaluate for numerical values of the variables:
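A minimal sketch of these two steps. The package name is taken from the reference to the UnconstrainedProblems` package later in this section; the particular function f is an illustrative assumption, chosen only because its ?NumberQ patterns hide its symbolic form from the solver.

```wolfram
(* Utility functions such as FindMinimumPlot are assumed to live in this package *)
Needs["Optimization`UnconstrainedProblems`"]

(* Illustrative objective (an assumption, not fixed by the text).
   The ?NumberQ tests leave f[x, y] unevaluated for symbolic x and y,
   so no symbolic derivative can be found for it. *)
f[x_?NumberQ, y_?NumberQ] := Cos[x^2 - 3 y] + Sin[x^2 + y^2]
```

With this definition, f[1., 1.] evaluates to a number, while f[x, y] with symbolic arguments stays unevaluated.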

With just Method->"Newton", FindMinimum issues an lstol message because it was not able to resolve the minimum well enough due to lack of good derivative information.

This shows the steps taken by FindMinimum when it has to use finite differences to compute the gradient and Hessian:
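A sketch of such a call, assuming the illustrative function f and the starting point {1, 1} (both are assumptions made for this example). Because f only evaluates for numbers, the gradient and Hessian both fall back to finite differences.

```wolfram
Needs["Optimization`UnconstrainedProblems`"]

(* Illustrative objective, opaque to symbolic differentiation *)
f[x_?NumberQ, y_?NumberQ] := Cos[x^2 - 3 y] + Sin[x^2 + y^2]

(* Newton's method with no derivative information supplied:
   gradient and Hessian are approximated by finite differences *)
FindMinimumPlot[f[x, y], {{x, 1}, {y, 1}}, Method -> "Newton"]
```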

The following describes how you can use the Gradient option to specify the derivative.

This computes the minimum of f[x,y] using a symbolic expression for its gradient:
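A sketch of this step, again assuming the illustrative function f; the gradient expressions below were obtained by differentiating that assumed f by hand.

```wolfram
Needs["Optimization`UnconstrainedProblems`"]

(* Illustrative objective, opaque to symbolic differentiation *)
f[x_?NumberQ, y_?NumberQ] := Cos[x^2 - 3 y] + Sin[x^2 + y^2]

(* The gradient is given as a list of expressions {df/dx, df/dy},
   which are evaluated with local numerical values of x and y *)
FindMinimumPlot[f[x, y], {{x, 1}, {y, 1}},
  Gradient -> {2 x Cos[x^2 + y^2] - 2 x Sin[x^2 - 3 y],
               2 y Cos[x^2 + y^2] + 3 Sin[x^2 - 3 y]},
  Method -> "Newton"]
```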

Symbolic derivatives are not always available. If you need extra accuracy from finite differences, you can increase the difference order from the default of 1 at the cost of extra function evaluations.

This computes the minimum of f[x,y] using a second-order finite difference to compute the gradient:
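A sketch of the {method, opts} form for this case, assuming the illustrative function f. The method is written here as the string "FiniteDifference", which is an assumption about spelling; the option name "DifferenceOrder" is the one listed in the options table.

```wolfram
Needs["Optimization`UnconstrainedProblems`"]

(* Illustrative objective, opaque to symbolic differentiation *)
f[x_?NumberQ, y_?NumberQ] := Cos[x^2 - 3 y] + Sin[x^2 + y^2]

(* Second-order finite differences for the gradient *)
FindMinimumPlot[f[x, y], {{x, 1}, {y, 1}},
  Gradient -> {"FiniteDifference", "DifferenceOrder" -> 2},
  Method -> "Newton"]
```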

Note that the number of function evaluations is much higher because function evaluations are used to compute the gradient, which is used to approximate the Hessian in turn. (The Hessian is computed with finite differences since no symbolic expression for it can be computed from the information given.)

The information given by FindMinimumPlot about the number of function, gradient, and Hessian evaluations is quite useful. The EvaluationMonitor options are what make this possible. Here is an example that simply counts the number of each type of evaluation. (The plot is made using Reap and Sow to collect the values at which the evaluations are done.)

This computes the minimum with counters to keep track of the number of steps and the number of function, gradient, and Hessian evaluations:
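A sketch of such a counting run, assuming the illustrative function f. The counter names s, e, g, and h are arbitrary, and the derivative monitors are written with the symbol EvaluationMonitor, which is assumed to be accepted in place of the string form shown in the options table.

```wolfram
(* Illustrative objective, opaque to symbolic differentiation *)
f[x_?NumberQ, y_?NumberQ] := Cos[x^2 - 3 y] + Sin[x^2 + y^2]

(* s: steps, e: function evaluations,
   g: gradient evaluations, h: Hessian evaluations *)
Block[{s = 0, e = 0, g = 0, h = 0},
 {FindMinimum[f[x, y], {{x, 1}, {y, 1}},
   StepMonitor :> s++,
   EvaluationMonitor :> e++,
   Gradient -> {Automatic, EvaluationMonitor :> g++},
   Method -> {"Newton",
     "Hessian" -> {Automatic, EvaluationMonitor :> h++}}],
  s, e, g, h}]
```

The monitors use :> rather than -> so that each increment is evaluated at every call instead of once when the options are set up.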

Using such diagnostics can be quite useful for determining what methods and/or method parameters may be most successful for a class of problems with similar characteristics.

When the Wolfram Language can access the symbolic structure of the function, it automatically does a structural analysis of the function and its derivatives and uses SparseArray objects to represent the derivatives when appropriate. Since subsequent numerical linear algebra can then use the sparse structures, this can have a profound effect on the overall efficiency of the search. When the Wolfram Language cannot do a structural analysis, it has to assume, in general, that the structure is dense. However, if you know what the sparse structure of the derivative is, you can specify this with the "Sparse" method option and gain huge efficiency advantages, both in computing derivatives (with finite differences, the number of evaluations can be reduced significantly) and in subsequent linear algebra. This issue is particularly important when working with vector-valued variables. A good example for illustrating this aspect is the extended Rosenbrock problem, which has a very simple sparse structure.

This gets the extended Rosenbrock function with 1000 variables in symbolic form ready to be solved with FindRoot using the UnconstrainedProblems` package:
This solves the problem using the symbolic form of the function:
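As a package-free stand-in for the two steps above, this sketch builds the symbolic form by hand. It assumes the standard extended Rosenbrock residuals, 10 (x[2i] - x[2i-1]^2) and 1 - x[2i-1], and the standard starting point {-1.2, 1, -1.2, 1, ...}; the names n, vars, residuals, start, and sol are illustrative.

```wolfram
n = 1000;
vars = Array[x, n];   (* x[1], ..., x[1000] *)

(* Residuals come in pairs, one pair for each pair of variables *)
residuals = Flatten[
   Table[{10 (x[2 i] - x[2 i - 1]^2), 1 - x[2 i - 1]}, {i, n/2}]];

start = Flatten[ConstantArray[{-1.2, 1.}, n/2]];

(* FindRoot can see the symbolic structure here, so it can
   differentiate symbolically and detect the sparse Jacobian *)
sol = FindRoot[residuals, Transpose[{vars, start}]];

(* Size of the residual at the computed solution *)
Norm[residuals /. sol]
```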

For a function with simple form like this, it is easy to write a vector form of the function, which can be evaluated much more quickly than the symbolic form can, even with automatic compilation.

This defines a vector form of the extended Rosenbrock function, which evaluates very efficiently:
This extracts the starting point as a vector from the problem structure:
This solves the problem using a vector variable and the vector function for evaluation:
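The three steps above can be sketched together as follows. The function name ExtendedRosenbrockResidual is illustrative, and the starting point is built directly (assuming the standard {-1.2, 1, ...} start) rather than extracted from a problem structure.

```wolfram
(* Vector form: for each pair {a, b} of consecutive variables,
   the residuals are 10 (b - a^2) and 1 - a *)
ExtendedRosenbrockResidual[x_List] := Module[{x1, x2},
  {x1, x2} = Transpose[Partition[x, 2]];
  Flatten[Transpose[{10 (x2 - x1^2), 1 - x1}]]]

(* Starting point as a plain vector of length 1000 *)
start = Flatten[ConstantArray[{-1.2, 1.}, 500]];

(* X is a single vector-valued variable; the x_List pattern keeps
   the function unevaluated until X has a numerical value *)
FindRoot[ExtendedRosenbrockResidual[X], {X, start}]
```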

Although the vector function is faster to evaluate, the solution winds up being slower overall, because the Jacobian has to be computed with finite differences: the x_List pattern makes the function opaque to symbolic analysis. It is not so much the finite differences that are slow as the fact that 1000 function evaluations are needed to get all the columns of the Jacobian, one for each variable. With knowledge of the structure, this can be reduced to two evaluations. For this function, the structure of the Jacobian is quite simple.
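To see why two evaluations can suffice, here is a small hand-rolled illustration with six variables. It is only a sketch of the idea, not necessarily how FindRoot groups columns internally; the names res, odd, even, dOdd, and dEven are illustrative. Since no residual depends on two odd-indexed variables, or on two even-indexed ones, each group can be perturbed all at once.

```wolfram
(* Extended Rosenbrock residuals in vector form *)
res[x_List] := Module[{x1, x2},
  {x1, x2} = Transpose[Partition[x, 2]];
  Flatten[Transpose[{10 (x2 - x1^2), 1 - x1}]]]

n = 6; h = 1.*^-6;
x0 = Flatten[ConstantArray[{-1.2, 1.}, n/2]];
r0 = res[x0];   (* base value, shared by both differences *)

odd = Table[Boole[OddQ[k]], {k, n}];   (* selects x1, x3, x5 *)
even = 1 - odd;                        (* selects x2, x4, x6 *)

(* One perturbed evaluation per group *)
dOdd = (res[x0 + h odd] - r0)/h;
dEven = (res[x0 + h even] - r0)/h;

(* dOdd holds the derivatives with respect to the odd-indexed
   variables (about 24 and -1 in each pair of rows); dEven holds
   those with respect to the even-indexed ones (about 10 and 0) *)
{dOdd, dEven}
```

Every nonzero Jacobian entry appears exactly once in dOdd or dEven, so the full Jacobian is recovered from two perturbed evaluations plus the base value, regardless of the number of variables.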

This defines a pattern SparseArray, which has the structure of nonzeros for the Jacobian of the extended Rosenbrock function. (By specifying _ for the values in the rules, the SparseArray is taken to be a template of the Pattern type as indicated in the output form.)
This solves the problem with the knowledge of the actual Jacobian structure, showing a significant cost savings:
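A sketch of the template and its use. The names ExtendedRosenbrockResidual, start, and sparsity are illustrative, the starting point assumes the standard {-1.2, 1, ...} start, and the method is written as the string "FiniteDifference", which is an assumption about spelling.

```wolfram
ExtendedRosenbrockResidual[x_List] := Module[{x1, x2},
  {x1, x2} = Transpose[Partition[x, 2]];
  Flatten[Transpose[{10 (x2 - x1^2), 1 - x1}]]]

start = Flatten[ConstantArray[{-1.2, 1.}, 500]];

(* Nonzero positions: for each odd i, residual i depends on
   variables i and i + 1, and residual i + 1 depends on variable i.
   Using _ as the value makes this a pattern template. *)
sparsity = SparseArray[Flatten[Table[
    {{i, i} -> _, {i, i + 1} -> _, {i + 1, i} -> _},
    {i, 1, 999, 2}]]]

(* Finite differences restricted to the known nonzero structure *)
FindRoot[ExtendedRosenbrockResidual[X], {X, start},
  Jacobian -> {"FiniteDifference", "Sparse" -> sparsity}]
```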

When a sparse structure is given, it is also possible to have the value computed by a symbolic expression that evaluates to the values corresponding to the positions given in the sparse structure template. Note that the values must correspond directly to the positions as ordered in the SparseArray (the ordering can be seen using ArrayRules). One way to get a consistent ordering of indices is to transpose the matrix twice, which results in a SparseArray with indices in lexicographic order.

This transposes the nonzero structure matrix twice to get the indices sorted:
This defines a function that will return the nonzero values in the Jacobian corresponding to the index positions in the nonzero structure matrix:
This solves the problem with the resulting sparse symbolic Jacobian:
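The last three steps can be sketched together as follows. The names ExtendedRosenbrockResidual, ERJValues, start, and sparsity are illustrative, and the values are the hand-computed partial derivatives of the assumed residuals 10 (b - a^2) and 1 - a.

```wolfram
ExtendedRosenbrockResidual[x_List] := Module[{x1, x2},
  {x1, x2} = Transpose[Partition[x, 2]];
  Flatten[Transpose[{10 (x2 - x1^2), 1 - x1}]]]

start = Flatten[ConstantArray[{-1.2, 1.}, 500]];

sparsity = SparseArray[Flatten[Table[
    {{i, i} -> _, {i, i + 1} -> _, {i + 1, i} -> _},
    {i, 1, 999, 2}]]];

(* Transposing twice sorts the stored indices lexicographically:
   {1,1}, {1,2}, {2,1}, {3,3}, {3,4}, {4,3}, ... *)
sparsity = Transpose[Transpose[sparsity]];

(* Nonzero values in that same order: for each pair {a, b},
   the entries are -20 a, 10, and -1 *)
ERJValues[x_List] := Module[{x1 = x[[1 ;; -1 ;; 2]]},
  Flatten[Transpose[{-20 x1,
     ConstantArray[10., Length[x1]],
     ConstantArray[-1., Length[x1]]}]]]

FindRoot[ExtendedRosenbrockResidual[X], {X, start},
  Jacobian -> {ERJValues[X], "Sparse" -> sparsity}]
```

ArrayRules[sparsity] can be used to confirm that the stored positions really appear in the order the values function assumes.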

In this case, using the sparse symbolic Jacobian is not significantly faster, for two reasons: the Jacobian is so sparse that a finite difference approximation can be found for it in only two function evaluations, and the problem is well enough defined near the root that the extra accuracy in the Jacobian does not make any significant difference.