Specifying Derivatives
The function
FindRoot has a
Jacobian option; the functions
FindMinimum,
FindMaximum, and
FindFit have a
Gradient option; and the
"Newton" method has a method option
Hessian. All these derivatives are specified with the same basic structure. Here is a summary of ways to specify derivative computation methods.
Automatic  find a symbolic derivative for the function and use finite difference approximations if a symbolic derivative cannot be found 
Symbolic  same as Automatic, but gives a warning message if finite differences are to be used 
FiniteDifference  use finite differences to approximate the derivative 
expression  use the given expression with local numerical values of the variables to evaluate the derivative 
Methods for computing gradient, Jacobian, and Hessian derivatives.
The basic specification for a derivative is just the method for computing it. However, all of the derivatives take options as well. These can be specified by using a list
{method, opts}. Here is a summary of the options for the derivatives.
  
"EvaluationMonitor"  None  expression to evaluate with local values of the variables every time the derivative is evaluated, usually specified with :> instead of > to prevent symbolic evaluation 
"Sparse"  Automatic  sparse structure for the derivative, can be Automatic, True, False, or a pattern SparseArray giving the nonzero structure 
"DifferenceOrder"  1  difference order to use when finite differences are used to compute the derivative 
Options for computing gradient, Jacobian, and Hessian derivatives.
A few examples will help illustrate how these fit together.
This loads a package that contains some utility functions. 
This defines a function that is only intended to evaluate for numerical values of the variables. 
With just
Method→"Newton",
FindMinimum issues a lstol message because it was not able to resolve the minimum well enough due to lack of good derivative information.
This shows the steps taken by FindMinimum when it has to use finite differences to compute the gradient and Hessian.
Out[3]=  

The following describes how you can use the gradient option to specify the derivative.
This computes the minimum of f[x, y] using a symbolic expression for its gradient.
Out[4]=  

Symbolic derivatives are not always available. If you need extra accuracy from finite differences, you can increase the difference order from the default of 1 at the cost of extra function evaluations.
This computes the minimum of f[x, y] using a secondorder finite difference to compute the gradient.
Out[5]=  

Note that the number of function evaluations is much higher because function evaluations are used to compute the gradient, which is used to approximate the Hessian in turn. (The Hessian is computed with finite differences since no symbolic expression for it can be computed from the information given.)
The information given from
FindMinimumPlot about the number of function, gradient, and Hessian evaluations is quite useful. The
EvaluationMonitor options are what make this possible. Here is a simple example that simply counts the number of each type of evaluation. (The plot is made using
Reap and
Sow to collect the values at which the evaluations are done.)
This computes the minimum with counters to keep track of the number of steps and the number of function, gradient, and Hessian evaluations.
Out[6]=  

Using such diagnostics can be quite useful for determining what methods and/or method parameters may be most successful for a class of problems with similar characteristics.
When
Mathematica can access the symbolic structure of the function, it automatically does a structural analysis of the function and its derivatives and uses
SparseArray objects to represent the derivatives when appropriate. Since subsequent numerical linear algebra can then use the sparse structures, this can have a profound effect on the overall efficiency of the search. When
Mathematica cannot do a structural analysis, it has to assume, in general, that the structure is dense. However, if you know what the sparse structure of the derivative is, you can specify this with the
"Sparse" method option and gain huge efficiency advantages, both in computing derivatives (with finite differences, the number of evaluations can be reduced significantly) and in subsequent linear algebra. This issue is particularly important when working with vectorvalued variables. A good example for illustrating this aspect is the extended Rosenbrock problem, which has a very simple sparse structure.
This gets the extended Rosenbrock function with 1000 variables in symbolic form ready to be solved with FindRoot using the UnconstrainedProblems` package.
Out[7]//Short= 
 

This solves the problem using the symbolic form of the function.
Out[8]=  

For a function with simple form like this, it is easy to write a vector form of the function, which can be evaluated much more quickly than the symbolic form can, even with automatic compilation.
This defines a vector form of the extended Rosenbrock function, which evaluates very efficiently. 
This extracts the starting point as a vector from the problem structure.
Out[10]//Short= 
 

This solves the problem using a vector variable and the vector function for evaluation.
Out[11]=  

The solution with the function, which is faster to evaluate, winds up being slower overall because the Jacobian has to be computed with finite differences since the
x_List pattern makes it opaque to symbolic analysis. It is not so much the finite differences that are slow as the fact that it needs to do 100 function evaluations to get all the columns of the Jacobian. With knowledge of the structure, this can be reduced to two evaluations to get the Jacobian. For this function, the structure of the Jacobian is quite simple.
This defines a pattern SparseArray, which has the structure of nonzeros for the Jacobian of the extended Rosenbrock function. (By specifying _ for the values in the rules, the SparseArray is taken to be a template of the Pattern type as indicated in the output form.)
Out[12]=  

This solves the problem with the knowledge of the actual Jacobian structure, showing a significant cost savings.
Out[13]=  

When a sparse structure is given, it is also possible to have the value computed by a symbolic expression that evaluates to the values corresponding to the positions given in the sparse structure template. Note that the values must correspond directly to the positions as ordered in the
SparseArray (the ordering can be seen using
ArrayRules). One way to get a consistent ordering of indices is to transpose the matrix twice, which results in a
SparseArray with indices in lexicographic order.
This transposes the nonzero structure matrix twice to get the indices sorted.
Out[14]=  

This defines a function that will return the nonzero values in the Jacobian corresponding to the index positions in the nonzero structure matrix. 
This solves the problem with the resulting sparse symbolic Jacobian.
Out[16]=  

In this case, using the sparse Jacobian is not significantly faster because the Jacobian is so sparse that a finite difference approximation can be found for it in only two function evaluations and because the problem is well enough defined near the minimum that the extra accuracy in the Jacobian does not make any significant difference.