Quasi-Newton Methods
There are many variants of quasi-Newton methods. In all of them, the idea is to base the matrix B_k in the quadratic model on an approximation of the Hessian matrix, built up from the function and gradient values at some or all of the previously taken steps.
This loads a package that contains some utility functions.
This shows a plot of the steps taken by the quasi-Newton method. The path is much less direct than for Newton's method. The quasi-Newton method is used by default by FindMinimum for problems that are not sums of squares.
The first thing to notice about the path taken in this example is that it starts in the wrong direction. This direction is chosen because at the first step all the method has to go by is the gradient, and so it takes the direction of steepest descent. However, in subsequent steps, it incorporates information from the values of the function and gradient at the steps taken to build up an approximate model of the Hessian.
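This first-step behavior is easy to verify numerically. Here is a minimal NumPy sketch (the test function and starting point are illustrative assumptions, not taken from the notebook, and this is not Mathematica's implementation): with the customary initial approximation B_0 = I, the quasi-Newton direction obtained from solving B_0 s_0 = -∇f(x_0) is exactly the steepest-descent direction.

```python
import numpy as np

# Assumed test function with a curved valley, minimum at (1, 1).
def f(x):
    return (x[0] - 1)**2 + 10*(x[1] - x[0]**2)**2

def grad(x):
    return np.array([2*(x[0] - 1) - 40*x[0]*(x[1] - x[0]**2),
                     20*(x[1] - x[0]**2)])

x0 = np.array([-1.2, 1.0])
B0 = np.eye(2)                       # initial model Hessian B_0 = I
s0 = -np.linalg.solve(B0, grad(x0))  # first quasi-Newton step direction

# With B_0 = I this is identical to the steepest-descent direction.
print(np.allclose(s0, -grad(x0)))    # True
```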
The methods used by Mathematica are the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update and, for large systems, the limited-memory BFGS (L-BFGS) method, in which the model B_k is not stored explicitly; instead, B_k^{-1} ∇f(x_k) is calculated from gradients and step directions stored from past steps.
The BFGS method is implemented so that, instead of forming the model Hessian B_k at each step, Cholesky factors L_k such that L_k L_k^T = B_k are computed; with these, only O(n^2) operations are needed to solve the system B_k s_k = -∇f(x_k) [DS96] for a problem with n variables.
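The following NumPy sketch shows the shape of such an iteration under stated assumptions: it keeps a dense B_k, solves B_k s_k = -∇f(x_k) with a generic dense solve rather than updated Cholesky factors, uses a crude backtracking line search, and applies the standard rank-two BFGS update. The test function is an assumption, and this is not Mathematica's implementation.

```python
import numpy as np

# Assumed test function (Rosenbrock-like), minimum at (1, 1).
def f(x):
    return (x[0] - 1)**2 + 10*(x[1] - x[0]**2)**2

def grad(x):
    return np.array([2*(x[0] - 1) - 40*x[0]*(x[1] - x[0]**2),
                     20*(x[1] - x[0]**2)])

def bfgs(x, iters=200, tol=1e-10):
    B = np.eye(len(x))                # initial model Hessian B_0 = I
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        s = -np.linalg.solve(B, g)    # solve B_k s_k = -grad f(x_k)
        t = 1.0                       # crude backtracking (Armijo) line search
        while f(x + t*s) > f(x) + 1e-4*t*g.dot(s):
            t *= 0.5
        s = t*s                       # actual step taken
        x_new = x + s
        y = grad(x_new) - g           # gradient change over the step
        if y.dot(s) > 1e-12:          # curvature condition keeps B positive definite
            Bs = B.dot(s)             # rank-two BFGS update of the model Hessian
            B = B - np.outer(Bs, Bs)/s.dot(Bs) + np.outer(y, y)/y.dot(s)
        x, g = x_new, grad(x_new)
    return x

print(bfgs(np.array([-1.2, 1.0])))    # approaches the minimum at (1, 1)
```

A production implementation updates the Cholesky factors L_k directly in O(n^2) work per step, rather than re-solving a dense system from scratch as this sketch does.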
For large-scale sparse problems, the BFGS method can be problematic because, in general, the Cholesky factors (or the Hessian approximation B_k or its inverse) are dense, so the O(n^2) memory and operation requirements become prohibitive compared to algorithms that take advantage of sparsity. The L-BFGS algorithm [NW99] instead forms an approximation to the inverse Hessian based on the last m steps, which are stored. The Hessian approximation may not be as complete, but the memory and order of operations are limited to O(n m) for a problem with n variables. In Mathematica 5, for problems with more than 250 variables, the algorithm switches automatically to L-BFGS. You can control this with the method option "StepMemory" -> m. With m = ∞, the full BFGS method will always be used. Choosing an appropriate value of m is a tradeoff between speed of convergence and the work done per step; with m < 3, you are most likely better off using a "conjugate gradient" algorithm.
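The heart of L-BFGS is the two-loop recursion [NW99], which applies the implicit inverse-Hessian approximation to a gradient vector using only the stored pairs s_i = x_{i+1} - x_i and y_i = ∇f(x_{i+1}) - ∇f(x_i). Here is a standalone NumPy sketch of that recursion (an illustration, not Mathematica's code):

```python
import numpy as np

def two_loop(g, s_list, y_list):
    """Compute an approximation to B_k^{-1} g from the last m stored
    (s_i, y_i) pairs, ordered oldest to newest: O(n m) work and memory."""
    q = g.copy()
    alphas = []
    rhos = [1.0/y.dot(s) for s, y in zip(s_list, y_list)]
    # first loop: newest pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * s.dot(q)
        q -= a * y
        alphas.append(a)
    # initial scaling H_0 = (s.y / y.y) I from the most recent pair
    if s_list:
        gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # second loop: oldest pair to newest
    for s, y, rho, a in zip(s_list, y_list, rhos, reversed(alphas)):
        b = rho * y.dot(r)
        r += (a - b) * s
    return r   # approximates B_k^{-1} g

# Sanity check: with a single stored pair, the implicit inverse Hessian
# satisfies the secant condition H y = s, so two_loop(y, [s], [y]) == s.
s = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])
print(two_loop(y, [s], [y]))   # equals s, i.e. [1. 2.]
```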
This shows the same example function with the minimum computed using L-BFGS with m = 5.
Quasi-Newton methods are chosen as the default in Mathematica because they are typically quite fast and do not require computation of the Hessian matrix, which can be quite expensive both in terms of the symbolic computation and the numerical evaluation. With an adequate "line search", they can be shown to converge superlinearly [NW99] to a local minimum where the Hessian is positive definite. This means that
lim_{k→∞} ||x_{k+1} - x_*|| / ||x_k - x_*|| = 0,
or, in other words, the ratio of successive errors goes to zero, so the steps keep getting smaller. However, for very high precision, this does not compare to the quadratic convergence rate of Newton's method.
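The difference between the two rates can be made concrete with toy error sequences (model recurrences chosen purely for illustration, not output from an actual run): superlinear convergence means the ratio of successive errors tends to zero, while quadratic convergence roughly doubles the number of correct digits at every step.

```python
from math import log10

# Model error recurrences: e_{k+1} = e_k^{3/2} (superlinear) and
# e_{k+1} = e_k^2 (quadratic), both starting from an error of 0.1.
superlinear = [0.1]
quadratic = [0.1]
for _ in range(5):
    superlinear.append(superlinear[-1]**1.5)
    quadratic.append(quadratic[-1]**2)

# Superlinear: successive error ratios shrink toward zero.
ratios = [b/a for a, b in zip(superlinear, superlinear[1:])]
print(ratios)

# Quadratic: correct digits roughly double each step (about 1, 2, 4, 8, 16, 32).
print([-log10(e) for e in quadratic])
```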
This shows the number of steps and function evaluations required to find the minimum to high precision for the problem shown.
Newton's method is able to find ten times as many digits with far fewer steps because of its quadratic convergence rate. However, the convergence with the quasi-Newton method is still superlinear since the ratio of the errors is clearly going to zero.
This makes a plot of the ratios of successive errors in the computation, shown on a logarithmic scale so that the trend can be seen clearly over a large range of magnitudes.
The following table summarizes the options you can use with quasi-Newton methods.

option name | default value | description
"StepMemory" | Automatic | the effective number of steps to "remember" in the Hessian approximation; can be a positive integer or Automatic
"StepControl" | "LineSearch" | how to control steps; can be "LineSearch" or None

Method options for Method -> "QuasiNewton".