Line Search Methods

A method like Newton's method chooses a step, but the validity of that step only goes as far as the Newton quadratic model for the function really reflects the function. The idea of a line search is to use the direction of the chosen step, but to control the length, by solving a one-dimensional problem of minimizing

    φ(α) = f(x_k + α p_k),

where p_k is the search direction chosen from the position x_k. Note that

    φ'(α) = ∇f(x_k + α p_k) · p_k,

so if you can compute the gradient, you can effectively do a one-dimensional search with derivatives.

Typically, an effective line search only looks toward α > 0, since a reasonable method should guarantee that the search direction is a descent direction, which can be expressed as φ'(0) = ∇f(x_k) · p_k < 0.
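As a concrete illustration (this code is not from the original tutorial; the function, point, and direction are arbitrary choices), the one-dimensional restriction φ(α) and its derivative can be formed explicitly:

    (* Illustrative only: restrict a two-variable function along a search
       direction p from a point x0 to get the one-dimensional function phi. *)
    f[{x_, y_}] := (x - 1)^2 + 10 (y - x^2)^2;
    x0 = {-1, 1};     (* current position x_k *)
    p = {1, -1/2};    (* search direction p_k *)
    phi[a_] := f[x0 + a p]
    dphi[a_] = D[phi[a], a];   (* phi'(a) = grad f(x0 + a p) . p *)
    dphi[0]   (* negative here, so p is a descent direction at x0 *)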

It is typically not worth the effort to find an exact minimum of φ, since the search direction is rarely exactly the right direction. Usually it is enough to move closer to the minimum.

One condition that measures progress is called the Armijo or sufficient decrease condition for a candidate α:

    φ(α) ≤ φ(0) + μ α φ'(0).

Often with this condition, methods will converge, but for some methods, Armijo alone does not guarantee convergence for smooth functions. With the additional curvature condition

    |φ'(α)| ≤ η |φ'(0)|,

many methods can be proven to converge for smooth functions. Together these conditions are known as the strong Wolfe conditions. You can control the parameters μ and η with the "DecreaseFactor"->μ and "CurvatureFactor"->η options of "LineSearch".

The default value for "CurvatureFactor"->η is η = 0.9, except for Method->"ConjugateGradient", where η = 0.1 is used since the algorithm typically works better with a closer-to-exact line search. The smaller η is, the closer to exact the line search is.
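For example (a sketch, not from the original tutorial; the function and starting point are illustrative, and the parameters are assumed to be passed through the "Newton" method's "StepControl" option), the Wolfe parameters can be set like this:

    (* Tighten the line search: smaller eta (CurvatureFactor) means a more exact
       search; mu (DecreaseFactor) is the Armijo sufficient decrease factor. *)
    FindMinimum[(x - 1)^2 + 10 (y - x^2)^2, {{x, -1}, {y, 1}},
      Method -> {"Newton",
        "StepControl" -> {"LineSearch",
          "CurvatureFactor" -> 0.1, "DecreaseFactor" -> 10^-4}}]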

If you look at graphs showing iterative searches in two dimensions, you can see the evaluations spread out along the directions of the line searches. Typically, it only takes a few iterations to find a point satisfying the conditions. However, the line search is not always able to find a point that satisfies the conditions. Usually this is because there is insufficient precision to compute the points closely enough to satisfy the conditions, but it can also be caused by functions that are not completely smooth or vary extremely slowly in the neighborhood of a minimum.

This loads a package that contains some utility functions.
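The input itself is not reproduced above; loading the Unconstrained Problems Package presumably looks like the following (the package context is an assumption, inferred from the package referenced later in this tutorial):

    (* Assumed package context; it provides utilities such as FindMinimumPlot
       for visualizing the steps and evaluations of a search. *)
    Needs["Optimization`UnconstrainedProblems`"]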

This runs into problems because the real differences in the function are negligible compared to evaluation differences around the point, as can be seen from the plot.


Sometimes it can help to subtract out the constants so that small changes in the function are more significant.


In this case, however, the approximation is only slightly closer because the function is quite noisy near 0, as can be seen from the plot.


Thus, to get closer to the correct value of zero, higher precision is required to compute the function more accurately.
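For instance (a constructed example, not the one from the text), a function dominated by a large constant cannot be resolved at machine precision, but raising WorkingPrecision lets FindMinimum evaluate it accurately enough:

    (* Constructed example: near the minimum at x = 1 the variation in g is far
       below machine epsilon relative to the constant term, so machine arithmetic
       sees mostly noise there. *)
    g[x_] := 10^10 + (x - 1)^4;
    FindMinimum[g[x], {x, 2}]                          (* machine precision: typically stalls away from x = 1 *)
    FindMinimum[g[x], {x, 2}, WorkingPrecision -> 30]  (* higher precision gets much closer to x = 1 *)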

For some problems, particularly where you may be starting far from a root or a local minimum, it may be desirable to restrict steps. With line searches, it is possible to do this by using the "MaxRelativeStepSize" method option. The default value picked for this option is designed to keep searches from going wildly out of control, yet at the same time not prevent a search from using reasonably large steps if appropriate.

This is an example of a problem where the Newton step is very large because the starting point is at a position where the Jacobian (derivative) is nearly singular. The step size is limited (though not severely) by the "MaxRelativeStepSize" option.
This shows the same example but with a more rigorous step-size limitation, which finds the root near the starting condition.

Note that you need to be careful not to set the "MaxRelativeStepSize" option too small, or it will affect convergence, especially for minima and roots near zero.
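A sketch of the option syntax (the equation and starting point here are illustrative and are not the example from the text):

    (* Restrict each step to at most 1/10 of the norm of the current search point. *)
    FindRoot[Exp[x] == x + 2, {x, 10},
      Method -> {"Newton",
        "StepControl" -> {"LineSearch", "MaxRelativeStepSize" -> 1/10}}]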

The following table shows a summary of the options that can be used to control line searches.

"Method" (default Automatic): method to use for executing the line search; can be Automatic, "MoreThuente", "Backtracking", or "Brent".
"CurvatureFactor" (default Automatic): factor η in the Wolfe conditions, between 0 and 1; smaller values of η result in a more exact line search.
"DecreaseFactor" (default 1/10000): factor μ in the Wolfe conditions, between 0 and η.
"MaxRelativeStepSize" (default 10): largest step that will be taken relative to the norm of the current search point; can be any positive number or Infinity for no restriction.

Method options for "LineSearch".

The following sections will describe the three line search algorithms implemented in Mathematica. Comparisons will be made using the Rosenbrock function.

This uses the Unconstrained Problems Package to set up the classic Rosenbrock function, which has a narrow curved valley.
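The package input is not reproduced above; a minimal stand-in that defines the Rosenbrock function directly (without the package) and is used in the sketches below:

    (* Stand-in for the package setup: the classic Rosenbrock function, whose
       minimum lies at {1, 1} at the end of a narrow curved valley. *)
    rosenbrock[x_, y_] := (1 - x)^2 + 100 (y - x^2)^2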

MoreThuente

The default line search used by FindMinimum, FindMaximum, and FindFit is the one described by Moré and Thuente in [MT94]. It tries to find a point that satisfies both the decrease and curvature conditions by using bracketing and quadratic and cubic interpolation.

This shows the steps and evaluations done with Newton's method with the default line search parameters. Points marked with just red and green are where the function and gradient were evaluated in the line search but the Wolfe conditions were not satisfied, so no step was taken there.
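The plotting input is not reproduced; a sketch of the corresponding minimization with the default line search parameters (the starting point {-1.2, 1} is the conventional one for Rosenbrock and is an assumption here; the plots in the text presumably come from FindMinimumPlot in the Unconstrained Problems Package):

    (* Newton's method with the default More-Thuente line search; no
       "StepControl" options need to be given. *)
    FindMinimum[rosenbrock[x, y], {{x, -1.2}, {y, 1}}, Method -> "Newton"]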

The points at which only the function and gradient were evaluated were the ones attempted in the line search phase that did not satisfy both conditions. Unless restricted by "MaxRelativeStepSize", the line search always starts with the full step length (α = 1), so that if the full (in this case Newton) step satisfies the line search criteria, it will be taken, ensuring a full convergence rate close to a minimum.

Decreasing the curvature factor, which means that the line search ends nearer to the exact minimum, decreases the number of steps taken by Newton's method but increases the total number of function and gradient evaluations.

This shows the steps and evaluations done with Newton's method with a curvature factor in the line search parameters that is smaller than the default. Points marked with just red and green are where the function and gradient were evaluated in the line search but the Wolfe conditions were not satisfied, so no step was taken there.
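The input is not reproduced; a sketch of the same minimization with a smaller curvature factor passed to the line search:

    (* A more exact line search: eta = 0.1 instead of the default 0.9. *)
    FindMinimum[rosenbrock[x, y], {{x, -1.2}, {y, 1}},
      Method -> {"Newton",
        "StepControl" -> {"LineSearch", "CurvatureFactor" -> 0.1}}]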

This example demonstrates why a more exact line search is not necessarily better. When the line search takes the step to the right at the bottom of the narrow valley, the Newton step is based on moving along the valley without seeing its curvature (the curvature of the valley is beyond quadratic order), so the Newton steps end up being far too long, even though the direction is better. On the other hand, some methods, such as the conjugate gradient method, need a better line search to improve convergence.

Backtracking

This is a simple line search that starts from the given step size and backtracks toward a step size of 0, stopping when the sufficient decrease condition is met. In general with only backtracking, there is no guarantee that you can satisfy the curvature condition, even for nice functions, so the convergence properties of the methods are not assured. However, the backtracking line search also does not need to evaluate the gradient at each point, so if gradient evaluations are relatively expensive, this may be a good choice. It is used as the default line search in FindRoot because evaluating the gradient of the merit function involves computing the Jacobian, which is relatively expensive.

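The example input is not reproduced; a sketch that selects the backtracking line search for Newton's method, using rosenbrock as defined earlier:

    (* Use the backtracking line search instead of the default More-Thuente search. *)
    FindMinimum[rosenbrock[x, y], {{x, -1.2}, {y, 1}},
      Method -> {"Newton",
        "StepControl" -> {"LineSearch", "Method" -> "Backtracking"}}]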

Each backtracking step is taken by doing a polynomial interpolation and finding the minimum point for the interpolant. This point is used as long as it lies between c1 α and c2 α, where α is the previous value of the step-size parameter and 0 < c1 ≤ c2 < 1. By default c1 = 1/10 and c2 = 1/2, but these can be controlled with the method option "BacktrackFactors"->{c1, c2}. If you give a single value for the factors, this sets c1 = c2, and no interpolation is used. The value 1/2 gives bisection.

In this example, the effect of the relatively large backtrack factor is quite apparent.
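The input is not reproduced; a sketch with a single, relatively large backtrack factor (a single value sets c1 = c2, so each backtracking step simply halves the step length):

    (* "BacktrackFactors" -> 1/2 gives bisection of the step length at each backtrack. *)
    FindMinimum[rosenbrock[x, y], {{x, -1.2}, {y, 1}},
      Method -> {"Newton",
        "StepControl" -> {"LineSearch", "Method" -> "Backtracking",
          "BacktrackFactors" -> 1/2}}]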
"BacktrackFactors" (default {1/10, 1/2}): determine the minimum and maximum factor by which the attempted step length must shrink between backtracking steps.

Method option for the line search Method->"Backtracking".

Brent

This uses the derivative-free univariate method of Brent [Br02] for the line search. It attempts to find the minimum of φ to within tolerances, regardless of the decrease and curvature factors. In effect, it has two phases: first it tries to bracket the minimum, then it uses Brent's combined interpolation/golden section method to find it. The advantage of this line search is that it does not require, as the other two methods do, that the step be in a descent direction, since it will look in both directions in an attempt to bracket the minimum. As such, it is very appropriate for the derivative-free principal axis method. The downside of this line search is that it typically uses many function evaluations, so it is usually less efficient than the other two methods.

This example shows the effect of using the Brent method for the line search. Note that in the bracketing phase it may use negative values of α. Even though the number of Newton steps is relatively small in this example, the total number of function evaluations is much larger than for the other line search methods.
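The input is not reproduced; a sketch that selects Brent's method for the line search, again using rosenbrock as defined earlier:

    (* Brent's derivative-free line search; it aims for a near-exact minimum along
       the line, typically at the cost of many more function evaluations. *)
    FindMinimum[rosenbrock[x, y], {{x, -1.2}, {y, 1}},
      Method -> {"Newton",
        "StepControl" -> {"LineSearch", "Method" -> "Brent"}}]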