WOLFRAM LANGUAGE TUTORIAL
Trust Region Methods
A trust region method has a region around the current search point, where the quadratic model
for local minimization is "trusted" to be correct and steps are chosen to stay within this region. The size of the region is modified during the search, based on how well the model agrees with actual function evaluations.
Very typically, the trust region is taken to be an ellipse such that . is a diagonal scaling (often taken from the diagonal of the approximate Hessian) and is the trust region radius, which is updated at each step.
When the step based on the quadratic model alone lies within the trust region, then, assuming the function value gets smaller, that step will be chosen. Thus, just as with line search methods, the step control does not interfere with the convergence of the algorithm near to a minimum where the quadratic model is good. When the step based on the quadratic model lies outside the trust region, a step just up to the boundary of the trust region is chosen, such that the step is an approximate minimizer of the quadratic model on the boundary of the trust region.
Once a step is chosen, the function is evaluated at the new point, and the actual function value is checked against the value predicted by the quadratic model. What is actually computed is the ratio of actual to predicted reduction.
If is close to 1, then the quadratic model is quite a good predictor and the region can be increased in size. On the other hand, if is too small, the region is decreased in size. When is below a threshold, , the step is rejected and recomputed. You can control this threshold with the method option "AcceptableStepRatio"->η. Typically the value of is quite small to avoid rejecting steps that would be progress toward a minimum. However, if obtaining the quadratic model at a point is quite expensive (e.g. evaluating the Hessian takes a relatively long time), a larger value of will reduce the number of Hessian evaluations, but it may increase the number of function evaluations.
To start the trust region algorithm, an initial radius needs to be determined. By default the Wolfram Language uses the size of the step based on the model (1) restricted by a fairly loose relative step size limit. However, in some cases, this may take you out of the region you are primarily interested in, so you can specify a starting radius using the option . The option contains Scaled in its name because the trust region radius works through the diagonal scaling , so this is not an absolute step size.
This loads a package that contains some utility functions.
This shows the steps and evaluations taken during a search for a local minimum of a function similar to Rosenbrock's function, using Newton's method with trust region step control.
The plot looks quite bad because the search has extended over such a large region that the fine structure of the function cannot really be seen on that scale.
This shows the steps and evaluations for the same function, but with a restricted initial trust region radius
. Here the search stays much closer to the initial condition and follows the narrow valley.
It is also possible to set an overall maximum bound for the trust region radius by using the option so that for any step, .
Trust region methods can also have difficulties with functions which are not smooth due to problems with numerical roundoff in the function computation. When the function is not sufficiently smooth, the radius of the trust region will keep getting reduced. Eventually, it will get to the point at which it is effectively zero.
This finds a local minimum for the function using the default method. The default method in this case is the (trust region) Levenberg–
Marquardt method, since the function is a sum of squares.
The message means that the size of the trust region has become effectively zero relative to the size of the search point, so steps taken would have negligible effect. Note: On some platforms, due to subtle differences in machine arithmetic, the message may not show up. This is because the reasons leading to the message have to do with numerical uncertainty, which can vary between different platforms.
This makes a plot of the variation function along the
direction at the final point found.
The plot along one direction makes it fairly clear why no more improvement is possible. Part of the reason the Levenberg–Marquardt method gets into trouble in this situation is that convergence is relatively slow because the residual is nonzero at the minimum. With Newton's method, the convergence is faster, and the full quadratic model allows for a better estimate of step size, so that FindMinimum can have more confidence that the default tolerances have been satisfied.
The following table summarizes the options for controlling trust region step control.
|"AcceptableStepRatio"||1/10000||the threshold , such that when the actual to prediction reduction , the search is moved to the computed step|
|"MaxScaledStepSize"||∞||the value , such that the trust region size for all steps|
|"StartingScaledStepSize"||Automatic||the initial trust region size |
Method options for .