# Numerical Nonlinear Local Optimization

## Introduction

Numerical algorithms for constrained nonlinear optimization can be broadly categorized into gradient-based methods and direct search methods. Gradient-based methods use first-derivative (gradient) or second-derivative (Hessian) information. Examples are the sequential quadratic programming (SQP) method, the augmented Lagrangian method, and the (nonlinear) interior point method. Direct search methods do not use derivative information. Examples are Nelder-Mead, genetic algorithms, differential evolution, and simulated annealing. Direct search methods tend to converge more slowly, but can be more tolerant of noise in the function and constraints.

Typically, algorithms only build up a local model of the problem. Furthermore, to ensure convergence of the iterative process, many such algorithms insist on a certain decrease of the objective function or of a merit function that combines the objective and constraints. Such algorithms will, if convergent, only find a local optimum, and are called local optimization algorithms. In *Mathematica* local optimization problems can be solved using FindMinimum.

Global optimization algorithms, on the other hand, attempt to find the global optimum, typically by allowing decrease as well as increase of the objective/merit function. Such algorithms are usually computationally more expensive. Global optimization problems can be solved exactly using Minimize or numerically using NMinimize.

This solves a nonlinear programming problem using Minimize, which gives an exact solution.

This solves the same problem numerically; NMinimize returns a machine-number solution.

FindMinimum numerically finds a local minimum. In this example the local minimum found is also a global minimum.

## The FindMinimum Function

FindMinimum solves local unconstrained and constrained optimization problems. This document only covers the constrained optimization case. See "Unconstrained Optimization" for details of FindMinimum for unconstrained optimization.

This solves a nonlinear programming problem using FindMinimum.

This provides FindMinimum with an explicit starting value for one of the variables, but uses the default starting point for the other.

The previous solution point is actually a local minimum; FindMinimum only attempts to find a local minimum.

This contour plot of the feasible region illustrates the local and global minima.

This is a 3D visualization of the function over its feasible region.

### Options for FindMinimum

FindMinimum accepts these options.

Options of the FindMinimum function.

The Method option specifies the method to use to solve the optimization problem. Currently, the only method available for constrained optimization is the interior point algorithm.

This specifies that the interior point method should be used.

MaxIterations specifies the maximum number of iterations allowed. When machine precision is used for constrained optimization, the default MaxIterations setting applies.

When StepMonitor is specified, it is evaluated once at every iterative step of the interior point algorithm. EvaluationMonitor, on the other hand, is evaluated every time the objective function or an equality or inequality constraint is evaluated.
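The monitor idea is easy to sketch outside the Wolfram Language. The Python below is a hand-rolled gradient descent, not Mathematica's interior point code, with a callback fired once per iterate in the spirit of StepMonitor:

```python
def gradient_descent(grad, x0, lr=0.1, steps=50, step_monitor=None):
    """Minimal gradient descent that fires a callback once per iterate,
    analogous in spirit to FindMinimum's StepMonitor (illustration only)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
        if step_monitor is not None:
            step_monitor(x)
    return x

# Minimize f(x) = (x - 3)^2, collecting every iterate visited.
visited = []
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0,
                         step_monitor=visited.append)
```

An evaluation monitor would instead be triggered inside `grad` (and inside the objective), firing once per function evaluation rather than once per step.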

This demonstrates that 19 iterations are not sufficient to solve the following problem to the default tolerance. It collects the points visited through the use of StepMonitor.

The points visited are shown using ContourPlot. The starting point is blue, the rest yellow.

WorkingPrecision->prec specifies that all calculations in FindMinimum are to be carried out at precision prec. By default, prec=MachinePrecision. If prec>MachinePrecision, a fixed precision of prec is used throughout the computation.

The AccuracyGoal and PrecisionGoal options are used in the following way. By default, AccuracyGoal->Automatic, and is set based on the working precision; by default, PrecisionGoal->Automatic, and is set to Infinity. AccuracyGoal->ga is the same as AccuracyGoal->{-Infinity, ga}.

Suppose AccuracyGoal->a and PrecisionGoal->p. Then FindMinimum attempts to drive the residual, which is a combination of the feasibility and the satisfaction of the Karush-Kuhn-Tucker (KKT) and complementarity conditions, to be less than or equal to 10^-a. In addition, it requires the difference between the current iterate x_k and the next iterate x_{k+1} to satisfy ||x_{k+1} - x_k|| <= 10^-a + 10^-p ||x_{k+1}|| before terminating.
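As a rough illustration, an accuracy goal a and a precision goal p can be combined into a single step-size test. The additive form below mirrors the convention described above; treat it as a sketch of the idea, not the exact internal test:

```python
def step_converged(x_new, x_old, a, p):
    """Sketch of a Wolfram-style termination test: the last step must be
    smaller than an absolute tolerance 10^-a plus a relative tolerance
    10^-p * |x|.  Here a and p play the roles of AccuracyGoal and
    PrecisionGoal (the additive combination is an assumption)."""
    return abs(x_new - x_old) <= 10.0 ** (-a) + 10.0 ** (-p) * abs(x_new)
```

For example, with a = p = 6 a step of 10^-7 near x = 1 passes the test, while a step of 0.1 does not.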


The exact optimal value is computed using Minimize and compared with the result of FindMinimum.

## Examples of FindMinimum

### Finding a Global Minimum

If a global minimum is needed, NMinimize or Minimize should be used. However, since each run of FindMinimum tends to be faster than NMinimize or Minimize, it may sometimes be more efficient to use FindMinimum, particularly for relatively large problems with only a few local minima.

FindMinimum attempts to find a local minimum, so it terminates as soon as a local minimum is found.

This shows a function with multiple minima within the feasible region.

With the automatic starting point, FindMinimum converges to a local minimum.

If the user has some knowledge of the problem, a better starting point can be given to FindMinimum.

Alternatively, the user can tighten the constraints.


Finally, multiple starting points can be used and the best resulting minimum selected.

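This multistart strategy is easy to sketch outside the Wolfram Language. The Python below is an illustrative stand-in (a crude gradient-descent local minimizer and a made-up multimodal objective, not the document's problem): run the local minimizer from several starting points and keep the best result.

```python
import math

def local_min_1d(f, x0, h=1e-6, lr=0.01, steps=2000):
    """Crude one-dimensional local minimizer: gradient descent with a
    numerical derivative.  Stands in for a single FindMinimum run."""
    x = x0
    for _ in range(steps):
        g = (f(x + h) - f(x - h)) / (2 * h)   # central-difference gradient
        x -= lr * g
    return x

# A multimodal objective: cosine wells tilted so the minima differ in depth.
f = lambda x: math.cos(3 * x) + 0.1 * x

# Multistart: run the local minimizer from several points, keep the best.
starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
candidates = [local_min_1d(f, s) for s in starts]
best = min(candidates, key=f)
```

Different starting points land in different wells; taking the minimum over the candidates selects the deepest well reached.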

Multiple starting points can also be handled more systematically via NMinimize, with the interior point algorithm used as the post-processor.

### Solving Minimax Problems

The minimax (also known as min-max) problem is that of finding the minimum value of a function defined as the pointwise maximum of several functions, that is,

$$\min_x \max_i f_i(x).$$

While this problem can often be solved using general constrained optimization techniques, it is more reliably solved by reformulating it as a problem with a smooth objective function. Specifically, the minimax problem can be converted into

$$\min_{x,\,z} z \quad \text{subject to} \quad f_i(x) \le z, \quad i = 1, \ldots, k,$$

and solved using either FindMinimum or NMinimize.

This defines a function to be used in the following examples.

This solves an unconstrained minimax problem with one variable.

This solves an unconstrained minimax problem with two variables.

This shows the contours of the objective function and the optimal solution.

This solves a constrained minimax problem.

This shows the contours of the objective function within the feasible region, and the optimal solution.
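The reformulation can also be checked numerically outside the Wolfram Language. This Python sketch uses a hypothetical pair of component functions and brute force on a grid: minimizing max(f1, f2) directly, and through the smooth reformulation where the smallest feasible z at each x is max(f1(x), f2(x)), give the same optimal value.

```python
# Two smooth component functions; the objective is their pointwise maximum.
f1 = lambda x: (x - 1) ** 2
f2 = lambda x: (x + 1) ** 2

grid = [i / 1000.0 for i in range(-3000, 3001)]

# Direct: minimize the (nonsmooth) function max(f1, f2) over the grid.
direct = min(max(f1(x), f2(x)) for x in grid)

# Reformulated: minimize z subject to f1(x) <= z and f2(x) <= z.
# For each fixed x the smallest feasible z is max(f1(x), f2(x)), so
# minimizing over (x, z) reproduces the same optimal value.
reform = float("inf")
for x in grid:
    z = max(f1(x), f2(x))   # smallest z feasible at this x
    if z < reform:
        reform = z
```

Here the optimum is at x = 0, where both component functions equal 1.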

### Multiobjective Optimization: Goal Programming

Multiobjective programming involves minimizing several objective functions, subject to constraints. Since a solution that minimizes one function often does not minimize the others at the same time, there is usually no unique optimal solution.

Sometimes the decision maker has in mind a goal for each objective. In that case the so-called goal programming technique can be applied.

There are a number of variants of how to model a goal-programming problem. One variant is to order the objective functions based on priority, and seek to minimize the deviation of the most important objective function from its goal first, before attempting to minimize the deviations of the less important objective functions from their goals. This is called lexicographic or preemptive goal programming.

In the second variant, the weighted sum of the deviations is minimized. Specifically, the following constrained minimization problem is to be solved:

$$\min_x \sum_i w_i \left(F_i(x) - g_i\right)^+ \quad \text{subject to the constraints,}$$

where $F_i$ are the objective functions and $g_i$ the corresponding goals. Here $t^+ = \max(t, 0)$ stands for the positive part of the real number $t$. The weights $w_i$ reflect the relative importance of the objectives, and normalize the deviations to take into account the relative scales and units. Possible values for the weights are the inverses of the goals to be attained. The previous problem can be reformulated into one that is easier to solve:

$$\min_{x,\,d} \sum_i w_i d_i \quad \text{subject to} \quad F_i(x) - g_i \le d_i, \quad d_i \ge 0.$$

The third variant, Chebyshev goal programming, minimizes the maximum deviation rather than the sum of the deviations. This balances the deviations of the different objective functions. Specifically, the following constrained minimization problem is to be solved:

$$\min_x \max_i\, w_i \left(F_i(x) - g_i\right)^+.$$

This can be reformulated as

$$\min_{x,\,z} z \quad \text{subject to} \quad w_i \left(F_i(x) - g_i\right) \le z, \quad z \ge 0.$$

This defines a function that solves the goal programming model by minimizing the weighted sum of the deviations.

This defines a function that solves the goal programming model by minimizing the maximum deviation.

This solves a goal programming problem with two objective functions and one constraint using the weighted-sum formulation with unit weights, resulting in deviations from the goals of 13.12 and 33.28, thus a total deviation of 46.40, and a maximal deviation of 33.28.

This solves the same problem using the Chebyshev formulation with unit weights, resulting in deviations from the goals of 16 and 32, thus a maximal deviation of 32, but a total deviation of 48.

This shows the contours of the first (blue) and second (red) objective functions, the feasible region (the black line), and the optimal solutions found by the weighted-sum formulation (yellow point) and by the Chebyshev formulation (green point).
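The two formulations can be compared on a small example. The Python below is an illustrative sketch with hypothetical objectives F1 and F2, goals of zero, and unit weights (none of this comes from the document's example); each variant is solved by brute force on a grid.

```python
def pos(t):
    """Positive part t+ = max(t, 0), used for goal deviations."""
    return t if t > 0 else 0.0

# Hypothetical objectives and goals, chosen so the two variants differ.
F1 = lambda x: x ** 2
F2 = lambda x: 2 * (x - 2) ** 2
g1 = g2 = 0.0
w1 = w2 = 1.0   # unit weights

grid = [i / 1000.0 for i in range(-1000, 3001)]

# Variant 2: minimize the weighted sum of the deviations.
x_sum = min(grid, key=lambda x: w1 * pos(F1(x) - g1) + w2 * pos(F2(x) - g2))

# Variant 3 (Chebyshev): minimize the maximum weighted deviation.
x_cheb = min(grid, key=lambda x: max(w1 * pos(F1(x) - g1),
                                     w2 * pos(F2(x) - g2)))
```

The two variants trade off differently: the weighted sum lands near x = 4/3, while the Chebyshev solution lands near x = 4 - 2*sqrt(2), where the two deviations are balanced.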

### An Application Example: Portfolio Optimization

A powerful tool in managing investments is to spread the risk by investing in assets that have little or no correlation. For example, if asset A goes up 20% one year and is down 10% the next, asset B goes down 10% one year and is up 20% the next, and up years for A are down years for B, then holding both in equal amounts would result in a 5% increase every year, without any risk. In reality such assets are rarely available, but the concept remains a useful one.

In this example, the aim is to find the optimal asset allocation so as to minimize the risk, and achieve a preset level of return by investing in a spread of stocks, bonds, and gold.

Here are the historical returns of various assets between 1973 and 1994. For example, in 1973, the S&P 500 lost 14.8%, while gold appreciated by 67.7%.

| Year | 3m Tbill | long Tbond | SP500 | Wilt.5000 | Corp. Bond | NASDQ | EAFE | Gold |
|------|----------|------------|-------|-----------|------------|-------|------|------|
| 1973 | 1.075 | 0.942 | 0.852 | 0.815 | 0.698 | 1.023 | 0.851 | 1.677 |
| 1974 | 1.084 | 1.020 | 0.735 | 0.716 | 0.662 | 1.002 | 0.768 | 1.722 |
| 1975 | 1.061 | 1.056 | 1.371 | 1.385 | 1.318 | 0.123 | 1.354 | 0.760 |
| 1976 | 1.052 | 1.175 | 1.236 | 1.266 | 1.280 | 1.156 | 1.025 | 0.960 |
| 1977 | 1.055 | 1.002 | 0.926 | 0.974 | 1.093 | 1.030 | 1.181 | 1.200 |
| 1978 | 1.077 | 0.982 | 1.064 | 1.093 | 1.146 | 1.012 | 1.326 | 1.295 |
| 1979 | 1.109 | 0.978 | 1.184 | 1.256 | 1.307 | 1.023 | 1.048 | 2.212 |
| 1980 | 1.127 | 0.947 | 1.323 | 1.337 | 1.367 | 1.031 | 1.226 | 1.296 |
| 1981 | 1.156 | 1.003 | 0.949 | 0.963 | 0.990 | 1.073 | 0.977 | 0.688 |
| 1982 | 1.117 | 1.465 | 1.215 | 1.187 | 1.213 | 1.311 | 0.981 | 1.084 |
| 1983 | 1.092 | 0.985 | 1.224 | 1.235 | 1.217 | 1.080 | 1.237 | 0.872 |
| 1984 | 1.103 | 1.159 | 1.061 | 1.030 | 0.903 | 1.150 | 1.074 | 0.825 |
| 1985 | 1.080 | 1.366 | 1.316 | 1.326 | 1.333 | 1.213 | 1.562 | 1.006 |
| 1986 | 1.063 | 1.309 | 1.186 | 1.161 | 1.086 | 1.156 | 1.694 | 1.216 |
| 1987 | 1.061 | 0.925 | 1.052 | 1.023 | 0.959 | 1.023 | 1.246 | 1.244 |
| 1988 | 1.071 | 1.086 | 1.165 | 1.179 | 1.165 | 1.076 | 1.283 | 0.861 |
| 1989 | 1.087 | 1.212 | 1.316 | 1.292 | 1.204 | 1.142 | 1.105 | 0.977 |
| 1990 | 1.080 | 1.054 | 0.968 | 0.938 | 0.830 | 1.083 | 0.766 | 0.922 |
| 1991 | 1.057 | 1.193 | 1.304 | 1.342 | 1.594 | 1.161 | 1.121 | 0.958 |
| 1992 | 1.036 | 1.079 | 1.076 | 1.090 | 1.174 | 1.076 | 0.878 | 0.926 |
| 1993 | 1.031 | 1.217 | 1.100 | 1.113 | 1.162 | 1.110 | 1.326 | 1.146 |
| 1994 | 1.045 | 0.889 | 1.012 | 0.999 | 0.968 | 0.965 | 1.078 | 0.990 |
| average | 1.078 | 1.093 | 1.120 | 1.124 | 1.121 | 1.046 | 1.141 | 1.130 |

This is the annual return data.

Here are the expected returns over this 22-year period for the eight assets.


Here is the covariance matrix, which measures how the assets correlate with each other.

This finds the optimal asset allocation by minimizing the standard deviation of an allocation, subject to the constraints that the total allocation is 100% (Total[vars]==1), that the expected return is at least 12%, and that the variables are non-negative, so each asset is allocated a non-negative percentage (thus no shorting). The resulting optimal asset allocation suggests 15.5% in 3-month Treasury bills, 20.3% in gold, and the rest in stocks, with a resulting standard deviation of 0.0126.


This trades some return for smaller volatility by asking for an expected return of 10%. Now there is 55.5% in 3-month Treasury bills, 10.3% in gold, and the rest in stocks.

### Limitations of the Interior Point Method

The implementation of the interior point method in FindMinimum requires first and second derivatives of the objective and constraints. Symbolic derivatives are attempted first; if they fail, finite differences are used to calculate the derivatives. If the function or constraints are not smooth, particularly if the first derivative is discontinuous at the optimal point, the interior point method may have difficulty converging.

This shows that the interior point method has difficulty minimizing this nonsmooth function.

This is somewhat similar to the difficulty experienced by an unconstrained Newton's method.

## Numerical Algorithms for Constrained Local Optimization

### The Interior Point Algorithm

The interior point algorithm solves a constrained optimization problem by combining the constraints and the objective function through the use of a barrier function. Specifically, the general constrained optimization problem is first converted to the standard form

$$\min_x f(x) \quad \text{subject to} \quad h(x) = 0, \quad x \ge 0.$$

The non-negativity constraints are then replaced by adding a barrier term to the objective function,

$$\min_x f(x) - \mu \sum_{i=1}^{n} \log x_i \quad \text{subject to} \quad h(x) = 0,$$

where $\mu > 0$ is a barrier parameter.

The necessary KKT condition (assuming, for example, that the gradients of the constraints $h$ are linearly independent) is

$$\nabla f(x) - \mu X^{-1} e - A(x)^T y = 0, \qquad h(x) = 0,$$

where $A(x)$ is the Jacobian of $h$ and the multiplier $y$ is of dimension $m \times 1$. Here $X = \mathrm{diag}(x_1, \ldots, x_n)$ is the diagonal matrix with diagonal elements $x_i$, and $e = (1, \ldots, 1)^T$. Introducing the dual variables $z = \mu X^{-1} e$, the condition becomes

$$\nabla f(x) - A(x)^T y - z = 0, \qquad h(x) = 0, \qquad X Z e = \mu e. \tag{1}$$

This nonlinear system can be solved with Newton's method. Let $W = \nabla^2 f(x) - \sum_j y_j \nabla^2 h_j(x)$ denote the Hessian of the Lagrangian and $A = A(x)$; the Jacobi matrix of the system (1) is

$$J = \begin{pmatrix} W & -A^T & -I \\ A & 0 & 0 \\ Z & 0 & X \end{pmatrix}.$$

Eliminating $\Delta z$ using the last block row, $\Delta z = X^{-1}(\mu e - X Z e) - X^{-1} Z\, \Delta x$, the step $(\Delta x, \Delta y)$ satisfies the reduced system

$$\begin{pmatrix} W + X^{-1} Z & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} = -\begin{pmatrix} \nabla f(x) - A^T y - \mu X^{-1} e \\ h(x) \end{pmatrix}. \tag{2}$$

Thus the nonlinear constrained problem can be solved iteratively by

$$(x_{k+1}, y_{k+1}, z_{k+1}) = (x_k, y_k, z_k) + \alpha_k (\Delta x, \Delta y, \Delta z),$$

with the search direction given by solving the Jacobi system (2).
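The barrier idea can be illustrated on a one-variable problem. The following Python is a minimal log-barrier sketch, not FindMinimum's implementation: minimize x^2 subject to x >= 1, whose solution is x = 1. Each outer iteration minimizes the barrier function by Newton's method, then shrinks the barrier parameter.

```python
def barrier_minimize(mu0=1.0, shrink=0.2, x=2.0, outer=12, newton=20):
    """Minimal log-barrier method for: minimize x^2 subject to x >= 1
    (a sketch of the barrier idea only, not FindMinimum's algorithm).
    For each barrier parameter mu, Newton's method minimizes
        phi(x) = x^2 - mu * log(x - 1),
    then mu is shrunk, driving x toward the constrained optimum x = 1."""
    mu = mu0
    for _ in range(outer):
        for _ in range(newton):
            g = 2 * x - mu / (x - 1)        # phi'(x)
            h = 2 + mu / (x - 1) ** 2       # phi''(x), always positive here
            step = g / h
            # Damp the step so the iterate stays strictly feasible (x > 1).
            while x - step <= 1:
                step *= 0.5
            x -= step
        mu *= shrink
    return x

x_opt = barrier_minimize()
```

The barrier keeps every iterate strictly inside the feasible region; as mu decreases, the unconstrained minimizer of phi approaches the constrained solution.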

To ensure convergence, some measure of progress is needed. One way of doing this is to use a merit function, such as the following augmented Lagrangian merit function.

#### Augmented Lagrangian Merit Function

This defines an augmented Lagrangian merit function,

$$\phi(x, y) = f(x) - \mu \sum_{i=1}^{n} \log x_i - y^T h(x) + \frac{\rho}{2}\, h(x)^T h(x). \tag{3}$$

Here $\mu$ is the barrier parameter and $\rho$ a penalty parameter. It can be proved that if the matrix $W + X^{-1} Z$ is positive definite, then either the search direction given by (2) is a descent direction for the merit function (3), or the current point satisfies the KKT condition (1). A line search is performed along the search direction, with the initial step length chosen to be as close to 1 as possible while maintaining the positivity constraints. A backtracking procedure is then used until the Armijo condition is satisfied on the merit function,

$$\phi(x + \alpha\, \Delta x,\ y + \alpha\, \Delta y) \le \phi(x, y) + c\, \alpha\, \nabla \phi^T (\Delta x, \Delta y),$$

with $0 < c < 1$.
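Backtracking until the Armijo condition holds can be sketched in a few lines. The Python below is a one-dimensional illustration with hypothetical inputs, not the merit-function search used by FindMinimum:

```python
def backtracking(f, grad_f, x, d, c=1e-4, alpha=1.0, shrink=0.5, max_halve=50):
    """Backtracking line search: shrink alpha until the Armijo condition
        f(x + alpha*d) <= f(x) + c * alpha * grad_f(x) * d
    holds.  One-dimensional sketch; d must be a descent direction."""
    fx = f(x)
    slope = grad_f(x) * d   # directional derivative at x along d
    for _ in range(max_halve):
        if f(x + alpha * d) <= fx + c * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha

# Example: f(x) = x^4 at x = 1 with descent direction d = -1.
alpha = backtracking(lambda x: x ** 4, lambda x: 4 * x ** 3, x=1.0, d=-1.0)
```

With d = -1 the full step alpha = 1 already satisfies the condition; a more aggressive direction such as d = -3 overshoots the minimum and forces the search to halve alpha.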

#### Convergence Tolerance

The convergence criterion for the interior point algorithm is that the residual of the KKT system (1) fall below a tolerance $\epsilon$, with $\epsilon$ set, by default, to 10^{-MachinePrecision/3}.