Numerical Nonlinear Local Optimization
Introduction
Numerical algorithms for constrained nonlinear optimization can be broadly categorized into gradient-based methods and direct search methods. Gradient-based methods use first-derivative (gradient) or second-derivative (Hessian) information. Examples are the sequential quadratic programming (SQP) method, the augmented Lagrangian method, and the (nonlinear) interior point method. Direct search methods do not use derivative information. Examples are Nelder-Mead, genetic algorithm and differential evolution, and simulated annealing. Direct search methods tend to converge more slowly, but can be more tolerant of noise in the function and constraints.
Typically, algorithms build up only a local model of the problem. Furthermore, to ensure convergence of the iterative process, many such algorithms insist on a certain decrease of the objective function or of a merit function that is a combination of the objective and constraints. Such algorithms will, if convergent, find only a local optimum, and are called local optimization algorithms. In Mathematica local optimization problems can be solved using FindMinimum.
Global optimization algorithms, on the other hand, attempt to find the global optimum, typically by allowing decrease as well as increase of the objective/merit function. Such algorithms are usually computationally more expensive. Global optimization problems can be solved exactly using
Minimize or numerically using
NMinimize.
This solves the nonlinear programming problem

    min  x y
    s.t. 3 x^{2} + 2 x y − y^{2} ≥ 1

using Minimize, which gives an exact solution.
This solves the same problem numerically. NMinimize returns a machine-number solution.
FindMinimum numerically finds a local minimum. In this example the local minimum found is also a global minimum.
The FindMinimum Function
FindMinimum solves local unconstrained and constrained optimization problems. This document only covers the constrained optimization case. See
"Unconstrained Optimization" for details of
FindMinimum for unconstrained optimization.
This provides FindMinimum with a starting value of 2 for x, but uses the default starting point for y.
The previous solution point is actually a local minimum.
FindMinimum only attempts to find a local minimum.
This contour plot of the feasible region illustrates the local and global minima.
This is a 3D visualization of the function in its feasible region.
Options for FindMinimum
FindMinimum accepts these options.
The
Method option specifies the method to use to solve the optimization problem. Currently, the only method available for constrained optimization is the
interior point algorithm.
This specifies that the interior point method should be used.
MaxIterations specifies the maximum number of iterations that should be used. When machine precision is used for constrained optimization, the default MaxIterations -> 500 is used.
When StepMonitor is specified, it is evaluated once at every iterative step of the interior point algorithm. EvaluationMonitor, on the other hand, is evaluated every time the objective function or an equality or inequality constraint is evaluated.
This demonstrates that 19 iterations are not sufficient to solve the following problem to the default tolerance. It collects points visited through the use of StepMonitor.
The points visited are shown using ContourPlot. The starting point is shown in blue, the rest in yellow.
WorkingPrecision -> prec specifies that all the calculations in FindMinimum are carried out at precision prec. By default, prec = MachinePrecision. If prec > MachinePrecision, a fixed precision of prec is used throughout the computation.
The AccuracyGoal and PrecisionGoal options are used in the following way. By default, AccuracyGoal -> Automatic, which is set to prec/3, and PrecisionGoal -> Automatic, which is set to Infinity. AccuracyGoal -> ga is the same as AccuracyGoal -> {Infinity, ga}. Suppose AccuracyGoal -> {a, ga} and PrecisionGoal -> p; then FindMinimum attempts to drive the residual, which is a combination of the feasibility and of the satisfaction of the Karush-Kuhn-Tucker (KKT) and complementarity conditions, to be less than or equal to tol = 10^{-ga}. In addition, it requires the difference between the current and the next iterative point, x and x^{+}, to satisfy ||x^{+} − x|| ≤ 10^{-a} + 10^{-p} ||x|| before terminating.
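For example, raising the working precision and the residual tolerance together might look like this (the objective is assumed to be the one from the earlier examples):

```wolfram
(* All arithmetic at 20 digits; residual driven to at most 10^-15 *)
FindMinimum[{x y, 3 x^2 + 2 x y - y^2 >= 1}, {x, y},
  WorkingPrecision -> 20, AccuracyGoal -> 15]
```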
Examples of FindMinimum
Finding a Global Minimum
If a global minimum is needed,
NMinimize or
Minimize should be used. However, since each run of FindMinimum tends to be faster than NMinimize or Minimize, it may sometimes be more efficient to use FindMinimum, particularly for relatively large problems with only a few local minima. FindMinimum attempts to find a local minimum, and therefore terminates as soon as a local minimum is found.
This shows a function with multiple minima within the feasible region −10 ≤ x ≤ 10.
If the user has some knowledge of the problem, a better starting point can be given to FindMinimum.
Alternatively, the user can tighten the constraints.
Finally, multiple starting points can be used and the best resulting minimum selected.
Multiple starting points can also be tried more systematically via NMinimize, using the "RandomSearch" method with the interior point method as the post-processor.
Solving Minimax Problems
The minimax (also known as min-max) problem is that of finding the minimum value of a function defined as the maximum of several functions, that is,

    min_{x} max_{i} f_{i}(x).

While this problem can often be solved using general constrained optimization techniques, it is more reliably solved by reformulating it into a problem with a smooth objective function. Specifically, the minimax problem can be converted into

    min_{x,z} z   s.t.  f_{i}(x) ≤ z,  i = 1, …, k,

and solved using either FindMinimum or NMinimize.
This defines a function FindMinMax. 
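The original definition is not preserved in this extraction; a minimal sketch of such a function, introducing a bound variable z as in the reformulation above (the name and argument pattern are taken from the text, the implementation is an assumption):

```wolfram
(* fns: list of component functions; cons: list of constraints
   ({} if none); vars: variables. Minimizes the max of fns by
   bounding every component from above by the slack variable z. *)
FindMinMax[{fns_List, cons_List}, vars_List] :=
 Module[{z},
  FindMinimum[{z, Join[Thread[fns <= z], cons]},
   Append[vars, z]]]
```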
This solves an unconstrained minimax problem with one variable.
This solves an unconstrained minimax problem with two variables.
This shows the contour of the objective function, and the optimal solution.
This solves a constrained minimax problem.
This shows the contour of the objective function within the feasible region, and the optimal solution.
Multiobjective Optimization: Goal Programming
Multiobjective programming involves minimizing several objective functions, subject to constraints. Since a solution that minimizes one function often does not minimize the others at the same time, there is usually no unique optimal solution.
Sometimes the decision maker has in mind a goal for each objective. In that case the so-called goal programming technique can be applied.
There are a number of variants of how to model a goal programming problem. One variant is to order the objective functions by priority, and seek to minimize the deviation of the most important objective function from its goal first, before attempting to minimize the deviations of the less important objective functions from their goals. This is called lexicographic or preemptive goal programming.
In the second variant, the weighted sum of the deviations is minimized. Specifically, the following constrained minimization problem is to be solved.
Here a^{+} stands for the positive part of the real number a. The weights w_{i} reflect the relative importance and normalize the deviations to account for the relative scales and units of the objectives. Possible values for the weights are the inverses of the goals to be attained. The previous problem can be reformulated into one that is easier to solve.
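The display equations for this variant are not preserved in this extraction; in standard notation, with goals g_{i} and the feasible set given by the constraints, the weighted formulation and its smooth reformulation read:

```latex
\min_x \;\sum_{i=1}^{k} w_i \,\bigl(f_i(x)-g_i\bigr)^{+}
\qquad\Longleftrightarrow\qquad
\min_{x,\,y}\;\sum_{i=1}^{k} w_i\, y_i
\quad\text{s.t.}\quad f_i(x)-g_i \le y_i,\;\; y_i \ge 0 .
```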
The third variant, Chebyshev goal programming, minimizes the maximum deviation, rather than the sum of the deviations. This balances the deviation of different objective functions. Specifically, the following constrained minimization problem is to be solved.
This can be reformulated as
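In the same notation, the Chebyshev variant and its reformulation (a standard reconstruction, since the original display equations are missing):

```latex
\min_x \;\max_{1\le i\le k} w_i\,\bigl(f_i(x)-g_i\bigr)^{+}
\qquad\Longleftrightarrow\qquad
\min_{x,\,z}\; z
\quad\text{s.t.}\quad w_i\,\bigl(f_i(x)-g_i\bigr) \le z,\;\; z \ge 0 .
```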
This defines a function GoalProgrammingWeightedAverage that solves the goal programming model by minimizing the weighted sum of the deviations.
This defines a function GoalProgrammingChebyshev that solves the goal programming model by minimizing the maximum deviation. 
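The two definitions are not preserved in this extraction. Minimal sketches follow; the function names are taken from the text, while the argument patterns and implementation details are assumptions:

```wolfram
(* fns: objective functions, goals: their goals, w: weights,
   cons: list of constraints, vars: variables. *)
GoalProgrammingWeightedAverage[fns_List, goals_List, w_List,
  cons_List, vars_List] :=
 Module[{y = Table[Unique["dev"], {Length[fns]}]},
  (* minimize the weighted sum of the positive deviations y_i *)
  FindMinimum[{w . y,
    Join[Thread[fns - goals <= y], Thread[y >= 0], cons]},
   Join[vars, y]]]

GoalProgrammingChebyshev[fns_List, goals_List, w_List,
  cons_List, vars_List] :=
 Module[{z},
  (* minimize the maximum weighted deviation z *)
  FindMinimum[{z,
    Join[Thread[w (fns - goals) <= z], {z >= 0}, cons]},
   Append[vars, z]]]
```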
This solves a goal programming problem with two objective functions and one constraint using GoalProgrammingWeightedAverage with unit weighting, resulting in deviations from the goals of 3.72 and 33.28, thus a total deviation of 37, and a maximal deviation of 33.28.
This solves a goal programming problem with two objective functions and one constraint using GoalProgrammingChebyshev with unit weighting, resulting in deviations from the goals of 6 and 32, thus a maximal deviation of 32, but a total deviation of 38.
This shows the contours for the first (blue) and second (red) objective functions, the feasible region (the black line), and the optimal solution found by GoalProgrammingWeightedAverage (yellow point) and by GoalProgrammingChebyshev (green point). 
An Application Example: Portfolio Optimization
A powerful tool in managing investments is to spread the risk by investing in assets that have little or no correlation. For example, suppose asset A goes up 20% one year and is down 10% the next, asset B goes down 10% one year and is up 20% the next, and up years for A are down years for B. Then holding both in equal amounts would return (20% − 10%)/2 = 5% every year, without any risk. In reality such assets are rarely available, but the concept remains a useful one.
In this example, the aim is to find the optimal asset allocation so as to minimize the risk, and achieve a preset level of return, by investing in a spread of stocks, bonds, and gold.
Here are the historical returns of various assets between 1973 and 1994. For example, in 1973 the S&P 500 lost 1 − 0.852 = 14.8%, while gold appreciated by 67.7%.
Year      3m T-bill  Long T-bond  S&P 500  Wilt. 5000  Corp. Bond  NASDAQ  EAFE   Gold
1973      1.075      0.942        0.852    0.815       0.698       1.023   0.851  1.677
1974      1.084      1.020        0.735    0.716       0.662       1.002   0.768  1.722
1975      1.061      1.056        1.371    1.385       1.318       0.123   1.354  0.760
1976      1.052      1.175        1.236    1.266       1.280       1.156   1.025  0.960
1977      1.055      1.002        0.926    0.974       1.093       1.030   1.181  1.200
1978      1.077      0.982        1.064    1.093       1.146       1.012   1.326  1.295
1979      1.109      0.978        1.184    1.256       1.307       1.023   1.048  2.212
1980      1.127      0.947        1.323    1.337       1.367       1.031   1.226  1.296
1981      1.156      1.003        0.949    0.963       0.990       1.073   0.977  0.688
1982      1.117      1.465        1.215    1.187       1.213       1.311   0.981  1.084
1983      1.092      0.985        1.224    1.235       1.217       1.080   1.237  0.872
1984      1.103      1.159        1.061    1.030       0.903       1.150   1.074  0.825
1985      1.080      1.366        1.316    1.326       1.333       1.213   1.562  1.006
1986      1.063      1.309        1.186    1.161       1.086       1.156   1.694  1.216
1987      1.061      0.925        1.052    1.023       0.959       1.023   1.246  1.244
1988      1.071      1.086        1.165    1.179       1.165       1.076   1.283  0.861
1989      1.087      1.212        1.316    1.292       1.204       1.142   1.105  0.977
1990      1.080      1.054        0.968    0.938       0.830       1.083   0.766  0.922
1991      1.057      1.193        1.304    1.342       1.594       1.161   1.121  0.958
1992      1.036      1.079        1.076    1.090       1.174       1.076   0.878  0.926
1993      1.031      1.217        1.100    1.113       1.162       1.110   1.326  1.146
1994      1.045      0.889        1.012    0.999       0.968       0.965   1.078  0.990
Average   1.078      1.093        1.120    1.124       1.121       1.046   1.141  1.130
This is the annual return data. 
Here are the expected returns over this 22-year period for the eight assets.
Here is the covariance matrix, which measures how the assets correlate with each other.
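The computations themselves are not shown here; assuming the returns table is stored as the matrix data (rows are years, columns are assets), they might look like:

```wolfram
ER = Mean[data];        (* expected (mean) return of each asset *)
CV = Covariance[data];  (* 8x8 covariance matrix of the assets  *)
```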
This finds the optimal asset allocation by minimizing the standard deviation of an allocation, subject to the constraints that the total allocation is 100% (Total[vars] == 1), that the expected return is at least 12% (vars.ER ≥ 1.12), and that the variables are nonnegative, so each asset is allocated a nonnegative percentage (thus no shorting). The resulting optimal asset allocation suggests 15.5% in 3-month treasury bills, 20.3% in gold, and the rest in stocks, with a resulting standard deviation of 0.0126.
This trades some return for smaller volatility by asking for an expected return of only 10%. Now 55.5% goes into 3-month treasury bills, 10.3% into gold, and the rest into stocks.
Limitations of the Interior Point Method
The implementation of the interior point method in
FindMinimum requires first and second derivatives of the objective and constraints. Symbolic derivatives are attempted first; if that fails, finite differences are used to calculate the derivatives. If the function or constraints are not smooth, particularly if the first derivative is discontinuous at the optimal point, the interior point method may have difficulty converging.
This shows that the interior point method has difficulty in minimizing this nonsmooth function.
This is somewhat similar to the difficulty experienced by an unconstrained Newton's method.
Numerical Algorithms for Constrained Local Optimization
The Interior Point Algorithm
The interior point algorithm solves a constrained optimization problem by combining the constraints and the objective function through the use of a barrier function. Specifically, the general constrained optimization problem is first converted to the standard form

    min f(x)   s.t.  h(x) = 0,   x_{i} ≥ 0 for i ∈ I.

The nonnegativity constraints are then replaced by adding a barrier term to the objective function,

    min f(x) − μ Σ_{i∈I} ln(x_{i})   s.t.  h(x) = 0,

where μ > 0 is a barrier parameter.
The necessary KKT condition (assuming, for example, that the gradients of h are linearly independent) is

    ∇f(x) − μ Σ_{i∈I} ∇ln(x_{i}) − A(x)^{T} y = 0,   h(x) = 0,

where A(x) = (∇h_{1}(x), ∇h_{2}(x), …, ∇h_{m}(x))^{T} is of dimension m×n. Or

    ∇f(x) − μ X^{-1} e − A(x)^{T} y = 0,   h(x) = 0.

Here X is a diagonal matrix whose ith diagonal element is x_{i} if i ∈ I, and 1 otherwise, and e is the vector whose ith element is 1 if i ∈ I, and 0 otherwise. Introducing the dual variables z = μ X^{-1} e, the conditions become

    ∇f(x) − A(x)^{T} y − z = 0,
    h(x) = 0,
    X z = μ e.    (1)
This nonlinear system can be solved with Newton's method. Let L(x, y) = f(x) − h(x)^{T} y and H(x, y) = ∇^{2}L(x, y) = ∇^{2}f(x) − Σ^{m}_{i=1} y_{i} ∇^{2}h_{i}(x); the Jacobian system for the above system (1) is

    ( H(x, y)   −A(x)^{T}   −I ) ( Δx )     ( d         )
    ( A(x)       0           0 ) ( Δy ) = − ( h(x)      )
    ( Z          0           X ) ( Δz )     ( X z − μ e )

with d = ∇f(x) − A(x)^{T} y − z. Eliminating Δz via

    Δz = X^{-1}(d_{xz} − Z Δx),   d_{xz} = μ e − X z,

then

    (H(x, y) + X^{-1}Z) Δx − A(x)^{T} Δy = −d + X^{-1} d_{xz}.    (2)
Thus the nonlinear constrained problem can be solved iteratively by

    {x^{+}, y^{+}, z^{+}} = {x, y, z} + t {Δx, Δy, Δz},

with the search direction {Δx, Δy, Δz} given by solving the previous Jacobian system (2), and t a step length.
To ensure convergence, you need some measure of progress. One way of doing this is to use a merit function, such as the following augmented Lagrangian merit function.
Augmented Lagrangian Merit Function
This defines an augmented Lagrangian merit function,

    φ(x, ρ) = f(x) − μ Σ_{i∈I} ln(x_{i}) − h(x)^{T} y + (ρ/2) h(x)^{T} h(x).

Here μ > 0 is the barrier parameter and ρ > 0 a penalty parameter. It can be proved that if the matrix N(x, y) = H(x, y) + X^{-1}Z is positive definite, then either the search direction given by (2) is a descent direction for the above merit function, or {x, y, z} satisfies the KKT condition (1). A line search is performed along the search direction, with the initial step length chosen to be as close to 1 as possible while maintaining the positivity constraints. A backtracking procedure is then used until the Armijo condition is satisfied on the merit function,

    φ(x + t Δx, ρ) ≤ φ(x, ρ) + σ t ∇φ(x, ρ)^{T} Δx,   with σ ∈ (0, 1/2].
Convergence Tolerance
The convergence criterion for the interior point algorithm is that the residual of the KKT system (1), which combines feasibility and complementarity, be driven below a tolerance,

    residual ≤ tol,

with tol set, by default, to 10^{-MachinePrecision/3}.