Numerical Examples

The functions accessible with Wolfram LibraryLink make it possible to optimize numerical computations while still keeping the flexibility and generality of the Wolfram Language. If you have a large existing numerical code, both the Wolfram Symbolic Transfer Protocol (WSTP) and LibraryLink provide good ways of interfacing the code to be driven from the Wolfram Language. On the other hand, if you are developing a numerical computation, you can prototype it with the Wolfram Language, and then if some parts prove to be bottlenecks, you can use LibraryFunction to speed up those parts. LibraryFunction also interfaces directly with the Wolfram Language's numerical computation functions so you can sometimes speed up computations by writing code that will be evaluated at different sample points. The focus of this tutorial is on the latter two uses of LibraryLink.

The source for the examples shown in this tutorial is found in the documentation paclet.

demo_numerical.csource code for numerical examples compiled into the shared library demo_numerical

You can find this by evaluating the following input.

In[8]:=
Click for copyable input
Out[8]=
This shows the file source.

Basic Quadratic Function Example

Following is a very simple example of a function that just computes .

DLLEXPORT int parabola(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res)
{
    mreal x, a, f;
    if (Argc != 2) return LIBRARY_FUNCTION_ERROR;
    x = MArgument_getReal(Args[0]);
    a = MArgument_getReal(Args[1]);
    f = x*x - a;
    MArgument_setReal(Res, f);
    return LIBRARY_NO_ERROR;
}
This loads the function.
In[15]:=
Click for copyable input
Out[15]=

Once the function is loaded, it can be used just as you would an ordinary function; it can be plotted, sampled, and so on.

This makes a surface plot of the function.
In[24]:=
Click for copyable input
Out[24]=
This makes a plot of the root as a function of .
In[27]:=
Click for copyable input
Out[27]=

For such a simple function, there is no real advantage to using LibraryFunction because, compared to the cost of evaluating the function, the overhead of getting results to and from the Wolfram Language is large.

Mandelbrot Set

On the other hand, if the function is more expensive to evaluate, then there can be a significant advantage. An example of this is a function that can be used to make an image of the Mandelbrot set.

The function mandelbrot shown below computes the number of iterates of starting at to go outside of TemplateBox[{{z, _, n}}, Abs]=4, in which case the sequence is unbounded. If the iteration continues until is equal to mandelbrot_max_iterations, then the point is assumed to be in the set and 0 is returned.

static mint mandelbrot_max_iterations = 1000;

DLLEXPORT int mandelbrot(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res)
{
    mint n = 0;
    mcomplex z = {0.,0.};
    mcomplex c = MArgument_getComplex(Args[0]);
    mreal rr = mcreal(z)*mcreal(z);
    mreal ii = mcimag(z)*mcimag(z);
    while ((n < mandelbrot_max_iterations) && (rr + ii < 4)) {
        mcimag(z) = 2.*mcreal(z)*mcimag(z) + mcimag(c);
        mcreal(z) = rr - ii + mcreal(c);
        rr = mcreal(z)*mcreal(z);
        ii = mcimag(z)*mcimag(z);
        iteration++;
    }

    if (n == mandelbrot_max_iterations)
        n = 0;

    MArgument_setInteger(Res, n);
    return LIBRARY_NO_ERROR;
}

Note that since C does not have built-in complex arithmetic, the mathematical operations are implemented using real arithmetic.

Load the function.
In[1]:=
Click for copyable input
Out[1]=

The function can now be used like any function in the Wolfram Language that evaluates for points in the complex plane.

Make a surface plot of the function.
In[2]:=
Click for copyable input
Out[2]=

This is perhaps not the best way to visualize the set because the function is so discontinuous. A more common and much faster thing to do is to make an image by sampling the function on a regular grid of points.

Sample the function on a regular grid of points.
In[3]:=
Click for copyable input
Out[3]=
This makes a binary image.
In[4]:=
Click for copyable input
Out[4]=

A color image can be made by making a map from the function values to color channels and using Raster. Image could be used, but for an interactive use like the one shown below, Raster is much faster to display.

This makes a color image.
In[5]:=
Click for copyable input
Out[6]=

A random color map highlights the differences between escape times.

In[7]:=
Click for copyable input
Out[8]=

Often the goal for something like this is to get the data for the image as quickly as possible. In the above examples, the main part of the calculation done by the C function is done very quickly, but the rest of the computation can be sped up by using the Wolfram Language compiler. There is a tight integration in the Wolfram Language compiler with LibraryFunction, so this can be made very efficient.

This makes a CompiledFunction that will give the color image.
In[9]:=
Click for copyable input

The reason that With is used here is because Compile does not evaluate its arguments, and if and had just been used, they would be considered as symbols that need to be resolved at runtime by external evaluation. By making them explicit, the compiler uses the most efficient possible evaluation.

The Listable attribute means that it can be evaluated for several complex points given in a list.

This evaluates the CompiledFunction for several points on the imaginary axis.
In[10]:=
Click for copyable input
Out[10]=

This means that to get the color channel data all you have to do is give this function all of the sample points required. A fast way to do that is to write a compiled function.

This defines a CompiledFunction that will give sample points on a regular grid in the complex plane. The direction of is reversed so that moving up in the image corresponds to increasing .
In[11]:=
Click for copyable input
This defines a function that gives an × image of the Mandelbrot set on the rectangle in the plane from to .
In[12]:=
Click for copyable input
This gives the total time taken to make a 512×512 image of the Mandelbrot set.
In[13]:=
Click for copyable input
Out[13]=

The timing for getting the whole image using these compiled functions is faster than it was for just evaluating the function at the sample points. This is because much of the time shown for evaluating the sample points above was spent in actually constructing the Table as a List instead of a ; with the compiled functions, packed arrays are used throughout.

If you have a machine with several cores, the computation can be made even faster by using parallelization.

This redefines the CompiledFunction for computing the color channels to use parallelization.
In[14]:=
Click for copyable input
This gives the total time taken to make a 512×512 image of the Mandelbrot set using parallel evaluation (the time given here was on an 8-core machine).
In[15]:=
Click for copyable input
Out[15]=

The speedup, though significant, is not linear in the number of processors. This is because some parts, like the construction of the sample points and the generation of the image, are done on one processor.

With the update this fast, it is practical to set up an interactive interface that allows you to explore some aspects of the set. The setup using EventHandler and Dynamic is slightly complicated, but the key to having it work effectively is the speed of the mimage function set up above.

In[21]:=
Click for copyable input

If you have a fairly fast machine, you should be able to do this as many as 512 points in each direction. If not, using 200 points will give an adequate picture. To run this, it is suggested to use

where n is the number of points in each direction. Note that if you zoom in really far, the images lose some clarity for two reasons: first, to get a sharp boundary may require more than the 1000 iteration limit encoded in the C code; and second, because in some cases, higher precision may be needed to truly resolve the trajectories of nearby points.

It is worth noting here that the Wolfram Language compiler can automatically generate C code that can be compiled into a shared library that can be loaded as a LibraryFunction. For this example, the generated C code is slower, but not much slower than the handwritten function.

This defines a CompiledFunction for evaluating the number of iterations for the Mandelbrot set.
In[17]:=
Click for copyable input
This combines the function into one for producing an image of the Mandelbrot set.
In[18]:=
Click for copyable input
This gives the total time taken to make a 512×512 image of the Mandelbrot set using the generated code with parallel evaluation (the time given here was on an 8-core machine).
In[20]:=
Click for copyable input
Out[20]=

So, it takes just about twice as long as the handwritten C code.

Brusselator PDE

The so-called "Brusselator" PDE problem for modeling a chemical reaction described in [HNW93] and [HW96] provides a good example for defining LibraryFunction for use in functions like NDSolve that have some component of symbolic preprocessing.

This sets up the parameters and equations for the PDE problem to be solved.
In[25]:=
Click for copyable input

The problem can be solved directly by NDSolve, and part of the point of this example is to compare solution speed with NDSolve versus a partially hand-coded discretization.

This solves the problem with NDSolve.
In[28]:=
Click for copyable input
Out[28]=
This shows a surface plot of the solution.
In[29]:=
Click for copyable input
Out[29]=

In the reference [HW96], the problem was set up using second-order differences and a discretization with points, and for large it "has all the characteristics of a 'stiff' problem". You can have NDSolve do the same discretization by forcing second-order differences and setting the number of spatial grid points using method options for the method of lines. You can also get the right-hand side of the ODEs that NDSolve effectively solves by using and extracting the function from the resulting object.

This gets the object for the discretized equations.
In[30]:=
Click for copyable input
Out[30]=
This gets the function and the initial conditions from the object.
In[31]:=
Click for copyable input

The function in demo_numerical.c is defined to give the same discretization.

Load as a library function.
In[33]:=
Click for copyable input
Out[33]=

Even though does not appear explicitly in the discretized right-hand side, the function was still defined to have it as the first argument. With this, NDSolve is able to make a very inefficient interface to it. The values given by this function are almost identical to those given by the function generated by NDSolve.

Compare the function values.
In[34]:=
Click for copyable input
Out[34]=

The small differences are simply due to small differences in the order of the arithmetic operations.

Compare the speed of evaluation.
In[35]:=
Click for copyable input
Out[36]=

So the LibraryFunction is almost 20 times faster!

When NDSolve solved the PDE, it used the function as the right-hand side for a first-order system of ODEs, . By using vector initial conditions, this can be given to NDSolve directly.

This solves the system of ODEs using the NDSolve right-hand side function. The AccuracyGoal and PrecisionGoal options are used to be consistent with what NDSolve does for PDEs.
In[37]:=
Click for copyable input
Out[37]=
This solves the system of ODEs using the LibraryFunction right-hand side function.
In[38]:=
Click for copyable input
Out[38]=

So, in spite of the LibraryFunction being nearly 20 times faster to evaluate, the solution takes three times as long! This is because of the stiffness required by the Jacobian and the fact that the Jacobian is sparse. There are two ways to handle this issue. The simplest is to give the pattern of sparseness for the Jacobian so that finite differences can be used with relatively few evaluations, and so that sparse linear algebra is used.

Often getting the pattern for sparseness can be tricky and it is important that you get all the elements that might be nonzero. Here you can take advantage of the fact that the right-hand side is independent of , and just use finite differences.

This gets the sparsity pattern of the Jacobian using finite differences for a random test vector.
In[39]:=
Click for copyable input
Out[42]=
This solves the system of ODEs using the LibraryFunction right-hand side function with the Jacobian computed sparsely using finite differences.
In[43]:=
Click for copyable input
Out[43]=

With the computation for the Jacobian under control, this is now a lot faster.

Defining the analytic Jacobian is even one stage trickier because not only do you have to get the pattern right, but you have to construct a sparse array with that pattern from a function that gives the vector of values.

This loads LibraryFunction for getting the nonzero positions and corresponding values for the Jacobian.
In[44]:=
Click for copyable input
This makes a pure function that gives the Jacobian as a SparseArray.
In[49]:=
Click for copyable input

With was used here so that is evaluated only once and the positions are contained in the definition. If the positions depended on time or the vector, you could have just used explicitly in the definition. The PatternTest was used to avoid symbolic evaluation, which would lead to an incorrect structure.

Compare values of the analytic Jacobian with the one computed by finite differences.
In[50]:=
Click for copyable input
Out[50]=

The size of the difference here is expected since finite differences are only accurate up to one-half of machine precision.

This solves the system of ODEs using the LibraryFunction right-hand side function with the analytic Jacobian.
In[51]:=
Click for copyable input
Out[51]=

The speed is comparable to using finite differences. Some computations are more subject to errors in the Jacobian than others. Very often for NDSolve it is sufficient to have a Jacobian that is reasonably close. On the other hand, for functions like FindRoot and FindMinimum, the accuracy of derivatives can make a big difference in how accurately you can find the solution.

You can actually have NDSolve generate C code for the right-hand side function for a particular value of . With the method option "ExpandEquationsSymbolically" -> True, the method of lines expands the system out symbolically instead of working with vector functions. The Compiled option passes on options given within it to the compiler.

This solves the system of ODEs using the LibraryFunction right-hand side function with the analytic Jacobian.
In[59]:=
Click for copyable input
Out[59]=

Note that doing this processing has taken a long time. That is because the generated code is quite large and so the C compiler uses a lot of time compiling and optimizing the code. The first time the Jacobian is evaluated, C code will also be generated and compiled for it, so the first evaluation will take a long time.

Compare values of the analytic Jacobian with the one from generated code. The time shown is almost all the time taken to compile the generated code.
In[60]:=
Click for copyable input
Out[60]=
This solves the system of ODEs using the generated code.
In[61]:=
Click for copyable input
Out[61]=

So at a significant time cost for compilation, the generated code is nearly as fast as the hand-coded function. Note that the generated code is not as flexiblefor different values of or the parameter it would have to be generated all over again.

Duffing Equation Folding

The Duffing equation can be considered a model for a clamped beam that is periodically driven. It represents an oscillator with a nonlinear restoring force.

This defines the Duffing equation with its parameters.
In[1]:=
Click for copyable input

For these values of the parameter, the trajectories are deterministically chaotic. You can see an indication of this from a solution computed by NDSolve.

This shows a phase plane plot of an approximate solution starting at .
In[3]:=
Click for copyable input
Out[3]=

Chaotic trajectories are typically very sensitive to initial conditions. This example shows how you can track the variation in trajectory for a line of initial conditions.

One way to do this would be to start with sufficiently close spacing on the line so that you would resolve the curve up to a certain time. This approach is relatively easy but computationally intensive, and it is also difficult to predict how many points to start with.

Another approach is to adaptively add points as the trajectories progress so that features are automatically resolved. This will work by advancing all the trajectories by a time increment and then adding points based on some criterion. The criterion used in this example is just the spacing between points. More sophisticated tests like curvature could be used, but just using distance keeps it somewhat simpler.

The most effective way to advance the solution is to use NDSolve with a first-order system described with a vector initial condition.

This defines a function for advancing the solution as a first-order system from time to .
In[4]:=
Click for copyable input
This advances the trajectory starting at by a time increment of several times and shows the points on the continuous plot of the trajectory.
In[5]:=
Click for copyable input
Out[8]=

NDSolve allows the initial conditions to be vectors, so the function can be used to handle many trajectories at once. Note that the initial values of x and y are separate vectors, so instead of using the x and y coordinates are separated using .

This advances multiple trajectories starting at different values by a time increment of several times and shows the trajectories.
In[9]:=
Click for copyable input
Out[12]=

You can see the distance between points has increased substantially by . The idea of the refinement scheme is to add new points in between before the distance gets too large.

This defines a function that, given a set of points , gives a new set of points including , , such that the maximum distance between points is tol. Linear interpolation is used to position the added points.
In[13]:=
Click for copyable input
In[14]:=
Click for copyable input

Since the function allows any number of points, all you need to do to advance a given set of initial conditions is use to get points sufficiently close, advance all of those points, and repeat.

For a line of initial conditions, this repeatedly refines and advances the solution, keeping the distance between points to 0.025.
In[15]:=
Click for copyable input
Out[19]=

To go out to a larger value of the computation can take a long time, since the trajectories for two nearby points tend to diverge exponentially, so the number of points added will be exponential in time.

The first step for any attempt to reduce the computation time is to see where most of the time is being spent. In this computation, there are only two components to consider: the refinement and advancing the solution. To see where most of the time is going, you can plot the time taken for each step as the trajectories evolve.

This makes a plot of the time taken at each step by and in advancing a line of initial conditions up to by steps of .
In[20]:=
Click for copyable input
Out[20]=

From the plot it is clear that over half the time is spent doing the refinement part of each step, so making that part faster will have the most significant effect on the overall speed of the computation.

The refinement function is difficult to speed up in the Wolfram Language because it works with arrays of different lengths that cannot be compiled or represented as packed arrays. Thus setting up a LibraryFunction for doing this part makes sense.

This loads a LibraryFunction that does the same thing as the function defined above.
In[21]:=
Click for copyable input
This makes a plot of the time taken at each step by the LibraryFunction, , and in advancing a line of initial conditions up to by steps of .
In[22]:=
Click for copyable input
Out[22]=

So by using the LibraryFunction for refinement, the computation is faster by a factor of 3 and the computation of the solution for the differential equation completely dominates the computation time.

In most cases it is more efficient to use the sophisticated adaptive time step methods built into NDSolve for solving differential equations numerically. However, in this example there are two things that combine to make it desirable to write a simple fixed step size solver. First, whatever the method, local error eventually grows exponentially, so the computed solutions are really only shadowing the real solution, and having fine control on the local error does not make a lot of difference in the long run as long as it is never too large. Secondly, for a reasonably high-order method, the step between refinements is an appropriate step for an ODE solver. Thus refinement can be alternated with a single step of a solver to get results very quickly with no overhead. A good choice for the solver here is the classical fourth-order RungeKutta method, though it might be possible to get away with a lower-order method to save even more time.

This loads a function that does one step of the classical fourth-order RungeKutta method to advance the trajectories.
In[23]:=
Click for copyable input
Out[23]=
This makes the computation using the LibraryFunction to advance.
In[24]:=
Click for copyable input
Out[24]=

So the single step advancement LibraryFunction is much faster. Subtimes were not plotted because they are too small to be meaningful. If you go far enough out in time and do plot subtimes, you can see that the dominant time is still solving the differential equation.

Now that the iterations are well optimized it is possible to do a much bigger computation. The sequence of plots you get by starting with a smaller line and letting it evolve gives an indication of how nearby initial conditions get stretched and folded apart.

This advances a line of initial conditions out to , making a plot at every time interval .
In[25]:=
Click for copyable input
Out[25]=
This shows every 10^(th) plot.
In[26]:=
Click for copyable input
Out[26]=

You can see that the very small segment eventually gets stretched out and then folds back on itself. A really good way to see this is to enter ListAnimate[plots]; the output of this is not included in the notebook because of its size.