Numerical Examples
The functions accessible with Wolfram LibraryLink make it possible to optimize numerical computations while still keeping the flexibility and generality of the Wolfram Language. If you have a large existing numerical code, both the Wolfram Symbolic Transfer Protocol (WSTP) and LibraryLink provide good ways of interfacing the code so it can be driven from the Wolfram Language. On the other hand, if you are developing a numerical computation, you can prototype it with the Wolfram Language, and then, if some parts prove to be bottlenecks, you can use LibraryFunction to speed up those parts. LibraryFunction also interfaces directly with the Wolfram Language's numerical computation functions, so you can sometimes speed up computations by providing fast library code that is evaluated at many sample points. The focus of this tutorial is on the latter two uses of LibraryLink.
The source for the examples shown in this tutorial is found in the documentation paclet.
You can find this by evaluating the following input.
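A minimal sketch of such an input, assuming the example sources ship inside the Wolfram Language installation (the exact location may vary between versions):
FileNames["demo_numerical.c", {$InstallationDirectory}, Infinity]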
Basic Quadratic Function Example
Following is a very simple example of a function that just computes x^2 - a.
DLLEXPORT int parabola(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res)
{
    mreal x, a, f;
    if (Argc != 2) return LIBRARY_FUNCTION_ERROR;
    x = MArgument_getReal(Args[0]);
    a = MArgument_getReal(Args[1]);
    f = x*x - a;
    MArgument_setReal(Res, f);
    return LIBRARY_NO_ERROR;
}
Once the function is loaded, it can be used just as you would an ordinary function; it can be plotted, sampled, and so on.
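For example, assuming the example library has been compiled under the name "demo_numerical" (an assumption; use whatever name you built the sources with), loading and plotting might look like this sketch:
parabola = LibraryFunctionLoad["demo_numerical", "parabola", {Real, Real}, Real];
Plot[parabola[x, 2.], {x, -2, 2}]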
For such a simple function, there is no real advantage to using LibraryFunction because, compared to the cost of evaluating the function, the overhead of getting results to and from the Wolfram Language is large.
Mandelbrot Set
On the other hand, if the function is more expensive to evaluate, then there can be a significant advantage. An example of this is a function that can be used to make an image of the Mandelbrot set.
The function mandelbrot shown below computes the number of iterates of z -> z^2 + c, starting at z = 0, needed for |z|^2 to exceed 4, at which point the sequence is known to be unbounded. If the iteration continues until n is equal to mandelbrot_max_iterations, then the point is assumed to be in the set and 0 is returned.
static mint mandelbrot_max_iterations = 1000;
DLLEXPORT int mandelbrot(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res)
{
    mint n = 0;
    mcomplex z = {0., 0.};
    mcomplex c = MArgument_getComplex(Args[0]);
    mreal rr = mcreal(z)*mcreal(z);
    mreal ii = mcimag(z)*mcimag(z);
    while ((n < mandelbrot_max_iterations) && (rr + ii < 4)) {
        /* z <- z^2 + c, computed with real arithmetic */
        mcimag(z) = 2.*mcreal(z)*mcimag(z) + mcimag(c);
        mcreal(z) = rr - ii + mcreal(c);
        rr = mcreal(z)*mcreal(z);
        ii = mcimag(z)*mcimag(z);
        n++;
    }
    /* reaching the iteration limit means the point is taken to be in the set */
    if (n == mandelbrot_max_iterations)
        n = 0;
    MArgument_setInteger(Res, n);
    return LIBRARY_NO_ERROR;
}
Note that since C does not have built-in complex arithmetic, the mathematical operations are implemented using real arithmetic.
The function can now be used like any function in the Wolfram Language that evaluates for points in the complex plane.
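As a sketch (again assuming the library name "demo_numerical"), the function can be loaded and used for a plot over part of the complex plane:
mandelbrot = LibraryFunctionLoad["demo_numerical", "mandelbrot", {Complex}, Integer];
DensityPlot[mandelbrot[x + I y], {x, -2.5, .5}, {y, -1.25, 1.25}]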
This is perhaps not the best way to visualize the set because the function is so discontinuous. A more common and much faster thing to do is to make an image by sampling the function on a regular grid of points.
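A sketch of such a sampling (the grid spacing here is illustrative):
data = Table[mandelbrot[x + I y], {y, -1.25, 1.25, .005}, {x, -2.5, .5, .005}];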
A color image can be made by making a map from the function values to color channels and using Raster. Image could be used, but for an interactive use like the one shown below, Raster is much faster to display.
A random color map highlights the differences between escape times.
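A sketch of such a random map, applied with Raster (escape times run from 0 to 1000, so 1001 colors are needed):
colormap = RandomReal[1, {1001, 3}];
Graphics[Raster[Map[colormap[[# + 1]] &, data, {2}]]]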
Often the goal for something like this is to get the data for the image as quickly as possible. In the above examples, the part of the calculation done by the C function completes very quickly, but the rest of the computation can be sped up by using the Wolfram Language compiler. The Wolfram Language compiler integrates tightly with LibraryFunction, so this can be made very efficient.
With is used here because Compile does not evaluate its arguments; if mlf and colormap had been used directly, they would be treated as symbols to be resolved at runtime by external evaluation. By inserting their values explicitly, the compiler can use the most efficient possible evaluation.
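The definition being discussed might look something like this sketch (the name colorlf is a placeholder):
With[{mlf = mandelbrot, cmap = colormap},
 colorlf = Compile[{{z, _Complex}},
   cmap[[mlf[z] + 1]],
   RuntimeAttributes -> {Listable}]];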
The Listable attribute means that it can be evaluated for several complex points given in a list.
This means that to get the color channel data all you have to do is give this function all of the sample points required. A fast way to do that is to write a compiled function.
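Such a compiled generator of sample points might be sketched as:
csamples = Compile[{{xmin, _Real}, {xmax, _Real}, {ymin, _Real}, {ymax, _Real}, {n, _Integer}},
  Table[x + I y,
   {y, ymin, ymax, (ymax - ymin)/(n - 1)},
   {x, xmin, xmax, (xmax - xmin)/(n - 1)}]];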
The timing for getting the whole image using these compiled functions is faster than it was for just evaluating the function at the sample points. This is because much of the time shown for evaluating the sample points above was spent actually constructing the Table as a list instead of a packed array; with the compiled functions, packed arrays are used throughout.
If you have a machine with several cores, the computation can be made even faster by using parallelization.
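With the compiler, this is as simple as adding the Parallelization option to the listable compiled function, as in this sketch:
With[{mlf = mandelbrot, cmap = colormap},
 colorlf = Compile[{{z, _Complex}},
   cmap[[mlf[z] + 1]],
   RuntimeAttributes -> {Listable},
   Parallelization -> True]];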
The speedup, though significant, is not linear in the number of processors. This is because some parts, like the construction of the sample points and the generation of the image, are done on one processor.
With the update this fast, it is practical to set up an interactive interface that allows you to explore some aspects of the set. The setup using EventHandler and Dynamic is slightly complicated, but the key to having it work effectively is the speed of the mimage function set up above.
If you have a fairly fast machine, you should be able to do this with as many as 512 points in each direction. If not, 200 points will give an adequate picture. To run this, it is suggested to use mandelPlot[{-2.5,.5,n},{-1.25,1.25,n}], where n is the number of points in each direction. Note that if you zoom in very far, the images lose some clarity for two reasons: first, getting a sharp boundary may require more than the 1000-iteration limit encoded in the C code; and second, in some cases higher precision may be needed to truly resolve the trajectories of nearby points.
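If the original interactive definition is not available, a minimal non-interactive stand-in built from the sketches above (without the EventHandler and Dynamic zooming) would be:
mandelPlot[{xmin_, xmax_, n_}, {ymin_, ymax_, _}] :=
 Graphics[Raster[colorlf[csamples[xmin, xmax, ymin, ymax, n]]]]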
It is worth noting here that the Wolfram Language compiler can automatically generate C code that can be compiled into a shared library that can be loaded as a LibraryFunction. For this example, the generated C code is slower, but not much slower than the handwritten function.
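This is done with the CompilationTarget option of Compile; a sketch of an equivalent compiled function:
cmandelbrot = Compile[{{c, _Complex}},
  Module[{z = 0. + 0. I, n = 0},
   While[n < 1000 && Abs[z] < 2., z = z^2 + c; n++];
   If[n == 1000, 0, n]],
  CompilationTarget -> "C"];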
So, it takes just about twice as long as the handwritten C code.
Brusselator PDE
The so-called "Brusselator" PDE problem for modeling a chemical reaction described in [HNW93] and [HW96] provides a good example for defining LibraryFunction for use in functions like NDSolve that have some component of symbolic preprocessing.
The problem can be solved directly by NDSolve, and part of the point of this example is to compare solution speed with NDSolve versus a partially hand-coded discretization.
In the reference [HW96], the problem was set up using second-order differences and a discretization with N points, and for large N it "has all the characteristics of a 'stiff' problem". You can have NDSolve do the same discretization by forcing second-order differences and setting the number of spatial grid points using method options for the method of lines. You can also get the right-hand side of the ODEs that NDSolve effectively solves by using NDSolve`ProcessEquations and extracting the function from the resulting NDSolve`StateData object.
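A sketch of that setup, using the standard one-dimensional Brusselator problem as stated in the references (the precise input used in this tutorial is not shown, so the equations and grid size here are assumptions):
n = 40;
bruss = {
   D[u[t, x], t] == 1 + u[t, x]^2 v[t, x] - 4 u[t, x] + 1/50 D[u[t, x], x, x],
   D[v[t, x], t] == 3 u[t, x] - u[t, x]^2 v[t, x] + 1/50 D[v[t, x], x, x],
   u[0, x] == 1 + Sin[2 Pi x], v[0, x] == 3,
   u[t, 0] == 1, u[t, 1] == 1, v[t, 0] == 3, v[t, 1] == 3};
state = First[NDSolve`ProcessEquations[bruss, {u, v}, {t, 0, 10}, {x, 0, 1},
    Method -> {"MethodOfLines",
      "SpatialDiscretization" -> {"TensorProductGrid",
        "MinPoints" -> n, "MaxPoints" -> n, "DifferenceOrder" -> 2}}]];
(* the discretized right-hand side can then be extracted from state;
   the exact property name may vary between versions *)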
The function brusselator_pde_rhs in demo_numerical.c is defined to give the same discretization.
Even though t does not appear explicitly in the discretized right-hand side, the function was still defined to take it as the first argument. With this, NDSolve is able to make a very efficient interface to it. The values given by this function are almost identical to those given by the function generated by NDSolve.
The small differences are simply due to variations in the order of the arithmetic operations.
So the LibraryFunction is almost 20 times faster!
When NDSolve solved the PDE, it used the function as the right-hand side for a first-order system of ODEs, U'(t) = F(t, U(t)). By using vector initial conditions, this can be given to NDSolve directly.
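A sketch of this, assuming brhs is the loaded LibraryFunction and U0 is the vector of initial grid values (both names are placeholders):
bsol = NDSolve[{U'[t] == brhs[t, U[t]], U[0] == U0}, U, {t, 0, 10}];
(* NDSolve calls the LibraryFunction with numerical arguments *)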
So, in spite of the LibraryFunction being nearly 20 times faster to evaluate, the solution takes three times as long! This is because the stiff solver requires the Jacobian, which NDSolve by default approximates with finite differences as if it were dense, even though for this problem the Jacobian is quite sparse. There are two ways to handle this issue. The simplest is to give the pattern of sparseness for the Jacobian so that finite differences can be used with relatively few evaluations, and so that sparse linear algebra is used.
Often getting the pattern for sparseness can be tricky, and it is important that you get all the elements that might be nonzero. Here you can take advantage of the fact that the right-hand side is independent of t, and just use finite differences.
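A sketch of detecting the pattern this way (brhs and U0 as above; the perturbation size is illustrative). The resulting pattern sp can then be given to NDSolve as the sparseness pattern for the Jacobian:
m = Length[U0];
base = brhs[0., U0];
sp = SparseArray[
   Flatten @ Table[
     ({#, j} -> 1) & /@ Flatten[Position[
        Unitize[Chop[brhs[0., U0 + 10.^-7 UnitVector[m, j]] - base]], 1]],
     {j, m}], {m, m}];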
With the computation for the Jacobian under control, this is now a lot faster.
Defining the analytic Jacobian is trickier still, because not only do you have to get the pattern right, but you also have to construct a sparse array with that pattern from a function that gives the vector of values.
With was used here so that jplf is evaluated only once and the positions are contained in the definition. If the positions depended on time or the vector, you could have just used jplf[t,U] explicitly in the definition. The PatternTest t_?NumberQ was used to avoid symbolic evaluation, which would lead to an incorrect structure.
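The shape of that definition might be as in this sketch, where jplf gives the positions, jvlf is a placeholder for the LibraryFunction giving the vector of values, and bjac is a placeholder name:
With[{pos = jplf[0., U0], m = Length[U0]},
 bjac[t_?NumberQ, U_?VectorQ] := SparseArray[pos -> jvlf[t, U], {m, m}]];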
The size of the difference here is expected, since finite differences are only accurate to about half of machine precision.
The speed is comparable to using finite differences. Some computations are more subject to errors in the Jacobian than others. Very often for NDSolve it is sufficient to have a Jacobian that is reasonably close. On the other hand, for functions like FindRoot and FindMinimum, the accuracy of derivatives can make a big difference in how accurately you can find the solution.
You can actually have NDSolve generate C code for the right-hand side function for a particular value of N. With the method option "ExpandEquationsSymbolically" -> True, the method of lines expands the system out symbolically instead of working with vector functions. The Compiled option passes on options given within it to the compiler.
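A sketch of the invocation being described (the placement of the options here is an assumption):
NDSolve[bruss, {u, v}, {t, 0, 10}, {x, 0, 1},
 Method -> {"MethodOfLines", "ExpandEquationsSymbolically" -> True,
   "SpatialDiscretization" -> {"TensorProductGrid",
     "MinPoints" -> n, "MaxPoints" -> n, "DifferenceOrder" -> 2}},
 Compiled -> {True, CompilationTarget -> "C"}]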
Note that doing this processing has taken a long time. That is because the generated code is quite large and so the C compiler uses a lot of time compiling and optimizing the code. The first time the Jacobian is evaluated, C code will also be generated and compiled for it, so the first evaluation will take a long time.
So, at a significant time cost for compilation, the generated code is nearly as fast as the hand-coded function. Note that the generated code is not as flexible: for a different number of grid points N or different parameter values, it would have to be generated all over again.
Duffing Equation Folding
The Duffing equation can be considered a model for a clamped beam that is periodically driven. It represents an oscillator with a nonlinear restoring force.
For these values of the parameter, the trajectories are deterministically chaotic. You can see an indication of this from a solution computed by NDSolve.
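A sketch of such a solution; the parameter values used here (the classical chaotic choice of damping 0.25 and forcing 0.3 Cos[t]) are an assumption, since the original input is not shown:
dsol = First @ NDSolve[
    {x''[t] + 0.25 x'[t] - x[t] + x[t]^3 == 0.3 Cos[t],
     x[0] == 0., x'[0] == 0.}, x, {t, 0, 200}, MaxSteps -> Infinity];
ParametricPlot[Evaluate[{x[t], x'[t]} /. dsol], {t, 0, 200}]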
Chaotic trajectories are typically very sensitive to initial conditions. This example shows how you can track the variation in trajectory for a line of initial conditions.
One way to do this would be to start with sufficiently close spacing on the line so that you would resolve the curve up to a certain time. This approach is relatively easy but computationally intensive, and it is also difficult to predict how many points to start with.
Another approach is to adaptively add points as the trajectories progress so that features are automatically resolved. This will work by advancing all the trajectories by a time increment and then adding points based on some criterion. The criterion used in this example is just the spacing between points. More sophisticated tests like curvature could be used, but just using distance keeps it somewhat simpler.
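A Wolfram Language prototype of the refinement step might look like the following sketch, which inserts a midpoint wherever consecutive points are farther apart than dmax:
refine[pts_, dmax_] := Module[{out = {First[pts]}},
  Do[
   If[EuclideanDistance[pts[[i - 1]], pts[[i]]] > dmax,
    AppendTo[out, (pts[[i - 1]] + pts[[i]])/2.]];
   AppendTo[out, pts[[i]]],
   {i, 2, Length[pts]}];
  out]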
The most effective way to advance the solution is to use NDSolve with a first-order system described with a vector initial condition.
NDSolve allows the initial conditions to be vectors, so the advance function can be used to handle many trajectories at once. Note that the initial values of x and y are separate vectors, so instead of using {{x1,y1},{x2,y2},…}, the x and y coordinates are separated as {{x1,x2,…},{y1,y2,…}}.
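A sketch of such an advance function for the first-order Duffing system, using the separated-coordinate form (parameter values as assumed above):
advance[{X0_, Y0_}, {t0_, t1_}] := Module[{X, Y, sol},
  sol = First @ NDSolve[
     {X'[t] == Y[t],
      Y'[t] == -0.25 Y[t] + X[t] - X[t]^3 + 0.3 Cos[t],
      X[t0] == X0, Y[t0] == Y0}, {X, Y}, {t, t0, t1}];
  {X[t1], Y[t1]} /. sol]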
You can see that the distance between points has increased substantially after advancing by this time increment. The idea of the refinement scheme is to add new points in between before the distance gets too large.
Since the advance function allows any number of points, all you need to do to advance a given set of initial conditions is use refine to get points sufficiently close, advance all of those points, and repeat.
To go out to a larger value of t, the computation can take a long time, since the trajectories of two nearby points tend to diverge exponentially, so the number of points added will be exponential in time.
The first step for any attempt to reduce the computation time is to see where most of the time is being spent. In this computation, there are only two components to consider: the refinement and advancing the solution. To see where most of the time is going, you can plot the time taken for each step as the trajectories evolve.
From the plot it is clear that over half the time is spent doing the refinement part of each step, so making that part faster will have the most significant effect on the overall speed of the computation.
The refinement function is difficult to speed up in the Wolfram Language because it works with arrays of different lengths that cannot be compiled or represented as packed arrays. Thus setting up a LibraryFunction for doing this part makes sense.
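Loading such a function might look like this sketch (the exported name and type signature are assumptions about the example library):
refinelf = LibraryFunctionLoad["demo_numerical", "refine",
  {{Real, 2}, Real}, {Real, 2}]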
So by using the LibraryFunction for refinement, the computation is faster by a factor of 3 and the computation of the solution for the differential equation completely dominates the computation time.
In most cases it is more efficient to use the sophisticated adaptive time step methods built into NDSolve for solving differential equations numerically. However, in this example there are two things that combine to make it desirable to write a simple fixed step size solver. First, whatever the method, local error eventually grows exponentially, so the computed solutions are really only shadowing the real solution, and having fine control over the local error does not make a lot of difference in the long run, as long as it is never too large. Second, for a reasonably high-order method, the step between refinements is an appropriate step for an ODE solver. Thus refinement can be alternated with a single step of a solver to get results very quickly with no overhead. A good choice for the solver here is the classical fourth-order Runge–Kutta method, though it might be possible to get away with a lower-order method to save even more time.
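For reference, a Wolfram Language sketch of a single classical fourth-order Runge–Kutta step for the Duffing system (the tutorial implements this step as a LibraryFunction in C):
rk4step[{X_, Y_}, t_, h_] := Module[{f, k1, k2, k3, k4},
  f[{x_, y_}, s_] := {y, -0.25 y + x - x^3 + 0.3 Cos[s]};
  k1 = f[{X, Y}, t];
  k2 = f[{X, Y} + h/2 k1, t + h/2];
  k3 = f[{X, Y} + h/2 k2, t + h/2];
  k4 = f[{X, Y} + h k3, t + h];
  {X, Y} + h/6 (k1 + 2 k2 + 2 k3 + k4)]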
So the single step advancement LibraryFunction is much faster. Subtimes were not plotted because they are too small to be meaningful. If you go far enough out in time and do plot subtimes, you can see that the dominant time is still solving the differential equation.
Now that the iterations are well optimized it is possible to do a much bigger computation. The sequence of plots you get by starting with a smaller line and letting it evolve gives an indication of how nearby initial conditions get stretched and folded apart.
You can see that the very small segment eventually gets stretched out and then folds back on itself. A really good way to see this is to enter ListAnimate[plots]; the output of this is not included in the notebook because of its size.