Efficiency

This section is designed to discuss how to make compiled functions run efficiently. It will cover features that make them run faster, as well as problems that can make them execute more slowly.

CompilationTarget

If you set the CompilationTarget option to C, this generates C code for the compiled function, generates a Wolfram library, and uses this when the compiled function is used.

The following example shows how C code runs faster than the traditional Wolfram Language virtual machine.

If the compiled function makes external calls, the compiled function will generate the same result with C code, but it may not execute as fast.

To use CompilationTarget of C you need to have a C compiler.

You can combine CompilationTarget of C with parallel operation.

You can learn more in the section on CompilationTarget.

Parallel Compilation

If you have a multicore machine you can get an acceleration by running a compiled function that threads over list arguments in parallel.

The following creates and runs a compiled function that runs sequentially.

This creates an equivalent compiled function that will run in parallel. On this machine it runs twice as fast.

You can learn more in the section on parallel operation.

CompilationTarget

You can combine parallel operations by setting CompilationTarget to C.

This is one of the fastest ways to compute Wolfram Language functions.

External Calls

The Wolfram System compiler can create instructions for a wide variety of Wolfram Language commands and functions covering the supported type system. However, sometimes the compiler reaches something for which it does not have any built-in support. At this point the compiler will make a call to the Wolfram Language evaluator.

It is important to understand this process because it can make the compiled function run more slowly. The following defines an external function and calls it from the compiler.

Even though it gets the correct result, this will run more slowly than if everything were done in the compiler.

You can detect these external calls by enabling the Compile::noinfo message.

If you see this message you should think about its cause and try to rewrite the function so that all the execution stays in the compiler.

You can learn more in the section on external calls.

Runtime Options

RuntimeOptions is an option of Compile that specifies runtime settings for the compiled function. It takes a number of settings to control how overflow and runtime errors should be handled, as well as whether messages should be issued. You can give overall settings to optimize either speed or quality.

The default settings of these options give a reasonably fast execution. However, the default is still set to catch machine integer overflow. This is typically a good idea because not catching it can lead to serious errors. But if you are confident you will not overflow integer arithmetic, it might be useful.

The following example carries out various integer arithmetic operations.

Now integer overflow checking is disabled. The computation runs faster.

You can learn more in the section on RuntimeOptions.

Runtime Error Handling

If a compiled function encounters a runtime error in its operation it will by default issue a message and run the computation outside the Wolfram System compiler. Running a computation outside the compiler will use features of the Wolfram Language such as extended precision, and so might generate a result. But if you know that you do not want results in these cases you might want to avoid the time for the computation.

In this example, an overflow is returned. It will lead to the computation being repeated.

You could change this behavior by installing your own error handler.

Now the error does lead to a repetition of the evaluation.

You can learn more in the section on RuntimeOptions.

Code Inspection

You can inspect the compiler instructions and this might give you some insight into how it works. This is possible with the CompiledFunctionTools` package. To use the package you have to load it.

Now you can use CompilePrint to show details of a compiled function.

Examples

Newton's Method Fractals

A classic fractal is the pattern made by the basins of attraction in the complex plane for Newton's method applied to the roots of unity. This example will show how you can easily use Compile and the generality of the Wolfram Language to see how the fractal changes as you move the roots interactively.

Given a set of roots defining a polynomial , Newton's method takes on a very simple form. A step is given by .

To display the basins of attraction, for each point in the complex plane you need to find which root Newton's method converges to and in how many iterations (if it converges at all). This information can be mapped to a color space as follows. Let root i be represented by the color Hue[i/n]. For fewer iterations, increase the saturation so the color is more pastel. For more iterations, decrease the brightness so the color is darker.

The cell below defines a function that applies Newton's method for a given set of roots to a complex point z0, and determines which root it converges to and the number of iterations, limited by maxiter. The option "InlineExternalDefinitions"->True was used so that it does not use Wolfram Language evaluation to get the value of $MachineEpsilon (which could change, although it would be a bad idea to change it) for the simple convergence test. The options RuntimeOptions->"Speed" and CompilationTarget->"C" because this function will be run thousands of time to make a single plot, so its speed is critical.

This defines a function that gives a list of the RGB values for the basic colors for a given number of roots n. The list is needed since the Wolfram System compiler does not directly support color values.

As an example, take the roots to be the roots of .

You can just use this function by using Table to make a set of points in the complex plane and using Map to apply the function (at level 2). Using Raster is a very fast way to display the data.

This is much too slow to even think of interactivity yet. A way to speed this up tremendously is to use a CompiledFunction as a streamlined access to the LibraryFunction that was loaded from the generated C code. Since the plot is made by threading over a matrix of points, a natural way to do this is to use RuntimeAttributes->Listable. This has the further advantage that you can then set Parallelization->True so the different Newton iterations can be run in parallel. It is essential to use the option "InlineExternalDefinitions"->True so that the current definition of newt is used as efficiently as possible.

Use this with the same set of points as before.

Even if you do not have multiple cores, you should see a substantial speedup and it should be fast enough to use interactively.

This defines a function that creates a raster of the fractal for roots on the range with nx points in the x direction and ny points in the y direction.

This demonstrates using the function. It is recommended to find as large a value of m as possible so that the timing is less than a quarter of a second for the interactive example below.

This sets up a function that uses Manipulate to allow moving the positions of the roots starting from sroots. The Function is used so that Manipulate gets the intended arguments explicitly. This is needed because Manipulate does its own scoping and might use its own instances of some of the symbols.

Entering the command below will use the function starting with roots on the real line.

Julia Sets

This example will show how you can go from code that is functional, but too slow to be interactive, to code that runs nicely within Manipulate.

The following code comes from the web page http://www.bugman123.com/Fractals/index.html, which has Wolfram Language code to generate a wide variety of fractals.

You will see over several steps how this code can be optimized to generate more points many times faster.

A first step is to eliminate the repeated use of Append, which will be slow for the accumulation of a large number of points. The best alternative is to use NestWhileList, which handles the accumulation efficiently.

The same result is computed a little bit faster this way. For longer lists, the cost of using Append increases quadratically. It would be possible to eliminate the extra first value, but that does not affect the plot significantly and would slow down the fastest versions shown below a slight amount.

The next stage is to localize variables and to use machine complex or real types for constants where possible. One of the things done to make types consistent was to replace {1,-(-1)^(1/3),(-1)^(2/3)} with Table[Exp[-2. N[Pi]I k/power],{k,0,power-1}], which generalizes to powers different than 3. Putting it in a variable and computing it only once saves time. Constants like imax and dzmax have been included as settings in the Module to make them easier to change later.

This revision is a little bit faster yet. However, now that appropriate types are being used, the real speed gain is going to be made from compiling it.

Since it is so much faster, it is possible to increase the parameters so that more of the set is computed. When doing this, a limit on the total number of points was added so that plotting does not get too slow and there are no possible memory issues. The other modification made is to generate coordinates instead of complex values. Since the coordinates are used in the plot, it saves time to do this all in one CompiledFunction rather than using an extra Map as before.

The plot has more detail than before and also takes a significant fraction of the compute time to generate.

It is possible to save substantial time by skipping the axes and plotting the points directly.

To make it as fast as possible, compile to C code using the fastest possible settings.

If you have a C compiler, it should now be fast enough to use inside Manipulate and get excellent interactivity. Entering the input below will allow you to change parameters interactively. If you do not have a C compiler, Specific Compilers has a list of compilers you may obtain that work with the CCompilerDriver package.