COMPILED FUNCTION TOOLS TUTORIAL

Efficiency

This section is designed to discuss how to make compiled functions run efficiently. It will cover features that make them run faster, as well as problems that can make them execute more slowly.

CompilationTarget

If you set the CompilationTarget option to C, this generates C code for the compiled function, generates a Wolfram library, and uses this when the compiled function is used.

The following example shows how C code runs faster than the traditional Wolfram Language virtual machine.

In[9]:=
Click for copyable input
Out[10]=
In[11]:=
Click for copyable input
Out[12]=

If the compiled function makes external calls, the compiled function will generate the same result with C code, but it may not execute as fast.

To use CompilationTarget of C you need to have a C compiler.

You can combine CompilationTarget of C with parallel operation.

You can learn more in the section on CompilationTarget.

Parallel Compilation

If you have a multicore machine you can get an acceleration by running a compiled function that threads over list arguments in parallel.

The following creates and runs a compiled function that runs sequentially.

In[1]:=
Click for copyable input
Out[3]=

This creates an equivalent compiled function that will run in parallel. On this machine it runs twice as fast.

In[4]:=
Click for copyable input
Out[5]=

You can learn more in the section on parallel operation.

CompilationTarget

You can combine parallel operations by setting CompilationTarget to C.

In[6]:=
Click for copyable input
Out[7]=

This is one of the fastest ways to compute Wolfram Language functions.

External Calls

The Wolfram System compiler can create instructions for a wide variety of Wolfram Language commands and functions covering the supported type system. However, sometimes the compiler reaches something for which it does not have any built-in support. At this point the compiler will make a call to the Wolfram Language evaluator.

It is important to understand this process because it can make the compiled function run more slowly. The following defines an external function and calls it from the compiler.

In[8]:=
Click for copyable input

Even though it gets the correct result, this will run more slowly than if everything were done in the compiler.

In[10]:=
Click for copyable input
Out[10]=

You can detect these external calls by enabling the Compile::noinfo message.

If you see this message you should think about its cause and try to rewrite the function so that all the execution stays in the compiler.

You can learn more in the section on external calls.

Runtime Options

RuntimeOptions is an option of Compile that specifies runtime settings for the compiled function. It takes a number of settings to control how overflow, underflow, and runtime errors should be handled, as well as whether messages should be issued. You can give overall settings to optimize either speed or quality.

The default settings of these options give a reasonably fast execution. However, the default is still set to catch machine integer overflow. This is typically a good idea because not catching it can lead to serious errors. But if you are confident you will not overflow integer arithmetic, it might be useful.

The following example carries out various integer arithmetic operations.

In[43]:=
Click for copyable input
Out[44]=

Now integer overflow checking is disabled. The computation runs faster.

In[41]:=
Click for copyable input
Out[42]=

You can learn more in the section on RuntimeOptions.

Runtime Error Handling

If a compiled function encounters a runtime error in its operation it will by default issue a message and run the computation outside the Wolfram System compiler. Running a computation outside the compiler will use features of the Wolfram Language such as extended precision, and so might generate a result. But if you know that you do not want results in these cases you might want to avoid the time for the computation.

In this example, an overflow is returned. It will lead to the computation being repeated.

In[1]:=
Click for copyable input
In[1]:=
Click for copyable input
Out[1]=

You could change this behavior by installing your own error handler.

In[2]:=
Click for copyable input

Now the error does lead to a repetition of the evaluation.

In[3]:=
Click for copyable input
Out[3]=

You can learn more in the section on RuntimeOptions.

Code Inspection

You can inspect the compiler instructions and this might give you some insight into how it works. This is possible with the package. To use the package you have to load it.

In[1]:=
Click for copyable input

Now you can use CompilePrint to show details of a compiled function.

In[2]:=
Click for copyable input
Out[3]=

Examples

Newton's Method Fractals

A classic fractal is the pattern made by the basins of attraction in the complex plane for Newton's method applied to the roots of unity. This example will show how you can easily use Compile and the generality of the Wolfram Language to see how the fractal changes as you move the roots interactively.

Given a set of roots defining a polynomial , Newton's method takes on a very simple form. A step is given by .

To display the basins of attraction, for each point in the complex plane you need to find which root Newton's method converges to and in how many iterations (if it converges at all). This information can be mapped to a color space as follows. Let root i be represented by the color Hue[i/n]. For fewer iterations, increase the saturation so the color is more pastel. For more iterations, decrease the brightness so the color is darker.

The cell below defines a function that applies Newton's method for a given set of roots to a complex point , and determines which root it converges to and the number of iterations, limited by maxiter. The option "InlineExternalDefinitions"->True was used so that it does not use Wolfram Language evaluation to get the value of $MachineEpsilon (which could change, although it would be a bad idea to change it) for the simple convergence test. The options RuntimeOptions->"Speed" and CompilationTarget->"C" because this function will be run thousands of time to make a single plot, so its speed is critical.

In[1]:=
Click for copyable input

This defines a function that gives a list of the RGB values for the basic colors for a given number of roots . The list is needed since the Wolfram System compiler does not directly support color values.

In[1]:=
Click for copyable input

As an example, take the roots to be the roots of .

In[2]:=
Click for copyable input
Out[2]=

You can just use this function by using Table to make a set of points in the complex plane and using Map to apply the function (at level 2). Using Raster is a very fast way to display the data.

In[3]:=
Click for copyable input
Out[4]=

This is much too slow to even think of interactivity yet. A way to speed this up tremendously is to use a CompiledFunction as a streamlined access to the LibraryFunction that was loaded from the generated C code. Since the plot is made by threading over a matrix of points, a natural way to do this is to use RuntimeAttributes->Listable. This has the further advantage that you can then set Parallelization->True so the different Newton iterations can be run in parallel. It is essential to use the option "InlineExternalDefinitions"->True so that the current definition of newt is used as efficiently as possible.

In[5]:=
Click for copyable input

Use this with the same set of points as before.

In[6]:=
Click for copyable input
Out[6]=

Even if you do not have multiple cores, you should see a substantial speedup and it should be fast enough to use interactively.

This defines a function that creates a raster of the fractal for roots on the range with points in the direction and points in the direction.

In[7]:=
Click for copyable input

This demonstrates using the function. It is recommended to find as large a value of as possible so that the timing is less than a quarter of a second for the interactive example below.

In[8]:=
Click for copyable input
Out[8]=

This sets up a function that uses Manipulate to allow moving the positions of the roots starting from sroots. The Function is used so that Manipulate gets the intended arguments explicitly. This is needed because Manipulate does its own scoping and might use its own instances of some of the symbols.

In[9]:=
Click for copyable input

Entering the command below will use the function starting with roots on the real line.

In[10]:=
Click for copyable input

Julia Sets

This example will show how you can go from code that is functional, but too slow to be interactive, to code that runs nicely within Manipulate.

The following code comes from the web page http://www.bugman123.com/Fractals/index.html, which has Wolfram Language code to generate a wide variety of fractals.

In[11]:=
Click for copyable input
Out[11]=
In[12]:=
Click for copyable input
Out[12]=

You will see over several steps how this code can be optimized to generate more points many times faster.

A first step is to eliminate the repeated use of Append, which will be slow for the accumulation of a large number of points. The best alternative is to use NestWhileList, which handles the accumulation efficiently.

In[13]:=
Click for copyable input
Out[13]=

The same result is computed a little bit faster this way. For longer lists, the cost of using Append increases quadratically. It would be possible to eliminate the extra first value, but that does not affect the plot significantly and would slow down the fastest versions shown below a slight amount.

The next stage is to localize variables and to use machine complex or real types for constants where possible. One of the things done to make types consistent was to replace with Table[Exp[-2. N[Pi]I k/power],{k,0,power-1}], which generalizes to powers different than 3. Putting it in a variable and computing it only once saves time. Constants like imax and dzmax have been included as settings in the Module to make them easier to change later.

In[14]:=
Click for copyable input
Out[14]=

This revision is a little bit faster yet. However, now that appropriate types are being used, the real speed gain is going to be made from compiling it.

In[15]:=
Click for copyable input
Out[16]=

Since it is so much faster, it is possible to increase the parameters so that more of the set is computed. When doing this, a limit on the total number of points was added so that plotting does not get too slow and there are no possible memory issues. The other modification made is to generate coordinates instead of complex values. Since the coordinates are used in the plot, it saves time to do this all in one CompiledFunction rather than using an extra Map as before.

In[19]:=
Click for copyable input
Out[20]=

The plot has more detail than before and also takes a significant fraction of the compute time to generate.

In[21]:=
Click for copyable input
Out[21]=

It is possible to save substantial time by skipping the axes and plotting the points directly.

In[22]:=
Click for copyable input
Out[22]=

To make it as fast as possible, compile to C code using the fastest possible settings.

In[23]:=
Click for copyable input

If you have a C compiler, it should now be fast enough to use inside Manipulate and get excellent interactivity. Entering the input below will allow you to change parameters interactively. If you do not have a C compiler, Specific Compilers has a list of compilers you may obtain that work with the CCompilerDriver package.

In[24]:=
Click for copyable input